106 Commits

Author SHA1 Message Date
李扬
465962df7f
Support orc filter push down (file + stripe + rowgroup level) (#55330)
* support orc filter push down

* update orc lib version

* replace setqueryinfo with setkeycondition

* fix issue https://github.com/ClickHouse/ClickHouse/issues/53536

* refactor source with key condition

* fix building error

* remove std::cout

* update orc

* update orc version

* fix bugs

* improve code

* upgrade orc lib

* fix code style

* change as requested

* add performance tests for orc filter push down

* add performance tests for orc filter push down

* fix all bugs

* fix default as null issue

* add uts for null as default issues

* upgrade orc lib

* fix failed orc lib uts and fix typo

* fix failed uts

* fix failed uts

* fix ast fuzzer tests

* fix bug of uint64 overflow in https://s3.amazonaws.com/clickhouse-test-reports/55330/de22fdcaea2e12c96f300e95f59beba84401712d/fuzzer_astfuzzerubsan/report.html

* fix asan fatal caused by reused column vector batch in native orc input format. refer to https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__asan__[4_4].htm

* fix wrong performance tests

* disable 02892_orc_filter_pushdown on aarch64. https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__aarch64_.html

* add some comments

* add some comments

* inline range::equals and range::less

* fix data race of key condition

* trigger ci
2023-10-24 12:08:17 -07:00
avogar
2d8f33bfa2 Fix parsing error in WithNames formats while reading subset of columns with disabled input_format_with_names_use_header 2023-09-11 14:55:37 +00:00
avogar
894513f6cd Fix tests 2023-08-23 18:43:08 +00:00
Kruglov Pavel
592fa77987
Merge branch 'master' into cache-count 2023-08-23 15:18:02 +02:00
robot-ch-test-poll1
c22ffa6195
Merge pull request #53529 from Avogar/filter-files-all-table-functions
Use filter by file/path before reading in url/file/hdfs table functions
2023-08-23 14:21:23 +02:00
Kruglov Pavel
c0bdd0e00b
Merge branch 'master' into cache-count 2023-08-22 14:42:22 +02:00
avogar
b4145aeddc Cache number of rows in files for count in file/s3/url/hdfs/azure functions 2023-08-22 11:59:59 +00:00
pufit
9d454d9afc Merge branch 'master' into pufit/fix_s3_threads
# Conflicts:
#	src/Storages/StorageS3.cpp
#	src/Storages/StorageS3.h
#	src/Storages/StorageURL.cpp
#	src/Storages/StorageURL.h
2023-08-21 21:32:15 -04:00
pufit
98a701e2c1 Limiting number of parsing threads for S3 source 2023-08-21 21:21:03 -04:00
Michael Kolupaev
2f4d433e69 Parquet filter pushdown 2023-08-21 14:15:52 -07:00
avogar
60b0b88d50 Clean up 2023-08-17 16:59:57 +00:00
avogar
4c32097df3 Use filter by file/path before reading in url/file/hdfs table functions, reduce code duplication 2023-08-17 16:54:43 +00:00
Kruglov Pavel
0d34e97dbe
Merge branch 'master' into formats-with-subcolumns 2023-07-26 13:30:35 +02:00
avogar
8d634c992b Fix tests 2023-07-06 17:47:01 +00:00
avogar
d11cd0dc30 Fix tests 2023-07-05 17:56:03 +00:00
avogar
98aa6b317f Support reading subcolumns from file/s3/hdfs/url/azureBlobStorage table functions 2023-07-04 21:17:26 +00:00
avogar
4eeb431003 Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-progress-bar-2 2023-06-28 18:53:08 +00:00
avogar
c679dd400e Make better 2023-06-23 13:43:40 +00:00
avogar
cf082f2f9a Use read_bytes/total_bytes_to_read for progress bar in s3/file/url/... table functions 2023-06-22 17:24:43 +00:00
Sema Checherinda
d0bb985061 fix other classes based on SinkToStorage 2023-06-22 14:33:25 +02:00
Sema Checherinda
95349a405b release buffers with exception context 2023-06-22 13:00:13 +02:00
avogar
3209ebe34b Improve progress bar for file/s3/hdfs/url table functions. Step 1 2023-06-16 15:51:18 +00:00
avogar
2e1f56ae33 Address comments 2023-06-13 14:43:50 +00:00
Kruglov Pavel
bf28074d32
Merge branch 'master' into allow-skip-empty-files 2023-06-08 12:36:18 +02:00
Antonio Andelic
b11f744252
Correctly disable async insert with deduplication when it's not needed (#50663)
* Correctly disable async insert when it's not used

* Better

* Add comment

* Better

* Fix tests

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-06-07 20:33:08 +02:00
Michael Kolupaev
b51064a508 Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead 2023-06-01 18:48:30 -07:00
avogar
d4efbbfbd3 Allow to skip empty files in file/s3/url/hdfs table functions 2023-05-30 19:32:24 +00:00
avogar
88e4c93abc Merge branch 'master' of github.com:ClickHouse/ClickHouse into urlCluster 2023-05-22 19:19:57 +00:00
Nikolay Degterinsky
d4b89cb643
Merge pull request #49356 from Ziy1-Tan/vcol
Support for `_path` and `_file` virtual columns for table function `url`.
2023-05-22 18:10:32 +02:00
avogar
3ee8de792c Merge branch 'master' of github.com:ClickHouse/ClickHouse into urlCluster 2023-05-11 12:46:20 +00:00
Michael Kolupaev
3bd1489f18 Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading() 2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0 Better control over Parquet row group size 2023-05-04 14:59:55 -07:00
Ziy1-Tan
2c159061ed Support _path and _file virtual columns for table function url. 2023-05-01 21:40:30 +08:00
avogar
447189a6ca Better 2023-04-21 17:54:09 +00:00
avogar
0097230611 Better 2023-04-21 17:35:17 +00:00
avogar
944f54aadf Finish urlCluster, refactor code, reduce code duplication 2023-04-21 17:24:37 +00:00
avogar
86686fbbc3 Fix conflicts 2023-04-21 14:11:18 +02:00
kssenii
bb0beb7449 Merge remote-tracking branch 'upstream/master' into named-collections-finish 2023-03-17 13:02:36 +01:00
Konstantin Bogdanov
1bbf5acd47
Pass headers from StorageURL to WriteBufferFromHTTP (#46996)
* Pass headers from StorageURL to WriteBufferFromHTTP

* Add a test

* Lint

* `time.sleep(1)`

* Start echo server earlier

* Add proper handling for mock server start

* Automatic style fix

---------

Co-authored-by: robot-clickhouse <robot-clickhouse@users.noreply.github.com>
2023-03-03 13:55:52 +01:00
kssenii
ad88251ee7 Fix tests 2023-02-27 17:42:04 +01:00
kssenii
68e06ecb99 Replace for table function remote, and external storage 2023-02-21 14:33:37 +01:00
kssenii
a54b011670 Finish for mysql 2023-02-20 21:37:38 +01:00
kssenii
ab0dedf0c8 Simplify code around storage s3 configuration 2023-02-06 16:23:17 +01:00
attack204
1f4139718a fix:style 2023-01-19 16:19:39 +08:00
attack204
f549380867 fix:style 2023-01-19 16:10:59 +08:00
attack204
e312cfa794 feature:urlCluster 2023-01-19 10:19:04 +08:00
kssenii
30547d2dcd Replace old named collections code for url 2022-12-17 00:24:05 +01:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Alexey Milovidov
6e564b18bf
Merge pull request #40600 from FrankChen021/check_url_arg
Validate the CompressionMethod parameter of URL table engine
2022-08-27 19:29:55 +03:00
Frank Chen
c9ea4f9f77 Change compression_method from String to CompressionMethod 2022-08-25 19:18:04 +08:00