Commit Graph

355 Commits

Author SHA1 Message Date
Alexey Milovidov
d112492c56 Remove some code 2024-01-13 03:48:04 +01:00
Robert Schulze
499227b9cf
Merge remote-tracking branch 'rschu1ze/master' into replace-std_regexp-by-re2 2024-01-10 10:02:53 +00:00
pufit
6cf55b82f4
Merge pull request #58539 from canhld94/file_custom_compress_level
Allow explicitly set compression level in output format
2024-01-09 13:43:38 -05:00
Robert Schulze
f553b55e3a
Merge remote-tracking branch 'rschu1ze/master' into regex-std-re2 2024-01-07 22:31:35 +00:00
Robert Schulze
8e804487f3
Some fixups 2024-01-07 22:28:08 +00:00
Duc Canh Le
2e14cfb526 add settings for output compression level and window size
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
2024-01-04 08:16:00 +00:00
Nikolai Kochetov
d06de83ac1 Fix KeyCondition for file/url/s3 2024-01-03 17:44:28 +00:00
Robert Schulze
8d4b519bb1
Replace std::regex by re2 2024-01-03 15:06:20 +00:00
Nikolai Kochetov
eeed23b1bc Fix sanitizer assert. 2024-01-03 09:45:25 +00:00
Nikolai Kochetov
9c25cb6692 Cleanup 2024-01-02 18:08:04 +00:00
Nikolai Kochetov
1b20ce5162 Cleanup 2024-01-02 17:50:06 +00:00
Nikolai Kochetov
c808b03e55 Remove unneeded code 2024-01-02 17:27:33 +00:00
Nikolai Kochetov
8936c8376a Use predicate in getTaskIteratorExtension. 2024-01-02 17:14:16 +00:00
Nikolai Kochetov
3e3fed1cbe Add reading step to URL 2024-01-02 15:18:13 +00:00
Azat Khuzhin
b9233f6d4f Move Allocator code into module part
This should reduce amount of code that should be recompiled on
Exception.h changes (and everything else that had been included there).

This will actually not help a lot, because it is also included into
PODArray.h and ThreadPool.h at least... Sigh.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-12-27 15:42:08 +01:00
Nikita Mikhaylov
8372c70958 Merge branch 'master' of github.com:ClickHouse/ClickHouse into remove-the-limit-for-connections-per-endpoint 2023-12-13 18:29:56 +00:00
avogar
ee7af95bc0 Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union 2023-12-08 20:29:28 +00:00
Nikita Mikhaylov
04d167c6d9 Better 2023-12-05 13:34:37 +01:00
avogar
4d9a1b50f9 Add information about new _size virtual column in file/s3/url/hdfs/azure table functions 2023-11-28 18:15:07 +00:00
avogar
007353a2dd Add _size virtual column to s3/file/hdfs/url/azureBlobStorage engines 2023-11-22 18:12:36 +00:00
avogar
872556a5d4 Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union 2023-11-20 14:03:36 +00:00
Arthur Passos
b6e205dcdf
Add ClickHouse setting to disable tunneling for HTTPS requests over HTTP proxy (#55033)
* initial commit. integ tests passing, need to re-run unit & my own personal tests

* partial refactoring to remove Protocol::ANY

* improve naming

* remove all usages of ProxyConfiguration::Protocol::ANY

* fix ut

* blabla

* support url functions as well

* support for HTTPS requests over HTTP proxy with tunneling off

* remove gtestabc

* fix silly mistake

* ...

* remove usages of httpclientsession::proxyconfig in src/

* got you

* remove stale comment

* it seems like I need reasonable defaults

* fix ut

* add some comments

* remove no longer needed header

* matrix out

* add https over http proxy with no tunneling

* soem docs

* partial refactoring

* rename to use_tunneling_for_https_requests_over_http_proxy

* improve docs

* use shorter version

* remove useless test

* rename the setting

* update

* fix typo

* fix setting docs typo

* move ); up

* move ) up
2023-11-04 13:47:52 -04:00
Kruglov Pavel
570b66f027
Merge branch 'master' into schema-inference-union 2023-10-26 19:26:00 +02:00
李扬
465962df7f
Support orc filter push down (file + stripe + rowgroup level) (#55330)
* support orc filter push down

* update orc lib version

* replace setqueryinfo with setkeycondition

* fix issue https://github.com/ClickHouse/ClickHouse/issues/53536

* refactor source with key condition

* fix building error

* remove std::cout

* update orc

* update orc version

* fix bugs

* improve code

* upgrade orc lib

* fix code style

* change as requested

* add performance tests for orc filter push down

* add performance tests for orc filter push down

* fix all bugs

* fix default as null issue

* add uts for null as default issues

* upgrade orc lib

* fix failed orc lib uts and fix typo

* fix failed uts

* fix failed uts

* fix ast fuzzer tests

* fix bug of uint64 overflow in https://s3.amazonaws.com/clickhouse-test-reports/55330/de22fdcaea2e12c96f300e95f59beba84401712d/fuzzer_astfuzzerubsan/report.html

* fix asan fatal caused by reused column vector batch in native orc input format. refer to https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__asan__[4_4].htm

* fix wrong performance tests

* disable 02892_orc_filter_pushdown on aarch64. https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__aarch64_.html

* add some comments

* add some comments

* inline range::equals and range::less

* fix data race of key condition

* trigger ci
2023-10-24 12:08:17 -07:00
avogar
6934e27e8b Add union mode for schema inference to infer union schema of files with different schemas 2023-10-20 20:46:41 +00:00
Michael Kolupaev
ce7eca0615
DWARF input format (#55450)
* Add ReadBufferFromFileBase::isRegularLocalFile()

* DWARF input format

* Review comments

* Changed things around ENABLE_EMBEDDED_COMPILER build setting

* Added 'ranges' column

* no-msan no-ubsan
2023-10-16 17:00:07 -07:00
avogar
3e08800cb5 Forbid special columns for file/s3/url/... storages, fix insert into ephemeral columns from files 2023-09-20 16:25:55 +00:00
avogar
2d8f33bfa2 Fix parsing error in WithNames formats while reading subset of columns with disabled input_format_with_names_use_header 2023-09-11 14:55:37 +00:00
Kseniia Sumarokova
b3319f7908
Minor changes (#54171) 2023-09-03 15:47:52 +02:00
Kruglov Pavel
d80be4673b
Merge pull request #53692 from Avogar/cache-count
Cache number of rows in files for count in file/s3/url/hdfs/azure functions
2023-08-25 09:08:52 +02:00
Antonio Andelic
f0e800092f
Merge pull request #53796 from ClickHouse/fix-filtered-url-paths
Fix reading from `url` with all filtered paths
2023-08-25 08:49:33 +02:00
Kruglov Pavel
44db5fa992
Merge branch 'master' into cache-count 2023-08-24 17:21:18 +02:00
Antonio Andelic
6eb400c6ae Fix reading from url with all filtered paths 2023-08-24 13:37:11 +00:00
Arthur Passos
2bade7db08
Add global proxy setting (#51749)
* initial impl

* fix env ut

* move ut directory

* make sure no null proxy resolver is returned by ProxyConfigurationResolverProvider

* minor adjustment

* add a few tests, still incomplete

* add proxy support for url table function

* use proxy for select from url as well

* remove optional from return type, just returns empty config

* fix style

* style

* black

* ohg boy

* rm in progress file

* god pls don't let me kill anyone

* ...

* add use_aws guards

* remove hard coded s3 proxy resolver

* add concurrency-mt-unsafe

* aa

* black

* add logging back

* revert change

* imrpove code a bit

* helper functions and separate tests

* for some reason, this env test is not working..

* formatting

* :)

* clangtidy

* lint

* revert some stupid things

* small test adjusmtments

* simplify tests

* rename test

* remove extra line

* freaking style change

* simplify a bit

* fix segfault & remove an extra call

* tightly couple proxy provider with context..

* remove useless include

* rename config prefix parameter

* simplify provider a bit

* organize provider a bit

* add a few comments

* comment out proxy env tests

* fix nullptr in unit tests

* make sure old storage proxy config is properly covered without global context instance

* move a few functions from class to anonymous namespace

* fix no fallback for specific storage conf

* change API to accept http method instead of bool

* implement http/https distinction in listresolver, any still not implemented

* implement http/https distinction in remote resolver

* progress on code, improve tests and add url function working test

* use protcol instead of method for http and https

* small fix

* few more adjustments

* fix style

* black

* move enum to proxyconfiguration

* wip

* fix build

* fix ut

* delete atomicroundrobin class

* remove stale include

* add some tests.. need to spend some more time on the design..

* change design a bit

* progress

* use existing context for tests

* rename aux function and fix ut

* ..

* rename test

* try to simplify tests a bit

* simplify tests a bit more

* attempt to fix tests, accept more than one remote resolver

* use proper log id

* try waiting for resolver

* proper wait logic

* black

* empty

* address a few comments

* refactor tests

* remove old tests

* baclk

* use RAII to set/unset env

* black

* clang tidy

* fix env proxy not respecting any

* use log trace

* fix wrong logic in getRemoteREsolver

* fix wrong logic in getRemoteREsolver

* fix test

* remove unwanted code

* remove ClientConfigurationperRequest and auxilary classes

* remove unwanted code

* remove adapter test

* few adjustments and add test for s3 storage conf  with new proxy settings

* black

* use chassert for context

* Add getenv comment
2023-08-24 16:07:26 +03:00
Kruglov Pavel
f7e1abd774
Merge branch 'master' into cache-count 2023-08-23 22:31:49 +02:00
avogar
894513f6cd Fix tests 2023-08-23 18:43:08 +00:00
Kruglov Pavel
592fa77987
Merge branch 'master' into cache-count 2023-08-23 15:18:02 +02:00
Kruglov Pavel
7e362a2110
Merge branch 'master' into fast-count-from-files 2023-08-23 15:13:20 +02:00
robot-ch-test-poll1
c22ffa6195
Merge pull request #53529 from Avogar/filter-files-all-table-functions
Use filter by file/path before reading in url/file/hdfs table functins
2023-08-23 14:21:23 +02:00
Kruglov Pavel
e193aec583
Merge branch 'master' into fast-count-from-files 2023-08-23 12:15:34 +02:00
pufit
e42da9411b Fix variables names 2023-08-22 11:23:10 -04:00
Kruglov Pavel
a34c9c3ebc
Fix build 2023-08-22 15:21:54 +02:00
Kruglov Pavel
67c5c0203b
Merge branch 'master' into fast-count-from-files 2023-08-22 15:03:48 +02:00
Kruglov Pavel
c0bdd0e00b
Merge branch 'master' into cache-count 2023-08-22 14:42:22 +02:00
avogar
b4145aeddc Cache number of rows in files for count in file/s3/url/hdfs/azure functions 2023-08-22 11:59:59 +00:00
pufit
9d454d9afc Merge branch 'master' into pufit/fix_s3_threads
# Conflicts:
#	src/Storages/StorageS3.cpp
#	src/Storages/StorageS3.h
#	src/Storages/StorageURL.cpp
#	src/Storages/StorageURL.h
2023-08-21 21:32:15 -04:00
pufit
98a701e2c1 Limiting number of parsing threads for S3 source 2023-08-21 21:21:03 -04:00
Michael Kolupaev
2f4d433e69 Parquet filter pushdown 2023-08-21 14:15:52 -07:00
avogar
47304bf7aa Optimize count from files in most input formats 2023-08-21 12:30:52 +00:00
avogar
8e445b5461 Fixes 2023-08-18 17:49:40 +00:00