Commit Graph

188 Commits

Author SHA1 Message Date
Kruglov Pavel
64e88cde21 Merge branch 'master' into better-progress-bar-2 2023-07-18 13:37:53 +02:00
avogar
c679dd400e Make better 2023-06-23 13:43:40 +00:00
avogar
cf082f2f9a Use read_bytes/total_bytes_to_read for progress bar in s3/file/url/... table functions 2023-06-22 17:24:43 +00:00
Michael Kolupaev
4a570a05c9 Decrease default timeouts for S3 and HTTP requests 2023-06-21 18:08:50 +00:00
avogar
3209ebe34b Improve progress bar for file/s3/hdfs/url table functions. Step 1 2023-06-16 15:51:18 +00:00
Antonio Andelic
b11f744252 Correctly disable async insert with deduplication when it's not needed (#50663)
* Correctly disable async insert when it's not used
* Better
* Add comment
* Better
* Fix tests

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-06-07 20:33:08 +02:00
Michael Kolupaev
b51064a508 Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead 2023-06-01 18:48:30 -07:00
Alexey Milovidov
52ffaa4630 Merge pull request #50203 from Avogar/head-requests-on-shcema-inference
Don't send head request for all keys in Iceberg schema inference
2023-06-02 01:28:00 +03:00
Alexey Milovidov
fb86fe8f9d Remove useless code 2023-06-01 03:08:05 +02:00
Alexey Milovidov
956c399b2a Remove useless code 2023-06-01 03:04:29 +02:00
avogar
bc527c7588 Don't send head request for all keys in Iceberg schema inference 2023-05-24 17:07:31 +00:00
Azat Khuzhin
2c40dd6a4c Switch Block::NameMap to google::dense_hash_map over HashMap
Since HashMap creates 2^8 elements by default, while dense_hash_map
should be good here.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-12 05:52:57 +02:00
Michael Kolupaev
3bd1489f18 Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading() 2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0 Better control over Parquet row group size 2023-05-04 14:59:55 -07:00
Michael Kolupaev
87be78e6de Better 2023-04-17 04:58:32 +00:00
kssenii
6f53784f22 Merge remote-tracking branch 'upstream/master' into better-tests-for-data-lakes 2023-04-13 15:56:40 +02:00
Azat Khuzhin
79b83c4fd2 Remove superfluous includes of logger_userful.h from headers
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-10 17:59:30 +02:00
kssenii
9b3d0ec86d Adjustments after conflicts 2023-04-03 19:53:34 +02:00
kssenii
8915f49b7d Merge remote-tracking branch 'upstream/master' into better-tests-for-data-lakes 2023-04-03 17:43:42 +02:00
Anton Popov
ff1cc5598f fix clang-tidy 2023-04-02 22:21:10 +00:00
Anton Popov
38389d878c fix one more race in StorageS3 2023-03-30 21:06:53 +00:00
Anton Popov
ed29c141fb fix race in StorageS3 2023-03-29 22:13:45 +00:00
kssenii
13f29a7242 Better 2023-03-28 18:57:24 +02:00
kssenii
36cc6fee51 Rewrite data lakes (part 1) 2023-03-24 22:35:12 +01:00
kssenii
e48d8d12e7 Fixes for hudi 2023-03-17 19:44:30 +01:00
flynn
b3a9468661 fix 2023-02-17 12:42:24 +00:00
flynn
4b1d997b82 fix 2023-02-17 12:27:53 +00:00
flynn
1d5b7ebc73 fix 2023-02-17 09:01:13 +00:00
flynn
a39f6f419b refactor 2023-02-17 08:27:52 +00:00
flynn
7f4c23ec8a fix 2023-02-16 12:48:22 +00:00
flynn
2968cdc8f6 fix 2023-02-16 10:18:22 +00:00
flynn
ecc39978d7 fix conflict 2023-02-16 02:23:55 +00:00
Kruglov Pavel
4f380370a9 Fix s3Cluster schema inference in parallel distributed insert select (#46381)
* Fix s3Cluster schema inference in parallel distributed insert select
* Try fix flaky test
* Try SYSTEM SYNC REPLICA to avoid test flakiness
2023-02-15 15:30:43 +01:00
flynn
289c5c60d3 fix 2023-02-15 11:24:06 +00:00
flynn
18cf72147e remove more redundant header files 2023-02-15 07:02:44 +00:00
flynn
f31451822e fix 2023-02-15 03:56:01 +00:00
flynn
c49a293a9f refactor and get rid of s3 2023-02-13 12:39:54 +00:00
flynn
d3dd9421da refactor and get rid of s3 2023-02-13 08:29:22 +00:00
flynn
db15634a01 fix conflict 2023-02-10 08:41:04 +00:00
kssenii
ab0dedf0c8 Simplify code around storage s3 configuration 2023-02-06 16:23:17 +01:00
flynn
0dd8a61a8e fix conflict 2023-02-06 03:25:12 +00:00
Antonio Andelic
d5117f2aa6 Define S3 client with bucket and endpoint resolution (#45783)
* Update aws
* Define S3 client with bucket and endpoint resolution
* Add defines for ErrorCodes
* Use S3Client everywhere
* Remove unused errorcode
* Add DROP S3 CLIENT CACHE query
* Add a comment
* Fix style
* Update aws
* Update reference files
* Add missing include
* Fix unit test
* Remove unneeded declarations
* Correctly use RetryStrategy
* Rename S3Client to Client
* Fix retry count
* fix clang-tidy warnings
2023-02-03 14:30:52 +01:00
flynn
fd1ee98183 fix style 2023-01-31 14:40:48 +00:00
flynn
ffddc0dcce fix conflict 2023-01-31 10:28:58 +00:00
flynn
fc2ce9e8e2 refactor and unify storage data lake 2023-01-29 14:53:56 +00:00
flynn
9b517cdc76 fix conflict 2023-01-29 08:34:56 +00:00
Vitaly Baranov
aea9ccdb60 Pass request settings to S3::getObjectInfo(). 2023-01-27 15:10:09 +01:00
Vitaly Baranov
a8304525ed Move getObjectInfo() to a separate header. 2023-01-27 15:09:38 +01:00
Anton Popov
8a1ea62aec fix race on cancelation of query in StorageS3 2023-01-24 01:12:01 +00:00
flynn
2fb32dc56c fix and add test 2023-01-18 08:33:55 +00:00