Commit Graph

113 Commits

Author SHA1 Message Date
Kruglov Pavel
46a6b84a5a
Merge branch 'master' into auto-format-detection 2024-01-25 22:11:07 +01:00
Maksim Kita
2a327107b6 Updated implementation 2024-01-25 14:31:49 +03:00
avogar
617cc514b7 Try to detect file format automatically during schema inference if it's unknown 2024-01-23 18:59:39 +00:00
Nikolai Kochetov
8936c8376a Use predicate in getTaskIteratorExtension. 2024-01-02 17:14:16 +00:00
avogar
007353a2dd Add _size virtual column to s3/file/hdfs/url/azureBlobStorage engines 2023-11-22 18:12:36 +00:00
kssenii
d644992192 Fix 2023-09-28 16:25:04 +02:00
kssenii
1749874e7b Fix 2023-09-28 13:51:07 +02:00
kssenii
6b191a1afe Better 2023-09-27 14:54:31 +02:00
avogar
4c32097df3 Use filter by file/path before reading in url/file/hdfs table functions, reduce code duplication 2023-08-17 16:54:43 +00:00
Kruglov Pavel
fec5675cd4
Merge branch 'master' into better-progress-bar-2 2023-07-24 19:59:38 +02:00
avogar
cf082f2f9a Use read_bytes/total_bytes_to_read for progress bar in s3/file/url/... table functions 2023-06-22 17:24:43 +00:00
Nikolay Degterinsky
9a25958be8 Add HTTP header filtering 2023-06-15 13:49:49 +00:00
avogar
334f062fa0 fix style 2023-05-15 16:39:26 +00:00
avogar
70a8fd2c50 Fix schema inference with named collection, refactor Cluster table functions 2023-05-12 13:58:45 +00:00
avogar
2949ceced1 Fix adding structure to cluster table functions, make it better 2023-04-24 13:20:04 +00:00
avogar
447189a6ca Better 2023-04-21 17:54:09 +00:00
avogar
944f54aadf Finish urlCluster, refactor code, reduce code duplication 2023-04-21 17:24:37 +00:00
Kruglov Pavel
2ad161d2b7
Merge branch 'master' into non-blocking-connect 2023-04-19 13:39:40 +02:00
kssenii
13f29a7242 Better 2023-03-28 18:57:24 +02:00
kssenii
36cc6fee51 Rewrite data lakes (part 1) 2023-03-24 22:35:12 +01:00
Kruglov Pavel
f3f93dd06c
Merge branch 'master' into non-blocking-connect 2023-03-24 15:59:40 +01:00
Amos Bird
02c5d1f364
Correct exact_rows_before_limit in all scenarios 2023-03-22 23:26:31 +08:00
avogar
38e44861ae Fix possible race conditions 2023-03-21 16:01:54 +00:00
Alexander Tokmakov
ed08f8f5c5
Merge branch 'master' into revert_25674 2023-03-12 02:33:25 +03:00
Alexander Tokmakov
7b1b238d0b Revert "Merge pull request #25674 from amosbird/distributedreturnconnection"
This reverts commit 5ffd99dfd4, reversing
changes made to 2796aa333f.
2023-03-11 19:09:47 +01:00
Maksim Kita
c835fa3958 Fixed tests 2023-03-11 11:51:54 +01:00
Maksim Kita
0358cb36d8 Fixed tests 2023-03-11 11:51:54 +01:00
flynn
b3a9468661 fix 2023-02-17 12:42:24 +00:00
Kruglov Pavel
4f380370a9
Fix s3Cluster schema inference in parallel distributed insert select (#46381)
* Fix s3Cluster schema inference in parallel distributed insert select
* Try fix flaky test
* Try SYSTEM SYNC REPLICA to avoid test flakiness
2023-02-15 15:30:43 +01:00
Robert Schulze
6ff232d782
Merge branch 'master' into rs/fix-fragile-linking 2023-02-08 12:51:12 +01:00
kssenii
ab0dedf0c8 Simplify code around storage s3 configuration 2023-02-06 16:23:17 +01:00
Robert Schulze
84b9ff450f
Fix terribly broken, fragile and potentially cyclic linking
Sorry for the clickbaity title. This is about static method
ConnectionTimeouts::getHTTPTimeouts(). It was declared in header
IO/ConnectionTimeouts.h, and defined in header
IO/ConnectionTimeoutsContext.h (!). This is weird and caused issues with
linking on s390x (#45520). There was an attempt to fix some
inconsistencies (#45848) but neither @Algunenano nor I at first
really understood why the definition is in the header.

Turns out that ConnectionTimeoutsContext.h is only #include'd from
source files which are part of the normal server build BUT NOT part of
the keeper standalone build (which must be enabled via CMake
-DBUILD_STANDALONE_KEEPER=1). This dependency was not documented and as
a result, some misguided workarounds were introduced earlier, e.g.
0341c6c54b

The deeper cause was that getHTTPTimeouts() is passed a "Context". This
class is part of the "dbms" library which is deliberately not linked by
the standalone build of clickhouse-keeper. The context is only used to
read the settings and the "Settings" class is part of the
clickhouse_common library which is linked by clickhouse-keeper already.

To resolve this mess, this PR

- creates source file IO/ConnectionTimeouts.cpp and moves all
  ConnectionTimeouts definitions into it, including getHTTPTimeouts().

- breaks the wrong dependency by passing "Settings" instead of "Context"
  into getHTTPTimeouts().

- resolves the previous hacks
2023-02-05 20:49:34 +00:00
Antonio Andelic
d5117f2aa6
Define S3 client with bucket and endpoint resolution (#45783)
* Update aws

* Define S3 client with bucket and endpoint resolution

* Add defines for ErrorCodes

* Use S3Client everywhere

* Remove unused errorcode

* Add DROP S3 CLIENT CACHE query

* Add a comment

* Fix style

* Update aws

* Update reference files

* Add missing include

* Fix unit test

* Remove unneeded declarations

* Correctly use RetryStrategy

* Rename S3Client to Client

* Fix retry count

* fix clang-tidy warnings
2023-02-03 14:30:52 +01:00
Raúl Marín
7c31cb7adc Proper includes for ConnectionTimeoutsContext.h 2023-01-31 16:11:32 +01:00
avogar
117ec13c9e Fix s3Cluster schema inference when structure from insertion table is used 2023-01-18 20:33:50 +00:00
Nikita Mikhaylov
857799fbca
Parallel distributed insert select with s3Cluster [3] (#44955)
* Revert "Revert "Resurrect parallel distributed insert select with s3Cluster (#41535)""

This reverts commit b8d9066004.

* Fix build

* Better

* Fix test

* Automatic style fix

Co-authored-by: robot-clickhouse <robot-clickhouse@users.noreply.github.com>
2023-01-09 13:30:32 +01:00
Anton Popov
6cd606ffeb better saving of object info in iterator 2022-12-13 17:18:17 +00:00
Anton Popov
0c87031e80 Merge remote-tracking branch 'upstream/master' into HEAD 2022-12-13 16:33:21 +00:00
kssenii
88523ef0b6 Fix 2022-12-07 11:22:48 +01:00
kssenii
c7429d19e7 Merge remote-tracking branch 'upstream/master' into fix-progress-from-s3 2022-12-05 18:32:47 +01:00
chen
b6eddbac0d
fix s3Cluster function returns NOT_FOUND_COLUMN_IN_BLOCK error (#43629)
* fix s3Cluster function returns NOT_FOUND_COLUMN_IN_BLOCK error

* Update StorageS3Cluster.cpp

* Update 01801_s3_cluster_count.sql

* fix
2022-12-02 15:43:29 +01:00
Anton Popov
65a78bcd91 improve performance of storage S3 2022-11-26 15:24:01 +00:00
kssenii
5e01441f61 Show progress bar while reading from s3 table function 2022-11-21 17:56:02 +01:00
Sergei Trifonov
f2f0676bcc
Revert "Revert "S3 request per second rate throttling"" 2022-11-17 17:35:04 +01:00
Alexander Tokmakov
9011a18234
Revert "S3 request per second rate throttling" 2022-11-16 22:33:48 +03:00
Kseniia Sumarokova
59cf5def67
Merge branch 'master' into disk-s3-throttler 2022-11-15 12:13:37 +01:00
xiedeyantu
5504f3af9b fix skip_unavailable_shards does not work using s3Cluster table function 2022-11-12 00:03:36 +08:00
serxa
6d5d9ff421 rename ReadWriteSettings -> RequestSettings 2022-11-08 13:48:23 +00:00
Kruglov Pavel
21d50f76ea
Merge pull request #41979 from Avogar/s3-cluster-schema-inference
Fix schema inference in s3Cluster and improve in hdfsCluster
2022-11-01 14:00:21 +01:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00