Commit Graph

570 Commits

Author SHA1 Message Date
Shani Elharrar
679a0e1300 StorageS3 / TableFunctionS3: Allow passing session_token to AuthSettings
This can help users who want to pass temporary credentials issued by AWS in
order to load data from S3 without changing the configuration or creating an
IAM user.

Fixes #57848
2023-12-19 08:06:36 +02:00
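
A minimal SQL sketch of the feature above, assuming the s3 table function accepts
the optional session_token right after the secret access key; the bucket and all
credential values are placeholders:

    SELECT count()
    FROM s3(
        'https://my-bucket.s3.amazonaws.com/data/*.parquet',
        'ASIAEXAMPLEACCESSKEY',     -- temporary access key id issued by AWS STS
        'exampleSecretAccessKey',   -- temporary secret access key
        'exampleSessionToken',      -- session token tied to the temporary credentials
        'Parquet');
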
Shani Elharrar
c696c0bfe7 S3Common.AuthSettings: Allow passing SESSION_TOKEN to AWSCredentials
This sets up the infrastructure for loading session_token and passing it directly
to all AWSCredentials instances that are created from the AuthSettings.

The default SESSION_TOKEN is set to an empty string, as documented in the AWS SDK
reference: https://sdk.amazonaws.com/cpp/api/0.12.9/d4/d27/class_aws_1_1_auth_1_1_a_w_s_credentials.html
2023-12-17 10:29:15 +02:00
avogar
ee7af95bc0 Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union 2023-12-08 20:29:28 +00:00
Alexey Milovidov
10d65a1ade
Merge pull request #55559 from azat/s3-fix-excessive-reads
Add ability to disable checksums for S3 to avoid excessive input file read
2023-12-05 06:34:21 +01:00
Nikolai Kochetov
0b4131546a
Merge pull request #56813 from jsc0218/SystemTablesFilterEngine
Able to Filter Engine When Scanning System Tables
2023-12-01 16:02:27 +01:00
avogar
4d9a1b50f9 Add information about new _size virtual column in file/s3/url/hdfs/azure table functions 2023-11-28 18:15:07 +00:00
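
A hedged sketch of querying the new virtual column, assuming _size reports the
object size in bytes next to the existing _path and _file columns; the bucket is
a placeholder and the example uses the credential-less form of the table function:

    SELECT _file, _size
    FROM s3('https://my-bucket.s3.amazonaws.com/data/*.csv', 'CSV')
    ORDER BY _size DESC
    LIMIT 5;
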
Nikolai Kochetov
08a7575984 Re-implement filtering a bit. 2023-11-28 16:17:35 +00:00
Azat Khuzhin
4a02de4674 Add ability to disable checksums for S3 to avoid excessive input file read
The AWS S3 client can read the input file multiple times; this is required to:
- calculate checksums
- calculate the signature (done only for HTTP, since ClickHouse uses
  PayloadSigningPolicy::Never)

This means that to send a file to S3 it is read three times over HTTP and
twice over HTTPS.

By overriding GetChecksumAlgorithmName() to return an empty string, the
checksums can be disabled, so the input file is read only once.

Even though HTTPS already adds an integrity layer, and the ClickHouse internal
format (for MergeTree) has its own checksums and more, some may still find
skipping the S3 checksums too risky, so the behaviour stays opt-in.

Here is an example stacktrace of this excessive read:

<details>

<summary>stacktrace</summary>

    (lldb) bt
    * thread 383, name = 'BackupWorker', stop reason = breakpoint 1.1
      * frame 0: 0x00000000103c5fc0 clickhouse`DB::StdStreamBufFromReadBuffer::seekpos() + 32 at StdStreamBufFromReadBuffer.cpp:67
        frame 1: 0x000000001777f7f8 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() [inlined] std::__1::basic_streambuf<char, std::__1::char_traits<char>>::pubseekoff[abi:v15000](this=<unavailable>, __off=0, __way=cur, __which=8) + 120 at streambuf:162
        frame 2: 0x000000001777f7e3 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() + 99 at istream:1249
        frame 3: 0x00000000152e4979 clickhouse`Aws::Utils::Crypto::MD5OpenSSLImpl::Calculate() + 57 at CryptoImpl.cpp:223
        frame 4: 0x00000000152dedee clickhouse`Aws::Utils::Crypto::MD5::Calculate() + 14 at MD5.cpp:30
        frame 5: 0x00000000152db5ac clickhouse`Aws::Utils::HashingUtils::CalculateMD5() + 44 at HashingUtils.cpp:235
        frame 6: 0x000000001528b97b clickhouse`Aws::Client::AWSClient::AddChecksumToRequest() const + 507 at AWSClient.cpp:772
        frame 7: 0x000000001528ded2 clickhouse`Aws::Client::AWSClient::BuildHttpRequest() const + 1682 at AWSClient.cpp:930
        frame 8: 0x00000000100b864f clickhouse`DB::S3::Client::BuildHttpRequest() const + 15 at Client.cpp:622
        frame 9: 0x0000000015286a41 clickhouse`Aws::Client::AWSClient::AttemptOneRequest(this=0x00007ffde2f8f000, httpRequest=<unavailable>, request=<unavailable>, signerName=<unavailable>, signerRegionOverride=<unavailable>, signerServiceNameOverride="s3") const + 65 at AWSClient.cpp:491
        frame 10: 0x00000000152845b9 clickhouse`Aws::Client::AWSClient::AttemptExhaustively(this=0x00007ffde2f8f000, uri=0x00007ffdd4d44f38, request=0x00007ffdd4d45d10, method=HTTP_PUT, signerName="SignatureV4", signerRegionOverride="us-east-1", signerServiceNameOverride="s3") const + 1337 at AWSClient.cpp:272
        frame 11: 0x0000000015298d0d clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 45 at AWSXmlClient.cpp:99
        frame 12: 0x0000000015298cb5 clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 309 at AWSXmlClient.cpp:66
        frame 13: 0x0000000015354b23 clickhouse`Aws::S3::S3Client::PutObject(this=0x00007ffde2f8f000, request=0x00007ffdd4d45d10) const + 2659 at S3Client.cpp:1731
        frame 14: 0x00000000100b174f clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 15: 0x00000000100b173a clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 41 at Client.cpp:578
        frame 16: 0x00000000100b1711 clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 981 at Client.cpp:508
        frame 17: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 18: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject() const + 28 at Client.cpp:418
        frame 19: 0x00000000103b96d6 clickhouse`DB::copyDataToS3File()

</details>

This new behaviour can be enabled with `s3_disable_checksum=true`.

Note that I have checked this implementation with GCS, R2, S3, and MinIO, and it
works with all of them.
2023-11-26 19:20:19 +01:00
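
A minimal usage sketch for the setting mentioned above; the bucket and credentials
are placeholders, and it assumes `s3_disable_checksum` can be set at the session
level before the upload:

    SET s3_disable_checksum = 1;  -- skip checksum calculation so the file is read once

    INSERT INTO FUNCTION s3('https://my-bucket.s3.amazonaws.com/out.parquet',
                            'key', 'secret', 'Parquet')
    SELECT number FROM system.numbers LIMIT 1000000;
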
Kruglov Pavel
b84e3cf683
Merge branch 'master' into size-virtual-column 2023-11-22 19:25:00 +01:00
avogar
4a86f4a7b9 Fix style changes 2023-11-22 18:24:34 +00:00
avogar
007353a2dd Add _size virtual column to s3/file/hdfs/url/azureBlobStorage engines 2023-11-22 18:12:36 +00:00
vdimir
ffbe85d3a0
Merge pull request #56668 from ClickHouse/vdimir/analyzer_s3_partition_pruning
Analyzer: filtering by virtual columns for StorageS3
2023-11-22 16:44:44 +01:00
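
A hedged sketch of the kind of query this enables, assuming a hypothetical
date-partitioned key layout; the filter on the _path virtual column is what
StorageS3 can use to prune keys before reading them:

    SELECT count()
    FROM s3('https://my-bucket.s3.amazonaws.com/events/date=*/part-*.parquet',
            'Parquet')
    WHERE _path LIKE '%date=2023-11-01%';
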
vdimir
15234474d7
Implement system table blob_storage_log 2023-11-21 09:18:25 +00:00
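
A hedged sketch of inspecting the new table, assuming blob_storage_log has been
enabled in the server configuration; the selected column names are assumptions
based on the commit, not confirmed here:

    SELECT event_time, event_type, bucket, remote_path, error
    FROM system.blob_storage_log
    ORDER BY event_time DESC
    LIMIT 10;
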
vdimir
31a6c7c1c4
Style changes around filterKeysForPartitionPruning 2023-11-20 18:08:45 +00:00
vdimir
95e9a27417
Remove ast based code from filterKeysForPartitionPruning 2023-11-20 17:59:58 +00:00
vdimir
a915eeded8
StorageS3 use filters from SourceStepWithFilter 2023-11-20 17:59:58 +00:00
vdimir
1cddfb1e6b
rewrite getBlockWithVirtuals for S3Storage 2023-11-20 17:59:57 +00:00
vdimir
cbb2e02c03
Analyzer: partition pruning for S3 2023-11-20 17:59:53 +00:00
avogar
f537bad469 Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union 2023-11-20 14:32:50 +00:00
avogar
872556a5d4 Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union 2023-11-20 14:03:36 +00:00
Sema Checherinda
f999337dae
Revert "Revert "s3 adaptive timeouts"" 2023-11-20 14:53:22 +01:00
Alexander Tokmakov
5031f239c3
Revert "s3 adaptive timeouts" 2023-11-20 14:28:59 +01:00
Sema Checherinda
a950595c24
Merge pull request #56314 from CheSema/s3-aggressive-timeouts
s3 adaptive timeouts
2023-11-19 14:12:14 +01:00
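
A hedged sketch for the PR above, assuming the behaviour is controlled by a
setting named `s3_use_adaptive_timeouts` that can be turned off to return to
fixed request timeouts:

    -- Fall back to fixed (non-adaptive) S3 request timeouts for this session.
    SET s3_use_adaptive_timeouts = 0;
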
Alexey Milovidov
d56cbda185 Add metrics for the number of queued jobs, which is useful for the IO thread pool 2023-11-18 19:07:59 +01:00
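
A hedged sketch of how such metrics could be inspected; system.metrics and its
metric/value columns are standard, while the exact names of the new queued-job
metrics are not spelled out here, so the filter below is only an assumption:

    SELECT metric, value
    FROM system.metrics
    WHERE metric ILIKE '%thread%'   -- assumed: queued-job metrics follow the thread pool naming
    ORDER BY metric;
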
Sema Checherinda
8d36fd6e54 get rid of client_with_long_timeout_ptr 2023-11-14 11:34:12 +01:00
Sema Checherinda
27fb25d056 alter the naming, fix client_with_long_timeout in s3 storage 2023-11-14 11:34:12 +01:00
Alexey Milovidov
8c253b9e3e Remove C++ templates 2023-11-10 05:25:02 +01:00
Kruglov Pavel
570b66f027
Merge branch 'master' into schema-inference-union 2023-10-26 19:26:00 +02:00
李扬
465962df7f
Support orc filter push down (file + stripe + rowgroup level) (#55330)
* support orc filter push down

* update orc lib version

* replace setqueryinfo with setkeycondition

* fix issue https://github.com/ClickHouse/ClickHouse/issues/53536

* refactor source with key condition

* fix building error

* remove std::cout

* update orc

* update orc version

* fix bugs

* improve code

* upgrade orc lib

* fix code style

* change as requested

* add performance tests for orc filter push down

* add performance tests for orc filter push down

* fix all bugs

* fix default as null issue

* add uts for null as default issues

* upgrade orc lib

* fix failed orc lib uts and fix typo

* fix failed uts

* fix failed uts

* fix ast fuzzer tests

* fix bug of uint64 overflow in https://s3.amazonaws.com/clickhouse-test-reports/55330/de22fdcaea2e12c96f300e95f59beba84401712d/fuzzer_astfuzzerubsan/report.html

* fix asan fatal caused by reused column vector batch in native orc input format. refer to https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__asan__[4_4].htm

* fix wrong performance tests

* disable 02892_orc_filter_pushdown on aarch64. https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__aarch64_.html

* add some comments

* add some comments

* inline range::equals and range::less

* fix data race of key condition

* trigger ci
2023-10-24 12:08:17 -07:00
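
A hedged sketch of exercising the push-down described in the PR above, assuming
it is gated by an `input_format_orc_filter_push_down` setting; the file name and
column are placeholders:

    SELECT count()
    FROM file('events.orc', 'ORC')
    WHERE user_id = 42   -- predicate that can be pushed down to ORC stripes/row groups
    SETTINGS input_format_orc_filter_push_down = 1;
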
Kruglov Pavel
844c1fb688
Fix 2023-10-24 19:35:03 +02:00
avogar
cfa510ea0a Add more documentation, fix build 2023-10-23 14:38:34 +00:00
Kruglov Pavel
6f61ccfe28
Merge branch 'master' into schema-inference-union 2023-10-20 22:54:11 +02:00
avogar
6934e27e8b Add union mode for schema inference to infer union schema of files with different schemas 2023-10-20 20:46:41 +00:00
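
A hedged sketch of the new mode, assuming it is selected with
`schema_inference_mode = 'union'`; the glob path is a placeholder, and files with
different schemas would be described by the union of their columns:

    DESCRIBE TABLE file('data/part-*.jsonl', 'JSONEachRow')
    SETTINGS schema_inference_mode = 'union';
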
kssenii
42ed249954 Fix build 2023-10-17 12:03:49 +02:00
kssenii
4464c86895 Merge remote-tracking branch 'origin/master' into s3-queue-fixes 2023-10-17 11:16:52 +02:00
Michael Kolupaev
ce7eca0615
DWARF input format (#55450)
* Add ReadBufferFromFileBase::isRegularLocalFile()

* DWARF input format

* Review comments

* Changed things around ENABLE_EMBEDDED_COMPILER build setting

* Added 'ranges' column

* no-msan no-ubsan
2023-10-16 17:00:07 -07:00
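
A hedged sketch of reading debug info with the new format; the binary path is a
placeholder and the selected columns (including the 'ranges' column mentioned
above) are assumptions about the format's schema:

    SELECT tag, name, decl_file, decl_line, ranges
    FROM file('/usr/bin/clickhouse', 'DWARF')
    LIMIT 10;
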
Kseniia Sumarokova
96c518be5b
Merge branch 'master' into s3-queue-fixes 2023-10-16 22:19:13 +02:00
Kruglov Pavel
836e35b6c4
Fix progress bar for s3 and azure Cluster functions with url without globs 2023-10-16 12:38:10 +02:00
kssenii
d64b990712 Merge remote-tracking branch 'origin/master' into s3-queue-fixes 2023-10-13 12:13:56 +02:00
kssenii
d644992192 Fix 2023-09-28 16:25:04 +02:00
kssenii
f753b91a3b Better maintenance of processing node 2023-09-27 17:17:52 +02:00
kssenii
6b191a1afe Better 2023-09-27 14:54:31 +02:00
Robert Schulze
9fff447716
Re-enable clang-tidy checks 2023-09-26 09:34:12 +00:00
robot-ch-test-poll2
3c93a939a2
Merge pull request #54936 from ClickHouse/pufit/s3-yet-another-num-streams-adjustment
Set a minimum limit of `num_streams` in StorageS3
2023-09-23 02:59:50 +02:00
robot-clickhouse-ci-2
d98234dc9d
Merge pull request #54803 from Avogar/ephemeral-columns-from-files
Forbid special columns for file/s3/url/... storages, fix insert into ephemeral columns from files
2023-09-22 23:24:42 +02:00
pufit
13eee0e950 Set a minimum limit of num_streams in StorageS3 2023-09-22 14:13:20 -04:00
pufit
99c1d76604 Fix division by zero in StorageS3 2023-09-21 15:55:42 -04:00
Michael Kolupaev
9af9b4a085
Enable connection pooling for s3 table function (#54812)
Enable connection pooling for s3 table function
2023-09-21 09:27:20 -07:00
pufit
bd387f6d2c
Merge pull request #54815 from ClickHouse/pufit/optimal-num-streams-for-s3-storage
Adjusting `num_streams` by expected work in StorageS3
2023-09-21 08:41:39 -04:00
pufit
4a2f7976f0 Resolve PR issues 2023-09-20 19:43:02 -04:00