3189 Commits

Author SHA1 Message Date
Alexey Milovidov
10d65a1ade
Merge pull request #55559 from azat/s3-fix-excessive-reads
Add ability to disable checksums for S3 to avoid excessive input file read
2023-12-05 06:34:21 +01:00
kssenii
4a28f10c3d Minor cache changes 2023-12-04 19:02:37 +01:00
vdimir
a4ae90de0d
Merge pull request #57275 from ClickHouse/vdimir/merge_task_tmp_data
Background merges correctly use temporary data storage in the cache
2023-12-04 14:52:20 +01:00
robot-ch-test-poll
1b49463bd2
Merge pull request #55841 from nickitat/optimize_reading3
Optimize reading from cache
2023-12-01 17:36:57 +01:00
Nikolai Kochetov
823ba2db46
Merge pull request #57075 from yariks5s/s3_links_fix
S3-style links bug fix
2023-11-29 17:41:08 +01:00
vdimir
b5babe1692
MergeTask uses temporary data storage 2023-11-29 16:18:32 +00:00
Nikita Taranov
52f644c0df Merge branch 'master' into optimize_reading3 2023-11-28 16:36:38 +01:00
Antonio Andelic
6e8e4a6ca5 Lower level for annoying log 2023-11-28 12:41:35 +00:00
Azat Khuzhin
4a02de4674 Add ability to disable checksums for S3 to avoid excessive input file read
The AWS S3 client can read the input file multiple times. The extra
reads are needed to:
- calculate checksums
- calculate the signature (done only for HTTP, since ClickHouse uses
  PayloadSigningPolicy::Never)

This means that sending a file to S3 reads it 3 times over HTTP and
2 times over HTTPS.

By overriding GetChecksumAlgorithmName() to return empty string,
checksums can be disabled, and the input file will be read only once.
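
A minimal sketch of the override idea, not the actual ClickHouse or AWS SDK code. `BaseRequest`, `RequestWithOptionalChecksum`, `disable_checksum` and `countBodyReads` are hypothetical stand-ins invented for illustration; only the method name `GetChecksumAlgorithmName()` and the read counts come from the description above. The "md5" default is an assumption based on the MD5 frames in the stacktrace.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the SDK request base class.
// Assumption: by default it asks for an MD5 checksum of the body.
struct BaseRequest
{
    virtual ~BaseRequest() = default;
    virtual std::string GetChecksumAlgorithmName() const { return "md5"; }
};

// Hypothetical wrapper: returning an empty algorithm name switches
// the checksum pass off, otherwise the base behaviour is kept.
struct RequestWithOptionalChecksum : BaseRequest
{
    bool disable_checksum = false;

    std::string GetChecksumAlgorithmName() const override
    {
        if (disable_checksum)
            return "";
        return BaseRequest::GetChecksumAlgorithmName();
    }
};

// Simplified model of how many times the input body is read:
// one pass to send it, one for the checksum (if any), and one for
// the payload signature (per the description, only over plain HTTP).
int countBodyReads(const BaseRequest & request, bool payload_is_signed)
{
    int reads = 1; // the upload itself
    if (!request.GetChecksumAlgorithmName().empty())
        ++reads; // checksum pass
    if (payload_is_signed)
        ++reads; // signature pass
    assert(reads >= 1 && reads <= 3);
    return reads;
}
```

Under this model a default request gives 3 reads over HTTP and 2 over HTTPS, and a request with `disable_checksum = true` gives a single read over HTTPS.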

Some may still find this too risky, I guess, even though HTTPS adds
an extra integrity layer and the ClickHouse internal format (for
MergeTree) has its own checksums, and more.

Here is an example stacktrace of this excessive read:

<details>

<summary>stacktrace</summary>

    (lldb) bt
    * thread 383, name = 'BackupWorker', stop reason = breakpoint 1.1
      * frame 0: 0x00000000103c5fc0 clickhouse`DB::StdStreamBufFromReadBuffer::seekpos() + 32 at StdStreamBufFromReadBuffer.cpp:67
        frame 1: 0x000000001777f7f8 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() [inlined] std::__1::basic_streambuf<char, std::__1::char_traits<char>>::pubseekoff[abi:v15000](this=<unavailable>, __off=0, __way=cur, __which=8) + 120 at streambuf:162
        frame 2: 0x000000001777f7e3 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() + 99 at istream:1249
        frame 3: 0x00000000152e4979 clickhouse`Aws::Utils::Crypto::MD5OpenSSLImpl::Calculate() + 57 at CryptoImpl.cpp:223
        frame 4: 0x00000000152dedee clickhouse`Aws::Utils::Crypto::MD5::Calculate() + 14 at MD5.cpp:30
        frame 5: 0x00000000152db5ac clickhouse`Aws::Utils::HashingUtils::CalculateMD5() + 44 at HashingUtils.cpp:235
        frame 6: 0x000000001528b97b clickhouse`Aws::Client::AWSClient::AddChecksumToRequest() const + 507 at AWSClient.cpp:772
        frame 7: 0x000000001528ded2 clickhouse`Aws::Client::AWSClient::BuildHttpRequest() const + 1682 at AWSClient.cpp:930
        frame 8: 0x00000000100b864f clickhouse`DB::S3::Client::BuildHttpRequest() const + 15 at Client.cpp:622
        frame 9: 0x0000000015286a41 clickhouse`Aws::Client::AWSClient::AttemptOneRequest(this=0x00007ffde2f8f000, httpRequest=<unavailable>, request=<unavailable>, signerName=<unavailable>, signerRegionOverride=<unavailable>, signerServiceNameOverride="s3") const + 65 at AWSClient.cpp:491
        frame 10: 0x00000000152845b9 clickhouse`Aws::Client::AWSClient::AttemptExhaustively(this=0x00007ffde2f8f000, uri=0x00007ffdd4d44f38, request=0x00007ffdd4d45d10, method=HTTP_PUT, signerName="SignatureV4", signerRegionOverride="us-east-1", signerServiceNameOverride="s3") const + 1337 at AWSClient.cpp:272
        frame 11: 0x0000000015298d0d clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 45 at AWSXmlClient.cpp:99
        frame 12: 0x0000000015298cb5 clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 309 at AWSXmlClient.cpp:66
        frame 13: 0x0000000015354b23 clickhouse`Aws::S3::S3Client::PutObject(this=0x00007ffde2f8f000, request=0x00007ffdd4d45d10) const + 2659 at S3Client.cpp:1731
        frame 14: 0x00000000100b174f clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 15: 0x00000000100b173a clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 41 at Client.cpp:578
        frame 16: 0x00000000100b1711 clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 981 at Client.cpp:508
        frame 17: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 18: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject() const + 28 at Client.cpp:418
        frame 19: 0x00000000103b96d6 clickhouse`DB::copyDataToS3File()

</details>

This new behaviour can be enabled with `s3_disable_checksum=true`.

Note that I've checked this implementation with GCS/R2/S3/MinIO and it
works everywhere.
2023-11-26 19:20:19 +01:00
Alexey Milovidov
289df618f4
Merge pull request #57001 from arthurpassos/aws-s3-sign-any-x-amz-header-clean
Sign all aws headers
2023-11-24 21:12:10 +01:00
Yarik Briukhovetskyi
360d1b075a
Update URI.cpp 2023-11-22 13:43:33 +01:00
Yarik Briukhovetskyi
155cc84941
Update URI.cpp 2023-11-22 13:41:01 +01:00
Kseniia Sumarokova
e4f66b8469
Merge pull request #55158 from kssenii/fs-cache-improvement
fs cache improvement for big reads
2023-11-21 21:50:00 +01:00
yariks5s
be3b1f8188 initial fix 2023-11-21 18:49:16 +00:00
vdimir
a139ae97eb
Merge pull request #52918 from ClickHouse/vdimir/s3_blob_log
Add system table with blob storage operations log
2023-11-21 17:40:42 +01:00
Kseniia Sumarokova
d384762123
Merge branch 'master' into fs-cache-improvement 2023-11-21 11:24:52 +01:00
vdimir
15234474d7
Implement system table blob_storage_log 2023-11-21 09:18:25 +00:00
Sema Checherinda
485f1834d8
Merge pull request #56938 from CheSema/lz4-buffering
Lz4 compression: buffer block in a rare case
2023-11-20 20:33:30 +01:00
Arthur Passos
3544ee1e5f fix build by removing some const specifiers 2023-11-20 13:52:18 -03:00
Arthur Passos
e5129990ed sign all aws headers 2023-11-20 13:38:32 -03:00
Sema Checherinda
f999337dae
Revert "Revert "s3 adaptive timeouts"" 2023-11-20 14:53:22 +01:00
Alexander Tokmakov
5031f239c3
Revert "s3 adaptive timeouts" 2023-11-20 14:28:59 +01:00
Sema Checherinda
fafd169e7b
Update src/IO/Lz4DeflatingWriteBuffer.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2023-11-20 14:12:52 +01:00
Sema Checherinda
ebb66c1a9e add comments 2023-11-20 12:13:24 +01:00
Sema Checherinda
a950595c24
Merge pull request #56314 from CheSema/s3-aggressive-timeouts
s3 adaptive timeouts
2023-11-19 14:12:14 +01:00
Sema Checherinda
cacc23b8b7 safe SinkToOut d-tor 2023-11-19 12:25:42 +01:00
Sema Checherinda
24fbe620d3 fix build 2023-11-19 12:14:53 +01:00
Alexey Milovidov
edc3b2fe48
Merge pull request #56958 from ClickHouse/metric-queued-jobs
Add metrics for the number of queued jobs, which is useful for the IO thread pool
2023-11-19 10:37:18 +01:00
Sema Checherinda
053b20a255 fix in_data pointer 2023-11-19 00:44:39 +01:00
Alexey Milovidov
593f04a6b5 Fix style 2023-11-18 20:19:24 +01:00
Alexey Milovidov
d56cbda185 Add metrics for the number of queued jobs, which is useful for the IO thread pool 2023-11-18 19:07:59 +01:00
Sema Checherinda
773715a562 finalize tmp_out 2023-11-18 17:30:49 +01:00
Sema Checherinda
6d5a5f9fcd buffer result if out capacity is not enough 2023-11-17 17:31:00 +01:00
Jianfei Hu
ef79bf6467 Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into keeper-az-fix 2023-11-16 21:58:02 +00:00
Kseniia Sumarokova
a2ed756eec
Merge branch 'master' into fs-cache-improvement 2023-11-16 17:49:26 +01:00
kssenii
472cfdc86d Review fix 2023-11-16 17:47:51 +01:00
Sema Checherinda
4a1e207e7a review notes 2023-11-16 12:31:00 +01:00
Jianfei Hu
69f214cdbc fix comments.
Signed-off-by: Jianfei Hu <hujianfei258@gmail.com>
2023-11-16 08:04:57 +00:00
Jianfei Hu
ea92dbb1c7 fix build for non USE_S3 case
Signed-off-by: Jianfei Hu <hujianfei258@gmail.com>
2023-11-15 20:53:35 +00:00
Jianfei Hu
d0398e3c1d remove variant header
Signed-off-by: Jianfei Hu <hujianfei258@gmail.com>
2023-11-15 18:47:28 +00:00
Jianfei Hu
d862dfdf9c fix comments
Signed-off-by: Jianfei Hu <hujianfei258@gmail.com>
2023-11-15 18:38:23 +00:00
Sema Checherinda
6e3e6383ba perf check 2 2023-11-15 19:00:27 +01:00
avogar
38f200d969 Fix Date text parsing in optimistic path
1
2023-11-14 18:58:00 +00:00
Jianfei Hu
9df2775f08 reduce timeout and setTimeout earlier.
Signed-off-by: Jianfei Hu <hujianfei258@gmail.com>
2023-11-14 17:58:16 +00:00
Sema Checherinda
3075bd9745 track clickhouse high level retries 2023-11-14 11:34:12 +01:00
Sema Checherinda
8d36fd6e54 get rid of client_with_long_timeout_ptr 2023-11-14 11:34:12 +01:00
Sema Checherinda
27fb25d056 alter the naming, fix client_with_long_timeout in s3 storage 2023-11-14 11:34:12 +01:00
Sema Checherinda
be01a5cd3e turn off aggressive timeouts for heavy requests 2023-11-14 11:34:12 +01:00
Sema Checherinda
770a762317 aggressive timeout 2023-11-14 11:34:11 +01:00
Jianfei Hu
554d907189 Fix the keeper_server availability zone configuration.
Signed-off-by: Jianfei Hu <hujianfei258@gmail.com>
2023-11-13 23:42:51 +00:00