Commit Graph

157 Commits

Author SHA1 Message Date
Azat Khuzhin
8c54380d80 Avoid sending ComposeObject requests after upload to GCS
This should no longer be required, but it is left as an option, since
it is likely still needed for old files.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-12-29 11:53:49 +01:00
Azat Khuzhin
f4a7789cd4 Convert various S3::Client settings into separate ClientSettings struct
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-12-29 11:53:49 +01:00
Vitaly Baranov
e1a136b791 Explicit finalize() function in ZipArchiveWriter.
Simplify overly complicated code in ZipArchiveWriter.
2023-12-24 00:33:59 +01:00
Azat Khuzhin
4a02de4674 Add ability to disable checksums for S3 to avoid excessive input file read
The AWS S3 client can read the input file multiple times. The extra
reads are needed to:
- calculate checksums
- calculate the signature (done only for HTTP, since ClickHouse uses
  PayloadSigningPolicy::Never)

So to send a file to S3, it is read three times over HTTP and twice
over HTTPS.

By overriding GetChecksumAlgorithmName() to return an empty string,
checksums can be disabled, and the input file will be read only once.
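
A minimal sketch of the idea, not the actual ClickHouse or AWS SDK code.
`UploadRequest`, `UploadRequestNoChecksum`, `consume()` and
`sendAndCountPasses()` are hypothetical names made up for illustration;
only the `GetChecksumAlgorithmName()` override mirrors the change itself.
The toy client makes one pass over the body per step it needs, so the
number of reads can be counted:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for an SDK request type; not the real AWS class.
struct UploadRequest
{
    virtual ~UploadRequest() = default;
    // A non-empty name asks the client to checksum the body before sending.
    virtual std::string GetChecksumAlgorithmName() const { return "md5"; }
};

// Same request, with the checksum pass switched off via the override.
struct UploadRequestNoChecksum : UploadRequest
{
    std::string GetChecksumAlgorithmName() const override { return {}; }
};

// Rewinds the stream, reads it to the end, returns the number of bytes seen.
inline std::size_t consume(std::istream & body)
{
    body.clear();   // drop eofbit left by a previous pass
    body.seekg(0);
    return static_cast<std::size_t>(std::distance(
        std::istreambuf_iterator<char>(body),
        std::istreambuf_iterator<char>()));
}

// Hypothetical client: returns how many full passes over the body were made.
// sign_payload models the plain HTTP case described above (assumption).
inline int sendAndCountPasses(
    const UploadRequest & request, std::istream & body, bool sign_payload)
{
    int passes = 0;
    if (!request.GetChecksumAlgorithmName().empty())
    {
        consume(body);  // checksum pass
        ++passes;
    }
    if (sign_payload)
    {
        consume(body);  // payload signature pass
        ++passes;
    }
    consume(body);      // the upload itself
    ++passes;
    return passes;
}
```

In this model a default request with payload signing makes three passes,
without payload signing two, and the no-checksum request without payload
signing makes exactly one.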

Although HTTPS adds an extra integrity layer, and the ClickHouse
internal format (for MergeTree) has its own checksums and more, some
may still find this too risky.

Here is an example stacktrace of this excessive read:

<details>

<summary>stacktrace</summary>

    (lldb) bt
    * thread 383, name = 'BackupWorker', stop reason = breakpoint 1.1
      * frame 0: 0x00000000103c5fc0 clickhouse`DB::StdStreamBufFromReadBuffer::seekpos() + 32 at StdStreamBufFromReadBuffer.cpp:67
        frame 1: 0x000000001777f7f8 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() [inlined] std::__1::basic_streambuf<char, std::__1::char_traits<char>>::pubseekoff[abi:v15000](this=<unavailable>, __off=0, __way=cur, __which=8) + 120 at streambuf:162
        frame 2: 0x000000001777f7e3 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() + 99 at istream:1249
        frame 3: 0x00000000152e4979 clickhouse`Aws::Utils::Crypto::MD5OpenSSLImpl::Calculate() + 57 at CryptoImpl.cpp:223
        frame 4: 0x00000000152dedee clickhouse`Aws::Utils::Crypto::MD5::Calculate() + 14 at MD5.cpp:30
        frame 5: 0x00000000152db5ac clickhouse`Aws::Utils::HashingUtils::CalculateMD5() + 44 at HashingUtils.cpp:235
        frame 6: 0x000000001528b97b clickhouse`Aws::Client::AWSClient::AddChecksumToRequest() const + 507 at AWSClient.cpp:772
        frame 7: 0x000000001528ded2 clickhouse`Aws::Client::AWSClient::BuildHttpRequest() const + 1682 at AWSClient.cpp:930
        frame 8: 0x00000000100b864f clickhouse`DB::S3::Client::BuildHttpRequest() const + 15 at Client.cpp:622
        frame 9: 0x0000000015286a41 clickhouse`Aws::Client::AWSClient::AttemptOneRequest(this=0x00007ffde2f8f000, httpRequest=<unavailable>, request=<unavailable>, signerName=<unavailable>, signerRegionOverride=<unavailable>, signerServiceNameOverride="s3") const + 65 at AWSClient.cpp:491
        frame 10: 0x00000000152845b9 clickhouse`Aws::Client::AWSClient::AttemptExhaustively(this=0x00007ffde2f8f000, uri=0x00007ffdd4d44f38, request=0x00007ffdd4d45d10, method=HTTP_PUT, signerName="SignatureV4", signerRegionOverride="us-east-1", signerServiceNameOverride="s3") const + 1337 at AWSClient.cpp:272
        frame 11: 0x0000000015298d0d clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 45 at AWSXmlClient.cpp:99
        frame 12: 0x0000000015298cb5 clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 309 at AWSXmlClient.cpp:66
        frame 13: 0x0000000015354b23 clickhouse`Aws::S3::S3Client::PutObject(this=0x00007ffde2f8f000, request=0x00007ffdd4d45d10) const + 2659 at S3Client.cpp:1731
        frame 14: 0x00000000100b174f clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 15: 0x00000000100b173a clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 41 at Client.cpp:578
        frame 16: 0x00000000100b1711 clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 981 at Client.cpp:508
        frame 17: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 18: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject() const + 28 at Client.cpp:418
        frame 19: 0x00000000103b96d6 clickhouse`DB::copyDataToS3File()

</details>

This new behaviour can be enabled with `s3_disable_checksum=true`.

Note that I've checked this implementation with GCS/R2/S3/MinIO and it
works with all of them.
2023-11-26 19:20:19 +01:00
vdimir
15234474d7
Implement system table blob_storage_log 2023-11-21 09:18:25 +00:00
Sema Checherinda
8d36fd6e54 get rid of client_with_long_timeout_ptr 2023-11-14 11:34:12 +01:00
alesapin
3b02748cb6 Fix some typos 2023-10-15 15:43:02 +02:00
Sema Checherinda
d9e15c00c9 limit the delay before next try in S3 2023-09-14 19:45:07 +02:00
kssenii
9eb1dfcd12 Refactor buffers reading from object storage 2023-09-01 14:03:07 +02:00
Sema Checherinda
60103c577d
Merge pull request #53651 from CheSema/read_beyon_last_offset
fix Logical Error in AsynchronousBoundedReadBuffer
2023-08-23 11:21:47 +02:00
Sema Checherinda
7e4e6e31dc
NOLINT on sscanf in gtest 2023-08-22 13:32:21 +02:00
Sema Checherinda
ae5f66da1e fix special build 2023-08-22 11:34:08 +02:00
Sema Checherinda
81577e041b fix Logical Error in AsynchronousBoundedReadBuffer 2023-08-21 19:30:04 +02:00
Antonio Andelic
0e17d26b88 More formats supported, read single archive from 1 thread 2023-08-09 11:58:37 +00:00
Antonio Andelic
1fc1b6aae4 More fixes 2023-07-28 13:00:35 +00:00
Antonio Andelic
e83e0ec2cd Fix build 2023-07-28 12:26:56 +00:00
Antonio Andelic
720d587e85 Merge branch 'master' into add-reading-from-archives 2023-07-28 08:49:00 +00:00
Alexey Milovidov
59eadca95c
Merge branch 'master' into less-logs-2 2023-07-09 08:49:44 +03:00
Sema Checherinda
79a03432bf add test, add comment 2023-06-25 13:27:07 +02:00
Alexey Milovidov
7407330130
Merge branch 'master' into retry 2023-06-23 08:18:18 +03:00
Michael Kolupaev
4a570a05c9 Decrease default timeouts for S3 and HTTP requests 2023-06-21 18:08:50 +00:00
Sema Checherinda
1cb02e2710 do call finalize for all buffers 2023-06-16 16:38:18 +02:00
Sema Checherinda
aedd3afb8a
fix hang in unit tests (#50391)
* fix hang in unit tests

* Update gtest_writebuffer_s3.cpp

* Update gtest_writebuffer_s3.cpp

---------

Co-authored-by: Alexander Tokmakov <tavplubix@clickhouse.com>
2023-05-31 19:20:58 +03:00
Vitaly Baranov
6d45d0c374
Use fingerprints instead of key IDs in encrypted disks (#49882)
* Use fingerprints instead of key IDs to find keys in encrypted disks.
Always use little endian in the headers of encryption files.

* Add tests.

* Fix copying binary files to test containers.

* Fix ownership for copied files in test containers.

* Add comments after review.

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-05-31 13:11:10 +02:00
nikitakeba
f604fb82b2
Merge branch 'master' into add-reading-from-archives-support 2023-05-29 23:34:19 +03:00
Nikita Keba
04450a2042 add CheckFileInfo Unit Tests 2023-05-29 20:28:15 +00:00
Nikita Keba
8cf79cdb6c add SevenZipArchiveReader unit tests 2023-05-29 19:55:46 +00:00
Nikita Keba
636d50caa0 fix cmake + add unit tests for TarArchiveReader 2023-05-29 19:35:24 +00:00
kssenii
8924c17575 Fix build 2023-05-20 13:31:27 +02:00
Sema Checherinda
22f7aa8d89 make special build pass 2023-05-12 12:00:15 +02:00
Sema Checherinda
7fbf87be17 rework WriteBufferFromS3, squashed 2023-05-10 18:31:47 +00:00
Alexey Milovidov
9d6c3d7a4c
Merge pull request #48920 from awfeequdng/bugfix/maskLowBits
A non significant change (does not affect anything): add support for signed integers in the maskBits function
2023-04-21 13:22:44 +03:00
pengxiangcai
c7d6b8b643 add maskLowBits test 2023-04-20 21:33:41 +08:00
vdimir
92d0d9d4ff Http temporary buffer integration with fs cache 2023-04-19 16:44:21 +02:00
Azat Khuzhin
9ea4a55ddf Add a test for stringstream with INT_MAX
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-17 10:46:39 +02:00
Robert Schulze
05606a8835
Clean up GCC warning pragmas 2023-04-11 18:21:08 +00:00
Raúl Marín
b4ea2268ca Adapt unit tests to the new exception 2023-04-03 10:54:47 +02:00
Robert Schulze
f72a337074
Remove cruft from build
No need to check compiler flags, clang >= 15 supports all of them.
2023-03-17 13:44:04 +00:00
Mike Kot
9920a52c51 use std::lerp, constexpr hex.h 2023-03-07 22:50:17 +00:00
avogar
3c5a81f9f9 Add unit test for recursive checkpoints 2023-01-24 01:20:13 +00:00
Joanna Hulboj
501cc390f6 Prevent duplicates in column name hints. Improve formatting. 2022-12-22 16:58:30 +00:00
xiedeyantu
c258d3ac8b fix s3 support question mark wildcard 2022-11-18 12:11:22 +08:00
Raúl Marín
d4a0b76abf Fix compilation 2022-11-11 12:08:43 +01:00
Nikita Taranov
49f6692a2e
Adapt internal data structures to 512-bit era (#42564)
* impl

* update tests

* fix tests
2022-10-25 13:56:28 +02:00
Robert Schulze
78fc36ca49
Generate config.h into ${CONFIG_INCLUDE_PATH}
This makes the target location consistent with other auto-generated
files like config_formats.h, config_core.h, and config_functions.h and
simplifies the build of clickhouse_common.
2022-09-28 12:48:26 +00:00
Alexey Milovidov
730655d4fd Fix 8/9 of trash 2022-09-19 08:53:20 +02:00
Kruglov Pavel
ae81341ab5
Fix hadoop unit test 2022-08-22 12:42:18 +02:00
Sema Checherinda
f8480c26e7 do not trigger allocator to return null under sanitizers 2022-08-19 11:22:34 +02:00
Sema Checherinda
128e1fec3d Memory doesn't do alignment by itself, Allocator does 2022-08-18 16:01:30 +02:00
Sema Checherinda
b101ebdf32 check integer overflow at Memory class 2022-08-18 14:30:52 +02:00