ClickHouse® is a real-time analytics DBMS
Go to file
Azat Khuzhin 4a02de4674 Add ability to disable checksums for S3 to avoid excessive input file read
AWS S3 client can read file multiple times, this is required for:
- calculate checksums
- calculate signature (done only for HTTP, since ClickHouse uses
  PayloadSigningPolicy::Never)

So this means that for HTTP, to send file to S3 it will be read 3x
times, and for HTTPS 2x times.

By overriding GetChecksumAlgorithmName() to return empty string,
checksums can be disabled, and the input file will be read only once.

And even though additional https layer adds extra integrity layer,
someone still may find this too risky I guess, even though ClickHouse
internal format (for MergeTree) has checksums, and more.

Here is an example stacktrace of this excessive read:

<details>

<summary>stacktrace</summary>

    (lldb) bt
    * thread 383, name = 'BackupWorker', stop reason = breakpoint 1.1
      * frame 0: 0x00000000103c5fc0 clickhouse`DB::StdStreamBufFromReadBuffer::seekpos() + 32 at StdStreamBufFromReadBuffer.cpp:67
        frame 1: 0x000000001777f7f8 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() [inlined] std::__1::basic_streambuf<char, std::__1::char_traits<char>>::pubseekoff[abi:v15000](this=<unavailable>, __off=0, __way=cur, __which=8) + 120 at streambuf:162
        frame 2: 0x000000001777f7e3 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() + 99 at istream:1249
        frame 3: 0x00000000152e4979 clickhouse`Aws::Utils::Crypto::MD5OpenSSLImpl::Calculate() + 57 at CryptoImpl.cpp:223
        frame 4: 0x00000000152dedee clickhouse`Aws::Utils::Crypto::MD5::Calculate() + 14 at MD5.cpp:30
        frame 5: 0x00000000152db5ac clickhouse`Aws::Utils::HashingUtils::CalculateMD5() + 44 at HashingUtils.cpp:235
        frame 6: 0x000000001528b97b clickhouse`Aws::Client::AWSClient::AddChecksumToRequest() const + 507 at AWSClient.cpp:772
        frame 7: 0x000000001528ded2 clickhouse`Aws::Client::AWSClient::BuildHttpRequest() const + 1682 at AWSClient.cpp:930
        frame 8: 0x00000000100b864f clickhouse`DB::S3::Client::BuildHttpRequest() const + 15 at Client.cpp:622
        frame 9: 0x0000000015286a41 clickhouse`Aws::Client::AWSClient::AttemptOneRequest(this=0x00007ffde2f8f000, httpRequest=<unavailable>, request=<unavailable>, signerName=<unavailable>, signerRegionOverride=<unavailable>, signerServiceNameOverride="s3") const + 65 at AWSClient.cpp:491
        frame 10: 0x00000000152845b9 clickhouse`Aws::Client::AWSClient::AttemptExhaustively(this=0x00007ffde2f8f000, uri=0x00007ffdd4d44f38, request=0x00007ffdd4d45d10, method=HTTP_PUT, signerName="SignatureV4", signerRegionOverride="us-east-1", signerServiceNameOverride="s3") const + 1337 at AWSClient.cpp:272
        frame 11: 0x0000000015298d0d clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 45 at AWSXmlClient.cpp:99
        frame 12: 0x0000000015298cb5 clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 309 at AWSXmlClient.cpp:66
        frame 13: 0x0000000015354b23 clickhouse`Aws::S3::S3Client::PutObject(this=0x00007ffde2f8f000, request=0x00007ffdd4d45d10) const + 2659 at S3Client.cpp:1731
        frame 14: 0x00000000100b174f clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 15: 0x00000000100b173a clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 41 at Client.cpp:578
        frame 16: 0x00000000100b1711 clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 981 at Client.cpp:508
        frame 17: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
        frame 18: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject() const + 28 at Client.cpp:418
        frame 19: 0x00000000103b96d6 clickhouse`DB::copyDataToS3File()

</details>

This new behaviour could be enabled with `s3_disable_checksum=true`.

Note, that I've checked this implementation with GCS/R2/S3/MinIO and it
works everywhere.
2023-11-26 19:20:19 +01:00
.github Fix missing argument for style_check.py in master workflow 2023-11-13 18:43:42 +01:00
base Revert "Mark select() as harmful function" 2023-11-24 14:04:42 +01:00
benchmark Remove old file 2022-07-12 20:28:02 +02:00
cmake Disable checksums for builds with fuzzer 2023-11-22 17:19:59 +01:00
contrib Revert "Update Sentry" 2023-11-26 04:30:05 +03:00
docker Merge pull request #55836 from azat/dist/limit-by-fix 2023-11-26 04:03:41 +01:00
docs Merge pull request #54327 from den-crane/background_fetches_pool_size 2023-11-26 02:50:38 +01:00
packages Follow-up 2023-11-20 21:34:22 +01:00
programs Merge pull request #57133 from vitlibar/change-default-for-wait_dictionaries_load_at_startup 2023-11-24 17:09:05 +01:00
rust Update Cargo.toml 2023-11-24 22:59:29 +03:00
src Add ability to disable checksums for S3 to avoid excessive input file read 2023-11-26 19:20:19 +01:00
tests Merge pull request #57230 from ClickHouse/remove-bad-test 2023-11-26 04:38:16 +01:00
utils Update version_date.tsv and changelogs after v23.8.8.20-lts 2023-11-25 19:48:29 +00:00
.clang-format Fix: incorrect brace style 2023-10-30 16:37:12 +00:00
.clang-tidy Disable '-clang-analyzer-unix.Malloc' for now 2023-09-27 08:51:17 +00:00
.clangd Enable few slow clang-tidy checks for clangd 2023-05-13 14:08:25 +02:00
.editorconfig Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.exrc Fix vim settings (wrong group for autocmd) 2022-12-03 21:23:24 +01:00
.git-blame-ignore-revs Add files with revision to ignore for git blame 2022-09-13 23:05:56 +02:00
.gitattributes Ignore core.autocrlf for tests references 2022-10-05 09:13:27 +02:00
.gitignore Reproducible builds for Rust 2023-07-22 22:46:22 +02:00
.gitmodules changed method name, updated pocketfft repo reference 2023-11-21 06:52:47 -08:00
.pylintrc Cover deprecated bad-* pylint options with black 2022-06-08 14:18:28 +02:00
.snyk Add exclusions from the Snyk scan 2022-10-31 17:47:02 +01:00
.yamllint Increase line-length limit for yamlllint 2023-06-13 19:45:51 +02:00
AUTHORS Update AUTHORS 2021-09-22 11:38:03 +03:00
CHANGELOG.md Changelog for 23.10 2023-11-02 13:08:18 +01:00
CMakeLists.txt Fix build 2023-11-20 01:55:34 +01:00
CODE_OF_CONDUCT.md Add minimal code of conduct #9676 2020-03-16 12:44:28 +03:00
CONTRIBUTING.md Mention ClickHouse CLA in CONTRIBUTING.md (#32697) 2021-12-14 03:47:19 +03:00
format_sources allow several <graphite> targets (#603) 2017-03-21 23:08:09 +04:00
LICENSE Update LICENSE 2023-01-02 00:35:32 +01:00
PreLoad.cmake Revert "Check what will happen if we build ClickHouse with Musl" 2023-09-04 22:01:10 +02:00
README.md Add website link to README badge 2023-11-10 14:52:00 -08:00
SECURITY.md Update version_date.tsv and changelogs after v23.10.1.1976-stable 2023-11-02 19:34:42 +00:00

Website Apache 2.0 License

The ClickHouse company logo.

ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

How To Install (Linux, macOS, FreeBSD)

curl https://clickhouse.com/ | sh

Upcoming Events

Also, keep an eye out for upcoming meetups around the world. Somewhere else you want us to be? Please feel free to reach out to tyler clickhouse com.

Recent Recordings

  • Recent Meetup Videos: Meetup Playlist Whenever possible recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Current featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments"
  • Recording available: v23.10 Release Webinar All the features of 23.10, one convenient video! Watch it now!
  • All release webinar recordings: YouTube playlist

Interested in joining ClickHouse and making it your full-time job?

We are a globally diverse and distributed team, united behind a common goal of creating industry-leading, real-time analytics. Here, you will have an opportunity to solve some of the most cutting-edge technical challenges and have direct ownership of your work and vision. If you are a contributor by nature, a thinker and a doer - well definitely click!

Check out our current openings here: https://clickhouse.com/company/careers

Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!