mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-02 20:42:04 +00:00
ClickHouse® is a real-time analytics DBMS
4a02de4674
The AWS S3 client can read the input file multiple times. The extra reads are needed to:
- calculate checksums
- calculate the signature (done only for HTTP, since ClickHouse uses PayloadSigningPolicy::Never)

This means that, to send a file to S3, it is read three times over HTTP and twice over HTTPS.

By overriding GetChecksumAlgorithmName() to return an empty string, checksums can be disabled, and the input file will be read only once.

Even though HTTPS adds an extra integrity layer, and the ClickHouse internal format (for MergeTree) has its own checksums and more, someone may still find this too risky, I guess.

Here is an example stacktrace of this excessive read:

<details>
<summary>stacktrace</summary>
(lldb) bt
* thread 383, name = 'BackupWorker', stop reason = breakpoint 1.1
  * frame 0: 0x00000000103c5fc0 clickhouse`DB::StdStreamBufFromReadBuffer::seekpos() + 32 at StdStreamBufFromReadBuffer.cpp:67
    frame 1: 0x000000001777f7f8 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() [inlined] std::__1::basic_streambuf<char, std::__1::char_traits<char>>::pubseekoff[abi:v15000](this=<unavailable>, __off=0, __way=cur, __which=8) + 120 at streambuf:162
    frame 2: 0x000000001777f7e3 clickhouse`std::__1::basic_istream<char, std::__1::char_traits<char>>::tellg() + 99 at istream:1249
    frame 3: 0x00000000152e4979 clickhouse`Aws::Utils::Crypto::MD5OpenSSLImpl::Calculate() + 57 at CryptoImpl.cpp:223
    frame 4: 0x00000000152dedee clickhouse`Aws::Utils::Crypto::MD5::Calculate() + 14 at MD5.cpp:30
    frame 5: 0x00000000152db5ac clickhouse`Aws::Utils::HashingUtils::CalculateMD5() + 44 at HashingUtils.cpp:235
    frame 6: 0x000000001528b97b clickhouse`Aws::Client::AWSClient::AddChecksumToRequest() const + 507 at AWSClient.cpp:772
    frame 7: 0x000000001528ded2 clickhouse`Aws::Client::AWSClient::BuildHttpRequest() const + 1682 at AWSClient.cpp:930
    frame 8: 0x00000000100b864f clickhouse`DB::S3::Client::BuildHttpRequest() const + 15 at Client.cpp:622
    frame 9: 0x0000000015286a41 clickhouse`Aws::Client::AWSClient::AttemptOneRequest(this=0x00007ffde2f8f000, httpRequest=<unavailable>, request=<unavailable>, signerName=<unavailable>, signerRegionOverride=<unavailable>, signerServiceNameOverride="s3") const + 65 at AWSClient.cpp:491
    frame 10: 0x00000000152845b9 clickhouse`Aws::Client::AWSClient::AttemptExhaustively(this=0x00007ffde2f8f000, uri=0x00007ffdd4d44f38, request=0x00007ffdd4d45d10, method=HTTP_PUT, signerName="SignatureV4", signerRegionOverride="us-east-1", signerServiceNameOverride="s3") const + 1337 at AWSClient.cpp:272
    frame 11: 0x0000000015298d0d clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 45 at AWSXmlClient.cpp:99
    frame 12: 0x0000000015298cb5 clickhouse`Aws::Client::AWSXMLClient::MakeRequest() const + 309 at AWSXmlClient.cpp:66
    frame 13: 0x0000000015354b23 clickhouse`Aws::S3::S3Client::PutObject(this=0x00007ffde2f8f000, request=0x00007ffdd4d45d10) const + 2659 at S3Client.cpp:1731
    frame 14: 0x00000000100b174f clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
    frame 15: 0x00000000100b173a clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 41 at Client.cpp:578
    frame 16: 0x00000000100b1711 clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const + 981 at Client.cpp:508
    frame 17: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject(DB::S3::ExtendedRequest<Aws::S3::Model::PutObjectRequest> const&) const [inlined]
    frame 18: 0x00000000100b133c clickhouse`DB::S3::Client::PutObject() const + 28 at Client.cpp:418
    frame 19: 0x00000000103b96d6 clickhouse`DB::copyDataToS3File()
</details>

This new behaviour can be enabled with `s3_disable_checksum=true`.

Note that I've checked this implementation with GCS/R2/S3/MinIO and it works everywhere.
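The read-count reasoning in the commit message can be sketched with a toy model. This is not the AWS SDK or ClickHouse code: `ToyPutRequest`, `ToyPutRequestNoChecksum`, `checksumAlgorithmName()`, `readFully()` and `countBodyReads()` are hypothetical names made up for illustration, and the model only assumes what the message states, namely one pass for the checksum (when an algorithm name is set), one pass for payload signing (HTTP only), and one pass for the upload itself.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an SDK request object (NOT the real AWS SDK class).
// A non-empty algorithm name means "compute a checksum over the body".
struct ToyPutRequest
{
    virtual ~ToyPutRequest() = default;
    virtual std::string checksumAlgorithmName() const { return "md5"; }
};

// Overriding the virtual to return an empty string switches the checksum pass off,
// which is the idea the commit message describes for GetChecksumAlgorithmName().
struct ToyPutRequestNoChecksum : ToyPutRequest
{
    std::string checksumAlgorithmName() const override { return {}; }
};

// Rewinds the stream and reads it to the end; returns the number of bytes read.
inline std::size_t readFully(std::istream & body)
{
    body.clear();   // drop EOF/fail bits left by a previous pass
    body.seekg(0);  // rewind to the start of the body
    assert(body.good());

    std::size_t total = 0;
    char buffer[4096];
    while (body.read(buffer, sizeof(buffer)) || body.gcount() > 0)
        total += static_cast<std::size_t>(body.gcount());
    return total;
}

// Counts how many full passes over the body this toy "send" makes.
inline int countBodyReads(const ToyPutRequest & request, std::istream & body, bool use_https)
{
    int passes = 0;

    if (!request.checksumAlgorithmName().empty())
    {
        readFully(body);  // pass for the checksum
        ++passes;
    }
    if (!use_https)
    {
        readFully(body);  // pass for payload signing (assumed HTTP only)
        ++passes;
    }

    readFully(body);      // pass for the actual upload
    ++passes;
    return passes;
}
```

Under these assumptions, the default toy request makes three passes over HTTP and two over HTTPS, matching the numbers in the commit message, while the no-checksum variant makes a single pass over HTTPS. In this model the HTTP case without checksums still makes two passes, because the signing pass remains.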
ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.
How To Install (Linux, macOS, FreeBSD)
curl https://clickhouse.com/ | sh
Useful Links
- Official website has a quick high-level overview of ClickHouse on the main page.
- ClickHouse Cloud: ClickHouse as a service, built by the creators and maintainers.
- Tutorial shows how to set up and query a small ClickHouse cluster.
- Documentation provides more in-depth information.
- YouTube channel has a lot of content about ClickHouse in video format.
- Slack and Telegram allow chatting with ClickHouse users in real-time.
- Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
- Code Browser (github.dev) offers code browsing with syntax highlighting.
- Static Analysis (SonarCloud) proposes C++ quality improvements.
- Contacts can help get your questions answered.
Upcoming Events
- ClickHouse Meetup in San Francisco - Nov 14
- ClickHouse Meetup in Singapore - Nov 15
- ClickHouse Meetup in Berlin - Nov 30
- ClickHouse Meetup in NYC - Dec 11
- ClickHouse Meetup in Boston - Dec 12
Also, keep an eye out for upcoming meetups around the world. Somewhere else you want us to be? Please feel free to reach out to tyler <at> clickhouse <dot> com.
Recent Recordings
- Recent Meetup Videos: Meetup Playlist. Whenever possible, recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Currently featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments".
- Recording available: v23.10 Release Webinar. All the features of 23.10, one convenient video! Watch it now!
- All release webinar recordings: YouTube playlist
Interested in joining ClickHouse and making it your full-time job?
We are a globally diverse and distributed team, united behind a common goal of creating industry-leading, real-time analytics. Here, you will have an opportunity to solve some of the most cutting-edge technical challenges and have direct ownership of your work and vision. If you are a contributor by nature, a thinker and a doer - we’ll definitely click!
Check out our current openings here: https://clickhouse.com/company/careers
Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!