Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-09-20 08:40:50 +00:00)
ClickHouse® is a real-time analytics DBMS
a3f189e191
In case of a skewed key distribution, a simple modulo division will not spread keys evenly between shards, and eventually this can lead to the same performance as a non-sharded dictionary (except that it will occupy one extra thread for Block::scatter). But if HashedDictionary::blockToAttributes() does not call HashedDictionary::getShard(), this can be fixed by using a more complex key-to-shard mapping in getShard(). And actually you do not need to call getShard() in blockToAttributes(): you can simply use the shard that was passed in. With that change, wrapping the key with intHash64() in getShard() fixes the skewed distribution. Note that previously I tried a similar approach but did not remove getShard() from blockToAttributes(), which is why it failed. Now it works almost as fast as with a simple createBlockSelector(), just 13.6% slower (18.75min vs 16.5min, with 16 threads). Note that I've also tried to add libdivide for this, but it does not improve the performance. I've also tried the approach without scatter, and it works 20% slower than this one (22.5min vs 18.75min, with 16 threads). v2: Use intHashCRC32() over intHash64() for HashedDictionary::getShard() (with intHash64() it is much slower, almost 2x: it took 18min with 32 threads). Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> |
||
---|---|---|
.github | ||
base | ||
benchmark | ||
cmake | ||
contrib | ||
docker | ||
docs | ||
packages | ||
programs | ||
rust | ||
src | ||
tests | ||
utils | ||
.clang-format | ||
.clang-tidy | ||
.editorconfig | ||
.exrc | ||
.git-blame-ignore-revs | ||
.gitattributes | ||
.gitignore | ||
.gitmodules | ||
.pylintrc | ||
.snyk | ||
.yamllint | ||
AUTHORS | ||
CHANGELOG.md | ||
CMakeLists.txt | ||
CODE_OF_CONDUCT.md | ||
CONTRIBUTING.md | ||
format_sources | ||
LICENSE | ||
PreLoad.cmake | ||
README.md | ||
SECURITY.md |
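The commit message above argues that a plain `key % shards` mapping degenerates when keys are skewed, and that hashing the key first restores an even spread. The following is a minimal sketch of that effect, not the code from the commit: `mix64`, `shardByModulo`, `shardByHash`, `makeStridedKeys`, and `countPerShard` are hypothetical names, and the mixer is the well-known MurmurHash3 64-bit finalizer used as a portable stand-in for `intHashCRC32()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a key hash: the MurmurHash3 64-bit finalizer.
// This is an assumption for illustration, not ClickHouse's intHashCRC32().
inline std::uint64_t mix64(std::uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

// Naive mapping: the shard is the key modulo the shard count.
inline std::size_t shardByModulo(std::uint64_t key, std::size_t shards)
{
    return static_cast<std::size_t>(key % shards);
}

// Hashed mapping: mix the key bits first, then take the modulo.
inline std::size_t shardByHash(std::uint64_t key, std::size_t shards)
{
    return static_cast<std::size_t>(mix64(key) % shards);
}

// Builds `count` keys that are all multiples of `stride` (a skewed key set).
inline std::vector<std::uint64_t> makeStridedKeys(std::size_t count, std::uint64_t stride)
{
    std::vector<std::uint64_t> keys;
    keys.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        keys.push_back(static_cast<std::uint64_t>(i) * stride);
    return keys;
}

// Counts how many keys land on each shard for a given mapping function.
template <typename ShardFn>
std::vector<std::size_t> countPerShard(
    const std::vector<std::uint64_t> & keys, std::size_t shards, ShardFn fn)
{
    std::vector<std::size_t> counts(shards, 0);
    for (std::uint64_t key : keys)
        ++counts[fn(key, shards)];
    return counts;
}
```

With 16 shards and keys that are all multiples of 16, `countPerShard(keys, 16, shardByModulo)` puts every key on shard 0, while `shardByHash` spreads them across the shards. The trade-off the commit measures is the extra cost of hashing each key, which is why a cheaper hash was preferred in v2.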
ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.
Useful Links
- Official website has a quick high-level overview of ClickHouse on the main page.
- ClickHouse Cloud: ClickHouse as a service, built by the creators and maintainers.
- Tutorial shows how to set up and query a small ClickHouse cluster.
- Documentation provides more in-depth information.
- YouTube channel has a lot of content about ClickHouse in video format.
- Slack and Telegram allow chatting with ClickHouse users in real-time.
- Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
- Code Browser (Woboq) with syntax highlighting and navigation.
- Code Browser (github.dev) with syntax highlighting, powered by github.dev.
- Contacts can help to get your questions answered if there are any.
Upcoming events
- Recording available: v22.12 Release Webinar - 22.12 is the ClickHouse Christmas release. There are plenty of gifts (a new JOIN algorithm among them) and we adopted something from MongoDB. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
- ClickHouse Meetup at the CHEQ office in Tel Aviv - Jan 16 - We are very excited to be holding our next in-person ClickHouse meetup at the CHEQ office in Tel Aviv! Hear from CHEQ, ServiceNow and Contentsquare, as well as a deep dive presentation from ClickHouse CTO Alexey Milovidov. Join us for a fun evening of talks, food and discussion!
- ClickHouse Meetup at Microsoft Office in Seattle - Jan 18 - Keep an eye on this space as we will be announcing speakers soon!