mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-09-20 08:40:50 +00:00
ClickHouse® is a real-time analytics DBMS
345c422e28
Right now dictionaries (here I will talk only about HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED) can load data in only one thread, since each uses a single hash table that cannot be filled from multiple threads. If you have a very big dictionary (e.g. 10e9 elements), loading it can take a while, especially for the SPARSE_HASHED variants (and with that many elements you are likely using SPARSE_HASHED, since it requires less memory); in my environment it takes ~4 hours, which is an enormous amount of time. So this patch adds support for shards in dictionaries: the number of shards determines how many hash tables the dictionary uses and, more importantly, how many threads it can use to load the data. With 16 threads this works 2x faster; not perfect though, see the follow-up patches in this series. v0: PARTITION BY; v1: SHARDS 1; v2: SHARDS(1); v3: tried an optimized mod (logical and), but it does not gain even 10%; v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either; v5: moved SHARDS into the layout parameters (unknown ones are simply ignored); v6: tuned params for perf tests (to avoid overly long queries). Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> |
||
---|---|---|
.github | ||
base | ||
benchmark | ||
cmake | ||
contrib | ||
docker | ||
docs | ||
packages | ||
programs | ||
rust | ||
src | ||
tests | ||
utils | ||
.clang-format | ||
.clang-tidy | ||
.editorconfig | ||
.exrc | ||
.git-blame-ignore-revs | ||
.gitattributes | ||
.gitignore | ||
.gitmodules | ||
.pylintrc | ||
.snyk | ||
.yamllint | ||
AUTHORS | ||
CHANGELOG.md | ||
CMakeLists.txt | ||
CODE_OF_CONDUCT.md | ||
CONTRIBUTING.md | ||
format_sources | ||
LICENSE | ||
PreLoad.cmake | ||
README.md | ||
SECURITY.md |
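The sharding idea in the commit message above can be illustrated with a minimal sketch. This is not the ClickHouse implementation (which is C++); the names `ShardedDict` and `shard_of` are hypothetical, and plain Python dicts stand in for the hash tables. Each key is routed to a shard by `hash(key) % num_shards`, so every table has exactly one writer thread and needs no locking during the load.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_of(key, num_shards):
    """Route a key to a shard index (hypothetical helper: hash modulo shard count)."""
    return hash(key) % num_shards


class ShardedDict:
    """Hypothetical sketch: one hash table per shard, one loader thread per shard."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.shards = [{} for _ in range(num_shards)]

    def load(self, rows):
        # Split the incoming (key, value) rows into per-shard batches first,
        # so that each table is later written by a single thread only.
        batches = [[] for _ in range(self.num_shards)]
        for key, value in rows:
            batches[shard_of(key, self.num_shards)].append((key, value))

        def fill(index):
            table = self.shards[index]
            for key, value in batches[index]:
                table[key] = value

        # One worker per shard; list() forces completion and re-raises errors.
        with ThreadPoolExecutor(max_workers=self.num_shards) as pool:
            list(pool.map(fill, range(self.num_shards)))

    def get(self, key, default=None):
        # A lookup touches only the shard that owns the key.
        return self.shards[shard_of(key, self.num_shards)].get(key, default)

    def __len__(self):
        return sum(len(table) for table in self.shards)
```

Because of CPython's global interpreter lock, this sketch shows the data layout and the single-writer-per-table property rather than the speedup reported in the commit message.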
ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.
Useful Links
- Official website has a quick high-level overview of ClickHouse on the main page.
- ClickHouse Cloud: ClickHouse as a service, built by the creators and maintainers.
- Tutorial shows how to set up and query a small ClickHouse cluster.
- Documentation provides more in-depth information.
- YouTube channel has a lot of content about ClickHouse in video format.
- Slack and Telegram allow chatting with ClickHouse users in real-time.
- Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
- Code Browser (Woboq) with syntax highlighting and navigation.
- Code Browser (github.dev) with syntax highlighting, powered by github.dev.
- Contacts can help to get your questions answered if there are any.
Upcoming events
- Recording available: v22.12 Release Webinar. 22.12 is the ClickHouse Christmas release. There are plenty of gifts (a new JOIN algorithm among them) and we adopted something from MongoDB. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
- ClickHouse Meetup at the CHEQ office in Tel Aviv - Jan 16 - We are very excited to be holding our next in-person ClickHouse meetup at the CHEQ office in Tel Aviv! Hear from CHEQ, ServiceNow and Contentsquare, as well as a deep dive presentation from ClickHouse CTO Alexey Milovidov. Join us for a fun evening of talks, food and discussion!
- ClickHouse Meetup at Microsoft Office in Seattle - Jan 18 - Keep an eye on this space as we will be announcing speakers soon!