ClickHouse® is a real-time analytics DBMS
Go to file
Azat Khuzhin 345c422e28 Add ability to load hashed dictionaries using multiple threads
Right now dictionaries (here I will talk about only
HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED)
can load data only in one thread, since it uses one hash table that
cannot be filled from multiple threads.

And in case you have very big dictionary (i.e. 10e9 elements), it can
take a awhile to load them, especially for SPARSE_HASHED variants (and
if you have such amount of elements there, you are likely use
SPARSE_HASHED, since it requires less memory), in my env it takes ~4
hours, which is enormous amount of time.

So this patch add support of shards for dictionaries, number of shards
determine how much hash tables will use this dictionary, also, and which
is more important, how much threads it can use to load the data.

And with 16 threads this works 2x faster, not perfect though, see the
follow up patches in this series.

v0: PARTITION BY
v1: SHARDS 1
v2: SHARDS(1)
v3: tried optimized mod - logical and, but it does not gain even 10%
v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either
v5: move SHARDS into layout parameters (unknown simply ignored)
v6: tune params for perf tests (to avoid too long queries)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:25 +01:00
.github Add a checkbox for documentation 2023-01-12 00:26:03 +03:00
base fix TSA support 2023-01-10 01:19:42 +00:00
benchmark Remove old file 2022-07-12 20:28:02 +02:00
cmake Merge pull request #44828 from ClickHouse/remove-two-lines-of-code 2023-01-04 04:50:52 +03:00
contrib Merge pull request #45144 from ClibMouse/crc-power-fix 2023-01-13 11:24:18 +01:00
docker Merge pull request #44594 from arenadata/ADQM-634 2023-01-12 15:07:45 -05:00
docs Add ability to load hashed dictionaries using multiple threads 2023-01-13 13:39:25 +01:00
packages Remove adduser dependency 2023-01-07 01:45:54 +01:00
programs Merge pull request #44327 from kssenii/use-new-named-collections-code-2 2023-01-06 13:06:26 +01:00
rust What happens if I remove these 139 lines of code? 2023-01-03 18:35:31 +00:00
src Add ability to load hashed dictionaries using multiple threads 2023-01-13 13:39:25 +01:00
tests Add ability to load hashed dictionaries using multiple threads 2023-01-13 13:39:25 +01:00
utils Update version_date.tsv and changelogs after v22.3.17.13-lts 2023-01-12 19:22:22 +00:00
.clang-format add BeforeLambdaBody to .clang-format 2022-02-11 16:51:45 +01:00
.clang-tidy Temporarily disable misc-* due to being too slow 2022-12-07 11:43:47 +00:00
.editorconfig Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.exrc Fix vim settings (wrong group for autocmd) 2022-12-03 21:23:24 +01:00
.git-blame-ignore-revs Add files with revision to ignore for git blame 2022-09-13 23:05:56 +02:00
.gitattributes Ignore core.autocrlf for tests references 2022-10-05 09:13:27 +02:00
.gitignore Integrate skim into the client/local 2022-12-14 20:57:41 +01:00
.gitmodules Changes to support the CRC32 in PowerPC to address the WeakHash collision issue. Update the reference to support the hash values based on the specific platform 2023-01-10 21:20:13 -08:00
.pylintrc Cover deprecated bad-* pylint options with black 2022-06-08 14:18:28 +02:00
.snyk Add exclusions from the Snyk scan 2022-10-31 17:47:02 +01:00
.yamllint Drop truthy.check-keys from yamllint (does not supported on CI) 2021-02-21 06:15:36 +03:00
AUTHORS Update AUTHORS 2021-09-22 11:38:03 +03:00
CHANGELOG.md Update CHANGELOG.md 2022-12-20 20:56:40 +03:00
CMakeLists.txt What happens if I remove 156 lines of code? 2023-01-03 18:51:16 +00:00
CODE_OF_CONDUCT.md Add minimal code of conduct #9676 2020-03-16 12:44:28 +03:00
CONTRIBUTING.md Mention ClickHouse CLA in CONTRIBUTING.md (#32697) 2021-12-14 03:47:19 +03:00
format_sources allow several <graphite> targets (#603) 2017-03-21 23:08:09 +04:00
LICENSE Update LICENSE 2023-01-02 00:35:32 +01:00
PreLoad.cmake Update PreLoad.cmake 2022-08-26 18:30:05 +08:00
README.md Update README.md 2022-12-21 02:16:35 +03:00
SECURITY.md Update version_date.tsv and changelogs after v22.12.1.1752-stable 2022-12-15 17:07:16 +00:00

ClickHouse — open source distributed column-oriented DBMS

ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

  • Official website has a quick high-level overview of ClickHouse on the main page.
  • ClickHouse Cloud ClickHouse as a service, built by the creators and maintainers.
  • Tutorial shows how to set up and query a small ClickHouse cluster.
  • Documentation provides more in-depth information.
  • YouTube channel has a lot of content about ClickHouse in video format.
  • Slack and Telegram allow chatting with ClickHouse users in real-time.
  • Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
  • Code Browser (Woboq) with syntax highlight and navigation.
  • Code Browser (github.dev) with syntax highlight, powered by github.dev.
  • Contacts can help to get your questions answered if there are any.

Upcoming events

  • Recording available: v22.12 Release Webinar 22.12 is the ClickHouse Christmas release. There are plenty of gifts (a new JOIN algorithm among them) and we adopted something from MongoDB. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
  • ClickHouse Meetup at the CHEQ office in Tel Aviv - Jan 16 - We are very excited to be holding our next in-person ClickHouse meetup at the CHEQ office in Tel Aviv! Hear from CHEQ, ServiceNow and Contentsquare, as well as a deep dive presentation from ClickHouse CTO Alexey Milovidov. Join us for a fun evening of talks, food and discussion!
  • ClickHouse Meetup at Microsoft Office in Seattle - Jan 18 - Keep an eye on this space as we will be announcing speakers soon!