ClickHouse® is a real-time analytics DBMS
Azat Khuzhin 79ad81dfdf Implement separate queue for parallel loader of hashed dictionaries
Previous patches in this series have a bottleneck in rehash(). This is
the slowest operation when inserting lots of rows into the hashtable,
and eventually the whole thread pool sometimes works at the speed of the
slowest thread, since we did not have any queue of blocks.

This patch adds such a queue, and now it scales linearly. Initially, with
1 thread I had ~4 hours for 10e9 elements (UInt64 key, UInt16 value);
after this patch it works in 16 minutes with 16 threads (well, actually I
have to use 32 threads because of the distribution of data in the source
table).

And now with 16 threads it works 16 times faster.

Also this patch adds more optimal block splitting for the non-complex
dictionaries, and the usual block splitting for complex dictionaries.
But anyway, this moves the overhead from the threads that load into the
hashtable out to the reader thread, and this is better, since the reader
does not use that much CPU.

v2: fix use-after-free on failed load (add missing wait in dtor)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
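
The idea described in the commit message above (a reader thread that splits blocks by shard and feeds a queue per worker, so that a slow rehash() in one hashtable does not stall the others) can be sketched roughly as follows. This is a minimal illustration and not the ClickHouse implementation: `BlockQueue`, `loadSharded`, the queue capacity of 4, and the `key % num_shards` split are all assumptions made for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <queue>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustration, not ClickHouse code.
using Key = std::uint64_t;
using Value = std::uint16_t;
using Block = std::vector<std::pair<Key, Value>>;
using Shard = std::unordered_map<Key, Value>;

// Minimal bounded blocking queue: push() waits while full, pop() waits while empty.
class BlockQueue
{
public:
    explicit BlockQueue(std::size_t capacity_) : capacity(capacity_) {}

    void push(Block block)
    {
        std::unique_lock<std::mutex> lock(mutex);
        not_full.wait(lock, [this] { return blocks.size() < capacity; });
        blocks.push(std::move(block));
        not_empty.notify_one();
    }

    // Returns false once finish() was called and the queue is drained.
    bool pop(Block & out)
    {
        std::unique_lock<std::mutex> lock(mutex);
        not_empty.wait(lock, [this] { return !blocks.empty() || finished; });
        if (blocks.empty())
            return false;
        out = std::move(blocks.front());
        blocks.pop();
        not_full.notify_one();
        return true;
    }

    void finish()
    {
        std::lock_guard<std::mutex> lock(mutex);
        finished = true;
        not_empty.notify_all();
    }

private:
    std::mutex mutex;
    std::condition_variable not_full;
    std::condition_variable not_empty;
    std::queue<Block> blocks;
    std::size_t capacity;
    bool finished = false;
};

std::vector<Shard> loadSharded(const std::vector<Block> & source, std::size_t num_shards)
{
    assert(num_shards > 0);
    std::vector<Shard> shards(num_shards);
    std::deque<BlockQueue> queues; // deque: BlockQueue is neither copyable nor movable
    for (std::size_t i = 0; i < num_shards; ++i)
        queues.emplace_back(4); // assumed capacity, in blocks

    // One worker per shard: only this thread touches shards[i], so the table needs no lock.
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_shards; ++i)
        workers.emplace_back([&shards, &queues, i]
        {
            Block block;
            while (queues[i].pop(block))
                for (const auto & kv : block)
                    shards[i][kv.first] = kv.second;
        });

    // Reader: the splitting cost is paid here, not in the hashtable threads.
    for (const auto & block : source)
    {
        std::vector<Block> parts(num_shards);
        for (const auto & kv : block)
            parts[kv.first % num_shards].push_back(kv);
        for (std::size_t i = 0; i < num_shards; ++i)
            if (!parts[i].empty())
                queues[i].push(std::move(parts[i]));
    }

    // Wait for every worker before the queues and shards go out of scope.
    for (auto & queue : queues)
        queue.finish();
    for (auto & worker : workers)
        worker.join();
    return shards;
}
```

Calling `loadSharded(source, 16)` on a vector of blocks would return 16 independent hash tables. Joining all workers before returning matters for the same reason as the v2 note in the commit message: no thread may outlive the queues and tables it uses.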
.github Add a checkbox for documentation 2023-01-12 00:26:03 +03:00
base fix TSA support 2023-01-10 01:19:42 +00:00
benchmark Remove old file 2022-07-12 20:28:02 +02:00
cmake Merge pull request #44828 from ClickHouse/remove-two-lines-of-code 2023-01-04 04:50:52 +03:00
contrib Merge pull request #45144 from ClibMouse/crc-power-fix 2023-01-13 11:24:18 +01:00
docker Merge pull request #44594 from arenadata/ADQM-634 2023-01-12 15:07:45 -05:00
docs Add ability to load hashed dictionaries using multiple threads 2023-01-13 13:39:25 +01:00
packages Remove adduser dependency 2023-01-07 01:45:54 +01:00
programs Merge pull request #44327 from kssenii/use-new-named-collections-code-2 2023-01-06 13:06:26 +01:00
rust What happens if I remove these 139 lines of code? 2023-01-03 18:35:31 +00:00
src Implement separate queue for parallel loader of hashed dictionaries 2023-01-13 13:39:26 +01:00
tests Add ability to load hashed dictionaries using multiple threads 2023-01-13 13:39:25 +01:00
utils Update version_date.tsv and changelogs after v22.3.17.13-lts 2023-01-12 19:22:22 +00:00
.clang-format add BeforeLambdaBody to .clang-format 2022-02-11 16:51:45 +01:00
.clang-tidy Temporarily disable misc-* due to being too slow 2022-12-07 11:43:47 +00:00
.editorconfig Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.exrc Fix vim settings (wrong group for autocmd) 2022-12-03 21:23:24 +01:00
.git-blame-ignore-revs Add files with revision to ignore for git blame 2022-09-13 23:05:56 +02:00
.gitattributes Ignore core.autocrlf for tests references 2022-10-05 09:13:27 +02:00
.gitignore Integrate skim into the client/local 2022-12-14 20:57:41 +01:00
.gitmodules Changes to support the CRC32 in PowerPC to address the WeakHash collision issue. Update the reference to support the hash values based on the specific platform 2023-01-10 21:20:13 -08:00
.pylintrc Cover deprecated bad-* pylint options with black 2022-06-08 14:18:28 +02:00
.snyk Add exclusions from the Snyk scan 2022-10-31 17:47:02 +01:00
.yamllint Drop truthy.check-keys from yamllint (does not supported on CI) 2021-02-21 06:15:36 +03:00
AUTHORS Update AUTHORS 2021-09-22 11:38:03 +03:00
CHANGELOG.md Update CHANGELOG.md 2022-12-20 20:56:40 +03:00
CMakeLists.txt What happens if I remove 156 lines of code? 2023-01-03 18:51:16 +00:00
CODE_OF_CONDUCT.md Add minimal code of conduct #9676 2020-03-16 12:44:28 +03:00
CONTRIBUTING.md Mention ClickHouse CLA in CONTRIBUTING.md (#32697) 2021-12-14 03:47:19 +03:00
format_sources allow several <graphite> targets (#603) 2017-03-21 23:08:09 +04:00
LICENSE Update LICENSE 2023-01-02 00:35:32 +01:00
PreLoad.cmake Update PreLoad.cmake 2022-08-26 18:30:05 +08:00
README.md Update README.md 2022-12-21 02:16:35 +03:00
SECURITY.md Update version_date.tsv and changelogs after v22.12.1.1752-stable 2022-12-15 17:07:16 +00:00

ClickHouse — open source distributed column-oriented DBMS

ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

  • Official website has a quick high-level overview of ClickHouse on the main page.
  • ClickHouse Cloud: ClickHouse as a service, built by the creators and maintainers.
  • Tutorial shows how to set up and query a small ClickHouse cluster.
  • Documentation provides more in-depth information.
  • YouTube channel has a lot of content about ClickHouse in video format.
  • Slack and Telegram allow chatting with ClickHouse users in real-time.
  • Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
  • Code Browser (Woboq) with syntax highlight and navigation.
  • Code Browser (github.dev) with syntax highlight, powered by github.dev.
  • Contacts can help to get your questions answered if there are any.

Upcoming events

  • Recording available: v22.12 Release Webinar - 22.12 is the ClickHouse Christmas release. There are plenty of gifts (a new JOIN algorithm among them) and we adopted something from MongoDB. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
  • ClickHouse Meetup at the CHEQ office in Tel Aviv - Jan 16 - We are very excited to be holding our next in-person ClickHouse meetup at the CHEQ office in Tel Aviv! Hear from CHEQ, ServiceNow and Contentsquare, as well as a deep dive presentation from ClickHouse CTO Alexey Milovidov. Join us for a fun evening of talks, food and discussion!
  • ClickHouse Meetup at Microsoft Office in Seattle - Jan 18 - Keep an eye on this space as we will be announcing speakers soon!