ClickHouse® is a real-time analytics DBMS
Jiebin Sun 78f3a575f9
Convert hashSets in parallel before merge (#50748)
* Convert hashSets in parallel before merge

Before a merge, if one of lhs and rhs is a singleLevelSet and the other is a twoLevelSet,
the singleLevelSet calls convertToTwoLevel(). The conversion does not run in parallel,
and it costs many cycles if it has to consume all the singleLevelSets.

The idea of the patch is to convert all the singleLevelSets to twoLevelSets in parallel when
the hash sets are mixed, that is, neither all singleLevel nor all twoLevel.
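
The idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from the patch: `ToySet`, `prepareForMerge`, `mergeInto`, `demoMergedSize` and the bucket count are hypothetical names and values chosen for illustration, buckets are picked with a plain modulo instead of a real hash, and one `std::thread` per set stands in for whatever thread pool the real implementation uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>
#include <unordered_set>
#include <vector>

// Toy stand-ins for single-level and two-level hash sets (not ClickHouse types).
constexpr std::size_t NUM_BUCKETS = 16;
using SingleLevel = std::unordered_set<std::uint64_t>;
using TwoLevel = std::array<SingleLevel, NUM_BUCKETS>;

struct ToySet
{
    std::unique_ptr<SingleLevel> single = std::make_unique<SingleLevel>();
    std::unique_ptr<TwoLevel> two;

    bool isTwoLevel() const { return two != nullptr; }

    void insert(std::uint64_t key)
    {
        if (two)
            (*two)[key % NUM_BUCKETS].insert(key);
        else
            single->insert(key);
    }

    // Serial work: every key is re-inserted into its bucket.
    void convertToTwoLevel()
    {
        if (two)
            return;
        two = std::make_unique<TwoLevel>();
        for (auto key : *single)
            (*two)[key % NUM_BUCKETS].insert(key);
        single.reset();
    }

    std::size_t size() const
    {
        if (!two)
            return single->size();
        std::size_t n = 0;
        for (const auto & bucket : *two)
            n += bucket.size();
        return n;
    }
};

// If the sets are mixed, convert each single-level set on its own thread.
void prepareForMerge(std::vector<ToySet> & sets)
{
    bool any_single = false;
    bool any_two = false;
    for (const auto & s : sets)
    {
        if (s.isTwoLevel())
            any_two = true;
        else
            any_single = true;
    }
    if (!(any_single && any_two))
        return; // all sets share one layout, nothing to convert

    std::vector<std::thread> workers;
    for (auto & s : sets)
        if (!s.isTwoLevel())
            workers.emplace_back([&s] { s.convertToTwoLevel(); }); // each thread owns one set
    for (auto & w : workers)
        w.join();
}

// Merge rhs into lhs; both must already share the same layout.
void mergeInto(ToySet & lhs, const ToySet & rhs)
{
    assert(lhs.isTwoLevel() == rhs.isTwoLevel());
    if (!lhs.isTwoLevel())
    {
        lhs.single->insert(rhs.single->begin(), rhs.single->end());
        return;
    }
    for (std::size_t i = 0; i < NUM_BUCKETS; ++i)
        (*lhs.two)[i].insert((*rhs.two)[i].begin(), (*rhs.two)[i].end());
}

// Builds one two-level set and three single-level sets with overlapping keys,
// prepares them in parallel, merges them, and returns the number of distinct keys.
std::size_t demoMergedSize()
{
    std::vector<ToySet> sets(4);
    sets[0].convertToTwoLevel(); // pretend this set had already grown to two levels
    for (std::uint64_t i = 0; i < 4; ++i)
        for (std::uint64_t k = i * 50; k < i * 50 + 100; ++k)
            sets[i].insert(k);
    prepareForMerge(sets);
    for (std::size_t i = 1; i < sets.size(); ++i)
        mergeInto(sets[0], sets[i]);
    return sets[0].size();
}
```

In this sketch the conversions are independent because each thread touches exactly one set, so no locking is needed; the merge that follows only ever sees sets with the same layout.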

I tested the patch on an Intel 2 x 112 vCPU SPR server with ClickBench and the latest upstream
ClickHouse.
Q5 improved by 264%, and 24 queries gained at least 5% in performance.
The overall geomean of the 43 queries is 7.4% better than the base code.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* add resize() for the data_vec in parallelizeMergePrepare()

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* Add the performance test prepare_hash_before_merge.xml

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* Fit the CI to rename the data set from hits_v1 to test.hits.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* remove the redundant branch in UniqExactSet

Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>

* Remove the empty methods and add throw exception in parallelizeMergePrepare()

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

---------

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>
2023-07-27 15:06:34 +02:00
.github Add dependencies to FinishCheck 2023-07-17 13:50:21 +00:00
base Improve logging macros (#52519) 2023-07-26 23:38:14 +03:00
benchmark Remove old file 2022-07-12 20:28:02 +02:00
cmake Use incbin for resources, part 2 2023-07-23 06:11:03 +02:00
contrib Merge pull request #49367 from ClickHouse/enc 2023-07-27 00:48:54 +02:00
docker Merge pull request #51851 from ClickHouse/add_delay_for_replicated 2023-07-26 12:59:37 +03:00
docs Merge pull request #52520 from zvonand/revert-52450-remove-to-decimal-string 2023-07-27 00:18:36 +02:00
packages Fix capabilities installed via systemd service (fixes netlink/IO priorities) 2023-07-21 13:57:31 +02:00
programs Merge pull request #49367 from ClickHouse/enc 2023-07-27 00:48:54 +02:00
rust Reproducible builds for Rust 2023-07-22 22:46:22 +02:00
src Convert hashSets in parallel before merge (#50748) 2023-07-27 15:06:34 +02:00
tests Convert hashSets in parallel before merge (#50748) 2023-07-27 15:06:34 +02:00
utils Merge pull request #50986 from arenadata/ADQM-822 2023-07-26 12:27:04 +02:00
.clang-format Configure rule for concepts requires clause 2023-06-05 06:55:52 -07:00
.clang-tidy readability-identifier-names: adjust invalid options 2023-05-13 20:28:55 +00:00
.clangd Enable few slow clang-tidy checks for clangd 2023-05-13 14:08:25 +02:00
.editorconfig
.exrc Fix vim settings (wrong group for autocmd) 2022-12-03 21:23:24 +01:00
.git-blame-ignore-revs Add files with revision to ignore for git blame 2022-09-13 23:05:56 +02:00
.gitattributes Ignore core.autocrlf for tests references 2022-10-05 09:13:27 +02:00
.gitignore Reproducible builds for Rust 2023-07-22 22:46:22 +02:00
.gitmodules Use incbin for resources, part 1 2023-07-23 06:11:03 +02:00
.pylintrc Cover deprecated bad-* pylint options with black 2022-06-08 14:18:28 +02:00
.snyk Add exclusions from the Snyk scan 2022-10-31 17:47:02 +01:00
.yamllint Increase line-length limit for yamlllint 2023-06-13 19:45:51 +02:00
AUTHORS Update AUTHORS 2021-09-22 11:38:03 +03:00
CHANGELOG.md Update CHANGELOG (#52655) 2023-07-27 14:38:53 +02:00
CMakeLists.txt Force libunwind usage (removes gcc_eh support) 2023-07-08 20:55:50 +02:00
CODE_OF_CONDUCT.md
CONTRIBUTING.md Mention ClickHouse CLA in CONTRIBUTING.md (#32697) 2021-12-14 03:47:19 +03:00
format_sources
LICENSE Update LICENSE 2023-01-02 00:35:32 +01:00
PreLoad.cmake CMake cleanup: Remove configuration of CMAKE_SHARED_LINKER_FLAGS 2023-03-26 17:59:39 +00:00
README.md Update README.md 2023-07-14 18:56:53 +03:00
SECURITY.md Update version_date.tsv and changelogs after v23.6.1.1524-stable 2023-06-30 15:21:13 +00:00

ClickHouse — open source distributed column-oriented DBMS

ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

How To Install (Linux, macOS, FreeBSD)

curl https://clickhouse.com/ | sh

Upcoming Events

Also, keep an eye out for upcoming meetups around the world. Somewhere else you want us to be? Please feel free to reach out to tyler <at> clickhouse <dot> com.

Recent Recordings

  • Recent Meetup Videos: Meetup Playlist. Whenever possible, recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Currently featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments".
  • Recording available: v23.6 Release Webinar All the features of 23.6, one convenient video! Watch it now!
  • All release webinar recordings: YouTube playlist

Interested in joining ClickHouse and making it your full-time job?

We are a globally diverse and distributed team, united behind a common goal of creating industry-leading, real-time analytics. Here, you will have an opportunity to solve some of the most cutting-edge technical challenges and have direct ownership of your work and vision. If you are a contributor by nature, a thinker and a doer, we'll definitely click!

Check out our current openings here: https://clickhouse.com/company/careers

Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!