1065 Commits

Author SHA1 Message Date
Jiebin Sun
78f3a575f9
Convert hashSets in parallel before merge (#50748)
* Convert hashSets in parallel before merge

Before a merge, if one of lhs and rhs is a singleLevelSet and the other is a twoLevelSet,
then the singleLevelSet calls convertToTwoLevel(). This conversion is not parallelized,
and it costs many cycles when it has to process all the singleLevelSets.

The idea of the patch is to convert all the singleLevelSets to twoLevelSets in parallel when
the hash sets are a mix of singleLevel and twoLevel.

I tested the patch with ClickBench and the latest upstream ClickHouse on an Intel SPR server
with 2 x 112 vCPUs.
Q5 improved by 264%, and 24 queries gained at least 5%. The overall geomean of the 43 queries
is 7.4% better than with the base code.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* add resize() for the data_vec in parallelizeMergePrepare()

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* Add the performance test prepare_hash_before_merge.xml

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* Rename the data set from hits_v1 to test.hits to fit the CI.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* remove the redundant branch in UniqExactSet

Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>

* Remove the empty methods and throw an exception in parallelizeMergePrepare()

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

---------

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>
2023-07-27 15:06:34 +02:00
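As a rough illustration of the merge preparation described in commit 78f3a575f9 above, the following Python sketch converts every single-level set to a two-level set in a thread pool, but only when the inputs are mixed. The classes `SingleLevelSet` and `TwoLevelSet`, the bucket count, and `merge_all` are hypothetical stand-ins and not ClickHouse's C++ implementation; only the names `convertToTwoLevel()` and `parallelizeMergePrepare()` come from the commit message. Python threads do not run this CPU-bound work truly in parallel, so the sketch shows the control flow only.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 4  # hypothetical bucket count, kept small for illustration


class SingleLevelSet:
    """Stand-in for a single-level hash set: one flat set of keys."""

    def __init__(self, items=()):
        self.items = set(items)

    def convert_to_two_level(self):
        # Split the flat set into NUM_BUCKETS buckets by key hash.
        two_level = TwoLevelSet()
        for key in self.items:
            two_level.buckets[hash(key) % NUM_BUCKETS].add(key)
        return two_level


class TwoLevelSet:
    """Stand-in for a two-level hash set: a fixed list of buckets."""

    def __init__(self):
        self.buckets = [set() for _ in range(NUM_BUCKETS)]

    def merge(self, other):
        # Bucket i of one set can only overlap bucket i of another.
        for mine, theirs in zip(self.buckets, other.buckets):
            mine |= theirs

    def size(self):
        return sum(len(bucket) for bucket in self.buckets)


def parallelize_merge_prepare(sets, pool):
    """Convert single-level sets in parallel, but only if the input is mixed."""
    if len({type(s) for s in sets}) < 2:
        return list(sets)  # all single-level or all two-level: nothing to do

    def to_two_level(s):
        return s.convert_to_two_level() if isinstance(s, SingleLevelSet) else s

    return list(pool.map(to_two_level, sets))


def merge_all(sets, max_threads=4):
    """Return the number of distinct keys across all sets."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        prepared = parallelize_merge_prepare(sets, pool)
    if all(isinstance(s, SingleLevelSet) for s in prepared):
        return len(set().union(*(s.items for s in prepared)))
    result = TwoLevelSet()
    for s in prepared:
        result.merge(s)
    return result.size()
```

Without the preparation step, each single-level set would be converted one at a time inside the sequential merge loop, which is the cost the commit message describes.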
robot-ch-test-poll4
110500049a
Merge pull request #50532 from nickitat/more_pushdown_for_right_side_of_join
Push down to right side of a join in more cases
2023-07-26 14:43:57 +02:00
Nikita Taranov
b2acbe42b7 add perf test 2023-07-24 20:34:01 +02:00
Igor Nikonov
91f7185e8c
Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct 2023-07-24 18:47:23 +02:00
Igor Nikonov
90e393ecf6 Merge remote-tracking branch 'origin/master' into remove-perf-test-duplicate-order-by-and-distinct 2023-07-18 14:26:22 +00:00
Alexey Milovidov
62bfa4ed93 Fix performance test for regexp cache 2023-07-09 02:21:48 +02:00
vdimir
737cff7e57 Remove whole join_set_filter.xml, will resubmit 2023-07-03 17:00:20 +02:00
vdimir
9ea5d929a5 Update tests/performance/join_set_filter.xml 2023-07-03 17:00:20 +02:00
vdimir
ebd7ecb230 Remove unstable queries from performance/join_set_filter 2023-07-03 17:00:20 +02:00
Igor Nikonov
35bc97e5f9 Merge remote-tracking branch 'origin/master' into remove-perf-test-duplicate-order-by-and-distinct 2023-06-16 20:56:56 +00:00
Azat Khuzhin
5caa3a9e80 Adjust min_insert_block_size_rows for materialized_view_parallelize_output_from_storages
Otherwise it is too slow for perf tests on CI [1].

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/50214/e287ec50920c7cadabea6ec19ef14b353345ac93/performance_comparison_[3_4]/report.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-06-14 19:11:23 +03:00
Azat Khuzhin
3e419730c3 Disable parallelize_output_from_storages for processing MATERIALIZED VIEWs
Adding more processors for parallelize_output_from_storages is not a
costless operation (I've experienced some issues in production because
of this), and it is not easy to fix in a normal way, so let's disable it
for now.

Before this patch:
- INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=1, min_insert_block_size_rows=1000
  0 rows in set. Elapsed: 3.648 sec. Processed 20.00 million rows, 120.00 MB (5.48 million rows/s., 32.90 MB/s.)

- INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=0, min_insert_block_size_rows=1000
  0 rows in set. Elapsed: 1.851 sec. Processed 20.00 million rows, 120.00 MB (10.80 million rows/s., 64.82 MB/s.)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-06-14 19:11:23 +03:00
Igor Nikonov
79f53f428b
Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct 2023-06-13 13:45:36 +02:00
flynn
92c87dedad
Add parallel state merge for some combinators other than If (#50413)
* Add parallel state merge for some combinators other than If

* add test

* update test
2023-06-08 00:41:32 +02:00
flynn
f616314f8b fix typo 2023-05-29 02:22:13 +00:00
flynn
05783f99cd update test 2023-05-28 14:17:59 +00:00
flynn
ec82c657eb Parallel merge of uniqExactIf states 2023-05-28 06:04:23 +00:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean that at least it works faster than
google sparsehash, not to mention its friendliness to the memory
allocator (it has zero fragmentation since it works with a contiguous
memory region, in comparison to sparsehash, which does lots of
realloc, which jemalloc does not like due to its slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **19.883** | **93.6**   | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As the table shows, PackedHashMap with a max_load_factor of 0.95 uses 2.6x
less memory than SPARSE_HASHED in upstream, and it is also 2x faster for reads!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
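The bytes_allocated column in the table of commit 2996b38606 above is consistent with a simple model, sketched here in Python. The model rests on assumptions that the commit message does not state: the bucket count is the smallest power of two that keeps the load factor at or below the configured maximum, the dictionary holds 1e9 elements, a packed <UInt64, UInt16> cell takes 10 bytes, and an unpacked one takes 16 bytes. The function names are hypothetical.

```python
GIB = 1024 ** 3


def bucket_count(elements, max_load_factor):
    """Smallest power of two with elements / buckets <= max_load_factor."""
    buckets = 1
    while elements > buckets * max_load_factor:
        buckets *= 2
    return buckets


def bytes_allocated(elements, max_load_factor, cell_size):
    """Memory for the bucket array alone, ignoring allocator overhead."""
    return bucket_count(elements, max_load_factor) * cell_size


ELEMENTS = 10 ** 9  # assumed dictionary size

# (table row, max load factor, assumed cell size in bytes)
for name, load_factor, cell_size in [
    ("PackedHashMap 0.5", 0.5, 10),
    ("hashed 0.95", 0.95, 16),
    ("PackedHashMap 0.95", 0.95, 10),
    ("PackedHashMap 0.99", 0.99, 10),
]:
    size = bytes_allocated(ELEMENTS, load_factor, cell_size)
    print(f"{name}: {size / GIB:.0f} GiB")
```

Under these assumptions, raising the maximum load factor from 0.5 to 0.95 lets 1e9 elements fit in 2^30 buckets instead of 2^31, which halves the bucket array. Going from 0.95 to 0.99 does not change the bucket count in this model.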
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
If you want a dictionary optimized for memory, SPARSE_HASHED does not
always give you what you need.

Consider the following example: <UInt64, UInt16> as <Key, Value>. This
pair also has 6 bytes of padding (on amd64), so almost 40% of the
space is wasted.

And because of this padding, even google::sparse_hash_map does not make
the picture better. In fact, sparse_hash_map is not very friendly to memory
allocators (especially jemalloc).

Here are some numbers for a dictionary with 1e9 elements, with UInt64 as
the key and UInt16 as the value:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see, PackedHashMap looks much better than HASHED, and
even better than SPARSE_HASHED, but slightly worse than sparse_hash_map
with the packed allocator (which is done with a custom patch to google
sparse_hash_map).

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
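The 6 bytes of padding and the "almost 40%" figure from commit 7b5d156cc5 above can be checked with Python's `struct` module. This is only a size calculation on the machine running it, not ClickHouse code: the format string `"@QH0Q"` asks for the native C layout of a UInt64 followed by a UInt16, padded at the end to 8-byte alignment, while `"=QH"` gives the same fields with no padding.

```python
import struct

# Native layout: the trailing "0Q" pads the end of the pair to the
# alignment of an 8-byte integer, as a C compiler does for a struct.
padded_size = struct.calcsize("@QH0Q")

# Packed layout: standard sizes, no alignment padding at all.
packed_size = struct.calcsize("=QH")

padding = padded_size - packed_size
wasted_fraction = padding / padded_size

print(f"padded: {padded_size} bytes, packed: {packed_size} bytes")
print(f"padding: {padding} bytes ({wasted_fraction:.1%} of the padded pair)")
```

On amd64 this gives a 16-byte padded pair against a 10-byte packed one, so 6 of every 16 bytes (37.5%) hold no data.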
lgbo-ustc
a07359fbe8 enable used flags' reinit only when the hash table rehashes 2023-05-11 11:06:13 +08:00
Alexey Milovidov
8a6e07f0ea Make projections production-ready 2023-05-10 03:35:13 +02:00
Alexey Milovidov
f449df85b6 Deprecate in-memory parts 2023-05-03 00:31:09 +02:00
Alexey Milovidov
c279516ac1
Merge branch 'master' into parallel-reading-from-file 2023-04-10 08:02:50 +03:00
Igor Nikonov
8fdc2b3326 Perf test 2023-04-07 20:06:11 +00:00
Anton Popov
10d2b1330b add perf test 2023-04-04 21:29:52 +00:00
Anton Popov
1e79245b94 add tests 2023-03-28 17:20:05 +00:00
Ongkong
d9c7bc1859
Fix ASOF LEFT JOIN performance degradation (#47544) 2023-03-18 23:53:00 +01:00
LiuNeng
d4c5ab9dcd
Optimize one nullable key aggregate performance (#45772) 2023-03-02 21:01:52 +01:00
Igor Nikonov
2f7aa8849b
Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct 2023-03-02 20:48:28 +01:00
Igor Nikonov
548d79c2e8 Remove perf test duplicate_order_by_and_distinct.xml 2023-03-02 12:31:09 +00:00
Alexander Gololobov
f64d08bd5c Enable lightweight delete support by default 2023-03-01 19:35:55 +01:00
Nikita Taranov
ab44740efb
Enable perf tests added in #45364 (#46623) 2023-02-28 00:26:11 +01:00
Alexey Milovidov
17992b178a
Merge pull request #45364 from nickitat/aggr_partitions_independently
Add option to aggregate partitions independently
2023-02-19 17:44:18 +03:00
Alexey Milovidov
417158f59f
Merge branch 'master' into lower_upper 2023-02-19 04:05:10 +03:00
Nikita Taranov
f70044f34b Merge branch 'master' into aggr_partitions_independently 2023-02-18 13:19:05 +00:00
Alexey Milovidov
6e0dab71ed
Merge pull request #46188 from bigo-sg/rewrite_array_exists
Rewrite array exists to has
2023-02-12 05:53:22 +03:00
Alexey Milovidov
786aa069e1
Merge pull request #46187 from ClickHouse/speed-up-count-digits
Speed up `countDigits`
2023-02-10 07:41:12 +03:00
taiyang-li
b83ad6bb81 add perf test 2023-02-09 12:30:50 +08:00
Alexey Milovidov
9a86d0087c Add performance test 2023-02-09 04:52:33 +01:00
Alexey Milovidov
66043eec24
Merge branch 'master' into decimal-performance 2023-02-09 04:59:37 +03:00
Igor Nikonov
72c393e7c4
Merge pull request #46014 from ClickHouse/inorder-optimization-update-sorting-properties
Update sorting properties after reading in order applied
2023-02-08 10:19:47 +01:00
Alexey Milovidov
a2df6e950e Whitespace 2023-02-08 03:38:23 +01:00
Alexey Milovidov
168fbc9d7b Add a test 2023-02-08 02:17:23 +01:00
李扬
444373679a
Merge branch 'master' into improve_decimal 2023-02-06 13:08:51 +08:00
Igor Nikonov
089a0009ad Polishing
+ try to stabilize distinct in order perf test
2023-02-05 13:38:20 +00:00
Nikita Taranov
b983b363f8 Merge branch 'master' into aggr_partitions_independently 2023-02-04 18:24:31 +00:00
李扬
ad6f39389d
Update tests/performance/column_array_filter.xml
Co-authored-by: Alexander Gololobov <440544+davenger@users.noreply.github.com>
2023-02-04 18:49:13 +08:00
Nikita Mikhaylov
33877b5e00
Parallel replicas. Part [2] (#43772) 2023-02-03 14:34:18 +01:00
taiyang-li
36a98a1628 add performance tests 2023-02-02 20:16:16 +08:00
Nikita Taranov
e7ca90adab fix perf test 2023-01-30 17:11:56 +00:00