ClickHouse

Mirror of https://github.com/ClickHouse/ClickHouse.git, last synced 2024-11-10 09:32:06 +00:00

Author	SHA1	Message	Date
JackyWoo	70a262a775	Add optimization uniq to count	2023-09-13 16:16:11 +08:00
Alexey Milovidov	bd4aec0601	Revert "Optimize uniq to count"	2023-09-13 09:14:06 +03:00
JackyWoo	d065ac32e0	Merge branch 'master' into optimize_uniq_to_count	2023-09-04 10:06:36 +08:00
Jiebin Sun	7c529e5691	Optimize the merge if all hashSets are singleLevel in UniqExactSet (#52973) * Optimize the merge if all hashSets are singleLevel. PR https://github.com/ClickHouse/ClickHouse/pull/50748 added a new phase, `parallelizeMergePrepare`, before the merge when the hashSets are neither all singleLevel nor all twoLevel: it converts all the singleLevelSets to twoLevelSets in parallel, which increases CPU utilization and QPS. If all the hash tables are singleLevel, they can also benefit from the `parallelizeMergePrepare` optimization in most cases, provided the hash tables are not too small. By tuning the query `SELECT COUNT(DISTINCT SearchPhase) FROM hits_v1` with different thread counts, we arrived at a mild threshold of 6,000. Tested the patch with the query `SELECT COUNT(DISTINCT Title) FROM hits_v1` on a 2x80 vCPU server. With fewer than 48 threads, the hashSets are all twoLevel or a mix of singleLevel and twoLevel; with more than 56 threads, all the hashSets are singleLevel. QPS improved by up to 2.35x. Threads: Opt/Base = 8: 100.0%, 16: 99.4%, 24: 110.3%, 32: 99.9%, 40: 99.3%, 48: 99.8%, 56: 183.0%, 64: 234.7%, 72: 233.1%, 80: 229.9%, 88: 224.5%, 96: 229.6%, 104: 235.1%, 112: 229.5%, 120: 229.1%, 128: 217.8%, 136: 222.9%, 144: 217.8%, 152: 204.3%, 160: 203.2%. Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * Add the comment and explanation for PR#52973 Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> --------- Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>	2023-08-30 11:26:16 +02:00
JackyWoo	a963048e1a	Merge branch 'master' into optimize_uniq_to_count	2023-08-28 11:10:05 +08:00
avogar	ecb0e9844c	Disable cache in perf test	2023-08-23 21:01:18 +00:00
avogar	68e3af56d4	Address comments	2023-08-23 13:19:15 +00:00
Kruglov Pavel	e193aec583	Merge branch 'master' into fast-count-from-files	2023-08-23 12:15:34 +02:00
Kruglov Pavel	67c5c0203b	Merge branch 'master' into fast-count-from-files	2023-08-22 15:03:48 +02:00
Alexey Milovidov	037277c4a2	Remove bad test	2023-08-22 14:21:23 +02:00
Michael Kolupaev	d752611c43	Performance test	2023-08-21 14:15:52 -07:00
Kruglov Pavel	88aee95122	Merge branch 'master' into fast-count-from-files	2023-08-21 14:46:33 +02:00
avogar	47304bf7aa	Optimize count from files in most input formats	2023-08-21 12:30:52 +00:00
Alexey Milovidov	125169d9ae	Remove useless test	2023-08-20 03:51:30 +02:00
robot-ch-test-poll4	3aa9cb1267	Merge pull request #51399 from liuneng1994/optimize_nullable_aggragate_serialized_method: Optimize aggregation performance for nullable String keys when using AggregationMethodSerialized	2023-08-16 19:37:44 +02:00
liuneng	8a83301316	optimize	2023-08-08 13:38:25 +08:00
liuneng	f33367cd8b	add more tests	2023-08-08 13:38:25 +08:00
liuneng	f96b9b7512	optimize fixed size column	2023-08-08 13:38:25 +08:00
liuneng	035dbdaf22	remove numbers optimization; it decreases performance	2023-08-08 13:38:25 +08:00
liuneng	4f9920c71c	optimize performance of nullable String And Number column serializeValueIntoArena	2023-08-08 13:38:25 +08:00
Duc Canh Le	ad0ac43814	fix performance test	2023-08-07 06:25:46 +00:00
Duc Canh Le	ed2a1d7c9b	select required columns when getting join	2023-08-07 03:15:20 +00:00
JackyWoo	43ea21a4ce	make default optimize_uniq_to_count to true	2023-08-02 18:28:22 +08:00
JackyWoo	1c930f34de	reduce performance time	2023-08-02 18:10:01 +08:00
JackyWoo	162c674d74	remove settings in uniq_to_count	2023-08-02 10:50:04 +08:00
JackyWoo	ef3f5e2a7c	fix performance tests error	2023-08-02 10:15:56 +08:00
JackyWoo	93b28903cb	Merge branch 'master' into optimize_uniq_to_count	2023-08-02 10:13:22 +08:00
Jiebin Sun	78f3a575f9	Convert hashSets in parallel before merge (#50748) * Convert hashSets in parallel before merge. Before the merge, if one of lhs and rhs is a singleLevelSet and the other is a twoLevelSet, the singleLevelSet calls convertToTwoLevel(). This conversion is not parallel, and it costs many cycles when all the singleLevelSets have to be converted. The idea of the patch is to convert all the singleLevelSets to twoLevelSets in parallel when the hashSets are neither all singleLevel nor all twoLevel. I tested the patch on an Intel 2 x 112 vCPU SPR server with ClickBench and the latest upstream ClickHouse. Q5 improved by 264%, and 24 queries gained at least 5%. The overall geomean of the 43 queries improved by 7.4% over the base code. Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * add resize() for the data_vec in parallelizeMergePrepare() Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * Add the performance test prepare_hash_before_merge.xml Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * Adapt to the CI by renaming the data set from hits_v1 to test.hits. Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * remove the redundant branch in UniqExactSet Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com> * Remove the empty methods and throw an exception in parallelizeMergePrepare() Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> --------- Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>	2023-07-27 15:06:34 +02:00
JackyWoo	5f47aacef2	add performance tests	2023-07-27 15:41:16 +08:00
JackyWoo	95c41f49e0	not change projection columns	2023-07-27 15:41:16 +08:00
robot-ch-test-poll4	110500049a	Merge pull request #50532 from nickitat/more_pushdown_for_right_side_of_join Push down to right side of a join in more cases	2023-07-26 14:43:57 +02:00
Nikita Taranov	b2acbe42b7	add perf test	2023-07-24 20:34:01 +02:00
Igor Nikonov	91f7185e8c	Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct	2023-07-24 18:47:23 +02:00
Igor Nikonov	90e393ecf6	Merge remote-tracking branch 'origin/master' into remove-perf-test-duplicate-order-by-and-distinct	2023-07-18 14:26:22 +00:00
Alexey Milovidov	62bfa4ed93	Fix performance test for regexp cache	2023-07-09 02:21:48 +02:00
vdimir	737cff7e57	Remove whole join_set_filter.xml, will resubmit	2023-07-03 17:00:20 +02:00
vdimir	9ea5d929a5	Update tests/performance/join_set_filter.xml	2023-07-03 17:00:20 +02:00
vdimir	ebd7ecb230	Remove unstable queries from performance/join_set_filter	2023-07-03 17:00:20 +02:00
Igor Nikonov	35bc97e5f9	Merge remote-tracking branch 'origin/master' into remove-perf-test-duplicate-order-by-and-distinct	2023-06-16 20:56:56 +00:00
Azat Khuzhin	5caa3a9e80	Adjust min_insert_block_size_rows for materialized_view_parallelize_output_from_storages Otherwise it is too slow for perf tests on CI [1]. [1]: https://s3.amazonaws.com/clickhouse-test-reports/50214/e287ec50920c7cadabea6ec19ef14b353345ac93/performance_comparison_[3_4]/report.html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-06-14 19:11:23 +03:00
Azat Khuzhin	3e419730c3	Disable parallelize_output_from_storages for processing MATERIALIZED VIEWs Adding more processors for parallelize_output_from_storages is not a costless operation (I've experienced some issues in production because of this), and it is not easy to fix in a normal way, so let's disable it for now. Before this patch: - INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=1, min_insert_block_size_rows=1000 0 rows in set. Elapsed: 3.648 sec. Processed 20.00 million rows, 120.00 MB (5.48 million rows/s., 32.90 MB/s.) - INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=0, min_insert_block_size_rows=1000 0 rows in set. Elapsed: 1.851 sec. Processed 20.00 million rows, 120.00 MB (10.80 million rows/s., 64.82 MB/s.) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-06-14 19:11:23 +03:00
Igor Nikonov	79f53f428b	Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct	2023-06-13 13:45:36 +02:00
flynn	92c87dedad	Add parallel state merge for some combinators other than If (#50413) * Add parallel state merge for some combinators other than If * add test * update test	2023-06-08 00:41:32 +02:00
flynn	f616314f8b	fix typo	2023-05-29 02:22:13 +00:00
flynn	05783f99cd	update test	2023-05-28 14:17:59 +00:00
flynn	ec82c657eb	Parallel merge of uniqExactIf states	2023-05-28 06:04:23 +00:00
Azat Khuzhin	2996b38606	Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout. As it turns out, HashMap/PackedHashMap works great even with a max load factor of 0.99. By "great" I mean that at least it works faster than google sparsehash, not to mention its friendliness to the memory allocator (it has zero fragmentation since it works with a contiguous memory region, in contrast to sparsehash, which does lots of reallocs that jemalloc does not like because of its slabs). Here is a table of different setups: settings | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS; HASHED upstream | - | - | - | - | 35GiB; SPARSE_HASHED upstream | - | - | - | - | 26GiB; sparse_hash_map glibc hashbench | - | - | - | - | 17.5GiB; sparse_hash_map packed allocator | 101.878 | 231.48 | 4.32 | - | 17.7GiB; PackedHashMap 0.5 | 15.514 | 42.35 | 23.61 | 20GiB | 22GiB; hashed 0.95 | 34.903 | 115.615 | 8.65 | 16GiB | 18.7GiB; PackedHashMap 0.95 | 93.6 | 19.883 | 10.68 | 10GiB | 12.8GiB; PackedHashMap 0.99 | 26.113 | 83.6 | 11.96 | 10GiB | 12.3GiB. As it shows, PackedHashMap with a max_load_factor of 0.95 uses 2.6x less memory than SPARSE_HASHED upstream, and it is also 2x faster for reads! v2: fix grower Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7b5d156cc5	Optimize SPARSE_HASHED layout (by using PackedHashMap). If you want a dictionary optimized for memory, SPARSE_HASHED does not always give you what you need. Consider <UInt64, UInt16> as <Key, Value>: this pair also carries 6 bytes of padding (on amd64), so almost 40% of the space is wasted. Because of this padding, even google::sparse_hash_map does not improve the picture; in fact, sparse_hash_map is not very friendly to memory allocators (especially jemalloc). Here are some numbers for a dictionary with 1e9 elements, with UInt64 as the key and UInt16 as the value: settings | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS; HASHED upstream | - | - | - | - | 35GiB; SPARSE_HASHED upstream | - | - | - | - | 26GiB; sparse_hash_map glibc hashbench | - | - | - | - | 17.5GiB; sparse_hash_map packed allocator | 101.878 | 231.48 | 4.32 | - | 17.7GiB; PackedHashMap | 15.514 | 42.35 | 23.61 | 20GiB | 22GiB. As you can see, PackedHashMap looks much better than HASHED, and even better than SPARSE_HASHED, but slightly worse than sparse_hash_map with the packed allocator (which is done with a custom patch to google sparse_hash_map). v2: rebase on top of bucket_count fix Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
lgbo-ustc	a07359fbe8	enable reinit of the used flags only when the hash table rehashes	2023-05-11 11:06:13 +08:00
Alexey Milovidov	8a6e07f0ea	Make projections production-ready	2023-05-10 03:35:13 +02:00
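
The memory figures in the two PackedHashMap commits (Azat Khuzhin, 2023-05-19) can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is a simplified model, not ClickHouse's hash table grower: it assumes an open-addressing table whose bucket count is a power of two that doubles until `buckets * max_load_factor` covers all elements, and it ignores any per-table overhead. `PaddedCell`, `PACKED_CELL_SIZE`, `bucket_count` and `table_bytes` are hypothetical names. Under these assumptions, 1e9 packed 10-byte cells come out at 20 GiB for a load factor of 0.5 and 10 GiB for 0.95 or 0.99, and 16-byte padded cells at 16 GiB for 0.95, the same round numbers as the bytes_allocated column.

```python
import ctypes

GIB = 1 << 30


class PaddedCell(ctypes.Structure):
    # <UInt64 key, UInt16 value> with natural alignment: 16 bytes on amd64,
    # i.e. 6 bytes of padding after the 10 payload bytes.
    _fields_ = [("key", ctypes.c_uint64), ("value", ctypes.c_uint16)]


# A packed layout stores only the payload: 8 + 2 = 10 bytes.
PACKED_CELL_SIZE = ctypes.sizeof(ctypes.c_uint64) + ctypes.sizeof(ctypes.c_uint16)


def bucket_count(n_elements, max_load_factor):
    """Smallest power of two whose capacity * max_load_factor covers n_elements.

    Assumption: the table doubles until it is back under the load factor.
    """
    buckets = 1
    while buckets * max_load_factor < n_elements:
        buckets *= 2
    return buckets


def table_bytes(n_elements, max_load_factor, cell_size):
    """Bytes for the bucket array alone (no per-table overhead modeled)."""
    return bucket_count(n_elements, max_load_factor) * cell_size


if __name__ == "__main__":
    n = 10**9
    padded = ctypes.sizeof(PaddedCell)
    wasted = padded - PACKED_CELL_SIZE
    print(f"padding: {wasted} of {padded} bytes ({wasted / padded:.1%})")
    for label, lf, size in [
        ("packed, 0.5", 0.5, PACKED_CELL_SIZE),
        ("packed, 0.95", 0.95, PACKED_CELL_SIZE),
        ("packed, 0.99", 0.99, PACKED_CELL_SIZE),
        ("padded, 0.95", 0.95, padded),
    ]:
        print(f"{label}: {table_bytes(n, lf, size) / GIB:.0f} GiB")
```

In this model the power-of-two rounding is why 0.95 and 0.99 land on the same 10 GiB: both fit 1e9 elements into 2^30 buckets, while 0.5 needs 2^31. The 6 wasted bytes out of 16 (37.5%) correspond to the "almost 40%" padding overhead mentioned in the SPARSE_HASHED commit.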


1094 Commits
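
The parallel merge-prepare idea described in #50748 and #52973 can be sketched as follows. This is a toy Python model, not UniqExactSet: a single-level set is a plain `set`, a two-level set is a list of `NUM_BUCKETS` sub-sets partitioned by hash, and all names (`to_two_level`, `prepare_for_merge`, `merge_count`) are hypothetical. The 6,000 threshold comes from #52973, but applying it to the largest per-thread set is an assumption made here. Once every set is two-level, bucket i of every set holds the same slice of the key space, so each bucket can be merged by a different worker and the per-bucket distinct counts simply add up.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 256               # hypothetical two-level fan-out
SINGLE_LEVEL_THRESHOLD = 6000   # value from #52973; how it is applied is assumed


def is_two_level(s):
    # Toy representation: a two-level set is a list of bucket sets.
    return isinstance(s, list)


def to_two_level(single):
    """Partition a flat set into NUM_BUCKETS sub-sets by hash."""
    buckets = [set() for _ in range(NUM_BUCKETS)]
    for x in single:
        buckets[hash(x) % NUM_BUCKETS].add(x)
    return buckets


def prepare_for_merge(sets, pool):
    """Convert single-level sets to two-level in parallel when it pays off."""
    singles = [i for i, s in enumerate(sets) if not is_two_level(s)]
    if not singles:
        return list(sets)  # already all two-level
    all_single = len(singles) == len(sets)
    if all_single and max(len(s) for s in sets) < SINGLE_LEVEL_THRESHOLD:
        return list(sets)  # small single-level sets: merge them directly
    converted = pool.map(lambda i: to_two_level(sets[i]), singles)
    result = list(sets)
    for i, buckets in zip(singles, converted):
        result[i] = buckets
    return result


def merge_count(sets, pool):
    """Count distinct elements across all per-thread sets."""
    if not sets:
        return 0
    sets = prepare_for_merge(sets, pool)
    if not is_two_level(sets[0]):
        # Everything is still single-level: one sequential union.
        return len(set().union(*sets))

    def merge_bucket(i):
        # Equal keys always land in the same bucket index, so buckets are disjoint.
        return len(set().union(*(s[i] for s in sets)))

    return sum(pool.map(merge_bucket, range(NUM_BUCKETS)))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        a, b = set(range(10_000)), set(range(5_000, 15_000))
        print(merge_count([to_two_level(a), b], pool))  # mixed levels; prints 15000
```

Because of CPython's GIL, the thread pool here only illustrates the shape of the merge (convert in parallel, then merge bucket-wise in parallel); it does not reproduce the speedups the commits measured in C++.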