ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-26 09:32:01 +00:00

Author	SHA1	Message	Date
Jiebin Sun	78f3a575f9	Convert hashSets in parallel before merge (#50748) * Convert hashSets in parallel before merge Before merge, if one of the lhs and rhs is singleLevelSet and the other is twoLevelSet, then the SingleLevelSet will call convertToTwoLevel(). The convert process is not in parallel and it will cost lots of cycles if it consumes all the singleLevelSets. The idea of the patch is to convert all the singleLevelSets to twoLevelSets in parallel if the hashsets are not all singleLevel or not all twoLevel. I have tested the patch on Intel 2 x 112 vCPUs SPR server with clickbench and latest upstream ClickHouse. Q5 has got a big 264% performance improvement and 24 queries have got at least 5% performance gain. The overall geomean of 43 queries has gained 7.4% more than the base code. Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * add resize() for the data_vec in parallelizeMergePrepare() Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * Add the performance test prepare_hash_before_merge.xml Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * Fit the CI to rename the data set from hits_v1 to test.hits. Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> * remove the redundant branch in UniqExactSet Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com> * Remove the empty methods and add throw exception in parallelizeMergePrepare() Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> --------- Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>	2023-07-27 15:06:34 +02:00
JackyWoo	5f47aacef2	add performance tests	2023-07-27 15:41:16 +08:00
JackyWoo	95c41f49e0	not change projection columns	2023-07-27 15:41:16 +08:00
robot-ch-test-poll4	110500049a	Merge pull request #50532 from nickitat/more_pushdown_for_right_side_of_join Push down to right side of a join in more cases	2023-07-26 14:43:57 +02:00
Nikita Taranov	b2acbe42b7	add perf test	2023-07-24 20:34:01 +02:00
Igor Nikonov	91f7185e8c	Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct	2023-07-24 18:47:23 +02:00
Igor Nikonov	90e393ecf6	Merge remote-tracking branch 'origin/master' into remove-perf-test-duplicate-order-by-and-distinct	2023-07-18 14:26:22 +00:00
Alexey Milovidov	62bfa4ed93	Fix performance test for regexp cache	2023-07-09 02:21:48 +02:00
vdimir	737cff7e57	Remove whole join_set_filter.xml, will resubmit	2023-07-03 17:00:20 +02:00
vdimir	9ea5d929a5	Update tests/performance/join_set_filter.xml	2023-07-03 17:00:20 +02:00
vdimir	ebd7ecb230	Remove unstable queries from performance/join_set_filter	2023-07-03 17:00:20 +02:00
Igor Nikonov	35bc97e5f9	Merge remote-tracking branch 'origin/master' into remove-perf-test-duplicate-order-by-and-distinct	2023-06-16 20:56:56 +00:00
Azat Khuzhin	5caa3a9e80	Adjust min_insert_block_size_rows for materialized_view_parallelize_output_from_storages Otherwise it is too slow for perf tests on CI [1]. [1]: https://s3.amazonaws.com/clickhouse-test-reports/50214/e287ec50920c7cadabea6ec19ef14b353345ac93/performance_comparison_[3_4]/report.html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-06-14 19:11:23 +03:00
Azat Khuzhin	3e419730c3	Disable parallelize_output_from_storages for processing MATERIALIZED VIEWs Adding more processors for parallelize_output_from_storages is not a costless operation (I've experienced some issues in production because of this), and it is not easy to fix in a normal way, so let's disable it for now. Before this patch: - INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=1, min_insert_block_size_rows=1000 0 rows in set. Elapsed: 3.648 sec. Processed 20.00 million rows, 120.00 MB (5.48 million rows/s., 32.90 MB/s.) - INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=0, min_insert_block_size_rows=1000 0 rows in set. Elapsed: 1.851 sec. Processed 20.00 million rows, 120.00 MB (10.80 million rows/s., 64.82 MB/s.) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-06-14 19:11:23 +03:00
Igor Nikonov	79f53f428b	Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct	2023-06-13 13:45:36 +02:00
flynn	92c87dedad	Add parallel state merge for some other combinator except If (#50413) * Add parallel state merge for some other combinator except If * add test * update test	2023-06-08 00:41:32 +02:00
flynn	f616314f8b	fix typo	2023-05-29 02:22:13 +00:00
flynn	05783f99cd	update test	2023-05-28 14:17:59 +00:00
flynn	ec82c657eb	Parallel merge of uniqExactIf states	2023-05-28 06:04:23 +00:00
Azat Khuzhin	2996b38606	Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout As it turns out, HashMap/PackedHashMap works great even with max load factor of 0.99. By "great" I mean at least it works faster than google sparsehash, not to mention its friendliness to the memory allocator (it has zero fragmentation since it works with a contiguous memory region, in comparison to sparsehash, which does lots of realloc, which jemalloc does not like due to its slabs). Here is a table of different setups: settings \| load (sec) \| read (sec) \| read (million rows/s) \| bytes_allocated \| RSS - \| - \| - \| - \| - \| - HASHED upstream \| - \| - \| - \| - \| 35GiB SPARSE_HASHED upstream \| - \| - \| - \| - \| 26GiB - \| - \| - \| - \| - \| - sparse_hash_map glibc hashbench \| - \| - \| - \| - \| 17.5GiB sparse_hash_map packed allocator \| 101.878 \| 231.48 \| 4.32 \| - \| 17.7GiB PackedHashMap 0.5 \| 15.514 \| 42.35 \| 23.61 \| 20GiB \| 22GiB hashed 0.95 \| 34.903 \| 115.615 \| 8.65 \| 16GiB \| 18.7GiB PackedHashMap 0.95 \| 93.6 \| 19.883 \| 10.68 \| 10GiB \| 12.8GiB PackedHashMap 0.99 \| 26.113 \| 83.6 \| 11.96 \| 10GiB \| 12.3GiB As it shows, PackedHashMap with 0.95 max_load_factor eats 2.6x less memory than SPARSE_HASHED in upstream, and it is also 2x faster for read! v2: fix grower Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7b5d156cc5	Optimize SPARSE_HASHED layout (by using PackedHashMap) In case you want a dictionary optimized for memory, SPARSE_HASHED does not always give you what you need. Consider the following example: <UInt64, UInt16> as <Key, Value>. This pair will also have 6 bytes of padding (on amd64), so this is almost 40% of space wastage. And because of this padding, even google::sparse_hash_map does not make the picture better; in fact, sparse_hash_map is not very friendly to memory allocators (especially jemalloc). Here are some numbers for a dictionary with 1e9 elements and UInt64 as key, and UInt16 as value: settings \| load (sec) \| read (sec) \| read (million rows/s) \| bytes_allocated \| RSS HASHED upstream \| - \| - \| - \| - \| 35GiB SPARSE_HASHED upstream \| - \| - \| - \| - \| 26GiB - \| - \| - \| - \| - \| - sparse_hash_map glibc hashbench \| - \| - \| - \| - \| 17.5GiB sparse_hash_map packed allocator \| 101.878 \| 231.48 \| 4.32 \| - \| 17.7GiB PackedHashMap \| 15.514 \| 42.35 \| 23.61 \| 20GiB \| 22GiB As you can see, PackedHashMap looks far better than HASHED, and even better than SPARSE_HASHED, but slightly worse than sparse_hash_map with packed allocator (it is done with a custom patch to google sparse_hash_map). v2: rebase on top of bucket_count fix Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Lirikl	39a505e3f3	init	2023-05-11 22:54:00 +03:00
lgbo-ustc	a07359fbe8	enable used flags's reinit only when the hash table rehash	2023-05-11 11:06:13 +08:00
Alexey Milovidov	8a6e07f0ea	Make projections production-ready	2023-05-10 03:35:13 +02:00
Alexey Milovidov	f449df85b6	Deprecate in-memory parts	2023-05-03 00:31:09 +02:00
Alexey Milovidov	c279516ac1	Merge branch 'master' into parallel-reading-from-file	2023-04-10 08:02:50 +03:00
Igor Nikonov	8fdc2b3326	Perf test	2023-04-07 20:06:11 +00:00
Anton Popov	10d2b1330b	add perf test	2023-04-04 21:29:52 +00:00
Anton Popov	1e79245b94	add tests	2023-03-28 17:20:05 +00:00
Ongkong	d9c7bc1859	Fix ASOF LEFT JOIN performance degradation (#47544)	2023-03-18 23:53:00 +01:00
LiuNeng	d4c5ab9dcd	Optimize one nullable key aggregate performance (#45772)	2023-03-02 21:01:52 +01:00
Igor Nikonov	2f7aa8849b	Merge branch 'master' into remove-perf-test-duplicate-order-by-and-distinct	2023-03-02 20:48:28 +01:00
Igor Nikonov	548d79c2e8	Remove perf test duplicate_order_by_and_distinct.xml	2023-03-02 12:31:09 +00:00
Alexander Gololobov	f64d08bd5c	Enable lightweight delete support by default	2023-03-01 19:35:55 +01:00
Nikita Taranov	ab44740efb	Enable perf tests added in #45364 (#46623)	2023-02-28 00:26:11 +01:00
Alexey Milovidov	17992b178a	Merge pull request #45364 from nickitat/aggr_partitions_independently Add option to aggregate partitions independently	2023-02-19 17:44:18 +03:00
Alexey Milovidov	417158f59f	Merge branch 'master' into lower_upper	2023-02-19 04:05:10 +03:00
Nikita Taranov	f70044f34b	Merge branch 'master' into aggr_partitions_independently	2023-02-18 13:19:05 +00:00
Alexey Milovidov	6e0dab71ed	Merge pull request #46188 from bigo-sg/rewrite_array_exists Rewrite array exists to has	2023-02-12 05:53:22 +03:00
Alexey Milovidov	786aa069e1	Merge pull request #46187 from ClickHouse/speed-up-count-digits Speed up `countDigits`	2023-02-10 07:41:12 +03:00
taiyang-li	b83ad6bb81	add perf test	2023-02-09 12:30:50 +08:00
Alexey Milovidov	9a86d0087c	Add performance test	2023-02-09 04:52:33 +01:00
Alexey Milovidov	66043eec24	Merge branch 'master' into decimal-performance	2023-02-09 04:59:37 +03:00
Igor Nikonov	72c393e7c4	Merge pull request #46014 from ClickHouse/inorder-optimization-update-sorting-properties Update sorting properties after reading in order applied	2023-02-08 10:19:47 +01:00
Alexey Milovidov	a2df6e950e	Whitespace	2023-02-08 03:38:23 +01:00
Alexey Milovidov	168fbc9d7b	Add a test	2023-02-08 02:17:23 +01:00
李扬	444373679a	Merge branch 'master' into improve_decimal	2023-02-06 13:08:51 +08:00
Igor Nikonov	089a0009ad	Polishing + try to stabilize distinct in order perf test	2023-02-05 13:38:20 +00:00
Nikita Taranov	b983b363f8	Merge branch 'master' into aggr_partitions_independently	2023-02-04 18:24:31 +00:00
李扬	ad6f39389d	Update tests/performance/column_array_filter.xml Co-authored-by: Alexander Gololobov <440544+davenger@users.noreply.github.com>	2023-02-04 18:49:13 +08:00
Nikita Mikhaylov	33877b5e00	Parallel replicas. Part [2] (#43772)	2023-02-03 14:34:18 +01:00
taiyang-li	36a98a1628	add performance tests	2023-02-02 20:16:16 +08:00
Nikita Taranov	e7ca90adab	fix perf test	2023-01-30 17:11:56 +00:00
Nikita Taranov	ac77808133	fix perf test	2023-01-30 17:11:56 +00:00
Nikita Taranov	52fe7edbd9	better key analysis	2023-01-30 17:11:56 +00:00
Nikita Taranov	2057db68a2	cosmetics	2023-01-30 17:10:45 +00:00
Nikita Taranov	1d45cce03c	support for aggr in order	2023-01-30 17:10:45 +00:00
Nikita Taranov	a2c9aeb7c9	stash	2023-01-30 17:10:45 +00:00
taiyang-li	d25740da83	change as request	2023-01-30 16:13:12 +08:00
Alexey Milovidov	bc2f454522	Merge branch 'master' into block-non-float-gorilla-v2	2023-01-28 03:30:12 +03:00
Igor Nikonov	300f78df96	Merge pull request #45567 from ClickHouse/enable-remove-redundant-sorting Enable query_plan_remove_redundant_sorting optimization by default	2023-01-27 19:14:36 +01:00
Igor Nikonov	41b94b4954	Enable query_plan_remove_redundant_sorting optimization by default	2023-01-24 13:38:21 +00:00
Robert Schulze	97d1bed114	Merge branch 'master' into improve_week_day	2023-01-21 20:40:33 +01:00
Robert Schulze	e6167d6b36	Deprecate Gorilla compression of non-float columns Reasons: 1. The original Gorilla paper proposed a compression schema for pairs of time stamps and double-precision FP values. ClickHouse's Gorilla codec only implements compression of the latter and it does not impose any data type restrictions. - Data types != Float* or (U)Int* (e.g. Decimal, Point etc.) are definitely not supposed to be used with Gorilla. - (U)Int* types are debatable. The paper only considers integers-stored-as-FP-values, a practical use case for which Gorilla works well. Standalone integers are not considered which makes them at least suspicious. 2. Achieve consistency with FPC, another specialized floating-point timeseries codec, which rejects non-float data. 3. On practical datasets, ZSTD is often "good enough" (*) so it should be okay to disincentive non-ZSTD codecs a little bit. If needed, Delta and DoubleDelta codecs are viable alternative for slowly changing (time-series-like) integer sequences. Since on-prem and hosted users may still have Gorilla-compressed non-float data, this combination is only deprecated for now. No warning or error will be emitted. Users are encouraged to migrate Gorilla-compressed non-float data to an alternative codec. It is planned to treat Gorilla-compressed non-float columns as "suspicious" six months after this commit (i.e. in v23.6). Even then, it will still be possible to set "allow_suspicious_codecs = true" and read and write Gorilla-compressed non-float data. () Sec. 4.1.2, "Gorilla restricts the value element in its tuple to a double floating point type.", https://doi.org/10.14778/2824032.2824078 (**) https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema	2023-01-20 17:31:16 +00:00
Igor Nikonov	7ed8fec94f	Revert "Remove redundant sorting"	2023-01-18 18:38:25 +01:00
Igor Nikonov	72066846cf	Merge pull request #43905 from ClickHouse/igor/remove_redundant_order_by Remove redundant sorting	2023-01-18 13:25:03 +01:00
Igor Nikonov	0cfa08df7a	Merge remote-tracking branch 'origin/master' into igor/remove_redundant_order_by	2023-01-17 16:28:17 +00:00
Alexander Tokmakov	df75c24f01	Revert "Disallow Gorilla codec on non-float columns"	2023-01-16 19:14:28 +03:00
Igor Nikonov	a34991cb65	Merge remote-tracking branch 'origin/master' into igor/remove_redundant_order_by	2023-01-16 12:14:02 +00:00
Robert Schulze	bd41c74ddf	Various test, code and docs fixups	2023-01-15 13:47:34 +00:00
Robert Schulze	7023d68536	Fix codecs_int_*.xml	2023-01-15 13:31:45 +00:00
Azat Khuzhin	925fd2c33a	tests/performance: do not use scientific notation in hashed_dictionary_sharded v2: fix few mistakes Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	345c422e28	Add ability to load hashed dictionaries using multiple threads Right now dictionaries (here I will talk about only HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED) can load data only in one thread, since they use one hash table that cannot be filled from multiple threads. And in case you have a very big dictionary (i.e. 10e9 elements), it can take a while to load them, especially for SPARSE_HASHED variants (and if you have such an amount of elements there, you are likely to use SPARSE_HASHED, since it requires less memory); in my env it takes ~4 hours, which is an enormous amount of time. So this patch adds support for shards in dictionaries; the number of shards determines how many hash tables the dictionary will use and, more importantly, how many threads it can use to load the data. And with 16 threads this works 2x faster, not perfect though, see the follow up patches in this series. v0: PARTITION BY v1: SHARDS 1 v2: SHARDS(1) v3: tried optimized mod - logical and, but it does not gain even 10% v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either v5: move SHARDS into layout parameters (unknown simply ignored) v6: tune params for perf tests (to avoid too long queries) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:25 +01:00
Nikolai Kochetov	30310df5be	Merge branch 'master' into logical-optimizer-lowcardinality	2023-01-12 18:51:05 +01:00
Nikita Taranov	006fdd32d4	Apply preallocation optimisation more carefully (#44455) * impl * add perf test * fix * review fixes	2023-01-09 13:30:48 +01:00
Igor Nikonov	2187bdd4cc	Disable diagnostics + cleanup + disable optimization in sort performance test since it removes sorting at all	2023-01-06 17:00:05 +00:00
Nikolay Degterinsky	dfe93b5d82	Merge pull request #42284 from Algunenano/perf_experiment Performance experiment	2022-12-30 03:14:22 +01:00
Alexey Milovidov	79f2e747e4	Remove QuestDB (flaky test)	2022-12-28 12:42:14 +01:00
Raúl Marín	fc1fa82a39	Merge branch 'master' into perf_experiment	2022-12-27 10:51:58 +01:00
Raúl Marín	45d27f461b	Merge branch 'master' into perf_experiment	2022-12-20 09:07:48 +00:00
Kruglov Pavel	37df9b9990	Merge branch 'master' into refactor-schema-inference	2022-12-16 19:13:15 +01:00
Azat Khuzhin	53bac4de71	tests/perf: fix dependency check during DROP CI [1]: DB::Exception: Cannot drop or rename default.hierarchical_dictionary_source_table, because some tables depend on it: default.hierarchical_hashed_array_dictionary, default.hierarchical_flat_dictionary, default.hierarchical_hashed_dictionary. Stack trace: [1]: https://s3.amazonaws.com/clickhouse-test-reports/44256/8e67a361a8f14abec6717af09ee997eb25151685/performance_comparison_[1/4]/report.html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-12-16 15:15:15 +01:00
Nikolay Degterinsky	9b6d31b95d	Merge branch 'master' into perf_experiment	2022-12-13 17:15:07 +01:00
avogar	7375a7d429	Refactor and improve schema inference for text formats	2022-12-07 21:19:27 +00:00
Guo Wangyang	b86686b3f8	Merge branch 'master' into logical-optimizer-lowcardinality	2022-12-07 13:33:25 +08:00
Maksim Kita	1cdc7ab62a	Merge pull request #43556 from Algunenano/interpretation_benchmark Add benchmark for query interpretation with JOINs	2022-12-01 22:53:02 +03:00
Vladimir C	53dc70a2d0	Merge pull request #38191 from BigRedEye/grace_hash_join Closes https://github.com/ClickHouse/ClickHouse/issues/11596	2022-11-30 17:01:00 +01:00
Nikolai Kochetov	51439e2c19	Merge pull request #43260 from ClickHouse/read-from-mt-in-io-pool Read from MergeTree in I/O pool	2022-11-29 12:09:03 +01:00
Nikolai Kochetov	d9fc13b230	Update async_remote_read.xml	2022-11-28 14:00:49 +01:00
Nikita Taranov	8ed5cfc265	Memory bound merging for distributed aggregation in order (#40879) * impl * fix style * make executeQueryWithParallelReplicas similar to executeQuery * impl for parallel replicas * cleaner code for remote sorting properties * update test * fix * handle when nodes of old versions participate * small fixes * temporary enable for testing * fix after merge * Revert "temporary enable for testing" This reverts commit `cce7f8884c`. * review fixes * add bc test * Update src/Core/Settings.h	2022-11-28 00:41:31 +01:00
Nikita Taranov	d1c258cf20	Add `xxh3` hash function (#43411) * impl * try fix * add docs * add test * rm unused file * excellent	2022-11-26 00:14:08 +01:00
Nikolai Kochetov	4632e7c644	Add max_streams_for_merge_tree_reading setting.	2022-11-25 17:14:22 +00:00
Nikolai Kochetov	dfd3976040	Update async_remote_read.xml	2022-11-25 14:53:45 +01:00
Igor Nikonov	236e7e3989	Small fixes	2022-11-25 12:04:12 +00:00
Igor Nikonov	20e67b7140	Merge remote-tracking branch 'origin/master' into HEAD	2022-11-24 13:10:37 +00:00
Nikolai Kochetov	e79c91947a	Update async_remote_read.xml	2022-11-24 12:35:02 +01:00
Raúl Marín	e910648c5d	Add benchmark for query interpretation with JOINs	2022-11-23 13:15:35 +01:00
Raúl Marín	ed0c174c0c	Merge remote-tracking branch 'blessed/master' into perf_experiment	2022-11-21 11:02:31 +01:00
Guo Wangyang	7d6ff90e34	Merge branch 'master' into logical-optimizer-lowcardinality	2022-11-20 09:56:50 +08:00
Nikolai Kochetov	5da1d893fd	Merge branch 'master' into read-from-mt-in-io-pool	2022-11-18 21:10:45 +01:00

1118 Commits