ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-11 08:52:06 +00:00

Author	SHA1	Message	Date
Robert Schulze	b3b0716b32	Merge pull request #37544 from ClickHouse/cached_patterns Cache compiled regexps when evaluating non-const needles	2022-06-01 19:55:25 +02:00
Robert Schulze	81318e07d6	Try to fix performance test results	2022-06-01 11:53:37 +02:00
Anton Popov	20e319d67a	Merge pull request #37666 from CurtizJ/optimize-coalesce Optimize function `COALESCE` with two arguments	2022-05-31 23:48:13 +02:00
Anton Popov	30f8eb800a	optimize function coalesce with two arguments	2022-05-30 22:29:35 +00:00
Robert Schulze	ad12adc31c	Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.
Question 1:
-----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;
The short answer is: no. Runtime is (unsurprisingly) dominated by pattern evaluation and the other work going on in the query, but definitely not by pattern compilation. For space reasons, I omit details of the local experiments.
(Side note: the current caching scheme is unbounded in size, which poses a DoS risk (think of multi-tenancy). This risk is more pronounced when unbounded caching is used with non-const patterns ..., see next question.)
Question 2:
-----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;
I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash map)
3. LRU cache (= threadsafe global hash map + LRU queue)
4. lightweight local cache 1 (= non-threadsafe local hash map with a collision list which grows to a certain size (here: 10 elements) and afterwards never changes)
5. lightweight local cache 2 (= non-threadsafe local hash map without a collision list, in which a collision replaces the stored element; idea by Alexey)
... using a haystack of 2 million strings and
A) 2 million distinct simple patterns
B) 10 simple patterns
C) 2 million distinct complex patterns
D) 10 complex patterns
For A) and C), caching does not help, but these queries still allow judging the static overhead of caching on query runtimes. B) and D) are extreme but common cases in practice. They include queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' : '%pattern2%')". Caching should help significantly.
Because LIKE patterns are internally translated to re2 expressions, I show only measurements for MATCH queries. Results in seconds, averaged over multiple measurements:
Strategy   A)                   B)                   C)                    D)
1.         2.12                 1.68                 9.75                  9.45
2.         2.17                 1.73                 9.78                  9.47
3.         9.8                  0.63                 31.8                  0.98
4.         2.14                 0.29                 9.82                  0.41
5.         2.12 / 2.15 / 2.26   1.51 / 0.43 / 0.30   9.97 / 9.88 / 10.13   5.70 / 0.42 / 0.43
(for 5.: 10/100/1000 buckets, resp. 10/1/0.1% collision rate)
Evaluation:
1. This is the baseline. I was surprised that complex patterns (C, D) slow down the queries so badly compared to simple patterns (A, B). The runtime includes evaluation costs, but since caching only helps with compilation, and looking at 4.D and 5.D, compilation makes up over 90% of the runtime!
2. No speedup compared to 1, probably due to locking overhead. The cache is unbounded, and in experiments with data sets > 2 million rows, 2. is the only scheme to throw OOM exceptions, which is not acceptable.
3. Unique patterns (A and C) lead to thrashing of the LRU cache and very bad runtimes due to LRU queue maintenance and locking. However, it works pretty well with few distinct patterns (B and D).
4. This scheme is tailored to queries B and D, where it performs pretty well. More importantly, the caching is lightweight enough not to degrade performance on datasets A and C.
5. After some tuning of the hash map size, 100 buckets seem optimal; with 10 distinct patterns this puts 5. in the same ballpark as 4. Performance also does not degrade on A and C compared to the baseline. Unlike 4., this scheme behaves LRU-like and can adjust to changing pattern distributions.
As a conclusion, this commit implements two things:
1. Based on Q1, pattern search with a const needle no longer uses caching. This applies to LIKE and MATCH + a few (exotic) other SQL functions. The code for the unbounded caching was removed.
2. Based on Q2, pattern search with non-const needles now uses method 5.	2022-05-30 20:00:35 +02:00
Alexey Milovidov	9e3242f186	Merge pull request #37617 from CurtizJ/aggregation-sparse-columns Better performance with sparse columns in aggregate functions	2022-05-29 09:36:07 +03:00
Anton Popov	c39d95e2e6	add perf test	2022-05-28 12:56:38 +00:00
Alexey Milovidov	86afa3a245	Merge pull request #37502 from ClickHouse/array_norm_dist_fixes Renamed arrayXXNorm/arrayXXDistance functions to XXNorm/XXDistance and fixed some overflow cases	2022-05-27 00:56:29 +03:00
koloshmet	7e69779575	added fpc codec to float perftest	2022-05-26 22:32:56 +03:00
Maksim Kita	3a92e61827	Merge pull request #37148 from kitaisreal/dictionary-get-descendants-performance-improvement Dictionary getDescendants performance improvement	2022-05-26 12:29:17 +02:00
Maksim Kita	bee3c30f66	Merge pull request #37524 from kitaisreal/geo-distance-functions-improve-performance Geo distance functions improve performance	2022-05-25 22:40:40 +02:00
Alexander Gololobov	168b47d0ad	Use same norm and distance function names for tuples and arrays	2022-05-25 22:39:59 +02:00
Maksim Kita	45da28ecae	Improve performance of geo distance functions	2022-05-25 14:22:22 +02:00
Maksim Kita	3c0c322d7c	Merge pull request #37480 from kitaisreal/dynamic-dispatch-infrastructure-improvements Dynamic dispatch infrastructure style fixes	2022-05-24 18:13:53 +02:00
Maksim Kita	e6e4b2826d	Dynamic dispatch infrastructure style fixes	2022-05-24 14:25:29 +02:00
Kruglov Pavel	6c9a524f6b	Merge pull request #37192 from Avogar/formats-with-names Improve performance and memory usage for select of subset of columns for some formats	2022-05-24 13:28:14 +02:00
avogar	3651ef93fe	Fix performance test	2022-05-23 17:42:13 +00:00
Alexander Gololobov	d0f5551c9f	Parameterized with norm kind	2022-05-23 18:27:41 +02:00
Alexander Gololobov	2658a9eeeb	Test with max_threads=1	2022-05-23 18:06:07 +02:00
Maksim Kita	94772f9cfc	Added performance tests	2022-05-23 14:43:13 +02:00
Alexander Gololobov	70cc27ecac	Test with different element types	2022-05-23 14:08:15 +02:00
Alexander Gololobov	7897a5bac7	Perf test for Norm and Distance functions for arrays and tuples	2022-05-23 10:18:24 +02:00
avogar	566d1b15fd	Merge branch 'master' of github.com:ClickHouse/ClickHouse into formats-with-names	2022-05-20 13:54:52 +00:00
Maksim Kita	beb34e7062	Improve performance of unary arithmetic functions	2022-05-17 13:53:20 +02:00
avogar	cef13c2c02	Allow to skip unknown columns in Native format	2022-05-13 14:27:15 +00:00
avogar	b17fec659a	Improve performance and memory usage for select of subset of columns for some formats	2022-05-13 13:51:28 +00:00
Maksim Kita	437d70d4da	Fixed tests	2022-05-11 21:59:51 +02:00
Maksim Kita	837f2e8b9c	Update performance test	2022-05-11 21:59:51 +02:00
Maksim Kita	ea8ce3140a	Fixed tests	2022-05-11 21:59:51 +02:00
Maksim Kita	d85d72e5ad	Added performance tests	2022-05-11 21:59:51 +02:00
Alexey Milovidov	5a750d3305	Merge branch 'master' into revert-group-array-sorted	2022-05-05 00:51:01 +02:00
Alexey Milovidov	c95591f294	Merge pull request #36841 from ClickHouse/fix-performance-test-5 Remove "preconditions" from performance tests (overengineering, unneeded feature)	2022-05-02 12:22:56 +03:00
Alexey Milovidov	7cc64d40a6	Remove tags as well	2022-05-02 02:35:44 +02:00
Alexey Milovidov	88826e2da5	Remove "preconditions" from performance tests (overengineering, unneeded feature)	2022-05-02 02:33:22 +02:00
Alexey Milovidov	2691261aa2	Fix performance test	2022-05-02 02:21:19 +02:00
Alexey Milovidov	d6c0de0d40	Revert "Merge pull request #34055 from palegre-tiny/groupSortedArray" This reverts commit `f055d7b692`, reversing changes made to `4ec3c35e14`.	2022-04-30 12:29:23 +02:00
Alexey Milovidov	03dc4f6b83	Merge pull request #36779 from ClickHouse/fix-performance-test-2 Fix performance test	2022-04-30 08:56:44 +03:00
Alexey Milovidov	6c75b63953	Merge pull request #35914 from DevTeamBK/FIPS_compliance ClickHouse's boringssl module updated to the official FIPS-compliant version.	2022-04-29 21:08:51 +03:00
Alexey Milovidov	76d660d6df	Fix performance test	2022-04-29 10:27:44 +02:00
Alexey Milovidov	df45c9503d	Trim down some tests	2022-04-29 04:33:12 +02:00
Alexey Milovidov	31215f874c	Fix slow performance test	2022-04-29 04:29:06 +02:00
Alexey Milovidov	bf100e0a4c	Remove "partial merge join" performance test, because we are not interested in the results	2022-04-29 04:13:30 +02:00
Meena Renganathan	bdaf5391cf	Merge branch 'master' of https://github.com/DevTeamBK/ClickHouse into FIPS_compliance	2022-04-28 06:15:46 -07:00
Meena Renganathan	98543a9a3f	Removed the tests aes-192-cfb128 and aes-256-cfb128 since the latest boringssl module doesn't support them	2022-04-27 07:48:43 -07:00
Alexander Gololobov	3c000b098a	Merge pull request #36638 from nickitat/fix_sorting_step Fix SortingStep::updateOutputStream()	2022-04-26 15:49:49 +02:00
mergify[bot]	d2ac9b2223	Merge branch 'master' into to_start_of_five_minutes	2022-04-25 17:36:38 +00:00
Nikita Taranov	5dc9478bac	fix SortingStep::updateOutputStream()	2022-04-25 17:29:14 +00:00
Mikhail f. Shiryaev	f53040b95e	Merge pull request #36559 from ClickHouse/performance-rebalance Use just index to split performance tests by group	2022-04-25 14:15:23 +02:00
Mikhail f. Shiryaev	f3aaff773a	Disable broken performance test	2022-04-25 11:26:05 +02:00
mergify[bot]	705d5af3a0	Merge branch 'master' into to_start_of_five_minutes	2022-04-24 22:24:24 +00:00
Mikhail f. Shiryaev	2aaaf41ee4	Speed up file_table_function 20 times	2022-04-23 14:33:03 +02:00
Memo	39aadf0975	replaced toStartOfFiveMinute with toStartOfFiveMinutes	2022-04-22 10:49:59 +08:00
Maksim Kita	04089be144	Fix hash_table_sizes_stats performance test	2022-04-12 17:46:16 +02:00
Maksim Kita	3d36698f56	Fix group_by_sundy_li performance test	2022-04-12 17:06:38 +02:00
Maksim Kita	9b332c1e31	Fix early_constant_folding performance test	2022-04-12 17:06:21 +02:00
Maksim Kita	8cf67ed4c0	Fix constant_column_search performance tests	2022-04-12 15:22:14 +02:00
Maksim Kita	7803ecaee5	Fix performance tests	2022-04-12 15:22:14 +02:00
Kruglov Pavel	73adbb4c15	Merge pull request #35986 from amosbird/better-scalar1 Fix performance regression of scalar query	2022-04-07 14:07:59 +02:00
Amos Bird	df06f9f974	Fix performance regression of scalar query	2022-04-06 17:50:22 +08:00
Nikolai Kochetov	4479b68980	Merge pull request #35623 from nickitat/function_calculation_after_sorting_and_limit Functions calculation after sorting	2022-04-05 12:09:56 +02:00
Maksim Kita	b160ffd726	Merge pull request #35723 from ClickHouse/array-has-all-sse-avx2-optimizations Merging #27653	2022-04-05 11:09:14 +02:00
Nickita Taranov	4c51329ad6	stash	2022-04-04 14:33:57 +02:00
Maksim Kita	3c472a7897	Simplified hasAll performance tests	2022-04-04 13:34:40 +02:00
Nikolay Degterinsky	f055d7b692	Merge pull request #34055 from palegre-tiny/groupSortedArray Add groupSortedArray() function	2022-03-31 01:20:15 +03:00
Nikita Taranov	30f2a942c5	Predict size of hash table for GROUP BY (#33439)
* use AggregationMethod ctor with reserve
* add new settings
* add HashTablesStatistics
* support queries with limit
* support distributed and with external aggregation
* add new profile events
* add some tests
* add perf test
* export cache stats through AsynchronousMetrics
* rm redundant trace
* fix style
* fix 02122_parallel_formatting test
* review fixes
* fix 02122_parallel_formatting test
* apply also to two-level HTs
* try simpler strategy
* increase max_size_to_preallocate_for_aggregation for experiment
* fixes
* Revert "increase max_size_to_preallocate_for_aggregation for experiment"
  This reverts commit `6cf6f75704`.
* fix test
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2022-03-30 22:47:51 +02:00
Maksim Kita	2742b88e6c	Merge pull request #27653 from ContentSquare/hasAllAny_SIMD Implement HasAll specialization for SSE and AVX2	2022-03-29 16:35:59 +02:00
mergify[bot]	bf5a9dcb7a	Merge branch 'master' into groupSortedArray	2022-03-28 20:57:07 +00:00
Alexey Milovidov	d54138425f	Rename yandexConsistentHash to kostikConsistentHash	2022-03-24 02:18:25 +01:00
mergify[bot]	5df84df596	Merge branch 'master' into groupSortedArray	2022-03-21 13:35:06 +00:00
Raúl Marín	90fd425117	Add ASOF performance test with random data	2022-03-17 19:48:25 +01:00
Raúl Marín	e2cec4e65b	Merge remote-tracking branch 'blessed/master' into asof_ftw	2022-03-17 16:33:29 +01:00
Maksim Kita	b202130841	Fixed performance tests	2022-03-15 15:43:39 +00:00
Maksim Kita	08bb39d869	Fixed performance tests	2022-03-15 15:43:39 +00:00
Maksim Kita	d49df02074	Fixed performance tests	2022-03-15 15:43:39 +00:00
Maksim Kita	98df85d2b7	Performance tests fix	2022-03-15 15:43:39 +00:00
Maksim Kita	0dd807d19d	Merge pull request #34750 from kitaisreal/merge-tree-improve-insert-performance MergeTree improve insert performance	2022-03-13 13:39:18 +01:00
Alexey Milovidov	0995c63ea1	Adjust timezone in performance tests	2022-03-11 23:49:13 +01:00
Maksim Kita	f1d2f2a9e1	Updated tests	2022-03-11 21:16:25 +00:00
Maksim Kita	d12618cd2e	Updated performance tests	2022-03-10 21:45:31 +00:00
Maksim Kita	765cd09d06	MergeTree improve insert performance	2022-03-10 21:45:31 +00:00
mergify[bot]	93b13c0232	Merge branch 'master' into asof_ftw	2022-03-02 13:20:50 +00:00
mergify[bot]	cd6f1d8fa4	Merge branch 'master' into groupSortedArray	2022-02-25 11:45:48 +00:00
Raúl Marín	e0c6014ecd	Mention scipy dep in performance bench README	2022-02-20 02:43:28 +01:00
Raúl Marín	2627c8d437	Add a performance test using ASOF	2022-02-18 17:37:24 +01:00
Maksim Kita	80b0efb367	Performance tests fix H3	2022-02-18 15:57:54 +00:00
Pablo Alegre	9466aafb3c	fixup! Add groupSortedArray() function	2022-02-15 14:48:20 +01:00
Anton Popov	5c316ffabe	support filtering by sparse columns without conversion to full	2022-02-15 14:30:54 +03:00
mergify[bot]	aab54f4c83	Merge branch 'master' into groupSortedArray	2022-02-14 12:47:48 +00:00
Maksim Kita	e2c8ba9ab2	Added performance test	2022-02-12 16:05:35 +00:00
Pablo Alegre	1e4b504ae2	fixup! Add groupSortedArray() function	2022-02-10 16:49:28 +01:00
avogar	bfa96463ca	Fix possible error 'file_size: Operation not supported'	2022-02-10 09:23:27 +03:00
Maksim Kita	613c9fa3c2	Merge pull request #34339 from kitaisreal/map-populate-series-refactoring Function mapPopulateSeries added additional performance test	2022-02-06 01:19:32 +01:00
Maksim Kita	35235d2d7f	Added additional performance test	2022-02-05 16:11:36 +00:00
Maksim Kita	eff16baaf3	Merge pull request #34318 from kitaisreal/map-populate-series-refactoring Function mapPopulateSeries refactoring	2022-02-05 12:51:02 +01:00
Maksim Kita	6e789f98ea	Added performance tests	2022-02-04 14:58:55 +00:00
Danila Kutenin	c90b1f7794	Optimize quantilesExact{Low,High} to use nth_element instead of sort	2022-02-03 12:24:33 +00:00
avogar	6229ec530d	Fix some perf tests	2022-01-31 21:07:20 +03:00
youenn lebras	c0864e6cd9	Update branch - Merge master	2022-01-31 10:22:25 +01:00
Maksim Kita	f3453024ff	Merge pull request #34060 from amosbird/optimizetupleorderby Make ORDER BY tuple almost as fast as ORDER BY columns	2022-01-29 15:58:09 +01:00
Amos Bird	faee95b897	Make ORDER BY tuple almost as fast as ORDER BY columns
We have special optimizations for multiple-column ORDER BY: https://github.com/ClickHouse/ClickHouse/pull/10831. It is beneficial to apply them to tuple columns as well.
Before:
select * from numbers(300000000) order by (1 - number , number + 1 , number) limit 10;
2.613 sec.
After:
select * from numbers(300000000) order by (1 - number , number + 1 , number) limit 10;
0.755 sec.
No tuple:
select * from numbers(300000000) order by 1 - number , number + 1 , number limit 10;
0.755 sec.	2022-01-27 21:42:08 +08:00
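Commit `30f8eb800a` optimizes `COALESCE` with two arguments, but the log does not say how. As a hedged illustration of what the two-argument case boils down to, the sketch below evaluates `COALESCE(a, b)` for a nullable column `a` and a non-nullable column `b` in a single pass with one select per row. The column layout (`NullableColumn`, a values vector plus a byte null map) and the name `coalesce2` are hypothetical and not taken from ClickHouse's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical nullable column: values plus a null map (1 = NULL).
struct NullableColumn {
    std::vector<std::int64_t> values;
    std::vector<std::uint8_t> null_map;
};

// COALESCE(a, b) where b is never NULL: take a[i] unless it is NULL, else b[i].
// One pass and one select per row, instead of a generic loop over N arguments.
std::vector<std::int64_t> coalesce2(const NullableColumn & a, const std::vector<std::int64_t> & b) {
    assert(a.values.size() == a.null_map.size() && a.values.size() == b.size());
    std::vector<std::int64_t> result(b.size());
    for (std::size_t i = 0; i < b.size(); ++i)
        result[i] = a.null_map[i] ? b[i] : a.values[i];
    return result;
}
```

For example, `a = [1, NULL, 3]` and `b = [10, 20, 30]` yield `[1, 20, 3]`.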
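Commit `ad12adc31c` settles on "strategy 5" for non-const needles: a non-threadsafe local hash map without a collision list, where a collision replaces the stored element, with about 100 buckets. A minimal sketch of such a direct-mapped cache follows. It assumes `std::regex` as a stand-in for re2, and the names `LocalRegexCache` and `matchRows` are hypothetical; this is not the commit's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Direct-mapped cache: each pattern hashes to exactly one bucket and a
// collision overwrites that bucket (no collision list). Not thread-safe;
// intended to live for a single function call. std::regex stands in for re2.
template <std::size_t NumBuckets = 100>
class LocalRegexCache {
public:
    const std::regex & get(const std::string & pattern) {
        Bucket & bucket = buckets[std::hash<std::string>{}(pattern) % NumBuckets];
        if (!bucket.regex || bucket.pattern != pattern) {
            // Miss or collision: do the throwing work first so a failure
            // leaves the bucket untouched, then replace the entry.
            std::string key = pattern;
            auto compiled = std::make_unique<std::regex>(pattern);
            bucket.regex = std::move(compiled);
            bucket.pattern = std::move(key);
            ++compilations;
        }
        return *bucket.regex;
    }

    std::size_t compilations = 0;  // number of patterns compiled so far

private:
    struct Bucket {
        std::string pattern;
        std::unique_ptr<std::regex> regex;  // null while the bucket is empty
    };
    std::array<Bucket, NumBuckets> buckets;
};

// Hypothetical row-wise MATCH(haystack, needle) with a non-const needle column.
// A fresh cache per call: repeated needles compile once unless they collide.
std::vector<std::uint8_t> matchRows(const std::vector<std::string> & haystacks,
                                    const std::vector<std::string> & needles) {
    assert(haystacks.size() == needles.size());
    LocalRegexCache<> cache;
    std::vector<std::uint8_t> result(haystacks.size());
    for (std::size_t i = 0; i < haystacks.size(); ++i)
        result[i] = std::regex_search(haystacks[i], cache.get(needles[i])) ? 1 : 0;
    return result;
}
```

The design trades hit rate for cheap, lock-free lookups: with few distinct needles (cases B and D) nearly every row hits, and with unique needles (A and C) the extra cost per row is one hash and one string comparison.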
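Commits `86afa3a245` and `168b47d0ad` rename the array norm/distance functions to `XXNorm`/`XXDistance` and mention fixing overflow cases. For reference, the sketch below shows the L1, L2 and L-infinity norms and a distance defined as the norm of the element-wise difference. Accumulating in `double` is shown as one plausible way to avoid overflow when squaring large integers; the log does not say which overflows were fixed, and the C++ names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// All norms accumulate in double, so squaring large integers cannot overflow.
template <typename T>
double l1Norm(const std::vector<T> & v) {
    double sum = 0;
    for (T x : v) sum += std::abs(static_cast<double>(x));
    return sum;
}

template <typename T>
double l2Norm(const std::vector<T> & v) {
    double sum = 0;
    for (T x : v) {
        double d = static_cast<double>(x);
        sum += d * d;
    }
    return std::sqrt(sum);
}

template <typename T>
double linfNorm(const std::vector<T> & v) {
    double m = 0;
    for (T x : v) m = std::max(m, std::abs(static_cast<double>(x)));
    return m;
}

// Distance = chosen norm of the element-wise difference (equal sizes required).
template <typename T, typename Norm>
double normDistance(const std::vector<T> & a, const std::vector<T> & b, Norm norm) {
    assert(a.size() == b.size());
    std::vector<double> diff(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        diff[i] = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    return norm(diff);
}
```

For `v = {3, 4}` the three norms are 7, 5 and 4, and `normDistance(v, {0, 0}, l2Norm<double>)` is 5.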
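Commit `39aadf0975` settles on the name `toStartOfFiveMinutes`. The arithmetic behind rounding a timestamp down to its five-minute interval is sketched below for Unix seconds in UTC only; real time-zone handling is deliberately left out, and `startOfFiveMinutes` is a hypothetical helper, not the ClickHouse implementation.

```cpp
#include <cassert>
#include <cstdint>

// Round a Unix timestamp (seconds, UTC) down to the start of its
// five-minute interval. Time zones are intentionally ignored.
std::int64_t startOfFiveMinutes(std::int64_t t) {
    constexpr std::int64_t interval = 5 * 60;
    std::int64_t r = t % interval;
    if (r < 0) r += interval;  // floor rather than truncate for pre-1970 times
    return t - r;
}
```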
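Commit `30f2a942c5` predicts the GROUP BY hash table size from statistics collected on earlier runs (`HashTablesStatistics`) and preallocates with `reserve`, capped by `max_size_to_preallocate_for_aggregation`. The sketch below shows that idea under stated assumptions: statistics are keyed by a hypothetical query-shape string, the last observed size is the prediction, and `HashTableSizeStats`/`countByKey` are illustrative names, not ClickHouse classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Remembers how large the aggregation hash table became for a query shape.
class HashTableSizeStats {
public:
    explicit HashTableSizeStats(std::size_t max_size_to_preallocate)
        : max_prealloc(max_size_to_preallocate) {}

    // Size hint for the next run; 0 if this query shape was never seen.
    std::size_t hint(const std::string & query_key) const {
        auto it = sizes.find(query_key);
        if (it == sizes.end()) return 0;
        return std::min(it->second, max_prealloc);  // cap the preallocation
    }

    void update(const std::string & query_key, std::size_t observed_size) {
        sizes[query_key] = observed_size;
    }

private:
    std::size_t max_prealloc;
    std::unordered_map<std::string, std::size_t> sizes;
};

// Count rows per key, reserving the predicted size up front to avoid rehashing.
std::unordered_map<int, std::size_t> countByKey(const std::vector<int> & keys,
                                                HashTableSizeStats & stats,
                                                const std::string & query_key) {
    std::unordered_map<int, std::size_t> table;
    table.reserve(stats.hint(query_key));
    for (int k : keys) ++table[k];
    stats.update(query_key, table.size());
    return table;
}
```

The first run of a query shape gets no hint; subsequent runs start with a table already sized for the expected number of groups.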
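Commits `2627c8d437` and `90fd425117` add performance tests for ASOF joins. As background, an ASOF match for a left row `(key, t)` is the right row with the same key and the greatest time not exceeding `t`. The sketch below models only that `left.t >= right.t` form with a sorted vector per key and `std::upper_bound`; `AsofIndex` is a hypothetical name and this is not how ClickHouse stores ASOF data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Right-side rows grouped by key, each group sorted by time after build().
class AsofIndex {
public:
    void add(const std::string & key, std::int64_t time, std::int64_t value) {
        rows[key].emplace_back(time, value);
    }

    // Must be called after all add() calls and before lookup().
    void build() {
        for (auto & entry : rows) std::sort(entry.second.begin(), entry.second.end());
    }

    // Value of the right row with the same key and the greatest time <= `time`.
    std::optional<std::int64_t> lookup(const std::string & key, std::int64_t time) const {
        auto it = rows.find(key);
        if (it == rows.end()) return std::nullopt;
        const auto & v = it->second;
        // First row with row.time > time; the match, if any, precedes it.
        auto pos = std::upper_bound(v.begin(), v.end(), time,
            [](std::int64_t t, const std::pair<std::int64_t, std::int64_t> & row) { return t < row.first; });
        if (pos == v.begin()) return std::nullopt;
        return std::prev(pos)->second;
    }

private:
    std::map<std::string, std::vector<std::pair<std::int64_t, std::int64_t>>> rows;
};
```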
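Commit `5c316ffabe` filters sparse columns without converting them to full columns. A minimal sketch of that idea on a simplified model follows: only non-default values are stored together with their sorted row numbers, and every other row holds the default 0. This layout and the name `filterSparse` are assumptions for illustration; ClickHouse's own sparse column format may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified sparse column: non-default values plus their row numbers
// (ascending); all other rows implicitly hold the default value 0.
struct SparseColumn {
    std::vector<std::int64_t> values;
    std::vector<std::size_t> offsets;
    std::size_t size = 0;  // total rows, including defaults
};

// Apply a row filter (1 = keep) to the sparse form directly: kept rows are
// counted, and a stored value is copied (with a renumbered offset) only if
// its row survives. No full-size value column is ever materialized.
SparseColumn filterSparse(const SparseColumn & col, const std::vector<std::uint8_t> & filter) {
    assert(filter.size() == col.size);
    SparseColumn result;
    std::size_t next = 0;  // index of the next stored (non-default) value
    for (std::size_t row = 0; row < col.size; ++row) {
        bool is_stored = next < col.offsets.size() && col.offsets[next] == row;
        if (filter[row]) {
            if (is_stored) {
                result.values.push_back(col.values[next]);
                result.offsets.push_back(result.size);
            }
            ++result.size;
        }
        if (is_stored) ++next;
    }
    return result;
}
```

For the full column `[0, 7, 0, 0, 9, 0]` stored as values `{7, 9}` at offsets `{1, 4}`, the filter `[1, 1, 0, 1, 0, 1]` keeps `[0, 7, 0, 0]`, i.e. values `{7}` at offset `{1}` with size 4.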
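Commit `c90b1f7794` switches `quantilesExact{Low,High}` from a full sort to `nth_element`. The sketch below shows the idea for the median level only (lower and upper median, as in Python's `statistics.median_low`/`median_high`): `std::nth_element` places the k-th smallest element at position k in linear average time, instead of the O(n log n) of a full sort. The helper names are hypothetical, and other quantile levels are not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Lower median: for an even count, the smaller of the two middle elements.
template <typename T>
T medianLow(std::vector<T> data) {
    assert(!data.empty());
    std::size_t n = data.size();
    std::size_t k = (n % 2 == 1) ? n / 2 : n / 2 - 1;
    std::nth_element(data.begin(), data.begin() + k, data.end());
    return data[k];
}

// Upper median: for an even count, the larger of the two middle elements.
template <typename T>
T medianHigh(std::vector<T> data) {
    assert(!data.empty());
    std::size_t k = data.size() / 2;
    std::nth_element(data.begin(), data.begin() + k, data.end());
    return data[k];
}
```

For `{5, 1, 4, 2}` the lower median is 2 and the upper median is 4; for an odd count both return the middle element.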
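Commit `faee95b897` lets `ORDER BY` a tuple benefit from the multiple-column `ORDER BY` optimizations. The reason this is possible: ordering by a tuple `(e1, e2, e3)` is the same lexicographic order as `ORDER BY e1, e2, e3`, so a column-wise tuple can be unpacked into its element columns and fed to the same multi-column path. The sketch below is a hypothetical illustration of that equivalence (with `std::partial_sort` for `LIMIT`), not the commit's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using Column = std::vector<std::int64_t>;
using TupleColumn = std::vector<Column>;  // one Column per tuple element

// Row numbers of the first `limit` rows ordered lexicographically by `keys`
// (compare keys[0]; on ties keys[1]; and so on).
std::vector<std::size_t> topRowsByColumns(const std::vector<const Column *> & keys,
                                          std::size_t rows, std::size_t limit) {
    std::vector<std::size_t> perm(rows);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    auto less = [&](std::size_t a, std::size_t b) {
        for (const Column * col : keys)
            if ((*col)[a] != (*col)[b]) return (*col)[a] < (*col)[b];
        return false;
    };
    limit = std::min(limit, rows);
    std::partial_sort(perm.begin(), perm.begin() + limit, perm.end(), less);
    perm.resize(limit);
    return perm;
}

// ORDER BY (t.1, t.2, ...) equals ORDER BY t.1, t.2, ...: unpack and reuse.
std::vector<std::size_t> topRowsByTuple(const TupleColumn & tuple, std::size_t rows, std::size_t limit) {
    std::vector<const Column *> keys;
    for (const Column & element : tuple) keys.push_back(&element);
    return topRowsByColumns(keys, rows, limit);
}
```

Building the tuple `(1 - number, number + 1, number)` for `number` in 0..99 and taking the top 3 returns rows 99, 98 and 97, the same as ordering by the three expressions directly.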


877 Commits