Nikita Taranov
ee31be4286
impl
2022-09-16 15:41:15 +02:00
Igor Nikonov
eeecaf7a31
Merge remote-tracking branch 'origin/master' into distinct_in_order_wo_order_by
2022-09-16 10:30:52 +00:00
Raúl Marín
c3ff66bd9d
Implement batch processing for aggregate functions with multiple nullable arguments ( #41058 )
...
* Implement batch processing for aggregate functions with multiple nullable arguments
* Fix broken perf test
* Improve filter handling in addBatchSinglePlace with nullable arguments
* Fix detecting the Null filter usage
2022-09-15 23:51:38 +02:00
Raúl Marín
6dac509739
Speed up reading uniqState ( #41089 )
...
* Speed up reading UniquesHashSet
* Improve uniq serialization tests
2022-09-15 23:41:15 +02:00
Igor Nikonov
8a4806e8c0
Fix test
...
- remove performance queries which can be unstable
2022-09-15 10:53:42 +00:00
BoloniniD
e8bcbcd016
Merge branch 'master' into BLAKE3
2022-09-09 11:48:31 +03:00
vdimir
6d4b6c452a
Merge branch 'master' into grace_hash_join
2022-09-07 08:00:14 +00:00
Nikita Taranov
7c4f42d014
Skip empty literals in lz4 decompression ( #40142 )
2022-09-06 13:58:26 +02:00
Alexey Milovidov
193cd1b3b2
Merge pull request #39138 from nickitat/control_block_size_in_aggregator
...
Control block size in aggregator
2022-09-04 04:51:00 +03:00
vdimir
e21763e759
remove new setting from join_set_filter.xml
2022-08-29 09:49:13 +00:00
vdimir
470dcff89c
Add tests/performance/join_set_filter.xml
2022-08-29 09:49:11 +00:00
Alexey Milovidov
ab91c99495
Merge branch 'master' into control_block_size_in_aggregator
2022-08-20 21:28:27 +03:00
Kruglov Pavel
b67cb9e378
Merge pull request #40173 from Avogar/arrow-dict
...
Improve and fix dictionaries in Arrow format
2022-08-18 20:54:55 +02:00
Igor Nikonov
46ed4f6cdf
Merge pull request #38719 from ClickHouse/skipping_sorting_step
...
SortingStep: deduce way to sort based on input stream sort description
2022-08-17 12:58:11 +02:00
Nikita Taranov
63bc894a42
more parallelism
2022-08-16 18:56:22 +02:00
Alexander Tokmakov
6fd4d2cfb3
Revert "tests/performance: cover sparse_hashed dictionary ( #40027 )"
...
This reverts commit 6a30c23252.
2022-08-16 15:32:50 +03:00
avogar
c8571f82f9
Fix performance test
2022-08-15 11:41:03 +00:00
Kruglov Pavel
ac85676d84
Update arrow_format.xml
2022-08-15 00:10:08 +02:00
avogar
398576e9c9
Improve and fix dictionaries in Arrow format
2022-08-12 18:56:21 +00:00
Igor Nikonov
75f6fcfa70
Merge remote-tracking branch 'origin/master' into skipping_sorting_step
2022-08-11 12:35:55 +00:00
Azat Khuzhin
6a30c23252
tests/performance: cover sparse_hashed dictionary ( #40027 )
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-10 21:48:00 +02:00
BoloniniD
b161773f71
Merge branch 'master' of github.com:ClickHouse/ClickHouse into BLAKE3
2022-08-02 20:25:25 +03:00
Igor Nikonov
7f0adb5eb0
Merge remote-tracking branch 'origin/master' into skipping_sorting_step
2022-07-31 07:07:36 +00:00
Alexey Milovidov
c9e6850306
Merge pull request #39325 from azat/perf-parallel_mv-fix
...
tests/performance: improve parallel_mv test
2022-07-31 02:51:38 +03:00
Alexey Milovidov
36e6500e54
Merge branch 'master' into BLAKE3
2022-07-30 23:14:05 +03:00
Anton Popov
1547c010b9
Merge pull request #39432 from ClickHouse/distinct_sorted_chunk_perf_impr
...
DISTINCT in order: perf improvement
2022-07-27 14:17:58 +02:00
Alexander Gololobov
460950ecdc
Merge branch 'master' into feature/sql-standard-delete
2022-07-24 21:27:22 +02:00
Alexander Gololobov
594195451e
Cleanups
2022-07-24 12:21:18 +02:00
Igor Nikonov
739ff34c6e
Add some tests, still not sure about optimize_memory_usage option
2022-07-22 22:48:26 +00:00
Igor Nikonov
7db5d54820
Adapt to the case when not all columns in distinct are part of sorting description
2022-07-21 21:04:58 +00:00
Igor Nikonov
122a1123b2
- disable the worst case for distinct in order in perf test for now
...
+ functional test for query with the worst performance
+ debug logging in DistinctStep
2022-07-21 15:03:19 +00:00
Igor Nikonov
ac116324b2
rename and fix perf test
2022-07-19 21:21:39 +00:00
Igor Nikonov
c74600d282
Merge branch 'master' into skipping_sorting_step
2022-07-19 18:59:36 +02:00
Igor Nikonov
1fe83cc8d8
optimize_sorting_for_input_stream setting and perf tests
2022-07-19 16:58:15 +00:00
Alexander Gololobov
f31788ed2a
Perf test for read after deleting many rows
2022-07-18 20:08:09 +02:00
Azat Khuzhin
cf1a5baa23
tests/performance: improve parallel_mv test
...
Right now it is possible for parallel_mv to fail [1] due to exceeding 15
seconds limit.
[1]: https://s3.amazonaws.com/clickhouse-test-reports/39183/ad6b50b087086fef8aa6f0f72b3a42f014266763/performance_comparison_aarch64_[4/4]/report.html
Let's try to really disable MERGES and see.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-18 16:23:17 +03:00
Igor Nikonov
828f3711d2
Update perf tests (to test perf test)
2022-07-17 20:40:58 +00:00
Igor Nikonov
6f224b026a
Perf test. Code polishing
2022-07-15 21:54:57 +00:00
Kruglov Pavel
3436fcfda6
Update tests/performance/low_cardinality_argument.xml
...
Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>
2022-07-15 18:24:44 +02:00
Kruglov Pavel
9c443038c7
Update low_cardinality_argument.xml
2022-07-14 18:28:25 +02:00
Kruglov Pavel
1f7fe10313
Update low_cardinality_argument.xml
2022-07-14 12:54:14 +02:00
avogar
390b1ac2f7
Improve isNullable/isConstant/isNull/isNotNull performance for LowCardinality argument
2022-07-13 17:56:34 +00:00
Igor Nikonov
16d2319a8d
SortingStep: type of sorting is deduced based on input stream sorting description during transformation
...
+ perf test
2022-07-11 20:59:38 +00:00
Igor Nikonov
1f46f48d7d
Fix: remove heavy performance tests introduced within this PR
2022-07-07 07:57:05 +00:00
Igor Nikonov
a20a15ff30
Tests
...
+ check that EXPLAIN SYNTAX returns the same result for ordinary ORDER BY and ORDER BY tuple
+ performance
2022-07-06 22:27:53 +00:00
Alexander Gololobov
612e836e60
Merge pull request #38740 from ClickHouse/array_norm_vectorize
...
Improved vectorized execution of main loop for array norm/distance
2022-07-04 10:19:57 +02:00
Alexey Milovidov
c711012399
Merge pull request #38731 from azat/views-max_insert_threads
...
Fix number of threads for pushing to views
2022-07-04 07:43:26 +03:00
Azat Khuzhin
4ae7db8369
Fix max_insert_threads while pushing to views
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-03 15:14:05 +03:00
Alexander Gololobov
ca2829188d
Perf test for norm/distance with long arrays of floats
2022-07-03 08:01:49 +02:00
mergify[bot]
12f5250e86
Merge branch 'master' into dictinct_in_order_optimization
2022-07-01 22:51:35 +00:00
Igor Nikonov
9ef8ff5a31
Addressing review comments
2022-07-01 22:50:00 +00:00
Igor Nikonov
488ee75fc4
+ use DistinctSorted for final distinct step
...
+ fix performance tests
2022-06-30 13:03:39 +00:00
Anton Popov
7c721578c7
Merge pull request #38320 from CurtizJ/dynamic-columns-16
...
Improve performance of insertion to columns of type JSON
2022-06-30 14:18:20 +02:00
Igor Nikonov
d435532c68
Adapt range search algorithm to high cardinality case
...
+ range search is done in steps of some number of rows, controlled by the new
setting `distinct_in_order_range_search_step`. The default is 0, i.e. the whole chunk
+ before starting binary search, linear probing is done on each step (32 rows currently)
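A minimal sketch of the probing scheme described above, assuming the column is sorted and the goal is to find where the run of values equal to the first row ends. The name `find_run_end` and its parameters are hypothetical and not taken from the ClickHouse sources; the actual DISTINCT-in-order code may differ.

```python
def find_run_end(values, begin, end, step=0, probe_rows=32):
    """Return the first index in [begin, limit) whose value differs from
    values[begin], or limit if the run reaches it.

    Assumes `values` is sorted and begin < end. `step` is a hypothetical
    stand-in for the range search step: 0 means "search the whole chunk".
    """
    key = values[begin]
    limit = end if step == 0 else min(begin + step, end)

    # Linear probing: cheap when runs are short (high cardinality).
    probe_end = min(begin + probe_rows, limit)
    for i in range(begin + 1, probe_end):
        if values[i] != key:
            return i

    # Binary search over the rest of the step. Rows equal to `key`
    # form a prefix of [probe_end, limit) because the input is sorted.
    lo, hi = probe_end, limit
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] == key:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

When the function returns `limit` with a non-zero `step`, the caller would continue the search from that position.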
2022-06-29 23:30:35 +00:00
mergify[bot]
36139eacd7
Merge branch 'master' into dictinct_in_order_optimization
2022-06-29 13:37:16 +00:00
Igor Nikonov
3627c6ff36
Perf tests with high cardinality
2022-06-29 13:13:39 +00:00
Alexander Tokmakov
ceb66ade4b
Merge pull request #38335 from ClickHouse/deprecate_ordinary_database
...
Deprecate Ordinary database and old *MergeTree syntax
2022-06-29 13:42:59 +03:00
Nikita Taranov
f5d26572df
Quick fix for aggregation pipeline ( #38295 )
2022-06-29 01:16:30 +02:00
Anton Popov
58c8facebb
minor fixes
2022-06-28 14:21:21 +00:00
BoloniniD
6ddcec0906
Merge branch 'master' into BLAKE3
2022-06-28 16:53:06 +03:00
Alexander Tokmakov
31dcc7634e
Merge branch 'master' into deprecate_ordinary_database
2022-06-24 18:16:07 +02:00
Alexander Tokmakov
0d304f7b8c
fix tests
2022-06-23 21:19:07 +02:00
mergify[bot]
234f0c6399
Merge branch 'master' into revert-35914-FIPS_compliance
2022-06-23 12:06:17 +00:00
Anton Popov
3e62d0fb8c
fix test
2022-06-23 11:31:39 +00:00
Alexander Tokmakov
f00e6b5a7a
deprecate old MergeTree syntax
2022-06-23 11:24:54 +02:00
Sergey Skvortsov
202a2fd709
feat: Add grace hash join perf tests
2022-06-23 08:44:26 +03:00
Anton Popov
52db1b35a1
improve performance of insertion to columns of type JSON
2022-06-22 17:45:51 +00:00
Nikita Taranov
41ba0118b5
Bring back #36396 ( #38110 )
...
* Revert "Revert "More parallel execution for queries with `FINAL` (#36396 )""
This reverts commit 5bfb15262c.
* fix tests
* fix review suggestions
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-06-22 15:05:07 +02:00
Alexey Milovidov
5855668514
Remove trash
2022-06-22 06:23:35 +02:00
Alexey Milovidov
0cf88e0950
Revert "ClickHouse's boringssl module updated to the official version of the FIPS compliant."
2022-06-18 23:16:18 +03:00
Antonio Andelic
f72e509b3b
Merge pull request #38052 from amosbird/join_regression_fix
...
Fix significant join performance regression
2022-06-17 19:55:33 +02:00
Robert Schulze
a0d936cc9f
Small follow-up for FPC codec
...
- add paper reference + doxygen
- remove endianness handling (ClickHouse assumes little endian)
- documentation
- other minor stuff
2022-06-15 14:21:28 +02:00
mergify[bot]
2cb9579234
Merge branch 'master' into join_regression_fix
2022-06-15 11:53:42 +00:00
Nikita Taranov
c8afeafe0e
More parallel execution for queries with FINAL ( #36396 )
2022-06-15 12:44:20 +02:00
Robert Schulze
9794098ebb
Merge pull request #37553 from koloshmet/fpc_codec
...
FPC Codec for floating point data
2022-06-15 12:03:41 +02:00
Maksim Kita
dc2e117cce
UnaryLogicalFunctions improve performance using dynamic dispatch
2022-06-14 17:30:11 +02:00
Amos Bird
9a6e6ccfaf
Fix significant join performance regression
2022-06-14 21:14:18 +08:00
Maksim Kita
daa128f378
Fixed performance tests
2022-06-13 13:31:02 +02:00
Maksim Kita
1247ba1b01
Hierarchical dictionaries performance test fix
2022-06-13 12:31:39 +02:00
Mikhail Guzov
092a00d95a
Merge branch 'ClickHouse:master' into fpc_codec
2022-06-11 21:24:06 +03:00
Maksim Kita
3a0e7b662c
Merge pull request #37954 from kitaisreal/normalize-utf8-performance-tests-fix
...
Normalize UTF8 performance test fix
2022-06-11 15:23:06 +02:00
mergify[bot]
a44590ea84
Merge branch 'master' into normalize-utf8-performance-tests-fix
2022-06-09 14:33:29 +00:00
Maksim Kita
5009374036
Normalize UTF8 performance test fix
2022-06-09 15:35:53 +02:00
BoloniniD
b05ee41d25
Merge branch 'master' of github.com:ClickHouse/ClickHouse into BLAKE3
2022-06-06 16:03:10 +03:00
Nikita Taranov
0a9d8398d8
impl
2022-06-04 19:14:38 +00:00
Robert Schulze
b3b0716b32
Merge pull request #37544 from ClickHouse/cached_patterns
...
Cache compiled regexps when evaluating non-const needles
2022-06-01 19:55:25 +02:00
Robert Schulze
81318e07d6
Try to fix performance test results
2022-06-01 11:53:37 +02:00
BoloniniD
dd8aefdf1e
Merge branch 'master' of github.com:ClickHouse/ClickHouse into BLAKE3
2022-06-01 11:46:55 +03:00
Anton Popov
20e319d67a
Merge pull request #37666 from CurtizJ/optimize-coalesce
...
Optimize function `COALESCE` with two arguments
2022-05-31 23:48:13 +02:00
Anton Popov
30f8eb800a
optimize function coalesce with two arguments
2022-05-30 22:29:35 +00:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
...
This commit is based on local benchmarks of ClickHouse's re2 caching.
Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;
The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.
(Side note: the current caching scheme is unbounded in size which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns ..., see next
question)
Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;
I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
collision list which grows to a certain size (here: 10 elements) and
afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
collision list in which a collision replaces the stored element, idea
by Alexey)
... using a haystack of 2 mio strings and
A) 2 mio distinct simple patterns
B) 10 simple patterns
C) 2 mio distinct complex patterns
D) 10 complex patterns
For A) and C), caching does not help, but these queries still allow us to
judge the static overhead of caching on query runtimes.
B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%')". Caching should help significantly.
Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.
Results in sec, averaged over multiple measurements:
Strategy             A)      B)      C)      D)
1.                   2.12    1.68    9.75    9.45
2.                   2.17    1.73    9.78    9.47
3.                   9.8     0.63    31.8    0.98
4.                   2.14    0.29    9.82    0.41
5. (10 buckets)      2.12    1.51    9.97    5.70
5. (100 buckets)     2.15    0.43    9.88    0.42
5. (1000 buckets)    2.26    0.30    10.13   0.43
(For 5., 10/100/1000 buckets correspond to 10/1/0.1% collision rate.)
Evaluation:
1. This is the baseline. I was surprised that complex patterns (C, D)
slow down the queries so badly compared to simple patterns (A, B).
The runtime includes evaluation costs, but as caching only helps with
compilation, and looking at 4.D and 5.D, compilation makes up over 90%
of the runtime!
2. No speedup compared to 1, probably due to locking overhead. The cache
is unbounded, and in experiments with data sets > 2 mio rows, 2. is
the only scheme to throw OOM exceptions which is not acceptable.
3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
bad runtimes due to LRU queue maintenance and locking. Works pretty
well however with few distinct patterns (B and D).
4. This scheme is tailored to queries B and D where it performs pretty
well. More importantly, the caching is lightweight enough to not
deteriorate performance on datasets A and C.
5. After some tuning of the hash map size, 100 buckets seem optimal to
be in the same ballpark with 10 distinct patterns as 4. Performance
also does not deteriorate on A and C compared to the baseline.
Unlike 4., this scheme behaves LRU-like and can adjust to changing
pattern distributions.
As a conclusion, this commit implements two things:
1. Based on Q1, pattern search with const needle no longer uses
caching. This applies to LIKE and MATCH + a few (exotic) other SQL
functions. The code for the unbounded caching was removed.
2. Based on Q2, pattern search with non-const needles now uses method 5.
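For illustration, caching strategy 5 (a local hash map without a collision list, where a collision replaces the stored element) can be sketched as follows. This is a hedged Python approximation using the standard `re` module instead of re2; the class name `ReplacingPatternCache`, its `compilations` counter, and the default of 100 buckets are assumptions made for the example and do not reflect the actual ClickHouse code.

```python
import re


class ReplacingPatternCache:
    """Fixed-size, non-thread-safe pattern cache.

    Each bucket holds at most one (pattern, compiled) pair; a different
    pattern hashing to the same bucket simply replaces the occupant.
    """

    def __init__(self, num_buckets=100):
        if num_buckets <= 0:
            raise ValueError("num_buckets must be positive")
        self._slots = [None] * num_buckets
        # Counts how often this cache had to call re.compile (cache misses).
        self.compilations = 0

    def get(self, pattern):
        index = hash(pattern) % len(self._slots)
        slot = self._slots[index]
        if slot is not None and slot[0] == pattern:
            return slot[1]  # hit: reuse the compiled pattern
        # Miss or collision: compile and overwrite the bucket.
        compiled = re.compile(pattern)
        self.compilations += 1
        self._slots[index] = (pattern, compiled)
        return compiled
```

Because a bucket is always overwritten on a miss, the cache never grows beyond its fixed size and follows changing pattern distributions without any LRU bookkeeping.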
2022-05-30 20:00:35 +02:00
Alexey Milovidov
9e3242f186
Merge pull request #37617 from CurtizJ/aggregation-sparse-columns
...
Better performance with sparse columns in aggregate functions
2022-05-29 09:36:07 +03:00
Anton Popov
c39d95e2e6
add perf test
2022-05-28 12:56:38 +00:00
Alexey Milovidov
86afa3a245
Merge pull request #37502 from ClickHouse/array_norm_dist_fixes
...
Renamed arrayXXNorm/arrayXXDistance functions to XXNorm/XXDistance and fixed some overflow cases
2022-05-27 00:56:29 +03:00
koloshmet
7e69779575
added fpc codec to float perftest
2022-05-26 22:32:56 +03:00
Maksim Kita
3a92e61827
Merge pull request #37148 from kitaisreal/dictionary-get-descendants-performance-improvement
...
Dictionary getDescendants performance improvement
2022-05-26 12:29:17 +02:00
Maksim Kita
bee3c30f66
Merge pull request #37524 from kitaisreal/geo-distance-functions-improve-performance
...
Geo distance functions improve performance
2022-05-25 22:40:40 +02:00
Alexander Gololobov
168b47d0ad
Use same norm and distance function names for tuples and arrays
2022-05-25 22:39:59 +02:00
Maksim Kita
45da28ecae
Improve performance of geo distance functions
2022-05-25 14:22:22 +02:00
Maksim Kita
3c0c322d7c
Merge pull request #37480 from kitaisreal/dynamic-dispatch-infrastructure-improvements
...
Dynamic dispatch infrastructure style fixes
2022-05-24 18:13:53 +02:00