Commit Graph

18129 Commits

Author SHA1 Message Date
Robert Schulze
b3b0716b32
Merge pull request #37544 from ClickHouse/cached_patterns
Cache compiled regexps when evaluating non-const needles
2022-06-01 19:55:25 +02:00
alesapin
b7e8bbb154
Merge pull request #37679 from ClickHouse/fix-keeper-recovery-test
Fix `test_keeper_force_recovery*` tests
2022-06-01 18:45:32 +02:00
Alexey Milovidov
89638de521
Merge pull request #37738 from ClickHouse/fix-intersect-with-const
Fix `Intersect` with constant strings
2022-06-01 19:31:55 +03:00
Yakov Olkhovskiy
e23cec01d5
Merge pull request #37581 from ClickHouse/http-named-collection
Support for HTTP source for Data Dictionaries in Named Collections
2022-06-01 11:55:04 -04:00
Dmitry Novik
7fbe91ca81
Merge pull request #37460 from ClickHouse/memory-overcommit-improvement
Memory Overcommit: update defaults, exception message and add ProfileEvent
2022-06-01 17:06:33 +02:00
alesapin
b3b3d7a459 Fix test 2022-06-01 16:58:07 +02:00
Sema Checherinda
16dc3ed97d FR: Expose what triggered the merge in system.part_log #26255 2022-06-01 16:58:07 +02:00
Sema Checherinda
2626a49616 FR: Expose what triggered the merge in system.part_log #26255 2022-06-01 16:58:06 +02:00
Kseniia Sumarokova
7afcfcbaaf
Merge pull request #37691 from kssenii/fix-rabbitmq-restart-with-no-settings
Fix rabbitmq restart with empty settings
2022-06-01 14:59:34 +02:00
Kruglov Pavel
251be860e7
Merge pull request #37428 from loyd/fix/37420-rowbinary-bom
Stop removing UTF-8 BOM in RowBinary format
2022-06-01 13:36:55 +02:00
Vladimir C
8c0dba7302
Merge pull request #37650 from amosbird/joinget-fix
Fix joinGet with join_use_nulls = 1 and Array type
2022-06-01 13:30:29 +02:00
Vladimir C
c466cdebf4
Merge pull request #37530 from vdimir/join_cond_dict_issue_37386 2022-06-01 13:29:01 +02:00
Antonio Andelic
6c31d06b2e Add test for const string intersect 2022-06-01 11:17:56 +00:00
Antonio Andelic
6c2db00f1f
Merge pull request #37568 from ClickHouse/tiny_fixes_for_jepsen
Slightly better jepsen tests
2022-06-01 12:16:09 +02:00
Robert Schulze
81318e07d6
Try to fix performance test results 2022-06-01 11:53:37 +02:00
Alexey Milovidov
154cae4356
Merge pull request #37704 from vdimir/failed_test_default_sort
Display entries for failed tests at the top of report by default
2022-06-01 12:00:55 +03:00
Alexey Milovidov
31b3350749
Merge pull request #37710 from ClickHouse/fix-grouping-function
Make GROUPING function skip constant folding
2022-06-01 12:00:14 +03:00
Alexey Milovidov
a0020cb55c
Merge pull request #37724 from CurtizJ/fix-ast-optimizations-remote
Fix `optimize_monotonous_functions_in_order_by` in distributed queries
2022-06-01 11:54:45 +03:00
Antonio Andelic
df0a1d523e add comment for overwrite 2022-06-01 06:32:52 +00:00
Paul Loyd
32d267ec6c
Stop removing UTF-8 BOM in RowBinary* formats
Fixes #37420
2022-06-01 13:12:55 +08:00
Anton Popov
6cf9405f09 fix optimize_monotonous_functions_in_order_by in distributed queries 2022-06-01 00:50:28 +00:00
Alexander Tokmakov
77c06447d5
Merge pull request #37622 from ClickHouse/try_fix_tests
Try fix tests
2022-06-01 01:11:28 +03:00
Anton Popov
20e319d67a
Merge pull request #37666 from CurtizJ/optimize-coalesce
Optimize function `COALESCE` with two arguments
2022-05-31 23:48:13 +02:00
Dmitry Novik
b11749ca2c Make GROUPING function skip constant folding 2022-05-31 16:45:29 +00:00
vdimir
c5ac6294ae
Display entries for failed tests at the top of report 2022-05-31 16:18:16 +00:00
vdimir
ca0bd754b5
Add no-backward-compatibility-check to 01391_join_on_dict_crash.sql 2022-05-31 16:02:58 +00:00
vdimir
673bc84bfc
Reformat 02244_lowcardinality_hash_join 2022-05-31 16:02:57 +00:00
vdimir
2476c6a988
Fix error on joining with dictionary on some conditions 2022-05-31 16:02:57 +00:00
Vladimir C
2a38fdb796
Merge pull request #37653 from vdimir/cross_join_dup_col_names 2022-05-31 17:50:19 +02:00
Antonio Andelic
49f815060a Use tar for logs 2022-05-31 15:18:44 +00:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
Antonio Andelic
737d6ab1e3 Merge branch 'master' into tiny_fixes_for_jepsen 2022-05-31 13:53:55 +00:00
Antonio Andelic
792adb0576 Update jepsen and scp 2022-05-31 13:53:45 +00:00
Yakov Olkhovskiy
4b427336e3 tests with overridden and appended parameters 2022-05-31 09:37:34 -04:00
Dmitry Novik
b41fe00f31
Merge pull request #37542 from azat/grouping-sets-fix-optimize_aggregation_in_order
Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR)
2022-05-31 15:31:45 +02:00
Dmitry Novik
f58623a375
Merge pull request #37593 from azat/union-type-cast-resubmit
Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-31 15:27:50 +02:00
mergify[bot]
ba49c6bb46
Merge branch 'master' into memory-overcommit-improvement 2022-05-31 13:17:06 +00:00
alesapin
473b0bd0db
Merge pull request #37604 from ClickHouse/turn_on_s3_tests
Turn on s3 tests to red mode
2022-05-31 15:01:24 +02:00
kssenii
c2087b3145 Fix 2022-05-31 14:38:11 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
mergify[bot]
d85c3ec69e
Merge branch 'master' into turn_on_s3_tests 2022-05-31 11:58:16 +00:00
Antonio Andelic
582be42329 Wait for leader election 2022-05-31 11:53:46 +00:00
Alexander Tokmakov
30a7b07d97
Merge pull request #37658 from vitlibar/fix-flaky-test_row_policy
Fix flaky test test_row_policy
2022-05-31 12:59:50 +03:00
Kseniia Sumarokova
73ed9c3977
Merge pull request #37619 from Vxider/wv-fix-table-identifier
Fix bugs in WindowView when using table identifier
2022-05-31 11:07:11 +02:00
Anton Popov
30f8eb800a optimize function coalesce with two arguments 2022-05-30 22:29:35 +00:00
mergify[bot]
55913cf8e1
Merge branch 'master' into turn_on_s3_tests 2022-05-30 20:52:40 +00:00
mergify[bot]
b43cfd056f
Merge branch 'master' into floating_seconds 2022-05-30 19:18:35 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.

(Side note: the current caching scheme is unbounded in size, which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns; see the next
question.)

Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
   collision list which grows to a certain size (here: 10 elements) and
   afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
   collision list in which a collision replaces the stored element, idea
   by Alexey)
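
To make strategy 5 concrete, here is a minimal sketch of a local cache without a collision list, in which a colliding pattern simply replaces the stored one. It is an illustration only, not the code from this commit: the class name, the counters and the default of 100 buckets are assumptions, and Python's `re` module stands in for re2.

```python
import re


class DirectMappedPatternCache:
    """Hypothetical sketch: fixed number of buckets, at most one
    (pattern, compiled regex) pair per bucket, no collision list.
    Not thread-safe by design; intended to be used locally."""

    def __init__(self, num_buckets=100):
        self._buckets = [None] * num_buckets
        self.hits = 0
        self.misses = 0

    def get(self, pattern):
        index = hash(pattern) % len(self._buckets)
        entry = self._buckets[index]
        if entry is not None and entry[0] == pattern:
            self.hits += 1
            return entry[1]
        # Miss or collision: compile and overwrite whatever was stored.
        self.misses += 1
        compiled = re.compile(pattern)  # stand-in for re2 compilation
        self._buckets[index] = (pattern, compiled)
        return compiled


def match_rows(rows, cache):
    """Evaluate one non-const needle per row, as in
    SELECT MATCH(col_haystack, col_needle)."""
    return [cache.get(needle).search(haystack) is not None
            for haystack, needle in rows]


rows = [("hello world", "wor"), ("abc", "^b"), ("hello again", "wor")]
cache = DirectMappedPatternCache()
print(match_rows(rows, cache), cache.hits, cache.misses)
```

Because a collision overwrites the previous entry, the cache never grows beyond its bucket count and recently used patterns tend to stay resident.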

... using a haystack of 2 million strings and
A) 2 million distinct simple patterns
B) 10 simple patterns
C) 2 million distinct complex patterns
D) 10 complex patterns

For A) and C), caching does not help, but these queries still show the
static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%')". Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.

Results in seconds, averaged over multiple measurements:

1.A): 2.12
  B): 1.68
  C): 9.75
  D): 9.45

2.A): 2.17
  B): 1.73
  C): 9.78
  D): 9.47

3.A): 9.8
  B): 0.63
  C): 31.8
  D): 0.98

4.A): 2.14
  B): 0.29
  C): 9.82
  D): 0.41

5.A): 2.12 / 2.15 / 2.26
  B): 1.51 / 0.43 / 0.30
  C): 9.97 / 9.88 / 10.13
  D): 5.70 / 0.42 / 0.43
(with 10 / 100 / 1000 buckets, i.e. 10% / 1% / 0.1% collision rate, respectively)

Evaluation:

1. This is the baseline. I was surprised that complex patterns (C, D)
   slow down the queries so badly compared to simple patterns (A, B).
   The runtime includes evaluation costs, but since caching only helps
   with compilation, comparing 1.D with 4.D and 5.D shows that
   compilation makes up over 90% of the runtime!

2. No speedup compared to 1, probably due to locking overhead. The cache
   is unbounded, and in experiments with data sets > 2 million rows, 2.
   is the only scheme to throw OOM exceptions, which is not acceptable.

3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
   bad runtimes due to LRU queue maintenance and locking. However, it
   works pretty well with few distinct patterns (B and D).

4. This scheme is tailored to queries B and D, where it performs pretty
   well. More importantly, the caching is lightweight enough to not
   deteriorate performance on datasets A and C.

5. After some tuning of the hash map size, 100 buckets seem sufficient
   to be in the same ballpark as 4. with 10 distinct patterns.
   Performance also does not deteriorate on A and C compared to the
   baseline. Unlike 4., this scheme behaves LRU-like and can adjust to
   changing pattern distributions.
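
The "over 90%" estimate in item 1 can be checked against the numbers above. The sketch below assumes that the cached runs 4.D and 5.D (100 buckets) spend close to nothing on compilation, so the difference to baseline 1.D approximates the compilation cost. The function name is hypothetical.

```python
def compilation_share(baseline_seconds, cached_seconds):
    """Fraction of the baseline runtime attributed to pattern compilation,
    assuming the cached run pays (almost) no compilation cost."""
    return (baseline_seconds - cached_seconds) / baseline_seconds


baseline_1d = 9.45  # 1.D: no caching, 10 complex patterns
cached_runs = {
    "4.D (local cache 1)": 0.41,
    "5.D (100 buckets)": 0.42,
}

for name, seconds in cached_runs.items():
    share = compilation_share(baseline_1d, seconds)
    print(f"{name}: about {share:.0%} of the baseline runtime is compilation")
```

Both comparisons land around 95%, consistent with the claim in item 1.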

In conclusion, this commit implements two things:

1. Based on Q1, pattern search with const needle no longer uses
   caching. This applies to LIKE and MATCH + a few (exotic) other SQL
   functions. The code for the unbounded caching was removed.

2. Based on Q2, pattern search with non-const needles now uses method 5.
2022-05-30 20:00:35 +02:00
Anton Popov
52d3791eb9
Merge pull request #37600 from CurtizJ/fix-with-fill-interval
Fix `WITH FILL` with negative intervals in `STEP` clause
2022-05-30 19:43:12 +02:00