Commit Graph

90422 Commits

Author SHA1 Message Date
HeenaBansal2009
2584bbb7f1 Fix Style check 2022-05-31 07:49:54 -07:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
Anton Popov
00f87b0f57 replace multiIf to if in case of one condition 2022-05-31 14:45:12 +00:00
Nikolai Kochetov
147a819221 Refactor a little bit more. 2022-05-31 14:43:38 +00:00
Antonio Andelic
3e71a716f5 Enable only jepsen tests 2022-05-31 13:55:01 +00:00
Antonio Andelic
737d6ab1e3 Merge branch 'master' into tiny_fixes_for_jepsen 2022-05-31 13:53:55 +00:00
Antonio Andelic
792adb0576 Update jepsen and scp 2022-05-31 13:53:45 +00:00
Yakov Olkhovskiy
4b427336e3 tests with overridden and appended parameters 2022-05-31 09:37:34 -04:00
Dmitry Novik
b41fe00f31
Merge pull request #37542 from azat/grouping-sets-fix-optimize_aggregation_in_order
Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR)
2022-05-31 15:31:45 +02:00
Dmitry Novik
f58623a375
Merge pull request #37593 from azat/union-type-cast-resubmit
Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-31 15:27:50 +02:00
mergify[bot]
ba49c6bb46
Merge branch 'master' into memory-overcommit-improvement 2022-05-31 13:17:06 +00:00
alesapin
473b0bd0db
Merge pull request #37604 from ClickHouse/turn_on_s3_tests
Turn on s3 tests to red mode
2022-05-31 15:01:24 +02:00
kssenii
c2087b3145 Fix 2022-05-31 14:38:11 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
kssenii
69cd3a2b10 Fix 2022-05-31 14:20:31 +02:00
mergify[bot]
d85c3ec69e
Merge branch 'master' into turn_on_s3_tests 2022-05-31 11:58:16 +00:00
Antonio Andelic
582be42329 Wait for leader election 2022-05-31 11:53:46 +00:00
Sergei Trifonov
026e073b0b minor improvement 2022-05-31 13:50:09 +02:00
HeenaBansal2009
1823f736cc Some more CheckTriviallyCopyableMove fix 2022-05-31 04:11:11 -07:00
mergify[bot]
1e08046c47
Merge branch 'master' into cleanup_unused 2022-05-31 10:28:19 +00:00
Alexander Tokmakov
30a7b07d97
Merge pull request #37658 from vitlibar/fix-flaky-test_row_policy
Fix flaky test test_row_policy
2022-05-31 12:59:50 +03:00
alesapin
65057bf8c4
Merge pull request #37616 from ClickHouse/remove-resursive-submodules
Remove recursive submodules
2022-05-31 11:58:04 +02:00
mergify[bot]
f90dddccba
Merge branch 'master' into fix-temp-table-drop 2022-05-31 09:10:00 +00:00
Kseniia Sumarokova
73ed9c3977
Merge pull request #37619 from Vxider/wv-fix-table-identifier
Fix bugs in WindowView when using table identifier
2022-05-31 11:07:11 +02:00
Robert Schulze
32c810fd35
Merge pull request #37644 from ClickHouse/fix-amqp-cpp-cassandra-dependencies
Disable amqp-cpp and cassandra build if libuv is disabled
2022-05-31 10:47:38 +02:00
Robert Schulze
557bb2d235
Disable amqp-cpp and cassandra build if libuv is disabled
On MacOS/GCC, the libuv build is disabled due to a compiler bug. This
is now propagated to dependent libraries amqp-cpp and cassandra.
Oddly enough, the Mac/GCC build has been broken since at least Jan 2022
without anyone noticing.
2022-05-31 10:34:03 +02:00
Sergei Trifonov
7e95bf31b2 more verbose sanity checks 2022-05-31 09:26:26 +02:00
Yakov Olkhovskiy
873ac9f8ff
Merge pull request #37540 from ClickHouse/feature-server-certificate
showCertificate function implementation
2022-05-31 02:50:03 -04:00
xlwh
ba4cdd43bd Cleanup unused file 2022-05-31 14:37:30 +08:00
zhanglistar
53020b096d
Merge branch 'ClickHouse:master' into typo 2022-05-31 11:28:12 +08:00
Han Fei
7870e02fdf add user access tests 2022-05-31 11:16:06 +08:00
HeenaBansal2009
3976afa56a Fix build failures 2022-05-30 20:06:27 -07:00
Alexey Milovidov
bcbd6b802f Fix clang-tidy-14 2022-05-31 04:19:08 +02:00
yaqi-zhao
a2857491c4 add avx512 support for mergetreereader 2022-05-30 20:53:00 -04:00
Yakov Olkhovskiy
c6b20cd5ed
Merge pull request #37187 from Algunenano/floating_seconds
Allow decimal values in settings using seconds
2022-05-30 20:33:47 -04:00
Anton Popov
30f8eb800a optimize function coalesce with two arguments 2022-05-30 22:29:35 +00:00
Dmitry Novik
9d04305a5a
Update Settings.h 2022-05-30 23:00:28 +02:00
mergify[bot]
55913cf8e1
Merge branch 'master' into turn_on_s3_tests 2022-05-30 20:52:40 +00:00
Nikolai Kochetov
df0d580a8c Fix another one test. 2022-05-30 19:29:57 +00:00
Kseniia Sumarokova
18bda56e4c
Merge pull request #37655 from ClickHouse/kssenii-patch-3-1
Fix hung check
2022-05-30 21:22:12 +02:00
mergify[bot]
b43cfd056f
Merge branch 'master' into floating_seconds 2022-05-30 19:18:35 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
Nikolai Kochetov
913e7a91ae Fix limits from subquery. 2022-05-30 18:25:17 +00:00
HeenaBansal2009
b7eb6bbd38 Fixed clang-tidy-CheckTriviallyCopyableMove-errors 2022-05-30 11:09:03 -07:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.

(Side note: the current caching scheme is unbounded in size which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns ..., see next
question)

Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
   collision list which grows to a certain size (here: 10 elements) and
   afterwards never changes)
5. lightweight local cache 2 (= not threadsafe local hashmap without
   collision list in which a collision replaces the stored element, idea
   by Alexey)
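
As a rough illustration of strategy 5, the following Python sketch keeps a fixed number of buckets, each holding at most one compiled pattern, and overwrites the stored entry on a collision. It is not the ClickHouse implementation: the class name `ReplaceOnCollisionCache`, the `compilations` counter, and the use of Python's `re` module in place of re2 are all assumptions made for the example.

```python
import re


class ReplaceOnCollisionCache:
    """Hypothetical sketch: fixed bucket array, one entry per bucket, no collision list."""

    def __init__(self, num_buckets=100):
        self.buckets = [None] * num_buckets
        self.compilations = 0  # counts cache misses (illustration only)

    def get(self, pattern):
        index = hash(pattern) % len(self.buckets)
        entry = self.buckets[index]
        if entry is not None and entry[0] == pattern:
            return entry[1]  # hit: reuse the compiled pattern
        # Miss or collision: compile and overwrite whatever was in the bucket.
        compiled = re.compile(pattern)  # stand-in for re2 compilation
        self.compilations += 1
        self.buckets[index] = (pattern, compiled)
        return compiled


# With a single bucket, every distinct pattern evicts the previous one.
cache = ReplaceOnCollisionCache(num_buckets=1)
cache.get("a+")
cache.get("a+")  # served from the bucket
cache.get("b+")  # replaces "a+"
cache.get("a+")  # has to be compiled again
```

Because a bucket never grows, the cost per lookup stays constant, and a pattern that stops occurring is eventually overwritten rather than kept forever.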

... using a haystack of 2 million strings and
A) 2 million distinct simple patterns
B) 10 simple patterns
C) 2 million distinct complex patterns
D) 10 complex patterns

For A) and C), caching does not help, but these queries still allow
judging the static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%')". Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.

Results in seconds, averaged over multiple measurements:

1.A): 2.12
  B): 1.68
  C): 9.75
  D): 9.45

2.A): 2.17
  B): 1.73
  C): 9.78
  D): 9.47

3.A): 9.8
  B): 0.63
  C): 31.8
  D): 0.98

4.A): 2.14
  B): 0.29
  C): 9.82
  D): 0.41

5.A): 2.12 / 2.15 / 2.26
  B): 1.51 / 0.43 / 0.30
  C): 9.97 / 9.88 / 10.13
  D): 5.70 / 0.42 / 0.43
(10/100/1000 buckets, resp. 10/1/0.1% collision rate)
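
Comparing the baseline 1.D) with the cached runs 4.D) and 5.D) (100 buckets) gives a rough estimate of how much of the baseline runtime is pattern compilation. The sketch below only redoes that arithmetic on the numbers listed above; it assumes that the cached runs pay essentially no compilation cost, and the function name `compilation_share` is made up for the example.

```python
# Runtimes in seconds, copied from the results above (case D: 10 complex patterns).
baseline_d = 9.45  # 1.D): no caching
cached_d = {
    "scheme 4": 0.41,                # 4.D)
    "scheme 5 (100 buckets)": 0.42,  # 5.D), middle value
}


def compilation_share(baseline, cached):
    """Fraction of the baseline runtime that disappears once patterns are cached."""
    return (baseline - cached) / baseline


shares = {name: compilation_share(baseline_d, t) for name, t in cached_d.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the baseline runtime saved")
```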

Evaluation:

1. This is the baseline. I was surprised that complex patterns (C, D)
   slow down the queries so badly compared to simple patterns (A, B).
   The runtime includes evaluation costs, but as caching only helps with
   compilation, and looking at 4.D and 5.D, compilation makes up over 90%
   of the runtime!

2. No speedup compared to 1, probably due to locking overhead. The cache
   is unbounded, and in experiments with data sets > 2 million rows, 2.
   is the only scheme to throw OOM exceptions, which is not acceptable.

3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
   bad runtimes due to LRU queue maintenance and locking. Works pretty
   well however with few distinct patterns (B and D).

4. This scheme is tailored to queries B and D where it performs pretty
   well. More importantly, the caching is lightweight enough to not
   deteriorate performance on datasets A and C.

5. After some tuning of the hash map size, 100 buckets seem optimal:
   with 10 distinct patterns, performance is in the same ballpark as 4.
   Performance also does not deteriorate on A and C compared to the
   baseline. Unlike 4., this scheme behaves LRU-like and can adjust to
   changing pattern distributions.
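
The thrashing described for scheme 3 can be reproduced with a toy LRU cache. The Python sketch below is only an illustration under simplifying assumptions: it is single-threaded (no locking), uses `str.upper` as a stand-in for pattern compilation, and the names `LRUCache` and `run` as well as the capacity of 10 are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: an ordered map in which the oldest entry is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = compute(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


def run(patterns, capacity=10):
    cache = LRUCache(capacity)
    for pattern in patterns:
        cache.get(pattern, compute=str.upper)  # stand-in for compilation
    return cache.hits, cache.misses


distinct = [f"pattern{i}" for i in range(1000)]      # like cases A and C
repeated = [f"pattern{i % 10}" for i in range(1000)]  # like cases B and D
print(run(distinct), run(repeated))
```

With only distinct patterns every lookup misses and also pays for queue maintenance, whereas ten repeating patterns miss only on their first occurrence.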

As a conclusion, this commit implements two things:

1. Based on Q1, pattern search with const needle no longer uses
   caching. This applies to LIKE and MATCH + a few (exotic) other SQL
   functions. The code for the unbounded caching was removed.

2. Based on Q2, pattern search with non-const needles now uses method 5.
2022-05-30 20:00:35 +02:00
Alexander Gololobov
e2dd6f6249 Removed prewhere_info.alias_actions 2022-05-30 19:58:23 +02:00
Han Fei
e15cdec39c address comments 2022-05-31 01:46:31 +08:00
Anton Popov
52d3791eb9
Merge pull request #37600 from CurtizJ/fix-with-fill-interval
Fix `WITH FILL` with negative intervals in `STEP` clause
2022-05-30 19:43:12 +02:00
alesapin
60b910a4de Fix 2022-05-30 19:04:25 +02:00
alesapin
6db44f633f
Merge pull request #37641 from azat/keeper-list-watches
keeper: store only unique session IDs for watches (should fix SIGKILL in stress tests)
2022-05-30 18:55:52 +02:00