Commit Graph

90065 Commits

Author SHA1 Message Date
Sergei Trifonov
f86b352375
Merge pull request #37676 from ClickHouse/verbose-sanity-checks
more verbose sanity checks
2022-05-31 17:29:27 +02:00
Alexey Milovidov
4bb04f913f Fix clang-tidy-14 2022-05-31 17:20:07 +02:00
Antonio Andelic
49f815060a Use tar for logs 2022-05-31 15:18:44 +00:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
Antonio Andelic
3e71a716f5 Enable only jepsen tests 2022-05-31 13:55:01 +00:00
Antonio Andelic
737d6ab1e3 Merge branch 'master' into tiny_fixes_for_jepsen 2022-05-31 13:53:55 +00:00
Antonio Andelic
792adb0576 Update jepsen and scp 2022-05-31 13:53:45 +00:00
Yakov Olkhovskiy
4b427336e3 tests with overridden and appended parameters 2022-05-31 09:37:34 -04:00
Dmitry Novik
b41fe00f31
Merge pull request #37542 from azat/grouping-sets-fix-optimize_aggregation_in_order
Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR)
2022-05-31 15:31:45 +02:00
Dmitry Novik
f58623a375
Merge pull request #37593 from azat/union-type-cast-resubmit
Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-31 15:27:50 +02:00
mergify[bot]
ba49c6bb46
Merge branch 'master' into memory-overcommit-improvement 2022-05-31 13:17:06 +00:00
alesapin
473b0bd0db
Merge pull request #37604 from ClickHouse/turn_on_s3_tests
Turn on s3 tests to red mode
2022-05-31 15:01:24 +02:00
kssenii
c2087b3145 Fix 2022-05-31 14:38:11 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
mergify[bot]
d85c3ec69e
Merge branch 'master' into turn_on_s3_tests 2022-05-31 11:58:16 +00:00
Antonio Andelic
582be42329 Wait for leader election 2022-05-31 11:53:46 +00:00
Sergei Trifonov
026e073b0b minor improvement 2022-05-31 13:50:09 +02:00
Alexander Tokmakov
30a7b07d97
Merge pull request #37658 from vitlibar/fix-flaky-test_row_policy
Fix flaky test test_row_policy
2022-05-31 12:59:50 +03:00
alesapin
65057bf8c4
Merge pull request #37616 from ClickHouse/remove-resursive-submodules
Remove resursive submodules
2022-05-31 11:58:04 +02:00
Kseniia Sumarokova
73ed9c3977
Merge pull request #37619 from Vxider/wv-fix-table-identifier
Fix bugs in WindowView when using table identifier
2022-05-31 11:07:11 +02:00
Robert Schulze
32c810fd35
Merge pull request #37644 from ClickHouse/fix-amqp-cpp-cassandra-dependencies
Disable amqp-cpp and cassandra build if libuv is disabled
2022-05-31 10:47:38 +02:00
Robert Schulze
557bb2d235
Disable amqp-cpp and cassandra build if libuv is disabled
On MacOS/GCC, the libuv build is disabled due to a compiler bug. This
is now propagated to dependent libraries amqp-cpp and cassandra.
Oddly enough, the Mac/GCC build was broken since at least Jan 2022
without someone noticing.
2022-05-31 10:34:03 +02:00
Sergei Trifonov
7e95bf31b2 more verbose sanity checks 2022-05-31 09:26:26 +02:00
Yakov Olkhovskiy
873ac9f8ff
Merge pull request #37540 from ClickHouse/feature-server-certificate
showCertificate function implementation
2022-05-31 02:50:03 -04:00
Alexey Milovidov
bcbd6b802f Fix clang-tidy-14 2022-05-31 04:19:08 +02:00
Yakov Olkhovskiy
c6b20cd5ed
Merge pull request #37187 from Algunenano/floating_seconds
Allow decimal values in settings using seconds
2022-05-30 20:33:47 -04:00
Anton Popov
30f8eb800a optimize function coalesce with two arguments 2022-05-30 22:29:35 +00:00
Dmitry Novik
9d04305a5a
Update Settings.h 2022-05-30 23:00:28 +02:00
mergify[bot]
55913cf8e1
Merge branch 'master' into turn_on_s3_tests 2022-05-30 20:52:40 +00:00
Kseniia Sumarokova
18bda56e4c
Merge pull request #37655 from ClickHouse/kssenii-patch-3-1
Fix hung check
2022-05-30 21:22:12 +02:00
mergify[bot]
b43cfd056f
Merge branch 'master' into floating_seconds 2022-05-30 19:18:35 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.

(Side note: the current caching scheme is unbounded in size which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns ..., see next
question)

Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
   collision list which grows to a certain size (here: 10 elements) and
   afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
   collision list in which a collision replaces the stored element, idea
   by Alexey)

... using a haystack of 2 mio strings and
A). 2 mio distinct simple patterns
B). 10 simple patterns
C)  2 mio distinct complex patterns
D)  10 complex patterns

Fo A) and C), caching does not help but these queries still allow to
judge the static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%'). Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.

Results in sec, averaged over on multiple measurements;

1.A): 2.12
  B): 1.68
  C): 9.75
  D): 9.45

2.A): 2.17
  B): 1.73
  C): 9.78
  D): 9.47

3.A): 9.8
  B): 0.63
  C): 31.8
  D): 0.98

4.A): 2.14
  B): 0.29
  C): 9.82
  D): 0.41

5.A) 2.12 / 2.15 / 2.26
  B) 1.51 / 0.43 / 0.30
  C) 9.97 / 9.88 / 10.13
  D) 5.70 / 0.42 / 0.43
(10/100/1000 buckets, resp. 10/1/0.1% collision rate)

Evaluation:

1. This is the baseline. It was surprised that complex patterns (C, D)
   slow down the queries so badly compared to simple patterns (A, B).
   The runtime includes evaluation costs, but as caching only helps with
   compilation, and looking at 4.D and 5.D, compilation makes up over 90%
   of the runtime!

2. No speedup compared to 1, probably due to locking overhead. The cache
   is unbounded, and in experiments with data sets > 2 mio rows, 2. is
   the only scheme to throw OOM exceptions which is not acceptable.

3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
   bad runtimes due to LRU queue maintenance and locking. Works pretty
   well however with few distinct patterns (B and D).

4. This scheme is tailored to queries B and D where it performs pretty
   good. More importantly, the caching is lightweight enough to not
   deteriorate performance on datasets A and C.

5. After some tuning of the hash map size, 100 buckets seem optimal to
   be in the same ballpark with 10 distinct patterns as 4. Performance
   also does not deteriorate on A and C compared to the baseline.
   Unlike 4., this scheme behaves LRU-like and can adjust to changing
   pattern distributions.

As a conclusion, this commit implementes two things:

1. Based on Q1, pattern search with const needle no longer uses
   caching. This applies to LIKE and MATCH + a few (exotic) other SQL
   functions. The code for the unbounded caching was removed.

2. Based on Q2, pattern search with non-const needles now use method 5.
2022-05-30 20:00:35 +02:00
Anton Popov
52d3791eb9
Merge pull request #37600 from CurtizJ/fix-with-fill-interval
Fix `WITH FILL` with negative intervals in `STEP` clause
2022-05-30 19:43:12 +02:00
alesapin
60b910a4de Fix 2022-05-30 19:04:25 +02:00
alesapin
6db44f633f
Merge pull request #37641 from azat/keeper-list-watches
keeper: store only unique session IDs for watches (should fix SIGKILL in stress tests)
2022-05-30 18:55:52 +02:00
mergify[bot]
d4e722bbfa
Merge branch 'master' into http-named-collection 2022-05-30 16:40:18 +00:00
Vitaly Baranov
486a11a5e2 Fix flaky test test_row_policy. 2022-05-30 18:34:28 +02:00
vdimir
8a3f4bda62
Fix columns number mismatch in cross join 2022-05-30 15:40:15 +00:00
Kseniia Sumarokova
b1ba7b7027
Fix build 2022-05-30 17:30:59 +02:00
Kseniia Sumarokova
1869adfd7d
Update FileCache.cpp 2022-05-30 16:06:58 +02:00
Amos Bird
217f492264
Fix test 2022-05-30 21:38:24 +08:00
Amos Bird
b68e8efaf0
Fix joinGet with cannot be null type 2022-05-30 21:01:27 +08:00
Alexander Tokmakov
351956d108
Merge pull request #37640 from azat/transaction-fix
Fix excessive LIST requests to coordinator for transactions
2022-05-30 15:46:52 +03:00
mergify[bot]
87e896ae05
Merge branch 'master' into try_fix_tests 2022-05-30 12:44:35 +00:00
alesapin
84ed5aa6b0 No recursive in CI 2022-05-30 14:41:27 +02:00
Kruglov Pavel
0615866aea
Merge pull request #37450 from Avogar/check-format-on-storage-creation
Check format name on storage creation
2022-05-30 14:23:20 +02:00
avogar
139a7e19a9 Fix comments 2022-05-30 11:43:29 +00:00
alesapin
87baabb1a8 Followup fix 2022-05-30 13:34:42 +02:00
alesapin
362fa745e6 Ignore broken metadata 2022-05-30 13:32:12 +02:00