Commit Graph

90284 Commits

Author SHA1 Message Date
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.

(Side note: the current caching scheme is unbounded in size which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns ..., see next
question)

Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
   collision list which grows to a certain size (here: 10 elements) and
   afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
   collision list in which a collision replaces the stored element, idea
   by Alexey)

... using a haystack of 2 mio strings and
A). 2 mio distinct simple patterns
B). 10 simple patterns
C)  2 mio distinct complex patterns
D)  10 complex patterns

Fo A) and C), caching does not help but these queries still allow to
judge the static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%'). Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.

Results in sec, averaged over on multiple measurements;

1.A): 2.12
  B): 1.68
  C): 9.75
  D): 9.45

2.A): 2.17
  B): 1.73
  C): 9.78
  D): 9.47

3.A): 9.8
  B): 0.63
  C): 31.8
  D): 0.98

4.A): 2.14
  B): 0.29
  C): 9.82
  D): 0.41

5.A) 2.12 / 2.15 / 2.26
  B) 1.51 / 0.43 / 0.30
  C) 9.97 / 9.88 / 10.13
  D) 5.70 / 0.42 / 0.43
(10/100/1000 buckets, resp. 10/1/0.1% collision rate)

Evaluation:

1. This is the baseline. It was surprised that complex patterns (C, D)
   slow down the queries so badly compared to simple patterns (A, B).
   The runtime includes evaluation costs, but as caching only helps with
   compilation, and looking at 4.D and 5.D, compilation makes up over 90%
   of the runtime!

2. No speedup compared to 1, probably due to locking overhead. The cache
   is unbounded, and in experiments with data sets > 2 mio rows, 2. is
   the only scheme to throw OOM exceptions which is not acceptable.

3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
   bad runtimes due to LRU queue maintenance and locking. Works pretty
   well however with few distinct patterns (B and D).

4. This scheme is tailored to queries B and D where it performs pretty
   good. More importantly, the caching is lightweight enough to not
   deteriorate performance on datasets A and C.

5. After some tuning of the hash map size, 100 buckets seem optimal to
   be in the same ballpark with 10 distinct patterns as 4. Performance
   also does not deteriorate on A and C compared to the baseline.
   Unlike 4., this scheme behaves LRU-like and can adjust to changing
   pattern distributions.

As a conclusion, this commit implementes two things:

1. Based on Q1, pattern search with const needle no longer uses
   caching. This applies to LIKE and MATCH + a few (exotic) other SQL
   functions. The code for the unbounded caching was removed.

2. Based on Q2, pattern search with non-const needles now use method 5.
2022-05-30 20:00:35 +02:00
Alexander Gololobov
e2dd6f6249 Removed prewhere_info.alias_actions 2022-05-30 19:58:23 +02:00
Han Fei
e15cdec39c address comments 2022-05-31 01:46:31 +08:00
Anton Popov
52d3791eb9
Merge pull request #37600 from CurtizJ/fix-with-fill-interval
Fix `WITH FILL` with negative intervals in `STEP` clause
2022-05-30 19:43:12 +02:00
alesapin
60b910a4de Fix 2022-05-30 19:04:25 +02:00
alesapin
6db44f633f
Merge pull request #37641 from azat/keeper-list-watches
keeper: store only unique session IDs for watches (should fix SIGKILL in stress tests)
2022-05-30 18:55:52 +02:00
mergify[bot]
d4e722bbfa
Merge branch 'master' into http-named-collection 2022-05-30 16:40:18 +00:00
Vitaly Baranov
486a11a5e2 Fix flaky test test_row_policy. 2022-05-30 18:34:28 +02:00
Han Fei
af86900c52 Merge branch 'hanfei/zk-write' of github.com:hanfei1991/ClickHouse into hanfei/zk-write 2022-05-31 00:17:38 +08:00
Han Fei
42fca8d5f0 address comments 2022-05-31 00:17:32 +08:00
Han Fei
a464b10afe
Update src/Storages/System/StorageSystemZooKeeper.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2022-05-30 23:52:37 +08:00
Han Fei
194445646a
Update src/Storages/System/StorageSystemZooKeeper.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2022-05-30 23:47:43 +08:00
vdimir
8a3f4bda62
Fix columns number mismatch in cross join 2022-05-30 15:40:15 +00:00
Kseniia Sumarokova
b1ba7b7027
Fix build 2022-05-30 17:30:59 +02:00
Kseniia Sumarokova
1869adfd7d
Update FileCache.cpp 2022-05-30 16:06:58 +02:00
Amos Bird
217f492264
Fix test 2022-05-30 21:38:24 +08:00
Nikolai Kochetov
c71256ea38 Remove some commented code. 2022-05-30 13:18:20 +00:00
Nikolai Kochetov
5ef51ed27b Fix more tests. 2022-05-30 13:10:30 +00:00
Amos Bird
b68e8efaf0
Fix joinGet with cannot be null type 2022-05-30 21:01:27 +08:00
Alexander Tokmakov
351956d108
Merge pull request #37640 from azat/transaction-fix
Fix excessive LIST requests to coordinator for transactions
2022-05-30 15:46:52 +03:00
mergify[bot]
87e896ae05
Merge branch 'master' into try_fix_tests 2022-05-30 12:44:35 +00:00
alesapin
84ed5aa6b0 No recursive in CI 2022-05-30 14:41:27 +02:00
Kruglov Pavel
0615866aea
Merge pull request #37450 from Avogar/check-format-on-storage-creation
Check format name on storage creation
2022-05-30 14:23:20 +02:00
avogar
139a7e19a9 Fix comments 2022-05-30 11:43:29 +00:00
alesapin
87baabb1a8 Followup fix 2022-05-30 13:34:42 +02:00
alesapin
362fa745e6 Ignore broken metadata 2022-05-30 13:32:12 +02:00
Nikolai Kochetov
5b4658aa5e Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-30 09:47:35 +00:00
Amos Bird
6525bfc4cd
Avoid context copy for InterpreterSelects 2022-05-30 17:08:12 +08:00
Maksim Kita
55f1faf77d
Merge pull request #37637 from ucasfl/typo
fix typo
2022-05-30 11:03:35 +02:00
Maksim Kita
53d848f187 Fixed tests 2022-05-30 10:53:04 +02:00
Azat Khuzhin
d86181d3cd keeper: store only unique session IDs for watches
This should speed up keeper, especially in case of incorrect usage (like
the case that had been fixed in #37640), especially in case on non
release build.

And also this should fix SIGKILL in stress tests.

You will find some details for one of such SIGKILL in `<details>` tag [1]:

<details>

    $ pigz -cd clickhouse-server.stress.log.gz | tail
    2022.05.27 16:17:24.882971 [ 637 ] {} <Trace> BackgroundSchedulePool/BgSchPool: Waiting for threads to finish.
    2022.05.27 16:17:24.896749 [ 637 ] {} <Debug> MemoryTracker: Peak memory usage (for query): 4.09 MiB.
    2022.05.27 16:17:24.907163 [ 637 ] {} <Debug> Application: Shut down storages.
    2022.05.27 16:17:24.907233 [ 637 ] {} <Debug> Application: Waiting for current connections to servers for tables to finish.
    2022.05.27 16:17:24.934335 [ 637 ] {} <Information> Application: Closed all listening sockets. Waiting for 1 outstanding connections.
    2022.05.27 16:17:29.843491 [ 637 ] {} <Information> Application: Closed connections to servers for tables. But 1 remain. Probably some tables of other users cannot finish their connections after context shutdown.
    2022.05.27 16:17:29.843632 [ 637 ] {} <Debug> KeeperDispatcher: Shutting down storage dispatcher
    2022.05.27 16:17:34.612616 [ 688 ] {} <Test> virtual Coordination::ZooKeeperRequest::~ZooKeeperRequest(): Processing of request xid=2147483647 took 10000 ms
    2022.05.27 16:17:54.612109 [ 3176 ] {} <Debug> KeeperTCPHandler: Session #12 expired
    2022.05.27 16:19:59.823038 [ 635 ] {} <Fatal> Application: Child process was terminated by signal 9 (KILL). If it is not done by 'forcestop' command or manually, the possible cause is OOM Killer (see 'dmesg' and look at the '/var/log/kern.log' for the details).

    Thread 26 (Thread 0x7f1c7703f700 (LWP 708)):
    0  0x000000000b074b2a in __tsan::MemoryAccessImpl(__tsan::ThreadState*, unsigned long, int, bool, bool, unsigned long long*, __tsan::Shadow) ()
    1  0x000000000b08630c in __tsan::MemoryAccessRange(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
    2  0x000000000b01ff03 in memmove ()
    3  0x000000001bbc8996 in std::__1::__move<long, long> (__first=0xb8600000d83304, __last=<optimized out>, __result=0x7f1c021cd000) at ../contrib/libcxx/include/__algorithm/move.h:57
    4  std::__1::move<long*, long*> (__first=0xb8600000d83304, __last=<optimized out>, __result=0x7f1c021cd000) at ../contrib/libcxx/include/__algorithm/move.h:70
    5  std::__1::vector<long, std::__1::allocator<long> >::erase (this=0x7b1400584c48, __position=...) at ../contrib/libcxx/include/vector:1608
    6  DB::KeeperStorage::clearDeadWatches (this=0x7b5800001ad8, this@entry=0x7b5800001800, session_id=session_id@entry=12) at ../src/Coordination/KeeperStorage.cpp:1228
    7  0x000000001bbc5c55 in DB::KeeperStorage::processRequest (this=0x7b5800001800, zk_request=..., session_id=12, time=1, new_last_zxid=..., check_acl=true) at ../src/Coordination/KeeperStorage.cpp:1122
    8  0x000000001bba06a3 in DB::KeeperStateMachine::commit (this=<optimized out>, log_idx=3549, data=...) at ../src/Coordination/KeeperStateMachine.cpp:143
    9  0x000000001bba6193 in nuraft::state_machine::commit_ext (this=0x7b4c00001f98, params=...) at ../contrib/NuRaft/include/libnuraft/state_machine.hxx:75
    10 0x00000000202c5a55 in nuraft::raft_server::commit_app_log (this=this@entry=0x7b6c00002a18, idx_to_commit=idx_to_commit@entry=3549, le=..., need_to_handle_commit_elem=true, initial_commit_exec=false) at ../contrib/NuRaft/src/handle_commit.cxx:311
    11 0x00000000202c4f98 in nuraft::raft_server::commit_in_bg_exec (this=<optimized out>, this@entry=0x7b6c00002a18, timeout_ms=timeout_ms@entry=0, initial_commit_exec=false) at ../contrib/NuRaft/src/handle_commit.cxx:241
    12 0x00000000202c4613 in nuraft::raft_server::commit_in_bg (this=this@entry=0x7b6c00002a18) at ../contrib/NuRaft/src/handle_commit.cxx:149
    ...
    Thread 28 (Thread 0x7f1c7603d700 (LWP 710)):
    0  0x00007f1d22a6d110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
    1  0x00007f1d22a650a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
    2  0x000000000b0337b0 in pthread_mutex_lock ()
    3  0x00000000221884da in std::__1::__libcpp_mutex_lock (__m=0x7b4c00002088) at ../contrib/libcxx/include/__threading_support:303
    4  std::__1::mutex::lock (this=0x7b4c00002088) at ../contrib/libcxx/src/mutex.cpp:33
    5  0x000000001bba4188 in std::__1::lock_guard<std::__1::mutex>::lock_guard (__m=..., this=<optimized out>) at ../contrib/libcxx/include/__mutex_base:91
    6  DB::KeeperStateMachine::getDeadSessions (this=0x7b4c00001f98) at ../src/Coordination/KeeperStateMachine.cpp:360
    7  0x000000001bb79b4b in DB::KeeperServer::getDeadSessions (this=0x7b4400012700) at ../src/Coordination/KeeperServer.cpp:572
    8  0x000000001bb64d1a in DB::KeeperDispatcher::sessionCleanerTask (this=<optimized out>, this@entry=0x7b640001c218) at ../src/Coordination/KeeperDispatcher.cpp:399
    ...
    Thread 1 (Thread 0x7f1d227148c0 (LWP 637)):
    0  0x00007f1d22a69376 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    1  0x000000000b0895e0 in __tsan::call_pthread_cancel_with_cleanup(int (*)(void*), void (*)(void*), void*) ()
    2  0x000000000b017091 in pthread_cond_wait ()
    3  0x0000000020569d98 in Poco::EventImpl::waitImpl (this=0x7b2000008798) at ../contrib/poco/Foundation/src/Event_POSIX.cpp:106
    4  0x000000001bb636cf in Poco::Event::wait (this=0x7b2000008798) at ../contrib/poco/Foundation/include/Poco/Event.h:97
    5  ThreadFromGlobalPool::join (this=<optimized out>) at ../src/Common/ThreadPool.h:217
    6  DB::KeeperDispatcher::shutdown (this=0x7b640001c218) at ../src/Coordination/KeeperDispatcher.cpp:322
    7  0x0000000019ca8bfc in DB::Context::shutdownKeeperDispatcher (this=<optimized out>) at ../src/Interpreters/Context.cpp:2111
    8  0x000000000b0a979b in DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)::$_9::operator()() const (this=0x7ffcde44f0a0) at ../programs/server/Server.cpp:1407

</details>

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37593/2613149f6bf4f242bbbf2c3c8539b5176fd77286/stress_test__thread__actions_.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-30 11:50:48 +03:00
Azat Khuzhin
78bd47d8df Fix excessive LIST requests to coordinator for transactions
In [1] there was only few transactions, but lots of List for /test/clickhouse/txn/log:

    $ clickhouse-local --format TSVWithNamesAndTypes --file zookeeper_log.tsv.gz -q "select * except('path|session_id|event_time|thread_id|event_date|xid') apply(x->groupUniqArray(x)), path, session_id, min(event_time), max(event_time), count() from table where has_watch and type = 'Request' group by path, session_id order by count() desc limit 1 format Vertical"
    Row 1:
    ──────
    groupUniqArray(type):             ['Request']
    groupUniqArray(query_id):         ['','62d75128-9031-48a5-87ba-aec3f0b591c6']
    groupUniqArray(address):          ['::1']
    groupUniqArray(port):             [9181]
    groupUniqArray(has_watch):        [1]
    groupUniqArray(op_num):           ['List']
    groupUniqArray(data):             ['']
    groupUniqArray(is_ephemeral):     [0]
    groupUniqArray(is_sequential):    [0]
    groupUniqArray(version):          []
    groupUniqArray(requests_size):    [0]
    groupUniqArray(request_idx):      [0]
    groupUniqArray(error):            []
    groupUniqArray(watch_type):       []
    groupUniqArray(watch_state):      []
    groupUniqArray(stat_version):     [0]
    groupUniqArray(stat_cversion):    [0]
    groupUniqArray(stat_dataLength):  [0]
    groupUniqArray(stat_numChildren): [0]
    groupUniqArray(children):         [[]]
    path:                             /test/clickhouse/txn/log
    session_id:                       1
    min(event_time):                  2022-05-27 12:54:09.025897
    max(event_time):                  2022-05-27 13:37:12.846314 <!-- last transaction was at 12:54, see server log
    count():                          3673675 <-- huge

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37593/2613149f6bf4f242bbbf2c3c8539b5176fd77286/stateless_tests__debug__actions__[1/3].html

Server log:

    $ pigz -cd clickhouse-server.log.gz | fgrep TransactionLog: | head -1
    2022.05.27 12:54:09.026852 [ 5018 ] {62d75128-9031-48a5-87ba-aec3f0b591c6} <Trace> TransactionLog: Loading 33 entries from /test/clickhouse/txn/log: csn-0000000000..csn-0000000032
    $ pigz -cd clickhouse-server.log.gz | fgrep TransactionLog: | tail -1
    2022.05.27 12:54:58.909222 [ 509 ] {} <Test> TransactionLog: Closing readonly transaction (177, 38, 41b51ff1-bcba-43bf-bcea-e97ad05f6040)
    $ pigz -cd clickhouse-server.log.gz | fgrep 62d75128-9031-48a5-87ba-aec3f0b591c6 | tail -1
    2022.05.27 12:54:09.064857 [ 5018 ] {62d75128-9031-48a5-87ba-aec3f0b591c6} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

Fixes: #37398 (cc @tavplubix)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-30 11:39:34 +03:00
Sergei Trifonov
2589f78135
Fix misleading arrayFunction description 2022-05-30 10:32:22 +02:00
mergify[bot]
7a2460233b
Merge branch 'master' into try_fix_tests 2022-05-30 08:16:29 +00:00
flynn
9ffa5f5e0d fix typo 2022-05-30 08:10:16 +00:00
Alexey Milovidov
1d6f9de001 Merge branch 'llvm-14' of github.com:ClickHouse/ClickHouse into llvm-14 2022-05-30 05:36:43 +02:00
Alexey Milovidov
f1fb57c6ce Fix clang-tidy-14 2022-05-30 05:36:26 +02:00
Alexey Milovidov
9a5dd75a68 Merge branch 'master' into llvm-14 2022-05-30 05:34:38 +02:00
Alexey Milovidov
a67fb05ea0 Add test 2022-05-30 05:03:16 +02:00
Alexey Milovidov
c0e6ff4216 More precise result of "dumpColumnStructure" and "byteSize" miscellaneous functions 2022-05-30 04:56:54 +02:00
taiyang-li
0b0d38b18c Merge branch 'master' into fix_settings 2022-05-30 10:07:09 +08:00
Alexey Milovidov
33b2a50740
Merge pull request #37630 from wangqinghuan/add-clickcat
add clickcat intro
2022-05-30 04:25:22 +03:00
qinghuan wang
ded9f5675d
Update gui.md 2022-05-30 09:22:06 +08:00
alesapin
6f35c28592 Fix style 2022-05-30 00:29:30 +02:00
Alexey Milovidov
746ff42239
Update gui.md 2022-05-30 01:13:03 +03:00
alesapin
c32b6076fb Remove stranges from code 2022-05-30 00:12:33 +02:00
Alexey Milovidov
5fda199dcf
Update src/Common/ErrorCodes.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@clickhouse.com>
2022-05-30 00:20:49 +03:00
mergify[bot]
8ceafbe1fa
Merge branch 'master' into try_fix_tests 2022-05-29 18:04:44 +00:00
alesapin
753bcee954 Fix lazy mark load 2022-05-29 18:38:09 +02:00
alesapin
aecab57e17
Merge pull request #37629 from ClickHouse/fix_failure
Fix refactoring issue
2022-05-29 18:35:54 +02:00