Commit Graph

26965 Commits

Author SHA1 Message Date
mergify[bot]
88bed9de20
Merge branch 'master' into fix_replace_range_again 2022-06-02 07:05:44 +00:00
Alexey Milovidov
cb8b0219ac Merge branch 'master' of github.com:ClickHouse/ClickHouse into llvm-14 2022-06-02 08:46:31 +02:00
Alexander Gololobov
54d8fd0753
Merge pull request #37773 from excitoon/patch-17
Typos
2022-06-02 09:27:02 +03:00
Alexander Gololobov
a0cf902d49
Merge pull request #37588 from yaqi-zhao/avx512_tail_zero
add avx512 support for mergetree reader
2022-06-02 08:51:52 +03:00
Vladimir Chebotarev
a857bc2ccf
Update S3Common.cpp 2022-06-02 08:46:41 +03:00
Vladimir Chebotarev
5fcf840156
Typo. 2022-06-02 08:43:44 +03:00
Han Fei
1424c420fa try to fill in right metadata columns 2022-06-02 13:41:37 +08:00
Vxider
8221fcd5f1 update var name 2022-06-02 04:38:45 +00:00
Vxider
df4db70bb8 fix empty target table id 2022-06-02 04:28:18 +00:00
Vladimir Chebotarev
d5022a0c01 Moved ClientConfigurationPerRequest from ClickHouse/aws-sdk-cpp#1 and ClickHouse/aws-sdk-cpp#2 to ClickHouse. 2022-06-02 06:07:01 +03:00
Nikolay Degterinsky
9575a6d048
Merge pull request #37587 from bigo-sg/typo
Fix a typo
2022-06-02 02:18:03 +02:00
Alexander Gololobov
ec6e413f0b Fixed runtime check for AVX512F 2022-06-01 23:00:49 +02:00
Alexey Milovidov
b5f48a7d3f Merge branch 'master' of github.com:ClickHouse/ClickHouse into llvm-14 2022-06-01 22:09:58 +02:00
Robert Schulze
366f368d06
Disallow LIKE patterns with trailing escape
Trailing escape ('ab\') is disallowed in SQL, in standardese:

  "If an escape character is specified, then [...] If there is not a
  partitioning of the string PVC into substrings such that each substring
  has length 1 (one) or 2, no substring of length 1 (one) is the escape
  character ECV, and each substring of length 2 is the escape character
  ECV followed by either the escape character ECV, an <underscore>
  character, or the <percent> character, then an exception condition is
  raised: data exception - invalid escape sequence."

I first thought this was already checked higher up in the stack, at least
for const needles, since single trailing backslashes ('ab\') are rejected,
but then I realized that ClickHouse quotes by default. I.e., double
trailing backslashes ('ab\\') are not rejected, but when interpreted as a
LIKE needle ('ab\') they should be.
2022-06-01 21:38:46 +02:00
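The partitioning rule quoted in commit 366f368d06 can be illustrated with a minimal Python sketch. It applies the strict wording of the quoted standard text (an escape character must be followed by the escape character, `_`, or `%`), whereas the commit itself only describes rejecting a trailing escape. The function name `has_valid_escapes` and the default escape character are assumptions made for illustration; this is not ClickHouse's implementation.

```python
def has_valid_escapes(pattern: str, escape: str = "\\") -> bool:
    """Check the quoted rule: the pattern must split into single non-escape
    characters and two-character escape sequences (hypothetical helper)."""
    i = 0
    while i < len(pattern):
        if pattern[i] == escape:
            # An escape character must start a pair: escape + (escape | '_' | '%').
            if i + 1 >= len(pattern) or pattern[i + 1] not in (escape, "_", "%"):
                return False
            i += 2  # consume the whole two-character escape sequence
        else:
            i += 1  # ordinary character forms a substring of length 1
    return True
```

Here `has_valid_escapes("ab\\")` (the needle `ab\`) is rejected because the final escape character has nothing to pair with, while `has_valid_escapes("ab\\\\")` (the needle `ab\\`) is accepted.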
lthaooo
6632616733
Fix TTL merge scheduling bug (#36387) 2022-06-01 21:09:53 +02:00
Azat Khuzhin
545a56ce45 Fix sinks with onException() handler
onException() can be called even after onFinish() if onFinish() throws;
in this case onException() should be a no-op for such sinks.

Also there can be caveats with PartitionedSync.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
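The behaviour described in commit 545a56ce45 can be sketched in Python as follows. The class `Sink`, its flag `finish_called`, the counter `cleanups`, and the driver `run` are all hypothetical names chosen for illustration; they mirror the method names in the commit message but are not ClickHouse's C++ API, and recording the call at the start of `on_finish` is an assumed way to achieve the no-op.

```python
class Sink:
    """Hypothetical sink illustrating an exception handler that is a no-op after finish."""

    def __init__(self, fail_on_finish: bool = False):
        self.fail_on_finish = fail_on_finish
        self.finish_called = False
        self.cleanups = 0  # counts how often the exception handler did real work

    def on_finish(self) -> None:
        # Record the call first, so the flag is set even if finishing throws.
        self.finish_called = True
        if self.fail_on_finish:
            raise RuntimeError("finish failed")

    def on_exception(self) -> None:
        # No-op when on_finish() has already run, whether or not it threw.
        if self.finish_called:
            return
        self.cleanups += 1


def run(sink: Sink) -> None:
    """Hypothetical driver: a failing on_finish() is followed by on_exception()."""
    try:
        sink.on_finish()
    except RuntimeError:
        sink.on_exception()
```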
Azat Khuzhin
02af58f41d Fix possible "Cannot write to finalized buffer"
It is still possible to get this error since onException does not
finalize the format correctly.

Here is an example of such an error, found by CI [1]:

<details>

    [ 2686 ] {fa01bf02-73f6-4f7f-b14f-e725de6d7f9b} <Fatal> : Logical error: 'Cannot write to finalized buffer'.
    [ 34577 ] {} <Fatal> BaseDaemon: ########################################
    [ 34577 ] {} <Fatal> BaseDaemon: (version 22.6.1.1, build id: AB8040A6769E01A0) (from thread 2686) (query_id: fa01bf02-73f6-4f7f-b14f-e725de6d7f9b) (query: insert into test_02302 select number from numbers(10) settings s3_truncate_on_insert=1;) Received signal Aborted (6)
    [ 34577 ] {} <Fatal> BaseDaemon:
    [ 34577 ] {} <Fatal> BaseDaemon: Stack trace: 0x7fcbaa5a703b 0x7fcbaa586859 0xfad9bab 0xfad9e05 0xfaf6a3b 0x24a48c7f 0x258fb9b9 0x258f2004 0x258b88f4 0x258b863b 0x2581773d 0x258177ce 0x24bb5e98 0xfad01d6 0xfad0105 0x2419b11d 0xfad01d6 0xfad0105 0x2215afbb 0x2215aa48 0xfad01d6 0xfad0105 0xfcc265d 0x225cc546 0x249a1c40 0x249bc1b6 0x2685902c 0x26859505 0x269d7767 0x269d504c 0x7fcbaa75e609 0x7fcbaa683163
    [ 34577 ] {} <Fatal> BaseDaemon: 3. raise @ 0x7fcbaa5a703b in ?
    [ 34577 ] {} <Fatal> BaseDaemon: 4. abort @ 0x7fcbaa586859 in ?
    [ 34577 ] {} <Fatal> BaseDaemon: 5. ./build_docker/../src/Common/Exception.cpp:47: DB::abortOnFailedAssertion(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xfad9bab in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 6. ./build_docker/../src/Common/Exception.cpp:70: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xfad9e05 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 7. ./build_docker/../src/IO/WriteBuffer.h:0: DB::WriteBuffer::write(char const*, unsigned long) @ 0xfaf6a3b in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 8. ./build_docker/../src/Processors/Formats/Impl/ArrowBufferedStreams.cpp:47: DB::ArrowBufferedOutputStream::Write(void const*, long) @ 0x24a48c7f in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 9. long parquet::ThriftSerializer::Serialize<parquet::format::FileMetaData>(parquet::format::FileMetaData const*, arrow::io::OutputStream*, std::__1::shared_ptr<parquet::Encryptor> const&) @ 0x258fb9b9 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 10. parquet::FileMetaData::FileMetaDataImpl::WriteTo(arrow::io::OutputStream*, std::__1::shared_ptr<parquet::Encryptor> const&) const @ 0x258f2004 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 11. parquet::WriteFileMetaData(parquet::FileMetaData const&, arrow::io::OutputStream*) @ 0x258b88f4 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 12. parquet::ParquetFileWriter::~ParquetFileWriter() @ 0x258b863b in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 13. parquet::arrow::FileWriterImpl::~FileWriterImpl() @ 0x2581773d in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 14. parquet::arrow::FileWriterImpl::~FileWriterImpl() @ 0x258177ce in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 15. ./build_docker/../src/Processors/Formats/Impl/ParquetBlockOutputFormat.h:27: DB::ParquetBlockOutputFormat::~ParquetBlockOutputFormat() @ 0x24bb5e98 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 16. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 17. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 18.1. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:312: std::__1::unique_ptr<DB::WriteBuffer, std::__1::default_delete<DB::WriteBuffer> >::reset(DB::WriteBuffer*)
    [ 34577 ] {} <Fatal> BaseDaemon: 18.2. inlined from ../contrib/libcxx/include/__memory/unique_ptr.h:269: ~unique_ptr
    [ 34577 ] {} <Fatal> BaseDaemon: 18. ../src/Storages/StorageS3.cpp:566: DB::StorageS3Sink::~StorageS3Sink() @ 0x2419b11d in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 19. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 20. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 21. ./build_docker/../contrib/abseil-cpp/absl/container/internal/raw_hash_set.h:1662: absl::lts_20211102::container_internal::raw_hash_set<absl::lts_20211102::container_internal::FlatHashMapPolicy<StringRef, std::__1::shared_ptr<DB::SinkToStorage> >, absl::lts_20211102::hash_internal::Hash<StringRef>, std::__1::equal_to<StringRef>, std::__1::allocator<std::__1::pair<StringRef const, std::__1::shared_ptr<DB::SinkToStorage> > > >::destroy_slots() @ 0x2215afbb in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 22.1. inlined from ./build_docker/../contrib/libcxx/include/string:1445: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__is_long() const
    [ 34577 ] {} <Fatal> BaseDaemon: 22.2. inlined from ../contrib/libcxx/include/string:2231: ~basic_string
    [ 34577 ] {} <Fatal> BaseDaemon: 22. ../src/Storages/PartitionedSink.h:14: DB::PartitionedSink::~PartitionedSink() @ 0x2215aa48 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 23. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 24. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 25. ./build_docker/../contrib/libcxx/include/vector:802: std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor> > >::__base_destruct_at_end(std::__1::shared_ptr<DB::IProcessor>*) @ 0xfcc265d in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 26.1. inlined from ./build_docker/../contrib/libcxx/include/vector:402: ~vector
    [ 34577 ] {} <Fatal> BaseDaemon: 26.2. inlined from ../src/QueryPipeline/QueryPipeline.cpp:29: ~QueryPipeline
    [ 34577 ] {} <Fatal> BaseDaemon: 26. ../src/QueryPipeline/QueryPipeline.cpp:535: DB::QueryPipeline::reset() @ 0x225cc546 in /usr/bin/clickhouse
    [ 614 ] {} <Fatal> Application: Child process was terminated by signal 6.

</details>

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37542/8a224239c1d922158b4dc9f5d6609dca836dfd06/stress_test__undefined__actions_.html

Follow-up for: #36979

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
Azat Khuzhin
62d78d8f20 Fix WriteBufferFromS3 is_finalized check in case of exception
WriteBufferFromS3::is_finalized is not set if finalizeImpl() throws,
while WriteBuffer::finalized is correctly set even in case of an
exception, so it should be used instead.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
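The difference between the two flags in commit 62d78d8f20 can be shown with a small Python sketch. The class names `WriteBufferSketch` and `S3BufferSketch` are hypothetical, and the `try`/`finally` in the base class is an assumed way of getting the behaviour the commit message describes (the base flag is set even if finalization throws); the actual C++ code may differ.

```python
class WriteBufferSketch:
    """Hypothetical base buffer: `finalized` is set even if finalize_impl() throws."""

    def __init__(self) -> None:
        self.finalized = False

    def finalize(self) -> None:
        if self.finalized:
            return
        try:
            self.finalize_impl()
        finally:
            # Runs on both success and failure, so the flag is always set.
            self.finalized = True

    def finalize_impl(self) -> None:
        pass


class S3BufferSketch(WriteBufferSketch):
    """Hypothetical subclass with its own flag that is skipped when an exception is raised."""

    def __init__(self, fail: bool) -> None:
        super().__init__()
        self.fail = fail
        self.is_finalized = False

    def finalize_impl(self) -> None:
        if self.fail:
            raise OSError("upload failed")
        self.is_finalized = True  # never reached if the line above raises
```

After a failed `finalize()`, `finalized` is `True` while `is_finalized` stays `False`, which is why checking the base-class flag is the safer choice in this sketch.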
Alexander Tokmakov
06f80770b8 fix stuck REPLACE_RANGE 2022-06-01 20:11:53 +02:00
Robert Schulze
b3b0716b32
Merge pull request #37544 from ClickHouse/cached_patterns
Cache compiled regexps when evaluating non-const needles
2022-06-01 19:55:25 +02:00
Nikolai Kochetov
cc0d5a0daa Fix test again. 2022-06-01 17:39:12 +00:00
Nikolai Kochetov
9b131f2d2d Fix test again. 2022-06-01 16:56:26 +00:00
avogar
4abfd54dd6 Fix possible segfault in schema inference 2022-06-01 16:53:37 +00:00
Alexey Milovidov
89638de521
Merge pull request #37738 from ClickHouse/fix-intersect-with-const
Fix `Intersect` with constant strings
2022-06-01 19:31:55 +03:00
Yakov Olkhovskiy
e23cec01d5
Merge pull request #37581 from ClickHouse/http-named-collection
Support for HTTP source for Data Dictionaries in Named Collections
2022-06-01 11:55:04 -04:00
Anton Popov
1ef48c3a4a turn on setting output_format_json_named_tuples_as_objects by default 2022-06-01 15:42:12 +00:00
avogar
7ef02a2e44 Fix possible logical error in values table function 2022-06-01 15:32:33 +00:00
Nikolai Kochetov
6e924cdc77 Fix some more tests. 2022-06-01 15:21:47 +00:00
Dmitry Novik
7fbe91ca81
Merge pull request #37460 from ClickHouse/memory-overcommit-improvement
Memory Overcommit: update defaults, exception message and add ProfileEvent
2022-06-01 17:06:33 +02:00
Sema Checherinda
16dc3ed97d FR: Expose what triggered the merge in system.part_log #26255 2022-06-01 16:58:07 +02:00
Sema Checherinda
2626a49616 FR: Expose what triggered the merge in system.part_log #26255 2022-06-01 16:58:06 +02:00
Kseniia Sumarokova
7afcfcbaaf
Merge pull request #37691 from kssenii/fix-rabbitmq-restart-with-no-settings
Fix rabbitmq restart with empty settings
2022-06-01 14:59:34 +02:00
flynn
b62e4cec65 Fix crash of FunctionHashID 2022-06-01 12:39:16 +00:00
Nikolai Kochetov
e401ab8169 Fix more tests. 2022-06-01 11:51:56 +00:00
Antonio Andelic
08c20be4d0 Cleaner exception handling in ParallelReadBuffer 2022-06-01 11:51:01 +00:00
Robert Schulze
ee302f2d9f
Merge pull request #37643 from amosbird/avoid-useless-context-copy
Avoid useless context copy when building query interpreters
2022-06-01 13:49:56 +02:00
Antonio Andelic
f49dd19e7a Revert "Initialize ParallelReadBuffer after construction"
This reverts commit 31e1e67836.
2022-06-01 11:43:58 +00:00
Kruglov Pavel
251be860e7
Merge pull request #37428 from loyd/fix/37420-rowbinary-bom
Stop removing UTF-8 BOM in RowBinary format
2022-06-01 13:36:55 +02:00
Vladimir C
8c0dba7302
Merge pull request #37650 from amosbird/joinget-fix
Fix joinGet with join_use_nulls = 1 and Array type
2022-06-01 13:30:29 +02:00
Vladimir C
c466cdebf4
Merge pull request #37530 from vdimir/join_cond_dict_issue_37386 2022-06-01 13:29:01 +02:00
Antonio Andelic
ded1398565 Fix intersect with const string 2022-06-01 11:13:33 +00:00
Robert Schulze
600512cc08
Replace exceptions thrown for programming errors by asserts 2022-06-01 11:53:37 +02:00
Alexey Milovidov
31b3350749
Merge pull request #37710 from ClickHouse/fix-grouping-function
Make GROUPING function skip constant folding
2022-06-01 12:00:14 +03:00
Alexey Milovidov
a0020cb55c
Merge pull request #37724 from CurtizJ/fix-ast-optimizations-remote
Fix `optimize_monotonous_functions_in_order_by` in distributed queries
2022-06-01 11:54:45 +03:00
Han Fei
ea693dd0c2 add config and change test logic 2022-06-01 14:57:07 +08:00
Antonio Andelic
31e1e67836 Initialize ParallelReadBuffer after construction 2022-06-01 06:25:32 +00:00
Paul Loyd
32d267ec6c
Stop removing UTF-8 BOM in RowBinary* formats
Fixes #37420
2022-06-01 13:12:55 +08:00
Anton Popov
6cf9405f09 fix optimize_monotonous_functions_in_order_by in distributed queries 2022-06-01 00:50:28 +00:00
Anton Popov
20e319d67a
Merge pull request #37666 from CurtizJ/optimize-coalesce
Optimize function `COALESCE` with two arguments
2022-05-31 23:48:13 +02:00
Nikolai Kochetov
04c14e9c5d Fix tests and add comment. 2022-05-31 20:59:50 +00:00
Alexander Gololobov
26609a1875 Style fixes 2022-05-31 21:41:10 +02:00
Nikolai Kochetov
9954c59dc1 Update test. 2022-05-31 19:40:50 +00:00
Nikolai Kochetov
86fbb74703 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-31 18:07:47 +00:00
Nikolai Kochetov
32010f0ba8 Add a test. 2022-05-31 17:56:48 +00:00
vdimir
284c9bc68b
Rollback some changes from appendFromBlock 2022-05-31 17:40:35 +00:00
mergify[bot]
d57d987a02
Merge branch 'master' into sql-user-defined-functions-readonly-fix 2022-05-31 16:50:00 +00:00
Maksim Kita
66f43b9ad3 Fix executable user default functions execution with Nullable arguments 2022-05-31 18:46:33 +02:00
Dmitry Novik
b11749ca2c Make GROUPING function skip constant folding 2022-05-31 16:45:29 +00:00
Anton Kozlov
3576625647 CLICKHOUSE-2131 Add an option to disable connection pooling in ODBC bridge 2022-05-31 16:26:08 +00:00
vdimir
e7be677fca
Assert structure match up to locard in appendFromBlock 2022-05-31 16:02:58 +00:00
vdimir
7f4ddb1667
Fix assert for 02244_lowcardinality_hash_join 2022-05-31 16:02:57 +00:00
vdimir
2476c6a988
Fix error on joining with dictionary on some conditions 2022-05-31 16:02:57 +00:00
Vladimir C
2a38fdb796
Merge pull request #37653 from vdimir/cross_join_dup_col_names 2022-05-31 17:50:19 +02:00
Maksim Kita
d1a4550b4f Fix create or drop of sql user defined functions in readonly mode 2022-05-31 17:23:41 +02:00
Alexey Milovidov
4bb04f913f Fix clang-tidy-14 2022-05-31 17:20:07 +02:00
Han Fei
5693e6212d add config and fix style check 2022-05-31 23:18:05 +08:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
Anton Popov
00f87b0f57 replace multiIf to if in case of one condition 2022-05-31 14:45:12 +00:00
Nikolai Kochetov
147a819221 Refactor a little bit more. 2022-05-31 14:43:38 +00:00
Dmitry Novik
b41fe00f31
Merge pull request #37542 from azat/grouping-sets-fix-optimize_aggregation_in_order
Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR)
2022-05-31 15:31:45 +02:00
Dmitry Novik
f58623a375
Merge pull request #37593 from azat/union-type-cast-resubmit
Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-31 15:27:50 +02:00
mergify[bot]
ba49c6bb46
Merge branch 'master' into memory-overcommit-improvement 2022-05-31 13:17:06 +00:00
alesapin
473b0bd0db
Merge pull request #37604 from ClickHouse/turn_on_s3_tests
Turn on s3 tests to red mode
2022-05-31 15:01:24 +02:00
kssenii
c2087b3145 Fix 2022-05-31 14:38:11 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
kssenii
69cd3a2b10 Fix 2022-05-31 14:20:31 +02:00
mergify[bot]
d85c3ec69e
Merge branch 'master' into turn_on_s3_tests 2022-05-31 11:58:16 +00:00
mergify[bot]
1e08046c47
Merge branch 'master' into cleanup_unused 2022-05-31 10:28:19 +00:00
mergify[bot]
f90dddccba
Merge branch 'master' into fix-temp-table-drop 2022-05-31 09:10:00 +00:00
Kseniia Sumarokova
73ed9c3977
Merge pull request #37619 from Vxider/wv-fix-table-identifier
Fix bugs in WindowView when using table identifier
2022-05-31 11:07:11 +02:00
Yakov Olkhovskiy
873ac9f8ff
Merge pull request #37540 from ClickHouse/feature-server-certificate
showCertificate function implementation
2022-05-31 02:50:03 -04:00
xlwh
ba4cdd43bd Cleanup unused file 2022-05-31 14:37:30 +08:00
zhanglistar
53020b096d
Merge branch 'ClickHouse:master' into typo 2022-05-31 11:28:12 +08:00
yaqi-zhao
a2857491c4 add avx512 support for mergetreereader 2022-05-30 20:53:00 -04:00
Yakov Olkhovskiy
c6b20cd5ed
Merge pull request #37187 from Algunenano/floating_seconds
Allow decimal values in settings using seconds
2022-05-30 20:33:47 -04:00
Anton Popov
30f8eb800a optimize function coalesce with two arguments 2022-05-30 22:29:35 +00:00
Dmitry Novik
9d04305a5a
Update Settings.h 2022-05-30 23:00:28 +02:00
mergify[bot]
55913cf8e1
Merge branch 'master' into turn_on_s3_tests 2022-05-30 20:52:40 +00:00
Nikolai Kochetov
df0d580a8c Fix another one test. 2022-05-30 19:29:57 +00:00
Kseniia Sumarokova
18bda56e4c
Merge pull request #37655 from ClickHouse/kssenii-patch-3-1
Fix hung check
2022-05-30 21:22:12 +02:00
mergify[bot]
b43cfd056f
Merge branch 'master' into floating_seconds 2022-05-30 19:18:35 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
Nikolai Kochetov
913e7a91ae Fix limits from subquery. 2022-05-30 18:25:17 +00:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.

(Side note: the current caching scheme is unbounded in size which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns ..., see next
question)

Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
   collision list which grows to a certain size (here: 10 elements) and
   afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
   collision list in which a collision replaces the stored element, idea
   by Alexey)

... using a haystack of 2 million strings and
A) 2 million distinct simple patterns
B) 10 simple patterns
C) 2 million distinct complex patterns
D) 10 complex patterns

For A) and C), caching does not help, but these queries still allow
judging the static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%')". Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.

Results in seconds, averaged over multiple measurements:

1.A): 2.12
  B): 1.68
  C): 9.75
  D): 9.45

2.A): 2.17
  B): 1.73
  C): 9.78
  D): 9.47

3.A): 9.8
  B): 0.63
  C): 31.8
  D): 0.98

4.A): 2.14
  B): 0.29
  C): 9.82
  D): 0.41

5.A) 2.12 / 2.15 / 2.26
  B) 1.51 / 0.43 / 0.30
  C) 9.97 / 9.88 / 10.13
  D) 5.70 / 0.42 / 0.43
(10/100/1000 buckets, resp. 10/1/0.1% collision rate)

Evaluation:

1. This is the baseline. I was surprised that complex patterns (C, D)
   slow down the queries so badly compared to simple patterns (A, B).
   The runtime includes evaluation costs, but as caching only helps with
   compilation, and looking at 4.D and 5.D, compilation makes up over 90%
   of the runtime!

2. No speedup compared to 1, probably due to locking overhead. The cache
   is unbounded, and in experiments with data sets > 2 million rows, 2. is
   the only scheme to throw OOM exceptions, which is not acceptable.

3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
   bad runtimes due to LRU queue maintenance and locking. Works pretty
   well however with few distinct patterns (B and D).

4. This scheme is tailored to queries B and D, where it performs pretty
   well. More importantly, the caching is lightweight enough to not
   deteriorate performance on datasets A and C.

5. After some tuning of the hash map size, 100 buckets seem optimal to
   be in the same ballpark with 10 distinct patterns as 4. Performance
   also does not deteriorate on A and C compared to the baseline.
   Unlike 4., this scheme behaves LRU-like and can adjust to changing
   pattern distributions.

As a conclusion, this commit implements two things:

1. Based on Q1, pattern search with const needle no longer uses
   caching. This applies to LIKE and MATCH + a few (exotic) other SQL
   functions. The code for the unbounded caching was removed.

2. Based on Q2, pattern search with non-const needles now uses method 5.
2022-05-30 20:00:35 +02:00
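Caching strategy 5 from commit ad12adc31c (a local, not threadsafe hash map without a collision list, where a collision replaces the stored element) can be sketched in Python as below. The class name `ReplaceOnCollisionCache`, the `compilations` counter, and the use of Python's `re` module in place of re2 are assumptions for illustration; the default of 100 buckets follows the tuning result reported in the commit message.

```python
import re


class ReplaceOnCollisionCache:
    """Hypothetical fixed-size pattern cache: one slot per bucket, no collision list."""

    def __init__(self, num_buckets: int = 100) -> None:
        self.buckets = [None] * num_buckets  # each slot holds (pattern, compiled) or None
        self.compilations = 0  # how many times a pattern had to be compiled

    def get(self, pattern: str):
        index = hash(pattern) % len(self.buckets)
        entry = self.buckets[index]
        if entry is not None and entry[0] == pattern:
            return entry[1]  # cache hit: reuse the compiled pattern
        # Miss or collision: compile and overwrite whatever was stored in the slot.
        compiled = re.compile(pattern)
        self.compilations += 1
        self.buckets[index] = (pattern, compiled)
        return compiled
```

Because a colliding pattern simply evicts the previous occupant of its bucket, the cache has a fixed memory footprint and adapts to changing pattern distributions, at the cost of recompiling when two frequently used patterns share a bucket.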
Alexander Gololobov
e2dd6f6249 Removed prewhere_info.alias_actions 2022-05-30 19:58:23 +02:00
Han Fei
e15cdec39c address comments 2022-05-31 01:46:31 +08:00
Anton Popov
52d3791eb9
Merge pull request #37600 from CurtizJ/fix-with-fill-interval
Fix `WITH FILL` with negative intervals in `STEP` clause
2022-05-30 19:43:12 +02:00
alesapin
6db44f633f
Merge pull request #37641 from azat/keeper-list-watches
keeper: store only unique session IDs for watches (should fix SIGKILL in stress tests)
2022-05-30 18:55:52 +02:00
mergify[bot]
d4e722bbfa
Merge branch 'master' into http-named-collection 2022-05-30 16:40:18 +00:00
Han Fei
af86900c52 Merge branch 'hanfei/zk-write' of github.com:hanfei1991/ClickHouse into hanfei/zk-write 2022-05-31 00:17:38 +08:00