mergify[bot]
88bed9de20
Merge branch 'master' into fix_replace_range_again
2022-06-02 07:05:44 +00:00
Alexey Milovidov
cb8b0219ac
Merge branch 'master' of github.com:ClickHouse/ClickHouse into llvm-14
2022-06-02 08:46:31 +02:00
Alexander Gololobov
54d8fd0753
Merge pull request #37773 from excitoon/patch-17
Typos
2022-06-02 09:27:02 +03:00
Alexander Gololobov
a0cf902d49
Merge pull request #37588 from yaqi-zhao/avx512_tail_zero
add avx512 support for mergetree reader
2022-06-02 08:51:52 +03:00
Vladimir Chebotarev
a857bc2ccf
Update S3Common.cpp
2022-06-02 08:46:41 +03:00
Vladimir Chebotarev
5fcf840156
Typo.
2022-06-02 08:43:44 +03:00
Han Fei
1424c420fa
try to fill in right metadata columns
2022-06-02 13:41:37 +08:00
Vxider
8221fcd5f1
update var name
2022-06-02 04:38:45 +00:00
Vxider
df4db70bb8
fix empty target table id
2022-06-02 04:28:18 +00:00
Vladimir Chebotarev
d5022a0c01
Moved ClientConfigurationPerRequest
from ClickHouse/aws-sdk-cpp#1 and ClickHouse/aws-sdk-cpp#2 to ClickHouse.
2022-06-02 06:07:01 +03:00
Nikolay Degterinsky
9575a6d048
Merge pull request #37587 from bigo-sg/typo
Fix a typo
2022-06-02 02:18:03 +02:00
Alexander Gololobov
ec6e413f0b
Fixed runtime check for AVX512F
2022-06-01 23:00:49 +02:00
Alexey Milovidov
b5f48a7d3f
Merge branch 'master' of github.com:ClickHouse/ClickHouse into llvm-14
2022-06-01 22:09:58 +02:00
Robert Schulze
366f368d06
Disallow LIKE patterns with trailing escape
A trailing escape ('ab\') is disallowed in SQL. In standardese:
"If an escape character is specified, then [...] If there is not a
partitioning of the string PVC into substrings such that each substring
has length 1 (one) or 2, no substring of length 1 (one) is the escape
character ECV, and each substring of length 2 is the escape character
ECV followed by either the escape character ECV, an <underscore>
character, or the <percent> character, then an exception condition is
raised: data exception - invalid escape sequence."
I first thought this was already checked higher up in the stack, at least
for const needles, as single trailing backslashes ('ab\') are rejected,
but then I realized that ClickHouse processes backslash escapes in string
literals by default. I.e., double trailing backslashes ('ab\\') are not
rejected, but when interpreted as a LIKE needle ('ab\') they should be.
2022-06-01 21:38:46 +02:00
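A minimal sketch of the quoted rule, assuming '\' as the escape character. `isValidLikePattern` is a hypothetical name, not ClickHouse's function, and it checks the full SQL-standard partitioning, which may be stricter than what ClickHouse enforces; the commit itself targets the trailing-escape case.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: validates a LIKE needle against the SQL-standard rule
// quoted above. Every escape character must be followed by another escape,
// '_' or '%'; a lone escape at the very end ("ab\") is invalid.
bool isValidLikePattern(std::string_view pattern, char escape = '\\')
{
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        if (pattern[i] != escape)
            continue;
        // Trailing escape: there is no character left to pair it with.
        if (i + 1 == pattern.size())
            return false;
        const char next = pattern[i + 1];
        if (next != escape && next != '_' && next != '%')
            return false;
        ++i; // skip the escaped character
    }
    return true;
}
```

In ClickHouse, the literal 'ab\\' reaches such a check as the three-character needle ab\, which is exactly the trailing-escape case.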
lthaooo
6632616733
Fix TTL merge scheduling bug (#36387)
2022-06-01 21:09:53 +02:00
Azat Khuzhin
545a56ce45
Fix sinks with onException() handler
It is possible for onException() to be called even after onFinish(), in
case onFinish() throws, and in this case onException() should be a no-op
for such sinks.
There may also be caveats with PartitionedSink.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
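A minimal sketch of the guard described above, using a hypothetical `ExampleSink` class rather than ClickHouse's real sink interface. `onFinish()` records that it was entered before doing anything that can throw, and `onException()` returns early if so.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sink: once onFinish() has been entered, a later onException()
// must not touch the (possibly half-finalized) output again.
class ExampleSink
{
public:
    void onFinish()
    {
        finished = true;              // mark before work that may throw
        if (fail_on_finish)
            throw std::runtime_error("finalize failed");
        ++writes_to_output;           // successful finalization
    }

    void onException()
    {
        if (finished)                 // onFinish() already ran (and maybe threw)
            return;                   // no-op: do not write to the output again
        ++writes_to_output;           // cleanup when onFinish() was never reached
    }

    bool fail_on_finish = false;      // simulates a throwing onFinish()
    int writes_to_output = 0;         // how often the sink touched its output

private:
    bool finished = false;
};
```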
Azat Khuzhin
02af58f41d
Fix possible "Cannot write to finalized buffer"
It is still possible to get this error, since onException() does not
finalize the format correctly.
Here is an example of such an error, found by CI [1]:
<details>
[ 2686 ] {fa01bf02-73f6-4f7f-b14f-e725de6d7f9b} <Fatal> : Logical error: 'Cannot write to finalized buffer'.
[ 34577 ] {} <Fatal> BaseDaemon: ########################################
[ 34577 ] {} <Fatal> BaseDaemon: (version 22.6.1.1, build id: AB8040A6769E01A0) (from thread 2686) (query_id: fa01bf02-73f6-4f7f-b14f-e725de6d7f9b) (query: insert into test_02302 select number from numbers(10) settings s3_truncate_on_insert=1;) Received signal Aborted (6)
[ 34577 ] {} <Fatal> BaseDaemon:
[ 34577 ] {} <Fatal> BaseDaemon: Stack trace: 0x7fcbaa5a703b 0x7fcbaa586859 0xfad9bab 0xfad9e05 0xfaf6a3b 0x24a48c7f 0x258fb9b9 0x258f2004 0x258b88f4 0x258b863b 0x2581773d 0x258177ce 0x24bb5e98 0xfad01d6 0xfad0105 0x2419b11d 0xfad01d6 0xfad0105 0x2215afbb 0x2215aa48 0xfad01d6 0xfad0105 0xfcc265d 0x225cc546 0x249a1c40 0x249bc1b6 0x2685902c 0x26859505 0x269d7767 0x269d504c 0x7fcbaa75e609 0x7fcbaa683163
[ 34577 ] {} <Fatal> BaseDaemon: 3. raise @ 0x7fcbaa5a703b in ?
[ 34577 ] {} <Fatal> BaseDaemon: 4. abort @ 0x7fcbaa586859 in ?
[ 34577 ] {} <Fatal> BaseDaemon: 5. ./build_docker/../src/Common/Exception.cpp:47: DB::abortOnFailedAssertion(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xfad9bab in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 6. ./build_docker/../src/Common/Exception.cpp:70: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xfad9e05 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 7. ./build_docker/../src/IO/WriteBuffer.h:0: DB::WriteBuffer::write(char const*, unsigned long) @ 0xfaf6a3b in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 8. ./build_docker/../src/Processors/Formats/Impl/ArrowBufferedStreams.cpp:47: DB::ArrowBufferedOutputStream::Write(void const*, long) @ 0x24a48c7f in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 9. long parquet::ThriftSerializer::Serialize<parquet::format::FileMetaData>(parquet::format::FileMetaData const*, arrow::io::OutputStream*, std::__1::shared_ptr<parquet::Encryptor> const&) @ 0x258fb9b9 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 10. parquet::FileMetaData::FileMetaDataImpl::WriteTo(arrow::io::OutputStream*, std::__1::shared_ptr<parquet::Encryptor> const&) const @ 0x258f2004 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 11. parquet::WriteFileMetaData(parquet::FileMetaData const&, arrow::io::OutputStream*) @ 0x258b88f4 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 12. parquet::ParquetFileWriter::~ParquetFileWriter() @ 0x258b863b in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 13. parquet::arrow::FileWriterImpl::~FileWriterImpl() @ 0x2581773d in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 14. parquet::arrow::FileWriterImpl::~FileWriterImpl() @ 0x258177ce in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 15. ./build_docker/../src/Processors/Formats/Impl/ParquetBlockOutputFormat.h:27: DB::ParquetBlockOutputFormat::~ParquetBlockOutputFormat() @ 0x24bb5e98 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 16. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 17. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 18.1. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:312: std::__1::unique_ptr<DB::WriteBuffer, std::__1::default_delete<DB::WriteBuffer> >::reset(DB::WriteBuffer*)
[ 34577 ] {} <Fatal> BaseDaemon: 18.2. inlined from ../contrib/libcxx/include/__memory/unique_ptr.h:269: ~unique_ptr
[ 34577 ] {} <Fatal> BaseDaemon: 18. ../src/Storages/StorageS3.cpp:566: DB::StorageS3Sink::~StorageS3Sink() @ 0x2419b11d in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 19. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 20. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 21. ./build_docker/../contrib/abseil-cpp/absl/container/internal/raw_hash_set.h:1662: absl::lts_20211102::container_internal::raw_hash_set<absl::lts_20211102::container_internal::FlatHashMapPolicy<StringRef, std::__1::shared_ptr<DB::SinkToStorage> >, absl::lts_20211102::hash_internal::Hash<StringRef>, std::__1::equal_to<StringRef>, std::__1::allocator<std::__1::pair<StringRef const, std::__1::shared_ptr<DB::SinkToStorage> > > >::destroy_slots() @ 0x2215afbb in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 22.1. inlined from ./build_docker/../contrib/libcxx/include/string:1445: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__is_long() const
[ 34577 ] {} <Fatal> BaseDaemon: 22.2. inlined from ../contrib/libcxx/include/string:2231: ~basic_string
[ 34577 ] {} <Fatal> BaseDaemon: 22. ../src/Storages/PartitionedSink.h:14: DB::PartitionedSink::~PartitionedSink() @ 0x2215aa48 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 23. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 24. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 25. ./build_docker/../contrib/libcxx/include/vector:802: std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor> > >::__base_destruct_at_end(std::__1::shared_ptr<DB::IProcessor>*) @ 0xfcc265d in /usr/bin/clickhouse
[ 34577 ] {} <Fatal> BaseDaemon: 26.1. inlined from ./build_docker/../contrib/libcxx/include/vector:402: ~vector
[ 34577 ] {} <Fatal> BaseDaemon: 26.2. inlined from ../src/QueryPipeline/QueryPipeline.cpp:29: ~QueryPipeline
[ 34577 ] {} <Fatal> BaseDaemon: 26. ../src/QueryPipeline/QueryPipeline.cpp:535: DB::QueryPipeline::reset() @ 0x225cc546 in /usr/bin/clickhouse
[ 614 ] {} <Fatal> Application: Child process was terminated by signal 6.
</details>
[1]: https://s3.amazonaws.com/clickhouse-test-reports/37542/8a224239c1d922158b4dc9f5d6609dca836dfd06/stress_test__undefined__actions_.html
Follow-up for: #36979
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
Azat Khuzhin
62d78d8f20
Fix WriteBufferFromS3 is_finalized check in case of exception
WriteBufferFromS3::is_finalized is not set if finalizeImpl() throws,
while WriteBuffer::finalized is correctly set even in case of an
exception, so the latter should be used instead.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
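The difference between the two flags can be sketched as follows. `BufferBase` and `FailingUploadBuffer` are hypothetical stand-ins for `WriteBuffer` and `WriteBufferFromS3`; the sketch assumes the base flag is set on both the success and the exception path, as the commit message describes, and is not the actual ClickHouse code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical base buffer: `finalized` is set on both the success path and
// the exception path, so callers can rely on it after finalize() throws.
class BufferBase
{
public:
    virtual ~BufferBase() = default;

    void finalize()
    {
        if (finalized)
            return;
        try
        {
            finalizeImpl();
            finalized = true;
        }
        catch (...)
        {
            finalized = true;  // mark as finalized even though finalizeImpl() failed
            throw;
        }
    }

    bool isFinalized() const { return finalized; }

protected:
    virtual void finalizeImpl() = 0;

private:
    bool finalized = false;
};

// Hypothetical derived buffer with its own flag, assigned only after the
// (simulated) upload succeeds, i.e. the pattern the commit moves away from.
class FailingUploadBuffer : public BufferBase
{
public:
    bool is_finalized = false;

protected:
    void finalizeImpl() override
    {
        completeUpload();     // throws in this sketch
        is_finalized = true;  // never reached when completeUpload() throws
    }

private:
    static void completeUpload() { throw std::runtime_error("upload failed"); }
};
```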
Alexander Tokmakov
06f80770b8
fix stuck REPLACE_RANGE
2022-06-01 20:11:53 +02:00
Robert Schulze
b3b0716b32
Merge pull request #37544 from ClickHouse/cached_patterns
Cache compiled regexps when evaluating non-const needles
2022-06-01 19:55:25 +02:00
Nikolai Kochetov
cc0d5a0daa
Fix test again.
2022-06-01 17:39:12 +00:00
Nikolai Kochetov
9b131f2d2d
Fix test again.
2022-06-01 16:56:26 +00:00
avogar
4abfd54dd6
Fix possible segfault in schema inference
2022-06-01 16:53:37 +00:00
Alexey Milovidov
89638de521
Merge pull request #37738 from ClickHouse/fix-intersect-with-const
Fix `Intersect` with constant strings
2022-06-01 19:31:55 +03:00
Yakov Olkhovskiy
e23cec01d5
Merge pull request #37581 from ClickHouse/http-named-collection
Support for HTTP source for Data Dictionaries in Named Collections
2022-06-01 11:55:04 -04:00
Anton Popov
1ef48c3a4a
turn on setting output_format_json_named_tuples_as_objects by default
2022-06-01 15:42:12 +00:00
avogar
7ef02a2e44
Fix possible logical error in values table function
2022-06-01 15:32:33 +00:00
Nikolai Kochetov
6e924cdc77
Fix some more tests.
2022-06-01 15:21:47 +00:00
Dmitry Novik
7fbe91ca81
Merge pull request #37460 from ClickHouse/memory-overcommit-improvement
Memory Overcommit: update defaults, exception message and add ProfileEvent
2022-06-01 17:06:33 +02:00
Sema Checherinda
16dc3ed97d
FR: Expose what triggered the merge in system.part_log #26255
2022-06-01 16:58:07 +02:00
Sema Checherinda
2626a49616
FR: Expose what triggered the merge in system.part_log #26255
2022-06-01 16:58:06 +02:00
Kseniia Sumarokova
7afcfcbaaf
Merge pull request #37691 from kssenii/fix-rabbitmq-restart-with-no-settings
Fix rabbitmq restart with empty settings
2022-06-01 14:59:34 +02:00
flynn
b62e4cec65
Fix crash of FunctionHashID
2022-06-01 12:39:16 +00:00
Nikolai Kochetov
e401ab8169
Fix more tests.
2022-06-01 11:51:56 +00:00
Antonio Andelic
08c20be4d0
Cleaner exception handling in ParallelReadBuffer
2022-06-01 11:51:01 +00:00
Robert Schulze
ee302f2d9f
Merge pull request #37643 from amosbird/avoid-useless-context-copy
Avoid useless context copy when building query interpreters
2022-06-01 13:49:56 +02:00
Antonio Andelic
f49dd19e7a
Revert "Initialize ParallelReadBuffer after construction"
This reverts commit 31e1e67836.
2022-06-01 11:43:58 +00:00
Kruglov Pavel
251be860e7
Merge pull request #37428 from loyd/fix/37420-rowbinary-bom
Stop removing UTF-8 BOM in RowBinary format
2022-06-01 13:36:55 +02:00
Vladimir C
8c0dba7302
Merge pull request #37650 from amosbird/joinget-fix
Fix joinGet with join_use_nulls = 1 and Array type
2022-06-01 13:30:29 +02:00
Vladimir C
c466cdebf4
Merge pull request #37530 from vdimir/join_cond_dict_issue_37386
2022-06-01 13:29:01 +02:00
Antonio Andelic
ded1398565
Fix intersect with const string
2022-06-01 11:13:33 +00:00
Robert Schulze
600512cc08
Replace exceptions thrown for programming errors by asserts
2022-06-01 11:53:37 +02:00
Alexey Milovidov
31b3350749
Merge pull request #37710 from ClickHouse/fix-grouping-function
Make GROUPING function skip constant folding
2022-06-01 12:00:14 +03:00
Alexey Milovidov
a0020cb55c
Merge pull request #37724 from CurtizJ/fix-ast-optimizations-remote
Fix `optimize_monotonous_functions_in_order_by` in distributed queries
2022-06-01 11:54:45 +03:00
Han Fei
ea693dd0c2
add config and change test logic
2022-06-01 14:57:07 +08:00
Antonio Andelic
31e1e67836
Initialize ParallelReadBuffer after construction
2022-06-01 06:25:32 +00:00
Paul Loyd
32d267ec6c
Stop removing UTF-8 BOM in RowBinary* formats
Fixes #37420
2022-06-01 13:12:55 +08:00
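A sketch of why a BOM check should apply only to text formats, assuming a hypothetical helper `maybeSkipBOM`. In RowBinary the bytes EF BB BF can simply be the first bytes of a value (for example part of a little-endian integer), so stripping them would corrupt the data.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF) only for text
// formats; binary formats keep every byte, since those bytes may be data.
std::string_view maybeSkipBOM(std::string_view input, bool is_binary_format)
{
    static constexpr std::string_view bom = "\xEF\xBB\xBF";
    if (!is_binary_format && input.substr(0, bom.size()) == bom)
        input.remove_prefix(bom.size());
    return input;
}
```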
Anton Popov
6cf9405f09
fix optimize_monotonous_functions_in_order_by in distributed queries
2022-06-01 00:50:28 +00:00
Anton Popov
20e319d67a
Merge pull request #37666 from CurtizJ/optimize-coalesce
Optimize function `COALESCE` with two arguments
2022-05-31 23:48:13 +02:00
Nikolai Kochetov
04c14e9c5d
Fix tests and add comment.
2022-05-31 20:59:50 +00:00
Alexander Gololobov
26609a1875
Style fixes
2022-05-31 21:41:10 +02:00
Nikolai Kochetov
9954c59dc1
Update test.
2022-05-31 19:40:50 +00:00
Nikolai Kochetov
86fbb74703
Merge branch 'master' into refactor-read-metrics-and-callbacks
2022-05-31 18:07:47 +00:00
Nikolai Kochetov
32010f0ba8
Add a test.
2022-05-31 17:56:48 +00:00
vdimir
284c9bc68b
Rollback some changes from appendFromBlock
2022-05-31 17:40:35 +00:00
mergify[bot]
d57d987a02
Merge branch 'master' into sql-user-defined-functions-readonly-fix
2022-05-31 16:50:00 +00:00
Maksim Kita
66f43b9ad3
Fix executable user default functions execution with Nullable arguments
2022-05-31 18:46:33 +02:00
Dmitry Novik
b11749ca2c
Make GROUPING function skip constant folding
2022-05-31 16:45:29 +00:00
Anton Kozlov
3576625647
CLICKHOUSE-2131 Add an option to disable connection pooling in ODBC bridge
2022-05-31 16:26:08 +00:00
vdimir
e7be677fca
Assert structure match up to LowCardinality in appendFromBlock
2022-05-31 16:02:58 +00:00
vdimir
7f4ddb1667
Fix assert for 02244_lowcardinality_hash_join
2022-05-31 16:02:57 +00:00
vdimir
2476c6a988
Fix error on joining with dictionary on some conditions
2022-05-31 16:02:57 +00:00
Vladimir C
2a38fdb796
Merge pull request #37653 from vdimir/cross_join_dup_col_names
2022-05-31 17:50:19 +02:00
Maksim Kita
d1a4550b4f
Fix create or drop of sql user defined functions in readonly mode
2022-05-31 17:23:41 +02:00
Alexey Milovidov
4bb04f913f
Fix clang-tidy-14
2022-05-31 17:20:07 +02:00
Han Fei
5693e6212d
add config and fix style check
2022-05-31 23:18:05 +08:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
Anton Popov
00f87b0f57
replace multiIf with if in case of one condition
2022-05-31 14:45:12 +00:00
Nikolai Kochetov
147a819221
Refactor a little bit more.
2022-05-31 14:43:38 +00:00
Dmitry Novik
b41fe00f31
Merge pull request #37542 from azat/grouping-sets-fix-optimize_aggregation_in_order
Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR)
2022-05-31 15:31:45 +02:00
Dmitry Novik
f58623a375
Merge pull request #37593 from azat/union-type-cast-resubmit
Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-31 15:27:50 +02:00
mergify[bot]
ba49c6bb46
Merge branch 'master' into memory-overcommit-improvement
2022-05-31 13:17:06 +00:00
alesapin
473b0bd0db
Merge pull request #37604 from ClickHouse/turn_on_s3_tests
Turn on s3 tests to red mode
2022-05-31 15:01:24 +02:00
kssenii
c2087b3145
Fix
2022-05-31 14:38:11 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
Allow skipping some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
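A minimal sketch of skipping the first N lines before parsing, assuming a hypothetical `skipFirstLines` helper over `std::istream`. It does not handle quoted CSV fields that span several lines, and it does not reflect ClickHouse's actual setting names or parser code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: discard the first `n` lines (terminated by '\n')
// so that row parsing starts after them.
void skipFirstLines(std::istream & in, std::size_t n)
{
    for (std::size_t i = 0; i < n && in; ++i)
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
```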
kssenii
69cd3a2b10
Fix
2022-05-31 14:20:31 +02:00
mergify[bot]
d85c3ec69e
Merge branch 'master' into turn_on_s3_tests
2022-05-31 11:58:16 +00:00
mergify[bot]
1e08046c47
Merge branch 'master' into cleanup_unused
2022-05-31 10:28:19 +00:00
mergify[bot]
f90dddccba
Merge branch 'master' into fix-temp-table-drop
2022-05-31 09:10:00 +00:00
Kseniia Sumarokova
73ed9c3977
Merge pull request #37619 from Vxider/wv-fix-table-identifier
Fix bugs in WindowView when using table identifier
2022-05-31 11:07:11 +02:00
Yakov Olkhovskiy
873ac9f8ff
Merge pull request #37540 from ClickHouse/feature-server-certificate
showCertificate function implementation
2022-05-31 02:50:03 -04:00
xlwh
ba4cdd43bd
Cleanup unused file
2022-05-31 14:37:30 +08:00
zhanglistar
53020b096d
Merge branch 'ClickHouse:master' into typo
2022-05-31 11:28:12 +08:00
yaqi-zhao
a2857491c4
add avx512 support for mergetreereader
2022-05-30 20:53:00 -04:00
Yakov Olkhovskiy
c6b20cd5ed
Merge pull request #37187 from Algunenano/floating_seconds
Allow decimal values in settings using seconds
2022-05-30 20:33:47 -04:00
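A sketch of accepting fractional seconds, assuming a hypothetical `parseSecondsToMilliseconds` helper and millisecond precision chosen only for illustration. ClickHouse's actual setting types and storage precision may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical parser: accepts "10" or "1.5" (seconds) and converts the value
// to whole milliseconds; trailing garbage and negative values are rejected.
std::int64_t parseSecondsToMilliseconds(const std::string & text)
{
    std::size_t consumed = 0;
    const double seconds = std::stod(text, &consumed);  // throws on non-numeric input
    if (consumed != text.size() || !std::isfinite(seconds) || seconds < 0)
        throw std::invalid_argument("invalid seconds value: " + text);
    return static_cast<std::int64_t>(std::llround(seconds * 1000.0));
}
```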
Anton Popov
30f8eb800a
optimize function coalesce with two arguments
2022-05-30 22:29:35 +00:00
Dmitry Novik
9d04305a5a
Update Settings.h
2022-05-30 23:00:28 +02:00
mergify[bot]
55913cf8e1
Merge branch 'master' into turn_on_s3_tests
2022-05-30 20:52:40 +00:00
Nikolai Kochetov
df0d580a8c
Fix one more test.
2022-05-30 19:29:57 +00:00
Kseniia Sumarokova
18bda56e4c
Merge pull request #37655 from ClickHouse/kssenii-patch-3-1
Fix hung check
2022-05-30 21:22:12 +02:00
mergify[bot]
b43cfd056f
Merge branch 'master' into floating_seconds
2022-05-30 19:18:35 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
Nikolai Kochetov
913e7a91ae
Fix limits from subquery.
2022-05-30 18:25:17 +00:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.
Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;
The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.
(Side note: the current caching scheme is unbounded in size, which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns; see the next
question.)
Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;
I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
collision list which grows to a certain size (here: 10 elements) and
afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
collision list in which a collision replaces the stored element, idea
by Alexey)
... using a haystack of 2 million strings and
A) 2 million distinct simple patterns
B) 10 simple patterns
C) 2 million distinct complex patterns
D) 10 complex patterns
For A) and C), caching does not help, but these queries still allow us to
judge the static overhead of caching on query runtimes.
B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE(col_haystack, flag ? '%pattern1%' :
'%pattern2%')". Caching should help significantly.
Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.
Results in seconds, averaged over multiple measurements:
                        A      B      C      D
1.                   2.12   1.68   9.75   9.45
2.                   2.17   1.73   9.78   9.47
3.                    9.8   0.63   31.8   0.98
4.                   2.14   0.29   9.82   0.41
5. (10 buckets)      2.12   1.51   9.97   5.70
5. (100 buckets)     2.15   0.43   9.88   0.42
5. (1000 buckets)    2.26   0.30  10.13   0.43
(For 5., 10/100/1000 buckets correspond to collision rates of
10/1/0.1%, respectively.)
Evaluation:
1. This is the baseline. It was surprising that complex patterns (C, D)
slow down the queries so badly compared to simple patterns (A, B).
The runtime includes evaluation costs, but as caching only helps with
compilation, and looking at 4.D and 5.D, compilation makes up over 90%
of the runtime!
2. No speedup compared to 1, probably due to locking overhead. The cache
is unbounded, and in experiments with data sets > 2 million rows, 2. is
the only scheme to throw OOM exceptions, which is not acceptable.
3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
bad runtimes due to LRU queue maintenance and locking. It works pretty
well, however, with few distinct patterns (B and D).
4. This scheme is tailored to queries B and D, where it performs pretty
well. More importantly, the caching is lightweight enough not to
deteriorate performance on datasets A and C.
5. After some tuning of the hash map size, 100 buckets seem optimal: with
10 distinct patterns, performance is in the same ballpark as 4.
Performance also does not deteriorate on A and C compared to the baseline.
Unlike 4., this scheme behaves LRU-like and can adjust to changing
pattern distributions.
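The "over 90%" figure can be reproduced from the measurements. Assuming evaluation cost is the same with and without caching, whatever time caching saves must have been compilation time, so 1 - cached / baseline is a rough lower bound on the compilation share (cache overhead and the few remaining compilations only make it smaller). `estimatedCompileShare` is a hypothetical helper.

```cpp
#include <cassert>

// Rough lower bound on the fraction of baseline runtime spent compiling
// patterns, assuming evaluation cost is unaffected by caching.
double estimatedCompileShare(double baseline_sec, double cached_sec)
{
    return 1.0 - cached_sec / baseline_sec;
}
```

For D), this gives 1 - 0.41 / 9.45 ≈ 0.957 with scheme 4 and 1 - 0.42 / 9.45 ≈ 0.956 with scheme 5 (100 buckets).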
As a conclusion, this commit implements two things:
1. Based on Q1, pattern search with const needles no longer uses
caching. This applies to LIKE and MATCH + a few (exotic) other SQL
functions. The code for the unbounded caching was removed.
2. Based on Q2, pattern search with non-const needles now uses method 5.
2022-05-30 20:00:35 +02:00
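Method 5 can be sketched as a small direct-mapped cache: each pattern hashes to exactly one bucket, and a collision replaces the stored entry. This is an illustration, not ClickHouse's implementation: `std::regex` stands in for re2, and `DirectMappedRegexCache` and the bucket count `N` are hypothetical. Like the benchmarked scheme, it is not thread-safe, so each thread would presumably own its own instance.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <utility>

// Sketch of "method 5": a small, non-thread-safe, direct-mapped cache of
// compiled patterns. Each pattern hashes to exactly one of N buckets; on a
// collision the new pattern replaces the stored one, so the cache adapts
// to changing pattern distributions with O(1) work per lookup.
template <std::size_t N>
class DirectMappedRegexCache
{
public:
    const std::regex & get(const std::string & pattern)
    {
        Bucket & bucket = buckets[std::hash<std::string>{}(pattern) % N];
        if (!bucket.regex || bucket.pattern != pattern)
        {
            // Build the new entry first so a failed compile leaves the bucket unchanged.
            std::string key = pattern;
            auto compiled = std::make_unique<std::regex>(pattern);
            bucket.pattern = std::move(key);
            bucket.regex = std::move(compiled);
            ++compilations;
        }
        return *bucket.regex;
    }

    std::size_t compilations = 0;  // number of cache misses (for illustration)

private:
    struct Bucket
    {
        std::string pattern;
        std::unique_ptr<std::regex> regex;
    };

    std::array<Bucket, N> buckets{};
};
```

The returned reference stays valid only until another pattern evicts that bucket, so it should be used before the next get() call. The fixed bucket count also bounds memory, which addresses the DoS concern about unbounded caching.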
Alexander Gololobov
e2dd6f6249
Removed prewhere_info.alias_actions
2022-05-30 19:58:23 +02:00
Han Fei
e15cdec39c
address comments
2022-05-31 01:46:31 +08:00
Anton Popov
52d3791eb9
Merge pull request #37600 from CurtizJ/fix-with-fill-interval
Fix `WITH FILL` with negative intervals in `STEP` clause
2022-05-30 19:43:12 +02:00
alesapin
6db44f633f
Merge pull request #37641 from azat/keeper-list-watches
keeper: store only unique session IDs for watches (should fix SIGKILL in stress tests)
2022-05-30 18:55:52 +02:00
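A sketch of the data-structure change named in the title, assuming a hypothetical `WatchRegistry` rather than Keeper's actual classes. Storing session IDs in a set per path makes repeated watch registrations from the same session idempotent, so they no longer add duplicate entries. `trigger()` removes the watch because ZooKeeper-style watches fire once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical registry: path -> set of unique session IDs watching it.
class WatchRegistry
{
public:
    void addWatch(const std::string & path, std::int64_t session_id)
    {
        watches[path].insert(session_id);  // duplicates are ignored by the set
    }

    // Returns the sessions to notify and removes the one-shot watch.
    std::unordered_set<std::int64_t> trigger(const std::string & path)
    {
        auto it = watches.find(path);
        if (it == watches.end())
            return {};
        auto sessions = std::move(it->second);
        watches.erase(it);
        return sessions;
    }

    std::size_t watcherCount(const std::string & path) const
    {
        auto it = watches.find(path);
        return it == watches.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<std::string, std::unordered_set<std::int64_t>> watches;
};
```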
mergify[bot]
d4e722bbfa
Merge branch 'master' into http-named-collection
2022-05-30 16:40:18 +00:00
Han Fei
af86900c52
Merge branch 'hanfei/zk-write' of github.com:hanfei1991/ClickHouse into hanfei/zk-write
2022-05-31 00:17:38 +08:00