Commit Graph

27256 Commits

Author SHA1 Message Date
Alexey Milovidov
b5f48a7d3f Merge branch 'master' of github.com:ClickHouse/ClickHouse into llvm-14 2022-06-01 22:09:58 +02:00
Robert Schulze
366f368d06
Disallow LIKE patterns with trailing escape
Trailing escape ('ab\') is disallowed in SQL, in standardese:

  "If an escape character is specified, then [...] If there is not a
  partitioning of the string PVC into substrings such that each substring
  has length 1 (one) or 2, no substring of length 1 (one) is the escape
  character ECV, and each substring of length 2 is the escape character
  ECV followed by either the escape character ECV, an <underscore>
  character, or the <percent> character, then an exception condition is
  raised: data exception - invalid escape sequence."

I first thought this is checked already higher up in the stack, at least
for const needles, as single trailing backslashes ('ab\') are rejected,
but then I realized that ClickHouse quotes by default. I.e., double
trailing backslashes ('ab\\') are not rejected but when interpreted as
LIKE needle ('ab\') they should.
2022-06-01 21:38:46 +02:00
alesapin
a9af4a80b0 Add first simple implementation 2022-06-01 21:29:10 +02:00
lthaooo
6632616733
Fix TTL merge scheduling bug (#36387) 2022-06-01 21:09:53 +02:00
Azat Khuzhin
545a56ce45 Fix sinks with onException() handler
It is possible to call onException() even after onFinish(), in case of
onFinish() throws, and in this case onException() should be no-op for
such sinks.

Also there can be caveats with PartitionedSync.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
Azat Khuzhin
02af58f41d Fix possible "Cannot write to finalized buffer"
It is still possible to get this error since onException does not
finalize format correctly.

Here is an example of such error, that was found by CI [1]:

<details>

    [ 2686 ] {fa01bf02-73f6-4f7f-b14f-e725de6d7f9b} <Fatal> : Logical error: 'Cannot write to finalized buffer'.
    [ 34577 ] {} <Fatal> BaseDaemon: ########################################
    [ 34577 ] {} <Fatal> BaseDaemon: (version 22.6.1.1, build id: AB8040A6769E01A0) (from thread 2686) (query_id: fa01bf02-73f6-4f7f-b14f-e725de6d7f9b) (query: insert into test_02302 select number from numbers(10) settings s3_truncate_on_insert=1;) Received signal Aborted (6)
    [ 34577 ] {} <Fatal> BaseDaemon:
    [ 34577 ] {} <Fatal> BaseDaemon: Stack trace: 0x7fcbaa5a703b 0x7fcbaa586859 0xfad9bab 0xfad9e05 0xfaf6a3b 0x24a48c7f 0x258fb9b9 0x258f2004 0x258b88f4 0x258b863b 0x2581773d 0x258177ce 0x24bb5e98 0xfad01d6 0xfad0105 0x2419b11d 0xfad01d6 0xfad0105 0x2215afbb 0x2215aa48 0xfad01d6 0xfad0105 0xfcc265d 0x225cc546 0x249a1c40 0x249bc1b6 0x2685902c 0x26859505 0x269d7767 0x269d504c 0x7fcbaa75e609 0x7fcbaa683163
    [ 34577 ] {} <Fatal> BaseDaemon: 3. raise @ 0x7fcbaa5a703b in ?
    [ 34577 ] {} <Fatal> BaseDaemon: 4. abort @ 0x7fcbaa586859 in ?
    [ 34577 ] {} <Fatal> BaseDaemon: 5. ./build_docker/../src/Common/Exception.cpp:47: DB::abortOnFailedAssertion(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xfad9bab in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 6. ./build_docker/../src/Common/Exception.cpp:70: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xfad9e05 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 7. ./build_docker/../src/IO/WriteBuffer.h:0: DB::WriteBuffer::write(char const*, unsigned long) @ 0xfaf6a3b in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 8. ./build_docker/../src/Processors/Formats/Impl/ArrowBufferedStreams.cpp:47: DB::ArrowBufferedOutputStream::Write(void const*, long) @ 0x24a48c7f in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 9. long parquet::ThriftSerializer::Serialize<parquet::format::FileMetaData>(parquet::format::FileMetaData const*, arrow::io::OutputStream*, std::__1::shared_ptr<parquet::Encryptor> const&) @ 0x258fb9b9 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 10. parquet::FileMetaData::FileMetaDataImpl::WriteTo(arrow::io::OutputStream*, std::__1::shared_ptr<parquet::Encryptor> const&) const @ 0x258f2004 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 11. parquet::WriteFileMetaData(parquet::FileMetaData const&, arrow::io::OutputStream*) @ 0x258b88f4 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 12. parquet::ParquetFileWriter::~ParquetFileWriter() @ 0x258b863b in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 13. parquet::arrow::FileWriterImpl::~FileWriterImpl() @ 0x2581773d in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 14. parquet::arrow::FileWriterImpl::~FileWriterImpl() @ 0x258177ce in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 15. ./build_docker/../src/Processors/Formats/Impl/ParquetBlockOutputFormat.h:27: DB::ParquetBlockOutputFormat::~ParquetBlockOutputFormat() @ 0x24bb5e98 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 16. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 17. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 18.1. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:312: std::__1::unique_ptr<DB::WriteBuffer, std::__1::default_delete<DB::WriteBuffer> >::reset(DB::WriteBuffer*)
    [ 34577 ] {} <Fatal> BaseDaemon: 18.2. inlined from ../contrib/libcxx/include/__memory/unique_ptr.h:269: ~unique_ptr
    [ 34577 ] {} <Fatal> BaseDaemon: 18. ../src/Storages/StorageS3.cpp:566: DB::StorageS3Sink::~StorageS3Sink() @ 0x2419b11d in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 19. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 20. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 21. ./build_docker/../contrib/abseil-cpp/absl/container/internal/raw_hash_set.h:1662: absl::lts_20211102::container_internal::raw_hash_set<absl::lts_20211102::container_internal::FlatHashMapPolicy<StringRef, std::__1::shared_ptr<DB::SinkToStorage> >, absl::lts_20211102::hash_internal::Hash<StringRef>, std::__1::equal_to<StringRef>, std::__1::allocator<std::__1::pair<StringRef const, std::__1::shared_ptr<DB::SinkToStorage> > > >::destroy_slots() @ 0x2215afbb in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 22.1. inlined from ./build_docker/../contrib/libcxx/include/string:1445: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__is_long() const
    [ 34577 ] {} <Fatal> BaseDaemon: 22.2. inlined from ../contrib/libcxx/include/string:2231: ~basic_string
    [ 34577 ] {} <Fatal> BaseDaemon: 22. ../src/Storages/PartitionedSink.h:14: DB::PartitionedSink::~PartitionedSink() @ 0x2215aa48 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 23. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:173: std::__1::__shared_count::__release_shared() @ 0xfad01d6 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 24. ./build_docker/../contrib/libcxx/include/__memory/shared_ptr.h:216: std::__1::__shared_weak_count::__release_shared() @ 0xfad0105 in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 25. ./build_docker/../contrib/libcxx/include/vector:802: std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor> > >::__base_destruct_at_end(std::__1::shared_ptr<DB::IProcessor>*) @ 0xfcc265d in /usr/bin/clickhouse
    [ 34577 ] {} <Fatal> BaseDaemon: 26.1. inlined from ./build_docker/../contrib/libcxx/include/vector:402: ~vector
    [ 34577 ] {} <Fatal> BaseDaemon: 26.2. inlined from ../src/QueryPipeline/QueryPipeline.cpp:29: ~QueryPipeline
    [ 34577 ] {} <Fatal> BaseDaemon: 26. ../src/QueryPipeline/QueryPipeline.cpp:535: DB::QueryPipeline::reset() @ 0x225cc546 in /usr/bin/clickhouse
    [ 614 ] {} <Fatal> Application: Child process was terminated by signal 6.

</details>

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37542/8a224239c1d922158b4dc9f5d6609dca836dfd06/stress_test__undefined__actions_.html

Follow-up for: #36979

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
Azat Khuzhin
62d78d8f20 Fix WriteBufferFromS3 is_finalized check in case of exception
WriteBufferFromS3::is_finalized is not set if finalizeImpl() throws,
while WriteBuffer::finalized correctly set even in case of exception, so
it should be used instead.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-01 21:50:30 +03:00
Alexander Tokmakov
06f80770b8 fix stuck REPALCE_RANGE 2022-06-01 20:11:53 +02:00
Robert Schulze
b3b0716b32
Merge pull request #37544 from ClickHouse/cached_patterns
Cache compiled regexps when evaluating non-const needles
2022-06-01 19:55:25 +02:00
Nikolai Kochetov
cc0d5a0daa Fix test again. 2022-06-01 17:39:12 +00:00
Nikolai Kochetov
9b131f2d2d Fix tewst again. 2022-06-01 16:56:26 +00:00
avogar
4abfd54dd6 Fix possible segfault in schema inference 2022-06-01 16:53:37 +00:00
Alexey Milovidov
89638de521
Merge pull request #37738 from ClickHouse/fix-intersect-with-const
Fix `Intersect` with constant strings
2022-06-01 19:31:55 +03:00
Yakov Olkhovskiy
e23cec01d5
Merge pull request #37581 from ClickHouse/http-named-collection
Support for HTTP source for Data Dictionaries in Named Collections
2022-06-01 11:55:04 -04:00
Anton Popov
1ef48c3a4a turn on setting output_format_json_named_tuples_as_objects by default 2022-06-01 15:42:12 +00:00
avogar
7ef02a2e44 Fix possible logical error in values table function 2022-06-01 15:32:33 +00:00
Nikolai Kochetov
6e924cdc77 Fix some more tests. 2022-06-01 15:21:47 +00:00
Dmitry Novik
7fbe91ca81
Merge pull request #37460 from ClickHouse/memory-overcommit-improvement
Memory Overcommit: update defaults, exception message and add ProfileEvent
2022-06-01 17:06:33 +02:00
Sema Checherinda
16dc3ed97d FR: Expose what triggered the merge in system.part_log #26255 2022-06-01 16:58:07 +02:00
Sema Checherinda
2626a49616 FR: Expose what triggered the merge in system.part_log #26255 2022-06-01 16:58:06 +02:00
Alexander Tokmakov
3d346c766a better code 2022-06-01 16:49:26 +02:00
HeenaBansal2009
10990402ac Removed move ctor from EventBase hierarchy 2022-06-01 06:01:22 -07:00
Kseniia Sumarokova
7afcfcbaaf
Merge pull request #37691 from kssenii/fix-rabbitmq-restart-with-no-settings
Fix rabbitmq restart with empty settings
2022-06-01 14:59:34 +02:00
flynn
b62e4cec65 Fix crash of FunctionHashID 2022-06-01 12:39:16 +00:00
Alexander Tokmakov
75f49a48e1 Merge branch 'master' into fix_trash 2022-06-01 14:20:46 +02:00
Anton Popov
011131198c Merge remote-tracking branch 'upstream/master' into fix-mutations-again 2022-06-01 12:04:18 +00:00
Nikolai Kochetov
e401ab8169 Fix more tests. 2022-06-01 11:51:56 +00:00
Antonio Andelic
08c20be4d0 Cleaner exception handling in ParallelReadBuffer 2022-06-01 11:51:01 +00:00
Robert Schulze
ee302f2d9f
Merge pull request #37643 from amosbird/avoid-useless-context-copy
Avoid useless context copy when building query interpreters
2022-06-01 13:49:56 +02:00
Antonio Andelic
f49dd19e7a Revert "Initialize ParallelReadBuffer after construction"
This reverts commit 31e1e67836.
2022-06-01 11:43:58 +00:00
Kruglov Pavel
251be860e7
Merge pull request #37428 from loyd/fix/37420-rowbinary-bom
Stop removing UTF-8 BOM in RowBinary format
2022-06-01 13:36:55 +02:00
Vladimir C
8c0dba7302
Merge pull request #37650 from amosbird/joinget-fix
Fix joinGet with  join_use_nulls = 1 and Array type
2022-06-01 13:30:29 +02:00
Vladimir C
c466cdebf4
Merge pull request #37530 from vdimir/join_cond_dict_issue_37386 2022-06-01 13:29:01 +02:00
Antonio Andelic
ded1398565 Fix intersect with const string 2022-06-01 11:13:33 +00:00
Anton Popov
cae2478b3f Revert "Merge pull request #37355 from ClickHouse/revert-37266-fix-mutations-with-object"
This reverts commit a53cfa9fca, reversing
changes made to 9acb42fcdb.
2022-06-01 10:57:20 +00:00
Robert Schulze
600512cc08
Replace exceptions thrown for programming errors by asserts 2022-06-01 11:53:37 +02:00
Alexey Milovidov
31b3350749
Merge pull request #37710 from ClickHouse/fix-grouping-function
Make GROUPING function skip constant folding
2022-06-01 12:00:14 +03:00
Alexey Milovidov
a0020cb55c
Merge pull request #37724 from CurtizJ/fix-ast-optimizations-remote
Fix `optimize_monotonous_functions_in_order_by` in distributed queries
2022-06-01 11:54:45 +03:00
Han Fei
ea693dd0c2 add config and change test logic 2022-06-01 14:57:07 +08:00
Antonio Andelic
31e1e67836 Initialize ParallelReadBuffer after construction 2022-06-01 06:25:32 +00:00
Paul Loyd
32d267ec6c
Stop removing UTF-8 BOM in RowBinary* formats
Fixes #37420
2022-06-01 13:12:55 +08:00
Anton Popov
6cf9405f09 fix optimize_monotonous_functions_in_order_by in distributed queries 2022-06-01 00:50:28 +00:00
Anton Popov
20e319d67a
Merge pull request #37666 from CurtizJ/optimize-coalesce
Optimize function `COALESCE` with two arguments
2022-05-31 23:48:13 +02:00
Nikolai Kochetov
04c14e9c5d Fix tests and add comment. 2022-05-31 20:59:50 +00:00
Alexander Gololobov
26609a1875 Style fixes 2022-05-31 21:41:10 +02:00
Nikolai Kochetov
9954c59dc1 Update test. 2022-05-31 19:40:50 +00:00
Nikolai Kochetov
86fbb74703 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-31 18:07:47 +00:00
Nikolai Kochetov
32010f0ba8 Add a test. 2022-05-31 17:56:48 +00:00
vdimir
284c9bc68b
Rollback some changes from appendFromBlock 2022-05-31 17:40:35 +00:00
mergify[bot]
d57d987a02
Merge branch 'master' into sql-user-defined-functions-readonly-fix 2022-05-31 16:50:00 +00:00
Maksim Kita
66f43b9ad3 Fix executable user default functions execution with Nullable arguments 2022-05-31 18:46:33 +02:00
Dmitry Novik
b11749ca2c Make GROUPING function skip constant folding 2022-05-31 16:45:29 +00:00
Anton Kozlov
3576625647 CLICKHOUSE-2131 Add an option to disable connection pooling in ODBC bridge 2022-05-31 16:26:08 +00:00
vdimir
e7be677fca
Assert structure match up to locard in appendFromBlock 2022-05-31 16:02:58 +00:00
vdimir
7f4ddb1667
Fix assert for 02244_lowcardinality_hash_join 2022-05-31 16:02:57 +00:00
vdimir
2476c6a988
Fix error on joining with dictionary on some conditions 2022-05-31 16:02:57 +00:00
Vladimir C
2a38fdb796
Merge pull request #37653 from vdimir/cross_join_dup_col_names 2022-05-31 17:50:19 +02:00
Maksim Kita
d1a4550b4f Fix create or drop of sql user defined functions in readonly mode 2022-05-31 17:23:41 +02:00
Alexey Milovidov
4bb04f913f Fix clang-tidy-14 2022-05-31 17:20:07 +02:00
avogar
858570d335 Refactor docs related to format settings 2022-05-31 15:18:49 +00:00
Han Fei
5693e6212d add config and fix style check 2022-05-31 23:18:05 +08:00
HeenaBansal2009
2584bbb7f1 Fix Style check 2022-05-31 07:49:54 -07:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
Anton Popov
00f87b0f57 replace multiIf to if in case of one condition 2022-05-31 14:45:12 +00:00
Nikolai Kochetov
147a819221 Refactor a little bit more. 2022-05-31 14:43:38 +00:00
Dmitry Novik
b41fe00f31
Merge pull request #37542 from azat/grouping-sets-fix-optimize_aggregation_in_order
Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR)
2022-05-31 15:31:45 +02:00
Dmitry Novik
f58623a375
Merge pull request #37593 from azat/union-type-cast-resubmit
Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-31 15:27:50 +02:00
mergify[bot]
ba49c6bb46
Merge branch 'master' into memory-overcommit-improvement 2022-05-31 13:17:06 +00:00
alesapin
473b0bd0db
Merge pull request #37604 from ClickHouse/turn_on_s3_tests
Turn on s3 tests to red mode
2022-05-31 15:01:24 +02:00
kssenii
c2087b3145 Fix 2022-05-31 14:38:11 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
kssenii
69cd3a2b10 Fix 2022-05-31 14:20:31 +02:00
mergify[bot]
d85c3ec69e
Merge branch 'master' into turn_on_s3_tests 2022-05-31 11:58:16 +00:00
mergify[bot]
1e08046c47
Merge branch 'master' into cleanup_unused 2022-05-31 10:28:19 +00:00
taiyang-li
047387bf1c fix 2 bugs: 1. select count(1) from hive_table; 2. select _file, _path from hive_table 2022-05-31 17:39:02 +08:00
mergify[bot]
f90dddccba
Merge branch 'master' into fix-temp-table-drop 2022-05-31 09:10:00 +00:00
Kseniia Sumarokova
73ed9c3977
Merge pull request #37619 from Vxider/wv-fix-table-identifier
Fix bugs in WindowView when using table identifier
2022-05-31 11:07:11 +02:00
zvonand
869486cc3b fix segfault(2) 2022-05-31 11:40:49 +03:00
Yakov Olkhovskiy
873ac9f8ff
Merge pull request #37540 from ClickHouse/feature-server-certificate
showCertificate function implementation
2022-05-31 02:50:03 -04:00
xlwh
ba4cdd43bd Cleanup unused file 2022-05-31 14:37:30 +08:00
zhanglistar
53020b096d
Merge branch 'ClickHouse:master' into typo 2022-05-31 11:28:12 +08:00
HeenaBansal2009
3976afa56a Fix build failures 2022-05-30 20:06:27 -07:00
yaqi-zhao
a2857491c4 add avx512 support for mergetreereader 2022-05-30 20:53:00 -04:00
Yakov Olkhovskiy
c6b20cd5ed
Merge pull request #37187 from Algunenano/floating_seconds
Allow decimal values in settings using seconds
2022-05-30 20:33:47 -04:00
zvonand
8aebaa7194 fix segfault 2022-05-31 01:33:44 +03:00
Anton Popov
30f8eb800a optimize function coalesce with two arguments 2022-05-30 22:29:35 +00:00
Dmitry Novik
9d04305a5a
Update Settings.h 2022-05-30 23:00:28 +02:00
mergify[bot]
55913cf8e1
Merge branch 'master' into turn_on_s3_tests 2022-05-30 20:52:40 +00:00
Nikolai Kochetov
df0d580a8c Fix another one test. 2022-05-30 19:29:57 +00:00
Kseniia Sumarokova
18bda56e4c
Merge pull request #37655 from ClickHouse/kssenii-patch-3-1
Fix hung check
2022-05-30 21:22:12 +02:00
mergify[bot]
b43cfd056f
Merge branch 'master' into floating_seconds 2022-05-30 19:18:35 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
Nikolai Kochetov
913e7a91ae Fix limits from subquery. 2022-05-30 18:25:17 +00:00
HeenaBansal2009
b7eb6bbd38 Fixed clang-tidy-CheckTriviallyCopyableMove-errors 2022-05-30 11:09:03 -07:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.

(Side note: the current caching scheme is unbounded in size which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns ..., see next
question)

Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
   collision list which grows to a certain size (here: 10 elements) and
   afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
   collision list in which a collision replaces the stored element, idea
   by Alexey)

... using a haystack of 2 mio strings and
A). 2 mio distinct simple patterns
B). 10 simple patterns
C)  2 mio distinct complex patterns
D)  10 complex patterns

Fo A) and C), caching does not help but these queries still allow to
judge the static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%'). Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.

Results in sec, averaged over on multiple measurements;

1.A): 2.12
  B): 1.68
  C): 9.75
  D): 9.45

2.A): 2.17
  B): 1.73
  C): 9.78
  D): 9.47

3.A): 9.8
  B): 0.63
  C): 31.8
  D): 0.98

4.A): 2.14
  B): 0.29
  C): 9.82
  D): 0.41

5.A) 2.12 / 2.15 / 2.26
  B) 1.51 / 0.43 / 0.30
  C) 9.97 / 9.88 / 10.13
  D) 5.70 / 0.42 / 0.43
(10/100/1000 buckets, resp. 10/1/0.1% collision rate)

Evaluation:

1. This is the baseline. It was surprised that complex patterns (C, D)
   slow down the queries so badly compared to simple patterns (A, B).
   The runtime includes evaluation costs, but as caching only helps with
   compilation, and looking at 4.D and 5.D, compilation makes up over 90%
   of the runtime!

2. No speedup compared to 1, probably due to locking overhead. The cache
   is unbounded, and in experiments with data sets > 2 mio rows, 2. is
   the only scheme to throw OOM exceptions which is not acceptable.

3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
   bad runtimes due to LRU queue maintenance and locking. Works pretty
   well however with few distinct patterns (B and D).

4. This scheme is tailored to queries B and D where it performs pretty
   good. More importantly, the caching is lightweight enough to not
   deteriorate performance on datasets A and C.

5. After some tuning of the hash map size, 100 buckets seem optimal to
   be in the same ballpark with 10 distinct patterns as 4. Performance
   also does not deteriorate on A and C compared to the baseline.
   Unlike 4., this scheme behaves LRU-like and can adjust to changing
   pattern distributions.

As a conclusion, this commit implementes two things:

1. Based on Q1, pattern search with const needle no longer uses
   caching. This applies to LIKE and MATCH + a few (exotic) other SQL
   functions. The code for the unbounded caching was removed.

2. Based on Q2, pattern search with non-const needles now use method 5.
2022-05-30 20:00:35 +02:00
Alexander Gololobov
e2dd6f6249 Removed prewhere_info.alias_actions 2022-05-30 19:58:23 +02:00
Han Fei
e15cdec39c address comments 2022-05-31 01:46:31 +08:00
Anton Popov
52d3791eb9
Merge pull request #37600 from CurtizJ/fix-with-fill-interval
Fix `WITH FILL` with negative intervals in `STEP` clause
2022-05-30 19:43:12 +02:00
alesapin
6db44f633f
Merge pull request #37641 from azat/keeper-list-watches
keeper: store only unique session IDs for watches (should fix SIGKILL in stress tests)
2022-05-30 18:55:52 +02:00
Fred Wulff
537050ef8e Add support for DeleteObject when chunk_size == 0 2022-05-30 16:54:44 +00:00
mergify[bot]
d4e722bbfa
Merge branch 'master' into http-named-collection 2022-05-30 16:40:18 +00:00
Han Fei
af86900c52 Merge branch 'hanfei/zk-write' of github.com:hanfei1991/ClickHouse into hanfei/zk-write 2022-05-31 00:17:38 +08:00
Han Fei
42fca8d5f0 address comments 2022-05-31 00:17:32 +08:00
Han Fei
a464b10afe
Update src/Storages/System/StorageSystemZooKeeper.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2022-05-30 23:52:37 +08:00
Han Fei
194445646a
Update src/Storages/System/StorageSystemZooKeeper.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2022-05-30 23:47:43 +08:00
vdimir
8a3f4bda62
Fix columns number mismatch in cross join 2022-05-30 15:40:15 +00:00
Kseniia Sumarokova
b1ba7b7027
Fix build 2022-05-30 17:30:59 +02:00
Kseniia Sumarokova
1869adfd7d
Update FileCache.cpp 2022-05-30 16:06:58 +02:00
Nikolai Kochetov
c71256ea38 Remove some commented code. 2022-05-30 13:18:20 +00:00
Nikolai Kochetov
5ef51ed27b Fix more tests. 2022-05-30 13:10:30 +00:00
Amos Bird
b68e8efaf0
Fix joinGet with cannot be null type 2022-05-30 21:01:27 +08:00
Alexander Tokmakov
351956d108
Merge pull request #37640 from azat/transaction-fix
Fix excessive LIST requests to coordinator for transactions
2022-05-30 15:46:52 +03:00
Kruglov Pavel
0615866aea
Merge pull request #37450 from Avogar/check-format-on-storage-creation
Check format name on storage creation
2022-05-30 14:23:20 +02:00
avogar
139a7e19a9 Fix comments 2022-05-30 11:43:29 +00:00
alesapin
87baabb1a8 Followup fix 2022-05-30 13:34:42 +02:00
alesapin
362fa745e6 Ignore broken metadata 2022-05-30 13:32:12 +02:00
Nikolai Kochetov
5b4658aa5e Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-30 09:47:35 +00:00
Amos Bird
6525bfc4cd
Avoid context copy for InterpreterSelects 2022-05-30 17:08:12 +08:00
Azat Khuzhin
d86181d3cd keeper: store only unique session IDs for watches
This should speed up keeper, especially in case of incorrect usage (like
the case that had been fixed in #37640), especially in case on non
release build.

And also this should fix SIGKILL in stress tests.

You will find some details for one of such SIGKILL in `<details>` tag [1]:

<details>

    $ pigz -cd clickhouse-server.stress.log.gz | tail
    2022.05.27 16:17:24.882971 [ 637 ] {} <Trace> BackgroundSchedulePool/BgSchPool: Waiting for threads to finish.
    2022.05.27 16:17:24.896749 [ 637 ] {} <Debug> MemoryTracker: Peak memory usage (for query): 4.09 MiB.
    2022.05.27 16:17:24.907163 [ 637 ] {} <Debug> Application: Shut down storages.
    2022.05.27 16:17:24.907233 [ 637 ] {} <Debug> Application: Waiting for current connections to servers for tables to finish.
    2022.05.27 16:17:24.934335 [ 637 ] {} <Information> Application: Closed all listening sockets. Waiting for 1 outstanding connections.
    2022.05.27 16:17:29.843491 [ 637 ] {} <Information> Application: Closed connections to servers for tables. But 1 remain. Probably some tables of other users cannot finish their connections after context shutdown.
    2022.05.27 16:17:29.843632 [ 637 ] {} <Debug> KeeperDispatcher: Shutting down storage dispatcher
    2022.05.27 16:17:34.612616 [ 688 ] {} <Test> virtual Coordination::ZooKeeperRequest::~ZooKeeperRequest(): Processing of request xid=2147483647 took 10000 ms
    2022.05.27 16:17:54.612109 [ 3176 ] {} <Debug> KeeperTCPHandler: Session #12 expired
    2022.05.27 16:19:59.823038 [ 635 ] {} <Fatal> Application: Child process was terminated by signal 9 (KILL). If it is not done by 'forcestop' command or manually, the possible cause is OOM Killer (see 'dmesg' and look at the '/var/log/kern.log' for the details).

    Thread 26 (Thread 0x7f1c7703f700 (LWP 708)):
    0  0x000000000b074b2a in __tsan::MemoryAccessImpl(__tsan::ThreadState*, unsigned long, int, bool, bool, unsigned long long*, __tsan::Shadow) ()
    1  0x000000000b08630c in __tsan::MemoryAccessRange(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
    2  0x000000000b01ff03 in memmove ()
    3  0x000000001bbc8996 in std::__1::__move<long, long> (__first=0xb8600000d83304, __last=<optimized out>, __result=0x7f1c021cd000) at ../contrib/libcxx/include/__algorithm/move.h:57
    4  std::__1::move<long*, long*> (__first=0xb8600000d83304, __last=<optimized out>, __result=0x7f1c021cd000) at ../contrib/libcxx/include/__algorithm/move.h:70
    5  std::__1::vector<long, std::__1::allocator<long> >::erase (this=0x7b1400584c48, __position=...) at ../contrib/libcxx/include/vector:1608
    6  DB::KeeperStorage::clearDeadWatches (this=0x7b5800001ad8, this@entry=0x7b5800001800, session_id=session_id@entry=12) at ../src/Coordination/KeeperStorage.cpp:1228
    7  0x000000001bbc5c55 in DB::KeeperStorage::processRequest (this=0x7b5800001800, zk_request=..., session_id=12, time=1, new_last_zxid=..., check_acl=true) at ../src/Coordination/KeeperStorage.cpp:1122
    8  0x000000001bba06a3 in DB::KeeperStateMachine::commit (this=<optimized out>, log_idx=3549, data=...) at ../src/Coordination/KeeperStateMachine.cpp:143
    9  0x000000001bba6193 in nuraft::state_machine::commit_ext (this=0x7b4c00001f98, params=...) at ../contrib/NuRaft/include/libnuraft/state_machine.hxx:75
    10 0x00000000202c5a55 in nuraft::raft_server::commit_app_log (this=this@entry=0x7b6c00002a18, idx_to_commit=idx_to_commit@entry=3549, le=..., need_to_handle_commit_elem=true, initial_commit_exec=false) at ../contrib/NuRaft/src/handle_commit.cxx:311
    11 0x00000000202c4f98 in nuraft::raft_server::commit_in_bg_exec (this=<optimized out>, this@entry=0x7b6c00002a18, timeout_ms=timeout_ms@entry=0, initial_commit_exec=false) at ../contrib/NuRaft/src/handle_commit.cxx:241
    12 0x00000000202c4613 in nuraft::raft_server::commit_in_bg (this=this@entry=0x7b6c00002a18) at ../contrib/NuRaft/src/handle_commit.cxx:149
    ...
    Thread 28 (Thread 0x7f1c7603d700 (LWP 710)):
    0  0x00007f1d22a6d110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
    1  0x00007f1d22a650a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
    2  0x000000000b0337b0 in pthread_mutex_lock ()
    3  0x00000000221884da in std::__1::__libcpp_mutex_lock (__m=0x7b4c00002088) at ../contrib/libcxx/include/__threading_support:303
    4  std::__1::mutex::lock (this=0x7b4c00002088) at ../contrib/libcxx/src/mutex.cpp:33
    5  0x000000001bba4188 in std::__1::lock_guard<std::__1::mutex>::lock_guard (__m=..., this=<optimized out>) at ../contrib/libcxx/include/__mutex_base:91
    6  DB::KeeperStateMachine::getDeadSessions (this=0x7b4c00001f98) at ../src/Coordination/KeeperStateMachine.cpp:360
    7  0x000000001bb79b4b in DB::KeeperServer::getDeadSessions (this=0x7b4400012700) at ../src/Coordination/KeeperServer.cpp:572
    8  0x000000001bb64d1a in DB::KeeperDispatcher::sessionCleanerTask (this=<optimized out>, this@entry=0x7b640001c218) at ../src/Coordination/KeeperDispatcher.cpp:399
    ...
    Thread 1 (Thread 0x7f1d227148c0 (LWP 637)):
    0  0x00007f1d22a69376 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    1  0x000000000b0895e0 in __tsan::call_pthread_cancel_with_cleanup(int (*)(void*), void (*)(void*), void*) ()
    2  0x000000000b017091 in pthread_cond_wait ()
    3  0x0000000020569d98 in Poco::EventImpl::waitImpl (this=0x7b2000008798) at ../contrib/poco/Foundation/src/Event_POSIX.cpp:106
    4  0x000000001bb636cf in Poco::Event::wait (this=0x7b2000008798) at ../contrib/poco/Foundation/include/Poco/Event.h:97
    5  ThreadFromGlobalPool::join (this=<optimized out>) at ../src/Common/ThreadPool.h:217
    6  DB::KeeperDispatcher::shutdown (this=0x7b640001c218) at ../src/Coordination/KeeperDispatcher.cpp:322
    7  0x0000000019ca8bfc in DB::Context::shutdownKeeperDispatcher (this=<optimized out>) at ../src/Interpreters/Context.cpp:2111
    8  0x000000000b0a979b in DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)::$_9::operator()() const (this=0x7ffcde44f0a0) at ../programs/server/Server.cpp:1407

</details>

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37593/2613149f6bf4f242bbbf2c3c8539b5176fd77286/stress_test__thread__actions_.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-30 11:50:48 +03:00
Azat Khuzhin
78bd47d8df Fix excessive LIST requests to coordinator for transactions
In [1] there was only few transactions, but lots of List for /test/clickhouse/txn/log:

    $ clickhouse-local --format TSVWithNamesAndTypes --file zookeeper_log.tsv.gz -q "select * except('path|session_id|event_time|thread_id|event_date|xid') apply(x->groupUniqArray(x)), path, session_id, min(event_time), max(event_time), count() from table where has_watch and type = 'Request' group by path, session_id order by count() desc limit 1 format Vertical"
    Row 1:
    ──────
    groupUniqArray(type):             ['Request']
    groupUniqArray(query_id):         ['','62d75128-9031-48a5-87ba-aec3f0b591c6']
    groupUniqArray(address):          ['::1']
    groupUniqArray(port):             [9181]
    groupUniqArray(has_watch):        [1]
    groupUniqArray(op_num):           ['List']
    groupUniqArray(data):             ['']
    groupUniqArray(is_ephemeral):     [0]
    groupUniqArray(is_sequential):    [0]
    groupUniqArray(version):          []
    groupUniqArray(requests_size):    [0]
    groupUniqArray(request_idx):      [0]
    groupUniqArray(error):            []
    groupUniqArray(watch_type):       []
    groupUniqArray(watch_state):      []
    groupUniqArray(stat_version):     [0]
    groupUniqArray(stat_cversion):    [0]
    groupUniqArray(stat_dataLength):  [0]
    groupUniqArray(stat_numChildren): [0]
    groupUniqArray(children):         [[]]
    path:                             /test/clickhouse/txn/log
    session_id:                       1
    min(event_time):                  2022-05-27 12:54:09.025897
    max(event_time):                  2022-05-27 13:37:12.846314 <!-- last transaction was at 12:54, see server log
    count():                          3673675 <-- huge

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37593/2613149f6bf4f242bbbf2c3c8539b5176fd77286/stateless_tests__debug__actions__[1/3].html

Server log:

    $ pigz -cd clickhouse-server.log.gz | fgrep TransactionLog: | head -1
    2022.05.27 12:54:09.026852 [ 5018 ] {62d75128-9031-48a5-87ba-aec3f0b591c6} <Trace> TransactionLog: Loading 33 entries from /test/clickhouse/txn/log: csn-0000000000..csn-0000000032
    $ pigz -cd clickhouse-server.log.gz | fgrep TransactionLog: | tail -1
    2022.05.27 12:54:58.909222 [ 509 ] {} <Test> TransactionLog: Closing readonly transaction (177, 38, 41b51ff1-bcba-43bf-bcea-e97ad05f6040)
    $ pigz -cd clickhouse-server.log.gz | fgrep 62d75128-9031-48a5-87ba-aec3f0b591c6 | tail -1
    2022.05.27 12:54:09.064857 [ 5018 ] {62d75128-9031-48a5-87ba-aec3f0b591c6} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

Fixes: #37398 (cc @tavplubix)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-30 11:39:34 +03:00
flynn
9ffa5f5e0d fix typo 2022-05-30 08:10:16 +00:00
Alexey Milovidov
1d6f9de001 Merge branch 'llvm-14' of github.com:ClickHouse/ClickHouse into llvm-14 2022-05-30 05:36:43 +02:00
Alexey Milovidov
f1fb57c6ce Fix clang-tidy-14 2022-05-30 05:36:26 +02:00
Alexey Milovidov
9a5dd75a68 Merge branch 'master' into llvm-14 2022-05-30 05:34:38 +02:00
Alexey Milovidov
c0e6ff4216 More precise result of "dumpColumnStructure" and "byteSize" miscellaneous functions 2022-05-30 04:56:54 +02:00
taiyang-li
dbb8a09825 merge master and solve conflict 2022-05-30 10:47:04 +08:00
taiyang-li
0b0d38b18c Merge branch 'master' into fix_settings 2022-05-30 10:07:09 +08:00
taiyang-li
51a893c8be add some metrics 2022-05-30 10:05:20 +08:00
Andrey Zvonov
55a9b99cb4 style fix(2) 2022-05-30 02:48:28 +03:00
Andrey Zvonov
c79d87e629 fix style 2022-05-30 02:26:02 +03:00
alesapin
6f35c28592 Fix style 2022-05-30 00:29:30 +02:00
alesapin
c32b6076fb Remove stranges from code 2022-05-30 00:12:33 +02:00
Alexey Milovidov
5fda199dcf
Update src/Common/ErrorCodes.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@clickhouse.com>
2022-05-30 00:20:49 +03:00
alesapin
0e8ab36913 Merge branch 'master' into turn_on_s3_tests 2022-05-29 14:37:10 +02:00
alesapin
d2cdbf3956 Fix refactoring issue 2022-05-29 14:09:49 +02:00
alesapin
9dc81e5cc8
Merge pull request #37598 from ClickHouse/revert-37545-revert-37424-fix_fetching_part_deadlock
Revert "Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part""
2022-05-29 13:54:34 +02:00
Andrey Zvonov
2dbbf14de5
Merge branch 'master' into non-neg-deriv 2022-05-29 10:09:51 +03:00
Alexander Tokmakov
579b0e3323
Merge pull request #37627 from ClickHouse/revert-37416-fix_ReplicatedMergeTree_comments
Revert "Implemented changing comment to a ReplicatedMergeTree table"
2022-05-29 09:57:12 +03:00
Alexey Milovidov
9e3242f186
Merge pull request #37617 from CurtizJ/aggregation-sparse-columns
Better performance with sparse columns in aggregate functions
2022-05-29 09:36:07 +03:00
Alexander Tokmakov
562eec591e
Revert "Implemented changing comment to a ReplicatedMergeTree table" 2022-05-29 09:28:47 +03:00
Alexey Milovidov
97606c324c
Merge pull request #37574 from azat/mt-tiny-refactor
Remove unused MergeTreeDataMergerMutator::chooseMergeAlgorithm()
2022-05-29 07:59:57 +03:00
zvonand
18e74eb540 remove old files 2022-05-29 03:48:59 +03:00
zvonand
295a0f9ec2 added tests 2022-05-29 03:38:42 +03:00
Alexey Milovidov
c1169019d2 Merge branch 'master' into llvm-14 2022-05-29 02:29:02 +02:00
Alexey Milovidov
11788c8129 Fix clang-tidy-14 2022-05-29 02:28:46 +02:00
zvonand
032e54abbf works now 2022-05-29 03:21:07 +03:00
Alexey Milovidov
73e2e63414
Merge pull request #37612 from ClickHouse/clang-tidy-14
Fix clang-tidy-14, part 1
2022-05-29 03:16:32 +03:00
Alexey Milovidov
4e60c88a27
Merge pull request #37609 from DevTeamBK/Medium-Clang-Tidy-Fix
Fix Clang-Tidy:  remove std::move() from trivially-copyable object
2022-05-28 21:27:08 +03:00
Alexander Tokmakov
4e52f45695 Merge branch 'master' into fix_trash 2022-05-28 19:43:19 +02:00
Anton Popov
c39d95e2e6 add perf test 2022-05-28 12:56:38 +00:00
Anton Popov
1d9b3be7da
Merge pull request #37536 from CurtizJ/profile-events-for-part-types
Add profile events for introspection of part types
2022-05-28 14:25:21 +02:00
Kseniia Sumarokova
8be351717f
Merge pull request #37606 from ClickHouse/kssenii-patch-3
Update FileCache.cpp
2022-05-28 12:48:45 +02:00
Han Fei
340a264a62 fix style 2022-05-28 18:26:14 +08:00
Han Fei
0a0d77bdef fix build 2022-05-28 17:57:59 +08:00
Han Fei
0f71231574 try fix flaky tests and refine code style 2022-05-28 17:25:33 +08:00
Vxider
b24346328d fix parser when using table identifer 2022-05-28 08:22:34 +00:00
Anton Popov
b2cff26ecf better performace with sparse columns in aggregate functions 2022-05-28 02:22:20 +00:00
Alexey Milovidov
eff285e24a
Update ThreadFuzzer.cpp 2022-05-28 05:13:16 +03:00
Alexander Gololobov
6a57e1a970
Merge pull request #37601 from ClickHouse/array_norm_dist_fixes
Added LpNorm and LpDistance functions for arrays
2022-05-27 23:40:38 +02:00
Alexey Milovidov
d6597efc08 Fix clang-tidy-14, part 1 2022-05-27 23:03:30 +02:00
Alexey Milovidov
be07c4c4b0 Fix clang-tidy-14, part 1 2022-05-27 23:03:16 +02:00
Alexey Milovidov
d62c57be3f Fix clang-tidy-14, part 1 2022-05-27 23:02:25 +02:00
Alexey Milovidov
8e9d771237 Fix clang-tidy-14, part 1 2022-05-27 23:02:05 +02:00
Alexey Milovidov
6c2699a991 Fix clang-tidy-14, part 1 2022-05-27 23:00:45 +02:00
Alexey Milovidov
3086c19341 Fix clang-tidy-14, part 1 2022-05-27 23:00:23 +02:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
Alexey Milovidov
d2c6fd90cb Fix clang-tidy-14, part 1 2022-05-27 22:51:37 +02:00
Kseniia Sumarokova
10c9716467
Fix clang-tidy 2022-05-27 22:48:07 +02:00
Nikolai Kochetov
b80b1940ce Fix some tests. 2022-05-27 20:47:35 +00:00
Alexander Tokmakov
7e659036f8 fix 2022-05-27 20:30:06 +02:00
HeenaBansal2009
a061acadbe Remove std::move from trivially-copyable object 2022-05-27 11:04:29 -07:00
Yakov Olkhovskiy
41ef0044f0 endpoint is added 2022-05-27 13:43:34 -04:00
mergify[bot]
f5ee337bab
Merge branch 'master' into revert-37545-revert-37424-fix_fetching_part_deadlock 2022-05-27 16:52:00 +00:00
mergify[bot]
923ad2e905
Merge branch 'master' into turn_on_s3_tests 2022-05-27 16:31:43 +00:00
Dmitry Novik
60b9d81773 Remove global_memory_usage_overcommit_max_wait_microseconds 2022-05-27 16:30:29 +00:00
alesapin
f63fa9bcc6
Merge pull request #37416 from Enmk/fix_ReplicatedMergeTree_comments
Implemented changing comment to a ReplicatedMergeTree table
2022-05-27 18:29:34 +02:00
Alexander Gololobov
9b1b30855c Fixed check for HUGE_VAL 2022-05-27 18:25:11 +02:00
Alexander Gololobov
6361c5f38c Fix for failed style check 2022-05-27 18:22:16 +02:00
Kseniia Sumarokova
8099361cbc
Update FileCache.cpp 2022-05-27 17:48:14 +02:00
Alexander Gololobov
540353566c Added LpNorm and LpDistance functions for arrays 2022-05-27 17:17:08 +02:00
Azat Khuzhin
8a224239c1 Prohibit optimize_aggregation_in_order with GROUPING SETS
AggregatingStep ignores it anyway, and it leads to the following error
in getSortDescriptionFromGroupBy(), like in [1]:

    2022.05.24 04:29:29.279431 [ 3395 ] {26543564-8bc8-4a3a-b984-70a2adf0245d} <Fatal> : Logical error: 'Trying to get name of not a column: ExpressionList'.

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/36914/67d3ac72d26ab74d69f03c03422349d4faae9e19/stateless_tests__ubsan__actions_.html

v2: revert change to getSortDescriptionFromGroupBy() after
    GroupingSetsRewriterVisitor had been introduced
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-27 17:44:57 +03:00
Azat Khuzhin
1f29b0a901 Rewrite queries GROUPING SETS (foo, bar) to GROUP BY foo, bar
This is better then introducing separate
SelectQueryExpressionAnalyzer::useGroupingSetKey(), since for
optimize_aggregation_in_order that method will not be enough, because
size of ManyExpressionActions will not match size of SortDescription, in
ReadInOrderOptimizer::ReadInOrderOptimizer()

And plus it is cleaner.

v2: fix clang-tidy
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-27 17:44:51 +03:00
alesapin
5a296aec01 Fix build 2022-05-27 16:34:16 +02:00
alesapin
be1c3c132b Fix some trash 2022-05-27 16:08:49 +02:00
Anton Popov
abc90fad8d fix WITH FILL with negative itervals 2022-05-27 12:42:51 +00:00
zvonand
5c558d0be9 old work upload 2022-05-27 15:07:22 +03:00
alesapin
6d6779f17a
Merge pull request #37139 from ClickHouse/i_object_storage
Separate object storage operations from disks
2022-05-27 13:59:50 +02:00
alesapin
c79600c4c8 Fix build 2022-05-27 13:44:29 +02:00
taiyang-li
73d2c889c6 fix log level 2022-05-27 19:23:58 +08:00
alesapin
841858ec30
Revert "Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part"" 2022-05-27 13:13:36 +02:00
Azat Khuzhin
2613149f6b Fix converting types for UNION queries (may produce LOGICAL_ERROR)
CI founds [1]:

    2022.02.20 15:14:23.969247 [ 492 ] {} <Fatal> BaseDaemon: (version 22.3.1.1, build id: 6082C357CFA6FF99) (from thread 472) (query_id: a5187ff9-962a-4e7c-86f6-8d48850a47d6) (query: SELECT 0., round(avgWeighted(x, y)) FROM (SELECT toDate(toDate('214748364.8', '-922337203.6854775808', '-0.1', NULL) - NULL, 10.000100135803223, '-2147483647'), 255 AS x, -2147483647 AS y UNION ALL SELECT y, NULL AS x, 2147483646 AS y)) Received signal Aborted (6)

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/0/26d0e5438c86e52a145aaaf4cb523c399989a878/fuzzer_astfuzzerdebug,actions//report.html

The problem is that subqueries returns different headers:
- first query  -- x, y
- second query -- y, x

v2: Make order of columns strict only for UNION
    https://s3.amazonaws.com/clickhouse-test-reports/34775/9cc8c01a463d18c471853568b2f0af659a4e643f/stateless_tests__address__actions__[2/2].html
    Fixes: 00597_push_down_predicate_long
v3: add no-backward-compatibility-check for the test
Fixes: #37569
Resubmit: #34775
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
(cherry picked from commit a813f5996e)
2022-05-27 14:11:57 +03:00
Han Fei
2ea027ffcb Support insert into system.zookeeper 2022-05-27 18:53:12 +08:00
taiyang-li
ea450b86cb add some prefetch metric codes 2022-05-27 18:06:40 +08:00
Kseniia Sumarokova
141334448e
Merge pull request #37566 from kssenii/fix-assertion
Fix failed assertion in cache
2022-05-27 11:59:03 +02:00
Kseniia Sumarokova
2943d44bf1
Merge pull request #37554 from msaf1980/cleanup_hdfs
Cleanup StorageHDFS (unused variables prevent build with clang 12)
2022-05-27 11:57:18 +02:00
Kseniia Sumarokova
f5d69506b4
Merge pull request #37516 from KinderRiven/improve_local_cache
Control cache downloads to avoid negative optimization of local caches
2022-05-27 11:53:17 +02:00
zhanglistar
ca67e67a74 Fix a typo 2022-05-27 15:52:04 +08:00
Robert Schulze
80061aa3e2
Merge remote-tracking branch 'origin/master' into cached_patterns 2022-05-27 09:21:01 +02:00
Vxider
54d6f98122 flush and shutdown temporary table before drop 2022-05-27 04:50:36 +00:00
Dmitry Novik
3a9239b79f
Revert "RFC: Fix converting types for UNION queries (may produce LOGICAL_ERROR)" 2022-05-27 04:05:32 +02:00
Yakov Olkhovskiy
25884c68f1 http named collection source implemented for dictionary 2022-05-26 20:46:26 -04:00
Alexey Milovidov
8ba865bb60
Merge pull request #37344 from excitoon-favorites/fixs3colonandequalssign
Fixed error with symbols in key name in S3
2022-05-27 00:58:35 +03:00
Alexey Milovidov
86afa3a245
Merge pull request #37502 from ClickHouse/array_norm_dist_fixes
Renamed arrayXXNorm/arrayXXDistance functions to XXNorm/XXDistance and fixed some overflow cases
2022-05-27 00:56:29 +03:00
Alexander Gololobov
e655863d53
Merge pull request #37528 from kitaisreal/normalize-utf8-performance-tests-fix
Functions normalizeUTF8 unstable performance tests fix
2022-05-26 20:49:10 +02:00
Azat Khuzhin
c6c60364ae Remove unused MergeTreeDataMergerMutator::chooseMergeAlgorithm()
In favor of
MergeTask::ExecuteAndFinalizeHorizontalPart::chooseMergeAlgorithm()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 21:20:50 +03:00
Alexander Tokmakov
eb71dd4c78
Merge pull request #37547 from ClickHouse/followup_37398
Follow-up to #37398
2022-05-26 20:29:41 +03:00
Azat Khuzhin
73a99d4eee Improve error message for skipped/expired columns
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
f0cac5417a More optional skipping of fully expired columns (by TTL)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
46a94f395d Move the check into IMergedBlockOutputStream::removeEmptyColumnsFromPart()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
4288d09a85 Do not write expired columns by TTL after merge w/o TTL
Usually second merge do not perform TTL, since everything is up to date,
however in this case TTLTransform is not used, and hence expired_columns
will not be filled for new part, and so those columns will be written
with default values.

Avoid this, by manually filling expired_columns.

Here is a simpler reproducer:

Simple reproducer:

    ```sql
    create table ttl_02262 (date Date, key Int, value String TTL date + interval 1 month) engine=MergeTree order by key settings min_bytes_for_wide_part=0, min_rows_for_wide_part=0;
    insert into ttl_02262 values ('2010-01-01', 2010, 'foo');
    ```

    ```sh
    # ls -l .server/data/default/ttl_02262/all_*
    .server/data/default/ttl_02262/all_1_1_0:
    total 48
    -rw-r----- 1 root root 335 May 26 14:19 checksums.txt
    -rw-r----- 1 root root  76 May 26 14:19 columns.txt
    -rw-r----- 1 root root   1 May 26 14:19 count.txt
    -rw-r----- 1 root root  28 May 26 14:19 date.bin
    -rw-r----- 1 root root  48 May 26 14:19 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:19 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:19 key.bin
    -rw-r----- 1 root root  48 May 26 14:19 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:19 primary.idx
    -rw-r----- 1 root root  99 May 26 14:19 ttl.txt
    -rw-r----- 1 root root  30 May 26 14:19 value.bin
    -rw-r----- 1 root root  48 May 26 14:19 value.mrk2
    ```

    ```sql
    optimize table ttl_02262 final;
    ```

    ```sh
    .server/data/default/ttl_02262/all_1_1_1:
    total 40
    -rw-r----- 1 root root 279 May 26 14:19 checksums.txt
    -rw-r----- 1 root root  61 May 26 14:19 columns.txt
    -rw-r----- 1 root root   1 May 26 14:19 count.txt
    -rw-r----- 1 root root  28 May 26 14:19 date.bin
    -rw-r----- 1 root root  48 May 26 14:19 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:19 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:19 key.bin
    -rw-r----- 1 root root  48 May 26 14:19 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:19 primary.idx
    -rw-r----- 1 root root  81 May 26 14:19 ttl.txt
    ```

    ```sql
    optimize table ttl_02262 final;
    ```

    ```sh
    .server/data/default/ttl_02262/all_1_1_2:
    total 48
    -rw-r----- 1 root root 349 May 26 14:20 checksums.txt
    -rw-r----- 1 root root  76 May 26 14:20 columns.txt
    -rw-r----- 1 root root   1 May 26 14:20 count.txt
    -rw-r----- 1 root root  28 May 26 14:20 date.bin
    -rw-r----- 1 root root  48 May 26 14:20 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:20 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:20 key.bin
    -rw-r----- 1 root root  48 May 26 14:20 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:20 primary.idx
    -rw-r----- 1 root root  81 May 26 14:20 ttl.txt
    -rw-r----- 1 root root  27 May 26 14:20 value.bin
    -rw-r----- 1 root root  48 May 26 14:20 value.mrk2
    ```

And now we have `value.*` for all_1_1_2, this should not happen.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
8328d7068b Fix updating of MergeTreeDataPartTTLInfo::finished
Previously you cannot distinguish non-initialized finished with
initialized to false, so update() cannot do the correct thing.

Rename the field to avoid hidden usage.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
0de1a64436 Log empty parts in IMergedBlockOutputStream::removeEmptyColumnsFromPart()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:09 +03:00
Nikolai Kochetov
ec4e8d71b2 Fixing build 2022-05-26 15:33:21 +00:00
Dmitry Novik
673a521d0b
Merge pull request #34775 from azat/union-type-cast
RFC: Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-26 17:28:23 +02:00
kssenii
36af6b1fa8 Fix assertion 2022-05-26 16:15:02 +02:00
alesapin
c862f89b8d Fix tidy 2022-05-26 15:43:21 +02:00
Alexander Tokmakov
e8f33fb0d9 fix flaky tests 2022-05-26 14:17:05 +02:00
Azat Khuzhin
dc9ca3d70c
Fix LOGICAL_ERROR in getMaxSourcePartsSizeForMerge during merges (#37413) 2022-05-26 14:14:58 +02:00
Nikolai Kochetov
bf95541531 Fixing style. 2022-05-26 11:09:36 +00:00
Nikolai Kochetov
84f97b53de Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-26 11:07:45 +00:00
Nikolai Kochetov
fea2401f1f
Merge pull request #37532 from ClickHouse/add-separate-mutex-for-factories-info
Use a separate mutex for query_factories_info in Context.
2022-05-26 13:03:28 +02:00
mergify[bot]
a7629f900f
Merge branch 'master' into normalize-utf8-performance-tests-fix 2022-05-26 10:29:55 +00:00
Maksim Kita
3a92e61827
Merge pull request #37148 from kitaisreal/dictionary-get-descendants-performance-improvement
Dictionary getDescendants performance improvement
2022-05-26 12:29:17 +02:00
KinderRiven
824628c0da fix style 2022-05-26 16:51:16 +08:00
KinderRiven
822ecd982f better & support clean stash 2022-05-26 16:36:05 +08:00
Vasily Nemkov
abe6b5d013
Reverted unnecessary modification 2022-05-26 10:09:27 +03:00
Antonio Andelic
fe236c98d5
Merge pull request #37534 from ClickHouse/revert-37036-keeper-preprocess-operations
Revert "Add support for preprocessing ZooKeeper operations in `clickhouse-keeper`"
2022-05-26 08:14:46 +02:00
taiyang-li
561c87222d add prefetch for hive text 2022-05-26 11:04:35 +08:00
Yakov Olkhovskiy
2dc160a4c3 style fix 2022-05-25 20:56:36 -04:00
Dmitry Novik
5c3c994d2a
Merge pull request #37493 from ClickHouse/grouping-sets-optimization-fix
Fix ORDER BY optimization in case of GROUPING SETS
2022-05-26 02:25:02 +02:00
Anton Popov
f488efd27e fix tests 2022-05-26 00:03:31 +00:00
Dmitry Novik
16c6b60703 Introduce AggregationKeysInfo 2022-05-25 23:22:29 +00:00
alesapin
8f1aac0ce4 Fix merge with master 2022-05-26 00:44:45 +02:00
Alexey Milovidov
f321925032
Merge pull request #36341 from ClickHouse/allow-setuid-inside-clickhouse
Allow to drop privileges at startup
2022-05-26 01:07:04 +03:00
Dmitry Novik
7cd7782e4f Process columns more efficiently in GROUPING() 2022-05-25 21:55:41 +00:00
Dmitry Novik
3c1b6609ae Add comments and make tests more verbose 2022-05-25 21:23:35 +00:00
mergify[bot]
49cce189e3
Merge branch 'master' into floating_seconds 2022-05-25 21:08:55 +00:00
alesapin
1db9cf480b Merge remote-tracking branch 'origin/master' into i_object_storage 2022-05-25 22:50:22 +02:00
Maksim Kita
58cd1bd3ec
Merge pull request #36843 from bharatnc/ncb/h3-unidirectionaledges-funcs
add h3 unidirectional edge functions
2022-05-25 22:46:40 +02:00
Maksim Kita
bee3c30f66
Merge pull request #37524 from kitaisreal/geo-distance-functions-improve-performance
Geo distance functions improve performance
2022-05-25 22:40:40 +02:00
Maksim Kita
b12b363158 Fixed build of hierarchical index for HashedArrayDictionary 2022-05-25 22:40:19 +02:00
Alexander Gololobov
168b47d0ad Use same norm and distance function names for tuples and arrays 2022-05-25 22:39:59 +02:00
Alexander Gololobov
b065839f44 always return Float64 2022-05-25 22:27:00 +02:00
Alexander Gololobov
5df14cd956 Cast arguments to result type to avoid int overflow 2022-05-25 22:27:00 +02:00
Alexander Tokmakov
3d259fa0de
Update Context.cpp 2022-05-25 23:18:07 +03:00
Alexander Tokmakov
47820c216d
Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part" 2022-05-25 23:10:33 +03:00
Robert Schulze
49934a3dc8
Cache compiled regexps when evaluating non-const needles
Needles in a (non-const) needle column may repeat and this commit allows
to skip compilation for known needles. Out of the different design
alternatives (see below, if someone is interested), we now maintain
- one global pattern cache,
- with a fixed size of 42k elements currently,
- and use LRU as eviction strategy.

------------------------------------------------------------------------

(sorry for the wall of text, dumping it here not for reading but just
for reference)

Write-up about considered design alternatives:

1. Keep the current global cache of const needles. For non-const
   needles, probe the cache but don't store values in it.
   Pros: need to maintain just a single cache, no problem with cache
         pollution assuming there are few distinct constant needles
   Cons: only useful if a non-const needle occurred as already as a
         const needle
   --> overall too simplistic

2. Keep the current global cache for const needles. For non-const
   needles, create a local (e.g. per-query) cache
   Pros: unlike (1.), non-const needles can be skipped even if they
         did not occur yet, no pollution of the const pattern cache when
         there are very many non-const needles (e.g. large / highly
         distinct needle columns).
   Cons: caches may explode "horizontally", i.e. we'll end up with the
         const cache + caches for Q1, Q2, ... QN, this makes it harder
         to control the overall space consumption, also patterns
         residing in different caches cannot be reused between queries,
         another difficulty is that the concept of "query" does not
         really exist at matching level - there are only column chunks
         and we'd potentially end up with 1 cache / chunk

3. Queries with const and non-const needles insert into the same global
   cache.
   Pros: the advantages of (2.) + allows to reuse compiled patterns
         accross parallel queries
   Cons: needs an eviction strategy to control cache size and pollution
         (and btw. (2.) also needs eviction strategies for the
         individual caches)

4. Queries with const needle use global cache, queries with non-const
   needle use a different global cache
   --> Overall similar to (3) but ignores the (likely) edge case that
       const and non-const needles overlap.

In sum, (3.) seems the simplest and most beneficial approach.

Eviction strategies:

0. Don't ever evict --> cache may grow infinitely and eventually make
   the system unusable (may even pose a DoS risk)

1. Flush the cache after a certain threshold is exceeded --> very
   simple but may lead to peridic performance drops

2. Use LRU --> more graceful performance degradation at threshold but
   comes with a (constant) performance overhead to maintain the LRU
   queue

In sum, given that the pattern compilation in RE2 should be quite costly
(pattern-to-DFA/NFA), LRU may be acceptable.
2022-05-25 22:04:06 +02:00
Robert Schulze
ea60a614d2
Decrease namespace indent 2022-05-25 21:56:35 +02:00
alesapin
c7b16065e1 Merge with master 2022-05-25 21:47:05 +02:00
Nikolai Kochetov
6d4a26afac Update ReadProgressCallback. 2022-05-25 19:45:48 +00:00
Alexey Milovidov
abf2558fba
Merge pull request #37491 from ClickHouse/match_refactoring
Refactorings of LIKE/MATCH code
2022-05-25 22:05:38 +03:00
Alexey Milovidov
4482da9eb6
Update greatCircleDistance.cpp 2022-05-25 21:59:31 +03:00
alesapin
6f5c86e55e Merge branch 'master' into i_object_storage 2022-05-25 20:49:01 +02:00
Alexander Tokmakov
779e6ea0b9 make it better, fix on cluster queries 2022-05-25 20:17:49 +02:00
Nikolai Kochetov
54d7e4139f Fix build. 2022-05-25 18:16:48 +00:00
alesapin
51868a9a4f
Merge pull request #37424 from metahys/fix_fetching_part_deadlock
(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part
2022-05-25 20:15:41 +02:00
Azat Khuzhin
a813f5996e Fix converting types for UNION queries (may produce LOGICAL_ERROR)
CI founds [1]:

    2022.02.20 15:14:23.969247 [ 492 ] {} <Fatal> BaseDaemon: (version 22.3.1.1, build id: 6082C357CFA6FF99) (from thread 472) (query_id: a5187ff9-962a-4e7c-86f6-8d48850a47d6) (query: SELECT 0., round(avgWeighted(x, y)) FROM (SELECT toDate(toDate('214748364.8', '-922337203.6854775808', '-0.1', NULL) - NULL, 10.000100135803223, '-2147483647'), 255 AS x, -2147483647 AS y UNION ALL SELECT y, NULL AS x, 2147483646 AS y)) Received signal Aborted (6)

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/0/26d0e5438c86e52a145aaaf4cb523c399989a878/fuzzer_astfuzzerdebug,actions//report.html

The problem is that subqueries returns different headers:
- first query  -- x, y
- second query -- y, x

v2: Make order of columns strict only for UNION
    https://s3.amazonaws.com/clickhouse-test-reports/34775/9cc8c01a463d18c471853568b2f0af659a4e643f/stateless_tests__address__actions__[2/2].html
    Fixes: 00597_push_down_predicate_long
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-25 20:31:47 +03:00
Nikolai Kochetov
ff98c24d44
Merge pull request #37048 from Avogar/fix-array-map-nothing
Add default implementation for Nothing in functions
2022-05-25 19:10:40 +02:00
Nikolai Kochetov
1f9b1cf726 Fixing build. 2022-05-25 18:59:46 +02:00
alesapin
0a3597da72
Merge pull request #34915 from ianton-ru/MDB-16962
Fix collision of S3 operation log revision
2022-05-25 18:15:31 +02:00
Yakov Olkhovskiy
6692b9c2ed showCertificate function implementation 2022-05-25 12:11:44 -04:00
Alexey Milovidov
cb92482ca5
Merge pull request #37484 from kitaisreal/function-has-all-avx2-dynamic-dispatch
Function hasAll added dynamic dispatch
2022-05-25 19:05:32 +03:00
Nikolai Kochetov
7b681fa8ac Fixing build. 2022-05-25 17:15:23 +02:00
avogar
ede6e2f433 Add docs for settings 2022-05-25 15:10:20 +00:00
avogar
4c9812d4c1 Allow to skip some of the first rows in CSV/TSV formats 2022-05-25 15:00:11 +00:00
KinderRiven
a33c7ce648 fix 2022-05-25 22:58:47 +08:00
Anton Popov
16e839ac71 add profile events for introspection of part types 2022-05-25 14:54:49 +00:00
Antonio Andelic
6a962549d5
Revert "Add support for preprocessing ZooKeeper operations in clickhouse-keeper" 2022-05-25 16:45:32 +02:00
Nikolai Kochetov
1b85f2c1d6 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-25 16:27:40 +02:00
msaf1980
fda6ddeffa cleanup StorageHDFS (unused variables) 2022-05-25 19:23:05 +05:00
mergify[bot]
73662b4436
Merge branch 'master' into fix_fetching_part_deadlock 2022-05-25 14:22:35 +00:00
Maksim Kita
28355114c0 Fixed tests 2022-05-25 16:19:29 +02:00
Nikolai Kochetov
6370c29049 Use a separate mutex for query_factories_info in Context. 2022-05-25 14:16:59 +00:00
Maksim Kita
e67b3537f7 Functions normalizeUTF8 unstable performance tests fix 2022-05-25 15:54:52 +02:00
KinderRiven
875557abc2 fix 2022-05-25 21:53:28 +08:00
KinderRiven
adbb821176 fix 2022-05-25 21:05:15 +08:00
mergify[bot]
f49552d48e
Merge branch 'master' into grouping-sets-optimization-fix 2022-05-25 13:03:54 +00:00
Maksim Kita
45da28ecae Improve performance of geo distance functions 2022-05-25 14:22:22 +02:00
KinderRiven
2211c1ddb8 fix 2022-05-25 20:15:43 +08:00
taiyang-li
a7a816dcb6 fix build error 2022-05-25 19:55:11 +08:00
Maksim Kita
83554d1f2d Fixed style 2022-05-25 13:05:39 +02:00
Maksim Kita
fbec38ddb9 Fixed performance tests 2022-05-25 12:51:21 +02:00
KinderRiven
d0fcffec66 fix style 2022-05-25 17:51:03 +08:00
Maksim Kita
c372c3d6aa Fix performance tests 2022-05-25 11:49:59 +02:00
Maksim Kita
9a9df26eec Fixed tests 2022-05-25 11:44:37 +02:00
Maksim Kita
6c033f340b Fixed tests 2022-05-25 11:44:37 +02:00
Maksim Kita
0e5f13e53e MergingSortedAlgorithm single column specialization 2022-05-25 11:44:37 +02:00
KinderRiven
1ce219bae2 fix 2022-05-25 17:24:38 +08:00
taiyang-li
1d9f65a7d4 Merge branch 'master' into async_hdfs_read_buffer 2022-05-25 17:10:22 +08:00
Kseniia Sumarokova
b50d4549c9
Merge pull request #37356 from amosbird/partition-prune-for-s3
"Partition pruning" for s3
2022-05-25 11:03:07 +02:00
KinderRiven
e3f76cab55 impl improve remote fs cache 2022-05-25 16:54:28 +08:00
avogar
f782fa31c6 Merge branch 'master' of github.com:ClickHouse/ClickHouse into check-format-on-storage-creation 2022-05-25 08:42:54 +00:00
Robert Schulze
05e4fa7df1
Fix special case of trivial regexp
Previously, we would alsays set 1 in case of a trivial regex (which is
correct). If someone in future builds a negated operator, then this
will produce wrong results. Right now, negation of regexp (SQL: NOT
MATCH) is implemented at a higher level, so we are safe and this is more
a preventive fix.
2022-05-25 10:05:55 +02:00
Robert Schulze
01ab7b9bad
Pass strings in some places as string_view
The original goal was to get change

  const auto & needle = String(
        reinterpret_cast<const char *>(cur_needle_data),
        cur_needle_length);

in Functions/MatchImpl.h into a std::string_view to save an allocation +
copy. The needle is eventually passed as search pattern into the re2
library. Re2 has an alternative constructor taking a const char * i.e. a
NULL-terminated string. Here, the needle is NULL-terminated but
1. this is only because it is passed inside a ColumnString yet this is
   not always the case (e.g. fixed string columns has a dense layout w/o
   NULL terminator).
2. assuming NULL termination for users != MatchImpl of the regex code is
   too dangerous.

So, for now we'll stay with copying to be on the safe side. One fine day
when re2 has a ptr/size ctor, we can use std::string_view.

Just changing a few other places from std::string to std::string_view
but this will not help with performance.
2022-05-25 10:05:51 +02:00
Robert Schulze
e8c96777f6
Make OptimizedRegularExpression::analyze() private 2022-05-25 10:05:45 +02:00
Robert Schulze
040fbf3686
Tighter sanity checks in matching code 2022-05-25 10:05:06 +02:00
Robert Schulze
35bef17302
Introduce variables to hold the match result
--> nicer when debugging
2022-05-25 10:04:47 +02:00
Robert Schulze
b044d44fef
Refactoring: Make template instantiation easier to read
- introduced class MatchTraits with enums that replace bool template
  parameters

- (minor: made negation the last template parameters because negation
  executes last during evaluation)
2022-05-25 10:03:58 +02:00
Bharat Nallan Chakravarthy
57cfc0bd04 check for validity of h3 index 2022-05-25 06:17:15 +05:30
Alexey Milovidov
516fba27dc Merge branch 'master' into allow-setuid-inside-clickhouse 2022-05-24 23:31:14 +02:00