Commit Graph

3728 Commits

Author SHA1 Message Date
Robert Schulze
600512cc08
Replace exceptions thrown for programming errors by asserts 2022-06-01 11:53:37 +02:00
Anton Popov
20e319d67a
Merge pull request #37666 from CurtizJ/optimize-coalesce
Optimize function `COALESCE` with two arguments
2022-05-31 23:48:13 +02:00
Yakov Olkhovskiy
873ac9f8ff
Merge pull request #37540 from ClickHouse/feature-server-certificate
showCertificate function implementation
2022-05-31 02:50:03 -04:00
Anton Popov
30f8eb800a optimize function coalesce with two arguments 2022-05-30 22:29:35 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
HeenaBansal2009
b7eb6bbd38 Fixed clang-tidy-CheckTriviallyCopyableMove-errors 2022-05-30 11:09:03 -07:00
Robert Schulze
ad12adc31c
Measure and rework internal re2 caching
This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1: -----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by
pattern evaluation + other stuff going on in queries, but definitely not
pattern compilation. For space reasons, I omit details of the local
experiments.

(Side note: the current caching scheme is unbounded in size which poses
a DoS risk (think of multi-tenancy). This risk is more pronounced when
unbounded caching is used with non-const patterns ..., see next
question)

Question 2: -----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX
patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with
   collision list which grows to a certain size (here: 10 elements) and
   afterwards never changes)
5. lightweight local cache 2 (not threadsafe local hashmap without
   collision list in which a collision replaces the stored element, idea
   by Alexey)

... using a haystack of 2 mio strings and
A) 2 mio distinct simple patterns
B) 10 simple patterns
C) 2 mio distinct complex patterns
D) 10 complex patterns

For A) and C), caching does not help, but these queries still make it
possible to judge the static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include
queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' :
'%pattern2%')". Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I
show only measurements for MATCH queries.

Results in sec, averaged over multiple measurements:

1.A): 2.12
  B): 1.68
  C): 9.75
  D): 9.45

2.A): 2.17
  B): 1.73
  C): 9.78
  D): 9.47

3.A): 9.8
  B): 0.63
  C): 31.8
  D): 0.98

4.A): 2.14
  B): 0.29
  C): 9.82
  D): 0.41

5.A): 2.12 / 2.15 / 2.26
  B): 1.51 / 0.43 / 0.30
  C): 9.97 / 9.88 / 10.13
  D): 5.70 / 0.42 / 0.43
(10/100/1000 buckets, resp. 10/1/0.1% collision rate)

Evaluation:

1. This is the baseline. I was surprised that complex patterns (C, D)
   slow down the queries so badly compared to simple patterns (A, B).
   The runtime includes evaluation costs, but as caching only helps with
   compilation, and looking at 4.D and 5.D, compilation makes up over 90%
   of the runtime!

2. No speedup compared to 1, probably due to locking overhead. The cache
   is unbounded, and in experiments with data sets > 2 mio rows, 2. is
   the only scheme to throw OOM exceptions which is not acceptable.

3. Unique patterns (A and C) lead to thrashing of the LRU cache and very
   bad runtimes due to LRU queue maintenance and locking. Works pretty
   well however with few distinct patterns (B and D).

4. This scheme is tailored to queries B and D where it performs pretty
   well. More importantly, the caching is lightweight enough to not
   deteriorate performance on datasets A and C.

5. After some tuning of the hash map size, 100 buckets seem optimal: with
   10 distinct patterns, runtimes are in the same ballpark as 4. Performance
   also does not deteriorate on A and C compared to the baseline.
   Unlike 4., this scheme behaves LRU-like and can adjust to changing
   pattern distributions.
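
A minimal sketch of how scheme 5 could look, for illustration only. The
class name LocalPatternCache, the use of std::regex instead of re2, and the
compilations counter are assumptions made for this example and are not
taken from the actual change.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <vector>

/// Hypothetical sketch of caching scheme 5: a fixed number of buckets and
/// no collision list. A colliding pattern overwrites the stored entry.
/// Deliberately not thread-safe: each user owns its own instance.
class LocalPatternCache
{
public:
    explicit LocalPatternCache(size_t num_buckets) : buckets(num_buckets)
    {
        assert(num_buckets > 0);
    }

    /// Returns the compiled pattern and compiles it only on a miss.
    std::shared_ptr<const std::regex> getOrCompile(const std::string & pattern)
    {
        Bucket & bucket = buckets[std::hash<std::string>{}(pattern) % buckets.size()];
        if (!bucket.regex || bucket.pattern != pattern)
        {
            /// Miss or collision: replace whatever was stored in this bucket.
            bucket.pattern = pattern;
            bucket.regex = std::make_shared<std::regex>(pattern);
            ++compilations;
        }
        return bucket.regex;
    }

    size_t compilations = 0; /// Number of compilations, for illustration only.

private:
    struct Bucket
    {
        std::string pattern;
        std::shared_ptr<std::regex> regex;
    };

    std::vector<Bucket> buckets;
};
```

Because a collision simply overwrites the bucket, recently used patterns
tend to stay cached without any queue maintenance, which is the LRU-like
behavior mentioned above.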

As a conclusion, this commit implements two things:

1. Based on Q1, pattern search with const needle no longer uses
   caching. This applies to LIKE and MATCH + a few (exotic) other SQL
   functions. The code for the unbounded caching was removed.

2. Based on Q2, pattern search with non-const needles now uses method 5.
2022-05-30 20:00:35 +02:00
Alexey Milovidov
f1fb57c6ce Fix clang-tidy-14 2022-05-30 05:36:26 +02:00
Alexey Milovidov
c0e6ff4216 More precise result of "dumpColumnStructure" and "byteSize" miscellaneous functions 2022-05-30 04:56:54 +02:00
Alexey Milovidov
c1169019d2 Merge branch 'master' into llvm-14 2022-05-29 02:29:02 +02:00
Alexey Milovidov
73e2e63414
Merge pull request #37612 from ClickHouse/clang-tidy-14
Fix clang-tidy-14, part 1
2022-05-29 03:16:32 +03:00
Alexander Tokmakov
4e52f45695 Merge branch 'master' into fix_trash 2022-05-28 19:43:19 +02:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
Alexey Milovidov
d2c6fd90cb Fix clang-tidy-14, part 1 2022-05-27 22:51:37 +02:00
Alexander Gololobov
9b1b30855c Fixed check for HUGE_VAL 2022-05-27 18:25:11 +02:00
Alexander Gololobov
6361c5f38c Fix for failed style check 2022-05-27 18:22:16 +02:00
Alexander Gololobov
540353566c Added LpNorm and LpDistance functions for arrays 2022-05-27 17:17:08 +02:00
Robert Schulze
80061aa3e2
Merge remote-tracking branch 'origin/master' into cached_patterns 2022-05-27 09:21:01 +02:00
Alexey Milovidov
86afa3a245
Merge pull request #37502 from ClickHouse/array_norm_dist_fixes
Renamed arrayXXNorm/arrayXXDistance functions to XXNorm/XXDistance and fixed some overflow cases
2022-05-27 00:56:29 +03:00
mergify[bot]
a7629f900f
Merge branch 'master' into normalize-utf8-performance-tests-fix 2022-05-26 10:29:55 +00:00
Maksim Kita
3a92e61827
Merge pull request #37148 from kitaisreal/dictionary-get-descendants-performance-improvement
Dictionary getDescendants performance improvement
2022-05-26 12:29:17 +02:00
Yakov Olkhovskiy
2dc160a4c3 style fix 2022-05-25 20:56:36 -04:00
Dmitry Novik
7cd7782e4f Process columns more efficiently in GROUPING() 2022-05-25 21:55:41 +00:00
Dmitry Novik
3c1b6609ae Add comments and make tests more verbose 2022-05-25 21:23:35 +00:00
Maksim Kita
58cd1bd3ec
Merge pull request #36843 from bharatnc/ncb/h3-unidirectionaledges-funcs
add h3 unidirectional edge functions
2022-05-25 22:46:40 +02:00
Maksim Kita
bee3c30f66
Merge pull request #37524 from kitaisreal/geo-distance-functions-improve-performance
Geo distance functions improve performance
2022-05-25 22:40:40 +02:00
Alexander Gololobov
168b47d0ad Use same norm and distance function names for tuples and arrays 2022-05-25 22:39:59 +02:00
Alexander Gololobov
b065839f44 always return Float64 2022-05-25 22:27:00 +02:00
Alexander Gololobov
5df14cd956 Cast arguments to result type to avoid int overflow 2022-05-25 22:27:00 +02:00
Robert Schulze
49934a3dc8
Cache compiled regexps when evaluating non-const needles
Needles in a (non-const) needle column may repeat, and this commit makes it
possible to skip compilation for known needles. Out of the different design
alternatives (see below, if someone is interested), we now maintain
- one global pattern cache,
- with a fixed size of 42k elements currently,
- and use LRU as eviction strategy.
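
A rough sketch of such a bounded, thread-safe LRU cache, assuming a hash
map plus a recency list guarded by one mutex. The class name
LRUPatternCache and the use of std::regex in place of re2 are assumptions
for illustration, not the code of this commit.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <unordered_map>
#include <utility>

/// Hypothetical global pattern cache: bounded size, LRU eviction, one lock.
class LRUPatternCache
{
public:
    explicit LRUPatternCache(size_t max_size_) : max_size(max_size_) {}

    std::shared_ptr<const std::regex> getOrCompile(const std::string & pattern)
    {
        std::lock_guard<std::mutex> lock(mutex);

        auto it = index.find(pattern);
        if (it != index.end())
        {
            /// Hit: move the entry to the front of the recency list.
            queue.splice(queue.begin(), queue, it->second);
            return it->second->second;
        }

        /// Miss: compile, insert at the front, evict the least recently used entry.
        std::shared_ptr<const std::regex> compiled = std::make_shared<std::regex>(pattern);
        queue.emplace_front(pattern, compiled);
        index[pattern] = queue.begin();
        if (queue.size() > max_size)
        {
            index.erase(queue.back().first);
            queue.pop_back();
        }
        return compiled;
    }

    size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return queue.size();
    }

private:
    using Entry = std::pair<std::string, std::shared_ptr<const std::regex>>;

    const size_t max_size;
    mutable std::mutex mutex;
    std::list<Entry> queue; /// Most recently used entry first.
    std::unordered_map<std::string, std::list<Entry>::iterator> index;
};
```

Every lookup takes the lock and touches the recency list, which is the
constant overhead of LRU noted in the write-up below.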

------------------------------------------------------------------------

(sorry for the wall of text, dumping it here not for reading but just
for reference)

Write-up about considered design alternatives:

1. Keep the current global cache of const needles. For non-const
   needles, probe the cache but don't store values in it.
   Pros: need to maintain just a single cache, no problem with cache
         pollution assuming there are few distinct constant needles
   Cons: only useful if a non-const needle already occurred as a
         const needle
   --> overall too simplistic

2. Keep the current global cache for const needles. For non-const
   needles, create a local (e.g. per-query) cache
   Pros: unlike (1.), non-const needles can be skipped even if they
         did not occur yet, no pollution of the const pattern cache when
         there are very many non-const needles (e.g. large / highly
         distinct needle columns).
   Cons: caches may explode "horizontally", i.e. we'll end up with the
         const cache + caches for Q1, Q2, ... QN, this makes it harder
         to control the overall space consumption, also patterns
         residing in different caches cannot be reused between queries,
         another difficulty is that the concept of "query" does not
         really exist at matching level - there are only column chunks
         and we'd potentially end up with 1 cache / chunk

3. Queries with const and non-const needles insert into the same global
   cache.
   Pros: the advantages of (2.) + allows reusing compiled patterns
         across parallel queries
   Cons: needs an eviction strategy to control cache size and pollution
         (and btw. (2.) also needs eviction strategies for the
         individual caches)

4. Queries with const needle use global cache, queries with non-const
   needle use a different global cache
   --> Overall similar to (3) but ignores the (likely) edge case that
       const and non-const needles overlap.

In sum, (3.) seems the simplest and most beneficial approach.

Eviction strategies:

0. Don't ever evict --> cache may grow infinitely and eventually make
   the system unusable (may even pose a DoS risk)

1. Flush the cache after a certain threshold is exceeded --> very
   simple but may lead to periodic performance drops

2. Use LRU --> more graceful performance degradation at threshold but
   comes with a (constant) performance overhead to maintain the LRU
   queue

In sum, given that the pattern compilation in RE2 should be quite costly
(pattern-to-DFA/NFA), LRU may be acceptable.
2022-05-25 22:04:06 +02:00
Robert Schulze
ea60a614d2
Decrease namespace indent 2022-05-25 21:56:35 +02:00
Alexey Milovidov
abf2558fba
Merge pull request #37491 from ClickHouse/match_refactoring
Refactorings of LIKE/MATCH code
2022-05-25 22:05:38 +03:00
Alexey Milovidov
4482da9eb6
Update greatCircleDistance.cpp 2022-05-25 21:59:31 +03:00
Alexander Tokmakov
779e6ea0b9 make it better, fix on cluster queries 2022-05-25 20:17:49 +02:00
Nikolai Kochetov
ff98c24d44
Merge pull request #37048 from Avogar/fix-array-map-nothing
Add default implementation for Nothing in functions
2022-05-25 19:10:40 +02:00
Yakov Olkhovskiy
6692b9c2ed showCertificate function implementation 2022-05-25 12:11:44 -04:00
Alexey Milovidov
cb92482ca5
Merge pull request #37484 from kitaisreal/function-has-all-avx2-dynamic-dispatch
Function hasAll added dynamic dispatch
2022-05-25 19:05:32 +03:00
Maksim Kita
28355114c0 Fixed tests 2022-05-25 16:19:29 +02:00
Maksim Kita
e67b3537f7 Functions normalizeUTF8 unstable performance tests fix 2022-05-25 15:54:52 +02:00
Maksim Kita
45da28ecae Improve performance of geo distance functions 2022-05-25 14:22:22 +02:00
Maksim Kita
c372c3d6aa Fix performance tests 2022-05-25 11:49:59 +02:00
Kseniia Sumarokova
b50d4549c9
Merge pull request #37356 from amosbird/partition-prune-for-s3
"Partition pruning" for s3
2022-05-25 11:03:07 +02:00
Robert Schulze
05e4fa7df1
Fix special case of trivial regexp
Previously, we would always set 1 in case of a trivial regex (which is
correct). If someone in future builds a negated operator, then this
will produce wrong results. Right now, negation of regexp (SQL: NOT
MATCH) is implemented at a higher level, so we are safe and this is more
a preventive fix.
2022-05-25 10:05:55 +02:00
Robert Schulze
01ab7b9bad
Pass strings in some places as string_view
The original goal was to change

  const auto & needle = String(
        reinterpret_cast<const char *>(cur_needle_data),
        cur_needle_length);

in Functions/MatchImpl.h into a std::string_view to save an allocation +
copy. The needle is eventually passed as search pattern into the re2
library. Re2 has an alternative constructor taking a const char * i.e. a
NULL-terminated string. Here, the needle is NULL-terminated but
1. this is only because it is passed inside a ColumnString yet this is
   not always the case (e.g. fixed string columns have a dense layout w/o
   NULL terminator).
2. assuming NULL termination for users != MatchImpl of the regex code is
   too dangerous.

So, for now we'll stay with copying to be on the safe side. One fine day
when re2 has a ptr/size ctor, we can use std::string_view.

Just changing a few other places from std::string to std::string_view
but this will not help with performance.
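
The termination problem can be shown with a small sketch. The helper names
below are hypothetical and only the standard library is used, no re2 API.
A dense fixed-width buffer is assumed to stand in for a fixed string column.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

/// Returns the n-th value of a dense fixed-width buffer
/// (there are no terminators between the values).
inline std::string_view fixedValue(const char * data, size_t width, size_t n)
{
    return std::string_view(data + n * width, width);
}

/// What a "const char *" API would see: it scans until the next NUL byte
/// and therefore runs past the end of the view.
inline size_t lengthSeenByCStringApi(std::string_view value)
{
    return std::strlen(value.data());
}

/// The safe route described above: copy into an owning string whose
/// c_str() is guaranteed to be NUL-terminated.
inline std::string toTerminated(std::string_view value)
{
    return std::string(value);
}
```

With the buffer "HelloWorld" read as two 5-byte values, the view of the
first value is "Hello", but a NUL-scanning API would read on into "World".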
2022-05-25 10:05:51 +02:00
Robert Schulze
040fbf3686
Tighter sanity checks in matching code 2022-05-25 10:05:06 +02:00
Robert Schulze
35bef17302
Introduce variables to hold the match result
--> nicer when debugging
2022-05-25 10:04:47 +02:00
Robert Schulze
b044d44fef
Refactoring: Make template instantiation easier to read
- introduced class MatchTraits with enums that replace bool template
  parameters

- (minor: made negation the last template parameter because negation
  executes last during evaluation)
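
A sketch of the idea, assuming enum names that are invented for this
example (only the class name MatchTraits comes from the message above).
MatchSketch is likewise a hypothetical stand-in for the real template.

```cpp
#include <cassert>

/// Enums instead of bool template parameters; enumerator names are assumptions.
struct MatchTraits
{
    enum class Syntax { Like, Re2 };
    enum class Case { Sensitive, Insensitive };
    enum class Result { DontNegate, Negate };
};

/// Hypothetical stand-in for the matching template. Negation is the last
/// parameter because it is applied last during evaluation.
template <MatchTraits::Syntax syntax, MatchTraits::Case case_, MatchTraits::Result result>
struct MatchSketch
{
    static constexpr bool is_like = (syntax == MatchTraits::Syntax::Like);
    static constexpr bool case_insensitive = (case_ == MatchTraits::Case::Insensitive);

    /// Applies the optional negation to a raw match result.
    static constexpr bool finalize(bool matched)
    {
        return result == MatchTraits::Result::Negate ? !matched : matched;
    }
};

/// Compare with an instantiation like MatchSketch<true, true, true>,
/// which would say nothing about what each flag means.
using NotILikeSketch = MatchSketch<MatchTraits::Syntax::Like, MatchTraits::Case::Insensitive, MatchTraits::Result::Negate>;
using MatchPlainSketch = MatchSketch<MatchTraits::Syntax::Re2, MatchTraits::Case::Sensitive, MatchTraits::Result::DontNegate>;
```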
2022-05-25 10:03:58 +02:00
Bharat Nallan Chakravarthy
57cfc0bd04 check for validity of h3 index 2022-05-25 06:17:15 +05:30
Alexander Gololobov
2ff747785e
Merge pull request #37394 from ClickHouse/array_norm_dist_fixes
Do computations on the raw input data without copying to Eigen::Matrix
2022-05-24 20:59:04 +02:00
Robert Schulze
7348a0eb28
Merge pull request #37251 from ClickHouse/non_const_like
Support non-constant SQL functions (NOT) (I)LIKE and MATCH
2022-05-24 20:28:31 +02:00
Robert Schulze
028f15c4fa
Review comment: Throw LOGICAL_ERROR for different sizes of haystack / needles 2022-05-24 20:19:13 +02:00
Maksim Kita
3c0c322d7c
Merge pull request #37480 from kitaisreal/dynamic-dispatch-infrastructure-improvements
Dynamic dispatch infrastructure style fixes
2022-05-24 18:13:53 +02:00
Maksim Kita
6fb51e8bd3 Function hasAll added dynamic dispatch 2022-05-24 17:06:06 +02:00
Maksim Kita
86180614e7 Fixed tests 2022-05-24 15:33:03 +02:00
Anton Popov
e96af9fd75 better binary serialization of ColumnObject 2022-05-24 13:16:11 +00:00
Maksim Kita
e6e4b2826d Dynamic dispatch infrastructure style fixes 2022-05-24 14:25:29 +02:00
Amos Bird
c25ef92139
Fix tests 2022-05-24 18:57:55 +08:00
Amos Bird
093d315756
partition pruning for s3 2022-05-24 18:57:55 +08:00
Maksim Kita
712b000f2a
Merge pull request #37443 from kitaisreal/functions-normalize-utf8-fix
Functions normalize utf8 fix
2022-05-24 11:11:15 +02:00
Alexander Gololobov
7d0ed7e51a Remove eigen library 2022-05-24 10:24:50 +02:00
Alexander Gololobov
caad1435d5 Optimized the case when one of the arguments is Const 2022-05-24 10:24:50 +02:00
Alexander Gololobov
65fbda436a Do computations on the raw input data without copying to Eigen::Matrix 2022-05-24 10:24:50 +02:00
Bharat Nallan Chakravarthy
6e49b76cfd try suppress h3 asan errors 2022-05-24 10:22:46 +05:30
Maksim Kita
996241493f
Merge pull request #37447 from kitaisreal/binary-function-vectorized-remove-macro
BinaryFunctionVectorized remove macro
2022-05-23 16:50:12 +02:00
Maksim Kita
fe21b4ca9e Fixed style check 2022-05-23 14:41:07 +02:00
Maksim Kita
008de5c779
Merge pull request #37438 from kitaisreal/function-binary-representation-style-fixes
FunctionBinaryRepresentation style fixes
2022-05-23 13:54:15 +02:00
Maksim Kita
e550843d56 BinaryFunctionVectorized remove macro 2022-05-23 12:45:16 +02:00
Maksim Kita
585b86446e Added hierarchical_index_bytes_allocated column in system.dictionaries 2022-05-23 12:42:00 +02:00
Maksim Kita
be9c3d9bd4 Fixed build 2022-05-23 12:42:00 +02:00
Maksim Kita
100afa8bcf Dictionary getDescendants performance improvement 2022-05-23 12:42:00 +02:00
Maksim Kita
78782de887 Functions normalizeUTF8 logical error fix 2022-05-23 12:19:14 +02:00
Maksim Kita
98bb34f2f2 FunctionBinaryRepresentation style fixes 2022-05-23 10:59:33 +02:00
Robert Schulze
e25ca139cd
Implement SQL functions (NOT) (I)LIKE() + MATCH() with non-const needles
With this commit, SQL functions LIKE and MATCH and their variants can
work with non-const needle arguments. E.g.

  create table tab
    (id UInt32, haystack String, needle String)
    engine = MergeTree()
    order by id;

  insert into tab values
  (1, 'Hello', '%ell%'),
  (2, 'World', '%orl%');

  select id, haystack, needle, like(haystack, needle)
  from tab;

For that, methods vectorVector() and vectorFixedVector() were added to
MatchImpl. The existing code for const needles has an optimization where
the compiled regexp is cached. The new code expects a different needle
per row and consequently does not cache the regexp.
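
The per-row evaluation can be sketched roughly as follows. This is not the
code of MatchImpl: the function names are hypothetical, std::regex replaces
re2, LIKE escape sequences are ignored, and the columns are plain vectors.

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

/// Translates a LIKE pattern into an ECMAScript regex.
/// Simplification: backslash escapes inside the LIKE pattern are not handled.
inline std::string likeToRegexSketch(const std::string & like)
{
    static const std::string special = R"re(\^$.|?*+()[]{})re";
    std::string regex;
    for (char c : like)
    {
        if (c == '%')
            regex += "[\\s\\S]*"; /// any sequence of bytes, including newlines
        else if (c == '_')
            regex += "[\\s\\S]"; /// exactly one byte
        else
        {
            if (special.find(c) != std::string::npos)
                regex += '\\'; /// regex metacharacters are literals in LIKE
            regex += c;
        }
    }
    return regex;
}

/// One needle per row: the pattern is compiled anew for every row, no cache.
inline std::vector<uint8_t> likeVectorVectorSketch(
    const std::vector<std::string> & haystacks,
    const std::vector<std::string> & needles)
{
    if (haystacks.size() != needles.size())
        throw std::logic_error("haystack and needle columns differ in size");

    std::vector<uint8_t> result(haystacks.size());
    for (size_t i = 0; i < haystacks.size(); ++i)
    {
        const std::regex compiled(likeToRegexSketch(needles[i]));
        result[i] = std::regex_match(haystacks[i], compiled) ? 1 : 0;
    }
    return result;
}
```

For the two rows inserted above, this sketch yields 1 for both
('Hello', '%ell%') and ('World', '%orl%').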
2022-05-23 09:41:28 +02:00
Alexey Milovidov
698e5e5352
Merge pull request #37415 from Joeywzr/gen_uuid
Generate multiple columns with UUID
2022-05-23 00:29:42 +03:00
Robert Schulze
4829ae8380
Replace overly clever const argument logic by something simpler
The previous logic was smart but too inflexible to support the next
commits. Replace by a simple pushdown logic where string search
implementations return their const arguments instead of having the
common class figure these out based on properties/traits.
2022-05-22 17:50:38 +02:00
Robert Schulze
0299cc87e4
Improve naming consistency of string search code
Just renamings, nothing major ...
2022-05-22 17:50:38 +02:00
Robert Schulze
19d53c14fa
Merge pull request #37382 from ClickHouse/wc++98-compat-extra-semi
Enable -Wc++98-compat-extra-semi
2022-05-22 09:40:45 +02:00
Memo
15a76d012f add NUMBER_OF_ARGUMENTS_DOESNT_MATCH definition 2022-05-22 13:38:47 +08:00
Yakov Olkhovskiy
d878f193d8
Merge pull request #37013 from mnutt/hashid
Add hashid support
2022-05-21 17:14:54 -04:00
Memo
942af133e5 init 2022-05-21 23:54:12 +08:00
Maksim Kita
0d69f35b6a Fixed style check 2022-05-21 14:54:45 +02:00
Maksim Kita
42439aeb3c Improve performance of number comparison functions 2022-05-20 22:42:48 +02:00
Robert Schulze
0f6715bd91
Follow-up to PR #37300: semicolon warnings
In PR #37300, Alexej asked why the compiler does not warn about
unnecessary semicolons, e.g.

  f()
  {
  }; // <-- here

The answer is surprising: In C++98, the above syntax was disallowed, but
most compilers accepted it regardless. C++11 introduced "empty
declarations" which made the syntax legal.

The previous behavior can be restored using flag
-Wc++98-compat-extra-semi. This finds many useless semicolons which were
removed in this change. Unfortunately, there are also false positives
which would require #pragma-s and HAS_* logic (--> check_flags.cmake) to
suppress. In the end, -Wc++98-compat-extra-semi comes with extra effort
for little benefit. Therefore, this change only fixes some semicolons
but does not enable the flag.
2022-05-20 15:06:34 +02:00
Michael Nutt
23dbf1b257 Merge branch 'master' into hashid 2022-05-20 08:42:01 -04:00
Robert Schulze
b475fbc9a7
Merge pull request #37300 from ClickHouse/cmake-cleanup-pt3
Various cmake cleanups
2022-05-20 10:02:36 +02:00
Dmitry Novik
b3ccf96c81 Merge remote-tracking branch 'origin/master' into grouping-function 2022-05-19 17:58:33 +00:00
Dmitry Novik
d4c66f4a48 Code cleanup & fix GROUPING() with TOTALS 2022-05-19 16:36:51 +00:00
avogar
f69c3175af Fix comments 2022-05-19 10:13:44 +00:00
avogar
cb8646fbb4 Merge branch 'master' of github.com:ClickHouse/ClickHouse into fix-array-map-nothing 2022-05-19 07:18:48 +00:00
Michael Nutt
e0c14dfc01 fix includes 2022-05-18 20:16:43 -04:00
Bharat Nallan Chakravarthy
00d3bbc2e0 review fixes 2022-05-18 17:04:15 -07:00
Michael Nutt
c87638d2ba put hashid behind allow_experimental_hash_functions setting 2022-05-18 19:06:33 -04:00
Michael Nutt
11a17997b3 better const column checking 2022-05-18 18:09:45 -04:00
Michael Nutt
da99b1b250 simplify hashing 2022-05-18 16:57:30 -04:00
Michael Nutt
d6d1c22008 better argument type checking 2022-05-18 16:57:21 -04:00
Michael Nutt
e453132db8 remove hashid define guard 2022-05-18 15:26:54 -04:00
Maksim Kita
df0cb06209
Merge pull request #37289 from kitaisreal/unary-arithmetic-functions-improve-performance-dynamic-dispatch
Improve performance of unary arithmetic functions
2022-05-18 19:16:30 +02:00
Dmitry Novik
6356112a76 Refactor GROUPING function 2022-05-18 15:23:31 +00:00
Anton Popov
715d5b0173
Merge pull request #37270 from Avogar/fix-bool-eof
Fix Nullable(String) to Nullable(Bool/IPv4/IPv6) conversion
2022-05-18 14:08:52 +02:00
Nikolai Kochetov
64ecb3941c
Merge pull request #37259 from ClickHouse/clangtidies2
Activate more clangtidies
2022-05-18 13:01:40 +02:00
Bharat Nallan Chakravarthy
c476b8dd92 Merge remote-tracking branch 'upstream/master' into ncb/h3-unidirectionaledges-funcs 2022-05-17 20:10:03 -07:00
mergify[bot]
05305811f8
Merge branch 'master' into fix-bool-eof 2022-05-17 19:28:11 +00:00
Robert Schulze
0c55ac76d2
A few clangtidy updates
Enable:

- bugprone-lambda-function-name: "Checks for attempts to get the name of
  a function from within a lambda expression. The name of a lambda is
  always something like operator(), which is almost never what was
  intended."

- bugprone-unhandled-self-assignment: "Finds user-defined copy
  assignment operators which do not protect the code against
  self-assignment either by checking self-assignment explicitly or using
  the copy-and-swap or the copy-and-move method."

- hicpp-invalid-access-moved: "Warns if an object is used after it has
  been moved."

- hicpp-use-noexcept: "This check replaces deprecated dynamic exception
  specifications with the appropriate noexcept specification (introduced
  in C++11)"

- hicpp-use-override: "Adds override (introduced in C++11) to overridden
  virtual functions and removes virtual from those functions as it is
  not required."

- performance-type-promotion-in-math-fn: "Finds calls to C math library
  functions (from math.h or, in C++, cmath) with implicit float to
  double promotions."
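
Two of these checks can be illustrated with a short sketch. The names
nameSeenInsideLambda and OwnedBuffer are made up for this example and the
code is not taken from the ClickHouse sources.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

/// bugprone-lambda-function-name: inside a lambda, __func__ names the
/// lambda's call operator, not the enclosing function.
inline std::string nameSeenInsideLambda()
{
    auto lambda = [] { return std::string(__func__); };
    return lambda();
}

/// bugprone-unhandled-self-assignment: a copy assignment operator that
/// frees its own buffer first must guard against "x = x".
class OwnedBuffer
{
public:
    explicit OwnedBuffer(const std::string & s) : size(s.size()), data(new char[s.size()])
    {
        std::memcpy(data, s.data(), size);
    }

    OwnedBuffer(const OwnedBuffer & other) : size(other.size), data(new char[other.size])
    {
        std::memcpy(data, other.data, size);
    }

    OwnedBuffer & operator=(const OwnedBuffer & other)
    {
        /// Without this check, delete[] below would destroy the source
        /// buffer before it is copied.
        if (this == &other)
            return *this;

        delete[] data;
        size = other.size;
        data = new char[size];
        std::memcpy(data, other.data, size);
        return *this;
    }

    ~OwnedBuffer() { delete[] data; }

    std::string str() const { return std::string(data, size); }

private:
    size_t size;
    char * data;
};
```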

Split up:

- cppcoreguidelines-*. Some of them may be useful (haven't checked in
  detail), so they can now be toggled individually.

Disable:

- linuxkernel-*. Obvious.
2022-05-17 20:56:57 +02:00
mergify[bot]
36b4ed19c5
Merge branch 'master' into unary-arithmetic-functions-improve-performance-dynamic-dispatch 2022-05-17 18:08:24 +00:00
Alexander Gololobov
38f291c70d
Merge pull request #37030 from bharatnc/ncb/h3-missing-traversal-funcs
add remaining h3 traversal funcs
2022-05-17 18:19:56 +02:00
avogar
46f4f8a457 Fix use of uninitialized memory 2022-05-17 12:59:46 +00:00
Maksim Kita
beb34e7062 Improve performance of unary arithmetic functions 2022-05-17 13:53:20 +02:00
Alexander Gololobov
670a8bac29 Fixed required array size calculation and reduced number of reallocations 2022-05-17 09:45:49 +02:00
Kseniia Sumarokova
94683786dc
Merge branch 'master' into MeiliSearch 2022-05-16 22:42:09 +02:00
Alexander Gololobov
e2e3536a80 Fixed handling of gridPathCellsSize() errors 2022-05-16 21:23:45 +02:00
avogar
415aabd4d0 Fix Nullable(String) to Nullable(Bool/IPv4/IPv6) conversion 2022-05-16 19:15:18 +00:00
Robert Schulze
43945cea1b
Fixing some warnings 2022-05-16 20:59:27 +02:00
Dmitry Novik
e5b395e054 Support ROLLUP and CUBE in GROUPING function 2022-05-16 17:33:38 +00:00
Robert Schulze
e3cfec5b09
Merge remote-tracking branch 'origin/master' into clangtidies 2022-05-16 10:12:50 +02:00
Michael Nutt
8bff9b8ce9
Merge branch 'master' into hashid 2022-05-14 09:52:05 +09:00
Dmitry Novik
6fc7dfea80 Support ordinary GROUP BY 2022-05-13 23:04:12 +00:00
Maksim Kita
3f18d7da33
Merge pull request #37189 from kitaisreal/function-h3-k-ring-add-cast
Function h3kRing added cast
2022-05-13 22:53:20 +02:00
Dmitry Novik
efb30bdf64 Correctly use __grouping_set_map column 2022-05-13 18:20:12 +00:00
Dmitry Novik
ae81268d4d Try to compute helper column lazy 2022-05-13 14:55:50 +00:00
Maksim Kita
ef7e21ea46 Function h3kRing added cast 2022-05-13 15:20:04 +02:00
Michael Nutt
9599c1f05c use single-character find for bad alphabet 2022-05-13 19:01:20 +09:00
qieqieplus
8b3fb22c6d check array sizes for short cut 2022-05-13 17:05:18 +08:00
mergify[bot]
2fdd305ef1
Merge branch 'master' into array-distance-functions 2022-05-13 07:56:57 +00:00
Michael Nutt
62a1e1c0cd use existing error code 2022-05-13 09:58:14 +09:00
Michael Nutt
03a7f7c4bd disallow null characters in custom alphabet 2022-05-13 08:43:42 +09:00
Dmitry Novik
92575fc3e5 Add missing file 2022-05-12 16:54:02 +00:00
Dmitry Novik
c5b40a9c91 WIP on GROUPING function 2022-05-12 16:40:26 +00:00
avogar
4c945d7fe5 Fix 2022-05-12 16:07:58 +00:00
avogar
0311dbb422 Add default implementation for Nothing, support arrays of nullable for arrayFilter and similar functions 2022-05-12 15:15:31 +00:00
Alexander Gololobov
548625a003 Reserve result vectors 2022-05-12 14:33:20 +02:00
Alexander Gololobov
7c226f6067 Fixed special case condition 2022-05-12 14:32:47 +02:00
Alexander Gololobov
355c5443a0 Trying to fix sanitizer failure 2022-05-12 13:50:53 +02:00
Robert Schulze
f8c24c5fe8
Merge pull request #37117 from ClickHouse/bug-37114
Fix Bug 37114 - ilike on FixedString(N)s produces wrong results
2022-05-12 09:39:36 +02:00
Mikhail Artemenko
031aca593d fix after merge 2022-05-12 01:42:34 +03:00
Alexander Gololobov
096b4626d6 Print more info in mismatching array sizes error message 2022-05-11 21:20:33 +02:00
Michael Nutt
2ff13c4e5d
Merge branch 'master' into hashid 2022-05-12 03:12:10 +09:00
Alexander Gololobov
b34a55c9e9
Merge branch 'master' into array-distance-functions 2022-05-11 16:55:02 +02:00
Yakov Olkhovskiy
6d3a54a044
Merge pull request #36467 from olevino/wyhash
Wyhash
2022-05-11 09:57:09 -04:00
Alexander Gololobov
3533cd770d Reserve result arrays 2022-05-11 14:46:06 +02:00
Robert Schulze
7232f47c68
Fix Bug 37114 - ilike on FixedString(N) columns produces wrong results
The main fix is in MatchImpl.h where the "case_insensitive" parameter is
added to Regexps::get().

Also made "case_insensitive" a non-default template parameter to reduce
the risk of future bugs.

The remainder of this commit consists of minor, unrelated code improvements.

resolves #37114
2022-05-11 14:30:21 +02:00
avogar
246aafa58a Merge branch 'master' of github.com:ClickHouse/ClickHouse into fix-array-map-nothing 2022-05-11 10:51:14 +00:00
mergify[bot]
0e2a86dcee
Merge branch 'master' into MeiliSearch 2022-05-11 08:49:19 +00:00
qieqieplus
5f9eee976f fix & format 2022-05-11 16:14:43 +08:00
Michael Nutt
b3340caea4 fixing hashid function registration when hashid is disabled 2022-05-10 20:26:42 +09:00
Sergei Trifonov
376e556474
Merge pull request #36861 from Vxider/fix-fire-hop-window
Fix fire in window view with hop window
2022-05-10 09:25:24 +02:00
bharatnc
592de6895c Merge remote-tracking branch 'upstream/master' into ncb/h3-missing-traversal-funcs 2022-05-09 22:41:03 -07:00
avogar
cbada6fe03 Fix Illegal column Nothing while using arrayMap 2022-05-09 15:51:31 +00:00
Yakov Olkhovskiy
24d1176bf3
Update CMakeLists.txt 2022-05-09 09:37:03 -04:00
Yakov Olkhovskiy
a0e67be32f
Update CMakeLists.txt 2022-05-09 08:54:00 -04:00
qieqieplus
b00a17ca38 Merge branch 'master' into array-distance-functions 2022-05-09 15:15:07 +08:00
qieqieplus
307511aab4 impl norm functions for array 2022-05-09 14:42:09 +08:00
qieqieplus
a17da05bda use return type as matrix value type 2022-05-09 14:42:02 +08:00
Robert Schulze
1b81bb49b4
Enable clang-tidy modernize-deprecated-headers & hicpp-deprecated-headers
Official docs:

  Some headers from C library were deprecated in C++ and are no longer
  welcome in C++ codebases. Some have no effect in C++. For more details
  refer to the C++ 14 Standard [depr.c.headers] section. This check
  replaces C standard library headers with their C++ alternatives and
  removes redundant ones.
2022-05-09 08:23:33 +02:00
Yakov Olkhovskiy
a2b1f7fe08
Update CMakeLists.txt 2022-05-09 01:15:50 -04:00
bharatnc
d49491a945 add h3HexRing func 2022-05-08 22:05:44 -07:00
bharatnc
ef623a39a0 minor fix to func return type 2022-05-08 22:05:44 -07:00
bharatnc
2145aa3e3a add h3Distance func 2022-05-08 22:05:44 -07:00
Yakov Olkhovskiy
c53ce4269f
Update CMakeLists.txt
define language for header only library
2022-05-09 00:51:03 -04:00
Michael Nutt
e87309ae8d clang-format FunctionHashID 2022-05-09 09:33:47 +09:00
Michael Nutt
e9f8114738 clean up std::string usage 2022-05-09 09:00:10 +09:00
Michael Nutt
477d9b1793 guard against hashid support being disabled 2022-05-09 07:52:35 +09:00
Robert Schulze
61cbcbf073
Enable clang-tidy readability-misleading-indentation
Official docs:

  Correct indentation helps to understand code. Mismatch of the
  syntactical structure and the indentation of the code may hide serious
  problems.
2022-05-08 19:12:01 +02:00
Michael Nutt
c16ce7657e add hashid support 2022-05-08 06:42:51 +09:00
mergify[bot]
2d1057bc87
Merge branch 'master' into fix-substring-negative-offset-length 2022-05-07 10:30:39 +00:00
bharatnc
be3f497b30 add h3Line func 2022-05-06 09:17:07 -07:00
Anton Popov
0caf91602f
Merge pull request #36812 from CurtizJ/hash-array-of-tuples
Allow to execute hash functions with arguments of type `Array(Tuple(..))`
2022-05-06 14:15:38 +02:00
mergify[bot]
eba26ec956
Merge branch 'master' into fix-fire-hop-window 2022-05-05 13:11:34 +00:00
bharatnc
01ea1beee5 Merge remote-tracking branch 'upstream/master' into ncb/h3-unidirectionaledges-funcs 2022-05-04 15:55:56 -07:00
Yakov Olkhovskiy
9c1a06703a
Merge pull request #36564 from awakeljw/fork_chmaster2
Fix CAST Object to Object with Nullable subcolumns
2022-05-04 14:40:43 -04:00
Vxider
407c14251a simplify code 2022-05-04 20:56:09 +08:00
mergify[bot]
17aecac7ff
Merge branch 'master' into new-clangtidies 2022-05-03 19:44:01 +00:00
Robert Schulze
0a4eccb73e
Activated a bunch of LLVM 12/13/14 clang-tidy warnings
Omitted new checks which produce too many matches or which are
controversial (e.g. readability-identifier-length).

New checks:

- misc-misleading-bidirectional + misc-misleading-identifier

  Detects potential Trojan Source attacks

- modernize-macro-to-enum

  Replaces groups of adjacent macros with an unscoped anonymous enum

- modernize-shrink-to-fit

  Replace copy and swap tricks on shrinkable containers with the
  shrink_to_fit() method call

- modernize-use-transparent-functors

  Prefer transparent functors to non-transparent ones

- modernize-use-uncaught-exceptions

  This check will warn on calls to std::uncaught_exception and replace
  them with calls to std::uncaught_exceptions (uncaught_exception was
  deprecated with C++17)

- performance-no-int-to-ptr

  Diagnoses every integer to pointer cast

- readability-duplicate-include

  Looks for duplicate includes and removes them

- readability-redundant-preprocessor

  Finds potentially redundant preprocessor directives

- bugprone-lambda-function-name

  Checks for attempts to get the name of a function from within a lambda
  expression

- bugprone-redundant-branch-condition

  Finds condition variables in nested if statements that were also
  checked in the outer if statement and were not changed

- bugprone-shared-ptr-array-mismatch

  Finds initializations of C++ shared pointers to non-array type that
  are initialized with an array

- bugprone-stringview-nullptr

  Checks for various ways that the const CharT* constructor of
  std::basic_string_view can be passed a null argument and replaces them
  with the default constructor in most cases

- bugprone-suspicious-memory-comparison

  Finds potentially incorrect calls to memcmp() based on properties of
  the arguments
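
As one example, modernize-use-transparent-functors can be illustrated as
follows. The alias TransparentSet and the helper containsWithoutCopy are
names invented for this sketch.

```cpp
#include <functional>
#include <set>
#include <string>
#include <string_view>

/// std::less<> is the transparent form of std::less<std::string>: it
/// compares whatever types it is given instead of converting both sides.
using TransparentSet = std::set<std::string, std::less<>>;

/// With a transparent comparator, find() accepts a std::string_view
/// directly, so no temporary std::string is built for the lookup.
inline bool containsWithoutCopy(const TransparentSet & names, std::string_view key)
{
    return names.find(key) != names.end();
}
```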
2022-05-03 09:22:11 +02:00
bharatnc
e56f7a1451 fix style check 2022-05-02 22:20:27 -07:00
bharatnc
a0da885c3c add h3GetUnidirectionalEdgeBoundary func 2022-05-02 21:38:54 -07:00
Dmitry Novik
9be17ef50c
Merge pull request #35111 from azat/optimize_aggregation_in_order-prefix
Implement partial GROUP BY key for optimize_aggregation_in_order
2022-05-02 17:49:48 +02:00
Vladimir C
7293a69e5e
Merge pull request #36656 from amosbird/timefunctionunderflow
Saturate date/datetime to zero (part 2)
2022-05-02 17:10:48 +02:00
bharatnc
745a44a7b0 add h3GetUnidirectionalEdgesFromHexagon func 2022-05-01 22:25:58 -07:00
bharatnc
30d14c1217 add h3GetIndexesFromUnidirectionalEdge func 2022-05-01 21:26:44 -07:00
awakeljw
0a32fe4da3 Fix CAST Object to Object with Nullable subcolumns 2022-05-02 11:31:13 +08:00
bharatnc
77b5f6fee0 add h3GetDestinationIndexFromUnidirectionalEdge func 2022-05-01 14:06:45 -07:00
bharatnc
7e871adf91 add h3GetOriginIndexFromUnidirectionalEdge func 2022-05-01 13:47:43 -07:00
bharatnc
0e4a833717 add h3UnidirectionalEdgeIsValid func 2022-05-01 13:21:18 -07:00
bharatnc
6ce66e6d13 add func h3GetUnidirectionalEdge 2022-05-01 11:12:05 -07:00
Mikhail Artemenko
41f657d8ed
Merge branch 'master' into MeiliSearch 2022-05-01 10:01:56 +03:00
Alexey Milovidov
1ddb04b992
Merge pull request #36715 from amosbird/refactorbase
Reorganize source files so that base won't depend on Common
2022-04-30 09:40:58 +03:00
Anton Popov
9878cae3e8 allow to execute hash function with arguments of type Array(Tuple(..)) 2022-04-29 18:50:42 +00:00
Azat Khuzhin
767acd53fb Add ability to pass range of rows to Aggregator
v2: fix compiled aggregate functions (seek result to row_start)
v3: fix compiled aggregate functions (seek args to row_start)
v4: change signatures for JIT
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:57:55 +03:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Amos Bird
e81929a8b5
Saturate date/datetime to zero (part 2)
For partial hours/minutes timezones.
2022-04-29 10:24:18 +08:00
Meena Renganathan
bdaf5391cf Merge branch 'master' of https://github.com/DevTeamBK/ClickHouse into FIPS_compliance 2022-04-28 06:15:46 -07:00
Nikita Mikhaylov
7d95051d32
Fixed integer overflow in toStartOfInterval (#36546) 2022-04-26 11:44:57 +02:00
Meena Renganathan
ab329721d7 Merge branch 'master' of https://github.com/DevTeamBK/ClickHouse into FIPS_compliance 2022-04-25 06:21:44 -07:00
Memo
856412ea6e fix wrong alias 2022-04-22 11:27:24 +08:00
Memo
25f4d76da3 change name 2022-04-22 11:24:44 +08:00
Memo
956d525840 fix conflict 2022-04-22 11:22:50 +08:00
Memo
32721b001f add alias 2022-04-22 11:18:07 +08:00
Memo
39aadf0975 replaced toStartOfFiveMinute with toStartOfFiveMinutes 2022-04-22 10:49:59 +08:00
olevino999
6594ae8e44 Merge branch 'master' of github.com:ClickHouse/ClickHouse into wyhash 2022-04-21 19:36:25 +03:00
Maksim Kita
57444fc7d3
Merge pull request #36444 from rschu1ze/clang-tidy-fixes
Clang tidy fixes
2022-04-21 16:11:27 +02:00
olevino999
465191b562 wyhash 2022-04-21 02:31:31 +03:00
olevino999
64989afa52 wyhash 2022-04-21 02:26:37 +03:00
Yakov Olkhovskiy
95fc6243b1
Merge pull request #36386 from Joeywzr/hex_support_uint128
hex support Int128/Int256/UInt128/UInt256
2022-04-20 11:11:51 -04:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
e.g. C-style vs. C++-style casts.
2022-04-20 10:29:05 +02:00
qieqieplus
2865c8141d Merge branch 'master' into array-distance-functions 2022-04-19 14:46:39 +08:00
qieqieplus
c4b5c45740 refactor & add tests 2022-04-19 14:39:40 +08:00
Memo
8d4e433c7d add int8 int16 int32 int64 and tests 2022-04-19 14:38:16 +08:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() with
<Container>.contains(), which is more expressive and, in the case of
multimaps/multisets, also faster.
2022-04-18 23:53:11 +02:00
Memo
335be4c807 hex support Int128/Int256/UInt128/UInt256 2022-04-18 20:13:43 +08:00
Alexey Milovidov
f6ab2bd523
Merge pull request #36312 from ClickHouse/remove-arcadia
Remove remaining parts of Arcadia
2022-04-18 07:02:54 +03:00
Alexey Milovidov
294efeccfe Fix clang-tidy-14 (part 1) 2022-04-16 04:54:04 +02:00
Alexey Milovidov
cbeeb7ec4f Remove Arcadia 2022-04-16 00:20:47 +02:00
Mikhail Artemenko
2fd86cc564
Merge branch 'master' into MeiliSearch 2022-04-13 12:05:46 +03:00
bharatnc
e5494de63c h3Res0Indexes - remove unused array 2022-04-11 22:18:55 -07:00
Anton Popov
471e945efe
Merge pull request #35934 from ClickHouse/make_date
Implementation of makeDateTime() and makeDateTime64() #30895
2022-04-11 16:38:23 +02:00
Alexander Tokmakov
6a46da93ae Merge branch 'master' into mvcc_prototype 2022-04-07 23:22:19 +02:00
Kruglov Pavel
73adbb4c15
Merge pull request #35986 from amosbird/better-scalar1
Fix performance regression of scalar query
2022-04-07 14:07:59 +02:00
Alexander Tokmakov
8290ffa88d Merge branch 'master' into mvcc_prototype 2022-04-07 13:50:42 +02:00
Alexander Gololobov
42d4a84a6f More tests for corner cases 2022-04-07 12:34:26 +02:00
Alexander Gololobov
81d150ed43 Implementation of makeDateTime() and makeDateTime64() 2022-04-07 00:30:18 +02:00
Meena Renganathan
645e156af6 Updated the boringssl-cmake to match the latest boringssl module update 2022-04-06 14:52:33 -07:00
Mikhail Artemenko
151eeb1a27
Merge branch 'master' into MeiliSearch 2022-04-06 17:07:55 +03:00
Amos Bird
df06f9f974
Fix performance regression of scalar query 2022-04-06 17:50:22 +08:00
Vladimir C
2ebae2d722
Merge pull request #35682 from CurtizJ/dynamic-columns-6 2022-04-06 11:48:07 +02:00
Alexander Tokmakov
1fe50ad201 Merge branch 'master' into mvcc_prototype 2022-04-05 14:38:02 +02:00
Maksim Kita
b160ffd726
Merge pull request #35723 from ClickHouse/array-has-all-sse-avx2-optimizations
Merging #27653
2022-04-05 11:09:14 +02:00
Alexander Gololobov
f0de8eb625 Extracted argument handling into a separate class to reuse it for makeDateTime() and makeDateTime64() 2022-04-04 19:57:04 +02:00
Alexander Tokmakov
a2167f12b8 Merge branch 'master' into mvcc_prototype 2022-04-04 14:24:23 +02:00
Maksim Kita
47528de78b Fix build 2022-04-04 14:07:05 +02:00
Maksim Kita
af405d3ba6 Fixed style check 2022-04-04 13:34:27 +02:00
Nikolai Kochetov
19819c72f8
Merge pull request #35290 from bigo-sg/function_enumerate_streams
Add function getTypeSerializationStreams
2022-04-04 12:09:53 +02:00
Alexey Milovidov
d9e5ca2119
Merge pull request #34394 from holadepo/last_day
Add toLastDayOfMonth function
2022-04-04 07:02:08 +03:00
Alexander Tokmakov
5a50ad9de3 Merge branch 'master' into mvcc_prototype 2022-03-31 11:35:04 +02:00
taiyang-li
a5765dccb1 Merge branch 'master' into function_enumerate_streams 2022-03-31 12:21:00 +08:00
Kruglov Pavel
4ec3c35e14
Merge pull request #35755 from Avogar/fix-custom-to-string
Fix bug in conversion from custom types to string
2022-03-31 00:06:48 +02:00
Maksim Kita
e43fdcd7eb Function hasAll added dynamic dispatch for SSE4.2, AVX2 2022-03-30 18:41:34 +02:00
Maksim Kita
8d0a9689e4 Update gatherutils CMakeLists to use X86_INTRINSICS_FLAGS from cpu_features 2022-03-30 18:40:18 +02:00
Maksim Kita
91eec8962f Rename test 2022-03-30 18:39:28 +02:00
avogar
af4bfec051 Fix bug in conversion from custom types to string 2022-03-30 11:19:03 +00:00
Vladimir C
31c367d3cd
Merge pull request #35651 from amosbird/columntransformerrename 2022-03-30 12:37:30 +02:00
Antonio Andelic
d85ed8f2a9
Merge pull request #35655 from ClickHouse/exception-compile-time-message-check
Use compile-time check for `Exception` messages
2022-03-30 08:11:32 +02:00
taiyang-li
47f3e9330e merge master and fix conflict 2022-03-30 11:06:51 +08:00
Anton Popov
a842a81aba
Merge pull request #35690 from CurtizJ/flatten-tuple
Add function `flattenTuple`
2022-03-30 00:24:36 +02:00
Maksim Kita
2742b88e6c
Merge pull request #27653 from ContentSquare/hasAllAny_SIMD
Implement HasAll specialization for SSE and AVX2
2022-03-29 16:35:59 +02:00
Alexander Tokmakov
287d858fda Merge branch 'master' into mvcc_prototype 2022-03-29 16:24:12 +02:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Alexander Gololobov
bf376ee2f5
Merge pull request #35628 from ClickHouse/make_date
Implementation of makeDate and makeDate32
2022-03-29 13:35:26 +02:00
taiyang-li
db436ad621 Merge branch 'master' into function_enumerate_streams 2022-03-29 11:35:21 +08:00
Alexander Gololobov
b49993f993 Fixes according to the code review 2022-03-28 22:47:39 +02:00
Anton Popov
24c0cf86d4 add function 'flattenTuple' 2022-03-28 19:32:12 +00:00