Commit Graph

3645 Commits

Author SHA1 Message Date
Kruglov Pavel
57a719bafd
Merge pull request #39037 from amosbird/index-fix-1-again
Fix toHour() monotonicity which can lead to incorrect query result (incorrect index analysis) (second try)
2022-07-11 13:36:01 +02:00
Robert Schulze
5b8e448c7e
Merge pull request #39012 from ClickHouse/fix-crashing-stringsearch-with-empty-needle
Don't throw Logical error in functions multiMatch[Fuzzy](AllIndices/Any/AnyIndex)() with empty needle
2022-07-11 08:13:52 +02:00
Amos Bird
28aefc33a0
Revert "Merge pull request #39001 from ClickHouse/revert-38675-index-fix-1"
This reverts commit 1cf01bb959, reversing
changes made to af1136c990.
2022-07-08 23:22:10 +08:00
Robert Schulze
8e1a3cd194
Don't crash in functions multiMatch[Fuzzy](AllIndices/Any/AnyIndex)() with empty needle
Queries like
  "select multiMatchAnyIndex('abc', []::Array(String))"
were not properly handled and crashed.
2022-07-08 11:18:53 +00:00
Alexander Tokmakov
d4784203b7
Revert "Fix toHour() monotonicity which can lead to incorrect query result (incorrect index analysis)" 2022-07-08 12:51:30 +03:00
Robert Schulze
524f39551c
Merge pull request #38485 from ClickHouse/multi-match-with-non_const-patterns
Multi match with non const patterns
2022-07-08 09:29:10 +02:00
Alexey Milovidov
89fdcbf08c
Merge pull request #38675 from amosbird/index-fix-1
Fix toHour() monotonicity which can lead to incorrect query result (incorrect index analysis)
2022-07-08 00:32:27 +03:00
Robert Schulze
49348b833a
Simplify 2022-07-07 20:25:26 +00:00
Robert Schulze
1de5e9a7da
Avoid copy-ing array elements 2022-07-07 12:33:34 +00:00
Robert Schulze
dec184f61d
Add comment about const needle + const haystack 2022-07-06 17:41:15 +00:00
Robert Schulze
8a18705729
More variable renamings for more uniformity 2022-07-06 14:55:02 +00:00
Robert Schulze
0b0d64c5ca
Don't resize output vector in each loop iteration 2022-07-06 14:45:22 +00:00
Robert Schulze
144c1edb03
Inherit default implementation of getArgumentsThatAreAlwaysConstant() 2022-07-06 14:42:19 +00:00
mergify[bot]
8cc2c3914b
Merge branch 'master' into isnullable 2022-07-06 14:40:01 +00:00
Robert Schulze
6a907b23fb
Replace typeid_cast() with checkAndGetColumnConst()
... syntactic sugar
2022-07-06 14:38:28 +00:00
Robert Schulze
0c4da85e75
More uniform naming 2022-07-06 14:29:42 +00:00
Nikolai Kochetov
020c99a269
Merge pull request #38617 from azat/contrib-debug-symbols
Add separate option to omit symbols from heavy contrib
2022-07-06 14:40:24 +02:00
Robert Schulze
d0b2f13f9d
Fix style check 2022-07-05 13:41:52 +02:00
lokax
e6bd0105b1 feat(Function): isNullable
Signed-off-by: lokax <m632656684@gmail.com>
2022-07-05 15:51:53 +08:00
lokax
849c46e6fa feat(Function): isNullable 2022-07-05 15:23:07 +08:00
mergify[bot]
f14b62b2d6
Merge branch 'master' into index-fix-1 2022-07-04 22:50:36 +00:00
Robert Schulze
1eed72b525
Make more multi-search methods work with non-const needles
After making function multi[Fuzzy]Match(Any|AnyIndex|AllIndices)() work
with non-const needles, 12 more functions started to fail in test
"00233_position_function_family":

multiSearchAny()
multiSearchAnyCaseInsensitive()
multiSearchAnyUTF8
multiSearchAnyCaseInsensitiveUTF8()

multiSearchFirstPosition()
multiSearchFirstPositionCaseInsensitive()
multiSearchFirstPositionUTF8()
multiSearchFirstPositionCaseInsensitiveUTF8()

multiSearchFirstIndex()
multiSearchFirstIndexCaseInsensitive()
multiSearchFirstIndexUTF8()
multiSearchFirstIndexCaseInsensitiveUTF8()

Failing queries take the form
  select 0 = multiSearchAny('\0', CAST([], 'Array(String)'));
2022-07-04 14:00:21 +00:00
Robert Schulze
ece61f6da3
Fix davenger's review comments
https://github.com/ClickHouse/ClickHouse/pull/38434#discussion_r907397214
https://github.com/ClickHouse/ClickHouse/pull/38434#discussion_r907385290
https://github.com/ClickHouse/ClickHouse/pull/38434#discussion_r907406097

(the latter is no longer relevant as the affected places were removed in
the meantime)
2022-07-04 10:43:21 +00:00
Robert Schulze
d547aa7849
Allow non-const pattern array argument in multi[Fuzzy]Match*()
Resolves #38046
2022-07-04 10:43:16 +00:00
Alexander Gololobov
8ce8158f7f Do computations in Float32 (not Float64) for arrays of Float32 2022-07-03 10:33:11 +02:00
Alexander Gololobov
c6691cc5f2 Improved vectorized execution of main loop for array norm/distance 2022-07-02 22:45:22 +02:00
mergify[bot]
b016be264c
Merge branch 'master' into squared_l2 2022-07-02 09:17:28 +00:00
Azat Khuzhin
e8f5cd3c68 Add separate option to omit symbols from heavy contrib
Sometimes it is useful to build contrib with debug symbols for further
debugging.

With everything turned ON (i.e. debug build) I got 3.3GB vs 3.0GB w/o
this patch, 9% bloat, thoughts about this is this OK or not for you, if
not STRIP_DEBUG_SYMBOLS_HEAVY_CONTRIB can be OFF by default (regardless
of build type).

P.S. aws debug symbols adds just 1.7%.
v2: rename STRIP_HEAVY_DEBUG_SYMBOLS
v3: OMIT_HEAVY_DEBUG_SYMBOLS
v4: documentation had been removed
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-02 06:32:03 +03:00
Robert Schulze
2a1ede0f5a
Merge pull request #38589 from ClickHouse/fix-zero-bytes-in-haystack
Fix countSubstrings() & position() on patterns with 0-bytes
2022-07-01 16:15:43 +02:00
Amos Bird
84a407f381
Fix toHour monotonicity 2022-07-01 18:24:24 +08:00
Alexander Gololobov
b2b31103c5 Reuse common code for L2Squared and L2 2022-06-30 14:12:25 +02:00
Azat Khuzhin
a47355877e
Add revision() function (#38555)
It can be useful to match versions, since in some tables
(system.trace_log) there is only revision column.

P.S. came to this when was digging into stress reports from CI.
P.P.S. case insensitive by analogy with version().

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-30 12:58:26 +02:00
Robert Schulze
81bb2242fd
Fix countSubstrings() & position() on patterns with 0-bytes
SQL functions countSubstrings(), countSubstringsCaseInsensitive(),
countSubstringsUTF8(), position(), positionCaseInsensitive(),
positionUTF8() with non-const pattern argument use fallback sorters
LibCASCIICaseSensitiveStringSearcher and LibCASCIICaseInsensitiveStringSearcher
which call ::strstr(), resp. ::strcasestr(). These functions assume that
the haystack is 0-terminated and they even document that. However, the
callers did not check if the haystack contains 0-byte (perhaps because
its sort of expensive). As a consequence, if the haystack contained a
zero byte in it's payload, matches behind this zero byte were ignored.

    create table t (id UInt32, pattern String) engine = MergeTree() order by id;
    insert into t values (1, 'x');
    select countSubstrings('aaaxxxaa\0xxx', pattern) from t;

We returned 3 before this commit, now we return 6
2022-06-29 21:41:18 +00:00
Julian Gilyadov
d0d72e0b5e
Implement L2Squared Distance and Norm 2022-06-28 15:28:58 -04:00
Miel Donkers
4e9f396a48
Small improvement of the error message to hint at possible issue (#38458) 2022-06-28 13:36:30 +02:00
Robert Schulze
959cbaab02
Move loop over patterns into implementations
- This is preparation for non-const regexp arguments, where this loop
  will run for each row.
2022-06-26 16:26:13 +00:00
Robert Schulze
cb5d1a4a85
Fix style check 2022-06-26 16:25:49 +00:00
Robert Schulze
b8f67185bf
Cosmetics: Whitespaces 2022-06-26 16:25:49 +00:00
Robert Schulze
e2b11899a1
Move check if cfg allows hyperscan into implementations
- This is not needed for non-const regexp array arguments but cleans up
  the code and runs the check only in functions which actually use
  hyperscan.
2022-06-26 16:25:49 +00:00
Robert Schulze
c2cea38b97
Move local variable into if statement 2022-06-26 16:25:49 +00:00
Robert Schulze
c9ce0efa66
Instantiate MultiMatchAnyImpl template using enums
- With this, invalid combinations of the FindAny/FindAnyIndex bools are
  no longer possible and we can remove the corresponding check

- Also makes the instantiations more readable.
2022-06-26 16:25:49 +00:00
Robert Schulze
2f15d45f27
Move check for regexp array size into implementations
- This is not needed for non-const regexp array arguments (the
  cardinality of arrays is fixed per column) but it cleans up the code
  and runs the check only in functions which have restrictions on the
  number of patterns.

- For functions using hyperscans, it was checked that the number of
  regexes is < 2^32. Removed the check because I don't think anyone will
  every specify 4 billion patterns.
2022-06-26 16:25:43 +00:00
Robert Schulze
3478db9fb6
Move check for regexp array size into implementations
- This is not needed for non-const regexp array arguments (the
  cardinality of arrays is fixed per column) but it cleans up the code
  and runs the check only in functions which have restrictions on the
  number of patterns.

- For functions using hyperscans, it was checked that the number of
  regexes is < 2^32. Removed the check because I don't think anyone will
  every specify 4 billion patterns.
2022-06-26 15:38:12 +00:00
Robert Schulze
7913edc172
Move check for hyperscan regexp constraints into implementations
- This is preparation for non-const regexp arguments, where this check
  will run for each row.
2022-06-26 15:38:05 +00:00
Robert Schulze
89bfdd50bf
Remove unnecessary check
- getReturnTypeImpl() ensures that the haystack column has type "String"
  and we can simply assert that.
2022-06-26 15:34:24 +00:00
Robert Schulze
580d89477f
Minimally faster performance 2022-06-26 15:34:22 +00:00
Robert Schulze
4bc59c18e3
Cosmetics: Move some code around + docs + whitespaces + minor stuff 2022-06-26 15:34:15 +00:00
Robert Schulze
1273756911
Cosmetics: fmt-based exceptions 2022-06-26 15:33:18 +00:00
Robert Schulze
e5c74a14f7
Cosmetics: More consistent naming
- rename utility function and file to "checkHyperscanRegexp"
2022-06-26 15:33:18 +00:00
Robert Schulze
072e0855a8
Cosmetics: Make member variables const 2022-06-26 15:32:26 +00:00