After making function multi[Fuzzy]Match(Any|AnyIndex|AllIndices)() work
with non-const needles, 12 more functions started to fail in test
"00233_position_function_family":
multiSearchAny()
multiSearchAnyCaseInsensitive()
multiSearchAnyUTF8
multiSearchAnyCaseInsensitiveUTF8()
multiSearchFirstPosition()
multiSearchFirstPositionCaseInsensitive()
multiSearchFirstPositionUTF8()
multiSearchFirstPositionCaseInsensitiveUTF8()
multiSearchFirstIndex()
multiSearchFirstIndexCaseInsensitive()
multiSearchFirstIndexUTF8()
multiSearchFirstIndexCaseInsensitiveUTF8()
Failing queries take the form
select 0 = multiSearchAny('\0', CAST([], 'Array(String)'));
Sometimes it is useful to build contrib with debug symbols for further
debugging.
With everything turned ON (i.e. debug build) I got 3.3GB vs 3.0GB w/o
this patch, 9% bloat, thoughts about this is this OK or not for you, if
not STRIP_DEBUG_SYMBOLS_HEAVY_CONTRIB can be OFF by default (regardless
of build type).
P.S. aws debug symbols adds just 1.7%.
v2: rename STRIP_HEAVY_DEBUG_SYMBOLS
v3: OMIT_HEAVY_DEBUG_SYMBOLS
v4: documentation had been removed
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
It can be useful to match versions, since in some tables
(system.trace_log) there is only revision column.
P.S. came to this when was digging into stress reports from CI.
P.P.S. case insensitive by analogy with version().
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
SQL functions countSubstrings(), countSubstringsCaseInsensitive(),
countSubstringsUTF8(), position(), positionCaseInsensitive(),
positionUTF8() with non-const pattern argument use fallback sorters
LibCASCIICaseSensitiveStringSearcher and LibCASCIICaseInsensitiveStringSearcher
which call ::strstr(), resp. ::strcasestr(). These functions assume that
the haystack is 0-terminated and they even document that. However, the
callers did not check if the haystack contains 0-byte (perhaps because
its sort of expensive). As a consequence, if the haystack contained a
zero byte in it's payload, matches behind this zero byte were ignored.
create table t (id UInt32, pattern String) engine = MergeTree() order by id;
insert into t values (1, 'x');
select countSubstrings('aaaxxxaa\0xxx', pattern) from t;
We returned 3 before this commit, now we return 6
- With this, invalid combinations of the FindAny/FindAnyIndex bools are
no longer possible and we can remove the corresponding check
- Also makes the instantiations more readable.
- This is not needed for non-const regexp array arguments (the
cardinality of arrays is fixed per column) but it cleans up the code
and runs the check only in functions which have restrictions on the
number of patterns.
- For functions using hyperscans, it was checked that the number of
regexes is < 2^32. Removed the check because I don't think anyone will
every specify 4 billion patterns.
- This is not needed for non-const regexp array arguments (the
cardinality of arrays is fixed per column) but it cleans up the code
and runs the check only in functions which have restrictions on the
number of patterns.
- For functions using hyperscans, it was checked that the number of
regexes is < 2^32. Removed the check because I don't think anyone will
every specify 4 billion patterns.