Semantics are unchanged.
Some special case handling was changed to early-out, because of that the
indentation changed but the logic is the same as before.
With this commit, SQL functions LIKE and MATCH and their variants can
work with non-const needle arguments. E.g.
create table tab
(id UInt32, haystack String, needle String)
engine = MergeTree()
order by id;
insert into tab values
(1, 'Hello', '%ell%')
(2, 'World', '%orl%')
select id, haystack, needle, like(haystack, needle)
from tab;
For that, methods vectorVector() and vectorFixedVector() were added to
MatchImpl. The existing code for const needles has an optimization where
the compiled regexp is cached. The new code expects a different needle
per row and consequently does not cache the regexp.
The previous logic was smart but too inflexible to support the next
commits. Replace by a simple pushdown logic where string search
implementations return their const arguments instead of having the
common class figure these out based on properties/traits.
Function to count number of substring occurrences in the string:
- in case of needle is multi char - counts non-intersecting substrings
- the code is based on position helpers.
The following new functions is available:
- countSubstrings()
- countSubstringsCaseInsensitive()
- countSubstringsCaseInsensitiveUTF8()
v0: substringCount()
v2:
- add substringCountCaseInsensitiveUTF8
- improve tests
- fix coding style issues
- fix multichar needle
v3: rename to countSubstrings (by analogy with countEqual())