Commit Graph

12 Commits

Author SHA1 Message Date
Robert Schulze
81bb2242fd
Fix countSubstrings() & position() on patterns with 0-bytes
SQL functions countSubstrings(), countSubstringsCaseInsensitive(),
countSubstringsUTF8(), position(), positionCaseInsensitive(),
positionUTF8() with non-const pattern argument use fallback sorters
LibCASCIICaseSensitiveStringSearcher and LibCASCIICaseInsensitiveStringSearcher
which call ::strstr(), resp. ::strcasestr(). These functions assume that
the haystack is 0-terminated and they even document that. However, the
callers did not check if the haystack contains 0-byte (perhaps because
its sort of expensive). As a consequence, if the haystack contained a
zero byte in it's payload, matches behind this zero byte were ignored.

    create table t (id UInt32, pattern String) engine = MergeTree() order by id;
    insert into t values (1, 'x');
    select countSubstrings('aaaxxxaa\0xxx', pattern) from t;

We returned 3 before this commit, now we return 6
2022-06-29 21:41:18 +00:00
Robert Schulze
bb7c627964
Cosmetics: Pass patterns around as std::string_view instead of StringRef
- The patterns are not used in hashing, there should not be a performance
  impact when we use stuff from the standard library instead.

- added forgotten .reserve() in FunctionsMultiStringPosition.h
2022-06-26 15:32:19 +00:00
Robert Schulze
e25ca139cd
Implement SQL functions (NOT) (I)LIKE() + MATCH() with non-const needles
With this commit, SQL functions LIKE and MATCH and their variants can
work with non-const needle arguments. E.g.

  create table tab
    (id UInt32, haystack String, needle String)
    engine = MergeTree()
    order by id;

  insert into tab values
  (1, 'Hello', '%ell%')
  (2, 'World', '%orl%')

  select id, haystack, needle, like(haystack, needle)
  from tab;

For that, methods vectorVector() and vectorFixedVector() were added to
MatchImpl. The existing code for const needles has an optimization where
the compiled regexp is cached. The new code expects a different needle
per row and consequently does not cache the regexp.
2022-05-23 09:41:28 +02:00
Robert Schulze
4829ae8380
Replace overly clever const argument logic by something simpler
The previous logic was smart but too inflexible to support the next
commits. Replace by a simple pushdown logic where string search
implementations return their const arguments instead of having the
common class figure these out based on properties/traits.
2022-05-22 17:50:38 +02:00
Robert Schulze
0299cc87e4
Improve naming consistency of string search code
Just renamings, nothing major ...
2022-05-22 17:50:38 +02:00
Vasily Nemkov
2c6b9aa174 Better exception messages for some String-related functions 2021-09-26 08:18:37 +03:00
Alexey Milovidov
269b6383f5 Check for #pragma once in headers 2020-10-10 21:37:02 +03:00
vdimir
df63ada6cc Fix Darwin build (std::max candidate template) 2020-08-15 20:08:59 +00:00
Anton Popov
1ac1955ffb
bump CI 2020-08-05 14:32:41 +03:00
vdimir
368314b930 Fix style in PositionImpl and FunctionsStringSearch 2020-08-02 14:24:39 +00:00
vdimir
9ed7df64cd Support 'start_pos' argument in 'position' function 2020-08-02 12:56:18 +00:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00