Commit Graph

22 Commits

Author SHA1 Message Date
Robert Schulze
df889351ad
feat: replace unbounded vectorscan cache by bounded cache
VectorScan patterns can grow large (up to multiple MBs) and queries
involving different VectorScan patterns lead to an ever increasing
pattern cache size.

With this commit, the unbounded cache for vectorscan patterns is
replaced by a bounded cache with "CacheTable" eviction strategy, similar
to what we have for re2 patterns. The cache size is currently hard-coded
to 500 entries.

Fixes #19869
2022-08-16 21:09:47 +00:00
Robert Schulze
89bf69c35f
style: improve code aesthetics and make it slightly safer 2022-08-16 21:09:32 +00:00
Robert Schulze
8e1a3cd194
Don't crash in functions multiMatch[Fuzzy](AllIndices/Any/AnyIndex)() with empty needle
Queries like
  "select multiMatchAnyIndex('abc', []::Array(String))"
were not properly handled and crashed.
2022-07-08 11:18:53 +00:00
Robert Schulze
49348b833a
Simplify 2022-07-07 20:25:26 +00:00
Robert Schulze
1de5e9a7da
Avoid copy-ing array elements 2022-07-07 12:33:34 +00:00
Robert Schulze
1eed72b525
Make more multi-search methods work with non-const needles
After making function multi[Fuzzy]Match(Any|AnyIndex|AllIndices)() work
with non-const needles, 12 more functions started to fail in test
"00233_position_function_family":

multiSearchAny()
multiSearchAnyCaseInsensitive()
multiSearchAnyUTF8
multiSearchAnyCaseInsensitiveUTF8()

multiSearchFirstPosition()
multiSearchFirstPositionCaseInsensitive()
multiSearchFirstPositionUTF8()
multiSearchFirstPositionCaseInsensitiveUTF8()

multiSearchFirstIndex()
multiSearchFirstIndexCaseInsensitive()
multiSearchFirstIndexUTF8()
multiSearchFirstIndexCaseInsensitiveUTF8()

Failing queries take the form
  select 0 = multiSearchAny('\0', CAST([], 'Array(String)'));
2022-07-04 14:00:21 +00:00
Robert Schulze
ece61f6da3
Fix davenger's review comments
https://github.com/ClickHouse/ClickHouse/pull/38434#discussion_r907397214
https://github.com/ClickHouse/ClickHouse/pull/38434#discussion_r907385290
https://github.com/ClickHouse/ClickHouse/pull/38434#discussion_r907406097

(the latter is no longer relevant as the affected places were removed in
the meantime)
2022-07-04 10:43:21 +00:00
Robert Schulze
d547aa7849
Allow non-const pattern array argument in multi[Fuzzy]Match*()
Resolves #38046
2022-07-04 10:43:16 +00:00
Robert Schulze
959cbaab02
Move loop over patterns into implementations
- This is preparation for non-const regexp arguments, where this loop
  will run for each row.
2022-06-26 16:26:13 +00:00
Robert Schulze
e2b11899a1
Move check if cfg allows hyperscan into implementations
- This is not needed for non-const regexp array arguments but cleans up
  the code and runs the check only in functions which actually use
  hyperscan.
2022-06-26 16:25:49 +00:00
Robert Schulze
c9ce0efa66
Instantiate MultiMatchAnyImpl template using enums
- With this, invalid combinations of the FindAny/FindAnyIndex bools are
  no longer possible and we can remove the corresponding check

- Also makes the instantiations more readable.
2022-06-26 16:25:49 +00:00
Robert Schulze
7913edc172
Move check for hyperscan regexp constraints into implementations
- This is preparation for non-const regexp arguments, where this check
  will run for each row.
2022-06-26 15:38:05 +00:00
Robert Schulze
4bc59c18e3
Cosmetics: Move some code around + docs + whitespaces + minor stuff 2022-06-26 15:34:15 +00:00
Robert Schulze
bb7c627964
Cosmetics: Pass patterns around as std::string_view instead of StringRef
- The patterns are not used in hashing, there should not be a performance
  impact when we use stuff from the standard library instead.

- added forgotten .reserve() in FunctionsMultiStringPosition.h
2022-06-26 15:32:19 +00:00
Robert Schulze
2c828338f4
Replace hyperscan by vectorscan
This commit migrates ClickHouse to Vectorscan. The first 10 min of
[0] explain the reasons for it.

(*) Addresses (but does not resolve) #38046

(*) Config parameter names (e.g. "max_hyperscan_regexp_length") are
    preserved for compatibility. Likewise, error codes (e.g.
    "ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g.
    "HyperscanDeleter") are preserved as vectorscan aims to be a drop-in
    replacement.

[0] https://www.youtube.com/watch?v=KlZWmmflW6M
2022-06-24 10:47:52 +02:00
Robert Schulze
b044d44fef
Refactoring: Make template instantiation easier to read
- introduced class MatchTraits with enums that replace bool template
  parameters

- (minor: made negation the last template parameters because negation
  executes last during evaluation)
2022-05-25 10:03:58 +02:00
Robert Schulze
0299cc87e4
Improve naming consistency of string search code
Just renamings, nothing major ...
2022-05-22 17:50:38 +02:00
Alexey Milovidov
8b4a6a2416 Remove cruft 2021-10-28 02:10:39 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Vasily Nemkov
2c6b9aa174 Better exception messages for some String-related functions 2021-09-26 08:18:37 +03:00
vdimir
8fc470d1c5 Fix compilcation errror in MultiMatchAnyImpl.h 2020-08-15 18:08:00 +00:00
Alexey Milovidov
466da445e1 Every function in its own file, part 13 2020-05-07 02:21:13 +03:00