Mirror of https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-27 18:12:02 +00:00
bcb058f999
The new per-dictionary settings control regex match semantics for case sensitivity and for whether the '.' wildcard matches newlines. They must be set at the dictionary level because they are applied to the regex engines at pattern-compile time.

- regexp_dict_flag_case_insensitive: case-insensitive matching
- regexp_dict_flag_dotall: '.' matches all characters, including newlines

They correspond to HS_FLAG_CASELESS and HS_FLAG_DOTALL in Vectorscan, and to case_sensitive and dot_nl in RE2. These are the most useful options that remain compatible with RegExpTreeDictionary's internal behavior of splitting simple and complex patterns between Vectorscan and RE2.

The alternative is to use (?i) and/or (?s) in every pattern. However, (?s) isn't handled properly by OptimizedRegularExpression::analyze(). And while (?i) is, it still causes the dictionary to treat the pattern as "complex", scanning it sequentially with RE2 rather than multi-matching it with Vectorscan, even though Vectorscan supports case-insensitive literal matching. Setting dictionary-wide flags is both more convenient and circumvents these problems.
- _snippet_dictionary_in_cloud.md
- index.md