ClickHouse/src/Dictionaries
johanngan bcb058f999 Add case insensitive and dot-all modes to RegExpTree dictionary
Two new per-dictionary settings control regex match semantics: case
sensitivity, and whether the '.' wildcard matches newlines. They must be
set at the dictionary level because they are applied to the regex
engines at pattern compile time.

- regexp_dict_flag_case_insensitive: case-insensitive matching
- regexp_dict_flag_dotall: '.' matches all characters including newlines

They correspond to HS_FLAG_CASELESS and HS_FLAG_DOTALL in Vectorscan,
and to the case_sensitive and dot_nl options in RE2. These are the most
useful options that remain compatible with how RegExpTreeDictionary
splits simple and complex patterns between Vectorscan and RE2.

The alternative is to use (?i) and/or (?s) in every pattern. However,
(?s) isn't handled properly by OptimizedRegularExpression::analyze().
(?i) is handled, but it still causes the dictionary to treat the pattern
as "complex" and scan it sequentially with RE2 rather than multi-match
it with Vectorscan, even though Vectorscan supports case-insensitive
literal matching. Setting dictionary-wide flags is more convenient and
avoids both problems.
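
As a rough illustration of the flag correspondence described above, the
sketch below maps the two dictionary settings onto each engine's
options. It is not the code from this commit: RegexpDictFlags,
SketchRe2Options, the SKETCH_FLAG_* constants and both helper functions
are hypothetical names, and the bit values are placeholders. Real code
would take HS_FLAG_CASELESS and HS_FLAG_DOTALL from the Vectorscan
header and set the options on RE2::Options. Note that the RE2 option is
expressed the other way round (case_sensitive), so the setting has to be
negated.

```cpp
#include <cstdint>

// Hypothetical holder for the two per-dictionary settings.
struct RegexpDictFlags
{
    bool case_insensitive = false; // regexp_dict_flag_case_insensitive
    bool dotall = false;           // regexp_dict_flag_dotall
};

// Placeholder bits standing in for HS_FLAG_CASELESS and HS_FLAG_DOTALL.
// Assumption: the values here are illustrative, not copied from Vectorscan.
const std::uint32_t SKETCH_FLAG_CASELESS = 1u << 0;
const std::uint32_t SKETCH_FLAG_DOTALL = 1u << 1;

// Plain struct mirroring the two RE2 options named in the commit message,
// initialised to RE2's defaults (case sensitive, '.' does not match '\n').
struct SketchRe2Options
{
    bool case_sensitive = true;
    bool dot_nl = false;
};

// Build the flag word that would be passed to the multi-pattern compiler.
inline std::uint32_t toVectorscanFlags(const RegexpDictFlags & flags)
{
    std::uint32_t result = 0;
    if (flags.case_insensitive)
        result |= SKETCH_FLAG_CASELESS;
    if (flags.dotall)
        result |= SKETCH_FLAG_DOTALL;
    return result;
}

// Build the per-pattern options for the sequential (RE2) path.
inline SketchRe2Options toRe2Options(const RegexpDictFlags & flags)
{
    SketchRe2Options options;
    options.case_sensitive = !flags.case_insensitive; // inverted sense
    options.dot_nl = flags.dotall;
    return options;
}
```

Deriving both sets of options from one flags value keeps the two engines
consistent, so a pattern should match the same way whether it is
classified as simple or complex.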
2023-09-06 11:28:53 -05:00
Embedded Remove PVS-Studio 2023-02-19 23:30:05 +01:00
tests Clean up GCC warning pragmas 2023-04-11 18:21:08 +00:00
CacheDictionary.cpp Merge remote-tracking branch 'origin/master' into ADQM-870 2023-07-10 13:19:21 +00:00
CacheDictionary.h Remove superfluous includes of logger_userful.h from headers 2023-04-10 17:59:30 +02:00
CacheDictionaryStorage.h Don't count unreserved bytes in Arenas as read_bytes 2023-04-13 12:43:24 +02:00
CacheDictionaryUpdateQueue.cpp ThreadPool metrics introspection 2023-03-29 10:46:59 +02:00
CacheDictionaryUpdateQueue.h review fixes 2023-01-12 15:51:04 +00:00
CassandraDictionarySource.cpp
CassandraDictionarySource.h
CassandraHelpers.cpp
CassandraHelpers.h
CassandraSource.cpp Better formatting for exception messages (#45449) 2023-01-24 00:13:58 +03:00
CassandraSource.h
ClickHouseDictionarySource.cpp Add USE NAMED COLLECTION access 2023-06-06 14:46:34 +02:00
ClickHouseDictionarySource.h refine table source for regexp tree dictionary 2023-05-09 20:17:54 +02:00
CMakeLists.txt Consistent file management in CMake 2023-08-21 11:45:08 +08:00
DictionaryFactory.cpp If a dictionary is created with a complex key, automatically choose the "complex key" layout variant. 2023-05-06 11:09:45 +08:00
DictionaryFactory.h Fixed a lowercase initial letter and removed needless data 2023-05-07 19:06:06 +08:00
DictionaryHelpers.cpp
DictionaryHelpers.h Reduce the usage of Arena.h 2023-04-13 10:31:32 +02:00
DictionarySource.cpp
DictionarySource.h
DictionarySourceFactory.cpp
DictionarySourceFactory.h
DictionarySourceHelpers.cpp
DictionarySourceHelpers.h
DictionaryStructure.cpp Better formatting for exception messages (#45449) 2023-01-24 00:13:58 +03:00
DictionaryStructure.h support IPv4 and IPv6 as dictionary attributes 2023-07-04 02:19:45 +00:00
DirectDictionary.cpp Simplification 2023-05-07 06:31:00 +02:00
DirectDictionary.h PullingAsyncPipelineExecutor for Direct dictionary with ClickHouse source 2023-03-27 09:52:26 +00:00
ExecutableDictionarySource.cpp Refactor 2023-08-18 15:38:46 +08:00
ExecutableDictionarySource.h Remove superfluous includes of logger_userful.h from headers 2023-04-10 17:59:30 +02:00
ExecutablePoolDictionarySource.cpp Refactor 2023-08-18 15:38:46 +08:00
ExecutablePoolDictionarySource.h Remove superfluous includes of logger_userful.h from headers 2023-04-10 17:59:30 +02:00
ExternalQueryBuilder.cpp fix build 2023-05-08 00:57:13 +02:00
ExternalQueryBuilder.h refine table source for regexp tree dictionary 2023-05-09 20:17:54 +02:00
FileDictionarySource.cpp
FileDictionarySource.h
FlatDictionary.cpp Always check that block has rows to fix wrong allocation in HashedArrayDictionary::updateData and others. 2023-09-05 09:57:13 +02:00
FlatDictionary.h
getDictionaryConfigurationFromAST.cpp Merge branch 'master' into master 2023-07-30 12:20:54 +08:00
getDictionaryConfigurationFromAST.h fix another issue with dependencies 2023-05-05 16:27:12 +02:00
HashedArrayDictionary.cpp Always check that block has rows to fix wrong allocation in HashedArrayDictionary::updateData and others. 2023-09-05 09:57:13 +02:00
HashedArrayDictionary.h replace domain IP types (IPv4, IPv6) with native 2022-11-14 14:17:17 +00:00
HashedDictionary.cpp Always check that block has rows to fix wrong allocation in HashedArrayDictionary::updateData and others. 2023-09-05 09:57:13 +02:00
HashedDictionary.h Wrap implementation helpers into HashedDictionaryImpl namespace 2023-05-19 06:07:21 +02:00
HashedDictionaryCollectionTraits.h Wrap implementation helpers into HashedDictionaryImpl namespace 2023-05-19 06:07:21 +02:00
HashedDictionaryCollectionType.h Remove part of the HashTableGrowerWithPrecalculationAndMaxLoadFactor comment 2023-05-19 06:07:21 +02:00
HierarchyDictionariesUtils.cpp
HierarchyDictionariesUtils.h
HTTPDictionarySource.cpp Merge branch 'master' into headers-blacklist 2023-07-03 16:54:50 +02:00
HTTPDictionarySource.h
ICacheDictionaryStorage.h
IDictionary.h Add dictGetAll function for RegExpTreeDictionary 2023-06-04 23:46:04 -05:00
IDictionarySource.h Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries 2023-01-18 20:18:37 +01:00
IPAddressDictionary.cpp Don't count unreserved bytes in Arenas as read_bytes 2023-04-13 12:43:24 +02:00
IPAddressDictionary.h Reduce the usage of Arena.h 2023-04-13 10:31:32 +02:00
LibraryDictionarySource.cpp
LibraryDictionarySource.h
MongoDBDictionarySource.cpp Update MongoDB protocol 2023-05-22 09:05:23 +00:00
MongoDBDictionarySource.h Update MongoDB protocol 2023-05-22 09:05:23 +00:00
MySQLDictionarySource.cpp Add USE NAMED COLLECTION access 2023-06-06 14:46:34 +02:00
MySQLDictionarySource.h
NullDictionarySource.cpp
NullDictionarySource.h
PolygonDictionary.cpp
PolygonDictionary.h
PolygonDictionaryImplementations.cpp
PolygonDictionaryImplementations.h
PolygonDictionaryUtils.cpp
PolygonDictionaryUtils.h Ditch tons of garbage 2023-08-09 02:19:02 +02:00
PostgreSQLDictionarySource.cpp Remove superfluous includes of logger_userful.h from headers 2023-04-10 17:59:30 +02:00
PostgreSQLDictionarySource.h Remove superfluous includes of logger_userful.h from headers 2023-04-10 17:59:30 +02:00
RangeHashedDictionary.h Always check that block has rows to fix wrong allocation in HashedArrayDictionary::updateData and others. 2023-09-05 09:57:13 +02:00
RangeHashedDictionaryComplex.cpp Fix typo 2023-01-24 23:00:02 +00:00
RangeHashedDictionarySimple.cpp Review suggestions 2023-01-24 22:54:01 +00:00
readInvalidateQuery.cpp Better formatting for exception messages (#45449) 2023-01-24 00:13:58 +03:00
readInvalidateQuery.h
RedisDictionarySource.cpp little fix 2023-06-02 10:05:54 +08:00
RedisDictionarySource.h unify storage type 2023-06-02 10:05:54 +08:00
RedisSource.cpp little fix 2023-06-02 10:05:54 +08:00
RedisSource.h new redis engine schema design 2023-06-02 10:05:54 +08:00
RegExpTreeDictionary.cpp Add case insensitive and dot-all modes to RegExpTree dictionary 2023-09-06 11:28:53 -05:00
RegExpTreeDictionary.h Add case insensitive and dot-all modes to RegExpTree dictionary 2023-09-06 11:28:53 -05:00
registerCacheDictionaries.cpp Better formatting for exception messages (#45449) 2023-01-24 00:13:58 +03:00
registerDictionaries.cpp
registerDictionaries.h
registerRangeHashedDictionary.cpp Fix MSan build 2023-01-03 02:21:26 +00:00
SSDCacheDictionaryStorage.h Check return value of ::close() 2023-02-07 11:28:22 +01:00
XDBCDictionarySource.cpp Remove PVS-Studio 2023-02-19 23:30:05 +01:00
XDBCDictionarySource.h
YAMLRegExpTreeDictionarySource.cpp Remove -Wshadow suppression which leaked into global namespace 2023-04-13 08:46:40 +00:00
YAMLRegExpTreeDictionarySource.h fix build 2023-01-04 12:45:12 +01:00