ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-13 01:41:59 +00:00

Author	SHA1	Message	Date
johanngan	bcb058f999	Add case insensitive and dot-all modes to RegExpTree dictionary The new per-dictionary settings control regex match semantics around case sensitivity and the '.' wildcard with newlines. They must be set at the dictionary level since they're applied to regex engines at pattern-compile-time. - regexp_dict_flag_case_insensitive: case insensitive matching - regexp_dict_flag_dotall: '.' matches all characters including newlines They correspond to HS_FLAG_CASELESS and HS_FLAG_DOTALL in Vectorscan and case_sensitive and dot_nl in RE2. These are the most useful options compatible with the internal behavior of RegExpTreeDictionary around splitting up simple and complex patterns between Vectorscan and RE2. The alternative is to use (?i) and/or (?s) for all patterns. However, (?s) isn't handled properly by OptimizedRegularExpression::analyze(). And while (?i) is, it still causes the dictionary to treat the pattern as "complex" for sequential scanning with RE2 rather than multi-matching with Vectorscan, even though Vectorscan supports case insensitive literal matching. Setting dictionary-wide flags is both more convenient, and circumvents these problems.	2023-09-06 11:28:53 -05:00
Vitaly Baranov	3b58c5baa6	Always check that block has rows to fix wrong allocation in HashedArrayDictionary::updateData and others.	2023-09-05 09:57:13 +02:00
Sergei Trifonov	802579f3f1	Merge pull request #49618 from ClickHouse/concurrency-control-controllable Make concurrency control controllable	2023-08-29 19:44:51 +02:00
Raúl Marín	93dac0c880	Support clang-18 (Wmissing-field-initializers)	2023-08-23 15:53:45 +02:00
Amos Bird	076a67bdaa	Consistent file management in CMake	2023-08-21 11:45:08 +08:00
Amos Bird	c43bf153f5	Refactor	2023-08-18 15:38:46 +08:00
Amos Bird	dd0c71b32a	Add error_exit_reaction	2023-08-18 15:38:46 +08:00
Amos Bird	476f3cedc1	Various reactions when executable stderr has data	2023-08-18 15:38:45 +08:00
Sergei Trifonov	771710b377	Merge branch 'master' into concurrency-control-controllable	2023-08-11 16:50:13 +02:00
Alexey Milovidov	aa757490bd	Ditch tons of garbage	2023-08-09 02:19:02 +02:00
Han Fei	65dcd79eb0	fix mem leak in RegExpTreeDictionary	2023-08-08 14:58:18 +02:00
Sergei Trifonov	01196ac41f	Merge branch 'master' into concurrency-control-controllable	2023-08-01 15:40:50 +02:00
xiebin	33e2cfcecb	Merge branch 'master' into master	2023-07-30 12:20:54 +08:00
Yakov Olkhovskiy	9a1c59a2f1	Merge branch 'master' into fix-ip-dict	2023-07-26 12:08:49 -04:00
Alexey Milovidov	21382afa2b	Check for punctuation	2023-07-25 06:10:04 +02:00
Nikita Mikhaylov	ee0bbc0e54	Merge branch 'master' into headers-blacklist	2023-07-17 19:08:52 +02:00
Yakov Olkhovskiy	e95d413d9a	Merge branch 'master' into fix-ip-dict	2023-07-14 09:11:42 -04:00
Dmitry Kardymon	385a210fee	Merge remote-tracking branch 'origin/master' into ADQM-870	2023-07-10 13:19:21 +00:00
robot-clickhouse	1343e5cc45	Merge pull request #51853 from kitaisreal/cache-dictionary-request-only-unique-keys-from-source CacheDictionary request only unique keys from source	2023-07-08 20:58:16 +02:00
Maksim Kita	8266067e1a	Fixed style check	2023-07-07 19:09:55 +03:00
Maksim Kita	23bd23802f	CacheDictionary request only unique keys from source	2023-07-07 12:26:15 +03:00
Nikolay Degterinsky	e98d136243	Merge branch 'master' into headers-blacklist	2023-07-07 04:44:06 +02:00
Kseniia Sumarokova	e97e107bcc	Merge branch 'master' into add-separate-access-for-use-named-collections	2023-07-06 12:16:53 +02:00
Alexey Milovidov	2c96580a77	Merge branch 'master' into concurrency-control-controllable	2023-07-04 23:16:04 +03:00
Dmitry Kardymon	ab4142eb8f	Merge remote-tracking branch 'clickhouse/master' into ADQM-870	2023-07-04 08:23:31 +03:00
Yakov Olkhovskiy	0529772dd8	support IPv4 and IPv6 as dictionary attributes	2023-07-04 02:19:45 +00:00
Nikolay Degterinsky	82e0237e67	Merge branch 'master' into headers-blacklist	2023-07-03 16:54:50 +02:00
kssenii	ac77f5fe6f	Merge remote-tracking branch 'upstream/master' into add-separate-access-for-use-named-collections	2023-07-03 13:55:45 +02:00
Robert Schulze	fe49e98455	Follow-up to re2 update 2023-06-02 (#50949)	2023-07-03 08:28:25 +00:00
Nikolay Degterinsky	8dfa773f44	Merge branch 'master' into headers-blacklist	2023-06-30 23:40:17 +02:00
Sema Checherinda	d0d12bbf3b	Merge branch 'master' into no-finalize-WriteBufferFromOStream	2023-06-30 12:15:17 +02:00
Robert Schulze	6872084051	Merge pull request #50949 from georgthegreat/update-re2 Update contrib/re2 to 2023-06-02	2023-06-30 10:40:17 +02:00
Sema Checherinda	2a1f34e3f9	Merge branch 'master' into no-finalize-WriteBufferFromOStream	2023-06-30 08:01:05 +02:00
Igor Nikonov	56354b7251	Fix yet another place	2023-06-28 16:55:22 +00:00
Igor Nikonov	0b19c1832a	Fix: detach from thread group	2023-06-28 14:15:03 +00:00
Sema Checherinda	fe97021929	add missing finalize calls in buffers	2023-06-27 16:54:14 +02:00
Yuriy Chernyshov	3e6654a1fe	Merge branch 'master' into update-re2	2023-06-24 22:34:44 +02:00
Nikita Taranov	fb7d23f245	fix build	2023-06-22 23:54:25 +02:00
Anton Kozlov	0c440b9d6f	Report loading status for executable dictionaries correctly	2023-06-22 10:28:13 +00:00
Nikolay Degterinsky	575a1a4907	Add header checks to HTTP dictionary source	2023-06-20 13:29:25 +00:00
Dmitry Kardymon	806176d88e	Add input_format_csv_missing_as_default setting and tests	2023-06-15 11:23:08 +00:00
kssenii	25ae93bbf8	Merge remote-tracking branch 'upstream/master' into add-separate-access-for-use-named-collections	2023-06-14 13:33:56 +02:00
JackyWoo	a1641aa25d	Merge branch 'master' into support_redis	2023-06-12 09:53:06 +08:00
Nikolay Degterinsky	9ad8e022a8	Merge branch 'master' into update-mongo	2023-06-10 10:58:02 +02:00
pufit	55d228e78e	Merge branch 'master' into support_redis	2023-06-09 11:45:12 -04:00
kssenii	63f8a3275b	Merge remote-tracking branch 'upstream/master' into add-separate-access-for-use-named-collections	2023-06-09 14:32:41 +02:00
johanngan	be8e048799	Revert invalid RegExpTreeDictionary optimization This reverts the following commits: - `e77dd81036` - `e8527e720b` Additionally, functional tests are added. When scanning complex regexp nodes sequentially with RE2, the old code has an optimization to break out of the loop early upon finding a leaf node that matches. This is an invalid optimization because there's no guarantee that it's actually a VALID match, because its parents might NOT have matched. Semantically, a user would expect this match to be discarded and for the search to continue. Instead, since we skipped matching after the first false positive, subsequent nodes that would have matched are missing from the output value. This affects both dictGet and dictGetAll. It's difficult to distinguish a true positive from a false positive while looping through complex_regexp_nodes because we would have to scan all the parents of a matching node to confirm a true positive. Trying to do this might actually end up being slower than just scanning every complex regexp node, because complex_regexp_nodes is only a subset of all the tree nodes; we may end up duplicating work with scanning that Vectorscan has already done, depending on whether the parent nodes are "simple" or "complex". So instead of trying to fix this optimization, just remove it entirely.	2023-06-06 16:28:44 -05:00
kssenii	adfedb4df0	Add USE NAMED COLLECTION access	2023-06-06 14:46:34 +02:00
johanngan	c0f162c5b6	Add dictGetAll function for RegExpTreeDictionary This function outputs an array of attribute values from all regexp nodes that matched in a regexp tree dictionary. An optional final argument can be passed to limit the array size.	2023-06-04 23:46:04 -05:00
JackyWoo	e6d1b3c351	little fix	2023-06-02 10:05:54 +08:00
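
Note on commit bcb058f999: the message says the two new settings are applied when patterns are compiled, which is why they are set per dictionary and not per pattern. The sketch below illustrates only the matching semantics, using Python's `re` module as a stand-in for Vectorscan and RE2 (an assumption made for illustration; this is not ClickHouse code). The names `PATTERNS`, `compile_all`, and `matches` are hypothetical.

```python
import re

# Hypothetical patterns standing in for the regexp nodes of a dictionary.
PATTERNS = [r"clickhouse", r"start.end"]


def compile_all(patterns, case_insensitive=False, dotall=False):
    """Compile every pattern with the same dictionary-wide flags.

    The flags are fixed at compile time, mirroring the commit's point
    that the settings must be chosen at the dictionary level.
    """
    flags = 0
    if case_insensitive:
        flags |= re.IGNORECASE  # analogous in spirit to HS_FLAG_CASELESS
    if dotall:
        flags |= re.DOTALL      # analogous in spirit to HS_FLAG_DOTALL
    return [re.compile(p, flags) for p in patterns]


def matches(compiled, text):
    """Return the source patterns that match anywhere in text."""
    return [rx.pattern for rx in compiled if rx.search(text)]


text = "ClickHouse start\nend"
default = compile_all(PATTERNS)
flagged = compile_all(PATTERNS, case_insensitive=True, dotall=True)

print(matches(default, text))  # neither pattern matches without the flags
print(matches(flagged, text))  # both patterns match with the flags set
```

With the flags off, the capital letters and the newline each prevent a match; with both flags on, the same patterns match without adding `(?i)` or `(?s)` to every pattern.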


1472 Commits