ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-26 09:32:01 +00:00

Author	SHA1	Message	Date
Nikita Mikhaylov	82ba97c3a7	More explicit template instantiations (#60730)	2024-03-07 17:16:13 +01:00
Robert Schulze	7b378dbad3	Remove broken lockless variant of re2	2023-09-14 16:40:42 +00:00
Alexey Milovidov	5561e3e198	Remove garbage and speed up Debug and Tidy builds	2023-08-09 01:44:39 +02:00
Robert Schulze	fe49e98455	Follow-up to re2 update 2023-06-02 (#50949)	2023-07-03 08:28:25 +00:00
Alexander Tokmakov	70d1adfe4b	Better formatting for exception messages (#45449) * save format string for NetException * format exceptions * format exceptions 2 * format exceptions 3 * format exceptions 4 * format exceptions 5 * format exceptions 6 * fix * format exceptions 7 * format exceptions 8 * Update MergeTreeIndexGin.cpp * Update AggregateFunctionMap.cpp * Update AggregateFunctionMap.cpp * fix	2023-01-24 00:13:58 +03:00
Azat Khuzhin	4e76629aaf	Fixes for -Wshorten-64-to-32 - lots of static_cast - add safe_cast - types adjustments - config - IStorage::read/watch - ... - some TODO's (to convert types in future) P.S. That was quite a journey... v2: fixes after rebase v3: fix conflicts after #42308 merged Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-10-21 13:25:19 +02:00
Robert Schulze	77e64935e1	Reduce some usage of StringRef	2022-08-19 09:56:59 +00:00
Robert Schulze	ad12adc31c	Measure and rework internal re2 caching. This commit is based on local benchmarks of ClickHouse's re2 caching. Question 1: ----------------------------------------------------------- Is pattern caching useful for queries with const LIKE/REGEX patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T; The short answer is: no. Runtime is (unsurprisingly) dominated by pattern evaluation + other stuff going on in queries, but definitely not pattern compilation. For space reasons, I omit details of the local experiments. (Side note: the current caching scheme is unbounded in size, which poses a DoS risk (think of multi-tenancy). This risk is more pronounced when unbounded caching is used with non-const patterns ..., see next question) Question 2: ----------------------------------------------------------- Is pattern caching useful for queries with non-const LIKE/REGEX patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T; I benchmarked five caching strategies: 1. no caching as a baseline (= recompile for each row) 2. unbounded cache (= threadsafe global hash-map) 3. LRU cache (= threadsafe global hash-map + LRU queue) 4. lightweight local cache 1 (= not threadsafe local hashmap with collision list which grows to a certain size (here: 10 elements) and afterwards never changes) 5. lightweight local cache 2 (= not threadsafe local hashmap without collision list in which a collision replaces the stored element, idea by Alexey) ... using a haystack of 2 million strings and A) 2 million distinct simple patterns, B) 10 simple patterns, C) 2 million distinct complex patterns, D) 10 complex patterns. For A) and C), caching does not help, but these queries still allow judging the static overhead of caching on query runtimes. B) and D) are extreme but common cases in practice. They include queries like "SELECT ... WHERE LIKE(col_haystack, flag ? '%pattern1%' : '%pattern2%')", where caching should help significantly. 
Because LIKE patterns are internally translated to re2 expressions, I show only measurements for MATCH queries. Results in sec, averaged over multiple measurements: 1.A): 2.12 B): 1.68 C): 9.75 D): 9.45 2.A): 2.17 B): 1.73 C): 9.78 D): 9.47 3.A): 9.8 B): 0.63 C): 31.8 D): 0.98 4.A): 2.14 B): 0.29 C): 9.82 D): 0.41 5.A): 2.12 / 2.15 / 2.26 B): 1.51 / 0.43 / 0.30 C): 9.97 / 9.88 / 10.13 D): 5.70 / 0.42 / 0.43 (10/100/1000 buckets, resp. 10/1/0.1% collision rate) Evaluation: 1. This is the baseline. It was surprising that complex patterns (C, D) slow down the queries so badly compared to simple patterns (A, B). The runtime includes evaluation costs, but as caching only helps with compilation, and looking at 4.D and 5.D, compilation makes up over 90% of the runtime! 2. No speedup compared to 1, probably due to locking overhead. The cache is unbounded, and in experiments with data sets > 2 million rows, 2. is the only scheme to throw OOM exceptions, which is not acceptable. 3. Unique patterns (A and C) lead to thrashing of the LRU cache and very bad runtimes due to LRU queue maintenance and locking. It works pretty well, however, with few distinct patterns (B and D). 4. This scheme is tailored to queries B and D, where it performs pretty well. More importantly, the caching is lightweight enough not to deteriorate performance on datasets A and C. 5. After some tuning of the hash map size, 100 buckets seem optimal to be in the same ballpark as 4. with 10 distinct patterns. Performance also does not deteriorate on A and C compared to the baseline. Unlike 4., this scheme behaves LRU-like and can adjust to changing pattern distributions. In conclusion, this commit implements two things: 1. Based on Q1, pattern search with const needles no longer uses caching. This applies to LIKE and MATCH + a few (exotic) other SQL functions. The code for the unbounded caching was removed. 2. Based on Q2, pattern search with non-const needles now uses method 5.	2022-05-30 20:00:35 +02:00
Robert Schulze	7232f47c68	Fix Bug 37114 - ilike on FixedString(N) columns produces wrong results. The main fix is in MatchImpl.h, where the "case_insensitive" parameter is added to Regexps::get(). Also made "case_insensitive" a non-default template parameter to reduce the risk of future bugs. The remainder of this commit consists of minor, unrelated code improvements. resolves #37114	2022-05-11 14:30:21 +02:00
Maksim Kita	538f8cbaad	Fix clang-tidy warnings in Disks, Formats, Functions folders	2022-03-14 18:17:35 +00:00
Maksim Kita	c2407fee06	Fixed tests	2021-09-30 14:35:24 +03:00
Pavel Kruglov	70b51133c1	Try to simplify code	2021-08-09 18:01:08 +03:00
Pavel Kruglov	0662df8b76	Fix performance with JIT, add arguments to function isSuitableForShortCircuitArgumentsExecution	2021-08-09 17:54:14 +03:00
Pavel Kruglov	e792fa588f	Mark all Functions as suitable or not for executing as short circuit arguments	2021-08-09 17:50:09 +03:00
Vasily Nemkov	a1fb16df52	setting regexp_max_matches_per_row instead of 3rd argument	2021-07-30 12:28:21 +03:00
Vasily Nemkov	ec77ba8bfc	Updated extractAllGroupsHorizontal - flexible limit on number of matches per row. If it is not set via the third argument, it defaults to the previously hardcoded value of 1000.	2021-07-29 15:36:55 +03:00
Nikolai Kochetov	dbaa6ffc62	Rename ContextConstPtr to ContextPtr.	2021-06-01 15:20:52 +03:00
Alexander Kuzmenkov	3f57fc085b	remove mutable context references from functions interface. Also remove it from some visitors.	2021-05-28 19:45:37 +03:00
Maksim Kita	d923d9e6ef	Function move file	2021-05-17 10:30:42 +03:00
Maksim Kita	947f28d430	IFunction refactoring	2021-05-15 20:33:15 +03:00
alexey-milovidov	6a2a9cecdd	Update extractAllGroups.h	2021-04-14 01:24:46 +03:00
Vasily Nemkov	77bdb5b391	Fixed erroneous failure of extractAllGroupsHorizontal on large columns	2021-04-14 00:17:06 +03:00
Ivan	495c6e03aa	Replace all Context references with std::weak_ptr (#22297) * Replace all Context references with std::weak_ptr * Fix shared context captured by value * Fix build * Fix Context with named sessions * Fix copy context * Fix gcc build * Merge with master and fix build * Fix gcc-9 build	2021-04-11 02:33:54 +03:00
Alexey Milovidov	e38ff3517d	Fail fast in incorrect usage of extractAllGroups	2021-01-22 02:48:26 +03:00
Ivan Lezhankin	f897f7c93f	Refactor IFunction to execute with const arguments	2020-11-17 16:24:45 +03:00
Nikolai Kochetov	295e612343	Fix build and tests.	2020-10-20 16:11:57 +03:00
Nikolai Kochetov	ce2f6a0560	Part 4.	2020-10-18 00:41:50 +03:00
Nikolai Kochetov	959424f28a	Rename block to columns.	2020-10-14 17:04:50 +03:00
Nikolai Kochetov	966b1d6cf5	Rename Block to ColumnsWithTypeAndName.	2020-10-14 16:09:11 +03:00
Nikolai Kochetov	9b42bfdc36	Merge pull request #15817 from ClickHouse/new-block-for-functions-2 Use `ColumnsWithTypeAndName` instead of `Block` for function calls [part 2]	2020-10-12 00:32:58 +03:00
Alexey Milovidov	269b6383f5	Check for #pragma once in headers	2020-10-10 21:37:02 +03:00
Nikolai Kochetov	d28325a353	Replace getByPosition to []	2020-10-10 21:24:57 +03:00
Alexey Milovidov	c37b55c3b1	Fix error in "extractAllGroups" function	2020-09-17 00:19:58 +03:00
Alexey Milovidov	edd89a8610	Fix half of typos	2020-08-08 03:47:03 +03:00
Nikolai Kochetov	e4689ce302	Make IFunction::executeImpl const.	2020-07-21 16:58:07 +03:00
Alexey Milovidov	9dc43fc435	Fix race condition in extractAllGroups	2020-06-25 19:57:30 +03:00
Alexey Milovidov	9901e4d528	Remove debug output #11554	2020-06-13 20:20:54 +03:00
Alexey Milovidov	787163d0b4	Minor modifications after merging #11554	2020-06-12 17:03:00 +03:00
Vasily Nemkov	50a184acac	extractAllGroupsHorizontal and extractAllGroupsVertical. Split tests, fixed some error messages. Fixed test and error reporting of extractGroups	2020-06-11 11:03:17 +03:00
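
Strategy 5 from commit ad12adc31c (the one it adopts for non-const needles) amounts to a small direct-mapped cache: each pattern hashes to exactly one bucket, and a collision simply overwrites whatever is stored there. The following is a minimal sketch under stated assumptions, not ClickHouse's code: `std::regex` stands in for re2, `LocalPatternCache` and `getOrCompile` are hypothetical names, and the default of 100 buckets only mirrors the bucket count the commit's benchmark favored.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <utility>

// Hypothetical sketch of caching strategy 5: a local, NOT thread-safe,
// direct-mapped cache of compiled patterns. No collision list is kept:
// a pattern that lands in an occupied bucket evicts the previous entry,
// so recently used patterns tend to stay cached (LRU-like behavior)
// without any queue maintenance or locking.
// Assumption: std::regex replaces re2 for illustration.
template <std::size_t NumBuckets = 100>
class LocalPatternCache
{
public:
    // Returns the compiled regex for `pattern`, compiling only on a miss.
    // The reference stays valid until this bucket is overwritten.
    const std::regex & getOrCompile(const std::string & pattern)
    {
        Bucket & bucket = buckets[std::hash<std::string>{}(pattern) % NumBuckets];
        if (!bucket.regex || bucket.pattern != pattern)
        {
            // Miss or collision: compile first so a throwing constructor
            // leaves the bucket untouched, then replace the stored element.
            auto compiled = std::make_unique<std::regex>(pattern);
            std::string key = pattern;
            bucket.regex = std::move(compiled);
            bucket.pattern = std::move(key);
            ++compilations;
        }
        return *bucket.regex;
    }

    std::size_t compilationCount() const { return compilations; }

private:
    struct Bucket
    {
        std::string pattern;
        std::unique_ptr<std::regex> regex;
    };

    std::array<Bucket, NumBuckets> buckets{};
    std::size_t compilations = 0;
};
```

Compared with strategy 4, this design needs no collision list and keeps adapting when the pattern distribution shifts; the cost is a recompilation whenever two hot patterns share a bucket, which is why the bucket count needed tuning.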

39 Commits
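
Commit 7232f47c68 fixed `ilike` on `FixedString(N)` by passing `case_insensitive` to `Regexps::get()` and made that template parameter non-default. The sketch below illustrates why removing the default helps. It is hypothetical: `getRegexpWithDefault` and `getRegexp` are illustrative names, not ClickHouse's actual signature, and `std::regex` replaces re2. With a defaulted flag, a call site that forgets the argument silently gets case-sensitive matching; with no default, the omission fails to compile.

```cpp
#include <regex>
#include <string>

// Hypothetical "before": the flag has a default, so a caller that forgets it
// still compiles and silently gets a case-sensitive regex.
template <bool case_insensitive = false>
std::regex getRegexpWithDefault(const std::string & pattern)
{
    auto flags = std::regex::ECMAScript;
    if constexpr (case_insensitive)
        flags |= std::regex::icase;
    return std::regex(pattern, flags);
}

// Hypothetical "after": no default, so every call site must state its intent.
// A call like getRegexp("hello") cannot deduce the flag and is a compile error.
template <bool case_insensitive>
std::regex getRegexp(const std::string & pattern)
{
    return getRegexpWithDefault<case_insensitive>(pattern);
}
```

The sketch needs C++17 for `if constexpr`. The design choice trades a little verbosity at each call site for turning a silent wrong-result bug into a build failure.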