ClickHouse

Mirror of https://github.com/ClickHouse/ClickHouse.git, synced 2024-12-15 10:52:30 +00:00

Author	SHA1	Message	Date
Igor Nikonov	bf7dd39282	Fix: decimal rounding Fixes #37531	2022-06-14 18:03:05 +00:00
Maksim Kita	dc2e117cce	UnaryLogicalFunctions improve performance using dynamic dispatch	2022-06-14 17:30:11 +02:00
Robert Schulze	5f5732a2c4	Merge pull request #37969 from ClickHouse/consistent-macro-usage More consistent use of platform macros	2022-06-10 14:10:01 +02:00
Robert Schulze	1a0b5f33b3	More consistent use of platform macros. cmake/target.cmake defines macros for the supported platforms; this commit changes predefined system macros to our own macros: __linux__ --> OS_LINUX, __APPLE__ --> OS_DARWIN, __FreeBSD__ --> OS_FREEBSD	2022-06-10 10:22:31 +02:00
Maksim Kita	0c1211eb61	Merge pull request #37930 from kitaisreal/function-dict-get-check-arguments-size Function dictGet check arguments size	2022-06-08 23:25:14 +02:00
Maksim Kita	b7152fa2bf	Function dictGet check arguments size	2022-06-08 17:19:30 +02:00
Maksim Kita	7d1a43cfeb	Fix setting cast_ipv4_ipv6_default_on_conversion_error for internal cast	2022-06-08 12:43:39 +02:00
Maksim Kita	4e160105b9	Merge pull request #37805 from kitaisreal/dictionaries-hierarchy-nullable-key-support Hierarchical dictionaries support nullable parent key	2022-06-08 12:36:09 +02:00
Anton Popov	df6882d2b9	Revert "Fix errors of CheckTriviallyCopyableMove type"	2022-06-07 13:53:10 +02:00
mergify[bot]	014d9e2144	Merge branch 'master' into fix-nothing-error	2022-06-07 11:24:28 +00:00
avogar	cbd50aecd4	Fix	2022-06-07 11:23:59 +00:00
Vitaly Baranov	d199478169	Merge pull request #37303 from ClickHouse/fix_trash Try to fix some trash	2022-06-07 10:17:39 +02:00
Robert Schulze	2d87af2a15	Merge pull request #37647 from DevTeamBK/Fix-all-CheckTriviallyCopyableMove-Errors Fix errors of CheckTriviallyCopyableMove type	2022-06-05 19:58:47 +02:00
Maksim Kita	6db5c08fde	Functions dictGetChildren, dictGetDescendants added support for nullable parent key	2022-06-03 17:36:16 +02:00
Maksim Kita	a0cbbd9edc	Hierarchical Cache, Direct dictionaries added support for nullable parent key	2022-06-03 17:21:55 +02:00
Anton Popov	f592a802a1	Merge pull request #37482 from CurtizJ/json-new-serialization Better binary serialization of `ColumnObject`	2022-06-03 13:29:19 +02:00
Robert Schulze	05f08357a9	Merge pull request #37764 from ClickHouse/like_with_trailing_backslash Disallow LIKE patterns with trailing escape	2022-06-03 13:19:51 +02:00
Alexey Milovidov	1529d47207	Merge pull request #34754 from ClickHouse/llvm-14 Switch to clang/llvm 14	2022-06-03 14:07:34 +03:00
Alexey Milovidov	de16784832	Merge pull request #37633 from ClickHouse/dump-column-structure-more-precise More precise result of the `dumpColumnStructure` and `byteSize` miscellaneous functions	2022-06-03 14:05:20 +03:00
Alexey Milovidov	ea89f81a78	Merge branch 'master' of github.com:ClickHouse/ClickHouse into llvm-14	2022-06-03 03:07:14 +02:00
Robert Schulze	657662d89f	Minor follow-up to cache table: std::{vector-->array}	2022-06-02 20:18:10 +02:00
Maksim Kita	20b55a45b2	Hierarchical dictionaries support nullable parent key	2022-06-02 19:24:23 +02:00
HeenaBansal2009	e3080f2a97	Merge remote-tracking branch 'origin' into Fix-all-CheckTriviallyCopyableMove-Errors	2022-06-02 07:30:08 -07:00
Alexander Gololobov	b34782dc6a	Merge pull request #37775 from liuneng1994/fix_date32_to_string fix toString error on DatatypeDate32	2022-06-02 16:40:47 +03:00
Vladimir C	670c721ded	Merge pull request #37742 from ucasfl/hashid	2022-06-02 12:47:11 +02:00
Robert Schulze	4e18659bfd	Fix tests + more precise exception msg	2022-06-02 11:11:56 +02:00
liuneng1994	7b15055e72	fix toString error on DatatypeDate32 Signed-off-by: liuneng1994 <1398775315@qq.com>	2022-06-02 16:56:43 +08:00
Alexey Milovidov	b5f48a7d3f	Merge branch 'master' of github.com:ClickHouse/ClickHouse into llvm-14	2022-06-01 22:09:58 +02:00
Robert Schulze	366f368d06	Disallow LIKE patterns with trailing escape Trailing escape ('ab\') is disallowed in SQL, in standardese: "If an escape character is specified, then [...] If there is not a partitioning of the string PVC into substrings such that each substring has length 1 (one) or 2, no substring of length 1 (one) is the escape character ECV, and each substring of length 2 is the escape character ECV followed by either the escape character ECV, an <underscore> character, or the <percent> character, then an exception condition is raised: data exception - invalid escape sequence." I first thought this is checked already higher up in the stack, at least for const needles, as single trailing backslashes ('ab\') are rejected, but then I realized that ClickHouse quotes by default. I.e., double trailing backslashes ('ab\\') are not rejected but when interpreted as LIKE needle ('ab\') they should.	2022-06-01 21:38:46 +02:00
Robert Schulze	b3b0716b32	Merge pull request #37544 from ClickHouse/cached_patterns Cache compiled regexps when evaluating non-const needles	2022-06-01 19:55:25 +02:00
avogar	966b864986	Fix possible logical error with type Nothing and JSON functions	2022-06-01 16:34:31 +00:00
flynn	b62e4cec65	Fix crash of FunctionHashID	2022-06-01 12:39:16 +00:00
Alexander Tokmakov	75f49a48e1	Merge branch 'master' into fix_trash	2022-06-01 14:20:46 +02:00
Robert Schulze	600512cc08	Replace exceptions thrown for programming errors by asserts	2022-06-01 11:53:37 +02:00
Anton Popov	20e319d67a	Merge pull request #37666 from CurtizJ/optimize-coalesce Optimize function `COALESCE` with two arguments	2022-05-31 23:48:13 +02:00
Yakov Olkhovskiy	873ac9f8ff	Merge pull request #37540 from ClickHouse/feature-server-certificate showCertificate function implementation	2022-05-31 02:50:03 -04:00
Anton Popov	30f8eb800a	optimize function coalesce with two arguments	2022-05-30 22:29:35 +00:00
Nikolai Kochetov	77b07dd0a8	Merge pull request #37163 from ClickHouse/grouping-function Add GROUPING function	2022-05-30 20:45:04 +02:00
HeenaBansal2009	b7eb6bbd38	Fixed clang-tidy-CheckTriviallyCopyableMove-errors	2022-05-30 11:09:03 -07:00
Robert Schulze	ad12adc31c	Measure and rework internal re2 caching

This commit is based on local benchmarks of ClickHouse's re2 caching.

Question 1:
-----------------------------------------------------------
Is pattern caching useful for queries with const LIKE/REGEX patterns? E.g. SELECT LIKE(col_haystack, '%HelloWorld') FROM T;

The short answer is: no. Runtime is (unsurprisingly) dominated by pattern evaluation + other stuff going on in queries, but definitely not by pattern compilation. For space reasons, I omit details of the local experiments.

(Side note: the current caching scheme is unbounded in size, which poses a DoS risk (think of multi-tenancy). This risk is more pronounced when unbounded caching is used with non-const patterns ..., see next question.)

Question 2:
-----------------------------------------------------------
Is pattern caching useful for queries with non-const LIKE/REGEX patterns? E.g. SELECT LIKE(col_haystack, col_needle) FROM T;

I benchmarked five caching strategies:
1. no caching as a baseline (= recompile for each row)
2. unbounded cache (= threadsafe global hash-map)
3. LRU cache (= threadsafe global hash-map + LRU queue)
4. lightweight local cache 1 (= not threadsafe local hashmap with collision list which grows to a certain size (here: 10 elements) and afterwards never changes)
5. lightweight local cache 2 (= not threadsafe local hashmap without collision list in which a collision replaces the stored element, idea by Alexey)

... using a haystack of 2 mio strings and
A) 2 mio distinct simple patterns
B) 10 simple patterns
C) 2 mio distinct complex patterns
D) 10 complex patterns

For A) and C), caching does not help, but these queries still allow judging the static overhead of caching on query runtimes.

B) and D) are extreme but common cases in practice. They include queries like "SELECT ... WHERE LIKE (col_haystack, flag ? '%pattern1%' : '%pattern2%')". Caching should help significantly.

Because LIKE patterns are internally translated to re2 expressions, I show only measurements for MATCH queries.

Results in sec, averaged over multiple measurements:

Strategy                        A)      B)      C)      D)
1.                              2.12    1.68    9.75    9.45
2.                              2.17    1.73    9.78    9.47
3.                              9.8     0.63    31.8    0.98
4.                              2.14    0.29    9.82    0.41
5. (10 buckets, 10% coll.)      2.12    1.51    9.97    5.70
5. (100 buckets, 1% coll.)      2.15    0.43    9.88    0.42
5. (1000 buckets, 0.1% coll.)   2.26    0.30    10.13   0.43

Evaluation:
1. This is the baseline. I was surprised that complex patterns (C, D) slow down the queries so badly compared to simple patterns (A, B). The runtime includes evaluation costs, but as caching only helps with compilation, and looking at 4.D and 5.D, compilation makes up over 90% of the runtime!
2. No speedup compared to 1, probably due to locking overhead. The cache is unbounded, and in experiments with data sets > 2 mio rows, 2. is the only scheme to throw OOM exceptions, which is not acceptable.
3. Unique patterns (A and C) lead to thrashing of the LRU cache and very bad runtimes due to LRU queue maintenance and locking. It works pretty well, however, with few distinct patterns (B and D).
4. This scheme is tailored to queries B and D, where it performs pretty well. More importantly, the caching is lightweight enough not to deteriorate performance on datasets A and C.
5. After some tuning of the hash map size, 100 buckets seem optimal to be in the same ballpark as 4. with 10 distinct patterns. Performance also does not deteriorate on A and C compared to the baseline. Unlike 4., this scheme behaves LRU-like and can adjust to changing pattern distributions.

As a conclusion, this commit implements two things:
1. Based on Q1, pattern search with const needle no longer uses caching. This applies to LIKE and MATCH + a few (exotic) other SQL functions. The code for the unbounded caching was removed.
2. Based on Q2, pattern search with non-const needles now uses method 5.	2022-05-30 20:00:35 +02:00
Alexey Milovidov	f1fb57c6ce	Fix clang-tidy-14	2022-05-30 05:36:26 +02:00
Alexey Milovidov	c0e6ff4216	More precise result of "dumpColumnStructure" and "byteSize" miscellaneous functions	2022-05-30 04:56:54 +02:00
Alexey Milovidov	c1169019d2	Merge branch 'master' into llvm-14	2022-05-29 02:29:02 +02:00
Alexey Milovidov	73e2e63414	Merge pull request #37612 from ClickHouse/clang-tidy-14 Fix clang-tidy-14, part 1	2022-05-29 03:16:32 +03:00
Alexander Tokmakov	4e52f45695	Merge branch 'master' into fix_trash	2022-05-28 19:43:19 +02:00
Alexey Milovidov	c50791dd3b	Fix clang-tidy-14, part 1	2022-05-27 22:52:14 +02:00
Alexey Milovidov	d2c6fd90cb	Fix clang-tidy-14, part 1	2022-05-27 22:51:37 +02:00
Alexander Gololobov	9b1b30855c	Fixed check for HUGE_VAL	2022-05-27 18:25:11 +02:00
Alexander Gololobov	6361c5f38c	Fix for failed style check	2022-05-27 18:22:16 +02:00
Alexander Gololobov	540353566c	Added LpNorm and LpDistance functions for arrays	2022-05-27 17:17:08 +02:00
Robert Schulze	80061aa3e2	Merge remote-tracking branch 'origin/master' into cached_patterns	2022-05-27 09:21:01 +02:00
Alexey Milovidov	86afa3a245	Merge pull request #37502 from ClickHouse/array_norm_dist_fixes Renamed arrayXXNorm/arrayXXDistance functions to XXNorm/XXDistance and fixed some overflow cases	2022-05-27 00:56:29 +03:00
mergify[bot]	a7629f900f	Merge branch 'master' into normalize-utf8-performance-tests-fix	2022-05-26 10:29:55 +00:00
Maksim Kita	3a92e61827	Merge pull request #37148 from kitaisreal/dictionary-get-descendants-performance-improvement Dictionary getDescendants performance improvement	2022-05-26 12:29:17 +02:00
Yakov Olkhovskiy	2dc160a4c3	style fix	2022-05-25 20:56:36 -04:00
Dmitry Novik	7cd7782e4f	Process columns more efficiently in GROUPING()	2022-05-25 21:55:41 +00:00
Dmitry Novik	3c1b6609ae	Add comments and make tests more verbose	2022-05-25 21:23:35 +00:00
Maksim Kita	58cd1bd3ec	Merge pull request #36843 from bharatnc/ncb/h3-unidirectionaledges-funcs add h3 unidirectional edge functions	2022-05-25 22:46:40 +02:00
Maksim Kita	bee3c30f66	Merge pull request #37524 from kitaisreal/geo-distance-functions-improve-performance Geo distance functions improve performance	2022-05-25 22:40:40 +02:00
Alexander Gololobov	168b47d0ad	Use same norm and distance function names for tuples and arrays	2022-05-25 22:39:59 +02:00
Alexander Gololobov	b065839f44	always return Float64	2022-05-25 22:27:00 +02:00
Alexander Gololobov	5df14cd956	Cast arguments to result type to avoid int overflow	2022-05-25 22:27:00 +02:00
Robert Schulze	49934a3dc8	Cache compiled regexps when evaluating non-const needles

Needles in a (non-const) needle column may repeat, and this commit allows skipping compilation for known needles. Out of the different design alternatives (see below, if someone is interested), we now maintain
- one global pattern cache,
- with a fixed size of 42k elements currently,
- and use LRU as eviction strategy.

------------------------------------------------------------------------
(sorry for the wall of text, dumping it here not for reading but just for reference)

Write-up about considered design alternatives:

1. Keep the current global cache of const needles. For non-const needles, probe the cache but don't store values in it.
   Pros: need to maintain just a single cache, no problem with cache pollution assuming there are few distinct constant needles
   Cons: only useful if a non-const needle already occurred as a const needle --> overall too simplistic

2. Keep the current global cache for const needles. For non-const needles, create a local (e.g. per-query) cache.
   Pros: unlike (1.), non-const needles can be skipped even if they did not occur yet, no pollution of the const pattern cache when there are very many non-const needles (e.g. large / highly distinct needle columns).
   Cons: caches may explode "horizontally", i.e. we'll end up with the const cache + caches for Q1, Q2, ... QN; this makes it harder to control the overall space consumption. Also, patterns residing in different caches cannot be reused between queries. Another difficulty is that the concept of "query" does not really exist at matching level - there are only column chunks, and we'd potentially end up with 1 cache / chunk.

3. Queries with const and non-const needles insert into the same global cache.
   Pros: the advantages of (2.) + allows reusing compiled patterns across parallel queries
   Cons: needs an eviction strategy to control cache size and pollution (and btw. (2.) also needs eviction strategies for the individual caches)

4. Queries with const needle use a global cache, queries with non-const needle use a different global cache --> overall similar to (3.) but ignores the (likely) edge case that const and non-const needles overlap.

In sum, (3.) seems the simplest and most beneficial approach.

Eviction strategies:
0. Don't ever evict --> cache may grow infinitely and eventually make the system unusable (may even pose a DoS risk)
1. Flush the cache after a certain threshold is exceeded --> very simple but may lead to periodic performance drops
2. Use LRU --> more graceful performance degradation at threshold but comes with a (constant) performance overhead to maintain the LRU queue

In sum, given that the pattern compilation in RE2 should be quite costly (pattern-to-DFA/NFA), LRU may be acceptable.	2022-05-25 22:04:06 +02:00
Robert Schulze	ea60a614d2	Decrease namespace indent	2022-05-25 21:56:35 +02:00
Alexey Milovidov	abf2558fba	Merge pull request #37491 from ClickHouse/match_refactoring Refactorings of LIKE/MATCH code	2022-05-25 22:05:38 +03:00
Alexey Milovidov	4482da9eb6	Update greatCircleDistance.cpp	2022-05-25 21:59:31 +03:00
Alexander Tokmakov	779e6ea0b9	make it better, fix on cluster queries	2022-05-25 20:17:49 +02:00
Nikolai Kochetov	ff98c24d44	Merge pull request #37048 from Avogar/fix-array-map-nothing Add default implementation for Nothing in functions	2022-05-25 19:10:40 +02:00
Yakov Olkhovskiy	6692b9c2ed	showCertificate function implementation	2022-05-25 12:11:44 -04:00
Alexey Milovidov	cb92482ca5	Merge pull request #37484 from kitaisreal/function-has-all-avx2-dynamic-dispatch Function hasAll added dynamic dispatch	2022-05-25 19:05:32 +03:00
Maksim Kita	28355114c0	Fixed tests	2022-05-25 16:19:29 +02:00
Maksim Kita	e67b3537f7	Functions normalizeUTF8 unstable performance tests fix	2022-05-25 15:54:52 +02:00
Maksim Kita	45da28ecae	Improve performance of geo distance functions	2022-05-25 14:22:22 +02:00
Maksim Kita	c372c3d6aa	Fix performance tests	2022-05-25 11:49:59 +02:00
Kseniia Sumarokova	b50d4549c9	Merge pull request #37356 from amosbird/partition-prune-for-s3 "Partition pruning" for s3	2022-05-25 11:03:07 +02:00
Robert Schulze	05e4fa7df1	Fix special case of trivial regexp Previously, we would always set 1 in case of a trivial regex (which is correct). If someone in future builds a negated operator, then this will produce wrong results. Right now, negation of regexp (SQL: NOT MATCH) is implemented at a higher level, so we are safe and this is more a preventive fix.	2022-05-25 10:05:55 +02:00
Robert Schulze	01ab7b9bad	Pass strings in some places as string_view The original goal was to change const auto & needle = String(reinterpret_cast<const char *>(cur_needle_data), cur_needle_length); in Functions/MatchImpl.h into a std::string_view to save an allocation + copy. The needle is eventually passed as search pattern into the re2 library. Re2 has an alternative constructor taking a const char *, i.e. a NULL-terminated string. Here, the needle is NULL-terminated, but 1. this is only because it is passed inside a ColumnString, yet this is not always the case (e.g. fixed string columns have a dense layout w/o NULL terminator); 2. assuming NULL termination for users of the regex code other than MatchImpl is too dangerous. So, for now we'll stay with copying to be on the safe side. One fine day when re2 has a ptr/size ctor, we can use std::string_view. This commit just changes a few other places from std::string to std::string_view, but this will not help with performance.	2022-05-25 10:05:51 +02:00
Robert Schulze	040fbf3686	Tighter sanity checks in matching code	2022-05-25 10:05:06 +02:00
Robert Schulze	35bef17302	Introduce variables to hold the match result --> nicer when debugging	2022-05-25 10:04:47 +02:00
Robert Schulze	b044d44fef	Refactoring: Make template instantiation easier to read - introduced class MatchTraits with enums that replace bool template parameters - (minor: made negation the last template parameter because negation executes last during evaluation)	2022-05-25 10:03:58 +02:00
Bharat Nallan Chakravarthy	57cfc0bd04	check for validity of h3 index	2022-05-25 06:17:15 +05:30
Alexander Gololobov	2ff747785e	Merge pull request #37394 from ClickHouse/array_norm_dist_fixes Do computations on the raw input data without copying to Eigen::Matrix	2022-05-24 20:59:04 +02:00
Robert Schulze	7348a0eb28	Merge pull request #37251 from ClickHouse/non_const_like Support non-constant SQL functions (NOT) (I)LIKE and MATCH	2022-05-24 20:28:31 +02:00
Robert Schulze	028f15c4fa	Review comment: Throw LOGICAL_ERROR for different sizes of haystack / needles	2022-05-24 20:19:13 +02:00
Maksim Kita	3c0c322d7c	Merge pull request #37480 from kitaisreal/dynamic-dispatch-infrastructure-improvements Dynamic dispatch infrastructure style fixes	2022-05-24 18:13:53 +02:00
Maksim Kita	6fb51e8bd3	Function hasAll added dynamic dispatch	2022-05-24 17:06:06 +02:00
Maksim Kita	86180614e7	Fixed tests	2022-05-24 15:33:03 +02:00
Anton Popov	e96af9fd75	better binary serialization of ColumnObject	2022-05-24 13:16:11 +00:00
Maksim Kita	e6e4b2826d	Dynamic dispatch infrastructure style fixes	2022-05-24 14:25:29 +02:00
Amos Bird	c25ef92139	Fix tests	2022-05-24 18:57:55 +08:00
Amos Bird	093d315756	partition pruning for s3	2022-05-24 18:57:55 +08:00
Maksim Kita	712b000f2a	Merge pull request #37443 from kitaisreal/functions-normalize-utf8-fix Functions normalize utf8 fix	2022-05-24 11:11:15 +02:00
Alexander Gololobov	7d0ed7e51a	Remove eigen library	2022-05-24 10:24:50 +02:00
Alexander Gololobov	caad1435d5	Optimized the case when one of the arguments is Const	2022-05-24 10:24:50 +02:00
Alexander Gololobov	65fbda436a	Do computations on the raw input data without copying to Eigen::Matrix	2022-05-24 10:24:50 +02:00
Bharat Nallan Chakravarthy	6e49b76cfd	try suppress h3 asan errors	2022-05-24 10:22:46 +05:30
Maksim Kita	996241493f	Merge pull request #37447 from kitaisreal/binary-function-vectorized-remove-macro BinaryFunctionVectorized remove macro	2022-05-23 16:50:12 +02:00
Maksim Kita	fe21b4ca9e	Fixed style check	2022-05-23 14:41:07 +02:00
Maksim Kita	008de5c779	Merge pull request #37438 from kitaisreal/function-binary-representation-style-fixes FunctionBinaryRepresentation style fixes	2022-05-23 13:54:15 +02:00
Maksim Kita	e550843d56	BinaryFunctionVectorized remove macro	2022-05-23 12:45:16 +02:00


3611 Commits
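Commit 1a0b5f33b3 switches code from compiler-predefined macros (__linux__, __APPLE__, __FreeBSD__) to the project's own OS_LINUX / OS_DARWIN / OS_FREEBSD, which cmake/target.cmake defines. The sketch below shows what consuming code looks like after such a change. The fallback #if block exists only so the snippet compiles outside the ClickHouse CMake build; it is an assumption for illustration, not part of the commit.

```cpp
#include <cassert>
#include <string_view>

// In the real build, cmake/target.cmake defines one of these macros.
// Fallback for compiling this sketch standalone (hypothetical, not from the commit):
// derive them from the compiler's predefined macros only if none is defined yet.
#if !defined(OS_LINUX) && !defined(OS_DARWIN) && !defined(OS_FREEBSD)
#    if defined(__linux__)
#        define OS_LINUX 1
#    elif defined(__APPLE__)
#        define OS_DARWIN 1
#    elif defined(__FreeBSD__)
#        define OS_FREEBSD 1
#    endif
#endif

// Platform-specific code tests the project's own macros, never __linux__ etc. directly.
constexpr std::string_view platformName()
{
#if defined(OS_LINUX)
    return "Linux";
#elif defined(OS_DARWIN)
    return "Darwin";
#elif defined(OS_FREEBSD)
    return "FreeBSD";
#else
    return "unknown";
#endif
}
```

Keeping the mapping in one build file means a new platform, or a platform quirk, is handled in a single place instead of in every #if.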
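Commit 366f368d06 rejects LIKE needles that end in an unpaired escape character, e.g. the needle ab\ produced by the SQL literal 'ab\\' after string-literal unescaping. A minimal sketch of that check, assuming backslash as the escape character: every escape consumes the character after it, so the needle is invalid only if an escape is the last character left. The function name is hypothetical and does not reflect ClickHouse's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns false if the LIKE needle ends with an unpaired escape.
// Note the two quoting levels: the C++ literal "ab\\" is the three-character needle ab\ .
bool likeNeedleHasValidEscapes(std::string_view needle, char escape = '\\')
{
    for (std::size_t i = 0; i < needle.size(); ++i)
    {
        if (needle[i] == escape)
        {
            if (i + 1 == needle.size())
                return false; // trailing escape: nothing left to escape
            ++i;              // skip the escaped character (e.g. %, _ or the escape itself)
        }
    }
    return true;
}
```

With this scan, ab\\ (an escaped backslash) is accepted, while ab\ and a run of three backslashes are rejected.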
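Commit ad12adc31c settles on strategy 5 for non-const needles: a small, non-thread-safe local hash map without collision lists, where a colliding pattern simply replaces the stored one. The sketch below is one way to express that idea. It is not the ClickHouse implementation. std::regex stands in for re2, and the class name and the 100-bucket default (the size the benchmark found best) are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <utility>

// Hypothetical sketch of "lightweight local cache 2": direct-mapped, one entry per bucket,
// no locking (meant to be owned by a single thread / query). std::regex stands in for re2.
template <std::size_t NumBuckets = 100>
class DirectMappedPatternCache
{
public:
    // Returns the compiled pattern, compiling only if its bucket holds something else.
    const std::regex & get(const std::string & pattern)
    {
        Bucket & bucket = buckets[std::hash<std::string>{}(pattern) % NumBuckets];
        if (!bucket.compiled || bucket.pattern != pattern)
        {
            // Compile first so that an invalid (throwing) pattern leaves the bucket intact.
            auto compiled = std::make_unique<std::regex>(pattern);
            // No collision list: the new pattern replaces whatever was stored.
            bucket.pattern = pattern;
            bucket.compiled = std::move(compiled);
            ++num_compilations;
        }
        return *bucket.compiled;
    }

    std::size_t compilations() const { return num_compilations; }

private:
    struct Bucket
    {
        std::string pattern;
        std::unique_ptr<std::regex> compiled;
    };

    std::array<Bucket, NumBuckets> buckets{};
    std::size_t num_compilations = 0;
};
```

Because each lookup is a hash, a modulo and one string comparison, the static overhead stays small on all-distinct needles (datasets A and C), while recently seen patterns keep replacing stale ones, which gives the LRU-like adaptation the commit message describes.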
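Commits 540353566c, 5df14cd956 and b065839f44 add LpNorm/LpDistance for arrays, cast arguments to the result type to avoid integer overflow, and always return Float64. The arithmetic is the usual ||x||_p = (sum_i |x_i|^p)^(1/p) and ||x - y||_p. A sketch of those two rules, assuming p >= 1 and using std::vector in place of ClickHouse columns; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: every element is converted to double *before* any arithmetic,
// so integer inputs (even INT32_MIN - INT32_MAX) cannot overflow, and the result is
// always a double (Float64). Assumes p >= 1.
template <typename T>
double lpNorm(const std::vector<T> & x, double p)
{
    if (!(p >= 1.0))
        throw std::invalid_argument("p must be >= 1");
    double sum = 0.0;
    for (const T & v : x)
        sum += std::pow(std::fabs(static_cast<double>(v)), p);
    return std::pow(sum, 1.0 / p);
}

template <typename T>
double lpDistance(const std::vector<T> & x, const std::vector<T> & y, double p)
{
    if (x.size() != y.size())
        throw std::invalid_argument("arrays must have the same size");
    if (!(p >= 1.0))
        throw std::invalid_argument("p must be >= 1");
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        sum += std::pow(std::fabs(static_cast<double>(x[i]) - static_cast<double>(y[i])), p);
    return std::pow(sum, 1.0 / p);
}
```

Summing INT32_MAX twice in 32-bit integers would overflow. After the conversion, the same sum is simply 4294967294.0.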
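Commit 49934a3dc8 chooses one global, size-bounded pattern cache with LRU eviction (about 42k entries). Commit ad12adc31c later replaced this with strategy 5 for non-const needles because LRU maintenance and locking hurt on all-distinct patterns. The sketch below shows the classic list + hash map LRU. std::regex stands in for re2, and the class name and capacity parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: thread-safe LRU cache of compiled patterns (std::regex as a stand-in).
class LRUPatternCache
{
public:
    explicit LRUPatternCache(std::size_t capacity_) : capacity(capacity_) {}

    std::shared_ptr<const std::regex> get(const std::string & pattern)
    {
        std::lock_guard<std::mutex> lock(mutex); // global cache: every lookup takes the lock
        if (auto it = index.find(pattern); it != index.end())
        {
            // Hit: move the entry to the front (most recently used). splice keeps iterators valid.
            lru.splice(lru.begin(), lru, it->second);
            return it->second->second;
        }
        // Miss: compile (under the lock, for simplicity), insert at the front,
        // then evict the least recently used entry if the cache is over capacity.
        std::shared_ptr<const std::regex> compiled = std::make_shared<std::regex>(pattern);
        lru.emplace_front(pattern, compiled);
        index[pattern] = lru.begin();
        if (lru.size() > capacity)
        {
            index.erase(lru.back().first);
            lru.pop_back();
        }
        ++num_misses;
        return compiled;
    }

    std::size_t size() const { std::lock_guard<std::mutex> lock(mutex); return lru.size(); }
    std::size_t misses() const { std::lock_guard<std::mutex> lock(mutex); return num_misses; }

private:
    using Entry = std::pair<std::string, std::shared_ptr<const std::regex>>;
    const std::size_t capacity;
    std::list<Entry> lru; // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index;
    mutable std::mutex mutex;
    std::size_t num_misses = 0;
};
```

Handing out shared_ptr keeps a compiled pattern alive for a caller even if another thread evicts it. The lock and the list splice on every lookup are the constant overhead the later benchmark (strategy 3) measured.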
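Commit 05e4fa7df1 concerns the shortcut for a trivial regexp, i.e. a pattern that matches every string (for example an empty one), where evaluation can be skipped and the whole result column filled at once. The preventive fix is that the fill value must respect negation. A minimal sketch of that rule, with a hypothetical function name and std::vector standing in for the result column:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: result column for a pattern known to match every row.
// A plain MATCH yields 1 everywhere; a negated operator must yield 0 everywhere,
// so the constant is derived from `negate` rather than hard-coded to 1.
void fillResultForTrivialPattern(std::vector<std::uint8_t> & res, bool negate)
{
    std::fill(res.begin(), res.end(), static_cast<std::uint8_t>(negate ? 0 : 1));
}
```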
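Commit 01ab7b9bad explains why the needle is still copied into a String: std::string_view::data() is not guaranteed to be NUL-terminated (a FixedString column stores values back to back), so passing it to an API that expects a const char * could read past the value. The sketch below illustrates the hazard and the safe copy. legacyPatternLength is a hypothetical stand-in for such a C-string API, not a re2 function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for an API that only understands NUL-terminated strings.
std::size_t legacyPatternLength(const char * pattern)
{
    return std::strlen(pattern);
}

// needle.data() may point into a densely packed buffer with no terminator after the value.
// Copying into std::string guarantees termination via c_str(), at the cost of a copy
// (and possibly an allocation), which is the trade-off the commit accepts for now.
std::size_t patternLengthSafe(std::string_view needle)
{
    const std::string owned(needle);
    return legacyPatternLength(owned.c_str());
}
```

Calling legacyPatternLength(needle.data()) directly on a 2-character view into the packed buffer {'a', 'b', 'c', 'd'} would be undefined behavior. The copy above correctly reports 2.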
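Commit b044d44fef replaces bool template parameters with enums (class MatchTraits) so that an instantiation says what each flag means, and puts negation last because it is applied last. The sketch below shows the idea on a hypothetical substring matcher, which is only a stand-in for LIKE/regex matching. The enum names mimic the idea but are illustrative and need not match ClickHouse's actual MatchTraits.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

namespace MatchTraitsSketch
{
enum class Case { Sensitive, Insensitive };
enum class Result { DontNegate, Negate };
}

// Before such a refactoring, an instantiation might read SubstringMatch<true, false>,
// leaving the reader to guess which bool is which.
template <MatchTraitsSketch::Case match_case, MatchTraitsSketch::Result result>
struct SubstringMatch
{
    static bool execute(std::string_view haystack, std::string_view needle)
    {
        auto equal = [](char a, char b)
        {
            if constexpr (match_case == MatchTraitsSketch::Case::Insensitive)
                return std::tolower(static_cast<unsigned char>(a)) == std::tolower(static_cast<unsigned char>(b));
            else
                return a == b;
        };
        const bool found = needle.empty()
            || std::search(haystack.begin(), haystack.end(), needle.begin(), needle.end(), equal) != haystack.end();
        // Negation is applied last, matching its position in the template parameter list.
        if constexpr (result == MatchTraitsSketch::Result::Negate)
            return !found;
        else
            return found;
    }
};

using CaseInsensitiveContains = SubstringMatch<MatchTraitsSketch::Case::Insensitive, MatchTraitsSketch::Result::DontNegate>;
using NegatedContains = SubstringMatch<MatchTraitsSketch::Case::Sensitive, MatchTraitsSketch::Result::Negate>;
```

The aliases at the bottom read as plain English, which is the point of the refactoring. The generated code is the same as with bools because the enums are resolved at compile time via if constexpr.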