ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-15 10:52:30 +00:00

Author	SHA1	Message	Date
Azat Khuzhin	0b47f4a9e9	Fix optimize_trivial_count_query with partition predicate. Consider the following example: CREATE TABLE test(p DateTime, k int) ENGINE MergeTree PARTITION BY toDate(p) ORDER BY k; INSERT INTO test VALUES ('2020-09-01 00:01:02', 1), ('2020-09-01 20:01:03', 2), ('2020-09-02 00:01:03', 3); - SELECT count() FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00' In this case the RPN will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN (due to strict), FUNCTION_AND), and optimize_trivial_count_query cannot use the index if there is at least one FUNCTION_UNKNOWN, since there is no post-processing and returning count() based on only the first predicate is wrong. Before this patch FUNCTION_UNKNOWN was allowed for optimize_trivial_count_query, and the result was wrong. The two examples below only show the difference; their behaviour is not changed by this patch: - SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00' In this case the RPN will be (FUNCTION_IN_RANGE, FUNCTION_IN_RANGE (due to non-strict), FUNCTION_AND), so everything will be pruned and nothing will be read. - SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND toUnixTimestamp(p)%5==0 In this case the RPN will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN, FUNCTION_AND) and both partitions will be scanned, but due to later filtering no rows will match.	2020-11-25 23:09:17 +03:00
Nikolai Kochetov	46f70dd0de	Merge branch 'master' into actions-dag-f14	2020-11-12 11:54:44 +03:00
Alexander Tokmakov	b94cc5c4e5	remove more stringstreams	2020-11-10 21:22:26 +03:00
Nikolai Kochetov	99cc9b1ec0	Fix build	2020-11-09 16:20:56 +03:00
Amos Bird	aa436a3cb1	Transform single point	2020-11-06 14:59:55 +08:00
alexey-milovidov	adeba6bdd8	Merge pull request #15074 from amosbird/btc Extend trivial count optimization.	2020-10-22 02:50:57 +03:00
Nikolai Kochetov	a7fb2e38a5	Use ColumnWithTypeAndName as function argument instead of Block.	2020-10-09 10:41:28 +03:00
Amos Bird	867216103f	Extend trivial count optimization.	2020-10-08 18:08:17 +08:00
Nikolai Kochetov	dad9d369a1	Merge branch 'master' into bobrik-parallel-randes	2020-07-23 16:21:32 +03:00
Artem Zuikov	2afd123eda	Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645)	2020-07-22 20:13:05 +03:00
Nikolai Kochetov	755f15def3	Make MergeTreeSetIndex::checkInRange const.	2020-07-21 14:22:45 +03:00
Nikolai Kochetov	12c5e376c6	Remove mutable from RPNElement.	2020-07-21 14:02:58 +03:00
Ivan Babrou	d9d8d0242e	Optimize PK lookup for queries that match exact PK range. Existing code that looks up marks that match the query has a pathological case, when most of the part does in fact match the query. The code works by recursively splitting a part into ranges and then discarding the ranges that definitely do not match the query, based on primary key. The problem is that it requires visiting every mark that matches the query, making the complexity of this sort of lookup O(n). For queries that match an exact range on the primary key, we can find both the left and right ends of the range with O(log n) complexity. This change implements exactly that. To engage this optimization, the query must: * Have a prefix list of the primary key. * Have only range or single set element constraints for columns. * Have only AND as a boolean operator. Consider a table with `(service, timestamp)` as the primary key. The following conditions will be optimized: * `service = 'foo'` * `service = 'foo' and timestamp >= now() - 3600` * `service in ('foo')` * `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now()` The following will fall back to the previous lookup algorithm: * `timestamp >= now() - 3600` * `service in ('foo', 'bar') and timestamp >= now() - 3600` * `service = 'foo'` Note that the optimization won't engage when the PK has a range expression followed by a point expression, since in that case the range is not continuous. Trace query logging provides the following types of messages, each representing a different kind of PK usage for a part: ``` Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps Not using index on part 20200710_5710473_5710473_0 ``` The number of steps translates to computational complexity. Here's a before and after comparison for a query over 24h of data: ``` Read 4562944 rows, 148.05 MiB in 45.19249672 sec., 100966 rows/sec., 3.28 MiB/sec. Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec. ``` This is especially useful for queries that read data in order and terminate early to return "last X things" matching a query. See #11564 for more thoughts on this.	2020-07-11 12:26:54 -07:00
myrrc	8c3417fbf7	ILIKE operator (#12125) * Integrated CachingAllocator into MarkCache * fixed build errors * reset func hotfix * upd: Fixing build * updated submodules links * fix 2 * updating grabber allocator proto * updating lost work * updating CMake to use concepts * some other changes to get it building (integration into MarkCache) * further integration into caches * updated Async metrics, fixed some build errors * and some other errors revealing * added perfect forwarding to some functions * fix: forward template * fix: constexpr modifier * fix: FakePODAllocator missing member func * updated PODArray constructor taking alloc params * fix: PODArray overload with n restored * fix: FakePODAlloc duplicating alloc() func * added constexpr variable for alloc_tag_t * split cache values by allocators, provided updates * fix: memcpy * fix: constexpr modifier * fix: noexcept modifier * fix: alloc_tag_t for PODArray constructor * fix: PODArray copy ctor with different alloc * fix: resize() signature * updating to latest working master * syncing with 273267 * first draft version * fix: update Searcher to case-insensitive * added ILIKE test * fixed style errors, updated test, split like and ilike, added notILike * replaced inconsistent comments * fixed show tables ilike * updated missing test cases * regenerated ya.make * Update 01355_ilike.sql Co-authored-by: myrrc <me-clickhouse@myrrec.space> Co-authored-by: alexey-milovidov <milovidov@yandex-team.ru>	2020-07-05 18:57:59 +03:00
Azat Khuzhin	d93b9a57f6	Forward declaration for Context as much as possible. Now after changing Context.h 488 modules will be recompiled instead of 582.	2020-05-21 01:53:18 +03:00
alexey-milovidov	a46a61c970	Update KeyCondition.h	2020-04-08 05:56:25 +03:00
alexey-milovidov	a42d875a68	Update KeyCondition.h	2020-04-08 05:55:39 +03:00
alexey-milovidov	723a1f41e2	Update KeyCondition.h	2020-04-08 05:55:22 +03:00
alexey-milovidov	94a621060d	Update KeyCondition.h	2020-04-08 05:55:03 +03:00
Anton Popov	2dc1eddfab	fix FieldRef	2020-04-06 16:35:11 +03:00
Anton Popov	5ada959853	improve performance of index analysis with monotonic functions	2020-04-06 13:37:34 +03:00
Anton Popov	79024d73a2	improve performance of index analysis with monotonic functions	2020-04-06 13:37:34 +03:00
Ivan Lezhankin	06446b4f08	dbms/ → src/	2020-04-03 18:14:31 +03:00
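
The idea behind commit d9d8d0242e (finding both ends of a matching primary key range by binary search instead of visiting every matching mark) can be sketched as follows. This is a simplified model, not ClickHouse's implementation: it assumes a single integer key column, a sorted list `marks` holding the first key of each granule, and the hypothetical helper names `inclusion_search` and `linear_scan`.

```python
from bisect import bisect_left, bisect_right


def inclusion_search(marks, lo, hi):
    """Return the half-open granule range [begin, end) that may hold keys in [lo, hi].

    Assumption: marks[i] is the first key of granule i, and granule i covers
    keys from marks[i] up to marks[i + 1] (the last granule is unbounded).
    Both ends are found by binary search, so the cost is O(log n).
    """
    # The granule just before the first mark >= lo may still contain lo.
    begin = max(0, bisect_left(marks, lo) - 1)
    # Every granule whose first key is <= hi may contain matching rows.
    end = bisect_right(marks, hi)
    return (begin, end) if begin < end else (0, 0)


def linear_scan(marks, lo, hi):
    """Reference O(n) check that visits every granule."""
    matching = []
    for i, first_key in enumerate(marks):
        upper = marks[i + 1] if i + 1 < len(marks) else float("inf")
        if upper >= lo and first_key <= hi:
            matching.append(i)
    return matching


marks = [0, 10, 20, 30, 40, 50]
begin, end = inclusion_search(marks, 15, 35)
```

In this model the binary search and the linear scan select the same granules; only the number of marks inspected differs, which is what the "steps" figure in the trace log reflects.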

23 Commits
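
The reasoning in commit 0b47f4a9e9 can be illustrated with a small model of the example table. This is only a sketch under stated assumptions: the RPN is represented as plain tuples, `partition_may_match` and `has_unknown` are hypothetical helpers, and FUNCTION_UNKNOWN is treated as "may be true", which is safe for pruning but not for counting.

```python
from datetime import date, datetime

# Rows of the example table, keyed by column p; partitions are toDate(p).
ROWS = [
    datetime(2020, 9, 1, 0, 1, 2),
    datetime(2020, 9, 1, 20, 1, 3),
    datetime(2020, 9, 2, 0, 1, 3),
]


def partition_row_counts(rows):
    """Count rows per partition (the date part of p)."""
    counts = {}
    for p in rows:
        counts[p.date()] = counts.get(p.date(), 0) + 1
    return counts


def partition_may_match(rpn, part):
    """Evaluate the RPN for one partition; True means 'cannot be pruned'."""
    stack = []
    for element in rpn:
        kind = element[0]
        if kind == "FUNCTION_IN_RANGE":
            _, lo, hi = element
            stack.append(lo <= part <= hi)
        elif kind == "FUNCTION_UNKNOWN":
            stack.append(True)  # unknown predicates can never prune
        elif kind == "FUNCTION_AND":
            right, left = stack.pop(), stack.pop()
            stack.append(left and right)
    return stack.pop()


def has_unknown(rpn):
    """A count taken from partition metadata is only exact without unknowns."""
    return any(element[0] == "FUNCTION_UNKNOWN" for element in rpn)


# toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00' (second part unknown)
RPN = [
    ("FUNCTION_IN_RANGE", date(2020, 9, 1), date.max),
    ("FUNCTION_UNKNOWN",),
    ("FUNCTION_AND",),
]

counts = partition_row_counts(ROWS)
count_from_partitions = sum(
    n for part, n in counts.items() if partition_may_match(RPN, part)
)
exact_count = sum(
    1 for p in ROWS
    if p.date() >= date(2020, 9, 1) and p <= datetime(2020, 9, 1, 0, 0, 0)
)
```

In this model, summing the row counts of the partitions that survive pruning disagrees with the count obtained by filtering the rows, which is why a query whose RPN contains FUNCTION_UNKNOWN has to fall back to actually reading and filtering data.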