ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-15 19:02:04 +00:00

Author	SHA1	Message	Date
Nikolai Kochetov	1aeb705b20	Fix some tests.	2021-05-27 19:53:58 +03:00
Nikolai Kochetov	cbdf3752ef	Part 3.	2021-05-27 16:40:33 +03:00
Nikolai Kochetov	a51a6ea0b7	Part 2.	2021-05-26 21:14:43 +03:00
Nikolai Kochetov	34eaa48294	Part 1.	2021-05-25 19:34:43 +03:00
Alexey Milovidov	d32819f068	Mark false positives for PVS-Studio	2021-05-24 06:59:12 +03:00
alesapin	17f229857c	Merge branch 'master' into nvartolomei-parts-move	2021-05-17 13:52:48 +03:00
alesapin	46e136b5c4	Merge branch 'master' into nv/parts-uuid-move-shard	2021-05-11 15:36:40 +03:00
Amos Bird	9c069ebdbf	support prewhere, row_filter, read_in_order and decent projection selection TODO set index analysis in projection	2021-05-11 18:12:27 +08:00
Amos Bird	ebaf42a448	Reformat and fix some tests	2021-05-11 18:12:27 +08:00
Nikolai Kochetov	672cfedd13	Disable normal projection by the number of granules.	2021-05-11 18:12:26 +08:00
Amos Bird	264cff6415	Projections

TODO (suggested by Nikolai)

1. Build query plan for current query (inside storage::read) up to WithMergeableState
2. Check that the plan is simple enough: Aggregating - Expression - Filter - ReadFromStorage (or simpler)
3. Check that the filter is the same as the filter in the projection, and also that the expression calculates the same aggregation keys as in the projection
4. Return WithMergeableState if projection applies

Step 3 will be easier to do with ActionsDAG, because it sees all functions, and dependencies are direct (but it is possible with ExpressionActions also).

Also need to figure out how prewhere works for projections, and row_filter_policies.

wip	2021-05-11 18:12:23 +08:00
Nicolae Vartolomei	53d57ffb52	Part movement between shards Integrate query deduplication from #17348	2021-04-27 14:20:12 +01:00
Amos Bird	8a3b5c1fab	Add _partition_value virtual column	2021-04-27 16:15:59 +08:00
Amos Bird	50f2e488bd	Fix invalid virtual column expr	2021-04-21 10:29:03 +08:00
Nikolai Kochetov	f6d86d6032	Merge branch 'master' into add-read-from-mt-step	2021-04-12 15:23:32 +03:00
Ivan	495c6e03aa	Replace all Context references with std::weak_ptr (#22297)

* Replace all Context references with std::weak_ptr
* Fix shared context captured by value
* Fix build
* Fix Context with named sessions
* Fix copy context
* Fix gcc build
* Merge with master and fix build
* Fix gcc-9 build	2021-04-11 02:33:54 +03:00
Nikolai Kochetov	7ffbeac9df	Add info about indexes to ReadFromMergeTree step.	2021-04-08 14:48:54 +03:00
Nikolai Kochetov	7c5a9133df	Add index info to ReadFromStorageStep.	2021-04-08 11:19:04 +03:00
Nikolai Kochetov	9f39f5d52d	Add more counters to MergeTreeDataSelectExecutor	2021-04-06 15:39:55 +03:00
Amos Bird	93b661ad5a	partition id pruning	2021-03-04 19:43:03 +08:00
alesapin	c29d7c7f49	Shutup clang tidy	2021-03-02 19:13:36 +03:00
alesapin	9ebf1b4fad	Get rid of separate minmax index fields	2021-03-02 13:33:54 +03:00
Azat Khuzhin	68f23b7087	Improve logging during MergeTree reading

- Remove "Not using primary index on part {}" message (too noisy)
- Add number of total marks before filtering by primary key into the common message
- Make "Index {} has dropped {} / {} granules." not per-part, but per-query	2021-02-13 18:08:55 +03:00
alesapin	7cbc135e72	More isolated code	2021-02-05 12:54:34 +03:00
Amos Bird	66fe97d8bd	Per MergeTree table query limit	2021-01-26 14:03:31 +08:00
Nikolai Kochetov	1846bb3cac	Merge branch 'master' into actions-dag-f14	2020-11-11 13:08:57 +03:00
Nikolai Kochetov	07a7c46b89	Refactor ExpressionActions [Part 3]	2020-11-03 14:28:28 +03:00
Nikolai Kochetov	ec64def384	Use QueryPlan while reading from MergeTree.	2020-10-01 20:34:22 +03:00
Nikolai Kochetov	e411916bde	Refactor Pipe [part 1].	2020-08-03 14:33:11 +03:00
Nikolai Kochetov	0cc55781d8	Try fix tests.	2020-07-20 18:09:00 +03:00
Ivan Babrou	d9d8d0242e	Optimize PK lookup for queries that match exact PK range

Existing code that looks up marks that match the query has a pathological case, when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query, making the complexity of this sort of lookup O(n).

For queries that match an exact range on the primary key, we can find both the left and right ends of the range with O(log n) complexity.

This change implements exactly that.

To engage this optimization, the query must:

* Have a prefix list of the primary key.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now()`

The following will fall back to previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`

Note that the optimization won't engage when PK has a range expression followed by a point expression, since in that case the range is not continuous.

Trace query logging provides the following types of messages, each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

Number of steps translates to computational complexity.

Here's a before-and-after comparison for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec., 100966 rows/sec., 3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```

This is especially useful for queries that read data in order and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.	2020-07-11 12:26:54 -07:00
Alexey Milovidov	8872417d00	Respect direct_io/mmap settings while reading secondary indices	2020-06-25 22:31:54 +03:00
alesapin	b1e8976df4	Merge with master	2020-06-22 12:04:27 +03:00
alesapin	dffdece350	getColumns in StorageInMemoryMetadata (only compilable)	2020-06-17 19:39:58 +03:00
alesapin	1afdebeebd	Primary key in storage metadata	2020-06-17 15:39:20 +03:00
alesapin	71f99a274d	Compilable getSampleBlockWithColumns in StorageInMemoryMetadata	2020-06-16 17:25:08 +03:00
alesapin	3847ea892d	Merge branch 'master' into consistent_metadata3	2020-06-01 13:17:59 +03:00
Alexey Milovidov	25f941020b	Remove namespace pollution	2020-05-31 00:57:37 +03:00
alesapin	5f8b69547b	More readable code	2020-05-28 16:45:08 +03:00
Nikolai Kochetov	4d22374f24	Merged with master.	2020-05-14 12:06:15 +03:00
Anton Popov	84d4ad4315	add some comments	2020-05-13 19:00:06 +03:00
Anton Popov	67213b8ad4	fix sample with final	2020-05-12 21:23:40 +03:00
Nikolai Kochetov	19dadb8c2d	Add parallel final.	2020-04-30 12:59:08 +03:00
Ivan Lezhankin	06446b4f08	dbms/ → src/	2020-04-03 18:14:31 +03:00

44 Commits
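
The message of commit d9d8d0242e contrasts two ways of finding the marks that match a primary key range: a generic exclusion search that splits the part recursively and ends up visiting every matching mark, and an inclusion search that locates both ends of the range by binary search. The Python sketch below illustrates that contrast only; it is not ClickHouse's implementation. It assumes the index is a sorted list of `(service, timestamp)` tuples, one per granule, where each mark is the key of the first row in its granule. The names `lower_bound`, `upper_bound`, `inclusion_search` and `exclusion_search` are hypothetical, and the step counts it reports are not comparable to the numbers in the trace log quoted in the commit.

```python
def lower_bound(marks, key, counter):
    """Index of the first mark >= key; each probe adds 1 to counter[0]."""
    left, right = 0, len(marks)
    while left < right:
        counter[0] += 1
        mid = (left + right) // 2
        if marks[mid] < key:
            left = mid + 1
        else:
            right = mid
    return left


def upper_bound(marks, key, counter):
    """Index of the first mark > key; each probe adds 1 to counter[0]."""
    left, right = 0, len(marks)
    while left < right:
        counter[0] += 1
        mid = (left + right) // 2
        if marks[mid] <= key:
            left = mid + 1
        else:
            right = mid
    return left


def inclusion_search(marks, lo, hi):
    """Granule range [begin, end) that may hold keys in [lo, hi], O(log n) steps."""
    steps = [0]
    # The granule just before the first mark >= lo may still contain lo,
    # because a mark only records the first key of its granule.
    begin = max(lower_bound(marks, lo, steps) - 1, 0)
    end = upper_bound(marks, hi, steps)
    return (begin, end), steps[0]


def exclusion_search(marks, lo, hi):
    """Same answer by recursive splitting; visits every matching granule."""
    steps = 0
    matched = []
    stack = [(0, len(marks))] if marks else []
    while stack:
        begin, end = stack.pop()
        steps += 1
        # Keys in granules [begin, end) lie between marks[begin] and marks[end].
        reaches_lo = end == len(marks) or marks[end] >= lo
        if marks[begin] > hi or not reaches_lo:
            continue  # this range definitely does not match: discard it
        if end - begin == 1:
            matched.append(begin)
        else:
            mid = (begin + end) // 2
            stack.append((mid, end))
            stack.append((begin, mid))  # popped first, so output stays sorted
    return matched, steps


if __name__ == "__main__":
    # Hypothetical index: three services, one mark every 10 time units.
    marks = [(s, t) for s in ("bar", "foo", "qux") for t in range(0, 1000, 10)]
    # Models: service = 'foo' and timestamp >= 500
    lo, hi = ("foo", 500), ("foo", float("inf"))
    (begin, end), fast_steps = inclusion_search(marks, lo, hi)
    granules, slow_steps = exclusion_search(marks, lo, hi)
    print("inclusion:", (begin, end), "steps:", fast_steps)
    print("exclusion:", (granules[0], granules[-1] + 1), "steps:", slow_steps)
```

Both functions return the same granules. The difference is that the inclusion search costs two binary searches regardless of how many granules match, while the exclusion search has to reach each matching granule individually, which is the pathological case the commit message describes.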