Commit Graph

48 Commits

Author SHA1 Message Date
Amos Bird
0558ecdc3f
Aggressive IN index analysis for projections. 2021-07-14 22:56:52 +08:00
Nikolai Kochetov
58fbc544cc Add more comments. 2021-05-28 20:16:09 +03:00
Nikolai Kochetov
295a302bc8 Remove settings from ReadFromMergeTree. 2021-05-28 17:34:02 +03:00
Nikolai Kochetov
94f1ac5a16 Remove some commented code. 2021-05-28 12:41:07 +03:00
Nikolai Kochetov
1aeb705b20 Fix some tests. 2021-05-27 19:53:58 +03:00
Nikolai Kochetov
cbdf3752ef Part 3. 2021-05-27 16:40:33 +03:00
Nikolai Kochetov
a51a6ea0b7 Part 2. 2021-05-26 21:14:43 +03:00
Nikolai Kochetov
34eaa48294 Part 1. 2021-05-25 19:34:43 +03:00
Alexey Milovidov
d32819f068 Mark false positives for PVS-Studio 2021-05-24 06:59:12 +03:00
alesapin
17f229857c Merge branch 'master' into nvartolomei-parts-move 2021-05-17 13:52:48 +03:00
alesapin
46e136b5c4
Merge branch 'master' into nv/parts-uuid-move-shard 2021-05-11 15:36:40 +03:00
Amos Bird
9c069ebdbf
support prewhere, row_filter, read_in_order and decent projection selection
TODO set index analysis in projection
2021-05-11 18:12:27 +08:00
Amos Bird
ebaf42a448
Reformat and fix some tests 2021-05-11 18:12:27 +08:00
Nikolai Kochetov
672cfedd13
Disable normal projection by the number of granules. 2021-05-11 18:12:26 +08:00
Amos Bird
264cff6415
Projections
TODO (suggested by Nikolai)

1. Build the query plan for the current query (inside storage::read) up to WithMergableState
2. Check that the plan is simple enough: Aggregating - Expression - Filter - ReadFromStorage (or simpler)
3. Check that the filter is the same as the filter in the projection, and that the expression calculates the same aggregation keys as the projection
4. Return WithMergableState if the projection applies
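
A minimal sketch of the "simple enough" check in step 2, written in Python for illustration only. It assumes that "or simpler" means the same chain with some steps left out, and that a plan is given as a top-down list of step names. The names `ALLOWED_ORDER` and `is_simple_enough` are hypothetical and are not ClickHouse classes or functions.

```python
# Step names are taken from the TODO text above; they are illustrative labels only.
ALLOWED_ORDER = ["Aggregating", "Expression", "Filter", "ReadFromStorage"]


def is_simple_enough(plan_steps):
    """Return True if plan_steps (listed top-down) is the allowed chain,
    or the allowed chain with some steps omitted, ending in ReadFromStorage."""
    if not plan_steps or plan_steps[-1] != "ReadFromStorage":
        return False
    position = 0
    for step in plan_steps:
        # Walk forward through the allowed chain until this step is found.
        while position < len(ALLOWED_ORDER) and ALLOWED_ORDER[position] != step:
            position += 1
        if position == len(ALLOWED_ORDER):
            return False  # unknown step, repeated step, or step out of order
        position += 1
    return True


print(is_simple_enough(["Aggregating", "Expression", "Filter", "ReadFromStorage"]))
print(is_simple_enough(["Filter", "Aggregating", "ReadFromStorage"]))
```

The first call accepts the full chain; the second rejects a plan whose steps appear out of order. Steps 3 and 4 (comparing filters and aggregation keys) are not covered by this sketch.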

Step 3 will be easier to do with ActionsDAG, because it sees all functions and its dependencies are direct (but it is also possible with ExpressionActions)

Also need to figure out how prewhere works for projections, and
row_filter_policies.

wip
2021-05-11 18:12:23 +08:00
Nicolae Vartolomei
53d57ffb52 Part movement between shards
Integrate query deduplication from #17348
2021-04-27 14:20:12 +01:00
Amos Bird
8a3b5c1fab
Add _partition_value virtual column 2021-04-27 16:15:59 +08:00
Amos Bird
50f2e488bd
Fix invalid virtual column expr 2021-04-21 10:29:03 +08:00
Nikolai Kochetov
f6d86d6032 Merge branch 'master' into add-read-from-mt-step 2021-04-12 15:23:32 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Nikolai Kochetov
7ffbeac9df Add info about indexes to ReadFromMergeTree step. 2021-04-08 14:48:54 +03:00
Nikolai Kochetov
7c5a9133df Add index info to ReadFromStorageStep. 2021-04-08 11:19:04 +03:00
Nikolai Kochetov
9f39f5d52d Add more counters to MergeTreeDataSelectExecutor 2021-04-06 15:39:55 +03:00
Amos Bird
93b661ad5a
partition id pruning 2021-03-04 19:43:03 +08:00
alesapin
c29d7c7f49 Shutup clang tidy 2021-03-02 19:13:36 +03:00
alesapin
9ebf1b4fad Get rid of separate minmax index fields 2021-03-02 13:33:54 +03:00
Azat Khuzhin
68f23b7087 Improve logging during MergeTree reading
- Remove "Not using primary index on part {}" message (too noisy)
- Add number of total marks before filtering by primary key into the
  common message
- Make "Index {} has dropped {} / {} granules." not per-part, but
  per-query
2021-02-13 18:08:55 +03:00
alesapin
7cbc135e72 More isolated code 2021-02-05 12:54:34 +03:00
Amos Bird
66fe97d8bd
Per MergeTree table query limit 2021-01-26 14:03:31 +08:00
Nikolai Kochetov
1846bb3cac Merge branch 'master' into actions-dag-f14 2020-11-11 13:08:57 +03:00
Nikolai Kochetov
07a7c46b89 Refactor ExpressionActions [Part 3] 2020-11-03 14:28:28 +03:00
Nikolai Kochetov
ec64def384 Use QueryPlan while reading from MergeTree. 2020-10-01 20:34:22 +03:00
Nikolai Kochetov
e411916bde Refactor Pipe [part 1]. 2020-08-03 14:33:11 +03:00
Nikolai Kochetov
0cc55781d8 Try fix tests. 2020-07-20 18:09:00 +03:00
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks that match the query has a pathological
case, when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of lookup O(n).

For queries that match an exact range on the primary key, we can find
both the left and right ends of the range with O(log n) complexity.

This change implements exactly that.
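
The idea can be sketched with two binary searches over the per-mark index keys. This is a Python illustration under stated assumptions, not the code from this change: each mark is assumed to store the key of the first row of its granule, the keys are sorted, and `matching_mark_range` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def matching_mark_range(index_keys, low, high):
    """Return a half-open range (begin, end) of marks that may contain rows
    with low <= key <= high. index_keys is sorted, one key per mark."""
    # The granule just before the first key >= low may still end with matching
    # rows, so step back by one mark (clamped at zero).
    begin = max(bisect_left(index_keys, low) - 1, 0)
    # A granule whose first key is already > high cannot contain matches.
    end = bisect_right(index_keys, high)
    return begin, end


# Hypothetical index for a (service, timestamp) primary key.
keys = [("bar", 10), ("bar", 50), ("foo", 5), ("foo", 40), ("foo", 90), ("zed", 1)]
# service = 'foo' becomes the key range [('foo', -inf), ('foo', +inf)].
print(matching_mark_range(keys, ("foo", float("-inf")), ("foo", float("inf"))))
```

Each `bisect` call inspects O(log n) marks, whereas a scan that visits every matching mark is O(n) when most of the part matches.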

To engage this optimization, the query must:

* Constrain a prefix of the primary key columns.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now()`

The following will fall back to the previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`

Note that the optimization won't engage when the PK has a range expression
followed by a point expression, since in that case the range is not contiguous.
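
The rules above can be sketched as a small eligibility check. This is a Python illustration that encodes only the rules stated in this message, not the actual key analysis. It assumes the query is an AND of conditions on primary key columns, each already classified as `"point"` (equality or single-element IN), `"range"`, or `"other"`. The function name and the classification labels are hypothetical.

```python
def can_use_optimized_search(primary_key, conditions):
    """primary_key: column names in key order.
    conditions: dict mapping a key column to 'point', 'range' or 'other'."""
    if not conditions:
        return False
    seen_range = False
    constrained = 0
    for column in primary_key:
        kind = conditions.get(column)
        if kind is None:
            break  # the constrained prefix ends here
        if kind not in ("point", "range"):
            return False  # e.g. a multi-element IN
        if kind == "point" and seen_range:
            return False  # range followed by point: not one contiguous range
        if kind == "range":
            seen_range = True
        constrained += 1
    # Every condition has to sit on the constrained prefix of the key.
    return constrained == len(conditions)


pk = ["service", "timestamp"]
print(can_use_optimized_search(pk, {"service": "point", "timestamp": "range"}))
print(can_use_optimized_search(pk, {"timestamp": "range"}))
```

The first call is eligible because both conditions sit on a key prefix; the second is not, because `timestamp` alone is not a prefix of `(service, timestamp)`.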

Trace query logging provides the following types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

Number of steps translates to computational complexity.

Here is a before-and-after comparison for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```
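
As a quick check of the two lines above, the elapsed times and row counts can be turned into a speedup factor and throughput figures. The helper names in this Python snippet are hypothetical; the numbers are copied from the log lines.

```python
def speedup(before_seconds, after_seconds):
    """How many times faster the second run finished."""
    return before_seconds / after_seconds


def rows_per_second(rows, seconds):
    """Throughput in rows per second."""
    return rows / seconds


ratio = speedup(45.19249672, 0.196279627)
print(f"speedup: about {ratio:.0f}x")
print(f"before: {rows_per_second(4562944, 45.19249672):.0f} rows/sec")
print(f"after:  {rows_per_second(4183040, 0.196279627):.0f} rows/sec")
```

The elapsed time drops by a factor of roughly 230 while the number of rows read stays in the same range.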

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
Alexey Milovidov
8872417d00 Respect direct_io/mmap settings while reading secondary indices 2020-06-25 22:31:54 +03:00
alesapin
b1e8976df4 Merge with master 2020-06-22 12:04:27 +03:00
alesapin
dffdece350 getColumns in StorageInMemoryMetadata (only compilable) 2020-06-17 19:39:58 +03:00
alesapin
1afdebeebd Primary key in storage metadata 2020-06-17 15:39:20 +03:00
alesapin
71f99a274d Compilable getSampleBlockWithColumns in StorageInMemoryMetadata 2020-06-16 17:25:08 +03:00
alesapin
3847ea892d Merge branch 'master' into consistent_metadata3 2020-06-01 13:17:59 +03:00
Alexey Milovidov
25f941020b Remove namespace pollution 2020-05-31 00:57:37 +03:00
alesapin
5f8b69547b More readable code 2020-05-28 16:45:08 +03:00
Nikolai Kochetov
4d22374f24 Merged with master. 2020-05-14 12:06:15 +03:00
Anton Popov
84d4ad4315 add some comments 2020-05-13 19:00:06 +03:00
Anton Popov
67213b8ad4 fix sample with final 2020-05-12 21:23:40 +03:00
Nikolai Kochetov
19dadb8c2d Add parallel final. 2020-04-30 12:59:08 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00