Update mergetree.md

Basic info about projections based on RFC https://github.com/ClickHouse/ClickHouse/issues/14730
Vladimir Goncharov 2021-06-18 19:37:37 +03:00 committed by GitHub
parent 4fe722d1a4
commit 5badf38d57


@ -39,7 +39,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
...
INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2,
...
PROJECTION projection_name_1 (SELECT <COLUMN LIST EXPR> [WHERE] [GROUP BY] [ORDER BY]),
PROJECTION projection_name_2 (SELECT <COLUMN LIST EXPR> [WHERE] [GROUP BY] [ORDER BY])
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
@ -385,6 +388,26 @@ Functions with a constant argument that is less than ngram size cant be used
- `s != 1`
- `NOT startsWith(s, 'test')`
### Projections {#projections}
Projections are like materialized views, but they are defined at the part level. They provide consistency guarantees and are used automatically in queries.
#### Query {#projection-query}
A projection query is what defines a projection. It has the following grammar:
`SELECT <COLUMN LIST EXPR> [WHERE] [GROUP BY] [ORDER BY]`
It implicitly selects data from the parent table.
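As a minimal sketch of how a projection query sits inside a table definition, the following assumes a hypothetical table `visits` (the table, column, and projection names are illustrative and do not come from the RFC):

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE visits
(
    event_time DateTime,
    user_id UInt64,
    url String,
    duration UInt32,
    -- The projection query has no FROM clause:
    -- it implicitly reads from the parent table.
    PROJECTION duration_by_user
    (
        SELECT user_id, sum(duration)
        GROUP BY user_id
    )
)
ENGINE = MergeTree()
ORDER BY event_time;
```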
#### Storage {#projection-storage}
Projections are stored inside the part directory. A projection is similar to an index, but it contains a subdirectory that stores a part of an anonymous MergeTree table. The table is induced by the definition query of the projection. If there is a GROUP BY clause, the underlying storage engine becomes AggregatingMergeTree, and all aggregate functions are converted to either AggregateFunction or SimpleAggregateFunction. If there is an ORDER BY clause, the MergeTree table uses it as its primary key expression. During the merge process, the projection part is merged via its own storage's merge routine. The checksum of the parent table's part is combined with that of the projection's part. Other maintenance jobs are similar to those for skip indices.
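To make the GROUP BY case more concrete, here is a conceptual sketch of what the anonymous table behind a projection such as `SELECT user_id, sum(duration) GROUP BY user_id` might resemble. The real table is internal and cannot be created or queried this way. The table name, the column names, and the choice of the GROUP BY key as the sorting key are assumptions made for illustration only:

```sql
-- Conceptual equivalent only; the actual projection table is anonymous
-- and lives inside each part directory of the parent table.
CREATE TABLE duration_by_user_conceptual
(
    user_id UInt64,
    -- The aggregate function is stored as an intermediate state,
    -- so states from different parts can be combined during merges.
    total_duration AggregateFunction(sum, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY user_id;  -- assumption: the GROUP BY key acts as the sorting key
```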
#### Query Routing {#projection-query-routing}
1. Check if the projection contains all the needed columns and rows.
2. If it's an aggregated projection, also check that it has the right columns inside the GROUP BY clause along with the required aggregate functions.
3. If it's a sorted projection, also check how many granules will be selected by the KeyCondition.
4. Select the best feasible match.
5. The query pipeline that uses projections differs from the one that uses the original parts. If the projection is absent in some parts, a step can be added to the pipeline to "project" it on the fly.
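The steps above can be illustrated with a small, hedged example. It assumes a hypothetical table `page_views` and a sorted projection `by_user`. All names are illustrative, and whether the projection is actually chosen depends on the server version and settings:

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE page_views
(
    event_time DateTime,
    user_id UInt64,
    url String
)
ENGINE = MergeTree()
ORDER BY event_time;

-- Parts written before this statement do not contain the projection,
-- which is the "absent in some parts" case from step 5.
ALTER TABLE page_views ADD PROJECTION by_user (SELECT * ORDER BY user_id);

-- Build the projection for parts that already exist.
ALTER TABLE page_views MATERIALIZE PROJECTION by_user;

-- On releases where projections are experimental, they may first need
-- to be enabled, for example:
--   SET allow_experimental_projection_optimization = 1;

-- The filter on user_id does not match the table's primary key (event_time),
-- but it matches the projection's ORDER BY, so the projection is a candidate
-- that may select fewer granules (step 3).
SELECT url FROM page_views WHERE user_id = 42;
```

The projection contains all columns (`SELECT *`), so step 1 is satisfied for any column list. The remaining choice is between reading the original parts and reading the projection parts.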
## Concurrent Data Access {#concurrent-data-access}
For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.