Custom partitioning key
Starting with version 1.1.54310, you can create tables in the MergeTree family with any partition expression (not only partitioning by month).
The partition key can be an expression from the table columns, or a tuple of such expressions (similar to the primary key). The partition key can be omitted. When creating a table, specify the partition key in the ENGINE description with the new syntax:
ENGINE [=] Name(...) [PARTITION BY expr] [ORDER BY expr] [SAMPLE BY expr] [SETTINGS name=value, ...]
For MergeTree tables, the partition expression is specified after PARTITION BY, the primary key after ORDER BY, the sampling key after SAMPLE BY, and SETTINGS can specify index_granularity (optional; the default value is 8192), as well as other settings from MergeTreeSettings.h. Example:
ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/name', 'replica1', Sign)
PARTITION BY (toMonday(StartDate), EventType)
ORDER BY (CounterID, StartDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
The traditional partitioning by month is expressed as toYYYYMM(date_column).
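For illustration, a complete statement using the traditional monthly partitioning might look like the sketch below. The table name visits_by_month and its columns are hypothetical and are not taken from the ClickHouse sources.

```sql
-- Hypothetical table: the name and columns are illustrative only.
CREATE TABLE visits_by_month
(
    EventDate Date,
    CounterID UInt32,
    UserID    UInt64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)                    -- one partition per calendar month
ORDER BY (CounterID, EventDate, intHash32(UserID))  -- primary key
SAMPLE BY intHash32(UserID);                        -- sampling key; also part of the primary key
```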
An old-style table can't be converted in place to a table with custom partitions. The only way is to create a new table and copy the data with INSERT SELECT.
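A minimal sketch of such a migration, assuming a hypothetical old-style table old_visits and a hypothetical table new_visits that has already been created with the same columns and a PARTITION BY clause:

```sql
-- old_visits and new_visits are hypothetical names.
-- new_visits must already exist with the new-style ENGINE description.
INSERT INTO new_visits SELECT * FROM old_visits;
```

For large tables it may be more practical to copy the data in several smaller INSERT SELECT queries, each restricted by a WHERE condition.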
Once the table is created, merges combine only data parts that have the same value of the partition expression. Note: This means that you shouldn't make overly granular partitions (more than about a thousand partitions), or SELECT will perform poorly.
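One way to check how many partitions a table currently has is to count the distinct partition values of its active data parts in system.parts. A rough sketch, assuming a hypothetical table named visits in the current database:

```sql
-- 'visits' is a hypothetical table name.
SELECT uniqExact(partition) AS partitions
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'visits'
  AND active;
```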
To specify a partition in ALTER PARTITION commands, specify the value of the partition expression (or a tuple). Constants and constant expressions are supported. Example:
ALTER TABLE table DROP PARTITION (toMonday(today()), 1)
This deletes the partition for the current week with event type 1. The OPTIMIZE query accepts a partition in the same way. To specify the only partition in a non-partitioned table, specify PARTITION tuple().
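The same forms applied to OPTIMIZE and to a table without a partition key could look as follows. The table names visits and events_unpartitioned are hypothetical; visits is assumed to be partitioned by (toMonday(StartDate), EventType).

```sql
-- Merge the parts of the partition for the current week with event type 1.
OPTIMIZE TABLE visits PARTITION (toMonday(today()), 1);

-- A table created without PARTITION BY has a single partition, addressed as tuple().
OPTIMIZE TABLE events_unpartitioned PARTITION tuple();
ALTER TABLE events_unpartitioned DROP PARTITION tuple();
```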
Note: For old-style tables, the partition can be specified either as a number 201710 or a string '201710'. The syntax for the new style of tables is stricter with types (similar to the parser for the VALUES input format). In addition, ALTER TABLE FREEZE PARTITION uses exact match for new-style tables (not prefix match).
In the system.parts table, the partition column contains the value of the partition expression to use in ALTER queries (if the quotes are removed). The name column contains the name of the data part, which has a new format.
Before: 20140317_20140323_2_2_0 (minimum date - maximum date - minimum block number - maximum block number - level).
After: 201403_2_2_0 (partition ID - minimum block number - maximum block number - level).
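To see both columns side by side, system.parts can be queried directly. A minimal sketch, assuming a hypothetical table named visits in the current database:

```sql
-- 'visits' is a hypothetical table name.
SELECT partition, name, active
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'visits'
ORDER BY name;
```

For a new-style table, each value in the name column begins with the partition ID, followed by the block numbers and the level.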
The partition ID is the partition's string identifier (human-readable, if possible) that is used for the names of data parts in the file system and in ZooKeeper. You can specify it in ALTER queries in place of the partition key. Example: with the partition key toYYYYMM(EventDate), ALTER can specify either PARTITION 201710 or PARTITION ID '201710'.
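Written out in full, the two equivalent forms might look like this. The table name visits_by_month is hypothetical and is assumed to be partitioned by toYYYYMM(EventDate); either statement alone addresses the October 2017 partition.

```sql
-- Address the partition by the value of the partition expression:
ALTER TABLE visits_by_month DROP PARTITION 201710;

-- Or address the same partition by its string ID:
ALTER TABLE visits_by_month DROP PARTITION ID '201710';
```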
There are more examples in the tests 00502_custom_partitioning_local and 00502_custom_partitioning_replicated_zookeeper.