ClickHouse/docs/en/operations/table_engines/custom_partitioning_key.md
Ivan Blinkov 0a4a5b36cc
2018-07-18 13:00:53 +03:00


Custom partitioning key

Starting with version 1.1.54310, you can create tables in the MergeTree family with any partitioning expression (not only partitioning by month).

The partition key can be an expression over the table columns, or a tuple of such expressions (similar to the primary key). The partition key can also be omitted. When creating a table, specify the partition key in the ENGINE description using the new syntax:

ENGINE [=] Name(...) [PARTITION BY expr] [ORDER BY expr] [SAMPLE BY expr] [SETTINGS name=value, ...]

For MergeTree tables, the partition expression is specified after PARTITION BY, the primary key after ORDER BY, and the sampling key after SAMPLE BY. SETTINGS can specify index_granularity (optional; the default value is 8192), as well as other settings from MergeTreeSettings.h. The other engine parameters are specified in parentheses after the engine name, as before. Example:

ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/name', 'replica1', Sign)
    PARTITION BY (toMonday(StartDate), EventType)
    ORDER BY (CounterID, StartDate, intHash32(UserID))
    SAMPLE BY intHash32(UserID)
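For context, the ENGINE clause above could sit inside a complete statement like the following sketch. The table name visits_example and the column list are hypothetical, chosen only to match the expressions used in the example, and a non-replicated CollapsingMergeTree is assumed so that no ZooKeeper path is needed.

```sql
-- Hypothetical table; column names and types are assumptions for illustration.
CREATE TABLE visits_example
(
    StartDate Date,
    EventType UInt8,
    CounterID UInt32,
    UserID UInt64,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
    -- One partition per (week, event type) pair.
    PARTITION BY (toMonday(StartDate), EventType)
    -- Primary key; it includes the sampling expression.
    ORDER BY (CounterID, StartDate, intHash32(UserID))
    SAMPLE BY intHash32(UserID)
    -- Optional; 8192 is the default value.
    SETTINGS index_granularity = 8192;
```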

The traditional partitioning by month is expressed as toYYYYMM(date_column).

An old-style table can't be converted in place to a table with custom partitions. The only way is to create a new table and copy the data with INSERT SELECT.
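A minimal sketch of such a migration, assuming a hypothetical old-style table hits_old that was created with ENGINE = MergeTree(EventDate, (CounterID, EventDate), 8192) and has the three columns shown. All table and column names here are illustrative.

```sql
-- New-style table that keeps the traditional monthly partitioning.
CREATE TABLE hits_new
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
)
ENGINE = MergeTree()
    PARTITION BY toYYYYMM(EventDate)
    ORDER BY (CounterID, EventDate)
    SETTINGS index_granularity = 8192;

-- Copy the data; the old table itself is left unchanged.
INSERT INTO hits_new SELECT EventDate, CounterID, UserID FROM hits_old;

-- Optionally swap the names once the copy has been verified.
RENAME TABLE hits_old TO hits_backup, hits_new TO hits_old;
```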

Once such a table is created, merges only combine data parts that have the same value of the partitioning expression. Note: This means that you shouldn't make overly granular partitions (more than about a thousand partitions), or SELECT will perform poorly.
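One way to keep an eye on the number of partitions is to count them in system.parts. This is only a sketch: the table name visits_example is hypothetical, and the query assumes the table lives in the current database.

```sql
-- Count distinct partitions and active data parts for one table.
SELECT
    uniqExact(partition) AS partitions,
    count() AS active_parts
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'visits_example'
  AND active;
```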

To specify a partition in ALTER PARTITION commands, specify the value of the partition expression (or a tuple). Constants and constant expressions are supported. Example:

ALTER TABLE table DROP PARTITION (toMonday(today()), 1)

This deletes the partition for the current week with event type 1. The OPTIMIZE query accepts a partition in the same way. To specify the only partition in a non-partitioned table, specify PARTITION tuple().
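The two cases just mentioned might look as follows. Both table names are hypothetical: visits_example is assumed to be partitioned by (toMonday(StartDate), EventType), and unpartitioned_example is assumed to have been created without PARTITION BY.

```sql
-- Merge the parts of a single partition, given as a tuple of key values.
OPTIMIZE TABLE visits_example PARTITION (toMonday(today()), 1);

-- A table without a partition key has exactly one partition, addressed as tuple().
ALTER TABLE unpartitioned_example DETACH PARTITION tuple();
```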

Note: For old-style tables, the partition can be specified either as a number 201710 or a string '201710'. The syntax for the new style of tables is stricter with types (similar to the parser for the VALUES input format). In addition, ALTER TABLE FREEZE PARTITION uses exact match for new-style tables (not prefix match).

In the system.parts table, the partition column contains the value of the partition expression to use in ALTER queries (if the quotes are removed). The name column contains the name of the data part, which has a new format.

Was: 20140317_20140323_2_2_0 (minimum date - maximum date - minimum block number - maximum block number - level).

Now: 201403_2_2_0 (partition ID - minimum block number - maximum block number - level).
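To see both columns side by side, a query along these lines can be used. The table name hits_new is hypothetical; the partition, name, and active columns are those of system.parts.

```sql
-- List active data parts together with the partition they belong to.
SELECT partition, name
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'hits_new'
  AND active
ORDER BY name;
```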

The partition ID is the partition's string identifier (human-readable, if possible) that is used for the names of data parts in the file system and in ZooKeeper. You can specify it in ALTER queries in place of the partition key. Example: with the partition key toYYYYMM(EventDate), ALTER can specify either PARTITION 201710 or PARTITION ID '201710'.
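Written out as full statements, the two forms could look like this. The table name hits_new is hypothetical and is assumed to use PARTITION BY toYYYYMM(EventDate); the two statements are alternatives that address the same partition.

```sql
-- Address the partition by the value of the partition expression.
ALTER TABLE hits_new DETACH PARTITION 201710;

-- Alternatively, address the same partition by its string ID.
ALTER TABLE hits_new DETACH PARTITION ID '201710';
```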

For more examples, see the tests 00502_custom_partitioning_local and 00502_custom_partitioning_replicated_zookeeper.