Merge branch 'master' of github.com:yandex/ClickHouse

This commit is contained in:
Ivan Blinkov 2019-02-06 16:00:23 +03:00
commit 00d36b4ef9
6 changed files with 238 additions and 125 deletions

View File

@ -123,7 +123,7 @@ This table contains a single row with a single 'dummy' UInt8 column containing t
This table is used if a SELECT query doesn't specify the FROM clause.
This is similar to the DUAL table found in other DBMSs.
## system.parts
## system.parts {#system_tables-parts}
Contains information about parts of [MergeTree](table_engines/mergetree.md) tables.
@ -200,7 +200,7 @@ query String - The query text. For INSERT, it doesn't include the da
query_id String - Query ID, if defined.
```
## system.replicas
## system.replicas {#system_tables-replicas}
Contains information and status for replicated tables residing on the local server.
This table can be used for monitoring. The table contains a row for every Replicated\* table.

View File

@ -1,46 +1,124 @@
# Custom Partitioning Key
The partition key can be an expression from the table columns, or a tuple of such expressions (similar to the primary key). The partition key can be omitted. When creating a table, specify the partition key in the ENGINE description with the new syntax:
Partitioning is available for the [MergeTree](mergetree.md) family tables (including [replicated](replication.md) tables). [Materialized views](materializedview.md) based on MergeTree table support partitioning as well.
```
ENGINE [=] Name(...) [PARTITION BY expr] [ORDER BY expr] [SAMPLE BY expr] [SETTINGS name=value, ...]
A partition is a logical combination of records in a table by a specified criterion. You can set a partition by an arbitrary criterion, for example, by month, by day or by event type. Each partition is stored separately in order to simplify manipulations with this data. When accessing the data ClickHouse only uses as small subset of partitions as possible.
The partition is specified in the `PARTITION BY expr` clause when [creating a table](mergetree.md#table_engine-mergetree-creating-a-table). The partition key can be any expression from the table columns. For example, to specify the partitioning by month, use an expression `toYYYYMM(date_column)`:
``` sql
CREATE TABLE visits
(
VisitDate Date,
Hour UInt8,
ClientID UUID
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(Date)
ORDER BY Hour
```
For MergeTree tables, the partition expression is specified after `PARTITION BY`, the primary key after `ORDER BY`, the sampling key after `SAMPLE BY`, and `SETTINGS` can specify `index_granularity` (optional; the default value is 8192), as well as other settings from [MergeTreeSettings.h](https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/MergeTree/MergeTreeSettings.h). The other engine parameters are specified in parentheses after the engine name, as previously. Example:
The partition key also can be a tuple of expressions (similar to the [primary key](mergetree.md#primary-keys-and-indexes-in-queries)). For example:
``` sql
ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/name', 'replica1', Sign)
PARTITION BY (toMonday(StartDate), EventType)
ORDER BY (CounterID, StartDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
PARTITION BY (toMonday(StartDate), EventType)
ORDER BY (CounterID, StartDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
```
The traditional partitioning by month is expressed as `toYYYYMM(date_column)`.
In this example, we set partitioning by the event types that occurred the current week.
You can't convert an old-style table to a table with custom partitions (only via INSERT SELECT).
When inserting new data to a table, this data is stored as a separate part (chunk) sorted by the primary key. In 10-15 minutes after inserting, the parts of the same partition are merged into the entire part.
After this table is created, merge will only work for data parts that have the same value for the partitioning expression. Note: This means that you shouldn't make overly granular partitions (more than about a thousand partitions), or SELECT will perform poorly.
!!! info
A merge only works for data parts that have the same value for the partitioning expression. It means **you should not make overly granular partitions** (more than about a thousand partitions). Otherwise, the `SELECT` query performs poorly because of an unreasonably large number of files in the file system and open file descriptors.
To specify a partition in ALTER PARTITION commands, specify the value of the partition expression (or a tuple). Constants and constant expressions are supported. Example:
To view the table parts and partitions, use the [system.parts](../system_tables.md#system_tables-parts) table. For example, let's assume that we have a table `visits` with partitioning by month. Let's perform the `SELECT` query for the `system.parts` table:
``` sql
ALTER TABLE table DROP PARTITION (toMonday(today()), 1)
SELECT
partition,
name,
active
FROM system.parts
WHERE table = 'visits'
```
Deletes the partition for the current week with event type 1. The same is true for the OPTIMIZE query. To specify the only partition in a non-partitioned table, specify `PARTITION tuple()`.
```
┌─partition─┬─name───────────┬─active─┐
│ 201901 │ 201901_1_3_1 │ 0 │
│ 201901 │ 201901_1_9_2 │ 1 │
│ 201901 │ 201901_8_8_0 │ 0 │
│ 201901 │ 201901_9_9_0 │ 0 │
│ 201902 │ 201902_4_6_1 │ 1 │
│ 201902 │ 201902_10_10_0 │ 1 │
│ 201902 │ 201902_11_11_0 │ 1 │
└───────────┴────────────────┴────────┘
```
Note: For old-style tables, the partition can be specified either as a number `201710` or a string `'201710'`. The syntax for the new style of tables is stricter with types (similar to the parser for the VALUES input format). In addition, ALTER TABLE FREEZE PARTITION uses exact match for new-style tables (not prefix match).
The `partition` column contains the names of the partitions. There are two partitions in this example: `201901` and `201902`. You can use this column value to specify the partition name in [ALTER ... PARTITION](#alter_manipulations-with-partitions) queries.
In the `system.parts` table, the `partition` column specifies the value of the partition expression to use in ALTER queries (if quotas are removed). The `name` column should specify the name of the data part that has a new format.
The `name` column contains the names of the partition data parts. You can use this column to specify the name of the part in the [ALTER ATTACH PART](#alter_attach-partition) query.
The `active` column shows the status of the part. `1` is active; `0` is inactive. The inactive parts are, for example, source parts remaining after merging to a larger part. The corrupted data parts are also indicated as inactive.
Old: `20140317_20140323_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
Let's break down the name of the first part: `201901_1_3_1`:
Now: `201403_2_2_0` (partition ID - minimum block number - maximum block number - level).
- `201901` is the partition name.
- `1` is the minimum number of the data block.
- `3` is the maximum number of the data block.
- `1` is the chunk level (the depth of the merge tree it is formed from).
The partition ID is its string identifier (human-readable, if possible) that is used for the names of data parts in the file system and in ZooKeeper. You can specify it in ALTER queries in place of the partition key. Example: Partition key `toYYYYMM(EventDate)`; ALTER can specify either `PARTITION 201710` or `PARTITION ID '201710'`.
!!! info
The parts of old-type tables have the name: `20190117_20190123_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
For more examples, see the tests [`00502_custom_partitioning_local`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_local.sql) and [`00502_custom_partitioning_replicated_zookeeper`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_replicated_zookeeper.sql).
As you can see in the example, there are several separated parts of the same partition (for example, `201901_1_1_0` and `201901_2_2_0`). It means that these parts are not merged yet. ClickHouse merges the inserted parts of data periodically, approximately in 15 minutes after inserting. Also, you can perform a non-scheduled merge, using the [OPTIMIZE](../../query_language/misc.md#misc_operations-optimize) query. Example:
``` sql
OPTIMIZE TABLE visits PARTITION 201902;
```
```
┌─partition─┬─name───────────┬─active─┐
│ 201901 │ 201901_1_3_1 │ 0 │
│ 201901 │ 201901_1_9_2 │ 1 │
│ 201901 │ 201901_8_8_0 │ 0 │
│ 201901 │ 201901_9_9_0 │ 0 │
│ 201902 │ 201902_4_6_1 │ 0 │
│ 201902 │ 201902_4_11_2 │ 1 │
│ 201902 │ 201902_10_10_0 │ 0 │
│ 201902 │ 201902_11_11_0 │ 0 │
└───────────┴────────────────┴────────┘
```
Inactive parts will be deleted approximately in 10 minutes after merging.
Another way to view a set of parts and partitions, is to go into the directory of the table: `/var/lib/clickhouse/data/<database>/<table>/`. For example:
```bash
dev:/var/lib/clickhouse/data/default/visits$ ls -l
total 40
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 1 16:48 201901_1_3_1
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 16:17 201901_1_9_2
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 15:52 201901_8_8_0
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 15:52 201901_9_9_0
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 16:17 201902_10_10_0
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 16:17 201902_11_11_0
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 16:19 201902_4_11_2
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 12:09 201902_4_6_1
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 1 16:48 detached
```
The folders '201901_1_1_0', '201901_1_7_1' and so on, are the directories of the parts. Each part relates to a corresponding partition and contains data just for a certain month (the table in this example has partitioning by month).
The 'detached' directory contains parts that were detached from the table using the [DETACH](#alter_detach-partition) query. The corrupted parts are also moved to this directory, instead of being deleted. The server does not use the parts from 'detached' directory. You can add, delete, or modify the data in this directory at any time the server will not know about this until you make the [ATTACH](../../query_language/alter.md#alter_attach-partition) query.
!!! info
On the operating server, you cannot manually change the set of parts or their data on the file system, since the server will not know about it.
For non-replicated tables, you can do this when the server is stopped, but we do not recommend it.
For replicated tables, the set of parts cannot be changed in any case.
ClickHouse allows to do operations with the partitions: delete them, copy from one table to another, create a backup. See the list of all operations in a section [Manipulations With Partitions and Parts](../../query_language/alter.md#alter_manipulations-with-partitions).
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/custom_partitioning_key/) <!--hide-->

View File

@ -69,7 +69,7 @@ For a description of request parameters, see [request description](../../query_l
`SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.
- `SETTINGS` — Additional parameters that control the behavior of the `MergeTree`:
- `index_granularity` — The granularity of an index. The number of data rows between the "marks" of an index. By default, 8192.
- `index_granularity` — The granularity of an index. The number of data rows between the "marks" of an index. By default, 8192. The list of all available parameters you can see in [MergeTreeSettings.h](https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/MergeTree/MergeTreeSettings.h).
**Example of sections setting**
@ -124,7 +124,7 @@ For each data part, ClickHouse creates an index file that contains the primary k
You can use a single large table and continually add data to it in small chunks this is what the `MergeTree` engine is intended for.
## Primary Keys and Indexes in Queries
## Primary Keys and Indexes in Queries {#primary-keys-and-indexes-in-queries}
Let's take the `(CounterID, Date)` primary key. In this case, the sorting and index can be illustrated as follows:

View File

@ -30,8 +30,15 @@ DROP COLUMN name
```
Deletes the column with the name 'name'.
Deletes data from the file system. Since this deletes entire files, the query is completed almost instantly.
``` sql
CLEAR COLUMN name IN PARTITION partition_name
```
Clears all data in a column in a specified partition.
``` sql
MODIFY COLUMN name [type] [default_expr]
```
@ -96,143 +103,171 @@ are available:
These commands are lightweight in a sense that they only change metadata or remove files.
Also, they are replicated (syncing indices metadata through ZooKeeper).
### Manipulations With Partitions and Parts
### Manipulations With Partitions and Parts {#alter_manipulations-with-partitions}
It only works for tables in the [`MergeTree`](../operations/table_engines/mergetree.md) family (including
[replicated](../operations/table_engines/replication.md) tables). The following operations
are available:
The following operations with [partitions](../operations/table_engines/custom_partitioning_key.md) are available:
- `DETACH PARTITION` Move a partition to the 'detached' directory and forget it.
- `DROP PARTITION` Delete a partition.
- `ATTACH PART|PARTITION` Add a new part or partition from the `detached` directory to the table.
- `FREEZE PARTITION` Create a backup of a partition.
- `FETCH PARTITION` Download a partition from another server.
- [DETACH PARTITION](#alter_detach-partition) Moves a partition to the 'detached' directory and forget it.
- [DROP PARTITION](#alter_drop-partition) Deletes a partition.
- [ATTACH PART|PARTITION](#alter_attach-partition) Adds a part or partition from the 'detached' directory to the table.
- [REPLACE PARTITION](#alter_replace-partition) - Copies the data partition from one table to another.
- [CLEAR COLUMN IN PARTITION](#alter_clear-column-partition) - Resets the value of a specified column in a partition.
- [FREEZE PARTITION](#alter_freeze-partition) Creates a backup of a partition.
- [FETCH PARTITION](#alter_fetch-partition) Downloads a partition from another server.
Each type of query is covered separately below.
A partition in a table is data for a single calendar month. This is determined by the values of the date key specified in the table engine parameters. Each month's data is stored separately in order to simplify manipulations with this data.
A "part" in the table is part of the data from a single partition, sorted by the primary key.
You can use the `system.parts` table to view the set of table parts and partitions:
#### DETACH PARTITION {#alter_detach-partition}
``` sql
SELECT * FROM system.parts WHERE active
ALTER TABLE table_name DETACH PARTITION partition_expr
```
`active` Only count active parts. Inactive parts are, for example, source parts remaining after merging to a larger part these parts are deleted approximately 10 minutes after merging.
Moves all data for the specified partition to the 'detached' directory ([how to specify the partition expression](#alter-how-to-specify-part-expr)). The server forgets about the detached data partition as if it does not exist. The server will not know about this data until you make the [ATTACH](#alter_attach-partition) query.
Another way to view a set of parts and partitions is to go into the directory with table data.
Data directory: `/var/lib/clickhouse/data/database/table/`,where `/var/lib/clickhouse/` is the path to the ClickHouse data, 'database' is the database name, and 'table' is the table name. Example:
```bash
$ ls -l /var/lib/clickhouse/data/test/visits/
total 48
drwxrwxrwx 2 clickhouse clickhouse 20480 May 5 02:58 20140317_20140323_2_2_0
drwxrwxrwx 2 clickhouse clickhouse 20480 May 5 02:58 20140317_20140323_4_4_0
drwxrwxrwx 2 clickhouse clickhouse 4096 May 5 02:55 detached
-rw-rw-rw- 1 clickhouse clickhouse 2 May 5 02:58 increment.txt
```
Here, `20140317_20140323_2_2_0` and ` 20140317_20140323_4_4_0` are the directories of data parts.
Let's break down the name of the first part: `20140317_20140323_2_2_0`.
- `20140317` is the minimum date of the data in the chunk.
- `20140323` is the maximum date of the data in the chunk.
- `2` is the minimum number of the data block.
- `2` is the maximum number of the data block.
- `0` is the chunk level (the depth of the merge tree it is formed from).
Each piece relates to a single partition and contains data for just one month.
`201403` is the name of the partition. A partition is a set of parts for a single month.
On an operating server, you can't manually change the set of parts or their data on the file system, since the server won't know about it.
For non-replicated tables, you can do this when the server is stopped, but we don't recommended it.
For replicated tables, the set of parts can't be changed in any case.
The `detached` directory contains parts that are not used by the server - detached from the table using the `ALTER ... DETACH` query. Parts that are damaged are also moved to this directory, instead of deleting them. You can add, delete, or modify the data in the 'detached' directory at any time the server won't know about this until you make the `ALTER TABLE ... ATTACH` query.
Example:
``` sql
ALTER TABLE [db.]table DETACH PARTITION 'name'
ALTER TABLE visits DETACH PARTITION 201901
```
Move all data for partitions named 'name' to the 'detached' directory and forget about them.
The partition name is specified in YYYYMM format. It can be indicated in single quotes or without them.
After the query is executed, you can do whatever you want with the data in the 'detached' directory — delete it from the file system, or just leave it.
The query is replicated data will be moved to the 'detached' directory and forgotten on all replicas. The query can only be sent to a leader replica. To find out if a replica is a leader, perform SELECT to the 'system.replicas' system table. Alternatively, it is easier to make a query on all replicas, and all except one will throw an exception.
This query is replicated it moves the data to the 'detached' directory on all replicas. Note that you can execute this query only on a leader replica. To find out if a replica is a leader, use the [system.replicas](../operations/system_tables.md#system_tables-replicas) table. Alternatively, it is easier to make a query on all replicas - all the replicas throw an exception, except the leader replica.
#### DROP PARTITION {#alter_drop-partition}
``` sql
ALTER TABLE [db.]table DROP PARTITION 'name'
ALTER TABLE table_name DROP PARTITION partition_expr
```
The same as the `DETACH` operation. Deletes data from the table. Data parts will be tagged as inactive and will be completely deleted in approximately 10 minutes. The query is replicated data will be deleted on all replicas.
Deletes the data of specified partition from the table ([how to specify the partition expression](#alter-how-to-specify-part-expr)). This query tags the partition as inactive and deletes data completely, approximately in 10 minutes.
The query is replicated it deletes data on all replicas.
#### ATTACH PARTITION|PART {#alter_attach-partition}
``` sql
ALTER TABLE [db.]table ATTACH PARTITION|PART 'name'
ALTER TABLE table_name ATTACH PARTITION|PART partition_expr
```
Adds data to the table from the 'detached' directory.
It is possible to add data for an entire partition or a separate part. For a part, specify the full name of the part in single quotes.
The query is replicated. Each replica checks whether there is data in the 'detached' directory. If there is data, it checks the integrity, verifies that it matches the data on the server that initiated the query, and then adds it if everything is correct. If not, it downloads data from the query requestor replica, or from another replica where the data has already been added.
So you can put data in the 'detached' directory on one replica, and use the ALTER ... ATTACH query to add it to the table on all replicas.
Adds data to the table from the 'detached' directory. It is possible to add data for an entire partition or for a separate part. Examples:
``` sql
ALTER TABLE [db.]table FREEZE PARTITION 'name'
ALTER TABLE visits ATTACH PARTITION 201901;
ALTER TABLE visits ATTACH PART 201901_2_2_0;
```
Creates a local backup of one or multiple partitions. The name can be the full name of the partition (for example, 201403), or its prefix (for example, 2014): then the backup will be created for all the corresponding partitions.
Read more about setting the partition expression in a section [How to specify the partition expression](#alter-how-to-specify-part-expr).
The query does the following: for a data snapshot at the time of execution, it creates hardlinks to table data in the directory `/var/lib/clickhouse/shadow/N/...`
This query is replicated. Each replica checks whether there is data in the 'detached' directory. If the data is in this directory, the query checks the integrity, verifies that it matches the data on the server that initiated the query. If everything is correct, the query adds data to the replica. If not, it downloads data from the query requestor replica, or from another replica where the data has already been added.
`/var/lib/clickhouse/` is the working ClickHouse directory from the config.
`N` is the incremental number of the backup.
So you can put data to the 'detached' directory on one replica, and use the `ALTER ... ATTACH` query to add it to the table on all replicas.
The same structure of directories is created inside the backup as inside `/var/lib/clickhouse/`.
It also performs 'chmod' for all files, forbidding writes to them.
The backup is created almost instantly (but first it waits for current queries to the corresponding table to finish running). At first, the backup doesn't take any space on the disk. As the system works, the backup can take disk space, as data is modified. If the backup is made for old enough data, it won't take space on the disk.
After creating the backup, data from `/var/lib/clickhouse/shadow/` can be copied to the remote server and then deleted on the local server.
The entire backup process is performed without stopping the server.
The `ALTER ... FREEZE PARTITION` query is not replicated. A local backup is only created on the local server.
As an alternative, you can manually copy data from the `/var/lib/clickhouse/data/database/table` directory.
But if you do this while the server is running, race conditions are possible when copying directories with files being added or changed, and the backup may be inconsistent. You can do this if the server isn't running then the resulting data will be the same as after the `ALTER TABLE t FREEZE PARTITION` query.
`ALTER TABLE ... FREEZE PARTITION` only copies data, not table metadata. To make a backup of table metadata, copy the file `/var/lib/clickhouse/metadata/database/table.sql`
To restore from a backup:
> - Use the CREATE query to create the table if it doesn't exist. The query can be taken from an .sql file (replace `ATTACH` in it with `CREATE`).
- Copy the data from the data/database/table/ directory inside the backup to the `/var/lib/clickhouse/data/database/table/detached/ directory.`
- Run `ALTER TABLE ... ATTACH PARTITION YYYYMM` queries, where `YYYYMM` is the month, for every month.
In this way, data from the backup will be added to the table.
Restoring from a backup doesn't require stopping the server.
#### REPLACE PARTITION {#alter_replace-partition}
``` sql
ALTER TABLE [db.]table FETCH PARTITION 'name' FROM 'path-in-zookeeper'
ALTER TABLE table2_name REPLACE PARTITION partition_expr FROM table1_name
```
This query only works for replicatable tables.
This query copies the data partition from the `table1` to `table2`. Note that:
It downloads the specified partition from the shard that has its `ZooKeeper path` specified in the `FROM` clause, then puts it in the `detached` directory for the specified table.
- Both tables must have the same structure.
- When creating the `table2`, you must specify the same partition key as for the `table1`.
Although the query is called `ALTER TABLE`, it does not change the table structure, and does not immediately change the data available in the table.
Read about setting the partition expression in a section [How to specify the partition expression](#alter-how-to-specify-part-expr).
Data is placed in the `detached` directory. You can use the `ALTER TABLE ... ATTACH` query to attach the data.
#### CLEAR COLUMN IN PARTITION {#alter_clear-column-partition}
The ` FROM` clause specifies the path in ` ZooKeeper`. For example, `/clickhouse/tables/01-01/visits`.
Before downloading, the system checks that the partition exists and the table structure matches. The most appropriate replica is selected automatically from the healthy replicas.
``` sql
ALTER TABLE table_name CLEAR COLUMN column_name IN PARTITION partition_expr
```
The `ALTER ... FETCH PARTITION` query is not replicated. The partition will be downloaded to the 'detached' directory only on the local server. Note that if after this you use the `ALTER TABLE ... ATTACH` query to add data to the table, the data will be added on all replicas (on one of the replicas it will be added from the 'detached' directory, and on the rest it will be loaded from neighboring replicas).
Resets all values in the specified column in a partition. If the `DEFAULT` clause was determined when creating a table, this query sets the column value to a specified default value.
Example:
``` sql
ALTER TABLE visits CLEAR COLUMN hour in PARTITION 201902
```
#### FREEZE PARTITION {#alter_freeze-partition}
``` sql
ALTER TABLE table_name FREEZE [PARTITION partition_expr]
```
This query creates a local backup of a specified partition. If the `PARTITION` clause is omitted, the query creates the backup of all partitions at once. Note that for old-styled tables you can specify the prefix of the partition name (for example, '2019') - then the query creates the backup for all the corresponding partitions.
At the time of execution, for a data snapshot, the query creates hardlinks to a table data. Hardlinks are placed in the directory `/var/lib/clickhouse/shadow/N/...`, where
- `/var/lib/clickhouse/` is the working ClickHouse directory specified in the config.
- `N` is the incremental number of the backup.
The same structure of directories is created inside the backup as inside `/var/lib/clickhouse/`. It also performs 'chmod' for all files, forbidding writing into them.
The query creates backup almost instantly (but first it waits for the current queries to the corresponding table to finish running). At first, the backup does not take any space on the disk. As the system works, the backup can take disk space, as data is modified. If the backup is made for old enough data, it does not take space on the disk.
After creating the backup, you can copy the data from `/var/lib/clickhouse/shadow/` to the remote server and then delete it from the local server. The entire backup process is performed without stopping the server.
The `ALTER ... FREEZE PARTITION` query is not replicated. It creates a local backup only on the local server.
As an alternative, you can manually copy data from the `/var/lib/clickhouse/data/database/table` directory. But if you do this while the server is running, race conditions are possible when copying directories with files being added or changed, and the backup may be inconsistent. Copy the data when the server is not running then the resulting data will be the same as after the `ALTER TABLE t FREEZE PARTITION` query.
`ALTER TABLE ... FREEZE PARTITION` copies only the data, not table metadata. To make a backup of table metadata, copy the file `/var/lib/clickhouse/metadata/database/table.sql`
To add the data to a table from a backup, do the following:
1. Use the `CREATE` query to create the table if it does not exist. To view the query, use the .sql file (replace `ATTACH` in it with `CREATE`).
2. Copy the data from the `data/database/table/` directory inside the backup to the `/var/lib/clickhouse/data/database/table/detached/` directory.
3. Run `ALTER TABLE ... ATTACH PARTITION` queries to add the data to a table.
Restoring from a backup does not require stopping the server.
#### FETCH PARTITION {#alter_fetch-partition}
``` sql
ALTER TABLE table_name FETCH PARTITION partition_expr FROM 'path-in-zookeeper'
```
Downloads a partition from another server. This query only works for the replicated tables.
The query does the following:
1. Downloads the partition from the specified shard. In 'path-in-zookeeper' you must specify a path to the shard in ZooKeeper.
2. Then the query puts the downloaded data to the 'detached' directory of the specified table. Use the [ATTACH PART|PARTITION](#alter_attach-partition) query to add the data to the table.
For example:
``` sql
ALTER TABLE users FETCH PARTITION 201902 FROM '/clickhouse/tables/01-01/visits';
ALTER TABLE users ATTACH PARTITION 201902;
```
Before downloading, the system checks if the partition exists and the table structure matches. The most appropriate replica is selected automatically from the healthy replicas.
Although the query is called `ALTER TABLE`, it does not change the table structure and does not immediately change the data available in the table.
The `ALTER ... FETCH PARTITION` query is not replicated. It places the partition to the 'detached' directory only on the local server. Note that when you perform the `ALTER TABLE ... ATTACH` query, it adds the data to all replicas. The data is added to one of the replicas from the 'detached' directory, and to the others - from neighboring replicas.
#### How To Set Partition Expression {#alter-how-to-specify-part-expr}
You can specify the partition expression in `ALTER ... PARTITION` queries in different ways:
- As a value from the `partition` column of the `system.parts` table. For example, `ALTER TABLE visits DETACH PARTITION 201901`.
- As the expression from the table column. Constants and constant expressions are supported. For example, `ALTER TABLE visits DETACH PARTITION toYYYYMM(toDate('2019-01-25'))`.
- Using the partition ID. Partition ID is a string identifier of the partition (human-readable, if possible) that is used as the names of partitions in the file system and in ZooKeeper. The partition ID must be specified in the `PARTITION ID` clause, in a single quotes. For example, `ALTER TABLE visits DETACH PARTITION ID '201901'`.
- In the [ALTER ATTACH PART](#alter_attach-partition) query, to specify the name of a part, use a value from the `name` column of the `system.parts` table. For example, `ALTER TABLE visits ATTACH PART 201901_1_1_0`.
Correct usage of quotes in the partition expression depends on the type of the partition key, that was specified when creating a table. For example, for the String type partitions, you have to specify its name in quotes (`'`). For the Date and Int* types no quotes needed.
For old-style tables, you can specify the partition either as a number `201901` or a string `'201901'`. The syntax for the new-style tables is stricter with types (similar to the parser for the VALUES input format).
All the rules above are also true for the [OPTIMIZE](misc.md#misc_operations-optimize) query. If you need to specify the only partition when optimizing a non-partitioned table, set the expression `PARTITION tuple()`. For example:
``` sql
OPTIMIZE TABLE table_not_partitioned PARTITION tuple() FINAL;
```
The examples of `ALTER ... PARTITION` queries are demonstrated in the tests [`00502_custom_partitioning_local`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_local.sql) and [`00502_custom_partitioning_replicated_zookeeper`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_replicated_zookeeper.sql).
### Synchronicity of ALTER Queries

View File

@ -80,7 +80,7 @@ If you add a new column to a table but later change its default expression, the
It is not possible to set default values for elements in nested data structures.
# Column Compression Codecs
## Column Compression Codecs
Besides default data compression, defined in [server settings](../operations/server_settings/settings.md#compression), per-column specification is also available.

View File

@ -137,7 +137,7 @@ A test query (`TEST`) only checks the user's rights and displays a list of queri
[Original article](https://clickhouse.yandex/docs/en/query_language/misc/) <!--hide-->
## OPTIMIZE
## OPTIMIZE {#misc_operations-optimize}
``` sql
OPTIMIZE TABLE [db.]name [ON CLUSTER cluster] [PARTITION partition] [FINAL]