---
slug: /en/engines/table-engines/mergetree-family/mergetree
sidebar_position: 11
sidebar_label:  MergeTree
---

# MergeTree

The `MergeTree` engine and other engines of this family (`*MergeTree`) are the most robust ClickHouse table engines.

Engines in the `MergeTree` family are designed for inserting a very large amount of data into a table. The data is quickly written to the table part by part, then rules are applied for merging the parts in the background. This method is much more efficient than continually rewriting the data in storage during insert.

Main features:

- Stores data sorted by primary key.

    This allows you to create a small sparse index that helps find data faster.

- Partitions can be used if the [partitioning key](/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key.md) is specified.

    ClickHouse supports certain operations with partitions that are more efficient than general operations on the same data with the same result. ClickHouse also automatically skips partitions that cannot match when the query specifies a condition on the partitioning key.

- Data replication support.

    The family of `ReplicatedMergeTree` tables provides data replication. For more information, see [Data replication](/docs/en/engines/table-engines/mergetree-family/replication.md).

- Data sampling support.

    If necessary, you can set the data sampling method in the table.

:::info
The [Merge](/docs/en/engines/table-engines/special/merge.md/#merge) engine does not belong to the `*MergeTree` family.
:::

## Creating a Table {#table_engine-mergetree-creating-a-table}

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS|EPHEMERAL expr1] [TTL expr1] [CODEC(codec1)] [[NOT] NULL|PRIMARY KEY],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS|EPHEMERAL expr2] [TTL expr2] [CODEC(codec2)] [[NOT] NULL|PRIMARY KEY],
    ...
    INDEX index_name1 expr1 TYPE type1(...) [GRANULARITY value1],
    INDEX index_name2 expr2 TYPE type2(...) [GRANULARITY value2],
    ...
    PROJECTION projection_name_1 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]),
    PROJECTION projection_name_2 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]),
    ...
    STATISTIC <COLUMN LIST> TYPE type1,
    STATISTIC <COLUMN LIST> TYPE type2
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr
    [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx' [, ...] ]
    [WHERE conditions]
    [GROUP BY key_expr [SET v1 = aggr_func(v1) [, v2 = aggr_func(v2) ...]] ] ]
[SETTINGS name=value, ...]
```

For a description of parameters, see the [CREATE query description](/docs/en/sql-reference/statements/create/table.md).

### Query Clauses {#mergetree-query-clauses}

#### ENGINE

`ENGINE` — Name and parameters of the engine. `ENGINE = MergeTree()`. The `MergeTree` engine does not have parameters.

#### ORDER_BY

`ORDER BY` — The sorting key.

A tuple of column names or arbitrary expressions. Example: `ORDER BY (CounterID, EventDate)`.

ClickHouse uses the sorting key as a primary key if the primary key is not defined explicitly by the `PRIMARY KEY` clause.

Use the `ORDER BY tuple()` syntax if you do not need sorting. See [Selecting the Primary Key](#selecting-the-primary-key).

#### PARTITION BY

`PARTITION BY` — The [partitioning key](/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key.md). Optional. In most cases you don't need a partitioning key, and if you do need one, it generally should not be more granular than by month. Partitioning does not speed up queries (in contrast to the `ORDER BY` expression). Never use overly granular partitioning. Don't partition your data by client identifiers or names; instead, make the client identifier or name the first column in the `ORDER BY` expression.

For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](/docs/en/sql-reference/data-types/date.md). The partition names here have the `"YYYYMM"` format.
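
As a minimal sketch of monthly partitioning, the following creates a table partitioned by `toYYYYMM(EventDate)`, lists its parts, and drops one month. The table name `visits_by_month`, its columns, and the inserted values are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE visits_by_month
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
)
ENGINE = MergeTree()
ORDER BY (CounterID, EventDate)
PARTITION BY toYYYYMM(EventDate);

-- Rows from two different months go to two different partitions.
INSERT INTO visits_by_month VALUES
    ('2024-01-15', 1, 100),
    ('2024-02-03', 1, 101);

-- List the active parts together with the partition each one belongs to.
SELECT partition, name, rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'visits_by_month'
  AND active;

-- Partition-level operations address a whole month at once.
ALTER TABLE visits_by_month DROP PARTITION 202401;
```

With this partitioning key, the `partition` column is expected to contain values in the `YYYYMM` format, such as `202401` and `202402`.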

#### PRIMARY KEY

`PRIMARY KEY` — The primary key if it [differs from the sorting key](#choosing-a-primary-key-that-differs-from-the-sorting-key). Optional.

By default the primary key is the same as the sorting key (which is specified by the `ORDER BY` clause). Thus in most cases it is unnecessary to specify a separate `PRIMARY KEY` clause.

#### SAMPLE BY

`SAMPLE BY` — An expression for sampling. Optional.

If a sampling expression is used, the primary key must contain it. The result of a sampling expression must be an unsigned integer. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.
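
The following sketch shows a sampling expression that is also part of the sorting key, followed by a query that reads a sample. The table name `sampled_hits` and its columns are assumptions made for this example.

```sql
-- Hypothetical table; the sampling expression is contained in the sorting key.
CREATE TABLE sampled_hits
(
    CounterID UInt32,
    EventDate Date,
    UserID UInt64
)
ENGINE = MergeTree()
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);

-- Read roughly one tenth of the data and scale the count back up.
SELECT count() * 10 AS approximate_hits
FROM sampled_hits
SAMPLE 1/10
WHERE CounterID = 34;
```

The scaled count is an approximation; its accuracy depends on how evenly the hash distributes users.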

#### TTL

`TTL` — A list of rules specifying storage duration of rows and defining logic of automatic parts movement [between disks and volumes](#table_engine-mergetree-multiple-volumes). Optional.

The expression must evaluate to a `Date` or `DateTime` value. Example:

``` sql
TTL date + INTERVAL 1 DAY
```

The rule type `DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'|GROUP BY` specifies the action to perform on a part once the expression is satisfied (its value reaches the current time): removing expired rows, moving the part to the specified disk (`TO DISK 'xxx'`) or volume (`TO VOLUME 'xxx'`) if the expression is satisfied for all rows in the part, or aggregating values in expired rows. The default rule type is removal (`DELETE`). You can specify multiple rules, but no more than one `DELETE` rule.

For more details, see [TTL for columns and tables](#table_engine-mergetree-ttl).
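
A minimal sketch of a table-level `DELETE` rule might look as follows. The table name `events_with_ttl` and its columns are hypothetical. The `TO DISK` and `TO VOLUME` rules are left out here because they depend on disks and volumes configured in the table's storage policy, which this sketch does not assume.

```sql
-- Hypothetical table: rows become eligible for removal one month after `d`.
CREATE TABLE events_with_ttl
(
    d DateTime,
    a Int32
)
ENGINE = MergeTree()
ORDER BY d
PARTITION BY toYYYYMM(d)
TTL d + INTERVAL 1 MONTH DELETE;

-- The table TTL can be changed later.
ALTER TABLE events_with_ttl MODIFY TTL d + INTERVAL 1 DAY;
```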

### SETTINGS

Additional parameters that control the behavior of `MergeTree` (optional):

#### index_granularity

`index_granularity` — Maximum number of data rows between the marks of an index. Default value: 8192. See [Data Storage](#mergetree-data-storage).

#### index_granularity_bytes

`index_granularity_bytes` — Maximum size of data granules in bytes. Default value: 10 MiB (10485760 bytes). To restrict the granule size only by number of rows, set to 0 (not recommended). See [Data Storage](#mergetree-data-storage).

#### min_index_granularity_bytes

`min_index_granularity_bytes` — Minimum allowed size of data granules in bytes. Default value: 1024 bytes. This setting is a safeguard against accidentally creating tables with a very low `index_granularity_bytes`. See [Data Storage](#mergetree-data-storage).

#### enable_mixed_granularity_parts

`enable_mixed_granularity_parts` — Enables or disables transitioning to control the granule size with the `index_granularity_bytes` setting. Before version 19.11, there was only the `index_granularity` setting for restricting granule size. The `index_granularity_bytes` setting improves ClickHouse performance when selecting data from tables with big rows (tens and hundreds of megabytes). If you have tables with big rows, you can enable this setting for the tables to improve the efficiency of `SELECT` queries.

#### use_minimalistic_part_header_in_zookeeper

`use_minimalistic_part_header_in_zookeeper` — Storage method of the data parts headers in ZooKeeper. If `use_minimalistic_part_header_in_zookeeper=1`, then ZooKeeper stores less data. For more information, see the [setting description](/docs/en/operations/server-configuration-parameters/settings.md/#server-settings-use_minimalistic_part_header_in_zookeeper) in “Server configuration parameters”.

#### min_merge_bytes_to_use_direct_io

`min_merge_bytes_to_use_direct_io` — The minimum data volume for merge operation that is required for using direct I/O access to the storage disk. When merging data parts, ClickHouse calculates the total storage volume of all the data to be merged. If the volume exceeds `min_merge_bytes_to_use_direct_io` bytes, ClickHouse reads and writes the data to the storage disk using the direct I/O interface (`O_DIRECT` option). If `min_merge_bytes_to_use_direct_io = 0`, then direct I/O is disabled. Default value: `10 * 1024 * 1024 * 1024` bytes.

#### merge_with_ttl_timeout

`merge_with_ttl_timeout` — Minimum delay in seconds before repeating a merge with delete TTL. Default value: `14400` seconds (4 hours).

#### merge_with_recompression_ttl_timeout

`merge_with_recompression_ttl_timeout` — Minimum delay in seconds before repeating a merge with recompression TTL. Default value: `14400` seconds (4 hours).

#### try_fetch_recompressed_part_timeout

`try_fetch_recompressed_part_timeout` — Timeout (in seconds) before starting a merge with recompression. During this time, ClickHouse tries to fetch the recompressed part from the replica that assigned this merge with recompression. Default value: `7200` seconds (2 hours).

#### write_final_mark

`write_final_mark` — Enables or disables writing the final index mark at the end of data part (after the last byte). Default value: 1. Don’t turn it off.

#### merge_max_block_size

`merge_max_block_size` — Maximum number of rows in block for merge operations. Default value: 8192.

#### storage_policy

`storage_policy` — Storage policy. See [Using Multiple Block Devices for Data Storage](#table_engine-mergetree-multiple-volumes).

#### min_bytes_for_wide_part

`min_bytes_for_wide_part`, `min_rows_for_wide_part` — Minimum number of bytes/rows in a data part that can be stored in `Wide` format. You can set one, both or none of these settings. See [Data Storage](#mergetree-data-storage).

#### max_parts_in_total

`max_parts_in_total` — Maximum number of parts in all partitions.

#### max_compress_block_size

`max_compress_block_size` — Maximum size of blocks of uncompressed data before compressing for writing to a table. You can also specify this setting in the global settings (see [max_compress_block_size](/docs/en/operations/settings/settings.md/#max-compress-block-size) setting). The value specified when the table is created overrides the global value for this setting.

#### min_compress_block_size

`min_compress_block_size` — Minimum size of blocks of uncompressed data required for compression when writing the next mark. You can also specify this setting in the global settings (see [min_compress_block_size](/docs/en/operations/settings/settings.md/#min-compress-block-size) setting). The value specified when the table is created overrides the global value for this setting.

#### max_partitions_to_read

`max_partitions_to_read` — Limits the maximum number of partitions that can be accessed in one query. You can also specify the [max_partitions_to_read](/docs/en/operations/settings/merge-tree-settings.md/#max-partitions-to-read) setting in the global settings.

**Example of Setting the Sections**

``` sql
ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192
```

In the example, we set partitioning by month.

We also set an expression for sampling as a hash by the user ID. This allows you to pseudorandomize the data in the table for each `CounterID` and `EventDate`. If you define a [SAMPLE](/docs/en/sql-reference/statements/select/sample.md/#select-sample-clause) clause when selecting the data, ClickHouse will return an evenly pseudorandom data sample for a subset of users.

The `index_granularity` setting can be omitted because 8192 is the default value.
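
To see which values are in effect, one option is to query the `system.merge_tree_settings` table and to override a setting for a single table. This is a sketch: the table name `settings_demo`, its columns, and the chosen value `4096` are hypothetical.

```sql
-- Show the server-wide values for a few of the settings described above.
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN ('index_granularity', 'index_granularity_bytes', 'min_bytes_for_wide_part');

-- Hypothetical table that overrides one setting at creation time.
CREATE TABLE settings_demo
(
    id UInt64,
    payload String
)
ENGINE = MergeTree()
ORDER BY id
SETTINGS index_granularity = 4096;

-- Table-level settings are part of the stored CREATE statement.
SHOW CREATE TABLE settings_demo;
```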

<details markdown="1">

<summary>Deprecated Method for Creating a Table</summary>

:::note
Do not use this method in new projects. If possible, switch old projects to the method described above.
:::

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE [=] MergeTree(date-column [, sampling_expression], (primary, key), index_granularity)
```

**MergeTree() Parameters**

- `date-column` — The name of a column of the [Date](/docs/en/sql-reference/data-types/date.md) type. ClickHouse automatically creates partitions by month based on this column. The partition names are in the `"YYYYMM"` format.
- `sampling_expression` — An expression for sampling.
- `(primary, key)` — Primary key. Type: [Tuple()](/docs/en/sql-reference/data-types/tuple.md)
- `index_granularity` — The granularity of an index. The number of data rows between the “marks” of an index. The value 8192 is appropriate for most tasks.

**Example**

``` sql
MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)
```

The `MergeTree` engine is configured in the same way as in the example above for the main engine configuration method.
</details>

## Data Storage {#mergetree-data-storage}

A table consists of data parts sorted by primary key.

When data is inserted in a table, separate data parts are created and each of them is lexicographically sorted by primary key. For example, if the primary key is `(CounterID, Date)`, the data in the part is sorted by `CounterID`, and within each `CounterID`, it is ordered by `Date`.

Data belonging to different partitions are separated into different parts. In the background, ClickHouse merges data parts for more efficient storage. Parts belonging to different partitions are not merged. The merge mechanism does not guarantee that all rows with the same primary key will be in the same data part.

Data parts can be stored in `Wide` or `Compact` format. In `Wide` format, each column is stored in a separate file in the filesystem. In `Compact` format, all columns are stored in one file. `Compact` format can be used to increase the performance of small and frequent inserts.

The data storage format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the table engine. If the number of bytes or rows in a data part is less than the corresponding setting's value, the part is stored in `Compact` format. Otherwise it is stored in `Wide` format. If none of these settings is set, data parts are stored in `Wide` format.
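
The rule above can be observed with a small experiment. In this sketch, the table name `part_format_demo`, its columns, and the threshold values are assumptions made for illustration.

```sql
-- Hypothetical table: parts smaller than 10 MiB should be stored as Compact.
CREATE TABLE part_format_demo
(
    id UInt64,
    value String
)
ENGINE = MergeTree()
ORDER BY id
SETTINGS min_bytes_for_wide_part = 10485760, min_rows_for_wide_part = 0;

-- Insert a small amount of data, well below the byte threshold.
INSERT INTO part_format_demo
SELECT number, toString(number) FROM numbers(1000);

-- Inspect the format of the resulting part.
SELECT name, part_type, rows, bytes_on_disk
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'part_format_demo'
  AND active;
```

For a part this small, `part_type` is expected to report `Compact`. Raising the amount of inserted data above the threshold should produce `Wide` parts instead.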

Each data part is logically divided into granules. A granule is the smallest indivisible data set that ClickHouse reads when selecting data. ClickHouse does not split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, ClickHouse creates an index file that stores the marks. For each column, whether it’s in the primary key or not, ClickHouse also stores the same marks. These marks let you find data directly in column files.

The granule size is restricted by the `index_granularity` and `index_granularity_bytes` settings of the table engine. The number of rows in a granule lies in the `[1, index_granularity]` range, depending on the size of the rows. The size of a granule can exceed `index_granularity_bytes` if the size of a single row is greater than the value of the setting. In this case, the size of the granule equals the size of the row.

## Primary Keys and Indexes in Queries {#primary-keys-and-indexes-in-queries}

Take the `(CounterID, Date)` primary key as an example. In this case, the sorting and index can be illustrated as follows:

      Whole data:     [---------------------------------------------]
      CounterID:      [aaaaaaaaaaaaaaaaaabbbbcdeeeeeeeeeeeeefgggggggghhhhhhhhhiiiiiiiiikllllllll]
      Date:           [1111111222222233331233211111222222333211111112122222223111112223311122333]
      Marks:           |      |      |      |      |      |      |      |      |      |      |
                      a,1    a,2    a,3    b,3    e,2    e,3    g,1    h,2    i,1    i,3    l,3
      Marks numbers:   0      1      2      3      4      5      6      7      8      9      10

If the data query specifies:

- `CounterID IN ('a', 'h')`, the server reads the data in the ranges of marks `[0, 3)` and `[6, 8)`.
- `CounterID IN ('a', 'h') AND Date = 3`, the server reads the data in the ranges of marks `[1, 3)` and `[7, 8)`.
- `Date = 3`, the server reads the data in the range of marks `[1, 10]`.

The examples above show that it is always more effective to use an index than a full scan.

Because the index is sparse, some extra data may be read. When reading a single range of the primary key, up to `index_granularity * 2` extra rows in each data block can be read.

Sparse indexes allow you to work with a very large number of table rows, because in most cases, such indexes fit in the computer’s RAM.

ClickHouse does not require a unique primary key. You can insert multiple rows with the same primary key.

You can use `Nullable`-typed expressions in the `PRIMARY KEY` and `ORDER BY` clauses but it is strongly discouraged. To allow this feature, turn on the [allow_nullable_key](/docs/en/operations/settings/settings.md/#allow-nullable-key) setting. The [NULLS_LAST](/docs/en/sql-reference/statements/select/order-by.md/#sorting-of-special-values) principle applies for `NULL` values in the `ORDER BY` clause.
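
For completeness, a sketch of enabling a `Nullable` sorting key is shown below, even though this is discouraged. The table name `nullable_key_demo` and its columns are hypothetical.

```sql
-- Hypothetical table; without allow_nullable_key = 1 this definition is rejected.
CREATE TABLE nullable_key_demo
(
    id Nullable(UInt64),
    value String
)
ENGINE = MergeTree()
ORDER BY id
SETTINGS allow_nullable_key = 1;

INSERT INTO nullable_key_demo VALUES (2, 'b'), (NULL, 'n'), (1, 'a');

-- With the NULLS_LAST principle, the row with NULL is expected to come last.
SELECT id, value FROM nullable_key_demo ORDER BY id;
```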

### Selecting the Primary Key {#selecting-the-primary-key}

The number of columns in the primary key is not explicitly limited. Depending on the data structure, you can include more or fewer columns in the primary key. This may:

- Improve the performance of an index.

    If the primary key is `(a, b)`, then adding another column `c` will improve the performance if the following conditions are met:

    - There are queries with a condition on column `c`.
    - Long data ranges (several times longer than the `index_granularity`) with identical values for `(a, b)` are common. In other words, when adding another column allows you to skip quite long data ranges.

- Improve data compression.

    ClickHouse sorts data by primary key, so the higher the consistency, the better the compression.

- Provide additional logic when merging data parts in the [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree.md/#table_engine-collapsingmergetree) and [SummingMergeTree](/docs/en/engines/table-engines/mergetree-family/summingmergetree.md) engines.

    In this case it makes sense to specify the *sorting key* that is different from the primary key.

A long primary key will negatively affect the insert performance and memory consumption, but extra columns in the primary key do not affect ClickHouse performance during `SELECT` queries.

You can create a table without a primary key using the `ORDER BY tuple()` syntax. In this case, ClickHouse stores data in the order of inserting. If you want to save data order when inserting data by `INSERT ... SELECT` queries, set [max_insert_threads = 1](/docs/en/operations/settings/settings.md/#settings-max-insert-threads).

To select data in the initial order, use [single-threaded](/docs/en/operations/settings/settings.md/#settings-max_threads) `SELECT` queries.
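
The two recommendations above can be combined as in the following sketch. The table name `insertion_order_demo` and the generated rows are assumptions made for this example.

```sql
-- Hypothetical table without a primary key.
CREATE TABLE insertion_order_demo
(
    message String
)
ENGINE = MergeTree()
ORDER BY tuple();

-- Keep INSERT ... SELECT single-threaded so that the source order is preserved.
SET max_insert_threads = 1;

INSERT INTO insertion_order_demo
SELECT concat('line ', toString(number)) FROM numbers(5);

-- Read the rows back with a single thread.
SELECT message FROM insertion_order_demo SETTINGS max_threads = 1;
```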

### Choosing a Primary Key that Differs from the Sorting Key {#choosing-a-primary-key-that-differs-from-the-sorting-key}

It is possible to specify a primary key (an expression with values that are written in the index file for each mark) that is different from the sorting key (an expression for sorting the rows in data parts). In this case the primary key expression tuple must be a prefix of the sorting key expression tuple.

This feature is helpful when using the [SummingMergeTree](/docs/en/engines/table-engines/mergetree-family/summingmergetree.md) and
[AggregatingMergeTree](/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree.md) table engines. In a common case when using these engines, the table has two types of columns: *dimensions* and *measures*. Typical queries aggregate values of measure columns with arbitrary `GROUP BY` and filtering by dimensions. Because SummingMergeTree and AggregatingMergeTree aggregate rows with the same value of the sorting key, it is natural to add all dimensions to it. As a result, the key expression consists of a long list of columns and this list must be frequently updated with newly added dimensions.

In this case it makes sense to leave only a few columns in the primary key that will provide efficient range scans and add the remaining dimension columns to the sorting key tuple.

[ALTER](/docs/en/sql-reference/statements/alter/index.md) of the sorting key is a lightweight operation because when a new column is simultaneously added to the table and to the sorting key, existing data parts do not need to be changed. Since the old sorting key is a prefix of the new sorting key and there is no data in the newly added column, the data is sorted by both the old and new sorting keys at the moment of table modification.
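
The following sketch illustrates a short primary key with a longer sorting key, and then extends the sorting key with a new dimension. The table name `summing_demo` and all column names are hypothetical.

```sql
-- Hypothetical table: the primary key (site_id, day) is a prefix of the sorting key.
CREATE TABLE summing_demo
(
    site_id UInt32,
    day Date,
    browser String,
    hits UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (site_id, day, browser)
PRIMARY KEY (site_id, day);

-- Add a new dimension and append it to the sorting key in the same ALTER.
ALTER TABLE summing_demo
    ADD COLUMN country String,
    MODIFY ORDER BY (site_id, day, browser, country);
```

The primary key stays unchanged, so the index still contains only `site_id` and `day`.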

### Use of Indexes and Partitions in Queries {#use-of-indexes-and-partitions-in-queries}

For `SELECT` queries, ClickHouse analyzes whether an index can be used. An index can be used if the `WHERE/PREWHERE` clause contains, as one of its conjunction elements or as the whole clause, an equality or inequality comparison, or an `IN` or `LIKE` with a fixed prefix, applied to columns or expressions that are in the primary key or partitioning key, to certain partially repetitive functions of these columns, or to logical combinations of such expressions.

Thus, it is possible to quickly run queries on one or many ranges of the primary key. In the example below, queries will be fast when run for a specific tracking tag, for a specific tag and date range, for a specific tag and date, for multiple tags with a date range, and so on.

Let’s look at the engine configured as follows:
```sql
ENGINE MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate)
SETTINGS index_granularity=8192
```

In this case, in queries:

``` sql
SELECT count() FROM table
WHERE EventDate = toDate(now())
AND CounterID = 34

SELECT count() FROM table
WHERE EventDate = toDate(now())
AND (CounterID = 34 OR CounterID = 42)

SELECT count() FROM table
WHERE ((EventDate >= toDate('2014-01-01')
AND EventDate <= toDate('2014-01-31')) OR EventDate = toDate('2014-05-01'))
AND CounterID IN (101500, 731962, 160656)
AND (CounterID = 101500 OR EventDate != toDate('2014-05-01'))
```

ClickHouse will use the primary key index to trim improper data and the monthly partitioning key to trim partitions that are in improper date ranges.

The queries above show that the index is used even for complex expressions. Reading from the table is organized so that using the index can’t be slower than a full scan.

In the example below, the index can’t be used.

``` sql
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```

To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](/docs/en/operations/settings/settings.md/#settings-force_index_by_date) and [force_primary_key](/docs/en/operations/settings/settings.md/#force-primary-key).
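
A minimal sketch of such a check is shown below. The table name `index_check_demo`, its columns, and the inserted rows are hypothetical. The `EXPLAIN indexes = 1` statement is an additional way to inspect index usage and is not mentioned in the text above.

```sql
-- Hypothetical table mirroring the engine configuration used in this section.
CREATE TABLE index_check_demo
(
    CounterID UInt32,
    EventDate Date,
    URL String
)
ENGINE = MergeTree()
ORDER BY (CounterID, EventDate)
PARTITION BY toYYYYMM(EventDate);

INSERT INTO index_check_demo VALUES
    (34, '2014-05-01', 'https://example.com/a'),
    (42, '2014-05-02', 'https://example.com/b');

-- The condition restricts a prefix of the primary key, so the query should run.
SELECT count() FROM index_check_demo
WHERE CounterID = 34
SETTINGS force_primary_key = 1;

-- The condition cannot use the primary key, so an exception is expected.
SELECT count() FROM index_check_demo
WHERE URL LIKE '%example%'
SETTINGS force_primary_key = 1;

-- Show which indexes are applied to a query.
EXPLAIN indexes = 1
SELECT count() FROM index_check_demo
WHERE CounterID = 34 AND EventDate = toDate('2014-05-01');
```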

The key for partitioning by month allows reading only those data blocks which contain dates from the proper range. In this case, the data block may contain data for many dates (up to an entire month). Within a block, data is sorted by primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.

### Use of Index for Partially-monotonic Primary Keys {#use-of-index-for-partially-monotonic-primary-keys}

Consider, for example, the days of the month. They form a [monotonic sequence](https://en.wikipedia.org/wiki/Monotonic_function) for one month, but are not monotonic over longer periods. This is a partially-monotonic sequence. If a user creates a table with a partially-monotonic primary key, ClickHouse creates a sparse index as usual. When a user selects data from this kind of table, ClickHouse analyzes the query conditions. If the user wants to get data between two marks of the index and both marks fall within one month, ClickHouse can use the index in this particular case because it can calculate the distance between the parameters of the query and the index marks.

ClickHouse cannot use an index if the values of the primary key in the query parameter range do not represent a monotonic sequence. In this case, ClickHouse uses the full scan method.

ClickHouse uses this logic not only for days of the month sequences, but for any primary key that represents a partially-monotonic sequence.

### Data Skipping Indexes {#table_engine-mergetree-data_skipping-indexes}

The index declaration is in the columns section of the `CREATE` query.

``` sql
INDEX index_name expr TYPE type(...) [GRANULARITY granularity_value]
```

For tables from the `*MergeTree` family, data skipping indices can be specified.

These indices aggregate some information about the specified expression on blocks, which consist of `granularity_value` granules (the size of the granule is specified using the `index_granularity` setting in the table engine). These aggregates are then used in `SELECT` queries to reduce the amount of data read from disk by skipping big blocks of data where the `WHERE` condition cannot be satisfied.

The `GRANULARITY` clause can be omitted; the default value of `granularity_value` is 1.

**Example**

``` sql
CREATE TABLE table_name
(
    u64 UInt64,
    i32 Int32,
    s String,
    ...
    INDEX idx1 u64 TYPE bloom_filter GRANULARITY 3,
    INDEX idx2 u64 * i32 TYPE minmax GRANULARITY 3,
    INDEX idx3 u64 * length(s) TYPE set(1000) GRANULARITY 4
) ENGINE = MergeTree()
...
```

Indices from the example can be used by ClickHouse to reduce the amount of data to read from disk in the following queries:

``` sql
SELECT count() FROM table_name WHERE u64 == 10;
SELECT count() FROM table_name WHERE u64 * i32 >= 1234;
SELECT count() FROM table_name WHERE u64 * length(s) == 1234;
```
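
A self-contained variant of the example above, without the elided parts, might look like this. The table name `skip_index_demo`, the `ORDER BY tuple()` choice, and the generated data are assumptions made only so that the sketch can be run as is.

```sql
-- Hypothetical table with the three skipping indexes from the example above.
CREATE TABLE skip_index_demo
(
    u64 UInt64,
    i32 Int32,
    s String,
    INDEX idx1 u64 TYPE bloom_filter GRANULARITY 3,
    INDEX idx2 u64 * i32 TYPE minmax GRANULARITY 3,
    INDEX idx3 u64 * length(s) TYPE set(1000) GRANULARITY 4
)
ENGINE = MergeTree()
ORDER BY tuple();

-- Generate some rows so that the part contains several granules.
INSERT INTO skip_index_demo
SELECT number, toInt32(number % 100), toString(number)
FROM numbers(100000);

-- Inspect which indexes are considered for a query.
EXPLAIN indexes = 1
SELECT count() FROM skip_index_demo WHERE u64 == 10;
```

How many granules an index actually skips depends on the data distribution, so the plan output will vary.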

Data skipping indexes can also be created on composite columns:

```sql
-- on columns of type Map:
INDEX map_key_index mapKeys(map_column) TYPE bloom_filter
INDEX map_value_index mapValues(map_column) TYPE bloom_filter

-- on columns of type Tuple:
INDEX tuple_1_index tuple_column.1 TYPE bloom_filter
INDEX tuple_2_index tuple_column.2 TYPE bloom_filter

-- on columns of type Nested:
INDEX nested_1_index col.nested_col1 TYPE bloom_filter
INDEX nested_2_index col.nested_col2 TYPE bloom_filter
```

### Available Types of Indices {#available-types-of-indices}

#### MinMax

Stores the extremes of the specified expression (if the expression is a `tuple`, it stores the extremes for each element of the `tuple`) and uses the stored information to skip blocks of data, like the primary key.

Syntax: `minmax`

#### Set

Stores unique values of the specified expression (no more than `max_rows` rows, `max_rows=0` means “no limits”). Uses the values to check if the `WHERE` expression is not satisfiable on a block of data.

Syntax: `set(max_rows)`

#### Bloom Filter

Stores a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) for the specified columns. An optional `false_positive` parameter with possible values between 0 and 1 specifies the probability of receiving a false positive response from the filter. Default value: 0.025. Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `String`, `FixedString`, `Array`, `LowCardinality`, `Nullable`, `UUID` and `Map`. For the `Map` data type, the client can specify if the index should be created for keys or values using [mapKeys](/docs/en/sql-reference/functions/tuple-map-functions.md/#mapkeys) or [mapValues](/docs/en/sql-reference/functions/tuple-map-functions.md/#mapvalues) function.

Syntax: `bloom_filter([false_positive])`

#### N-gram Bloom Filter

Stores a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) that contains all n-grams from a block of data. Only works with datatypes: [String](/docs/en/sql-reference/data-types/string.md), [FixedString](/docs/en/sql-reference/data-types/fixedstring.md) and [Map](/docs/en/sql-reference/data-types/map.md). Can be used for optimization of `EQUALS`, `LIKE` and `IN` expressions.

Syntax: `ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`

- `n` — ngram size.
- `size_of_bloom_filter_in_bytes` — Bloom filter size in bytes (you can use large values here, for example, 256 or 512, because it can be compressed well).
- `number_of_hash_functions` — The number of hash functions used in the Bloom filter.
- `random_seed` — The seed for Bloom filter hash functions.

Users can create [UDFs](/docs/en/sql-reference/statements/create/function.md) to estimate the parameters of `ngrambf_v1`. The query statements are as follows:

```sql
CREATE FUNCTION bfEstimateFunctions [ON CLUSTER cluster]
AS
(total_number_of_all_grams, size_of_bloom_filter_in_bits) -> round((size_of_bloom_filter_in_bits / total_number_of_all_grams) * log(2));

CREATE FUNCTION bfEstimateBmSize [ON CLUSTER cluster]
AS
(total_number_of_all_grams, probability_of_false_positives) -> ceil((total_number_of_all_grams * log(probability_of_false_positives)) / log(1 / pow(2, log(2))));

CREATE FUNCTION bfEstimateFalsePositive [ON CLUSTER cluster]
AS
(total_number_of_all_grams, number_of_hash_functions, size_of_bloom_filter_in_bytes) -> pow(1 - exp(-number_of_hash_functions/ (size_of_bloom_filter_in_bytes / total_number_of_all_grams)), number_of_hash_functions);

CREATE FUNCTION bfEstimateGramNumber [ON CLUSTER cluster]
AS
(number_of_hash_functions, probability_of_false_positives, size_of_bloom_filter_in_bytes) -> ceil(size_of_bloom_filter_in_bytes / (-number_of_hash_functions / log(1 - exp(log(probability_of_false_positives) / number_of_hash_functions))));
```
To use these functions, you need to specify at least two parameters.
For example, if there are 4300 ngrams in the granule and the false positive probability should be less than 0.0001, the other parameters can be estimated by executing the following queries:


```sql
--- estimate the filter size in bytes (estimated number of bits / 8)
SELECT bfEstimateBmSize(4300, 0.0001) / 8 as size_of_bloom_filter_in_bytes;

┌─size_of_bloom_filter_in_bytes─┐
│                         10304 │
└───────────────────────────────┘

--- estimate number of hash functions
SELECT bfEstimateFunctions(4300, bfEstimateBmSize(4300, 0.0001)) as number_of_hash_functions;

┌─number_of_hash_functions─┐
│                       13 │
└──────────────────────────┘

```
You can also use these functions to estimate parameters from other known values.
The formulas follow the Bloom filter calculator [here](https://hur.st/bloomfilter).


#### Token Bloom Filter

The same as `ngrambf_v1`, but stores tokens instead of ngrams. Tokens are sequences separated by non-alphanumeric characters.

Syntax: `tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`
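
As a sketch of how the two text-oriented Bloom filter indexes are declared, the following table defines one `ngrambf_v1` and one `tokenbf_v1` index on the same column. The table name `text_index_demo`, its columns, and all parameter values are hypothetical and not tuned recommendations.

```sql
-- Hypothetical table; the index parameters are arbitrary illustration values.
CREATE TABLE text_index_demo
(
    id UInt64,
    message String,
    -- ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)
    INDEX message_ngrams message TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 1,
    -- tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)
    INDEX message_tokens message TYPE tokenbf_v1(512, 3, 0) GRANULARITY 1
)
ENGINE = MergeTree()
ORDER BY id;

-- Substring search, a candidate for the n-gram index.
SELECT count() FROM text_index_demo WHERE message LIKE '%timeout%';

-- Equality comparison, a candidate for both indexes.
SELECT count() FROM text_index_demo WHERE message = 'connection timeout';
```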

#### Special-purpose

- Experimental indexes to support approximate nearest neighbor (ANN) search. See [here](annindexes.md) for details.
- An experimental inverted index to support full-text search. See [here](invertedindexes.md) for details.

### Functions Support {#functions-support}

Conditions in the `WHERE` clause contain calls of functions that operate on columns. If the column is a part of an index, ClickHouse tries to use this index when performing the functions. ClickHouse supports different subsets of functions for using indexes.

Indexes of type `set` can be utilized by all functions. The other index types are supported as follows:

| Function (operator) / Index                                                                                | primary key | minmax | ngrambf_v1 | tokenbf_v1 | bloom_filter | inverted |
|------------------------------------------------------------------------------------------------------------|-------------|--------|------------|------------|--------------|----------|
| [equals (=, ==)](/docs/en/sql-reference/functions/comparison-functions.md/#function-equals)                | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
| [notEquals (!=, &lt;&gt;)](/docs/en/sql-reference/functions/comparison-functions.md/#function-notequals)   | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
| [like](/docs/en/sql-reference/functions/string-search-functions.md/#function-like)                         | ✔           | ✔      | ✔          | ✔          | ✗            | ✔        |
| [notLike](/docs/en/sql-reference/functions/string-search-functions.md/#function-notlike)                   | ✔           | ✔      | ✔          | ✔          | ✗            | ✔        |
| [startsWith](/docs/en/sql-reference/functions/string-functions.md/#startswith)                             | ✔           | ✔      | ✔          | ✔          | ✗            | ✔        |
| [endsWith](/docs/en/sql-reference/functions/string-functions.md/#endswith)                                 | ✗           | ✗      | ✔          | ✔          | ✗            | ✔        |
| [multiSearchAny](/docs/en/sql-reference/functions/string-search-functions.md/#function-multisearchany)     | ✗           | ✗      | ✔          | ✗          | ✗            | ✔        |
| [in](/docs/en/sql-reference/functions/in-functions#in-functions)                                           | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
| [notIn](/docs/en/sql-reference/functions/in-functions#in-functions)                                        | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
| [less (<)](/docs/en/sql-reference/functions/comparison-functions.md/#function-less)                        | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
| [greater (>)](/docs/en/sql-reference/functions/comparison-functions.md/#function-greater)                  | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
| [lessOrEquals (<=)](/docs/en/sql-reference/functions/comparison-functions.md/#function-lessorequals)       | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
| [greaterOrEquals (>=)](/docs/en/sql-reference/functions/comparison-functions.md/#function-greaterorequals) | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
| [empty](/docs/en/sql-reference/functions/array-functions#function-empty)                                   | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
| [notEmpty](/docs/en/sql-reference/functions/array-functions#function-notempty)                             | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
| [has](/docs/en/sql-reference/functions/array-functions#function-has)                                       | ✗           | ✗      | ✔          | ✔          | ✔            | ✔        |
| [hasAny](/docs/en/sql-reference/functions/array-functions#function-hasAny)                                 | ✗           | ✗      | ✗          | ✗          | ✔            | ✗        |
| [hasAll](/docs/en/sql-reference/functions/array-functions#function-hasAll)                                 | ✗           | ✗      | ✗          | ✗          | ✔            | ✗        |
| hasToken                                                                                                   | ✗           | ✗      | ✗          | ✔          | ✗            | ✔        |
| hasTokenOrNull                                                                                             | ✗           | ✗      | ✗          | ✔          | ✗            | ✔        |
| hasTokenCaseInsensitive (*)                                                                                | ✗           | ✗      | ✗          | ✔          | ✗            | ✗        |
| hasTokenCaseInsensitiveOrNull (*)                                                                          | ✗           | ✗      | ✗          | ✔          | ✗            | ✗        |

Functions with a constant argument that is shorter than the ngram size can’t be used by `ngrambf_v1` for query optimization.

(*) For `hasTokenCaseInsensitive` and `hasTokenCaseInsensitiveOrNull` to be effective, the `tokenbf_v1` index must be created on lowercased data, for example `INDEX idx (lower(str_col)) TYPE tokenbf_v1(512, 3, 0)`.

:::note
Bloom filters can have false positive matches, so the `ngrambf_v1`, `tokenbf_v1`, and `bloom_filter` indexes cannot be used for optimizing queries where the result of a function is expected to be false.

For example:

- Can be optimized:
    - `s LIKE '%test%'`
    - `NOT s NOT LIKE '%test%'`
    - `s = 1`
    - `NOT s != 1`
    - `startsWith(s, 'test')`
- Cannot be optimized:
    - `NOT s LIKE '%test%'`
    - `s NOT LIKE '%test%'`
    - `NOT s = 1`
    - `s != 1`
    - `NOT startsWith(s, 'test')`
:::
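
One way to check whether a data skipping index is applied to a particular condition is `EXPLAIN indexes = 1`. The sketch below assumes a hypothetical table `bf_example` with an `ngrambf_v1` index; the names and index parameters are illustrative, and the exact `EXPLAIN` output depends on the server version.

```sql
-- Hypothetical table with an ngram Bloom filter index on column s.
CREATE TABLE bf_example
(
    id UInt64,
    s String,
    -- ngram size 3, 8192-byte filter, 3 hash functions, seed 0
    INDEX s_ngrams s TYPE ngrambf_v1(3, 8192, 3, 0) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;

-- Positive condition: according to the note above, the index can be used.
EXPLAIN indexes = 1
SELECT count() FROM bf_example WHERE s LIKE '%test%';

-- Negated condition: according to the note above, the Bloom filter
-- cannot help here, so granules are not expected to be skipped.
EXPLAIN indexes = 1
SELECT count() FROM bf_example WHERE s NOT LIKE '%test%';
```

Comparing the `Indexes` sections of the two plans shows whether the skip index took part in granule selection.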


## Projections {#projections}
Projections are like [materialized views](/docs/en/sql-reference/statements/create/view.md/#materialized) but are defined at the part level. They provide consistency guarantees along with automatic usage in queries.

:::note
When you are implementing projections you should also consider the [force_optimize_projection](/docs/en/operations/settings/settings.md/#force-optimize-projection) setting.
:::

Projections are not supported in the `SELECT` statements with the [FINAL](/docs/en/sql-reference/statements/select/from.md/#select-from-final) modifier.

### Projection Query {#projection-query}
A projection query is what defines a projection. It implicitly selects data from the parent table.

**Syntax**

```sql
SELECT <column list expr> [GROUP BY <group keys expr>] [ORDER BY <expr>]
```

Projections can be modified or dropped with the [ALTER](/docs/en/sql-reference/statements/alter/projection.md) statement.
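
The following sketch shows how a projection query might appear in practice, both in `CREATE TABLE` and in `ALTER TABLE`. The table `visits_example`, its columns, and the projection names are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table with an aggregating projection defined at creation time.
CREATE TABLE visits_example
(
    user_id UInt64,
    url String,
    duration UInt32,
    PROJECTION duration_by_user
    (
        SELECT user_id, sum(duration) GROUP BY user_id
    )
)
ENGINE = MergeTree
ORDER BY url;

-- Add a second projection that stores the same rows in a different order.
ALTER TABLE visits_example
    ADD PROJECTION sorted_by_user (SELECT * ORDER BY user_id);

-- Build the new projection for parts that already exist.
ALTER TABLE visits_example
    MATERIALIZE PROJECTION sorted_by_user;

-- A query of this shape is a candidate for the aggregating projection.
SELECT user_id, sum(duration)
FROM visits_example
GROUP BY user_id;
```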

### Projection Storage {#projection-storage}
Projections are stored inside the part directory. It's similar to an index but contains a subdirectory that stores an anonymous `MergeTree` table's part. The table is induced by the definition query of the projection. If there is a `GROUP BY` clause, the underlying storage engine becomes [AggregatingMergeTree](aggregatingmergetree.md), and all aggregate functions are converted to `AggregateFunction`. If there is an `ORDER BY` clause, the `MergeTree` table uses it as its primary key expression. During the merge process the projection part is merged via its storage's merge routine. The checksum of the parent table's part is combined with the projection's part. Other maintenance jobs are similar to skip indices.

### Query Analysis {#projection-query-analysis}
1. Check if the projection can be used to answer the given query, that is, it generates the same answer as querying the base table.
2. Select the best feasible match, which contains the fewest granules to read.
3. The query pipeline which uses projections will be different from the one that uses the original parts. If the projection is absent in some parts, we can add the pipeline to "project" it on the fly.

## Concurrent Data Access {#concurrent-data-access}

For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.

Reading from a table is automatically parallelized.

## TTL for Columns and Tables {#table_engine-mergetree-ttl}

Determines the lifetime of values.

The `TTL` clause can be set for the whole table and for each individual column. Table-level `TTL` can also specify the logic for automatically moving data between disks and volumes, or for recompressing parts in which all the data has expired.

Expressions must evaluate to [Date](/docs/en/sql-reference/data-types/date.md) or [DateTime](/docs/en/sql-reference/data-types/datetime.md) data type.

**Syntax**

Setting time-to-live for a column:

``` sql
TTL time_column
TTL time_column + interval
```

To define `interval`, use [time interval](/docs/en/sql-reference/operators/index.md#operators-datetime) operators, for example:

``` sql
TTL date_time + INTERVAL 1 MONTH
TTL date_time + INTERVAL 15 HOUR
```

### Column TTL {#mergetree-column-ttl}

When the values in the column expire, ClickHouse replaces them with the default values for the column data type. If all the column values in the data part expire, ClickHouse deletes this column from the data part in the filesystem.

The `TTL` clause can’t be used for key columns.

**Examples**

#### Creating a table with `TTL`:

``` sql
CREATE TABLE example_table
(
    d DateTime,
    a Int TTL d + INTERVAL 1 MONTH,
    b Int TTL d + INTERVAL 1 MONTH,
    c String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d;
```

#### Adding TTL to a column of an existing table

``` sql
ALTER TABLE example_table
    MODIFY COLUMN
    c String TTL d + INTERVAL 1 DAY;
```

#### Altering TTL of the column

``` sql
ALTER TABLE example_table
    MODIFY COLUMN
    c String TTL d + INTERVAL 1 MONTH;
```

### Table TTL {#mergetree-table-ttl}

A table can have one expression for removal of expired rows and multiple expressions for automatically moving parts between [disks or volumes](#table_engine-mergetree-multiple-volumes). When rows in the table expire, ClickHouse deletes all corresponding rows. For moving or recompressing parts, all rows of a part must satisfy the `TTL` expression criteria.

``` sql
TTL expr
    [DELETE|RECOMPRESS codec_name1|TO DISK 'xxx'|TO VOLUME 'xxx'][, DELETE|RECOMPRESS codec_name2|TO DISK 'aaa'|TO VOLUME 'bbb'] ...
    [WHERE conditions]
    [GROUP BY key_expr [SET v1 = aggr_func(v1) [, v2 = aggr_func(v2) ...]] ]
```

A TTL rule type may follow each TTL expression. It determines the action performed once the expression is satisfied (that is, once its value reaches the current time):

- `DELETE` - delete expired rows (default action);
- `RECOMPRESS codec_name` - recompress data part with the `codec_name`;
- `TO DISK 'aaa'` - move part to the disk `aaa`;
- `TO VOLUME 'bbb'` - move part to the volume `bbb`;
- `GROUP BY` - aggregate expired rows.

`DELETE` action can be used together with `WHERE` clause to delete only some of the expired rows based on a filtering condition:
``` sql
TTL time_column + INTERVAL 1 MONTH DELETE WHERE column = 'value'
```

`GROUP BY` expression must be a prefix of the table primary key.

If a column is not part of the `GROUP BY` expression and is not set explicitly in the `SET` clause, the result row contains an arbitrary value of that column from the grouped rows (as if the aggregate function `any` were applied to it).

**Examples**

#### Creating a table with `TTL`:

``` sql
CREATE TABLE example_table
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH DELETE,
    d + INTERVAL 1 WEEK TO VOLUME 'aaa',
    d + INTERVAL 2 WEEK TO DISK 'bbb';
```

#### Altering `TTL` of the table:

``` sql
ALTER TABLE example_table
    MODIFY TTL d + INTERVAL 1 DAY;
```

Creating a table where rows expire after one month. Expired rows whose dates fall on a Monday are deleted:

``` sql
CREATE TABLE table_with_where
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH DELETE WHERE toDayOfWeek(d) = 1;
```

#### Creating a table, where expired rows are recompressed:

```sql
CREATE TABLE table_for_recompression
(
    d DateTime,
    key UInt64,
    value String
) ENGINE MergeTree()
ORDER BY tuple()
PARTITION BY key
TTL d + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(17)), d + INTERVAL 1 YEAR RECOMPRESS CODEC(LZ4HC(10))
SETTINGS min_rows_for_wide_part = 0, min_bytes_for_wide_part = 0;
```

Creating a table where expired rows are aggregated. In the result rows, `x` contains the maximum value across the grouped rows, `y` contains the minimum value, and `d` contains an arbitrary value from the grouped rows.

``` sql
CREATE TABLE table_for_aggregation
(
    d DateTime,
    k1 Int,
    k2 Int,
    x Int,
    y Int
)
ENGINE = MergeTree
ORDER BY (k1, k2)
TTL d + INTERVAL 1 MONTH GROUP BY k1, k2 SET x = max(x), y = min(y);
```

### Removing Expired Data {#mergetree-removing-expired-data}

Data with an expired `TTL` is removed when ClickHouse merges data parts.

When ClickHouse detects that data is expired, it performs an off-schedule merge. To control the frequency of such merges, you can set `merge_with_ttl_timeout`. If the value is too low, it will perform many off-schedule merges that may consume a lot of resources.

If you perform the `SELECT` query between merges, you may get expired data. To avoid it, use the [OPTIMIZE](/docs/en/sql-reference/statements/optimize.md) query before `SELECT`.
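
A minimal sketch of both knobs, reusing `example_table` from the examples above. The timeout value of `3600` seconds is an arbitrary illustration, not a recommendation.

```sql
-- Allow TTL merges for this table to be repeated at most once per hour
-- (the value is in seconds and is chosen here only as an example).
ALTER TABLE example_table
    MODIFY SETTING merge_with_ttl_timeout = 3600;

-- Force a merge so that expired data is processed before reading.
OPTIMIZE TABLE example_table FINAL;

-- Count rows that are already past the one-month TTL used in the examples.
SELECT count()
FROM example_table
WHERE d + INTERVAL 1 MONTH < now();
```

Note that `OPTIMIZE ... FINAL` rewrites data parts and can be expensive on large tables.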

**See Also**

- [ttl_only_drop_parts](/docs/en/operations/settings/settings.md/#ttl_only_drop_parts) setting


## Disk types

In addition to local block devices, ClickHouse supports these storage types:
- [`s3` for S3 and MinIO](#table_engine-mergetree-s3)
- [`gcs` for GCS](/docs/en/integrations/data-ingestion/gcs/index.md/#creating-a-disk)
- [`azure_blob_storage` for Azure Blob Storage](#table_engine-mergetree-azure-blob-storage)
- [`hdfs` for HDFS](#hdfs-storage)
- [`web` for read-only from web](#web-storage)
- [`cache` for local caching](/docs/en/operations/storing-data.md/#using-local-cache)
- [`s3_plain` for backups to S3](/docs/en/operations/backup#backuprestore-using-an-s3-disk)

## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}

### Introduction {#introduction}

`MergeTree` family table engines can store data on multiple block devices. For example, this is useful when the data of a certain table is implicitly split into “hot” and “cold”. The most recent data is regularly requested but requires only a small amount of space. In contrast, the fat-tailed historical data is requested rarely. If several disks are available, the “hot” data may be located on fast disks (for example, NVMe SSDs or in memory), while the “cold” data may be placed on relatively slow ones (for example, HDD).

Data part is the minimum movable unit for `MergeTree`-engine tables. The data belonging to one part are stored on one disk. Data parts can be moved between disks in the background (according to user settings) as well as by means of the [ALTER](/docs/en/sql-reference/statements/alter/partition.md/#alter_move-partition) queries.

### Terms {#terms}

- Disk — Block device mounted to the filesystem.
- Default disk — Disk that stores the path specified in the [path](/docs/en/operations/server-configuration-parameters/settings.md/#server_configuration_parameters-path) server setting.
- Volume — Ordered set of equal disks (similar to [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures)).
- Storage policy — Set of volumes and the rules for moving data between them.

The names given to the described entities can be found in the system tables, [system.storage_policies](/docs/en/operations/system-tables/storage_policies.md/#system_tables-storage_policies) and [system.disks](/docs/en/operations/system-tables/disks.md/#system_tables-disks). To apply one of the configured storage policies for a table, use the `storage_policy` setting of `MergeTree`-engine family tables.
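
For instance, the configured disks and policies, and the policy used by each table, can be inspected with queries along these lines (a sketch; the set of columns available may vary between server versions):

```sql
-- Disks known to the server.
SELECT name, path, free_space, total_space, keep_free_space
FROM system.disks;

-- Storage policies, one row per volume, in volume priority order.
SELECT policy_name, volume_name, volume_priority, disks, max_data_part_size, move_factor
FROM system.storage_policies
ORDER BY policy_name, volume_priority;

-- Which policy each MergeTree-family table in the current database uses.
SELECT name, storage_policy
FROM system.tables
WHERE database = currentDatabase() AND engine LIKE '%MergeTree';
```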

### Configuration {#table_engine-mergetree-multiple-volumes_configure}

Disks, volumes and storage policies should be declared inside the `<storage_configuration>` tag in a file in the `config.d` directory.

:::tip
Disks can also be declared in the `SETTINGS` section of a query.  This is useful
for ad-hoc analysis to temporarily attach a disk that is, for example, hosted at a URL.
See [dynamic storage](#dynamic-storage) for more details.
:::

Configuration structure:

``` xml
<storage_configuration>
    <disks>
        <disk_name_1> <!-- disk name -->
            <path>/mnt/fast_ssd/clickhouse/</path>
        </disk_name_1>
        <disk_name_2>
            <path>/mnt/hdd1/clickhouse/</path>
            <keep_free_space_bytes>10485760</keep_free_space_bytes>
        </disk_name_2>
        <disk_name_3>
            <path>/mnt/hdd2/clickhouse/</path>
            <keep_free_space_bytes>10485760</keep_free_space_bytes>
        </disk_name_3>

        ...
    </disks>

    ...
</storage_configuration>
```

Tags:

- `<disk_name_N>` — Disk name. Names must be different for all disks.
- `path` — path under which a server will store data (`data` and `shadow` folders), should be terminated with ‘/’.
- `keep_free_space_bytes` — the amount of free disk space to be reserved.

The order of the disk definition is not important.

Storage policies configuration markup:

``` xml
<storage_configuration>
    ...
    <policies>
        <policy_name_1>
            <volumes>
                <volume_name_1>
                    <disk>disk_name_from_disks_configuration</disk>
                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
                    <load_balancing>round_robin</load_balancing>
                </volume_name_1>
                <volume_name_2>
                    <!-- configuration -->
                </volume_name_2>
                <!-- more volumes -->
            </volumes>
            <move_factor>0.2</move_factor>
        </policy_name_1>
        <policy_name_2>
            <!-- configuration -->
        </policy_name_2>

        <!-- more policies -->
    </policies>
    ...
</storage_configuration>
```

Tags:

- `policy_name_N` — Policy name. Policy names must be unique.
- `volume_name_N` — Volume name. Volume names must be unique.
- `disk` — a disk within a volume.
- `max_data_part_size_bytes` - the maximum size of a part that can be stored on any of the volume’s disks. If the size of a merged part is estimated to be bigger than `max_data_part_size_bytes`, the part is written to the next volume. This feature lets you keep new and small parts on a hot (SSD) volume and move them to a cold (HDD) volume when they grow large. Do not use this setting if your policy has only one volume.
- `move_factor` - when the share of available space drops below this factor, data automatically starts to move to the next volume, if any (by default, 0.1). ClickHouse sorts existing parts by size from largest to smallest (in descending order) and selects parts with a total size sufficient to meet the `move_factor` condition. If the total size of all parts is insufficient, all parts are moved.
- `prefer_not_to_merge` — Disables merging of data parts on this volume. When this setting is enabled, merging data on this volume is not allowed. This allows controlling how ClickHouse works with slow disks.
- `perform_ttl_move_on_insert` - Controls whether a TTL move is performed on data part INSERT. By default (if enabled), a data part that has already expired under a TTL move rule goes directly to the volume or disk declared in that rule when it is inserted. This can significantly slow down inserts if the destination volume or disk is slow (for example, S3). If disabled, the already expired data part is written to the default volume and then immediately moved to the TTL volume.
- `load_balancing` - Policy for disk balancing, `round_robin` or `least_used`.
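
To get a rough feel for how close a disk is to the `move_factor` threshold, the free-space ratio can be computed from `system.disks`. This is only an approximation under stated assumptions: the threshold `0.2` is taken from the markup above, and the ratio is computed per disk here for simplicity rather than per volume.

```sql
-- Approximate check: which disks have less free space than a move_factor of 0.2?
SELECT
    name,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(free_space / total_space, 3) AS free_ratio,
    free_ratio < 0.2 AS below_move_factor  -- 0.2 is the example move_factor
FROM system.disks
WHERE total_space > 0;
```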

Configuration examples:

``` xml
<storage_configuration>
    ...
    <policies>
        <hdd_in_order> <!-- policy name -->
            <volumes>
                <single> <!-- volume name -->
                    <disk>disk1</disk>
                    <disk>disk2</disk>
                </single>
            </volumes>
        </hdd_in_order>

        <moving_from_ssd_to_hdd>
            <volumes>
                <hot>
                    <disk>fast_ssd</disk>
                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
                </hot>
                <cold>
                    <disk>disk1</disk>
                </cold>
            </volumes>
            <move_factor>0.2</move_factor>
        </moving_from_ssd_to_hdd>

        <small_jbod_with_external_no_merges>
            <volumes>
                <main>
                    <disk>jbod1</disk>
                </main>
                <external>
                    <disk>external</disk>
                    <prefer_not_to_merge>true</prefer_not_to_merge>
                </external>
            </volumes>
        </small_jbod_with_external_no_merges>
    </policies>
    ...
</storage_configuration>
```

In the given example, the `hdd_in_order` policy implements the [round-robin](https://en.wikipedia.org/wiki/Round-robin_scheduling) approach. Because this policy defines only one volume (`single`), the data parts are stored on all its disks in circular order. Such a policy can be quite useful if several similar disks are mounted to the system but RAID is not configured. Keep in mind that each individual disk drive is not reliable, and you might want to compensate for that with a replication factor of 3 or more.

If there are different kinds of disks available in the system, `moving_from_ssd_to_hdd` policy can be used instead. The volume `hot` consists of an SSD disk (`fast_ssd`), and the maximum size of a part that can be stored on this volume is 1GB. All the parts with the size larger than 1GB will be stored directly on the `cold` volume, which contains an HDD disk `disk1`.
Also, once the disk `fast_ssd` gets filled by more than 80%, data will be transferred to the `disk1` by a background process.

The order of volume enumeration within a storage policy is important. Once a volume is overfilled, data are moved to the next one. The order of disk enumeration is important as well because data are stored on them in turns.

When creating a table, one can apply one of the configured storage policies to it:

``` sql
CREATE TABLE table_with_non_default_policy (
    EventDate Date,
    OrderID UInt64,
    BannerID UInt64,
    SearchPhrase String
) ENGINE = MergeTree
ORDER BY (OrderID, BannerID)
PARTITION BY toYYYYMM(EventDate)
SETTINGS storage_policy = 'moving_from_ssd_to_hdd'
```

The `default` storage policy implies using only one volume, which consists of only one disk given in `<path>`.
You can change the storage policy after table creation with an `ALTER TABLE ... MODIFY SETTING` query. The new policy must include all old disks and volumes with the same names.
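
A sketch of such a change for the table created above. The policy name `moving_from_ssd_to_hdd_extended` is hypothetical: it stands for a policy declared in the server configuration that contains the `hot` and `cold` volumes with their original disks plus any additions.

```sql
-- Switch the table to a (hypothetical) policy that is a superset of the old one.
ALTER TABLE table_with_non_default_policy
    MODIFY SETTING storage_policy = 'moving_from_ssd_to_hdd_extended';

-- Confirm which policy the table uses now.
SELECT name, storage_policy
FROM system.tables
WHERE name = 'table_with_non_default_policy';
```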

The number of threads performing background moves of data parts can be changed by [background_move_pool_size](/docs/en/operations/server-configuration-parameters/settings.md/#background_move_pool_size) setting.

### Dynamic Storage

This example query shows how to attach a table stored at a URL and configure the
remote storage within the query. The web storage is not configured in the ClickHouse
configuration files; all the settings are in the CREATE/ATTACH query.

:::note
The example uses `type=web`, but any disk type can be configured as dynamic, even a local disk. Local disks require the path argument to be inside the directory given by the server config parameter `custom_local_disks_base_directory`, which has no default, so set that parameter as well when using a local disk.
:::

#### Example dynamic web storage

:::tip
A [demo dataset](https://github.com/ClickHouse/web-tables-demo) is hosted in GitHub.  To prepare your own tables for web storage see the tool [clickhouse-static-files-uploader](/docs/en/operations/storing-data.md/#storing-data-on-webserver)
:::

In this `ATTACH TABLE` query the `UUID` provided matches the directory name of the data, and the endpoint is the URL for the raw GitHub content.

```sql
# highlight-next-line
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
    is_new UInt8,
    duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
  # highlight-start
  SETTINGS disk = disk(
      type=web,
      endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
      );
  # highlight-end
```

### Nested Dynamic Storage

This example query builds on the above dynamic disk configuration and shows how to
use a local disk to cache data from a table stored at a URL. Neither the cache disk
nor the web storage is configured in the ClickHouse configuration files; both are
configured in the CREATE/ATTACH query settings.

In the settings highlighted below notice that the disk of `type=web` is nested within
the disk of `type=cache`.

```sql
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
    is_new UInt8,
    duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
  # highlight-start
  SETTINGS disk = disk(
    type=cache,
    max_size='1Gi',
    path='/var/lib/clickhouse/custom_disk_cache/',
    disk=disk(
      type=web,
      endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
      )
  );
  # highlight-end
```

### Details {#details}

In the case of `MergeTree` tables, data reaches the disk in different ways:

- As a result of an insert (`INSERT` query).
- During background merges and [mutations](/docs/en/sql-reference/statements/alter/index.md#alter-mutations).
- When downloading from another replica.
- As a result of partition freezing [ALTER TABLE … FREEZE PARTITION](/docs/en/sql-reference/statements/alter/partition.md/#alter_freeze-partition).

In all these cases except for mutations and partition freezing, a part is stored on a volume and a disk according to the given storage policy:

1.  The first volume (in the order of definition) that has enough disk space for storing a part (`unreserved_space > current_part_size`) and allows for storing parts of a given size (`max_data_part_size_bytes > current_part_size`) is chosen.
2.  Within this volume, the disk chosen is the one that follows the disk used for storing the previous chunk of data and that has more free space than the part size (`unreserved_space - keep_free_space_bytes > current_part_size`).

Under the hood, mutations and partition freezing make use of [hard links](https://en.wikipedia.org/wiki/Hard_link). Hard links between different disks are not supported, therefore in such cases the resulting parts are stored on the same disks as the initial ones.

In the background, parts are moved between volumes on the basis of the amount of free space (`move_factor` parameter) according to the order the volumes are declared in the configuration file.
Data is never transferred from the last volume or into the first one. You can use the system tables [system.part_log](/docs/en/operations/system-tables/part_log.md/#system_tables-part-log) (field `type = MOVE_PART`) and [system.parts](/docs/en/operations/system-tables/parts.md/#system_tables-parts) (fields `path` and `disk`) to monitor background moves. Detailed information can also be found in the server logs.
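
A monitoring sketch along these lines is shown below. It rests on a few assumptions: `system.part_log` must be enabled in the server configuration, and the column and value names used here (`event_type = 'MovePart'` in `system.part_log`, `disk_name` in `system.parts`) are the ones found in recent ClickHouse versions, which may differ from the field names mentioned above on older servers.

```sql
-- Most recent part moves recorded in the part log.
SELECT event_time, database, table, part_name, path_on_disk
FROM system.part_log
WHERE event_type = 'MovePart'
ORDER BY event_time DESC
LIMIT 10;

-- Current placement of the active parts of one table.
SELECT name, disk_name, path
FROM system.parts
WHERE active AND table = 'table_with_non_default_policy';
```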

You can force moving a part or a partition from one volume to another using the query [ALTER TABLE … MOVE PART\|PARTITION … TO VOLUME\|DISK …](/docs/en/sql-reference/statements/alter/partition.md/#alter_move-partition). All the restrictions for background operations are taken into account. The query initiates a move on its own and does not wait for background operations to complete. You get an error message if not enough free space is available or if any of the required conditions are not met.
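
For example, with the `moving_from_ssd_to_hdd` policy and the table defined earlier, manual moves might look like the following sketch. The partition value `202301` and the part name `202301_1_1_0` are hypothetical; real names can be taken from `system.parts`.

```sql
-- Move a whole monthly partition to the cold volume.
ALTER TABLE table_with_non_default_policy
    MOVE PARTITION 202301 TO VOLUME 'cold';

-- Move a single part to a specific disk (the part name is hypothetical).
ALTER TABLE table_with_non_default_policy
    MOVE PART '202301_1_1_0' TO DISK 'disk1';
```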

Moving data does not interfere with data replication. Therefore, different storage policies can be specified for the same table on different replicas.

After the completion of background merges and mutations, old parts are removed only after a certain amount of time (`old_parts_lifetime`).
During this time, they are not moved to other volumes or disks. Therefore, until the parts are finally removed, they are still taken into account for evaluation of the occupied disk space.

User can assign new big parts to different disks of a [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures) volume in a balanced way using the [min_bytes_to_rebalance_partition_over_jbod](/docs/en/operations/settings/merge-tree-settings.md/#min-bytes-to-rebalance-partition-over-jbod) setting.

## Using S3 for Data Storage {#table_engine-mergetree-s3}

:::note
Google Cloud Storage (GCS) is also supported using the type `s3`. See [GCS backed MergeTree](/docs/en/integrations/gcs).
:::

`MergeTree` family table engines can store data to [S3](https://aws.amazon.com/s3/) using a disk with type `s3`.

Configuration markup:
``` xml
<storage_configuration>
    ...
    <disks>
        <s3>
            <type>s3</type>
            <support_batch_delete>true</support_batch_delete>
            <endpoint>https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/root-path/</endpoint>
            <access_key_id>your_access_key_id</access_key_id>
            <secret_access_key>your_secret_access_key</secret_access_key>
            <region></region>
            <header>Authorization: Bearer SOME-TOKEN</header>
            <server_side_encryption_customer_key_base64>your_base64_encoded_customer_key</server_side_encryption_customer_key_base64>
            <server_side_encryption_kms_key_id>your_kms_key_id</server_side_encryption_kms_key_id>
            <server_side_encryption_kms_encryption_context>your_kms_encryption_context</server_side_encryption_kms_encryption_context>
            <server_side_encryption_kms_bucket_key_enabled>true</server_side_encryption_kms_bucket_key_enabled>
            <proxy>
                <uri>http://proxy1</uri>
                <uri>http://proxy2</uri>
            </proxy>
            <connect_timeout_ms>10000</connect_timeout_ms>
            <request_timeout_ms>5000</request_timeout_ms>
            <retry_attempts>10</retry_attempts>
            <single_read_retries>4</single_read_retries>
            <min_bytes_for_seek>1000</min_bytes_for_seek>
            <metadata_path>/var/lib/clickhouse/disks/s3/</metadata_path>
            <skip_access_check>false</skip_access_check>
        </s3>
        <s3_cache>
            <type>cache</type>
            <disk>s3</disk>
            <path>/var/lib/clickhouse/disks/s3_cache/</path>
            <max_size>10Gi</max_size>
        </s3_cache>
    </disks>
    ...
</storage_configuration>
```

:::note cache configuration
ClickHouse versions 22.3 through 22.7 use a different cache configuration, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) if you are using one of those versions.
:::
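
Once the disks are declared, a table can be placed on them. The sketch below assumes a recent ClickHouse version in which the `disk` table setting accepts the name of a configured disk; on other versions, a storage policy that references the disk would be declared and used via `storage_policy` instead. The table `s3_backed_example` and its columns are hypothetical.

```sql
-- Hypothetical table stored on the cached S3 disk declared above.
CREATE TABLE s3_backed_example
(
    id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = 's3_cache';

-- Check where the parts of the table are stored after an insert.
INSERT INTO s3_backed_example VALUES (1, 'hello');

SELECT name, disk_name
FROM system.parts
WHERE table = 's3_backed_example' AND active;
```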

### Configuring the S3 disk

Required parameters:

- `endpoint` — S3 endpoint URL in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). Endpoint URL should contain a bucket and root path to store data.
- `access_key_id` — S3 access key id.
- `secret_access_key` — S3 secret access key.

Optional parameters:

- `region` — S3 region name.
- `support_batch_delete` — This controls the check to see if batch deletes are supported. Set this to `false` when using Google Cloud Storage (GCS) as GCS does not support batch deletes and preventing the checks will prevent error messages in the logs.
- `use_environment_credentials` - Reads AWS credentials from the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` if they exist. Default value is `false`.
- `use_insecure_imds_request` — If set to `true`, S3 client will use insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Default value is `false`.
- `expiration_window_seconds` — Grace period for checking if expiration-based credentials have expired. Optional, default value is `120`.
- `proxy` — Proxy configuration for S3 endpoint. Each `uri` element inside `proxy` block should contain a proxy URL.
- `connect_timeout_ms` — Socket connect timeout in milliseconds. Default value is `10 seconds`.
- `request_timeout_ms` — Request timeout in milliseconds. Default value is `5 seconds`.
- `retry_attempts` — Number of retry attempts in case of failed request. Default value is `10`.
- `single_read_retries` — Number of retry attempts in case of connection drop during read. Default value is `4`.
- `min_bytes_for_seek` — Minimal number of bytes to use seek operation instead of sequential read. Default value is `1 Mb`.
- `metadata_path` — Path on local FS to store metadata files for S3. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
- `skip_access_check` — If true, disk access checks will not be performed on disk start-up. Default value is `false`.
- `header` — Adds the specified HTTP header to a request to the given endpoint. Optional, can be specified multiple times.
- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set.
- `server_side_encryption_kms_key_id` - If specified, required headers for accessing S3 objects with [SSE-KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) will be set. If an empty string is specified, the AWS managed S3 key will be used. Optional.
- `server_side_encryption_kms_encryption_context` - If specified alongside `server_side_encryption_kms_key_id`, the given encryption context header for SSE-KMS will be set. Optional.
- `server_side_encryption_kms_bucket_key_enabled` - If specified alongside `server_side_encryption_kms_key_id`, the header to enable S3 bucket keys for SSE-KMS will be set. Optional, can be `true` or `false`, defaults to nothing (matches the bucket-level setting).
- `s3_max_put_rps` — Maximum PUT requests per second rate before throttling. Default value is `0` (unlimited).
- `s3_max_put_burst` — Maximum number of requests that can be issued simultaneously before hitting the requests-per-second limit. By default (`0`), equals `s3_max_put_rps`.
- `s3_max_get_rps` — Maximum GET requests per second rate before throttling. Default value is `0` (unlimited).
- `s3_max_get_burst` — Maximum number of requests that can be issued simultaneously before hitting the requests-per-second limit. By default (`0`), equals `s3_max_get_rps`.

### Configuring the cache

This is the cache configuration from above:
```xml
        <s3_cache>
            <type>cache</type>
            <disk>s3</disk>
            <path>/var/lib/clickhouse/disks/s3_cache/</path>
            <max_size>10Gi</max_size>
        </s3_cache>
```

These parameters define the cache layer:
- `type` — If a disk is of type `cache` it caches mark and index files in memory.
- `disk` — The name of the disk that will be cached.

Cache parameters:
- `path` — The path where metadata for the cache is stored.
- `max_size` — The size (amount of disk space) that the cache can grow to.

:::tip
There are several other cache parameters that you can use to tune your storage, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) for the details.
:::

An S3 disk can be configured as `main` or `cold` storage:
``` xml
<storage_configuration>
    ...
    <disks>
        <s3>
            <type>s3</type>
            <endpoint>https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/root-path/</endpoint>
            <access_key_id>your_access_key_id</access_key_id>
            <secret_access_key>your_secret_access_key</secret_access_key>
        </s3>
    </disks>
    <policies>
        <s3_main>
            <volumes>
                <main>
                    <disk>s3</disk>
                </main>
            </volumes>
        </s3_main>
        <s3_cold>
            <volumes>
                <main>
                    <disk>default</disk>
                </main>
                <external>
                    <disk>s3</disk>
                </external>
            </volumes>
            <move_factor>0.2</move_factor>
        </s3_cold>
    </policies>
    ...
</storage_configuration>
```

With the `cold` option, data can be moved to S3 when the free space on the local disk drops below `move_factor * disk_size`, or by a TTL move rule.
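
As a rough illustration of the `cold` option, a table could reference the `s3_cold` policy from the configuration above and move old rows to the `external` volume with a TTL rule. The table name `s3_cold_example`, its columns, and the 30-day interval are hypothetical; only the policy and volume names come from the configuration.

``` sql
-- Hypothetical table; `s3_cold` and `external` are the policy and volume defined above.
CREATE TABLE s3_cold_example
(
    event_date Date,
    id UInt64,
    payload String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, id)
-- Parts whose rows are all older than 30 days become eligible to move to the S3-backed volume.
TTL event_date + INTERVAL 30 DAY TO VOLUME 'external'
SETTINGS storage_policy = 's3_cold';

-- Compare free and total space per disk. With move_factor = 0.2, moves are
-- expected once the free share of the local volume drops below 0.2.
SELECT
    name,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(free_space / total_space, 2) AS free_ratio
FROM system.disks;
```

The second query is only a convenience for watching the `move_factor` threshold; it does not trigger any moves itself.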

## Using Azure Blob Storage for Data Storage {#table_engine-mergetree-azure-blob-storage}

`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.

As of February 2022, this feature is still a fresh addition, so expect that some Azure Blob Storage functionalities might be unimplemented.

Configuration markup:
``` xml
<storage_configuration>
    ...
    <disks>
        <blob_storage_disk>
            <type>azure_blob_storage</type>
            <storage_account_url>http://account.blob.core.windows.net</storage_account_url>
            <container_name>container</container_name>
            <account_name>account</account_name>
            <account_key>pass123</account_key>
            <metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
            <cache_enabled>true</cache_enabled>
            <cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
            <skip_access_check>false</skip_access_check>
        </blob_storage_disk>
    </disks>
    ...
</storage_configuration>
```

Connection parameters:
* `storage_account_url` - **Required**, Azure Blob Storage account URL, like `http://account.blob.core.windows.net` or `http://azurite1:10000/devstoreaccount1`.
* `container_name` - Target container name, defaults to `default-container`.
* `container_already_exists` - If set to `false`, a new container `container_name` is created in the storage account, if set to `true`, disk connects to the container directly, and if left unset, disk connects to the account, checks if the container `container_name` exists, and creates it if it doesn't exist yet.

Authentication parameters (the disk will try all available methods **and** Managed Identity Credential):
* `connection_string` - For authentication using a connection string.
* `account_name` and `account_key` - For authentication using Shared Key.

Limit parameters (mainly for internal usage):
* `s3_max_single_part_upload_size` - Limits the size of a single block upload to Blob Storage.
* `min_bytes_for_seek` - Limits the size of a seekable region.
* `max_single_read_retries` - Limits the number of attempts to read a chunk of data from Blob Storage.
* `max_single_download_retries` - Limits the number of attempts to download a readable buffer from Blob Storage.
* `thread_pool_size` - Limits the number of threads with which `IDiskRemote` is instantiated.
* `s3_max_inflight_parts_for_one_file` - Limits the number of put requests that can be run concurrently for one object.

Other parameters:
* `metadata_path` - Path on local FS to store metadata files for Blob Storage. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
* `cache_enabled` - Allows caching mark and index files on the local FS. Default value is `true`.
* `cache_path` - Path on local FS where to store cached mark and index files. Default value is `/var/lib/clickhouse/disks/<disk_name>/cache/`.
* `skip_access_check` - If true, disk access checks will not be performed on disk start-up. Default value is `false`.

Examples of working configurations can be found in integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).

  :::note Zero-copy replication is not ready for production
  Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
  :::
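
Once the Azure Blob Storage disk is configured, a table might be placed on it by name. This is a minimal sketch that assumes a server version supporting the `disk` MergeTree setting; on other versions, a storage policy that includes `blob_storage_disk` would be needed instead. The table name `azure_example` and its columns are hypothetical.

``` sql
-- Hypothetical table stored on the `blob_storage_disk` disk from the configuration above.
CREATE TABLE azure_example
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = 'blob_storage_disk';

INSERT INTO azure_example VALUES (1, 'a'), (2, 'b');

-- Check which disk holds the active parts of the table.
SELECT name, disk_name
FROM system.parts
WHERE database = currentDatabase() AND table = 'azure_example' AND active;
```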

## HDFS storage {#hdfs-storage}

In this sample configuration:
- the disk is of type `hdfs`
- the data is hosted at `hdfs://hdfs1:9000/clickhouse/`

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <hdfs>
                <type>hdfs</type>
                <endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
                <skip_access_check>true</skip_access_check>
            </hdfs>
            <hdd>
                <type>local</type>
                <path>/</path>
            </hdd>
        </disks>
        <policies>
            <hdfs>
                <volumes>
                    <main>
                        <disk>hdfs</disk>
                    </main>
                    <external>
                        <disk>hdd</disk>
                    </external>
                </volumes>
            </hdfs>
        </policies>
    </storage_configuration>
</clickhouse>
```
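
A table can then refer to the `hdfs` policy defined in this sample configuration. The sketch below uses a hypothetical table name and columns; the system table queries are only a way to inspect where the data ends up.

``` sql
-- Hypothetical table using the `hdfs` storage policy from the sample configuration.
CREATE TABLE hdfs_example
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'hdfs';

INSERT INTO hdfs_example VALUES (1, 'a'), (2, 'b');

-- List the volumes and disks that make up the policy.
SELECT policy_name, volume_name, disks
FROM system.storage_policies
WHERE policy_name = 'hdfs';

-- Show the disk each active part was written to.
SELECT name, disk_name
FROM system.parts
WHERE database = currentDatabase() AND table = 'hdfs_example' AND active;
```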

## Web storage (read-only) {#web-storage}

Web storage can be used for read-only purposes. An example use is for hosting sample
data, or for migrating data.

:::tip
If a web dataset is not expected to be used routinely, storage can also be configured
temporarily within a query, without editing the configuration file. See
[dynamic storage](#dynamic-storage).
:::

In this sample configuration:
- the disk is of type `web`
- the data is hosted at `http://nginx:80/test1/`
- a cache on local storage is used

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <web>
                <type>web</type>
                <endpoint>http://nginx:80/test1/</endpoint>
            </web>
            <cached_web>
                <type>cache</type>
                <disk>web</disk>
                <path>cached_web_cache/</path>
                <max_size>100000000</max_size>
            </cached_web>
        </disks>
        <policies>
            <web>
                <volumes>
                    <main>
                        <disk>web</disk>
                    </main>
                </volumes>
            </web>
            <cached_web>
                <volumes>
                    <main>
                        <disk>cached_web</disk>
                    </main>
                </volumes>
            </cached_web>
        </policies>
    </storage_configuration>
</clickhouse>
```
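
Because the web disk is read-only, a table on it is attached to data that already exists at the endpoint rather than created and filled. The following is a sketch under assumptions: the table name, the single-column schema, and the UUID are placeholders, and the UUID must match the one under which the data was prepared on the web server.

``` sql
-- Placeholder UUID and schema; both must correspond to the data already hosted at the endpoint.
ATTACH TABLE web_example UUID '12345678-1234-1234-1234-123456789abc'
(
    id Int32
)
ENGINE = MergeTree
ORDER BY id
-- `cached_web` reads through the local cache; use `web` to read directly.
SETTINGS storage_policy = 'cached_web';

SELECT count() FROM web_example;
```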

## Virtual Columns {#virtual-columns}

- `_part` — Name of a part.
- `_part_index` — Sequential index of the part in the query result.
- `_partition_id` — Name of a partition.
- `_part_uuid` — Unique part identifier (if the MergeTree setting `assign_part_uuids` is enabled).
- `_partition_value` — Values (a tuple) of a `partition by` expression.
- `_sample_factor` — Sample factor (from the query).
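
Virtual columns are not returned by `SELECT *` and have to be named explicitly. As a small example, the query below counts rows per part and partition; `my_table` is a hypothetical `MergeTree` table.

``` sql
-- Count rows per data part, grouped by partition.
SELECT
    _partition_id,
    _part,
    count() AS row_count
FROM my_table
GROUP BY _partition_id, _part
ORDER BY _partition_id, _part;
```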

## Column Statistics (Experimental) {#column-statistics}

Statistics are declared in the columns section of the `CREATE` query:

``` sql
STATISTIC <list of columns> TYPE type
```

Statistics can be specified for tables from the `*MergeTree` family.

These lightweight statistics aggregate information about the distribution of values in columns.
They can be used for query optimization (currently, they are used for moving expressions to `PREWHERE`).

#### Available Types of Column Statistics {#available-types-of-column-statistics}

-   `tdigest`

    Stores the distribution of values from numeric columns in a [TDigest](https://github.com/tdunning/t-digest) sketch.
-												Get rid of toc_en.yml (#10023)


											
										
										
											2020-04-03 13:23:32 +00:00
+								---
-												add slugs

											
										
										
											2022-08-28 14:53:34 +00:00
+								slug: /en/engines/table-engines/mergetree-family/mergetree
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
											2022-04-09 13:29:05 +00:00
+								sidebar_position: 11
 								sidebar_label:  MergeTree
-												Get rid of toc_en.yml (#10023)


											
										
										
											2020-04-03 13:23:32 +00:00
+								---
-												Remove H1 anchor tags from docs

											
										
										
											2022-06-02 10:55:18 +00:00
+								# MergeTree
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Fix typos

											
										
										
											2019-08-23 10:55:34 +00:00
+								The `MergeTree` engine and other engines of this family (`*MergeTree`) are the most robust ClickHouse table engines.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												TRANSLATE-2372: EN review

											
										
										
											2019-07-29 14:38:32 +00:00
+								Engines in the `MergeTree` family are designed for inserting a very large amount of data into a table. The data is quickly written to the table part by part, then rules are applied for merging the parts in the background. This method is much more efficient than continually rewriting the data in storage during insert.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
+								Main features:
-												Fixed newlines in .rst files before code blocks [#CLICKHOUSE-2].
for i in $(find . -name '*.rst'); do grep -F -q '.. code-block:: ' $i && cat $i | sed -r -e 's/$/<NEWLINE>/' | tr -d '\n' | sed -r -e 's/([^>])<NEWLINE>.. code-block::/\1<NEWLINE><NEWLINE>.. code-block::/g' | sed -r -e 's/<NEWLINE>/\n/g' > ${i}.tmp && mv ${i}.tmp ${i}; done

											
										
										
											2017-06-13 20:35:07 +00:00
-												Docs: Replace annoying three spaces in enumerations by a single space

											
										
										
											2023-04-19 15:55:29 +00:00
+								- Stores data sorted by primary key.
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												[experimental] add "es" docs language as machine translated draft (#9787)

* replace exit with assert in test_single_page

* improve save_raw_single_page docs option

* More grammar fixes

* "Built from" link in new tab

* fix mistype

* Example of include in docs

* add anchor to meeting form

* Draft of translation helper

* WIP on translation helper

* Replace some fa docs content with machine translation

* add normalize-en-markdown.sh

* normalize some en markdown

* normalize some en markdown

* admonition support

* normalize

* normalize

* normalize

* support wide tables

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* lightly edited machine translation of introdpection.md

* lightly edited machhine translation of lazy.md

* WIP on translation utils

* Normalize ru docs

* Normalize other languages

* some fixes

* WIP on normalize/translate tools

* add requirements.txt

* [experimental] add es docs language as machine translated draft

* remove duplicate script

* Back to wider tab-stop (narrow renders not so well)
											
										
										
											2020-03-21 04:11:51 +00:00
+								    This allows you to create a small sparse index that helps find data faster.
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												Docs: Replace annoying three spaces in enumerations by a single space

											
										
										
											2023-04-19 15:55:29 +00:00
+								- Partitions can be used if the [partitioning key](/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key.md) is specified.
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												fix typos in docs

											
										
										
											2021-05-04 13:12:39 +00:00
+								    ClickHouse supports certain operations with partitions that are more efficient than general operations on the same data with the same result. ClickHouse also automatically cuts off the partition data where the partitioning key is specified in the query.
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												Docs: Replace annoying three spaces in enumerations by a single space

											
										
										
											2023-04-19 15:55:29 +00:00
+								- Data replication support.
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												update links

											
										
										
											2022-11-09 00:17:58 +00:00
+								    The family of `ReplicatedMergeTree` tables provides data replication. For more information, see [Data replication](/docs/en/engines/table-engines/mergetree-family/replication.md).
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												Docs: Replace annoying three spaces in enumerations by a single space

											
										
										
											2023-04-19 15:55:29 +00:00
+								- Data sampling support.
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												[experimental] add "es" docs language as machine translated draft (#9787)

* replace exit with assert in test_single_page

* improve save_raw_single_page docs option

* More grammar fixes

* "Built from" link in new tab

* fix mistype

* Example of include in docs

* add anchor to meeting form

* Draft of translation helper

* WIP on translation helper

* Replace some fa docs content with machine translation

* add normalize-en-markdown.sh

* normalize some en markdown

* normalize some en markdown

* admonition support

* normalize

* normalize

* normalize

* support wide tables

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* lightly edited machine translation of introdpection.md

* lightly edited machhine translation of lazy.md

* WIP on translation utils

* Normalize ru docs

* Normalize other languages

* some fixes

* WIP on normalize/translate tools

* add requirements.txt

* [experimental] add es docs language as machine translated draft

* remove duplicate script

* Back to wider tab-stop (narrow renders not so well)
											
										
										
											2020-03-21 04:11:51 +00:00
+								    If necessary, you can set the data sampling method in the table.
-												Apply useful patches from TANKER-453459

											
										
										
											2018-09-05 08:41:04 +00:00
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
											2022-04-09 13:29:05 +00:00
+								:::info
-												update links

											
										
										
											2022-11-09 00:17:58 +00:00
+								The [Merge](/docs/en/engines/table-engines/special/merge.md/#merge) engine does not belong to the `*MergeTree` family.
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
											2022-04-09 13:29:05 +00:00
+								:::
-												Updates for Aggregating-,Collapsing-, Replacing- and SummingMergeTree. (#3346)

* Update of english version of descriprion of the table function `file`.

* New syntax for ReplacingMergeTree.
Some improvements in text.

* Significantly change article about SummingMergeTree.
Article is restructured, text is changed in many places of the document. New syntax for table creation is described.

* Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version.

* New syntax for new syntax of CREATE TABLE

* Added english docs on Aggregating, Replacing and SummingMergeTree.

* CollapsingMergeTree docs. English version.

* 1. Update of CollapsingMergeTree. 2. Minor changes in markup

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatingmergetree.md

* GraphiteMergeTree docs update.
New syntax for creation of Replicated* tables.
Minor changes in *MergeTree tables creation syntax.

* Markup fix

* Markup and language fixes

* Clarification in the CollapsingMergeTree article

											
										
										
											2018-10-19 11:25:22 +00:00
-												Restore some old manual anchors in docs (#9803)

* Simplify 404 page

* add es array_functions.md

* restore some old manual anchors

* update sitemaps

* trigger checks

* restore more old manual anchors

* refactor test.md + temporary disable failure again

* fix mistype
											
										
										
											2020-03-22 09:14:59 +00:00
+								## Creating a Table {#table_engine-mergetree-creating-a-table}
-												Updates for Aggregating-,Collapsing-, Replacing- and SummingMergeTree. (#3346)

* Update of english version of descriprion of the table function `file`.

* New syntax for ReplacingMergeTree.
Some improvements in text.

* Significantly change article about SummingMergeTree.
Article is restructured, text is changed in many places of the document. New syntax for table creation is described.

* Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version.

* New syntax for new syntax of CREATE TABLE

* Added english docs on Aggregating, Replacing and SummingMergeTree.

* CollapsingMergeTree docs. English version.

* 1. Update of CollapsingMergeTree. 2. Minor changes in markup

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatingmergetree.md

* GraphiteMergeTree docs update.
New syntax for creation of Replicated* tables.
Minor changes in *MergeTree tables creation syntax.

* Markup fix

* Markup and language fixes

* Clarification in the CollapsingMergeTree article

											
										
										
											2018-10-19 11:25:22 +00:00
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								``` sql
-												Updates for Aggregating-,Collapsing-, Replacing- and SummingMergeTree. (#3346)

* Update of english version of descriprion of the table function `file`.

* New syntax for ReplacingMergeTree.
Some improvements in text.

* Significantly change article about SummingMergeTree.
Article is restructured, text is changed in many places of the document. New syntax for table creation is described.

* Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version.

* New syntax for new syntax of CREATE TABLE

* Added english docs on Aggregating, Replacing and SummingMergeTree.

* CollapsingMergeTree docs. English version.

* 1. Update of CollapsingMergeTree. 2. Minor changes in markup

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatingmergetree.md

* GraphiteMergeTree docs update.
New syntax for creation of Replicated* tables.
Minor changes in *MergeTree tables creation syntax.

* Markup fix

* Markup and language fixes

* Clarification in the CollapsingMergeTree article

											
										
										
											2018-10-19 11:25:22 +00:00
+								CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
 								(
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
+								    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS|EPHEMERAL expr1] [TTL expr1] [CODEC(codec1)] [[NOT] NULL|PRIMARY KEY],
 								    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS|EPHEMERAL expr2] [TTL expr2] [CODEC(codec2)] [[NOT] NULL|PRIMARY KEY],
-												Updates for Aggregating-,Collapsing-, Replacing- and SummingMergeTree. (#3346)

* Update of english version of descriprion of the table function `file`.

* New syntax for ReplacingMergeTree.
Some improvements in text.

* Significantly change article about SummingMergeTree.
Article is restructured, text is changed in many places of the document. New syntax for table creation is described.

* Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version.

* New syntax for new syntax of CREATE TABLE

* Added english docs on Aggregating, Replacing and SummingMergeTree.

* CollapsingMergeTree docs. English version.

* 1. Update of CollapsingMergeTree. 2. Minor changes in markup

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatingmergetree.md

* GraphiteMergeTree docs update.
New syntax for creation of Replicated* tables.
Minor changes in *MergeTree tables creation syntax.

* Markup fix

* Markup and language fixes

* Clarification in the CollapsingMergeTree article

											
										
										
											2018-10-19 11:25:22 +00:00
+								    ...
-												Add docs

											
										
										
											2023-01-20 15:22:13 +00:00
+								    INDEX index_name1 expr1 TYPE type1(...) [GRANULARITY value1],
 								    INDEX index_name2 expr2 TYPE type2(...) [GRANULARITY value2],
-												Update mergetree.md

Basic info about projections based on RFC https://github.com/ClickHouse/ClickHouse/issues/14730
											
										
										
											2021-06-18 16:37:37 +00:00
+								    ...
-												Update mergetree.md

Currently WHERE is not supported.
											
										
										
											2021-06-19 15:10:51 +00:00
+								    PROJECTION projection_name_1 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]),
-												update docs and refine statements

											
										
										
											2023-09-08 00:27:17 +00:00
+								    PROJECTION projection_name_2 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]),
 								    ...
 								    STATISTIC <COLUMN LIST> TYPE type1,
 								    STATISTIC <COLUMN LIST> TYPE type2
-												Updates for Aggregating-,Collapsing-, Replacing- and SummingMergeTree. (#3346)

* Update of english version of descriprion of the table function `file`.

* New syntax for ReplacingMergeTree.
Some improvements in text.

* Significantly change article about SummingMergeTree.
Article is restructured, text is changed in many places of the document. New syntax for table creation is described.

* Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version.

* New syntax for new syntax of CREATE TABLE

* Added english docs on Aggregating, Replacing and SummingMergeTree.

* CollapsingMergeTree docs. English version.

* 1. Update of CollapsingMergeTree. 2. Minor changes in markup

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatingmergetree.md

* GraphiteMergeTree docs update.
New syntax for creation of Replicated* tables.
Minor changes in *MergeTree tables creation syntax.

* Markup fix

* Markup and language fixes

* Clarification in the CollapsingMergeTree article

											
										
										
											2018-10-19 11:25:22 +00:00
+								) ENGINE = MergeTree()
-												DOCS-271: Updated the MergeTree() ORDER BY description (#11433)

* CLICKHOUSEDOCS-271: Updated the MergeTree() ORDER BY description.

* CLICKHOUSEDOCS-271: Fixes grammar.

* CLICKHOUSEDOCS-271: Updated by comments.

Co-authored-by: Sergei Shtykov <bayonet@yandex-team.ru>
											
										
										
											2020-06-06 17:44:48 +00:00
+								ORDER BY expr
-												Updates for Aggregating-,Collapsing-, Replacing- and SummingMergeTree. (#3346)

* Update of english version of descriprion of the table function `file`.

* New syntax for ReplacingMergeTree.
Some improvements in text.

* Significantly change article about SummingMergeTree.
Article is restructured, text is changed in many places of the document. New syntax for table creation is described.

* Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version.

* New syntax for new syntax of CREATE TABLE

* Added english docs on Aggregating, Replacing and SummingMergeTree.

* CollapsingMergeTree docs. English version.

* 1. Update of CollapsingMergeTree. 2. Minor changes in markup

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatingmergetree.md

* GraphiteMergeTree docs update.
New syntax for creation of Replicated* tables.
Minor changes in *MergeTree tables creation syntax.

* Markup fix

* Markup and language fixes

* Clarification in the CollapsingMergeTree article

											
										
										
											2018-10-19 11:25:22 +00:00
+								[PARTITION BY expr]
-												add en docs for ALTER ORDER BY [#CLICKHOUSE-3859]

											
										
										
											2018-12-04 19:12:33 +00:00
+								[PRIMARY KEY expr]
-												Updates for Aggregating-,Collapsing-, Replacing- and SummingMergeTree. (#3346)

* Update of english version of descriprion of the table function `file`.

* New syntax for ReplacingMergeTree.
Some improvements in text.

* Significantly change article about SummingMergeTree.
Article is restructured, text is changed in many places of the document. New syntax for table creation is described.

* Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version.

* New syntax for new syntax of CREATE TABLE

* Added english docs on Aggregating, Replacing and SummingMergeTree.

* CollapsingMergeTree docs. English version.

* 1. Update of CollapsingMergeTree. 2. Minor changes in markup

* Update aggregatefunction.md

* Update aggregatefunction.md

* Update aggregatefunction.md

[SAMPLE BY expr]
[TTL expr
    [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx' [, ...] ]
    [WHERE conditions]
    [GROUP BY key_expr [SET v1 = aggr_func(v1) [, v2 = aggr_func(v2) ...]] ] ]
[SETTINGS name=value, ...]
```
										
For a description of parameters, see the [CREATE query description](/docs/en/sql-reference/statements/create/table.md).
										
### Query Clauses {#mergetree-query-clauses}

#### ENGINE

`ENGINE` — Name and parameters of the engine. `ENGINE = MergeTree()`. The `MergeTree` engine does not have parameters.
										
#### ORDER_BY

`ORDER BY` — The sorting key.

A tuple of column names or arbitrary expressions. Example: `ORDER BY (CounterID, EventDate)`.

ClickHouse uses the sorting key as a primary key if the primary key is not defined explicitly by the `PRIMARY KEY` clause.

Use the `ORDER BY tuple()` syntax if you do not need sorting. See [Selecting the Primary Key](#selecting-the-primary-key).
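
As a minimal sketch of the two `ORDER BY` forms, the statements below create one table with a sorting key and one without. The table names (`hits_sorted`, `hits_unsorted`) and the column set are illustrative assumptions, not part of the reference syntax.

``` sql
-- Sorting key given; no PRIMARY KEY clause, so it also serves as the primary key.
CREATE TABLE hits_sorted
(
    CounterID UInt32,
    EventDate Date,
    UserID UInt64
)
ENGINE = MergeTree()
ORDER BY (CounterID, EventDate);

-- No sorting key: an empty tuple is passed to ORDER BY.
CREATE TABLE hits_unsorted
(
    CounterID UInt32,
    EventDate Date,
    UserID UInt64
)
ENGINE = MergeTree()
ORDER BY tuple();
```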
											
										
										
#### PARTITION BY

`PARTITION BY` — The [partitioning key](/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key.md). Optional. In most cases, you don't need a partition key, and if you do need to partition, you generally do not need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the `ORDER BY` expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make the client identifier or name the first column in the `ORDER BY` expression).

For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](/docs/en/sql-reference/data-types/date.md). The partition names here have the `"YYYYMM"` format.
										
#### PRIMARY KEY

`PRIMARY KEY` — The primary key if it [differs from the sorting key](#choosing-a-primary-key-that-differs-from-the-sorting-key). Optional.

By default, the primary key is the same as the sorting key (which is specified by the `ORDER BY` clause). Thus, in most cases it is unnecessary to specify a separate `PRIMARY KEY` clause.
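
The following sketch shows a table whose primary key differs from its sorting key, combined with monthly partitioning. It assumes the usual constraint that the primary key is a prefix of the sorting key; the table name `visits_by_counter` and its columns are hypothetical.

``` sql
CREATE TABLE visits_by_counter
(
    CounterID UInt32,
    EventDate Date,
    UserID UInt64,
    Duration UInt32
)
ENGINE = MergeTree()
-- Rows are sorted by all three expressions...
ORDER BY (CounterID, EventDate, UserID)
-- ...one partition is created per calendar month...
PARTITION BY toYYYYMM(EventDate)
-- ...and only the first two expressions go into the sparse index.
PRIMARY KEY (CounterID, EventDate);
```

Keeping the primary key shorter than the sorting key keeps the index smaller while still defining the full sort order of rows within a part.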

											
										
										
#### SAMPLE BY

`SAMPLE BY` — An expression for sampling. Optional.

If a sampling expression is used, the primary key must contain it. The result of a sampling expression must be an unsigned integer. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.
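
A short sketch of how a sampling expression might be declared and then used in a query. The table name `hits_sampled` and the sampling ratio are assumptions chosen for illustration.

``` sql
CREATE TABLE hits_sampled
(
    CounterID UInt32,
    EventDate Date,
    UserID UInt64
)
ENGINE = MergeTree()
-- The sampling expression is part of the sorting key (and thus of the primary key).
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);

-- Read an approximate 10% sample of the data.
SELECT CounterID, count() AS sampled_hits
FROM hits_sampled
SAMPLE 0.1
GROUP BY CounterID;
```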

											
										
										
#### TTL

`TTL` — A list of rules specifying storage duration of rows and defining logic of automatic parts movement [between disks and volumes](#table_engine-mergetree-multiple-volumes). Optional.

The expression must have one `Date` or `DateTime` column as a result. Example:

```
TTL date + INTERVAL 1 DAY
```

The rule type (`DELETE`, `TO DISK 'xxx'`, `TO VOLUME 'xxx'`, or `GROUP BY`) specifies the action taken once the expression is satisfied (that is, once it reaches the current time): removing expired rows, moving a part to the specified disk (`TO DISK 'xxx'`) or volume (`TO VOLUME 'xxx'`) if the expression is satisfied for all rows in the part, or aggregating values in expired rows. The default rule type is removal (`DELETE`). A list of multiple rules can be specified, but there should be no more than one `DELETE` rule.

For more details, see [TTL for columns and tables](#table_engine-mergetree-ttl).
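
The rule types can be sketched as follows. The storage policy name `tiered`, the volume name `cold`, and the disk name `archive_disk` are placeholders: the first statement only works on a server whose configuration actually defines them. Table and column names are likewise hypothetical.

``` sql
-- Move parts to a volume after a week, to a disk after two weeks,
-- and delete rows after a month.
CREATE TABLE events_with_ttl
(
    d DateTime,
    a Int32
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH DELETE,
    d + INTERVAL 1 WEEK TO VOLUME 'cold',
    d + INTERVAL 2 WEEK TO DISK 'archive_disk'
SETTINGS storage_policy = 'tiered';  -- placeholder policy name

-- Aggregate expired rows instead of deleting them.
-- The GROUP BY key is a prefix of the primary key.
CREATE TABLE events_rollup
(
    d DateTime,
    k1 Int32,
    k2 Int32,
    x Int32,
    y Int32
)
ENGINE = MergeTree()
ORDER BY (k1, k2)
TTL d + INTERVAL 1 MONTH GROUP BY k1, k2 SET x = max(x), y = min(y);
```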

											
										
										
### SETTINGS

Additional parameters that control the behavior of the `MergeTree` (optional):

#### index_granularity

`index_granularity` — Maximum number of data rows between the marks of an index. Default value: 8192. See [Data Storage](#mergetree-data-storage).

#### index_granularity_bytes

`index_granularity_bytes` — Maximum size of data granules in bytes. Default value: 10 MB. To restrict the granule size only by number of rows, set to 0 (not recommended). See [Data Storage](#mergetree-data-storage).

#### min_index_granularity_bytes

`min_index_granularity_bytes` — Minimum allowed size of data granules in bytes. Default value: 1024 bytes. Provides a safeguard against accidentally creating tables with a very low `index_granularity_bytes`. See [Data Storage](#mergetree-data-storage).
										
#### enable_mixed_granularity_parts

`enable_mixed_granularity_parts` — Enables or disables transitioning to control the granule size with the `index_granularity_bytes` setting. Before version 19.11, there was only the `index_granularity` setting for restricting granule size. The `index_granularity_bytes` setting improves ClickHouse performance when selecting data from tables with big rows (tens and hundreds of megabytes). If you have tables with big rows, you can enable this setting for the tables to improve the efficiency of `SELECT` queries.

#### use_minimalistic_part_header_in_zookeeper

`use_minimalistic_part_header_in_zookeeper` — Storage method of the data parts headers in ZooKeeper. If `use_minimalistic_part_header_in_zookeeper=1`, then ZooKeeper stores less data. For more information, see the [setting description](/docs/en/operations/server-configuration-parameters/settings.md/#server-settings-use_minimalistic_part_header_in_zookeeper) in “Server configuration parameters”.

#### min_merge_bytes_to_use_direct_io

`min_merge_bytes_to_use_direct_io` — The minimum data volume for merge operation that is required for using direct I/O access to the storage disk. When merging data parts, ClickHouse calculates the total storage volume of all the data to be merged. If the volume exceeds `min_merge_bytes_to_use_direct_io` bytes, ClickHouse reads and writes the data to the storage disk using the direct I/O interface (`O_DIRECT` option). If `min_merge_bytes_to_use_direct_io = 0`, then direct I/O is disabled. Default value: `10 * 1024 * 1024 * 1024` bytes.

#### merge_with_ttl_timeout

`merge_with_ttl_timeout` — Minimum delay in seconds before repeating a merge with delete TTL. Default value: `14400` seconds (4 hours).

#### merge_with_recompression_ttl_timeout

`merge_with_recompression_ttl_timeout` — Minimum delay in seconds before repeating a merge with recompression TTL. Default value: `14400` seconds (4 hours).

#### try_fetch_recompressed_part_timeout

`try_fetch_recompressed_part_timeout` — Timeout (in seconds) before starting a merge with recompression. During this time, ClickHouse tries to fetch the recompressed part from the replica that assigned this merge with recompression. Default value: `7200` seconds (2 hours).

#### write_final_mark

`write_final_mark` — Enables or disables writing the final index mark at the end of data part (after the last byte). Default value: 1. Don't turn it off.

#### merge_max_block_size

`merge_max_block_size` — Maximum number of rows in block for merge operations. Default value: 8192.

#### storage_policy

`storage_policy` — Storage policy. See [Using Multiple Block Devices for Data Storage](#table_engine-mergetree-multiple-volumes).

#### min_bytes_for_wide_part

`min_bytes_for_wide_part`, `min_rows_for_wide_part` — Minimum number of bytes/rows in a data part that can be stored in `Wide` format. You can set one, both or none of these settings. See [Data Storage](#mergetree-data-storage).

#### max_parts_in_total

`max_parts_in_total` — Maximum number of parts in all partitions.

#### max_compress_block_size

`max_compress_block_size` — Maximum size of blocks of uncompressed data before compressing for writing to a table. You can also specify this setting in the global settings (see [max_compress_block_size](/docs/en/operations/settings/settings.md/#max-compress-block-size) setting). The value specified when the table is created overrides the global value for this setting.

#### min_compress_block_size

`min_compress_block_size` — Minimum size of blocks of uncompressed data required for compression when writing the next mark. You can also specify this setting in the global settings (see [min_compress_block_size](/docs/en/operations/settings/settings.md/#min-compress-block-size) setting). The value specified when the table is created overrides the global value for this setting.

#### max_partitions_to_read

`max_partitions_to_read` — Limits the maximum number of partitions that can be accessed in one query. You can also specify the [max_partitions_to_read](/docs/en/operations/settings/merge-tree-settings.md/#max-partitions-to-read) setting in the global settings.
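
As a sketch of how such per-table settings are typically applied, the statements below set a few of them at creation time and change one afterwards. The table name `events_tuned` and the chosen values are illustrative assumptions, not recommendations.

``` sql
CREATE TABLE events_tuned
(
    EventTime DateTime,
    UserID UInt64,
    Payload String
)
ENGINE = MergeTree()
ORDER BY (UserID, EventTime)
SETTINGS
    index_granularity = 4096,       -- fewer rows per granule than the default 8192
    merge_with_ttl_timeout = 3600,  -- allow TTL merges to repeat after 1 hour
    max_partitions_to_read = 12;    -- cap partitions touched by a single query

-- Change a setting on the existing table.
ALTER TABLE events_tuned MODIFY SETTING merge_with_ttl_timeout = 14400;

-- Inspect the settings stored with the table definition.
SHOW CREATE TABLE events_tuned;
```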

											
										
										
**Example of Sections Setting**

``` sql
ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192
```

In the example, we set partitioning by month.

We also set an expression for sampling as a hash by the user ID. This allows you to pseudorandomize the data in the table for each `CounterID` and `EventDate`. If you define a [SAMPLE](/docs/en/sql-reference/statements/select/sample.md/#select-sample-clause) clause when selecting the data, ClickHouse will return an evenly pseudorandom data sample for a subset of users.

The `index_granularity` setting can be omitted because 8192 is the default value.
										
<details markdown="1">

<summary>Deprecated Method for Creating a Table</summary>

:::note
Do not use this method in new projects. If possible, switch old projects to the method described above.
:::

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE [=] MergeTree(date-column [, sampling_expression], (primary, key), index_granularity)
```

**MergeTree() Parameters**

- `date-column` — The name of a column of the [Date](/docs/en/sql-reference/data-types/date.md) type. ClickHouse automatically creates partitions by month based on this column. The partition names are in the `"YYYYMM"` format.
- `sampling_expression` — An expression for sampling.
- `(primary, key)` — Primary key. Type: [Tuple()](/docs/en/sql-reference/data-types/tuple.md).
- `index_granularity` — The granularity of an index. The number of data rows between the “marks” of an index. The value 8192 is appropriate for most tasks.

**Example**

``` sql
MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)
```

The `MergeTree` engine is configured in the same way as in the example above for the main engine configuration method.

</details>
										
## Data Storage {#mergetree-data-storage}

A table consists of data parts sorted by primary key.

When data is inserted in a table, separate data parts are created and each of them is lexicographically sorted by primary key. For example, if the primary key is `(CounterID, Date)`, the data in the part is sorted by `CounterID`, and within each `CounterID`, it is ordered by `Date`.
										
											2018-10-19 11:25:22 +00:00
Data belonging to different partitions is separated into different parts. In the background, ClickHouse merges data parts for more efficient storage. Parts belonging to different partitions are not merged. The merge mechanism does not guarantee that all rows with the same primary key will be in the same data part.

Data parts can be stored in `Wide` or `Compact` format. In `Wide` format, each column is stored in a separate file in the filesystem; in `Compact` format, all columns are stored in one file. `Compact` format can be used to increase the performance of small and frequent inserts.

The data storage format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the table engine. If the number of bytes or rows in a data part is less than the corresponding setting's value, the part is stored in `Compact` format. Otherwise, it is stored in `Wide` format. If none of these settings is set, data parts are stored in `Wide` format.
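As a minimal sketch of how these settings might be applied, the statements below create a table with explicit thresholds and then inspect the format of the resulting part. The table name `example_part_format` and the threshold values are assumptions chosen for illustration only.

```sql
-- Hypothetical table; the thresholds are illustrative, not recommendations.
CREATE TABLE example_part_format
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS min_bytes_for_wide_part = 10485760, min_rows_for_wide_part = 100000;

-- A small insert creates a part that is below both thresholds.
INSERT INTO example_part_format VALUES (1, 'a'), (2, 'b');

-- The part_type column of system.parts reports the format of each part.
SELECT name, part_type, rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'example_part_format'
  AND active;
```

With thresholds like these, a part created by a small insert should be reported as `Compact`, while larger parts produced by bulk inserts or merges may be reported as `Wide`.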
Each data part is logically divided into granules. A granule is the smallest indivisible data set that ClickHouse reads when selecting data. ClickHouse does not split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, ClickHouse creates an index file that stores the marks. For each column, whether it’s in the primary key or not, ClickHouse also stores the same marks. These marks let you find data directly in column files.

The granule size is restricted by the `index_granularity` and `index_granularity_bytes` settings of the table engine. The number of rows in a granule lies in the `[1, index_granularity]` range, depending on the size of the rows. The size of a granule can exceed `index_granularity_bytes` if the size of a single row is greater than the value of the setting. In this case, the size of the granule equals the size of the row.
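To make the relationship between rows and marks more concrete, the following sketch sets both granularity settings explicitly and then compares the number of rows and marks per part. The table name `example_granules` and the setting values are assumptions for illustration.

```sql
-- Hypothetical table with explicit granularity settings (illustrative values).
CREATE TABLE example_granules
(
    id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192, index_granularity_bytes = 10485760;

-- Insert enough rows to span several granules.
INSERT INTO example_granules
SELECT number, toString(number)
FROM numbers(100000);

-- Compare the row count with the number of marks stored for each part.
SELECT name, rows, marks
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'example_granules'
  AND active;
```

Because the rows here are small, the number of rows per granule is expected to be limited by `index_granularity` rather than by `index_granularity_bytes`.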
## Primary Keys and Indexes in Queries {#primary-keys-and-indexes-in-queries}

Take the `(CounterID, Date)` primary key as an example. In this case, the sorting and index can be illustrated as follows:

      Whole data:     [---------------------------------------------]
      CounterID:      [aaaaaaaaaaaaaaaaaabbbbcdeeeeeeeeeeeeefgggggggghhhhhhhhhiiiiiiiiikllllllll]
      Date:           [1111111222222233331233211111222222333211111112122222223111112223311122333]
      Marks:           |      |      |      |      |      |      |      |      |      |      |
                      a,1    a,2    a,3    b,3    e,2    e,3    g,1    h,2    i,1    i,3    l,3
      Marks numbers:   0      1      2      3      4      5      6      7      8      9      10

If the data query specifies:

- `CounterID in ('a', 'h')`, the server reads the data in the ranges of marks `[0, 3)` and `[6, 8)`.
- `CounterID IN ('a', 'h') AND Date = 3`, the server reads the data in the ranges of marks `[1, 3)` and `[7, 8)`.
- `Date = 3`, the server reads the data in the range of marks `[1, 10]`.

The examples above show that it is always more effective to use an index than a full scan.
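One way to observe this kind of mark selection on a real table is the `EXPLAIN indexes = 1` statement, which reports how many parts and granules remain after the primary key is applied. The sketch below is only an illustration: the table `example_hits`, its column `EventDate`, and the generated data are assumptions and do not reproduce the diagram above.

```sql
-- Hypothetical table sorted by (CounterID, EventDate).
CREATE TABLE example_hits
(
    CounterID UInt32,
    EventDate Date
)
ENGINE = MergeTree
ORDER BY (CounterID, EventDate);

-- Generate synthetic rows: 100 counters spread over 30 days.
INSERT INTO example_hits
SELECT number % 100, toDate('2024-01-01') + (number % 30)
FROM numbers(1000000);

-- Show which indexes are used and how many granules are selected.
EXPLAIN indexes = 1
SELECT count()
FROM example_hits
WHERE CounterID IN (1, 7);
```

Changing the `WHERE` clause to a condition on `EventDate` only is a simple way to compare how many granules are selected when the first key column is not constrained.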
A sparse index allows extra data to be read. When reading a single range of the primary key, up to `index_granularity * 2` extra rows in each data block can be read.

Sparse indexes allow you to work with a very large number of table rows, because in most cases, such indexes fit in the computer’s RAM.

ClickHouse does not require a unique primary key. You can insert multiple rows with the same primary key.

You can use `Nullable`-typed expressions in the `PRIMARY KEY` and `ORDER BY` clauses, but it is strongly discouraged. To allow this feature, turn on the [allow_nullable_key](/docs/en/operations/settings/settings.md/#allow-nullable-key) setting. The [NULLS_LAST](/docs/en/sql-reference/statements/select/order-by.md/#sorting-of-special-values) principle applies for `NULL` values in the `ORDER BY` clause.
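For completeness, a minimal sketch of a table with a `Nullable` sorting key is shown below. The table name `example_nullable_key` and its columns are assumptions for illustration; the practice itself remains discouraged.

```sql
-- Hypothetical table; a Nullable key requires allow_nullable_key = 1.
CREATE TABLE example_nullable_key
(
    k Nullable(UInt32),
    v String
)
ENGINE = MergeTree
ORDER BY k
SETTINGS allow_nullable_key = 1;

INSERT INTO example_nullable_key VALUES (2, 'b'), (NULL, 'n'), (1, 'a');

-- With NULLS_LAST ordering, the row with a NULL key is expected to come last.
SELECT k, v
FROM example_nullable_key
ORDER BY k;
```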
### Selecting the Primary Key {#selecting-the-primary-key}

The number of columns in the primary key is not explicitly limited. Depending on the data structure, you can include more or fewer columns in the primary key. This may:

- Improve the performance of an index.

    If the primary key is `(a, b)`, then adding another column `c` will improve the performance if the following conditions are met:

    - There are queries with a condition on column `c`.
    - Long data ranges (several times longer than the `index_granularity`) with identical values for `(a, b)` are common. In other words, adding another column allows you to skip quite long data ranges.

- Improve data compression.

    ClickHouse sorts data by primary key, so the higher the consistency, the better the compression.

- Provide additional logic when merging data parts in the [CollapsingMergeTree](/docs/en/engines/table-engines/mergetree-family/collapsingmergetree.md/#table_engine-collapsingmergetree) and [SummingMergeTree](/docs/en/engines/table-engines/mergetree-family/summingmergetree.md) engines.

    In this case it makes sense to specify the *sorting key* that is different from the primary key.

A long primary key will negatively affect the insert performance and memory consumption, but extra columns in the primary key do not affect ClickHouse performance during `SELECT` queries.

You can create a table without a primary key using the `ORDER BY tuple()` syntax. In this case, ClickHouse stores data in insertion order. If you want to preserve the data order when inserting data with `INSERT ... SELECT` queries, set [max_insert_threads = 1](/docs/en/operations/settings/settings.md/#settings-max-insert-threads).

To select data in the initial order, use [single-threaded](/docs/en/operations/settings/settings.md/#settings-max_threads) `SELECT` queries.
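The following sketch puts these pieces together for a table without a primary key. The table name `example_no_key`, its columns, and the generated data are assumptions for illustration.

```sql
-- Hypothetical table without a primary key.
CREATE TABLE example_no_key
(
    ts DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY tuple();

-- Use a single insert thread so that INSERT ... SELECT keeps the source order.
SET max_insert_threads = 1;

INSERT INTO example_no_key
SELECT
    toDateTime('2024-01-01 00:00:00') + number,
    concat('event ', toString(number))
FROM numbers(10);

-- Read with a single thread to get rows back in the stored order.
SELECT ts, message
FROM example_no_key
SETTINGS max_threads = 1;
```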
### Choosing a Primary Key that Differs from the Sorting Key {#choosing-a-primary-key-that-differs-from-the-sorting-key}

It is possible to specify a primary key (an expression with values that are written in the index file for each mark) that is different from the sorting key (an expression for sorting the rows in data parts). In this case the primary key expression tuple must be a prefix of the sorting key expression tuple.

This feature is helpful when using the [SummingMergeTree](/docs/en/engines/table-engines/mergetree-family/summingmergetree.md) and [AggregatingMergeTree](/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree.md) table engines. In a common case when using these engines, the table has two types of columns: *dimensions* and *measures*. Typical queries aggregate values of measure columns with arbitrary `GROUP BY` and filtering by dimensions. Because SummingMergeTree and AggregatingMergeTree aggregate rows with the same value of the sorting key, it is natural to add all dimensions to it. As a result, the key expression consists of a long list of columns and this list must be frequently updated with newly added dimensions.

In this case it makes sense to leave only a few columns in the primary key that will provide efficient range scans and add the remaining dimension columns to the sorting key tuple.
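A minimal sketch of such a table is shown below. The table name `example_summing` and its dimension and measure columns are assumptions for illustration; the point is only that the `PRIMARY KEY` tuple is a prefix of the `ORDER BY` tuple.

```sql
-- Hypothetical table: two dimensions in the primary key for range scans,
-- the remaining dimensions only in the sorting key.
CREATE TABLE example_summing
(
    CounterID UInt32,
    EventDate Date,
    Browser String,
    Region String,
    Hits UInt64
)
ENGINE = SummingMergeTree
PRIMARY KEY (CounterID, EventDate)
ORDER BY (CounterID, EventDate, Browser, Region);
```

Here the index stores only `(CounterID, EventDate)` for each mark, while rows with equal `(CounterID, EventDate, Browser, Region)` values are the ones summed during merges.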
Changing the sorting key with [ALTER](/docs/en/sql-reference/statements/alter/index.md) is a lightweight operation because when a new column is simultaneously added to the table and to the sorting key, existing data parts do not need to be changed. Since the old sorting key is a prefix of the new sorting key and there is no data in the newly added column, the data is sorted by both the old and new sorting keys at the moment of table modification.
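As a sketch of this kind of change, the statements below add a new dimension column and extend the sorting key in a single `ALTER` query. The table name `example_alter_key` and the column `Browser` are assumptions for illustration.

```sql
-- Hypothetical table with an explicit primary key and sorting key.
CREATE TABLE example_alter_key
(
    CounterID UInt32,
    EventDate Date,
    Hits UInt64
)
ENGINE = SummingMergeTree
PRIMARY KEY (CounterID, EventDate)
ORDER BY (CounterID, EventDate);

-- Add a column and append it to the sorting key in the same ALTER query.
ALTER TABLE example_alter_key
    ADD COLUMN Browser String,
    MODIFY ORDER BY (CounterID, EventDate, Browser);
```

The primary key stays `(CounterID, EventDate)`, which is still a prefix of the new sorting key.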
### Use of Indexes and Partitions in Queries {#use-of-indexes-and-partitions-in-queries}

For `SELECT` queries, ClickHouse analyzes whether an index can be used. An index can be used if the `WHERE/PREWHERE` clause has an expression (as one of the conjunction elements, or entirely) that represents an equality or inequality comparison operation, or if it has `IN` or `LIKE` with a fixed prefix on columns or expressions that are in the primary key or partitioning key, or on certain partially repetitive functions of these columns, or logical relationships of these expressions.

Thus, it is possible to quickly run queries on one or many ranges of the primary key. In this example, queries will be fast when run for a specific tracking tag, for a specific tag and date range, for a specific tag and date, for multiple tags with a date range, and so on.

Let’s look at the engine configured as follows:

```sql
ENGINE MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate)
SETTINGS index_granularity=8192
```

In this case, in queries:

``` sql
SELECT count() FROM table
WHERE EventDate = toDate(now())
AND CounterID = 34;

SELECT count() FROM table
WHERE EventDate = toDate(now())
AND (CounterID = 34 OR CounterID = 42);

SELECT count() FROM table
WHERE ((EventDate >= toDate('2014-01-01')
AND EventDate <= toDate('2014-01-31')) OR EventDate = toDate('2014-05-01'))
AND CounterID IN (101500, 731962, 160656)
AND (CounterID = 101500 OR EventDate != toDate('2014-05-01'));
```

ClickHouse will use the primary key index to trim improper data and the monthly partitioning key to trim partitions that are in improper date ranges.

The queries above show that the index is used even for complex expressions. Reading from the table is organized so that using the index can’t be slower than a full scan.

In the example below, the index can’t be used.

``` sql
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```

To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](/docs/en/operations/settings/settings.md/#settings-force_index_by_date) and [force_primary_key](/docs/en/operations/settings/settings.md/#force-primary-key).
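A minimal sketch of such a check is shown below. The table name `example_visits` and its columns are assumptions for illustration; `force_primary_key` is passed per query through the `SETTINGS` clause.

```sql
-- Hypothetical table partitioned by month and sorted by (CounterID, EventDate).
CREATE TABLE example_visits
(
    CounterID UInt32,
    EventDate Date,
    URL String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate);

-- The condition uses the primary key prefix, so the query is expected to run.
SELECT count()
FROM example_visits
WHERE CounterID = 34
SETTINGS force_primary_key = 1;

-- No condition can use the primary key, so with force_primary_key = 1
-- the query is expected to be rejected with an exception instead of scanning.
SELECT count()
FROM example_visits
WHERE URL LIKE '%upyachka%'
SETTINGS force_primary_key = 1;
```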
The key for partitioning by month allows reading only those data blocks which contain dates from the proper range. In this case, the data block may contain data for many dates (up to an entire month). Within a block, data is sorted by primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.
+								### Use of Index for Partially-monotonic Primary Keys {#use-of-index-for-partially-monotonic-primary-keys}
-												DOCAPI-4148: Use of Index for Partially-Monotonic Primary Keys (#5501)

* DOCAPI-4148: Use of Index for Partially-Monotonic Primary Keys

* DOCAPI-4148: Added a link.

											
										
										
											2019-06-15 14:31:23 +00:00
-												DOCAPI-4148: EN review, RU translation. MergeTree partially monotonic keys (#6085)

* Update http.md

* Update settings.md

* Update mergetree.md

* DOCAPI-6213: RU translastion.

* DOCAPI-4148: 4148

											
										
										
											2019-07-29 10:19:30 +00:00
+								Consider, for example, the days of the month. They form a [monotonic sequence](https://en.wikipedia.org/wiki/Monotonic_function) within one month, but the sequence is not monotonic over longer periods. This is a partially-monotonic sequence. If a user creates a table with a partially-monotonic primary key, ClickHouse creates a sparse index as usual. When a user selects data from this kind of table, ClickHouse analyzes the query conditions. If the user wants to get data between two marks of the index and both marks fall within one month, ClickHouse can use the index in this particular case because it can calculate the distance between the parameters of the query and the index marks.
-												DOCAPI-4148: Use of Index for Partially-Monotonic Primary Keys (#5501)

* DOCAPI-4148: Use of Index for Partially-Monotonic Primary Keys

* DOCAPI-4148: Added a link.

											
										
										
											2019-06-15 14:31:23 +00:00
-												Avoid short syntax

											
										
										
											2021-05-27 19:44:11 +00:00
+								ClickHouse cannot use an index if the values of the primary key in the query parameter range do not represent a monotonic sequence. In this case, ClickHouse uses the full scan method.
-												DOCAPI-4148: Use of Index for Partially-Monotonic Primary Keys (#5501)

* DOCAPI-4148: Use of Index for Partially-Monotonic Primary Keys

* DOCAPI-4148: Added a link.

											
										
										
											2019-06-15 14:31:23 +00:00
-												DOCAPI-4148: EN review, RU translation. MergeTree partially monotonic keys (#6085)

* Update http.md

* Update settings.md

* Update mergetree.md

* DOCAPI-6213: RU translastion.

* DOCAPI-4148: 4148

											
										
										
											2019-07-29 10:19:30 +00:00
+								ClickHouse uses this logic not only for days of the month sequences, but for any primary key that represents a partially-monotonic sequence.
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
-												Remove note about experimental from skipping indexes docs (#11704)

https://github.com/ClickHouse/ClickHouse/pull/7974
											
										
										
											2020-06-16 19:07:22 +00:00
+								### Data Skipping Indexes {#table_engine-mergetree-data_skipping-indexes}
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
-												TRANSLATE-2372: EN review

											
										
										
											2019-07-29 14:38:32 +00:00
+								The index declaration is in the columns section of the `CREATE` query.
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
 								``` sql
-												Add docs

											
										
										
											2023-01-20 15:22:13 +00:00
+								INDEX index_name expr TYPE type(...) [GRANULARITY granularity_value]
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
+								```
-												TRANSLATE-2372: EN review

											
										
										
											2019-07-29 14:38:32 +00:00
+								For tables from the `*MergeTree` family, data skipping indices can be specified.
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
-												TRANSLATE-2372: EN review

											
										
										
											2019-07-29 14:38:32 +00:00
+								These indices aggregate some information about the specified expression on blocks, which consist of `granularity_value` granules (the size of a granule is specified by the `index_granularity` setting of the table engine). These aggregates are then used in `SELECT` queries to reduce the amount of data read from disk by skipping big blocks of data where the `WHERE` condition cannot be satisfied.
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
-												Add docs

											
										
										
											2023-01-20 15:22:13 +00:00
+								The `GRANULARITY` clause can be omitted; the default value of `granularity_value` is 1.
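As a rough illustration of how `GRANULARITY` and `index_granularity` interact, the sketch below declares a `minmax` index on a hypothetical table `granularity_demo` (the table, column, and index names are assumptions made for this example). With `index_granularity = 8192` and `GRANULARITY 3`, each index entry should summarize up to 3 × 8192 = 24576 rows.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE granularity_demo
(
    id UInt64,
    value Int32,
    -- One minmax entry per 3 granules, i.e. per 3 * 8192 = 24576 rows.
    INDEX value_minmax value TYPE minmax GRANULARITY 3
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192;
```

A larger `GRANULARITY` makes the index smaller but coarser, because a whole block of granules is either skipped or read together.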
-												DOCAPI-5203: Direct I/O settings for MergeTree descriptions. EN review and RU translation. (#4848)


											
										
										
											2019-04-08 16:25:37 +00:00
+								**Example**
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								``` sql
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
+								CREATE TABLE table_name
 								(
 								    u64 UInt64,
 								    i32 Int32,
 								    s String,
 								    ...
-												Docs: Update secondary index example

Fixes: #47923

											
										
										
											2023-03-23 22:36:38 +00:00
+								    INDEX idx1 u64 TYPE bloom_filter GRANULARITY 3,
 								    INDEX idx2 u64 * i32 TYPE minmax GRANULARITY 3,
 								    INDEX idx3 u64 * length(s) TYPE set(1000) GRANULARITY 4
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
+								) ENGINE = MergeTree()
 								...
 								```
-												TRANSLATE-2372: EN review

											
										
										
											2019-07-29 14:38:32 +00:00
+								Indices from the example can be used by ClickHouse to reduce the amount of data to read from disk in the following queries:
-												DOCAPI-5203: Direct I/O settings for MergeTree descriptions. EN review and RU translation. (#4848)


											
										
										
											2019-04-08 16:25:37 +00:00
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								``` sql
-												Docs: Update secondary index example

Fixes: #47923

											
										
										
											2023-03-23 22:36:38 +00:00
+								SELECT count() FROM table WHERE u64 == 10;
 								SELECT count() FROM table WHERE u64 * i32 >= 1234;
 								SELECT count() FROM table WHERE u64 * length(s) == 1234;
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
+								```
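One way to see whether a skipping index is actually applied is `EXPLAIN indexes = 1`. The sketch below is an illustration under stated assumptions: the table `explain_demo` and its data are hypothetical, and the exact shape of the `EXPLAIN` output may differ between ClickHouse versions.

```sql
-- Hypothetical table with a single minmax skipping index.
CREATE TABLE explain_demo
(
    u64 UInt64,
    i32 Int32,
    s String,
    INDEX idx2 u64 * i32 TYPE minmax GRANULARITY 3
)
ENGINE = MergeTree
ORDER BY tuple();

-- Fill the table with sample rows so that there are several granules.
INSERT INTO explain_demo
SELECT number, number % 100, toString(number)
FROM numbers(100000);

-- The plan should list the indexes considered for the query together
-- with how many parts and granules remain after each of them.
EXPLAIN indexes = 1
SELECT count() FROM explain_demo WHERE u64 * i32 >= 1234;
```

If the index is usable for the condition, it should appear in the plan as a skip index entry named `idx2`.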
-												+ on

											
										
										
											2023-03-23 10:51:25 +00:00
+								Data skipping indexes can also be created on composite columns:
-												More fixup

											
										
										
											2023-03-23 10:50:40 +00:00
 								```sql
 								-- on columns of type Map:
 								INDEX map_key_index mapKeys(map_column) TYPE bloom_filter
 								INDEX map_value_index mapValues(map_column) TYPE bloom_filter
 								-- on columns of type Tuple:
 								INDEX tuple_1_index tuple_column.1 TYPE bloom_filter
 								INDEX tuple_2_index tuple_column.2 TYPE bloom_filter
 								-- on columns of type Nested:
 								INDEX nested_1_index col.nested_col1 TYPE bloom_filter
 								INDEX nested_2_index col.nested_col2 TYPE bloom_filter
 								```
-												move settings to H3 level

											
										
										
											2022-06-24 15:13:15 +00:00
+								### Available Types of Indices {#available-types-of-indices}
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
+								#### MinMax
-												DOCAPI-5203: Direct I/O settings for MergeTree descriptions. EN review and RU translation. (#4848)


											
										
										
											2019-04-08 16:25:37 +00:00
-												fix formatting of code clocks and lists

											
										
										
											2022-06-30 16:45:10 +00:00
+								Stores the extremes (minimum and maximum) of the specified expression. If the expression is a `tuple`, it stores the extremes for each element of the `tuple`. The stored values are used to skip blocks of data, in the same way as the primary key.
-												DOCAPI-5203: Direct I/O settings for MergeTree descriptions. EN review and RU translation. (#4848)


											
										
										
											2019-04-08 16:25:37 +00:00
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
+								Syntax: `minmax`
 								#### Set
-												unique

											
										
										
											2019-01-29 18:22:12 +00:00
-												fix formatting of code clocks and lists

											
										
										
											2022-06-30 16:45:10 +00:00
+								Stores unique values of the specified expression (no more than `max_rows` rows, `max_rows=0` means “no limits”). Uses the values to check if the `WHERE` expression is not satisfiable on a block of data.
-												docs en

											
										
										
											2019-01-22 14:39:18 +00:00
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
+								Syntax: `set(max_rows)`
 								#### Bloom Filter
-												Mini fix

											
										
										
											2023-01-19 17:02:41 +00:00
+								Stores a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) for the specified columns. An optional `false_positive` parameter with possible values between 0 and 1 specifies the probability of receiving a false positive response from the filter. Default value: 0.025. Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `String`, `FixedString`, `Array`, `LowCardinality`, `Nullable`, `UUID` and `Map`. For the `Map` data type, the client can specify whether the index should be created for keys or for values by using the [mapKeys](/docs/en/sql-reference/functions/tuple-map-functions.md/#mapkeys) or [mapValues](/docs/en/sql-reference/functions/tuple-map-functions.md/#mapvalues) function.
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
 								Syntax: `bloom_filter([false_positive])`
 								#### N-gram Bloom Filter
-												docs

											
										
										
											2019-03-12 15:16:39 +00:00
-												Mini fix

											
										
										
											2023-01-19 17:02:41 +00:00
+								Stores a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) that contains all n-grams from a block of data. Only works with datatypes: [String](/docs/en/sql-reference/data-types/string.md), [FixedString](/docs/en/sql-reference/data-types/fixedstring.md) and [Map](/docs/en/sql-reference/data-types/map.md). Can be used for optimization of `EQUALS`, `LIKE` and `IN` expressions.
-												DOCAPI-5203: Direct I/O settings for MergeTree descriptions. EN review and RU translation. (#4848)


											
										
										
											2019-04-08 16:25:37 +00:00
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
+								Syntax: `ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`
-												fix formatting of code clocks and lists

											
										
										
											2022-06-30 16:45:10 +00:00
+								- `n` — ngram size.
 								- `size_of_bloom_filter_in_bytes` — Bloom filter size in bytes (you can use large values here, for example, 256 or 512, because it can be compressed well).
 								- `number_of_hash_functions` — The number of hash functions used in the Bloom filter.
 								- `random_seed` — The seed for Bloom filter hash functions.
-												DOCAPI-5203: Direct I/O settings for MergeTree descriptions. EN review and RU translation. (#4848)


											
										
										
											2019-04-08 16:25:37 +00:00
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
+								Users can create [UDFs](/docs/en/sql-reference/statements/create/function.md) to estimate the parameters of `ngrambf_v1`. The query statements are as follows:
-												add docs for create bmEstimate functions

											
										
										
											2023-04-26 09:51:39 +00:00
 								```sql
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
+								CREATE FUNCTION bfEstimateFunctions [ON CLUSTER cluster]
 								AS
 								(total_number_of_all_grams, size_of_bloom_filter_in_bits) -> round((size_of_bloom_filter_in_bits / total_number_of_all_grams) * log(2));
 								CREATE FUNCTION bfEstimateBmSize [ON CLUSTER cluster]
 								AS
 								(total_number_of_all_grams, probability_of_false_positives) -> ceil((total_number_of_all_grams * log(probability_of_false_positives)) / log(1 / pow(2, log(2))));
 								CREATE FUNCTION bfEstimateFalsePositive [ON CLUSTER cluster]
 								AS
 								(total_number_of_all_grams, number_of_hash_functions, size_of_bloom_filter_in_bytes) -> pow(1 - exp(-number_of_hash_functions/ (size_of_bloom_filter_in_bytes / total_number_of_all_grams)), number_of_hash_functions);
 								CREATE FUNCTION bfEstimateGramNumber [ON CLUSTER cluster]
 								AS
-												add docs for create bmEstimate functions

											
										
										
											2023-04-26 09:51:39 +00:00
+								(number_of_hash_functions, probability_of_false_positives, size_of_bloom_filter_in_bytes) -> ceil(size_of_bloom_filter_in_bytes / (-number_of_hash_functions / log(1 - exp(log(probability_of_false_positives) / number_of_hash_functions))))
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
+								```
-												add docs for create bmEstimate functions

											
										
										
											2023-04-26 09:51:39 +00:00
+								To use these functions, at least two parameters need to be specified.
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
+								For example, suppose there are 4300 ngrams in the granule and the false positive rate is expected to be less than 0.0001. The other parameters can be estimated by executing the following queries:
-												add docs for create bmEstimate functions

											
										
										
											2023-04-26 09:51:39 +00:00
 								```sql
 								--- estimate number of bits in the filter
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
+								SELECT bfEstimateBmSize(4300, 0.0001) / 8 as size_of_bloom_filter_in_bytes;
-												add docs for create bmEstimate functions

											
										
										
											2023-04-26 09:51:39 +00:00
 								┌─size_of_bloom_filter_in_bytes─┐
 								│                         10304 │
 								└───────────────────────────────┘
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
-												add docs for create bmEstimate functions

											
										
										
											2023-04-26 09:51:39 +00:00
+								--- estimate number of hash functions
 								SELECT bfEstimateFunctions(4300, bfEstimateBmSize(4300, 0.0001)) as number_of_hash_functions
-												Update CREATE TABLE docs

											
										
										
											2023-07-06 10:44:06 +00:00
-												add docs for create bmEstimate functions

											
										
										
											2023-04-26 09:51:39 +00:00
+								┌─number_of_hash_functions─┐
 								│                       13 │
 								└──────────────────────────┘
 								```
 								These functions can also be used to estimate parameters from other starting conditions.
 								The formulas are based on the calculator [here](https://hur.st/bloomfilter).
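The estimated values can then be plugged into an index declaration. The sketch below is only an illustration: the table `ngram_demo`, its columns, the ngram size `4`, and the seed `0` are assumptions, while `10304` bytes and `13` hash functions are the estimates computed above for 4300 ngrams per granule and a false positive rate below 0.0001.

```sql
-- Hypothetical table; only the filter size (10304) and the number of
-- hash functions (13) come from the estimates above.
CREATE TABLE ngram_demo
(
    id UInt64,
    message String,
    INDEX message_ngrams message TYPE ngrambf_v1(4, 10304, 13, 0) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;

-- A substring search whose pattern is at least as long as the ngram
-- size is a candidate for being served by the index.
SELECT count() FROM ngram_demo WHERE message LIKE '%timeout%';
```

If the real data has noticeably more ngrams per granule than assumed, the effective false positive rate will be higher than the target, so the estimate is worth repeating with measured numbers.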
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
+								#### Token Bloom Filter
-												DOCAPI-5203: Direct I/O settings for MergeTree descriptions. EN review and RU translation. (#4848)


											
										
										
											2019-04-08 16:25:37 +00:00
-												fix formatting of code clocks and lists

											
										
										
											2022-06-30 16:45:10 +00:00
+								The same as `ngrambf_v1`, but stores tokens instead of ngrams. Tokens are sequences separated by non-alphanumeric characters.
-												docs

											
										
										
											2019-03-12 15:16:39 +00:00
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
+								Syntax: `tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`
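A short sketch of a token Bloom filter in use follows. The table `logs_demo` and its columns are hypothetical, and the parameter values `(512, 3, 0)` are taken as an arbitrary starting point rather than a recommendation.

```sql
-- Hypothetical log table with a token Bloom filter on the message text.
CREATE TABLE logs_demo
(
    ts DateTime,
    message String,
    INDEX message_tokens message TYPE tokenbf_v1(512, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY ts;

-- hasToken looks for a whole token, which matches how the index stores
-- its data, so granules that cannot contain the token may be skipped.
SELECT count() FROM logs_demo WHERE hasToken(message, 'timeout');
```

Because tokens are split on non-alphanumeric characters, the searched value should itself be a single token, such as `timeout`, and not a phrase.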
-												DOCSUP-15198: output_format_csv_null_representation setting translation (#29977)

* Translation without content changes

* Added ngrambf_v1 and tokenbf_v1 to the RU version

* Update mergetree.md

* Update docs/ru/operations/settings/settings.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Update docs/ru/operations/settings/settings.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Update docs/ru/operations/settings/settings.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Update docs/ru/operations/settings/settings.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Update docs/ru/operations/settings/settings.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Update docs/ru/engines/table-engines/mergetree-family/mergetree.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Update docs/ru/engines/table-engines/mergetree-family/mergetree.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Update docs/ru/engines/table-engines/mergetree-family/mergetree.md

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>

* Corrections and translation

* in EN ver. lines 349-351 were included into the codeblock -- moved them to the proper place
* ...

Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>
											
										
										
											2021-10-18 12:40:26 +00:00
-												Docs: Beautify section on secondary index types

											
										
										
											2023-01-19 16:58:32 +00:00
+								#### Special-purpose
-												Fix bad_cast in Annoy index

- Problem originally found by data type fuzzer
  https://s3.amazonaws.com/clickhouse-test-reports/42180/2f83d8790581dce0ffeec56c137b1d13160cfa7b/fuzzer_astfuzzermsan//report.html

- This commit restricts which data types are allowed for Annoy indexes
  (similar things are done for other index types).

											
										
										
											2022-10-19 12:35:47 +00:00
-												Misc Annoy fixes

											
										
										
											2023-06-08 08:10:40 +00:00
+								- Experimental indexes to support approximate nearest neighbor (ANN) search. See [here](annindexes.md) for details.
-												Initial inverted index docs

											
										
										
											2023-01-19 18:38:07 +00:00
+								- An experimental inverted index to support full-text search. See [here](invertedindexes.md) for details.
-												Bloom filter map added support for has function

											
										
										
											2021-09-22 22:10:14 +00:00
-												move settings to H3 level

											
										
										
											2022-06-24 15:13:15 +00:00
+								### Functions Support {#functions-support}
-												DOCAPI-7695: Functions support for indexes (#6784)


											
										
										
											2019-09-06 09:07:23 +00:00
-												DOCAPI-7695: EN review, RU translation. Functions support for indexes (#7045)

* Typo fix.

* DOCAPI-7695: Typo fixed

* Update mergetree.md (#49)

* DOCAPI-7695: RU translation

* Update mergetree.md

											
										
										
											2019-09-23 23:50:26 +00:00
+								Conditions in the `WHERE` clause contain calls of functions that operate on columns. If a column is part of an index, ClickHouse tries to use this index when evaluating the functions. ClickHouse supports different subsets of functions for different index types.
-												DOCAPI-7695: Functions support for indexes (#6784)


											
										
										
											2019-09-06 09:07:23 +00:00
-												Docs: Update index support of has(), hasAny(), hasAll()

											
										
										
											2023-03-04 17:27:47 +00:00
+								Indexes of type `set` can be utilized by all functions. The other index types are supported as follows:
-												DOCAPI-7695: Functions support for indexes (#6784)


											
										
										
											2019-09-06 09:07:23 +00:00
-												Implement tokenbf_v1 index utilization for hasTokenCaseInsensitive

											
										
										
											2023-02-09 23:42:08 +00:00
+								| Function (operator) / Index                                                                                | primary key | minmax | ngrambf_v1 | tokenbf_v1 | bloom_filter | inverted |
 								|------------------------------------------------------------------------------------------------------------|-------------|--------|------------|------------|--------------|----------|
-												Small follow-up to #46252

											
										
										
											2023-03-28 20:38:43 +00:00
+								| [equals (=, ==)](/docs/en/sql-reference/functions/comparison-functions.md/#function-equals)                | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
 								| [notEquals(!=, &lt;&gt;)](/docs/en/sql-reference/functions/comparison-functions.md/#function-notequals)    | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
 								| [like](/docs/en/sql-reference/functions/string-search-functions.md/#function-like)                         | ✔           | ✔      | ✔          | ✔          | ✗            | ✔        |
 								| [notLike](/docs/en/sql-reference/functions/string-search-functions.md/#function-notlike)                   | ✔           | ✔      | ✔          | ✔          | ✗            | ✔        |
 								| [startsWith](/docs/en/sql-reference/functions/string-functions.md/#startswith)                             | ✔           | ✔      | ✔          | ✔          | ✗            | ✔        |
 								| [endsWith](/docs/en/sql-reference/functions/string-functions.md/#endswith)                                 | ✗           | ✗      | ✔          | ✔          | ✗            | ✔        |
 								| [multiSearchAny](/docs/en/sql-reference/functions/string-search-functions.md/#function-multisearchany)     | ✗           | ✗      | ✔          | ✗          | ✗            | ✔        |
 								| [in](/docs/en/sql-reference/functions/in-functions#in-functions)                                           | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
 								| [notIn](/docs/en/sql-reference/functions/in-functions#in-functions)                                        | ✔           | ✔      | ✔          | ✔          | ✔            | ✔        |
 								| [less (<)](/docs/en/sql-reference/functions/comparison-functions.md/#function-less)                        | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
 								| [greater (>)](/docs/en/sql-reference/functions/comparison-functions.md/#function-greater)                  | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
 								| [lessOrEquals (<=)](/docs/en/sql-reference/functions/comparison-functions.md/#function-lessorequals)       | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
 								| [greaterOrEquals (>=)](/docs/en/sql-reference/functions/comparison-functions.md/#function-greaterorequals) | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
 								| [empty](/docs/en/sql-reference/functions/array-functions#function-empty)                                   | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
 								| [notEmpty](/docs/en/sql-reference/functions/array-functions#function-notempty)                             | ✔           | ✔      | ✗          | ✗          | ✗            | ✗        |
 								| [has](/docs/en/sql-reference/functions/array-functions#function-has)                                       | ✗           | ✗      | ✔          | ✔          | ✔            | ✔        |
 								| [hasAny](/docs/en/sql-reference/functions/array-functions#function-hasAny)                                 | ✗           | ✗      | ✗          | ✗          | ✔            | ✗        |
 								| [hasAll](/docs/en/sql-reference/functions/array-functions#function-hasAll)                                 | ✗           | ✗      | ✗          | ✗          | ✔            | ✗        |
 								| hasToken                                                                                                   | ✗           | ✗      | ✗          | ✔          | ✗            | ✔        |
 								| hasTokenOrNull                                                                                             | ✗           | ✗      | ✗          | ✔          | ✗            | ✔        |
 								| hasTokenCaseInsensitive (*)                                                                                | ✗           | ✗      | ✗          | ✔          | ✗            | ✗        |
 								| hasTokenCaseInsensitiveOrNull (*)                                                                          | ✗           | ✗      | ✗          | ✔          | ✗            | ✗        |
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
 								Functions with a constant argument that is less than ngram size can’t be used by `ngrambf_v1` for query optimization.
-												Small follow-up to #46252

											
										
										
											2023-03-28 20:38:43 +00:00
-												Fix docs for case insensitive searches with a token bloom filter
											
										
										
											2023-04-13 20:04:55 +00:00
+								(*) For `hasTokenCaseInsensitive` and `hasTokenCaseInsensitiveOrNull` to be effective, the `tokenbf_v1` index must be created on lowercased data, for example `INDEX idx (lower(str_col)) TYPE tokenbf_v1(512, 3, 0)`.
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
											2022-04-09 13:29:05 +00:00
+								:::note
 								Bloom filters can have false positive matches, so the `ngrambf_v1`, `tokenbf_v1`, and `bloom_filter` indexes cannot be used for optimizing queries where the result of a function is expected to be false.
 								For example:
-												DOCAPI-7695: Functions support for indexes (#6784)


											
										
										
											2019-09-06 09:07:23 +00:00
-												Docs: Replace annoying three spaces in enumerations by a single space

											
										
										
											2023-04-19 15:55:29 +00:00
+								- Can be optimized:
 								    - `s LIKE '%test%'`
 								    - `NOT s NOT LIKE '%test%'`
 								    - `s = 1`
 								    - `NOT s != 1`
 								    - `startsWith(s, 'test')`
 								- Can not be optimized:
 								    - `NOT s LIKE '%test%'`
 								    - `s NOT LIKE '%test%'`
 								    - `NOT s = 1`
 								    - `s != 1`
 								    - `NOT startsWith(s, 'test')`
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
											2022-04-09 13:29:05 +00:00
+								:::
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
-												Revert "Revert "Add Annoy index""

This reverts commit 6fdfb964d0a70e5d2a78cb727bdf30d0ba5a1a34.

											
										
										
											2022-08-30 15:26:56 +00:00
-												Initial

											
										
										
											2021-08-16 01:57:09 +00:00
+								## Projections {#projections}
-												update links

											
										
										
											2022-11-09 00:17:58 +00:00
+								Projections are like [materialized views](/docs/en/sql-reference/statements/create/view.md/#materialized) but are defined at the part level. They provide consistency guarantees along with automatic usage in queries.
-												fix note

											
										
										
											2022-08-12 18:40:09 +00:00
 								:::note
-												update links

											
										
										
											2022-11-09 00:17:58 +00:00
+								When you are implementing projections you should also consider the [force_optimize_projection](/docs/en/operations/settings/settings.md/#force-optimize-projection) setting.
-												Update docs/en/engines/table-engines/mergetree-family/mergetree.md
											
										
										
											2022-07-25 16:58:57 +00:00
+								:::
-												fix note

											
										
										
											2022-08-12 18:40:09 +00:00
-												update links

											
										
										
											2022-11-09 00:17:58 +00:00
+								Projections are not supported in the `SELECT` statements with the [FINAL](/docs/en/sql-reference/statements/select/from.md/#select-from-final) modifier.
-												Update mergetree.md

Basic info about projections based on RFC https://github.com/ClickHouse/ClickHouse/issues/14730
											
										
										
											2021-06-18 16:37:37 +00:00
-												Initial

											
										
										
											2021-08-16 01:57:09 +00:00
+								### Projection Query {#projection-query}
-												Bloom filter map added support for has function

											
										
										
											2021-09-22 22:10:14 +00:00
+								A projection query is what defines a projection. It implicitly selects data from the parent table.
-												Apply suggestions from code review

Co-authored-by: Anna <42538400+adevyatova@users.noreply.github.com>
											
										
										
											2021-08-20 12:36:25 +00:00
+								**Syntax**
-												Update mergetree.md

Basic info about projections based on RFC https://github.com/ClickHouse/ClickHouse/issues/14730
											
										
										
											2021-06-18 16:37:37 +00:00
-												Initial

											
										
										
											2021-08-16 01:57:09 +00:00
+								```sql
 								SELECT <column list expr> [GROUP BY] <group keys expr> [ORDER BY] <expr>
 								```
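To make the syntax concrete, here is a minimal sketch of a projection declared inside `CREATE TABLE`. The table `visits_demo`, its columns, and the projection name `duration_by_url` are assumptions made for this example.

```sql
-- Hypothetical table with one aggregating projection.
CREATE TABLE visits_demo
(
    user_id UInt64,
    url String,
    duration UInt32,
    PROJECTION duration_by_url
    (
        SELECT url, sum(duration) GROUP BY url
    )
)
ENGINE = MergeTree
ORDER BY user_id;

-- This query has the same shape as the projection query, so it is a
-- candidate for being answered from the projection. With
-- force_optimize_projection = 1 the query is expected to fail if no
-- projection can be used.
SELECT url, sum(duration)
FROM visits_demo
GROUP BY url
SETTINGS force_optimize_projection = 1;
```

Note that the projection query has no `FROM` clause, because it implicitly reads from the parent table.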
-												update links

											
										
										
											2022-11-09 00:17:58 +00:00
+								Projections can be modified or dropped with the [ALTER](/docs/en/sql-reference/statements/alter/projection.md) statement.
### Projection Storage {#projection-storage}

Projections are stored inside the part directory. A projection is similar to an index but contains a subdirectory that stores an anonymous `MergeTree` table's part. The table is induced by the definition query of the projection. If there is a `GROUP BY` clause, the underlying storage engine becomes [AggregatingMergeTree](aggregatingmergetree.md), and all aggregate functions are converted to `AggregateFunction`. If there is an `ORDER BY` clause, the `MergeTree` table uses it as its primary key expression. During the merge process the projection part is merged via its storage's merge routine. The checksum of the parent table's part is combined with the projection's part. Other maintenance jobs are similar to skip indices.

### Query Analysis {#projection-query-analysis}

1. Check if the projection can be used to answer the given query, that is, it generates the same answer as querying the base table.
2. Select the best feasible match, which contains the fewest granules to read.
3. The query pipeline which uses projections will be different from the one that uses the original parts. If the projection is absent in some parts, we can add the pipeline to "project" it on the fly.
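One way to observe the outcome of this analysis is to check which projections a finished query actually read. The sketch below assumes that query logging is enabled and that the `projections` column is available in `system.query_log` in your ClickHouse version. The table name in the filter is a hypothetical placeholder.

```sql
-- Make sure recent log entries are flushed to the system table.
SYSTEM FLUSH LOGS;

SELECT
    query,
    projections   -- names of the projections used by the query, empty if none
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%FROM visits%'   -- hypothetical table name
ORDER BY event_time DESC
LIMIT 5;
```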
## Concurrent Data Access {#concurrent-data-access}

For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.

Reading from a table is automatically parallelized.
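To inspect the set of parts that a query would currently read from, you can look at `system.parts`. This is only a sketch, and `example_table` is a placeholder table name.

```sql
SELECT
    partition,
    name,
    rows,
    active      -- 1 for parts in the current set, 0 for parts already replaced by a merge
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'example_table'
ORDER BY modification_time;
```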
## TTL for Columns and Tables {#table_engine-mergetree-ttl}

Determines the lifetime of values.

The `TTL` clause can be set for the whole table and for each individual column. Table-level `TTL` can also specify the logic of automatically moving data between disks and volumes, or recompressing parts where all the data has expired.

Expressions must evaluate to the [Date](/docs/en/sql-reference/data-types/date.md) or [DateTime](/docs/en/sql-reference/data-types/datetime.md) data type.

**Syntax**

Setting time-to-live for a column:

``` sql
TTL time_column
TTL time_column + interval
```

To define `interval`, use [time interval](/docs/en/sql-reference/operators/index.md#operators-datetime) operators, for example:

``` sql
TTL date_time + INTERVAL 1 MONTH
TTL date_time + INTERVAL 15 HOUR
```
### Column TTL {#mergetree-column-ttl}

When the values in the column expire, ClickHouse replaces them with the default values for the column data type. If all the column values in the data part expire, ClickHouse deletes this column from the data part in the filesystem.

The `TTL` clause can’t be used for key columns.

**Examples**

#### Creating a table with `TTL`:

``` sql
CREATE TABLE example_table
(
    d DateTime,
    a Int TTL d + INTERVAL 1 MONTH,
    b Int TTL d + INTERVAL 1 MONTH,
    c String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d;
```

#### Adding TTL to a column of an existing table

``` sql
ALTER TABLE example_table
    MODIFY COLUMN
    c String TTL d + INTERVAL 1 DAY;
```

#### Altering TTL of the column

``` sql
ALTER TABLE example_table
    MODIFY COLUMN
    c String TTL d + INTERVAL 1 MONTH;
```
### Table TTL {#mergetree-table-ttl}

A table can have one expression for removing expired rows and multiple expressions for automatically moving parts between [disks or volumes](#table_engine-mergetree-multiple-volumes). When rows in the table expire, ClickHouse deletes all corresponding rows. For moving or recompressing parts, all rows of a part must satisfy the `TTL` expression criteria.

``` sql
TTL expr
    [DELETE|RECOMPRESS codec_name1|TO DISK 'xxx'|TO VOLUME 'xxx'][, DELETE|RECOMPRESS codec_name2|TO DISK 'aaa'|TO VOLUME 'bbb'] ...
    [WHERE conditions]
    [GROUP BY key_expr [SET v1 = aggr_func(v1) [, v2 = aggr_func(v2) ...]] ]
```

A type of TTL rule may follow each TTL expression. It determines the action to be performed once the expression is satisfied (reaches the current time):

- `DELETE` - delete expired rows (default action);
- `RECOMPRESS codec_name` - recompress data part with the `codec_name`;
- `TO DISK 'aaa'` - move part to the disk `aaa`;
- `TO VOLUME 'bbb'` - move part to the volume `bbb`;
- `GROUP BY` - aggregate expired rows.

The `DELETE` action can be used together with a `WHERE` clause to delete only some of the expired rows based on a filtering condition:

``` sql
TTL time_column + INTERVAL 1 MONTH DELETE WHERE column = 'value'
```

The `GROUP BY` expression must be a prefix of the table primary key.

If a column is not part of the `GROUP BY` expression and is not set explicitly in the `SET` clause, in the result row it contains an arbitrary value from the grouped rows (as if the aggregate function `any` were applied to it).
**Examples**

#### Creating a table with `TTL`:

``` sql
CREATE TABLE example_table
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH DELETE,
    d + INTERVAL 1 WEEK TO VOLUME 'aaa',
    d + INTERVAL 2 WEEK TO DISK 'bbb';
```

#### Altering `TTL` of the table:

``` sql
ALTER TABLE example_table
    MODIFY TTL d + INTERVAL 1 DAY;
```
#### Creating a table where rows expire after one month and expired rows dated on a Monday are deleted:

``` sql
CREATE TABLE table_with_where
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH DELETE WHERE toDayOfWeek(d) = 1;
```

#### Creating a table, where expired rows are recompressed:

```sql
CREATE TABLE table_for_recompression
(
    d DateTime,
    key UInt64,
    value String
) ENGINE MergeTree()
ORDER BY tuple()
PARTITION BY key
TTL d + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(17)), d + INTERVAL 1 YEAR RECOMPRESS CODEC(LZ4HC(10))
SETTINGS min_rows_for_wide_part = 0, min_bytes_for_wide_part = 0;
```

#### Creating a table, where expired rows are aggregated:

In the result rows, `x` contains the maximum value across the grouped rows, `y` contains the minimum value, and `d` contains an arbitrary value from the grouped rows.

``` sql
CREATE TABLE table_for_aggregation
(
    d DateTime,
    k1 Int,
    k2 Int,
    x Int,
    y Int
)
ENGINE = MergeTree
ORDER BY (k1, k2)
TTL d + INTERVAL 1 MONTH GROUP BY k1, k2 SET x = max(x), y = min(y);
```
### Removing Expired Data {#mergetree-removing-expired-data}

Data with an expired `TTL` is removed when ClickHouse merges data parts.

When ClickHouse detects that data is expired, it performs an off-schedule merge. To control the frequency of such merges, you can set `merge_with_ttl_timeout`. If the value is too low, ClickHouse performs many off-schedule merges, which may consume a lot of resources.

If you perform a `SELECT` query between merges, you may get expired data. To avoid this, use the [OPTIMIZE](/docs/en/sql-reference/statements/optimize.md) query before `SELECT`.
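A sketch of the two controls mentioned above, reusing the `example_table` name from the earlier examples. The timeout value is an arbitrary illustration, not a recommendation.

```sql
-- Allow TTL-triggered merges at most once per hour (the value is in seconds).
ALTER TABLE example_table MODIFY SETTING merge_with_ttl_timeout = 3600;

-- Force a merge so that expired data is removed before reading.
-- This can be expensive on large tables.
OPTIMIZE TABLE example_table FINAL;

SELECT count() FROM example_table;
```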
**See Also**

- [ttl_only_drop_parts](/docs/en/operations/settings/settings.md/#ttl_only_drop_parts) setting

## Disk types

In addition to local block devices, ClickHouse supports these storage types:
- [`s3` for S3 and MinIO](#table_engine-mergetree-s3)
- [`gcs` for GCS](/docs/en/integrations/data-ingestion/gcs/index.md/#creating-a-disk)
- [`blob_storage_disk` for Azure Blob Storage](#table_engine-mergetree-azure-blob-storage)
- [`hdfs` for HDFS](#hdfs-storage)
- [`web` for read-only from web](#web-storage)
- [`cache` for local caching](/docs/en/operations/storing-data.md/#using-local-cache)
- [`s3_plain` for backups to S3](/docs/en/operations/backup#backuprestore-using-an-s3-disk)
## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}

### Introduction {#introduction}

`MergeTree` family table engines can store data on multiple block devices. For example, this can be useful when the data of a certain table is implicitly split into “hot” and “cold”. The most recent data is regularly requested but requires only a small amount of space. In contrast, the fat-tailed historical data is requested rarely. If several disks are available, the “hot” data may be located on fast disks (for example, NVMe SSDs or in memory), while the “cold” data is placed on relatively slow ones (for example, HDD).

A data part is the minimum movable unit for `MergeTree`-engine tables. The data belonging to one part is stored on one disk. Data parts can be moved between disks in the background (according to user settings) as well as by means of [ALTER](/docs/en/sql-reference/statements/alter/partition.md/#alter_move-partition) queries.
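Manual moves can be sketched as follows. The table name, partition value, part name, and the disk and volume names are all hypothetical and must match your own tables and storage configuration.

```sql
-- Move a whole partition to a volume of the table's storage policy.
ALTER TABLE example_table MOVE PARTITION 202301 TO VOLUME 'cold';

-- Move a single data part to a specific disk.
ALTER TABLE example_table MOVE PART '202301_1_1_0' TO DISK 'slow_disk';

-- Check on which disk each active part is stored.
SELECT name, disk_name
FROM system.parts
WHERE table = 'example_table' AND active;
```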
-												Add a tentative english translation of the 'tiered storage' section

											
										
										
											2019-10-17 18:33:18 +00:00
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								### Terms {#terms}
-												Add a tentative english translation of the 'tiered storage' section

											
										
										
											2019-10-17 18:33:18 +00:00
-												Docs: Replace annoying three spaces in enumerations by a single space

											
										
										
											2023-04-19 15:55:29 +00:00
+								- Disk — Block device mounted to the filesystem.
 								- Default disk — Disk that stores the path specified in the [path](/docs/en/operations/server-configuration-parameters/settings.md/#server_configuration_parameters-path) server setting.
 								- Volume — Ordered set of equal disks (similar to [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures)).
 								- Storage policy — Set of volumes and the rules for moving data between them.
-												DOCS-439: RU review. EN translation. Data storage policies. (#7597)

* CLICKHOUSEDOCS-439: RU review. EN translation.

* Update docs/en/operations/table_engines/mergetree.md

Co-Authored-By: Ivan Blinkov <github@blinkov.ru>

* Update docs/en/operations/table_engines/mergetree.md

Co-Authored-By: Ivan Blinkov <github@blinkov.ru>

* Update docs/en/operations/table_engines/mergetree.md

Co-Authored-By: Ivan Blinkov <github@blinkov.ru>

* Update docs/en/query_language/alter.md

Co-Authored-By: Ivan Blinkov <github@blinkov.ru>

* CLICKHOUSEDOCS-439: The RU version is syncronized with EN.

											
										
										
											2019-11-07 12:24:42 +00:00
-												update links

											
										
										
											2022-11-09 00:17:58 +00:00
+								The names given to the described entities can be found in the system tables, [system.storage_policies](/docs/en/operations/system-tables/storage_policies.md/#system_tables-storage_policies) and [system.disks](/docs/en/operations/system-tables/disks.md/#system_tables-disks). To apply one of the configured storage policies for a table, use the `storage_policy` setting of `MergeTree`-engine family tables.
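
As a quick check, the configured names can be listed with queries like the following. This is only a sketch: the column names used here (`policy_name`, `volume_name`, `disks`, `name`, `path`) are assumed from recent ClickHouse releases and may differ in older versions.

```sql
-- Storage policies, with their volumes and the disks in each volume.
SELECT policy_name, volume_name, disks
FROM system.storage_policies;

-- Configured disks and the paths they store data under.
SELECT name, path
FROM system.disks;
```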

### Configuration {#table_engine-mergetree-multiple-volumes_configure}

Disks, volumes and storage policies should be declared inside the `<storage_configuration>` tag in a file in the `config.d` directory.

:::tip
Disks can also be declared in the `SETTINGS` section of a query. This is useful for ad-hoc analysis to temporarily attach a disk that is, for example, hosted at a URL. See [dynamic storage](#dynamic-storage) for more details.
:::

Configuration structure:

``` xml
<storage_configuration>
    <disks>
        <disk_name_1> <!-- disk name -->
            <path>/mnt/fast_ssd/clickhouse/</path>
        </disk_name_1>
        <disk_name_2>
            <path>/mnt/hdd1/clickhouse/</path>
            <keep_free_space_bytes>10485760</keep_free_space_bytes>
        </disk_name_2>
        <disk_name_3>
            <path>/mnt/hdd2/clickhouse/</path>
            <keep_free_space_bytes>10485760</keep_free_space_bytes>
        </disk_name_3>

        ...
    </disks>

    ...
</storage_configuration>
```

Tags:

- `<disk_name_N>` — Disk name. Names must be different for all disks.
- `path` — Path under which the server will store data (`data` and `shadow` folders). It must end with `/`.
- `keep_free_space_bytes` — The amount of free disk space to be reserved.

The order of the disk definitions is not important.

Storage policies configuration markup:

``` xml
<storage_configuration>
    ...
    <policies>
        <policy_name_1>
            <volumes>
                <volume_name_1>
                    <disk>disk_name_from_disks_configuration</disk>
                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
                    <load_balancing>round_robin</load_balancing>
                </volume_name_1>
                <volume_name_2>
                    <!-- configuration -->
                </volume_name_2>
                <!-- more volumes -->
            </volumes>
            <move_factor>0.2</move_factor>
        </policy_name_1>
        <policy_name_2>
            <!-- configuration -->
        </policy_name_2>

        <!-- more policies -->
    </policies>
    ...
</storage_configuration>
```

Tags:

- `policy_name_N` — Policy name. Policy names must be unique.
- `volume_name_N` — Volume name. Volume names must be unique.
- `disk` — A disk within a volume.
- `max_data_part_size_bytes` — The maximum size of a part that can be stored on any of the volume’s disks. If the size of a merged part is estimated to be bigger than `max_data_part_size_bytes`, the part is written to the next volume. This lets you keep new and small parts on a hot (SSD) volume and move them to a cold (HDD) volume when they grow large. Do not use this setting if your policy has only one volume.
- `move_factor` — When the share of available space on a volume drops below this factor, data automatically starts to move to the next volume, if any (default: 0.1). ClickHouse sorts existing parts by size from largest to smallest (in descending order) and selects parts whose total size is sufficient to meet the `move_factor` condition. If the total size of all parts is insufficient, all parts will be moved.
- `prefer_not_to_merge` — Disables merging of data parts on this volume. When this setting is enabled, merging data on this volume is not allowed. This allows controlling how ClickHouse works with slow disks.
- `perform_ttl_move_on_insert` — Controls TTL moves on data part INSERT. By default (when enabled), a data part that has already expired under a TTL move rule is written directly to the volume or disk declared in that rule. This can significantly slow down inserts if the destination volume or disk is slow (e.g. S3). When disabled, an already expired data part is first written to the default volume and then immediately moved to the TTL volume.
- `load_balancing` — Policy for disk balancing, `round_robin` or `least_used`.

Configuration examples:

``` xml
<storage_configuration>
    ...
    <policies>
        <hdd_in_order> <!-- policy name -->
            <volumes>
                <single> <!-- volume name -->
                    <disk>disk1</disk>
                    <disk>disk2</disk>
                </single>
            </volumes>
        </hdd_in_order>

        <moving_from_ssd_to_hdd>
            <volumes>
                <hot>
                    <disk>fast_ssd</disk>
                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
                </hot>
                <cold>
                    <disk>disk1</disk>
                </cold>
            </volumes>
            <move_factor>0.2</move_factor>
        </moving_from_ssd_to_hdd>

        <small_jbod_with_external_no_merges>
            <volumes>
                <main>
                    <disk>jbod1</disk>
                </main>
                <external>
                    <disk>external</disk>
                    <prefer_not_to_merge>true</prefer_not_to_merge>
                </external>
            </volumes>
        </small_jbod_with_external_no_merges>
    </policies>
    ...
</storage_configuration>
```

In the given example, the `hdd_in_order` policy implements the [round-robin](https://en.wikipedia.org/wiki/Round-robin_scheduling) approach. Since this policy defines only one volume (`single`), the data parts are stored on all its disks in circular order. Such a policy can be quite useful if several similar disks are mounted to the system but RAID is not configured. Keep in mind that each individual disk drive is not reliable, and you might want to compensate for that with a replication factor of 3 or more.

If there are different kinds of disks available in the system, the `moving_from_ssd_to_hdd` policy can be used instead. The volume `hot` consists of an SSD disk (`fast_ssd`), and the maximum size of a part that can be stored on this volume is 1GB. All parts larger than 1GB will be stored directly on the `cold` volume, which contains an HDD disk `disk1`.
Also, once the disk `fast_ssd` is more than 80% full, data will be transferred to `disk1` by a background process.
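
The 80% figure follows from `move_factor = 0.2`: moves are expected to start once less than 20% of the space is still available. To see how close a disk is to that threshold, its free-space share can be computed from `system.disks`. This is a sketch that assumes the `free_space` and `total_space` columns; for a single-disk volume such as `hot`, the disk's share is the volume's share.

```sql
-- free_ratio below the policy's move_factor (0.2 in the example)
-- suggests that background moves to the next volume are due.
SELECT
    name,
    formatReadableSize(free_space)  AS free,
    formatReadableSize(total_space) AS total,
    round(free_space / total_space, 2) AS free_ratio
FROM system.disks;
```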

The order of volume enumeration within a storage policy is important. Once a volume is overfilled, data is moved to the next one. The order of disk enumeration is important as well because data is stored on the disks in turns.

When creating a table, one can apply one of the configured storage policies to it:

``` sql
CREATE TABLE table_with_non_default_policy (
    EventDate Date,
    OrderID UInt64,
    BannerID UInt64,
    SearchPhrase String
) ENGINE = MergeTree
ORDER BY (OrderID, BannerID)
PARTITION BY toYYYYMM(EventDate)
SETTINGS storage_policy = 'moving_from_ssd_to_hdd'
```
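
Once a table uses this policy, its parts can also be relocated explicitly between the policy's volumes and disks with `ALTER TABLE ... MOVE`. The statements below are a sketch: the partition value `202301` and the part name `202301_1_1_0` are placeholders, not values taken from a real table.

```sql
-- Move a whole partition to the `cold` volume of the policy.
ALTER TABLE table_with_non_default_policy
    MOVE PARTITION 202301 TO VOLUME 'cold';

-- Move a single part to a specific disk (placeholder part name).
ALTER TABLE table_with_non_default_policy
    MOVE PART '202301_1_1_0' TO DISK 'disk1';
```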

The `default` storage policy implies using only one volume, which consists of only one disk given in `<path>`.
You can change the storage policy after table creation with an `ALTER TABLE ... MODIFY SETTING` query. The new policy must include all old disks and volumes with the same names.
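
For illustration, such a change might look like the statement below. The policy name `hot_warm_cold` is hypothetical: it stands for a policy that is already defined in the server configuration and that contains every disk and volume of the table's current policy under the same names.

```sql
-- `hot_warm_cold` is a hypothetical, already configured policy name.
ALTER TABLE table_with_non_default_policy
    MODIFY SETTING storage_policy = 'hot_warm_cold';
```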

The number of threads performing background moves of data parts can be changed with the [background_move_pool_size](/docs/en/operations/server-configuration-parameters/settings.md/#background_move_pool_size) setting.
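
To get a rough idea of how busy the move pool is, the current metrics can be filtered by name. This is a sketch only: the `BackgroundMovePool` metric prefix is an assumption about how recent releases name these metrics, and the query returns no rows if no such metrics exist in your version.

```sql
-- Current activity of the background move pool (metric names assumed).
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'BackgroundMovePool%';
```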

### Dynamic Storage

This example query shows how to attach a table stored at a URL and configure the remote storage within the query. The web storage is not configured in the ClickHouse configuration files; all the settings are in the CREATE/ATTACH query.

:::note
The example uses `type=web`, but any disk type can be configured as dynamic, even a local disk. Local disks require the path argument to be inside the directory given by the server config parameter `custom_local_disks_base_directory`, which has no default, so set that as well when using a local disk.
:::

#### Example dynamic web storage

:::tip
A [demo dataset](https://github.com/ClickHouse/web-tables-demo) is hosted on GitHub. To prepare your own tables for web storage, see the tool [clickhouse-static-files-uploader](/docs/en/operations/storing-data.md/#storing-data-on-webserver).
:::

In this `ATTACH TABLE` query the `UUID` provided matches the directory name of the data, and the endpoint is the URL for the raw GitHub content.

```sql
# highlight-next-line
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
    is_new UInt8,
    duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
  # highlight-start
  SETTINGS disk = disk(
      type=web,
      endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
      );
  # highlight-end
```
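
Once attached, the table can be read like any other `MergeTree` table. Keep in mind that a `web` disk is read-only, so inserts and mutations are not expected to work. A small sketch that uses only columns from the definition above:

```sql
-- Average price per year, read directly from the web-hosted parts.
SELECT
    toYear(date) AS year,
    round(avg(price)) AS avg_price
FROM uk_price_paid
GROUP BY year
ORDER BY year;
```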

### Nested Dynamic Storage

This example query builds on the above dynamic disk configuration and shows how to use a local disk to cache data from a table stored at a URL. Neither the cache disk nor the web storage is configured in the ClickHouse configuration files; both are configured in the CREATE/ATTACH query settings.

In the settings highlighted below, notice that the disk of `type=web` is nested within the disk of `type=cache`.

```sql
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
    is_new UInt8,
    duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
  # highlight-start
  SETTINGS disk = disk(
    type=cache,
    max_size='1Gi',
    path='/var/lib/clickhouse/custom_disk_cache/',
    disk=disk(
      type=web,
      endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
      )
  );
  # highlight-end
```
-												Headers order changed

											
										
										
											2021-03-12 10:00:46 +00:00
+								### Details {#details}
 								In the case of `MergeTree` tables, data is getting to disk in different ways:
 								- As a result of an insert (`INSERT` query).
 								- During background merges and [mutations](/docs/en/sql-reference/statements/alter/index.md#alter-mutations).
 								- When downloading from another replica.
 								- As a result of partition freezing [ALTER TABLE … FREEZE PARTITION](/docs/en/sql-reference/statements/alter/partition.md/#alter_freeze-partition).
 								In all these cases except for mutations and partition freezing, a part is stored on a volume and a disk according to the given storage policy:
 								1. The first volume (in the order of definition) that has enough disk space for storing a part (`unreserved_space > current_part_size`) and allows for storing parts of a given size (`max_data_part_size_bytes > current_part_size`) is chosen.
 								2. Within this volume, the disk chosen is the one that follows the disk used for storing the previous chunk of data and that has more free space than the part size (`unreserved_space - keep_free_space_bytes > current_part_size`).
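The inputs to these two rules can be inspected in the system tables. The queries below are a minimal sketch: the policy name `my_policy` is hypothetical, and the column names are those found in recent releases, so check `DESCRIBE TABLE` on your server if a column is missing.

```sql
-- Volumes of one policy in the order they are tried, with the
-- per-volume part size limit and the policy's move_factor.
SELECT policy_name, volume_name, volume_priority, disks, max_data_part_size, move_factor
FROM system.storage_policies
WHERE policy_name = 'my_policy'   -- hypothetical policy name
ORDER BY volume_priority;

-- Space figures that the selection rules compare against the part size.
SELECT name, path, free_space, unreserved_space, keep_free_space
FROM system.disks;
```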
 								Under the hood, mutations and partition freezing make use of [hard links](https://en.wikipedia.org/wiki/Hard_link). Hard links between different disks are not supported, therefore in such cases the resulting parts are stored on the same disks as the initial ones.
 								In the background, parts are moved between volumes on the basis of the amount of free space (`move_factor` parameter) according to the order the volumes are declared in the configuration file.
 								Data is never moved off the last volume or onto the first one. You can use the system tables [system.part_log](/docs/en/operations/system-tables/part_log.md/#system_tables-part-log) (field `type = MOVE_PART`) and [system.parts](/docs/en/operations/system-tables/parts.md/#system_tables-parts) (fields `path` and `disk`) to monitor background moves. Detailed information can also be found in the server logs.
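A sketch of such monitoring queries follows. It assumes that `system.part_log` is enabled in the server configuration and that the table is named `my_table` (hypothetical). The column and value names used here (`event_type = 'MovePart'`, `disk_name`) are taken from recent releases and may differ from the field names mentioned above, so verify them with `DESCRIBE TABLE`.

```sql
-- Most recent part moves, background or manual.
SELECT event_time, database, table, part_name, path_on_disk
FROM system.part_log
WHERE event_type = 'MovePart'     -- assumed enum value, see note above
ORDER BY event_time DESC
LIMIT 20;

-- Where the active parts of one table currently live.
SELECT name, disk_name, path
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'my_table'          -- hypothetical table name
  AND active
ORDER BY name;
```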
 								A user can force a part or a partition to move from one volume to another using the query [ALTER TABLE … MOVE PART\|PARTITION … TO VOLUME\|DISK …](/docs/en/sql-reference/statements/alter/partition.md/#alter_move-partition); all the restrictions for background operations are taken into account. The query initiates a move on its own and does not wait for background operations to complete. The user gets an error message if not enough free space is available or if any of the required conditions are not met.
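For illustration, the two forms of the query might look as follows. The table name, partition value, part name, volume name, and disk name are all hypothetical; the partition value assumes a table partitioned by `toYYYYMM(...)`.

```sql
-- Move a whole partition to a volume of the table's storage policy.
ALTER TABLE my_table MOVE PARTITION 202305 TO VOLUME 'external';

-- Move a single part to a specific disk.
ALTER TABLE my_table MOVE PART '202305_1_1_0' TO DISK 's3';
```

Part names for the second form can be taken from the `name` column of `system.parts`.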
 								Moving data does not interfere with data replication. Therefore, different storage policies can be specified for the same table on different replicas.
 								After the completion of background merges and mutations, old parts are removed only after a certain amount of time (`old_parts_lifetime`).
 								During this time, they are not moved to other volumes or disks. Therefore, until the parts are finally removed, they are still taken into account for evaluation of the occupied disk space.
 								New large parts can be assigned to different disks of a [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures) volume in a balanced way using the [min_bytes_to_rebalance_partition_over_jbod](/docs/en/operations/settings/merge-tree-settings.md/#min-bytes-to-rebalance-partition-over-jbod) setting.
 								## Using S3 for Data Storage {#table_engine-mergetree-s3}
 								:::note
 								Google Cloud Storage (GCS) is also supported using the type `s3`. See [GCS backed MergeTree](/docs/en/integrations/gcs).
 								:::
 								`MergeTree` family table engines can store data to [S3](https://aws.amazon.com/s3/) using a disk with type `s3`.
 								Configuration markup:
 								``` xml
 								<storage_configuration>
 								    ...
 								    <disks>
 								        <s3>
 								            <type>s3</type>
 								            <support_batch_delete>true</support_batch_delete>
 								            <endpoint>https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/root-path/</endpoint>
 								            <access_key_id>your_access_key_id</access_key_id>
 								            <secret_access_key>your_secret_access_key</secret_access_key>
 								            <region></region>
 								            <header>Authorization: Bearer SOME-TOKEN</header>
 								            <server_side_encryption_customer_key_base64>your_base64_encoded_customer_key</server_side_encryption_customer_key_base64>
 								            <server_side_encryption_kms_key_id>your_kms_key_id</server_side_encryption_kms_key_id>
 								            <server_side_encryption_kms_encryption_context>your_kms_encryption_context</server_side_encryption_kms_encryption_context>
 								            <server_side_encryption_kms_bucket_key_enabled>true</server_side_encryption_kms_bucket_key_enabled>
 								            <proxy>
 								                <uri>http://proxy1</uri>
 								                <uri>http://proxy2</uri>
 								            </proxy>
 								            <connect_timeout_ms>10000</connect_timeout_ms>
 								            <request_timeout_ms>5000</request_timeout_ms>
 								            <retry_attempts>10</retry_attempts>
 								            <single_read_retries>4</single_read_retries>
 								            <min_bytes_for_seek>1000</min_bytes_for_seek>
 								            <metadata_path>/var/lib/clickhouse/disks/s3/</metadata_path>
 								            <skip_access_check>false</skip_access_check>
 								        </s3>
 								        <s3_cache>
 								            <type>cache</type>
 								            <disk>s3</disk>
 								            <path>/var/lib/clickhouse/disks/s3_cache/</path>
 								            <max_size>10Gi</max_size>
 								        </s3_cache>
 								    </disks>
 								    ...
 								</storage_configuration>
 								```
 								:::note cache configuration
 								ClickHouse versions 22.3 through 22.7 use a different cache configuration. If you are using one of those versions, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache).
 								:::
 								### Configuring the S3 disk
 								Required parameters:
 								- `endpoint` — S3 endpoint URL in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). The endpoint URL should contain a bucket and a root path for storing data.
 								- `access_key_id` — S3 access key id.
 								- `secret_access_key` — S3 secret access key.
 								Optional parameters:
 								- `region` — S3 region name.
 								- `support_batch_delete` — Controls the check for batch delete support. Set this to `false` when using Google Cloud Storage (GCS): GCS does not support batch deletes, and skipping the check avoids error messages in the logs.
 								- `use_environment_credentials` — Reads AWS credentials from the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` if they exist. Default value is `false`.
 								- `use_insecure_imds_request` — If set to `true`, S3 client will use insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Default value is `false`.
 								- `expiration_window_seconds` — Grace period for checking if expiration-based credentials have expired. Optional, default value is `120`.
 								- `proxy` — Proxy configuration for S3 endpoint. Each `uri` element inside `proxy` block should contain a proxy URL.
 								- `connect_timeout_ms` — Socket connect timeout in milliseconds. Default value is `10 seconds`.
 								- `request_timeout_ms` — Request timeout in milliseconds. Default value is `5 seconds`.
 								- `retry_attempts` — Number of retry attempts in case of failed request. Default value is `10`.
 								- `single_read_retries` — Number of retry attempts in case of connection drop during read. Default value is `4`.
 								- `min_bytes_for_seek` — Minimal number of bytes to use seek operation instead of sequential read. Default value is `1 MB`.
 								- `metadata_path` — Path on local FS to store metadata files for S3. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
 								- `skip_access_check` — If true, disk access checks will not be performed on disk start-up. Default value is `false`.
 								- `header` — Adds the specified HTTP header to requests to the given endpoint. Optional, can be specified multiple times.
 								- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set.
 								- `server_side_encryption_kms_key_id` - If specified, required headers for accessing S3 objects with [SSE-KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) will be set. If an empty string is specified, the AWS managed S3 key will be used. Optional.
 								- `server_side_encryption_kms_encryption_context` - If specified alongside `server_side_encryption_kms_key_id`, the given encryption context header for SSE-KMS will be set. Optional.
 								- `server_side_encryption_kms_bucket_key_enabled` - If specified alongside `server_side_encryption_kms_key_id`, the header to enable S3 bucket keys for SSE-KMS will be set. Optional, can be `true` or `false`, defaults to nothing (matches the bucket-level setting).
 								- `s3_max_put_rps` — Maximum PUT requests per second rate before throttling. Default value is `0` (unlimited).
 								- `s3_max_put_burst` — Maximum number of requests that can be issued simultaneously before hitting the requests per second limit. By default (`0` value) it equals `s3_max_put_rps`.
 								- `s3_max_get_rps` — Maximum GET requests per second rate before throttling. Default value is `0` (unlimited).
 								- `s3_max_get_burst` — Maximum number of requests that can be issued simultaneously before hitting the requests per second limit. By default (`0` value) it equals `s3_max_get_rps`.
 								### Configuring the cache
 								This is the cache configuration from above:
 								```xml
 								        <s3_cache>
 								            <type>cache</type>
 								            <disk>s3</disk>
 								            <path>/var/lib/clickhouse/disks/s3_cache/</path>
 								            <max_size>10Gi</max_size>
 								        </s3_cache>
 								```
 								These parameters define the cache layer:
 								- `type` — If a disk is of type `cache` it caches mark and index files in memory.
 								- `disk` — The name of the disk that will be cached.
 								Cache parameters:
 								- `path` — The path where metadata for the cache is stored.
 								- `max_size` — The size (amount of disk space) that the cache can grow to.
 								:::tip
 								There are several other cache parameters that you can use to tune your storage. See [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) for the details.
 								:::
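A table can then be placed on the cached disk rather than on the raw `s3` disk. The statement below is a minimal sketch: the table name and columns are hypothetical, and it assumes a server version that accepts the `disk` table setting (the same setting used for dynamic disks). On versions without it, define a storage policy whose volume contains `s3_cache` and use `storage_policy` instead.

```sql
CREATE TABLE s3_cached_table   -- hypothetical table name
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = 's3_cache';    -- disk name from the configuration above
```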
 								An S3 disk can be configured as `main` or `cold` storage:
 								``` xml
 								<storage_configuration>
 								    ...
 								    <disks>
 								        <s3>
 								            <type>s3</type>
 								            <endpoint>https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/root-path/</endpoint>
 								            <access_key_id>your_access_key_id</access_key_id>
 								            <secret_access_key>your_secret_access_key</secret_access_key>
 								        </s3>
 								    </disks>
 								    <policies>
 								        <s3_main>
 								            <volumes>
 								                <main>
 								                    <disk>s3</disk>
 								                </main>
 								            </volumes>
 								        </s3_main>
 								        <s3_cold>
 								            <volumes>
 								                <main>
 								                    <disk>default</disk>
 								                </main>
 								                <external>
 								                    <disk>s3</disk>
 								                </external>
 								            </volumes>
 								            <move_factor>0.2</move_factor>
 								        </s3_cold>
 								    </policies>
 								    ...
 								</storage_configuration>
 								```
 								With the `cold` option, data is moved to S3 when the free space on the local disk falls below `move_factor * disk_size`, or by a TTL move rule.
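A minimal sketch of a table that uses the `s3_cold` policy with a TTL move rule. The table name, columns, and the 30-day interval are hypothetical; the volume name `external` comes from the policy definition above.

```sql
CREATE TABLE events_cold       -- hypothetical table name
(
    event_date Date,
    id UInt64,
    payload String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, id)
-- Parts whose rows are all older than 30 days become eligible to move to S3.
TTL event_date + INTERVAL 30 DAY TO VOLUME 'external'
SETTINGS storage_policy = 's3_cold';
```

With this definition both triggers apply: parts move to the `external` volume once they satisfy the TTL expression, and also earlier if free space on the `default` disk drops below the `move_factor` threshold.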
 								## Using Azure Blob Storage for Data Storage {#table_engine-mergetree-azure-blob-storage}
 								`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.
 								As of February 2022, this feature is still a recent addition, so some Azure Blob Storage functionality may not be implemented yet.
 								Configuration markup:
 								``` xml
 								<storage_configuration>
 								    ...
 								    <disks>
 								        <blob_storage_disk>
 								            <type>azure_blob_storage</type>
 								            <storage_account_url>http://account.blob.core.windows.net</storage_account_url>
 								            <container_name>container</container_name>
 								            <account_name>account</account_name>
 								            <account_key>pass123</account_key>
 								            <metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
 								            <cache_enabled>true</cache_enabled>
 								            <cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
 								            <skip_access_check>false</skip_access_check>
 								        </blob_storage_disk>
 								    </disks>
 								    ...
 								</storage_configuration>
 								```
 								Connection parameters:
 								* `storage_account_url` - **Required**, Azure Blob Storage account URL, like `http://account.blob.core.windows.net` or `http://azurite1:10000/devstoreaccount1`.
 								* `container_name` - Target container name, defaults to `default-container`.
 								* `container_already_exists` - If set to `false`, a new container `container_name` is created in the storage account, if set to `true`, disk connects to the container directly, and if left unset, disk connects to the account, checks if the container `container_name` exists, and creates it if it doesn't exist yet.
 								Authentication parameters (the disk will try all available methods **and** Managed Identity Credential):
 								* `connection_string` - For authentication using a connection string.
 								* `account_name` and `account_key` - For authentication using Shared Key.
 								Limit parameters (mainly for internal usage):
 								* `s3_max_single_part_upload_size` - Limits the size of a single block upload to Blob Storage.
 								* `min_bytes_for_seek` - Limits the size of a seekable region.
 								* `max_single_read_retries` - Limits the number of attempts to read a chunk of data from Blob Storage.
 								* `max_single_download_retries` - Limits the number of attempts to download a readable buffer from Blob Storage.
 								* `thread_pool_size` - Limits the number of threads with which `IDiskRemote` is instantiated.
 								* `s3_max_inflight_parts_for_one_file` - Limits the number of put requests that can be run concurrently for one object.
 								Other parameters:
 								* `metadata_path` - Path on local FS to store metadata files for Blob Storage. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
 								* `cache_enabled` - Enables caching of mark and index files on the local FS. Default value is `true`.
 								* `cache_path` - Path on the local FS where cached mark and index files are stored. Default value is `/var/lib/clickhouse/disks/<disk_name>/cache/`.
 								* `skip_access_check` - If true, disk access checks will not be performed on disk start-up. Default value is `false`.
 								Examples of working configurations can be found in the integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).
 								  :::note Zero-copy replication is not ready for production
 								  Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher.  This feature is not recommended for production use.
 								  :::
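One way to check that the disk from the configuration markup is registered and usable is to query `system.disks` and write a small part to it. This is a sketch only: the table name and columns are hypothetical, and the `disk` table setting is assumed to be available in the server version in use.

```sql
-- The disk should be listed with its configured name.
SELECT name, type, path
FROM system.disks
WHERE name = 'blob_storage_disk';

-- Hypothetical probe table stored on the Azure disk.
CREATE TABLE azure_probe
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = 'blob_storage_disk';

INSERT INTO azure_probe VALUES (1, 'hello');

-- The new part should report the Azure disk in disk_name.
SELECT name, disk_name
FROM system.parts
WHERE table = 'azure_probe' AND active;
```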
 								## HDFS storage {#hdfs-storage}
 								In this sample configuration:
 								- the disk is of type `hdfs`
 								- the data is hosted at `hdfs://hdfs1:9000/clickhouse/`
 								```xml
 								<clickhouse>
 								    <storage_configuration>
 								        <disks>
 								            <hdfs>
 								                <type>hdfs</type>
 								                <endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
 								                <skip_access_check>true</skip_access_check>
 								            </hdfs>
 								            <hdd>
 								                <type>local</type>
 								                <path>/</path>
 								            </hdd>
 								        </disks>
 								        <policies>
 								            <hdfs>
 								                <volumes>
 								                    <main>
 								                        <disk>hdfs</disk>
 								                    </main>
 								                    <external>
 								                        <disk>hdd</disk>
 								                    </external>
 								                </volumes>
 								            </hdfs>
 								        </policies>
 								    </storage_configuration>
 								</clickhouse>
 								```
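A table can use this configuration by naming the `hdfs` policy. The table name and columns below are hypothetical.

```sql
CREATE TABLE hdfs_table        -- hypothetical table name
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'hdfs';   -- policy name from the configuration above
```

Because `main` is the first volume in the policy, new parts are written to the `hdfs` disk as long as it satisfies the volume selection rules.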
 								## Web storage (read-only) {#web-storage}
 								Web storage can be used for read-only purposes. An example use is for hosting sample
 								data, or for migrating data.
 								:::tip
 								If a web dataset is not expected to be used routinely, storage can also be configured
 								temporarily within a query. See [dynamic storage](#dynamic-storage) and skip editing the
 								configuration file.
 								:::
 								In this sample configuration:
 								- the disk is of type `web`
 								- the data is hosted at `http://nginx:80/test1/`
 								- a cache on local storage is used
 								```xml
 								<clickhouse>
 								    <storage_configuration>
 								        <disks>
 								            <web>
 								                <type>web</type>
 								                <endpoint>http://nginx:80/test1/</endpoint>
 								            </web>
 								            <cached_web>
 								                <type>cache</type>
 								                <disk>web</disk>
 								                <path>cached_web_cache/</path>
 								                <max_size>100000000</max_size>
 								            </cached_web>
 								        </disks>
 								        <policies>
 								            <web>
 								                <volumes>
 								                    <main>
 								                        <disk>web</disk>
 								                    </main>
 								                </volumes>
 								            </web>
 								            <cached_web>
 								                <volumes>
 								                    <main>
 								                        <disk>cached_web</disk>
 								                    </main>
 								                </volumes>
 								            </cached_web>
 								        </policies>
 								    </storage_configuration>
 								</clickhouse>
 								```
 								## Virtual Columns {#virtual-columns}
 								- `_part` — Name of a part.
 								- `_part_index` — Sequential index of the part in the query result.
 								- `_partition_id` — Name of a partition.
 								- `_part_uuid` — Unique part identifier (if the MergeTree setting `assign_part_uuids` is enabled).
 								- `_partition_value` — Values (a tuple) of a `partition by` expression.
 								- `_sample_factor` — Sample factor (from the query).
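Virtual columns are not returned by `SELECT *` and have to be named explicitly. As an illustration, the query below counts rows per part and partition; the table name `my_table` is hypothetical.

```sql
SELECT
    _partition_id,
    _part,
    count() AS row_count
FROM my_table                  -- hypothetical table name
GROUP BY _partition_id, _part
ORDER BY _partition_id, _part;
```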
 								## Column Statistics (Experimental) {#column-statistics}
 								The statistic declaration is in the columns section of the `CREATE` query.
 								``` sql
 								STATISTIC <list of columns> TYPE type
 								```
 								For tables from the `*MergeTree` family, statistics can be specified.
 								These lightweight statistics aggregate information about the distribution of values in columns.
 								They can be used for query optimization (currently, they are used for moving expressions to `PREWHERE`).
 								#### Available Types of Column Statistics {#available-types-of-column-statistics}
 								-   `tdigest`
 								    Stores distribution of values from numeric columns in [TDigest](https://github.com/tdunning/t-digest) sketch.