# MergeTree {#table_engines-mergetree}

The `MergeTree` engine and other engines of this family (`*MergeTree`) are the most robust ClickHouse table engines.

Engines in the `MergeTree` family are designed for inserting a very large amount of data into a table. The data is quickly written to the table part by part, then rules are applied for merging the parts in the background. This method is much more efficient than continually rewriting the data in storage during insert.

Main features:

- Stores data sorted by primary key.

    This allows you to create a small sparse index that helps find data faster.

- Partitions can be used if the [partitioning key](custom_partitioning_key.md) is specified.

    ClickHouse supports certain operations with partitions that are more effective than general operations on the same data with the same result. ClickHouse also automatically prunes partition data when the partitioning key is specified in the query. This also improves query performance.

- Data replication support.

    The family of `ReplicatedMergeTree` tables provides data replication. For more information, see [Data replication](replication.md).

- Data sampling support.

    If necessary, you can set the data sampling method in the table.

!!! info
    The [Merge](merge.md) engine does not belong to the `*MergeTree` family.

## Creating a Table {#table_engine-mergetree-creating-a-table}
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
```

For a description of query parameters, see the [CREATE query description](../../query_language/create.md).

**Query clauses**

2019-07-29 14:38:32 +00:00
- `ENGINE` — Name and parameters of the engine. `ENGINE = MergeTree()`. The `MergeTree` engine does not have parameters.

- `PARTITION BY` — The [partitioning key](custom_partitioning_key.md).

    For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](../../data_types/date.md). The partition names here have the `"YYYYMM"` format.

- `ORDER BY` — The sorting key.

    A tuple of columns or arbitrary expressions. Example: `ORDER BY (CounterID, EventDate)`.

- `PRIMARY KEY` — The primary key if it [differs from the sorting key](mergetree.md).

    By default the primary key is the same as the sorting key (which is specified by the `ORDER BY` clause). Thus in most cases it is unnecessary to specify a separate `PRIMARY KEY` clause.

- `SAMPLE BY` — An expression for sampling.

    If a sampling expression is used, the primary key must contain it. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.

- `TTL` — An expression for setting the storage time for rows.

    It must depend on a `Date` or `DateTime` column and have one `Date` or `DateTime` column as a result. Example:
    `TTL date + INTERVAL 1 DAY`

    For more details, see [TTL for columns and tables](#table_engine-mergetree-ttl).

- `SETTINGS` — Additional parameters that control the behavior of the `MergeTree`:

    - `index_granularity` — The granularity of an index. The number of data rows between the "marks" of an index. By default, 8192. For the list of available parameters, see [MergeTreeSettings.h](https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/MergeTree/MergeTreeSettings.h).
    - `use_minimalistic_part_header_in_zookeeper` — Storage method of the data parts headers in ZooKeeper. If `use_minimalistic_part_header_in_zookeeper=1`, then ZooKeeper stores less data. For more information, see the [setting description](../server_settings/settings.md#server-settings-use_minimalistic_part_header_in_zookeeper) in "Server configuration parameters".
    - `min_merge_bytes_to_use_direct_io` — The minimum data volume for a merge operation that is required for using direct I/O access to the storage disk. When merging data parts, ClickHouse calculates the total storage volume of all the data to be merged. If the volume exceeds `min_merge_bytes_to_use_direct_io` bytes, ClickHouse reads and writes the data to the storage disk using the direct I/O interface (the `O_DIRECT` option). If `min_merge_bytes_to_use_direct_io = 0`, then direct I/O is disabled. Default value: `10 * 1024 * 1024 * 1024` bytes.
    <a name="mergetree_setting-merge_with_ttl_timeout"></a>
    - `merge_with_ttl_timeout` — Minimum delay in seconds before repeating a merge with TTL. Default value: 86400 (1 day).

**Example of setting the sections**

```
ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192
```

In the example, we set partitioning by month.

We also set an expression for sampling as a hash by the user ID. This allows you to pseudorandomize the data in the table for each `CounterID` and `EventDate`. If you define a [SAMPLE](../../query_language/select.md#select-sample-clause) clause when selecting the data, ClickHouse will return an evenly pseudorandom data sample for a subset of users.
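
For instance, a minimal sketch against a hypothetical `hits` table created with the clauses above: the query reads roughly a tenth of the data, and repeated runs see the same subset of users.

```sql
SELECT
    CounterID,
    count() AS visits
FROM hits
SAMPLE 0.1    -- read about 10% of the rows, selected by the sampling key intHash32(UserID)
WHERE EventDate = toDate('2014-03-17')
GROUP BY CounterID
ORDER BY visits DESC
LIMIT 10
```
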
The `index_granularity` setting can be omitted because 8192 is the default value.
<details markdown="1"><summary>Deprecated Method for Creating a Table</summary>

!!! attention
    Do not use this method in new projects. If possible, switch old projects to the method described above.

```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE [=] MergeTree(date-column [, sampling_expression], (primary, key), index_granularity)
```
**MergeTree() parameters**

- `date-column` — The name of a column of the [Date](../../data_types/date.md) type. ClickHouse automatically creates partitions by month based on this column. The partition names are in the `"YYYYMM"` format.
- `sampling_expression` — An expression for sampling.
- `(primary, key)` — Primary key. Type: [Tuple()](../../data_types/tuple.md)
- `index_granularity` — The granularity of an index. The number of data rows between the "marks" of an index. The value 8192 is appropriate for most tasks.
**Example**
```
MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)
```
The `MergeTree` engine is configured in the same way as in the example above for the main engine configuration method.
</details>
## Data Storage
A table consists of data *parts* sorted by primary key.

When data is inserted into a table, separate data parts are created and each of them is lexicographically sorted by primary key. For example, if the primary key is `(CounterID, Date)`, the data in the part is sorted by `CounterID`, and within each `CounterID`, it is ordered by `Date`.

Data belonging to different partitions are separated into different parts. In the background, ClickHouse merges data parts for more efficient storage. Parts belonging to different partitions are not merged. The merge mechanism does not guarantee that all rows with the same primary key will be in the same data part.

For each data part, ClickHouse creates an index file that contains the primary key value for each index row ("mark"). Index row numbers are defined as `n * index_granularity`. The maximum value of `n` is equal to the integer part of dividing the total number of rows by the `index_granularity`. For each column, the "marks" are also written for the same index rows as the primary key. These "marks" allow you to find the data directly in the columns.

You can use a single large table and continually add data to it in small chunks, which is what the `MergeTree` engine is intended for.
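
To see how a table is split into parts, you can query the `system.parts` system table; a minimal sketch (the table name `visits` is hypothetical):

```sql
-- List the active (not yet merged away or dropped) data parts of a table.
SELECT
    partition,
    name,
    rows
FROM system.parts
WHERE table = 'visits' AND active
```
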
## Primary Keys and Indexes in Queries {#primary-keys-and-indexes-in-queries}
Take the `(CounterID, Date)` primary key as an example. In this case, the sorting and index can be illustrated as follows:
```
Whole data: [-------------------------------------------------------------------------]
CounterID: [aaaaaaaaaaaaaaaaaabbbbcdeeeeeeeeeeeeefgggggggghhhhhhhhhiiiiiiiiikllllllll]
Date: [1111111222222233331233211111222222333211111112122222223111112223311122333]
Marks: | | | | | | | | | | |
a,1 a,2 a,3 b,3 e,2 e,3 g,1 h,2 i,1 i,3 l,3
Marks numbers: 0 1 2 3 4 5 6 7 8 9 10
```
If the data query specifies:
- `CounterID in ('a', 'h')`, the server reads the data in the ranges of marks `[0, 3)` and `[6, 8)`.
- `CounterID IN ('a', 'h') AND Date = 3`, the server reads the data in the ranges of marks `[1, 3)` and `[7, 8)`.
- `Date = 3`, the server reads the data in the range of marks `[1, 10]`.

The examples above show that it is always more effective to use an index than a full scan.

A sparse index allows extra data to be read. When reading a single range of the primary key, up to `index_granularity * 2` extra rows in each data block can be read. In most cases, ClickHouse performance does not degrade when `index_granularity = 8192`.

Sparse indexes allow you to work with a very large number of table rows, because such indexes are always stored in the computer's RAM.

ClickHouse does not require a unique primary key. You can insert multiple rows with the same primary key.

### Selecting the Primary Key
The number of columns in the primary key is not explicitly limited. Depending on the data structure, you can include more or fewer columns in the primary key. This may:

- Improve the performance of an index.

    If the primary key is `(a, b)`, then adding another column `c` will improve the performance if the following conditions are met:

    - There are queries with a condition on column `c`.
    - Long data ranges (several times longer than the `index_granularity`) with identical values for `(a, b)` are common. In other words, adding another column allows you to skip quite long data ranges.

- Improve data compression.

    ClickHouse sorts data by primary key, so the higher the consistency, the better the compression.

- Provide additional logic when merging data parts in the [CollapsingMergeTree](collapsingmergetree.md#table_engine-collapsingmergetree) and [SummingMergeTree](summingmergetree.md) engines.

    In this case it makes sense to specify the *sorting key* that is different from the primary key.

A long primary key will negatively affect the insert performance and memory consumption, but extra columns in the primary key do not affect ClickHouse performance during `SELECT` queries.

### Choosing a Primary Key that Differs from the Sorting Key

It is possible to specify a primary key (an expression with values that are written in the index file for each mark) that is different from the sorting key (an expression for sorting the rows in data parts). In this case the primary key expression tuple must be a prefix of the sorting key expression tuple.

This feature is helpful when using the [SummingMergeTree](summingmergetree.md) and [AggregatingMergeTree](aggregatingmergetree.md) table engines. In a common case when using these engines, the table has two types of columns: *dimensions* and *measures*. Typical queries aggregate values of measure columns with arbitrary `GROUP BY` and filtering by dimensions. Because SummingMergeTree and AggregatingMergeTree aggregate rows with the same value of the sorting key, it is natural to add all dimensions to it. As a result, the key expression consists of a long list of columns and this list must be frequently updated with newly added dimensions.

In this case it makes sense to leave only a few columns in the primary key that will provide efficient range scans and add the remaining dimension columns to the sorting key tuple.

[ALTER](../../query_language/alter.md) of the sorting key is a lightweight operation because when a new column is simultaneously added to the table and to the sorting key, existing data parts don't need to be changed. Since the old sorting key is a prefix of the new sorting key and there is no data in the newly added column, the data is sorted by both the old and new sorting keys at the moment of table modification.
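
A minimal sketch of such a table (all names are hypothetical): the primary key stays short while the sorting key carries all the dimensions, and a new dimension can later be appended to the sorting key with a lightweight `ALTER`.

```sql
CREATE TABLE site_aggregates
(
    CounterID UInt32,
    EventDate Date,
    Source String,
    Browser String,
    Hits UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, Source, Browser)  -- sorting key: all dimensions
PRIMARY KEY (CounterID, EventDate);               -- short prefix used for the sparse index

-- Adding a dimension later only changes metadata: the old sorting key is a
-- prefix of the new one and the new column contains no data yet.
ALTER TABLE site_aggregates
    ADD COLUMN Device String,
    MODIFY ORDER BY (CounterID, EventDate, Source, Browser, Device);
```
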
### Use of Indexes and Partitions in Queries

For `SELECT` queries, ClickHouse analyzes whether an index can be used. An index can be used if the `WHERE/PREWHERE` clause has an expression (as one of the conjunction elements, or entirely) that represents an equality or inequality comparison operation, or if it has `IN` or `LIKE` with a fixed prefix on columns or expressions that are in the primary key or partitioning key, or on certain partially repetitive functions of these columns, or on logical relationships of these expressions.

Thus, it is possible to quickly run queries on one or many ranges of the primary key. In this example, queries will be fast when run for a specific tracking tag, for a specific tag and date range, for a specific tag and date, for multiple tags with a date range, and so on.

Let's look at the engine configured as follows:
```
ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate) SETTINGS index_granularity=8192
```
In this case, in queries:
``` sql
SELECT count() FROM table WHERE EventDate = toDate(now()) AND CounterID = 34
SELECT count() FROM table WHERE EventDate = toDate(now()) AND (CounterID = 34 OR CounterID = 42)
SELECT count() FROM table WHERE ((EventDate >= toDate('2014-01-01') AND EventDate <= toDate('2014-01-31')) OR EventDate = toDate('2014-05-01')) AND CounterID IN (101500, 731962, 160656) AND (CounterID = 101500 OR EventDate != toDate('2014-05-01'))
```

ClickHouse will use the primary key index to trim improper data and the monthly partitioning key to trim partitions that are in improper date ranges.

The queries above show that the index is used even for complex expressions. Reading from the table is organized so that using the index can't be slower than a full scan.

In the example below, the index can't be used.

``` sql
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```
To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](../settings/settings.md#settings-force_index_by_date) and [force_primary_key](../settings/settings.md).
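
A minimal sketch of their effect (reusing the hypothetical `table` from the examples above): with `force_primary_key` enabled, a query that cannot restrict the primary key fails with an exception instead of silently falling back to a full scan.

```sql
SET force_primary_key = 1;

-- Restricts the primary key prefix (CounterID), so the query is executed.
SELECT count() FROM table WHERE CounterID = 34;

-- Cannot use the primary key, so ClickHouse throws an exception instead of
-- performing a full scan.
SELECT count() FROM table WHERE URL LIKE '%upyachka%';
```
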
The key for partitioning by month allows reading only those data blocks which contain dates from the proper range. In this case, the data block may contain data for many dates (up to an entire month). Within a block, data is sorted by primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.
### Use of Index for Partially-Monotonic Primary Keys
Consider, for example, the days of the month. They form a [monotonic sequence](https://en.wikipedia.org/wiki/Monotonic_function) within one month, but are not monotonic over longer periods. This is a partially-monotonic sequence. If a user creates a table with a partially-monotonic primary key, ClickHouse creates a sparse index as usual. When a user selects data from this kind of table, ClickHouse analyzes the query conditions. If the user wants to get data between two marks of the index and both these marks fall within one month, ClickHouse can use the index in this particular case because it can calculate the distance between the parameters of a query and the index marks.

ClickHouse cannot use an index if the values of the primary key in the query parameter range don't represent a monotonic sequence. In this case, ClickHouse uses the full scan method.

ClickHouse uses this logic not only for sequences of days of the month, but for any primary key that represents a partially-monotonic sequence.
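
For illustration, a sketch with a hypothetical table whose sorting key is only partially monotonic: `toDayOfMonth(EventDate)` grows with the date within one month but resets at every month boundary.

```sql
CREATE TABLE events_by_day
(
    EventDate Date,
    UserID UInt64
) ENGINE = MergeTree()
ORDER BY toDayOfMonth(EventDate);

-- Both bounds fall inside one month, so toDayOfMonth() is monotonic over the
-- queried range and the sparse index can be used.
SELECT count() FROM events_by_day
WHERE EventDate BETWEEN toDate('2019-01-05') AND toDate('2019-01-15');

-- The range crosses a month boundary, monotonicity cannot be proven, and
-- ClickHouse falls back to reading the matching parts in full.
SELECT count() FROM events_by_day
WHERE EventDate BETWEEN toDate('2019-01-25') AND toDate('2019-02-05');
```
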
### Data Skipping Indices (Experimental)

You need to set `allow_experimental_data_skipping_indices` to 1 to use indices (run `SET allow_experimental_data_skipping_indices = 1`).

The index declaration is in the columns section of the `CREATE` query.

```sql
INDEX index_name expr TYPE type(...) GRANULARITY granularity_value
```

For tables from the `*MergeTree` family, data skipping indices can be specified.

These indices aggregate some information about the specified expression on blocks, which consist of `granularity_value` granules (the size of a granule is specified using the `index_granularity` setting in the table engine). Then these aggregates are used in `SELECT` queries to reduce the amount of data read from the disk by skipping big blocks of data for which the `WHERE` query cannot be satisfied.

**Example**
```sql
CREATE TABLE table_name
(
    u64 UInt64,
    i32 Int32,
    s String,
    ...
    INDEX a (u64 * i32, s) TYPE minmax GRANULARITY 3,
    INDEX b (u64 * length(s)) TYPE set(1000) GRANULARITY 4
) ENGINE = MergeTree()
...
```

Indices from the example can be used by ClickHouse to reduce the amount of data to read from disk in the following queries:

```sql
SELECT count() FROM table WHERE s < 'z'
SELECT count() FROM table WHERE u64 * i32 == 10 AND u64 * length(s) >= 1234
```
#### Available Types of Indices

- `minmax`

    Stores extremes of the specified expression (if the expression is a `tuple`, then it stores extremes for each element of the `tuple`). Uses the stored info for skipping blocks of data, like the primary key.

- `set(max_rows)`

    Stores unique values of the specified expression (no more than `max_rows` rows; `max_rows = 0` means "no limits"). Uses the values to check if the `WHERE` expression is not satisfiable on a block of data.

- `ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`

    Stores a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) that contains all ngrams from a block of data. Works only with strings. Can be used for optimization of `equals`, `like` and `in` expressions.

    - `n` — ngram size,
    - `size_of_bloom_filter_in_bytes` — Bloom filter size in bytes (you can use large values here, for example, 256 or 512, because it can be compressed well).
    - `number_of_hash_functions` — The number of hash functions used in the Bloom filter.
    - `random_seed` — The seed for Bloom filter hash functions.

- `tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`

    The same as `ngrambf_v1`, but stores tokens instead of ngrams. Tokens are sequences separated by non-alphanumeric characters.

```sql
INDEX sample_index (u64 * length(s)) TYPE minmax GRANULARITY 4
INDEX sample_index2 (u64 * length(str), i32 + f64 * 100, date, str) TYPE set(100) GRANULARITY 4
INDEX sample_index3 (lower(str), str) TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY 4
```

## Concurrent Data Access

For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.

Reading from a table is automatically parallelized.

## TTL for Columns and Tables {#table_engine-mergetree-ttl}

Determines the lifetime of values.

The `TTL` clause can be set for the whole table and for each individual column. If `TTL` is set for the whole table, individual `TTL` clauses for columns are ignored.

The table must have a column of the [Date](../../data_types/date.md) or [DateTime](../../data_types/datetime.md) data type. This date column should be used in the `TTL` clause. You can only set the lifetime of the data as an interval from the date column value.

```
TTL date_time + interval
```
You can set `interval` using any expression that returns a value of the `DateTime` data type. For example, you can use [time interval](../../query_language/operators.md#operators-datetime) operators.
```
TTL date_time + INTERVAL 1 MONTH
TTL date_time + INTERVAL 15 HOUR
```

**Column TTL**

When the values in the column expire, ClickHouse replaces them with the default values for the column data type. If all the column values in the data part expire, ClickHouse deletes this column from the data part in the filesystem.

The `TTL` clause cannot be used for key columns.

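A minimal sketch of a column `TTL` (hypothetical names): the `Comment` value is reset to the default for `String` (an empty string) one month after `EventDate`.

```sql
CREATE TABLE ttl_column_example
(
    EventDate DateTime,
    UserID UInt64,
    Comment String TTL EventDate + INTERVAL 1 MONTH
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (UserID, EventDate)
```
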
**Table TTL**

When some data in the table expires, ClickHouse deletes all the corresponding rows.

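A minimal sketch of a table `TTL` (hypothetical names): whole rows are deleted three months after `EventDate`.

```sql
CREATE TABLE ttl_table_example
(
    EventDate DateTime,
    UserID UInt64,
    Banner String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (UserID, EventDate)
TTL EventDate + INTERVAL 3 MONTH
```
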
**Cleaning up of Data**

Data with an expired TTL is removed when ClickHouse merges data parts.

When ClickHouse sees that some data has expired, it performs an off-schedule merge. To control the frequency of such merges, you can set [merge_with_ttl_timeout](#mergetree_setting-merge_with_ttl_timeout). If the value is too low, many off-schedule merges will be performed, which may consume a lot of resources.

If you perform a `SELECT` query between merges, you may get expired data. To avoid this, use the [OPTIMIZE](../../query_language/misc.md#misc_operations-optimize) query before `SELECT`, as sketched below.
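
For example, reusing the hypothetical table from the sketch above:

```sql
-- Force an unscheduled merge so that expired rows are removed, then read.
OPTIMIZE TABLE ttl_table_example FINAL;
SELECT count() FROM ttl_table_example;
```
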
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/) <!--hide-->