MergeTree
---------

The MergeTree engine supports an index by primary key and by date, and allows data to be updated in real time.
It is the most advanced table engine in ClickHouse. Do not confuse it with the Merge engine.

The engine accepts the following parameters: the name of the Date type column that contains the date, an optional sampling expression, a tuple that defines the table's primary key, and the index granularity.

Example without sampling support:

.. code-block:: text

  MergeTree(EventDate, (CounterID, EventDate), 8192)

Example with sampling support:

.. code-block:: text

  MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)

A MergeTree type table must have a separate column containing the date. In this example, it is the 'EventDate' column. The type of the date column must be 'Date' (not 'DateTime').
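
As a sketch of how the engine declaration fits into a complete table definition, the statement below creates a table with sampling support. The table name ``hits``, the ``URL`` column, and the column types are assumptions made for illustration only.

.. code-block:: sql

  -- Hypothetical table; only the ENGINE clause is taken from the example above.
  CREATE TABLE hits
  (
      EventDate Date,    -- the required Date column (not DateTime)
      CounterID UInt32,
      UserID UInt64,
      URL String
  ) ENGINE = MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)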

The primary key may be a tuple of arbitrary expressions (usually it is just a tuple of columns), or a single expression.

The sampling expression (optional) can be any expression. It must also be present in the primary key. The example uses a hash of user IDs to pseudo-randomly disperse data in the table for each CounterID and EventDate. In other words, when you use the SAMPLE clause in a query, you get an even pseudo-random sample of data for a subset of users.
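
For instance, a query over roughly one tenth of the users might look like the sketch below. The table name ``hits`` is hypothetical and is assumed to use the engine declaration with sampling support shown above.

.. code-block:: sql

  -- Reads data for approximately 10% of users.
  -- Aggregate results are not scaled automatically, so count() is
  -- multiplied by 10 here to approximate the total.
  SELECT count() * 10 FROM hits SAMPLE 0.1 WHERE CounterID = 34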

The table is implemented as a set of parts. Each part is sorted by the primary key. In addition, each part is assigned a minimum and a maximum date. Each insert into the table creates a new sorted part. A merge process is periodically started in the background. During a merge, several parts are selected, usually the smallest ones, and merged into one larger sorted part.

In other words, incremental sorting occurs as data is inserted into the table. Merging is implemented so that the table always consists of a small number of sorted parts, and the merge itself doesn't do too much work.

During insertion, data belonging to different months is separated into different parts. Parts that correspond to different months are never combined. The purpose of this is to keep data modification local, which simplifies backups.

Parts are combined up to a certain size threshold, so there aren't any merges that are too long.
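
One way to observe the parts that currently make up a table is to query the ``system.parts`` system table. This is only a sketch: the table name ``hits`` is hypothetical, and the set of columns available in ``system.parts`` may differ between ClickHouse versions.

.. code-block:: sql

  -- Lists the active (not yet merged away) parts of the hypothetical table.
  SELECT partition, name, rows
  FROM system.parts
  WHERE table = 'hits' AND active
  ORDER BY name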

For each part, an index file is also written. The index file contains the primary key value for every 'index_granularity'-th row of the part. In other words, it is a sparse index of the sorted data.

For each column, "marks" are also written for every 'index_granularity'-th row so that data can be read from a specific range.

When reading from a table, the SELECT query is analyzed to determine whether indexes can be used. An index can be used if the WHERE or PREWHERE clause contains an expression (as one of the conjunction elements, or as the whole clause) that is an equality or inequality comparison, or an IN over columns that are in the primary key or the date column, or Boolean operators over such expressions.

Thus, it is possible to quickly run queries on one or many ranges of the primary key. In the example given, queries will work quickly for a specific counter, for a specific counter and range of dates, for a specific counter and date, for multiple counters and a range of dates, and so on.

.. code-block:: sql

  SELECT count() FROM table WHERE EventDate = toDate(now()) AND CounterID = 34
  SELECT count() FROM table WHERE EventDate = toDate(now()) AND (CounterID = 34 OR CounterID = 42)
  SELECT count() FROM table WHERE ((EventDate >= toDate('2014-01-01') AND EventDate <= toDate('2014-01-31')) OR EventDate = toDate('2014-05-01')) AND CounterID IN (101500, 731962, 160656) AND (CounterID = 101500 OR EventDate != toDate('2014-05-01'))

All of these cases will use the index by date and by primary key. The index is used even for complex expressions. Reading from the table is organized so that using the index can't be slower than a full scan.

In this example, the index can't be used:

.. code-block:: sql

  SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'

The index by date only allows reading those parts that contain dates from the desired range. However, a data part may contain data for many dates (up to an entire month), and within a single part the data is ordered by the primary key, which might not have the date as its first column. Because of this, a query with only a date condition that does not specify the primary key prefix will read more data than belongs to that single date.
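
To illustrate the difference, compare the two queries in the sketch below. The table name ``hits`` is hypothetical, and the table is assumed to have the primary key (CounterID, EventDate) from the example above.

.. code-block:: sql

  -- Date condition only: parts for other months are skipped, but because
  -- CounterID comes first in the primary key, more rows than those for
  -- the single requested date are likely to be read.
  SELECT count() FROM hits WHERE EventDate = toDate('2014-05-01')

  -- Date condition plus the primary key prefix: the range to read can be
  -- narrowed by both the date index and the primary key index.
  SELECT count() FROM hits WHERE CounterID = 34 AND EventDate = toDate('2014-05-01')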

For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.

Reading from a table is automatically parallelized.

The OPTIMIZE query is supported. It triggers an additional merge step.
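
A minimal example, assuming a hypothetical table named ``hits``:

.. code-block:: sql

  -- Requests an unscheduled merge of the table's parts.
  OPTIMIZE TABLE hits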

You can use a single large table and continually add data to it in small chunks - this is what MergeTree is intended for.

Data replication is possible for all types of tables in the MergeTree family (see the section "Data replication").