ClickHouse/docs/en/table_engines/buffer.rst

Buffer
------

Buffers the data to write in RAM, periodically flushing it to another table. During the read operation, data is read from the buffer and the other table simultaneously.

.. code-block:: text

  Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)

Engine parameters:
database, table - The table to flush data to. Instead of the database name, you can use a constant expression that returns a string.
num_layers - The level of parallelism. Physically, the table will be represented as 'num_layers' of independent buffers. The recommended value is 16.
min_time, max_time, min_rows, max_rows, min_bytes, and max_bytes are conditions for flushing data from the buffer.

Data is flushed from the buffer and written to the destination table if all the 'min' conditions or at least one 'max' condition are met.
min_time, max_time - Condition for the time in seconds from the moment of the first write to the buffer.
min_rows, max_rows - Condition for the number of rows in the buffer.
min_bytes, max_bytes - Condition for the number of bytes in the buffer.

During the write operation, data is inserted to a 'num_layers' number of random buffers. Or, if the data part to insert is large enough (greater than 'max_rows' or 'max_bytes'), it is written directly to the destination table, omitting the buffer.

The conditions for flushing the data are calculated separately for each of the 'num_layers' buffers. For example, if num_layers = 16 and max_bytes = 100000000, the maximum RAM consumption is 1.6 GB.

Example:

.. code-block:: sql

  CREATE TABLE merge.hits_buffer AS merge.hits ENGINE = Buffer(merge, hits, 16, 10, 100, 10000, 1000000, 10000000, 100000000)

Creating a 'merge.hits_buffer' table with the same structure as 'merge.hits' and using the Buffer engine. When writing to this table, data is buffered in RAM and later written to the 'merge.hits' table. 16 buffers are created. The data in each of them is flushed if either 100 seconds have passed, or one million rows have been written, or 100 MB of data have been written; or if simultaneously 10 seconds have passed and 10,000 rows and 10 MB of data have been written. For example, if just one row has been written, after 100 seconds it will be flushed, no matter what. But if many rows have been written, the data will be flushed sooner.

When the server is stopped, with DROP TABLE or DETACH TABLE, buffer data is also flushed to the destination table.

You can set empty strings in single quotation marks for the database and table name. This indicates the absence of a destination table. In this case, when the data flush conditions are reached, the buffer is simply cleared. This may be useful for keeping a window of data in memory.

When reading from a Buffer table, data is processed both from the buffer and from the destination table (if there is one).
Note that the Buffer tables does not support an index. In other words, data in the buffer is fully scanned, which might be slow for large buffers. (For data in a subordinate table, the index it supports will be used.)

If the set of columns in the Buffer table doesn't match the set of columns in a subordinate table, a subset of columns that exist in both tables is inserted.

If the types don't match for one of the columns in the Buffer table and a subordinate table, an error message is entered in the server log and the buffer is cleared.
The same thing happens if the subordinate table doesn't exist when the buffer is flushed.

If you need to run ALTER for a subordinate table and the Buffer table, we recommend first deleting the Buffer table, running ALTER for the subordinate table, then creating the Buffer table again.

If the server is restarted abnormally, the data in the buffer is lost.

PREWHERE, FINAL and SAMPLE do not work correctly for Buffer tables. These conditions are passed to the destination table, but are not used for processing data in the buffer. Because of this, we recommend only using the Buffer table for writing, while reading from the destination table.

When adding data to a Buffer, one of the buffers is locked. This causes delays if a read operation is simultaneously being performed from the table.

Data that is inserted to a Buffer table may end up in the subordinate table in a different order and in different blocks. Because of this, a Buffer table is difficult to use for writing to a CollapsingMergeTree correctly. To avoid problems, you can set 'num_layers' to 1.

If the destination table is replicated, some expected characteristics of replicated tables are lost when writing to a Buffer table. The random changes to the order of rows and sizes of data parts cause data deduplication to quit working, which means it is not possible to have a reliable 'exactly once' write to replicated tables.

Due to these disadvantages, we can only recommend using a Buffer table in rare cases.

A Buffer table is used when too many INSERTs are received from a large number of servers over a unit of time and data can't be buffered before insertion, which means the INSERTs can't run fast enough.

Note that it doesn't make sense to insert data one row at a time, even for Buffer tables. This will only produce a speed of a few thousand rows per second, while inserting larger blocks of data can produce over a million rows per second (see the section "Performance").
Initial commit if EN docs 2017-04-03 19:49:50 +00:00			`Buffer`
			`------`

Table engines 2017-04-26 17:26:17 +00:00			`Buffers the data to write in RAM, periodically flushing it to another table. During the read operation, data is read from the buffer and the other table simultaneously.`
Fixed newlines in .rst files before code blocks [#CLICKHOUSE-2]. for i in $(find . -name '*.rst'); do grep -F -q '.. code-block:: ' $i && cat $i \| sed -r -e 's/$/<NEWLINE>/' \| tr -d '\n' \| sed -r -e 's/([^>])<NEWLINE>.. code-block::/\1<NEWLINE><NEWLINE>.. code-block::/g' \| sed -r -e 's/<NEWLINE>/\n/g' > ${i}.tmp && mv ${i}.tmp ${i}; done 2017-06-13 20:35:07 +00:00
CLICKHOUSE-2720: progress on website (#865) * update presentations * CLICKHOUSE-2936: redirect from clickhouse.yandex.ru and clickhouse.yandex.com * update submodule * lost files * CLICKHOUSE-2981: prefer sphinx docs over original reference * CLICKHOUSE-2981: docs styles more similar to main website + add flags to switch language links * update presentations * Less confusing directory structure (docs -> doc/reference/) * Minify sphinx docs too * Website release script: fail fast + pass docker hash on deploy * Do not underline links in docs * shorter * cleanup docker images * tune nginx config * CLICKHOUSE-3043: get rid of habrastorage links * Lost translation * CLICKHOUSE-2936: temporary client-side redirect * behaves weird in test * put redirect back * CLICKHOUSE-3047: copy docs txts to public too * move to proper file * remove old pages to avoid confusion * Remove reference redirect warning for now * Refresh README.md * Yellow buttons in docs * Use svg flags instead of unicode ones in docs * fix test website instance * Put flags to separate files * wrong flag * Copy Yandex.Metrica introduction from main page to docs * Yet another home page structure change, couple new blocks (CLICKHOUSE-3045) * Update Contacts section * CLICKHOUSE-2849: more detailed legal information * CLICKHOUSE-2978 preparation - split by files * More changes in Contacts block * Tune texts on index page * update presentations * One more benchmark * Add usage sections to index page, adapted from slides * Get the roadmap started, based on slides from last ClickHouse Meetup * CLICKHOUSE-2977: some rendering tuning * Get rid of excessive section in the end of getting started * Make headers linkable * CLICKHOUSE-2981: links to editing reference - https://github.com/yandex/ClickHouse/issues/849 * CLICKHOUSE-2981: fix mobile styles in docs * Ban crawling of duplicating docs * Open some external links in new tab * Ban old docs too * Lots of trivial fixes in english docs * Lots of trivial fixes in russian docs * Remove getting started copies in markdown * Add Yandex.Webmaster * Fix some sphinx warnings * More warnings fixed in english docs * More sphinx warnings fixed * Add code-block:: text * More code-block:: text * These headers look not that well * Better switch between documentation languages * merge use_case.rst into ya_metrika_task.rst * Edit the agg_functions.rst texts * Add lost empty lines 2017-06-13 04:15:47 +00:00			`.. code-block:: text`

Initial commit if EN docs 2017-04-03 19:49:50 +00:00			`Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)`

Table engines 2017-04-26 17:26:17 +00:00			`Engine parameters:`
			`database, table - The table to flush data to. Instead of the database name, you can use a constant expression that returns a string.`
			`num_layers - The level of parallelism. Physically, the table will be represented as 'num_layers' of independent buffers. The recommended value is 16.`
			`min_time, max_time, min_rows, max_rows, min_bytes, and max_bytes are conditions for flushing data from the buffer.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`Data is flushed from the buffer and written to the destination table if all the 'min' conditions or at least one 'max' condition are met.`
			`min_time, max_time - Condition for the time in seconds from the moment of the first write to the buffer.`
			`min_rows, max_rows - Condition for the number of rows in the buffer.`
			`min_bytes, max_bytes - Condition for the number of bytes in the buffer.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`During the write operation, data is inserted to a 'num_layers' number of random buffers. Or, if the data part to insert is large enough (greater than 'max_rows' or 'max_bytes'), it is written directly to the destination table, omitting the buffer.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`The conditions for flushing the data are calculated separately for each of the 'num_layers' buffers. For example, if num_layers = 16 and max_bytes = 100000000, the maximum RAM consumption is 1.6 GB.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`Example:`
Fixed newlines in .rst files before code blocks [#CLICKHOUSE-2]. for i in $(find . -name '*.rst'); do grep -F -q '.. code-block:: ' $i && cat $i \| sed -r -e 's/$/<NEWLINE>/' \| tr -d '\n' \| sed -r -e 's/([^>])<NEWLINE>.. code-block::/\1<NEWLINE><NEWLINE>.. code-block::/g' \| sed -r -e 's/<NEWLINE>/\n/g' > ${i}.tmp && mv ${i}.tmp ${i}; done 2017-06-13 20:35:07 +00:00
CLICKHOUSE-2720: progress on website (#865) * update presentations * CLICKHOUSE-2936: redirect from clickhouse.yandex.ru and clickhouse.yandex.com * update submodule * lost files * CLICKHOUSE-2981: prefer sphinx docs over original reference * CLICKHOUSE-2981: docs styles more similar to main website + add flags to switch language links * update presentations * Less confusing directory structure (docs -> doc/reference/) * Minify sphinx docs too * Website release script: fail fast + pass docker hash on deploy * Do not underline links in docs * shorter * cleanup docker images * tune nginx config * CLICKHOUSE-3043: get rid of habrastorage links * Lost translation * CLICKHOUSE-2936: temporary client-side redirect * behaves weird in test * put redirect back * CLICKHOUSE-3047: copy docs txts to public too * move to proper file * remove old pages to avoid confusion * Remove reference redirect warning for now * Refresh README.md * Yellow buttons in docs * Use svg flags instead of unicode ones in docs * fix test website instance * Put flags to separate files * wrong flag * Copy Yandex.Metrica introduction from main page to docs * Yet another home page structure change, couple new blocks (CLICKHOUSE-3045) * Update Contacts section * CLICKHOUSE-2849: more detailed legal information * CLICKHOUSE-2978 preparation - split by files * More changes in Contacts block * Tune texts on index page * update presentations * One more benchmark * Add usage sections to index page, adapted from slides * Get the roadmap started, based on slides from last ClickHouse Meetup * CLICKHOUSE-2977: some rendering tuning * Get rid of excessive section in the end of getting started * Make headers linkable * CLICKHOUSE-2981: links to editing reference - https://github.com/yandex/ClickHouse/issues/849 * CLICKHOUSE-2981: fix mobile styles in docs * Ban crawling of duplicating docs * Open some external links in new tab * Ban old docs too * Lots of trivial fixes in english docs * Lots of trivial fixes in russian docs * Remove getting started copies in markdown * Add Yandex.Webmaster * Fix some sphinx warnings * More warnings fixed in english docs * More sphinx warnings fixed * Add code-block:: text * More code-block:: text * These headers look not that well * Better switch between documentation languages * merge use_case.rst into ya_metrika_task.rst * Edit the agg_functions.rst texts * Add lost empty lines 2017-06-13 04:15:47 +00:00			`.. code-block:: sql`

Initial commit if EN docs 2017-04-03 19:49:50 +00:00			`CREATE TABLE merge.hits_buffer AS merge.hits ENGINE = Buffer(merge, hits, 16, 10, 100, 10000, 1000000, 10000000, 100000000)`

Table engines 2017-04-26 17:26:17 +00:00			Creating a 'merge.hits_buffer' table with the same structure as 'merge.hits' and using the Buffer engine. When writing to this table, data is buffered in RAM and later written to the 'merge.hits' table. 16 buffers are created. The data in each of them is flushed if either 100 seconds have passed, or one million rows have been written, or 100 MB of data have been written; or if simultaneously 10 seconds have passed and 10,000 rows and 10 MB of data have been written. For example, if just one row has been written, after 100 seconds it will be flushed, no matter what. But if many rows have been written, the data will be flushed sooner.
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`When the server is stopped, with DROP TABLE or DETACH TABLE, buffer data is also flushed to the destination table.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`You can set empty strings in single quotation marks for the database and table name. This indicates the absence of a destination table. In this case, when the data flush conditions are reached, the buffer is simply cleared. This may be useful for keeping a window of data in memory.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`When reading from a Buffer table, data is processed both from the buffer and from the destination table (if there is one).`
			`Note that the Buffer tables does not support an index. In other words, data in the buffer is fully scanned, which might be slow for large buffers. (For data in a subordinate table, the index it supports will be used.)`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`If the set of columns in the Buffer table doesn't match the set of columns in a subordinate table, a subset of columns that exist in both tables is inserted.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`If the types don't match for one of the columns in the Buffer table and a subordinate table, an error message is entered in the server log and the buffer is cleared.`
			`The same thing happens if the subordinate table doesn't exist when the buffer is flushed.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`If you need to run ALTER for a subordinate table and the Buffer table, we recommend first deleting the Buffer table, running ALTER for the subordinate table, then creating the Buffer table again.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`If the server is restarted abnormally, the data in the buffer is lost.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`PREWHERE, FINAL and SAMPLE do not work correctly for Buffer tables. These conditions are passed to the destination table, but are not used for processing data in the buffer. Because of this, we recommend only using the Buffer table for writing, while reading from the destination table.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`When adding data to a Buffer, one of the buffers is locked. This causes delays if a read operation is simultaneously being performed from the table.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`Data that is inserted to a Buffer table may end up in the subordinate table in a different order and in different blocks. Because of this, a Buffer table is difficult to use for writing to a CollapsingMergeTree correctly. To avoid problems, you can set 'num_layers' to 1.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`If the destination table is replicated, some expected characteristics of replicated tables are lost when writing to a Buffer table. The random changes to the order of rows and sizes of data parts cause data deduplication to quit working, which means it is not possible to have a reliable 'exactly once' write to replicated tables.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`Due to these disadvantages, we can only recommend using a Buffer table in rare cases.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`A Buffer table is used when too many INSERTs are received from a large number of servers over a unit of time and data can't be buffered before insertion, which means the INSERTs can't run fast enough.`
Initial commit if EN docs 2017-04-03 19:49:50 +00:00
Table engines 2017-04-26 17:26:17 +00:00			`Note that it doesn't make sense to insert data one row at a time, even for Buffer tables. This will only produce a speed of a few thousand rows per second, while inserting larger blocks of data can produce over a million rows per second (see the section "Performance").`