CLICKHOUSEDOCS-631: temporary_files_codec, join_on_disk_max_files_to_merge settings. (#11242)

Co-authored-by: Sergei Shtykov <bayonet@yandex-team.ru>
BayoNet 2020-06-01 22:02:16 +03:00 committed by GitHub
parent 4df6d41457
commit a7b3343ee4

@@ -433,6 +433,18 @@ Possible values:
Default value: 65536.
## join_on_disk_max_files_to_merge {#join_on_disk_max_files_to_merge}
Limits the number of files allowed for parallel sorting in MergeJoin operations when they are executed on disk.
The bigger the value of the setting, the more RAM is used and the less disk I/O is needed.
Possible values:
- Any positive integer, starting from 2.
Default value: 64.
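
As a hedged illustration, the session-level sketch below forces the on-disk MergeJoin path and raises the merge fan-in; the table names and the value 128 are arbitrary examples, not recommendations:

``` sql
-- Use the MergeJoin algorithm so the setting takes effect, then allow
-- more files to be merged in one pass (less disk I/O, more RAM).
SET join_algorithm = 'partial_merge';
SET join_on_disk_max_files_to_merge = 128; -- example value; default is 64

SELECT t1.id, t2.value
FROM big_table AS t1                                -- hypothetical table
INNER JOIN other_big_table AS t2 ON t1.id = t2.id;  -- hypothetical table
```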
## any_join_distinct_right_table_keys {#any_join_distinct_right_table_keys}
Enables legacy ClickHouse server behavior in `ANY INNER|LEFT JOIN` operations.
@@ -463,6 +475,18 @@ See also:
- [JOIN strictness](../../sql-reference/statements/select/join.md#select-join-strictness)
## temporary_files_codec {#temporary_files_codec}
Sets the compression codec for temporary files used in on-disk sorting and joining operations.
Possible values:
- LZ4 — [LZ4](https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)) compression is applied.
- NONE — No compression is applied.
Default value: LZ4.
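
A minimal sketch of switching the codec for the current session and verifying the effective value via `system.settings`:

``` sql
-- Disable compression of temporary sort/join files:
-- less CPU spent on (de)compression, more disk space used.
SET temporary_files_codec = 'NONE';

-- Check the effective value for the current session.
SELECT name, value
FROM system.settings
WHERE name = 'temporary_files_codec';
```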
## max\_block\_size {#setting-max_block_size}
In ClickHouse, data is processed by blocks (sets of column parts). The internal processing cycles for a single block are efficient enough, but there are noticeable expenditures on each block. The `max_block_size` setting is a recommendation for what size of block (in a count of rows) to load from tables. The block size shouldn't be too small, so that the per-block expenditures stay negligible, but not too large, so that a query with LIMIT that completes after the first block is processed quickly. The goal is to avoid consuming too much memory when extracting a large number of columns in multiple threads and to preserve at least some cache locality.
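
For illustration only: a session-level sketch that lowers the block size for a short LIMIT query; `wide_table` and the value 8192 are arbitrary examples:

``` sql
-- A smaller block lets a LIMIT query finish sooner after its first block
-- and reduces peak memory when many columns are read in parallel.
SET max_block_size = 8192;          -- example value; default is 65536
SELECT * FROM wide_table LIMIT 10;  -- hypothetical table
```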