CLICKHOUSEDOCS-631: temporary_files_codec, join_on_disk_max_files_to_merge settings. (#11242)
Co-authored-by: Sergei Shtykov <bayonet@yandex-team.ru>
This commit is contained in: commit a7b3343ee4 (parent 4df6d41457)
@@ -433,6 +433,18 @@ Possible values:

Default value: 65536.
## join_on_disk_max_files_to_merge {#join_on_disk_max_files_to_merge}

Limits the number of files allowed for parallel sorting in MergeJoin operations when they are executed on disk.

The bigger the value of the setting, the more RAM is used and the less disk I/O is needed.

Possible values:

- Any positive integer, starting from 2.

Default value: 64.
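A minimal usage sketch. It assumes the join is big enough to spill to disk; here the `partial_merge` join algorithm is combined with a deliberately low `max_bytes_in_join` limit as the assumed trigger, and the tables `t1`/`t2` are hypothetical:

```sql
-- Use the disk-based merge join and cap its in-memory footprint
-- so that sorting spills to temporary files (assumed trigger).
SET join_algorithm = 'partial_merge';
SET max_bytes_in_join = 10000000;

-- Merge at most 32 sorted files in parallel; a lower value trades
-- extra disk I/O for a smaller RAM footprint.
SET join_on_disk_max_files_to_merge = 32;

SELECT count()
FROM t1
INNER JOIN t2 ON t1.key = t2.key;
```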
## any_join_distinct_right_table_keys {#any_join_distinct_right_table_keys}

Enables legacy ClickHouse server behavior in `ANY INNER|LEFT JOIN` operations.
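For context, a sketch of the kind of query this setting affects; `a` and `b` are hypothetical tables:

```sql
-- ANY LEFT JOIN matches each left-side row with at most one
-- right-side row, even when several rows share the join key.
SELECT a.key, b.value
FROM a
ANY LEFT JOIN b ON a.key = b.key;
```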
@@ -463,6 +475,18 @@ See also:

- [JOIN strictness](../../sql-reference/statements/select/join.md#select-join-strictness)
## temporary_files_codec {#temporary_files_codec}

Sets the compression codec for temporary files used in sorting and joining operations on disk.

Possible values:

- LZ4 — [LZ4](https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)) compression is applied.
- NONE — No compression is applied.

Default value: LZ4.
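A one-line usage sketch; disabling compression is assumed to make sense only in special cases, for example when the temporary files already land on compressed storage:

```sql
-- Write on-disk temporary files for sorts and joins uncompressed.
SET temporary_files_codec = 'NONE';
```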
## max_block_size {#setting-max_block_size}

In ClickHouse, data is processed by blocks (sets of column parts). The internal processing cycles for a single block are efficient enough, but there are noticeable expenditures on each block. The `max_block_size` setting is a recommendation for what size of block (in a count of rows) to load from tables. The block size shouldn't be too small, so that the expenditures on each block remain negligible, but not too large, so that a query with LIMIT that completes after the first block is processed quickly. The goal is to avoid consuming too much memory when extracting a large number of columns in multiple threads and to preserve at least some cache locality.
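A quick way to see the setting in action; `system.numbers` is a built-in table and the row counts here are only illustrative:

```sql
-- Cap blocks at 4096 rows so the LIMIT query can finish after the
-- first, small block instead of materializing a much larger one.
SELECT number
FROM system.numbers
LIMIT 10
SETTINGS max_block_size = 4096;
```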