From 9e440e14176d535197a96f8f8abc1f09de1eca29 Mon Sep 17 00:00:00 2001
From: Robert Schulze
Date: Tue, 4 Jun 2024 11:20:15 +0000
Subject: [PATCH] Add docs

---
 .../settings/merge-tree-settings.md          | 42 +++++++++++++++++++
 .../sql-reference/statements/create/table.md |  2 +-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/docs/en/operations/settings/merge-tree-settings.md b/docs/en/operations/settings/merge-tree-settings.md
index 76250b80476..284f7c4ae06 100644
--- a/docs/en/operations/settings/merge-tree-settings.md
+++ b/docs/en/operations/settings/merge-tree-settings.md
@@ -885,3 +885,45 @@ Default value: false
 **See Also**
 
 - [exclude_deleted_rows_for_part_size_in_merge](#exclude_deleted_rows_for_part_size_in_merge) setting
+
+### allow_experimental_optimized_row_order
+
+Controls whether the row order is optimized during inserts to improve the compressibility of the newly inserted table part.
+
+MergeTree tables are (optionally) compressed using [compression codecs](../../sql-reference/statements/create/table.md#column_compression_codec).
+Generic compression codecs such as LZ4 and ZSTD achieve maximum compression rates if the data exhibits patterns.
+Long runs of the same value typically compress very well.
+
+If this setting is enabled, ClickHouse attempts to store the data in newly inserted parts in a row order that minimizes the number of equal-value runs across the columns of the new table part.
+In other words, a small number of equal-value runs means that individual runs are long and compress well.
+
+Finding the optimal row order is computationally infeasible (NP-hard).
+Therefore, ClickHouse uses a heuristic to quickly find a row order which still improves compressibility.
+
+
+
+Heuristics for finding a row order
+
+It is generally possible to shuffle table rows freely, as SQL considers the same table (or table part) in a different row order to be equivalent.
+
+This freedom of shuffling rows is restricted when a primary key is defined for the table.
+A primary key `C1, C2, ..., CN` in ClickHouse enforces that the table rows are sorted by columns `C1`, `C2`, ..., `CN` ([clustered index](https://en.wikipedia.org/wiki/Database_index#Clustered)).
+As a result, rows can only be shuffled within "equivalence classes" of rows that have the same values in their primary key columns.
+The intuition is that high-cardinality primary keys, e.g. primary keys involving a `DateTime64` timestamp column, lead to many small equivalence classes.
+Likewise, tables with a low-cardinality primary key create few and large equivalence classes.
+A table with no primary key represents an extreme case with a single equivalence class spanning all rows.
+
+The heuristic applied to find the best row order within each equivalence class was suggested by D. Lemire and O. Kaser in [Reordering columns for smaller indexes](https://doi.org/10.1016/j.ins.2011.02.002) and is based on sorting the rows within each equivalence class by ascending cardinality of the non-primary-key columns.
+It performs three steps:
+1. Find all equivalence classes based on the row values in the primary key columns.
+2. For each equivalence class, calculate (usually estimate) the cardinalities of the non-primary-key columns.
+3. For each equivalence class, sort the rows in ascending order of non-primary-key column cardinality.
+
+
+
+If enabled, insert operations incur additional CPU costs to analyze and optimize the row order of the new data.
+INSERTs are expected to take 30-50% longer depending on the data characteristics.
+Compression rates of LZ4 or ZSTD improve by 20-40% on average.
+
+This setting works best for tables with no primary key or a low-cardinality primary key, i.e. a table with only a few distinct primary key values.
+High-cardinality primary keys, e.g. involving timestamp columns of type `DateTime64`, are not expected to benefit from this setting.
diff --git a/docs/en/sql-reference/statements/create/table.md b/docs/en/sql-reference/statements/create/table.md
index 16918102f02..628fe1d2875 100644
--- a/docs/en/sql-reference/statements/create/table.md
+++ b/docs/en/sql-reference/statements/create/table.md
@@ -337,7 +337,7 @@ Then, when executing the query `SELECT name FROM users_a WHERE length(name) < 5;
 
 Defines storage time for values. Can be specified only for MergeTree-family tables. For the detailed description, see [TTL for columns and tables](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl).
 
-## Column Compression Codecs
+## Column Compression Codecs {#column_compression_codec}
 
 By default, ClickHouse applies `lz4` compression in the self-managed version, and `zstd` in ClickHouse Cloud.
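
Illustration (not part of the patch): the three heuristic steps documented for `allow_experimental_optimized_row_order` can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not ClickHouse's implementation: the function names `optimize_row_order` and `count_runs` are hypothetical, rows are plain tuples whose first `num_key_columns` entries form the primary key, the input is assumed to be already sorted by that key, cardinalities are computed exactly rather than estimated, and step 3 is read as a lexicographic sort over the non-key columns taken in ascending order of cardinality.

```python
from itertools import groupby


def optimize_row_order(rows, num_key_columns):
    """Reorder rows inside each primary-key equivalence class.

    Assumption: `rows` is a list of equal-length tuples that is already
    sorted by its first `num_key_columns` columns (the primary key).
    """
    result = []
    # Step 1: consecutive rows with equal key values form one equivalence class.
    for _, group in groupby(rows, key=lambda row: row[:num_key_columns]):
        block = list(group)
        non_key = range(num_key_columns, len(block[0]))
        # Step 2: exact per-class cardinality of every non-key column
        # (a real system would typically estimate this instead).
        cardinality = {c: len({row[c] for row in block}) for c in non_key}
        # Step 3: sort lexicographically, lowest-cardinality column first.
        # Ties between columns are broken by column position.
        column_order = sorted(non_key, key=lambda c: (cardinality[c], c))
        block.sort(key=lambda row: tuple(row[c] for c in column_order))
        result.extend(block)
    return result


def count_runs(rows):
    """Total number of equal-value runs summed over all columns."""
    total = 0
    for c in range(len(rows[0])):
        column = [row[c] for row in rows]
        total += 1 + sum(1 for a, b in zip(column, column[1:]) if a != b)
    return total


if __name__ == "__main__":
    # One equivalence class (key = 1); the second column alternates a/b.
    rows = [(1, "a", 10), (1, "b", 20), (1, "a", 30), (1, "b", 40)]
    reordered = optimize_row_order(rows, num_key_columns=1)
    print(count_runs(rows), "->", count_runs(reordered))  # prints "9 -> 7"
```

In this toy input the low-cardinality column collapses from four runs to two, while the column with all-distinct values cannot improve. This matches the documented expectation that the setting helps most when equivalence classes are large.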