Add docs

2024-11-22 15:42:02 +00:00 · 2024-06-04 11:20:15 +00:00 · 2024-06-04 11:20:15 +00:00 · 9e440e1417
commit 9e440e1417
parent 52eb917ef0
2 changed files with 43 additions and 1 deletion
--- a/docs/en/operations/settings/merge-tree-settings.md
+++ b/docs/en/operations/settings/merge-tree-settings.md
@@ -885,3 +885,45 @@ Default value: false
 **See Also**

 - [exclude_deleted_rows_for_part_size_in_merge](#exclude_deleted_rows_for_part_size_in_merge) setting
+
+### allow_experimental_optimized_row_order
+
+Controls whether the row order is optimized during inserts to improve the compressibility of the newly inserted table part.
+
+MergeTree tables are (optionally) compressed using [compression codecs](../../sql-reference/statements/create/table.md#column_compression_codec).
+Generic compression codecs such as LZ4 and ZSTD achieve the best compression rates when the data exhibits patterns.
+Long runs of the same value typically compress very well.
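To illustrate why long runs help, the sketch below compresses the same synthetic column twice, once in random order and once sorted. It uses Python's standard `zlib` (DEFLATE) as a stand-in for LZ4/ZSTD, which are not in the standard library. The data, seed, and category names are assumptions made for illustration, and the resulting sizes say nothing about ClickHouse's actual compression ratios.

```python
import random
import zlib


def compressed_size(values: list[str]) -> int:
    """Size in bytes of the zlib-compressed, newline-joined column."""
    return len(zlib.compress("\n".join(values).encode("utf-8")))


# Synthetic column (assumption): 10,000 values drawn from 8 distinct strings.
rng = random.Random(42)
categories = [f"category_{i}" for i in range(8)]
column = [rng.choice(categories) for _ in range(10_000)]

shuffled_size = compressed_size(column)        # random order: many short runs
sorted_size = compressed_size(sorted(column))  # sorted: 8 long runs

print(f"random order: {shuffled_size} bytes")
print(f"sorted order: {sorted_size} bytes")
```

The sorted column consists of only eight runs, so the compressor can replace almost all of it with back-references, while the random order forces it to encode roughly one choice among eight values per row.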
+
+If this setting is enabled, ClickHouse attempts to store the data in newly inserted parts in a row order that minimizes the number of equal-value runs across the columns of the new table part.
+In other words, a small number of equal-value runs means that individual runs are long and compress well.
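The quantity being minimized can be made concrete with a small sketch that counts equal-value runs per column and sums them. Representing rows as Python tuples and the function name `count_runs` are assumptions for illustration only, not part of ClickHouse.

```python
Row = tuple[str, ...]


def count_runs(rows: list[Row]) -> int:
    """Total number of equal-value runs, summed over all columns."""
    if not rows:
        return 0
    runs = 0
    for col in range(len(rows[0])):
        runs += 1  # the first value of a column always starts a run
        for prev, cur in zip(rows, rows[1:]):
            if prev[col] != cur[col]:
                runs += 1  # a value change starts a new run
    return runs


rows = [("a", "x"), ("b", "y"), ("a", "x"), ("b", "y")]
print(count_runs(rows))          # interleaved rows: 8
print(count_runs(sorted(rows)))  # grouped rows: 4
```

Both orders contain the same rows, so SQL treats them as the same table, but the grouped order halves the number of runs.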
+
+Finding the optimal row order is computationally infeasible (NP-hard).
+Therefore, ClickHouse uses heuristics to quickly find a row order that still improves compressibility.
+
+<details markdown="1">
+
+<summary>Heuristics for finding a row order</summary>
+
+It is generally possible to shuffle table rows freely, as SQL considers the same table (or table part) in a different row order to be equivalent.
+
+This freedom of shuffling rows is restricted when a primary key is defined for the table.
+A primary key `C1, C2, ..., CN` in ClickHouse enforces that the table rows are sorted by columns `C1`, `C2`, ..., `CN` ([clustered index](https://en.wikipedia.org/wiki/Database_index#Clustered)).
+As a result, rows can only be shuffled within "equivalence classes" of rows that have the same values in their primary key columns.
+Intuitively, high-cardinality primary keys, e.g. primary keys involving a `DateTime64` timestamp column, lead to many small equivalence classes.
+Conversely, tables with a low-cardinality primary key create few and large equivalence classes.
+A table with no primary key represents an extreme case with a single equivalence class spanning all rows.
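The effect of primary key cardinality on the equivalence classes can be sketched by grouping rows on their primary key values. The rows, column layout, and helper names `equivalence_classes` and `class_sizes` below are hypothetical; an empty primary key list models a table without a primary key.

```python
from collections import defaultdict


def equivalence_classes(rows, pk_columns):
    """Group row indices by their values in the primary key columns.

    Rows may only be swapped within the same group. An empty
    pk_columns list models a table without a primary key.
    """
    classes = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[c] for c in pk_columns)
        classes[key].append(index)
    return dict(classes)


# Hypothetical rows: (region, timestamp, status)
rows = [
    ("eu", 1, "ok"), ("eu", 2, "err"), ("us", 3, "ok"),
    ("us", 4, "ok"), ("eu", 5, "ok"), ("us", 6, "err"),
]


def class_sizes(pk_columns):
    """Sorted sizes of the equivalence classes for the given key."""
    return sorted(len(g) for g in equivalence_classes(rows, pk_columns).values())


print(class_sizes([0]))  # low-cardinality key (region): [3, 3]
print(class_sizes([1]))  # high-cardinality key (timestamp): [1, 1, 1, 1, 1, 1]
print(class_sizes([]))   # no primary key: [6]
```

With the timestamp as the key, every class holds a single row, so there is nothing left to reorder.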
+
+The heuristic used to find a good row order within each equivalence class was proposed by D. Lemire and O. Kaser in [Reordering columns for smaller indexes](https://doi.org/10.1016/j.ins.2011.02.002). It is based on sorting the rows within each equivalence class by the non-primary-key columns in ascending order of their cardinality.
+It performs three steps:
+1. Find all equivalence classes based on the row values in primary key columns.
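+2. For each equivalence class, calculate (usually estimate) the cardinalities of the non-primary-key columns.
+3. For each equivalence class, sort the rows by the non-primary-key columns, using the column with the lowest cardinality as the first sort key, the column with the next-lowest cardinality as the second sort key, and so on.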
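The three steps can be sketched in Python as follows. This is a simplified illustration, not ClickHouse's C++ implementation: it assumes rows are tuples, the primary key is a list of column positions, and cardinalities are computed exactly with sets instead of being estimated. The function name `optimize_row_order` is hypothetical, and step 3 is interpreted as a lexicographic sort with the lowest-cardinality column first.

```python
from collections import defaultdict


def optimize_row_order(rows: list[tuple], pk_columns: list[int]) -> list[tuple]:
    """Hypothetical sketch of the three-step row-order heuristic.

    Values within each column must be mutually comparable.
    """
    if not rows:
        return []
    non_pk = [c for c in range(len(rows[0])) if c not in pk_columns]

    # Step 1: group rows into equivalence classes by primary key values.
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[c] for c in pk_columns)].append(row)

    result = []
    for key in sorted(classes):  # keep the primary key order between classes
        group = classes[key]
        # Step 2: exact cardinality of each non-PK column within this class.
        cardinality = {c: len({r[c] for r in group}) for c in non_pk}
        # Step 3: sort lexicographically, lowest-cardinality column first.
        sort_columns = sorted(non_pk, key=lambda c: cardinality[c])
        group.sort(key=lambda r: tuple(r[c] for c in sort_columns))
        result.extend(group)
    return result


# Hypothetical rows: (region, status, id); primary key = region (column 0).
rows = [
    ("eu", "ok", 1), ("eu", "err", 2), ("eu", "ok", 3),
    ("us", "ok", 4), ("us", "err", 5), ("us", "ok", 6),
]
for row in optimize_row_order(rows, pk_columns=[0]):
    print(row)
```

Within each region, `status` (2 distinct values) becomes the first sort key and `id` (3 distinct values) the second, so equal statuses end up adjacent. With `pk_columns=[2]` every class contains one row and the order does not change, which matches the observation that high-cardinality primary keys leave little room for reordering.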
+
+</details>
+
+If enabled, insert operations incur additional CPU costs to analyze and optimize the row order of the new data.
+INSERTs are expected to take 30-50% longer depending on the data characteristics.
+Compression rates of LZ4 or ZSTD improve by 20-40% on average.
+
+This setting works best for tables with no primary key or a low-cardinality primary key, i.e. a table with only a few distinct primary key values.
+High-cardinality primary keys, e.g. involving timestamp columns of type `DateTime64`, are not expected to benefit from this setting.
--- a/docs/en/sql-reference/statements/create/table.md
+++ b/docs/en/sql-reference/statements/create/table.md
@@ -337,7 +337,7 @@ Then, when executing the query `SELECT name FROM users_a WHERE length(name) < 5;

 Defines storage time for values. Can be specified only for MergeTree-family tables. For the detailed description, see [TTL for columns and tables](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl).

-## Column Compression Codecs
+## Column Compression Codecs {#column_compression_codec}

 By default, ClickHouse applies `lz4` compression in the self-managed version, and `zstd` in ClickHouse Cloud.