From 9e440e14176d535197a96f8f8abc1f09de1eca29 Mon Sep 17 00:00:00 2001
From: Robert Schulze <robert@clickhouse.com>
Date: Tue, 4 Jun 2024 11:20:15 +0000
Subject: [PATCH] Add docs

---
 .../settings/merge-tree-settings.md           | 42 +++++++++++++++++++
 .../sql-reference/statements/create/table.md  |  2 +-
 2 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/docs/en/operations/settings/merge-tree-settings.md b/docs/en/operations/settings/merge-tree-settings.md
index 76250b80476..284f7c4ae06 100644
--- a/docs/en/operations/settings/merge-tree-settings.md
+++ b/docs/en/operations/settings/merge-tree-settings.md
@@ -885,3 +885,45 @@ Default value: false
 **See Also**
 
 - [exclude_deleted_rows_for_part_size_in_merge](#exclude_deleted_rows_for_part_size_in_merge) setting
+
+### allow_experimental_optimized_row_order
+
+Controls whether the row order is optimized during inserts to improve the compressibility of the newly inserted table part.
+
+MergeTree tables are (optionally) compressed using [compression codecs](../../sql-reference/statements/create/table.md#column_compression_codec).
+Generic compression codecs such as LZ4 and ZSTD achieve their best compression rates if the data exhibits patterns.
+Long runs of the same value typically compress very well.
+
+If this setting is enabled, ClickHouse attempts to store the data in newly inserted parts in a row order that minimizes the number of equal-value runs across the columns of the new table part.
+In other words, a small number of equal-value runs means that individual runs are long and compress well.
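+
+To make "number of equal-value runs" concrete, the following Python sketch counts runs column by column for a small in-memory list of rows. It is only an illustration: the function `count_runs` and the tuple-based row representation are hypothetical and not part of ClickHouse.
+
+```python
+from typing import Sequence, Tuple
+
+Row = Tuple[object, ...]
+
+
+def count_runs(rows: Sequence[Row]) -> int:
+    """Return the number of equal-value runs, summed over all columns."""
+    if not rows:
+        return 0
+    total = 0
+    for col in range(len(rows[0])):
+        runs = 1  # the first value of a column always starts a run
+        for prev, cur in zip(rows, rows[1:]):
+            if cur[col] != prev[col]:
+                runs += 1  # a change of value starts a new run
+        total += runs
+    return total
+
+
+rows = [("a", 1), ("b", 2), ("a", 1), ("b", 2)]
+print(count_runs(rows))          # 8: every value starts a new run
+print(count_runs(sorted(rows)))  # 4: the columns become a,a,b,b and 1,1,2,2
+```
+
+Both orders contain the same rows, but the second order has half as many runs, so each run is twice as long.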
+
+Finding the optimal row order is computationally infeasible (NP-hard).
+Therefore, ClickHouse uses heuristics to quickly find a row order that still improves compressibility.
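+
+The following Python sketch illustrates why an exhaustive search does not scale: it tries every permutation of the rows, and the number of permutations grows as n!. The helpers `count_runs` and `brute_force_min_runs` are hypothetical and only usable for tiny inputs. This is not how ClickHouse searches for a row order.
+
+```python
+from itertools import permutations
+from math import factorial
+from typing import Sequence, Tuple
+
+Row = Tuple[object, ...]
+
+
+def count_runs(rows: Sequence[Row]) -> int:
+    """Return the number of equal-value runs, summed over all columns."""
+    if not rows:
+        return 0
+    total = 0
+    for col in range(len(rows[0])):
+        runs = 1
+        for prev, cur in zip(rows, rows[1:]):
+            if cur[col] != prev[col]:
+                runs += 1
+        total += runs
+    return total
+
+
+def brute_force_min_runs(rows: Sequence[Row]) -> int:
+    """Try all n! row orders and return the smallest run count found."""
+    return min(count_runs(order) for order in permutations(rows))
+
+
+rows = [("a", 1), ("b", 2), ("a", 2), ("b", 1)]
+print(count_runs(rows))            # 7 runs in the original order
+print(brute_force_min_runs(rows))  # 5 runs in the best order (24 orders checked)
+print(factorial(20))               # 2432902008176640000 orders for only 20 rows
+```
+
+Even a part with only a few thousand rows is far out of reach for this approach, which is why a heuristic is used instead.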
+
+<details markdown="1">
+
+<summary>Heuristics for finding a row order</summary>
+
+In general, table rows can be shuffled freely because SQL considers the same table (or table part) with its rows in a different order to be equivalent.
+
+This freedom of shuffling rows is restricted when a primary key is defined for the table.
+A primary key `C1, C2, ..., CN` in ClickHouse enforces that the table rows are sorted by columns `C1`, `C2`, ..., `CN` ([clustered index](https://en.wikipedia.org/wiki/Database_index#Clustered)).
+As a result, rows can only be shuffled within "equivalence classes" of rows that have the same values in their primary key columns.
+The intuition is that high-cardinality primary keys, e.g. primary keys involving a `DateTime64` timestamp column, lead to many small equivalence classes.
+Conversely, tables with a low-cardinality primary key create few and large equivalence classes.
+A table with no primary key represents an extreme case with a single equivalence class spanning all rows.
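+
+The grouping into equivalence classes can be sketched in Python as follows. The function `equivalence_classes`, the tuple-based rows, and passing the primary key columns as column indexes are assumptions made for illustration only.
+
+```python
+from collections import defaultdict
+from typing import Dict, List, Sequence, Tuple
+
+Row = Tuple[object, ...]
+
+
+def equivalence_classes(rows: Sequence[Row], pk_cols: Sequence[int]) -> Dict[Row, List[int]]:
+    """Map each distinct primary key value to the indexes of the rows that share it."""
+    classes: Dict[Row, List[int]] = defaultdict(list)
+    for i, row in enumerate(rows):
+        key = tuple(row[c] for c in pk_cols)  # an empty key () if there is no primary key
+        classes[key].append(i)
+    return dict(classes)
+
+
+rows = [("x", 3), ("y", 1), ("x", 1), ("y", 2), ("x", 2)]
+print(len(equivalence_classes(rows, [0, 1])))  # 5: high-cardinality key, one row per class
+print(len(equivalence_classes(rows, [0])))     # 2: low-cardinality key, few large classes
+print(equivalence_classes(rows, []))           # {(): [0, 1, 2, 3, 4]}: no key, one class
+```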
+
+The heuristic used to find the best row order within each equivalence class was suggested by D. Lemire and O. Kaser in [Reordering columns for smaller indexes](https://doi.org/10.1016/j.ins.2011.02.002) and is based on sorting the rows within each equivalence class by the non-primary-key columns, taken in ascending order of their cardinality.
+It performs three steps:
+1. Find all equivalence classes based on the row values in primary key columns.
+2. For each equivalence class, calculate (usually: estimate) the cardinalities of the non-primary-key columns.
+3. For each equivalence class, sort the rows by the non-primary-key columns, taking the columns in ascending order of cardinality.
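+
+The three steps can be sketched in Python as follows. This is a simplified model under stated assumptions, not the ClickHouse implementation: rows are Python tuples, cardinalities are computed exactly with a set instead of being estimated, and the name `optimize_row_order` is hypothetical.
+
+```python
+from collections import defaultdict
+from typing import Dict, List, Sequence, Tuple
+
+Row = Tuple[object, ...]
+
+
+def optimize_row_order(rows: Sequence[Row], pk_cols: Sequence[int]) -> List[Row]:
+    if not rows:
+        return []
+    pk = list(pk_cols)
+    other = [c for c in range(len(rows[0])) if c not in pk]
+
+    # Step 1: group the rows into equivalence classes by primary key value.
+    classes: Dict[Row, List[Row]] = defaultdict(list)
+    for row in rows:
+        classes[tuple(row[c] for c in pk)].append(row)
+
+    result: List[Row] = []
+    # Emit the classes in primary key order because the part must stay sorted by its key.
+    for key in sorted(classes):
+        members = classes[key]
+        # Step 2: exact cardinality of each non-primary-key column within this class
+        # (a simplification; the real implementation may estimate it).
+        card = {c: len({r[c] for r in members}) for c in other}
+        # Step 3: order the columns by ascending cardinality (ties keep their
+        # original order), then sort the rows lexicographically by those columns.
+        order = sorted(other, key=lambda c: card[c])
+        members.sort(key=lambda r: tuple(r[c] for c in order))
+        result.extend(members)
+    return result
+
+
+rows = [("k", "B", 4), ("k", "A", 1), ("k", "B", 2), ("k", "A", 3)]
+print(optimize_row_order(rows, pk_cols=[0]))
+# [('k', 'A', 1), ('k', 'A', 3), ('k', 'B', 2), ('k', 'B', 4)]
+```
+
+Taking the low-cardinality second column first groups its values into two runs (A, A, B, B). Sorting by the high-cardinality third column first would instead yield the order 1, 2, 3, 4 and leave the second column alternating (A, B, A, B).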
+
+</details>
+
+If enabled, insert operations incur additional CPU costs to analyze and optimize the row order of the new data.
+INSERTs are expected to take 30-50% longer depending on the data characteristics.
+Compression rates of LZ4 or ZSTD improve by 20-40% on average.
+
+This setting works best for tables with no primary key or a low-cardinality primary key, i.e. a table with only a few distinct primary key values.
+High-cardinality primary keys, e.g. involving timestamp columns of type `DateTime64`, are not expected to benefit from this setting.
diff --git a/docs/en/sql-reference/statements/create/table.md b/docs/en/sql-reference/statements/create/table.md
index 16918102f02..628fe1d2875 100644
--- a/docs/en/sql-reference/statements/create/table.md
+++ b/docs/en/sql-reference/statements/create/table.md
@@ -337,7 +337,7 @@ Then, when executing the query `SELECT name FROM users_a WHERE length(name) < 5;
 
 Defines storage time for values. Can be specified only for MergeTree-family tables. For the detailed description, see [TTL for columns and tables](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl).
 
-## Column Compression Codecs
+## Column Compression Codecs {#column_compression_codec}
 
 By default, ClickHouse applies `lz4` compression in the self-managed version, and `zstd` in ClickHouse Cloud.