Robert Schulze 2024-06-04 11:20:15 +00:00
parent 52eb917ef0
commit 9e440e1417
2 changed files with 43 additions and 1 deletions


@@ -885,3 +885,45 @@ Default value: false
**See Also**
- [exclude_deleted_rows_for_part_size_in_merge](#exclude_deleted_rows_for_part_size_in_merge) setting
### allow_experimental_optimized_row_order
Controls whether the row order is optimized during inserts to improve the compressibility of the newly inserted table part.
MergeTree tables are (optionally) compressed using [compression codecs](../../sql-reference/statements/create/table.md#column_compression_codec).
Generic compression codecs such as LZ4 and ZSTD achieve maximum compression rates if the data exposes patterns.
Long runs of the same value typically compress very well.
If this setting is enabled, ClickHouse attempts to store the data in newly inserted parts in a row order that minimizes the number of equal-value runs across the columns of the new table part.
In other words, a small number of equal-value runs means that individual runs are long and compress well.
Finding the optimal row order is computationally infeasible (NP-hard).
Therefore, ClickHouse uses a heuristic to quickly find a row order which still improves compressibility.
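To make "equal-value runs" concrete, the following minimal Python sketch counts the runs per column for two row orders of the same data. It is only an illustration and not ClickHouse code: the helper names `count_runs` and `total_runs` and the sample rows are assumptions made for this example.

```python
from itertools import groupby

def count_runs(column):
    """Number of maximal runs of consecutive equal values in one column."""
    return sum(1 for _ in groupby(column))

def total_runs(rows):
    """Sum of equal-value runs over all columns of a list of row tuples."""
    if not rows:
        return 0
    # zip(*rows) transposes the row tuples into columns.
    return sum(count_runs(column) for column in zip(*rows))

# Hypothetical sample data: two columns, six rows, in arbitrary insert order.
rows = [("b", 2), ("a", 1), ("b", 1), ("a", 2), ("b", 2), ("a", 1)]

# The same rows in a different order (here: plain lexicographic sort).
reordered = sorted(rows)

print(total_runs(rows))       # 10
print(total_runs(reordered))  # 6
```

Both orders hold the same rows, but the second one has fewer and therefore longer runs, which is the property the setting tries to achieve.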
<details markdown="1">
<summary>Heuristics for finding a row order</summary>
It is generally possible to shuffle table rows freely because SQL considers the same table (or table part) in a different row order as equivalent.
This freedom of shuffling rows is restricted when a primary key is defined for the table.
A primary key `C1, C2, ..., CN` in ClickHouse enforces that the table rows are sorted by columns `C1`, `C2`, ..., `CN` ([clustered index](https://en.wikipedia.org/wiki/Database_index#Clustered)).
As a result, rows can only be shuffled within "equivalence classes" of rows that have the same values in their primary key columns.
The intuition is that high-cardinality primary keys, e.g. primary keys involving a `DateTime64` timestamp column, lead to many small equivalence classes.
Conversely, tables with a low-cardinality primary key create few, large equivalence classes.
A table with no primary key represents an extreme case with a single equivalence class spanning all rows.
The heuristic used to find the best row order within each equivalence class was proposed by D. Lemire and O. Kaser in [Reordering columns for smaller indexes](https://doi.org/10.1016/j.ins.2011.02.002) and is based on sorting the rows within each equivalence class by ascending cardinality of the non-primary-key columns.
It performs three steps:
1. Find all equivalence classes based on the row values in primary key columns.
2. For each equivalence class, calculate (usually estimate) the cardinalities of the non-primary-key columns.
3. For each equivalence class, sort the rows in ascending order of non-primary-key column cardinality.
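The three steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ClickHouse implementation: the function name `optimize_row_order` is hypothetical, rows are plain tuples whose first `num_key_columns` entries form the primary key, cardinalities are computed exactly rather than estimated, and step 3 is read as a lexicographic sort in which the non-key columns are compared from lowest to highest cardinality.

```python
from itertools import groupby

def optimize_row_order(rows, num_key_columns):
    """Reorder rows within each primary-key equivalence class.

    `rows` must already be sorted by their first `num_key_columns` columns.
    Values within a column must be hashable and mutually comparable.
    """
    result = []
    # Step 1: because rows are sorted by primary key, each equivalence class
    # is a group of consecutive rows with identical key columns.
    for _, group in groupby(rows, key=lambda row: row[:num_key_columns]):
        block = list(group)
        non_key = range(num_key_columns, len(block[0]))
        # Step 2: cardinality of every non-key column within this class
        # (exact here; an assumption, since the text says it is usually estimated).
        cardinality = {c: len({row[c] for row in block}) for c in non_key}
        # Step 3: sort the class with columns compared in ascending cardinality
        # order; ties between columns are broken by column position.
        order = sorted(non_key, key=lambda c: (cardinality[c], c))
        block.sort(key=lambda row: tuple(row[c] for c in order))
        result.extend(block)
    return result

# Hypothetical sample: column 0 is the primary key, columns 1 and 2 are not.
rows = [
    (1, "x", "red"), (1, "y", "blue"), (1, "z", "red"), (1, "w", "blue"),
    (2, "q", "green"), (2, "p", "green"),
]
optimized = optimize_row_order(rows, num_key_columns=1)
for row in optimized:
    print(row)
```

In the first equivalence class the color column has two distinct values and the letter column has four, so rows are grouped by color first, which turns the alternating `red`/`blue` values into two longer runs. The primary-key order is left unchanged, and passing `num_key_columns=0` treats the whole input as a single equivalence class.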
</details>
If enabled, insert operations incur additional CPU costs to analyze and optimize the row order of the new data.
INSERTs are expected to take 30-50% longer depending on the data characteristics.
Compression rates of LZ4 or ZSTD improve by 20-40% on average.
This setting works best for tables with no primary key or a low-cardinality primary key, i.e. a table with only a few distinct primary key values.
High-cardinality primary keys, e.g. involving timestamp columns of type `DateTime64`, are not expected to benefit from this setting.


@@ -337,7 +337,7 @@ Then, when executing the query `SELECT name FROM users_a WHERE length(name) < 5;
Defines storage time for values. Can be specified only for MergeTree-family tables. For the detailed description, see [TTL for columns and tables](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl).
## Column Compression Codecs
## Column Compression Codecs {#column_compression_codec}
By default, ClickHouse applies `lz4` compression in the self-managed version, and `zstd` in ClickHouse Cloud.