mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-09-20 08:40:50 +00:00
Merge pull request #40473 from den-crane/patch-37
Doc. Some notes about insert deduplication
This commit is contained in:
commit
766a3a0e11
@ -135,7 +135,7 @@ Possible values:
|
||||
|
||||
Default value: 100.
|
||||
|
||||
The `Insert` command creates one or more blocks (parts). For [insert deduplication](../../engines/table-engines/mergetree-family/replication/), when writing into replicated tables, ClickHouse writes the hash sums of the created parts into ClickHouse Keeper. Hash sums are stored only for the most recent `replicated_deduplication_window` blocks. The oldest hash sums are removed from ClickHouse Keeper.
|
||||
The `Insert` command creates one or more blocks (parts). For [insert deduplication](../../engines/table-engines/mergetree-family/replication.md), when writing into replicated tables, ClickHouse writes the hash sums of the created parts into ClickHouse Keeper. Hash sums are stored only for the most recent `replicated_deduplication_window` blocks. The oldest hash sums are removed from ClickHouse Keeper.
|
||||
A large number of `replicated_deduplication_window` slows down `Inserts` because it needs to compare more entries.
|
||||
The hash sum is calculated from the composition of the field names and types and the data of the inserted part (stream of bytes).
|
||||
|
||||
@ -164,6 +164,8 @@ Default value: 604800 (1 week).
|
||||
|
||||
Similar to [replicated_deduplication_window](#replicated-deduplication-window), `replicated_deduplication_window_seconds` specifies how long to store hash sums of blocks for insert deduplication. Hash sums older than `replicated_deduplication_window_seconds` are removed from ClickHouse Keeper, even if they are less than ` replicated_deduplication_window`.
|
||||
|
||||
The time is relative to the time of the most recent record, not to the wall time. If it's the only record it will be stored forever.
|
||||
|
||||
## max_replicated_logs_to_keep
|
||||
|
||||
How many records may be in the ClickHouse Keeper log if there is inactive replica. An inactive replica becomes lost when when this number exceed.
|
||||
|
@ -1253,7 +1253,9 @@ Possible values:
|
||||
|
||||
Default value: 1.
|
||||
|
||||
By default, blocks inserted into replicated tables by the `INSERT` statement are deduplicated (see [Data Replication](../../engines/table-engines/mergetree-family/replication.md)).
|
||||
By default, blocks inserted into replicated tables by the `INSERT` statement are deduplicated (see [Data Replication](../../engines/table-engines/mergetree-family/replication.md)).
|
||||
For the replicated tables by default the only 100 of the most recent blocks for each partition are deduplicated (see [replicated_deduplication_window](merge-tree-settings.md#replicated-deduplication-window), [replicated_deduplication_window_seconds](merge-tree-settings.md/#replicated-deduplication-window-seconds)).
|
||||
For not replicated tables see [non_replicated_deduplication_window](merge-tree-settings.md/#non-replicated-deduplication-window).
|
||||
|
||||
## deduplicate_blocks_in_dependent_materialized_views {#settings-deduplicate-blocks-in-dependent-materialized-views}
|
||||
|
||||
@ -1287,6 +1289,9 @@ Default value: empty string (disabled)
|
||||
|
||||
`insert_deduplication_token` is used for deduplication _only_ when not empty.
|
||||
|
||||
For the replicated tables by default the only 100 of the most recent inserts for each partition are deduplicated (see [replicated_deduplication_window](merge-tree-settings.md#replicated-deduplication-window), [replicated_deduplication_window_seconds](merge-tree-settings.md/#replicated-deduplication-window-seconds)).
|
||||
For not replicated tables see [non_replicated_deduplication_window](merge-tree-settings.md/#non-replicated-deduplication-window).
|
||||
|
||||
Example:
|
||||
|
||||
```sql
|
||||
|
Loading…
Reference in New Issue
Block a user