mirror of https://github.com/ClickHouse/ClickHouse.git, synced 2024-11-22 23:52:03 +00:00
update docs for async insert deduplication
parent 3e08a98f16
commit 5fc4998f10
@ -176,6 +176,59 @@ Similar to [replicated_deduplication_window](#replicated-deduplication-window),

The time is relative to the time of the most recent record, not to the wall time. If it's the only record, it will be stored forever.

## replicated_deduplication_window_for_async_inserts {#replicated-deduplication-window-for-async-inserts}

The number of most recent asynchronously inserted blocks for which ClickHouse Keeper stores hash sums to check for duplicates.

Possible values:

- Any positive integer.
- 0 (disable deduplication for async_inserts)

Default value: 10000.

Data from an [Async Insert](./settings.md#async-insert) is buffered and then written as one or more blocks (parts). For [insert deduplication](../../engines/table-engines/mergetree-family/replication.md), when writing into replicated tables, ClickHouse writes the hash sum of each insert into ClickHouse Keeper. Hash sums are stored only for the most recent `replicated_deduplication_window_for_async_inserts` blocks. The oldest hash sums are removed from ClickHouse Keeper.
A large value of `replicated_deduplication_window_for_async_inserts` slows down async inserts because more entries have to be compared.
The hash sum is calculated from the composition of the field names and types and the data of the insert (stream of bytes).

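The count-based window described above can be modelled in a few lines. This is only a conceptual sketch in Python, not ClickHouse code: the names `insert_hash` and `DedupWindow` are hypothetical, SHA-256 stands in for whatever hash ClickHouse actually uses, and an in-memory dictionary stands in for ClickHouse Keeper.

```python
import hashlib
from collections import OrderedDict

def insert_hash(columns, rows):
    """Hash column names, column types and row data of one insert (illustrative)."""
    h = hashlib.sha256()
    for name, type_name in columns:
        h.update(f"{name}:{type_name};".encode())
    for row in rows:
        h.update(repr(row).encode())
    return h.hexdigest()

class DedupWindow:
    """Keeps hash sums of at most `window` recent inserts (hypothetical model)."""

    def __init__(self, window):
        self.window = window
        self.hashes = OrderedDict()  # insertion order = age

    def try_insert(self, columns, rows):
        if self.window == 0:
            return True  # 0 disables deduplication
        key = insert_hash(columns, rows)
        if key in self.hashes:
            return False  # duplicate inside the window: skipped
        self.hashes[key] = None
        while len(self.hashes) > self.window:
            self.hashes.popitem(last=False)  # drop the oldest hash sum
        return True

cols = [("id", "UInt64"), ("s", "String")]
w = DedupWindow(window=2)
results = [
    w.try_insert(cols, [(1, "a")]),  # new insert
    w.try_insert(cols, [(1, "a")]),  # duplicate, rejected
    w.try_insert(cols, [(2, "b")]),
    w.try_insert(cols, [(3, "c")]),  # pushes the hash of (1, "a") out of the window
    w.try_insert(cols, [(1, "a")]),  # no longer remembered, accepted again
]
print(results)
```

The last call shows the trade-off: a small window forgets old inserts sooner, while a large window keeps more entries to compare against.
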
## replicated_deduplication_window_seconds_for_async_inserts {#replicated-deduplication-window-seconds-for-async_inserts}

The number of seconds after which the hash sums of the async inserts are removed from ClickHouse Keeper.

Possible values:

- Any positive integer.

Default value: 604800 (1 week).

Similar to [replicated_deduplication_window_for_async_inserts](#replicated-deduplication-window-for-async-inserts), `replicated_deduplication_window_seconds_for_async_inserts` specifies how long to store hash sums of blocks for async insert deduplication. Hash sums older than `replicated_deduplication_window_seconds_for_async_inserts` are removed from ClickHouse Keeper, even if their number is less than `replicated_deduplication_window_for_async_inserts`.

The time is relative to the time of the most recent record, not to the wall time. If it's the only record, it will be stored forever.

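The rule that age is measured against the most recent record, not the wall clock, is easy to misread, so here is a toy model of it. The class `SecondsWindow` and its methods are hypothetical and only illustrate the expiry rule stated above; they do not reflect how ClickHouse implements cleanup.

```python
class SecondsWindow:
    """Toy model of time-based expiry of stored hash sums (not ClickHouse code)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.created_at = {}  # hash sum -> creation time in seconds

    def add(self, hash_sum, created_at):
        self.created_at[hash_sum] = created_at

    def cleanup(self):
        if not self.created_at:
            return
        # Age is measured against the newest record, not the wall clock.
        newest = max(self.created_at.values())
        self.created_at = {
            h: t for h, t in self.created_at.items()
            if newest - t <= self.window_seconds
        }

window = SecondsWindow(window_seconds=604800)
window.add("hash_a", created_at=0)
window.cleanup()
# The only record is also the newest one, so it never expires on its own.
only_record_kept = "hash_a" in window.created_at

window.add("hash_b", created_at=700000)
window.cleanup()
# hash_a is now more than 604800 seconds older than the newest record.
remaining = sorted(window.created_at)
print(only_record_kept, remaining)
```

Note that `cleanup` takes no current time at all: a lone record survives no matter how much wall time passes.
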
## use_async_block_ids_cache {#use-async-block-ids-cache}

If true, the hash sums of the async inserts are cached.

Possible values:

- true, false

Default value: false.

A block containing multiple async inserts generates multiple hash sums. When some of the inserts are duplicates, Keeper returns only one duplicated hash sum per RPC, which causes unnecessary RPC retries. This cache watches the hash sums path in Keeper. When updates are observed in Keeper, the cache is updated as soon as possible, so that duplicated inserts can be filtered in memory.

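To see why the cache saves retries, consider the following simplified model. It assumes, as the paragraph above states, that each Keeper request reports at most one duplicated hash sum. The function `commit_block` and its parameters are hypothetical, and plain Python sets stand in for Keeper and the cache.

```python
def commit_block(insert_hashes, keeper_hashes, cached_hashes=frozenset()):
    """Return (hash sums that get committed, number of Keeper requests).

    Simplified model: every request reports at most one duplicate,
    after which the block is retried without that insert.
    """
    # With a cache, known duplicates are filtered in memory first.
    pending = [h for h in insert_hashes if h not in cached_hashes]
    requests = 0
    while True:
        requests += 1
        conflict = next((h for h in pending if h in keeper_hashes), None)
        if conflict is None:
            return pending, requests
        pending.remove(conflict)  # drop the reported duplicate and retry

keeper = {"h1", "h2"}          # hash sums already stored
block = ["h1", "h2", "h3"]     # block carrying three async inserts

without_cache = commit_block(block, keeper)
with_cache = commit_block(block, keeper, cached_hashes=keeper)
print(without_cache, with_cache)
```

In this model, two duplicates cost two extra round trips without the cache, while an up-to-date cache lets the block succeed on the first request.
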
## async_block_ids_cache_min_update_interval_ms

The minimum interval (in milliseconds) between updates of the cache enabled by `use_async_block_ids_cache`.

Possible values:

- Any positive integer.

Default value: 100.

Normally, the cache enabled by `use_async_block_ids_cache` updates as soon as there are updates in the watched Keeper path. However, cache updates might be too frequent and become a heavy burden. This minimum interval prevents the cache from updating too often. Note that if this value is set too high, blocks with duplicated inserts will take longer to retry.

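The effect of the minimum interval can be sketched as a simple rate limit. The class `ThrottledCache` is hypothetical. It simply skips refreshes that arrive too early, which is an assumption made for brevity and not necessarily how ClickHouse schedules them.

```python
class ThrottledCache:
    """Toy cache that refreshes at most once per `min_update_interval_ms`."""

    def __init__(self, min_update_interval_ms):
        self.min_interval = min_update_interval_ms
        self.last_update_ms = None
        self.hashes = set()
        self.updates = 0

    def on_keeper_change(self, now_ms, keeper_hashes):
        too_soon = (
            self.last_update_ms is not None
            and now_ms - self.last_update_ms < self.min_interval
        )
        if too_soon:
            return False  # change notification ignored for now
        self.hashes = set(keeper_hashes)
        self.last_update_ms = now_ms
        self.updates += 1
        return True

cache = ThrottledCache(min_update_interval_ms=100)
# Keeper reports changes at these times (milliseconds).
refreshed = [cache.on_keeper_change(t, {"h1"}) for t in (0, 30, 60, 100, 150)]
print(refreshed, cache.updates)
```

Five change notifications lead to only two refreshes here. The cost is staleness: between refreshes the cache may miss recently stored hash sums, which is why a very large interval lengthens retries for blocks with duplicated inserts.
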
## max_replicated_logs_to_keep

How many records may be in the ClickHouse Keeper log if there is an inactive replica. An inactive replica becomes lost when this number is exceeded.

@ -745,4 +798,4 @@ You can see which parts of `s` were stored using the sparse serialization:
│ id │ Default │
│ s │ Sparse │
└────────┴────────────────────┘
```
```
@ -1394,6 +1394,22 @@ By default, blocks inserted into replicated tables by the `INSERT` statement are
For replicated tables, by default only the 100 most recent blocks for each partition are deduplicated (see [replicated_deduplication_window](merge-tree-settings.md/#replicated-deduplication-window), [replicated_deduplication_window_seconds](merge-tree-settings.md/#replicated-deduplication-window-seconds)).
For non-replicated tables, see [non_replicated_deduplication_window](merge-tree-settings.md/#non-replicated-deduplication-window).

## async_insert_deduplicate {#settings-async-insert-deduplicate}

Enables or disables insert deduplication of `ASYNC INSERT` (for Replicated\* tables).

Possible values:

- 0 — Disabled.
- 1 — Enabled.

Default value: 1.

By default, asynchronous inserts into replicated tables, made by `INSERT` statements with [async_insert](#async-insert) enabled, are deduplicated (see [Data Replication](../../engines/table-engines/mergetree-family/replication.md)).
For replicated tables, by default only the 10000 most recent inserts for each partition are deduplicated (see [replicated_deduplication_window_for_async_inserts](merge-tree-settings.md/#replicated-deduplication-window-for-async-inserts), [replicated_deduplication_window_seconds_for_async_inserts](merge-tree-settings.md/#replicated-deduplication-window-seconds-for-async_inserts)).
We recommend enabling [use_async_block_ids_cache](merge-tree-settings.md/#use-async-block-ids-cache) to increase the efficiency of deduplication.
This setting has no effect on non-replicated tables.

## deduplicate_blocks_in_dependent_materialized_views {#settings-deduplicate-blocks-in-dependent-materialized-views}

Enables or disables the deduplication check for materialized views that receive data from Replicated\* tables.