mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-05 05:52:05 +00:00
100ee92c64
The setting allows a user to provide their own deduplication semantics in Replicated*MergeTree. If provided, it is used instead of the data digest to generate the block ID. So, for example, by providing a unique value for the setting in each INSERT statement, a user can avoid the same inserted data being deduplicated.

Data inserted within the same INSERT statement is split into blocks according to the *insert_block_size* settings (max_insert_block_size, min_insert_block_size_rows, min_insert_block_size_bytes). Each block within the same INSERT statement gets an ordinal number. The ordinal number is appended to insert_deduplication_token to form the block dedup token, i.e. <token>_0, <token>_1, ...

Deduplication is done per block. So, to guarantee deduplication for two identical INSERT queries, the dedup token and the number of blocks have to be the same.

Issue: #7461
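The per-block rule described above can be sketched as a toy model. This is not ClickHouse code: `split_into_blocks`, `block_ids`, the fixed-size splitting, and the use of SHA-256 as the data digest are all assumptions made for illustration. It shows why a retried INSERT is only fully deduplicated when both the token and the number of blocks match.

```python
import hashlib


def split_into_blocks(rows, block_size):
    """Hypothetical stand-in for the insert_block_size settings: fixed-size chunks."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def block_ids(blocks, token=""):
    """Return one dedup ID per block of a single INSERT.

    With a token, the ID is <token>_<ordinal>; otherwise it is a digest of the
    block's data (SHA-256 here is an assumption, chosen only for the sketch).
    """
    ids = []
    for ordinal, block in enumerate(blocks):
        if token:
            ids.append(f"{token}_{ordinal}")
        else:
            ids.append(hashlib.sha256(repr(block).encode()).hexdigest())
    return ids


rows = [(i, 1000 + i) for i in range(5)]

# First INSERT with token "t" is split into 2 blocks.
first_blocks = split_into_blocks(rows, 3)
seen = set(block_ids(first_blocks, token="t"))

# Retry of the same INSERT with the same token, but split into 3 blocks.
retry_blocks = split_into_blocks(rows, 2)
retry_ids = block_ids(retry_blocks, token="t")

# Blocks whose ID was not seen before are inserted again.
reinserted = [blk for blk, bid in zip(retry_blocks, retry_ids) if bid not in seen]

# Same token and same split: every block ID repeats, so nothing is reinserted.
same_split_ids = block_ids(split_into_blocks(rows, 3), token="t")
fully_deduplicated = all(bid in seen for bid in same_split_ids)
```

In this model the retry with a different split produces an extra ID, `t_2`, so its block is written a second time even though the token is unchanged.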
25 lines
594 B
Plaintext
create replica 1 and check deduplication
two inserts with exact data, one inserted, one deduplicated by data digest
1 1001
two inserts with the same dedup token, one inserted, one deduplicated by the token
1 1001
1 1001
reset deduplication token and insert new row
1 1001
1 1001
2 1002
create replica 2 and check deduplication
inserted value deduplicated by data digest, the same result as before
1 1001
1 1001
2 1002
inserted value deduplicated by dedup token, the same result as before
1 1001
1 1001
2 1002
new record inserted by providing new deduplication token
1 1001
1 1001
2 1002
2 1002
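
The expected output above can be replayed with a small simulation. This is a sketch under stated assumptions, not the actual test: the sequence of inserts is inferred from the section titles, the class `ReplicatedTable` and the token values `"1"` and `"2"` are hypothetical, SHA-256 stands in for the data digest, and a single shared set of block IDs stands in for the state that both replicas consult. Every insert here fits in one block, so the block dedup token is always `<token>_0`.

```python
import hashlib


class ReplicatedTable:
    """Toy model: all replicas share one set of block IDs and one data set."""

    def __init__(self):
        self.seen_block_ids = set()
        self.rows = []

    def insert(self, row, token=""):
        """Insert a single-row block; return False if it was deduplicated."""
        if token:
            block_id = f"{token}_0"  # token plus the block's ordinal number
        else:
            block_id = hashlib.sha256(repr(row).encode()).hexdigest()
        if block_id in self.seen_block_ids:
            return False
        self.seen_block_ids.add(block_id)
        self.rows.append(row)
        return True

    def select(self):
        return sorted(self.rows)


table = ReplicatedTable()
snapshots = []

# Replica 1: two inserts with the exact same data, no token.
table.insert((1, 1001))
table.insert((1, 1001))
snapshots.append(table.select())

# Replica 1: two inserts with the same token (second row is an assumption).
table.insert((1, 1001), token="1")
table.insert((2, 1002), token="1")
snapshots.append(table.select())

# Replica 1: token reset, new row inserted by data digest.
table.insert((2, 1002))
snapshots.append(table.select())

# Replica 2: same data without a token is deduplicated by digest.
table.insert((1, 1001))
snapshots.append(table.select())

# Replica 2: an already used token is deduplicated whatever the data.
table.insert((3, 1003), token="1")
snapshots.append(table.select())

# Replica 2: a new token lets an existing row be inserted again.
table.insert((2, 1002), token="2")
snapshots.append(table.select())

for snapshot in snapshots:
    for row in snapshot:
        print(*row)
```

Each entry of `snapshots` corresponds to one SELECT result in the reference output, in the same order.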