For async S3 writes, final part flushing was deferred until the whole
INSERT block had been processed; however, with too many
partitions/columns you may exceed the max_memory_usage limit (since
each stream has overhead).

Introduce max_insert_delayed_streams_for_parallel_writes (with a
default of 1000 for S3 and 0 otherwise) to avoid this.

This should fix "Memory limit exceeded" errors in performance tests.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
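A minimal sketch of tuning the threshold per query, using the setting
name as given above (the table names and the value are illustrative
assumptions, not part of the patch):

    -- Delay more streams before flushing when an INSERT touches many
    -- partitions/columns (value is illustrative):
    INSERT INTO s3_table
    SETTINGS max_insert_delayed_streams_for_parallel_writes = 10000
    SELECT * FROM source_table;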
The setting allows a user to provide their own deduplication semantics
in Replicated*MergeTree. If provided, it is used instead of the data
digest to generate the block ID. So, for example, by providing a unique
value for the setting in each INSERT statement, the user can avoid the
same inserted data being deduplicated, as in the sketch below.
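A hedged sketch (table name and tokens are hypothetical): a unique
token per INSERT lets identical data be inserted twice without being
deduplicated.

    -- Each INSERT carries its own token, so the duplicate row is kept:
    INSERT INTO t SETTINGS insert_deduplication_token = 'token-1' VALUES (42);
    INSERT INTO t SETTINGS insert_deduplication_token = 'token-2' VALUES (42);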
Data inserted within the same INSERT statement is split into blocks
according to the *insert_block_size* settings
(max_insert_block_size, min_insert_block_size_rows,
min_insert_block_size_bytes). Each block within the same INSERT
statement gets an ordinal number. The ordinal number is appended to
insert_deduplication_token to get a per-block dedup token, i.e.
<token>_0, <token>_1, ... Deduplication is done per block, as
illustrated below.
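A sketch of the per-block tokens, assuming the block-size settings
below force one-row blocks so that the INSERT is split in two (table
and values are hypothetical):

    -- Two one-row blocks, deduplicated as 'mytoken_0' and 'mytoken_1':
    INSERT INTO t
    SETTINGS insert_deduplication_token = 'mytoken',
             max_insert_block_size = 1, min_insert_block_size_rows = 1
    VALUES (1), (2);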
So, to guarantee deduplication for two identical INSERT queries, the
dedup token and the number of blocks have to be the same (see the
example below).
Issue: #7461
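For example (hypothetical table and token), the second INSERT below is
deduplicated because the token and the single-block layout match; note
that with a token the block ID no longer depends on the data digest:

    INSERT INTO t SETTINGS insert_deduplication_token = 'tok' VALUES (1);
    -- Deduplicated: same token, same number of blocks:
    INSERT INTO t SETTINGS insert_deduplication_token = 'tok' VALUES (1);
    -- Also skipped as a duplicate: the block ID is derived from the
    -- token, not from the data, so different data does not help:
    INSERT INTO t SETTINGS insert_deduplication_token = 'tok' VALUES (2);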