The MySQL replication code assumed that row update events would be
preceded by a single TABLE_MAP_EVENT. However, if a single SQL
statement modifies rows in multiple tables, MySQL will first send
table map events for all involved tables, and then row update events.
Depending on the circumstances, this could cause an exception when the
row update was processed, cause the update to be incorrectly dropped,
or apply it to the wrong table.
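Below is a minimal sketch of the fix idea in Python, using illustrative
event classes rather than the actual MaterializeMySQL types: instead of
remembering only the most recent TABLE_MAP_EVENT, keep a map from table
id to table metadata and look each row event's table up by the table id
it carries.

    # Sketch only: illustrative event classes, not the actual MaterializeMySQL types.
    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class TableMapEvent:
        table_id: int
        schema: str
        table: str


    @dataclass
    class RowsEvent:
        table_id: int
        rows: List[dict]


    def apply_rows(schema: str, table: str, rows: List[dict]) -> None:
        print(f"applying {len(rows)} row(s) to {schema}.{table}")


    def apply_binlog_events(events) -> None:
        # A statement that touches several tables emits TABLE_MAP_EVENTs for all
        # of them first and only then the row events, so remembering a single
        # "last table map" is not enough: keep every mapping and look rows up
        # by the table id carried in each row event.
        table_maps: Dict[int, TableMapEvent] = {}
        for event in events:
            if isinstance(event, TableMapEvent):
                table_maps[event.table_id] = event
            elif isinstance(event, RowsEvent):
                tmap = table_maps.get(event.table_id)
                if tmap is None:
                    raise RuntimeError(f"no TABLE_MAP_EVENT for table id {event.table_id}")
                apply_rows(tmap.schema, tmap.table, event.rows)


    # One statement updating two tables: both table maps arrive before the row events.
    apply_binlog_events([
        TableMapEvent(10, "db", "t1"),
        TableMapEvent(11, "db", "t2"),
        RowsEvent(10, [{"id": 1}]),
        RowsEvent(11, [{"id": 2}]),
    ])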
In [1]:
start_cluster = <helpers.cluster.ClickHouseCluster object at 0x7f2d46895dd8>

    def test_different_versions(start_cluster):
        with pytest.raises(QueryTimeoutExceedException):
            node.query("SELECT sleep(3)", timeout=1)
        with pytest.raises(QueryRuntimeException):
>           node.query("SELECT 1", settings={'max_concurrent_queries_for_user': 1})
E           Failed: DID NOT RAISE <class 'helpers.client.QueryRuntimeException'>
[1]: https://clickhouse-test-reports.s3.yandex.net/19451/b68508002d134ead05bf2ca0e22a13a34a5c55c6/integration_tests_(thread).html#fail1
Since ExternalLoader::PeriodicUpdater::check_period_sec = 5, a reload
that is scheduled too early gets skipped until the next check, and this
is indeed what the logs [1] show: the reload happens every 10 seconds,
not every 5 (a toy model of this skew follows the log excerpt):
2021.01.31 14:20:22.590999 [ 48 ] {} <Trace> ExternalDictionariesLoader: Supposed update time for 'dep_x' is 2021-01-31 14:20:27 (loaded, lifetime [5, 5], no errors)
2021.01.31 14:20:22.591016 [ 48 ] {} <Trace> ExternalDictionariesLoader: Next update time for 'dep_x' was set to 2021-01-31 14:20:27
...
2021.01.31 14:20:32.164882 [ 50 ] {} <Trace> ExternalDictionariesLoader: Start loading object 'dep_x'
[1]: https://clickhouse-test-reports.s3.yandex.net/19584/37797fdf5b30dc97147e73b3ac8ca9025b80aaed/integration_tests_(release).html#fail1
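Below is a toy model of one way this skew plays out; it is a sketch, not
the actual ExternalLoader logic, and it assumes each load finishes a
moment after the checker's tick, so the supposed update time always
lands just after a tick of the 5-second grid and every second tick
skips the dictionary.

    # Sketch only: a toy model of the periodic checker, not the real ExternalLoader code.
    CHECK_PERIOD_SEC = 5     # ExternalLoader::PeriodicUpdater::check_period_sec
    LIFETIME_SEC = 5         # dictionary lifetime [5, 5]
    LOAD_DURATION_SEC = 0.5  # assumption: the load itself takes a moment to finish


    def simulate(total_time=40.0):
        """Walk the checker's 5-second ticks and report when reloads actually happen."""
        # The load finishes shortly after a tick, so the "supposed update time"
        # always lands just *after* a tick of the 5-second grid.
        next_update = LOAD_DURATION_SEC + LIFETIME_SEC
        reloads = []
        now = 0.0
        while now < total_time:
            if now >= next_update:
                reloads.append(now)  # reload on this tick
                next_update = now + LOAD_DURATION_SEC + LIFETIME_SEC
            # else: the update time is still (slightly) in the future -> skipped
            now += CHECK_PERIOD_SEC
        return reloads


    # next_update is 5.5, 15.5, 25.5, ... so every second tick skips the
    # dictionary and it is reloaded every 10 seconds instead of every 5.
    print(simulate())  # [10.0, 20.0, 30.0]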
* Add query data deduplication that excludes duplicated parts in MergeTree family engines.
Query deduplication is based on part UUIDs, which must first be enabled with the merge_tree setting
assign_part_uuids=1.
The allow_experimental_query_deduplication setting enables part deduplication; it defaults to false.
A data part UUID is a mechanism for giving a data part a unique identifier.
Having UUIDs and a deduplication mechanism makes it possible to move parts
between shards while preserving data consistency on the read path:
duplicated UUIDs cause the root executor to retry the query against one of the replicas, explicitly
asking it to exclude the encountered duplicate fingerprints during distributed query execution
(see the usage sketch after this list).
NOTE: this implementation doesn't provide any knob to lock a part and hence its UUID; any mutation/merge will
update the part's UUID.
* Add a _part_uuid virtual column, allowing UUIDs to be used in predicates.
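A usage sketch, not taken from the patch itself, written in the style of
the integration-test snippet quoted earlier (node stands for a test
ClickHouseInstance; the table name and queries are illustrative):

    # Usage sketch only: "node" is a test ClickHouseInstance as in the snippet
    # above; the table name and queries are illustrative, not from the patch.
    def check_part_uuid_deduplication(node):
        # assign_part_uuids=1 is the merge_tree setting that makes every new
        # part get a UUID; without it there is nothing to deduplicate on.
        node.query("""
            CREATE TABLE test_dedup (key UInt64, value String)
            ENGINE = MergeTree ORDER BY key
            SETTINGS assign_part_uuids = 1
        """)
        node.query("INSERT INTO test_dedup VALUES (1, 'a'), (2, 'b')")

        # allow_experimental_query_deduplication defaults to false; it enables
        # the part-UUID based deduplication on the read path.
        node.query("SELECT count() FROM test_dedup",
                   settings={'allow_experimental_query_deduplication': 1})

        # The _part_uuid virtual column can be used in predicates, e.g. to
        # inspect parts or to restrict a read to a specific one.
        some_uuid = node.query("SELECT any(_part_uuid) FROM test_dedup").strip()
        node.query(f"SELECT count() FROM test_dedup WHERE _part_uuid = '{some_uuid}'")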
Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>
address comments