19 Commits

Author SHA1 Message Date
Aleksei Semiglazov
921518db0a CLICKHOUSE-606: query deduplication based on parts' UUID
* add the query data deduplication excluding duplicated parts in MergeTree family engines.

query deduplication is based on parts' UUID which should be enabled first with merge_tree setting
assign_part_uuids=1

allow_experimental_query_deduplication setting enables part deduplication; it defaults to false.

data part UUID is a mechanism of giving a data part a unique identifier.
Having UUID and deduplication mechanism provides a potential of moving parts
between shards preserving data consistency on a read path:
duplicated UUIDs will cause the root executor to retry the query against one of the replicas, explicitly
asking it to exclude the duplicated fingerprints encountered during distributed query execution.

NOTE: this implementation doesn't provide any knobs to lock a part and hence its UUID. Any mutation/merge will
update the part's UUID.

* add _part_uuid virtual column, allowing to use UUIDs in predicates.
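
The retry-on-duplicate read path described above can be sketched roughly as follows. This is only an illustrative Python model, not ClickHouse code: the `Replica` class, its `read` method, and `run_distributed_query` are hypothetical names invented for this sketch, and the real executor works on remote connections rather than in-memory dicts.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one shard replica: maps part UUID -> rows."""
    parts: dict

    def read(self, excluded_uuids=frozenset()):
        # Return (uuids_read, rows), skipping parts the caller asked to exclude.
        uuids = [u for u in self.parts if u not in excluded_uuids]
        rows = [row for u in uuids for row in self.parts[u]]
        return uuids, rows


def run_distributed_query(replicas):
    """Read from every replica; retry a replica whose parts were already seen."""
    seen = set()
    result = []
    for replica in replicas:
        uuids, rows = replica.read()
        duplicates = seen.intersection(uuids)
        if duplicates:
            # Retry the same replica, explicitly excluding the duplicated parts.
            uuids, rows = replica.read(excluded_uuids=frozenset(duplicates))
        seen.update(uuids)
        result.extend(rows)
    return result


# Part "b" exists on both shards, e.g. while it is being moved between them.
shard1 = Replica({"a": [1, 2], "b": [3]})
shard2 = Replica({"b": [3], "c": [4]})
rows = run_distributed_query([shard1, shard2])
```

In this toy model the rows of part "b" are returned once even though both shards hold it, which is the consistency property the commit message describes for parts moved between shards.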

Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>

address comments
2021-02-02 16:53:39 +00:00
Nikolai Kochetov
af7f5c9518
Merge pull request #17868 from ClickHouse/async-read-from-socket
Async read from socket
2020-12-23 12:20:42 +03:00
Nikolai Kochetov
c7ef57c6fd Add setting async_socket_for_remote 2020-12-18 16:15:03 +03:00
Nikolai Kochetov
a860f128f5 Try fix race in RemoteQueryExecutor. 2020-12-17 13:07:28 +03:00
nikitamikhaylov
bcd6fc1c88 better 2020-12-16 14:55:33 +03:00
nikitamikhaylov
18d52dbc63 better 2020-12-16 14:55:33 +03:00
Nikolai Kochetov
db9ad80168 Fixing tests. 2020-12-14 19:16:08 +03:00
Nikolai Kochetov
8de5cd5bc7 Merge branch 'master' into async-read-from-socket 2020-12-14 17:45:38 +03:00
Azat Khuzhin
5b3ab48861 More forward declaration for generic headers
The following headers are pretty generic, so use forward declaration as
much as possible:
- Context.h
- Settings.h
- ConnectionTimeouts.h
(This also shows that some files were missing some includes -- this has been fixed)

And split ConnectionTimeouts.h into ConnectionTimeoutsContext.h (since
module part cannot be added for it, due to recursive build dependencies
that will be introduced)

Also remove Settings from the RemoteBlockInputStream/RemoteQueryExecutor
and just pass the context, since settings were passed only in specific
places that can allow making a copy of Context (i.e. Copier).

Approx results (How many units will be recompiled after changing file X?):

- ConnectionTimeouts.h
  - mainline: 100

- Context.h:
  - mainline: ~800
  - patched:  415

- Settings.h:
  - mainline: 900-1K
  - patched:  440 (most of them because of the Context.h)
2020-12-12 17:43:10 +03:00
Nikolai Kochetov
e8667bad45 Fix build and tests. 2020-12-09 17:12:27 +03:00
Nikolai Kochetov
156f44808f Fixing crash. 2020-12-09 17:11:45 +03:00
Nikolai Kochetov
9ca837d535 Add async status to RemoteSource. 2020-12-04 13:52:57 +03:00
Nikolai Kochetov
082a496364 Add async read to RemoteQueryExecutor 2020-12-03 15:21:10 +03:00
Nikolai Kochetov
e3946bc2b5 Add async read to RemoteQueryExecutor. 2020-12-02 20:02:14 +03:00
Nikolai Kochetov
b419d73880 Fix build. 2020-06-04 16:16:58 +03:00
Nikolai Kochetov
83b6467308 Added RemoteSource. 2020-06-03 22:50:11 +03:00
Nikolai Kochetov
f1cccf31b2 Fix build. 2020-06-02 19:29:29 +03:00
Nikolai Kochetov
b1d1034111 Refactor RemoteBlockInputStream. 2020-06-02 19:27:05 +03:00
Nikolai Kochetov
13c5ec5b54 Refactor RemoteBlockInputStream. 2020-06-02 18:59:57 +03:00