Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-12-15 19:02:04 +00:00)
921518db0a
* Add query data deduplication that excludes duplicated parts in MergeTree family engines. Query deduplication is based on part UUIDs, which must first be enabled with the merge_tree setting assign_part_uuids=1. The allow_experimental_query_deduplication setting enables part deduplication; it defaults to false. A data part UUID is a mechanism for giving a data part a unique identifier. Having UUIDs and a deduplication mechanism makes it possible to move parts between shards while preserving data consistency on the read path: duplicated UUIDs cause the root executor to retry the query against one of the replicas, explicitly asking it to exclude the duplicated fingerprints encountered during distributed query execution. NOTE: this implementation doesn't provide any knobs to lock a part and hence its UUID. Any mutation or merge will update the part's UUID.
* Add the _part_uuid virtual column, allowing UUIDs to be used in predicates.

Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>

address comments
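The read-path retry described in this commit message can be modelled in a few lines. The sketch below is a single-process simplification under stated assumptions, not ClickHouse's distributed executor: `PartId` (a string standing in for `UUID`), `Shard`, `addIfNoDuplicates`, and `readAll` are hypothetical names, shards are queried one after another, and each shard is assumed to answer deterministically, so a retry that excludes the duplicated UUIDs cannot collide again.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for DB::UUID, so the sketch has no ClickHouse dependencies.
using PartId = std::string;

// Hypothetical shard replica: reads every local part whose UUID the initiator did not
// ask it to ignore, and reports the UUIDs of the parts it actually read.
struct Shard
{
    std::vector<PartId> parts;

    std::vector<PartId> read(const std::unordered_set<PartId> & ignored) const
    {
        std::vector<PartId> read_parts;
        for (const auto & part : parts)
            if (ignored.count(part) == 0)
                read_parts.push_back(part);
        return read_parts;
    }
};

// All-or-nothing insert: if any reported UUID was already seen, record nothing and
// return the duplicates; otherwise record all of them and return an empty vector.
std::vector<PartId> addIfNoDuplicates(std::unordered_set<PartId> & seen, const std::vector<PartId> & ids)
{
    std::vector<PartId> duplicates;
    for (const auto & id : ids)
        if (seen.count(id) != 0)
            duplicates.push_back(id);
    if (duplicates.empty())
        seen.insert(ids.begin(), ids.end());
    return duplicates;
}

// Root executor: query each shard; if its answer repeats already-seen part UUIDs,
// discard that answer and retry the shard once, excluding the duplicates.
std::vector<PartId> readAll(const std::vector<Shard> & shards)
{
    std::unordered_set<PartId> seen;
    std::vector<PartId> result;
    for (const auto & shard : shards)
    {
        std::unordered_set<PartId> ignored; // empty on the first attempt
        std::vector<PartId> read_parts = shard.read(ignored);
        std::vector<PartId> duplicates = addIfNoDuplicates(seen, read_parts);
        if (!duplicates.empty())
        {
            ignored.insert(duplicates.begin(), duplicates.end());
            read_parts = shard.read(ignored);
            // In this single-threaded model the retry cannot collide again: the rejected
            // answer recorded nothing, and the duplicates are now excluded.
            [[maybe_unused]] bool retried_cleanly = addIfNoDuplicates(seen, read_parts).empty();
            assert(retried_cleanly);
        }
        result.insert(result.end(), read_parts.begin(), read_parts.end());
    }
    return result;
}
```

For example, if a part `p2` is visible on two shards while it is being moved (`{p1, p2}` and `{p2, p3}`), the second shard's first answer repeats `p2`, so it is discarded and that shard is re-read with `p2` excluded; `readAll` returns `p1, p2, p3`. The all-or-nothing insert is what keeps `p3` from being counted twice when the retried answer reports it again.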
35 lines
890 B
C++
#pragma once

#include <memory>
#include <mutex>
#include <unordered_set>
#include <vector>
#include <Core/UUID.h>

namespace DB
{

/** PartUUIDs is a set of part UUIDs used to control query deduplication.
 * The object is used in the query context in both directions:
 *  Server->Client to send the UUIDs of all parts that have been read during the query;
 *  Client->Server to exclude the specified parts from processing.
 *
 *  The current implementation assumes that the user setting allow_experimental_query_deduplication=1 is set.
 */
struct PartUUIDs
{
public:
    /// Add the new UUIDs if no duplicates are found; otherwise return the duplicated UUIDs.
    std::vector<UUID> add(const std::vector<UUID> & uuids);

    /// Get the accumulated UUIDs.
    std::vector<UUID> get() const;

    /// Check whether the given UUID has already been accumulated.
    bool has(const UUID & uuid) const;

private:
    mutable std::mutex mutex;
    std::unordered_set<UUID> uuids;
};

using PartUUIDsPtr = std::shared_ptr<PartUUIDs>;

}
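The header above only declares the interface. The following is a hedged sketch of how `add`, `get`, and `has` could behave given their comments; it is not ClickHouse's implementation file. It substitutes a hypothetical `FakeUUID` alias (`std::string`) for `DB::UUID` so it compiles on its own, the class name `PartUUIDsSketch` is illustrative, and it reads the comment on `add` as all-or-nothing: if any incoming UUID is already present, nothing is inserted and the duplicates are returned.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for DB::UUID so the sketch compiles without ClickHouse headers.
using FakeUUID = std::string;

class PartUUIDsSketch
{
public:
    /// Insert all `new_uuids` only if none is already present; otherwise insert
    /// nothing and return the ones that were already present.
    std::vector<FakeUUID> add(const std::vector<FakeUUID> & new_uuids)
    {
        std::lock_guard<std::mutex> lock(mutex);
        std::vector<FakeUUID> duplicates;
        for (const auto & uuid : new_uuids)
            if (uuids.count(uuid) != 0)
                duplicates.push_back(uuid);
        if (duplicates.empty())
            uuids.insert(new_uuids.begin(), new_uuids.end());
        return duplicates;
    }

    /// Snapshot of the accumulated UUIDs, in unspecified order.
    std::vector<FakeUUID> get() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return std::vector<FakeUUID>(uuids.begin(), uuids.end());
    }

    /// True if `uuid` has already been accumulated.
    bool has(const FakeUUID & uuid) const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return uuids.count(uuid) != 0;
    }

private:
    /// `mutable` so the const readers can still lock it, as in the header.
    mutable std::mutex mutex;
    std::unordered_set<FakeUUID> uuids;
};

/// Shared ownership, mirroring PartUUIDsPtr.
using PartUUIDsSketchPtr = std::shared_ptr<PartUUIDsSketch>;
```

Because the duplicate check and the insert happen under one lock acquisition, two replicas reporting the same part concurrently cannot both be accepted: whichever `add` runs second sees the first one's UUIDs and gets them back as duplicates.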