ClickHouse/src/Storages/MergeTree/MergeTreeData.h

#pragma once
#include <base/defines.h>
#include <Common/SimpleIncrement.h>
#include <Common/MultiVersion.h>
#include <Storages/IStorage.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromFile.h>
#include <IO/ReadBufferFromFile.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypesNumber.h>
#include <Disks/StoragePolicy.h>
#include <Processors/Merges/Algorithms/Graphite.h>
#include <Storages/MergeTree/BackgroundJobsAssignee.h>
#include <Storages/MergeTree/MergeTreeIndices.h>
#include <Storages/MergeTree/MergeTreePartInfo.h>
#include <Storages/MergeTree/MergeTreeSettings.h>
#include <Storages/MergeTree/MergeTreeMutationStatus.h>
#include <Storages/MergeTree/MergeList.h>
#include <Storages/MergeTree/IMergeTreeDataPart.h>
#include <Storages/MergeTree/MergeTreeDataPartInMemory.h>
#include <Storages/MergeTree/MergeTreePartsMover.h>
#include <Storages/MergeTree/MergeTreeWriteAheadLog.h>
#include <Storages/MergeTree/PinnedPartUUIDs.h>
#include <Storages/MergeTree/ZeroCopyLock.h>
#include <Storages/MergeTree/TemporaryParts.h>
#include <Storages/IndicesDescription.h>
#include <Storages/MergeTree/AlterConversions.h>
#include <Storages/DataDestinationType.h>
#include <Storages/extractKeyExpressionList.h>
#include <Storages/PartitionCommands.h>
#include <Interpreters/PartLog.h>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/global_fun.hpp>
#include <boost/range/iterator_range_core.hpp>
namespace DB
{
/// Number of streams is not the number of parts, but the number of parts * files, hence 1000.
const size_t DEFAULT_DELAYED_STREAMS_FOR_PARALLEL_WRITE = 1000;
class AlterCommands;
class MergeTreePartsMover;
class MergeTreeDataMergerMutator;
class MutationCommands;
class Context;
using PartitionIdToMaxBlock = std::unordered_map<String, Int64>;
struct JobAndPool;
class MergeTreeTransaction;
struct ZeroCopyLock;
class IBackupEntry;
using BackupEntries = std::vector<std::pair<String, std::shared_ptr<const IBackupEntry>>>;
class MergeTreeTransaction;
using MergeTreeTransactionPtr = std::shared_ptr<MergeTreeTransaction>;
/// Auxiliary struct holding information about the future merged or mutated part.
struct EmergingPartInfo
{
String disk_name;
String partition_id;
size_t estimate_bytes;
};
struct CurrentlySubmergingEmergingTagger;
struct SelectQueryOptions;
class ExpressionActions;
using ExpressionActionsPtr = std::shared_ptr<ExpressionActions>;
using ManyExpressionActions = std::vector<ExpressionActionsPtr>;
class MergeTreeDeduplicationLog;
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
/// Data structure for *MergeTree engines.
/// Merge tree is used for incremental sorting of data.
/// The table consists of several sorted parts.
/// During insertion new data is sorted according to the primary key and is written to the new part.
/// Parts are merged in the background according to a heuristic algorithm.
/// For each part the index file is created containing primary key values for every n-th row.
/// This allows efficient selection by primary key range predicate.
///
/// Additionally:
///
/// The date column is specified. For each part min and max dates are remembered.
/// Essentially it is an index too.
///
/// Data is partitioned by the value of the partitioning expression.
/// Parts belonging to different partitions are not merged - for the ease of administration (data sync and backup).
///
/// File structure of old-style month-partitioned tables (format_version = 0):
/// Part directory - / min-date _ max-date _ min-id _ max-id _ level /
/// Inside the part directory:
/// checksums.txt - contains the list of all files along with their sizes and checksums.
/// columns.txt - contains the list of all columns and their types.
/// primary.idx - contains the primary index.
/// [Column].bin - contains compressed column data.
/// [Column].mrk - marks, pointing to seek positions allowing to skip n * k rows.
///
/// File structure of tables with custom partitioning (format_version >= 1):
/// Part directory - / partition-id _ min-id _ max-id _ level /
/// Inside the part directory:
/// The same files as for month-partitioned tables, plus
/// count.txt - contains total number of rows in this part.
/// partition.dat - contains the value of the partitioning expression.
/// minmax_[Column].idx - MinMax indexes (see IMergeTreeDataPart::MinMaxIndex class) for the columns required by the partitioning expression.
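/// Illustrative example (not taken from a real table): following the naming scheme above, a part
/// directory named "202201_1_5_2" belongs to partition "202201", covers block numbers 1 to 5
/// and has merge level 2.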
///
/// Several modes are implemented. Modes determine additional actions during merge:
/// - Ordinary - don't do anything special
/// - Collapsing - collapse pairs of rows with the opposite values of sign_columns for the same values
/// of primary key (cf. CollapsingSortedTransform.h)
/// - Replacing - for all rows with the same primary key keep only the latest one. Or, if the version
/// column is set, keep the latest row with the maximal version.
/// - Summing - sum all numeric columns not contained in the primary key for all rows with the same primary key.
/// - Aggregating - merge columns containing aggregate function states for all rows with the same primary key.
/// - Graphite - performs coarsening of historical data for Graphite (a system for quantitative monitoring).
/// The MergeTreeData class contains a list of parts and the data structure parameters.
/// To read and modify the data use other classes:
/// - MergeTreeDataSelectExecutor
/// - MergeTreeDataWriter
/// - MergeTreeDataMergerMutator
class MergeTreeData : public IStorage, public WithMutableContext
{
public:
/// Function to call if the part is suspected to contain corrupt data.
using BrokenPartCallback = std::function<void (const String &)>;
using DataPart = IMergeTreeDataPart;
using MutableDataPartPtr = std::shared_ptr<DataPart>;
using MutableDataPartsVector = std::vector<MutableDataPartPtr>;
/// After the DataPart is added to the working set, it cannot be changed.
using DataPartPtr = std::shared_ptr<const DataPart>;
using DataPartState = MergeTreeDataPartState;
using DataPartStates = std::initializer_list<DataPartState>;
using DataPartStateVector = std::vector<DataPartState>;
using PinnedPartUUIDsPtr = std::shared_ptr<const PinnedPartUUIDs>;
constexpr static auto FORMAT_VERSION_FILE_NAME = "format_version.txt";
constexpr static auto DETACHED_DIR_NAME = "detached";
constexpr static auto MOVING_DIR_NAME = "moving";
/// Auxiliary structure for index comparison. Keep in mind the lifetime of the referenced MergeTreePartInfo.
struct DataPartStateAndInfo
{
DataPartState state;
const MergeTreePartInfo & info;
};
/// Auxiliary structure for index comparison
struct DataPartStateAndPartitionID
{
DataPartState state;
String partition_id;
};
STRONG_TYPEDEF(String, PartitionID)
struct LessDataPart
{
using is_transparent = void;
bool operator()(const DataPartPtr & lhs, const MergeTreePartInfo & rhs) const { return lhs->info < rhs; }
bool operator()(const MergeTreePartInfo & lhs, const DataPartPtr & rhs) const { return lhs < rhs->info; }
bool operator()(const DataPartPtr & lhs, const DataPartPtr & rhs) const { return lhs->info < rhs->info; }
bool operator()(const MergeTreePartInfo & lhs, const PartitionID & rhs) const { return lhs.partition_id < rhs.toUnderType(); }
bool operator()(const PartitionID & lhs, const MergeTreePartInfo & rhs) const { return lhs.toUnderType() < rhs.partition_id; }
};
struct LessStateDataPart
{
using is_transparent = void;
bool operator() (const DataPartStateAndInfo & lhs, const DataPartStateAndInfo & rhs) const
{
return std::forward_as_tuple(static_cast<UInt8>(lhs.state), lhs.info)
< std::forward_as_tuple(static_cast<UInt8>(rhs.state), rhs.info);
}
bool operator() (DataPartStateAndInfo info, const DataPartState & state) const
{
return static_cast<size_t>(info.state) < static_cast<size_t>(state);
}
bool operator() (const DataPartState & state, DataPartStateAndInfo info) const
{
return static_cast<size_t>(state) < static_cast<size_t>(info.state);
}
bool operator() (const DataPartStateAndInfo & lhs, const DataPartStateAndPartitionID & rhs) const
{
return std::forward_as_tuple(static_cast<UInt8>(lhs.state), lhs.info.partition_id)
< std::forward_as_tuple(static_cast<UInt8>(rhs.state), rhs.partition_id);
}
bool operator() (const DataPartStateAndPartitionID & lhs, const DataPartStateAndInfo & rhs) const
{
return std::forward_as_tuple(static_cast<UInt8>(lhs.state), lhs.partition_id)
< std::forward_as_tuple(static_cast<UInt8>(rhs.state), rhs.info.partition_id);
}
};
using DataParts = std::set<DataPartPtr, LessDataPart>;
using MutableDataParts = std::set<MutableDataPartPtr, LessDataPart>;
using DataPartsVector = std::vector<DataPartPtr>;
using DataPartsLock = std::unique_lock<std::mutex>;
DataPartsLock lockParts() const { return DataPartsLock(data_parts_mutex); }
using OperationDataPartsLock = std::unique_lock<std::mutex>;
OperationDataPartsLock lockOperationsWithParts() const { return OperationDataPartsLock(operation_with_data_parts_mutex); }
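/// A minimal usage sketch (hypothetical caller code, `part_info` is assumed to exist): take the
/// parts lock once and pass it to the overloads that expect an already-acquired DataPartsLock,
/// so the mutex is not re-taken:
///
///     auto parts_lock = lockParts();
///     auto part = getActiveContainingPart(part_info, DataPartState::Active, parts_lock);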
MergeTreeDataPartType choosePartType(size_t bytes_uncompressed, size_t rows_count) const;
MergeTreeDataPartType choosePartTypeOnDisk(size_t bytes_uncompressed, size_t rows_count) const;
/// After this method setColumns must be called
MutableDataPartPtr createPart(const String & name,
MergeTreeDataPartType type, const MergeTreePartInfo & part_info,
const MutableDataPartStoragePtr & data_part_storage, const IMergeTreeDataPart * parent_part = nullptr) const;
/// Create a part that already exists on the filesystem.
/// After these methods, 'loadColumnsChecksumsIndexes' must be called.
MutableDataPartPtr createPart(const String & name,
const MutableDataPartStoragePtr & data_part_storage, const IMergeTreeDataPart * parent_part = nullptr) const;
MutableDataPartPtr createPart(const String & name, const MergeTreePartInfo & part_info,
const MutableDataPartStoragePtr & data_part_storage, const IMergeTreeDataPart * parent_part = nullptr) const;
/// Auxiliary object to add a set of parts into the working set in two steps:
/// * First, as PreActive parts (the parts are ready, but not yet in the active set).
/// * Next, if commit() is called, the parts are added to the active set and the parts that are
/// covered by them are marked Outdated.
/// If neither commit() nor rollback() was called, the destructor rolls back the operation.
class Transaction : private boost::noncopyable
{
public:
Transaction(MergeTreeData & data_, MergeTreeTransaction * txn_);
DataPartsVector commit(MergeTreeData::DataPartsLock * acquired_parts_lock = nullptr);
void addPart(MutableDataPartPtr & part);
void rollback();
/// Immediately remove parts from the table's data_parts set and change part
/// state to temporary. Useful for new parts which are not present in the table.
void rollbackPartsToTemporaryState();
size_t size() const { return precommitted_parts.size(); }
bool isEmpty() const { return precommitted_parts.empty(); }
~Transaction()
{
try
{
rollback();
}
catch (...)
{
tryLogCurrentException("~MergeTreeData::Transaction");
}
}
TransactionID getTID() const;
private:
friend class MergeTreeData;
MergeTreeData & data;
MergeTreeTransaction * txn;
MutableDataParts precommitted_parts;
MutableDataParts locked_parts;
void clear();
};
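/// A minimal usage sketch of Transaction (hypothetical caller; `part` is a prepared temporary part,
/// `data` is a MergeTreeData instance, and no MergeTreeTransaction is involved):
///
///     MergeTreeData::Transaction out_transaction(data, /* txn = */ nullptr);
///     {
///         auto parts_lock = data.lockParts();
///         data.renameTempPartAndAdd(part, out_transaction, parts_lock);
///     }
///     out_transaction.commit();   /// otherwise the destructor rolls the operation back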
using TransactionUniquePtr = std::unique_ptr<Transaction>;
using PathWithDisk = std::pair<String, DiskPtr>;
struct PartsTemporaryRename : private boost::noncopyable
{
PartsTemporaryRename(
const MergeTreeData & storage_,
const String & source_dir_)
: storage(storage_)
, source_dir(source_dir_)
{
}
/// Adds part to rename. Both names are relative to relative_data_path.
void addPart(const String & old_name, const String & new_name, const DiskPtr & disk);
/// Renames all added parts from old_name to new_name
void tryRenameAll();
/// Renames all added parts from new_name to old_name if old name is not empty
~PartsTemporaryRename();
struct RenameInfo
{
String old_name;
String new_name;
/// Disk cannot be changed
DiskPtr disk;
};
const MergeTreeData & storage;
const String source_dir;
std::vector<RenameInfo> old_and_new_names;
bool renamed = false;
};
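/// A minimal usage sketch (old_name/new_name/disk are hypothetical placeholders): register the
/// renames first, then perform them; if an exception escapes afterwards, the destructor renames
/// everything back:
///
///     PartsTemporaryRename renamed_parts(*this, "detached/");
///     renamed_parts.addPart(old_name, new_name, disk);
///     renamed_parts.tryRenameAll();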
/// Parameters for various modes.
struct MergingParams
{
/// Merging mode. See above.
enum Mode
{
Ordinary = 0, /// Enum values are saved. Do not change them.
Collapsing = 1,
Summing = 2,
Aggregating = 3,
Replacing = 5,
Graphite = 6,
VersionedCollapsing = 7,
};
Mode mode;
/// For Collapsing and VersionedCollapsing mode.
2016-04-15 17:13:51 +00:00
String sign_column;
/// For Summing mode. If empty - columns_to_sum is determined automatically.
Names columns_to_sum;
/// For Replacing and VersionedCollapsing mode. Can be empty for Replacing.
String version_column;
/// For Graphite mode.
Graphite::Params graphite_params;
/// Check that needed columns are present and have correct types.
void check(const StorageInMemoryMetadata & metadata) const;
String getModeName() const;
};
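/// Illustrative example (a sketch, not a complete engine definition): the merging parameters that
/// would correspond to a ReplacingMergeTree table with a `version` column:
///
///     MergingParams params;
///     params.mode = MergingParams::Replacing;
///     params.version_column = "version";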
/// Attach the table corresponding to the directory in full_path inside policy (must end with /), with the given columns.
/// Correctness of names and paths is not checked.
///
/// date_column_name - if not empty, the name of the Date column used for partitioning by month.
/// Otherwise, partition_by_ast is used for partitioning.
///
/// order_by_ast - a single expression or a tuple. It is used as a sorting key
/// (an ASTExpressionList used for sorting data in parts);
/// primary_key_ast - can be nullptr, an expression, or a tuple.
/// Used to determine an ASTExpressionList values of which are written in the primary.idx file
/// for one row in every `index_granularity` rows to speed up range queries.
/// Primary key must be a prefix of the sorting key;
/// If it is nullptr, then it will be determined from order_by_ast.
///
/// require_part_metadata - should checksums.txt and columns.txt exist in the part directory.
/// attach - whether the existing table is attached or the new table is created.
MergeTreeData(const StorageID & table_id_,
const String & relative_data_path_,
const StorageInMemoryMetadata & metadata_,
ContextMutablePtr context_,
const String & date_column_name,
const MergingParams & merging_params_,
std::unique_ptr<MergeTreeSettings> settings_,
bool require_part_metadata_,
bool attach,
BrokenPartCallback broken_part_callback_ = [](const String &){});
/// Build a block of minmax and count values of a MergeTree table. These values are extracted
/// from minmax_indices, the first expression of primary key, and part rows.
///
/// has_filter - if query has no filter, bypass partition pruning completely
///
/// query_info - used to filter unneeded parts
///
/// parts - part set to filter
///
/// normal_parts - collects parts that don't have all the needed values to form the block.
/// Specifically, this is when a part doesn't contain a final mark and the related max value is
/// required.
Block getMinMaxCountProjectionBlock(
const StorageMetadataPtr & metadata_snapshot,
const Names & required_columns,
bool has_filter,
const SelectQueryInfo & query_info,
const DataPartsVector & parts,
DataPartsVector & normal_parts,
const PartitionIdToMaxBlock * max_block_numbers_to_read,
ContextPtr query_context) const;
std::optional<ProjectionCandidate> getQueryProcessingStageWithAggregateProjection(
ContextPtr query_context, const StorageSnapshotPtr & storage_snapshot, SelectQueryInfo & query_info) const;
QueryProcessingStage::Enum getQueryProcessingStage(
ContextPtr query_context,
QueryProcessingStage::Enum to_stage,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo & info) const override;
ReservationPtr reserveSpace(UInt64 expected_size, VolumePtr & volume) const;
static ReservationPtr tryReserveSpace(UInt64 expected_size, const IDataPartStorage & data_part_storage);
static ReservationPtr reserveSpace(UInt64 expected_size, const IDataPartStorage & data_part_storage);
static bool partsContainSameProjections(const DataPartPtr & left, const DataPartPtr & right);
StoragePolicyPtr getStoragePolicy() const override;
bool supportsPrewhere() const override { return true; }
bool supportsFinal() const override;
bool supportsSubcolumns() const override { return true; }
bool supportsTTL() const override { return true; }
bool supportsDynamicSubcolumns() const override { return true; }
bool supportsLightweightDelete() const override;
NamesAndTypesList getVirtuals() const override;
bool mayBenefitFromIndexForIn(const ASTPtr & left_in_operand, ContextPtr, const StorageMetadataPtr & metadata_snapshot) const override;
/// Snapshot for MergeTree contains the current set of data parts
/// at the moment of the start of query.
struct SnapshotData : public StorageSnapshot::Data
{
DataPartsVector parts;
};
StorageSnapshotPtr getStorageSnapshot(const StorageMetadataPtr & metadata_snapshot, ContextPtr query_context) const override;
/// Load the set of data parts from disk. Call once - immediately after the object is created.
void loadDataParts(bool skip_sanity_checks);
String getLogName() const { return *std::atomic_load(&log_name); }
Int64 getMaxBlockNumber() const;
struct ProjectionPartsVector
{
DataPartsVector projection_parts;
DataPartsVector data_parts;
};
/// Returns a copy of the list so that the caller shouldn't worry about locks.
DataParts getDataParts(const DataPartStates & affordable_states) const;
DataPartsVector getDataPartsVectorForInternalUsage(
const DataPartStates & affordable_states, const DataPartsLock & lock, DataPartStateVector * out_states = nullptr) const;
/// Returns sorted list of the parts with specified states
/// out_states will contain snapshot of each part state
DataPartsVector getDataPartsVectorForInternalUsage(
const DataPartStates & affordable_states, DataPartStateVector * out_states = nullptr) const;
/// Same as above but only returns projection parts
ProjectionPartsVector getProjectionPartsVectorForInternalUsage(
const DataPartStates & affordable_states, DataPartStateVector * out_states = nullptr) const;
/// Returns absolutely all parts (and snapshot of their states)
DataPartsVector getAllDataPartsVector(DataPartStateVector * out_states = nullptr) const;
/// Same as above but only returns projection parts
ProjectionPartsVector getAllProjectionPartsVector(MergeTreeData::DataPartStateVector * out_states = nullptr) const;
/// Returns parts in Active state
DataParts getDataPartsForInternalUsage() const;
DataPartsVector getDataPartsVectorForInternalUsage() const;
void filterVisibleDataParts(DataPartsVector & maybe_visible_parts, CSN snapshot_version, TransactionID current_tid) const;
/// Returns parts that are visible with the current snapshot
DataPartsVector getVisibleDataPartsVector(ContextPtr local_context) const;
DataPartsVector getVisibleDataPartsVectorUnlocked(ContextPtr local_context, const DataPartsLock & lock) const;
DataPartsVector getVisibleDataPartsVector(const MergeTreeTransactionPtr & txn) const;
DataPartsVector getVisibleDataPartsVector(CSN snapshot_version, TransactionID current_tid) const;
/// Returns a part in Active state with the given name or a part containing it. If there is no such part, returns nullptr.
DataPartPtr getActiveContainingPart(const String & part_name) const;
DataPartPtr getActiveContainingPart(const MergeTreePartInfo & part_info) const;
DataPartPtr getActiveContainingPart(const MergeTreePartInfo & part_info, DataPartState state, DataPartsLock & lock) const;
/// Swap part with its identical copy (possibly with another path on another disk).
/// If the original part is not active or doesn't exist, an exception will be thrown.
void swapActivePart(MergeTreeData::DataPartPtr part_copy);
/// Returns all parts in specified partition
DataPartsVector getVisibleDataPartsVectorInPartition(MergeTreeTransaction * txn, const String & partition_id, DataPartsLock * acquired_lock = nullptr) const;
DataPartsVector getVisibleDataPartsVectorInPartition(ContextPtr local_context, const String & partition_id, DataPartsLock & lock) const;
DataPartsVector getVisibleDataPartsVectorInPartition(ContextPtr local_context, const String & partition_id) const;
DataPartsVector getVisibleDataPartsVectorInPartitions(ContextPtr local_context, const std::unordered_set<String> & partition_ids) const;
DataPartsVector getDataPartsVectorInPartitionForInternalUsage(const DataPartState & state, const String & partition_id, DataPartsLock * acquired_lock = nullptr) const;
DataPartsVector getDataPartsVectorInPartitionForInternalUsage(const DataPartStates & affordable_states, const String & partition_id, DataPartsLock * acquired_lock = nullptr) const;
/// Returns the part with the given name and state or nullptr if no such part.
DataPartPtr getPartIfExists(const String & part_name, const DataPartStates & valid_states);
DataPartPtr getPartIfExists(const MergeTreePartInfo & part_info, const DataPartStates & valid_states);
/// Total size of active parts in bytes.
size_t getTotalActiveSizeInBytes() const;
size_t getTotalActiveSizeInRows() const;
size_t getPartsCount() const;
/// Returns a pair with: max number of parts in partition across partitions; sum size of parts inside that partition.
/// (if there are multiple partitions with the max number of parts, the sum size of parts is returned for an arbitrary one of them)
std::pair<size_t, size_t> getMaxPartsCountAndSizeForPartitionWithState(DataPartState state) const;
std::pair<size_t, size_t> getMaxPartsCountAndSizeForPartition() const;
size_t getMaxInactivePartsCountForPartition() const;
/// Get min value of part->info.getDataVersion() for all active parts.
/// Makes sense only for ordinary MergeTree engines because for them block numbering doesn't depend on partition.
std::optional<Int64> getMinPartDataVersion() const;
/// Returns all detached parts
DetachedPartsInfo getDetachedParts() const;
static void validateDetachedPartName(const String & name);
void dropDetached(const ASTPtr & partition, bool part, ContextPtr context);
MutableDataPartsVector tryLoadPartsToAttach(const ASTPtr & partition, bool attach_part,
ContextPtr context, PartsTemporaryRename & renamed_parts);
/// If the table contains too many active parts, sleep for a while to give them time to merge.
/// If until is non-null, wake up from the sleep earlier if the event happened.
void delayInsertOrThrowIfNeeded(Poco::Event * until, ContextPtr query_context) const;
/// Renames temporary part to a permanent part and adds it to the parts set.
/// It is assumed that the part does not intersect with existing parts.
/// Adds the part in the PreActive state (the part will be added to the active set later with out_transaction->commit()).
/// Returns true if part was added. Returns false if part is covered by bigger part.
bool renameTempPartAndAdd(
MutableDataPartPtr & part,
Transaction & transaction,
DataPartsLock & lock);
/// The same as renameTempPartAndAdd but the block range of the part can contain existing parts.
/// Returns all parts covered by the added part (in ascending order).
DataPartsVector renameTempPartAndReplace(
MutableDataPartPtr & part,
Transaction & out_transaction);
/// Unlocked version of the previous one. Useful when adding multiple parts with a single lock.
bool renameTempPartAndReplaceUnlocked(
MutableDataPartPtr & part,
Transaction & out_transaction,
DataPartsLock & lock,
DataPartsVector * out_covered_parts = nullptr);
/// Remove parts from the working set immediately (without waiting for the background
/// process). Transfers part state to temporary. Has very limited usage, only
/// for new parts which aren't already present in the table.
void removePartsFromWorkingSetImmediatelyAndSetTemporaryState(const DataPartsVector & remove);
/// Removes parts from the working set.
/// Parts in add must already be in data_parts with PreActive, Active, or Outdated states.
/// If clear_without_timeout is true, the parts will be deleted at once, or during the next call to
/// clearOldParts (ignoring old_parts_lifetime).
void removePartsFromWorkingSet(MergeTreeTransaction * txn, const DataPartsVector & remove, bool clear_without_timeout, DataPartsLock * acquired_lock = nullptr);
void removePartsFromWorkingSet(MergeTreeTransaction * txn, const DataPartsVector & remove, bool clear_without_timeout, DataPartsLock & acquired_lock);
/// Removes all parts covered by drop_range from the working set.
/// Used in REPLACE PARTITION command.
void removePartsInRangeFromWorkingSet(MergeTreeTransaction * txn, const MergeTreePartInfo & drop_range, DataPartsLock & lock);
/// This wrapper is required to restrict access to parts in Deleting state
class PartToRemoveFromZooKeeper
{
DataPartPtr part;
bool was_active;
public:
explicit PartToRemoveFromZooKeeper(DataPartPtr && part_, bool was_active_ = true)
: part(std::move(part_)), was_active(was_active_)
{
}
/// It's safe to get name of any part
const String & getPartName() const { return part->name; }
DataPartPtr getPartIfItWasActive() const
{
return was_active ? part : nullptr;
}
};
using PartsToRemoveFromZooKeeper = std::vector<PartToRemoveFromZooKeeper>;
/// Same as above, but also returns list of parts to remove from ZooKeeper.
/// It includes parts that have just been removed by this method
/// and Outdated parts covered by drop_range that were removed earlier for any reason.
PartsToRemoveFromZooKeeper removePartsInRangeFromWorkingSetAndGetPartsToRemoveFromZooKeeper(
MergeTreeTransaction * txn, const MergeTreePartInfo & drop_range, DataPartsLock & lock);
/// Restores Outdated part and adds it to working set
void restoreAndActivatePart(const DataPartPtr & part, DataPartsLock * acquired_lock = nullptr);
/// Renames the part to detached/<prefix>_<part> and removes it from data_parts,
/// so it will not be deleted in clearOldParts.
/// If restore_covered is true, adds to the working set inactive parts, which were merged into the deleted part.
/// NOTE: This method is safe to use only for parts which nobody else holds (like on server start or for parts which were not committed).
/// For active parts it's unsafe because this method modifies fields of part (rename) while some other thread can try to read it.
void forcefullyMovePartToDetachedAndRemoveFromMemory(const DataPartPtr & part, const String & prefix = "", bool restore_covered = false);
/// Outdate broken part, set remove time to zero (remove as fast as possible) and make clone in detached directory.
void outdateBrokenPartAndCloneToDetached(const DataPartPtr & part, const String & prefix);
/// If the part is Obsolete and not used by anybody else, immediately delete it from filesystem and remove from memory.
void tryRemovePartImmediately(DataPartPtr && part);
/// Returns old inactive parts that can be deleted. At the same time removes them from the list of parts but not from the disk.
/// If 'force' - don't wait for old_parts_lifetime.
DataPartsVector grabOldParts(bool force = false);
/// Reverts the changes made by grabOldParts(), parts should be in Deleting state.
void rollbackDeletingParts(const DataPartsVector & parts);
/// Removes parts from data_parts, they should be in Deleting state
void removePartsFinally(const DataPartsVector & parts);
/// When WAL is not enabled, the InMemoryParts need to be persistent.
void flushAllInMemoryPartsIfNeeded();
/// Delete irrelevant parts from memory and disk.
/// If 'force' - don't wait for old_parts_lifetime.
size_t clearOldPartsFromFilesystem(bool force = false);
/// Try to clear parts from filesystem. Throw exception in case of errors.
void clearPartsFromFilesystem(const DataPartsVector & parts, bool throw_on_error = true, NameSet * parts_failed_to_delete = nullptr);
/// Delete WAL files containing parts that are already stored on disk.
size_t clearOldWriteAheadLogs();
size_t clearOldBrokenPartsFromDetachedDirectory();
/// Delete all directories whose names begin with "tmp"
/// Must be called with locked lockForShare() because it's using relative_data_path.
size_t clearOldTemporaryDirectories(size_t custom_directories_lifetime_seconds, const NameSet & valid_prefixes = {"tmp_", "tmp-fetch_"});
size_t clearEmptyParts();
/// After the call to dropAllData() no method can be called.
/// Deletes the data directory and flushes the uncompressed blocks cache and the marks cache.
void dropAllData();
/// This flag is for hardening and assertions.
bool all_data_dropped = false;
/// Drop data directories if they are empty. It is safe to call this method if table creation was unsuccessful.
void dropIfEmpty();
/// Moves the entire data directory. Flushes the uncompressed blocks cache
/// and the marks cache. Must be called with locked lockExclusively()
/// because changes relative_data_path.
void rename(const String & new_table_path, const StorageID & new_table_id) override;
/// Also rename log names.
void renameInMemory(const StorageID & new_table_id) override;
/// Check if the ALTER can be performed:
/// - all needed columns are present.
/// - all type conversions can be done.
/// - columns corresponding to primary key, indices, sign, sampling expression and date are not affected.
/// If something is wrong, throws an exception.
void checkAlterIsPossible(const AlterCommands & commands, ContextPtr context) const override;
/// Checks if the Mutation can be performed.
/// (currently no additional checks: always ok)
void checkMutationIsPossible(const MutationCommands & commands, const Settings & settings) const override;
/// Checks that partition name in all commands is valid
void checkAlterPartitionIsPossible(const PartitionCommands & commands, const StorageMetadataPtr & metadata_snapshot, const Settings & settings) const override;
/// Change MergeTreeSettings
void changeSettings(
const ASTPtr & new_settings,
AlterLockHolder & table_lock_holder);
/// Should be called if part data is suspected to be corrupted.
/// Has the ability to check all other parts
/// which reside on the same disk as the suspicious part.
void reportBrokenPart(MergeTreeData::DataPartPtr & data_part) const;
/// TODO (alesap) Duplicate method required for compatibility.
/// Must be removed.
static ASTPtr extractKeyExpressionList(const ASTPtr & node)
{
return DB::extractKeyExpressionList(node);
}
/** Create local backup (snapshot) for parts with specified prefix.
* Backup is created in directory clickhouse_dir/shadow/i/, where i - incremental number,
* or if 'with_name' is specified - backup is created in directory with specified name.
*/
PartitionCommandsResultInfo freezePartition(
const ASTPtr & partition,
const StorageMetadataPtr & metadata_snapshot,
const String & with_name,
ContextPtr context,
TableLockHolder & table_lock_holder);
/// Freezes all parts.
PartitionCommandsResultInfo freezeAll(
const String & with_name,
const StorageMetadataPtr & metadata_snapshot,
ContextPtr context,
TableLockHolder & table_lock_holder);
/// Unfreezes particular partition.
PartitionCommandsResultInfo unfreezePartition(
const ASTPtr & partition,
const String & backup_name,
ContextPtr context,
TableLockHolder & table_lock_holder);
/// Unfreezes all parts.
PartitionCommandsResultInfo unfreezeAll(
const String & backup_name,
ContextPtr context,
TableLockHolder & table_lock_holder);
/// Makes backup entries to backup the data of the storage.
void backupData(BackupEntriesCollector & backup_entries_collector, const String & data_path_in_backup, const std::optional<ASTs> & partitions) override;
/// Extract data from the backup and put it to the storage.
void restoreDataFromBackup(RestorerFromBackup & restorer, const String & data_path_in_backup, const std::optional<ASTs> & partitions) override;
/// Returns true if the storage supports backup/restore for specific partitions.
bool supportsBackupPartition() const override { return true; }
/// Moves partition to specified Disk
void movePartitionToDisk(const ASTPtr & partition, const String & name, bool moving_part, ContextPtr context);
/// Moves partition to specified Volume
void movePartitionToVolume(const ASTPtr & partition, const String & name, bool moving_part, ContextPtr context);
/// Checks that Partition could be dropped right now
/// Otherwise - throws an exception with detailed information.
/// We do not use a mutex because it is not very important if the size changes during the operation.
void checkPartitionCanBeDropped(const ASTPtr & partition, ContextPtr local_context);
void checkPartCanBeDropped(const String & part_name);
Pipe alterPartition(
const StorageMetadataPtr & metadata_snapshot,
const PartitionCommands & commands,
ContextPtr query_context) override;
size_t getColumnCompressedSize(const std::string & name) const
{
auto lock = lockParts();
const auto it = column_sizes.find(name);
return it == std::end(column_sizes) ? 0 : it->second.data_compressed;
}
ColumnSizeByName getColumnSizes() const override
{
auto lock = lockParts();
return column_sizes;
}
const ColumnsDescription & getConcreteObjectColumns() const { return object_columns; }
/// Creates description of columns of data type Object from the range of data parts.
static ColumnsDescription getConcreteObjectColumns(
const DataPartsVector & parts, const ColumnsDescription & storage_columns);
IndexSizeByName getSecondaryIndexSizes() const override
{
auto lock = lockParts();
return secondary_index_sizes;
}
/// For ATTACH/DETACH/DROP PARTITION.
String getPartitionIDFromQuery(const ASTPtr & ast, ContextPtr context, DataPartsLock * acquired_lock = nullptr) const;
std::unordered_set<String> getPartitionIDsFromQuery(const ASTs & asts, ContextPtr context) const;
std::set<String> getPartitionIdsAffectedByCommands(const MutationCommands & commands, ContextPtr query_context) const;
/// Returns set of partition_ids of all Active parts
std::unordered_set<String> getAllPartitionIds() const;
/// Extracts MergeTreeData of another *MergeTree* storage
/// and checks that their structures are suitable for ALTER TABLE ATTACH PARTITION FROM.
/// The tables' structures should be locked.
MergeTreeData & checkStructureAndGetMergeTreeData(const StoragePtr & source_table, const StorageMetadataPtr & src_snapshot, const StorageMetadataPtr & my_snapshot) const;
MergeTreeData & checkStructureAndGetMergeTreeData(IStorage & source_table, const StorageMetadataPtr & src_snapshot, const StorageMetadataPtr & my_snapshot) const;
struct HardlinkedFiles
{
/// Shared table uuid where hardlinks live
std::string source_table_shared_id;
/// Hardlinked from part
std::string source_part_name;
/// Hardlinked files list
NameSet hardlinks_from_source_part;
};
std::pair<MergeTreeData::MutableDataPartPtr, scope_guard> cloneAndLoadDataPartOnSameDisk(
const MergeTreeData::DataPartPtr & src_part, const String & tmp_part_prefix,
const MergeTreePartInfo & dst_part_info, const StorageMetadataPtr & metadata_snapshot,
const MergeTreeTransactionPtr & txn, HardlinkedFiles * hardlinked_files,
bool copy_instead_of_hardlink, const NameSet & files_to_copy_instead_of_hardlinks);
virtual std::vector<MergeTreeMutationStatus> getMutationsStatus() const = 0;
/// Returns true if table can create new parts with adaptive granularity
/// Has additional constraint in replicated version
virtual bool canUseAdaptiveGranularity() const
{
const auto settings = getSettings();
return settings->index_granularity_bytes != 0 &&
(settings->enable_mixed_granularity_parts || !has_non_adaptive_index_granularity_parts);
}
/// Get constant pointer to storage settings.
/// Copy this pointer into your scope and you will
/// get consistent settings.
MergeTreeSettingsPtr getSettings() const
{
return storage_settings.get();
}
String getRelativeDataPath() const { return relative_data_path; }
/// Get table path on disk
String getFullPathOnDisk(const DiskPtr & disk) const;
/// Looks for detached part on all disks,
/// returns pointer to the disk where part is found or nullptr (the second function throws an exception)
DiskPtr tryGetDiskForDetachedPart(const String & part_name) const;
DiskPtr getDiskForDetachedPart(const String & part_name) const;
bool storesDataOnDisk() const override { return true; }
Strings getDataPaths() const override;
/// Reserves space at least 1MB.
ReservationPtr reserveSpace(UInt64 expected_size) const;
/// Reserves space at least 1MB on specific disk or volume.
static ReservationPtr reserveSpace(UInt64 expected_size, SpacePtr space);
static ReservationPtr tryReserveSpace(UInt64 expected_size, SpacePtr space);
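/// Usage sketch (estimated_part_bytes and space are hypothetical values): the non-"try" variants
/// are expected to throw if the space cannot be reserved, while the "try" variants return nullptr:
///
///     ReservationPtr reservation = reserveSpace(estimated_part_bytes);                 /// throws on failure
///     ReservationPtr maybe_reservation = tryReserveSpace(estimated_part_bytes, space); /// nullptr on failure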
/// Reserves space at least 1MB preferring best destination according to `ttl_infos`.
ReservationPtr reserveSpacePreferringTTLRules(
const StorageMetadataPtr & metadata_snapshot,
UInt64 expected_size,
const IMergeTreeDataPart::TTLInfos & ttl_infos,
time_t time_of_move,
size_t min_volume_index = 0,
bool is_insert = false,
DiskPtr selected_disk = nullptr) const;
ReservationPtr tryReserveSpacePreferringTTLRules(
const StorageMetadataPtr & metadata_snapshot,
UInt64 expected_size,
const IMergeTreeDataPart::TTLInfos & ttl_infos,
time_t time_of_move,
size_t min_volume_index = 0,
bool is_insert = false,
DiskPtr selected_disk = nullptr) const;
/// Reserves space for the part based on the distribution of "big parts" in the same partition.
/// Parts with estimated size larger than `min_bytes_to_rebalance_partition_over_jbod` are
/// considered as big. The priority is lower than TTL. If reservation fails, return nullptr.
ReservationPtr balancedReservation(
const StorageMetadataPtr & metadata_snapshot,
size_t part_size,
size_t max_volume_index,
const String & part_name,
const MergeTreePartInfo & part_info,
MergeTreeData::DataPartsVector covered_parts,
std::optional<CurrentlySubmergingEmergingTagger> * tagger_ptr,
const IMergeTreeDataPart::TTLInfos * ttl_infos,
bool is_insert = false);
/// Choose disk with max available free space
/// Reserves 0 bytes
ReservationPtr makeEmptyReservationOnLargestDisk() const { return getStoragePolicy()->makeEmptyReservationOnLargestDisk(); }
Disks getDisks() const { return getStoragePolicy()->getDisks(); }
/// Return alter conversions for a part, which must be applied on the fly.
AlterConversions getAlterConversionsForPart(MergeTreeDataPartPtr part) const;
/// Returns destination disk or volume for the TTL rule according to current storage policy.
SpacePtr getDestinationForMoveTTL(const TTLDescription & move_ttl) const;
/// Whether INSERT of a data part which is already expired should move it immediately to a volume/disk declared in move rule.
bool shouldPerformTTLMoveOnInsert(const SpacePtr & move_destination) const;
/// Checks if the given part already belongs to the destination disk or volume for the
/// TTL rule.
bool isPartInTTLDestination(const TTLDescription & ttl, const IMergeTreeDataPart & part) const;
/// Get count of total merges with TTL in MergeList (system.merges) for all
/// tables (not only current table).
/// Method is cheap and doesn't require any locks.
size_t getTotalMergesWithTTLInMergeList() const;
using WriteAheadLogPtr = std::shared_ptr<MergeTreeWriteAheadLog>;
WriteAheadLogPtr getWriteAheadLog();
constexpr static auto EMPTY_PART_TMP_PREFIX = "tmp_empty_";
MergeTreeData::MutableDataPartPtr createEmptyPart(MergeTreePartInfo & new_part_info, const MergeTreePartition & partition, const String & new_part_name, const MergeTreeTransactionPtr & txn);
MergeTreeDataFormatVersion format_version;
/// Merging params - what additional actions to perform during merge.
const MergingParams merging_params;
bool is_custom_partitioned = false;
/// Used only for old syntax tables. Never changes after init.
Int64 minmax_idx_date_column_pos = -1; /// In a common case minmax index includes a date column.
Int64 minmax_idx_time_column_pos = -1; /// In other cases, minmax index often includes a dateTime column.
/// Get partition key expression on required columns
static ExpressionActionsPtr getMinMaxExpr(const KeyDescription & partition_key, const ExpressionActionsSettings & settings);
/// Get column names required for partition key
static Names getMinMaxColumnsNames(const KeyDescription & partition_key);
/// Get column types required for partition key
static DataTypes getMinMaxColumnsTypes(const KeyDescription & partition_key);
ExpressionActionsPtr getPrimaryKeyAndSkipIndicesExpression(const StorageMetadataPtr & metadata_snapshot) const;
ExpressionActionsPtr getSortingKeyAndSkipIndicesExpression(const StorageMetadataPtr & metadata_snapshot) const;
/// Get compression codec for part according to TTL rules and <compression>
/// section from config.xml.
CompressionCodecPtr getCompressionCodecForPart(size_t part_size_compressed, const IMergeTreeDataPart::TTLInfos & ttl_infos, time_t current_time) const;
std::lock_guard<std::mutex> getQueryIdSetLock() const { return std::lock_guard<std::mutex>(query_id_set_mutex); }
/// Record the current query id when querying the table. Throw if there are already `max_queries` queries accessing the same table.
/// Returns false if the `query_id` already exists in the running set, otherwise returns true.
bool insertQueryIdOrThrow(const String & query_id, size_t max_queries) const;
bool insertQueryIdOrThrowNoLock(const String & query_id, size_t max_queries) const TSA_REQUIRES(query_id_set_mutex);
/// Remove the current query id after the query has finished.
void removeQueryId(const String & query_id) const;
void removeQueryIdNoLock(const String & query_id) const TSA_REQUIRES(query_id_set_mutex);
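/// A minimal usage sketch for the *NoLock variants above (hypothetical caller code, not part
/// of this header): TSA_REQUIRES makes the analyzer demand that `query_id_set_mutex` is held,
/// so code inside MergeTreeData is expected to look roughly like
///
///     std::lock_guard lock(query_id_set_mutex);
///     if (insertQueryIdOrThrowNoLock(query_id, max_queries))
///         ... /// remember to call removeQueryIdNoLock(query_id) under the same mutex later
///
/// Calling these without the lock is flagged at compile time by -Wthread-safety-analysis.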
/// Return the partition expression types as a Tuple type. Return DataTypeUInt8 if partition expression is empty.
DataTypePtr getPartitionValueType() const;
/// Construct a sample block of virtual columns.
Block getSampleBlockWithVirtualColumns() const;
/// Construct a block consisting only of possible virtual columns for part pruning.
/// If one_part is true, fill in at most one part.
Block getBlockWithVirtualPartColumns(const MergeTreeData::DataPartsVector & parts, bool one_part, bool ignore_empty = false) const;
/// For generating names of temporary parts during insertion.
SimpleIncrement insert_increment;
bool has_non_adaptive_index_granularity_parts = false;
/// True if at least one part contains lightweight delete.
mutable std::atomic_bool has_lightweight_delete_parts = false;
/// Parts that are currently being moved from one disk/volume to another.
/// This set has to be used with `currently_processing_in_background_mutex`.
/// Moving may conflict with merges and mutations, but this is OK, because
/// if we decide to move some part to another disk, then we
/// will assuredly choose this disk to contain the part that appears
/// as a result of the merge or mutation.
DataParts currently_moving_parts;
/// Mutex for currently_moving_parts
mutable std::mutex moving_parts_mutex;
PinnedPartUUIDsPtr getPinnedPartUUIDs() const;
/// Schedules a background job (like merge/mutate/fetch) to an executor
virtual bool scheduleDataProcessingJob(BackgroundJobsAssignee & assignee) = 0;
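/// A hypothetical override sketch (the storage class, the selection helper and the exact
/// assignee call are assumptions, not declared in this header): a derived storage picks
/// something to process and hands an IExecutableTask to the assignee, e.g.
///
///     bool StorageExampleMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assignee)
///     {
///         auto entry = selectPartsToMergeOrMutate();   /// hypothetical selection helper
///         if (!entry)
///             return false;                            /// nothing to do, executor backs off
///         assignee.scheduleCommonTask(makeTaskFor(entry), /* need_trigger */ false);   /// assumed assignee API
///         return true;
///     }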
/// Schedules job to move parts between disks/volumes and so on.
bool scheduleDataMovingJob(BackgroundJobsAssignee & assignee);
bool areBackgroundMovesNeeded() const;
/// Lock part in zookeeper for shared data in several nodes
/// Overridden in StorageReplicatedMergeTree
virtual void lockSharedData(const IMergeTreeDataPart &, bool = false, std::optional<HardlinkedFiles> = {}) const {} /// NOLINT
/// Unlock shared data part in zookeeper
/// Overridden in StorageReplicatedMergeTree
virtual std::pair<bool, NameSet> unlockSharedData(const IMergeTreeDataPart &) const { return std::make_pair(true, NameSet{}); }
/// Fetch part only if some replica has it on shared storage like S3
/// Overridden in StorageReplicatedMergeTree
virtual MutableDataPartStoragePtr tryToFetchIfShared(const IMergeTreeDataPart &, const DiskPtr &, const String &) { return nullptr; }
/// Check shared data usage on other replicas for a detached/frozen part
/// Remove local files and remote files if needed
virtual bool removeDetachedPart(DiskPtr disk, const String & path, const String & part_name);
virtual String getTableSharedID() const { return ""; }
/// Store metadata for replicated tables
/// Do nothing for non-replicated tables
virtual void createAndStoreFreezeMetadata(DiskPtr disk, DataPartPtr part, String backup_part_path) const;
/// Parts that are currently submerging (merging into bigger parts) or emerging
/// (about to appear once merging has finished). These two variables have to be used
/// with `currently_submerging_emerging_mutex`.
DataParts currently_submerging_big_parts;
std::map<String, EmergingPartInfo> currently_emerging_big_parts;
/// Mutex for currently_submerging_big_parts and currently_emerging_big_parts
mutable std::mutex currently_submerging_emerging_mutex;
/// Used for freezePartitionsByMatcher and unfreezePartitionsByMatcher
using MatcherFn = std::function<bool(const String &)>;
/// Returns an object that protects the temporary directory from cleanup
scope_guard getTemporaryPartDirectoryHolder(const String & part_dir_name) const;
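/// A hypothetical usage (the directory name below is illustrative): the returned scope_guard
/// keeps the temporary part directory from being removed by the cleanup thread while it is
/// still being written to, e.g.
///
///     auto tmp_dir_holder = getTemporaryPartDirectoryHolder("tmp_insert_all_1_1_0");
///     /// ... write part files; the protection is released when the holder goes out of scope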
protected:
friend class IMergeTreeDataPart;
friend class MergeTreeDataMergerMutator;
friend struct ReplicatedMergeTreeTableMetadata;
friend class StorageReplicatedMergeTree;
friend class MergeTreeDataWriter;
friend class MergeTask;
friend class IPartMetadataManager;
friend class IMergedBlockOutputStream; // for access to log
bool require_part_metadata;
/// Relative path to the table data; changes during RENAME for Ordinary databases.
/// Use it under lockForShare if rename is possible.
String relative_data_path;
/// Current column sizes in compressed and uncompressed form.
ColumnSizeByName column_sizes;
/// Current secondary index sizes in compressed and uncompressed form.
IndexSizeByName secondary_index_sizes;
/// Engine-specific methods
BrokenPartCallback broken_part_callback;
/// log_name will change during table RENAME. Use atomic_shared_ptr to allow concurrent RW.
/// NOTE clang-14 doesn't have atomic_shared_ptr yet. Use std::atomic* operations for now.
std::shared_ptr<String> log_name;
std::atomic<Poco::Logger *> log;
/// Storage settings.
/// Use get and set to receive readonly versions.
MultiVersion<MergeTreeSettings> storage_settings;
/// Used to determine which UUIDs to send to root query executor for deduplication.
mutable std::shared_mutex pinned_part_uuids_mutex;
PinnedPartUUIDsPtr pinned_part_uuids;
/// True if at least one part was created/removed with transaction.
mutable std::atomic_bool transactions_enabled = false;
std::atomic_bool data_parts_loading_finished = false;
/// Work with data parts
struct TagByInfo{};
struct TagByStateAndInfo{};
static const MergeTreePartInfo & dataPartPtrToInfo(const DataPartPtr & part)
{
return part->info;
}
static DataPartStateAndInfo dataPartPtrToStateAndInfo(const DataPartPtr & part)
{
return {part->getState(), part->info};
}
using DataPartsIndexes = boost::multi_index_container<DataPartPtr,
boost::multi_index::indexed_by<
/// Index by Info
boost::multi_index::ordered_unique<
boost::multi_index::tag<TagByInfo>,
boost::multi_index::global_fun<const DataPartPtr &, const MergeTreePartInfo &, dataPartPtrToInfo>
>,
/// Index by (State, Info), is used to obtain ordered slices of parts with the same state
boost::multi_index::ordered_unique<
boost::multi_index::tag<TagByStateAndInfo>,
boost::multi_index::global_fun<const DataPartPtr &, DataPartStateAndInfo, dataPartPtrToStateAndInfo>,
LessStateDataPart
>
>
>;
/// Current set of data parts.
mutable std::mutex data_parts_mutex;
DataPartsIndexes data_parts_indexes;
DataPartsIndexes::index<TagByInfo>::type & data_parts_by_info;
DataPartsIndexes::index<TagByStateAndInfo>::type & data_parts_by_state_and_info;
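/// A minimal sketch of how the two indexes are typically consulted (illustrative only,
/// assumes the caller already holds data_parts_mutex; the part name is made up):
///
///     auto it = data_parts_by_info.find(MergeTreePartInfo::fromPartName("all_1_1_0", format_version));
///     if (it != data_parts_by_info.end() && (*it)->getState() == DataPartState::Active)
///         ... /// point lookup by part info
///
/// while data_parts_by_state_and_info gives ordered slices of parts sharing the same state,
/// see getDataPartsStateRange() below.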
/// Mutex for critical sections which alter the set of parts,
/// e.g. truncate, drop/detach partition.
mutable std::mutex operation_with_data_parts_mutex;
/// Current description of columns of data type Object.
/// It changes only when set of parts is changed and is
/// protected by @data_parts_mutex.
ColumnsDescription object_columns;
MergeTreePartsMover parts_mover;
/// Executors are common for both ReplicatedMergeTree and plain MergeTree
/// but they are being started and finished in derived classes, so let them be protected.
///
/// Why are there two executors, not one? Or an executor for each kind of operation?
/// It has formed historically.
/// Another explanation is that moving operations are common for the Replicated and plain MergeTree classes.
/// The task that schedules these operations runs on its own timetable and is triggered in specific places in the code.
/// And for ReplicatedMergeTree we don't have a LogEntry type for this operation.
BackgroundJobsAssignee background_operations_assignee;
BackgroundJobsAssignee background_moves_assignee;
bool use_metadata_cache;
/// Strongly connected with the two fields above.
/// Every finished task will ask to assign a new one into an executor.
/// These callbacks will be passed to the constructor of each task.
IExecutableTask::TaskResultCallback common_assignee_trigger;
IExecutableTask::TaskResultCallback moves_assignee_trigger;
using DataPartIteratorByInfo = DataPartsIndexes::index<TagByInfo>::type::iterator;
using DataPartIteratorByStateAndInfo = DataPartsIndexes::index<TagByStateAndInfo>::type::iterator;
boost::iterator_range<DataPartIteratorByStateAndInfo> getDataPartsStateRange(DataPartState state) const
{
auto begin = data_parts_by_state_and_info.lower_bound(state, LessStateDataPart());
auto end = data_parts_by_state_and_info.upper_bound(state, LessStateDataPart());
return {begin, end};
}
boost::iterator_range<DataPartIteratorByInfo> getDataPartsPartitionRange(const String & partition_id) const
{
auto begin = data_parts_by_info.lower_bound(PartitionID(partition_id), LessDataPart());
auto end = data_parts_by_info.upper_bound(PartitionID(partition_id), LessDataPart());
return {begin, end};
}
/// Creates description of columns of data type Object from the range of data parts.
static ColumnsDescription getConcreteObjectColumns(
boost::iterator_range<DataPartIteratorByStateAndInfo> range, const ColumnsDescription & storage_columns);
std::optional<UInt64> totalRowsByPartitionPredicateImpl(
const SelectQueryInfo & query_info, ContextPtr context, const DataPartsVector & parts) const;
static decltype(auto) getStateModifier(DataPartState state)
{
return [state] (const DataPartPtr & part) { part->setState(state); };
}
void modifyPartState(DataPartIteratorByStateAndInfo it, DataPartState state)
{
if (!data_parts_by_state_and_info.modify(it, getStateModifier(state)))
throw Exception("Can't modify " + (*it)->getNameWithState(), ErrorCodes::LOGICAL_ERROR);
}
void modifyPartState(DataPartIteratorByInfo it, DataPartState state)
{
if (!data_parts_by_state_and_info.modify(data_parts_indexes.project<TagByStateAndInfo>(it), getStateModifier(state)))
throw Exception("Can't modify " + (*it)->getNameWithState(), ErrorCodes::LOGICAL_ERROR);
}
void modifyPartState(const DataPartPtr & part, DataPartState state)
{
auto it = data_parts_by_info.find(part->info);
if (it == data_parts_by_info.end() || (*it).get() != part.get())
throw Exception("Part " + part->name + " doesn't exist", ErrorCodes::LOGICAL_ERROR);
if (!data_parts_by_state_and_info.modify(data_parts_indexes.project<TagByStateAndInfo>(it), getStateModifier(state)))
throw Exception("Can't modify " + (*it)->getNameWithState(), ErrorCodes::LOGICAL_ERROR);
}
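/// A hypothetical call site (not from this header): state transitions go through the helpers
/// above while data_parts_mutex is held, e.g.
///
///     std::lock_guard lock(data_parts_mutex);
///     modifyPartState(part, DataPartState::Outdated);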
/// Used to serialize calls to grabOldParts.
std::mutex grab_old_parts_mutex;
/// The same for clearOldTemporaryDirectories.
std::mutex clear_old_temporary_directories_mutex;
void checkProperties(const StorageInMemoryMetadata & new_metadata, const StorageInMemoryMetadata & old_metadata, bool attach = false) const;
void setProperties(const StorageInMemoryMetadata & new_metadata, const StorageInMemoryMetadata & old_metadata, bool attach = false);
void checkPartitionKeyAndInitMinMax(const KeyDescription & new_partition_key);
void checkTTLExpressions(const StorageInMemoryMetadata & new_metadata, const StorageInMemoryMetadata & old_metadata) const;
void checkStoragePolicy(const StoragePolicyPtr & new_storage_policy) const;
/// Calculates column and secondary index sizes in compressed form for the current state of data_parts. Call with the data_parts mutex locked.
void calculateColumnAndSecondaryIndexSizesImpl();
/// Adds or subtracts the contribution of the part to compressed column and secondary index sizes.
void addPartContributionToColumnAndSecondaryIndexSizes(const DataPartPtr & part);
void removePartContributionToColumnAndSecondaryIndexSizes(const DataPartPtr & part);
/// If there is no part in the partition with ID `partition_id`, returns empty ptr. Should be called under the lock.
DataPartPtr getAnyPartInPartition(const String & partition_id, DataPartsLock & data_parts_lock) const;
/// Return parts in the Active set that are covered by the new_part_info or the part that covers it.
/// Will check that the new part doesn't already exist and that it doesn't intersect an existing part.
DataPartsVector getActivePartsToReplace(
const MergeTreePartInfo & new_part_info,
const String & new_part_name,
DataPartPtr & out_covering_part,
DataPartsLock & data_parts_lock) const;
DataPartsVector getCoveredOutdatedParts(
const DataPartPtr & part,
DataPartsLock & data_parts_lock) const;
struct PartHierarchy
{
DataPartPtr duplicate_part;
DataPartsVector covering_parts;
DataPartsVector covered_parts;
DataPartsVector intersected_parts;
};
PartHierarchy getPartHierarchy(
const MergeTreePartInfo & part_info,
DataPartState state,
DataPartsLock & /* data_parts_lock */) const;
/// Checks whether the column is in the primary key, possibly wrapped in a chain of functions with a single argument.
bool isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(const ASTPtr & node, const StorageMetadataPtr & metadata_snapshot) const;
/// Common part for |freezePartition()| and |freezeAll()|.
PartitionCommandsResultInfo freezePartitionsByMatcher(MatcherFn matcher, const StorageMetadataPtr & metadata_snapshot, const String & with_name, ContextPtr context);
PartitionCommandsResultInfo unfreezePartitionsByMatcher(MatcherFn matcher, const String & backup_name, ContextPtr context);
// Partition helpers
bool canReplacePartition(const DataPartPtr & src_part) const;
/// Tries to drop part in background without any waits or throwing exceptions in case of errors.
virtual void dropPartNoWaitNoThrow(const String & part_name) = 0;
virtual void dropPart(const String & part_name, bool detach, ContextPtr context) = 0;
virtual void dropPartition(const ASTPtr & partition, bool detach, ContextPtr context) = 0;
virtual PartitionCommandsResultInfo attachPartition(const ASTPtr & partition, const StorageMetadataPtr & metadata_snapshot, bool part, ContextPtr context) = 0;
virtual void replacePartitionFrom(const StoragePtr & source_table, const ASTPtr & partition, bool replace, ContextPtr context) = 0;
virtual void movePartitionToTable(const StoragePtr & dest_table, const ASTPtr & partition, ContextPtr context) = 0;
virtual void fetchPartition(
const ASTPtr & partition,
const StorageMetadataPtr & metadata_snapshot,
const String & from,
bool fetch_part,
ContextPtr query_context);
virtual void movePartitionToShard(const ASTPtr & partition, bool move_part, const String & to, ContextPtr query_context);
void writePartLog(
PartLogElement::Type type,
const ExecutionStatus & execution_status,
UInt64 elapsed_ns,
const String & new_part_name,
const DataPartPtr & result_part,
const DataPartsVector & source_parts,
const MergeListEntry * merge_entry);
/// If part is assigned to merge or mutation (possibly replicated)
/// Should be overridden by children, because they can have different
/// mechanisms for parts locking
virtual bool partIsAssignedToBackgroundOperation(const DataPartPtr & part) const = 0;
/// Return the most recent mutation commands for the part which weren't applied yet.
/// Used to receive AlterConversions for the part and apply them on the fly. This
/// method has different implementations for replicated and non-replicated
/// MergeTree because they store mutations in different ways.
virtual MutationCommands getFirstAlterMutationCommandsForPart(const DataPartPtr & part) const = 0;
/// Moves part to specified space, used in ALTER ... MOVE ... queries
bool movePartsToSpace(const DataPartsVector & parts, SpacePtr space);
/// Makes backup entries to backup the parts of this table.
BackupEntries backupParts(const DataPartsVector & data_parts, const String & data_path_in_backup, const ContextPtr & local_context);
class RestoredPartsHolder;
/// Restores the parts of this table from backup.
void restorePartsFromBackup(RestorerFromBackup & restorer, const String & data_path_in_backup, const std::optional<ASTs> & partitions);
void restorePartFromBackup(std::shared_ptr<RestoredPartsHolder> restored_parts_holder, const MergeTreePartInfo & part_info, const String & part_path_in_backup) const;
/// Attaches restored parts to the storage.
virtual void attachRestoredParts(MutableDataPartsVector && parts) = 0;
void resetObjectColumnsFromActiveParts(const DataPartsLock & lock);
void updateObjectColumns(const DataPartPtr & part, const DataPartsLock & lock);
static void incrementInsertedPartsProfileEvent(MergeTreeDataPartType type);
static void incrementMergedPartsProfileEvent(MergeTreeDataPartType type);
private:
/// Checking that candidate part doesn't break invariants: correct partition
void checkPartPartition(MutableDataPartPtr & part, DataPartsLock & lock) const;
void checkPartDuplicate(MutableDataPartPtr & part, Transaction & transaction, DataPartsLock & lock) const;
/// Prepare the part to be committed in memory: fill some fields inside the part, add it to data_parts_indexes
/// in precommitted state and to the transaction.
void preparePartForCommit(MutableDataPartPtr & part, Transaction & out_transaction);
/// Low-level method for preparing parts for commit (in-memory).
/// FIXME Merge MergeTreeTransaction and Transaction
bool renameTempPartAndReplaceImpl(
MutableDataPartPtr & part,
Transaction & out_transaction,
DataPartsLock & lock,
DataPartsVector * out_covered_parts);
/// RAII wrapper for atomic work with currently moving parts.
/// Acquires the parts in the constructor and removes them in the destructor.
/// Uses data.moving_parts_mutex
struct CurrentlyMovingPartsTagger
{
MergeTreeMovingParts parts_to_move;
MergeTreeData & data;
CurrentlyMovingPartsTagger(MergeTreeMovingParts && moving_parts_, MergeTreeData & data_);
~CurrentlyMovingPartsTagger();
};
using CurrentlyMovingPartsTaggerPtr = std::shared_ptr<CurrentlyMovingPartsTagger>;
/// Move selected parts to corresponding disks
bool moveParts(const CurrentlyMovingPartsTaggerPtr & moving_tagger);
/// Select parts for move and disks for them. Used in background moving processes.
CurrentlyMovingPartsTaggerPtr selectPartsForMove();
/// Check selected parts for movements. Used by ALTER ... MOVE queries.
CurrentlyMovingPartsTaggerPtr checkPartsForMove(const DataPartsVector & parts, SpacePtr space);
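/// A minimal sketch of the background moving flow (illustrative only, assumes the usual
/// call order; error handling omitted): the tagger returned by selectPartsForMove() holds
/// the chosen parts in currently_moving_parts and releases them in its destructor, e.g.
///
///     auto moving_tagger = selectPartsForMove();
///     if (!moving_tagger->parts_to_move.empty())
///         moveParts(moving_tagger);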
bool canUsePolymorphicParts(const MergeTreeSettings & settings, String * out_reason = nullptr) const;
std::mutex write_ahead_log_mutex;
WriteAheadLogPtr write_ahead_log;
virtual void startBackgroundMovesIfNeeded() = 0;
bool allow_nullable_key{};
void addPartContributionToDataVolume(const DataPartPtr & part);
void removePartContributionToDataVolume(const DataPartPtr & part);
void increaseDataVolume(ssize_t bytes, ssize_t rows, ssize_t parts);
void setDataVolume(size_t bytes, size_t rows, size_t parts);
std::atomic<size_t> total_active_size_bytes = 0;
std::atomic<size_t> total_active_size_rows = 0;
std::atomic<size_t> total_active_size_parts = 0;
// Record all query ids which access the table. It's guarded by `query_id_set_mutex` and is always mutable.
mutable std::set<String> query_id_set TSA_GUARDED_BY(query_id_set_mutex);
mutable std::mutex query_id_set_mutex;
// Get partition matcher for FREEZE / UNFREEZE queries.
MatcherFn getPartitionMatcher(const ASTPtr & partition, ContextPtr context) const;
/// Returns default settings for storage with possible changes from global config.
virtual std::unique_ptr<MergeTreeSettings> getDefaultSettings() const = 0;
void loadDataPartsFromDisk(
MutableDataPartsVector & broken_parts_to_detach,
MutableDataPartsVector & duplicate_parts_to_remove,
ThreadPool & pool,
size_t num_parts,
std::queue<std::vector<std::pair<String, DiskPtr>>> & parts_queue,
bool skip_sanity_checks,
const MergeTreeSettingsPtr & settings);
void loadDataPartsFromWAL(
MutableDataPartsVector & duplicate_parts_to_remove,
MutableDataPartsVector & parts_from_wal);
/// Create zero-copy exclusive lock for part and disk. Useful for coordination of
/// distributed operations which can lead to data duplication. Implemented only in ReplicatedMergeTree.
virtual std::optional<ZeroCopyLock> tryCreateZeroCopyExclusiveLock(const String &, const DiskPtr &) { return std::nullopt; }
/// Remove parts from disk calling part->remove(). Can do it in parallel in case of a big set of parts and enabled settings.
/// If we fail to remove some part and throw_on_error is `true`, an exception will be thrown on the first failed part.
/// Otherwise, in the non-parallel case, it will break and return.
void clearPartsFromFilesystemImpl(const DataPartsVector & parts, NameSet * part_names_succeed);
static MutableDataPartPtr preparePartForRemoval(const DataPartPtr & part);
mutable TemporaryParts temporary_parts;
};
/// RAII struct to record big parts that are submerging or emerging.
/// It's used to calculate the balanced statistics of JBOD array.
struct CurrentlySubmergingEmergingTagger
{
MergeTreeData & storage;
String emerging_part_name;
MergeTreeData::DataPartsVector submerging_parts;
Poco::Logger * log;
CurrentlySubmergingEmergingTagger(
MergeTreeData & storage_, const String & name_, MergeTreeData::DataPartsVector && parts_, Poco::Logger * log_)
: storage(storage_), emerging_part_name(name_), submerging_parts(std::move(parts_)), log(log_)
{
}
~CurrentlySubmergingEmergingTagger();
};
/// TODO: move it somewhere
[[ maybe_unused ]] static bool needSyncPart(size_t input_rows, size_t input_bytes, const MergeTreeSettings & settings)
{
return ((settings.min_rows_to_fsync_after_merge && input_rows >= settings.min_rows_to_fsync_after_merge)
|| (settings.min_compressed_bytes_to_fsync_after_merge && input_bytes >= settings.min_compressed_bytes_to_fsync_after_merge));
}
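/// A hypothetical call site for needSyncPart() (merge/insert code outside this header;
/// the variable names are illustrative):
///
///     bool need_sync = needSyncPart(rows_written, uncompressed_bytes_written, *data.getSettings());
///     /// when true, the freshly written part files are fsync-ed before the part is committed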
}