Merge remote-tracking branch 'origin/master' into pr-local-plan

Igor Nikonov 2024-09-06 18:46:42 +00:00
commit b436057cba
63 changed files with 865 additions and 135 deletions

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.8.3.59"
ARG VERSION="24.8.4.13"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.8.3.59"
ARG VERSION="24.8.4.13"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -28,7 +28,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="24.8.3.59"
ARG VERSION="24.8.4.13"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
#docker-official-library:off

View File

@ -3,6 +3,8 @@
FROM alpine:3.18
RUN apk add --no-cache -U iproute2 \
&& for bin in iptables iptables-restore iptables-save; \
&& for bin in \
iptables iptables-restore iptables-save \
ip6tables ip6tables-restore ip6tables-save; \
do ln -sf xtables-nft-multi "/sbin/$bin"; \
done

View File

@ -0,0 +1,17 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.3.11.7-lts (28795d0a47e) FIXME as compared to v24.3.10.33-lts (37b6502ebf0)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#67479](https://github.com/ClickHouse/ClickHouse/issues/67479): In rare cases ClickHouse could consider parts as broken because of some unexpected projections on disk. Now it's fixed. [#66898](https://github.com/ClickHouse/ClickHouse/pull/66898) ([alesapin](https://github.com/alesapin)).
* Backported in [#69243](https://github.com/ClickHouse/ClickHouse/issues/69243): The `UNION` clause in subqueries wasn't handled correctly in queries with parallel replicas and led to the LOGICAL_ERROR `Duplicate announcement received for replica`. [#69146](https://github.com/ClickHouse/ClickHouse/pull/69146) ([Igor Nikonov](https://github.com/devcrafter)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#69221](https://github.com/ClickHouse/ClickHouse/issues/69221): Disable memory test with sanitizer. [#69193](https://github.com/ClickHouse/ClickHouse/pull/69193) ([alesapin](https://github.com/alesapin)).

View File

@ -0,0 +1,18 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.5.8.10-stable (f11729638ea) FIXME as compared to v24.5.7.31-stable (6c185e9aec1)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#69295](https://github.com/ClickHouse/ClickHouse/issues/69295): TODO. [#68744](https://github.com/ClickHouse/ClickHouse/pull/68744) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#69245](https://github.com/ClickHouse/ClickHouse/issues/69245): The `UNION` clause in subqueries wasn't handled correctly in queries with parallel replicas and led to the LOGICAL_ERROR `Duplicate announcement received for replica`. [#69146](https://github.com/ClickHouse/ClickHouse/pull/69146) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix crash when using `s3` table function with GLOB paths and filters. [#69176](https://github.com/ClickHouse/ClickHouse/pull/69176) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#69223](https://github.com/ClickHouse/ClickHouse/issues/69223): Disable memory test with sanitizer. [#69193](https://github.com/ClickHouse/ClickHouse/pull/69193) ([alesapin](https://github.com/alesapin)).

View File

@ -0,0 +1,16 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.6.6.6-stable (a4c4580e639) FIXME as compared to v24.6.5.30-stable (e6e196c92d6)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#69197](https://github.com/ClickHouse/ClickHouse/issues/69197): TODO. [#68744](https://github.com/ClickHouse/ClickHouse/pull/68744) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#69225](https://github.com/ClickHouse/ClickHouse/issues/69225): Disable memory test with sanitizer. [#69193](https://github.com/ClickHouse/ClickHouse/pull/69193) ([alesapin](https://github.com/alesapin)).

View File

@ -0,0 +1,17 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.7.6.8-stable (7779883593a) FIXME as compared to v24.7.5.37-stable (f2533ca97be)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#69198](https://github.com/ClickHouse/ClickHouse/issues/69198): TODO. [#68744](https://github.com/ClickHouse/ClickHouse/pull/68744) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#69249](https://github.com/ClickHouse/ClickHouse/issues/69249): The `UNION` clause in subqueries wasn't handled correctly in queries with parallel replicas and led to the LOGICAL_ERROR `Duplicate announcement received for replica`. [#69146](https://github.com/ClickHouse/ClickHouse/pull/69146) ([Igor Nikonov](https://github.com/devcrafter)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#69227](https://github.com/ClickHouse/ClickHouse/issues/69227): Disable memory test with sanitizer. [#69193](https://github.com/ClickHouse/ClickHouse/pull/69193) ([alesapin](https://github.com/alesapin)).

View File

@ -0,0 +1,22 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.8.4.13-lts (53195bc189b) FIXME as compared to v24.8.3.59-lts (e729b9fa40e)
#### Improvement
* Backported in [#68699](https://github.com/ClickHouse/ClickHouse/issues/68699): Remove the old named collections code from dictionaries and substitute it with the new one, which allows using DDL-created named collections in dictionaries. Closes [#60936](https://github.com/ClickHouse/ClickHouse/issues/60936), closes [#36890](https://github.com/ClickHouse/ClickHouse/issues/36890). [#68412](https://github.com/ClickHouse/ClickHouse/pull/68412) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#69231](https://github.com/ClickHouse/ClickHouse/issues/69231): Fix parsing error when null should be inserted as default in some cases during JSON type parsing. [#68955](https://github.com/ClickHouse/ClickHouse/pull/68955) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#69251](https://github.com/ClickHouse/ClickHouse/issues/69251): The `UNION` clause in subqueries wasn't handled correctly in queries with parallel replicas and led to the LOGICAL_ERROR `Duplicate announcement received for replica`. [#69146](https://github.com/ClickHouse/ClickHouse/pull/69146) ([Igor Nikonov](https://github.com/devcrafter)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#69189](https://github.com/ClickHouse/ClickHouse/issues/69189): Don't create Object type if use_json_alias_for_old_object_type=1 but allow_experimental_object_type=0. [#69150](https://github.com/ClickHouse/ClickHouse/pull/69150) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#69229](https://github.com/ClickHouse/ClickHouse/issues/69229): Disable memory test with sanitizer. [#69193](https://github.com/ClickHouse/ClickHouse/pull/69193) ([alesapin](https://github.com/alesapin)).
* Backported in [#69219](https://github.com/ClickHouse/ClickHouse/issues/69219): Disable perf-like test with sanitizers. [#69194](https://github.com/ClickHouse/ClickHouse/pull/69194) ([alesapin](https://github.com/alesapin)).

View File

@ -989,7 +989,11 @@ ALTER TABLE tab DROP STATISTICS a;
These lightweight statistics aggregate information about the distribution of values in columns. Statistics are stored in every part and updated on every insert.
They can be used for prewhere optimization only if the setting `allow_statistics_optimize = 1` is enabled (see the example after the tables below).
#### Available Types of Column Statistics {#available-types-of-column-statistics}
### Available Types of Column Statistics {#available-types-of-column-statistics}
- `MinMax`
The minimum and maximum column values, which allow estimating the selectivity of range filters on numeric columns.
- `TDigest`
@ -1003,6 +1007,27 @@ They can be used for prewhere optimization only if we enable `set allow_statisti
[Count-min](https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch) sketches which provide an approximate count of the frequency of each value in a column.
### Supported Data Types {#supported-data-types}
| | (U)Int* | Float* | Decimal(*) | Date* | Boolean | Enum* | (Fixed)String |
|-----------|---------|--------|------------|-------|---------|-------|------------------|
| count_min | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| MinMax | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✗ |
| TDigest | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✗ |
| Uniq | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
### Supported Operations {#supported-operations}
| | Equality filters (==) | Range filters (>, >=, <, <=) |
|-----------|-----------------------|------------------------------|
| count_min | ✔ | ✗ |
| MinMax | ✗ | ✔ |
| TDigest | ✗ | ✔ |
| Uniq | ✔ | ✗ |
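A minimal sketch of declaring and adding column statistics (the table `tab` and its columns are illustrative; it also assumes the experimental feature is enabled via `allow_experimental_statistics`, in addition to `allow_statistics_optimize` mentioned above):

```sql
-- Illustrative only: enable the (experimental) statistics feature and the optimizer setting.
SET allow_experimental_statistics = 1;
SET allow_statistics_optimize = 1;

-- Declare statistics when creating the table ...
CREATE TABLE tab
(
    key UInt64,
    value UInt64 STATISTICS(minmax, uniq)
)
ENGINE = MergeTree
ORDER BY key;

-- ... or add them to an existing column later.
ALTER TABLE tab ADD STATISTICS value TYPE tdigest;
```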
## Column-level Settings {#column-level-settings}
Certain MergeTree settings can be overridden at column level:

View File

@ -351,7 +351,7 @@ ALTER TABLE mt DELETE IN PARTITION ID '2' WHERE p = 2;
You can specify the partition expression in `ALTER ... PARTITION` queries in different ways:
- As a value from the `partition` column of the `system.parts` table. For example, `ALTER TABLE visits DETACH PARTITION 201901`.
- Using the keyword `ALL`. It can be used only with DROP/DETACH/ATTACH. For example, `ALTER TABLE visits ATTACH PARTITION ALL`.
- Using the keyword `ALL`. It can be used only with DROP/DETACH/ATTACH/ATTACH FROM. For example, `ALTER TABLE visits ATTACH PARTITION ALL`.
- As a tuple of expressions or constants that matches (in types) the table partitioning keys tuple. In the case of a single element partitioning key, the expression should be wrapped in the `tuple (...)` function. For example, `ALTER TABLE visits DETACH PARTITION tuple(toYYYYMM(toDate('2019-01-25')))`.
- Using the partition ID. The partition ID is a string identifier of the partition (human-readable, if possible) that is used as the name of the partition in the file system and in ZooKeeper. The partition ID must be specified in the `PARTITION ID` clause, in single quotes. For example, `ALTER TABLE visits DETACH PARTITION ID '201901'`.
- In the [ALTER ATTACH PART](#attach-partitionpart) and [DROP DETACHED PART](#drop-detached-partitionpart) queries, to specify the name of a part, use a string literal with a value from the `name` column of the [system.detached_parts](/docs/en/operations/system-tables/detached_parts.md/#system_tables-detached_parts) table. For example, `ALTER TABLE visits ATTACH PART '201901_1_1_0'`.
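For instance, the `ALL` keyword combined with `ATTACH ... FROM`, as exercised by the tests changed in this commit (`src` and `dst` are illustrative tables with the same structure and partition key):

```sql
-- Attach every partition of src into dst in one statement.
ALTER TABLE dst ATTACH PARTITION ALL FROM src;

-- DROP/DETACH/ATTACH also accept ALL.
ALTER TABLE visits DETACH PARTITION ALL;
```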

View File

@ -24,9 +24,11 @@ DELETE FROM hits WHERE Title LIKE '%hello%';
## Lightweight `DELETE` does not delete data immediately
Lightweight `DELETE` is implemented as a [mutation](/en/sql-reference/statements/alter#mutations), which is executed asynchronously in the background by default. The statement is going to return almost immediately, but the data can still be visible to queries until the mutation is finished.
Lightweight `DELETE` is implemented as a [mutation](/en/sql-reference/statements/alter#mutations) that marks rows as deleted but does not immediately physically delete them.
The mutation marks rows as deleted, and at that point they will no longer show up in query results. It does not physically delete the data; this will happen during the next merge. As a result, it is possible that for an unspecified period, data is not actually deleted from storage and is only marked as deleted.
By default, `DELETE` statements wait until marking the rows as deleted is completed before returning. This can take a long time if the amount of data is large. Alternatively, you can run it asynchronously in the background using the setting [`lightweight_deletes_sync`](/en/operations/settings/settings#lightweight_deletes_sync). If the setting is disabled, the `DELETE` statement returns immediately, but the data can still be visible to queries until the background mutation is finished.
The mutation does not physically delete the rows that have been marked as deleted; this will only happen during the next merge. As a result, it is possible that for an unspecified period, data is not actually deleted from storage and is only marked as deleted.
If you need to guarantee that your data is deleted from storage in a predictable time, consider using the table setting [`min_age_to_force_merge_seconds`](https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#min_age_to_force_merge_seconds). Alternatively, you can use the [ALTER TABLE ... DELETE](/en/sql-reference/statements/alter/delete) command. Note that deleting data using `ALTER TABLE ... DELETE` may consume significant resources as it recreates all affected parts.
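A minimal sketch of both behaviours, reusing the `hits` table from the example above (the asynchronous variant assumes returning before the mutation finishes is acceptable for your workload):

```sql
-- Default: wait until rows are marked as deleted before returning.
DELETE FROM hits WHERE Title LIKE '%hello%';

-- Asynchronous: return immediately; rows may remain visible until the background mutation finishes.
SET lightweight_deletes_sync = 0;
DELETE FROM hits WHERE Title LIKE '%hello%';
```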

View File

@ -1896,6 +1896,21 @@ void ClientBase::processParsedSingleQuery(const String & full_query, const Strin
/// Temporarily apply query settings to context.
std::optional<Settings> old_settings;
SCOPE_EXIT_SAFE({
try
{
/// We need to park ParallelFormatting threads,
/// because they can use settings from the global context
/// and this can lead to a data race with `setSettings`
resetOutput();
}
catch (...)
{
if (!have_error)
{
client_exception = std::make_unique<Exception>(getCurrentExceptionMessageAndPattern(print_stack_trace), getCurrentExceptionCode());
have_error = true;
}
}
if (old_settings)
client_context->setSettings(*old_settings);
});

View File

@ -42,7 +42,7 @@ public:
size_t max_error_cap = DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT);
using Entry = IConnectionPool::Entry;
using PoolWithFailoverBase<IConnectionPool>::isTryResultInvalid;
using PoolWithFailoverBase<IConnectionPool>::checkTryResultIsValid;
/** Allocates connection to work. */
Entry get(const ConnectionTimeouts & timeouts) override;

View File

@ -122,6 +122,14 @@ public:
return result.entry.isNull() || !result.is_usable || (skip_read_only_replicas && result.is_readonly);
}
void checkTryResultIsValid(const TryResult & result, bool skip_read_only_replicas) const
{
if (isTryResultInvalid(result, skip_read_only_replicas))
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR,
"Got an invalid connection result: entry.isNull {}, is_usable {}, is_up_to_date {}, delay {}, is_readonly {}, skip_read_only_replicas {}",
result.entry.isNull(), result.is_usable, result.is_up_to_date, result.delay, result.is_readonly, skip_read_only_replicas);
}
size_t getPoolSize() const { return nested_pools.size(); }
protected:

View File

@ -781,14 +781,14 @@ InterpreterCreateQuery::TableProperties InterpreterCreateQuery::getTableProperti
const auto & settings = getContext()->getSettingsRef();
if (index_desc.type == FULL_TEXT_INDEX_NAME && !settings.allow_experimental_full_text_index)
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Experimental full-text index feature is not enabled (the setting 'allow_experimental_full_text_index')");
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Experimental full-text index feature is disabled. Turn on setting 'allow_experimental_full_text_index'");
/// ----
/// Temporary check during a transition period. Please remove at the end of 2024.
if (index_desc.type == INVERTED_INDEX_NAME && !settings.allow_experimental_inverted_index)
throw Exception(ErrorCodes::ILLEGAL_INDEX, "Please use index type 'full_text' instead of 'inverted'");
/// ----
if (index_desc.type == "vector_similarity" && !settings.allow_experimental_vector_similarity_index)
throw Exception(ErrorCodes::INCORRECT_QUERY, "Vector similarity index is disabled. Turn on allow_experimental_vector_similarity_index");
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Experimental vector similarity index is disabled. Turn on setting 'allow_experimental_vector_similarity_index'");
properties.indices.push_back(index_desc);
}

View File

@ -1142,6 +1142,16 @@ bool AlterCommands::hasFullTextIndex(const StorageInMemoryMetadata & metadata)
return false;
}
bool AlterCommands::hasVectorSimilarityIndex(const StorageInMemoryMetadata & metadata)
{
for (const auto & index : metadata.secondary_indices)
{
if (index.type == "vector_similarity")
return true;
}
return false;
}
void AlterCommands::apply(StorageInMemoryMetadata & metadata, ContextPtr context) const
{
if (!prepared)

View File

@ -237,6 +237,9 @@ public:
/// Check if commands have any full-text index
static bool hasFullTextIndex(const StorageInMemoryMetadata & metadata);
/// Check if commands have any vector similarity index
static bool hasVectorSimilarityIndex(const StorageInMemoryMetadata & metadata);
};
}

View File

@ -28,7 +28,6 @@ namespace ErrorCodes
extern const int TOO_MANY_PARTITIONS;
extern const int DISTRIBUTED_TOO_MANY_PENDING_BYTES;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int LOGICAL_ERROR;
}
/// Can the batch be split and its files sent one-by-one instead?
@ -244,9 +243,7 @@ void DistributedAsyncInsertBatch::sendBatch(const SettingsChanges & settings_cha
auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(insert_settings);
auto results = parent.pool->getManyCheckedForInsert(timeouts, insert_settings, PoolMode::GET_ONE, parent.storage.remote_storage.getQualifiedName());
auto result = results.front();
if (parent.pool->isTryResultInvalid(result, insert_settings.distributed_insert_skip_read_only_replicas))
throw Exception(ErrorCodes::LOGICAL_ERROR, "Got an invalid connection result");
parent.pool->checkTryResultIsValid(result, insert_settings.distributed_insert_skip_read_only_replicas);
connection = std::move(result.entry);
compression_expected = connection->getCompression() == Protocol::Compression::Enable;
@ -306,9 +303,7 @@ void DistributedAsyncInsertBatch::sendSeparateFiles(const SettingsChanges & sett
auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(insert_settings);
auto results = parent.pool->getManyCheckedForInsert(timeouts, insert_settings, PoolMode::GET_ONE, parent.storage.remote_storage.getQualifiedName());
auto result = results.front();
if (parent.pool->isTryResultInvalid(result, insert_settings.distributed_insert_skip_read_only_replicas))
throw Exception(ErrorCodes::LOGICAL_ERROR, "Got an invalid connection result");
parent.pool->checkTryResultIsValid(result, insert_settings.distributed_insert_skip_read_only_replicas);
auto connection = std::move(result.entry);
bool compression_expected = connection->getCompression() == Protocol::Compression::Enable;

View File

@ -416,9 +416,7 @@ void DistributedAsyncInsertDirectoryQueue::processFile(std::string & file_path,
auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(insert_settings);
auto results = pool->getManyCheckedForInsert(timeouts, insert_settings, PoolMode::GET_ONE, storage.remote_storage.getQualifiedName());
auto result = results.front();
if (pool->isTryResultInvalid(result, insert_settings.distributed_insert_skip_read_only_replicas))
throw Exception(ErrorCodes::LOGICAL_ERROR, "Got an invalid connection result");
pool->checkTryResultIsValid(result, insert_settings.distributed_insert_skip_read_only_replicas);
auto connection = std::move(result.entry);
LOG_DEBUG(log, "Sending `{}` to {} ({} rows, {} bytes)",

View File

@ -347,7 +347,7 @@ DistributedSink::runWritingJob(JobReplica & job, const Block & current_block, si
}
const Block & shard_block = (num_shards > 1) ? job.current_shard_block : current_block;
const Settings & settings = context->getSettingsRef();
const Settings settings = context->getSettingsCopy();
size_t rows = shard_block.rows();
@ -378,9 +378,7 @@ DistributedSink::runWritingJob(JobReplica & job, const Block & current_block, si
/// (anyway fallback_to_stale_replicas_for_distributed_queries=true by default)
auto results = shard_info.pool->getManyCheckedForInsert(timeouts, settings, PoolMode::GET_ONE, storage.remote_storage.getQualifiedName());
auto result = results.front();
if (shard_info.pool->isTryResultInvalid(result, settings.distributed_insert_skip_read_only_replicas))
throw Exception(ErrorCodes::LOGICAL_ERROR, "Got an invalid connection result");
shard_info.pool->checkTryResultIsValid(result, settings.distributed_insert_skip_read_only_replicas);
job.connection_entry = std::move(result.entry);
}
else

View File

@ -95,22 +95,18 @@ UInt32 DataPartStorageOnDiskFull::getRefCount(const String & file_name) const
return volume->getDisk()->getRefCount(fs::path(root_path) / part_dir / file_name);
}
std::string DataPartStorageOnDiskFull::getRemotePath(const std::string & file_name, bool if_exists) const
std::vector<std::string> DataPartStorageOnDiskFull::getRemotePaths(const std::string & file_name) const
{
const std::string path = fs::path(root_path) / part_dir / file_name;
auto objects = volume->getDisk()->getStorageObjects(path);
if (objects.empty() && if_exists)
return "";
std::vector<std::string> remote_paths;
remote_paths.reserve(objects.size());
if (objects.size() != 1)
{
throw Exception(ErrorCodes::LOGICAL_ERROR,
"One file must be mapped to one object on blob storage by path {} in MergeTree tables, have {}.",
path, objects.size());
}
for (const auto & object : objects)
remote_paths.push_back(object.remote_path);
return objects[0].remote_path;
return remote_paths;
}
String DataPartStorageOnDiskFull::getUniqueId() const

View File

@ -23,7 +23,7 @@ public:
Poco::Timestamp getFileLastModified(const String & file_name) const override;
size_t getFileSize(const std::string & file_name) const override;
UInt32 getRefCount(const std::string & file_name) const override;
std::string getRemotePath(const std::string & file_name, bool if_exists) const override;
std::vector<std::string> getRemotePaths(const std::string & file_name) const override;
String getUniqueId() const override;
std::unique_ptr<ReadBufferFromFileBase> readFile(

View File

@ -126,7 +126,7 @@ public:
virtual UInt32 getRefCount(const std::string & file_name) const = 0;
/// Get paths on the remote filesystem for a file name on the local filesystem.
virtual std::string getRemotePath(const std::string & file_name, bool if_exists) const = 0;
virtual std::vector<std::string> getRemotePaths(const std::string & file_name) const = 0;
virtual UInt64 calculateTotalSizeOnDisk() const = 0;

View File

@ -3230,6 +3230,10 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, Context
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED,
"Experimental full-text index feature is not enabled (turn on setting 'allow_experimental_full_text_index')");
if (AlterCommands::hasVectorSimilarityIndex(new_metadata) && !settings.allow_experimental_vector_similarity_index)
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED,
"Experimental vector similarity index is disabled (turn on setting 'allow_experimental_vector_similarity_index')");
for (const auto & disk : getDisks())
if (!disk->supportsHardLinks() && !commands.isSettingsAlter() && !commands.isCommentAlter())
throw Exception(
@ -5009,7 +5013,7 @@ void MergeTreeData::checkAlterPartitionIsPossible(
const auto * partition_ast = command.partition->as<ASTPartition>();
if (partition_ast && partition_ast->all)
{
if (command.type != PartitionCommand::DROP_PARTITION && command.type != PartitionCommand::ATTACH_PARTITION)
if (command.type != PartitionCommand::DROP_PARTITION && command.type != PartitionCommand::ATTACH_PARTITION && !(command.type == PartitionCommand::REPLACE_PARTITION && !command.replace))
throw DB::Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Only support DROP/DETACH/ATTACH PARTITION ALL currently");
}
else
@ -5810,7 +5814,7 @@ String MergeTreeData::getPartitionIDFromQuery(const ASTPtr & ast, ContextPtr loc
const auto & partition_ast = ast->as<ASTPartition &>();
if (partition_ast.all)
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Only Support DETACH PARTITION ALL currently");
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Only Support DROP/DETACH/ATTACH PARTITION ALL currently");
if (!partition_ast.value)
{

View File

@ -195,7 +195,7 @@ void MergeTreeIndexGranuleVectorSimilarity::serializeBinary(WriteBuffer & ostr)
LOG_TRACE(logger, "Start writing vector similarity index");
if (empty())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Attempt to write empty minmax index {}", backQuote(index_name));
throw Exception(ErrorCodes::LOGICAL_ERROR, "Attempt to write empty vector similarity index {}", backQuote(index_name));
writeIntBinary(FILE_FORMAT_VERSION, ostr);

View File

@ -40,10 +40,12 @@ void extractReferenceVectorFromLiteral(std::vector<Float64> & reference_vector,
}
}
VectorSimilarityCondition::Info::DistanceFunction stringToDistanceFunction(std::string_view distance_function)
VectorSimilarityCondition::Info::DistanceFunction stringToDistanceFunction(const String & distance_function)
{
if (distance_function == "L2Distance")
return VectorSimilarityCondition::Info::DistanceFunction::L2;
else if (distance_function == "cosineDistance")
return VectorSimilarityCondition::Info::DistanceFunction::Cosine;
else
return VectorSimilarityCondition::Info::DistanceFunction::Unknown;
}
@ -57,7 +59,7 @@ VectorSimilarityCondition::VectorSimilarityCondition(const SelectQueryInfo & que
, index_is_useful(checkQueryStructure(query_info))
{}
bool VectorSimilarityCondition::alwaysUnknownOrTrue(String distance_function) const
bool VectorSimilarityCondition::alwaysUnknownOrTrue(const String & distance_function) const
{
if (!index_is_useful)
return true; /// query isn't supported

View File

@ -57,7 +57,8 @@ public:
enum class DistanceFunction : uint8_t
{
Unknown,
L2
L2,
Cosine
};
std::vector<Float64> reference_vector;
@ -68,7 +69,7 @@ public:
};
/// Returns false if the query can be sped up by an ANN index, true otherwise.
bool alwaysUnknownOrTrue(String distance_function) const;
bool alwaysUnknownOrTrue(const String & distance_function) const;
std::vector<Float64> getReferenceVector() const;
size_t getDimensions() const;
@ -141,18 +142,12 @@ private:
/// Traverses the AST of ORDERBY section
void traverseOrderByAST(const ASTPtr & node, RPN & rpn);
/// Returns true and stores ANNExpr if the query has valid WHERE section
static bool matchRPNWhere(RPN & rpn, Info & info);
/// Returns true and stores ANNExpr if the query has valid ORDERBY section
static bool matchRPNOrderBy(RPN & rpn, Info & info);
/// Returns true and stores Length if we have valid LIMIT clause in query
static bool matchRPNLimit(RPNElement & rpn, UInt64 & limit);
/// Matches dist function, reference vector, column name
static bool matchMainParts(RPN::iterator & iter, const RPN::iterator & end, Info & info);
/// Gets float or int from AST node
static float getFloatOrIntLiteralOrPanic(const RPN::iterator& iter);

View File

@ -391,17 +391,9 @@ IMergeTreeDataPart::Checksums checkDataPart(
auto file_name = it->name();
if (!data_part_storage.isDirectory(file_name))
{
const bool is_projection_part = data_part->isProjectionPart();
auto remote_path = data_part_storage.getRemotePath(file_name, /* if_exists */is_projection_part);
if (remote_path.empty())
{
chassert(is_projection_part);
throw Exception(
ErrorCodes::BROKEN_PROJECTION,
"Remote path for {} does not exist for projection path. Projection {} is broken",
file_name, data_part->name);
}
cache.removePathIfExists(remote_path, FileCache::getCommonUser().user_id);
auto remote_paths = data_part_storage.getRemotePaths(file_name);
for (const auto & remote_path : remote_paths)
cache.removePathIfExists(remote_path, FileCache::getCommonUser().user_id);
}
}

View File

@ -9,6 +9,7 @@
#include <Storages/ColumnsDescription.h>
#include <Storages/Statistics/ConditionSelectivityEstimator.h>
#include <Storages/Statistics/StatisticsCountMinSketch.h>
#include <Storages/Statistics/StatisticsMinMax.h>
#include <Storages/Statistics/StatisticsTDigest.h>
#include <Storages/Statistics/StatisticsUniq.h>
#include <Storages/StatisticsDescription.h>
@ -101,6 +102,8 @@ Float64 ColumnStatistics::estimateLess(const Field & val) const
{
if (stats.contains(StatisticsType::TDigest))
return stats.at(StatisticsType::TDigest)->estimateLess(val);
if (stats.contains(StatisticsType::MinMax))
return stats.at(StatisticsType::MinMax)->estimateLess(val);
return rows * ConditionSelectivityEstimator::default_cond_range_factor;
}
@ -121,6 +124,14 @@ Float64 ColumnStatistics::estimateEqual(const Field & val) const
if (stats.contains(StatisticsType::CountMinSketch))
return stats.at(StatisticsType::CountMinSketch)->estimateEqual(val);
#endif
if (stats.contains(StatisticsType::Uniq))
{
UInt64 cardinality = stats.at(StatisticsType::Uniq)->estimateCardinality();
if (cardinality == 0 || rows == 0)
return 0;
return 1.0 / cardinality * rows; /// assume uniform distribution
}
return rows * ConditionSelectivityEstimator::default_cond_equal_factor;
}
@ -198,6 +209,9 @@ void MergeTreeStatisticsFactory::registerValidator(StatisticsType stats_type, Va
MergeTreeStatisticsFactory::MergeTreeStatisticsFactory()
{
registerValidator(StatisticsType::MinMax, minMaxStatisticsValidator);
registerCreator(StatisticsType::MinMax, minMaxStatisticsCreator);
registerValidator(StatisticsType::TDigest, tdigestStatisticsValidator);
registerCreator(StatisticsType::TDigest, tdigestStatisticsCreator);
@ -234,7 +248,7 @@ ColumnStatisticsPtr MergeTreeStatisticsFactory::get(const ColumnDescription & co
{
auto it = creators.find(type);
if (it == creators.end())
throw Exception(ErrorCodes::INCORRECT_QUERY, "Unknown statistic type '{}'. Available types: 'tdigest' 'uniq' and 'count_min'", type);
throw Exception(ErrorCodes::INCORRECT_QUERY, "Unknown statistic type '{}'. Available types: 'count_min', 'minmax', 'tdigest' and 'uniq'", type);
auto stat_ptr = (it->second)(desc, column_desc.type);
column_stat->stats[type] = stat_ptr;
}

View File

@ -1,4 +1,3 @@
#include <Storages/Statistics/StatisticsCountMinSketch.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <DataTypes/DataTypeNullable.h>

View File

@ -0,0 +1,86 @@
#include <Storages/Statistics/StatisticsMinMax.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <DataTypes/DataTypeNullable.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <algorithm>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_STATISTICS;
}
StatisticsMinMax::StatisticsMinMax(const SingleStatisticsDescription & description, const DataTypePtr & data_type_)
: IStatistics(description)
, data_type(data_type_)
{
}
void StatisticsMinMax::update(const ColumnPtr & column)
{
for (size_t row = 0; row < column->size(); ++row)
{
if (column->isNullAt(row))
continue;
auto value = column->getFloat64(row);
min = std::min(value, min);
max = std::max(value, max);
}
row_count += column->size();
}
void StatisticsMinMax::serialize(WriteBuffer & buf)
{
writeIntBinary(row_count, buf);
writeFloatBinary(min, buf);
writeFloatBinary(max, buf);
}
void StatisticsMinMax::deserialize(ReadBuffer & buf)
{
readIntBinary(row_count, buf);
readFloatBinary(min, buf);
readFloatBinary(max, buf);
}
Float64 StatisticsMinMax::estimateLess(const Field & val) const
{
if (row_count == 0)
return 0;
auto val_as_float = StatisticsUtils::tryConvertToFloat64(val, data_type);
if (!val_as_float.has_value())
return 0;
if (val_as_float < min)
return 0;
if (val_as_float > max)
return row_count;
if (min == max)
return (val_as_float != max) ? 0 : row_count;
return ((*val_as_float - min) / (max - min)) * row_count;
}
void minMaxStatisticsValidator(const SingleStatisticsDescription & /*description*/, const DataTypePtr & data_type)
{
auto inner_data_type = removeNullable(data_type);
inner_data_type = removeLowCardinalityAndNullable(inner_data_type);
if (!inner_data_type->isValueRepresentedByNumber())
throw Exception(ErrorCodes::ILLEGAL_STATISTICS, "Statistics of type 'minmax' do not support type {}", data_type->getName());
}
StatisticsPtr minMaxStatisticsCreator(const SingleStatisticsDescription & description, const DataTypePtr & data_type)
{
return std::make_shared<StatisticsMinMax>(description, data_type);
}
}

View File

@ -0,0 +1,33 @@
#pragma once
#include <Storages/Statistics/Statistics.h>
#include <DataTypes/IDataType.h>
namespace DB
{
class StatisticsMinMax : public IStatistics
{
public:
StatisticsMinMax(const SingleStatisticsDescription & statistics_description, const DataTypePtr & data_type_);
void update(const ColumnPtr & column) override;
void serialize(WriteBuffer & buf) override;
void deserialize(ReadBuffer & buf) override;
Float64 estimateLess(const Field & val) const override;
private:
Float64 min = std::numeric_limits<Float64>::max();
Float64 max = std::numeric_limits<Float64>::lowest(); /// lowest(), not min(), so that negative values can become the maximum
UInt64 row_count = 0;
DataTypePtr data_type;
};
void minMaxStatisticsValidator(const SingleStatisticsDescription & description, const DataTypePtr & data_type);
StatisticsPtr minMaxStatisticsCreator(const SingleStatisticsDescription & description, const DataTypePtr & data_type);
}

View File

@ -56,7 +56,7 @@ void uniqStatisticsValidator(const SingleStatisticsDescription & /*description*/
{
DataTypePtr inner_data_type = removeNullable(data_type);
inner_data_type = removeLowCardinalityAndNullable(inner_data_type);
if (!inner_data_type->isValueRepresentedByNumber())
if (!inner_data_type->isValueRepresentedByNumber() && !isStringOrFixedString(inner_data_type))
throw Exception(ErrorCodes::ILLEGAL_STATISTICS, "Statistics of type 'uniq' do not support type {}", data_type->getName());
}

View File

@ -50,7 +50,9 @@ static StatisticsType stringToStatisticsType(String type)
return StatisticsType::Uniq;
if (type == "count_min")
return StatisticsType::CountMinSketch;
throw Exception(ErrorCodes::INCORRECT_QUERY, "Unknown statistics type: {}. Supported statistics types are 'tdigest', 'uniq' and 'count_min'.", type);
if (type == "minmax")
return StatisticsType::MinMax;
throw Exception(ErrorCodes::INCORRECT_QUERY, "Unknown statistics type: {}. Supported statistics types are 'count_min', 'minmax', 'tdigest' and 'uniq'.", type);
}
String SingleStatisticsDescription::getTypeName() const
@ -63,8 +65,10 @@ String SingleStatisticsDescription::getTypeName() const
return "Uniq";
case StatisticsType::CountMinSketch:
return "count_min";
case StatisticsType::MinMax:
return "minmax";
default:
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown statistics type: {}. Supported statistics types are 'tdigest', 'uniq' and 'count_min'.", type);
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown statistics type: {}. Supported statistics types are 'count_min', 'minmax', 'tdigest' and 'uniq'.", type);
}
}

View File

@ -14,6 +14,7 @@ enum class StatisticsType : UInt8
TDigest = 0,
Uniq = 1,
CountMinSketch = 2,
MinMax = 3,
Max = 63,
};

View File

@ -2090,9 +2090,22 @@ void StorageMergeTree::replacePartitionFrom(const StoragePtr & source_table, con
ProfileEventsScope profile_events_scope;
MergeTreeData & src_data = checkStructureAndGetMergeTreeData(source_table, source_metadata_snapshot, my_metadata_snapshot);
String partition_id = getPartitionIDFromQuery(partition, local_context);
DataPartsVector src_parts;
String partition_id;
bool is_all = partition->as<ASTPartition>()->all;
if (is_all)
{
if (replace)
throw DB::Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Only support DROP/DETACH/ATTACH PARTITION ALL currently");
src_parts = src_data.getVisibleDataPartsVector(local_context);
}
else
{
partition_id = getPartitionIDFromQuery(partition, local_context);
src_parts = src_data.getVisibleDataPartsVectorInPartition(local_context, partition_id);
}
DataPartsVector src_parts = src_data.getVisibleDataPartsVectorInPartition(local_context, partition_id);
MutableDataPartsVector dst_parts;
std::vector<scope_guard> dst_parts_locks;
@ -2100,6 +2113,9 @@ void StorageMergeTree::replacePartitionFrom(const StoragePtr & source_table, con
for (const DataPartPtr & src_part : src_parts)
{
if (is_all)
partition_id = src_part->partition.getID(src_data);
if (!canReplacePartition(src_part))
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Cannot replace partition '{}' because part '{}' has inconsistent granularity with table",

View File

@ -8033,24 +8033,77 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
/// First argument is true, because we possibly will add new data to current table.
auto lock1 = lockForShare(query_context->getCurrentQueryId(), query_context->getSettingsRef().lock_acquire_timeout);
auto lock2 = source_table->lockForShare(query_context->getCurrentQueryId(), query_context->getSettingsRef().lock_acquire_timeout);
auto storage_settings_ptr = getSettings();
auto source_metadata_snapshot = source_table->getInMemoryMetadataPtr();
auto metadata_snapshot = getInMemoryMetadataPtr();
const auto storage_settings_ptr = getSettings();
const auto source_metadata_snapshot = source_table->getInMemoryMetadataPtr();
const auto metadata_snapshot = getInMemoryMetadataPtr();
const MergeTreeData & src_data = checkStructureAndGetMergeTreeData(source_table, source_metadata_snapshot, metadata_snapshot);
Stopwatch watch;
std::unordered_set<String> partitions;
if (partition->as<ASTPartition>()->all)
{
if (replace)
throw DB::Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Only support DROP/DETACH/ATTACH PARTITION ALL currently");
partitions = src_data.getAllPartitionIds();
}
else
{
partitions = std::unordered_set<String>();
partitions.emplace(getPartitionIDFromQuery(partition, query_context));
}
LOG_INFO(log, "Will try to attach {} partitions", partitions.size());
const Stopwatch watch;
ProfileEventsScope profile_events_scope;
const auto zookeeper = getZooKeeper();
MergeTreeData & src_data = checkStructureAndGetMergeTreeData(source_table, source_metadata_snapshot, metadata_snapshot);
String partition_id = getPartitionIDFromQuery(partition, query_context);
const bool zero_copy_enabled = storage_settings_ptr->allow_remote_fs_zero_copy_replication
|| dynamic_cast<const MergeTreeData *>(source_table.get())->getSettings()->allow_remote_fs_zero_copy_replication;
std::unique_ptr<ReplicatedMergeTreeLogEntryData> entries[partitions.size()];
size_t idx = 0;
for (const auto & partition_id : partitions)
{
entries[idx] = replacePartitionFromImpl(watch,
profile_events_scope,
metadata_snapshot,
src_data,
partition_id,
zookeeper,
replace,
zero_copy_enabled,
storage_settings_ptr->always_use_copy_instead_of_hardlinks,
query_context);
++idx;
}
for (const auto & entry : entries)
waitForLogEntryToBeProcessedIfNecessary(*entry, query_context);
}
std::unique_ptr<ReplicatedMergeTreeLogEntryData> StorageReplicatedMergeTree::replacePartitionFromImpl(
const Stopwatch & watch,
ProfileEventsScope & profile_events_scope,
const StorageMetadataPtr & metadata_snapshot,
const MergeTreeData & src_data,
const String & partition_id,
const ZooKeeperPtr & zookeeper,
bool replace,
const bool & zero_copy_enabled,
const bool & always_use_copy_instead_of_hardlinks,
const ContextPtr & query_context)
{
/// NOTE: Some covered parts may be missing in src_all_parts if corresponding log entries are not executed yet.
DataPartsVector src_all_parts = src_data.getVisibleDataPartsVectorInPartition(query_context, partition_id);
LOG_DEBUG(log, "Cloning {} parts", src_all_parts.size());
static const String TMP_PREFIX = "tmp_replace_from_";
auto zookeeper = getZooKeeper();
std::optional<ZooKeeperMetadataTransaction> txn;
if (auto query_txn = query_context->getZooKeeperMetadataTransaction())
txn.emplace(query_txn->getZooKeeper(),
query_txn->getDatabaseZooKeeperPath(),
query_txn->isInitialQuery(),
query_txn->getTaskZooKeeperPath());
/// Retry if alter_partition_version changes
for (size_t retry = 0; retry < 1000; ++retry)
@ -8136,11 +8189,9 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
UInt64 index = lock->getNumber();
MergeTreePartInfo dst_part_info(partition_id, index, index, src_part->info.level);
bool zero_copy_enabled = storage_settings_ptr->allow_remote_fs_zero_copy_replication
|| dynamic_cast<const MergeTreeData *>(source_table.get())->getSettings()->allow_remote_fs_zero_copy_replication;
IDataPartStorage::ClonePartParams clone_params
{
.copy_instead_of_hardlink = storage_settings_ptr->always_use_copy_instead_of_hardlinks || (zero_copy_enabled && src_part->isStoredOnRemoteDiskWithZeroCopySupport()),
.copy_instead_of_hardlink = always_use_copy_instead_of_hardlinks || (zero_copy_enabled && src_part->isStoredOnRemoteDiskWithZeroCopySupport()),
.metadata_version_to_write = metadata_snapshot->getMetadataVersion()
};
if (replace)
@ -8148,7 +8199,7 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
/// Replace can only work on the same disk
auto [dst_part, part_lock] = cloneAndLoadDataPart(
src_part,
TMP_PREFIX,
TMP_PREFIX_REPLACE_PARTITION_FROM,
dst_part_info,
metadata_snapshot,
clone_params,
@ -8163,7 +8214,7 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
/// Attach can work on another disk
auto [dst_part, part_lock] = cloneAndLoadDataPart(
src_part,
TMP_PREFIX,
TMP_PREFIX_REPLACE_PARTITION_FROM,
dst_part_info,
metadata_snapshot,
clone_params,
@ -8179,15 +8230,15 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
part_checksums.emplace_back(hash_hex);
}
ReplicatedMergeTreeLogEntryData entry;
auto entry = std::make_unique<ReplicatedMergeTreeLogEntryData>();
{
auto src_table_id = src_data.getStorageID();
entry.type = ReplicatedMergeTreeLogEntryData::REPLACE_RANGE;
entry.source_replica = replica_name;
entry.create_time = time(nullptr);
entry.replace_range_entry = std::make_shared<ReplicatedMergeTreeLogEntryData::ReplaceRangeEntry>();
entry->type = ReplicatedMergeTreeLogEntryData::REPLACE_RANGE;
entry->source_replica = replica_name;
entry->create_time = time(nullptr);
entry->replace_range_entry = std::make_shared<ReplicatedMergeTreeLogEntryData::ReplaceRangeEntry>();
auto & entry_replace = *entry.replace_range_entry;
auto & entry_replace = *entry->replace_range_entry;
entry_replace.drop_range_part_name = drop_range_fake_part_name;
entry_replace.from_database = src_table_id.database_name;
entry_replace.from_table = src_table_id.table_name;
@ -8220,7 +8271,7 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
ephemeral_locks[i].getUnlockOp(ops);
}
if (auto txn = query_context->getZooKeeperMetadataTransaction())
if (txn)
txn->moveOpsTo(ops);
delimiting_block_lock->getUnlockOp(ops);
@ -8228,7 +8279,7 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
ops.emplace_back(zkutil::makeSetRequest(alter_partition_version_path, "", alter_partition_version_stat.version));
/// Just update version, because merges assignment relies on it
ops.emplace_back(zkutil::makeSetRequest(fs::path(zookeeper_path) / "log", "", -1));
ops.emplace_back(zkutil::makeCreateRequest(fs::path(zookeeper_path) / "log/log-", entry.toString(), zkutil::CreateMode::PersistentSequential));
ops.emplace_back(zkutil::makeCreateRequest(fs::path(zookeeper_path) / "log/log-", entry->toString(), zkutil::CreateMode::PersistentSequential));
Transaction transaction(*this, NO_TRANSACTION_RAW);
{
@ -8278,14 +8329,11 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
}
String log_znode_path = dynamic_cast<const Coordination::CreateResponse &>(*op_results.back()).path_created;
entry.znode_name = log_znode_path.substr(log_znode_path.find_last_of('/') + 1);
entry->znode_name = log_znode_path.substr(log_znode_path.find_last_of('/') + 1);
for (auto & lock : ephemeral_locks)
lock.assumeUnlocked();
lock2.reset();
lock1.reset();
/// We need to pull the REPLACE_RANGE before cleaning the replaced parts (otherwise CheckThread may decide that parts are lost)
queue.pullLogsToQueue(getZooKeeperAndAssertNotReadonly(), {}, ReplicatedMergeTreeQueue::SYNC);
// No need to block operations further, especially that in case we have to wait for mutation to finish, the intent would block
@ -8294,10 +8342,7 @@ void StorageReplicatedMergeTree::replacePartitionFrom(
parts_holder.clear();
cleanup_thread.wakeup();
waitForLogEntryToBeProcessedIfNecessary(entry, query_context);
return;
return entry;
}
throw Exception(

View File

@ -37,6 +37,7 @@
#include <base/defines.h>
#include <Core/BackgroundSchedulePool.h>
#include <QueryPipeline/Pipe.h>
#include <Common/ProfileEventsScope.h>
#include <Storages/MergeTree/BackgroundJobsAssignee.h>
#include <Parsers/SyncReplicaMode.h>
@ -1013,6 +1014,18 @@ private:
DataPartsVector::const_iterator it;
};
const String TMP_PREFIX_REPLACE_PARTITION_FROM = "tmp_replace_from_";
std::unique_ptr<ReplicatedMergeTreeLogEntryData> replacePartitionFromImpl(
const Stopwatch & watch,
ProfileEventsScope & profile_events_scope,
const StorageMetadataPtr & metadata_snapshot,
const MergeTreeData & src_data,
const String & partition_id,
const zkutil::ZooKeeperPtr & zookeeper,
bool replace,
const bool & zero_copy_enabled,
const bool & always_use_copy_instead_of_hardlinks,
const ContextPtr & query_context);
};
String getPartNamePossiblyFake(MergeTreeDataFormatVersion format_version, const MergeTreePartInfo & part_info);

View File

@ -2112,6 +2112,7 @@ class ClickHouseCluster:
self.base_cmd + ["up", "--force-recreate", "--no-deps", "-d", node.name]
)
node.ip_address = self.get_instance_ip(node.name)
node.ipv6_address = self.get_instance_global_ipv6(node.name)
node.client = Client(node.ip_address, command=self.client_bin_path)
logging.info("Restart node with ip change")
@ -3182,6 +3183,7 @@ class ClickHouseCluster:
for instance in self.instances.values():
instance.docker_client = self.docker_client
instance.ip_address = self.get_instance_ip(instance.name)
instance.ipv6_address = self.get_instance_global_ipv6(instance.name)
logging.debug(
f"Waiting for ClickHouse start in {instance.name}, ip: {instance.ip_address}..."

View File

@ -3,6 +3,7 @@ import subprocess
import time
import logging
import docker
import ipaddress
class PartitionManager:
@ -26,25 +27,76 @@ class PartitionManager:
self._check_instance(instance)
self._add_rule(
{"source": instance.ip_address, "destination_port": 2181, "action": action}
{
"source": instance.ip_address,
"destination_port": 2181,
"action": action,
}
)
self._add_rule(
{"destination": instance.ip_address, "source_port": 2181, "action": action}
{
"destination": instance.ip_address,
"source_port": 2181,
"action": action,
}
)
if instance.ipv6_address:
self._add_rule(
{
"source": instance.ipv6_address,
"destination_port": 2181,
"action": action,
}
)
self._add_rule(
{
"destination": instance.ipv6_address,
"source_port": 2181,
"action": action,
}
)
def dump_rules(self):
return _NetworkManager.get().dump_rules()
v4 = _NetworkManager.get().dump_rules()
v6 = _NetworkManager.get().dump_v6_rules()
return v4 + v6
def restore_instance_zk_connections(self, instance, action="DROP"):
self._check_instance(instance)
self._delete_rule(
{"source": instance.ip_address, "destination_port": 2181, "action": action}
{
"source": instance.ip_address,
"destination_port": 2181,
"action": action,
}
)
self._delete_rule(
{"destination": instance.ip_address, "source_port": 2181, "action": action}
{
"destination": instance.ip_address,
"source_port": 2181,
"action": action,
}
)
if instance.ipv6_address:
self._delete_rule(
{
"source": instance.ipv6_address,
"destination_port": 2181,
"action": action,
}
)
self._delete_rule(
{
"destination": instance.ipv6_address,
"source_port": 2181,
"action": action,
}
)
def partition_instances(self, left, right, port=None, action="DROP"):
self._check_instance(left)
self._check_instance(right)
@ -59,16 +111,34 @@ class PartitionManager:
rule["destination_port"] = port
return rule
def create_rule_v6(src, dst):
rule = {
"source": src.ipv6_address,
"destination": dst.ipv6_address,
"action": action,
}
if port is not None:
rule["destination_port"] = port
return rule
self._add_rule(create_rule(left, right))
self._add_rule(create_rule(right, left))
if left.ipv6_address and right.ipv6_address:
self._add_rule(create_rule_v6(left, right))
self._add_rule(create_rule_v6(right, left))
def add_network_delay(self, instance, delay_ms):
self._add_tc_netem_delay(instance, delay_ms)
def heal_all(self):
while self._iptables_rules:
rule = self._iptables_rules.pop()
_NetworkManager.get().delete_iptables_rule(**rule)
if self._is_ipv6_rule(rule):
_NetworkManager.get().delete_ip6tables_rule(**rule)
else:
_NetworkManager.get().delete_iptables_rule(**rule)
while self._netem_delayed_instances:
instance = self._netem_delayed_instances.pop()
@ -90,12 +160,27 @@ class PartitionManager:
if instance.ip_address is None:
raise Exception("Instance + " + instance.name + " is not launched!")
@staticmethod
def _is_ipv6_rule(rule):
if rule.get("source"):
return ipaddress.ip_address(rule["source"]).version == 6
if rule.get("destination"):
return ipaddress.ip_address(rule["destination"]).version == 6
return False
def _add_rule(self, rule):
_NetworkManager.get().add_iptables_rule(**rule)
if self._is_ipv6_rule(rule):
_NetworkManager.get().add_ip6tables_rule(**rule)
else:
_NetworkManager.get().add_iptables_rule(**rule)
self._iptables_rules.append(rule)
def _delete_rule(self, rule):
_NetworkManager.get().delete_iptables_rule(**rule)
if self._is_ipv6_rule(rule):
_NetworkManager.get().delete_ip6tables_rule(**rule)
else:
_NetworkManager.get().delete_iptables_rule(**rule)
self._iptables_rules.remove(rule)
def _add_tc_netem_delay(self, instance, delay_ms):
@ -150,35 +235,65 @@ class _NetworkManager:
cls._instance = cls(**kwargs)
return cls._instance
def setup_ip6tables_docker_user_chain(self):
_rules = subprocess.check_output(f"ip6tables-save", shell=True)
if "DOCKER-USER" in _rules.decode("utf-8"):
return
setup_cmds = [
["ip6tables", "--wait", "-N", "DOCKER-USER"],
["ip6tables", "--wait", "-I", "FORWARD", "-j", "DOCKER-USER"],
["ip6tables", "--wait", "-A", "DOCKER-USER", "-j", "RETURN"],
]
for cmd in setup_cmds:
self._exec_run(cmd, privileged=True)
def add_iptables_rule(self, **kwargs):
cmd = ["iptables", "--wait", "-I", "DOCKER-USER", "1"]
cmd.extend(self._iptables_cmd_suffix(**kwargs))
self._exec_run(cmd, privileged=True)
def add_ip6tables_rule(self, **kwargs):
self.setup_ip6tables_docker_user_chain()
cmd = ["ip6tables", "--wait", "-I", "DOCKER-USER", "1"]
cmd.extend(self._iptables_cmd_suffix(**kwargs))
self._exec_run(cmd, privileged=True)
def delete_iptables_rule(self, **kwargs):
cmd = ["iptables", "--wait", "-D", "DOCKER-USER"]
cmd.extend(self._iptables_cmd_suffix(**kwargs))
self._exec_run(cmd, privileged=True)
def delete_ip6tables_rule(self, **kwargs):
cmd = ["ip6tables", "--wait", "-D", "DOCKER-USER"]
cmd.extend(self._iptables_cmd_suffix(**kwargs))
self._exec_run(cmd, privileged=True)
def dump_rules(self):
cmd = ["iptables", "-L", "DOCKER-USER"]
return self._exec_run(cmd, privileged=True)
def dump_v6_rules(self):
cmd = ["ip6tables", "-L", "DOCKER-USER"]
return self._exec_run(cmd, privileged=True)
@staticmethod
def clean_all_user_iptables_rules():
for i in range(1000):
iptables_iter = i
# when rules will be empty, it will return error
res = subprocess.run("iptables --wait -D DOCKER-USER 1", shell=True)
for iptables in ("iptables", "ip6tables"):
for i in range(1000):
iptables_iter = i
# when the rules are empty, it will return an error
res = subprocess.run(f"{iptables} --wait -D DOCKER-USER 1", shell=True)
if res.returncode != 0:
logging.info(
"All iptables rules cleared, "
+ str(iptables_iter)
+ " iterations, last error: "
+ str(res.stderr)
)
return
if res.returncode != 0:
logging.info(
f"All {iptables} rules cleared, "
+ str(iptables_iter)
+ " iterations, last error: "
+ str(res.stderr)
)
break
@staticmethod
def _iptables_cmd_suffix(

View File

@ -343,6 +343,13 @@ def test_increment_backup_without_changes():
def test_incremental_backup_overflow():
if (
instance.is_built_with_thread_sanitizer()
or instance.is_built_with_memory_sanitizer()
or instance.is_built_with_address_sanitizer()
):
pytest.skip("The test is slow in builds with sanitizer")
backup_name = new_backup_name()
incremental_backup_name = new_backup_name()

View File

@ -154,6 +154,13 @@ def test_aggregate_states(start_cluster):
def test_string_functions(start_cluster):
if (
upstream.is_built_with_thread_sanitizer()
or upstream.is_built_with_memory_sanitizer()
or upstream.is_built_with_address_sanitizer()
):
pytest.skip("The test is slow in builds with sanitizer")
functions = backward.query(
"""
SELECT if(NOT empty(alias_to), alias_to, name)

View File

@ -5,6 +5,7 @@
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</s3_snapshot>
<use_cluster>false</use_cluster>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>

View File

@ -5,6 +5,7 @@
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</s3_snapshot>
<use_cluster>false</use_cluster>
<tcp_port>9181</tcp_port>
<server_id>2</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>

View File

@ -5,6 +5,7 @@
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</s3_snapshot>
<use_cluster>false</use_cluster>
<tcp_port>9181</tcp_port>
<server_id>3</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>

View File

@ -2,6 +2,9 @@ import pytest
from helpers.cluster import ClickHouseCluster
from time import sleep
from retry import retry
from multiprocessing.dummy import Pool
import helpers.keeper_utils as keeper_utils
from minio.deleteobjects import DeleteObject
from kazoo.client import KazooClient
@ -75,7 +78,18 @@ def wait_node(node):
raise Exception("Can't wait node", node.name, "to become ready")
def delete_keeper_snapshots_logs(nodex):
nodex.exec_in_container(
[
"bash",
"-c",
"rm -rf /var/lib/clickhouse/coordination/log /var/lib/clickhouse/coordination/snapshots",
]
)
def test_s3_upload(started_cluster):
node1_zk = get_fake_zk(node1.name)
# we defined in configs snapshot_distance as 50
@ -89,6 +103,11 @@ def test_s3_upload(started_cluster):
for obj in list(cluster.minio_client.list_objects("snapshots"))
]
def delete_s3_snapshots():
snapshots = cluster.minio_client.list_objects("snapshots")
for s in snapshots:
cluster.minio_client.remove_object("snapshots", s.object_name)
# Keeper sends snapshots asynchronously, hence we need to retry.
@retry(AssertionError, tries=10, delay=2)
def _check_snapshots():
@ -125,3 +144,26 @@ def test_s3_upload(started_cluster):
)
destroy_zk_client(node2_zk)
node2.stop_clickhouse()
delete_keeper_snapshots_logs(node2)
node3.stop_clickhouse()
delete_keeper_snapshots_logs(node3)
delete_keeper_snapshots_logs(node1)
p = Pool(3)
waiters = []
def start_clickhouse(node):
node.start_clickhouse()
waiters.append(p.apply_async(start_clickhouse, args=(node1,)))
waiters.append(p.apply_async(start_clickhouse, args=(node2,)))
waiters.append(p.apply_async(start_clickhouse, args=(node3,)))
delete_s3_snapshots() # for next iteration
for waiter in waiters:
waiter.wait()
keeper_utils.wait_until_connected(cluster, node1)
keeper_utils.wait_until_connected(cluster, node2)
keeper_utils.wait_until_connected(cluster, node3)

View File

@ -629,5 +629,6 @@ def test_roles_cache():
check()
instance.query("DROP USER " + ", ".join(users))
instance.query("DROP ROLE " + ", ".join(roles))
if roles:
instance.query("DROP ROLE " + ", ".join(roles))
instance.query("DROP TABLE tbl")

View File

@ -10,11 +10,12 @@ REPLACE recursive
4 8
1
ATTACH FROM
5 8
6 8
10 12
OPTIMIZE
5 8 5
5 8 3
10 12 9
10 12 5
After restart
5 8
10 12
DETACH+ATTACH PARTITION
3 4
7 7

View File

@ -53,12 +53,16 @@ DROP TABLE src;
CREATE TABLE src (p UInt64, k String, d UInt64) ENGINE = MergeTree PARTITION BY p ORDER BY k;
INSERT INTO src VALUES (1, '0', 1);
INSERT INTO src VALUES (1, '1', 1);
INSERT INTO src VALUES (2, '2', 1);
INSERT INTO src VALUES (3, '3', 1);
SYSTEM STOP MERGES dst;
INSERT INTO dst VALUES (1, '1', 2);
INSERT INTO dst VALUES (1, '1', 2), (1, '2', 0);
ALTER TABLE dst ATTACH PARTITION 1 FROM src;
SELECT count(), sum(d) FROM dst;
ALTER TABLE dst ATTACH PARTITION ALL FROM src;
SELECT count(), sum(d) FROM dst;
SELECT 'OPTIMIZE';
SELECT count(), sum(d), uniqExact(_part) FROM dst;

View File

@ -16,6 +16,7 @@ REPLACE recursive
ATTACH FROM
5 8
5 8
7 12
REPLACE with fetch
4 6
4 6

View File

@ -1,5 +1,5 @@
#!/usr/bin/env bash
# Tags: zookeeper, no-object-storage
# Tags: zookeeper, no-object-storage, long
# Because REPLACE PARTITION does not force immediate removal of replaced data parts from the local filesystem
# (it tries to do it as quickly as possible, but it is still performed asynchronously in a separate thread)
@ -82,6 +82,8 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE src;"
$CLICKHOUSE_CLIENT --query="CREATE TABLE src (p UInt64, k String, d UInt64) ENGINE = MergeTree PARTITION BY p ORDER BY k;"
$CLICKHOUSE_CLIENT --query="INSERT INTO src VALUES (1, '0', 1);"
$CLICKHOUSE_CLIENT --query="INSERT INTO src VALUES (1, '1', 1);"
$CLICKHOUSE_CLIENT --query="INSERT INTO src VALUES (3, '1', 2);"
$CLICKHOUSE_CLIENT --query="INSERT INTO src VALUES (4, '1', 2);"
$CLICKHOUSE_CLIENT --query="INSERT INTO dst_r2 VALUES (1, '1', 2);"
query_with_retry "ALTER TABLE dst_r2 ATTACH PARTITION 1 FROM src;"
@ -90,6 +92,13 @@ $CLICKHOUSE_CLIENT --query="SYSTEM SYNC REPLICA dst_r1;"
$CLICKHOUSE_CLIENT --query="SELECT count(), sum(d) FROM dst_r1;"
$CLICKHOUSE_CLIENT --query="SELECT count(), sum(d) FROM dst_r2;"
query_with_retry "ALTER TABLE dst_r2 ATTACH PARTITION ALL FROM src;"
$CLICKHOUSE_CLIENT --query="SYSTEM SYNC REPLICA dst_r2;"
$CLICKHOUSE_CLIENT --query="SELECT count(), sum(d) FROM dst_r2;"
query_with_retry "ALTER TABLE dst_r2 DROP PARTITION 3;"
$CLICKHOUSE_CLIENT --query="SYSTEM SYNC REPLICA dst_r2;"
query_with_retry "ALTER TABLE dst_r2 DROP PARTITION 4;"
$CLICKHOUSE_CLIENT --query="SYSTEM SYNC REPLICA dst_r2;"
$CLICKHOUSE_CLIENT --query="SELECT 'REPLACE with fetch';"
$CLICKHOUSE_CLIENT --query="DROP TABLE src;"

View File

@ -1,4 +1,4 @@
-- Tags: no-fasttest
-- Tags: no-fasttest, no-debug, no-tsan, no-msan, no-asan
SET min_execution_speed = 100000000000, timeout_before_checking_execution_speed = 0;
SELECT count() FROM system.numbers; -- { serverError TOO_SLOW }

View File

@ -0,0 +1,32 @@
-- Tags: no-fasttest, no-ordinary-database
-- Tests that CREATE TABLE and ADD INDEX respect setting 'allow_experimental_vector_similarity_index'.
DROP TABLE IF EXISTS tab;
-- Test CREATE TABLE
SET allow_experimental_vector_similarity_index = 0;
CREATE TABLE tab (id UInt32, vec Array(Float32), INDEX idx vec TYPE vector_similarity('hnsw', 'L2Distance')) ENGINE = MergeTree ORDER BY tuple(); -- { serverError SUPPORT_IS_DISABLED }
SET allow_experimental_vector_similarity_index = 1;
CREATE TABLE tab (id UInt32, vec Array(Float32), INDEX idx vec TYPE vector_similarity('hnsw', 'L2Distance')) ENGINE = MergeTree ORDER BY tuple();
DROP TABLE tab;
-- Test ADD INDEX
CREATE TABLE tab (id UInt32, vec Array(Float32)) ENGINE = MergeTree ORDER BY tuple();
SET allow_experimental_vector_similarity_index = 0;
ALTER TABLE tab ADD INDEX idx vec TYPE vector_similarity('hnsw', 'L2Distance'); -- { serverError SUPPORT_IS_DISABLED }
SET allow_experimental_vector_similarity_index = 1;
ALTER TABLE tab ADD INDEX idx vec TYPE vector_similarity('hnsw', 'L2Distance');
-- Other index DDL must work regardless of the setting
SET allow_experimental_vector_similarity_index = 0;
ALTER TABLE tab MATERIALIZE INDEX idx;
-- ALTER TABLE tab CLEAR INDEX idx; -- <-- Should work but doesn't without the setting enabled. Unexpected but not terrible.
ALTER TABLE tab DROP INDEX idx;
DROP TABLE tab;

View File

@ -41,6 +41,21 @@ Special cases
6 [1,9.3] 0.005731362878640178
1 [2,3.2] 0.15200169244542905
7 [5.5,4.7] 0.3503476876550442
Expression (Projection)
Limit (preliminary LIMIT (without OFFSET))
Sorting (Sorting for ORDER BY)
Expression (Before ORDER BY)
ReadFromMergeTree (default.tab)
Indexes:
PrimaryKey
Condition: true
Parts: 1/1
Granules: 4/4
Skip
Name: idx
Description: vector_similarity GRANULARITY 2
Parts: 1/1
Granules: 2/4
-- Setting "max_limit_for_ann_queries"
Expression (Projection)
Limit (preliminary LIMIT (without OFFSET))

View File

@ -63,6 +63,13 @@ FROM tab
ORDER BY cosineDistance(vec, reference_vec)
LIMIT 3;
EXPLAIN indexes = 1
WITH [0.0, 2.0] AS reference_vec
SELECT id, vec, cosineDistance(vec, reference_vec)
FROM tab
ORDER BY cosineDistance(vec, reference_vec)
LIMIT 3;
SELECT '-- Setting "max_limit_for_ann_queries"';
EXPLAIN indexes=1
WITH [0.0, 2.0] as reference_vec

View File

@ -0,0 +1,5 @@
Test create statistics:
CREATE TABLE default.tab\n(\n `a` LowCardinality(Int64) STATISTICS(tdigest, uniq, count_min, minmax),\n `b` LowCardinality(Nullable(String)) STATISTICS(uniq, count_min),\n `c` LowCardinality(Nullable(Int64)) STATISTICS(tdigest, uniq, count_min, minmax),\n `d` DateTime STATISTICS(tdigest, uniq, count_min, minmax),\n `pk` String\n)\nENGINE = MergeTree\nORDER BY pk\nSETTINGS index_granularity = 8192
Test materialize and drop statistics:
CREATE TABLE default.tab\n(\n `a` LowCardinality(Int64),\n `b` LowCardinality(Nullable(String)) STATISTICS(uniq, count_min),\n `c` LowCardinality(Nullable(Int64)),\n `d` DateTime,\n `pk` String\n)\nENGINE = MergeTree\nORDER BY pk\nSETTINGS index_granularity = 8192
CREATE TABLE default.tab\n(\n `a` LowCardinality(Int64),\n `b` LowCardinality(Nullable(String)),\n `c` LowCardinality(Nullable(Int64)),\n `d` DateTime,\n `pk` String\n)\nENGINE = MergeTree\nORDER BY pk\nSETTINGS index_granularity = 8192

View File

@ -0,0 +1,35 @@
-- Tags: no-fasttest
DROP TABLE IF EXISTS tab SYNC;
SET allow_experimental_statistics = 1;
SET allow_statistics_optimize = 1;
SET allow_suspicious_low_cardinality_types=1;
SET mutations_sync = 2;
SELECT 'Test create statistics:';
CREATE TABLE tab
(
a LowCardinality(Int64) STATISTICS(count_min, minmax, tdigest, uniq),
b LowCardinality(Nullable(String)) STATISTICS(count_min, uniq),
c LowCardinality(Nullable(Int64)) STATISTICS(count_min, minmax, tdigest, uniq),
d DateTime STATISTICS(count_min, minmax, tdigest, uniq),
pk String,
) Engine = MergeTree() ORDER BY pk;
INSERT INTO tab select number, number, number, toDateTime(number), generateUUIDv4() FROM system.numbers LIMIT 10000;
SHOW CREATE TABLE tab;
SELECT 'Test materialize and drop statistics:';
ALTER TABLE tab DROP STATISTICS a, b, c, d;
ALTER TABLE tab ADD STATISTICS b TYPE count_min, uniq;
ALTER TABLE tab MATERIALIZE STATISTICS b;
SHOW CREATE TABLE tab;
ALTER TABLE tab DROP STATISTICS b;
SHOW CREATE TABLE tab;
DROP TABLE IF EXISTS tab SYNC;

View File

@ -7,6 +7,7 @@ SET mutations_sync = 1;
DROP TABLE IF EXISTS tab;
SET allow_experimental_statistics = 0;
-- Error case: Can't create statistics when allow_experimental_statistics = 0
CREATE TABLE tab (col Float64 STATISTICS(tdigest)) Engine = MergeTree() ORDER BY tuple(); -- { serverError INCORRECT_QUERY }
@ -46,7 +47,7 @@ CREATE TABLE tab (col Map(UInt64, UInt64) STATISTICS(tdigest)) Engine = MergeTre
CREATE TABLE tab (col UUID STATISTICS(tdigest)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col IPv6 STATISTICS(tdigest)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
-- uniq requires data_type.isValueRepresentedByInteger
-- uniq requires data_type.isValueRepresentedByInteger or (Fixed)String
-- These types work:
CREATE TABLE tab (col UInt8 STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col UInt256 STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
@ -61,9 +62,9 @@ CREATE TABLE tab (col IPv4 STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple
CREATE TABLE tab (col Nullable(UInt8) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col LowCardinality(UInt8) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col LowCardinality(Nullable(UInt8)) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col String STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col FixedString(1) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
-- These types don't work:
CREATE TABLE tab (col String STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col FixedString(1) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col Array(Float64) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col Tuple(Float64, Float64) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col Map(UInt64, UInt64) STATISTICS(uniq)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
@ -94,6 +95,30 @@ CREATE TABLE tab (col Map(UInt64, UInt64) STATISTICS(count_min)) Engine = MergeT
CREATE TABLE tab (col UUID STATISTICS(count_min)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col IPv6 STATISTICS(count_min)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
-- minmax requires data_type.isValueRepresentedByInteger
-- These types work:
CREATE TABLE tab (col UInt8 STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col UInt256 STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col Float32 STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col Decimal32(3) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col Date STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col Date32 STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col DateTime STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col DateTime64 STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col Enum('hello', 'world') STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col IPv4 STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col Nullable(UInt8) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col LowCardinality(UInt8) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
CREATE TABLE tab (col LowCardinality(Nullable(UInt8)) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); DROP TABLE tab;
-- These types don't work:
CREATE TABLE tab (col String STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col FixedString(1) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col Array(Float64) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col Tuple(Float64, Float64) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col Map(UInt64, UInt64) STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col UUID STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
CREATE TABLE tab (col IPv6 STATISTICS(minmax)) Engine = MergeTree() ORDER BY tuple(); -- { serverError ILLEGAL_STATISTICS }
-- CREATE TABLE was easy, ALTER is more fun
CREATE TABLE tab
@ -173,6 +198,13 @@ ALTER TABLE tab MODIFY STATISTICS f64 TYPE count_min; ALTER TABLE tab DROP STATI
-- Doesn't work:
ALTER TABLE tab ADD STATISTICS a TYPE count_min; -- { serverError ILLEGAL_STATISTICS }
ALTER TABLE tab MODIFY STATISTICS a TYPE count_min; -- { serverError ILLEGAL_STATISTICS }
-- minmax
-- Works:
ALTER TABLE tab ADD STATISTICS f64 TYPE minmax; ALTER TABLE tab DROP STATISTICS f64;
ALTER TABLE tab MODIFY STATISTICS f64 TYPE minmax; ALTER TABLE tab DROP STATISTICS f64;
-- Doesn't work:
ALTER TABLE tab ADD STATISTICS a TYPE minmax; -- { serverError ILLEGAL_STATISTICS }
ALTER TABLE tab MODIFY STATISTICS a TYPE minmax; -- { serverError ILLEGAL_STATISTICS }
-- Any data type changes on columns with statistics are disallowed, for simplicity even if the new data type is compatible with all existing
-- statistics objects (e.g. tdigest can be created on Float64 and UInt64)
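-- A hypothetical sketch, not part of this patch, illustrating the restriction described above
-- (table name tab2 is illustrative only; the exact error is not asserted here):
-- CREATE TABLE tab2 (col Float64 STATISTICS(tdigest)) Engine = MergeTree() ORDER BY tuple();
-- ALTER TABLE tab2 MODIFY COLUMN col UInt64; -- expected to be rejected while statistics exist on col
-- ALTER TABLE tab2 DROP STATISTICS col;      -- after dropping the statistics, the type change should be possible
-- DROP TABLE tab2;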

View File

@ -3,10 +3,13 @@ u64 and =
10
10
10
10
0
0
0
0
0
10
10
10
10
@ -16,10 +19,13 @@ u64 and <
70
70
70
70
80
80
80
80
80
70
70
70
70
@ -29,6 +35,8 @@ f64 and =
10
10
10
10
0
0
0
0
@ -37,6 +45,8 @@ f64 and =
10
10
10
10
0
0
0
0
@ -46,6 +56,8 @@ f64 and <
70
70
70
70
80
80
80
80
@ -54,6 +66,8 @@ f64 and <
70
70
70
70
80
80
80
80
@ -63,6 +77,8 @@ dt and =
0
0
0
0
10
10
10
10
@ -72,6 +88,8 @@ dt and <
10000
10000
10000
10000
70
70
70
70
@ -89,6 +107,10 @@ b and =
5000
5000
5000
5000
5000
5000
0
0
0
0
@ -96,3 +118,4 @@ b and =
s and =
10
10
10

View File

@ -12,46 +12,56 @@ CREATE TABLE tab
(
u64 UInt64,
u64_tdigest UInt64 STATISTICS(tdigest),
u64_minmax UInt64 STATISTICS(minmax),
u64_count_min UInt64 STATISTICS(count_min),
u64_uniq UInt64 STATISTICS(uniq),
f64 Float64,
f64_tdigest Float64 STATISTICS(tdigest),
f64_minmax Float64 STATISTICS(minmax),
f64_count_min Float64 STATISTICS(count_min),
f64_uniq Float64 STATISTICS(uniq),
dt DateTime,
dt_tdigest DateTime STATISTICS(tdigest),
dt_minmax DateTime STATISTICS(minmax),
dt_count_min DateTime STATISTICS(count_min),
dt_uniq DateTime STATISTICS(uniq),
b Bool,
b_tdigest Bool STATISTICS(tdigest),
b_minmax Bool STATISTICS(minmax),
b_count_min Bool STATISTICS(count_min),
b_uniq Bool STATISTICS(uniq),
s String,
-- s_tdigest String STATISTICS(tdigest), -- not supported by tdigest
s_count_min String STATISTICS(count_min)
-- s_uniq String STATISTICS(uniq), -- not supported by uniq
-- s_minmax String STATISTICS(minmax), -- not supported by minmax
s_count_min String STATISTICS(count_min),
s_uniq String STATISTICS(uniq)
) Engine = MergeTree() ORDER BY tuple()
SETTINGS min_bytes_for_wide_part = 0;
INSERT INTO tab
-- SELECT number % 10000, number % 1000, -(number % 100) FROM system.numbers LIMIT 10000;
SELECT number % 1000,
SELECT number % 1000, -- u64
number % 1000,
number % 1000,
number % 1000,
number % 1000,
number % 1000, -- f64
number % 1000,
number % 1000,
number % 1000,
number % 1000,
number % 1000, -- dt
number % 1000,
number % 1000,
number % 1000,
number % 1000,
number % 2, -- b
number % 2,
number % 2,
number % 2,
number % 2,
toString(number % 1000),
toString(number % 1000),
toString(number % 1000)
FROM system.numbers LIMIT 10000;
@ -61,21 +71,25 @@ SELECT 'u64 and =';
SELECT count(*) FROM tab WHERE u64 = 7;
SELECT count(*) FROM tab WHERE u64_tdigest = 7;
SELECT count(*) FROM tab WHERE u64_minmax = 7;
SELECT count(*) FROM tab WHERE u64_count_min = 7;
SELECT count(*) FROM tab WHERE u64_uniq = 7;
SELECT count(*) FROM tab WHERE u64 = 7.7;
SELECT count(*) FROM tab WHERE u64_tdigest = 7.7;
SELECT count(*) FROM tab WHERE u64_minmax = 7.7;
SELECT count(*) FROM tab WHERE u64_count_min = 7.7;
SELECT count(*) FROM tab WHERE u64_uniq = 7.7;
SELECT count(*) FROM tab WHERE u64 = '7';
SELECT count(*) FROM tab WHERE u64_tdigest = '7';
SELECT count(*) FROM tab WHERE u64_minmax = '7';
SELECT count(*) FROM tab WHERE u64_count_min = '7';
SELECT count(*) FROM tab WHERE u64_uniq = '7';
SELECT count(*) FROM tab WHERE u64 = '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_tdigest = '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_minmax = '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_count_min = '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_uniq = '7.7'; -- { serverError TYPE_MISMATCH }
@ -83,21 +97,25 @@ SELECT 'u64 and <';
SELECT count(*) FROM tab WHERE u64 < 7;
SELECT count(*) FROM tab WHERE u64_tdigest < 7;
SELECT count(*) FROM tab WHERE u64_minmax < 7;
SELECT count(*) FROM tab WHERE u64_count_min < 7;
SELECT count(*) FROM tab WHERE u64_uniq < 7;
SELECT count(*) FROM tab WHERE u64 < 7.7;
SELECT count(*) FROM tab WHERE u64_tdigest < 7.7;
SELECT count(*) FROM tab WHERE u64_minmax < 7.7;
SELECT count(*) FROM tab WHERE u64_count_min < 7.7;
SELECT count(*) FROM tab WHERE u64_uniq < 7.7;
SELECT count(*) FROM tab WHERE u64 < '7';
SELECT count(*) FROM tab WHERE u64_tdigest < '7';
SELECT count(*) FROM tab WHERE u64_minmax < '7';
SELECT count(*) FROM tab WHERE u64_count_min < '7';
SELECT count(*) FROM tab WHERE u64_uniq < '7';
SELECT count(*) FROM tab WHERE u64 < '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_tdigest < '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_minmax < '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_count_min < '7.7'; -- { serverError TYPE_MISMATCH }
SELECT count(*) FROM tab WHERE u64_uniq < '7.7'; -- { serverError TYPE_MISMATCH }
@ -107,21 +125,25 @@ SELECT 'f64 and =';
SELECT count(*) FROM tab WHERE f64 = 7;
SELECT count(*) FROM tab WHERE f64_tdigest = 7;
SELECT count(*) FROM tab WHERE f64_minmax = 7;
SELECT count(*) FROM tab WHERE f64_count_min = 7;
SELECT count(*) FROM tab WHERE f64_uniq = 7;
SELECT count(*) FROM tab WHERE f64 = 7.7;
SELECT count(*) FROM tab WHERE f64_tdigest = 7.7;
SELECT count(*) FROM tab WHERE f64_minmax = 7.7;
SELECT count(*) FROM tab WHERE f64_count_min = 7.7;
SELECT count(*) FROM tab WHERE f64_uniq = 7.7;
SELECT count(*) FROM tab WHERE f64 = '7';
SELECT count(*) FROM tab WHERE f64_tdigest = '7';
SELECT count(*) FROM tab WHERE f64_minmax = '7';
SELECT count(*) FROM tab WHERE f64_count_min = '7';
SELECT count(*) FROM tab WHERE f64_uniq = '7';
SELECT count(*) FROM tab WHERE f64 = '7.7';
SELECT count(*) FROM tab WHERE f64_tdigest = '7.7';
SELECT count(*) FROM tab WHERE f64_minmax = '7.7';
SELECT count(*) FROM tab WHERE f64_count_min = '7.7';
SELECT count(*) FROM tab WHERE f64_uniq = '7.7';
@ -129,21 +151,25 @@ SELECT 'f64 and <';
SELECT count(*) FROM tab WHERE f64 < 7;
SELECT count(*) FROM tab WHERE f64_tdigest < 7;
SELECT count(*) FROM tab WHERE f64_minmax < 7;
SELECT count(*) FROM tab WHERE f64_count_min < 7;
SELECT count(*) FROM tab WHERE f64_uniq < 7;
SELECT count(*) FROM tab WHERE f64 < 7.7;
SELECT count(*) FROM tab WHERE f64_tdigest < 7.7;
SELECT count(*) FROM tab WHERE f64_minmax < 7.7;
SELECT count(*) FROM tab WHERE f64_count_min < 7.7;
SELECT count(*) FROM tab WHERE f64_uniq < 7.7;
SELECT count(*) FROM tab WHERE f64 < '7';
SELECT count(*) FROM tab WHERE f64_tdigest < '7';
SELECT count(*) FROM tab WHERE f64_minmax < '7';
SELECT count(*) FROM tab WHERE f64_count_min < '7';
SELECT count(*) FROM tab WHERE f64_uniq < '7';
SELECT count(*) FROM tab WHERE f64 < '7.7';
SELECT count(*) FROM tab WHERE f64_tdigest < '7.7';
SELECT count(*) FROM tab WHERE f64_minmax < '7.7';
SELECT count(*) FROM tab WHERE f64_count_min < '7.7';
SELECT count(*) FROM tab WHERE f64_uniq < '7.7';
@ -153,11 +179,13 @@ SELECT 'dt and =';
SELECT count(*) FROM tab WHERE dt = '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_tdigest = '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_minmax = '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_count_min = '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_uniq = '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt = 7;
SELECT count(*) FROM tab WHERE dt_tdigest = 7;
SELECT count(*) FROM tab WHERE dt_minmax = 7;
SELECT count(*) FROM tab WHERE dt_count_min = 7;
SELECT count(*) FROM tab WHERE dt_uniq = 7;
@ -165,11 +193,13 @@ SELECT 'dt and <';
SELECT count(*) FROM tab WHERE dt < '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_tdigest < '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_minmax < '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_count_min < '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt_uniq < '2024-08-08 11:12:13';
SELECT count(*) FROM tab WHERE dt < 7;
SELECT count(*) FROM tab WHERE dt_tdigest < 7;
SELECT count(*) FROM tab WHERE dt_minmax < 7;
SELECT count(*) FROM tab WHERE dt_count_min < 7;
SELECT count(*) FROM tab WHERE dt_uniq < 7;
@ -179,21 +209,25 @@ SELECT 'b and =';
SELECT count(*) FROM tab WHERE b = true;
SELECT count(*) FROM tab WHERE b_tdigest = true;
SELECT count(*) FROM tab WHERE b_minmax = true;
SELECT count(*) FROM tab WHERE b_count_min = true;
SELECT count(*) FROM tab WHERE b_uniq = true;
SELECT count(*) FROM tab WHERE b = 'true';
SELECT count(*) FROM tab WHERE b_tdigest = 'true';
SELECT count(*) FROM tab WHERE b_minmax = 'true';
SELECT count(*) FROM tab WHERE b_count_min = 'true';
SELECT count(*) FROM tab WHERE b_uniq = 'true';
SELECT count(*) FROM tab WHERE b = 1;
SELECT count(*) FROM tab WHERE b_tdigest = 1;
SELECT count(*) FROM tab WHERE b_minmax = 1;
SELECT count(*) FROM tab WHERE b_count_min = 1;
SELECT count(*) FROM tab WHERE b_uniq = 1;
SELECT count(*) FROM tab WHERE b = 1.1;
SELECT count(*) FROM tab WHERE b_tdigest = 1.1;
SELECT count(*) FROM tab WHERE b_minmax = 1.1;
SELECT count(*) FROM tab WHERE b_count_min = 1.1;
SELECT count(*) FROM tab WHERE b_uniq = 1.1;
@ -203,12 +237,14 @@ SELECT 's and =';
SELECT count(*) FROM tab WHERE s = 7; -- { serverError NO_COMMON_TYPE }
-- SELECT count(*) FROM tab WHERE s_tdigest = 7; -- not supported
-- SELECT count(*) FROM tab WHERE s_minmax = 7; -- not supported
SELECT count(*) FROM tab WHERE s_count_min = 7; -- { serverError NO_COMMON_TYPE }
-- SELECT count(*) FROM tab WHERE s_uniq = 7; -- not supported
SELECT count(*) FROM tab WHERE s_uniq = 7; -- { serverError NO_COMMON_TYPE }
SELECT count(*) FROM tab WHERE s = '7';
-- SELECT count(*) FROM tab WHERE s_tdigest = '7'; -- not supported
-- SELECT count(*) FROM tab WHERE s_minmax = '7'; -- not supported
SELECT count(*) FROM tab WHERE s_count_min = '7';
-- SELECT count(*) FROM tab WHERE s_uniq = '7'; -- not supported
SELECT count(*) FROM tab WHERE s_uniq = '7';
DROP TABLE tab;

View File

@ -1,16 +1,22 @@
v24.8.4.13-lts 2024-09-06
v24.8.3.59-lts 2024-09-03
v24.8.2.3-lts 2024-08-22
v24.8.1.2684-lts 2024-08-21
v24.7.6.8-stable 2024-09-06
v24.7.5.37-stable 2024-09-03
v24.7.4.51-stable 2024-08-23
v24.7.3.47-stable 2024-09-04
v24.7.3.42-stable 2024-08-08
v24.7.2.13-stable 2024-08-01
v24.7.1.2915-stable 2024-07-30
v24.6.6.6-stable 2024-09-06
v24.6.5.30-stable 2024-09-03
v24.6.4.42-stable 2024-08-23
v24.6.3.95-stable 2024-08-06
v24.6.3.38-stable 2024-09-04
v24.6.2.17-stable 2024-07-05
v24.6.1.4423-stable 2024-07-01
v24.5.8.10-stable 2024-09-06
v24.5.7.31-stable 2024-09-03
v24.5.6.45-stable 2024-08-23
v24.5.5.78-stable 2024-08-05
@ -22,6 +28,7 @@ v24.4.4.113-stable 2024-08-02
v24.4.3.25-stable 2024-06-14
v24.4.2.141-stable 2024-06-07
v24.4.1.2088-stable 2024-05-01
v24.3.11.7-lts 2024-09-06
v24.3.10.33-lts 2024-09-03
v24.3.9.5-lts 2024-08-22
v24.3.8.13-lts 2024-08-20
