diff --git a/CHANGELOG.md b/CHANGELOG.md index c4935f88245..4507b491493 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,7 +9,7 @@ # 2024 Changelog -### ClickHouse release 24.6, 2024-06-27 +### ClickHouse release 24.6, 2024-06-30 #### Backward Incompatible Change * Enable asynchronous load of databases and tables by default. See the `async_load_databases` in config.xml. While this change is fully compatible, it can introduce a difference in behavior. When `async_load_databases` is false, as in the previous versions, the server will not accept connections until all tables are loaded. When `async_load_databases` is true, as in the new version, the server can accept connections before all the tables are loaded. If a query is made to a table that is not yet loaded, it will wait for the table's loading, which can take considerable time. It can change the behavior of the server if it is part of a large distributed system under a load balancer. In the first case, the load balancer can get a connection refusal and quickly failover to another server. In the second case, the load balancer can connect to a server that is still loading the tables, and the query will have a higher latency. Moreover, if many queries accumulate in the waiting state, it can lead to a "thundering herd" problem when they start processing simultaneously. This can make a difference only for highly loaded distributed backends. You can set the value of `async_load_databases` to false to avoid this problem. [#57695](https://github.com/ClickHouse/ClickHouse/pull/57695) ([Alexey Milovidov](https://github.com/alexey-milovidov)). diff --git a/docs/en/engines/table-engines/integrations/azureBlobStorage.md b/docs/en/engines/table-engines/integrations/azureBlobStorage.md index dfc27d6b8cf..bdf96832e9d 100644 --- a/docs/en/engines/table-engines/integrations/azureBlobStorage.md +++ b/docs/en/engines/table-engines/integrations/azureBlobStorage.md @@ -56,6 +56,15 @@ SELECT * FROM test_table; - `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`. - `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`. +## Authentication + +Currently there are 3 ways to authenticate: +- `Managed Identity` - Can be used by providing an `endpoint`, `connection_string` or `storage_account_url`. +- `SAS Token` - Can be used by providing an `endpoint`, `connection_string` or `storage_account_url`. It is identified by presence of '?' in the url. +- `Workload Identity` - Can be used by providing an `endpoint` or `storage_account_url`. If `use_workload_identity` parameter is set in config, ([workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications)) is used for authentication. 
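+
+The sketch below illustrates how these options map onto the engine's first `connection_string`/`storage_account_url` argument (a minimal sketch: the table names, account name, key, container, path, and SAS token are placeholders):
+
+```sql
+-- Connection string with an embedded AccountKey. Per the list above, supplying an
+-- endpoint or connection string without key material selects Managed Identity instead.
+CREATE TABLE blob_with_key (key UInt64, data String)
+    ENGINE = AzureBlobStorage('DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=<key>;EndpointSuffix=core.windows.net',
+                              'my-container', 'data.csv', 'CSV');
+
+-- SAS token, identified by the '?' in the URL; the format is inferred from the file extension.
+CREATE TABLE blob_with_sas (key UInt64, data String)
+    ENGINE = AzureBlobStorage('https://myaccount.blob.core.windows.net/?<sas-token>',
+                              'my-container', 'data.csv');
+```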
+
+
+
 ## See also
 [Azure Blob Storage Table Function](/docs/en/sql-reference/table-functions/azureBlobStorage)
diff --git a/docs/en/getting-started/example-datasets/images/stackoverflow.png b/docs/en/getting-started/example-datasets/images/stackoverflow.png
new file mode 100644
index 00000000000..f31acdc8cc3
Binary files /dev/null and b/docs/en/getting-started/example-datasets/images/stackoverflow.png differ
diff --git a/docs/en/getting-started/example-datasets/stackoverflow.md b/docs/en/getting-started/example-datasets/stackoverflow.md
new file mode 100644
index 00000000000..e982a3c3dfc
--- /dev/null
+++ b/docs/en/getting-started/example-datasets/stackoverflow.md
@@ -0,0 +1,394 @@
+---
+slug: /en/getting-started/example-datasets/stackoverflow
+sidebar_label: Stack Overflow
+sidebar_position: 1
+description: Analyzing Stack Overflow data with ClickHouse
+---
+
+# Analyzing Stack Overflow data with ClickHouse
+
+This dataset contains every `Post`, `User`, `Vote`, `Comment`, `Badge`, `PostHistory`, and `PostLink` that has occurred on Stack Overflow.
+
+Users can either download pre-prepared Parquet versions of the data, containing every post up to April 2024, or download the latest data in XML format and load it themselves. Stack Overflow provides updates to this data periodically - historically every 3 months.
+
+The following diagram shows the schema for the available tables, assuming Parquet format.
+
+![Stack Overflow schema](./images/stackoverflow.png)
+
+A description of the schema of this data can be found [here](https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede).
+
+## Pre-prepared data
+
+We provide a copy of this data in Parquet format, up to date as of April 2024. While small for ClickHouse with respect to the number of rows (60 million posts), this dataset contains significant volumes of text and large String columns.
+
+```sql
+CREATE DATABASE stackoverflow
+```
+
+The following timings are for a 96 GiB, 24 vCPU ClickHouse Cloud cluster located in `eu-west-2`. The dataset is located in `eu-west-3`.
+
+### Posts
+
+```sql
+CREATE TABLE stackoverflow.posts
+(
+    `Id` Int32 CODEC(Delta(4), ZSTD(1)),
+    `PostTypeId` Enum8('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8),
+    `AcceptedAnswerId` UInt32,
+    `CreationDate` DateTime64(3, 'UTC'),
+    `Score` Int32,
+    `ViewCount` UInt32 CODEC(Delta(4), ZSTD(1)),
+    `Body` String,
+    `OwnerUserId` Int32,
+    `OwnerDisplayName` String,
+    `LastEditorUserId` Int32,
+    `LastEditorDisplayName` String,
+    `LastEditDate` DateTime64(3, 'UTC') CODEC(Delta(8), ZSTD(1)),
+    `LastActivityDate` DateTime64(3, 'UTC'),
+    `Title` String,
+    `Tags` String,
+    `AnswerCount` UInt16 CODEC(Delta(2), ZSTD(1)),
+    `CommentCount` UInt8,
+    `FavoriteCount` UInt8,
+    `ContentLicense` LowCardinality(String),
+    `ParentId` String,
+    `CommunityOwnedDate` DateTime64(3, 'UTC'),
+    `ClosedDate` DateTime64(3, 'UTC')
+)
+ENGINE = MergeTree
+PARTITION BY toYear(CreationDate)
+ORDER BY (PostTypeId, toDate(CreationDate), CreationDate)
+
+INSERT INTO stackoverflow.posts SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/posts/*.parquet')
+
+0 rows in set. Elapsed: 265.466 sec. Processed 59.82 million rows, 38.07 GB (225.34 thousand rows/s., 143.42 MB/s.)
+```
+
+Posts are also available by year e.g.
[https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/posts/2020.parquet](https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/posts/2020.parquet) + + +### Votes + +```sql +CREATE TABLE stackoverflow.votes +( + `Id` UInt32, + `PostId` Int32, + `VoteTypeId` UInt8, + `CreationDate` DateTime64(3, 'UTC'), + `UserId` Int32, + `BountyAmount` UInt8 +) +ENGINE = MergeTree +ORDER BY (VoteTypeId, CreationDate, PostId, UserId) + +INSERT INTO stackoverflow.votes SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/votes/*.parquet') + +0 rows in set. Elapsed: 21.605 sec. Processed 238.98 million rows, 2.13 GB (11.06 million rows/s., 98.46 MB/s.) +``` + +Votes are also available by year e.g. [https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/posts/2020.parquet](https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/votes/2020.parquet) + + +### Comments + +```sql +CREATE TABLE stackoverflow.comments +( + `Id` UInt32, + `PostId` UInt32, + `Score` UInt16, + `Text` String, + `CreationDate` DateTime64(3, 'UTC'), + `UserId` Int32, + `UserDisplayName` LowCardinality(String) +) +ENGINE = MergeTree +ORDER BY CreationDate + +INSERT INTO stackoverflow.comments SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/comments/*.parquet') + +0 rows in set. Elapsed: 56.593 sec. Processed 90.38 million rows, 11.14 GB (1.60 million rows/s., 196.78 MB/s.) +``` + +Comments are also available by year e.g. [https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/posts/2020.parquet](https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/comments/2020.parquet) + +### Users + +```sql +CREATE TABLE stackoverflow.users +( + `Id` Int32, + `Reputation` LowCardinality(String), + `CreationDate` DateTime64(3, 'UTC') CODEC(Delta(8), ZSTD(1)), + `DisplayName` String, + `LastAccessDate` DateTime64(3, 'UTC'), + `AboutMe` String, + `Views` UInt32, + `UpVotes` UInt32, + `DownVotes` UInt32, + `WebsiteUrl` String, + `Location` LowCardinality(String), + `AccountId` Int32 +) +ENGINE = MergeTree +ORDER BY (Id, CreationDate) + +INSERT INTO stackoverflow.users SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/users.parquet') + +0 rows in set. Elapsed: 10.988 sec. Processed 22.48 million rows, 1.36 GB (2.05 million rows/s., 124.10 MB/s.) +``` + +### Badges + +```sql +CREATE TABLE stackoverflow.badges +( + `Id` UInt32, + `UserId` Int32, + `Name` LowCardinality(String), + `Date` DateTime64(3, 'UTC'), + `Class` Enum8('Gold' = 1, 'Silver' = 2, 'Bronze' = 3), + `TagBased` Bool +) +ENGINE = MergeTree +ORDER BY UserId + +INSERT INTO stackoverflow.badges SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/badges.parquet') + +0 rows in set. Elapsed: 6.635 sec. Processed 51.29 million rows, 797.05 MB (7.73 million rows/s., 120.13 MB/s.) +``` + +### `PostLinks` + +```sql +CREATE TABLE stackoverflow.postlinks +( + `Id` UInt64, + `CreationDate` DateTime64(3, 'UTC'), + `PostId` Int32, + `RelatedPostId` Int32, + `LinkTypeId` Enum8('Linked' = 1, 'Duplicate' = 3) +) +ENGINE = MergeTree +ORDER BY (PostId, RelatedPostId) + +INSERT INTO stackoverflow.postlinks SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/postlinks.parquet') + +0 rows in set. Elapsed: 1.534 sec. 
Processed 6.55 million rows, 129.70 MB (4.27 million rows/s., 84.57 MB/s.) +``` + +### `PostHistory` + +```sql +CREATE TABLE stackoverflow.posthistory +( + `Id` UInt64, + `PostHistoryTypeId` UInt8, + `PostId` Int32, + `RevisionGUID` String, + `CreationDate` DateTime64(3, 'UTC'), + `UserId` Int32, + `Text` String, + `ContentLicense` LowCardinality(String), + `Comment` String, + `UserDisplayName` String +) +ENGINE = MergeTree +ORDER BY (CreationDate, PostId) + +INSERT INTO stackoverflow.posthistory SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/posthistory/*.parquet') + +0 rows in set. Elapsed: 422.795 sec. Processed 160.79 million rows, 67.08 GB (380.30 thousand rows/s., 158.67 MB/s.) +``` + +## Original dataset + +The original dataset is available in compressed (7zip) XML format at [https://archive.org/download/stackexchange](https://archive.org/download/stackexchange) - files with prefix `stackoverflow.com*`. + +### Download + +```bash +wget https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z +wget https://archive.org/download/stackexchange/stackoverflow.com-Comments.7z +wget https://archive.org/download/stackexchange/stackoverflow.com-PostHistory.7z +wget https://archive.org/download/stackexchange/stackoverflow.com-PostLinks.7z +wget https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z +wget https://archive.org/download/stackexchange/stackoverflow.com-Users.7z +wget https://archive.org/download/stackexchange/stackoverflow.com-Votes.7z +``` + +These files are up to 35GB and can take around 30 mins to download depending on internet connection - the download server throttles at around 20MB/sec. + +### Convert to JSON + +At the time of writing, ClickHouse does not have native support for XML as an input format. To load the data into ClickHouse we first convert to NDJSON. + +To convert XML to JSON we recommend the [`xq`](https://github.com/kislyuk/yq) linux tool, a simple `jq` wrapper for XML documents. + +Install xq and jq: + +```bash +sudo apt install jq +pip install yq +``` + +The following steps apply to any of the above files. We use the `stackoverflow.com-Posts.7z` file as an example. Modify as required. + +Extract the file using [p7zip](https://p7zip.sourceforge.net/). This will produce a single xml file - in this case `Posts.xml`. + +> Files are compressed approximately 4.5x. At 22GB compressed, the posts file requires around 97G uncompressed. + +```bash +p7zip -d stackoverflow.com-Posts.7z +``` + +The following splits the xml file into files, each containing 10000 rows. + +```bash +mkdir posts +cd posts +# the following splits the input xml file into sub files of 10000 rows +tail +3 ../Posts.xml | head -n -1 | split -l 10000 --filter='{ printf "\n"; cat - ; printf "\n"; } > $FILE' - +``` + +After running the above users will have a set of files, each with 10000 lines. This ensures the memory overhead of the next command is not excessive (xml to JSON conversion is done in memory). + +```bash +find . -maxdepth 1 -type f -exec xq -c '.rows.row[]' {} \; | sed -e 's:"@:":g' > posts_v2.json +``` + +The above command will produce a single `posts.json` file. + +Load into ClickHouse with the following command. Note the schema is specified for the `posts.json` file. This will need to be adjusted per data type to align with the target table. 
+ +```bash +clickhouse local --query "SELECT * FROM file('posts.json', JSONEachRow, 'Id Int32, PostTypeId UInt8, AcceptedAnswerId UInt32, CreationDate DateTime64(3, \'UTC\'), Score Int32, ViewCount UInt32, Body String, OwnerUserId Int32, OwnerDisplayName String, LastEditorUserId Int32, LastEditorDisplayName String, LastEditDate DateTime64(3, \'UTC\'), LastActivityDate DateTime64(3, \'UTC\'), Title String, Tags String, AnswerCount UInt16, CommentCount UInt8, FavoriteCount UInt8, ContentLicense String, ParentId String, CommunityOwnedDate DateTime64(3, \'UTC\'), ClosedDate DateTime64(3, \'UTC\')') FORMAT Native" | clickhouse client --host --secure --password --query "INSERT INTO stackoverflow.posts_v2 FORMAT Native" +``` + +## Example queries + +A few simple questions to you get started. + +### Most popular tags on Stack Overflow + +```sql + +SELECT + arrayJoin(arrayFilter(t -> (t != ''), splitByChar('|', Tags))) AS Tags, + count() AS c +FROM stackoverflow.posts +GROUP BY Tags +ORDER BY c DESC +LIMIT 10 + +┌─Tags───────┬───────c─┐ +│ javascript │ 2527130 │ +│ python │ 2189638 │ +│ java │ 1916156 │ +│ c# │ 1614236 │ +│ php │ 1463901 │ +│ android │ 1416442 │ +│ html │ 1186567 │ +│ jquery │ 1034621 │ +│ c++ │ 806202 │ +│ css │ 803755 │ +└────────────┴─────────┘ + +10 rows in set. Elapsed: 1.013 sec. Processed 59.82 million rows, 1.21 GB (59.07 million rows/s., 1.19 GB/s.) +Peak memory usage: 224.03 MiB. +``` + +### User with the most answers (active accounts) + +Account requires a `UserId`. + +```sql +SELECT + any(OwnerUserId) UserId, + OwnerDisplayName, + count() AS c +FROM stackoverflow.posts WHERE OwnerDisplayName != '' AND PostTypeId='Answer' AND OwnerUserId != 0 +GROUP BY OwnerDisplayName +ORDER BY c DESC +LIMIT 5 + +┌─UserId─┬─OwnerDisplayName─┬────c─┐ +│ 22656 │ Jon Skeet │ 2727 │ +│ 23354 │ Marc Gravell │ 2150 │ +│ 12950 │ tvanfosson │ 1530 │ +│ 3043 │ Joel Coehoorn │ 1438 │ +│ 10661 │ S.Lott │ 1087 │ +└────────┴──────────────────┴──────┘ + +5 rows in set. Elapsed: 0.154 sec. Processed 35.83 million rows, 193.39 MB (232.33 million rows/s., 1.25 GB/s.) +Peak memory usage: 206.45 MiB. +``` + +### ClickHouse related posts with the most views + +```sql +SELECT + Id, + Title, + ViewCount, + AnswerCount +FROM stackoverflow.posts +WHERE Title ILIKE '%ClickHouse%' +ORDER BY ViewCount DESC +LIMIT 10 + +┌───────Id─┬─Title────────────────────────────────────────────────────────────────────────────┬─ViewCount─┬─AnswerCount─┐ +│ 52355143 │ Is it possible to delete old records from clickhouse table? │ 41462 │ 3 │ +│ 37954203 │ Clickhouse Data Import │ 38735 │ 3 │ +│ 37901642 │ Updating data in Clickhouse │ 36236 │ 6 │ +│ 58422110 │ Pandas: How to insert dataframe into Clickhouse │ 29731 │ 4 │ +│ 63621318 │ DBeaver - Clickhouse - SQL Error [159] .. Read timed out │ 27350 │ 1 │ +│ 47591813 │ How to filter clickhouse table by array column contents? │ 27078 │ 2 │ +│ 58728436 │ How to search the string in query with case insensitive on Clickhouse database? │ 26567 │ 3 │ +│ 65316905 │ Clickhouse: DB::Exception: Memory limit (for query) exceeded │ 24899 │ 2 │ +│ 49944865 │ How to add a column in clickhouse │ 24424 │ 1 │ +│ 59712399 │ How to cast date Strings to DateTime format with extended parsing in ClickHouse? │ 22620 │ 1 │ +└──────────┴──────────────────────────────────────────────────────────────────────────────────┴───────────┴─────────────┘ + +10 rows in set. Elapsed: 0.472 sec. Processed 59.82 million rows, 1.91 GB (126.63 million rows/s., 4.03 GB/s.) +Peak memory usage: 240.01 MiB. 
+``` + +### Most controversial posts + +```sql +SELECT + Id, + Title, + UpVotes, + DownVotes, + abs(UpVotes - DownVotes) AS Controversial_ratio +FROM stackoverflow.posts +INNER JOIN +( + SELECT + PostId, + countIf(VoteTypeId = 2) AS UpVotes, + countIf(VoteTypeId = 3) AS DownVotes + FROM stackoverflow.votes + GROUP BY PostId + HAVING (UpVotes > 10) AND (DownVotes > 10) +) AS votes ON posts.Id = votes.PostId +WHERE Title != '' +ORDER BY Controversial_ratio ASC +LIMIT 3 + +┌───────Id─┬─Title─────────────────────────────────────────────┬─UpVotes─┬─DownVotes─┬─Controversial_ratio─┐ +│ 583177 │ VB.NET Infinite For Loop │ 12 │ 12 │ 0 │ +│ 9756797 │ Read console input as enumerable - one statement? │ 16 │ 16 │ 0 │ +│ 13329132 │ What's the point of ARGV in Ruby? │ 22 │ 22 │ 0 │ +└──────────┴───────────────────────────────────────────────────┴─────────┴───────────┴─────────────────────┘ + +3 rows in set. Elapsed: 4.779 sec. Processed 298.80 million rows, 3.16 GB (62.52 million rows/s., 661.05 MB/s.) +Peak memory usage: 6.05 GiB. +``` + +## Attribution + +We thank Stack Overflow for providing this data under the `cc-by-sa 4.0` license, acknowledging their efforts and the original source of the data at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange). diff --git a/programs/keeper-client/KeeperClient.cpp b/programs/keeper-client/KeeperClient.cpp index 68adc2c2aac..a20c1f686f3 100644 --- a/programs/keeper-client/KeeperClient.cpp +++ b/programs/keeper-client/KeeperClient.cpp @@ -383,6 +383,9 @@ int KeeperClient::main(const std::vector & /* args */) for (const auto & key : keys) { + if (key != "node") + continue; + String prefix = "zookeeper." + key; String host = clickhouse_config.configuration->getString(prefix + ".host"); String port = clickhouse_config.configuration->getString(prefix + ".port"); @@ -401,6 +404,7 @@ int KeeperClient::main(const std::vector & /* args */) zk_args.hosts.push_back(host + ":" + port); } + zk_args.availability_zones.resize(zk_args.hosts.size()); zk_args.connection_timeout_ms = config().getInt("connection-timeout", 10) * 1000; zk_args.session_timeout_ms = config().getInt("session-timeout", 10) * 1000; zk_args.operation_timeout_ms = config().getInt("operation-timeout", 10) * 1000; diff --git a/programs/keeper/Keeper.cpp b/programs/keeper/Keeper.cpp index bb04ff88936..f14ef2e5552 100644 --- a/programs/keeper/Keeper.cpp +++ b/programs/keeper/Keeper.cpp @@ -355,10 +355,7 @@ try std::string include_from_path = config().getString("include_from", "/etc/metrika.xml"); - if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX)) - { - PlacementInfo::PlacementInfo::instance().initialize(config()); - } + PlacementInfo::PlacementInfo::instance().initialize(config()); GlobalThreadPool::initialize( /// We need to have sufficient amount of threads for connections + nuraft workers + keeper workers, 1000 is an estimation diff --git a/programs/local/LocalServer.cpp b/programs/local/LocalServer.cpp index 503cb0fb97d..b33e1595056 100644 --- a/programs/local/LocalServer.cpp +++ b/programs/local/LocalServer.cpp @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -59,8 +60,13 @@ # include #endif + namespace fs = std::filesystem; +namespace CurrentMetrics +{ + extern const Metric MemoryTracking; +} namespace DB { @@ -131,11 +137,12 @@ void LocalServer::initialize(Poco::Util::Application & self) getClientConfiguration().add(loaded_config.configuration.duplicate(), PRIO_DEFAULT, false); } + 
server_settings.loadSettingsFromConfig(config()); + GlobalThreadPool::initialize( - getClientConfiguration().getUInt("max_thread_pool_size", 10000), - getClientConfiguration().getUInt("max_thread_pool_free_size", 1000), - getClientConfiguration().getUInt("thread_pool_queue_size", 10000) - ); + server_settings.max_thread_pool_size, + server_settings.max_thread_pool_free_size, + server_settings.thread_pool_queue_size); #if USE_AZURE_BLOB_STORAGE /// See the explanation near the same line in Server.cpp @@ -146,18 +153,17 @@ void LocalServer::initialize(Poco::Util::Application & self) #endif getIOThreadPool().initialize( - getClientConfiguration().getUInt("max_io_thread_pool_size", 100), - getClientConfiguration().getUInt("max_io_thread_pool_free_size", 0), - getClientConfiguration().getUInt("io_thread_pool_queue_size", 10000)); + server_settings.max_io_thread_pool_size, + server_settings.max_io_thread_pool_free_size, + server_settings.io_thread_pool_queue_size); - - const size_t active_parts_loading_threads = getClientConfiguration().getUInt("max_active_parts_loading_thread_pool_size", 64); + const size_t active_parts_loading_threads = server_settings.max_active_parts_loading_thread_pool_size; getActivePartsLoadingThreadPool().initialize( active_parts_loading_threads, 0, // We don't need any threads one all the parts will be loaded active_parts_loading_threads); - const size_t outdated_parts_loading_threads = getClientConfiguration().getUInt("max_outdated_parts_loading_thread_pool_size", 32); + const size_t outdated_parts_loading_threads = server_settings.max_outdated_parts_loading_thread_pool_size; getOutdatedPartsLoadingThreadPool().initialize( outdated_parts_loading_threads, 0, // We don't need any threads one all the parts will be loaded @@ -165,7 +171,7 @@ void LocalServer::initialize(Poco::Util::Application & self) getOutdatedPartsLoadingThreadPool().setMaxTurboThreads(active_parts_loading_threads); - const size_t unexpected_parts_loading_threads = getClientConfiguration().getUInt("max_unexpected_parts_loading_thread_pool_size", 32); + const size_t unexpected_parts_loading_threads = server_settings.max_unexpected_parts_loading_thread_pool_size; getUnexpectedPartsLoadingThreadPool().initialize( unexpected_parts_loading_threads, 0, // We don't need any threads one all the parts will be loaded @@ -173,7 +179,7 @@ void LocalServer::initialize(Poco::Util::Application & self) getUnexpectedPartsLoadingThreadPool().setMaxTurboThreads(active_parts_loading_threads); - const size_t cleanup_threads = getClientConfiguration().getUInt("max_parts_cleaning_thread_pool_size", 128); + const size_t cleanup_threads = server_settings.max_parts_cleaning_thread_pool_size; getPartsCleaningThreadPool().initialize( cleanup_threads, 0, // We don't need any threads one all the parts will be deleted @@ -438,7 +444,7 @@ try UseSSL use_ssl; thread_status.emplace(); - StackTrace::setShowAddresses(getClientConfiguration().getBool("show_addresses_in_stack_traces", true)); + StackTrace::setShowAddresses(server_settings.show_addresses_in_stack_traces); setupSignalHandler(); @@ -624,12 +630,43 @@ void LocalServer::processConfig() global_context->getProcessList().setMaxSize(0); const size_t physical_server_memory = getMemoryAmount(); - const double cache_size_to_ram_max_ratio = getClientConfiguration().getDouble("cache_size_to_ram_max_ratio", 0.5); + + size_t max_server_memory_usage = server_settings.max_server_memory_usage; + double max_server_memory_usage_to_ram_ratio = 
server_settings.max_server_memory_usage_to_ram_ratio; + + size_t default_max_server_memory_usage = static_cast(physical_server_memory * max_server_memory_usage_to_ram_ratio); + + if (max_server_memory_usage == 0) + { + max_server_memory_usage = default_max_server_memory_usage; + LOG_INFO(log, "Setting max_server_memory_usage was set to {}" + " ({} available * {:.2f} max_server_memory_usage_to_ram_ratio)", + formatReadableSizeWithBinarySuffix(max_server_memory_usage), + formatReadableSizeWithBinarySuffix(physical_server_memory), + max_server_memory_usage_to_ram_ratio); + } + else if (max_server_memory_usage > default_max_server_memory_usage) + { + max_server_memory_usage = default_max_server_memory_usage; + LOG_INFO(log, "Setting max_server_memory_usage was lowered to {}" + " because the system has low amount of memory. The amount was" + " calculated as {} available" + " * {:.2f} max_server_memory_usage_to_ram_ratio", + formatReadableSizeWithBinarySuffix(max_server_memory_usage), + formatReadableSizeWithBinarySuffix(physical_server_memory), + max_server_memory_usage_to_ram_ratio); + } + + total_memory_tracker.setHardLimit(max_server_memory_usage); + total_memory_tracker.setDescription("(total)"); + total_memory_tracker.setMetric(CurrentMetrics::MemoryTracking); + + const double cache_size_to_ram_max_ratio = server_settings.cache_size_to_ram_max_ratio; const size_t max_cache_size = static_cast(physical_server_memory * cache_size_to_ram_max_ratio); - String uncompressed_cache_policy = getClientConfiguration().getString("uncompressed_cache_policy", DEFAULT_UNCOMPRESSED_CACHE_POLICY); - size_t uncompressed_cache_size = getClientConfiguration().getUInt64("uncompressed_cache_size", DEFAULT_UNCOMPRESSED_CACHE_MAX_SIZE); - double uncompressed_cache_size_ratio = getClientConfiguration().getDouble("uncompressed_cache_size_ratio", DEFAULT_UNCOMPRESSED_CACHE_SIZE_RATIO); + String uncompressed_cache_policy = server_settings.uncompressed_cache_policy; + size_t uncompressed_cache_size = server_settings.uncompressed_cache_size; + double uncompressed_cache_size_ratio = server_settings.uncompressed_cache_size_ratio; if (uncompressed_cache_size > max_cache_size) { uncompressed_cache_size = max_cache_size; @@ -637,9 +674,9 @@ void LocalServer::processConfig() } global_context->setUncompressedCache(uncompressed_cache_policy, uncompressed_cache_size, uncompressed_cache_size_ratio); - String mark_cache_policy = getClientConfiguration().getString("mark_cache_policy", DEFAULT_MARK_CACHE_POLICY); - size_t mark_cache_size = getClientConfiguration().getUInt64("mark_cache_size", DEFAULT_MARK_CACHE_MAX_SIZE); - double mark_cache_size_ratio = getClientConfiguration().getDouble("mark_cache_size_ratio", DEFAULT_MARK_CACHE_SIZE_RATIO); + String mark_cache_policy = server_settings.mark_cache_policy; + size_t mark_cache_size = server_settings.mark_cache_size; + double mark_cache_size_ratio = server_settings.mark_cache_size_ratio; if (!mark_cache_size) LOG_ERROR(log, "Too low mark cache size will lead to severe performance degradation."); if (mark_cache_size > max_cache_size) @@ -649,9 +686,9 @@ void LocalServer::processConfig() } global_context->setMarkCache(mark_cache_policy, mark_cache_size, mark_cache_size_ratio); - String index_uncompressed_cache_policy = getClientConfiguration().getString("index_uncompressed_cache_policy", DEFAULT_INDEX_UNCOMPRESSED_CACHE_POLICY); - size_t index_uncompressed_cache_size = getClientConfiguration().getUInt64("index_uncompressed_cache_size", DEFAULT_INDEX_UNCOMPRESSED_CACHE_MAX_SIZE); - 
double index_uncompressed_cache_size_ratio = getClientConfiguration().getDouble("index_uncompressed_cache_size_ratio", DEFAULT_INDEX_UNCOMPRESSED_CACHE_SIZE_RATIO); + String index_uncompressed_cache_policy = server_settings.index_uncompressed_cache_policy; + size_t index_uncompressed_cache_size = server_settings.index_uncompressed_cache_size; + double index_uncompressed_cache_size_ratio = server_settings.index_uncompressed_cache_size_ratio; if (index_uncompressed_cache_size > max_cache_size) { index_uncompressed_cache_size = max_cache_size; @@ -659,9 +696,9 @@ void LocalServer::processConfig() } global_context->setIndexUncompressedCache(index_uncompressed_cache_policy, index_uncompressed_cache_size, index_uncompressed_cache_size_ratio); - String index_mark_cache_policy = getClientConfiguration().getString("index_mark_cache_policy", DEFAULT_INDEX_MARK_CACHE_POLICY); - size_t index_mark_cache_size = getClientConfiguration().getUInt64("index_mark_cache_size", DEFAULT_INDEX_MARK_CACHE_MAX_SIZE); - double index_mark_cache_size_ratio = getClientConfiguration().getDouble("index_mark_cache_size_ratio", DEFAULT_INDEX_MARK_CACHE_SIZE_RATIO); + String index_mark_cache_policy = server_settings.index_mark_cache_policy; + size_t index_mark_cache_size = server_settings.index_mark_cache_size; + double index_mark_cache_size_ratio = server_settings.index_mark_cache_size_ratio; if (index_mark_cache_size > max_cache_size) { index_mark_cache_size = max_cache_size; @@ -669,7 +706,7 @@ void LocalServer::processConfig() } global_context->setIndexMarkCache(index_mark_cache_policy, index_mark_cache_size, index_mark_cache_size_ratio); - size_t mmap_cache_size = getClientConfiguration().getUInt64("mmap_cache_size", DEFAULT_MMAP_CACHE_MAX_SIZE); + size_t mmap_cache_size = server_settings.mmap_cache_size; if (mmap_cache_size > max_cache_size) { mmap_cache_size = max_cache_size; @@ -681,8 +718,8 @@ void LocalServer::processConfig() global_context->setQueryCache(0, 0, 0, 0); #if USE_EMBEDDED_COMPILER - size_t compiled_expression_cache_max_size_in_bytes = getClientConfiguration().getUInt64("compiled_expression_cache_size", DEFAULT_COMPILED_EXPRESSION_CACHE_MAX_SIZE); - size_t compiled_expression_cache_max_elements = getClientConfiguration().getUInt64("compiled_expression_cache_elements_size", DEFAULT_COMPILED_EXPRESSION_CACHE_MAX_ENTRIES); + size_t compiled_expression_cache_max_size_in_bytes = server_settings.compiled_expression_cache_size; + size_t compiled_expression_cache_max_elements = server_settings.compiled_expression_cache_elements_size; CompiledExpressionCacheFactory::instance().init(compiled_expression_cache_max_size_in_bytes, compiled_expression_cache_max_elements); #endif @@ -699,7 +736,7 @@ void LocalServer::processConfig() /// We load temporary database first, because projections need it. 
DatabaseCatalog::instance().initializeAndLoadTemporaryDatabase(); - std::string default_database = getClientConfiguration().getString("default_database", "default"); + std::string default_database = server_settings.default_database; DatabaseCatalog::instance().attachDatabase(default_database, createClickHouseLocalDatabaseOverlay(default_database, global_context)); global_context->setCurrentDatabase(default_database); diff --git a/programs/local/LocalServer.h b/programs/local/LocalServer.h index 4ab09ffc353..da2466650a7 100644 --- a/programs/local/LocalServer.h +++ b/programs/local/LocalServer.h @@ -66,6 +66,8 @@ private: void applyCmdOptions(ContextMutablePtr context); void applyCmdSettings(ContextMutablePtr context); + ServerSettings server_settings; + std::optional status; std::optional temporary_directory_to_delete; diff --git a/programs/main.cpp b/programs/main.cpp index c270388f17f..61e2bc18ed7 100644 --- a/programs/main.cpp +++ b/programs/main.cpp @@ -13,6 +13,7 @@ #include +#include "config.h" #include "config_tools.h" #include @@ -439,6 +440,14 @@ extern "C" } #endif +/// Prevent messages from JeMalloc in the release build. +/// Some of these messages are non-actionable for the users, such as: +/// : Number of CPUs detected is not deterministic. Per-CPU arena disabled. +#if USE_JEMALLOC && defined(NDEBUG) && !defined(SANITIZER) +extern "C" void (*malloc_message)(void *, const char *s); +__attribute__((constructor(0))) void init_je_malloc_message() { malloc_message = [](void *, const char *){}; } +#endif + /// This allows to implement assert to forbid initialization of a class in static constructors. /// Usage: /// diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index e2554a6ff03..4cb3b5f45c7 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -1003,6 +1003,8 @@ try ServerUUID::load(path / "uuid", log); + PlacementInfo::PlacementInfo::instance().initialize(config()); + zkutil::validateZooKeeperConfig(config()); bool has_zookeeper = zkutil::hasZooKeeperConfig(config()); @@ -1817,11 +1819,6 @@ try } - if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX)) - { - PlacementInfo::PlacementInfo::instance().initialize(config()); - } - { std::lock_guard lock(servers_lock); /// We should start interserver communications before (and more important shutdown after) tables. 
diff --git a/src/Client/LocalConnection.cpp b/src/Client/LocalConnection.cpp index 3b2c14ee4f9..072184e0a66 100644 --- a/src/Client/LocalConnection.cpp +++ b/src/Client/LocalConnection.cpp @@ -358,22 +358,18 @@ bool LocalConnection::poll(size_t) if (!state->is_finished) { - if (send_progress && (state->after_send_progress.elapsedMicroseconds() >= query_context->getSettingsRef().interactive_delay)) - { - state->after_send_progress.restart(); - next_packet_type = Protocol::Server::Progress; + if (needSendProgressOrMetrics()) return true; - } - - if (send_profile_events && (state->after_send_profile_events.elapsedMicroseconds() >= query_context->getSettingsRef().interactive_delay)) - { - sendProfileEvents(); - return true; - } try { - pollImpl(); + while (pollImpl()) + { + LOG_DEBUG(&Poco::Logger::get("LocalConnection"), "Executor timeout encountered, will retry"); + + if (needSendProgressOrMetrics()) + return true; + } } catch (const Exception & e) { @@ -468,12 +464,34 @@ bool LocalConnection::poll(size_t) return false; } +bool LocalConnection::needSendProgressOrMetrics() +{ + if (send_progress && (state->after_send_progress.elapsedMicroseconds() >= query_context->getSettingsRef().interactive_delay)) + { + state->after_send_progress.restart(); + next_packet_type = Protocol::Server::Progress; + return true; + } + + if (send_profile_events && (state->after_send_profile_events.elapsedMicroseconds() >= query_context->getSettingsRef().interactive_delay)) + { + sendProfileEvents(); + return true; + } + + return false; +} + bool LocalConnection::pollImpl() { Block block; auto next_read = pullBlock(block); - if (block && !state->io.null_format) + if (!block && next_read) + { + return true; + } + else if (block && !state->io.null_format) { state->block.emplace(block); } @@ -482,7 +500,7 @@ bool LocalConnection::pollImpl() state->is_finished = true; } - return true; + return false; } Packet LocalConnection::receivePacket() diff --git a/src/Client/LocalConnection.h b/src/Client/LocalConnection.h index 899d134cce5..fb6fa1b55eb 100644 --- a/src/Client/LocalConnection.h +++ b/src/Client/LocalConnection.h @@ -151,8 +151,11 @@ private: void sendProfileEvents(); + /// Returns true on executor timeout, meaning a retryable error. bool pollImpl(); + bool needSendProgressOrMetrics(); + ContextMutablePtr query_context; Session session; diff --git a/src/Common/GetPriorityForLoadBalancing.cpp b/src/Common/GetPriorityForLoadBalancing.cpp index d4c6f89ff92..dc5704ef6b5 100644 --- a/src/Common/GetPriorityForLoadBalancing.cpp +++ b/src/Common/GetPriorityForLoadBalancing.cpp @@ -60,4 +60,26 @@ GetPriorityForLoadBalancing::getPriorityFunc(LoadBalancing load_balance, size_t return get_priority; } +/// Some load balancing strategies (such as "nearest hostname") have preferred nodes to connect to. +/// Usually it's a node in the same data center/availability zone. +/// For other strategies there's no difference between nodes. 
+bool GetPriorityForLoadBalancing::hasOptimalNode() const +{ + switch (load_balancing) + { + case LoadBalancing::NEAREST_HOSTNAME: + return true; + case LoadBalancing::HOSTNAME_LEVENSHTEIN_DISTANCE: + return true; + case LoadBalancing::IN_ORDER: + return false; + case LoadBalancing::RANDOM: + return false; + case LoadBalancing::FIRST_OR_RANDOM: + return true; + case LoadBalancing::ROUND_ROBIN: + return false; + } +} + } diff --git a/src/Common/GetPriorityForLoadBalancing.h b/src/Common/GetPriorityForLoadBalancing.h index 0de99730977..01dae9a1289 100644 --- a/src/Common/GetPriorityForLoadBalancing.h +++ b/src/Common/GetPriorityForLoadBalancing.h @@ -30,6 +30,8 @@ public: Func getPriorityFunc(LoadBalancing load_balance, size_t offset, size_t pool_size) const; + bool hasOptimalNode() const; + std::vector hostname_prefix_distance; /// Prefix distances from name of this host to the names of hosts of pools. std::vector hostname_levenshtein_distance; /// Levenshtein Distances from name of this host to the names of hosts of pools. diff --git a/src/Common/ZooKeeper/IKeeper.h b/src/Common/ZooKeeper/IKeeper.h index 7d574247aa5..2c6cbc4a5d5 100644 --- a/src/Common/ZooKeeper/IKeeper.h +++ b/src/Common/ZooKeeper/IKeeper.h @@ -559,6 +559,8 @@ public: /// Useful to check owner of ephemeral node. virtual int64_t getSessionID() const = 0; + virtual String tryGetAvailabilityZone() { return ""; } + /// If the method will throw an exception, callbacks won't be called. /// /// After the method is executed successfully, you must wait for callbacks @@ -635,10 +637,6 @@ public: virtual const DB::KeeperFeatureFlags * getKeeperFeatureFlags() const { return nullptr; } - /// A ZooKeeper session can have an optional deadline set on it. - /// After it has been reached, the session needs to be finalized. 
- virtual bool hasReachedDeadline() const = 0; - /// Expire session and finish all pending requests virtual void finalize(const String & reason) = 0; }; diff --git a/src/Common/ZooKeeper/TestKeeper.h b/src/Common/ZooKeeper/TestKeeper.h index 2774055652c..2194ad015bf 100644 --- a/src/Common/ZooKeeper/TestKeeper.h +++ b/src/Common/ZooKeeper/TestKeeper.h @@ -39,7 +39,6 @@ public: ~TestKeeper() override; bool isExpired() const override { return expired; } - bool hasReachedDeadline() const override { return false; } Int8 getConnectedNodeIdx() const override { return 0; } String getConnectedHostPort() const override { return "TestKeeper:0000"; } int32_t getConnectionXid() const override { return 0; } diff --git a/src/Common/ZooKeeper/ZooKeeper.cpp b/src/Common/ZooKeeper/ZooKeeper.cpp index 4ec44a39136..56db9adb787 100644 --- a/src/Common/ZooKeeper/ZooKeeper.cpp +++ b/src/Common/ZooKeeper/ZooKeeper.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -16,10 +17,12 @@ #include #include #include +#include #include "Common/ZooKeeper/IKeeper.h" #include #include #include +#include #include #include @@ -55,70 +58,120 @@ static void check(Coordination::Error code, const std::string & path) throw KeeperException::fromPath(code, path); } +UInt64 getSecondsUntilReconnect(const ZooKeeperArgs & args) +{ + std::uniform_int_distribution fallback_session_lifetime_distribution + { + args.fallback_session_lifetime.min_sec, + args.fallback_session_lifetime.max_sec, + }; + UInt32 session_lifetime_seconds = fallback_session_lifetime_distribution(thread_local_rng); + return session_lifetime_seconds; +} -void ZooKeeper::init(ZooKeeperArgs args_) +void ZooKeeper::updateAvailabilityZones() +{ + ShuffleHosts shuffled_hosts = shuffleHosts(); + + for (const auto & node : shuffled_hosts) + { + try + { + ShuffleHosts single_node{node}; + auto tmp_impl = std::make_unique(single_node, args, zk_log); + auto idx = node.original_index; + availability_zones[idx] = tmp_impl->tryGetAvailabilityZone(); + LOG_TEST(log, "Got availability zone for {}: {}", args.hosts[idx], availability_zones[idx]); + } + catch (...) 
+ { + DB::tryLogCurrentException(log, "Failed to get availability zone for " + node.host); + } + } + LOG_DEBUG(log, "Updated availability zones: [{}]", fmt::join(availability_zones, ", ")); +} + +void ZooKeeper::init(ZooKeeperArgs args_, std::unique_ptr existing_impl) { args = std::move(args_); log = getLogger("ZooKeeper"); - if (args.implementation == "zookeeper") + if (existing_impl) + { + chassert(args.implementation == "zookeeper"); + impl = std::move(existing_impl); + LOG_INFO(log, "Switching to connection to a more optimal node {}", impl->getConnectedHostPort()); + } + else if (args.implementation == "zookeeper") { if (args.hosts.empty()) throw KeeperException::fromMessage(Coordination::Error::ZBADARGUMENTS, "No hosts passed to ZooKeeper constructor."); - Coordination::ZooKeeper::Nodes nodes; - nodes.reserve(args.hosts.size()); + chassert(args.availability_zones.size() == args.hosts.size()); + if (availability_zones.empty()) + { + /// availability_zones is empty on server startup or after config reloading + /// We will keep the az info when starting new sessions + availability_zones = args.availability_zones; + LOG_TEST(log, "Availability zones from config: [{}], client: {}", fmt::join(availability_zones, ", "), args.client_availability_zone); + if (args.availability_zone_autodetect) + updateAvailabilityZones(); + } + chassert(availability_zones.size() == args.hosts.size()); /// Shuffle the hosts to distribute the load among ZooKeeper nodes. - std::vector shuffled_hosts = shuffleHosts(); + ShuffleHosts shuffled_hosts = shuffleHosts(); - bool dns_error = false; - for (auto & host : shuffled_hosts) - { - auto & host_string = host.host; - try - { - const bool secure = startsWith(host_string, "secure://"); - - if (secure) - host_string.erase(0, strlen("secure://")); - - /// We want to resolve all hosts without DNS cache for keeper connection. 
- Coordination::DNSResolver::instance().removeHostFromCache(host_string); - - const Poco::Net::SocketAddress host_socket_addr{host_string}; - LOG_TEST(log, "Adding ZooKeeper host {} ({})", host_string, host_socket_addr.toString()); - nodes.emplace_back(Coordination::ZooKeeper::Node{host_socket_addr, host.original_index, secure}); - } - catch (const Poco::Net::HostNotFoundException & e) - { - /// Most likely it's misconfiguration and wrong hostname was specified - LOG_ERROR(log, "Cannot use ZooKeeper host {}, reason: {}", host_string, e.displayText()); - } - catch (const Poco::Net::DNSException & e) - { - /// Most likely DNS is not available now - dns_error = true; - LOG_ERROR(log, "Cannot use ZooKeeper host {} due to DNS error: {}", host_string, e.displayText()); - } - } - - if (nodes.empty()) - { - /// For DNS errors we throw exception with ZCONNECTIONLOSS code, so it will be considered as hardware error, not user error - if (dns_error) - throw KeeperException::fromMessage(Coordination::Error::ZCONNECTIONLOSS, "Cannot resolve any of provided ZooKeeper hosts due to DNS error"); - else - throw KeeperException::fromMessage(Coordination::Error::ZCONNECTIONLOSS, "Cannot use any of provided ZooKeeper nodes"); - } - - impl = std::make_unique(nodes, args, zk_log); + impl = std::make_unique(shuffled_hosts, args, zk_log); + Int8 node_idx = impl->getConnectedNodeIdx(); if (args.chroot.empty()) LOG_TRACE(log, "Initialized, hosts: {}", fmt::join(args.hosts, ",")); else LOG_TRACE(log, "Initialized, hosts: {}, chroot: {}", fmt::join(args.hosts, ","), args.chroot); + + + /// If the balancing strategy has an optimal node then it will be the first in the list + bool connected_to_suboptimal_node = node_idx != shuffled_hosts[0].original_index; + bool respect_az = args.prefer_local_availability_zone && !args.client_availability_zone.empty(); + bool may_benefit_from_reconnecting = respect_az || args.get_priority_load_balancing.hasOptimalNode(); + if (connected_to_suboptimal_node && may_benefit_from_reconnecting) + { + auto reconnect_timeout_sec = getSecondsUntilReconnect(args); + LOG_DEBUG(log, "Connected to a suboptimal ZooKeeper host ({}, index {})." + " To preserve balance in ZooKeeper usage, this ZooKeeper session will expire in {} seconds", + impl->getConnectedHostPort(), node_idx, reconnect_timeout_sec); + + auto reconnect_task_holder = DB::Context::getGlobalContextInstance()->getSchedulePool().createTask("ZKReconnect", [this, optimal_host = shuffled_hosts[0]]() + { + try + { + LOG_DEBUG(log, "Trying to connect to a more optimal node {}", optimal_host.host); + ShuffleHosts node{optimal_host}; + std::unique_ptr new_impl = std::make_unique(node, args, zk_log); + Int8 new_node_idx = new_impl->getConnectedNodeIdx(); + + /// Maybe the node was unavailable when getting AZs first time, update just in case + if (args.availability_zone_autodetect && availability_zones[new_node_idx].empty()) + { + availability_zones[new_node_idx] = new_impl->tryGetAvailabilityZone(); + LOG_DEBUG(log, "Got availability zone for {}: {}", optimal_host.host, availability_zones[new_node_idx]); + } + + optimal_impl = std::move(new_impl); + impl->finalize("Connected to a more optimal node"); + } + catch (...) 
+ { + LOG_WARNING(log, "Failed to connect to a more optimal ZooKeeper, will try again later: {}", DB::getCurrentExceptionMessage(/*with_stacktrace*/ false)); + (*reconnect_task)->scheduleAfter(getSecondsUntilReconnect(args) * 1000); + } + }); + reconnect_task = std::make_unique(std::move(reconnect_task_holder)); + (*reconnect_task)->activate(); + (*reconnect_task)->scheduleAfter(reconnect_timeout_sec * 1000); + } } else if (args.implementation == "testkeeper") { @@ -152,29 +205,53 @@ void ZooKeeper::init(ZooKeeperArgs args_) } } +ZooKeeper::~ZooKeeper() +{ + if (reconnect_task) + (*reconnect_task)->deactivate(); +} ZooKeeper::ZooKeeper(const ZooKeeperArgs & args_, std::shared_ptr zk_log_) : zk_log(std::move(zk_log_)) { - init(args_); + init(args_, /*existing_impl*/ {}); +} + + +ZooKeeper::ZooKeeper(const ZooKeeperArgs & args_, std::shared_ptr zk_log_, Strings availability_zones_, std::unique_ptr existing_impl) + : availability_zones(std::move(availability_zones_)), zk_log(std::move(zk_log_)) +{ + if (availability_zones.size() != args_.hosts.size()) + throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "Argument sizes mismatch: availability_zones count {} and hosts count {}", + availability_zones.size(), args_.hosts.size()); + init(args_, std::move(existing_impl)); } ZooKeeper::ZooKeeper(const Poco::Util::AbstractConfiguration & config, const std::string & config_name, std::shared_ptr zk_log_) : zk_log(std::move(zk_log_)) { - init(ZooKeeperArgs(config, config_name)); + init(ZooKeeperArgs(config, config_name), /*existing_impl*/ {}); } -std::vector ZooKeeper::shuffleHosts() const +ShuffleHosts ZooKeeper::shuffleHosts() const { - std::function get_priority = args.get_priority_load_balancing.getPriorityFunc(args.get_priority_load_balancing.load_balancing, 0, args.hosts.size()); - std::vector shuffle_hosts; + std::function get_priority = args.get_priority_load_balancing.getPriorityFunc( + args.get_priority_load_balancing.load_balancing, /* offset for first_or_random */ 0, args.hosts.size()); + ShuffleHosts shuffle_hosts; for (size_t i = 0; i < args.hosts.size(); ++i) { ShuffleHost shuffle_host; shuffle_host.host = args.hosts[i]; shuffle_host.original_index = static_cast(i); + + shuffle_host.secure = startsWith(shuffle_host.host, "secure://"); + if (shuffle_host.secure) + shuffle_host.host.erase(0, strlen("secure://")); + + if (!args.client_availability_zone.empty() && !availability_zones[i].empty()) + shuffle_host.az_info = availability_zones[i] == args.client_availability_zone ? 
ShuffleHost::SAME : ShuffleHost::OTHER; + if (get_priority) shuffle_host.priority = get_priority(i); shuffle_host.randomize(); @@ -1023,7 +1100,10 @@ ZooKeeperPtr ZooKeeper::create(const Poco::Util::AbstractConfiguration & config, ZooKeeperPtr ZooKeeper::startNewSession() const { - auto res = std::shared_ptr(new ZooKeeper(args, zk_log)); + if (reconnect_task) + (*reconnect_task)->deactivate(); + + auto res = std::shared_ptr(new ZooKeeper(args, zk_log, availability_zones, std::move(optimal_impl))); res->initSession(); return res; } @@ -1456,6 +1536,16 @@ int32_t ZooKeeper::getConnectionXid() const return impl->getConnectionXid(); } +String ZooKeeper::getConnectedHostAvailabilityZone() const +{ + if (args.implementation != "zookeeper" || !impl) + return ""; + Int8 idx = impl->getConnectedNodeIdx(); + if (idx < 0) + return ""; /// session expired + return availability_zones.at(idx); +} + size_t getFailedOpIndex(Coordination::Error exception_code, const Coordination::Responses & responses) { if (responses.empty()) diff --git a/src/Common/ZooKeeper/ZooKeeper.h b/src/Common/ZooKeeper/ZooKeeper.h index 08ff60a80cf..4ae2cfa6096 100644 --- a/src/Common/ZooKeeper/ZooKeeper.h +++ b/src/Common/ZooKeeper/ZooKeeper.h @@ -32,6 +32,7 @@ namespace DB { class ZooKeeperLog; class ZooKeeperWithFaultInjection; +class BackgroundSchedulePoolTaskHolder; namespace ErrorCodes { @@ -48,11 +49,23 @@ constexpr size_t MULTI_BATCH_SIZE = 100; struct ShuffleHost { + enum AvailabilityZoneInfo + { + SAME = 0, + UNKNOWN = 1, + OTHER = 2, + }; + String host; + bool secure = false; UInt8 original_index = 0; + AvailabilityZoneInfo az_info = UNKNOWN; Priority priority; UInt64 random = 0; + /// We should resolve it each time without caching + mutable std::optional address; + void randomize() { random = thread_local_rng(); @@ -60,11 +73,13 @@ struct ShuffleHost static bool compare(const ShuffleHost & lhs, const ShuffleHost & rhs) { - return std::forward_as_tuple(lhs.priority, lhs.random) - < std::forward_as_tuple(rhs.priority, rhs.random); + return std::forward_as_tuple(lhs.az_info, lhs.priority, lhs.random) + < std::forward_as_tuple(rhs.az_info, rhs.priority, rhs.random); } }; +using ShuffleHosts = std::vector; + struct RemoveException { explicit RemoveException(std::string_view path_ = "", bool remove_subtree_ = true) @@ -197,6 +212,9 @@ class ZooKeeper explicit ZooKeeper(const ZooKeeperArgs & args_, std::shared_ptr zk_log_ = nullptr); + /// Allows to keep info about availability zones when starting a new session + ZooKeeper(const ZooKeeperArgs & args_, std::shared_ptr zk_log_, Strings availability_zones_, std::unique_ptr existing_impl); + /** Config of the form: @@ -228,7 +246,9 @@ public: using Ptr = std::shared_ptr; using ErrorsList = std::initializer_list; - std::vector shuffleHosts() const; + ~ZooKeeper(); + + ShuffleHosts shuffleHosts() const; static Ptr create(const Poco::Util::AbstractConfiguration & config, const std::string & config_name, @@ -596,8 +616,6 @@ public: UInt32 getSessionUptime() const { return static_cast(session_uptime.elapsedSeconds()); } - bool hasReachedDeadline() const { return impl->hasReachedDeadline(); } - uint64_t getSessionTimeoutMS() const { return args.session_timeout_ms; } void setServerCompletelyStarted(); @@ -606,6 +624,8 @@ public: String getConnectedHostPort() const; int32_t getConnectionXid() const; + String getConnectedHostAvailabilityZone() const; + const DB::KeeperFeatureFlags * getKeeperFeatureFlags() const { return impl->getKeeperFeatureFlags(); } /// Checks that our session was not 
killed, and allows to avoid applying a request from an old lost session. @@ -625,7 +645,8 @@ public: void addCheckSessionOp(Coordination::Requests & requests) const; private: - void init(ZooKeeperArgs args_); + void init(ZooKeeperArgs args_, std::unique_ptr existing_impl); + void updateAvailabilityZones(); /// The following methods don't any throw exceptions but return error codes. Coordination::Error createImpl(const std::string & path, const std::string & data, int32_t mode, std::string & path_created); @@ -690,15 +711,20 @@ private: } std::unique_ptr impl; + mutable std::unique_ptr optimal_impl; ZooKeeperArgs args; + Strings availability_zones; + LoggerPtr log = nullptr; std::shared_ptr zk_log; AtomicStopwatch session_uptime; int32_t session_node_version; + + std::unique_ptr reconnect_task; }; diff --git a/src/Common/ZooKeeper/ZooKeeperArgs.cpp b/src/Common/ZooKeeper/ZooKeeperArgs.cpp index a581b6a7f38..18dff779a70 100644 --- a/src/Common/ZooKeeper/ZooKeeperArgs.cpp +++ b/src/Common/ZooKeeper/ZooKeeperArgs.cpp @@ -5,6 +5,9 @@ #include #include #include +#include +#include +#include #include namespace DB @@ -53,6 +56,7 @@ ZooKeeperArgs::ZooKeeperArgs(const Poco::Util::AbstractConfiguration & config, c ZooKeeperArgs::ZooKeeperArgs(const String & hosts_string) { splitInto<','>(hosts, hosts_string); + availability_zones.resize(hosts.size()); } void ZooKeeperArgs::initFromKeeperServerSection(const Poco::Util::AbstractConfiguration & config) @@ -103,8 +107,11 @@ void ZooKeeperArgs::initFromKeeperServerSection(const Poco::Util::AbstractConfig for (const auto & key : keys) { if (startsWith(key, "server")) + { hosts.push_back( (secure ? "secure://" : "") + config.getString(raft_configuration_key + "." + key + ".hostname") + ":" + tcp_port); + availability_zones.push_back(config.getString(raft_configuration_key + "." + key + ".availability_zone", "")); + } } static constexpr std::array load_balancing_keys @@ -123,11 +130,15 @@ void ZooKeeperArgs::initFromKeeperServerSection(const Poco::Util::AbstractConfig auto load_balancing = magic_enum::enum_cast(Poco::toUpper(load_balancing_str)); if (!load_balancing) throw DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS, "Unknown load balancing: {}", load_balancing_str); - get_priority_load_balancing.load_balancing = *load_balancing; + get_priority_load_balancing = DB::GetPriorityForLoadBalancing(*load_balancing, thread_local_rng() % hosts.size()); break; } } + availability_zone_autodetect = config.getBool(std::string{config_name} + ".availability_zone_autodetect", false); + prefer_local_availability_zone = config.getBool(std::string{config_name} + ".prefer_local_availability_zone", false); + if (prefer_local_availability_zone) + client_availability_zone = DB::PlacementInfo::PlacementInfo::instance().getAvailabilityZone(); } void ZooKeeperArgs::initFromKeeperSection(const Poco::Util::AbstractConfiguration & config, const std::string & config_name) @@ -137,6 +148,8 @@ void ZooKeeperArgs::initFromKeeperSection(const Poco::Util::AbstractConfiguratio Poco::Util::AbstractConfiguration::Keys keys; config.keys(config_name, keys); + std::optional load_balancing; + for (const auto & key : keys) { if (key.starts_with("node")) @@ -144,6 +157,7 @@ void ZooKeeperArgs::initFromKeeperSection(const Poco::Util::AbstractConfiguratio hosts.push_back( (config.getBool(config_name + "." + key + ".secure", false) ? "secure://" : "") + config.getString(config_name + "." + key + ".host") + ":" + config.getString(config_name + "." 
+ key + ".port", "2181")); + availability_zones.push_back(config.getString(config_name + "." + key + ".availability_zone", "")); } else if (key == "session_timeout_ms") { @@ -199,6 +213,10 @@ void ZooKeeperArgs::initFromKeeperSection(const Poco::Util::AbstractConfiguratio { sessions_path = config.getString(config_name + "." + key); } + else if (key == "prefer_local_availability_zone") + { + prefer_local_availability_zone = config.getBool(config_name + "." + key); + } else if (key == "implementation") { implementation = config.getString(config_name + "." + key); @@ -207,10 +225,9 @@ void ZooKeeperArgs::initFromKeeperSection(const Poco::Util::AbstractConfiguratio { String load_balancing_str = config.getString(config_name + "." + key); /// Use magic_enum to avoid dependency from dbms (`SettingFieldLoadBalancingTraits::fromString(...)`) - auto load_balancing = magic_enum::enum_cast(Poco::toUpper(load_balancing_str)); + load_balancing = magic_enum::enum_cast(Poco::toUpper(load_balancing_str)); if (!load_balancing) throw DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS, "Unknown load balancing: {}", load_balancing_str); - get_priority_load_balancing.load_balancing = *load_balancing; } else if (key == "fallback_session_lifetime") { @@ -224,9 +241,19 @@ void ZooKeeperArgs::initFromKeeperSection(const Poco::Util::AbstractConfiguratio { use_compression = config.getBool(config_name + "." + key); } + else if (key == "availability_zone_autodetect") + { + availability_zone_autodetect = config.getBool(config_name + "." + key); + } else throw KeeperException(Coordination::Error::ZBADARGUMENTS, "Unknown key {} in config file", key); } + + if (load_balancing) + get_priority_load_balancing = DB::GetPriorityForLoadBalancing(*load_balancing, thread_local_rng() % hosts.size()); + + if (prefer_local_availability_zone) + client_availability_zone = DB::PlacementInfo::PlacementInfo::instance().getAvailabilityZone(); } } diff --git a/src/Common/ZooKeeper/ZooKeeperArgs.h b/src/Common/ZooKeeper/ZooKeeperArgs.h index 27ba173c0c3..945b77bf9c1 100644 --- a/src/Common/ZooKeeper/ZooKeeperArgs.h +++ b/src/Common/ZooKeeper/ZooKeeperArgs.h @@ -32,10 +32,12 @@ struct ZooKeeperArgs String zookeeper_name = "zookeeper"; String implementation = "zookeeper"; Strings hosts; + Strings availability_zones; String auth_scheme; String identity; String chroot; String sessions_path = "/clickhouse/sessions"; + String client_availability_zone; int32_t connection_timeout_ms = Coordination::DEFAULT_CONNECTION_TIMEOUT_MS; int32_t session_timeout_ms = Coordination::DEFAULT_SESSION_TIMEOUT_MS; int32_t operation_timeout_ms = Coordination::DEFAULT_OPERATION_TIMEOUT_MS; @@ -47,6 +49,8 @@ struct ZooKeeperArgs UInt64 send_sleep_ms = 0; UInt64 recv_sleep_ms = 0; bool use_compression = false; + bool prefer_local_availability_zone = false; + bool availability_zone_autodetect = false; SessionLifetimeConfiguration fallback_session_lifetime = {}; DB::GetPriorityForLoadBalancing get_priority_load_balancing; diff --git a/src/Common/ZooKeeper/ZooKeeperImpl.cpp b/src/Common/ZooKeeper/ZooKeeperImpl.cpp index ed7498b1ac9..8653af51308 100644 --- a/src/Common/ZooKeeper/ZooKeeperImpl.cpp +++ b/src/Common/ZooKeeper/ZooKeeperImpl.cpp @@ -23,6 +23,9 @@ #include #include +#include +#include + #include "Coordination/KeeperConstants.h" #include "config.h" @@ -338,7 +341,7 @@ ZooKeeper::~ZooKeeper() ZooKeeper::ZooKeeper( - const Nodes & nodes, + const zkutil::ShuffleHosts & nodes, const zkutil::ZooKeeperArgs & args_, std::shared_ptr zk_log_) : args(args_) @@ -426,7 +429,7 @@ 
ZooKeeper::ZooKeeper( void ZooKeeper::connect( - const Nodes & nodes, + const zkutil::ShuffleHosts & nodes, Poco::Timespan connection_timeout) { if (nodes.empty()) @@ -434,15 +437,51 @@ void ZooKeeper::connect( static constexpr size_t num_tries = 3; bool connected = false; + bool dns_error = false; + + size_t resolved_count = 0; + for (const auto & node : nodes) + { + try + { + const Poco::Net::SocketAddress host_socket_addr{node.host}; + LOG_TRACE(log, "Adding ZooKeeper host {} ({}), az: {}, priority: {}", node.host, host_socket_addr.toString(), node.az_info, node.priority); + node.address = host_socket_addr; + ++resolved_count; + } + catch (const Poco::Net::HostNotFoundException & e) + { + /// Most likely it's misconfiguration and wrong hostname was specified + LOG_ERROR(log, "Cannot use ZooKeeper host {}, reason: {}", node.host, e.displayText()); + } + catch (const Poco::Net::DNSException & e) + { + /// Most likely DNS is not available now + dns_error = true; + LOG_ERROR(log, "Cannot use ZooKeeper host {} due to DNS error: {}", node.host, e.displayText()); + } + } + + if (resolved_count == 0) + { + /// For DNS errors we throw exception with ZCONNECTIONLOSS code, so it will be considered as hardware error, not user error + if (dns_error) + throw zkutil::KeeperException::fromMessage( + Coordination::Error::ZCONNECTIONLOSS, "Cannot resolve any of provided ZooKeeper hosts due to DNS error"); + else + throw zkutil::KeeperException::fromMessage(Coordination::Error::ZCONNECTIONLOSS, "Cannot use any of provided ZooKeeper nodes"); + } WriteBufferFromOwnString fail_reasons; for (size_t try_no = 0; try_no < num_tries; ++try_no) { - for (size_t i = 0; i < nodes.size(); ++i) + for (const auto & node : nodes) { - const auto & node = nodes[i]; try { + if (!node.address) + continue; + /// Reset the state of previous attempt. if (node.secure) { @@ -458,7 +497,7 @@ void ZooKeeper::connect( socket = Poco::Net::StreamSocket(); } - socket.connect(node.address, connection_timeout); + socket.connect(*node.address, connection_timeout); socket_address = socket.peerAddress(); socket.setReceiveTimeout(args.operation_timeout_ms * 1000); @@ -498,27 +537,11 @@ void ZooKeeper::connect( } original_index = static_cast(node.original_index); - - if (i != 0) - { - std::uniform_int_distribution fallback_session_lifetime_distribution - { - args.fallback_session_lifetime.min_sec, - args.fallback_session_lifetime.max_sec, - }; - UInt32 session_lifetime_seconds = fallback_session_lifetime_distribution(thread_local_rng); - client_session_deadline = clock::now() + std::chrono::seconds(session_lifetime_seconds); - - LOG_DEBUG(log, "Connected to a suboptimal ZooKeeper host ({}, index {})." - " To preserve balance in ZooKeeper usage, this ZooKeeper session will expire in {} seconds", - node.address.toString(), i, session_lifetime_seconds); - } - break; } catch (...) 
{ - fail_reasons << "\n" << getCurrentExceptionMessage(false) << ", " << node.address.toString(); + fail_reasons << "\n" << getCurrentExceptionMessage(false) << ", " << node.address->toString(); } } @@ -532,6 +555,9 @@ void ZooKeeper::connect( bool first = true; for (const auto & node : nodes) { + if (!node.address) + continue; + if (first) first = false; else @@ -540,7 +566,7 @@ void ZooKeeper::connect( if (node.secure) message << "secure://"; - message << node.address.toString(); + message << node.address->toString(); } message << fail_reasons.str() << "\n"; @@ -1153,7 +1179,6 @@ void ZooKeeper::pushRequest(RequestInfo && info) { try { - checkSessionDeadline(); info.time = clock::now(); auto maybe_zk_log = std::atomic_load(&zk_log); if (maybe_zk_log) @@ -1201,44 +1226,44 @@ bool ZooKeeper::isFeatureEnabled(KeeperFeatureFlag feature_flag) const return keeper_feature_flags.isEnabled(feature_flag); } -void ZooKeeper::initFeatureFlags() +std::optional ZooKeeper::tryGetSystemZnode(const std::string & path, const std::string & description) { - const auto try_get = [&](const std::string & path, const std::string & description) -> std::optional + auto promise = std::make_shared>(); + auto future = promise->get_future(); + + auto callback = [promise](const Coordination::GetResponse & response) mutable { - auto promise = std::make_shared>(); - auto future = promise->get_future(); - - auto callback = [promise](const Coordination::GetResponse & response) mutable - { - promise->set_value(response); - }; - - get(path, std::move(callback), {}); - if (future.wait_for(std::chrono::milliseconds(args.operation_timeout_ms)) != std::future_status::ready) - throw Exception(Error::ZOPERATIONTIMEOUT, "Failed to get {}: timeout", description); - - auto response = future.get(); - - if (response.error == Coordination::Error::ZNONODE) - { - LOG_TRACE(log, "Failed to get {}", description); - return std::nullopt; - } - else if (response.error != Coordination::Error::ZOK) - { - throw Exception(response.error, "Failed to get {}", description); - } - - return std::move(response.data); + promise->set_value(response); }; - if (auto feature_flags = try_get(keeper_api_feature_flags_path, "feature flags"); feature_flags.has_value()) + get(path, std::move(callback), {}); + if (future.wait_for(std::chrono::milliseconds(args.operation_timeout_ms)) != std::future_status::ready) + throw Exception(Error::ZOPERATIONTIMEOUT, "Failed to get {}: timeout", description); + + auto response = future.get(); + + if (response.error == Coordination::Error::ZNONODE) + { + LOG_TRACE(log, "Failed to get {}", description); + return std::nullopt; + } + else if (response.error != Coordination::Error::ZOK) + { + throw Exception(response.error, "Failed to get {}", description); + } + + return std::move(response.data); +} + +void ZooKeeper::initFeatureFlags() +{ + if (auto feature_flags = tryGetSystemZnode(keeper_api_feature_flags_path, "feature flags"); feature_flags.has_value()) { keeper_feature_flags.setFeatureFlags(std::move(*feature_flags)); return; } - auto keeper_api_version_string = try_get(keeper_api_version_path, "API version"); + auto keeper_api_version_string = tryGetSystemZnode(keeper_api_version_path, "API version"); DB::KeeperApiVersion keeper_api_version{DB::KeeperApiVersion::ZOOKEEPER_COMPATIBLE}; @@ -1256,6 +1281,17 @@ void ZooKeeper::initFeatureFlags() keeper_feature_flags.fromApiVersion(keeper_api_version); } +String ZooKeeper::tryGetAvailabilityZone() +{ + auto res = tryGetSystemZnode(keeper_availability_zone_path, "availability 
zone"); + if (res) + { + LOG_TRACE(log, "Availability zone for ZooKeeper at {}: {}", getConnectedHostPort(), *res); + return *res; + } + return ""; +} + void ZooKeeper::executeGenericRequest( const ZooKeeperRequestPtr & request, @@ -1587,17 +1623,6 @@ void ZooKeeper::setupFaultDistributions() inject_setup.test_and_set(); } -void ZooKeeper::checkSessionDeadline() const -{ - if (unlikely(hasReachedDeadline())) - throw Exception::fromMessage(Error::ZSESSIONEXPIRED, "Session expired (force expiry client-side)"); -} - -bool ZooKeeper::hasReachedDeadline() const -{ - return client_session_deadline.has_value() && clock::now() >= client_session_deadline.value(); -} - void ZooKeeper::maybeInjectSendFault() { if (unlikely(inject_setup.test() && send_inject_fault && send_inject_fault.value()(thread_local_rng))) diff --git a/src/Common/ZooKeeper/ZooKeeperImpl.h b/src/Common/ZooKeeper/ZooKeeperImpl.h index 8fdf0f97d9d..0c88c35b381 100644 --- a/src/Common/ZooKeeper/ZooKeeperImpl.h +++ b/src/Common/ZooKeeper/ZooKeeperImpl.h @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -102,21 +103,12 @@ using namespace DB; class ZooKeeper final : public IKeeper { public: - struct Node - { - Poco::Net::SocketAddress address; - UInt8 original_index; - bool secure; - }; - - using Nodes = std::vector; - /** Connection to nodes is performed in order. If you want, shuffle them manually. * Operation timeout couldn't be greater than session timeout. * Operation timeout applies independently for network read, network write, waiting for events and synchronization. */ ZooKeeper( - const Nodes & nodes, + const zkutil::ShuffleHosts & nodes, const zkutil::ZooKeeperArgs & args_, std::shared_ptr zk_log_); @@ -130,9 +122,7 @@ public: String getConnectedHostPort() const override { return (original_index == -1) ? "" : args.hosts[original_index]; } int32_t getConnectionXid() const override { return next_xid.load(); } - /// A ZooKeeper session can have an optional deadline set on it. - /// After it has been reached, the session needs to be finalized. - bool hasReachedDeadline() const override; + String tryGetAvailabilityZone() override; /// Useful to check owner of ephemeral node. 
int64_t getSessionID() const override { return session_id; } @@ -271,7 +261,6 @@ private: clock::time_point time; }; - std::optional client_session_deadline {}; using RequestsQueue = ConcurrentBoundedQueue; RequestsQueue requests_queue{1024}; @@ -316,7 +305,7 @@ private: LoggerPtr log; void connect( - const Nodes & node, + const zkutil::ShuffleHosts & node, Poco::Timespan connection_timeout); void sendHandshake(); @@ -346,9 +335,10 @@ private: void logOperationIfNeeded(const ZooKeeperRequestPtr & request, const ZooKeeperResponsePtr & response = nullptr, bool finalize = false, UInt64 elapsed_microseconds = 0); + std::optional tryGetSystemZnode(const std::string & path, const std::string & description); + void initFeatureFlags(); - void checkSessionDeadline() const; CurrentMetrics::Increment active_session_metric_increment{CurrentMetrics::ZooKeeperSession}; std::shared_ptr zk_log; diff --git a/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp b/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp index 25d66b94b46..b3a1564b8ab 100644 --- a/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp +++ b/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp @@ -25,24 +25,24 @@ try Poco::Logger::root().setChannel(channel); Poco::Logger::root().setLevel("trace"); - std::string hosts_arg = argv[1]; - std::vector hosts_strings; - splitInto<','>(hosts_strings, hosts_arg); - ZooKeeper::Nodes nodes; - nodes.reserve(hosts_strings.size()); - for (size_t i = 0; i < hosts_strings.size(); ++i) + zkutil::ZooKeeperArgs args{argv[1]}; + zkutil::ShuffleHosts nodes; + nodes.reserve(args.hosts.size()); + for (size_t i = 0; i < args.hosts.size(); ++i) { - std::string host_string = hosts_strings[i]; - bool secure = startsWith(host_string, "secure://"); + zkutil::ShuffleHost node; + std::string host_string = args.hosts[i]; + node.secure = startsWith(host_string, "secure://"); - if (secure) + if (node.secure) host_string.erase(0, strlen("secure://")); - nodes.emplace_back(ZooKeeper::Node{Poco::Net::SocketAddress{host_string}, static_cast(i) , secure}); + node.host = host_string; + node.original_index = i; + + nodes.emplace_back(node); } - - zkutil::ZooKeeperArgs args; ZooKeeper zk(nodes, args, nullptr); Poco::Event event(true); diff --git a/src/Core/Settings.cpp b/src/Core/Settings.cpp index 8257b94cd9f..9c9c9c1db00 100644 --- a/src/Core/Settings.cpp +++ b/src/Core/Settings.cpp @@ -142,6 +142,7 @@ void Settings::applyCompatibilitySetting(const String & compatibility_value) return; ClickHouseVersion version(compatibility_value); + const auto & settings_changes_history = getSettingsChangesHistory(); /// Iterate through ClickHouse version in descending order and apply reversed /// changes for each version that is higher that version from compatibility setting for (auto it = settings_changes_history.rbegin(); it != settings_changes_history.rend(); ++it) diff --git a/src/Core/SettingsChangesHistory.cpp b/src/Core/SettingsChangesHistory.cpp new file mode 100644 index 00000000000..01db729be2e --- /dev/null +++ b/src/Core/SettingsChangesHistory.cpp @@ -0,0 +1,324 @@ +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int BAD_ARGUMENTS; + extern const int LOGICAL_ERROR; +} + +ClickHouseVersion::ClickHouseVersion(const String & version) +{ + Strings split; + boost::split(split, version, [](char c){ return c == '.'; }); + components.reserve(split.size()); + if (split.empty()) + throw Exception{ErrorCodes::BAD_ARGUMENTS, "Cannot 
parse ClickHouse version here: {}", version}; + + for (const auto & split_element : split) + { + size_t component; + ReadBufferFromString buf(split_element); + if (!tryReadIntText(component, buf) || !buf.eof()) + throw Exception{ErrorCodes::BAD_ARGUMENTS, "Cannot parse ClickHouse version here: {}", version}; + components.push_back(component); + } +} + +ClickHouseVersion::ClickHouseVersion(const char * version) + : ClickHouseVersion(String(version)) +{ +} + +String ClickHouseVersion::toString() const +{ + String version = std::to_string(components[0]); + for (size_t i = 1; i < components.size(); ++i) + version += "." + std::to_string(components[i]); + + return version; +} + +// clang-format off +/// History of settings changes that controls some backward incompatible changes +/// across all ClickHouse versions. It maps ClickHouse version to settings changes that were done +/// in this version. This history contains both changes to existing settings and newly added settings. +/// Settings changes is a vector of structs +/// {setting_name, previous_value, new_value, reason}. +/// For newly added setting choose the most appropriate previous_value (for example, if new setting +/// controls new feature and it's 'true' by default, use 'false' as previous_value). +/// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972) +/// Note: please check if the key already exists to prevent duplicate entries. +static std::initializer_list> settings_changes_history_initializer = +{ + {"24.7", {{"output_format_parquet_write_page_index", false, true, "Add a possibility to write page index into parquet files."}, + {"parallel_replicas_local_plan", false, true, "Use local plan for local replica in a query with parallel replicas"}, + }}, + {"24.6", {{"materialize_skip_indexes_on_insert", true, true, "Added new setting to allow to disable materialization of skip indexes on insert"}, + {"materialize_statistics_on_insert", true, true, "Added new setting to allow to disable materialization of statistics on insert"}, + {"input_format_parquet_use_native_reader", false, false, "When reading Parquet files, to use native reader instead of arrow reader."}, + {"hdfs_throw_on_zero_files_match", false, false, "Allow to throw an error when ListObjects request cannot match any files in HDFS engine instead of empty query result"}, + {"azure_throw_on_zero_files_match", false, false, "Allow to throw an error when ListObjects request cannot match any files in AzureBlobStorage engine instead of empty query result"}, + {"s3_validate_request_settings", true, true, "Allow to disable S3 request settings validation"}, + {"allow_experimental_full_text_index", false, false, "Enable experimental full-text index"}, + {"azure_skip_empty_files", false, false, "Allow to skip empty files in azure table engine"}, + {"hdfs_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in HDFS table engine"}, + {"azure_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in AzureBlobStorage table engine"}, + {"s3_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in S3 table engine"}, + {"s3_max_part_number", 10000, 10000, "Maximum part number number for s3 upload part"}, + {"s3_max_single_operation_copy_size", 32 * 1024 * 1024, 32 * 1024 * 1024, "Maximum size for 
a single copy operation in s3"}, + {"input_format_parquet_max_block_size", 8192, DEFAULT_BLOCK_SIZE, "Increase block size for parquet reader."}, + {"input_format_parquet_prefer_block_bytes", 0, DEFAULT_BLOCK_SIZE * 256, "Average block bytes output by parquet reader."}, + {"enable_blob_storage_log", true, true, "Write information about blob storage operations to system.blob_storage_log table"}, + {"allow_deprecated_snowflake_conversion_functions", true, false, "Disabled deprecated functions snowflakeToDateTime[64] and dateTime[64]ToSnowflake."}, + {"allow_statistic_optimize", false, false, "Old setting which popped up here being renamed."}, + {"allow_experimental_statistic", false, false, "Old setting which popped up here being renamed."}, + {"allow_statistics_optimize", false, false, "The setting was renamed. The previous name is `allow_statistic_optimize`."}, + {"allow_experimental_statistics", false, false, "The setting was renamed. The previous name is `allow_experimental_statistic`."}, + {"enable_vertical_final", false, true, "Enable vertical final by default again after fixing bug"}, + {"parallel_replicas_custom_key_range_lower", 0, 0, "Add settings to control the range filter when using parallel replicas with dynamic shards"}, + {"parallel_replicas_custom_key_range_upper", 0, 0, "Add settings to control the range filter when using parallel replicas with dynamic shards. A value of 0 disables the upper limit"}, + {"output_format_pretty_display_footer_column_names", 0, 1, "Add a setting to display column names in the footer if there are many rows. Threshold value is controlled by output_format_pretty_display_footer_column_names_min_rows."}, + {"output_format_pretty_display_footer_column_names_min_rows", 0, 50, "Add a setting to control the threshold value for setting output_format_pretty_display_footer_column_names_min_rows. Default 50."}, + {"output_format_csv_serialize_tuple_into_separate_columns", true, true, "A new way of how interpret tuples in CSV format was added."}, + {"input_format_csv_deserialize_separate_columns_into_tuple", true, true, "A new way of how interpret tuples in CSV format was added."}, + {"input_format_csv_try_infer_strings_from_quoted_tuples", true, true, "A new way of how interpret tuples in CSV format was added."}, + {"input_format_json_ignore_key_case", false, false, "Ignore json key case while read json field from string."}, + }}, + {"24.5", {{"allow_deprecated_error_prone_window_functions", true, false, "Allow usage of deprecated error prone window functions (neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference)"}, + {"allow_experimental_join_condition", false, false, "Support join with inequal conditions which involve columns from both left and right table. e.g. t1.y < t2.y."}, + {"input_format_tsv_crlf_end_of_line", false, false, "Enables reading of CRLF line endings with TSV formats"}, + {"output_format_parquet_use_custom_encoder", false, true, "Enable custom Parquet encoder."}, + {"cross_join_min_rows_to_compress", 0, 10000000, "Minimal count of rows to compress block in CROSS JOIN. Zero value means - disable this threshold. This block is compressed when any of the two thresholds (by rows or by bytes) are reached."}, + {"cross_join_min_bytes_to_compress", 0, 1_GiB, "Minimal size of block to compress in CROSS JOIN. Zero value means - disable this threshold. 
This block is compressed when any of the two thresholds (by rows or by bytes) are reached."}, + {"http_max_chunk_size", 0, 0, "Internal limitation"}, + {"prefer_external_sort_block_bytes", 0, DEFAULT_BLOCK_SIZE * 256, "Prefer maximum block bytes for external sort, reduce the memory usage during merging."}, + {"input_format_force_null_for_omitted_fields", false, false, "Disable type-defaults for omitted fields when needed"}, + {"cast_string_to_dynamic_use_inference", false, false, "Add setting to allow converting String to Dynamic through parsing"}, + {"allow_experimental_dynamic_type", false, false, "Add new experimental Dynamic type"}, + {"azure_max_blocks_in_multipart_upload", 50000, 50000, "Maximum number of blocks in multipart upload for Azure."}, + }}, + {"24.4", {{"input_format_json_throw_on_bad_escape_sequence", true, true, "Allow to save JSON strings with bad escape sequences"}, + {"max_parsing_threads", 0, 0, "Add a separate setting to control number of threads in parallel parsing from files"}, + {"ignore_drop_queries_probability", 0, 0, "Allow to ignore drop queries in server with specified probability for testing purposes"}, + {"lightweight_deletes_sync", 2, 2, "The same as 'mutation_sync', but controls only execution of lightweight deletes"}, + {"query_cache_system_table_handling", "save", "throw", "The query cache no longer caches results of queries against system tables"}, + {"input_format_json_ignore_unnecessary_fields", false, true, "Ignore unnecessary fields and not parse them. Enabling this may not throw exceptions on json strings of invalid format or with duplicated fields"}, + {"input_format_hive_text_allow_variable_number_of_columns", false, true, "Ignore extra columns in Hive Text input (if file has more columns than expected) and treat missing fields in Hive Text input as default values."}, + {"allow_experimental_database_replicated", false, true, "Database engine Replicated is now in Beta stage"}, + {"temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds", (10 * 60 * 1000), (10 * 60 * 1000), "Wait time to lock cache for sapce reservation in temporary data in filesystem cache"}, + {"optimize_rewrite_sum_if_to_count_if", false, true, "Only available for the analyzer, where it works correctly"}, + {"azure_allow_parallel_part_upload", "true", "true", "Use multiple threads for azure multipart upload."}, + {"max_recursive_cte_evaluation_depth", DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, "Maximum limit on recursive CTE evaluation depth"}, + {"query_plan_convert_outer_join_to_inner_join", false, true, "Allow to convert OUTER JOIN to INNER JOIN if filter after JOIN always filters default values"}, + }}, + {"24.3", {{"s3_connect_timeout_ms", 1000, 1000, "Introduce new dedicated setting for s3 connection timeout"}, + {"allow_experimental_shared_merge_tree", false, true, "The setting is obsolete"}, + {"use_page_cache_for_disks_without_file_cache", false, false, "Added userspace page cache"}, + {"read_from_page_cache_if_exists_otherwise_bypass_cache", false, false, "Added userspace page cache"}, + {"page_cache_inject_eviction", false, false, "Added userspace page cache"}, + {"default_table_engine", "None", "MergeTree", "Set default table engine to MergeTree for better usability"}, + {"input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects", false, false, "Allow to use String type for ambiguous paths during named tuple inference from JSON objects"}, + {"traverse_shadow_remote_data_paths", 
false, false, "Traverse shadow directory when query system.remote_data_paths."}, + {"throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert", false, true, "Deduplication is dependent materialized view cannot work together with async inserts."}, + {"parallel_replicas_allow_in_with_subquery", false, true, "If true, subquery for IN will be executed on every follower replica"}, + {"log_processors_profiles", false, true, "Enable by default"}, + {"function_locate_has_mysql_compatible_argument_order", false, true, "Increase compatibility with MySQL's locate function."}, + {"allow_suspicious_primary_key", true, false, "Forbid suspicious PRIMARY KEY/ORDER BY for MergeTree (i.e. SimpleAggregateFunction)"}, + {"filesystem_cache_reserve_space_wait_lock_timeout_milliseconds", 1000, 1000, "Wait time to lock cache for sapce reservation in filesystem cache"}, + {"max_parser_backtracks", 0, 1000000, "Limiting the complexity of parsing"}, + {"analyzer_compatibility_join_using_top_level_identifier", false, false, "Force to resolve identifier in JOIN USING from projection"}, + {"distributed_insert_skip_read_only_replicas", false, false, "If true, INSERT into Distributed will skip read-only replicas"}, + {"keeper_max_retries", 10, 10, "Max retries for general keeper operations"}, + {"keeper_retry_initial_backoff_ms", 100, 100, "Initial backoff timeout for general keeper operations"}, + {"keeper_retry_max_backoff_ms", 5000, 5000, "Max backoff timeout for general keeper operations"}, + {"s3queue_allow_experimental_sharded_mode", false, false, "Enable experimental sharded mode of S3Queue table engine. It is experimental because it will be rewritten"}, + {"allow_experimental_analyzer", false, true, "Enable analyzer and planner by default."}, + {"merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability", 0.0, 0.0, "For testing of `PartsSplitter` - split read ranges into intersecting and non intersecting every time you read from MergeTree with the specified probability."}, + {"allow_get_client_http_header", false, false, "Introduced a new function."}, + {"output_format_pretty_row_numbers", false, true, "It is better for usability."}, + {"output_format_pretty_max_value_width_apply_for_single_value", true, false, "Single values in Pretty formats won't be cut."}, + {"output_format_parquet_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."}, + {"output_format_orc_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."}, + {"output_format_arrow_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. 
While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."}, + {"output_format_parquet_compression_method", "lz4", "zstd", "Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools, such as 'duckdb', lack support for the faster `lz4` compression method, that's why we set zstd by default."}, + {"output_format_orc_compression_method", "lz4", "zstd", "Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools, such as 'duckdb', lack support for the faster `lz4` compression method, that's why we set zstd by default."}, + {"output_format_pretty_highlight_digit_groups", false, true, "If enabled and if output is a terminal, highlight every digit corresponding to the number of thousands, millions, etc. with underline."}, + {"geo_distance_returns_float64_on_float64_arguments", false, true, "Increase the default precision."}, + {"azure_max_inflight_parts_for_one_file", 20, 20, "The maximum number of a concurrent loaded parts in multipart upload request. 0 means unlimited."}, + {"azure_strict_upload_part_size", 0, 0, "The exact size of part to upload during multipart upload to Azure blob storage."}, + {"azure_min_upload_part_size", 16*1024*1024, 16*1024*1024, "The minimum size of part to upload during multipart upload to Azure blob storage."}, + {"azure_max_upload_part_size", 5ull*1024*1024*1024, 5ull*1024*1024*1024, "The maximum size of part to upload during multipart upload to Azure blob storage."}, + {"azure_upload_part_size_multiply_factor", 2, 2, "Multiply azure_min_upload_part_size by this factor each time azure_multiply_parts_count_threshold parts were uploaded from a single write to Azure blob storage."}, + {"azure_upload_part_size_multiply_parts_count_threshold", 500, 500, "Each time this number of parts was uploaded to Azure blob storage, azure_min_upload_part_size is multiplied by azure_upload_part_size_multiply_factor."}, + {"output_format_csv_serialize_tuple_into_separate_columns", true, true, "A new way of how interpret tuples in CSV format was added."}, + {"input_format_csv_deserialize_separate_columns_into_tuple", true, true, "A new way of how interpret tuples in CSV format was added."}, + {"input_format_csv_try_infer_strings_from_quoted_tuples", true, true, "A new way of how interpret tuples in CSV format was added."}, + }}, + {"24.2", {{"allow_suspicious_variant_types", true, false, "Don't allow creating Variant type with suspicious variants by default"}, + {"validate_experimental_and_suspicious_types_inside_nested_types", false, true, "Validate usage of experimental and suspicious types inside nested types"}, + {"output_format_values_escape_quote_with_quote", false, false, "If true escape ' with '', otherwise quoted with \\'"}, + {"output_format_pretty_single_large_number_tip_threshold", 0, 1'000'000, "Print a readable number tip on the right side of the table if the block consists of a single number which exceeds this value (except 0)"}, + {"input_format_try_infer_exponent_floats", true, false, "Don't infer floats in exponential notation by default"}, + {"query_plan_optimize_prewhere", true, true, "Allow to push down filter to PREWHERE expression for supported storages"}, + {"async_insert_max_data_size", 1000000, 10485760, "The previous value appeared to be too small."}, + {"async_insert_poll_timeout_ms", 10, 10, "Timeout in 
milliseconds for polling data from asynchronous insert queue"}, + {"async_insert_use_adaptive_busy_timeout", false, true, "Use adaptive asynchronous insert timeout"}, + {"async_insert_busy_timeout_min_ms", 50, 50, "The minimum value of the asynchronous insert timeout in milliseconds; it also serves as the initial value, which may be increased later by the adaptive algorithm"}, + {"async_insert_busy_timeout_max_ms", 200, 200, "The minimum value of the asynchronous insert timeout in milliseconds; async_insert_busy_timeout_ms is aliased to async_insert_busy_timeout_max_ms"}, + {"async_insert_busy_timeout_increase_rate", 0.2, 0.2, "The exponential growth rate at which the adaptive asynchronous insert timeout increases"}, + {"async_insert_busy_timeout_decrease_rate", 0.2, 0.2, "The exponential growth rate at which the adaptive asynchronous insert timeout decreases"}, + {"format_template_row_format", "", "", "Template row format string can be set directly in query"}, + {"format_template_resultset_format", "", "", "Template result set format string can be set in query"}, + {"split_parts_ranges_into_intersecting_and_non_intersecting_final", true, true, "Allow to split parts ranges into intersecting and non intersecting during FINAL optimization"}, + {"split_intersecting_parts_ranges_into_layers_final", true, true, "Allow to split intersecting parts ranges into layers during FINAL optimization"}, + {"azure_max_single_part_copy_size", 256*1024*1024, 256*1024*1024, "The maximum size of object to copy using single part copy to Azure blob storage."}, + {"min_external_table_block_size_rows", DEFAULT_INSERT_BLOCK_SIZE, DEFAULT_INSERT_BLOCK_SIZE, "Squash blocks passed to external table to specified size in rows, if blocks are not big enough"}, + {"min_external_table_block_size_bytes", DEFAULT_INSERT_BLOCK_SIZE * 256, DEFAULT_INSERT_BLOCK_SIZE * 256, "Squash blocks passed to external table to specified size in bytes, if blocks are not big enough."}, + {"parallel_replicas_prefer_local_join", true, true, "If true, and JOIN can be executed with parallel replicas algorithm, and all storages of right JOIN part are *MergeTree, local JOIN will be used instead of GLOBAL JOIN."}, + {"optimize_time_filter_with_preimage", true, true, "Optimize Date and DateTime predicates by converting functions into equivalent comparisons without conversions (e.g. toYear(col) = 2023 -> col >= '2023-01-01' AND col <= '2023-12-31')"}, + {"extract_key_value_pairs_max_pairs_per_row", 0, 0, "Max number of pairs that can be produced by the `extractKeyValuePairs` function. 
Used as a safeguard against consuming too much memory."}, + {"default_view_definer", "CURRENT_USER", "CURRENT_USER", "Allows to set default `DEFINER` option while creating a view"}, + {"default_materialized_view_sql_security", "DEFINER", "DEFINER", "Allows to set a default value for SQL SECURITY option when creating a materialized view"}, + {"default_normal_view_sql_security", "INVOKER", "INVOKER", "Allows to set default `SQL SECURITY` option while creating a normal view"}, + {"mysql_map_string_to_text_in_show_columns", false, true, "Reduce the configuration effort to connect ClickHouse with BI tools."}, + {"mysql_map_fixed_string_to_text_in_show_columns", false, true, "Reduce the configuration effort to connect ClickHouse with BI tools."}, + }}, + {"24.1", {{"print_pretty_type_names", false, true, "Better user experience."}, + {"input_format_json_read_bools_as_strings", false, true, "Allow to read bools as strings in JSON formats by default"}, + {"output_format_arrow_use_signed_indexes_for_dictionary", false, true, "Use signed indexes type for Arrow dictionaries by default as it's recommended"}, + {"allow_experimental_variant_type", false, false, "Add new experimental Variant type"}, + {"use_variant_as_common_type", false, false, "Allow to use Variant in if/multiIf if there is no common type"}, + {"output_format_arrow_use_64_bit_indexes_for_dictionary", false, false, "Allow to use 64 bit indexes type in Arrow dictionaries"}, + {"parallel_replicas_mark_segment_size", 128, 128, "Add new setting to control segment size in new parallel replicas coordinator implementation"}, + {"ignore_materialized_views_with_dropped_target_table", false, false, "Add new setting to allow to ignore materialized views with dropped target table"}, + {"output_format_compression_level", 3, 3, "Allow to change compression level in the query output"}, + {"output_format_compression_zstd_window_log", 0, 0, "Allow to change zstd window log in the query output when zstd compression is used"}, + {"enable_zstd_qat_codec", false, false, "Add new ZSTD_QAT codec"}, + {"enable_vertical_final", false, true, "Use vertical final by default"}, + {"output_format_arrow_use_64_bit_indexes_for_dictionary", false, false, "Allow to use 64 bit indexes type in Arrow dictionaries"}, + {"max_rows_in_set_to_optimize_join", 100000, 0, "Disable join optimization as it prevents from read in order optimization"}, + {"output_format_pretty_color", true, "auto", "Setting is changed to allow also for auto value, disabling ANSI escapes if output is not a tty"}, + {"function_visible_width_behavior", 0, 1, "We changed the default behavior of `visibleWidth` to be more precise"}, + {"max_estimated_execution_time", 0, 0, "Separate max_execution_time and max_estimated_execution_time"}, + {"iceberg_engine_ignore_schema_evolution", false, false, "Allow to ignore schema evolution in Iceberg table engine"}, + {"optimize_injective_functions_in_group_by", false, true, "Replace injective functions by it's arguments in GROUP BY section in analyzer"}, + {"update_insert_deduplication_token_in_dependent_materialized_views", false, false, "Allow to update insert deduplication token with table identifier during insert in dependent materialized views"}, + {"azure_max_unexpected_write_error_retries", 4, 4, "The maximum number of retries in case of unexpected errors during Azure blob storage write"}, + {"split_parts_ranges_into_intersecting_and_non_intersecting_final", false, true, "Allow to split parts ranges into intersecting and non intersecting during FINAL 
optimization"}, + {"split_intersecting_parts_ranges_into_layers_final", true, true, "Allow to split intersecting parts ranges into layers during FINAL optimization"}}}, + {"23.12", {{"allow_suspicious_ttl_expressions", true, false, "It is a new setting, and in previous versions the behavior was equivalent to allowing."}, + {"input_format_parquet_allow_missing_columns", false, true, "Allow missing columns in Parquet files by default"}, + {"input_format_orc_allow_missing_columns", false, true, "Allow missing columns in ORC files by default"}, + {"input_format_arrow_allow_missing_columns", false, true, "Allow missing columns in Arrow files by default"}}}, + {"23.11", {{"parsedatetime_parse_without_leading_zeros", false, true, "Improved compatibility with MySQL DATE_FORMAT/STR_TO_DATE"}}}, + {"23.9", {{"optimize_group_by_constant_keys", false, true, "Optimize group by constant keys by default"}, + {"input_format_json_try_infer_named_tuples_from_objects", false, true, "Try to infer named Tuples from JSON objects by default"}, + {"input_format_json_read_numbers_as_strings", false, true, "Allow to read numbers as strings in JSON formats by default"}, + {"input_format_json_read_arrays_as_strings", false, true, "Allow to read arrays as strings in JSON formats by default"}, + {"input_format_json_infer_incomplete_types_as_strings", false, true, "Allow to infer incomplete types as Strings in JSON formats by default"}, + {"input_format_json_try_infer_numbers_from_strings", true, false, "Don't infer numbers from strings in JSON formats by default to prevent possible parsing errors"}, + {"http_write_exception_in_output_format", false, true, "Output valid JSON/XML on exception in HTTP streaming."}}}, + {"23.8", {{"rewrite_count_distinct_if_with_count_distinct_implementation", false, true, "Rewrite countDistinctIf with count_distinct_implementation configuration"}}}, + {"23.7", {{"function_sleep_max_microseconds_per_block", 0, 3000000, "In previous versions, the maximum sleep time of 3 seconds was applied only for `sleep`, but not for `sleepEachRow` function. In the new version, we introduce this setting. If you set compatibility with the previous versions, we will disable the limit altogether."}}}, + {"23.6", {{"http_send_timeout", 180, 30, "3 minutes seems crazy long. Note that this is timeout for a single network write call, not for the whole upload operation."}, + {"http_receive_timeout", 180, 30, "See http_send_timeout."}}}, + {"23.5", {{"input_format_parquet_preserve_order", true, false, "Allow Parquet reader to reorder rows for better parallelism."}, + {"parallelize_output_from_storages", false, true, "Allow parallelism when executing queries that read from file/url/s3/etc. This may reorder rows."}, + {"use_with_fill_by_sorting_prefix", false, true, "Columns preceding WITH FILL columns in ORDER BY clause form sorting prefix. 
Rows with different values in sorting prefix are filled independently"}, + {"output_format_parquet_compliant_nested_types", false, true, "Change an internal field name in output Parquet file schema."}}}, + {"23.4", {{"allow_suspicious_indices", true, false, "If true, index can defined with identical expressions"}, + {"allow_nonconst_timezone_arguments", true, false, "Allow non-const timezone arguments in certain time-related functions like toTimeZone(), fromUnixTimestamp*(), snowflakeToDateTime*()."}, + {"connect_timeout_with_failover_ms", 50, 1000, "Increase default connect timeout because of async connect"}, + {"connect_timeout_with_failover_secure_ms", 100, 1000, "Increase default secure connect timeout because of async connect"}, + {"hedged_connection_timeout_ms", 100, 50, "Start new connection in hedged requests after 50 ms instead of 100 to correspond with previous connect timeout"}, + {"formatdatetime_f_prints_single_zero", true, false, "Improved compatibility with MySQL DATE_FORMAT()/STR_TO_DATE()"}, + {"formatdatetime_parsedatetime_m_is_month_name", false, true, "Improved compatibility with MySQL DATE_FORMAT/STR_TO_DATE"}}}, + {"23.3", {{"output_format_parquet_version", "1.0", "2.latest", "Use latest Parquet format version for output format"}, + {"input_format_json_ignore_unknown_keys_in_named_tuple", false, true, "Improve parsing JSON objects as named tuples"}, + {"input_format_native_allow_types_conversion", false, true, "Allow types conversion in Native input forma"}, + {"output_format_arrow_compression_method", "none", "lz4_frame", "Use lz4 compression in Arrow output format by default"}, + {"output_format_parquet_compression_method", "snappy", "lz4", "Use lz4 compression in Parquet output format by default"}, + {"output_format_orc_compression_method", "none", "lz4_frame", "Use lz4 compression in ORC output format by default"}, + {"async_query_sending_for_remote", false, true, "Create connections and send query async across shards"}}}, + {"23.2", {{"output_format_parquet_fixed_string_as_fixed_byte_array", false, true, "Use Parquet FIXED_LENGTH_BYTE_ARRAY type for FixedString by default"}, + {"output_format_arrow_fixed_string_as_fixed_byte_array", false, true, "Use Arrow FIXED_SIZE_BINARY type for FixedString by default"}, + {"query_plan_remove_redundant_distinct", false, true, "Remove redundant Distinct step in query plan"}, + {"optimize_duplicate_order_by_and_distinct", true, false, "Remove duplicate ORDER BY and DISTINCT if it's possible"}, + {"insert_keeper_max_retries", 0, 20, "Enable reconnections to Keeper on INSERT, improve reliability"}}}, + {"23.1", {{"input_format_json_read_objects_as_strings", 0, 1, "Enable reading nested json objects as strings while object type is experimental"}, + {"input_format_json_defaults_for_missing_elements_in_named_tuple", false, true, "Allow missing elements in JSON objects while reading named tuples by default"}, + {"input_format_csv_detect_header", false, true, "Detect header in CSV format by default"}, + {"input_format_tsv_detect_header", false, true, "Detect header in TSV format by default"}, + {"input_format_custom_detect_header", false, true, "Detect header in CustomSeparated format by default"}, + {"query_plan_remove_redundant_sorting", false, true, "Remove redundant sorting in query plan. 
For example, sorting steps related to ORDER BY clauses in subqueries"}}}, + {"22.12", {{"max_size_to_preallocate_for_aggregation", 10'000'000, 100'000'000, "This optimizes performance"}, + {"query_plan_aggregation_in_order", 0, 1, "Enable some refactoring around query plan"}, + {"format_binary_max_string_size", 0, 1_GiB, "Prevent allocating large amount of memory"}}}, + {"22.11", {{"use_structure_from_insertion_table_in_table_functions", 0, 2, "Improve using structure from insertion table in table functions"}}}, + {"22.9", {{"force_grouping_standard_compatibility", false, true, "Make GROUPING function output the same as in SQL standard and other DBMS"}}}, + {"22.7", {{"cross_to_inner_join_rewrite", 1, 2, "Force rewrite comma join to inner"}, + {"enable_positional_arguments", false, true, "Enable positional arguments feature by default"}, + {"format_csv_allow_single_quotes", true, false, "Most tools don't treat single quote in CSV specially, don't do it by default too"}}}, + {"22.6", {{"output_format_json_named_tuples_as_objects", false, true, "Allow to serialize named tuples as JSON objects in JSON formats by default"}, + {"input_format_skip_unknown_fields", false, true, "Optimize reading subset of columns for some input formats"}}}, + {"22.5", {{"memory_overcommit_ratio_denominator", 0, 1073741824, "Enable memory overcommit feature by default"}, + {"memory_overcommit_ratio_denominator_for_user", 0, 1073741824, "Enable memory overcommit feature by default"}}}, + {"22.4", {{"allow_settings_after_format_in_insert", true, false, "Do not allow SETTINGS after FORMAT for INSERT queries because ClickHouse interpret SETTINGS as some values, which is misleading"}}}, + {"22.3", {{"cast_ipv4_ipv6_default_on_conversion_error", true, false, "Make functions cast(value, 'IPv4') and cast(value, 'IPv6') behave same as toIPv4 and toIPv6 functions"}}}, + {"21.12", {{"stream_like_engine_allow_direct_select", true, false, "Do not allow direct select for Kafka/RabbitMQ/FileLog by default"}}}, + {"21.9", {{"output_format_decimal_trailing_zeros", true, false, "Do not output trailing zeros in text representation of Decimal types by default for better looking output"}, + {"use_hedged_requests", false, true, "Enable Hedged Requests feature by default"}}}, + {"21.7", {{"legacy_column_name_of_tuple_literal", true, false, "Add this setting only for compatibility reasons. It makes sense to set to 'true', while doing rolling update of cluster from version lower than 21.7 to higher"}}}, + {"21.5", {{"async_socket_for_remote", false, true, "Fix all problems and turn on asynchronous reads from socket for remote queries by default again"}}}, + {"21.3", {{"async_socket_for_remote", true, false, "Turn off asynchronous reads from socket for remote queries because of some problems"}, + {"optimize_normalize_count_variants", false, true, "Rewrite aggregate functions that semantically equals to count() as count() by default"}, + {"normalize_function_names", false, true, "Normalize function names to their canonical names, this was needed for projection query routing"}}}, + {"21.2", {{"enable_global_with_statement", false, true, "Propagate WITH statements to UNION queries and all subqueries by default"}}}, + {"21.1", {{"insert_quorum_parallel", false, true, "Use parallel quorum inserts by default. 
It is significantly more convenient to use than sequential quorum inserts"}, + {"input_format_null_as_default", false, true, "Allow to insert NULL as default for input formats by default"}, + {"optimize_on_insert", false, true, "Enable data optimization on INSERT by default for better user experience"}, + {"use_compact_format_in_distributed_parts_names", false, true, "Use compact format for async INSERT into Distributed tables by default"}}}, + {"20.10", {{"format_regexp_escaping_rule", "Escaped", "Raw", "Use Raw as default escaping rule for Regexp format to male the behaviour more like to what users expect"}}}, + {"20.7", {{"show_table_uuid_in_table_create_query_if_not_nil", true, false, "Stop showing UID of the table in its CREATE query for Engine=Atomic"}}}, + {"20.5", {{"input_format_with_names_use_header", false, true, "Enable using header with names for formats with WithNames/WithNamesAndTypes suffixes"}, + {"allow_suspicious_codecs", true, false, "Don't allow to specify meaningless compression codecs"}}}, + {"20.4", {{"validate_polygons", false, true, "Throw exception if polygon is invalid in function pointInPolygon by default instead of returning possibly wrong results"}}}, + {"19.18", {{"enable_scalar_subquery_optimization", false, true, "Prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once"}}}, + {"19.14", {{"any_join_distinct_right_table_keys", true, false, "Disable ANY RIGHT and ANY FULL JOINs by default to avoid inconsistency"}}}, + {"19.12", {{"input_format_defaults_for_omitted_fields", false, true, "Enable calculation of complex default expressions for omitted fields for some input formats, because it should be the expected behaviour"}}}, + {"19.5", {{"max_partitions_per_insert_block", 0, 100, "Add a limit for the number of partitions in one block"}}}, + {"18.12.17", {{"enable_optimize_predicate_expression", 0, 1, "Optimize predicates to subqueries by default"}}}, +}; + + +const std::map & getSettingsChangesHistory() +{ + static std::map settings_changes_history; + + static std::once_flag initialized_flag; + std::call_once(initialized_flag, []() + { + for (const auto & setting_change : settings_changes_history_initializer) + { + /// Disallow duplicate keys in the settings changes history. Example: + /// {"21.2", {{"some_setting_1", false, true, "[...]"}}}, + /// [...] + /// {"21.2", {{"some_setting_2", false, true, "[...]"}}}, + /// As std::set has unique keys, one of the entries would be overwritten. 
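For context on how the map built by `getSettingsChangesHistory` is consumed: `Settings::applyCompatibilitySetting` (earlier in this diff) walks the history from the newest version downwards and, for every version strictly above the requested `compatibility` target, re-applies each entry's `previous_value`. Below is a simplified sketch of that walk, assuming a plain pair-of-ints version and string values in place of the real `ClickHouseVersion` and `Field` types.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified analogues of ClickHouseVersion and a settings change entry.
using Version = std::pair<int, int>;   // e.g. {24, 3}
struct Change { std::string name, previous_value, new_value; };

// Re-apply previous values of every setting changed after `target`, newest first,
// mirroring the descending iteration in Settings::applyCompatibilitySetting.
std::map<std::string, std::string> applyCompatibility(
    const std::map<Version, std::vector<Change>> & history,
    std::map<std::string, std::string> current,
    Version target)
{
    for (auto it = history.rbegin(); it != history.rend(); ++it)
    {
        if (!(target < it->first))   // versions <= target keep their current values
            break;
        for (const auto & change : it->second)
            current[change.name] = change.previous_value;
    }
    return current;
}

int main()
{
    // Two entries taken from the history above, with their previous/new values.
    std::map<Version, std::vector<Change>> history = {
        {{24, 2}, {{"async_insert_max_data_size", "1000000", "10485760"}}},
        {{24, 3}, {{"allow_experimental_analyzer", "false", "true"}}},
    };

    std::map<std::string, std::string> settings = {
        {"async_insert_max_data_size", "10485760"},
        {"allow_experimental_analyzer", "true"},
    };

    // compatibility = 24.2: only the 24.3 change is rolled back.
    for (const auto & [name, value] : applyCompatibility(history, settings, {24, 2}))
        std::cout << name << " = " << value << "\n";
}
```

Because that walk stops at the first version not above the target, the map has to stay ordered by version with exactly one entry per version, which is what the duplicate-version check that follows enforces.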
+ if (settings_changes_history.contains(setting_change.first)) + throw Exception{ErrorCodes::LOGICAL_ERROR, "Detected duplicate version '{}'", setting_change.first.toString()}; + + settings_changes_history[setting_change.first] = setting_change.second; + } + }); + + return settings_changes_history; +} +} diff --git a/src/Core/SettingsChangesHistory.h b/src/Core/SettingsChangesHistory.h index fddf41172c2..b1a69c3b6d6 100644 --- a/src/Core/SettingsChangesHistory.h +++ b/src/Core/SettingsChangesHistory.h @@ -1,62 +1,25 @@ #pragma once #include -#include -#include -#include -#include #include +#include namespace DB { -namespace ErrorCodes -{ - extern const int BAD_ARGUMENTS; -} - class ClickHouseVersion { public: - ClickHouseVersion(const String & version) /// NOLINT(google-explicit-constructor) - { - Strings split; - boost::split(split, version, [](char c){ return c == '.'; }); - components.reserve(split.size()); - if (split.empty()) - throw Exception{ErrorCodes::BAD_ARGUMENTS, "Cannot parse ClickHouse version here: {}", version}; + /// NOLINTBEGIN(google-explicit-constructor) + ClickHouseVersion(const String & version); + ClickHouseVersion(const char * version); + /// NOLINTEND(google-explicit-constructor) - for (const auto & split_element : split) - { - size_t component; - ReadBufferFromString buf(split_element); - if (!tryReadIntText(component, buf) || !buf.eof()) - throw Exception{ErrorCodes::BAD_ARGUMENTS, "Cannot parse ClickHouse version here: {}", version}; - components.push_back(component); - } - } + String toString() const; - ClickHouseVersion(const char * version) : ClickHouseVersion(String(version)) {} /// NOLINT(google-explicit-constructor) - - String toString() const - { - String version = std::to_string(components[0]); - for (size_t i = 1; i < components.size(); ++i) - version += "." + std::to_string(components[i]); - - return version; - } - - bool operator<(const ClickHouseVersion & other) const - { - return components < other.components; - } - - bool operator>=(const ClickHouseVersion & other) const - { - return components >= other.components; - } + bool operator<(const ClickHouseVersion & other) const { return components < other.components; } + bool operator>=(const ClickHouseVersion & other) const { return components >= other.components; } private: std::vector components; @@ -75,255 +38,6 @@ namespace SettingsChangesHistory using SettingsChanges = std::vector; } -// clang-format off -/// History of settings changes that controls some backward incompatible changes -/// across all ClickHouse versions. It maps ClickHouse version to settings changes that were done -/// in this version. This history contains both changes to existing settings and newly added settings. -/// Settings changes is a vector of structs -/// {setting_name, previous_value, new_value, reason}. -/// For newly added setting choose the most appropriate previous_value (for example, if new setting -/// controls new feature and it's 'true' by default, use 'false' as previous_value). 
-/// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972) -static const std::map settings_changes_history = -{ - {"24.7", {{"output_format_parquet_write_page_index", false, true, "Add a possibility to write page index into parquet files."}, - }}, - {"24.6", {{"materialize_skip_indexes_on_insert", true, true, "Added new setting to allow to disable materialization of skip indexes on insert"}, - {"materialize_statistics_on_insert", true, true, "Added new setting to allow to disable materialization of statistics on insert"}, - {"input_format_parquet_use_native_reader", false, false, "When reading Parquet files, to use native reader instead of arrow reader."}, - {"hdfs_throw_on_zero_files_match", false, false, "Allow to throw an error when ListObjects request cannot match any files in HDFS engine instead of empty query result"}, - {"azure_throw_on_zero_files_match", false, false, "Allow to throw an error when ListObjects request cannot match any files in AzureBlobStorage engine instead of empty query result"}, - {"s3_validate_request_settings", true, true, "Allow to disable S3 request settings validation"}, - {"allow_experimental_full_text_index", false, false, "Enable experimental full-text index"}, - {"azure_skip_empty_files", false, false, "Allow to skip empty files in azure table engine"}, - {"hdfs_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in HDFS table engine"}, - {"azure_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in AzureBlobStorage table engine"}, - {"s3_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in S3 table engine"}, - {"s3_max_part_number", 10000, 10000, "Maximum part number number for s3 upload part"}, - {"s3_max_single_operation_copy_size", 32 * 1024 * 1024, 32 * 1024 * 1024, "Maximum size for a single copy operation in s3"}, - {"input_format_parquet_max_block_size", 8192, DEFAULT_BLOCK_SIZE, "Increase block size for parquet reader."}, - {"input_format_parquet_prefer_block_bytes", 0, DEFAULT_BLOCK_SIZE * 256, "Average block bytes output by parquet reader."}, - {"enable_blob_storage_log", true, true, "Write information about blob storage operations to system.blob_storage_log table"}, - {"allow_deprecated_snowflake_conversion_functions", true, false, "Disabled deprecated functions snowflakeToDateTime[64] and dateTime[64]ToSnowflake."}, - {"allow_statistic_optimize", false, false, "Old setting which popped up here being renamed."}, - {"allow_experimental_statistic", false, false, "Old setting which popped up here being renamed."}, - {"allow_statistics_optimize", false, false, "The setting was renamed. The previous name is `allow_statistic_optimize`."}, - {"allow_experimental_statistics", false, false, "The setting was renamed. The previous name is `allow_experimental_statistic`."}, - {"enable_vertical_final", false, true, "Enable vertical final by default again after fixing bug"}, - {"parallel_replicas_custom_key_range_lower", 0, 0, "Add settings to control the range filter when using parallel replicas with dynamic shards"}, - {"parallel_replicas_custom_key_range_upper", 0, 0, "Add settings to control the range filter when using parallel replicas with dynamic shards. 
A value of 0 disables the upper limit"}, - {"output_format_pretty_display_footer_column_names", 0, 1, "Add a setting to display column names in the footer if there are many rows. Threshold value is controlled by output_format_pretty_display_footer_column_names_min_rows."}, - {"output_format_pretty_display_footer_column_names_min_rows", 0, 50, "Add a setting to control the threshold value for setting output_format_pretty_display_footer_column_names_min_rows. Default 50."}, - {"output_format_csv_serialize_tuple_into_separate_columns", true, true, "A new way of how interpret tuples in CSV format was added."}, - {"input_format_csv_deserialize_separate_columns_into_tuple", true, true, "A new way of how interpret tuples in CSV format was added."}, - {"input_format_csv_try_infer_strings_from_quoted_tuples", true, true, "A new way of how interpret tuples in CSV format was added."}, - {"input_format_json_ignore_key_case", false, false, "Ignore json key case while read json field from string."}, - {"parallel_replicas_local_plan", false, true, "Use local plan for local replica in a query with parallel replicas"}, - }}, - {"24.5", {{"allow_deprecated_error_prone_window_functions", true, false, "Allow usage of deprecated error prone window functions (neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference)"}, - {"allow_experimental_join_condition", false, false, "Support join with inequal conditions which involve columns from both left and right table. e.g. t1.y < t2.y."}, - {"input_format_tsv_crlf_end_of_line", false, false, "Enables reading of CRLF line endings with TSV formats"}, - {"output_format_parquet_use_custom_encoder", false, true, "Enable custom Parquet encoder."}, - {"cross_join_min_rows_to_compress", 0, 10000000, "Minimal count of rows to compress block in CROSS JOIN. Zero value means - disable this threshold. This block is compressed when any of the two thresholds (by rows or by bytes) are reached."}, - {"cross_join_min_bytes_to_compress", 0, 1_GiB, "Minimal size of block to compress in CROSS JOIN. Zero value means - disable this threshold. 
This block is compressed when any of the two thresholds (by rows or by bytes) are reached."}, - {"http_max_chunk_size", 0, 0, "Internal limitation"}, - {"prefer_external_sort_block_bytes", 0, DEFAULT_BLOCK_SIZE * 256, "Prefer maximum block bytes for external sort, reduce the memory usage during merging."}, - {"input_format_force_null_for_omitted_fields", false, false, "Disable type-defaults for omitted fields when needed"}, - {"cast_string_to_dynamic_use_inference", false, false, "Add setting to allow converting String to Dynamic through parsing"}, - {"allow_experimental_dynamic_type", false, false, "Add new experimental Dynamic type"}, - {"azure_max_blocks_in_multipart_upload", 50000, 50000, "Maximum number of blocks in multipart upload for Azure."}, - }}, - {"24.4", {{"input_format_json_throw_on_bad_escape_sequence", true, true, "Allow to save JSON strings with bad escape sequences"}, - {"max_parsing_threads", 0, 0, "Add a separate setting to control number of threads in parallel parsing from files"}, - {"ignore_drop_queries_probability", 0, 0, "Allow to ignore drop queries in server with specified probability for testing purposes"}, - {"lightweight_deletes_sync", 2, 2, "The same as 'mutation_sync', but controls only execution of lightweight deletes"}, - {"query_cache_system_table_handling", "save", "throw", "The query cache no longer caches results of queries against system tables"}, - {"input_format_json_ignore_unnecessary_fields", false, true, "Ignore unnecessary fields and not parse them. Enabling this may not throw exceptions on json strings of invalid format or with duplicated fields"}, - {"input_format_hive_text_allow_variable_number_of_columns", false, true, "Ignore extra columns in Hive Text input (if file has more columns than expected) and treat missing fields in Hive Text input as default values."}, - {"allow_experimental_database_replicated", false, true, "Database engine Replicated is now in Beta stage"}, - {"temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds", (10 * 60 * 1000), (10 * 60 * 1000), "Wait time to lock cache for sapce reservation in temporary data in filesystem cache"}, - {"optimize_rewrite_sum_if_to_count_if", false, true, "Only available for the analyzer, where it works correctly"}, - {"azure_allow_parallel_part_upload", "true", "true", "Use multiple threads for azure multipart upload."}, - {"max_recursive_cte_evaluation_depth", DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, "Maximum limit on recursive CTE evaluation depth"}, - {"query_plan_convert_outer_join_to_inner_join", false, true, "Allow to convert OUTER JOIN to INNER JOIN if filter after JOIN always filters default values"}, - }}, - {"24.3", {{"s3_connect_timeout_ms", 1000, 1000, "Introduce new dedicated setting for s3 connection timeout"}, - {"allow_experimental_shared_merge_tree", false, true, "The setting is obsolete"}, - {"use_page_cache_for_disks_without_file_cache", false, false, "Added userspace page cache"}, - {"read_from_page_cache_if_exists_otherwise_bypass_cache", false, false, "Added userspace page cache"}, - {"page_cache_inject_eviction", false, false, "Added userspace page cache"}, - {"default_table_engine", "None", "MergeTree", "Set default table engine to MergeTree for better usability"}, - {"input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects", false, false, "Allow to use String type for ambiguous paths during named tuple inference from JSON objects"}, - {"traverse_shadow_remote_data_paths", 
false, false, "Traverse shadow directory when query system.remote_data_paths."}, - {"throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert", false, true, "Deduplication is dependent materialized view cannot work together with async inserts."}, - {"parallel_replicas_allow_in_with_subquery", false, true, "If true, subquery for IN will be executed on every follower replica"}, - {"log_processors_profiles", false, true, "Enable by default"}, - {"function_locate_has_mysql_compatible_argument_order", false, true, "Increase compatibility with MySQL's locate function."}, - {"allow_suspicious_primary_key", true, false, "Forbid suspicious PRIMARY KEY/ORDER BY for MergeTree (i.e. SimpleAggregateFunction)"}, - {"filesystem_cache_reserve_space_wait_lock_timeout_milliseconds", 1000, 1000, "Wait time to lock cache for sapce reservation in filesystem cache"}, - {"max_parser_backtracks", 0, 1000000, "Limiting the complexity of parsing"}, - {"analyzer_compatibility_join_using_top_level_identifier", false, false, "Force to resolve identifier in JOIN USING from projection"}, - {"distributed_insert_skip_read_only_replicas", false, false, "If true, INSERT into Distributed will skip read-only replicas"}, - {"keeper_max_retries", 10, 10, "Max retries for general keeper operations"}, - {"keeper_retry_initial_backoff_ms", 100, 100, "Initial backoff timeout for general keeper operations"}, - {"keeper_retry_max_backoff_ms", 5000, 5000, "Max backoff timeout for general keeper operations"}, - {"s3queue_allow_experimental_sharded_mode", false, false, "Enable experimental sharded mode of S3Queue table engine. It is experimental because it will be rewritten"}, - {"allow_experimental_analyzer", false, true, "Enable analyzer and planner by default."}, - {"merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability", 0.0, 0.0, "For testing of `PartsSplitter` - split read ranges into intersecting and non intersecting every time you read from MergeTree with the specified probability."}, - {"allow_get_client_http_header", false, false, "Introduced a new function."}, - {"output_format_pretty_row_numbers", false, true, "It is better for usability."}, - {"output_format_pretty_max_value_width_apply_for_single_value", true, false, "Single values in Pretty formats won't be cut."}, - {"output_format_parquet_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."}, - {"output_format_orc_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."}, - {"output_format_arrow_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. 
While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."}, - {"output_format_parquet_compression_method", "lz4", "zstd", "Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools, such as 'duckdb', lack support for the faster `lz4` compression method, that's why we set zstd by default."}, - {"output_format_orc_compression_method", "lz4", "zstd", "Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools, such as 'duckdb', lack support for the faster `lz4` compression method, that's why we set zstd by default."}, - {"output_format_pretty_highlight_digit_groups", false, true, "If enabled and if output is a terminal, highlight every digit corresponding to the number of thousands, millions, etc. with underline."}, - {"geo_distance_returns_float64_on_float64_arguments", false, true, "Increase the default precision."}, - {"azure_max_inflight_parts_for_one_file", 20, 20, "The maximum number of a concurrent loaded parts in multipart upload request. 0 means unlimited."}, - {"azure_strict_upload_part_size", 0, 0, "The exact size of part to upload during multipart upload to Azure blob storage."}, - {"azure_min_upload_part_size", 16*1024*1024, 16*1024*1024, "The minimum size of part to upload during multipart upload to Azure blob storage."}, - {"azure_max_upload_part_size", 5ull*1024*1024*1024, 5ull*1024*1024*1024, "The maximum size of part to upload during multipart upload to Azure blob storage."}, - {"azure_upload_part_size_multiply_factor", 2, 2, "Multiply azure_min_upload_part_size by this factor each time azure_multiply_parts_count_threshold parts were uploaded from a single write to Azure blob storage."}, - {"azure_upload_part_size_multiply_parts_count_threshold", 500, 500, "Each time this number of parts was uploaded to Azure blob storage, azure_min_upload_part_size is multiplied by azure_upload_part_size_multiply_factor."}, - {"output_format_csv_serialize_tuple_into_separate_columns", true, true, "A new way of how interpret tuples in CSV format was added."}, - {"input_format_csv_deserialize_separate_columns_into_tuple", true, true, "A new way of how interpret tuples in CSV format was added."}, - {"input_format_csv_try_infer_strings_from_quoted_tuples", true, true, "A new way of how interpret tuples in CSV format was added."}, - }}, - {"24.2", {{"allow_suspicious_variant_types", true, false, "Don't allow creating Variant type with suspicious variants by default"}, - {"validate_experimental_and_suspicious_types_inside_nested_types", false, true, "Validate usage of experimental and suspicious types inside nested types"}, - {"output_format_values_escape_quote_with_quote", false, false, "If true escape ' with '', otherwise quoted with \\'"}, - {"output_format_pretty_single_large_number_tip_threshold", 0, 1'000'000, "Print a readable number tip on the right side of the table if the block consists of a single number which exceeds this value (except 0)"}, - {"input_format_try_infer_exponent_floats", true, false, "Don't infer floats in exponential notation by default"}, - {"query_plan_optimize_prewhere", true, true, "Allow to push down filter to PREWHERE expression for supported storages"}, - {"async_insert_max_data_size", 1000000, 10485760, "The previous value appeared to be too small."}, - {"async_insert_poll_timeout_ms", 10, 10, "Timeout in 
milliseconds for polling data from asynchronous insert queue"}, - {"async_insert_use_adaptive_busy_timeout", false, true, "Use adaptive asynchronous insert timeout"}, - {"async_insert_busy_timeout_min_ms", 50, 50, "The minimum value of the asynchronous insert timeout in milliseconds; it also serves as the initial value, which may be increased later by the adaptive algorithm"}, - {"async_insert_busy_timeout_max_ms", 200, 200, "The minimum value of the asynchronous insert timeout in milliseconds; async_insert_busy_timeout_ms is aliased to async_insert_busy_timeout_max_ms"}, - {"async_insert_busy_timeout_increase_rate", 0.2, 0.2, "The exponential growth rate at which the adaptive asynchronous insert timeout increases"}, - {"async_insert_busy_timeout_decrease_rate", 0.2, 0.2, "The exponential growth rate at which the adaptive asynchronous insert timeout decreases"}, - {"format_template_row_format", "", "", "Template row format string can be set directly in query"}, - {"format_template_resultset_format", "", "", "Template result set format string can be set in query"}, - {"split_parts_ranges_into_intersecting_and_non_intersecting_final", true, true, "Allow to split parts ranges into intersecting and non intersecting during FINAL optimization"}, - {"split_intersecting_parts_ranges_into_layers_final", true, true, "Allow to split intersecting parts ranges into layers during FINAL optimization"}, - {"azure_max_single_part_copy_size", 256*1024*1024, 256*1024*1024, "The maximum size of object to copy using single part copy to Azure blob storage."}, - {"min_external_table_block_size_rows", DEFAULT_INSERT_BLOCK_SIZE, DEFAULT_INSERT_BLOCK_SIZE, "Squash blocks passed to external table to specified size in rows, if blocks are not big enough"}, - {"min_external_table_block_size_bytes", DEFAULT_INSERT_BLOCK_SIZE * 256, DEFAULT_INSERT_BLOCK_SIZE * 256, "Squash blocks passed to external table to specified size in bytes, if blocks are not big enough."}, - {"parallel_replicas_prefer_local_join", true, true, "If true, and JOIN can be executed with parallel replicas algorithm, and all storages of right JOIN part are *MergeTree, local JOIN will be used instead of GLOBAL JOIN."}, - {"optimize_time_filter_with_preimage", true, true, "Optimize Date and DateTime predicates by converting functions into equivalent comparisons without conversions (e.g. toYear(col) = 2023 -> col >= '2023-01-01' AND col <= '2023-12-31')"}, - {"extract_key_value_pairs_max_pairs_per_row", 0, 0, "Max number of pairs that can be produced by the `extractKeyValuePairs` function. 
Used as a safeguard against consuming too much memory."}, - {"default_view_definer", "CURRENT_USER", "CURRENT_USER", "Allows to set default `DEFINER` option while creating a view"}, - {"default_materialized_view_sql_security", "DEFINER", "DEFINER", "Allows to set a default value for SQL SECURITY option when creating a materialized view"}, - {"default_normal_view_sql_security", "INVOKER", "INVOKER", "Allows to set default `SQL SECURITY` option while creating a normal view"}, - {"mysql_map_string_to_text_in_show_columns", false, true, "Reduce the configuration effort to connect ClickHouse with BI tools."}, - {"mysql_map_fixed_string_to_text_in_show_columns", false, true, "Reduce the configuration effort to connect ClickHouse with BI tools."}, - }}, - {"24.1", {{"print_pretty_type_names", false, true, "Better user experience."}, - {"input_format_json_read_bools_as_strings", false, true, "Allow to read bools as strings in JSON formats by default"}, - {"output_format_arrow_use_signed_indexes_for_dictionary", false, true, "Use signed indexes type for Arrow dictionaries by default as it's recommended"}, - {"allow_experimental_variant_type", false, false, "Add new experimental Variant type"}, - {"use_variant_as_common_type", false, false, "Allow to use Variant in if/multiIf if there is no common type"}, - {"output_format_arrow_use_64_bit_indexes_for_dictionary", false, false, "Allow to use 64 bit indexes type in Arrow dictionaries"}, - {"parallel_replicas_mark_segment_size", 128, 128, "Add new setting to control segment size in new parallel replicas coordinator implementation"}, - {"ignore_materialized_views_with_dropped_target_table", false, false, "Add new setting to allow to ignore materialized views with dropped target table"}, - {"output_format_compression_level", 3, 3, "Allow to change compression level in the query output"}, - {"output_format_compression_zstd_window_log", 0, 0, "Allow to change zstd window log in the query output when zstd compression is used"}, - {"enable_zstd_qat_codec", false, false, "Add new ZSTD_QAT codec"}, - {"enable_vertical_final", false, true, "Use vertical final by default"}, - {"output_format_arrow_use_64_bit_indexes_for_dictionary", false, false, "Allow to use 64 bit indexes type in Arrow dictionaries"}, - {"max_rows_in_set_to_optimize_join", 100000, 0, "Disable join optimization as it prevents from read in order optimization"}, - {"output_format_pretty_color", true, "auto", "Setting is changed to allow also for auto value, disabling ANSI escapes if output is not a tty"}, - {"function_visible_width_behavior", 0, 1, "We changed the default behavior of `visibleWidth` to be more precise"}, - {"max_estimated_execution_time", 0, 0, "Separate max_execution_time and max_estimated_execution_time"}, - {"iceberg_engine_ignore_schema_evolution", false, false, "Allow to ignore schema evolution in Iceberg table engine"}, - {"optimize_injective_functions_in_group_by", false, true, "Replace injective functions by it's arguments in GROUP BY section in analyzer"}, - {"update_insert_deduplication_token_in_dependent_materialized_views", false, false, "Allow to update insert deduplication token with table identifier during insert in dependent materialized views"}, - {"azure_max_unexpected_write_error_retries", 4, 4, "The maximum number of retries in case of unexpected errors during Azure blob storage write"}, - {"split_parts_ranges_into_intersecting_and_non_intersecting_final", false, true, "Allow to split parts ranges into intersecting and non intersecting during FINAL 
optimization"}, - {"split_intersecting_parts_ranges_into_layers_final", true, true, "Allow to split intersecting parts ranges into layers during FINAL optimization"}}}, - {"23.12", {{"allow_suspicious_ttl_expressions", true, false, "It is a new setting, and in previous versions the behavior was equivalent to allowing."}, - {"input_format_parquet_allow_missing_columns", false, true, "Allow missing columns in Parquet files by default"}, - {"input_format_orc_allow_missing_columns", false, true, "Allow missing columns in ORC files by default"}, - {"input_format_arrow_allow_missing_columns", false, true, "Allow missing columns in Arrow files by default"}}}, - {"23.9", {{"optimize_group_by_constant_keys", false, true, "Optimize group by constant keys by default"}, - {"input_format_json_try_infer_named_tuples_from_objects", false, true, "Try to infer named Tuples from JSON objects by default"}, - {"input_format_json_read_numbers_as_strings", false, true, "Allow to read numbers as strings in JSON formats by default"}, - {"input_format_json_read_arrays_as_strings", false, true, "Allow to read arrays as strings in JSON formats by default"}, - {"input_format_json_infer_incomplete_types_as_strings", false, true, "Allow to infer incomplete types as Strings in JSON formats by default"}, - {"input_format_json_try_infer_numbers_from_strings", true, false, "Don't infer numbers from strings in JSON formats by default to prevent possible parsing errors"}, - {"http_write_exception_in_output_format", false, true, "Output valid JSON/XML on exception in HTTP streaming."}}}, - {"23.8", {{"rewrite_count_distinct_if_with_count_distinct_implementation", false, true, "Rewrite countDistinctIf with count_distinct_implementation configuration"}}}, - {"23.7", {{"function_sleep_max_microseconds_per_block", 0, 3000000, "In previous versions, the maximum sleep time of 3 seconds was applied only for `sleep`, but not for `sleepEachRow` function. In the new version, we introduce this setting. If you set compatibility with the previous versions, we will disable the limit altogether."}}}, - {"23.6", {{"http_send_timeout", 180, 30, "3 minutes seems crazy long. Note that this is timeout for a single network write call, not for the whole upload operation."}, - {"http_receive_timeout", 180, 30, "See http_send_timeout."}}}, - {"23.5", {{"input_format_parquet_preserve_order", true, false, "Allow Parquet reader to reorder rows for better parallelism."}, - {"parallelize_output_from_storages", false, true, "Allow parallelism when executing queries that read from file/url/s3/etc. This may reorder rows."}, - {"use_with_fill_by_sorting_prefix", false, true, "Columns preceding WITH FILL columns in ORDER BY clause form sorting prefix. 
Rows with different values in sorting prefix are filled independently"}, - {"output_format_parquet_compliant_nested_types", false, true, "Change an internal field name in output Parquet file schema."}}}, - {"23.4", {{"allow_suspicious_indices", true, false, "If true, index can defined with identical expressions"}, - {"allow_nonconst_timezone_arguments", true, false, "Allow non-const timezone arguments in certain time-related functions like toTimeZone(), fromUnixTimestamp*(), snowflakeToDateTime*()."}, - {"connect_timeout_with_failover_ms", 50, 1000, "Increase default connect timeout because of async connect"}, - {"connect_timeout_with_failover_secure_ms", 100, 1000, "Increase default secure connect timeout because of async connect"}, - {"hedged_connection_timeout_ms", 100, 50, "Start new connection in hedged requests after 50 ms instead of 100 to correspond with previous connect timeout"}}}, - {"23.3", {{"output_format_parquet_version", "1.0", "2.latest", "Use latest Parquet format version for output format"}, - {"input_format_json_ignore_unknown_keys_in_named_tuple", false, true, "Improve parsing JSON objects as named tuples"}, - {"input_format_native_allow_types_conversion", false, true, "Allow types conversion in Native input forma"}, - {"output_format_arrow_compression_method", "none", "lz4_frame", "Use lz4 compression in Arrow output format by default"}, - {"output_format_parquet_compression_method", "snappy", "lz4", "Use lz4 compression in Parquet output format by default"}, - {"output_format_orc_compression_method", "none", "lz4_frame", "Use lz4 compression in ORC output format by default"}, - {"async_query_sending_for_remote", false, true, "Create connections and send query async across shards"}}}, - {"23.2", {{"output_format_parquet_fixed_string_as_fixed_byte_array", false, true, "Use Parquet FIXED_LENGTH_BYTE_ARRAY type for FixedString by default"}, - {"output_format_arrow_fixed_string_as_fixed_byte_array", false, true, "Use Arrow FIXED_SIZE_BINARY type for FixedString by default"}, - {"query_plan_remove_redundant_distinct", false, true, "Remove redundant Distinct step in query plan"}, - {"optimize_duplicate_order_by_and_distinct", true, false, "Remove duplicate ORDER BY and DISTINCT if it's possible"}, - {"insert_keeper_max_retries", 0, 20, "Enable reconnections to Keeper on INSERT, improve reliability"}}}, - {"23.1", {{"input_format_json_read_objects_as_strings", 0, 1, "Enable reading nested json objects as strings while object type is experimental"}, - {"input_format_json_defaults_for_missing_elements_in_named_tuple", false, true, "Allow missing elements in JSON objects while reading named tuples by default"}, - {"input_format_csv_detect_header", false, true, "Detect header in CSV format by default"}, - {"input_format_tsv_detect_header", false, true, "Detect header in TSV format by default"}, - {"input_format_custom_detect_header", false, true, "Detect header in CustomSeparated format by default"}, - {"query_plan_remove_redundant_sorting", false, true, "Remove redundant sorting in query plan. 
For example, sorting steps related to ORDER BY clauses in subqueries"}}}, - {"22.12", {{"max_size_to_preallocate_for_aggregation", 10'000'000, 100'000'000, "This optimizes performance"}, - {"query_plan_aggregation_in_order", 0, 1, "Enable some refactoring around query plan"}, - {"format_binary_max_string_size", 0, 1_GiB, "Prevent allocating large amount of memory"}}}, - {"22.11", {{"use_structure_from_insertion_table_in_table_functions", 0, 2, "Improve using structure from insertion table in table functions"}}}, - {"23.4", {{"formatdatetime_f_prints_single_zero", true, false, "Improved compatibility with MySQL DATE_FORMAT()/STR_TO_DATE()"}}}, - {"23.4", {{"formatdatetime_parsedatetime_m_is_month_name", false, true, "Improved compatibility with MySQL DATE_FORMAT/STR_TO_DATE"}}}, - {"23.11", {{"parsedatetime_parse_without_leading_zeros", false, true, "Improved compatibility with MySQL DATE_FORMAT/STR_TO_DATE"}}}, - {"22.9", {{"force_grouping_standard_compatibility", false, true, "Make GROUPING function output the same as in SQL standard and other DBMS"}}}, - {"22.7", {{"cross_to_inner_join_rewrite", 1, 2, "Force rewrite comma join to inner"}, - {"enable_positional_arguments", false, true, "Enable positional arguments feature by default"}, - {"format_csv_allow_single_quotes", true, false, "Most tools don't treat single quote in CSV specially, don't do it by default too"}}}, - {"22.6", {{"output_format_json_named_tuples_as_objects", false, true, "Allow to serialize named tuples as JSON objects in JSON formats by default"}, - {"input_format_skip_unknown_fields", false, true, "Optimize reading subset of columns for some input formats"}}}, - {"22.5", {{"memory_overcommit_ratio_denominator", 0, 1073741824, "Enable memory overcommit feature by default"}, - {"memory_overcommit_ratio_denominator_for_user", 0, 1073741824, "Enable memory overcommit feature by default"}}}, - {"22.4", {{"allow_settings_after_format_in_insert", true, false, "Do not allow SETTINGS after FORMAT for INSERT queries because ClickHouse interpret SETTINGS as some values, which is misleading"}}}, - {"22.3", {{"cast_ipv4_ipv6_default_on_conversion_error", true, false, "Make functions cast(value, 'IPv4') and cast(value, 'IPv6') behave same as toIPv4 and toIPv6 functions"}}}, - {"21.12", {{"stream_like_engine_allow_direct_select", true, false, "Do not allow direct select for Kafka/RabbitMQ/FileLog by default"}}}, - {"21.9", {{"output_format_decimal_trailing_zeros", true, false, "Do not output trailing zeros in text representation of Decimal types by default for better looking output"}, - {"use_hedged_requests", false, true, "Enable Hedged Requests feature by default"}}}, - {"21.7", {{"legacy_column_name_of_tuple_literal", true, false, "Add this setting only for compatibility reasons. 
It makes sense to set to 'true', while doing rolling update of cluster from version lower than 21.7 to higher"}}}, - {"21.5", {{"async_socket_for_remote", false, true, "Fix all problems and turn on asynchronous reads from socket for remote queries by default again"}}}, - {"21.3", {{"async_socket_for_remote", true, false, "Turn off asynchronous reads from socket for remote queries because of some problems"}, - {"optimize_normalize_count_variants", false, true, "Rewrite aggregate functions that semantically equals to count() as count() by default"}, - {"normalize_function_names", false, true, "Normalize function names to their canonical names, this was needed for projection query routing"}}}, - {"21.2", {{"enable_global_with_statement", false, true, "Propagate WITH statements to UNION queries and all subqueries by default"}}}, - {"21.1", {{"insert_quorum_parallel", false, true, "Use parallel quorum inserts by default. It is significantly more convenient to use than sequential quorum inserts"}, - {"input_format_null_as_default", false, true, "Allow to insert NULL as default for input formats by default"}, - {"optimize_on_insert", false, true, "Enable data optimization on INSERT by default for better user experience"}, - {"use_compact_format_in_distributed_parts_names", false, true, "Use compact format for async INSERT into Distributed tables by default"}}}, - {"20.10", {{"format_regexp_escaping_rule", "Escaped", "Raw", "Use Raw as default escaping rule for Regexp format to male the behaviour more like to what users expect"}}}, - {"20.7", {{"show_table_uuid_in_table_create_query_if_not_nil", true, false, "Stop showing UID of the table in its CREATE query for Engine=Atomic"}}}, - {"20.5", {{"input_format_with_names_use_header", false, true, "Enable using header with names for formats with WithNames/WithNamesAndTypes suffixes"}, - {"allow_suspicious_codecs", true, false, "Don't allow to specify meaningless compression codecs"}}}, - {"20.4", {{"validate_polygons", false, true, "Throw exception if polygon is invalid in function pointInPolygon by default instead of returning possibly wrong results"}}}, - {"19.18", {{"enable_scalar_subquery_optimization", false, true, "Prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once"}}}, - {"19.14", {{"any_join_distinct_right_table_keys", true, false, "Disable ANY RIGHT and ANY FULL JOINs by default to avoid inconsistency"}}}, - {"19.12", {{"input_format_defaults_for_omitted_fields", false, true, "Enable calculation of complex default expressions for omitted fields for some input formats, because it should be the expected behaviour"}}}, - {"19.5", {{"max_partitions_per_insert_block", 0, 100, "Add a limit for the number of partitions in one block"}}}, - {"18.12.17", {{"enable_optimize_predicate_expression", 0, 1, "Optimize predicates to subqueries by default"}}}, -}; +const std::map & getSettingsChangesHistory(); } diff --git a/src/IO/S3/Credentials.cpp b/src/IO/S3/Credentials.cpp index fa9d018eaa6..dfb7727fca4 100644 --- a/src/IO/S3/Credentials.cpp +++ b/src/IO/S3/Credentials.cpp @@ -9,6 +9,21 @@ namespace ErrorCodes extern const int UNSUPPORTED_METHOD; } +namespace S3 +{ + std::string tryGetRunningAvailabilityZone() + { + try + { + return getRunningAvailabilityZone(); + } + catch (...) 
+ { + tryLogCurrentException("tryGetRunningAvailabilityZone"); + return ""; + } + } +} } #if USE_AWS_S3 diff --git a/src/IO/S3/Credentials.h b/src/IO/S3/Credentials.h index b8698d9b302..95297ab0538 100644 --- a/src/IO/S3/Credentials.h +++ b/src/IO/S3/Credentials.h @@ -24,6 +24,7 @@ static inline constexpr char GCP_METADATA_SERVICE_ENDPOINT[] = "http://metadata. /// getRunningAvailabilityZone returns the availability zone of the underlying compute resources where the current process runs. std::string getRunningAvailabilityZone(); +std::string tryGetRunningAvailabilityZone(); class AWSEC2MetadataClient : public Aws::Internal::AWSHttpResourceClient { @@ -195,6 +196,7 @@ namespace DB namespace S3 { std::string getRunningAvailabilityZone(); +std::string tryGetRunningAvailabilityZone(); } } diff --git a/src/Interpreters/Cache/FileSegment.cpp b/src/Interpreters/Cache/FileSegment.cpp index 61a356fa3c3..838ca0b491e 100644 --- a/src/Interpreters/Cache/FileSegment.cpp +++ b/src/Interpreters/Cache/FileSegment.cpp @@ -187,13 +187,6 @@ size_t FileSegment::getDownloadedSize() const return downloaded_size; } -void FileSegment::setDownloadedSize(size_t delta) -{ - auto lk = lock(); - downloaded_size += delta; - assert(downloaded_size == std::filesystem::file_size(getPath())); -} - bool FileSegment::isDownloaded() const { auto lk = lock(); @@ -311,6 +304,11 @@ FileSegment::RemoteFileReaderPtr FileSegment::getRemoteFileReader() return remote_file_reader; } +FileSegment::LocalCacheWriterPtr FileSegment::getLocalCacheWriter() +{ + return cache_writer; +} + void FileSegment::resetRemoteFileReader() { auto lk = lock(); @@ -340,33 +338,31 @@ void FileSegment::setRemoteFileReader(RemoteFileReaderPtr remote_file_reader_) remote_file_reader = remote_file_reader_; } -void FileSegment::write(char * from, size_t size, size_t offset) +void FileSegment::write(char * from, size_t size, size_t offset_in_file) { ProfileEventTimeIncrement watch(ProfileEvents::FileSegmentWriteMicroseconds); - - if (!size) - throw Exception(ErrorCodes::LOGICAL_ERROR, "Writing zero size is not allowed"); - + auto file_segment_path = getPath(); { - auto lk = lock(); - assertIsDownloaderUnlocked("write", lk); - assertNotDetachedUnlocked(lk); - } + if (!size) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Writing zero size is not allowed"); - const auto file_segment_path = getPath(); + { + auto lk = lock(); + assertIsDownloaderUnlocked("write", lk); + assertNotDetachedUnlocked(lk); + } - { if (download_state != State::DOWNLOADING) throw Exception( ErrorCodes::LOGICAL_ERROR, "Expected DOWNLOADING state, got {}", stateToString(download_state)); const size_t first_non_downloaded_offset = getCurrentWriteOffset(); - if (offset != first_non_downloaded_offset) + if (offset_in_file != first_non_downloaded_offset) throw Exception( ErrorCodes::LOGICAL_ERROR, "Attempt to write {} bytes to offset: {}, but current write offset is {}", - size, offset, first_non_downloaded_offset); + size, offset_in_file, first_non_downloaded_offset); const size_t current_downloaded_size = getDownloadedSize(); chassert(reserved_size >= current_downloaded_size); @@ -396,10 +392,10 @@ void FileSegment::write(char * from, size_t size, size_t offset) #endif if (!cache_writer) - cache_writer = std::make_unique(file_segment_path, /* buf_size */0); + cache_writer = std::make_unique(getPath(), /* buf_size */0); /// Size is equal to offset as offset for write buffer points to data end. 
- cache_writer->set(from, size, /* offset */size); + cache_writer->set(from, /* size */size, /* offset */size); /// Reset the buffer when finished. SCOPE_EXIT({ cache_writer->set(nullptr, 0); }); /// Flush the buffer. @@ -435,7 +431,6 @@ void FileSegment::write(char * from, size_t size, size_t offset) } throw; - } catch (Exception & e) { @@ -445,7 +440,7 @@ void FileSegment::write(char * from, size_t size, size_t offset) throw; } - chassert(getCurrentWriteOffset() == offset + size); + chassert(getCurrentWriteOffset() == offset_in_file + size); } FileSegment::State FileSegment::wait(size_t offset) @@ -828,7 +823,7 @@ bool FileSegment::assertCorrectnessUnlocked(const FileSegmentGuard::Lock & lock) }; const auto file_path = getPath(); - if (segment_kind != FileSegmentKind::Temporary) + { std::lock_guard lk(write_mutex); if (downloaded_size == 0) diff --git a/src/Interpreters/Cache/FileSegment.h b/src/Interpreters/Cache/FileSegment.h index f28482a1ce4..d6b37b60dc1 100644 --- a/src/Interpreters/Cache/FileSegment.h +++ b/src/Interpreters/Cache/FileSegment.h @@ -48,7 +48,7 @@ friend class FileCache; /// Because of reserved_size in tryReserve(). public: using Key = FileCacheKey; using RemoteFileReaderPtr = std::shared_ptr; - using LocalCacheWriterPtr = std::unique_ptr; + using LocalCacheWriterPtr = std::shared_ptr; using Downloader = std::string; using DownloaderId = std::string; using Priority = IFileCachePriority; @@ -204,7 +204,7 @@ public: bool reserve(size_t size_to_reserve, size_t lock_wait_timeout_milliseconds, FileCacheReserveStat * reserve_stat = nullptr); /// Write data into reserved space. - void write(char * from, size_t size, size_t offset); + void write(char * from, size_t size, size_t offset_in_file); // Invariant: if state() != DOWNLOADING and remote file reader is present, the reader's // available() == 0, and getFileOffsetOfBufferEnd() == our getCurrentWriteOffset(). @@ -212,6 +212,7 @@ public: // The reader typically requires its internal_buffer to be assigned from the outside before // calling next(). RemoteFileReaderPtr getRemoteFileReader(); + LocalCacheWriterPtr getLocalCacheWriter(); RemoteFileReaderPtr extractRemoteFileReader(); @@ -219,8 +220,6 @@ public: void setRemoteFileReader(RemoteFileReaderPtr remote_file_reader_); - void setDownloadedSize(size_t delta); - void setDownloadFailed(); private: diff --git a/src/Interpreters/Cache/Metadata.cpp b/src/Interpreters/Cache/Metadata.cpp index 5ed4ccdbeca..1d23278a255 100644 --- a/src/Interpreters/Cache/Metadata.cpp +++ b/src/Interpreters/Cache/Metadata.cpp @@ -944,14 +944,7 @@ KeyMetadata::iterator LockedKey::removeFileSegmentImpl( try { const auto path = key_metadata->getFileSegmentPath(*file_segment); - if (file_segment->segment_kind == FileSegmentKind::Temporary) - { - /// FIXME: For temporary file segment the requirement is not as strong because - /// the implementation of "temporary data in cache" creates files in advance. 
- if (fs::exists(path)) - fs::remove(path); - } - else if (file_segment->downloaded_size == 0) + if (file_segment->downloaded_size == 0) { chassert(!fs::exists(path)); } diff --git a/src/Interpreters/Cache/WriteBufferToFileSegment.cpp b/src/Interpreters/Cache/WriteBufferToFileSegment.cpp index a593ebfdab2..e654d091561 100644 --- a/src/Interpreters/Cache/WriteBufferToFileSegment.cpp +++ b/src/Interpreters/Cache/WriteBufferToFileSegment.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include @@ -33,21 +34,20 @@ namespace } WriteBufferToFileSegment::WriteBufferToFileSegment(FileSegment * file_segment_) - : WriteBufferFromFileDecorator(std::make_unique(file_segment_->getPath())) + : WriteBufferFromFileBase(DBMS_DEFAULT_BUFFER_SIZE, nullptr, 0) , file_segment(file_segment_) , reserve_space_lock_wait_timeout_milliseconds(getCacheLockWaitTimeout()) { } WriteBufferToFileSegment::WriteBufferToFileSegment(FileSegmentsHolderPtr segment_holder_) - : WriteBufferFromFileDecorator( - segment_holder_->size() == 1 - ? std::make_unique(segment_holder_->front().getPath()) - : throw Exception(ErrorCodes::LOGICAL_ERROR, "WriteBufferToFileSegment can be created only from single segment")) + : WriteBufferFromFileBase(DBMS_DEFAULT_BUFFER_SIZE, nullptr, 0) , file_segment(&segment_holder_->front()) , segment_holder(std::move(segment_holder_)) , reserve_space_lock_wait_timeout_milliseconds(getCacheLockWaitTimeout()) { + if (segment_holder->size() != 1) + throw Exception(ErrorCodes::LOGICAL_ERROR, "WriteBufferToFileSegment can be created only from single segment"); } /// If it throws an exception, the file segment will be incomplete, so you should not use it in the future. @@ -82,9 +82,6 @@ void WriteBufferToFileSegment::nextImpl() reserve_stat_msg += fmt::format("{} hold {}, can release {}; ", toString(kind), ReadableSize(stat.non_releasable_size), ReadableSize(stat.releasable_size)); - if (std::filesystem::exists(file_segment->getPath())) - std::filesystem::remove(file_segment->getPath()); - throw Exception(ErrorCodes::NOT_ENOUGH_SPACE, "Failed to reserve {} bytes for {}: {}(segment info: {})", bytes_to_write, file_segment->getKind() == FileSegmentKind::Temporary ? "temporary file" : "the file in cache", @@ -95,17 +92,37 @@ void WriteBufferToFileSegment::nextImpl() try { - SwapHelper swap(*this, *impl); /// Write data to the underlying buffer. - impl->next(); + file_segment->write(working_buffer.begin(), bytes_to_write, written_bytes); + written_bytes += bytes_to_write; } catch (...) { LOG_WARNING(getLogger("WriteBufferToFileSegment"), "Failed to write to the underlying buffer ({})", file_segment->getInfoForLog()); throw; } +} - file_segment->setDownloadedSize(bytes_to_write); +void WriteBufferToFileSegment::finalizeImpl() +{ + next(); + auto cache_writer = file_segment->getLocalCacheWriter(); + if (cache_writer) + { + SwapHelper swap(*this, *cache_writer); + cache_writer->finalize(); + } +} + +void WriteBufferToFileSegment::sync() +{ + next(); + auto cache_writer = file_segment->getLocalCacheWriter(); + if (cache_writer) + { + SwapHelper swap(*this, *cache_writer); + cache_writer->sync(); + } } std::unique_ptr WriteBufferToFileSegment::getReadBufferImpl() @@ -114,7 +131,10 @@ std::unique_ptr WriteBufferToFileSegment::getReadBufferImpl() * because in case destructor called without `getReadBufferImpl` called, data won't be read. 
*/ finalize(); - return std::make_unique(file_segment->getPath()); + if (file_segment->getDownloadedSize() > 0) + return std::make_unique(file_segment->getPath()); + else + return std::make_unique(); } } diff --git a/src/Interpreters/Cache/WriteBufferToFileSegment.h b/src/Interpreters/Cache/WriteBufferToFileSegment.h index c4b0491f8c0..4719dd4be89 100644 --- a/src/Interpreters/Cache/WriteBufferToFileSegment.h +++ b/src/Interpreters/Cache/WriteBufferToFileSegment.h @@ -9,7 +9,7 @@ namespace DB class FileSegment; -class WriteBufferToFileSegment : public WriteBufferFromFileDecorator, public IReadableWriteBuffer +class WriteBufferToFileSegment : public WriteBufferFromFileBase, public IReadableWriteBuffer { public: explicit WriteBufferToFileSegment(FileSegment * file_segment_); @@ -17,6 +17,13 @@ public: void nextImpl() override; + std::string getFileName() const override { return file_segment->getPath(); } + + void sync() override; + +protected: + void finalizeImpl() override; + private: std::unique_ptr getReadBufferImpl() override; @@ -29,6 +36,7 @@ private: FileSegmentsHolderPtr segment_holder; const size_t reserve_space_lock_wait_timeout_milliseconds; + size_t written_bytes = 0; }; diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index 82e9afb1d36..b946c2cb21e 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -3402,8 +3402,6 @@ zkutil::ZooKeeperPtr Context::getZooKeeper() const const auto & config = shared->zookeeper_config ? *shared->zookeeper_config : getConfigRef(); if (!shared->zookeeper) shared->zookeeper = zkutil::ZooKeeper::create(config, zkutil::getZooKeeperConfigName(config), getZooKeeperLog()); - else if (shared->zookeeper->hasReachedDeadline()) - shared->zookeeper->finalize("ZooKeeper session has reached its deadline"); if (shared->zookeeper->expired()) { diff --git a/src/Interpreters/InterpreterSelectQuery.cpp b/src/Interpreters/InterpreterSelectQuery.cpp index c39c57d2169..90c484636ea 100644 --- a/src/Interpreters/InterpreterSelectQuery.cpp +++ b/src/Interpreters/InterpreterSelectQuery.cpp @@ -910,7 +910,7 @@ bool InterpreterSelectQuery::adjustParallelReplicasAfterAnalysis() UInt64 max_rows = maxBlockSizeByLimit(); if (settings.max_rows_to_read) max_rows = max_rows ? std::min(max_rows, settings.max_rows_to_read.value) : settings.max_rows_to_read; - query_info_copy.limit = max_rows; + query_info_copy.trivial_limit = max_rows; /// Apply filters to prewhere and add them to the query_info so we can filter out parts efficiently during row estimation applyFiltersToPrewhereInAnalysis(analysis_copy); @@ -2445,13 +2445,13 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc if (local_limits.local_limits.size_limits.max_rows != 0) { if (max_block_limited < local_limits.local_limits.size_limits.max_rows) - query_info.limit = max_block_limited; + query_info.trivial_limit = max_block_limited; else if (local_limits.local_limits.size_limits.max_rows < std::numeric_limits::max()) /// Ask to read just enough rows to make the max_rows limit effective (so it has a chance to be triggered). 
- query_info.limit = 1 + local_limits.local_limits.size_limits.max_rows; + query_info.trivial_limit = 1 + local_limits.local_limits.size_limits.max_rows; } else { - query_info.limit = max_block_limited; + query_info.trivial_limit = max_block_limited; } } diff --git a/src/Interpreters/TemporaryDataOnDisk.cpp b/src/Interpreters/TemporaryDataOnDisk.cpp index a74b5bba2b9..7f0fb8cd6ca 100644 --- a/src/Interpreters/TemporaryDataOnDisk.cpp +++ b/src/Interpreters/TemporaryDataOnDisk.cpp @@ -3,6 +3,8 @@ #include #include +#include +#include #include #include #include @@ -224,25 +226,37 @@ struct TemporaryFileStream::OutputWriter bool finalized = false; }; -TemporaryFileStream::Reader::Reader(const String & path, const Block & header_, size_t size) - : in_file_buf(path, size ? std::min(DBMS_DEFAULT_BUFFER_SIZE, size) : DBMS_DEFAULT_BUFFER_SIZE) - , in_compressed_buf(in_file_buf) - , in_reader(in_compressed_buf, header_, DBMS_TCP_PROTOCOL_VERSION) +TemporaryFileStream::Reader::Reader(const String & path_, const Block & header_, size_t size_) + : path(path_) + , size(size_ ? std::min(size_, DBMS_DEFAULT_BUFFER_SIZE) : DBMS_DEFAULT_BUFFER_SIZE) + , header(header_) { LOG_TEST(getLogger("TemporaryFileStream"), "Reading {} from {}", header_.dumpStructure(), path); } -TemporaryFileStream::Reader::Reader(const String & path, size_t size) - : in_file_buf(path, size ? std::min(DBMS_DEFAULT_BUFFER_SIZE, size) : DBMS_DEFAULT_BUFFER_SIZE) - , in_compressed_buf(in_file_buf) - , in_reader(in_compressed_buf, DBMS_TCP_PROTOCOL_VERSION) +TemporaryFileStream::Reader::Reader(const String & path_, size_t size_) + : path(path_) + , size(size_ ? std::min(size_, DBMS_DEFAULT_BUFFER_SIZE) : DBMS_DEFAULT_BUFFER_SIZE) { LOG_TEST(getLogger("TemporaryFileStream"), "Reading from {}", path); } Block TemporaryFileStream::Reader::read() { - return in_reader.read(); + if (!in_reader) + { + if (fs::exists(path)) + in_file_buf = std::make_unique(path, size); + else + in_file_buf = std::make_unique(); + + in_compressed_buf = std::make_unique(*in_file_buf); + if (header.has_value()) + in_reader = std::make_unique(*in_compressed_buf, header.value(), DBMS_TCP_PROTOCOL_VERSION); + else + in_reader = std::make_unique(*in_compressed_buf, DBMS_TCP_PROTOCOL_VERSION); + } + return in_reader->read(); } TemporaryFileStream::TemporaryFileStream(TemporaryFileOnDiskHolder file_, const Block & header_, TemporaryDataOnDisk * parent_) diff --git a/src/Interpreters/TemporaryDataOnDisk.h b/src/Interpreters/TemporaryDataOnDisk.h index 488eed70da9..d541c93e031 100644 --- a/src/Interpreters/TemporaryDataOnDisk.h +++ b/src/Interpreters/TemporaryDataOnDisk.h @@ -151,9 +151,13 @@ public: Block read(); - ReadBufferFromFile in_file_buf; - CompressedReadBuffer in_compressed_buf; - NativeReader in_reader; + const std::string path; + const size_t size; + const std::optional header; + + std::unique_ptr in_file_buf; + std::unique_ptr in_compressed_buf; + std::unique_ptr in_reader; }; struct Stat diff --git a/src/Planner/PlannerJoinTree.cpp b/src/Planner/PlannerJoinTree.cpp index 86ba5d33ebe..094562a2837 100644 --- a/src/Planner/PlannerJoinTree.cpp +++ b/src/Planner/PlannerJoinTree.cpp @@ -693,14 +693,14 @@ JoinTreeQueryPlan buildQueryPlanForTableExpression(QueryTreeNodePtr table_expres if (select_query_info.local_storage_limits.local_limits.size_limits.max_rows != 0) { if (max_block_size_limited < select_query_info.local_storage_limits.local_limits.size_limits.max_rows) - table_expression_query_info.limit = max_block_size_limited; + 
table_expression_query_info.trivial_limit = max_block_size_limited; /// Ask to read just enough rows to make the max_rows limit effective (so it has a chance to be triggered). else if (select_query_info.local_storage_limits.local_limits.size_limits.max_rows < std::numeric_limits::max()) - table_expression_query_info.limit = 1 + select_query_info.local_storage_limits.local_limits.size_limits.max_rows; + table_expression_query_info.trivial_limit = 1 + select_query_info.local_storage_limits.local_limits.size_limits.max_rows; } else { - table_expression_query_info.limit = max_block_size_limited; + table_expression_query_info.trivial_limit = max_block_size_limited; } } @@ -912,10 +912,11 @@ JoinTreeQueryPlan buildQueryPlanForTableExpression(QueryTreeNodePtr table_expres { auto result_ptr = reading->selectRangesToRead(); UInt64 rows_to_read = result_ptr->selected_rows; + reading->setAnalyzedResult(std::move(result_ptr)); - if (table_expression_query_info.limit > 0 && table_expression_query_info.limit < rows_to_read) - rows_to_read = table_expression_query_info.limit; + if (table_expression_query_info.trivial_limit > 0 && table_expression_query_info.trivial_limit < rows_to_read) + rows_to_read = table_expression_query_info.trivial_limit; if (max_block_size_limited && (max_block_size_limited < rows_to_read)) rows_to_read = max_block_size_limited; diff --git a/src/Processors/QueryPlan/ReadFromMergeTree.cpp b/src/Processors/QueryPlan/ReadFromMergeTree.cpp index 08c6989242e..7d798e2a399 100644 --- a/src/Processors/QueryPlan/ReadFromMergeTree.cpp +++ b/src/Processors/QueryPlan/ReadFromMergeTree.cpp @@ -250,9 +250,9 @@ void ReadFromMergeTree::AnalysisResult::checkLimits(const Settings & settings, c { /// Fail fast if estimated number of rows to read exceeds the limit size_t total_rows_estimate = selected_rows; - if (query_info_.limit > 0 && total_rows_estimate > query_info_.limit) + if (query_info_.trivial_limit > 0 && total_rows_estimate > query_info_.trivial_limit) { - total_rows_estimate = query_info_.limit; + total_rows_estimate = query_info_.trivial_limit; } limits.check(total_rows_estimate, 0, "rows (controlled by 'max_rows_to_read' setting)", ErrorCodes::TOO_MANY_ROWS); leaf_limits.check( @@ -424,8 +424,8 @@ Pipe ReadFromMergeTree::readFromPool( { size_t total_rows = parts_with_range.getRowsCountAllParts(); - if (query_info.limit > 0 && query_info.limit < total_rows) - total_rows = query_info.limit; + if (query_info.trivial_limit > 0 && query_info.trivial_limit < total_rows) + total_rows = query_info.trivial_limit; const auto & settings = context->getSettingsRef(); @@ -462,7 +462,7 @@ Pipe ReadFromMergeTree::readFromPool( * Because time spend during filling per thread tasks can be greater than whole query * execution for big tables with small limit. */ - bool use_prefetched_read_pool = query_info.limit == 0 && (allow_prefetched_remote || allow_prefetched_local); + bool use_prefetched_read_pool = query_info.trivial_limit == 0 && (allow_prefetched_remote || allow_prefetched_local); if (use_prefetched_read_pool) { @@ -586,16 +586,13 @@ Pipe ReadFromMergeTree::readInOrder( context); } - /// Actually it means that parallel reading from replicas enabled and read snapshot is not local - - /// we can't rely on local snapshot - /// In this case we won't set approximate rows, because it will be accounted multiple times. - /// Also do not count amount of read rows if we read in order of sorting key, - /// because we don't know actual amount of read rows in case when limit is set. 
+ /// If parallel replicas enabled, set total rows in progress here only on initiator with local plan + /// Otherwise rows will counted multiple times const UInt64 in_order_limit = query_info.input_order_info ? query_info.input_order_info->limit : 0; const bool parallel_replicas_remote_plan_for_initiator = is_parallel_reading_from_replicas && !context->getSettingsRef().parallel_replicas_local_plan && context->canUseParallelReplicasOnInitiator(); const bool parallel_replicas_follower = is_parallel_reading_from_replicas && context->canUseParallelReplicasOnFollower(); - const bool set_total_rows_approx = !parallel_replicas_follower && !parallel_replicas_remote_plan_for_initiator && !in_order_limit; + const bool set_total_rows_approx = !parallel_replicas_follower && !parallel_replicas_remote_plan_for_initiator; Pipes pipes; for (size_t i = 0; i < parts_with_ranges.size(); ++i) @@ -603,8 +600,10 @@ Pipe ReadFromMergeTree::readInOrder( const auto & part_with_ranges = parts_with_ranges[i]; UInt64 total_rows = part_with_ranges.getRowsCount(); - if (query_info.limit > 0 && query_info.limit < total_rows) - total_rows = query_info.limit; + if (query_info.trivial_limit > 0 && query_info.trivial_limit < total_rows) + total_rows = query_info.trivial_limit; + else if (in_order_limit > 0 && in_order_limit < total_rows) + total_rows = in_order_limit; LOG_TRACE(log, "Reading {} ranges in{}order from part {}, approx. {} rows starting from {}", part_with_ranges.ranges.size(), diff --git a/src/Processors/QueryPlan/ReadFromSystemNumbersStep.cpp b/src/Processors/QueryPlan/ReadFromSystemNumbersStep.cpp index 5dbf6fa3318..eb974259c5e 100644 --- a/src/Processors/QueryPlan/ReadFromSystemNumbersStep.cpp +++ b/src/Processors/QueryPlan/ReadFromSystemNumbersStep.cpp @@ -393,7 +393,7 @@ ReadFromSystemNumbersStep::ReadFromSystemNumbersStep( , num_streams{num_streams_} , limit_length_and_offset(InterpreterSelectQuery::getLimitLengthAndOffset(query_info.query->as(), context)) , should_pushdown_limit(shouldPushdownLimit(query_info, limit_length_and_offset.first)) - , query_info_limit(query_info.limit) + , query_info_limit(query_info.trivial_limit) , storage_limits(query_info.storage_limits) { storage_snapshot->check(column_names); diff --git a/src/QueryPipeline/SizeLimits.cpp b/src/QueryPipeline/SizeLimits.cpp index 76832b1f951..4161f3f365f 100644 --- a/src/QueryPipeline/SizeLimits.cpp +++ b/src/QueryPipeline/SizeLimits.cpp @@ -2,7 +2,6 @@ #include #include #include -#include namespace ProfileEvents diff --git a/src/Server/CloudPlacementInfo.cpp b/src/Server/CloudPlacementInfo.cpp index 0790f825a45..d8810bb30de 100644 --- a/src/Server/CloudPlacementInfo.cpp +++ b/src/Server/CloudPlacementInfo.cpp @@ -11,6 +11,11 @@ namespace DB { +namespace ErrorCodes +{ +extern const int LOGICAL_ERROR; +} + namespace PlacementInfo { @@ -46,7 +51,15 @@ PlacementInfo & PlacementInfo::instance() } void PlacementInfo::initialize(const Poco::Util::AbstractConfiguration & config) +try { + if (!config.has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX)) + { + availability_zone = ""; + initialized = true; + return; + } + use_imds = config.getBool(getConfigPath("use_imds"), false); if (use_imds) @@ -67,14 +80,17 @@ void PlacementInfo::initialize(const Poco::Util::AbstractConfiguration & config) LOG_DEBUG(log, "Loaded info: availability_zone: {}", availability_zone); initialized = true; } +catch (...) 
+{ + tryLogCurrentException("Failed to get availability zone"); + availability_zone = ""; + initialized = true; +} std::string PlacementInfo::getAvailabilityZone() const { if (!initialized) - { - LOG_WARNING(log, "Placement info has not been loaded"); - return ""; - } + throw Exception(ErrorCodes::LOGICAL_ERROR, "Placement info has not been loaded"); return availability_zone; } diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index f9cc65871fe..fae2663f079 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -1759,11 +1759,14 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks, std::optional runner(getActivePartsLoadingThreadPool().get(), "ActiveParts"); + bool all_disks_are_readonly = true; for (size_t i = 0; i < disks.size(); ++i) { const auto & disk_ptr = disks[i]; if (disk_ptr->isBroken()) continue; + if (!disk_ptr->isReadOnly()) + all_disks_are_readonly = false; auto & disk_parts = parts_to_load_by_disk[i]; auto & unexpected_disk_parts = unexpected_parts_to_load_by_disk[i]; @@ -1916,7 +1919,6 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks, std::optionalrenameToDetached("broken-on-start"); /// detached parts must not have '_' in prefixes @@ -1961,7 +1963,8 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks, std::optionalgetSettingsRef().max_threads); UInt64 total_rows = result_ptr->selected_rows; - if (query_info.limit > 0 && query_info.limit < total_rows) - total_rows = query_info.limit; + if (query_info.trivial_limit > 0 && query_info.trivial_limit < total_rows) + total_rows = query_info.trivial_limit; return total_rows; } diff --git a/src/Storages/SelectQueryInfo.h b/src/Storages/SelectQueryInfo.h index 65c90fc1e6c..bdf69b9be15 100644 --- a/src/Storages/SelectQueryInfo.h +++ b/src/Storages/SelectQueryInfo.h @@ -229,8 +229,8 @@ struct SelectQueryInfo bool is_parameterized_view = false; bool optimize_trivial_count = false; - // If limit is not 0, that means it's a trivial limit query. - UInt64 limit = 0; + // If not 0, that means it's a trivial limit query. + UInt64 trivial_limit = 0; /// For IStorageSystemOneBlock std::vector columns_mask; diff --git a/src/Storages/StorageGenerateRandom.cpp b/src/Storages/StorageGenerateRandom.cpp index 2f850c76465..754bc096958 100644 --- a/src/Storages/StorageGenerateRandom.cpp +++ b/src/Storages/StorageGenerateRandom.cpp @@ -705,7 +705,7 @@ Pipe StorageGenerateRandom::read( } } - UInt64 query_limit = query_info.limit; + UInt64 query_limit = query_info.trivial_limit; if (query_limit && num_streams * max_block_size > query_limit) { /// We want to avoid spawning more streams than necessary @@ -717,7 +717,7 @@ Pipe StorageGenerateRandom::read( /// Will create more seed values for each source from initial seed. 
pcg64 generate(random_seed); - auto shared_state = std::make_shared(query_info.limit); + auto shared_state = std::make_shared(query_info.trivial_limit); for (UInt64 i = 0; i < num_streams; ++i) { diff --git a/src/Storages/System/StorageSystemSettingsChanges.cpp b/src/Storages/System/StorageSystemSettingsChanges.cpp index de47ec52031..d6c83870741 100644 --- a/src/Storages/System/StorageSystemSettingsChanges.cpp +++ b/src/Storages/System/StorageSystemSettingsChanges.cpp @@ -26,6 +26,7 @@ ColumnsDescription StorageSystemSettingsChanges::getColumnsDescription() void StorageSystemSettingsChanges::fillData(MutableColumns & res_columns, ContextPtr, const ActionsDAG::Node *, std::vector) const { + const auto & settings_changes_history = getSettingsChangesHistory(); for (auto it = settings_changes_history.rbegin(); it != settings_changes_history.rend(); ++it) { res_columns[0]->insert(it->first.toString()); diff --git a/src/Storages/System/StorageSystemZeros.cpp b/src/Storages/System/StorageSystemZeros.cpp index 09a2bb5d963..0720a2f24d9 100644 --- a/src/Storages/System/StorageSystemZeros.cpp +++ b/src/Storages/System/StorageSystemZeros.cpp @@ -109,8 +109,8 @@ Pipe StorageSystemZeros::read( storage_snapshot->check(column_names); UInt64 query_limit = limit ? *limit : 0; - if (query_info.limit) - query_limit = query_limit ? std::min(query_limit, query_info.limit) : query_info.limit; + if (query_info.trivial_limit) + query_limit = query_limit ? std::min(query_limit, query_info.trivial_limit) : query_info.trivial_limit; if (query_limit && query_limit < max_block_size) max_block_size = query_limit; diff --git a/src/Storages/System/StorageSystemZooKeeperConnection.cpp b/src/Storages/System/StorageSystemZooKeeperConnection.cpp index 950e20512c0..ec29b84dac3 100644 --- a/src/Storages/System/StorageSystemZooKeeperConnection.cpp +++ b/src/Storages/System/StorageSystemZooKeeperConnection.cpp @@ -36,7 +36,8 @@ ColumnsDescription StorageSystemZooKeeperConnection::getColumnsDescription() /* 9 */ {"xid", std::make_shared(), "XID of the current session."}, /* 10*/ {"enabled_feature_flags", std::make_shared(std::move(feature_flags_enum)), "Feature flags which are enabled. Only applicable to ClickHouse Keeper." 
- } + }, + /* 11*/ {"availability_zone", std::make_shared(), "Availability zone"}, }; } @@ -85,6 +86,7 @@ void StorageSystemZooKeeperConnection::fillData(MutableColumns & res_columns, Co columns[8]->insert(zookeeper->getClientID()); columns[9]->insert(zookeeper->getConnectionXid()); add_enabled_feature_flags(zookeeper); + columns[11]->insert(zookeeper->getConnectedHostAvailabilityZone()); } }; diff --git a/tests/ci/worker/prepare-ci-ami.sh b/tests/ci/worker/prepare-ci-ami.sh index 3e2f33c89d1..eb410ddcb00 100644 --- a/tests/ci/worker/prepare-ci-ami.sh +++ b/tests/ci/worker/prepare-ci-ami.sh @@ -9,7 +9,7 @@ set -xeuo pipefail echo "Running prepare script" export DEBIAN_FRONTEND=noninteractive -export RUNNER_VERSION=2.316.1 +export RUNNER_VERSION=2.317.0 export RUNNER_HOME=/home/ubuntu/actions-runner deb_arch() { @@ -54,7 +54,8 @@ apt-get install --yes --no-install-recommends \ python3-dev \ python3-pip \ qemu-user-static \ - unzip + unzip \ + gh # Install docker curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg @@ -101,7 +102,7 @@ sudo -u ubuntu docker buildx version sudo -u ubuntu docker buildx rm default-builder || : # if it's the second attempt sudo -u ubuntu docker buildx create --use --name default-builder -pip install boto3 pygithub requests urllib3 unidiff dohq-artifactory +pip install boto3 pygithub requests urllib3 unidiff dohq-artifactory jwt rm -rf $RUNNER_HOME # if it's the second attempt mkdir -p $RUNNER_HOME && cd $RUNNER_HOME @@ -212,9 +213,9 @@ chmod +x /usr/local/share/scripts/init-network.sh touch /var/tmp/clickhouse-ci-ami.success # END OF THE SCRIPT -# TOE description +# TOE (Task Orchestrator and Executor) description # name: CIInfrastructurePrepare -# description: instals the infrastructure for ClickHouse CI runners +# description: installs the infrastructure for ClickHouse CI runners # schemaVersion: 1.0 # # phases: diff --git a/tests/instructions/easy_tasks_sorted_ru.md b/tests/instructions/easy_tasks_sorted_ru.md index bc95e6b1c37..fbd86ebf08f 100644 --- a/tests/instructions/easy_tasks_sorted_ru.md +++ b/tests/instructions/easy_tasks_sorted_ru.md @@ -78,7 +78,7 @@ Upd: сделали по-другому: теперь всё безопасно. ## LEFT ONLY JOIN -## Функции makeDate, makeDateTime. +## + Функции makeDate, makeDateTime. `makeDate(year, month, day)` `makeDateTime(year, month, day, hour, minute, second, [timezone])` @@ -187,13 +187,13 @@ https://clickhouse.com/docs/en/operations/table_engines/external_data/ Не работает, если открыть clickhouse-client в интерактивном режиме и делать несколько запросов. -## + Настройка для возможности получить частичный результат при cancel-е. +## Настройка для возможности получить частичный результат при cancel-е. Хотим по Ctrl+C получить те данные, которые успели обработаться. ## Раскрытие кортежей в функциях высшего порядка. -## Табличная функция loop. +## + Табличная функция loop. 
`SELECT * FROM loop(database, table)` diff --git a/tests/integration/compose/docker_compose_ldap.yml b/tests/integration/compose/docker_compose_ldap.yml index f199516f315..1f50b34735d 100644 --- a/tests/integration/compose/docker_compose_ldap.yml +++ b/tests/integration/compose/docker_compose_ldap.yml @@ -15,7 +15,10 @@ services: ports: - ${LDAP_EXTERNAL_PORT:-1389}:${LDAP_INTERNAL_PORT:-1389} healthcheck: - test: "ldapsearch -x -b dc=example,dc=org cn > /dev/null" + test: > + ldapsearch -x -H ldap://localhost:$$LDAP_PORT_NUMBER -D $$LDAP_ADMIN_DN -w $$LDAP_ADMIN_PASSWORD -b $$LDAP_ROOT + | grep -c -E "member: cn=j(ohn|ane)doe" + | grep 2 >> /dev/null interval: 10s retries: 10 timeout: 2s diff --git a/tests/integration/helpers/cluster.py b/tests/integration/helpers/cluster.py index 41c162217d2..544b06cca1b 100644 --- a/tests/integration/helpers/cluster.py +++ b/tests/integration/helpers/cluster.py @@ -2640,7 +2640,9 @@ class ClickHouseCluster: [ "bash", "-c", - f"/opt/bitnami/openldap/bin/ldapsearch -x -H ldap://{self.ldap_host}:{self.ldap_port} -D cn=admin,dc=example,dc=org -w clickhouse -b dc=example,dc=org", + f"/opt/bitnami/openldap/bin/ldapsearch -x -H ldap://{self.ldap_host}:{self.ldap_port} -D cn=admin,dc=example,dc=org -w clickhouse -b dc=example,dc=org" + f'| grep -c -E "member: cn=j(ohn|ane)doe"' + f"| grep 2 >> /dev/null", ], user="root", ) diff --git a/tests/integration/test_zookeeper_config_load_balancing/configs/zookeeper_load_balancing2.xml b/tests/integration/test_zookeeper_config_load_balancing/configs/zookeeper_load_balancing2.xml new file mode 100644 index 00000000000..fd416cad505 --- /dev/null +++ b/tests/integration/test_zookeeper_config_load_balancing/configs/zookeeper_load_balancing2.xml @@ -0,0 +1,35 @@ + + + + random + + 1 + + + 0 + 1 + + + + zoo1 + 2181 + az1 + + + zoo2 + 2181 + az2 + + + zoo3 + 2181 + az3 + + 3000 + + + + 0 + az2 + + diff --git a/tests/integration/test_zookeeper_config_load_balancing/test.py b/tests/integration/test_zookeeper_config_load_balancing/test.py index f17e0c3f03f..9cdf7db2b08 100644 --- a/tests/integration/test_zookeeper_config_load_balancing/test.py +++ b/tests/integration/test_zookeeper_config_load_balancing/test.py @@ -1,6 +1,8 @@ +import time import pytest from helpers.cluster import ClickHouseCluster from helpers.network import PartitionManager +from helpers.test_tools import assert_eq_with_retry cluster = ClickHouseCluster( __file__, zookeeper_config_path="configs/zookeeper_load_balancing.xml" @@ -17,6 +19,10 @@ node3 = cluster.add_instance( "nod3", with_zookeeper=True, main_configs=["configs/zookeeper_load_balancing.xml"] ) +node4 = cluster.add_instance( + "nod4", with_zookeeper=True, main_configs=["configs/zookeeper_load_balancing2.xml"] +) + def change_balancing(old, new, reload=True): line = "{}<" @@ -405,113 +411,57 @@ def test_hostname_levenshtein_distance(started_cluster): def test_round_robin(started_cluster): pm = PartitionManager() try: - pm._add_rule( - { - "source": node1.ip_address, - "destination": cluster.get_instance_ip("zoo1"), - "action": "REJECT --reject-with tcp-reset", - } - ) - pm._add_rule( - { - "source": node2.ip_address, - "destination": cluster.get_instance_ip("zoo1"), - "action": "REJECT --reject-with tcp-reset", - } - ) - pm._add_rule( - { - "source": node3.ip_address, - "destination": cluster.get_instance_ip("zoo1"), - "action": "REJECT --reject-with tcp-reset", - } - ) change_balancing("random", "round_robin") - - print( - str( - node1.exec_in_container( - [ - "bash", - "-c", - "lsof -a -i4 -i6 -itcp 
diff --git a/tests/integration/test_zookeeper_config_load_balancing/test.py b/tests/integration/test_zookeeper_config_load_balancing/test.py
index f17e0c3f03f..9cdf7db2b08 100644
--- a/tests/integration/test_zookeeper_config_load_balancing/test.py
+++ b/tests/integration/test_zookeeper_config_load_balancing/test.py
@@ -1,6 +1,8 @@
+import time
 import pytest
 from helpers.cluster import ClickHouseCluster
 from helpers.network import PartitionManager
+from helpers.test_tools import assert_eq_with_retry
 
 cluster = ClickHouseCluster(
     __file__, zookeeper_config_path="configs/zookeeper_load_balancing.xml"
@@ -17,6 +19,10 @@ node3 = cluster.add_instance(
     "nod3", with_zookeeper=True, main_configs=["configs/zookeeper_load_balancing.xml"]
 )
 
+node4 = cluster.add_instance(
+    "nod4", with_zookeeper=True, main_configs=["configs/zookeeper_load_balancing2.xml"]
+)
+
 
 def change_balancing(old, new, reload=True):
     line = "<load_balancing>{}<"
@@ -405,113 +411,57 @@ def test_hostname_levenshtein_distance(started_cluster):
 def test_round_robin(started_cluster):
     pm = PartitionManager()
     try:
-        pm._add_rule(
-            {
-                "source": node1.ip_address,
-                "destination": cluster.get_instance_ip("zoo1"),
-                "action": "REJECT --reject-with tcp-reset",
-            }
-        )
-        pm._add_rule(
-            {
-                "source": node2.ip_address,
-                "destination": cluster.get_instance_ip("zoo1"),
-                "action": "REJECT --reject-with tcp-reset",
-            }
-        )
-        pm._add_rule(
-            {
-                "source": node3.ip_address,
-                "destination": cluster.get_instance_ip("zoo1"),
-                "action": "REJECT --reject-with tcp-reset",
-            }
-        )
         change_balancing("random", "round_robin")
-
-        print(
-            str(
-                node1.exec_in_container(
-                    [
-                        "bash",
-                        "-c",
-                        "lsof -a -i4 -i6 -itcp -w | grep ':2181' | grep ESTABLISHED",
-                    ],
-                    privileged=True,
-                    user="root",
-                )
+        for node in [node1, node2, node3]:
+            idx = int(
+                node.query("select index from system.zookeeper_connection").strip()
             )
-        )
-        assert (
-            "1"
-            == str(
-                node1.exec_in_container(
-                    [
-                        "bash",
-                        "-c",
-                        "lsof -a -i4 -i6 -itcp -w | grep 'testzookeeperconfigloadbalancing_zoo2_1.*testzookeeperconfigloadbalancing_default:2181' | grep ESTABLISHED | wc -l",
-                    ],
-                    privileged=True,
-                    user="root",
-                )
-            ).strip()
-        )
+            new_idx = (idx + 1) % 3
 
-        print(
-            str(
-                node2.exec_in_container(
-                    [
-                        "bash",
-                        "-c",
-                        "lsof -a -i4 -i6 -itcp -w | grep ':2181' | grep ESTABLISHED",
-                    ],
-                    privileged=True,
-                    user="root",
-                )
+            pm._add_rule(
+                {
+                    "source": node.ip_address,
+                    "destination": cluster.get_instance_ip("zoo" + str(idx + 1)),
+                    "action": "REJECT --reject-with tcp-reset",
+                }
             )
-        )
-        assert (
-            "1"
-            == str(
-                node2.exec_in_container(
-                    [
-                        "bash",
-                        "-c",
-                        "lsof -a -i4 -i6 -itcp -w | grep 'testzookeeperconfigloadbalancing_zoo2_1.*testzookeeperconfigloadbalancing_default:2181' | grep ESTABLISHED | wc -l",
-                    ],
-                    privileged=True,
-                    user="root",
-                )
-            ).strip()
-        )
-        print(
-            str(
-                node3.exec_in_container(
-                    [
-                        "bash",
-                        "-c",
-                        "lsof -a -i4 -i6 -itcp -w | grep ':2181' | grep ESTABLISHED",
-                    ],
-                    privileged=True,
-                    user="root",
-                )
+            assert_eq_with_retry(
+                node,
+                "select index from system.zookeeper_connection",
+                str(new_idx) + "\n",
             )
-        )
-        assert (
-            "1"
-            == str(
-                node3.exec_in_container(
-                    [
-                        "bash",
-                        "-c",
-                        "lsof -a -i4 -i6 -itcp -w | grep 'testzookeeperconfigloadbalancing_zoo2_1.*testzookeeperconfigloadbalancing_default:2181' | grep ESTABLISHED | wc -l",
-                    ],
-                    privileged=True,
-                    user="root",
-                )
-            ).strip()
-        )
-
+            pm.heal_all()
     finally:
         pm.heal_all()
         change_balancing("round_robin", "random", reload=False)
+
+
+def test_az(started_cluster):
+    pm = PartitionManager()
+    try:
+        # make sure it disconnects from the optimal node
+        pm._add_rule(
+            {
+                "source": node4.ip_address,
+                "destination": cluster.get_instance_ip("zoo2"),
+                "action": "REJECT --reject-with tcp-reset",
+            }
+        )
+
+        node4.query_with_retry("select * from system.zookeeper where path='/'")
+        assert "az2\n" != node4.query(
+            "select availability_zone from system.zookeeper_connection"
+        )
+
+        # fallback_session_lifetime.max is 1 second, but it shouldn't drop current session until the node becomes available
+
+        time.sleep(5)  # this is fine
+        assert 5 <= int(node4.query("select zookeeperSessionUptime()").strip())
+
+        pm.heal_all()
+        assert_eq_with_retry(
+            node4, "select availability_zone from system.zookeeper_connection", "az2\n"
+        )
+    finally:
+        pm.heal_all()
diff --git a/tests/integration/test_zookeeper_fallback_session/test.py b/tests/integration/test_zookeeper_fallback_session/test.py
index 9afabfa3da3..932bbe482d2 100644
--- a/tests/integration/test_zookeeper_fallback_session/test.py
+++ b/tests/integration/test_zookeeper_fallback_session/test.py
@@ -84,10 +84,28 @@ def test_fallback_session(started_cluster: ClickHouseCluster):
     )
 
     # at this point network partitioning has been reverted.
-    # the nodes should switch to zoo1 automatically because of `in_order` load-balancing.
+    # the nodes should switch to zoo1 because of `in_order` load-balancing.
+    # otherwise they would connect to a random replica
+
+    # but there's no reason to reconnect because current session works
+    # and there's no "optimal" node with `in_order` load-balancing
+    # so we need to break the current session
+
     for node in [node1, node2, node3]:
-        assert_uses_zk_node(node, "zoo1")
+        assert_uses_zk_node(node, "zoo3")
+
+    with PartitionManager() as pm:
+        for node in started_cluster.instances.values():
+            pm._add_rule(
+                {
+                    "source": node.ip_address,
+                    "destination": cluster.get_instance_ip("zoo3"),
+                    "action": "REJECT --reject-with tcp-reset",
+                }
+            )
+
+        for node in [node1, node2, node3]:
+            assert_uses_zk_node(node, "zoo1")
 
     node1.query_with_retry("INSERT INTO simple VALUES ({0}, {0})".format(2))
     for node in [node2, node3]:
diff --git a/tests/queries/0_stateless/00731_long_merge_tree_select_opened_files.sh b/tests/queries/0_stateless/00731_long_merge_tree_select_opened_files.sh
index 1bb4dbd34de..af746c43da9 100755
--- a/tests/queries/0_stateless/00731_long_merge_tree_select_opened_files.sh
+++ b/tests/queries/0_stateless/00731_long_merge_tree_select_opened_files.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Tags: long, no-s3-storage
+# Tags: long, no-s3-storage, no-tsan
 # no-s3 because read FileOpen metric
 set -e
 
@@ -31,6 +31,6 @@ $CLICKHOUSE_CLIENT $settings -q "$touching_many_parts_query" &> /dev/null
 
 $CLICKHOUSE_CLIENT $settings -q "SYSTEM FLUSH LOGS"
 
-$CLICKHOUSE_CLIENT $settings -q "SELECT ProfileEvents['FileOpen'] as opened_files FROM system.query_log WHERE query='$touching_many_parts_query' and current_database = currentDatabase() ORDER BY event_time DESC, opened_files DESC LIMIT 1;"
+$CLICKHOUSE_CLIENT $settings -q "SELECT ProfileEvents['FileOpen'] as opened_files FROM system.query_log WHERE query = '$touching_many_parts_query' AND current_database = currentDatabase() AND event_date >= yesterday() ORDER BY event_time DESC, opened_files DESC LIMIT 1;"
 
 $CLICKHOUSE_CLIENT $settings -q "DROP TABLE IF EXISTS merge_tree_table;"
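For context on the query adjusted above: `system.query_log` keeps per-query counters in its `ProfileEvents` map and is partitioned by month on `event_date` by default, so adding an `event_date` condition lets the lookup prune old partitions instead of scanning the whole log. A hedged, illustrative version of such a lookup (the filters are placeholders, not part of the patch):

```bash
#!/usr/bin/env bash
# Illustration only: read the FileOpen counter for recently finished queries
# in the current database. Adjust the filters for the query of interest.
clickhouse-client --query "
    SELECT query, ProfileEvents['FileOpen'] AS opened_files
    FROM system.query_log
    WHERE event_date >= yesterday()
      AND type = 'QueryFinish'
      AND current_database = currentDatabase()
    ORDER BY event_time DESC
    LIMIT 5"
```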
diff --git a/tests/queries/0_stateless/00763_lock_buffer_long.sh b/tests/queries/0_stateless/00763_lock_buffer_long.sh
index 50680724149..046e4efaa85 100755
--- a/tests/queries/0_stateless/00763_lock_buffer_long.sh
+++ b/tests/queries/0_stateless/00763_lock_buffer_long.sh
@@ -1,5 +1,6 @@
 #!/usr/bin/env bash
-# Tags: long
+# Tags: long, no-s3-storage, no-msan, no-asan, no-tsan, no-debug
+# Some kind of stress test, it doesn't make sense to test in a non-release build
 
 set -e
 
@@ -15,7 +16,7 @@ ${CLICKHOUSE_CLIENT} --query="CREATE TABLE buffer_00763_2 (s String) ENGINE = Bu
 
 function thread1()
 {
-    seq 1 500 | sed -r -e 's/.+/DROP TABLE IF EXISTS mt_00763_2; CREATE TABLE mt_00763_2 (s String) ENGINE = MergeTree ORDER BY s; INSERT INTO mt_00763_2 SELECT toString(number) FROM numbers(10);/' | ${CLICKHOUSE_CLIENT} --multiquery --ignore-error ||:
+    seq 1 500 | sed -r -e 's/.+/DROP TABLE IF EXISTS mt_00763_2; CREATE TABLE mt_00763_2 (s String) ENGINE = MergeTree ORDER BY s; INSERT INTO mt_00763_2 SELECT toString(number) FROM numbers(10);/' | ${CLICKHOUSE_CLIENT} --fsync-metadata 0 --multiquery --ignore-error ||:
 }
 
 function thread2()
diff --git a/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.reference b/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.reference
index 8984d35930a..03ed07cf1a4 100644
--- a/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.reference
+++ b/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.reference
@@ -2,6 +2,4 @@ Instruction check fail. The CPU does not support SSSE3 instruction set.
 Instruction check fail. The CPU does not support SSE4.1 instruction set.
 Instruction check fail. The CPU does not support SSE4.2 instruction set.
 Instruction check fail. The CPU does not support POPCNT instruction set.
-: MADV_DONTNEED does not work (memset will be used instead)
-: (This is the expected behaviour if you are running under QEMU)
 1
diff --git a/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.sh b/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.sh
index 01047aeb9ab..c37f1f95374 100755
--- a/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.sh
+++ b/tests/queries/0_stateless/01103_check_cpu_instructions_at_startup.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Tags: no-tsan, no-asan, no-ubsan, no-msan, no-debug, no-fasttest, no-cpu-aarch64
-# Tag no-fasttest: avoid dependency on qemu -- invonvenient when running locally
+# Tag no-fasttest: avoid dependency on qemu -- inconvenient when running locally
 
 CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
 # shellcheck source=../shell_config.sh
diff --git a/tests/queries/0_stateless/01502_jemalloc_percpu_arena.reference b/tests/queries/0_stateless/01502_jemalloc_percpu_arena.reference
index fe093e39a56..5accb577786 100644
--- a/tests/queries/0_stateless/01502_jemalloc_percpu_arena.reference
+++ b/tests/queries/0_stateless/01502_jemalloc_percpu_arena.reference
@@ -1,5 +1,3 @@
-: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
 1
-: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
 100000000
 1
diff --git a/tests/queries/0_stateless/01502_jemalloc_percpu_arena.sh b/tests/queries/0_stateless/01502_jemalloc_percpu_arena.sh
index b3ea6eca3f4..c1bd1e0e1fa 100755
--- a/tests/queries/0_stateless/01502_jemalloc_percpu_arena.sh
+++ b/tests/queries/0_stateless/01502_jemalloc_percpu_arena.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Tags: no-tsan, no-asan, no-msan, no-ubsan, no-fasttest
+# Tags: no-tsan, no-asan, no-msan, no-ubsan, no-fasttest, no-debug
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 # NOTE: jemalloc is disabled under sanitizers
 
diff --git a/tests/queries/0_stateless/01505_pipeline_executor_UAF.sh b/tests/queries/0_stateless/01505_pipeline_executor_UAF.sh
index c2750ad31b2..35c2b796570 100755
--- a/tests/queries/0_stateless/01505_pipeline_executor_UAF.sh
+++ b/tests/queries/0_stateless/01505_pipeline_executor_UAF.sh
@@ -14,7 +14,7 @@ for _ in {1..10}; do
     ${CLICKHOUSE_LOCAL} -q 'select * from numbers_mt(100000000) settings max_threads=100 FORMAT Null'
     # Binding to specific CPU is not required, but this makes the test more reliable.
     taskset --cpu-list 0 ${CLICKHOUSE_LOCAL} -q 'select * from numbers_mt(100000000) settings max_threads=100 FORMAT Null' 2>&1 | {
-        # build with santiziers does not have jemalloc
+        # build with sanitiziers does not have jemalloc
        # and for jemalloc we have separate test
        # 01502_jemalloc_percpu_arena
        grep -v ': Number of CPUs detected is not deterministic. Per-CPU arena disabled.'
diff --git a/tests/queries/0_stateless/02956_rocksdb_bulk_sink.sh b/tests/queries/0_stateless/02956_rocksdb_bulk_sink.sh
index 45e65b18e07..b1d1c483396 100755
--- a/tests/queries/0_stateless/02956_rocksdb_bulk_sink.sh
+++ b/tests/queries/0_stateless/02956_rocksdb_bulk_sink.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Tags: no-ordinary-database, use-rocksdb
+# Tags: no-ordinary-database, use-rocksdb, no-random-settings
 
 CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
 # shellcheck source=../shell_config.sh
@@ -45,4 +45,3 @@ ${CLICKHOUSE_CLIENT} --query "INSERT INTO rocksdb_worm SELECT number, number+1 F
 ${CLICKHOUSE_CLIENT} --query "INSERT INTO rocksdb_worm SELECT number, number+1 FROM numbers_mt(1000000)" &
 wait
 ${CLICKHOUSE_CLIENT} --query "SELECT count() FROM rocksdb_worm;"
-
diff --git a/tests/queries/0_stateless/02963_test_flexible_disk_configuration.sql b/tests/queries/0_stateless/02963_test_flexible_disk_configuration.sql
index 8f67cd7e030..7ebef866360 100644
--- a/tests/queries/0_stateless/02963_test_flexible_disk_configuration.sql
+++ b/tests/queries/0_stateless/02963_test_flexible_disk_configuration.sql
@@ -22,7 +22,7 @@ create table test (a Int32) engine = MergeTree() order by tuple()
 settings disk=disk(name='test2',
                    type = object_storage,
                    object_storage_type = s3,
-                   metadata_storage_type = local,
+                   metadata_type = local,
                    endpoint = 'http://localhost:11111/test/common/',
                    access_key_id = clickhouse,
                    secret_access_key = clickhouse);
@@ -32,7 +32,7 @@ create table test (a Int32) engine = MergeTree() order by tuple()
 settings disk=disk(name='test3',
                    type = object_storage,
                    object_storage_type = s3,
-                   metadata_storage_type = local,
+                   metadata_type = local,
                    metadata_keep_free_space_bytes = 1024,
                    endpoint = 'http://localhost:11111/test/common/',
                    access_key_id = clickhouse,
@@ -43,7 +43,7 @@ create table test (a Int32) engine = MergeTree() order by tuple()
 settings disk=disk(name='test4',
                    type = object_storage,
                    object_storage_type = s3,
-                   metadata_storage_type = local,
+                   metadata_type = local,
                    metadata_keep_free_space_bytes = 0,
                    endpoint = 'http://localhost:11111/test/common/',
                    access_key_id = clickhouse,
diff --git a/tests/queries/0_stateless/03196_local_memory_limit.reference b/tests/queries/0_stateless/03196_local_memory_limit.reference
new file mode 100644
index 00000000000..f2e22e8aa5b
--- /dev/null
+++ b/tests/queries/0_stateless/03196_local_memory_limit.reference
@@ -0,0 +1 @@
+maximum: 95.37 MiB
diff --git a/tests/queries/0_stateless/03196_local_memory_limit.sh b/tests/queries/0_stateless/03196_local_memory_limit.sh
new file mode 100755
index 00000000000..346b37be006
--- /dev/null
+++ b/tests/queries/0_stateless/03196_local_memory_limit.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+
+CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
+# shellcheck source=../shell_config.sh
+. "$CUR_DIR"/../shell_config.sh
+
+${CLICKHOUSE_LOCAL} --config-file <(echo "<clickhouse><max_server_memory_usage>100M</max_server_memory_usage></clickhouse>") --query "SELECT number FROM system.numbers GROUP BY number HAVING count() > 1" 2>&1 | grep -o -P 'maximum: [\d\.]+ MiB'
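A side note on the expected output of the new test above: judging by the reference value, `max_server_memory_usage = 100M` is treated as 100,000,000 bytes, and 100,000,000 / 1,048,576 ≈ 95.37, which is why the reference file pins the reported limit at `maximum: 95.37 MiB` rather than a round 100 MiB.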
diff --git a/utils/backup/backup b/utils/backup/backup
new file mode 100755
index 00000000000..6aa9c179033
--- /dev/null
+++ b/utils/backup/backup
@@ -0,0 +1,47 @@
+#!/bin/bash
+
+user="default"
+path="."
+
+usage() {
+    echo
+    echo "A trivial script to upload your files into ClickHouse."
+    echo "You might want to use something like Dropbox instead, but..."
+    echo
+    echo "Usage: $0 --host <host> [--user <user>] --password <password>"
+    exit 1
+}
+
+while [[ "$#" -gt 0 ]]; do
+    case "$1" in
+        --host)
+            host="$2"
+            shift 2
+            ;;
+        --user)
+            user="$2"
+            shift 2
+            ;;
+        --password)
+            password="$2"
+            shift 2
+            ;;
+        --help)
+            usage
+            ;;
+        *)
+            path="$1"
+            shift 1
+            ;;
+    esac
+done
+
+if [ -z "$host" ] || [ -z "$password" ]; then
+    echo "Error: --host and --password are mandatory."
+    usage
+fi
+
+clickhouse-client --host "$host" --user "$user" --password "$password" --secure --query "CREATE TABLE IF NOT EXISTS default.files (time DEFAULT now(), path String, content String CODEC(ZSTD(6))) ENGINE = MergeTree ORDER BY (path, time)" &&
+find "$path" -type f | clickhouse-local --input-format LineAsString \
+    --max-block-size 1 --min-insert-block-size-rows 0 --min-insert-block-size-bytes '100M' --max-insert-threads 1 \
+    --query "INSERT INTO FUNCTION remoteSecure('$host', default.files, '$user', '$password') (path, content) SELECT line, file(line) FROM table" --progress
diff --git a/utils/check-style/aspell-ignore/en/aspell-dict.txt b/utils/check-style/aspell-ignore/en/aspell-dict.txt
index 5c3991ed293..229eccefa48 100644
--- a/utils/check-style/aspell-ignore/en/aspell-dict.txt
+++ b/utils/check-style/aspell-ignore/en/aspell-dict.txt
@@ -575,6 +575,7 @@ MySQLDump
 MySQLThreads
 NATS
 NCHAR
+NDJSON
 NEKUDOTAYIM
 NEWDATE
 NEWDECIMAL
@@ -717,6 +718,8 @@ PlantUML
 PointDistKm
 PointDistM
 PointDistRads
+PostHistory
+PostLink
 PostgreSQLConnection
 PostgreSQLThreads
 Postgres
@@ -2516,6 +2519,7 @@ sqlite
 sqrt
 src
 srcReplicas
+stackoverflow
 stacktrace
 stacktraces
 startsWith
@@ -2854,6 +2858,7 @@ userver
 utils
 uuid
 uuidv
+vCPU
 varPop
 varPopStable
 varSamp
diff --git a/utils/keeper-bench/Runner.cpp b/utils/keeper-bench/Runner.cpp
index ed7e09685f0..5ae4c7a0b1c 100644
--- a/utils/keeper-bench/Runner.cpp
+++ b/utils/keeper-bench/Runner.cpp
@@ -1238,9 +1238,13 @@ void Runner::createConnections()
 
 std::shared_ptr<Coordination::ZooKeeper> Runner::getConnection(const ConnectionInfo & connection_info, size_t connection_info_idx)
 {
-    Coordination::ZooKeeper::Node node{Poco::Net::SocketAddress{connection_info.host}, static_cast<UInt8>(connection_info_idx), connection_info.secure};
-    std::vector<Coordination::ZooKeeper::Node> nodes;
-    nodes.push_back(node);
+    zkutil::ShuffleHost host;
+    host.host = connection_info.host;
+    host.secure = connection_info.secure;
+    host.original_index = static_cast<UInt8>(connection_info_idx);
+    host.address = Poco::Net::SocketAddress{connection_info.host};
+
+    zkutil::ShuffleHosts nodes{host};
     zkutil::ZooKeeperArgs args;
     args.session_timeout_ms = connection_info.session_timeout_ms;
     args.connection_timeout_ms = connection_info.connection_timeout_ms;
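To make the intent of the `utils/backup/backup` script added earlier in this patch concrete, here is a rough usage sketch; the host, password, and path are placeholders, and the follow-up query is just one way to inspect what was uploaded into `default.files`:

```bash
#!/usr/bin/env bash
# Illustration only: upload a directory with the helper script from this patch,
# then list the most recently uploaded files. All connection details are placeholders.
./utils/backup/backup --host play.example.com --password 'secret' /etc/clickhouse-server

clickhouse-client --host play.example.com --user default --password 'secret' --secure \
    --query "SELECT path, formatReadableSize(length(content)) AS size FROM default.files ORDER BY time DESC LIMIT 10"
```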