Merge branch 'master' into Backup_Restore_concurrency_check_node_2

This commit is contained in:
Smita Kulkarni 2023-04-04 18:51:30 +02:00
commit beb164dd51
155 changed files with 8338 additions and 822 deletions

View File

@ -0,0 +1,29 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v22.8.16.32-lts (7c4be737bd0) FIXME as compared to v22.8.15.23-lts (d36fa168bbf)
#### Build/Testing/Packaging Improvement
* Backported in [#48344](https://github.com/ClickHouse/ClickHouse/issues/48344): Use sccache as a replacement for ccache, with S3 as the cache backend. [#46240](https://github.com/ClickHouse/ClickHouse/pull/46240) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#48250](https://github.com/ClickHouse/ClickHouse/issues/48250): The `clickhouse/clickhouse-keeper` image used to be pushed only with `-alpine` tags, e.g. `latest-alpine`. As suggested in https://github.com/ClickHouse/examples/pull/2, it is now also pushed without the suffix. [#48236](https://github.com/ClickHouse/ClickHouse/pull/48236) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix bug in zero-copy replication disk choice during fetch [#47010](https://github.com/ClickHouse/ClickHouse/pull/47010) ([alesapin](https://github.com/alesapin)).
* Fix query parameters [#47488](https://github.com/ClickHouse/ClickHouse/pull/47488) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix wait for zero copy lock during move [#47631](https://github.com/ClickHouse/ClickHouse/pull/47631) ([alesapin](https://github.com/alesapin)).
* Fix crash in polygonsSymDifferenceCartesian [#47702](https://github.com/ClickHouse/ClickHouse/pull/47702) ([pufit](https://github.com/pufit)).
* Backport to 22.8: Fix moving broken parts to the detached for the object storage disk on startup [#48273](https://github.com/ClickHouse/ClickHouse/pull/48273) ([Aleksei Filatov](https://github.com/aalexfvk)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Add a fuse for backport branches w/o a created PR [#47760](https://github.com/ClickHouse/ClickHouse/pull/47760) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Only valid Reviews.STATES overwrite existing reviews [#47789](https://github.com/ClickHouse/ClickHouse/pull/47789) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Place short return before big block, improve logging [#47822](https://github.com/ClickHouse/ClickHouse/pull/47822) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Artifacts s3 prefix [#47945](https://github.com/ClickHouse/ClickHouse/pull/47945) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix tsan error lock-order-inversion [#47953](https://github.com/ClickHouse/ClickHouse/pull/47953) ([Kruglov Pavel](https://github.com/Avogar)).

View File

@ -0,0 +1,34 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.1.6.42-stable (783ddf67991) FIXME as compared to v23.1.5.24-stable (0e51b53ba99)
#### Build/Testing/Packaging Improvement
* Backported in [#48215](https://github.com/ClickHouse/ClickHouse/issues/48215): Use sccache as a replacement for ccache, with S3 as the cache backend. [#46240](https://github.com/ClickHouse/ClickHouse/pull/46240) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#48254](https://github.com/ClickHouse/ClickHouse/issues/48254): The `clickhouse/clickhouse-keeper` image used to be pushed only with `-alpine` tags, e.g. `latest-alpine`. As suggested in https://github.com/ClickHouse/examples/pull/2, it is now also pushed without the suffix. [#48236](https://github.com/ClickHouse/ClickHouse/pull/48236) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix changing an expired role [#46772](https://github.com/ClickHouse/ClickHouse/pull/46772) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix bug in zero-copy replication disk choice during fetch [#47010](https://github.com/ClickHouse/ClickHouse/pull/47010) ([alesapin](https://github.com/alesapin)).
* Fix NOT_IMPLEMENTED error with CROSS JOIN and algorithm = auto [#47068](https://github.com/ClickHouse/ClickHouse/pull/47068) ([Vladimir C](https://github.com/vdimir)).
* Disable logical expression optimizer for expression with aliases. [#47451](https://github.com/ClickHouse/ClickHouse/pull/47451) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix query parameters [#47488](https://github.com/ClickHouse/ClickHouse/pull/47488) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Parameterized view bug fix 47287 47247 [#47495](https://github.com/ClickHouse/ClickHouse/pull/47495) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix wait for zero copy lock during move [#47631](https://github.com/ClickHouse/ClickHouse/pull/47631) ([alesapin](https://github.com/alesapin)).
* Hotfix for too verbose warnings in HTTP [#47903](https://github.com/ClickHouse/ClickHouse/pull/47903) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Better error messages in ReplicatedMergeTreeAttachThread [#47454](https://github.com/ClickHouse/ClickHouse/pull/47454) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix `00933_test_fix_extra_seek_on_compressed_cache` in releases. [#47490](https://github.com/ClickHouse/ClickHouse/pull/47490) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a fuse for backport branches w/o a created PR [#47760](https://github.com/ClickHouse/ClickHouse/pull/47760) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Only valid Reviews.STATES overwrite existing reviews [#47789](https://github.com/ClickHouse/ClickHouse/pull/47789) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Place short return before big block, improve logging [#47822](https://github.com/ClickHouse/ClickHouse/pull/47822) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Artifacts s3 prefix [#47945](https://github.com/ClickHouse/ClickHouse/pull/47945) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix tsan error lock-order-inversion [#47953](https://github.com/ClickHouse/ClickHouse/pull/47953) ([Kruglov Pavel](https://github.com/Avogar)).

View File

@ -0,0 +1,40 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.2.5.46-stable (b50faecbb12) FIXME as compared to v23.2.4.12-stable (8fe866cb035)
#### Improvement
* Backported in [#48164](https://github.com/ClickHouse/ClickHouse/issues/48164): Fixed `UNKNOWN_TABLE` exception when attaching to a materialized view that has dependent tables that are not available. This might be useful when trying to restore state from a backup. [#47975](https://github.com/ClickHouse/ClickHouse/pull/47975) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
#### Build/Testing/Packaging Improvement
* Backported in [#48216](https://github.com/ClickHouse/ClickHouse/issues/48216): Use sccache as a replacement for ccache, with S3 as the cache backend. [#46240](https://github.com/ClickHouse/ClickHouse/pull/46240) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#48256](https://github.com/ClickHouse/ClickHouse/issues/48256): The `clickhouse/clickhouse-keeper` image used to be pushed only with `-alpine` tags, e.g. `latest-alpine`. As suggested in https://github.com/ClickHouse/examples/pull/2, it is now also pushed without the suffix. [#48236](https://github.com/ClickHouse/ClickHouse/pull/48236) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix changing an expired role [#46772](https://github.com/ClickHouse/ClickHouse/pull/46772) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix bug in zero-copy replication disk choice during fetch [#47010](https://github.com/ClickHouse/ClickHouse/pull/47010) ([alesapin](https://github.com/alesapin)).
* Fix NOT_IMPLEMENTED error with CROSS JOIN and algorithm = auto [#47068](https://github.com/ClickHouse/ClickHouse/pull/47068) ([Vladimir C](https://github.com/vdimir)).
* Disable logical expression optimizer for expression with aliases. [#47451](https://github.com/ClickHouse/ClickHouse/pull/47451) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix query parameters [#47488](https://github.com/ClickHouse/ClickHouse/pull/47488) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Parameterized view bug fix 47287 47247 [#47495](https://github.com/ClickHouse/ClickHouse/pull/47495) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Proper fix for bug in parquet, revert reverted [#45878](https://github.com/ClickHouse/ClickHouse/issues/45878) [#47538](https://github.com/ClickHouse/ClickHouse/pull/47538) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix wait for zero copy lock during move [#47631](https://github.com/ClickHouse/ClickHouse/pull/47631) ([alesapin](https://github.com/alesapin)).
* Hotfix for too verbose warnings in HTTP [#47903](https://github.com/ClickHouse/ClickHouse/pull/47903) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* fix: keeper systemd service file include invalid inline comment [#47105](https://github.com/ClickHouse/ClickHouse/pull/47105) ([SuperDJY](https://github.com/cmsxbc)).
* Better error messages in ReplicatedMergeTreeAttachThread [#47454](https://github.com/ClickHouse/ClickHouse/pull/47454) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix `00933_test_fix_extra_seek_on_compressed_cache` in releases. [#47490](https://github.com/ClickHouse/ClickHouse/pull/47490) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix startup on older systemd versions [#47689](https://github.com/ClickHouse/ClickHouse/pull/47689) ([Thomas Casteleyn](https://github.com/Hipska)).
* Add a fuse for backport branches w/o a created PR [#47760](https://github.com/ClickHouse/ClickHouse/pull/47760) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Only valid Reviews.STATES overwrite existing reviews [#47789](https://github.com/ClickHouse/ClickHouse/pull/47789) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Place short return before big block, improve logging [#47822](https://github.com/ClickHouse/ClickHouse/pull/47822) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Artifacts s3 prefix [#47945](https://github.com/ClickHouse/ClickHouse/pull/47945) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix tsan error lock-order-inversion [#47953](https://github.com/ClickHouse/ClickHouse/pull/47953) ([Kruglov Pavel](https://github.com/Avogar)).

View File

@ -78,7 +78,8 @@ Of course, it's possible to manually run `CREATE TABLE` with same path on nonrel
### Inserts
When new rows are inserted into `KeeperMap`, if the key already exists, the value will be updated, otherwise new key is created.
When new rows are inserted into `KeeperMap`, if the key does not exist, a new entry for the key is created.
If the key exists and setting `keeper_map_strict_mode` is set to `true`, an exception is thrown; otherwise, the value for the key is overwritten.
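The insert rule can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `ToyKeeperMap` is a hypothetical stand-in written for illustration, it is not ClickHouse code, and it ignores Keeper, replication, and multi-column values.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for a KeeperMap table (illustration only).
class ToyKeeperMap
{
public:
    explicit ToyKeeperMap(bool strict_mode_) : strict_mode(strict_mode_) {}

    // Follows the documented rule: create the entry if the key is missing;
    // if the key exists, throw in strict mode, otherwise overwrite the value.
    void insert(const std::string & key, int value)
    {
        auto it = data.find(key);
        if (it == data.end())
        {
            data.emplace(key, value);
            return;
        }
        if (strict_mode)
            throw std::runtime_error("key already exists: " + key);
        it->second = value;
    }

    // Returns the stored value; throws std::out_of_range if the key is missing.
    int get(const std::string & key) const { return data.at(key); }

private:
    bool strict_mode;
    std::map<std::string, int> data;
};
```

With `ToyKeeperMap relaxed(false)`, inserting the same key twice leaves the second value in place; with `ToyKeeperMap strict(true)`, the second insert throws and the first value is kept.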
Example:
@ -89,6 +90,7 @@ INSERT INTO keeper_map_table VALUES ('some key', 1, 'value', 3.2);
### Deletes
Rows can be deleted using `DELETE` query or `TRUNCATE`.
If the key exists and setting `keeper_map_strict_mode` is set to `true`, fetching and deleting data will succeed only if it can be executed atomically.
```sql
DELETE FROM keeper_map_table WHERE key LIKE 'some%' AND v1 > 1;
@ -105,6 +107,7 @@ TRUNCATE TABLE keeper_map_table;
### Updates
Values can be updated using `ALTER TABLE` query. Primary key cannot be updated.
If setting `keeper_map_strict_mode` is set to `true`, fetching and updating data will succeed only if it's executed atomically.
```sql
ALTER TABLE keeper_map_table UPDATE v1 = v1 * 10 + 2 WHERE key LIKE 'some%' AND v3 > 3.1;

View File

@ -103,6 +103,20 @@ cached - for that use setting [query_cache_min_query_runs](settings/settings.md#
Entries in the query cache become stale after a certain time period (time-to-live). By default, this period is 60 seconds but a different
value can be specified at session, profile or query level using setting [query_cache_ttl](settings/settings.md#query-cache-ttl).
Entries in the query cache are compressed by default. This reduces the overall memory consumption at the cost of slower writes into / reads
from the query cache. To disable compression, use setting [query_cache_compress_entries](settings/settings.md#query-cache-compress-entries).
ClickHouse reads table data in blocks of [max_block_size](settings/settings.md#settings-max_block_size) rows. Due to filtering, aggregation,
etc., result blocks are typically much smaller than 'max_block_size' but there are also cases where they are much bigger. Setting
[query_cache_squash_partial_results](settings/settings.md#query-cache-squash-partial-results) (enabled by default) controls if result blocks
are squashed (if they are tiny) or split (if they are large) into blocks of 'max_block_size' size before insertion into the query result
cache. This reduces performance of writes into the query cache but improves compression rate of cache entries and provides more natural
block granularity when query results are later served from the query cache.
As a result, the query cache stores for each query multiple (partial) result blocks. While this behavior is a good default, it can be
suppressed using setting [query_cache_squash_partial_results](settings/settings.md#query-cache-squash-partial-results).
Also, results of queries with non-deterministic functions such as `rand()` and `now()` are not cached. This can be overruled using
setting [query_cache_store_results_of_queries_with_nondeterministic_functions](settings/settings.md#query-cache-store-results-of-queries-with-nondeterministic-functions).
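The squashing and splitting of result blocks described above can be sketched as a simple re-chunking step. This is a minimal illustration, not the query cache implementation: `Block` and `rechunk` are hypothetical names, and a block is modeled as a plain vector with one integer per row.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical model of a result block: one int per row (illustration only).
using Block = std::vector<int>;

// Re-chunks result blocks so that every output block except possibly the
// last one holds exactly max_block_size rows: tiny blocks are squashed
// together and large blocks are split.
std::vector<Block> rechunk(const std::vector<Block> & blocks, std::size_t max_block_size)
{
    if (max_block_size == 0)
        throw std::invalid_argument("max_block_size must be positive");

    std::vector<Block> result;
    Block current;
    for (const auto & block : blocks)
    {
        for (int row : block)
        {
            current.push_back(row);
            if (current.size() == max_block_size)
            {
                result.push_back(std::move(current));
                current.clear(); // reset the moved-from vector to a known empty state
            }
        }
    }
    if (!current.empty())
        result.push_back(std::move(current));
    return result;
}
```

For example, blocks of 1, 2 and 5 rows re-chunked with a block size of 3 become blocks of 3, 3 and 2 rows. Row order is preserved, which is what makes the block granularity more uniform when results are later read back.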

View File

@ -1435,6 +1435,28 @@ Possible values:
Default value: `0`
## query_cache_compress_entries {#query-cache-compress-entries}
Compress entries in the [query cache](../query-cache.md). Lessens the memory consumption of the query cache at the cost of slower inserts into / reads from it.
Possible values:
- 0 - Disabled
- 1 - Enabled
Default value: `1`
## query_cache_squash_partial_results {#query-cache-squash-partial-results}
Squash partial result blocks to blocks of size [max_block_size](#setting-max_block_size). Reduces performance of inserts into the [query cache](../query-cache.md) but improves the compressibility of cache entries (see [query_cache_compress_entries](#query-cache-compress-entries)).
Possible values:
- 0 - Disabled
- 1 - Enabled
Default value: `1`
## query_cache_ttl {#query-cache-ttl}
After this time in seconds entries in the [query cache](../query-cache.md) become stale.
@ -4049,3 +4071,44 @@ SELECT sum(number) FROM numbers(10000000000) SETTINGS partial_result_on_first_ca
Possible values: `true`, `false`
Default value: `false`
## function_json_value_return_type_allow_nullable
Controls whether the `JSON_VALUE` function is allowed to return `NULL` when the value does not exist.
```sql
SELECT JSON_VALUE('{"hello":"world"}', '$.b') settings function_json_value_return_type_allow_nullable=true;
┌─JSON_VALUE('{"hello":"world"}', '$.b')─┐
│ ᴺᵁᴸᴸ │
└────────────────────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
```
Possible values:
- true — Allow.
- false — Disallow.
Default value: `false`.
## function_json_value_return_type_allow_complex
Controls whether the `JSON_VALUE` function is allowed to return a complex type (such as struct, array, or map).
```sql
SELECT JSON_VALUE('{"hello":{"world":"!"}}', '$.hello') settings function_json_value_return_type_allow_complex=true
┌─JSON_VALUE('{"hello":{"world":"!"}}', '$.hello')─┐
│ {"world":"!"} │
└──────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
```
Possible values:
- true — Allow.
- false — Disallow.
Default value: `false`.

View File

@ -74,7 +74,7 @@ Never set the block size too small or too large.
You can use RAID-0 on SSD.
Regardless of RAID use, always use replication for data security.
Enable NCQ with a long queue. For HDD, choose the CFQ scheduler, and for SSD, choose noop. Dont reduce the readahead setting.
Enable NCQ with a long queue. For HDD, choose the mq-deadline or CFQ scheduler, and for SSD, choose noop. Don't reduce the readahead setting.
For HDD, enable the write cache.
Make sure that [`fstrim`](https://en.wikipedia.org/wiki/Trim_(computing)) is enabled for NVME and SSD disks in your OS (usually it's implemented using a cronjob or systemd service).

View File

@ -8,10 +8,150 @@ sidebar_label: clickhouse-local
The `clickhouse-local` program enables you to perform fast processing on local files, without having to deploy and configure the ClickHouse server. It accepts data that represent tables and queries them using [ClickHouse SQL dialect](../../sql-reference/index.md). `clickhouse-local` uses the same core as ClickHouse server, so it supports most of the features and the same set of formats and table engines.
By default `clickhouse-local` has access to data on the same host, and it does not depend on the server's configuration. It also supports loading server configuration using `--config-file` argument. For temporary data, a unique temporary data directory is created by default.
## Download clickhouse-local
`clickhouse-local` is executed using the same `clickhouse` binary that runs the ClickHouse server and `clickhouse-client`. The easiest way to download the latest version is with the following command:
```bash
curl https://clickhouse.com/ | sh
```
:::note
The binary you just downloaded can run all sorts of ClickHouse tools and utilities. If you want to run ClickHouse as a database server, check out the [Quick Start](../../quick-start.mdx).
:::
## Query data in a CSV file using SQL
A common use of `clickhouse-local` is to run ad-hoc queries on files without having to insert the data into a table. `clickhouse-local` can stream the data from a file into a temporary table and execute your SQL.
If the file is sitting on the same machine as `clickhouse-local`, use the `file` table function. The following command queries `reviews.tsv`, a file that contains a sampling of Amazon product reviews:
```bash
./clickhouse local -q "SELECT * FROM file('reviews.tsv')"
```
ClickHouse knows the file uses a tab-separated format from the filename extension. If you need to explicitly specify the format, simply add one of the [many ClickHouse input formats](../../interfaces/formats.md):
```bash
./clickhouse local -q "SELECT * FROM file('reviews.tsv', 'TabSeparated')"
```
The `file` table function creates a table, and you can use `DESCRIBE` to see the inferred schema:
```bash
./clickhouse local -q "DESCRIBE file('reviews.tsv')"
```
```response
marketplace Nullable(String)
customer_id Nullable(Int64)
review_id Nullable(String)
product_id Nullable(String)
product_parent Nullable(Int64)
product_title Nullable(String)
product_category Nullable(String)
star_rating Nullable(Int64)
helpful_votes Nullable(Int64)
total_votes Nullable(Int64)
vine Nullable(String)
verified_purchase Nullable(String)
review_headline Nullable(String)
review_body Nullable(String)
review_date Nullable(Date)
```
Let's find a product with the highest rating:
```bash
./clickhouse local -q "SELECT
argMax(product_title,star_rating),
max(star_rating)
FROM file('reviews.tsv')"
```
```response
Monopoly Junior Board Game 5
```
## Query data in a Parquet file in AWS S3
If you have a file in S3, use `clickhouse-local` and the `s3` table function to query the file in place (without inserting the data into a ClickHouse table). We have a file named `house_0.parquet` in a public bucket that contains home prices of property sold in the United Kingdom. Let's see how many rows it has:
```bash
./clickhouse local -q "
SELECT count()
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet')"
```
The file has 2.7M rows:
```response
2772030
```
It's always useful to see the schema that ClickHouse infers from the file:
```bash
./clickhouse local -q "DESCRIBE s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet')"
```
```response
price Nullable(Int64)
date Nullable(UInt16)
postcode1 Nullable(String)
postcode2 Nullable(String)
type Nullable(String)
is_new Nullable(UInt8)
duration Nullable(String)
addr1 Nullable(String)
addr2 Nullable(String)
street Nullable(String)
locality Nullable(String)
town Nullable(String)
district Nullable(String)
county Nullable(String)
```
Let's see what the most expensive neighborhoods are:
```bash
./clickhouse local -q "
SELECT
town,
district,
count() AS c,
round(avg(price)) AS price,
bar(price, 0, 5000000, 100)
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet')
GROUP BY
town,
district
HAVING c >= 100
ORDER BY price DESC
LIMIT 10"
```
```response
LONDON CITY OF LONDON 886 2271305 █████████████████████████████████████████████▍
LEATHERHEAD ELMBRIDGE 206 1176680 ███████████████████████▌
LONDON CITY OF WESTMINSTER 12577 1108221 ██████████████████████▏
LONDON KENSINGTON AND CHELSEA 8728 1094496 █████████████████████▉
HYTHE FOLKESTONE AND HYTHE 130 1023980 ████████████████████▍
CHALFONT ST GILES CHILTERN 113 835754 ████████████████▋
AMERSHAM BUCKINGHAMSHIRE 113 799596 ███████████████▉
VIRGINIA WATER RUNNYMEDE 356 789301 ███████████████▊
BARNET ENFIELD 282 740514 ██████████████▊
NORTHWOOD THREE RIVERS 184 731609 ██████████████▋
```
:::tip
When you are ready to insert your files into ClickHouse, start up a ClickHouse server and insert the results of your `file` and `s3` table functions into a `MergeTree` table. View the [Quick Start](../../quick-start.mdx) for more details.
:::
## Usage {#usage}
By default `clickhouse-local` has access to data of a ClickHouse server on the same host, and it does not depend on the server's configuration. It also supports loading server configuration using `--config-file` argument. For temporary data, a unique temporary data directory is created by default.
Basic usage (Linux):
``` bash
@ -24,7 +164,9 @@ Basic usage (Mac):
$ ./clickhouse local --structure "table_structure" --input-format "format_of_incoming_data" --query "query"
```
Also supported on Windows through WSL2.
:::note
`clickhouse-local` is also supported on Windows through WSL2.
:::
Arguments:

View File

@ -401,7 +401,7 @@ Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, jso
Parses a JSON and extract a value as JSON scalar.
If the value does not exist, an empty string will be returned.
If the value does not exist, an empty string is returned by default; with setting `function_json_value_return_type_allow_nullable` set to `true`, `NULL` is returned instead. If the value is of a complex type (such as struct, array, or map), an empty string is returned by default; with setting `function_json_value_return_type_allow_complex` set to `true`, the complex value is returned.
Example:
@ -410,6 +410,8 @@ SELECT JSON_VALUE('{"hello":"world"}', '$.hello');
SELECT JSON_VALUE('{"array":[[0, 1, 2, 3, 4, 5], [0, -1, -2, -3, -4, -5]]}', '$.array[*][0 to 2, 4]');
SELECT JSON_VALUE('{"hello":2}', '$.hello');
SELECT toTypeName(JSON_VALUE('{"hello":2}', '$.hello'));
select JSON_VALUE('{"hello":"world"}', '$.b') settings function_json_value_return_type_allow_nullable=true;
select JSON_VALUE('{"hello":{"world":"!"}}', '$.hello') settings function_json_value_return_type_allow_complex=true;
```
Result:

View File

@ -208,7 +208,7 @@ Type: [Array](../../sql-reference/data-types/array.md)([Tuple](../../sql-referen
Query:
``` sql
CREATE TABLE tupletest (`col` Tuple(user_ID UInt64, session_ID UInt64) ENGINE = Memory;
CREATE TABLE tupletest (col Tuple(user_ID UInt64, session_ID UInt64)) ENGINE = Memory;
INSERT INTO tupletest VALUES (tuple( 100, 2502)), (tuple(1,100));
@ -227,11 +227,11 @@ Result:
It is possible to transform columns to rows using this function:
``` sql
CREATE TABLE tupletest (`col` Tuple(CPU Float64, Memory Float64, Disk Float64)) ENGINE = Memory;
CREATE TABLE tupletest (col Tuple(CPU Float64, Memory Float64, Disk Float64)) ENGINE = Memory;
INSERT INTO tupletest VALUES(tuple(3.3, 5.5, 6.6));
SELECT arrayJoin(tupleToNameValuePairs(col))FROM tupletest;
SELECT arrayJoin(tupleToNameValuePairs(col)) FROM tupletest;
```
Result:

View File

@ -68,9 +68,9 @@ Result:
## mapFromArrays
Merges an [Array](../../sql-reference/data-types/array.md) of keys and an [Array](../../sql-reference/data-types/array.md) of values into a [Map(key, value)](../../sql-reference/data-types/map.md).
Merges an [Array](../../sql-reference/data-types/array.md) of keys and an [Array](../../sql-reference/data-types/array.md) of values into a [Map(key, value)](../../sql-reference/data-types/map.md). Note that the second argument can also be a [Map](../../sql-reference/data-types/map.md), in which case it is cast to an Array during execution.
The function is a more convenient alternative to `CAST((key_array, value_array), 'Map(key_type, value_type)')`. For example, instead of writing `CAST((['aa', 'bb'], [4, 5]), 'Map(String, UInt32)')`, you can write `mapFromArrays(['aa', 'bb'], [4, 5])`.
The function is a more convenient alternative to `CAST((key_array, value_array_or_map), 'Map(key_type, value_type)')`. For example, instead of writing `CAST((['aa', 'bb'], [4, 5]), 'Map(String, UInt32)')`, you can write `mapFromArrays(['aa', 'bb'], [4, 5])`.
**Syntax**
@ -82,11 +82,11 @@ Alias: `MAP_FROM_ARRAYS(keys, values)`
**Arguments**
- `keys` — Given key array to create a map from. The nested type of array must be: [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md), [LowCardinality](../../sql-reference/data-types/lowcardinality.md), [FixedString](../../sql-reference/data-types/fixedstring.md), [UUID](../../sql-reference/data-types/uuid.md), [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), [Date32](../../sql-reference/data-types/date32.md), [Enum](../../sql-reference/data-types/enum.md)
- `values` - Given value array to create a map from.
- `values` - Given value array or map to create a map from.
**Returned value**
- A map whose keys and values are constructed from the key and value arrays
- A map whose keys and values are constructed from the key array and value array/map.
**Example**
@ -94,13 +94,17 @@ Query:
```sql
select mapFromArrays(['a', 'b', 'c'], [1, 2, 3])
```
```text
┌─mapFromArrays(['a', 'b', 'c'], [1, 2, 3])─┐
│ {'a':1,'b':2,'c':3} │
└───────────────────────────────────────────┘
```
SELECT mapFromArrays([1, 2, 3], map('a', 1, 'b', 2, 'c', 3))
┌─mapFromArrays([1, 2, 3], map('a', 1, 'b', 2, 'c', 3))─┐
│ {1:('a',1),2:('b',2),3:('c',3)} │
└───────────────────────────────────────────────────────┘
```
## mapAdd

View File

@ -114,11 +114,11 @@ This will also create system tables even if message queue is empty.
## RELOAD CONFIG
Reloads ClickHouse configuration. Used when configuration is stored in ZooKeeper.
Reloads ClickHouse configuration. Used when configuration is stored in ZooKeeper. Note that `SYSTEM RELOAD CONFIG` does not reload `USER` configuration stored in ZooKeeper; it only reloads `USER` configuration that is stored in `users.xml`. To reload all `USER` configuration, use `SYSTEM RELOAD USERS`.
## RELOAD USERS
Reloads all access storages, including: users.xml, local disk access storage, replicated (in ZooKeeper) access storage. Note that `SYSTEM RELOAD CONFIG` will only reload users.xml access storage.
Reloads all access storages, including: users.xml, local disk access storage, replicated (in ZooKeeper) access storage.
## SHUTDOWN

View File

@ -1,4 +1,5 @@
#include "ClusterCopierApp.h"
#include <Common/ZooKeeper/ZooKeeper.h>
#include <Common/StatusFile.h>
#include <Common/TerminalSize.h>
#include <IO/ConnectionTimeouts.h>
@ -192,6 +193,8 @@ void ClusterCopierApp::mainImpl()
if (!task_file.empty())
copier->uploadTaskDescription(task_path, task_file, config().getBool("task-upload-force", false));
zkutil::validateZooKeeperConfig(config());
copier->init();
copier->process(ConnectionTimeouts::getTCPTimeoutsWithoutFailover(context->getSettingsRef()));

View File

@ -89,8 +89,12 @@ static std::vector<std::string> extractFromConfig(
if (has_zk_includes && process_zk_includes)
{
DB::ConfigurationPtr bootstrap_configuration(new Poco::Util::XMLConfiguration(config_xml));
zkutil::validateZooKeeperConfig(*bootstrap_configuration);
zkutil::ZooKeeperPtr zookeeper = std::make_shared<zkutil::ZooKeeper>(
*bootstrap_configuration, "zookeeper", nullptr);
*bootstrap_configuration, bootstrap_configuration->has("zookeeper") ? "zookeeper" : "keeper", nullptr);
zkutil::ZooKeeperNodeCache zk_node_cache([&] { return zookeeper; });
config_xml = processor.processConfig(&has_zk_includes, &zk_node_cache);
}

View File

@ -815,7 +815,8 @@ try
}
);
bool has_zookeeper = config().has("zookeeper");
zkutil::validateZooKeeperConfig(config());
bool has_zookeeper = zkutil::hasZooKeeperConfig(config());
zkutil::ZooKeeperNodeCache main_config_zk_node_cache([&] { return global_context->getZooKeeper(); });
zkutil::EventPtr main_config_zk_changed_event = std::make_shared<Poco::Event>();
@ -1307,7 +1308,7 @@ try
{
/// We do not load ZooKeeper configuration on the first config loading
/// because TestKeeper server is not started yet.
if (config->has("zookeeper"))
if (zkutil::hasZooKeeperConfig(*config))
global_context->reloadZooKeeperIfChanged(config);
global_context->reloadAuxiliaryZooKeepersConfigIfChanged(config);
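
The hunks above replace direct `config.has("zookeeper")` checks with `zkutil::validateZooKeeperConfig` and `zkutil::hasZooKeeperConfig`, whose bodies are not part of this excerpt. The sketch below shows one plausible shape for such helpers. Everything in it is an assumption made for illustration: `ToyConfig` and the three function names are hypothetical, and the "reject both sections" rule is a guess rather than something the diff confirms. Only the `"zookeeper"` else `"keeper"` fallback is taken from the `extractFromConfig` hunk.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed configuration: only the set of
// top-level section names is modeled.
struct ToyConfig
{
    std::set<std::string> keys;
    bool has(const std::string & key) const { return keys.count(key) > 0; }
};

// Assumption: a coordination service counts as configured under either name.
bool hasCoordinationConfig(const ToyConfig & config)
{
    return config.has("zookeeper") || config.has("keeper");
}

// Assumption: specifying both sections at once is ambiguous and is rejected.
void validateCoordinationConfig(const ToyConfig & config)
{
    if (config.has("zookeeper") && config.has("keeper"))
        throw std::invalid_argument("both 'zookeeper' and 'keeper' sections are specified");
}

// Same fallback as in the diff: prefer "zookeeper", otherwise use "keeper".
std::string coordinationSectionName(const ToyConfig & config)
{
    return config.has("zookeeper") ? "zookeeper" : "keeper";
}
```

Validating first and only then picking the section name keeps the fallback unambiguous: once validation has passed, at most one of the two sections is present.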

View File

@ -8,7 +8,8 @@
namespace DB
{
BackupCoordinationLocal::BackupCoordinationLocal(bool plain_backup_) : file_infos(plain_backup_)
BackupCoordinationLocal::BackupCoordinationLocal(bool plain_backup_)
: log(&Poco::Logger::get("BackupCoordinationLocal")), file_infos(plain_backup_)
{
}
@ -35,7 +36,7 @@ Strings BackupCoordinationLocal::waitForStage(const String &, std::chrono::milli
void BackupCoordinationLocal::addReplicatedPartNames(const String & table_shared_id, const String & table_name_for_logs, const String & replica_name, const std::vector<PartNameAndChecksum> & part_names_and_checksums)
{
std::lock_guard lock{replicated_tables_mutex};
replicated_tables.addPartNames(table_shared_id, table_name_for_logs, replica_name, part_names_and_checksums);
replicated_tables.addPartNames({table_shared_id, table_name_for_logs, replica_name, part_names_and_checksums});
}
Strings BackupCoordinationLocal::getReplicatedPartNames(const String & table_shared_id, const String & replica_name) const
@ -48,7 +49,7 @@ Strings BackupCoordinationLocal::getReplicatedPartNames(const String & table_sha
void BackupCoordinationLocal::addReplicatedMutations(const String & table_shared_id, const String & table_name_for_logs, const String & replica_name, const std::vector<MutationInfo> & mutations)
{
std::lock_guard lock{replicated_tables_mutex};
replicated_tables.addMutations(table_shared_id, table_name_for_logs, replica_name, mutations);
replicated_tables.addMutations({table_shared_id, table_name_for_logs, replica_name, mutations});
}
std::vector<IBackupCoordination::MutationInfo> BackupCoordinationLocal::getReplicatedMutations(const String & table_shared_id, const String & replica_name) const
@ -61,7 +62,7 @@ std::vector<IBackupCoordination::MutationInfo> BackupCoordinationLocal::getRepli
void BackupCoordinationLocal::addReplicatedDataPath(const String & table_shared_id, const String & data_path)
{
std::lock_guard lock{replicated_tables_mutex};
replicated_tables.addDataPath(table_shared_id, data_path);
replicated_tables.addDataPath({table_shared_id, data_path});
}
Strings BackupCoordinationLocal::getReplicatedDataPaths(const String & table_shared_id) const
@ -74,7 +75,7 @@ Strings BackupCoordinationLocal::getReplicatedDataPaths(const String & table_sha
void BackupCoordinationLocal::addReplicatedAccessFilePath(const String & access_zk_path, AccessEntityType access_entity_type, const String & file_path)
{
std::lock_guard lock{replicated_access_mutex};
replicated_access.addFilePath(access_zk_path, access_entity_type, "", file_path);
replicated_access.addFilePath({access_zk_path, access_entity_type, "", file_path});
}
Strings BackupCoordinationLocal::getReplicatedAccessFilePaths(const String & access_zk_path, AccessEntityType access_entity_type) const
@ -87,7 +88,7 @@ Strings BackupCoordinationLocal::getReplicatedAccessFilePaths(const String & acc
void BackupCoordinationLocal::addReplicatedSQLObjectsDir(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & dir_path)
{
std::lock_guard lock{replicated_sql_objects_mutex};
replicated_sql_objects.addDirectory(loader_zk_path, object_type, "", dir_path);
replicated_sql_objects.addDirectory({loader_zk_path, object_type, "", dir_path});
}
Strings BackupCoordinationLocal::getReplicatedSQLObjectsDirs(const String & loader_zk_path, UserDefinedSQLObjectType object_type) const
@ -125,7 +126,12 @@ bool BackupCoordinationLocal::startWritingFile(size_t data_file_index)
bool BackupCoordinationLocal::hasConcurrentBackups(const std::atomic<size_t> & num_active_backups) const
{
return (num_active_backups > 1);
if (num_active_backups > 1)
{
LOG_WARNING(log, "Found concurrent backups: num_active_backups={}", num_active_backups);
return true;
}
return false;
}
}

@ -52,6 +52,8 @@ public:
bool hasConcurrentBackups(const std::atomic<size_t> & num_active_backups) const override;
private:
Poco::Logger * const log;
BackupCoordinationReplicatedTables TSA_GUARDED_BY(replicated_tables_mutex) replicated_tables;
BackupCoordinationReplicatedAccess TSA_GUARDED_BY(replicated_access_mutex) replicated_access;
BackupCoordinationReplicatedSQLObjects TSA_GUARDED_BY(replicated_sql_objects_mutex) replicated_sql_objects;

@ -1,13 +1,18 @@
#include <Backups/BackupCoordinationRemote.h>
#include <base/hex.h>
#include <Access/Common/AccessEntityType.h>
#include <Backups/BackupCoordinationReplicatedAccess.h>
#include <Backups/BackupCoordinationStage.h>
#include <Common/escapeForFileName.h>
#include <Common/ZooKeeper/Common.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <Functions/UserDefined/UserDefinedSQLObjectType.h>
#include <IO/ReadBufferFromString.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromString.h>
#include <IO/WriteHelpers.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <Common/escapeForFileName.h>
#include <Backups/BackupCoordinationStage.h>
namespace DB
@ -154,8 +159,7 @@ BackupCoordinationRemote::BackupCoordinationRemote(
const String & current_host_,
bool plain_backup_,
bool is_internal_)
: get_zookeeper(get_zookeeper_)
, root_zookeeper_path(root_zookeeper_path_)
: root_zookeeper_path(root_zookeeper_path_)
, zookeeper_path(root_zookeeper_path_ + "/backup-" + backup_uuid_)
, keeper_settings(keeper_settings_)
, backup_uuid(backup_uuid_)
@ -164,17 +168,33 @@ BackupCoordinationRemote::BackupCoordinationRemote(
, current_host_index(findCurrentHostIndex(all_hosts, current_host))
, plain_backup(plain_backup_)
, is_internal(is_internal_)
{
zookeeper_retries_info = ZooKeeperRetriesInfo(
"BackupCoordinationRemote",
&Poco::Logger::get("BackupCoordinationRemote"),
keeper_settings.keeper_max_retries,
keeper_settings.keeper_retry_initial_backoff_ms,
keeper_settings.keeper_retry_max_backoff_ms);
, log(&Poco::Logger::get("BackupCoordinationRemote"))
, with_retries(
log,
get_zookeeper_,
keeper_settings,
[zookeeper_path = zookeeper_path, current_host = current_host, is_internal = is_internal]
(WithRetries::FaultyKeeper & zk)
{
/// Recreate this ephemeral node to signal that we are alive.
if (is_internal)
{
String alive_node_path = zookeeper_path + "/stage/alive|" + current_host;
auto code = zk->tryCreate(alive_node_path, "", zkutil::CreateMode::Ephemeral);
if (code == Coordination::Error::ZNODEEXISTS)
zk->handleEphemeralNodeExistenceNoFailureInjection(alive_node_path, "");
else if (code != Coordination::Error::ZOK)
throw zkutil::KeeperException(code, alive_node_path);
}
})
{
createRootNodes();
stage_sync.emplace(
zookeeper_path + "/stage", [this] { return getZooKeeper(); }, &Poco::Logger::get("BackupCoordination"));
zookeeper_path,
with_retries,
log);
}
BackupCoordinationRemote::~BackupCoordinationRemote()
@ -190,44 +210,45 @@ BackupCoordinationRemote::~BackupCoordinationRemote()
}
}
zkutil::ZooKeeperPtr BackupCoordinationRemote::getZooKeeper() const
{
std::lock_guard lock{zookeeper_mutex};
if (!zookeeper || zookeeper->expired())
{
zookeeper = get_zookeeper();
/// It's possible that we connected to a different [Zoo]Keeper instance,
/// so we may read a slightly stale state.
zookeeper->sync(zookeeper_path);
}
return zookeeper;
}
void BackupCoordinationRemote::createRootNodes()
{
auto zk = getZooKeeper();
zk->createAncestors(zookeeper_path);
zk->createIfNotExists(zookeeper_path, "");
zk->createIfNotExists(zookeeper_path + "/repl_part_names", "");
zk->createIfNotExists(zookeeper_path + "/repl_mutations", "");
zk->createIfNotExists(zookeeper_path + "/repl_data_paths", "");
zk->createIfNotExists(zookeeper_path + "/repl_access", "");
zk->createIfNotExists(zookeeper_path + "/repl_sql_objects", "");
zk->createIfNotExists(zookeeper_path + "/file_infos", "");
zk->createIfNotExists(zookeeper_path + "/writing_files", "");
auto holder = with_retries.createRetriesControlHolder("createRootNodes");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
zk->createAncestors(zookeeper_path);
Coordination::Requests ops;
Coordination::Responses responses;
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path, "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_part_names", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_mutations", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_data_paths", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_access", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_sql_objects", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/file_infos", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/writing_files", "", zkutil::CreateMode::Persistent));
zk->tryMulti(ops, responses);
});
}
void BackupCoordinationRemote::removeAllNodes()
{
/// Usually this function is called by the initiator when a backup is complete so we don't need the coordination anymore.
///
/// However there can be a rare situation when this function is called after an error occurs on the initiator of a query
/// while some hosts are still making the backup. Removing all the nodes will remove the parent node of the backup coordination
/// at `zookeeper_path` which might cause such hosts to stop with exception "ZNONODE". Or such hosts might still do some useless part
/// of their backup work before that. Anyway in this case backup won't be finalized (because only an initiator can do that).
auto zk = getZooKeeper();
zk->removeRecursive(zookeeper_path);
auto holder = with_retries.createRetriesControlHolder("removeAllNodes");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
/// Usually this function is called by the initiator when a backup is complete so we don't need the coordination anymore.
///
/// However there can be a rare situation when this function is called after an error occurs on the initiator of a query
/// while some hosts are still making the backup. Removing all the nodes will remove the parent node of the backup coordination
/// at `zookeeper_path` which might cause such hosts to stop with exception "ZNONODE". Or such hosts might still do some useless part
/// of their backup work before that. Anyway in this case backup won't be finalized (because only an initiator can do that).
with_retries.renewZooKeeper(zk);
zk->removeRecursive(zookeeper_path);
});
}
@ -255,10 +276,11 @@ Strings BackupCoordinationRemote::waitForStage(const String & stage_to_wait, std
void BackupCoordinationRemote::serializeToMultipleZooKeeperNodes(const String & path, const String & value, const String & logging_name)
{
{
ZooKeeperRetriesControl retries_ctl(logging_name + "::create", zookeeper_retries_info);
retries_ctl.retryLoop([&]
auto holder = with_retries.createRetriesControlHolder(logging_name + "::create");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
auto zk = getZooKeeper();
with_retries.renewZooKeeper(zk);
zk->createIfNotExists(path, "");
});
}
@ -279,10 +301,11 @@ void BackupCoordinationRemote::serializeToMultipleZooKeeperNodes(const String &
String part = value.substr(begin, end - begin);
String part_path = fmt::format("{}/{:06}", path, i);
ZooKeeperRetriesControl retries_ctl(logging_name + "::createPart", zookeeper_retries_info);
retries_ctl.retryLoop([&]
auto holder = with_retries.createRetriesControlHolder(logging_name + "::createPart");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
auto zk = getZooKeeper();
with_retries.renewZooKeeper(zk);
zk->createIfNotExists(part_path, part);
});
}
@ -293,9 +316,11 @@ String BackupCoordinationRemote::deserializeFromMultipleZooKeeperNodes(const Str
Strings part_names;
{
ZooKeeperRetriesControl retries_ctl(logging_name + "::getChildren", zookeeper_retries_info);
retries_ctl.retryLoop([&]{
auto zk = getZooKeeper();
auto holder = with_retries.createRetriesControlHolder(logging_name + "::getChildren");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
part_names = zk->getChildren(path);
std::sort(part_names.begin(), part_names.end());
});
@ -306,10 +331,11 @@ String BackupCoordinationRemote::deserializeFromMultipleZooKeeperNodes(const Str
{
String part;
String part_path = path + "/" + part_name;
ZooKeeperRetriesControl retries_ctl(logging_name + "::get", zookeeper_retries_info);
retries_ctl.retryLoop([&]
auto holder = with_retries.createRetriesControlHolder(logging_name + "::get");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
auto zk = getZooKeeper();
with_retries.renewZooKeeper(zk);
part = zk->get(part_path);
});
res += part;
@ -330,11 +356,16 @@ void BackupCoordinationRemote::addReplicatedPartNames(
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedPartNames() must not be called after preparing");
}
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_part_names/" + escapeForFileName(table_shared_id);
zk->createIfNotExists(path, "");
path += "/" + escapeForFileName(replica_name);
zk->create(path, ReplicatedPartNames::serialize(part_names_and_checksums, table_name_for_logs), zkutil::CreateMode::Persistent);
auto holder = with_retries.createRetriesControlHolder("addReplicatedPartNames");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_part_names/" + escapeForFileName(table_shared_id);
zk->createIfNotExists(path, "");
path += "/" + escapeForFileName(replica_name);
zk->createIfNotExists(path, ReplicatedPartNames::serialize(part_names_and_checksums, table_name_for_logs));
});
}
Strings BackupCoordinationRemote::getReplicatedPartNames(const String & table_shared_id, const String & replica_name) const
@ -356,11 +387,16 @@ void BackupCoordinationRemote::addReplicatedMutations(
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedMutations() must not be called after preparing");
}
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_mutations/" + escapeForFileName(table_shared_id);
zk->createIfNotExists(path, "");
path += "/" + escapeForFileName(replica_name);
zk->create(path, ReplicatedMutations::serialize(mutations, table_name_for_logs), zkutil::CreateMode::Persistent);
auto holder = with_retries.createRetriesControlHolder("addReplicatedMutations");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_mutations/" + escapeForFileName(table_shared_id);
zk->createIfNotExists(path, "");
path += "/" + escapeForFileName(replica_name);
zk->createIfNotExists(path, ReplicatedMutations::serialize(mutations, table_name_for_logs));
});
}
std::vector<IBackupCoordination::MutationInfo> BackupCoordinationRemote::getReplicatedMutations(const String & table_shared_id, const String & replica_name) const
@ -380,11 +416,16 @@ void BackupCoordinationRemote::addReplicatedDataPath(
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedDataPath() must not be called after preparing");
}
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_data_paths/" + escapeForFileName(table_shared_id);
zk->createIfNotExists(path, "");
path += "/" + escapeForFileName(data_path);
zk->createIfNotExists(path, "");
auto holder = with_retries.createRetriesControlHolder("addReplicatedDataPath");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_data_paths/" + escapeForFileName(table_shared_id);
zk->createIfNotExists(path, "");
path += "/" + escapeForFileName(data_path);
zk->createIfNotExists(path, "");
});
}
Strings BackupCoordinationRemote::getReplicatedDataPaths(const String & table_shared_id) const
@ -400,55 +441,88 @@ void BackupCoordinationRemote::prepareReplicatedTables() const
if (replicated_tables)
return;
std::vector<BackupCoordinationReplicatedTables::PartNamesForTableReplica> part_names_for_replicated_tables;
{
auto holder = with_retries.createRetriesControlHolder("prepareReplicatedTables::repl_part_names");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
part_names_for_replicated_tables.clear();
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_part_names";
for (const String & escaped_table_shared_id : zk->getChildren(path))
{
String table_shared_id = unescapeForFileName(escaped_table_shared_id);
String path2 = path + "/" + escaped_table_shared_id;
for (const String & escaped_replica_name : zk->getChildren(path2))
{
String replica_name = unescapeForFileName(escaped_replica_name);
auto part_names = ReplicatedPartNames::deserialize(zk->get(path2 + "/" + escaped_replica_name));
part_names_for_replicated_tables.push_back(
{table_shared_id, part_names.table_name_for_logs, replica_name, part_names.part_names_and_checksums});
}
}
});
}
std::vector<BackupCoordinationReplicatedTables::MutationsForTableReplica> mutations_for_replicated_tables;
{
auto holder = with_retries.createRetriesControlHolder("prepareReplicatedTables::repl_mutations");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
mutations_for_replicated_tables.clear();
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_mutations";
for (const String & escaped_table_shared_id : zk->getChildren(path))
{
String table_shared_id = unescapeForFileName(escaped_table_shared_id);
String path2 = path + "/" + escaped_table_shared_id;
for (const String & escaped_replica_name : zk->getChildren(path2))
{
String replica_name = unescapeForFileName(escaped_replica_name);
auto mutations = ReplicatedMutations::deserialize(zk->get(path2 + "/" + escaped_replica_name));
mutations_for_replicated_tables.push_back(
{table_shared_id, mutations.table_name_for_logs, replica_name, mutations.mutations});
}
}
});
}
std::vector<BackupCoordinationReplicatedTables::DataPathForTableReplica> data_paths_for_replicated_tables;
{
auto holder = with_retries.createRetriesControlHolder("prepareReplicatedTables::repl_data_paths");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
data_paths_for_replicated_tables.clear();
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_data_paths";
for (const String & escaped_table_shared_id : zk->getChildren(path))
{
String table_shared_id = unescapeForFileName(escaped_table_shared_id);
String path2 = path + "/" + escaped_table_shared_id;
for (const String & escaped_data_path : zk->getChildren(path2))
{
String data_path = unescapeForFileName(escaped_data_path);
data_paths_for_replicated_tables.push_back({table_shared_id, data_path});
}
}
});
}
replicated_tables.emplace();
auto zk = getZooKeeper();
{
String path = zookeeper_path + "/repl_part_names";
for (const String & escaped_table_shared_id : zk->getChildren(path))
{
String table_shared_id = unescapeForFileName(escaped_table_shared_id);
String path2 = path + "/" + escaped_table_shared_id;
for (const String & escaped_replica_name : zk->getChildren(path2))
{
String replica_name = unescapeForFileName(escaped_replica_name);
auto part_names = ReplicatedPartNames::deserialize(zk->get(path2 + "/" + escaped_replica_name));
replicated_tables->addPartNames(table_shared_id, part_names.table_name_for_logs, replica_name, part_names.part_names_and_checksums);
}
}
}
{
String path = zookeeper_path + "/repl_mutations";
for (const String & escaped_table_shared_id : zk->getChildren(path))
{
String table_shared_id = unescapeForFileName(escaped_table_shared_id);
String path2 = path + "/" + escaped_table_shared_id;
for (const String & escaped_replica_name : zk->getChildren(path2))
{
String replica_name = unescapeForFileName(escaped_replica_name);
auto mutations = ReplicatedMutations::deserialize(zk->get(path2 + "/" + escaped_replica_name));
replicated_tables->addMutations(table_shared_id, mutations.table_name_for_logs, replica_name, mutations.mutations);
}
}
}
{
String path = zookeeper_path + "/repl_data_paths";
for (const String & escaped_table_shared_id : zk->getChildren(path))
{
String table_shared_id = unescapeForFileName(escaped_table_shared_id);
String path2 = path + "/" + escaped_table_shared_id;
for (const String & escaped_data_path : zk->getChildren(path2))
{
String data_path = unescapeForFileName(escaped_data_path);
replicated_tables->addDataPath(table_shared_id, data_path);
}
}
}
for (auto & part_names : part_names_for_replicated_tables)
replicated_tables->addPartNames(std::move(part_names));
for (auto & mutations : mutations_for_replicated_tables)
replicated_tables->addMutations(std::move(mutations));
for (auto & data_paths : data_paths_for_replicated_tables)
replicated_tables->addDataPath(std::move(data_paths));
}
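Review note on the hunk above: each retry lambda clears its local vector before reading, and the shared `replicated_tables` state is filled only after all loops have succeeded, so a failed attempt cannot leave duplicates or partial data behind. A minimal standalone sketch of that collect-then-commit idea follows. `retryLoop`, `FlakySource` and `loadAll` are hypothetical names invented for illustration; they are not the `WithRetries` API from this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical retry helper (an assumption, not the ClickHouse implementation):
// re-runs `attempt` until it succeeds or `max_attempts` is exhausted.
inline void retryLoop(size_t max_attempts, const std::function<void()> & attempt)
{
    for (size_t i = 1;; ++i)
    {
        try
        {
            attempt();
            return;
        }
        catch (const std::runtime_error &)
        {
            if (i >= max_attempts)
                throw; // out of attempts: propagate the last error
        }
    }
}

// Hypothetical data source that fails partway through its first `failures` reads.
struct FlakySource
{
    size_t failures;
    std::vector<std::string> items;

    void readInto(std::vector<std::string> & out)
    {
        for (size_t i = 0; i < items.size(); ++i)
        {
            if (failures > 0 && i == 1)
            {
                --failures;
                throw std::runtime_error("simulated connection loss");
            }
            out.push_back(items[i]);
        }
    }
};

// Collects into a local vector; the caller commits the result only after success.
inline std::vector<std::string> loadAll(FlakySource & source, size_t max_attempts)
{
    std::vector<std::string> collected;
    retryLoop(max_attempts, [&]
    {
        collected.clear(); // drop partial results left by a failed attempt
        source.readInto(collected);
    });
    return collected;
}
```

Without the `clear()` call, the element read before each simulated failure would stay in the vector and appear again after the retry.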
void BackupCoordinationRemote::addReplicatedAccessFilePath(const String & access_zk_path, AccessEntityType access_entity_type, const String & file_path)
{
{
@ -457,13 +531,18 @@ void BackupCoordinationRemote::addReplicatedAccessFilePath(const String & access
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedAccessFilePath() must not be called after preparing");
}
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_access/" + escapeForFileName(access_zk_path);
zk->createIfNotExists(path, "");
path += "/" + AccessEntityTypeInfo::get(access_entity_type).name;
zk->createIfNotExists(path, "");
path += "/" + current_host;
zk->createIfNotExists(path, file_path);
auto holder = with_retries.createRetriesControlHolder("addReplicatedAccessFilePath");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_access/" + escapeForFileName(access_zk_path);
zk->createIfNotExists(path, "");
path += "/" + AccessEntityTypeInfo::get(access_entity_type).name;
zk->createIfNotExists(path, "");
path += "/" + current_host;
zk->createIfNotExists(path, file_path);
});
}
Strings BackupCoordinationRemote::getReplicatedAccessFilePaths(const String & access_zk_path, AccessEntityType access_entity_type) const
@ -478,25 +557,35 @@ void BackupCoordinationRemote::prepareReplicatedAccess() const
if (replicated_access)
return;
replicated_access.emplace();
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_access";
for (const String & escaped_access_zk_path : zk->getChildren(path))
std::vector<BackupCoordinationReplicatedAccess::FilePathForAccessEntitry> file_path_for_access_entities;
auto holder = with_retries.createRetriesControlHolder("prepareReplicatedAccess");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
String access_zk_path = unescapeForFileName(escaped_access_zk_path);
String path2 = path + "/" + escaped_access_zk_path;
for (const String & type_str : zk->getChildren(path2))
file_path_for_access_entities.clear();
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_access";
for (const String & escaped_access_zk_path : zk->getChildren(path))
{
AccessEntityType type = AccessEntityTypeInfo::parseType(type_str);
String path3 = path2 + "/" + type_str;
for (const String & host_id : zk->getChildren(path3))
String access_zk_path = unescapeForFileName(escaped_access_zk_path);
String path2 = path + "/" + escaped_access_zk_path;
for (const String & type_str : zk->getChildren(path2))
{
String file_path = zk->get(path3 + "/" + host_id);
replicated_access->addFilePath(access_zk_path, type, host_id, file_path);
AccessEntityType type = AccessEntityTypeInfo::parseType(type_str);
String path3 = path2 + "/" + type_str;
for (const String & host_id : zk->getChildren(path3))
{
String file_path = zk->get(path3 + "/" + host_id);
file_path_for_access_entities.push_back({access_zk_path, type, host_id, file_path});
}
}
}
}
});
replicated_access.emplace();
for (auto & file_path : file_path_for_access_entities)
replicated_access->addFilePath(std::move(file_path));
}
void BackupCoordinationRemote::addReplicatedSQLObjectsDir(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & dir_path)
@ -507,21 +596,26 @@ void BackupCoordinationRemote::addReplicatedSQLObjectsDir(const String & loader_
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedSQLObjectsDir() must not be called after preparing");
}
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_sql_objects/" + escapeForFileName(loader_zk_path);
zk->createIfNotExists(path, "");
path += "/";
switch (object_type)
auto holder = with_retries.createRetriesControlHolder("addReplicatedSQLObjectsDir");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
case UserDefinedSQLObjectType::Function:
path += "functions";
break;
}
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_sql_objects/" + escapeForFileName(loader_zk_path);
zk->createIfNotExists(path, "");
zk->createIfNotExists(path, "");
path += "/" + current_host;
zk->createIfNotExists(path, dir_path);
path += "/";
switch (object_type)
{
case UserDefinedSQLObjectType::Function:
path += "functions";
break;
}
zk->createIfNotExists(path, "");
path += "/" + current_host;
zk->createIfNotExists(path, dir_path);
});
}
Strings BackupCoordinationRemote::getReplicatedSQLObjectsDirs(const String & loader_zk_path, UserDefinedSQLObjectType object_type) const
@ -536,27 +630,36 @@ void BackupCoordinationRemote::prepareReplicatedSQLObjects() const
if (replicated_sql_objects)
return;
replicated_sql_objects.emplace();
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_sql_objects";
for (const String & escaped_loader_zk_path : zk->getChildren(path))
std::vector<BackupCoordinationReplicatedSQLObjects::DirectoryPathForSQLObject> directories_for_sql_objects;
auto holder = with_retries.createRetriesControlHolder("prepareReplicatedSQLObjects");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
String loader_zk_path = unescapeForFileName(escaped_loader_zk_path);
String objects_path = path + "/" + escaped_loader_zk_path;
directories_for_sql_objects.clear();
with_retries.renewZooKeeper(zk);
if (String functions_path = objects_path + "/functions"; zk->exists(functions_path))
String path = zookeeper_path + "/repl_sql_objects";
for (const String & escaped_loader_zk_path : zk->getChildren(path))
{
UserDefinedSQLObjectType object_type = UserDefinedSQLObjectType::Function;
for (const String & host_id : zk->getChildren(functions_path))
String loader_zk_path = unescapeForFileName(escaped_loader_zk_path);
String objects_path = path + "/" + escaped_loader_zk_path;
if (String functions_path = objects_path + "/functions"; zk->exists(functions_path))
{
String dir = zk->get(functions_path + "/" + host_id);
replicated_sql_objects->addDirectory(loader_zk_path, object_type, host_id, dir);
UserDefinedSQLObjectType object_type = UserDefinedSQLObjectType::Function;
for (const String & host_id : zk->getChildren(functions_path))
{
String dir = zk->get(functions_path + "/" + host_id);
directories_for_sql_objects.push_back({loader_zk_path, object_type, host_id, dir});
}
}
}
}
}
});
replicated_sql_objects.emplace();
for (auto & directory : directories_for_sql_objects)
replicated_sql_objects->addDirectory(std::move(directory));
}
void BackupCoordinationRemote::addFileInfos(BackupFileInfos && file_infos_)
{
@ -594,9 +697,11 @@ void BackupCoordinationRemote::prepareFileInfos() const
Strings hosts_with_file_infos;
{
ZooKeeperRetriesControl retries_ctl("prepareFileInfos::get_hosts", zookeeper_retries_info);
retries_ctl.retryLoop([&]{
auto zk = getZooKeeper();
auto holder = with_retries.createRetriesControlHolder("prepareFileInfos::get_hosts");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
hosts_with_file_infos = zk->getChildren(zookeeper_path + "/file_infos");
});
}
@ -615,10 +720,11 @@ bool BackupCoordinationRemote::startWritingFile(size_t data_file_index)
String full_path = zookeeper_path + "/writing_files/" + std::to_string(data_file_index);
String host_index_str = std::to_string(current_host_index);
ZooKeeperRetriesControl retries_ctl("startWritingFile", zookeeper_retries_info);
retries_ctl.retryLoop([&]
auto holder = with_retries.createRetriesControlHolder("startWritingFile");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
auto zk = getZooKeeper();
with_retries.renewZooKeeper(zk);
auto code = zk->tryCreate(full_path, host_index_str, zkutil::CreateMode::Persistent);
if (code == Coordination::Error::ZOK)
@ -632,54 +738,65 @@ bool BackupCoordinationRemote::startWritingFile(size_t data_file_index)
return acquired_writing;
}
bool BackupCoordinationRemote::hasConcurrentBackups(const std::atomic<size_t> &) const
{
/// If it's internal, concurrency will be checked for the base backup
if (is_internal)
return false;
auto zk = getZooKeeper();
std::string backup_stage_path = zookeeper_path + "/stage";
if (!zk->exists(root_zookeeper_path))
zk->createAncestors(root_zookeeper_path);
bool result = false;
for (size_t attempt = 0; attempt < MAX_ZOOKEEPER_ATTEMPTS; ++attempt)
auto holder = with_retries.createRetriesControlHolder("hasConcurrentBackups");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
Coordination::Stat stat;
zk->get(root_zookeeper_path, &stat);
Strings existing_backup_paths = zk->getChildren(root_zookeeper_path);
with_retries.renewZooKeeper(zk);
for (const auto & existing_backup_path : existing_backup_paths)
if (!zk->exists(root_zookeeper_path))
zk->createAncestors(root_zookeeper_path);
for (size_t attempt = 0; attempt < MAX_ZOOKEEPER_ATTEMPTS; ++attempt)
{
if (startsWith(existing_backup_path, "restore-"))
continue;
Coordination::Stat stat;
zk->get(root_zookeeper_path, &stat);
Strings existing_backup_paths = zk->getChildren(root_zookeeper_path);
String existing_backup_uuid = existing_backup_path;
existing_backup_uuid.erase(0, String("backup-").size());
if (existing_backup_uuid == toString(backup_uuid))
continue;
String status;
if (zk->tryGet(root_zookeeper_path + "/" + existing_backup_path + "/stage", status))
for (const auto & existing_backup_path : existing_backup_paths)
{
if (status != Stage::COMPLETED)
return true;
if (startsWith(existing_backup_path, "restore-"))
continue;
String existing_backup_uuid = existing_backup_path;
existing_backup_uuid.erase(0, String("backup-").size());
if (existing_backup_uuid == toString(backup_uuid))
continue;
String status;
if (zk->tryGet(root_zookeeper_path + "/" + existing_backup_path + "/stage", status))
{
if (status != Stage::COMPLETED)
{
LOG_WARNING(log, "Found a concurrent backup: {}, current backup: {}", existing_backup_uuid, toString(backup_uuid));
result = true;
return;
}
}
}
zk->createIfNotExists(backup_stage_path, "");
auto code = zk->trySet(backup_stage_path, Stage::SCHEDULED_TO_START, stat.version);
if (code == Coordination::Error::ZOK)
break;
bool is_last_attempt = (attempt == MAX_ZOOKEEPER_ATTEMPTS - 1);
if ((code != Coordination::Error::ZBADVERSION) || is_last_attempt)
throw zkutil::KeeperException(code, backup_stage_path);
}
});
zk->createIfNotExists(backup_stage_path, "");
auto code = zk->trySet(backup_stage_path, Stage::SCHEDULED_TO_START, stat.version);
if (code == Coordination::Error::ZOK)
break;
bool is_last_attempt = (attempt == MAX_ZOOKEEPER_ATTEMPTS - 1);
if ((code != Coordination::Error::ZBADVERSION) || is_last_attempt)
throw zkutil::KeeperException(code, backup_stage_path);
}
return false;
return result;
}
}
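Review note on `hasConcurrentBackups()` above: the inner loop reads a node version, scans for other backups, and then calls `trySet` with that version, retrying up to `MAX_ZOOKEEPER_ATTEMPTS` times when the result is `ZBADVERSION`. The general read-check-conditional-write pattern can be sketched without a Keeper connection. `VersionedNode` and `setIfUnchanged` are hypothetical in-memory stand-ins, assumed here only to show the control flow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical in-memory node with a version counter, imitating a node
// in a coordination service. Not a real client API.
struct VersionedNode
{
    std::string value;
    int32_t version = 0;

    // Succeeds only if the caller has seen the current version.
    bool trySet(const std::string & new_value, int32_t expected_version)
    {
        if (expected_version != version)
            return false; // analogous to a "bad version" response
        value = new_value;
        ++version;
        return true;
    }
};

// Reads the version, runs `check`, then writes conditionally on that version.
// A version mismatch means someone else wrote in between, so the loop retries.
template <typename Check>
bool setIfUnchanged(VersionedNode & node, const std::string & new_value, size_t max_attempts, Check && check)
{
    for (size_t attempt = 0; attempt < max_attempts; ++attempt)
    {
        int32_t seen_version = node.version;
        check(node); // in this sketch the check may also simulate a concurrent writer
        if (node.trySet(new_value, seen_version))
            return true;
    }
    return false;
}
```

The bounded attempt count keeps a busy node from stalling the caller indefinitely; the diff takes the same approach by throwing on the last attempt.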

@ -6,7 +6,7 @@
#include <Backups/BackupCoordinationReplicatedSQLObjects.h>
#include <Backups/BackupCoordinationReplicatedTables.h>
#include <Backups/BackupCoordinationStageSync.h>
#include <Storages/MergeTree/ZooKeeperRetries.h>
#include <Backups/WithRetries.h>
namespace DB
@ -19,13 +19,7 @@ constexpr size_t MAX_ZOOKEEPER_ATTEMPTS = 10;
class BackupCoordinationRemote : public IBackupCoordination
{
public:
struct BackupKeeperSettings
{
UInt64 keeper_max_retries;
UInt64 keeper_retry_initial_backoff_ms;
UInt64 keeper_retry_max_backoff_ms;
UInt64 keeper_value_max_size;
};
using BackupKeeperSettings = WithRetries::KeeperSettings;
BackupCoordinationRemote(
zkutil::GetZooKeeper get_zookeeper_,
@ -79,7 +73,6 @@ public:
static size_t findCurrentHostIndex(const Strings & all_hosts, const String & current_host);
private:
zkutil::ZooKeeperPtr getZooKeeper() const;
void createRootNodes();
void removeAllNodes();
@ -94,7 +87,6 @@ private:
void prepareReplicatedSQLObjects() const TSA_REQUIRES(replicated_sql_objects_mutex);
void prepareFileInfos() const TSA_REQUIRES(file_infos_mutex);
const zkutil::GetZooKeeper get_zookeeper;
const String root_zookeeper_path;
const String zookeeper_path;
const BackupKeeperSettings keeper_settings;
@ -104,11 +96,12 @@ private:
const size_t current_host_index;
const bool plain_backup;
const bool is_internal;
Poco::Logger * const log;
mutable ZooKeeperRetriesInfo zookeeper_retries_info;
/// The order of these two fields matters, because stage_sync holds a reference to the with_retries object
mutable WithRetries with_retries;
std::optional<BackupCoordinationStageSync> stage_sync;
mutable zkutil::ZooKeeperPtr TSA_GUARDED_BY(zookeeper_mutex) zookeeper;
mutable std::optional<BackupCoordinationReplicatedTables> TSA_GUARDED_BY(replicated_tables_mutex) replicated_tables;
mutable std::optional<BackupCoordinationReplicatedAccess> TSA_GUARDED_BY(replicated_access_mutex) replicated_access;
mutable std::optional<BackupCoordinationReplicatedSQLObjects> TSA_GUARDED_BY(replicated_sql_objects_mutex) replicated_sql_objects;
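Review note on the field-order comment above: C++ constructs non-static data members in declaration order and destroys them in reverse, so declaring `with_retries` before `stage_sync` guarantees the referenced object outlives the one holding the reference. The sketch below demonstrates this with hypothetical types (`Retrier`, `StageSync`, `Coordination`) and an event log; none of these names come from the commit.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical event log used to observe construction and destruction order.
inline std::vector<std::string> & events()
{
    static std::vector<std::string> log;
    return log;
}

struct Retrier
{
    Retrier() { events().push_back("Retrier constructed"); }
    ~Retrier() { events().push_back("Retrier destroyed"); }
};

struct StageSync
{
    explicit StageSync(Retrier & retrier_) : retrier(retrier_) { events().push_back("StageSync constructed"); }
    ~StageSync() { events().push_back("StageSync destroyed"); }

    Retrier & retrier; // borrowed reference: the Retrier must outlive this object
};

struct Coordination
{
    Coordination() { stage_sync.emplace(retrier); }

    // Declared first, so it is constructed first and destroyed last.
    Retrier retrier;
    // Declared second, so it is destroyed before `retrier`.
    std::optional<StageSync> stage_sync;
};
```

Swapping the two member declarations would destroy `retrier` while `stage_sync` still referenced it, which is the hazard the source comment warns about.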

@ -7,8 +7,13 @@ namespace DB
BackupCoordinationReplicatedAccess::BackupCoordinationReplicatedAccess() = default;
BackupCoordinationReplicatedAccess::~BackupCoordinationReplicatedAccess() = default;
void BackupCoordinationReplicatedAccess::addFilePath(const String & access_zk_path, AccessEntityType access_entity_type, const String & host_id, const String & file_path)
void BackupCoordinationReplicatedAccess::addFilePath(FilePathForAccessEntitry && file_path_for_access_entity)
{
const auto & access_zk_path = file_path_for_access_entity.access_zk_path;
const auto & access_entity_type = file_path_for_access_entity.access_entity_type;
const auto & host_id = file_path_for_access_entity.host_id;
const auto & file_path = file_path_for_access_entity.file_path;
auto & ref = file_paths_by_zk_path[std::make_pair(access_zk_path, access_entity_type)];
ref.file_paths.emplace(file_path);

@ -28,8 +28,16 @@ public:
BackupCoordinationReplicatedAccess();
~BackupCoordinationReplicatedAccess();
struct FilePathForAccessEntitry
{
String access_zk_path;
AccessEntityType access_entity_type;
String host_id;
String file_path;
};
/// Adds a path to access*.txt file keeping access entities of a ReplicatedAccessStorage.
void addFilePath(const String & access_zk_path, AccessEntityType access_entity_type, const String & host_id, const String & file_path);
void addFilePath(FilePathForAccessEntitry && file_path_for_access_entity);
/// Returns all paths added by addFilePath() if `host_id` is a host chosen to store access.
Strings getFilePaths(const String & access_zk_path, AccessEntityType access_entity_type, const String & host_id) const;


@ -7,8 +7,13 @@ namespace DB
BackupCoordinationReplicatedSQLObjects::BackupCoordinationReplicatedSQLObjects() = default;
BackupCoordinationReplicatedSQLObjects::~BackupCoordinationReplicatedSQLObjects() = default;
void BackupCoordinationReplicatedSQLObjects::addDirectory(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & host_id, const String & dir_path)
void BackupCoordinationReplicatedSQLObjects::addDirectory(DirectoryPathForSQLObject && directory_path_for_sql_object)
{
const auto & loader_zk_path = directory_path_for_sql_object.loader_zk_path;
const auto & object_type = directory_path_for_sql_object.object_type;
const auto & host_id = directory_path_for_sql_object.host_id;
const auto & dir_path = directory_path_for_sql_object.dir_path;
auto & ref = dir_paths_by_zk_path[std::make_pair(loader_zk_path, object_type)];
ref.dir_paths.emplace(dir_path);


@ -28,8 +28,16 @@ public:
BackupCoordinationReplicatedSQLObjects();
~BackupCoordinationReplicatedSQLObjects();
struct DirectoryPathForSQLObject
{
String loader_zk_path;
UserDefinedSQLObjectType object_type;
String host_id;
String dir_path;
};
/// Adds a path to directory keeping user defined SQL objects.
void addDirectory(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & host_id, const String & dir_path);
void addDirectory(DirectoryPathForSQLObject && directory_path_for_sql_object);
/// Returns all added paths to directories if `host_id` is a host chosen to store user-defined SQL objects.
Strings getDirectories(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & host_id) const;


@ -149,12 +149,13 @@ private:
BackupCoordinationReplicatedTables::BackupCoordinationReplicatedTables() = default;
BackupCoordinationReplicatedTables::~BackupCoordinationReplicatedTables() = default;
void BackupCoordinationReplicatedTables::addPartNames(
const String & table_shared_id,
const String & table_name_for_logs,
const String & replica_name,
const std::vector<PartNameAndChecksum> & part_names_and_checksums)
void BackupCoordinationReplicatedTables::addPartNames(PartNamesForTableReplica && part_names)
{
const auto & table_shared_id = part_names.table_shared_id;
const auto & table_name_for_logs = part_names.table_name_for_logs;
const auto & replica_name = part_names.replica_name;
const auto & part_names_and_checksums = part_names.part_names_and_checksums;
if (prepared)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addPartNames() must not be called after preparing");
@ -216,12 +217,13 @@ Strings BackupCoordinationReplicatedTables::getPartNames(const String & table_sh
return it2->second;
}
void BackupCoordinationReplicatedTables::addMutations(
const String & table_shared_id,
const String & table_name_for_logs,
const String & replica_name,
const std::vector<MutationInfo> & mutations)
void BackupCoordinationReplicatedTables::addMutations(MutationsForTableReplica && mutations_for_table_replica)
{
const auto & table_shared_id = mutations_for_table_replica.table_shared_id;
const auto & table_name_for_logs = mutations_for_table_replica.table_name_for_logs;
const auto & replica_name = mutations_for_table_replica.replica_name;
const auto & mutations = mutations_for_table_replica.mutations;
if (prepared)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addMutations() must not be called after preparing");
@ -254,8 +256,11 @@ BackupCoordinationReplicatedTables::getMutations(const String & table_shared_id,
return res;
}
void BackupCoordinationReplicatedTables::addDataPath(const String & table_shared_id, const String & data_path)
void BackupCoordinationReplicatedTables::addDataPath(DataPathForTableReplica && data_path_for_table_replica)
{
const auto & table_shared_id = data_path_for_table_replica.table_shared_id;
const auto & data_path = data_path_for_table_replica.data_path;
auto & table_info = table_infos[table_shared_id];
table_info.data_paths.emplace(data_path);
}


@ -38,15 +38,19 @@ public:
using PartNameAndChecksum = IBackupCoordination::PartNameAndChecksum;
struct PartNamesForTableReplica
{
String table_shared_id;
String table_name_for_logs;
String replica_name;
std::vector<PartNameAndChecksum> part_names_and_checksums;
};
/// Adds part names which a specified replica of a replicated table is going to put to the backup.
/// Multiple replicas of the replicated table call this function and then the added part names can be returned by call of the function
/// getPartNames().
/// Checksums are used only to control that parts under the same names on different replicas are the same.
void addPartNames(
const String & table_shared_id,
const String & table_name_for_logs,
const String & replica_name,
const std::vector<PartNameAndChecksum> & part_names_and_checksums);
void addPartNames(PartNamesForTableReplica && part_names);
/// Returns the names of the parts which a specified replica of a replicated table should put to the backup.
/// This is the same list as it was added by call of the function addPartNames() but without duplications and without
@ -55,20 +59,30 @@ public:
using MutationInfo = IBackupCoordination::MutationInfo;
struct MutationsForTableReplica
{
String table_shared_id;
String table_name_for_logs;
String replica_name;
std::vector<MutationInfo> mutations;
};
/// Adds information about mutations of a replicated table.
void addMutations(
const String & table_shared_id,
const String & table_name_for_logs,
const String & replica_name,
const std::vector<MutationInfo> & mutations);
void addMutations(MutationsForTableReplica && mutations_for_table_replica);
/// Returns all mutations of a replicated table which are not finished for some data parts added by addPartNames().
std::vector<MutationInfo> getMutations(const String & table_shared_id, const String & replica_name) const;
struct DataPathForTableReplica
{
String table_shared_id;
String data_path;
};
/// Adds a data path in backup for a replicated table.
/// Multiple replicas of the replicated table call this function and then all the added paths can be returned by call of the function
/// getDataPaths().
void addDataPath(const String & table_shared_id, const String & data_path);
void addDataPath(DataPathForTableReplica && data_path_for_table_replica);
/// Returns all the data paths in backup added for a replicated table (see also addDataPath()).
Strings getDataPaths(const String & table_shared_id) const;


@ -1,11 +1,13 @@
#include <Backups/BackupCoordinationStageSync.h>
#include <base/chrono_io.h>
#include <Common/ZooKeeper/Common.h>
#include <Common/Exception.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <IO/ReadBufferFromString.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromString.h>
#include <IO/WriteHelpers.h>
#include <base/chrono_io.h>
namespace DB
@ -17,9 +19,12 @@ namespace ErrorCodes
}
BackupCoordinationStageSync::BackupCoordinationStageSync(const String & zookeeper_path_, zkutil::GetZooKeeper get_zookeeper_, Poco::Logger * log_)
: zookeeper_path(zookeeper_path_)
, get_zookeeper(get_zookeeper_)
BackupCoordinationStageSync::BackupCoordinationStageSync(
const String & root_zookeeper_path_,
WithRetries & with_retries_,
Poco::Logger * log_)
: zookeeper_path(root_zookeeper_path_ + "/stage")
, with_retries(with_retries_)
, log(log_)
{
createRootNodes();
@ -27,32 +32,48 @@ BackupCoordinationStageSync::BackupCoordinationStageSync(const String & zookeepe
void BackupCoordinationStageSync::createRootNodes()
{
auto zookeeper = get_zookeeper();
zookeeper->createAncestors(zookeeper_path);
zookeeper->createIfNotExists(zookeeper_path, "");
auto holder = with_retries.createRetriesControlHolder("createRootNodes");
holder.retries_ctl.retryLoop(
[&, &zookeeper = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zookeeper);
zookeeper->createAncestors(zookeeper_path);
zookeeper->createIfNotExists(zookeeper_path, "");
});
}
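The `retryLoop` calls in this file wrap each Keeper operation in a retry with backoff. The sketch below shows the general idea only. It is not ClickHouse's `ZooKeeperRetriesControl`; `RetrySettingsSketch` and `retryLoopSketch` are hypothetical names, and the doubling backoff policy is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical settings, loosely mirroring max retries / initial backoff / max backoff.
struct RetrySettingsSketch
{
    unsigned max_retries;
    std::chrono::milliseconds initial_backoff;
    std::chrono::milliseconds max_backoff;
};

// Runs `operation` until it succeeds or the retry budget is exhausted.
// Returns the number of attempts made; rethrows the last error when out of retries.
inline unsigned retryLoopSketch(const RetrySettingsSketch & settings, const std::function<void()> & operation)
{
    auto backoff = settings.initial_backoff;
    for (unsigned attempt = 1;; ++attempt)
    {
        try
        {
            operation();
            return attempt;
        }
        catch (const std::runtime_error &)
        {
            if (attempt > settings.max_retries)
                throw; // budget exhausted: propagate the last failure
            std::this_thread::sleep_for(backoff);
            backoff = std::min(backoff * 2, settings.max_backoff); // assumed doubling policy
        }
    }
}
```

Because the operation may run several times, everything inside the lambda has to be safe to repeat, which is why the bodies in this diff re-acquire the session and use idempotent calls.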
void BackupCoordinationStageSync::set(const String & current_host, const String & new_stage, const String & message)
{
auto zookeeper = get_zookeeper();
auto holder = with_retries.createRetriesControlHolder("set");
holder.retries_ctl.retryLoop(
[&, &zookeeper = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zookeeper);
/// Make an ephemeral node so the initiator can track if the current host is still working.
String alive_node_path = zookeeper_path + "/alive|" + current_host;
auto code = zookeeper->tryCreate(alive_node_path, "", zkutil::CreateMode::Ephemeral);
if (code != Coordination::Error::ZOK && code != Coordination::Error::ZNODEEXISTS)
throw zkutil::KeeperException(code, alive_node_path);
/// Make an ephemeral node so the initiator can track if the current host is still working.
String alive_node_path = zookeeper_path + "/alive|" + current_host;
auto code = zookeeper->tryCreate(alive_node_path, "", zkutil::CreateMode::Ephemeral);
if (code != Coordination::Error::ZOK && code != Coordination::Error::ZNODEEXISTS)
throw zkutil::KeeperException(code, alive_node_path);
zookeeper->createIfNotExists(zookeeper_path + "/started|" + current_host, "");
zookeeper->create(zookeeper_path + "/current|" + current_host + "|" + new_stage, message, zkutil::CreateMode::Persistent);
zookeeper->createIfNotExists(zookeeper_path + "/started|" + current_host, "");
zookeeper->createIfNotExists(zookeeper_path + "/current|" + current_host + "|" + new_stage, message);
});
}
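In `set()` the strict `create` of the `current|host|stage` node becomes `createIfNotExists`. The source does not state the reason; one plausible reading is that a retried attempt may find a node already created by an earlier attempt whose response was lost. The sketch below contrasts the two behaviours using a hypothetical in-memory `NodeStoreSketch`, which is not a Keeper client.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for a Keeper namespace.
class NodeStoreSketch
{
public:
    // Strict create: fails if the node already exists, so it is unsafe to repeat.
    void create(const std::string & path, const std::string & value)
    {
        if (!nodes.emplace(path, value).second)
            throw std::runtime_error("node exists: " + path);
    }

    // Idempotent create: repeating the call leaves the first value in place.
    void createIfNotExists(const std::string & path, const std::string & value)
    {
        nodes.emplace(path, value);
    }

    bool exists(const std::string & path) const { return nodes.count(path) != 0; }

private:
    std::map<std::string, std::string> nodes;
};
```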
void BackupCoordinationStageSync::setError(const String & current_host, const Exception & exception)
{
auto zookeeper = get_zookeeper();
WriteBufferFromOwnString buf;
writeStringBinary(current_host, buf);
writeException(exception, buf, true);
zookeeper->createIfNotExists(zookeeper_path + "/error", buf.str());
auto holder = with_retries.createRetriesControlHolder("setError");
holder.retries_ctl.retryLoop(
[&, &zookeeper = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zookeeper);
WriteBufferFromOwnString buf;
writeStringBinary(current_host, buf);
writeException(exception, buf, true);
zookeeper->createIfNotExists(zookeeper_path + "/error", buf.str());
});
}
Strings BackupCoordinationStageSync::wait(const Strings & all_hosts, const String & stage_to_wait)
@ -83,14 +104,24 @@ struct BackupCoordinationStageSync::State
};
BackupCoordinationStageSync::State BackupCoordinationStageSync::readCurrentState(
zkutil::ZooKeeperPtr zookeeper, const Strings & zk_nodes, const Strings & all_hosts, const String & stage_to_wait) const
const Strings & zk_nodes, const Strings & all_hosts, const String & stage_to_wait) const
{
std::unordered_set<std::string_view> zk_nodes_set{zk_nodes.begin(), zk_nodes.end()};
State state;
if (zk_nodes_set.contains("error"))
{
ReadBufferFromOwnString buf{zookeeper->get(zookeeper_path + "/error")};
String errors;
{
auto holder = with_retries.createRetriesControlHolder("readCurrentState");
holder.retries_ctl.retryLoop(
[&, &zookeeper = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zookeeper);
errors = zookeeper->get(zookeeper_path + "/error");
});
}
ReadBufferFromOwnString buf{errors};
String host;
readStringBinary(host, buf);
state.error = std::make_pair(host, readException(buf, fmt::format("Got error from {}", host)));
@ -102,8 +133,38 @@ BackupCoordinationStageSync::State BackupCoordinationStageSync::readCurrentState
if (!zk_nodes_set.contains("current|" + host + "|" + stage_to_wait))
{
UnreadyHostState unready_host_state;
unready_host_state.started = zk_nodes_set.contains("started|" + host);
unready_host_state.alive = zk_nodes_set.contains("alive|" + host);
const String started_node_name = "started|" + host;
const String alive_node_name = "alive|" + host;
const String alive_node_path = zookeeper_path + "/" + alive_node_name;
unready_host_state.started = zk_nodes_set.contains(started_node_name);
/// Because we retry everywhere, we can no longer fully rely on ephemeral nodes.
/// Although we recreate the "alive" node when reconnecting, that might not be enough and a race condition is still possible.
/// All we can do here is retry.
/// In the worst case, if we cannot see the alive node for a long time, we will just abort the backup.
unready_host_state.alive = zk_nodes_set.contains(alive_node_name);
if (!unready_host_state.alive)
{
LOG_TRACE(log, "Seems like host ({}) is dead. Will retry the check to confirm", host);
auto holder = with_retries.createRetriesControlHolder("readCurrentState::checkAliveNode");
holder.retries_ctl.retryLoop(
[&, &zookeeper = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zookeeper);
if (zookeeper->existsNoFailureInjection(alive_node_path))
{
unready_host_state.alive = true;
return;
}
// Retry with backoff. We also check whether this is the last retry, because we don't want to rethrow an exception.
if (!holder.retries_ctl.isLastRetry())
holder.retries_ctl.setKeeperError(Coordination::Error::ZNONODE, "There is no alive node for host {}. Will retry", host);
});
}
LOG_TRACE(log, "Host ({}) appeared to be {}", host, unready_host_state.alive ? "alive" : "dead");
state.unready_hosts.emplace(host, unready_host_state);
if (!unready_host_state.alive && unready_host_state.started && !state.host_terminated)
state.host_terminated = host;
@ -113,51 +174,62 @@ BackupCoordinationStageSync::State BackupCoordinationStageSync::readCurrentState
if (state.host_terminated || !state.unready_hosts.empty())
return state;
state.results.reserve(all_hosts.size());
for (const auto & host : all_hosts)
state.results.emplace_back(zookeeper->get(zookeeper_path + "/current|" + host + "|" + stage_to_wait));
auto holder = with_retries.createRetriesControlHolder("waitImpl::collectStagesToWait");
holder.retries_ctl.retryLoop(
[&, &zookeeper = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zookeeper);
Strings results;
for (const auto & host : all_hosts)
results.emplace_back(zookeeper->get(zookeeper_path + "/current|" + host + "|" + stage_to_wait));
state.results = std::move(results);
});
return state;
}
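The new result-collection code fills a local `results` vector inside the retry lambda and assigns it to `state.results` only at the end. A sketch of why that matters under retries follows: a failed attempt must not leave partial results behind. `collectAllOrNothing` and its `read` callback are hypothetical; `read` stands in for `zookeeper->get()`.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Reads every key, retrying the whole batch on failure.
// Only a complete set of results is ever published to the caller.
inline std::vector<std::string> collectAllOrNothing(
    const std::vector<std::string> & keys,
    const std::function<std::string(const std::string &)> & read,
    unsigned max_attempts)
{
    std::vector<std::string> committed;
    for (unsigned attempt = 1; attempt <= max_attempts; ++attempt)
    {
        try
        {
            std::vector<std::string> results; // fresh on every attempt
            results.reserve(keys.size());
            for (const auto & key : keys)
                results.emplace_back(read(key));
            committed = std::move(results); // publish only after the last read succeeded
            return committed;
        }
        catch (const std::runtime_error &)
        {
            if (attempt == max_attempts)
                throw;
        }
    }
    return committed;
}
```

Appending directly to the shared vector would instead duplicate the entries read before the failure.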
Strings BackupCoordinationStageSync::waitImpl(const Strings & all_hosts, const String & stage_to_wait, std::optional<std::chrono::milliseconds> timeout) const
Strings BackupCoordinationStageSync::waitImpl(
const Strings & all_hosts, const String & stage_to_wait, std::optional<std::chrono::milliseconds> timeout) const
{
if (all_hosts.empty())
return {};
/// Wait until all hosts are ready or an error happens or time is out.
auto zookeeper = get_zookeeper();
/// Set by ZooKeeper when the list of zk nodes has changed.
auto watch = std::make_shared<Poco::Event>();
bool use_timeout = timeout.has_value();
std::chrono::steady_clock::time_point end_of_timeout;
if (use_timeout)
end_of_timeout = std::chrono::steady_clock::now() + std::chrono::duration_cast<std::chrono::steady_clock::duration>(*timeout);
State state;
String previous_unready_host; /// Used for logging: we don't want to log the same unready host again.
for (;;)
{
/// Get zk nodes and subscribe on their changes.
Strings zk_nodes = zookeeper->getChildren(zookeeper_path, nullptr, watch);
LOG_INFO(log, "Waiting for the stage {}", stage_to_wait);
/// Set by ZooKeeper when the list of zk nodes has changed.
auto watch = std::make_shared<Poco::Event>();
Strings zk_nodes;
{
auto holder = with_retries.createRetriesControlHolder("waitImpl::getChildren");
holder.retries_ctl.retryLoop(
[&, &zookeeper = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zookeeper);
watch->reset();
/// Get zk nodes and subscribe on their changes.
zk_nodes = zookeeper->getChildren(zookeeper_path, nullptr, watch);
});
}
/// Read and analyze the current state of zk nodes.
state = readCurrentState(zookeeper, zk_nodes, all_hosts, stage_to_wait);
state = readCurrentState(zk_nodes, all_hosts, stage_to_wait);
if (state.error || state.host_terminated || state.unready_hosts.empty())
break; /// Error happened or everything is ready.
/// Log that we will wait for another host.
/// Log that we will wait
const auto & unready_host = state.unready_hosts.begin()->first;
if (unready_host != previous_unready_host)
{
LOG_TRACE(log, "Waiting for host {}", unready_host);
previous_unready_host = unready_host;
}
LOG_INFO(log, "Waiting on ZooKeeper watch for any node to be changed (currently waiting for host {})", unready_host);
/// Wait until `watch_callback` is called by ZooKeeper meaning that zk nodes have changed.
{
@ -195,6 +267,7 @@ Strings BackupCoordinationStageSync::waitImpl(const Strings & all_hosts, const S
unready_host_state.started ? "" : ": Operation didn't start");
}
LOG_TRACE(log, "Everything is Ok. All hosts achieved stage {}", stage_to_wait);
return state.results;
}
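The stage synchronization relies on a node-naming convention visible in the code above: `started|<host>`, `alive|<host>` and `current|<host>|<stage>`. The sketch below shows how the set of unready hosts can be derived from a child listing under that convention. `UnreadyHostSketch` and `findUnreadyHosts` are hypothetical names; the retry-based re-check of the alive node is deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

struct UnreadyHostSketch
{
    bool started = false;
    bool alive = false;
};

// Returns the hosts that have not yet published "current|<host>|<stage>",
// together with whether their "started|<host>" and "alive|<host>" nodes are present.
inline std::map<std::string, UnreadyHostSketch> findUnreadyHosts(
    const std::vector<std::string> & zk_nodes,
    const std::vector<std::string> & all_hosts,
    const std::string & stage_to_wait)
{
    const std::unordered_set<std::string> node_set(zk_nodes.begin(), zk_nodes.end());
    std::map<std::string, UnreadyHostSketch> unready;
    for (const auto & host : all_hosts)
    {
        if (node_set.count("current|" + host + "|" + stage_to_wait))
            continue; // this host already reached the stage
        UnreadyHostSketch state;
        state.started = node_set.count("started|" + host) != 0;
        state.alive = node_set.count("alive|" + host) != 0;
        unready.emplace(host, state);
    }
    return unready;
}
```

A host that is started but not alive corresponds to the `host_terminated` case in `readCurrentState()`.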


@ -1,7 +1,6 @@
#pragma once
#include <Common/ZooKeeper/Common.h>
#include <Backups/WithRetries.h>
namespace DB
{
@ -10,7 +9,10 @@ namespace DB
class BackupCoordinationStageSync
{
public:
BackupCoordinationStageSync(const String & zookeeper_path_, zkutil::GetZooKeeper get_zookeeper_, Poco::Logger * log_);
BackupCoordinationStageSync(
const String & root_zookeeper_path_,
WithRetries & with_retries_,
Poco::Logger * log_);
/// Sets the stage of the current host and signals other hosts if there were other hosts waiting for that.
void set(const String & current_host, const String & new_stage, const String & message);
@ -27,12 +29,13 @@ private:
void createRootNodes();
struct State;
State readCurrentState(zkutil::ZooKeeperPtr zookeeper, const Strings & zk_nodes, const Strings & all_hosts, const String & stage_to_wait) const;
State readCurrentState(const Strings & zk_nodes, const Strings & all_hosts, const String & stage_to_wait) const;
Strings waitImpl(const Strings & all_hosts, const String & stage_to_wait, std::optional<std::chrono::milliseconds> timeout) const;
String zookeeper_path;
zkutil::GetZooKeeper get_zookeeper;
/// A reference to a field of the parent object (BackupCoordinationRemote or RestoreCoordinationRemote).
WithRetries & with_retries;
Poco::Logger * log;
};


@ -84,6 +84,12 @@ BackupEntriesCollector::BackupEntriesCollector(
, on_cluster_first_sync_timeout(context->getConfigRef().getUInt64("backups.on_cluster_first_sync_timeout", 180000))
, consistent_metadata_snapshot_timeout(context->getConfigRef().getUInt64("backups.consistent_metadata_snapshot_timeout", 600000))
, log(&Poco::Logger::get("BackupEntriesCollector"))
, global_zookeeper_retries_info(
"BackupEntriesCollector",
log,
context->getSettingsRef().backup_restore_keeper_max_retries,
context->getSettingsRef().backup_restore_keeper_retry_initial_backoff_ms,
context->getSettingsRef().backup_restore_keeper_retry_max_backoff_ms)
{
}
@ -482,7 +488,10 @@ std::vector<std::pair<ASTPtr, StoragePtr>> BackupEntriesCollector::findTablesInD
try
{
db_tables = database->getTablesForBackup(filter_by_table_name, context);
/// The database or table may be replicated and therefore may use ZooKeeper, so we need to retry.
auto zookeeper_retries_info = global_zookeeper_retries_info;
ZooKeeperRetriesControl retries_ctl("getTablesForBackup", zookeeper_retries_info);
retries_ctl.retryLoop([&](){ db_tables = database->getTablesForBackup(filter_by_table_name, context); });
}
catch (Exception & e)
{
@ -745,6 +754,7 @@ void BackupEntriesCollector::addPostTask(std::function<void()> task)
/// Runs all the tasks added with addPostTask().
void BackupEntriesCollector::runPostTasks()
{
LOG_TRACE(log, "Will run {} post tasks", post_tasks.size());
/// Post collecting tasks can add other post collecting tasks, our code is fine with that.
while (!post_tasks.empty())
{
@ -752,6 +762,7 @@ void BackupEntriesCollector::runPostTasks()
post_tasks.pop();
std::move(task)();
}
LOG_TRACE(log, "All post tasks successfully executed");
}
size_t BackupEntriesCollector::getAccessCounter(AccessEntityType type)


@ -6,6 +6,7 @@
#include <Parsers/ASTBackupQuery.h>
#include <Storages/IStorage_fwd.h>
#include <Storages/TableLockHolder.h>
#include <Storages/MergeTree/ZooKeeperRetries.h>
#include <filesystem>
#include <queue>
@ -96,6 +97,9 @@ private:
std::chrono::milliseconds on_cluster_first_sync_timeout;
std::chrono::milliseconds consistent_metadata_snapshot_timeout;
Poco::Logger * log;
/// Collecting information for a backup may involve ZooKeeper requests,
/// so we need to retry them.
ZooKeeperRetriesInfo global_zookeeper_retries_info;
Strings all_hosts;
DDLRenamingMap renaming_map;


@ -69,7 +69,7 @@ namespace
S3::CredentialsConfiguration
{
settings.auth_settings.use_environment_credentials.value_or(
context->getConfigRef().getBool("s3.use_environment_credentials", false)),
context->getConfigRef().getBool("s3.use_environment_credentials", true)),
settings.auth_settings.use_insecure_imds_request.value_or(
context->getConfigRef().getBool("s3.use_insecure_imds_request", false)),
settings.auth_settings.expiration_window_seconds.value_or(


@ -58,10 +58,13 @@ namespace
BackupCoordinationRemote::BackupKeeperSettings keeper_settings
{
.keeper_max_retries = context->getSettingsRef().backup_keeper_max_retries,
.keeper_retry_initial_backoff_ms = context->getSettingsRef().backup_keeper_retry_initial_backoff_ms,
.keeper_retry_max_backoff_ms = context->getSettingsRef().backup_keeper_retry_max_backoff_ms,
.keeper_value_max_size = context->getSettingsRef().backup_keeper_value_max_size,
.keeper_max_retries = context->getSettingsRef().backup_restore_keeper_max_retries,
.keeper_retry_initial_backoff_ms = context->getSettingsRef().backup_restore_keeper_retry_initial_backoff_ms,
.keeper_retry_max_backoff_ms = context->getSettingsRef().backup_restore_keeper_retry_max_backoff_ms,
.batch_size_for_keeper_multiread = context->getSettingsRef().backup_restore_batch_size_for_keeper_multiread,
.keeper_fault_injection_probability = context->getSettingsRef().backup_restore_keeper_fault_injection_probability,
.keeper_fault_injection_seed = context->getSettingsRef().backup_restore_keeper_fault_injection_seed,
.keeper_value_max_size = context->getSettingsRef().backup_restore_keeper_value_max_size,
};
auto all_hosts = BackupSettings::Util::filterHostIDs(
@ -92,10 +95,27 @@ namespace
auto get_zookeeper = [global_context = context->getGlobalContext()] { return global_context->getZooKeeper(); };
RestoreCoordinationRemote::RestoreKeeperSettings keeper_settings
{
.keeper_max_retries = context->getSettingsRef().backup_restore_keeper_max_retries,
.keeper_retry_initial_backoff_ms = context->getSettingsRef().backup_restore_keeper_retry_initial_backoff_ms,
.keeper_retry_max_backoff_ms = context->getSettingsRef().backup_restore_keeper_retry_max_backoff_ms,
.batch_size_for_keeper_multiread = context->getSettingsRef().backup_restore_batch_size_for_keeper_multiread,
.keeper_fault_injection_probability = context->getSettingsRef().backup_restore_keeper_fault_injection_probability,
.keeper_fault_injection_seed = context->getSettingsRef().backup_restore_keeper_fault_injection_seed
};
auto all_hosts = BackupSettings::Util::filterHostIDs(
restore_settings.cluster_host_ids, restore_settings.shard_num, restore_settings.replica_num);
return std::make_shared<RestoreCoordinationRemote>(get_zookeeper, root_zk_path, toString(*restore_settings.restore_uuid), all_hosts, restore_settings.host_id, restore_settings.internal);
return std::make_shared<RestoreCoordinationRemote>(
get_zookeeper,
root_zk_path,
keeper_settings,
toString(*restore_settings.restore_uuid),
all_hosts,
restore_settings.host_id,
restore_settings.internal);
}
else
{
@ -660,7 +680,9 @@ void BackupsWorker::doRestore(
restore_coordination = makeRestoreCoordination(context, restore_settings, /* remote= */ on_cluster);
if (!allow_concurrent_restores && restore_coordination->hasConcurrentRestores(std::ref(num_active_restores)))
throw Exception(ErrorCodes::CONCURRENT_ACCESS_NOT_SUPPORTED, "Concurrent restores not supported, turn on setting 'allow_concurrent_restores'");
throw Exception(
ErrorCodes::CONCURRENT_ACCESS_NOT_SUPPORTED,
"Concurrent restores not supported, turn on setting 'allow_concurrent_restores'");
/// Do RESTORE.
if (on_cluster)


@ -1,10 +1,14 @@
#include <Backups/RestoreCoordinationLocal.h>
#include <Common/logger_useful.h>
namespace DB
{
RestoreCoordinationLocal::RestoreCoordinationLocal() = default;
RestoreCoordinationLocal::RestoreCoordinationLocal() : log(&Poco::Logger::get("RestoreCoordinationLocal"))
{
}
RestoreCoordinationLocal::~RestoreCoordinationLocal() = default;
void RestoreCoordinationLocal::setStage(const String &, const String &)
@ -49,7 +53,12 @@ bool RestoreCoordinationLocal::acquireReplicatedSQLObjects(const String &, UserD
bool RestoreCoordinationLocal::hasConcurrentRestores(const std::atomic<size_t> & num_active_restores) const
{
return (num_active_restores > 1);
if (num_active_restores > 1)
{
LOG_WARNING(log, "Found concurrent restores: num_active_restores={}", num_active_restores);
return true;
}
return false;
}
}


@ -42,6 +42,8 @@ public:
bool hasConcurrentRestores(const std::atomic<size_t> & num_active_restores) const override;
private:
Poco::Logger * const log;
std::set<std::pair<String /* database_zk_path */, String /* table_name */>> acquired_tables_in_replicated_databases;
std::unordered_set<String /* table_zk_path */> acquired_data_in_replicated_tables;
mutable std::mutex mutex;


@ -1,9 +1,10 @@
#include <Backups/BackupCoordinationRemote.h>
#include <Backups/BackupCoordinationStage.h>
#include <Backups/RestoreCoordinationRemote.h>
#include <Functions/UserDefined/UserDefinedSQLObjectType.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <Common/escapeForFileName.h>
#include <Backups/BackupCoordinationStage.h>
#include <Backups/BackupCoordinationRemote.h>
#include <Backups/BackupCoordinationStageSync.h>
namespace DB
{
@ -13,23 +14,47 @@ namespace Stage = BackupCoordinationStage;
RestoreCoordinationRemote::RestoreCoordinationRemote(
zkutil::GetZooKeeper get_zookeeper_,
const String & root_zookeeper_path_,
const RestoreKeeperSettings & keeper_settings_,
const String & restore_uuid_,
const Strings & all_hosts_,
const String & current_host_,
bool is_internal_)
: get_zookeeper(get_zookeeper_)
, root_zookeeper_path(root_zookeeper_path_)
, keeper_settings(keeper_settings_)
, restore_uuid(restore_uuid_)
, zookeeper_path(root_zookeeper_path_ + "/restore-" + restore_uuid_)
, all_hosts(all_hosts_)
, current_host(current_host_)
, current_host_index(BackupCoordinationRemote::findCurrentHostIndex(all_hosts, current_host))
, is_internal(is_internal_)
, log(&Poco::Logger::get("RestoreCoordinationRemote"))
, with_retries(
log,
get_zookeeper_,
keeper_settings,
[zookeeper_path = zookeeper_path, current_host = current_host, is_internal = is_internal]
(WithRetries::FaultyKeeper & zk)
{
/// Recreate this ephemeral node to signal that we are alive.
if (is_internal)
{
String alive_node_path = zookeeper_path + "/stage/alive|" + current_host;
auto code = zk->tryCreate(alive_node_path, "", zkutil::CreateMode::Ephemeral);
if (code == Coordination::Error::ZNODEEXISTS)
zk->handleEphemeralNodeExistenceNoFailureInjection(alive_node_path, "");
else if (code != Coordination::Error::ZOK)
throw zkutil::KeeperException(code, alive_node_path);
}
})
{
createRootNodes();
stage_sync.emplace(
zookeeper_path + "/stage", [this] { return getZooKeeper(); }, &Poco::Logger::get("RestoreCoordination"));
zookeeper_path,
with_retries,
log);
}
RestoreCoordinationRemote::~RestoreCoordinationRemote()
@ -45,31 +70,25 @@ RestoreCoordinationRemote::~RestoreCoordinationRemote()
}
}
zkutil::ZooKeeperPtr RestoreCoordinationRemote::getZooKeeper() const
{
std::lock_guard lock{mutex};
if (!zookeeper || zookeeper->expired())
{
zookeeper = get_zookeeper();
/// It's possible that we connected to different [Zoo]Keeper instance
/// so we may read a bit stale state.
zookeeper->sync(zookeeper_path);
}
return zookeeper;
}
void RestoreCoordinationRemote::createRootNodes()
{
auto zk = getZooKeeper();
zk->createAncestors(zookeeper_path);
zk->createIfNotExists(zookeeper_path, "");
zk->createIfNotExists(zookeeper_path + "/repl_databases_tables_acquired", "");
zk->createIfNotExists(zookeeper_path + "/repl_tables_data_acquired", "");
zk->createIfNotExists(zookeeper_path + "/repl_access_storages_acquired", "");
zk->createIfNotExists(zookeeper_path + "/repl_sql_objects_acquired", "");
}
auto holder = with_retries.createRetriesControlHolder("createRootNodes");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
zk->createAncestors(zookeeper_path);
Coordination::Requests ops;
Coordination::Responses responses;
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path, "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_databases_tables_acquired", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_tables_data_acquired", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_access_storages_acquired", "", zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/repl_sql_objects_acquired", "", zkutil::CreateMode::Persistent));
zk->tryMulti(ops, responses);
});
}
void RestoreCoordinationRemote::setStage(const String & new_stage, const String & message)
{
@ -91,66 +110,121 @@ Strings RestoreCoordinationRemote::waitForStage(const String & stage_to_wait, st
return stage_sync->waitFor(all_hosts, stage_to_wait, timeout);
}
bool RestoreCoordinationRemote::acquireCreatingTableInReplicatedDatabase(const String & database_zk_path, const String & table_name)
{
auto zk = getZooKeeper();
bool result = false;
auto holder = with_retries.createRetriesControlHolder("acquireCreatingTableInReplicatedDatabase");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_databases_tables_acquired/" + escapeForFileName(database_zk_path);
zk->createIfNotExists(path, "");
String path = zookeeper_path + "/repl_databases_tables_acquired/" + escapeForFileName(database_zk_path);
zk->createIfNotExists(path, "");
path += "/" + escapeForFileName(table_name);
auto code = zk->tryCreate(path, "", zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
path += "/" + escapeForFileName(table_name);
auto code = zk->tryCreate(path, toString(current_host_index), zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
return (code == Coordination::Error::ZOK);
if (code == Coordination::Error::ZOK)
{
result = true;
return;
}
/// We need to check who created that node
result = zk->get(path) == toString(current_host_index);
});
return result;
}
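The rewritten acquire function stores `current_host_index` in the node and, on `ZNODEEXISTS`, reads the node back to see who created it. That way a retried call by the same host still reports success. A minimal sketch of that logic follows, with a hypothetical in-memory `AcquireRegistrySketch` standing in for the `*_acquired` nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the "*_acquired" nodes.
class AcquireRegistrySketch
{
public:
    // Returns true if `host_index` owns `path`: either it created the node now,
    // or an earlier (possibly retried) call by the same host created it.
    bool tryAcquire(const std::string & path, std::size_t host_index)
    {
        const std::string owner = std::to_string(host_index);
        auto inserted = nodes.emplace(path, owner);
        if (inserted.second)
            return true; // analogous to ZOK
        return inserted.first->second == owner; // analogous to ZNODEEXISTS followed by get()
    }

private:
    std::map<std::string, std::string> nodes;
};
```

With an empty node value, as in the old code, a host whose first attempt succeeded but whose response was lost would see `ZNODEEXISTS` on retry and wrongly conclude that another host had acquired the object.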
bool RestoreCoordinationRemote::acquireInsertingDataIntoReplicatedTable(const String & table_zk_path)
{
auto zk = getZooKeeper();
bool result = false;
auto holder = with_retries.createRetriesControlHolder("acquireInsertingDataIntoReplicatedTable");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_tables_data_acquired/" + escapeForFileName(table_zk_path);
auto code = zk->tryCreate(path, "", zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
String path = zookeeper_path + "/repl_tables_data_acquired/" + escapeForFileName(table_zk_path);
auto code = zk->tryCreate(path, toString(current_host_index), zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
return (code == Coordination::Error::ZOK);
if (code == Coordination::Error::ZOK)
{
result = true;
return;
}
/// We need to check who created that node
result = zk->get(path) == toString(current_host_index);
});
return result;
}
bool RestoreCoordinationRemote::acquireReplicatedAccessStorage(const String & access_storage_zk_path)
{
auto zk = getZooKeeper();
bool result = false;
auto holder = with_retries.createRetriesControlHolder("acquireReplicatedAccessStorage");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_access_storages_acquired/" + escapeForFileName(access_storage_zk_path);
auto code = zk->tryCreate(path, "", zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
String path = zookeeper_path + "/repl_access_storages_acquired/" + escapeForFileName(access_storage_zk_path);
auto code = zk->tryCreate(path, toString(current_host_index), zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
return (code == Coordination::Error::ZOK);
if (code == Coordination::Error::ZOK)
{
result = true;
return;
}
/// We need to check who created that node
result = zk->get(path) == toString(current_host_index);
});
return result;
}
bool RestoreCoordinationRemote::acquireReplicatedSQLObjects(const String & loader_zk_path, UserDefinedSQLObjectType object_type)
{
auto zk = getZooKeeper();
bool result = false;
auto holder = with_retries.createRetriesControlHolder("acquireReplicatedSQLObjects");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
String path = zookeeper_path + "/repl_sql_objects_acquired/" + escapeForFileName(loader_zk_path);
zk->createIfNotExists(path, "");
String path = zookeeper_path + "/repl_sql_objects_acquired/" + escapeForFileName(loader_zk_path);
zk->createIfNotExists(path, "");
path += "/";
switch (object_type)
{
case UserDefinedSQLObjectType::Function:
path += "functions";
break;
}
path += "/";
switch (object_type)
{
case UserDefinedSQLObjectType::Function:
path += "functions";
break;
}
auto code = zk->tryCreate(path, "", zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
auto code = zk->tryCreate(path, toString(current_host_index), zkutil::CreateMode::Persistent);
if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
throw zkutil::KeeperException(code, path);
return (code == Coordination::Error::ZOK);
if (code == Coordination::Error::ZOK)
{
result = true;
return;
}
/// We need to check who created that node
result = zk->get(path) == toString(current_host_index);
});
return result;
}
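The four `acquire*` functions above share one pattern: try to create a node whose value is the index of the current host, and if the node already exists, read it back to see who created it. This makes the call safe to repeat inside a retry loop, because a node left by an earlier attempt of the same host still counts as acquired. A minimal sketch of that pattern follows; `FakeKeeper` and `acquire` are hypothetical names backed by an in-memory map, not the real `zkutil` API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

/// Hypothetical in-memory stand-in for a [Zoo]Keeper client; not the real zkutil API.
struct FakeKeeper
{
    enum class Code { Ok, NodeExists };

    std::map<std::string, std::string> nodes;

    /// Creates the node only if it is absent, like a non-throwing create.
    Code tryCreate(const std::string & path, const std::string & data)
    {
        bool inserted = nodes.emplace(path, data).second;
        return inserted ? Code::Ok : Code::NodeExists;
    }

    std::string get(const std::string & path) const { return nodes.at(path); }
};

/// Returns true if `host_index` owns the node at `path`.
/// Safe to call again after a retry: a node created by an earlier
/// attempt of the same host still counts as acquired.
bool acquire(FakeKeeper & keeper, const std::string & path, std::size_t host_index)
{
    const std::string owner = std::to_string(host_index);
    if (keeper.tryCreate(path, owner) == FakeKeeper::Code::Ok)
        return true;
    /// The node already exists, so check who created it.
    return keeper.get(path) == owner;
}
```

Calling `acquire` twice for the same host returns `true` both times, while a different host gets `false`. Storing an empty value instead of the host index would make the second call indistinguishable from a lost race.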
void RestoreCoordinationRemote::removeAllNodes()
@ -162,8 +236,13 @@ void RestoreCoordinationRemote::removeAllNodes()
/// at `zookeeper_path` which might cause such hosts to stop with exception "ZNONODE". Or such hosts might still do some part
/// of their restore work before that.
auto zk = getZooKeeper();
zk->removeRecursive(zookeeper_path);
auto holder = with_retries.createRetriesControlHolder("removeAllNodes");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
with_retries.renewZooKeeper(zk);
zk->removeRecursive(zookeeper_path);
});
}
bool RestoreCoordinationRemote::hasConcurrentRestores(const std::atomic<size_t> &) const
@ -172,45 +251,57 @@ bool RestoreCoordinationRemote::hasConcurrentRestores(const std::atomic<size_t>
if (is_internal)
return false;
auto zk = getZooKeeper();
bool result = false;
std::string path = zookeeper_path +"/stage";
if (! zk->exists(root_zookeeper_path))
zk->createAncestors(root_zookeeper_path);
for (size_t attempt = 0; attempt < MAX_ZOOKEEPER_ATTEMPTS; ++attempt)
{
Coordination::Stat stat;
zk->get(root_zookeeper_path, &stat);
Strings existing_restore_paths = zk->getChildren(root_zookeeper_path);
for (const auto & existing_restore_path : existing_restore_paths)
auto holder = with_retries.createRetriesControlHolder("createRootNodes");
holder.retries_ctl.retryLoop(
[&, &zk = holder.faulty_zookeeper]()
{
if (startsWith(existing_restore_path, "backup-"))
continue;
with_retries.renewZooKeeper(zk);
String existing_restore_uuid = existing_restore_path;
existing_restore_uuid.erase(0, String("restore-").size());
if (! zk->exists(root_zookeeper_path))
zk->createAncestors(root_zookeeper_path);
if (existing_restore_uuid == toString(restore_uuid))
continue;
String status;
if (zk->tryGet(root_zookeeper_path + "/" + existing_restore_path + "/stage", status))
for (size_t attempt = 0; attempt < MAX_ZOOKEEPER_ATTEMPTS; ++attempt)
{
if (status != Stage::COMPLETED)
return true;
Coordination::Stat stat;
zk->get(root_zookeeper_path, &stat);
Strings existing_restore_paths = zk->getChildren(root_zookeeper_path);
for (const auto & existing_restore_path : existing_restore_paths)
{
if (startsWith(existing_restore_path, "backup-"))
continue;
String existing_restore_uuid = existing_restore_path;
existing_restore_uuid.erase(0, String("restore-").size());
if (existing_restore_uuid == toString(restore_uuid))
continue;
String status;
if (zk->tryGet(root_zookeeper_path + "/" + existing_restore_path + "/stage", status))
{
if (status != Stage::COMPLETED)
{
LOG_WARNING(log, "Found a concurrent restore: {}, current restore: {}", existing_restore_uuid, toString(restore_uuid));
result = true;
return;
}
}
}
zk->createIfNotExists(path, "");
auto code = zk->trySet(path, Stage::SCHEDULED_TO_START, stat.version);
if (code == Coordination::Error::ZOK)
break;
bool is_last_attempt = (attempt == MAX_ZOOKEEPER_ATTEMPTS - 1);
if ((code != Coordination::Error::ZBADVERSION) || is_last_attempt)
throw zkutil::KeeperException(code, path);
}
}
});
zk->createIfNotExists(path, "");
auto code = zk->trySet(path, Stage::SCHEDULED_TO_START, stat.version);
if (code == Coordination::Error::ZOK)
break;
bool is_last_attempt = (attempt == MAX_ZOOKEEPER_ATTEMPTS - 1);
if ((code != Coordination::Error::ZBADVERSION) || is_last_attempt)
throw zkutil::KeeperException(code, path);
}
return false;
return result;
}
}

View File

@ -2,6 +2,7 @@
#include <Backups/IRestoreCoordination.h>
#include <Backups/BackupCoordinationStageSync.h>
#include <Backups/WithRetries.h>
namespace DB
@ -11,9 +12,12 @@ namespace DB
class RestoreCoordinationRemote : public IRestoreCoordination
{
public:
using RestoreKeeperSettings = WithRetries::KeeperSettings;
RestoreCoordinationRemote(
zkutil::GetZooKeeper get_zookeeper_,
const String & root_zookeeper_path_,
const RestoreKeeperSettings & keeper_settings_,
const String & restore_uuid_,
const Strings & all_hosts_,
const String & current_host_,
@ -45,25 +49,26 @@ public:
bool hasConcurrentRestores(const std::atomic<size_t> & num_active_restores) const override;
private:
zkutil::ZooKeeperPtr getZooKeeper() const;
void createRootNodes();
void removeAllNodes();
class ReplicatedDatabasesMetadataSync;
/// get_zookeeper will provide a zookeeper client without any fault injection
const zkutil::GetZooKeeper get_zookeeper;
const String root_zookeeper_path;
const RestoreKeeperSettings keeper_settings;
const String restore_uuid;
const String zookeeper_path;
const Strings all_hosts;
const String current_host;
const size_t current_host_index;
const bool is_internal;
Poco::Logger * const log;
mutable WithRetries with_retries;
std::optional<BackupCoordinationStageSync> stage_sync;
mutable std::mutex mutex;
mutable zkutil::ZooKeeperPtr zookeeper;
};
}

View File

@ -0,0 +1,61 @@
#include <mutex>
#include <Backups/WithRetries.h>
namespace DB
{
WithRetries::WithRetries(Poco::Logger * log_, zkutil::GetZooKeeper get_zookeeper_, const KeeperSettings & settings_, RenewerCallback callback_)
: log(log_)
, get_zookeeper(get_zookeeper_)
, settings(settings_)
, callback(callback_)
, global_zookeeper_retries_info(
log->name(),
log,
settings.keeper_max_retries,
settings.keeper_retry_initial_backoff_ms,
settings.keeper_retry_max_backoff_ms)
{}
WithRetries::RetriesControlHolder::RetriesControlHolder(const WithRetries * parent, const String & name)
: info(parent->global_zookeeper_retries_info)
, retries_ctl(name, info)
, faulty_zookeeper(parent->getFaultyZooKeeper())
{}
WithRetries::RetriesControlHolder WithRetries::createRetriesControlHolder(const String & name)
{
return RetriesControlHolder(this, name);
}
void WithRetries::renewZooKeeper(FaultyKeeper my_faulty_zookeeper) const
{
std::lock_guard lock(zookeeper_mutex);
if (!zookeeper || zookeeper->expired())
{
zookeeper = get_zookeeper();
my_faulty_zookeeper->setKeeper(zookeeper);
callback(my_faulty_zookeeper);
}
}
WithRetries::FaultyKeeper WithRetries::getFaultyZooKeeper() const
{
/// We need to create a new instance of ZooKeeperWithFaultInjection each time and copy a pointer to the ZooKeeper client into it.
/// The reason is that ZooKeeperWithFaultInjection may reset the underlying pointer and there could be a race condition
/// when the same object is used from multiple threads.
auto faulty_zookeeper = ZooKeeperWithFaultInjection::createInstance(
settings.keeper_fault_injection_probability,
settings.keeper_fault_injection_seed,
zookeeper,
log->name(),
log);
return faulty_zookeeper;
}
}

79
src/Backups/WithRetries.h Normal file
View File

@ -0,0 +1,79 @@
#pragma once
#include <Storages/MergeTree/ZooKeeperRetries.h>
#include <Common/ZooKeeper/Common.h>
#include <Common/ZooKeeper/ZooKeeperWithFaultInjection.h>
namespace DB
{
/// In backups every request to [Zoo]Keeper should be retryable
/// and this tiny class encapsulates all the machinery to make it possible -
/// a [Zoo]Keeper client which injects faults with configurable probability
/// and a retries controller which performs retries with growing backoff.
class WithRetries
{
public:
using FaultyKeeper = Coordination::ZooKeeperWithFaultInjection::Ptr;
using RenewerCallback = std::function<void(FaultyKeeper &)>;
struct KeeperSettings
{
UInt64 keeper_max_retries{0};
UInt64 keeper_retry_initial_backoff_ms{0};
UInt64 keeper_retry_max_backoff_ms{0};
UInt64 batch_size_for_keeper_multiread{10000};
Float64 keeper_fault_injection_probability{0};
UInt64 keeper_fault_injection_seed{42};
UInt64 keeper_value_max_size{1048576};
};
/// For simplicity a separate ZooKeeperRetriesInfo and a faulty [Zoo]Keeper client
/// are stored in one place.
/// This helps to avoid writing too much boilerplate each time we need to
/// execute some operation (a set of requests) over [Zoo]Keeper with retries.
/// Why is ZooKeeperRetriesInfo separate for each operation?
/// The reason is that a backup usually takes a long time to finish and it makes no sense
/// to limit the overall number of retries (for example 1000) for the whole backup
/// and have a continuously growing backoff.
class RetriesControlHolder
{
public:
ZooKeeperRetriesInfo info;
ZooKeeperRetriesControl retries_ctl;
FaultyKeeper faulty_zookeeper;
private:
friend class WithRetries;
RetriesControlHolder(const WithRetries * parent, const String & name);
};
RetriesControlHolder createRetriesControlHolder(const String & name);
WithRetries(Poco::Logger * log, zkutil::GetZooKeeper get_zookeeper_, const KeeperSettings & settings, RenewerCallback callback);
/// Used to re-establish a connection inside a retry loop.
void renewZooKeeper(FaultyKeeper my_faulty_zookeeper) const;
private:
/// This will provide a special wrapper which is useful for testing
FaultyKeeper getFaultyZooKeeper() const;
Poco::Logger * log;
zkutil::GetZooKeeper get_zookeeper;
KeeperSettings settings;
/// This callback is called each time a new [Zoo]Keeper session is created.
/// In backups it is primarily used to re-create an ephemeral node to signal the coordinator
/// that the host is alive and able to continue writing the backup.
/// Coordinator (or an initiator) of the backup also retries when it doesn't find an ephemeral node
/// for a particular host.
/// Again, this scheme is not ideal. False positives are still possible, but in the worst case
/// they lead only to a failed backup which could possibly have succeeded
/// with a few more retries.
RenewerCallback callback;
ZooKeeperRetriesInfo global_zookeeper_retries_info;
/// This is needed only to protect the zookeeper object
mutable std::mutex zookeeper_mutex;
mutable zkutil::ZooKeeperPtr zookeeper;
};
}
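The comments in `WithRetries.h` describe a retries controller with growing backoff whose state is created afresh for every operation, so one slow operation does not inflate the delay of the next. The sketch below illustrates that idea only. `RetrySettings` and `retryLoop` are hypothetical names, the doubling policy is an assumption (the actual policy of `ZooKeeperRetriesControl` is not shown in this diff), and the waits are recorded rather than slept so the example stays deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

/// Hypothetical settings mirroring the three retry-related fields of KeeperSettings.
struct RetrySettings
{
    uint64_t max_retries = 3;
    uint64_t initial_backoff_ms = 100;
    uint64_t max_backoff_ms = 5000;
};

/// Runs `operation` until it succeeds or the retry budget is spent.
/// Instead of sleeping, the backoff chosen before each retry is appended
/// to `waits_ms`. The backoff state is local to this call, so every
/// operation starts again from the initial backoff.
void retryLoop(const RetrySettings & settings, const std::function<void()> & operation, std::vector<uint64_t> & waits_ms)
{
    uint64_t backoff_ms = settings.initial_backoff_ms;
    for (uint64_t attempt = 0;; ++attempt)
    {
        try
        {
            operation();
            return;
        }
        catch (const std::runtime_error &)
        {
            if (attempt == settings.max_retries)
                throw; /// Budget exhausted: let the caller see the last error.
            waits_ms.push_back(backoff_ms);
            /// Assumed policy: double the delay, capped at the maximum.
            backoff_ms = std::min(backoff_ms * 2, settings.max_backoff_ms);
        }
    }
}
```

With an initial backoff of 100 ms and a cap of 300 ms, an operation that fails three times before succeeding records waits of 100, 200 and 300 ms.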

View File

@ -24,7 +24,7 @@ std::shared_ptr<Memory<>> ColumnCompressed::compressBuffer(const void * data, si
Memory<> compressed(max_dest_size);
auto compressed_size = LZ4_compress_default(
int compressed_size = LZ4_compress_default(
reinterpret_cast<const char *>(data),
compressed.data(),
static_cast<int>(data_size),

View File

@ -1,11 +1,14 @@
#pragma once
#include "ZooKeeper.h"
#include <functional>
#include <Common/ZooKeeper/ZooKeeper.h>
#include <Common/ZooKeeper/ZooKeeperWithFaultInjection.h>
namespace zkutil
{
using GetZooKeeper = std::function<ZooKeeperPtr()>;
using GetZooKeeperWithFaultInjection = std::function<Coordination::ZooKeeperWithFaultInjection::Ptr()>;
}

View File

@ -146,4 +146,3 @@ private:
};
}

View File

@ -29,6 +29,8 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
extern const int NOT_IMPLEMENTED;
extern const int BAD_ARGUMENTS;
extern const int NO_ELEMENTS_IN_CONFIG;
extern const int EXCESSIVE_ELEMENT_IN_CONFIG;
}
}
@ -1340,4 +1342,29 @@ String getSequentialNodeName(const String & prefix, UInt64 number)
return name;
}
void validateZooKeeperConfig(const Poco::Util::AbstractConfiguration & config)
{
if (config.has("zookeeper") && config.has("keeper"))
throw DB::Exception(DB::ErrorCodes::EXCESSIVE_ELEMENT_IN_CONFIG, "Both ZooKeeper and Keeper are specified");
}
bool hasZooKeeperConfig(const Poco::Util::AbstractConfiguration & config)
{
return config.has("zookeeper") || config.has("keeper") || (config.has("keeper_server") && config.getBool("keeper_server.use_cluster", true));
}
String getZooKeeperConfigName(const Poco::Util::AbstractConfiguration & config)
{
if (config.has("zookeeper"))
return "zookeeper";
if (config.has("keeper"))
return "keeper";
if (config.has("keeper_server") && config.getBool("keeper_server.use_cluster", true))
return "keeper_server";
throw DB::Exception(DB::ErrorCodes::NO_ELEMENTS_IN_CONFIG, "There is no Zookeeper configuration in server config");
}
}
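The three helpers above define a precedence among config sections: `zookeeper` and `keeper` are mutually exclusive, and `keeper_server` is used only when neither is present. A small sketch of that lookup order follows. `Sections`, `validate` and `configName` are hypothetical names, the config is reduced to a set of top-level section names, and `keeper_server.use_cluster` is assumed to keep its default of `true`.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

/// Hypothetical stand-in for the server config: the set of top-level sections present.
/// For brevity it assumes keeper_server.use_cluster keeps its default of true.
using Sections = std::set<std::string>;

/// Rejects a config that names both client sections at once.
void validate(const Sections & config)
{
    if (config.count("zookeeper") && config.count("keeper"))
        throw std::invalid_argument("Both ZooKeeper and Keeper are specified");
}

/// Returns the first section found, in the same order as the helper above.
std::string configName(const Sections & config)
{
    for (const char * name : {"zookeeper", "keeper", "keeper_server"})
        if (config.count(name))
            return name;
    throw std::invalid_argument("There is no Zookeeper configuration in server config");
}
```

A config holding both `keeper` and `keeper_server` resolves to `keeper`, and one holding only `keeper_server` resolves to `keeper_server`.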

View File

@ -669,4 +669,10 @@ String extractZooKeeperPath(const String & path, bool check_starts_with_slash, P
String getSequentialNodeName(const String & prefix, UInt64 number);
void validateZooKeeperConfig(const Poco::Util::AbstractConfiguration & config);
bool hasZooKeeperConfig(const Poco::Util::AbstractConfiguration & config);
String getZooKeeperConfigName(const Poco::Util::AbstractConfiguration & config);
}

View File

@ -18,6 +18,116 @@ namespace zkutil
{
ZooKeeperArgs::ZooKeeperArgs(const Poco::Util::AbstractConfiguration & config, const String & config_name)
{
if (config_name == "keeper_server")
initFromKeeperServerSection(config);
else
initFromKeeperSection(config, config_name);
if (!chroot.empty())
{
if (chroot.front() != '/')
throw KeeperException(
Coordination::Error::ZBADARGUMENTS,
"Root path in config file should start with '/', but got {}", chroot);
if (chroot.back() == '/')
chroot.pop_back();
}
if (session_timeout_ms < 0 || operation_timeout_ms < 0 || connection_timeout_ms < 0)
throw KeeperException("Timeout cannot be negative", Coordination::Error::ZBADARGUMENTS);
/// init get_priority_load_balancing
get_priority_load_balancing.hostname_differences.resize(hosts.size());
const String & local_hostname = getFQDNOrHostName();
for (size_t i = 0; i < hosts.size(); ++i)
{
const String & node_host = hosts[i].substr(0, hosts[i].find_last_of(':'));
get_priority_load_balancing.hostname_differences[i] = DB::getHostNameDifference(local_hostname, node_host);
}
}
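The chroot check in the constructor above can be read as a small normalization step: an empty path is accepted as is, a path without a leading slash is rejected, and one trailing slash is dropped. The sketch below restates it as a free function. `normalizeChroot` is a hypothetical name, and `std::invalid_argument` stands in for `KeeperException`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

/// Hypothetical free-function version of the chroot check; returns the normalized path.
std::string normalizeChroot(std::string chroot)
{
    if (chroot.empty())
        return chroot;
    if (chroot.front() != '/')
        throw std::invalid_argument("Root path in config file should start with '/', but got " + chroot);
    if (chroot.back() == '/')
        chroot.pop_back(); /// Drops one trailing slash; "/" alone becomes empty.
    return chroot;
}
```

Note that a chroot of `/` normalizes to the empty string, which is the same as having no chroot at all.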
ZooKeeperArgs::ZooKeeperArgs(const String & hosts_string)
{
splitInto<','>(hosts, hosts_string);
}
void ZooKeeperArgs::initFromKeeperServerSection(const Poco::Util::AbstractConfiguration & config)
{
static constexpr std::string_view config_name = "keeper_server";
if (auto key = std::string{config_name} + ".tcp_port_secure";
config.has(key))
{
auto tcp_port_secure = config.getString(key);
if (tcp_port_secure.empty())
throw KeeperException("Empty tcp_port_secure in config file", Coordination::Error::ZBADARGUMENTS);
}
bool secure{false};
std::string tcp_port;
if (auto tcp_port_secure_key = std::string{config_name} + ".tcp_port_secure";
config.has(tcp_port_secure_key))
{
secure = true;
tcp_port = config.getString(tcp_port_secure_key);
}
else if (auto tcp_port_key = std::string{config_name} + ".tcp_port";
config.has(tcp_port_key))
{
tcp_port = config.getString(tcp_port_key);
}
if (tcp_port.empty())
throw KeeperException("No tcp_port or tcp_port_secure in config file", Coordination::Error::ZBADARGUMENTS);
if (auto coordination_key = std::string{config_name} + ".coordination_settings";
config.has(coordination_key))
{
if (auto operation_timeout_key = coordination_key + ".operation_timeout_ms";
config.has(operation_timeout_key))
operation_timeout_ms = config.getInt(operation_timeout_key);
if (auto session_timeout_key = coordination_key + ".session_timeout_ms";
config.has(session_timeout_key))
session_timeout_ms = config.getInt(session_timeout_key);
}
Poco::Util::AbstractConfiguration::Keys keys;
std::string raft_configuration_key = std::string{config_name} + ".raft_configuration";
config.keys(raft_configuration_key, keys);
for (const auto & key : keys)
{
if (startsWith(key, "server"))
hosts.push_back(
(secure ? "secure://" : "") + config.getString(raft_configuration_key + "." + key + ".hostname") + ":" + tcp_port);
}
static constexpr std::array load_balancing_keys
{
".zookeeper_load_balancing",
".keeper_load_balancing"
};
for (const auto * load_balancing_key : load_balancing_keys)
{
if (auto load_balancing_config = std::string{config_name} + load_balancing_key;
config.has(load_balancing_config))
{
String load_balancing_str = config.getString(load_balancing_config);
/// Use magic_enum to avoid a dependency on dbms (`SettingFieldLoadBalancingTraits::fromString(...)`)
auto load_balancing = magic_enum::enum_cast<DB::LoadBalancing>(Poco::toUpper(load_balancing_str));
if (!load_balancing)
throw DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS, "Unknown load balancing: {}", load_balancing_str);
get_priority_load_balancing.load_balancing = *load_balancing;
break;
}
}
}
void ZooKeeperArgs::initFromKeeperSection(const Poco::Util::AbstractConfiguration & config, const std::string & config_name)
{
Poco::Util::AbstractConfiguration::Keys keys;
config.keys(config_name, keys);
@ -84,7 +194,7 @@ ZooKeeperArgs::ZooKeeperArgs(const Poco::Util::AbstractConfiguration & config, c
{
implementation = config.getString(config_name + "." + key);
}
else if (key == "zookeeper_load_balancing")
else if (key == "zookeeper_load_balancing" || key == "keeper_load_balancing")
{
String load_balancing_str = config.getString(config_name + "." + key);
/// Use magic_enum to avoid a dependency on dbms (`SettingFieldLoadBalancingTraits::fromString(...)`)
@ -96,33 +206,6 @@ ZooKeeperArgs::ZooKeeperArgs(const Poco::Util::AbstractConfiguration & config, c
else
throw KeeperException(std::string("Unknown key ") + key + " in config file", Coordination::Error::ZBADARGUMENTS);
}
if (!chroot.empty())
{
if (chroot.front() != '/')
throw KeeperException(
Coordination::Error::ZBADARGUMENTS,
"Root path in config file should start with '/', but got {}", chroot);
if (chroot.back() == '/')
chroot.pop_back();
}
if (session_timeout_ms < 0 || operation_timeout_ms < 0 || connection_timeout_ms < 0)
throw KeeperException("Timeout cannot be negative", Coordination::Error::ZBADARGUMENTS);
/// init get_priority_load_balancing
get_priority_load_balancing.hostname_differences.resize(hosts.size());
const String & local_hostname = getFQDNOrHostName();
for (size_t i = 0; i < hosts.size(); ++i)
{
const String & node_host = hosts[i].substr(0, hosts[i].find_last_of(':'));
get_priority_load_balancing.hostname_differences[i] = DB::getHostNameDifference(local_hostname, node_host);
}
}
ZooKeeperArgs::ZooKeeperArgs(const String & hosts_string)
{
splitInto<','>(hosts, hosts_string);
}
}

View File

@ -37,6 +37,10 @@ struct ZooKeeperArgs
UInt64 recv_sleep_ms = 0;
DB::GetPriorityForLoadBalancing get_priority_load_balancing;
private:
void initFromKeeperServerSection(const Poco::Util::AbstractConfiguration & config);
void initFromKeeperSection(const Poco::Util::AbstractConfiguration & config, const std::string & config_name);
};
}

View File

@ -114,6 +114,7 @@ public:
void setKeeper(zk::Ptr const & keeper_) { keeper = keeper_; }
bool isNull() const { return keeper.get() == nullptr; }
bool expired() { return keeper->expired(); }
///
/// mirror ZooKeeper interface
@ -232,6 +233,11 @@ public:
return access("exists", path, [&]() { return keeper->exists(path, stat, watch); });
}
bool existsNoFailureInjection(const std::string & path, Coordination::Stat * stat = nullptr, const zkutil::EventPtr & watch = nullptr)
{
return access<false, false, false>("exists", path, [&]() { return keeper->exists(path, stat, watch); });
}
zkutil::ZooKeeper::MultiExistsResponse exists(const std::vector<std::string> & paths)
{
return access("exists", !paths.empty() ? paths.front() : "", [&]() { return keeper->exists(paths); });
@ -239,19 +245,30 @@ public:
std::string create(const std::string & path, const std::string & data, int32_t mode)
{
auto path_created = access(
"create",
std::string path_created;
auto code = tryCreate(path, data, mode, path_created);
if (code != Coordination::Error::ZOK)
throw zkutil::KeeperException(code, path);
return path_created;
}
Coordination::Error tryCreate(const std::string & path, const std::string & data, int32_t mode, std::string & path_created)
{
auto error = access(
"tryCreate",
path,
[&]() { return keeper->create(path, data, mode); },
[&](std::string const & result_path)
[&]() { return keeper->tryCreate(path, data, mode, path_created); },
[&](Coordination::Error &)
{
try
{
if (mode == zkutil::CreateMode::EphemeralSequential || mode == zkutil::CreateMode::Ephemeral)
{
keeper->remove(result_path);
keeper->remove(path);
if (unlikely(logger))
LOG_TRACE(logger, "ZooKeeperWithFaultInjection cleanup: seed={} func={} path={}", seed, "create", result_path);
LOG_TRACE(logger, "ZooKeeperWithFaultInjection cleanup: seed={} func={} path={}", seed, "tryCreate", path);
}
}
catch (const zkutil::KeeperException & e)
@ -261,8 +278,8 @@ public:
logger,
"ZooKeeperWithFaultInjection cleanup FAILED: seed={} func={} path={} code={} message={} ",
seed,
"create",
result_path,
"tryCreate",
path,
e.code,
e.message());
}
@ -272,10 +289,27 @@ public:
if (unlikely(fault_policy))
{
if (mode == zkutil::CreateMode::EphemeralSequential || mode == zkutil::CreateMode::Ephemeral)
ephemeral_nodes.push_back(path_created);
ephemeral_nodes.push_back(path);
}
return path_created;
return error;
}
Coordination::Error tryCreate(const std::string & path, const std::string & data, int32_t mode)
{
String path_created;
return tryCreate(path, data, mode, path_created);
}
void createIfNotExists(const std::string & path, const std::string & data)
{
std::string path_created;
auto code = tryCreate(path, data, zkutil::CreateMode::Persistent, path_created);
if (code == Coordination::Error::ZOK || code == Coordination::Error::ZNODEEXISTS)
return;
throw zkutil::KeeperException(code, path);
}
Coordination::Responses multi(const Coordination::Requests & requests)
@ -306,6 +340,27 @@ public:
return access("tryRemove", path, [&]() { return keeper->tryRemove(path, version); });
}
void removeRecursive(const std::string & path)
{
return access("removeRecursive", path, [&]() { return keeper->removeRecursive(path); });
}
std::string sync(const std::string & path)
{
return access("sync", path, [&]() { return keeper->sync(path); });
}
Coordination::Error trySet(const std::string & path, const std::string & data, int32_t version = -1, Coordination::Stat * stat = nullptr)
{
return access("trySet", path, [&]() { return keeper->trySet(path, data, version, stat); });
}
void handleEphemeralNodeExistenceNoFailureInjection(const std::string & path, const std::string & fast_delete_if_equal_value)
{
return access<false, false, false>("handleEphemeralNodeExistence", path, [&]() { return keeper->handleEphemeralNodeExistence(path, fast_delete_if_equal_value); });
}
void cleanupEphemeralNodes()
{
for (const auto & path : ephemeral_nodes)

View File

@ -105,7 +105,7 @@ void KeeperSnapshotManagerS3::updateS3Configuration(const Poco::Util::AbstractCo
std::move(headers),
S3::CredentialsConfiguration
{
auth_settings.use_environment_credentials.value_or(false),
auth_settings.use_environment_credentials.value_or(true),
auth_settings.use_insecure_imds_request.value_or(false),
auth_settings.expiration_window_seconds.value_or(S3::DEFAULT_EXPIRATION_WINDOW_SECONDS),
auth_settings.no_sign_request.value_or(false),

View File

@ -415,11 +415,13 @@ class IColumn;
M(UInt64, max_temporary_data_on_disk_size_for_user, 0, "The maximum amount of data consumed by temporary files on disk in bytes for all concurrently running user queries. Zero means unlimited.", 0)\
M(UInt64, max_temporary_data_on_disk_size_for_query, 0, "The maximum amount of data consumed by temporary files on disk in bytes for all concurrently running queries. Zero means unlimited.", 0)\
\
M(UInt64, backup_keeper_max_retries, 20, "Max retries for keeper operations during backup", 0) \
M(UInt64, backup_keeper_retry_initial_backoff_ms, 100, "Initial backoff timeout for [Zoo]Keeper operations during backup", 0) \
M(UInt64, backup_keeper_retry_max_backoff_ms, 5000, "Max backoff timeout for [Zoo]Keeper operations during backup", 0) \
M(UInt64, backup_keeper_value_max_size, 1048576, "Maximum size of data of a [Zoo]Keeper's node during backup", 0) \
M(UInt64, backup_batch_size_for_keeper_multiread, 10000, "Maximum size of batch for multiread request to [Zoo]Keeper during backup", 0) \
M(UInt64, backup_restore_keeper_max_retries, 20, "Max retries for keeper operations during backup or restore", 0) \
M(UInt64, backup_restore_keeper_retry_initial_backoff_ms, 100, "Initial backoff timeout for [Zoo]Keeper operations during backup or restore", 0) \
M(UInt64, backup_restore_keeper_retry_max_backoff_ms, 5000, "Max backoff timeout for [Zoo]Keeper operations during backup or restore", 0) \
M(Float, backup_restore_keeper_fault_injection_probability, 0.0f, "Approximate probability of failure for a keeper request during backup or restore. Valid value is in interval [0.0f, 1.0f]", 0) \
M(UInt64, backup_restore_keeper_fault_injection_seed, 0, "0 - random seed, otherwise the setting value", 0) \
M(UInt64, backup_restore_keeper_value_max_size, 1048576, "Maximum size of data of a [Zoo]Keeper's node during backup or restore", 0) \
M(UInt64, backup_restore_batch_size_for_keeper_multiread, 10000, "Maximum size of batch for multiread request to [Zoo]Keeper during backup or restore", 0) \
\
M(Bool, log_profile_events, true, "Log query performance statistics into the query_log, query_thread_log and query_views_log.", 0) \
M(Bool, log_query_settings, true, "Log query settings into the query_log.", 0) \
@ -558,6 +560,8 @@ class IColumn;
M(Bool, query_cache_store_results_of_queries_with_nondeterministic_functions, false, "Store results of queries with non-deterministic functions (e.g. rand(), now()) in the query cache", 0) \
M(UInt64, query_cache_min_query_runs, 0, "Minimum number a SELECT query must run before its result is stored in the query cache", 0) \
M(Milliseconds, query_cache_min_query_duration, 0, "Minimum time in milliseconds for a query to run for its result to be stored in the query cache.", 0) \
M(Bool, query_cache_compress_entries, true, "Compress cache entries.", 0) \
M(Bool, query_cache_squash_partial_results, true, "Squash partial result blocks to blocks of size 'max_block_size'. Reduces performance of inserts into the query cache but improves the compressibility of cache entries.", 0) \
M(Seconds, query_cache_ttl, 60, "After this time in seconds entries in the query cache become stale", 0) \
M(Bool, query_cache_share_between_users, false, "Allow other users to read entry in the query cache", 0) \
\
@ -722,6 +726,9 @@ class IColumn;
M(Bool, force_aggregation_in_order, false, "Force use of aggregation in order on remote nodes during distributed aggregation. PLEASE, NEVER CHANGE THIS SETTING VALUE MANUALLY!", IMPORTANT) \
M(UInt64, http_max_request_param_data_size, 10_MiB, "Limit on size of request data used as a query parameter in predefined HTTP requests.", 0) \
M(Bool, allow_experimental_undrop_table_query, false, "Allow to use undrop query to restore dropped table in a limited time", 0) \
M(Bool, keeper_map_strict_mode, false, "Enforce additional checks during operations on KeeperMap. E.g. throw an exception on an insert for already existing key", 0) \
M(Bool, function_json_value_return_type_allow_nullable, false, "Allow function to return nullable type.", 0) \
M(Bool, function_json_value_return_type_allow_complex, false, "Allow function to return complex type, such as: struct, array, map.", 0) \
// End of COMMON_SETTINGS
// Please add settings related to formats into the FORMAT_FACTORY_SETTINGS and move obsolete settings to OBSOLETE_SETTINGS.
@ -945,7 +952,6 @@ class IColumn;
\
M(Bool, dictionary_use_async_executor, false, "Execute a pipeline for reading from a dictionary with several threads. It's supported only by DIRECT dictionary with CLICKHOUSE source.", 0) \
// End of FORMAT_FACTORY_SETTINGS
// Please add settings non-related to formats into the COMMON_SETTINGS above.

View File

@ -34,6 +34,7 @@
#include <Parsers/parseQuery.h>
#include <Parsers/ParserCreateQuery.h>
#include <Parsers/queryToString.h>
#include <Storages/StorageKeeperMap.h>
namespace DB
{
@ -1390,6 +1391,13 @@ bool DatabaseReplicated::shouldReplicateQuery(const ContextPtr & query_context,
/// Some ALTERs are not replicated on database level
if (const auto * alter = query_ptr->as<const ASTAlterQuery>())
{
auto table_id = query_context->resolveStorageID(*alter, Context::ResolveOrdinary);
StoragePtr table = DatabaseCatalog::instance().getTable(table_id, query_context);
/// we never replicate KeeperMap operations because it doesn't make sense
if (auto * keeper_map = table->as<StorageKeeperMap>())
return false;
return !alter->isAttachAlter() && !alter->isFetchAlter() && !alter->isDropPartitionAlter();
}

View File

@ -154,7 +154,7 @@ std::unique_ptr<S3::Client> getClient(
{},
S3::CredentialsConfiguration
{
config.getBool(config_prefix + ".use_environment_credentials", config.getBool("s3.use_environment_credentials", false)),
config.getBool(config_prefix + ".use_environment_credentials", config.getBool("s3.use_environment_credentials", true)),
config.getBool(config_prefix + ".use_insecure_imds_request", config.getBool("s3.use_insecure_imds_request", false)),
config.getUInt64(config_prefix + ".expiration_window_seconds", config.getUInt64("s3.expiration_window_seconds", S3::DEFAULT_EXPIRATION_WINDOW_SECONDS)),
config.getBool(config_prefix + ".no_sign_request", config.getBool("s3.no_sign_request", false))

View File

@ -3,9 +3,11 @@
#include <sstream>
#include <type_traits>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnNullable.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnsNumber.h>
#include <Core/Settings.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypesNumber.h>
#include <Common/JSONParsers/DummyJSONParser.h>
@ -40,7 +42,7 @@ public:
class Executor
{
public:
static ColumnPtr run(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, uint32_t parse_depth)
static ColumnPtr run(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, uint32_t parse_depth, const ContextPtr & context)
{
MutableColumnPtr to{result_type->createColumn()};
to->reserve(input_rows_count);
@ -115,7 +117,6 @@ public:
/// Parse JSON for every row
Impl<JSONParser> impl;
for (const auto i : collections::range(0, input_rows_count))
{
std::string_view json{
@ -125,7 +126,7 @@ public:
bool added_to_column = false;
if (document_ok)
{
added_to_column = impl.insertResultToColumn(*to, document, res);
added_to_column = impl.insertResultToColumn(*to, document, res, context);
}
if (!added_to_column)
{
@ -154,7 +155,7 @@ public:
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
return Impl<DummyJSONParser>::getReturnType(Name::name, arguments);
return Impl<DummyJSONParser>::getReturnType(Name::name, arguments, getContext());
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
@ -167,9 +168,9 @@ public:
unsigned parse_depth = static_cast<unsigned>(getContext()->getSettingsRef().max_parser_depth);
#if USE_SIMDJSON
if (getContext()->getSettingsRef().allow_simdjson)
return FunctionSQLJSONHelpers::Executor<Name, Impl, SimdJSONParser>::run(arguments, result_type, input_rows_count, parse_depth);
return FunctionSQLJSONHelpers::Executor<Name, Impl, SimdJSONParser>::run(arguments, result_type, input_rows_count, parse_depth, getContext());
#endif
return FunctionSQLJSONHelpers::Executor<Name, Impl, DummyJSONParser>::run(arguments, result_type, input_rows_count, parse_depth);
return FunctionSQLJSONHelpers::Executor<Name, Impl, DummyJSONParser>::run(arguments, result_type, input_rows_count, parse_depth, getContext());
}
};
@ -194,11 +195,11 @@ class JSONExistsImpl
public:
using Element = typename JSONParser::Element;
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &) { return std::make_shared<DataTypeUInt8>(); }
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, const ContextPtr &) { return std::make_shared<DataTypeUInt8>(); }
static size_t getNumberOfIndexArguments(const ColumnsWithTypeAndName & arguments) { return arguments.size() - 1; }
static bool insertResultToColumn(IColumn & dest, const Element & root, ASTPtr & query_ptr)
static bool insertResultToColumn(IColumn & dest, const Element & root, ASTPtr & query_ptr, const ContextPtr &)
{
GeneratorJSONPath<JSONParser> generator_json_path(query_ptr);
Element current_element = root;
@ -233,11 +234,22 @@ class JSONValueImpl
public:
using Element = typename JSONParser::Element;
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &) { return std::make_shared<DataTypeString>(); }
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, const ContextPtr & context)
{
if (context->getSettingsRef().function_json_value_return_type_allow_nullable)
{
DataTypePtr string_type = std::make_shared<DataTypeString>();
return std::make_shared<DataTypeNullable>(string_type);
}
else
{
return std::make_shared<DataTypeString>();
}
}
static size_t getNumberOfIndexArguments(const ColumnsWithTypeAndName & arguments) { return arguments.size() - 1; }
static bool insertResultToColumn(IColumn & dest, const Element & root, ASTPtr & query_ptr)
static bool insertResultToColumn(IColumn & dest, const Element & root, ASTPtr & query_ptr, const ContextPtr & context)
{
GeneratorJSONPath<JSONParser> generator_json_path(query_ptr);
Element current_element = root;
@ -247,7 +259,11 @@ public:
{
if (status == VisitorStatus::Ok)
{
if (!(current_element.isArray() || current_element.isObject()))
if (context->getSettingsRef().function_json_value_return_type_allow_complex)
{
break;
}
else if (!(current_element.isArray() || current_element.isObject()))
{
break;
}
@ -267,9 +283,19 @@ public:
std::stringstream out; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
out << current_element.getElement();
auto output_str = out.str();
ColumnString & col_str = assert_cast<ColumnString &>(dest);
ColumnString::Chars & data = col_str.getChars();
ColumnString::Offsets & offsets = col_str.getOffsets();
ColumnString * col_str;
if (isColumnNullable(dest))
{
ColumnNullable & col_null = assert_cast<ColumnNullable &>(dest);
col_null.getNullMapData().push_back(0);
col_str = assert_cast<ColumnString *>(&col_null.getNestedColumn());
}
else
{
col_str = assert_cast<ColumnString *>(&dest);
}
ColumnString::Chars & data = col_str->getChars();
ColumnString::Offsets & offsets = col_str->getOffsets();
if (current_element.isString())
{
@ -280,7 +306,7 @@ public:
}
else
{
col_str.insertData(output_str.data(), output_str.size());
col_str->insertData(output_str.data(), output_str.size());
}
return true;
}
@ -296,11 +322,11 @@ class JSONQueryImpl
public:
using Element = typename JSONParser::Element;
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &) { return std::make_shared<DataTypeString>(); }
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, const ContextPtr &) { return std::make_shared<DataTypeString>(); }
static size_t getNumberOfIndexArguments(const ColumnsWithTypeAndName & arguments) { return arguments.size() - 1; }
static bool insertResultToColumn(IColumn & dest, const Element & root, ASTPtr & query_ptr)
static bool insertResultToColumn(IColumn & dest, const Element & root, ASTPtr & query_ptr, const ContextPtr &)
{
GeneratorJSONPath<JSONParser> generator_json_path(query_ptr);
Element current_element = root;

View File

@ -20,6 +20,7 @@ namespace DB
namespace ErrorCodes
{
extern const int UNSUPPORTED_METHOD;
extern const int FUNCTION_CANNOT_HAVE_PARAMETERS;
}
void UserDefinedSQLFunctionVisitor::visit(ASTPtr & ast)
@ -132,6 +133,12 @@ ASTPtr UserDefinedSQLFunctionVisitor::tryToReplaceFunction(const ASTFunction & f
if (!user_defined_function)
return nullptr;
/// All UDFs are not parametric for now.
if (function.parameters)
{
throw Exception(ErrorCodes::FUNCTION_CANNOT_HAVE_PARAMETERS, "Function {} is not parametric", function.name);
}
const auto & function_arguments_list = function.children.at(0)->as<ASTExpressionList>();
auto & function_arguments = function_arguments_list->children;

View File

@ -174,23 +174,31 @@ public:
getName(),
arguments.size());
const auto * keys_type = checkAndGetDataType<DataTypeArray>(arguments[0].get());
if (!keys_type)
/// The first argument should always be Array.
/// Because key type can not be nested type of Map, which is Tuple
DataTypePtr key_type;
if (const auto * keys_type = checkAndGetDataType<DataTypeArray>(arguments[0].get()))
key_type = keys_type->getNestedType();
else
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "First argument for function {} must be an Array", getName());
const auto * values_type = checkAndGetDataType<DataTypeArray>(arguments[1].get());
if (!values_type)
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Second argument for function {} must be an Array", getName());
DataTypePtr value_type;
if (const auto * value_array_type = checkAndGetDataType<DataTypeArray>(arguments[1].get()))
value_type = value_array_type->getNestedType();
else if (const auto * value_map_type = checkAndGetDataType<DataTypeMap>(arguments[1].get()))
value_type = std::make_shared<DataTypeTuple>(value_map_type->getKeyValueTypes());
else
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Second argument for function {} must be Array or Map", getName());
DataTypes key_value_types{keys_type->getNestedType(), values_type->getNestedType()};
DataTypes key_value_types{key_type, value_type};
return std::make_shared<DataTypeMap>(key_value_types);
}
ColumnPtr executeImpl(
const ColumnsWithTypeAndName & arguments, const DataTypePtr & /* result_type */, size_t /* input_rows_count */) const override
{
ColumnPtr holder_keys;
bool is_keys_const = isColumnConst(*arguments[0].column);
ColumnPtr holder_keys;
const ColumnArray * col_keys;
if (is_keys_const)
{
@ -202,24 +210,26 @@ public:
col_keys = checkAndGetColumn<ColumnArray>(arguments[0].column.get());
}
ColumnPtr holder_values;
bool is_values_const = isColumnConst(*arguments[1].column);
const ColumnArray * col_values;
if (is_values_const)
{
holder_values = arguments[1].column->convertToFullColumnIfConst();
col_values = checkAndGetColumn<ColumnArray>(holder_values.get());
}
else
{
col_values = checkAndGetColumn<ColumnArray>(arguments[1].column.get());
}
if (!col_keys)
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "The first argument of function {} must be Array", getName());
if (!col_keys || !col_values)
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Arguments of function {} must be array", getName());
bool is_values_const = isColumnConst(*arguments[1].column);
ColumnPtr holder_values;
if (is_values_const)
holder_values = arguments[1].column->convertToFullColumnIfConst();
else
holder_values = arguments[1].column;
const ColumnArray * col_values;
if (const auto * col_values_array = checkAndGetColumn<ColumnArray>(holder_values.get()))
col_values = col_values_array;
else if (const auto * col_values_map = checkAndGetColumn<ColumnMap>(holder_values.get()))
col_values = &col_values_map->getNestedColumn();
else
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "The second argument of function {} must be Array or Map", getName());
if (!col_keys->hasEqualOffsets(*col_values))
throw Exception(ErrorCodes::SIZES_OF_ARRAYS_DONT_MATCH, "Array arguments for function {} must have equal sizes", getName());
throw Exception(ErrorCodes::SIZES_OF_ARRAYS_DONT_MATCH, "Two arguments for function {} must have equal sizes", getName());
const auto & data_keys = col_keys->getDataPtr();
const auto & data_values = col_values->getDataPtr();
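The equal-offsets check in this hunk can be illustrated with a simplified model. This is a minimal sketch, not ClickHouse code: `ArrayColumn`, `hasEqualOffsets` and `zipToMaps` here are hypothetical stand-ins that assume an array column is stored as a flat element vector plus cumulative per-row end offsets. Keys and values can only be paired row by row when both columns have identical offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of an array column (assumption, not the real ColumnArray).
struct ArrayColumn
{
    std::vector<std::string> data;  // flattened elements of all rows
    std::vector<size_t> offsets;    // offsets[i] is the end position of row i in data
};

// Two array columns have the same per-row sizes iff their offsets are identical.
bool hasEqualOffsets(const ArrayColumn & lhs, const ArrayColumn & rhs)
{
    return lhs.offsets == rhs.offsets;
}

using MapRow = std::vector<std::pair<std::string, std::string>>;

// Pair keys with values row by row; reject columns whose row sizes differ.
std::vector<MapRow> zipToMaps(const ArrayColumn & keys, const ArrayColumn & values)
{
    if (!hasEqualOffsets(keys, values))
        throw std::invalid_argument("Two arguments must have equal sizes");

    std::vector<MapRow> rows;
    size_t begin = 0;
    for (size_t end : keys.offsets)
    {
        MapRow row;
        for (size_t i = begin; i < end; ++i)
            row.emplace_back(keys.data[i], values.data[i]);
        rows.push_back(std::move(row));
        begin = end;
    }
    return rows;
}
```

Comparing offsets once per column avoids a per-row size check, which is presumably why the real code validates before touching the nested data.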

View File

@ -1,4 +1,5 @@
#include <IO/CascadeWriteBuffer.h>
#include <IO/MemoryReadWriteBuffer.h>
#include <Common/Exception.h>
namespace DB
@ -35,9 +36,9 @@ void CascadeWriteBuffer::nextImpl()
curr_buffer->position() = position();
curr_buffer->next();
}
catch (const Exception & e)
catch (const MemoryWriteBuffer::CurrentBufferExhausted &)
{
if (curr_buffer_num < num_sources && e.code() == ErrorCodes::CURRENT_WRITE_BUFFER_IS_EXHAUSTED)
if (curr_buffer_num < num_sources)
{
/// TODO: protocol should require set(position(), 0) before Exception
@ -46,7 +47,7 @@ void CascadeWriteBuffer::nextImpl()
curr_buffer = setNextBuffer();
}
else
throw;
throw Exception(ErrorCodes::CURRENT_WRITE_BUFFER_IS_EXHAUSTED, "MemoryWriteBuffer limit is exhausted");
}
set(curr_buffer->position(), curr_buffer->buffer().end() - curr_buffer->position());

View File

@ -16,7 +16,7 @@ namespace ErrorCodes
* (lazy_sources contains not the pointers themselves, but their delayed constructors)
*
* Firstly, CascadeWriteBuffer redirects data to the first buffer of the sequence
* If current WriteBuffer cannot receive data anymore, it throws special exception CURRENT_WRITE_BUFFER_IS_EXHAUSTED in nextImpl() body,
* If current WriteBuffer cannot receive data anymore, it throws special exception MemoryWriteBuffer::CurrentBufferExhausted in nextImpl() body,
* CascadeWriteBuffer prepares the next buffer and continues redirecting data to it.
* If there are no buffers left, CascadeWriteBuffer throws an exception.
*

View File

@ -5,12 +5,6 @@
namespace DB
{
namespace ErrorCodes
{
extern const int CURRENT_WRITE_BUFFER_IS_EXHAUSTED;
}
class ReadBufferFromMemoryWriteBuffer : public ReadBuffer, boost::noncopyable, private Allocator<false>
{
public:
@ -118,7 +112,7 @@ void MemoryWriteBuffer::addChunk()
if (0 == next_chunk_size)
{
set(position(), 0);
throw Exception(ErrorCodes::CURRENT_WRITE_BUFFER_IS_EXHAUSTED, "MemoryWriteBuffer limit is exhausted");
throw MemoryWriteBuffer::CurrentBufferExhausted();
}
}

View File

@ -16,6 +16,12 @@ namespace DB
class MemoryWriteBuffer : public WriteBuffer, public IReadableWriteBuffer, boost::noncopyable, private Allocator<false>
{
public:
/// Special exception to throw when the current WriteBuffer cannot receive data
class CurrentBufferExhausted : public std::exception
{
public:
const char * what() const noexcept override { return "MemoryWriteBuffer limit is exhausted"; }
};
/// Use max_total_size_ = 0 for unlimited storage
explicit MemoryWriteBuffer(

View File

@ -198,7 +198,7 @@ TEST(MemoryWriteBuffer, WriteAndReread)
if (s > 1)
{
MemoryWriteBuffer buf(s - 1);
EXPECT_THROW(buf.write(data.data(), data.size()), DB::Exception);
EXPECT_THROW(buf.write(data.data(), data.size()), MemoryWriteBuffer::CurrentBufferExhausted);
}
}
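The switch from an error code to a dedicated exception type can be sketched in isolation. The following is a minimal illustration under stated assumptions: `BoundedSink`, `CascadeSink` and `writeAll` are hypothetical stand-ins for `MemoryWriteBuffer` and `CascadeWriteBuffer`, not the real classes. The cascade catches only the "exhausted" type, so any other exception propagates unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for MemoryWriteBuffer: a sink with a hard capacity.
class BoundedSink
{
public:
    // Dedicated exception type, in the spirit of MemoryWriteBuffer::CurrentBufferExhausted.
    class Exhausted : public std::exception
    {
    public:
        const char * what() const noexcept override { return "sink limit is exhausted"; }
    };

    explicit BoundedSink(size_t capacity_) : capacity(capacity_) {}

    void write(char c)
    {
        if (data.size() == capacity)
            throw Exhausted();
        data.push_back(c);
    }

    std::string data;

private:
    size_t capacity;
};

// Hypothetical stand-in for CascadeWriteBuffer: falls through to the next sink on exhaustion.
class CascadeSink
{
public:
    explicit CascadeSink(std::vector<BoundedSink> sinks_) : sinks(std::move(sinks_)) {}

    void write(char c)
    {
        while (true)
        {
            try
            {
                sinks.at(current).write(c);
                return;
            }
            catch (const BoundedSink::Exhausted &)
            {
                // Only the dedicated type triggers the switch to the next sink.
                if (current + 1 < sinks.size())
                    ++current;
                else
                    throw std::runtime_error("all sinks are exhausted");
            }
        }
    }

    std::vector<BoundedSink> sinks;

private:
    size_t current = 0;
};

// Helper for the illustration: write text through a cascade and return what each sink received.
std::vector<std::string> writeAll(const std::string & text, const std::vector<size_t> & capacities)
{
    std::vector<BoundedSink> sinks;
    for (size_t capacity : capacities)
        sinks.emplace_back(capacity);

    CascadeSink cascade(std::move(sinks));
    for (char c : text)
        cascade.write(c);

    std::vector<std::string> result;
    for (const auto & sink : cascade.sinks)
        result.push_back(sink.data);
    return result;
}
```

Catching by type rather than inspecting an error code keeps the control-flow signal separate from genuine errors, which matches the updated `EXPECT_THROW` in the test above.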

View File

@ -2497,4 +2497,30 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG(
return result_dag;
}
FindOriginalNodeForOutputName::FindOriginalNodeForOutputName(const ActionsDAGPtr & actions_)
:actions(actions_)
{
for (const auto * node : actions->getOutputs())
index.emplace(node->result_name, node);
}
const ActionsDAG::Node * FindOriginalNodeForOutputName::find(const String & output_name)
{
const auto it = index.find(output_name);
if (it == index.end())
return nullptr;
/// Find the original (non-alias) node that the output refers to
const ActionsDAG::Node * node = it->second;
while (node && node->type == ActionsDAG::ActionType::ALIAS)
{
chassert(!node->children.empty());
node = node->children.front();
}
if (node && node->type != ActionsDAG::ActionType::INPUT)
return nullptr;
return node;
}
}
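The lookup in `FindOriginalNodeForOutputName::find` follows alias chains down to an input node. A minimal standalone sketch of that traversal is shown here; `Node`, `NodeType` and `findOriginalInput` are simplified, hypothetical types and names (not `ActionsDAG::Node`), and the empty-children case returns `nullptr` instead of asserting.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified node model (assumption, not ActionsDAG::Node).
enum class NodeType { Input, Alias, Function };

struct Node
{
    NodeType type;
    std::string result_name;
    std::vector<const Node *> children;
};

// Resolve an output name to the input node behind it, skipping any chain of aliases.
// Returns nullptr if the name is unknown or the chain ends in a non-input node.
const Node * findOriginalInput(
    const std::unordered_map<std::string, const Node *> & outputs,
    const std::string & output_name)
{
    const auto it = outputs.find(output_name);
    if (it == outputs.end())
        return nullptr;

    const Node * node = it->second;
    while (node && node->type == NodeType::Alias)
    {
        if (node->children.empty())
            return nullptr;
        node = node->children.front();
    }

    if (node && node->type != NodeType::Input)
        return nullptr;

    return node;
}
```

Building the name index once and walking aliases on demand keeps each lookup cheap when many output names have to be resolved against the same DAG.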

View File

@ -402,6 +402,19 @@ private:
static ActionsDAGPtr cloneActionsForConjunction(NodeRawConstPtrs conjunction, const ColumnsWithTypeAndName & all_inputs);
};
class FindOriginalNodeForOutputName
{
using NameToNodeIndex = std::unordered_map<std::string_view, const ActionsDAG::Node *>;
public:
explicit FindOriginalNodeForOutputName(const ActionsDAGPtr & actions);
const ActionsDAG::Node* find(const String& output_name);
private:
ActionsDAGPtr actions;
NameToNodeIndex index;
};
/// This is an ugly way to bypass impossibility to forward declare ActionDAG::Node.
struct ActionDAGNodes
{

View File

@ -75,6 +75,7 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
extern const int TOO_MANY_ARGUMENTS_FOR_FUNCTION;
extern const int FUNCTION_CANNOT_HAVE_PARAMETERS;
}
static NamesAndTypesList::iterator findColumn(const String & name, NamesAndTypesList & cols)
@ -1109,6 +1110,12 @@ void ActionsMatcher::visit(const ASTFunction & node, const ASTPtr & ast, Data &
}
}
/// Normal functions are not parametric for now.
if (node.parameters)
{
throw Exception(ErrorCodes::FUNCTION_CANNOT_HAVE_PARAMETERS, "Function {} is not parametric", node.name);
}
Names argument_names;
DataTypes argument_types;
bool arguments_present = true;

View File

@ -122,12 +122,15 @@ ASTPtr removeQueryCacheSettings(ASTPtr ast)
QueryCache::Key::Key(
ASTPtr ast_,
Block header_, const std::optional<String> & username_,
std::chrono::time_point<std::chrono::system_clock> expires_at_)
Block header_,
const std::optional<String> & username_,
std::chrono::time_point<std::chrono::system_clock> expires_at_,
bool is_compressed_)
: ast(removeQueryCacheSettings(ast_))
, header(header_)
, username(username_)
, expires_at(expires_at_)
, is_compressed(is_compressed_)
{
}
@ -153,7 +156,7 @@ size_t QueryCache::KeyHasher::operator()(const Key & key) const
return res;
}
size_t QueryCache::QueryResultWeight::operator()(const QueryResult & chunks) const
size_t QueryCache::QueryResultWeight::operator()(const Chunks & chunks) const
{
size_t res = 0;
for (const auto & chunk : chunks)
@ -168,12 +171,16 @@ bool QueryCache::IsStale::operator()(const Key & key) const
QueryCache::Writer::Writer(Cache & cache_, const Key & key_,
size_t max_entry_size_in_bytes_, size_t max_entry_size_in_rows_,
std::chrono::milliseconds min_query_runtime_)
std::chrono::milliseconds min_query_runtime_,
bool squash_partial_results_,
size_t max_block_size_)
: cache(cache_)
, key(key_)
, max_entry_size_in_bytes(max_entry_size_in_bytes_)
, max_entry_size_in_rows(max_entry_size_in_rows_)
, min_query_runtime(min_query_runtime_)
, squash_partial_results(squash_partial_results_)
, max_block_size(max_block_size_)
{
if (auto entry = cache.getWithKey(key); entry.has_value() && !IsStale()(entry->key))
{
@ -211,6 +218,8 @@ void QueryCache::Writer::finalizeWrite()
std::lock_guard lock(mutex);
chassert(!was_finalized);
if (std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - query_start_time) < min_query_runtime)
{
LOG_TRACE(&Poco::Logger::get("QueryCache"), "Skipped insert (query not expensive enough), query: {}", key.queryStringFromAst());
@ -224,7 +233,67 @@ void QueryCache::Writer::finalizeWrite()
return;
}
if (squash_partial_results)
{
// Squash partial result chunks to chunks of size 'max_block_size' each. This costs some performance but provides a more natural
// compression of neither too small nor big blocks. Also, it will look like 'max_block_size' is respected when the query result is
// served later on from the query cache.
Chunks squashed_chunks;
size_t rows_remaining_in_squashed = 0; /// how many further rows can the last squashed chunk consume until it reaches max_block_size
for (const auto & chunk : *query_result)
{
const size_t rows_chunk = chunk.getNumRows();
size_t rows_chunk_processed = 0;
if (rows_chunk == 0)
continue;
while (true)
{
if (rows_remaining_in_squashed == 0)
{
Chunk empty_chunk = Chunk(chunk.cloneEmptyColumns(), 0);
squashed_chunks.push_back(std::move(empty_chunk));
rows_remaining_in_squashed = max_block_size;
}
const size_t rows_to_append = std::min(rows_chunk - rows_chunk_processed, rows_remaining_in_squashed);
squashed_chunks.back().append(chunk, rows_chunk_processed, rows_to_append);
rows_chunk_processed += rows_to_append;
rows_remaining_in_squashed -= rows_to_append;
if (rows_chunk_processed == rows_chunk)
break;
}
}
*query_result = std::move(squashed_chunks);
}
if (key.is_compressed)
{
Chunks compressed_chunks;
const Chunks & decompressed_chunks = *query_result;
for (const auto & decompressed_chunk : decompressed_chunks)
{
const Columns & decompressed_columns = decompressed_chunk.getColumns();
Columns compressed_columns;
for (const auto & decompressed_column : decompressed_columns)
{
auto compressed_column = decompressed_column->compress();
compressed_columns.push_back(compressed_column);
}
Chunk compressed_chunk(compressed_columns, decompressed_chunk.getNumRows());
compressed_chunks.push_back(std::move(compressed_chunk));
}
*query_result = std::move(compressed_chunks);
}
cache.set(key, query_result);
was_finalized = true;
}
QueryCache::Reader::Reader(Cache & cache_, const Key & key, const std::lock_guard<std::mutex> &)
@ -249,7 +318,28 @@ QueryCache::Reader::Reader(Cache & cache_, const Key & key, const std::lock_guar
return;
}
pipe = Pipe(std::make_shared<SourceFromChunks>(entry->key.header, entry->mapped));
if (!entry->key.is_compressed)
pipe = Pipe(std::make_shared<SourceFromChunks>(entry->key.header, entry->mapped));
else
{
auto decompressed_chunks = std::make_shared<Chunks>();
const Chunks & compressed_chunks = *entry->mapped;
for (const auto & compressed_chunk : compressed_chunks)
{
const Columns & compressed_chunk_columns = compressed_chunk.getColumns();
Columns decompressed_columns;
for (const auto & compressed_column : compressed_chunk_columns)
{
auto column = compressed_column->decompress();
decompressed_columns.push_back(column);
}
Chunk decompressed_chunk(decompressed_columns, compressed_chunk.getNumRows());
decompressed_chunks->push_back(std::move(decompressed_chunk));
}
pipe = Pipe(std::make_shared<SourceFromChunks>(entry->key.header, decompressed_chunks));
}
LOG_TRACE(&Poco::Logger::get("QueryCache"), "Entry found for query {}", key.queryStringFromAst());
}
@ -277,10 +367,10 @@ QueryCache::Reader QueryCache::createReader(const Key & key)
return Reader(cache, key, lock);
}
QueryCache::Writer QueryCache::createWriter(const Key & key, std::chrono::milliseconds min_query_runtime)
QueryCache::Writer QueryCache::createWriter(const Key & key, std::chrono::milliseconds min_query_runtime, bool squash_partial_results, size_t max_block_size)
{
std::lock_guard lock(mutex);
return Writer(cache, key, max_entry_size_in_bytes, max_entry_size_in_rows, min_query_runtime);
return Writer(cache, key, max_entry_size_in_bytes, max_entry_size_in_rows, min_query_runtime, squash_partial_results, max_block_size);
}
void QueryCache::reset()
@ -308,7 +398,7 @@ std::vector<QueryCache::Cache::KeyMapped> QueryCache::dump() const
}
QueryCache::QueryCache()
: cache(std::make_unique<TTLCachePolicy<Key, QueryResult, KeyHasher, QueryResultWeight, IsStale>>())
: cache(std::make_unique<TTLCachePolicy<Key, Chunks, KeyHasher, QueryResultWeight, IsStale>>())
{
}
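The intent of the squashing step in `finalizeWrite()` is to re-split partial result chunks into blocks of at most `max_block_size` rows. A minimal sketch of that algorithm over plain vectors follows; `Rows` and `squash` are hypothetical names, and a chunk is modelled as a vector of ints rather than a `Chunk` of columns. The sketch decrements the remaining capacity after each append, so a new block starts once the current one is full.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a chunk is just a vector of row values.
using Rows = std::vector<int>;

// Re-split chunks so that every output block except possibly the last has exactly max_block_size rows.
std::vector<Rows> squash(const std::vector<Rows> & chunks, size_t max_block_size)
{
    assert(max_block_size > 0);

    std::vector<Rows> squashed;
    size_t rows_remaining = 0;  // free capacity left in squashed.back()

    for (const auto & chunk : chunks)
    {
        size_t processed = 0;
        while (processed < chunk.size())
        {
            if (rows_remaining == 0)
            {
                // Current block is full (or none exists yet): start a new one.
                squashed.emplace_back();
                rows_remaining = max_block_size;
            }

            const size_t to_append = std::min(chunk.size() - processed, rows_remaining);
            const auto first = chunk.begin() + static_cast<std::ptrdiff_t>(processed);
            squashed.back().insert(squashed.back().end(), first, first + static_cast<std::ptrdiff_t>(to_append));

            processed += to_append;
            rows_remaining -= to_append;
        }
    }
    return squashed;
}
```

Empty input chunks are skipped naturally because the inner loop never runs for them, so no empty blocks end up in the result.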

View File

@ -50,16 +50,19 @@ public:
/// When does the entry expire?
const std::chrono::time_point<std::chrono::system_clock> expires_at;
/// Is the entry compressed?
const bool is_compressed;
Key(ASTPtr ast_,
Block header_, const std::optional<String> & username_,
std::chrono::time_point<std::chrono::system_clock> expires_at_);
Block header_,
const std::optional<String> & username_,
std::chrono::time_point<std::chrono::system_clock> expires_at_,
bool is_compressed);
bool operator==(const Key & other) const;
String queryStringFromAst() const;
};
using QueryResult = Chunks;
private:
struct KeyHasher
{
@ -68,7 +71,7 @@ private:
struct QueryResultWeight
{
size_t operator()(const QueryResult & chunks) const;
size_t operator()(const Chunks & chunks) const;
};
struct IsStale
@ -77,7 +80,7 @@ private:
};
/// query --> query result
using Cache = CacheBase<Key, QueryResult, KeyHasher, QueryResultWeight>;
using Cache = CacheBase<Key, Chunks, KeyHasher, QueryResultWeight>;
/// query --> query execution count
using TimesExecuted = std::unordered_map<Key, size_t, KeyHasher>;
@ -109,12 +112,17 @@ public:
const size_t max_entry_size_in_rows;
const std::chrono::time_point<std::chrono::system_clock> query_start_time = std::chrono::system_clock::now(); /// Writer construction and finalizeWrite() coincide with query start/end
const std::chrono::milliseconds min_query_runtime;
std::shared_ptr<QueryResult> query_result TSA_GUARDED_BY(mutex) = std::make_shared<QueryResult>();
const bool squash_partial_results;
const size_t max_block_size;
std::shared_ptr<Chunks> query_result TSA_GUARDED_BY(mutex) = std::make_shared<Chunks>();
std::atomic<bool> skip_insert = false;
bool was_finalized = false;
Writer(Cache & cache_, const Key & key_,
size_t max_entry_size_in_bytes_, size_t max_entry_size_in_rows_,
std::chrono::milliseconds min_query_runtime_);
std::chrono::milliseconds min_query_runtime_,
bool squash_partial_results_,
size_t max_block_size_);
friend class QueryCache; /// for createWriter()
};
@ -136,7 +144,7 @@ public:
void updateConfiguration(const Poco::Util::AbstractConfiguration & config);
Reader createReader(const Key & key);
Writer createWriter(const Key & key, std::chrono::milliseconds min_query_runtime);
Writer createWriter(const Key & key, std::chrono::milliseconds min_query_runtime, bool squash_partial_results, size_t max_block_size);
void reset();

View File

@ -2362,7 +2362,7 @@ zkutil::ZooKeeperPtr Context::getZooKeeper() const
const auto & config = shared->zookeeper_config ? *shared->zookeeper_config : getConfigRef();
if (!shared->zookeeper)
shared->zookeeper = std::make_shared<zkutil::ZooKeeper>(config, "zookeeper", getZooKeeperLog());
shared->zookeeper = std::make_shared<zkutil::ZooKeeper>(config, zkutil::getZooKeeperConfigName(config), getZooKeeperLog());
else if (shared->zookeeper->expired())
{
Stopwatch watch;
@ -2401,8 +2401,9 @@ bool Context::tryCheckClientConnectionToMyKeeperCluster() const
{
try
{
const auto config_name = zkutil::getZooKeeperConfigName(getConfigRef());
/// If our server is part of main Keeper cluster
if (checkZooKeeperConfigIsLocal(getConfigRef(), "zookeeper"))
if (config_name == "keeper_server" || checkZooKeeperConfigIsLocal(getConfigRef(), config_name))
{
LOG_DEBUG(shared->log, "Keeper server is participant of the main zookeeper cluster, will try to connect to it");
getZooKeeper();
@ -2608,7 +2609,7 @@ void Context::reloadZooKeeperIfChanged(const ConfigurationPtr & config) const
bool server_started = isServerCompletelyStarted();
std::lock_guard lock(shared->zookeeper_mutex);
shared->zookeeper_config = config;
reloadZooKeeperIfChangedImpl(config, "zookeeper", shared->zookeeper, getZooKeeperLog(), server_started);
reloadZooKeeperIfChangedImpl(config, zkutil::getZooKeeperConfigName(*config), shared->zookeeper, getZooKeeperLog(), server_started);
}
void Context::reloadAuxiliaryZooKeepersConfigIfChanged(const ConfigurationPtr & config)
@ -2633,7 +2634,7 @@ void Context::reloadAuxiliaryZooKeepersConfigIfChanged(const ConfigurationPtr &
bool Context::hasZooKeeper() const
{
return getConfigRef().has("zookeeper");
return zkutil::hasZooKeeperConfig(getConfigRef());
}
bool Context::hasAuxiliaryZooKeeper(const String & name) const

View File

@ -20,6 +20,7 @@
#include <Storages/LiveView/StorageLiveView.h>
#include <Storages/MutationCommands.h>
#include <Storages/PartitionCommands.h>
#include <Storages/StorageKeeperMap.h>
#include <Common/typeid_cast.h>
#include <Functions/UserDefined/UserDefinedSQLFunctionFactory.h>
@ -39,6 +40,8 @@ namespace ErrorCodes
extern const int INCORRECT_QUERY;
extern const int NOT_IMPLEMENTED;
extern const int TABLE_IS_READ_ONLY;
extern const int BAD_ARGUMENTS;
extern const int UNKNOWN_TABLE;
}
@ -72,16 +75,21 @@ BlockIO InterpreterAlterQuery::executeToTable(const ASTAlterQuery & alter)
if (!UserDefinedSQLFunctionFactory::instance().empty())
UserDefinedSQLFunctionVisitor::visit(query_ptr);
auto table_id = getContext()->resolveStorageID(alter, Context::ResolveOrdinary);
query_ptr->as<ASTAlterQuery &>().setDatabase(table_id.database_name);
StoragePtr table = DatabaseCatalog::instance().tryGetTable(table_id, getContext());
if (!alter.cluster.empty() && !maybeRemoveOnCluster(query_ptr, getContext()))
{
if (table && table->as<StorageKeeperMap>())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Mutations with ON CLUSTER are not allowed for KeeperMap tables");
DDLQueryOnClusterParams params;
params.access_to_check = getRequiredAccess();
return executeDDLQueryOnCluster(query_ptr, getContext(), params);
}
getContext()->checkAccess(getRequiredAccess());
auto table_id = getContext()->resolveStorageID(alter, Context::ResolveOrdinary);
query_ptr->as<ASTAlterQuery &>().setDatabase(table_id.database_name);
DatabasePtr database = DatabaseCatalog::instance().getDatabase(table_id.database_name);
if (database->shouldReplicateQuery(getContext(), query_ptr))
@ -91,7 +99,9 @@ BlockIO InterpreterAlterQuery::executeToTable(const ASTAlterQuery & alter)
return database->tryEnqueueReplicatedDDL(query_ptr, getContext());
}
StoragePtr table = DatabaseCatalog::instance().getTable(table_id, getContext());
if (!table)
throw Exception(ErrorCodes::UNKNOWN_TABLE, "Could not find table: {}", table_id.table_name);
checkStorageSupportsTransactionsIfNeeded(table, getContext());
if (table->isStaticStorage())
throw Exception(ErrorCodes::TABLE_IS_READ_ONLY, "Table is read-only");

View File

@ -550,6 +550,12 @@ void MutationsInterpreter::prepare(bool dry_run)
if (source.hasLightweightDeleteMask())
all_columns.push_back({LightweightDeleteDescription::FILTER_COLUMN});
if (return_all_columns)
{
for (const auto & column : source.getStorage()->getVirtuals())
all_columns.push_back(column);
}
NameSet updated_columns;
bool materialize_ttl_recalculate_only = source.materializeTTLRecalculateOnly();
@ -906,6 +912,8 @@ void MutationsInterpreter::prepareMutationStages(std::vector<Stage> & prepared_s
{
auto storage_snapshot = source.getStorageSnapshot(metadata_snapshot, context);
auto options = GetColumnsOptions(GetColumnsOptions::AllPhysical).withExtendedObjects();
if (return_all_columns)
options.withVirtuals();
auto all_columns = storage_snapshot->getColumns(options);
/// Add _row_exists column if it is present in the part
@ -1256,6 +1264,7 @@ void MutationsInterpreter::validate()
}
QueryPlan plan;
initQueryPlan(stages.front(), plan);
auto pipeline = addStreamsForLaterStages(stages, plan);
}

View File

@ -726,7 +726,8 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
QueryCache::Key key(
ast, res.pipeline.getHeader(),
std::make_optional<String>(context->getUserName()),
std::chrono::system_clock::now() + std::chrono::seconds(settings.query_cache_ttl));
/*dummy value for expires_at*/ std::chrono::system_clock::from_time_t(1),
/*dummy value for is_compressed*/ true);
QueryCache::Reader reader = query_cache->createReader(key);
if (reader.hasCacheEntryForKey())
{
@ -748,13 +749,18 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
QueryCache::Key key(
ast, res.pipeline.getHeader(),
settings.query_cache_share_between_users ? std::nullopt : std::make_optional<String>(context->getUserName()),
std::chrono::system_clock::now() + std::chrono::seconds(settings.query_cache_ttl));
std::chrono::system_clock::now() + std::chrono::seconds(settings.query_cache_ttl),
settings.query_cache_compress_entries);
const size_t num_query_runs = query_cache->recordQueryRun(key);
if (num_query_runs > settings.query_cache_min_query_runs)
{
auto stream_in_query_cache_transform = std::make_shared<StreamInQueryCacheTransform>(res.pipeline.getHeader(), query_cache, key,
std::chrono::milliseconds(context->getSettings().query_cache_min_query_duration.totalMilliseconds()));
auto stream_in_query_cache_transform =
std::make_shared<StreamInQueryCacheTransform>(
res.pipeline.getHeader(), query_cache, key,
std::chrono::milliseconds(context->getSettings().query_cache_min_query_duration.totalMilliseconds()),
context->getSettings().query_cache_squash_partial_results,
context->getSettings().max_block_size);
res.pipeline.streamIntoQueryCache(stream_in_query_cache_transform);
}
}

View File

@ -611,8 +611,16 @@ void ASTAlterQuery::formatQueryImpl(const FormatSettings & settings, FormatState
FormatStateStacked frame_nested = frame;
frame_nested.need_parens = false;
frame_nested.expression_list_always_start_on_new_line = true;
static_cast<ASTExpressionList *>(command_list)->formatImplMultiline(settings, state, frame_nested);
if (settings.one_line)
{
frame_nested.expression_list_prepend_whitespace = true;
command_list->formatImpl(settings, state, frame_nested);
}
else
{
frame_nested.expression_list_always_start_on_new_line = true;
command_list->as<ASTExpressionList &>().formatImplMultiline(settings, state, frame_nested);
}
}
}

View File

@ -103,7 +103,7 @@ namespace
});
}
bool parseElement(IParser::Pos & pos, Expected & expected, bool allow_all, Element & element)
bool parseElement(IParser::Pos & pos, Expected & expected, Element & element)
{
return IParserBase::wrapParseImpl(pos, [&]
{
@ -169,7 +169,7 @@ namespace
return true;
}
if (allow_all && ParserKeyword{"ALL"}.ignore(pos, expected))
if (ParserKeyword{"ALL"}.ignore(pos, expected))
{
element.type = ElementType::ALL;
parseExceptDatabases(pos, expected, element.except_databases);
@ -181,7 +181,7 @@ namespace
});
}
bool parseElements(IParser::Pos & pos, Expected & expected, bool allow_all, std::vector<Element> & elements)
bool parseElements(IParser::Pos & pos, Expected & expected, std::vector<Element> & elements)
{
return IParserBase::wrapParseImpl(pos, [&]
{
@ -190,7 +190,7 @@ namespace
auto parse_element = [&]
{
Element element;
if (parseElement(pos, expected, allow_all, element))
if (parseElement(pos, expected, element))
{
result.emplace_back(std::move(element));
return true;
@ -334,11 +334,8 @@ bool ParserBackupQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
else
return false;
/// Allow "ALL" only if this is a RESTORE command.
bool allow_all = (kind == Kind::RESTORE);
std::vector<Element> elements;
if (!parseElements(pos, expected, allow_all, elements))
if (!parseElements(pos, expected, elements))
return false;
String cluster;

View File

@ -23,11 +23,11 @@ Chunk::Chunk(Columns columns_, UInt64 num_rows_, ChunkInfoPtr chunk_info_)
checkNumRowsIsConsistent();
}
static Columns unmuteColumns(MutableColumns && mut_columns)
static Columns unmuteColumns(MutableColumns && mutable_columns)
{
Columns columns;
columns.reserve(mut_columns.size());
for (auto & col : mut_columns)
columns.reserve(mutable_columns.size());
for (auto & col : mutable_columns)
columns.emplace_back(std::move(col));
return columns;
@ -78,23 +78,23 @@ void Chunk::checkNumRowsIsConsistent()
MutableColumns Chunk::mutateColumns()
{
size_t num_columns = columns.size();
MutableColumns mut_columns(num_columns);
MutableColumns mutable_columns(num_columns);
for (size_t i = 0; i < num_columns; ++i)
mut_columns[i] = IColumn::mutate(std::move(columns[i]));
mutable_columns[i] = IColumn::mutate(std::move(columns[i]));
columns.clear();
num_rows = 0;
return mut_columns;
return mutable_columns;
}
MutableColumns Chunk::cloneEmptyColumns() const
{
size_t num_columns = columns.size();
MutableColumns mut_columns(num_columns);
MutableColumns mutable_columns(num_columns);
for (size_t i = 0; i < num_columns; ++i)
mut_columns[i] = columns[i]->cloneEmpty();
return mut_columns;
mutable_columns[i] = columns[i]->cloneEmpty();
return mutable_columns;
}
Columns Chunk::detachColumns()
@ -171,14 +171,19 @@ std::string Chunk::dumpStructure() const
void Chunk::append(const Chunk & chunk)
{
MutableColumns mutation = mutateColumns();
for (size_t position = 0; position < mutation.size(); ++position)
append(chunk, 0, chunk.getNumRows());
}
void Chunk::append(const Chunk & chunk, size_t from, size_t length)
{
MutableColumns mutable_columns = mutateColumns();
for (size_t position = 0; position < mutable_columns.size(); ++position)
{
auto column = chunk.getColumns()[position];
mutation[position]->insertRangeFrom(*column, 0, column->size());
mutable_columns[position]->insertRangeFrom(*column, from, length);
}
size_t rows = mutation[0]->size();
setColumns(std::move(mutation), rows);
size_t rows = mutable_columns[0]->size();
setColumns(std::move(mutable_columns), rows);
}
void ChunkMissingValues::setBit(size_t column_idx, size_t row_idx)

View File

@ -102,6 +102,7 @@ public:
std::string dumpStructure() const;
void append(const Chunk & chunk);
void append(const Chunk & chunk, size_t from, size_t length); // append rows [from, from+length) of chunk
private:
Columns columns;
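The new overload copies only the half-open row range [from, from + length) of the source chunk, and `append(chunk)` becomes the special case `from = 0`, `length = chunk.getNumRows()`. A minimal standalone sketch of that delegation follows. It assumes a hypothetical `ToyChunk` whose columns are plain `std::vector<int>`, not the real `Chunk` or `IColumn` types. The bounds checks are an addition of the sketch, not something shown in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

/// Hypothetical stand-in for Chunk: every column is a plain vector of ints.
struct ToyChunk
{
    std::vector<std::vector<int>> columns;

    size_t getNumRows() const { return columns.empty() ? 0 : columns.front().size(); }

    /// Append rows [from, from + length) of `other` to every column.
    void append(const ToyChunk & other, size_t from, size_t length)
    {
        if (other.columns.size() != columns.size())
            throw std::invalid_argument("column count mismatch");
        if (from > other.getNumRows() || length > other.getNumRows() - from)
            throw std::out_of_range("row range is outside the source chunk");

        for (size_t position = 0; position < columns.size(); ++position)
        {
            const auto & source = other.columns[position];
            const auto first = source.begin() + static_cast<std::ptrdiff_t>(from);
            const auto last = first + static_cast<std::ptrdiff_t>(length);
            columns[position].insert(columns[position].end(), first, last);
        }
    }

    /// Appending a whole chunk is the range [0, num_rows).
    void append(const ToyChunk & other) { append(other, 0, other.getNumRows()); }
};
```

Keeping the whole-chunk overload as a thin wrapper means there is a single copy loop to maintain.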

View File

@ -12,12 +12,13 @@ namespace DB
static ITransformingStep::Traits getTraits(bool pre_distinct)
{
const bool preserves_number_of_streams = pre_distinct;
return ITransformingStep::Traits
{
{
.returns_single_stream = !pre_distinct,
.preserves_number_of_streams = pre_distinct,
.preserves_sorting = true, /// Sorting is preserved indeed because of implementation.
.preserves_number_of_streams = preserves_number_of_streams,
.preserves_sorting = preserves_number_of_streams,
},
{
.preserves_number_of_rows = false,

View File

@ -75,6 +75,32 @@ void ExpressionStep::updateOutputStream()
{
output_stream = createOutputStream(
input_streams.front(), ExpressionTransform::transformHeader(input_streams.front().header, *actions_dag), getDataStreamTraits());
if (!getDataStreamTraits().preserves_sorting)
return;
FindOriginalNodeForOutputName original_node_finder(actions_dag);
const auto & input_sort_description = getInputStreams().front().sort_description;
for (size_t i = 0, s = input_sort_description.size(); i < s; ++i)
{
const auto & desc = input_sort_description[i];
String alias;
const auto & origin_column = desc.column_name;
for (const auto & column : output_stream->header)
{
const auto * original_node = original_node_finder.find(column.name);
if (original_node && original_node->result_name == origin_column)
{
alias = column.name;
break;
}
}
if (alias.empty())
return;
output_stream->sort_description[i].column_name = alias;
}
}
}

View File

@ -105,6 +105,32 @@ void FilterStep::updateOutputStream()
input_streams.front(),
FilterTransform::transformHeader(input_streams.front().header, actions_dag.get(), filter_column_name, remove_filter_column),
getDataStreamTraits());
if (!getDataStreamTraits().preserves_sorting)
return;
FindOriginalNodeForOutputName original_node_finder(actions_dag);
const auto & input_sort_description = getInputStreams().front().sort_description;
for (size_t i = 0, s = input_sort_description.size(); i < s; ++i)
{
const auto & desc = input_sort_description[i];
String alias;
const auto & origin_column = desc.column_name;
for (const auto & column : output_stream->header)
{
const auto * original_node = original_node_finder.find(column.name);
if (original_node && original_node->result_name == origin_column)
{
alias = column.name;
break;
}
}
if (alias.empty())
return;
output_stream->sort_description[i].column_name = alias;
}
}
}
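The block added to both `ExpressionStep::updateOutputStream` and `FilterStep::updateOutputStream` walks the input sort description in order, looks for an output column whose original input node is the sort column, and renames the sort entry to that output name. It stops at the first sort column with no matching output. The sketch below reproduces only that renaming loop. It assumes the alias resolution has already been flattened into hypothetical (output name, origin name) pairs, so it does not model `ActionsDAG` or `FindOriginalNodeForOutputName`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

/// Hypothetical flattened view of the DAG outputs:
/// each pair is (output column name, name of the input column it aliases).
using OutputOrigins = std::vector<std::pair<std::string, std::string>>;

/// Rename sort columns to their output aliases, in sort-description order.
std::vector<std::string> renameSortColumns(std::vector<std::string> sort_columns, const OutputOrigins & outputs)
{
    for (auto & sort_column : sort_columns)
    {
        std::string alias;
        for (const auto & [output_name, origin_name] : outputs)
        {
            if (origin_name == sort_column)
            {
                alias = output_name;
                break;
            }
        }

        /// Same early exit as in the hunk: this entry and all later ones stay untouched.
        if (alias.empty())
            break;

        sort_column = alias;
    }
    return sort_columns;
}
```

For example, with outputs `x` (aliasing `a`) and `c` (aliasing `c`), a stream sorted by `a, b, c` comes back as `x, b, c`: the loop stops at `b` and never reaches `c`.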

View File

@ -70,6 +70,9 @@ MergingAggregatedStep::MergingAggregatedStep(
void MergingAggregatedStep::applyOrder(SortDescription sort_description, DataStream::SortScope sort_scope)
{
is_order_overwritten = true;
overwritten_sort_scope = sort_scope;
auto & input_stream = input_streams.front();
input_stream.sort_scope = sort_scope;
input_stream.sort_description = sort_description;
@ -152,6 +155,8 @@ void MergingAggregatedStep::describeActions(JSONBuilder::JSONMap & map) const
void MergingAggregatedStep::updateOutputStream()
{
output_stream = createOutputStream(input_streams.front(), params.getHeader(input_streams.front().header, final), getDataStreamTraits());
if (is_order_overwritten) /// overwrite order again
applyOrder(group_by_sort_description, overwritten_sort_scope);
}
bool MergingAggregatedStep::memoryBoundMergingWillBeUsed() const

View File

@ -51,6 +51,9 @@ private:
const size_t memory_bound_merging_max_block_bytes;
SortDescription group_by_sort_description;
bool is_order_overwritten = false;
DataStream::SortScope overwritten_sort_scope = DataStream::SortScope::None;
/// These settings are used to determine if we should resize pipeline to 1 at the end.
const bool should_produce_results_in_order_of_bucket_number;
const bool memory_bound_merging_of_aggregation_results_enabled;

View File

@ -26,38 +26,11 @@ static ActionsDAGPtr buildActionsForPlanPath(std::vector<ActionsDAGPtr> & dag_st
return path_actions;
}
static const ActionsDAG::Node * getOriginalNodeForOutputAlias(const ActionsDAGPtr & actions, const String & output_name)
{
/// find alias in output
const ActionsDAG::Node * output_alias = nullptr;
for (const auto * node : actions->getOutputs())
{
if (node->result_name == output_name)
{
output_alias = node;
break;
}
}
if (!output_alias)
return nullptr;
/// find original(non alias) node it refers to
const ActionsDAG::Node * node = output_alias;
while (node && node->type == ActionsDAG::ActionType::ALIAS)
{
chassert(!node->children.empty());
node = node->children.front();
}
if (node && node->type != ActionsDAG::ActionType::INPUT)
return nullptr;
return node;
}
static std::set<std::string>
getOriginalDistinctColumns(const ColumnsWithTypeAndName & distinct_columns, std::vector<ActionsDAGPtr> & dag_stack)
{
auto actions = buildActionsForPlanPath(dag_stack);
FindOriginalNodeForOutputName original_node_finder(actions);
std::set<std::string> original_distinct_columns;
for (const auto & column : distinct_columns)
{
@ -65,7 +38,7 @@ getOriginalDistinctColumns(const ColumnsWithTypeAndName & distinct_columns, std:
if (isColumnConst(*column.column))
continue;
const auto * input_node = getOriginalNodeForOutputAlias(actions, column.name);
const auto * input_node = original_node_finder.find(column.name);
if (!input_node)
break;

View File

@ -64,37 +64,6 @@ namespace
return non_const_columns;
}
const ActionsDAG::Node * getOriginalNodeForOutputAlias(const ActionsDAGPtr & actions, const String & output_name)
{
/// find alias in output
const ActionsDAG::Node * output_alias = nullptr;
for (const auto * node : actions->getOutputs())
{
if (node->result_name == output_name)
{
output_alias = node;
break;
}
}
if (!output_alias)
{
logDebug("getOriginalNodeForOutputAlias: no output alias found", output_name);
return nullptr;
}
/// find original(non alias) node it refers to
const ActionsDAG::Node * node = output_alias;
while (node && node->type == ActionsDAG::ActionType::ALIAS)
{
chassert(!node->children.empty());
node = node->children.front();
}
if (node && node->type != ActionsDAG::ActionType::INPUT)
return nullptr;
return node;
}
bool compareAggregationKeysWithDistinctColumns(
const Names & aggregation_keys, const DistinctColumns & distinct_columns, const ActionsDAGPtr & path_actions)
{
@ -103,10 +72,11 @@ namespace
logDebug("distinct_columns size", distinct_columns.size());
std::set<std::string_view> original_distinct_columns;
FindOriginalNodeForOutputName original_node_finder(path_actions);
for (const auto & column : distinct_columns)
{
logDebug("distinct column name", column);
const auto * alias_node = getOriginalNodeForOutputAlias(path_actions, String(column));
const auto * alias_node = original_node_finder.find(String(column));
if (!alias_node)
{
logDebug("original name for alias is not found", column);
@ -273,9 +243,10 @@ namespace
logActionsDAG("distinct pass: merged DAG", path_actions);
/// compare columns of two DISTINCTs
FindOriginalNodeForOutputName original_node_finder(path_actions);
for (const auto & column : distinct_columns)
{
const auto * alias_node = getOriginalNodeForOutputAlias(path_actions, String(column));
const auto * alias_node = original_node_finder.find(String(column));
if (!alias_node)
return false;

View File

@ -8,6 +8,7 @@
#include <Processors/QueryPlan/LimitByStep.h>
#include <Processors/QueryPlan/LimitStep.h>
#include <Processors/QueryPlan/Optimizations/Optimizations.h>
#include <Processors/QueryPlan/QueryPlanVisitor.h>
#include <Processors/QueryPlan/ReadFromMergeTree.h>
#include <Processors/QueryPlan/ReadFromRemote.h>
#include <Processors/QueryPlan/SortingStep.h>
@ -18,108 +19,6 @@
namespace DB::QueryPlanOptimizations
{
template <typename Derived, bool debug_logging = false>
class QueryPlanVisitor
{
protected:
struct FrameWithParent
{
QueryPlan::Node * node = nullptr;
QueryPlan::Node * parent_node = nullptr;
size_t next_child = 0;
};
using StackWithParent = std::vector<FrameWithParent>;
QueryPlan::Node * root = nullptr;
StackWithParent stack;
public:
explicit QueryPlanVisitor(QueryPlan::Node * root_) : root(root_) { }
void visit()
{
stack.push_back({.node = root});
while (!stack.empty())
{
auto & frame = stack.back();
QueryPlan::Node * current_node = frame.node;
QueryPlan::Node * parent_node = frame.parent_node;
logStep("back", current_node);
/// top-down visit
if (0 == frame.next_child)
{
logStep("top-down", current_node);
if (!visitTopDown(current_node, parent_node))
continue;
}
/// Traverse all children
if (frame.next_child < frame.node->children.size())
{
auto next_frame = FrameWithParent{.node = current_node->children[frame.next_child], .parent_node = current_node};
++frame.next_child;
logStep("push", next_frame.node);
stack.push_back(next_frame);
continue;
}
/// bottom-up visit
logStep("bottom-up", current_node);
visitBottomUp(current_node, parent_node);
logStep("pop", current_node);
stack.pop_back();
}
}
bool visitTopDown(QueryPlan::Node * current_node, QueryPlan::Node * parent_node)
{
return getDerived().visitTopDownImpl(current_node, parent_node);
}
void visitBottomUp(QueryPlan::Node * current_node, QueryPlan::Node * parent_node)
{
getDerived().visitBottomUpImpl(current_node, parent_node);
}
private:
Derived & getDerived() { return *static_cast<Derived *>(this); }
const Derived & getDerived() const { return *static_cast<Derived *>(this); }
std::unordered_map<const IQueryPlanStep*, std::string> address2name;
std::unordered_map<std::string, UInt32> name_gen;
std::string getStepId(const IQueryPlanStep* step)
{
const auto step_name = step->getName();
auto it = address2name.find(step);
if (it != address2name.end())
return it->second;
const auto seq_num = name_gen[step_name]++;
return address2name.insert({step, fmt::format("{}{}", step_name, seq_num)}).first->second;
}
protected:
void logStep(const char * prefix, const QueryPlan::Node * node)
{
if constexpr (debug_logging)
{
const IQueryPlanStep * current_step = node->step.get();
LOG_DEBUG(
&Poco::Logger::get("QueryPlanVisitor"),
"{}: {}: {}",
prefix,
getStepId(current_step),
reinterpret_cast<const void *>(current_step));
}
}
};
constexpr bool debug_logging_enabled = false;
class RemoveRedundantSorting : public QueryPlanVisitor<RemoveRedundantSorting, debug_logging_enabled>

View File

@ -14,6 +14,8 @@
#include <Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.h>
#include <Processors/QueryPlan/QueryPlan.h>
#include <Processors/QueryPlan/ReadFromMergeTree.h>
#include <Processors/QueryPlan/ITransformingStep.h>
#include <Processors/QueryPlan/QueryPlanVisitor.h>
#include <QueryPipeline/QueryPipelineBuilder.h>
@ -452,6 +454,31 @@ void QueryPlan::explainPipeline(WriteBuffer & buffer, const ExplainPipelineOptio
}
}
static void updateDataStreams(QueryPlan::Node & root)
{
class UpdateDataStreams : public QueryPlanVisitor<UpdateDataStreams, false>
{
public:
explicit UpdateDataStreams(QueryPlan::Node * root_) : QueryPlanVisitor<UpdateDataStreams, false>(root_) { }
static bool visitTopDownImpl(QueryPlan::Node * /*current_node*/, QueryPlan::Node * /*parent_node*/) { return true; }
static void visitBottomUpImpl(QueryPlan::Node * current_node, QueryPlan::Node * parent_node)
{
if (!parent_node || parent_node->children.size() != 1)
return;
if (!current_node->step->hasOutputStream())
return;
if (auto * parent_transform_step = dynamic_cast<ITransformingStep *>(parent_node->step.get()); parent_transform_step)
parent_transform_step->updateInputStream(current_node->step->getOutputStream());
}
};
UpdateDataStreams(&root).visit();
}
void QueryPlan::optimize(const QueryPlanOptimizationSettings & optimization_settings)
{
/// optimization needs to be applied before "mergeExpressions" optimization
@ -462,6 +489,8 @@ void QueryPlan::optimize(const QueryPlanOptimizationSettings & optimization_sett
QueryPlanOptimizations::optimizeTreeFirstPass(optimization_settings, *root, nodes);
QueryPlanOptimizations::optimizeTreeSecondPass(optimization_settings, *root, nodes);
updateDataStreams(*root);
}
void QueryPlan::explainEstimate(MutableColumns & columns)

View File

@ -0,0 +1,111 @@
#pragma once
#include <Processors/QueryPlan/QueryPlan.h>
#include <Processors/QueryPlan/IQueryPlanStep.h>
#include <Poco/Logger.h>
namespace DB
{
template <typename Derived, bool debug_logging = false>
class QueryPlanVisitor
{
protected:
struct FrameWithParent
{
QueryPlan::Node * node = nullptr;
QueryPlan::Node * parent_node = nullptr;
size_t next_child = 0;
};
using StackWithParent = std::vector<FrameWithParent>;
QueryPlan::Node * root = nullptr;
StackWithParent stack;
public:
explicit QueryPlanVisitor(QueryPlan::Node * root_) : root(root_) { }
void visit()
{
stack.push_back({.node = root});
while (!stack.empty())
{
auto & frame = stack.back();
QueryPlan::Node * current_node = frame.node;
QueryPlan::Node * parent_node = frame.parent_node;
logStep("back", current_node);
/// top-down visit
if (0 == frame.next_child)
{
logStep("top-down", current_node);
if (!visitTopDown(current_node, parent_node))
continue;
}
/// Traverse all children
if (frame.next_child < frame.node->children.size())
{
auto next_frame = FrameWithParent{.node = current_node->children[frame.next_child], .parent_node = current_node};
++frame.next_child;
logStep("push", next_frame.node);
stack.push_back(next_frame);
continue;
}
/// bottom-up visit
logStep("bottom-up", current_node);
visitBottomUp(current_node, parent_node);
logStep("pop", current_node);
stack.pop_back();
}
}
bool visitTopDown(QueryPlan::Node * current_node, QueryPlan::Node * parent_node)
{
return getDerived().visitTopDownImpl(current_node, parent_node);
}
void visitBottomUp(QueryPlan::Node * current_node, QueryPlan::Node * parent_node)
{
getDerived().visitBottomUpImpl(current_node, parent_node);
}
private:
Derived & getDerived() { return *static_cast<Derived *>(this); }
const Derived & getDerived() const { return *static_cast<const Derived *>(this); }
std::unordered_map<const IQueryPlanStep*, std::string> address2name;
std::unordered_map<std::string, UInt32> name_gen;
std::string getStepId(const IQueryPlanStep* step)
{
const auto step_name = step->getName();
auto it = address2name.find(step);
if (it != address2name.end())
return it->second;
const auto seq_num = name_gen[step_name]++;
return address2name.insert({step, fmt::format("{}{}", step_name, seq_num)}).first->second;
}
protected:
void logStep(const char * prefix, const QueryPlan::Node * node)
{
if constexpr (debug_logging)
{
const IQueryPlanStep * current_step = node->step.get();
LOG_DEBUG(
&Poco::Logger::get("QueryPlanVisitor"),
"{}: {}: {}",
prefix,
getStepId(current_step),
reinterpret_cast<const void *>(current_step));
}
}
};
}
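The extracted header combines CRTP dispatch with an explicit stack: each frame remembers its parent and the index of the next child, so a node gets its top-down hook on first sight and its bottom-up hook once all children have been popped. The sketch below shows that traversal order on a hypothetical `ToyNode` tree. As a simplification, the hooks return `void`, so the `bool` result of `visitTopDownImpl` and the debug logging are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

/// Hypothetical stand-in for QueryPlan::Node.
struct ToyNode
{
    std::string name;
    std::vector<ToyNode *> children;
};

template <typename Derived>
class ToyPlanVisitor
{
public:
    explicit ToyPlanVisitor(ToyNode * root_) : root(root_) {}

    void visit()
    {
        struct Frame
        {
            ToyNode * node = nullptr;
            ToyNode * parent = nullptr;
            size_t next_child = 0;
        };

        std::vector<Frame> stack;
        stack.push_back({root, nullptr, 0});

        while (!stack.empty())
        {
            /// Use an index, not a reference: push_back below may reallocate the stack.
            const size_t top = stack.size() - 1;
            ToyNode * node = stack[top].node;
            ToyNode * parent = stack[top].parent;

            /// First time this frame is on top: top-down hook.
            if (stack[top].next_child == 0)
                derived().visitTopDownImpl(node, parent);

            /// Descend into the next unvisited child, if any.
            if (stack[top].next_child < node->children.size())
            {
                ToyNode * child = node->children[stack[top].next_child];
                ++stack[top].next_child;
                stack.push_back({child, node, 0});
                continue;
            }

            /// All children done: bottom-up hook, then pop.
            derived().visitBottomUpImpl(node, parent);
            stack.pop_back();
        }
    }

private:
    Derived & derived() { return *static_cast<Derived *>(this); }

    ToyNode * root = nullptr;
};

/// Example derived visitor that records both visiting orders.
struct OrderRecorder : ToyPlanVisitor<OrderRecorder>
{
    using ToyPlanVisitor<OrderRecorder>::ToyPlanVisitor;

    std::vector<std::string> top_down;
    std::vector<std::string> bottom_up;

    void visitTopDownImpl(ToyNode * node, ToyNode *) { top_down.push_back(node->name); }
    void visitBottomUpImpl(ToyNode * node, ToyNode *) { bottom_up.push_back(node->name); }
};
```

On a root with two leaf children `a` and `b`, the top-down order is `root, a, b` and the bottom-up order is `a, b, root`. That ordering is what lets `updateDataStreams` refresh a parent's input stream only after its single child has been visited.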

View File

@ -49,7 +49,9 @@ void RollupStep::transformPipeline(QueryPipelineBuilder & pipeline, const BuildQ
void RollupStep::updateOutputStream()
{
output_stream = createOutputStream(
input_streams.front(), appendGroupingSetColumn(params.getHeader(input_streams.front().header, final)), getDataStreamTraits());
input_streams.front(),
generateOutputHeader(params.getHeader(input_streams.front().header, final), params.keys, use_nulls),
getDataStreamTraits());
}

View File

@ -4,9 +4,14 @@ namespace DB
{
StreamInQueryCacheTransform::StreamInQueryCacheTransform(
const Block & header_, QueryCachePtr cache, const QueryCache::Key & cache_key, std::chrono::milliseconds min_query_duration)
const Block & header_,
QueryCachePtr cache,
const QueryCache::Key & cache_key,
std::chrono::milliseconds min_query_duration,
bool squash_partial_results,
size_t max_block_size)
: ISimpleTransform(header_, header_, false)
, cache_writer(cache->createWriter(cache_key, min_query_duration))
, cache_writer(cache->createWriter(cache_key, min_query_duration, squash_partial_results, max_block_size))
{
}

View File

@ -10,7 +10,12 @@ class StreamInQueryCacheTransform : public ISimpleTransform
{
public:
StreamInQueryCacheTransform(
const Block & header_, QueryCachePtr cache, const QueryCache::Key & cache_key, std::chrono::milliseconds min_query_duration);
const Block & header_,
QueryCachePtr cache,
const QueryCache::Key & cache_key,
std::chrono::milliseconds min_query_duration,
bool squash_partial_results,
size_t max_block_size);
protected:
void transform(Chunk & chunk) override;

View File

@ -2940,7 +2940,8 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, Context
old_types.emplace(column.name, column.type.get());
NamesAndTypesList columns_to_check_conversion;
auto name_deps = getDependentViewsByColumn(local_context);
std::optional<NameDependencies> name_deps{};
for (const AlterCommand & command : commands)
{
/// Just validate partition expression
@ -3022,7 +3023,9 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, Context
if (!command.clear)
{
const auto & deps_mv = name_deps[command.column_name];
if (!name_deps)
name_deps = getDependentViewsByColumn(local_context);
const auto & deps_mv = name_deps.value()[command.column_name];
if (!deps_mv.empty())
{
throw Exception(ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,

View File

@ -126,6 +126,12 @@ public:
user_error = UserError{};
}
template <typename... Args>
void setKeeperError(Coordination::Error code, fmt::format_string<Args...> fmt, Args &&... args)
{
setKeeperError(code, fmt::format(fmt, std::forward<Args>(args)...));
}
void stopRetries() { stop_retries = true; }
void requestUnconditionalRetry() { unconditional_retry = true; }

View File

@ -1016,7 +1016,7 @@ void StorageBuffer::reschedule()
void StorageBuffer::checkAlterIsPossible(const AlterCommands & commands, ContextPtr local_context) const
{
auto name_deps = getDependentViewsByColumn(local_context);
std::optional<NameDependencies> name_deps{};
for (const auto & command : commands)
{
if (command.type != AlterCommand::Type::ADD_COLUMN && command.type != AlterCommand::Type::MODIFY_COLUMN
@ -1027,7 +1027,9 @@ void StorageBuffer::checkAlterIsPossible(const AlterCommands & commands, Context
if (command.type == AlterCommand::Type::DROP_COLUMN && !command.clear)
{
const auto & deps_mv = name_deps[command.column_name];
if (!name_deps)
name_deps = getDependentViewsByColumn(local_context);
const auto & deps_mv = name_deps.value()[command.column_name];
if (!deps_mv.empty())
{
throw Exception(ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,

View File

@ -1401,7 +1401,7 @@ std::optional<QueryPipeline> StorageDistributed::distributedWrite(const ASTInser
void StorageDistributed::checkAlterIsPossible(const AlterCommands & commands, ContextPtr local_context) const
{
auto name_deps = getDependentViewsByColumn(local_context);
std::optional<NameDependencies> name_deps{};
for (const auto & command : commands)
{
if (command.type != AlterCommand::Type::ADD_COLUMN && command.type != AlterCommand::Type::MODIFY_COLUMN
@ -1413,7 +1413,9 @@ void StorageDistributed::checkAlterIsPossible(const AlterCommands & commands, Co
if (command.type == AlterCommand::DROP_COLUMN && !command.clear)
{
const auto & deps_mv = name_deps[command.column_name];
if (!name_deps)
name_deps = getDependentViewsByColumn(local_context);
const auto & deps_mv = name_deps.value()[command.column_name];
if (!deps_mv.empty())
{
throw Exception(ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,

View File

@ -59,6 +59,8 @@ namespace ErrorCodes
namespace
{
constexpr std::string_view version_column_name = "_version";
std::string formattedAST(const ASTPtr & ast)
{
if (!ast)
@ -77,7 +79,6 @@ void verifyTableId(const StorageID & table_id)
table_id.getDatabaseName(),
database->getEngineName());
}
}
}
@ -86,11 +87,13 @@ class StorageKeeperMapSink : public SinkToStorage
{
StorageKeeperMap & storage;
std::unordered_map<std::string, std::string> new_values;
std::unordered_map<std::string, int32_t> versions;
size_t primary_key_pos;
ContextPtr context;
public:
StorageKeeperMapSink(StorageKeeperMap & storage_, const StorageMetadataPtr & metadata_snapshot)
: SinkToStorage(metadata_snapshot->getSampleBlock()), storage(storage_)
StorageKeeperMapSink(StorageKeeperMap & storage_, Block header, ContextPtr context_)
: SinkToStorage(header), storage(storage_), context(std::move(context_))
{
auto primary_key = storage.getPrimaryKey();
assert(primary_key.size() == 1);
@ -113,18 +116,36 @@ public:
wb_value.restart();
size_t idx = 0;
int32_t version = -1;
for (const auto & elem : block)
{
if (elem.name == version_column_name)
{
version = assert_cast<const ColumnVector<Int32> &>(*elem.column).getData()[i];
continue;
}
elem.type->getDefaultSerialization()->serializeBinary(*elem.column, i, idx == primary_key_pos ? wb_key : wb_value, {});
++idx;
}
auto key = base64Encode(wb_key.str(), /* url_encoding */ true);
if (version != -1)
versions[key] = version;
new_values[std::move(key)] = std::move(wb_value.str());
}
}
void onFinish() override
{
finalize<false>(/*strict*/ context->getSettingsRef().keeper_map_strict_mode);
}
template <bool for_update>
void finalize(bool strict)
{
auto zookeeper = storage.getClient();
@ -147,21 +168,39 @@ public:
for (const auto & [key, _] : new_values)
key_paths.push_back(storage.fullPathForKey(key));
auto results = zookeeper->exists(key_paths);
zkutil::ZooKeeper::MultiExistsResponse results;
if constexpr (!for_update)
{
if (!strict)
results = zookeeper->exists(key_paths);
}
Coordination::Requests requests;
requests.reserve(key_paths.size());
for (size_t i = 0; i < key_paths.size(); ++i)
{
auto key = fs::path(key_paths[i]).filename();
if (results[i].error == Coordination::Error::ZOK)
if constexpr (for_update)
{
requests.push_back(zkutil::makeSetRequest(key_paths[i], new_values[key], -1));
int32_t version = -1;
if (strict)
version = versions.at(key);
requests.push_back(zkutil::makeSetRequest(key_paths[i], new_values[key], version));
}
else
{
requests.push_back(zkutil::makeCreateRequest(key_paths[i], new_values[key], zkutil::CreateMode::Persistent));
++new_keys_num;
if (!strict && results[i].error == Coordination::Error::ZOK)
{
requests.push_back(zkutil::makeSetRequest(key_paths[i], new_values[key], -1));
}
else
{
requests.push_back(zkutil::makeCreateRequest(key_paths[i], new_values[key], zkutil::CreateMode::Persistent));
++new_keys_num;
}
}
}
@ -193,6 +232,18 @@ class StorageKeeperMapSource : public ISource
KeyContainerIter it;
KeyContainerIter end;
bool with_version_column = false;
static Block getHeader(Block header, bool with_version_column)
{
if (with_version_column)
header.insert(
{DataTypeInt32{}.createColumn(),
std::make_shared<DataTypeInt32>(), std::string{version_column_name}});
return header;
}
public:
StorageKeeperMapSource(
const StorageKeeperMap & storage_,
@ -200,8 +251,10 @@ public:
size_t max_block_size_,
KeyContainerPtr container_,
KeyContainerIter begin_,
KeyContainerIter end_)
: ISource(header), storage(storage_), max_block_size(max_block_size_), container(std::move(container_)), it(begin_), end(end_)
KeyContainerIter end_,
bool with_version_column_)
: ISource(getHeader(header, with_version_column_)), storage(storage_), max_block_size(max_block_size_), container(std::move(container_)), it(begin_), end(end_)
, with_version_column(with_version_column_)
{
}
@ -225,12 +278,12 @@ public:
for (auto & raw_key : raw_keys)
raw_key = base64Encode(raw_key, /* url_encoding */ true);
return storage.getBySerializedKeys(raw_keys, nullptr);
return storage.getBySerializedKeys(raw_keys, nullptr, with_version_column);
}
else
{
size_t elem_num = std::min(max_block_size, static_cast<size_t>(end - it));
auto chunk = storage.getBySerializedKeys(std::span{it, it + elem_num}, nullptr);
auto chunk = storage.getBySerializedKeys(std::span{it, it + elem_num}, nullptr, with_version_column);
it += elem_num;
return chunk;
}
@ -426,6 +479,16 @@ Pipe StorageKeeperMap::read(
auto primary_key_type = sample_block.getByName(primary_key).type;
std::tie(filtered_keys, all_scan) = getFilterKeys(primary_key, primary_key_type, query_info, context_);
bool with_version_column = false;
for (const auto & column : column_names)
{
if (column == version_column_name)
{
with_version_column = true;
break;
}
}
const auto process_keys = [&]<typename KeyContainerPtr>(KeyContainerPtr keys) -> Pipe
{
if (keys->empty())
@ -449,7 +512,7 @@ Pipe StorageKeeperMap::read(
using KeyContainer = typename KeyContainerPtr::element_type;
pipes.emplace_back(std::make_shared<StorageKeeperMapSource<KeyContainer>>(
*this, sample_block, max_block_size, keys, keys->begin() + begin, keys->begin() + end));
*this, sample_block, max_block_size, keys, keys->begin() + begin, keys->begin() + end, with_version_column));
}
return Pipe::unitePipes(std::move(pipes));
};
@ -461,10 +524,10 @@ Pipe StorageKeeperMap::read(
return process_keys(std::move(filtered_keys));
}
SinkToStoragePtr StorageKeeperMap::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/)
SinkToStoragePtr StorageKeeperMap::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context)
{
checkTable<true>();
return std::make_shared<StorageKeeperMapSink>(*this, metadata_snapshot);
return std::make_shared<StorageKeeperMapSink>(*this, metadata_snapshot->getSampleBlock(), local_context);
}
void StorageKeeperMap::truncate(const ASTPtr &, const StorageMetadataPtr &, ContextPtr, TableExclusiveLockHolder &)
@ -554,6 +617,12 @@ void StorageKeeperMap::drop()
dropTable(client, metadata_drop_lock);
}
NamesAndTypesList StorageKeeperMap::getVirtuals() const
{
return NamesAndTypesList{
{std::string{version_column_name}, std::make_shared<DataTypeInt32>()}};
}
zkutil::ZooKeeperPtr StorageKeeperMap::getClient() const
{
std::lock_guard lock{zookeeper_mutex};
@ -670,13 +739,18 @@ Chunk StorageKeeperMap::getByKeys(const ColumnsWithTypeAndName & keys, PaddedPOD
if (raw_keys.size() != keys[0].column->size())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Assertion failed: {} != {}", raw_keys.size(), keys[0].column->size());
return getBySerializedKeys(raw_keys, &null_map);
return getBySerializedKeys(raw_keys, &null_map, /* with_version */ false);
}
Chunk StorageKeeperMap::getBySerializedKeys(const std::span<const std::string> keys, PaddedPODArray<UInt8> * null_map) const
Chunk StorageKeeperMap::getBySerializedKeys(const std::span<const std::string> keys, PaddedPODArray<UInt8> * null_map, bool with_version) const
{
Block sample_block = getInMemoryMetadataPtr()->getSampleBlock();
MutableColumns columns = sample_block.cloneEmptyColumns();
MutableColumnPtr version_column = nullptr;
if (with_version)
version_column = ColumnVector<Int32>::create();
size_t primary_key_pos = getPrimaryKeyPos(sample_block, getPrimaryKey());
if (null_map)
@ -706,6 +780,9 @@ Chunk StorageKeeperMap::getBySerializedKeys(const std::span<const std::string> k
if (code == Coordination::Error::ZOK)
{
fillColumns(base64Decode(keys[i], true), response.data, primary_key_pos, sample_block, columns);
if (version_column)
version_column->insert(response.stat.version);
}
else if (code == Coordination::Error::ZNONODE)
{
@ -714,6 +791,9 @@ Chunk StorageKeeperMap::getBySerializedKeys(const std::span<const std::string> k
(*null_map)[i] = 0;
for (size_t col_idx = 0; col_idx < sample_block.columns(); ++col_idx)
columns[col_idx]->insert(sample_block.getByPosition(col_idx).type->getDefault());
if (version_column)
version_column->insert(-1);
}
}
else
@ -723,6 +803,10 @@ Chunk StorageKeeperMap::getBySerializedKeys(const std::span<const std::string> k
}
size_t num_rows = columns.at(0)->size();
if (version_column)
columns.push_back(std::move(version_column));
return Chunk(std::move(columns), num_rows);
}
@ -763,6 +847,8 @@ void StorageKeeperMap::mutate(const MutationCommands & commands, ContextPtr loca
if (commands.empty())
return;
bool strict = local_context->getSettingsRef().keeper_map_strict_mode;
assert(commands.size() == 1);
auto metadata_snapshot = getInMemoryMetadataPtr();
@ -784,8 +870,10 @@ void StorageKeeperMap::mutate(const MutationCommands & commands, ContextPtr loca
auto header = interpreter->getUpdatedHeader();
auto primary_key_pos = header.getPositionByName(primary_key);
auto version_position = header.getPositionByName(std::string{version_column_name});
auto client = getClient();
Block block;
while (executor.pull(block))
{
@ -793,14 +881,23 @@ void StorageKeeperMap::mutate(const MutationCommands & commands, ContextPtr loca
auto column = column_type_name.column;
auto size = column->size();
WriteBufferFromOwnString wb_key;
Coordination::Requests delete_requests;
for (size_t i = 0; i < size; ++i)
{
int32_t version = -1;
if (strict)
{
const auto & version_column = block.getByPosition(version_position).column;
version = assert_cast<const ColumnVector<Int32> &>(*version_column).getData()[i];
}
wb_key.restart();
column_type_name.type->getDefaultSerialization()->serializeBinary(*column, i, wb_key, {});
delete_requests.emplace_back(zkutil::makeRemoveRequest(fullPathForKey(base64Encode(wb_key.str(), true)), -1));
delete_requests.emplace_back(zkutil::makeRemoveRequest(fullPathForKey(base64Encode(wb_key.str(), true)), version));
}
Coordination::Responses responses;
@ -834,12 +931,13 @@ void StorageKeeperMap::mutate(const MutationCommands & commands, ContextPtr loca
auto pipeline = QueryPipelineBuilder::getPipeline(interpreter->execute());
PullingPipelineExecutor executor(pipeline);
auto sink = std::make_shared<StorageKeeperMapSink>(*this, metadata_snapshot);
auto sink = std::make_shared<StorageKeeperMapSink>(*this, executor.getHeader(), local_context);
Block block;
while (executor.pull(block))
sink->consume(Chunk{block.getColumns(), block.rows()});
sink->onFinish();
sink->finalize<true>(strict);
}
namespace

View File

@ -46,11 +46,13 @@ public:
void truncate(const ASTPtr &, const StorageMetadataPtr &, ContextPtr, TableExclusiveLockHolder &) override;
void drop() override;
NamesAndTypesList getVirtuals() const override;
std::string getName() const override { return "KeeperMap"; }
Names getPrimaryKey() const override { return {primary_key}; }
Chunk getByKeys(const ColumnsWithTypeAndName & keys, PaddedPODArray<UInt8> & null_map, const Names &) const override;
Chunk getBySerializedKeys(std::span<const std::string> keys, PaddedPODArray<UInt8> * null_map) const;
Chunk getBySerializedKeys(std::span<const std::string> keys, PaddedPODArray<UInt8> * null_map, bool with_version) const;
Block getSampleBlock(const Names &) const override;

@@ -923,7 +923,7 @@ StorageMerge::DatabaseTablesIterators StorageMerge::getDatabaseIterators(Context
void StorageMerge::checkAlterIsPossible(const AlterCommands & commands, ContextPtr local_context) const
{
auto name_deps = getDependentViewsByColumn(local_context);
std::optional<NameDependencies> name_deps{};
for (const auto & command : commands)
{
if (command.type != AlterCommand::Type::ADD_COLUMN && command.type != AlterCommand::Type::MODIFY_COLUMN
@@ -934,7 +934,9 @@ void StorageMerge::checkAlterIsPossible(const AlterCommands & commands, ContextP
if (command.type == AlterCommand::Type::DROP_COLUMN && !command.clear)
{
const auto & deps_mv = name_deps[command.column_name];
if (!name_deps)
name_deps = getDependentViewsByColumn(local_context);
const auto & deps_mv = name_deps.value()[command.column_name];
if (!deps_mv.empty())
{
throw Exception(ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,
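
Note on the hunk above: the dependency lookup is now deferred until a `DROP_COLUMN` command is actually seen, and it runs at most once per call. A minimal sketch of that lazy-initialization pattern, assuming hypothetical stand-ins (`NameDependencies` as a plain map, `loadDependencies`, `checkDrops`, and the `load_calls` counter are illustrative names, not the real ClickHouse types):

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the real dependency map type.
using NameDependencies = std::map<std::string, std::vector<std::string>>;

// Counts how often the (assumed expensive) lookup runs.
int load_calls = 0;

// Hypothetical stand-in for getDependentViewsByColumn().
NameDependencies loadDependencies()
{
    ++load_calls;
    return {{"a", {"mv1"}}, {"b", {}}};
}

// Returns true if no dropped column has dependent views.
// The dependency map is loaded lazily, at most once per call.
bool checkDrops(const std::vector<std::string> & dropped_columns)
{
    std::optional<NameDependencies> name_deps;
    for (const auto & column : dropped_columns)
    {
        if (!name_deps)
            name_deps = loadDependencies();
        const auto & deps_mv = name_deps.value()[column];
        if (!deps_mv.empty())
            return false;
    }
    return true;
}
```

With an empty command list the lookup never runs, which is the saving the change appears to be after.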

@@ -37,7 +37,7 @@ void registerStorageNull(StorageFactory & factory)
void StorageNull::checkAlterIsPossible(const AlterCommands & commands, ContextPtr context) const
{
auto name_deps = getDependentViewsByColumn(context);
std::optional<NameDependencies> name_deps{};
for (const auto & command : commands)
{
if (command.type != AlterCommand::Type::ADD_COLUMN
@@ -50,7 +50,9 @@ void StorageNull::checkAlterIsPossible(const AlterCommands & commands, ContextPt
if (command.type == AlterCommand::DROP_COLUMN && !command.clear)
{
const auto & deps_mv = name_deps[command.column_name];
if (!name_deps)
name_deps = getDependentViewsByColumn(context);
const auto & deps_mv = name_deps.value()[command.column_name];
if (!deps_mv.empty())
{
throw Exception(ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,

@@ -1252,7 +1252,7 @@ void StorageS3::updateConfiguration(ContextPtr ctx, StorageS3::Configuration & u
upd.auth_settings.server_side_encryption_customer_key_base64,
std::move(headers),
S3::CredentialsConfiguration{
upd.auth_settings.use_environment_credentials.value_or(ctx->getConfigRef().getBool("s3.use_environment_credentials", false)),
upd.auth_settings.use_environment_credentials.value_or(ctx->getConfigRef().getBool("s3.use_environment_credentials", true)),
upd.auth_settings.use_insecure_imds_request.value_or(ctx->getConfigRef().getBool("s3.use_insecure_imds_request", false)),
upd.auth_settings.expiration_window_seconds.value_or(
ctx->getConfigRef().getUInt64("s3.expiration_window_seconds", S3::DEFAULT_EXPIRATION_WINDOW_SECONDS)),
@@ -1272,7 +1272,7 @@ void StorageS3::processNamedCollectionResult(StorageS3::Configuration & configur
configuration.auth_settings.access_key_id = collection.getOrDefault<String>("access_key_id", "");
configuration.auth_settings.secret_access_key = collection.getOrDefault<String>("secret_access_key", "");
configuration.auth_settings.use_environment_credentials = collection.getOrDefault<UInt64>("use_environment_credentials", 0);
configuration.auth_settings.use_environment_credentials = collection.getOrDefault<UInt64>("use_environment_credentials", 1);
configuration.auth_settings.no_sign_request = collection.getOrDefault<bool>("no_sign_request", false);
configuration.auth_settings.expiration_window_seconds = collection.getOrDefault<UInt64>("expiration_window_seconds", S3::DEFAULT_EXPIRATION_WINDOW_SECONDS);
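
Note on the S3 hunks above: an explicitly provided `use_environment_credentials` value takes precedence, and the fallback default is flipped from disabled to enabled. A minimal sketch of that precedence, assuming a hypothetical helper (`resolveUseEnvironmentCredentials` is an illustrative name, not a ClickHouse function):

```cpp
#include <optional>

// Hypothetical helper: an explicit per-storage setting wins;
// otherwise the server-level default is used.
bool resolveUseEnvironmentCredentials(std::optional<bool> explicit_setting, bool config_default)
{
    return explicit_setting.value_or(config_default);
}
```

Under this reading, a deployment that sets nothing now gets environment credentials enabled, and a config such as `<use_environment_credentials>0</use_environment_credentials>` opts back out.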

@@ -13,11 +13,12 @@ NamesAndTypesList StorageSystemQueryCache::getNamesAndTypes()
{
return {
{"query", std::make_shared<DataTypeString>()},
{"key_hash", std::make_shared<DataTypeUInt64>()},
{"expires_at", std::make_shared<DataTypeDateTime>()},
{"result_size", std::make_shared<DataTypeUInt64>()},
{"stale", std::make_shared<DataTypeUInt8>()},
{"shared", std::make_shared<DataTypeUInt8>()},
{"result_size", std::make_shared<DataTypeUInt64>()}
{"compressed", std::make_shared<DataTypeUInt8>()},
{"expires_at", std::make_shared<DataTypeDateTime>()},
{"key_hash", std::make_shared<DataTypeUInt64>()}
};
}
@@ -44,11 +45,12 @@ void StorageSystemQueryCache::fillData(MutableColumns & res_columns, ContextPtr
continue;
res_columns[0]->insert(key.queryStringFromAst()); /// approximates the original query string
res_columns[1]->insert(key.ast->getTreeHash().first);
res_columns[2]->insert(std::chrono::system_clock::to_time_t(key.expires_at));
res_columns[3]->insert(key.expires_at < std::chrono::system_clock::now());
res_columns[4]->insert(!key.username.has_value());
res_columns[5]->insert(QueryCache::QueryResultWeight()(*query_result));
res_columns[1]->insert(QueryCache::QueryResultWeight()(*query_result));
res_columns[2]->insert(key.expires_at < std::chrono::system_clock::now());
res_columns[3]->insert(!key.username.has_value());
res_columns[4]->insert(key.is_compressed);
res_columns[5]->insert(std::chrono::system_clock::to_time_t(key.expires_at));
res_columns[6]->insert(key.ast->getTreeHash().first);
}
}

@@ -26,7 +26,7 @@ try
auto config = processor.loadConfig().configuration;
String root_path = argv[2];
zkutil::ZooKeeper zk(*config, "zookeeper", nullptr);
zkutil::ZooKeeper zk(*config, zkutil::getZooKeeperConfigName(*config), nullptr);
String temp_path = root_path + "/temp";
String blocks_path = root_path + "/block_numbers";

@@ -29,7 +29,7 @@ try
auto config = processor.loadConfig().configuration;
String zookeeper_path = argv[2];
auto zookeeper = std::make_shared<zkutil::ZooKeeper>(*config, "zookeeper", nullptr);
auto zookeeper = std::make_shared<zkutil::ZooKeeper>(*config, zkutil::getZooKeeperConfigName(*config), nullptr);
std::unordered_map<String, std::set<Int64>> current_inserts;

@@ -144,7 +144,7 @@ def clickhouse_execute_http(
except Exception as ex:
if i == max_http_retries - 1:
raise ex
client.close()
sleep(i + 1)
if res.status != 200:

@@ -0,0 +1,5 @@
<clickhouse>
<s3>
<use_environment_credentials>0</use_environment_credentials>
</s3>
</clickhouse>

@@ -55,6 +55,7 @@ ln -sf $SRC_PATH/config.d/custom_disks_base_path.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/display_name.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/reverse_dns_query_function.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/compressed_marks_and_index.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/disable_s3_env_credentials.xml $DEST_SERVER_PATH/config.d/
# Not supported with fasttest.
if [ "${DEST_SERVER_PATH}" = "/etc/clickhouse-server" ]

@@ -0,0 +1,41 @@
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<priority>3</priority>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>2</priority>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>3</priority>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

Some files were not shown because too many files have changed in this diff.