Merge branch 'master' into local-less-directories

This commit is contained in:
Alexey Milovidov 2024-04-30 04:45:33 +02:00 committed by GitHub
commit 817570c5e4
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
250 changed files with 7151 additions and 820 deletions

View File

@ -8,4 +8,4 @@ jobs:
DebugInfo:
runs-on: ubuntu-latest
steps:
- uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
- uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6

View File

@ -16,7 +16,7 @@ jobs:
data: ${{ steps.runconfig.outputs.CI_DATA }}
steps:
- name: DebugInfo
uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:

View File

@ -22,7 +22,7 @@ jobs:
data: ${{ steps.runconfig.outputs.CI_DATA }}
steps:
- name: DebugInfo
uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:

View File

@ -63,7 +63,7 @@ jobs:
GITHUB_JOB_OVERRIDDEN: ${{inputs.test_name}}
steps:
- name: DebugInfo
uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:

View File

@ -16,7 +16,7 @@
#ci_set_reduced
#ci_set_arm
#ci_set_integration
#ci_set_analyzer
#ci_set_old_analyzer
## To run specified job in CI:
#job_<JOB NAME>

2
contrib/openssl vendored

@ -1 +1 @@
Subproject commit 417f9d2825799769708d99917d0465574c36f79a
Subproject commit f7b8721dfc66abb147f24ca07b9c9d1d64f40f71

View File

@ -87,7 +87,7 @@ function start()
tail -n1000 /var/log/clickhouse-server/clickhouse-server.log
break
fi
timeout 120 service clickhouse-server start
timeout 120 sudo -E -u clickhouse /usr/bin/clickhouse-server --config /etc/clickhouse-server/config.xml --daemon --pid-file /var/run/clickhouse-server/clickhouse-server.pid
sleep 0.5
counter=$((counter + 1))
done

View File

@ -287,9 +287,9 @@ The number of columns in the primary key is not explicitly limited. Depending on
A long primary key will negatively affect the insert performance and memory consumption, but extra columns in the primary key do not affect ClickHouse performance during `SELECT` queries.
You can create a table without a primary key using the `ORDER BY tuple()` syntax. In this case, ClickHouse stores data in the order of inserting. If you want to save data order when inserting data by `INSERT ... SELECT` queries, set [max_insert_threads = 1](/docs/en/operations/settings/settings.md/#settings-max-insert-threads).
You can create a table without a primary key using the `ORDER BY tuple()` syntax. In this case, ClickHouse stores data in the order of inserting. If you want to save data order when inserting data by `INSERT ... SELECT` queries, set [max_insert_threads = 1](/docs/en/operations/settings/settings.md/#max-insert-threads).
To select data in the initial order, use [single-threaded](/docs/en/operations/settings/settings.md/#settings-max_threads) `SELECT` queries.
To select data in the initial order, use [single-threaded](/docs/en/operations/settings/settings.md/#max_threads) `SELECT` queries.
### Choosing a Primary Key that Differs from the Sorting Key {#choosing-a-primary-key-that-differs-from-the-sorting-key}
@ -344,7 +344,7 @@ In the example below, the index can't be used.
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```
To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](/docs/en/operations/settings/settings.md/#settings-force_index_by_date) and [force_primary_key](/docs/en/operations/settings/settings.md/#force-primary-key).
To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](/docs/en/operations/settings/settings.md/#force_index_by_date) and [force_primary_key](/docs/en/operations/settings/settings.md/#force-primary-key).
The key for partitioning by month allows reading only those data blocks which contain dates from the proper range. In this case, the data block may contain data for many dates (up to an entire month). Within a block, data is sorted by primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.
@ -769,6 +769,7 @@ In addition to local block devices, ClickHouse supports these storage types:
- [`web` for read-only from web](#web-storage)
- [`cache` for local caching](/docs/en/operations/storing-data.md/#using-local-cache)
- [`s3_plain` for backups to S3](/docs/en/operations/backup#backuprestore-using-an-s3-disk)
- [`s3_plain_rewritable` for immutable, non-replicated tables in S3](/docs/en/operations/storing-data.md#s3-plain-rewritable-storage)
## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}

View File

@ -113,7 +113,7 @@ You can specify any existing ZooKeeper cluster and the system will use a directo
If ZooKeeper is not set in the config file, you can't create replicated tables, and any existing replicated tables will be read-only.
ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](/docs/en/operations/settings/settings.md/#settings-max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](/docs/en/operations/settings/settings.md/#settings-fallback_to_stale_replicas_for_distributed_queries).
ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](/docs/en/operations/settings/settings.md/#max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](/docs/en/operations/settings/settings.md/#fallback_to_stale_replicas_for_distributed_queries).
For each `INSERT` query, approximately ten entries are added to ZooKeeper through several transactions. (To be more precise, this is for each inserted block of data; an INSERT query contains one block or one block per `max_insert_block_size = 1048576` rows.) This leads to slightly longer latencies for `INSERT` compared to non-replicated tables. But if you follow the recommendations to insert data in batches of no more than one `INSERT` per second, it does not create any problems. The entire ClickHouse cluster used for coordinating one ZooKeeper cluster has a total of several hundred `INSERTs` per second. The throughput on data inserts (the number of rows per second) is just as high as for non-replicated data.

View File

@ -83,7 +83,7 @@ When creating a table, the following settings are applied:
#### join_any_take_last_row
[join_any_take_last_row](/docs/en/operations/settings/settings.md/#settings-join_any_take_last_row)
[join_any_take_last_row](/docs/en/operations/settings/settings.md/#join_any_take_last_row)
#### join_use_nulls
#### persistent

View File

@ -28,7 +28,7 @@ Starting from 24.1 clickhouse version, it is possible to use a new configuration
It requires to specify:
1. `type` equal to `object_storage`
2. `object_storage_type`, equal to one of `s3`, `azure_blob_storage` (or just `azure` from `24.3`), `hdfs`, `local_blob_storage` (or just `local` from `24.3`), `web`.
Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web`.
Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web` and, starting from `24.4`, `plain_rewritable`.
Usage of `plain` metadata type is described in [plain storage section](/docs/en/operations/storing-data.md/#storing-data-on-webserver), `web` metadata type can be used only with `web` object storage type, `local` metadata type stores metadata files locally (each metadata file contains a mapping to files in object storage and some additional meta information about them).
E.g. configuration option
@ -341,6 +341,36 @@ Configuration:
</s3_plain>
```
### Using S3 Plain Rewritable Storage {#s3-plain-rewritable-storage}
A new disk type `s3_plain_rewritable` was introduced in `24.4`.
Similar to the `s3_plain` disk type, it does not require additional storage for metadata files; instead, metadata is stored in S3.
Unlike `s3_plain` disk type, `s3_plain_rewritable` allows executing merges and supports INSERT operations.
[Mutations](/docs/en/sql-reference/statements/alter#mutations) and replication of tables are not supported.
One use case for this disk type is non-replicated `MergeTree` tables. Although the `s3` disk type is suitable for non-replicated
`MergeTree` tables, you may opt for the `s3_plain_rewritable` disk type if you do not require local metadata for the table and are
willing to accept a limited set of operations. This could be useful, for example, for system tables.
Configuration:
``` xml
<s3_plain_rewritable>
<type>s3_plain_rewritable</type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
```
is equal to
``` xml
<s3_plain_rewritable>
<type>object_storage</type>
<object_storage_type>s3</object_storage_type>
<metadata_type>plain_rewritable</metadata_type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
```
### Using Azure Blob Storage {#azure-blob-storage}
`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.

View File

@ -505,9 +505,117 @@ HAVING uniqUpTo(4)(UserID) >= 5
`uniqUpTo(4)(UserID)` calculates the number of unique `UserID` values for each `SearchPhrase`, but it only counts up to 4 unique values. If there are more than 4 unique `UserID` values for a `SearchPhrase`, the function returns 5 (4 + 1). The `HAVING` clause then filters out the `SearchPhrase` values for which the number of unique `UserID` values is less than 5. This will give you a list of search keywords that were used by at least 5 unique users.
## sumMapFiltered(keys_to_keep)(keys, values)
## sumMapFiltered
Same behavior as [sumMap](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap) except that an array of keys is passed as a parameter. This can be especially useful when working with a high cardinality of keys.
This function behaves the same as [sumMap](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap) except that it also accepts an array of keys to filter with as a parameter. This can be especially useful when working with a high cardinality of keys.
**Syntax**
`sumMapFiltered(keys_to_keep)(keys, values)`
**Parameters**
- `keys_to_keep`: [Array](../data-types/array.md) of keys to filter with.
- `keys`: [Array](../data-types/array.md) of keys.
- `values`: [Array](../data-types/array.md) of values.
**Returned Value**
- Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.
**Example**
Query:
```sql
CREATE TABLE sum_map
(
`date` Date,
`timeslot` DateTime,
`statusMap` Nested(status UInt16, requests UInt64)
)
ENGINE = Log
INSERT INTO sum_map VALUES
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10]);
```
```sql
SELECT sumMapFiltered([1, 4, 8])(statusMap.status, statusMap.requests) FROM sum_map;
```
Result:
```response
┌─sumMapFiltered([1, 4, 8])(statusMap.status, statusMap.requests)─┐
1. │ ([1,4,8],[10,20,10]) │
└─────────────────────────────────────────────────────────────────┘
```
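The filtering and summation described above can be sketched outside ClickHouse. The following Python snippet is only an illustration of the semantics, not ClickHouse's implementation; the helper name `sum_map_filtered` and the `rows` layout are assumptions made for this sketch.

```python
from collections import defaultdict


def sum_map_filtered(keys_to_keep, rows):
    """Sum values per key across all rows, keeping only keys in keys_to_keep.

    `rows` is a list of (keys, values) pairs, mirroring the two arrays
    passed to the aggregate function for each row (hypothetical layout).
    """
    keep = set(keys_to_keep)
    totals = defaultdict(int)
    for keys, values in rows:
        for key, value in zip(keys, values):
            if key in keep:
                totals[key] += value
    sorted_keys = sorted(totals)
    return sorted_keys, [totals[k] for k in sorted_keys]


# Same data as the INSERT statement in the example above.
rows = [
    ([1, 2, 3], [10, 10, 10]),
    ([3, 4, 5], [10, 10, 10]),
    ([4, 5, 6], [10, 10, 10]),
    ([6, 7, 8], [10, 10, 10]),
]
print(sum_map_filtered([1, 4, 8], rows))  # ([1, 4, 8], [10, 20, 10])
```

Key `4` appears in two rows, so its total is 20, while keys `1` and `8` each appear once.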
## sumMapFilteredWithOverflow
This function behaves the same as [sumMap](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap) except that it also accepts an array of keys to filter with as a parameter. This can be especially useful when working with a high cardinality of keys. It differs from the [sumMapFiltered](#summapfiltered) function in that it does summation with overflow - i.e. returns the same data type for the summation as the argument data type.
**Syntax**
`sumMapFilteredWithOverflow(keys_to_keep)(keys, values)`
**Parameters**
- `keys_to_keep`: [Array](../data-types/array.md) of keys to filter with.
- `keys`: [Array](../data-types/array.md) of keys.
- `values`: [Array](../data-types/array.md) of values.
**Returned Value**
- Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.
**Example**
In this example we create a table `sum_map`, insert some data into it, and then use both `sumMapFilteredWithOverflow` and `sumMapFiltered` together with the `toTypeName` function to compare the results. `requests` is of type `UInt8` in the created table: `sumMapFiltered` promotes the type of the summed values to `UInt64` to avoid overflow, whereas `sumMapFilteredWithOverflow` keeps the type as `UInt8`, so any sum larger than 255 would overflow (the sums in this small data set still fit).
Query:
```sql
CREATE TABLE sum_map
(
`date` Date,
`timeslot` DateTime,
`statusMap` Nested(status UInt8, requests UInt8)
)
ENGINE = Log
INSERT INTO sum_map VALUES
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10]);
```
```sql
SELECT sumMapFilteredWithOverflow([1, 4, 8])(statusMap.status, statusMap.requests) as summap_overflow, toTypeName(summap_overflow) FROM sum_map;
```
```sql
SELECT sumMapFiltered([1, 4, 8])(statusMap.status, statusMap.requests) as summap, toTypeName(summap) FROM sum_map;
```
Result:
```response
┌─summap_overflow──────┬─toTypeName(summap_overflow)───────┐
1. │ ([1,4,8],[10,20,10]) │ Tuple(Array(UInt8), Array(UInt8)) │
└──────────────────────┴───────────────────────────────────┘
```
```response
┌─summap───────────────┬─toTypeName(summap)─────────────────┐
1. │ ([1,4,8],[10,20,10]) │ Tuple(Array(UInt8), Array(UInt64)) │
└──────────────────────┴────────────────────────────────────┘
```
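To see what keeping the `UInt8` type means for larger sums, here is a minimal Python sketch of wraparound arithmetic. It assumes that "summation with overflow" behaves like unsigned arithmetic modulo 2^8; the input values `200` and `100` are hypothetical and not taken from the example table.

```python
UINT8_MODULUS = 2 ** 8  # a UInt8 holds values 0..255


def sum_with_wraparound(values, modulus=UINT8_MODULUS):
    """Add values while keeping the running total inside the unsigned range."""
    total = 0
    for value in values:
        total = (total + value) % modulus
    return total


print(sum_with_wraparound([10, 10]))    # 20, fits in UInt8, no wraparound
print(sum_with_wraparound([200, 100]))  # 44, because 300 - 256 = 44
print(sum([200, 100]))                  # 300, what a wider result type would hold
```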
## sequenceNextNode

View File

@ -16,7 +16,9 @@ Standard aggregate functions:
- [avg](/docs/en/sql-reference/aggregate-functions/reference/avg.md)
- [any](/docs/en/sql-reference/aggregate-functions/reference/any.md)
- [stddevPop](/docs/en/sql-reference/aggregate-functions/reference/stddevpop.md)
- [stddevPopStable](/docs/en/sql-reference/aggregate-functions/reference/stddevpopstable.md)
- [stddevSamp](/docs/en/sql-reference/aggregate-functions/reference/stddevsamp.md)
- [stddevSampStable](/docs/en/sql-reference/aggregate-functions/reference/stddevsampstable.md)
- [varPop](/docs/en/sql-reference/aggregate-functions/reference/varpop.md)
- [varSamp](/docs/en/sql-reference/aggregate-functions/reference/varsamp.md)
- [corr](./corr.md)
@ -65,6 +67,9 @@ ClickHouse-specific aggregate functions:
- [groupBitmapXor](/docs/en/sql-reference/aggregate-functions/reference/groupbitmapxor.md)
- [sumWithOverflow](/docs/en/sql-reference/aggregate-functions/reference/sumwithoverflow.md)
- [sumMap](/docs/en/sql-reference/aggregate-functions/reference/summap.md)
- [sumMapWithOverflow](/docs/en/sql-reference/aggregate-functions/reference/summapwithoverflow.md)
- [sumMapFiltered](/docs/en/sql-reference/aggregate-functions/parametric-functions.md/#summapfiltered)
- [sumMapFilteredWithOverflow](/docs/en/sql-reference/aggregate-functions/parametric-functions.md/#summapfilteredwithoverflow)
- [minMap](/docs/en/sql-reference/aggregate-functions/reference/minmap.md)
- [maxMap](/docs/en/sql-reference/aggregate-functions/reference/maxmap.md)
- [skewSamp](/docs/en/sql-reference/aggregate-functions/reference/skewsamp.md)

View File

@ -7,10 +7,50 @@ sidebar_position: 30
The result is equal to the square root of [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md).
Alias:
- `STD`
- `STDDEV_POP`
Aliases: `STD`, `STDDEV_POP`.
:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `stddevPopStable` function. It works slower but provides a lower computational error.
:::
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`stddevPopStable`](../reference/stddevpopstable.md) function. It works slower but provides a lower computational error.
:::
**Syntax**
```sql
stddevPop(x)
```
**Parameters**
- `x`: Population of values to find the standard deviation of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Returned value**
Square root of the variance of `x`, that is, its population standard deviation. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
population UInt8,
)
ENGINE = Log;
INSERT INTO test_data VALUES (3),(3),(3),(4),(4),(5),(5),(7),(11),(15);
SELECT
stddevPop(population) AS stddev
FROM test_data;
```
Result:
```response
┌────────────stddev─┐
│ 3.794733192202055 │
└───────────────────┘
```
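The result can be cross-checked by hand. The sketch below applies the textbook definition of population standard deviation in Python; it is not the algorithm ClickHouse uses internally, and the helper name `stddev_pop` is an assumption made for this illustration.

```python
import math


def stddev_pop(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance)


population = [3, 3, 3, 4, 4, 5, 5, 7, 11, 15]
# mean = 6, squared deviations sum to 144, variance = 144 / 10 = 14.4
print(stddev_pop(population))  # approximately 3.7947
```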

View File

@ -0,0 +1,49 @@
---
slug: /en/sql-reference/aggregate-functions/reference/stddevpopstable
sidebar_position: 30
---
# stddevPopStable
The result is equal to the square root of [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md). Unlike [`stddevPop`](../reference/stddevpop.md), this function uses a numerically stable algorithm. It works slower but provides a lower computational error.
**Syntax**
```sql
stddevPopStable(x)
```
**Parameters**
- `x`: Population of values to find the standard deviation of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Returned value**
Square root of the variance of `x`, that is, its population standard deviation. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
population Float64,
)
ENGINE = Log;
INSERT INTO test_data SELECT randUniform(5.5, 10) FROM numbers(1000000);
SELECT
stddevPopStable(population) AS stddev
FROM test_data;
```
Result:
```response
┌─────────────stddev─┐
│ 1.2999977786592576 │
└────────────────────┘
```
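The source does not say which algorithm `stddevPopStable` uses. As an illustration of what "numerically stable" can mean, the Python sketch below uses Welford's online algorithm, a well-known stable method, next to the naive sum-of-squares formula. Both function names and the sample data are assumptions made for this sketch.

```python
import math


def stddev_pop_welford(values):
    """Population standard deviation via Welford's single-pass update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return math.sqrt(m2 / count) if count else float("nan")


def stddev_pop_naive(values):
    """Naive formula E[x^2] - E[x]^2, prone to cancellation for large offsets."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return math.sqrt(max(mean_of_squares - square_of_mean, 0.0))


# Hypothetical data: small spread on top of a large offset.
data = [1e9 + x for x in (4, 7, 13, 16)]
print(stddev_pop_welford(data))  # close to sqrt(22.5)
print(stddev_pop_naive(data))    # may differ noticeably due to rounding
```

The naive formula subtracts two numbers of magnitude 10^18 to recover a variance of 22.5, which is where the precision is lost.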

View File

@ -10,5 +10,46 @@ The result is equal to the square root of [varSamp](../../../sql-reference/aggre
Alias: `STDDEV_SAMP`.
:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `stddevSampStable` function. It works slower but provides a lower computational error.
:::
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`stddevSampStable`](../reference/stddevsampstable.md) function. It works slower but provides a lower computational error.
:::
**Syntax**
```sql
stddevSamp(x)
```
**Parameters**
- `x`: Values for which to find the square root of sample variance. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Returned value**
Square root of sample variance of `x`. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
population UInt8,
)
ENGINE = Log;
INSERT INTO test_data VALUES (3),(3),(3),(4),(4),(5),(5),(7),(11),(15);
SELECT
stddevSamp(population)
FROM test_data;
```
Result:
```response
┌─stddevSamp(population)─┐
│ 4 │
└────────────────────────┘
```
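The value `4` follows from dividing by `n - 1` instead of `n`. A minimal Python sketch of the textbook sample formula, not ClickHouse's internal algorithm (the helper name `stddev_samp` is an assumption):

```python
import math


def stddev_samp(values):
    """Sample standard deviation with Bessel's correction (divide by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Squared deviations from the mean 6 sum to 144; 144 / 9 = 16; sqrt(16) = 4.
print(stddev_samp([3, 3, 3, 4, 4, 5, 5, 7, 11, 15]))  # 4.0
```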

View File

@ -0,0 +1,49 @@
---
slug: /en/sql-reference/aggregate-functions/reference/stddevsampstable
sidebar_position: 31
---
# stddevSampStable
The result is equal to the square root of [varSamp](../../../sql-reference/aggregate-functions/reference/varsamp.md). Unlike [`stddevSamp`](../reference/stddevsamp.md), this function uses a numerically stable algorithm. It works slower but provides a lower computational error.
**Syntax**
```sql
stddevSampStable(x)
```
**Parameters**
- `x`: Values for which to find the square root of sample variance. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Returned value**
Square root of sample variance of `x`. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
population UInt8,
)
ENGINE = Log;
INSERT INTO test_data VALUES (3),(3),(3),(4),(4),(5),(5),(7),(11),(15);
SELECT
stddevSampStable(population)
FROM test_data;
```
Result:
```response
┌─stddevSampStable(population)─┐
│ 4 │
└──────────────────────────────┘
```

View File

@ -7,6 +7,56 @@ sidebar_position: 4
Calculates the sum. Only works for numbers.
**Syntax**
```sql
sum(num)
```
**Parameters**
- `num`: Column of numeric values. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Returned value**
- The sum of the values. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Example**
First we create a table `employees` and insert some fictional employee data into it.
Query:
```sql
CREATE TABLE employees
(
`id` UInt32,
`name` String,
`salary` UInt32
)
ENGINE = Log
```
```sql
INSERT INTO employees VALUES
(87432, 'John Smith', 45680),
(59018, 'Jane Smith', 72350),
(20376, 'Ivan Ivanovich', 58900),
(71245, 'Anastasia Ivanovna', 89210);
```
We query for the total amount of the employee salaries using the `sum` function.
Query:
```sql
SELECT sum(salary) FROM employees;
```
Result:
```response
┌─sum(salary)─┐
1. │ 266140 │
└─────────────┘
```

View File

@ -5,21 +5,35 @@ sidebar_position: 141
# sumMap
Syntax: `sumMap(key <Array>, value <Array>)` [Array type](../../data-types/array.md) or `sumMap(Tuple(key <Array>, value <Array>))` [Tuple type](../../data-types/tuple.md).
Totals a `value` array according to the keys specified in the `key` array. Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys without overflow.
Arguments:
**Syntax**
- `sumMap(key <Array>, value <Array>)` [Array type](../../data-types/array.md).
- `sumMap(Tuple(key <Array>, value <Array>))` [Tuple type](../../data-types/tuple.md).
Alias: `sumMappedArrays`.
Totals the `value` array according to the keys specified in the `key` array.
**Arguments**
Passing tuple of keys and values arrays is a synonym to passing two arrays of keys and values.
- `key`: [Array](../../data-types/array.md) of keys.
- `value`: [Array](../../data-types/array.md) of values.
Passing a tuple of key and value arrays is a synonym to passing separately an array of keys and an array of values.
:::note
The number of elements in `key` and `value` must be the same for each row that is totaled.
:::
Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.
**Returned Value**
Example:
- Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.
**Example**
First we create a table called `sum_map`, and insert some data into it. Arrays of keys and values are stored separately as a column called `statusMap` of [Nested](../../data-types/nested-data-structures/index.md) type, and together as a column called `statusMapTuple` of [tuple](../../data-types/tuple.md) type to illustrate the use of the two different syntaxes of this function described above.
Query:
``` sql
CREATE TABLE sum_map(
@ -31,13 +45,20 @@ CREATE TABLE sum_map(
),
statusMapTuple Tuple(Array(Int32), Array(Int32))
) ENGINE = Log;
```
```sql
INSERT INTO sum_map VALUES
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10], ([1, 2, 3], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10], ([3, 4, 5], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10], ([4, 5, 6], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10], ([6, 7, 8], [10, 10, 10]));
```
Next, we query the table using the `sumMap` function, making use of both array and tuple type syntaxes:
Query:
``` sql
SELECT
timeslot,
sumMap(statusMap.status, statusMap.requests),
@ -46,6 +67,8 @@ FROM sum_map
GROUP BY timeslot
```
Result:
``` text
┌────────────timeslot─┬─sumMap(statusMap.status, statusMap.requests)─┬─sumMap(statusMapTuple)─────────┐
│ 2000-01-01 00:00:00 │ ([1,2,3,4,5],[10,10,20,10,10]) │ ([1,2,3,4,5],[10,10,20,10,10]) │
@ -54,5 +77,6 @@ GROUP BY timeslot
```
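The grouping behaviour can be mimicked in a few lines of Python. This is only a sketch of the semantics under the assumption that each row carries a group value plus parallel key and value arrays; the function name `sum_map_by_group` is hypothetical.

```python
from collections import defaultdict


def sum_map_by_group(rows):
    """For each group, total the values per key and return keys in sorted order."""
    groups = defaultdict(lambda: defaultdict(int))
    for group, keys, values in rows:
        if len(keys) != len(values):
            raise ValueError("key and value arrays must have the same length")
        for key, value in zip(keys, values):
            groups[group][key] += value
    return {
        group: (sorted(totals), [totals[k] for k in sorted(totals)])
        for group, totals in groups.items()
    }


rows = [
    ("2000-01-01 00:00:00", [1, 2, 3], [10, 10, 10]),
    ("2000-01-01 00:00:00", [3, 4, 5], [10, 10, 10]),
    ("2000-01-01 00:01:00", [4, 5, 6], [10, 10, 10]),
    ("2000-01-01 00:01:00", [6, 7, 8], [10, 10, 10]),
]
result = sum_map_by_group(rows)
print(result["2000-01-01 00:00:00"])  # ([1, 2, 3, 4, 5], [10, 10, 20, 10, 10])
```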
**See Also**
- [-Map combinator for Map datatype](../combinators.md#-map)
- [Map combinator for Map datatype](../combinators.md#-map)
- [sumMapWithOverflow](../reference/summapwithoverflow.md)

View File

@ -0,0 +1,92 @@
---
slug: /en/sql-reference/aggregate-functions/reference/summapwithoverflow
sidebar_position: 141
---
# sumMapWithOverflow
Totals a `value` array according to the keys specified in the `key` array. Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.
It differs from the [sumMap](../reference/summap.md) function in that it does summation with overflow - i.e. returns the same data type for the summation as the argument data type.
**Syntax**
- `sumMapWithOverflow(key <Array>, value <Array>)` [Array type](../../data-types/array.md).
- `sumMapWithOverflow(Tuple(key <Array>, value <Array>))` [Tuple type](../../data-types/tuple.md).
**Arguments**
- `key`: [Array](../../data-types/array.md) of keys.
- `value`: [Array](../../data-types/array.md) of values.
Passing a tuple of key and value arrays is a synonym to passing separately an array of keys and an array of values.
:::note
The number of elements in `key` and `value` must be the same for each row that is totaled.
:::
**Returned Value**
- Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.
**Example**
First we create a table called `sum_map`, and insert some data into it. Arrays of keys and values are stored separately as a column called `statusMap` of [Nested](../../data-types/nested-data-structures/index.md) type, and together as a column called `statusMapTuple` of [tuple](../../data-types/tuple.md) type to illustrate the use of the two different syntaxes of this function described above.
Query:
``` sql
CREATE TABLE sum_map(
date Date,
timeslot DateTime,
statusMap Nested(
status UInt8,
requests UInt8
),
statusMapTuple Tuple(Array(Int8), Array(Int8))
) ENGINE = Log;
```
```sql
INSERT INTO sum_map VALUES
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10], ([1, 2, 3], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10], ([3, 4, 5], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10], ([4, 5, 6], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10], ([6, 7, 8], [10, 10, 10]));
```
If we query the table using the `sumMap` and `sumMapWithOverflow` functions with the array type syntax, together with `toTypeName`, we can see that
for the `sumMapWithOverflow` function the data type of the summed values array is the same as the argument type, `UInt8` (i.e. summation was done with overflow). For `sumMap`, the data type of the summed values array has changed from `UInt8` to `UInt64` so that overflow does not occur.
Query:
``` sql
SELECT
timeslot,
toTypeName(sumMap(statusMap.status, statusMap.requests)),
toTypeName(sumMapWithOverflow(statusMap.status, statusMap.requests)),
FROM sum_map
GROUP BY timeslot
```
Equivalently, we could have used the tuple syntax for the same result.
``` sql
SELECT
timeslot,
toTypeName(sumMap(statusMapTuple)),
toTypeName(sumMapWithOverflow(statusMapTuple)),
FROM sum_map
GROUP BY timeslot
```
Result:
``` text
┌────────────timeslot─┬─toTypeName(sumMap(statusMap.status, statusMap.requests))─┬─toTypeName(sumMapWithOverflow(statusMap.status, statusMap.requests))─┐
1. │ 2000-01-01 00:01:00 │ Tuple(Array(UInt8), Array(UInt64)) │ Tuple(Array(UInt8), Array(UInt8)) │
2. │ 2000-01-01 00:00:00 │ Tuple(Array(UInt8), Array(UInt64)) │ Tuple(Array(UInt8), Array(UInt8)) │
└─────────────────────┴──────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────┘
```
**See Also**
- [sumMap](../reference/summap.md)

View File

@ -8,3 +8,64 @@ sidebar_position: 140
Computes the sum of the numbers, using the same data type for the result as for the input parameters. If the sum exceeds the maximum value for this data type, it is calculated with overflow.
Only works for numbers.
**Syntax**
```sql
sumWithOverflow(num)
```
**Parameters**
- `num`: Column of numeric values. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Returned value**
- The sum of the values. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
**Example**
First we create a table `employees` and insert some fictional employee data into it. For this example we declare `monthly_salary` as `UInt16` so that a sum of these values may produce an overflow.
Query:
```sql
CREATE TABLE employees
(
`id` UInt32,
`name` String,
`monthly_salary` UInt16
)
ENGINE = Log
```
We query for the total amount of the employee salaries using the `sum` and `sumWithOverflow` functions and show their types using the `toTypeName` function.
For the `sum` function the resulting type is `UInt64`, big enough to contain the sum, whilst for `sumWithOverflow` the resulting type remains as `UInt16`.
Query:
```sql
SELECT
sum(monthly_salary) AS no_overflow,
sumWithOverflow(monthly_salary) AS overflow,
toTypeName(no_overflow),
toTypeName(overflow),
FROM employees;
```
Result:
```response
┌─no_overflow─┬─overflow─┬─toTypeName(no_overflow)─┬─toTypeName(overflow)─┐
1. │ 118700 │ 53164 │ UInt64 │ UInt16 │
└─────────────┴──────────┴─────────────────────────┴──────────────────────┘
```
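The two numbers in the result are consistent with each other if we assume that `UInt16` overflow wraps around modulo 2^16. A quick arithmetic check in Python:

```python
UINT16_MODULUS = 2 ** 16  # 65536, one more than the largest UInt16 value

no_overflow = 118700                     # value reported by sum()
overflow = no_overflow % UINT16_MODULUS  # what remains after wrapping once

print(overflow)  # 53164, matching the sumWithOverflow() column
```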

View File

@ -31,6 +31,8 @@ This function uses a numerically unstable algorithm. If you need numerical stabi
**Example**
Query:
```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
@ -47,6 +49,8 @@ SELECT
FROM test_data;
```
Result:
```response
3
```
@ -94,6 +98,8 @@ SELECT
FROM test_data;
```
Result:
```response
0.5999999999999999
```

View File

@ -4,48 +4,72 @@ sidebar_label: SVG
title: "Functions for Generating SVG images from Geo data"
---
## Syntax
## Svg
Returns a string of select SVG element tags from Geo data.
**Syntax**
``` sql
SVG(geometry,[style])
Svg(geometry,[style])
```
### Parameters
Aliases: `SVG`, `svg`
- `geometry` — Geo data
- `style` — Optional style name
**Parameters**
### Returned value
- `geometry` — Geo data. [Geo](../../data-types/geo).
- `style` — Optional style name. [String](../../data-types/string).
**Returned value**
- The SVG representation of the geometry:
- SVG circle
- SVG polygon
- SVG path
Type: String
Type: [String](../../data-types/string)
## Examples
**Examples**
**Circle**
Query:
### Circle
```sql
SELECT SVG((0., 0.))
```
Result:
```response
<circle cx="0" cy="0" r="5" style=""/>
```
### Polygon
**Polygon**
Query:
```sql
SELECT SVG([(0., 0.), (10, 0), (10, 10), (0, 10)])
```
Result:
```response
<polygon points="0,0 0,10 10,10 10,0 0,0" style=""/>
```
### Path
**Path**
Query:
```sql
SELECT SVG([[(0., 0.), (10, 0), (10, 10), (0, 10)], [(4., 4.), (5, 4), (5, 5), (4, 5)]])
```
Result:
```response
<g fill-rule="evenodd"><path d="M 0,0 L 0,10 L 10,10 L 10,0 L 0,0M 4,4 L 5,4 L 5,5 L 4,5 L 4,4 z " style=""/></g>
```


@ -1147,13 +1147,13 @@ tryBase58Decode(encoded)
Query:
```sql
SELECT tryBase58Decode('3dc8KtHrwM') as res;
SELECT tryBase58Decode('3dc8KtHrwM') as res, tryBase58Decode('invalid') as res_invalid;
```
```response
┌─res─────┐
│ Encoded │
└─────────┘
┌─res─────┬─res_invalid─
│ Encoded │
└─────────┴─────────────
```
## base64Encode
@ -1187,13 +1187,13 @@ tryBase64Decode(encoded)
Query:
```sql
SELECT tryBase64Decode('RW5jb2RlZA==') as res;
SELECT tryBase64Decode('RW5jb2RlZA==') as res, tryBase64Decode('invalid') as res_invalid;
```
```response
┌─res─────┐
│ Encoded │
└─────────┘
┌─res─────┬─res_invalid─
│ Encoded │
└─────────┴─────────────
```
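The `res_invalid` column illustrates the "try" behaviour: malformed input yields an empty string instead of an exception. The sketch below shows that contract for standard padded Base64 in plain C++. It is not ClickHouse's implementation, and `tryBase64DecodeSketch` is a hypothetical name introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper (assumption, not the ClickHouse function): decodes
// standard padded Base64 and returns an empty string on any malformed input.
std::string tryBase64DecodeSketch(std::string_view encoded)
{
    static constexpr std::string_view alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Padded Base64 always comes in groups of four characters.
    if (encoded.size() % 4 != 0)
        return {};

    std::string decoded;
    for (size_t i = 0; i < encoded.size(); i += 4)
    {
        const bool is_last_group = (i + 4 == encoded.size());
        uint32_t bits = 0;
        int padding = 0;
        for (size_t j = 0; j < 4; ++j)
        {
            const char c = encoded[i + j];
            // '=' is accepted only in the last two positions of the last group.
            if (c == '=' && is_last_group && j >= 2)
            {
                ++padding;
                bits <<= 6;
                continue;
            }
            const size_t value = alphabet.find(c);
            // Reject unknown characters and data that follows padding.
            if (value == std::string_view::npos || padding > 0)
                return {};
            bits = (bits << 6) | static_cast<uint32_t>(value);
        }
        decoded.push_back(static_cast<char>((bits >> 16) & 0xFF));
        if (padding < 2)
            decoded.push_back(static_cast<char>((bits >> 8) & 0xFF));
        if (padding < 1)
            decoded.push_back(static_cast<char>(bits & 0xFF));
    }
    return decoded;
}

// Usage example mirroring the query above.
void checkTryDecode()
{
    assert(tryBase64DecodeSketch("RW5jb2RlZA==") == "Encoded");
    assert(tryBase64DecodeSketch("invalid").empty());
}
```

Returning an empty string keeps the function total, at the cost of making a failed decode indistinguishable from a successfully decoded empty input.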
## endsWith {#endswith}


@ -532,3 +532,15 @@ If there's a refresh in progress for the given view, interrupt and cancel it. Ot
```sql
SYSTEM CANCEL VIEW [db.]name
```
### SYSTEM UNLOAD PRIMARY KEY
Unload the primary keys for the given table or for all tables.
```sql
SYSTEM UNLOAD PRIMARY KEY [db.]name
```
```sql
SYSTEM UNLOAD PRIMARY KEY
```


@ -121,9 +121,12 @@ if (BUILD_STANDALONE_KEEPER)
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/DiskType.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/IObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataOperationsHolder.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromDisk.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataFromDiskTransactionState.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageTransactionState.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/DiskObjectStorageMetadata.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/DiskObjectStorage.cpp
@ -137,6 +140,7 @@ if (BUILD_STANDALONE_KEEPER)
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/S3Capabilities.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/diskSettings.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/DiskS3Utils.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/CommonPathPrefixKeyGenerator.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/ObjectStorageFactory.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFactory.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/RegisterDiskObjectStorage.cpp


@ -677,6 +677,7 @@ void LocalServer::processConfig()
/// NOTE: it is important to apply any overrides before
/// setDefaultProfiles() calls since it will copy current context (i.e.
/// there is separate context for Buffer tables).
adjustSettings();
applySettingsOverridesForLocal(global_context);
applyCmdOptions(global_context);


@ -210,6 +210,7 @@ enum class AccessType
M(SYSTEM_FAILPOINT, "SYSTEM ENABLE FAILPOINT, SYSTEM DISABLE FAILPOINT, SYSTEM WAIT FAILPOINT", GLOBAL, SYSTEM) \
M(SYSTEM_LISTEN, "SYSTEM START LISTEN, SYSTEM STOP LISTEN", GLOBAL, SYSTEM) \
M(SYSTEM_JEMALLOC, "SYSTEM JEMALLOC PURGE, SYSTEM JEMALLOC ENABLE PROFILE, SYSTEM JEMALLOC DISABLE PROFILE, SYSTEM JEMALLOC FLUSH PROFILE", GLOBAL, SYSTEM) \
M(SYSTEM_UNLOAD_PRIMARY_KEY, "SYSTEM UNLOAD PRIMARY KEY", TABLE, SYSTEM) \
M(SYSTEM, "", GROUP, ALL) /* allows to execute SYSTEM {SHUTDOWN|RELOAD CONFIG|...} */ \
\
M(dictGet, "dictHas, dictGetHierarchy, dictIsIn", DICTIONARY, ALL) /* allows to execute functions dictGet(), dictHas(), dictGetHierarchy(), dictIsIn() */\


@ -53,8 +53,9 @@ TEST(AccessRights, Union)
"SHOW ROW POLICIES, SYSTEM MERGES, SYSTEM TTL MERGES, SYSTEM FETCHES, "
"SYSTEM MOVES, SYSTEM PULLING REPLICATION LOG, SYSTEM CLEANUP, SYSTEM VIEWS, SYSTEM SENDS, SYSTEM REPLICATION QUEUES, SYSTEM VIRTUAL PARTS UPDATE, "
"SYSTEM DROP REPLICA, SYSTEM SYNC REPLICA, SYSTEM RESTART REPLICA, "
"SYSTEM RESTORE REPLICA, SYSTEM WAIT LOADING PARTS, SYSTEM SYNC DATABASE REPLICA, SYSTEM FLUSH DISTRIBUTED, dictGet ON db1.*, "
"GRANT TABLE ENGINE ON db1, GRANT SET DEFINER ON db1, GRANT NAMED COLLECTION ADMIN ON db1");
"SYSTEM RESTORE REPLICA, SYSTEM WAIT LOADING PARTS, SYSTEM SYNC DATABASE REPLICA, SYSTEM FLUSH DISTRIBUTED, "
"SYSTEM UNLOAD PRIMARY KEY, dictGet ON db1.*, GRANT TABLE ENGINE ON db1, "
"GRANT SET DEFINER ON db1, GRANT NAMED COLLECTION ADMIN ON db1");
}


@ -1,9 +1,10 @@
#include <Analyzer/Passes/QueryAnalysisPass.h>
#include <boost/algorithm/string.hpp>
#include <Common/checkStackSize.h>
#include <Common/NamePrompter.h>
#include <Common/ProfileEvents.h>
#include <Analyzer/FunctionSecretArgumentsFinderTreeNode.h>
#include <IO/WriteBuffer.h>
#include <IO/WriteHelpers.h>
@ -81,8 +82,8 @@
#include <Analyzer/QueryTreeBuilder.h>
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/Identifier.h>
#include <boost/algorithm/string.hpp>
#include <Analyzer/FunctionSecretArgumentsFinderTreeNode.h>
#include <Analyzer/RecursiveCTE.h>
namespace ProfileEvents
{
@ -740,7 +741,7 @@ struct IdentifierResolveScope
/// Identifier lookup to result
std::unordered_map<IdentifierLookup, IdentifierResolveState, IdentifierLookupHash> identifier_lookup_to_resolve_state;
/// Lambda argument can be expression like constant, column, or it can be function
/// Argument can be expression like constant, column, function or table expression
std::unordered_map<std::string, QueryTreeNodePtr> expression_argument_name_to_node;
/// Alias name to query expression node
@ -1464,7 +1465,8 @@ private:
/// Lambdas that are currently in resolve process
std::unordered_set<IQueryTreeNode *> lambdas_in_resolve_process;
std::unordered_set<std::string_view> cte_in_resolve_process;
/// CTEs that are currently in resolve process
std::unordered_set<std::string_view> ctes_in_resolve_process;
/// Function name to user defined lambda map
std::unordered_map<std::string, QueryTreeNodePtr> function_name_to_user_defined_lambda;
@ -2148,9 +2150,9 @@ void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, Iden
else
{
/** Make unique column names for tuple.
*
* Example: SELECT (SELECT 2 AS x, x)
*/
*
* Example: SELECT (SELECT 2 AS x, x)
*/
makeUniqueColumnNamesInBlock(block);
scalar_block.insert({
@ -3981,6 +3983,9 @@ IdentifierResolveResult QueryAnalyzer::tryResolveIdentifierInParentScopes(const
auto * union_node = resolved_identifier->as<UnionNode>();
bool is_cte = (subquery_node && subquery_node->isCTE()) || (union_node && union_node->isCTE());
bool is_table_from_expression_arguments = lookup_result.resolve_place == IdentifierResolvePlace::EXPRESSION_ARGUMENTS &&
resolved_identifier->getNodeType() == QueryTreeNodeType::TABLE;
bool is_valid_table_expression = is_cte || is_table_from_expression_arguments;
/** From parent scopes we can resolve table identifiers only as CTE.
* Example: SELECT (SELECT 1 FROM a) FROM test_table AS a;
@ -3988,14 +3993,10 @@ IdentifierResolveResult QueryAnalyzer::tryResolveIdentifierInParentScopes(const
* During child scope table identifier resolve a, table node test_table with alias a from parent scope
* is invalid.
*/
if (identifier_lookup.isTableExpressionLookup() && !is_cte)
if (identifier_lookup.isTableExpressionLookup() && !is_valid_table_expression)
continue;
if (is_cte)
{
return lookup_result;
}
else if (resolved_identifier->as<ConstantNode>())
if (is_valid_table_expression || resolved_identifier->as<ConstantNode>())
{
return lookup_result;
}
@ -4071,13 +4072,9 @@ IdentifierResolveResult QueryAnalyzer::tryResolveIdentifier(const IdentifierLook
if (it->second.resolve_result.isResolved() &&
scope.use_identifier_lookup_to_result_cache &&
!scope.non_cached_identifier_lookups_during_expression_resolve.contains(identifier_lookup))
{
if (!it->second.resolve_result.isResolvedFromCTEs() || !cte_in_resolve_process.contains(identifier_lookup.identifier.getFullName()))
{
return it->second.resolve_result;
}
}
!scope.non_cached_identifier_lookups_during_expression_resolve.contains(identifier_lookup) &&
(!it->second.resolve_result.isResolvedFromCTEs() || !ctes_in_resolve_process.contains(identifier_lookup.identifier.getFullName())))
return it->second.resolve_result;
}
else
{
@ -4150,7 +4147,7 @@ IdentifierResolveResult QueryAnalyzer::tryResolveIdentifier(const IdentifierLook
/// To accomplish this behaviour it's not allowed to resolve identifiers to
/// CTE that is being resolved.
if (cte_query_node_it != scope.cte_name_to_query_node.end()
&& !cte_in_resolve_process.contains(full_name))
&& !ctes_in_resolve_process.contains(full_name))
{
resolve_result.resolved_identifier = cte_query_node_it->second;
resolve_result.resolve_place = IdentifierResolvePlace::CTE;
@ -6296,14 +6293,14 @@ ProjectionNames QueryAnalyzer::resolveExpressionNode(QueryTreeNodePtr & node, Id
///
/// In this example argument of function `in` is being resolve here. If CTE `test1` is not forbidden,
/// `test1` is resolved to CTE (not to the table) in `initializeQueryJoinTreeNode` function.
cte_in_resolve_process.insert(cte_name);
ctes_in_resolve_process.insert(cte_name);
if (subquery_node)
resolveQuery(resolved_identifier_node, subquery_scope);
else
resolveUnion(resolved_identifier_node, subquery_scope);
cte_in_resolve_process.erase(cte_name);
ctes_in_resolve_process.erase(cte_name);
}
}
}
@ -7874,7 +7871,7 @@ void QueryAnalyzer::resolveQuery(const QueryTreeNodePtr & query_node, Identifier
auto & query_node_typed = query_node->as<QueryNode &>();
if (query_node_typed.isCTE())
cte_in_resolve_process.insert(query_node_typed.getCTEName());
ctes_in_resolve_process.insert(query_node_typed.getCTEName());
bool is_rollup_or_cube = query_node_typed.isGroupByWithRollup() || query_node_typed.isGroupByWithCube();
@ -7956,7 +7953,6 @@ void QueryAnalyzer::resolveQuery(const QueryTreeNodePtr & query_node, Identifier
auto * union_node = node->as<UnionNode>();
bool subquery_is_cte = (subquery_node && subquery_node->isCTE()) || (union_node && union_node->isCTE());
if (!subquery_is_cte)
continue;
@ -8213,7 +8209,7 @@ void QueryAnalyzer::resolveQuery(const QueryTreeNodePtr & query_node, Identifier
query_node_typed.resolveProjectionColumns(std::move(projection_columns));
if (query_node_typed.isCTE())
cte_in_resolve_process.erase(query_node_typed.getCTEName());
ctes_in_resolve_process.erase(query_node_typed.getCTEName());
}
void QueryAnalyzer::resolveUnion(const QueryTreeNodePtr & union_node, IdentifierResolveScope & scope)
@ -8221,13 +8217,56 @@ void QueryAnalyzer::resolveUnion(const QueryTreeNodePtr & union_node, Identifier
auto & union_node_typed = union_node->as<UnionNode &>();
if (union_node_typed.isCTE())
cte_in_resolve_process.insert(union_node_typed.getCTEName());
ctes_in_resolve_process.insert(union_node_typed.getCTEName());
auto & queries_nodes = union_node_typed.getQueries().getNodes();
for (auto & query_node : queries_nodes)
std::optional<RecursiveCTETable> recursive_cte_table;
TableNodePtr recursive_cte_table_node;
if (union_node_typed.isCTE() && union_node_typed.isRecursiveCTE())
{
auto & non_recursive_query = queries_nodes[0];
bool non_recursive_query_is_query_node = non_recursive_query->getNodeType() == QueryTreeNodeType::QUERY;
auto & non_recursive_query_mutable_context = non_recursive_query_is_query_node ? non_recursive_query->as<QueryNode &>().getMutableContext()
: non_recursive_query->as<UnionNode &>().getMutableContext();
IdentifierResolveScope non_recursive_subquery_scope(non_recursive_query, &scope /*parent_scope*/);
non_recursive_subquery_scope.subquery_depth = scope.subquery_depth + 1;
if (non_recursive_query_is_query_node)
resolveQuery(non_recursive_query, non_recursive_subquery_scope);
else
resolveUnion(non_recursive_query, non_recursive_subquery_scope);
auto temporary_table_columns = non_recursive_query_is_query_node
? non_recursive_query->as<QueryNode &>().getProjectionColumns()
: non_recursive_query->as<UnionNode &>().computeProjectionColumns();
auto temporary_table_holder = std::make_shared<TemporaryTableHolder>(
non_recursive_query_mutable_context,
ColumnsDescription{NamesAndTypesList{temporary_table_columns.begin(), temporary_table_columns.end()}},
ConstraintsDescription{},
nullptr /*query*/,
true /*create_for_global_subquery*/);
auto temporary_table_storage = temporary_table_holder->getTable();
recursive_cte_table_node = std::make_shared<TableNode>(temporary_table_storage, non_recursive_query_mutable_context);
recursive_cte_table_node->setTemporaryTableName(union_node_typed.getCTEName());
recursive_cte_table.emplace(std::move(temporary_table_holder), std::move(temporary_table_storage), std::move(temporary_table_columns));
}
size_t queries_nodes_size = queries_nodes.size();
for (size_t i = recursive_cte_table.has_value(); i < queries_nodes_size; ++i)
{
auto & query_node = queries_nodes[i];
IdentifierResolveScope subquery_scope(query_node, &scope /*parent_scope*/);
if (recursive_cte_table_node)
subquery_scope.expression_argument_name_to_node[union_node_typed.getCTEName()] = recursive_cte_table_node;
auto query_node_type = query_node->getNodeType();
if (query_node_type == QueryTreeNodeType::QUERY)
@ -8247,8 +8286,19 @@ void QueryAnalyzer::resolveUnion(const QueryTreeNodePtr & union_node, Identifier
}
}
if (recursive_cte_table && isStorageUsedInTree(recursive_cte_table->storage, union_node.get()))
{
if (union_node_typed.getUnionMode() != SelectUnionMode::UNION_ALL)
throw Exception(ErrorCodes::UNSUPPORTED_METHOD,
"Recursive CTE subquery {} with {} union mode is unsupported, only UNION ALL union mode is supported",
union_node_typed.formatASTForErrorMessage(),
toString(union_node_typed.getUnionMode()));
union_node_typed.setRecursiveCTETable(std::move(*recursive_cte_table));
}
if (union_node_typed.isCTE())
cte_in_resolve_process.erase(union_node_typed.getCTEName());
ctes_in_resolve_process.erase(union_node_typed.getCTEName());
}
}


@ -10,9 +10,10 @@
#include <Interpreters/Context.h>
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/ConstantNode.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/Utils.h>
namespace DB
{
@ -52,17 +53,24 @@ public:
const auto & second_const_value = second_const_node->getValue();
if (second_const_value.isNull()
|| (lower_name == "sum" && isInt64OrUInt64FieldType(second_const_value.getType()) && second_const_value.get<UInt64>() == 0
&& !function_node->getResultType()->isNullable()))
&& !if_node->getResultType()->isNullable()))
{
/// avg(if(cond, a, null)) -> avgIf(a, cond)
/// avg(if(cond, nullable_a, null)) -> avgIfOrNull(a, cond)
/// avg(if(cond, a, null)) -> avgIf(a::ResultTypeIf, cond)
/// avg(if(cond, nullable_a, null)) -> avgIf(nullable_a, cond)
/// sum(if(cond, a, 0)) -> sumIf(a, cond)
/// sum(if(cond, nullable_a, 0)) **is not** equivalent to sumIfOrNull(cond, nullable_a) as
/// it changes the output when no rows pass the condition (from 0 to NULL)
function_arguments_nodes.resize(2);
function_arguments_nodes[0] = std::move(if_arguments_nodes[1]);
function_arguments_nodes[1] = std::move(if_arguments_nodes[0]);
QueryTreeNodes new_arguments{2};
/// We need to preserve the output type from if()
if (if_arguments_nodes[1]->getResultType()->getName() != if_node->getResultType()->getName())
new_arguments[0] = createCastFunction(std::move(if_arguments_nodes[1]), if_node->getResultType(), getContext());
else
new_arguments[0] = std::move(if_arguments_nodes[1]);
new_arguments[1] = std::move(if_arguments_nodes[0]);
function_arguments_nodes = std::move(new_arguments);
resolveAsAggregateFunctionWithIf(
*function_node, {function_arguments_nodes[0]->getResultType(), function_arguments_nodes[1]->getResultType()});
}
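The comments in this hunk note that `sum(if(cond, nullable_a, 0))` must not be rewritten to an `OrNull` variant, because the result changes from 0 to NULL when no row passes the condition. The standalone C++ sketch below models that difference with `std::optional`. The `Row` struct and the three function names are hypothetical stand-ins for the SQL aggregates, not analyzer code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical row type (assumption): a condition flag and a value.
struct Row
{
    bool cond;
    int64_t a;
};

// Models sum(if(cond, a, 0)): rows failing the condition contribute 0.
int64_t sumOfIf(const std::vector<Row> & rows)
{
    int64_t total = 0;
    for (const Row & row : rows)
        total += row.cond ? row.a : 0;
    return total;
}

// Models sumIf(a, cond): only passing rows are aggregated, an empty sum is 0.
int64_t sumIf(const std::vector<Row> & rows)
{
    int64_t total = 0;
    for (const Row & row : rows)
        if (row.cond)
            total += row.a;
    return total;
}

// Models an OrNull variant: the result is NULL when nothing was aggregated.
std::optional<int64_t> sumIfOrNull(const std::vector<Row> & rows)
{
    std::optional<int64_t> total;
    for (const Row & row : rows)
        if (row.cond)
            total = total.value_or(0) + row.a;
    return total;
}

// Usage example: the variants agree until no row passes the condition.
void checkSumIfVariants()
{
    const std::vector<Row> some_pass = {{true, 3}, {false, 4}, {true, 5}};
    const std::vector<Row> none_pass = {{false, 3}, {false, 4}};
    assert(sumOfIf(some_pass) == 8 && sumIf(some_pass) == 8);
    assert(sumOfIf(none_pass) == 0 && sumIf(none_pass) == 0);
    assert(!sumIfOrNull(none_pass).has_value());
}
```

With `some_pass` all three variants return 8. With `none_pass` the first two return 0 while the `OrNull` model returns no value, which is the output change the rewrite has to avoid.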
@ -72,21 +80,27 @@ public:
const auto & first_const_value = first_const_node->getValue();
if (first_const_value.isNull()
|| (lower_name == "sum" && isInt64OrUInt64FieldType(first_const_value.getType()) && first_const_value.get<UInt64>() == 0
&& !function_node->getResultType()->isNullable()))
&& !if_node->getResultType()->isNullable()))
{
/// avg(if(cond, null, a) -> avgIfOrNullable(a, !cond))
/// avg(if(cond, null, a)) -> avgIf(a::ResultTypeIf, !cond)
/// sum(if(cond, 0, a)) -> sumIf(a, !cond)
/// sum(if(cond, 0, nullable_a)) **is not** sumIf(a, !cond) -> Same as above
QueryTreeNodes new_arguments{2};
if (if_arguments_nodes[2]->getResultType()->getName() != if_node->getResultType()->getName())
new_arguments[0] = createCastFunction(std::move(if_arguments_nodes[2]), if_node->getResultType(), getContext());
else
new_arguments[0] = std::move(if_arguments_nodes[2]);
auto not_function = std::make_shared<FunctionNode>("not");
auto & not_function_arguments = not_function->getArguments().getNodes();
not_function_arguments.push_back(std::move(if_arguments_nodes[0]));
not_function->resolveAsFunction(
FunctionFactory::instance().get("not", getContext())->build(not_function->getArgumentColumns()));
new_arguments[1] = std::move(not_function);
function_arguments_nodes.resize(2);
function_arguments_nodes[0] = std::move(if_arguments_nodes[2]);
function_arguments_nodes[1] = std::move(not_function);
function_arguments_nodes = std::move(new_arguments);
resolveAsAggregateFunctionWithIf(
*function_node, {function_arguments_nodes[0]->getResultType(), function_arguments_nodes[1]->getResultType()});
}
@ -98,13 +112,9 @@ private:
{
auto result_type = function_node.getResultType();
std::string suffix = "If";
if (result_type->isNullable())
suffix = "OrNullIf";
AggregateFunctionProperties properties;
auto aggregate_function = AggregateFunctionFactory::instance().get(
function_node.getFunctionName() + suffix,
function_node.getFunctionName() + "If",
function_node.getNullsAction(),
argument_types,
function_node.getAggregateFunction()->getParameters(),


@ -14,12 +14,14 @@
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ASTWithElement.h>
#include <Parsers/ASTSubquery.h>
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTSelectWithUnionQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Analyzer/Utils.h>
#include <Analyzer/UnionNode.h>
namespace DB
{
@ -107,6 +109,9 @@ void QueryNode::dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state, s
if (is_cte)
buffer << ", is_cte: " << is_cte;
if (is_recursive_with)
buffer << ", is_recursive_with: " << is_recursive_with;
if (is_distinct)
buffer << ", is_distinct: " << is_distinct;
@ -259,6 +264,7 @@ bool QueryNode::isEqualImpl(const IQueryTreeNode & rhs, CompareOptions) const
return is_subquery == rhs_typed.is_subquery &&
is_cte == rhs_typed.is_cte &&
is_recursive_with == rhs_typed.is_recursive_with &&
is_distinct == rhs_typed.is_distinct &&
is_limit_with_ties == rhs_typed.is_limit_with_ties &&
is_group_by_with_totals == rhs_typed.is_group_by_with_totals &&
@ -291,6 +297,7 @@ void QueryNode::updateTreeHashImpl(HashState & state, CompareOptions) const
state.update(projection_column_type_name);
}
state.update(is_recursive_with);
state.update(is_distinct);
state.update(is_limit_with_ties);
state.update(is_group_by_with_totals);
@ -317,19 +324,20 @@ QueryTreeNodePtr QueryNode::cloneImpl() const
{
auto result_query_node = std::make_shared<QueryNode>(context);
result_query_node->is_subquery = is_subquery;
result_query_node->is_cte = is_cte;
result_query_node->is_distinct = is_distinct;
result_query_node->is_limit_with_ties = is_limit_with_ties;
result_query_node->is_group_by_with_totals = is_group_by_with_totals;
result_query_node->is_group_by_with_rollup = is_group_by_with_rollup;
result_query_node->is_group_by_with_cube = is_group_by_with_cube;
result_query_node->is_subquery = is_subquery;
result_query_node->is_cte = is_cte;
result_query_node->is_recursive_with = is_recursive_with;
result_query_node->is_distinct = is_distinct;
result_query_node->is_limit_with_ties = is_limit_with_ties;
result_query_node->is_group_by_with_totals = is_group_by_with_totals;
result_query_node->is_group_by_with_rollup = is_group_by_with_rollup;
result_query_node->is_group_by_with_cube = is_group_by_with_cube;
result_query_node->is_group_by_with_grouping_sets = is_group_by_with_grouping_sets;
result_query_node->is_group_by_all = is_group_by_all;
result_query_node->is_order_by_all = is_order_by_all;
result_query_node->cte_name = cte_name;
result_query_node->projection_columns = projection_columns;
result_query_node->settings_changes = settings_changes;
result_query_node->is_group_by_all = is_group_by_all;
result_query_node->is_order_by_all = is_order_by_all;
result_query_node->cte_name = cte_name;
result_query_node->projection_columns = projection_columns;
result_query_node->settings_changes = settings_changes;
return result_query_node;
}
@ -337,6 +345,7 @@ QueryTreeNodePtr QueryNode::cloneImpl() const
ASTPtr QueryNode::toASTImpl(const ConvertToASTOptions & options) const
{
auto select_query = std::make_shared<ASTSelectQuery>();
select_query->recursive_with = is_recursive_with;
select_query->distinct = is_distinct;
select_query->limit_with_ties = is_limit_with_ties;
select_query->group_by_with_totals = is_group_by_with_totals;
@ -347,7 +356,41 @@ ASTPtr QueryNode::toASTImpl(const ConvertToASTOptions & options) const
select_query->order_by_all = is_order_by_all;
if (hasWith())
select_query->setExpression(ASTSelectQuery::Expression::WITH, getWith().toAST(options));
{
const auto & with = getWith();
auto expression_list_ast = std::make_shared<ASTExpressionList>();
expression_list_ast->children.reserve(with.getNodes().size());
for (const auto & with_node : with)
{
auto with_node_ast = with_node->toAST(options);
expression_list_ast->children.push_back(with_node_ast);
const auto * with_query_node = with_node->as<QueryNode>();
const auto * with_union_node = with_node->as<UnionNode>();
if (!with_query_node && !with_union_node)
continue;
bool is_with_node_cte = with_query_node ? with_query_node->isCTE() : with_union_node->isCTE();
if (!is_with_node_cte)
continue;
const auto & with_node_cte_name = with_query_node ? with_query_node->cte_name : with_union_node->getCTEName();
auto * with_node_ast_subquery = with_node_ast->as<ASTSubquery>();
if (with_node_ast_subquery)
with_node_ast_subquery->cte_name = "";
auto with_element_ast = std::make_shared<ASTWithElement>();
with_element_ast->name = with_node_cte_name;
with_element_ast->subquery = std::move(with_node_ast);
with_element_ast->children.push_back(with_element_ast->subquery);
expression_list_ast->children.back() = std::move(with_element_ast);
}
select_query->setExpression(ASTSelectQuery::Expression::WITH, std::move(expression_list_ast));
}
auto projection_ast = getProjection().toAST(options);
auto & projection_expression_list_ast = projection_ast->as<ASTExpressionList &>();


@ -140,6 +140,18 @@ public:
cte_name = std::move(cte_name_value);
}
/// Returns true if query node has RECURSIVE WITH, false otherwise
bool isRecursiveWith() const
{
return is_recursive_with;
}
/// Set query node RECURSIVE WITH value
void setIsRecursiveWith(bool is_recursive_with_value)
{
is_recursive_with = is_recursive_with_value;
}
/// Returns true if query node has DISTINCT, false otherwise
bool isDistinct() const
{
@ -618,6 +630,7 @@ protected:
private:
bool is_subquery = false;
bool is_cte = false;
bool is_recursive_with = false;
bool is_distinct = false;
bool is_limit_with_ties = false;
bool is_group_by_with_totals = false;


@ -271,6 +271,7 @@ QueryTreeNodePtr QueryTreeBuilder::buildSelectExpression(const ASTPtr & select_q
current_query_tree->setIsSubquery(is_subquery);
current_query_tree->setIsCTE(!cte_name.empty());
current_query_tree->setCTEName(cte_name);
current_query_tree->setIsRecursiveWith(select_query_typed.recursive_with);
current_query_tree->setIsDistinct(select_query_typed.distinct);
current_query_tree->setIsLimitWithTies(select_query_typed.limit_with_ties);
current_query_tree->setIsGroupByWithTotals(select_query_typed.group_by_with_totals);
@ -287,8 +288,22 @@ QueryTreeNodePtr QueryTreeBuilder::buildSelectExpression(const ASTPtr & select_q
auto select_with_list = select_query_typed.with();
if (select_with_list)
{
current_query_tree->getWithNode() = buildExpressionList(select_with_list, current_context);
if (select_query_typed.recursive_with)
{
for (auto & with_node : current_query_tree->getWith().getNodes())
{
auto * with_union_node = with_node->as<UnionNode>();
if (!with_union_node)
continue;
with_union_node->setIsRecursiveCTE(true);
}
}
}
auto select_expression_list = select_query_typed.select();
if (select_expression_list)
current_query_tree->getProjectionNode() = buildExpressionList(select_expression_list, current_context);


@ -0,0 +1,21 @@
#include <Analyzer/RecursiveCTE.h>
#include <Storages/IStorage.h>
namespace DB
{
RecursiveCTETable::RecursiveCTETable(TemporaryTableHolderPtr holder_,
StoragePtr storage_,
NamesAndTypes columns_)
: holder(std::move(holder_))
, storage(std::move(storage_))
, columns(std::move(columns_))
{}
StorageID RecursiveCTETable::getStorageID() const
{
return storage->getStorageID();
}
}


@ -0,0 +1,51 @@
#pragma once
#include <Core/NamesAndTypes.h>
#include <Interpreters/DatabaseCatalog.h>
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/TableNode.h>
namespace DB
{
/** Recursive CTEs allow recursive evaluation of UNION subqueries.
*
* Overview:
* https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-RECURSIVE
*
* Current implementation algorithm:
*
* During query analysis, when we resolve a UNION node that is inside the WITH RECURSIVE section of the parent query, we:
* 1. Resolve the non-recursive subquery first.
* 2. Create a temporary table using the projection columns of the resolved subquery from step 1.
* 3. Create a temporary table expression node using the storage from step 2.
* 4. Create a resolution scope for the recursive subquery. In that scope we add the node from step 3 as an expression argument named after the UNION node's CTE name.
* 5. Resolve the recursive subquery.
* 6. If the resolved UNION node uses the temporary table storage from step 2, we update the UNION node with the recursive CTE table.
*
* During query planning, if the UNION node contains a recursive CTE table, we add ReadFromRecursiveCTEStep to the query plan. That step is responsible for the whole
* recursive CTE query execution.
*
* TODO: Improve locking in ReadFromRecursiveCTEStep.
* TODO: Improve query analysis if query contains aggregates, JOINS, GROUP BY, ORDER BY, LIMIT, OFFSET.
* TODO: Support SEARCH DEPTH FIRST BY, SEARCH BREADTH FIRST BY syntax.
* TODO: Support CYCLE syntax.
* TODO: Support UNION DISTINCT recursive CTE mode.
*/
class RecursiveCTETable
{
public:
RecursiveCTETable(TemporaryTableHolderPtr holder_,
StoragePtr storage_,
NamesAndTypes columns_);
StorageID getStorageID() const;
TemporaryTableHolderPtr holder;
StoragePtr storage;
NamesAndTypes columns;
};
}
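The header comment above describes how the analyzer wires a temporary table into the recursive subquery. The sketch below illustrates only the general UNION ALL evaluation semantics of a recursive CTE, as described in the PostgreSQL overview linked in that comment. It is not the `ReadFromRecursiveCTEStep` implementation, and `Rows`, `evaluateRecursiveCTE`, `countToFive`, and `max_iterations` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Rows = std::vector<int>;

// Evaluates `anchor UNION ALL recursive_step(cte)` by repeatedly feeding the
// rows produced by the previous iteration back into the recursive part.
// `max_iterations` is a hypothetical safety bound, not a ClickHouse setting.
Rows evaluateRecursiveCTE(const Rows & anchor,
                          const std::function<Rows(const Rows &)> & recursive_step,
                          size_t max_iterations = 1000)
{
    Rows result;
    Rows working_table = anchor;
    for (size_t i = 0; i < max_iterations && !working_table.empty(); ++i)
    {
        // UNION ALL: append every row, duplicates included.
        result.insert(result.end(), working_table.begin(), working_table.end());
        // The recursive part sees only the rows from the previous iteration.
        working_table = recursive_step(working_table);
    }
    return result;
}

// Models: WITH RECURSIVE t AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM t WHERE n < 5)
//         SELECT n FROM t
Rows countToFive()
{
    auto step = [](const Rows & previous)
    {
        Rows next;
        for (int n : previous)
            if (n < 5)
                next.push_back(n + 1);
        return next;
    };
    return evaluateRecursiveCTE({1}, step);
}

// Usage example: the recursion stops once the step produces no rows.
void checkCountToFive()
{
    assert(countToFive() == Rows({1, 2, 3, 4, 5}));
}
```

The working table plays the role of the temporary table created in step 2: each iteration replaces its contents, and evaluation ends when an iteration produces no rows.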


@ -33,6 +33,14 @@ TableNode::TableNode(StoragePtr storage_, const ContextPtr & context)
{
}
void TableNode::updateStorage(StoragePtr storage_value, const ContextPtr & context)
{
storage = std::move(storage_value);
storage_id = storage->getStorageID();
storage_lock = storage->lockForShare(context->getInitialQueryId(), context->getSettingsRef().lock_acquire_timeout);
storage_snapshot = storage->getStorageSnapshot(storage->getInMemoryMetadataPtr(), context);
}
void TableNode::dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state, size_t indent) const
{
buffer << std::string(indent, ' ') << "TABLE id: " << format_state.getNodeId(this);


@ -32,6 +32,11 @@ public:
/// Construct table node with storage, context
explicit TableNode(StoragePtr storage_, const ContextPtr & context);
/** Update table node storage.
* After this call storage, storage_id, storage_lock, storage_snapshot will be updated using new storage.
*/
void updateStorage(StoragePtr storage_value, const ContextPtr & context);
/// Get storage
const StoragePtr & getStorage() const
{


@ -9,6 +9,7 @@
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ASTWithElement.h>
#include <Parsers/ASTSubquery.h>
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTSelectWithUnionQuery.h>
@ -20,6 +21,8 @@
#include <DataTypes/getLeastSupertype.h>
#include <Storages/IStorage.h>
#include <Interpreters/Context.h>
#include <Analyzer/QueryNode.h>
@ -49,6 +52,9 @@ UnionNode::UnionNode(ContextMutablePtr context_, SelectUnionMode union_mode_)
NamesAndTypes UnionNode::computeProjectionColumns() const
{
if (recursive_cte_table)
return recursive_cte_table->columns;
std::vector<NamesAndTypes> projections;
NamesAndTypes query_node_projection;
@ -90,6 +96,9 @@ NamesAndTypes UnionNode::computeProjectionColumns() const
void UnionNode::removeUnusedProjectionColumns(const std::unordered_set<std::string> & used_projection_columns)
{
if (recursive_cte_table)
return;
auto projection_columns = computeProjectionColumns();
size_t projection_columns_size = projection_columns.size();
std::unordered_set<size_t> used_projection_column_indexes;
@ -113,6 +122,9 @@ void UnionNode::removeUnusedProjectionColumns(const std::unordered_set<std::stri
void UnionNode::removeUnusedProjectionColumns(const std::unordered_set<size_t> & used_projection_columns_indexes)
{
if (recursive_cte_table)
return;
auto & query_nodes = getQueries().getNodes();
for (auto & query_node : query_nodes)
{
@ -136,6 +148,12 @@ void UnionNode::dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state, s
if (is_cte)
buffer << ", is_cte: " << is_cte;
if (is_recursive_cte)
buffer << ", is_recursive_cte: " << is_recursive_cte;
if (recursive_cte_table)
buffer << ", recursive_cte_table: " << recursive_cte_table->storage->getStorageID().getNameForLogs();
if (!cte_name.empty())
buffer << ", cte_name: " << cte_name;
@ -149,14 +167,28 @@ bool UnionNode::isEqualImpl(const IQueryTreeNode & rhs, CompareOptions) const
{
const auto & rhs_typed = assert_cast<const UnionNode &>(rhs);
return is_subquery == rhs_typed.is_subquery && is_cte == rhs_typed.is_cte && cte_name == rhs_typed.cte_name &&
union_mode == rhs_typed.union_mode;
if (recursive_cte_table && rhs_typed.recursive_cte_table &&
recursive_cte_table->getStorageID() != rhs_typed.recursive_cte_table->getStorageID())
return false;
else if ((recursive_cte_table && !rhs_typed.recursive_cte_table) || (!recursive_cte_table && rhs_typed.recursive_cte_table))
return false;
return is_subquery == rhs_typed.is_subquery && is_cte == rhs_typed.is_cte && is_recursive_cte == rhs_typed.is_recursive_cte
&& cte_name == rhs_typed.cte_name && union_mode == rhs_typed.union_mode;
}
void UnionNode::updateTreeHashImpl(HashState & state, CompareOptions) const
{
state.update(is_subquery);
state.update(is_cte);
state.update(is_recursive_cte);
if (recursive_cte_table)
{
auto full_name = recursive_cte_table->getStorageID().getFullNameNotQuoted();
state.update(full_name.size());
state.update(full_name);
}
state.update(cte_name.size());
state.update(cte_name);
@ -170,6 +202,8 @@ QueryTreeNodePtr UnionNode::cloneImpl() const
result_union_node->is_subquery = is_subquery;
result_union_node->is_cte = is_cte;
result_union_node->is_recursive_cte = is_recursive_cte;
result_union_node->recursive_cte_table = recursive_cte_table;
result_union_node->cte_name = cte_name;
return result_union_node;
@ -183,14 +217,64 @@ ASTPtr UnionNode::toASTImpl(const ConvertToASTOptions & options) const
select_with_union_query->children.push_back(getQueriesNode()->toAST(options));
select_with_union_query->list_of_selects = select_with_union_query->children.back();
if (is_subquery)
ASTPtr result_query = std::move(select_with_union_query);
bool set_subquery_cte_name = true;
if (recursive_cte_table)
{
auto subquery = std::make_shared<ASTSubquery>(std::move(select_with_union_query));
subquery->cte_name = cte_name;
return subquery;
auto recursive_select_query = std::make_shared<ASTSelectQuery>();
recursive_select_query->recursive_with = true;
auto with_element_ast = std::make_shared<ASTWithElement>();
with_element_ast->name = cte_name;
with_element_ast->subquery = std::make_shared<ASTSubquery>(std::move(result_query));
with_element_ast->children.push_back(with_element_ast->subquery);
auto with_expression_list_ast = std::make_shared<ASTExpressionList>();
with_expression_list_ast->children.push_back(std::move(with_element_ast));
recursive_select_query->setExpression(ASTSelectQuery::Expression::WITH, std::move(with_expression_list_ast));
auto select_expression_list_ast = std::make_shared<ASTExpressionList>();
select_expression_list_ast->children.reserve(recursive_cte_table->columns.size());
for (const auto & recursive_cte_table_column : recursive_cte_table->columns)
select_expression_list_ast->children.push_back(std::make_shared<ASTIdentifier>(recursive_cte_table_column.name));
recursive_select_query->setExpression(ASTSelectQuery::Expression::SELECT, std::move(select_expression_list_ast));
auto table_expression_ast = std::make_shared<ASTTableExpression>();
table_expression_ast->children.push_back(std::make_shared<ASTTableIdentifier>(cte_name));
table_expression_ast->database_and_table_name = table_expression_ast->children.back();
auto tables_in_select_query_element_ast = std::make_shared<ASTTablesInSelectQueryElement>();
tables_in_select_query_element_ast->children.push_back(std::move(table_expression_ast));
tables_in_select_query_element_ast->table_expression = tables_in_select_query_element_ast->children.back();
ASTPtr tables_in_select_query_ast = std::make_shared<ASTTablesInSelectQuery>();
tables_in_select_query_ast->children.push_back(std::move(tables_in_select_query_element_ast));
recursive_select_query->setExpression(ASTSelectQuery::Expression::TABLES, std::move(tables_in_select_query_ast));
auto recursive_select_with_union_query = std::make_shared<ASTSelectWithUnionQuery>();
auto recursive_select_with_union_query_list_of_selects = std::make_shared<ASTExpressionList>();
recursive_select_with_union_query_list_of_selects->children.push_back(std::move(recursive_select_query));
recursive_select_with_union_query->children.push_back(std::move(recursive_select_with_union_query_list_of_selects));
recursive_select_with_union_query->list_of_selects = recursive_select_with_union_query->children.back();
result_query = std::move(recursive_select_with_union_query);
set_subquery_cte_name = false;
}
return select_with_union_query;
if (is_subquery)
{
auto subquery = std::make_shared<ASTSubquery>(std::move(result_query));
if (set_subquery_cte_name)
subquery->cte_name = cte_name;
result_query = std::move(subquery);
}
return result_query;
}
}
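
Note on `updateTreeHashImpl` above: the hunk feeds `full_name.size()` before `full_name`, matching the existing `cte_name.size()` / `cte_name` pair. One plausible reason (an assumption, the commit does not state it) is that two adjacent variable-length strings would otherwise be consumed as one undelimited byte run. A minimal sketch, where `ByteStream` is a hypothetical stand-in that records the bytes a hash state would consume:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

/// Hypothetical stand-in for a hash state: it only records the bytes it is given,
/// so two call sequences can be compared byte for byte.
struct ByteStream
{
    std::string bytes;

    void update(std::uint64_t value)
    {
        char raw[sizeof(value)];
        std::memcpy(raw, &value, sizeof(value));
        bytes.append(raw, sizeof(value));
    }

    void update(const std::string & value) { bytes += value; }
};

/// Feeds two strings back to back, with no delimiter between them.
std::string feedPlain(const std::string & a, const std::string & b)
{
    ByteStream state;
    state.update(a);
    state.update(b);
    return state.bytes;
}

/// Feeds each string preceded by its size, as the hunk does for full_name and cte_name.
std::string feedLengthPrefixed(const std::string & a, const std::string & b)
{
    ByteStream state;
    state.update(static_cast<std::uint64_t>(a.size()));
    state.update(a);
    state.update(static_cast<std::uint64_t>(b.size()));
    state.update(b);
    return state.bytes;
}
```

Without the size prefix, the pairs ("ab", "c") and ("a", "bc") produce the same byte stream and would therefore hash identically. With the prefix, the streams already differ in the first size field.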

@ -9,6 +9,7 @@
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/ListNode.h>
#include <Analyzer/TableExpressionModifiers.h>
#include <Analyzer/RecursiveCTE.h>
#include <Interpreters/Context_fwd.h>
@ -84,6 +85,42 @@ public:
is_cte = is_cte_value;
}
/// Returns true if union node CTE is specified in WITH RECURSIVE, false otherwise
bool isRecursiveCTE() const
{
return is_recursive_cte;
}
/// Set union node is recursive CTE value
void setIsRecursiveCTE(bool is_recursive_cte_value)
{
is_recursive_cte = is_recursive_cte_value;
}
/// Returns true if union node has recursive CTE table, false otherwise
bool hasRecursiveCTETable() const
{
return recursive_cte_table.has_value();
}
/// Returns optional recursive CTE table
const std::optional<RecursiveCTETable> & getRecursiveCTETable() const
{
return recursive_cte_table;
}
/// Returns optional recursive CTE table
std::optional<RecursiveCTETable> & getRecursiveCTETable()
{
return recursive_cte_table;
}
/// Set union node recursive CTE table value
void setRecursiveCTETable(RecursiveCTETable recursive_cte_table_value)
{
recursive_cte_table.emplace(std::move(recursive_cte_table_value));
}
/// Get union node CTE name
const std::string & getCTEName() const
{
@ -154,6 +191,8 @@ protected:
private:
bool is_subquery = false;
bool is_cte = false;
bool is_recursive_cte = false;
std::optional<RecursiveCTETable> recursive_cte_table;
std::string cte_name;
ContextMutablePtr context;
SelectUnionMode union_mode;

@ -5,6 +5,7 @@
#include <Parsers/ASTSubquery.h>
#include <Parsers/ASTFunction.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeArray.h>
@ -15,6 +16,8 @@
#include <Functions/FunctionHelpers.h>
#include <Functions/FunctionFactory.h>
#include <Storages/IStorage.h>
#include <Interpreters/Context.h>
#include <Analyzer/InDepthQueryTreeVisitor.h>
@ -61,6 +64,36 @@ bool isNodePartOfTree(const IQueryTreeNode * node, const IQueryTreeNode * root)
return false;
}
bool isStorageUsedInTree(const StoragePtr & storage, const IQueryTreeNode * root)
{
std::vector<const IQueryTreeNode *> nodes_to_process;
nodes_to_process.push_back(root);
while (!nodes_to_process.empty())
{
const auto * subtree_node = nodes_to_process.back();
nodes_to_process.pop_back();
const auto * table_node = subtree_node->as<TableNode>();
const auto * table_function_node = subtree_node->as<TableFunctionNode>();
if (table_node || table_function_node)
{
const auto & table_storage = table_node ? table_node->getStorage() : table_function_node->getStorage();
if (table_storage->getStorageID() == storage->getStorageID())
return true;
}
for (const auto & child : subtree_node->getChildren())
{
if (child)
nodes_to_process.push_back(child.get());
}
}
return false;
}
bool isNameOfInFunction(const std::string & function_name)
{
bool is_special_function_in = function_name == "in" ||
@ -808,26 +841,87 @@ QueryTreeNodePtr getExpressionSource(const QueryTreeNodePtr & node)
return source;
}
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(QueryTreeNodePtr table_node, const ContextPtr & context)
/** There are no limits on the maximum size of the result for the subquery.
* Since the result of the query is not the result of the entire query.
*/
void updateContextForSubqueryExecution(ContextMutablePtr & mutable_context)
{
/** The subquery in the IN / JOIN section does not have any restrictions on the maximum size of the result.
* Because the result of this query is not the result of the entire query.
* Constraints work instead
* max_rows_in_set, max_bytes_in_set, set_overflow_mode,
* max_rows_in_join, max_bytes_in_join, join_overflow_mode,
* which are checked separately (in the Set, Join objects).
*/
Settings subquery_settings = mutable_context->getSettings();
subquery_settings.max_result_rows = 0;
subquery_settings.max_result_bytes = 0;
/// The calculation of extremes does not make sense and is not necessary (if you do it, then the extremes of the subquery can be taken for whole query).
subquery_settings.extremes = false;
mutable_context->setSettings(subquery_settings);
}
QueryTreeNodePtr buildQueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
ContextMutablePtr & context)
{
auto projection_columns = columns;
QueryTreeNodes subquery_projection_nodes;
subquery_projection_nodes.reserve(projection_columns.size());
for (const auto & column : projection_columns)
subquery_projection_nodes.push_back(std::make_shared<ColumnNode>(column, table_expression));
if (subquery_projection_nodes.empty())
{
auto constant_data_type = std::make_shared<DataTypeUInt64>();
subquery_projection_nodes.push_back(std::make_shared<ConstantNode>(1UL, constant_data_type));
projection_columns.push_back({"1", std::move(constant_data_type)});
}
updateContextForSubqueryExecution(context);
auto query_node = std::make_shared<QueryNode>(std::move(context));
query_node->getProjection().getNodes() = std::move(subquery_projection_nodes);
query_node->resolveProjectionColumns(projection_columns);
query_node->getJoinTree() = table_expression;
return query_node;
}
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
ContextMutablePtr & context)
{
auto result = buildQueryToReadColumnsFromTableExpression(columns, table_expression, context);
result->as<QueryNode &>().setIsSubquery(true);
return result;
}
QueryTreeNodePtr buildQueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
const ContextPtr & context)
{
auto context_copy = Context::createCopy(context);
return buildQueryToReadColumnsFromTableExpression(columns, table_expression, context_copy);
}
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
const ContextPtr & context)
{
auto context_copy = Context::createCopy(context);
return buildSubqueryToReadColumnsFromTableExpression(columns, table_expression, context_copy);
}
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(const QueryTreeNodePtr & table_node, const ContextPtr & context)
{
const auto & storage_snapshot = table_node->as<TableNode>()->getStorageSnapshot();
auto columns_to_select = storage_snapshot->getColumns(GetColumnsOptions(GetColumnsOptions::Ordinary));
size_t columns_to_select_size = columns_to_select.size();
auto column_nodes_to_select = std::make_shared<ListNode>();
column_nodes_to_select->getNodes().reserve(columns_to_select_size);
NamesAndTypes projection_columns;
projection_columns.reserve(columns_to_select_size);
for (auto & column : columns_to_select)
{
column_nodes_to_select->getNodes().emplace_back(std::make_shared<ColumnNode>(column, table_node));
projection_columns.emplace_back(column.name, column.type);
}
auto subquery_for_table = std::make_shared<QueryNode>(Context::createCopy(context));
subquery_for_table->setIsSubquery(true);
subquery_for_table->getProjectionNode() = std::move(column_nodes_to_select);
subquery_for_table->getJoinTree() = std::move(table_node);
subquery_for_table->resolveProjectionColumns(std::move(projection_columns));
return subquery_for_table;
auto columns_to_select_list = storage_snapshot->getColumns(GetColumnsOptions(GetColumnsOptions::Ordinary));
NamesAndTypes columns_to_select(columns_to_select_list.begin(), columns_to_select_list.end());
return buildSubqueryToReadColumnsFromTableExpression(columns_to_select, table_node, context);
}
}
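
Note on `isStorageUsedInTree` above: it walks the tree with an explicit stack rather than recursion, skipping null children. The same traversal can be sketched in isolation as follows. `Node` and `isStorageUsed` are hypothetical simplifications for illustration (a string stands in for the storage ID) and are not part of the ClickHouse API:

```cpp
#include <memory>
#include <string>
#include <vector>

/// Hypothetical, simplified stand-in for a query tree node.
struct Node
{
    std::string storage_id;                      /// Empty for nodes that are not tables.
    std::vector<std::shared_ptr<Node>> children; /// May contain null entries.
};

/// Iterative depth-first search: returns true if any node in the tree rooted at `root`
/// refers to the storage identified by `storage_id`.
bool isStorageUsed(const std::string & storage_id, const Node * root)
{
    std::vector<const Node *> nodes_to_process{root};

    while (!nodes_to_process.empty())
    {
        const Node * node = nodes_to_process.back();
        nodes_to_process.pop_back();

        if (!node->storage_id.empty() && node->storage_id == storage_id)
            return true;

        for (const auto & child : node->children)
        {
            if (child)
                nodes_to_process.push_back(child.get());
        }
    }

    return false;
}
```

The explicit stack keeps the traversal depth independent of the call stack, which can matter for deeply nested query trees.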

@ -1,9 +1,13 @@
#pragma once
#include <Analyzer/IQueryTreeNode.h>
#include <Core/NamesAndTypes.h>
#include <Storages/IStorage_fwd.h>
#include <Interpreters/Context_fwd.h>
#include <Analyzer/IQueryTreeNode.h>
namespace DB
{
@ -12,6 +16,9 @@ class FunctionNode;
/// Returns true if node part of root tree, false otherwise
bool isNodePartOfTree(const IQueryTreeNode * node, const IQueryTreeNode * root);
/// Returns true if storage is used in tree, false otherwise
bool isStorageUsedInTree(const StoragePtr & storage, const IQueryTreeNode * root);
/// Returns true if function name is name of IN function or its variations, false otherwise
bool isNameOfInFunction(const std::string & function_name);
@ -108,7 +115,41 @@ QueryTreeNodePtr createCastFunction(QueryTreeNodePtr node, DataTypePtr result_ty
/// Checks that node has only one source and returns it
QueryTreeNodePtr getExpressionSource(const QueryTreeNodePtr & node);
/// Build subquery which we execute for `IN table` function.
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(QueryTreeNodePtr table_node, const ContextPtr & context);
/// Update mutable context for subquery execution
void updateContextForSubqueryExecution(ContextMutablePtr & mutable_context);
/** Build query to read specified columns from table expression.
* Specified mutable context will be used as query context.
*/
QueryTreeNodePtr buildQueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
ContextMutablePtr & context);
/** Build subquery to read specified columns from table expression.
* Specified mutable context will be used as query context.
*/
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
ContextMutablePtr & context);
/** Build query to read specified columns from table expression.
* Specified context will be copied and used as query context.
*/
QueryTreeNodePtr buildQueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
const ContextPtr & context);
/** Build subquery to read specified columns from table expression.
* Specified context will be copied and used as query context.
*/
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(const NamesAndTypes & columns,
const QueryTreeNodePtr & table_expression,
const ContextPtr & context);
/** Build subquery to read all columns from table expression.
* Specified context will be copied and used as query context.
*/
QueryTreeNodePtr buildSubqueryToReadColumnsFromTableExpression(const QueryTreeNodePtr & table_node, const ContextPtr & context);
}

@ -125,7 +125,7 @@ void highlight(const String & query, std::vector<replxx::Replxx::Color> & colors
const char * begin = query.data();
const char * end = begin + query.size();
Tokens tokens(begin, end, 1000, true);
Tokens tokens(begin, end, 10000, true);
IParser::Pos token_iterator(tokens, static_cast<uint32_t>(1000), static_cast<uint32_t>(10000));
Expected expected;
expected.enable_highlighting = true;

@ -39,6 +39,7 @@ static struct InitFiu
REGULAR(replicated_merge_tree_commit_zk_fail_when_recovering_from_hw_fault) \
REGULAR(use_delayed_remote_source) \
REGULAR(cluster_discovery_faults) \
REGULAR(replicated_sends_failpoint) \
ONCE(smt_commit_merge_mutate_zk_fail_after_op) \
ONCE(smt_commit_merge_mutate_zk_fail_before_op) \
ONCE(smt_commit_write_zk_fail_after_op) \

@ -14,10 +14,7 @@ public:
, re_gen(key_template)
{
}
DB::ObjectStorageKey generate(const String &) const override
{
return DB::ObjectStorageKey::createAsAbsolute(re_gen.generate());
}
DB::ObjectStorageKey generate(const String &, bool) const override { return DB::ObjectStorageKey::createAsAbsolute(re_gen.generate()); }
private:
String key_template;
@ -32,7 +29,7 @@ public:
: key_prefix(std::move(key_prefix_))
{}
DB::ObjectStorageKey generate(const String &) const override
DB::ObjectStorageKey generate(const String &, bool) const override
{
/// Path to store the new S3 object.
@ -63,7 +60,7 @@ public:
: key_prefix(std::move(key_prefix_))
{}
DB::ObjectStorageKey generate(const String & path) const override
DB::ObjectStorageKey generate(const String & path, bool) const override
{
return DB::ObjectStorageKey::createAsRelative(key_prefix, path);
}

@ -1,7 +1,7 @@
#pragma once
#include "ObjectStorageKey.h"
#include <memory>
#include "ObjectStorageKey.h"
namespace DB
{
@ -9,8 +9,9 @@ namespace DB
class IObjectStorageKeysGenerator
{
public:
virtual ObjectStorageKey generate(const String & path) const = 0;
virtual ~IObjectStorageKeysGenerator() = default;
virtual ObjectStorageKey generate(const String & path, bool is_directory) const = 0;
};
using ObjectStorageKeysGeneratorPtr = std::shared_ptr<IObjectStorageKeysGenerator>;

@ -67,6 +67,9 @@ static constexpr auto DBMS_DEFAULT_MAX_PARSER_DEPTH = 1000;
/// Default limit on the amount of backtracking of recursive descend parser.
static constexpr auto DBMS_DEFAULT_MAX_PARSER_BACKTRACKS = 1000000;
/// Default limit on recursive CTE evaluation depth.
static constexpr auto DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH = 1000;
/// Default limit on query size.
static constexpr auto DBMS_DEFAULT_MAX_QUERY_SIZE = 262144;

@ -50,6 +50,7 @@ class IColumn;
M(MaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
M(Bool, use_concurrency_control, true, "Respect the server's concurrency control (see the `concurrent_threads_soft_limit_num` and `concurrent_threads_soft_limit_ratio_to_cores` global server settings). If disabled, it allows using a larger number of threads even if the server is overloaded (not recommended for normal usage, and needed mostly for tests).", 0) \
M(MaxThreads, max_download_threads, 4, "The maximum number of threads to download data (e.g. for URL engine).", 0) \
M(MaxThreads, max_parsing_threads, 0, "The maximum number of threads to parse data in input formats that support parallel parsing. By default, it is determined automatically", 0) \
M(UInt64, max_download_buffer_size, 10*1024*1024, "The maximal size of buffer for parallel downloading (e.g. for URL engine) per each thread.", 0) \
M(UInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
M(UInt64, max_read_buffer_size_local_fs, 128*1024, "The maximum size of the buffer to read from local filesystem. If set to 0 then max_read_buffer_size will be used.", 0) \
@ -622,6 +623,7 @@ class IColumn;
M(Bool, validate_polygons, true, "Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result.", 0) \
M(UInt64, max_parser_depth, DBMS_DEFAULT_MAX_PARSER_DEPTH, "Maximum parser depth (recursion depth of recursive descend parser).", 0) \
M(UInt64, max_parser_backtracks, DBMS_DEFAULT_MAX_PARSER_BACKTRACKS, "Maximum parser backtracking (how many times it tries different alternatives in the recursive descend parsing process).", 0) \
M(UInt64, max_recursive_cte_evaluation_depth, DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, "Maximum limit on recursive CTE evaluation depth", 0) \
M(Bool, allow_settings_after_format_in_insert, false, "Allow SETTINGS after FORMAT, but note, that this is not always safe (note: this is a compatibility setting).", 0) \
M(Seconds, periodic_live_view_refresh, 60, "Interval after which periodically refreshed live view is forced to refresh.", 0) \
M(Bool, transform_null_in, false, "If enabled, NULL values will be matched with 'IN' operator as if they are considered equal.", 0) \
@ -738,6 +740,7 @@ class IColumn;
M(Bool, query_plan_split_filter, true, "Allow to split filters in the query plan", 0) \
M(Bool, query_plan_merge_expressions, true, "Allow to merge expressions in the query plan", 0) \
M(Bool, query_plan_filter_push_down, true, "Allow to push down filter by predicate query plan step", 0) \
M(Bool, query_plan_convert_outer_join_to_inner_join, true, "Allow to convert OUTER JOIN to INNER JOIN if filter after JOIN always filters default values", 0) \
M(Bool, query_plan_optimize_prewhere, true, "Allow to push down filter to PREWHERE expression for supported storages", 0) \
M(Bool, query_plan_execute_functions_after_sorting, true, "Allow to re-order functions after sorting", 0) \
M(Bool, query_plan_reuse_storage_ordering_for_window_functions, true, "Allow to use the storage sorting for window functions", 0) \
@ -1119,7 +1122,7 @@ class IColumn;
M(ParquetVersion, output_format_parquet_version, "2.latest", "Parquet format version for output format. Supported versions: 1.0, 2.4, 2.6 and 2.latest (default)", 0) \
M(ParquetCompression, output_format_parquet_compression_method, "zstd", "Compression method for Parquet output format. Supported codecs: snappy, lz4, brotli, zstd, gzip, none (uncompressed)", 0) \
M(Bool, output_format_parquet_compliant_nested_types, true, "In parquet file schema, use name 'element' instead of 'item' for list elements. This is a historical artifact of Arrow library implementation. Generally increases compatibility, except perhaps with some old versions of Arrow.", 0) \
M(Bool, output_format_parquet_use_custom_encoder, false, "Use a faster Parquet encoder implementation.", 0) \
M(Bool, output_format_parquet_use_custom_encoder, true, "Use a faster Parquet encoder implementation.", 0) \
M(Bool, output_format_parquet_parallel_encoding, true, "Do Parquet encoding in multiple threads. Requires output_format_parquet_use_custom_encoder.", 0) \
M(UInt64, output_format_parquet_data_page_size, 1024 * 1024, "Target page size in bytes, before compression.", 0) \
M(UInt64, output_format_parquet_batch_size, 1024, "Check page size every this many rows. Consider decreasing if you have columns with average values size above a few KBs.", 0) \

@ -86,6 +86,7 @@ namespace SettingsChangesHistory
static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> settings_changes_history =
{
{"24.4", {{"input_format_json_throw_on_bad_escape_sequence", true, true, "Allow to save JSON strings with bad escape sequences"},
{"max_parsing_threads", 0, 0, "Add a separate setting to control number of threads in parallel parsing from files"},
{"ignore_drop_queries_probability", 0, 0, "Allow to ignore drop queries in server with specified probability for testing purposes"},
{"lightweight_deletes_sync", 2, 2, "The same as 'mutation_sync', but controls only execution of lightweight deletes"},
{"query_cache_system_table_handling", "save", "throw", "The query cache no longer caches results of queries against system tables"},
@ -95,6 +96,8 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"allow_experimental_database_replicated", false, true, "Database engine Replicated is now in Beta stage"},
{"temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds", (10 * 60 * 1000), (10 * 60 * 1000), "Wait time to lock cache for space reservation in temporary data in filesystem cache"},
{"azure_allow_parallel_part_upload", "true", "true", "Use multiple threads for azure multipart upload."},
{"max_recursive_cte_evaluation_depth", DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, "Maximum limit on recursive CTE evaluation depth"},
{"query_plan_convert_outer_join_to_inner_join", false, true, "Allow to convert OUTER JOIN to INNER JOIN if filter after JOIN always filters default values"},
}},
{"24.3", {{"s3_connect_timeout_ms", 1000, 1000, "Introduce new dedicated setting for s3 connection timeout"},
{"allow_experimental_shared_merge_tree", false, true, "The setting is obsolete"},
@ -135,6 +138,7 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"azure_max_upload_part_size", 5ull*1024*1024*1024, 5ull*1024*1024*1024, "The maximum size of part to upload during multipart upload to Azure blob storage."},
{"azure_upload_part_size_multiply_factor", 2, 2, "Multiply azure_min_upload_part_size by this factor each time azure_multiply_parts_count_threshold parts were uploaded from a single write to Azure blob storage."},
{"azure_upload_part_size_multiply_parts_count_threshold", 500, 500, "Each time this number of parts was uploaded to Azure blob storage, azure_min_upload_part_size is multiplied by azure_upload_part_size_multiply_factor."},
{"output_format_parquet_use_custom_encoder", false, true, "Enable custom Parquet encoder."},
}},
{"24.2", {{"allow_suspicious_variant_types", true, false, "Don't allow creating Variant type with suspicious variants by default"},
{"validate_experimental_and_suspicious_types_inside_nested_types", false, true, "Validate usage of experimental and suspicious types inside nested types"},

@ -151,8 +151,7 @@ static void checkMySQLVariables(const mysqlxx::Pool::Entry & connection, const S
{"log_bin", "ON"},
{"binlog_format", "ROW"},
{"binlog_row_image", "FULL"},
{"default_authentication_plugin", "mysql_native_password"},
{"log_bin_use_v1_row_events", "OFF"}
{"default_authentication_plugin", "mysql_native_password"}
};
QueryPipeline pipeline(std::move(variables_input));

@ -16,6 +16,8 @@ MetadataStorageType metadataTypeFromString(const String & type)
return MetadataStorageType::Local;
if (check_type == "plain")
return MetadataStorageType::Plain;
if (check_type == "plain_rewritable")
return MetadataStorageType::PlainRewritable;
if (check_type == "web")
return MetadataStorageType::StaticWeb;

@ -28,6 +28,7 @@ enum class MetadataStorageType
None,
Local,
Plain,
PlainRewritable,
StaticWeb,
};

@ -363,6 +363,8 @@ public:
virtual bool isWriteOnce() const { return false; }
virtual bool supportsHardLinks() const { return true; }
/// Check if disk is broken. Broken disks will have 0 space and cannot be used.
virtual bool isBroken() const { return false; }

@ -0,0 +1,72 @@
#include "CommonPathPrefixKeyGenerator.h"
#include <Common/getRandomASCIIString.h>
#include <deque>
#include <filesystem>
#include <tuple>
namespace DB
{
CommonPathPrefixKeyGenerator::CommonPathPrefixKeyGenerator(
String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_)
: storage_key_prefix(key_prefix_), shared_mutex(shared_mutex_), path_map(std::move(path_map_))
{
}
ObjectStorageKey CommonPathPrefixKeyGenerator::generate(const String & path, bool is_directory) const
{
const auto & [object_key_prefix, suffix_parts] = getLongestObjectKeyPrefix(path);
auto key = std::filesystem::path(object_key_prefix.empty() ? storage_key_prefix : object_key_prefix);
/// The longest prefix is the same as path, meaning that the path is already mapped.
if (suffix_parts.empty())
return ObjectStorageKey::createAsRelative(std::move(key));
/// File and top-level directory paths are mapped as is.
if (!is_directory || object_key_prefix.empty())
for (const auto & part : suffix_parts)
key /= part;
/// Replace the last part of the directory path with a pseudorandom suffix.
else
{
for (size_t i = 0; i + 1 < suffix_parts.size(); ++i)
key /= suffix_parts[i];
constexpr size_t part_size = 16;
key /= getRandomASCIIString(part_size);
}
return ObjectStorageKey::createAsRelative(key);
}
std::tuple<std::string, std::vector<std::string>> CommonPathPrefixKeyGenerator::getLongestObjectKeyPrefix(const std::string & path) const
{
std::filesystem::path p(path);
std::deque<std::string> dq;
std::shared_lock lock(shared_mutex);
auto ptr = path_map.lock();
while (p != p.root_path())
{
auto it = ptr->find(p / "");
if (it != ptr->end())
{
std::vector<std::string> vec(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()));
return std::make_tuple(it->second, std::move(vec));
}
if (!p.filename().empty())
dq.push_front(p.filename());
p = p.parent_path();
}
return {std::string(), std::vector<std::string>(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()))};
}
}
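
Note on `getLongestObjectKeyPrefix` above: it walks from the full path towards the root, looking each ancestor up in the path map with a trailing separator, and collects the components that were not resolved. A minimal stand-alone sketch of that lookup follows. The helper name `longestMappedPrefix` is hypothetical, and the sketch leaves out the shared lock and the `weak_ptr` that the real class uses:

```cpp
#include <deque>
#include <filesystem>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

/// Local-to-remote path map; keys are directory paths with a trailing separator.
using PathMap = std::map<fs::path, std::string>;

/// Hypothetical helper: returns the remote key of the longest mapped ancestor of `path`
/// together with the trailing path components that are not covered by that ancestor.
/// If no ancestor is mapped, the returned key is empty.
std::pair<std::string, std::vector<std::string>> longestMappedPrefix(const PathMap & path_map, const std::string & path)
{
    fs::path p(path);
    std::deque<std::string> unresolved;

    while (p != p.root_path())
    {
        /// `p / ""` appends a trailing separator, so "a/b" is looked up as "a/b/".
        auto it = path_map.find(p / "");
        if (it != path_map.end())
            return std::make_pair(it->second, std::vector<std::string>(unresolved.begin(), unresolved.end()));

        /// A path ending in a separator has an empty filename; skip it.
        if (!p.filename().empty())
            unresolved.push_front(p.filename().string());

        p = p.parent_path();
    }

    return std::make_pair(std::string(), std::vector<std::string>(unresolved.begin(), unresolved.end()));
}
```

With a map containing only `data/tbl/`, a lookup of `data/tbl/part1/file.bin` resolves to that entry's key and leaves `part1` and `file.bin` unresolved. In `generate`, those unresolved parts are what is appended to the key, or, for a nested directory, what has its last component replaced by a random suffix.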

@ -0,0 +1,41 @@
#pragma once
#include <Common/ObjectStorageKeyGenerator.h>
#include <Common/SharedMutex.h>
#include <filesystem>
#include <map>
namespace DB
{
/// Object storage key generator used specifically with the
/// MetadataStorageFromPlainObjectStorage if multiple writes are allowed.
/// It searches for the local (metadata) path in a pre-loaded path map.
/// If no such path exists, it searches for the parent path, until it is found
/// or no parent path exists.
///
/// The key generator ensures that the original directory hierarchy is
/// preserved, which is required for the MergeTree family.
class CommonPathPrefixKeyGenerator : public IObjectStorageKeysGenerator
{
public:
/// Local to remote path map. Leverages filesystem::path comparator for paths.
using PathMap = std::map<std::filesystem::path, std::string>;
explicit CommonPathPrefixKeyGenerator(String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_);
ObjectStorageKey generate(const String & path, bool is_directory) const override;
private:
/// Longest key prefix and unresolved parts of the source path.
std::tuple<std::string, std::vector<String>> getLongestObjectKeyPrefix(const String & path) const;
const String storage_key_prefix;
SharedMutex & shared_mutex;
std::weak_ptr<PathMap> path_map;
};
}

@ -112,20 +112,21 @@ size_t DiskObjectStorage::getFileSize(const String & path) const
return metadata_storage->getFileSize(path);
}
void DiskObjectStorage::moveDirectory(const String & from_path, const String & to_path)
{
if (send_metadata)
sendMoveMetadata(from_path, to_path);
auto transaction = createObjectStorageTransaction();
transaction->moveDirectory(from_path, to_path);
transaction->commit();
}
void DiskObjectStorage::moveFile(const String & from_path, const String & to_path, bool should_send_metadata)
{
if (should_send_metadata)
{
auto revision = metadata_helper->revision_counter + 1;
metadata_helper->revision_counter += 1;
const ObjectAttributes object_metadata {
{"from_path", from_path},
{"to_path", to_path}
};
metadata_helper->createFileOperationObject("rename", revision, object_metadata);
}
sendMoveMetadata(from_path, to_path);
auto transaction = createObjectStorageTransaction();
transaction->moveFile(from_path, to_path);
@ -409,6 +410,15 @@ bool DiskObjectStorage::tryReserve(UInt64 bytes)
return false;
}
void DiskObjectStorage::sendMoveMetadata(const String & from_path, const String & to_path)
{
chassert(send_metadata);
auto revision = metadata_helper->revision_counter + 1;
metadata_helper->revision_counter += 1;
const ObjectAttributes object_metadata{{"from_path", from_path}, {"to_path", to_path}};
metadata_helper->createFileOperationObject("rename", revision, object_metadata);
}
bool DiskObjectStorage::supportsCache() const
{
@ -425,6 +435,11 @@ bool DiskObjectStorage::isWriteOnce() const
return object_storage->isWriteOnce();
}
bool DiskObjectStorage::supportsHardLinks() const
{
return !isWriteOnce() && !object_storage->isPlain();
}
DiskObjectStoragePtr DiskObjectStorage::createDiskObjectStorage()
{
const auto config_prefix = "storage_configuration.disks." + name;

@ -112,7 +112,7 @@ public:
void clearDirectory(const String & path) override;
void moveDirectory(const String & from_path, const String & to_path) override { moveFile(from_path, to_path); }
void moveDirectory(const String & from_path, const String & to_path) override;
void removeDirectory(const String & path) override;
@ -183,6 +183,8 @@ public:
/// MergeTree table on this disk.
bool isWriteOnce() const override;
bool supportsHardLinks() const override;
/// Get structure of object storage this disk works with. Examples:
/// DiskObjectStorage(S3ObjectStorage)
/// DiskObjectStorage(CachedObjectStorage(S3ObjectStorage))
@ -228,6 +230,7 @@ private:
std::mutex reservation_mutex;
bool tryReserve(UInt64 bytes);
void sendMoveMetadata(const String & from_path, const String & to_path);
const bool send_metadata;

@ -507,7 +507,7 @@ struct CopyFileObjectStorageOperation final : public IDiskObjectStorageOperation
std::string to_path;
StoredObjects created_objects;
IObjectStorage& destination_object_storage;
IObjectStorage & destination_object_storage;
CopyFileObjectStorageOperation(
IObjectStorage & object_storage_,
@ -714,7 +714,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskObjectStorageTransaction::writeFile
{
/// Otherwise we will produce lost blobs which nobody points to
/// WriteOnce storages are not affected by the issue
if (!tx->object_storage.isWriteOnce() && tx->metadata_storage.exists(path))
if (!tx->object_storage.isPlain() && tx->metadata_storage.exists(path))
tx->object_storage.removeObjectsIfExist(tx->metadata_storage.getStorageObjects(path));
tx->metadata_transaction->createMetadataFile(path, key_, count);
@ -747,10 +747,9 @@ std::unique_ptr<WriteBufferFromFileBase> DiskObjectStorageTransaction::writeFile
{
/// Otherwise we will produce lost blobs which nobody points to
/// WriteOnce storages are not affected by the issue
if (!object_storage_tx->object_storage.isWriteOnce() && object_storage_tx->metadata_storage.exists(path))
if (!object_storage_tx->object_storage.isPlain() && object_storage_tx->metadata_storage.exists(path))
{
object_storage_tx->object_storage.removeObjectsIfExist(
object_storage_tx->metadata_storage.getStorageObjects(path));
object_storage_tx->object_storage.removeObjectsIfExist(object_storage_tx->metadata_storage.getStorageObjects(path));
}
tx->createMetadataFile(path, key_, count);
@ -877,14 +876,14 @@ void DiskObjectStorageTransaction::createFile(const std::string & path)
void DiskObjectStorageTransaction::copyFile(const std::string & from_file_path, const std::string & to_file_path, const ReadSettings & read_settings, const WriteSettings & write_settings)
{
operations_to_execute.emplace_back(
std::make_unique<CopyFileObjectStorageOperation>(object_storage, metadata_storage, object_storage, read_settings, write_settings, from_file_path, to_file_path));
operations_to_execute.emplace_back(std::make_unique<CopyFileObjectStorageOperation>(
object_storage, metadata_storage, object_storage, read_settings, write_settings, from_file_path, to_file_path));
}
void MultipleDisksObjectStorageTransaction::copyFile(const std::string & from_file_path, const std::string & to_file_path, const ReadSettings & read_settings, const WriteSettings & write_settings)
{
operations_to_execute.emplace_back(
std::make_unique<CopyFileObjectStorageOperation>(object_storage, metadata_storage, destination_object_storage, read_settings, write_settings, from_file_path, to_file_path));
operations_to_execute.emplace_back(std::make_unique<CopyFileObjectStorageOperation>(
object_storage, metadata_storage, destination_object_storage, read_settings, write_settings, from_file_path, to_file_path));
}
void DiskObjectStorageTransaction::commit()

View File

@ -0,0 +1,19 @@
#pragma once
#include <mutex>
#include <Common/SharedMutex.h>
namespace DB
{
struct IMetadataOperation
{
virtual void execute(std::unique_lock<SharedMutex> & metadata_lock) = 0;
virtual void undo(std::unique_lock<SharedMutex> & metadata_lock) = 0;
virtual void finalize() { }
virtual ~IMetadataOperation() = default;
};
using MetadataOperationPtr = std::unique_ptr<IMetadataOperation>;
}

View File

@ -145,7 +145,7 @@ public:
virtual ~IMetadataTransaction() = default;
private:
protected:
[[noreturn]] static void throwNotImplemented()
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Operation is not implemented");
@ -229,7 +229,7 @@ public:
/// object_storage_path is absolute.
virtual StoredObjects getStorageObjects(const std::string & path) const = 0;
private:
protected:
[[noreturn]] static void throwNotImplemented()
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Operation is not implemented");

View File

@ -1,11 +1,12 @@
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <Disks/IO/ThreadPoolRemoteFSReader.h>
#include <Common/Exception.h>
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <Disks/ObjectStorages/ObjectStorageIterator.h>
#include <IO/ReadBufferFromFileBase.h>
#include <IO/WriteBufferFromFileBase.h>
#include <IO/copyData.h>
#include <IO/ReadBufferFromFileBase.h>
#include <Interpreters/Context.h>
#include <Disks/ObjectStorages/ObjectStorageIterator.h>
#include <Common/Exception.h>
#include <Common/ObjectStorageKeyGenerator.h>
namespace DB

View File

@ -83,6 +83,9 @@ using ObjectKeysWithMetadata = std::vector<ObjectKeyWithMetadata>;
class IObjectStorageIterator;
using ObjectStorageIteratorPtr = std::shared_ptr<IObjectStorageIterator>;
class IObjectStorageKeysGenerator;
using ObjectStorageKeysGeneratorPtr = std::shared_ptr<IObjectStorageKeysGenerator>;
/// Base class for all object storages which implement some subset of ordinary filesystem operations.
///
/// Examples of object storages are S3, Azure Blob Storage, HDFS.
@ -208,6 +211,12 @@ public:
/// Path can be generated either independently or based on `path`.
virtual ObjectStorageKey generateObjectKeyForPath(const std::string & path) const = 0;
/// Object key prefix for local paths in the directory 'path'.
virtual ObjectStorageKey generateObjectKeyPrefixForDirectoryPath(const std::string & /* path */) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method 'generateObjectKeyPrefixForDirectoryPath' is not implemented");
}
/// Get unique id for passed absolute path in object storage.
virtual std::string getUniqueId(const std::string & path) const { return path; }
@ -226,6 +235,8 @@ public:
virtual WriteSettings patchSettings(const WriteSettings & write_settings) const;
virtual void setKeysGenerator(ObjectStorageKeysGeneratorPtr) { }
#if USE_AZURE_BLOB_STORAGE
virtual std::shared_ptr<const Azure::Storage::Blobs::BlobContainerClient> getAzureBlobStorageClient()
{

View File

@ -1,23 +0,0 @@
#include <base/defines.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
namespace DB
{
std::string toString(MetadataFromDiskTransactionState state)
{
switch (state)
{
case MetadataFromDiskTransactionState::PREPARING:
return "PREPARING";
case MetadataFromDiskTransactionState::FAILED:
return "FAILED";
case MetadataFromDiskTransactionState::COMMITTED:
return "COMMITTED";
case MetadataFromDiskTransactionState::PARTIALLY_ROLLED_BACK:
return "PARTIALLY_ROLLED_BACK";
}
UNREACHABLE();
}
}

View File

@ -0,0 +1,93 @@
#include "MetadataOperationsHolder.h"
#include <Common/Exception.h>
namespace DB
{
namespace ErrorCodes
{
extern const int FS_METADATA_ERROR;
}
void MetadataOperationsHolder::rollback(std::unique_lock<SharedMutex> & lock, size_t until_pos)
{
/// Otherwise everything is alright
if (state == MetadataStorageTransactionState::FAILED)
{
for (int64_t i = until_pos; i >= 0; --i)
{
try
{
operations[i]->undo(lock);
}
catch (Exception & ex)
{
state = MetadataStorageTransactionState::PARTIALLY_ROLLED_BACK;
ex.addMessage(fmt::format("While rolling back operation #{}", i));
throw;
}
}
}
else
{
/// Nothing to do, transaction committed or not even started to commit
}
}
void MetadataOperationsHolder::addOperation(MetadataOperationPtr && operation)
{
if (state != MetadataStorageTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot add operations to transaction in {} state, it should be in {} state",
toString(state),
toString(MetadataStorageTransactionState::PREPARING));
operations.emplace_back(std::move(operation));
}
void MetadataOperationsHolder::commitImpl(SharedMutex & metadata_mutex)
{
if (state != MetadataStorageTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot commit transaction in {} state, it should be in {} state",
toString(state),
toString(MetadataStorageTransactionState::PREPARING));
{
std::unique_lock lock(metadata_mutex);
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->execute(lock);
}
catch (Exception & ex)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
ex.addMessage(fmt::format("While committing metadata operation #{}", i));
state = MetadataStorageTransactionState::FAILED;
rollback(lock, i);
throw;
}
}
}
/// Do it in "best effort" mode
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->finalize();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__, fmt::format("Failed to finalize operation #{}", i));
}
}
state = MetadataStorageTransactionState::COMMITTED;
}
}
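
Reviewer note: the commit/rollback flow that MetadataOperationsHolder now shares between the two transaction classes can be illustrated in isolation. The following is a minimal sketch, not the ClickHouse implementation: `Operation`, `AppendTag`, `State`, and `OperationsHolder` are hypothetical stand-ins, the metadata lock and the best-effort `finalize()` pass are omitted, and the PARTIALLY_ROLLED_BACK state is not modeled. As in the code above, rollback starts at the operation that failed, so `undo` has to tolerate an operation that was never fully applied.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for IMetadataOperation (lock parameter omitted for brevity).
struct Operation
{
    virtual void execute() = 0;
    virtual void undo() = 0;
    virtual ~Operation() = default;
};

// Hypothetical operation: appends a tag on execute and removes it on undo.
struct AppendTag final : Operation
{
    AppendTag(std::vector<std::string> & log_, std::string tag_, bool fail_ = false)
        : log(log_), tag(std::move(tag_)), fail(fail_)
    {
    }

    void execute() override
    {
        if (fail)
            throw std::runtime_error("cannot apply " + tag);
        log.push_back(tag);
        applied = true;
    }

    void undo() override
    {
        // Safe to call even if execute() threw before applying anything.
        if (applied)
        {
            log.pop_back();
            applied = false;
        }
    }

    std::vector<std::string> & log;
    std::string tag;
    bool fail;
    bool applied = false;
};

enum class State { Preparing, Failed, Committed };

class OperationsHolder
{
public:
    void add(std::unique_ptr<Operation> op)
    {
        if (state != State::Preparing)
            throw std::logic_error("operations can only be added while preparing");
        operations.push_back(std::move(op));
    }

    void commit()
    {
        if (state != State::Preparing)
            throw std::logic_error("transaction can only be committed once");

        for (std::size_t i = 0; i < operations.size(); ++i)
        {
            try
            {
                operations[i]->execute();
            }
            catch (...)
            {
                state = State::Failed;
                // Undo in reverse order, starting from the failed operation itself.
                for (std::size_t j = i + 1; j-- > 0;)
                    operations[j]->undo();
                throw;
            }
        }
        state = State::Committed;
    }

    State getState() const { return state; }

private:
    std::vector<std::unique_ptr<Operation>> operations;
    State state = State::Preparing;
};
```

With two `AppendTag` operations where the second one is constructed with `fail_ = true`, `commit()` rethrows the error, the shared log ends up empty again, and the holder reports `State::Failed`.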

View File

@ -0,0 +1,29 @@
#pragma once
#include <mutex>
#include <Disks/ObjectStorages/IMetadataOperation.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <Common/SharedMutex.h>
/**
* Implementations for transactional operations with metadata used by MetadataStorageFromDisk
* and MetadataStorageFromPlainObjectStorage.
*/
namespace DB
{
class MetadataOperationsHolder
{
private:
std::vector<MetadataOperationPtr> operations;
MetadataStorageTransactionState state{MetadataStorageTransactionState::PREPARING};
void rollback(std::unique_lock<SharedMutex> & lock, size_t until_pos);
protected:
void addOperation(MetadataOperationPtr && operation);
void commitImpl(SharedMutex & metadata_mutex);
};
}

View File

@ -1,6 +1,7 @@
#include <Disks/ObjectStorages/MetadataStorageFactory.h>
#include <Disks/ObjectStorages/MetadataStorageFromDisk.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h>
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
#include <Disks/ObjectStorages/Web/MetadataStorageFromStaticFilesWebServer.h>
#endif
@ -118,6 +119,20 @@ void registerPlainMetadataStorage(MetadataStorageFactory & factory)
});
}
void registerPlainRewritableMetadataStorage(MetadataStorageFactory & factory)
{
factory.registerMetadataStorageType(
"plain_rewritable",
[](const std::string & /* name */,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
ObjectStoragePtr object_storage) -> MetadataStoragePtr
{
auto key_compatibility_prefix = getObjectKeyCompatiblePrefix(*object_storage, config, config_prefix);
return std::make_shared<MetadataStorageFromPlainRewritableObjectStorage>(object_storage, key_compatibility_prefix);
});
}
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
void registerMetadataStorageFromStaticFilesWebServer(MetadataStorageFactory & factory)
{
@ -137,6 +152,7 @@ void registerMetadataStorages()
auto & factory = MetadataStorageFactory::instance();
registerMetadataStorageFromDisk(factory);
registerPlainMetadataStorage(factory);
registerPlainRewritableMetadataStorage(factory);
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
registerMetadataStorageFromStaticFilesWebServer(factory);
#endif
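
Reviewer note: registering "plain_rewritable" follows the same name-to-creator pattern as the other metadata storage types. A rough, self-contained sketch of that pattern is below. All names in it (`Storage`, `PlainStorage`, `PlainRewritableStorage`, `StorageFactory`) are hypothetical, and rejecting duplicate names is an assumption of the sketch; the diff does not show how MetadataStorageFactory handles that case.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical storage hierarchy, used only for illustration.
struct Storage
{
    virtual std::string kind() const = 0;
    virtual ~Storage() = default;
};

struct PlainStorage : Storage
{
    std::string kind() const override { return "plain"; }
};

struct PlainRewritableStorage : Storage
{
    std::string kind() const override { return "plain_rewritable"; }
};

// Hypothetical factory: maps a type name to a creator callback.
class StorageFactory
{
public:
    using Creator = std::function<std::unique_ptr<Storage>()>;

    void registerType(const std::string & name, Creator creator)
    {
        // Assumption of this sketch: a name may be registered only once.
        if (!creators.emplace(name, std::move(creator)).second)
            throw std::logic_error("type already registered: " + name);
    }

    std::unique_ptr<Storage> create(const std::string & name) const
    {
        auto it = creators.find(name);
        if (it == creators.end())
            throw std::invalid_argument("unknown type: " + name);
        return it->second();
    }

private:
    std::map<std::string, Creator> creators;
};

// Mirrors the shape of registerMetadataStorages(): one call per supported type.
inline void registerAll(StorageFactory & factory)
{
    factory.registerType("plain", [] { return std::make_unique<PlainStorage>(); });
    factory.registerType("plain_rewritable", [] { return std::make_unique<PlainRewritableStorage>(); });
}
```

After `registerAll(factory)`, `factory.create("plain_rewritable")` returns an object of the new type, and an unregistered name raises `std::invalid_argument`.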

View File

@ -10,14 +10,8 @@
namespace DB
{
namespace ErrorCodes
{
extern const int FS_METADATA_ERROR;
}
MetadataStorageFromDisk::MetadataStorageFromDisk(DiskPtr disk_, String compatible_key_prefix_)
: disk(disk_)
, compatible_key_prefix(compatible_key_prefix_)
: disk(disk_), compatible_key_prefix(compatible_key_prefix_)
{
}
@ -158,83 +152,9 @@ const IMetadataStorage & MetadataStorageFromDiskTransaction::getStorageForNonTra
return metadata_storage;
}
void MetadataStorageFromDiskTransaction::addOperation(MetadataOperationPtr && operation)
{
if (state != MetadataFromDiskTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot add operations to transaction in {} state, it should be in {} state",
toString(state), toString(MetadataFromDiskTransactionState::PREPARING));
operations.emplace_back(std::move(operation));
}
void MetadataStorageFromDiskTransaction::commit()
{
if (state != MetadataFromDiskTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot commit transaction in {} state, it should be in {} state",
toString(state), toString(MetadataFromDiskTransactionState::PREPARING));
{
std::unique_lock lock(metadata_storage.metadata_mutex);
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->execute(lock);
}
catch (Exception & ex)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
ex.addMessage(fmt::format("While committing metadata operation #{}", i));
state = MetadataFromDiskTransactionState::FAILED;
rollback(i);
throw;
}
}
}
/// Do it in "best effort" mode
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->finalize();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__, fmt::format("Failed to finalize operation #{}", i));
}
}
state = MetadataFromDiskTransactionState::COMMITTED;
}
void MetadataStorageFromDiskTransaction::rollback(size_t until_pos)
{
/// Otherwise everything is alright
if (state == MetadataFromDiskTransactionState::FAILED)
{
for (int64_t i = until_pos; i >= 0; --i)
{
try
{
operations[i]->undo();
}
catch (Exception & ex)
{
state = MetadataFromDiskTransactionState::PARTIALLY_ROLLED_BACK;
ex.addMessage(fmt::format("While rolling back operation #{}", i));
throw;
}
}
}
else
{
/// Nothing to do, transaction committed or not even started to commit
}
MetadataOperationsHolder::commitImpl(metadata_storage.metadata_mutex);
}
void MetadataStorageFromDiskTransaction::writeStringToFile(

View File

@ -5,8 +5,9 @@
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/DiskObjectStorageMetadata.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/MetadataOperationsHolder.h>
#include <Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
namespace DB
{
@ -74,18 +75,11 @@ public:
DiskObjectStorageMetadataPtr readMetadataUnlocked(const std::string & path, std::shared_lock<SharedMutex> & lock) const;
};
class MetadataStorageFromDiskTransaction final : public IMetadataTransaction
class MetadataStorageFromDiskTransaction final : public IMetadataTransaction, private MetadataOperationsHolder
{
private:
const MetadataStorageFromDisk & metadata_storage;
std::vector<MetadataOperationPtr> operations;
MetadataFromDiskTransactionState state{MetadataFromDiskTransactionState::PREPARING};
void addOperation(MetadataOperationPtr && operation);
void rollback(size_t until_pos);
public:
explicit MetadataStorageFromDiskTransaction(const MetadataStorageFromDisk & metadata_storage_)
: metadata_storage(metadata_storage_)
@ -135,7 +129,6 @@ public:
UnlinkMetadataFileOperationOutcomePtr unlinkMetadata(const std::string & path) override;
};

View File

@ -32,7 +32,7 @@ void SetLastModifiedOperation::execute(std::unique_lock<SharedMutex> &)
disk.setLastModified(path, new_timestamp);
}
void SetLastModifiedOperation::undo()
void SetLastModifiedOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.setLastModified(path, old_timestamp);
}
@ -50,7 +50,7 @@ void ChmodOperation::execute(std::unique_lock<SharedMutex> &)
disk.chmod(path, mode);
}
void ChmodOperation::undo()
void ChmodOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.chmod(path, old_mode);
}
@ -68,7 +68,7 @@ void UnlinkFileOperation::execute(std::unique_lock<SharedMutex> &)
disk.removeFile(path);
}
void UnlinkFileOperation::undo()
void UnlinkFileOperation::undo(std::unique_lock<SharedMutex> &)
{
auto buf = disk.writeFile(path);
writeString(prev_data, *buf);
@ -86,7 +86,7 @@ void CreateDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
disk.createDirectory(path);
}
void CreateDirectoryOperation::undo()
void CreateDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.removeDirectory(path);
}
@ -112,7 +112,7 @@ void CreateDirectoryRecursiveOperation::execute(std::unique_lock<SharedMutex> &)
disk.createDirectory(path_to_create);
}
void CreateDirectoryRecursiveOperation::undo()
void CreateDirectoryRecursiveOperation::undo(std::unique_lock<SharedMutex> &)
{
for (const auto & path_created : paths_created)
disk.removeDirectory(path_created);
@ -129,7 +129,7 @@ void RemoveDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
disk.removeDirectory(path);
}
void RemoveDirectoryOperation::undo()
void RemoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.createDirectory(path);
}
@ -149,7 +149,7 @@ void RemoveRecursiveOperation::execute(std::unique_lock<SharedMutex> &)
disk.moveDirectory(path, temp_path);
}
void RemoveRecursiveOperation::undo()
void RemoveRecursiveOperation::undo(std::unique_lock<SharedMutex> &)
{
if (disk.isFile(temp_path))
disk.moveFile(temp_path, path);
@ -187,10 +187,10 @@ void CreateHardlinkOperation::execute(std::unique_lock<SharedMutex> & lock)
disk.createHardLink(path_from, path_to);
}
void CreateHardlinkOperation::undo()
void CreateHardlinkOperation::undo(std::unique_lock<SharedMutex> & lock)
{
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
disk.removeFile(path_to);
}
@ -206,7 +206,7 @@ void MoveFileOperation::execute(std::unique_lock<SharedMutex> &)
disk.moveFile(path_from, path_to);
}
void MoveFileOperation::undo()
void MoveFileOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.moveFile(path_to, path_from);
}
@ -223,7 +223,7 @@ void MoveDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
disk.moveDirectory(path_from, path_to);
}
void MoveDirectoryOperation::undo()
void MoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.moveDirectory(path_to, path_from);
}
@ -244,7 +244,7 @@ void ReplaceFileOperation::execute(std::unique_lock<SharedMutex> &)
disk.replaceFile(path_from, path_to);
}
void ReplaceFileOperation::undo()
void ReplaceFileOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.moveFile(path_to, path_from);
disk.moveFile(temp_path_to, path_to);
@ -275,7 +275,7 @@ void WriteFileOperation::execute(std::unique_lock<SharedMutex> &)
buf->finalize();
}
void WriteFileOperation::undo()
void WriteFileOperation::undo(std::unique_lock<SharedMutex> &)
{
if (!existed)
{
@ -303,10 +303,10 @@ void AddBlobOperation::execute(std::unique_lock<SharedMutex> & metadata_lock)
write_operation->execute(metadata_lock);
}
void AddBlobOperation::undo()
void AddBlobOperation::undo(std::unique_lock<SharedMutex> & lock)
{
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
}
void UnlinkMetadataFileOperation::execute(std::unique_lock<SharedMutex> & metadata_lock)
@ -325,17 +325,17 @@ void UnlinkMetadataFileOperation::execute(std::unique_lock<SharedMutex> & metada
unlink_operation->execute(metadata_lock);
}
void UnlinkMetadataFileOperation::undo()
void UnlinkMetadataFileOperation::undo(std::unique_lock<SharedMutex> & lock)
{
/// Operations MUST be reverted in the reversed order, so
/// when we apply operation #1 (write) and operation #2 (unlink)
/// we should revert #2 and only after it #1. Otherwise #1 will overwrite
/// file with incorrect data.
if (unlink_operation)
unlink_operation->undo();
unlink_operation->undo(lock);
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
/// Update outcome to reflect the fact that we have restored the file.
outcome->num_hardlinks++;
@ -349,10 +349,10 @@ void SetReadonlyFileOperation::execute(std::unique_lock<SharedMutex> & metadata_
write_operation->execute(metadata_lock);
}
void SetReadonlyFileOperation::undo()
void SetReadonlyFileOperation::undo(std::unique_lock<SharedMutex> & lock)
{
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
}
}

View File

@ -1,6 +1,6 @@
#pragma once
#include <Common/SharedMutex.h>
#include <Disks/ObjectStorages/IMetadataOperation.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <numeric>
@ -14,24 +14,13 @@ class IDisk;
* Implementations for transactional operations with metadata used by MetadataStorageFromDisk.
*/
struct IMetadataOperation
{
virtual void execute(std::unique_lock<SharedMutex> & metadata_lock) = 0;
virtual void undo() = 0;
virtual void finalize() {}
virtual ~IMetadataOperation() = default;
};
using MetadataOperationPtr = std::unique_ptr<IMetadataOperation>;
struct SetLastModifiedOperation final : public IMetadataOperation
{
SetLastModifiedOperation(const std::string & path_, Poco::Timestamp new_timestamp_, IDisk & disk_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -46,7 +35,7 @@ struct ChmodOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -62,7 +51,7 @@ struct UnlinkFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -77,7 +66,7 @@ struct CreateDirectoryOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -91,7 +80,7 @@ struct CreateDirectoryRecursiveOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -106,7 +95,7 @@ struct RemoveDirectoryOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -119,7 +108,7 @@ struct RemoveRecursiveOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
void finalize() override;
@ -135,7 +124,8 @@ struct WriteFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
IDisk & disk;
@ -154,7 +144,7 @@ struct CreateHardlinkOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path_from;
@ -171,7 +161,7 @@ struct MoveFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path_from;
@ -186,7 +176,7 @@ struct MoveDirectoryOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path_from;
@ -201,7 +191,7 @@ struct ReplaceFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
void finalize() override;
@ -229,7 +219,7 @@ struct AddBlobOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -257,7 +247,7 @@ struct UnlinkMetadataFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -282,7 +272,7 @@ struct SetReadonlyFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;

View File

@ -1,18 +1,27 @@
#include "MetadataStorageFromPlainObjectStorage.h"
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.h>
#include <Disks/ObjectStorages/StaticDirectoryIterator.h>
#include <Common/filesystemHelpers.h>
#include <Common/logger_useful.h>
#include <Common/StringUtils/StringUtils.h>
#include <IO/WriteHelpers.h>
#include <Common/filesystemHelpers.h>
#include <filesystem>
#include <tuple>
namespace DB
{
MetadataStorageFromPlainObjectStorage::MetadataStorageFromPlainObjectStorage(
ObjectStoragePtr object_storage_,
String storage_path_prefix_)
namespace
{
std::filesystem::path normalizeDirectoryPath(const std::filesystem::path & path)
{
return path / "";
}
}
MetadataStorageFromPlainObjectStorage::MetadataStorageFromPlainObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_)
: object_storage(object_storage_)
, storage_path_prefix(std::move(storage_path_prefix_))
{
@ -20,7 +29,7 @@ MetadataStorageFromPlainObjectStorage::MetadataStorageFromPlainObjectStorage(
MetadataTransactionPtr MetadataStorageFromPlainObjectStorage::createTransaction()
{
return std::make_shared<MetadataStorageFromPlainObjectStorageTransaction>(*this);
return std::make_shared<MetadataStorageFromPlainObjectStorageTransaction>(*this, object_storage);
}
const std::string & MetadataStorageFromPlainObjectStorage::getPath() const
@ -44,10 +53,9 @@ bool MetadataStorageFromPlainObjectStorage::isFile(const std::string & path) con
bool MetadataStorageFromPlainObjectStorage::isDirectory(const std::string & path) const
{
auto object_key = object_storage->generateObjectKeyForPath(path);
std::string directory = object_key.serialize();
if (!directory.ends_with('/'))
directory += '/';
auto key_prefix = object_storage->generateObjectKeyForPath(path).serialize();
auto directory = std::filesystem::path(std::move(key_prefix)) / "";
return object_storage->existsOrHasAnyChild(directory);
}
@ -62,33 +70,16 @@ uint64_t MetadataStorageFromPlainObjectStorage::getFileSize(const String & path)
std::vector<std::string> MetadataStorageFromPlainObjectStorage::listDirectory(const std::string & path) const
{
auto object_key = object_storage->generateObjectKeyForPath(path);
auto key_prefix = object_storage->generateObjectKeyForPath(path).serialize();
RelativePathsWithMetadata files;
std::string abs_key = object_key.serialize();
std::string abs_key = key_prefix;
if (!abs_key.ends_with('/'))
abs_key += '/';
object_storage->listObjects(abs_key, files, 0);
std::vector<std::string> result;
for (const auto & path_size : files)
{
result.push_back(path_size.relative_path);
}
std::unordered_set<std::string> duplicates_filter;
for (auto & row : result)
{
chassert(row.starts_with(abs_key));
row.erase(0, abs_key.size());
auto slash_pos = row.find_first_of('/');
if (slash_pos != std::string::npos)
row.erase(slash_pos, row.size() - slash_pos);
duplicates_filter.insert(row);
}
return std::vector<std::string>(duplicates_filter.begin(), duplicates_filter.end());
return getDirectChildrenOnDisk(abs_key, files, path);
}
DirectoryIteratorPtr MetadataStorageFromPlainObjectStorage::iterateDirectory(const std::string & path) const
@ -108,6 +99,25 @@ StoredObjects MetadataStorageFromPlainObjectStorage::getStorageObjects(const std
return {StoredObject(object_key.serialize(), path, object_size)};
}
std::vector<std::string> MetadataStorageFromPlainObjectStorage::getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & /* local_path */) const
{
std::unordered_set<std::string> duplicates_filter;
for (const auto & elem : remote_paths)
{
const auto & path = elem.relative_path;
chassert(path.find(storage_key) == 0);
const auto child_pos = storage_key.size();
/// string::npos is ok.
const auto slash_pos = path.find('/', child_pos);
if (slash_pos == std::string::npos)
duplicates_filter.emplace(path.substr(child_pos));
else
duplicates_filter.emplace(path.substr(child_pos, slash_pos - child_pos));
}
return std::vector<std::string>(std::make_move_iterator(duplicates_filter.begin()), std::make_move_iterator(duplicates_filter.end()));
}
const IMetadataStorage & MetadataStorageFromPlainObjectStorageTransaction::getStorageForNonTransactionalReads() const
{
return metadata_storage;
@ -122,18 +132,44 @@ void MetadataStorageFromPlainObjectStorageTransaction::unlinkFile(const std::str
void MetadataStorageFromPlainObjectStorageTransaction::removeDirectory(const std::string & path)
{
for (auto it = metadata_storage.iterateDirectory(path); it->isValid(); it->next())
metadata_storage.object_storage->removeObject(StoredObject(it->path()));
if (metadata_storage.object_storage->isWriteOnce())
{
for (auto it = metadata_storage.iterateDirectory(path); it->isValid(); it->next())
metadata_storage.object_storage->removeObject(StoredObject(it->path()));
}
else
{
addOperation(std::make_unique<MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation>(
normalizeDirectoryPath(path), *metadata_storage.getPathMap(), object_storage));
}
}
void MetadataStorageFromPlainObjectStorageTransaction::createDirectory(const std::string &)
void MetadataStorageFromPlainObjectStorageTransaction::createDirectory(const std::string & path)
{
/// Noop. It is an Object Storage not a filesystem.
if (metadata_storage.object_storage->isWriteOnce())
return;
auto normalized_path = normalizeDirectoryPath(path);
auto key_prefix = object_storage->generateObjectKeyPrefixForDirectoryPath(normalized_path).serialize();
auto op = std::make_unique<MetadataStorageFromPlainObjectStorageCreateDirectoryOperation>(
std::move(normalized_path), std::move(key_prefix), *metadata_storage.getPathMap(), object_storage);
addOperation(std::move(op));
}
void MetadataStorageFromPlainObjectStorageTransaction::createDirectoryRecursive(const std::string &)
void MetadataStorageFromPlainObjectStorageTransaction::createDirectoryRecursive(const std::string & path)
{
/// Noop. It is an Object Storage not a filesystem.
return createDirectory(path);
}
void MetadataStorageFromPlainObjectStorageTransaction::moveDirectory(const std::string & path_from, const std::string & path_to)
{
if (metadata_storage.object_storage->isWriteOnce())
throwNotImplemented();
addOperation(std::make_unique<MetadataStorageFromPlainObjectStorageMoveDirectoryOperation>(
normalizeDirectoryPath(path_from), normalizeDirectoryPath(path_to), *metadata_storage.getPathMap(), object_storage));
}
void MetadataStorageFromPlainObjectStorageTransaction::addBlobToMetadata(
const std::string &, ObjectStorageKey /* object_key */, uint64_t /* size_in_bytes */)
{
@ -146,4 +182,8 @@ UnlinkMetadataFileOperationOutcomePtr MetadataStorageFromPlainObjectStorageTrans
return std::make_shared<UnlinkMetadataFileOperationOutcome>(UnlinkMetadataFileOperationOutcome{0});
}
void MetadataStorageFromPlainObjectStorageTransaction::commit()
{
MetadataOperationsHolder::commitImpl(metadata_storage.metadata_mutex);
}
}
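
Reviewer note: two small helpers in this file are easy to check outside the codebase. The sketch below reproduces `normalizeDirectoryPath` as written in the diff and restates the child-extraction loop of `getDirectChildrenOnDisk` as a free function. `directChildren` and its parameters are hypothetical names, and it takes plain key strings rather than RelativePathsWithMetadata, so treat it as an approximation of the logic rather than the method itself.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <unordered_set>
#include <vector>

// Same body as in the diff: appending an empty component adds a trailing
// separator only when the path does not already end with one.
std::filesystem::path normalizeDirectoryPath(const std::filesystem::path & path)
{
    return path / "";
}

// Hypothetical free-function version of the loop in getDirectChildrenOnDisk:
// for every key under `prefix`, keep the first path component after the prefix.
std::unordered_set<std::string> directChildren(const std::string & prefix, const std::vector<std::string> & keys)
{
    std::unordered_set<std::string> children;
    for (const auto & key : keys)
    {
        // Every listed key is expected to start with the prefix.
        assert(key.compare(0, prefix.size(), prefix) == 0);

        const auto child_pos = prefix.size();
        const auto slash_pos = key.find('/', child_pos);
        if (slash_pos == std::string::npos)
            children.insert(key.substr(child_pos)); // a file directly under the prefix
        else
            children.insert(key.substr(child_pos, slash_pos - child_pos)); // a subdirectory name
    }
    return children;
}
```

For the prefix `data/` and the keys `data/a/x.bin`, `data/a/y.bin`, and `data/b.txt`, the result contains exactly `a` and `b.txt`: nested keys collapse to their top-level directory, and the set removes the duplicate.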

View File

@ -2,9 +2,10 @@
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.h>
#include <Disks/ObjectStorages/MetadataOperationsHolder.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <map>
namespace DB
{
@ -23,14 +24,21 @@ using UnlinkMetadataFileOperationOutcomePtr = std::shared_ptr<UnlinkMetadataFile
/// It is used to allow BACKUP/RESTORE to ObjectStorage (S3/...) with the same
/// structure as on disk MergeTree, and does not require metadata from local
/// disk to restore.
class MetadataStorageFromPlainObjectStorage final : public IMetadataStorage
class MetadataStorageFromPlainObjectStorage : public IMetadataStorage
{
public:
/// Local path prefixes mapped to storage key prefixes.
using PathMap = std::map<std::filesystem::path, std::string>;
private:
friend class MetadataStorageFromPlainObjectStorageTransaction;
protected:
ObjectStoragePtr object_storage;
String storage_path_prefix;
mutable SharedMutex metadata_mutex;
public:
MetadataStorageFromPlainObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_);
@ -69,23 +77,37 @@ public:
bool supportsChmod() const override { return false; }
bool supportsStat() const override { return false; }
protected:
virtual std::shared_ptr<PathMap> getPathMap() const { throwNotImplemented(); }
virtual std::vector<std::string> getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const;
};
class MetadataStorageFromPlainObjectStorageTransaction final : public IMetadataTransaction
class MetadataStorageFromPlainObjectStorageTransaction final : public IMetadataTransaction, private MetadataOperationsHolder
{
private:
const MetadataStorageFromPlainObjectStorage & metadata_storage;
MetadataStorageFromPlainObjectStorage & metadata_storage;
ObjectStoragePtr object_storage;
std::vector<MetadataOperationPtr> operations;
public:
explicit MetadataStorageFromPlainObjectStorageTransaction(const MetadataStorageFromPlainObjectStorage & metadata_storage_)
: metadata_storage(metadata_storage_)
explicit MetadataStorageFromPlainObjectStorageTransaction(
MetadataStorageFromPlainObjectStorage & metadata_storage_, ObjectStoragePtr object_storage_)
: metadata_storage(metadata_storage_), object_storage(object_storage_)
{}
const IMetadataStorage & getStorageForNonTransactionalReads() const override;
void addBlobToMetadata(const std::string & path, ObjectStorageKey object_key, uint64_t size_in_bytes) override;
void setLastModified(const String &, const Poco::Timestamp &) override
{
/// Noop
}
void createEmptyMetadataFile(const std::string & /* path */) override
{
/// No metadata, no need to create anything.
@ -100,17 +122,15 @@ public:
void createDirectoryRecursive(const std::string & path) override;
void moveDirectory(const std::string & path_from, const std::string & path_to) override;
void unlinkFile(const std::string & path) override;
void removeDirectory(const std::string & path) override;
UnlinkMetadataFileOperationOutcomePtr unlinkMetadata(const std::string & path) override;
void commit() override
{
/// TODO: rewrite with transactions
}
void commit() override;
bool supportsChmod() const override { return false; }
};
}

View File

@ -0,0 +1,190 @@
#include "MetadataStorageFromPlainObjectStorageOperations.h"
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Common/Exception.h>
#include <Common/logger_useful.h>
namespace DB
{
namespace ErrorCodes
{
extern const int FILE_DOESNT_EXIST;
extern const int FILE_ALREADY_EXISTS;
extern const int INCORRECT_DATA;
};
namespace
{
constexpr auto PREFIX_PATH_FILE_NAME = "prefix.path";
}
MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::MetadataStorageFromPlainObjectStorageCreateDirectoryOperation(
std::filesystem::path && path_,
std::string && key_prefix_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_)
: path(std::move(path_)), key_prefix(key_prefix_), path_map(path_map_), object_storage(object_storage_)
{
}
void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
{
if (path_map.contains(path))
return;
LOG_TRACE(getLogger("MetadataStorageFromPlainObjectStorageCreateDirectoryOperation"), "Creating metadata for directory '{}'", path);
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME);
auto buf = object_storage->writeObject(
object,
WriteMode::Rewrite,
/* object_attributes */ std::nullopt,
/* buf_size */ DBMS_DEFAULT_BUFFER_SIZE,
/* settings */ {});
write_created = true;
/// Copy rather than move: undo() still needs key_prefix to build the object key.
[[maybe_unused]] auto result = path_map.emplace(path, key_prefix);
chassert(result.second);
writeString(path.string(), *buf);
buf->finalize();
write_finalized = true;
}
void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
if (write_finalized)
{
path_map.erase(path);
object_storage->removeObject(StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME));
}
else if (write_created)
object_storage->removeObjectIfExists(StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME));
}
MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::MetadataStorageFromPlainObjectStorageMoveDirectoryOperation(
std::filesystem::path && path_from_,
std::filesystem::path && path_to_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_)
: path_from(std::move(path_from_)), path_to(std::move(path_to_)), path_map(path_map_), object_storage(object_storage_)
{
}
std::unique_ptr<WriteBufferFromFileBase> MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::createWriteBuf(
const std::filesystem::path & expected_path, const std::filesystem::path & new_path, bool validate_content)
{
auto expected_it = path_map.find(expected_path);
if (expected_it == path_map.end())
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "Metadata object for the expected (source) path '{}' does not exist", expected_path);
if (path_map.contains(new_path))
throw Exception(ErrorCodes::FILE_ALREADY_EXISTS, "Metadata object for the new (destination) path '{}' already exists", new_path);
auto object_key = ObjectStorageKey::createAsRelative(expected_it->second, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), expected_path / PREFIX_PATH_FILE_NAME);
if (validate_content)
{
std::string data;
auto read_buf = object_storage->readObject(object);
readStringUntilEOF(data, *read_buf);
if (data != expected_path)
throw Exception(
ErrorCodes::INCORRECT_DATA,
"Incorrect data for object key {}, expected {}, got {}",
object_key.serialize(),
expected_path,
data);
}
auto write_buf = object_storage->writeObject(
object,
WriteMode::Rewrite,
/* object_attributes */ std::nullopt,
/*buf_size*/ DBMS_DEFAULT_BUFFER_SIZE,
/*settings*/ {});
return write_buf;
}
void MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::execute(std::unique_lock<SharedMutex> & /* metadata_lock */)
{
LOG_TRACE(
getLogger("MetadataStorageFromPlainObjectStorageMoveDirectoryOperation"), "Moving directory '{}' to '{}'", path_from, path_to);
auto write_buf = createWriteBuf(path_from, path_to, /* validate_content */ true);
write_created = true;
writeString(path_to.string(), *write_buf);
write_buf->finalize();
[[maybe_unused]] auto result = path_map.emplace(path_to, path_map.extract(path_from).mapped());
chassert(result.second);
write_finalized = true;
}
void MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
if (write_finalized)
path_map.emplace(path_from, path_map.extract(path_to).mapped());
if (write_created)
{
auto write_buf = createWriteBuf(path_to, path_from, /* validate_content */ false);
writeString(path_from.string(), *write_buf);
write_buf->finalize();
}
}
MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation(
std::filesystem::path && path_, MetadataStorageFromPlainObjectStorage::PathMap & path_map_, ObjectStoragePtr object_storage_)
: path(std::move(path_)), path_map(path_map_), object_storage(object_storage_)
{
}
void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::execute(std::unique_lock<SharedMutex> & /* metadata_lock */)
{
auto path_it = path_map.find(path);
if (path_it == path_map.end())
return;
LOG_TRACE(getLogger("MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation"), "Removing directory '{}'", path);
key_prefix = path_it->second;
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME);
object_storage->removeObject(object);
path_map.erase(path_it);
/// Mark the removal as done so that undo() restores the metadata object.
removed = true;
}
void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
if (!removed)
return;
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME);
auto buf = object_storage->writeObject(
object,
WriteMode::Rewrite,
/* object_attributes */ std::nullopt,
/* buf_size */ DBMS_DEFAULT_BUFFER_SIZE,
/* settings */ {});
writeString(path.string(), *buf);
buf->finalize();
path_map.emplace(path, std::move(key_prefix));
}
}
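
The execute/undo pairing used by the directory operations above can be illustrated with a minimal standalone sketch. It is not ClickHouse code: `FakeStorage`, `PathMap` (here a plain string map) and `CreateDirectorySketch` are hypothetical stand-ins, and an in-memory map plays the role of the object storage. The point it shows is that `undo()` should only revert what `execute()` actually completed, and that it needs `key_prefix` to still be intact.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-ins; the real code talks to an object storage.
using FakeStorage = std::map<std::string, std::string>; // object key -> content
using PathMap = std::map<std::string, std::string>;     // local path -> key prefix

struct CreateDirectorySketch
{
    std::string path;
    std::string key_prefix;
    PathMap & path_map;
    FakeStorage & storage;
    bool write_finalized = false;

    void execute()
    {
        if (path_map.count(path))
            return; // already mapped, nothing to do

        // Write the marker object that records the local path.
        storage[key_prefix + "/prefix.path"] = path;

        // Copy key_prefix (do not move it): undo() needs it later.
        path_map.emplace(path, key_prefix);
        write_finalized = true;
    }

    void undo()
    {
        if (!write_finalized)
            return; // execute() did not complete, nothing to revert

        path_map.erase(path);
        storage.erase(key_prefix + "/prefix.path");
    }
};
```

Running `execute()` followed by `undo()` on this sketch leaves both maps empty again, which is the invariant a transaction rollback relies on.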

View File

@ -0,0 +1,80 @@
#pragma once
#include <Disks/ObjectStorages/IMetadataOperation.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.h>
#include <filesystem>
#include <map>
namespace DB
{
class MetadataStorageFromPlainObjectStorageCreateDirectoryOperation final : public IMetadataOperation
{
private:
std::filesystem::path path;
std::string key_prefix;
MetadataStorageFromPlainObjectStorage::PathMap & path_map;
ObjectStoragePtr object_storage;
bool write_created = false;
bool write_finalized = false;
public:
// Assuming that paths are normalized.
MetadataStorageFromPlainObjectStorageCreateDirectoryOperation(
std::filesystem::path && path_,
std::string && key_prefix_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
};
class MetadataStorageFromPlainObjectStorageMoveDirectoryOperation final : public IMetadataOperation
{
private:
std::filesystem::path path_from;
std::filesystem::path path_to;
MetadataStorageFromPlainObjectStorage::PathMap & path_map;
ObjectStoragePtr object_storage;
bool write_created = false;
bool write_finalized = false;
std::unique_ptr<WriteBufferFromFileBase>
createWriteBuf(const std::filesystem::path & expected_path, const std::filesystem::path & new_path, bool validate_content);
public:
MetadataStorageFromPlainObjectStorageMoveDirectoryOperation(
std::filesystem::path && path_from_,
std::filesystem::path && path_to_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
};
class MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation final : public IMetadataOperation
{
private:
std::filesystem::path path;
MetadataStorageFromPlainObjectStorage::PathMap & path_map;
ObjectStoragePtr object_storage;
std::string key_prefix;
bool removed = false;
public:
MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation(
std::filesystem::path && path_, MetadataStorageFromPlainObjectStorage::PathMap & path_map_, ObjectStoragePtr object_storage_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
};
}

View File

@ -0,0 +1,143 @@
#include <Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h>
#include <IO/ReadHelpers.h>
#include <Common/ErrorCodes.h>
#include <Common/logger_useful.h>
#include "CommonPathPrefixKeyGenerator.h"
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
namespace
{
constexpr auto PREFIX_PATH_FILE_NAME = "prefix.path";
MetadataStorageFromPlainObjectStorage::PathMap loadPathPrefixMap(const std::string & root, ObjectStoragePtr object_storage)
{
MetadataStorageFromPlainObjectStorage::PathMap result;
RelativePathsWithMetadata files;
object_storage->listObjects(root, files, 0);
for (const auto & file : files)
{
auto remote_path = std::filesystem::path(file.relative_path);
if (remote_path.filename() != PREFIX_PATH_FILE_NAME)
continue;
StoredObject object{file.relative_path};
auto read_buf = object_storage->readObject(object);
String local_path;
readStringUntilEOF(local_path, *read_buf);
chassert(remote_path.has_parent_path());
auto res = result.emplace(local_path, remote_path.parent_path());
/// This can happen if table replication is enabled: the same local path is then
/// written to the `prefix.path` of each replica.
/// TODO: should replicated tables (e.g., RMT) be explicitly disallowed?
if (!res.second)
LOG_WARNING(
getLogger("MetadataStorageFromPlainObjectStorage"),
"The local path '{}' is already mapped to a remote path '{}', ignoring: '{}'",
local_path,
res.first->second,
remote_path.parent_path().string());
}
return result;
}
std::vector<std::string> getDirectChildrenOnRewritableDisk(
const std::string & storage_key,
const RelativePathsWithMetadata & remote_paths,
const std::string & local_path,
const MetadataStorageFromPlainObjectStorage::PathMap & local_path_prefixes,
SharedMutex & shared_mutex)
{
using PathMap = MetadataStorageFromPlainObjectStorage::PathMap;
std::unordered_set<std::string> duplicates_filter;
/// Map remote paths into local subdirectories.
std::unordered_map<PathMap::mapped_type, PathMap::key_type> remote_to_local_subdir;
{
std::shared_lock lock(shared_mutex);
auto end_it = local_path_prefixes.end();
for (auto it = local_path_prefixes.lower_bound(local_path); it != end_it; ++it)
{
const auto & [k, v] = std::make_tuple(it->first.string(), it->second);
if (!k.starts_with(local_path))
break;
auto slash_num = count(k.begin() + local_path.size(), k.end(), '/');
if (slash_num != 1)
continue;
chassert(k.back() == '/');
remote_to_local_subdir.emplace(v, std::string(k.begin() + local_path.size(), k.end() - 1));
}
}
auto skip_list = std::set<std::string>{PREFIX_PATH_FILE_NAME};
for (const auto & elem : remote_paths)
{
const auto & path = elem.relative_path;
chassert(path.find(storage_key) == 0);
const auto child_pos = storage_key.size();
auto slash_pos = path.find('/', child_pos);
if (slash_pos == std::string::npos)
{
/// File names.
auto filename = path.substr(child_pos);
if (!skip_list.contains(filename))
duplicates_filter.emplace(std::move(filename));
}
else
{
/// Subdirectories.
auto it = remote_to_local_subdir.find(path.substr(0, slash_pos));
/// Mapped subdirectories.
if (it != remote_to_local_subdir.end())
duplicates_filter.emplace(it->second);
/// The remote subdirectory name is the same as the local subdirectory.
else
duplicates_filter.emplace(path.substr(child_pos, slash_pos - child_pos));
}
}
return std::vector<std::string>(std::make_move_iterator(duplicates_filter.begin()), std::make_move_iterator(duplicates_filter.end()));
}
}
MetadataStorageFromPlainRewritableObjectStorage::MetadataStorageFromPlainRewritableObjectStorage(
ObjectStoragePtr object_storage_, String storage_path_prefix_)
: MetadataStorageFromPlainObjectStorage(object_storage_, storage_path_prefix_)
, path_map(std::make_shared<PathMap>(loadPathPrefixMap(object_storage->getCommonKeyPrefix(), object_storage)))
{
if (object_storage->isWriteOnce())
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"MetadataStorageFromPlainRewritableObjectStorage is not compatible with write-once storage '{}'",
object_storage->getName());
auto keys_gen = std::make_shared<CommonPathPrefixKeyGenerator>(object_storage->getCommonKeyPrefix(), metadata_mutex, path_map);
object_storage->setKeysGenerator(keys_gen);
}
std::vector<std::string> MetadataStorageFromPlainRewritableObjectStorage::getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const
{
return getDirectChildrenOnRewritableDisk(storage_key, remote_paths, local_path, *getPathMap(), metadata_mutex);
}
}
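
The slash-counting step in `getDirectChildrenOnRewritableDisk` can be tried in isolation. The following is a simplified sketch under stated assumptions, not the function from the diff: `directSubdirs` is a hypothetical helper, keys are plain strings assumed to be normalized with a trailing '/', and locking and remote-path handling are left out. A key is a direct child of `local_path` when exactly one '/' follows the prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the names of mapped directories that are direct children of local_path.
// Assumes every key (and local_path itself) ends with '/'.
std::vector<std::string> directSubdirs(
    const std::map<std::string, std::string> & path_map, const std::string & local_path)
{
    std::vector<std::string> result;
    // Keys sharing the prefix are contiguous in the ordered map, so start at lower_bound.
    for (auto it = path_map.lower_bound(local_path); it != path_map.end(); ++it)
    {
        const std::string & key = it->first;
        if (key.compare(0, local_path.size(), local_path) != 0)
            break; // left the range of keys that start with local_path

        auto slashes = std::count(key.begin() + local_path.size(), key.end(), '/');
        if (slashes != 1)
            continue; // the directory itself (0) or a deeper descendant (2 or more)

        // Strip the prefix and the trailing '/'.
        result.emplace_back(key.begin() + local_path.size(), key.end() - 1);
    }
    return result;
}
```

With keys `a/`, `a/b/`, `a/b/c/`, `a/d/` and `e/`, a call with `local_path = "a/"` yields `b` and `d`; `a/b/c/` is skipped as a deeper descendant and the scan stops at `e/`.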

View File

@ -0,0 +1,26 @@
#pragma once
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.h>
#include <memory>
namespace DB
{
class MetadataStorageFromPlainRewritableObjectStorage final : public MetadataStorageFromPlainObjectStorage
{
private:
std::shared_ptr<PathMap> path_map;
public:
MetadataStorageFromPlainRewritableObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_);
MetadataStorageType getType() const override { return MetadataStorageType::PlainRewritable; }
protected:
std::shared_ptr<PathMap> getPathMap() const override { return path_map; }
std::vector<std::string> getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const override;
};
}

View File

@ -0,0 +1,23 @@
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <base/defines.h>
namespace DB
{
std::string toString(MetadataStorageTransactionState state)
{
switch (state)
{
case MetadataStorageTransactionState::PREPARING:
return "PREPARING";
case MetadataStorageTransactionState::FAILED:
return "FAILED";
case MetadataStorageTransactionState::COMMITTED:
return "COMMITTED";
case MetadataStorageTransactionState::PARTIALLY_ROLLED_BACK:
return "PARTIALLY_ROLLED_BACK";
}
UNREACHABLE();
}
}

View File

@ -4,7 +4,7 @@
namespace DB
{
enum class MetadataFromDiskTransactionState
enum class MetadataStorageTransactionState
{
PREPARING,
FAILED,
@ -12,6 +12,5 @@ enum class MetadataFromDiskTransactionState
PARTIALLY_ROLLED_BACK,
};
std::string toString(MetadataFromDiskTransactionState state);
std::string toString(MetadataStorageTransactionState state);
}

View File

@ -1,9 +1,11 @@
#include "config.h"
#include <utility>
#include <Disks/ObjectStorages/ObjectStorageFactory.h>
#include "Disks/DiskType.h"
#include "config.h"
#if USE_AWS_S3
#include <Disks/ObjectStorages/S3/DiskS3Utils.h>
#include <Disks/ObjectStorages/S3/S3ObjectStorage.h>
#include <Disks/ObjectStorages/S3/diskSettings.h>
#include <Disks/ObjectStorages/S3/DiskS3Utils.h>
#endif
#if USE_HDFS && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)
#include <Disks/ObjectStorages/HDFS/HDFSObjectStorage.h>
@ -20,6 +22,7 @@
#endif
#include <Disks/ObjectStorages/MetadataStorageFactory.h>
#include <Disks/ObjectStorages/PlainObjectStorage.h>
#include <Disks/ObjectStorages/PlainRewritableObjectStorage.h>
#include <Interpreters/Context.h>
#include <Common/Macros.h>
@ -35,36 +38,50 @@ namespace ErrorCodes
extern const int UNKNOWN_ELEMENT_IN_CONFIG;
extern const int BAD_ARGUMENTS;
extern const int LOGICAL_ERROR;
extern const int NOT_IMPLEMENTED;
}
namespace
{
bool isPlainStorage(
ObjectStorageType type,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix)
{
auto compatibility_hint = MetadataStorageFactory::getCompatibilityMetadataTypeHint(type);
auto metadata_type = MetadataStorageFactory::getMetadataType(config, config_prefix, compatibility_hint);
return metadataTypeFromString(metadata_type) == MetadataStorageType::Plain;
}
template <typename BaseObjectStorage, class ...Args>
ObjectStoragePtr createObjectStorage(
ObjectStorageType type,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Args && ...args)
bool isCompatibleWithMetadataStorage(
ObjectStorageType storage_type,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
MetadataStorageType target_metadata_type)
{
auto compatibility_hint = MetadataStorageFactory::getCompatibilityMetadataTypeHint(storage_type);
auto metadata_type = MetadataStorageFactory::getMetadataType(config, config_prefix, compatibility_hint);
return metadataTypeFromString(metadata_type) == target_metadata_type;
}
bool isPlainStorage(ObjectStorageType type, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix)
{
return isCompatibleWithMetadataStorage(type, config, config_prefix, MetadataStorageType::Plain);
}
bool isPlainRewritableStorage(ObjectStorageType type, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix)
{
return isCompatibleWithMetadataStorage(type, config, config_prefix, MetadataStorageType::PlainRewritable);
}
template <typename BaseObjectStorage, class... Args>
ObjectStoragePtr createObjectStorage(
ObjectStorageType type, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, Args &&... args)
{
if (isPlainStorage(type, config, config_prefix))
return std::make_shared<PlainObjectStorage<BaseObjectStorage>>(std::forward<Args>(args)...);
else if (isPlainRewritableStorage(type, config, config_prefix))
{
if (isPlainStorage(type, config, config_prefix))
{
return std::make_shared<PlainObjectStorage<BaseObjectStorage>>(std::forward<Args>(args)...);
}
else
{
return std::make_shared<BaseObjectStorage>(std::forward<Args>(args)...);
}
/// TODO(jkartseva@): Test support for generic disk type
if (type != ObjectStorageType::S3)
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "plain_rewritable metadata storage support is implemented only for S3");
return std::make_shared<PlainRewritableObjectStorage<BaseObjectStorage>>(std::forward<Args>(args)...);
}
else
return std::make_shared<BaseObjectStorage>(std::forward<Args>(args)...);
}
}
ObjectStorageFactory & ObjectStorageFactory::instance()
@ -76,10 +93,7 @@ ObjectStorageFactory & ObjectStorageFactory::instance()
void ObjectStorageFactory::registerObjectStorageType(const std::string & type, Creator creator)
{
if (!registry.emplace(type, creator).second)
{
throw Exception(ErrorCodes::LOGICAL_ERROR,
"ObjectStorageFactory: the metadata type '{}' is not unique", type);
}
throw Exception(ErrorCodes::LOGICAL_ERROR, "ObjectStorageFactory: the metadata type '{}' is not unique", type);
}
ObjectStoragePtr ObjectStorageFactory::create(
@ -91,13 +105,9 @@ ObjectStoragePtr ObjectStorageFactory::create(
{
std::string type;
if (config.has(config_prefix + ".object_storage_type"))
{
type = config.getString(config_prefix + ".object_storage_type");
}
else if (config.has(config_prefix + ".type")) /// .type -- for compatibility.
{
type = config.getString(config_prefix + ".type");
}
else
{
throw Exception(ErrorCodes::NO_ELEMENTS_IN_CONFIG, "Expected `object_storage_type` in config");
@ -210,31 +220,66 @@ void registerS3PlainObjectStorage(ObjectStorageFactory & factory)
return object_storage;
});
}
void registerS3PlainRewritableObjectStorage(ObjectStorageFactory & factory)
{
static constexpr auto disk_type = "s3_plain_rewritable";
factory.registerObjectStorageType(
disk_type,
[](const std::string & name,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
const ContextPtr & context,
bool skip_access_check) -> ObjectStoragePtr
{
/// send_metadata changes the filenames (includes revision), while
/// s3_plain_rewritable does not support file renaming.
if (config.getBool(config_prefix + ".send_metadata", false))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "s3_plain_rewritable does not support send_metadata");
auto uri = getS3URI(config, config_prefix, context);
auto s3_capabilities = getCapabilitiesFromConfig(config, config_prefix);
auto settings = getSettings(config, config_prefix, context);
auto client = getClient(config, config_prefix, context, *settings);
auto key_generator = getKeyGenerator(uri, config, config_prefix);
auto object_storage = std::make_shared<PlainRewritableObjectStorage<S3ObjectStorage>>(
std::move(client), std::move(settings), uri, s3_capabilities, key_generator, name);
/// NOTE: should we still perform this check for clickhouse-disks?
if (!skip_access_check)
checkS3Capabilities(*dynamic_cast<S3ObjectStorage *>(object_storage.get()), s3_capabilities, name);
return object_storage;
});
}
#endif
#if USE_HDFS && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)
void registerHDFSObjectStorage(ObjectStorageFactory & factory)
{
factory.registerObjectStorageType("hdfs", [](
const std::string & /* name */,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
const ContextPtr & context,
bool /* skip_access_check */) -> ObjectStoragePtr
{
auto uri = context->getMacros()->expand(config.getString(config_prefix + ".endpoint"));
checkHDFSURL(uri);
if (uri.back() != '/')
throw Exception(ErrorCodes::BAD_ARGUMENTS, "HDFS path must end with '/', but '{}' doesn't.", uri);
factory.registerObjectStorageType(
"hdfs",
[](const std::string & /* name */,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
const ContextPtr & context,
bool /* skip_access_check */) -> ObjectStoragePtr
{
auto uri = context->getMacros()->expand(config.getString(config_prefix + ".endpoint"));
checkHDFSURL(uri);
if (uri.back() != '/')
throw Exception(ErrorCodes::BAD_ARGUMENTS, "HDFS path must end with '/', but '{}' doesn't.", uri);
std::unique_ptr<HDFSObjectStorageSettings> settings = std::make_unique<HDFSObjectStorageSettings>(
config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024),
config.getInt(config_prefix + ".objects_chunk_size_to_delete", 1000),
context->getSettingsRef().hdfs_replication
);
std::unique_ptr<HDFSObjectStorageSettings> settings = std::make_unique<HDFSObjectStorageSettings>(
config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024),
config.getInt(config_prefix + ".objects_chunk_size_to_delete", 1000),
context->getSettingsRef().hdfs_replication);
return createObjectStorage<HDFSObjectStorage>(ObjectStorageType::HDFS, config, config_prefix, uri, std::move(settings), config);
});
return createObjectStorage<HDFSObjectStorage>(ObjectStorageType::HDFS, config, config_prefix, uri, std::move(settings), config);
});
}
#endif
@ -317,6 +362,7 @@ void registerObjectStorages()
#if USE_AWS_S3
registerS3ObjectStorage(factory);
registerS3PlainObjectStorage(factory);
registerS3PlainRewritableObjectStorage(factory);
#endif
#if USE_HDFS && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)

View File

@ -0,0 +1,24 @@
#pragma once
#include <Disks/ObjectStorages/IObjectStorage.h>
namespace DB
{
template <typename BaseObjectStorage>
class PlainRewritableObjectStorage : public BaseObjectStorage
{
public:
template <class... Args>
explicit PlainRewritableObjectStorage(Args &&... args) : BaseObjectStorage(std::forward<Args>(args)...)
{
}
std::string getName() const override { return "PlainRewritable" + BaseObjectStorage::getName(); }
bool isWriteOnce() const override { return false; }
bool isPlain() const override { return true; }
};
}

View File

@ -29,7 +29,10 @@ void registerDiskObjectStorage(DiskFactory & factory, bool global_skip_access_ch
if (!config.has(config_prefix + ".metadata_type"))
{
if (object_storage->isPlain())
compatibility_metadata_type_hint = "plain";
if (object_storage->isWriteOnce())
compatibility_metadata_type_hint = "plain";
else
compatibility_metadata_type_hint = "plain_rewritable";
else
compatibility_metadata_type_hint = MetadataStorageFactory::getCompatibilityMetadataTypeHint(object_storage->getType());
}
@ -53,6 +56,7 @@ void registerDiskObjectStorage(DiskFactory & factory, bool global_skip_access_ch
#if USE_AWS_S3
factory.registerDiskType("s3", creator); /// For compatibility
factory.registerDiskType("s3_plain", creator); /// For compatibility
factory.registerDiskType("s3_plain_rewritable", creator); /// For compatibility
#endif
#if USE_HDFS
factory.registerDiskType("hdfs", creator); /// For compatibility

View File

@ -1,4 +1,5 @@
#include <Disks/ObjectStorages/S3/S3ObjectStorage.h>
#include "Common/ObjectStorageKey.h"
#if USE_AWS_S3
@ -568,10 +569,17 @@ ObjectStorageKey S3ObjectStorage::generateObjectKeyForPath(const std::string & p
{
if (!key_generator)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Key generator is not set");
return key_generator->generate(path);
return key_generator->generate(path, /* is_directory */ false);
}
ObjectStorageKey S3ObjectStorage::generateObjectKeyPrefixForDirectoryPath(const std::string & path) const
{
if (!key_generator)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Key generator is not set");
return key_generator->generate(path, /* is_directory */ true);
}
}
#endif

View File

@ -43,8 +43,6 @@ struct S3ObjectStorageSettings
class S3ObjectStorage : public IObjectStorage
{
private:
friend class S3PlainObjectStorage;
S3ObjectStorage(
const char * logger_name,
std::unique_ptr<S3::Client> && client_,
@ -54,11 +52,11 @@ private:
ObjectStorageKeysGeneratorPtr key_generator_,
const String & disk_name_)
: uri(uri_)
, key_generator(std::move(key_generator_))
, disk_name(disk_name_)
, client(std::move(client_))
, s3_settings(std::move(s3_settings_))
, s3_capabilities(s3_capabilities_)
, key_generator(std::move(key_generator_))
, log(getLogger(logger_name))
{
}
@ -161,9 +159,12 @@ public:
bool supportParallelWrite() const override { return true; }
ObjectStorageKey generateObjectKeyForPath(const std::string & path) const override;
ObjectStorageKey generateObjectKeyPrefixForDirectoryPath(const std::string & path) const override;
bool isReadOnly() const override { return s3_settings.get()->read_only; }
void setKeysGenerator(ObjectStorageKeysGeneratorPtr gen) override { key_generator = gen; }
private:
void setNewSettings(std::unique_ptr<S3ObjectStorageSettings> && s3_settings_);
@ -172,13 +173,14 @@ private:
const S3::URI uri;
ObjectStorageKeysGeneratorPtr key_generator;
std::string disk_name;
MultiVersion<S3::Client> client;
MultiVersion<S3ObjectStorageSettings> s3_settings;
S3Capabilities s3_capabilities;
ObjectStorageKeysGeneratorPtr key_generator;
LoggerPtr log;
};

View File

@ -1,9 +1,9 @@
#pragma once
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/Web/WebObjectStorage.h>
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <Disks/ObjectStorages/Web/WebObjectStorage.h>
namespace DB

View File

@ -305,7 +305,7 @@ InputFormatPtr FormatFactory::getInput(
auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context);
const Settings & settings = context->getSettingsRef();
size_t max_parsing_threads = _max_parsing_threads.value_or(settings.max_threads);
size_t max_parsing_threads = _max_parsing_threads.value_or(settings.max_parsing_threads);
size_t max_download_threads = _max_download_threads.value_or(settings.max_download_threads);
RowInputFormatParams row_input_format_params;

View File

@ -130,6 +130,7 @@ UUIDSerializer::Variant parseVariant(const DB::ColumnsWithTypeAndName & argument
const auto representation = static_cast<magic_enum::underlying_type_t<UUIDSerializer::Variant>>(arguments[1].column->getInt(0));
const auto as_enum = magic_enum::enum_cast<UUIDSerializer::Variant>(representation);
if (!as_enum)
throw DB::Exception(DB::ErrorCodes::ARGUMENT_OUT_OF_BOUND, "Expected UUID variant, got {}", representation);
@ -170,6 +171,7 @@ public:
}
bool useDefaultImplementationForConstants() const override { return true; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{
@ -248,6 +250,7 @@ public:
}
bool useDefaultImplementationForConstants() const override { return true; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{

View File

@ -45,7 +45,7 @@ template <> struct FunctionUnaryArithmeticMonotonicity<NameAbs>
if ((left_float < 0 && right_float > 0) || (left_float > 0 && right_float < 0))
return {};
return { .is_monotonic = true, .is_positive = left_float > 0, .is_strict = true, };
return { .is_monotonic = true, .is_positive = std::min(left_float, right_float) >= 0, .is_strict = true, };
}
};
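
The `is_positive` change above can be checked with a small standalone sketch. It is not the ClickHouse template: `AbsMonotonicity` and `absMonotonicity` are hypothetical names used only for illustration. The case that distinguishes the two conditions is a range starting at zero, such as [0, 5]: `left_float > 0` reports the range as decreasing, while `std::min(left_float, right_float) >= 0` reports it as increasing.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for the monotonicity result.
struct AbsMonotonicity
{
    bool is_positive; // true if abs() is non-decreasing on the range
};

// Returns nothing when the range straddles zero, where abs() is not monotonic.
std::optional<AbsMonotonicity> absMonotonicity(double left, double right)
{
    if ((left < 0 && right > 0) || (left > 0 && right < 0))
        return std::nullopt;

    // Both endpoints are on the same side of zero (or touch it):
    // abs() increases exactly when the whole range is non-negative.
    return AbsMonotonicity{std::min(left, right) >= 0};
}
```

For [0, 5] this returns `is_positive == true`, for [-5, -1] and [-5, 0] it returns `false`, and for [-1, 1] it returns no result.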

View File

@ -109,4 +109,9 @@ void copyDataWithThrottler(ReadBuffer & from, WriteBuffer & to, size_t bytes, co
copyDataImpl(from, to, true, bytes, &is_cancelled, throttler);
}
void copyDataWithThrottler(ReadBuffer & from, WriteBuffer & to, std::function<void()> cancellation_hook, ThrottlerPtr throttler)
{
copyDataImpl(from, to, false, std::numeric_limits<size_t>::max(), cancellation_hook, throttler);
}
}

View File

@ -33,5 +33,6 @@ void copyDataMaxBytes(ReadBuffer & from, WriteBuffer & to, size_t max_bytes);
/// Same as above but also use throttler to limit maximum speed
void copyDataWithThrottler(ReadBuffer & from, WriteBuffer & to, const std::atomic<int> & is_cancelled, ThrottlerPtr throttler);
void copyDataWithThrottler(ReadBuffer & from, WriteBuffer & to, size_t bytes, const std::atomic<int> & is_cancelled, ThrottlerPtr throttler);
void copyDataWithThrottler(ReadBuffer & from, WriteBuffer & to, std::function<void()> cancellation_hook, ThrottlerPtr throttler);
}

View File

@ -2013,6 +2013,63 @@ ActionsDAG::SplitResult ActionsDAG::splitActionsBySortingDescription(const NameS
return res;
}
bool ActionsDAG::isFilterAlwaysFalseForDefaultValueInputs(const std::string & filter_name, const Block & input_stream_header)
{
const auto * filter_node = tryFindInOutputs(filter_name);
if (!filter_node)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Outputs for ActionsDAG do not contain filter column name {}. DAG:\n{}",
filter_name,
dumpDAG());
std::unordered_map<std::string, ColumnWithTypeAndName> input_node_name_to_default_input_column;
for (const auto * input : inputs)
{
if (!input_stream_header.has(input->result_name))
continue;
if (input->column)
continue;
auto constant_column = input->result_type->createColumnConst(0, input->result_type->getDefault());
auto constant_column_with_type_and_name = ColumnWithTypeAndName{constant_column, input->result_type, input->result_name};
input_node_name_to_default_input_column.emplace(input->result_name, std::move(constant_column_with_type_and_name));
}
ActionsDAGPtr filter_with_default_value_inputs;
try
{
filter_with_default_value_inputs = buildFilterActionsDAG({filter_node}, input_node_name_to_default_input_column);
}
catch (const Exception &)
{
/** It is possible that during DAG construction, some functions cannot be executed for constant default value inputs
* and exception will be thrown.
*/
return false;
}
const auto * filter_with_default_value_inputs_filter_node = filter_with_default_value_inputs->getOutputs()[0];
if (!filter_with_default_value_inputs_filter_node->column || !isColumnConst(*filter_with_default_value_inputs_filter_node->column))
return false;
const auto & constant_type = filter_with_default_value_inputs_filter_node->result_type;
auto which_constant_type = WhichDataType(constant_type);
if (!which_constant_type.isUInt8() && !which_constant_type.isNothing())
return false;
Field value;
filter_with_default_value_inputs_filter_node->column->get(0, value);
if (value.isNull())
return true;
UInt8 predicate_value = value.safeGet<UInt8>();
return predicate_value == 0;
}
ActionsDAG::SplitResult ActionsDAG::splitActionsForFilter(const std::string & column_name) const
{
const auto * node = tryFindInOutputs(column_name);
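
The decision logic of `isFilterAlwaysFalseForDefaultValueInputs` can be sketched without the DAG machinery: evaluate the predicate once on default-valued inputs, treat a NULL or zero result as "always false", and answer `false` conservatively if evaluation throws. The sketch below is an illustration under those assumptions only; `Row`, `Predicate`, and `isAlwaysFalseForDefaultRow` are hypothetical and do not exist in ClickHouse.

```cpp
#include <exception>
#include <functional>
#include <optional>

// Hypothetical input row; value-initialization yields the column defaults.
struct Row
{
    int id = 0;
    int amount = 0;
};

// A predicate returns true, false, or NULL (std::nullopt), or it may throw.
using Predicate = std::function<std::optional<bool>(const Row &)>;

// Returns true only when the predicate is known to reject a default-valued row.
inline bool isAlwaysFalseForDefaultRow(const Predicate & predicate)
{
    std::optional<bool> result;
    try
    {
        result = predicate(Row{});
    }
    catch (const std::exception &)
    {
        // Evaluation on defaults failed, so nothing can be concluded.
        return false;
    }

    // NULL filters the row out, exactly like false.
    return !result.has_value() || !*result;
}
```

For example, a predicate equivalent to `amount > 10` is rejected for the default row, while one equivalent to `id = 0` is not.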

View File

@ -355,6 +355,13 @@ public:
/// The second contains the rest.
SplitResult splitActionsBySortingDescription(const NameSet & sort_columns) const;
/** Returns true if filter DAG is always false for inputs with default values.
*
* @param filter_name - name of filter node in current DAG.
* @param input_stream_header - input stream header.
*/
bool isFilterAlwaysFalseForDefaultValueInputs(const std::string & filter_name, const Block & input_stream_header);
/// Create actions which may calculate part of filter using only available_inputs.
/// If nothing may be calculated, returns nullptr.
/// Otherwise, return actions which inputs are from available_inputs.

View File

@ -245,11 +245,15 @@ void executeQuery(
const auto & shard_info = cluster->getShardsInfo()[i];
auto query_for_shard = query_info.query_tree->clone();
if (sharding_key_expr && query_info.optimized_cluster && settings.optimize_skip_unused_shards_rewrite_in && shards > 1)
if (sharding_key_expr &&
query_info.optimized_cluster &&
settings.optimize_skip_unused_shards_rewrite_in &&
shards > 1 &&
/// TODO: support composite sharding key
sharding_key_expr->getRequiredColumns().size() == 1)
{
OptimizeShardingKeyRewriteInVisitor::Data visitor_data{
sharding_key_expr,
sharding_key_expr->getSampleBlock().getByPosition(0).type,
sharding_key_column_name,
shard_info,
not_optimized_cluster->getSlotToShard(),
@ -282,11 +286,15 @@ void executeQuery(
const auto & shard_info = cluster->getShardsInfo()[i];
ASTPtr query_ast_for_shard = query_info.query->clone();
if (sharding_key_expr && query_info.optimized_cluster && settings.optimize_skip_unused_shards_rewrite_in && shards > 1)
if (sharding_key_expr &&
query_info.optimized_cluster &&
settings.optimize_skip_unused_shards_rewrite_in &&
shards > 1 &&
/// TODO: support composite sharding key
sharding_key_expr->getRequiredColumns().size() == 1)
{
OptimizeShardingKeyRewriteInVisitor::Data visitor_data{
sharding_key_expr,
sharding_key_expr->getSampleBlock().getByPosition(0).type,
sharding_key_column_name,
shard_info,
not_optimized_cluster->getSlotToShard(),

View File

@ -1617,6 +1617,33 @@ void Context::addExternalTable(const String & table_name, TemporaryTableHolder &
external_tables_mapping.emplace(table_name, std::make_shared<TemporaryTableHolder>(std::move(temporary_table)));
}
void Context::updateExternalTable(const String & table_name, TemporaryTableHolder && temporary_table)
{
if (isGlobalContext())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Global context cannot have external tables");
auto temporary_table_ptr = std::make_shared<TemporaryTableHolder>(std::move(temporary_table));
std::lock_guard lock(mutex);
auto it = external_tables_mapping.find(table_name);
if (it == external_tables_mapping.end())
throw Exception(ErrorCodes::TABLE_ALREADY_EXISTS, "Temporary table {} does not exist", backQuoteIfNeed(table_name));
it->second = std::move(temporary_table_ptr);
}
void Context::addOrUpdateExternalTable(const String & table_name, TemporaryTableHolder && temporary_table)
{
if (isGlobalContext())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Global context cannot have external tables");
auto temporary_table_ptr = std::make_shared<TemporaryTableHolder>(std::move(temporary_table));
std::lock_guard lock(mutex);
auto [it, inserted] = external_tables_mapping.emplace(table_name, temporary_table_ptr);
if (!inserted)
it->second = std::move(temporary_table_ptr);
}
std::shared_ptr<TemporaryTableHolder> Context::findExternalTable(const String & table_name) const
{
if (isGlobalContext())
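
The difference between the two new methods is only in how a missing key is handled: `updateExternalTable` requires the entry to exist, while `addOrUpdateExternalTable` inserts or overwrites. A minimal sketch of the same contract on a plain `std::map` follows. `Holder`, `Mapping`, `updateExisting`, and `addOrUpdate` are hypothetical names; the sketch omits the mutex and the global-context check, and it uses `std::map::insert_or_assign` (C++17) rather than the `emplace`-then-assign sequence in the hunk above.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a temporary table holder.
struct Holder
{
    int version = 0;
};

using HolderPtr = std::shared_ptr<Holder>;
using Mapping = std::map<std::string, HolderPtr>;

// Replaces an existing entry; throws if the name is unknown.
inline void updateExisting(Mapping & mapping, const std::string & name, HolderPtr holder)
{
    auto it = mapping.find(name);
    if (it == mapping.end())
        throw std::out_of_range("no such entry: " + name);
    it->second = std::move(holder);
}

// Inserts a new entry or overwrites the existing one.
inline void addOrUpdate(Mapping & mapping, const std::string & name, HolderPtr holder)
{
    mapping.insert_or_assign(name, std::move(holder));
}
```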

View File

@ -315,6 +315,7 @@ protected:
/// This parameter can be set by the HTTP client to tune the behavior of output formats for compatibility.
UInt64 client_protocol_version = 0;
public:
/// Record entities accessed by current query, and store this information in system.query_log.
struct QueryAccessInfo
{
@ -339,8 +340,10 @@ protected:
return *this;
}
void swap(QueryAccessInfo & rhs) noexcept
void swap(QueryAccessInfo & rhs) noexcept TSA_NO_THREAD_SAFETY_ANALYSIS
{
/// TSA_NO_THREAD_SAFETY_ANALYSIS because it doesn't support scoped_lock
std::scoped_lock lck{mutex, rhs.mutex};
std::swap(databases, rhs.databases);
std::swap(tables, rhs.tables);
std::swap(columns, rhs.columns);
@ -351,19 +354,21 @@ protected:
/// To prevent a race between copy-constructor and other uses of this structure.
mutable std::mutex mutex{};
std::set<std::string> databases{};
std::set<std::string> tables{};
std::set<std::string> columns{};
std::set<std::string> partitions{};
std::set<std::string> projections{};
std::set<std::string> views{};
std::set<std::string> databases TSA_GUARDED_BY(mutex){};
std::set<std::string> tables TSA_GUARDED_BY(mutex){};
std::set<std::string> columns TSA_GUARDED_BY(mutex){};
std::set<std::string> partitions TSA_GUARDED_BY(mutex){};
std::set<std::string> projections TSA_GUARDED_BY(mutex){};
std::set<std::string> views TSA_GUARDED_BY(mutex){};
};
using QueryAccessInfoPtr = std::shared_ptr<QueryAccessInfo>;
protected:
/// In some situations, we want to be able to transfer the access info from children back to parents (e.g. definers context).
/// Therefore, query_access_info must be a pointer.
QueryAccessInfoPtr query_access_info;
public:
/// Record names of created objects of factories (for testing, etc)
struct QueryFactoriesInfo
{
@ -385,19 +390,20 @@ protected:
QueryFactoriesInfo(QueryFactoriesInfo && rhs) = delete;
std::unordered_set<std::string> aggregate_functions;
std::unordered_set<std::string> aggregate_function_combinators;
std::unordered_set<std::string> database_engines;
std::unordered_set<std::string> data_type_families;
std::unordered_set<std::string> dictionaries;
std::unordered_set<std::string> formats;
std::unordered_set<std::string> functions;
std::unordered_set<std::string> storages;
std::unordered_set<std::string> table_functions;
std::unordered_set<std::string> aggregate_functions TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> aggregate_function_combinators TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> database_engines TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> data_type_families TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> dictionaries TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> formats TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> functions TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> storages TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> table_functions TSA_GUARDED_BY(mutex);
mutable std::mutex mutex;
};
protected:
/// Needs to be changed while having const context in factories methods
mutable QueryFactoriesInfo query_factories_info;
/// Query metrics for reading data asynchronously with IAsynchronousReader.
@ -677,6 +683,8 @@ public:
Tables getExternalTables() const;
void addExternalTable(const String & table_name, TemporaryTableHolder && temporary_table);
void updateExternalTable(const String & table_name, TemporaryTableHolder && temporary_table);
void addOrUpdateExternalTable(const String & table_name, TemporaryTableHolder && temporary_table);
std::shared_ptr<TemporaryTableHolder> findExternalTable(const String & table_name) const;
std::shared_ptr<TemporaryTableHolder> removeExternalTable(const String & table_name);
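
The `swap` change in `QueryAccessInfo` locks both objects' mutexes with a single `std::scoped_lock`, which acquires several mutexes using a deadlock-avoidance algorithm. A minimal sketch of that pattern is given below. `AccessInfo` is a hypothetical reduction with one guarded member, the thread-safety annotation macros are left out, and the self-swap guard is an addition of this sketch, not something shown in the diff.

```cpp
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical reduction of a structure whose members are guarded by a mutex.
struct AccessInfo
{
    void add(const std::string & table)
    {
        std::lock_guard<std::mutex> lock(mutex);
        tables.insert(table);
    }

    // Returns a copy taken under the lock.
    std::set<std::string> snapshot() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return tables;
    }

    void swap(AccessInfo & rhs)
    {
        // Locking the same std::mutex twice is undefined behavior,
        // so self-swap is handled separately (assumption of this sketch).
        if (this == &rhs)
            return;

        // Locks both mutexes together regardless of argument order.
        std::scoped_lock lock(mutex, rhs.mutex);
        std::swap(tables, rhs.tables);
    }

private:
    mutable std::mutex mutex;
    std::set<std::string> tables;
};
```

Because both locks are taken in one step, two threads calling `a.swap(b)` and `b.swap(a)` concurrently should not deadlock on lock ordering.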

View File

@ -113,8 +113,10 @@ struct TemporaryTableHolder : boost::noncopyable, WithContext
FutureSetFromSubqueryPtr future_set;
};
using TemporaryTableHolderPtr = std::shared_ptr<TemporaryTableHolder>;
///TODO maybe remove shared_ptr from here?
using TemporaryTablesMapping = std::map<String, std::shared_ptr<TemporaryTableHolder>>;
using TemporaryTablesMapping = std::map<String, TemporaryTableHolderPtr>;
class BackgroundSchedulePoolTaskHolder;

View File

@ -31,8 +31,21 @@ public:
}
std::string getName() const override { return "FullSortingMergeJoin"; }
const TableJoin & getTableJoin() const override { return *table_join; }
bool isCloneSupported() const override
{
return true;
}
std::shared_ptr<IJoin> clone(const std::shared_ptr<TableJoin> & table_join_,
const Block &,
const Block & right_sample_block_) const override
{
return std::make_shared<FullSortingMergeJoin>(table_join_, right_sample_block_, null_direction);
}
int getNullDirection() const { return null_direction; }
bool addBlockToJoin(const Block & /* block */, bool /* check_limits */) override

View File

@ -236,11 +236,13 @@ static void correctNullabilityInplace(ColumnWithTypeAndName & column, bool nulla
}
HashJoin::HashJoin(std::shared_ptr<TableJoin> table_join_, const Block & right_sample_block_,
bool any_take_last_row_, size_t reserve_num, const String & instance_id_)
bool any_take_last_row_, size_t reserve_num_, const String & instance_id_)
: table_join(table_join_)
, kind(table_join->kind())
, strictness(table_join->strictness())
, any_take_last_row(any_take_last_row_)
, reserve_num(reserve_num_)
, instance_id(instance_id_)
, asof_inequality(table_join->getAsofInequality())
, data(std::make_shared<RightTableData>())
, right_sample_block(right_sample_block_)
@ -325,7 +327,7 @@ HashJoin::HashJoin(std::shared_ptr<TableJoin> table_join_, const Block & right_s
}
for (auto & maps : data->maps)
dataMapInit(maps, reserve_num);
dataMapInit(maps);
}
HashJoin::Type HashJoin::chooseMethod(JoinKind kind, const ColumnRawPtrs & key_columns, Sizes & key_sizes)
@ -487,9 +489,8 @@ struct KeyGetterForType
using Type = typename KeyGetterForTypeImpl<type, Value, Mapped>::Type;
};
void HashJoin::dataMapInit(MapsVariant & map, size_t reserve_num)
void HashJoin::dataMapInit(MapsVariant & map)
{
if (kind == JoinKind::Cross)
return;
joinDispatchInit(kind, strictness, map);

View File

@ -148,13 +148,26 @@ class HashJoin : public IJoin
public:
HashJoin(
std::shared_ptr<TableJoin> table_join_, const Block & right_sample_block,
bool any_take_last_row_ = false, size_t reserve_num = 0, const String & instance_id_ = "");
bool any_take_last_row_ = false, size_t reserve_num_ = 0, const String & instance_id_ = "");
~HashJoin() override;
std::string getName() const override { return "HashJoin"; }
const TableJoin & getTableJoin() const override { return *table_join; }
bool isCloneSupported() const override
{
return true;
}
std::shared_ptr<IJoin> clone(const std::shared_ptr<TableJoin> & table_join_,
const Block &,
const Block & right_sample_block_) const override
{
return std::make_shared<HashJoin>(table_join_, right_sample_block_, any_take_last_row, reserve_num, instance_id);
}
/** Add block of data from right hand of JOIN to the map.
* Returns false, if some limit was exceeded and you should not insert more data.
*/
@ -412,7 +425,9 @@ private:
/// This join was created from StorageJoin and it is already filled.
bool from_storage_join = false;
bool any_take_last_row; /// Overwrite existing values when encountering the same key again
const bool any_take_last_row; /// Overwrite existing values when encountering the same key again
const size_t reserve_num;
const String instance_id;
std::optional<TypeIndex> asof_type;
const ASOFJoinInequality asof_inequality;
@ -454,7 +469,7 @@ private:
/// If set HashJoin instance is not available for modification (addBlockToJoin)
TableLockHolder storage_join_lock = nullptr;
void dataMapInit(MapsVariant &, size_t);
void dataMapInit(MapsVariant & map);
void initRightBlockStructure(Block & saved_block_sample);

View File

@ -11,6 +11,11 @@
namespace DB
{
namespace ErrorCodes
{
extern const int UNSUPPORTED_METHOD;
}
struct ExtraBlock;
using ExtraBlockPtr = std::shared_ptr<ExtraBlock>;
@ -52,6 +57,23 @@ public:
virtual const TableJoin & getTableJoin() const = 0;
/// Returns true if clone is supported
virtual bool isCloneSupported() const
{
return false;
}
/// Clone underlying JOIN algorithm using table join, left sample block, right sample block
virtual std::shared_ptr<IJoin> clone(const std::shared_ptr<TableJoin> & table_join_,
const Block & left_sample_block_,
const Block & right_sample_block_) const
{
(void)(table_join_);
(void)(left_sample_block_);
(void)(right_sample_block_);
throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Clone method is not supported for {}", getName());
}
/// Add block of data from right hand of JOIN.
/// @returns false, if some limit was exceeded and you should not insert more data.
virtual bool addBlockToJoin(const Block & block, bool check_limits = true) = 0; /// NOLINT
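
The `isCloneSupported()` / `clone()` pair added to `IJoin` follows a common opt-in pattern: the base class reports no support and throws from `clone()`, and concrete algorithms override both. A minimal sketch of the pattern, assuming nothing about the real interface beyond what the hunk shows, could look like this. `Join`, `CloneableJoin`, `OpaqueJoin`, and `cloneOrReuse` are hypothetical, and the parameterless `clone()` is a simplification of the three-argument signature above.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical base interface with clone support switched off by default.
class Join
{
public:
    virtual ~Join() = default;
    virtual std::string getName() const = 0;

    virtual bool isCloneSupported() const { return false; }

    virtual std::shared_ptr<Join> clone() const
    {
        throw std::logic_error("Clone method is not supported for " + getName());
    }
};

// An algorithm that opts in and rebuilds itself from its stored parameters.
class CloneableJoin : public Join
{
public:
    explicit CloneableJoin(int reserve_) : reserve(reserve_) {}

    std::string getName() const override { return "CloneableJoin"; }
    bool isCloneSupported() const override { return true; }
    std::shared_ptr<Join> clone() const override { return std::make_shared<CloneableJoin>(reserve); }

    int getReserve() const { return reserve; }

private:
    const int reserve;
};

// An algorithm that keeps the defaults.
class OpaqueJoin : public Join
{
public:
    std::string getName() const override { return "OpaqueJoin"; }
};

// Hypothetical caller: clone when possible, otherwise keep the original.
inline std::shared_ptr<Join> cloneOrReuse(const std::shared_ptr<Join> & join)
{
    return join->isCloneSupported() ? join->clone() : join;
}
```

Keeping the constructor arguments as `const` members, as the `HashJoin` hunk does with `reserve_num` and `instance_id`, is what makes it possible to build the copy from the original's parameters.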

View File

@ -27,7 +27,9 @@ public:
const StoragePtr & storage_,
const SelectQueryOptions & select_query_options_);
/// Initialize interpreter with query tree
/** Initialize interpreter with query tree.
* No query tree passes are applied.
*/
InterpreterSelectQueryAnalyzer(const QueryTreeNodePtr & query_tree_,
const ContextPtr & context_,
const SelectQueryOptions & select_query_options_);

Some files were not shown because too many files have changed in this diff