Merge remote-tracking branch 'rschu1ze/master' into punycode-revert-revert

This commit is contained in:
Robert Schulze 2024-01-07 08:08:46 +00:00
commit d54e500832
No known key found for this signature in database
GPG Key ID: 26703B55FB13728A
292 changed files with 4541 additions and 2424 deletions

View File

@ -22,7 +22,7 @@
* The MergeTree setting `clean_deleted_rows` is deprecated, it has no effect anymore. The `CLEANUP` keyword for the `OPTIMIZE` is not allowed by default (it can be unlocked with the `allow_experimental_replacing_merge_with_cleanup` setting). [#58267](https://github.com/ClickHouse/ClickHouse/pull/58267) ([Alexander Tokmakov](https://github.com/tavplubix)). This fixes [#57930](https://github.com/ClickHouse/ClickHouse/issues/57930). This closes [#54988](https://github.com/ClickHouse/ClickHouse/issues/54988). This closes [#54570](https://github.com/ClickHouse/ClickHouse/issues/54570). This closes [#50346](https://github.com/ClickHouse/ClickHouse/issues/50346). This closes [#47579](https://github.com/ClickHouse/ClickHouse/issues/47579). The feature has to be removed because it is not good. We have to remove it as quickly as possible, because there is no other option. [#57932](https://github.com/ClickHouse/ClickHouse/pull/57932) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### New Feature
* Implement Refreshable Materialized Views, requested in [#33919](https://github.com/ClickHouse/ClickHouse/issues/57995). [#56946](https://github.com/ClickHouse/ClickHouse/pull/56946) ([Michael Kolupaev](https://github.com/al13n321), [Michael Guzov](https://github.com/koloshmet)).
* Implement Refreshable Materialized Views, requested in [#33919](https://github.com/ClickHouse/ClickHouse/issues/33919). [#56946](https://github.com/ClickHouse/ClickHouse/pull/56946) ([Michael Kolupaev](https://github.com/al13n321), [Michael Guzov](https://github.com/koloshmet)).
* Introduce `PASTE JOIN`, which allows users to join tables without an `ON` clause, simply by row number. Example: `SELECT * FROM (SELECT number AS a FROM numbers(2)) AS t1 PASTE JOIN (SELECT number AS a FROM numbers(2) ORDER BY a DESC) AS t2` (see the sketch after this list). [#57995](https://github.com/ClickHouse/ClickHouse/pull/57995) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* The `ORDER BY` clause now supports specifying `ALL`, meaning that ClickHouse sorts by all columns in the `SELECT` clause. Example: `SELECT col1, col2 FROM tab WHERE [...] ORDER BY ALL`. [#57875](https://github.com/ClickHouse/ClickHouse/pull/57875) ([zhongyuankai](https://github.com/zhongyuankai)).
* Added a new mutation command `ALTER TABLE <table> APPLY DELETED MASK`, which enforces applying the mask written by lightweight delete and removes the rows marked as deleted from disk. [#57433](https://github.com/ClickHouse/ClickHouse/pull/57433) ([Anton Popov](https://github.com/CurtizJ)).
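A minimal sketch of the new syntax above (assumes a ClickHouse 23.12 server; the table name `tab` and column `id` are hypothetical):
```sql
-- PASTE JOIN: rows are matched by position, so no ON clause is needed.
SELECT *
FROM (SELECT number AS a FROM numbers(2)) AS t1
PASTE JOIN (SELECT number AS a FROM numbers(2) ORDER BY a DESC) AS t2;

-- ORDER BY ALL: sorts by every column in the SELECT clause.
SELECT number % 3 AS x, number AS y FROM numbers(6) ORDER BY ALL;

-- APPLY DELETED MASK: physically remove rows hidden by a lightweight delete.
DELETE FROM tab WHERE id < 100;      -- lightweight delete only writes a mask
ALTER TABLE tab APPLY DELETED MASK;  -- mutation that drops the masked rows from disk
```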
@ -375,6 +375,7 @@
* Do not interpret the `send_timeout` set on the client side as the `receive_timeout` on the server side and vice versa. [#56035](https://github.com/ClickHouse/ClickHouse/pull/56035) ([Azat Khuzhin](https://github.com/azat)).
* Comparison of time intervals with different units will throw an exception. This closes [#55942](https://github.com/ClickHouse/ClickHouse/issues/55942). You might occasionally have relied on the previous behavior, in which the underlying numeric values were compared regardless of the units. [#56090](https://github.com/ClickHouse/ClickHouse/pull/56090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rewrote the experimental `S3Queue` table engine completely: changed the way we keep information in ZooKeeper, which allows making fewer ZooKeeper requests; added caching of ZooKeeper state in cases when we know the state will not change; made the polling of S3 less aggressive; changed the way the TTL and the max set of tracked files are maintained, which is now a background process. Added `system.s3queue` and `system.s3queue_log` tables. Closes [#54998](https://github.com/ClickHouse/ClickHouse/issues/54998). [#54422](https://github.com/ClickHouse/ClickHouse/pull/54422) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Arbitrary paths on HTTP endpoint are no longer interpreted as a request to the `/query` endpoint. [#55521](https://github.com/ClickHouse/ClickHouse/pull/55521) ([Konstantin Bogdanov](https://github.com/thevar1able)).
#### New Feature
* Add function `arrayFold(accumulator, x1, ..., xn -> expression, initial, array1, ..., arrayn)` which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. [#49794](https://github.com/ClickHouse/ClickHouse/pull/49794) ([Lirikl](https://github.com/Lirikl)).
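For illustration, a minimal `arrayFold` call (a sketch; note that the initial accumulator is passed as the last argument here, and the lambda folds left to right starting from it, so the expected result is 3 + 2 + 4 + 6 + 8 = 23):
```sql
SELECT arrayFold((acc, x) -> acc + x * 2, [1, 2, 3, 4], toInt64(3)) AS res;
```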

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.12.1.1368"
ARG VERSION="23.12.2.59"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.12.1.1368"
ARG VERSION="23.12.2.59"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -30,7 +30,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="23.12.1.1368"
ARG VERSION="23.12.2.59"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set a non-empty deb_location_url to create a docker image

View File

@ -236,6 +236,10 @@ function check_logs_for_critical_errors()
&& echo -e "S3_ERROR No such key thrown (see clickhouse-server.log or no_such_key_errors.txt)$FAIL$(trim_server_logs no_such_key_errors.txt)" >> /test_output/test_results.tsv \
|| echo -e "No lost s3 keys$OK" >> /test_output/test_results.tsv
rg -Fa "it is lost forever" /var/log/clickhouse-server/clickhouse-server*.log | grep 'SharedMergeTreePartCheckThread' > /dev/null \
&& echo -e "Lost forever for SharedMergeTree$FAIL" >> /test_output/test_results.tsv \
|| echo -e "No SharedMergeTree lost forever in clickhouse-server.log$OK" >> /test_output/test_results.tsv
# Remove file no_such_key_errors.txt if it's empty
[ -s /test_output/no_such_key_errors.txt ] || rm /test_output/no_such_key_errors.txt

View File

@ -0,0 +1,51 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.10.6.60-stable (68907bbe643) FIXME as compared to v23.10.5.20-stable (e84001e5c61)
#### Improvement
* Backported in [#58493](https://github.com/ClickHouse/ClickHouse/issues/58493): Fix transformation of queries to MySQL-compatible syntax. Fixes [#57253](https://github.com/ClickHouse/ClickHouse/issues/57253). Fixes [#52654](https://github.com/ClickHouse/ClickHouse/issues/52654). Fixes [#56729](https://github.com/ClickHouse/ClickHouse/issues/56729). [#56456](https://github.com/ClickHouse/ClickHouse/pull/56456) ([flynn](https://github.com/ucasfl)).
* Backported in [#57659](https://github.com/ClickHouse/ClickHouse/issues/57659): Handle the SIGABRT case when getting the PostgreSQL table structure with an empty array. [#57618](https://github.com/ClickHouse/ClickHouse/pull/57618) ([Mike Kot (Михаил Кот)](https://github.com/myrrc)).
#### Build/Testing/Packaging Improvement
* Backported in [#57586](https://github.com/ClickHouse/ClickHouse/issues/57586): Fix issue caught in https://github.com/docker-library/official-images/pull/15846. [#57571](https://github.com/ClickHouse/ClickHouse/pull/57571) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix ALTER COLUMN with ALIAS [#56493](https://github.com/ClickHouse/ClickHouse/pull/56493) ([Nikolay Degterinsky](https://github.com/evillique)).
* Prevent incompatible ALTER of projection columns [#56948](https://github.com/ClickHouse/ClickHouse/pull/56948) ([Amos Bird](https://github.com/amosbird)).
* Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column [#57147](https://github.com/ClickHouse/ClickHouse/pull/57147) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix incorrect JOIN plan optimization with partially materialized normal projection [#57196](https://github.com/ClickHouse/ClickHouse/pull/57196) ([Amos Bird](https://github.com/amosbird)).
* Fix `ReadonlyReplica` metric for all cases [#57267](https://github.com/ClickHouse/ClickHouse/pull/57267) ([Antonio Andelic](https://github.com/antonio2368)).
* Background merges correctly use temporary data storage in the cache [#57275](https://github.com/ClickHouse/ClickHouse/pull/57275) ([vdimir](https://github.com/vdimir)).
* MergeTree mutations reuse source part index granularity [#57352](https://github.com/ClickHouse/ClickHouse/pull/57352) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix function jsonMergePatch for partially const columns [#57379](https://github.com/ClickHouse/ClickHouse/pull/57379) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)).
* bugfix: correctly parse SYSTEM STOP LISTEN TCP SECURE [#57483](https://github.com/ClickHouse/ClickHouse/pull/57483) ([joelynch](https://github.com/joelynch)).
* Ignore ON CLUSTER clause in grant/revoke queries for management of replicated access entities. [#57538](https://github.com/ClickHouse/ClickHouse/pull/57538) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Disable system.kafka_consumers by default (due to possible live memory leak) [#57822](https://github.com/ClickHouse/ClickHouse/pull/57822) ([Azat Khuzhin](https://github.com/azat)).
* Fix invalid memory access in BLAKE3 (Rust) [#57876](https://github.com/ClickHouse/ClickHouse/pull/57876) ([Raúl Marín](https://github.com/Algunenano)).
* Normalize function names in CREATE INDEX [#57906](https://github.com/ClickHouse/ClickHouse/pull/57906) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix invalid preprocessing on Keeper [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix Integer overflow in Poco::UTF32Encoding [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)).
* Remove parallel parsing for JSONCompactEachRow [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix parallel parsing for JSONCompactEachRow [#58250](https://github.com/ClickHouse/ClickHouse/pull/58250) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix lost blobs after dropping a replica with broken detached parts [#58333](https://github.com/ClickHouse/ClickHouse/pull/58333) ([Alexander Tokmakov](https://github.com/tavplubix)).
* MergeTreePrefetchedReadPool disable for LIMIT only queries [#58505](https://github.com/ClickHouse/ClickHouse/pull/58505) ([Maksim Kita](https://github.com/kitaisreal)).
#### NO CL CATEGORY
* Backported in [#57916](https://github.com/ClickHouse/ClickHouse/issues/57916). [#57909](https://github.com/ClickHouse/ClickHouse/pull/57909) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Pin alpine version of integration tests helper container [#57669](https://github.com/ClickHouse/ClickHouse/pull/57669) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove heavy rust stable toolchain [#57905](https://github.com/ClickHouse/ClickHouse/pull/57905) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix docker image for integration tests (fixes CI) [#57952](https://github.com/ClickHouse/ClickHouse/pull/57952) ([Azat Khuzhin](https://github.com/azat)).
* Fix test_user_valid_until [#58409](https://github.com/ClickHouse/ClickHouse/pull/58409) ([Nikolay Degterinsky](https://github.com/evillique)).

View File

@ -0,0 +1,26 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.11.4.24-stable (e79d840d7fe) FIXME as compared to v23.11.3.23-stable (a14ab450b0e)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)).
* Disable system.kafka_consumers by default (due to possible live memory leak) [#57822](https://github.com/ClickHouse/ClickHouse/pull/57822) ([Azat Khuzhin](https://github.com/azat)).
* Fix invalid preprocessing on Keeper [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix Integer overflow in Poco::UTF32Encoding [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)).
* Remove parallel parsing for JSONCompactEachRow [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix parallel parsing for JSONCompactEachRow [#58250](https://github.com/ClickHouse/ClickHouse/pull/58250) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix lost blobs after dropping a replica with broken detached parts [#58333](https://github.com/ClickHouse/ClickHouse/pull/58333) ([Alexander Tokmakov](https://github.com/tavplubix)).
* MergeTreePrefetchedReadPool disable for LIMIT only queries [#58505](https://github.com/ClickHouse/ClickHouse/pull/58505) ([Maksim Kita](https://github.com/kitaisreal)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Handle another case for preprocessing in Keeper [#58308](https://github.com/ClickHouse/ClickHouse/pull/58308) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix test_user_valid_until [#58409](https://github.com/ClickHouse/ClickHouse/pull/58409) ([Nikolay Degterinsky](https://github.com/evillique)).

View File

@ -0,0 +1,32 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.12.2.59-stable (17ab210e761) FIXME as compared to v23.12.1.1368-stable (a2faa65b080)
#### Backward Incompatible Change
* Backported in [#58389](https://github.com/ClickHouse/ClickHouse/issues/58389): The MergeTree setting `clean_deleted_rows` is deprecated, it has no effect anymore. The `CLEANUP` keyword for `OPTIMIZE` is not allowed by default (unless `allow_experimental_replacing_merge_with_cleanup` is enabled). [#58316](https://github.com/ClickHouse/ClickHouse/pull/58316) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix lost blobs after dropping a replica with broken detached parts [#58333](https://github.com/ClickHouse/ClickHouse/pull/58333) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix segfault when graphite table does not have agg function [#58453](https://github.com/ClickHouse/ClickHouse/pull/58453) ([Duc Canh Le](https://github.com/canhld94)).
* MergeTreePrefetchedReadPool disable for LIMIT only queries [#58505](https://github.com/ClickHouse/ClickHouse/pull/58505) ([Maksim Kita](https://github.com/kitaisreal)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Refreshable materialized views (takeover)"'. [#58296](https://github.com/ClickHouse/ClickHouse/pull/58296) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Fix an error in the release script - it didn't allow making 23.12. [#58288](https://github.com/ClickHouse/ClickHouse/pull/58288) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v23.12.1.1368-stable [#58290](https://github.com/ClickHouse/ClickHouse/pull/58290) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix test_storage_s3_queue/test.py::test_drop_table [#58293](https://github.com/ClickHouse/ClickHouse/pull/58293) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Handle another case for preprocessing in Keeper [#58308](https://github.com/ClickHouse/ClickHouse/pull/58308) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix test_user_valid_until [#58409](https://github.com/ClickHouse/ClickHouse/pull/58409) ([Nikolay Degterinsky](https://github.com/evillique)).

View File

@ -0,0 +1,36 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.3.19.32-lts (c4d4ca8ec02) FIXME as compared to v23.3.18.15-lts (7228475d77a)
#### Backward Incompatible Change
* Backported in [#57840](https://github.com/ClickHouse/ClickHouse/issues/57840): Remove function `arrayFold` because it has a bug. This closes [#57816](https://github.com/ClickHouse/ClickHouse/issues/57816). This closes [#57458](https://github.com/ClickHouse/ClickHouse/issues/57458). [#57836](https://github.com/ClickHouse/ClickHouse/pull/57836) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Improvement
* Backported in [#58489](https://github.com/ClickHouse/ClickHouse/issues/58489): Fix transformation of queries to MySQL-compatible syntax. Fixes [#57253](https://github.com/ClickHouse/ClickHouse/issues/57253). Fixes [#52654](https://github.com/ClickHouse/ClickHouse/issues/52654). Fixes [#56729](https://github.com/ClickHouse/ClickHouse/issues/56729). [#56456](https://github.com/ClickHouse/ClickHouse/pull/56456) ([flynn](https://github.com/ucasfl)).
* Backported in [#57653](https://github.com/ClickHouse/ClickHouse/issues/57653): Handle the SIGABRT case when getting the PostgreSQL table structure with an empty array. [#57618](https://github.com/ClickHouse/ClickHouse/pull/57618) ([Mike Kot (Михаил Кот)](https://github.com/myrrc)).
#### Build/Testing/Packaging Improvement
* Backported in [#57580](https://github.com/ClickHouse/ClickHouse/issues/57580): Fix issue caught in https://github.com/docker-library/official-images/pull/15846. [#57571](https://github.com/ClickHouse/ClickHouse/pull/57571) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Prevent incompatible ALTER of projection columns [#56948](https://github.com/ClickHouse/ClickHouse/pull/56948) ([Amos Bird](https://github.com/amosbird)).
* Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column [#57147](https://github.com/ClickHouse/ClickHouse/pull/57147) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix incorrect JOIN plan optimization with partially materialized normal projection [#57196](https://github.com/ClickHouse/ClickHouse/pull/57196) ([Amos Bird](https://github.com/amosbird)).
* MergeTree mutations reuse source part index granularity [#57352](https://github.com/ClickHouse/ClickHouse/pull/57352) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix invalid memory access in BLAKE3 (Rust) [#57876](https://github.com/ClickHouse/ClickHouse/pull/57876) ([Raúl Marín](https://github.com/Algunenano)).
* Normalize function names in CREATE INDEX [#57906](https://github.com/ClickHouse/ClickHouse/pull/57906) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix invalid preprocessing on Keeper [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix Integer overflow in Poco::UTF32Encoding [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)).
* Remove parallel parsing for JSONCompactEachRow [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Pin alpine version of integration tests helper container [#57669](https://github.com/ClickHouse/ClickHouse/pull/57669) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix docker image for integration tests (fixes CI) [#57952](https://github.com/ClickHouse/ClickHouse/pull/57952) ([Azat Khuzhin](https://github.com/azat)).

View File

@ -0,0 +1,47 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.8.9.54-lts (192a1d231fa) FIXME as compared to v23.8.8.20-lts (5e012a03bf2)
#### Improvement
* Backported in [#57668](https://github.com/ClickHouse/ClickHouse/issues/57668): Output valid JSON/XML on exception during HTTP query execution. Add setting `http_write_exception_in_output_format` to enable/disable this behaviour (enabled by default). [#52853](https://github.com/ClickHouse/ClickHouse/pull/52853) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#58491](https://github.com/ClickHouse/ClickHouse/issues/58491): Fix transformation of queries to MySQL-compatible syntax. Fixes [#57253](https://github.com/ClickHouse/ClickHouse/issues/57253). Fixes [#52654](https://github.com/ClickHouse/ClickHouse/issues/52654). Fixes [#56729](https://github.com/ClickHouse/ClickHouse/issues/56729). [#56456](https://github.com/ClickHouse/ClickHouse/pull/56456) ([flynn](https://github.com/ucasfl)).
* Backported in [#57238](https://github.com/ClickHouse/ClickHouse/issues/57238): Fetching a part now waits until that part is fully committed on the remote replica. It is better not to send a part in the PreActive state; in the case of zero-copy replication this is a mandatory restriction. [#56808](https://github.com/ClickHouse/ClickHouse/pull/56808) ([Sema Checherinda](https://github.com/CheSema)).
* Backported in [#57655](https://github.com/ClickHouse/ClickHouse/issues/57655): Handle the SIGABRT case when getting the PostgreSQL table structure with an empty array. [#57618](https://github.com/ClickHouse/ClickHouse/pull/57618) ([Mike Kot (Михаил Кот)](https://github.com/myrrc)).
#### Build/Testing/Packaging Improvement
* Backported in [#57582](https://github.com/ClickHouse/ClickHouse/issues/57582): Fix issue caught in https://github.com/docker-library/official-images/pull/15846. [#57571](https://github.com/ClickHouse/ClickHouse/pull/57571) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix ALTER COLUMN with ALIAS [#56493](https://github.com/ClickHouse/ClickHouse/pull/56493) ([Nikolay Degterinsky](https://github.com/evillique)).
* Prevent incompatible ALTER of projection columns [#56948](https://github.com/ClickHouse/ClickHouse/pull/56948) ([Amos Bird](https://github.com/amosbird)).
* Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column [#57147](https://github.com/ClickHouse/ClickHouse/pull/57147) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix incorrect JOIN plan optimization with partially materialized normal projection [#57196](https://github.com/ClickHouse/ClickHouse/pull/57196) ([Amos Bird](https://github.com/amosbird)).
* Fix `ReadonlyReplica` metric for all cases [#57267](https://github.com/ClickHouse/ClickHouse/pull/57267) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)).
* bugfix: correctly parse SYSTEM STOP LISTEN TCP SECURE [#57483](https://github.com/ClickHouse/ClickHouse/pull/57483) ([joelynch](https://github.com/joelynch)).
* Ignore ON CLUSTER clause in grant/revoke queries for management of replicated access entities. [#57538](https://github.com/ClickHouse/ClickHouse/pull/57538) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Disable system.kafka_consumers by default (due to possible live memory leak) [#57822](https://github.com/ClickHouse/ClickHouse/pull/57822) ([Azat Khuzhin](https://github.com/azat)).
* Fix invalid memory access in BLAKE3 (Rust) [#57876](https://github.com/ClickHouse/ClickHouse/pull/57876) ([Raúl Marín](https://github.com/Algunenano)).
* Normalize function names in CREATE INDEX [#57906](https://github.com/ClickHouse/ClickHouse/pull/57906) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix invalid preprocessing on Keeper [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix Integer overflow in Poco::UTF32Encoding [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)).
* Remove parallel parsing for JSONCompactEachRow [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix parallel parsing for JSONCompactEachRow [#58250](https://github.com/ClickHouse/ClickHouse/pull/58250) ([Kruglov Pavel](https://github.com/Avogar)).
#### NO CL ENTRY
* NO CL ENTRY: 'Update PeekableWriteBuffer.cpp'. [#57701](https://github.com/ClickHouse/ClickHouse/pull/57701) ([Kruglov Pavel](https://github.com/Avogar)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Pin alpine version of integration tests helper container [#57669](https://github.com/ClickHouse/ClickHouse/pull/57669) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove heavy rust stable toolchain [#57905](https://github.com/ClickHouse/ClickHouse/pull/57905) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix docker image for integration tests (fixes CI) [#57952](https://github.com/ClickHouse/ClickHouse/pull/57952) ([Azat Khuzhin](https://github.com/azat)).

View File

@ -1262,6 +1262,7 @@ SELECT * FROM json_each_row_nested
- [input_format_import_nested_json](/docs/en/operations/settings/settings-formats.md/#input_format_import_nested_json) - map nested JSON data to nested tables (it works for JSONEachRow format). Default value - `false`.
- [input_format_json_read_bools_as_numbers](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_bools_as_numbers) - allow to parse bools as numbers in JSON input formats. Default value - `true`.
- [input_format_json_read_bools_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_bools_as_strings) - allow to parse bools as strings in JSON input formats. Default value - `true`.
- [input_format_json_read_numbers_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_numbers_as_strings) - allow to parse numbers as strings in JSON input formats. Default value - `true`.
- [input_format_json_read_arrays_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_arrays_as_strings) - allow to parse JSON arrays as strings in JSON input formats. Default value - `true`.
- [input_format_json_read_objects_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_objects_as_strings) - allow to parse JSON objects as strings in JSON input formats. Default value - `true`.

View File

@ -614,6 +614,26 @@ DESC format(JSONEachRow, $$
└───────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### input_format_json_read_bools_as_strings
Enabling this setting allows reading Bool values as strings.
This setting is enabled by default.
**Example:**
```sql
SET input_format_json_read_bools_as_strings = 1;
DESC format(JSONEachRow, $$
{"value" : true}
{"value" : "Hello, World"}
$$)
```
```response
┌─name──┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ value │ Nullable(String) │ │ │ │ │ │
└───────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### input_format_json_read_arrays_as_strings
Enabling this setting allows reading JSON array values as strings.
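By analogy with the bools example above, a sketch of the expected behavior (assuming schema inference falls back to `Nullable(String)` when arrays may be read as strings):
```sql
SET input_format_json_read_arrays_as_strings = 1;
DESC format(JSONEachRow, $$
{"value" : [1, 2, 3]}
{"value" : "Hello, World"}
$$)
```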

View File

@ -377,6 +377,12 @@ Allow parsing bools as numbers in JSON input formats.
Enabled by default.
## input_format_json_read_bools_as_strings {#input_format_json_read_bools_as_strings}
Allow parsing bools as strings in JSON input formats.
Enabled by default.
## input_format_json_read_numbers_as_strings {#input_format_json_read_numbers_as_strings}
Allow parsing numbers as strings in JSON input formats.
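A sketch mirroring the examples in the formats documentation (assumed behavior: with the setting enabled, a field mixing numbers and strings is inferred as `Nullable(String)`):
```sql
SET input_format_json_read_numbers_as_strings = 1;
DESC format(JSONEachRow, $$
{"value" : 123}
{"value" : "Hello, World"}
$$)
```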

View File

@ -27,7 +27,7 @@ $ clickhouse-format --query "select number from numbers(10) where number%2 order
Result:
```sql
```bash
SELECT number
FROM numbers(10)
WHERE number % 2
@ -49,22 +49,20 @@ SELECT sum(number) FROM numbers(5)
3. Multiqueries:
```bash
$ clickhouse-format -n <<< "SELECT * FROM (SELECT 1 AS x UNION ALL SELECT 1 UNION DISTINCT SELECT 3);"
$ clickhouse-format -n <<< "SELECT min(number) FROM numbers(5); SELECT max(number) FROM numbers(5);"
```
Result:
```sql
SELECT *
FROM
(
SELECT 1 AS x
UNION ALL
SELECT 1
UNION DISTINCT
SELECT 3
)
```
SELECT min(number)
FROM numbers(5)
;
SELECT max(number)
FROM numbers(5)
;
```
4. Obfuscating:
@ -75,7 +73,7 @@ $ clickhouse-format --seed Hello --obfuscate <<< "SELECT cost_first_screen BETWE
Result:
```sql
```
SELECT treasury_mammoth_hazelnut BETWEEN nutmeg AND span, CASE WHEN chive >= 116 THEN switching ELSE ANYTHING END;
```
@ -87,7 +85,7 @@ $ clickhouse-format --seed World --obfuscate <<< "SELECT cost_first_screen BETWE
Result:
```sql
```
SELECT horse_tape_summer BETWEEN folklore AND moccasins, CASE WHEN intestine >= 116 THEN nonconformist ELSE FORESTRY END;
```
@ -99,7 +97,7 @@ $ clickhouse-format --backslash <<< "SELECT * FROM (SELECT 1 AS x UNION ALL SELE
Result:
```sql
```
SELECT * \
FROM \
( \

View File

@ -1779,7 +1779,9 @@ Result:
## sqid
Transforms numbers into YouTube-like short URL hash called [Sqid](https://sqids.org/).
Transforms numbers into a [Sqid](https://sqids.org/) which is a YouTube-like ID string.
The output alphabet is `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`.
Do not use this function for hashing - the generated IDs can be decoded back into numbers.
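A minimal usage sketch (the ID shown in the comment is illustrative; the actual output depends on the input numbers and library version):
```sql
SELECT sqid(1, 2, 3, 4, 5);
-- e.g. gXHfJ1C6dN
```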
**Syntax**

View File

@ -1260,11 +1260,11 @@ try
{
Settings::checkNoSettingNamesAtTopLevel(*config, config_path);
ServerSettings server_settings_;
server_settings_.loadSettingsFromConfig(*config);
ServerSettings new_server_settings;
new_server_settings.loadSettingsFromConfig(*config);
size_t max_server_memory_usage = server_settings_.max_server_memory_usage;
double max_server_memory_usage_to_ram_ratio = server_settings_.max_server_memory_usage_to_ram_ratio;
size_t max_server_memory_usage = new_server_settings.max_server_memory_usage;
double max_server_memory_usage_to_ram_ratio = new_server_settings.max_server_memory_usage_to_ram_ratio;
size_t current_physical_server_memory = getMemoryAmount(); /// With cgroups, the amount of memory available to the server can be changed dynamically.
size_t default_max_server_memory_usage = static_cast<size_t>(current_physical_server_memory * max_server_memory_usage_to_ram_ratio);
@ -1294,9 +1294,9 @@ try
total_memory_tracker.setDescription("(total)");
total_memory_tracker.setMetric(CurrentMetrics::MemoryTracking);
size_t merges_mutations_memory_usage_soft_limit = server_settings_.merges_mutations_memory_usage_soft_limit;
size_t merges_mutations_memory_usage_soft_limit = new_server_settings.merges_mutations_memory_usage_soft_limit;
size_t default_merges_mutations_server_memory_usage = static_cast<size_t>(current_physical_server_memory * server_settings_.merges_mutations_memory_usage_to_ram_ratio);
size_t default_merges_mutations_server_memory_usage = static_cast<size_t>(current_physical_server_memory * new_server_settings.merges_mutations_memory_usage_to_ram_ratio);
if (merges_mutations_memory_usage_soft_limit == 0)
{
merges_mutations_memory_usage_soft_limit = default_merges_mutations_server_memory_usage;
@ -1304,7 +1304,7 @@ try
" ({} available * {:.2f} merges_mutations_memory_usage_to_ram_ratio)",
formatReadableSizeWithBinarySuffix(merges_mutations_memory_usage_soft_limit),
formatReadableSizeWithBinarySuffix(current_physical_server_memory),
server_settings_.merges_mutations_memory_usage_to_ram_ratio);
new_server_settings.merges_mutations_memory_usage_to_ram_ratio);
}
else if (merges_mutations_memory_usage_soft_limit > default_merges_mutations_server_memory_usage)
{
@ -1313,7 +1313,7 @@ try
" ({} available * {:.2f} merges_mutations_memory_usage_to_ram_ratio)",
formatReadableSizeWithBinarySuffix(merges_mutations_memory_usage_soft_limit),
formatReadableSizeWithBinarySuffix(current_physical_server_memory),
server_settings_.merges_mutations_memory_usage_to_ram_ratio);
new_server_settings.merges_mutations_memory_usage_to_ram_ratio);
}
LOG_INFO(log, "Merges and mutations memory limit is set to {}",
@ -1322,7 +1322,7 @@ try
background_memory_tracker.setDescription("(background)");
background_memory_tracker.setMetric(CurrentMetrics::MergesMutationsMemoryTracking);
total_memory_tracker.setAllowUseJemallocMemory(server_settings_.allow_use_jemalloc_memory);
total_memory_tracker.setAllowUseJemallocMemory(new_server_settings.allow_use_jemalloc_memory);
auto * global_overcommit_tracker = global_context->getGlobalOvercommitTracker();
total_memory_tracker.setOvercommitTracker(global_overcommit_tracker);
@ -1346,26 +1346,26 @@ try
global_context->setRemoteHostFilter(*config);
global_context->setHTTPHeaderFilter(*config);
global_context->setMaxTableSizeToDrop(server_settings_.max_table_size_to_drop);
global_context->setMaxPartitionSizeToDrop(server_settings_.max_partition_size_to_drop);
global_context->setMaxTableNumToWarn(server_settings_.max_table_num_to_warn);
global_context->setMaxDatabaseNumToWarn(server_settings_.max_database_num_to_warn);
global_context->setMaxPartNumToWarn(server_settings_.max_part_num_to_warn);
global_context->setMaxTableSizeToDrop(new_server_settings.max_table_size_to_drop);
global_context->setMaxPartitionSizeToDrop(new_server_settings.max_partition_size_to_drop);
global_context->setMaxTableNumToWarn(new_server_settings.max_table_num_to_warn);
global_context->setMaxDatabaseNumToWarn(new_server_settings.max_database_num_to_warn);
global_context->setMaxPartNumToWarn(new_server_settings.max_part_num_to_warn);
ConcurrencyControl::SlotCount concurrent_threads_soft_limit = ConcurrencyControl::Unlimited;
if (server_settings_.concurrent_threads_soft_limit_num > 0 && server_settings_.concurrent_threads_soft_limit_num < concurrent_threads_soft_limit)
concurrent_threads_soft_limit = server_settings_.concurrent_threads_soft_limit_num;
if (server_settings_.concurrent_threads_soft_limit_ratio_to_cores > 0)
if (new_server_settings.concurrent_threads_soft_limit_num > 0 && new_server_settings.concurrent_threads_soft_limit_num < concurrent_threads_soft_limit)
concurrent_threads_soft_limit = new_server_settings.concurrent_threads_soft_limit_num;
if (new_server_settings.concurrent_threads_soft_limit_ratio_to_cores > 0)
{
auto value = server_settings_.concurrent_threads_soft_limit_ratio_to_cores * std::thread::hardware_concurrency();
auto value = new_server_settings.concurrent_threads_soft_limit_ratio_to_cores * std::thread::hardware_concurrency();
if (value > 0 && value < concurrent_threads_soft_limit)
concurrent_threads_soft_limit = value;
}
ConcurrencyControl::instance().setMaxConcurrency(concurrent_threads_soft_limit);
global_context->getProcessList().setMaxSize(server_settings_.max_concurrent_queries);
global_context->getProcessList().setMaxInsertQueriesAmount(server_settings_.max_concurrent_insert_queries);
global_context->getProcessList().setMaxSelectQueriesAmount(server_settings_.max_concurrent_select_queries);
global_context->getProcessList().setMaxSize(new_server_settings.max_concurrent_queries);
global_context->getProcessList().setMaxInsertQueriesAmount(new_server_settings.max_concurrent_insert_queries);
global_context->getProcessList().setMaxSelectQueriesAmount(new_server_settings.max_concurrent_select_queries);
if (config->has("keeper_server"))
global_context->updateKeeperConfiguration(*config);
@ -1376,68 +1376,68 @@ try
/// This is done for backward compatibility.
if (global_context->areBackgroundExecutorsInitialized())
{
auto new_pool_size = server_settings_.background_pool_size;
auto new_ratio = server_settings_.background_merges_mutations_concurrency_ratio;
auto new_pool_size = new_server_settings.background_pool_size;
auto new_ratio = new_server_settings.background_merges_mutations_concurrency_ratio;
global_context->getMergeMutateExecutor()->increaseThreadsAndMaxTasksCount(new_pool_size, static_cast<size_t>(new_pool_size * new_ratio));
global_context->getMergeMutateExecutor()->updateSchedulingPolicy(server_settings_.background_merges_mutations_scheduling_policy.toString());
global_context->getMergeMutateExecutor()->updateSchedulingPolicy(new_server_settings.background_merges_mutations_scheduling_policy.toString());
}
if (global_context->areBackgroundExecutorsInitialized())
{
auto new_pool_size = server_settings_.background_move_pool_size;
auto new_pool_size = new_server_settings.background_move_pool_size;
global_context->getMovesExecutor()->increaseThreadsAndMaxTasksCount(new_pool_size, new_pool_size);
}
if (global_context->areBackgroundExecutorsInitialized())
{
auto new_pool_size = server_settings_.background_fetches_pool_size;
auto new_pool_size = new_server_settings.background_fetches_pool_size;
global_context->getFetchesExecutor()->increaseThreadsAndMaxTasksCount(new_pool_size, new_pool_size);
}
if (global_context->areBackgroundExecutorsInitialized())
{
auto new_pool_size = server_settings_.background_common_pool_size;
auto new_pool_size = new_server_settings.background_common_pool_size;
global_context->getCommonExecutor()->increaseThreadsAndMaxTasksCount(new_pool_size, new_pool_size);
}
global_context->getBufferFlushSchedulePool().increaseThreadsCount(server_settings_.background_buffer_flush_schedule_pool_size);
global_context->getSchedulePool().increaseThreadsCount(server_settings_.background_schedule_pool_size);
global_context->getMessageBrokerSchedulePool().increaseThreadsCount(server_settings_.background_message_broker_schedule_pool_size);
global_context->getDistributedSchedulePool().increaseThreadsCount(server_settings_.background_distributed_schedule_pool_size);
global_context->getBufferFlushSchedulePool().increaseThreadsCount(new_server_settings.background_buffer_flush_schedule_pool_size);
global_context->getSchedulePool().increaseThreadsCount(new_server_settings.background_schedule_pool_size);
global_context->getMessageBrokerSchedulePool().increaseThreadsCount(new_server_settings.background_message_broker_schedule_pool_size);
global_context->getDistributedSchedulePool().increaseThreadsCount(new_server_settings.background_distributed_schedule_pool_size);
global_context->getAsyncLoader().setMaxThreads(TablesLoaderForegroundPoolId, server_settings_.tables_loader_foreground_pool_size);
global_context->getAsyncLoader().setMaxThreads(TablesLoaderBackgroundLoadPoolId, server_settings_.tables_loader_background_pool_size);
global_context->getAsyncLoader().setMaxThreads(TablesLoaderBackgroundStartupPoolId, server_settings_.tables_loader_background_pool_size);
global_context->getAsyncLoader().setMaxThreads(TablesLoaderForegroundPoolId, new_server_settings.tables_loader_foreground_pool_size);
global_context->getAsyncLoader().setMaxThreads(TablesLoaderBackgroundLoadPoolId, new_server_settings.tables_loader_background_pool_size);
global_context->getAsyncLoader().setMaxThreads(TablesLoaderBackgroundStartupPoolId, new_server_settings.tables_loader_background_pool_size);
getIOThreadPool().reloadConfiguration(
server_settings.max_io_thread_pool_size,
server_settings.max_io_thread_pool_free_size,
server_settings.io_thread_pool_queue_size);
new_server_settings.max_io_thread_pool_size,
new_server_settings.max_io_thread_pool_free_size,
new_server_settings.io_thread_pool_queue_size);
getBackupsIOThreadPool().reloadConfiguration(
server_settings.max_backups_io_thread_pool_size,
server_settings.max_backups_io_thread_pool_free_size,
server_settings.backups_io_thread_pool_queue_size);
new_server_settings.max_backups_io_thread_pool_size,
new_server_settings.max_backups_io_thread_pool_free_size,
new_server_settings.backups_io_thread_pool_queue_size);
getActivePartsLoadingThreadPool().reloadConfiguration(
server_settings.max_active_parts_loading_thread_pool_size,
new_server_settings.max_active_parts_loading_thread_pool_size,
0, // We don't need any threads once all the parts are loaded
server_settings.max_active_parts_loading_thread_pool_size);
new_server_settings.max_active_parts_loading_thread_pool_size);
getOutdatedPartsLoadingThreadPool().reloadConfiguration(
server_settings.max_outdated_parts_loading_thread_pool_size,
new_server_settings.max_outdated_parts_loading_thread_pool_size,
0, // We don't need any threads once all the parts are loaded
server_settings.max_outdated_parts_loading_thread_pool_size);
new_server_settings.max_outdated_parts_loading_thread_pool_size);
/// It could grow if we need to synchronously wait until all the data parts are loaded.
getOutdatedPartsLoadingThreadPool().setMaxTurboThreads(
server_settings.max_active_parts_loading_thread_pool_size
new_server_settings.max_active_parts_loading_thread_pool_size
);
getPartsCleaningThreadPool().reloadConfiguration(
server_settings.max_parts_cleaning_thread_pool_size,
new_server_settings.max_parts_cleaning_thread_pool_size,
0, // We don't need any threads once all the parts are deleted
server_settings.max_parts_cleaning_thread_pool_size);
new_server_settings.max_parts_cleaning_thread_pool_size);
if (config->has("resources"))
{

View File

@ -140,8 +140,7 @@ void SettingsProfilesCache::mergeSettingsAndConstraintsFor(EnabledSettings & ena
auto info = std::make_shared<SettingsProfilesInfo>(access_control);
info->profiles = merged_settings.toProfileIDs();
substituteProfiles(merged_settings, info->profiles_with_implicit, info->names_of_profiles);
substituteProfiles(merged_settings, info->profiles, info->profiles_with_implicit, info->names_of_profiles);
info->settings = merged_settings.toSettingsChanges();
info->constraints = merged_settings.toSettingsConstraints(access_control);
@ -152,9 +151,12 @@ void SettingsProfilesCache::mergeSettingsAndConstraintsFor(EnabledSettings & ena
void SettingsProfilesCache::substituteProfiles(
SettingsProfileElements & elements,
std::vector<UUID> & profiles,
std::vector<UUID> & substituted_profiles,
std::unordered_map<UUID, String> & names_of_substituted_profiles) const
{
profiles = elements.toProfileIDs();
/// We should substitute profiles in reverse order because the same profile can occur
/// in `elements` multiple times (with some other settings in between) and in this case
/// the last occurrence should override all the previous ones.
@ -184,6 +186,11 @@ void SettingsProfilesCache::substituteProfiles(
names_of_substituted_profiles.emplace(profile_id, profile->getName());
}
std::reverse(substituted_profiles.begin(), substituted_profiles.end());
std::erase_if(profiles, [&substituted_profiles_set](const UUID & profile_id)
{
return !substituted_profiles_set.contains(profile_id);
});
}
std::shared_ptr<const EnabledSettings> SettingsProfilesCache::getEnabledSettings(
@ -225,13 +232,13 @@ std::shared_ptr<const SettingsProfilesInfo> SettingsProfilesCache::getSettingsPr
if (auto pos = this->profile_infos_cache.get(profile_id))
return *pos;
SettingsProfileElements elements = all_profiles[profile_id]->elements;
SettingsProfileElements elements;
auto & element = elements.emplace_back();
element.parent_profile = profile_id;
auto info = std::make_shared<SettingsProfilesInfo>(access_control);
info->profiles.push_back(profile_id);
info->profiles_with_implicit.push_back(profile_id);
substituteProfiles(elements, info->profiles_with_implicit, info->names_of_profiles);
substituteProfiles(elements, info->profiles, info->profiles_with_implicit, info->names_of_profiles);
info->settings = elements.toSettingsChanges();
info->constraints.merge(elements.toSettingsConstraints(access_control));

View File

@ -37,7 +37,11 @@ private:
void profileRemoved(const UUID & profile_id);
void mergeSettingsAndConstraints();
void mergeSettingsAndConstraintsFor(EnabledSettings & enabled) const;
void substituteProfiles(SettingsProfileElements & elements, std::vector<UUID> & substituted_profiles, std::unordered_map<UUID, String> & names_of_substituted_profiles) const;
void substituteProfiles(SettingsProfileElements & elements,
std::vector<UUID> & profiles,
std::vector<UUID> & substituted_profiles,
std::unordered_map<UUID, String> & names_of_substituted_profiles) const;
const AccessControl & access_control;
std::unordered_map<UUID, SettingsProfilePtr> all_profiles;

View File

@ -1,7 +1,8 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/FactoryHelpers.h>
#include <AggregateFunctions/HelpersMinMaxAny.h>
#include <AggregateFunctions/findNumeric.h>
#include <Common/Concepts.h>
#include <Common/findExtreme.h>
namespace DB
{
@ -19,7 +20,7 @@ public:
explicit AggregateFunctionsSingleValueMax(const DataTypePtr & type) : Parent(type) { }
/// Specializations for native numeric types
ALWAYS_INLINE inline void addBatchSinglePlace(
void addBatchSinglePlace(
size_t row_begin,
size_t row_end,
AggregateDataPtr __restrict place,
@ -27,7 +28,7 @@ public:
Arena * arena,
ssize_t if_argument_pos) const override;
ALWAYS_INLINE inline void addBatchSinglePlaceNotNull(
void addBatchSinglePlaceNotNull(
size_t row_begin,
size_t row_end,
AggregateDataPtr __restrict place,
@ -53,10 +54,10 @@ void AggregateFunctionsSingleValueMax<typename DB::AggregateFunctionMaxData<Sing
if (if_argument_pos >= 0) \
{ \
const auto & flags = assert_cast<const ColumnUInt8 &>(*columns[if_argument_pos]).getData(); \
opt = findNumericMaxIf(column.getData().data(), flags.data(), row_begin, row_end); \
opt = findExtremeMaxIf(column.getData().data(), flags.data(), row_begin, row_end); \
} \
else \
opt = findNumericMax(column.getData().data(), row_begin, row_end); \
opt = findExtremeMax(column.getData().data(), row_begin, row_end); \
if (opt.has_value()) \
this->data(place).changeIfGreater(opt.value()); \
}
@ -74,7 +75,57 @@ void AggregateFunctionsSingleValueMax<Data>::addBatchSinglePlace(
Arena * arena,
ssize_t if_argument_pos) const
{
return Parent::addBatchSinglePlace(row_begin, row_end, place, columns, arena, if_argument_pos);
if constexpr (!is_any_of<typename Data::Impl, SingleValueDataString, SingleValueDataGeneric>)
{
/// Leave other numeric types (large integers, decimals, etc) to keep doing the comparison as it's
/// faster than doing a permutation
return Parent::addBatchSinglePlace(row_begin, row_end, place, columns, arena, if_argument_pos);
}
constexpr int nan_direction_hint = 1;
auto const & column = *columns[0];
if (if_argument_pos >= 0)
{
size_t index = row_begin;
const auto & if_flags = assert_cast<const ColumnUInt8 &>(*columns[if_argument_pos]).getData();
while (index < row_end && if_flags[index] == 0)
index++;
if (index >= row_end)
return;
for (size_t i = index + 1; i < row_end; i++)
{
if ((if_flags[i] != 0) && (column.compareAt(i, index, column, nan_direction_hint) > 0))
index = i;
}
this->data(place).changeIfGreater(column, index, arena);
}
else
{
if (row_begin >= row_end)
return;
/// TODO: Introduce row_begin and row_end to getPermutation
if (row_begin != 0 || row_end != column.size())
{
size_t index = row_begin;
for (size_t i = index + 1; i < row_end; i++)
{
if (column.compareAt(i, index, column, nan_direction_hint) > 0)
index = i;
}
this->data(place).changeIfGreater(column, index, arena);
}
else
{
constexpr IColumn::PermutationSortDirection direction = IColumn::PermutationSortDirection::Descending;
constexpr IColumn::PermutationSortStability stability = IColumn::PermutationSortStability::Unstable;
IColumn::Permutation permutation;
constexpr UInt64 limit = 1;
column.getPermutation(direction, stability, limit, nan_direction_hint, permutation);
this->data(place).changeIfGreater(column, permutation[0], arena);
}
}
}
// NOLINTBEGIN(bugprone-macro-parentheses)
@ -97,10 +148,10 @@ void AggregateFunctionsSingleValueMax<typename DB::AggregateFunctionMaxData<Sing
auto final_flags = std::make_unique<UInt8[]>(row_end); \
for (size_t i = row_begin; i < row_end; ++i) \
final_flags[i] = (!null_map[i]) & !!if_flags[i]; \
opt = findNumericMaxIf(column.getData().data(), final_flags.get(), row_begin, row_end); \
opt = findExtremeMaxIf(column.getData().data(), final_flags.get(), row_begin, row_end); \
} \
else \
opt = findNumericMaxNotNull(column.getData().data(), null_map, row_begin, row_end); \
opt = findExtremeMaxNotNull(column.getData().data(), null_map, row_begin, row_end); \
if (opt.has_value()) \
this->data(place).changeIfGreater(opt.value()); \
}
@ -119,7 +170,46 @@ void AggregateFunctionsSingleValueMax<Data>::addBatchSinglePlaceNotNull(
Arena * arena,
ssize_t if_argument_pos) const
{
return Parent::addBatchSinglePlaceNotNull(row_begin, row_end, place, columns, null_map, arena, if_argument_pos);
if constexpr (!is_any_of<typename Data::Impl, SingleValueDataString, SingleValueDataGeneric>)
{
/// Leave other numeric types (large integers, decimals, etc) to keep doing the comparison as it's
/// faster than doing a permutation
return Parent::addBatchSinglePlaceNotNull(row_begin, row_end, place, columns, null_map, arena, if_argument_pos);
}
constexpr int nan_direction_hint = 1;
auto const & column = *columns[0];
if (if_argument_pos >= 0)
{
size_t index = row_begin;
const auto & if_flags = assert_cast<const ColumnUInt8 &>(*columns[if_argument_pos]).getData();
while (index < row_end && (if_flags[index] == 0 || null_map[index] != 0))
index++;
if (index >= row_end)
return;
for (size_t i = index + 1; i < row_end; i++)
{
if ((if_flags[i] != 0) && (null_map[i] == 0) && (column.compareAt(i, index, column, nan_direction_hint) > 0))
index = i;
}
this->data(place).changeIfGreater(column, index, arena);
}
else
{
size_t index = row_begin;
while (index < row_end && null_map[index] != 0)
index++;
if (index >= row_end)
return;
for (size_t i = index + 1; i < row_end; i++)
{
if ((null_map[i] == 0) && (column.compareAt(i, index, column, nan_direction_hint) > 0))
index = i;
}
this->data(place).changeIfGreater(column, index, arena);
}
}
AggregateFunctionPtr createAggregateFunctionMax(

View File

@ -1,7 +1,8 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/FactoryHelpers.h>
#include <AggregateFunctions/HelpersMinMaxAny.h>
#include <AggregateFunctions/findNumeric.h>
#include <Common/Concepts.h>
#include <Common/findExtreme.h>
namespace DB
@ -20,7 +21,7 @@ public:
explicit AggregateFunctionsSingleValueMin(const DataTypePtr & type) : Parent(type) { }
/// Specializations for native numeric types
ALWAYS_INLINE inline void addBatchSinglePlace(
void addBatchSinglePlace(
size_t row_begin,
size_t row_end,
AggregateDataPtr __restrict place,
@ -28,7 +29,7 @@ public:
Arena * arena,
ssize_t if_argument_pos) const override;
ALWAYS_INLINE inline void addBatchSinglePlaceNotNull(
void addBatchSinglePlaceNotNull(
size_t row_begin,
size_t row_end,
AggregateDataPtr __restrict place,
@ -54,10 +55,10 @@ public:
if (if_argument_pos >= 0) \
{ \
const auto & flags = assert_cast<const ColumnUInt8 &>(*columns[if_argument_pos]).getData(); \
opt = findNumericMinIf(column.getData().data(), flags.data(), row_begin, row_end); \
opt = findExtremeMinIf(column.getData().data(), flags.data(), row_begin, row_end); \
} \
else \
opt = findNumericMin(column.getData().data(), row_begin, row_end); \
opt = findExtremeMin(column.getData().data(), row_begin, row_end); \
if (opt.has_value()) \
this->data(place).changeIfLess(opt.value()); \
}
@ -75,7 +76,57 @@ void AggregateFunctionsSingleValueMin<Data>::addBatchSinglePlace(
Arena * arena,
ssize_t if_argument_pos) const
{
return Parent::addBatchSinglePlace(row_begin, row_end, place, columns, arena, if_argument_pos);
if constexpr (!is_any_of<typename Data::Impl, SingleValueDataString, SingleValueDataGeneric>)
{
/// Leave other numeric types (large integers, decimals, etc) to keep doing the comparison as it's
/// faster than doing a permutation
return Parent::addBatchSinglePlace(row_begin, row_end, place, columns, arena, if_argument_pos);
}
constexpr int nan_direction_hint = 1;
auto const & column = *columns[0];
if (if_argument_pos >= 0)
{
size_t index = row_begin;
const auto & if_flags = assert_cast<const ColumnUInt8 &>(*columns[if_argument_pos]).getData();
while (index < row_end && if_flags[index] == 0)
index++;
if (index >= row_end)
return;
for (size_t i = index + 1; i < row_end; i++)
{
if ((if_flags[i] != 0) && (column.compareAt(i, index, column, nan_direction_hint) < 0))
index = i;
}
this->data(place).changeIfLess(column, index, arena);
}
else
{
if (row_begin >= row_end)
return;
/// TODO: Introduce row_begin and row_end to getPermutation
if (row_begin != 0 || row_end != column.size())
{
size_t index = row_begin;
for (size_t i = index + 1; i < row_end; i++)
{
if (column.compareAt(i, index, column, nan_direction_hint) < 0)
index = i;
}
this->data(place).changeIfLess(column, index, arena);
}
else
{
constexpr IColumn::PermutationSortDirection direction = IColumn::PermutationSortDirection::Ascending;
constexpr IColumn::PermutationSortStability stability = IColumn::PermutationSortStability::Unstable;
IColumn::Permutation permutation;
constexpr UInt64 limit = 1;
column.getPermutation(direction, stability, limit, nan_direction_hint, permutation);
this->data(place).changeIfLess(column, permutation[0], arena);
}
}
}
// NOLINTBEGIN(bugprone-macro-parentheses)
@ -98,10 +149,10 @@ void AggregateFunctionsSingleValueMin<Data>::addBatchSinglePlace(
auto final_flags = std::make_unique<UInt8[]>(row_end); \
for (size_t i = row_begin; i < row_end; ++i) \
final_flags[i] = (!null_map[i]) & !!if_flags[i]; \
opt = findNumericMinIf(column.getData().data(), final_flags.get(), row_begin, row_end); \
opt = findExtremeMinIf(column.getData().data(), final_flags.get(), row_begin, row_end); \
} \
else \
opt = findNumericMinNotNull(column.getData().data(), null_map, row_begin, row_end); \
opt = findExtremeMinNotNull(column.getData().data(), null_map, row_begin, row_end); \
if (opt.has_value()) \
this->data(place).changeIfLess(opt.value()); \
}
@ -120,7 +171,46 @@ void AggregateFunctionsSingleValueMin<Data>::addBatchSinglePlaceNotNull(
Arena * arena,
ssize_t if_argument_pos) const
{
return Parent::addBatchSinglePlaceNotNull(row_begin, row_end, place, columns, null_map, arena, if_argument_pos);
if constexpr (!is_any_of<typename Data::Impl, SingleValueDataString, SingleValueDataGeneric>)
{
/// Leave other numeric types (large integers, decimals, etc) to keep doing the comparison as it's
/// faster than doing a permutation
return Parent::addBatchSinglePlaceNotNull(row_begin, row_end, place, columns, null_map, arena, if_argument_pos);
}
constexpr int nan_direction_hint = 1;
auto const & column = *columns[0];
if (if_argument_pos >= 0)
{
size_t index = row_begin;
const auto & if_flags = assert_cast<const ColumnUInt8 &>(*columns[if_argument_pos]).getData();
while (index < row_end && (if_flags[index] == 0 || null_map[index] != 0))
index++;
if (index >= row_end)
return;
for (size_t i = index + 1; i < row_end; i++)
{
if ((if_flags[i] != 0) && (null_map[i] == 0) && (column.compareAt(i, index, column, nan_direction_hint) < 0))
index = i;
}
this->data(place).changeIfLess(column, index, arena);
}
else
{
size_t index = row_begin;
while (index < row_end && null_map[index] != 0)
index++;
if (index >= row_end)
return;
for (size_t i = index + 1; i < row_end; i++)
{
if ((null_map[i] == 0) && (column.compareAt(i, index, column, nan_direction_hint) < 0))
index = i;
}
this->data(place).changeIfLess(column, index, arena);
}
}
AggregateFunctionPtr createAggregateFunctionMin(

View File

@ -965,6 +965,7 @@ template <typename Data>
struct AggregateFunctionMinData : Data
{
using Self = AggregateFunctionMinData;
using Impl = Data;
bool changeIfBetter(const IColumn & column, size_t row_num, Arena * arena) { return this->changeIfLess(column, row_num, arena); }
bool changeIfBetter(const Self & to, Arena * arena) { return this->changeIfLess(to, arena); }
@ -993,6 +994,7 @@ template <typename Data>
struct AggregateFunctionMaxData : Data
{
using Self = AggregateFunctionMaxData;
using Impl = Data;
bool changeIfBetter(const IColumn & column, size_t row_num, Arena * arena) { return this->changeIfGreater(column, row_num, arena); }
bool changeIfBetter(const Self & to, Arena * arena) { return this->changeIfGreater(to, arena); }

View File

@ -1,15 +0,0 @@
#include <AggregateFunctions/findNumeric.h>
namespace DB
{
#define INSTANTIATION(T) \
template std::optional<T> findNumericMin(const T * __restrict ptr, size_t start, size_t end); \
template std::optional<T> findNumericMinNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
template std::optional<T> findNumericMinIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
template std::optional<T> findNumericMax(const T * __restrict ptr, size_t start, size_t end); \
template std::optional<T> findNumericMaxNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
template std::optional<T> findNumericMaxIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
FOR_BASIC_NUMERIC_TYPES(INSTANTIATION)
#undef INSTANTIATION
}

View File

@ -143,9 +143,17 @@ public:
return alias;
}
const String & getOriginalAlias() const
{
return original_alias.empty() ? alias : original_alias;
}
/// Set node alias
void setAlias(String alias_value)
{
if (original_alias.empty())
original_alias = std::move(alias);
alias = std::move(alias_value);
}
@ -276,6 +284,9 @@ protected:
private:
String alias;
/// An alias from query. Alias can be replaced by query passes,
/// but we need to keep the original one to support additional_table_filters.
String original_alias;
ASTPtr original_ast;
};

View File

@ -52,6 +52,7 @@
#include <Processors/Executors/PullingAsyncPipelineExecutor.h>
#include <Analyzer/createUniqueTableAliases.h>
#include <Analyzer/Utils.h>
#include <Analyzer/SetUtils.h>
#include <Analyzer/AggregationUtils.h>
@ -1198,7 +1199,7 @@ private:
static void mergeWindowWithParentWindow(const QueryTreeNodePtr & window_node, const QueryTreeNodePtr & parent_window_node, IdentifierResolveScope & scope);
static void replaceNodesWithPositionalArguments(QueryTreeNodePtr & node_list, const QueryTreeNodes & projection_nodes, IdentifierResolveScope & scope);
void replaceNodesWithPositionalArguments(QueryTreeNodePtr & node_list, const QueryTreeNodes & projection_nodes, IdentifierResolveScope & scope);
static void convertLimitOffsetExpression(QueryTreeNodePtr & expression_node, const String & expression_description, IdentifierResolveScope & scope);
@ -2168,7 +2169,12 @@ void QueryAnalyzer::replaceNodesWithPositionalArguments(QueryTreeNodePtr & node_
scope.scope_node->formatASTForErrorMessage());
--positional_argument_number;
*node_to_replace = projection_nodes[positional_argument_number];
*node_to_replace = projection_nodes[positional_argument_number]->clone();
if (auto it = resolved_expressions.find(projection_nodes[positional_argument_number]);
it != resolved_expressions.end())
{
resolved_expressions[*node_to_replace] = it->second;
}
}
}
@ -7366,6 +7372,7 @@ void QueryAnalysisPass::run(QueryTreeNodePtr query_tree_node, ContextPtr context
{
QueryAnalyzer analyzer;
analyzer.resolve(query_tree_node, table_expression, context);
createUniqueTableAliases(query_tree_node, table_expression, context);
}
}

View File

@ -326,7 +326,7 @@ void addTableExpressionOrJoinIntoTablesInSelectQuery(ASTPtr & tables_in_select_q
}
}
QueryTreeNodes extractTableExpressions(const QueryTreeNodePtr & join_tree_node)
QueryTreeNodes extractTableExpressions(const QueryTreeNodePtr & join_tree_node, bool add_array_join)
{
QueryTreeNodes result;
@ -357,6 +357,8 @@ QueryTreeNodes extractTableExpressions(const QueryTreeNodePtr & join_tree_node)
{
auto & array_join_node = node_to_process->as<ArrayJoinNode &>();
nodes_to_process.push_front(array_join_node.getTableExpression());
if (add_array_join)
result.push_back(std::move(node_to_process));
break;
}
case QueryTreeNodeType::JOIN:

View File

@ -51,7 +51,7 @@ std::optional<bool> tryExtractConstantFromConditionNode(const QueryTreeNodePtr &
void addTableExpressionOrJoinIntoTablesInSelectQuery(ASTPtr & tables_in_select_query_ast, const QueryTreeNodePtr & table_expression, const IQueryTreeNode::ConvertToASTOptions & convert_to_ast_options);
/// Extract table, table function, query, union from join tree
QueryTreeNodes extractTableExpressions(const QueryTreeNodePtr & join_tree_node);
QueryTreeNodes extractTableExpressions(const QueryTreeNodePtr & join_tree_node, bool add_array_join = false);
/// Extract left table expression from join tree
QueryTreeNodePtr extractLeftTableExpression(const QueryTreeNodePtr & join_tree_node);

View File

@ -0,0 +1,141 @@
#include <memory>
#include <unordered_map>
#include <Analyzer/createUniqueTableAliases.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/LambdaNode.h>
#include <Analyzer/Utils.h>
namespace DB
{
namespace
{
class CreateUniqueTableAliasesVisitor : public InDepthQueryTreeVisitorWithContext<CreateUniqueTableAliasesVisitor>
{
public:
using Base = InDepthQueryTreeVisitorWithContext<CreateUniqueTableAliasesVisitor>;
explicit CreateUniqueTableAliasesVisitor(const ContextPtr & context)
: Base(context)
{
// Insert a fake node on top of the stack.
scope_nodes_stack.push_back(std::make_shared<LambdaNode>(Names{}, nullptr));
}
void enterImpl(QueryTreeNodePtr & node)
{
auto node_type = node->getNodeType();
switch (node_type)
{
case QueryTreeNodeType::QUERY:
[[fallthrough]];
case QueryTreeNodeType::UNION:
{
/// Queries like `(SELECT 1) as t` have invalid syntax. To avoid creating such queries (e.g. in StorageDistributed)
/// we need to remove aliases for top level queries.
/// N.B. Subquery depth starts counting from 1, so the following condition checks whether it's a top-level query.
if (getSubqueryDepth() == 1)
{
node->removeAlias();
break;
}
[[fallthrough]];
}
case QueryTreeNodeType::TABLE:
[[fallthrough]];
case QueryTreeNodeType::TABLE_FUNCTION:
[[fallthrough]];
case QueryTreeNodeType::ARRAY_JOIN:
{
auto & alias = table_expression_to_alias[node];
if (alias.empty())
{
scope_to_nodes_with_aliases[scope_nodes_stack.back()].push_back(node);
alias = fmt::format("__table{}", ++next_id);
node->setAlias(alias);
}
break;
}
default:
break;
}
switch (node_type)
{
case QueryTreeNodeType::QUERY:
[[fallthrough]];
case QueryTreeNodeType::UNION:
[[fallthrough]];
case QueryTreeNodeType::LAMBDA:
scope_nodes_stack.push_back(node);
break;
default:
break;
}
}
void leaveImpl(QueryTreeNodePtr & node)
{
if (scope_nodes_stack.back() == node)
{
if (auto it = scope_to_nodes_with_aliases.find(scope_nodes_stack.back());
it != scope_to_nodes_with_aliases.end())
{
for (const auto & node_with_alias : it->second)
{
table_expression_to_alias.erase(node_with_alias);
}
scope_to_nodes_with_aliases.erase(it);
}
scope_nodes_stack.pop_back();
}
/// Here we revisit subquery for IN function. Reasons:
/// * For remote query execution, query tree may be traversed a few times.
/// In such a case, it is possible to get AST like
/// `IN ((SELECT ... FROM table AS __table4) AS __table1)` which results in
/// a `Multiple expressions for the alias` exception
/// * Tables in subqueries could have different aliases => different tree hashes,
/// which is important to be able to find a set in PreparedSets
/// See 01253_subquery_in_aggregate_function_JustStranger.
///
/// So, we revisit this subquery to make aliases stable.
/// This should be safe because columns from the IN subquery can't be used in the main query anyway.
if (node->getNodeType() == QueryTreeNodeType::FUNCTION)
{
auto * function_node = node->as<FunctionNode>();
if (isNameOfInFunction(function_node->getFunctionName()))
{
auto arg = function_node->getArguments().getNodes().back();
/// Avoid aliasing IN `table`
if (arg->getNodeType() != QueryTreeNodeType::TABLE)
CreateUniqueTableAliasesVisitor(getContext()).visit(function_node->getArguments().getNodes().back());
}
}
}
private:
size_t next_id = 0;
// Stack of nodes which create scopes: QUERY, UNION and LAMBDA.
QueryTreeNodes scope_nodes_stack;
std::unordered_map<QueryTreeNodePtr, QueryTreeNodes> scope_to_nodes_with_aliases;
// We need to use raw pointer as a key, not a QueryTreeNodePtrWithHash.
std::unordered_map<QueryTreeNodePtr, String> table_expression_to_alias;
};
}
void createUniqueTableAliases(QueryTreeNodePtr & node, const QueryTreeNodePtr & /*table_expression*/, const ContextPtr & context)
{
CreateUniqueTableAliasesVisitor(context).visit(node);
}
}
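Illustratively (an assumed example, not taken from the patch): after this pass a query tree for `SELECT * FROM t1, t2` serializes as `SELECT * FROM t1 AS __table1, t2 AS __table2`, so repeated traversals of the same tree always yield identical aliases and therefore identical ASTs.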

View File

@ -0,0 +1,18 @@
#pragma once
#include <memory>
#include <Interpreters/Context_fwd.h>
class IQueryTreeNode;
using QueryTreeNodePtr = std::shared_ptr<IQueryTreeNode>;
namespace DB
{
/*
* For each table expression in the Query Tree generate and add a unique alias.
* If table expression had an alias in initial query tree, override it.
*/
void createUniqueTableAliases(QueryTreeNodePtr & node, const QueryTreeNodePtr & table_expression, const ContextPtr & context);
}

View File

@ -573,11 +573,12 @@ void RestorerFromBackup::createDatabase(const String & database_name) const
create_database_query->if_not_exists = (restore_settings.create_table == RestoreTableCreationMode::kCreateIfNotExists);
LOG_TRACE(log, "Creating database {}: {}", backQuoteIfNeed(database_name), serializeAST(*create_database_query));
auto query_context = Context::createCopy(context);
query_context->setSetting("allow_deprecated_database_ordinary", 1);
try
{
/// Execute CREATE DATABASE query.
InterpreterCreateQuery interpreter{create_database_query, context};
InterpreterCreateQuery interpreter{create_database_query, query_context};
interpreter.setInternal(true);
interpreter.execute();
}

View File

@ -589,6 +589,7 @@
M(707, GCP_ERROR) \
M(708, ILLEGAL_STATISTIC) \
M(709, CANNOT_GET_REPLICATED_DATABASE_SNAPSHOT) \
M(710, FAULT_INJECTED) \
\
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \

View File

@ -34,6 +34,8 @@ static struct InitFiu
#define APPLY_FOR_FAILPOINTS(ONCE, REGULAR, PAUSEABLE_ONCE, PAUSEABLE) \
ONCE(replicated_merge_tree_commit_zk_fail_after_op) \
ONCE(replicated_queue_fail_next_entry) \
REGULAR(replicated_queue_unfail_entries) \
ONCE(replicated_merge_tree_insert_quorum_fail_0) \
REGULAR(replicated_merge_tree_commit_zk_fail_when_recovering_from_hw_fault) \
REGULAR(use_delayed_remote_source) \

View File

@ -1,18 +1,9 @@
#pragma once
#include <DataTypes/IDataType.h>
#include <base/defines.h>
#include <base/types.h>
#include <Common/Concepts.h>
#include <Common/TargetSpecific.h>
#include <algorithm>
#include <optional>
#include <Common/findExtreme.h>
namespace DB
{
template <typename T>
concept is_any_native_number = (is_any_of<T, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64>);
template <is_any_native_number T>
struct MinComparator
@ -28,8 +19,8 @@ struct MaxComparator
MULTITARGET_FUNCTION_AVX2_SSE42(
MULTITARGET_FUNCTION_HEADER(template <is_any_native_number T, typename ComparatorClass, bool add_all_elements, bool add_if_cond_zero> static std::optional<T> NO_INLINE),
findNumericExtremeImpl,
MULTITARGET_FUNCTION_BODY((const T * __restrict ptr, const UInt8 * __restrict condition_map [[maybe_unused]], size_t row_begin, size_t row_end)
findExtremeImpl,
MULTITARGET_FUNCTION_BODY((const T * __restrict ptr, const UInt8 * __restrict condition_map [[maybe_unused]], size_t row_begin, size_t row_end) /// NOLINT
{
size_t count = row_end - row_begin;
ptr += row_begin;
@ -86,69 +77,67 @@ MULTITARGET_FUNCTION_AVX2_SSE42(
}
))
/// Given a vector of T finds the extreme (MIN or MAX) value
template <is_any_native_number T, class ComparatorClass, bool add_all_elements, bool add_if_cond_zero>
static std::optional<T>
findNumericExtreme(const T * __restrict ptr, const UInt8 * __restrict condition_map [[maybe_unused]], size_t start, size_t end)
findExtreme(const T * __restrict ptr, const UInt8 * __restrict condition_map [[maybe_unused]], size_t start, size_t end)
{
#if USE_MULTITARGET_CODE
/// We see no benefit from using AVX512BW or AVX512F (over AVX2), so we only declare SSE and AVX2
if (isArchSupported(TargetArch::AVX2))
return findNumericExtremeImplAVX2<T, ComparatorClass, add_all_elements, add_if_cond_zero>(ptr, condition_map, start, end);
return findExtremeImplAVX2<T, ComparatorClass, add_all_elements, add_if_cond_zero>(ptr, condition_map, start, end);
if (isArchSupported(TargetArch::SSE42))
return findNumericExtremeImplSSE42<T, ComparatorClass, add_all_elements, add_if_cond_zero>(ptr, condition_map, start, end);
return findExtremeImplSSE42<T, ComparatorClass, add_all_elements, add_if_cond_zero>(ptr, condition_map, start, end);
#endif
return findNumericExtremeImpl<T, ComparatorClass, add_all_elements, add_if_cond_zero>(ptr, condition_map, start, end);
return findExtremeImpl<T, ComparatorClass, add_all_elements, add_if_cond_zero>(ptr, condition_map, start, end);
}
template <is_any_native_number T>
std::optional<T> findNumericMin(const T * __restrict ptr, size_t start, size_t end)
std::optional<T> findExtremeMin(const T * __restrict ptr, size_t start, size_t end)
{
return findNumericExtreme<T, MinComparator<T>, true, false>(ptr, nullptr, start, end);
return findExtreme<T, MinComparator<T>, true, false>(ptr, nullptr, start, end);
}
template <is_any_native_number T>
std::optional<T> findNumericMinNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
std::optional<T> findExtremeMinNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
{
return findNumericExtreme<T, MinComparator<T>, false, true>(ptr, condition_map, start, end);
return findExtreme<T, MinComparator<T>, false, true>(ptr, condition_map, start, end);
}
template <is_any_native_number T>
std::optional<T> findNumericMinIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
std::optional<T> findExtremeMinIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
{
return findNumericExtreme<T, MinComparator<T>, false, false>(ptr, condition_map, start, end);
return findExtreme<T, MinComparator<T>, false, false>(ptr, condition_map, start, end);
}
template <is_any_native_number T>
std::optional<T> findNumericMax(const T * __restrict ptr, size_t start, size_t end)
std::optional<T> findExtremeMax(const T * __restrict ptr, size_t start, size_t end)
{
return findNumericExtreme<T, MaxComparator<T>, true, false>(ptr, nullptr, start, end);
return findExtreme<T, MaxComparator<T>, true, false>(ptr, nullptr, start, end);
}
template <is_any_native_number T>
std::optional<T> findNumericMaxNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
std::optional<T> findExtremeMaxNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
{
return findNumericExtreme<T, MaxComparator<T>, false, true>(ptr, condition_map, start, end);
return findExtreme<T, MaxComparator<T>, false, true>(ptr, condition_map, start, end);
}
template <is_any_native_number T>
std::optional<T> findNumericMaxIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
std::optional<T> findExtremeMaxIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end)
{
return findNumericExtreme<T, MaxComparator<T>, false, false>(ptr, condition_map, start, end);
return findExtreme<T, MaxComparator<T>, false, false>(ptr, condition_map, start, end);
}
#define EXTERN_INSTANTIATION(T) \
extern template std::optional<T> findNumericMin(const T * __restrict ptr, size_t start, size_t end); \
extern template std::optional<T> findNumericMinNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
extern template std::optional<T> findNumericMinIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
extern template std::optional<T> findNumericMax(const T * __restrict ptr, size_t start, size_t end); \
extern template std::optional<T> findNumericMaxNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
extern template std::optional<T> findNumericMaxIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
FOR_BASIC_NUMERIC_TYPES(EXTERN_INSTANTIATION)
#undef EXTERN_INSTANTIATION
#define INSTANTIATION(T) \
template std::optional<T> findExtremeMin(const T * __restrict ptr, size_t start, size_t end); \
template std::optional<T> findExtremeMinNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
template std::optional<T> findExtremeMinIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
template std::optional<T> findExtremeMax(const T * __restrict ptr, size_t start, size_t end); \
template std::optional<T> findExtremeMaxNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
template std::optional<T> findExtremeMaxIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
FOR_BASIC_NUMERIC_TYPES(INSTANTIATION)
#undef INSTANTIATION
}

src/Common/findExtreme.h (new file, +45 lines)
View File

@ -0,0 +1,45 @@
#pragma once
#include <DataTypes/IDataType.h>
#include <base/defines.h>
#include <base/types.h>
#include <Common/Concepts.h>
#include <algorithm>
#include <optional>
namespace DB
{
template <typename T>
concept is_any_native_number = (is_any_of<T, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64>);
template <is_any_native_number T>
std::optional<T> findExtremeMin(const T * __restrict ptr, size_t start, size_t end);
template <is_any_native_number T>
std::optional<T> findExtremeMinNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
template <is_any_native_number T>
std::optional<T> findExtremeMinIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
template <is_any_native_number T>
std::optional<T> findExtremeMax(const T * __restrict ptr, size_t start, size_t end);
template <is_any_native_number T>
std::optional<T> findExtremeMaxNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
template <is_any_native_number T>
std::optional<T> findExtremeMaxIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
#define EXTERN_INSTANTIATION(T) \
extern template std::optional<T> findExtremeMin(const T * __restrict ptr, size_t start, size_t end); \
extern template std::optional<T> findExtremeMinNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
extern template std::optional<T> findExtremeMinIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
extern template std::optional<T> findExtremeMax(const T * __restrict ptr, size_t start, size_t end); \
extern template std::optional<T> findExtremeMaxNotNull(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end); \
extern template std::optional<T> findExtremeMaxIf(const T * __restrict ptr, const UInt8 * __restrict condition_map, size_t start, size_t end);
FOR_BASIC_NUMERIC_TYPES(EXTERN_INSTANTIATION)
#undef EXTERN_INSTANTIATION
}
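A minimal usage sketch for these helpers. Assumptions, matching the call sites in AggregateFunctionMin above: for the `NotNull` variants rows with `condition_map[i] == 0` participate (the map is a null map), while for the `If` variants rows with `condition_map[i] != 0` participate (the map is a filter):
#include <Common/findExtreme.h>
#include <vector>

void findExtremeExample()
{
    std::vector<Int64> values{7, 2, 9, 4};
    std::vector<UInt8> null_map{0, 1, 0, 0};   /// row 1 is NULL

    /// Whole range; returns std::nullopt for an empty range.
    auto min = DB::findExtremeMin(values.data(), 0, values.size());                              /// 2
    /// NULL-aware: the 2 at row 1 is skipped.
    auto min_nn = DB::findExtremeMinNotNull(values.data(), null_map.data(), 0, values.size());   /// 4
    (void)min; (void)min_nn;
}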

View File

@ -26,6 +26,8 @@ namespace DB
M(UInt64, max_active_parts_loading_thread_pool_size, 64, "The number of threads to load active set of data parts (Active ones) at startup.", 0) \
M(UInt64, max_outdated_parts_loading_thread_pool_size, 32, "The number of threads to load inactive set of data parts (Outdated ones) at startup.", 0) \
M(UInt64, max_parts_cleaning_thread_pool_size, 128, "The number of threads for concurrent removal of inactive data parts.", 0) \
M(UInt64, max_mutations_bandwidth_for_server, 0, "The maximum read speed of all mutations on server in bytes per second. Zero means unlimited.", 0) \
M(UInt64, max_merges_bandwidth_for_server, 0, "The maximum read speed of all merges on server in bytes per second. Zero means unlimited.", 0) \
M(UInt64, max_replicated_fetches_network_bandwidth_for_server, 0, "The maximum speed of data exchange over the network in bytes per second for replicated fetches. Zero means unlimited.", 0) \
M(UInt64, max_replicated_sends_network_bandwidth_for_server, 0, "The maximum speed of data exchange over the network in bytes per second for replicated sends. Zero means unlimited.", 0) \
M(UInt64, max_remote_read_network_bandwidth_for_server, 0, "The maximum speed of data exchange over the network in bytes per second for read. Zero means unlimited.", 0) \

View File

@ -157,7 +157,7 @@ class IColumn;
M(Bool, allow_suspicious_fixed_string_types, false, "In CREATE TABLE statement allows creating columns of type FixedString(n) with n > 256. FixedString with length >= 256 is suspicious and most likely indicates misusage", 0) \
M(Bool, allow_suspicious_indices, false, "Reject primary/secondary indexes and sorting keys with identical expressions", 0) \
M(Bool, allow_suspicious_ttl_expressions, false, "Reject TTL expressions that don't depend on any of table's columns. It indicates a user error most of the time.", 0) \
M(Bool, compile_expressions, true, "Compile some scalar functions and operators to native code.", 0) \
M(Bool, compile_expressions, false, "Compile some scalar functions and operators to native code.", 0) \
M(UInt64, min_count_to_compile_expression, 3, "The number of identical expressions before they are JIT-compiled", 0) \
M(Bool, compile_aggregate_expressions, true, "Compile aggregate functions to native code.", 0) \
M(UInt64, min_count_to_compile_aggregate_expression, 3, "The number of identical aggregate expressions before they are JIT-compiled", 0) \
@ -709,7 +709,6 @@ class IColumn;
M(Bool, query_plan_execute_functions_after_sorting, true, "Allow to re-order functions after sorting", 0) \
M(Bool, query_plan_reuse_storage_ordering_for_window_functions, true, "Allow to use the storage sorting for window functions", 0) \
M(Bool, query_plan_lift_up_union, true, "Allow to move UNIONs up so that more parts of the query plan can be optimized", 0) \
M(Bool, query_plan_optimize_primary_key, true, "Analyze primary key using query plan (instead of AST)", 0) \
M(Bool, query_plan_read_in_order, true, "Use query plan for read-in-order optimization", 0) \
M(Bool, query_plan_aggregation_in_order, true, "Use query plan for aggregation-in-order optimization", 0) \
M(Bool, query_plan_remove_redundant_sorting, true, "Remove redundant sorting in query plan. For example, sorting steps related to ORDER BY clauses in subqueries", 0) \
@ -845,7 +844,7 @@ class IColumn;
M(Timezone, session_timezone, "", "This setting can be removed in the future due to potential caveats. It is experimental and is not suitable for production usage. The default timezone for current session or query. The server default timezone if empty.", 0) \
M(Bool, allow_create_index_without_type, false, "Allow CREATE INDEX query without TYPE. Query will be ignored. Made for SQL compatibility tests.", 0) \
M(Bool, create_index_ignore_unique, false, "Ignore UNIQUE keyword in CREATE UNIQUE INDEX. Made for SQL compatibility tests.", 0) \
M(Bool, print_pretty_type_names, false, "Print pretty type names in DESCRIBE query and toTypeName() function", 0) \
M(Bool, print_pretty_type_names, true, "Print pretty type names in DESCRIBE query and toTypeName() function", 0) \
M(Bool, create_table_empty_primary_key_by_default, false, "Allow to create *MergeTree tables with empty primary key when ORDER BY and PRIMARY KEY not specified", 0) \
M(Bool, allow_named_collection_override_by_default, true, "Allow named collections' fields override by default.", 0)\
M(Bool, allow_experimental_shared_merge_tree, false, "Only available in ClickHouse Cloud", 0) \
@ -918,6 +917,7 @@ class IColumn;
MAKE_OBSOLETE(M, Bool, optimize_move_functions_out_of_any, false) \
MAKE_OBSOLETE(M, Bool, allow_experimental_undrop_table_query, true) \
MAKE_OBSOLETE(M, Bool, allow_experimental_s3queue, true) \
MAKE_OBSOLETE(M, Bool, query_plan_optimize_primary_key, true) \
/** The section above is for obsolete settings. Do not add anything there. */
@ -983,6 +983,7 @@ class IColumn;
M(SchemaInferenceMode, schema_inference_mode, "default", "Mode of schema inference. 'default' - assume that all files have the same schema and schema can be inferred from any file, 'union' - files can have different schemas and the resulting schema should be a union of schemas of all files", 0) \
M(Bool, schema_inference_make_columns_nullable, true, "If set to true, all inferred types will be Nullable in schema inference for formats without information about nullability.", 0) \
M(Bool, input_format_json_read_bools_as_numbers, true, "Allow to parse bools as numbers in JSON input formats", 0) \
M(Bool, input_format_json_read_bools_as_strings, true, "Allow to parse bools as strings in JSON input formats", 0) \
M(Bool, input_format_json_try_infer_numbers_from_strings, false, "Try to infer numbers from string fields while schema inference", 0) \
M(Bool, input_format_json_validate_types_from_metadata, true, "For JSON/JSONCompact/JSONColumnsWithMetadata input formats this controls whether format parser should check if data types from input metadata match data types of the corresponding columns from the table", 0) \
M(Bool, input_format_json_read_numbers_as_strings, true, "Allow to parse numbers as strings in JSON input formats", 0) \

View File

@ -81,6 +81,8 @@ namespace SettingsChangesHistory
/// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972)
static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> settings_changes_history =
{
{"24.1", {{"print_pretty_type_names", false, true, "Better user experience."},
{"input_format_json_read_bools_as_strings", false, true, "Allow to read bools as strings in JSON formats by default"}}},
{"23.12", {{"allow_suspicious_ttl_expressions", true, false, "It is a new setting, and in previous versions the behavior was equivalent to allowing."},
{"input_format_parquet_allow_missing_columns", false, true, "Allow missing columns in Parquet files by default"},
{"input_format_orc_allow_missing_columns", false, true, "Allow missing columns in ORC files by default"},

View File

@ -85,10 +85,7 @@ std::string DataTypeMap::doGetName() const
std::string DataTypeMap::doGetPrettyName(size_t indent) const
{
WriteBufferFromOwnString s;
s << "Map(\n"
<< fourSpaceIndent(indent + 1) << key_type->getPrettyName(indent + 1) << ",\n"
<< fourSpaceIndent(indent + 1) << value_type->getPrettyName(indent + 1) << '\n'
<< fourSpaceIndent(indent) << ')';
s << "Map(" << key_type->getPrettyName(indent) << ", " << value_type->getPrettyName(indent) << ')';
return s.str();
}

View File

@ -98,21 +98,38 @@ std::string DataTypeTuple::doGetPrettyName(size_t indent) const
{
size_t size = elems.size();
WriteBufferFromOwnString s;
s << "Tuple(\n";
for (size_t i = 0; i != size; ++i)
/// If the Tuple is named, we will output it in multiple lines with indentation.
if (have_explicit_names)
{
if (i != 0)
s << ",\n";
s << "Tuple(\n";
s << fourSpaceIndent(indent + 1);
if (have_explicit_names)
s << backQuoteIfNeed(names[i]) << ' ';
for (size_t i = 0; i != size; ++i)
{
if (i != 0)
s << ",\n";
s << elems[i]->getPrettyName(indent + 1);
s << fourSpaceIndent(indent + 1)
<< backQuoteIfNeed(names[i]) << ' '
<< elems[i]->getPrettyName(indent + 1);
}
s << ')';
}
else
{
s << "Tuple(";
for (size_t i = 0; i != size; ++i)
{
if (i != 0)
s << ", ";
s << elems[i]->getPrettyName(indent);
}
s << ')';
}
s << '\n' << fourSpaceIndent(indent) << ')';
return s.str();
}

View File

@ -335,6 +335,22 @@ void SerializationString::deserializeTextJSON(IColumn & column, ReadBuffer & ist
{
read(column, [&](ColumnString::Chars & data) { readJSONArrayInto(data, istr); });
}
else if (settings.json.read_bools_as_strings && !istr.eof() && (*istr.position() == 't' || *istr.position() == 'f'))
{
String str_value;
if (*istr.position() == 't')
{
assertString("true", istr);
str_value = "true";
}
else if (*istr.position() == 'f')
{
assertString("false", istr);
str_value = "false";
}
read(column, [&](ColumnString::Chars & data) { data.insert(str_value.begin(), str_value.end()); });
}
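/// Illustration (assumption): with read_bools_as_strings enabled, a JSON `true` read into a
/// String column is stored as the literal string "true" instead of raising a type error.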
else if (settings.json.read_numbers_as_strings && !istr.eof() && *istr.position() != '"')
{
String field;

View File

@ -92,9 +92,16 @@ void validate(const ASTCreateQuery & create_query)
DatabasePtr DatabaseFactory::get(const ASTCreateQuery & create, const String & metadata_path, ContextPtr context)
{
const auto engine_name = create.storage->engine->name;
/// check if the database engine is a valid one before proceeding
if (!database_engines.contains(create.storage->engine->name))
throw Exception(ErrorCodes::UNKNOWN_DATABASE_ENGINE, "Unknown database engine: {}", create.storage->engine->name);
if (!database_engines.contains(engine_name))
{
auto hints = getHints(engine_name);
if (!hints.empty())
throw Exception(ErrorCodes::UNKNOWN_DATABASE_ENGINE, "Unknown database engine {}. Maybe you meant: {}", engine_name, toString(hints));
else
throw Exception(ErrorCodes::UNKNOWN_DATABASE_ENGINE, "Unknown database engine: {}", create.storage->engine->name);
}
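/// Illustrative (assumed) effect: `CREATE DATABASE db ENGINE = Atomi` now fails with
/// "Unknown database engine Atomi. Maybe you meant: ['Atomic']".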
/// if the engine is found (i.e. registered with the factory instance), then validate if the
/// supplied engine arguments, settings and table overrides are valid for the engine.

View File

@ -1,5 +1,6 @@
#pragma once
#include <Common/NamePrompter.h>
#include <Interpreters/Context_fwd.h>
#include <Databases/IDatabase.h>
#include <Parsers/ASTCreateQuery.h>
@ -24,7 +25,7 @@ static inline ValueType safeGetLiteralValue(const ASTPtr &ast, const String &eng
return ast->as<ASTLiteral>()->value.safeGet<ValueType>();
}
class DatabaseFactory : private boost::noncopyable
class DatabaseFactory : private boost::noncopyable, public IHints<>
{
public:
@ -52,6 +53,14 @@ public:
const DatabaseEngines & getDatabaseEngines() const { return database_engines; }
std::vector<String> getAllRegisteredNames() const override
{
std::vector<String> result;
auto getter = [](const auto & pair) { return pair.first; };
std::transform(database_engines.begin(), database_engines.end(), std::back_inserter(result), getter);
return result;
}
private:
DatabaseEngines database_engines;

View File

@ -450,10 +450,11 @@ String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings, Fo
break;
case FormatSettings::EscapingRule::JSON:
result += fmt::format(
", try_infer_numbers_from_strings={}, read_bools_as_numbers={}, read_objects_as_strings={}, read_numbers_as_strings={}, "
", try_infer_numbers_from_strings={}, read_bools_as_numbers={}, read_bools_as_strings={}, read_objects_as_strings={}, read_numbers_as_strings={}, "
"read_arrays_as_strings={}, try_infer_objects_as_tuples={}, infer_incomplete_types_as_strings={}, try_infer_objects={}",
settings.json.try_infer_numbers_from_strings,
settings.json.read_bools_as_numbers,
settings.json.read_bools_as_strings,
settings.json.read_objects_as_strings,
settings.json.read_numbers_as_strings,
settings.json.read_arrays_as_strings,

View File

@ -111,6 +111,7 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings)
format_settings.json.quote_denormals = settings.output_format_json_quote_denormals;
format_settings.json.quote_decimals = settings.output_format_json_quote_decimals;
format_settings.json.read_bools_as_numbers = settings.input_format_json_read_bools_as_numbers;
format_settings.json.read_bools_as_strings = settings.input_format_json_read_bools_as_strings;
format_settings.json.read_numbers_as_strings = settings.input_format_json_read_numbers_as_strings;
format_settings.json.read_objects_as_strings = settings.input_format_json_read_objects_as_strings;
format_settings.json.read_arrays_as_strings = settings.input_format_json_read_arrays_as_strings;

View File

@ -204,6 +204,7 @@ struct FormatSettings
bool ignore_unknown_keys_in_named_tuple = false;
bool serialize_as_strings = false;
bool read_bools_as_numbers = true;
bool read_bools_as_strings = true;
bool read_numbers_as_strings = true;
bool read_objects_as_strings = true;
bool read_arrays_as_strings = true;

View File

@ -377,6 +377,22 @@ namespace
type_indexes.erase(TypeIndex::UInt8);
}
/// If we have Bool and String types, convert all Bools to String.
/// It's applied only when setting input_format_json_read_bools_as_strings is enabled.
void transformJSONBoolsAndStringsToString(DataTypes & data_types, TypeIndexesSet & type_indexes)
{
if (!type_indexes.contains(TypeIndex::String) || !type_indexes.contains(TypeIndex::UInt8))
return;
for (auto & type : data_types)
{
if (isBool(type))
type = std::make_shared<DataTypeString>();
}
type_indexes.erase(TypeIndex::UInt8);
}
/// If we have type Nothing/Nullable(Nothing) and some other non Nothing types,
/// convert all Nothing/Nullable(Nothing) types to the first non Nothing.
/// For example, when we have [Nothing, Array(Int64)] it will convert it to [Array(Int64), Array(Int64)]
@ -628,6 +644,10 @@ namespace
if (settings.json.read_bools_as_numbers)
transformBoolsAndNumbersToNumbers(data_types, type_indexes);
/// Convert Bool to String if needed.
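/// (e.g. a JSON field that held both `true` and "abc" infers String instead of failing to merge Bool and String)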
if (settings.json.read_bools_as_strings)
transformJSONBoolsAndStringsToString(data_types, type_indexes);
if (settings.json.try_infer_objects_as_tuples)
mergeJSONPaths(data_types, type_indexes, settings, json_info);
};

src/Functions/sqid.cpp (new file, +105 lines)
View File

@ -0,0 +1,105 @@
#include "config.h"
#if USE_SQIDS
#include <Columns/ColumnString.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <Functions/FunctionHelpers.h>
#include <Interpreters/Context.h>
#include <sqids/sqids.hpp>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}
// sqid(number1, ...)
class FunctionSqid : public IFunction
{
public:
static constexpr auto name = "sqid";
String getName() const override { return name; }
size_t getNumberOfArguments() const override { return 0; }
bool isVariadic() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionSqid>(); }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (arguments.empty())
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Function {} requires at least one argument.", getName());
for (size_t i = 0; i < arguments.size(); ++i)
{
if (!checkDataTypes<
DataTypeUInt8,
DataTypeUInt16,
DataTypeUInt32,
DataTypeUInt64>(arguments[i].get()))
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Argument {} for function {} must have datatype UInt*, given type: {}.",
i, getName(), arguments[i]->getName());
}
return std::make_shared<DataTypeString>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
auto col_res = ColumnString::create();
col_res->reserve(input_rows_count);
const size_t num_args = arguments.size();
std::vector<UInt64> numbers(num_args);
for (size_t i = 0; i < input_rows_count; ++i)
{
for (size_t j = 0; j < num_args; ++j)
{
const ColumnWithTypeAndName & arg = arguments[j];
ColumnPtr current_column = arg.column;
numbers[j] = current_column->getUInt(i);
}
auto id = sqids.encode(numbers);
col_res->insert(id);
}
return col_res;
}
private:
sqidscxx::Sqids<> sqids;
};
REGISTER_FUNCTION(Sqid)
{
factory.registerFunction<FunctionSqid>(FunctionDocumentation{
.description=R"(
Transforms numbers into a [Sqid](https://sqids.org/) which is a YouTube-like ID string.)",
.syntax="sqid(number1, ...)",
.arguments={{"number1, ...", "Arbitrarily many UInt8, UInt16, UInt32 or UInt64 arguments"}},
.returned_value="A hash id [String](/docs/en/sql-reference/data-types/string.md).",
.examples={
{"simple",
"SELECT sqid(1, 2, 3, 4, 5);",
R"(
sqid(1, 2, 3, 4, 5)
gXHfJ1C6dN
)"
}}
});
}
}
#endif

View File

@ -1382,8 +1382,12 @@ void skipJSONField(ReadBuffer & buf, StringRef name_of_field)
}
else
{
throw Exception(ErrorCodes::INCORRECT_DATA, "Unexpected symbol '{}' for key '{}'",
std::string(*buf.position(), 1), name_of_field.toString());
throw Exception(
ErrorCodes::INCORRECT_DATA,
"Cannot read JSON field here: '{}'. Unexpected symbol '{}'{}",
String(buf.position(), std::min(buf.available(), size_t(10))),
std::string(1, *buf.position()),
name_of_field.empty() ? "" : " for key " + name_of_field.toString());
}
}
@ -1753,7 +1757,7 @@ void readQuotedField(String & s, ReadBuffer & buf)
void readJSONField(String & s, ReadBuffer & buf)
{
s.clear();
auto parse_func = [](ReadBuffer & in) { skipJSONField(in, "json_field"); };
auto parse_func = [](ReadBuffer & in) { skipJSONField(in, ""); };
readParsedValueInto(s, buf, parse_func);
}

View File

@ -1419,7 +1419,7 @@ FutureSetPtr ActionsMatcher::makeSet(const ASTFunction & node, Data & data, bool
return set;
}
FutureSetPtr external_table_set;
FutureSetFromSubqueryPtr external_table_set;
/// A special case is if the name of the table is specified on the right side of the IN statement,
/// and the table has the type Set (a previously prepared set).

View File

@ -664,26 +664,26 @@ void Aggregator::compileAggregateFunctionsIfNeeded()
for (size_t i = 0; i < aggregate_functions.size(); ++i)
{
const auto * function = aggregate_functions[i];
bool function_is_compilable = function->isCompilable();
if (!function_is_compilable)
continue;
size_t offset_of_aggregate_function = offsets_of_aggregate_states[i];
AggregateFunctionWithOffset function_to_compile
if (function->isCompilable())
{
.function = function,
.aggregate_data_offset = offset_of_aggregate_function
};
AggregateFunctionWithOffset function_to_compile
{
.function = function,
.aggregate_data_offset = offset_of_aggregate_function
};
functions_to_compile.emplace_back(std::move(function_to_compile));
functions_to_compile.emplace_back(std::move(function_to_compile));
functions_description += function->getDescription();
functions_description += ' ';
functions_description += function->getDescription();
functions_description += ' ';
functions_description += std::to_string(offset_of_aggregate_function);
functions_description += ' ';
functions_description += std::to_string(offset_of_aggregate_function);
functions_description += ' ';
}
is_aggregate_function_compiled[i] = true;
is_aggregate_function_compiled[i] = function->isCompilable();
}
if (functions_to_compile.empty())
@ -1685,13 +1685,14 @@ bool Aggregator::executeOnBlock(Columns columns,
/// For the case when there are no keys (all aggregate into one row).
if (result.type == AggregatedDataVariants::Type::without_key)
{
#if USE_EMBEDDED_COMPILER
if (compiled_aggregate_functions_holder && !hasSparseArguments(aggregate_functions_instructions.data()))
{
executeWithoutKeyImpl<true>(result.without_key, row_begin, row_end, aggregate_functions_instructions.data(), result.aggregates_pool);
}
else
#endif
/// TODO: Enable compilation after investigation
// #if USE_EMBEDDED_COMPILER
// if (compiled_aggregate_functions_holder)
// {
// executeWithoutKeyImpl<true>(result.without_key, row_begin, row_end, aggregate_functions_instructions.data(), result.aggregates_pool);
// }
// else
// #endif
{
executeWithoutKeyImpl<false>(result.without_key, row_begin, row_end, aggregate_functions_instructions.data(), result.aggregates_pool);
}

View File

@ -330,6 +330,9 @@ struct ContextSharedPart : boost::noncopyable
mutable ThrottlerPtr backups_server_throttler; /// A server-wide throttler for BACKUPs
mutable ThrottlerPtr mutations_throttler; /// A server-wide throttler for mutations
mutable ThrottlerPtr merges_throttler; /// A server-wide throttler for merges
MultiVersion<Macros> macros; /// Substitutions extracted from config.
std::unique_ptr<DDLWorker> ddl_worker TSA_GUARDED_BY(mutex); /// Process ddl commands from zk.
LoadTaskPtr ddl_worker_startup_task; /// To postpone `ddl_worker->startup()` after all tables startup
@ -738,6 +741,12 @@ struct ContextSharedPart : boost::noncopyable
if (auto bandwidth = server_settings.max_backup_bandwidth_for_server)
backups_server_throttler = std::make_shared<Throttler>(bandwidth);
if (auto bandwidth = server_settings.max_mutations_bandwidth_for_server)
mutations_throttler = std::make_shared<Throttler>(bandwidth);
if (auto bandwidth = server_settings.max_merges_bandwidth_for_server)
merges_throttler = std::make_shared<Throttler>(bandwidth);
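/// A zero bandwidth setting leaves the corresponding throttler null, i.e. unthrottled.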
}
};
@ -3001,6 +3010,16 @@ ThrottlerPtr Context::getBackupsThrottler() const
return throttler;
}
ThrottlerPtr Context::getMutationsThrottler() const
{
return shared->mutations_throttler;
}
ThrottlerPtr Context::getMergesThrottler() const
{
return shared->merges_throttler;
}
bool Context::hasDistributedDDL() const
{
return getConfigRef().has("distributed_ddl");

View File

@ -1328,6 +1328,9 @@ public:
ThrottlerPtr getBackupsThrottler() const;
ThrottlerPtr getMutationsThrottler() const;
ThrottlerPtr getMergesThrottler() const;
/// Kitchen sink
using ContextData::KitchenSink;
using ContextData::kitchen_sink;

View File

@ -82,8 +82,8 @@ private:
using DDLGuardPtr = std::unique_ptr<DDLGuard>;
class FutureSet;
using FutureSetPtr = std::shared_ptr<FutureSet>;
class FutureSetFromSubquery;
using FutureSetFromSubqueryPtr = std::shared_ptr<FutureSetFromSubquery>;
/// Creates temporary table in `_temporary_and_external_tables` with randomly generated unique StorageID.
/// Such table can be accessed from everywhere by its ID.
@ -116,7 +116,7 @@ struct TemporaryTableHolder : boost::noncopyable, WithContext
IDatabase * temporary_tables = nullptr;
UUID id = UUIDHelpers::Nil;
FutureSetPtr future_set;
FutureSetFromSubqueryPtr future_set;
};
///TODO maybe remove shared_ptr from here?

View File

@ -2378,12 +2378,25 @@ std::optional<UInt64> InterpreterSelectQuery::getTrivialCount(UInt64 max_paralle
else
{
// It's possible to optimize count() given only partition predicates
SelectQueryInfo temp_query_info;
temp_query_info.query = query_ptr;
temp_query_info.syntax_analyzer_result = syntax_analyzer_result;
temp_query_info.prepared_sets = query_analyzer->getPreparedSets();
ActionsDAG::NodeRawConstPtrs filter_nodes;
if (analysis_result.hasPrewhere())
{
auto & prewhere_info = analysis_result.prewhere_info;
filter_nodes.push_back(&prewhere_info->prewhere_actions->findInOutputs(prewhere_info->prewhere_column_name));
return storage->totalRowsByPartitionPredicate(temp_query_info, context);
if (prewhere_info->row_level_filter)
filter_nodes.push_back(&prewhere_info->row_level_filter->findInOutputs(prewhere_info->row_level_column_name));
}
if (analysis_result.hasWhere())
{
filter_nodes.push_back(&analysis_result.before_where->findInOutputs(analysis_result.where_column_name));
}
auto filter_actions_dag = ActionsDAG::buildFilterActionsDAG(filter_nodes, {}, context);
if (!filter_actions_dag)
return {};
return storage->totalRowsByPartitionPredicate(filter_actions_dag, context);
}
}
@ -2501,7 +2514,12 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc
max_block_size = std::max<UInt64>(1, max_block_limited);
max_threads_execute_query = max_streams = 1;
}
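/// size_limits.max_rows == 0 means "no limit"; in that case the LIMIT-derived cap below always applies.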
if (max_block_limited < local_limits.local_limits.size_limits.max_rows)
if (local_limits.local_limits.size_limits.max_rows != 0)
{
if (max_block_limited < local_limits.local_limits.size_limits.max_rows)
query_info.limit = max_block_limited;
}
else
{
query_info.limit = max_block_limited;
}

View File

@ -67,8 +67,7 @@ static void compileFunction(llvm::Module & module, const IFunctionBase & functio
{
const auto & function_argument_types = function.getArgumentTypes();
auto & context = module.getContext();
llvm::IRBuilder<> b(context);
llvm::IRBuilder<> b(module.getContext());
auto * size_type = b.getIntNTy(sizeof(size_t) * 8);
auto * data_type = llvm::StructType::get(b.getInt8PtrTy(), b.getInt8PtrTy());
auto * func_type = llvm::FunctionType::get(b.getVoidTy(), { size_type, data_type->getPointerTo() }, /*isVarArg=*/false);
@ -76,8 +75,6 @@ static void compileFunction(llvm::Module & module, const IFunctionBase & functio
/// Create function in module
auto * func = llvm::Function::Create(func_type, llvm::Function::ExternalLinkage, function.getName(), module);
func->setAttributes(llvm::AttributeList::get(context, {{2, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)}}));
auto * args = func->args().begin();
llvm::Value * rows_count_arg = args++;
llvm::Value * columns_arg = args++;
@ -199,9 +196,6 @@ static void compileCreateAggregateStatesFunctions(llvm::Module & module, const s
auto * create_aggregate_states_function_type = llvm::FunctionType::get(b.getVoidTy(), { aggregate_data_places_type }, false);
auto * create_aggregate_states_function = llvm::Function::Create(create_aggregate_states_function_type, llvm::Function::ExternalLinkage, name, module);
create_aggregate_states_function->setAttributes(
llvm::AttributeList::get(context, {{1, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)}}));
auto * arguments = create_aggregate_states_function->args().begin();
llvm::Value * aggregate_data_place_arg = arguments++;
@ -247,11 +241,6 @@ static void compileAddIntoAggregateStatesFunctions(llvm::Module & module,
auto * add_into_aggregate_states_func_declaration = llvm::FunctionType::get(b.getVoidTy(), { size_type, size_type, column_type->getPointerTo(), places_type }, false);
auto * add_into_aggregate_states_func = llvm::Function::Create(add_into_aggregate_states_func_declaration, llvm::Function::ExternalLinkage, name, module);
add_into_aggregate_states_func->setAttributes(llvm::AttributeList::get(
context,
{{3, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)},
{4, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)}}));
auto * arguments = add_into_aggregate_states_func->args().begin();
llvm::Value * row_start_arg = arguments++;
llvm::Value * row_end_arg = arguments++;
@ -307,7 +296,7 @@ static void compileAddIntoAggregateStatesFunctions(llvm::Module & module,
llvm::Value * aggregation_place = nullptr;
if (places_argument_type == AddIntoAggregateStatesPlacesArgumentType::MultiplePlaces)
aggregation_place = b.CreateLoad(b.getInt8Ty()->getPointerTo(), b.CreateInBoundsGEP(b.getInt8Ty()->getPointerTo(), places_arg, counter_phi));
aggregation_place = b.CreateLoad(b.getInt8Ty()->getPointerTo(), b.CreateGEP(b.getInt8Ty()->getPointerTo(), places_arg, counter_phi));
else
aggregation_place = places_arg;
@ -324,7 +313,7 @@ static void compileAddIntoAggregateStatesFunctions(llvm::Module & module,
auto & column = columns[previous_columns_size + column_argument_index];
const auto & argument_type = arguments_types[column_argument_index];
auto * column_data_element = b.CreateLoad(column.data_element_type, b.CreateInBoundsGEP(column.data_element_type, column.data_ptr, counter_phi));
auto * column_data_element = b.CreateLoad(column.data_element_type, b.CreateGEP(column.data_element_type, column.data_ptr, counter_phi));
if (!argument_type->isNullable())
{
@ -332,7 +321,7 @@ static void compileAddIntoAggregateStatesFunctions(llvm::Module & module,
continue;
}
auto * column_null_data_with_offset = b.CreateInBoundsGEP(b.getInt8Ty(), column.null_data_ptr, counter_phi);
auto * column_null_data_with_offset = b.CreateGEP(b.getInt8Ty(), column.null_data_ptr, counter_phi);
auto * is_null = b.CreateICmpNE(b.CreateLoad(b.getInt8Ty(), column_null_data_with_offset), b.getInt8(0));
auto * nullable_uninitialized = llvm::Constant::getNullValue(toNullableType(b, column.data_element_type));
auto * first_insert = b.CreateInsertValue(nullable_uninitialized, column_data_element, {0});
@ -365,8 +354,7 @@ static void compileAddIntoAggregateStatesFunctions(llvm::Module & module,
static void compileMergeAggregatesStates(llvm::Module & module, const std::vector<AggregateFunctionWithOffset> & functions, const std::string & name)
{
auto & context = module.getContext();
llvm::IRBuilder<> b(context);
llvm::IRBuilder<> b(module.getContext());
auto * aggregate_data_place_type = b.getInt8Ty()->getPointerTo();
auto * aggregate_data_places_type = aggregate_data_place_type->getPointerTo();
@ -377,11 +365,6 @@ static void compileMergeAggregatesStates(llvm::Module & module, const std::vecto
auto * merge_aggregates_states_func
= llvm::Function::Create(merge_aggregates_states_func_declaration, llvm::Function::ExternalLinkage, name, module);
merge_aggregates_states_func->setAttributes(llvm::AttributeList::get(
context,
{{1, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)},
{2, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)}}));
auto * arguments = merge_aggregates_states_func->args().begin();
llvm::Value * aggregate_data_places_dst_arg = arguments++;
llvm::Value * aggregate_data_places_src_arg = arguments++;
@ -443,11 +426,6 @@ static void compileInsertAggregatesIntoResultColumns(llvm::Module & module, cons
auto * insert_aggregates_into_result_func_declaration = llvm::FunctionType::get(b.getVoidTy(), { size_type, size_type, column_type->getPointerTo(), aggregate_data_places_type }, false);
auto * insert_aggregates_into_result_func = llvm::Function::Create(insert_aggregates_into_result_func_declaration, llvm::Function::ExternalLinkage, name, module);
insert_aggregates_into_result_func->setAttributes(llvm::AttributeList::get(
context,
{{3, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)},
{4, llvm::Attribute::get(context, llvm::Attribute::AttrKind::NoAlias)}}));
auto * arguments = insert_aggregates_into_result_func->args().begin();
llvm::Value * row_start_arg = arguments++;
llvm::Value * row_end_arg = arguments++;
@ -482,7 +460,7 @@ static void compileInsertAggregatesIntoResultColumns(llvm::Module & module, cons
auto * counter_phi = b.CreatePHI(row_start_arg->getType(), 2);
counter_phi->addIncoming(row_start_arg, entry);
auto * aggregate_data_place = b.CreateLoad(b.getInt8Ty()->getPointerTo(), b.CreateInBoundsGEP(b.getInt8Ty()->getPointerTo(), aggregate_data_places_arg, counter_phi));
auto * aggregate_data_place = b.CreateLoad(b.getInt8Ty()->getPointerTo(), b.CreateGEP(b.getInt8Ty()->getPointerTo(), aggregate_data_places_arg, counter_phi));
for (size_t i = 0; i < functions.size(); ++i)
{
@ -492,11 +470,11 @@ static void compileInsertAggregatesIntoResultColumns(llvm::Module & module, cons
const auto * aggregate_function_ptr = functions[i].function;
auto * final_value = aggregate_function_ptr->compileGetResult(b, aggregation_place_with_offset);
auto * result_column_data_element = b.CreateInBoundsGEP(columns[i].data_element_type, columns[i].data_ptr, counter_phi);
auto * result_column_data_element = b.CreateGEP(columns[i].data_element_type, columns[i].data_ptr, counter_phi);
if (columns[i].null_data_ptr)
{
b.CreateStore(b.CreateExtractValue(final_value, {0}), result_column_data_element);
auto * result_column_is_null_element = b.CreateInBoundsGEP(b.getInt8Ty(), columns[i].null_data_ptr, counter_phi);
auto * result_column_is_null_element = b.CreateGEP(b.getInt8Ty(), columns[i].null_data_ptr, counter_phi);
b.CreateStore(b.CreateSelect(b.CreateExtractValue(final_value, {1}), b.getInt8(1), b.getInt8(0)), result_column_is_null_element);
}
else

View File

@ -1280,6 +1280,7 @@ void MutationsInterpreter::Source::read(
VirtualColumns virtual_columns(std::move(required_columns), part);
createReadFromPartStep(
MergeTreeSequentialSourceType::Mutation,
plan, *data, storage_snapshot, part,
std::move(virtual_columns.columns_to_read),
apply_deleted_mask_, filter, context_,

View File

@ -97,7 +97,7 @@ FutureSetFromSubquery::FutureSetFromSubquery(
String key,
std::unique_ptr<QueryPlan> source_,
StoragePtr external_table_,
FutureSetPtr external_table_set_,
std::shared_ptr<FutureSetFromSubquery> external_table_set_,
const Settings & settings,
bool in_subquery_)
: external_table(std::move(external_table_))
@ -168,6 +168,24 @@ std::unique_ptr<QueryPlan> FutureSetFromSubquery::build(const ContextPtr & conte
return plan;
}
void FutureSetFromSubquery::buildSetInplace(const ContextPtr & context)
{
if (external_table_set)
external_table_set->buildSetInplace(context);
auto plan = build(context);
if (!plan)
return;
auto builder = plan->buildQueryPipeline(QueryPlanOptimizationSettings::fromContext(context), BuildQueryPipelineSettings::fromContext(context));
auto pipeline = QueryPipelineBuilder::getPipeline(std::move(*builder));
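/// Drain the pipeline into an empty sink: only the side effect of filling the set is needed.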
pipeline.complete(std::make_shared<EmptySink>(Block()));
CompletedPipelineExecutor executor(pipeline);
executor.execute();
}
SetPtr FutureSetFromSubquery::buildOrderedSetInplace(const ContextPtr & context)
{
if (!context->getSettingsRef().use_index_for_in_with_subqueries)
@ -233,7 +251,7 @@ String PreparedSets::toString(const PreparedSets::Hash & key, const DataTypes &
return buf.str();
}
FutureSetPtr PreparedSets::addFromTuple(const Hash & key, Block block, const Settings & settings)
FutureSetFromTuplePtr PreparedSets::addFromTuple(const Hash & key, Block block, const Settings & settings)
{
auto from_tuple = std::make_shared<FutureSetFromTuple>(std::move(block), settings);
const auto & set_types = from_tuple->getTypes();
@ -247,7 +265,7 @@ FutureSetPtr PreparedSets::addFromTuple(const Hash & key, Block block, const Set
return from_tuple;
}
FutureSetPtr PreparedSets::addFromStorage(const Hash & key, SetPtr set_)
FutureSetFromStoragePtr PreparedSets::addFromStorage(const Hash & key, SetPtr set_)
{
auto from_storage = std::make_shared<FutureSetFromStorage>(std::move(set_));
auto [it, inserted] = sets_from_storage.emplace(key, from_storage);
@ -258,11 +276,11 @@ FutureSetPtr PreparedSets::addFromStorage(const Hash & key, SetPtr set_)
return from_storage;
}
FutureSetPtr PreparedSets::addFromSubquery(
FutureSetFromSubqueryPtr PreparedSets::addFromSubquery(
const Hash & key,
std::unique_ptr<QueryPlan> source,
StoragePtr external_table,
FutureSetPtr external_table_set,
FutureSetFromSubqueryPtr external_table_set,
const Settings & settings,
bool in_subquery)
{
@ -282,7 +300,7 @@ FutureSetPtr PreparedSets::addFromSubquery(
return from_subquery;
}
FutureSetPtr PreparedSets::addFromSubquery(
FutureSetFromSubqueryPtr PreparedSets::addFromSubquery(
const Hash & key,
QueryTreeNodePtr query_tree,
const Settings & settings)
@ -300,7 +318,7 @@ FutureSetPtr PreparedSets::addFromSubquery(
return from_subquery;
}
FutureSetPtr PreparedSets::findTuple(const Hash & key, const DataTypes & types) const
FutureSetFromTuplePtr PreparedSets::findTuple(const Hash & key, const DataTypes & types) const
{
auto it = sets_from_tuple.find(key);
if (it == sets_from_tuple.end())

View File

@ -69,6 +69,8 @@ private:
SetPtr set;
};
using FutureSetFromStoragePtr = std::shared_ptr<FutureSetFromStorage>;
/// Set from tuple is filled as well as set from storage.
/// Additionally, it can be converted to set useful for PK.
class FutureSetFromTuple final : public FutureSet
@ -86,6 +88,8 @@ private:
SetKeyColumns set_key_columns;
};
using FutureSetFromTuplePtr = std::shared_ptr<FutureSetFromTuple>;
/// Set from subquery can be built inplace for PK or in CreatingSet step.
/// If use_index_for_in_with_subqueries_max_values is reached, set for PK won't be created,
/// but ordinary set would be created instead.
@ -96,7 +100,7 @@ public:
String key,
std::unique_ptr<QueryPlan> source_,
StoragePtr external_table_,
FutureSetPtr external_table_set_,
std::shared_ptr<FutureSetFromSubquery> external_table_set_,
const Settings & settings,
bool in_subquery_);
@ -110,6 +114,7 @@ public:
SetPtr buildOrderedSetInplace(const ContextPtr & context) override;
std::unique_ptr<QueryPlan> build(const ContextPtr & context);
void buildSetInplace(const ContextPtr & context);
QueryTreeNodePtr detachQueryTree() { return std::move(query_tree); }
void setQueryPlan(std::unique_ptr<QueryPlan> source_);
@ -119,7 +124,7 @@ public:
private:
SetAndKeyPtr set_and_key;
StoragePtr external_table;
FutureSetPtr external_table_set;
std::shared_ptr<FutureSetFromSubquery> external_table_set;
std::unique_ptr<QueryPlan> source;
QueryTreeNodePtr query_tree;
@ -130,6 +135,8 @@ private:
// with the new analyzer it's not the case
};
using FutureSetFromSubqueryPtr = std::shared_ptr<FutureSetFromSubquery>;
/// Container for all the sets used in query.
class PreparedSets
{
@ -141,32 +148,32 @@ public:
UInt64 operator()(const Hash & key) const { return key.low64 ^ key.high64; }
};
using SetsFromTuple = std::unordered_map<Hash, std::vector<std::shared_ptr<FutureSetFromTuple>>, Hashing>;
using SetsFromStorage = std::unordered_map<Hash, std::shared_ptr<FutureSetFromStorage>, Hashing>;
using SetsFromSubqueries = std::unordered_map<Hash, std::shared_ptr<FutureSetFromSubquery>, Hashing>;
using SetsFromTuple = std::unordered_map<Hash, std::vector<FutureSetFromTuplePtr>, Hashing>;
using SetsFromStorage = std::unordered_map<Hash, FutureSetFromStoragePtr, Hashing>;
using SetsFromSubqueries = std::unordered_map<Hash, FutureSetFromSubqueryPtr, Hashing>;
FutureSetPtr addFromStorage(const Hash & key, SetPtr set_);
FutureSetPtr addFromTuple(const Hash & key, Block block, const Settings & settings);
FutureSetFromStoragePtr addFromStorage(const Hash & key, SetPtr set_);
FutureSetFromTuplePtr addFromTuple(const Hash & key, Block block, const Settings & settings);
FutureSetPtr addFromSubquery(
FutureSetFromSubqueryPtr addFromSubquery(
const Hash & key,
std::unique_ptr<QueryPlan> source,
StoragePtr external_table,
FutureSetPtr external_table_set,
FutureSetFromSubqueryPtr external_table_set,
const Settings & settings,
bool in_subquery = false);
FutureSetPtr addFromSubquery(
FutureSetFromSubqueryPtr addFromSubquery(
const Hash & key,
QueryTreeNodePtr query_tree,
const Settings & settings);
FutureSetPtr findTuple(const Hash & key, const DataTypes & types) const;
std::shared_ptr<FutureSetFromStorage> findStorage(const Hash & key) const;
std::shared_ptr<FutureSetFromSubquery> findSubquery(const Hash & key) const;
FutureSetFromTuplePtr findTuple(const Hash & key, const DataTypes & types) const;
FutureSetFromStoragePtr findStorage(const Hash & key) const;
FutureSetFromSubqueryPtr findSubquery(const Hash & key) const;
void markAsINSubquery(const Hash & key);
using Subqueries = std::vector<std::shared_ptr<FutureSetFromSubquery>>;
using Subqueries = std::vector<FutureSetFromSubqueryPtr>;
Subqueries getSubqueries() const;
bool hasSubqueries() const { return !sets_from_subqueries.empty(); }
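A minimal caller-side sketch (hypothetical code, not part of this commit; `key`, `source`, `settings` and `context` are assumed to exist) of what the typed aliases buy: `findSubquery` now returns `FutureSetFromSubqueryPtr` directly, so subquery-specific methods can be called without a downcast.
// Hypothetical caller. No std::static_pointer_cast is needed anymore:
if (FutureSetFromSubqueryPtr future_set = prepared_sets.findSubquery(key))
    future_set->buildSetInplace(context);
else
    prepared_sets.addFromSubquery(key, std::move(source), /*external_table*/ nullptr, /*external_table_set*/ nullptr, settings);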

View File

@ -36,7 +36,6 @@ struct RequiredSourceColumnsData
bool has_table_join = false;
bool has_array_join = false;
bool visit_index_hint = false;
bool addColumnAliasIfAny(const IAST & ast);
void addColumnIdentifier(const ASTIdentifier & node);

View File

@ -72,11 +72,6 @@ void RequiredSourceColumnsMatcher::visit(const ASTPtr & ast, Data & data)
}
if (auto * t = ast->as<ASTFunction>())
{
/// "indexHint" is a special function for index analysis.
/// Everything that is inside it is not calculated. See KeyCondition
if (!data.visit_index_hint && t->name == "indexHint")
return;
data.addColumnAliasIfAny(*ast);
visit(*t, ast, data);
return;
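/// (Context: `indexHint(cond)` lets index analysis use `cond` to choose parts and
/// granules while rows are returned unfiltered, e.g. SELECT * FROM t WHERE indexHint(x > 0);
/// its arguments are no longer special-cased here when collecting required source columns.)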

View File

@ -995,13 +995,12 @@ void TreeRewriterResult::collectSourceColumns(bool add_special)
/// Calculate which columns are required to execute the expression.
/// Then, delete all other columns from the list of available columns.
/// After execution, columns will only contain the list of columns needed to read from the table.
bool TreeRewriterResult::collectUsedColumns(const ASTPtr & query, bool is_select, bool visit_index_hint, bool no_throw)
bool TreeRewriterResult::collectUsedColumns(const ASTPtr & query, bool is_select, bool no_throw)
{
/// We calculate required_source_columns with source_columns modifications and swap them on exit
required_source_columns = source_columns;
RequiredSourceColumnsVisitor::Data columns_context;
columns_context.visit_index_hint = visit_index_hint;
RequiredSourceColumnsVisitor(columns_context).visit(query);
NameSet source_column_names;
@ -1385,7 +1384,7 @@ TreeRewriterResultPtr TreeRewriter::analyzeSelect(
result.window_function_asts = getWindowFunctions(query, *select_query);
result.expressions_with_window_function = getExpressionsWithWindowFunctions(query);
result.collectUsedColumns(query, true, settings.query_plan_optimize_primary_key);
result.collectUsedColumns(query, true);
if (!result.missed_subcolumns.empty())
{
@ -1422,7 +1421,7 @@ TreeRewriterResultPtr TreeRewriter::analyzeSelect(
result.aggregates = getAggregates(query, *select_query);
result.window_function_asts = getWindowFunctions(query, *select_query);
result.expressions_with_window_function = getExpressionsWithWindowFunctions(query);
result.collectUsedColumns(query, true, settings.query_plan_optimize_primary_key);
result.collectUsedColumns(query, true);
}
}
@ -1499,7 +1498,7 @@ TreeRewriterResultPtr TreeRewriter::analyze(
else
assertNoAggregates(query, "in wrong place");
bool is_ok = result.collectUsedColumns(query, false, settings.query_plan_optimize_primary_key, no_throw);
bool is_ok = result.collectUsedColumns(query, false, no_throw);
if (!is_ok)
return {};

View File

@ -88,7 +88,7 @@ struct TreeRewriterResult
bool add_special = true);
void collectSourceColumns(bool add_special);
bool collectUsedColumns(const ASTPtr & query, bool is_select, bool visit_index_hint, bool no_throw = false);
bool collectUsedColumns(const ASTPtr & query, bool is_select, bool no_throw = false);
Names requiredSourceColumns() const { return required_source_columns.getNames(); }
const Names & requiredSourceColumnsForAccessCheck() const { return required_source_columns_before_expanding_alias_columns; }
NameSet getArrayJoinSourceNameSet() const;

View File

@ -8,6 +8,8 @@
#include <Analyzer/QueryNode.h>
#include <Analyzer/TableNode.h>
#include <Analyzer/TableFunctionNode.h>
#include <Analyzer/JoinNode.h>
#include <Analyzer/ListNode.h>
#include <Planner/PlannerContext.h>
#include <Planner/PlannerActionsVisitor.h>
@ -33,6 +35,28 @@ public:
void visitImpl(QueryTreeNodePtr & node)
{
/// Special case for the USING clause, which contains references to ALIAS columns.
/// We cannot modify such a ColumnNode.
if (auto * join_node = node->as<JoinNode>())
{
if (!join_node->isUsingJoinExpression())
return;
auto & using_list = join_node->getJoinExpression()->as<ListNode&>();
for (auto & using_element : using_list)
{
auto & column_node = using_element->as<ColumnNode&>();
/// This list contains column nodes from left and right tables.
auto & columns_from_subtrees = column_node.getExpressionOrThrow()->as<ListNode&>().getNodes();
/// Visit left table column node.
visitUsingColumn(columns_from_subtrees[0]);
/// Visit right table column node.
visitUsingColumn(columns_from_subtrees[1]);
}
return;
}
auto * column_node = node->as<ColumnNode>();
if (!column_node)
return;
@ -55,7 +79,13 @@ public:
if (column_node->hasExpression() && column_source_node_type != QueryTreeNodeType::ARRAY_JOIN)
{
/// Replace ALIAS column with expression
table_expression_data.addAliasColumnName(column_node->getColumnName());
bool column_already_exists = table_expression_data.hasColumn(column_node->getColumnName());
if (!column_already_exists)
{
auto column_identifier = planner_context.getGlobalPlannerContext()->createColumnIdentifier(node);
table_expression_data.addAliasColumnName(column_node->getColumnName(), column_identifier);
}
node = column_node->getExpression();
visitImpl(node);
return;
@ -78,13 +108,38 @@ public:
table_expression_data.addColumn(column_node->getColumn(), column_identifier);
}
static bool needChildVisit(const QueryTreeNodePtr &, const QueryTreeNodePtr & child_node)
static bool needChildVisit(const QueryTreeNodePtr & parent, const QueryTreeNodePtr & child_node)
{
if (auto * join_node = parent->as<JoinNode>())
{
if (join_node->getJoinExpression() == child_node && join_node->isUsingJoinExpression())
return false;
}
auto child_node_type = child_node->getNodeType();
return !(child_node_type == QueryTreeNodeType::QUERY || child_node_type == QueryTreeNodeType::UNION);
}
private:
void visitUsingColumn(QueryTreeNodePtr & node)
{
auto & column_node = node->as<ColumnNode&>();
if (column_node.hasExpression())
{
auto & table_expression_data = planner_context.getOrCreateTableExpressionData(column_node.getColumnSource());
bool column_already_exists = table_expression_data.hasColumn(column_node.getColumnName());
if (column_already_exists)
return;
auto column_identifier = planner_context.getGlobalPlannerContext()->createColumnIdentifier(node);
table_expression_data.addAliasColumnName(column_node.getColumnName(), column_identifier);
visitImpl(column_node.getExpressionOrThrow());
}
else
visitImpl(node);
}
PlannerContext & planner_context;
};
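A condensed sketch (hypothetical query, simplified) of the node layout this special case relies on, for SELECT * FROM t1 JOIN t2 USING (key) where `key` is an ALIAS column on one side:
// Each USING list element is a ColumnNode whose expression is a ListNode with
// the matching columns of both input tables, left first:
auto & columns_from_subtrees = using_element->as<ColumnNode &>().getExpressionOrThrow()->as<ListNode &>().getNodes();
// columns_from_subtrees[0] is the left table column, [1] the right table column.
// visitUsingColumn() registers an ALIAS column once per table expression and then
// descends into its expression, instead of replacing the USING ColumnNode in place.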

View File

@ -1057,7 +1057,7 @@ void addBuildSubqueriesForSetsStepIfNeeded(
Planner subquery_planner(
query_tree,
subquery_options,
planner_context->getGlobalPlannerContext());
std::make_shared<GlobalPlannerContext>()); //planner_context->getGlobalPlannerContext());
subquery_planner.buildQueryPlanIfNeeded();
subquery->setQueryPlan(std::make_unique<QueryPlan>(std::move(subquery_planner).extractQueryPlan()));

View File

@ -20,12 +20,15 @@ const ColumnIdentifier & GlobalPlannerContext::createColumnIdentifier(const Quer
return createColumnIdentifier(column_node_typed.getColumn(), column_source_node);
}
const ColumnIdentifier & GlobalPlannerContext::createColumnIdentifier(const NameAndTypePair & column, const QueryTreeNodePtr & /*column_source_node*/)
const ColumnIdentifier & GlobalPlannerContext::createColumnIdentifier(const NameAndTypePair & column, const QueryTreeNodePtr & column_source_node)
{
std::string column_identifier;
column_identifier += column.name;
column_identifier += '_' + std::to_string(column_identifiers.size());
const auto & source_alias = column_source_node->getAlias();
if (!source_alias.empty())
column_identifier = source_alias + "." + column.name;
else
column_identifier = column.name;
auto [it, inserted] = column_identifiers.emplace(column_identifier);
assert(inserted);
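A worked example (hypothetical table and alias names) of the new identifier scheme:
// Before: identifiers were made unique with a counter, e.g. "id_0", "id_1".
// After:  the source alias, when present, qualifies the identifier:
//   SELECT t1.id FROM tbl AS t1  ->  column identifier "t1.id"
//   SELECT id FROM tbl           ->  column identifier "id"
// The assert(inserted) above now relies on alias-qualified names being unique.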

View File

@ -645,7 +645,12 @@ JoinTreeQueryPlan buildQueryPlanForTableExpression(QueryTreeNodePtr table_expres
max_threads_execute_query = 1;
}
if (max_block_size_limited < select_query_info.local_storage_limits.local_limits.size_limits.max_rows)
if (select_query_info.local_storage_limits.local_limits.size_limits.max_rows != 0)
{
if (max_block_size_limited < select_query_info.local_storage_limits.local_limits.size_limits.max_rows)
table_expression_query_info.limit = max_block_size_limited;
}
else
{
table_expression_query_info.limit = max_block_size_limited;
}
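/// Worked example of the guard above (hypothetical values), with max_block_size_limited = 10:
///   max_rows = 0 (unset) -> table_expression_query_info.limit = 10
///   max_rows = 100       -> limit = 10 (10 < 100, the read hint is safe)
///   max_rows = 5         -> limit stays unset, so the pushed-down read hint
///                           cannot bypass the max_rows check.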
@ -812,7 +817,7 @@ JoinTreeQueryPlan buildQueryPlanForTableExpression(QueryTreeNodePtr table_expres
}
}
const auto & table_expression_alias = table_expression->getAlias();
const auto & table_expression_alias = table_expression->getOriginalAlias();
auto additional_filters_info = buildAdditionalFiltersIfNeeded(storage, table_expression_alias, table_expression_query_info, planner_context);
add_filter(additional_filters_info, "additional filter");
@ -978,6 +983,57 @@ void joinCastPlanColumnsToNullable(QueryPlan & plan_to_add_cast, PlannerContextP
plan_to_add_cast.addStep(std::move(cast_join_columns_step));
}
/// Actions to calculate table columns that have a functional representation (ALIASes and subcolumns)
/// and are used in the USING clause of a JOIN expression.
struct UsingAliasKeyActions
{
UsingAliasKeyActions(
const ColumnsWithTypeAndName & left_plan_output_columns,
const ColumnsWithTypeAndName & right_plan_output_columns
)
: left_alias_columns_keys(std::make_shared<ActionsDAG>(left_plan_output_columns))
, right_alias_columns_keys(std::make_shared<ActionsDAG>(right_plan_output_columns))
{}
void addLeftColumn(QueryTreeNodePtr & node, const ColumnsWithTypeAndName & plan_output_columns, const PlannerContextPtr & planner_context)
{
addColumnImpl(left_alias_columns_keys, node, plan_output_columns, planner_context);
}
void addRightColumn(QueryTreeNodePtr & node, const ColumnsWithTypeAndName & plan_output_columns, const PlannerContextPtr & planner_context)
{
addColumnImpl(right_alias_columns_keys, node, plan_output_columns, planner_context);
}
ActionsDAGPtr getLeftActions()
{
left_alias_columns_keys->projectInput();
return std::move(left_alias_columns_keys);
}
ActionsDAGPtr getRightActions()
{
right_alias_columns_keys->projectInput();
return std::move(right_alias_columns_keys);
}
private:
void addColumnImpl(ActionsDAGPtr & alias_columns_keys, QueryTreeNodePtr & node, const ColumnsWithTypeAndName & plan_output_columns, const PlannerContextPtr & planner_context)
{
auto & column_node = node->as<ColumnNode&>();
if (column_node.hasExpression())
{
auto dag = buildActionsDAGFromExpressionNode(column_node.getExpressionOrThrow(), plan_output_columns, planner_context);
const auto & left_inner_column_identifier = planner_context->getColumnNodeIdentifierOrThrow(node);
dag->addOrReplaceInOutputs(dag->addAlias(*dag->getOutputs().front(), left_inner_column_identifier));
alias_columns_keys->mergeInplace(std::move(*dag));
}
}
ActionsDAGPtr left_alias_columns_keys;
ActionsDAGPtr right_alias_columns_keys;
};
JoinTreeQueryPlan buildQueryPlanForJoinNode(const QueryTreeNodePtr & join_table_expression,
JoinTreeQueryPlan left_join_tree_query_plan,
JoinTreeQueryPlan right_join_tree_query_plan,
@ -1002,6 +1058,18 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(const QueryTreeNodePtr & join_table_
auto right_plan = std::move(right_join_tree_query_plan.query_plan);
auto right_plan_output_columns = right_plan.getCurrentDataStream().header.getColumnsWithTypeAndName();
// {
// WriteBufferFromOwnString buf;
// left_plan.explainPlan(buf, {.header = true, .actions = true});
// std::cerr << "left plan \n "<< buf.str() << std::endl;
// }
// {
// WriteBufferFromOwnString buf;
// right_plan.explainPlan(buf, {.header = true, .actions = true});
// std::cerr << "right plan \n "<< buf.str() << std::endl;
// }
JoinClausesAndActions join_clauses_and_actions;
JoinKind join_kind = join_node.getKind();
JoinStrictness join_strictness = join_node.getStrictness();
@ -1034,6 +1102,8 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(const QueryTreeNodePtr & join_table_
if (join_node.isUsingJoinExpression())
{
UsingAliasKeyActions using_alias_key_actions{left_plan_output_columns, right_plan_output_columns};
auto & join_node_using_columns_list = join_node.getJoinExpression()->as<ListNode &>();
for (auto & join_node_using_node : join_node_using_columns_list.getNodes())
{
@ -1043,9 +1113,13 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(const QueryTreeNodePtr & join_table_
auto & left_inner_column_node = inner_columns_list.getNodes().at(0);
auto & left_inner_column = left_inner_column_node->as<ColumnNode &>();
using_alias_key_actions.addLeftColumn(left_inner_column_node, left_plan_output_columns, planner_context);
auto & right_inner_column_node = inner_columns_list.getNodes().at(1);
auto & right_inner_column = right_inner_column_node->as<ColumnNode &>();
using_alias_key_actions.addRightColumn(right_inner_column_node, right_plan_output_columns, planner_context);
const auto & join_node_using_column_node_type = join_node_using_column_node.getColumnType();
if (!left_inner_column.getColumnType()->equals(*join_node_using_column_node_type))
{
@ -1059,6 +1133,14 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(const QueryTreeNodePtr & join_table_
right_plan_column_name_to_cast_type.emplace(right_inner_column_identifier, join_node_using_column_node_type);
}
}
auto left_alias_columns_keys_step = std::make_unique<ExpressionStep>(left_plan.getCurrentDataStream(), using_alias_key_actions.getLeftActions());
left_alias_columns_keys_step->setStepDescription("Actions for left table alias column keys");
left_plan.addStep(std::move(left_alias_columns_keys_step));
auto right_alias_columns_keys_step = std::make_unique<ExpressionStep>(right_plan.getCurrentDataStream(), using_alias_key_actions.getRightActions());
right_alias_columns_keys_step->setStepDescription("Actions for right table alias column keys");
right_plan.addStep(std::move(right_alias_columns_keys_step));
}
auto join_cast_plan_output_nodes = [&](QueryPlan & plan_to_add_cast, std::unordered_map<std::string, DataTypePtr> & plan_column_name_to_cast_type)

View File

@ -20,6 +20,7 @@
#include <Analyzer/Utils.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/ColumnNode.h>
#include <Analyzer/ConstantNode.h>
#include <Analyzer/TableNode.h>
#include <Analyzer/TableFunctionNode.h>
@ -113,41 +114,96 @@ String JoinClause::dump() const
namespace
{
std::optional<JoinTableSide> extractJoinTableSideFromExpression(const ActionsDAG::Node * expression_root_node,
const std::unordered_set<const ActionsDAG::Node *> & join_expression_dag_input_nodes,
const NameSet & left_table_expression_columns_names,
const NameSet & right_table_expression_columns_names,
using TableExpressionSet = std::unordered_set<const IQueryTreeNode *>;
TableExpressionSet extractTableExpressionsSet(const QueryTreeNodePtr & node)
{
TableExpressionSet res;
for (const auto & expr : extractTableExpressions(node, true))
res.insert(expr.get());
return res;
}
std::optional<JoinTableSide> extractJoinTableSideFromExpression(//const ActionsDAG::Node * expression_root_node,
const IQueryTreeNode * expression_root_node,
//const std::unordered_set<const ActionsDAG::Node *> & join_expression_dag_input_nodes,
const TableExpressionSet & left_table_expressions,
const TableExpressionSet & right_table_expressions,
const JoinNode & join_node)
{
std::optional<JoinTableSide> table_side;
std::vector<const ActionsDAG::Node *> nodes_to_process;
std::vector<const IQueryTreeNode *> nodes_to_process;
nodes_to_process.push_back(expression_root_node);
// std::cerr << "==== extractJoinTableSideFromExpression\n";
// std::cerr << "inp nodes" << std::endl;
// for (const auto * node : join_expression_dag_input_nodes)
// std::cerr << reinterpret_cast<const void *>(node) << ' ' << node->result_name << std::endl;
// std::cerr << "l names" << std::endl;
// for (const auto & l : left_table_expression_columns_names)
// std::cerr << l << std::endl;
// std::cerr << "r names" << std::endl;
// for (const auto & r : right_table_expression_columns_names)
// std::cerr << r << std::endl;
// const auto * left_table_expr = join_node.getLeftTableExpression().get();
// const auto * right_table_expr = join_node.getRightTableExpression().get();
while (!nodes_to_process.empty())
{
const auto * node_to_process = nodes_to_process.back();
nodes_to_process.pop_back();
for (const auto & child : node_to_process->children)
nodes_to_process.push_back(child);
//std::cerr << "... " << reinterpret_cast<const void *>(node_to_process) << ' ' << node_to_process->result_name << std::endl;
if (!join_expression_dag_input_nodes.contains(node_to_process))
if (const auto * function_node = node_to_process->as<FunctionNode>())
{
for (const auto & child : function_node->getArguments())
nodes_to_process.push_back(child.get());
continue;
}
const auto * column_node = node_to_process->as<ColumnNode>();
if (!column_node)
continue;
const auto & input_name = node_to_process->result_name;
// if (!join_expression_dag_input_nodes.contains(node_to_process))
// continue;
bool left_table_expression_contains_input = left_table_expression_columns_names.contains(input_name);
bool right_table_expression_contains_input = right_table_expression_columns_names.contains(input_name);
const auto & input_name = column_node->getColumnName();
if (!left_table_expression_contains_input && !right_table_expression_contains_input)
// bool left_table_expression_contains_input = left_table_expression_columns_names.contains(input_name);
// bool right_table_expression_contains_input = right_table_expression_columns_names.contains(input_name);
// if (!left_table_expression_contains_input && !right_table_expression_contains_input)
// throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION,
// "JOIN {} actions has column {} that do not exist in left {} or right {} table expression columns",
// join_node.formatASTForErrorMessage(),
// input_name,
// boost::join(left_table_expression_columns_names, ", "),
// boost::join(right_table_expression_columns_names, ", "));
const auto * column_source = column_node->getColumnSource().get();
if (!column_source)
throw Exception(ErrorCodes::LOGICAL_ERROR, "No source for column {} in JOIN {}", input_name, join_node.formatASTForErrorMessage());
bool is_column_from_left_expr = left_table_expressions.contains(column_source);
bool is_column_from_right_expr = right_table_expressions.contains(column_source);
if (!is_column_from_left_expr && !is_column_from_right_expr)
throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION,
"JOIN {} actions has column {} that do not exist in left {} or right {} table expression columns",
join_node.formatASTForErrorMessage(),
input_name,
boost::join(left_table_expression_columns_names, ", "),
boost::join(right_table_expression_columns_names, ", "));
column_source->formatASTForErrorMessage(),
join_node.getLeftTableExpression()->formatASTForErrorMessage(),
join_node.getRightTableExpression()->formatASTForErrorMessage());
auto input_table_side = left_table_expression_contains_input ? JoinTableSide::Left : JoinTableSide::Right;
auto input_table_side = is_column_from_left_expr ? JoinTableSide::Left : JoinTableSide::Right;
if (table_side && (*table_side) != input_table_side)
throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION,
"JOIN {} join expression contains column from left and right table",
@ -159,29 +215,58 @@ std::optional<JoinTableSide> extractJoinTableSideFromExpression(const ActionsDAG
return table_side;
}
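// Worked example of the traversal above (hypothetical tables t1 and t2):
//   ON t1.id = t2.id + 1
//     t1.id          -> ColumnNode with source t1      -> JoinTableSide::Left
//     plus(t2.id, 1) -> FunctionNode; its argument
//                       t2.id has source t2            -> JoinTableSide::Right
//   An operand that mixes both sides, e.g. (t1.id + t2.id), resolves to two
//   different sides and throws INVALID_JOIN_ON_EXPRESSION.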
void buildJoinClause(ActionsDAGPtr join_expression_dag,
const std::unordered_set<const ActionsDAG::Node *> & join_expression_dag_input_nodes,
const ActionsDAG::Node * join_expressions_actions_node,
const NameSet & left_table_expression_columns_names,
const NameSet & right_table_expression_columns_names,
const ActionsDAG::Node * appendExpression(
ActionsDAGPtr & dag,
const QueryTreeNodePtr & expression,
const PlannerContextPtr & planner_context,
const JoinNode & join_node)
{
PlannerActionsVisitor join_expression_visitor(planner_context);
auto join_expression_dag_node_raw_pointers = join_expression_visitor.visit(dag, expression);
if (join_expression_dag_node_raw_pointers.size() != 1)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"JOIN {} ON clause contains multiple expressions",
join_node.formatASTForErrorMessage());
return join_expression_dag_node_raw_pointers[0];
}
void buildJoinClause(
ActionsDAGPtr & left_dag,
ActionsDAGPtr & right_dag,
const PlannerContextPtr & planner_context,
//ActionsDAGPtr join_expression_dag,
//const std::unordered_set<const ActionsDAG::Node *> & join_expression_dag_input_nodes,
//const ActionsDAG::Node * join_expressions_actions_node,
const QueryTreeNodePtr & join_expression,
const TableExpressionSet & left_table_expressions,
const TableExpressionSet & right_table_expressions,
const JoinNode & join_node,
JoinClause & join_clause)
{
std::string function_name;
if (join_expressions_actions_node->function)
function_name = join_expressions_actions_node->function->getName();
//std::cerr << join_expression_dag->dumpDAG() << std::endl;
auto * function_node = join_expression->as<FunctionNode>();
if (function_node)
function_name = function_node->getFunction()->getName();
// if (join_expressions_actions_node->function)
// function_name = join_expressions_actions_node->function->getName();
/// For the 'and' function, descend into its children
if (function_name == "and")
{
for (const auto & child : join_expressions_actions_node->children)
for (const auto & child : function_node->getArguments())
{
buildJoinClause(join_expression_dag,
join_expression_dag_input_nodes,
buildJoinClause(//join_expression_dag,
//join_expression_dag_input_nodes,
left_dag,
right_dag,
planner_context,
child,
left_table_expression_columns_names,
right_table_expression_columns_names,
left_table_expressions,
right_table_expressions,
join_node,
join_clause);
}
@ -194,45 +279,49 @@ void buildJoinClause(ActionsDAGPtr join_expression_dag,
if (function_name == "equals" || function_name == "isNotDistinctFrom" || is_asof_join_inequality)
{
const auto * left_child = join_expressions_actions_node->children.at(0);
const auto * right_child = join_expressions_actions_node->children.at(1);
const auto left_child = function_node->getArguments().getNodes().at(0);//join_expressions_actions_node->children.at(0);
const auto right_child = function_node->getArguments().getNodes().at(1); //join_expressions_actions_node->children.at(1);
auto left_expression_side_optional = extractJoinTableSideFromExpression(left_child,
join_expression_dag_input_nodes,
left_table_expression_columns_names,
right_table_expression_columns_names,
auto left_expression_side_optional = extractJoinTableSideFromExpression(left_child.get(),
//join_expression_dag_input_nodes,
left_table_expressions,
right_table_expressions,
join_node);
auto right_expression_side_optional = extractJoinTableSideFromExpression(right_child,
join_expression_dag_input_nodes,
left_table_expression_columns_names,
right_table_expression_columns_names,
auto right_expression_side_optional = extractJoinTableSideFromExpression(right_child.get(),
//join_expression_dag_input_nodes,
left_table_expressions,
right_table_expressions,
join_node);
if (!left_expression_side_optional && !right_expression_side_optional)
{
throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION,
"JOIN {} ON expression {} with constants is not supported",
join_node.formatASTForErrorMessage(),
join_expressions_actions_node->result_name);
"JOIN {} ON expression with constants is not supported",
join_node.formatASTForErrorMessage());
}
else if (left_expression_side_optional && !right_expression_side_optional)
{
join_clause.addCondition(*left_expression_side_optional, join_expressions_actions_node);
auto & dag = *left_expression_side_optional == JoinTableSide::Left ? left_dag : right_dag;
const auto * node = appendExpression(dag, join_expression, planner_context, join_node);
join_clause.addCondition(*left_expression_side_optional, node);
}
else if (!left_expression_side_optional && right_expression_side_optional)
{
join_clause.addCondition(*right_expression_side_optional, join_expressions_actions_node);
auto & dag = *right_expression_side_optional == JoinTableSide::Left ? left_dag : right_dag;
const auto * node = appendExpression(dag, join_expression, planner_context, join_node);
join_clause.addCondition(*right_expression_side_optional, node);
}
else
{
// std::cerr << "===============\n";
auto left_expression_side = *left_expression_side_optional;
auto right_expression_side = *right_expression_side_optional;
if (left_expression_side != right_expression_side)
{
const ActionsDAG::Node * left_key = left_child;
const ActionsDAG::Node * right_key = right_child;
auto left_key = left_child;
auto right_key = right_child;
if (left_expression_side == JoinTableSide::Right)
{
@ -241,6 +330,9 @@ void buildJoinClause(ActionsDAGPtr join_expression_dag,
asof_inequality = reverseASOFJoinInequality(asof_inequality);
}
const auto * left_node = appendExpression(left_dag, left_key, planner_context, join_node);
const auto * right_node = appendExpression(right_dag, right_key, planner_context, join_node);
if (is_asof_join_inequality)
{
if (join_clause.hasASOF())
@ -250,55 +342,66 @@ void buildJoinClause(ActionsDAGPtr join_expression_dag,
join_node.formatASTForErrorMessage());
}
join_clause.addASOFKey(left_key, right_key, asof_inequality);
join_clause.addASOFKey(left_node, right_node, asof_inequality);
}
else
{
bool null_safe_comparison = function_name == "isNotDistinctFrom";
join_clause.addKey(left_key, right_key, null_safe_comparison);
join_clause.addKey(left_node, right_node, null_safe_comparison);
}
}
else
{
join_clause.addCondition(left_expression_side, join_expressions_actions_node);
auto & dag = left_expression_side == JoinTableSide::Left ? left_dag : right_dag;
const auto * node = appendExpression(dag, join_expression, planner_context, join_node);
join_clause.addCondition(left_expression_side, node);
}
}
return;
}
auto expression_side_optional = extractJoinTableSideFromExpression(join_expressions_actions_node,
join_expression_dag_input_nodes,
left_table_expression_columns_names,
right_table_expression_columns_names,
auto expression_side_optional = extractJoinTableSideFromExpression(//join_expressions_actions_node,
//join_expression_dag_input_nodes,
join_expression.get(),
left_table_expressions,
right_table_expressions,
join_node);
if (!expression_side_optional)
expression_side_optional = JoinTableSide::Right;
auto expression_side = *expression_side_optional;
join_clause.addCondition(expression_side, join_expressions_actions_node);
auto & dag = expression_side == JoinTableSide::Left ? left_dag : right_dag;
const auto * node = appendExpression(dag, join_expression, planner_context, join_node);
join_clause.addCondition(expression_side, node);
}
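// Sketch of how the branches above decompose a compound ON expression
// (hypothetical INNER JOIN of tables t1 and t2):
//   ON t1.a = t2.b AND t1.x > 0
//     equals(t1.a, t2.b): sides resolve to Left and Right; each key expression is
//       appended to its own DAG and registered via join_clause.addKey().
//     greater(t1.x, 0): only the left side resolves, so the whole expression is
//       appended to left_dag and added as a pre-join filter condition.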
JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName & join_expression_input_columns,
JoinClausesAndActions buildJoinClausesAndActions(//const ColumnsWithTypeAndName & join_expression_input_columns,
const ColumnsWithTypeAndName & left_table_expression_columns,
const ColumnsWithTypeAndName & right_table_expression_columns,
const JoinNode & join_node,
const PlannerContextPtr & planner_context)
{
ActionsDAGPtr join_expression_actions = std::make_shared<ActionsDAG>(join_expression_input_columns);
//ActionsDAGPtr join_expression_actions = std::make_shared<ActionsDAG>(join_expression_input_columns);
ActionsDAGPtr left_join_actions = std::make_shared<ActionsDAG>(left_table_expression_columns);
ActionsDAGPtr right_join_actions = std::make_shared<ActionsDAG>(right_table_expression_columns);
// LOG_TRACE(&Poco::Logger::get("Planner"), "buildJoinClausesAndActions cols {} ", left_join_actions->dumpDAG());
// LOG_TRACE(&Poco::Logger::get("Planner"), "buildJoinClausesAndActions cols {} ", right_join_actions->dumpDAG());
/** In ActionsDAG, if an input node has a constant representation, an additional constant column is added.
* That way we cannot simply check that a node has the INPUT type when resolving the join table side of an expression.
* Instead, all nodes present right after the actions dag initialization are put into a set.
* To check whether an actions dag node is an input column, we check whether the set contains it.
*/
const auto & join_expression_actions_nodes = join_expression_actions->getNodes();
// const auto & join_expression_actions_nodes = join_expression_actions->getNodes();
std::unordered_set<const ActionsDAG::Node *> join_expression_dag_input_nodes;
join_expression_dag_input_nodes.reserve(join_expression_actions_nodes.size());
for (const auto & node : join_expression_actions_nodes)
join_expression_dag_input_nodes.insert(&node);
// std::unordered_set<const ActionsDAG::Node *> join_expression_dag_input_nodes;
// join_expression_dag_input_nodes.reserve(join_expression_actions_nodes.size());
// for (const auto & node : join_expression_actions_nodes)
// join_expression_dag_input_nodes.insert(&node);
/** It is possible to have a constant value in the JOIN ON section that we need to ignore during DAG construction.
* If we do not ignore it, this function will be replaced by the underlying constant.
@ -308,6 +411,9 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
* ON (t1.id = t2.id) AND 1 != 1 AND (t1.value >= t1.value);
*/
auto join_expression = join_node.getJoinExpression();
// LOG_TRACE(&Poco::Logger::get("Planner"), "buildJoinClausesAndActions expr {} ", join_expression->formatConvertedASTForErrorMessage());
// LOG_TRACE(&Poco::Logger::get("Planner"), "buildJoinClausesAndActions expr {} ", join_expression->dumpTree());
auto * constant_join_expression = join_expression->as<ConstantNode>();
if (constant_join_expression && constant_join_expression->hasSourceExpression())
@ -319,18 +425,18 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
"JOIN {} join expression expected function",
join_node.formatASTForErrorMessage());
PlannerActionsVisitor join_expression_visitor(planner_context);
auto join_expression_dag_node_raw_pointers = join_expression_visitor.visit(join_expression_actions, join_expression);
if (join_expression_dag_node_raw_pointers.size() != 1)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"JOIN {} ON clause contains multiple expressions",
join_node.formatASTForErrorMessage());
// PlannerActionsVisitor join_expression_visitor(planner_context);
// auto join_expression_dag_node_raw_pointers = join_expression_visitor.visit(join_expression_actions, join_expression);
// if (join_expression_dag_node_raw_pointers.size() != 1)
// throw Exception(ErrorCodes::LOGICAL_ERROR,
// "JOIN {} ON clause contains multiple expressions",
// join_node.formatASTForErrorMessage());
const auto * join_expressions_actions_root_node = join_expression_dag_node_raw_pointers[0];
if (!join_expressions_actions_root_node->function)
throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION,
"JOIN {} join expression expected function",
join_node.formatASTForErrorMessage());
// const auto * join_expressions_actions_root_node = join_expression_dag_node_raw_pointers[0];
// if (!join_expressions_actions_root_node->function)
// throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION,
// "JOIN {} join expression expected function",
// join_node.formatASTForErrorMessage());
size_t left_table_expression_columns_size = left_table_expression_columns.size();
@ -360,21 +466,27 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
join_right_actions_names_set.insert(right_table_expression_column.name);
}
JoinClausesAndActions result;
result.join_expression_actions = join_expression_actions;
auto join_left_table_expressions = extractTableExpressionsSet(join_node.getLeftTableExpression());
auto join_right_table_expressions = extractTableExpressionsSet(join_node.getRightTableExpression());
const auto & function_name = join_expressions_actions_root_node->function->getName();
JoinClausesAndActions result;
//result.join_expression_actions = join_expression_actions;
const auto & function_name = function_node->getFunction()->getName();
if (function_name == "or")
{
for (const auto & child : join_expressions_actions_root_node->children)
for (const auto & child : function_node->getArguments())
{
result.join_clauses.emplace_back();
buildJoinClause(join_expression_actions,
join_expression_dag_input_nodes,
buildJoinClause(//join_expression_actions,
//join_expression_dag_input_nodes,
left_join_actions,
right_join_actions,
planner_context,
child,
join_left_actions_names_set,
join_right_actions_names_set,
join_left_table_expressions,
join_right_table_expressions,
join_node,
result.join_clauses.back());
}
@ -383,11 +495,15 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
{
result.join_clauses.emplace_back();
buildJoinClause(join_expression_actions,
join_expression_dag_input_nodes,
join_expressions_actions_root_node,
join_left_actions_names_set,
join_right_actions_names_set,
buildJoinClause(
left_join_actions,
right_join_actions,
planner_context,
//join_expression_actions,
//join_expression_dag_input_nodes,
join_expression, //join_expressions_actions_root_node,
join_left_table_expressions,
join_right_table_expressions,
join_node,
result.join_clauses.back());
}
@ -412,12 +528,12 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
const ActionsDAG::Node * dag_filter_condition_node = nullptr;
if (left_filter_condition_nodes.size() > 1)
dag_filter_condition_node = &join_expression_actions->addFunction(and_function, left_filter_condition_nodes, {});
dag_filter_condition_node = &left_join_actions->addFunction(and_function, left_filter_condition_nodes, {});
else
dag_filter_condition_node = left_filter_condition_nodes[0];
join_clause.getLeftFilterConditionNodes() = {dag_filter_condition_node};
join_expression_actions->addOrReplaceInOutputs(*dag_filter_condition_node);
left_join_actions->addOrReplaceInOutputs(*dag_filter_condition_node);
add_necessary_name_if_needed(JoinTableSide::Left, dag_filter_condition_node->result_name);
}
@ -428,12 +544,12 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
const ActionsDAG::Node * dag_filter_condition_node = nullptr;
if (right_filter_condition_nodes.size() > 1)
dag_filter_condition_node = &join_expression_actions->addFunction(and_function, right_filter_condition_nodes, {});
dag_filter_condition_node = &right_join_actions->addFunction(and_function, right_filter_condition_nodes, {});
else
dag_filter_condition_node = right_filter_condition_nodes[0];
join_clause.getRightFilterConditionNodes() = {dag_filter_condition_node};
join_expression_actions->addOrReplaceInOutputs(*dag_filter_condition_node);
right_join_actions->addOrReplaceInOutputs(*dag_filter_condition_node);
add_necessary_name_if_needed(JoinTableSide::Right, dag_filter_condition_node->result_name);
}
@ -470,10 +586,10 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
}
if (!left_key_node->result_type->equals(*common_type))
left_key_node = &join_expression_actions->addCast(*left_key_node, common_type, {});
left_key_node = &left_join_actions->addCast(*left_key_node, common_type, {});
if (!right_key_node->result_type->equals(*common_type))
right_key_node = &join_expression_actions->addCast(*right_key_node, common_type, {});
right_key_node = &right_join_actions->addCast(*right_key_node, common_type, {});
}
if (join_clause.isNullsafeCompareKey(i) && left_key_node->result_type->isNullable() && right_key_node->result_type->isNullable())
@ -490,22 +606,29 @@ JoinClausesAndActions buildJoinClausesAndActions(const ColumnsWithTypeAndName &
* SELECT * FROM t1 JOIN t2 ON tuple(t1.a) == tuple(t2.b)
*/
auto wrap_nullsafe_function = FunctionFactory::instance().get("tuple", planner_context->getQueryContext());
left_key_node = &join_expression_actions->addFunction(wrap_nullsafe_function, {left_key_node}, {});
right_key_node = &join_expression_actions->addFunction(wrap_nullsafe_function, {right_key_node}, {});
left_key_node = &left_join_actions->addFunction(wrap_nullsafe_function, {left_key_node}, {});
right_key_node = &right_join_actions->addFunction(wrap_nullsafe_function, {right_key_node}, {});
}
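// Note: for a null-safe key (`a IS NOT DISTINCT FROM b`), both keys are wrapped
// as tuple(left_key) / tuple(right_key) in their respective DAGs; tuple equality
// treats NULL elements as equal, so NULL keys match each other.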
join_expression_actions->addOrReplaceInOutputs(*left_key_node);
join_expression_actions->addOrReplaceInOutputs(*right_key_node);
left_join_actions->addOrReplaceInOutputs(*left_key_node);
right_join_actions->addOrReplaceInOutputs(*right_key_node);
add_necessary_name_if_needed(JoinTableSide::Left, left_key_node->result_name);
add_necessary_name_if_needed(JoinTableSide::Right, right_key_node->result_name);
}
}
result.left_join_expressions_actions = join_expression_actions->clone();
result.left_join_expressions_actions = left_join_actions->clone();
result.left_join_tmp_expression_actions = std::move(left_join_actions);
result.left_join_expressions_actions->removeUnusedActions(join_left_actions_names);
result.right_join_expressions_actions = join_expression_actions->clone();
// for (const auto & name : join_right_actions_names)
// std::cerr << ".. " << name << std::endl;
// std::cerr << right_join_actions->dumpDAG() << std::endl;
result.right_join_expressions_actions = right_join_actions->clone();
result.right_join_tmp_expression_actions = std::move(right_join_actions);
result.right_join_expressions_actions->removeUnusedActions(join_right_actions_names);
return result;
@ -525,10 +648,10 @@ JoinClausesAndActions buildJoinClausesAndActions(
"JOIN {} join does not have ON section",
join_node_typed.formatASTForErrorMessage());
auto join_expression_input_columns = left_table_expression_columns;
join_expression_input_columns.insert(join_expression_input_columns.end(), right_table_expression_columns.begin(), right_table_expression_columns.end());
// auto join_expression_input_columns = left_table_expression_columns;
// join_expression_input_columns.insert(join_expression_input_columns.end(), right_table_expression_columns.begin(), right_table_expression_columns.end());
return buildJoinClausesAndActions(join_expression_input_columns, left_table_expression_columns, right_table_expression_columns, join_node_typed, planner_context);
return buildJoinClausesAndActions(/*join_expression_input_columns,*/ left_table_expression_columns, right_table_expression_columns, join_node_typed, planner_context);
}
std::optional<bool> tryExtractConstantFromJoinNode(const QueryTreeNodePtr & join_node)

View File

@ -165,7 +165,8 @@ struct JoinClausesAndActions
/// Join clauses. Actions dag nodes point into the join expressions actions below.
JoinClauses join_clauses;
/// Whole JOIN ON section expressions
ActionsDAGPtr join_expression_actions;
ActionsDAGPtr left_join_tmp_expression_actions;
ActionsDAGPtr right_join_tmp_expression_actions;
/// Left join expressions actions
ActionsDAGPtr left_join_expressions_actions;
/// Right join expressions actions

View File

@ -80,9 +80,11 @@ public:
}
/// Add an alias column name together with its column identifier
void addAliasColumnName(const std::string & column_name)
void addAliasColumnName(const std::string & column_name, const ColumnIdentifier & column_identifier)
{
alias_columns_names.insert(column_name);
column_name_to_column_identifier.emplace(column_name, column_identifier);
}
/// Get alias column names

View File

@ -357,6 +357,7 @@ QueryTreeNodePtr mergeConditionNodes(const QueryTreeNodes & condition_nodes, con
QueryTreeNodePtr replaceTableExpressionsWithDummyTables(const QueryTreeNodePtr & query_node,
const ContextPtr & context,
//PlannerContext & planner_context,
ResultReplacementMap * result_replacement_map)
{
auto & query_node_typed = query_node->as<QueryNode &>();
@ -406,6 +407,13 @@ QueryTreeNodePtr replaceTableExpressionsWithDummyTables(const QueryTreeNodePtr &
if (result_replacement_map)
result_replacement_map->emplace(table_expression, dummy_table_node);
dummy_table_node->setAlias(table_expression->getAlias());
// auto & src_table_expression_data = planner_context.getOrCreateTableExpressionData(table_expression);
// auto & dst_table_expression_data = planner_context.getOrCreateTableExpressionData(dummy_table_node);
// dst_table_expression_data = src_table_expression_data;
replacement_map.emplace(table_expression.get(), std::move(dummy_table_node));
}

View File

@ -436,7 +436,6 @@ AggregateProjectionCandidates getAggregateProjectionCandidates(
AggregateProjectionCandidates candidates;
const auto & parts = reading.getParts();
const auto & query_info = reading.getQueryInfo();
const auto metadata = reading.getStorageMetadata();
ContextPtr context = reading.getContext();
@ -481,8 +480,7 @@ AggregateProjectionCandidates getAggregateProjectionCandidates(
auto block = reading.getMergeTreeData().getMinMaxCountProjectionBlock(
metadata,
candidate.dag->getRequiredColumnsNames(),
dag.filter_node != nullptr,
query_info,
(dag.filter_node ? dag.dag : nullptr),
parts,
max_added_blocks.get(),
context);

View File

@ -23,6 +23,8 @@
#include <Processors/Transforms/ReverseTransform.h>
#include <QueryPipeline/QueryPipelineBuilder.h>
#include <Storages/MergeTree/MergeTreeDataSelectExecutor.h>
#include <Storages/MergeTree/MergeTreeIndexAnnoy.h>
#include <Storages/MergeTree/MergeTreeIndexUSearch.h>
#include <Storages/MergeTree/MergeTreeReadPool.h>
#include <Storages/MergeTree/MergeTreePrefetchedReadPool.h>
#include <Storages/MergeTree/MergeTreeReadPoolInOrder.h>
@ -418,7 +420,13 @@ Pipe ReadFromMergeTree::readFromPool(
&& settings.allow_prefetched_read_pool_for_local_filesystem
&& MergeTreePrefetchedReadPool::checkReadMethodAllowed(reader_settings.read_settings.local_fs_method);
if (allow_prefetched_remote || allow_prefetched_local)
/** Do not use the prefetched read pool if the query is a trivial LIMIT query,
* because the time spent filling per-thread tasks can be greater than the whole
* query execution time for big tables with a small limit.
*/
bool use_prefetched_read_pool = query_info.limit == 0 && (allow_prefetched_remote || allow_prefetched_local);
if (use_prefetched_read_pool)
{
pool = std::make_shared<MergeTreePrefetchedReadPool>(
std::move(parts_with_range),
@ -1331,26 +1339,12 @@ static void buildIndexes(
const Names & primary_key_column_names = primary_key.column_names;
const auto & settings = context->getSettingsRef();
if (settings.query_plan_optimize_primary_key)
{
NameSet array_join_name_set;
if (query_info.syntax_analyzer_result)
array_join_name_set = query_info.syntax_analyzer_result->getArrayJoinSourceNameSet();
indexes.emplace(ReadFromMergeTree::Indexes{{
filter_actions_dag,
context,
primary_key_column_names,
primary_key.expression}, {}, {}, {}, {}, false, {}});
}
else
{
indexes.emplace(ReadFromMergeTree::Indexes{{
query_info,
context,
primary_key_column_names,
primary_key.expression}, {}, {}, {}, {}, false, {}});
}
indexes.emplace(ReadFromMergeTree::Indexes{{
filter_actions_dag,
context,
primary_key_column_names,
primary_key.expression}, {}, {}, {}, {}, false, {}});
if (metadata_snapshot->hasPartitionKey())
{
@ -1363,11 +1357,7 @@ static void buildIndexes(
}
/// TODO Support row_policy_filter and additional_filters
if (settings.allow_experimental_analyzer)
indexes->part_values = MergeTreeDataSelectExecutor::filterPartsByVirtualColumns(data, parts, filter_actions_dag, context);
else
indexes->part_values = MergeTreeDataSelectExecutor::filterPartsByVirtualColumns(data, parts, query_info.query, context);
indexes->part_values = MergeTreeDataSelectExecutor::filterPartsByVirtualColumns(data, parts, filter_actions_dag, context);
MergeTreeDataSelectExecutor::buildKeyConditionFromPartOffset(indexes->part_offset_condition, filter_actions_dag, context);
indexes->use_skip_indexes = settings.use_skip_indexes;
@ -1379,14 +1369,18 @@ static void buildIndexes(
if (!indexes->use_skip_indexes)
return;
const SelectQueryInfo * info = &query_info;
std::optional<SelectQueryInfo> info_copy;
if (settings.allow_experimental_analyzer)
auto get_query_info = [&]() -> const SelectQueryInfo &
{
info_copy.emplace(query_info);
info_copy->filter_actions_dag = filter_actions_dag;
info = &*info_copy;
}
if (settings.allow_experimental_analyzer)
{
info_copy.emplace(query_info);
info_copy->filter_actions_dag = filter_actions_dag;
return *info_copy;
}
return query_info;
};
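// Sketch of the intent: get_query_info() lazily builds a copy of query_info with
// filter_actions_dag attached, and only under the analyzer. The remaining consumers
// of SelectQueryInfo (merged indices and vector-search indexes below) call it,
// while ordinary skip indexes now receive the filter ActionsDAG directly.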
std::unordered_set<std::string> ignored_index_names;
@ -1427,14 +1421,30 @@ static void buildIndexes(
if (inserted)
{
skip_indexes.merged_indices.emplace_back();
skip_indexes.merged_indices.back().condition = index_helper->createIndexMergedCondition(*info, metadata_snapshot);
skip_indexes.merged_indices.back().condition = index_helper->createIndexMergedCondition(get_query_info(), metadata_snapshot);
}
skip_indexes.merged_indices[it->second].addIndex(index_helper);
}
else
{
auto condition = index_helper->createIndexCondition(*info, context);
MergeTreeIndexConditionPtr condition;
if (index_helper->isVectorSearch())
{
#ifdef ENABLE_ANNOY
if (const auto * annoy = typeid_cast<const MergeTreeIndexAnnoy *>(index_helper.get()))
condition = annoy->createIndexCondition(get_query_info(), context);
#endif
#ifdef ENABLE_USEARCH
if (const auto * usearch = typeid_cast<const MergeTreeIndexUSearch *>(index_helper.get()))
condition = usearch->createIndexCondition(get_query_info(), context);
#endif
if (!condition)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown vector search index {}", index_helper->index.name);
}
else
condition = index_helper->createIndexCondition(filter_actions_dag, context);
if (!condition->alwaysUnknownOrTrue())
skip_indexes.useful_indices.emplace_back(index_helper, condition);
}
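// The typeid_cast dispatch above is required because MergeTreeIndexAnnoy and
// MergeTreeIndexUSearch are only compiled under ENABLE_ANNOY / ENABLE_USEARCH;
// a vector-search index with no matching implementation is a logical error.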
@ -1467,34 +1477,15 @@ MergeTreeDataSelectAnalysisResultPtr ReadFromMergeTree::selectRangesToRead(
Poco::Logger * log,
std::optional<Indexes> & indexes)
{
const auto & settings = context->getSettingsRef();
if (settings.allow_experimental_analyzer || settings.query_plan_optimize_primary_key)
{
auto updated_query_info_with_filter_dag = query_info;
updated_query_info_with_filter_dag.filter_actions_dag = buildFilterDAG(context, prewhere_info, added_filter_nodes, query_info);
return selectRangesToReadImpl(
std::move(parts),
std::move(alter_conversions),
metadata_snapshot_base,
metadata_snapshot,
updated_query_info_with_filter_dag,
context,
num_streams,
max_block_numbers_to_read,
data,
real_column_names,
sample_factor_column_queried,
log,
indexes);
}
auto updated_query_info_with_filter_dag = query_info;
updated_query_info_with_filter_dag.filter_actions_dag = buildFilterDAG(context, prewhere_info, added_filter_nodes, query_info);
return selectRangesToReadImpl(
std::move(parts),
std::move(alter_conversions),
metadata_snapshot_base,
metadata_snapshot,
query_info,
updated_query_info_with_filter_dag,
context,
num_streams,
max_block_numbers_to_read,

View File

@ -30,19 +30,9 @@ void ReadFromStorageStep::applyFilters()
if (!context)
return;
std::shared_ptr<const KeyCondition> key_condition;
if (!context->getSettingsRef().allow_experimental_analyzer)
{
for (const auto & processor : pipe.getProcessors())
if (auto * source = dynamic_cast<SourceWithKeyCondition *>(processor.get()))
source->setKeyCondition(query_info, context);
}
else
{
for (const auto & processor : pipe.getProcessors())
if (auto * source = dynamic_cast<SourceWithKeyCondition *>(processor.get()))
source->setKeyCondition(filter_nodes.nodes, context);
}
for (const auto & processor : pipe.getProcessors())
if (auto * source = dynamic_cast<SourceWithKeyCondition *>(processor.get()))
source->setKeyCondition(filter_nodes.nodes, context);
}
}

View File

@ -16,33 +16,18 @@ protected:
/// Represents pushed-down filters in the source
std::shared_ptr<const KeyCondition> key_condition;
void setKeyConditionImpl(const SelectQueryInfo & query_info, ContextPtr context, const Block & keys)
{
if (!context->getSettingsRef().allow_experimental_analyzer)
{
key_condition = std::make_shared<const KeyCondition>(
query_info,
context,
keys.getNames(),
std::make_shared<ExpressionActions>(std::make_shared<ActionsDAG>(keys.getColumnsWithTypeAndName())));
}
}
void setKeyConditionImpl(const ActionsDAG::NodeRawConstPtrs & nodes, ContextPtr context, const Block & keys)
{
if (context->getSettingsRef().allow_experimental_analyzer)
{
std::unordered_map<std::string, DB::ColumnWithTypeAndName> node_name_to_input_column;
for (const auto & column : keys.getColumnsWithTypeAndName())
node_name_to_input_column.insert({column.name, column});
std::unordered_map<std::string, DB::ColumnWithTypeAndName> node_name_to_input_column;
for (const auto & column : keys.getColumnsWithTypeAndName())
node_name_to_input_column.insert({column.name, column});
auto filter_actions_dag = ActionsDAG::buildFilterActionsDAG(nodes, node_name_to_input_column, context);
key_condition = std::make_shared<const KeyCondition>(
filter_actions_dag,
context,
keys.getNames(),
std::make_shared<ExpressionActions>(std::make_shared<ActionsDAG>(keys.getColumnsWithTypeAndName())));
}
auto filter_actions_dag = ActionsDAG::buildFilterActionsDAG(nodes, node_name_to_input_column, context);
key_condition = std::make_shared<const KeyCondition>(
filter_actions_dag,
context,
keys.getNames(),
std::make_shared<ExpressionActions>(std::make_shared<ActionsDAG>(keys.getColumnsWithTypeAndName())));
}
public:
@ -52,10 +37,7 @@ public:
/// Set key_condition directly. It is used for filter push down in source.
virtual void setKeyCondition(const std::shared_ptr<const KeyCondition> & key_condition_) { key_condition = key_condition_; }
/// Set key_condition created by query_info and context. It is used for filter push down when allow_experimental_analyzer is false.
virtual void setKeyCondition(const SelectQueryInfo & /*query_info*/, ContextPtr /*context*/) { }
/// Set key_condition created by nodes and context. It is used for filter push down when allow_experimental_analyzer is true.
/// Set key_condition created by nodes and context.
virtual void setKeyCondition(const ActionsDAG::NodeRawConstPtrs & /*nodes*/, ContextPtr /*context*/) { }
};
}

View File

@ -15,6 +15,9 @@
#include <Processors/Transforms/AddingDefaultsTransform.h>
#include <Processors/Transforms/ExtractColumnsTransform.h>
#include <Processors/Sources/ConstChunkGenerator.h>
#include <Processors/Sources/NullSource.h>
#include <Processors/QueryPlan/QueryPlan.h>
#include <Processors/QueryPlan/SourceStepWithFilter.h>
#include <IO/WriteHelpers.h>
#include <IO/CompressionMethod.h>
@ -408,22 +411,22 @@ ColumnsDescription StorageHDFS::getTableStructureFromData(
class HDFSSource::DisclosedGlobIterator::Impl
{
public:
Impl(const String & uri, const ASTPtr & query, const NamesAndTypesList & virtual_columns, const ContextPtr & context)
Impl(const String & uri, const ActionsDAG::Node * predicate, const NamesAndTypesList & virtual_columns, const ContextPtr & context)
{
const auto [path_from_uri, uri_without_path] = getPathFromUriAndUriWithoutPath(uri);
uris = getPathsList(path_from_uri, uri_without_path, context);
ASTPtr filter_ast;
ActionsDAGPtr filter_dag;
if (!uris.empty())
filter_ast = VirtualColumnUtils::createPathAndFileFilterAst(query, virtual_columns, uris[0].path, context);
filter_dag = VirtualColumnUtils::createPathAndFileFilterDAG(predicate, virtual_columns);
if (filter_ast)
if (filter_dag)
{
std::vector<String> paths;
paths.reserve(uris.size());
for (const auto & path_with_info : uris)
paths.push_back(path_with_info.path);
VirtualColumnUtils::filterByPathOrFile(uris, paths, query, virtual_columns, context, filter_ast);
VirtualColumnUtils::filterByPathOrFile(uris, paths, filter_dag, virtual_columns, context);
}
auto file_progress_callback = context->getFileProgressCallback();
@ -456,21 +459,21 @@ private:
class HDFSSource::URISIterator::Impl : WithContext
{
public:
explicit Impl(const std::vector<String> & uris_, const ASTPtr & query, const NamesAndTypesList & virtual_columns, const ContextPtr & context_)
explicit Impl(const std::vector<String> & uris_, const ActionsDAG::Node * predicate, const NamesAndTypesList & virtual_columns, const ContextPtr & context_)
: WithContext(context_), uris(uris_), file_progress_callback(context_->getFileProgressCallback())
{
ASTPtr filter_ast;
ActionsDAGPtr filter_dag;
if (!uris.empty())
filter_ast = VirtualColumnUtils::createPathAndFileFilterAst(query, virtual_columns, getPathFromUriAndUriWithoutPath(uris[0]).first, getContext());
filter_dag = VirtualColumnUtils::createPathAndFileFilterDAG(predicate, virtual_columns);
if (filter_ast)
if (filter_dag)
{
std::vector<String> paths;
paths.reserve(uris.size());
for (const auto & uri : uris)
paths.push_back(getPathFromUriAndUriWithoutPath(uri).first);
VirtualColumnUtils::filterByPathOrFile(uris, paths, query, virtual_columns, getContext(), filter_ast);
VirtualColumnUtils::filterByPathOrFile(uris, paths, filter_dag, virtual_columns, getContext());
}
if (!uris.empty())
@ -517,16 +520,16 @@ private:
std::function<void(FileProgress)> file_progress_callback;
};
HDFSSource::DisclosedGlobIterator::DisclosedGlobIterator(const String & uri, const ASTPtr & query, const NamesAndTypesList & virtual_columns, const ContextPtr & context)
: pimpl(std::make_shared<HDFSSource::DisclosedGlobIterator::Impl>(uri, query, virtual_columns, context)) {}
HDFSSource::DisclosedGlobIterator::DisclosedGlobIterator(const String & uri, const ActionsDAG::Node * predicate, const NamesAndTypesList & virtual_columns, const ContextPtr & context)
: pimpl(std::make_shared<HDFSSource::DisclosedGlobIterator::Impl>(uri, predicate, virtual_columns, context)) {}
StorageHDFS::PathWithInfo HDFSSource::DisclosedGlobIterator::next()
{
return pimpl->next();
}
HDFSSource::URISIterator::URISIterator(const std::vector<String> & uris_, const ASTPtr & query, const NamesAndTypesList & virtual_columns, const ContextPtr & context)
: pimpl(std::make_shared<HDFSSource::URISIterator::Impl>(uris_, query, virtual_columns, context))
HDFSSource::URISIterator::URISIterator(const std::vector<String> & uris_, const ActionsDAG::Node * predicate, const NamesAndTypesList & virtual_columns, const ContextPtr & context)
: pimpl(std::make_shared<HDFSSource::URISIterator::Impl>(uris_, predicate, virtual_columns, context))
{
}
@ -541,8 +544,7 @@ HDFSSource::HDFSSource(
ContextPtr context_,
UInt64 max_block_size_,
std::shared_ptr<IteratorWrapper> file_iterator_,
bool need_only_count_,
const SelectQueryInfo & query_info_)
bool need_only_count_)
: ISource(info.source_header, false)
, WithContext(context_)
, storage(std::move(storage_))
@ -553,7 +555,6 @@ HDFSSource::HDFSSource(
, file_iterator(file_iterator_)
, columns_description(info.columns_description)
, need_only_count(need_only_count_)
, query_info(query_info_)
{
initialize();
}
@ -843,7 +844,57 @@ bool StorageHDFS::supportsSubsetOfColumns(const ContextPtr & context_) const
return FormatFactory::instance().checkIfFormatSupportsSubsetOfColumns(format_name, context_);
}
Pipe StorageHDFS::read(
class ReadFromHDFS : public SourceStepWithFilter
{
public:
std::string getName() const override { return "ReadFromHDFS"; }
void initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) override;
void applyFilters() override;
ReadFromHDFS(
Block sample_block,
ReadFromFormatInfo info_,
bool need_only_count_,
std::shared_ptr<StorageHDFS> storage_,
ContextPtr context_,
size_t max_block_size_,
size_t num_streams_)
: SourceStepWithFilter(DataStream{.header = std::move(sample_block)})
, info(std::move(info_))
, need_only_count(need_only_count_)
, storage(std::move(storage_))
, context(std::move(context_))
, max_block_size(max_block_size_)
, num_streams(num_streams_)
{
}
private:
ReadFromFormatInfo info;
const bool need_only_count;
std::shared_ptr<StorageHDFS> storage;
ContextPtr context;
size_t max_block_size;
size_t num_streams;
std::shared_ptr<HDFSSource::IteratorWrapper> iterator_wrapper;
void createIterator(const ActionsDAG::Node * predicate);
};
void ReadFromHDFS::applyFilters()
{
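/// Combine the filter nodes pushed down to this step into a single ActionsDAG;
/// its first output (effectively their conjunction) serves as the pruning predicate below.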
auto filter_actions_dag = ActionsDAG::buildFilterActionsDAG(filter_nodes.nodes, {}, context);
const ActionsDAG::Node * predicate = nullptr;
if (filter_actions_dag)
predicate = filter_actions_dag->getOutputs().at(0);
createIterator(predicate);
}
void StorageHDFS::read(
QueryPlan & query_plan,
const Names & column_names,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo & query_info,
@ -852,18 +903,40 @@ Pipe StorageHDFS::read(
size_t max_block_size,
size_t num_streams)
{
std::shared_ptr<HDFSSource::IteratorWrapper> iterator_wrapper{nullptr};
if (distributed_processing)
auto read_from_format_info = prepareReadingFromFormat(column_names, storage_snapshot, supportsSubsetOfColumns(context_), virtual_columns);
bool need_only_count = (query_info.optimize_trivial_count || read_from_format_info.requested_columns.empty())
&& context_->getSettingsRef().optimize_count_from_files;
auto this_ptr = std::static_pointer_cast<StorageHDFS>(shared_from_this());
auto reading = std::make_unique<ReadFromHDFS>(
read_from_format_info.source_header,
std::move(read_from_format_info),
need_only_count,
std::move(this_ptr),
context_,
max_block_size,
num_streams);
query_plan.addStep(std::move(reading));
}
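/// createIterator is invoked from applyFilters() with the pushed-down predicate and again
/// from initializePipeline(nullptr) as a fallback; the early return makes whichever call
/// comes second a no-op, so the iterator is built exactly once.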
void ReadFromHDFS::createIterator(const ActionsDAG::Node * predicate)
{
if (iterator_wrapper)
return;
if (storage->distributed_processing)
{
iterator_wrapper = std::make_shared<HDFSSource::IteratorWrapper>(
[callback = context_->getReadTaskCallback()]() -> StorageHDFS::PathWithInfo {
[callback = context->getReadTaskCallback()]() -> StorageHDFS::PathWithInfo {
return StorageHDFS::PathWithInfo{callback(), std::nullopt};
});
}
else if (is_path_with_globs)
else if (storage->is_path_with_globs)
{
/// Iterate through disclosed globs and make a source for each file
auto glob_iterator = std::make_shared<HDFSSource::DisclosedGlobIterator>(uris[0], query_info.query, virtual_columns, context_);
auto glob_iterator = std::make_shared<HDFSSource::DisclosedGlobIterator>(storage->uris[0], predicate, storage->virtual_columns, context);
iterator_wrapper = std::make_shared<HDFSSource::IteratorWrapper>([glob_iterator]()
{
return glob_iterator->next();
@ -871,31 +944,38 @@ Pipe StorageHDFS::read(
}
else
{
auto uris_iterator = std::make_shared<HDFSSource::URISIterator>(uris, query_info.query, virtual_columns, context_);
auto uris_iterator = std::make_shared<HDFSSource::URISIterator>(storage->uris, predicate, storage->virtual_columns, context);
iterator_wrapper = std::make_shared<HDFSSource::IteratorWrapper>([uris_iterator]()
{
return uris_iterator->next();
});
}
}
auto read_from_format_info = prepareReadingFromFormat(column_names, storage_snapshot, supportsSubsetOfColumns(context_), getVirtuals());
bool need_only_count = (query_info.optimize_trivial_count || read_from_format_info.requested_columns.empty())
&& context_->getSettingsRef().optimize_count_from_files;
void ReadFromHDFS::initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &)
{
createIterator(nullptr);
Pipes pipes;
auto this_ptr = std::static_pointer_cast<StorageHDFS>(shared_from_this());
for (size_t i = 0; i < num_streams; ++i)
{
pipes.emplace_back(std::make_shared<HDFSSource>(
read_from_format_info,
this_ptr,
context_,
info,
storage,
context,
max_block_size,
iterator_wrapper,
need_only_count,
query_info));
need_only_count));
}
return Pipe::unitePipes(std::move(pipes));
auto pipe = Pipe::unitePipes(std::move(pipes));
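/// A pipeline cannot be initialized from an empty pipe, so substitute a NullSource that
/// produces no rows but carries the expected header.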
if (pipe.empty())
pipe = Pipe(std::make_shared<NullSource>(info.source_header));
for (const auto & processor : pipe.getProcessors())
processors.emplace_back(processor);
pipeline.init(std::move(pipe));
}
SinkToStoragePtr StorageHDFS::write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context_, bool /*async_insert*/)

View File

@ -51,7 +51,8 @@ public:
String getName() const override { return "HDFS"; }
Pipe read(
void read(
QueryPlan & query_plan,
const Names & column_names,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo & query_info,
@ -93,6 +94,7 @@ public:
protected:
friend class HDFSSource;
friend class ReadFromHDFS;
private:
std::vector<String> uris;
@ -114,7 +116,7 @@ public:
class DisclosedGlobIterator
{
public:
DisclosedGlobIterator(const String & uri_, const ASTPtr & query, const NamesAndTypesList & virtual_columns, const ContextPtr & context);
DisclosedGlobIterator(const String & uri_, const ActionsDAG::Node * predicate, const NamesAndTypesList & virtual_columns, const ContextPtr & context);
StorageHDFS::PathWithInfo next();
private:
class Impl;
@ -125,7 +127,7 @@ public:
class URISIterator
{
public:
URISIterator(const std::vector<String> & uris_, const ASTPtr & query, const NamesAndTypesList & virtual_columns, const ContextPtr & context);
URISIterator(const std::vector<String> & uris_, const ActionsDAG::Node * predicate, const NamesAndTypesList & virtual_columns, const ContextPtr & context);
StorageHDFS::PathWithInfo next();
private:
class Impl;
@ -142,8 +144,7 @@ public:
ContextPtr context_,
UInt64 max_block_size_,
std::shared_ptr<IteratorWrapper> file_iterator_,
bool need_only_count_,
const SelectQueryInfo & query_info_);
bool need_only_count_);
String getName() const override;
@ -162,7 +163,6 @@ private:
ColumnsDescription columns_description;
bool need_only_count;
size_t total_rows_in_file = 0;
SelectQueryInfo query_info;
std::unique_ptr<ReadBuffer> read_buf;
std::shared_ptr<IInputFormat> input_format;

View File

@ -79,9 +79,9 @@ void StorageHDFSCluster::addColumnsStructureToQuery(ASTPtr & query, const String
}
RemoteQueryExecutor::Extension StorageHDFSCluster::getTaskIteratorExtension(ASTPtr query, const ContextPtr & context) const
RemoteQueryExecutor::Extension StorageHDFSCluster::getTaskIteratorExtension(const ActionsDAG::Node * predicate, const ContextPtr & context) const
{
auto iterator = std::make_shared<HDFSSource::DisclosedGlobIterator>(uri, query, virtual_columns, context);
auto iterator = std::make_shared<HDFSSource::DisclosedGlobIterator>(uri, predicate, virtual_columns, context);
auto callback = std::make_shared<std::function<String()>>([iter = std::move(iterator)]() mutable -> String { return iter->next().path; });
return RemoteQueryExecutor::Extension{.task_iterator = std::move(callback)};
}

View File

@ -35,7 +35,7 @@ public:
NamesAndTypesList getVirtuals() const override;
RemoteQueryExecutor::Extension getTaskIteratorExtension(ASTPtr query, const ContextPtr & context) const override;
RemoteQueryExecutor::Extension getTaskIteratorExtension(const ActionsDAG::Node * predicate, const ContextPtr & context) const override;
bool supportsSubcolumns() const override { return true; }

View File

@ -29,10 +29,14 @@
#include <Parsers/ASTLiteral.h>
#include <QueryPipeline/Pipe.h>
#include <QueryPipeline/QueryPipeline.h>
#include <QueryPipeline/QueryPipelineBuilder.h>
#include <Processors/ISource.h>
#include <Processors/Formats/IInputFormat.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
#include <Processors/Transforms/AddingDefaultsTransform.h>
#include <Processors/QueryPlan/QueryPlan.h>
#include <Processors/QueryPlan/SourceStepWithFilter.h>
#include <Processors/Sources/NullSource.h>
#include <Storages/AlterCommands.h>
#include <Storages/HDFS/ReadBufferFromHDFS.h>
#include <Storages/HDFS/AsynchronousReadBufferFromHDFS.h>
@ -123,7 +127,6 @@ public:
String compression_method_,
Block sample_block_,
ContextPtr context_,
const SelectQueryInfo & query_info_,
UInt64 max_block_size_,
const StorageHive & storage_,
const Names & text_input_field_names_ = {})
@ -140,7 +143,6 @@ public:
, text_input_field_names(text_input_field_names_)
, format_settings(getFormatSettings(getContext()))
, read_settings(getContext()->getReadSettings())
, query_info(query_info_)
{
to_read_block = sample_block;
@ -395,7 +397,6 @@ private:
const Names & text_input_field_names;
FormatSettings format_settings;
ReadSettings read_settings;
SelectQueryInfo query_info;
HiveFilePtr current_file;
String current_path;
@ -574,7 +575,7 @@ static HiveFilePtr createHiveFile(
HiveFiles StorageHive::collectHiveFilesFromPartition(
const Apache::Hadoop::Hive::Partition & partition,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
const HiveTableMetadataPtr & hive_table_metadata,
const HDFSFSPtr & fs,
const ContextPtr & context_,
@ -638,7 +639,7 @@ HiveFiles StorageHive::collectHiveFilesFromPartition(
for (size_t i = 0; i < partition_names.size(); ++i)
ranges.emplace_back(fields[i]);
const KeyCondition partition_key_condition(query_info, getContext(), partition_names, partition_minmax_idx_expr);
const KeyCondition partition_key_condition(filter_actions_dag, getContext(), partition_names, partition_minmax_idx_expr);
if (!partition_key_condition.checkInHyperrectangle(ranges, partition_types).can_be_true)
return {};
}
@ -648,7 +649,7 @@ HiveFiles StorageHive::collectHiveFilesFromPartition(
hive_files.reserve(file_infos.size());
for (const auto & file_info : file_infos)
{
auto hive_file = getHiveFileIfNeeded(file_info, fields, query_info, hive_table_metadata, context_, prune_level);
auto hive_file = getHiveFileIfNeeded(file_info, fields, filter_actions_dag, hive_table_metadata, context_, prune_level);
if (hive_file)
{
LOG_TRACE(
@ -672,7 +673,7 @@ StorageHive::listDirectory(const String & path, const HiveTableMetadataPtr & hiv
HiveFilePtr StorageHive::getHiveFileIfNeeded(
const FileInfo & file_info,
const FieldVector & fields,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
const HiveTableMetadataPtr & hive_table_metadata,
const ContextPtr & context_,
PruneLevel prune_level) const
@ -706,7 +707,7 @@ HiveFilePtr StorageHive::getHiveFileIfNeeded(
if (prune_level >= PruneLevel::File)
{
const KeyCondition hivefile_key_condition(query_info, getContext(), hivefile_name_types.getNames(), hivefile_minmax_idx_expr);
const KeyCondition hivefile_key_condition(filter_actions_dag, getContext(), hivefile_name_types.getNames(), hivefile_minmax_idx_expr);
if (hive_file->useFileMinMaxIndex())
{
/// Load file level minmax index and apply
@ -758,10 +759,77 @@ bool StorageHive::supportsSubsetOfColumns() const
return format_name == "Parquet" || format_name == "ORC";
}
Pipe StorageHive::read(
class ReadFromHive : public SourceStepWithFilter
{
public:
std::string getName() const override { return "ReadFromHive"; }
void initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) override;
void applyFilters() override;
ReadFromHive(
Block header,
std::shared_ptr<StorageHive> storage_,
std::shared_ptr<StorageHiveSource::SourcesInfo> sources_info_,
HDFSBuilderWrapper builder_,
HDFSFSPtr fs_,
HiveMetastoreClient::HiveTableMetadataPtr hive_table_metadata_,
Block sample_block_,
Poco::Logger * log_,
ContextPtr context_,
size_t max_block_size_,
size_t num_streams_)
: SourceStepWithFilter(DataStream{.header = std::move(header)})
, storage(std::move(storage_))
, sources_info(std::move(sources_info_))
, builder(std::move(builder_))
, fs(std::move(fs_))
, hive_table_metadata(std::move(hive_table_metadata_))
, sample_block(std::move(sample_block_))
, log(log_)
, context(std::move(context_))
, max_block_size(max_block_size_)
, num_streams(num_streams_)
{
}
private:
std::shared_ptr<StorageHive> storage;
std::shared_ptr<StorageHiveSource::SourcesInfo> sources_info;
HDFSBuilderWrapper builder;
HDFSFSPtr fs;
HiveMetastoreClient::HiveTableMetadataPtr hive_table_metadata;
Block sample_block;
Poco::Logger * log;
ContextPtr context;
size_t max_block_size;
size_t num_streams;
std::optional<HiveFiles> hive_files;
void createFiles(const ActionsDAGPtr & filter_actions_dag);
};
void ReadFromHive::applyFilters()
{
auto filter_actions_dag = ActionsDAG::buildFilterActionsDAG(filter_nodes.nodes, {}, context);
createFiles(filter_actions_dag);
}
void ReadFromHive::createFiles(const ActionsDAGPtr & filter_actions_dag)
{
if (hive_files)
return;
hive_files = storage->collectHiveFiles(num_streams, filter_actions_dag, hive_table_metadata, fs, context);
LOG_INFO(log, "Collect {} hive files to read", hive_files->size());
}
void StorageHive::read(
QueryPlan & query_plan,
const Names & column_names,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo & query_info,
SelectQueryInfo &,
ContextPtr context_,
QueryProcessingStage::Enum /* processed_stage */,
size_t max_block_size,
@ -774,15 +842,7 @@ Pipe StorageHive::read(
auto hive_metastore_client = HiveMetastoreClientFactory::instance().getOrCreate(hive_metastore_url);
auto hive_table_metadata = hive_metastore_client->getTableMetadata(hive_database, hive_table);
/// Collect Hive files to read
HiveFiles hive_files = collectHiveFiles(num_streams, query_info, hive_table_metadata, fs, context_);
LOG_INFO(log, "Collect {} hive files to read", hive_files.size());
if (hive_files.empty())
return {};
auto sources_info = std::make_shared<StorageHiveSource::SourcesInfo>();
sources_info->hive_files = std::move(hive_files);
sources_info->database_name = hive_database;
sources_info->table_name = hive_table;
sources_info->hive_metastore_client = hive_metastore_client;
@ -822,6 +882,36 @@ Pipe StorageHive::read(
sources_info->need_file_column = true;
}
auto this_ptr = std::static_pointer_cast<StorageHive>(shared_from_this());
auto reading = std::make_unique<ReadFromHive>(
StorageHiveSource::getHeader(sample_block, sources_info),
std::move(this_ptr),
std::move(sources_info),
std::move(builder),
std::move(fs),
std::move(hive_table_metadata),
std::move(sample_block),
log,
context_,
max_block_size,
num_streams);
query_plan.addStep(std::move(reading));
}
void ReadFromHive::initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &)
{
createFiles(nullptr);
if (hive_files->empty())
{
pipeline.init(Pipe(std::make_shared<NullSource>(getOutputStream().header)));
return;
}
sources_info->hive_files = std::move(*hive_files);
if (num_streams > sources_info->hive_files.size())
num_streams = sources_info->hive_files.size();
@ -830,22 +920,29 @@ Pipe StorageHive::read(
{
pipes.emplace_back(std::make_shared<StorageHiveSource>(
sources_info,
hdfs_namenode_url,
format_name,
compression_method,
storage->hdfs_namenode_url,
storage->format_name,
storage->compression_method,
sample_block,
context_,
query_info,
context,
max_block_size,
*this,
text_input_field_names));
*storage,
storage->text_input_field_names));
}
return Pipe::unitePipes(std::move(pipes));
auto pipe = Pipe::unitePipes(std::move(pipes));
if (pipe.empty())
pipe = Pipe(std::make_shared<NullSource>(getOutputStream().header));
for (const auto & processor : pipe.getProcessors())
processors.emplace_back(processor);
pipeline.init(std::move(pipe));
}
HiveFiles StorageHive::collectHiveFiles(
size_t max_threads,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
const HiveTableMetadataPtr & hive_table_metadata,
const HDFSFSPtr & fs,
const ContextPtr & context_,
@ -871,7 +968,7 @@ HiveFiles StorageHive::collectHiveFiles(
[&]()
{
auto hive_files_in_partition
= collectHiveFilesFromPartition(partition, query_info, hive_table_metadata, fs, context_, prune_level);
= collectHiveFilesFromPartition(partition, filter_actions_dag, hive_table_metadata, fs, context_, prune_level);
if (!hive_files_in_partition.empty())
{
std::lock_guard lock(hive_files_mutex);
@ -897,7 +994,7 @@ HiveFiles StorageHive::collectHiveFiles(
pool.scheduleOrThrowOnError(
[&]()
{
auto hive_file = getHiveFileIfNeeded(file_info, {}, query_info, hive_table_metadata, context_, prune_level);
auto hive_file = getHiveFileIfNeeded(file_info, {}, filter_actions_dag, hive_table_metadata, context_, prune_level);
if (hive_file)
{
std::lock_guard lock(hive_files_mutex);
@ -925,13 +1022,12 @@ NamesAndTypesList StorageHive::getVirtuals() const
std::optional<UInt64> StorageHive::totalRows(const Settings & settings) const
{
/// The filter DAG is not used when prune_level == PruneLevel::None
SelectQueryInfo query_info;
return totalRowsImpl(settings, query_info, getContext(), PruneLevel::None);
return totalRowsImpl(settings, nullptr, getContext(), PruneLevel::None);
}
std::optional<UInt64> StorageHive::totalRowsByPartitionPredicate(const SelectQueryInfo & query_info, ContextPtr context_) const
std::optional<UInt64> StorageHive::totalRowsByPartitionPredicate(const ActionsDAGPtr & filter_actions_dag, ContextPtr context_) const
{
return totalRowsImpl(context_->getSettingsRef(), query_info, context_, PruneLevel::Partition);
return totalRowsImpl(context_->getSettingsRef(), filter_actions_dag, context_, PruneLevel::Partition);
}
void StorageHive::checkAlterIsPossible(const AlterCommands & commands, ContextPtr /*local_context*/) const
@ -946,7 +1042,7 @@ void StorageHive::checkAlterIsPossible(const AlterCommands & commands, ContextPt
}
std::optional<UInt64>
StorageHive::totalRowsImpl(const Settings & settings, const SelectQueryInfo & query_info, ContextPtr context_, PruneLevel prune_level) const
StorageHive::totalRowsImpl(const Settings & settings, const ActionsDAGPtr & filter_actions_dag, ContextPtr context_, PruneLevel prune_level) const
{
/// Row-based format like Text doesn't support totalRowsByPartitionPredicate
if (!supportsSubsetOfColumns())
@ -958,7 +1054,7 @@ StorageHive::totalRowsImpl(const Settings & settings, const SelectQueryInfo & qu
HDFSFSPtr fs = createHDFSFS(builder.get());
HiveFiles hive_files = collectHiveFiles(
settings.max_threads,
query_info,
filter_actions_dag,
hive_table_metadata,
fs,
context_,

View File

@ -42,10 +42,11 @@ public:
bool supportsSubcolumns() const override { return true; }
Pipe read(
void read(
QueryPlan & query_plan,
const Names & column_names,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo & query_info,
SelectQueryInfo &,
ContextPtr context,
QueryProcessingStage::Enum processed_stage,
size_t max_block_size,
@ -58,9 +59,12 @@ public:
bool supportsSubsetOfColumns() const;
std::optional<UInt64> totalRows(const Settings & settings) const override;
std::optional<UInt64> totalRowsByPartitionPredicate(const SelectQueryInfo & query_info, ContextPtr context_) const override;
std::optional<UInt64> totalRowsByPartitionPredicate(const ActionsDAGPtr & filter_actions_dag, ContextPtr context_) const override;
void checkAlterIsPossible(const AlterCommands & commands, ContextPtr local_context) const override;
protected:
friend class ReadFromHive;
private:
using FileFormat = IHiveFile::FileFormat;
using FileInfo = HiveMetastoreClient::FileInfo;
@ -88,7 +92,7 @@ private:
HiveFiles collectHiveFiles(
size_t max_threads,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
const HiveTableMetadataPtr & hive_table_metadata,
const HDFSFSPtr & fs,
const ContextPtr & context_,
@ -96,7 +100,7 @@ private:
HiveFiles collectHiveFilesFromPartition(
const Apache::Hadoop::Hive::Partition & partition,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
const HiveTableMetadataPtr & hive_table_metadata,
const HDFSFSPtr & fs,
const ContextPtr & context_,
@ -105,7 +109,7 @@ private:
HiveFilePtr getHiveFileIfNeeded(
const FileInfo & file_info,
const FieldVector & fields,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
const HiveTableMetadataPtr & hive_table_metadata,
const ContextPtr & context_,
PruneLevel prune_level = PruneLevel::Max) const;
@ -113,7 +117,7 @@ private:
void lazyInitialize();
std::optional<UInt64>
totalRowsImpl(const Settings & settings, const SelectQueryInfo & query_info, ContextPtr context_, PruneLevel prune_level) const;
totalRowsImpl(const Settings & settings, const ActionsDAGPtr & filter_actions_dag, ContextPtr context_, PruneLevel prune_level) const;
String hive_metastore_url;

View File

@ -669,7 +669,7 @@ public:
virtual std::optional<UInt64> totalRows(const Settings &) const { return {}; }
/// Same as above but also take partition predicate into account.
virtual std::optional<UInt64> totalRowsByPartitionPredicate(const SelectQueryInfo &, ContextPtr) const { return {}; }
virtual std::optional<UInt64> totalRowsByPartitionPredicate(const ActionsDAGPtr &, ContextPtr) const { return {}; }
/// If it is possible to quickly determine exact number of bytes for the table on storage:
/// - memory (approximated, resident)

View File

@ -1,7 +1,7 @@
#include "Storages/IStorageCluster.h"
#include <Storages/IStorageCluster.h>
#include "Common/Exception.h"
#include "Core/QueryProcessingStage.h"
#include <Common/Exception.h>
#include <Core/QueryProcessingStage.h>
#include <DataTypes/DataTypeString.h>
#include <IO/ConnectionTimeouts.h>
#include <Interpreters/Context.h>
@ -11,11 +11,14 @@
#include <Interpreters/AddDefaultDatabaseVisitor.h>
#include <Interpreters/TranslateQualifiedNamesVisitor.h>
#include <Interpreters/InterpreterSelectQueryAnalyzer.h>
#include <Parsers/queryToString.h>
#include <Processors/Sources/NullSource.h>
#include <Processors/Sources/RemoteSource.h>
#include <Processors/QueryPlan/SourceStepWithFilter.h>
#include <QueryPipeline/narrowPipe.h>
#include <QueryPipeline/Pipe.h>
#include <Processors/Sources/RemoteSource.h>
#include <QueryPipeline/RemoteQueryExecutor.h>
#include <Parsers/queryToString.h>
#include <QueryPipeline/QueryPipelineBuilder.h>
#include <Storages/IStorage.h>
#include <Storages/SelectQueryInfo.h>
#include <Storages/StorageDictionary.h>
@ -38,9 +41,66 @@ IStorageCluster::IStorageCluster(
{
}
class ReadFromCluster : public SourceStepWithFilter
{
public:
std::string getName() const override { return "ReadFromCluster"; }
void initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) override;
void applyFilters() override;
ReadFromCluster(
Block sample_block,
std::shared_ptr<IStorageCluster> storage_,
ASTPtr query_to_send_,
QueryProcessingStage::Enum processed_stage_,
ClusterPtr cluster_,
Poco::Logger * log_,
ContextPtr context_)
: SourceStepWithFilter(DataStream{.header = std::move(sample_block)})
, storage(std::move(storage_))
, query_to_send(std::move(query_to_send_))
, processed_stage(processed_stage_)
, cluster(std::move(cluster_))
, log(log_)
, context(std::move(context_))
{
}
private:
std::shared_ptr<IStorageCluster> storage;
ASTPtr query_to_send;
QueryProcessingStage::Enum processed_stage;
ClusterPtr cluster;
Poco::Logger * log;
ContextPtr context;
std::optional<RemoteQueryExecutor::Extension> extension;
void createExtension(const ActionsDAG::Node * predicate);
ContextPtr updateSettings(const Settings & settings);
};
void ReadFromCluster::applyFilters()
{
auto filter_actions_dag = ActionsDAG::buildFilterActionsDAG(filter_nodes.nodes, {}, context);
const ActionsDAG::Node * predicate = nullptr;
if (filter_actions_dag)
predicate = filter_actions_dag->getOutputs().at(0);
createExtension(predicate);
}
void ReadFromCluster::createExtension(const ActionsDAG::Node * predicate)
{
if (extension)
return;
extension = storage->getTaskIteratorExtension(predicate, context);
}
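/// The extension carries the task iterator consumed by remote shards: globs are expanded
/// on the initiator, paths are pruned with the pushed-down predicate, and each callback
/// invocation hands out the next path as a task.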
/// The code executes on the initiator
Pipe IStorageCluster::read(
void IStorageCluster::read(
QueryPlan & query_plan,
const Names & column_names,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo & query_info,
@ -49,10 +109,10 @@ Pipe IStorageCluster::read(
size_t /*max_block_size*/,
size_t /*num_streams*/)
{
updateBeforeRead(context);
storage_snapshot->check(column_names);
updateBeforeRead(context);
auto cluster = getCluster(context);
auto extension = getTaskIteratorExtension(query_info.query, context);
/// Calculate the header. This is significant because some columns could be thrown away in some cases, e.g. a query with count(*)
@ -70,12 +130,6 @@ Pipe IStorageCluster::read(
query_to_send = interpreter.getQueryInfo().query->clone();
}
const Scalars & scalars = context->hasQueryContext() ? context->getQueryContext()->getScalars() : Scalars{};
Pipes pipes;
const bool add_agg_info = processed_stage == QueryProcessingStage::WithMergeableState;
if (!structure_argument_was_provided)
addColumnsStructureToQuery(query_to_send, storage_snapshot->metadata->getColumns().getAll().toNamesAndTypesDescription(), context);
@ -89,7 +143,29 @@ Pipe IStorageCluster::read(
/* only_replace_in_join_= */true);
visitor.visit(query_to_send);
auto new_context = updateSettings(context, context->getSettingsRef());
auto this_ptr = std::static_pointer_cast<IStorageCluster>(shared_from_this());
auto reading = std::make_unique<ReadFromCluster>(
sample_block,
std::move(this_ptr),
std::move(query_to_send),
processed_stage,
cluster,
log,
context);
query_plan.addStep(std::move(reading));
}
void ReadFromCluster::initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &)
{
createExtension(nullptr);
const Scalars & scalars = context->hasQueryContext() ? context->getQueryContext()->getScalars() : Scalars{};
const bool add_agg_info = processed_stage == QueryProcessingStage::WithMergeableState;
Pipes pipes;
auto new_context = updateSettings(context->getSettingsRef());
const auto & current_settings = new_context->getSettingsRef();
auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(current_settings);
for (const auto & shard_info : cluster->getShardsInfo())
@ -100,7 +176,7 @@ Pipe IStorageCluster::read(
auto remote_query_executor = std::make_shared<RemoteQueryExecutor>(
std::vector<IConnectionPool::Entry>{try_result},
queryToString(query_to_send),
sample_block,
getOutputStream().header,
new_context,
/*throttler=*/nullptr,
scalars,
@ -113,8 +189,14 @@ Pipe IStorageCluster::read(
}
}
storage_snapshot->check(column_names);
return Pipe::unitePipes(std::move(pipes));
auto pipe = Pipe::unitePipes(std::move(pipes));
if (pipe.empty())
pipe = Pipe(std::make_shared<NullSource>(getOutputStream().header));
for (const auto & processor : pipe.getProcessors())
processors.emplace_back(processor);
pipeline.init(std::move(pipe));
}
QueryProcessingStage::Enum IStorageCluster::getQueryProcessingStage(
@ -129,7 +211,7 @@ QueryProcessingStage::Enum IStorageCluster::getQueryProcessingStage(
return QueryProcessingStage::Enum::FetchColumns;
}
ContextPtr IStorageCluster::updateSettings(ContextPtr context, const Settings & settings)
ContextPtr ReadFromCluster::updateSettings(const Settings & settings)
{
Settings new_settings = settings;

View File

@ -22,7 +22,8 @@ public:
Poco::Logger * log_,
bool structure_argument_was_provided_);
Pipe read(
void read(
QueryPlan & query_plan,
const Names & column_names,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo & query_info,
@ -33,7 +34,7 @@ public:
ClusterPtr getCluster(ContextPtr context) const;
/// The predicate is needed for pruning by virtual columns (_file, _path)
virtual RemoteQueryExecutor::Extension getTaskIteratorExtension(ASTPtr query, const ContextPtr & context) const = 0;
virtual RemoteQueryExecutor::Extension getTaskIteratorExtension(const ActionsDAG::Node * predicate, const ContextPtr & context) const = 0;
QueryProcessingStage::Enum getQueryProcessingStage(ContextPtr, QueryProcessingStage::Enum, const StorageSnapshotPtr &, SelectQueryInfo &) const override;
@ -45,8 +46,6 @@ protected:
virtual void addColumnsStructureToQuery(ASTPtr & query, const String & structure, const ContextPtr & context) = 0;
private:
ContextPtr updateSettings(ContextPtr context, const Settings & settings);
Poco::Logger * log;
String cluster_name;
bool structure_argument_was_provided;

View File

@ -762,92 +762,6 @@ void KeyCondition::getAllSpaceFillingCurves()
}
}
KeyCondition::KeyCondition(
const ASTPtr & query,
const ASTs & additional_filter_asts,
Block block_with_constants,
PreparedSetsPtr prepared_sets,
ContextPtr context,
const Names & key_column_names,
const ExpressionActionsPtr & key_expr_,
NameSet array_joined_column_names_,
bool single_point_,
bool strict_)
: key_expr(key_expr_)
, key_subexpr_names(getAllSubexpressionNames(*key_expr))
, array_joined_column_names(std::move(array_joined_column_names_))
, single_point(single_point_)
, strict(strict_)
{
size_t key_index = 0;
for (const auto & name : key_column_names)
{
if (!key_columns.contains(name))
{
key_columns[name] = key_columns.size();
key_indices.push_back(key_index);
}
++key_index;
}
if (context->getSettingsRef().analyze_index_with_space_filling_curves)
getAllSpaceFillingCurves();
ASTPtr filter_node;
if (query)
filter_node = buildFilterNode(query, additional_filter_asts);
if (!filter_node)
{
has_filter = false;
rpn.emplace_back(RPNElement::FUNCTION_UNKNOWN);
return;
}
has_filter = true;
/** When non-strictly monotonic functions are employed in functional index (e.g. ORDER BY toStartOfHour(dateTime)),
* the use of NOT operator in predicate will result in the indexing algorithm leave out some data.
* This is caused by rewriting in KeyCondition::tryParseAtomFromAST of relational operators to less strict
* when parsing the AST into internal RPN representation.
* To overcome the problem, before parsing the AST we transform it to its semantically equivalent form where all NOT's
* are pushed down and applied (when possible) to leaf nodes.
*/
auto inverted_filter_node = DB::cloneASTWithInversionPushDown(filter_node);
RPNBuilder<RPNElement> builder(
inverted_filter_node,
std::move(context),
std::move(block_with_constants),
std::move(prepared_sets),
[&](const RPNBuilderTreeNode & node, RPNElement & out) { return extractAtomFromTree(node, out); });
rpn = std::move(builder).extractRPN();
findHyperrectanglesForArgumentsOfSpaceFillingCurves();
}
KeyCondition::KeyCondition(
const SelectQueryInfo & query_info,
ContextPtr context,
const Names & key_column_names,
const ExpressionActionsPtr & key_expr_,
bool single_point_,
bool strict_)
: KeyCondition(
query_info.query,
query_info.filter_asts,
KeyCondition::getBlockWithConstants(query_info.query, query_info.syntax_analyzer_result, context),
query_info.prepared_sets,
context,
key_column_names,
key_expr_,
query_info.syntax_analyzer_result ? query_info.syntax_analyzer_result->getArrayJoinSourceNameSet() : NameSet{},
single_point_,
strict_)
{
}
KeyCondition::KeyCondition(
ActionsDAGPtr filter_dag,
ContextPtr context,
@ -883,6 +797,13 @@ KeyCondition::KeyCondition(
has_filter = true;
/** When non-strictly monotonic functions are employed in a functional index (e.g. ORDER BY toStartOfHour(dateTime)),
* using the NOT operator in a predicate will make the indexing algorithm leave out some data.
* This is caused by the rewriting, in KeyCondition::tryParseAtomFromAST, of relational operators into less strict ones
* when parsing the AST into the internal RPN representation.
* To overcome the problem, before parsing the AST we transform it into a semantically equivalent form where all NOT's
* are pushed down and applied (when possible) to the leaf nodes.
*/
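/// For example, NOT (a = 1 AND b > 2) is rewritten to a != 1 OR b <= 2, leaving only
/// plain relational atoms at the leaves.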
auto inverted_dag = cloneASTWithInversionPushDown({filter_dag->getOutputs().at(0)}, context);
assert(inverted_dag->getOutputs().size() == 1);

View File

@ -39,30 +39,6 @@ struct ActionDAGNodes;
class KeyCondition
{
public:
/// Construct key condition from AST SELECT query WHERE, PREWHERE and additional filters
KeyCondition(
const ASTPtr & query,
const ASTs & additional_filter_asts,
Block block_with_constants,
PreparedSetsPtr prepared_sets_,
ContextPtr context,
const Names & key_column_names,
const ExpressionActionsPtr & key_expr,
NameSet array_joined_column_names,
bool single_point_ = false,
bool strict_ = false);
/** Construct key condition from AST SELECT query WHERE, PREWHERE and additional filters.
* Select query, additional filters, prepared sets are initialized using query info.
*/
KeyCondition(
const SelectQueryInfo & query_info,
ContextPtr context,
const Names & key_column_names,
const ExpressionActionsPtr & key_expr_,
bool single_point_ = false,
bool strict_ = false);
/// Construct key condition from ActionsDAG nodes
KeyCondition(
ActionsDAGPtr filter_dag,

View File

@ -43,6 +43,8 @@ ReplicatedMergeMutateTaskBase::PrepareResult MergeFromLogEntryTask::prepare()
LOG_TRACE(log, "Executing log entry to merge parts {} to {}",
fmt::join(entry.source_parts, ", "), entry.new_part_name);
StorageMetadataPtr metadata_snapshot = storage.getInMemoryMetadataPtr();
int32_t metadata_version = metadata_snapshot->getMetadataVersion();
const auto storage_settings_ptr = storage.getSettings();
if (storage_settings_ptr->always_fetch_merged_part)
@ -129,6 +131,18 @@ ReplicatedMergeMutateTaskBase::PrepareResult MergeFromLogEntryTask::prepare()
};
}
int32_t part_metadata_version = source_part_or_covering->getMetadataVersion();
if (part_metadata_version > metadata_version)
{
LOG_DEBUG(log, "Source part metadata version {} is newer then the table metadata version {}. ALTER_METADATA is still in progress.",
part_metadata_version, metadata_version);
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = false,
.part_log_writer = {}
};
}
parts.push_back(source_part_or_covering);
}
@ -176,8 +190,6 @@ ReplicatedMergeMutateTaskBase::PrepareResult MergeFromLogEntryTask::prepare()
/// It will live until the whole task is being destroyed
table_lock_holder = storage.lockForShare(RWLockImpl::NO_QUERY, storage_settings_ptr->lock_acquire_timeout_for_background_operations);
StorageMetadataPtr metadata_snapshot = storage.getInMemoryMetadataPtr();
auto future_merged_part = std::make_shared<FutureMergedMutatedPart>(parts, entry.new_part_format);
if (future_merged_part->name != entry.new_part_name)
{

View File

@ -570,6 +570,7 @@ void MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const
for (size_t part_num = 0; part_num < global_ctx->future_part->parts.size(); ++part_num)
{
Pipe pipe = createMergeTreeSequentialSource(
MergeTreeSequentialSourceType::Merge,
*global_ctx->data,
global_ctx->storage_snapshot,
global_ctx->future_part->parts[part_num],
@ -925,6 +926,7 @@ void MergeTask::ExecuteAndFinalizeHorizontalPart::createMergedStream()
for (const auto & part : global_ctx->future_part->parts)
{
Pipe pipe = createMergeTreeSequentialSource(
MergeTreeSequentialSourceType::Merge,
*global_ctx->data,
global_ctx->storage_snapshot,
part,

View File

@ -1075,26 +1075,30 @@ Block MergeTreeData::getBlockWithVirtualPartColumns(const MergeTreeData::DataPar
std::optional<UInt64> MergeTreeData::totalRowsByPartitionPredicateImpl(
const SelectQueryInfo & query_info, ContextPtr local_context, const DataPartsVector & parts) const
const ActionsDAGPtr & filter_actions_dag, ContextPtr local_context, const DataPartsVector & parts) const
{
if (parts.empty())
return 0u;
auto metadata_snapshot = getInMemoryMetadataPtr();
ASTPtr expression_ast;
Block virtual_columns_block = getBlockWithVirtualPartColumns(parts, true /* one_part */);
// Generate valid expressions for filtering
bool valid = VirtualColumnUtils::prepareFilterBlockWithQuery(query_info.query, local_context, virtual_columns_block, expression_ast);
auto filter_dag = VirtualColumnUtils::splitFilterDagForAllowedInputs(filter_actions_dag->getOutputs().at(0), nullptr);
PartitionPruner partition_pruner(metadata_snapshot, query_info, local_context, true /* strict */);
// Generate valid expressions for filtering
bool valid = true;
for (const auto * input : filter_dag->getInputs())
if (!virtual_columns_block.has(input->result_name))
valid = false;
PartitionPruner partition_pruner(metadata_snapshot, filter_dag, local_context, true /* strict */);
if (partition_pruner.isUseless() && !valid)
return {};
std::unordered_set<String> part_values;
if (valid && expression_ast)
if (valid)
{
virtual_columns_block = getBlockWithVirtualPartColumns(parts, false /* one_part */);
VirtualColumnUtils::filterBlockWithQuery(query_info.query, virtual_columns_block, local_context, expression_ast);
VirtualColumnUtils::filterBlockWithDAG(filter_dag, virtual_columns_block, local_context);
part_values = VirtualColumnUtils::extractSingleValueFromBlock<String>(virtual_columns_block, "_part");
if (part_values.empty())
return 0;
@ -3985,8 +3989,15 @@ MergeTreeData::PartsToRemoveFromZooKeeper MergeTreeData::removePartsInRangeFromW
/// FIXME refactor removePartsFromWorkingSet(...), do not remove parts twice
removePartsFromWorkingSet(txn, parts_to_remove, clear_without_timeout, lock);
/// We can only create a covering part for a block range that starts with 0 (otherwise we may get "intersecting parts"
/// if we remove a range from the middle when dropping a part).
/// Maybe we could do it by incrementing mutation version to get a name for the empty covering part,
/// but it's okay to simply avoid creating it for DROP PART (for a part in the middle).
/// NOTE: Block numbers in ReplicatedMergeTree start from 0. For MergeTree, is_new_syntax is always false.
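/// Illustration with hypothetical part names: dropping all_0_10_2 (a range starting at
/// block 0) can be covered by an empty part over the same range, while dropping all_5_5_0
/// in the middle would need a covering part that overlaps the live parts below block 5.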
assert(!create_empty_part || supportsReplication());
bool range_in_the_middle = drop_range.min_block;
bool is_new_syntax = format_version >= MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING;
if (create_empty_part && !parts_to_remove.empty() && is_new_syntax)
if (create_empty_part && !parts_to_remove.empty() && is_new_syntax && !range_in_the_middle)
{
/// We are going to remove a lot of parts from zookeeper just after returning from this function.
/// And we will remove parts from disk later (because some queries may use them).
@ -3995,12 +4006,9 @@ MergeTreeData::PartsToRemoveFromZooKeeper MergeTreeData::removePartsInRangeFromW
/// We don't need to commit it to zk, and don't even need to activate it.
MergeTreePartInfo empty_info = drop_range;
empty_info.level = empty_info.mutation = 0;
if (!empty_info.min_block)
empty_info.min_block = MergeTreePartInfo::MAX_BLOCK_NUMBER;
empty_info.min_block = empty_info.level = empty_info.mutation = 0;
for (const auto & part : parts_to_remove)
{
empty_info.min_block = std::min(empty_info.min_block, part->info.min_block);
empty_info.level = std::max(empty_info.level, part->info.level);
empty_info.mutation = std::max(empty_info.mutation, part->info.mutation);
}
@ -6617,8 +6625,7 @@ using PartitionIdToMaxBlock = std::unordered_map<String, Int64>;
Block MergeTreeData::getMinMaxCountProjectionBlock(
const StorageMetadataPtr & metadata_snapshot,
const Names & required_columns,
bool has_filter,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_dag,
const DataPartsVector & parts,
const PartitionIdToMaxBlock * max_block_numbers_to_read,
ContextPtr query_context) const
@ -6668,7 +6675,7 @@ Block MergeTreeData::getMinMaxCountProjectionBlock(
Block virtual_columns_block;
auto virtual_block = getSampleBlockWithVirtualColumns();
bool has_virtual_column = std::any_of(required_columns.begin(), required_columns.end(), [&](const auto & name) { return virtual_block.has(name); });
if (has_virtual_column || has_filter)
if (has_virtual_column || filter_dag)
{
virtual_columns_block = getBlockWithVirtualPartColumns(parts, false /* one_part */, true /* ignore_empty */);
if (virtual_columns_block.rows() == 0)
@ -6680,7 +6687,7 @@ Block MergeTreeData::getMinMaxCountProjectionBlock(
std::optional<PartitionPruner> partition_pruner;
std::optional<KeyCondition> minmax_idx_condition;
DataTypes minmax_columns_types;
if (has_filter)
if (filter_dag)
{
if (metadata_snapshot->hasPartitionKey())
{
@ -6689,16 +6696,15 @@ Block MergeTreeData::getMinMaxCountProjectionBlock(
minmax_columns_types = getMinMaxColumnsTypes(partition_key);
minmax_idx_condition.emplace(
query_info, query_context, minmax_columns_names,
filter_dag, query_context, minmax_columns_names,
getMinMaxExpr(partition_key, ExpressionActionsSettings::fromContext(query_context)));
partition_pruner.emplace(metadata_snapshot, query_info, query_context, false /* strict */);
partition_pruner.emplace(metadata_snapshot, filter_dag, query_context, false /* strict */);
}
const auto * predicate = filter_dag->getOutputs().at(0);
// Generate valid expressions for filtering
ASTPtr expression_ast;
VirtualColumnUtils::prepareFilterBlockWithQuery(query_info.query, query_context, virtual_columns_block, expression_ast);
if (expression_ast)
VirtualColumnUtils::filterBlockWithQuery(query_info.query, virtual_columns_block, query_context, expression_ast);
VirtualColumnUtils::filterBlockWithPredicate(predicate, virtual_columns_block, query_context);
rows = virtual_columns_block.rows();
part_name_column = virtual_columns_block.getByName("_part").column;

View File

@ -404,8 +404,7 @@ public:
Block getMinMaxCountProjectionBlock(
const StorageMetadataPtr & metadata_snapshot,
const Names & required_columns,
bool has_filter,
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_dag,
const DataPartsVector & parts,
const PartitionIdToMaxBlock * max_block_numbers_to_read,
ContextPtr query_context) const;
@ -1222,7 +1221,7 @@ protected:
boost::iterator_range<DataPartIteratorByStateAndInfo> range, const ColumnsDescription & storage_columns);
std::optional<UInt64> totalRowsByPartitionPredicateImpl(
const SelectQueryInfo & query_info, ContextPtr context, const DataPartsVector & parts) const;
const ActionsDAGPtr & filter_actions_dag, ContextPtr context, const DataPartsVector & parts) const;
static decltype(auto) getStateModifier(DataPartState state)
{

View File

@ -784,7 +784,7 @@ void MergeTreeDataSelectExecutor::buildKeyConditionFromPartOffset(
= {ColumnWithTypeAndName(part_offset_type->createColumn(), part_offset_type, "_part_offset"),
ColumnWithTypeAndName(part_type->createColumn(), part_type, "_part")};
auto dag = VirtualColumnUtils::splitFilterDagForAllowedInputs(filter_dag->getOutputs().at(0), sample);
auto dag = VirtualColumnUtils::splitFilterDagForAllowedInputs(filter_dag->getOutputs().at(0), &sample);
if (!dag)
return;
@ -810,7 +810,7 @@ std::optional<std::unordered_set<String>> MergeTreeDataSelectExecutor::filterPar
if (!filter_dag)
return {};
auto sample = data.getSampleBlockWithVirtualColumns();
auto dag = VirtualColumnUtils::splitFilterDagForAllowedInputs(filter_dag->getOutputs().at(0), sample);
auto dag = VirtualColumnUtils::splitFilterDagForAllowedInputs(filter_dag->getOutputs().at(0), &sample);
if (!dag)
return {};
@ -819,34 +819,6 @@ std::optional<std::unordered_set<String>> MergeTreeDataSelectExecutor::filterPar
return VirtualColumnUtils::extractSingleValueFromBlock<String>(virtual_columns_block, "_part");
}
std::optional<std::unordered_set<String>> MergeTreeDataSelectExecutor::filterPartsByVirtualColumns(
const MergeTreeData & data,
const MergeTreeData::DataPartsVector & parts,
const ASTPtr & query,
ContextPtr context)
{
std::unordered_set<String> part_values;
ASTPtr expression_ast;
auto virtual_columns_block = data.getBlockWithVirtualPartColumns(parts, true /* one_part */);
if (virtual_columns_block.rows() == 0)
return {};
// Generate valid expressions for filtering
VirtualColumnUtils::prepareFilterBlockWithQuery(query, context, virtual_columns_block, expression_ast);
// If there is still something left, fill the virtual block and do the filtering.
if (expression_ast)
{
virtual_columns_block = data.getBlockWithVirtualPartColumns(parts, false /* one_part */);
VirtualColumnUtils::filterBlockWithQuery(query, virtual_columns_block, context, expression_ast);
return VirtualColumnUtils::extractSingleValueFromBlock<String>(virtual_columns_block, "_part");
}
return {};
}
void MergeTreeDataSelectExecutor::filterPartsByPartition(
const std::optional<PartitionPruner> & partition_pruner,
const std::optional<KeyCondition> & minmax_idx_condition,

View File

@ -169,12 +169,6 @@ public:
/// If possible, filter using expression on virtual columns.
/// Example: SELECT count() FROM table WHERE _part = 'part_name'
/// If expression found, return a set with allowed part names (std::nullopt otherwise).
static std::optional<std::unordered_set<String>> filterPartsByVirtualColumns(
const MergeTreeData & data,
const MergeTreeData::DataPartsVector & parts,
const ASTPtr & query,
ContextPtr context);
static std::optional<std::unordered_set<String>> filterPartsByVirtualColumns(
const MergeTreeData & data,
const MergeTreeData::DataPartsVector & parts,

View File

@ -23,6 +23,7 @@ namespace ErrorCodes
extern const int INCORRECT_NUMBER_OF_COLUMNS;
extern const int INCORRECT_QUERY;
extern const int LOGICAL_ERROR;
extern const int NOT_IMPLEMENTED;
}
template <typename Distance>
@ -331,6 +332,11 @@ MergeTreeIndexConditionPtr MergeTreeIndexAnnoy::createIndexCondition(const Selec
return std::make_shared<MergeTreeIndexConditionAnnoy>(index, query, distance_function, context);
};
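/// Note: the vector-similarity index is driven by the ORDER BY ... LIMIT form of the
/// query rather than by a WHERE predicate, which is presumably why the DAG-based
/// overload below is left unimplemented.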
MergeTreeIndexConditionPtr MergeTreeIndexAnnoy::createIndexCondition(const ActionsDAGPtr &, ContextPtr) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "MergeTreeIndexAnnoy cannot be created with ActionsDAG");
}
MergeTreeIndexPtr annoyIndexCreator(const IndexDescription & index)
{
static constexpr auto DEFAULT_DISTANCE_FUNCTION = DISTANCE_FUNCTION_L2;

View File

@ -88,7 +88,7 @@ private:
};
class MergeTreeIndexAnnoy : public IMergeTreeIndex
class MergeTreeIndexAnnoy final : public IMergeTreeIndex
{
public:
@ -98,7 +98,9 @@ public:
MergeTreeIndexGranulePtr createIndexGranule() const override;
MergeTreeIndexAggregatorPtr createIndexAggregator(const MergeTreeWriterSettings & settings) const override;
MergeTreeIndexConditionPtr createIndexCondition(const SelectQueryInfo & query, ContextPtr context) const override;
MergeTreeIndexConditionPtr createIndexCondition(const SelectQueryInfo & query, ContextPtr context) const;
MergeTreeIndexConditionPtr createIndexCondition(const ActionsDAGPtr &, ContextPtr) const override;
bool isVectorSearch() const override { return true; }
private:
const UInt64 trees;

View File

@ -43,9 +43,9 @@ MergeTreeIndexAggregatorPtr MergeTreeIndexBloomFilter::createIndexAggregator(con
return std::make_shared<MergeTreeIndexAggregatorBloomFilter>(bits_per_row, hash_functions, index.column_names);
}
MergeTreeIndexConditionPtr MergeTreeIndexBloomFilter::createIndexCondition(const SelectQueryInfo & query_info, ContextPtr context) const
MergeTreeIndexConditionPtr MergeTreeIndexBloomFilter::createIndexCondition(const ActionsDAGPtr & filter_actions_dag, ContextPtr context) const
{
return std::make_shared<MergeTreeIndexConditionBloomFilter>(query_info, context, index.sample_block, hash_functions);
return std::make_shared<MergeTreeIndexConditionBloomFilter>(filter_actions_dag, context, index.sample_block, hash_functions);
}
static void assertIndexColumnsType(const Block & header)

View File

@ -20,7 +20,7 @@ public:
MergeTreeIndexAggregatorPtr createIndexAggregator(const MergeTreeWriterSettings & settings) const override;
MergeTreeIndexConditionPtr createIndexCondition(const SelectQueryInfo & query_info, ContextPtr context) const override;
MergeTreeIndexConditionPtr createIndexCondition(const ActionsDAGPtr & filter_actions_dag, ContextPtr context) const override;
private:
size_t bits_per_row;

View File

@ -97,39 +97,18 @@ bool maybeTrueOnBloomFilter(const IColumn * hash_column, const BloomFilterPtr &
}
MergeTreeIndexConditionBloomFilter::MergeTreeIndexConditionBloomFilter(
const SelectQueryInfo & info_, ContextPtr context_, const Block & header_, size_t hash_functions_)
: WithContext(context_), header(header_), query_info(info_), hash_functions(hash_functions_)
const ActionsDAGPtr & filter_actions_dag, ContextPtr context_, const Block & header_, size_t hash_functions_)
: WithContext(context_), header(header_), hash_functions(hash_functions_)
{
if (context_->getSettingsRef().allow_experimental_analyzer)
{
if (!query_info.filter_actions_dag)
{
rpn.push_back(RPNElement::FUNCTION_UNKNOWN);
return;
}
RPNBuilder<RPNElement> builder(
query_info.filter_actions_dag->getOutputs().at(0),
context_,
[&](const RPNBuilderTreeNode & node, RPNElement & out) { return extractAtomFromTree(node, out); });
rpn = std::move(builder).extractRPN();
return;
}
ASTPtr filter_node = buildFilterNode(query_info.query);
if (!filter_node)
if (!filter_actions_dag)
{
rpn.push_back(RPNElement::FUNCTION_UNKNOWN);
return;
}
auto block_with_constants = KeyCondition::getBlockWithConstants(query_info.query, query_info.syntax_analyzer_result, context_);
RPNBuilder<RPNElement> builder(
filter_node,
filter_actions_dag->getOutputs().at(0),
context_,
std::move(block_with_constants),
query_info.prepared_sets,
[&](const RPNBuilderTreeNode & node, RPNElement & out) { return extractAtomFromTree(node, out); });
rpn = std::move(builder).extractRPN();
}

View File

@ -44,7 +44,7 @@ public:
std::vector<std::pair<size_t, ColumnPtr>> predicate;
};
MergeTreeIndexConditionBloomFilter(const SelectQueryInfo & info_, ContextPtr context_, const Block & header_, size_t hash_functions_);
MergeTreeIndexConditionBloomFilter(const ActionsDAGPtr & filter_actions_dag, ContextPtr context_, const Block & header_, size_t hash_functions_);
bool alwaysUnknownOrTrue() const override;
@ -58,7 +58,6 @@ public:
private:
const Block & header;
const SelectQueryInfo & query_info;
const size_t hash_functions;
std::vector<RPNElement> rpn;

View File

@ -1,22 +1,23 @@
#include <Storages/MergeTree/MergeTreeIndexFullText.h>
#include <Columns/ColumnArray.h>
#include <DataTypes/DataTypesNumber.h>
#include <Common/OptimizedRegularExpression.h>
#include <Core/Defines.h>
#include <DataTypes/DataTypeArray.h>
#include <IO/WriteHelpers.h>
#include <DataTypes/DataTypesNumber.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/ExpressionActions.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Interpreters/TreeRewriter.h>
#include <Interpreters/misc.h>
#include <Storages/MergeTree/MergeTreeData.h>
#include <Storages/MergeTree/RPNBuilder.h>
#include <Storages/MergeTree/MergeTreeIndexUtils.h>
#include <Parsers/ASTIdentifier.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTSubquery.h>
#include <Parsers/ASTSelectQuery.h>
#include <Core/Defines.h>
#include <Parsers/ASTSubquery.h>
#include <Storages/MergeTree/MergeTreeData.h>
#include <Storages/MergeTree/MergeTreeIndexUtils.h>
#include <Storages/MergeTree/RPNBuilder.h>
#include <Poco/Logger.h>
@ -137,7 +138,7 @@ void MergeTreeIndexAggregatorFullText::update(const Block & block, size_t * pos,
}
MergeTreeConditionFullText::MergeTreeConditionFullText(
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
ContextPtr context,
const Block & index_sample_block,
const BloomFilterParameters & params_,
@ -146,38 +147,16 @@ MergeTreeConditionFullText::MergeTreeConditionFullText(
, index_data_types(index_sample_block.getNamesAndTypesList().getTypes())
, params(params_)
, token_extractor(token_extactor_)
, prepared_sets(query_info.prepared_sets)
{
if (context->getSettingsRef().allow_experimental_analyzer)
{
if (!query_info.filter_actions_dag)
{
rpn.push_back(RPNElement::FUNCTION_UNKNOWN);
return;
}
RPNBuilder<RPNElement> builder(
query_info.filter_actions_dag->getOutputs().at(0),
context,
[&](const RPNBuilderTreeNode & node, RPNElement & out) { return extractAtomFromTree(node, out); });
rpn = std::move(builder).extractRPN();
return;
}
ASTPtr filter_node = buildFilterNode(query_info.query);
if (!filter_node)
if (!filter_actions_dag)
{
rpn.push_back(RPNElement::FUNCTION_UNKNOWN);
return;
}
auto block_with_constants = KeyCondition::getBlockWithConstants(query_info.query, query_info.syntax_analyzer_result, context);
RPNBuilder<RPNElement> builder(
filter_node,
filter_actions_dag->getOutputs().at(0),
context,
std::move(block_with_constants),
query_info.prepared_sets,
[&](const RPNBuilderTreeNode & node, RPNElement & out) { return extractAtomFromTree(node, out); });
rpn = std::move(builder).extractRPN();
}
@ -201,6 +180,7 @@ bool MergeTreeConditionFullText::alwaysUnknownOrTrue() const
|| element.function == RPNElement::FUNCTION_IN
|| element.function == RPNElement::FUNCTION_NOT_IN
|| element.function == RPNElement::FUNCTION_MULTI_SEARCH
|| element.function == RPNElement::FUNCTION_MATCH
|| element.function == RPNElement::FUNCTION_HAS_ANY
|| element.function == RPNElement::ALWAYS_FALSE)
{
@ -285,8 +265,27 @@ bool MergeTreeConditionFullText::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx
for (size_t row = 0; row < bloom_filters.size(); ++row)
result[row] = result[row] && granule->bloom_filters[element.key_column].contains(bloom_filters[row]);
rpn_stack.emplace_back(
std::find(std::cbegin(result), std::cend(result), true) != std::end(result), true);
rpn_stack.emplace_back(std::find(std::cbegin(result), std::cend(result), true) != std::end(result), true);
}
else if (element.function == RPNElement::FUNCTION_MATCH)
{
if (!element.set_bloom_filters.empty())
{
/// Alternative substrings
std::vector<bool> result(element.set_bloom_filters.back().size(), true);
const auto & bloom_filters = element.set_bloom_filters[0];
for (size_t row = 0; row < bloom_filters.size(); ++row)
result[row] = result[row] && granule->bloom_filters[element.key_column].contains(bloom_filters[row]);
rpn_stack.emplace_back(std::find(std::cbegin(result), std::cend(result), true) != std::end(result), true);
}
else if (element.bloom_filter)
{
/// Required substrings
rpn_stack.emplace_back(granule->bloom_filters[element.key_column].contains(*element.bloom_filter), true);
}
}
else if (element.function == RPNElement::FUNCTION_NOT)
{
@ -392,6 +391,7 @@ bool MergeTreeConditionFullText::extractAtomFromTree(const RPNBuilderTreeNode &
function_name == "notEquals" ||
function_name == "has" ||
function_name == "mapContains" ||
function_name == "match" ||
function_name == "like" ||
function_name == "notLike" ||
function_name.starts_with("hasToken") ||
@ -513,6 +513,7 @@ bool MergeTreeConditionFullText::traverseTreeEquals(
token_extractor->stringToBloomFilter(value.data(), value.size(), *out.bloom_filter);
return true;
}
else if (function_name == "has")
{
out.key_column = *key_index;
@ -600,6 +601,39 @@ bool MergeTreeConditionFullText::traverseTreeEquals(
out.set_bloom_filters = std::move(bloom_filters);
return true;
}
else if (function_name == "match")
{
out.key_column = *key_index;
out.function = RPNElement::FUNCTION_MATCH;
out.bloom_filter = std::make_unique<BloomFilter>(params);
auto & value = const_value.get<String>();
String required_substring;
bool dummy_is_trivial, dummy_required_substring_is_prefix;
std::vector<String> alternatives;
OptimizedRegularExpression::analyze(value, required_substring, dummy_is_trivial, dummy_required_substring_is_prefix, alternatives);
if (required_substring.empty() && alternatives.empty())
return false;
/// out.set_bloom_filters means alternatives exist
/// out.bloom_filter means required_substring exists
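/// Illustrative: match(s, 'ab|cd') analyzes to alternatives {"ab", "cd"} with no required
/// substring, while match(s, 'needle') analyzes to the required substring "needle".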
if (!alternatives.empty())
{
std::vector<std::vector<BloomFilter>> bloom_filters;
bloom_filters.emplace_back();
for (const auto & alternative : alternatives)
{
bloom_filters.back().emplace_back(params);
token_extractor->stringToBloomFilter(alternative.data(), alternative.size(), bloom_filters.back().back());
}
out.set_bloom_filters = std::move(bloom_filters);
}
else
token_extractor->stringToBloomFilter(required_substring.data(), required_substring.size(), *out.bloom_filter);
return true;
}
return false;
}
@ -691,9 +725,9 @@ MergeTreeIndexAggregatorPtr MergeTreeIndexFullText::createIndexAggregator(const
}
MergeTreeIndexConditionPtr MergeTreeIndexFullText::createIndexCondition(
const SelectQueryInfo & query, ContextPtr context) const
const ActionsDAGPtr & filter_dag, ContextPtr context) const
{
return std::make_shared<MergeTreeConditionFullText>(query, context, index.sample_block, params, token_extractor.get());
return std::make_shared<MergeTreeConditionFullText>(filter_dag, context, index.sample_block, params, token_extractor.get());
}
MergeTreeIndexPtr bloomFilterIndexCreator(

View File

@ -62,7 +62,7 @@ class MergeTreeConditionFullText final : public IMergeTreeIndexCondition
{
public:
MergeTreeConditionFullText(
const SelectQueryInfo & query_info,
const ActionsDAGPtr & filter_actions_dag,
ContextPtr context,
const Block & index_sample_block,
const BloomFilterParameters & params_,
@ -90,6 +90,7 @@ private:
FUNCTION_NOT_EQUALS,
FUNCTION_HAS,
FUNCTION_IN,
FUNCTION_MATCH,
FUNCTION_NOT_IN,
FUNCTION_MULTI_SEARCH,
FUNCTION_HAS_ANY,
@ -143,9 +144,6 @@ private:
BloomFilterParameters params;
TokenExtractorPtr token_extractor;
RPN rpn;
/// Sets from syntax analyzer.
PreparedSetsPtr prepared_sets;
};
class MergeTreeIndexFullText final : public IMergeTreeIndex
@ -165,7 +163,7 @@ public:
MergeTreeIndexAggregatorPtr createIndexAggregator(const MergeTreeWriterSettings & settings) const override;
MergeTreeIndexConditionPtr createIndexCondition(
const SelectQueryInfo & query, ContextPtr context) const override;
const ActionsDAGPtr & filter_dag, ContextPtr context) const override;
BloomFilterParameters params;
/// Function for selecting next token.

View File

@ -79,7 +79,7 @@ MergeTreeIndexAggregatorPtr MergeTreeIndexHypothesis::createIndexAggregator(cons
}
MergeTreeIndexConditionPtr MergeTreeIndexHypothesis::createIndexCondition(
const SelectQueryInfo &, ContextPtr) const
const ActionsDAGPtr &, ContextPtr) const
{
throw Exception(ErrorCodes::LOGICAL_ERROR, "Not supported");
}

View File

@ -70,7 +70,7 @@ public:
MergeTreeIndexAggregatorPtr createIndexAggregator(const MergeTreeWriterSettings & settings) const override;
MergeTreeIndexConditionPtr createIndexCondition(
const SelectQueryInfo & query, ContextPtr context) const override;
const ActionsDAGPtr & filter_actions_dag, ContextPtr context) const override;
MergeTreeIndexMergedConditionPtr createIndexMergedCondition(
const SelectQueryInfo & query_info, StorageMetadataPtr storage_metadata) const override;

Some files were not shown because too many files have changed in this diff.