Merge branch 'master' into pufit/fix-definers-restore

pufit 2024-06-03 19:15:48 -04:00
commit 5b3e3376f1
136 changed files with 4488 additions and 2666 deletions

View File

@ -12,7 +12,7 @@
#### Backward Incompatible Change
* Renamed "inverted indexes" to "full-text indexes", which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before the upgrade and re-create them afterwards. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To re-enable them, set `allow_deprecated_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To re-enable them, set `allow_deprecated_error_prone_window_functions = 1` or set `compatibility = '24.4'` or lower (a brief example follows below). [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
* Queries against `system.columns` will work faster when there is a large number of columns but many databases or tables are not granted `SHOW TABLES`. Note that in previous versions, if you granted `SHOW COLUMNS` on individual columns without granting `SHOW TABLES` on the corresponding tables, `system.columns` would still show these columns, but in the new version it skips the table entirely. The trace log messages "Access granted" and "Access denied", which slowed down queries, were removed. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
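For illustration, a minimal sketch of re-enabling the deprecated functions named in the entry above (only the setting name and function names from that entry are assumed):

```sql
-- Re-enable the deprecated, error-prone functions for the current session.
SET allow_deprecated_error_prone_window_functions = 1;
SELECT runningDifference(number) FROM numbers(5);
-- Alternatively, fall back to the previous behavior wholesale:
-- SET compatibility = '24.4';
```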
#### New Feature

View File

@ -2,20 +2,22 @@
the file is autogenerated by utils/security-generator/generate_security.py
-->
# Security Policy
# ClickHouse Security Vulnerability Response Policy
## Security Announcements
Security fixes will be announced by posting them in the [security changelog](https://clickhouse.com/docs/en/whats-new/security-changelog/).
## Security Change Log and Support
## Scope and Supported Versions
Details regarding security fixes are publicly reported in our [security changelog](https://clickhouse.com/docs/en/whats-new/security-changelog/). A summary of known security vulnerabilities is shown at the bottom of this page.
The following versions of ClickHouse server are currently being supported with security updates:
Pre-release vulnerability notifications, and notifications during embargo periods, are available to open-source users and support customers who are registered for vulnerability alerts. Refer to our [Embargo Policy](#embargo-policy) below.
The following versions of ClickHouse server are currently supported with security updates:
| Version | Supported |
|:-|:-|
| 24.5 | ✔️ |
| 24.4 | ✔️ |
| 24.3 | ✔️ |
| 24.2 | ✔️ |
| 24.2 | ❌ |
| 24.1 | ❌ |
| 23.* | ❌ |
| 23.8 | ✔️ |
@ -37,7 +39,7 @@ The following versions of ClickHouse server are currently being supported with security updates:
We're extremely grateful to the security researchers and users who report vulnerabilities to the ClickHouse Open Source Community. All reports are thoroughly investigated by developers.
To report a potential vulnerability in ClickHouse, please send the details to [security@clickhouse.com](mailto:security@clickhouse.com). We do not offer any financial rewards for reporting issues to us using this method. Alternatively, you can also submit your findings through our public bug bounty program hosted by [Bugcrowd](https://bugcrowd.com/clickhouse) and be rewarded per the program scope and rules of engagement.
To report a potential vulnerability in ClickHouse, please submit the details through our public bug bounty program hosted by [Bugcrowd](https://bugcrowd.com/clickhouse) and be rewarded per the program scope and rules of engagement.
### When Should I Report a Vulnerability?
@ -59,3 +61,21 @@ As the security issue moves from triage, to identified fix, to release planning
A public disclosure date is negotiated by the ClickHouse maintainers and the bug submitter. We prefer to fully disclose the bug as soon as possible once a user mitigation is available. It is reasonable to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well tested, or for vendor coordination. The timeframe for disclosure ranges from immediate (especially if the issue is already publicly known) to 90 days. For a vulnerability with a straightforward mitigation, we expect the time from report to disclosure to be on the order of 7 days.
## Embargo Policy
Open source users and support customers may subscribe to receive alerts during the embargo period by visiting [https://trust.clickhouse.com/?product=clickhouseoss](https://trust.clickhouse.com/?product=clickhouseoss), requesting access and subscribing for alerts. Subscribers agree not to make these notifications public, issue communications, share this information with others, or issue public patches before the disclosure date. Accidental disclosures must be reported immediately to trust@clickhouse.com. Failure to follow this policy or repeated leaks may result in removal from the subscriber list.
Participation criteria:
1. Be a current open source user or support customer with a valid corporate email domain (no @gmail.com, @azure.com, etc.).
1. Sign up to the ClickHouse OSS Trust Center at [https://trust.clickhouse.com](https://trust.clickhouse.com).
1. Accept the ClickHouse Security Vulnerability Response Policy as outlined above.
1. Subscribe to ClickHouse OSS Trust Center alerts.
Removal criteria:
1. Members may be removed for failure to follow this policy or repeated leaks.
1. Members may be removed for bounced messages (mail delivery failure).
1. Members may unsubscribe at any time.
Notification process:
ClickHouse will post notifications within our OSS Trust Center and notify subscribers. Subscribers must log in to the Trust Center to download the notification. The notification will include the timeframe for public disclosure.

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.4.1.2088"
ARG VERSION="24.5.1.1763"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.4.1.2088"
ARG VERSION="24.5.1.1763"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -28,7 +28,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="24.4.1.2088"
ARG VERSION="24.5.1.1763"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
#docker-official-library:off

View File

@ -65,46 +65,22 @@ function save_settings_clean()
script -q -c "clickhouse-local -q \"select * from system.settings into outfile '$out'\"" --log-out /dev/null
}
# We save the (numeric) version of the old server to compare setting changes between the two.
# We do this since we are testing against the latest release, not taking into account release candidates, so we might
# be testing current master (24.6) against the latest stable release (24.4)
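# (e.g. for version 24.4.1.2088 this computes 24 * 100 + 4 = 2404)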
function save_major_version()
{
local out=$1 && shift
clickhouse-local -q "SELECT a[1]::UInt64 * 100 + a[2]::UInt64 as v FROM (Select splitByChar('.', version()) as a) into outfile '$out'"
}
save_settings_clean 'old_settings.native'
save_major_version 'old_version.native'
# Initial run without S3 to create system.*_log on local file system to make it
# available for dump via clickhouse-local
configure
function remove_keeper_config()
{
sudo sed -i "/<$1>$2<\/$1>/d" /etc/clickhouse-server/config.d/keeper_port.xml
}
# async_replication setting doesn't exist on some older versions
remove_keeper_config "async_replication" "1"
# create_if_not_exists feature flag doesn't exist on some older versions
remove_keeper_config "create_if_not_exists" "[01]"
# TODO: remove these after 24.3 is released.
sudo sed -i "s|<object_storage_type>azure<|<object_storage_type>azure_blob_storage<|" /etc/clickhouse-server/config.d/azure_storage_conf.xml
# TODO: remove these after 24.3 is released.
sudo sed -i "s|<object_storage_type>local<|<object_storage_type>local_blob_storage<|" /etc/clickhouse-server/config.d/storage_conf.xml
# latest_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "latest_logs_cache_size_threshold" "[[:digit:]]\+"
# commit_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "commit_logs_cache_size_threshold" "[[:digit:]]\+"
# it contains some new settings, but we can safely remove it
rm /etc/clickhouse-server/config.d/merge_tree.xml
rm /etc/clickhouse-server/config.d/enable_wait_for_shutdown_replicated_tables.xml
rm /etc/clickhouse-server/config.d/zero_copy_destructive_operations.xml
rm /etc/clickhouse-server/config.d/storage_conf_02963.xml
rm /etc/clickhouse-server/config.d/backoff_failed_mutation.xml
rm /etc/clickhouse-server/config.d/handlers.yaml
rm /etc/clickhouse-server/users.d/nonconst_timezone.xml
rm /etc/clickhouse-server/users.d/s3_cache_new.xml
rm /etc/clickhouse-server/users.d/replicated_ddl_entry.xml
start
stop
mv /var/log/clickhouse-server/clickhouse-server.log /var/log/clickhouse-server/clickhouse-server.initial.log
@ -116,44 +92,11 @@ export USE_S3_STORAGE_FOR_MERGE_TREE=1
export ZOOKEEPER_FAULT_INJECTION=0
configure
# force_sync=false doesn't work correctly on some older versions
sudo sed -i "s|<force_sync>false</force_sync>|<force_sync>true</force_sync>|" /etc/clickhouse-server/config.d/keeper_port.xml
# TODO: remove these after 24.3 is released.
sudo sed -i "s|<object_storage_type>azure<|<object_storage_type>azure_blob_storage<|" /etc/clickhouse-server/config.d/azure_storage_conf.xml
# TODO: remove these after 24.3 is released.
sudo sed -i "s|<object_storage_type>local<|<object_storage_type>local_blob_storage<|" /etc/clickhouse-server/config.d/storage_conf.xml
# async_replication setting doesn't exist on some older versions
remove_keeper_config "async_replication" "1"
# create_if_not_exists feature flag doesn't exist on some older versions
remove_keeper_config "create_if_not_exists" "[01]"
# latest_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "latest_logs_cache_size_threshold" "[[:digit:]]\+"
# commit_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "commit_logs_cache_size_threshold" "[[:digit:]]\+"
# But we still need the default disk because some tables are loaded only into it
sudo sed -i "s|<main><disk>s3</disk></main>|<main><disk>s3</disk></main><default><disk>default</disk></default>|" /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml
sudo chown clickhouse /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml
sudo chgrp clickhouse /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml
# it contains some new settings, but we can safely remove it
rm /etc/clickhouse-server/config.d/merge_tree.xml
rm /etc/clickhouse-server/config.d/enable_wait_for_shutdown_replicated_tables.xml
rm /etc/clickhouse-server/config.d/zero_copy_destructive_operations.xml
rm /etc/clickhouse-server/config.d/storage_conf_02963.xml
rm /etc/clickhouse-server/config.d/backoff_failed_mutation.xml
rm /etc/clickhouse-server/config.d/handlers.yaml
rm /etc/clickhouse-server/config.d/block_number.xml
rm /etc/clickhouse-server/users.d/nonconst_timezone.xml
rm /etc/clickhouse-server/users.d/s3_cache_new.xml
rm /etc/clickhouse-server/users.d/replicated_ddl_entry.xml
start
clickhouse-client --query="SELECT 'Server version: ', version()"
@ -192,6 +135,7 @@ then
save_settings_clean 'new_settings.native'
clickhouse-local -nmq "
CREATE TABLE old_settings AS file('old_settings.native');
CREATE TABLE old_version AS file('old_version.native');
CREATE TABLE new_settings AS file('new_settings.native');
SELECT
@ -202,8 +146,11 @@ then
LEFT JOIN old_settings ON new_settings.name = old_settings.name
WHERE (new_settings.value != old_settings.value) AND (name NOT IN (
SELECT arrayJoin(tupleElement(changes, 'name'))
FROM system.settings_changes
WHERE version = extract(version(), '^(?:\\d+\\.\\d+)')
FROM
(
SELECT *, splitByChar('.', version) AS version_array FROM system.settings_changes
)
WHERE (version_array[1]::UInt64 * 100 + version_array[2]::UInt64) > (SELECT v FROM old_version LIMIT 1)
))
SETTINGS join_use_nulls = 1
INTO OUTFILE 'changed_settings.txt'
@ -216,8 +163,11 @@ then
FROM old_settings
)) AND (name NOT IN (
SELECT arrayJoin(tupleElement(changes, 'name'))
FROM system.settings_changes
WHERE version = extract(version(), '^(?:\\d+\\.\\d+)')
FROM
(
SELECT *, splitByChar('.', version) AS version_array FROM system.settings_changes
)
WHERE (version_array[1]::UInt64 * 100 + version_array[2]::UInt64) > (SELECT v FROM old_version LIMIT 1)
))
INTO OUTFILE 'new_settings.txt'
FORMAT PrettyCompactNoEscapes;
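For clarity, here is a standalone sketch of the version filter used in both queries above; the literal `2404` (version 24.4) stands in for the value that the script reads from `old_version.native`:

```sql
-- Names of settings whose recorded change is newer than the old server's major version.
-- 2404 (i.e. 24.4) is a placeholder for the value loaded from 'old_version.native'.
SELECT arrayJoin(tupleElement(changes, 'name')) AS changed_setting
FROM
(
    SELECT *, splitByChar('.', version) AS version_array FROM system.settings_changes
)
WHERE (version_array[1]::UInt64 * 100 + version_array[2]::UInt64) > 2404;
```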

View File

@ -0,0 +1,366 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.5.1.1763-stable (647c154a94d) FIXME as compared to v24.4.1.2088-stable (6d4b31322d1)
#### Backward Incompatible Change
* Renamed "inverted indexes" to "full-text indexes", which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before the upgrade and re-create them afterwards. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To re-enable them, set `allow_deprecated_functions=1`. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
* Queries against `system.columns` will work faster when there is a large number of columns but many databases or tables are not granted `SHOW TABLES`. Note that in previous versions, if you granted `SHOW COLUMNS` on individual columns without granting `SHOW TABLES` on the corresponding tables, `system.columns` would still show these columns, but in the new version it skips the table entirely. The trace log messages "Access granted" and "Access denied", which slowed down queries, were removed. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### New Feature
* Support using Azure Workload Identity to authenticate against Azure Blob Storage with the AzureBlobStorage function in ClickHouse server. If the `use_workload_identity` parameter is set in the config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication. [#57881](https://github.com/ClickHouse/ClickHouse/pull/57881) ([Vinay Suryadevara](https://github.com/vinay92-ch)).
* Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on RocksDB's built-in memtable. This helps to increase import speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce `StorageEmbeddedRocksDB` table settings. [#59163](https://github.com/ClickHouse/ClickHouse/pull/59163) ([Duc Canh Le](https://github.com/canhld94)).
* Users can now parse CRLF line endings in the TSV format using the setting `input_format_tsv_crlf_end_of_line`. Closes [#56257](https://github.com/ClickHouse/ClickHouse/issues/56257). [#59747](https://github.com/ClickHouse/ClickHouse/pull/59747) ([Shaun Struwig](https://github.com/Blargian)).
* Adds the Form Format to read/write a single record in the application/x-www-form-urlencoded format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)).
* Added possibility to compress in CROSS JOIN. [#60459](https://github.com/ClickHouse/ClickHouse/pull/60459) ([p1rattttt](https://github.com/p1rattttt)).
* New setting `input_format_force_null_for_omitted_fields` that forces NULL values for omitted fields. [#60887](https://github.com/ClickHouse/ClickHouse/pull/60887) ([Constantine Peresypkin](https://github.com/pkit)).
* Support joins with inequality conditions that involve columns from both the left and right tables, e.g. `t1.y < t2.y`. To enable, `SET allow_experimental_join_condition = 1`. [#60920](https://github.com/ClickHouse/ClickHouse/pull/60920) ([lgbo](https://github.com/lgbo-ustc)).
* Previously, S3 storage and the s3 table function didn't support selecting from archive files. It is now possible to iterate over files inside archives stored in S3. [#62259](https://github.com/ClickHouse/ClickHouse/pull/62259) ([Daniil Ivanik](https://github.com/divanik)).
* Support for conditional function `clamp`. [#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)).
* Add npy output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)).
* Added SQL functions `generateUUIDv7`, `generateUUIDv7ThreadMonotonic`, and `generateUUIDv7NonMonotonic` (with different monotonicity/performance trade-offs) to generate version 7 UUIDs, i.e. timestamp-based UUIDs with a random component. Also added a new function `UUIDToNum` to extract bytes from a UUID and a new function `UUIDv7ToDateTime` to extract the timestamp component from a version 7 UUID (a brief usage sketch follows this list). [#62852](https://github.com/ClickHouse/ClickHouse/pull/62852) ([Alexey Petrunyaka](https://github.com/pet74alex)).
* Backported in [#64307](https://github.com/ClickHouse/ClickHouse/issues/64307): Implement Dynamic data type that allows to store values of any type inside it without knowing all of them in advance. Dynamic type is available under a setting `allow_experimental_dynamic_type`. Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#63058](https://github.com/ClickHouse/ClickHouse/pull/63058) ([Kruglov Pavel](https://github.com/Avogar)).
* Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on RocksDB's built-in memtable. This helps to increase import speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce StorageEmbeddedRocksDB table settings. [#63324](https://github.com/ClickHouse/ClickHouse/pull/63324) ([Duc Canh Le](https://github.com/canhld94)).
* Raw as a synonym for TSVRaw. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)).
* Added the possibility to perform a cross join in a temporary file if its size exceeds limits. [#63432](https://github.com/ClickHouse/ClickHouse/pull/63432) ([p1rattttt](https://github.com/p1rattttt)).
* On Linux and MacOS, if the program has STDOUT redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE` ). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)).
* Change warning on high number of attached tables to differentiate tables, views and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
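A brief usage sketch for two of the new-function entries above; it assumes the conventional `clamp(value, min, max)` signature and otherwise uses only functions named in the entries:

```sql
-- clamp restricts a value to the given range (assuming clamp(value, min, max)).
SELECT clamp(15, 1, 10) AS clamped;  -- expected: 10
-- Version 7 UUIDs are timestamp-based; the timestamp can be extracted back out.
SELECT generateUUIDv7() AS u, UUIDv7ToDateTime(u) AS ts;
```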
#### Performance Improvement
* Skip merging of newly created projection blocks during `INSERT`-s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)).
* Process the XXXUTF8 string functions in ASCII mode if the input strings are all ASCII characters. Inspired by https://github.com/apache/doris/pull/29799. Overall speed-up of 1.07x~1.62x. Note that peak memory usage has decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)).
* Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)).
* HostResolver stored each IP address several times. If a remote host had several IPs and, for some reason (firewall rules, for example), access to some IPs was allowed while access to others was forbidden, only the first record of a forbidden IP was marked as failed, so on each retry those IPs had a chance to be chosen (and fail again). On top of that, the DNS cache was dropped every 120 seconds, after which those IPs could be chosen again. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)).
* Add a new configuration `prefer_merge_sort_block_bytes` to control memory usage and speed up sorting by up to 2 times when merging when there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)).
* `clickhouse-local` will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)).
* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Raúl Marín](https://github.com/Algunenano)).
* Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Improvement
* Maps can now have `Float32`, `Float64`, `Array(T)`, `Map(K,V)` and `Tuple(T1, T2, ...)` as keys. Closes [#54537](https://github.com/ClickHouse/ClickHouse/issues/54537). [#59318](https://github.com/ClickHouse/ClickHouse/pull/59318) ([李扬](https://github.com/taiyang-li)).
* Multiline strings with border preservation and column width change. [#59940](https://github.com/ClickHouse/ClickHouse/pull/59940) ([Volodyachan](https://github.com/Volodyachan)).
* Make rabbitmq nack broken messages. Closes [#45350](https://github.com/ClickHouse/ClickHouse/issues/45350). [#60312](https://github.com/ClickHouse/ClickHouse/pull/60312) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes [#60460](https://github.com/ClickHouse/ClickHouse/issues/60460). [#60468](https://github.com/ClickHouse/ClickHouse/pull/60468) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Distinct error messages for the S3 'no key' error in the disk and storage cases. [#61108](https://github.com/ClickHouse/ClickHouse/pull/61108) ([Sema Checherinda](https://github.com/CheSema)).
* Less contention in filesystem cache (part 4). Allow to keep filesystem cache not filled to the limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This allows to release pressure from space reservation for queries (on `tryReserve` method). Also this is done in a lock free way as much as possible, e.g. should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)).
* The progress bar will work for trivial queries with LIMIT from `system.zeros`, `system.zeros_mt` (it already works for `system.numbers` and `system.numbers_mt`), and the `generateRandom` table function. As a bonus, if the total number of records is greater than the `max_rows_to_read` limit, it will throw an exception earlier. This closes [#58183](https://github.com/ClickHouse/ClickHouse/issues/58183). [#61823](https://github.com/ClickHouse/ClickHouse/pull/61823) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* YAML Merge Key support. [#62685](https://github.com/ClickHouse/ClickHouse/pull/62685) ([Azat Khuzhin](https://github.com/azat)).
* Enhance error message when non-deterministic function is used with Replicated source. [#62896](https://github.com/ClickHouse/ClickHouse/pull/62896) ([Grégoire Pineau](https://github.com/lyrixx)).
* Fix interserver secret for Distributed over Distributed from `remote`. [#63013](https://github.com/ClickHouse/ClickHouse/pull/63013) ([Azat Khuzhin](https://github.com/azat)).
* Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support configuration substitutions from YAML files. [#63106](https://github.com/ClickHouse/ClickHouse/pull/63106) ([Eduard Karacharov](https://github.com/korowa)).
* Add TTL information in system parts_columns table. [#63200](https://github.com/ClickHouse/ClickHouse/pull/63200) ([litlig](https://github.com/litlig)).
* Keep previous data in terminal after picking from skim suggestions. [#63261](https://github.com/ClickHouse/ClickHouse/pull/63261) ([FlameFactory](https://github.com/FlameFactory)).
* The width of fields is now calculated correctly, ignoring ANSI escape sequences. [#63270](https://github.com/ClickHouse/ClickHouse/pull/63270) ([Shaun Struwig](https://github.com/Blargian)).
* Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. [#63365](https://github.com/ClickHouse/ClickHouse/pull/63365) ([Julia Kartseva](https://github.com/jkartseva)).
* Support English-style Unicode quotes, e.g. “Hello”, world. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes [#58634](https://github.com/ClickHouse/ClickHouse/issues/58634). [#63381](https://github.com/ClickHouse/ClickHouse/pull/63381) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allowed to create MaterializedMySQL database without connection to MySQL. [#63397](https://github.com/ClickHouse/ClickHouse/pull/63397) ([Kirill](https://github.com/kirillgarbar)).
* Remove copying data when writing to filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update the usage of error code `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` by more accurate error codes when appropriate. [#63406](https://github.com/ClickHouse/ClickHouse/pull/63406) ([Yohann Jardin](https://github.com/yohannj)).
* `os_user` and `client_hostname` are now correctly set up for queries for command line suggestions in clickhouse-client. This closes [#63430](https://github.com/ClickHouse/ClickHouse/issues/63430). [#63433](https://github.com/ClickHouse/ClickHouse/pull/63433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed tabulation from line numbering, correct handling of length when moving a line if the value has a tab, added tests. [#63493](https://github.com/ClickHouse/ClickHouse/pull/63493) ([Volodyachan](https://github.com/Volodyachan)).
* Add the `aggregate_function_group_array_has_limit_size` setting to support discarding data in some scenarios. [#63516](https://github.com/ClickHouse/ClickHouse/pull/63516) ([zhongyuankai](https://github.com/zhongyuankai)).
* Automatically mark a replica of Replicated database as lost and start recovery if some DDL task fails more than `max_retries_before_automatic_recovery` (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. [#63549](https://github.com/ClickHouse/ClickHouse/pull/63549) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Automatically correct `max_block_size=0` to default value. [#63587](https://github.com/ClickHouse/ClickHouse/pull/63587) ([Antonio Andelic](https://github.com/antonio2368)).
* Account failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_traked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address [#52086](https://github.com/ClickHouse/ClickHouse/issues/52086). [#63656](https://github.com/ClickHouse/ClickHouse/pull/63656) ([Zimu Li](https://github.com/woodlzm)).
* Enable truncate operation for object storage disks. [#63693](https://github.com/ClickHouse/ClickHouse/pull/63693) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* The loading of the keywords list is now dependent on the server revision and will be disabled for the old versions of ClickHouse server. CC @azat. [#63786](https://github.com/ClickHouse/ClickHouse/pull/63786) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow trailing commas in the column list of the INSERT query. For example, `INSERT INTO test (a, b, c, ) VALUES ...` (a combined sketch follows this list). [#63803](https://github.com/ClickHouse/ClickHouse/pull/63803) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better exception messages for the `Regexp` format. [#63804](https://github.com/ClickHouse/ClickHouse/pull/63804) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow trailing commas in the `Values` format. For example, this query is allowed: `INSERT INTO test (a, b, c) VALUES (4, 5, 6,);`. [#63810](https://github.com/ClickHouse/ClickHouse/pull/63810) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* ClickHouse disks now read the corresponding server setting to obtain the actual metadata format version. [#63831](https://github.com/ClickHouse/ClickHouse/pull/63831) ([Sema Checherinda](https://github.com/CheSema)).
* Disable pretty format restrictions (`output_format_pretty_max_rows`/`output_format_pretty_max_value_width`) when stdout is not TTY. [#63942](https://github.com/ClickHouse/ClickHouse/pull/63942) ([Azat Khuzhin](https://github.com/azat)).
* Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Throw `CANNOT_DECOMPRESS` instead of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)).
* A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Now backups with azure blob storage will use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)).
* Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow to use native copy for azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)).
* Finally enable native copy for azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)).
* Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)).
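Putting the two trailing-comma entries above together, a minimal runnable sketch (the table `test` and its columns are hypothetical, as in the entries themselves):

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE test (a UInt32, b UInt32, c UInt32) ENGINE = Memory;
-- A trailing comma in the column list is now accepted ...
INSERT INTO test (a, b, c, ) VALUES (1, 2, 3);
-- ... and so is a trailing comma inside a Values tuple.
INSERT INTO test (a, b, c) VALUES (4, 5, 6,);
```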
#### Build/Testing/Packaging Improvement
* ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Re-enable broken s390x build in CI. [#63135](https://github.com/ClickHouse/ClickHouse/pull/63135) ([Harry Lee](https://github.com/HarryLeeIBM)).
* The Dockerfile is reviewed by the docker official library in https://github.com/docker-library/official-images/pull/15846. [#63400](https://github.com/ClickHouse/ClickHouse/pull/63400) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Information about every symbol in every translation unit will be collected in the CI database for every build in the CI. This closes [#63494](https://github.com/ClickHouse/ClickHouse/issues/63494). [#63495](https://github.com/ClickHouse/ClickHouse/pull/63495) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Experimentally support loongarch64 as a new platform for ClickHouse. [#63733](https://github.com/ClickHouse/ClickHouse/pull/63733) ([qiangxuhui](https://github.com/qiangxuhui)).
* Update Apache Datasketches library. It resolves [#63858](https://github.com/ClickHouse/ClickHouse/issues/63858). [#63923](https://github.com/ClickHouse/ClickHouse/pull/63923) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable GRPC support for aarch64 linux while cross-compiling binary. [#64072](https://github.com/ClickHouse/ClickHouse/pull/64072) ([alesapin](https://github.com/alesapin)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix making backup when multiple shards are used. This PR fixes [#56566](https://github.com/ClickHouse/ClickHouse/issues/56566). [#57684](https://github.com/ClickHouse/ClickHouse/pull/57684) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix passing projections/indexes from CREATE query into inner table of MV. [#59183](https://github.com/ClickHouse/ClickHouse/pull/59183) ([Azat Khuzhin](https://github.com/azat)).
* Fix boundRatio incorrect merge. [#60532](https://github.com/ClickHouse/ClickHouse/pull/60532) ([Tao Wang](https://github.com/wangtZJU)).
* Fix crash when using some functions with low-cardinality columns. [#61966](https://github.com/ClickHouse/ClickHouse/pull/61966) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix queries with FINAL giving wrong results when the table does not use adaptive granularity. [#62432](https://github.com/ClickHouse/ClickHouse/pull/62432) ([Duc Canh Le](https://github.com/canhld94)).
* Improve the detection of cgroups v2 memory controller in unusual locations. This fixes a warning that the cgroup memory observer was disabled because no cgroups v1 or v2 current memory file could be found. [#62903](https://github.com/ClickHouse/ClickHouse/pull/62903) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix subsequent use of external tables in client. [#62964](https://github.com/ClickHouse/ClickHouse/pull/62964) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash with untuple and unresolved lambda. [#63131](https://github.com/ClickHouse/ClickHouse/pull/63131) ([Raúl Marín](https://github.com/Algunenano)).
* Fix a bug which could lead to the server accepting connections before it is actually loaded. [#63181](https://github.com/ClickHouse/ClickHouse/pull/63181) ([alesapin](https://github.com/alesapin)).
* Fix intersect parts when restart after drop range. [#63202](https://github.com/ClickHouse/ClickHouse/pull/63202) ([Han Fei](https://github.com/hanfei1991)).
* Fix a misbehavior when SQL security defaults don't load for old tables during server startup. [#63209](https://github.com/ClickHouse/ClickHouse/pull/63209) ([pufit](https://github.com/pufit)).
* Fix JOIN filter push-down for filled JOIN. Closes [#63228](https://github.com/ClickHouse/ClickHouse/issues/63228). [#63234](https://github.com/ClickHouse/ClickHouse/pull/63234) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix infinite loop while listing objects in Azure blob storage. [#63257](https://github.com/ClickHouse/ClickHouse/pull/63257) ([Julia Kartseva](https://github.com/jkartseva)).
* CROSS JOIN can now be executed with any value of the `join_algorithm` setting, close [#62431](https://github.com/ClickHouse/ClickHouse/issues/62431). [#63273](https://github.com/ClickHouse/ClickHouse/pull/63273) ([vdimir](https://github.com/vdimir)).
* Fixed a potential crash caused by a `no space left` error when temporary data in the cache is used. [#63346](https://github.com/ClickHouse/ClickHouse/pull/63346) ([vdimir](https://github.com/vdimir)).
* Fix bug which could potentially lead to rare LOGICAL_ERROR during SELECT query with message: `Unexpected return type from materialize. Expected type_XXX. Got type_YYY.` Introduced in [#59379](https://github.com/ClickHouse/ClickHouse/issues/59379). [#63353](https://github.com/ClickHouse/ClickHouse/pull/63353) ([alesapin](https://github.com/alesapin)).
* Fix `X-ClickHouse-Timezone` header returning wrong timezone when using `session_timezone` as query level setting. [#63377](https://github.com/ClickHouse/ClickHouse/pull/63377) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix debug assert when using grouping WITH ROLLUP and LowCardinality types. [#63398](https://github.com/ClickHouse/ClickHouse/pull/63398) ([Raúl Marín](https://github.com/Algunenano)).
* Fix logical errors in queries with `GROUPING SETS` and `WHERE` and `group_by_use_nulls = true`, close [#60538](https://github.com/ClickHouse/ClickHouse/issues/60538). [#63405](https://github.com/ClickHouse/ClickHouse/pull/63405) ([vdimir](https://github.com/vdimir)).
* Fix backup of projection part in case projection was removed from table metadata, but part still has projection. [#63426](https://github.com/ClickHouse/ClickHouse/pull/63426) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix 'Every derived table must have its own alias' error for MYSQL dictionary source, close [#63341](https://github.com/ClickHouse/ClickHouse/issues/63341). [#63481](https://github.com/ClickHouse/ClickHouse/pull/63481) ([vdimir](https://github.com/vdimir)).
* Insert QueryFinish on AsyncInsertFlush with no data. [#63483](https://github.com/ClickHouse/ClickHouse/pull/63483) ([Raúl Marín](https://github.com/Algunenano)).
* Fix `system.query_log.used_dictionaries` logging. [#63487](https://github.com/ClickHouse/ClickHouse/pull/63487) ([Eduard Karacharov](https://github.com/korowa)).
* Avoid a segfault in `MergeTreePrefetchedReadPool` while fetching projection parts. [#63513](https://github.com/ClickHouse/ClickHouse/pull/63513) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix rabbitmq heap-use-after-free found by clang-18, which can happen if an error is thrown from RabbitMQ during initialization of exchange and queues. [#63515](https://github.com/ClickHouse/ClickHouse/pull/63515) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix crash on exit with sentry enabled (due to openssl destroyed before sentry). [#63548](https://github.com/ClickHouse/ClickHouse/pull/63548) ([Azat Khuzhin](https://github.com/azat)).
* Fix support for Array and Map with Keyed hashing functions and materialized keys. [#63628](https://github.com/ClickHouse/ClickHouse/pull/63628) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fixed Parquet filter pushdown not working with Analyzer. [#63642](https://github.com/ClickHouse/ClickHouse/pull/63642) ([Michael Kolupaev](https://github.com/al13n321)).
* It is forbidden to convert MergeTree to replicated if the zookeeper path for this table already exists. [#63670](https://github.com/ClickHouse/ClickHouse/pull/63670) ([Kirill](https://github.com/kirillgarbar)).
* Read only the necessary columns from VIEW (new analyzer). Closes [#62594](https://github.com/ClickHouse/ClickHouse/issues/62594). [#63688](https://github.com/ClickHouse/ClickHouse/pull/63688) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix rare case with missing data in the result of distributed query. [#63691](https://github.com/ClickHouse/ClickHouse/pull/63691) ([vdimir](https://github.com/vdimir)).
* Fix [#63539](https://github.com/ClickHouse/ClickHouse/issues/63539). Forbid WINDOW redefinition in new analyzer. [#63694](https://github.com/ClickHouse/ClickHouse/pull/63694) ([Dmitry Novik](https://github.com/novikd)).
* Fix `flatten_nested` being broken with the Replicated database engine. [#63695](https://github.com/ClickHouse/ClickHouse/pull/63695) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `SIZES_OF_COLUMNS_DOESNT_MATCH` error for queries with `arrayJoin` function in `WHERE`. Fixes [#63653](https://github.com/ClickHouse/ClickHouse/issues/63653). [#63722](https://github.com/ClickHouse/ClickHouse/pull/63722) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `Not found column` and `CAST AS Map from array requires nested tuple of 2 elements` exceptions for distributed queries which use `Map(Nothing, Nothing)` type. Fixes [#63637](https://github.com/ClickHouse/ClickHouse/issues/63637). [#63753](https://github.com/ClickHouse/ClickHouse/pull/63753) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix possible `ILLEGAL_COLUMN` error in `partial_merge` join, close [#37928](https://github.com/ClickHouse/ClickHouse/issues/37928). [#63755](https://github.com/ClickHouse/ClickHouse/pull/63755) ([vdimir](https://github.com/vdimir)).
* `query_plan_remove_redundant_distinct` could break queries with window functions (when `allow_experimental_analyzer` is on). Fixes [#62820](https://github.com/ClickHouse/ClickHouse/issues/62820). [#63776](https://github.com/ClickHouse/ClickHouse/pull/63776) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix possible crash with SYSTEM UNLOAD PRIMARY KEY. [#63778](https://github.com/ClickHouse/ClickHouse/pull/63778) ([Raúl Marín](https://github.com/Algunenano)).
* Fix a query with a duplicating cycling alias. Fixes [#63320](https://github.com/ClickHouse/ClickHouse/issues/63320). [#63791](https://github.com/ClickHouse/ClickHouse/pull/63791) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed performance degradation of parsing data formats in INSERT query. This closes [#62918](https://github.com/ClickHouse/ClickHouse/issues/62918). This partially reverts [#42284](https://github.com/ClickHouse/ClickHouse/issues/42284), which breaks the original design and introduces more problems. [#63801](https://github.com/ClickHouse/ClickHouse/pull/63801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add 'endpoint_subpath' S3 URI setting to allow plain_rewritable disks to share the same endpoint. [#63806](https://github.com/ClickHouse/ClickHouse/pull/63806) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix queries using parallel read buffer (e.g. with max_download_thread > 0) getting stuck when threads cannot be allocated. [#63814](https://github.com/ClickHouse/ClickHouse/pull/63814) ([Antonio Andelic](https://github.com/antonio2368)).
* Allow JOIN filter push down to both streams if only single equivalent column is used in query. Closes [#63799](https://github.com/ClickHouse/ClickHouse/issues/63799). [#63819](https://github.com/ClickHouse/ClickHouse/pull/63819) ([Maksim Kita](https://github.com/kitaisreal)).
* Remove the data from all disks after DROP with the Lazy database engine. Without this change, orphaned data would remain on the disks. [#63848](https://github.com/ClickHouse/ClickHouse/pull/63848) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Fix incorrect select query result when parallel replicas were used to read from a Materialized View. [#63861](https://github.com/ClickHouse/ClickHouse/pull/63861) ([Nikita Taranov](https://github.com/nickitat)).
* Fixes in `find_super_nodes` and `find_big_family` command of keeper-client: - do not fail on ZNONODE errors - find super nodes inside super nodes - properly calculate subtree node count. [#63862](https://github.com/ClickHouse/ClickHouse/pull/63862) ([Alexander Gololobov](https://github.com/davenger)).
* Fix an error `Database name is empty` for remote queries with lambdas over a cluster with a modified default database. Fixes [#63471](https://github.com/ClickHouse/ClickHouse/issues/63471). [#63864](https://github.com/ClickHouse/ClickHouse/pull/63864) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix SIGSEGV due to CPU/Real (`query_profiler_real_time_period_ns`/`query_profiler_cpu_time_period_ns`) profiler (has been an issue since 2022, that leads to periodic server crashes, especially if you were using distributed engine). [#63865](https://github.com/ClickHouse/ClickHouse/pull/63865) ([Azat Khuzhin](https://github.com/azat)).
* Fixed `EXPLAIN CURRENT TRANSACTION` query. [#63926](https://github.com/ClickHouse/ClickHouse/pull/63926) ([Anton Popov](https://github.com/CurtizJ)).
* Fix analyzer: make the IN function with arbitrarily deep sub-selects in a materialized view use the insertion block. [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Allow `ALTER TABLE .. MODIFY|RESET SETTING` and `ALTER TABLE .. MODIFY COMMENT` for plain_rewritable disk. [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix Recursive CTE with distributed queries. Closes [#63790](https://github.com/ClickHouse/ClickHouse/issues/63790). [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix resolve of unqualified COLUMNS matcher. Preserve the input columns order and forbid usage of unknown identifiers. [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)).
* Fix the `Not found column` error for queries with `skip_unused_shards = 1`, `LIMIT BY`, and the new analyzer. Fixes [#63943](https://github.com/ClickHouse/ClickHouse/issues/63943). [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* (Low-quality third-party Kusto Query Language.) Resolve a client abort issue when using the KQL table function in interactive mode. [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)).
* Backported in [#64356](https://github.com/ClickHouse/ClickHouse/issues/64356): Fix an `Cyclic aliases` error for cyclic aliases of different type (expression and function). Fixes [#63205](https://github.com/ClickHouse/ClickHouse/issues/63205). [#63993](https://github.com/ClickHouse/ClickHouse/pull/63993) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Deserialize untrusted binary inputs in a safer way. [#64024](https://github.com/ClickHouse/ClickHouse/pull/64024) ([Robert Schulze](https://github.com/rschu1ze)).
* Do not throw `Storage doesn't support FINAL` error for remote queries over non-MergeTree tables with `final = true` and new analyzer. Fixes [#63960](https://github.com/ClickHouse/ClickHouse/issues/63960). [#64037](https://github.com/ClickHouse/ClickHouse/pull/64037) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add missing settings to recoverLostReplica. [#64040](https://github.com/ClickHouse/ClickHouse/pull/64040) ([Raúl Marín](https://github.com/Algunenano)).
* Fix unwind on SIGSEGV on aarch64 (due to small stack for signal). [#64058](https://github.com/ClickHouse/ClickHouse/pull/64058) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#64324](https://github.com/ClickHouse/ClickHouse/issues/64324): This fix uses a properly redefined context with the correct definer for each individual view in the query pipeline. Closes [#63777](https://github.com/ClickHouse/ClickHouse/issues/63777). [#64079](https://github.com/ClickHouse/ClickHouse/pull/64079) ([pufit](https://github.com/pufit)).
* Backported in [#64384](https://github.com/ClickHouse/ClickHouse/issues/64384): Fix analyzer: "Not found column" error is fixed when using INTERPOLATE. [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix azure backup writing multipart blocks as 1mb (read buffer size) instead of max_upload_part_size. [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#64541](https://github.com/ClickHouse/ClickHouse/issues/64541): Fix creating backups to S3 buckets with different credentials from the disk containing the file. [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)).
* Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#64332](https://github.com/ClickHouse/ClickHouse/issues/64332): The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)).
* Ignore `text_log` config when using Keeper. [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#64692](https://github.com/ClickHouse/ClickHouse/issues/64692): Fix Query Tree size validation. Closes [#63701](https://github.com/ClickHouse/ClickHouse/issues/63701). [#64377](https://github.com/ClickHouse/ClickHouse/pull/64377) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#64411](https://github.com/ClickHouse/ClickHouse/issues/64411): Fix `Logical error: Bad cast` for `Buffer` table with `PREWHERE`. Fixes [#64172](https://github.com/ClickHouse/ClickHouse/issues/64172). [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#64625](https://github.com/ClickHouse/ClickHouse/issues/64625): Fix an error `Cannot find column` in distributed queries with constant CTE in the `GROUP BY` key. [#64519](https://github.com/ClickHouse/ClickHouse/pull/64519) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#64682](https://github.com/ClickHouse/ClickHouse/issues/64682): Fix [#64612](https://github.com/ClickHouse/ClickHouse/issues/64612). Do not rewrite aggregation if `-If` combinator is already used. [#64638](https://github.com/ClickHouse/ClickHouse/pull/64638) ([Dmitry Novik](https://github.com/novikd)).
#### CI Fix or Improvement (changelog entry is not required)
* Implement cumulative A Sync status. [#61464](https://github.com/ClickHouse/ClickHouse/pull/61464) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add ability to run Azure tests in PR with label. [#63196](https://github.com/ClickHouse/ClickHouse/pull/63196) ([alesapin](https://github.com/alesapin)).
* Add azure run with msan. [#63238](https://github.com/ClickHouse/ClickHouse/pull/63238) ([alesapin](https://github.com/alesapin)).
* Improve cloud backport script. [#63282](https://github.com/ClickHouse/ClickHouse/pull/63282) ([Raúl Marín](https://github.com/Algunenano)).
* Use `/commit/` to have the URLs in [reports](https://play.clickhouse.com/play?user=play#c2VsZWN0IGRpc3RpbmN0IGNvbW1pdF91cmwgZnJvbSBjaGVja3Mgd2hlcmUgY2hlY2tfc3RhcnRfdGltZSA+PSBub3coKSAtIGludGVydmFsIDEgbW9udGggYW5kIHB1bGxfcmVxdWVzdF9udW1iZXI9NjA1MzI=) like https://github.com/ClickHouse/ClickHouse/commit/44f8bc5308b53797bec8cccc3bd29fab8a00235d and not like https://github.com/ClickHouse/ClickHouse/commits/44f8bc5308b53797bec8cccc3bd29fab8a00235d. [#63331](https://github.com/ClickHouse/ClickHouse/pull/63331) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Extra constraints for stress and fuzzer tests. [#63470](https://github.com/ClickHouse/ClickHouse/pull/63470) ([Raúl Marín](https://github.com/Algunenano)).
* Fix 02362_part_log_merge_algorithm flaky test. [#63635](https://github.com/ClickHouse/ClickHouse/pull/63635) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Fix test_odbc_interaction from aarch64 [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63787](https://github.com/ClickHouse/ClickHouse/pull/63787) ([alesapin](https://github.com/alesapin)).
* Fix test `test_catboost_evaluate` for aarch64. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63789](https://github.com/ClickHouse/ClickHouse/pull/63789) ([alesapin](https://github.com/alesapin)).
* Remove HDFS from disks config for one integration test for arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63832](https://github.com/ClickHouse/ClickHouse/pull/63832) ([alesapin](https://github.com/alesapin)).
* Bump version for old image in test_short_strings_aggregation to make it work on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63836](https://github.com/ClickHouse/ClickHouse/pull/63836) ([alesapin](https://github.com/alesapin)).
* Disable test `test_non_default_compression/test.py::test_preconfigured_deflateqpl_codec` on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63839](https://github.com/ClickHouse/ClickHouse/pull/63839) ([alesapin](https://github.com/alesapin)).
* Include checks like `Stateless tests (asan, distributed cache, meta storage in keeper, s3 storage) [2/3]` in `Mergeable Check` and `A Sync`. [#63945](https://github.com/ClickHouse/ClickHouse/pull/63945) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix 02124_insert_deduplication_token_multiple_blocks. [#63950](https://github.com/ClickHouse/ClickHouse/pull/63950) ([Han Fei](https://github.com/hanfei1991)).
* Add `ClickHouseVersion.copy` method. Create a branch release in advance without spinning out the release to increase the stability. [#64039](https://github.com/ClickHouse/ClickHouse/pull/64039) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* The mime type is not 100% reliable for Python and shell scripts without shebangs; add a check for file extension. [#64062](https://github.com/ClickHouse/ClickHouse/pull/64062) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add retries in git submodule update. [#64125](https://github.com/ClickHouse/ClickHouse/pull/64125) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC)
* Backported in [#64591](https://github.com/ClickHouse/ClickHouse/issues/64591): Disabled `enable_vertical_final` setting by default. This feature should not be used because it has a bug: [#64543](https://github.com/ClickHouse/ClickHouse/issues/64543). [#64544](https://github.com/ClickHouse/ClickHouse/pull/64544) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Do not remove server constants from GROUP BY key for secondary query."'. [#63297](https://github.com/ClickHouse/ClickHouse/pull/63297) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Introduce bulk loading to StorageEmbeddedRocksDB"'. [#63316](https://github.com/ClickHouse/ClickHouse/pull/63316) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Add tags for the test 03000_traverse_shadow_system_data_paths.sql to make it stable'. [#63366](https://github.com/ClickHouse/ClickHouse/pull/63366) ([Aleksei Filatov](https://github.com/aalexfvk)).
* NO CL ENTRY: 'Revert "Revert "Do not remove server constants from GROUP BY key for secondary query.""'. [#63415](https://github.com/ClickHouse/ClickHouse/pull/63415) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* NO CL ENTRY: 'Revert "Fix index analysis for `DateTime64`"'. [#63525](https://github.com/ClickHouse/ClickHouse/pull/63525) ([Raúl Marín](https://github.com/Algunenano)).
* NO CL ENTRY: 'Add `jwcrypto` to integration tests runner'. [#63551](https://github.com/ClickHouse/ClickHouse/pull/63551) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* NO CL ENTRY: 'Follow-up for the `binary_symbols` table in CI'. [#63802](https://github.com/ClickHouse/ClickHouse/pull/63802) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'chore(ci-workers): remove reusable from tailscale key'. [#63999](https://github.com/ClickHouse/ClickHouse/pull/63999) ([Gabriel Martinez](https://github.com/GMartinez-Sisti)).
* NO CL ENTRY: 'Revert "Update gui.md - Add ch-ui to open-source available tools."'. [#64064](https://github.com/ClickHouse/ClickHouse/pull/64064) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Prevent stack overflow in Fuzzer and Stress test'. [#64082](https://github.com/ClickHouse/ClickHouse/pull/64082) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Prevent conversion to Replicated if zookeeper path already exists"'. [#64214](https://github.com/ClickHouse/ClickHouse/pull/64214) ([Sergei Trifonov](https://github.com/serxa)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Remove http_max_chunk_size setting (too internal) [#60852](https://github.com/ClickHouse/ClickHouse/pull/60852) ([Azat Khuzhin](https://github.com/azat)).
* Fix race in refreshable materialized views causing SELECT to fail sometimes [#60883](https://github.com/ClickHouse/ClickHouse/pull/60883) ([Michael Kolupaev](https://github.com/al13n321)).
* Parallel replicas: table check failover [#61935](https://github.com/ClickHouse/ClickHouse/pull/61935) ([Igor Nikonov](https://github.com/devcrafter)).
* Avoid crashing on column type mismatch in a few dozen places [#62087](https://github.com/ClickHouse/ClickHouse/pull/62087) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix optimize_if_chain_to_multiif const NULL handling [#62104](https://github.com/ClickHouse/ClickHouse/pull/62104) ([Michael Kolupaev](https://github.com/al13n321)).
* Use intrusive lists for `ResourceRequest` instead of deque [#62165](https://github.com/ClickHouse/ClickHouse/pull/62165) ([Sergei Trifonov](https://github.com/serxa)).
* Analyzer: Fix validateAggregates for tables with different aliases [#62346](https://github.com/ClickHouse/ClickHouse/pull/62346) ([vdimir](https://github.com/vdimir)).
* Improve code and tests of `DROP` of multiple tables [#62359](https://github.com/ClickHouse/ClickHouse/pull/62359) ([zhongyuankai](https://github.com/zhongyuankai)).
* Fix exception message during writing to partitioned s3/hdfs/azure path with globs [#62423](https://github.com/ClickHouse/ClickHouse/pull/62423) ([Kruglov Pavel](https://github.com/Avogar)).
* Support UBSan on Clang-19 (master) [#62466](https://github.com/ClickHouse/ClickHouse/pull/62466) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Save the stacktrace of thread waiting on failing AsyncLoader job [#62719](https://github.com/ClickHouse/ClickHouse/pull/62719) ([Sergei Trifonov](https://github.com/serxa)).
* group_by_use_nulls strikes back [#62922](https://github.com/ClickHouse/ClickHouse/pull/62922) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Analyzer: prefer column name to alias from array join [#62995](https://github.com/ClickHouse/ClickHouse/pull/62995) ([vdimir](https://github.com/vdimir)).
* CI: try separate the workflows file for GitHub's Merge Queue [#63123](https://github.com/ClickHouse/ClickHouse/pull/63123) ([Max K.](https://github.com/maxknv)).
* Try to fix coverage tests [#63130](https://github.com/ClickHouse/ClickHouse/pull/63130) ([Raúl Marín](https://github.com/Algunenano)).
* Fix azure backup flaky test [#63158](https://github.com/ClickHouse/ClickHouse/pull/63158) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Merging [#60920](https://github.com/ClickHouse/ClickHouse/issues/60920) [#63159](https://github.com/ClickHouse/ClickHouse/pull/63159) ([vdimir](https://github.com/vdimir)).
* QueryAnalysisPass improve QUALIFY validation [#63162](https://github.com/ClickHouse/ClickHouse/pull/63162) ([Maksim Kita](https://github.com/kitaisreal)).
* Add numpy tests for different endianness [#63189](https://github.com/ClickHouse/ClickHouse/pull/63189) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fallback action-runner to autoupdate when it's unable to start [#63195](https://github.com/ClickHouse/ClickHouse/pull/63195) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix possible endless loop while reading from azure [#63197](https://github.com/ClickHouse/ClickHouse/pull/63197) ([Anton Popov](https://github.com/CurtizJ)).
* Add information about materialized view security bug fix into the changelog [#63204](https://github.com/ClickHouse/ClickHouse/pull/63204) ([pufit](https://github.com/pufit)).
* Disable one query from 02994_sanity_check_settings [#63208](https://github.com/ClickHouse/ClickHouse/pull/63208) ([Raúl Marín](https://github.com/Algunenano)).
* Enable custom parquet encoder by default, attempt 2 [#63210](https://github.com/ClickHouse/ClickHouse/pull/63210) ([Michael Kolupaev](https://github.com/al13n321)).
* Update version after release [#63215](https://github.com/ClickHouse/ClickHouse/pull/63215) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v24.4.1.2088-stable [#63217](https://github.com/ClickHouse/ClickHouse/pull/63217) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.3.3.102-lts [#63226](https://github.com/ClickHouse/ClickHouse/pull/63226) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.2.3.70-stable [#63227](https://github.com/ClickHouse/ClickHouse/pull/63227) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Return back [#61551](https://github.com/ClickHouse/ClickHouse/issues/61551) (More optimal loading of marks) [#63233](https://github.com/ClickHouse/ClickHouse/pull/63233) ([Anton Popov](https://github.com/CurtizJ)).
* Hide CI options under a spoiler [#63237](https://github.com/ClickHouse/ClickHouse/pull/63237) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Add `FROM` keyword to `TRUNCATE ALL TABLES` [#63241](https://github.com/ClickHouse/ClickHouse/pull/63241) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Minor follow-up to a renaming PR [#63260](https://github.com/ClickHouse/ClickHouse/pull/63260) ([Robert Schulze](https://github.com/rschu1ze)).
* More checks for concurrently deleted files and dirs in system.remote_data_paths [#63274](https://github.com/ClickHouse/ClickHouse/pull/63274) ([Alexander Gololobov](https://github.com/davenger)).
* Fix SettingsChangesHistory.h for allow_experimental_join_condition [#63278](https://github.com/ClickHouse/ClickHouse/pull/63278) ([Raúl Marín](https://github.com/Algunenano)).
* Update version_date.tsv and changelogs after v23.8.14.6-lts [#63285](https://github.com/ClickHouse/ClickHouse/pull/63285) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix azure flaky test [#63286](https://github.com/ClickHouse/ClickHouse/pull/63286) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix deadlock in `CacheDictionaryUpdateQueue` in case of exception in constructor [#63287](https://github.com/ClickHouse/ClickHouse/pull/63287) ([Nikita Taranov](https://github.com/nickitat)).
* DiskApp: fix 'list --recursive /' and crash on invalid arguments [#63296](https://github.com/ClickHouse/ClickHouse/pull/63296) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix terminate because of unhandled exception in `MergeTreeDeduplicationLog::shutdown` [#63298](https://github.com/ClickHouse/ClickHouse/pull/63298) ([Nikita Taranov](https://github.com/nickitat)).
* Move s3_plain_rewritable unit test to shell [#63317](https://github.com/ClickHouse/ClickHouse/pull/63317) ([Julia Kartseva](https://github.com/jkartseva)).
* Add tests for [#63264](https://github.com/ClickHouse/ClickHouse/issues/63264) [#63321](https://github.com/ClickHouse/ClickHouse/pull/63321) ([Raúl Marín](https://github.com/Algunenano)).
* Try fix segfault in `MergeTreeReadPoolBase::createTask` [#63323](https://github.com/ClickHouse/ClickHouse/pull/63323) ([Antonio Andelic](https://github.com/antonio2368)).
* Update README.md [#63326](https://github.com/ClickHouse/ClickHouse/pull/63326) ([Tyler Hannan](https://github.com/tylerhannan)).
* Skip unaccessible table dirs in system.remote_data_paths [#63330](https://github.com/ClickHouse/ClickHouse/pull/63330) ([Alexander Gololobov](https://github.com/davenger)).
* Add test for [#56287](https://github.com/ClickHouse/ClickHouse/issues/56287) [#63340](https://github.com/ClickHouse/ClickHouse/pull/63340) ([Raúl Marín](https://github.com/Algunenano)).
* Update README.md [#63350](https://github.com/ClickHouse/ClickHouse/pull/63350) ([Tyler Hannan](https://github.com/tylerhannan)).
* Add test for [#48049](https://github.com/ClickHouse/ClickHouse/issues/48049) [#63351](https://github.com/ClickHouse/ClickHouse/pull/63351) ([Raúl Marín](https://github.com/Algunenano)).
* Add option `query_id_prefix` to `clickhouse-benchmark` [#63352](https://github.com/ClickHouse/ClickHouse/pull/63352) ([Anton Popov](https://github.com/CurtizJ)).
* Rollback azurite to working version [#63354](https://github.com/ClickHouse/ClickHouse/pull/63354) ([alesapin](https://github.com/alesapin)).
* Randomize setting `enable_block_offset_column` in stress tests [#63355](https://github.com/ClickHouse/ClickHouse/pull/63355) ([Anton Popov](https://github.com/CurtizJ)).
* Fix AST parsing of invalid type names [#63357](https://github.com/ClickHouse/ClickHouse/pull/63357) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix some 00002_log_and_exception_messages_formatting flakiness [#63358](https://github.com/ClickHouse/ClickHouse/pull/63358) ([Michael Kolupaev](https://github.com/al13n321)).
* Add a test for [#55655](https://github.com/ClickHouse/ClickHouse/issues/55655) [#63380](https://github.com/ClickHouse/ClickHouse/pull/63380) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix data race in `reportBrokenPart` [#63396](https://github.com/ClickHouse/ClickHouse/pull/63396) ([Antonio Andelic](https://github.com/antonio2368)).
* Workaround for `oklch()` inside canvas bug for firefox [#63404](https://github.com/ClickHouse/ClickHouse/pull/63404) ([Sergei Trifonov](https://github.com/serxa)).
* Add test for issue [#47862](https://github.com/ClickHouse/ClickHouse/issues/47862) [#63424](https://github.com/ClickHouse/ClickHouse/pull/63424) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix parsing of `CREATE INDEX` query [#63425](https://github.com/ClickHouse/ClickHouse/pull/63425) ([Anton Popov](https://github.com/CurtizJ)).
* We are using Shared Catalog in the CI Logs cluster [#63442](https://github.com/ClickHouse/ClickHouse/pull/63442) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix collection of coverage data in the CI Logs cluster [#63453](https://github.com/ClickHouse/ClickHouse/pull/63453) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flaky test for rocksdb bulk sink [#63457](https://github.com/ClickHouse/ClickHouse/pull/63457) ([Duc Canh Le](https://github.com/canhld94)).
* io_uring: refactor get reader from context [#63475](https://github.com/ClickHouse/ClickHouse/pull/63475) ([Tomer Shafir](https://github.com/tomershafir)).
* Analyzer setting max_streams_to_max_threads_ratio overflow fix [#63478](https://github.com/ClickHouse/ClickHouse/pull/63478) ([Maksim Kita](https://github.com/kitaisreal)).
* Add setting for better rendering of multiline string for pretty format [#63479](https://github.com/ClickHouse/ClickHouse/pull/63479) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix logical error when reloading config with a custom-created web disk, broken after [#56367](https://github.com/ClickHouse/ClickHouse/issues/56367) [#63484](https://github.com/ClickHouse/ClickHouse/pull/63484) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add test for [#49307](https://github.com/ClickHouse/ClickHouse/issues/49307) [#63486](https://github.com/ClickHouse/ClickHouse/pull/63486) ([Anton Popov](https://github.com/CurtizJ)).
* Remove leftovers of GCC support in cmake rules [#63488](https://github.com/ClickHouse/ClickHouse/pull/63488) ([Azat Khuzhin](https://github.com/azat)).
* Fix ProfileEventTimeIncrement code [#63489](https://github.com/ClickHouse/ClickHouse/pull/63489) ([Azat Khuzhin](https://github.com/azat)).
* MergeTreePrefetchedReadPool: Print parent name when logging projection parts [#63522](https://github.com/ClickHouse/ClickHouse/pull/63522) ([Raúl Marín](https://github.com/Algunenano)).
* Correctly stop `asyncCopy` tasks in all cases [#63523](https://github.com/ClickHouse/ClickHouse/pull/63523) ([Antonio Andelic](https://github.com/antonio2368)).
* Almost everything should work on AArch64 (Part of [#58061](https://github.com/ClickHouse/ClickHouse/issues/58061)) [#63527](https://github.com/ClickHouse/ClickHouse/pull/63527) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update randomization of `old_parts_lifetime` [#63530](https://github.com/ClickHouse/ClickHouse/pull/63530) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update 02240_system_filesystem_cache_table.sh [#63531](https://github.com/ClickHouse/ClickHouse/pull/63531) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix data race in `DistributedSink` [#63538](https://github.com/ClickHouse/ClickHouse/pull/63538) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix azure tests run on master [#63540](https://github.com/ClickHouse/ClickHouse/pull/63540) ([alesapin](https://github.com/alesapin)).
* Find a proper commit for cumulative `A Sync` status [#63543](https://github.com/ClickHouse/ClickHouse/pull/63543) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add `no-s3-storage` tag to local_plain_rewritable ut [#63546](https://github.com/ClickHouse/ClickHouse/pull/63546) ([Julia Kartseva](https://github.com/jkartseva)).
* Go back to upstream lz4 submodule [#63574](https://github.com/ClickHouse/ClickHouse/pull/63574) ([Raúl Marín](https://github.com/Algunenano)).
* Fix logical error in ColumnTuple::tryInsert() [#63583](https://github.com/ClickHouse/ClickHouse/pull/63583) ([Michael Kolupaev](https://github.com/al13n321)).
* harmonize sumMap error messages on ILLEGAL_TYPE_OF_ARGUMENT [#63619](https://github.com/ClickHouse/ClickHouse/pull/63619) ([Yohann Jardin](https://github.com/yohannj)).
* Update README.md [#63631](https://github.com/ClickHouse/ClickHouse/pull/63631) ([Tyler Hannan](https://github.com/tylerhannan)).
* Ignore global profiler if system.trace_log is not enabled and fix really disable it for keeper standalone build [#63632](https://github.com/ClickHouse/ClickHouse/pull/63632) ([Azat Khuzhin](https://github.com/azat)).
* Fixes for 00002_log_and_exception_messages_formatting [#63634](https://github.com/ClickHouse/ClickHouse/pull/63634) ([Azat Khuzhin](https://github.com/azat)).
* Fix tests flakiness due to long SYSTEM FLUSH LOGS (explicitly specify old_parts_lifetime) [#63639](https://github.com/ClickHouse/ClickHouse/pull/63639) ([Azat Khuzhin](https://github.com/azat)).
* Update clickhouse-test help section [#63663](https://github.com/ClickHouse/ClickHouse/pull/63663) ([Ali](https://github.com/xogoodnow)).
* Fix bad test `02950_part_log_bytes_uncompressed` [#63672](https://github.com/ClickHouse/ClickHouse/pull/63672) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove leftovers of `optimize_monotonous_functions_in_order_by` [#63674](https://github.com/ClickHouse/ClickHouse/pull/63674) ([Nikita Taranov](https://github.com/nickitat)).
* tests: attempt to fix 02340_parts_refcnt_mergetree flakiness [#63684](https://github.com/ClickHouse/ClickHouse/pull/63684) ([Azat Khuzhin](https://github.com/azat)).
* Parallel replicas: simple cleanup [#63685](https://github.com/ClickHouse/ClickHouse/pull/63685) ([Igor Nikonov](https://github.com/devcrafter)).
* Cancel S3 reads properly when parallel reads are used [#63687](https://github.com/ClickHouse/ClickHouse/pull/63687) ([Antonio Andelic](https://github.com/antonio2368)).
* Explain map insertion order [#63690](https://github.com/ClickHouse/ClickHouse/pull/63690) ([Mark Needham](https://github.com/mneedham)).
* selectRangesToRead() simple cleanup [#63692](https://github.com/ClickHouse/ClickHouse/pull/63692) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix fuzzed analyzer_join_with_constant query [#63702](https://github.com/ClickHouse/ClickHouse/pull/63702) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add missing explicit instantiations of ColumnUnique [#63718](https://github.com/ClickHouse/ClickHouse/pull/63718) ([Raúl Marín](https://github.com/Algunenano)).
* Better asserts in ColumnString.h [#63719](https://github.com/ClickHouse/ClickHouse/pull/63719) ([Raúl Marín](https://github.com/Algunenano)).
* Don't randomize some settings in 02941_variant_type_* tests to avoid timeouts [#63721](https://github.com/ClickHouse/ClickHouse/pull/63721) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix flaky 03145_non_loaded_projection_backup.sh [#63728](https://github.com/ClickHouse/ClickHouse/pull/63728) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Userspace page cache: don't collect stats if cache is unused [#63730](https://github.com/ClickHouse/ClickHouse/pull/63730) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix insignificant UBSAN error in QueryAnalyzer::replaceNodesWithPositionalArguments() [#63734](https://github.com/ClickHouse/ClickHouse/pull/63734) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix a bug in resolving matcher inside lambda inside ARRAY JOIN [#63744](https://github.com/ClickHouse/ClickHouse/pull/63744) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Remove unused CaresPTRResolver::cancel_requests method [#63754](https://github.com/ClickHouse/ClickHouse/pull/63754) ([Arthur Passos](https://github.com/arthurpassos)).
* Do not hide disk name [#63756](https://github.com/ClickHouse/ClickHouse/pull/63756) ([Kseniia Sumarokova](https://github.com/kssenii)).
* CI: remove Cancel and Debug workflows as redundant [#63757](https://github.com/ClickHouse/ClickHouse/pull/63757) ([Max K.](https://github.com/maxknv)).
* Security Policy: Add notification process [#63773](https://github.com/ClickHouse/ClickHouse/pull/63773) ([Leticia Webb](https://github.com/leticiawebb)).
* Fix typo [#63774](https://github.com/ClickHouse/ClickHouse/pull/63774) ([Anton Popov](https://github.com/CurtizJ)).
* Fix fuzzer when only explicit faults are used [#63775](https://github.com/ClickHouse/ClickHouse/pull/63775) ([Raúl Marín](https://github.com/Algunenano)).
* Settings typo [#63782](https://github.com/ClickHouse/ClickHouse/pull/63782) ([Rory Crispin](https://github.com/RoryCrispin)).
* Changed the previous value of `output_format_pretty_preserve_border_for_multiline_string` setting [#63783](https://github.com/ClickHouse/ClickHouse/pull/63783) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* fix antlr insertStmt for issue 63657 [#63811](https://github.com/ClickHouse/ClickHouse/pull/63811) ([GG Bond](https://github.com/zzyReal666)).
* Fix race in `ReplicatedMergeTreeLogEntryData` [#63816](https://github.com/ClickHouse/ClickHouse/pull/63816) ([Antonio Andelic](https://github.com/antonio2368)).
* Allow allocation during job destructor in `ThreadPool` [#63829](https://github.com/ClickHouse/ClickHouse/pull/63829) ([Antonio Andelic](https://github.com/antonio2368)).
* io_uring: add basic io_uring clickhouse perf test [#63835](https://github.com/ClickHouse/ClickHouse/pull/63835) ([Tomer Shafir](https://github.com/tomershafir)).
* fix typo [#63838](https://github.com/ClickHouse/ClickHouse/pull/63838) ([Alexander Gololobov](https://github.com/davenger)).
* Remove unnecessary logging statements in MergeJoinTransform.cpp [#63860](https://github.com/ClickHouse/ClickHouse/pull/63860) ([vdimir](https://github.com/vdimir)).
* CI: disable ARM integration test cases with libunwind crash [#63867](https://github.com/ClickHouse/ClickHouse/pull/63867) ([Max K.](https://github.com/maxknv)).
* Fix some settings values in 02455_one_row_from_csv_memory_usage test to make it less flaky [#63874](https://github.com/ClickHouse/ClickHouse/pull/63874) ([Kruglov Pavel](https://github.com/Avogar)).
* Randomise `allow_experimental_parallel_reading_from_replicas` in stress tests [#63899](https://github.com/ClickHouse/ClickHouse/pull/63899) ([Nikita Taranov](https://github.com/nickitat)).
* Fix logs test for binary data by converting it to a valid UTF8 string. [#63909](https://github.com/ClickHouse/ClickHouse/pull/63909) ([Alexey Katsman](https://github.com/alexkats)).
* More sanity checks for parallel replicas [#63910](https://github.com/ClickHouse/ClickHouse/pull/63910) ([Nikita Taranov](https://github.com/nickitat)).
* Insignificant libunwind build fixes [#63946](https://github.com/ClickHouse/ClickHouse/pull/63946) ([Azat Khuzhin](https://github.com/azat)).
* Revert multiline pretty changes due to performance problems [#63947](https://github.com/ClickHouse/ClickHouse/pull/63947) ([Raúl Marín](https://github.com/Algunenano)).
* Some usability improvements for c++expr script [#63948](https://github.com/ClickHouse/ClickHouse/pull/63948) ([Azat Khuzhin](https://github.com/azat)).
* CI: aarch64: disable arm integration tests with kerberaized kafka [#63961](https://github.com/ClickHouse/ClickHouse/pull/63961) ([Max K.](https://github.com/maxknv)).
* Slightly better setting `force_optimize_projection_name` [#63997](https://github.com/ClickHouse/ClickHouse/pull/63997) ([Anton Popov](https://github.com/CurtizJ)).
* Better script to collect symbols statistics [#64013](https://github.com/ClickHouse/ClickHouse/pull/64013) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix a typo in Analyzer [#64022](https://github.com/ClickHouse/ClickHouse/pull/64022) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix libbcrypt for FreeBSD build [#64023](https://github.com/ClickHouse/ClickHouse/pull/64023) ([Azat Khuzhin](https://github.com/azat)).
* Fix searching for libclang_rt.builtins.*.a on FreeBSD [#64051](https://github.com/ClickHouse/ClickHouse/pull/64051) ([Azat Khuzhin](https://github.com/azat)).
* Fix waiting for mutations with retriable errors [#64063](https://github.com/ClickHouse/ClickHouse/pull/64063) ([Alexander Tokmakov](https://github.com/tavplubix)).
* harmonize h3PointDist* error messages [#64080](https://github.com/ClickHouse/ClickHouse/pull/64080) ([Yohann Jardin](https://github.com/yohannj)).
* This log message is better in Trace [#64081](https://github.com/ClickHouse/ClickHouse/pull/64081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* tests: fix expected error for 03036_reading_s3_archives (fixes CI) [#64089](https://github.com/ClickHouse/ClickHouse/pull/64089) ([Azat Khuzhin](https://github.com/azat)).
* Fix sanitizers [#64090](https://github.com/ClickHouse/ClickHouse/pull/64090) ([Azat Khuzhin](https://github.com/azat)).
* Update llvm/clang to 18.1.6 [#64091](https://github.com/ClickHouse/ClickHouse/pull/64091) ([Azat Khuzhin](https://github.com/azat)).
* CI: mergeable check redesign [#64093](https://github.com/ClickHouse/ClickHouse/pull/64093) ([Max K.](https://github.com/maxknv)).
* Move `isAllASCII` from UTFHelper to StringUtils [#64108](https://github.com/ClickHouse/ClickHouse/pull/64108) ([Robert Schulze](https://github.com/rschu1ze)).
* Clean up .clang-tidy after transition to Clang 18 [#64111](https://github.com/ClickHouse/ClickHouse/pull/64111) ([Robert Schulze](https://github.com/rschu1ze)).
* Ignore exception when checking for cgroupsv2 [#64118](https://github.com/ClickHouse/ClickHouse/pull/64118) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix UBSan error in negative positional arguments [#64127](https://github.com/ClickHouse/ClickHouse/pull/64127) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Syncing code [#64135](https://github.com/ClickHouse/ClickHouse/pull/64135) ([Antonio Andelic](https://github.com/antonio2368)).
* Loosen build resource limits for unusual architectures [#64152](https://github.com/ClickHouse/ClickHouse/pull/64152) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* fix clang tidy [#64179](https://github.com/ClickHouse/ClickHouse/pull/64179) ([Han Fei](https://github.com/hanfei1991)).
* Fix global query profiler [#64187](https://github.com/ClickHouse/ClickHouse/pull/64187) ([Azat Khuzhin](https://github.com/azat)).
* CI: cancel running PR wf after adding to MQ [#64188](https://github.com/ClickHouse/ClickHouse/pull/64188) ([Max K.](https://github.com/maxknv)).
* Add debug logging to EmbeddedRocksDBBulkSink [#64203](https://github.com/ClickHouse/ClickHouse/pull/64203) ([vdimir](https://github.com/vdimir)).
* Fix special builds (due to excessive resource usage - memory/CPU) [#64204](https://github.com/ClickHouse/ClickHouse/pull/64204) ([Azat Khuzhin](https://github.com/azat)).
* Add gh to style-check dockerfile [#64227](https://github.com/ClickHouse/ClickHouse/pull/64227) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Followup for [#63691](https://github.com/ClickHouse/ClickHouse/issues/63691) [#64285](https://github.com/ClickHouse/ClickHouse/pull/64285) ([vdimir](https://github.com/vdimir)).
* Rename allow_deprecated_functions to allow_deprecated_error_prone_win… [#64358](https://github.com/ClickHouse/ClickHouse/pull/64358) ([Raúl Marín](https://github.com/Algunenano)).
* Update description for settings `cross_join_min_rows_to_compress` and `cross_join_min_bytes_to_compress` [#64360](https://github.com/ClickHouse/ClickHouse/pull/64360) ([Nikita Fomichev](https://github.com/fm4v)).
* Rename aggregate_function_group_array_has_limit_size [#64362](https://github.com/ClickHouse/ClickHouse/pull/64362) ([Raúl Marín](https://github.com/Algunenano)).
* Split tests 03039_dynamic_all_merge_algorithms to avoid timeouts [#64363](https://github.com/ClickHouse/ClickHouse/pull/64363) ([Kruglov Pavel](https://github.com/Avogar)).
* Clean settings in 02943_variant_read_subcolumns test [#64437](https://github.com/ClickHouse/ClickHouse/pull/64437) ([Kruglov Pavel](https://github.com/Avogar)).
* CI: Critical bugfix category in PR template [#64480](https://github.com/ClickHouse/ClickHouse/pull/64480) ([Max K.](https://github.com/maxknv)).

View File

@ -7,6 +7,8 @@ sidebar_label: Configuration Files
# Configuration Files
The ClickHouse server can be configured with configuration files in XML or YAML syntax. In most installation types, the ClickHouse server runs with `/etc/clickhouse-server/config.xml` as default configuration file, but it is also possible to specify the location of the configuration file manually at server startup using command line option `--config-file=` or `-C`. Additional configuration files may be placed into directory `config.d/` relative to the main configuration file, for example into directory `/etc/clickhouse-server/config.d/`. Files in this directory and the main configuration are merged in a preprocessing step before the configuration is applied in ClickHouse server. Configuration files are merged in alphabetical order. To simplify updates and improve modularization, it is best practice to keep the default `config.xml` file unmodified and place additional customization into `config.d/`.
(The ClickHouse Keeper configuration lives in `/etc/clickhouse-keeper/keeper_config.xml`, so additional files need to be placed in `/etc/clickhouse-keeper/keeper_config.d/`.)
It is possible to mix XML and YAML configuration files; for example, you could have a main configuration file `config.xml` and additional configuration files `config.d/network.xml`, `config.d/timezone.yaml`, and `config.d/keeper.yaml`. Mixing XML and YAML within a single configuration file is not supported. XML configuration files should use `<clickhouse>...</clickhouse>` as the top-level tag. In YAML configuration files, `clickhouse:` is optional; the parser inserts it implicitly if absent.

View File

@ -1956,7 +1956,7 @@ Possible values:
- Positive integer.
- 0 — Asynchronous insertions are disabled.
Default value: `1000000`.
Default value: `10485760`.
### async_insert_max_query_number {#async-insert-max-query-number}

View File

@ -167,7 +167,7 @@ Performs the opposite operation of [hex](#hex). It interprets each pair of hexad
If you want to convert the result to a number, you can use the [reverse](../../sql-reference/functions/string-functions.md#reverse) and [reinterpretAs&lt;Type&gt;](../../sql-reference/functions/type-conversion-functions.md#type-conversion-functions) functions.
:::note
If `unhex` is invoked from within the `clickhouse-client`, binary strings display using UTF-8.
:::
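For example, a hexadecimal string can be decoded and reinterpreted as an integer in one expression (a minimal illustration; the hex literal is arbitrary):

```sql
SELECT reinterpretAsUInt64(reverse(unhex('FFF'))) AS num;
```

```response
4095
```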
@ -322,11 +322,11 @@ Alias: `UNBIN`.
For a numeric argument `unbin()` does not return the inverse of `bin()`. If you want to convert the result to a number, you can use the [reverse](../../sql-reference/functions/string-functions.md#reverse) and [reinterpretAs&lt;Type&gt;](../../sql-reference/functions/type-conversion-functions.md#reinterpretasuint8163264) functions.
:::note
If `unbin` is invoked from within the `clickhouse-client`, binary strings are displayed using UTF-8.
:::
Supports binary digits `0` and `1`. The number of binary digits does not have to be a multiple of eight. If the argument string contains anything other than binary digits, some implementation-defined result is returned (an exception isn't thrown).
**Arguments**
@ -482,7 +482,7 @@ mortonEncode(range_mask, args)
- `range_mask`: 1-8.
- `args`: up to 8 [unsigned integers](../data-types/int-uint.md) or columns of the aforementioned type.
Note: when using columns for `args` the provided `range_mask` tuple should still be a constant.
**Returned value**
@ -626,7 +626,7 @@ Result:
Accepts a range mask (tuple) as a first argument and the code as the second argument.
Each number in the mask configures the amount of range shrink:<br/>
1 - no shrink<br/>
2 - 2x shrink<br/>
3 - 3x shrink<br/>
...<br/>
Up to 8x shrink.<br/>
@ -701,6 +701,267 @@ Result:
1 2 3 4 5 6 7 8
```
## hilbertEncode
Calculates code for Hilbert Curve for a list of unsigned integers.
The function has two modes of operation:
- Simple
- Expanded
### Simple mode
Simple: accepts up to 2 unsigned integers as arguments and produces a UInt64 code.
**Syntax**
```sql
hilbertEncode(args)
```
**Parameters**
- `args`: up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Query:
```sql
SELECT hilbertEncode(3, 4);
```
Result:
```response
31
```
### Expanded mode
Accepts a range mask ([tuple](../../sql-reference/data-types/tuple.md)) as a first argument and up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) as other arguments.
Each number in the mask configures the number of bits by which the corresponding argument will be shifted left, effectively scaling the argument within its range.
**Syntax**
```sql
hilbertEncode(range_mask, args)
```
**Parameters**
- `range_mask`: ([tuple](../../sql-reference/data-types/tuple.md))
- `args`: up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
Note: when using columns for `args` the provided `range_mask` tuple should still be a constant.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality).
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
Query:
```sql
SELECT hilbertEncode((10,6), 1024, 16);
```
Result:
```response
4031541586602
```
Note: tuple size must be equal to the number of the other arguments.
**Example**
For a single argument without a tuple, the function returns the argument itself as the Hilbert index, since no dimensional mapping is needed.
Query:
```sql
SELECT hilbertEncode(1);
```
Result:
```response
1
```
**Example**
If a single argument is provided with a tuple specifying bit shifts, the function shifts the argument left by the specified number of bits.
Query:
```sql
SELECT hilbertEncode(tuple(2), 128);
```
Result:
```response
512
```
**Example**
The function also accepts columns as arguments:
Query:
First create the table and insert some data.
```sql
CREATE TABLE hilbert_numbers(
    n1 UInt32,
    n2 UInt32
)
ENGINE = MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
INSERT INTO hilbert_numbers (*) VALUES (1, 2);
```
Use column names instead of constants as function arguments to `hilbertEncode`.
Query:
```sql
SELECT hilbertEncode(n1, n2) FROM hilbert_numbers;
```
Result:
```response
13
```
**Implementation details**
Note that a Hilbert code can hold only as many bits of information as a [UInt64](../../sql-reference/data-types/int-uint.md) provides. With two arguments, each value has a maximum range of 2^32 (64/2 bits); any overflow is clamped to zero.
## hilbertDecode
Decodes a Hilbert curve index back into a tuple of unsigned integers, representing coordinates in multi-dimensional space.
As with the `hilbertEncode` function, this function has two modes of operation:
- Simple
- Expanded
### Simple mode
Accepts a tuple size (at most 2) as the first argument and a [UInt64](../../sql-reference/data-types/int-uint.md) code as the second argument, and produces a tuple of the corresponding coordinates.
**Syntax**
```sql
hilbertDecode(tuple_size, code)
```
**Parameters**
- `tuple_size`: integer value no more than 2.
- `code`: [UInt64](../../sql-reference/data-types/int-uint.md) code.
**Returned value**
- [tuple](../../sql-reference/data-types/tuple.md) of the specified size.
Type: [Tuple](../../sql-reference/data-types/tuple.md) of [UInt64](../../sql-reference/data-types/int-uint.md) values.
**Example**
Query:
```sql
SELECT hilbertDecode(2, 31);
```
Result:
```response
["3", "4"]
```
### Expanded mode
Accepts a range mask (tuple) as the first argument and the code ([UInt64](../../sql-reference/data-types/int-uint.md)) as the second argument.
Each number in the mask configures the number of bits by which the corresponding decoded component is shifted right, effectively scaling it back down within its range.
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality).
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
As with the encode function, this is limited to 2 numbers at most.
**Example**
For a tuple size of 1, the decoded value is always the code itself (returned as a one-element tuple).
Query:
```sql
SELECT hilbertDecode(1, 1);
```
Result:
```response
["1"]
```
**Example**
A single argument with a tuple specifying bit shifts will be right-shifted accordingly.
Query:
```sql
SELECT hilbertDecode(tuple(2), 32768);
```
Result:
```response
["128"]
```
**Example**
The function accepts a column of codes as a second argument:
First create the table and insert some data.
Query:
```sql
CREATE TABLE hilbert_numbers(
    n1 UInt32,
    n2 UInt32
)
ENGINE = MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
INSERT INTO hilbert_numbers (*) VALUES (1, 2);
```
Use column names instead of constants as function arguments to `hilbertDecode`.
Query:
```sql
SELECT untuple(hilbertDecode(2, hilbertEncode(n1, n2))) FROM hilbert_numbers;
```
Result:
```response
1 2
```

View File

@ -410,6 +410,10 @@ High compression levels are useful for asymmetric scenarios, like compress once,
- For compression, ZSTD_QAT tries to use an Intel® QAT offloading device ([QuickAssist Technology](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html)). If no such device was found, it will fallback to ZSTD compression in software.
- Decompression is always performed in software.
:::note
ZSTD_QAT is not available in ClickHouse Cloud.
:::
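For illustration, a minimal sketch of opting a column into this codec (the table name and compression level are arbitrary, and the `enable_zstd_qat_codec` setting is assumed to gate the codec, as with other specialized codecs):

```sql
SET enable_zstd_qat_codec = 1;

CREATE TABLE qat_example
(
    id UInt64,
    payload String CODEC(ZSTD_QAT(1))
)
ENGINE = MergeTree
ORDER BY id;
```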
#### DEFLATE_QPL
`DEFLATE_QPL` — [Deflate compression algorithm](https://github.com/intel/qpl) implemented by Intel® Query Processing Library. Some limitations apply:

View File

@ -154,7 +154,8 @@ function _clickhouse_quote()
# Extract every option (everything that starts with "-") from the --help dialog.
function _clickhouse_get_options()
{
"$@" --help 2>&1 | awk -F '[ ,=<>.]' '{ for (i=1; i <= NF; ++i) { if (substr($i, 1, 1) == "-" && length($i) > 1) print $i; } }' | sort -u
# By default --help will not print all settings, this is done only under --verbose
"$@" --help --verbose 2>&1 | awk -F '[ ,=<>.]' '{ for (i=1; i <= NF; ++i) { if (substr($i, 1, 1) == "-" && length($i) > 1) print $i; } }' | sort -u
}
function _complete_for_clickhouse_generic_bin_impl()

View File

@ -11,7 +11,6 @@ namespace DB
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int KEEPER_EXCEPTION;
}
bool LSCommand::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> & node, Expected & expected) const
@ -214,6 +213,143 @@ void GetStatCommand::execute(const ASTKeeperQuery * query, KeeperClient * client
std::cout << "numChildren = " << stat.numChildren << "\n";
}
namespace
{
/// Helper class for parallelized tree traversal
template <class UserCtx>
struct TraversalTask : public std::enable_shared_from_this<TraversalTask<UserCtx>>
{
using TraversalTaskPtr = std::shared_ptr<TraversalTask<UserCtx>>;
struct Ctx
{
std::deque<TraversalTaskPtr> new_tasks; /// Tasks for newly discovered children that haven't been started yet
std::deque<std::function<void(Ctx &)>> in_flight_list_requests; /// In-flight getChildren requests
std::deque<std::function<void(Ctx &)>> finish_callbacks; /// Callbacks that notify parent tasks about finished child subtrees
KeeperClient * client;
UserCtx & user_ctx;
Ctx(KeeperClient * client_, UserCtx & user_ctx_) : client(client_), user_ctx(user_ctx_) {}
};
private:
const fs::path path;
const TraversalTaskPtr parent;
Int64 child_tasks = 0;
Int64 nodes_in_subtree = 1;
public:
TraversalTask(const fs::path & path_, TraversalTaskPtr parent_)
: path(path_)
, parent(parent_)
{
}
/// Start traversing the subtree
void onStart(Ctx & ctx)
{
/// tryGetChildren doesn't throw if the node is not found (was deleted in the meantime)
std::shared_ptr<std::future<Coordination::ListResponse>> list_request =
std::make_shared<std::future<Coordination::ListResponse>>(ctx.client->zookeeper->asyncTryGetChildren(path));
ctx.in_flight_list_requests.push_back([task = this->shared_from_this(), list_request](Ctx & ctx_) mutable
{
task->onGetChildren(ctx_, list_request->get());
});
}
/// Called when getChildren request returns
void onGetChildren(Ctx & ctx, const Coordination::ListResponse & response)
{
const bool traverse_children = ctx.user_ctx.onListChildren(path, response.names);
if (traverse_children)
{
/// Schedule traversal of each child
for (const auto & child : response.names)
{
auto task = std::make_shared<TraversalTask>(path / child, this->shared_from_this());
ctx.new_tasks.push_back(task);
}
child_tasks = response.names.size();
}
if (child_tasks == 0)
finish(ctx);
}
/// Called when a child subtree has been traversed
void onChildTraversalFinished(Ctx & ctx, Int64 child_nodes_in_subtree)
{
nodes_in_subtree += child_nodes_in_subtree;
--child_tasks;
/// Finish if all children have been traversed
if (child_tasks == 0)
finish(ctx);
}
private:
/// This node and all its children have been traversed
void finish(Ctx & ctx)
{
ctx.user_ctx.onFinishChildrenTraversal(path, nodes_in_subtree);
if (!parent)
return;
/// Notify the parent that we have finished traversing the subtree
ctx.finish_callbacks.push_back([p = this->parent, child_nodes_in_subtree = this->nodes_in_subtree](Ctx & ctx_)
{
p->onChildTraversalFinished(ctx_, child_nodes_in_subtree);
});
}
};
/// Traverses the tree in parallel and calls user callbacks
/// Parallelization is achieved by sending multiple async getChildren requests to Keeper, but all processing is done in a single thread
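/// The UserCtx type must provide two callbacks:
///   bool onListChildren(const fs::path & path, const Strings & children) - called for each listed node; return true to descend into its children
///   void onFinishChildrenTraversal(const fs::path & path, Int64 nodes_in_subtree) - called once the whole subtree under `path` has been traversed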
template <class UserCtx>
void parallelized_traverse(const fs::path & path, KeeperClient * client, size_t max_in_flight_requests, UserCtx & ctx_)
{
typename TraversalTask<UserCtx>::Ctx ctx(client, ctx_);
auto root_task = std::make_shared<TraversalTask<UserCtx>>(path, nullptr);
ctx.new_tasks.push_back(root_task);
/// Loop while there is still work to do
while (!ctx.new_tasks.empty() || !ctx.in_flight_list_requests.empty() || !ctx.finish_callbacks.empty())
{
/// First process all finish callbacks, they don't wait for anything and allow to free memory
while (!ctx.finish_callbacks.empty())
{
auto callback = std::move(ctx.finish_callbacks.front());
ctx.finish_callbacks.pop_front();
callback(ctx);
}
/// Issue new requests while there are fewer than the maximum number in flight
while (!ctx.new_tasks.empty() && ctx.in_flight_list_requests.size() < max_in_flight_requests)
{
auto task = std::move(ctx.new_tasks.front());
ctx.new_tasks.pop_front();
task->onStart(ctx);
}
/// Wait for first request in the queue to finish
if (!ctx.in_flight_list_requests.empty())
{
auto request = std::move(ctx.in_flight_list_requests.front());
ctx.in_flight_list_requests.pop_front();
request(ctx);
}
}
}
} /// anonymous namespace
bool FindSuperNodes::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> & node, Expected & expected) const
{
ASTPtr threshold;
@ -237,27 +373,21 @@ void FindSuperNodes::execute(const ASTKeeperQuery * query, KeeperClient * client
auto threshold = query->args[0].safeGet<UInt64>();
auto path = client->getAbsolutePath(query->args[1].safeGet<String>());
Coordination::Stat stat;
if (!client->zookeeper->exists(path, &stat))
return; /// It is ok if node was deleted meanwhile
if (stat.numChildren >= static_cast<Int32>(threshold))
std::cout << static_cast<String>(path) << "\t" << stat.numChildren << "\n";
Strings children;
auto status = client->zookeeper->tryGetChildren(path, children);
if (status == Coordination::Error::ZNONODE)
return; /// It is ok if node was deleted meanwhile
else if (status != Coordination::Error::ZOK)
throw DB::Exception(DB::ErrorCodes::KEEPER_EXCEPTION, "Error {} while getting children of {}", status, path.string());
std::sort(children.begin(), children.end());
auto next_query = *query;
for (const auto & child : children)
struct
{
next_query.args[1] = DB::Field(path / child);
execute(&next_query, client);
}
bool onListChildren(const fs::path & path, const Strings & children) const
{
if (children.size() >= threshold)
std::cout << static_cast<String>(path) << "\t" << children.size() << "\n";
return true;
}
void onFinishChildrenTraversal(const fs::path &, Int64) const {}
size_t threshold;
} ctx {.threshold = threshold };
parallelized_traverse(path, client, /* max_in_flight_requests */ 50, ctx);
}
bool DeleteStaleBackups::parse(IParser::Pos & /* pos */, std::shared_ptr<ASTKeeperQuery> & /* node */, Expected & /* expected */) const
@ -322,38 +452,28 @@ bool FindBigFamily::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> &
return true;
}
/// DFS the subtree and return the number of nodes in the subtree
static Int64 traverse(const fs::path & path, KeeperClient * client, std::vector<std::tuple<Int64, String>> & result)
{
Int64 nodes_in_subtree = 1;
Strings children;
auto status = client->zookeeper->tryGetChildren(path, children);
if (status == Coordination::Error::ZNONODE)
return 0;
else if (status != Coordination::Error::ZOK)
throw DB::Exception(DB::ErrorCodes::KEEPER_EXCEPTION, "Error {} while getting children of {}", status, path.string());
for (auto & child : children)
nodes_in_subtree += traverse(path / child, client, result);
result.emplace_back(nodes_in_subtree, path.string());
return nodes_in_subtree;
}
void FindBigFamily::execute(const ASTKeeperQuery * query, KeeperClient * client) const
{
auto path = client->getAbsolutePath(query->args[0].safeGet<String>());
auto n = query->args[1].safeGet<UInt64>();
std::vector<std::tuple<Int64, String>> result;
struct
{
std::vector<std::tuple<Int64, String>> result;
traverse(path, client, result);
bool onListChildren(const fs::path &, const Strings &) const { return true; }
std::sort(result.begin(), result.end(), std::greater());
for (UInt64 i = 0; i < std::min(result.size(), static_cast<size_t>(n)); ++i)
std::cout << std::get<1>(result[i]) << "\t" << std::get<0>(result[i]) << "\n";
void onFinishChildrenTraversal(const fs::path & path, Int64 nodes_in_subtree)
{
result.emplace_back(nodes_in_subtree, path.string());
}
} ctx;
parallelized_traverse(path, client, /* max_in_flight_requests */ 50, ctx);
std::sort(ctx.result.begin(), ctx.result.end(), std::greater());
for (UInt64 i = 0; i < std::min(ctx.result.size(), static_cast<size_t>(n)); ++i)
std::cout << std::get<1>(ctx.result[i]) << "\t" << std::get<0>(ctx.result[i]) << "\n";
}
bool RMCommand::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> & node, Expected & expected) const

View File

@ -9,8 +9,6 @@ set (CLICKHOUSE_KEEPER_LINK
clickhouse_common_zookeeper
daemon
dbms
${LINK_RESOURCE_LIB}
)
clickhouse_program_add(keeper)
@ -210,8 +208,6 @@ if (BUILD_STANDALONE_KEEPER)
loggers_no_text_log
clickhouse_common_io
clickhouse_parsers # Otherwise compression will not built. FIXME.
${LINK_RESOURCE_LIB_STANDALONE_KEEPER}
)
set_target_properties(clickhouse-keeper PROPERTIES RUNTIME_OUTPUT_DIRECTORY ../)

View File

@ -14,8 +14,6 @@ set (CLICKHOUSE_SERVER_LINK
clickhouse_storages_system
clickhouse_table_functions
${LINK_RESOURCE_LIB}
PUBLIC
daemon
)

File diff suppressed because it is too large.

View File

@ -0,0 +1,157 @@
#pragma once
#include <Analyzer/HashUtils.h>
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/Resolve/IdentifierLookup.h>
#include <Core/Joins.h>
#include <Core/NamesAndTypes.h>
#include <Interpreters/Context_fwd.h>
#include <Parsers/NullsAction.h>
namespace DB
{
struct GetColumnsOptions;
struct IdentifierResolveScope;
struct AnalysisTableExpressionData;
class QueryExpressionsAliasVisitor;
class QueryNode;
class JoinNode;
class ColumnNode;
using ProjectionName = String;
using ProjectionNames = std::vector<ProjectionName>;
struct Settings;
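/// Resolves identifiers during query analysis: from compound expressions, expression arguments, table columns,
/// table expressions, JOIN and ARRAY JOIN trees, and storage. These helpers were split out of the query
/// analyzer; the matching declarations are removed from its header later in this diff.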
class IdentifierResolver
{
public:
IdentifierResolver(
std::unordered_set<std::string_view> & ctes_in_resolve_process_,
std::unordered_map<QueryTreeNodePtr, ProjectionName> & node_to_projection_name_)
: ctes_in_resolve_process(ctes_in_resolve_process_)
, node_to_projection_name(node_to_projection_name_)
{}
/// Utility functions
static bool isExpressionNodeType(QueryTreeNodeType node_type);
static bool isFunctionExpressionNodeType(QueryTreeNodeType node_type);
static bool isSubqueryNodeType(QueryTreeNodeType node_type);
static bool isTableExpressionNodeType(QueryTreeNodeType node_type);
static DataTypePtr getExpressionNodeResultTypeOrNull(const QueryTreeNodePtr & query_tree_node);
static void collectCompoundExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const DataTypePtr & compound_expression_type,
const Identifier & valid_identifier_prefix,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectTableExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const QueryTreeNodePtr & table_expression,
const AnalysisTableExpressionData & table_expression_data,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeWithParentScopesValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static std::vector<String> collectIdentifierTypoHints(const Identifier & unresolved_identifier, const std::unordered_set<Identifier> & valid_identifiers);
static QueryTreeNodePtr wrapExpressionNodeInTupleElement(QueryTreeNodePtr expression_node, IdentifierView nested_path, const ContextPtr & context);
static QueryTreeNodePtr convertJoinedColumnTypeToNullIfNeeded(
const QueryTreeNodePtr & resolved_identifier,
const JoinKind & join_kind,
std::optional<JoinTableSide> resolved_side,
IdentifierResolveScope & scope);
/// Resolve identifier functions
static QueryTreeNodePtr tryResolveTableIdentifierFromDatabaseCatalog(const Identifier & table_identifier, ContextPtr context);
QueryTreeNodePtr tryResolveIdentifierFromCompoundExpression(const Identifier & expression_identifier,
size_t identifier_bind_size,
const QueryTreeNodePtr & compound_expression,
String compound_expression_source,
IdentifierResolveScope & scope,
bool can_be_not_found = false);
QueryTreeNodePtr tryResolveIdentifierFromExpressionArguments(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToAliases(const IdentifierLookup & identifier_lookup, const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromTableColumns(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpressions(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr matchArrayJoinSubcolumns(
const QueryTreeNodePtr & array_join_column_inner_expression,
const ColumnNode & array_join_column_expression_typed,
const QueryTreeNodePtr & resolved_expression,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveExpressionFromArrayJoinExpressions(const QueryTreeNodePtr & resolved_expression,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromArrayJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTreeNode(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & join_tree_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTree(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromStorage(
const Identifier & identifier,
const QueryTreeNodePtr & table_expression_node,
const AnalysisTableExpressionData & table_expression_data,
IdentifierResolveScope & scope,
size_t identifier_column_qualifier_parts,
bool can_be_not_found = false);
/// CTEs that are currently in resolve process
std::unordered_set<std::string_view> & ctes_in_resolve_process;
/// Global expression node to projection name map
std::unordered_map<QueryTreeNodePtr, ProjectionName> & node_to_projection_name;
};
}

File diff suppressed because it is too large.

View File

@ -4,6 +4,7 @@
#include <Analyzer/HashUtils.h>
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/Resolve/IdentifierLookup.h>
#include <Analyzer/Resolve/IdentifierResolver.h>
#include <Core/Joins.h>
#include <Core/NamesAndTypes.h>
@ -121,16 +122,6 @@ public:
private:
/// Utility functions
static bool isExpressionNodeType(QueryTreeNodeType node_type);
static bool isFunctionExpressionNodeType(QueryTreeNodeType node_type);
static bool isSubqueryNodeType(QueryTreeNodeType node_type);
static bool isTableExpressionNodeType(QueryTreeNodeType node_type);
static DataTypePtr getExpressionNodeResultTypeOrNull(const QueryTreeNodePtr & query_tree_node);
static ProjectionName calculateFunctionProjectionName(const QueryTreeNodePtr & function_node,
const ProjectionNames & parameters_projection_names,
const ProjectionNames & arguments_projection_names);
@ -149,34 +140,6 @@ private:
const ProjectionName & fill_to_expression_projection_name,
const ProjectionName & fill_step_expression_projection_name);
static void collectCompoundExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const DataTypePtr & compound_expression_type,
const Identifier & valid_identifier_prefix,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectTableExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const QueryTreeNodePtr & table_expression,
const AnalysisTableExpressionData & table_expression_data,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeWithParentScopesValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static std::vector<String> collectIdentifierTypoHints(const Identifier & unresolved_identifier, const std::unordered_set<Identifier> & valid_identifiers);
static QueryTreeNodePtr wrapExpressionNodeInTupleElement(QueryTreeNodePtr expression_node, IdentifierView nested_path);
QueryTreeNodePtr tryGetLambdaFromSQLUserDefinedFunctions(const std::string & function_name, ContextPtr context);
void evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & query_tree_node, IdentifierResolveScope & scope);
@ -204,84 +167,18 @@ private:
static std::optional<JoinTableSide> getColumnSideFromJoinTree(const QueryTreeNodePtr & resolved_identifier, const JoinNode & join_node);
static QueryTreeNodePtr convertJoinedColumnTypeToNullIfNeeded(
const QueryTreeNodePtr & resolved_identifier,
const JoinKind & join_kind,
std::optional<JoinTableSide> resolved_side,
IdentifierResolveScope & scope);
/// Resolve identifier functions
static QueryTreeNodePtr tryResolveTableIdentifierFromDatabaseCatalog(const Identifier & table_identifier, ContextPtr context);
QueryTreeNodePtr tryResolveIdentifierFromCompoundExpression(const Identifier & expression_identifier,
size_t identifier_bind_size,
const QueryTreeNodePtr & compound_expression,
String compound_expression_source,
IdentifierResolveScope & scope,
bool can_be_not_found = false);
QueryTreeNodePtr tryResolveIdentifierFromExpressionArguments(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToAliases(const IdentifierLookup & identifier_lookup, const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromAliases(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope,
IdentifierResolveSettings identifier_resolve_settings);
QueryTreeNodePtr tryResolveIdentifierFromTableColumns(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpressions(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr matchArrayJoinSubcolumns(
const QueryTreeNodePtr & array_join_column_inner_expression,
const ColumnNode & array_join_column_expression_typed,
const QueryTreeNodePtr & resolved_expression,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveExpressionFromArrayJoinExpressions(const QueryTreeNodePtr & resolved_expression,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromArrayJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTreeNode(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & join_tree_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTree(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope);
IdentifierResolveResult tryResolveIdentifierInParentScopes(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
IdentifierResolveResult tryResolveIdentifier(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope,
IdentifierResolveSettings identifier_resolve_settings = {});
QueryTreeNodePtr tryResolveIdentifierFromStorage(
const Identifier & identifier,
const QueryTreeNodePtr & table_expression_node,
const AnalysisTableExpressionData & table_expression_data,
IdentifierResolveScope & scope,
size_t identifier_column_qualifier_parts,
bool can_be_not_found = false);
/// Resolve query tree nodes functions
void qualifyColumnNodesWithProjectionNames(const QueryTreeNodes & column_nodes,
@ -362,6 +259,8 @@ private:
/// Global expression node to projection name map
std::unordered_map<QueryTreeNodePtr, ProjectionName> node_to_projection_name;
IdentifierResolver identifier_resolver; // (ctes_in_resolve_process, node_to_projection_name);
/// Global resolve expression node to projection names map
std::unordered_map<QueryTreeNodePtr, ProjectionNames> resolved_expressions;

View File

@ -0,0 +1,71 @@
#pragma once
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/Utils.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
/// Used to replace columns whose type changed because of JOIN back to their original type
class ReplaceColumnsVisitor : public InDepthQueryTreeVisitor<ReplaceColumnsVisitor>
{
public:
explicit ReplaceColumnsVisitor(const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map_, const ContextPtr & context_)
: replacement_map(replacement_map_)
, context(context_)
{}
/// Apply the replacement transitively, because a column may change its type twice: once to get a supertype and then again because of `join_use_nulls`
static QueryTreeNodePtr findTransitiveReplacement(QueryTreeNodePtr node, const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map_)
{
auto it = replacement_map_.find(node);
QueryTreeNodePtr result_node = nullptr;
for (; it != replacement_map_.end(); it = replacement_map_.find(result_node))
{
if (result_node && result_node->isEqual(*it->second))
{
Strings map_dump;
for (const auto & [k, v]: replacement_map_)
map_dump.push_back(fmt::format("{} -> {} (is_equals: {}, is_same: {})",
k.node->dumpTree(), v->dumpTree(), k.node->isEqual(*v), k.node == v));
throw Exception(ErrorCodes::LOGICAL_ERROR, "Infinite loop in query tree replacement map: {}", fmt::join(map_dump, "; "));
}
chassert(it->second);
result_node = it->second;
}
return result_node;
}
void visitImpl(QueryTreeNodePtr & node)
{
if (auto replacement_node = findTransitiveReplacement(node, replacement_map))
node = replacement_node;
if (auto * function_node = node->as<FunctionNode>(); function_node && function_node->isResolved())
rerunFunctionResolve(function_node, context);
}
/// We want to re-run resolve for function _after_ its arguments are replaced
bool shouldTraverseTopToBottom() const { return false; }
bool needChildVisit(QueryTreeNodePtr & /* parent */, QueryTreeNodePtr & child)
{
/// Visit only expressions, but not subqueries
return child->getNodeType() == QueryTreeNodeType::IDENTIFIER
|| child->getNodeType() == QueryTreeNodeType::LIST
|| child->getNodeType() == QueryTreeNodeType::FUNCTION
|| child->getNodeType() == QueryTreeNodeType::COLUMN;
}
private:
const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map;
const ContextPtr & context;
};
}
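A minimal standalone sketch of the transitive-replacement lookup used by the visitor above, written against a plain std::unordered_map instead of the query-tree hash map (the names and the simplified loop detection are illustrative, not the actual analyzer API):

#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>

/// Follows node -> replacement chains until no further mapping exists.
/// Throws if a replacement maps back onto itself, mirroring the LOGICAL_ERROR above.
std::optional<std::string> findTransitiveReplacementSketch(
    const std::string & node,
    const std::unordered_map<std::string, std::string> & replacement_map)
{
    std::optional<std::string> result;
    auto it = replacement_map.find(node);
    while (it != replacement_map.end())
    {
        if (result && *result == it->second)
            throw std::runtime_error("Infinite loop in replacement map");
        result = it->second;
        it = replacement_map.find(*result);
    }
    return result;
}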

View File

@ -60,12 +60,9 @@ ColumnPtr IColumnDummy::filter(const Filter & filt, ssize_t /*result_size_hint*/
return cloneDummy(bytes);
}
void IColumnDummy::expand(const IColumn::Filter & mask, bool inverted)
void IColumnDummy::expand(const IColumn::Filter & mask, bool)
{
size_t bytes = countBytesInFilter(mask);
if (inverted)
bytes = mask.size() - bytes;
s = bytes;
s = mask.size();
}
ColumnPtr IColumnDummy::permute(const Permutation & perm, size_t limit) const

View File

@ -77,7 +77,7 @@ INSTANTIATE(IPv6)
#undef INSTANTIATE
template <bool inverted, bool column_is_short, typename Container>
template <bool inverted, typename Container>
static size_t extractMaskNumericImpl(
PaddedPODArray<UInt8> & mask,
const Container & data,
@ -85,42 +85,27 @@ static size_t extractMaskNumericImpl(
const PaddedPODArray<UInt8> * null_bytemap,
PaddedPODArray<UInt8> * nulls)
{
if constexpr (!column_is_short)
{
if (data.size() != mask.size())
throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of a full data column is not equal to the size of a mask");
}
if (data.size() != mask.size())
throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of a full data column is not equal to the size of a mask");
size_t ones_count = 0;
size_t data_index = 0;
size_t mask_size = mask.size();
size_t data_size = data.size();
for (size_t i = 0; i != mask_size && data_index != data_size; ++i)
for (size_t i = 0; i != mask_size; ++i)
{
// Change mask only where value is 1.
if (!mask[i])
continue;
UInt8 value;
size_t index;
if constexpr (column_is_short)
{
index = data_index;
++data_index;
}
else
index = i;
if (null_bytemap && (*null_bytemap)[index])
if (null_bytemap && (*null_bytemap)[i])
{
value = null_value;
if (nulls)
(*nulls)[i] = 1;
}
else
value = static_cast<bool>(data[index]);
value = static_cast<bool>(data[i]);
if constexpr (inverted)
value = !value;
@ -131,12 +116,6 @@ static size_t extractMaskNumericImpl(
mask[i] = value;
}
if constexpr (column_is_short)
{
if (data_index != data_size)
throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of a short column is not equal to the number of ones in a mask");
}
return ones_count;
}
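A simplified standalone sketch of the mask-extraction loop after the short-column path was removed: the data column must now match the mask size, and the mask is rewritten in place only where it is already set (names and types are illustrative):

#include <cstdint>
#include <stdexcept>
#include <vector>

/// Overwrites mask[i] with bool(data[i]) wherever mask[i] is already 1,
/// treating null rows as `null_value`; returns the number of resulting ones.
size_t extractMaskSketch(
    std::vector<uint8_t> & mask,
    const std::vector<int64_t> & data,
    const std::vector<uint8_t> * null_bytemap,
    uint8_t null_value,
    bool inverted)
{
    if (data.size() != mask.size())
        throw std::logic_error("Data size is not equal to mask size");

    size_t ones_count = 0;
    for (size_t i = 0; i != mask.size(); ++i)
    {
        if (!mask[i])
            continue;

        uint8_t value = (null_bytemap && (*null_bytemap)[i])
            ? null_value
            : static_cast<uint8_t>(data[i] != 0);
        if (inverted)
            value = !value;

        ones_count += value;
        mask[i] = value;
    }
    return ones_count;
}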
@ -155,10 +134,7 @@ static bool extractMaskNumeric(
const auto & data = numeric_column->getData();
size_t ones_count;
if (column->size() < mask.size())
ones_count = extractMaskNumericImpl<inverted, true>(mask, data, null_value, null_bytemap, nulls);
else
ones_count = extractMaskNumericImpl<inverted, false>(mask, data, null_value, null_bytemap, nulls);
ones_count = extractMaskNumericImpl<inverted>(mask, data, null_value, null_bytemap, nulls);
mask_info.has_ones = ones_count > 0;
mask_info.has_zeros = ones_count != mask.size();
@ -279,25 +255,32 @@ void maskedExecute(ColumnWithTypeAndName & column, const PaddedPODArray<UInt8> &
if (!column_function)
return;
size_t original_size = column.column->size();
ColumnWithTypeAndName result;
/// If mask contains only zeros, we can just create
/// an empty column with the execution result type.
if (!mask_info.has_ones)
{
/// If mask contains only zeros, we can just create a column with default values as it will be ignored
auto result_type = column_function->getResultType();
auto empty_column = result_type->createColumn();
result = {std::move(empty_column), result_type, ""};
auto default_column = result_type->createColumnConstWithDefaultValue(original_size)->convertToFullColumnIfConst();
column = {default_column, result_type, ""};
}
/// Filter column only if mask contains zeros.
else if (mask_info.has_zeros)
{
/// If the mask contains both zeros and ones, we execute the function only on the rows where the mask is set.
/// First we filter the lazily evaluated column, which creates a new column, then we execute (reduce) it, and finally we expand it back.
/// Expanding keeps function calls consistent (all columns of the same size); this is fine
/// because the values at the masked-out positions won't be used by `if`.
auto filtered = column_function->filter(mask, -1);
result = typeid_cast<const ColumnFunction *>(filtered.get())->reduce();
auto filter_after_execution = typeid_cast<const ColumnFunction *>(filtered.get())->reduce();
auto mut_column = IColumn::mutate(std::move(filter_after_execution.column));
mut_column->expand(mask, false);
column.column = std::move(mut_column);
}
else
result = column_function->reduce();
column = column_function->reduce();
column = std::move(result);
chassert(column.column->size() == original_size);
}
void executeColumnIfNeeded(ColumnWithTypeAndName & column, bool empty)
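A standalone sketch of the filter-then-expand pattern used in maskedExecute above, on plain vectors rather than ColumnFunction: the function is applied only to rows where the mask is 1, and the result is expanded back to the original size with defaults everywhere else (names are illustrative):

#include <cstdint>
#include <functional>
#include <vector>

/// Applies `func` only to the rows selected by `mask` and expands the result
/// back to mask.size() rows, putting a default value where mask[i] == 0.
std::vector<int64_t> maskedApplySketch(
    const std::vector<int64_t> & data,
    const std::vector<uint8_t> & mask,
    const std::function<int64_t(int64_t)> & func)
{
    /// Filter: compute results only for the selected rows.
    std::vector<int64_t> filtered;
    for (size_t i = 0; i < mask.size(); ++i)
        if (mask[i])
            filtered.push_back(func(data[i]));

    /// Expand: restore the original size, defaults where the mask is 0.
    std::vector<int64_t> result(mask.size(), 0);
    size_t filtered_index = 0;
    for (size_t i = 0; i < mask.size(); ++i)
        if (mask[i])
            result[i] = filtered[filtered_index++];
    return result;
}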

View File

@ -260,6 +260,8 @@ class IColumn;
M(Bool, force_primary_key, false, "Throw an exception if there is primary key in a table, and it is not used.", 0) \
M(Bool, use_skip_indexes, true, "Use data skipping indexes during query execution.", 0) \
M(Bool, use_skip_indexes_if_final, false, "If query has FINAL, then skipping data based on indexes may produce incorrect result, hence disabled by default.", 0) \
M(Bool, materialize_skip_indexes_on_insert, true, "If true, skip indexes are calculated on inserts; otherwise, skip indexes will be calculated only during merges", 0) \
M(Bool, materialize_statistics_on_insert, true, "If true, statistics are calculated on inserts; otherwise, statistics will be calculated only during merges", 0) \
M(String, ignore_data_skipping_indices, "", "Comma separated list of strings or literals with the name of the data skipping indices that should be excluded during query execution.", 0) \
\
M(String, force_data_skipping_indices, "", "Comma separated list of strings or literals with the name of the data skipping indices that should be used during query execution, otherwise an exception will be thrown.", 0) \

View File

@ -85,7 +85,9 @@ namespace SettingsChangesHistory
/// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972)
static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> settings_changes_history =
{
{"24.6", {{"input_format_parquet_use_native_reader", false, false, "When reading Parquet files, to use native reader instead of arrow reader."},
{"24.6", {{"materialize_skip_indexes_on_insert", true, true, "Added new setting to allow to disable materialization of skip indexes on insert"},
{"materialize_statistics_on_insert", true, true, "Added new setting to allow to disable materialization of statistics on insert"},
{"input_format_parquet_use_native_reader", false, false, "When reading Parquet files, to use native reader instead of arrow reader."},
{"hdfs_throw_on_zero_files_match", false, false, "Allow to throw an error when ListObjects request cannot match any files in HDFS engine instead of empty query result"},
{"azure_throw_on_zero_files_match", false, false, "Allow to throw an error when ListObjects request cannot match any files in AzureBlobStorage engine instead of empty query result"},
{"s3_validate_request_settings", true, true, "Allow to disable S3 request settings validation"},

View File

@ -450,7 +450,10 @@ MutableColumns CacheDictionary<dictionary_key_type>::aggregateColumnsInOrderOfKe
if (default_mask)
{
if (state.isDefault())
{
(*default_mask)[key_index] = 1;
aggregated_column->insertDefault();
}
else
{
(*default_mask)[key_index] = 0;
@ -536,7 +539,10 @@ MutableColumns CacheDictionary<dictionary_key_type>::aggregateColumns(
}
if (default_mask)
{
aggregated_column->insertDefault(); /// Any default is ok
(*default_mask)[key_index] = 1;
}
else
{
/// Insert default value

View File

@ -189,7 +189,6 @@ private:
const time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
size_t fetched_columns_index = 0;
size_t fetched_columns_index_without_default = 0;
size_t keys_size = keys.size();
PaddedPODArray<FetchedKey> fetched_keys;
@ -211,15 +210,10 @@ private:
result.expired_keys_size += static_cast<size_t>(key_state == KeyState::expired);
result.key_index_to_state[key_index] = {key_state,
default_mask ? fetched_columns_index_without_default : fetched_columns_index};
result.key_index_to_state[key_index] = {key_state, fetched_columns_index};
fetched_keys[fetched_columns_index] = FetchedKey(cell.element_index, cell.is_default);
++fetched_columns_index;
if (!cell.is_default)
++fetched_columns_index_without_default;
result.key_index_to_state[key_index].setDefaultValue(cell.is_default);
result.default_keys_size += cell.is_default;
}
@ -233,8 +227,7 @@ private:
auto & attribute = attributes[attribute_index];
auto & fetched_column = *result.fetched_columns[attribute_index];
fetched_column.reserve(default_mask ? fetched_columns_index_without_default :
fetched_columns_index);
fetched_column.reserve(fetched_columns_index);
if (!default_mask)
{
@ -689,7 +682,11 @@ private:
auto fetched_key = fetched_keys[fetched_key_index];
if (unlikely(fetched_key.is_default))
{
default_mask[fetched_key_index] = 1;
auto v = ValueType{};
value_setter(v);
}
else
{
default_mask[fetched_key_index] = 0;

View File

@ -174,6 +174,9 @@ Columns DirectDictionary<dictionary_key_type>::getColumns(
{
if (!mask_filled)
(*default_mask)[requested_key_index] = 1;
Field value{};
result_column->insert(value);
}
else
{

View File

@ -92,24 +92,20 @@ ColumnPtr FlatDictionary::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
ids,
[&](size_t, const Array & value, bool) { out->insert(value); },
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, ids, [&](size_t, const Array & value, bool) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
ids,
[&](size_t row, StringRef value, bool is_null)
@ -119,18 +115,15 @@ ColumnPtr FlatDictionary::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
ids,
[&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); },
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, ids, [&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); }, default_mask);
}
else
{
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
ids,
[&](size_t row, const auto value, bool is_null)
@ -140,17 +133,9 @@ ColumnPtr FlatDictionary::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
ids,
[&](size_t row, const auto value, bool) { out[row] = value; },
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, ids, [&](size_t row, const auto value, bool) { out[row] = value; }, default_mask);
}
if (attribute.is_nullable_set)
vec_null_map_to->resize(keys_found);
}
else
{
@ -643,11 +628,8 @@ void FlatDictionary::getItemsImpl(
}
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t FlatDictionary::getItemsShortCircuitImpl(
const Attribute & attribute,
const PaddedPODArray<UInt64> & keys,
ValueSetter && set_value,
IColumn::Filter & default_mask) const
void FlatDictionary::getItemsShortCircuitImpl(
const Attribute & attribute, const PaddedPODArray<UInt64> & keys, ValueSetter && set_value, IColumn::Filter & default_mask) const
{
const auto rows = keys.size();
default_mask.resize(rows);
@ -660,22 +642,23 @@ size_t FlatDictionary::getItemsShortCircuitImpl(
if (key < loaded_keys.size() && loaded_keys[key])
{
keys_found++;
default_mask[row] = 0;
if constexpr (is_nullable)
set_value(keys_found, container[key], attribute.is_nullable_set->find(key) != nullptr);
set_value(row, container[key], attribute.is_nullable_set->find(key) != nullptr);
else
set_value(keys_found, container[key], false);
++keys_found;
set_value(row, container[key], false);
}
else
{
default_mask[row] = 1;
set_value(row, AttributeType{}, true);
}
}
query_count.fetch_add(rows, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
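A condensed standalone sketch of the short-circuit contract these dictionary changes converge on: the setter is now called for every requested row (found or not), and default_mask marks the rows the caller must later overwrite with defaults, so the output stays aligned with the request instead of being compacted (types and names simplified, not the actual dictionary API):

#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

/// For each key: write the found value (mask = 0) or a placeholder (mask = 1).
size_t getItemsShortCircuitSketch(
    const std::unordered_map<uint64_t, int64_t> & storage,
    const std::vector<uint64_t> & keys,
    std::vector<uint8_t> & default_mask,
    const std::function<void(size_t row, int64_t value)> & set_value)
{
    default_mask.resize(keys.size());
    size_t keys_found = 0;

    for (size_t row = 0; row < keys.size(); ++row)
    {
        auto it = storage.find(keys[row]);
        if (it != storage.end())
        {
            default_mask[row] = 0;
            set_value(row, it->second);
            ++keys_found;
        }
        else
        {
            default_mask[row] = 1;
            set_value(row, int64_t{}); /// placeholder, replaced by the caller
        }
    }
    return keys_found;
}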
template <typename T>

View File

@ -166,11 +166,8 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t getItemsShortCircuitImpl(
const Attribute & attribute,
const PaddedPODArray<UInt64> & keys,
ValueSetter && set_value,
IColumn::Filter & default_mask) const;
void getItemsShortCircuitImpl(
const Attribute & attribute, const PaddedPODArray<UInt64> & keys, ValueSetter && set_value, IColumn::Filter & default_mask) const;
template <typename T>
void resize(Attribute & attribute, UInt64 key);

View File

@ -650,24 +650,20 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
keys_object,
[&](const size_t, const Array & value, bool) { out->insert(value); },
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, keys_object, [&](const size_t, const Array & value, bool) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
keys_object,
[&](size_t row, StringRef value, bool is_null)
@ -677,7 +673,7 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
keys_object,
[&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); },
@ -688,7 +684,7 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
keys_object,
[&](size_t row, const auto value, bool is_null)
@ -698,17 +694,9 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
keys_object,
[&](size_t row, const auto value, bool) { out[row] = value; },
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, keys_object, [&](size_t row, const auto value, bool) { out[row] = value; }, default_mask);
}
if (is_attribute_nullable)
vec_null_map_to->resize(keys_found);
}
else
{
@ -834,7 +822,7 @@ void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsImpl(
template <DictionaryKeyType dictionary_key_type, bool sharded>
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
@ -870,14 +858,16 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
++keys_found;
}
else
{
default_mask[key_index] = 1;
set_value(key_index, AttributeType{}, true);
}
keys_extractor.rollbackCurrentKey();
}
query_count.fetch_add(keys_size, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
template <DictionaryKeyType dictionary_key_type, bool sharded>
@ -929,7 +919,7 @@ void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsImpl(
template <DictionaryKeyType dictionary_key_type, bool sharded>
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
const Attribute & attribute,
const KeyIndexToElementIndex & key_index_to_element_index,
ValueSetter && set_value,
@ -938,7 +928,6 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
const auto & attribute_containers = std::get<AttributeContainerShardsType<AttributeType>>(attribute.containers);
const size_t keys_size = key_index_to_element_index.size();
size_t shard = 0;
size_t keys_found = 0;
for (size_t key_index = 0; key_index < keys_size; ++key_index)
{
@ -955,7 +944,6 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
if (element_index != -1)
{
keys_found++;
const auto & attribute_container = attribute_containers[shard];
size_t found_element_index = static_cast<size_t>(element_index);
@ -966,9 +954,11 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
else
set_value(key_index, element, false);
}
else
{
set_value(key_index, AttributeType{}, true);
}
}
return keys_found;
}
template <DictionaryKeyType dictionary_key_type, bool sharded>

View File

@ -228,7 +228,7 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t getItemsShortCircuitImpl(
void getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
@ -244,7 +244,7 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t getItemsShortCircuitImpl(
void getItemsShortCircuitImpl(
const Attribute & attribute,
const KeyIndexToElementIndex & key_index_to_element_index,
ValueSetter && set_value,

View File

@ -245,12 +245,12 @@ private:
ValueSetter && set_value,
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullSetter>
size_t getItemsShortCircuitImpl(
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullAndDefaultSetter>
void getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
NullSetter && set_null,
NullAndDefaultSetter && set_null_and_default,
IColumn::Filter & default_mask) const;
template <typename GetContainersFunc>
@ -428,17 +428,16 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
extractor,
[&](const size_t, const Array & value) { out->insert(value); },
[&](size_t) {},
[&](size_t) { out->insertDefault(); },
default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
@ -447,7 +446,7 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
if (is_attribute_nullable)
{
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
extractor,
[&](size_t row, StringRef value)
@ -463,11 +462,11 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
default_mask);
}
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
extractor,
[&](size_t, StringRef value) { out->insertData(value.data, value.size); },
[&](size_t) {},
[&](size_t) { out->insertDefault(); },
default_mask);
}
else
@ -475,7 +474,7 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
extractor,
[&](size_t row, const auto value)
@ -486,18 +485,9 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
[&](size_t row) { (*vec_null_map_to)[row] = true; },
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
extractor,
[&](size_t row, const auto value) { out[row] = value; },
[&](size_t) {},
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, extractor, [&](size_t row, const auto value) { out[row] = value; }, [&](size_t) {}, default_mask);
}
if (is_attribute_nullable)
vec_null_map_to->resize(keys_found);
}
else
{
@ -1112,12 +1102,12 @@ void HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsImpl(
}
template <DictionaryKeyType dictionary_key_type, bool sparse, bool sharded>
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullSetter>
size_t HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsShortCircuitImpl(
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullAndDefaultSetter>
void HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
NullSetter && set_null,
NullAndDefaultSetter && set_null_and_default,
IColumn::Filter & default_mask) const
{
const auto & attribute_containers = std::get<CollectionsHolder<AttributeType>>(attribute.containers);
@ -1143,20 +1133,22 @@ size_t HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsShortCirc
// Need to consider items in is_nullable_sets as well, see blockToAttributes()
else if (is_nullable && (*attribute.is_nullable_sets)[shard].find(key) != nullptr)
{
set_null(key_index);
set_null_and_default(key_index);
default_mask[key_index] = 0;
++keys_found;
}
else
{
set_null_and_default(key_index);
default_mask[key_index] = 1;
}
keys_extractor.rollbackCurrentKey();
}
query_count.fetch_add(keys_size, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
template <DictionaryKeyType dictionary_key_type, bool sparse, bool sharded>

View File

@ -249,39 +249,27 @@ ColumnPtr IPAddressDictionary::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType>(
attribute,
key_columns,
[&](const size_t, const Array & value) { out->insert(value); },
default_mask);
getItemsShortCircuitImpl<ValueType>(
attribute, key_columns, [&](const size_t, const Array & value) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType>(
attribute,
key_columns,
[&](const size_t, StringRef value) { out->insertData(value.data, value.size); },
default_mask);
getItemsShortCircuitImpl<ValueType>(
attribute, key_columns, [&](const size_t, StringRef value) { out->insertData(value.data, value.size); }, default_mask);
}
else
{
auto & out = column->getData();
keys_found = getItemsShortCircuitImpl<ValueType>(
attribute,
key_columns,
[&](const size_t row, const auto value) { return out[row] = value; },
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType>(
attribute, key_columns, [&](const size_t row, const auto value) { return out[row] = value; }, default_mask);
}
}
else
@ -783,7 +771,10 @@ size_t IPAddressDictionary::getItemsByTwoKeyColumnsShortCircuitImpl(
keys_found++;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
return keys_found;
}
@ -822,7 +813,10 @@ size_t IPAddressDictionary::getItemsByTwoKeyColumnsShortCircuitImpl(
keys_found++;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
return keys_found;
}
@ -893,11 +887,8 @@ void IPAddressDictionary::getItemsImpl(
}
template <typename AttributeType, typename ValueSetter>
size_t IPAddressDictionary::getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
ValueSetter && set_value,
IColumn::Filter & default_mask) const
void IPAddressDictionary::getItemsShortCircuitImpl(
const Attribute & attribute, const Columns & key_columns, ValueSetter && set_value, IColumn::Filter & default_mask) const
{
const auto & first_column = key_columns.front();
const size_t rows = first_column->size();
@ -909,7 +900,8 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
keys_found = getItemsByTwoKeyColumnsShortCircuitImpl<AttributeType>(
attribute, key_columns, std::forward<ValueSetter>(set_value), default_mask);
query_count.fetch_add(rows, std::memory_order_relaxed);
return keys_found;
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return;
}
auto & vec = std::get<ContainerType<AttributeType>>(attribute.maps);
@ -931,7 +923,10 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
default_mask[i] = 0;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
}
else if (type_id == TypeIndex::IPv6 || type_id == TypeIndex::FixedString)
@ -949,7 +944,10 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
default_mask[i] = 0;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
}
else
@ -957,7 +955,6 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
query_count.fetch_add(rows, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
template <typename T>

View File

@ -193,12 +193,9 @@ private:
ValueSetter && set_value,
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType,typename ValueSetter>
size_t getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
ValueSetter && set_value,
IColumn::Filter & default_mask) const;
template <typename AttributeType, typename ValueSetter>
void getItemsShortCircuitImpl(
const Attribute & attribute, const Columns & key_columns, ValueSetter && set_value, IColumn::Filter & default_mask) const;
template <typename T>
void setAttributeValueImpl(Attribute & attribute, const T value); /// NOLINT

View File

@ -475,7 +475,11 @@ void IPolygonDictionary::getItemsShortCircuitImpl(
default_mask[requested_key_index] = 0;
}
else
{
auto value = AttributeType{};
set_value(value);
default_mask[requested_key_index] = 1;
}
}
query_count.fetch_add(requested_key_size, std::memory_order_relaxed);

View File

@ -56,27 +56,20 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
modified_key_columns,
[&](size_t, const Array & value, bool)
{
out->insert(value);
},
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, modified_key_columns, [&](size_t, const Array & value, bool) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
modified_key_columns,
[&](size_t row, StringRef value, bool is_null)
@ -86,13 +79,10 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
modified_key_columns,
[&](size_t, StringRef value, bool)
{
out->insertData(value.data, value.size);
},
[&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); },
default_mask);
}
else
@ -100,7 +90,7 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
modified_key_columns,
[&](size_t row, const auto value, bool is_null)
@ -110,20 +100,9 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
modified_key_columns,
[&](size_t row, const auto value, bool)
{
out[row] = value;
},
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, modified_key_columns, [&](size_t row, const auto value, bool) { out[row] = value; }, default_mask);
}
if (is_attribute_nullable)
vec_null_map_to->resize(keys_found);
}
else
{

View File

@ -245,7 +245,7 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename ValueType, bool is_nullable>
size_t getItemsShortCircuitImpl(
void getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
ValueSetterFunc<ValueType> && set_value,

View File

@ -1,7 +1,7 @@
#include <Dictionaries/RangeHashedDictionary.h>
#define INSTANTIATE_GET_ITEMS_SHORT_CIRCUIT_IMPL(DictionaryKeyType, IsNullable, ValueType) \
template size_t RangeHashedDictionary<DictionaryKeyType>::getItemsShortCircuitImpl<ValueType, IsNullable>( \
template void RangeHashedDictionary<DictionaryKeyType>::getItemsShortCircuitImpl<ValueType, IsNullable>( \
const Attribute & attribute, \
const Columns & key_columns, \
typename RangeHashedDictionary<DictionaryKeyType>::ValueSetterFunc<ValueType> && set_value, \
@ -18,7 +18,7 @@ namespace DB
template <DictionaryKeyType dictionary_key_type>
template <typename ValueType, bool is_nullable>
size_t RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
void RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
typename RangeHashedDictionary<dictionary_key_type>::ValueSetterFunc<ValueType> && set_value,
@ -120,6 +120,7 @@ size_t RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
}
default_mask[key_index] = 1;
set_value(key_index, ValueType{}, true);
keys_extractor.rollbackCurrentKey();
}
@ -127,6 +128,5 @@ size_t RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
query_count.fetch_add(keys_size, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
}

View File

@ -807,6 +807,7 @@ std::unordered_map<String, ColumnPtr> RegExpTreeDictionary::match(
if (attributes_to_set.contains(name_))
continue;
columns[name_]->insertDefault();
default_mask.value().get()[key_idx] = 1;
}

View File

@ -14,6 +14,7 @@ namespace DB
namespace ErrorCodes
{
extern const int ILLEGAL_COLUMN;
extern const int LOGICAL_ERROR;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int SIZES_OF_ARRAYS_DONT_MATCH;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
@ -298,4 +299,27 @@ bool isDecimalOrNullableDecimal(const DataTypePtr & type)
return isDecimal(assert_cast<const DataTypeNullable *>(type.get())->getNestedType());
}
/// Note that, for historical reasons, most functions use the size of the first argument to determine the
/// size of all the columns. When short-circuit optimization was introduced, `input_rows_count` was also added for
/// all functions, but many of them have not been adjusted yet.
void checkFunctionArgumentSizes(const ColumnsWithTypeAndName & arguments, size_t input_rows_count)
{
for (size_t i = 0; i < arguments.size(); i++)
{
if (isColumnConst(*arguments[i].column))
continue;
size_t current_size = arguments[i].column->size();
if (current_size != input_rows_count)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Expected the argument nº#{} ('{}' of type {}) to have {} rows, but it has {}",
i + 1,
arguments[i].name,
arguments[i].type->getName(),
input_rows_count,
current_size);
}
}
}
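A simplified standalone sketch of the invariant enforced above: every non-constant argument column must have exactly input_rows_count rows, while constant columns are exempt (struct and names simplified for illustration):

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct ArgumentSketch
{
    std::string name;
    size_t rows;
    bool is_const;
};

/// Throws if any non-constant argument disagrees with the expected row count.
void checkArgumentSizesSketch(const std::vector<ArgumentSketch> & arguments, size_t input_rows_count)
{
    for (size_t i = 0; i < arguments.size(); ++i)
    {
        if (arguments[i].is_const)
            continue;
        if (arguments[i].rows != input_rows_count)
            throw std::logic_error(
                "Argument #" + std::to_string(i + 1) + " ('" + arguments[i].name + "') has "
                + std::to_string(arguments[i].rows) + " rows, expected " + std::to_string(input_rows_count));
    }
}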

View File

@ -197,4 +197,6 @@ struct NullPresence
NullPresence getNullPresense(const ColumnsWithTypeAndName & args);
bool isDecimalOrNullableDecimal(const DataTypePtr & type);
void checkFunctionArgumentSizes(const ColumnsWithTypeAndName & arguments, size_t input_rows_count);
}

View File

@ -0,0 +1,142 @@
#pragma once
#include <Functions/IFunction.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Functions/FunctionHelpers.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
extern const int ILLEGAL_COLUMN;
}
class FunctionSpaceFillingCurveEncode: public IFunction
{
public:
bool isVariadic() const override
{
return true;
}
size_t getNumberOfArguments() const override
{
return 0;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DB::DataTypes & arguments) const override
{
size_t vector_start_index = 0;
if (arguments.empty())
throw Exception(ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION,
"At least one UInt argument is required for function {}",
getName());
if (WhichDataType(arguments[0]).isTuple())
{
vector_start_index = 1;
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].get());
auto tuple_size = type_tuple->getElements().size();
if (tuple_size != (arguments.size() - 1))
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} for function {}, tuple size should be equal to number of UInt arguments",
arguments[0]->getName(), getName());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
}
}
for (size_t i = vector_start_index; i < arguments.size(); i++)
{
const auto & arg = arguments[i];
if (!WhichDataType(arg).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument for function {}, should be a native UInt",
arg->getName(), getName());
}
return std::make_shared<DataTypeUInt64>();
}
};
template <UInt8 max_dimensions, UInt8 min_ratio, UInt8 max_ratio>
class FunctionSpaceFillingCurveDecode: public IFunction
{
public:
size_t getNumberOfArguments() const override
{
return 2;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
UInt64 tuple_size = 0;
const auto * col_const = typeid_cast<const ColumnConst *>(arguments[0].column.get());
if (!col_const)
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} for function {}, should be a constant (UInt or Tuple)",
arguments[0].type->getName(), getName());
if (!WhichDataType(arguments[1].type).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} for function {}, should be a native UInt",
arguments[1].type->getName(), getName());
const auto * mask = typeid_cast<const ColumnTuple *>(col_const->getDataColumnPtr().get());
if (mask)
{
tuple_size = mask->tupleSize();
}
else if (WhichDataType(arguments[0].type).isNativeUInt())
{
tuple_size = col_const->getUInt(0);
}
else
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} for function {}, should be UInt or Tuple",
arguments[0].type->getName(), getName());
if (tuple_size > max_dimensions || tuple_size < 1)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal first argument for function {}, should be a number in range 1-{} or a Tuple of such size",
getName(), String{max_dimensions});
if (mask)
{
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].type.get());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
auto ratio = mask->getColumn(i).getUInt(0);
if (ratio > max_ratio || ratio < min_ratio)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} in tuple for function {}, should be a number in range {}-{}",
ratio, getName(), String{min_ratio}, String{max_ratio});
}
}
DataTypes types(tuple_size);
for (size_t i = 0; i < tuple_size; i++)
{
types[i] = std::make_shared<DataTypeUInt64>();
}
return std::make_shared<DataTypeTuple>(types);
}
};
}

View File

@ -47,7 +47,6 @@ namespace ErrorCodes
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ILLEGAL_COLUMN;
extern const int TYPE_MISMATCH;
extern const int LOGICAL_ERROR;
}
@ -655,18 +654,6 @@ private:
result_column = if_func->build(if_args)->execute(if_args, result_type, rows);
}
#ifdef ABORT_ON_LOGICAL_ERROR
void validateShortCircuitResult(const ColumnPtr & column, const IColumn::Filter & filter) const
{
size_t expected_size = filter.size() - countBytesInFilter(filter);
size_t col_size = column->size();
if (col_size != expected_size)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Invalid size of getColumnsOrDefaultShortCircuit result. Column has {} rows, but filter contains {} bytes.",
col_size, expected_size);
}
#endif
ColumnPtr executeDictionaryRequest(
std::shared_ptr<const IDictionary> & dictionary,
@ -696,11 +683,6 @@ private:
IColumn::Filter default_mask;
result_columns = dictionary->getColumns(attribute_names, attribute_tuple_type.getElements(), key_columns, key_types, default_mask);
#ifdef ABORT_ON_LOGICAL_ERROR
for (const auto & column : result_columns)
validateShortCircuitResult(column, default_mask);
#endif
auto [defaults_column, mask_column] =
getDefaultsShortCircuit(std::move(default_mask), result_type, last_argument);
@ -736,10 +718,6 @@ private:
IColumn::Filter default_mask;
result = dictionary->getColumn(attribute_names[0], attribute_type, key_columns, key_types, default_mask);
#ifdef ABORT_ON_LOGICAL_ERROR
validateShortCircuitResult(result, default_mask);
#endif
auto [defaults_column, mask_column] =
getDefaultsShortCircuit(std::move(default_mask), result_type, last_argument);

View File

@ -440,9 +440,6 @@ void NO_INLINE conditional(SourceA && src_a, SourceB && src_b, Sink && sink, con
const UInt8 * cond_pos = condition.data();
const UInt8 * cond_end = cond_pos + condition.size();
bool a_is_short = src_a.getColumnSize() < condition.size();
bool b_is_short = src_b.getColumnSize() < condition.size();
while (cond_pos < cond_end)
{
if (*cond_pos)
@ -450,10 +447,8 @@ void NO_INLINE conditional(SourceA && src_a, SourceB && src_b, Sink && sink, con
else
writeSlice(src_b.getWhole(), sink);
if (!a_is_short || *cond_pos)
src_a.next();
if (!b_is_short || !*cond_pos)
src_b.next();
src_a.next();
src_b.next();
++cond_pos;
sink.next();

View File

@ -110,7 +110,6 @@ void convertLowCardinalityColumnsToFull(ColumnsWithTypeAndName & args)
column.type = recursiveRemoveLowCardinality(column.type);
}
}
}
ColumnPtr IExecutableFunction::defaultImplementationForConstantArguments(
@ -277,6 +276,7 @@ ColumnPtr IExecutableFunction::executeWithoutSparseColumns(const ColumnsWithType
size_t new_input_rows_count = columns_without_low_cardinality.empty()
? input_rows_count
: columns_without_low_cardinality.front().column->size();
checkFunctionArgumentSizes(columns_without_low_cardinality, new_input_rows_count);
auto res = executeWithoutLowCardinalityColumns(columns_without_low_cardinality, dictionary_type, new_input_rows_count, dry_run);
bool res_is_constant = isColumnConst(*res);
@ -311,6 +311,8 @@ ColumnPtr IExecutableFunction::executeWithoutSparseColumns(const ColumnsWithType
ColumnPtr IExecutableFunction::execute(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, bool dry_run) const
{
checkFunctionArgumentSizes(arguments, input_rows_count);
bool use_default_implementation_for_sparse_columns = useDefaultImplementationForSparseColumns();
/// DataTypeFunction does not support obtaining default (isDefaultAt())
/// ColumnFunction does not support getting specific values.

View File

@ -3,11 +3,12 @@
#include <Core/ColumnNumbers.h>
#include <Core/ColumnsWithTypeAndName.h>
#include <Core/Field.h>
#include <Core/ValuesWithType.h>
#include <Core/Names.h>
#include <Core/IResolvedFunction.h>
#include <Common/Exception.h>
#include <Core/Names.h>
#include <Core/ValuesWithType.h>
#include <DataTypes/IDataType.h>
#include <Functions/FunctionHelpers.h>
#include <Common/Exception.h>
#include "config.h"
@ -133,8 +134,12 @@ public:
~IFunctionBase() override = default;
virtual ColumnPtr execute( /// NOLINT
const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, bool dry_run = false) const
const ColumnsWithTypeAndName & arguments,
const DataTypePtr & result_type,
size_t input_rows_count,
bool dry_run = false) const
{
checkFunctionArgumentSizes(arguments, input_rows_count);
return prepare(arguments)->execute(arguments, result_type, input_rows_count, dry_run);
}

View File

@ -18,11 +18,13 @@ protected:
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const final
{
checkFunctionArgumentSizes(arguments, input_rows_count);
return function->executeImpl(arguments, result_type, input_rows_count);
}
ColumnPtr executeDryRunImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const final
{
checkFunctionArgumentSizes(arguments, input_rows_count);
return function->executeImplDryRun(arguments, result_type, input_rows_count);
}

View File

@ -205,13 +205,13 @@ private:
return 4;
}
/// Cast content from integer to string, and append result string to buffer.
/// Make sure digits number in result string is no less than total_digits by padding leading '0'
/// Casts val from integer to string, then appends result string to buffer.
/// Makes sure digits number in result string is no less than min_digits by padding leading '0'.
/// Notice: '-' is not counted as digit.
/// For example:
/// val = -123, total_digits = 2 => dest = "-123"
/// val = -123, total_digits = 3 => dest = "-123"
/// val = -123, total_digits = 4 => dest = "-0123"
/// val = -123, min_digits = 2 => dest = "-123"
/// val = -123, min_digits = 3 => dest = "-123"
/// val = -123, min_digits = 4 => dest = "-0123"
static size_t writeNumberWithPadding(char * dest, std::integral auto val, size_t min_digits)
{
using T = decltype(val);
@ -226,9 +226,10 @@ private:
++digits;
}
/// Possible sign
size_t pos = 0;
n = val;
/// Possible sign
if constexpr (is_signed_v<T>)
if (val < 0)
{
@ -245,16 +246,17 @@ private:
}
/// Digits
size_t digits_written = 0;
while (w >= 100)
{
w /= 100;
writeNumber2(dest + pos, n / w);
pos += 2;
digits_written += 2;
n = n % w;
}
if (n)
if (digits_written < digits)
{
dest[pos] = '0' + n;
++pos;

View File

@ -0,0 +1,124 @@
#include <Common/BitHelpers.h>
#include <Functions/FunctionFactory.h>
#include <Functions/PerformanceAdaptors.h>
#include "hilbertDecode2DLUT.h"
#include <limits>
namespace DB
{
class FunctionHilbertDecode : public FunctionSpaceFillingCurveDecode<2, 0, 32>
{
public:
static constexpr auto name = "hilbertDecode";
static FunctionPtr create(ContextPtr)
{
return std::make_shared<FunctionHilbertDecode>();
}
String getName() const override { return name; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
size_t num_dimensions;
const auto * col_const = typeid_cast<const ColumnConst *>(arguments[0].column.get());
const auto * mask = typeid_cast<const ColumnTuple *>(col_const->getDataColumnPtr().get());
if (mask)
num_dimensions = mask->tupleSize();
else
num_dimensions = col_const->getUInt(0);
const ColumnPtr & col_code = arguments[1].column;
Columns tuple_columns(num_dimensions);
const auto shrink = [mask](const UInt64 value, const UInt8 column_num)
{
if (mask)
return value >> mask->getColumn(column_num).getUInt(0);
return value;
};
auto col0 = ColumnUInt64::create();
auto & vec0 = col0->getData();
vec0.resize(input_rows_count);
if (num_dimensions == 1)
{
for (size_t i = 0; i < input_rows_count; i++)
{
vec0[i] = shrink(col_code->getUInt(i), 0);
}
tuple_columns[0] = std::move(col0);
return ColumnTuple::create(tuple_columns);
}
auto col1 = ColumnUInt64::create();
auto & vec1 = col1->getData();
vec1.resize(input_rows_count);
if (num_dimensions == 2)
{
for (size_t i = 0; i < input_rows_count; i++)
{
const auto res = FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(col_code->getUInt(i));
vec0[i] = shrink(std::get<0>(res), 0);
vec1[i] = shrink(std::get<1>(res), 1);
}
tuple_columns[0] = std::move(col0);
tuple_columns[1] = std::move(col1);
return ColumnTuple::create(tuple_columns);
}
return ColumnTuple::create(tuple_columns);
}
};
REGISTER_FUNCTION(HilbertDecode)
{
factory.registerFunction<FunctionHilbertDecode>(FunctionDocumentation{
.description=R"(
Decodes a Hilbert curve index back into a tuple of unsigned integers, representing coordinates in multi-dimensional space.
The function has two modes of operation:
- Simple
- Expanded
Simple Mode: Accepts the desired tuple size as the first argument (up to 2) and the Hilbert index as the second argument. This mode decodes the index into a tuple of the specified size.
[example:simple]
Will decode into: `(8, 0)`
The resulting tuple size cannot be more than 2
Expanded Mode: Takes a range mask (tuple) as the first argument and the Hilbert index as the second argument.
Each number in the mask specifies the number of bits by which the corresponding decoded argument will be right-shifted, effectively scaling down the output values.
[example:range_shrank]
Note: see hilbertEncode() docs on why range change might be beneficial.
Still limited to 2 numbers at most.
Hilbert code for one argument is always the argument itself (as a tuple).
[example:identity]
Produces: `(1)`
A single argument with a tuple specifying bit shifts will be right-shifted accordingly.
[example:identity_shrank]
Produces: `(128)`
The function accepts a column of codes as a second argument:
[example:from_table]
The range tuple must be a constant:
[example:from_table_range]
)",
.examples{
{"simple", "SELECT hilbertDecode(2, 64)", ""},
{"range_shrank", "SELECT hilbertDecode((1,2), 1572864)", ""},
{"identity", "SELECT hilbertDecode(1, 1)", ""},
{"identity_shrank", "SELECT hilbertDecode(tuple(2), 512)", ""},
{"from_table", "SELECT hilbertDecode(2, code) FROM table", ""},
{"from_table_range", "SELECT hilbertDecode((1,2), code) FROM table", ""},
},
.categories {"Hilbert coding", "Hilbert Curve"}
});
}
}

View File

@ -0,0 +1,145 @@
#pragma once
#include <Functions/FunctionSpaceFillingCurve.h>
namespace DB
{
namespace HilbertDetails
{
template <UInt8 bit_step>
class HilbertDecodeLookupTable
{
public:
constexpr static UInt8 LOOKUP_TABLE[0] = {};
};
template <>
class HilbertDecodeLookupTable<1>
{
public:
constexpr static UInt8 LOOKUP_TABLE[16] = {
4, 1, 3, 10,
0, 6, 7, 13,
15, 9, 8, 2,
11, 14, 12, 5
};
};
template <>
class HilbertDecodeLookupTable<2>
{
public:
constexpr static UInt8 LOOKUP_TABLE[64] = {
0, 20, 21, 49, 18, 3, 7, 38,
26, 11, 15, 46, 61, 41, 40, 12,
16, 1, 5, 36, 8, 28, 29, 57,
10, 30, 31, 59, 39, 54, 50, 19,
47, 62, 58, 27, 55, 35, 34, 6,
53, 33, 32, 4, 24, 9, 13, 44,
63, 43, 42, 14, 45, 60, 56, 25,
37, 52, 48, 17, 2, 22, 23, 51
};
};
template <>
class HilbertDecodeLookupTable<3>
{
public:
constexpr static UInt8 LOOKUP_TABLE[256] = {
64, 1, 9, 136, 16, 88, 89, 209, 18, 90, 91, 211, 139, 202, 194, 67,
4, 76, 77, 197, 70, 7, 15, 142, 86, 23, 31, 158, 221, 149, 148, 28,
36, 108, 109, 229, 102, 39, 47, 174, 118, 55, 63, 190, 253, 181, 180, 60,
187, 250, 242, 115, 235, 163, 162, 42, 233, 161, 160, 40, 112, 49, 57, 184,
0, 72, 73, 193, 66, 3, 11, 138, 82, 19, 27, 154, 217, 145, 144, 24,
96, 33, 41, 168, 48, 120, 121, 241, 50, 122, 123, 243, 171, 234, 226, 99,
100, 37, 45, 172, 52, 124, 125, 245, 54, 126, 127, 247, 175, 238, 230, 103,
223, 151, 150, 30, 157, 220, 212, 85, 141, 204, 196, 69, 6, 78, 79, 199,
255, 183, 182, 62, 189, 252, 244, 117, 173, 236, 228, 101, 38, 110, 111, 231,
159, 222, 214, 87, 207, 135, 134, 14, 205, 133, 132, 12, 84, 21, 29, 156,
155, 218, 210, 83, 203, 131, 130, 10, 201, 129, 128, 8, 80, 17, 25, 152,
32, 104, 105, 225, 98, 35, 43, 170, 114, 51, 59, 186, 249, 177, 176, 56,
191, 254, 246, 119, 239, 167, 166, 46, 237, 165, 164, 44, 116, 53, 61, 188,
251, 179, 178, 58, 185, 248, 240, 113, 169, 232, 224, 97, 34, 106, 107, 227,
219, 147, 146, 26, 153, 216, 208, 81, 137, 200, 192, 65, 2, 74, 75, 195,
68, 5, 13, 140, 20, 92, 93, 213, 22, 94, 95, 215, 143, 206, 198, 71
};
};
}
template <UInt8 bit_step>
class FunctionHilbertDecode2DWIthLookupTableImpl
{
static_assert(bit_step <= 3, "bit_step should not be more than 3 to fit in UInt8");
public:
static std::tuple<UInt64, UInt64> decode(UInt64 hilbert_code)
{
UInt64 x = 0;
UInt64 y = 0;
const auto leading_zeros_count = getLeadingZeroBits(hilbert_code);
const auto used_bits = std::numeric_limits<UInt64>::digits - leading_zeros_count;
auto [current_shift, state] = getInitialShiftAndState(used_bits);
while (current_shift >= 0)
{
const UInt8 hilbert_bits = (hilbert_code >> current_shift) & HILBERT_MASK;
const auto [x_bits, y_bits] = getCodeAndUpdateState(hilbert_bits, state);
x |= (x_bits << (current_shift >> 1));
y |= (y_bits << (current_shift >> 1));
current_shift -= getHilbertShift(bit_step);
}
return {x, y};
}
private:
// for bit_step = 3
// LOOKUP_TABLE[SSHHHHHH] = SSXXXYYY
// where SS - 2 bits for state, XXX - 3 bits of x, YYY - 3 bits of y
// State is rotation of curve on every step, left/up/right/down - therefore 2 bits
static std::pair<UInt64, UInt64> getCodeAndUpdateState(UInt8 hilbert_bits, UInt8& state)
{
const UInt8 table_index = state | hilbert_bits;
const auto table_code = HilbertDetails::HilbertDecodeLookupTable<bit_step>::LOOKUP_TABLE[table_index];
state = table_code & STATE_MASK;
const UInt64 x_bits = (table_code & X_MASK) >> bit_step;
const UInt64 y_bits = table_code & Y_MASK;
return {x_bits, y_bits};
}
// hilbert code is double size of input values
static constexpr UInt8 getHilbertShift(UInt8 shift)
{
return shift << 1;
}
static std::pair<Int8, UInt8> getInitialShiftAndState(UInt8 used_bits)
{
UInt8 iterations = used_bits / HILBERT_SHIFT;
Int8 initial_shift = iterations * HILBERT_SHIFT;
if (initial_shift < used_bits)
{
++iterations;
}
else
{
initial_shift -= HILBERT_SHIFT;
}
UInt8 state = iterations % 2 == 0 ? LEFT_STATE : DEFAULT_STATE;
return {initial_shift, state};
}
constexpr static UInt8 STEP_MASK = (1 << bit_step) - 1;
constexpr static UInt8 HILBERT_SHIFT = getHilbertShift(bit_step);
constexpr static UInt8 HILBERT_MASK = (1 << HILBERT_SHIFT) - 1;
constexpr static UInt8 STATE_MASK = 0b11 << HILBERT_SHIFT;
constexpr static UInt8 Y_MASK = STEP_MASK;
constexpr static UInt8 X_MASK = STEP_MASK << bit_step;
constexpr static UInt8 LEFT_STATE = 0b01 << HILBERT_SHIFT;
constexpr static UInt8 DEFAULT_STATE = bit_step % 2 == 0 ? LEFT_STATE : 0;
};
}
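A small standalone sketch of how one lookup-table entry is unpacked for bit_step = 3, matching the SSXXXYYY layout described in the comments above (masks recomputed locally; illustrative only, not the class's actual interface):

#include <cstdint>
#include <tuple>

/// For bit_step = 3 an entry packs 2 state bits, 3 x bits and 3 y bits: SSXXXYYY.
std::tuple<uint8_t, uint8_t, uint8_t> unpackHilbertEntrySketch(uint8_t table_code)
{
    constexpr uint8_t bit_step = 3;
    constexpr uint8_t step_mask = (1 << bit_step) - 1;      // 0b00000111
    constexpr uint8_t y_mask = step_mask;                   // low 3 bits
    constexpr uint8_t x_mask = step_mask << bit_step;       // next 3 bits
    constexpr uint8_t state_mask = 0b11 << (bit_step * 2);  // top 2 bits

    const uint8_t state = table_code & state_mask;
    const uint8_t x_bits = (table_code & x_mask) >> bit_step;
    const uint8_t y_bits = table_code & y_mask;
    return {state, x_bits, y_bits};
}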

View File

@ -0,0 +1,150 @@
#include "hilbertEncode2DLUT.h"
#include <Common/BitHelpers.h>
#include <Functions/PerformanceAdaptors.h>
#include <limits>
#include <optional>
#include <Functions/FunctionFactory.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
}
class FunctionHilbertEncode : public FunctionSpaceFillingCurveEncode
{
public:
static constexpr auto name = "hilbertEncode";
static FunctionPtr create(ContextPtr)
{
return std::make_shared<FunctionHilbertEncode>();
}
String getName() const override { return name; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
if (input_rows_count == 0)
return ColumnUInt64::create();
size_t num_dimensions = arguments.size();
size_t vector_start_index = 0;
const auto * const_col = typeid_cast<const ColumnConst *>(arguments[0].column.get());
const ColumnTuple * mask;
if (const_col)
mask = typeid_cast<const ColumnTuple *>(const_col->getDataColumnPtr().get());
else
mask = typeid_cast<const ColumnTuple *>(arguments[0].column.get());
if (mask)
{
num_dimensions = mask->tupleSize();
vector_start_index = 1;
for (size_t i = 0; i < num_dimensions; i++)
{
auto ratio = mask->getColumn(i).getUInt(0);
if (ratio > 32)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} of function {}, should be a number in range 0-32",
arguments[0].column->getName(), getName());
}
}
auto col_res = ColumnUInt64::create();
ColumnUInt64::Container & vec_res = col_res->getData();
vec_res.resize(input_rows_count);
const auto expand = [mask](const UInt64 value, const UInt8 column_num)
{
if (mask)
return value << mask->getColumn(column_num).getUInt(0);
return value;
};
const ColumnPtr & col0 = arguments[0 + vector_start_index].column;
if (num_dimensions == 1)
{
for (size_t i = 0; i < input_rows_count; ++i)
{
vec_res[i] = expand(col0->getUInt(i), 0);
}
return col_res;
}
const ColumnPtr & col1 = arguments[1 + vector_start_index].column;
if (num_dimensions == 2)
{
for (size_t i = 0; i < input_rows_count; ++i)
{
vec_res[i] = FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(
expand(col0->getUInt(i), 0),
expand(col1->getUInt(i), 1));
}
return col_res;
}
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal number of UInt arguments of function {}: should be not more than 2 dimensions",
getName());
}
};
REGISTER_FUNCTION(HilbertEncode)
{
factory.registerFunction<FunctionHilbertEncode>(FunctionDocumentation{
.description=R"(
Calculates a Hilbert Curve code for a list of unsigned integers.
The function has two modes of operation:
- Simple
- Expanded
Simple: accepts up to 2 unsigned integers as arguments and produces a UInt64 code.
[example:simple]
Produces: `31`
Expanded: accepts a range mask (tuple) as the first argument and up to 2 unsigned integers as the other arguments.
Each number in the mask configures the number of bits by which the corresponding argument will be shifted left, effectively scaling the argument within its range.
[example:range_expanded]
Produces: `4031541586602`
Note: the tuple size must be equal to the number of the other arguments.
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality),
for example 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
For a single argument without a tuple, the function returns the argument itself as the Hilbert index, since no dimensional mapping is needed.
[example:identity]
Produces: `1`
If a single argument is provided with a tuple specifying bit shifts, the function shifts the argument left by the specified number of bits.
[example:identity_expanded]
Produces: `512`
The function also accepts columns as arguments:
[example:from_table]
But the range tuple must still be a constant:
[example:from_table_range]
Please note that you can fit only as many bits of information into the Hilbert code as a UInt64 can hold.
With two arguments, each will have a range of at most 2^32 (64/2 = 32 bits per argument).
Any overflow will be clamped to zero.
)",
.examples{
{"simple", "SELECT hilbertEncode(3, 4)", ""},
{"range_expanded", "SELECT hilbertEncode((10,6), 1024, 16)", ""},
{"identity", "SELECT hilbertEncode(1)", ""},
{"identity_expanded", "SELECT hilbertEncode(tuple(2), 128)", ""},
{"from_table", "SELECT hilbertEncode(n1, n2) FROM table", ""},
{"from_table_range", "SELECT hilbertEncode((1,2), n1, n2) FROM table", ""},
},
.categories {"Hilbert coding", "Hilbert Curve"}
});
}
}
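A minimal standalone sketch (not part of this commit) that exercises the lookup-table implementation directly and checks the documented value; it assumes the ClickHouse source tree is on the include path and mirrors the unit test added later in this commit:

#include <iostream>
#include "Functions/hilbertDecode2DLUT.h"
#include "Functions/hilbertEncode2DLUT.h"
#include "base/types.h"

int main()
{
    // SELECT hilbertEncode(3, 4) is documented above to produce 31.
    const UInt64 code = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(3, 4);
    // Decoding must round-trip back to the original coordinates.
    const auto [x, y] = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(code);
    std::cout << code << " -> (" << x << ", " << y << ")" << std::endl; // expected: 31 -> (3, 4)
}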

View File

@ -0,0 +1,142 @@
#pragma once
#include <Functions/FunctionSpaceFillingCurve.h>
namespace DB
{
namespace HilbertDetails
{
template <UInt8 bit_step>
class HilbertEncodeLookupTable
{
public:
constexpr static UInt8 LOOKUP_TABLE[0] = {};
};
template <>
class HilbertEncodeLookupTable<1>
{
public:
constexpr static UInt8 LOOKUP_TABLE[16] = {
4, 1, 11, 2,
0, 15, 5, 6,
10, 9, 3, 12,
14, 7, 13, 8
};
};
template <>
class HilbertEncodeLookupTable<2>
{
public:
constexpr static UInt8 LOOKUP_TABLE[64] = {
0, 51, 20, 5, 17, 18, 39, 6,
46, 45, 24, 9, 15, 60, 43, 10,
16, 1, 62, 31, 35, 2, 61, 44,
4, 55, 8, 59, 21, 22, 25, 26,
42, 41, 38, 37, 11, 56, 7, 52,
28, 13, 50, 19, 47, 14, 49, 32,
58, 27, 12, 63, 57, 40, 29, 30,
54, 23, 34, 33, 53, 36, 3, 48
};
};
template <>
class HilbertEncodeLookupTable<3>
{
public:
constexpr static UInt8 LOOKUP_TABLE[256] = {
64, 1, 206, 79, 16, 211, 84, 21, 131, 2, 205, 140, 81, 82, 151, 22, 4,
199, 8, 203, 158, 157, 88, 25, 69, 70, 73, 74, 31, 220, 155, 26, 186,
185, 182, 181, 32, 227, 100, 37, 59, 248, 55, 244, 97, 98, 167, 38, 124,
61, 242, 115, 174, 173, 104, 41, 191, 62, 241, 176, 47, 236, 171, 42, 0,
195, 68, 5, 250, 123, 60, 255, 65, 66, 135, 6, 249, 184, 125, 126, 142,
141, 72, 9, 246, 119, 178, 177, 15, 204, 139, 10, 245, 180, 51, 240, 80,
17, 222, 95, 96, 33, 238, 111, 147, 18, 221, 156, 163, 34, 237, 172, 20,
215, 24, 219, 36, 231, 40, 235, 85, 86, 89, 90, 101, 102, 105, 106, 170,
169, 166, 165, 154, 153, 150, 149, 43, 232, 39, 228, 27, 216, 23, 212, 108,
45, 226, 99, 92, 29, 210, 83, 175, 46, 225, 160, 159, 30, 209, 144, 48,
243, 116, 53, 202, 75, 12, 207, 113, 114, 183, 54, 201, 136, 77, 78, 190,
189, 120, 57, 198, 71, 130, 129, 63, 252, 187, 58, 197, 132, 3, 192, 234,
107, 44, 239, 112, 49, 254, 127, 233, 168, 109, 110, 179, 50, 253, 188, 230,
103, 162, 161, 52, 247, 56, 251, 229, 164, 35, 224, 117, 118, 121, 122, 218,
91, 28, 223, 138, 137, 134, 133, 217, 152, 93, 94, 11, 200, 7, 196, 214,
87, 146, 145, 76, 13, 194, 67, 213, 148, 19, 208, 143, 14, 193, 128,
};
};
}
template <UInt8 bit_step>
class FunctionHilbertEncode2DWIthLookupTableImpl
{
static_assert(bit_step <= 3, "bit_step should not be more than 3 to fit in UInt8");
public:
static UInt64 encode(UInt64 x, UInt64 y)
{
UInt64 hilbert_code = 0;
const auto leading_zeros_count = getLeadingZeroBits(x | y);
const auto used_bits = std::numeric_limits<UInt64>::digits - leading_zeros_count;
if (used_bits > 32)
return 0; // hilbert code will be overflowed in this case
auto [current_shift, state] = getInitialShiftAndState(used_bits);
while (current_shift >= 0)
{
const UInt8 x_bits = (x >> current_shift) & STEP_MASK;
const UInt8 y_bits = (y >> current_shift) & STEP_MASK;
const auto hilbert_bits = getCodeAndUpdateState(x_bits, y_bits, state);
hilbert_code |= (hilbert_bits << getHilbertShift(current_shift));
current_shift -= bit_step;
}
return hilbert_code;
}
private:
// for bit_step = 3
// LOOKUP_TABLE[SSXXXYYY] = SSHHHHHH
// where SS - 2 bits for state, XXX - 3 bits of x, YYY - 3 bits of y
// State is rotation of curve on every step, left/up/right/down - therefore 2 bits
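// Worked layout of one encode step (no concrete table values assumed): with the current
// 2-bit state SS and the next 3 bits of x (XXX) and of y (YYY), the index SSXXXYYY is
// looked up; the two high bits of the result become the next state and the low six bits
// become the next 6 bits of the Hilbert code (see getCodeAndUpdateState below).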
static UInt64 getCodeAndUpdateState(UInt8 x_bits, UInt8 y_bits, UInt8& state)
{
const UInt8 table_index = state | (x_bits << bit_step) | y_bits;
const auto table_code = HilbertDetails::HilbertEncodeLookupTable<bit_step>::LOOKUP_TABLE[table_index];
state = table_code & STATE_MASK;
return table_code & HILBERT_MASK;
}
// hilbert code is double size of input values
static constexpr UInt8 getHilbertShift(UInt8 shift)
{
return shift << 1;
}
static std::pair<Int8, UInt8> getInitialShiftAndState(UInt8 used_bits)
{
UInt8 iterations = used_bits / bit_step;
Int8 initial_shift = iterations * bit_step;
if (initial_shift < used_bits)
{
++iterations;
}
else
{
initial_shift -= bit_step;
}
UInt8 state = iterations % 2 == 0 ? LEFT_STATE : DEFAULT_STATE;
return {initial_shift, state};
}
constexpr static UInt8 STEP_MASK = (1 << bit_step) - 1;
constexpr static UInt8 HILBERT_SHIFT = getHilbertShift(bit_step);
constexpr static UInt8 HILBERT_MASK = (1 << HILBERT_SHIFT) - 1;
constexpr static UInt8 STATE_MASK = 0b11 << HILBERT_SHIFT;
constexpr static UInt8 LEFT_STATE = 0b01 << HILBERT_SHIFT;
constexpr static UInt8 DEFAULT_STATE = bit_step % 2 == 0 ? LEFT_STATE : 0;
};
}

View File

@ -77,75 +77,17 @@ inline void fillVectorVector(const ArrayCond & cond, const ArrayA & a, const Arr
{
size_t size = cond.size();
bool a_is_short = a.size() < size;
bool b_is_short = b.size() < size;
if (a_is_short && b_is_short)
for (size_t i = 0; i < size; ++i)
{
size_t a_index = 0, b_index = 0;
for (size_t i = 0; i < size; ++i)
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[a_index]) + (!cond[i]) * static_cast<ResultType>(b[b_index]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[a_index], b[b_index], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[a_index]) : static_cast<ResultType>(b[b_index]);
a_index += !!cond[i];
b_index += !cond[i];
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b[i], res[i])
}
}
else if (a_is_short)
{
size_t a_index = 0;
for (size_t i = 0; i < size; ++i)
else
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[a_index]) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[a_index], b[i], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[a_index]) : static_cast<ResultType>(b[i]);
a_index += !!cond[i];
}
}
else if (b_is_short)
{
size_t b_index = 0;
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b[b_index]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b[b_index], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b[b_index]);
b_index += !cond[i];
}
}
else
{
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b[i], res[i])
}
else
{
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b[i]);
}
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b[i]);
}
}
}
@ -154,37 +96,16 @@ template <typename ArrayCond, typename ArrayA, typename B, typename ArrayResult,
inline void fillVectorConstant(const ArrayCond & cond, const ArrayA & a, B b, ArrayResult & res)
{
size_t size = cond.size();
bool a_is_short = a.size() < size;
if (a_is_short)
for (size_t i = 0; i < size; ++i)
{
size_t a_index = 0;
for (size_t i = 0; i < size; ++i)
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b);
else if constexpr (std::is_floating_point_v<ResultType>)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[a_index]) + (!cond[i]) * static_cast<ResultType>(b);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[a_index], b, res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[a_index]) : static_cast<ResultType>(b);
a_index += !!cond[i];
}
}
else
{
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b, res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b);
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b, res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b);
}
}
@ -192,37 +113,16 @@ template <typename ArrayCond, typename A, typename ArrayB, typename ArrayResult,
inline void fillConstantVector(const ArrayCond & cond, A a, const ArrayB & b, ArrayResult & res)
{
size_t size = cond.size();
bool b_is_short = b.size() < size;
if (b_is_short)
for (size_t i = 0; i < size; ++i)
{
size_t b_index = 0;
for (size_t i = 0; i < size; ++i)
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a) + (!cond[i]) * static_cast<ResultType>(b[b_index]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a, b[b_index], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a) : static_cast<ResultType>(b[b_index]);
b_index += !cond[i];
}
}
else
{
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a, b[i], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a) : static_cast<ResultType>(b[i]);
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a, b[i], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a) : static_cast<ResultType>(b[i]);
}
}
@ -880,9 +780,6 @@ private:
bool then_is_const = isColumnConst(*col_then);
bool else_is_const = isColumnConst(*col_else);
bool then_is_short = col_then->size() < cond_col->size();
bool else_is_short = col_else->size() < cond_col->size();
const auto & cond_array = cond_col->getData();
if (then_is_const && else_is_const)
@ -902,37 +799,34 @@ private:
{
const IColumn & then_nested_column = assert_cast<const ColumnConst &>(*col_then).getDataColumn();
size_t else_index = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
if (cond_array[i])
result_column->insertFrom(then_nested_column, 0);
else
result_column->insertFrom(*col_else, else_is_short ? else_index++ : i);
result_column->insertFrom(*col_else, i);
}
}
else if (else_is_const)
{
const IColumn & else_nested_column = assert_cast<const ColumnConst &>(*col_else).getDataColumn();
size_t then_index = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
if (cond_array[i])
result_column->insertFrom(*col_then, then_is_short ? then_index++ : i);
result_column->insertFrom(*col_then, i);
else
result_column->insertFrom(else_nested_column, 0);
}
}
else
{
size_t then_index = 0, else_index = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
if (cond_array[i])
result_column->insertFrom(*col_then, then_is_short ? then_index++ : i);
result_column->insertFrom(*col_then, i);
else
result_column->insertFrom(*col_else, else_is_short ? else_index++ : i);
result_column->insertFrom(*col_else, i);
}
}
@ -1125,9 +1019,6 @@ private:
if (then_is_null && else_is_null)
return result_type->createColumnConstWithDefaultValue(input_rows_count);
bool then_is_short = arg_then.column->size() < arg_cond.column->size();
bool else_is_short = arg_else.column->size() < arg_cond.column->size();
const ColumnUInt8 * cond_col = typeid_cast<const ColumnUInt8 *>(arg_cond.column.get());
const ColumnConst * cond_const_col = checkAndGetColumnConst<ColumnVector<UInt8>>(arg_cond.column.get());
@ -1146,8 +1037,6 @@ private:
{
arg_else_column = arg_else_column->convertToFullColumnIfConst();
auto result_column = IColumn::mutate(std::move(arg_else_column));
if (else_is_short)
result_column->expand(cond_col->getData(), true);
if (isColumnNullable(*result_column))
{
assert_cast<ColumnNullable &>(*result_column).applyNullMap(assert_cast<const ColumnUInt8 &>(*arg_cond.column));
@ -1193,8 +1082,6 @@ private:
{
arg_then_column = arg_then_column->convertToFullColumnIfConst();
auto result_column = IColumn::mutate(std::move(arg_then_column));
if (then_is_short)
result_column->expand(cond_col->getData(), false);
if (isColumnNullable(*result_column))
{

View File

@ -1,10 +1,11 @@
#include <Functions/IFunction.h>
#include <Functions/FunctionFactory.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Functions/FunctionHelpers.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/FunctionSpaceFillingCurve.h>
#include <Functions/IFunction.h>
#include <Functions/PerformanceAdaptors.h>
#include <morton-nd/mortonND_LUT.h>
@ -15,13 +16,6 @@
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
extern const int ARGUMENT_OUT_OF_BOUND;
}
// NOLINTBEGIN(bugprone-switch-missing-default-case)
#define EXTRACT_VECTOR(INDEX) \
@ -186,7 +180,7 @@ constexpr auto MortonND_5D_Dec = mortonnd::MortonNDLutDecoder<5, 12, 8>();
constexpr auto MortonND_6D_Dec = mortonnd::MortonNDLutDecoder<6, 10, 8>();
constexpr auto MortonND_7D_Dec = mortonnd::MortonNDLutDecoder<7, 9, 8>();
constexpr auto MortonND_8D_Dec = mortonnd::MortonNDLutDecoder<8, 8, 8>();
class FunctionMortonDecode : public IFunction
class FunctionMortonDecode : public FunctionSpaceFillingCurveDecode<8, 1, 8>
{
public:
static constexpr auto name = "mortonDecode";
@ -200,68 +194,6 @@ public:
return name;
}
size_t getNumberOfArguments() const override
{
return 2;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
UInt64 tuple_size = 0;
const auto * col_const = typeid_cast<const ColumnConst *>(arguments[0].column.get());
if (!col_const)
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} of function {}, should be a constant (UInt or Tuple)",
arguments[0].type->getName(), getName());
if (!WhichDataType(arguments[1].type).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} of function {}, should be a native UInt",
arguments[1].type->getName(), getName());
const auto * mask = typeid_cast<const ColumnTuple *>(col_const->getDataColumnPtr().get());
if (mask)
{
tuple_size = mask->tupleSize();
}
else if (WhichDataType(arguments[0].type).isNativeUInt())
{
tuple_size = col_const->getUInt(0);
}
else
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} of function {}, should be UInt or Tuple",
arguments[0].type->getName(), getName());
if (tuple_size > 8 || tuple_size < 1)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal first argument for function {}, should be a number in range 1-8 or a Tuple of such size",
getName());
if (mask)
{
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].type.get());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
auto ratio = mask->getColumn(i).getUInt(0);
if (ratio > 8 || ratio < 1)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} in tuple for function {}, should be a number in range 1-8",
ratio, getName());
}
}
DataTypes types(tuple_size);
for (size_t i = 0; i < tuple_size; i++)
{
types[i] = std::make_shared<DataTypeUInt64>();
}
return std::make_shared<DataTypeTuple>(types);
}
static UInt64 shrink(UInt64 ratio, UInt64 value)
{
switch (ratio) // NOLINT(bugprone-switch-missing-default-case)

View File

@ -1,10 +1,9 @@
#include <Functions/IFunction.h>
#include <Functions/FunctionFactory.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnTuple.h>
#include <Functions/FunctionSpaceFillingCurve.h>
#include <Functions/PerformanceAdaptors.h>
#include <morton-nd/mortonND_LUT.h>
@ -19,7 +18,6 @@ namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
}
#define EXTRACT_VECTOR(INDEX) \
@ -144,7 +142,7 @@ constexpr auto MortonND_5D_Enc = mortonnd::MortonNDLutEncoder<5, 12, 8>();
constexpr auto MortonND_6D_Enc = mortonnd::MortonNDLutEncoder<6, 10, 8>();
constexpr auto MortonND_7D_Enc = mortonnd::MortonNDLutEncoder<7, 9, 8>();
constexpr auto MortonND_8D_Enc = mortonnd::MortonNDLutEncoder<8, 8, 8>();
class FunctionMortonEncode : public IFunction
class FunctionMortonEncode : public FunctionSpaceFillingCurveEncode
{
public:
static constexpr auto name = "mortonEncode";
@ -158,56 +156,6 @@ public:
return name;
}
bool isVariadic() const override
{
return true;
}
size_t getNumberOfArguments() const override
{
return 0;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DB::DataTypes & arguments) const override
{
size_t vectorStartIndex = 0;
if (arguments.empty())
throw Exception(ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION,
"At least one UInt argument is required for function {}",
getName());
if (WhichDataType(arguments[0]).isTuple())
{
vectorStartIndex = 1;
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].get());
auto tuple_size = type_tuple->getElements().size();
if (tuple_size != (arguments.size() - 1))
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} for function {}, tuple size should be equal to number of UInt arguments",
arguments[0]->getName(), getName());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
}
}
for (size_t i = vectorStartIndex; i < arguments.size(); i++)
{
const auto & arg = arguments[i];
if (!WhichDataType(arg).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument of function {}, should be a native UInt",
arg->getName(), getName());
}
return std::make_shared<DataTypeUInt64>();
}
static UInt64 expand(UInt64 ratio, UInt64 value)
{
switch (ratio) // NOLINT(bugprone-switch-missing-default-case)

View File

@ -148,11 +148,6 @@ public:
bool condition_always_true = false;
bool condition_is_nullable = false;
bool source_is_constant = false;
bool condition_is_short = false;
bool source_is_short = false;
size_t condition_index = 0;
size_t source_index = 0;
};
ColumnPtr executeImpl(const ColumnsWithTypeAndName & args, const DataTypePtr & result_type, size_t input_rows_count) const override
@ -214,12 +209,9 @@ public:
instruction.condition = cond_col;
instruction.condition_is_nullable = instruction.condition->isNullable();
}
instruction.condition_is_short = cond_col->size() < arguments[0].column->size();
}
const ColumnWithTypeAndName & source_col = arguments[source_idx];
instruction.source_is_short = source_col.column->size() < arguments[0].column->size();
if (source_col.type->equals(*return_type))
{
instruction.source = source_col.column;
@ -250,19 +242,8 @@ public:
return ColumnConst::create(std::move(res), instruction.source->size());
}
bool contains_short = false;
for (const auto & instruction : instructions)
{
if (instruction.condition_is_short || instruction.source_is_short)
{
contains_short = true;
break;
}
}
const WhichDataType which(removeNullable(result_type));
bool execute_multiif_columnar = allow_execute_multiif_columnar && !contains_short
&& instructions.size() <= std::numeric_limits<UInt8>::max()
bool execute_multiif_columnar = allow_execute_multiif_columnar && instructions.size() <= std::numeric_limits<UInt8>::max()
&& (which.isInt() || which.isUInt() || which.isFloat() || which.isDecimal() || which.isDateOrDate32OrDateTimeOrDateTime64()
|| which.isEnum() || which.isIPv4() || which.isIPv6());
@ -339,25 +320,23 @@ private:
{
bool insert = false;
size_t condition_index = instruction.condition_is_short ? instruction.condition_index++ : i;
if (instruction.condition_always_true)
insert = true;
else if (!instruction.condition_is_nullable)
insert = assert_cast<const ColumnUInt8 &>(*instruction.condition).getData()[condition_index];
insert = assert_cast<const ColumnUInt8 &>(*instruction.condition).getData()[i];
else
{
const ColumnNullable & condition_nullable = assert_cast<const ColumnNullable &>(*instruction.condition);
const ColumnUInt8 & condition_nested = assert_cast<const ColumnUInt8 &>(condition_nullable.getNestedColumn());
const NullMap & condition_null_map = condition_nullable.getNullMapData();
insert = !condition_null_map[condition_index] && condition_nested.getData()[condition_index];
insert = !condition_null_map[i] && condition_nested.getData()[i];
}
if (insert)
{
size_t source_index = instruction.source_is_short ? instruction.source_index++ : i;
if (!instruction.source_is_constant)
res->insertFrom(*instruction.source, source_index);
res->insertFrom(*instruction.source, i);
else
res->insertFrom(assert_cast<const ColumnConst &>(*instruction.source).getDataColumn(), 0);

View File

@ -0,0 +1,81 @@
#include <gtest/gtest.h>
#include "Functions/hilbertDecode2DLUT.h"
#include "Functions/hilbertEncode2DLUT.h"
#include "base/types.h"
TEST(HilbertLookupTable, EncodeBit1And3Consistency)
{
const size_t bound = 1000;
for (size_t x = 0; x < bound; ++x)
{
for (size_t y = 0; y < bound; ++y)
{
auto hilbert1bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<1>::encode(x, y);
auto hilbert3bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
ASSERT_EQ(hilbert1bit, hilbert3bit);
}
}
}
TEST(HilbertLookupTable, EncodeBit2And3Consistency)
{
const size_t bound = 1000;
for (size_t x = 0; x < bound; ++x)
{
for (size_t y = 0; y < bound; ++y)
{
auto hilbert2bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<2>::encode(x, y);
auto hilbert3bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
ASSERT_EQ(hilbert3bit, hilbert2bit);
}
}
}
TEST(HilbertLookupTable, DecodeBit1And3Consistency)
{
const size_t bound = 1000 * 1000;
for (size_t hilbert_code = 0; hilbert_code < bound; ++hilbert_code)
{
auto res1 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<1>::decode(hilbert_code);
auto res3 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
ASSERT_EQ(res1, res3);
}
}
TEST(HilbertLookupTable, DecodeBit2And3Consistency)
{
const size_t bound = 1000 * 1000;
for (size_t hilbert_code = 0; hilbert_code < bound; ++hilbert_code)
{
auto res2 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<2>::decode(hilbert_code);
auto res3 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
ASSERT_EQ(res2, res3);
}
}
TEST(HilbertLookupTable, DecodeAndEncodeAreInverseOperations)
{
const size_t bound = 1000;
for (size_t x = 0; x < bound; ++x)
{
for (size_t y = 0; y < bound; ++y)
{
auto hilbert_code = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
auto [x_new, y_new] = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
ASSERT_EQ(x_new, x);
ASSERT_EQ(y_new, y);
}
}
}
TEST(HilbertLookupTable, EncodeAndDecodeAreInverseOperations)
{
const size_t bound = 1000 * 1000;
for (size_t hilbert_code = 0; hilbert_code < bound; ++hilbert_code)
{
auto [x, y] = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
auto hilbert_new = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
ASSERT_EQ(hilbert_new, hilbert_code);
}
}

View File

@ -1400,7 +1400,7 @@ public:
divide_result.type, input_rows_count);
auto minus_elem = minus->build({one, divide_result});
return minus_elem->execute({one, divide_result}, minus_elem->getResultType(), {});
return minus_elem->execute({one, divide_result}, minus_elem->getResultType(), input_rows_count);
}
};

View File

@ -149,16 +149,18 @@ namespace
dest_bucket, dest_key, /* local_path_ */ {}, /* data_size */ 0,
outcome.IsSuccess() ? nullptr : &outcome.GetError());
if (outcome.IsSuccess())
{
multipart_upload_id = outcome.GetResult().GetUploadId();
LOG_TRACE(log, "Multipart upload has created. Bucket: {}, Key: {}, Upload id: {}", dest_bucket, dest_key, multipart_upload_id);
}
else
if (!outcome.IsSuccess())
{
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3RequestsErrors, 1);
throw S3Exception(outcome.GetError().GetMessage(), outcome.GetError().GetErrorType());
}
multipart_upload_id = outcome.GetResult().GetUploadId();
if (multipart_upload_id.empty())
{
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3RequestsErrors, 1);
throw Exception(ErrorCodes::S3_ERROR, "Invalid CreateMultipartUpload result: missing UploadId.");
}
LOG_TRACE(log, "Multipart upload was created. Bucket: {}, Key: {}, Upload id: {}", dest_bucket, dest_key, multipart_upload_id);
}
void completeMultipartUpload()

View File

@ -413,7 +413,13 @@ void WriteBufferFromS3::createMultipartUpload()
multipart_upload_id = outcome.GetResult().GetUploadId();
LOG_TRACE(limitedLog, "Multipart upload has created. {}", getShortLogDetails());
if (multipart_upload_id.empty())
{
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3RequestsErrors, 1);
throw Exception(ErrorCodes::S3_ERROR, "Invalid CreateMultipartUpload result: missing UploadId.");
}
LOG_TRACE(limitedLog, "Multipart upload was created. {}", getShortLogDetails());
}
void WriteBufferFromS3::abortMultipartUpload()

View File

@ -1621,7 +1621,7 @@ void ActionsDAG::mergeInplace(ActionsDAG && second)
first.projected_output = second.projected_output;
}
void ActionsDAG::mergeNodes(ActionsDAG && second)
void ActionsDAG::mergeNodes(ActionsDAG && second, NodeRawConstPtrs * out_outputs)
{
std::unordered_map<std::string, const ActionsDAG::Node *> node_name_to_node;
for (auto & node : nodes)
@ -1677,6 +1677,12 @@ void ActionsDAG::mergeNodes(ActionsDAG && second)
nodes_to_process.pop_back();
}
if (out_outputs)
{
for (auto & node : second.getOutputs())
out_outputs->push_back(node_name_to_node.at(node->result_name));
}
if (nodes_to_move_from_second_dag.empty())
return;
@ -2888,6 +2894,7 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG(
FunctionOverloadResolverPtr function_overload_resolver;
String result_name;
if (node->function_base->getName() == "indexHint")
{
ActionsDAG::NodeRawConstPtrs children;
@ -2908,6 +2915,11 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG(
auto index_hint_function_clone = std::make_shared<FunctionIndexHint>();
index_hint_function_clone->setActions(std::move(index_hint_filter_dag));
function_overload_resolver = std::make_shared<FunctionToOverloadResolverAdaptor>(std::move(index_hint_function_clone));
/// Keep the unique name like "indexHint(foo)" instead of replacing it
/// with "indexHint()". Otherwise index analysis (which does look at
/// indexHint arguments that we're hiding here) will get confused by the
/// multiple substantially different nodes with the same result name.
result_name = node->result_name;
}
}
}
@ -2922,7 +2934,7 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG(
function_base,
std::move(function_children),
std::move(arguments),
{},
result_name,
node->result_type,
all_const);
break;

View File

@ -324,8 +324,9 @@ public:
/// So that pointers to nodes are kept valid.
void mergeInplace(ActionsDAG && second);
/// Merge current nodes with specified dag nodes
void mergeNodes(ActionsDAG && second);
/// Merge current nodes with specified dag nodes.
/// *out_outputs is filled with pointers to the nodes corresponding to second.getOutputs().
void mergeNodes(ActionsDAG && second, NodeRawConstPtrs * out_outputs = nullptr);
struct SplitResult
{

View File

@ -60,6 +60,7 @@ String calculateActionNodeNameWithCastIfNeeded(const ConstantNode & constant_nod
if (constant_node.requiresCastCall())
{
/// Projection name for constants is <value>_<type> so for _cast(1, 'String') we will have _cast(1_Uint8, 'String'_String)
buffer << ", '" << constant_node.getResultType()->getName() << "'_String)";
}

View File

@ -145,11 +145,10 @@ void ParquetBlockOutputFormat::consume(Chunk chunk)
/// Because the real SquashingTransform is only used for INSERT, not for SELECT ... INTO OUTFILE.
/// The latter doesn't even have a pipeline where a transform could be inserted, so it's more
/// convenient to do the squashing here. It's also parallelized here.
if (chunk.getNumRows() != 0)
{
staging_rows += chunk.getNumRows();
staging_bytes += chunk.bytes();
staging_bytes += chunk.allocatedBytes();
staging_chunks.push_back(std::move(chunk));
}
@ -282,11 +281,15 @@ void ParquetBlockOutputFormat::writeRowGroup(std::vector<Chunk> chunks)
writeUsingArrow(std::move(chunks));
else
{
Chunk concatenated = std::move(chunks[0]);
for (size_t i = 1; i < chunks.size(); ++i)
concatenated.append(chunks[i]);
chunks.clear();
Chunk concatenated;
while (!chunks.empty())
{
if (concatenated.empty())
concatenated.swap(chunks.back());
else
concatenated.append(chunks.back());
chunks.pop_back();
}
writeRowGroupInOneThread(std::move(concatenated));
}
}

View File

@ -56,6 +56,30 @@ std::string toString(const ColumnDefaultKind kind)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Invalid ColumnDefaultKind");
}
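/// Hand-written copy and move assignment: the compiler-generated versions would share (or move)
/// the expression ASTPtr, whereas here each ColumnDefault keeps its own deep clone of the expression.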
ColumnDefault & ColumnDefault::operator=(const ColumnDefault & other)
{
if (this == &other)
return *this;
kind = other.kind;
expression = other.expression ? other.expression->clone() : nullptr;
ephemeral_default = other.ephemeral_default;
return *this;
}
ColumnDefault & ColumnDefault::operator=(ColumnDefault && other) noexcept
{
if (this == &other)
return *this;
kind = std::exchange(other.kind, ColumnDefaultKind{});
expression = other.expression ? other.expression->clone() : nullptr;
other.expression.reset();
ephemeral_default = std::exchange(other.ephemeral_default, false);
return *this;
}
bool operator==(const ColumnDefault & lhs, const ColumnDefault & rhs)
{

View File

@ -24,15 +24,19 @@ std::string toString(ColumnDefaultKind kind);
struct ColumnDefault
{
ColumnDefault() = default;
ColumnDefault(const ColumnDefault & other) { *this = other; }
ColumnDefault & operator=(const ColumnDefault & other);
ColumnDefault(ColumnDefault && other) noexcept { *this = std::move(other); }
ColumnDefault & operator=(ColumnDefault && other) noexcept;
ColumnDefaultKind kind = ColumnDefaultKind::Default;
ASTPtr expression;
bool ephemeral_default = false;
};
bool operator==(const ColumnDefault & lhs, const ColumnDefault & rhs);
using ColumnDefaults = std::unordered_map<std::string, ColumnDefault>;
}

View File

@ -60,6 +60,46 @@ ColumnDescription::ColumnDescription(String name_, DataTypePtr type_, ASTPtr cod
{
}
ColumnDescription & ColumnDescription::operator=(const ColumnDescription & other)
{
if (this == &other)
return *this;
name = other.name;
type = other.type;
default_desc = other.default_desc;
comment = other.comment;
codec = other.codec ? other.codec->clone() : nullptr;
settings = other.settings;
ttl = other.ttl ? other.ttl->clone() : nullptr;
stat = other.stat;
return *this;
}
ColumnDescription & ColumnDescription::operator=(ColumnDescription && other) noexcept
{
if (this == &other)
return *this;
name = std::move(other.name);
type = std::move(other.type);
default_desc = std::move(other.default_desc);
comment = std::move(other.comment);
codec = other.codec ? other.codec->clone() : nullptr;
other.codec.reset();
settings = std::move(other.settings);
ttl = other.ttl ? other.ttl->clone() : nullptr;
other.ttl.reset();
stat = std::move(other.stat);
return *this;
}
bool ColumnDescription::operator==(const ColumnDescription & other) const
{
auto ast_to_str = [](const ASTPtr & ast) { return ast ? queryToString(ast) : String{}; };

View File

@ -92,8 +92,11 @@ struct ColumnDescription
std::optional<StatisticDescription> stat;
ColumnDescription() = default;
ColumnDescription(ColumnDescription &&) = default;
ColumnDescription(const ColumnDescription &) = default;
ColumnDescription(const ColumnDescription & other) { *this = other; }
ColumnDescription & operator=(const ColumnDescription & other);
ColumnDescription(ColumnDescription && other) noexcept { *this = std::move(other); }
ColumnDescription & operator=(ColumnDescription && other) noexcept;
ColumnDescription(String name_, DataTypePtr type_);
ColumnDescription(String name_, DataTypePtr type_, String comment_);
ColumnDescription(String name_, DataTypePtr type_, ASTPtr codec_, String comment_);

View File

@ -3,6 +3,8 @@
#include <IO/WriteHelpers.h>
#include <Common/quoteString.h>
#include <algorithm>
#include <Parsers/ExpressionListParsers.h>
#include <Parsers/parseQuery.h>
#include <base/JSON.h>
@ -272,6 +274,29 @@ bool MergeTreeDataPartTTLInfos::hasAnyNonFinishedTTLs() const
return false;
}
namespace
{
/// We had a backward incompatibility in the representation of serialized expressions, for example:
///
/// `expired + toIntervalSecond(20)` vs `plus(expired, toIntervalSecond(20))`
/// Since they are stored as strings, we cannot compare them directly as strings.
/// To avoid this backward incompatibility, we parse them and compare the AST hashes.
/// This is O(N^2), but the number of TTLs should be small, so it should be OK.
auto tryToFindTTLExpressionInMapByASTMatching(const TTLInfoMap & ttl_info_map, const std::string & result_column)
{
ParserExpression parser;
auto ast_needle = parseQuery(parser, result_column.data(), result_column.data() + result_column.size(), "", 0, 0, 0);
for (auto it = ttl_info_map.begin(); it != ttl_info_map.end(); ++it)
{
const std::string & stored_expression = it->first;
auto ast_candidate = parseQuery(parser, stored_expression.data(), stored_expression.data() + stored_expression.size(), "", 0, 0, 0);
if (ast_candidate->getTreeHash(false) == ast_needle->getTreeHash(false))
return it;
}
return ttl_info_map.end();
}
}
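// Illustrative sketch (not part of this commit): the helper above relies on both historical
// serializations of the same TTL expression parsing to ASTs with equal tree hashes, e.g.:
//
//     ParserExpression parser;
//     const std::string old_form = "expired + toIntervalSecond(20)";
//     const std::string new_form = "plus(expired, toIntervalSecond(20))";
//     auto old_ast = parseQuery(parser, old_form.data(), old_form.data() + old_form.size(), "", 0, 0, 0);
//     auto new_ast = parseQuery(parser, new_form.data(), new_form.data() + new_form.size(), "", 0, 0, 0);
//     chassert(old_ast->getTreeHash(false) == new_ast->getTreeHash(false));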
std::optional<TTLDescription> selectTTLDescriptionForTTLInfos(const TTLDescriptions & descriptions, const TTLInfoMap & ttl_info_map, time_t current_time, bool use_max)
{
time_t best_ttl_time = 0;
@ -281,7 +306,11 @@ std::optional<TTLDescription> selectTTLDescriptionForTTLInfos(const TTLDescripti
auto ttl_info_it = ttl_info_map.find(ttl_entry_it->result_column);
if (ttl_info_it == ttl_info_map.end())
continue;
{
ttl_info_it = tryToFindTTLExpressionInMapByASTMatching(ttl_info_map, ttl_entry_it->result_column);
if (ttl_info_it == ttl_info_map.end())
continue;
}
time_t ttl_time;

View File

@ -176,6 +176,7 @@ MergeTreeDataPartWriterOnDisk::MergeTreeDataPartWriterOnDisk(
if (settings.rewrite_primary_key)
initPrimaryIndex();
initSkipIndices();
initStatistics();
}
@ -272,6 +273,9 @@ void MergeTreeDataPartWriterOnDisk::initStatistics()
void MergeTreeDataPartWriterOnDisk::initSkipIndices()
{
if (skip_indices.empty())
return;
ParserCodec codec_parser;
auto ast = parseQuery(codec_parser, "(" + Poco::toUpper(settings.marks_compression_codec) + ")", 0, DBMS_DEFAULT_MAX_PARSER_DEPTH, DBMS_DEFAULT_MAX_PARSER_BACKTRACKS);
CompressionCodecPtr marks_compression_codec = CompressionCodecFactory::instance().get(ast, nullptr);

View File

@ -464,7 +464,13 @@ MergeTreeDataWriter::TemporaryPart MergeTreeDataWriter::writeTempPartImpl(
temp_part.temporary_directory_lock = data.getTemporaryPartDirectoryHolder(part_dir);
auto indices = MergeTreeIndexFactory::instance().getMany(metadata_snapshot->getSecondaryIndices());
MergeTreeIndices indices;
if (context->getSettingsRef().materialize_skip_indexes_on_insert)
indices = MergeTreeIndexFactory::instance().getMany(metadata_snapshot->getSecondaryIndices());
Statistics statistics;
if (context->getSettingsRef().materialize_statistics_on_insert)
statistics = MergeTreeStatisticsFactory::instance().getMany(metadata_snapshot->getColumns());
/// If we need to calculate some columns to sort.
if (metadata_snapshot->hasSortingKey() || metadata_snapshot->hasSecondaryIndices())
@ -596,7 +602,7 @@ MergeTreeDataWriter::TemporaryPart MergeTreeDataWriter::writeTempPartImpl(
metadata_snapshot,
columns,
indices,
MergeTreeStatisticsFactory::instance().getMany(metadata_snapshot->getColumns()),
statistics,
compression_codec,
context->getCurrentTransaction() ? context->getCurrentTransaction()->tid : Tx::PrehistoricTID,
false,

View File

@ -12,6 +12,7 @@
#include <Parsers/ASTSelectQuery.h>
#include <Functions/FunctionFactory.h>
#include <Functions/indexHint.h>
#include <Planner/PlannerActionsVisitor.h>
#include <Storages/MergeTree/MergeTreeIndexUtils.h>
@ -256,8 +257,13 @@ MergeTreeIndexConditionSet::MergeTreeIndexConditionSet(
if (!filter_dag)
return;
if (checkDAGUseless(*filter_dag->getOutputs().at(0), context))
std::vector<FutureSetPtr> sets_to_prepare;
if (checkDAGUseless(*filter_dag->getOutputs().at(0), context, sets_to_prepare))
return;
/// Try to run the subqueries; don't use the index if this fails (e.g. if use_index_for_in_with_subqueries is disabled).
for (auto & set : sets_to_prepare)
if (!set->buildOrderedSetInplace(context))
return;
auto filter_actions_dag = filter_dag->clone();
const auto * filter_actions_dag_node = filter_actions_dag->getOutputs().at(0);
@ -300,6 +306,25 @@ bool MergeTreeIndexConditionSet::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx
}
static const ActionsDAG::NodeRawConstPtrs & getArguments(const ActionsDAG::Node & node, const ActionsDAGPtr & result_dag_or_null, ActionsDAG::NodeRawConstPtrs * storage)
{
chassert(node.type == ActionsDAG::ActionType::FUNCTION);
if (node.function_base->getName() != "indexHint")
return node.children;
/// indexHint arguments are stored inside the `FunctionIndexHint` class.
const auto & adaptor = typeid_cast<const FunctionToFunctionBaseAdaptor &>(*node.function_base);
const auto & index_hint = typeid_cast<const FunctionIndexHint &>(*adaptor.getFunction());
if (!result_dag_or_null)
return index_hint.getActions()->getOutputs();
/// Import the DAG and map argument pointers.
ActionsDAGPtr actions_clone = index_hint.getActions()->clone();
chassert(storage);
result_dag_or_null->mergeNodes(std::move(*actions_clone), storage);
return *storage;
}
const ActionsDAG::Node & MergeTreeIndexConditionSet::traverseDAG(const ActionsDAG::Node & node,
ActionsDAGPtr & result_dag,
const ContextPtr & context,
@ -349,7 +374,7 @@ const ActionsDAG::Node * MergeTreeIndexConditionSet::atomFromDAG(const ActionsDA
while (node_to_check->type == ActionsDAG::ActionType::ALIAS)
node_to_check = node_to_check->children[0];
if (node_to_check->column && isColumnConst(*node_to_check->column))
if (node_to_check->column && (isColumnConst(*node_to_check->column) || WhichDataType(node.result_type).isSet()))
return &node;
RPNBuilderTreeContext tree_context(context);
@ -396,14 +421,15 @@ const ActionsDAG::Node * MergeTreeIndexConditionSet::operatorFromDAG(const Actio
while (node_to_check->type == ActionsDAG::ActionType::ALIAS)
node_to_check = node_to_check->children[0];
if (node_to_check->column && isColumnConst(*node_to_check->column))
if (node_to_check->column && (isColumnConst(*node_to_check->column) || WhichDataType(node.result_type).isSet()))
return nullptr;
if (node_to_check->type != ActionsDAG::ActionType::FUNCTION)
return nullptr;
auto function_name = node_to_check->function->getName();
const auto & arguments = node_to_check->children;
ActionsDAG::NodeRawConstPtrs temp_ptrs_to_argument;
const auto & arguments = getArguments(*node_to_check, result_dag, &temp_ptrs_to_argument);
size_t arguments_size = arguments.size();
if (function_name == "not")
@ -418,7 +444,7 @@ const ActionsDAG::Node * MergeTreeIndexConditionSet::operatorFromDAG(const Actio
}
else if (function_name == "and" || function_name == "indexHint" || function_name == "or")
{
if (arguments_size < 2)
if (arguments_size < 1)
return nullptr;
ActionsDAG::NodeRawConstPtrs children;
@ -437,18 +463,12 @@ const ActionsDAG::Node * MergeTreeIndexConditionSet::operatorFromDAG(const Actio
const auto * last_argument = children.back();
children.pop_back();
const auto * before_last_argument = children.back();
children.pop_back();
while (true)
while (!children.empty())
{
last_argument = &result_dag->addFunction(function, {before_last_argument, last_argument}, {});
if (children.empty())
break;
before_last_argument = children.back();
const auto * before_last_argument = children.back();
children.pop_back();
last_argument = &result_dag->addFunction(function, {before_last_argument, last_argument}, {});
}
return last_argument;
@ -457,7 +477,7 @@ const ActionsDAG::Node * MergeTreeIndexConditionSet::operatorFromDAG(const Actio
return nullptr;
}
bool MergeTreeIndexConditionSet::checkDAGUseless(const ActionsDAG::Node & node, const ContextPtr & context, bool atomic) const
bool MergeTreeIndexConditionSet::checkDAGUseless(const ActionsDAG::Node & node, const ContextPtr & context, std::vector<FutureSetPtr> & sets_to_prepare, bool atomic) const
{
const auto * node_to_check = &node;
while (node_to_check->type == ActionsDAG::ActionType::ALIAS)
@ -466,8 +486,13 @@ bool MergeTreeIndexConditionSet::checkDAGUseless(const ActionsDAG::Node & node,
RPNBuilderTreeContext tree_context(context);
RPNBuilderTreeNode tree_node(node_to_check, tree_context);
if (node.column && isColumnConst(*node.column)
&& !WhichDataType(node.result_type).isSet())
if (WhichDataType(node.result_type).isSet())
{
if (auto set = tree_node.tryGetPreparedSet())
sets_to_prepare.push_back(set);
return false;
}
else if (node.column && isColumnConst(*node.column))
{
Field literal;
node.column->get(0, literal);
@ -480,173 +505,33 @@ bool MergeTreeIndexConditionSet::checkDAGUseless(const ActionsDAG::Node & node,
return false;
auto function_name = node.function_base->getName();
const auto & arguments = node.children;
const auto & arguments = getArguments(node, nullptr, nullptr);
if (function_name == "and" || function_name == "indexHint")
return std::all_of(arguments.begin(), arguments.end(), [&, atomic](const auto & arg) { return checkDAGUseless(*arg, context, atomic); });
{
/// Can't use std::all_of() because we have to call checkDAGUseless() for all arguments
/// to populate sets_to_prepare.
bool all_useless = true;
for (const auto & arg : arguments)
{
bool u = checkDAGUseless(*arg, context, sets_to_prepare, atomic);
all_useless = all_useless && u;
}
return all_useless;
}
else if (function_name == "or")
return std::any_of(arguments.begin(), arguments.end(), [&, atomic](const auto & arg) { return checkDAGUseless(*arg, context, atomic); });
return std::any_of(arguments.begin(), arguments.end(), [&, atomic](const auto & arg) { return checkDAGUseless(*arg, context, sets_to_prepare, atomic); });
else if (function_name == "not")
return checkDAGUseless(*arguments.at(0), context, atomic);
return checkDAGUseless(*arguments.at(0), context, sets_to_prepare, atomic);
else
return std::any_of(arguments.begin(), arguments.end(),
[&](const auto & arg) { return checkDAGUseless(*arg, context, true /*atomic*/); });
[&](const auto & arg) { return checkDAGUseless(*arg, context, sets_to_prepare, true /*atomic*/); });
}
auto column_name = tree_node.getColumnName();
return !key_columns.contains(column_name);
}
void MergeTreeIndexConditionSet::traverseAST(ASTPtr & node) const
{
if (operatorFromAST(node))
{
auto & args = node->as<ASTFunction>()->arguments->children;
for (auto & arg : args)
traverseAST(arg);
return;
}
if (atomFromAST(node))
{
if (node->as<ASTIdentifier>() || node->as<ASTFunction>())
/// __bitWrapperFunc* uses default implementation for Nullable types
/// Here we additionally convert Null to 0,
/// otherwise condition 'something OR NULL' will always return Null and filter everything.
node = makeASTFunction("__bitWrapperFunc", makeASTFunction("ifNull", node, std::make_shared<ASTLiteral>(Field(0))));
}
else
node = std::make_shared<ASTLiteral>(UNKNOWN_FIELD);
}
bool MergeTreeIndexConditionSet::atomFromAST(ASTPtr & node) const
{
/// Function, literal or column
if (node->as<ASTLiteral>())
return true;
if (const auto * identifier = node->as<ASTIdentifier>())
return key_columns.contains(identifier->getColumnName());
if (auto * func = node->as<ASTFunction>())
{
if (key_columns.contains(func->getColumnName()))
{
/// Function is already calculated.
node = std::make_shared<ASTIdentifier>(func->getColumnName());
return true;
}
auto & args = func->arguments->children;
for (auto & arg : args)
if (!atomFromAST(arg))
return false;
return true;
}
return false;
}
bool MergeTreeIndexConditionSet::operatorFromAST(ASTPtr & node)
{
/// Functions AND, OR, NOT. Replace with bit*.
auto * func = node->as<ASTFunction>();
if (!func)
return false;
auto & args = func->arguments->children;
if (func->name == "not")
{
if (args.size() != 1)
return false;
func->name = "__bitSwapLastTwo";
}
else if (func->name == "and" || func->name == "indexHint")
{
if (args.size() < 2)
return false;
auto last_arg = args.back();
args.pop_back();
ASTPtr new_func;
if (args.size() > 1)
new_func = makeASTFunction(
"__bitBoolMaskAnd",
node,
last_arg);
else
new_func = makeASTFunction(
"__bitBoolMaskAnd",
args.back(),
last_arg);
node = new_func;
}
else if (func->name == "or")
{
if (args.size() < 2)
return false;
auto last_arg = args.back();
args.pop_back();
ASTPtr new_func;
if (args.size() > 1)
new_func = makeASTFunction(
"__bitBoolMaskOr",
node,
last_arg);
else
new_func = makeASTFunction(
"__bitBoolMaskOr",
args.back(),
last_arg);
node = new_func;
}
else
return false;
return true;
}
bool MergeTreeIndexConditionSet::checkASTUseless(const ASTPtr & node, bool atomic) const
{
if (!node)
return true;
if (const auto * func = node->as<ASTFunction>())
{
if (key_columns.contains(func->getColumnName()))
return false;
const ASTs & args = func->arguments->children;
if (func->name == "and" || func->name == "indexHint")
return std::all_of(args.begin(), args.end(), [this, atomic](const auto & arg) { return checkASTUseless(arg, atomic); });
else if (func->name == "or")
return std::any_of(args.begin(), args.end(), [this, atomic](const auto & arg) { return checkASTUseless(arg, atomic); });
else if (func->name == "not")
return checkASTUseless(args[0], atomic);
else
return std::any_of(args.begin(), args.end(),
[this](const auto & arg) { return checkASTUseless(arg, true); });
}
else if (const auto * literal = node->as<ASTLiteral>())
return !atomic && literal->value.safeGet<bool>();
else if (const auto * identifier = node->as<ASTIdentifier>())
return !key_columns.contains(identifier->getColumnName());
else
return true;
}
MergeTreeIndexGranulePtr MergeTreeIndexSet::createIndexGranule() const
{

View File

@ -106,15 +106,7 @@ private:
const ContextPtr & context,
std::unordered_map<const ActionsDAG::Node *, const ActionsDAG::Node *> & node_to_result_node) const;
bool checkDAGUseless(const ActionsDAG::Node & node, const ContextPtr & context, bool atomic = false) const;
void traverseAST(ASTPtr & node) const;
bool atomFromAST(ASTPtr & node) const;
static bool operatorFromAST(ASTPtr & node);
bool checkASTUseless(const ASTPtr & node, bool atomic = false) const;
bool checkDAGUseless(const ActionsDAG::Node & node, const ContextPtr & context, std::vector<FutureSetPtr> & sets_to_prepare, bool atomic = false) const;
String index_name;
size_t max_rows;

View File

@ -261,9 +261,9 @@ void MergeTreeWhereOptimizer::analyzeImpl(Conditions & res, const RPNBuilderTree
cond.columns_size = getColumnsSize(cond.table_columns);
cond.viable =
!has_invalid_column &&
!has_invalid_column
/// Condition depend on some column. Constant expressions are not moved.
!cond.table_columns.empty()
&& !cond.table_columns.empty()
&& !cannotBeMoved(node, where_optimizer_context)
/// When use final, do not take into consideration the conditions with non-sorting keys. Because final select
/// need to use all sorting keys, it will cause correctness issues if we filter other columns before final merge.
@ -273,17 +273,15 @@ void MergeTreeWhereOptimizer::analyzeImpl(Conditions & res, const RPNBuilderTree
/// Do not move conditions involving all queried columns.
&& cond.table_columns.size() < queried_columns.size();
if (cond.viable)
cond.good = isConditionGood(node, table_columns);
if (where_optimizer_context.use_statistic)
{
cond.good = cond.viable;
cond.selectivity = estimator.estimateSelectivity(node);
if (node.getASTNode() != nullptr)
LOG_TEST(log, "Condition {} has selectivity {}", node.getASTNode()->dumpTree(), cond.selectivity);
LOG_TEST(log, "Condition {} has selectivity {}", node.getColumnName(), cond.selectivity);
}
else if (cond.viable)
{
cond.good = isConditionGood(node, table_columns);
}
if (where_optimizer_context.move_primary_key_columns_to_end_of_prewhere)
@ -363,6 +361,7 @@ std::optional<MergeTreeWhereOptimizer::OptimizeResult> MergeTreeWhereOptimizer::
/// Move condition and all other conditions depend on the same set of columns.
auto move_condition = [&](Conditions::iterator cond_it)
{
LOG_TRACE(log, "Condition {} moved to PREWHERE", cond_it->node.getColumnName());
prewhere_conditions.splice(prewhere_conditions.end(), where_conditions, cond_it);
total_size_of_moved_conditions += cond_it->columns_size;
total_number_of_moved_columns += cond_it->table_columns.size();
@ -371,9 +370,14 @@ std::optional<MergeTreeWhereOptimizer::OptimizeResult> MergeTreeWhereOptimizer::
for (auto jt = where_conditions.begin(); jt != where_conditions.end();)
{
if (jt->viable && jt->columns_size == cond_it->columns_size && jt->table_columns == cond_it->table_columns)
{
LOG_TRACE(log, "Condition {} moved to PREWHERE", jt->node.getColumnName());
prewhere_conditions.splice(prewhere_conditions.end(), where_conditions, jt++);
}
else
{
++jt;
}
}
};

View File

@ -424,9 +424,9 @@ RPNBuilderTreeNode RPNBuilderFunctionTreeNode::getArgumentAt(size_t index) const
// because they are used only for index analysis.
if (dag_node->function_base->getName() == "indexHint")
{
const auto * adaptor = typeid_cast<const FunctionToFunctionBaseAdaptor *>(dag_node->function_base.get());
const auto * index_hint = typeid_cast<const FunctionIndexHint *>(adaptor->getFunction().get());
return RPNBuilderTreeNode(index_hint->getActions()->getOutputs()[index], tree_context);
const auto & adaptor = typeid_cast<const FunctionToFunctionBaseAdaptor &>(*dag_node->function_base);
const auto & index_hint = typeid_cast<const FunctionIndexHint &>(*adaptor.getFunction());
return RPNBuilderTreeNode(index_hint.getActions()->getOutputs()[index], tree_context);
}
return RPNBuilderTreeNode(dag_node->children[index], tree_context);

View File

@ -112,7 +112,7 @@ Float64 ConditionEstimator::estimateSelectivity(const RPNBuilderTreeNode & node)
auto [op, val] = extractBinaryOp(node, col);
if (op == "equals")
{
if (val < - threshold || val > threshold)
if (val < -threshold || val > threshold)
return default_normal_cond_factor;
else
return default_good_cond_factor;

View File

@ -22,6 +22,31 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
};
StatisticDescription & StatisticDescription::operator=(const StatisticDescription & other)
{
if (this == &other)
return *this;
type = other.type;
column_name = other.column_name;
ast = other.ast ? other.ast->clone() : nullptr;
return *this;
}
StatisticDescription & StatisticDescription::operator=(StatisticDescription && other) noexcept
{
if (this == &other)
return *this;
type = std::exchange(other.type, StatisticType{});
column_name = std::move(other.column_name);
ast = other.ast ? other.ast->clone() : nullptr;
other.ast.reset();
return *this;
}
StatisticType stringToType(String type)
{
if (type == "tdigest")
@ -55,15 +80,7 @@ std::vector<StatisticDescription> StatisticDescription::getStatisticsFromAST(con
const auto & column = columns.getPhysical(column_name);
stat.column_name = column.name;
auto function_node = std::make_shared<ASTFunction>();
function_node->name = "STATISTIC";
function_node->arguments = std::make_shared<ASTExpressionList>();
function_node->arguments->children.push_back(std::make_shared<ASTIdentifier>(stat_definition->type));
function_node->children.push_back(function_node->arguments);
stat.ast = function_node;
stat.ast = makeASTFunction("STATISTIC", std::make_shared<ASTIdentifier>(stat_definition->type));
stats.push_back(stat);
}
@ -80,6 +97,7 @@ StatisticDescription StatisticDescription::getStatisticFromColumnDeclaration(con
const auto & stat_type_list_ast = column.stat_type->as<ASTFunction &>().arguments;
if (stat_type_list_ast->children.size() != 1)
throw Exception(ErrorCodes::INCORRECT_QUERY, "We expect only one statistic type for column {}", queryToString(column));
const auto & stat_type = stat_type_list_ast->children[0]->as<ASTFunction &>().name;
StatisticDescription stat;

View File

@ -27,6 +27,10 @@ struct StatisticDescription
String getTypeName() const;
StatisticDescription() = default;
StatisticDescription(const StatisticDescription & other) { *this = other; }
StatisticDescription & operator=(const StatisticDescription & other);
StatisticDescription(StatisticDescription && other) noexcept { *this = std::move(other); }
StatisticDescription & operator=(StatisticDescription && other) noexcept;
bool operator==(const StatisticDescription & other) const
{

View File

@ -274,7 +274,7 @@ std::unique_ptr<ReadBuffer> selectReadBuffer(
if (S_ISREG(file_stat.st_mode) && (read_method == LocalFSReadMethod::pread || read_method == LocalFSReadMethod::mmap))
{
if (use_table_fd)
res = std::make_unique<ReadBufferFromFileDescriptorPRead>(table_fd);
res = std::make_unique<ReadBufferFromFileDescriptorPRead>(table_fd, context->getSettingsRef().max_read_buffer_size);
else
res = std::make_unique<ReadBufferFromFilePRead>(current_path, context->getSettingsRef().max_read_buffer_size);
@ -296,7 +296,7 @@ std::unique_ptr<ReadBuffer> selectReadBuffer(
else
{
if (use_table_fd)
res = std::make_unique<ReadBufferFromFileDescriptor>(table_fd);
res = std::make_unique<ReadBufferFromFileDescriptor>(table_fd, context->getSettingsRef().max_read_buffer_size);
else
res = std::make_unique<ReadBufferFromFile>(current_path, context->getSettingsRef().max_read_buffer_size);
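
Both branches above now forward the configured `max_read_buffer_size` when reading through a table file descriptor; previously only the path-based constructors received it, so fd-based reads silently fell back to the class's default buffer size. A minimal sketch of that pitfall, assuming a toy `Reader` whose constructor defaults its buffer size (illustrative only, not the ClickHouse classes):

```cpp
#include <cstddef>
#include <iostream>

// Toy stand-in for a buffered reader: the second parameter has a default,
// so call sites that omit it silently get DEFAULT_SIZE regardless of the query settings.
struct Reader
{
    static constexpr size_t DEFAULT_SIZE = 1 << 20; // purely illustrative default
    size_t buffer_size;
    explicit Reader(int /*fd*/, size_t buffer_size_ = DEFAULT_SIZE) : buffer_size(buffer_size_) {}
};

int main()
{
    const size_t max_read_buffer_size = 64 * 1024; // what the settings ask for

    Reader before(3);                      // setting ignored: uses DEFAULT_SIZE
    Reader after(3, max_read_buffer_size); // setting forwarded, as in the fix above

    std::cout << before.buffer_size << ' ' << after.buffer_size << '\n';
}
```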

View File

@ -320,6 +320,8 @@ QueryTreeNodePtr buildQueryTreeForShard(const PlannerContextPtr & planner_contex
auto replacement_map = visitor.getReplacementMap();
const auto & global_in_or_join_nodes = visitor.getGlobalInOrJoinNodes();
QueryTreeNodePtrWithHashMap<TableNodePtr> global_in_temporary_tables;
for (const auto & global_in_or_join_node : global_in_or_join_nodes)
{
if (auto * join_node = global_in_or_join_node.query_node->as<JoinNode>())
@ -364,15 +366,19 @@ QueryTreeNodePtr buildQueryTreeForShard(const PlannerContextPtr & planner_contex
if (in_function_node_type != QueryTreeNodeType::QUERY && in_function_node_type != QueryTreeNodeType::UNION && in_function_node_type != QueryTreeNodeType::TABLE)
continue;
auto subquery_to_execute = in_function_subquery_node;
if (subquery_to_execute->as<TableNode>())
subquery_to_execute = buildSubqueryToReadColumnsFromTableExpression(subquery_to_execute, planner_context->getQueryContext());
auto & temporary_table_expression_node = global_in_temporary_tables[in_function_subquery_node];
if (!temporary_table_expression_node)
{
auto subquery_to_execute = in_function_subquery_node;
if (subquery_to_execute->as<TableNode>())
subquery_to_execute = buildSubqueryToReadColumnsFromTableExpression(subquery_to_execute, planner_context->getQueryContext());
auto temporary_table_expression_node = executeSubqueryNode(subquery_to_execute,
planner_context->getMutableQueryContext(),
global_in_or_join_node.subquery_depth);
temporary_table_expression_node = executeSubqueryNode(subquery_to_execute,
planner_context->getMutableQueryContext(),
global_in_or_join_node.subquery_depth);
}
in_function_subquery_node = std::move(temporary_table_expression_node);
replacement_map.emplace(in_function_subquery_node.get(), temporary_table_expression_node);
}
else
{
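
The rewritten loop above caches the temporary table built for each GLOBAL IN subquery in `global_in_temporary_tables`, keyed by the query tree node, so a subquery that occurs several times is executed once and then reused. A minimal sketch of that memoization pattern with a plain `std::unordered_map`; the `executeSubquery` stand-in and the key type are illustrative assumptions:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

using Subquery = std::string;
using TempTable = std::shared_ptr<std::string>;

// Illustrative stand-in for the expensive step (building and filling a temporary table).
TempTable executeSubquery(const Subquery & subquery)
{
    std::cout << "executing: " << subquery << '\n';
    return std::make_shared<std::string>("tmp_for_" + subquery);
}

int main()
{
    std::unordered_map<Subquery, TempTable> temporary_tables;
    const Subquery nodes[] = {"SELECT id FROM users", "SELECT id FROM users", "SELECT id FROM orders"};

    for (const auto & node : nodes)
    {
        // operator[] default-constructs the slot on first access; the subquery is only executed
        // when the cached entry is still empty, exactly like the map lookup in the hunk above.
        auto & temp_table = temporary_tables[node];
        if (!temp_table)
            temp_table = executeSubquery(node);
        std::cout << "using " << *temp_table << '\n';
    }
}
```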

View File

@ -108,7 +108,7 @@ namespace DB
bool is_insert_query) const
{
StoragePtr storage;
if (!loop_table_name.empty())
if (!inner_table_function_ast)
{
String database_name = loop_database_name;
if (database_name.empty())
@ -119,7 +119,6 @@ namespace DB
if (!storage)
throw Exception(ErrorCodes::UNKNOWN_TABLE, "Table '{}' not found in database '{}'", loop_table_name, database_name);
}
else
{
auto inner_table_function = TableFunctionFactory::instance().get(inner_table_function_ast, context);
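
The condition above now branches on whether an inner table-function AST was captured rather than on the table name being non-empty, so the nested form (a table function wrapped inside `loop`) and the name-based form stay on their intended paths. A minimal sketch of that dispatch, with purely illustrative types and names:

```cpp
#include <iostream>
#include <memory>
#include <string>

// Illustrative shape only: a captured inner expression wins over a name lookup.
struct InnerExpression { std::string text; };

std::string resolve(const std::shared_ptr<InnerExpression> & inner, const std::string & database, const std::string & table)
{
    if (!inner)
    {
        // Name-based form: fall back to the current database when none was given.
        const std::string effective_db = database.empty() ? "default" : database;
        return "storage " + effective_db + "." + table;
    }
    // Nested form: evaluate the inner table function instead of looking up a table by name.
    return "inner " + inner->text;
}

int main()
{
    std::cout << resolve(nullptr, "", "events") << '\n';
    std::cout << resolve(std::make_shared<InnerExpression>(InnerExpression{"numbers(10)"}), "", "") << '\n';
}
```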

View File

@ -61,7 +61,6 @@ def test_big_family(client: KeeperClient):
)
response = client.find_big_family("/test_big_family", 2)
assert response == TSV(
[
["/test_big_family", "11"],
@ -87,7 +86,12 @@ def test_find_super_nodes(client: KeeperClient):
client.cd("/test_find_super_nodes")
response = client.find_super_nodes(4)
assert response == TSV(
# The order of the response is not guaranteed, so we need to sort it
normalized_response = response.strip().split("\n")
normalized_response.sort()
assert TSV(normalized_response) == TSV(
[
["/test_find_super_nodes/1", "5"],
["/test_find_super_nodes/2", "4"],

View File

@ -0,0 +1,36 @@
<clickhouse>
<logger>
<level>test</level>
</logger>
<storage_configuration>
<disks>
<s3>
<type>s3</type>
<endpoint>http://minio1:9001/root/data/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</s3>
</disks>
<policies>
<default>
<default>
<disk>default</disk>
</default>
</default>
<s3>
<volumes>
<default>
<disk>default</disk>
<perform_ttl_move_on_insert>False</perform_ttl_move_on_insert>
</default>
<main>
<disk>s3</disk>
<perform_ttl_move_on_insert>False</perform_ttl_move_on_insert>
</main>
</volumes>
<move_factor>0.0</move_factor>
</s3>
</policies>
</storage_configuration>
</clickhouse>

View File

@ -0,0 +1,105 @@
#!/usr/bin/env python3
import logging
import random
import string
import time
import pytest
from helpers.cluster import ClickHouseCluster
import minio
cluster = ClickHouseCluster(__file__)
@pytest.fixture(scope="module")
def started_cluster():
try:
cluster.add_instance(
"node1",
main_configs=["configs/storage_conf.xml"],
image="clickhouse/clickhouse-server",
with_minio=True,
tag="24.1",
stay_alive=True,
with_installed_binary=True,
)
cluster.start()
yield cluster
finally:
cluster.shutdown()
def test_bc_compatibility(started_cluster):
node1 = cluster.instances["node1"]
node1.query(
"""
CREATE TABLE test_ttl_table (
generation UInt64,
date_key DateTime,
number UInt64,
text String,
expired DateTime DEFAULT now()
)
ENGINE=MergeTree
ORDER BY (generation, date_key)
PARTITION BY toMonth(date_key)
TTL expired + INTERVAL 20 SECONDS TO DISK 's3'
SETTINGS storage_policy = 's3';
"""
)
node1.query(
"""
INSERT INTO test_ttl_table (
generation,
date_key,
number,
text
)
SELECT
1,
toDateTime('2000-01-01 00:00:00') + rand(number) % 365 * 86400,
number,
toString(number)
FROM numbers(10000);
"""
)
disks = (
node1.query(
"""
SELECT distinct disk_name
FROM system.parts
WHERE table = 'test_ttl_table'
"""
)
.strip()
.split("\n")
)
print("Disks before", disks)
assert len(disks) == 1
assert disks[0] == "default"
node1.restart_with_latest_version()
for _ in range(60):
disks = (
node1.query(
"""
SELECT distinct disk_name
FROM system.parts
WHERE table = 'test_ttl_table'
"""
)
.strip()
.split("\n")
)
print("Disks after", disks)
if "s3" in disks:
break
time.sleep(1)
assert "s3" in disks

View File

@ -12,6 +12,6 @@ INSERT INTO cast_enums SELECT 2 AS type, toDate('2017-01-01') AS date, number AS
SELECT type, date, id FROM cast_enums ORDER BY type, id;
INSERT INTO cast_enums VALUES ('wrong_value', '2017-01-02', 7); -- { clientError 691 }
INSERT INTO cast_enums VALUES ('wrong_value', '2017-01-02', 7); -- { clientError UNKNOWN_ELEMENT_OF_ENUM }
DROP TABLE IF EXISTS cast_enums;
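
From this point on, most of the remaining test changes are the same mechanical substitution: hard-coded numeric `clientError` codes are replaced by their symbolic names, which read better and survive any renumbering. The sketch below simply tabulates the name-to-number pairs that can be read off the before/after lines in the surrounding hunks; it is an illustrative lookup, not a ClickHouse API:

```cpp
#include <iostream>
#include <map>
#include <string>

int main()
{
    // Symbolic name -> numeric code, as implied by the paired old/new lines in this diff.
    const std::map<std::string, int> error_codes = {
        {"CANNOT_PARSE_TEXT", 6},
        {"CANNOT_PARSE_INPUT_ASSERTION_FAILED", 27},
        {"BAD_ARGUMENTS", 36},
        {"ILLEGAL_TYPE_OF_ARGUMENT", 43},
        {"UNKNOWN_FUNCTION", 46},
        {"UNKNOWN_IDENTIFIER", 47},
        {"SYNTAX_ERROR", 62},
        {"ARGUMENT_OUT_OF_BOUND", 69},
        {"CANNOT_PARSE_NUMBER", 72},
        {"FILE_DOESNT_EXIST", 107},
        {"CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN", 349},
        {"INVALID_TEMPLATE_FORMAT", 474},
        {"UNKNOWN_ELEMENT_OF_ENUM", 691},
    };

    for (const auto & [name, code] : error_codes)
        std::cout << code << " -> " << name << '\n';
}
```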

View File

@ -18,26 +18,26 @@ CREATE TABLE IF NOT EXISTS decimal
j DECIMAL(1,0)
) ENGINE = Memory;
INSERT INTO decimal (a) VALUES (1000000000); -- { clientError 69 }
INSERT INTO decimal (a) VALUES (-1000000000); -- { clientError 69 }
INSERT INTO decimal (b) VALUES (1000000000000000000); -- { clientError 69 }
INSERT INTO decimal (b) VALUES (-1000000000000000000); -- { clientError 69 }
INSERT INTO decimal (c) VALUES (100000000000000000000000000000000000000); -- { clientError 69 }
INSERT INTO decimal (c) VALUES (-100000000000000000000000000000000000000); -- { clientError 69 }
INSERT INTO decimal (d) VALUES (1); -- { clientError 69 }
INSERT INTO decimal (d) VALUES (-1); -- { clientError 69 }
INSERT INTO decimal (e) VALUES (1000000000000000000); -- { clientError 69 }
INSERT INTO decimal (e) VALUES (-1000000000000000000); -- { clientError 69 }
INSERT INTO decimal (f) VALUES (1); -- { clientError 69 }
INSERT INTO decimal (f) VALUES (-1); -- { clientError 69 }
INSERT INTO decimal (g) VALUES (10000); -- { clientError 69 }
INSERT INTO decimal (g) VALUES (-10000); -- { clientError 69 }
INSERT INTO decimal (h) VALUES (1000000000); -- { clientError 69 }
INSERT INTO decimal (h) VALUES (-1000000000); -- { clientError 69 }
INSERT INTO decimal (i) VALUES (100000000000000000000); -- { clientError 69 }
INSERT INTO decimal (i) VALUES (-100000000000000000000); -- { clientError 69 }
INSERT INTO decimal (j) VALUES (10); -- { clientError 69 }
INSERT INTO decimal (j) VALUES (-10); -- { clientError 69 }
INSERT INTO decimal (a) VALUES (1000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (a) VALUES (-1000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (b) VALUES (1000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (b) VALUES (-1000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (c) VALUES (100000000000000000000000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (c) VALUES (-100000000000000000000000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (d) VALUES (1); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (d) VALUES (-1); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (e) VALUES (1000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (e) VALUES (-1000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (f) VALUES (1); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (f) VALUES (-1); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (g) VALUES (10000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (g) VALUES (-10000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (h) VALUES (1000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (h) VALUES (-1000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (i) VALUES (100000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (i) VALUES (-100000000000000000000); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (j) VALUES (10); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (j) VALUES (-10); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (a) VALUES (0.1);
INSERT INTO decimal (a) VALUES (-0.1);
@ -84,14 +84,14 @@ INSERT INTO decimal (a, b, c, d, e, f, g, h, i, j) VALUES (0.0, 0.0, 0.0, 0.0, 0
INSERT INTO decimal (a, b, c, d, e, f, g, h, i, j) VALUES (-0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0);
INSERT INTO decimal (a, b, g) VALUES ('42.00000', 42.0000000000000000000000000000000, '0.999990');
INSERT INTO decimal (a) VALUES ('-9x'); -- { clientError 6 }
INSERT INTO decimal (a) VALUES ('0x1'); -- { clientError 6 }
INSERT INTO decimal (a) VALUES ('-9x'); -- { clientError CANNOT_PARSE_TEXT }
INSERT INTO decimal (a) VALUES ('0x1'); -- { clientError CANNOT_PARSE_TEXT }
INSERT INTO decimal (a, b, c, d, e, f) VALUES ('0.9e9', '0.9e18', '0.9e38', '9e-9', '9e-18', '9e-38');
INSERT INTO decimal (a, b, c, d, e, f) VALUES ('-0.9e9', '-0.9e18', '-0.9e38', '-9e-9', '-9e-18', '-9e-38');
INSERT INTO decimal (a, b, c, d, e, f) VALUES ('1e9', '1e18', '1e38', '1e-10', '1e-19', '1e-39'); -- { clientError 69 }
INSERT INTO decimal (a, b, c, d, e, f) VALUES ('-1e9', '-1e18', '-1e38', '-1e-10', '-1e-19', '-1e-39'); -- { clientError 69 }
INSERT INTO decimal (a, b, c, d, e, f) VALUES ('1e9', '1e18', '1e38', '1e-10', '1e-19', '1e-39'); -- { clientError ARGUMENT_OUT_OF_BOUND }
INSERT INTO decimal (a, b, c, d, e, f) VALUES ('-1e9', '-1e18', '-1e38', '-1e-10', '-1e-19', '-1e-39'); -- { clientError ARGUMENT_OUT_OF_BOUND }
SELECT * FROM decimal ORDER BY a, b, c, d, e, f, g, h, i, j;
DROP TABLE IF EXISTS decimal;

View File

@ -5,7 +5,7 @@ set input_format_null_as_default=0;
CREATE TABLE arraytest ( created_date Date DEFAULT toDate(created_at), created_at DateTime DEFAULT now(), strings Array(String) DEFAULT emptyArrayString()) ENGINE = MergeTree(created_date, cityHash64(created_at), (created_date, cityHash64(created_at)), 8192);
INSERT INTO arraytest (created_at, strings) VALUES (now(), ['aaaaa', 'bbbbb', 'ccccc']);
INSERT INTO arraytest (created_at, strings) VALUES (now(), ['aaaaa', 'bbbbb', null]); -- { clientError 349 }
INSERT INTO arraytest (created_at, strings) VALUES (now(), ['aaaaa', 'bbbbb', null]); -- { clientError CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN }
SELECT strings from arraytest;

View File

@ -23,9 +23,9 @@ INSERT INTO values_template VALUES ((1), lower(replaceAll('Hella', 'a', 'o')), 1
INSERT INTO values_template_nullable VALUES ((1), lower(replaceAll('Hella', 'a', 'o')), 1 + 2 + 3, arraySort(x -> assumeNotNull(x), [null, NULL::Nullable(UInt8)])), ((2), lower(replaceAll('Warld', 'b', 'o')), 4 - 5 + 6, arraySort(x -> assumeNotNull(x), [+1, -1, Null])), ((3), lower(replaceAll('Test', 'c', 'o')), 3 + 2 - 1, arraySort(x -> assumeNotNull(x), [1, nUlL, 3.14])), ((4), lower(replaceAll(null, 'c', 'o')), 6 + 5 - null, arraySort(x -> assumeNotNull(x), [3, 2, 1]));
INSERT INTO values_template_fallback VALUES (1 + x); -- { clientError 62 }
INSERT INTO values_template_fallback VALUES (abs(functionThatDoesNotExists(42))); -- { clientError 46 }
INSERT INTO values_template_fallback VALUES ([1]); -- { clientError 43 }
INSERT INTO values_template_fallback VALUES (1 + x); -- { clientError SYNTAX_ERROR }
INSERT INTO values_template_fallback VALUES (abs(functionThatDoesNotExists(42))); -- { clientError UNKNOWN_FUNCTION }
INSERT INTO values_template_fallback VALUES ([1]); -- { clientError ILLEGAL_TYPE_OF_ARGUMENT }
INSERT INTO values_template_fallback VALUES (CAST(1, 'UInt8')), (CAST('2', 'UInt8'));
SET input_format_values_accurate_types_of_literals = 0;

View File

@ -1,2 +1,2 @@
select 1 format Template settings format_template_row='01070_nonexistent_file.txt'; -- { clientError 107 }
select 1 format Template settings format_template_row='/dev/null'; -- { clientError 474 }
select 1 format Template settings format_template_row='01070_nonexistent_file.txt'; -- { clientError FILE_DOESNT_EXIST }
select 1 format Template settings format_template_row='/dev/null'; -- { clientError INVALID_TEMPLATE_FORMAT }

View File

@ -4,7 +4,7 @@ create table rmt1 (d DateTime, n int) engine=ReplicatedMergeTree('/test/01165/{d
create table rmt2 (d DateTime, n int) engine=ReplicatedMergeTree('/test/01165/{database}/rmt', '2') order by n partition by toYYYYMMDD(d);
system stop replicated sends rmt1;
insert into rmt1 values (now(), arrayJoin([1, 2])); -- { clientError 36 }
insert into rmt1 values (now(), arrayJoin([1, 2])); -- { clientError BAD_ARGUMENTS }
insert into rmt1(n) select * from system.numbers limit arrayJoin([1, 2]); -- { serverError BAD_ARGUMENTS, INVALID_LIMIT_EXPRESSION }
insert into rmt1 values (now(), rand());
drop table rmt1;

View File

@ -54,7 +54,7 @@ begin transaction;
insert into mt1 values (6);
insert into mt2 values (60);
select 'on session close', arraySort(groupArray(n)) from (select n from mt1 union all select * from mt2);
insert into mt1 values ([1]); -- { clientError 43 }
insert into mt1 values ([1]); -- { clientError ILLEGAL_TYPE_OF_ARGUMENT }
-- INSERT failures do not produce a client reconnect anymore, so rollback can be done
rollback;

View File

@ -7,7 +7,7 @@ drop table if exists mt;
attach table test from 'some/path' (n UInt8) engine=Memory; -- { serverError NOT_IMPLEMENTED }
attach table test from '/etc/passwd' (s String) engine=File(TSVRaw); -- { serverError PATH_ACCESS_DENIED }
attach table test from '../../../../../../../../../etc/passwd' (s String) engine=File(TSVRaw); -- { serverError PATH_ACCESS_DENIED }
attach table test from 42 (s String) engine=File(TSVRaw); -- { clientError 62 }
attach table test from 42 (s String) engine=File(TSVRaw); -- { clientError SYNTAX_ERROR }
insert into table function file('01188_attach/file/data.TSV', 'TSV', 's String, n UInt8') values ('file', 42);
attach table file from '01188_attach/file' (s String, n UInt8) engine=File(TSV);

View File

@ -156,8 +156,8 @@ CREATE QUOTA q13_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '12G';
CREATE QUOTA q14_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '12Gi';
CREATE QUOTA q15_01297 FOR INTERVAL 1 MINUTE MAX query_selects = 1.5;
CREATE QUOTA q16_01297 FOR INTERVAL 1 MINUTE MAX execution_time = 1.5;
CREATE QUOTA q17_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1.5'; -- { clientError 27 }
CREATE QUOTA q18_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1.5'; -- { clientError 27 }
CREATE QUOTA q17_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1.5'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q18_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1.5'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
SHOW CREATE QUOTA q1_01297;
SHOW CREATE QUOTA q2_01297;
SHOW CREATE QUOTA q3_01297;
@ -205,8 +205,8 @@ SHOW CREATE QUOTA q2_01297;
DROP QUOTA IF EXISTS q1_01297;
DROP QUOTA IF EXISTS q2_01297;
SELECT '-- underflow test';
CREATE QUOTA q1_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '-1'; -- { clientError 72 }
CREATE QUOTA q2_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '-1'; -- { clientError 72 }
CREATE QUOTA q1_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '-1'; -- { clientError CANNOT_PARSE_NUMBER }
CREATE QUOTA q2_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '-1'; -- { clientError CANNOT_PARSE_NUMBER }
SELECT '-- syntax test';
CREATE QUOTA q1_01297 FOR INTERVAL 1 MINUTE MAX query_selects = ' 12 ';
CREATE QUOTA q2_01297 FOR INTERVAL 1 MINUTE MAX execution_time = ' 12 ';
@ -239,11 +239,11 @@ DROP QUOTA IF EXISTS q8_01297;
DROP QUOTA IF EXISTS q9_01297;
DROP QUOTA IF EXISTS q10_01297;
SELECT '-- bad syntax test';
CREATE QUOTA q1_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1 1'; -- { clientError 27 }
CREATE QUOTA q2_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1 1'; -- { clientError 27 }
CREATE QUOTA q3_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1K 1'; -- { clientError 27 }
CREATE QUOTA q4_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1K 1'; -- { clientError 27 }
CREATE QUOTA q5_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1K1'; -- { clientError 27 }
CREATE QUOTA q6_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1K1'; -- { clientError 27 }
CREATE QUOTA q7_01297 FOR INTERVAL 1 MINUTE MAX query_selects = 'foo'; -- { clientError 27 }
CREATE QUOTA q8_01297 FOR INTERVAL 1 MINUTE MAX execution_time = 'bar'; -- { clientError 27 }
CREATE QUOTA q1_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1 1'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q2_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1 1'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q3_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1K 1'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q4_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1K 1'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q5_01297 FOR INTERVAL 1 MINUTE MAX query_selects = '1K1'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q6_01297 FOR INTERVAL 1 MINUTE MAX execution_time = '1K1'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q7_01297 FOR INTERVAL 1 MINUTE MAX query_selects = 'foo'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }
CREATE QUOTA q8_01297 FOR INTERVAL 1 MINUTE MAX execution_time = 'bar'; -- { clientError CANNOT_PARSE_INPUT_ASSERTION_FAILED }

View File

@ -3,11 +3,11 @@ create table values_01564(
a int,
constraint c1 check a < 10) engine Memory;
-- client error hint after broken insert values
insert into values_01564 values ('f'); -- { clientError 6 }
insert into values_01564 values ('f'); -- { clientError 6 }
insert into values_01564 values ('f'); -- { clientError CANNOT_PARSE_TEXT }
insert into values_01564 values ('f'); -- { clientError CANNOT_PARSE_TEXT }
select 1;
1
insert into values_01564 values ('f'); -- { clientError 6 }
insert into values_01564 values ('f'); -- { clientError CANNOT_PARSE_TEXT }
select nonexistent column; -- { serverError UNKNOWN_IDENTIFIER }
select 1;
1
@ -25,7 +25,7 @@ select 1;
1
-- a failing insert and then a normal insert (#https://github.com/ClickHouse/ClickHouse/issues/19353)
CREATE TABLE t0 (c0 String, c1 Int32) ENGINE = Memory() ;
INSERT INTO t0(c0, c1) VALUES ("1",1) ; -- { clientError 47 }
INSERT INTO t0(c0, c1) VALUES ("1",1) ; -- { clientError UNKNOWN_IDENTIFIER }
INSERT INTO t0(c0, c1) VALUES ('1', 1) ;
-- the return code must be zero after the final query has failed with expected error
insert into values_01564 values (11); -- { serverError VIOLATED_CONSTRAINT }

View File

@ -4,21 +4,21 @@ create table values_01564(
constraint c1 check a < 10) engine Memory;
-- client error hint after broken insert values
insert into values_01564 values ('f'); -- { clientError 6 }
insert into values_01564 values ('f'); -- { clientError CANNOT_PARSE_TEXT }
insert into values_01564 values ('f'); -- { clientError 6 }
insert into values_01564 values ('f'); -- { clientError CANNOT_PARSE_TEXT }
select 1;
insert into values_01564 values ('f'); -- { clientError 6 }
insert into values_01564 values ('f'); -- { clientError CANNOT_PARSE_TEXT }
select nonexistent column; -- { serverError UNKNOWN_IDENTIFIER }
-- syntax error hint after broken insert values
insert into values_01564 this is bad syntax values ('f'); -- { clientError 62 }
insert into values_01564 this is bad syntax values ('f'); -- { clientError SYNTAX_ERROR }
insert into values_01564 this is bad syntax values ('f'); -- { clientError 62 }
insert into values_01564 this is bad syntax values ('f'); -- { clientError SYNTAX_ERROR }
select 1;
insert into values_01564 this is bad syntax values ('f'); -- { clientError 62 }
insert into values_01564 this is bad syntax values ('f'); -- { clientError SYNTAX_ERROR }
select nonexistent column; -- { serverError UNKNOWN_IDENTIFIER }
-- server error hint after broken insert values (violated constraint)
@ -37,14 +37,14 @@ insert into values_01564 values (1); select 1;
-- insert into values_01564 values (11) /*{ serverError VIOLATED_CONSTRAINT }*/; select 1;
-- syntax error, where the last token we can parse is long before the semicolon.
select this is too many words for an alias; -- { clientError 62 }
OPTIMIZE TABLE values_01564 DEDUPLICATE BY; -- { clientError 62 }
OPTIMIZE TABLE values_01564 DEDUPLICATE BY a EXCEPT a; -- { clientError 62 }
select 'a' || distinct one || 'c' from system.one; -- { clientError 62 }
select this is too many words for an alias; -- { clientError SYNTAX_ERROR }
OPTIMIZE TABLE values_01564 DEDUPLICATE BY; -- { clientError SYNTAX_ERROR }
OPTIMIZE TABLE values_01564 DEDUPLICATE BY a EXCEPT a; -- { clientError SYNTAX_ERROR }
select 'a' || distinct one || 'c' from system.one; -- { clientError SYNTAX_ERROR }
-- a failing insert and then a normal insert (#https://github.com/ClickHouse/ClickHouse/issues/19353)
CREATE TABLE t0 (c0 String, c1 Int32) ENGINE = Memory() ;
INSERT INTO t0(c0, c1) VALUES ("1",1) ; -- { clientError 47 }
INSERT INTO t0(c0, c1) VALUES ("1",1) ; -- { clientError UNKNOWN_IDENTIFIER }
INSERT INTO t0(c0, c1) VALUES ('1', 1) ;
-- the return code must be zero after the final query has failed with expected error

View File

@ -32,10 +32,10 @@ OPTIMIZE TABLE full_duplicates DEDUPLICATE BY * EXCEPT(pk); -- { serverError THE
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY * EXCEPT(sk); -- { serverError THERE_IS_NO_COLUMN } -- sorting key column is missing [1]
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY * EXCEPT(partition_key); -- { serverError THERE_IS_NO_COLUMN } -- partitioning column is missing [1]
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY; -- { clientError 62 } -- empty list is a syntax error
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk,sk,val,mat EXCEPT mat; -- { clientError 62 } -- invalid syntax
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk APPLY(pk + 1); -- { clientError 62 } -- APPLY column transformer is not supported
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk REPLACE(pk + 1); -- { clientError 62 } -- REPLACE column transformer is not supported
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY; -- { clientError SYNTAX_ERROR } -- empty list is a syntax error
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk,sk,val,mat EXCEPT mat; -- { clientError SYNTAX_ERROR } -- invalid syntax
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk APPLY(pk + 1); -- { clientError SYNTAX_ERROR } -- APPLY column transformer is not supported
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk REPLACE(pk + 1); -- { clientError SYNTAX_ERROR } -- REPLACE column transformer is not supported
-- Valid cases
-- NOTE: here and below we need FINAL to force deduplication in such a small set of data in only 1 part.

View File

@ -28,13 +28,13 @@ SHOW CREATE VIEW test_1602.tbl; -- { serverError BAD_ARGUMENTS }
SHOW CREATE TEMPORARY VIEW; -- { serverError UNKNOWN_TABLE }
SHOW CREATE VIEW; -- { clientError 62 }
SHOW CREATE VIEW; -- { clientError SYNTAX_ERROR }
SHOW CREATE DATABASE; -- { clientError 62 }
SHOW CREATE DATABASE; -- { clientError SYNTAX_ERROR }
SHOW CREATE DICTIONARY; -- { clientError 62 }
SHOW CREATE DICTIONARY; -- { clientError SYNTAX_ERROR }
SHOW CREATE TABLE; -- { clientError 62 }
SHOW CREATE TABLE; -- { clientError SYNTAX_ERROR }
SHOW CREATE test_1602.VIEW;

View File

@ -1,3 +1,3 @@
explain ast; -- { clientError 62 }
explain ast; -- { clientError SYNTAX_ERROR }
explain ast alter table t1 delete where date = today();
explain ast create function double AS (n) -> 2*n;

View File

@ -1,3 +1,3 @@
SELECT view(SELECT 1); -- { clientError 62 }
SELECT view(SELECT 1); -- { clientError SYNTAX_ERROR }
SELECT sumIf(dummy, dummy) FROM remote('127.0.0.{1,2}', numbers(2, 100), view(SELECT CAST(NULL, 'Nullable(UInt8)') AS dummy FROM system.one)); -- { serverError UNKNOWN_FUNCTION }

Some files were not shown because too many files have changed in this diff.