Merge branch 'master' into ci_py_small_refactoring

This commit is contained in:
Max K 2024-06-04 18:57:43 +02:00
commit f0f3d533c6
207 changed files with 6130 additions and 3311 deletions

View File

@@ -10,3 +10,11 @@ assignees: ''
> Make sure to check the documentation https://clickhouse.com/docs/en/ first. If the question is concise and likely has a short answer, asking it in [community Slack](https://join.slack.com/t/clickhousedb/shared_invite/zt-1gh9ds7f4-PgDhJAaF8ad5RbWBAAjzFg) is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with the "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse
> If you still prefer GitHub issues, remove all this text and ask your question here.
**Company or project name**
Put your company name or project description here
**Question**
Your question

View File

@@ -9,6 +9,10 @@ assignees: ''
> (you don't have to strictly follow this form)
**Company or project name**
> Put your company name or project description here
**Use case**
> A clear and concise description of what the intended usage scenario is.

View File

@@ -9,6 +9,10 @@ assignees: ''
(you don't have to strictly follow this form)
**Company or project name**
Put your company name or project description here
**Describe the unexpected behaviour**
A clear and concise description of what does not work as it is supposed to.

View File

@@ -9,6 +9,10 @@ assignees: ''
(you don't have to strictly follow this form)
**Company or project name**
Put your company name or project description here
**Describe the unexpected behaviour**
A clear and concise description of what does not work as it is supposed to.

View File

@@ -9,6 +9,9 @@ assignees: ''
(you don't have to strictly follow this form)
**Company or project name**
Put your company name or project description here
**Describe the issue**
A clear and concise description of what does not work as it is supposed to.

View File

@@ -9,6 +9,10 @@ assignees: ''
> Make sure that `git diff` result is empty and you've just pulled fresh master. Try cleaning up cmake cache. Just in case, official build instructions are published here: https://clickhouse.com/docs/en/development/build/
**Company or project name**
> Put your company name or project description here
**Operating system**
> OS kind or distribution, specific version/release, non-standard kernel if any. If you are trying to build inside a virtual machine, please mention it too.

View File

@@ -8,6 +8,9 @@ labels: comp-documentation
(you don't have to strictly follow this form)
**Company or project name**
Put your company name or project description here
**Describe the issue**
A clear and concise description of what's wrong in the documentation.

View File

@@ -9,6 +9,9 @@ assignees: ''
(you don't have to strictly follow this form)
**Company or project name**
Put your company name or project description here
**Describe the situation**
What exactly works slower than expected?

View File

@@ -9,6 +9,9 @@ assignees: ''
(you don't have to strictly follow this form)
**Company or project name**
Put your company name or project description here
**Describe the issue**
A clear and concise description of what does not work as it is supposed to.

View File

@@ -11,6 +11,10 @@ assignees: ''
> You have to provide the following information whenever possible.
**Company or project name**
> Put your company name or project description here
**Describe what's wrong**
> A clear and concise description of what does not work as it is supposed to.

View File

@@ -7,6 +7,10 @@ assignees: ''
---
**Company or project name**
Put your company name or project description here
**I have tried the following solutions**: https://clickhouse.com/docs/en/faq/troubleshooting/#troubleshooting-installation-errors
**Installation type**

View File

@@ -12,7 +12,7 @@
#### Backward Incompatible Change
* Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make to drop such indexes before upgrade and re-create them after upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To enable them back, set `allow_deprecated_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To enable them back, set `allow_deprecated_error_prone_window_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
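For illustration, a minimal sketch of the window-function replacement for `runningDifference` (the table `events` and columns `ts`, `value` are hypothetical):

```sql
-- Sketch: per-row difference with a proper window function instead of
-- runningDifference(value). lagInFrame(value, 1, value) returns the previous
-- value within the frame, defaulting to the current value for the first row.
SELECT
    ts,
    value - lagInFrame(value, 1, value)
        OVER (ORDER BY ts ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS diff
FROM events
ORDER BY ts;

-- Or temporarily re-enable the deprecated functions:
-- SET allow_deprecated_error_prone_window_functions = 1;
```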
* Queries from `system.columns` will work faster if there is a large number of columns, but `SHOW TABLES` has not been granted for many databases or tables. Note that in previous versions, if you granted `SHOW COLUMNS` on individual columns without granting `SHOW TABLES` on the corresponding tables, the `system.columns` table would show these columns, but in the new version, it will skip the table entirely. Also removes the trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### New Feature

View File

@@ -2,20 +2,22 @@
the file is autogenerated by utils/security-generator/generate_security.py
-->
# Security Policy
# ClickHouse Security Vulnerability Response Policy
## Security Announcements
Security fixes will be announced by posting them in the [security changelog](https://clickhouse.com/docs/en/whats-new/security-changelog/).
## Security Change Log and Support
## Scope and Supported Versions
Details regarding security fixes are publicly reported in our [security changelog](https://clickhouse.com/docs/en/whats-new/security-changelog/). A summary of known security vulnerabilities is shown at the bottom of this page.
The following versions of ClickHouse server are currently being supported with security updates:
Vulnerability notifications prior to release or during embargo periods are available to open source users and support customers registered for vulnerability alerts. Refer to our [Embargo Policy](#embargo-policy) below.
The following versions of ClickHouse server are currently supported with security updates:
| Version | Supported |
|:-|:-|
| 24.5 | ✔️ |
| 24.4 | ✔️ |
| 24.3 | ✔️ |
| 24.2 | ✔️ |
| 24.2 | ❌ |
| 24.1 | ❌ |
| 23.* | ❌ |
| 23.8 | ✔️ |
@@ -37,7 +39,7 @@ The following versions of ClickHouse server are currently being supported with s
We're extremely grateful to the security researchers and users who report vulnerabilities to the ClickHouse Open Source Community. All reports are thoroughly investigated by developers.
To report a potential vulnerability in ClickHouse please send the details about it to [security@clickhouse.com](mailto:security@clickhouse.com). We do not offer any financial rewards for reporting issues to us using this method. Alternatively, you can also submit your findings through our public bug bounty program hosted by [Bugcrowd](https://bugcrowd.com/clickhouse) and be rewarded for it as per the program scope and rules of engagement.
To report a potential vulnerability in ClickHouse please send the details about it through our public bug bounty program hosted by [Bugcrowd](https://bugcrowd.com/clickhouse) and be rewarded for it as per the program scope and rules of engagement.
### When Should I Report a Vulnerability?
@@ -59,3 +61,21 @@ As the security issue moves from triage, to identified fix, to release planning
A public disclosure date is negotiated by the ClickHouse maintainers and the bug submitter. We prefer to fully disclose the bug as soon as possible once a user mitigation is available. It is reasonable to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well-tested, or for vendor coordination. The timeframe for disclosure is from immediate (especially if it's already publicly known) to 90 days. For a vulnerability with a straightforward mitigation, we expect the time from report date to disclosure date to be on the order of 7 days.
## Embargo Policy
Open source users and support customers may subscribe to receive alerts during the embargo period by visiting [https://trust.clickhouse.com/?product=clickhouseoss](https://trust.clickhouse.com/?product=clickhouseoss), requesting access, and subscribing to alerts. Subscribers agree not to make these notifications public, issue communications, share this information with others, or issue public patches before the disclosure date. Accidental disclosures must be reported immediately to trust@clickhouse.com. Failure to follow this policy or repeated leaks may result in removal from the subscriber list.
Participation criteria:
1. Be a current open source user or support customer with a valid corporate email domain (no @gmail.com, @azure.com, etc.).
1. Sign up to the ClickHouse OSS Trust Center at [https://trust.clickhouse.com](https://trust.clickhouse.com).
1. Accept the ClickHouse Security Vulnerability Response Policy as outlined above.
1. Subscribe to ClickHouse OSS Trust Center alerts.
Removal criteria:
1. Members may be removed for failure to follow this policy or repeated leaks.
1. Members may be removed for bounced messages (mail delivery failure).
1. Members may unsubscribe at any time.
Notification process:
ClickHouse will post notifications within our OSS Trust Center and notify subscribers. Subscribers must log in to the Trust Center to download the notification. The notification will include the timeframe for public disclosure.

View File

@@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.4.1.2088"
ARG VERSION="24.5.1.1763"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.4.1.2088"
ARG VERSION="24.5.1.1763"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@@ -28,7 +28,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="24.4.1.2088"
ARG VERSION="24.5.1.1763"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
#docker-official-library:off

View File

@@ -65,46 +65,22 @@ function save_settings_clean()
script -q -c "clickhouse-local -q \"select * from system.settings into outfile '$out'\"" --log-out /dev/null
}
# We save the (numeric) version of the old server to compare setting changes between the 2
# We do this since we are testing against the latest release, not taking into account release candidates, so we might
# be testing current master (24.6) against the latest stable release (24.4)
function save_major_version()
{
local out=$1 && shift
clickhouse-local -q "SELECT a[1]::UInt64 * 100 + a[2]::UInt64 as v FROM (Select splitByChar('.', version()) as a) into outfile '$out'"
}
save_settings_clean 'old_settings.native'
save_major_version 'old_version.native'
# Initial run without S3 to create system.*_log on local file system to make it
# available for dump via clickhouse-local
configure
function remove_keeper_config()
{
sudo sed -i "/<$1>$2<\/$1>/d" /etc/clickhouse-server/config.d/keeper_port.xml
}
# async_replication setting doesn't exist on some older versions
remove_keeper_config "async_replication" "1"
# create_if_not_exists feature flag doesn't exist on some older versions
remove_keeper_config "create_if_not_exists" "[01]"
#todo: remove these after 24.3 released.
sudo sed -i "s|<object_storage_type>azure<|<object_storage_type>azure_blob_storage<|" /etc/clickhouse-server/config.d/azure_storage_conf.xml
#todo: remove these after 24.3 released.
sudo sed -i "s|<object_storage_type>local<|<object_storage_type>local_blob_storage<|" /etc/clickhouse-server/config.d/storage_conf.xml
# latest_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "latest_logs_cache_size_threshold" "[[:digit:]]\+"
# commit_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "commit_logs_cache_size_threshold" "[[:digit:]]\+"
# it contains some new settings, but we can safely remove it
rm /etc/clickhouse-server/config.d/merge_tree.xml
rm /etc/clickhouse-server/config.d/enable_wait_for_shutdown_replicated_tables.xml
rm /etc/clickhouse-server/config.d/zero_copy_destructive_operations.xml
rm /etc/clickhouse-server/config.d/storage_conf_02963.xml
rm /etc/clickhouse-server/config.d/backoff_failed_mutation.xml
rm /etc/clickhouse-server/config.d/handlers.yaml
rm /etc/clickhouse-server/users.d/nonconst_timezone.xml
rm /etc/clickhouse-server/users.d/s3_cache_new.xml
rm /etc/clickhouse-server/users.d/replicated_ddl_entry.xml
start
stop
mv /var/log/clickhouse-server/clickhouse-server.log /var/log/clickhouse-server/clickhouse-server.initial.log
@@ -116,44 +92,11 @@ export USE_S3_STORAGE_FOR_MERGE_TREE=1
export ZOOKEEPER_FAULT_INJECTION=0
configure
# force_sync=false doesn't work correctly on some older versions
sudo sed -i "s|<force_sync>false</force_sync>|<force_sync>true</force_sync>|" /etc/clickhouse-server/config.d/keeper_port.xml
#todo: remove these after 24.3 released.
sudo sed -i "s|<object_storage_type>azure<|<object_storage_type>azure_blob_storage<|" /etc/clickhouse-server/config.d/azure_storage_conf.xml
#todo: remove these after 24.3 released.
sudo sed -i "s|<object_storage_type>local<|<object_storage_type>local_blob_storage<|" /etc/clickhouse-server/config.d/storage_conf.xml
# async_replication setting doesn't exist on some older versions
remove_keeper_config "async_replication" "1"
# create_if_not_exists feature flag doesn't exist on some older versions
remove_keeper_config "create_if_not_exists" "[01]"
# latest_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "latest_logs_cache_size_threshold" "[[:digit:]]\+"
# commit_logs_cache_size_threshold setting doesn't exist on some older versions
remove_keeper_config "commit_logs_cache_size_threshold" "[[:digit:]]\+"
# But we still need the default disk because some tables are loaded only into it
sudo sed -i "s|<main><disk>s3</disk></main>|<main><disk>s3</disk></main><default><disk>default</disk></default>|" /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml
sudo chown clickhouse /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml
sudo chgrp clickhouse /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml
# it contains some new settings, but we can safely remove it
rm /etc/clickhouse-server/config.d/merge_tree.xml
rm /etc/clickhouse-server/config.d/enable_wait_for_shutdown_replicated_tables.xml
rm /etc/clickhouse-server/config.d/zero_copy_destructive_operations.xml
rm /etc/clickhouse-server/config.d/storage_conf_02963.xml
rm /etc/clickhouse-server/config.d/backoff_failed_mutation.xml
rm /etc/clickhouse-server/config.d/handlers.yaml
rm /etc/clickhouse-server/config.d/block_number.xml
rm /etc/clickhouse-server/users.d/nonconst_timezone.xml
rm /etc/clickhouse-server/users.d/s3_cache_new.xml
rm /etc/clickhouse-server/users.d/replicated_ddl_entry.xml
start
clickhouse-client --query="SELECT 'Server version: ', version()"
@@ -192,6 +135,7 @@ then
save_settings_clean 'new_settings.native'
clickhouse-local -nmq "
CREATE TABLE old_settings AS file('old_settings.native');
CREATE TABLE old_version AS file('old_version.native');
CREATE TABLE new_settings AS file('new_settings.native');
SELECT
@@ -202,8 +146,11 @@ then
LEFT JOIN old_settings ON new_settings.name = old_settings.name
WHERE (new_settings.value != old_settings.value) AND (name NOT IN (
SELECT arrayJoin(tupleElement(changes, 'name'))
FROM system.settings_changes
WHERE version = extract(version(), '^(?:\\d+\\.\\d+)')
FROM
(
SELECT *, splitByChar('.', version) AS version_array FROM system.settings_changes
)
WHERE (version_array[1]::UInt64 * 100 + version_array[2]::UInt64) > (SELECT v FROM old_version LIMIT 1)
))
SETTINGS join_use_nulls = 1
INTO OUTFILE 'changed_settings.txt'
@@ -216,8 +163,11 @@ then
FROM old_settings
)) AND (name NOT IN (
SELECT arrayJoin(tupleElement(changes, 'name'))
FROM system.settings_changes
WHERE version = extract(version(), '^(?:\\d+\\.\\d+)')
FROM
(
SELECT *, splitByChar('.', version) AS version_array FROM system.settings_changes
)
WHERE (version_array[1]::UInt64 * 100 + version_array[2]::UInt64) > (SELECT v FROM old_version LIMIT 1)
))
INTO OUTFILE 'new_settings.txt'
FORMAT PrettyCompactNoEscapes;
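For reference, a small self-contained illustration of the numeric version encoding that `save_major_version` and the comparison queries above rely on (the version string literal is just an example):

```sql
-- splitByChar('.', '24.4.1.2088') = ['24', '4', '1', '2088'];
-- major.minor is encoded as major * 100 + minor, i.e. 2404 here, so any
-- system.settings_changes entry with a larger encoded version is "newer"
-- than the old server.
SELECT
    splitByChar('.', '24.4.1.2088') AS a,
    a[1]::UInt64 * 100 + a[2]::UInt64 AS v;
```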

View File

@@ -0,0 +1,366 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.5.1.1763-stable (647c154a94d) FIXME as compared to v24.4.1.2088-stable (6d4b31322d1)
#### Backward Incompatible Change
* Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make to drop such indexes before upgrade and re-create them after upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To enable them back, set `allow_deprecated_functions=1`. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
* Queries from `system.columns` will work faster if there is a large number of columns, but `SHOW TABLES` has not been granted for many databases or tables. Note that in previous versions, if you granted `SHOW COLUMNS` on individual columns without granting `SHOW TABLES` on the corresponding tables, the `system.columns` table would show these columns, but in the new version, it will skip the table entirely. Also removes the trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### New Feature
* Provide support for AzureBlobStorage function in ClickHouse server to use Azure Workload identity to authenticate against Azure blob storage. If `use_workload_identity` parameter is set in config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication. [#57881](https://github.com/ClickHouse/ClickHouse/pull/57881) ([Vinay Suryadevara](https://github.com/vinay92-ch)).
* Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on the RocksDB built-in memtable. This helps to increase import speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce `StorageEmbeddedRocksDB` table settings. [#59163](https://github.com/ClickHouse/ClickHouse/pull/59163) ([Duc Canh Le](https://github.com/canhld94)).
* Users can now parse CRLF line endings in the TSV format using the setting `input_format_tsv_crlf_end_of_line`. Closes [#56257](https://github.com/ClickHouse/ClickHouse/issues/56257). [#59747](https://github.com/ClickHouse/ClickHouse/pull/59747) ([Shaun Struwig](https://github.com/Blargian)).
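A hedged usage sketch (the file name and schema are invented):

```sql
-- Read a TSV file that uses CRLF (\r\n) line endings.
SELECT *
FROM file('data.tsv', 'TSV', 'id UInt32, name String')
SETTINGS input_format_tsv_crlf_end_of_line = 1;
```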
* Adds the Form Format to read/write a single record in the application/x-www-form-urlencoded format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)).
* Added the possibility to compress in CROSS JOIN. [#60459](https://github.com/ClickHouse/ClickHouse/pull/60459) ([p1rattttt](https://github.com/p1rattttt)).
* New setting `input_format_force_null_for_omitted_fields` that forces NULL values for omitted fields. [#60887](https://github.com/ClickHouse/ClickHouse/pull/60887) ([Constantine Peresypkin](https://github.com/pkit)).
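A sketch of the setting's effect, assuming it overrides declared defaults for omitted fields as the entry describes (table and data are illustrative):

```sql
CREATE TABLE t (a UInt32, b Nullable(String) DEFAULT 'missing') ENGINE = Memory;
SET input_format_force_null_for_omitted_fields = 1;
-- The field `b` is omitted, so it is forced to NULL instead of 'missing'
-- (for a non-Nullable column an omitted field would be an error instead).
INSERT INTO t FORMAT JSONEachRow {"a": 1};
SELECT a, b FROM t;
```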
* Support JOIN with inequality conditions which involve columns from both the left and right table, e.g. `t1.y < t2.y`. To enable, `SET allow_experimental_join_condition = 1`. [#60920](https://github.com/ClickHouse/ClickHouse/pull/60920) ([lgbo](https://github.com/lgbo-ustc)).
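A minimal sketch (tables `t1` and `t2` are hypothetical):

```sql
SET allow_experimental_join_condition = 1;

-- An equi-join on `key` combined with an inequality that references
-- columns from both the left and right table.
SELECT t1.key, t1.y AS left_y, t2.y AS right_y
FROM t1
INNER JOIN t2 ON t1.key = t2.key AND t1.y < t2.y;
```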
* Previously, our S3 storage and the s3 table function didn't support selecting from archive files. Now it is possible to iterate over files inside archives in S3. [#62259](https://github.com/ClickHouse/ClickHouse/pull/62259) ([Daniil Ivanik](https://github.com/divanik)).
* Support for conditional function `clamp`. [#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)).
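For example, `clamp(value, min, max)` constrains a value to the given range:

```sql
SELECT
    clamp(5, 1, 10)  AS in_range,  -- 5: already within [1, 10]
    clamp(-3, 1, 10) AS too_low,   -- 1: raised to the minimum
    clamp(42, 1, 10) AS too_high;  -- 10: lowered to the maximum
```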
* Add npy output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)).
* Added SQL functions `generateUUIDv7`, `generateUUIDv7ThreadMonotonic`, and `generateUUIDv7NonMonotonic` (with different monotonicity/performance trade-offs) to generate version 7 UUIDs, i.e. timestamp-based UUIDs with a random component. Also added a new function `UUIDToNum` to extract bytes from a UUID and a new function `UUIDv7ToDateTime` to extract the timestamp component from a version 7 UUID. [#62852](https://github.com/ClickHouse/ClickHouse/pull/62852) ([Alexey Petrunyaka](https://github.com/pet74alex)).
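A quick sketch of the new functions:

```sql
-- A version 7 UUID embeds a millisecond timestamp in its high bits;
-- UUIDv7ToDateTime recovers that timestamp component.
SELECT generateUUIDv7() AS uuid;
SELECT UUIDv7ToDateTime(generateUUIDv7()) AS created_at;
```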
* Backported in [#64307](https://github.com/ClickHouse/ClickHouse/issues/64307): Implement the Dynamic data type, which allows storing values of any type inside it without knowing all of them in advance. The Dynamic type is available under the setting `allow_experimental_dynamic_type`. Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#63058](https://github.com/ClickHouse/ClickHouse/pull/63058) ([Kruglov Pavel](https://github.com/Avogar)).
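A minimal sketch of the experimental `Dynamic` type:

```sql
SET allow_experimental_dynamic_type = 1;

CREATE TABLE dyn (d Dynamic) ENGINE = Memory;
INSERT INTO dyn VALUES (42), ('Hello'), ([1, 2, 3]);

-- dynamicType() reports the concrete type stored in each row.
SELECT d, dynamicType(d) FROM dyn;
```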
* Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on the RocksDB built-in memtable. This helps to increase import speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce StorageEmbeddedRocksDB table settings. [#63324](https://github.com/ClickHouse/ClickHouse/pull/63324) ([Duc Canh Le](https://github.com/canhld94)).
* Added `Raw` as a synonym for the `TSVRaw` format. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)).
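For example, both spellings now select the same format:

```sql
-- TSVRaw (and now Raw) writes values without escaping.
SELECT 'a\tb' FORMAT TSVRaw;
SELECT 'a\tb' FORMAT Raw;
```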
* Added the possibility to perform CROSS JOIN in a temporary file if its size exceeds limits. [#63432](https://github.com/ClickHouse/ClickHouse/pull/63432) ([p1rattttt](https://github.com/p1rattttt)).
* On Linux and MacOS, if the program has STDOUT redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE`). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)).
* Change warning on high number of attached tables to differentiate tables, views and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
#### Performance Improvement
* Skip merging of newly created projection blocks during `INSERT`-s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)).
* Process string functions `XXXUTF8` via an ASCII fast path if the input strings are all ASCII characters. Inspired by https://github.com/apache/doris/pull/29799. Overall speed-up of 1.07x–1.62x. Notice that peak memory usage has decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)).
* Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)).
* HostResolver holds each IP address several times. If a remote host has several IPs and, for some reason (firewall rules, for example), access is allowed on some IPs and forbidden on others, then only the first record of the forbidden IPs was marked as failed, and in each retry these IPs had a chance to be chosen (and to fail again). Even with that fixed, the DNS cache is dropped every 120 seconds, and the IPs could be chosen again. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)).
* Add a new configuration `prefer_merge_sort_block_bytes` to control the memory usage and speed up sorting up to 2 times when merging, when there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)).
* `clickhouse-local` will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)).
* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Raúl Marín](https://github.com/Algunenano)).
* Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Improvement
* Maps can now have `Float32`, `Float64`, `Array(T)`, `Map(K,V)` and `Tuple(T1, T2, ...)` as keys. Closes [#54537](https://github.com/ClickHouse/ClickHouse/issues/54537). [#59318](https://github.com/ClickHouse/ClickHouse/pull/59318) ([李扬](https://github.com/taiyang-li)).
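A hedged sketch of the widened key types:

```sql
-- Float64 keys were previously rejected for Map; now they work,
-- both for construction and element access.
SELECT map(1.5, 'a', 2.5, 'b') AS m, m[2.5] AS v;
```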
* Multiline strings with border preservation and column width change. [#59940](https://github.com/ClickHouse/ClickHouse/pull/59940) ([Volodyachan](https://github.com/Volodyachan)).
* Make RabbitMQ nack broken messages. Closes [#45350](https://github.com/ClickHouse/ClickHouse/issues/45350). [#60312](https://github.com/ClickHouse/ClickHouse/pull/60312) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes [#60460](https://github.com/ClickHouse/ClickHouse/issues/60460). [#60468](https://github.com/ClickHouse/ClickHouse/pull/60468) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Distinct messages for the S3 'no key' error for the disk and storage cases. [#61108](https://github.com/ClickHouse/ClickHouse/pull/61108) ([Sema Checherinda](https://github.com/CheSema)).
* Less contention in the filesystem cache (part 4). Allow keeping the filesystem cache not filled to the limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This releases pressure from space reservation for queries (in the `tryReserve` method). Also, this is done in a lock-free way as much as possible, e.g. it should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)).
* The progress bar will work for trivial queries with LIMIT from `system.zeros`, `system.zeros_mt` (it already works for `system.numbers` and `system.numbers_mt`), and the `generateRandom` table function. As a bonus, if the total number of records is greater than the `max_rows_to_read` limit, it will throw an exception earlier. This closes [#58183](https://github.com/ClickHouse/ClickHouse/issues/58183). [#61823](https://github.com/ClickHouse/ClickHouse/pull/61823) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* YAML Merge Key support. [#62685](https://github.com/ClickHouse/ClickHouse/pull/62685) ([Azat Khuzhin](https://github.com/azat)).
* Enhance error message when non-deterministic function is used with Replicated source. [#62896](https://github.com/ClickHouse/ClickHouse/pull/62896) ([Grégoire Pineau](https://github.com/lyrixx)).
* Fix interserver secret for Distributed over Distributed from `remote`. [#63013](https://github.com/ClickHouse/ClickHouse/pull/63013) ([Azat Khuzhin](https://github.com/azat)).
* Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support configuration substitutions from YAML files. [#63106](https://github.com/ClickHouse/ClickHouse/pull/63106) ([Eduard Karacharov](https://github.com/korowa)).
* Add TTL information to the `system.parts_columns` table. [#63200](https://github.com/ClickHouse/ClickHouse/pull/63200) ([litlig](https://github.com/litlig)).
* Keep previous data in terminal after picking from skim suggestions. [#63261](https://github.com/ClickHouse/ClickHouse/pull/63261) ([FlameFactory](https://github.com/FlameFactory)).
* The width of fields is now calculated correctly, ignoring ANSI escape sequences. [#63270](https://github.com/ClickHouse/ClickHouse/pull/63270) ([Shaun Struwig](https://github.com/Blargian)).
* Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. [#63365](https://github.com/ClickHouse/ClickHouse/pull/63365) ([Julia Kartseva](https://github.com/jkartseva)).
* Support English-style Unicode quotes, e.g. “Hello”, world. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes [#58634](https://github.com/ClickHouse/ClickHouse/issues/58634). [#63381](https://github.com/ClickHouse/ClickHouse/pull/63381) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allowed creating a MaterializedMySQL database without a connection to MySQL. [#63397](https://github.com/ClickHouse/ClickHouse/pull/63397) ([Kirill](https://github.com/kirillgarbar)).
* Remove copying data when writing to filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update the usage of error code `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` by more accurate error codes when appropriate. [#63406](https://github.com/ClickHouse/ClickHouse/pull/63406) ([Yohann Jardin](https://github.com/yohannj)).
* `os_user` and `client_hostname` are now correctly set up for queries for command line suggestions in clickhouse-client. This closes [#63430](https://github.com/ClickHouse/ClickHouse/issues/63430). [#63433](https://github.com/ClickHouse/ClickHouse/pull/63433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed tabulation in line numbering and the handling of length when moving a line whose value contains a tab; added tests. [#63493](https://github.com/ClickHouse/ClickHouse/pull/63493) ([Volodyachan](https://github.com/Volodyachan)).
* Add the `aggregate_function_group_array_has_limit_size` setting to support discarding data in some scenarios. [#63516](https://github.com/ClickHouse/ClickHouse/pull/63516) ([zhongyuankai](https://github.com/zhongyuankai)).
* Automatically mark a replica of Replicated database as lost and start recovery if some DDL task fails more than `max_retries_before_automatic_recovery` (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. [#63549](https://github.com/ClickHouse/ClickHouse/pull/63549) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Automatically correct `max_block_size=0` to default value. [#63587](https://github.com/ClickHouse/ClickHouse/pull/63587) ([Antonio Andelic](https://github.com/antonio2368)).
* Account failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_tracked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address [#52086](https://github.com/ClickHouse/ClickHouse/issues/52086). [#63656](https://github.com/ClickHouse/ClickHouse/pull/63656) ([Zimu Li](https://github.com/woodlzm)).
* Enable truncate operation for object storage disks. [#63693](https://github.com/ClickHouse/ClickHouse/pull/63693) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* The loading of the keywords list is now dependent on the server revision and will be disabled for the old versions of ClickHouse server. CC @azat. [#63786](https://github.com/ClickHouse/ClickHouse/pull/63786) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow trailing commas in the columns list in the INSERT query. For example, `INSERT INTO test (a, b, c, ) VALUES ...`. [#63803](https://github.com/ClickHouse/ClickHouse/pull/63803) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better exception messages for the `Regexp` format. [#63804](https://github.com/ClickHouse/ClickHouse/pull/63804) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow trailing commas in the `Values` format. For example, this query is allowed: `INSERT INTO test (a, b, c) VALUES (4, 5, 6,);`. [#63810](https://github.com/ClickHouse/ClickHouse/pull/63810) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* ClickHouse disks now read the server setting to obtain the actual metadata format version. [#63831](https://github.com/ClickHouse/ClickHouse/pull/63831) ([Sema Checherinda](https://github.com/CheSema)).
* Disable pretty format restrictions (`output_format_pretty_max_rows`/`output_format_pretty_max_value_width`) when stdout is not TTY. [#63942](https://github.com/ClickHouse/ClickHouse/pull/63942) ([Azat Khuzhin](https://github.com/azat)).
* Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Throw `CANNOT_DECOMPRESS` instead of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)).
* A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Now backups with azure blob storage will use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)).
* Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow to use native copy for azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)).
* Finally enable native copy for azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)).
* Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)).
#### Build/Testing/Packaging Improvement
* ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Re-enable broken s390x build in CI. [#63135](https://github.com/ClickHouse/ClickHouse/pull/63135) ([Harry Lee](https://github.com/HarryLeeIBM)).
* The Dockerfile is reviewed by the docker official library in https://github.com/docker-library/official-images/pull/15846. [#63400](https://github.com/ClickHouse/ClickHouse/pull/63400) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Information about every symbol in every translation unit will be collected in the CI database for every build in the CI. This closes [#63494](https://github.com/ClickHouse/ClickHouse/issues/63494). [#63495](https://github.com/ClickHouse/ClickHouse/pull/63495) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Experimentally support loongarch64 as a new platform for ClickHouse. [#63733](https://github.com/ClickHouse/ClickHouse/pull/63733) ([qiangxuhui](https://github.com/qiangxuhui)).
* Update Apache Datasketches library. It resolves [#63858](https://github.com/ClickHouse/ClickHouse/issues/63858). [#63923](https://github.com/ClickHouse/ClickHouse/pull/63923) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable GRPC support for aarch64 linux while cross-compiling binary. [#64072](https://github.com/ClickHouse/ClickHouse/pull/64072) ([alesapin](https://github.com/alesapin)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix making backup when multiple shards are used. This PR fixes [#56566](https://github.com/ClickHouse/ClickHouse/issues/56566). [#57684](https://github.com/ClickHouse/ClickHouse/pull/57684) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix passing projections/indexes from CREATE query into inner table of MV. [#59183](https://github.com/ClickHouse/ClickHouse/pull/59183) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect merge of `boundRatio` aggregate function states. [#60532](https://github.com/ClickHouse/ClickHouse/pull/60532) ([Tao Wang](https://github.com/wangtZJU)).
* Fix crash when using some functions with low-cardinality columns. [#61966](https://github.com/ClickHouse/ClickHouse/pull/61966) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix queries with FINAL giving a wrong result when the table does not use adaptive granularity. [#62432](https://github.com/ClickHouse/ClickHouse/pull/62432) ([Duc Canh Le](https://github.com/canhld94)).
* Improve the detection of cgroups v2 memory controller in unusual locations. This fixes a warning that the cgroup memory observer was disabled because no cgroups v1 or v2 current memory file could be found. [#62903](https://github.com/ClickHouse/ClickHouse/pull/62903) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix subsequent use of external tables in client. [#62964](https://github.com/ClickHouse/ClickHouse/pull/62964) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash with untuple and unresolved lambda. [#63131](https://github.com/ClickHouse/ClickHouse/pull/63131) ([Raúl Marín](https://github.com/Algunenano)).
* Fix a bug which could lead to the server accepting connections before it is actually loaded. [#63181](https://github.com/ClickHouse/ClickHouse/pull/63181) ([alesapin](https://github.com/alesapin)).
* Fix intersecting parts when restarting after a drop range. [#63202](https://github.com/ClickHouse/ClickHouse/pull/63202) ([Han Fei](https://github.com/hanfei1991)).
* Fix a misbehavior when SQL security defaults don't load for old tables during server startup. [#63209](https://github.com/ClickHouse/ClickHouse/pull/63209) ([pufit](https://github.com/pufit)).
* Fix JOIN filter push-down for filled JOIN. Closes [#63228](https://github.com/ClickHouse/ClickHouse/issues/63228). [#63234](https://github.com/ClickHouse/ClickHouse/pull/63234) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix infinite loop while listing objects in Azure blob storage. [#63257](https://github.com/ClickHouse/ClickHouse/pull/63257) ([Julia Kartseva](https://github.com/jkartseva)).
* CROSS JOIN can be executed with any value of the `join_algorithm` setting, close [#62431](https://github.com/ClickHouse/ClickHouse/issues/62431). [#63273](https://github.com/ClickHouse/ClickHouse/pull/63273) ([vdimir](https://github.com/vdimir)).
* Fixed a potential crash caused by a `no space left` error when temporary data in the cache is used. [#63346](https://github.com/ClickHouse/ClickHouse/pull/63346) ([vdimir](https://github.com/vdimir)).
* Fix bug which could potentially lead to rare LOGICAL_ERROR during SELECT query with message: `Unexpected return type from materialize. Expected type_XXX. Got type_YYY.` Introduced in [#59379](https://github.com/ClickHouse/ClickHouse/issues/59379). [#63353](https://github.com/ClickHouse/ClickHouse/pull/63353) ([alesapin](https://github.com/alesapin)).
* Fix `X-ClickHouse-Timezone` header returning wrong timezone when using `session_timezone` as query level setting. [#63377](https://github.com/ClickHouse/ClickHouse/pull/63377) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix debug assert when using grouping WITH ROLLUP and LowCardinality types. [#63398](https://github.com/ClickHouse/ClickHouse/pull/63398) ([Raúl Marín](https://github.com/Algunenano)).
* Fix logical errors in queries with `GROUPING SETS` and `WHERE` and `group_by_use_nulls = true`, close [#60538](https://github.com/ClickHouse/ClickHouse/issues/60538). [#63405](https://github.com/ClickHouse/ClickHouse/pull/63405) ([vdimir](https://github.com/vdimir)).
* Fix backup of projection part in case projection was removed from table metadata, but part still has projection. [#63426](https://github.com/ClickHouse/ClickHouse/pull/63426) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix 'Every derived table must have its own alias' error for MYSQL dictionary source, close [#63341](https://github.com/ClickHouse/ClickHouse/issues/63341). [#63481](https://github.com/ClickHouse/ClickHouse/pull/63481) ([vdimir](https://github.com/vdimir)).
* Insert QueryFinish on AsyncInsertFlush with no data. [#63483](https://github.com/ClickHouse/ClickHouse/pull/63483) ([Raúl Marín](https://github.com/Algunenano)).
* Fix `system.query_log.used_dictionaries` logging. [#63487](https://github.com/ClickHouse/ClickHouse/pull/63487) ([Eduard Karacharov](https://github.com/korowa)).
* Avoid segfault in `MergeTreePrefetchedReadPool` while fetching projection parts. [#63513](https://github.com/ClickHouse/ClickHouse/pull/63513) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix a RabbitMQ heap-use-after-free found by clang-18, which could happen if an error was thrown from RabbitMQ during the initialization of exchanges and queues. [#63515](https://github.com/ClickHouse/ClickHouse/pull/63515) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix crash on exit with sentry enabled (due to openssl destroyed before sentry). [#63548](https://github.com/ClickHouse/ClickHouse/pull/63548) ([Azat Khuzhin](https://github.com/azat)).
* Fix support for Array and Map with Keyed hashing functions and materialized keys. [#63628](https://github.com/ClickHouse/ClickHouse/pull/63628) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fixed Parquet filter pushdown not working with Analyzer. [#63642](https://github.com/ClickHouse/ClickHouse/pull/63642) ([Michael Kolupaev](https://github.com/al13n321)).
* It is forbidden to convert MergeTree to replicated if the zookeeper path for this table already exists. [#63670](https://github.com/ClickHouse/ClickHouse/pull/63670) ([Kirill](https://github.com/kirillgarbar)).
* Read only the necessary columns from VIEW (new analyzer). Closes [#62594](https://github.com/ClickHouse/ClickHouse/issues/62594). [#63688](https://github.com/ClickHouse/ClickHouse/pull/63688) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix rare case with missing data in the result of distributed query. [#63691](https://github.com/ClickHouse/ClickHouse/pull/63691) ([vdimir](https://github.com/vdimir)).
* Fix [#63539](https://github.com/ClickHouse/ClickHouse/issues/63539). Forbid WINDOW redefinition in new analyzer. [#63694](https://github.com/ClickHouse/ClickHouse/pull/63694) ([Dmitry Novik](https://github.com/novikd)).
* Fix `flatten_nested` being broken with Replicated databases. [#63695](https://github.com/ClickHouse/ClickHouse/pull/63695) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `SIZES_OF_COLUMNS_DOESNT_MATCH` error for queries with `arrayJoin` function in `WHERE`. Fixes [#63653](https://github.com/ClickHouse/ClickHouse/issues/63653). [#63722](https://github.com/ClickHouse/ClickHouse/pull/63722) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `Not found column` and `CAST AS Map from array requires nested tuple of 2 elements` exceptions for distributed queries which use `Map(Nothing, Nothing)` type. Fixes [#63637](https://github.com/ClickHouse/ClickHouse/issues/63637). [#63753](https://github.com/ClickHouse/ClickHouse/pull/63753) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix possible `ILLEGAL_COLUMN` error in `partial_merge` join, close [#37928](https://github.com/ClickHouse/ClickHouse/issues/37928). [#63755](https://github.com/ClickHouse/ClickHouse/pull/63755) ([vdimir](https://github.com/vdimir)).
* `query_plan_remove_redundant_distinct` could break queries with window functions (when `allow_experimental_analyzer` is on). Fixes [#62820](https://github.com/ClickHouse/ClickHouse/issues/62820). [#63776](https://github.com/ClickHouse/ClickHouse/pull/63776) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix possible crash with SYSTEM UNLOAD PRIMARY KEY. [#63778](https://github.com/ClickHouse/ClickHouse/pull/63778) ([Raúl Marín](https://github.com/Algunenano)).
* Fix a query with a duplicating cycling alias. Fixes [#63320](https://github.com/ClickHouse/ClickHouse/issues/63320). [#63791](https://github.com/ClickHouse/ClickHouse/pull/63791) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed performance degradation of parsing data formats in INSERT query. This closes [#62918](https://github.com/ClickHouse/ClickHouse/issues/62918). This partially reverts [#42284](https://github.com/ClickHouse/ClickHouse/issues/42284), which breaks the original design and introduces more problems. [#63801](https://github.com/ClickHouse/ClickHouse/pull/63801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add 'endpoint_subpath' S3 URI setting to allow plain_rewritable disks to share the same endpoint. [#63806](https://github.com/ClickHouse/ClickHouse/pull/63806) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix queries using parallel read buffer (e.g. with max_download_thread > 0) getting stuck when threads cannot be allocated. [#63814](https://github.com/ClickHouse/ClickHouse/pull/63814) ([Antonio Andelic](https://github.com/antonio2368)).
* Allow JOIN filter push down to both streams if only single equivalent column is used in query. Closes [#63799](https://github.com/ClickHouse/ClickHouse/issues/63799). [#63819](https://github.com/ClickHouse/ClickHouse/pull/63819) ([Maksim Kita](https://github.com/kitaisreal)).
* Remove the data from all disks after DROP with the Lazy database engines. Without these changes, orphaned data will remain on the disks. [#63848](https://github.com/ClickHouse/ClickHouse/pull/63848) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Fix incorrect select query result when parallel replicas were used to read from a Materialized View. [#63861](https://github.com/ClickHouse/ClickHouse/pull/63861) ([Nikita Taranov](https://github.com/nickitat)).
* Fixes in the `find_super_nodes` and `find_big_family` commands of keeper-client: do not fail on ZNONODE errors; find super nodes inside super nodes; properly calculate the subtree node count. [#63862](https://github.com/ClickHouse/ClickHouse/pull/63862) ([Alexander Gololobov](https://github.com/davenger)).
* Fix an error `Database name is empty` for remote queries with lambdas over a cluster with a modified default database. Fixes [#63471](https://github.com/ClickHouse/ClickHouse/issues/63471). [#63864](https://github.com/ClickHouse/ClickHouse/pull/63864) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix SIGSEGV due to the CPU/real-time (`query_profiler_cpu_time_period_ns`/`query_profiler_real_time_period_ns`) profiler (an issue since 2022 that led to periodic server crashes, especially if you were using the Distributed engine). [#63865](https://github.com/ClickHouse/ClickHouse/pull/63865) ([Azat Khuzhin](https://github.com/azat)).
* Fixed `EXPLAIN CURRENT TRANSACTION` query. [#63926](https://github.com/ClickHouse/ClickHouse/pull/63926) ([Anton Popov](https://github.com/CurtizJ)).
* Fix analyzer: make the IN function with arbitrarily deep sub-selects in a materialized view use the insertion block. [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Allow `ALTER TABLE .. MODIFY|RESET SETTING` and `ALTER TABLE .. MODIFY COMMENT` for plain_rewritable disk. [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix Recursive CTE with distributed queries. Closes [#63790](https://github.com/ClickHouse/ClickHouse/issues/63790). [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix resolve of unqualified COLUMNS matcher. Preserve the input columns order and forbid usage of unknown identifiers. [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)).
* Fix the `Not found column` error for queries with `skip_unused_shards = 1`, `LIMIT BY`, and the new analyzer. Fixes [#63943](https://github.com/ClickHouse/ClickHouse/issues/63943). [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Kusto Query Language (low-quality third-party support): resolve a client abort issue when using the KQL table function in interactive mode. [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)).
* Backported in [#64356](https://github.com/ClickHouse/ClickHouse/issues/64356): Fix a `Cyclic aliases` error for cyclic aliases of different types (expression and function). Fixes [#63205](https://github.com/ClickHouse/ClickHouse/issues/63205). [#63993](https://github.com/ClickHouse/ClickHouse/pull/63993) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Deserialize untrusted binary inputs in a safer way. [#64024](https://github.com/ClickHouse/ClickHouse/pull/64024) ([Robert Schulze](https://github.com/rschu1ze)).
* Do not throw `Storage doesn't support FINAL` error for remote queries over non-MergeTree tables with `final = true` and new analyzer. Fixes [#63960](https://github.com/ClickHouse/ClickHouse/issues/63960). [#64037](https://github.com/ClickHouse/ClickHouse/pull/64037) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add missing settings to recoverLostReplica. [#64040](https://github.com/ClickHouse/ClickHouse/pull/64040) ([Raúl Marín](https://github.com/Algunenano)).
* Fix unwind on SIGSEGV on aarch64 (due to small stack for signal). [#64058](https://github.com/ClickHouse/ClickHouse/pull/64058) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#64324](https://github.com/ClickHouse/ClickHouse/issues/64324): This fix uses a properly redefined context with the correct definer for each individual view in the query pipeline. Closes [#63777](https://github.com/ClickHouse/ClickHouse/issues/63777). [#64079](https://github.com/ClickHouse/ClickHouse/pull/64079) ([pufit](https://github.com/pufit)).
* Backported in [#64384](https://github.com/ClickHouse/ClickHouse/issues/64384): Fix analyzer: the "Not found column" error when using INTERPOLATE. [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix Azure backups writing multipart blocks of 1 MB (the read buffer size) instead of `max_upload_part_size`. [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#64541](https://github.com/ClickHouse/ClickHouse/issues/64541): Fix creating backups to S3 buckets with different credentials from the disk containing the file. [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)).
* Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#64332](https://github.com/ClickHouse/ClickHouse/issues/64332): The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)).
* Ignore `text_log` config when using Keeper. [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#64692](https://github.com/ClickHouse/ClickHouse/issues/64692): Fix Query Tree size validation. Closes [#63701](https://github.com/ClickHouse/ClickHouse/issues/63701). [#64377](https://github.com/ClickHouse/ClickHouse/pull/64377) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#64411](https://github.com/ClickHouse/ClickHouse/issues/64411): Fix `Logical error: Bad cast` for `Buffer` table with `PREWHERE`. Fixes [#64172](https://github.com/ClickHouse/ClickHouse/issues/64172). [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#64625](https://github.com/ClickHouse/ClickHouse/issues/64625): Fix an error `Cannot find column` in distributed queries with constant CTE in the `GROUP BY` key. [#64519](https://github.com/ClickHouse/ClickHouse/pull/64519) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#64682](https://github.com/ClickHouse/ClickHouse/issues/64682): Fix [#64612](https://github.com/ClickHouse/ClickHouse/issues/64612). Do not rewrite aggregation if `-If` combinator is already used. [#64638](https://github.com/ClickHouse/ClickHouse/pull/64638) ([Dmitry Novik](https://github.com/novikd)).
#### CI Fix or Improvement (changelog entry is not required)
* Implement cumulative A Sync status. [#61464](https://github.com/ClickHouse/ClickHouse/pull/61464) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add ability to run Azure tests in PR with label. [#63196](https://github.com/ClickHouse/ClickHouse/pull/63196) ([alesapin](https://github.com/alesapin)).
* Add azure run with msan. [#63238](https://github.com/ClickHouse/ClickHouse/pull/63238) ([alesapin](https://github.com/alesapin)).
* Improve cloud backport script. [#63282](https://github.com/ClickHouse/ClickHouse/pull/63282) ([Raúl Marín](https://github.com/Algunenano)).
* Use `/commit/` to have the URLs in [reports](https://play.clickhouse.com/play?user=play#c2VsZWN0IGRpc3RpbmN0IGNvbW1pdF91cmwgZnJvbSBjaGVja3Mgd2hlcmUgY2hlY2tfc3RhcnRfdGltZSA+PSBub3coKSAtIGludGVydmFsIDEgbW9udGggYW5kIHB1bGxfcmVxdWVzdF9udW1iZXI9NjA1MzI=) like https://github.com/ClickHouse/ClickHouse/commit/44f8bc5308b53797bec8cccc3bd29fab8a00235d and not like https://github.com/ClickHouse/ClickHouse/commits/44f8bc5308b53797bec8cccc3bd29fab8a00235d. [#63331](https://github.com/ClickHouse/ClickHouse/pull/63331) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Extra constraints for stress and fuzzer tests. [#63470](https://github.com/ClickHouse/ClickHouse/pull/63470) ([Raúl Marín](https://github.com/Algunenano)).
* Fix 02362_part_log_merge_algorithm flaky test. [#63635](https://github.com/ClickHouse/ClickHouse/pull/63635) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Fix test_odbc_interaction on aarch64 [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63787](https://github.com/ClickHouse/ClickHouse/pull/63787) ([alesapin](https://github.com/alesapin)).
* Fix test `test_catboost_evaluate` for aarch64. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63789](https://github.com/ClickHouse/ClickHouse/pull/63789) ([alesapin](https://github.com/alesapin)).
* Remove HDFS from disks config for one integration test for arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63832](https://github.com/ClickHouse/ClickHouse/pull/63832) ([alesapin](https://github.com/alesapin)).
* Bump version for old image in test_short_strings_aggregation to make it work on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63836](https://github.com/ClickHouse/ClickHouse/pull/63836) ([alesapin](https://github.com/alesapin)).
* Disable test `test_non_default_compression/test.py::test_preconfigured_deflateqpl_codec` on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63839](https://github.com/ClickHouse/ClickHouse/pull/63839) ([alesapin](https://github.com/alesapin)).
* Include checks like `Stateless tests (asan, distributed cache, meta storage in keeper, s3 storage) [2/3]` in `Mergeable Check` and `A Sync`. [#63945](https://github.com/ClickHouse/ClickHouse/pull/63945) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix 02124_insert_deduplication_token_multiple_blocks. [#63950](https://github.com/ClickHouse/ClickHouse/pull/63950) ([Han Fei](https://github.com/hanfei1991)).
* Add `ClickHouseVersion.copy` method. Create a release branch in advance without spinning out the release, to increase stability. [#64039](https://github.com/ClickHouse/ClickHouse/pull/64039) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* The mime type is not 100% reliable for Python and shell scripts without shebangs; add a check for file extension. [#64062](https://github.com/ClickHouse/ClickHouse/pull/64062) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add retries in git submodule update. [#64125](https://github.com/ClickHouse/ClickHouse/pull/64125) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC)
* Backported in [#64591](https://github.com/ClickHouse/ClickHouse/issues/64591): Disabled `enable_vertical_final` setting by default. This feature should not be used because it has a bug: [#64543](https://github.com/ClickHouse/ClickHouse/issues/64543). [#64544](https://github.com/ClickHouse/ClickHouse/pull/64544) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Do not remove server constants from GROUP BY key for secondary query."'. [#63297](https://github.com/ClickHouse/ClickHouse/pull/63297) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Introduce bulk loading to StorageEmbeddedRocksDB"'. [#63316](https://github.com/ClickHouse/ClickHouse/pull/63316) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Add tags for the test 03000_traverse_shadow_system_data_paths.sql to make it stable'. [#63366](https://github.com/ClickHouse/ClickHouse/pull/63366) ([Aleksei Filatov](https://github.com/aalexfvk)).
* NO CL ENTRY: 'Revert "Revert "Do not remove server constants from GROUP BY key for secondary query.""'. [#63415](https://github.com/ClickHouse/ClickHouse/pull/63415) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* NO CL ENTRY: 'Revert "Fix index analysis for `DateTime64`"'. [#63525](https://github.com/ClickHouse/ClickHouse/pull/63525) ([Raúl Marín](https://github.com/Algunenano)).
* NO CL ENTRY: 'Add `jwcrypto` to integration tests runner'. [#63551](https://github.com/ClickHouse/ClickHouse/pull/63551) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* NO CL ENTRY: 'Follow-up for the `binary_symbols` table in CI'. [#63802](https://github.com/ClickHouse/ClickHouse/pull/63802) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'chore(ci-workers): remove reusable from tailscale key'. [#63999](https://github.com/ClickHouse/ClickHouse/pull/63999) ([Gabriel Martinez](https://github.com/GMartinez-Sisti)).
* NO CL ENTRY: 'Revert "Update gui.md - Add ch-ui to open-source available tools."'. [#64064](https://github.com/ClickHouse/ClickHouse/pull/64064) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Prevent stack overflow in Fuzzer and Stress test'. [#64082](https://github.com/ClickHouse/ClickHouse/pull/64082) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Prevent conversion to Replicated if zookeeper path already exists"'. [#64214](https://github.com/ClickHouse/ClickHouse/pull/64214) ([Sergei Trifonov](https://github.com/serxa)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Remove http_max_chunk_size setting (too internal) [#60852](https://github.com/ClickHouse/ClickHouse/pull/60852) ([Azat Khuzhin](https://github.com/azat)).
* Fix race in refreshable materialized views causing SELECT to fail sometimes [#60883](https://github.com/ClickHouse/ClickHouse/pull/60883) ([Michael Kolupaev](https://github.com/al13n321)).
* Parallel replicas: table check failover [#61935](https://github.com/ClickHouse/ClickHouse/pull/61935) ([Igor Nikonov](https://github.com/devcrafter)).
* Avoid crashing on column type mismatch in a few dozen places [#62087](https://github.com/ClickHouse/ClickHouse/pull/62087) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix optimize_if_chain_to_multiif const NULL handling [#62104](https://github.com/ClickHouse/ClickHouse/pull/62104) ([Michael Kolupaev](https://github.com/al13n321)).
* Use intrusive lists for `ResourceRequest` instead of deque [#62165](https://github.com/ClickHouse/ClickHouse/pull/62165) ([Sergei Trifonov](https://github.com/serxa)).
* Analyzer: Fix validateAggregates for tables with different aliases [#62346](https://github.com/ClickHouse/ClickHouse/pull/62346) ([vdimir](https://github.com/vdimir)).
* Improve code and tests of `DROP` of multiple tables [#62359](https://github.com/ClickHouse/ClickHouse/pull/62359) ([zhongyuankai](https://github.com/zhongyuankai)).
* Fix exception message during writing to partitioned s3/hdfs/azure path with globs [#62423](https://github.com/ClickHouse/ClickHouse/pull/62423) ([Kruglov Pavel](https://github.com/Avogar)).
* Support UBSan on Clang-19 (master) [#62466](https://github.com/ClickHouse/ClickHouse/pull/62466) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Save the stacktrace of thread waiting on failing AsyncLoader job [#62719](https://github.com/ClickHouse/ClickHouse/pull/62719) ([Sergei Trifonov](https://github.com/serxa)).
* group_by_use_nulls strikes back [#62922](https://github.com/ClickHouse/ClickHouse/pull/62922) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Analyzer: prefer column name to alias from array join [#62995](https://github.com/ClickHouse/ClickHouse/pull/62995) ([vdimir](https://github.com/vdimir)).
* CI: try separate the workflows file for GitHub's Merge Queue [#63123](https://github.com/ClickHouse/ClickHouse/pull/63123) ([Max K.](https://github.com/maxknv)).
* Try to fix coverage tests [#63130](https://github.com/ClickHouse/ClickHouse/pull/63130) ([Raúl Marín](https://github.com/Algunenano)).
* Fix azure backup flaky test [#63158](https://github.com/ClickHouse/ClickHouse/pull/63158) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Merging [#60920](https://github.com/ClickHouse/ClickHouse/issues/60920) [#63159](https://github.com/ClickHouse/ClickHouse/pull/63159) ([vdimir](https://github.com/vdimir)).
* QueryAnalysisPass improve QUALIFY validation [#63162](https://github.com/ClickHouse/ClickHouse/pull/63162) ([Maksim Kita](https://github.com/kitaisreal)).
* Add numpy tests for different endianness [#63189](https://github.com/ClickHouse/ClickHouse/pull/63189) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fallback action-runner to autoupdate when it's unable to start [#63195](https://github.com/ClickHouse/ClickHouse/pull/63195) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix possible endless loop while reading from azure [#63197](https://github.com/ClickHouse/ClickHouse/pull/63197) ([Anton Popov](https://github.com/CurtizJ)).
* Add information about materialized view security bug fix into the changelog [#63204](https://github.com/ClickHouse/ClickHouse/pull/63204) ([pufit](https://github.com/pufit)).
* Disable one query from 02994_sanity_check_settings [#63208](https://github.com/ClickHouse/ClickHouse/pull/63208) ([Raúl Marín](https://github.com/Algunenano)).
* Enable custom parquet encoder by default, attempt 2 [#63210](https://github.com/ClickHouse/ClickHouse/pull/63210) ([Michael Kolupaev](https://github.com/al13n321)).
* Update version after release [#63215](https://github.com/ClickHouse/ClickHouse/pull/63215) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v24.4.1.2088-stable [#63217](https://github.com/ClickHouse/ClickHouse/pull/63217) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.3.3.102-lts [#63226](https://github.com/ClickHouse/ClickHouse/pull/63226) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.2.3.70-stable [#63227](https://github.com/ClickHouse/ClickHouse/pull/63227) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Return back [#61551](https://github.com/ClickHouse/ClickHouse/issues/61551) (More optimal loading of marks) [#63233](https://github.com/ClickHouse/ClickHouse/pull/63233) ([Anton Popov](https://github.com/CurtizJ)).
* Hide CI options under a spoiler [#63237](https://github.com/ClickHouse/ClickHouse/pull/63237) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Add `FROM` keyword to `TRUNCATE ALL TABLES` [#63241](https://github.com/ClickHouse/ClickHouse/pull/63241) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Minor follow-up to a renaming PR [#63260](https://github.com/ClickHouse/ClickHouse/pull/63260) ([Robert Schulze](https://github.com/rschu1ze)).
* More checks for concurrently deleted files and dirs in system.remote_data_paths [#63274](https://github.com/ClickHouse/ClickHouse/pull/63274) ([Alexander Gololobov](https://github.com/davenger)).
* Fix SettingsChangesHistory.h for allow_experimental_join_condition [#63278](https://github.com/ClickHouse/ClickHouse/pull/63278) ([Raúl Marín](https://github.com/Algunenano)).
* Update version_date.tsv and changelogs after v23.8.14.6-lts [#63285](https://github.com/ClickHouse/ClickHouse/pull/63285) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix azure flaky test [#63286](https://github.com/ClickHouse/ClickHouse/pull/63286) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix deadlock in `CacheDictionaryUpdateQueue` in case of exception in constructor [#63287](https://github.com/ClickHouse/ClickHouse/pull/63287) ([Nikita Taranov](https://github.com/nickitat)).
* DiskApp: fix 'list --recursive /' and crash on invalid arguments [#63296](https://github.com/ClickHouse/ClickHouse/pull/63296) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix terminate because of unhandled exception in `MergeTreeDeduplicationLog::shutdown` [#63298](https://github.com/ClickHouse/ClickHouse/pull/63298) ([Nikita Taranov](https://github.com/nickitat)).
* Move s3_plain_rewritable unit test to shell [#63317](https://github.com/ClickHouse/ClickHouse/pull/63317) ([Julia Kartseva](https://github.com/jkartseva)).
* Add tests for [#63264](https://github.com/ClickHouse/ClickHouse/issues/63264) [#63321](https://github.com/ClickHouse/ClickHouse/pull/63321) ([Raúl Marín](https://github.com/Algunenano)).
* Try fix segfault in `MergeTreeReadPoolBase::createTask` [#63323](https://github.com/ClickHouse/ClickHouse/pull/63323) ([Antonio Andelic](https://github.com/antonio2368)).
* Update README.md [#63326](https://github.com/ClickHouse/ClickHouse/pull/63326) ([Tyler Hannan](https://github.com/tylerhannan)).
* Skip unaccessible table dirs in system.remote_data_paths [#63330](https://github.com/ClickHouse/ClickHouse/pull/63330) ([Alexander Gololobov](https://github.com/davenger)).
* Add test for [#56287](https://github.com/ClickHouse/ClickHouse/issues/56287) [#63340](https://github.com/ClickHouse/ClickHouse/pull/63340) ([Raúl Marín](https://github.com/Algunenano)).
* Update README.md [#63350](https://github.com/ClickHouse/ClickHouse/pull/63350) ([Tyler Hannan](https://github.com/tylerhannan)).
* Add test for [#48049](https://github.com/ClickHouse/ClickHouse/issues/48049) [#63351](https://github.com/ClickHouse/ClickHouse/pull/63351) ([Raúl Marín](https://github.com/Algunenano)).
* Add option `query_id_prefix` to `clickhouse-benchmark` [#63352](https://github.com/ClickHouse/ClickHouse/pull/63352) ([Anton Popov](https://github.com/CurtizJ)).
* Rollback azurite to working version [#63354](https://github.com/ClickHouse/ClickHouse/pull/63354) ([alesapin](https://github.com/alesapin)).
* Randomize setting `enable_block_offset_column` in stress tests [#63355](https://github.com/ClickHouse/ClickHouse/pull/63355) ([Anton Popov](https://github.com/CurtizJ)).
* Fix AST parsing of invalid type names [#63357](https://github.com/ClickHouse/ClickHouse/pull/63357) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix some 00002_log_and_exception_messages_formatting flakiness [#63358](https://github.com/ClickHouse/ClickHouse/pull/63358) ([Michael Kolupaev](https://github.com/al13n321)).
* Add a test for [#55655](https://github.com/ClickHouse/ClickHouse/issues/55655) [#63380](https://github.com/ClickHouse/ClickHouse/pull/63380) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix data race in `reportBrokenPart` [#63396](https://github.com/ClickHouse/ClickHouse/pull/63396) ([Antonio Andelic](https://github.com/antonio2368)).
* Workaround for `oklch()` inside canvas bug for firefox [#63404](https://github.com/ClickHouse/ClickHouse/pull/63404) ([Sergei Trifonov](https://github.com/serxa)).
* Add test for issue [#47862](https://github.com/ClickHouse/ClickHouse/issues/47862) [#63424](https://github.com/ClickHouse/ClickHouse/pull/63424) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix parsing of `CREATE INDEX` query [#63425](https://github.com/ClickHouse/ClickHouse/pull/63425) ([Anton Popov](https://github.com/CurtizJ)).
* We are using Shared Catalog in the CI Logs cluster [#63442](https://github.com/ClickHouse/ClickHouse/pull/63442) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix collection of coverage data in the CI Logs cluster [#63453](https://github.com/ClickHouse/ClickHouse/pull/63453) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flaky test for rocksdb bulk sink [#63457](https://github.com/ClickHouse/ClickHouse/pull/63457) ([Duc Canh Le](https://github.com/canhld94)).
* io_uring: refactor get reader from context [#63475](https://github.com/ClickHouse/ClickHouse/pull/63475) ([Tomer Shafir](https://github.com/tomershafir)).
* Analyzer setting max_streams_to_max_threads_ratio overflow fix [#63478](https://github.com/ClickHouse/ClickHouse/pull/63478) ([Maksim Kita](https://github.com/kitaisreal)).
* Add setting for better rendering of multiline string for pretty format [#63479](https://github.com/ClickHouse/ClickHouse/pull/63479) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix logical error when reloading config with a custom-created web disk, broken after [#56367](https://github.com/ClickHouse/ClickHouse/issues/56367) [#63484](https://github.com/ClickHouse/ClickHouse/pull/63484) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add test for [#49307](https://github.com/ClickHouse/ClickHouse/issues/49307) [#63486](https://github.com/ClickHouse/ClickHouse/pull/63486) ([Anton Popov](https://github.com/CurtizJ)).
* Remove leftovers of GCC support in cmake rules [#63488](https://github.com/ClickHouse/ClickHouse/pull/63488) ([Azat Khuzhin](https://github.com/azat)).
* Fix ProfileEventTimeIncrement code [#63489](https://github.com/ClickHouse/ClickHouse/pull/63489) ([Azat Khuzhin](https://github.com/azat)).
* MergeTreePrefetchedReadPool: Print parent name when logging projection parts [#63522](https://github.com/ClickHouse/ClickHouse/pull/63522) ([Raúl Marín](https://github.com/Algunenano)).
* Correctly stop `asyncCopy` tasks in all cases [#63523](https://github.com/ClickHouse/ClickHouse/pull/63523) ([Antonio Andelic](https://github.com/antonio2368)).
* Almost everything should work on AArch64 (Part of [#58061](https://github.com/ClickHouse/ClickHouse/issues/58061)) [#63527](https://github.com/ClickHouse/ClickHouse/pull/63527) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update randomization of `old_parts_lifetime` [#63530](https://github.com/ClickHouse/ClickHouse/pull/63530) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update 02240_system_filesystem_cache_table.sh [#63531](https://github.com/ClickHouse/ClickHouse/pull/63531) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix data race in `DistributedSink` [#63538](https://github.com/ClickHouse/ClickHouse/pull/63538) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix azure tests run on master [#63540](https://github.com/ClickHouse/ClickHouse/pull/63540) ([alesapin](https://github.com/alesapin)).
* Find a proper commit for cumulative `A Sync` status [#63543](https://github.com/ClickHouse/ClickHouse/pull/63543) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add `no-s3-storage` tag to local_plain_rewritable ut [#63546](https://github.com/ClickHouse/ClickHouse/pull/63546) ([Julia Kartseva](https://github.com/jkartseva)).
* Go back to upstream lz4 submodule [#63574](https://github.com/ClickHouse/ClickHouse/pull/63574) ([Raúl Marín](https://github.com/Algunenano)).
* Fix logical error in ColumnTuple::tryInsert() [#63583](https://github.com/ClickHouse/ClickHouse/pull/63583) ([Michael Kolupaev](https://github.com/al13n321)).
* harmonize sumMap error messages on ILLEGAL_TYPE_OF_ARGUMENT [#63619](https://github.com/ClickHouse/ClickHouse/pull/63619) ([Yohann Jardin](https://github.com/yohannj)).
* Update README.md [#63631](https://github.com/ClickHouse/ClickHouse/pull/63631) ([Tyler Hannan](https://github.com/tylerhannan)).
* Ignore global profiler if system.trace_log is not enabled and fix really disable it for keeper standalone build [#63632](https://github.com/ClickHouse/ClickHouse/pull/63632) ([Azat Khuzhin](https://github.com/azat)).
* Fixes for 00002_log_and_exception_messages_formatting [#63634](https://github.com/ClickHouse/ClickHouse/pull/63634) ([Azat Khuzhin](https://github.com/azat)).
* Fix tests flakiness due to long SYSTEM FLUSH LOGS (explicitly specify old_parts_lifetime) [#63639](https://github.com/ClickHouse/ClickHouse/pull/63639) ([Azat Khuzhin](https://github.com/azat)).
* Update clickhouse-test help section [#63663](https://github.com/ClickHouse/ClickHouse/pull/63663) ([Ali](https://github.com/xogoodnow)).
* Fix bad test `02950_part_log_bytes_uncompressed` [#63672](https://github.com/ClickHouse/ClickHouse/pull/63672) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove leftovers of `optimize_monotonous_functions_in_order_by` [#63674](https://github.com/ClickHouse/ClickHouse/pull/63674) ([Nikita Taranov](https://github.com/nickitat)).
* tests: attempt to fix 02340_parts_refcnt_mergetree flakiness [#63684](https://github.com/ClickHouse/ClickHouse/pull/63684) ([Azat Khuzhin](https://github.com/azat)).
* Parallel replicas: simple cleanup [#63685](https://github.com/ClickHouse/ClickHouse/pull/63685) ([Igor Nikonov](https://github.com/devcrafter)).
* Cancel S3 reads properly when parallel reads are used [#63687](https://github.com/ClickHouse/ClickHouse/pull/63687) ([Antonio Andelic](https://github.com/antonio2368)).
* Explain map insertion order [#63690](https://github.com/ClickHouse/ClickHouse/pull/63690) ([Mark Needham](https://github.com/mneedham)).
* selectRangesToRead() simple cleanup [#63692](https://github.com/ClickHouse/ClickHouse/pull/63692) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix fuzzed analyzer_join_with_constant query [#63702](https://github.com/ClickHouse/ClickHouse/pull/63702) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add missing explicit instantiations of ColumnUnique [#63718](https://github.com/ClickHouse/ClickHouse/pull/63718) ([Raúl Marín](https://github.com/Algunenano)).
* Better asserts in ColumnString.h [#63719](https://github.com/ClickHouse/ClickHouse/pull/63719) ([Raúl Marín](https://github.com/Algunenano)).
* Don't randomize some settings in 02941_variant_type_* tests to avoid timeouts [#63721](https://github.com/ClickHouse/ClickHouse/pull/63721) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix flaky 03145_non_loaded_projection_backup.sh [#63728](https://github.com/ClickHouse/ClickHouse/pull/63728) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Userspace page cache: don't collect stats if cache is unused [#63730](https://github.com/ClickHouse/ClickHouse/pull/63730) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix insignificant UBSAN error in QueryAnalyzer::replaceNodesWithPositionalArguments() [#63734](https://github.com/ClickHouse/ClickHouse/pull/63734) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix a bug in resolving matcher inside lambda inside ARRAY JOIN [#63744](https://github.com/ClickHouse/ClickHouse/pull/63744) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Remove unused CaresPTRResolver::cancel_requests method [#63754](https://github.com/ClickHouse/ClickHouse/pull/63754) ([Arthur Passos](https://github.com/arthurpassos)).
* Do not hide disk name [#63756](https://github.com/ClickHouse/ClickHouse/pull/63756) ([Kseniia Sumarokova](https://github.com/kssenii)).
* CI: remove Cancel and Debug workflows as redundant [#63757](https://github.com/ClickHouse/ClickHouse/pull/63757) ([Max K.](https://github.com/maxknv)).
* Security Policy: Add notification process [#63773](https://github.com/ClickHouse/ClickHouse/pull/63773) ([Leticia Webb](https://github.com/leticiawebb)).
* Fix typo [#63774](https://github.com/ClickHouse/ClickHouse/pull/63774) ([Anton Popov](https://github.com/CurtizJ)).
* Fix fuzzer when only explicit faults are used [#63775](https://github.com/ClickHouse/ClickHouse/pull/63775) ([Raúl Marín](https://github.com/Algunenano)).
* Settings typo [#63782](https://github.com/ClickHouse/ClickHouse/pull/63782) ([Rory Crispin](https://github.com/RoryCrispin)).
* Changed the previous value of `output_format_pretty_preserve_border_for_multiline_string` setting [#63783](https://github.com/ClickHouse/ClickHouse/pull/63783) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* fix antlr insertStmt for issue 63657 [#63811](https://github.com/ClickHouse/ClickHouse/pull/63811) ([GG Bond](https://github.com/zzyReal666)).
* Fix race in `ReplicatedMergeTreeLogEntryData` [#63816](https://github.com/ClickHouse/ClickHouse/pull/63816) ([Antonio Andelic](https://github.com/antonio2368)).
* Allow allocation during job destructor in `ThreadPool` [#63829](https://github.com/ClickHouse/ClickHouse/pull/63829) ([Antonio Andelic](https://github.com/antonio2368)).
* io_uring: add basic io_uring clickhouse perf test [#63835](https://github.com/ClickHouse/ClickHouse/pull/63835) ([Tomer Shafir](https://github.com/tomershafir)).
* fix typo [#63838](https://github.com/ClickHouse/ClickHouse/pull/63838) ([Alexander Gololobov](https://github.com/davenger)).
* Remove unnecessary logging statements in MergeJoinTransform.cpp [#63860](https://github.com/ClickHouse/ClickHouse/pull/63860) ([vdimir](https://github.com/vdimir)).
* CI: disable ARM integration test cases with libunwind crash [#63867](https://github.com/ClickHouse/ClickHouse/pull/63867) ([Max K.](https://github.com/maxknv)).
* Fix some settings values in 02455_one_row_from_csv_memory_usage test to make it less flaky [#63874](https://github.com/ClickHouse/ClickHouse/pull/63874) ([Kruglov Pavel](https://github.com/Avogar)).
* Randomise `allow_experimental_parallel_reading_from_replicas` in stress tests [#63899](https://github.com/ClickHouse/ClickHouse/pull/63899) ([Nikita Taranov](https://github.com/nickitat)).
* Fix logs test for binary data by converting it to a valid UTF8 string. [#63909](https://github.com/ClickHouse/ClickHouse/pull/63909) ([Alexey Katsman](https://github.com/alexkats)).
* More sanity checks for parallel replicas [#63910](https://github.com/ClickHouse/ClickHouse/pull/63910) ([Nikita Taranov](https://github.com/nickitat)).
* Insignificant libunwind build fixes [#63946](https://github.com/ClickHouse/ClickHouse/pull/63946) ([Azat Khuzhin](https://github.com/azat)).
* Revert multiline pretty changes due to performance problems [#63947](https://github.com/ClickHouse/ClickHouse/pull/63947) ([Raúl Marín](https://github.com/Algunenano)).
* Some usability improvements for c++expr script [#63948](https://github.com/ClickHouse/ClickHouse/pull/63948) ([Azat Khuzhin](https://github.com/azat)).
* CI: aarch64: disable ARM integration tests with kerberized Kafka [#63961](https://github.com/ClickHouse/ClickHouse/pull/63961) ([Max K.](https://github.com/maxknv)).
* Slightly better setting `force_optimize_projection_name` [#63997](https://github.com/ClickHouse/ClickHouse/pull/63997) ([Anton Popov](https://github.com/CurtizJ)).
* Better script to collect symbols statistics [#64013](https://github.com/ClickHouse/ClickHouse/pull/64013) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix a typo in Analyzer [#64022](https://github.com/ClickHouse/ClickHouse/pull/64022) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix libbcrypt for FreeBSD build [#64023](https://github.com/ClickHouse/ClickHouse/pull/64023) ([Azat Khuzhin](https://github.com/azat)).
* Fix searching for libclang_rt.builtins.*.a on FreeBSD [#64051](https://github.com/ClickHouse/ClickHouse/pull/64051) ([Azat Khuzhin](https://github.com/azat)).
* Fix waiting for mutations with retriable errors [#64063](https://github.com/ClickHouse/ClickHouse/pull/64063) ([Alexander Tokmakov](https://github.com/tavplubix)).
* harmonize h3PointDist* error messages [#64080](https://github.com/ClickHouse/ClickHouse/pull/64080) ([Yohann Jardin](https://github.com/yohannj)).
* This log message is better in Trace [#64081](https://github.com/ClickHouse/ClickHouse/pull/64081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* tests: fix expected error for 03036_reading_s3_archives (fixes CI) [#64089](https://github.com/ClickHouse/ClickHouse/pull/64089) ([Azat Khuzhin](https://github.com/azat)).
* Fix sanitizers [#64090](https://github.com/ClickHouse/ClickHouse/pull/64090) ([Azat Khuzhin](https://github.com/azat)).
* Update llvm/clang to 18.1.6 [#64091](https://github.com/ClickHouse/ClickHouse/pull/64091) ([Azat Khuzhin](https://github.com/azat)).
* CI: mergeable check redesign [#64093](https://github.com/ClickHouse/ClickHouse/pull/64093) ([Max K.](https://github.com/maxknv)).
* Move `isAllASCII` from UTFHelper to StringUtils [#64108](https://github.com/ClickHouse/ClickHouse/pull/64108) ([Robert Schulze](https://github.com/rschu1ze)).
* Clean up .clang-tidy after transition to Clang 18 [#64111](https://github.com/ClickHouse/ClickHouse/pull/64111) ([Robert Schulze](https://github.com/rschu1ze)).
* Ignore exception when checking for cgroupsv2 [#64118](https://github.com/ClickHouse/ClickHouse/pull/64118) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix UBSan error in negative positional arguments [#64127](https://github.com/ClickHouse/ClickHouse/pull/64127) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Syncing code [#64135](https://github.com/ClickHouse/ClickHouse/pull/64135) ([Antonio Andelic](https://github.com/antonio2368)).
* Loosen build resource limits for unusual architectures [#64152](https://github.com/ClickHouse/ClickHouse/pull/64152) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* fix clang tidy [#64179](https://github.com/ClickHouse/ClickHouse/pull/64179) ([Han Fei](https://github.com/hanfei1991)).
* Fix global query profiler [#64187](https://github.com/ClickHouse/ClickHouse/pull/64187) ([Azat Khuzhin](https://github.com/azat)).
* CI: cancel running PR wf after adding to MQ [#64188](https://github.com/ClickHouse/ClickHouse/pull/64188) ([Max K.](https://github.com/maxknv)).
* Add debug logging to EmbeddedRocksDBBulkSink [#64203](https://github.com/ClickHouse/ClickHouse/pull/64203) ([vdimir](https://github.com/vdimir)).
* Fix special builds (due to excessive resource usage - memory/CPU) [#64204](https://github.com/ClickHouse/ClickHouse/pull/64204) ([Azat Khuzhin](https://github.com/azat)).
* Add gh to style-check dockerfile [#64227](https://github.com/ClickHouse/ClickHouse/pull/64227) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Followup for [#63691](https://github.com/ClickHouse/ClickHouse/issues/63691) [#64285](https://github.com/ClickHouse/ClickHouse/pull/64285) ([vdimir](https://github.com/vdimir)).
* Rename allow_deprecated_functions to allow_deprecated_error_prone_win… [#64358](https://github.com/ClickHouse/ClickHouse/pull/64358) ([Raúl Marín](https://github.com/Algunenano)).
* Update description for settings `cross_join_min_rows_to_compress` and `cross_join_min_bytes_to_compress` [#64360](https://github.com/ClickHouse/ClickHouse/pull/64360) ([Nikita Fomichev](https://github.com/fm4v)).
* Rename aggregate_function_group_array_has_limit_size [#64362](https://github.com/ClickHouse/ClickHouse/pull/64362) ([Raúl Marín](https://github.com/Algunenano)).
* Split tests 03039_dynamic_all_merge_algorithms to avoid timeouts [#64363](https://github.com/ClickHouse/ClickHouse/pull/64363) ([Kruglov Pavel](https://github.com/Avogar)).
* Clean settings in 02943_variant_read_subcolumns test [#64437](https://github.com/ClickHouse/ClickHouse/pull/64437) ([Kruglov Pavel](https://github.com/Avogar)).
* CI: Critical bugfix category in PR template [#64480](https://github.com/ClickHouse/ClickHouse/pull/64480) ([Max K.](https://github.com/maxknv)).

View File

@ -7,6 +7,8 @@ sidebar_label: Configuration Files
# Configuration Files
The ClickHouse server can be configured with configuration files in XML or YAML syntax. In most installation types, the ClickHouse server runs with `/etc/clickhouse-server/config.xml` as the default configuration file, but it is also possible to specify the location of the configuration file manually at server startup using the command line option `--config-file=` or `-C`. Additional configuration files may be placed into the directory `config.d/` relative to the main configuration file, for example into the directory `/etc/clickhouse-server/config.d/`. Files in this directory and the main configuration are merged in a preprocessing step before the configuration is applied in the ClickHouse server. Configuration files are merged in alphabetical order. To simplify updates and improve modularization, it is best practice to keep the default `config.xml` file unmodified and place additional customization into `config.d/`.
(The ClickHouse Keeper configuration lives in `/etc/clickhouse-keeper/keeper_config.xml`, and thus the additional files need to be placed in `/etc/clickhouse-keeper/keeper_config.d/`.)
It is possible to mix XML and YAML configuration files, for example you could have a main configuration file `config.xml` and additional configuration files `config.d/network.xml`, `config.d/timezone.yaml` and `config.d/keeper.yaml`. Mixing XML and YAML within a single configuration file is not supported. XML configuration files should use `<clickhouse>...</clickhouse>` as the top-level tag. In YAML configuration files, `clickhouse:` is optional; the parser inserts it implicitly if absent.
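As a sanity check after changing files in `config.d/`, one way to inspect the merged result is to query the server itself (a minimal sketch; it assumes the `system.server_settings` table is available, as in recent ClickHouse versions):
```sql
-- Show which server-level settings differ from their defaults
-- after the preprocessed configuration has been applied
-- (requires a configuration reload or server restart first).
SELECT name, value
FROM system.server_settings
WHERE changed;
```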

View File

@ -1956,7 +1956,7 @@ Possible values:
- Positive integer.
- 0 — Asynchronous insertions are disabled.
Default value: `10485760`.
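A brief usage sketch (assuming this hunk documents `async_insert_max_data_size` and that a table `t` exists; both are assumptions, not part of the diff):
```sql
-- Buffer small INSERTs server-side; the buffer is flushed once it reaches
-- async_insert_max_data_size bytes, or when another flush condition fires first.
SET async_insert = 1, wait_for_async_insert = 1, async_insert_max_data_size = 10485760;
INSERT INTO t VALUES (1);  -- `t` is a hypothetical table used only for illustration
```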
### async_insert_max_query_number {#async-insert-max-query-number}

View File

@ -5,10 +5,57 @@ sidebar_position: 107
# corr
Calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient):
$$
\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{\sqrt{\Sigma{(x - \bar{x})^2} * \Sigma{(y - \bar{y})^2}}}
$$
:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`corrStable`](../reference/corrstable.md) function. It is slower but provides a more accurate result.
:::
**Syntax**
```sql
corr(x, y)
```
**Arguments**
- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- The Pearson correlation coefficient. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS series;
CREATE TABLE series
(
i UInt32,
x_value Float64,
y_value Float64
)
ENGINE = Memory;
INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6, -4.4),(2, -9.6, 3),(3, -1.3, -4),(4, 5.3, 9.7),(5, 4.4, 0.037),(6, -8.6, -7.8),(7, 5.1, 9.3),(8, 7.9, -3.6),(9, -8.2, 0.62),(10, -3, 7.3);
```
```sql
SELECT corr(x_value, y_value)
FROM series;
```
Result:
```response
┌─corr(x_value, y_value)─┐
│ 0.1730265755453256 │
└────────────────────────┘
```
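As an illustrative cross-check (not part of the reference page), the same coefficient can be computed directly from the definition, since `corr(x, y) = covarPop(x, y) / (stddevPop(x) * stddevPop(y))`:
```sql
SELECT (avg(x_value * y_value) - avg(x_value) * avg(y_value))
       / (stddevPop(x_value) * stddevPop(y_value)) AS corr_manual
FROM series;  -- ≈ 0.173, matching corr(x_value, y_value) above
```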

View File

@ -0,0 +1,55 @@
---
slug: /en/sql-reference/aggregate-functions/reference/corrmatrix
sidebar_position: 108
---
# corrMatrix
Computes the correlation matrix over N variables.
**Syntax**
```sql
corrMatrix(x[, ...])
```
**Arguments**
- `x` — a variable number of parameters. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned value**
- Correlation matrix. [Array](../../data-types/array.md)([Array](../../data-types/array.md)([Float64](../../data-types/float.md))).
**Example**
Query:
```sql
DROP TABLE IF EXISTS test;
CREATE TABLE test
(
a UInt32,
b Float64,
c Float64,
d Float64
)
ENGINE = Memory;
INSERT INTO test(a, b, c, d) VALUES (1, 5.6, -4.4, 2.6), (2, -9.6, 3, 3.3), (3, -1.3, -4, 1.2), (4, 5.3, 9.7, 2.3), (5, 4.4, 0.037, 1.222), (6, -8.6, -7.8, 2.1233), (7, 5.1, 9.3, 8.1222), (8, 7.9, -3.6, 9.837), (9, -8.2, 0.62, 8.43555), (10, -3, 7.3, 6.762);
```
```sql
SELECT arrayMap(x -> round(x, 3), arrayJoin(corrMatrix(a, b, c, d))) AS corrMatrix
FROM test;
```
Result:
```response
┌─corrMatrix─────────────┐
1. │ [1,-0.096,0.243,0.746] │
2. │ [-0.096,1,0.173,0.106] │
3. │ [0.243,0.173,1,0.258] │
4. │ [0.746,0.106,0.258,1] │
└────────────────────────┘
```
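Each off-diagonal entry is the pairwise Pearson coefficient of the corresponding pair of columns, so individual entries can be spot-checked (illustrative):
```sql
SELECT round(corr(a, b), 3) AS corr_a_b
FROM test;  -- -0.096, the (1,2) entry of the matrix above
```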

View File

@ -0,0 +1,58 @@
---
slug: /en/sql-reference/aggregate-functions/reference/corrstable
sidebar_position: 107
---
# corrStable
Calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient):
$$
\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{\sqrt{\Sigma{(x - \bar{x})^2} * \Sigma{(y - \bar{y})^2}}}
$$
Similar to the [`corr`](../reference/corr.md) function, but uses a numerically stable algorithm. As a result, `corrStable` is slower than `corr` but produces a more accurate result.
**Syntax**
```sql
corrStable(x, y)
```
**Arguments**
- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- The Pearson correlation coefficient. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS series;
CREATE TABLE series
(
i UInt32,
x_value Float64,
y_value Float64
)
ENGINE = Memory;
INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6, -4.4),(2, -9.6, 3),(3, -1.3, -4),(4, 5.3, 9.7),(5, 4.4, 0.037),(6, -8.6, -7.8),(7, 5.1, 9.3),(8, 7.9, -3.6),(9, -8.2, 0.62),(10, -3, 7.3);
```
```sql
SELECT corrStable(x_value, y_value)
FROM series;
```
Result:
```response
┌─corrStable(x_value, y_value)─┐
│ 0.17302657554532558 │
└──────────────────────────────┘
```
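On a tiny, well-conditioned dataset like this one, `corr` and `corrStable` agree to many digits; the difference only matters on large or ill-conditioned inputs (illustrative comparison):
```sql
SELECT corr(x_value, y_value) AS unstable,
       corrStable(x_value, y_value) AS stable
FROM series;  -- both ≈ 0.173 here
```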

View File

@ -1,14 +1,54 @@
---
slug: /en/sql-reference/aggregate-functions/reference/covarpop
sidebar_position: 37
---
# covarPop
Calculates the population covariance:
$$
\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{n}
$$
:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`covarPopStable`](../reference/covarpopstable.md) function. It is slower but provides a lower computational error.
:::
**Syntax**
```sql
covarPop(x, y)
```
**Arguments**
- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- The population covariance between `x` and `y`. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS series;
CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory;
INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6, -4.4),(2, -9.6, 3),(3, -1.3, -4),(4, 5.3, 9.7),(5, 4.4, 0.037),(6, -8.6, -7.8),(7, 5.1, 9.3),(8, 7.9, -3.6),(9, -8.2, 0.62),(10, -3, 7.3);
```
```sql
SELECT covarPop(x_value, y_value)
FROM series;
```
Result:
```response
┌─covarPop(x_value, y_value)─┐
│ 6.485648 │
└────────────────────────────┘
```
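For intuition, the population covariance can also be computed from the textbook identity `E[xy] - E[x]E[y]` (an illustrative sketch, not part of the reference):
```sql
SELECT avg(x_value * y_value) - avg(x_value) * avg(y_value) AS covar_manual
FROM series;  -- 6.485648, matching covarPop(x_value, y_value) above
```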

View File

@ -0,0 +1,55 @@
---
slug: /en/sql-reference/aggregate-functions/reference/covarpopmatrix
sidebar_position: 36
---
# covarPopMatrix
Returns the population covariance matrix over N variables.
**Syntax**
```sql
covarPopMatrix(x[, ...])
```
**Arguments**
- `x` — a variable number of parameters. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- Population covariance matrix. [Array](../../data-types/array.md)([Array](../../data-types/array.md)([Float64](../../data-types/float.md))).
**Example**
Query:
```sql
DROP TABLE IF EXISTS test;
CREATE TABLE test
(
a UInt32,
b Float64,
c Float64,
d Float64
)
ENGINE = Memory;
INSERT INTO test(a, b, c, d) VALUES (1, 5.6, -4.4, 2.6), (2, -9.6, 3, 3.3), (3, -1.3, -4, 1.2), (4, 5.3, 9.7, 2.3), (5, 4.4, 0.037, 1.222), (6, -8.6, -7.8, 2.1233), (7, 5.1, 9.3, 8.1222), (8, 7.9, -3.6, 9.837), (9, -8.2, 0.62, 8.43555), (10, -3, 7.3, 6.762);
```
```sql
SELECT arrayMap(x -> round(x, 3), arrayJoin(covarPopMatrix(a, b, c, d))) AS covarPopMatrix
FROM test;
```
Result:
```response
┌─covarPopMatrix────────────┐
1. │ [8.25,-1.76,4.08,6.748] │
2. │ [-1.76,41.07,6.486,2.132] │
3. │ [4.08,6.486,34.21,4.755] │
4. │ [6.748,2.132,4.755,9.93] │
└───────────────────────────┘
```
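As with `corrMatrix`, entries can be spot-checked against the pairwise function (illustrative):
```sql
SELECT round(covarPop(a, b), 3) AS covar_a_b
FROM test;  -- -1.76, the (1,2) entry of the matrix above
```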

View File

@ -0,0 +1,60 @@
---
slug: /en/sql-reference/aggregate-functions/reference/covarpopstable
sidebar_position: 36
---
# covarPopStable
Calculates the value of the population covariance:
$$
\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{n}
$$
It is similar to the [covarPop](../reference/covarpop.md) function, but uses a numerically stable algorithm. As a result, `covarPopStable` is slower than `covarPop` but produces a more accurate result.
**Syntax**
```sql
covarPopStable(x, y)
```
**Arguments**
- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- The population covariance between `x` and `y`. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS series;
CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory;
INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6,-4.4),(2, -9.6,3),(3, -1.3,-4),(4, 5.3,9.7),(5, 4.4,0.037),(6, -8.6,-7.8),(7, 5.1,9.3),(8, 7.9,-3.6),(9, -8.2,0.62),(10, -3,7.3);
```
```sql
SELECT covarPopStable(x_value, y_value)
FROM
(
SELECT
x_value,
y_value
FROM series
);
```
Result:
```response
┌─covarPopStable(x_value, y_value)─┐
│ 6.485648 │
└──────────────────────────────────┘
```

View File

@ -7,8 +7,74 @@ sidebar_position: 37
Calculates the value of `Σ((x - x̅)(y - y̅)) / (n - 1)`.
:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`covarSampStable`](../reference/covarsampstable.md) function. It is slower but provides a lower computational error.
:::
**Syntax**
```sql
covarSamp(x, y)
```
**Arguments**
- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- The sample covariance between `x` and `y`. For `n <= 1`, `nan` is returned. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS series;
CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory;
INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6,-4.4),(2, -9.6,3),(3, -1.3,-4),(4, 5.3,9.7),(5, 4.4,0.037),(6, -8.6,-7.8),(7, 5.1,9.3),(8, 7.9,-3.6),(9, -8.2,0.62),(10, -3,7.3);
```
```sql
SELECT covarSamp(x_value, y_value)
FROM
(
SELECT
x_value,
y_value
FROM series
);
```
Result:
```response
┌─covarSamp(x_value, y_value)─┐
│ 7.206275555555556 │
└─────────────────────────────┘
```
Query:
```sql
SELECT covarSamp(x_value, y_value)
FROM
(
SELECT
x_value,
y_value
FROM series LIMIT 1
);
```
Result:
```response
┌─covarSamp(x_value, y_value)─┐
│ nan │
└─────────────────────────────┘
```
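The sample covariance can likewise be expanded into plain sums for a manual cross-check (illustrative sketch):
```sql
SELECT (sum(x_value * y_value) - sum(x_value) * sum(y_value) / count())
       / (count() - 1) AS covar_samp_manual
FROM series;  -- ≈ 7.206275555555556, matching covarSamp(x_value, y_value) above
```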

View File

@ -0,0 +1,57 @@
---
slug: /en/sql-reference/aggregate-functions/reference/covarsampmatrix
sidebar_position: 38
---
# covarSampMatrix
Returns the sample covariance matrix over N variables.
**Syntax**
```sql
covarSampMatrix(x[, ...])
```
**Arguments**
- `x` — a variable number of parameters. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- Sample covariance matrix. [Array](../../data-types/array.md)([Array](../../data-types/array.md)([Float64](../../data-types/float.md))).
**Example**
Query:
```sql
DROP TABLE IF EXISTS test;
CREATE TABLE test
(
a UInt32,
b Float64,
c Float64,
d Float64
)
ENGINE = Memory;
INSERT INTO test(a, b, c, d) VALUES (1, 5.6, -4.4, 2.6), (2, -9.6, 3, 3.3), (3, -1.3, -4, 1.2), (4, 5.3, 9.7, 2.3), (5, 4.4, 0.037, 1.222), (6, -8.6, -7.8, 2.1233), (7, 5.1, 9.3, 8.1222), (8, 7.9, -3.6, 9.837), (9, -8.2, 0.62, 8.43555), (10, -3, 7.3, 6.762);
```
```sql
SELECT arrayMap(x -> round(x, 3), arrayJoin(covarSampMatrix(a, b, c, d))) AS covarSampMatrix
FROM test;
```
Result:
```response
┌─covarSampMatrix─────────────┐
1. │ [9.167,-1.956,4.534,7.498] │
2. │ [-1.956,45.634,7.206,2.369] │
3. │ [4.534,7.206,38.011,5.283] │
4. │ [7.498,2.369,5.283,11.034] │
└─────────────────────────────┘
```
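Entries can again be spot-checked against the pairwise function (illustrative):
```sql
SELECT round(covarSamp(a, b), 3) AS covar_a_b
FROM test;  -- -1.956, the (1,2) entry of the matrix above
```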

View File

@ -0,0 +1,73 @@
---
slug: /en/sql-reference/aggregate-functions/reference/covarsampstable
sidebar_position: 37
---
# covarSampStable
Calculates the value of `Σ((x - x̅)(y - y̅)) / (n - 1)`. Similar to [covarSamp](../reference/covarsamp.md), but is slower and provides a lower computational error.
**Syntax**
```sql
covarSampStable(x, y)
```
**Arguments**
- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md).
**Returned Value**
- The sample covariance between `x` and `y`. For `n <= 1`, `inf` is returned. [Float64](../../data-types/float.md).
**Example**
Query:
```sql
DROP TABLE IF EXISTS series;
CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory;
INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6,-4.4),(2, -9.6,3),(3, -1.3,-4),(4, 5.3,9.7),(5, 4.4,0.037),(6, -8.6,-7.8),(7, 5.1,9.3),(8, 7.9,-3.6),(9, -8.2,0.62),(10, -3,7.3);
```
```sql
SELECT covarSampStable(x_value, y_value)
FROM
(
SELECT
x_value,
y_value
FROM series
);
```
Result:
```response
┌─covarSampStable(x_value, y_value)─┐
│ 7.206275555555556 │
└───────────────────────────────────┘
```
Query:
```sql
SELECT covarSampStable(x_value, y_value)
FROM
(
SELECT
x_value,
y_value
FROM series LIMIT 1
);
```
Result:
```response
┌─covarSampStable(x_value, y_value)─┐
│ inf │
└───────────────────────────────────┘
```

View File

@ -9,110 +9,116 @@ toc_hidden: true
Standard aggregate functions:
- [count](../reference/count.md)
- [min](../reference/min.md)
- [max](../reference/max.md)
- [sum](../reference/sum.md)
- [avg](../reference/avg.md)
- [any](../reference/any.md)
- [stddevPop](../reference/stddevpop.md)
- [stddevPopStable](../reference/stddevpopstable.md)
- [stddevSamp](../reference/stddevsamp.md)
- [stddevSampStable](../reference/stddevsampstable.md)
- [varPop](../reference/varpop.md)
- [varSamp](../reference/varsamp.md)
- [corr](../reference/corr.md)
- [corrStable](../reference/corrstable.md)
- [corrMatrix](../reference/corrmatrix.md)
- [covarPop](../reference/covarpop.md)
- [covarPopStable](../reference/covarpopstable.md)
- [covarPopMatrix](../reference/covarpopmatrix.md)
- [covarSamp](../reference/covarsamp.md)
- [covarSampStable](../reference/covarsampstable.md)
- [covarSampMatrix](../reference/covarsampmatrix.md)
- [entropy](../reference/entropy.md)
- [exponentialMovingAverage](../reference/exponentialmovingaverage.md)
- [intervalLengthSum](../reference/intervalLengthSum.md)
- [kolmogorovSmirnovTest](../reference/kolmogorovsmirnovtest.md)
- [mannwhitneyutest](../reference/mannwhitneyutest.md)
- [median](../reference/median.md)
- [rankCorr](../reference/rankCorr.md)
- [sumKahan](../reference/sumkahan.md)
- [studentTTest](../reference/studentttest.md)
- [welchTTest](../reference/welchttest.md)
ClickHouse-specific aggregate functions:
- [analysisOfVariance](../reference/analysis_of_variance.md)
- [any](../reference/any_respect_nulls.md)
- [anyHeavy](../reference/anyheavy.md)
- [anyLast](../reference/anylast.md)
- [anyLast](../reference/anylast_respect_nulls.md)
- [boundingRatio](../reference/boundrat.md)
- [first_value](../reference/first_value.md)
- [last_value](../reference/last_value.md)
- [argMin](../reference/argmin.md)
- [argMax](../reference/argmax.md)
- [avgWeighted](../reference/avgweighted.md)
- [topK](../reference/topk.md)
- [topKWeighted](../reference/topkweighted.md)
- [deltaSum](../reference/deltasum.md)
- [deltaSumTimestamp](../reference/deltasumtimestamp.md)
- [groupArray](../reference/grouparray.md)
- [groupArrayLast](../reference/grouparraylast.md)
- [groupUniqArray](../reference/groupuniqarray.md)
- [groupArrayInsertAt](../reference/grouparrayinsertat.md)
- [groupArrayMovingAvg](../reference/grouparraymovingavg.md)
- [groupArrayMovingSum](../reference/grouparraymovingsum.md)
- [groupArraySample](../reference/grouparraysample.md)
- [groupArraySorted](../reference/grouparraysorted.md)
- [groupArrayIntersect](../reference/grouparrayintersect.md)
- [groupBitAnd](../reference/groupbitand.md)
- [groupBitOr](../reference/groupbitor.md)
- [groupBitXor](../reference/groupbitxor.md)
- [groupBitmap](../reference/groupbitmap.md)
- [groupBitmapAnd](../reference/groupbitmapand.md)
- [groupBitmapOr](../reference/groupbitmapor.md)
- [groupBitmapXor](../reference/groupbitmapxor.md)
- [sumWithOverflow](../reference/sumwithoverflow.md)
- [sumMap](../reference/summap.md)
- [sumMapWithOverflow](../reference/summapwithoverflow.md)
- [sumMapFiltered](../parametric-functions.md/#summapfiltered)
- [sumMapFilteredWithOverflow](../parametric-functions.md/#summapfilteredwithoverflow)
- [minMap](../reference/minmap.md)
- [maxMap](../reference/maxmap.md)
- [skewSamp](../reference/skewsamp.md)
- [skewPop](../reference/skewpop.md)
- [kurtSamp](../reference/kurtsamp.md)
- [kurtPop](../reference/kurtpop.md)
- [uniq](../reference/uniq.md)
- [uniqExact](../reference/uniqexact.md)
- [uniqCombined](../reference/uniqcombined.md)
- [uniqCombined64](../reference/uniqcombined64.md)
- [uniqHLL12](../reference/uniqhll12.md)
- [uniqTheta](../reference/uniqthetasketch.md)
- [quantile](../reference/quantile.md)
- [quantiles](../reference/quantiles.md)
- [quantileExact](../reference/quantileexact.md)
- [quantileExactLow](../reference/quantileexact.md#quantileexactlow)
- [quantileExactHigh](../reference/quantileexact.md#quantileexacthigh)
- [quantileExactWeighted](../reference/quantileexactweighted.md)
- [quantileTiming](../reference/quantiletiming.md)
- [quantileTimingWeighted](../reference/quantiletimingweighted.md)
- [quantileDeterministic](../reference/quantiledeterministic.md)
- [quantileTDigest](../reference/quantiletdigest.md)
- [quantileTDigestWeighted](../reference/quantiletdigestweighted.md)
- [quantileBFloat16](../reference/quantilebfloat16.md#quantilebfloat16)
- [quantileBFloat16Weighted](../reference/quantilebfloat16.md#quantilebfloat16weighted)
- [quantileDD](../reference/quantileddsketch.md#quantileddsketch)
- [simpleLinearRegression](../reference/simplelinearregression.md)
- [singleValueOrNull](../reference/singlevalueornull.md)
- [stochasticLinearRegression](../reference/stochasticlinearregression.md)
- [stochasticLogisticRegression](../reference/stochasticlogisticregression.md)
- [categoricalInformationValue](../reference/categoricalinformationvalue.md)
- [contingency](../reference/contingency.md)
- [cramersV](../reference/cramersv.md)
- [cramersVBiasCorrected](../reference/cramersvbiascorrected.md)
- [theilsU](../reference/theilsu.md)
- [maxIntersections](../reference/maxintersections.md)
- [maxIntersectionsPosition](../reference/maxintersectionsposition.md)
- [meanZTest](../reference/meanztest.md)
- [quantileGK](../reference/quantileGK.md)
- [quantileInterpolatedWeighted](../reference/quantileinterpolatedweighted.md)
- [sparkBar](../reference/sparkbar.md)
- [sumCount](../reference/sumcount.md)
- [largestTriangleThreeBuckets](../reference/largestTriangleThreeBuckets.md)

View File

@ -167,7 +167,7 @@ Performs the opposite operation of [hex](#hex). It interprets each pair of hexad
If you want to convert the result to a number, you can use the [reverse](../../sql-reference/functions/string-functions.md#reverse) and [reinterpretAs&lt;Type&gt;](../../sql-reference/functions/type-conversion-functions.md#type-conversion-functions) functions.
:::note
If `unhex` is invoked from within the `clickhouse-client`, binary strings display using UTF-8.
:::
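As a minimal sketch of the number-conversion hint above (the alias `num` is ours; `reverse` is needed because the bytes are stored little-endian):

Query:

```sql
SELECT reinterpretAsUInt16(reverse(unhex('0102'))) AS num;
```

Result:

```response
258
```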
@ -322,11 +322,11 @@ Alias: `UNBIN`.
For a numeric argument `unbin()` does not return the inverse of `bin()`. If you want to convert the result to a number, you can use the [reverse](../../sql-reference/functions/string-functions.md#reverse) and [reinterpretAs&lt;Type&gt;](../../sql-reference/functions/type-conversion-functions.md#reinterpretasuint8163264) functions.
:::note
If `unbin` is invoked from within the `clickhouse-client`, binary strings are displayed using UTF-8.
:::
Supports binary digits `0` and `1`. The number of binary digits does not have to be a multiple of eight. If the argument string contains anything other than binary digits, some implementation-defined result is returned (an exception isn't thrown).
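A minimal sketch of converting a decoded binary string back to a number, using the `reverse` and `reinterpretAs<Type>` combination mentioned above (the alias `num` is ours):

Query:

```sql
SELECT reinterpretAsUInt8(reverse(unbin('00001010'))) AS num;
```

Result:

```response
10
```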
**Arguments**
@ -482,7 +482,7 @@ mortonEncode(range_mask, args)
- `range_mask`: 1-8.
- `args`: up to 8 [unsigned integers](../data-types/int-uint.md) or columns of the aforementioned type.
Note: when using columns for `args` the provided `range_mask` tuple should still be a constant.
**Returned value**
@ -626,7 +626,7 @@ Result:
Accepts a range mask (tuple) as a first argument and the code as the second argument.
Each number in the mask configures the amount of range shrink:<br/>
1 - no shrink<br/>
2 - 2x shrink<br/>
3 - 3x shrink<br/>
...<br/>
Up to 8x shrink.<br/>
@ -701,6 +701,267 @@ Result:
1 2 3 4 5 6 7 8
```
## hilbertEncode
Calculates a code for the Hilbert curve for a list of unsigned integers.
The function has two modes of operation:
- Simple
- Expanded
### Simple mode
Accepts up to 2 unsigned integers as arguments and produces a UInt64 code.
**Syntax**
```sql
hilbertEncode(args)
```
**Parameters**
- `args`: up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Query:
```sql
SELECT hilbertEncode(3, 4);
```
Result:
```response
31
```
### Expanded mode
Accepts a range mask ([tuple](../../sql-reference/data-types/tuple.md)) as a first argument and up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) as other arguments.
Each number in the mask configures the number of bits by which the corresponding argument will be shifted left, effectively scaling the argument within its range.
**Syntax**
```sql
hilbertEncode(range_mask, args)
```
**Parameters**
- `range_mask`: a [tuple](../../sql-reference/data-types/tuple.md) of bit shifts, one per argument.
- `args`: up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
Note: when using columns for `args` the provided `range_mask` tuple should still be a constant.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality).
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
Query:
```sql
SELECT hilbertEncode((10,6), 1024, 16);
```
Result:
```response
4031541586602
```
Note: tuple size must be equal to the number of the other arguments.
**Example**
For a single argument without a tuple, the function returns the argument itself as the Hilbert index, since no dimensional mapping is needed.
Query:
```sql
SELECT hilbertEncode(1);
```
Result:
```response
1
```
**Example**
If a single argument is provided with a tuple specifying bit shifts, the function shifts the argument left by the specified number of bits.
Query:
```sql
SELECT hilbertEncode(tuple(2), 128);
```
Result:
```response
512
```
**Example**
The function also accepts columns as arguments:
First create the table and insert some data.
Query:
```sql
CREATE TABLE hilbert_numbers(
    n1 UInt32,
    n2 UInt32
)
ENGINE = MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
INSERT INTO hilbert_numbers (*) VALUES (1, 2);
```
Use column names instead of constants as function arguments to `hilbertEncode`.
Query:
```sql
SELECT hilbertEncode(n1, n2) FROM hilbert_numbers;
```
Result:
```response
13
```
**Implementation details**

Note that a Hilbert code can hold only as many bits of information as a [UInt64](../../sql-reference/data-types/int-uint.md) contains. With two arguments, each has a maximum range of 2^32 (64/2). Any overflow is clamped to zero.
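As a quick sketch of the clamping (assuming it behaves exactly as described above, so that a value just past the 2^32 boundary is treated as zero):

```sql
SELECT hilbertEncode(4294967296, 0); -- 2^32 overflows the per-argument range
```

This should return the same code as `hilbertEncode(0, 0)`, i.e. `0`.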
## hilbertDecode
Decodes a Hilbert curve index back into a tuple of unsigned integers, representing coordinates in multi-dimensional space.
As with the `hilbertEncode` function, this function has two modes of operation:
- Simple
- Expanded
### Simple mode
Accepts a tuple size and a UInt64 code as arguments and produces a tuple of the decoded coordinates.
**Syntax**
```sql
hilbertDecode(tuple_size, code)
```
**Parameters**
- `tuple_size`: integer value no more than 2.
- `code`: [UInt64](../../sql-reference/data-types/int-uint.md) code.
**Returned value**
- [tuple](../../sql-reference/data-types/tuple.md) of the specified size.
Type: [Tuple](../../sql-reference/data-types/tuple.md) of [UInt64](../../sql-reference/data-types/int-uint.md) elements
**Example**
Query:
```sql
SELECT hilbertDecode(2, 31);
```
Result:
```response
["3", "4"]
```
### Expanded mode
Accepts a range mask ([tuple](../../sql-reference/data-types/tuple.md)) as a first argument and a [UInt64](../../sql-reference/data-types/int-uint.md) code as the second argument.
Each number in the mask configures the number of bits by which the corresponding decoded coordinate is shifted right, scaling it back down to its original range.
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality).
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
As with the encode function, this is limited to 2 numbers at most.
**Example**
The Hilbert code for one argument is always the argument itself, so decoding returns it unchanged (as a one-element tuple).
Query:
```sql
SELECT hilbertDecode(1, 1);
```
Result:
```response
["1"]
```
**Example**
A single argument with a tuple specifying bit shifts will be right-shifted accordingly.
Query:
```sql
SELECT hilbertDecode(tuple(2), 32768);
```
Result:
```response
["128"]
```
**Example**
The function accepts a column of codes as a second argument:
First create the table and insert some data.
Query:
```sql
CREATE TABLE hilbert_numbers(
    n1 UInt32,
    n2 UInt32
)
ENGINE = MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
INSERT INTO hilbert_numbers (*) VALUES (1, 2);
```
Use column names instead of constants as function arguments to `hilbertDecode`.
Query:
```sql
SELECT untuple(hilbertDecode(2, hilbertEncode(n1, n2))) FROM hilbert_numbers;
```
Result:
```response
1 2
```

View File

@ -410,6 +410,10 @@ High compression levels are useful for asymmetric scenarios, like compress once,
- For compression, ZSTD_QAT tries to use an Intel® QAT offloading device ([QuickAssist Technology](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html)). If no such device is found, it falls back to ZSTD compression in software.
- Decompression is always performed in software.
:::note
ZSTD_QAT is not available in ClickHouse Cloud.
:::
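For illustration, a column using this codec might be declared as follows (a sketch, assuming a server built with QAT support; `ZSTD_QAT` takes an optional compression level, defaulting to 1):

```sql
CREATE TABLE qat_example
(
    payload String CODEC(ZSTD_QAT(1))
)
ENGINE = MergeTree
ORDER BY tuple();
```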
#### DEFLATE_QPL
`DEFLATE_QPL` — [Deflate compression algorithm](https://github.com/intel/qpl) implemented by Intel® Query Processing Library. Some limitations apply:

View File

@ -154,7 +154,8 @@ function _clickhouse_quote()
# Extract every option (everything that starts with "-") from the --help dialog.
function _clickhouse_get_options()
{
"$@" --help 2>&1 | awk -F '[ ,=<>.]' '{ for (i=1; i <= NF; ++i) { if (substr($i, 1, 1) == "-" && length($i) > 1) print $i; } }' | sort -u
# By default --help does not print all settings; they are printed only under --verbose
"$@" --help --verbose 2>&1 | awk -F '[ ,=<>.]' '{ for (i=1; i <= NF; ++i) { if (substr($i, 1, 1) == "-" && length($i) > 1) print $i; } }' | sort -u
}
function _complete_for_clickhouse_generic_bin_impl()

View File

@ -11,7 +11,6 @@ namespace DB
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int KEEPER_EXCEPTION;
}
bool LSCommand::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> & node, Expected & expected) const
@ -214,6 +213,143 @@ void GetStatCommand::execute(const ASTKeeperQuery * query, KeeperClient * client
std::cout << "numChildren = " << stat.numChildren << "\n";
}
namespace
{
/// Helper class for parallelized tree traversal
template <class UserCtx>
struct TraversalTask : public std::enable_shared_from_this<TraversalTask<UserCtx>>
{
using TraversalTaskPtr = std::shared_ptr<TraversalTask<UserCtx>>;
struct Ctx
{
std::deque<TraversalTaskPtr> new_tasks; /// Tasks for newly discovered children that haven't been started yet
std::deque<std::function<void(Ctx &)>> in_flight_list_requests; /// In-flight getChildren requests
std::deque<std::function<void(Ctx &)>> finish_callbacks; /// Callbacks to run when a subtree traversal finishes
KeeperClient * client;
UserCtx & user_ctx;
Ctx(KeeperClient * client_, UserCtx & user_ctx_) : client(client_), user_ctx(user_ctx_) {}
};
private:
const fs::path path;
const TraversalTaskPtr parent;
Int64 child_tasks = 0;
Int64 nodes_in_subtree = 1;
public:
TraversalTask(const fs::path & path_, TraversalTaskPtr parent_)
: path(path_)
, parent(parent_)
{
}
/// Start traversing the subtree
void onStart(Ctx & ctx)
{
/// tryGetChildren doesn't throw if the node is not found (was deleted in the meantime)
std::shared_ptr<std::future<Coordination::ListResponse>> list_request =
std::make_shared<std::future<Coordination::ListResponse>>(ctx.client->zookeeper->asyncTryGetChildren(path));
ctx.in_flight_list_requests.push_back([task = this->shared_from_this(), list_request](Ctx & ctx_) mutable
{
task->onGetChildren(ctx_, list_request->get());
});
}
/// Called when getChildren request returns
void onGetChildren(Ctx & ctx, const Coordination::ListResponse & response)
{
const bool traverse_children = ctx.user_ctx.onListChildren(path, response.names);
if (traverse_children)
{
/// Schedule traversal of each child
for (const auto & child : response.names)
{
auto task = std::make_shared<TraversalTask>(path / child, this->shared_from_this());
ctx.new_tasks.push_back(task);
}
child_tasks = response.names.size();
}
if (child_tasks == 0)
finish(ctx);
}
/// Called when a child subtree has been traversed
void onChildTraversalFinished(Ctx & ctx, Int64 child_nodes_in_subtree)
{
nodes_in_subtree += child_nodes_in_subtree;
--child_tasks;
/// Finish if all children have been traversed
if (child_tasks == 0)
finish(ctx);
}
private:
/// This node and all its children have been traversed
void finish(Ctx & ctx)
{
ctx.user_ctx.onFinishChildrenTraversal(path, nodes_in_subtree);
if (!parent)
return;
/// Notify the parent that we have finished traversing the subtree
ctx.finish_callbacks.push_back([p = this->parent, child_nodes_in_subtree = this->nodes_in_subtree](Ctx & ctx_)
{
p->onChildTraversalFinished(ctx_, child_nodes_in_subtree);
});
}
};
/// Traverses the tree in parallel and calls user callbacks
/// Parallelization is achieved by sending multiple async getChildren requests to Keeper, but all processing is done in a single thread
template <class UserCtx>
void parallelized_traverse(const fs::path & path, KeeperClient * client, size_t max_in_flight_requests, UserCtx & ctx_)
{
typename TraversalTask<UserCtx>::Ctx ctx(client, ctx_);
auto root_task = std::make_shared<TraversalTask<UserCtx>>(path, nullptr);
ctx.new_tasks.push_back(root_task);
/// While there is still work to do
while (!ctx.new_tasks.empty() || !ctx.in_flight_list_requests.empty() || !ctx.finish_callbacks.empty())
{
/// First process all finish callbacks, they don't wait for anything and allow to free memory
while (!ctx.finish_callbacks.empty())
{
auto callback = std::move(ctx.finish_callbacks.front());
ctx.finish_callbacks.pop_front();
callback(ctx);
}
/// Make new requests if there are fewer than max in flight
while (!ctx.new_tasks.empty() && ctx.in_flight_list_requests.size() < max_in_flight_requests)
{
auto task = std::move(ctx.new_tasks.front());
ctx.new_tasks.pop_front();
task->onStart(ctx);
}
/// Wait for first request in the queue to finish
if (!ctx.in_flight_list_requests.empty())
{
auto request = std::move(ctx.in_flight_list_requests.front());
ctx.in_flight_list_requests.pop_front();
request(ctx);
}
}
}
} /// anonymous namespace
bool FindSuperNodes::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> & node, Expected & expected) const
{
ASTPtr threshold;
@ -237,27 +373,21 @@ void FindSuperNodes::execute(const ASTKeeperQuery * query, KeeperClient * client
auto threshold = query->args[0].safeGet<UInt64>();
auto path = client->getAbsolutePath(query->args[1].safeGet<String>());
Coordination::Stat stat;
if (!client->zookeeper->exists(path, &stat))
return; /// It is ok if node was deleted meanwhile
if (stat.numChildren >= static_cast<Int32>(threshold))
std::cout << static_cast<String>(path) << "\t" << stat.numChildren << "\n";
Strings children;
auto status = client->zookeeper->tryGetChildren(path, children);
if (status == Coordination::Error::ZNONODE)
return; /// It is ok if node was deleted meanwhile
else if (status != Coordination::Error::ZOK)
throw DB::Exception(DB::ErrorCodes::KEEPER_EXCEPTION, "Error {} while getting children of {}", status, path.string());
std::sort(children.begin(), children.end());
auto next_query = *query;
for (const auto & child : children)
struct
{
next_query.args[1] = DB::Field(path / child);
execute(&next_query, client);
}
bool onListChildren(const fs::path & path, const Strings & children) const
{
if (children.size() >= threshold)
std::cout << static_cast<String>(path) << "\t" << children.size() << "\n";
return true;
}
void onFinishChildrenTraversal(const fs::path &, Int64) const {}
size_t threshold;
} ctx {.threshold = threshold };
parallelized_traverse(path, client, /* max_in_flight_requests */ 50, ctx);
}
bool DeleteStaleBackups::parse(IParser::Pos & /* pos */, std::shared_ptr<ASTKeeperQuery> & /* node */, Expected & /* expected */) const
@ -322,38 +452,28 @@ bool FindBigFamily::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> &
return true;
}
/// DFS the subtree and return the number of nodes in the subtree
static Int64 traverse(const fs::path & path, KeeperClient * client, std::vector<std::tuple<Int64, String>> & result)
{
Int64 nodes_in_subtree = 1;
Strings children;
auto status = client->zookeeper->tryGetChildren(path, children);
if (status == Coordination::Error::ZNONODE)
return 0;
else if (status != Coordination::Error::ZOK)
throw DB::Exception(DB::ErrorCodes::KEEPER_EXCEPTION, "Error {} while getting children of {}", status, path.string());
for (auto & child : children)
nodes_in_subtree += traverse(path / child, client, result);
result.emplace_back(nodes_in_subtree, path.string());
return nodes_in_subtree;
}
void FindBigFamily::execute(const ASTKeeperQuery * query, KeeperClient * client) const
{
auto path = client->getAbsolutePath(query->args[0].safeGet<String>());
auto n = query->args[1].safeGet<UInt64>();
std::vector<std::tuple<Int64, String>> result;
struct
{
std::vector<std::tuple<Int64, String>> result;
traverse(path, client, result);
bool onListChildren(const fs::path &, const Strings &) const { return true; }
std::sort(result.begin(), result.end(), std::greater());
for (UInt64 i = 0; i < std::min(result.size(), static_cast<size_t>(n)); ++i)
std::cout << std::get<1>(result[i]) << "\t" << std::get<0>(result[i]) << "\n";
void onFinishChildrenTraversal(const fs::path & path, Int64 nodes_in_subtree)
{
result.emplace_back(nodes_in_subtree, path.string());
}
} ctx;
parallelized_traverse(path, client, /* max_in_flight_requests */ 50, ctx);
std::sort(ctx.result.begin(), ctx.result.end(), std::greater());
for (UInt64 i = 0; i < std::min(ctx.result.size(), static_cast<size_t>(n)); ++i)
std::cout << std::get<1>(ctx.result[i]) << "\t" << std::get<0>(ctx.result[i]) << "\n";
}
bool RMCommand::parse(IParser::Pos & pos, std::shared_ptr<ASTKeeperQuery> & node, Expected & expected) const

View File

@ -9,8 +9,6 @@ set (CLICKHOUSE_KEEPER_LINK
clickhouse_common_zookeeper
daemon
dbms
${LINK_RESOURCE_LIB}
)
clickhouse_program_add(keeper)
@ -210,8 +208,6 @@ if (BUILD_STANDALONE_KEEPER)
loggers_no_text_log
clickhouse_common_io
clickhouse_parsers # Otherwise compression will not built. FIXME.
${LINK_RESOURCE_LIB_STANDALONE_KEEPER}
)
set_target_properties(clickhouse-keeper PROPERTIES RUNTIME_OUTPUT_DIRECTORY ../)

View File

@ -14,8 +14,6 @@ set (CLICKHOUSE_SERVER_LINK
clickhouse_storages_system
clickhouse_table_functions
${LINK_RESOURCE_LIB}
PUBLIC
daemon
)

File diff suppressed because it is too large

View File

@ -0,0 +1,157 @@
#pragma once
#include <Analyzer/HashUtils.h>
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/Resolve/IdentifierLookup.h>
#include <Core/Joins.h>
#include <Core/NamesAndTypes.h>
#include <Interpreters/Context_fwd.h>
#include <Parsers/NullsAction.h>
namespace DB
{
struct GetColumnsOptions;
struct IdentifierResolveScope;
struct AnalysisTableExpressionData;
class QueryExpressionsAliasVisitor;
class QueryNode;
class JoinNode;
class ColumnNode;
using ProjectionName = String;
using ProjectionNames = std::vector<ProjectionName>;
struct Settings;
class IdentifierResolver
{
public:
IdentifierResolver(
std::unordered_set<std::string_view> & ctes_in_resolve_process_,
std::unordered_map<QueryTreeNodePtr, ProjectionName> & node_to_projection_name_)
: ctes_in_resolve_process(ctes_in_resolve_process_)
, node_to_projection_name(node_to_projection_name_)
{}
/// Utility functions
static bool isExpressionNodeType(QueryTreeNodeType node_type);
static bool isFunctionExpressionNodeType(QueryTreeNodeType node_type);
static bool isSubqueryNodeType(QueryTreeNodeType node_type);
static bool isTableExpressionNodeType(QueryTreeNodeType node_type);
static DataTypePtr getExpressionNodeResultTypeOrNull(const QueryTreeNodePtr & query_tree_node);
static void collectCompoundExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const DataTypePtr & compound_expression_type,
const Identifier & valid_identifier_prefix,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectTableExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const QueryTreeNodePtr & table_expression,
const AnalysisTableExpressionData & table_expression_data,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeWithParentScopesValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static std::vector<String> collectIdentifierTypoHints(const Identifier & unresolved_identifier, const std::unordered_set<Identifier> & valid_identifiers);
static QueryTreeNodePtr wrapExpressionNodeInTupleElement(QueryTreeNodePtr expression_node, IdentifierView nested_path, const ContextPtr & context);
static QueryTreeNodePtr convertJoinedColumnTypeToNullIfNeeded(
const QueryTreeNodePtr & resolved_identifier,
const JoinKind & join_kind,
std::optional<JoinTableSide> resolved_side,
IdentifierResolveScope & scope);
/// Resolve identifier functions
static QueryTreeNodePtr tryResolveTableIdentifierFromDatabaseCatalog(const Identifier & table_identifier, ContextPtr context);
QueryTreeNodePtr tryResolveIdentifierFromCompoundExpression(const Identifier & expression_identifier,
size_t identifier_bind_size,
const QueryTreeNodePtr & compound_expression,
String compound_expression_source,
IdentifierResolveScope & scope,
bool can_be_not_found = false);
QueryTreeNodePtr tryResolveIdentifierFromExpressionArguments(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToAliases(const IdentifierLookup & identifier_lookup, const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromTableColumns(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpressions(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr matchArrayJoinSubcolumns(
const QueryTreeNodePtr & array_join_column_inner_expression,
const ColumnNode & array_join_column_expression_typed,
const QueryTreeNodePtr & resolved_expression,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveExpressionFromArrayJoinExpressions(const QueryTreeNodePtr & resolved_expression,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromArrayJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTreeNode(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & join_tree_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTree(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromStorage(
const Identifier & identifier,
const QueryTreeNodePtr & table_expression_node,
const AnalysisTableExpressionData & table_expression_data,
IdentifierResolveScope & scope,
size_t identifier_column_qualifier_parts,
bool can_be_not_found = false);
/// CTEs that are currently in resolve process
std::unordered_set<std::string_view> & ctes_in_resolve_process;
/// Global expression node to projection name map
std::unordered_map<QueryTreeNodePtr, ProjectionName> & node_to_projection_name;
};
}

File diff suppressed because it is too large

View File

@ -4,6 +4,7 @@
#include <Analyzer/HashUtils.h>
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/Resolve/IdentifierLookup.h>
#include <Analyzer/Resolve/IdentifierResolver.h>
#include <Core/Joins.h>
#include <Core/NamesAndTypes.h>
@ -121,16 +122,6 @@ public:
private:
/// Utility functions
static bool isExpressionNodeType(QueryTreeNodeType node_type);
static bool isFunctionExpressionNodeType(QueryTreeNodeType node_type);
static bool isSubqueryNodeType(QueryTreeNodeType node_type);
static bool isTableExpressionNodeType(QueryTreeNodeType node_type);
static DataTypePtr getExpressionNodeResultTypeOrNull(const QueryTreeNodePtr & query_tree_node);
static ProjectionName calculateFunctionProjectionName(const QueryTreeNodePtr & function_node,
const ProjectionNames & parameters_projection_names,
const ProjectionNames & arguments_projection_names);
@ -149,34 +140,6 @@ private:
const ProjectionName & fill_to_expression_projection_name,
const ProjectionName & fill_step_expression_projection_name);
static void collectCompoundExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const DataTypePtr & compound_expression_type,
const Identifier & valid_identifier_prefix,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectTableExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const QueryTreeNodePtr & table_expression,
const AnalysisTableExpressionData & table_expression_data,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static void collectScopeWithParentScopesValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
const IdentifierResolveScope & scope,
bool allow_expression_identifiers,
bool allow_function_identifiers,
bool allow_table_expression_identifiers,
std::unordered_set<Identifier> & valid_identifiers_result);
static std::vector<String> collectIdentifierTypoHints(const Identifier & unresolved_identifier, const std::unordered_set<Identifier> & valid_identifiers);
static QueryTreeNodePtr wrapExpressionNodeInTupleElement(QueryTreeNodePtr expression_node, IdentifierView nested_path);
QueryTreeNodePtr tryGetLambdaFromSQLUserDefinedFunctions(const std::string & function_name, ContextPtr context);
void evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & query_tree_node, IdentifierResolveScope & scope);
@ -204,84 +167,18 @@ private:
static std::optional<JoinTableSide> getColumnSideFromJoinTree(const QueryTreeNodePtr & resolved_identifier, const JoinNode & join_node);
static QueryTreeNodePtr convertJoinedColumnTypeToNullIfNeeded(
const QueryTreeNodePtr & resolved_identifier,
const JoinKind & join_kind,
std::optional<JoinTableSide> resolved_side,
IdentifierResolveScope & scope);
/// Resolve identifier functions
static QueryTreeNodePtr tryResolveTableIdentifierFromDatabaseCatalog(const Identifier & table_identifier, ContextPtr context);
QueryTreeNodePtr tryResolveIdentifierFromCompoundExpression(const Identifier & expression_identifier,
size_t identifier_bind_size,
const QueryTreeNodePtr & compound_expression,
String compound_expression_source,
IdentifierResolveScope & scope,
bool can_be_not_found = false);
QueryTreeNodePtr tryResolveIdentifierFromExpressionArguments(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToAliases(const IdentifierLookup & identifier_lookup, const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromAliases(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope,
IdentifierResolveSettings identifier_resolve_settings);
QueryTreeNodePtr tryResolveIdentifierFromTableColumns(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
static bool tryBindIdentifierToTableExpressions(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
const IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromTableExpression(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr matchArrayJoinSubcolumns(
const QueryTreeNodePtr & array_join_column_inner_expression,
const ColumnNode & array_join_column_expression_typed,
const QueryTreeNodePtr & resolved_expression,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveExpressionFromArrayJoinExpressions(const QueryTreeNodePtr & resolved_expression,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromArrayJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTreeNode(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & join_tree_node,
IdentifierResolveScope & scope);
QueryTreeNodePtr tryResolveIdentifierFromJoinTree(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope);
IdentifierResolveResult tryResolveIdentifierInParentScopes(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
IdentifierResolveResult tryResolveIdentifier(const IdentifierLookup & identifier_lookup,
IdentifierResolveScope & scope,
IdentifierResolveSettings identifier_resolve_settings = {});
QueryTreeNodePtr tryResolveIdentifierFromStorage(
const Identifier & identifier,
const QueryTreeNodePtr & table_expression_node,
const AnalysisTableExpressionData & table_expression_data,
IdentifierResolveScope & scope,
size_t identifier_column_qualifier_parts,
bool can_be_not_found = false);
/// Resolve query tree nodes functions
void qualifyColumnNodesWithProjectionNames(const QueryTreeNodes & column_nodes,
@ -362,6 +259,8 @@ private:
/// Global expression node to projection name map
std::unordered_map<QueryTreeNodePtr, ProjectionName> node_to_projection_name;
IdentifierResolver identifier_resolver; // (ctes_in_resolve_process, node_to_projection_name);
/// Global resolve expression node to projection names map
std::unordered_map<QueryTreeNodePtr, ProjectionNames> resolved_expressions;

View File

@ -0,0 +1,71 @@
#pragma once
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/Utils.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
/// Used to replace columns that changed type because of JOIN with their original type
class ReplaceColumnsVisitor : public InDepthQueryTreeVisitor<ReplaceColumnsVisitor>
{
public:
explicit ReplaceColumnsVisitor(const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map_, const ContextPtr & context_)
: replacement_map(replacement_map_)
, context(context_)
{}
/// Apply replacement transitively, because a column may change its type twice: once to have a supertype and then because of `join_use_nulls`
static QueryTreeNodePtr findTransitiveReplacement(QueryTreeNodePtr node, const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map_)
{
auto it = replacement_map_.find(node);
QueryTreeNodePtr result_node = nullptr;
for (; it != replacement_map_.end(); it = replacement_map_.find(result_node))
{
if (result_node && result_node->isEqual(*it->second))
{
Strings map_dump;
for (const auto & [k, v]: replacement_map_)
map_dump.push_back(fmt::format("{} -> {} (is_equals: {}, is_same: {})",
k.node->dumpTree(), v->dumpTree(), k.node->isEqual(*v), k.node == v));
throw Exception(ErrorCodes::LOGICAL_ERROR, "Infinite loop in query tree replacement map: {}", fmt::join(map_dump, "; "));
}
chassert(it->second);
result_node = it->second;
}
return result_node;
}
void visitImpl(QueryTreeNodePtr & node)
{
if (auto replacement_node = findTransitiveReplacement(node, replacement_map))
node = replacement_node;
if (auto * function_node = node->as<FunctionNode>(); function_node && function_node->isResolved())
rerunFunctionResolve(function_node, context);
}
/// We want to re-run resolve for function _after_ its arguments are replaced
bool shouldTraverseTopToBottom() const { return false; }
bool needChildVisit(QueryTreeNodePtr & /* parent */, QueryTreeNodePtr & child)
{
/// Visit only expressions, but not subqueries
return child->getNodeType() == QueryTreeNodeType::IDENTIFIER
|| child->getNodeType() == QueryTreeNodeType::LIST
|| child->getNodeType() == QueryTreeNodeType::FUNCTION
|| child->getNodeType() == QueryTreeNodeType::COLUMN;
}
private:
const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map;
const ContextPtr & context;
};
}

View File

@ -60,12 +60,9 @@ ColumnPtr IColumnDummy::filter(const Filter & filt, ssize_t /*result_size_hint*/
return cloneDummy(bytes);
}
void IColumnDummy::expand(const IColumn::Filter & mask, bool inverted)
void IColumnDummy::expand(const IColumn::Filter & mask, bool)
{
size_t bytes = countBytesInFilter(mask);
if (inverted)
bytes = mask.size() - bytes;
s = bytes;
s = mask.size();
}
ColumnPtr IColumnDummy::permute(const Permutation & perm, size_t limit) const

View File

@ -77,7 +77,7 @@ INSTANTIATE(IPv6)
#undef INSTANTIATE
template <bool inverted, bool column_is_short, typename Container>
template <bool inverted, typename Container>
static size_t extractMaskNumericImpl(
PaddedPODArray<UInt8> & mask,
const Container & data,
@ -85,42 +85,27 @@ static size_t extractMaskNumericImpl(
const PaddedPODArray<UInt8> * null_bytemap,
PaddedPODArray<UInt8> * nulls)
{
if constexpr (!column_is_short)
{
if (data.size() != mask.size())
throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of a full data column is not equal to the size of a mask");
}
if (data.size() != mask.size())
throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of a full data column is not equal to the size of a mask");
size_t ones_count = 0;
size_t data_index = 0;
size_t mask_size = mask.size();
size_t data_size = data.size();
for (size_t i = 0; i != mask_size && data_index != data_size; ++i)
for (size_t i = 0; i != mask_size; ++i)
{
// Change mask only where value is 1.
if (!mask[i])
continue;
UInt8 value;
size_t index;
if constexpr (column_is_short)
{
index = data_index;
++data_index;
}
else
index = i;
if (null_bytemap && (*null_bytemap)[index])
if (null_bytemap && (*null_bytemap)[i])
{
value = null_value;
if (nulls)
(*nulls)[i] = 1;
}
else
value = static_cast<bool>(data[index]);
value = static_cast<bool>(data[i]);
if constexpr (inverted)
value = !value;
@ -131,12 +116,6 @@ static size_t extractMaskNumericImpl(
mask[i] = value;
}
if constexpr (column_is_short)
{
if (data_index != data_size)
throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of a short column is not equal to the number of ones in a mask");
}
return ones_count;
}
@ -155,10 +134,7 @@ static bool extractMaskNumeric(
const auto & data = numeric_column->getData();
size_t ones_count;
if (column->size() < mask.size())
ones_count = extractMaskNumericImpl<inverted, true>(mask, data, null_value, null_bytemap, nulls);
else
ones_count = extractMaskNumericImpl<inverted, false>(mask, data, null_value, null_bytemap, nulls);
ones_count = extractMaskNumericImpl<inverted>(mask, data, null_value, null_bytemap, nulls);
mask_info.has_ones = ones_count > 0;
mask_info.has_zeros = ones_count != mask.size();
@ -279,25 +255,32 @@ void maskedExecute(ColumnWithTypeAndName & column, const PaddedPODArray<UInt8> &
if (!column_function)
return;
size_t original_size = column.column->size();
ColumnWithTypeAndName result;
/// If mask contains only zeros, we can just create
/// an empty column with the execution result type.
if (!mask_info.has_ones)
{
/// If mask contains only zeros, we can just create a column with default values as it will be ignored
auto result_type = column_function->getResultType();
auto empty_column = result_type->createColumn();
result = {std::move(empty_column), result_type, ""};
auto default_column = result_type->createColumnConstWithDefaultValue(original_size)->convertToFullColumnIfConst();
column = {default_column, result_type, ""};
}
/// Filter column only if mask contains zeros.
else if (mask_info.has_zeros)
{
/// If it contains both zeros and ones, we need to execute the function only on the mask values
/// First we filter the column, which creates a new column, then we apply the column, and finally we expand it
/// Expanding is done to keep consistency in function calls (all columns the same size) and it's ok
/// since the values won't be used by `if`
auto filtered = column_function->filter(mask, -1);
result = typeid_cast<const ColumnFunction *>(filtered.get())->reduce();
auto filter_after_execution = typeid_cast<const ColumnFunction *>(filtered.get())->reduce();
auto mut_column = IColumn::mutate(std::move(filter_after_execution.column));
mut_column->expand(mask, false);
column.column = std::move(mut_column);
}
else
result = column_function->reduce();
column = column_function->reduce();
column = std::move(result);
chassert(column.column->size() == original_size);
}
void executeColumnIfNeeded(ColumnWithTypeAndName & column, bool empty)

View File

@ -637,6 +637,9 @@ void TestKeeper::finalize(const String &)
expired = true;
}
/// Signal requests_queue to wake up the processing thread without waiting for a timeout
requests_queue.finish();
processing_thread.join();
try

View File

@ -1,5 +1,4 @@
#include "ZooKeeper.h"
#include "Coordination/KeeperConstants.h"
#include "Coordination/KeeperFeatureFlags.h"
#include "ZooKeeperImpl.h"
#include "KeeperException.h"
@ -376,11 +375,14 @@ void ZooKeeper::createAncestors(const std::string & path)
}
Coordination::Responses responses;
Coordination::Error code = multiImpl(create_ops, responses, /*check_session_valid*/ false);
const auto & [code, failure_reason] = multiImpl(create_ops, responses, /*check_session_valid*/ false);
if (code == Coordination::Error::ZOK)
return;
if (!failure_reason.empty())
throw KeeperException::fromMessage(code, failure_reason);
throw KeeperException::fromPath(code, path);
}
@ -676,17 +678,19 @@ Coordination::Error ZooKeeper::trySet(const std::string & path, const std::strin
}
Coordination::Error ZooKeeper::multiImpl(const Coordination::Requests & requests, Coordination::Responses & responses, bool check_session_valid)
std::pair<Coordination::Error, std::string>
ZooKeeper::multiImpl(const Coordination::Requests & requests, Coordination::Responses & responses, bool check_session_valid)
{
if (requests.empty())
return Coordination::Error::ZOK;
return {Coordination::Error::ZOK, ""};
std::future<Coordination::MultiResponse> future_result;
Coordination::Requests requests_with_check_session;
if (check_session_valid)
{
Coordination::Requests new_requests = requests;
addCheckSessionOp(new_requests);
future_result = asyncTryMultiNoThrow(new_requests);
requests_with_check_session = requests;
addCheckSessionOp(requests_with_check_session);
future_result = asyncTryMultiNoThrow(requests_with_check_session);
}
else
{
@ -696,7 +700,7 @@ Coordination::Error ZooKeeper::multiImpl(const Coordination::Requests & requests
if (future_result.wait_for(std::chrono::milliseconds(args.operation_timeout_ms)) != std::future_status::ready)
{
impl->finalize(fmt::format("Operation timeout on {} {}", Coordination::OpNum::Multi, requests[0]->getPath()));
return Coordination::Error::ZOPERATIONTIMEOUT;
return {Coordination::Error::ZOPERATIONTIMEOUT, ""};
}
else
{
@ -704,11 +708,14 @@ Coordination::Error ZooKeeper::multiImpl(const Coordination::Requests & requests
Coordination::Error code = response.error;
responses = response.responses;
std::string reason;
if (check_session_valid)
{
if (code != Coordination::Error::ZOK && !Coordination::isHardwareError(code) && getFailedOpIndex(code, responses) == requests.size())
{
impl->finalize(fmt::format("Session was killed: {}", requests.back()->getPath()));
reason = fmt::format("Session was killed: {}", requests_with_check_session.back()->getPath());
impl->finalize(reason);
code = Coordination::Error::ZSESSIONMOVED;
}
responses.pop_back();
@ -717,23 +724,33 @@ Coordination::Error ZooKeeper::multiImpl(const Coordination::Requests & requests
chassert(code == Coordination::Error::ZOK || Coordination::isHardwareError(code) || responses.back()->error != Coordination::Error::ZOK);
}
return code;
return {code, std::move(reason)};
}
}
Coordination::Responses ZooKeeper::multi(const Coordination::Requests & requests, bool check_session_valid)
{
Coordination::Responses responses;
Coordination::Error code = multiImpl(requests, responses, check_session_valid);
const auto & [code, failure_reason] = multiImpl(requests, responses, check_session_valid);
if (!failure_reason.empty())
throw KeeperException::fromMessage(code, failure_reason);
KeeperMultiException::check(code, requests, responses);
return responses;
}
Coordination::Error ZooKeeper::tryMulti(const Coordination::Requests & requests, Coordination::Responses & responses, bool check_session_valid)
{
Coordination::Error code = multiImpl(requests, responses, check_session_valid);
const auto & [code, failure_reason] = multiImpl(requests, responses, check_session_valid);
if (code != Coordination::Error::ZOK && !Coordination::isUserError(code))
{
if (!failure_reason.empty())
throw KeeperException::fromMessage(code, failure_reason);
throw KeeperException(code);
}
return code;
}
@ -1346,7 +1363,7 @@ Coordination::Error ZooKeeper::tryMultiNoThrow(const Coordination::Requests & re
{
try
{
return multiImpl(requests, responses, check_session_valid);
return multiImpl(requests, responses, check_session_valid).first;
}
catch (const Coordination::Exception & e)
{

View File

@ -2,10 +2,8 @@
#include "Types.h"
#include <Poco/Util/LayeredConfiguration.h>
#include <unordered_set>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <Common/logger_useful.h>
#include <Common/ProfileEvents.h>
@ -18,7 +16,6 @@
#include <Common/thread_local_rng.h>
#include <Coordination/KeeperFeatureFlags.h>
#include <unistd.h>
#include <random>
namespace ProfileEvents
@ -644,7 +641,11 @@ private:
Coordination::Stat * stat,
Coordination::WatchCallbackPtr watch_callback,
Coordination::ListRequestType list_request_type);
Coordination::Error multiImpl(const Coordination::Requests & requests, Coordination::Responses & responses, bool check_session_valid);
/// Returns the error code with an optional failure reason
std::pair<Coordination::Error, std::string>
multiImpl(const Coordination::Requests & requests, Coordination::Responses & responses, bool check_session_valid);
Coordination::Error existsImpl(const std::string & path, Coordination::Stat * stat_, Coordination::WatchCallback watch_callback);
Coordination::Error syncImpl(const std::string & path, std::string & returned_path);

View File

@ -450,7 +450,10 @@ MutableColumns CacheDictionary<dictionary_key_type>::aggregateColumnsInOrderOfKe
if (default_mask)
{
if (state.isDefault())
{
(*default_mask)[key_index] = 1;
aggregated_column->insertDefault();
}
else
{
(*default_mask)[key_index] = 0;
@ -536,7 +539,10 @@ MutableColumns CacheDictionary<dictionary_key_type>::aggregateColumns(
}
if (default_mask)
{
aggregated_column->insertDefault(); /// Any default is ok
(*default_mask)[key_index] = 1;
}
else
{
/// Insert default value

View File

@ -189,7 +189,6 @@ private:
const time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
size_t fetched_columns_index = 0;
size_t fetched_columns_index_without_default = 0;
size_t keys_size = keys.size();
PaddedPODArray<FetchedKey> fetched_keys;
@ -211,15 +210,10 @@ private:
result.expired_keys_size += static_cast<size_t>(key_state == KeyState::expired);
result.key_index_to_state[key_index] = {key_state,
default_mask ? fetched_columns_index_without_default : fetched_columns_index};
result.key_index_to_state[key_index] = {key_state, fetched_columns_index};
fetched_keys[fetched_columns_index] = FetchedKey(cell.element_index, cell.is_default);
++fetched_columns_index;
if (!cell.is_default)
++fetched_columns_index_without_default;
result.key_index_to_state[key_index].setDefaultValue(cell.is_default);
result.default_keys_size += cell.is_default;
}
@ -233,8 +227,7 @@ private:
auto & attribute = attributes[attribute_index];
auto & fetched_column = *result.fetched_columns[attribute_index];
fetched_column.reserve(default_mask ? fetched_columns_index_without_default :
fetched_columns_index);
fetched_column.reserve(fetched_columns_index);
if (!default_mask)
{
@ -689,7 +682,11 @@ private:
auto fetched_key = fetched_keys[fetched_key_index];
if (unlikely(fetched_key.is_default))
{
default_mask[fetched_key_index] = 1;
auto v = ValueType{};
value_setter(v);
}
else
{
default_mask[fetched_key_index] = 0;

View File

@ -174,6 +174,9 @@ Columns DirectDictionary<dictionary_key_type>::getColumns(
{
if (!mask_filled)
(*default_mask)[requested_key_index] = 1;
Field value{};
result_column->insert(value);
}
else
{

View File

@ -92,24 +92,20 @@ ColumnPtr FlatDictionary::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
ids,
[&](size_t, const Array & value, bool) { out->insert(value); },
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, ids, [&](size_t, const Array & value, bool) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
ids,
[&](size_t row, StringRef value, bool is_null)
@ -119,18 +115,15 @@ ColumnPtr FlatDictionary::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
ids,
[&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); },
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, ids, [&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); }, default_mask);
}
else
{
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
ids,
[&](size_t row, const auto value, bool is_null)
@ -140,17 +133,9 @@ ColumnPtr FlatDictionary::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
ids,
[&](size_t row, const auto value, bool) { out[row] = value; },
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, ids, [&](size_t row, const auto value, bool) { out[row] = value; }, default_mask);
}
if (attribute.is_nullable_set)
vec_null_map_to->resize(keys_found);
}
else
{
@ -643,11 +628,8 @@ void FlatDictionary::getItemsImpl(
}
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t FlatDictionary::getItemsShortCircuitImpl(
const Attribute & attribute,
const PaddedPODArray<UInt64> & keys,
ValueSetter && set_value,
IColumn::Filter & default_mask) const
void FlatDictionary::getItemsShortCircuitImpl(
const Attribute & attribute, const PaddedPODArray<UInt64> & keys, ValueSetter && set_value, IColumn::Filter & default_mask) const
{
const auto rows = keys.size();
default_mask.resize(rows);
@ -660,22 +642,23 @@ size_t FlatDictionary::getItemsShortCircuitImpl(
if (key < loaded_keys.size() && loaded_keys[key])
{
keys_found++;
default_mask[row] = 0;
if constexpr (is_nullable)
set_value(keys_found, container[key], attribute.is_nullable_set->find(key) != nullptr);
set_value(row, container[key], attribute.is_nullable_set->find(key) != nullptr);
else
set_value(keys_found, container[key], false);
++keys_found;
set_value(row, container[key], false);
}
else
{
default_mask[row] = 1;
set_value(row, AttributeType{}, true);
}
}
query_count.fetch_add(rows, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
template <typename T>

View File

@ -166,11 +166,8 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t getItemsShortCircuitImpl(
const Attribute & attribute,
const PaddedPODArray<UInt64> & keys,
ValueSetter && set_value,
IColumn::Filter & default_mask) const;
void getItemsShortCircuitImpl(
const Attribute & attribute, const PaddedPODArray<UInt64> & keys, ValueSetter && set_value, IColumn::Filter & default_mask) const;
template <typename T>
void resize(Attribute & attribute, UInt64 key);

View File

@ -650,24 +650,20 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
keys_object,
[&](const size_t, const Array & value, bool) { out->insert(value); },
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, keys_object, [&](const size_t, const Array & value, bool) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
keys_object,
[&](size_t row, StringRef value, bool is_null)
@ -677,7 +673,7 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
keys_object,
[&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); },
@ -688,7 +684,7 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
keys_object,
[&](size_t row, const auto value, bool is_null)
@ -698,17 +694,9 @@ ColumnPtr HashedArrayDictionary<dictionary_key_type, sharded>::getAttributeColum
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
keys_object,
[&](size_t row, const auto value, bool) { out[row] = value; },
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, keys_object, [&](size_t row, const auto value, bool) { out[row] = value; }, default_mask);
}
if (is_attribute_nullable)
vec_null_map_to->resize(keys_found);
}
else
{
@ -834,7 +822,7 @@ void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsImpl(
template <DictionaryKeyType dictionary_key_type, bool sharded>
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
@ -870,14 +858,16 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
++keys_found;
}
else
{
default_mask[key_index] = 1;
set_value(key_index, AttributeType{}, true);
}
keys_extractor.rollbackCurrentKey();
}
query_count.fetch_add(keys_size, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
template <DictionaryKeyType dictionary_key_type, bool sharded>
@ -929,7 +919,7 @@ void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsImpl(
template <DictionaryKeyType dictionary_key_type, bool sharded>
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
void HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuitImpl(
const Attribute & attribute,
const KeyIndexToElementIndex & key_index_to_element_index,
ValueSetter && set_value,
@ -938,7 +928,6 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
const auto & attribute_containers = std::get<AttributeContainerShardsType<AttributeType>>(attribute.containers);
const size_t keys_size = key_index_to_element_index.size();
size_t shard = 0;
size_t keys_found = 0;
for (size_t key_index = 0; key_index < keys_size; ++key_index)
{
@ -955,7 +944,6 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
if (element_index != -1)
{
keys_found++;
const auto & attribute_container = attribute_containers[shard];
size_t found_element_index = static_cast<size_t>(element_index);
@ -966,9 +954,11 @@ size_t HashedArrayDictionary<dictionary_key_type, sharded>::getItemsShortCircuit
else
set_value(key_index, element, false);
}
else
{
set_value(key_index, AttributeType{}, true);
}
}
return keys_found;
}
template <DictionaryKeyType dictionary_key_type, bool sharded>

View File

@ -228,7 +228,7 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t getItemsShortCircuitImpl(
void getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
@ -244,7 +244,7 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter>
size_t getItemsShortCircuitImpl(
void getItemsShortCircuitImpl(
const Attribute & attribute,
const KeyIndexToElementIndex & key_index_to_element_index,
ValueSetter && set_value,

View File

@ -245,12 +245,12 @@ private:
ValueSetter && set_value,
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullSetter>
size_t getItemsShortCircuitImpl(
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullAndDefaultSetter>
void getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
NullSetter && set_null,
NullAndDefaultSetter && set_null_and_default,
IColumn::Filter & default_mask) const;
template <typename GetContainersFunc>
@ -428,17 +428,16 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
extractor,
[&](const size_t, const Array & value) { out->insert(value); },
[&](size_t) {},
[&](size_t) { out->insertDefault(); },
default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
@ -447,7 +446,7 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
if (is_attribute_nullable)
{
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
extractor,
[&](size_t row, StringRef value)
@ -463,11 +462,11 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
default_mask);
}
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
extractor,
[&](size_t, StringRef value) { out->insertData(value.data, value.size); },
[&](size_t) {},
[&](size_t) { out->insertDefault(); },
default_mask);
}
else
@ -475,7 +474,7 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
extractor,
[&](size_t row, const auto value)
@ -486,18 +485,9 @@ ColumnPtr HashedDictionary<dictionary_key_type, sparse, sharded>::getColumn(
[&](size_t row) { (*vec_null_map_to)[row] = true; },
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
extractor,
[&](size_t row, const auto value) { out[row] = value; },
[&](size_t) {},
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, extractor, [&](size_t row, const auto value) { out[row] = value; }, [&](size_t) {}, default_mask);
}
if (is_attribute_nullable)
vec_null_map_to->resize(keys_found);
}
else
{
@ -1112,12 +1102,12 @@ void HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsImpl(
}
template <DictionaryKeyType dictionary_key_type, bool sparse, bool sharded>
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullSetter>
size_t HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsShortCircuitImpl(
template <typename AttributeType, bool is_nullable, typename ValueSetter, typename NullAndDefaultSetter>
void HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsShortCircuitImpl(
const Attribute & attribute,
DictionaryKeysExtractor<dictionary_key_type> & keys_extractor,
ValueSetter && set_value,
NullSetter && set_null,
NullAndDefaultSetter && set_null_and_default,
IColumn::Filter & default_mask) const
{
const auto & attribute_containers = std::get<CollectionsHolder<AttributeType>>(attribute.containers);
@ -1143,20 +1133,22 @@ size_t HashedDictionary<dictionary_key_type, sparse, sharded>::getItemsShortCirc
// Need to consider items in is_nullable_sets as well, see blockToAttributes()
else if (is_nullable && (*attribute.is_nullable_sets)[shard].find(key) != nullptr)
{
set_null(key_index);
set_null_and_default(key_index);
default_mask[key_index] = 0;
++keys_found;
}
else
{
set_null_and_default(key_index);
default_mask[key_index] = 1;
}
keys_extractor.rollbackCurrentKey();
}
query_count.fetch_add(keys_size, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
template <DictionaryKeyType dictionary_key_type, bool sparse, bool sharded>

View File

@ -249,39 +249,27 @@ ColumnPtr IPAddressDictionary::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType>(
attribute,
key_columns,
[&](const size_t, const Array & value) { out->insert(value); },
default_mask);
getItemsShortCircuitImpl<ValueType>(
attribute, key_columns, [&](const size_t, const Array & value) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType>(
attribute,
key_columns,
[&](const size_t, StringRef value) { out->insertData(value.data, value.size); },
default_mask);
getItemsShortCircuitImpl<ValueType>(
attribute, key_columns, [&](const size_t, StringRef value) { out->insertData(value.data, value.size); }, default_mask);
}
else
{
auto & out = column->getData();
keys_found = getItemsShortCircuitImpl<ValueType>(
attribute,
key_columns,
[&](const size_t row, const auto value) { return out[row] = value; },
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType>(
attribute, key_columns, [&](const size_t row, const auto value) { return out[row] = value; }, default_mask);
}
}
else
@ -783,7 +771,10 @@ size_t IPAddressDictionary::getItemsByTwoKeyColumnsShortCircuitImpl(
keys_found++;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
return keys_found;
}
@ -822,7 +813,10 @@ size_t IPAddressDictionary::getItemsByTwoKeyColumnsShortCircuitImpl(
keys_found++;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
return keys_found;
}
@ -893,11 +887,8 @@ void IPAddressDictionary::getItemsImpl(
}
template <typename AttributeType, typename ValueSetter>
size_t IPAddressDictionary::getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
ValueSetter && set_value,
IColumn::Filter & default_mask) const
void IPAddressDictionary::getItemsShortCircuitImpl(
const Attribute & attribute, const Columns & key_columns, ValueSetter && set_value, IColumn::Filter & default_mask) const
{
const auto & first_column = key_columns.front();
const size_t rows = first_column->size();
@ -909,7 +900,8 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
keys_found = getItemsByTwoKeyColumnsShortCircuitImpl<AttributeType>(
attribute, key_columns, std::forward<ValueSetter>(set_value), default_mask);
query_count.fetch_add(rows, std::memory_order_relaxed);
return keys_found;
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return;
}
auto & vec = std::get<ContainerType<AttributeType>>(attribute.maps);
@ -931,7 +923,10 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
default_mask[i] = 0;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
}
else if (type_id == TypeIndex::IPv6 || type_id == TypeIndex::FixedString)
@ -949,7 +944,10 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
default_mask[i] = 0;
}
else
{
set_value(i, AttributeType{});
default_mask[i] = 1;
}
}
}
else
@ -957,7 +955,6 @@ size_t IPAddressDictionary::getItemsShortCircuitImpl(
query_count.fetch_add(rows, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
template <typename T>

View File

@ -193,12 +193,9 @@ private:
ValueSetter && set_value,
DefaultValueExtractor & default_value_extractor) const;
template <typename AttributeType,typename ValueSetter>
size_t getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
ValueSetter && set_value,
IColumn::Filter & default_mask) const;
template <typename AttributeType, typename ValueSetter>
void getItemsShortCircuitImpl(
const Attribute & attribute, const Columns & key_columns, ValueSetter && set_value, IColumn::Filter & default_mask) const;
template <typename T>
void setAttributeValueImpl(Attribute & attribute, const T value); /// NOLINT

View File

@ -475,7 +475,11 @@ void IPolygonDictionary::getItemsShortCircuitImpl(
default_mask[requested_key_index] = 0;
}
else
{
auto value = AttributeType{};
set_value(value);
default_mask[requested_key_index] = 1;
}
}
query_count.fetch_add(requested_key_size, std::memory_order_relaxed);

View File

@ -56,27 +56,20 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
if constexpr (std::is_same_v<ValueType, Array>)
{
auto * out = column.get();
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
modified_key_columns,
[&](size_t, const Array & value, bool)
{
out->insert(value);
},
default_mask);
getItemsShortCircuitImpl<ValueType, false>(
attribute, modified_key_columns, [&](size_t, const Array & value, bool) { out->insert(value); }, default_mask);
}
else if constexpr (std::is_same_v<ValueType, StringRef>)
{
auto * out = column.get();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
modified_key_columns,
[&](size_t row, StringRef value, bool is_null)
@ -86,13 +79,10 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
getItemsShortCircuitImpl<ValueType, false>(
attribute,
modified_key_columns,
[&](size_t, StringRef value, bool)
{
out->insertData(value.data, value.size);
},
[&](size_t, StringRef value, bool) { out->insertData(value.data, value.size); },
default_mask);
}
else
@ -100,7 +90,7 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
auto & out = column->getData();
if (is_attribute_nullable)
keys_found = getItemsShortCircuitImpl<ValueType, true>(
getItemsShortCircuitImpl<ValueType, true>(
attribute,
modified_key_columns,
[&](size_t row, const auto value, bool is_null)
@ -110,20 +100,9 @@ ColumnPtr RangeHashedDictionary<dictionary_key_type>::getColumn(
},
default_mask);
else
keys_found = getItemsShortCircuitImpl<ValueType, false>(
attribute,
modified_key_columns,
[&](size_t row, const auto value, bool)
{
out[row] = value;
},
default_mask);
out.resize(keys_found);
getItemsShortCircuitImpl<ValueType, false>(
attribute, modified_key_columns, [&](size_t row, const auto value, bool) { out[row] = value; }, default_mask);
}
if (is_attribute_nullable)
vec_null_map_to->resize(keys_found);
}
else
{

View File

@ -245,7 +245,7 @@ private:
DefaultValueExtractor & default_value_extractor) const;
template <typename ValueType, bool is_nullable>
size_t getItemsShortCircuitImpl(
void getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
ValueSetterFunc<ValueType> && set_value,

View File

@ -1,7 +1,7 @@
#include <Dictionaries/RangeHashedDictionary.h>
#define INSTANTIATE_GET_ITEMS_SHORT_CIRCUIT_IMPL(DictionaryKeyType, IsNullable, ValueType) \
template size_t RangeHashedDictionary<DictionaryKeyType>::getItemsShortCircuitImpl<ValueType, IsNullable>( \
template void RangeHashedDictionary<DictionaryKeyType>::getItemsShortCircuitImpl<ValueType, IsNullable>( \
const Attribute & attribute, \
const Columns & key_columns, \
typename RangeHashedDictionary<DictionaryKeyType>::ValueSetterFunc<ValueType> && set_value, \
@ -18,7 +18,7 @@ namespace DB
template <DictionaryKeyType dictionary_key_type>
template <typename ValueType, bool is_nullable>
size_t RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
void RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
const Attribute & attribute,
const Columns & key_columns,
typename RangeHashedDictionary<dictionary_key_type>::ValueSetterFunc<ValueType> && set_value,
@ -120,6 +120,7 @@ size_t RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
}
default_mask[key_index] = 1;
set_value(key_index, ValueType{}, true);
keys_extractor.rollbackCurrentKey();
}
@ -127,6 +128,5 @@ size_t RangeHashedDictionary<dictionary_key_type>::getItemsShortCircuitImpl(
query_count.fetch_add(keys_size, std::memory_order_relaxed);
found_count.fetch_add(keys_found, std::memory_order_relaxed);
return keys_found;
}
}

View File

@ -807,6 +807,7 @@ std::unordered_map<String, ColumnPtr> RegExpTreeDictionary::match(
if (attributes_to_set.contains(name_))
continue;
columns[name_]->insertDefault();
default_mask.value().get()[key_idx] = 1;
}

View File

@ -14,6 +14,7 @@ namespace DB
namespace ErrorCodes
{
extern const int ILLEGAL_COLUMN;
extern const int LOGICAL_ERROR;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int SIZES_OF_ARRAYS_DONT_MATCH;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
@ -298,4 +299,27 @@ bool isDecimalOrNullableDecimal(const DataTypePtr & type)
return isDecimal(assert_cast<const DataTypeNullable *>(type.get())->getNestedType());
}
/// Note that, for historical reasons, most functions use the size of the first argument to determine
/// the size of all the columns. When short-circuit optimization was introduced, `input_rows_count` was also
/// added to all functions, but many have not yet been adjusted.
void checkFunctionArgumentSizes(const ColumnsWithTypeAndName & arguments, size_t input_rows_count)
{
for (size_t i = 0; i < arguments.size(); i++)
{
if (isColumnConst(*arguments[i].column))
continue;
size_t current_size = arguments[i].column->size();
if (current_size != input_rows_count)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Expected the argument nº#{} ('{}' of type {}) to have {} rows, but it has {}",
i + 1,
arguments[i].name,
arguments[i].type->getName(),
input_rows_count,
current_size);
}
}
}
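Editor's note: a self-contained sketch of what the new check enforces, using simplified stand-in types rather than the ClickHouse API — every non-constant argument column must carry exactly input_rows_count rows, and a mismatch now fails loudly up front instead of surfacing later as an out-of-bounds read:

#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

struct FakeColumn { std::string name; bool is_const; size_t rows; };

// Hypothetical analogue of checkFunctionArgumentSizes().
void checkSizes(const std::vector<FakeColumn> & args, size_t input_rows_count)
{
    for (size_t i = 0; i < args.size(); ++i)
    {
        if (args[i].is_const)
            continue; // constant columns are exempt, as in the code above
        if (args[i].rows != input_rows_count)
            throw std::logic_error(
                "argument " + std::to_string(i + 1) + " ('" + args[i].name + "') has "
                + std::to_string(args[i].rows) + " rows, expected " + std::to_string(input_rows_count));
    }
}

int main()
{
    std::vector<FakeColumn> args{{"key", false, 3}, {"format", true, 1}};
    checkSizes(args, 3);            // passes: the constant column may differ in size
    try { checkSizes(args, 4); }    // fails: the non-const column has 3 rows, not 4
    catch (const std::logic_error & e) { std::puts(e.what()); }
}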

View File

@ -197,4 +197,6 @@ struct NullPresence
NullPresence getNullPresense(const ColumnsWithTypeAndName & args);
bool isDecimalOrNullableDecimal(const DataTypePtr & type);
void checkFunctionArgumentSizes(const ColumnsWithTypeAndName & arguments, size_t input_rows_count);
}

View File

@ -0,0 +1,142 @@
#pragma once
#include <Functions/IFunction.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Functions/FunctionHelpers.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
extern const int ILLEGAL_COLUMN;
}
class FunctionSpaceFillingCurveEncode: public IFunction
{
public:
bool isVariadic() const override
{
return true;
}
size_t getNumberOfArguments() const override
{
return 0;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DB::DataTypes & arguments) const override
{
size_t vector_start_index = 0;
if (arguments.empty())
throw Exception(ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION,
"At least one UInt argument is required for function {}",
getName());
if (WhichDataType(arguments[0]).isTuple())
{
vector_start_index = 1;
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].get());
auto tuple_size = type_tuple->getElements().size();
if (tuple_size != (arguments.size() - 1))
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} for function {}, tuple size should be equal to number of UInt arguments",
arguments[0]->getName(), getName());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
}
}
for (size_t i = vector_start_index; i < arguments.size(); i++)
{
const auto & arg = arguments[i];
if (!WhichDataType(arg).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument for function {}, should be a native UInt",
arg->getName(), getName());
}
return std::make_shared<DataTypeUInt64>();
}
};
template <UInt8 max_dimensions, UInt8 min_ratio, UInt8 max_ratio>
class FunctionSpaceFillingCurveDecode: public IFunction
{
public:
size_t getNumberOfArguments() const override
{
return 2;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
UInt64 tuple_size = 0;
const auto * col_const = typeid_cast<const ColumnConst *>(arguments[0].column.get());
if (!col_const)
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} for function {}, should be a constant (UInt or Tuple)",
arguments[0].type->getName(), getName());
if (!WhichDataType(arguments[1].type).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} for function {}, should be a native UInt",
arguments[1].type->getName(), getName());
const auto * mask = typeid_cast<const ColumnTuple *>(col_const->getDataColumnPtr().get());
if (mask)
{
tuple_size = mask->tupleSize();
}
else if (WhichDataType(arguments[0].type).isNativeUInt())
{
tuple_size = col_const->getUInt(0);
}
else
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} for function {}, should be UInt or Tuple",
arguments[0].type->getName(), getName());
if (tuple_size > max_dimensions || tuple_size < 1)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal first argument for function {}, should be a number in range 1-{} or a Tuple of such size",
getName(), String{max_dimensions});
if (mask)
{
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].type.get());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
auto ratio = mask->getColumn(i).getUInt(0);
if (ratio > max_ratio || ratio < min_ratio)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} in tuple for function {}, should be a number in range {}-{}",
ratio, getName(), String{min_ratio}, String{max_ratio});
}
}
DataTypes types(tuple_size);
for (size_t i = 0; i < tuple_size; i++)
{
types[i] = std::make_shared<DataTypeUInt64>();
}
return std::make_shared<DataTypeTuple>(types);
}
};
}

View File

@ -47,7 +47,6 @@ namespace ErrorCodes
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ILLEGAL_COLUMN;
extern const int TYPE_MISMATCH;
extern const int LOGICAL_ERROR;
}
@ -655,18 +654,6 @@ private:
result_column = if_func->build(if_args)->execute(if_args, result_type, rows);
}
#ifdef ABORT_ON_LOGICAL_ERROR
void validateShortCircuitResult(const ColumnPtr & column, const IColumn::Filter & filter) const
{
size_t expected_size = filter.size() - countBytesInFilter(filter);
size_t col_size = column->size();
if (col_size != expected_size)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Invalid size of getColumnsOrDefaultShortCircuit result. Column has {} rows, but filter contains {} bytes.",
col_size, expected_size);
}
#endif
ColumnPtr executeDictionaryRequest(
std::shared_ptr<const IDictionary> & dictionary,
@ -696,11 +683,6 @@ private:
IColumn::Filter default_mask;
result_columns = dictionary->getColumns(attribute_names, attribute_tuple_type.getElements(), key_columns, key_types, default_mask);
#ifdef ABORT_ON_LOGICAL_ERROR
for (const auto & column : result_columns)
validateShortCircuitResult(column, default_mask);
#endif
auto [defaults_column, mask_column] =
getDefaultsShortCircuit(std::move(default_mask), result_type, last_argument);
@ -736,10 +718,6 @@ private:
IColumn::Filter default_mask;
result = dictionary->getColumn(attribute_names[0], attribute_type, key_columns, key_types, default_mask);
#ifdef ABORT_ON_LOGICAL_ERROR
validateShortCircuitResult(result, default_mask);
#endif
auto [defaults_column, mask_column] =
getDefaultsShortCircuit(std::move(default_mask), result_type, last_argument);
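Editor's note: the ABORT_ON_LOGICAL_ERROR validation removed here checked that short-circuit results contained only the found rows; with the contract change above, result columns always match the input size, so that invariant no longer holds. A sketch of the row-wise merge this enables — illustrative only, assuming the mask semantics shown in the dictionary hunks:

#include <cassert>
#include <cstdint>
#include <vector>

// Illustration only: where the mask is 1, take the user-supplied default;
// otherwise keep the fetched value. All three vectors have equal size.
std::vector<int64_t> mergeWithDefaults(
    const std::vector<int64_t> & fetched,
    const std::vector<int64_t> & user_defaults,
    const std::vector<uint8_t> & default_mask)
{
    std::vector<int64_t> result(fetched.size());
    for (size_t i = 0; i < fetched.size(); ++i)
        result[i] = default_mask[i] ? user_defaults[i] : fetched[i];
    return result;
}

int main()
{
    assert((mergeWithDefaults({7, 0, 9}, {-1, -2, -3}, {0, 1, 0})
            == std::vector<int64_t>{7, -2, 9}));
}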

View File

@ -440,9 +440,6 @@ void NO_INLINE conditional(SourceA && src_a, SourceB && src_b, Sink && sink, con
const UInt8 * cond_pos = condition.data();
const UInt8 * cond_end = cond_pos + condition.size();
bool a_is_short = src_a.getColumnSize() < condition.size();
bool b_is_short = src_b.getColumnSize() < condition.size();
while (cond_pos < cond_end)
{
if (*cond_pos)
@ -450,10 +447,8 @@ void NO_INLINE conditional(SourceA && src_a, SourceB && src_b, Sink && sink, con
else
writeSlice(src_b.getWhole(), sink);
if (!a_is_short || *cond_pos)
src_a.next();
if (!b_is_short || !*cond_pos)
src_b.next();
src_a.next();
src_b.next();
++cond_pos;
sink.next();

View File

@ -110,7 +110,6 @@ void convertLowCardinalityColumnsToFull(ColumnsWithTypeAndName & args)
column.type = recursiveRemoveLowCardinality(column.type);
}
}
}
ColumnPtr IExecutableFunction::defaultImplementationForConstantArguments(
@ -277,6 +276,7 @@ ColumnPtr IExecutableFunction::executeWithoutSparseColumns(const ColumnsWithType
size_t new_input_rows_count = columns_without_low_cardinality.empty()
? input_rows_count
: columns_without_low_cardinality.front().column->size();
checkFunctionArgumentSizes(columns_without_low_cardinality, new_input_rows_count);
auto res = executeWithoutLowCardinalityColumns(columns_without_low_cardinality, dictionary_type, new_input_rows_count, dry_run);
bool res_is_constant = isColumnConst(*res);
@ -311,6 +311,8 @@ ColumnPtr IExecutableFunction::executeWithoutSparseColumns(const ColumnsWithType
ColumnPtr IExecutableFunction::execute(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, bool dry_run) const
{
checkFunctionArgumentSizes(arguments, input_rows_count);
bool use_default_implementation_for_sparse_columns = useDefaultImplementationForSparseColumns();
/// DataTypeFunction does not support obtaining default (isDefaultAt())
/// ColumnFunction does not support getting specific values.

View File

@ -3,11 +3,12 @@
#include <Core/ColumnNumbers.h>
#include <Core/ColumnsWithTypeAndName.h>
#include <Core/Field.h>
#include <Core/ValuesWithType.h>
#include <Core/Names.h>
#include <Core/IResolvedFunction.h>
#include <Common/Exception.h>
#include <Core/Names.h>
#include <Core/ValuesWithType.h>
#include <DataTypes/IDataType.h>
#include <Functions/FunctionHelpers.h>
#include <Common/Exception.h>
#include "config.h"
@ -133,8 +134,12 @@ public:
~IFunctionBase() override = default;
virtual ColumnPtr execute( /// NOLINT
const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, bool dry_run = false) const
const ColumnsWithTypeAndName & arguments,
const DataTypePtr & result_type,
size_t input_rows_count,
bool dry_run = false) const
{
checkFunctionArgumentSizes(arguments, input_rows_count);
return prepare(arguments)->execute(arguments, result_type, input_rows_count, dry_run);
}

View File

@ -18,11 +18,13 @@ protected:
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const final
{
checkFunctionArgumentSizes(arguments, input_rows_count);
return function->executeImpl(arguments, result_type, input_rows_count);
}
ColumnPtr executeDryRunImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const final
{
checkFunctionArgumentSizes(arguments, input_rows_count);
return function->executeImplDryRun(arguments, result_type, input_rows_count);
}

View File

@ -205,13 +205,13 @@ private:
return 4;
}
/// Cast content from integer to string, and append result string to buffer.
/// Make sure digits number in result string is no less than total_digits by padding leading '0'
/// Casts val from integer to string, then appends the result string to the buffer.
/// Makes sure the number of digits in the result string is no less than min_digits by padding with leading '0'.
/// Notice: '-' is not counted as a digit.
/// For example:
/// val = -123, total_digits = 2 => dest = "-123"
/// val = -123, total_digits = 3 => dest = "-123"
/// val = -123, total_digits = 4 => dest = "-0123"
/// val = -123, min_digits = 2 => dest = "-123"
/// val = -123, min_digits = 3 => dest = "-123"
/// val = -123, min_digits = 4 => dest = "-0123"
static size_t writeNumberWithPadding(char * dest, std::integral auto val, size_t min_digits)
{
using T = decltype(val);
@ -226,9 +226,10 @@ private:
++digits;
}
/// Possible sign
size_t pos = 0;
n = val;
/// Possible sign
if constexpr (is_signed_v<T>)
if (val < 0)
{
@ -245,16 +246,17 @@ private:
}
/// Digits
size_t digits_written = 0;
while (w >= 100)
{
w /= 100;
writeNumber2(dest + pos, n / w);
pos += 2;
digits_written += 2;
n = n % w;
}
if (n)
if (digits_written < digits)
{
dest[pos] = '0' + n;
++pos;

View File

@ -0,0 +1,124 @@
#include <Common/BitHelpers.h>
#include <Functions/FunctionFactory.h>
#include <Functions/PerformanceAdaptors.h>
#include "hilbertDecode2DLUT.h"
#include <limits>
namespace DB
{
class FunctionHilbertDecode : public FunctionSpaceFillingCurveDecode<2, 0, 32>
{
public:
static constexpr auto name = "hilbertDecode";
static FunctionPtr create(ContextPtr)
{
return std::make_shared<FunctionHilbertDecode>();
}
String getName() const override { return name; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
size_t num_dimensions;
const auto * col_const = typeid_cast<const ColumnConst *>(arguments[0].column.get());
const auto * mask = typeid_cast<const ColumnTuple *>(col_const->getDataColumnPtr().get());
if (mask)
num_dimensions = mask->tupleSize();
else
num_dimensions = col_const->getUInt(0);
const ColumnPtr & col_code = arguments[1].column;
Columns tuple_columns(num_dimensions);
const auto shrink = [mask](const UInt64 value, const UInt8 column_num)
{
if (mask)
return value >> mask->getColumn(column_num).getUInt(0);
return value;
};
auto col0 = ColumnUInt64::create();
auto & vec0 = col0->getData();
vec0.resize(input_rows_count);
if (num_dimensions == 1)
{
for (size_t i = 0; i < input_rows_count; i++)
{
vec0[i] = shrink(col_code->getUInt(i), 0);
}
tuple_columns[0] = std::move(col0);
return ColumnTuple::create(tuple_columns);
}
auto col1 = ColumnUInt64::create();
auto & vec1 = col1->getData();
vec1.resize(input_rows_count);
if (num_dimensions == 2)
{
for (size_t i = 0; i < input_rows_count; i++)
{
const auto res = FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(col_code->getUInt(i));
vec0[i] = shrink(std::get<0>(res), 0);
vec1[i] = shrink(std::get<1>(res), 1);
}
tuple_columns[0] = std::move(col0);
tuple_columns[1] = std::move(col1);
return ColumnTuple::create(tuple_columns);
}
return ColumnTuple::create(tuple_columns);
}
};
REGISTER_FUNCTION(HilbertDecode)
{
factory.registerFunction<FunctionHilbertDecode>(FunctionDocumentation{
.description=R"(
Decodes a Hilbert curve index back into a tuple of unsigned integers, representing coordinates in multi-dimensional space.
The function has two modes of operation:
- Simple
- Expanded
Simple Mode: Accepts the desired tuple size as the first argument (up to 2) and the Hilbert index as the second argument. This mode decodes the index into a tuple of the specified size.
[example:simple]
Will decode into: `(8, 0)`
The resulting tuple size cannot be more than 2
Expanded Mode: Takes a range mask (tuple) as the first argument and the Hilbert index as the second argument.
Each number in the mask specifies the number of bits by which the corresponding decoded argument will be right-shifted, effectively scaling down the output values.
[example:range_shrank]
Note: see the hilbertEncode() docs on why changing the ranges might be beneficial.
Still limited to 2 numbers at most.
Hilbert code for one argument is always the argument itself (as a tuple).
[example:identity]
Produces: `(1)`
A single argument with a tuple specifying bit shifts will be right-shifted accordingly.
[example:identity_shrank]
Produces: `(128)`
The function accepts a column of codes as a second argument:
[example:from_table]
The range tuple must be a constant:
[example:from_table_range]
)",
.examples{
{"simple", "SELECT hilbertDecode(2, 64)", ""},
{"range_shrank", "SELECT hilbertDecode((1,2), 1572864)", ""},
{"identity", "SELECT hilbertDecode(1, 1)", ""},
{"identity_shrank", "SELECT hilbertDecode(tuple(2), 512)", ""},
{"from_table", "SELECT hilbertDecode(2, code) FROM table", ""},
{"from_table_range", "SELECT hilbertDecode((1,2), code) FROM table", ""},
},
.categories {"Hilbert coding", "Hilbert Curve"}
});
}
}

View File

@ -0,0 +1,145 @@
#pragma once
#include <Functions/FunctionSpaceFillingCurve.h>
namespace DB
{
namespace HilbertDetails
{
template <UInt8 bit_step>
class HilbertDecodeLookupTable
{
public:
constexpr static UInt8 LOOKUP_TABLE[0] = {};
};
template <>
class HilbertDecodeLookupTable<1>
{
public:
constexpr static UInt8 LOOKUP_TABLE[16] = {
4, 1, 3, 10,
0, 6, 7, 13,
15, 9, 8, 2,
11, 14, 12, 5
};
};
template <>
class HilbertDecodeLookupTable<2>
{
public:
constexpr static UInt8 LOOKUP_TABLE[64] = {
0, 20, 21, 49, 18, 3, 7, 38,
26, 11, 15, 46, 61, 41, 40, 12,
16, 1, 5, 36, 8, 28, 29, 57,
10, 30, 31, 59, 39, 54, 50, 19,
47, 62, 58, 27, 55, 35, 34, 6,
53, 33, 32, 4, 24, 9, 13, 44,
63, 43, 42, 14, 45, 60, 56, 25,
37, 52, 48, 17, 2, 22, 23, 51
};
};
template <>
class HilbertDecodeLookupTable<3>
{
public:
constexpr static UInt8 LOOKUP_TABLE[256] = {
64, 1, 9, 136, 16, 88, 89, 209, 18, 90, 91, 211, 139, 202, 194, 67,
4, 76, 77, 197, 70, 7, 15, 142, 86, 23, 31, 158, 221, 149, 148, 28,
36, 108, 109, 229, 102, 39, 47, 174, 118, 55, 63, 190, 253, 181, 180, 60,
187, 250, 242, 115, 235, 163, 162, 42, 233, 161, 160, 40, 112, 49, 57, 184,
0, 72, 73, 193, 66, 3, 11, 138, 82, 19, 27, 154, 217, 145, 144, 24,
96, 33, 41, 168, 48, 120, 121, 241, 50, 122, 123, 243, 171, 234, 226, 99,
100, 37, 45, 172, 52, 124, 125, 245, 54, 126, 127, 247, 175, 238, 230, 103,
223, 151, 150, 30, 157, 220, 212, 85, 141, 204, 196, 69, 6, 78, 79, 199,
255, 183, 182, 62, 189, 252, 244, 117, 173, 236, 228, 101, 38, 110, 111, 231,
159, 222, 214, 87, 207, 135, 134, 14, 205, 133, 132, 12, 84, 21, 29, 156,
155, 218, 210, 83, 203, 131, 130, 10, 201, 129, 128, 8, 80, 17, 25, 152,
32, 104, 105, 225, 98, 35, 43, 170, 114, 51, 59, 186, 249, 177, 176, 56,
191, 254, 246, 119, 239, 167, 166, 46, 237, 165, 164, 44, 116, 53, 61, 188,
251, 179, 178, 58, 185, 248, 240, 113, 169, 232, 224, 97, 34, 106, 107, 227,
219, 147, 146, 26, 153, 216, 208, 81, 137, 200, 192, 65, 2, 74, 75, 195,
68, 5, 13, 140, 20, 92, 93, 213, 22, 94, 95, 215, 143, 206, 198, 71
};
};
}
template <UInt8 bit_step>
class FunctionHilbertDecode2DWIthLookupTableImpl
{
static_assert(bit_step <= 3, "bit_step should not be more than 3 to fit in UInt8");
public:
static std::tuple<UInt64, UInt64> decode(UInt64 hilbert_code)
{
UInt64 x = 0;
UInt64 y = 0;
const auto leading_zeros_count = getLeadingZeroBits(hilbert_code);
const auto used_bits = std::numeric_limits<UInt64>::digits - leading_zeros_count;
auto [current_shift, state] = getInitialShiftAndState(used_bits);
while (current_shift >= 0)
{
const UInt8 hilbert_bits = (hilbert_code >> current_shift) & HILBERT_MASK;
const auto [x_bits, y_bits] = getCodeAndUpdateState(hilbert_bits, state);
x |= (x_bits << (current_shift >> 1));
y |= (y_bits << (current_shift >> 1));
current_shift -= getHilbertShift(bit_step);
}
return {x, y};
}
private:
// for bit_step = 3
// LOOKUP_TABLE[SSHHHHHH] = SSXXXYYY
// where SS - 2 bits for state, XXX - 3 bits of x, YYY - 3 bits of y
// State is rotation of curve on every step, left/up/right/down - therefore 2 bits
static std::pair<UInt64, UInt64> getCodeAndUpdateState(UInt8 hilbert_bits, UInt8& state)
{
const UInt8 table_index = state | hilbert_bits;
const auto table_code = HilbertDetails::HilbertDecodeLookupTable<bit_step>::LOOKUP_TABLE[table_index];
state = table_code & STATE_MASK;
const UInt64 x_bits = (table_code & X_MASK) >> bit_step;
const UInt64 y_bits = table_code & Y_MASK;
return {x_bits, y_bits};
}
// the hilbert code is twice the size of the input values
static constexpr UInt8 getHilbertShift(UInt8 shift)
{
return shift << 1;
}
static std::pair<Int8, UInt8> getInitialShiftAndState(UInt8 used_bits)
{
UInt8 iterations = used_bits / HILBERT_SHIFT;
Int8 initial_shift = iterations * HILBERT_SHIFT;
if (initial_shift < used_bits)
{
++iterations;
}
else
{
initial_shift -= HILBERT_SHIFT;
}
UInt8 state = iterations % 2 == 0 ? LEFT_STATE : DEFAULT_STATE;
return {initial_shift, state};
}
constexpr static UInt8 STEP_MASK = (1 << bit_step) - 1;
constexpr static UInt8 HILBERT_SHIFT = getHilbertShift(bit_step);
constexpr static UInt8 HILBERT_MASK = (1 << HILBERT_SHIFT) - 1;
constexpr static UInt8 STATE_MASK = 0b11 << HILBERT_SHIFT;
constexpr static UInt8 Y_MASK = STEP_MASK;
constexpr static UInt8 X_MASK = STEP_MASK << bit_step;
constexpr static UInt8 LEFT_STATE = 0b01 << HILBERT_SHIFT;
constexpr static UInt8 DEFAULT_STATE = bit_step % 2 == 0 ? LEFT_STATE : 0;
};
}
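Editor's note: the shift/state setup above is the subtle part — the code is consumed in HILBERT_SHIFT-bit chunks from the most significant used bit down. A worked trace of getInitialShiftAndState() for bit_step = 3 (so HILBERT_SHIFT = 6), written as a self-contained check:

#include <cassert>

// A 7-bit code needs two 6-bit iterations: the first chunk is read at
// shift 6 and the last at shift 0. Since the iteration count is even,
// the walk starts in LEFT_STATE (see the code above).
int main()
{
    const int HILBERT_SHIFT = 6;
    int used_bits = 7;
    int iterations = used_bits / HILBERT_SHIFT;      // 1
    int initial_shift = iterations * HILBERT_SHIFT;  // 6
    if (initial_shift < used_bits)
        ++iterations;                                // 2
    else
        initial_shift -= HILBERT_SHIFT;
    assert(iterations == 2 && initial_shift == 6);
}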

View File

@ -0,0 +1,150 @@
#include "hilbertEncode2DLUT.h"
#include <Common/BitHelpers.h>
#include <Functions/PerformanceAdaptors.h>
#include <limits>
#include <optional>
#include <Functions/FunctionFactory.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
}
class FunctionHilbertEncode : public FunctionSpaceFillingCurveEncode
{
public:
static constexpr auto name = "hilbertEncode";
static FunctionPtr create(ContextPtr)
{
return std::make_shared<FunctionHilbertEncode>();
}
String getName() const override { return name; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
if (input_rows_count == 0)
return ColumnUInt64::create();
size_t num_dimensions = arguments.size();
size_t vector_start_index = 0;
const auto * const_col = typeid_cast<const ColumnConst *>(arguments[0].column.get());
const ColumnTuple * mask;
if (const_col)
mask = typeid_cast<const ColumnTuple *>(const_col->getDataColumnPtr().get());
else
mask = typeid_cast<const ColumnTuple *>(arguments[0].column.get());
if (mask)
{
num_dimensions = mask->tupleSize();
vector_start_index = 1;
for (size_t i = 0; i < num_dimensions; i++)
{
auto ratio = mask->getColumn(i).getUInt(0);
if (ratio > 32)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} of function {}, should be a number in range 0-32",
arguments[0].column->getName(), getName());
}
}
auto col_res = ColumnUInt64::create();
ColumnUInt64::Container & vec_res = col_res->getData();
vec_res.resize(input_rows_count);
const auto expand = [mask](const UInt64 value, const UInt8 column_num)
{
if (mask)
return value << mask->getColumn(column_num).getUInt(0);
return value;
};
const ColumnPtr & col0 = arguments[0 + vector_start_index].column;
if (num_dimensions == 1)
{
for (size_t i = 0; i < input_rows_count; ++i)
{
vec_res[i] = expand(col0->getUInt(i), 0);
}
return col_res;
}
const ColumnPtr & col1 = arguments[1 + vector_start_index].column;
if (num_dimensions == 2)
{
for (size_t i = 0; i < input_rows_count; ++i)
{
vec_res[i] = FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(
expand(col0->getUInt(i), 0),
expand(col1->getUInt(i), 1));
}
return col_res;
}
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal number of UInt arguments of function {}: should be not more than 2 dimensions",
getName());
}
};
REGISTER_FUNCTION(HilbertEncode)
{
factory.registerFunction<FunctionHilbertEncode>(FunctionDocumentation{
.description=R"(
Calculates code for Hilbert Curve for a list of unsigned integers.
The function has two modes of operation:
- Simple
- Expanded
Simple: accepts up to 2 unsigned integers as arguments and produces a UInt64 code.
[example:simple]
Produces: `31`
Expanded: accepts a range mask (tuple) as a first argument and up to 2 unsigned integers as other arguments.
Each number in the mask configures the number of bits by which the corresponding argument will be shifted left, effectively scaling the argument within its range.
[example:range_expanded]
Produces: `4031541586602`
Note: tuple size must be equal to the number of the other arguments
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality)
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF)
For a single argument without a tuple, the function returns the argument itself as the Hilbert index, since no dimensional mapping is needed.
[example:identity]
Produces: `1`
If a single argument is provided with a tuple specifying bit shifts, the function shifts the argument left by the specified number of bits.
[example:identity_expanded]
Produces: `512`
The function also accepts columns as arguments:
[example:from_table]
But the range tuple must still be a constant:
[example:from_table_range]
Please note that you can fit only as many bits of information into the Hilbert code as a UInt64 can hold.
Two arguments are therefore each limited to a range of at most 2^32 (64/2 bits).
Any overflow will be clamped to zero.
)",
.examples{
{"simple", "SELECT hilbertEncode(3, 4)", ""},
{"range_expanded", "SELECT hilbertEncode((10,6), 1024, 16)", ""},
{"identity", "SELECT hilbertEncode(1)", ""},
{"identity_expanded", "SELECT hilbertEncode(tuple(2), 128)", ""},
{"from_table", "SELECT hilbertEncode(n1, n2) FROM table", ""},
{"from_table_range", "SELECT hilbertEncode((1,2), n1, n2) FROM table", ""},
},
.categories {"Hilbert coding", "Hilbert Curve"}
});
}
}

View File

@ -0,0 +1,142 @@
#pragma once
#include <Functions/FunctionSpaceFillingCurve.h>
namespace DB
{
namespace HilbertDetails
{
template <UInt8 bit_step>
class HilbertEncodeLookupTable
{
public:
constexpr static UInt8 LOOKUP_TABLE[0] = {};
};
template <>
class HilbertEncodeLookupTable<1>
{
public:
constexpr static UInt8 LOOKUP_TABLE[16] = {
4, 1, 11, 2,
0, 15, 5, 6,
10, 9, 3, 12,
14, 7, 13, 8
};
};
template <>
class HilbertEncodeLookupTable<2>
{
public:
constexpr static UInt8 LOOKUP_TABLE[64] = {
0, 51, 20, 5, 17, 18, 39, 6,
46, 45, 24, 9, 15, 60, 43, 10,
16, 1, 62, 31, 35, 2, 61, 44,
4, 55, 8, 59, 21, 22, 25, 26,
42, 41, 38, 37, 11, 56, 7, 52,
28, 13, 50, 19, 47, 14, 49, 32,
58, 27, 12, 63, 57, 40, 29, 30,
54, 23, 34, 33, 53, 36, 3, 48
};
};
template <>
class HilbertEncodeLookupTable<3>
{
public:
constexpr static UInt8 LOOKUP_TABLE[256] = {
64, 1, 206, 79, 16, 211, 84, 21, 131, 2, 205, 140, 81, 82, 151, 22, 4,
199, 8, 203, 158, 157, 88, 25, 69, 70, 73, 74, 31, 220, 155, 26, 186,
185, 182, 181, 32, 227, 100, 37, 59, 248, 55, 244, 97, 98, 167, 38, 124,
61, 242, 115, 174, 173, 104, 41, 191, 62, 241, 176, 47, 236, 171, 42, 0,
195, 68, 5, 250, 123, 60, 255, 65, 66, 135, 6, 249, 184, 125, 126, 142,
141, 72, 9, 246, 119, 178, 177, 15, 204, 139, 10, 245, 180, 51, 240, 80,
17, 222, 95, 96, 33, 238, 111, 147, 18, 221, 156, 163, 34, 237, 172, 20,
215, 24, 219, 36, 231, 40, 235, 85, 86, 89, 90, 101, 102, 105, 106, 170,
169, 166, 165, 154, 153, 150, 149, 43, 232, 39, 228, 27, 216, 23, 212, 108,
45, 226, 99, 92, 29, 210, 83, 175, 46, 225, 160, 159, 30, 209, 144, 48,
243, 116, 53, 202, 75, 12, 207, 113, 114, 183, 54, 201, 136, 77, 78, 190,
189, 120, 57, 198, 71, 130, 129, 63, 252, 187, 58, 197, 132, 3, 192, 234,
107, 44, 239, 112, 49, 254, 127, 233, 168, 109, 110, 179, 50, 253, 188, 230,
103, 162, 161, 52, 247, 56, 251, 229, 164, 35, 224, 117, 118, 121, 122, 218,
91, 28, 223, 138, 137, 134, 133, 217, 152, 93, 94, 11, 200, 7, 196, 214,
87, 146, 145, 76, 13, 194, 67, 213, 148, 19, 208, 143, 14, 193, 128,
};
};
}
template <UInt8 bit_step>
class FunctionHilbertEncode2DWIthLookupTableImpl
{
static_assert(bit_step <= 3, "bit_step should not be more than 3 to fit in UInt8");
public:
static UInt64 encode(UInt64 x, UInt64 y)
{
UInt64 hilbert_code = 0;
const auto leading_zeros_count = getLeadingZeroBits(x | y);
const auto used_bits = std::numeric_limits<UInt64>::digits - leading_zeros_count;
if (used_bits > 32)
return 0; // the hilbert code would overflow in this case
auto [current_shift, state] = getInitialShiftAndState(used_bits);
while (current_shift >= 0)
{
const UInt8 x_bits = (x >> current_shift) & STEP_MASK;
const UInt8 y_bits = (y >> current_shift) & STEP_MASK;
const auto hilbert_bits = getCodeAndUpdateState(x_bits, y_bits, state);
hilbert_code |= (hilbert_bits << getHilbertShift(current_shift));
current_shift -= bit_step;
}
return hilbert_code;
}
private:
// for bit_step = 3
// LOOKUP_TABLE[SSXXXYYY] = SSHHHHHH
// where SS - 2 bits for state, XXX - 3 bits of x, YYY - 3 bits of y
// State is rotation of curve on every step, left/up/right/down - therefore 2 bits
static UInt64 getCodeAndUpdateState(UInt8 x_bits, UInt8 y_bits, UInt8& state)
{
const UInt8 table_index = state | (x_bits << bit_step) | y_bits;
const auto table_code = HilbertDetails::HilbertEncodeLookupTable<bit_step>::LOOKUP_TABLE[table_index];
state = table_code & STATE_MASK;
return table_code & HILBERT_MASK;
}
// the hilbert code is twice the size of the input values
static constexpr UInt8 getHilbertShift(UInt8 shift)
{
return shift << 1;
}
static std::pair<Int8, UInt8> getInitialShiftAndState(UInt8 used_bits)
{
UInt8 iterations = used_bits / bit_step;
Int8 initial_shift = iterations * bit_step;
if (initial_shift < used_bits)
{
++iterations;
}
else
{
initial_shift -= bit_step;
}
UInt8 state = iterations % 2 == 0 ? LEFT_STATE : DEFAULT_STATE;
return {initial_shift, state};
}
constexpr static UInt8 STEP_MASK = (1 << bit_step) - 1;
constexpr static UInt8 HILBERT_SHIFT = getHilbertShift(bit_step);
constexpr static UInt8 HILBERT_MASK = (1 << HILBERT_SHIFT) - 1;
constexpr static UInt8 STATE_MASK = 0b11 << HILBERT_SHIFT;
constexpr static UInt8 LEFT_STATE = 0b01 << HILBERT_SHIFT;
constexpr static UInt8 DEFAULT_STATE = bit_step % 2 == 0 ? LEFT_STATE : 0;
};
}
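Editor's note: taking the documented SQL examples at face value (hilbertEncode(3, 4) → 31 and hilbertDecode(2, 64) → (8, 0)), the two lookup-table implementations can be exercised directly. A minimal round-trip check, mirroring the unit tests at the end of this diff:

#include <cassert>
#include <tuple>
#include "Functions/hilbertEncode2DLUT.h"
#include "Functions/hilbertDecode2DLUT.h"

int main()
{
    // values taken from the FunctionDocumentation examples above
    assert(DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(3, 4) == 31);
    assert(DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(64) == std::make_tuple(UInt64(8), UInt64(0)));

    // encode and decode should be mutual inverses on the 2D domain
    auto [x, y] = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(
        DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(123, 456));
    assert(x == 123 && y == 456);
}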

View File

@ -77,75 +77,17 @@ inline void fillVectorVector(const ArrayCond & cond, const ArrayA & a, const Arr
{
size_t size = cond.size();
bool a_is_short = a.size() < size;
bool b_is_short = b.size() < size;
if (a_is_short && b_is_short)
for (size_t i = 0; i < size; ++i)
{
size_t a_index = 0, b_index = 0;
for (size_t i = 0; i < size; ++i)
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[a_index]) + (!cond[i]) * static_cast<ResultType>(b[b_index]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[a_index], b[b_index], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[a_index]) : static_cast<ResultType>(b[b_index]);
a_index += !!cond[i];
b_index += !cond[i];
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b[i], res[i])
}
}
else if (a_is_short)
{
size_t a_index = 0;
for (size_t i = 0; i < size; ++i)
else
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[a_index]) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[a_index], b[i], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[a_index]) : static_cast<ResultType>(b[i]);
a_index += !!cond[i];
}
}
else if (b_is_short)
{
size_t b_index = 0;
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b[b_index]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b[b_index], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b[b_index]);
b_index += !cond[i];
}
}
else
{
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b[i], res[i])
}
else
{
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b[i]);
}
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b[i]);
}
}
}
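Editor's note: the `!!cond[i] * a + (!cond[i]) * b` form kept in the simplified loops is a branch-free select — both operands are always evaluated and a 0/1 condition weights them, which keeps the loop free of data-dependent branches and amenable to auto-vectorization. A tiny self-contained illustration:

#include <cassert>
#include <cstdint>

int64_t selectBranchFree(uint8_t cond, int64_t a, int64_t b)
{
    return !!cond * a + !cond * b; // cond != 0 picks a, cond == 0 picks b
}

int main()
{
    assert(selectBranchFree(1, 10, 20) == 10);
    assert(selectBranchFree(0, 10, 20) == 20);
    assert(selectBranchFree(7, 10, 20) == 10); // any non-zero condition picks a
}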
@ -154,37 +96,16 @@ template <typename ArrayCond, typename ArrayA, typename B, typename ArrayResult,
inline void fillVectorConstant(const ArrayCond & cond, const ArrayA & a, B b, ArrayResult & res)
{
size_t size = cond.size();
bool a_is_short = a.size() < size;
if (a_is_short)
for (size_t i = 0; i < size; ++i)
{
size_t a_index = 0;
for (size_t i = 0; i < size; ++i)
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b);
else if constexpr (std::is_floating_point_v<ResultType>)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[a_index]) + (!cond[i]) * static_cast<ResultType>(b);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[a_index], b, res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[a_index]) : static_cast<ResultType>(b);
a_index += !!cond[i];
}
}
else
{
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a[i]) + (!cond[i]) * static_cast<ResultType>(b);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b, res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b);
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a[i], b, res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a[i]) : static_cast<ResultType>(b);
}
}
@ -192,37 +113,16 @@ template <typename ArrayCond, typename A, typename ArrayB, typename ArrayResult,
inline void fillConstantVector(const ArrayCond & cond, A a, const ArrayB & b, ArrayResult & res)
{
size_t size = cond.size();
bool b_is_short = b.size() < size;
if (b_is_short)
for (size_t i = 0; i < size; ++i)
{
size_t b_index = 0;
for (size_t i = 0; i < size; ++i)
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a) + (!cond[i]) * static_cast<ResultType>(b[b_index]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a, b[b_index], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a) : static_cast<ResultType>(b[b_index]);
b_index += !cond[i];
}
}
else
{
for (size_t i = 0; i < size; ++i)
{
if constexpr (is_native_int_or_decimal_v<ResultType>)
res[i] = !!cond[i] * static_cast<ResultType>(a) + (!cond[i]) * static_cast<ResultType>(b[i]);
else if constexpr (std::is_floating_point_v<ResultType>)
{
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a, b[i], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a) : static_cast<ResultType>(b[i]);
BRANCHFREE_IF_FLOAT(ResultType, cond[i], a, b[i], res[i])
}
else
res[i] = cond[i] ? static_cast<ResultType>(a) : static_cast<ResultType>(b[i]);
}
}
@ -880,9 +780,6 @@ private:
bool then_is_const = isColumnConst(*col_then);
bool else_is_const = isColumnConst(*col_else);
bool then_is_short = col_then->size() < cond_col->size();
bool else_is_short = col_else->size() < cond_col->size();
const auto & cond_array = cond_col->getData();
if (then_is_const && else_is_const)
@ -902,37 +799,34 @@ private:
{
const IColumn & then_nested_column = assert_cast<const ColumnConst &>(*col_then).getDataColumn();
size_t else_index = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
if (cond_array[i])
result_column->insertFrom(then_nested_column, 0);
else
result_column->insertFrom(*col_else, else_is_short ? else_index++ : i);
result_column->insertFrom(*col_else, i);
}
}
else if (else_is_const)
{
const IColumn & else_nested_column = assert_cast<const ColumnConst &>(*col_else).getDataColumn();
size_t then_index = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
if (cond_array[i])
result_column->insertFrom(*col_then, then_is_short ? then_index++ : i);
result_column->insertFrom(*col_then, i);
else
result_column->insertFrom(else_nested_column, 0);
}
}
else
{
size_t then_index = 0, else_index = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
if (cond_array[i])
result_column->insertFrom(*col_then, then_is_short ? then_index++ : i);
result_column->insertFrom(*col_then, i);
else
result_column->insertFrom(*col_else, else_is_short ? else_index++ : i);
result_column->insertFrom(*col_else, i);
}
}
@ -1125,9 +1019,6 @@ private:
if (then_is_null && else_is_null)
return result_type->createColumnConstWithDefaultValue(input_rows_count);
bool then_is_short = arg_then.column->size() < arg_cond.column->size();
bool else_is_short = arg_else.column->size() < arg_cond.column->size();
const ColumnUInt8 * cond_col = typeid_cast<const ColumnUInt8 *>(arg_cond.column.get());
const ColumnConst * cond_const_col = checkAndGetColumnConst<ColumnVector<UInt8>>(arg_cond.column.get());
@ -1146,8 +1037,6 @@ private:
{
arg_else_column = arg_else_column->convertToFullColumnIfConst();
auto result_column = IColumn::mutate(std::move(arg_else_column));
if (else_is_short)
result_column->expand(cond_col->getData(), true);
if (isColumnNullable(*result_column))
{
assert_cast<ColumnNullable &>(*result_column).applyNullMap(assert_cast<const ColumnUInt8 &>(*arg_cond.column));
@ -1193,8 +1082,6 @@ private:
{
arg_then_column = arg_then_column->convertToFullColumnIfConst();
auto result_column = IColumn::mutate(std::move(arg_then_column));
if (then_is_short)
result_column->expand(cond_col->getData(), false);
if (isColumnNullable(*result_column))
{

View File

@ -1,10 +1,11 @@
#include <Functions/IFunction.h>
#include <Functions/FunctionFactory.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Functions/FunctionHelpers.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/FunctionSpaceFillingCurve.h>
#include <Functions/IFunction.h>
#include <Functions/PerformanceAdaptors.h>
#include <morton-nd/mortonND_LUT.h>
@ -15,13 +16,6 @@
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
extern const int ARGUMENT_OUT_OF_BOUND;
}
// NOLINTBEGIN(bugprone-switch-missing-default-case)
#define EXTRACT_VECTOR(INDEX) \
@ -186,7 +180,7 @@ constexpr auto MortonND_5D_Dec = mortonnd::MortonNDLutDecoder<5, 12, 8>();
constexpr auto MortonND_6D_Dec = mortonnd::MortonNDLutDecoder<6, 10, 8>();
constexpr auto MortonND_7D_Dec = mortonnd::MortonNDLutDecoder<7, 9, 8>();
constexpr auto MortonND_8D_Dec = mortonnd::MortonNDLutDecoder<8, 8, 8>();
class FunctionMortonDecode : public IFunction
class FunctionMortonDecode : public FunctionSpaceFillingCurveDecode<8, 1, 8>
{
public:
static constexpr auto name = "mortonDecode";
@ -200,68 +194,6 @@ public:
return name;
}
size_t getNumberOfArguments() const override
{
return 2;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
UInt64 tuple_size = 0;
const auto * col_const = typeid_cast<const ColumnConst *>(arguments[0].column.get());
if (!col_const)
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} of function {}, should be a constant (UInt or Tuple)",
arguments[0].type->getName(), getName());
if (!WhichDataType(arguments[1].type).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} of function {}, should be a native UInt",
arguments[1].type->getName(), getName());
const auto * mask = typeid_cast<const ColumnTuple *>(col_const->getDataColumnPtr().get());
if (mask)
{
tuple_size = mask->tupleSize();
}
else if (WhichDataType(arguments[0].type).isNativeUInt())
{
tuple_size = col_const->getUInt(0);
}
else
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column type {} of function {}, should be UInt or Tuple",
arguments[0].type->getName(), getName());
if (tuple_size > 8 || tuple_size < 1)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal first argument for function {}, should be a number in range 1-8 or a Tuple of such size",
getName());
if (mask)
{
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].type.get());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
auto ratio = mask->getColumn(i).getUInt(0);
if (ratio > 8 || ratio < 1)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} in tuple for function {}, should be a number in range 1-8",
ratio, getName());
}
}
DataTypes types(tuple_size);
for (size_t i = 0; i < tuple_size; i++)
{
types[i] = std::make_shared<DataTypeUInt64>();
}
return std::make_shared<DataTypeTuple>(types);
}
static UInt64 shrink(UInt64 ratio, UInt64 value)
{
switch (ratio) // NOLINT(bugprone-switch-missing-default-case)

View File

@ -1,10 +1,9 @@
#include <Functions/IFunction.h>
#include <Functions/FunctionFactory.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnTuple.h>
#include <Functions/FunctionSpaceFillingCurve.h>
#include <Functions/PerformanceAdaptors.h>
#include <morton-nd/mortonND_LUT.h>
@ -19,7 +18,6 @@ namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
}
#define EXTRACT_VECTOR(INDEX) \
@ -144,7 +142,7 @@ constexpr auto MortonND_5D_Enc = mortonnd::MortonNDLutEncoder<5, 12, 8>();
constexpr auto MortonND_6D_Enc = mortonnd::MortonNDLutEncoder<6, 10, 8>();
constexpr auto MortonND_7D_Enc = mortonnd::MortonNDLutEncoder<7, 9, 8>();
constexpr auto MortonND_8D_Enc = mortonnd::MortonNDLutEncoder<8, 8, 8>();
class FunctionMortonEncode : public IFunction
class FunctionMortonEncode : public FunctionSpaceFillingCurveEncode
{
public:
static constexpr auto name = "mortonEncode";
@ -158,56 +156,6 @@ public:
return name;
}
bool isVariadic() const override
{
return true;
}
size_t getNumberOfArguments() const override
{
return 0;
}
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DB::DataTypes & arguments) const override
{
size_t vectorStartIndex = 0;
if (arguments.empty())
throw Exception(ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION,
"At least one UInt argument is required for function {}",
getName());
if (WhichDataType(arguments[0]).isTuple())
{
vectorStartIndex = 1;
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(arguments[0].get());
auto tuple_size = type_tuple->getElements().size();
if (tuple_size != (arguments.size() - 1))
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"Illegal argument {} for function {}, tuple size should be equal to number of UInt arguments",
arguments[0]->getName(), getName());
for (size_t i = 0; i < tuple_size; i++)
{
if (!WhichDataType(type_tuple->getElement(i)).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument in tuple for function {}, should be a native UInt",
type_tuple->getElement(i)->getName(), getName());
}
}
for (size_t i = vectorStartIndex; i < arguments.size(); i++)
{
const auto & arg = arguments[i];
if (!WhichDataType(arg).isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument of function {}, should be a native UInt",
arg->getName(), getName());
}
return std::make_shared<DataTypeUInt64>();
}
static UInt64 expand(UInt64 ratio, UInt64 value)
{
switch (ratio) // NOLINT(bugprone-switch-missing-default-case)

View File

@ -148,11 +148,6 @@ public:
bool condition_always_true = false;
bool condition_is_nullable = false;
bool source_is_constant = false;
bool condition_is_short = false;
bool source_is_short = false;
size_t condition_index = 0;
size_t source_index = 0;
};
ColumnPtr executeImpl(const ColumnsWithTypeAndName & args, const DataTypePtr & result_type, size_t input_rows_count) const override
@ -214,12 +209,9 @@ public:
instruction.condition = cond_col;
instruction.condition_is_nullable = instruction.condition->isNullable();
}
instruction.condition_is_short = cond_col->size() < arguments[0].column->size();
}
const ColumnWithTypeAndName & source_col = arguments[source_idx];
instruction.source_is_short = source_col.column->size() < arguments[0].column->size();
if (source_col.type->equals(*return_type))
{
instruction.source = source_col.column;
@ -250,19 +242,8 @@ public:
return ColumnConst::create(std::move(res), instruction.source->size());
}
bool contains_short = false;
for (const auto & instruction : instructions)
{
if (instruction.condition_is_short || instruction.source_is_short)
{
contains_short = true;
break;
}
}
const WhichDataType which(removeNullable(result_type));
bool execute_multiif_columnar = allow_execute_multiif_columnar && !contains_short
&& instructions.size() <= std::numeric_limits<UInt8>::max()
bool execute_multiif_columnar = allow_execute_multiif_columnar && instructions.size() <= std::numeric_limits<UInt8>::max()
&& (which.isInt() || which.isUInt() || which.isFloat() || which.isDecimal() || which.isDateOrDate32OrDateTimeOrDateTime64()
|| which.isEnum() || which.isIPv4() || which.isIPv6());
@ -339,25 +320,23 @@ private:
{
bool insert = false;
size_t condition_index = instruction.condition_is_short ? instruction.condition_index++ : i;
if (instruction.condition_always_true)
insert = true;
else if (!instruction.condition_is_nullable)
insert = assert_cast<const ColumnUInt8 &>(*instruction.condition).getData()[condition_index];
insert = assert_cast<const ColumnUInt8 &>(*instruction.condition).getData()[i];
else
{
const ColumnNullable & condition_nullable = assert_cast<const ColumnNullable &>(*instruction.condition);
const ColumnUInt8 & condition_nested = assert_cast<const ColumnUInt8 &>(condition_nullable.getNestedColumn());
const NullMap & condition_null_map = condition_nullable.getNullMapData();
insert = !condition_null_map[condition_index] && condition_nested.getData()[condition_index];
insert = !condition_null_map[i] && condition_nested.getData()[i];
}
if (insert)
{
size_t source_index = instruction.source_is_short ? instruction.source_index++ : i;
if (!instruction.source_is_constant)
res->insertFrom(*instruction.source, source_index);
res->insertFrom(*instruction.source, i);
else
res->insertFrom(assert_cast<const ColumnConst &>(*instruction.source).getDataColumn(), 0);
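Condensed sketch of the per-row loop shape after this change, with the nullable-condition and constant-source branches elided and hypothetical stand-in types: every condition and source column is addressed by the same row index i, with no separate cursors for short columns.

#include <cstddef>
#include <cstdint>
#include <vector>

struct InstructionSketch
{
    bool condition_always_true = false;
    const std::vector<uint8_t> * condition = nullptr; // non-nullable UInt8 column
    const std::vector<int64_t> * source = nullptr;
};

std::vector<int64_t> executeRows(const std::vector<InstructionSketch> & instructions, size_t rows)
{
    std::vector<int64_t> res;
    res.reserve(rows);
    for (size_t i = 0; i < rows; ++i)
    {
        for (const auto & instruction : instructions)
        {
            if (instruction.condition_always_true || (*instruction.condition)[i])
            {
                res.push_back((*instruction.source)[i]);
                break;
            }
        }
    }
    return res;
}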


@ -0,0 +1,81 @@
#include <gtest/gtest.h>
#include "Functions/hilbertDecode2DLUT.h"
#include "Functions/hilbertEncode2DLUT.h"
#include "base/types.h"
TEST(HilbertLookupTable, EncodeBit1And3Consistency)
{
const size_t bound = 1000;
for (size_t x = 0; x < bound; ++x)
{
for (size_t y = 0; y < bound; ++y)
{
auto hilbert1bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<1>::encode(x, y);
auto hilbert3bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
ASSERT_EQ(hilbert1bit, hilbert3bit);
}
}
}
TEST(HilbertLookupTable, EncodeBit2And3Consistency)
{
const size_t bound = 1000;
for (size_t x = 0; x < bound; ++x)
{
for (size_t y = 0; y < bound; ++y)
{
auto hilbert2bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<2>::encode(x, y);
auto hilbert3bit = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
ASSERT_EQ(hilbert3bit, hilbert2bit);
}
}
}
TEST(HilbertLookupTable, DecodeBit1And3Consistency)
{
const size_t bound = 1000 * 1000;
for (size_t hilbert_code = 0; hilbert_code < bound; ++hilbert_code)
{
auto res1 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<1>::decode(hilbert_code);
auto res3 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
ASSERT_EQ(res1, res3);
}
}
TEST(HilbertLookupTable, DecodeBit2And3Consistency)
{
const size_t bound = 1000 * 1000;
for (size_t hilbert_code = 0; hilbert_code < bound; ++hilbert_code)
{
auto res2 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<2>::decode(hilbert_code);
auto res3 = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
ASSERT_EQ(res2, res3);
}
}
TEST(HilbertLookupTable, DecodeAndEncodeAreInverseOperations)
{
const size_t bound = 1000;
for (size_t x = 0; x < bound; ++x)
{
for (size_t y = 0; y < bound; ++y)
{
auto hilbert_code = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
auto [x_new, y_new] = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
ASSERT_EQ(x_new, x);
ASSERT_EQ(y_new, y);
}
}
}
TEST(HilbertLookupTable, EncodeAndDecodeAreInverseOperations)
{
const size_t bound = 1000 * 1000;
for (size_t hilbert_code = 0; hilbert_code < bound; ++hilbert_code)
{
auto [x, y] = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(hilbert_code);
auto hilbert_new = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(x, y);
ASSERT_EQ(hilbert_new, hilbert_code);
}
}
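A round-trip usage sketch distilled from the tests above (same LUT specialization, arbitrary example coordinates):

#include "Functions/hilbertDecode2DLUT.h"
#include "Functions/hilbertEncode2DLUT.h"

void hilbertRoundTripExample()
{
    auto code = DB::FunctionHilbertEncode2DWIthLookupTableImpl<3>::encode(5, 10);
    auto [x, y] = DB::FunctionHilbertDecode2DWIthLookupTableImpl<3>::decode(code);
    // Per the inverse-operation tests above: x == 5 and y == 10.
}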


@ -1400,7 +1400,7 @@ public:
divide_result.type, input_rows_count);
auto minus_elem = minus->build({one, divide_result});
return minus_elem->execute({one, divide_result}, minus_elem->getResultType(), {});
return minus_elem->execute({one, divide_result}, minus_elem->getResultType(), input_rows_count);
}
};


@ -149,16 +149,18 @@ namespace
dest_bucket, dest_key, /* local_path_ */ {}, /* data_size */ 0,
outcome.IsSuccess() ? nullptr : &outcome.GetError());
if (outcome.IsSuccess())
{
multipart_upload_id = outcome.GetResult().GetUploadId();
LOG_TRACE(log, "Multipart upload has created. Bucket: {}, Key: {}, Upload id: {}", dest_bucket, dest_key, multipart_upload_id);
}
else
if (!outcome.IsSuccess())
{
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3RequestsErrors, 1);
throw S3Exception(outcome.GetError().GetMessage(), outcome.GetError().GetErrorType());
}
multipart_upload_id = outcome.GetResult().GetUploadId();
if (multipart_upload_id.empty())
{
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3RequestsErrors, 1);
throw Exception(ErrorCodes::S3_ERROR, "Invalid CreateMultipartUpload result: missing UploadId.");
}
LOG_TRACE(log, "Multipart upload was created. Bucket: {}, Key: {}, Upload id: {}", dest_bucket, dest_key, multipart_upload_id);
}
void completeMultipartUpload()


@ -413,7 +413,13 @@ void WriteBufferFromS3::createMultipartUpload()
multipart_upload_id = outcome.GetResult().GetUploadId();
LOG_TRACE(limitedLog, "Multipart upload has created. {}", getShortLogDetails());
if (multipart_upload_id.empty())
{
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3RequestsErrors, 1);
throw Exception(ErrorCodes::S3_ERROR, "Invalid CreateMultipartUpload result: missing UploadId.");
}
LOG_TRACE(limitedLog, "Multipart upload was created. {}", getShortLogDetails());
}
void WriteBufferFromS3::abortMultipartUpload()


@ -1621,7 +1621,7 @@ void ActionsDAG::mergeInplace(ActionsDAG && second)
first.projected_output = second.projected_output;
}
void ActionsDAG::mergeNodes(ActionsDAG && second)
void ActionsDAG::mergeNodes(ActionsDAG && second, NodeRawConstPtrs * out_outputs)
{
std::unordered_map<std::string, const ActionsDAG::Node *> node_name_to_node;
for (auto & node : nodes)
@ -1677,6 +1677,12 @@ void ActionsDAG::mergeNodes(ActionsDAG && second)
nodes_to_process.pop_back();
}
if (out_outputs)
{
for (auto & node : second.getOutputs())
out_outputs->push_back(node_name_to_node.at(node->result_name));
}
if (nodes_to_move_from_second_dag.empty())
return;
@ -2888,6 +2894,7 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG(
FunctionOverloadResolverPtr function_overload_resolver;
String result_name;
if (node->function_base->getName() == "indexHint")
{
ActionsDAG::NodeRawConstPtrs children;
@ -2908,6 +2915,11 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG(
auto index_hint_function_clone = std::make_shared<FunctionIndexHint>();
index_hint_function_clone->setActions(std::move(index_hint_filter_dag));
function_overload_resolver = std::make_shared<FunctionToOverloadResolverAdaptor>(std::move(index_hint_function_clone));
/// Keep the unique name like "indexHint(foo)" instead of replacing it
/// with "indexHint()". Otherwise index analysis (which does look at the
/// indexHint arguments that we're hiding here) would get confused by
/// multiple substantially different nodes with the same result name.
result_name = node->result_name;
}
}
}
@ -2922,7 +2934,7 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG(
function_base,
std::move(function_children),
std::move(arguments),
{},
result_name,
node->result_type,
all_const);
break;


@ -324,8 +324,9 @@ public:
/// So that pointers to nodes are kept valid.
void mergeInplace(ActionsDAG && second);
/// Merge current nodes with specified dag nodes
void mergeNodes(ActionsDAG && second);
/// Merge current nodes with specified dag nodes.
/// *out_outputs is filled with pointers to the nodes corresponding to second.getOutputs().
void mergeNodes(ActionsDAG && second, NodeRawConstPtrs * out_outputs = nullptr);
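A hypothetical caller sketch for the new out-parameter: after the merge, it maps second's outputs to the surviving nodes in the combined DAG.

ActionsDAG::NodeRawConstPtrs mergeAndTrackOutputs(ActionsDAG & first, ActionsDAG && second)
{
    ActionsDAG::NodeRawConstPtrs merged_outputs;
    first.mergeNodes(std::move(second), &merged_outputs);
    // merged_outputs[i] now points at first's node for second.getOutputs()[i].
    return merged_outputs;
}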
struct SplitResult
{


@ -1323,7 +1323,9 @@ void ActionsMatcher::visit(const ASTLiteral & literal, const ASTPtr & /* ast */,
Data & data)
{
DataTypePtr type;
if (data.getContext()->getSettingsRef().allow_experimental_variant_type && data.getContext()->getSettingsRef().use_variant_as_common_type)
if (literal.custom_type)
type = literal.custom_type;
else if (data.getContext()->getSettingsRef().allow_experimental_variant_type && data.getContext()->getSettingsRef().use_variant_as_common_type)
type = applyVisitor(FieldToDataType<LeastSupertypeOnError::Variant>(), literal.value);
else
type = applyVisitor(FieldToDataType(), literal.value);


@ -657,7 +657,7 @@ InterpreterSelectQuery::InterpreterSelectQuery(
MergeTreeWhereOptimizer where_optimizer{
std::move(column_compressed_sizes),
metadata_snapshot,
storage->getConditionEstimatorByPredicate(query_info, storage_snapshot, context),
storage->getConditionEstimatorByPredicate(storage_snapshot, nullptr, context),
queried_columns,
supported_prewhere_columns,
log};


@ -653,7 +653,7 @@ BoolMask MergeTreeSetIndex::checkInRange(const std::vector<Range> & key_ranges,
/// Given left_lower >= left_point, right_lower >= right_point, find if there may be a match in between left_lower and right_lower.
if (left_lower + 1 < right_lower)
{
/// There is an point in between: left_lower + 1
/// There is a point in between: left_lower + 1
return {true, true};
}
else if (left_lower + 1 == right_lower)


@ -0,0 +1,73 @@
#include <iostream>
#include <memory>
#include <DataTypes/DataTypeDate32.h>
#include <DataTypes/DataTypesNumber.h>
#include <Parsers/ASTLiteral.h>
#include <Interpreters/ActionsDAG.h>
#include <Interpreters/ActionsVisitor.h>
#include <Common/tests/gtest_global_context.h>
#include <gtest/gtest.h>
using namespace DB;
TEST(ActionsVisitor, VisitLiteral)
{
DataTypePtr date_type = std::make_shared<DataTypeDate32>();
DataTypePtr expect_type = std::make_shared<DataTypeInt16>();
const NamesAndTypesList name_and_types =
{
{"year", date_type}
};
const auto ast = std::make_shared<ASTLiteral>(19870);
auto context = Context::createCopy(getContext().context);
NamesAndTypesList aggregation_keys;
ColumnNumbersList aggregation_keys_indexes_list;
AggregationKeysInfo info(aggregation_keys, aggregation_keys_indexes_list, GroupByKind::NONE);
SizeLimits size_limits_for_set;
ActionsMatcher::Data visitor_data(
context,
size_limits_for_set,
size_t(0),
name_and_types,
std::make_shared<ActionsDAG>(name_and_types),
std::make_shared<PreparedSets>(),
false /* no_subqueries */,
false /* no_makeset */,
false /* only_consts */,
info);
ActionsVisitor(visitor_data).visit(ast);
auto actions = visitor_data.getActions();
ASSERT_EQ(actions->getResultColumns().back().type->getTypeId(), expect_type->getTypeId());
}
TEST(ActionsVisitor, VisitLiteralWithType)
{
DataTypePtr date_type = std::make_shared<DataTypeDate32>();
const NamesAndTypesList name_and_types =
{
{"year", date_type}
};
const auto ast = std::make_shared<ASTLiteral>(19870, date_type);
auto context = Context::createCopy(getContext().context);
NamesAndTypesList aggregation_keys;
ColumnNumbersList aggregation_keys_indexes_list;
AggregationKeysInfo info(aggregation_keys, aggregation_keys_indexes_list, GroupByKind::NONE);
SizeLimits size_limits_for_set;
ActionsMatcher::Data visitor_data(
context,
size_limits_for_set,
size_t(0),
name_and_types,
std::make_shared<ActionsDAG>(name_and_types),
std::make_shared<PreparedSets>(),
false /* no_subqueries */,
false /* no_makeset */,
false /* only_consts */,
info);
ActionsVisitor(visitor_data).visit(ast);
auto actions = visitor_data.getActions();
ASSERT_EQ(actions->getResultColumns().back().type->getTypeId(), date_type->getTypeId());
}


@ -4,6 +4,7 @@
#include <Parsers/ASTWithAlias.h>
#include <Parsers/TokenIterator.h>
#include <Common/FieldVisitorDump.h>
#include <DataTypes/IDataType.h>
#include <optional>
@ -17,7 +18,14 @@ class ASTLiteral : public ASTWithAlias
public:
explicit ASTLiteral(Field value_) : value(std::move(value_)) {}
// This method and the custom_type are only used for Apache Gluten.
explicit ASTLiteral(Field value_, DataTypePtr & type_) : value(std::move(value_))
{
custom_type = type_;
}
Field value;
DataTypePtr custom_type;
/// For ConstantExpressionTemplate
std::optional<TokenIterator> begin;
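Construction sketch mirroring the VisitLiteralWithType unit test above: passing an explicit type pins the literal's type instead of letting it be derived from the Field value.

#include <DataTypes/DataTypeDate32.h>

// custom_type becomes Date32, so ActionsMatcher uses it directly.
DataTypePtr date_type = std::make_shared<DataTypeDate32>();
auto ast = std::make_shared<ASTLiteral>(19870, date_type);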


@ -60,6 +60,7 @@ String calculateActionNodeNameWithCastIfNeeded(const ConstantNode & constant_nod
if (constant_node.requiresCastCall())
{
/// Projection name for constants is <value>_<type> so for _cast(1, 'String') we will have _cast(1_Uint8, 'String'_String)
buffer << ", '" << constant_node.getResultType()->getName() << "'_String)";
}


@ -115,7 +115,9 @@ const BlockMissingValues & ArrowBlockInputFormat::getMissingValues() const
static std::shared_ptr<arrow::RecordBatchReader> createStreamReader(ReadBuffer & in)
{
auto stream_reader_status = arrow::ipc::RecordBatchStreamReader::Open(std::make_unique<ArrowInputStreamFromReadBuffer>(in));
auto options = arrow::ipc::IpcReadOptions::Defaults();
options.memory_pool = ArrowMemoryPool::instance();
auto stream_reader_status = arrow::ipc::RecordBatchStreamReader::Open(std::make_unique<ArrowInputStreamFromReadBuffer>(in), options);
if (!stream_reader_status.ok())
throw Exception(ErrorCodes::UNKNOWN_EXCEPTION,
"Error while opening a table: {}", stream_reader_status.status().ToString());
@ -128,7 +130,9 @@ static std::shared_ptr<arrow::ipc::RecordBatchFileReader> createFileReader(ReadB
if (is_stopped)
return nullptr;
auto file_reader_status = arrow::ipc::RecordBatchFileReader::Open(arrow_file);
auto options = arrow::ipc::IpcReadOptions::Defaults();
options.memory_pool = ArrowMemoryPool::instance();
auto file_reader_status = arrow::ipc::RecordBatchFileReader::Open(arrow_file, options);
if (!file_reader_status.ok())
throw Exception(ErrorCodes::UNKNOWN_EXCEPTION,
"Error while opening a table: {}", file_reader_status.status().ToString());


@ -12,6 +12,7 @@
#include <arrow/util/future.h>
#include <arrow/io/memory.h>
#include <arrow/result.h>
#include <arrow/memory_pool_internal.h>
#include <Core/Settings.h>
#include <sys/stat.h>
@ -100,7 +101,7 @@ arrow::Result<int64_t> RandomAccessFileFromSeekableReadBuffer::Read(int64_t nbyt
arrow::Result<std::shared_ptr<arrow::Buffer>> RandomAccessFileFromSeekableReadBuffer::Read(int64_t nbytes)
{
ARROW_ASSIGN_OR_RAISE(auto buffer, arrow::AllocateResizableBuffer(nbytes))
ARROW_ASSIGN_OR_RAISE(auto buffer, arrow::AllocateResizableBuffer(nbytes, ArrowMemoryPool::instance()))
ARROW_ASSIGN_OR_RAISE(int64_t bytes_read, Read(nbytes, buffer->mutable_data()))
if (bytes_read < nbytes)
@ -157,7 +158,7 @@ arrow::Result<int64_t> ArrowInputStreamFromReadBuffer::Read(int64_t nbytes, void
arrow::Result<std::shared_ptr<arrow::Buffer>> ArrowInputStreamFromReadBuffer::Read(int64_t nbytes)
{
ARROW_ASSIGN_OR_RAISE(auto buffer, arrow::AllocateResizableBuffer(nbytes))
ARROW_ASSIGN_OR_RAISE(auto buffer, arrow::AllocateResizableBuffer(nbytes, ArrowMemoryPool::instance()))
ARROW_ASSIGN_OR_RAISE(int64_t bytes_read, Read(nbytes, buffer->mutable_data()))
if (bytes_read < nbytes)
@ -193,7 +194,8 @@ arrow::Result<int64_t> RandomAccessFileFromRandomAccessReadBuffer::ReadAt(int64_
{
try
{
return in.readBigAt(reinterpret_cast<char *>(out), nbytes, position, nullptr);
int64_t r = in.readBigAt(reinterpret_cast<char *>(out), nbytes, position, nullptr);
return r;
}
catch (...)
{
@ -205,7 +207,7 @@ arrow::Result<int64_t> RandomAccessFileFromRandomAccessReadBuffer::ReadAt(int64_
arrow::Result<std::shared_ptr<arrow::Buffer>> RandomAccessFileFromRandomAccessReadBuffer::ReadAt(int64_t position, int64_t nbytes)
{
ARROW_ASSIGN_OR_RAISE(auto buffer, arrow::AllocateResizableBuffer(nbytes))
ARROW_ASSIGN_OR_RAISE(auto buffer, arrow::AllocateResizableBuffer(nbytes, ArrowMemoryPool::instance()))
ARROW_ASSIGN_OR_RAISE(int64_t bytes_read, ReadAt(position, nbytes, buffer->mutable_data()))
if (bytes_read < nbytes)
@ -231,6 +233,71 @@ arrow::Result<int64_t> RandomAccessFileFromRandomAccessReadBuffer::Tell() const
arrow::Result<int64_t> RandomAccessFileFromRandomAccessReadBuffer::Read(int64_t, void*) { return arrow::Status::NotImplemented(""); }
arrow::Result<std::shared_ptr<arrow::Buffer>> RandomAccessFileFromRandomAccessReadBuffer::Read(int64_t) { return arrow::Status::NotImplemented(""); }
ArrowMemoryPool * ArrowMemoryPool::instance()
{
static ArrowMemoryPool x;
return &x;
}
arrow::Status ArrowMemoryPool::Allocate(int64_t size, int64_t alignment, uint8_t ** out)
{
if (size == 0)
{
*out = arrow::memory_pool::internal::kZeroSizeArea;
return arrow::Status::OK();
}
try // It's unclear whether arrow is exception-safe, so avoid throwing, just in case
{
void * p = Allocator<false>().alloc(size_t(size), size_t(alignment));
*out = reinterpret_cast<uint8_t*>(p);
}
catch (...)
{
return arrow::Status::OutOfMemory("allocation of size ", size, " failed");
}
return arrow::Status::OK();
}
arrow::Status ArrowMemoryPool::Reallocate(int64_t old_size, int64_t new_size, int64_t alignment, uint8_t ** ptr)
{
if (old_size == 0)
{
chassert(*ptr == arrow::memory_pool::internal::kZeroSizeArea);
return Allocate(new_size, alignment, ptr);
}
if (new_size == 0)
{
Free(*ptr, old_size, alignment);
*ptr = arrow::memory_pool::internal::kZeroSizeArea;
return arrow::Status::OK();
}
try
{
void * p = Allocator<false>().realloc(*ptr, size_t(old_size), size_t(new_size), size_t(alignment));
*ptr = reinterpret_cast<uint8_t*>(p);
}
catch (...)
{
return arrow::Status::OutOfMemory("reallocation of size ", new_size, " failed");
}
return arrow::Status::OK();
}
void ArrowMemoryPool::Free(uint8_t * buffer, int64_t size, int64_t /*alignment*/)
{
if (size == 0)
{
chassert(buffer == arrow::memory_pool::internal::kZeroSizeArea);
return;
}
Allocator<false>().free(buffer, size_t(size));
}
std::shared_ptr<arrow::io::RandomAccessFile> asArrowFile(
ReadBuffer & in,


@ -6,6 +6,7 @@
#include <optional>
#include <arrow/io/interfaces.h>
#include <arrow/memory_pool.h>
#define ORC_MAGIC_BYTES "ORC"
#define PARQUET_MAGIC_BYTES "PAR1"
@ -124,6 +125,27 @@ private:
ARROW_DISALLOW_COPY_AND_ASSIGN(ArrowInputStreamFromReadBuffer);
};
/// By default, arrow allocates memory using posix_memalign(), which is currently not equipped with
/// clickhouse memory tracking. This adapter adds memory tracking.
class ArrowMemoryPool : public arrow::MemoryPool
{
public:
static ArrowMemoryPool * instance();
arrow::Status Allocate(int64_t size, int64_t alignment, uint8_t ** out) override;
arrow::Status Reallocate(int64_t old_size, int64_t new_size, int64_t alignment, uint8_t ** ptr) override;
void Free(uint8_t * buffer, int64_t size, int64_t alignment) override;
std::string backend_name() const override { return "clickhouse"; }
int64_t bytes_allocated() const override { return 0; }
int64_t total_bytes_allocated() const override { return 0; }
int64_t num_allocations() const override { return 0; }
private:
ArrowMemoryPool() = default;
};
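Usage sketch, mirroring how the format code below wires the pool into Arrow builders:

arrow::MemoryPool * pool = ArrowMemoryPool::instance();
std::unique_ptr<arrow::ArrayBuilder> array_builder;
arrow::Status status = arrow::MakeBuilder(pool, arrow::int64(), &array_builder);
// The builder's allocations now go through ClickHouse's Allocator and are
// visible to its memory tracking.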
std::shared_ptr<arrow::io::RandomAccessFile> asArrowFile(
ReadBuffer & in,
const FormatSettings & settings,


@ -22,6 +22,7 @@
#include <Common/DateLUTImpl.h>
#include <base/types.h>
#include <Processors/Chunk.h>
#include <Processors/Formats/Impl/ArrowBufferedStreams.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnNullable.h>
#include <Columns/ColumnArray.h>
@ -1133,7 +1134,7 @@ static void checkStatus(const arrow::Status & status, const String & column_name
/// Create empty arrow column using specified field
static std::shared_ptr<arrow::ChunkedArray> createArrowColumn(const std::shared_ptr<arrow::Field> & field, const String & format_name)
{
arrow::MemoryPool * pool = arrow::default_memory_pool();
arrow::MemoryPool * pool = ArrowMemoryPool::instance();
std::unique_ptr<arrow::ArrayBuilder> array_builder;
arrow::Status status = MakeBuilder(pool, field->type(), &array_builder);
checkStatus(status, field->name(), format_name);


@ -20,6 +20,7 @@
#include <DataTypes/DataTypeDateTime64.h>
#include <DataTypes/DataTypeFixedString.h>
#include <Processors/Formats/IOutputFormat.h>
#include <Processors/Formats/Impl/ArrowBufferedStreams.h>
#include <arrow/api.h>
#include <arrow/builder.h>
#include <arrow/type.h>
@ -418,7 +419,7 @@ namespace DB
/// Convert dictionary values to arrow array.
auto value_type = assert_cast<arrow::DictionaryType *>(builder->type().get())->value_type();
std::unique_ptr<arrow::ArrayBuilder> values_builder;
arrow::MemoryPool* pool = arrow::default_memory_pool();
arrow::MemoryPool* pool = ArrowMemoryPool::instance();
arrow::Status status = MakeBuilder(pool, value_type, &values_builder);
checkStatus(status, column->getName(), format_name);
@ -1025,7 +1026,7 @@ namespace DB
arrow_fields.emplace_back(std::make_shared<arrow::Field>(header_column.name, arrow_type, is_column_nullable));
}
arrow::MemoryPool * pool = arrow::default_memory_pool();
arrow::MemoryPool * pool = ArrowMemoryPool::instance();
std::unique_ptr<arrow::ArrayBuilder> array_builder;
arrow::Status status = MakeBuilder(pool, arrow_fields[column_i]->type(), &array_builder);
checkStatus(status, column->getName(), format_name);


@ -103,7 +103,7 @@ static void getFileReaderAndSchema(
if (is_stopped)
return;
auto result = arrow::adapters::orc::ORCFileReader::Open(arrow_file, arrow::default_memory_pool());
auto result = arrow::adapters::orc::ORCFileReader::Open(arrow_file, ArrowMemoryPool::instance());
if (!result.ok())
throw Exception::createDeprecated(result.status().ToString(), ErrorCodes::BAD_ARGUMENTS);
file_reader = std::move(result).ValueOrDie();


@ -46,12 +46,13 @@ namespace
std::unique_ptr<parquet::ParquetFileReader> createFileReader(
std::shared_ptr<::arrow::io::RandomAccessFile> arrow_file,
parquet::ReaderProperties reader_properties,
std::shared_ptr<parquet::FileMetaData> metadata = nullptr)
{
std::unique_ptr<parquet::ParquetFileReader> res;
THROW_PARQUET_EXCEPTION(res = parquet::ParquetFileReader::Open(
std::move(arrow_file),
parquet::default_reader_properties(),
reader_properties,
metadata));
return res;
}
@ -60,12 +61,12 @@ class ColReaderFactory
{
public:
ColReaderFactory(
const parquet::ArrowReaderProperties & reader_properties_,
const parquet::ArrowReaderProperties & arrow_properties_,
const parquet::ColumnDescriptor & col_descriptor_,
DataTypePtr ch_type_,
std::unique_ptr<parquet::ColumnChunkMetaData> meta_,
std::unique_ptr<parquet::PageReader> page_reader_)
: reader_properties(reader_properties_)
: arrow_properties(arrow_properties_)
, col_descriptor(col_descriptor_)
, ch_type(std::move(ch_type_))
, meta(std::move(meta_))
@ -74,7 +75,7 @@ public:
std::unique_ptr<ParquetColumnReader> makeReader();
private:
const parquet::ArrowReaderProperties & reader_properties;
const parquet::ArrowReaderProperties & arrow_properties;
const parquet::ColumnDescriptor & col_descriptor;
DataTypePtr ch_type;
std::unique_ptr<parquet::ColumnChunkMetaData> meta;
@ -274,7 +275,7 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::makeReader()
DataTypePtr read_type = ch_type;
if (!isDateTime64(ch_type))
{
auto scale = getScaleFromArrowTimeUnit(reader_properties.coerce_int96_timestamp_unit());
auto scale = getScaleFromArrowTimeUnit(arrow_properties.coerce_int96_timestamp_unit());
read_type = std::make_shared<DataTypeDateTime64>(scale);
}
return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>>>(
@ -299,13 +300,14 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::makeReader()
ParquetRecordReader::ParquetRecordReader(
Block header_,
parquet::ArrowReaderProperties reader_properties_,
parquet::ArrowReaderProperties arrow_properties_,
parquet::ReaderProperties reader_properties_,
std::shared_ptr<::arrow::io::RandomAccessFile> arrow_file,
const FormatSettings & format_settings,
std::vector<int> row_groups_indices_,
std::shared_ptr<parquet::FileMetaData> metadata)
: file_reader(createFileReader(std::move(arrow_file), std::move(metadata)))
, reader_properties(reader_properties_)
: file_reader(createFileReader(std::move(arrow_file), reader_properties_, std::move(metadata)))
, arrow_properties(arrow_properties_)
, header(std::move(header_))
, max_block_size(format_settings.parquet.max_block_size)
, row_groups_indices(std::move(row_groups_indices_))
@ -337,10 +339,10 @@ ParquetRecordReader::ParquetRecordReader(
chassert(idx >= 0);
parquet_col_indice.push_back(idx);
}
if (reader_properties.pre_buffer())
if (arrow_properties.pre_buffer())
{
THROW_PARQUET_EXCEPTION(file_reader->PreBuffer(
row_groups_indices, parquet_col_indice, reader_properties.io_context(), reader_properties.cache_options()));
row_groups_indices, parquet_col_indice, arrow_properties.io_context(), arrow_properties.cache_options()));
}
}
@ -378,7 +380,7 @@ void ParquetRecordReader::loadNextRowGroup()
for (size_t i = 0; i < parquet_col_indice.size(); i++)
{
ColReaderFactory factory(
reader_properties,
arrow_properties,
*file_reader->metadata()->schema()->Column(parquet_col_indice[i]),
header.getByPosition(i).type,
cur_row_group_reader->metadata()->ColumnChunk(parquet_col_indice[i]),


@ -19,7 +19,8 @@ class ParquetRecordReader
public:
ParquetRecordReader(
Block header_,
parquet::ArrowReaderProperties reader_properties_,
parquet::ArrowReaderProperties arrow_properties_,
parquet::ReaderProperties reader_properties_,
std::shared_ptr<::arrow::io::RandomAccessFile> arrow_file,
const FormatSettings & format_settings,
std::vector<int> row_groups_indices_,
@ -29,7 +30,7 @@ public:
private:
std::unique_ptr<parquet::ParquetFileReader> file_reader;
parquet::ArrowReaderProperties reader_properties;
parquet::ArrowReaderProperties arrow_properties;
Block header;
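The rename makes the two configuration layers explicit; a construction sketch with illustrative values taken from the format code below:

// Low-level Parquet file reading; decompression buffers use the tracked pool.
parquet::ReaderProperties reader_properties(ArrowMemoryPool::instance());

// Arrow record-batch layer: threading, batch size, pre-buffering.
parquet::ArrowReaderProperties arrow_properties;
arrow_properties.set_use_threads(false);
arrow_properties.set_pre_buffer(true);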


@ -39,6 +39,7 @@ namespace DB
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
extern const int INCORRECT_DATA;
extern const int CANNOT_READ_ALL_DATA;
extern const int CANNOT_PARSE_NUMBER;
}
@ -47,7 +48,10 @@ namespace ErrorCodes
do \
{ \
if (::arrow::Status _s = (status); !_s.ok()) \
throw Exception::createDeprecated(_s.ToString(), ErrorCodes::BAD_ARGUMENTS); \
{ \
throw Exception::createDeprecated(_s.ToString(), \
_s.IsOutOfMemory() ? ErrorCodes::CANNOT_ALLOCATE_MEMORY : ErrorCodes::INCORRECT_DATA); \
} \
} while (false)
/// Decode min/max value from column chunk statistics.
@ -444,9 +448,10 @@ void ParquetBlockInputFormat::initializeRowGroupBatchReader(size_t row_group_bat
{
auto & row_group_batch = row_group_batches[row_group_batch_idx];
parquet::ArrowReaderProperties properties;
properties.set_use_threads(false);
properties.set_batch_size(format_settings.parquet.max_block_size);
parquet::ArrowReaderProperties arrow_properties;
parquet::ReaderProperties reader_properties(ArrowMemoryPool::instance());
arrow_properties.set_use_threads(false);
arrow_properties.set_batch_size(format_settings.parquet.max_block_size);
// When reading a row group, arrow will:
// 1. Look at `metadata` to get all byte ranges it'll need to read from the file (typically one
@ -464,11 +469,11 @@ void ParquetBlockInputFormat::initializeRowGroupBatchReader(size_t row_group_bat
//
// This adds one unnecessary copy. We should probably do coalescing and prefetch scheduling on
// our side instead.
properties.set_pre_buffer(true);
arrow_properties.set_pre_buffer(true);
auto cache_options = arrow::io::CacheOptions::LazyDefaults();
cache_options.hole_size_limit = min_bytes_for_seek;
cache_options.range_size_limit = 1l << 40; // reading the whole row group at once is fine
properties.set_cache_options(cache_options);
arrow_properties.set_cache_options(cache_options);
// Workaround for a workaround in the parquet library.
//
@ -481,7 +486,7 @@ void ParquetBlockInputFormat::initializeRowGroupBatchReader(size_t row_group_bat
// other, failing an assert. So we disable pre-buffering in this case.
// That version is >10 years old, so this is not very important.
if (metadata->writer_version().VersionLt(parquet::ApplicationVersion::PARQUET_816_FIXED_VERSION()))
properties.set_pre_buffer(false);
arrow_properties.set_pre_buffer(false);
if (format_settings.parquet.use_native_reader)
{
@ -495,7 +500,8 @@ void ParquetBlockInputFormat::initializeRowGroupBatchReader(size_t row_group_bat
row_group_batch.native_record_reader = std::make_shared<ParquetRecordReader>(
getPort().getHeader(),
std::move(properties),
arrow_properties,
reader_properties,
arrow_file,
format_settings,
row_group_batch.row_groups_idxs);
@ -503,10 +509,9 @@ void ParquetBlockInputFormat::initializeRowGroupBatchReader(size_t row_group_bat
else
{
parquet::arrow::FileReaderBuilder builder;
THROW_ARROW_NOT_OK(
builder.Open(arrow_file, /* not to be confused with ArrowReaderProperties */ parquet::default_reader_properties(), metadata));
builder.properties(properties);
// TODO: Pass custom memory_pool() to enable memory accounting with non-jemalloc allocators.
THROW_ARROW_NOT_OK(builder.Open(arrow_file, reader_properties, metadata));
builder.properties(arrow_properties);
builder.memory_pool(ArrowMemoryPool::instance());
THROW_ARROW_NOT_OK(builder.Build(&row_group_batch.file_reader));
THROW_ARROW_NOT_OK(


@ -145,11 +145,10 @@ void ParquetBlockOutputFormat::consume(Chunk chunk)
/// Because the real SquashingTransform is only used for INSERT, not for SELECT ... INTO OUTFILE.
/// The latter doesn't even have a pipeline where a transform could be inserted, so it's more
/// convenient to do the squashing here. It's also parallelized here.
if (chunk.getNumRows() != 0)
{
staging_rows += chunk.getNumRows();
staging_bytes += chunk.bytes();
staging_bytes += chunk.allocatedBytes();
staging_chunks.push_back(std::move(chunk));
}
@ -282,11 +281,15 @@ void ParquetBlockOutputFormat::writeRowGroup(std::vector<Chunk> chunks)
writeUsingArrow(std::move(chunks));
else
{
Chunk concatenated = std::move(chunks[0]);
for (size_t i = 1; i < chunks.size(); ++i)
concatenated.append(chunks[i]);
chunks.clear();
Chunk concatenated;
while (!chunks.empty())
{
if (concatenated.empty())
concatenated.swap(chunks.back());
else
concatenated.append(chunks.back());
chunks.pop_back();
}
writeRowGroupInOneThread(std::move(concatenated));
}
}
@ -327,7 +330,7 @@ void ParquetBlockOutputFormat::writeUsingArrow(std::vector<Chunk> chunks)
auto result = parquet::arrow::FileWriter::Open(
*arrow_table->schema(),
arrow::default_memory_pool(),
ArrowMemoryPool::instance(),
sink,
builder.build(),
writer_props_builder.build());

Some files were not shown because too many files have changed in this diff.