Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-string-to-variant

Author: avogar
Date: 2024-04-03 15:20:21 +00:00
Commit: 56b5b5e171
493 changed files with 7767 additions and 2663 deletions


@@ -44,22 +44,35 @@ At a minimum, the following information should be added (but add more as needed)
---
### Modify your CI run:
**NOTE:** If you merge the PR with modified CI, you **MUST KNOW** what you are doing
**NOTE:** Set desired options before CI starts or re-push after updates
**NOTE:** Checked options will be applied if set before the CI RunConfig/PrepareRunConfig step
#### Run only:
- [ ] <!---ci_set_integration--> Integration tests
- [ ] <!---ci_set_arm--> Integration tests (arm64)
- [ ] <!---ci_set_stateless--> Stateless tests (release)
- [ ] <!---ci_set_stateless_asan--> Stateless tests (asan)
- [ ] <!---ci_set_stateful--> Stateful tests (release)
- [ ] <!---ci_set_stateful_asan--> Stateful tests (asan)
- [ ] <!---ci_set_reduced--> No sanitizers
- [ ] <!---ci_set_analyzer--> Tests with analyzer
- [ ] <!---ci_set_fast--> Fast tests
- [ ] <!---job_package_debug--> Only package_debug build
- [ ] <!---PLACE_YOUR_TAG_CONFIGURED_IN_ci_config.py_FILE_HERE--> Add your CI variant description here
#### Include tests (required builds will be added automatically):
- [ ] <!---ci_include_fast--> Fast test
- [ ] <!---ci_include_integration--> Integration Tests
- [ ] <!---ci_include_stateless--> Stateless tests
- [ ] <!---ci_include_stateful--> Stateful tests
- [ ] <!---ci_include_unit--> Unit tests
- [ ] <!---ci_include_performance--> Performance tests
- [ ] <!---ci_include_asan--> All with ASAN
- [ ] <!---ci_include_tsan--> All with TSAN
- [ ] <!---ci_include_analyzer--> All with Analyzer
- [ ] <!---ci_include_KEYWORD--> Add your option here
#### CI options:
#### Exclude tests:
- [ ] <!---ci_exclude_fast--> Fast test
- [ ] <!---ci_exclude_integration--> Integration Tests
- [ ] <!---ci_exclude_stateless--> Stateless tests
- [ ] <!---ci_exclude_stateful--> Stateful tests
- [ ] <!---ci_exclude_performance--> Performance tests
- [ ] <!---ci_exclude_asan--> All with ASAN
- [ ] <!---ci_exclude_tsan--> All with TSAN
- [ ] <!---ci_exclude_msan--> All with MSAN
- [ ] <!---ci_exclude_ubsan--> All with UBSAN
- [ ] <!---ci_exclude_coverage--> All with Coverage
- [ ] <!---ci_exclude_aarch64--> All with Aarch64
- [ ] <!---ci_exclude_KEYWORD--> Add your option here
#### Extra options:
- [ ] <!---do_not_test--> do not test (only style check)
- [ ] <!---no_merge_commit--> disable merge-commit (no merge from master before tests)
- [ ] <!---no_ci_cache--> disable CI cache (job reuse)


@@ -374,7 +374,7 @@ jobs:
if: ${{ !failure() && !cancelled() }}
uses: ./.github/workflows/reusable_test.yml
with:
test_name: Stateless tests (release, analyzer, s3, DatabaseReplicated)
test_name: Stateless tests (release, old analyzer, s3, DatabaseReplicated)
runner_type: func-tester
data: ${{ needs.RunConfig.outputs.data }}
FunctionalStatelessTestS3Debug:
@@ -632,7 +632,7 @@ jobs:
if: ${{ !failure() && !cancelled() }}
uses: ./.github/workflows/reusable_test.yml
with:
test_name: Integration tests (asan, analyzer)
test_name: Integration tests (asan, old analyzer)
runner_type: stress-tester
data: ${{ needs.RunConfig.outputs.data }}
IntegrationTestsTsan:


@@ -436,7 +436,7 @@ jobs:
if: ${{ !failure() && !cancelled() }}
uses: ./.github/workflows/reusable_test.yml
with:
test_name: Integration tests (asan, analyzer)
test_name: Integration tests (asan, old analyzer)
runner_type: stress-tester
data: ${{ needs.RunConfig.outputs.data }}
IntegrationTestsTsan:

.gitignore

@@ -164,6 +164,9 @@ tests/queries/0_stateless/*.generated-expect
tests/queries/0_stateless/*.expect.history
tests/integration/**/_gen
# pytest --pdb history
.pdb_history
# rust
/rust/**/target*
# It is autogenerated from *.in


@@ -6,7 +6,7 @@
# 2024 Changelog
### <a id="243"></a> ClickHouse release 24.3 LTS, 2024-03-26
### <a id="243"></a> ClickHouse release 24.3 LTS, 2024-03-27
#### Upgrade Notes
* The setting `allow_experimental_analyzer` is enabled by default and it switches the query analysis to a new implementation, which has better compatibility and feature completeness. The feature "analyzer" is considered beta instead of experimental. You can return to the old behavior by setting `compatibility` to `24.2` or disabling the `allow_experimental_analyzer` setting. Watch the [video on YouTube](https://www.youtube.com/watch?v=zhrOYQpgvkk).
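For illustration, a minimal sketch of opting out of the new analyzer on a 24.3+ server, using the settings named in the entry above:

```sql
-- Disable the new analyzer for the current session
SET allow_experimental_analyzer = 0;
-- Or pin the whole settings profile to the previous release
SET compatibility = '24.2';
```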
@@ -123,7 +123,6 @@
* Something was wrong with Apache Hive, which is experimental and not supported. [#60262](https://github.com/ClickHouse/ClickHouse/pull/60262) ([shanfengp](https://github.com/Aed-p)).
* An improvement for experimental parallel replicas: force reanalysis if parallel replicas changed [#60362](https://github.com/ClickHouse/ClickHouse/pull/60362) ([Raúl Marín](https://github.com/Algunenano)).
* Fix usage of plain metadata type with new disks configuration option [#60396](https://github.com/ClickHouse/ClickHouse/pull/60396) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Don't allow to set max_parallel_replicas to 0 as it doesn't make sense [#60430](https://github.com/ClickHouse/ClickHouse/pull/60430) ([Kruglov Pavel](https://github.com/Avogar)).
* Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike [#60451](https://github.com/ClickHouse/ClickHouse/pull/60451) ([Kruglov Pavel](https://github.com/Avogar)).
* Avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).


@@ -13,18 +13,16 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 24.3 | ✔️ |
| 24.2 | ✔️ |
| 24.1 | ✔️ |
| 23.12 | ✔️ |
| 23.11 | ❌ |
| 23.10 | ❌ |
| 23.9 | ❌ |
| 23.* | ❌ |
| 23.8 | ✔️ |
| 23.7 | ❌ |
| 23.6 | ❌ |
| 23.5 | ❌ |
| 23.4 | ❌ |
| 23.3 | ✔️ |
| 23.3 | |
| 23.2 | ❌ |
| 23.1 | ❌ |
| 22.* | ❌ |


@@ -314,13 +314,13 @@ static int read_unicode(json_stream *json)
if (l < 0xdc00 || l > 0xdfff) {
json_error(json, "invalid surrogate pair continuation \\u%04lx out "
"of range (dc00-dfff)", l);
"of range (dc00-dfff)", (unsigned long)l);
return -1;
}
cp = ((h - 0xd800) * 0x400) + ((l - 0xdc00) + 0x10000);
} else if (cp >= 0xdc00 && cp <= 0xdfff) {
json_error(json, "dangling surrogate \\u%04lx", cp);
json_error(json, "dangling surrogate \\u%04lx", (unsigned long)cp);
return -1;
}


@@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54484)
SET(VERSION_REVISION 54485)
SET(VERSION_MAJOR 24)
SET(VERSION_MINOR 3)
SET(VERSION_MINOR 4)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 891689a41506d00aa169548f5b4a8774351242c4)
SET(VERSION_DESCRIBE v24.3.1.1-testing)
SET(VERSION_STRING 24.3.1.1)
SET(VERSION_GITHASH 2c5c589a882ceec35439650337b92db3e76f0081)
SET(VERSION_DESCRIBE v24.4.1.1-testing)
SET(VERSION_STRING 24.4.1.1)
# end of autochange

contrib/NuRaft

@@ -1 +1 @@
Subproject commit 4a12f99dfc9d47c687ff7700b927cc76856225d1
Subproject commit 08ac76ea80a37f89b12109c805eafe9f1dc9b991


@@ -32,6 +32,7 @@ set(SRCS
"${LIBRARY_DIR}/src/handle_custom_notification.cxx"
"${LIBRARY_DIR}/src/handle_vote.cxx"
"${LIBRARY_DIR}/src/launcher.cxx"
"${LIBRARY_DIR}/src/log_entry.cxx"
"${LIBRARY_DIR}/src/srv_config.cxx"
"${LIBRARY_DIR}/src/snapshot_sync_req.cxx"
"${LIBRARY_DIR}/src/snapshot_sync_ctx.cxx"


@@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.2.2.71"
ARG VERSION="24.3.1.2672"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""
@@ -44,7 +44,10 @@ ARG DIRECT_DOWNLOAD_URLS=""
# We do that in advance at the beginning of the Dockerfile before any packages are
# installed to prevent picking those uid / gid by some unrelated software.
# The same uid / gid (101) is used both for alpine and ubuntu.
ARG DEFAULT_UID="101"
ARG DEFAULT_GID="101"
RUN addgroup -S -g "${DEFAULT_GID}" clickhouse && \
adduser -S -h "/var/lib/clickhouse" -s /bin/bash -G clickhouse -g "ClickHouse keeper" -u "${DEFAULT_UID}" clickhouse
ARG TARGETARCH
RUN arch=${TARGETARCH:-amd64} \
@@ -71,20 +74,21 @@ RUN arch=${TARGETARCH:-amd64} \
fi \
; done \
&& rm /tmp/*.tgz /install -r \
&& addgroup -S -g 101 clickhouse \
&& adduser -S -h /var/lib/clickhouse -s /bin/bash -G clickhouse -g "ClickHouse keeper" -u 101 clickhouse \
&& mkdir -p /var/lib/clickhouse /var/log/clickhouse-keeper /etc/clickhouse-keeper \
&& chown clickhouse:clickhouse /var/lib/clickhouse \
&& chown root:clickhouse /var/log/clickhouse-keeper \
&& chmod +x /entrypoint.sh \
&& apk add --no-cache su-exec bash tzdata \
&& cp /usr/share/zoneinfo/UTC /etc/localtime \
&& echo "UTC" > /etc/timezone \
&& chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-keeper /etc/clickhouse-keeper
&& echo "UTC" > /etc/timezone
ARG DEFAULT_CONFIG_DIR="/etc/clickhouse-keeper"
ARG DEFAULT_DATA_DIR="/var/lib/clickhouse-keeper"
ARG DEFAULT_LOG_DIR="/var/log/clickhouse-keeper"
RUN mkdir -p "${DEFAULT_DATA_DIR}" "${DEFAULT_LOG_DIR}" "${DEFAULT_CONFIG_DIR}" \
&& chown clickhouse:clickhouse "${DEFAULT_DATA_DIR}" \
&& chown root:clickhouse "${DEFAULT_LOG_DIR}" \
&& chmod ugo+Xrw -R "${DEFAULT_DATA_DIR}" "${DEFAULT_LOG_DIR}" "${DEFAULT_CONFIG_DIR}"
# /var/lib/clickhouse is necessary due to the current default configuration for Keeper
VOLUME "${DEFAULT_DATA_DIR}" /var/lib/clickhouse
EXPOSE 2181 10181 44444 9181
VOLUME /var/lib/clickhouse /var/log/clickhouse-keeper /etc/clickhouse-keeper
ENTRYPOINT ["/entrypoint.sh"]


@@ -80,7 +80,7 @@ if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
# so the container can't be finished by ctrl+c
export CLICKHOUSE_WATCHDOG_ENABLE
cd /var/lib/clickhouse
cd "${DATA_DIR}"
# There is a config file. It is already tested with gosu (if it is readable by the keeper user)
if [ -f "$KEEPER_CONFIG" ]; then


@@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.2.2.71"
ARG VERSION="24.3.1.2672"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""
@@ -42,6 +42,10 @@ ARG DIRECT_DOWNLOAD_URLS=""
# We do that in advance at the beginning of the Dockerfile before any packages are
# installed to prevent picking those uid / gid by some unrelated software.
# The same uid / gid (101) is used both for alpine and ubuntu.
ARG DEFAULT_UID="101"
ARG DEFAULT_GID="101"
RUN addgroup -S -g "${DEFAULT_GID}" clickhouse && \
adduser -S -h "/var/lib/clickhouse" -s /bin/bash -G clickhouse -g "ClickHouse server" -u "${DEFAULT_UID}" clickhouse
RUN arch=${TARGETARCH:-amd64} \
&& cd /tmp \
@@ -66,23 +70,30 @@ RUN arch=${TARGETARCH:-amd64} \
fi \
; done \
&& rm /tmp/*.tgz /install -r \
&& addgroup -S -g 101 clickhouse \
&& adduser -S -h /var/lib/clickhouse -s /bin/bash -G clickhouse -g "ClickHouse server" -u 101 clickhouse \
&& mkdir -p /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server/config.d /etc/clickhouse-server/users.d /etc/clickhouse-client /docker-entrypoint-initdb.d \
&& chown clickhouse:clickhouse /var/lib/clickhouse \
&& chown root:clickhouse /var/log/clickhouse-server \
&& chmod +x /entrypoint.sh \
&& apk add --no-cache bash tzdata \
&& cp /usr/share/zoneinfo/UTC /etc/localtime \
&& echo "UTC" > /etc/timezone \
&& chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client
&& echo "UTC" > /etc/timezone
# we need to allow "others" access to clickhouse folder, because docker container
# can be started with arbitrary uid (openshift usecase)
ARG DEFAULT_CLIENT_CONFIG_DIR="/etc/clickhouse-client"
ARG DEFAULT_SERVER_CONFIG_DIR="/etc/clickhouse-server"
ARG DEFAULT_DATA_DIR="/var/lib/clickhouse"
ARG DEFAULT_LOG_DIR="/var/log/clickhouse-server"
# we need to allow "others" access to ClickHouse folders, because docker containers
# can be started with arbitrary uids (OpenShift usecase)
RUN mkdir -p \
"${DEFAULT_DATA_DIR}" \
"${DEFAULT_LOG_DIR}" \
"${DEFAULT_CLIENT_CONFIG_DIR}" \
"${DEFAULT_SERVER_CONFIG_DIR}/config.d" \
"${DEFAULT_SERVER_CONFIG_DIR}/users.d" \
/docker-entrypoint-initdb.d \
&& chown clickhouse:clickhouse "${DEFAULT_DATA_DIR}" \
&& chown root:clickhouse "${DEFAULT_LOG_DIR}" \
&& chmod ugo+Xrw -R "${DEFAULT_DATA_DIR}" "${DEFAULT_LOG_DIR}" "${DEFAULT_CLIENT_CONFIG_DIR}" "${DEFAULT_SERVER_CONFIG_DIR}"
VOLUME "${DEFAULT_DATA_DIR}"
EXPOSE 9000 8123 9009
VOLUME /var/lib/clickhouse \
/var/log/clickhouse-server
ENTRYPOINT ["/entrypoint.sh"]


@@ -27,7 +27,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="24.2.2.71"
ARG VERSION="24.3.1.2672"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image


@@ -25,7 +25,7 @@ azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --debug /azurite_log &
config_logs_export_cluster /etc/clickhouse-server/config.d/system_logs_export.yaml
cache_policy=""
if [ $(( $(date +%-d) % 2 )) -eq 1 ]; then
if [ $(($RANDOM%2)) -eq 1 ]; then
cache_policy="SLRU"
else
cache_policy="LRU"


@@ -257,10 +257,10 @@ do
echo "$err"
[[ "0" != "${#err}" ]] && failed_to_save_logs=1
if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]; then
err=$( { clickhouse-client -q "select * from system.$table format TSVWithNamesAndTypes" | zstd --threads=0 > /test_output/$table.1.tsv.zst; } 2>&1 )
err=$( { clickhouse-client --port 19000 -q "select * from system.$table format TSVWithNamesAndTypes" | zstd --threads=0 > /test_output/$table.1.tsv.zst; } 2>&1 )
echo "$err"
[[ "0" != "${#err}" ]] && failed_to_save_logs=1
err=$( { clickhouse-client -q "select * from system.$table format TSVWithNamesAndTypes" | zstd --threads=0 > /test_output/$table.2.tsv.zst; } 2>&1 )
err=$( { clickhouse-client --port 29000 -q "select * from system.$table format TSVWithNamesAndTypes" | zstd --threads=0 > /test_output/$table.2.tsv.zst; } 2>&1 )
echo "$err"
[[ "0" != "${#err}" ]] && failed_to_save_logs=1
fi


@@ -72,7 +72,7 @@ mv /var/log/clickhouse-server/clickhouse-server.log /var/log/clickhouse-server/c
# Randomize cache policies.
cache_policy=""
if [ $(( $(date +%-d) % 2 )) -eq 1 ]; then
if [ $(($RANDOM%2)) -eq 1 ]; then
cache_policy="SLRU"
else
cache_policy="LRU"
@@ -87,6 +87,25 @@ if [ "$cache_policy" = "SLRU" ]; then
mv /etc/clickhouse-server/config.d/storage_conf.xml.tmp /etc/clickhouse-server/config.d/storage_conf.xml
fi
# Disable experimental WINDOW VIEW tests for stress tests, since such views may be
# created with the old analyzer, and after a server restart the server will refuse
# to start.
# FIXME: remove once support for WINDOW VIEW is implemented in the analyzer.
sudo cat > /etc/clickhouse-server/users.d/stress_tests_overrides.xml <<EOL
<clickhouse>
<profiles>
<default>
<allow_experimental_window_view>false</allow_experimental_window_view>
<constraints>
<allow_experimental_window_view>
<readonly/>
</allow_experimental_window_view>
</constraints>
</default>
</profiles>
</clickhouse>
EOL
start_server
clickhouse-client --query "SHOW TABLES FROM datasets"


@@ -0,0 +1,537 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.3.1.2672-lts (2c5c589a882) FIXME as compared to v24.2.1.2248-stable (891689a4150)
#### Backward Incompatible Change
* Don't allow to set max_parallel_replicas to 0 as it doesn't make sense. Setting it to 0 could lead to unexpected logical errors. Closes [#60140](https://github.com/ClickHouse/ClickHouse/issues/60140). [#60430](https://github.com/ClickHouse/ClickHouse/pull/60430) ([Kruglov Pavel](https://github.com/Avogar)).
* Change the column name from `duration_ms` to `duration_microseconds` in the `system.zookeeper` table to reflect the reality that the duration is in the microsecond resolution. [#60774](https://github.com/ClickHouse/ClickHouse/pull/60774) ([Duc Canh Le](https://github.com/canhld94)).
* Reject incoming INSERT queries in case when query-level settings `async_insert` and `deduplicate_blocks_in_dependent_materialized_views` are enabled together. This behaviour is controlled by a setting `throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert` and enabled by default. This is a continuation of https://github.com/ClickHouse/ClickHouse/pull/59699 needed to unblock https://github.com/ClickHouse/ClickHouse/pull/59915. [#60888](https://github.com/ClickHouse/ClickHouse/pull/60888) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
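A sketch of the new guard described above (table name hypothetical):

```sql
SET async_insert = 1, deduplicate_blocks_in_dependent_materialized_views = 1;
INSERT INTO events VALUES (1);  -- now rejected by default
-- Opt out of the rejection:
SET throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert = 0;
```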
* Utility `clickhouse-copier` is moved to a separate repository on GitHub: https://github.com/ClickHouse/copier. It is no longer included in the bundle but is still available as a separate download. This closes: [#60734](https://github.com/ClickHouse/ClickHouse/issues/60734) This closes: [#60540](https://github.com/ClickHouse/ClickHouse/issues/60540) This closes: [#60250](https://github.com/ClickHouse/ClickHouse/issues/60250) This closes: [#52917](https://github.com/ClickHouse/ClickHouse/issues/52917) This closes: [#51140](https://github.com/ClickHouse/ClickHouse/issues/51140) This closes: [#47517](https://github.com/ClickHouse/ClickHouse/issues/47517) This closes: [#47189](https://github.com/ClickHouse/ClickHouse/issues/47189) This closes: [#46598](https://github.com/ClickHouse/ClickHouse/issues/46598) This closes: [#40257](https://github.com/ClickHouse/ClickHouse/issues/40257) This closes: [#36504](https://github.com/ClickHouse/ClickHouse/issues/36504) This closes: [#35485](https://github.com/ClickHouse/ClickHouse/issues/35485) This closes: [#33702](https://github.com/ClickHouse/ClickHouse/issues/33702) This closes: [#26702](https://github.com/ClickHouse/ClickHouse/issues/26702). [#61058](https://github.com/ClickHouse/ClickHouse/pull/61058) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* To increase compatibility with MySQL, function `locate` now accepts arguments `(needle, haystack[, start_pos])` by default. The previous behavior `(haystack, needle, [, start_pos])` can be restored by setting `function_locate_has_mysql_compatible_argument_order = 0`. [#61092](https://github.com/ClickHouse/ClickHouse/pull/61092) ([Robert Schulze](https://github.com/rschu1ze)).
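A quick sketch of the new default argument order:

```sql
SELECT locate('ck', 'ClickHouse');  -- 4, interpreted as (needle, haystack)
-- Restore the previous (haystack, needle) order:
SET function_locate_has_mysql_compatible_argument_order = 0;
```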
* The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of [#55186](https://github.com/ClickHouse/ClickHouse/issues/55186) and [#45409](https://github.com/ClickHouse/ClickHouse/issues/45409). It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query: `SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type`. To disable the usage of in-memory data parts, do `ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT`. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait while there are no in-memory data parts and continue the upgrade. [#61127](https://github.com/ClickHouse/ClickHouse/pull/61127) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Forbid `SimpleAggregateFunction` in `ORDER BY` of `MergeTree` tables (like `AggregateFunction` is forbidden, but they are forbidden because they are not comparable) by default (use `allow_suspicious_primary_key` to allow them). [#61399](https://github.com/ClickHouse/ClickHouse/pull/61399) ([Azat Khuzhin](https://github.com/azat)).
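A sketch of opting back in (table definition hypothetical):

```sql
SET allow_suspicious_primary_key = 1;
CREATE TABLE t (v SimpleAggregateFunction(max, UInt64))
ENGINE = AggregatingMergeTree ORDER BY v;  -- rejected by default since 24.3
```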
* ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, `output_format_parquet_string_as_string`, `output_format_orc_string_as_string`, `output_format_arrow_string_as_string`. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster `lz4` compression method, that's why we set `zstd` by default. This is controlled by the settings `output_format_parquet_compression_method`, `output_format_orc_compression_method`, and `output_format_arrow_compression_method`. We changed the default to `zstd` for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). [#61817](https://github.com/ClickHouse/ClickHouse/pull/61817) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* In the new ClickHouse version, the functions `geoDistance`, `greatCircleDistance`, and `greatCircleAngle` will use 64-bit double precision floating point data type for internal calculations and return type if all the arguments are Float64. This closes [#58476](https://github.com/ClickHouse/ClickHouse/issues/58476). In previous versions, the function always used Float32. You can switch to the old behavior by setting `geo_distance_returns_float64_on_float64_arguments` to `false` or setting `compatibility` to `24.2` or earlier. [#61848](https://github.com/ClickHouse/ClickHouse/pull/61848) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
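A sketch of the behavior change and the opt-out:

```sql
SELECT toTypeName(geoDistance(0., 0., 1., 1.));  -- Float64 when all arguments are Float64
SET geo_distance_returns_float64_on_float64_arguments = 0;  -- previous Float32 behavior
```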
#### New Feature
* `topK`/`topKWeighted` now support a mode that returns the count of values and its error. [#54508](https://github.com/ClickHouse/ClickHouse/pull/54508) ([UnamedRus](https://github.com/UnamedRus)).
* Add generate_series as a table function. This function generates table with an arithmetic progression with natural numbers. [#59390](https://github.com/ClickHouse/ClickHouse/pull/59390) ([divanik](https://github.com/divanik)).
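A minimal example of the new table function (per the entry above):

```sql
SELECT * FROM generate_series(1, 9, 2);  -- 1, 3, 5, 7, 9
```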
* Support reading and writing backups as tar archives. [#59535](https://github.com/ClickHouse/ClickHouse/pull/59535) ([josh-hildred](https://github.com/josh-hildred)).
* Implemented support for S3Express buckets. [#59965](https://github.com/ClickHouse/ClickHouse/pull/59965) ([Nikita Taranov](https://github.com/nickitat)).
* Allow attaching parts from a different disk: attach a partition from a table on other disks using copy instead of hard link (such as an instant table); attach a partition using copy when the hard link fails even on the same disk. [#60112](https://github.com/ClickHouse/ClickHouse/pull/60112) ([Unalian](https://github.com/Unalian)).
* Added function `toMillisecond` which returns the millisecond component for values of type`DateTime` or `DateTime64`. [#60281](https://github.com/ClickHouse/ClickHouse/pull/60281) ([Shaun Struwig](https://github.com/Blargian)).
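For example:

```sql
SELECT toMillisecond(toDateTime64('2024-03-26 12:00:00.456', 3));  -- 456
```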
* Make all format names case-insensitive, like Tsv, or TSV, or tsv, or even rowbinary. [#60420](https://github.com/ClickHouse/ClickHouse/pull/60420) ([豪肥肥](https://github.com/HowePa)).
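For example, all of these are now accepted:

```sql
SELECT 1 FORMAT TSV;
SELECT 1 FORMAT tsv;  -- equivalent since 24.3
```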
* Add four properties to the `StorageMemory` (memory engine): `min_bytes_to_keep`, `max_bytes_to_keep`, `min_rows_to_keep`, and `max_rows_to_keep`; add tests to reflect the new changes; update the `memory.md` documentation; add a table `context` property to `MemorySink` to enable access to table parameter bounds. [#60612](https://github.com/ClickHouse/ClickHouse/pull/60612) ([Jake Bamrah](https://github.com/JakeBamrah)).
* Added function `toMillisecond` which returns the millisecond component for values of type`DateTime` or `DateTime64`. [#60649](https://github.com/ClickHouse/ClickHouse/pull/60649) ([Robert Schulze](https://github.com/rschu1ze)).
* Separate limits on number of waiting and executing queries. Added new server setting `max_waiting_queries` that limits the number of queries waiting due to `async_load_databases`. Existing limits on number of executing queries no longer count waiting queries. [#61053](https://github.com/ClickHouse/ClickHouse/pull/61053) ([Sergei Trifonov](https://github.com/serxa)).
* Add support for `ATTACH PARTITION ALL`. [#61107](https://github.com/ClickHouse/ClickHouse/pull/61107) ([Kirill Nikiforov](https://github.com/allmazz)).
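A sketch, assuming `src` and `dst` are tables with identical structure:

```sql
ALTER TABLE dst ATTACH PARTITION ALL FROM src;
```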
* Add a new function, `getClientHTTPHeader`. This closes [#54665](https://github.com/ClickHouse/ClickHouse/issues/54665). Co-authored with @lingtaolf. [#61820](https://github.com/ClickHouse/ClickHouse/pull/61820) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
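A sketch of the new function; it returns a value when the query arrives over the HTTP interface (the feature may need to be enabled on the server):

```sql
SELECT getClientHTTPHeader('User-Agent');
```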
#### Performance Improvement
* Improve the performance of serialized aggregation method when involving multiple [nullable] columns. This is a general version of [#51399](https://github.com/ClickHouse/ClickHouse/issues/51399) that doesn't compromise on abstraction integrity. [#55809](https://github.com/ClickHouse/ClickHouse/pull/55809) ([Amos Bird](https://github.com/amosbird)).
* Lazy build join output to improve performance of ALL join. [#58278](https://github.com/ClickHouse/ClickHouse/pull/58278) ([LiuNeng](https://github.com/liuneng1994)).
* Improvements to aggregate functions `argMin` / `argMax` / `any` / `anyLast` / `anyHeavy`, as well as `ORDER BY {u8/u16/u32/u64/i8/i16/i32/i64} LIMIT 1` queries. [#58640](https://github.com/ClickHouse/ClickHouse/pull/58640) ([Raúl Marín](https://github.com/Algunenano)).
* A trivial optimization of column filtering: avoid filtering columns whose underlying data type is not a number with `result_size_hint = -1`. Peak memory can be reduced to 44% of the original in some cases. [#59698](https://github.com/ClickHouse/ClickHouse/pull/59698) ([李扬](https://github.com/taiyang-li)).
* If the table's primary key contains mostly useless columns, don't keep them in memory. This is controlled by a new setting `primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns` with the value `0.9` by default, which means: for a composite primary key, if a column changes its value for at least 0.9 of all the times, the next columns after it will be not loaded. [#60255](https://github.com/ClickHouse/ClickHouse/pull/60255) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Execute the `multiIf` function in a columnar fashion when the result type's underlying type is a number. [#60384](https://github.com/ClickHouse/ClickHouse/pull/60384) ([李扬](https://github.com/taiyang-li)).
* Faster (almost 2x) mutexes (was slower due to ThreadFuzzer). [#60823](https://github.com/ClickHouse/ClickHouse/pull/60823) ([Azat Khuzhin](https://github.com/azat)).
* Move connection drain from prepare to work, and drain multiple connections in parallel. [#60845](https://github.com/ClickHouse/ClickHouse/pull/60845) ([lizhuoyu5](https://github.com/lzydmxy)).
* Optimize insertManyFrom of nullable number or nullable string. [#60846](https://github.com/ClickHouse/ClickHouse/pull/60846) ([李扬](https://github.com/taiyang-li)).
* Optimized function `dotProduct` to omit unnecessary and expensive memory copies. [#60928](https://github.com/ClickHouse/ClickHouse/pull/60928) ([Robert Schulze](https://github.com/rschu1ze)).
* Operations with the filesystem cache will suffer less from the lock contention. [#61066](https://github.com/ClickHouse/ClickHouse/pull/61066) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Optimize ColumnString::replicate and prevent memcpySmallAllowReadWriteOverflow15Impl from being optimized to built-in memcpy. Close [#61074](https://github.com/ClickHouse/ClickHouse/issues/61074). ColumnString::replicate speeds up by 2.46x on x86-64. [#61075](https://github.com/ClickHouse/ClickHouse/pull/61075) ([李扬](https://github.com/taiyang-li)).
* 30x faster printing for 256-bit integers. [#61100](https://github.com/ClickHouse/ClickHouse/pull/61100) ([Raúl Marín](https://github.com/Algunenano)).
* If a query with a syntax error contained COLUMNS matcher with a regular expression, the regular expression was compiled each time during the parser's backtracking, instead of being compiled once. This was a fundamental error. The compiled regexp was put to AST. But the letter A in AST means "abstract" which means it should not contain heavyweight objects. Parts of AST can be created and discarded during parsing, including a large number of backtracking. This leads to slowness on the parsing side and consequently allows DoS by a readonly user. But the main problem is that it prevents progress in fuzzers. [#61543](https://github.com/ClickHouse/ClickHouse/pull/61543) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a new analyzer pass to optimize `IN` with a single value. [#61564](https://github.com/ClickHouse/ClickHouse/pull/61564) ([LiuNeng](https://github.com/liuneng1994)).
#### Improvement
* While running the MODIFY COLUMN query for materialized views, check the inner table's structure to ensure every column exists. [#47427](https://github.com/ClickHouse/ClickHouse/pull/47427) ([sunny](https://github.com/sunny19930321)).
* Added table `system.keywords` which contains all the keywords from the parser. Mostly needed and will be used for better fuzzing and syntax highlighting. [#51808](https://github.com/ClickHouse/ClickHouse/pull/51808) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Ordinary database engine is deprecated. You will receive a warning in clickhouse-client if your server is using it. This closes [#52229](https://github.com/ClickHouse/ClickHouse/issues/52229). [#56942](https://github.com/ClickHouse/ClickHouse/pull/56942) ([shabroo](https://github.com/shabroo)).
* All zero copy locks related to a table have to be dropped when the table is dropped. The directory which contains these locks has to be removed also. [#57575](https://github.com/ClickHouse/ClickHouse/pull/57575) ([Sema Checherinda](https://github.com/CheSema)).
* Allow declaring enum in external table structure. [#57857](https://github.com/ClickHouse/ClickHouse/pull/57857) ([Duc Canh Le](https://github.com/canhld94)).
* Consider lightweight deleted rows when selecting parts to merge. [#58223](https://github.com/ClickHouse/ClickHouse/pull/58223) ([Zhuo Qiu](https://github.com/jewelzqiu)).
* This PR makes HTTP/HTTPS connections reusable for all use cases, even when the response is 3xx or 4xx. [#58845](https://github.com/ClickHouse/ClickHouse/pull/58845) ([Sema Checherinda](https://github.com/CheSema)).
* Added comments for columns for more system tables. Continuation of https://github.com/ClickHouse/ClickHouse/pull/58356. [#59016](https://github.com/ClickHouse/ClickHouse/pull/59016) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Now we can use virtual columns in PREWHERE. It's worthwhile for non-const virtual columns like `_part_offset`. [#59033](https://github.com/ClickHouse/ClickHouse/pull/59033) ([Amos Bird](https://github.com/amosbird)).
* Add ability to skip read-only replicas for INSERT into Distributed engine (Controlled with `distributed_insert_skip_read_only_replicas` setting, by default OFF - backward compatible). [#59176](https://github.com/ClickHouse/ClickHouse/pull/59176) ([Azat Khuzhin](https://github.com/azat)).
* Instead of using a constant key, object storage now generates a key for determining the remove-objects capability. [#59495](https://github.com/ClickHouse/ClickHouse/pull/59495) ([Sema Checherinda](https://github.com/CheSema)).
* Add positional pread in libhdfs3. If you want to call positional read in libhdfs3, use the hdfsPread function in hdfs.h as follows. `tSize hdfsPread(hdfsFS fs, hdfsFile file, void * buffer, tSize length, tOffset position);`. [#59624](https://github.com/ClickHouse/ClickHouse/pull/59624) ([M1eyu](https://github.com/M1eyu2018)).
* Add asynchronous WriteBuffer for AzureBlobStorage similar to S3. [#59929](https://github.com/ClickHouse/ClickHouse/pull/59929) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Allow "local" as object storage type instead of "local_blob_storage". [#60165](https://github.com/ClickHouse/ClickHouse/pull/60165) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improved overall usability of virtual columns. Now it is allowed to use virtual columns in `PREWHERE` (it's worthwhile for non-const virtual columns like `_part_offset`). Now a builtin documentation is available for virtual columns as a comment of column in `DESCRIBE` query with enabled setting `describe_include_virtual_columns`. [#60205](https://github.com/ClickHouse/ClickHouse/pull/60205) ([Anton Popov](https://github.com/CurtizJ)).
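A sketch of the builtin documentation mentioned above (table name hypothetical):

```sql
SET describe_include_virtual_columns = 1;
DESCRIBE TABLE hits;  -- virtual columns such as _part and _part_offset now listed with comments
```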
* Parallel flush of pending INSERT blocks of Distributed engine on `DETACH`/server shutdown and `SYSTEM FLUSH DISTRIBUTED` (Parallelism will work only if you have multi disk policy for table (like everything in Distributed engine right now)). [#60225](https://github.com/ClickHouse/ClickHouse/pull/60225) ([Azat Khuzhin](https://github.com/azat)).
* Fix an improper filter setting in `joinRightColumnsSwitchNullability`; resolves [#59625](https://github.com/ClickHouse/ClickHouse/issues/59625). [#60259](https://github.com/ClickHouse/ClickHouse/pull/60259) ([lgbo](https://github.com/lgbo-ustc)).
* Add a setting to force read-through cache for merges. [#60308](https://github.com/ClickHouse/ClickHouse/pull/60308) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Issue [#57598](https://github.com/ClickHouse/ClickHouse/issues/57598) mentions divergent behaviour regarding transaction handling: an issued COMMIT/ROLLBACK when no transaction is active is reported as an error, contrary to MySQL behaviour. [#60338](https://github.com/ClickHouse/ClickHouse/pull/60338) ([PapaToemmsn](https://github.com/PapaToemmsn)).
* Added `none_only_active` mode for `distributed_ddl_output_mode` setting. [#60340](https://github.com/ClickHouse/ClickHouse/pull/60340) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Allow configuring HTTP redirect handlers for clickhouse-server. For example, you can make `/` redirect to the Play UI. [#60390](https://github.com/ClickHouse/ClickHouse/pull/60390) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The advanced dashboard has slightly better colors for multi-line graphs. [#60391](https://github.com/ClickHouse/ClickHouse/pull/60391) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Function `substring` now has a new alias `byteSlice`. [#60494](https://github.com/ClickHouse/ClickHouse/pull/60494) ([Robert Schulze](https://github.com/rschu1ze)).
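For example:

```sql
SELECT byteSlice('ClickHouse', 1, 5);  -- 'Click', same as substring('ClickHouse', 1, 5)
```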
* Renamed server setting `dns_cache_max_size` to `dns_cache_max_entries` to reduce ambiguity. [#60500](https://github.com/ClickHouse/ClickHouse/pull/60500) ([Kirill Nikiforov](https://github.com/allmazz)).
* `SHOW INDEX | INDEXES | INDICES | KEYS` no longer sorts by the primary key columns (which was unintuitive). [#60514](https://github.com/ClickHouse/ClickHouse/pull/60514) ([Robert Schulze](https://github.com/rschu1ze)).
* Keeper improvement: abort during startup if an invalid snapshot is detected to avoid data loss. [#60537](https://github.com/ClickHouse/ClickHouse/pull/60537) ([Antonio Andelic](https://github.com/antonio2368)).
* Added MergeTree read split ranges into intersecting and non intersecting fault injection using `merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_fault_probability` setting. [#60548](https://github.com/ClickHouse/ClickHouse/pull/60548) ([Maksim Kita](https://github.com/kitaisreal)).
* The Advanced dashboard now has controls always visible on scrolling. This allows you to add a new chart without scrolling up. [#60692](https://github.com/ClickHouse/ClickHouse/pull/60692) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* String types and Enums can be used in the same context, such as: arrays, UNION queries, conditional expressions. This closes [#60726](https://github.com/ClickHouse/ClickHouse/issues/60726). [#60727](https://github.com/ClickHouse/ClickHouse/pull/60727) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update tzdata to 2024a. [#60768](https://github.com/ClickHouse/ClickHouse/pull/60768) ([Raúl Marín](https://github.com/Algunenano)).
* Support files without format extension in Filesystem database. [#60795](https://github.com/ClickHouse/ClickHouse/pull/60795) ([Kruglov Pavel](https://github.com/Avogar)).
* Keeper improvement: support `leadership_expiry_ms` in Keeper's settings. [#60806](https://github.com/ClickHouse/ClickHouse/pull/60806) ([Brokenice0415](https://github.com/Brokenice0415)).
* Always infer exponential numbers in JSON formats regardless of the setting `input_format_try_infer_exponent_floats`. Add setting `input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects` that allows to use String type for ambiguous paths instead of an exception during named Tuples inference from JSON objects. [#60808](https://github.com/ClickHouse/ClickHouse/pull/60808) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for `START TRANSACTION` syntax typically used in MySQL syntax, resolving https://github.com/ClickHouse/ClickHouse/discussions/60865. [#60886](https://github.com/ClickHouse/ClickHouse/pull/60886) ([Zach Naimon](https://github.com/ArctypeZach)).
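A sketch of the MySQL-style syntax, assuming experimental transactions are enabled on the server:

```sql
START TRANSACTION;  -- now accepted in addition to BEGIN TRANSACTION
COMMIT;
```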
* Add a flag for SMJ to treat null as biggest/smallest, so the behavior can be compatible with other SQL systems, like Apache Spark. [#60896](https://github.com/ClickHouse/ClickHouse/pull/60896) ([loudongfeng](https://github.com/loudongfeng)).
* The ClickHouse version has been added to Docker labels. Closes [#54224](https://github.com/ClickHouse/ClickHouse/issues/54224). [#60949](https://github.com/ClickHouse/ClickHouse/pull/60949) ([Nikolay Monkov](https://github.com/nikmonkov)).
* Add a setting `parallel_replicas_allow_in_with_subquery = 1` which allows subqueries for IN work with parallel replicas. [#60950](https://github.com/ClickHouse/ClickHouse/pull/60950) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* DNSResolver shuffles set of resolved IPs. [#60965](https://github.com/ClickHouse/ClickHouse/pull/60965) ([Sema Checherinda](https://github.com/CheSema)).
* Support detecting the output format by file extension in `clickhouse-client` and `clickhouse-local`. [#61036](https://github.com/ClickHouse/ClickHouse/pull/61036) ([豪肥肥](https://github.com/HowePa)).
* Check memory limit update periodically. [#61049](https://github.com/ClickHouse/ClickHouse/pull/61049) ([Han Fei](https://github.com/hanfei1991)).
* Enable processors profiling (time spent and input/output bytes for sorting, aggregation, ...) by default. [#61096](https://github.com/ClickHouse/ClickHouse/pull/61096) ([Azat Khuzhin](https://github.com/azat)).
* Add the function `toUInt128OrZero`, which was missed by mistake (the mistake is related to https://github.com/ClickHouse/ClickHouse/pull/945). The compatibility aliases `FROM_UNIXTIME` and `DATE_FORMAT` (they are not ClickHouse-native and only exist for MySQL compatibility) have been made case insensitive, as expected for SQL-compatibility aliases. [#61114](https://github.com/ClickHouse/ClickHouse/pull/61114) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
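For example:

```sql
SELECT toUInt128OrZero('not a number');  -- 0
SELECT from_unixtime(0);                 -- lowercase alias now accepted
```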
* Improvements for the access checks, allowing revocation of unpossessed rights in case the target user doesn't have the revoking grants either. Example: `GRANT SELECT ON *.* TO user1; REVOKE SELECT ON system.* FROM user1;`. [#61115](https://github.com/ClickHouse/ClickHouse/pull/61115) ([pufit](https://github.com/pufit)).
* Fix an error in the previous optimization ([#59698](https://github.com/ClickHouse/ClickHouse/pull/59698)): remove `break` to make sure the first filtered column has the minimum size. [#61145](https://github.com/ClickHouse/ClickHouse/pull/61145) ([李扬](https://github.com/taiyang-li)).
* Fix `has()` function with `Nullable` column (fixes [#60214](https://github.com/ClickHouse/ClickHouse/issues/60214)). [#61249](https://github.com/ClickHouse/ClickHouse/pull/61249) ([Mikhail Koviazin](https://github.com/mkmkme)).
* Now it's possible to specify the attribute `merge="true"` in config substitutions for subtrees `<include from_zk="/path" merge="true">`. If this attribute is specified, ClickHouse will merge the subtree with the existing configuration; otherwise, the default behavior is to append the new content to the configuration. [#61299](https://github.com/ClickHouse/ClickHouse/pull/61299) ([alesapin](https://github.com/alesapin)).
* Add async metrics for virtual memory mappings: VMMaxMapCount & VMNumMaps. Closes [#60662](https://github.com/ClickHouse/ClickHouse/issues/60662). [#61354](https://github.com/ClickHouse/ClickHouse/pull/61354) ([Tuan Pham Anh](https://github.com/tuanpavn)).
* Use `temporary_files_codec` setting in all places where we create temporary data, for example external memory sorting and external memory GROUP BY. Before it worked only in `partial_merge` JOIN algorithm. [#61456](https://github.com/ClickHouse/ClickHouse/pull/61456) ([Maksim Kita](https://github.com/kitaisreal)).
* Remove duplicated check `containing_part.empty()`, It's already being checked here: https://github.com/ClickHouse/ClickHouse/blob/1296dac3c7e47670872c15e3f5e58f869e0bd2f2/src/Storages/MergeTree/MergeTreeData.cpp#L6141. [#61467](https://github.com/ClickHouse/ClickHouse/pull/61467) ([William Schoeffel](https://github.com/wiledusc)).
* Add a new setting `max_parser_backtracks` which allows limiting the complexity of query parsing. [#61502](https://github.com/ClickHouse/ClickHouse/pull/61502) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support parallel reading for azure blob storage. [#61503](https://github.com/ClickHouse/ClickHouse/pull/61503) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Less contention during dynamic resize of filesystem cache. [#61524](https://github.com/ClickHouse/ClickHouse/pull/61524) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Disallow sharded mode of StorageS3 queue, because it will be rewritten. [#61537](https://github.com/ClickHouse/ClickHouse/pull/61537) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixed typo: from `use_leagcy_max_level` to `use_legacy_max_level`. [#61545](https://github.com/ClickHouse/ClickHouse/pull/61545) ([William Schoeffel](https://github.com/wiledusc)).
* Remove some duplicate entries in blob_storage_log. [#61622](https://github.com/ClickHouse/ClickHouse/pull/61622) ([YenchangChan](https://github.com/YenchangChan)).
* Enable `allow_experimental_analyzer` setting by default. [#61652](https://github.com/ClickHouse/ClickHouse/pull/61652) ([Dmitry Novik](https://github.com/novikd)).
* Added `current_user` function as a compatibility alias for MySQL. [#61770](https://github.com/ClickHouse/ClickHouse/pull/61770) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Use managed identity for backups IO when using Azure Blob Storage. Add a setting to prevent ClickHouse from attempting to create a non-existent container, which requires permissions at the storage account level. [#61785](https://github.com/ClickHouse/ClickHouse/pull/61785) ([Daniel Pozo Escalona](https://github.com/danipozo)).
* Enable `output_format_pretty_row_numbers` by default. It is better for usability. [#61791](https://github.com/ClickHouse/ClickHouse/pull/61791) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* In the previous version, some numbers in Pretty formats were not pretty enough. [#61794](https://github.com/ClickHouse/ClickHouse/pull/61794) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* A long value in Pretty formats won't be cut if it is the single value in the resultset, such as in the result of the `SHOW CREATE TABLE` query. [#61795](https://github.com/ClickHouse/ClickHouse/pull/61795) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Similarly to `clickhouse-local`, `clickhouse-client` will accept the `--output-format` option as a synonym to the `--format` option. This closes [#59848](https://github.com/ClickHouse/ClickHouse/issues/59848). [#61797](https://github.com/ClickHouse/ClickHouse/pull/61797) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* If stdout is a terminal and the output format is not specified, `clickhouse-client` and similar tools will use `PrettyCompact` by default, similarly to the interactive mode. `clickhouse-client` and `clickhouse-local` will handle command line arguments for input and output formats in a unified fashion. This closes [#61272](https://github.com/ClickHouse/ClickHouse/issues/61272). [#61800](https://github.com/ClickHouse/ClickHouse/pull/61800) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Underscore digit groups in Pretty formats for better readability. This is controlled by a new setting, `output_format_pretty_highlight_digit_groups`. [#61802](https://github.com/ClickHouse/ClickHouse/pull/61802) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add ability to override initial INSERT SETTINGS via SYSTEM FLUSH DISTRIBUTED. [#61832](https://github.com/ClickHouse/ClickHouse/pull/61832) ([Azat Khuzhin](https://github.com/azat)).
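A sketch of the override, table name hypothetical:

```sql
SYSTEM FLUSH DISTRIBUTED dist_events SETTINGS max_insert_threads = 8;
```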
* Fixed grammar from "a" to "the" in the warning message. There is only one Atomic engine, so it should be "to the new Atomic engine" instead of "to a new Atomic engine". [#61952](https://github.com/ClickHouse/ClickHouse/pull/61952) ([shabroo](https://github.com/shabroo)).
#### Build/Testing/Packaging Improvement
* Update sccache to the latest version; significantly reduce images size by reshaking the dependency trees; use the latest working odbc driver. [#59953](https://github.com/ClickHouse/ClickHouse/pull/59953) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update python related style checkers. Continue the [#50174](https://github.com/ClickHouse/ClickHouse/issues/50174). [#60408](https://github.com/ClickHouse/ClickHouse/pull/60408) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Upgrade `prqlc` to 0.11.3. [#60616](https://github.com/ClickHouse/ClickHouse/pull/60616) ([Maximilian Roos](https://github.com/max-sixty)).
* Attach gdb to running fuzzer process. [#60654](https://github.com/ClickHouse/ClickHouse/pull/60654) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use explicit template instantiation more aggressively. Get rid of templates in favor of overloaded functions in some places. [#60730](https://github.com/ClickHouse/ClickHouse/pull/60730) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* The real-time query profiler now works on AArch64. In previous versions, it worked only when a program didn't spend time inside a syscall. [#60807](https://github.com/ClickHouse/ClickHouse/pull/60807) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix too big translation unit in `Aggregator`. [#61211](https://github.com/ClickHouse/ClickHouse/pull/61211) ([lgbo](https://github.com/lgbo-ustc)).
* Fixed flakiness of 01603_insert_select_too_many_parts test. Closes [#61158](https://github.com/ClickHouse/ClickHouse/issues/61158). [#61259](https://github.com/ClickHouse/ClickHouse/pull/61259) ([Ilya Yatsishin](https://github.com/qoega)).
* Now it is possible to use `chassert(expression, comment)` in the codebase. [#61263](https://github.com/ClickHouse/ClickHouse/pull/61263) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Teach the fuzzer to use other numeric types. [#61317](https://github.com/ClickHouse/ClickHouse/pull/61317) ([Raúl Marín](https://github.com/Algunenano)).
* Increase memory limit for coverage builds. [#61405](https://github.com/ClickHouse/ClickHouse/pull/61405) ([Raúl Marín](https://github.com/Algunenano)).
* Add generic query text fuzzer in `clickhouse-local`. [#61508](https://github.com/ClickHouse/ClickHouse/pull/61508) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix function execution over const and LowCardinality with GROUP BY const for analyzer [#59986](https://github.com/ClickHouse/ClickHouse/pull/59986) ([Azat Khuzhin](https://github.com/azat)).
* Fix finished_mutations_to_keep=0 for MergeTree (as the docs say, 0 means keep everything) [#60031](https://github.com/ClickHouse/ClickHouse/pull/60031) ([Azat Khuzhin](https://github.com/azat)).
* PartsSplitter invalid ranges for the same part [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)).
* Azure Blob Storage: fix issues with endpoint and prefix [#60251](https://github.com/ClickHouse/ClickHouse/pull/60251) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix LRUResourceCache bug (Hive cache) [#60262](https://github.com/ClickHouse/ClickHouse/pull/60262) ([shanfengp](https://github.com/Aed-p)).
* Force reanalysis if parallel replicas changed [#60362](https://github.com/ClickHouse/ClickHouse/pull/60362) ([Raúl Marín](https://github.com/Algunenano)).
* Fix usage of plain metadata type with new disks configuration option [#60396](https://github.com/ClickHouse/ClickHouse/pull/60396) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike [#60451](https://github.com/ClickHouse/ClickHouse/pull/60451) ([Kruglov Pavel](https://github.com/Avogar)).
* Try to avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix `max_query_size` for KQL compound operator [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)).
* Keeper fix: add timeouts when waiting for commit logs [#60544](https://github.com/ClickHouse/ClickHouse/pull/60544) ([Antonio Andelic](https://github.com/antonio2368)).
* Reduce the number of read rows from `system.numbers` [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)).
* Don't output number tips for date types [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)).
* Fix reading from MergeTree with non-deterministic functions in filter [#60586](https://github.com/ClickHouse/ClickHouse/pull/60586) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix logical error on bad compatibility setting value type [#60596](https://github.com/ClickHouse/ClickHouse/pull/60596) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix inconsistent aggregate function states in mixed x86-64 / ARM clusters [#60610](https://github.com/ClickHouse/ClickHouse/pull/60610) ([Harry Lee](https://github.com/HarryLeeIBM)).
* fix(prql): Robust panic handler [#60615](https://github.com/ClickHouse/ClickHouse/pull/60615) ([Maximilian Roos](https://github.com/max-sixty)).
* Fix `intDiv` for decimal and date arguments [#60672](https://github.com/ClickHouse/ClickHouse/pull/60672) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix: expand CTE in alter modify query [#60682](https://github.com/ClickHouse/ClickHouse/pull/60682) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix system.parts for non-Atomic/Ordinary database engine (i.e. Memory) [#60689](https://github.com/ClickHouse/ClickHouse/pull/60689) ([Azat Khuzhin](https://github.com/azat)).
* Fix "Invalid storage definition in metadata file" for parameterized views [#60708](https://github.com/ClickHouse/ClickHouse/pull/60708) ([Azat Khuzhin](https://github.com/azat)).
* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove wrong sanitize checking in aggregate function quantileGK [#60740](https://github.com/ClickHouse/ClickHouse/pull/60740) ([李扬](https://github.com/taiyang-li)).
* Fix insert-select + insert_deduplication_token bug by setting streams to 1 [#60745](https://github.com/ClickHouse/ClickHouse/pull/60745) ([Jordi Villar](https://github.com/jrdi)).
* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
* Fix toStartOfInterval [#60763](https://github.com/ClickHouse/ClickHouse/pull/60763) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)).
* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix possible stuck on error in HashedDictionaryParallelLoader [#60926](https://github.com/ClickHouse/ClickHouse/pull/60926) ([vdimir](https://github.com/vdimir)).
* Fix async RESTORE with Replicated database [#60934](https://github.com/ClickHouse/ClickHouse/pull/60934) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix CSV format not supporting tuples [#60994](https://github.com/ClickHouse/ClickHouse/pull/60994) ([shuai.xu](https://github.com/shuai-xu)).
* Fix deadlock in async inserts to `Log` tables via native protocol [#61055](https://github.com/ClickHouse/ClickHouse/pull/61055) ([Anton Popov](https://github.com/CurtizJ)).
* Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary [#61196](https://github.com/ClickHouse/ClickHouse/pull/61196) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix multiple bugs in groupArraySorted [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)).
* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix usage of session_token in S3 engine [#61234](https://github.com/ClickHouse/ClickHouse/pull/61234) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)).
* Fix bugs in show database [#61269](https://github.com/ClickHouse/ClickHouse/pull/61269) ([Raúl Marín](https://github.com/Algunenano)).
* Fix logical error in RabbitMQ storage with MATERIALIZED columns [#61320](https://github.com/ClickHouse/ClickHouse/pull/61320) ([vdimir](https://github.com/vdimir)).
* Fix CREATE OR REPLACE DICTIONARY [#61356](https://github.com/ClickHouse/ClickHouse/pull/61356) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix crash in ObjectJson parsing array with nulls [#61364](https://github.com/ClickHouse/ClickHouse/pull/61364) ([vdimir](https://github.com/vdimir)).
* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)).
* Fix an issue with actions DAG split [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)).
* Fix finishing a failed RESTORE [#61466](https://github.com/ClickHouse/ClickHouse/pull/61466) ([Vitaly Baranov](https://github.com/vitlibar)).
* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)).
* Allow queuing in restore pool [#61475](https://github.com/ClickHouse/ClickHouse/pull/61475) ([Nikita Taranov](https://github.com/nickitat)).
* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)).
* Fix ALTER QUERY MODIFY SQL SECURITY [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)).
* Fix crash in window view [#61526](https://github.com/ClickHouse/ClickHouse/pull/61526) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix `repeat` with non-native integers [#61527](https://github.com/ClickHouse/ClickHouse/pull/61527) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Reset part level upon attach from disk on MergeTree [#61536](https://github.com/ClickHouse/ClickHouse/pull/61536) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix crash in arrayPartialReverseSort [#61539](https://github.com/ClickHouse/ClickHouse/pull/61539) ([Raúl Marín](https://github.com/Algunenano)).
* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix `addDays` causing an error when used with DateTime64 [#61561](https://github.com/ClickHouse/ClickHouse/pull/61561) ([Shuai li](https://github.com/loneylee)).
* Disallow LowCardinality input type for JSONExtract [#61617](https://github.com/ClickHouse/ClickHouse/pull/61617) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix `system.part_log` for async insert with deduplication [#61620](https://github.com/ClickHouse/ClickHouse/pull/61620) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix `Non-ready set` error for `system.parts`. [#61666](https://github.com/ClickHouse/ClickHouse/pull/61666) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Don't allow the same expression in ORDER BY with and without WITH FILL (see the sketch after this list) [#61667](https://github.com/ClickHouse/ClickHouse/pull/61667) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix actual_part_name for REPLACE_RANGE (`Entry actual part isn't empty yet`) [#61675](https://github.com/ClickHouse/ClickHouse/pull/61675) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix columns after executing MODIFY QUERY for a materialized view with internal table [#61734](https://github.com/ClickHouse/ClickHouse/pull/61734) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
* Fix `RANGE` frame not being supported for Nullable columns. [#61766](https://github.com/ClickHouse/ClickHouse/pull/61766) ([YuanLiu](https://github.com/ditgittube)).
* Revert "Revert "Fix bug when reading system.parts using UUID (issue 61220)."" [#61779](https://github.com/ClickHouse/ClickHouse/pull/61779) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### CI Fix or Improvement (changelog entry is not required)
* Decoupled changes from [#60408](https://github.com/ClickHouse/ClickHouse/issues/60408). [#60553](https://github.com/ClickHouse/ClickHouse/pull/60553) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Eliminate the need to provide input args to docker server jobs, to clean up the yml files. [#60602](https://github.com/ClickHouse/ClickHouse/pull/60602) ([Max K.](https://github.com/maxknv)).
* Debug and fix markreleaseready. [#60611](https://github.com/ClickHouse/ClickHouse/pull/60611) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix build_report job so that it's defined by ci_config only (not yml file). [#60613](https://github.com/ClickHouse/ClickHouse/pull/60613) ([Max K.](https://github.com/maxknv)).
* Do not await pending CI jobs on release branches; decrease the wait timeout to fit into the GitHub job timeout. [#60652](https://github.com/ClickHouse/ClickHouse/pull/60652) ([Max K.](https://github.com/maxknv)).
* Set limited number of builds for "special build check" report in backports. [#60850](https://github.com/ClickHouse/ClickHouse/pull/60850) ([Max K.](https://github.com/maxknv)).
* ... [#60935](https://github.com/ClickHouse/ClickHouse/pull/60935) ([Max K.](https://github.com/maxknv)).
* ... [#60947](https://github.com/ClickHouse/ClickHouse/pull/60947) ([Max K.](https://github.com/maxknv)).
* ... [#60952](https://github.com/ClickHouse/ClickHouse/pull/60952) ([Max K.](https://github.com/maxknv)).
* ... [#60958](https://github.com/ClickHouse/ClickHouse/pull/60958) ([Max K.](https://github.com/maxknv)).
* ... [#61022](https://github.com/ClickHouse/ClickHouse/pull/61022) ([Max K.](https://github.com/maxknv)).
* Just a preparation for the merge queue support. [#61099](https://github.com/ClickHouse/ClickHouse/pull/61099) ([Max K.](https://github.com/maxknv)).
* ... [#61133](https://github.com/ClickHouse/ClickHouse/pull/61133) ([Max K.](https://github.com/maxknv)).
* In PRs: run typos and aspell checks always; run pylint and mypy only if Python file(s) changed in the PR; run the basic source-file style check only if not all changes are in Python files. [#61148](https://github.com/ClickHouse/ClickHouse/pull/61148) ([Max K.](https://github.com/maxknv)).
* ... [#61172](https://github.com/ClickHouse/ClickHouse/pull/61172) ([Max K.](https://github.com/maxknv)).
* ... [#61183](https://github.com/ClickHouse/ClickHouse/pull/61183) ([Han Fei](https://github.com/hanfei1991)).
* ... [#61185](https://github.com/ClickHouse/ClickHouse/pull/61185) ([Max K.](https://github.com/maxknv)).
* TBD. [#61197](https://github.com/ClickHouse/ClickHouse/pull/61197) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* ... [#61214](https://github.com/ClickHouse/ClickHouse/pull/61214) ([Max K.](https://github.com/maxknv)).
* ... [#61441](https://github.com/ClickHouse/ClickHouse/pull/61441) ([Max K.](https://github.com/maxknv)).
* ![Screenshot_20240323_025055](https://github.com/ClickHouse/ClickHouse/assets/18581488/ccaab212-a1d3-4dfb-8d56-b1991760b6bf). [#61801](https://github.com/ClickHouse/ClickHouse/pull/61801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* ... [#61877](https://github.com/ClickHouse/ClickHouse/pull/61877) ([Max K.](https://github.com/maxknv)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Revert "Use `MergeTree` as a default table engine""'. [#60524](https://github.com/ClickHouse/ClickHouse/pull/60524) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Revert "Support resource request canceling""'. [#60558](https://github.com/ClickHouse/ClickHouse/pull/60558) ([Sergei Trifonov](https://github.com/serxa)).
* NO CL ENTRY: 'Revert "Add `toMillisecond` function"'. [#60644](https://github.com/ClickHouse/ClickHouse/pull/60644) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Synchronize parsers"'. [#60759](https://github.com/ClickHouse/ClickHouse/pull/60759) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Fix wacky primary key sorting in `SHOW INDEX`"'. [#60898](https://github.com/ClickHouse/ClickHouse/pull/60898) ([Antonio Andelic](https://github.com/antonio2368)).
* NO CL ENTRY: 'Revert "CI: make style check faster"'. [#61142](https://github.com/ClickHouse/ClickHouse/pull/61142) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Don't allow to set max_parallel_replicas to 0 as it doesn't make sense"'. [#61200](https://github.com/ClickHouse/ClickHouse/pull/61200) ([Kruglov Pavel](https://github.com/Avogar)).
* NO CL ENTRY: 'Revert "Fix usage of session_token in S3 engine"'. [#61359](https://github.com/ClickHouse/ClickHouse/pull/61359) ([Antonio Andelic](https://github.com/antonio2368)).
* NO CL ENTRY: 'Revert "Revert "Fix usage of session_token in S3 engine""'. [#61362](https://github.com/ClickHouse/ClickHouse/pull/61362) ([Kruglov Pavel](https://github.com/Avogar)).
* NO CL ENTRY: 'Reorder hidden and shown checks in comment, change url of Mergeable check'. [#61373](https://github.com/ClickHouse/ClickHouse/pull/61373) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* NO CL ENTRY: 'Remove unnecessary layers from clickhouse/cctools'. [#61374](https://github.com/ClickHouse/ClickHouse/pull/61374) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* NO CL ENTRY: 'Revert "Updated format settings references in the docs (datetime.md)"'. [#61435](https://github.com/ClickHouse/ClickHouse/pull/61435) ([Kruglov Pavel](https://github.com/Avogar)).
* NO CL ENTRY: 'Revert "CI: ARM integration tests: disable tests with HDFS "'. [#61449](https://github.com/ClickHouse/ClickHouse/pull/61449) ([Max K.](https://github.com/maxknv)).
* NO CL ENTRY: 'Revert "Analyzer: Fix virtual columns in StorageMerge"'. [#61518](https://github.com/ClickHouse/ClickHouse/pull/61518) ([Antonio Andelic](https://github.com/antonio2368)).
* NO CL ENTRY: 'Revert "Revert "Analyzer: Fix virtual columns in StorageMerge""'. [#61528](https://github.com/ClickHouse/ClickHouse/pull/61528) ([Dmitry Novik](https://github.com/novikd)).
* NO CL ENTRY: 'Improve build_download_helper'. [#61592](https://github.com/ClickHouse/ClickHouse/pull/61592) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* NO CL ENTRY: 'Revert "Un-flake `test_undrop_query`"'. [#61668](https://github.com/ClickHouse/ClickHouse/pull/61668) ([Robert Schulze](https://github.com/rschu1ze)).
* NO CL ENTRY: 'Fix flaky tests (stateless, integration)'. [#61816](https://github.com/ClickHouse/ClickHouse/pull/61816) ([Nikita Fomichev](https://github.com/fm4v)).
* NO CL ENTRY: 'Better usability of "expect" tests: less trouble with running directly'. [#61818](https://github.com/ClickHouse/ClickHouse/pull/61818) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix flaky `02122_parallel_formatting_Template`"'. [#61868](https://github.com/ClickHouse/ClickHouse/pull/61868) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Add --now option to enable and start the service" #job_Install_packages_amd64'. [#61878](https://github.com/ClickHouse/ClickHouse/pull/61878) ([Max K.](https://github.com/maxknv)).
* NO CL ENTRY: 'Revert "disallow LowCardinality input type for JSONExtract"'. [#61960](https://github.com/ClickHouse/ClickHouse/pull/61960) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Improve query performance in case of very small blocks [#58879](https://github.com/ClickHouse/ClickHouse/pull/58879) ([Azat Khuzhin](https://github.com/azat)).
* Analyzer: fixes for JOIN columns resolution [#59007](https://github.com/ClickHouse/ClickHouse/pull/59007) ([vdimir](https://github.com/vdimir)).
* Fix race on `Context::async_insert_queue` [#59082](https://github.com/ClickHouse/ClickHouse/pull/59082) ([Alexander Tokmakov](https://github.com/tavplubix)).
* CI: support batch specification in commit message [#59738](https://github.com/ClickHouse/ClickHouse/pull/59738) ([Max K.](https://github.com/maxknv)).
* Update storing-data.md [#60024](https://github.com/ClickHouse/ClickHouse/pull/60024) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Make max_insert_delayed_streams_for_parallel_write actually work [#60079](https://github.com/ClickHouse/ClickHouse/pull/60079) ([alesapin](https://github.com/alesapin)).
* Analyzer: support join using column from select list [#60182](https://github.com/ClickHouse/ClickHouse/pull/60182) ([vdimir](https://github.com/vdimir)).
* Add a test for [#60223](https://github.com/ClickHouse/ClickHouse/issues/60223) [#60258](https://github.com/ClickHouse/ClickHouse/pull/60258) ([Denny Crane](https://github.com/den-crane)).
* Analyzer: Refactor execution name for ConstantNode [#60313](https://github.com/ClickHouse/ClickHouse/pull/60313) ([Dmitry Novik](https://github.com/novikd)).
* Fix database iterator waiting code [#60314](https://github.com/ClickHouse/ClickHouse/pull/60314) ([Sergei Trifonov](https://github.com/serxa)).
* QueryCache: Don't acquire the query count mutex if not necessary [#60348](https://github.com/ClickHouse/ClickHouse/pull/60348) ([zhongyuankai](https://github.com/zhongyuankai)).
* Fix bugfix check (due to unknown commit_logs_cache_size_threshold) [#60375](https://github.com/ClickHouse/ClickHouse/pull/60375) ([Azat Khuzhin](https://github.com/azat)).
* Enable testing with `io_uring` back [#60383](https://github.com/ClickHouse/ClickHouse/pull/60383) ([Nikita Taranov](https://github.com/nickitat)).
* Analyzer: improve hiding secret arguments. [#60386](https://github.com/ClickHouse/ClickHouse/pull/60386) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* CI: make workflow yml abstract [#60421](https://github.com/ClickHouse/ClickHouse/pull/60421) ([Max K.](https://github.com/maxknv)).
* Improve test test_reload_clusters_config [#60426](https://github.com/ClickHouse/ClickHouse/pull/60426) ([Kruglov Pavel](https://github.com/Avogar)).
* Revert "Revert "Merge pull request [#56864](https://github.com/ClickHouse/ClickHouse/issues/56864) from ClickHouse/broken-projections-better-handling"" [#60452](https://github.com/ClickHouse/ClickHouse/pull/60452) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do not check the existence of to/from files in metadata_storage, because it does not see uncommitted changes [#60462](https://github.com/ClickHouse/ClickHouse/pull/60462) ([Alexander Gololobov](https://github.com/davenger)).
* Fix ambiguous option in `clickhouse-local` [#60475](https://github.com/ClickHouse/ClickHouse/pull/60475) ([豪肥肥](https://github.com/HowePa)).
* Fix: test_parallel_replicas_custom_key_load_balancing [#60485](https://github.com/ClickHouse/ClickHouse/pull/60485) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix: progress bar for *Cluster table functions [#60491](https://github.com/ClickHouse/ClickHouse/pull/60491) ([Igor Nikonov](https://github.com/devcrafter)).
* Analyzer: Support different ObjectJSON on shards [#60497](https://github.com/ClickHouse/ClickHouse/pull/60497) ([Dmitry Novik](https://github.com/novikd)).
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#60499](https://github.com/ClickHouse/ClickHouse/pull/60499) ([Kruglov Pavel](https://github.com/Avogar)).
* Refactor StorageSystemOneBlock [#60510](https://github.com/ClickHouse/ClickHouse/pull/60510) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Simple cleanup while fixing progress bar [#60513](https://github.com/ClickHouse/ClickHouse/pull/60513) ([Igor Nikonov](https://github.com/devcrafter)).
* PullingAsyncPipelineExecutor cleanup [#60515](https://github.com/ClickHouse/ClickHouse/pull/60515) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix bad error message [#60518](https://github.com/ClickHouse/ClickHouse/pull/60518) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Synchronize Access [#60519](https://github.com/ClickHouse/ClickHouse/pull/60519) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Synchronize metrics and Keeper [#60520](https://github.com/ClickHouse/ClickHouse/pull/60520) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enforce clang-tidy in `programs/` and `utils/` headers [#60521](https://github.com/ClickHouse/ClickHouse/pull/60521) ([Robert Schulze](https://github.com/rschu1ze)).
* Synchronize parsers [#60522](https://github.com/ClickHouse/ClickHouse/pull/60522) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix a bunch of clang-tidy warnings in headers [#60523](https://github.com/ClickHouse/ClickHouse/pull/60523) ([Robert Schulze](https://github.com/rschu1ze)).
* General sanity in function `seriesOutliersDetectTukey` [#60535](https://github.com/ClickHouse/ClickHouse/pull/60535) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update Chinese document for max_query_size, max_parser_depth and optimize_functions_to_subcolumns [#60541](https://github.com/ClickHouse/ClickHouse/pull/60541) ([Alex Cheng](https://github.com/Alex-Cheng)).
* Userspace page cache again [#60552](https://github.com/ClickHouse/ClickHouse/pull/60552) ([Michael Kolupaev](https://github.com/al13n321)).
* Traverse shadow directory for system.remote_data_paths [#60585](https://github.com/ClickHouse/ClickHouse/pull/60585) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Add test for [#58906](https://github.com/ClickHouse/ClickHouse/issues/58906) [#60597](https://github.com/ClickHouse/ClickHouse/pull/60597) ([Raúl Marín](https://github.com/Algunenano)).
* Use python zipfile to have x-platform idempotent lambda packages [#60603](https://github.com/ClickHouse/ClickHouse/pull/60603) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* tests: suppress data-race in librdkafka statistics code [#60604](https://github.com/ClickHouse/ClickHouse/pull/60604) ([Azat Khuzhin](https://github.com/azat)).
* Update version after release [#60605](https://github.com/ClickHouse/ClickHouse/pull/60605) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v24.2.1.2248-stable [#60607](https://github.com/ClickHouse/ClickHouse/pull/60607) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Addition to changelog [#60609](https://github.com/ClickHouse/ClickHouse/pull/60609) ([Anton Popov](https://github.com/CurtizJ)).
* internal: Refine rust prql code [#60617](https://github.com/ClickHouse/ClickHouse/pull/60617) ([Maximilian Roos](https://github.com/max-sixty)).
* fix(rust): Fix skim's panic handler [#60621](https://github.com/ClickHouse/ClickHouse/pull/60621) ([Maximilian Roos](https://github.com/max-sixty)).
* Resubmit "Analyzer: compute ALIAS columns right after reading" [#60641](https://github.com/ClickHouse/ClickHouse/pull/60641) ([vdimir](https://github.com/vdimir)).
* Analyzer: Fix bug with join_use_nulls and PREWHERE [#60655](https://github.com/ClickHouse/ClickHouse/pull/60655) ([vdimir](https://github.com/vdimir)).
* Add test for [#59891](https://github.com/ClickHouse/ClickHouse/issues/59891) [#60657](https://github.com/ClickHouse/ClickHouse/pull/60657) ([Raúl Marín](https://github.com/Algunenano)).
* Fix missed entries in system.part_log in case of fetch preferred over merges/mutations [#60659](https://github.com/ClickHouse/ClickHouse/pull/60659) ([Azat Khuzhin](https://github.com/azat)).
* Always apply first minmax index among available skip indices [#60675](https://github.com/ClickHouse/ClickHouse/pull/60675) ([Igor Nikonov](https://github.com/devcrafter)).
* Remove bad test `02152_http_external_tables_memory_tracking` [#60690](https://github.com/ClickHouse/ClickHouse/pull/60690) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix questionable behavior in the `parseDateTimeBestEffort` function. [#60691](https://github.com/ClickHouse/ClickHouse/pull/60691) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flaky checks [#60694](https://github.com/ClickHouse/ClickHouse/pull/60694) ([Azat Khuzhin](https://github.com/azat)).
* Resubmit http_external_tables_memory_tracking test [#60695](https://github.com/ClickHouse/ClickHouse/pull/60695) ([Azat Khuzhin](https://github.com/azat)).
* Fix bugfix and upgrade checks (due to "Unknown handler type 'redirect'" error) [#60696](https://github.com/ClickHouse/ClickHouse/pull/60696) ([Azat Khuzhin](https://github.com/azat)).
* Fix test_grant_and_revoke/test.py::test_grant_all_on_table (after syncing with cloud) [#60699](https://github.com/ClickHouse/ClickHouse/pull/60699) ([Azat Khuzhin](https://github.com/azat)).
* Remove unit test for ColumnObject [#60709](https://github.com/ClickHouse/ClickHouse/pull/60709) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve unit tests [#60710](https://github.com/ClickHouse/ClickHouse/pull/60710) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix scheduler fairness test [#60712](https://github.com/ClickHouse/ClickHouse/pull/60712) ([Sergei Trifonov](https://github.com/serxa)).
* Do not retry queries if container is down in integration tests (resubmit) [#60714](https://github.com/ClickHouse/ClickHouse/pull/60714) ([Azat Khuzhin](https://github.com/azat)).
* Mark one setting as obsolete [#60715](https://github.com/ClickHouse/ClickHouse/pull/60715) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix a test with Analyzer [#60723](https://github.com/ClickHouse/ClickHouse/pull/60723) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Two tests are fixed with Analyzer [#60724](https://github.com/ClickHouse/ClickHouse/pull/60724) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove old code [#60728](https://github.com/ClickHouse/ClickHouse/pull/60728) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove more code from LIVE VIEW [#60729](https://github.com/ClickHouse/ClickHouse/pull/60729) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix `test_keeper_back_to_back/test.py::test_concurrent_watches` [#60749](https://github.com/ClickHouse/ClickHouse/pull/60749) ([Antonio Andelic](https://github.com/antonio2368)).
* Catch exceptions on finalize in `InterserverIOHTTPHandler` [#60769](https://github.com/ClickHouse/ClickHouse/pull/60769) ([Antonio Andelic](https://github.com/antonio2368)).
* Reduce flakiness of 02932_refreshable_materialized_views [#60771](https://github.com/ClickHouse/ClickHouse/pull/60771) ([Michael Kolupaev](https://github.com/al13n321)).
* Use 64-bit capabilities if available [#60775](https://github.com/ClickHouse/ClickHouse/pull/60775) ([Azat Khuzhin](https://github.com/azat)).
* Include multiline logs in fuzzer fatal.log report [#60796](https://github.com/ClickHouse/ClickHouse/pull/60796) ([Raúl Marín](https://github.com/Algunenano)).
* Add missing clone calls related to compression [#60810](https://github.com/ClickHouse/ClickHouse/pull/60810) ([Raúl Marín](https://github.com/Algunenano)).
* New private runners [#60811](https://github.com/ClickHouse/ClickHouse/pull/60811) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Move userspace page cache settings to the correct section of SettingsChangeHistory.h [#60812](https://github.com/ClickHouse/ClickHouse/pull/60812) ([Michael Kolupaev](https://github.com/al13n321)).
* Update version_date.tsv and changelogs after v23.8.10.43-lts [#60851](https://github.com/ClickHouse/ClickHouse/pull/60851) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix fuzzer report [#60853](https://github.com/ClickHouse/ClickHouse/pull/60853) ([Raúl Marín](https://github.com/Algunenano)).
* Update version_date.tsv and changelogs after v23.3.20.27-lts [#60857](https://github.com/ClickHouse/ClickHouse/pull/60857) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Refactor OptimizeDateOrDateTimeConverterWithPreimageVisitor [#60875](https://github.com/ClickHouse/ClickHouse/pull/60875) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
* Fix race in PageCache [#60878](https://github.com/ClickHouse/ClickHouse/pull/60878) ([Michael Kolupaev](https://github.com/al13n321)).
* Small changes in async inserts code [#60885](https://github.com/ClickHouse/ClickHouse/pull/60885) ([Nikita Taranov](https://github.com/nickitat)).
* Remove useless verbose logging from AWS library [#60921](https://github.com/ClickHouse/ClickHouse/pull/60921) ([alesapin](https://github.com/alesapin)).
* Throw on query timeout in ZooKeeperRetries [#60922](https://github.com/ClickHouse/ClickHouse/pull/60922) ([Antonio Andelic](https://github.com/antonio2368)).
* Bring clickhouse-test changes from private [#60924](https://github.com/ClickHouse/ClickHouse/pull/60924) ([Raúl Marín](https://github.com/Algunenano)).
* Add debug info to exceptions in `IMergeTreeDataPart::checkConsistency()` [#60981](https://github.com/ClickHouse/ClickHouse/pull/60981) ([Nikita Taranov](https://github.com/nickitat)).
* Fix a typo [#60987](https://github.com/ClickHouse/ClickHouse/pull/60987) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Replace some header includes with forward declarations [#61003](https://github.com/ClickHouse/ClickHouse/pull/61003) ([Amos Bird](https://github.com/amosbird)).
* Speed up cctools building [#61011](https://github.com/ClickHouse/ClickHouse/pull/61011) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix ASTRenameQuery::clone [#61013](https://github.com/ClickHouse/ClickHouse/pull/61013) ([vdimir](https://github.com/vdimir)).
* Update README.md [#61021](https://github.com/ClickHouse/ClickHouse/pull/61021) ([Tyler Hannan](https://github.com/tylerhannan)).
* Fix TableFunctionExecutable::skipAnalysisForArguments [#61037](https://github.com/ClickHouse/ClickHouse/pull/61037) ([Dmitry Novik](https://github.com/novikd)).
* Fix: parallel replicas with PREWHERE (ubsan) [#61052](https://github.com/ClickHouse/ClickHouse/pull/61052) ([Igor Nikonov](https://github.com/devcrafter)).
* Fast fix tests. [#61056](https://github.com/ClickHouse/ClickHouse/pull/61056) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `test_placement_info` [#61057](https://github.com/ClickHouse/ClickHouse/pull/61057) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix: parallel replicas with CTEs, crash in EXPLAIN SYNTAX with analyzer [#61059](https://github.com/ClickHouse/ClickHouse/pull/61059) ([Igor Nikonov](https://github.com/devcrafter)).
* Debug fuzzer failures [#61062](https://github.com/ClickHouse/ClickHouse/pull/61062) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add regression tests for fixed issues [#61076](https://github.com/ClickHouse/ClickHouse/pull/61076) ([Antonio Andelic](https://github.com/antonio2368)).
* Analyzer: Fix 01244_optimize_distributed_group_by_sharding_key [#61089](https://github.com/ClickHouse/ClickHouse/pull/61089) ([Dmitry Novik](https://github.com/novikd)).
* Use global scalars cache with analyzer [#61104](https://github.com/ClickHouse/ClickHouse/pull/61104) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix removing is_active node after re-creation [#61105](https://github.com/ClickHouse/ClickHouse/pull/61105) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update 02962_system_sync_replica_lightweight_from_modifier.sh [#61110](https://github.com/ClickHouse/ClickHouse/pull/61110) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Simplify bridges [#61118](https://github.com/ClickHouse/ClickHouse/pull/61118) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update cppkafka to v0.4.1 [#61119](https://github.com/ClickHouse/ClickHouse/pull/61119) ([Ilya Golshtein](https://github.com/ilejn)).
* CI: add wf class in ci_config [#61122](https://github.com/ClickHouse/ClickHouse/pull/61122) ([Max K.](https://github.com/maxknv)).
* QueryFuzzer: replace element randomly when AST part buffer is full [#61124](https://github.com/ClickHouse/ClickHouse/pull/61124) ([Tomer Shafir](https://github.com/tomershafir)).
* CI: make style check fast [#61125](https://github.com/ClickHouse/ClickHouse/pull/61125) ([Max K.](https://github.com/maxknv)).
* Better gitignore [#61128](https://github.com/ClickHouse/ClickHouse/pull/61128) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix something strange [#61129](https://github.com/ClickHouse/ClickHouse/pull/61129) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update check-large-objects.sh to be language neutral [#61130](https://github.com/ClickHouse/ClickHouse/pull/61130) ([Dan Wu](https://github.com/wudanzy)).
* Throw memory limit exceptions to avoid OOM in some places [#61132](https://github.com/ClickHouse/ClickHouse/pull/61132) ([alesapin](https://github.com/alesapin)).
* Fix test_distributed_directory_monitor_split_batch_on_failure flakiness [#61136](https://github.com/ClickHouse/ClickHouse/pull/61136) ([Azat Khuzhin](https://github.com/azat)).
* Fix llvm symbolizer on CI [#61147](https://github.com/ClickHouse/ClickHouse/pull/61147) ([Azat Khuzhin](https://github.com/azat)).
* Some clang-tidy fixes [#61150](https://github.com/ClickHouse/ClickHouse/pull/61150) ([Robert Schulze](https://github.com/rschu1ze)).
* Revive "Less contention in the cache, part 2" [#61152](https://github.com/ClickHouse/ClickHouse/pull/61152) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Re-enable `black` [#61159](https://github.com/ClickHouse/ClickHouse/pull/61159) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* CI: fix nightly job issue [#61160](https://github.com/ClickHouse/ClickHouse/pull/61160) ([Max K.](https://github.com/maxknv)).
* Split `RangeHashedDictionary` [#61162](https://github.com/ClickHouse/ClickHouse/pull/61162) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Remove a few templates from Aggregator.cpp [#61171](https://github.com/ClickHouse/ClickHouse/pull/61171) ([Raúl Marín](https://github.com/Algunenano)).
* Avoid some logical errors in experimental Object type [#61173](https://github.com/ClickHouse/ClickHouse/pull/61173) ([Kruglov Pavel](https://github.com/Avogar)).
* Update ReadSettings.h [#61174](https://github.com/ClickHouse/ClickHouse/pull/61174) ([Kseniia Sumarokova](https://github.com/kssenii)).
* CI: ARM integration tests: disable tests with HDFS [#61182](https://github.com/ClickHouse/ClickHouse/pull/61182) ([Max K.](https://github.com/maxknv)).
* Disable sanitizers with 02784_parallel_replicas_automatic_decision_join [#61184](https://github.com/ClickHouse/ClickHouse/pull/61184) ([Raúl Marín](https://github.com/Algunenano)).
* Fix `02887_mutations_subcolumns` test flakiness [#61198](https://github.com/ClickHouse/ClickHouse/pull/61198) ([Nikita Taranov](https://github.com/nickitat)).
* Make variant tests a bit faster [#61199](https://github.com/ClickHouse/ClickHouse/pull/61199) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix strange log message [#61206](https://github.com/ClickHouse/ClickHouse/pull/61206) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix 01603_insert_select_too_many_parts flakiness [#61218](https://github.com/ClickHouse/ClickHouse/pull/61218) ([Azat Khuzhin](https://github.com/azat)).
* Make every style-checker runner type scale out very quickly [#61231](https://github.com/ClickHouse/ClickHouse/pull/61231) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Improve `test_failed_mutations` [#61235](https://github.com/ClickHouse/ClickHouse/pull/61235) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix `test_merge_tree_load_parts/test.py::test_merge_tree_load_parts_corrupted` [#61236](https://github.com/ClickHouse/ClickHouse/pull/61236) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix `forget_partition` test [#61237](https://github.com/ClickHouse/ClickHouse/pull/61237) ([Sergei Trifonov](https://github.com/serxa)).
* Print more info in `02572_system_logs_materialized_views_ignore_errors` to debug [#61246](https://github.com/ClickHouse/ClickHouse/pull/61246) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix runtime error in AST Fuzzer [#61248](https://github.com/ClickHouse/ClickHouse/pull/61248) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Add retries to `02908_many_requests_to_system_replicas` [#61253](https://github.com/ClickHouse/ClickHouse/pull/61253) ([Nikita Taranov](https://github.com/nickitat)).
* Followup fix ASTRenameQuery::clone [#61254](https://github.com/ClickHouse/ClickHouse/pull/61254) ([vdimir](https://github.com/vdimir)).
* Disable test 02998_primary_key_skip_columns.sql in sanitizer builds as it can be slow [#61256](https://github.com/ClickHouse/ClickHouse/pull/61256) ([Kruglov Pavel](https://github.com/Avogar)).
* Update curl to a version with the data race fix [#61264](https://github.com/ClickHouse/ClickHouse/pull/61264) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix `01417_freeze_partition_verbose` [#61266](https://github.com/ClickHouse/ClickHouse/pull/61266) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Free memory earlier in inserts [#61267](https://github.com/ClickHouse/ClickHouse/pull/61267) ([Anton Popov](https://github.com/CurtizJ)).
* Fixing test_build_sets_from_multiple_threads/test.py::test_set [#61286](https://github.com/ClickHouse/ClickHouse/pull/61286) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Analyzer: Fix virtual columns in StorageMerge [#61298](https://github.com/ClickHouse/ClickHouse/pull/61298) ([Dmitry Novik](https://github.com/novikd)).
* Fix 01952_optimize_distributed_group_by_sharding_key with analyzer. [#61301](https://github.com/ClickHouse/ClickHouse/pull/61301) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix data race in Poco TCP server [#61309](https://github.com/ClickHouse/ClickHouse/pull/61309) ([Sema Checherinda](https://github.com/CheSema)).
* Don't use default cluster in test test_distibuted_settings [#61314](https://github.com/ClickHouse/ClickHouse/pull/61314) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix false positive assertion in cache [#61319](https://github.com/ClickHouse/ClickHouse/pull/61319) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix test test_input_format_parallel_parsing_memory_tracking [#61322](https://github.com/ClickHouse/ClickHouse/pull/61322) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix 01761_cast_to_enum_nullable with analyzer. [#61323](https://github.com/ClickHouse/ClickHouse/pull/61323) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add zookeeper retries for exists check in forcefullyRemoveBrokenOutdatedPartFromZooKeeper [#61324](https://github.com/ClickHouse/ClickHouse/pull/61324) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Minor changes in stress and fuzzer reports [#61333](https://github.com/ClickHouse/ClickHouse/pull/61333) ([Raúl Marín](https://github.com/Algunenano)).
* Un-flake `test_undrop_query` [#61348](https://github.com/ClickHouse/ClickHouse/pull/61348) ([Robert Schulze](https://github.com/rschu1ze)).
* Tiny improvement for replication.lib [#61361](https://github.com/ClickHouse/ClickHouse/pull/61361) ([alesapin](https://github.com/alesapin)).
* Fix bugfix check (due to "unknown object storage type: azure") [#61363](https://github.com/ClickHouse/ClickHouse/pull/61363) ([Azat Khuzhin](https://github.com/azat)).
* Fix `01599_multiline_input_and_singleline_comments` 3 minute wait [#61371](https://github.com/ClickHouse/ClickHouse/pull/61371) ([Sergei Trifonov](https://github.com/serxa)).
* Terminate EC2 on spot event if runner isn't running [#61377](https://github.com/ClickHouse/ClickHouse/pull/61377) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Try to fix docs check [#61378](https://github.com/ClickHouse/ClickHouse/pull/61378) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix `heap-use-after-free` for Merge table with alias [#61380](https://github.com/ClickHouse/ClickHouse/pull/61380) ([Antonio Andelic](https://github.com/antonio2368)).
* Disable `optimize_rewrite_sum_if_to_count_if` if return type is nullable (new analyzer) [#61389](https://github.com/ClickHouse/ClickHouse/pull/61389) ([Antonio Andelic](https://github.com/antonio2368)).
* Analyzer: Fix planner context for subquery in StorageMerge [#61392](https://github.com/ClickHouse/ClickHouse/pull/61392) ([Dmitry Novik](https://github.com/novikd)).
* Fix `test_failed_async_inserts` [#61394](https://github.com/ClickHouse/ClickHouse/pull/61394) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix test test_system_clusters_actual_information flakiness [#61395](https://github.com/ClickHouse/ClickHouse/pull/61395) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove default cluster from default config from test config [#61396](https://github.com/ClickHouse/ClickHouse/pull/61396) ([Raúl Marín](https://github.com/Algunenano)).
* Enable clang-tidy in headers [#61406](https://github.com/ClickHouse/ClickHouse/pull/61406) ([Robert Schulze](https://github.com/rschu1ze)).
* Add sanity check for poll_max_batch_size FileLog setting [#61408](https://github.com/ClickHouse/ClickHouse/pull/61408) ([Kruglov Pavel](https://github.com/Avogar)).
* ThreadFuzzer: randomize sleep time [#61410](https://github.com/ClickHouse/ClickHouse/pull/61410) ([Tomer Shafir](https://github.com/tomershafir)).
* Update version_date.tsv and changelogs after v23.8.11.28-lts [#61416](https://github.com/ClickHouse/ClickHouse/pull/61416) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.3.21.26-lts [#61418](https://github.com/ClickHouse/ClickHouse/pull/61418) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.1.7.18-stable [#61419](https://github.com/ClickHouse/ClickHouse/pull/61419) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.2.2.71-stable [#61420](https://github.com/ClickHouse/ClickHouse/pull/61420) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.12.5.81-stable [#61421](https://github.com/ClickHouse/ClickHouse/pull/61421) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Restore automerge for approved PRs [#61433](https://github.com/ClickHouse/ClickHouse/pull/61433) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Disable broken SonarCloud [#61434](https://github.com/ClickHouse/ClickHouse/pull/61434) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix `01599_multiline_input_and_singleline_comments` properly [#61440](https://github.com/ClickHouse/ClickHouse/pull/61440) ([Sergei Trifonov](https://github.com/serxa)).
* Convert test 02998_system_dns_cache_table to smoke and mirrors [#61443](https://github.com/ClickHouse/ClickHouse/pull/61443) ([vdimir](https://github.com/vdimir)).
* Check boundaries for some settings in parallel replicas [#61455](https://github.com/ClickHouse/ClickHouse/pull/61455) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use SHARD_LOAD_QUEUE_BACKLOG for dictionaries in tests [#61462](https://github.com/ClickHouse/ClickHouse/pull/61462) ([vdimir](https://github.com/vdimir)).
* Split `02125_lz4_compression_bug` [#61465](https://github.com/ClickHouse/ClickHouse/pull/61465) ([Antonio Andelic](https://github.com/antonio2368)).
* Correctly process last stacktrace in `postprocess-traces.pl` [#61470](https://github.com/ClickHouse/ClickHouse/pull/61470) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix test `test_polymorphic_parts` [#61477](https://github.com/ClickHouse/ClickHouse/pull/61477) ([Anton Popov](https://github.com/CurtizJ)).
* A definitive guide to CAST [#61491](https://github.com/ClickHouse/ClickHouse/pull/61491) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Minor rename in FileCache [#61494](https://github.com/ClickHouse/ClickHouse/pull/61494) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Remove useless code [#61498](https://github.com/ClickHouse/ClickHouse/pull/61498) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix fuzzers [#61499](https://github.com/ClickHouse/ClickHouse/pull/61499) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update jdbc.md [#61506](https://github.com/ClickHouse/ClickHouse/pull/61506) ([San](https://github.com/santrancisco)).
* Fix error in clickhouse-client [#61507](https://github.com/ClickHouse/ClickHouse/pull/61507) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix clang-tidy build [#61519](https://github.com/ClickHouse/ClickHouse/pull/61519) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix infinite loop in function `hop` [#61523](https://github.com/ClickHouse/ClickHouse/pull/61523) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve tests 00159_parallel_formatting_* to avoid timeouts [#61532](https://github.com/ClickHouse/ClickHouse/pull/61532) ([Kruglov Pavel](https://github.com/Avogar)).
* Refactoring of reading from compact parts [#61535](https://github.com/ClickHouse/ClickHouse/pull/61535) ([Anton Popov](https://github.com/CurtizJ)).
* Don't run 01459_manual_write_to_replicas in debug build as it's too slow [#61538](https://github.com/ClickHouse/ClickHouse/pull/61538) ([Kruglov Pavel](https://github.com/Avogar)).
* CI: ARM integration test - skip hdfs, kerberos, kafka [#61542](https://github.com/ClickHouse/ClickHouse/pull/61542) ([Max K.](https://github.com/maxknv)).
* More logging for loading of tables [#61546](https://github.com/ClickHouse/ClickHouse/pull/61546) ([Sergei Trifonov](https://github.com/serxa)).
* Fixing 01584_distributed_buffer_cannot_find_column with analyzer. [#61550](https://github.com/ClickHouse/ClickHouse/pull/61550) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Wait for mutation to finish, with more logs and asserts [#61554](https://github.com/ClickHouse/ClickHouse/pull/61554) ([alesapin](https://github.com/alesapin)).
* Fix read_rows count with external group by [#61555](https://github.com/ClickHouse/ClickHouse/pull/61555) ([Alexander Tokmakov](https://github.com/tavplubix)).
* `--queries-file` should be used to specify a file [#61557](https://github.com/ClickHouse/ClickHouse/pull/61557) ([danila-ermakov](https://github.com/danila-ermakov)).
* Fix `02481_async_insert_dedup_token` [#61568](https://github.com/ClickHouse/ClickHouse/pull/61568) ([Antonio Andelic](https://github.com/antonio2368)).
* Add a comment after [#61458](https://github.com/ClickHouse/ClickHouse/issues/61458) [#61580](https://github.com/ClickHouse/ClickHouse/pull/61580) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix clickhouse-test client option and CLICKHOUSE_URL_PARAMS interference [#61596](https://github.com/ClickHouse/ClickHouse/pull/61596) ([vdimir](https://github.com/vdimir)).
* CI: remove compose files from integration test docker [#61597](https://github.com/ClickHouse/ClickHouse/pull/61597) ([Max K.](https://github.com/maxknv)).
* Fix 01244_optimize_distributed_group_by_sharding_key by ordering output [#61602](https://github.com/ClickHouse/ClickHouse/pull/61602) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove some tests from analyzer_tech_debt [#61603](https://github.com/ClickHouse/ClickHouse/pull/61603) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Reduce header dependencies [#61604](https://github.com/ClickHouse/ClickHouse/pull/61604) ([Raúl Marín](https://github.com/Algunenano)).
* Remove some magic_enum from headers [#61606](https://github.com/ClickHouse/ClickHouse/pull/61606) ([Raúl Marín](https://github.com/Algunenano)).
* Fix configs for upgrade and bugfix [#61607](https://github.com/ClickHouse/ClickHouse/pull/61607) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add tests for multiple fuzzer issues [#61614](https://github.com/ClickHouse/ClickHouse/pull/61614) ([Raúl Marín](https://github.com/Algunenano)).
* Try to fix `02908_many_requests_to_system_replicas` again [#61616](https://github.com/ClickHouse/ClickHouse/pull/61616) ([Nikita Taranov](https://github.com/nickitat)).
* Verbose error message about analyzer_compatibility_join_using_top_level_identifier [#61631](https://github.com/ClickHouse/ClickHouse/pull/61631) ([vdimir](https://github.com/vdimir)).
* Fix 00223_shard_distributed_aggregation_memory_efficient with analyzer [#61649](https://github.com/ClickHouse/ClickHouse/pull/61649) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Better fuzzer logs [#61650](https://github.com/ClickHouse/ClickHouse/pull/61650) ([Raúl Marín](https://github.com/Algunenano)).
* Fix flaky `02122_parallel_formatting_Template` [#61651](https://github.com/ClickHouse/ClickHouse/pull/61651) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix Aggregator when data is empty [#61654](https://github.com/ClickHouse/ClickHouse/pull/61654) ([Antonio Andelic](https://github.com/antonio2368)).
* Restore poco SUN files [#61655](https://github.com/ClickHouse/ClickHouse/pull/61655) ([Andy Fiddaman](https://github.com/citrus-it)).
* Another fix for `SumIfToCountIfPass` [#61656](https://github.com/ClickHouse/ClickHouse/pull/61656) ([Antonio Andelic](https://github.com/antonio2368)).
* Keeper: fix data race during snapshot destructor call [#61657](https://github.com/ClickHouse/ClickHouse/pull/61657) ([Antonio Andelic](https://github.com/antonio2368)).
* CI: integration tests: use runner as py module [#61658](https://github.com/ClickHouse/ClickHouse/pull/61658) ([Max K.](https://github.com/maxknv)).
* Fix logging of autoscaling lambda, add test for effective_capacity [#61662](https://github.com/ClickHouse/ClickHouse/pull/61662) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Small change in `DatabaseOnDisk::iterateMetadataFiles()` [#61664](https://github.com/ClickHouse/ClickHouse/pull/61664) ([Nikita Taranov](https://github.com/nickitat)).
* Build improvements: remove magic_enum from headers and apply some explicit template instantiations [#61665](https://github.com/ClickHouse/ClickHouse/pull/61665) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Update the dictionary for OSSFuzz [#61672](https://github.com/ClickHouse/ClickHouse/pull/61672) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Inhibit randomization in some tests and exclude some long tests from debug runs [#61676](https://github.com/ClickHouse/ClickHouse/pull/61676) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test for [#61669](https://github.com/ClickHouse/ClickHouse/issues/61669) [#61678](https://github.com/ClickHouse/ClickHouse/pull/61678) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix use-of-uninitialized-value in HedgedConnections [#61679](https://github.com/ClickHouse/ClickHouse/pull/61679) ([Nikolay Degterinsky](https://github.com/evillique)).
* Remove clickhouse-diagnostics from the package [#61681](https://github.com/ClickHouse/ClickHouse/pull/61681) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix use-of-uninitialized-value in parseDateTimeBestEffort [#61694](https://github.com/ClickHouse/ClickHouse/pull/61694) ([Nikolay Degterinsky](https://github.com/evillique)).
* poco foundation: add illumos support [#61701](https://github.com/ClickHouse/ClickHouse/pull/61701) ([Andy Fiddaman](https://github.com/citrus-it)).
* contrib/c-ares: add illumos as a platform [#61702](https://github.com/ClickHouse/ClickHouse/pull/61702) ([Andy Fiddaman](https://github.com/citrus-it)).
* contrib/curl: Add illumos support [#61704](https://github.com/ClickHouse/ClickHouse/pull/61704) ([Andy Fiddaman](https://github.com/citrus-it)).
* Fuzzer: Try a different way to wait for the server [#61706](https://github.com/ClickHouse/ClickHouse/pull/61706) ([Raúl Marín](https://github.com/Algunenano)).
* Disable some tests for SMT [#61708](https://github.com/ClickHouse/ClickHouse/pull/61708) ([Raúl Marín](https://github.com/Algunenano)).
* Fix signal handler for sanitizer signals [#61709](https://github.com/ClickHouse/ClickHouse/pull/61709) ([Antonio Andelic](https://github.com/antonio2368)).
* Avoid `IsADirectoryError: Is a directory contrib/azure` [#61710](https://github.com/ClickHouse/ClickHouse/pull/61710) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Analyzer: fix group_by_use_nulls [#61717](https://github.com/ClickHouse/ClickHouse/pull/61717) ([Dmitry Novik](https://github.com/novikd)).
* Analyzer: Clear list of broken integration tests [#61718](https://github.com/ClickHouse/ClickHouse/pull/61718) ([Dmitry Novik](https://github.com/novikd)).
* CI: modify CI from PR body [#61725](https://github.com/ClickHouse/ClickHouse/pull/61725) ([Max K.](https://github.com/maxknv)).
* Add test for [#57820](https://github.com/ClickHouse/ClickHouse/issues/57820) [#61726](https://github.com/ClickHouse/ClickHouse/pull/61726) ([Dmitry Novik](https://github.com/novikd)).
* Revert "Revert "Un-flake test_undrop_query"" [#61727](https://github.com/ClickHouse/ClickHouse/pull/61727) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* FunctionsConversion: Start simplifying templates [#61733](https://github.com/ClickHouse/ClickHouse/pull/61733) ([Raúl Marín](https://github.com/Algunenano)).
* CI: modify it [#61735](https://github.com/ClickHouse/ClickHouse/pull/61735) ([Max K.](https://github.com/maxknv)).
* Fix segfault in SquashingTransform [#61736](https://github.com/ClickHouse/ClickHouse/pull/61736) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix DWARF format failing to skip DW_FORM_strx3 attributes [#61737](https://github.com/ClickHouse/ClickHouse/pull/61737) ([Michael Kolupaev](https://github.com/al13n321)).
* There is no such thing as broken tests [#61739](https://github.com/ClickHouse/ClickHouse/pull/61739) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Process removed files, decouple _check_mime [#61751](https://github.com/ClickHouse/ClickHouse/pull/61751) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Keeper fix: destroy `KeeperDispatcher` first [#61752](https://github.com/ClickHouse/ClickHouse/pull/61752) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix flaky `03014_async_with_dedup_part_log_rmt` [#61757](https://github.com/ClickHouse/ClickHouse/pull/61757) ([Antonio Andelic](https://github.com/antonio2368)).
* FunctionsConversion: Remove another batch of bad templates [#61773](https://github.com/ClickHouse/ClickHouse/pull/61773) ([Raúl Marín](https://github.com/Algunenano)).
* Revert "Fix bug when reading system.parts using UUID (issue 61220)." [#61774](https://github.com/ClickHouse/ClickHouse/pull/61774) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* CI: disable grpc tests on ARM [#61778](https://github.com/ClickHouse/ClickHouse/pull/61778) ([Max K.](https://github.com/maxknv)).
* Fix more tests with virtual columns in StorageMerge. [#61787](https://github.com/ClickHouse/ClickHouse/pull/61787) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Remove already not flaky tests with analyzer. [#61788](https://github.com/ClickHouse/ClickHouse/pull/61788) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Analyzer: Fix assert in JOIN with Distributed table [#61789](https://github.com/ClickHouse/ClickHouse/pull/61789) ([vdimir](https://github.com/vdimir)).
* A test can be slow in debug build [#61796](https://github.com/ClickHouse/ClickHouse/pull/61796) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Updated clang-19 to master. [#61798](https://github.com/ClickHouse/ClickHouse/pull/61798) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test "00002_log_and_exception_messages_formatting" [#61821](https://github.com/ClickHouse/ClickHouse/pull/61821) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* A test is too slow for debug [#61822](https://github.com/ClickHouse/ClickHouse/pull/61822) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove DataStreams [#61824](https://github.com/ClickHouse/ClickHouse/pull/61824) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better message for logging errors [#61827](https://github.com/ClickHouse/ClickHouse/pull/61827) ([Azat Khuzhin](https://github.com/azat)).
* Fix sanitizers suppressions [#61828](https://github.com/ClickHouse/ClickHouse/pull/61828) ([Azat Khuzhin](https://github.com/azat)).
* Remove unused code [#61830](https://github.com/ClickHouse/ClickHouse/pull/61830) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove DataStreams (2) [#61831](https://github.com/ClickHouse/ClickHouse/pull/61831) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update xxhash to v0.8.2 [#61838](https://github.com/ClickHouse/ClickHouse/pull/61838) ([Shubham Ranjan](https://github.com/shubhamranjan)).
* Fix: DISTINCT in subquery with analyzer [#61847](https://github.com/ClickHouse/ClickHouse/pull/61847) ([Igor Nikonov](https://github.com/devcrafter)).
* Analyzer: fix limit/offset on shards [#61849](https://github.com/ClickHouse/ClickHouse/pull/61849) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Remove PoolBase::AllocateNewBypassingPool [#61866](https://github.com/ClickHouse/ClickHouse/pull/61866) ([Azat Khuzhin](https://github.com/azat)).
* Try to fix 02901_parallel_replicas_rollup with analyzer. [#61875](https://github.com/ClickHouse/ClickHouse/pull/61875) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add test for [#57808](https://github.com/ClickHouse/ClickHouse/issues/57808) [#61879](https://github.com/ClickHouse/ClickHouse/pull/61879) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* CI: merge queue support [#61881](https://github.com/ClickHouse/ClickHouse/pull/61881) ([Max K.](https://github.com/maxknv)).
* Update create.sql [#61885](https://github.com/ClickHouse/ClickHouse/pull/61885) ([Kseniia Sumarokova](https://github.com/kssenii)).
* No smaller unit in `date_trunc` [#61888](https://github.com/ClickHouse/ClickHouse/pull/61888) ([jsc0218](https://github.com/jsc0218)).
* Move KQL trash where it is supposed to be [#61903](https://github.com/ClickHouse/ClickHouse/pull/61903) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Changelog for 24.3 [#61909](https://github.com/ClickHouse/ClickHouse/pull/61909) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v23.3.22.3-lts [#61914](https://github.com/ClickHouse/ClickHouse/pull/61914) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.8.12.13-lts [#61915](https://github.com/ClickHouse/ClickHouse/pull/61915) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* No "please" [#61916](https://github.com/ClickHouse/ClickHouse/pull/61916) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v23.12.6.19-stable [#61917](https://github.com/ClickHouse/ClickHouse/pull/61917) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.1.8.22-stable [#61918](https://github.com/ClickHouse/ClickHouse/pull/61918) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix flaky test_broken_projections/test.py::test_broken_ignored_replic… [#61932](https://github.com/ClickHouse/ClickHouse/pull/61932) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Check if Rust is available for the build; if not, suggest a way to disable Rust support [#61938](https://github.com/ClickHouse/ClickHouse/pull/61938) ([Azat Khuzhin](https://github.com/azat)).
* CI: new CI menu in PR body [#61948](https://github.com/ClickHouse/ClickHouse/pull/61948) ([Max K.](https://github.com/maxknv)).
* Remove flaky test `01193_metadata_loading` [#61961](https://github.com/ClickHouse/ClickHouse/pull/61961) ([Nikita Taranov](https://github.com/nickitat)).
#### Packaging Improvement
* Add the `--now` option to enable and start the service automatically when installing the database server. [#60656](https://github.com/ClickHouse/ClickHouse/pull/60656) ([Chun-Sheng, Li](https://github.com/peter279k)).
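On systemd-based systems the effect of `--now` corresponds to the following (a sketch; the actual package scripts may differ):

```bash
# Enable the service at boot and start it immediately
$ sudo systemctl enable --now clickhouse-server
```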


@ -1270,12 +1270,13 @@ SELECT * FROM json_each_row_nested
- [input_format_json_read_arrays_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_arrays_as_strings) - allow to parse JSON arrays as strings in JSON input formats. Default value - `true`.
- [input_format_json_read_objects_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_objects_as_strings) - allow to parse JSON objects as strings in JSON input formats. Default value - `true`.
- [input_format_json_named_tuples_as_objects](/docs/en/operations/settings/settings-formats.md/#input_format_json_named_tuples_as_objects) - parse named tuple columns as JSON objects. Default value - `true`.
- [input_format_json_try_infer_numbers_from_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_try_infer_numbers_from_strings) - try to infer numbers from string fields during schema inference. Default value - `false`.
- [input_format_json_try_infer_named_tuples_from_objects](/docs/en/operations/settings/settings-formats.md/#input_format_json_try_infer_named_tuples_from_objects) - try to infer named tuple from JSON objects during schema inference. Default value - `true`.
- [input_format_json_infer_incomplete_types_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_infer_incomplete_types_as_strings) - use type String for keys that contain only Nulls or empty objects/arrays during schema inference in JSON input formats. Default value - `true`.
- [input_format_json_defaults_for_missing_elements_in_named_tuple](/docs/en/operations/settings/settings-formats.md/#input_format_json_defaults_for_missing_elements_in_named_tuple) - insert default values for missing elements in JSON object while parsing named tuple. Default value - `true`.
- [input_format_json_ignore_unknown_keys_in_named_tuple](/docs/en/operations/settings/settings-formats.md/#input_format_json_ignore_unknown_keys_in_named_tuple) - Ignore unknown keys in json object for named tuples. Default value - `false`.
- [input_format_json_ignore_unknown_keys_in_named_tuple](/docs/en/operations/settings/settings-formats.md/#input_format_json_ignore_unknown_keys_in_named_tuple) - ignore unknown keys in json object for named tuples. Default value - `false`.
- [input_format_json_compact_allow_variable_number_of_columns](/docs/en/operations/settings/settings-formats.md/#input_format_json_compact_allow_variable_number_of_columns) - allow variable number of columns in JSONCompact/JSONCompactEachRow format, ignore extra columns and use default values on missing columns. Default value - `false`.
- [input_format_json_throw_on_bad_escape_sequence](/docs/en/operations/settings/settings-formats.md/#input_format_json_throw_on_bad_escape_sequence) - throw an exception if JSON string contains bad escape sequence. If disabled, bad escape sequences will remain as is in the data. Default value - `true`.
- [output_format_json_quote_64bit_integers](/docs/en/operations/settings/settings-formats.md/#output_format_json_quote_64bit_integers) - controls quoting of 64-bit integers in JSON output format. Default value - `true`.
- [output_format_json_quote_64bit_floats](/docs/en/operations/settings/settings-formats.md/#output_format_json_quote_64bit_floats) - controls quoting of 64-bit floats in JSON output format. Default value - `false`.
- [output_format_json_quote_denormals](/docs/en/operations/settings/settings-formats.md/#output_format_json_quote_denormals) - enables '+nan', '-nan', '+inf', '-inf' outputs in JSON output format. Default value - `false`.
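As a hedged illustration of the named-tuple inference setting referenced above (the inline JSON document is only an example):

```sql
SET input_format_json_try_infer_named_tuples_from_objects = 1;
DESC format(JSONEachRow, '{"obj" : {"a" : 42, "b" : "Hello"}}'); -- obj is inferred as a named tuple
```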
@ -2356,7 +2357,7 @@ You can select data from a ClickHouse table and save them into some file in the
$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Arrow" > {filename.arrow}
```
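From within `clickhouse-client`, a hedged equivalent uses `INTO OUTFILE` (the table name `t` is hypothetical):

```sql
SELECT * FROM t INTO OUTFILE 'filename.arrow' FORMAT Arrow;
```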
### Arrow format settings {#parquet-format-settings}
### Arrow format settings {#arrow-format-settings}
- [output_format_arrow_low_cardinality_as_dictionary](/docs/en/operations/settings/settings-formats.md/#output_format_arrow_low_cardinality_as_dictionary) - enable output ClickHouse LowCardinality type as Dictionary Arrow type. Default value - `false`.
- [output_format_arrow_use_64_bit_indexes_for_dictionary](/docs/en/operations/settings/settings-formats.md/#output_format_arrow_use_64_bit_indexes_for_dictionary) - use 64-bit integer type for Dictionary indexes. Default value - `false`.

View File

@ -945,9 +945,9 @@ Hard limit is configured via system tools
## database_atomic_delay_before_drop_table_sec {#database_atomic_delay_before_drop_table_sec}
Sets the delay before remove table data in seconds. If the query has `SYNC` modifier, this setting is ignored.
The delay during which a dropped table can be restored using the [UNDROP](/docs/en/sql-reference/statements/undrop.md) statement. If `DROP TABLE` ran with a `SYNC` modifier, the setting is ignored.
Default value: `480` (8 minute).
Default value: `480` (8 minutes).
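As a hedged sketch of the recovery window this delay provides (table name hypothetical):

```sql
DROP TABLE t;   -- data is kept for database_atomic_delay_before_drop_table_sec seconds
UNDROP TABLE t; -- succeeds if run within that window
```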
## database_catalog_unused_dir_hide_timeout_sec {#database_catalog_unused_dir_hide_timeout_sec}
@ -1354,6 +1354,7 @@ Keys:
- `count` The number of archived log files that ClickHouse stores.
- `console` Send `log` and `errorlog` to the console instead of file. To enable, set to `1` or `true`.
- `stream_compress` Compress `log` and `errorlog` with `lz4` stream compression. To enable, set to `1` or `true`.
- `formatting` Specify the log format to be printed to the console log (currently only `json` is supported).
Both log and error log file names (only file names, not directories) support date and time format specifiers.
@ -1422,6 +1423,8 @@ Writing to the console can be configured. Config example:
</logger>
```
### syslog
Writing to the syslog is also supported. Config example:
``` xml
@ -1445,6 +1448,52 @@ Keys for syslog:
Default value: `LOG_USER` if `address` is specified, `LOG_DAEMON` otherwise.
- `format` Message format. Possible values: `bsd` and `syslog`.
### Log formats
You can specify the log format that will be output in the console log. Currently, only JSON is supported. Here is an example of an output JSON log:
```json
{
"date_time": "1650918987.180175",
"thread_name": "#1",
"thread_id": "254545",
"level": "Trace",
"query_id": "",
"logger_name": "BaseDaemon",
"message": "Received signal 2",
"source_file": "../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()",
"source_line": "192"
}
```
To enable JSON logging support, use the following snippet:
```xml
<logger>
<formatting>
<type>json</type>
<names>
<date_time>date_time</date_time>
<thread_name>thread_name</thread_name>
<thread_id>thread_id</thread_id>
<level>level</level>
<query_id>query_id</query_id>
<logger_name>logger_name</logger_name>
<message>message</message>
<source_file>source_file</source_file>
<source_line>source_line</source_line>
</names>
</formatting>
</logger>
```
**Renaming keys for JSON logs**
Key names can be modified by changing tag values inside the `<names>` tag. For example, to change `DATE_TIME` to `MY_DATE_TIME`, you can use `<date_time>MY_DATE_TIME</date_time>`.
**Omitting keys for JSON logs**
Log properties can be omitted by commenting out the property. For example, if you do not want your log to print `query_id`, you can comment out the `<query_id>` tag.
## send_crash_reports {#send_crash_reports}
Settings for opt-in sending crash reports to the ClickHouse core developers team via [Sentry](https://sentry.io).

View File

@ -651,6 +651,12 @@ This setting works only when setting `input_format_json_named_tuples_as_objects`
Enabled by default.
## input_format_json_throw_on_bad_escape_sequence {#input_format_json_throw_on_bad_escape_sequence}
Throw an exception if JSON string contains bad escape sequence in JSON input formats. If disabled, bad escape sequences will remain as is in the data.
Enabled by default.
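A minimal sketch of the disabled behavior; the dollar-quoted string is used only to avoid SQL-level escape processing, and the input document is hypothetical:

```sql
SET input_format_json_throw_on_bad_escape_sequence = 0;
SELECT * FROM format(JSONEachRow, $${"s" : "\q"}$$); -- the bad escape \q is kept as-is
```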
## output_format_json_array_of_rows {#output_format_json_array_of_rows}
Enables the ability to output all rows as a JSON array in the [JSONEachRow](../../interfaces/formats.md/#jsoneachrow) format.
@ -1367,7 +1373,7 @@ Default value: `1'000'000`.
While importing data, if a column is not found in the schema, the default value is used instead of raising an error.
Disabled by default.
Enabled by default.
### input_format_parquet_skip_columns_with_unsupported_types_in_schema_inference {#input_format_parquet_skip_columns_with_unsupported_types_in_schema_inference}

View File

@ -1776,7 +1776,7 @@ Default value: 0 (no restriction).
## insert_quorum {#insert_quorum}
:::note
This setting is not applicable to SharedMergeTree, see [SharedMergeTree consistency](/docs/en/cloud/reference/shared-merge-tree/#consistency) for more information.
:::
Enables the quorum writes.
@ -1819,7 +1819,7 @@ See also:
## insert_quorum_parallel {#insert_quorum_parallel}
:::note
This setting is not applicable to SharedMergeTree, see [SharedMergeTree consistency](/docs/en/cloud/reference/shared-merge-tree/#consistency) for more information.
:::
Enables or disables parallelism for quorum `INSERT` queries. If enabled, additional `INSERT` queries can be sent while previous queries have not yet finished. If disabled, additional writes to the same table will be rejected.
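A hedged sketch of disabling the parallelism, using the settings described on this page (the table is hypothetical):

```sql
SET insert_quorum = 2, insert_quorum_parallel = 0;
INSERT INTO t VALUES (1); -- further INSERTs into t are rejected until this quorum write finishes
```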
@ -1840,7 +1840,7 @@ See also:
## select_sequential_consistency {#select_sequential_consistency}
:::note
This setting differs in behavior between SharedMergeTree and ReplicatedMergeTree, see [SharedMergeTree consistency](/docs/en/cloud/reference/shared-merge-tree/#consistency) for more information about the behavior of `select_sequential_consistency` in SharedMergeTree.
:::
Enables or disables sequential consistency for `SELECT` queries. Requires `insert_quorum_parallel` to be disabled (enabled by default).
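A hedged usage sketch (the replicated table `t` is hypothetical):

```sql
SET select_sequential_consistency = 1; -- requires insert_quorum_parallel to be disabled
SELECT count() FROM t;                 -- sees only data written with the quorum
```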
@ -2817,6 +2817,17 @@ Possible values:
Default value: 0.
## distributed_insert_skip_read_only_replicas {#distributed_insert_skip_read_only_replicas}
Enables skipping read-only replicas for INSERT queries into Distributed.
Possible values:
- 0 — INSERT works as usual; if it goes to a read-only replica, it fails.
- 1 — The initiator skips read-only replicas before sending data to the shards.
Default value: `0`
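A hedged sketch (the Distributed table `dist` is hypothetical):

```sql
SET distributed_insert_skip_read_only_replicas = 1;
INSERT INTO dist VALUES (1); -- the initiator skips read-only replicas when sending data
```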
## distributed_foreground_insert {#distributed_foreground_insert}
Enables or disables synchronous data insertion into a [Distributed](../../engines/table-engines/special/distributed.md/#distributed) table.
@ -5442,3 +5453,7 @@ Enabling this setting can lead to incorrect result as in case of evolved schema
:::
Default value: 'false'.
## allow_suspicious_primary_key {#allow_suspicious_primary_key}
Allow suspicious `PRIMARY KEY`/`ORDER BY` for MergeTree (i.e. SimpleAggregateFunction).
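A minimal sketch of what the setting permits (table name hypothetical):

```sql
SET allow_suspicious_primary_key = 1;
CREATE TABLE t_susp
(
    x UInt64,
    y SimpleAggregateFunction(max, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY y; -- ordering by a SimpleAggregateFunction column is normally rejected
```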

View File

@ -520,13 +520,13 @@ Example of configuration for versions later or equal to 22.8:
</cache>
</disks>
<policies>
<s3-cache>
<s3_cache>
<volumes>
<main>
<disk>cache</disk>
</main>
</volumes>
</s3-cache>
</s3_cache>
</policies>
</storage_configuration>
```
@ -546,13 +546,13 @@ Example of configuration for versions earlier than 22.8:
</s3>
</disks>
<policies>
<s3-cache>
<s3_cache>
<volumes>
<main>
<disk>s3</disk>
</main>
</volumes>
</s3-cache>
</s3_cache>
</policies>
</storage_configuration>
```

View File

@ -47,7 +47,7 @@ An example:
<engine>ENGINE = MergeTree PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, event_time) SETTINGS index_granularity = 1024</engine>
-->
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
<max_size_rows>1048576</max_size>
<max_size_rows>1048576</max_size_rows>
<reserved_size_rows>8192</reserved_size_rows>
<buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
<flush_on_crash>false</flush_on_crash>

View File

@ -483,7 +483,7 @@ Where:
- `r1`- the number of unique visitors who visited the site during 2020-01-01 (the `cond1` condition).
- `r2`- the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-02 (`cond1` and `cond2` conditions).
- `r3`- the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-03 (`cond1` and `cond3` conditions).
- `r3`- the number of unique visitors who visited the site during a specific time period on 2020-01-01 and 2020-01-03 (`cond1` and `cond3` conditions).
## uniqUpTo(N)(x)

View File

@ -1670,7 +1670,7 @@ Like [fromDaysSinceYearZero](#fromDaysSinceYearZero) but returns a [Date32](../.
## age
Returns the `unit` component of the difference between `startdate` and `enddate`. The difference is calculated using a precision of 1 microsecond.
Returns the `unit` component of the difference between `startdate` and `enddate`. The difference is calculated using a precision of 1 nanosecond.
E.g. the difference between `2021-12-29` and `2022-01-01` is 3 days for `day` unit, 0 months for `month` unit, 0 years for `year` unit.
For an alternative to `age`, see function `date_diff`.
@ -1686,16 +1686,17 @@ age('unit', startdate, enddate, [timezone])
- `unit` — The type of interval for result. [String](../../sql-reference/data-types/string.md).
Possible values:
- `microsecond` `microseconds` `us` `u`
- `millisecond` `milliseconds` `ms`
- `second` `seconds` `ss` `s`
- `minute` `minutes` `mi` `n`
- `hour` `hours` `hh` `h`
- `day` `days` `dd` `d`
- `week` `weeks` `wk` `ww`
- `month` `months` `mm` `m`
- `quarter` `quarters` `qq` `q`
- `year` `years` `yyyy` `yy`
- `nanosecond`, `nanoseconds`, `ns`
- `microsecond`, `microseconds`, `us`, `u`
- `millisecond`, `milliseconds`, `ms`
- `second`, `seconds`, `ss`, `s`
- `minute`, `minutes`, `mi`, `n`
- `hour`, `hours`, `hh`, `h`
- `day`, `days`, `dd`, `d`
- `week`, `weeks`, `wk`, `ww`
- `month`, `months`, `mm`, `m`
- `quarter`, `quarters`, `qq`, `q`
- `year`, `years`, `yyyy`, `yy`
- `startdate` — The first time value to subtract (the subtrahend). [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
@ -1763,16 +1764,17 @@ Aliases: `dateDiff`, `DATE_DIFF`, `timestampDiff`, `timestamp_diff`, `TIMESTAMP_
- `unit` — The type of interval for result. [String](../../sql-reference/data-types/string.md).
Possible values:
- `microsecond` `microseconds` `us` `u`
- `millisecond` `milliseconds` `ms`
- `second` `seconds` `ss` `s`
- `minute` `minutes` `mi` `n`
- `hour` `hours` `hh` `h`
- `day` `days` `dd` `d`
- `week` `weeks` `wk` `ww`
- `month` `months` `mm` `m`
- `quarter` `quarters` `qq` `q`
- `year` `years` `yyyy` `yy`
- `nanosecond`, `nanoseconds`, `ns`
- `microsecond`, `microseconds`, `us`, `u`
- `millisecond`, `milliseconds`, `ms`
- `second`, `seconds`, `ss`, `s`
- `minute`, `minutes`, `mi`, `n`
- `hour`, `hours`, `hh`, `h`
- `day`, `days`, `dd`, `d`
- `week`, `weeks`, `wk`, `ww`
- `month`, `months`, `mm`, `m`
- `quarter`, `quarters`, `qq`, `q`
- `year`, `years`, `yyyy`, `yy`
- `startdate` — The first time value to subtract (the subtrahend). [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).

View File

@ -543,12 +543,64 @@ You can get similar result by using the [ternary operator](../../sql-reference/f
Returns 1 if the Float32 or Float64 argument is NaN, otherwise it returns 0.
## hasColumnInTable(\[hostname\[, username\[, password\]\],\] database, table, column)
## hasColumnInTable
Given the database name, the table name, and the column name as constant strings, returns 1 if the given column exists, otherwise 0.
**Syntax**
```sql
hasColumnInTable([hostname[, username[, password]],] database, table, column)
```
**Parameters**
- `database` : name of the database. [String literal](../syntax#syntax-string-literal)
- `table` : name of the table. [String literal](../syntax#syntax-string-literal)
- `column` : name of the column. [String literal](../syntax#syntax-string-literal)
- `hostname` : remote server name to perform the check on. [String literal](../syntax#syntax-string-literal)
- `username` : username for remote server. [String literal](../syntax#syntax-string-literal)
- `password` : password for remote server. [String literal](../syntax#syntax-string-literal)
**Returned value**
- `1` if the given column exists.
- `0`, otherwise.
**Implementation details**
If the parameter `hostname` is given, the check is performed on a remote server.
If the table does not exist, an exception is thrown.
For elements in a nested data structure, the function checks for the existence of a column. For the nested data structure itself, the function returns 0.
**Example**
Query:
```sql
SELECT hasColumnInTable('system','metrics','metric')
```
```response
1
```
```sql
SELECT hasColumnInTable('system','metrics','non-existing_column')
```
```response
0
```
## hasThreadFuzzer
Returns whether Thread Fuzzer is effective. It can be used in tests to prevent runs from being too long.
**Syntax**
```sql
hasThreadFuzzer();
```
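A usage sketch; on a server that does not run with the Thread Fuzzer enabled this is expected to return 0:

```sql
SELECT hasThreadFuzzer();
```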
## bar
Builds a bar chart.

View File

@ -99,7 +99,7 @@ Alias: `OCTET_LENGTH`
Returns the length of a string in Unicode code points (not in bytes or characters). It assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Alias:
- `CHAR_LENGTH``
- `CHAR_LENGTH`
- `CHARACTER_LENGTH`
## leftPad

View File

@ -74,6 +74,8 @@ SELECT
position('Hello, world!', 'o', 7)
```
Result:
``` text
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│ 5 │ 9 │
@ -479,9 +481,9 @@ Alias: `haystack NOT ILIKE pattern` (operator)
## ngramDistance
Calculates the 4-gram distance between a `haystack` string and a `needle` string. For that, it counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a Float32 between 0 and 1. The smaller the result is, the more strings are similar to each other. Throws an exception if constant `needle` or `haystack` arguments are more than 32Kb in size. If any of non-constant `haystack` or `needle` arguments is more than 32Kb in size, the distance is always 1.
Calculates the 4-gram distance between a `haystack` string and a `needle` string. For this, it counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a [Float32](../../sql-reference/data-types/float.md/#float32-float64) between 0 and 1. The smaller the result is, the more similar the strings are to each other.
Functions `ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
Functions [`ngramDistanceCaseInsensitive`](#ngramdistancecaseinsensitive), [`ngramDistanceUTF8`](#ngramdistanceutf8), [`ngramDistanceCaseInsensitiveUTF8`](#ngramdistancecaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
@ -489,15 +491,170 @@ Functions `ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseIns
ngramDistance(haystack, needle)
```
**Parameters**
- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the similarity between the two strings. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
**Implementation details**
This function will throw an exception if constant `needle` or `haystack` arguments are more than 32Kb in size. If any non-constant `haystack` or `needle` arguments are more than 32Kb in size, then the distance is always 1.
**Examples**
The more similar two strings are to each other, the closer the result will be to 0 (identical).
Query:
```sql
SELECT ngramDistance('ClickHouse','ClickHouse!');
```
Result:
```response
0.06666667
```
The less similar two strings are to each other, the larger the result will be.
Query:
```sql
SELECT ngramDistance('ClickHouse','House');
```
Result:
```response
0.5555556
```
## ngramDistanceCaseInsensitive
Provides a case-insensitive variant of [ngramDistance](#ngramdistance).
**Syntax**
```sql
ngramDistanceCaseInsensitive(haystack, needle)
```
**Parameters**
- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the similarity between the two strings. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
**Examples**
With [ngramDistance](#ngramdistance) differences in case will affect the similarity value:
Query:
```sql
SELECT ngramDistance('ClickHouse','clickhouse');
```
Result:
```response
0.71428573
```
With [ngramDistanceCaseInsensitive](#ngramdistancecaseinsensitive), case is ignored, so two strings differing only in case now return a low similarity value:
Query:
```sql
SELECT ngramDistanceCaseInsensitive('ClickHouse','clickhouse');
```
Result:
```response
0
```
## ngramDistanceUTF8
Provides a UTF-8 variant of [ngramDistance](#ngramdistance). Assumes that `needle` and `haystack` strings are UTF-8 encoded strings.
**Syntax**
```sql
ngramDistanceUTF8(haystack, needle)
```
**Parameters**
- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the similarity between the two strings. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
**Example**
Query:
```sql
SELECT ngramDistanceUTF8('abcde','cde');
```
Result:
```response
0.5
```
## ngramDistanceCaseInsensitiveUTF8
Provides a case-insensitive variant of [ngramDistanceUTF8](#ngramdistanceutf8).
**Syntax**
```sql
ngramDistanceCaseInsensitiveUTF8(haystack, needle)
```
**Parameters**
- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the similarity between the two strings. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
**Example**
Query:
```sql
SELECT ngramDistanceCaseInsensitiveUTF8('abcde','CDE');
```
Result:
```response
0.5
```
## ngramSearch
Like `ngramDistance` but calculates the non-symmetric difference between a `needle` string and a `haystack` string, i.e. the number of n-grams from needle minus the common number of n-grams normalized by the number of `needle` n-grams. Returns a Float32 between 0 and 1. The bigger the result is, the more likely `needle` is in the `haystack`. This function is useful for fuzzy string search. Also see function `soundex`.
Like `ngramDistance` but calculates the non-symmetric difference between a `needle` string and a `haystack` string, i.e. the number of n-grams from the needle minus the common number of n-grams normalized by the number of `needle` n-grams. Returns a [Float32](../../sql-reference/data-types/float.md/#float32-float64) between 0 and 1. The bigger the result is, the more likely `needle` is in the `haystack`. This function is useful for fuzzy string search. Also see function [`soundex`](../../sql-reference/functions/string-functions#soundex).
Functions `ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
:::note
The UTF-8 variants use the 3-gram distance. These are not perfectly fair n-gram distances. We use 2-byte hashes to hash n-grams and then calculate the (non-)symmetric difference between these hash tables; collisions may occur. With the UTF-8 case-insensitive format we do not use a fair `tolower` function; we zero the 5th bit (starting from zero) of each codepoint byte and the first bit of the zeroth byte if there is more than one byte; this works for Latin and mostly for all Cyrillic letters.
:::
Functions [`ngramSearchCaseInsensitive`](#ngramsearchcaseinsensitive), [`ngramSearchUTF8`](#ngramsearchutf8), [`ngramSearchCaseInsensitiveUTF8`](#ngramsearchcaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
@ -505,6 +662,140 @@ The UTF-8 variants use the 3-gram distance. These are not perfectly fair n-gram
ngramSearch(haystack, needle)
```
**Parameters**
- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
**Implementation details**
:::note
The UTF-8 variants use the 3-gram distance. These are not perfectly fair n-gram distances. We use 2-byte hashes to hash n-grams and then calculate the (non-)symmetric difference between these hash tables; collisions may occur. With the UTF-8 case-insensitive format we do not use a fair `tolower` function; we zero the 5th bit (starting from zero) of each codepoint byte and the first bit of the zeroth byte if there is more than one byte; this works for Latin and mostly for all Cyrillic letters.
:::
**Example**
Query:
```sql
SELECT ngramSearch('Hello World','World Hello');
```
Result:
```response
0.5
```
## ngramSearchCaseInsensitive
Provides a case-insensitive variant of [ngramSearch](#ngramsearch).
**Syntax**
```sql
ngramSearchCaseInsensitive(haystack, needle)
```
**Parameters**
- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
The bigger the result is, the more likely `needle` is in the `haystack`.
**Example**
Query:
```sql
SELECT ngramSearchCaseInsensitive('Hello World','hello');
```
Result:
```response
1
```
## ngramSearchUTF8
Provides a UTF-8 variant of [ngramSearch](#ngramsearch) in which `needle` and `haystack` are assumed to be UTF-8 encoded strings.
**Syntax**
```sql
ngramSearchUTF8(haystack, needle)
```
**Parameters**
- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
The bigger the result is, the more likely `needle` is in the `haystack`.
**Example**
Query:
```sql
SELECT ngramSearchUTF8('абвгдеёжз', 'гдеёзд');
```
Result:
```response
0.5
```
## ngramSearchCaseInsensitiveUTF8
Provides a case-insensitive variant of [ngramSearchUTF8](#ngramsearchutf8) in which `needle` and `haystack` are assumed to be UTF-8 encoded strings.
**Syntax**
```sql
ngramSearchCaseInsensitiveUTF8(haystack, needle)
```
**Parameters**
- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)
**Returned value**
- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../../sql-reference/data-types/float.md/#float32-float64)
The bigger the result is, the more likely `needle` is in the `haystack`.
**Example**
Query:
```sql
SELECT ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз');
```
Result:
```response
0.57142854
```
## countSubstrings
Returns how often substring `needle` occurs in string `haystack`.
@ -610,7 +901,7 @@ Like `countMatches(haystack, pattern)` but matching ignores the case.
## regexpExtract
Extracts the first string in haystack that matches the regexp pattern and corresponds to the regex group index.
Extracts the first string in `haystack` that matches the regexp pattern and corresponds to the regex group index.
**Syntax**
@ -652,7 +943,7 @@ Result:
## hasSubsequence
Returns 1 if needle is a subsequence of haystack, or 0 otherwise.
Returns 1 if `needle` is a subsequence of `haystack`, or 0 otherwise.
A subsequence of a string is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.
@ -676,8 +967,10 @@ Type: `UInt8`.
**Examples**
Query:
``` sql
SELECT hasSubsequence('garbage', 'arg') ;
SELECT hasSubsequence('garbage', 'arg');
```
Result:
@ -692,10 +985,263 @@ Result:
Like [hasSubsequence](#hasSubsequence) but searches case-insensitively.
**Syntax**
``` sql
hasSubsequenceCaseInsensitive(haystack, needle)
```
**Arguments**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
**Returned values**
- 1, if needle is a subsequence of haystack.
- 0, otherwise.
Type: `UInt8`.
**Examples**
Query:
``` sql
SELECT hasSubsequenceCaseInsensitive('garbage', 'ARG');
```
Result:
``` text
┌─hasSubsequenceCaseInsensitive('garbage', 'ARG')─┐
│ 1 │
└─────────────────────────────────────────────────┘
```
## hasSubsequenceUTF8
Like [hasSubsequence](#hasSubsequence) but assumes `haystack` and `needle` are UTF-8 encoded strings.
**Syntax**
``` sql
hasSubsequenceUTF8(haystack, needle)
```
**Arguments**
- `haystack` — String in which the search is performed. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).
**Returned values**
- 1, if needle is a subsequence of haystack.
- 0, otherwise.
Type: `UInt8`.
**Examples**
Query:
``` sql
SELECT hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система');
```
Result:
``` text
┌─hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система')─┐
│ 1 │
└───────────────────────────────────────────────────────────────────────────────────────────┘
```
## hasSubsequenceCaseInsensitiveUTF8
Like [hasSubsequenceUTF8](#hasSubsequenceUTF8) but searches case-insensitively.
**Syntax**
``` sql
hasSubsequenceCaseInsensitiveUTF8(haystack, needle)
```
**Arguments**
- `haystack` — String in which the search is performed. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).
**Returned values**
- 1, if needle is a subsequence of haystack.
- 0, otherwise.
Type: `UInt8`.
**Examples**
Query:
``` sql
SELECT hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА');
```
Result:
``` text
┌─hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА')─┐
│ 1 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
## hasToken
Returns 1 if a given token is present in a haystack, or 0 otherwise.
**Syntax**
```sql
hasToken(haystack, token)
```
**Parameters**
- `haystack`: String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `token`: Maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack).
**Returned value**
- 1, if the token is present in the haystack.
- 0, if the token is not present.
**Implementation details**
Token must be a constant string. Supported by tokenbf_v1 index specialization.
**Example**
Query:
```sql
SELECT hasToken('Hello World','Hello');
```
```response
1
```
## hasTokenOrNull
Returns 1 if a given token is present, 0 if not present, and null if the token is ill-formed.
**Syntax**
```sql
hasTokenOrNull(haystack, token)
```
**Parameters**
- `haystack`: String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `token`: Maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack).
**Returned value**
- 1, if the token is present in the haystack.
- 0, if the token is not present in the haystack.
- null, if the token is ill-formed.
**Implementation details**
Token must be a constant string. Supported by tokenbf_v1 index specialization.
**Example**
Where `hasToken` would throw an error for an ill-formed token, `hasTokenOrNull` returns `null` for an ill-formed token.
Query:
```sql
SELECT hasTokenOrNull('Hello World','Hello,World');
```
```response
null
```
## hasTokenCaseInsensitive
Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case.
**Syntax**
```sql
hasTokenCaseInsensitive(haystack, token)
```
**Parameters**
- `haystack`: String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `token`: Maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack).
**Returned value**
- 1, if the token is present in the haystack.
- 0, otherwise.
**Implementation details**
Token must be a constant string. Supported by tokenbf_v1 index specialization.
**Example**
Query:
```sql
SELECT hasTokenCaseInsensitive('Hello World','hello');
```
```response
1
```
## hasTokenCaseInsensitiveOrNull
Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case and returns null if the token is ill-formed.
**Syntax**
```sql
hasTokenCaseInsensitiveOrNull(haystack, token)
```
**Parameters**
- `haystack`: String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `token`: Maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack).
**Returned value**
- 1, if the token is present in the haystack.
- 0, if the token is not present.
- null, if the token is ill-formed.
**Implementation details**
Token must be a constant string. Supported by tokenbf_v1 index specialization.
**Example**
Where `hasTokenCaseInsensitive` would throw an error for an ill-formed token, `hasTokenCaseInsensitiveOrNull` returns `null` for an ill-formed token.
Query:
```sql
SELECT hasTokenCaseInsensitiveOrNull('Hello World','hello,world');
```
```response
null
```

View File

@ -128,9 +128,9 @@ Returns the part of the domain that includes top-level subdomains up to the “f
For example:
- `cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'`.
- `cutToFirstSignificantSubdomain('www.tr') = 'www.tr'`.
- `cutToFirstSignificantSubdomain('tr') = ''`.
- `cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'`.
- `cutToFirstSignificantSubdomainWithWWW('www.tr') = 'www.tr'`.
- `cutToFirstSignificantSubdomainWithWWW('tr') = ''`.
### cutToFirstSignificantSubdomainCustom

View File

@ -56,7 +56,9 @@ Entries for finished mutations are not deleted right away (the number of preserv
For non-replicated tables, all `ALTER` queries are performed synchronously. For replicated tables, the query just adds instructions for the appropriate actions to `ZooKeeper`, and the actions themselves are performed as soon as possible. However, the query can wait for these actions to be completed on all the replicas.
For all `ALTER` queries, you can use the [alter_sync](/docs/en/operations/settings/settings.md/#alter-sync) setting to set up waiting.
For `ALTER` queries that create mutations (e.g., including but not limited to, `UPDATE`, `DELETE`, `MATERIALIZE INDEX`, `MATERIALIZE PROJECTION`, `MATERIALIZE COLUMN`, `APPLY DELETED MASK`, `CLEAR STATISTIC`, `MATERIALIZE STATISTIC`), the synchronicity is defined by the [mutations_sync](/docs/en/operations/settings/settings.md/#mutations_sync) setting.
For other `ALTER` queries which only modify the metadata, you can use the [alter_sync](/docs/en/operations/settings/settings.md/#alter-sync) setting to set up waiting.
You can specify how long (in seconds) to wait for inactive replicas to execute all `ALTER` queries with the [replication_wait_for_inactive_replica_timeout](/docs/en/operations/settings/settings.md/#replication-wait-for-inactive-replica-timeout) setting.
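A hedged sketch combining these settings (the table and column are hypothetical):

```sql
SET alter_sync = 2, replication_wait_for_inactive_replica_timeout = 300;
ALTER TABLE t ADD COLUMN c String; -- waits for all replicas, up to 300 seconds for inactive ones
```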
@ -64,8 +66,6 @@ You can specify how long (in seconds) to wait for inactive replicas to execute a
For all `ALTER` queries, if `alter_sync = 2` and some replicas are not active for more than the time, specified in the `replication_wait_for_inactive_replica_timeout` setting, then an exception `UNFINISHED` is thrown.
:::
For `ALTER TABLE ... UPDATE|DELETE|MATERIALIZE INDEX|MATERIALIZE PROJECTION|MATERIALIZE COLUMN` queries the synchronicity is defined by the [mutations_sync](/docs/en/operations/settings/settings.md/#mutations_sync) setting.
## Related content
- Blog: [Handling Updates and Deletes in ClickHouse](https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse)

View File

@ -30,9 +30,11 @@ Also see [UNDROP TABLE](/docs/en/sql-reference/statements/undrop.md)
Syntax:
``` sql
DROP [TEMPORARY] TABLE [IF EXISTS] [IF EMPTY] [db.]name [ON CLUSTER cluster] [SYNC]
DROP [TEMPORARY] TABLE [IF EXISTS] [IF EMPTY] [db1.]name_1[, [db2.]name_2, ...] [ON CLUSTER cluster] [SYNC]
```
Note that deleting multiple tables at the same time is a non-atomic deletion. If a table fails to be deleted, subsequent tables will not be deleted.
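For example, with the multi-table syntax above (table names hypothetical):

```sql
DROP TABLE IF EXISTS t1, t2 SYNC; -- if dropping t1 fails, t2 is not dropped
```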
## DROP DICTIONARY
Deletes the dictionary.

View File

@ -64,6 +64,14 @@ RELOAD FUNCTIONS [ON CLUSTER cluster_name]
RELOAD FUNCTION [ON CLUSTER cluster_name] function_name
```
## RELOAD ASYNCHRONOUS METRICS
Re-calculates all [asynchronous metrics](../../operations/system-tables/asynchronous_metrics.md). Since asynchronous metrics are periodically updated based on setting [asynchronous_metrics_update_period_s](../../operations/server-configuration-parameters/settings.md), updating them manually using this statement is typically not necessary.
```sql
RELOAD ASYNCHRONOUS METRICS [ON CLUSTER cluster_name]
```
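A hedged usage sketch issued from a client:

```sql
SYSTEM RELOAD ASYNCHRONOUS METRICS;
SELECT metric, value FROM system.asynchronous_metrics LIMIT 5; -- inspect the recalculated values
```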
## DROP DNS CACHE
Clears ClickHouse's internal DNS cache. Sometimes (for old ClickHouse versions) it is necessary to use this command when changing the infrastructure (changing the IP address of another ClickHouse server or the server used by dictionaries).

View File

@ -23,9 +23,16 @@ You can specify how long (in seconds) to wait for inactive replicas to execute `
If the `alter_sync` is set to `2` and some replicas are not active for more than the time, specified by the `replication_wait_for_inactive_replica_timeout` setting, then an exception `UNFINISHED` is thrown.
:::
## TRUNCATE ALL TABLES
``` sql
TRUNCATE ALL TABLES [IF EXISTS] db [ON CLUSTER cluster]
```
Removes all data from all tables in a database.
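For example (database name hypothetical):

```sql
TRUNCATE ALL TABLES IF EXISTS db;
```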
## TRUNCATE DATABASE
``` sql
TRUNCATE DATABASE [IF EXISTS] [db.]name [ON CLUSTER cluster]
TRUNCATE DATABASE [IF EXISTS] db [ON CLUSTER cluster]
```
Removes all tables from a database but keeps the database itself. When the clause `IF EXISTS` is omitted, the query returns an error if the database does not exist.

View File

@ -13,13 +13,6 @@ a system table called `system.dropped_tables`.
If you have a materialized view without a `TO` clause associated with the dropped table, then you will also have to UNDROP the inner table of that view.
:::note
UNDROP TABLE is experimental. To use it add this setting:
```sql
set allow_experimental_undrop_table_query = 1;
```
:::
:::tip
Also see [DROP TABLE](/docs/en/sql-reference/statements/drop.md)
:::
@ -32,60 +25,53 @@ UNDROP TABLE [db.]name [UUID '<uuid>'] [ON CLUSTER cluster]
**Example**
``` sql
set allow_experimental_undrop_table_query = 1;
```
```sql
CREATE TABLE undropMe
CREATE TABLE tab
(
`id` UInt8
)
ENGINE = MergeTree
ORDER BY id
```
ORDER BY id;
DROP TABLE tab;
```sql
DROP TABLE undropMe
```
```sql
SELECT *
FROM system.dropped_tables
FORMAT Vertical
FORMAT Vertical;
```
```response
Row 1:
──────
index: 0
database: default
table: undropMe
table: tab
uuid: aa696a1a-1d70-4e60-a841-4c80827706cc
engine: MergeTree
metadata_dropped_path: /var/lib/clickhouse/metadata_dropped/default.undropMe.aa696a1a-1d70-4e60-a841-4c80827706cc.sql
metadata_dropped_path: /var/lib/clickhouse/metadata_dropped/default.tab.aa696a1a-1d70-4e60-a841-4c80827706cc.sql
table_dropped_time: 2023-04-05 14:12:12
1 row in set. Elapsed: 0.001 sec.
```
```sql
UNDROP TABLE undropMe
```
```response
Ok.
```
```sql
UNDROP TABLE tab;
SELECT *
FROM system.dropped_tables
FORMAT Vertical
```
FORMAT Vertical;
```response
Ok.
0 rows in set. Elapsed: 0.001 sec.
```
```sql
DESCRIBE TABLE undropMe
FORMAT Vertical
DESCRIBE TABLE tab
FORMAT Vertical;
```
```response
Row 1:
──────

View File

@ -53,7 +53,7 @@ SELECT * FROM random;
└──────────────────────────────┴──────────────┴────────────────────────────────────────────────────────────────────┘
```
In combination with [generateRandomStructure](../../sql-reference/functions/other-functions.md#generateRandomStructure):
In combination with [generateRandomStructure](../../sql-reference/functions/other-functions.md#generaterandomstructure):
```sql
SELECT * FROM generateRandom(generateRandomStructure(4, 101), 101) LIMIT 3;

View File

@ -476,7 +476,7 @@ FROM
- `r1` - the number of unique visitors who visited the site during 2020-01-01 (the `cond1` condition).
- `r2` - the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-02 (`cond1` and `cond2` conditions).
- `r3` - the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-03 (`cond1` and `cond3` conditions).
- `r3` - the number of unique visitors who visited the site on 2020-01-01 and 2020-01-03 (`cond1` and `cond3` conditions).
## uniqUpTo(N)(x) {#uniquptonx}

View File

@ -627,7 +627,7 @@ SELECT toDate('2016-12-27') AS date, toYearWeek(date) AS yearWeek0, toYearWeek(d
## age
Returns the `unit` component of the difference between `startdate` and `enddate`. The difference is calculated using a precision of 1 microsecond.
Returns the `unit` component of the difference between `startdate` and `enddate`. The difference is calculated using a precision of 1 nanosecond.
For example, the difference between `2021-12-29` and `2022-01-01` is 3 days for the `day` unit, 0 months for the `month` unit, 0 years for the `year` unit.
**Syntax**
@ -641,6 +641,7 @@ age('unit', startdate, enddate, [timezone])
- `unit` — The unit of time in which the returned value is expressed. [String](../../sql-reference/data-types/string.md).
Possible values:
- `nanosecond` (possible abbreviations: `ns`)
- `microsecond` (possible abbreviations: `us`, `u`)
- `millisecond` (possible abbreviations: `ms`)
- `second` (possible abbreviations: `ss`, `s`)
@ -716,6 +717,7 @@ date_diff('unit', startdate, enddate, [timezone])
- `unit` — The unit of time in which the returned value is expressed. [String](../../sql-reference/data-types/string.md).
Possible values:
- `nanosecond` (possible abbreviations: `ns`)
- `microsecond` (possible abbreviations: `us`, `u`)
- `millisecond` (possible abbreviations: `ms`)
- `second` (possible abbreviations: `ss`, `s`)

View File

@ -1,196 +0,0 @@
---
slug: /zh/engines/database-engines/materialize-mysql
sidebar_position: 29
sidebar_label: "[experimental] MaterializedMySQL"
---
# [experimental] MaterializedMySQL {#materialized-mysql}
**This is an experimental feature that should not be used in production.**
Creates a ClickHouse database with all the tables existing in MySQL, and all the data in those tables.
The ClickHouse server works as a MySQL replica. It reads the binlog and performs DDL and DML queries.
This feature is experimental.
## Creating a Database {#creating-a-database}
``` sql
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
```
**Engine Parameters**
- `host:port` — MySQL server endpoint.
- `database` — MySQL database name.
- `user` — MySQL user.
- `password` — User password.
**Engine Settings**
- `max_rows_in_buffer` — Maximum number of rows that data is allowed to cache in memory (for a single table and for cache data that cannot be queried). When this number of rows is exceeded, the data will be materialized. Default value: `65505`.
- `max_bytes_in_buffer` — Maximum number of bytes that data is allowed to cache in memory (for a single table and for cache data that cannot be queried). When this is exceeded, the data will be materialized. Default value: `1048576`.
- `max_rows_in_buffers` — Maximum number of rows that data is allowed to cache in memory (for the database and for cache data that cannot be queried). When this number of rows is exceeded, the data will be materialized. Default value: `65505`.
- `max_bytes_in_buffers` — Maximum number of bytes that data is allowed to cache in memory (for the database and for cache data that cannot be queried). When this is exceeded, the data will be materialized. Default value: `1048576`.
- `max_flush_data_time` — Maximum number of milliseconds that data is allowed to cache in memory (for the database and for cache data that cannot be queried). When this time is exceeded, the data will be materialized. Default value: `1000`.
- `max_wait_time_when_mysql_unavailable` — Retry interval (in milliseconds) when MySQL is not available. A negative value disables retries. Default value: `1000`.
- `allows_query_when_mysql_lost` — Allows querying a materialized table when MySQL is lost. Default value: `0` (`false`).
```
CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***')
SETTINGS
allows_query_when_mysql_lost=true,
max_wait_time_when_mysql_unavailable=10000;
```
**Settings on the MySQL Server Side**
For `MaterializeMySQL` to work correctly, there are a few mandatory `MySQL`-side settings that must be set:
- `default_authentication_plugin = mysql_native_password`, because `MaterializeMySQL` can only authorize with this method.
- `gtid_mode = on`, because GTID-based logging is mandatory for providing correct `MaterializeMySQL` replication. Note that while turning this mode `On` you should also specify `enforce_gtid_consistency = on`.
## Virtual Columns {#virtual-columns}
When you work with the `MaterializeMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with the virtual `_sign` and `_version` columns.
- `_version` — Synchronization version. Type: [UInt64](../../sql-reference/data-types/int-uint.md).
- `_sign` — Deletion marker. Type: [Int8](../../sql-reference/data-types/int-uint.md). Possible values:
- `1` — the row is not deleted,
- `-1` — the row is deleted.
## Supported Data Types {#data_types-support}
| MySQL | ClickHouse |
|-------------------------|--------------------------------------------------------------|
| TINY | [Int8](../../sql-reference/data-types/int-uint.md) |
| SHORT | [Int16](../../sql-reference/data-types/int-uint.md) |
| INT24 | [Int32](../../sql-reference/data-types/int-uint.md) |
| LONG | [UInt32](../../sql-reference/data-types/int-uint.md) |
| LONGLONG | [UInt64](../../sql-reference/data-types/int-uint.md) |
| FLOAT | [Float32](../../sql-reference/data-types/float.md) |
| DOUBLE | [Float64](../../sql-reference/data-types/float.md) |
| DECIMAL, NEWDECIMAL | [Decimal](../../sql-reference/data-types/decimal.md) |
| DATE, NEWDATE | [Date](../../sql-reference/data-types/date.md) |
| DATETIME, TIMESTAMP | [DateTime](../../sql-reference/data-types/datetime.md) |
| DATETIME2, TIMESTAMP2 | [DateTime64](../../sql-reference/data-types/datetime64.md) |
| ENUM | [Enum](../../sql-reference/data-types/enum.md) |
| STRING | [String](../../sql-reference/data-types/string.md) |
| VARCHAR, VAR_STRING | [String](../../sql-reference/data-types/string.md) |
| BLOB | [String](../../sql-reference/data-types/string.md) |
| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) |
Other types are not supported. If a MySQL table contains a column of such a type, ClickHouse throws the exception "Unhandled data type" and stops replication.
[Nullable](../../sql-reference/data-types/nullable.md) is supported.
## Specifics and Recommendations {#specifics-and-recommendations}
### Compatibility Restrictions
Apart from the data type limitations, there are a few restrictions compared to `MySQL` databases that should be resolved before replication is possible:
- Each table in `MySQL` should contain a `PRIMARY KEY`.
- Replication for tables containing rows with `ENUM` field values outside the range (specified in the `ENUM` signature) will not work.
### DDL Queries {#ddl-queries}
MySQL DDL queries are converted into the corresponding ClickHouse DDL queries ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). If ClickHouse cannot parse a DDL query, the query is ignored.
### Data Replication {#data-replication}
`MaterializeMySQL` does not support direct `INSERT`, `DELETE` and `UPDATE` queries. However, they are supported in terms of data replication:
- MySQL `INSERT` queries are converted into `INSERT` queries with `_sign=1`.
- MySQL `DELETE` queries are converted into `INSERT` queries with `_sign=-1`.
- MySQL `UPDATE` queries are converted into `INSERT` queries with `_sign=-1` and `INSERT` queries with `_sign=1`.
### Selecting from MaterializeMySQL Tables {#select}
`SELECT` queries from `MaterializeMySQL` tables have some specifics:
- If `_version` is not specified in the `SELECT` query, the [FINAL](../../sql-reference/statements/select/from.md#select-from-final) modifier is used, so only rows with `MAX(_version)` are selected.
- If `_sign` is not specified in the `SELECT` query, `WHERE _sign=1` is used by default, so deleted rows are not included in the result set.
- The result includes column comments, as they exist in the SQL database tables.
### Index Conversion {#index-conversion}
MySQL `PRIMARY KEY` and `INDEX` clauses are converted into an `ORDER BY` tuple in ClickHouse tables.
ClickHouse has only one physical order, which is determined by the `ORDER BY` clause. To create a new physical order, use [materialized views](../../sql-reference/statements/create/view.md#materialized).
**Notes**
- Rows with `_sign=-1` are not deleted physically from the table.
- Cascading `UPDATE/DELETE` queries are not supported by the `MaterializeMySQL` engine.
- Replication can be easily broken.
- Manual operations on databases and tables are forbidden.
- `MaterializeMySQL` is affected by the [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert) setting. Data is merged into the corresponding table in the `MaterializeMySQL` database when a table in the MySQL server changes.
## Examples of Use {#examples-of-use}
Queries in MySQL:
``` sql
mysql> CREATE DATABASE db;
mysql> CREATE TABLE db.test (a INT PRIMARY KEY, b INT);
mysql> INSERT INTO db.test VALUES (1, 11), (2, 22);
mysql> DELETE FROM db.test WHERE a=1;
mysql> ALTER TABLE db.test ADD COLUMN c VARCHAR(16);
mysql> UPDATE db.test SET c='Wow!', b=222;
mysql> SELECT * FROM test;
```
```text
+---+------+------+
| a | b | c |
+---+------+------+
| 2 | 222 | Wow! |
+---+------+------+
```
The database in ClickHouse, exchanging data with the MySQL server:
The database and the table created:
``` sql
CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***');
SHOW TABLES FROM mysql;
```
``` text
┌─name─┐
│ test │
└──────┘
```
After inserting data:
``` sql
SELECT * FROM mysql.test;
```
``` text
┌─a─┬──b─┐
│ 1 │ 11 │
│ 2 │ 22 │
└───┴────┘
```
After deleting data, adding the column and updating:
``` sql
SELECT * FROM mysql.test;
```
``` text
┌─a─┬───b─┬─c────┐
│ 2 │ 222 │ Wow! │
└───┴─────┴──────┘
```

View File

@ -472,7 +472,7 @@ FROM
- `r1` - the number of unique visitors who visited the site during 2020-01-01 (the `cond1` condition).
- `r2` - the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-02 (`cond1` and `cond2` conditions).
- `r3` - the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-03 (`cond1` and `cond3` conditions).
- `r3` - the number of unique visitors who visited the site on 2020-01-01 and 2020-01-03 (`cond1` and `cond3` conditions).
## uniqUpTo(N)(x) {#uniquptonx}

View File

@ -643,6 +643,7 @@ date_diff('unit', startdate, enddate, [timezone])
- `unit` — The unit of time for `value`. [String](../../sql-reference/data-types/string.md).
Possible values:
- `nanosecond`
- `microsecond`
- `millisecond`
- `second`

View File

@ -1,128 +1,702 @@
---
slug: /zh/sql-reference/functions/string-search-functions
---
# String Search Functions {#zi-fu-chuan-sou-suo-han-shu}
All functions below are case-sensitive by default. Separate variants exist for case-insensitive search.
# String Search Functions
## position(haystack, needle), locate(haystack, needle) {#positionhaystack-needle-locatehaystack-needle}
All functions in this section search case-sensitively by default. Case-insensitive search is usually provided by separate function variants.
Note that case-insensitive search follows the lowercase/uppercase rules of the English language.
For example, uppercase `i` in English is `I`, whereas in Turkish it is `İ`; results for languages other than English may be unexpected.
Searches for the substring `needle` in the string `haystack`.
Returns the position of the substring (in bytes, counting from 1), or 0 if the substring was not found.
Functions in this section also assume that the searched string and the search string are single-byte encoded text (e.g. ASCII). If this assumption is violated, no exception is thrown and the result is undefined.
Search in UTF-8 encoded strings is usually provided by separate function variants. Likewise, if a UTF-8 function variant is used and the input string is not UTF-8 encoded text, no exception is thrown and the result is undefined.
Note that no automatic Unicode normalization is performed; you can use the [normalizeUTF8*()](https://clickhouse.com/docs/zh/sql-reference/functions/string-functions/) functions for that.
[String functions](string-functions.md) and [string replacement functions](string-replace-functions.md) are described separately.
For case-insensitive search, use the function `positionCaseInsensitive`.
## position
Returns the position (in bytes, counting from 1) of the substring `needle` in the string `haystack`.
## positionUTF8(haystack, needle) {#positionutf8haystack-needle}
Returns the position (in bytes, counting from 1) of the substring `needle` in the string `haystack`.
Same as `position`, but the position is returned in Unicode code points. Works under the assumption that the string is UTF-8 encoded text; if it is not, it returns some unexpected result rather than throwing an exception.
**Syntax**
For case-insensitive search, use the function `positionCaseInsensitiveUTF8`.
``` sql
position(haystack, needle[, start_pos])
```
## multiSearchAllPositions(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\]) {#multisearchallpositionshaystack-needle1-needle2-needlen}
Alias:
- `position(needle IN haystack)`
Same as `position`, but returns an array with the positions of all matched needle<sub>i</sub>.
**Parameters**
For case-insensitive search and/or search in UTF-8, use the functions `multiSearchAllPositionsCaseInsensitive`, `multiSearchAllPositionsUTF8`, `multiSearchAllPositionsCaseInsensitiveUTF8`.
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substring to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `start_pos` — Position (counting from 1) in `haystack` at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md). Optional.
## multiSearchFirstPosition(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\]) {#multisearchfirstpositionhaystack-needle1-needle2-needlen}
**Returned value**
Same as `position`, but returns the leftmost offset in `haystack` matched by any of the needle strings.
- Position (in bytes, counting from 1), if the substring was found.
- 0, if the substring was not found.
For case-insensitive search and/or search in UTF-8, use the functions `multiSearchFirstPositionCaseInsensitive`, `multiSearchFirstPositionUTF8`, `multiSearchFirstPositionCaseInsensitiveUTF8`.
If the substring `needle` is empty, then:
- if `start_pos` is not specified, return `1`
- if `start_pos = 0`, return `1`
- if `start_pos >= 1` and `start_pos <= length(haystack) + 1`, return `start_pos`
- otherwise, return `0`
## multiSearchFirstIndex(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\]) {#multisearchfirstindexhaystack-needle1-needle2-needlen}
The same rules also apply to the functions [locate](#locate), [positionCaseInsensitive](#positionCaseInsensitive), [positionUTF8](#positionUTF8), [positionCaseInsensitiveUTF8](#positionCaseInsensitiveUTF8).
Returns the index `i` (counting from 1) of the first found needle<sub>i</sub> in the string `haystack`, or 0 if nothing was found.
Type: `Integer`.
For case-insensitive search and/or search in UTF-8, use the functions `multiSearchFirstIndexCaseInsensitive`, `multiSearchFirstIndexUTF8`, `multiSearchFirstIndexCaseInsensitiveUTF8`.
**Examples**
## multiSearchAny(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\]) {#multisearchanyhaystack-needle1-needle2-needlen}
``` sql
SELECT position('Hello, world!', '!');
```
Returns 1 if at least one needle<sub>i</sub> matches in `haystack`, otherwise 0.
Result:
``` text
┌─position('Hello, world!', '!')─┐
│ 13 │
└────────────────────────────────┘
```
Example with the `start_pos` argument:
``` sql
SELECT
position('Hello, world!', 'o', 1),
position('Hello, world!', 'o', 7)
```
Result:
``` text
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│ 5 │ 9 │
└───────────────────────────────────┴───────────────────────────────────┘
```
Example with the `needle IN haystack` syntax:
```sql
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);
```
Result:
```text
┌─equals(6, position(s, '/'))─┐
│ 1 │
└─────────────────────────────┘
```
Example with an empty substring `needle`:
``` sql
SELECT
position('abc', ''),
position('abc', '', 0),
position('abc', '', 1),
position('abc', '', 2),
position('abc', '', 3),
position('abc', '', 4),
position('abc', '', 5)
```
Result:
``` text
┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│ 1 │ 1 │ 1 │ 2 │ 3 │ 4 │ 0 │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘
```
## locate
Like [position](#position) but with the `haystack` and `needle` arguments swapped.
The behavior of this function depends on the ClickHouse version:
- in versions < v24.3, `locate` was an alias of function `position`, with arguments `(haystack, needle[, start_pos])`.
- in versions >= v24.3, `locate` is an individual function (for better compatibility with MySQL), with arguments `(needle, haystack[, start_pos])`. The previous behavior
can be restored using the setting [function_locate_has_mysql_compatible_argument_order = false](../../operations/settings/settings.md#function-locate-has-mysql-compatible-argument-order).
**Syntax**
``` sql
locate(needle, haystack[, start_pos])
```
## positionCaseInsensitive
Like [position](#position) but searches case-insensitively.
## positionUTF8
Like [position](#position) but assumes `haystack` and `needle` are UTF-8 encoded strings.
**Example**
Function `positionUTF8` correctly counts the character `ö` (represented by two bytes) as a single Unicode code point:
``` sql
SELECT positionUTF8('Motörhead', 'r');
```
Result:
``` text
┌─position('Motörhead', 'r')─┐
│ 5 │
└────────────────────────────┘
```
## positionCaseInsensitiveUTF8
Like [positionUTF8](#positionutf8) but searches case-insensitively.
## multiSearchAllPositions
Like [position](#position) but returns an array of positions (in bytes, counting from 1) of multiple `needle` substrings in the string `haystack`.
For case-insensitive search and/or search in UTF-8, use the functions `multiSearchAnyCaseInsensitive`, `multiSearchAnyUTF8`, `multiSearchAnyCaseInsensitiveUTF8`.
:::note
In all `multiSearch*` functions, the number of needles must be less than 2<sup>8</sup> because of implementation specifics.
All functions starting with `multiSearch*()` support at most 2<sup>8</sup> needles.
:::
## match(haystack, pattern) {#matchhaystack-pattern}
**Syntax**
Checks whether the string matches the regular expression `pattern`. `pattern` can be an arbitrary `re2` regular expression; its [syntax](https://github.com/google/re2/wiki/Syntax) is more limited than that of Perl regular expressions.
``` sql
multiSearchAllPositions(haystack, [needle1, needle2, ..., needleN])
```
Returns 0 if it does not match, or 1 if it matches.
**Parameters**
Note that the backslash symbol (`\`) is used for escaping in the regular expression. The same symbol is used for escaping in string literals, so in order to escape a symbol in a regular expression, you must write two backslashes (\\) in a string literal.
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of substrings to be searched. [Array](../../sql-reference/data-types/array.md).
The regular expression works with the string as if it were a set of bytes. The regular expression cannot contain null bytes.
For patterns that search for substrings in a string, it is better to use LIKE or «position», since they work much faster.
**Returned value**
## multiMatchAny(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\]) {#multimatchanyhaystack-pattern1-pattern2-patternn}
- An array of positions, where each element corresponds to an element of the `needle` array. If the substring was found in `haystack`, the element is the position of the substring (in bytes, counting from 1); if the substring was not found, the element is 0.
Same as `match`, but returns 0 if none of the regular expressions match and 1 if any of the patterns match. It uses the [hyperscan](https://github.com/intel/hyperscan) library. For patterns that search for substrings in a string, it is better to use «multiSearchAny», since it works much faster.
**Example**
``` sql
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);
```
Result:
``` text
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0] │
└───────────────────────────────────────────────────────────────────┘
```
## multiSearchAllPositionsUTF8
Like [multiSearchAllPositions](#multiSearchAllPositions), but assumes `haystack` and the `needle` substrings are UTF-8 encoded strings.
## multiSearchFirstPosition
Like `position` but matches several `needle` substrings in the `haystack` string and returns the leftmost position of any match.
Functions `multiSearchFirstPositionCaseInsensitive`, `multiSearchFirstPositionUTF8` and `multiSearchFirstPositionCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
```sql
multiSearchFirstPosition(haystack, [needle1, needle2, …, needleN])
```
## multiSearchFirstIndex
Returns the index `i` (counting from 1) of the leftmost matched needle<sub>i</sub> substring in the string `haystack`, or 0 if there is no match.
Functions `multiSearchFirstIndexCaseInsensitive`, `multiSearchFirstIndexUTF8` and `multiSearchFirstIndexCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
```sql
multiSearchFirstIndex(haystack, [needle1, needle2, …, needleN])
```
## multiSearchAny {#multisearchany}
Returns 1 if at least one substring `needle` matches the string `haystack`, otherwise 0.
Functions `multiSearchAnyCaseInsensitive`, `multiSearchAnyUTF8` and `multiSearchAnyCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
```sql
multiSearchAny(haystack, [needle1, needle2, …, needleN])
```
## match {#match}
Returns whether the string `haystack` matches the regular expression `pattern`. See the [re2 syntax reference](https://github.com/google/re2/wiki/Syntax).
Matching is based on UTF-8, e.g. `.` matches the Unicode code point `¥`, which is represented by two bytes in UTF-8. The regular expression must not contain null bytes. If `haystack` or `pattern` is not valid UTF-8, the behavior is undefined.
Unlike re2's default behavior, `.` matches line breaks. To disable this, prepend the pattern with `(?-s)`.
If you only want to search for substrings, you can use the functions [like](#like) or [position](#position) instead; they work much faster than this function.
**Syntax**
```sql
match(haystack, pattern)
```
Alias: `haystack REGEXP pattern` (operator)
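A small usage sketch; the pattern is anchored at the start of the string:

```sql
SELECT match('Hello, world!', '^Hello'); -- returns 1
```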
## multiMatchAny
Like `match`, but returns 1 if at least one of the patterns pattern<sub>i</sub> matches the string `haystack`, otherwise 0.
:::note
The length of any `haystack` string must be less than 2<sup>32</sup> bytes, otherwise an exception is thrown. This restriction takes place because of the hyperscan API.
The `multi[Fuzzy]Match*()` family of functions uses the [Vectorscan](https://github.com/VectorCamp/vectorscan) library. As such, they are only enabled if ClickHouse is compiled with support for vectorscan.
To turn off all functions that use vectorscan (hyperscan), use the setting `SET allow_hyperscan = 0;`.
Due to restrictions of vectorscan, the length of the `haystack` string must be less than 2<sup>32</sup> bytes.
Hyperscan is generally vulnerable to regular expression denial of service (ReDoS) attacks. For more information see
[https://www.usenix.org/conference/usenixsecurity22/presentation/turonova](https://www.usenix.org/conference/usenixsecurity22/presentation/turonova)
[https://doi.org/10.1007/s10664-021-10033-1](https://doi.org/10.1007/s10664-021-10033-1)
[https://doi.org/10.1145/3236024.3236027](https://doi.org/10.1145/3236024.3236027)
Users are advised to check the provided patterns carefully.
:::
## multiMatchAnyIndex大海捞针\[模式<sub>1</sub>,模式<sub>2</sub>, …, pattern<sub>n</sub>\]) {#multimatchanyindexhaystack-pattern1-pattern2-patternn}
如果仅希望搜索子字符串,可以使用函数 [multiSearchAny](#multisearchany) 来替代,这些函数的性能比此函数更高。
与`multiMatchAny`相同但返回与haystack匹配的任何内容的索引位置。
**语法**
## multiFuzzyMatchAny(干草堆,距离,\[模式<sub>1</sub>,模式<sub>2</sub>, …, pattern<sub>n</sub>\]) {#multifuzzymatchanyhaystack-distance-pattern1-pattern2-patternn}
```sql
multiMatchAny(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])
```
与`multiMatchAny`相同但如果在haystack能够查找到任何模式匹配能够在指定的[编辑距离](https://en.wikipedia.org/wiki/Edit_distance)内进行匹配则返回1。此功能也处于实验模式可能非常慢。有关更多信息请参阅[hyperscan文档](https://intel.github.io/hyperscan/dev-reference/compilation.html#approximate-matching)。
## multiMatchAnyIndex

Like `multiMatchAny`, but returns the index of any pattern that matches `haystack`.

**Syntax**

```sql
multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternN])
```
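**Example**

An illustrative query (added for clarity): only the second pattern matches, so its index is returned:

```sql
SELECT multiMatchAnyIndex('Hello, World!', ['^Goodbye', 'Wor.d']);
```

Expected result: `2`.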
## multiMatchAllIndices

Like `multiMatchAny`, but returns an array with all indices of patterns that match `haystack`.

**Syntax**

```sql
multiMatchAllIndices(haystack, [pattern1, pattern2, ..., patternN])
```
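**Example**

An illustrative query (added for clarity): patterns 2 and 3 match, pattern 1 does not (matching is case-sensitive); the order of the indices in the result is not guaranteed:

```sql
SELECT multiMatchAllIndices('Hello, World!', ['hello', 'Wor.d', '!$']);
```

Expected result: `[2,3]`.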
## multiFuzzyMatchAny

Like `multiMatchAny`, but returns 1 if any pattern matches `haystack` within a constant [edit distance](https://en.wikipedia.org/wiki/Edit_distance). This function relies on the experimental feature of the [hyperscan](https://intel.github.io/hyperscan/dev-reference/compilation.html#approximate-matching) library and can be slow for some corner cases. The performance depends on the edit distance `distance` and the patterns used, but it is always more expensive than non-fuzzy searches.

:::note
The `multiFuzzyMatch*()` family of functions does not support UTF-8 regular expressions (hyperscan treats them as a sequence of bytes) due to restrictions of hyperscan.
:::

**Syntax**

```sql
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, ..., patternN])
```
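**Example**

An illustrative query (added for clarity, assuming the approximate-matching semantics described above): `'Hallo'` is within edit distance 1 of `'Hello'` (one substitution):

```sql
SELECT multiFuzzyMatchAny('Hello', 1, ['Hallo']);
```

Expected result: `1`.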
## multiFuzzyMatchAnyIndex

Like `multiFuzzyMatchAny`, but returns any index of a pattern that matches `haystack` within a constant edit distance.

**Syntax**

```sql
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternN])
```
## multiFuzzyMatchAllIndices

Like `multiFuzzyMatchAny`, but returns an array with all indices of patterns that match `haystack` within a constant edit distance.

**Syntax**

```sql
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, ..., patternN])
```
## extract

Extracts a fragment of a string using a regular expression. If `haystack` does not match the `pattern` regex, an empty string is returned.

For regular expressions without subpatterns, the function uses the fragment that matches the entire regex. Otherwise, it uses the fragment that matches the first subpattern.

**Syntax**

```sql
extract(haystack, pattern)
```
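**Example**

An illustrative query (added for clarity); note that the backslash in the regex must itself be escaped in the string literal:

```sql
SELECT extract('number: 42', '\\d+');
```

Expected result: `'42'`.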
## extractAll

Extracts all fragments of a string using a regular expression. If `haystack` does not match the `pattern` regex, an empty array is returned.

Returns an array of strings consisting of all matches of the regex.

The behavior with respect to subpatterns is the same as in the function `extract`.

**Syntax**

```sql
extractAll(haystack, pattern)
```
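**Example**

An illustrative query (added for clarity):

```sql
SELECT extractAll('abc=111, def=222', '\\d+');
```

Expected result: `['111','222']`.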
## extractAllGroupsHorizontal

Matches all groups of the `haystack` string using the `pattern` regular expression.

Returns an array of arrays, where the first array includes all fragments matching the first group, the second array all fragments matching the second group, and so on.

This function is slower than [extractAllGroupsVertical](#extractallgroups-vertical).

**Syntax**

``` sql
extractAllGroupsHorizontal(haystack, pattern)
```

**Arguments**

- `haystack` — Input string. [String](../../sql-reference/data-types/string.md).
- `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Type: [Array](../../sql-reference/data-types/array.md).

If `haystack` does not match the `pattern` regex, an array of empty arrays is returned.

**Example**
``` sql
SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```
Result:
``` text
┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']] │
└──────────────────────────────────────────────────────────────────────────────────────────┘
```
## extractAllGroupsVertical

Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where each inner array contains matching fragments from every group. Fragments are grouped in order of their appearance in `haystack`.

**Syntax**

``` sql
extractAllGroupsVertical(haystack, pattern)
```

**Arguments**

- `haystack` — Input string. [String](../../sql-reference/data-types/string.md).
- `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Type: [Array](../../sql-reference/data-types/array.md).

If `haystack` does not match the `pattern` regex, an empty array is returned.

**Example**
``` sql
SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```
Result:
``` text
┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']] │
└────────────────────────────────────────────────────────────────────────────────────────┘
```
## like {#like}

Returns whether the string `haystack` matches the LIKE expression `pattern`.

A LIKE expression can contain normal characters and the following metasymbols:

- `%` indicates an arbitrary number of arbitrary characters (including zero characters).
- `_` indicates a single arbitrary character.
- `\` is for escaping the literals `%`, `_` and `\`.

Matching is based on UTF-8, e.g. `_` matches the Unicode code point `¥` which is represented by two bytes in UTF-8.

If `haystack` or the LIKE expression is not valid UTF-8, the behavior is undefined.

No automatic Unicode normalization is performed; you can use the [normalizeUTF8*()](https://clickhouse.com/docs/zh/sql-reference/functions/string-functions/) functions for that.

To match against the characters `%`, `_` and `\` (which are LIKE metacharacters), prepend them with a backslash: `\%`, `\_` and `\\`.
The backslash loses its special meaning (i.e. is interpreted literally) if it prepends a character other than `%`, `_` or `\`.
Note that ClickHouse requires backslashes in strings [to be escaped as well](../syntax.md#string), so you would actually need to write `\\%`, `\\_` and `\\\\`.

For LIKE expressions of the form `%needle%`, the function is as fast as the `position` function.
All other LIKE expressions are internally converted to a regular expression and executed with a performance similar to the function `match`.

**Syntax**

```sql
like(haystack, pattern)
```

Alias: `haystack LIKE pattern` (operator)
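**Example**

An illustrative query (added for clarity):

```sql
SELECT like('Hello, World!', '%World%');
```

Expected result: `1`.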
## notLike {#notlike}

Like `like` but negates the result.

Alias: `haystack NOT LIKE pattern` (operator)

## ilike

Like `like` but searches case-insensitively.

Alias: `haystack ILIKE pattern` (operator)

## notILike

Like `ilike` but negates the result.

Alias: `haystack NOT ILIKE pattern` (operator)
## ngramDistance

Calculates the 4-gram distance between a `haystack` string and a `needle` string. It counts the symmetric difference between the two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a Float32 between 0 and 1: the smaller the result, the more similar the strings are. Throws an exception if a constant `needle` or `haystack` argument is bigger than 32Kb. If a non-constant `haystack` or `needle` argument is bigger than 32Kb, the distance is always 1.

Functions `ngramDistanceCaseInsensitive`, `ngramDistanceUTF8` and `ngramDistanceCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**

```sql
ngramDistance(haystack, needle)
```
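**Example**

An illustrative query (added for clarity): identical strings have identical 4-gram multisets, so the distance is 0:

```sql
SELECT ngramDistance('ClickHouse', 'ClickHouse');
```

Expected result: `0`.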
## ngramSearch

Like `ngramDistance`, but calculates the non-symmetric difference between the `needle` string and the `haystack` string, i.e. the number of n-grams from the needle minus the common number of n-grams, normalized by the number of `needle` n-grams. Returns a Float32 between 0 and 1: the bigger the result, the more likely `needle` is in `haystack`. This function is useful for fuzzy string search. See also the function `soundex`.

Functions `ngramSearchCaseInsensitive`, `ngramSearchUTF8` and `ngramSearchCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.

:::note
The UTF-8 variants use a 3-gram distance. These are not perfectly fair n-gram distances: we use 2-byte hashes to hash the n-grams and then calculate the (non-)symmetric difference between these hash tables, so collisions may occur. In the case-insensitive UTF-8 variants we do not use a fair `tolower` function; instead, we zero the 5th bit (counting from zero) of each code point byte, and the first bit of the zeroth byte if there is more than one byte. This works for Latin and mostly for all Cyrillic letters.
:::
**Syntax**

```sql
ngramSearch(haystack, needle)
```
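**Example**

An illustrative query (added for clarity, under the semantics described above): when the needle equals the haystack, every needle n-gram is found, so the result is 1:

```sql
SELECT ngramSearch('ClickHouse', 'ClickHouse');
```

Expected result: `1`.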
## extractAll大海捞针图案) {#extractallhaystack-pattern}
## countSubstrings
使用正则表达式提取字符串的所有片段。如果haystackpattern正则表达式不匹配则返回一个空字符串。否则返回所有与正则表达式匹配的字符串数组。通常行为与extract函数相同它采用第一个子模式如果没有子模式则采用整个表达式
返回字符串 `haystack` 中子字符串 `needle` 出现的次数
## 像(干草堆,模式),干草堆像模式运算符 {#likehaystack-pattern-haystack-like-pattern-operator}
函数 `countSubstringsCaseInsensitive``countSubstringsCaseInsensitiveUTF8` 提供此函数的不区分大小写以及 UTF-8 变体。
检查字符串是否与简单正则表达式匹配。
正则表达式可以包含的元符号有``和`_`。
**语法**
`%` 表示任何字节数(包括零字符)。
``` sql
countSubstrings(haystack, needle[, start_pos])
```
`_` 表示任何一个字节。
**参数**
可以使用反斜杠(`\`来对元符号进行转义。请参阅«match»函数说明中有关转义的说明。
- `haystack` — 被搜索的字符串,类型为[String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — 用于搜索的模式子字符串,类型为[String](../../sql-reference/syntax.md#syntax-string-literal).
- `start_pos` 在字符串`haystack` 中开始检索的位置(从 1 开始),类型为[UInt](../../sql-reference/data-types/int-uint.md),可选。
对于像`needle`这样的正则表达式,改函数与`position`函数一样快。
对于其他正则表达式函数与match函数相同。
**返回值**
## 不喜欢(干草堆,模式),干草堆不喜欢模式运算符 {#notlikehaystack-pattern-haystack-not-like-pattern-operator}
- 子字符串出现的次数。
like函数返回相反的结果。
数据类型: [UInt64](../../sql-reference/data-types/int-uint.md).
**Example**
``` sql
SELECT countSubstrings('aaaa', 'aa');
```
Result:
``` text
┌─countSubstrings('aaaa', 'aa')─┐
│ 2 │
└───────────────────────────────┘
```
Example with the `start_pos` argument:
```sql
SELECT countSubstrings('abc___abc', 'abc', 4);
```
Result:
``` text
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│ 1 │
└────────────────────────────────────────┘
```
## countMatches

Returns the number of regular expression matches for a `pattern` in a `haystack`.

**Syntax**

``` sql
countMatches(haystack, pattern)
```

**Arguments**

- `haystack` — Input string. [String](../../sql-reference/data-types/string.md).
- `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). [String](../../sql-reference/data-types/string.md).

**Returned value**

- The number of matches.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).
**Example**
``` sql
SELECT countMatches('foobar.com', 'o+');
```
Result:
``` text
┌─countMatches('foobar.com', 'o+')─┐
│ 2 │
└──────────────────────────────────┘
```
``` sql
SELECT countMatches('aaaa', 'aa');
```
Result:
``` text
┌─countMatches('aaaa', 'aa')────┐
│ 2 │
└───────────────────────────────┘
```
## countMatchesCaseInsensitive

Like `countMatches(haystack, pattern)` but matching is case-insensitive.
## regexpExtract

Extracts the first string in `haystack` that matches the regexp pattern and corresponds to the regex group index.

**Syntax**

``` sql
regexpExtract(haystack, pattern[, index])
```

Alias: `REGEXP_EXTRACT(haystack, pattern[, index])`.

**Arguments**

- `haystack` — String in which the regexp pattern is matched. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `pattern` — Regular expression; must be constant. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `index` — An integer greater than or equal to 0 with a default of 1. It represents which regex group to extract. [UInt or Int](../../sql-reference/data-types/int-uint.md). Optional.

**Returned value**

`pattern` may contain multiple regex groups; `index` indicates which group to extract. An index of 0 means matching the entire regular expression.

Type: `String`.

**Example**
``` sql
SELECT
regexpExtract('100-200', '(\\d+)-(\\d+)', 1),
regexpExtract('100-200', '(\\d+)-(\\d+)', 2),
regexpExtract('100-200', '(\\d+)-(\\d+)', 0),
regexpExtract('100-200', '(\\d+)-(\\d+)');
```
Result:
``` text
┌─regexpExtract('100-200', '(\\d+)-(\\d+)', 1)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 2)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 0)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)')─┐
│ 100 │ 200 │ 100-200 │ 100 │
└──────────────────────────────────────────────┴──────────────────────────────────────────────┴──────────────────────────────────────────────┴───────────────────────────────────────────┘
```
## hasSubsequence

Returns 1 if `needle` is a subsequence of `haystack`, or 0 otherwise.

A subsequence of a string is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.

**Syntax**

``` sql
hasSubsequence(haystack, needle)
```

**Arguments**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1, if `needle` is a subsequence of `haystack`.
- 0, otherwise.

Type: `UInt8`.

**Example**

``` sql
SELECT hasSubsequence('garbage', 'arg');
```

Result:
``` text
┌─hasSubsequence('garbage', 'arg')─┐
│ 1 │
└──────────────────────────────────┘
```
## hasSubsequenceCaseInsensitive

Like [hasSubsequence](#hasSubsequence) but searches case-insensitively.

## hasSubsequenceUTF8

Like [hasSubsequence](#hasSubsequence) but assumes that `haystack` and `needle` are UTF-8 encoded strings.

## hasSubsequenceCaseInsensitiveUTF8

Like [hasSubsequenceUTF8](#hasSubsequenceUTF8) but searches case-insensitively.

View File

@ -17,12 +17,13 @@
#include <Access/AccessControl.h>
#include <Common/config_version.h>
#include <Common/Exception.h>
#include <Common/formatReadable.h>
#include <Common/TerminalSize.h>
#include <Common/Config/ConfigProcessor.h>
#include <Common/Config/getClientConfigPath.h>
#include <Common/CurrentThread.h>
#include <Common/Exception.h>
#include <Common/TerminalSize.h>
#include <Common/config_version.h>
#include <Common/formatReadable.h>
#include <Columns/ColumnString.h>
#include <Poco/Util/Application.h>

View File

@ -99,6 +99,19 @@ function(add_rust_subdirectory src)
message(STATUS "Copy ${src} to ${dst}")
file(COPY "${src}" DESTINATION "${CMAKE_CURRENT_BINARY_DIR}"
PATTERN target EXCLUDE)
# Check whether Rust is available.
#
# `cargo update --dry-run` will not update anything, but will check the internet connectivity.
execute_process(COMMAND ${Rust_CARGO_CACHED} update --dry-run
WORKING_DIRECTORY "${dst}"
RESULT_VARIABLE CARGO_UPDATE_RESULT
OUTPUT_VARIABLE CARGO_UPDATE_STDOUT
ERROR_VARIABLE CARGO_UPDATE_STDERR)
if (CARGO_UPDATE_RESULT)
message(FATAL_ERROR "Rust (${Rust_CARGO_CACHED}) support is not available (likely there is no internet connectivity):\n${CARGO_UPDATE_STDERR}\nYou can disable Rust support with -DENABLE_RUST=OFF")
endif()
add_subdirectory("${dst}" "${dst}")
# cmake -E copy* does not know how to exclude files

View File

@ -1,6 +1,8 @@
#include <Access/AccessRights.h>
#include <Common/logger_useful.h>
#include <base/sort.h>
#include <Common/Exception.h>
#include <Common/logger_useful.h>
#include <boost/container/small_vector.hpp>
#include <boost/range/adaptor/map.hpp>
#include <unordered_map>

View File

@ -205,7 +205,7 @@ enum class AccessType
M(SYSTEM_FLUSH, "", GROUP, SYSTEM) \
M(SYSTEM_THREAD_FUZZER, "SYSTEM START THREAD FUZZER, SYSTEM STOP THREAD FUZZER, START THREAD FUZZER, STOP THREAD FUZZER", GLOBAL, SYSTEM) \
M(SYSTEM_UNFREEZE, "SYSTEM UNFREEZE", GLOBAL, SYSTEM) \
M(SYSTEM_FAILPOINT, "SYSTEM ENABLE FAILPOINT, SYSTEM DISABLE FAILPOINT", GLOBAL, SYSTEM) \
M(SYSTEM_FAILPOINT, "SYSTEM ENABLE FAILPOINT, SYSTEM DISABLE FAILPOINT, SYSTEM WAIT FAILPOINT", GLOBAL, SYSTEM) \
M(SYSTEM_LISTEN, "SYSTEM START LISTEN, SYSTEM STOP LISTEN", GLOBAL, SYSTEM) \
M(SYSTEM_JEMALLOC, "SYSTEM JEMALLOC PURGE, SYSTEM JEMALLOC ENABLE PROFILE, SYSTEM JEMALLOC DISABLE PROFILE, SYSTEM JEMALLOC FLUSH PROFILE", GLOBAL, SYSTEM) \
M(SYSTEM, "", GROUP, ALL) /* allows to execute SYSTEM {SHUTDOWN|RELOAD CONFIG|...} */ \

View File

@ -1,11 +1,11 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/Combinators/AggregateFunctionCombinatorFactory.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/Context.h>
#include <Common/CurrentThread.h>
static constexpr size_t MAX_AGGREGATE_FUNCTION_NAME_LENGTH = 1000;

View File

@ -457,9 +457,9 @@ public:
detail::Adder<T, Data>::add(this->data(place), columns, num_args, row_begin, row_end, flags, null_map);
}
bool isParallelizeMergePrepareNeeded() const override { return is_parallelize_merge_prepare_needed;}
bool isParallelizeMergePrepareNeeded() const override { return is_parallelize_merge_prepare_needed; }
void parallelizeMergePrepare(AggregateDataPtrs & places, ThreadPool & thread_pool) const override
void parallelizeMergePrepare(AggregateDataPtrs & places, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled) const override
{
if constexpr (is_parallelize_merge_prepare_needed)
{
@ -469,7 +469,7 @@ public:
for (size_t i = 0; i < data_vec.size(); ++i)
data_vec[i] = &this->data(places[i]).set;
DataSet::parallelizeMergePrepare(data_vec, thread_pool);
DataSet::parallelizeMergePrepare(data_vec, thread_pool, is_cancelled);
}
else
{
@ -485,10 +485,10 @@ public:
bool isAbleToParallelizeMerge() const override { return is_able_to_parallelize_merge; }
bool canOptimizeEqualKeysRanges() const override { return !is_able_to_parallelize_merge; }
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, Arena *) const override
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled, Arena *) const override
{
if constexpr (is_able_to_parallelize_merge)
this->data(place).set.merge(this->data(rhs).set, &thread_pool);
this->data(place).set.merge(this->data(rhs).set, &thread_pool, &is_cancelled);
else
this->data(place).set.merge(this->data(rhs).set);
}
@ -579,10 +579,10 @@ public:
bool isAbleToParallelizeMerge() const override { return is_able_to_parallelize_merge; }
bool canOptimizeEqualKeysRanges() const override { return !is_able_to_parallelize_merge; }
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, Arena *) const override
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled, Arena *) const override
{
if constexpr (is_able_to_parallelize_merge)
this->data(place).set.merge(this->data(rhs).set, &thread_pool);
this->data(place).set.merge(this->data(rhs).set, &thread_pool, &is_cancelled);
else
this->data(place).set.merge(this->data(rhs).set);
}

View File

@ -144,9 +144,14 @@ public:
bool isAbleToParallelizeMerge() const override { return nested_func->isAbleToParallelizeMerge(); }
bool canOptimizeEqualKeysRanges() const override { return nested_func->canOptimizeEqualKeysRanges(); }
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, Arena * arena) const override
void parallelizeMergePrepare(AggregateDataPtrs & places, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled) const override
{
nested_func->merge(place, rhs, thread_pool, arena);
nested_func->parallelizeMergePrepare(places, thread_pool, is_cancelled);
}
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled, Arena * arena) const override
{
nested_func->merge(place, rhs, thread_pool, is_cancelled, arena);
}
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf, std::optional<size_t> version) const override

View File

@ -167,9 +167,14 @@ public:
bool isAbleToParallelizeMerge() const override { return nested_func->isAbleToParallelizeMerge(); }
bool canOptimizeEqualKeysRanges() const override { return nested_func->canOptimizeEqualKeysRanges(); }
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, Arena * arena) const override
void parallelizeMergePrepare(AggregateDataPtrs & places, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled) const override
{
nested_func->merge(place, rhs, thread_pool, arena);
nested_func->parallelizeMergePrepare(places, thread_pool, is_cancelled);
}
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled, Arena * arena) const override
{
nested_func->merge(place, rhs, thread_pool, is_cancelled, arena);
}
void mergeBatch(

View File

@ -113,9 +113,14 @@ public:
bool isAbleToParallelizeMerge() const override { return nested_func->isAbleToParallelizeMerge(); }
bool canOptimizeEqualKeysRanges() const override { return nested_func->canOptimizeEqualKeysRanges(); }
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, Arena * arena) const override
void parallelizeMergePrepare(AggregateDataPtrs & places, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled) const override
{
nested_func->merge(place, rhs, thread_pool, arena);
nested_func->parallelizeMergePrepare(places, thread_pool, is_cancelled);
}
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled, Arena * arena) const override
{
nested_func->merge(place, rhs, thread_pool, is_cancelled, arena);
}
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf, std::optional<size_t> version) const override

View File

@ -154,9 +154,18 @@ public:
bool isAbleToParallelizeMerge() const override { return nested_function->isAbleToParallelizeMerge(); }
bool canOptimizeEqualKeysRanges() const override { return nested_function->canOptimizeEqualKeysRanges(); }
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, Arena * arena) const override
void parallelizeMergePrepare(AggregateDataPtrs & places, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled) const override
{
nested_function->merge(nestedPlace(place), nestedPlace(rhs), thread_pool, arena);
AggregateDataPtrs nested_places(places.begin(), places.end());
for (auto & nested_place : nested_places)
nested_place = nestedPlace(nested_place);
nested_function->parallelizeMergePrepare(nested_places, thread_pool, is_cancelled);
}
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled, Arena * arena) const override
{
nested_function->merge(nestedPlace(place), nestedPlace(rhs), thread_pool, is_cancelled, arena);
}
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf, std::optional<size_t> version) const override

View File

@ -94,9 +94,14 @@ public:
bool isAbleToParallelizeMerge() const override { return nested_func->isAbleToParallelizeMerge(); }
bool canOptimizeEqualKeysRanges() const override { return nested_func->canOptimizeEqualKeysRanges(); }
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, Arena * arena) const override
void parallelizeMergePrepare(AggregateDataPtrs & places, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled) const override
{
nested_func->merge(place, rhs, thread_pool, arena);
nested_func->parallelizeMergePrepare(places, thread_pool, is_cancelled);
}
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled, Arena * arena) const override
{
nested_func->merge(place, rhs, thread_pool, is_cancelled, arena);
}
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf, std::optional<size_t> version) const override

View File

@ -151,7 +151,7 @@ public:
virtual bool isParallelizeMergePrepareNeeded() const { return false; }
virtual void parallelizeMergePrepare(AggregateDataPtrs & /*places*/, ThreadPool & /*thread_pool*/) const
virtual void parallelizeMergePrepare(AggregateDataPtrs & /*places*/, ThreadPool & /*thread_pool*/, std::atomic<bool> & /*is_cancelled*/) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "parallelizeMergePrepare() with thread pool parameter isn't implemented for {} ", getName());
}
@ -168,7 +168,7 @@ public:
/// Should be used only if isAbleToParallelizeMerge() returned true.
virtual void
merge(AggregateDataPtr __restrict /*place*/, ConstAggregateDataPtr /*rhs*/, ThreadPool & /*thread_pool*/, Arena * /*arena*/) const
merge(AggregateDataPtr __restrict /*place*/, ConstAggregateDataPtr /*rhs*/, ThreadPool & /*thread_pool*/, std::atomic<bool> & /*is_cancelled*/, Arena * /*arena*/) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "merge() with thread pool parameter isn't implemented for {} ", getName());
}

View File

@ -37,7 +37,7 @@ public:
/// In merge, if one of the lhs and rhs is twolevelset and the other is singlelevelset, then the singlelevelset will need to convertToTwoLevel().
/// This is not done in parallel and can take a long time if thread_num is large.
/// This method will convert all the SingleLevelSet to TwoLevelSet in parallel if the hashsets are not all singlelevel or not all twolevel.
static void parallelizeMergePrepare(const std::vector<UniqExactSet *> & data_vec, ThreadPool & thread_pool)
static void parallelizeMergePrepare(const std::vector<UniqExactSet *> & data_vec, ThreadPool & thread_pool, std::atomic<bool> & is_cancelled)
{
UInt64 single_level_set_num = 0;
UInt64 all_single_hash_size = 0;
@ -63,7 +63,7 @@ public:
try
{
auto data_vec_atomic_index = std::make_shared<std::atomic_uint32_t>(0);
auto thread_func = [data_vec, data_vec_atomic_index, thread_group = CurrentThread::getGroup()]()
auto thread_func = [data_vec, data_vec_atomic_index, &is_cancelled, thread_group = CurrentThread::getGroup()]()
{
SCOPE_EXIT_SAFE(
if (thread_group)
@ -76,6 +76,9 @@ public:
while (true)
{
if (is_cancelled.load(std::memory_order_seq_cst))
return;
const auto i = data_vec_atomic_index->fetch_add(1);
if (i >= data_vec.size())
return;
@ -96,7 +99,7 @@ public:
}
}
auto merge(const UniqExactSet & other, ThreadPool * thread_pool = nullptr)
auto merge(const UniqExactSet & other, ThreadPool * thread_pool = nullptr, std::atomic<bool> * is_cancelled = nullptr)
{
if (isSingleLevel() && other.isTwoLevel())
convertToTwoLevel();
@ -113,7 +116,9 @@ public:
if (!thread_pool)
{
for (size_t i = 0; i < rhs.NUM_BUCKETS; ++i)
{
lhs.impls[i].merge(rhs.impls[i]);
}
}
else
{
@ -121,7 +126,7 @@ public:
{
auto next_bucket_to_merge = std::make_shared<std::atomic_uint32_t>(0);
auto thread_func = [&lhs, &rhs, next_bucket_to_merge, thread_group = CurrentThread::getGroup()]()
auto thread_func = [&lhs, &rhs, next_bucket_to_merge, is_cancelled, thread_group = CurrentThread::getGroup()]()
{
SCOPE_EXIT_SAFE(
if (thread_group)
@ -133,6 +138,9 @@ public:
while (true)
{
if (is_cancelled->load(std::memory_order_seq_cst))
return;
const auto bucket = next_bucket_to_merge->fetch_add(1);
if (bucket >= rhs.NUM_BUCKETS)
return;

View File

@ -4,6 +4,7 @@
#include <Analyzer/IQueryTreeNode.h>
#include <Analyzer/ConstantValue.h>
#include <DataTypes/DataTypeNullable.h>
namespace DB
{
@ -86,6 +87,11 @@ public:
mask_id = id;
}
void convertToNullable() override
{
constant_value = std::make_shared<ConstantValue>(constant_value->getValue(), makeNullableSafe(constant_value->getType()));
}
void dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state, size_t indent) const override;
protected:

View File

@ -11,35 +11,37 @@ namespace DB
* Example of usage:
* std::unordered_map<QueryTreeNodeConstRawPtrWithHash, std::string> map;
*/
template <typename QueryTreeNodePtrType>
template <typename QueryTreeNodePtrType, bool compare_aliases = true>
struct QueryTreeNodeWithHash
{
QueryTreeNodeWithHash(QueryTreeNodePtrType node_) /// NOLINT
: node(std::move(node_))
, hash(node->getTreeHash())
, hash(node->getTreeHash({.compare_aliases = compare_aliases}))
{}
QueryTreeNodePtrType node = nullptr;
CityHash_v1_0_2::uint128 hash;
};
template <typename T>
inline bool operator==(const QueryTreeNodeWithHash<T> & lhs, const QueryTreeNodeWithHash<T> & rhs)
template <typename T, bool compare_aliases>
inline bool operator==(const QueryTreeNodeWithHash<T, compare_aliases> & lhs, const QueryTreeNodeWithHash<T, compare_aliases> & rhs)
{
return lhs.hash == rhs.hash && lhs.node->isEqual(*rhs.node);
return lhs.hash == rhs.hash && lhs.node->isEqual(*rhs.node, {.compare_aliases = compare_aliases});
}
template <typename T>
inline bool operator!=(const QueryTreeNodeWithHash<T> & lhs, const QueryTreeNodeWithHash<T> & rhs)
template <typename T, bool compare_aliases>
inline bool operator!=(const QueryTreeNodeWithHash<T, compare_aliases> & lhs, const QueryTreeNodeWithHash<T, compare_aliases> & rhs)
{
return !(lhs == rhs);
}
using QueryTreeNodePtrWithHash = QueryTreeNodeWithHash<QueryTreeNodePtr>;
using QueryTreeNodePtrWithHashWithoutAlias = QueryTreeNodeWithHash<QueryTreeNodePtr, /*compare_aliases*/ false>;
using QueryTreeNodeRawPtrWithHash = QueryTreeNodeWithHash<IQueryTreeNode *>;
using QueryTreeNodeConstRawPtrWithHash = QueryTreeNodeWithHash<const IQueryTreeNode *>;
using QueryTreeNodePtrWithHashSet = std::unordered_set<QueryTreeNodePtrWithHash>;
using QueryTreeNodePtrWithHashWithoutAliasSet = std::unordered_set<QueryTreeNodePtrWithHashWithoutAlias>;
using QueryTreeNodeConstRawPtrWithHashSet = std::unordered_set<QueryTreeNodeConstRawPtrWithHash>;
template <typename Value>
@ -50,10 +52,10 @@ using QueryTreeNodeConstRawPtrWithHashMap = std::unordered_map<QueryTreeNodeCons
}
template <typename T>
struct std::hash<DB::QueryTreeNodeWithHash<T>>
template <typename T, bool compare_aliases>
struct std::hash<DB::QueryTreeNodeWithHash<T, compare_aliases>>
{
size_t operator()(const DB::QueryTreeNodeWithHash<T> & node_with_hash) const
size_t operator()(const DB::QueryTreeNodeWithHash<T, compare_aliases> & node_with_hash) const
{
return node_with_hash.hash.low64;
}

View File

@ -164,7 +164,7 @@ bool IQueryTreeNode::isEqual(const IQueryTreeNode & rhs, CompareOptions compare_
return true;
}
IQueryTreeNode::Hash IQueryTreeNode::getTreeHash() const
IQueryTreeNode::Hash IQueryTreeNode::getTreeHash(CompareOptions compare_options) const
{
/** Compute tree hash with this node as root.
*
@ -201,7 +201,7 @@ IQueryTreeNode::Hash IQueryTreeNode::getTreeHash() const
}
hash_state.update(static_cast<size_t>(node_to_process->getNodeType()));
if (!node_to_process->alias.empty())
if (compare_options.compare_aliases && !node_to_process->alias.empty())
{
hash_state.update(node_to_process->alias.size());
hash_state.update(node_to_process->alias);

View File

@ -114,7 +114,7 @@ public:
* Alias of query tree node is part of query tree hash.
* Original AST is not part of query tree hash.
*/
Hash getTreeHash() const;
Hash getTreeHash(CompareOptions compare_options = { .compare_aliases = true }) const;
/// Get a deep copy of the query tree
QueryTreeNodePtr clone() const;

View File

@ -1,73 +0,0 @@
#include <Analyzer/ColumnNode.h>
#include <Analyzer/ConstantNode.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/Passes/ConvertInToEqualPass.h>
#include <Functions/equals.h>
#include <Functions/notEquals.h>
namespace DB
{
class ConvertInToEqualPassVisitor : public InDepthQueryTreeVisitorWithContext<ConvertInToEqualPassVisitor>
{
public:
using Base = InDepthQueryTreeVisitorWithContext<ConvertInToEqualPassVisitor>;
using Base::Base;
void enterImpl(QueryTreeNodePtr & node)
{
static const std::unordered_map<String, String> MAPPING = {
{"in", "equals"},
{"notIn", "notEquals"}
};
auto * func_node = node->as<FunctionNode>();
if (!func_node
|| !MAPPING.contains(func_node->getFunctionName())
|| func_node->getArguments().getNodes().size() != 2)
return ;
auto args = func_node->getArguments().getNodes();
auto * column_node = args[0]->as<ColumnNode>();
auto * constant_node = args[1]->as<ConstantNode>();
if (!column_node || !constant_node)
return ;
// IN multiple values is not supported
if (constant_node->getValue().getType() == Field::Types::Which::Tuple
|| constant_node->getValue().getType() == Field::Types::Which::Array)
return ;
// x IN null not equivalent to x = null
if (constant_node->getValue().isNull())
return ;
auto result_func_name = MAPPING.at(func_node->getFunctionName());
auto equal = std::make_shared<FunctionNode>(result_func_name);
QueryTreeNodes arguments{column_node->clone(), constant_node->clone()};
equal->getArguments().getNodes() = std::move(arguments);
FunctionOverloadResolverPtr resolver;
bool decimal_check_overflow = getContext()->getSettingsRef().decimal_check_overflow;
if (result_func_name == "equals")
{
resolver = createInternalFunctionEqualOverloadResolver(decimal_check_overflow);
}
else
{
resolver = createInternalFunctionNotEqualOverloadResolver(decimal_check_overflow);
}
try
{
equal->resolveAsFunction(resolver);
}
catch (...)
{
// When function resolver fails, we should not replace the function node
return;
}
node = equal;
}
};
void ConvertInToEqualPass::run(QueryTreeNodePtr & query_tree_node, ContextPtr context)
{
ConvertInToEqualPassVisitor visitor(std::move(context));
visitor.visit(query_tree_node);
}
}

View File

@ -1,27 +0,0 @@
#pragma once
#include <Analyzer/IQueryTreePass.h>
namespace DB
{
/** Optimize `in` to `equals` if possible.
* 1. convert in single value to equal
* Example: SELECT * from test where x IN (1);
* Result: SELECT * from test where x = 1;
*
* 2. convert not in single value to notEqual
* Example: SELECT * from test where x NOT IN (1);
* Result: SELECT * from test where x != 1;
*
* If value is null or tuple, do not convert.
*/
class ConvertInToEqualPass final : public IQueryTreePass
{
public:
String getName() override { return "ConvertInToEqualPass"; }
String getDescription() override { return "Convert in to equal"; }
void run(QueryTreeNodePtr & query_tree_node, ContextPtr context) override;
};
}

View File

@ -146,7 +146,7 @@ void resolveGroupingFunctions(QueryTreeNodePtr & query_node, ContextPtr context)
if (query_node_typed.hasGroupBy())
{
/// It is expected by execution layer that if there are only 1 grouping set it will be removed
if (query_node_typed.isGroupByWithGroupingSets() && query_node_typed.getGroupBy().getNodes().size() == 1)
if (query_node_typed.isGroupByWithGroupingSets() && query_node_typed.getGroupBy().getNodes().size() == 1 && !context->getSettingsRef().group_by_use_nulls)
{
auto grouping_set_list_node = query_node_typed.getGroupBy().getNodes().front();
auto & grouping_set_list_node_typed = grouping_set_list_node->as<ListNode &>();

View File

@ -65,6 +65,12 @@ public:
auto multi_if_function = std::make_shared<FunctionNode>("multiIf");
multi_if_function->getArguments().getNodes() = std::move(multi_if_arguments);
multi_if_function->resolveAsFunction(multi_if_function_ptr->build(multi_if_function->getArgumentColumns()));
/// Ignore if returned type changed.
/// Example : SELECT now64(if(Null, NULL, if(Null, nan, toFloat64(number))), Null) FROM numbers(2)
if (!multi_if_function->getResultType()->equals(*function_node->getResultType()))
return;
node = std::move(multi_if_function);
}

View File

@ -31,6 +31,12 @@ public:
if (!getSettings().optimize_group_by_function_keys)
return;
/// When group_by_use_nulls = 1 removing keys from GROUP BY can lead
/// to unexpected types in some functions.
/// See example in https://github.com/ClickHouse/ClickHouse/pull/61567#issuecomment-2018007887
if (getSettings().group_by_use_nulls)
return;
auto * query = node->as<QueryNode>();
if (!query)
return;
@ -73,12 +79,14 @@ private:
candidates.push_back({ *it, is_deterministic });
/// Using DFS we traverse function tree and try to find if it uses other keys as function arguments.
bool found_at_least_one_usage = false;
while (!candidates.empty())
{
auto [candidate, parents_are_only_deterministic] = candidates.back();
candidates.pop_back();
bool found = group_by_keys.contains(candidate);
found_at_least_one_usage |= found;
switch (candidate->getNodeType())
{
@ -111,7 +119,7 @@ private:
}
}
return true;
return found_at_least_one_usage;
}
static void optimizeGroupingSet(QueryTreeNodes & grouping_set)

View File

@ -43,6 +43,13 @@ public:
if (!getSettings().optimize_injective_functions_in_group_by)
return;
/// Don't optimize injective functions when group_by_use_nulls=true,
/// because in this case we make initial group by keys Nullable
/// and eliminating some functions can cause issues with arguments Nullability
/// during their execution. See examples in https://github.com/ClickHouse/ClickHouse/pull/61567#issuecomment-2008181143
if (getSettings().group_by_use_nulls)
return;
auto * query = node->as<QueryNode>();
if (!query)
return;

View File

@ -776,7 +776,13 @@ struct IdentifierResolveScope
/// Table expression node to data
std::unordered_map<QueryTreeNodePtr, TableExpressionData> table_expression_node_to_data;
QueryTreeNodePtrWithHashSet nullable_group_by_keys;
QueryTreeNodePtrWithHashWithoutAliasSet nullable_group_by_keys;
/// Here we count the number of nullable GROUP BY keys we met resolving expression.
/// E.g. for a query `SELECT tuple(tuple(number)) FROM numbers(10) GROUP BY (number, tuple(number)) with cube`
/// both `number` and `tuple(number)` would be in nullable_group_by_keys.
/// But when we resolve `tuple(tuple(number))` we should figure out that `tuple(number)` is already a key,
/// and we should not convert `number` to nullable.
size_t found_nullable_group_by_key_in_scope = 0;
/** It's possible that after a JOIN, a column in the projection has a type different from the column in the source table.
* (For example, after join_use_nulls or USING column casted to supertype)
@ -2059,13 +2065,9 @@ void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, Iden
subquery_context->setSetting("use_structure_from_insertion_table_in_table_functions", false);
auto options = SelectQueryOptions(QueryProcessingStage::Complete, scope.subquery_depth, true /*is_subquery*/);
options.only_analyze = only_analyze;
auto interpreter = std::make_unique<InterpreterSelectQueryAnalyzer>(node->toAST(), subquery_context, subquery_context->getViewSource(), options);
auto io = interpreter->execute();
PullingAsyncPipelineExecutor executor(io.pipeline);
io.pipeline.setProgressCallback(context->getProgressCallback());
io.pipeline.setProcessListElement(context->getProcessListElement());
if (only_analyze)
{
/// If query is only analyzed, then constants are not correct.
@ -2082,6 +2084,11 @@ void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, Iden
}
else
{
auto io = interpreter->execute();
PullingAsyncPipelineExecutor executor(io.pipeline);
io.pipeline.setProgressCallback(context->getProgressCallback());
io.pipeline.setProcessListElement(context->getProcessListElement());
Block block;
while (block.rows() == 0 && executor.pull(block))
@ -2193,7 +2200,7 @@ void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, Iden
auto & nearest_query_scope_query_node = nearest_query_scope->scope_node->as<QueryNode &>();
auto & mutable_context = nearest_query_scope_query_node.getMutableContext();
auto scalar_query_hash_string = DB::toString(node_with_hash.hash);
auto scalar_query_hash_string = DB::toString(node_with_hash.hash) + (only_analyze ? "_analyze" : "");
if (mutable_context->hasQueryContext())
mutable_context->getQueryContext()->addScalar(scalar_query_hash_string, scalar_block);
@ -6148,6 +6155,12 @@ ProjectionNames QueryAnalyzer::resolveExpressionNode(QueryTreeNodePtr & node, Id
return resolved_expression_it->second;
}
bool is_nullable_group_by_key = scope.nullable_group_by_keys.contains(node) && !scope.expressions_in_resolve_process_stack.hasAggregateFunction();
if (is_nullable_group_by_key)
++scope.found_nullable_group_by_key_in_scope;
SCOPE_EXIT(scope.found_nullable_group_by_key_in_scope -= is_nullable_group_by_key);
String node_alias = node->getAlias();
ProjectionNames result_projection_names;
@ -6439,7 +6452,7 @@ ProjectionNames QueryAnalyzer::resolveExpressionNode(QueryTreeNodePtr & node, Id
validateTreeSize(node, scope.context->getSettingsRef().max_expanded_ast_elements, node_to_tree_size);
if (scope.nullable_group_by_keys.contains(node) && !scope.expressions_in_resolve_process_stack.hasAggregateFunction())
if (is_nullable_group_by_key && scope.found_nullable_group_by_key_in_scope == 1)
{
node = node->clone();
node->convertToNullable();
@ -6666,45 +6679,48 @@ void QueryAnalyzer::resolveGroupByNode(QueryNode & query_node_typed, IdentifierR
if (query_node_typed.isGroupByWithGroupingSets())
{
QueryTreeNodes nullable_group_by_keys;
for (auto & grouping_sets_keys_list_node : query_node_typed.getGroupBy().getNodes())
{
if (settings.enable_positional_arguments)
replaceNodesWithPositionalArguments(grouping_sets_keys_list_node, query_node_typed.getProjection().getNodes(), scope);
resolveExpressionNodeList(grouping_sets_keys_list_node, scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/);
// Remove redundant calls to `tuple` function. It simplifies checking if expression is an aggregation key.
// It's required to support queries like: SELECT number FROM numbers(3) GROUP BY (number, number % 2)
auto & group_by_list = grouping_sets_keys_list_node->as<ListNode &>().getNodes();
expandTuplesInList(group_by_list);
if (scope.group_by_use_nulls)
for (const auto & group_by_elem : group_by_list)
nullable_group_by_keys.push_back(group_by_elem->clone());
resolveExpressionNodeList(grouping_sets_keys_list_node, scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/);
}
if (scope.group_by_use_nulls)
{
for (const auto & grouping_set : query_node_typed.getGroupBy().getNodes())
{
for (const auto & group_by_elem : grouping_set->as<ListNode>()->getNodes())
scope.nullable_group_by_keys.insert(group_by_elem);
}
}
for (auto & nullable_group_by_key : nullable_group_by_keys)
scope.nullable_group_by_keys.insert(std::move(nullable_group_by_key));
}
else
{
if (settings.enable_positional_arguments)
replaceNodesWithPositionalArguments(query_node_typed.getGroupByNode(), query_node_typed.getProjection().getNodes(), scope);
resolveExpressionNodeList(query_node_typed.getGroupByNode(), scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/);
// Remove redundant calls to `tuple` function. It simplifies checking if expression is an aggregation key.
// It's required to support queries like: SELECT number FROM numbers(3) GROUP BY (number, number % 2)
auto & group_by_list = query_node_typed.getGroupBy().getNodes();
expandTuplesInList(group_by_list);
QueryTreeNodes nullable_group_by_keys;
if (scope.group_by_use_nulls)
{
for (const auto & group_by_elem : query_node_typed.getGroupBy().getNodes())
scope.nullable_group_by_keys.insert(group_by_elem);
nullable_group_by_keys.push_back(group_by_elem->clone());
}
resolveExpressionNodeList(query_node_typed.getGroupByNode(), scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/);
for (auto & nullable_group_by_key : nullable_group_by_keys)
scope.nullable_group_by_keys.insert(std::move(nullable_group_by_key));
}
}
@ -7360,7 +7376,7 @@ void QueryAnalyzer::resolveTableFunction(QueryTreeNodePtr & table_function_node,
ColumnDescription column = insert_columns.get(*insert_column_name_it);
/// Change ephemeral columns to default columns.
column.default_desc.kind = ColumnDefaultKind::Default;
structure_hint.add(insert_columns.get(*insert_column_name_it));
structure_hint.add(std::move(column));
}
}

View File

@ -5,6 +5,7 @@
#include <Analyzer/ColumnNode.h>
#include <Analyzer/ConstantNode.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/Utils.h>
#include <Functions/FunctionFactory.h>
namespace DB
@ -83,7 +84,7 @@ public:
rhs->getArguments().getNodes().push_back(rhs_count);
resolveOrdinaryFunctionNode(*rhs, rhs->getFunctionName());
const auto new_node = std::make_shared<FunctionNode>(Poco::toLower(func_plus_minus_node->getFunctionName()));
auto new_node = std::make_shared<FunctionNode>(Poco::toLower(func_plus_minus_node->getFunctionName()));
if (column_id == 0)
new_node->getArguments().getNodes() = {lhs, rhs};
else if (column_id == 1)
@ -93,7 +94,12 @@ public:
if (!new_node)
return;
node = new_node;
QueryTreeNodePtr res = std::move(new_node);
if (!res->getResultType()->equals(*function_node->getResultType()))
res = createCastFunction(res, function_node->getResultType(), getContext());
node = std::move(res);
}

View File

@ -28,7 +28,6 @@
#include <Analyzer/Passes/MultiIfToIfPass.h>
#include <Analyzer/Passes/IfConstantConditionPass.h>
#include <Analyzer/Passes/IfChainToMultiIfPass.h>
#include <Analyzer/Passes/ConvertInToEqualPass.h>
#include <Analyzer/Passes/OrderByTupleEliminationPass.h>
#include <Analyzer/Passes/NormalizeCountVariantsPass.h>
#include <Analyzer/Passes/AggregateFunctionsArithmericOperationsPass.h>
@ -264,7 +263,6 @@ void addQueryTreePasses(QueryTreePassManager & manager, bool only_analyze)
manager.addPass(std::make_unique<SumIfToCountIfPass>());
manager.addPass(std::make_unique<RewriteArrayExistsToHasPass>());
manager.addPass(std::make_unique<NormalizeCountVariantsPass>());
manager.addPass(std::make_unique<ConvertInToEqualPass>());
/// should before AggregateFunctionsArithmericOperationsPass
manager.addPass(std::make_unique<AggregateFunctionOfGroupByKeysPass>());

View File

@ -1,14 +1,15 @@
#pragma once
#include <Core/Settings.h>
#include <Core/Block.h>
#include <DataTypes/IDataType.h>
#include <QueryPipeline/SizeLimits.h>
#include <memory>
namespace DB
{
class IDataType;
using DataTypePtr = std::shared_ptr<const IDataType>;
class Set;
using SetPtr = std::shared_ptr<Set>;

View File

@ -753,7 +753,7 @@ void ClientBase::setDefaultFormatsFromConfiguration()
else
default_output_format = "TSV";
}
else if (is_interactive || stdout_is_a_tty)
else if (is_interactive)
{
default_output_format = "PrettyCompact";
}

View File

@ -70,6 +70,12 @@ void ConnectionEstablisher::run(ConnectionEstablisher::TryResult & result, std::
ProfileEvents::increment(ProfileEvents::DistributedConnectionUsable);
result.is_usable = true;
if (table_status_it->second.is_readonly)
{
result.is_readonly = true;
LOG_TRACE(log, "Table {}.{} is readonly on server {}", table_to_check->database, table_to_check->table, result.entry->getDescription());
}
const UInt64 max_allowed_delay = settings.max_replica_delay_for_distributed_queries;
if (!max_allowed_delay)
{

View File

@ -52,7 +52,7 @@ IConnectionPool::Entry ConnectionPoolWithFailover::get(const ConnectionTimeouts
settings.distributed_replica_max_ignored_errors = 0;
settings.fallback_to_stale_replicas_for_distributed_queries = true;
return get(timeouts, settings, true);
return get(timeouts, settings, /* force_connected= */ true);
}
IConnectionPool::Entry ConnectionPoolWithFailover::get(const ConnectionTimeouts & timeouts,
@ -65,7 +65,7 @@ IConnectionPool::Entry ConnectionPoolWithFailover::get(const ConnectionTimeouts
TryGetEntryFunc try_get_entry = [&](const NestedPoolPtr & pool, std::string & fail_message)
{
return tryGetEntry(pool, timeouts, fail_message, settings, {});
return tryGetEntry(pool, timeouts, fail_message, settings);
};
const size_t offset = settings.load_balancing_first_offset % nested_pools.size();
@ -158,6 +158,21 @@ std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::g
return getManyImpl(settings, pool_mode, try_get_entry, skip_unavailable_endpoints, priority_func);
}
std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::getManyCheckedForInsert(
const ConnectionTimeouts & timeouts,
const Settings & settings,
PoolMode pool_mode,
const QualifiedTableName & table_to_check)
{
TryGetEntryFunc try_get_entry = [&](const NestedPoolPtr & pool, std::string & fail_message)
{ return tryGetEntry(pool, timeouts, fail_message, settings, &table_to_check, /*async_callback=*/ {}); };
return getManyImpl(settings, pool_mode, try_get_entry,
/*skip_unavailable_endpoints=*/ std::nullopt,
/*priority_func=*/ {},
settings.distributed_insert_skip_read_only_replicas);
}
ConnectionPoolWithFailover::Base::GetPriorityFunc ConnectionPoolWithFailover::makeGetPriorityFunc(const Settings & settings)
{
const size_t offset = settings.load_balancing_first_offset % nested_pools.size();
@ -171,7 +186,8 @@ std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::g
PoolMode pool_mode,
const TryGetEntryFunc & try_get_entry,
std::optional<bool> skip_unavailable_endpoints,
GetPriorityForLoadBalancing::Func priority_func)
GetPriorityForLoadBalancing::Func priority_func,
bool skip_read_only_replicas)
{
if (nested_pools.empty())
throw DB::Exception(
@ -191,11 +207,17 @@ std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::g
max_entries = nested_pools.size();
}
else if (pool_mode == PoolMode::GET_ONE)
{
max_entries = 1;
}
else if (pool_mode == PoolMode::GET_MANY)
{
max_entries = settings.max_parallel_replicas;
}
else
{
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "Unknown pool allocation mode");
}
if (!priority_func)
priority_func = makeGetPriorityFunc(settings);
@ -203,7 +225,7 @@ std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::g
UInt64 max_ignored_errors = settings.distributed_replica_max_ignored_errors.value;
bool fallback_to_stale_replicas = settings.fallback_to_stale_replicas_for_distributed_queries.value;
return Base::getMany(min_entries, max_entries, max_tries, max_ignored_errors, fallback_to_stale_replicas, try_get_entry, priority_func);
return Base::getMany(min_entries, max_entries, max_tries, max_ignored_errors, fallback_to_stale_replicas, skip_read_only_replicas, try_get_entry, priority_func);
}
ConnectionPoolWithFailover::TryResult

View File

@ -77,6 +77,12 @@ public:
AsyncCallback async_callback = {},
std::optional<bool> skip_unavailable_endpoints = std::nullopt,
GetPriorityForLoadBalancing::Func priority_func = {});
/// The same as getManyChecked(), but respects distributed_insert_skip_read_only_replicas setting.
std::vector<TryResult> getManyCheckedForInsert(
const ConnectionTimeouts & timeouts,
const Settings & settings,
PoolMode pool_mode,
const QualifiedTableName & table_to_check);
struct NestedPoolStatus
{
@ -107,7 +113,8 @@ private:
PoolMode pool_mode,
const TryGetEntryFunc & try_get_entry,
std::optional<bool> skip_unavailable_endpoints = std::nullopt,
GetPriorityForLoadBalancing::Func priority_func = {});
GetPriorityForLoadBalancing::Func priority_func = {},
bool skip_read_only_replicas = false);
/// Try to get a connection from the pool and check that it is good.
/// If table_to_check is not null and the check is enabled in settings, check that replication delay

View File

@ -82,7 +82,7 @@ std::vector<Connection *> HedgedConnectionsFactory::getManyConnections(PoolMode
}
case PoolMode::GET_MANY:
{
max_entries = max_parallel_replicas;
max_entries = std::min(max_parallel_replicas, shuffled_pools.size());
break;
}
}

View File

@ -158,7 +158,7 @@ private:
/// checking the number of requested replicas that are still in process).
size_t requested_connections_count = 0;
const size_t max_parallel_replicas = 0;
const size_t max_parallel_replicas = 1;
const bool skip_unavailable_shards = false;
};

View File

@ -20,12 +20,12 @@ namespace DB
namespace ErrorCodes
{
extern const int ILLEGAL_COLUMN;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int DUPLICATE_COLUMN;
extern const int EXPERIMENTAL_FEATURE_ERROR;
extern const int ILLEGAL_COLUMN;
extern const int NUMBER_OF_DIMENSIONS_MISMATCHED;
extern const int SIZES_OF_COLUMNS_DOESNT_MATCH;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int EXPERIMENTAL_FEATURE_ERROR;
}
namespace
@ -334,7 +334,18 @@ void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
if (type_changed || info.need_convert)
field = convertFieldToTypeOrThrow(field, *least_common_type.get());
data.back()->insert(field);
if (!data.back()->tryInsert(field))
{
/** Normalization of the field above is pretty complicated (it uses several FieldVisitors),
* so in the case of a bug, we may get mismatched types.
* The `IColumn::insert` method does not check the type of the inserted field, and it can lead to a segmentation fault.
* Therefore, we use the safer `tryInsert` method to get an exception instead of a segmentation fault.
*/
throw Exception(ErrorCodes::EXPERIMENTAL_FEATURE_ERROR,
"Cannot insert field {} to column {}",
field.dump(), data.back()->dumpStructure());
}
++num_rows;
}

View File

@ -460,6 +460,28 @@ Float32 ColumnVector<T>::getFloat32(size_t n [[maybe_unused]]) const
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Cannot get the value of {} as Float32", TypeName<T>);
}
template <typename T>
bool ColumnVector<T>::tryInsert(const DB::Field & x)
{
NearestFieldType<T> value;
if (!x.tryGet<NearestFieldType<T>>(value))
{
if constexpr (std::is_same_v<T, UInt8>)
{
/// It's also possible to insert boolean values into UInt8 column.
bool boolean_value;
if (x.tryGet<bool>(boolean_value))
{
data.push_back(static_cast<T>(boolean_value));
return true;
}
}
return false;
}
data.push_back(static_cast<T>(value));
return true;
}
template <typename T>
void ColumnVector<T>::insertRangeFrom(const IColumn & src, size_t start, size_t length)
{

View File

@ -224,14 +224,8 @@ public:
data.push_back(static_cast<T>(x.get<T>()));
}
bool tryInsert(const DB::Field & x) override
{
NearestFieldType<T> value;
if (!x.tryGet<NearestFieldType<T>>(value))
return false;
data.push_back(static_cast<T>(value));
return true;
}
bool tryInsert(const DB::Field & x) override;
void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
ColumnPtr filter(const IColumn::Filter & filt, ssize_t result_size_hint) const override;

View File

@ -2,10 +2,13 @@
#include <memory>
#include <base/types.h>
#include <Common/Logger.h>
#include <Common/SharedMutex.h>
#include <Common/SharedLockGuard.h>
#include <Common/SharedMutex.h>
namespace DB
{
/** AtomicLogger allows to atomically change logger.
* Standard library does not have atomic_shared_ptr, and we do not use std::atomic* operations,
@ -49,3 +52,5 @@ private:
mutable DB::SharedMutex log_mutex;
LoggerPtr logger;
};
}

View File

@ -0,0 +1,16 @@
#include <Common/CurrentThread.h>
#include <Common/CurrentThreadHelpers.h>
namespace DB
{
bool currentThreadHasGroup()
{
return DB::CurrentThread::getGroup() != nullptr;
}
LogsLevel currentThreadLogsLevel()
{
return DB::CurrentThread::get().getClientLogsLevel();
}
}

View File

@ -0,0 +1,9 @@
#pragma once
#include <Core/LogsLevel.h>
namespace DB
{
bool currentThreadHasGroup();
LogsLevel currentThreadLogsLevel();
}

View File

@ -1,13 +1,15 @@
#include "DateLUT.h"
#include <Interpreters/Context.h>
#include <Common/CurrentThread.h>
#include <Common/filesystemHelpers.h>
#include <Poco/DigestStream.h>
#include <Poco/Exception.h>
#include <Poco/SHA1Engine.h>
#include <Common/filesystemHelpers.h>
#include <filesystem>
#include <fstream>
#include <Interpreters/Context.h>
namespace
@ -140,6 +142,38 @@ std::string determineDefaultTimeZone()
}
const DateLUTImpl & DateLUT::instance()
{
const auto & date_lut = getInstance();
if (DB::CurrentThread::isInitialized())
{
std::string timezone_from_context;
const DB::ContextPtr query_context = DB::CurrentThread::get().getQueryContext();
if (query_context)
{
timezone_from_context = extractTimezoneFromContext(query_context);
if (!timezone_from_context.empty())
return date_lut.getImplementation(timezone_from_context);
}
/// On the server side, timezone is passed in query_context,
/// but on CH-client side we have no query context,
/// and each time we modify client's global context
const DB::ContextPtr global_context = DB::CurrentThread::get().getGlobalContext();
if (global_context)
{
timezone_from_context = extractTimezoneFromContext(global_context);
if (!timezone_from_context.empty())
return date_lut.getImplementation(timezone_from_context);
}
}
return serverTimezoneInstance();
}
DateLUT::DateLUT()
{
/// Initialize the pointer to the default DateLUTImpl.

View File

@ -1,17 +1,23 @@
#pragma once
#include "DateLUTImpl.h"
#include <base/defines.h>
#include <base/types.h>
#include <boost/noncopyable.hpp>
#include "Common/CurrentThread.h"
#include <atomic>
#include <memory>
#include <mutex>
#include <unordered_map>
namespace DB
{
class Context;
using ContextPtr = std::shared_ptr<const Context>;
}
class DateLUTImpl;
/// This class provides lazy initialization and lookup of singleton DateLUTImpl objects for a given timezone.
class DateLUT : private boost::noncopyable
@ -20,38 +26,7 @@ public:
/// Return DateLUTImpl instance for session timezone.
/// session_timezone is a session-level setting.
/// If setting is not set, returns the server timezone.
static ALWAYS_INLINE const DateLUTImpl & instance()
{
const auto & date_lut = getInstance();
if (DB::CurrentThread::isInitialized())
{
std::string timezone_from_context;
const DB::ContextPtr query_context = DB::CurrentThread::get().getQueryContext();
if (query_context)
{
timezone_from_context = extractTimezoneFromContext(query_context);
if (!timezone_from_context.empty())
return date_lut.getImplementation(timezone_from_context);
}
/// On the server side, timezone is passed in query_context,
/// but on CH-client side we have no query context,
/// and each time we modify client's global context
const DB::ContextPtr global_context = DB::CurrentThread::get().getGlobalContext();
if (global_context)
{
timezone_from_context = extractTimezoneFromContext(global_context);
if (!timezone_from_context.empty())
return date_lut.getImplementation(timezone_from_context);
}
}
return serverTimezoneInstance();
}
static const DateLUTImpl & instance();
static ALWAYS_INLINE const DateLUTImpl & instance(const std::string & time_zone)
{

View File

@ -1,8 +1,5 @@
#include "DateLUTImpl.h"
#include <cctz/civil_time.h>
#include <cctz/time_zone.h>
#include <cctz/zone_info_source.h>
#include <Core/DecimalFunctions.h>
#include <Common/DateLUTImpl.h>
#include <Common/Exception.h>
#include <algorithm>
@ -11,6 +8,10 @@
#include <cstring>
#include <memory>
#include <cctz/civil_time.h>
#include <cctz/time_zone.h>
#include <cctz/zone_info_source.h>
namespace DB
{
@ -214,6 +215,29 @@ DateLUTImpl::DateLUTImpl(const std::string & time_zone_)
}
}
/// Extract the millisecond component (0..999) from a DateTime64 with the given scale multiplier.
unsigned int DateLUTImpl::toMillisecond(const DB::DateTime64 & datetime, Int64 scale_multiplier) const
{
constexpr Int64 millisecond_multiplier = 1'000;
constexpr Int64 microsecond_multiplier = 1'000 * millisecond_multiplier;
constexpr Int64 divider = microsecond_multiplier / millisecond_multiplier;
auto components = DB::DecimalUtils::splitWithScaleMultiplier(datetime, scale_multiplier);
/// A negative DateTime64 has a negative fractional part; normalize it into [0, scale_multiplier).
if (datetime.value < 0 && components.fractional)
{
components.fractional = scale_multiplier + (components.whole ? Int64(-1) : Int64(1)) * components.fractional;
--components.whole;
}
/// Rescale the fractional part to microsecond resolution, then truncate to milliseconds.
Int64 fractional = components.fractional;
if (scale_multiplier > microsecond_multiplier)
fractional = fractional / (scale_multiplier / microsecond_multiplier);
else if (scale_multiplier < microsecond_multiplier)
fractional = fractional * (microsecond_multiplier / scale_multiplier);
UInt16 millisecond = static_cast<UInt16>(fractional / divider);
return millisecond;
}
/// Prefer to load timezones from blobs linked to the binary.
/// The blobs are provided by "tzdata" library.
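
The toMillisecond() implementation above normalizes a possibly negative fractional part and rescales it to microsecond resolution before truncating to milliseconds. A self-contained sketch of that rescaling step, using plain integers in place of DateTime64 (helper name and values are illustrative):

    #include <cassert>
    #include <cstdint>

    // The fractional part is expressed in ticks of (1 second / scale_multiplier).
    // Sketch of the rescaling performed in DateLUTImpl::toMillisecond above.
    uint16_t fractionalToMillisecond(int64_t fractional, int64_t scale_multiplier)
    {
        constexpr int64_t millisecond_multiplier = 1'000;
        constexpr int64_t microsecond_multiplier = 1'000 * millisecond_multiplier;
        constexpr int64_t divider = microsecond_multiplier / millisecond_multiplier;

        // Bring the fractional part to microsecond resolution...
        if (scale_multiplier > microsecond_multiplier)
            fractional = fractional / (scale_multiplier / microsecond_multiplier);
        else if (scale_multiplier < microsecond_multiplier)
            fractional = fractional * (microsecond_multiplier / scale_multiplier);

        // ...then truncate microseconds down to milliseconds.
        return static_cast<uint16_t>(fractional / divider);
    }

    int main()
    {
        assert(fractionalToMillisecond(123'456'789, 1'000'000'000) == 123); // DateTime64(9)
        assert(fractionalToMillisecond(456, 1'000) == 456);                 // DateTime64(3)
    }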

View File

@@ -3,7 +3,6 @@
#include <base/DayNum.h>
#include <base/defines.h>
#include <base/types.h>
#include <Core/DecimalFunctions.h>
#include <ctime>
#include <cassert>
@@ -50,6 +49,11 @@ enum class WeekDayMode
WeekStartsSunday1 = 3
};
namespace DB
{
class DateTime64;
}
/** Lookup table to conversion of time to date, and to month / year / day of week / day of month and so on.
* First time was implemented for OLAPServer, that needed to do billions of such transformations.
*/
@@ -593,29 +597,7 @@ public:
return time % 60;
}
template <typename DateOrTime>
unsigned toMillisecond(const DateOrTime & datetime, Int64 scale_multiplier) const
{
constexpr Int64 millisecond_multiplier = 1'000;
constexpr Int64 microsecond_multiplier = 1'000 * millisecond_multiplier;
constexpr Int64 divider = microsecond_multiplier / millisecond_multiplier;
auto components = DB::DecimalUtils::splitWithScaleMultiplier(datetime, scale_multiplier);
if (datetime.value < 0 && components.fractional)
{
components.fractional = scale_multiplier + (components.whole ? Int64(-1) : Int64(1)) * components.fractional;
--components.whole;
}
Int64 fractional = components.fractional;
if (scale_multiplier > microsecond_multiplier)
fractional = fractional / (scale_multiplier / microsecond_multiplier);
else if (scale_multiplier < microsecond_multiplier)
fractional = fractional * (microsecond_multiplier / scale_multiplier);
UInt16 millisecond = static_cast<UInt16>(fractional / divider);
return millisecond;
}
unsigned toMillisecond(const DB::DateTime64 & datetime, Int64 scale_multiplier) const;
unsigned toMinute(Time t) const
{

View File

@@ -1,26 +1,27 @@
#include "Exception.h"
#include <algorithm>
#include <cstdlib>
#include <cstring>
#include <filesystem>
#include <cxxabi.h>
#include <IO/Operators.h>
#include <IO/ReadBufferFromFile.h>
#include <IO/ReadBufferFromString.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <base/demangle.h>
#include <Poco/String.h>
#include <Common/AtomicLogger.h>
#include <Common/ErrorCodes.h>
#include <Common/Exception.h>
#include <Common/LockMemoryExceptionInThread.h>
#include <Common/MemorySanitizer.h>
#include <Common/SensitiveDataMasker.h>
#include <Common/config_version.h>
#include <Common/filesystemHelpers.h>
#include <Common/formatReadable.h>
#include <Common/logger_useful.h>
#include <Common/config_version.h>
#include <algorithm>
#include <cstdlib>
#include <cstring>
#include <filesystem>
#include <cxxabi.h>
#include <Poco/String.h>
namespace fs = std::filesystem;

View File

@@ -1,22 +1,20 @@
#pragma once
#include <cerrno>
#include <exception>
#include <vector>
#include <memory>
#include <Poco/Exception.h>
#include <base/defines.h>
#include <base/errnoToString.h>
#include <base/int8_to_string.h>
#include <base/scope_guard.h>
#include <Common/AtomicLogger.h>
#include <Common/Logger.h>
#include <Common/LoggingFormatStringHelpers.h>
#include <Common/StackTrace.h>
#include <cerrno>
#include <exception>
#include <memory>
#include <vector>
#include <fmt/format.h>
#include <Poco/Exception.h>
namespace Poco { class Logger; }
@@ -24,6 +22,8 @@ namespace Poco { class Logger; }
namespace DB
{
class AtomicLogger;
[[noreturn]] void abortOnFailedAssertion(const String & description);
/// This flag can be set for testing purposes - to check that no exceptions are thrown.

View File

@@ -50,7 +50,9 @@ static struct InitFiu
REGULAR(check_table_query_delay_for_part) \
REGULAR(dummy_failpoint) \
REGULAR(prefetched_reader_pool_failpoint) \
PAUSEABLE_ONCE(dummy_pausable_failpoint_once) \
PAUSEABLE_ONCE(replicated_merge_tree_insert_retry_pause) \
PAUSEABLE_ONCE(finish_set_quorum_failed_parts) \
PAUSEABLE_ONCE(finish_clean_quorum_failed_parts) \
PAUSEABLE(dummy_pausable_failpoint) \
ONCE(execute_query_calling_empty_set_result_func_on_exception)

View File

@@ -295,8 +295,13 @@ private:
String getTarget() const
{
if (!Session::getProxyConfig().host.empty())
return fmt::format("{} over proxy {}", Session::getHost(), Session::getProxyConfig().host);
return Session::getHost();
return fmt::format("{}:{} over proxy {}",
Session::getHost(),
Session::getPort(),
Session::getProxyConfig().host);
return fmt::format("{}:{}",
Session::getHost(),
Session::getPort());
}
void flushRequest() override
@@ -472,7 +477,8 @@ public:
String getTarget() const
{
if (!proxy_configuration.isEmpty())
return fmt::format("{} over proxy {}", host, proxy_configuration.host);
return fmt::format("{} over proxy {}",
host, proxy_configuration.host);
return host;
}
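
The session-based getTarget() above now reports the endpoint as host:port. A tiny sketch of equivalent fmt::format usage with free-standing inputs (the helper name is hypothetical):

    #include <string>
    #include <fmt/format.h>

    // Render "host:port", optionally annotated with the proxy host.
    std::string makeTarget(const std::string & host, unsigned port, const std::string & proxy_host)
    {
        if (!proxy_host.empty())
            return fmt::format("{}:{} over proxy {}", host, port, proxy_host);
        return fmt::format("{}:{}", host, port);
    }

    int main()
    {
        fmt::print("{}\n", makeTarget("example.com", 9000, "")); // example.com:9000
    }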

View File

@@ -2,6 +2,7 @@
#if USE_JEMALLOC
#include <Common/Exception.h>
#include <Common/Stopwatch.h>
#include <Common/logger_useful.h>
#include <jemalloc/jemalloc.h>

View File

@@ -1,9 +1,10 @@
#pragma once
#include <cstring>
#include <string>
#include <exception>
#include <string>
#include <Common/DateLUT.h>
#include <Common/DateLUTImpl.h>
/** Stores a calendar date in broken-down form (year, month, day-in-month).

View File

@@ -1,15 +1,20 @@
#pragma once
#include <memory>
#include <base/defines.h>
#include <Poco/Channel.h>
#include <memory>
#include <Poco/Logger.h>
#include <Poco/Message.h>
using LoggerPtr = Poco::LoggerPtr;
namespace Poco
{
class Channel;
class Logger;
using LoggerPtr = std::shared_ptr<Logger>;
}
using LoggerPtr = std::shared_ptr<Poco::Logger>;
using LoggerRawPtr = Poco::Logger *;
/** RAII wrappers around Poco/Logger.h.

View File

@@ -1,4 +1,5 @@
#include <Common/DateLUT.h>
#include <Common/DateLUTImpl.h>
#include <Common/LoggingFormatStringHelpers.h>
#include <Common/SipHash.h>
#include <Common/thread_local_rng.h>

View File

@@ -22,7 +22,7 @@ void protectMemoryRegion(void * addr, size_t len, int prot)
}
#endif
size_t byte_size(size_t num_elements, size_t element_size)
ALWAYS_INLINE size_t byte_size(size_t num_elements, size_t element_size)
{
size_t amount;
if (__builtin_mul_overflow(num_elements, element_size, &amount))
@@ -30,7 +30,7 @@ size_t byte_size(size_t num_elements, size_t element_size)
return amount;
}
size_t minimum_memory_for_elements(size_t num_elements, size_t element_size, size_t pad_left, size_t pad_right)
ALWAYS_INLINE size_t minimum_memory_for_elements(size_t num_elements, size_t element_size, size_t pad_left, size_t pad_right)
{
size_t amount;
if (__builtin_add_overflow(byte_size(num_elements, element_size), pad_left + pad_right, &amount))
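
Both helpers above use the GCC/Clang checked-arithmetic builtins so that oversized allocation requests fail loudly instead of silently wrapping around. A minimal sketch of the pattern, substituting abort() for the exception the real code throws:

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    // Multiply element count by element size, refusing to wrap on overflow.
    size_t checkedByteSize(size_t num_elements, size_t element_size)
    {
        size_t amount;
        if (__builtin_mul_overflow(num_elements, element_size, &amount))
        {
            fputs("allocation size overflow\n", stderr);
            abort(); // the code above throws a DB::Exception instead
        }
        return amount;
    }

    int main()
    {
        printf("%zu\n", checkedByteSize(1024, 8)); // prints 8192
    }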

View File

@@ -30,6 +30,7 @@ namespace ProfileEvents
{
extern const Event DistributedConnectionFailTry;
extern const Event DistributedConnectionFailAtAll;
extern const Event DistributedConnectionSkipReadOnlyReplica;
}
/// This class provides a pool with fault tolerance. It is used for pooling of connections to replicated DB.
@@ -73,13 +74,7 @@ public:
{
TryResult() = default;
void reset()
{
entry = Entry();
is_usable = false;
is_up_to_date = false;
delay = 0;
}
void reset() { *this = {}; }
Entry entry; /// use isNull() to check if connection is established
bool is_usable = false; /// if connection is established, then can be false only with table check
@@ -87,6 +82,7 @@ public:
bool is_up_to_date = false; /// If true, the entry is a connection to an up-to-date replica.
/// Depends on the max_replica_delay_for_distributed_queries setting.
UInt32 delay = 0; /// Helps choose the "least stale" option when all replicas are stale.
bool is_readonly = false; /// Table is in read-only mode; INSERT can ignore such replicas.
};
struct PoolState;
@@ -117,6 +113,7 @@ public:
size_t min_entries, size_t max_entries, size_t max_tries,
size_t max_ignored_errors,
bool fallback_to_stale_replicas,
bool skip_read_only_replicas,
const TryGetEntryFunc & try_get_entry,
const GetPriorityFunc & get_priority);
@@ -205,8 +202,12 @@ PoolWithFailoverBase<TNestedPool>::get(size_t max_ignored_errors, bool fallback_
const TryGetEntryFunc & try_get_entry, const GetPriorityFunc & get_priority)
{
std::vector<TryResult> results = getMany(
1 /* min entries */, 1 /* max entries */, 1 /* max tries */,
max_ignored_errors, fallback_to_stale_replicas,
/* min_entries= */ 1,
/* max_entries= */ 1,
/* max_tries= */ 1,
max_ignored_errors,
fallback_to_stale_replicas,
/* skip_read_only_replicas= */ false,
try_get_entry, get_priority);
if (results.empty() || results[0].entry.isNull())
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR,
@@ -220,6 +221,7 @@ PoolWithFailoverBase<TNestedPool>::getMany(
size_t min_entries, size_t max_entries, size_t max_tries,
size_t max_ignored_errors,
bool fallback_to_stale_replicas,
bool skip_read_only_replicas,
const TryGetEntryFunc & try_get_entry,
const GetPriorityFunc & get_priority)
{
@@ -271,9 +273,14 @@ PoolWithFailoverBase<TNestedPool>::getMany(
++entries_count;
if (result.is_usable)
{
++usable_count;
if (result.is_up_to_date)
++up_to_date_count;
if (skip_read_only_replicas && result.is_readonly)
ProfileEvents::increment(ProfileEvents::DistributedConnectionSkipReadOnlyReplica);
else
{
++usable_count;
if (result.is_up_to_date)
++up_to_date_count;
}
}
}
else
@@ -296,7 +303,7 @@ PoolWithFailoverBase<TNestedPool>::getMany(
throw DB::NetException(DB::ErrorCodes::ALL_CONNECTION_TRIES_FAILED,
"All connection tries failed. Log: \n\n{}\n", fail_messages);
std::erase_if(try_results, [](const TryResult & r) { return r.entry.isNull() || !r.is_usable; });
std::erase_if(try_results, [&](const TryResult & r) { return r.entry.isNull() || !r.is_usable || (skip_read_only_replicas && r.is_readonly); });
/// Sort so that preferred items are near the beginning.
std::stable_sort(
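
With skip_read_only_replicas enabled, getMany() above both stops counting read-only replicas toward the usable total and drops them from the returned list. A condensed sketch of that filtering step, with a simplified stand-in for TryResult (field names abbreviated for illustration):

    #include <vector>

    // Simplified stand-in for PoolWithFailoverBase::TryResult.
    struct TryResult
    {
        bool is_null = false;      // stands in for entry.isNull()
        bool is_usable = false;
        bool is_readonly = false;
    };

    // Mirrors the std::erase_if call above (C++20).
    void filterUsable(std::vector<TryResult> & try_results, bool skip_read_only_replicas)
    {
        std::erase_if(try_results, [&](const TryResult & r)
        {
            return r.is_null || !r.is_usable || (skip_read_only_replicas && r.is_readonly);
        });
    }

    int main()
    {
        std::vector<TryResult> results{{false, true, true}, {false, true, false}};
        filterUsable(results, /* skip_read_only_replicas= */ true);
        return results.size() == 1 ? 0 : 1; // the read-only replica was removed
    }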

View File

@@ -156,6 +156,7 @@
M(DistributedConnectionFailTry, "Total count when distributed connection fails with retry.") \
M(DistributedConnectionMissingTable, "Number of times we rejected a replica from a distributed query, because it did not contain a table needed for the query.") \
M(DistributedConnectionStaleReplica, "Number of times we rejected a replica from a distributed query, because some table needed for a query had replication lag higher than the configured threshold.") \
M(DistributedConnectionSkipReadOnlyReplica, "Number of replicas skipped during INSERT into Distributed table due to replicas being read-only.") \
M(DistributedConnectionFailAtAll, "Total count when distributed connection fails after all retries finished.") \
\
M(HedgedRequestsChangeReplica, "Total count when timeout for changing replica expired in hedged requests.") \
@@ -355,8 +356,6 @@ The server successfully detected this situation and will download merged part fr
M(PerfLocalMemoryReferences, "Local NUMA node memory reads") \
M(PerfLocalMemoryMisses, "Local NUMA node memory read misses") \
\
M(CreatedHTTPConnections, "Total amount of created HTTP connections (counter increase every time connection is created).") \
\
M(CannotWriteToWriteBufferDiscard, "Number of stack traces dropped by query profiler or signal handler because pipe is full or cannot write to pipe.") \
M(QueryProfilerSignalOverruns, "Number of times we drop processing of a query profiler signal due to overrun plus the number of signals that OS has not delivered due to overrun.") \
M(QueryProfilerConcurrencyOverruns, "Number of times we drop processing of a query profiler signal due to too many concurrent query profilers in other threads, which may indicate overload.") \
@@ -436,8 +435,6 @@ The server successfully detected this situation and will download merged part fr
M(ReadBufferFromS3ResetSessions, "Number of HTTP sessions that were reset in ReadBufferFromS3.") \
M(ReadBufferFromS3PreservedSessions, "Number of HTTP sessions that were preserved in ReadBufferFromS3.") \
\
M(ReadWriteBufferFromHTTPPreservedSessions, "Number of HTTP sessions that were preserved in ReadWriteBufferFromHTTP.") \
\
M(WriteBufferFromS3Microseconds, "Time spent on writing to S3.") \
M(WriteBufferFromS3Bytes, "Bytes written to S3.") \
M(WriteBufferFromS3RequestsErrors, "Number of exceptions while writing to S3.") \
@@ -459,7 +456,8 @@ The server successfully detected this situation and will download merged part fr
M(FilesystemCacheLoadMetadataMicroseconds, "Time spent loading filesystem cache metadata") \
M(FilesystemCacheEvictedBytes, "Number of bytes evicted from filesystem cache") \
M(FilesystemCacheEvictedFileSegments, "Number of file segments evicted from filesystem cache") \
M(FilesystemCacheEvictionSkippedFileSegments, "Number of file segments skipped for eviction because of being unreleasable") \
M(FilesystemCacheEvictionSkippedFileSegments, "Number of file segments skipped for eviction because they are in an unreleasable state") \
M(FilesystemCacheEvictionSkippedEvictingFileSegments, "Number of file segments skipped for eviction because they are in an evicting state") \
M(FilesystemCacheEvictionTries, "Number of filesystem cache eviction attempts") \
M(FilesystemCacheLockKeyMicroseconds, "Lock cache key time") \
M(FilesystemCacheLockMetadataMicroseconds, "Lock filesystem cache metadata time") \
@@ -726,6 +724,9 @@ The server successfully detected this situation and will download merged part fr
M(AddressesDiscovered, "Total count of new addresses in dns resolve results for http connections") \
M(AddressesExpired, "Total count of expired addresses which is no longer presented in dns resolve results for http connections") \
M(AddressesMarkedAsFailed, "Total count of addresses which have been marked as faulty due to connection errors for http connections") \
\
M(ReadWriteBufferFromHTTPRequestsSent, "Number of HTTP requests sent by ReadWriteBufferFromHTTP") \
M(ReadWriteBufferFromHTTPBytes, "Total size of payload bytes received and sent by ReadWriteBufferFromHTTP. Doesn't include HTTP headers.") \
#ifdef APPLY_FOR_EXTERNAL_EVENTS

View File

@@ -1,3 +1,4 @@
#include <Common/CurrentThread.h>
#include <Common/ProfileEventsScope.h>
namespace DB

View File

@@ -1,7 +1,8 @@
#pragma once
#include <Common/ProfileEvents.h>
#include <Common/CurrentThread.h>
#include <boost/noncopyable.hpp>
namespace DB
{

View File

@@ -1,15 +1,16 @@
#include "QueryProfiler.h"
#include <IO/WriteHelpers.h>
#include <Common/TraceSender.h>
#include <base/defines.h>
#include <base/errnoToString.h>
#include <base/phdr_cache.h>
#include <Common/CurrentMetrics.h>
#include <Common/Exception.h>
#include <Common/MemoryTracker.h>
#include <Common/StackTrace.h>
#include <Common/thread_local_rng.h>
#include <Common/TraceSender.h>
#include <Common/logger_useful.h>
#include <base/defines.h>
#include <base/phdr_cache.h>
#include <base/errnoToString.h>
#include <Common/thread_local_rng.h>
#include <random>

View File

@@ -15,6 +15,7 @@
#include <Common/logger_useful.h>
#include <Common/Exception.h>
#include <Common/MemoryTracker.h>
#include <Common/thread_local_rng.h>
#include <Common/ThreadFuzzer.h>

Some files were not shown because too many files have changed in this diff.