Merge remote-tracking branch 'blessed/master' into perf_experiment

This commit is contained in:
Raúl Marín 2022-10-27 13:17:07 +02:00
commit 891484b462
98 changed files with 2410 additions and 1047 deletions

View File

@ -466,6 +466,7 @@ jobs:
- BuilderDebTsan
- BuilderDebDebug
runs-on: [self-hosted, style-checker]
if: ${{ success() || failure() }}
steps:
- name: Set envs
run: |
@ -504,6 +505,7 @@ jobs:
- BuilderBinDarwin
- BuilderBinDarwinAarch64
runs-on: [self-hosted, style-checker]
if: ${{ success() || failure() }}
steps:
- name: Set envs
run: |

View File

@ -974,6 +974,7 @@ jobs:
- BuilderDebTsan
- BuilderDebUBsan
runs-on: [self-hosted, style-checker]
if: ${{ success() || failure() }}
steps:
- name: Set envs
run: |
@ -1021,6 +1022,7 @@ jobs:
- BuilderBinClangTidy
- BuilderDebShared
runs-on: [self-hosted, style-checker]
if: ${{ success() || failure() }}
steps:
- name: Set envs
run: |

View File

@ -112,7 +112,7 @@ jobs:
StyleCheck:
needs: DockerHubPush
runs-on: [self-hosted, style-checker]
if: ${{ success() || failure() }}
if: ${{ success() || failure() || always() }}
steps:
- name: Set envs
run: |

View File

@ -541,6 +541,7 @@ jobs:
- BuilderDebMsan
- BuilderDebDebug
runs-on: [self-hosted, style-checker]
if: ${{ success() || failure() }}
steps:
- name: Set envs
run: |
@ -580,6 +581,7 @@ jobs:
- BuilderBinDarwin
- BuilderBinDarwinAarch64
runs-on: [self-hosted, style-checker]
if: ${{ success() || failure() }}
steps:
- name: Set envs
run: |

View File

@ -31,6 +31,8 @@
* Add OpenTelemetry support to ON CLUSTER DDL (require `distributed_ddl_entry_format_version` to be set to 4). [#41484](https://github.com/ClickHouse/ClickHouse/pull/41484) ([Frank Chen](https://github.com/FrankChen021)).
* Added system table `asynchronous_insert_log`. It contains information about asynchronous inserts (including results of queries in fire-and-forget mode (with `wait_for_async_insert=0`)) for better introspection. [#42040](https://github.com/ClickHouse/ClickHouse/pull/42040) ([Anton Popov](https://github.com/CurtizJ)).
* Add support for methods `lz4`, `bz2`, `snappy` in HTTP's `Accept-Encoding` which is a non-standard extension to HTTP protocol. [#42071](https://github.com/ClickHouse/ClickHouse/pull/42071) ([Nikolay Degterinsky](https://github.com/evillique)).
* Adds Morton Coding (ZCurve) encode/decode functions. [#41753](https://github.com/ClickHouse/ClickHouse/pull/41753) ([Constantine Peresypkin](https://github.com/pkit)).
* Add support for `SET setting_name = DEFAULT`. [#42187](https://github.com/ClickHouse/ClickHouse/pull/42187) ([Filatenkov Artur](https://github.com/FArthur-cmd)).
#### Experimental Feature
* Added new infrastructure for query analysis and planning under the `allow_experimental_analyzer` setting. [#31796](https://github.com/ClickHouse/ClickHouse/pull/31796) ([Maksim Kita](https://github.com/kitaisreal)).
@ -66,8 +68,7 @@
* Allow readable size values (like `1TB`) in cache config. [#41688](https://github.com/ClickHouse/ClickHouse/pull/41688) ([Kseniia Sumarokova](https://github.com/kssenii)).
* ClickHouse could cache stale DNS entries for some period of time (15 seconds by default) until the cache won't be updated asynchronously. During these periods ClickHouse can nevertheless try to establish a connection and produce errors. This behavior is fixed. [#41707](https://github.com/ClickHouse/ClickHouse/pull/41707) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add interactive history search with fzf-like utility (fzf/sk) for `clickhouse-client`/`clickhouse-local` (note you can use `FZF_DEFAULT_OPTS`/`SKIM_DEFAULT_OPTIONS` to additionally configure the behavior). [#41730](https://github.com/ClickHouse/ClickHouse/pull/41730) ([Azat Khuzhin](https://github.com/azat)).
*
Only allow clients connecting to a secure server with an invalid certificate only to proceed with the '--accept-certificate' flag. [#41743](https://github.com/ClickHouse/ClickHouse/pull/41743) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Only allow clients connecting to a secure server with an invalid certificate only to proceed with the '--accept-certificate' flag. [#41743](https://github.com/ClickHouse/ClickHouse/pull/41743) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add function `tryBase58Decode`, similar to the existing function `tryBase64Decode`. [#41824](https://github.com/ClickHouse/ClickHouse/pull/41824) ([Robert Schulze](https://github.com/rschu1ze)).
* Improve feedback when replacing partition with different primary key. Fixes [#34798](https://github.com/ClickHouse/ClickHouse/issues/34798). [#41838](https://github.com/ClickHouse/ClickHouse/pull/41838) ([Salvatore](https://github.com/tbsal)).
* Fix parallel parsing: segmentator now checks `max_block_size`. This fixed memory overallocation in case of parallel parsing and small LIMIT. [#41852](https://github.com/ClickHouse/ClickHouse/pull/41852) ([Vitaly Baranov](https://github.com/vitlibar)).
@ -86,6 +87,8 @@ Only allow clients connecting to a secure server with an invalid certificate onl
* Fix rarely invalid cast of aggregate state types with complex types such as Decimal. This fixes [#42408](https://github.com/ClickHouse/ClickHouse/issues/42408). [#42417](https://github.com/ClickHouse/ClickHouse/pull/42417) ([Amos Bird](https://github.com/amosbird)).
* Allow to use `Date32` arguments for `dateName` function. [#42554](https://github.com/ClickHouse/ClickHouse/pull/42554) ([Roman Vasin](https://github.com/rvasin)).
* Now filters with NULL literals will be used during index analysis. [#34063](https://github.com/ClickHouse/ClickHouse/issues/34063). [#41842](https://github.com/ClickHouse/ClickHouse/pull/41842) ([Amos Bird](https://github.com/amosbird)).
* Merge parts if every part in the range is older than a certain threshold. The threshold can be set by using `min_age_to_force_merge_seconds`. This closes [#35836](https://github.com/ClickHouse/ClickHouse/issues/35836). [#42423](https://github.com/ClickHouse/ClickHouse/pull/42423) ([Antonio Andelic](https://github.com/antonio2368)). This is continuation of [#39550i](https://github.com/ClickHouse/ClickHouse/pull/39550) by [@fastio](https://github.com/fastio) who implemented most of the logic.
* Improve the time to recover lost keeper connections. [#42541](https://github.com/ClickHouse/ClickHouse/pull/42541) ([Raúl Marín](https://github.com/Algunenano)).
#### Build/Testing/Packaging Improvement
* Add fuzzer for table definitions [#40096](https://github.com/ClickHouse/ClickHouse/pull/40096) ([Anton Popov](https://github.com/CurtizJ)). This represents the biggest advancement for ClickHouse testing in this year so far.

View File

@ -10,9 +10,11 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 22.10 | ✔️ |
| 22.9 | ✔️ |
| 22.8 | ✔️ |
| 22.7 | ✔️ |
| 22.6 | ✔️ |
| 22.7 | |
| 22.6 | |
| 22.5 | ❌ |
| 22.4 | ❌ |
| 22.3 | ✔️ |

View File

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54467)
SET(VERSION_REVISION 54468)
SET(VERSION_MAJOR 22)
SET(VERSION_MINOR 10)
SET(VERSION_MINOR 11)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 3030d4c7ff09ec44ab07d0a8069ea923227288a1)
SET(VERSION_DESCRIBE v22.10.1.1-testing)
SET(VERSION_STRING 22.10.1.1)
SET(VERSION_GITHASH 98ab5a3c189232ea2a3dddb9d2be7196ae8b3434)
SET(VERSION_DESCRIBE v22.11.1.1-testing)
SET(VERSION_STRING 22.11.1.1)
# end of autochange

2
contrib/libcxx vendored

@ -1 +1 @@
Subproject commit 172b2ae074f6755145b91c53a95c8540c1468239
Subproject commit 4db7f838afd3139eb3761694b04d31275df45d2d

View File

@ -25,6 +25,7 @@ set(SRCS
"${LIBCXX_SOURCE_DIR}/src/ios.cpp"
"${LIBCXX_SOURCE_DIR}/src/ios.instantiations.cpp"
"${LIBCXX_SOURCE_DIR}/src/iostream.cpp"
"${LIBCXX_SOURCE_DIR}/src/legacy_debug_handler.cpp"
"${LIBCXX_SOURCE_DIR}/src/legacy_pointer_safety.cpp"
"${LIBCXX_SOURCE_DIR}/src/locale.cpp"
"${LIBCXX_SOURCE_DIR}/src/memory.cpp"
@ -49,6 +50,7 @@ set(SRCS
"${LIBCXX_SOURCE_DIR}/src/valarray.cpp"
"${LIBCXX_SOURCE_DIR}/src/variant.cpp"
"${LIBCXX_SOURCE_DIR}/src/vector.cpp"
"${LIBCXX_SOURCE_DIR}/src/verbose_abort.cpp"
)
add_library(cxx ${SRCS})

2
contrib/libcxxabi vendored

@ -1 +1 @@
Subproject commit 6eb7cc7a7bdd779e6734d1b9fb451df2274462d7
Subproject commit a736a6b3c6a7b8aae2ebad629ca21b2c55b4820e

View File

@ -9,6 +9,7 @@ set(SRCS
"${LIBCXXABI_SOURCE_DIR}/src/cxa_exception_storage.cpp"
"${LIBCXXABI_SOURCE_DIR}/src/cxa_guard.cpp"
"${LIBCXXABI_SOURCE_DIR}/src/cxa_handlers.cpp"
# "${LIBCXXABI_SOURCE_DIR}/src/cxa_noexception.cpp"
"${LIBCXXABI_SOURCE_DIR}/src/cxa_personality.cpp"
"${LIBCXXABI_SOURCE_DIR}/src/cxa_thread_atexit.cpp"
"${LIBCXXABI_SOURCE_DIR}/src/cxa_vector.cpp"

2
contrib/rocksdb vendored

@ -1 +1 @@
Subproject commit e7c2b2f7bcf3b4b33892a1a6d25c32a93edfbdb9
Subproject commit 2c8998e26c6d46b27c710d7829c3a15e34959f70

2
contrib/zlib-ng vendored

@ -1 +1 @@
Subproject commit bffad6f6fe74d6a2f92e2668390664a926c68733
Subproject commit 50f0eae1a411764cd6d1e85b3ce471438acd3c1c

View File

@ -33,7 +33,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="22.9.3.18"
ARG VERSION="22.10.1.1877"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -21,7 +21,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="22.9.3.18"
ARG VERSION="22.10.1.1877"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image

View File

@ -0,0 +1,352 @@
---
sidebar_position: 1
sidebar_label: 2022
---
# 2022 Changelog
### ClickHouse release v22.10.1.1877-stable (98ab5a3c189) FIXME as compared to v22.9.1.2603-stable (3030d4c7ff0)
#### Backward Incompatible Change
* Rename cache commands: `show caches` -> `show filesystem caches`, `describe cache` -> `describe filesystem cache`. [#41508](https://github.com/ClickHouse/ClickHouse/pull/41508) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Remove support for the `WITH TIMEOUT` section for `LIVE VIEW`. This closes [#40557](https://github.com/ClickHouse/ClickHouse/issues/40557). [#42173](https://github.com/ClickHouse/ClickHouse/pull/42173) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### New Feature
* Add Rust code support into ClickHouse with BLAKE3 hash-function library as an example. [#33435](https://github.com/ClickHouse/ClickHouse/pull/33435) ([BoloniniD](https://github.com/BoloniniD)).
* This is the initial implement of Kusto Query Language. (MVP). [#37961](https://github.com/ClickHouse/ClickHouse/pull/37961) ([Yong Wang](https://github.com/kashwy)).
* * Support limiting of temporary data stored on disk using settings `max_temporary_data_on_disk_size_for_user`/`max_temporary_data_on_disk_size_for_query` . [#40893](https://github.com/ClickHouse/ClickHouse/pull/40893) ([Vladimir C](https://github.com/vdimir)).
* Support Java integers hashing in `javaHash`. [#41131](https://github.com/ClickHouse/ClickHouse/pull/41131) ([JackyWoo](https://github.com/JackyWoo)).
* This PR is to support the OpenSSL in-house build like the BoringSSL submodule. Build flag i.e. ENABLE_CH_BUNDLE_BORINGSSL is used to choose between BoringSSL and OpenSSL. By default, the BoringSSL in-house build will be used. [#41142](https://github.com/ClickHouse/ClickHouse/pull/41142) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Composable protocol configuration is added. [#41198](https://github.com/ClickHouse/ClickHouse/pull/41198) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add OpenTelemetry support to ON CLUSTER DDL(require `distributed_ddl_entry_format_version` to be set to 4). [#41484](https://github.com/ClickHouse/ClickHouse/pull/41484) ([Frank Chen](https://github.com/FrankChen021)).
* Add setting `format_json_object_each_row_column_for_object_name` to write/parse object name as column value in JSONObjectEachRow format. [#41703](https://github.com/ClickHouse/ClickHouse/pull/41703) ([Kruglov Pavel](https://github.com/Avogar)).
* adds Morton Coding (ZCurve) encode/decode functions. [#41753](https://github.com/ClickHouse/ClickHouse/pull/41753) ([Constantine Peresypkin](https://github.com/pkit)).
* Implement support for different UUID binary formats with support for the two most prevalent ones: the default big-endian and Microsoft's mixed-endian as specified in [RFC 4122](https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.1). [#42108](https://github.com/ClickHouse/ClickHouse/pull/42108) ([ltrk2](https://github.com/ltrk2)).
* Added an aggregate function `analysisOfVariance` (`anova`) to perform a statistical test over several groups of normally distributed observations to find out whether all groups have the same mean or not. Original PR [#37872](https://github.com/ClickHouse/ClickHouse/issues/37872). [#42131](https://github.com/ClickHouse/ClickHouse/pull/42131) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add support for `SET setting_name = DEFAULT`. [#42187](https://github.com/ClickHouse/ClickHouse/pull/42187) ([Filatenkov Artur](https://github.com/FArthur-cmd)).
* * Add `URL` Functions which conform rfc. Functions include: `cutToFirstSignificantSubdomainCustomRFC`, `cutToFirstSignificantSubdomainCustomWithWWWRFC`, `cutToFirstSignificantSubdomainRFC`, `cutToFirstSignificantSubdomainWithWWWRFC`, `domainRFC`, `domainWithoutWWWRFC`, `firstSignificantSubdomainCustomRFC`, `firstSignificantSubdomainRFC`, `portRFC`, `topLevelDomainRFC`. [#42274](https://github.com/ClickHouse/ClickHouse/pull/42274) ([Quanfa Fu](https://github.com/dentiscalprum)).
* Added functions (`randUniform`, `randNormal`, `randLogNormal`, `randExponential`, `randChiSquared`, `randStudentT`, `randFisherF`, `randBernoulli`, `randBinomial`, `randNegativeBinomial`, `randPoisson` ) to generate random values according to the specified distributions. This closes [#21834](https://github.com/ClickHouse/ClickHouse/issues/21834). [#42411](https://github.com/ClickHouse/ClickHouse/pull/42411) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### Performance Improvement
* Implement operator precedence element parser to resolve stack overflow issues and make the required stack size smaller. [#34892](https://github.com/ClickHouse/ClickHouse/pull/34892) ([Nikolay Degterinsky](https://github.com/evillique)).
* DISTINCT in order optimization leverage sorting properties of data streams. This improvement will enable reading in order for DISTINCT if applicable (before it was necessary to provide ORDER BY for columns in DISTINCT). [#41014](https://github.com/ClickHouse/ClickHouse/pull/41014) ([Igor Nikonov](https://github.com/devcrafter)).
* ColumnVector: optimize UInt8 index with AVX512VBMI. [#41247](https://github.com/ClickHouse/ClickHouse/pull/41247) ([Guo Wangyang](https://github.com/guowangy)).
* The performance experiments of **SSB** (Star Schema Benchmark) on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) shows that this change could bring a **2.95x** improvement of the geomean of all subcases' QPS. [#41675](https://github.com/ClickHouse/ClickHouse/pull/41675) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
* Fixed slowness in JSONExtract with LowCardinality(String) tuples. [#41726](https://github.com/ClickHouse/ClickHouse/pull/41726) ([AlfVII](https://github.com/AlfVII)).
* Add ldapr capabilities to AArch64 builds. This is supported from Graviton 2+, Azure and GCP instances. Only appeared in clang-15 [not so long ago](https://github.com/llvm/llvm-project/commit/9609b5daffe9fd28d83d83da895abc5113f76c24). [#41778](https://github.com/ClickHouse/ClickHouse/pull/41778) ([Daniel Kutenin](https://github.com/danlark1)).
* Improve performance when comparing strings and one argument is empty constant string. [#41870](https://github.com/ClickHouse/ClickHouse/pull/41870) ([Jiebin Sun](https://github.com/jiebinn)).
* optimize insertFrom of ColumnAggregateFunction to share Aggregate State in some cases. [#41960](https://github.com/ClickHouse/ClickHouse/pull/41960) ([flynn](https://github.com/ucasfl)).
* Relax the "Too many parts" threshold. This closes [#6551](https://github.com/ClickHouse/ClickHouse/issues/6551). Now ClickHouse will allow more parts in a partition if the average part size is large enough (at least 10 GiB). This allows to have up to petabytes of data in a single partition of a single table on a single server, which is possible using disk shelves or object storage. [#42002](https://github.com/ClickHouse/ClickHouse/pull/42002) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make writing to AzureBlobStorage more efficient (respect `max_single_part_upload_size` instead of writing a block per each buffer size). Inefficiency mentioned in [#41754](https://github.com/ClickHouse/ClickHouse/issues/41754). [#42041](https://github.com/ClickHouse/ClickHouse/pull/42041) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Make thread ids in the process list and query_log unique to avoid waste. [#42180](https://github.com/ClickHouse/ClickHouse/pull/42180) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Improvement
* Added new infrastructure for query analysis and planning under `allow_experimental_analyzer` setting. [#31796](https://github.com/ClickHouse/ClickHouse/pull/31796) ([Maksim Kita](https://github.com/kitaisreal)).
* * Support expression `(EXPLAIN SELECT ...)` in a subquery. Queries like `SELECT * FROM (EXPLAIN PIPELINE SELECT col FROM TABLE ORDER BY col)` became valid. [#40630](https://github.com/ClickHouse/ClickHouse/pull/40630) ([Vladimir C](https://github.com/vdimir)).
* Currently changing `async_insert_max_data_size` or `async_insert_busy_timeout_ms` in scope of query makes no sense and this leads to bad user experience. E.g. user wants to insert data rarely and he doesn't have an access to server config to tune default settings. [#40668](https://github.com/ClickHouse/ClickHouse/pull/40668) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Embedded Keeper will always start in the background allowing ClickHouse to start without achieving quorum. [#40991](https://github.com/ClickHouse/ClickHouse/pull/40991) ([Antonio Andelic](https://github.com/antonio2368)).
* Improvements for reading from remote filesystems, made threadpool size for reads/writes configurable. Closes [#41070](https://github.com/ClickHouse/ClickHouse/issues/41070). [#41011](https://github.com/ClickHouse/ClickHouse/pull/41011) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Made reestablishing a new connection more reactive in case of expiration of the previous one. Previously there was a task which spawns every minute by default and thus a table could be in readonly state for about this time. [#41092](https://github.com/ClickHouse/ClickHouse/pull/41092) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Support all combinators combination in WindowTransform/arratReduce*/initializeAggregation/aggregate functions versioning. Previously combinators like `ForEach/Resample/Map` didn't work in these places, using them led to exception like`State function ... inserts results into non-state column`. [#41107](https://github.com/ClickHouse/ClickHouse/pull/41107) ([Kruglov Pavel](https://github.com/Avogar)).
* Now projections can be used with zero copy replication. [#41147](https://github.com/ClickHouse/ClickHouse/pull/41147) ([alesapin](https://github.com/alesapin)).
* - Add function tryDecrypt that returns NULL when decrypt fail (e.g. decrypt with incorrect key) instead of throwing exception. [#41206](https://github.com/ClickHouse/ClickHouse/pull/41206) ([Duc Canh Le](https://github.com/canhld94)).
* Add the `unreserved_space` column to the `system.disks` table to check how much space is not taken by reservations per disk. [#41254](https://github.com/ClickHouse/ClickHouse/pull/41254) ([filimonov](https://github.com/filimonov)).
* Support s3 authorisation headers from ast arguments. [#41261](https://github.com/ClickHouse/ClickHouse/pull/41261) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add setting 'allow_implicit_no_password' that forbids creating a user with no password unless 'IDENTIFIED WITH no_password' is explicitly specified. [#41341](https://github.com/ClickHouse/ClickHouse/pull/41341) ([Nikolay Degterinsky](https://github.com/evillique)).
* keeper-improvement: add support for uploading snapshots to S3. S3 information can be defined inside `keeper_server.s3_snapshot`. [#41342](https://github.com/ClickHouse/ClickHouse/pull/41342) ([Antonio Andelic](https://github.com/antonio2368)).
* Add support for MultiRead in Keeper and internal ZooKeeper client. [#41410](https://github.com/ClickHouse/ClickHouse/pull/41410) ([Antonio Andelic](https://github.com/antonio2368)).
* add a support for decimal type comparing with floating point literal in IN operator. [#41544](https://github.com/ClickHouse/ClickHouse/pull/41544) ([liang.huang](https://github.com/lhuang09287750)).
* Allow readable size values in cache config. [#41688](https://github.com/ClickHouse/ClickHouse/pull/41688) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Check file path for path traversal attacks in errors logger for input formats. [#41694](https://github.com/ClickHouse/ClickHouse/pull/41694) ([Kruglov Pavel](https://github.com/Avogar)).
* ClickHouse could cache stale DNS entries for some period of time (15 seconds by default) until the cache won't be updated asynchronously. During these period ClickHouse can nevertheless try to establish a connection and produce errors. This behaviour is fixed. [#41707](https://github.com/ClickHouse/ClickHouse/pull/41707) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add interactive history search with fzf-like utility (fzf/sk) for `clickhouse-client`/`clickhouse-local` (note you can use `FZF_DEFAULT_OPTS`/`SKIM_DEFAULT_OPTIONS` to additionally configure the behavior). [#41730](https://github.com/ClickHouse/ClickHouse/pull/41730) ([Azat Khuzhin](https://github.com/azat)).
* For client when connecting to a secure server with invalid certificate only allow to proceed with '--accept-certificate' flag. [#41743](https://github.com/ClickHouse/ClickHouse/pull/41743) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add function "tryBase58Decode()", similar to the existing function "tryBase64Decode()". [#41824](https://github.com/ClickHouse/ClickHouse/pull/41824) ([Robert Schulze](https://github.com/rschu1ze)).
* Improve feedback when replacing partition with different primary key. Fixes [#34798](https://github.com/ClickHouse/ClickHouse/issues/34798). [#41838](https://github.com/ClickHouse/ClickHouse/pull/41838) ([Salvatore](https://github.com/tbsal)).
* Replace back `clickhouse su` command with `sudo -u` in start in order to respect limits in `/etc/security/limits.conf`. [#41847](https://github.com/ClickHouse/ClickHouse/pull/41847) ([Eugene Konkov](https://github.com/ekonkov)).
* Fix parallel parsing: segmentator now checks max_block_size. [#41852](https://github.com/ClickHouse/ClickHouse/pull/41852) ([Vitaly Baranov](https://github.com/vitlibar)).
* Don't report TABLE_IS_DROPPED exception in order to skip table in case is was just dropped. [#41908](https://github.com/ClickHouse/ClickHouse/pull/41908) ([AlfVII](https://github.com/AlfVII)).
* Improve option enable_extended_results_for_datetime_functions to return results of type DateTime64 for functions toStartOfDay, toStartOfHour, toStartOfFifteenMinutes, toStartOfTenMinutes, toStartOfFiveMinutes, toStartOfMinute and timeSlot. [#41910](https://github.com/ClickHouse/ClickHouse/pull/41910) ([Roman Vasin](https://github.com/rvasin)).
* Improve DateTime type inference for text formats. Now it respect setting `date_time_input_format` and doesn't try to infer datetimes from numbers as timestamps. Closes [#41389](https://github.com/ClickHouse/ClickHouse/issues/41389) Closes [#42206](https://github.com/ClickHouse/ClickHouse/issues/42206). [#41912](https://github.com/ClickHouse/ClickHouse/pull/41912) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove confusing warning when inserting with `perform_ttl_move_on_insert`=false. [#41980](https://github.com/ClickHouse/ClickHouse/pull/41980) ([Vitaly Baranov](https://github.com/vitlibar)).
* Allow user to write `countState(*)` similar to `count(*)`. This closes [#9338](https://github.com/ClickHouse/ClickHouse/issues/9338). [#41983](https://github.com/ClickHouse/ClickHouse/pull/41983) ([Amos Bird](https://github.com/amosbird)).
* - Fix rankCorr size overflow. [#42020](https://github.com/ClickHouse/ClickHouse/pull/42020) ([Duc Canh Le](https://github.com/canhld94)).
* Added an option to specify an arbitrary string as an environment name in the Sentry's config for more handy reports. [#42037](https://github.com/ClickHouse/ClickHouse/pull/42037) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Added system table `asynchronous_insert_log `. It contains information about asynchronous inserts (including results of queries in fire-and-forget mode (with `wait_for_async_insert=0`)) for better introspection. [#42040](https://github.com/ClickHouse/ClickHouse/pull/42040) ([Anton Popov](https://github.com/CurtizJ)).
* Fix parsing out-of-range Date from CSV:. [#42044](https://github.com/ClickHouse/ClickHouse/pull/42044) ([Andrey Zvonov](https://github.com/zvonand)).
* parseDataTimeBestEffort support comma between date and time. Closes [#42038](https://github.com/ClickHouse/ClickHouse/issues/42038). [#42049](https://github.com/ClickHouse/ClickHouse/pull/42049) ([flynn](https://github.com/ucasfl)).
* Add support for methods lz4, bz2, snappy in 'Accept-Encoding'. [#42071](https://github.com/ClickHouse/ClickHouse/pull/42071) ([Nikolay Degterinsky](https://github.com/evillique)).
* Various minor fixes for BLAKE3 function. [#42073](https://github.com/ClickHouse/ClickHouse/pull/42073) ([BoloniniD](https://github.com/BoloniniD)).
* Improved stale replica recovery process for `ReplicatedMergeTree`. If lost replica have some parts which absent on a healthy replica, but these parts should appear in future according to replication queue of the healthy replica, then lost replica will keep such parts instead of detaching them. [#42134](https://github.com/ClickHouse/ClickHouse/pull/42134) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Support BACKUP to S3 with as-is path/data structure. [#42232](https://github.com/ClickHouse/ClickHouse/pull/42232) ([Azat Khuzhin](https://github.com/azat)).
* Add a possibility to use Date32 arguments for date_diff function. Fix issue in date_diff function when using DateTime64 arguments with start date before Unix epoch and end date after Unix epoch. [#42308](https://github.com/ClickHouse/ClickHouse/pull/42308) ([Roman Vasin](https://github.com/rvasin)).
* When uploading big parts to minio, 'Complete Multipart Upload' can take a long time. Minio sends heartbeats every 10 seconds (see https://github.com/minio/minio/pull/7198). But clickhouse times out earlier, because the default send/receive timeout is [set](https://github.com/ClickHouse/ClickHouse/blob/cc24fcd6d5dfb67f5f66f5483e986bd1010ad9cf/src/IO/S3/PocoHTTPClient.cpp#L123) to 5 seconds. [#42321](https://github.com/ClickHouse/ClickHouse/pull/42321) ([filimonov](https://github.com/filimonov)).
* Add `S3` as a new type of the destination of backups. [#42333](https://github.com/ClickHouse/ClickHouse/pull/42333) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix rarely invalid cast of aggregate state types with complex types such as Decimal. This fixes [#42408](https://github.com/ClickHouse/ClickHouse/issues/42408). [#42417](https://github.com/ClickHouse/ClickHouse/pull/42417) ([Amos Bird](https://github.com/amosbird)).
* Support skipping cache completely (both download to cache and reading cached data) in case the requested read range exceeds the threshold defined by cache setting `bypass_cache_threashold`, requires to be enabled with `enable_bypass_cache_with_threshold`). [#42418](https://github.com/ClickHouse/ClickHouse/pull/42418) ([Han Shukai](https://github.com/KinderRiven)).
* Merge parts if every part in the range is older than a certain threshold. The threshold can be set by using `min_age_to_force_merge_seconds`. This closes [#35836](https://github.com/ClickHouse/ClickHouse/issues/35836). [#42423](https://github.com/ClickHouse/ClickHouse/pull/42423) ([Antonio Andelic](https://github.com/antonio2368)).
* Enabled CompiledExpressionCache in clickhouse-local. [#42477](https://github.com/ClickHouse/ClickHouse/pull/42477) ([AlfVII](https://github.com/AlfVII)).
* Remove support for the `{database}` macro from the client's prompt. It was displayed incorrectly if the database was unspecified and it was not updated on `USE` statements. This closes [#25891](https://github.com/ClickHouse/ClickHouse/issues/25891). [#42508](https://github.com/ClickHouse/ClickHouse/pull/42508) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* - Improve the time to recover lost keeper connections. [#42541](https://github.com/ClickHouse/ClickHouse/pull/42541) ([Raúl Marín](https://github.com/Algunenano)).
* Allow to use Date32 arguments for dateName function. [#42554](https://github.com/ClickHouse/ClickHouse/pull/42554) ([Roman Vasin](https://github.com/rvasin)).
#### Bug Fix
* Now filters with NULL literals will be used during index analysis. This closes https://github.com/ClickHouse/ClickHouse/pull/41814 [#34063](https://github.com/ClickHouse/ClickHouse/issues/34063). [#41842](https://github.com/ClickHouse/ClickHouse/pull/41842) ([Amos Bird](https://github.com/amosbird)).
* - Choose correct aggregation method for LowCardinality with BigInt. [#42342](https://github.com/ClickHouse/ClickHouse/pull/42342) ([Duc Canh Le](https://github.com/canhld94)).
* Fix using subqueries in row policy filters. This PR fixes [#32463](https://github.com/ClickHouse/ClickHouse/issues/32463). [#42562](https://github.com/ClickHouse/ClickHouse/pull/42562) ([Vitaly Baranov](https://github.com/vitlibar)).
#### Build/Testing/Packaging Improvement
* Added support of WHERE clause generation to AST Fuzzer and possibility to add or remove ORDER BY and WHERE clause. [#38519](https://github.com/ClickHouse/ClickHouse/pull/38519) ([Ilya Yatsishin](https://github.com/qoega)).
* Aarch64 binaries now require at least ARMv8.2, released in 2016. Most notably, this enables use of ARM LSE, i.e. native atomic operations. Also, CMake build option "NO_ARMV81_OR_HIGHER" has been added to allow compilation of binaries for older ARMv8.0 hardware, e.g. Raspberry Pi 4. [#41610](https://github.com/ClickHouse/ClickHouse/pull/41610) ([Robert Schulze](https://github.com/rschu1ze)).
* After updating runners to 22.04 cgroups stopped to work in privileged mode, here's the issue https://github.com/moby/moby/issues/42275#issuecomment-1115055846. [#41857](https://github.com/ClickHouse/ClickHouse/pull/41857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Allow building ClickHouse with Musl (small changes after it was already supported but broken). [#41987](https://github.com/ClickHouse/ClickHouse/pull/41987) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* - Add the `$CLICKHOUSE_CRONFILE` file checking to avoid running the `sed` command to get the file not found error. [#42081](https://github.com/ClickHouse/ClickHouse/pull/42081) ([Chun-Sheng, Li](https://github.com/peter279k)).
* Update cctz to the latest master, update tzdb to 2020e. [#42273](https://github.com/ClickHouse/ClickHouse/pull/42273) ([Dom Del Nano](https://github.com/ddelnano)).
* Update tzdata to 2022e to support the new timezone changes. Palestine transitions are now Saturdays at 02:00. Simplify three Ukraine zones into one. Jordan and Syria switch from +02/+03 with DST to year-round +03. (https://data.iana.org/time-zones/tzdb/NEWS). This closes [#42252](https://github.com/ClickHouse/ClickHouse/issues/42252). [#42327](https://github.com/ClickHouse/ClickHouse/pull/42327) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix power8 support. [#42462](https://github.com/ClickHouse/ClickHouse/pull/42462) ([Boris Kuschel](https://github.com/bkuschel)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Several fixes for DiskWeb. [#41652](https://github.com/ClickHouse/ClickHouse/pull/41652) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixes issue when docker run will fail if "https_port" is not present in config. [#41693](https://github.com/ClickHouse/ClickHouse/pull/41693) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Mutations were not cancelled properly on server shutdown or `SYSTEM STOP MERGES` query and cancellation might take long time, it's fixed. [#41699](https://github.com/ClickHouse/ClickHouse/pull/41699) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix wrong result of queries with `ORDER BY` or `GROUP BY` by columns from prefix of sorting key, wrapped into monotonic functions, with enable "read in order" optimization (settings `optimize_read_in_order` and `optimize_aggregation_in_order`). [#41701](https://github.com/ClickHouse/ClickHouse/pull/41701) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible crash in `SELECT` from `Merge` table with enabled `optimize_monotonous_functions_in_order_by` setting. Fixes [#41269](https://github.com/ClickHouse/ClickHouse/issues/41269). [#41740](https://github.com/ClickHouse/ClickHouse/pull/41740) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed "Part ... intersects part ..." error that might happen in extremely rare cases if replica was restarted just after detaching some part as broken. [#41741](https://github.com/ClickHouse/ClickHouse/pull/41741) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Don't allow to create or alter merge tree tables with virtual column name _row_exists, which is reserved for lightweight delete. Fixed [#41716](https://github.com/ClickHouse/ClickHouse/issues/41716). [#41763](https://github.com/ClickHouse/ClickHouse/pull/41763) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix a bug that CORS headers are missing in some HTTP responses. [#41792](https://github.com/ClickHouse/ClickHouse/pull/41792) ([Frank Chen](https://github.com/FrankChen021)).
* 22.9 might fail to startup `ReplicatedMergeTree` table if that table was created by 20.3 or older version and was never altered, it's fixed. Fixes [#41742](https://github.com/ClickHouse/ClickHouse/issues/41742). [#41796](https://github.com/ClickHouse/ClickHouse/pull/41796) ([Alexander Tokmakov](https://github.com/tavplubix)).
* When the batch sending fails for some reason, it cannot be automatically recovered, and if it is not processed in time, it will lead to accumulation, and the printed error message will become longer and longer, which will cause the http thread to block. [#41813](https://github.com/ClickHouse/ClickHouse/pull/41813) ([zhongyuankai](https://github.com/zhongyuankai)).
* Fix compact parts with compressed marks setting. Fixes [#41783](https://github.com/ClickHouse/ClickHouse/issues/41783) and [#41746](https://github.com/ClickHouse/ClickHouse/issues/41746). [#41823](https://github.com/ClickHouse/ClickHouse/pull/41823) ([alesapin](https://github.com/alesapin)).
* Old versions of Replicated database doesn't have a special marker in [Zoo]Keeper. We need to check only whether the node contains come obscure data instead of special mark. [#41875](https://github.com/ClickHouse/ClickHouse/pull/41875) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix possible exception in fs cache. [#41884](https://github.com/ClickHouse/ClickHouse/pull/41884) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix use_environment_credentials for s3 table function. [#41970](https://github.com/ClickHouse/ClickHouse/pull/41970) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixed "Directory already exists and is not empty" error on detaching broken part that might prevent `ReplicatedMergeTree` table from starting replication. Fixes [#40957](https://github.com/ClickHouse/ClickHouse/issues/40957). [#41981](https://github.com/ClickHouse/ClickHouse/pull/41981) ([Alexander Tokmakov](https://github.com/tavplubix)).
* toDateTime64() now returns the same output with negative integer and float arguments. [#42025](https://github.com/ClickHouse/ClickHouse/pull/42025) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix write into AzureBlobStorage. Partially closes [#41754](https://github.com/ClickHouse/ClickHouse/issues/41754). [#42034](https://github.com/ClickHouse/ClickHouse/pull/42034) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix the bzip2 decoding issue for specific bzip2 files. [#42046](https://github.com/ClickHouse/ClickHouse/pull/42046) ([Nikolay Degterinsky](https://github.com/evillique)).
* - Fix SQL function "toLastDayOfMonth()" with setting "enable_extended_results_for_datetime_functions = 1" at the beginning of the extended range (January 1900). - Fix SQL function "toRelativeWeekNum()" with setting "enable_extended_results_for_datetime_functions = 1" at the end of extended range (December 2299). - Improve the performance of for SQL functions "toISOYear()", "toFirstDayNumOfISOYearIndex()" and "toYearWeekOfNewyearMode()" by avoiding unnecessary index arithmetics. [#42084](https://github.com/ClickHouse/ClickHouse/pull/42084) ([Roman Vasin](https://github.com/rvasin)).
* The maximum size of fetches for each table accidentally was set to 8 while the pool size could be bigger. Now the maximum size of fetches for table is equal to the pool size. [#42090](https://github.com/ClickHouse/ClickHouse/pull/42090) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* A table might be shut down and a dictionary might be detached before checking if can be dropped without breaking dependencies between table, it's fixed. Fixes [#41982](https://github.com/ClickHouse/ClickHouse/issues/41982). [#42106](https://github.com/ClickHouse/ClickHouse/pull/42106) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix bad inefficiency of `remote_filesystem_read_method=read` with filesystem cache. Closes [#42125](https://github.com/ClickHouse/ClickHouse/issues/42125). [#42129](https://github.com/ClickHouse/ClickHouse/pull/42129) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible timeout exception for distributed queries with use_hedged_requests=0. [#42130](https://github.com/ClickHouse/ClickHouse/pull/42130) ([Azat Khuzhin](https://github.com/azat)).
* Fixed a minor bug inside function `runningDifference` in case of using it with `Date32` type. Previously `Date` was used and it may cause some logical errors like `Bad cast from type DB::ColumnVector<int> to DB::ColumnVector<unsigned short>'`. [#42143](https://github.com/ClickHouse/ClickHouse/pull/42143) ([Alfred Xu](https://github.com/sperlingxx)).
* Fix reusing of files > 4GB from base backup. [#42146](https://github.com/ClickHouse/ClickHouse/pull/42146) ([Azat Khuzhin](https://github.com/azat)).
* DISTINCT in order fails with LOGICAL_ERROR if first column in sorting key contains function. [#42186](https://github.com/ClickHouse/ClickHouse/pull/42186) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix a bug with projections and the `aggregate_functions_null_for_empty` setting. This bug is very rare and appears only if you enable the `aggregate_functions_null_for_empty` setting in the server's config. This closes [#41647](https://github.com/ClickHouse/ClickHouse/issues/41647). [#42198](https://github.com/ClickHouse/ClickHouse/pull/42198) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* - Fix read from buffer with read in order desc. [#42236](https://github.com/ClickHouse/ClickHouse/pull/42236) ([Duc Canh Le](https://github.com/canhld94)).
* Fix a bug which prevents ClickHouse to start when background_pool_size setting is set on default profile but background_merges_mutations_concurrency_ratio is not. [#42315](https://github.com/ClickHouse/ClickHouse/pull/42315) ([nvartolomei](https://github.com/nvartolomei)).
* `ALTER UPDATE` of attached part (with columns different from table schema) could create an invalid `columns.txt` metadata on disk. Reading from such part could fail with errors or return invalid data. Fixes [#42161](https://github.com/ClickHouse/ClickHouse/issues/42161). [#42319](https://github.com/ClickHouse/ClickHouse/pull/42319) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Setting `additional_table_filters` were not applied to `Distributed` storage. Fixes [#41692](https://github.com/ClickHouse/ClickHouse/issues/41692). [#42322](https://github.com/ClickHouse/ClickHouse/pull/42322) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix a data race in query finish/cancel. This closes [#42346](https://github.com/ClickHouse/ClickHouse/issues/42346). [#42362](https://github.com/ClickHouse/ClickHouse/pull/42362) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* This reverts [#40217](https://github.com/ClickHouse/ClickHouse/issues/40217) which introduced a regression in date/time functions. [#42367](https://github.com/ClickHouse/ClickHouse/pull/42367) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix assert cast in join on falsy condition, Close [#42380](https://github.com/ClickHouse/ClickHouse/issues/42380). [#42407](https://github.com/ClickHouse/ClickHouse/pull/42407) ([Vladimir C](https://github.com/vdimir)).
* Fix buffer overflow in the processing of Decimal data types. This closes [#42451](https://github.com/ClickHouse/ClickHouse/issues/42451). [#42465](https://github.com/ClickHouse/ClickHouse/pull/42465) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* `AggregateFunctionQuantile` now correctly works with UInt128 columns. Previously, the quantile state interpreted `UInt128` columns as `Int128` which could have led to incorrect results. [#42473](https://github.com/ClickHouse/ClickHouse/pull/42473) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix bad_assert during INSERT into Annoy indexes over non-Float32 columns. [#42485](https://github.com/ClickHouse/ClickHouse/pull/42485) ([Robert Schulze](https://github.com/rschu1ze)).
* This closes [#42453](https://github.com/ClickHouse/ClickHouse/issues/42453). [#42573](https://github.com/ClickHouse/ClickHouse/pull/42573) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix function `arrayElement` with type `Map` with `Nullable` values and `Nullable` index. [#42623](https://github.com/ClickHouse/ClickHouse/pull/42623) ([Anton Popov](https://github.com/CurtizJ)).
#### Bug Fix (user-visible misbehaviour in official stable or prestable release)
* Fix unexpected table loading error when partition key contains alias function names during server upgrade. [#36379](https://github.com/ClickHouse/ClickHouse/pull/36379) ([Amos Bird](https://github.com/amosbird)).
#### Build Improvement
* Fixed SipHash Endian issue for s390x platform. [#41372](https://github.com/ClickHouse/ClickHouse/pull/41372) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Enable lib base64 for ppc64le platform. [#41974](https://github.com/ClickHouse/ClickHouse/pull/41974) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Fixed Endian issue in T64 compression codec on s390x. [#42314](https://github.com/ClickHouse/ClickHouse/pull/42314) ([Harry Lee](https://github.com/HarryLeeIBM)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Disable parallel s3 multipart upload for part moves."'. [#41681](https://github.com/ClickHouse/ClickHouse/pull/41681) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Attempt to fix abort from parallel parsing"'. [#42545](https://github.com/ClickHouse/ClickHouse/pull/42545) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* NO CL ENTRY: 'Revert "Low cardinality cases moved to the function for its corresponding type"'. [#42633](https://github.com/ClickHouse/ClickHouse/pull/42633) ([Anton Popov](https://github.com/CurtizJ)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Test for ignore function in PARTITION KEY [#39875](https://github.com/ClickHouse/ClickHouse/pull/39875) ([UnamedRus](https://github.com/UnamedRus)).
* Add fuzzer for table definitions [#40096](https://github.com/ClickHouse/ClickHouse/pull/40096) ([Anton Popov](https://github.com/CurtizJ)).
* Add missing tests for legacy geobase [#40684](https://github.com/ClickHouse/ClickHouse/pull/40684) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove obsolete comment from the config.xml [#41518](https://github.com/ClickHouse/ClickHouse/pull/41518) ([filimonov](https://github.com/filimonov)).
* Resurrect parallel distributed insert select with s3Cluster [#41535](https://github.com/ClickHouse/ClickHouse/pull/41535) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Update runners to a recent version to install on 22.04 [#41556](https://github.com/ClickHouse/ClickHouse/pull/41556) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Refactor wiping sensitive information from logs. [#41562](https://github.com/ClickHouse/ClickHouse/pull/41562) ([Vitaly Baranov](https://github.com/vitlibar)).
* Better S3 logs [#41587](https://github.com/ClickHouse/ClickHouse/pull/41587) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix typos in JSON formats after [#40910](https://github.com/ClickHouse/ClickHouse/issues/40910) [#41614](https://github.com/ClickHouse/ClickHouse/pull/41614) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix drop for KeeperMap [#41616](https://github.com/ClickHouse/ClickHouse/pull/41616) ([Antonio Andelic](https://github.com/antonio2368)).
* increase default max_suspicious_broken_parts to 100 [#41619](https://github.com/ClickHouse/ClickHouse/pull/41619) ([Denny Crane](https://github.com/den-crane)).
* Release AWS SDK log level + replace one exception [#41649](https://github.com/ClickHouse/ClickHouse/pull/41649) ([alesapin](https://github.com/alesapin)).
* Fix a destruction order for views ThreadStatus [#41650](https://github.com/ClickHouse/ClickHouse/pull/41650) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add very explicit logging on disk choice for fetch [#41653](https://github.com/ClickHouse/ClickHouse/pull/41653) ([alesapin](https://github.com/alesapin)).
* Fix race between ~BackgroundSchedulePool and ~DNSCacheUpdater [#41654](https://github.com/ClickHouse/ClickHouse/pull/41654) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add changelog for 22.9 [#41668](https://github.com/ClickHouse/ClickHouse/pull/41668) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version after release [#41670](https://github.com/ClickHouse/ClickHouse/pull/41670) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix error message [#41680](https://github.com/ClickHouse/ClickHouse/pull/41680) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add test for setting output_format_json_validate_utf8 [#41691](https://github.com/ClickHouse/ClickHouse/pull/41691) ([Kruglov Pavel](https://github.com/Avogar)).
* Resolve findings from clang-tidy [#41702](https://github.com/ClickHouse/ClickHouse/pull/41702) ([ltrk2](https://github.com/ltrk2)).
* Ignore Keeper errors from ReplicatedMergeTreeAttachThread in stress tests [#41717](https://github.com/ClickHouse/ClickHouse/pull/41717) ([Antonio Andelic](https://github.com/antonio2368)).
* Collect logs in Stress test using clickhouse-local [#41721](https://github.com/ClickHouse/ClickHouse/pull/41721) ([Antonio Andelic](https://github.com/antonio2368)).
* Disable flaky `test_merge_tree_azure_blob_storage` [#41722](https://github.com/ClickHouse/ClickHouse/pull/41722) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update version_date.tsv and changelogs after v22.9.2.7-stable [#41724](https://github.com/ClickHouse/ClickHouse/pull/41724) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix part removal retries [#41728](https://github.com/ClickHouse/ClickHouse/pull/41728) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Try fix azure tests [#41731](https://github.com/ClickHouse/ClickHouse/pull/41731) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix test build [#41732](https://github.com/ClickHouse/ClickHouse/pull/41732) ([Robert Schulze](https://github.com/rschu1ze)).
* Change logging levels in cache [#41733](https://github.com/ClickHouse/ClickHouse/pull/41733) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Revert of "Revert the revert of "ColumnVector: optimize filter with AVX512 VBMI2 compress store" [#40033](https://github.com/ClickHouse/ClickHouse/issues/40033)" [#41752](https://github.com/ClickHouse/ClickHouse/pull/41752) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix SET query parameters formatting [#41755](https://github.com/ClickHouse/ClickHouse/pull/41755) ([Nikolay Degterinsky](https://github.com/evillique)).
* Support to run testcases on macOS [#41760](https://github.com/ClickHouse/ClickHouse/pull/41760) ([Frank Chen](https://github.com/FrankChen021)).
* Bump LLVM from 12 to 13 [#41762](https://github.com/ClickHouse/ClickHouse/pull/41762) ([Robert Schulze](https://github.com/rschu1ze)).
* ColumnVector: re-enable AVX512_VBMI/AVX512_VBMI2 optimized filter and index [#41765](https://github.com/ClickHouse/ClickHouse/pull/41765) ([Guo Wangyang](https://github.com/guowangy)).
* Update 02354_annoy.sql [#41767](https://github.com/ClickHouse/ClickHouse/pull/41767) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix the typo preventing building latest images [#41769](https://github.com/ClickHouse/ClickHouse/pull/41769) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Make automatic download script choose between ARMv8.0 or ARMv8.2 builds [#41775](https://github.com/ClickHouse/ClickHouse/pull/41775) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix tests for docker-ci [#41777](https://github.com/ClickHouse/ClickHouse/pull/41777) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Possible fix for KeeperMap drop [#41784](https://github.com/ClickHouse/ClickHouse/pull/41784) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix drop of completely dropped table [#41789](https://github.com/ClickHouse/ClickHouse/pull/41789) ([alesapin](https://github.com/alesapin)).
* Log git hash during startup [#41790](https://github.com/ClickHouse/ClickHouse/pull/41790) ([Robert Schulze](https://github.com/rschu1ze)).
* Revert "ColumnVector: optimize UInt8 index with AVX512VBMI ([#41247](https://github.com/ClickHouse/ClickHouse/issues/41247))" [#41797](https://github.com/ClickHouse/ClickHouse/pull/41797) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Small fix in dashboard [#41798](https://github.com/ClickHouse/ClickHouse/pull/41798) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Keep the most important log in stress tests [#41821](https://github.com/ClickHouse/ClickHouse/pull/41821) ([alesapin](https://github.com/alesapin)).
* Use copy for some operations instead of hardlinks [#41832](https://github.com/ClickHouse/ClickHouse/pull/41832) ([alesapin](https://github.com/alesapin)).
* Remove unused variable in registerStorageMergeTree.cpp [#41839](https://github.com/ClickHouse/ClickHouse/pull/41839) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix Jepsen [#41845](https://github.com/ClickHouse/ClickHouse/pull/41845) ([Antonio Andelic](https://github.com/antonio2368)).
* Increase `request_timeout_ms` for s3 tests in CI [#41853](https://github.com/ClickHouse/ClickHouse/pull/41853) ([Kseniia Sumarokova](https://github.com/kssenii)).
* tests: fix debug symbols (and possible crashes) for backward compatiblity check [#41854](https://github.com/ClickHouse/ClickHouse/pull/41854) ([Azat Khuzhin](https://github.com/azat)).
* Remove two redundant lines [#41856](https://github.com/ClickHouse/ClickHouse/pull/41856) ([alesapin](https://github.com/alesapin)).
* Infer Object type only when allow_experimental_object_type is enabled [#41858](https://github.com/ClickHouse/ClickHouse/pull/41858) ([Kruglov Pavel](https://github.com/Avogar)).
* Add default UNION/EXCEPT/INTERSECT to the echo query text [#41862](https://github.com/ClickHouse/ClickHouse/pull/41862) ([Nikolay Degterinsky](https://github.com/evillique)).
* Consolidate CMake-generated config headers [#41873](https://github.com/ClickHouse/ClickHouse/pull/41873) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix 02267_file_globs_schema_inference.sql flakiness [#41877](https://github.com/ClickHouse/ClickHouse/pull/41877) ([Kruglov Pavel](https://github.com/Avogar)).
* Docs: Remove obsolete modelEvaluate() mention [#41878](https://github.com/ClickHouse/ClickHouse/pull/41878) ([Robert Schulze](https://github.com/rschu1ze)).
* Better exception message for duplicate column names in schema inference [#41885](https://github.com/ClickHouse/ClickHouse/pull/41885) ([Kruglov Pavel](https://github.com/Avogar)).
* Docs: Reference external papers as DOIs [#41886](https://github.com/ClickHouse/ClickHouse/pull/41886) ([Robert Schulze](https://github.com/rschu1ze)).
* Make LDAPR a prerequisite for downloading the ARMv8.2 build [#41897](https://github.com/ClickHouse/ClickHouse/pull/41897) ([Robert Schulze](https://github.com/rschu1ze)).
* Another sync replicas in test_recovery_replica [#41898](https://github.com/ClickHouse/ClickHouse/pull/41898) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* remove unused code [#41921](https://github.com/ClickHouse/ClickHouse/pull/41921) ([flynn](https://github.com/ucasfl)).
* Move all queries for MV creation to the end of queue during recovering [#41932](https://github.com/ClickHouse/ClickHouse/pull/41932) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix broken test_disks_app_func [#41933](https://github.com/ClickHouse/ClickHouse/pull/41933) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Temporarily disable ThreadFuzzer with TSan [#41943](https://github.com/ClickHouse/ClickHouse/pull/41943) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Enable some disabled S3 tests [#41945](https://github.com/ClickHouse/ClickHouse/pull/41945) ([alesapin](https://github.com/alesapin)).
* QOL log improvements [#41947](https://github.com/ClickHouse/ClickHouse/pull/41947) ([Raúl Marín](https://github.com/Algunenano)).
* Fix non-deterministic test results [#41948](https://github.com/ClickHouse/ClickHouse/pull/41948) ([Robert Schulze](https://github.com/rschu1ze)).
* Earlier throw exception in PullingAsyncPipelineExecutor. [#41949](https://github.com/ClickHouse/ClickHouse/pull/41949) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix linker error [#41950](https://github.com/ClickHouse/ClickHouse/pull/41950) ([ltrk2](https://github.com/ltrk2)).
* Bump LLVM from 13 to 14 [#41951](https://github.com/ClickHouse/ClickHouse/pull/41951) ([Robert Schulze](https://github.com/rschu1ze)).
* Update version_date.tsv and changelogs after v22.3.13.80-lts [#41953](https://github.com/ClickHouse/ClickHouse/pull/41953) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v22.7.6.74-stable [#41954](https://github.com/ClickHouse/ClickHouse/pull/41954) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v22.8.6.71-lts [#41955](https://github.com/ClickHouse/ClickHouse/pull/41955) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v22.9.3.18-stable [#41956](https://github.com/ClickHouse/ClickHouse/pull/41956) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Add a warning message to release.py script, require release type [#41975](https://github.com/ClickHouse/ClickHouse/pull/41975) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Rename max_temp_data_on_disk -> max_temporary_data_on_disk [#41984](https://github.com/ClickHouse/ClickHouse/pull/41984) ([Vladimir C](https://github.com/vdimir)).
* Add more checkStackSize calls [#41991](https://github.com/ClickHouse/ClickHouse/pull/41991) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test 02403_big_http_chunk_size [#41996](https://github.com/ClickHouse/ClickHouse/pull/41996) ([Vitaly Baranov](https://github.com/vitlibar)).
* More sane behavior of part number thresholds override in query level settings [#42001](https://github.com/ClickHouse/ClickHouse/pull/42001) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove useless code [#42004](https://github.com/ClickHouse/ClickHouse/pull/42004) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Refactoring: Uninline some error handling methods [#42010](https://github.com/ClickHouse/ClickHouse/pull/42010) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix warning that ENABLE_REPLXX is unused [#42013](https://github.com/ClickHouse/ClickHouse/pull/42013) ([Robert Schulze](https://github.com/rschu1ze)).
* Drop leftovers of libexecinfo [#42014](https://github.com/ClickHouse/ClickHouse/pull/42014) ([Robert Schulze](https://github.com/rschu1ze)).
* More detailed exception message [#42022](https://github.com/ClickHouse/ClickHouse/pull/42022) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Build against an LLVM version which has clang[-extra-tools], lldb and lld removed [#42023](https://github.com/ClickHouse/ClickHouse/pull/42023) ([Robert Schulze](https://github.com/rschu1ze)).
* Add log message and lower the retry timeout in MergeTreeRestartingThread [#42026](https://github.com/ClickHouse/ClickHouse/pull/42026) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Update amqp-cpp [#42031](https://github.com/ClickHouse/ClickHouse/pull/42031) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix No such key during table drop [#42036](https://github.com/ClickHouse/ClickHouse/pull/42036) ([alesapin](https://github.com/alesapin)).
* Temporarily disable too aggressive tests [#42050](https://github.com/ClickHouse/ClickHouse/pull/42050) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix style check [#42055](https://github.com/ClickHouse/ClickHouse/pull/42055) ([Anton Popov](https://github.com/CurtizJ)).
* Function name normalization fix functions header [#42063](https://github.com/ClickHouse/ClickHouse/pull/42063) ([Maksim Kita](https://github.com/kitaisreal)).
* remove unused virtual keyword [#42065](https://github.com/ClickHouse/ClickHouse/pull/42065) ([flynn](https://github.com/ucasfl)).
* Fix crash in `SummingMergeTree` with `LowCardinality` [#42066](https://github.com/ClickHouse/ClickHouse/pull/42066) ([Anton Popov](https://github.com/CurtizJ)).
* Fix drop of completely dropped table [#42067](https://github.com/ClickHouse/ClickHouse/pull/42067) ([alesapin](https://github.com/alesapin)).
* Fix assertion in bloom filter index [#42072](https://github.com/ClickHouse/ClickHouse/pull/42072) ([Anton Popov](https://github.com/CurtizJ)).
* Ignore core.autocrlf for tests references [#42076](https://github.com/ClickHouse/ClickHouse/pull/42076) ([Azat Khuzhin](https://github.com/azat)).
* Fix progress for INSERT SELECT [#42078](https://github.com/ClickHouse/ClickHouse/pull/42078) ([Azat Khuzhin](https://github.com/azat)).
* Avoid adding extra new line after using fuzzy history search [#42080](https://github.com/ClickHouse/ClickHouse/pull/42080) ([Azat Khuzhin](https://github.com/azat)).
* Add `at` to runner AMI, bump gh runner version [#42082](https://github.com/ClickHouse/ClickHouse/pull/42082) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Use send_metadata instead of send_object_metadata [#42085](https://github.com/ClickHouse/ClickHouse/pull/42085) ([Elena Torró](https://github.com/elenatorro)).
* Docs: Preparations to remove misc statements page [#42086](https://github.com/ClickHouse/ClickHouse/pull/42086) ([Robert Schulze](https://github.com/rschu1ze)).
* Followup for TemporaryDataOnDisk [#42103](https://github.com/ClickHouse/ClickHouse/pull/42103) ([Vladimir C](https://github.com/vdimir)).
* Disable 02122_join_group_by_timeout for debug [#42104](https://github.com/ClickHouse/ClickHouse/pull/42104) ([Vladimir C](https://github.com/vdimir)).
* Update version_date.tsv and changelogs after v22.6.9.11-stable [#42114](https://github.com/ClickHouse/ClickHouse/pull/42114) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* JIT compilation migration to LLVM 15 [#42123](https://github.com/ClickHouse/ClickHouse/pull/42123) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix build without TSA [#42128](https://github.com/ClickHouse/ClickHouse/pull/42128) ([Raúl Marín](https://github.com/Algunenano)).
* Update codespell-ignore-words.list [#42132](https://github.com/ClickHouse/ClickHouse/pull/42132) ([Dan Roscigno](https://github.com/DanRoscigno)).
* Add null pointer checks [#42135](https://github.com/ClickHouse/ClickHouse/pull/42135) ([ltrk2](https://github.com/ltrk2)).
* Revert [#27787](https://github.com/ClickHouse/ClickHouse/issues/27787) [#42136](https://github.com/ClickHouse/ClickHouse/pull/42136) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Follow up for [#42129](https://github.com/ClickHouse/ClickHouse/issues/42129) [#42144](https://github.com/ClickHouse/ClickHouse/pull/42144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix checking parent for old-format parts [#42147](https://github.com/ClickHouse/ClickHouse/pull/42147) ([alesapin](https://github.com/alesapin)).
* Revert "Resurrect parallel distributed insert select with s3Cluster [#42150](https://github.com/ClickHouse/ClickHouse/pull/42150) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Docs: Add "TABLE" to CHECK/DESCRIBE statements in sidebar [#42152](https://github.com/ClickHouse/ClickHouse/pull/42152) ([Robert Schulze](https://github.com/rschu1ze)).
* Add logging during merge tree startup [#42163](https://github.com/ClickHouse/ClickHouse/pull/42163) ([alesapin](https://github.com/alesapin)).
* Abort instead of `__builtin_unreachable` in debug builds [#42168](https://github.com/ClickHouse/ClickHouse/pull/42168) ([Alexander Tokmakov](https://github.com/tavplubix)).
* [RFC] Enable -Wshorten-64-to-32 [#42190](https://github.com/ClickHouse/ClickHouse/pull/42190) ([Azat Khuzhin](https://github.com/azat)).
* Fix dialect setting description [#42196](https://github.com/ClickHouse/ClickHouse/pull/42196) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add a test for #658 [#42197](https://github.com/ClickHouse/ClickHouse/pull/42197) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* use alias for MergeMutateSelectedEntry share ptr [#42211](https://github.com/ClickHouse/ClickHouse/pull/42211) ([Tian Xinhui](https://github.com/xinhuitian)).
* Fix LLVM build [#42216](https://github.com/ClickHouse/ClickHouse/pull/42216) ([Raúl Marín](https://github.com/Algunenano)).
* Exclude comments from style-check defined extern [#42217](https://github.com/ClickHouse/ClickHouse/pull/42217) ([Vladimir C](https://github.com/vdimir)).
* Update safeExit.cpp [#42220](https://github.com/ClickHouse/ClickHouse/pull/42220) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Disable concurrent parts removal [#42222](https://github.com/ClickHouse/ClickHouse/pull/42222) ([alesapin](https://github.com/alesapin)).
* Fail fast on empty URL in HDFS [#42223](https://github.com/ClickHouse/ClickHouse/pull/42223) ([Ilya Yatsishin](https://github.com/qoega)).
* Add a test for [#2389](https://github.com/ClickHouse/ClickHouse/issues/2389) [#42235](https://github.com/ClickHouse/ClickHouse/pull/42235) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Use MultiRead where possible [#42243](https://github.com/ClickHouse/ClickHouse/pull/42243) ([Antonio Andelic](https://github.com/antonio2368)).
* Minor cleanups of LLVM integration [#42249](https://github.com/ClickHouse/ClickHouse/pull/42249) ([Robert Schulze](https://github.com/rschu1ze)).
* remove useless code [#42253](https://github.com/ClickHouse/ClickHouse/pull/42253) ([flynn](https://github.com/ucasfl)).
* Early return of corner cases in selectPartsToMutate function [#42254](https://github.com/ClickHouse/ClickHouse/pull/42254) ([Tian Xinhui](https://github.com/xinhuitian)).
* Refactor the implementation of user-defined functions [#42263](https://github.com/ClickHouse/ClickHouse/pull/42263) ([Vitaly Baranov](https://github.com/vitlibar)).
* assert unused value in test_replicated_merge_tree_compatibility [#42266](https://github.com/ClickHouse/ClickHouse/pull/42266) ([nvartolomei](https://github.com/nvartolomei)).
* Fix Date Interval add/minus over DataTypeDate32 [#42279](https://github.com/ClickHouse/ClickHouse/pull/42279) ([Alfred Xu](https://github.com/sperlingxx)).
* Fix log-level in `clickhouse-disks` [#42302](https://github.com/ClickHouse/ClickHouse/pull/42302) ([Nikolay Degterinsky](https://github.com/evillique)).
* Remove forgotten debug logging [#42313](https://github.com/ClickHouse/ClickHouse/pull/42313) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix another trash in zero-copy replication [#42317](https://github.com/ClickHouse/ClickHouse/pull/42317) ([alesapin](https://github.com/alesapin)).
* go update for diagnostics tool [#42325](https://github.com/ClickHouse/ClickHouse/pull/42325) ([Dale McDiarmid](https://github.com/gingerwizard)).
* Better logging for asynchronous inserts [#42345](https://github.com/ClickHouse/ClickHouse/pull/42345) ([Anton Popov](https://github.com/CurtizJ)).
* Use nfpm packager for archlinux packages [#42349](https://github.com/ClickHouse/ClickHouse/pull/42349) ([Azat Khuzhin](https://github.com/azat)).
* Bump llvm/clang to 15.0.2 [#42351](https://github.com/ClickHouse/ClickHouse/pull/42351) ([Azat Khuzhin](https://github.com/azat)).
* Make getResource() independent from the order of the sections [#42353](https://github.com/ClickHouse/ClickHouse/pull/42353) ([Azat Khuzhin](https://github.com/azat)).
* Smaller threshold for multipart upload part size increase [#42392](https://github.com/ClickHouse/ClickHouse/pull/42392) ([alesapin](https://github.com/alesapin)).
* Better error message for unsupported delimiters in custom formats [#42406](https://github.com/ClickHouse/ClickHouse/pull/42406) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix formatting of `ALTER FREEZE` [#42409](https://github.com/ClickHouse/ClickHouse/pull/42409) ([Anton Popov](https://github.com/CurtizJ)).
* Replace table name in ast fuzzer more often [#42413](https://github.com/ClickHouse/ClickHouse/pull/42413) ([Anton Popov](https://github.com/CurtizJ)).
* Add *-15 tools to cmake.tools for GCC build [#42430](https://github.com/ClickHouse/ClickHouse/pull/42430) ([Ilya Yatsishin](https://github.com/qoega)).
* Deactivate tasks in ReplicatedMergeTree until startup [#42441](https://github.com/ClickHouse/ClickHouse/pull/42441) ([alesapin](https://github.com/alesapin)).
* Revert "Revert [#27787](https://github.com/ClickHouse/ClickHouse/issues/27787)" [#42442](https://github.com/ClickHouse/ClickHouse/pull/42442) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update woboq_codebrowser location [#42448](https://github.com/ClickHouse/ClickHouse/pull/42448) ([Azat Khuzhin](https://github.com/azat)).
* add mdx and jsx to list of doc files [#42454](https://github.com/ClickHouse/ClickHouse/pull/42454) ([Dan Roscigno](https://github.com/DanRoscigno)).
* Remove code browser docs [#42455](https://github.com/ClickHouse/ClickHouse/pull/42455) ([Dan Roscigno](https://github.com/DanRoscigno)).
* Better workaround for emitting .debug_aranges section [#42457](https://github.com/ClickHouse/ClickHouse/pull/42457) ([Azat Khuzhin](https://github.com/azat)).
* Fix flaky test [#42459](https://github.com/ClickHouse/ClickHouse/pull/42459) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix UBSan report in Julian Day functions [#42464](https://github.com/ClickHouse/ClickHouse/pull/42464) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* rename filesystem_query_cache [#42472](https://github.com/ClickHouse/ClickHouse/pull/42472) ([Han Shukai](https://github.com/KinderRiven)).
* Add convenience typedefs for Date/Date32/DateTime/DateTime64 columns [#42476](https://github.com/ClickHouse/ClickHouse/pull/42476) ([Robert Schulze](https://github.com/rschu1ze)).
* Add error "Destination table is myself" to exception list in BC check [#42479](https://github.com/ClickHouse/ClickHouse/pull/42479) ([Kruglov Pavel](https://github.com/Avogar)).
* Get current clickhouse version without sending query in BC check [#42483](https://github.com/ClickHouse/ClickHouse/pull/42483) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix logical error from welchTTest [#42487](https://github.com/ClickHouse/ClickHouse/pull/42487) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Attempt to fix abort from parallel parsing [#42496](https://github.com/ClickHouse/ClickHouse/pull/42496) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Increase threshold for using physical cores for `max_threads` [#42503](https://github.com/ClickHouse/ClickHouse/pull/42503) ([Nikita Taranov](https://github.com/nickitat)).
* Add a test for [#16827](https://github.com/ClickHouse/ClickHouse/issues/16827) [#42511](https://github.com/ClickHouse/ClickHouse/pull/42511) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test for [#13653](https://github.com/ClickHouse/ClickHouse/issues/13653) [#42512](https://github.com/ClickHouse/ClickHouse/pull/42512) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix aliases [#42514](https://github.com/ClickHouse/ClickHouse/pull/42514) ([Nikolay Degterinsky](https://github.com/evillique)).
* tests: fix 00705_drop_create_merge_tree flakiness [#42522](https://github.com/ClickHouse/ClickHouse/pull/42522) ([Azat Khuzhin](https://github.com/azat)).
* Fix sanitizer reports in integration tests [#42529](https://github.com/ClickHouse/ClickHouse/pull/42529) ([Azat Khuzhin](https://github.com/azat)).
* Fix `KeeperTCPHandler` data race [#42532](https://github.com/ClickHouse/ClickHouse/pull/42532) ([Antonio Andelic](https://github.com/antonio2368)).
* Disable `test_storage_nats`, because it's permanently broken [#42535](https://github.com/ClickHouse/ClickHouse/pull/42535) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Better logs in clickhouse-disks [#42549](https://github.com/ClickHouse/ClickHouse/pull/42549) ([Nikolay Degterinsky](https://github.com/evillique)).
* add lib_fuzzer and lib_fuzzer_no_main to llvm-project build [#42550](https://github.com/ClickHouse/ClickHouse/pull/42550) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Some polishing: replicated merge tree [#42560](https://github.com/ClickHouse/ClickHouse/pull/42560) ([Igor Nikonov](https://github.com/devcrafter)).
* Temporarily disable flaky `test_replicated_merge_tree_hdfs_zero_copy` [#42563](https://github.com/ClickHouse/ClickHouse/pull/42563) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Adapt internal data structures to 512-bit era [#42564](https://github.com/ClickHouse/ClickHouse/pull/42564) ([Nikita Taranov](https://github.com/nickitat)).
* Fix strange code in date monotonicity [#42574](https://github.com/ClickHouse/ClickHouse/pull/42574) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Clear thread::id when ThreadFromGlobalPool exits. [#42577](https://github.com/ClickHouse/ClickHouse/pull/42577) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* ci/stress: fix memory limits overrides for hung check [#42585](https://github.com/ClickHouse/ClickHouse/pull/42585) ([Azat Khuzhin](https://github.com/azat)).
* tests: avoid model overlap for obfuscator [#42586](https://github.com/ClickHouse/ClickHouse/pull/42586) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible segfault in expression parser [#42598](https://github.com/ClickHouse/ClickHouse/pull/42598) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix incorrect trace log line on dict reload [#42609](https://github.com/ClickHouse/ClickHouse/pull/42609) ([filimonov](https://github.com/filimonov)).
* Fix flaky 02458_datediff_date32 test [#42611](https://github.com/ClickHouse/ClickHouse/pull/42611) ([Roman Vasin](https://github.com/rvasin)).
* Revert revert 41268 disable s3 parallel write for part moves to disk s3 [#42617](https://github.com/ClickHouse/ClickHouse/pull/42617) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Try to fix data race on zookeeper vs DDLWorker at server shutdown. [#42620](https://github.com/ClickHouse/ClickHouse/pull/42620) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add a template for installation issues [#42626](https://github.com/ClickHouse/ClickHouse/pull/42626) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix typo in cmake code related to fuzzing [#42627](https://github.com/ClickHouse/ClickHouse/pull/42627) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix build [#42635](https://github.com/ClickHouse/ClickHouse/pull/42635) ([Anton Popov](https://github.com/CurtizJ)).
* Add .rgignore for test data [#42639](https://github.com/ClickHouse/ClickHouse/pull/42639) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flaky 02457_datediff_via_unix_epoch test [#42655](https://github.com/ClickHouse/ClickHouse/pull/42655) ([Roman Vasin](https://github.com/rvasin)).

View File

@ -0,0 +1,29 @@
---
sidebar_position: 1
sidebar_label: 2022
---
# 2022 Changelog
### ClickHouse release v22.7.7.24-stable (02ad1f979a8) FIXME as compared to v22.7.6.74-stable (c00ffb3c11a)
#### Bug Fix
* Backported in [#42433](https://github.com/ClickHouse/ClickHouse/issues/42433): - Choose correct aggregation method for LowCardinality with BigInt. [#42342](https://github.com/ClickHouse/ClickHouse/pull/42342) ([Duc Canh Le](https://github.com/canhld94)).
#### Build/Testing/Packaging Improvement
* Backported in [#42329](https://github.com/ClickHouse/ClickHouse/issues/42329): Update cctz to the latest master, update tzdb to 2020e. [#42273](https://github.com/ClickHouse/ClickHouse/pull/42273) ([Dom Del Nano](https://github.com/ddelnano)).
* Backported in [#42359](https://github.com/ClickHouse/ClickHouse/issues/42359): Update tzdata to 2022e to support the new timezone changes. Palestine transitions are now Saturdays at 02:00. Simplify three Ukraine zones into one. Jordan and Syria switch from +02/+03 with DST to year-round +03. (https://data.iana.org/time-zones/tzdb/NEWS). This closes [#42252](https://github.com/ClickHouse/ClickHouse/issues/42252). [#42327](https://github.com/ClickHouse/ClickHouse/pull/42327) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Backported in [#42268](https://github.com/ClickHouse/ClickHouse/issues/42268): Fix reusing of files > 4GB from base backup. [#42146](https://github.com/ClickHouse/ClickHouse/pull/42146) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#42299](https://github.com/ClickHouse/ClickHouse/issues/42299): Fix a bug with projections and the `aggregate_functions_null_for_empty` setting. This bug is very rare and appears only if you enable the `aggregate_functions_null_for_empty` setting in the server's config. This closes [#41647](https://github.com/ClickHouse/ClickHouse/issues/41647). [#42198](https://github.com/ClickHouse/ClickHouse/pull/42198) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#42386](https://github.com/ClickHouse/ClickHouse/issues/42386): `ALTER UPDATE` of attached part (with columns different from table schema) could create an invalid `columns.txt` metadata on disk. Reading from such part could fail with errors or return invalid data. Fixes [#42161](https://github.com/ClickHouse/ClickHouse/issues/42161). [#42319](https://github.com/ClickHouse/ClickHouse/pull/42319) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#42498](https://github.com/ClickHouse/ClickHouse/issues/42498): Setting `additional_table_filters` were not applied to `Distributed` storage. Fixes [#41692](https://github.com/ClickHouse/ClickHouse/issues/41692). [#42322](https://github.com/ClickHouse/ClickHouse/pull/42322) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#42593](https://github.com/ClickHouse/ClickHouse/issues/42593): This closes [#42453](https://github.com/ClickHouse/ClickHouse/issues/42453). [#42573](https://github.com/ClickHouse/ClickHouse/pull/42573) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Add a warning message to release.py script, require release type [#41975](https://github.com/ClickHouse/ClickHouse/pull/41975) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Revert [#27787](https://github.com/ClickHouse/ClickHouse/issues/27787) [#42136](https://github.com/ClickHouse/ClickHouse/pull/42136) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).

View File

@ -0,0 +1,37 @@
---
sidebar_position: 1
sidebar_label: 2022
---
# 2022 Changelog
### ClickHouse release v22.8.7.34-lts (3c38e5e8ab9) FIXME as compared to v22.8.6.71-lts (7bf38a43e30)
#### Improvement
* Backported in [#42096](https://github.com/ClickHouse/ClickHouse/issues/42096): Replace back `clickhouse su` command with `sudo -u` in start in order to respect limits in `/etc/security/limits.conf`. [#41847](https://github.com/ClickHouse/ClickHouse/pull/41847) ([Eugene Konkov](https://github.com/ekonkov)).
#### Bug Fix
* Backported in [#42434](https://github.com/ClickHouse/ClickHouse/issues/42434): - Choose correct aggregation method for LowCardinality with BigInt. [#42342](https://github.com/ClickHouse/ClickHouse/pull/42342) ([Duc Canh Le](https://github.com/canhld94)).
#### Build/Testing/Packaging Improvement
* Backported in [#42296](https://github.com/ClickHouse/ClickHouse/issues/42296): Update cctz to the latest master, update tzdb to 2020e. [#42273](https://github.com/ClickHouse/ClickHouse/pull/42273) ([Dom Del Nano](https://github.com/ddelnano)).
* Backported in [#42360](https://github.com/ClickHouse/ClickHouse/issues/42360): Update tzdata to 2022e to support the new timezone changes. Palestine transitions are now Saturdays at 02:00. Simplify three Ukraine zones into one. Jordan and Syria switch from +02/+03 with DST to year-round +03. (https://data.iana.org/time-zones/tzdb/NEWS). This closes [#42252](https://github.com/ClickHouse/ClickHouse/issues/42252). [#42327](https://github.com/ClickHouse/ClickHouse/pull/42327) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Backported in [#42489](https://github.com/ClickHouse/ClickHouse/issues/42489): Removed skipping of mutations in unaffected partitions of `MergeTree` tables, because this feature never worked correctly and might cause resurrection of finished mutations. [#40589](https://github.com/ClickHouse/ClickHouse/pull/40589) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#42121](https://github.com/ClickHouse/ClickHouse/issues/42121): Fixed "Part ... intersects part ..." error that might happen in extremely rare cases if replica was restarted just after detaching some part as broken. [#41741](https://github.com/ClickHouse/ClickHouse/pull/41741) ([Alexander Tokmakov](https://github.com/tavplubix)).
* - Prevent crash when passing wrong aggregation states to groupBitmap*. [#41972](https://github.com/ClickHouse/ClickHouse/pull/41972) ([Raúl Marín](https://github.com/Algunenano)).
* - Fix read bytes/rows in X-ClickHouse-Summary with materialized views. [#41973](https://github.com/ClickHouse/ClickHouse/pull/41973) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#42269](https://github.com/ClickHouse/ClickHouse/issues/42269): Fix reusing of files > 4GB from base backup. [#42146](https://github.com/ClickHouse/ClickHouse/pull/42146) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#42300](https://github.com/ClickHouse/ClickHouse/issues/42300): Fix a bug with projections and the `aggregate_functions_null_for_empty` setting. This bug is very rare and appears only if you enable the `aggregate_functions_null_for_empty` setting in the server's config. This closes [#41647](https://github.com/ClickHouse/ClickHouse/issues/41647). [#42198](https://github.com/ClickHouse/ClickHouse/pull/42198) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#42387](https://github.com/ClickHouse/ClickHouse/issues/42387): `ALTER UPDATE` of attached part (with columns different from table schema) could create an invalid `columns.txt` metadata on disk. Reading from such part could fail with errors or return invalid data. Fixes [#42161](https://github.com/ClickHouse/ClickHouse/issues/42161). [#42319](https://github.com/ClickHouse/ClickHouse/pull/42319) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#42499](https://github.com/ClickHouse/ClickHouse/issues/42499): Setting `additional_table_filters` were not applied to `Distributed` storage. Fixes [#41692](https://github.com/ClickHouse/ClickHouse/issues/41692). [#42322](https://github.com/ClickHouse/ClickHouse/pull/42322) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#42571](https://github.com/ClickHouse/ClickHouse/issues/42571): Fix buffer overflow in the processing of Decimal data types. This closes [#42451](https://github.com/ClickHouse/ClickHouse/issues/42451). [#42465](https://github.com/ClickHouse/ClickHouse/pull/42465) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#42594](https://github.com/ClickHouse/ClickHouse/issues/42594): This closes [#42453](https://github.com/ClickHouse/ClickHouse/issues/42453). [#42573](https://github.com/ClickHouse/ClickHouse/pull/42573) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Add a warning message to release.py script, require release type [#41975](https://github.com/ClickHouse/ClickHouse/pull/41975) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Revert [#27787](https://github.com/ClickHouse/ClickHouse/issues/27787) [#42136](https://github.com/ClickHouse/ClickHouse/pull/42136) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).

View File

@ -0,0 +1,33 @@
---
sidebar_position: 1
sidebar_label: 2022
---
# 2022 Changelog
### ClickHouse release v22.9.4.32-stable (3db8bcf1a70) FIXME as compared to v22.9.3.18-stable (0cb4b15d2fa)
#### Bug Fix
* Backported in [#42435](https://github.com/ClickHouse/ClickHouse/issues/42435): - Choose correct aggregation method for LowCardinality with BigInt. [#42342](https://github.com/ClickHouse/ClickHouse/pull/42342) ([Duc Canh Le](https://github.com/canhld94)).
#### Build/Testing/Packaging Improvement
* Backported in [#42297](https://github.com/ClickHouse/ClickHouse/issues/42297): Update cctz to the latest master, update tzdb to 2020e. [#42273](https://github.com/ClickHouse/ClickHouse/pull/42273) ([Dom Del Nano](https://github.com/ddelnano)).
* Backported in [#42361](https://github.com/ClickHouse/ClickHouse/issues/42361): Update tzdata to 2022e to support the new timezone changes. Palestine transitions are now Saturdays at 02:00. Simplify three Ukraine zones into one. Jordan and Syria switch from +02/+03 with DST to year-round +03. (https://data.iana.org/time-zones/tzdb/NEWS). This closes [#42252](https://github.com/ClickHouse/ClickHouse/issues/42252). [#42327](https://github.com/ClickHouse/ClickHouse/pull/42327) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Backported in [#42122](https://github.com/ClickHouse/ClickHouse/issues/42122): Fixed "Part ... intersects part ..." error that might happen in extremely rare cases if replica was restarted just after detaching some part as broken. [#41741](https://github.com/ClickHouse/ClickHouse/pull/41741) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#41938](https://github.com/ClickHouse/ClickHouse/issues/41938): Don't allow to create or alter merge tree tables with virtual column name _row_exists, which is reserved for lightweight delete. Fixed [#41716](https://github.com/ClickHouse/ClickHouse/issues/41716). [#41763](https://github.com/ClickHouse/ClickHouse/pull/41763) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Backported in [#42179](https://github.com/ClickHouse/ClickHouse/issues/42179): Fix reusing of files > 4GB from base backup. [#42146](https://github.com/ClickHouse/ClickHouse/pull/42146) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#42301](https://github.com/ClickHouse/ClickHouse/issues/42301): Fix a bug with projections and the `aggregate_functions_null_for_empty` setting. This bug is very rare and appears only if you enable the `aggregate_functions_null_for_empty` setting in the server's config. This closes [#41647](https://github.com/ClickHouse/ClickHouse/issues/41647). [#42198](https://github.com/ClickHouse/ClickHouse/pull/42198) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#42388](https://github.com/ClickHouse/ClickHouse/issues/42388): `ALTER UPDATE` of attached part (with columns different from table schema) could create an invalid `columns.txt` metadata on disk. Reading from such part could fail with errors or return invalid data. Fixes [#42161](https://github.com/ClickHouse/ClickHouse/issues/42161). [#42319](https://github.com/ClickHouse/ClickHouse/pull/42319) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#42500](https://github.com/ClickHouse/ClickHouse/issues/42500): Setting `additional_table_filters` were not applied to `Distributed` storage. Fixes [#41692](https://github.com/ClickHouse/ClickHouse/issues/41692). [#42322](https://github.com/ClickHouse/ClickHouse/pull/42322) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#42581](https://github.com/ClickHouse/ClickHouse/issues/42581): This reverts [#40217](https://github.com/ClickHouse/ClickHouse/issues/40217) which introduced a regression in date/time functions. [#42367](https://github.com/ClickHouse/ClickHouse/pull/42367) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#42572](https://github.com/ClickHouse/ClickHouse/issues/42572): Fix buffer overflow in the processing of Decimal data types. This closes [#42451](https://github.com/ClickHouse/ClickHouse/issues/42451). [#42465](https://github.com/ClickHouse/ClickHouse/pull/42465) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#42595](https://github.com/ClickHouse/ClickHouse/issues/42595): This closes [#42453](https://github.com/ClickHouse/ClickHouse/issues/42453). [#42573](https://github.com/ClickHouse/ClickHouse/pull/42573) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Add a warning message to release.py script, require release type [#41975](https://github.com/ClickHouse/ClickHouse/pull/41975) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Revert [#27787](https://github.com/ClickHouse/ClickHouse/issues/27787) [#42136](https://github.com/ClickHouse/ClickHouse/pull/42136) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).

View File

@ -1,9 +1,12 @@
---
slug: /en/operations/backup
sidebar_position: 49
sidebar_label: Data backup and restore
title: Data backup and restore
---
[//]: # (This file is included in Manage > Backups)
- [Backup to a local disk](#backup-to-a-local-disk)
- [Configuring backup/restore to use an S3 endpoint](#configuring-backuprestore-to-use-an-s3-endpoint)
- [Backup/restore using an S3 disk](#backuprestore-using-an-s3-disk)
- [Alternatives](#alternatives)
## Background
While [replication](../engines/table-engines/mergetree-family/replication.md) provides protection from hardware failures, it does not protect against human errors: accidental deletion of data, deletion of the wrong table or a table on the wrong cluster, and software bugs that result in incorrect data processing or data corruption. In many cases mistakes like these will affect all replicas. ClickHouse has built-in safeguards to prevent some types of mistakes — for example, by default [you cant just drop tables with a MergeTree-like engine containing more than 50 Gb of data](server-configuration-parameters/settings.md#max-table-size-to-drop). However, these safeguards do not cover all possible cases and can be circumvented.
@ -15,7 +18,9 @@ Each company has different resources available and business requirements, so the
Keep in mind that if you backed something up and never tried to restore it, chances are that restore will not work properly when you actually need it (or at least it will take longer than business can tolerate). So whatever backup approach you choose, make sure to automate the restore process as well, and practice it on a spare ClickHouse cluster regularly.
:::
## Configure a backup destination
## Backup to a local disk
### Configure a backup destination
In the examples below you will see the backup destination specified like `Disk('backups', '1.zip')`. To prepare the destination add a file to `/etc/clickhouse-server/config.d/backup_disk.xml` specifying the backup destination. For example, this file defines disk named `backups` and then adds that disk to the **backups > allowed_disk** list:
@ -39,7 +44,7 @@ In the examples below you will see the backup destination specified like `Disk('
</clickhouse>
```
## Parameters
### Parameters
Backups can be either full or incremental, and can include tables (including materialized views, projections, and dictionaries), and databases. Backups can be synchronous (default) or asynchronous. They can be compressed. Backups can be password protected.
@ -52,7 +57,7 @@ The BACKUP and RESTORE statements take a list of DATABASE and TABLE names, a des
- `password` for the file on disk
- `base_backup`: the destination of the previous backup of this source. For example, `Disk('backups', '1.zip')`
## Usage examples
### Usage examples
Backup and then restore a table:
```
@ -81,7 +86,7 @@ RESTORE TABLE test.table AS test.table2 FROM Disk('backups', '1.zip')
BACKUP TABLE test.table3 AS test.table4 TO Disk('backups', '2.zip')
```
## Incremental backups
### Incremental backups
Incremental backups can be taken by specifying the `base_backup`.
:::note
@ -100,7 +105,7 @@ RESTORE TABLE test.table AS test.table2
FROM Disk('backups', 'incremental-a.zip');
```
## Assign a password to the backup
### Assign a password to the backup
Backups written to disk can have a password applied to the file:
```
@ -116,7 +121,7 @@ RESTORE TABLE test.table
SETTINGS password='qwerty'
```
## Compression settings
### Compression settings
If you would like to specify the compression method or level:
```
@ -125,14 +130,14 @@ BACKUP TABLE test.table
SETTINGS compression_method='lzma', compression_level=3
```
## Restore specific partitions
### Restore specific partitions
If specific partitions associated with a table need to be restored these can be specified. To restore partitions 1 and 4 from backup:
```
RESTORE TABLE test.table PARTITIONS '2', '3'
FROM Disk('backups', 'filename.zip')
```
## Check the status of backups
### Check the status of backups
The backup command returns an `id` and `status`, and that `id` can be used to get the status of the backup. This is very useful to check the progress of long ASYNC backups. The example below shows a failure that happened when trying to overwrite an existing backup file:
```sql
@ -171,13 +176,118 @@ end_time: 2022-08-30 09:21:46
1 row in set. Elapsed: 0.002 sec.
```
## Backup to S3
## Configuring BACKUP/RESTORE to use an S3 Endpoint
It is possible to `BACKUP`/`RESTORE` to S3, but this disk should be configured
in a proper way, since by default you will need to backup metadata from local
disk to make backup full.
To write backups to an S3 bucket you need three pieces of information:
- S3 endpoint,
for example `https://mars-doc-test.s3.amazonaws.com/backup-S3/`
- Access key ID,
for example `ABC123`
- Secret access key,
for example `Abc+123`
First of all, you need to configure S3 disk in a special way:
:::note
Creating an S3 bucket is covered in [Use S3 Object Storage as a ClickHouse disk](/docs/en/integrations/data-ingestion/s3/configuring-s3-for-clickhouse-use.md), just come back to this doc after saving the policy, there is no need to configure ClickHouse to use the S3 bucket.
:::
The destination for a backup will be specified like this:
```
S3('<S3 endpoint>/<directory>', '<Access key ID>', '<Secret access key>)
```
```sql
CREATE TABLE data
(
`key` Int,
`value` String,
`array` Array(String)
)
ENGINE = MergeTree
ORDER BY tuple()
```
```sql
INSERT INTO data SELECT *
FROM generateRandom('key Int, value String, array Array(String)')
LIMIT 1000
```
### Create a base (initial) backup
Incremental backups require a _base_ backup to start from, this example will be used
later as the base backup. The first parameter of the S3 destination is the S3 endpoint followed by the directory within the bucket to use for this backup. In this example the directory is named `my_backup`.
```sql
BACKUP TABLE data TO S3('https://mars-doc-test.s3.amazonaws.com/backup-S3/my_backup', 'ABC123', 'Abc+123')
```
```response
┌─id───────────────────────────────────┬─status─────────┐
│ de442b75-a66c-4a3c-a193-f76f278c70f3 │ BACKUP_CREATED │
└──────────────────────────────────────┴────────────────┘
```
### Add more data
Incremental backups are populated with the difference between the base backup and the current content of the table being backed up. Add more data before taking the incremental backup:
```sql
INSERT INTO data SELECT *
FROM generateRandom('key Int, value String, array Array(String)')
LIMIT 100
```
### Take an incremental backup
This backup command is similar to the base backup, but adds `SETTINGS base_backup` and the location of the base backup. Note that the destination for the incremental backup is not the same directory as the base, it is the same endpoint with a different target directory within the bucket. The base backup is in `my_backup`, and the incremental will be written to `my_incremental`:
```sql
BACKUP TABLE data TO S3('https://mars-doc-test.s3.amazonaws.com/backup-S3/my_incremental', 'ABC123', 'Abc+123') SETTINGS base_backup = S3('https://mars-doc-test.s3.amazonaws.com/backup-S3/my_backup', 'ABC123', 'Abc+123')
```
```response
┌─id───────────────────────────────────┬─status─────────┐
│ f6cd3900-850f-41c9-94f1-0c4df33ea528 │ BACKUP_CREATED │
└──────────────────────────────────────┴────────────────┘
```
### Restore from the incremental backup
This command restores the incremental backup into a new table, `data3`. Note that when an incremental backup is restored, the base backup is also included. Specify only the incremental backup when restoring:
```sql
RESTORE TABLE data AS data3 FROM S3('https://mars-doc-test.s3.amazonaws.com/backup-S3/my_incremental', 'ABC123', 'Abc+123')
```
```response
┌─id───────────────────────────────────┬─status───┐
│ ff0c8c39-7dff-4324-a241-000796de11ca │ RESTORED │
└──────────────────────────────────────┴──────────┘
```
### Verify the count
There were two inserts into the original table `data`, one with 1,000 rows and one with 100 rows, for a total of 1,100. Verify that the restored table has 1,100 rows:
```sql
SELECT count()
FROM data3
```
```response
┌─count()─┐
│ 1100 │
└─────────┘
```
### Verify the content
This compares the content of the original table, `data` with the restored table `data3`:
```sql
SELECT throwIf((
SELECT groupArray(tuple(*))
FROM data
) != (
SELECT groupArray(tuple(*))
FROM data3
), 'Data does not match after BACKUP/RESTORE')
```
## BACKUP/RESTORE Using an S3 Disk
It is also possible to `BACKUP`/`RESTORE` to S3 by configuring an S3 disk in the ClickHouse storage configuration. Configure the disk like this by adding a file to `/etc/clickhouse-server/config.d`:
```xml
<clickhouse>

View File

@ -194,7 +194,7 @@ To restore data from a backup, do the following:
Restoring from a backup does not require stopping the server.
For more information about backups and restoring data, see the [Data Backup](../../../operations/backup.md) section.
For more information about backups and restoring data, see the [Data Backup](/docs/en/manage/backups.mdx) section.
## UNFREEZE PARTITION

View File

@ -81,6 +81,7 @@ Multiple path components can have globs. For being processed file must exist and
- `?` — Substitutes any single character.
- `{some_string,another_string,yet_another_one}` — Substitutes any of strings `'some_string', 'another_string', 'yet_another_one'`.
- `{N..M}` — Substitutes any number in range from N to M including both borders.
- `**` - Fetches all files inside the folder recursively.
Constructions with `{}` are similar to the [remote](remote.md) table function.
@ -119,6 +120,22 @@ Query the data from files named `file000`, `file001`, … , `file999`:
SELECT count(*) FROM file('big_dir/file{0..9}{0..9}{0..9}', 'CSV', 'name String, value UInt32');
```
**Example**
Query the data from all files inside `big_dir` directory recursively:
``` sql
SELECT count(*) FROM file('big_dir/**', 'CSV', 'name String, value UInt32');
```
**Example**
Query the data from all `file002` files from any folder inside `big_dir` directory recursively:
``` sql
SELECT count(*) FROM file('big_dir/**/file002', 'CSV', 'name String, value UInt32');
```
## Virtual Columns
- `_path` — Path to the file.

View File

@ -127,6 +127,18 @@ INSERT INTO FUNCTION s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-
SELECT name, value FROM existing_table;
```
Glob ** can be used for recursive directory traversal. Consider the below example, it will fetch all files from `my-test-bucket-768` directory recursively:
``` sql
SELECT * FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/**', 'CSV', 'name String, value UInt32', 'gzip');
```
The below get data from all `test-data.csv.gz` files from any folder inside `my-test-bucket` directory recursively:
``` sql
SELECT * FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/**/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip');
```
## Partitioned Write
If you specify `PARTITION BY` expression when inserting data into `S3` table, a separate file is created for each partition value. Splitting the data into separate files helps to improve reading operations efficiency.

View File

@ -87,14 +87,15 @@ SETTINGS
<summary>Устаревший способ создания таблицы</summary>
:::note "Attention"
Не используйте этот метод в новых проектах. По возможности переключите старые проекты на метод, описанный выше.
:::note "Attention"
Не используйте этот метод в новых проектах. По возможности переключите старые проекты на метод, описанный выше.
:::
``` sql
Kafka(kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format
[, kafka_row_delimiter, kafka_schema, kafka_num_consumers, kafka_skip_broken_messages])
```
:::
</details>
## Описание {#opisanie}

View File

@ -39,9 +39,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
<summary>Устаревший способ создания таблицы</summary>
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(

View File

@ -43,9 +43,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
<summary>Устаревший способ создания таблицы</summary>
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
@ -59,7 +60,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
- `sign` — Имя столбца с типом строки: `1` — строка состояния, `-1` — строка отмены состояния.
Тип данных столбца — `Int8`.
Тип данных столбца — `Int8`.
</details>

View File

@ -55,9 +55,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
<summary>Устаревший способ создания таблицы</summary>
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(

View File

@ -115,9 +115,10 @@ ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDa
<summary>Устаревший способ создания таблицы</summary>
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ, описанный выше.
:::
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ, описанный выше.
:::
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(

View File

@ -42,9 +42,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
<summary>Устаревший способ создания таблицы</summary>
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
:::note "Attention"
Не используйте этот способ в новых проектах и по возможности переведите старые проекты на способ описанный выше.
:::
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(

View File

@ -316,9 +316,9 @@ SELECT toStartOfISOYear(toDate('2017-01-01')) AS ISOYear20170101;
Возвращается дата.
:::note "Attention"
Возвращаемое значение для некорректных дат зависит от реализации. ClickHouse может вернуть нулевую дату, выбросить исключение, или выполнить «естественное» перетекание дат между месяцами.
Возвращаемое значение для некорректных дат зависит от реализации. ClickHouse может вернуть нулевую дату, выбросить исключение, или выполнить «естественное» перетекание дат между месяцами.
:::
## toMonday {#tomonday}
Округляет дату или дату-с-временем вниз до ближайшего понедельника.

View File

@ -122,9 +122,9 @@ FROM t_null
Существует два варианта IN-ов с подзапросами (аналогично для JOIN-ов): обычный `IN` / `JOIN` и `GLOBAL IN` / `GLOBAL JOIN`. Они отличаются способом выполнения при распределённой обработке запроса.
:::note "Attention"
Помните, что алгоритмы, описанные ниже, могут работать иначе в зависимости от [настройки](../../operations/settings/settings.md) `distributed_product_mode`.
:::
:::note "Attention"
Помните, что алгоритмы, описанные ниже, могут работать иначе в зависимости от [настройки](../../operations/settings/settings.md) `distributed_product_mode`.
:::
При использовании обычного IN-а, запрос отправляется на удалённые серверы, и на каждом из них выполняются подзапросы в секциях `IN` / `JOIN`.
При использовании `GLOBAL IN` / `GLOBAL JOIN-а`, сначала выполняются все подзапросы для `GLOBAL IN` / `GLOBAL JOIN-ов`, и результаты складываются во временные таблицы. Затем эти временные таблицы передаются на каждый удалённый сервер, и на них выполняются запросы, с использованием этих переданных временных данных.

View File

@ -1,6 +1,10 @@
#pragma once
#include <Interpreters/Cluster.h>
#include <base/types.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <utility>
namespace DB
{
@ -8,21 +12,4 @@ namespace DB
using DatabaseAndTableName = std::pair<String, String>;
using ListOfDatabasesAndTableNames = std::vector<DatabaseAndTableName>;
/// Hierarchical description of the tasks
struct ShardPartitionPiece;
struct ShardPartition;
struct TaskShard;
struct TaskTable;
struct TaskCluster;
struct ClusterPartition;
using PartitionPieces = std::vector<ShardPartitionPiece>;
using TasksPartition = std::map<String, ShardPartition, std::greater<>>;
using ShardInfo = Cluster::ShardInfo;
using TaskShardPtr = std::shared_ptr<TaskShard>;
using TasksShard = std::vector<TaskShardPtr>;
using TasksTable = std::list<TaskTable>;
using ClusterPartitions = std::map<String, ClusterPartition, std::greater<>>;
}

View File

@ -1,7 +1,13 @@
set(CLICKHOUSE_COPIER_SOURCES
"${CMAKE_CURRENT_SOURCE_DIR}/ClusterCopierApp.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/ClusterCopier.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/Internals.cpp")
"${CMAKE_CURRENT_SOURCE_DIR}/Internals.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/ShardPartition.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/ShardPartitionPiece.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/StatusAccumulator.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/TaskCluster.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/TaskShard.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/TaskTable.cpp")
set (CLICKHOUSE_COPIER_LINK
PRIVATE

View File

@ -3,7 +3,8 @@
#include "Aliases.h"
#include "Internals.h"
#include "TaskCluster.h"
#include "TaskTableAndShard.h"
#include "TaskShard.h"
#include "TaskTable.h"
#include "ShardPartition.h"
#include "ShardPartitionPiece.h"
#include "ZooKeeperStaff.h"

View File

@ -1,17 +1,22 @@
#pragma once
#include "Aliases.h"
#include <base/types.h>
#include <map>
namespace DB
{
/// Contains info about all shards that contain a partition
struct ClusterPartition
{
double elapsed_time_seconds = 0;
UInt64 bytes_copied = 0;
UInt64 rows_copied = 0;
UInt64 blocks_copied = 0;
UInt64 total_tries = 0;
};
/// Contains info about all shards that contain a partition
struct ClusterPartition
{
double elapsed_time_seconds = 0;
UInt64 bytes_copied = 0;
UInt64 rows_copied = 0;
UInt64 blocks_copied = 0;
UInt64 total_tries = 0;
};
using ClusterPartitions = std::map<String, ClusterPartition, std::greater<>>;
}

View File

@ -0,0 +1,70 @@
#include "ShardPartition.h"
#include "TaskShard.h"
#include "TaskTable.h"
namespace DB
{
ShardPartition::ShardPartition(TaskShard & parent, String name_quoted_, size_t number_of_splits)
: task_shard(parent)
, name(std::move(name_quoted_))
{
pieces.reserve(number_of_splits);
}
String ShardPartition::getPartitionCleanStartPath() const
{
return getPartitionPath() + "/clean_start";
}
String ShardPartition::getPartitionPieceCleanStartPath(size_t current_piece_number) const
{
assert(current_piece_number < task_shard.task_table.number_of_splits);
return getPartitionPiecePath(current_piece_number) + "/clean_start";
}
String ShardPartition::getPartitionPath() const
{
return task_shard.task_table.getPartitionPath(name);
}
String ShardPartition::getPartitionPiecePath(size_t current_piece_number) const
{
assert(current_piece_number < task_shard.task_table.number_of_splits);
return task_shard.task_table.getPartitionPiecePath(name, current_piece_number);
}
String ShardPartition::getShardStatusPath() const
{
// schema: /<root...>/tables/<table>/<partition>/shards/<shard>
// e.g. /root/table_test.hits/201701/shards/1
return getPartitionShardsPath() + "/" + toString(task_shard.numberInCluster());
}
String ShardPartition::getPartitionShardsPath() const
{
return getPartitionPath() + "/shards";
}
String ShardPartition::getPartitionActiveWorkersPath() const
{
return getPartitionPath() + "/partition_active_workers";
}
String ShardPartition::getActiveWorkerPath() const
{
return getPartitionActiveWorkersPath() + "/" + toString(task_shard.numberInCluster());
}
String ShardPartition::getCommonPartitionIsDirtyPath() const
{
return getPartitionPath() + "/is_dirty";
}
String ShardPartition::getCommonPartitionIsCleanedPath() const
{
return getCommonPartitionIsDirtyPath() + "/cleaned";
}
}

View File

@ -1,19 +1,23 @@
#pragma once
#include "Aliases.h"
#include "TaskTableAndShard.h"
#include "ShardPartitionPiece.h"
#include <base/types.h>
#include <map>
namespace DB
{
struct TaskShard;
/// Just destination partition of a shard
/// I don't know what this comment means.
/// In short, when we discovered what shards contain currently processing partition,
/// This class describes a partition (name) that is stored on the shard (parent).
struct ShardPartition
{
ShardPartition(TaskShard &parent, String name_quoted_, size_t number_of_splits = 10)
: task_shard(parent), name(std::move(name_quoted_)) { pieces.reserve(number_of_splits); }
ShardPartition(TaskShard &parent, String name_quoted_, size_t number_of_splits = 10);
String getPartitionPath() const;
@ -45,58 +49,6 @@ struct ShardPartition
String name;
};
inline String ShardPartition::getPartitionCleanStartPath() const
{
return getPartitionPath() + "/clean_start";
}
inline String ShardPartition::getPartitionPieceCleanStartPath(size_t current_piece_number) const
{
assert(current_piece_number < task_shard.task_table.number_of_splits);
return getPartitionPiecePath(current_piece_number) + "/clean_start";
}
inline String ShardPartition::getPartitionPath() const
{
return task_shard.task_table.getPartitionPath(name);
}
inline String ShardPartition::getPartitionPiecePath(size_t current_piece_number) const
{
assert(current_piece_number < task_shard.task_table.number_of_splits);
return task_shard.task_table.getPartitionPiecePath(name, current_piece_number);
}
inline String ShardPartition::getShardStatusPath() const
{
// schema: /<root...>/tables/<table>/<partition>/shards/<shard>
// e.g. /root/table_test.hits/201701/shards/1
return getPartitionShardsPath() + "/" + toString(task_shard.numberInCluster());
}
inline String ShardPartition::getPartitionShardsPath() const
{
return getPartitionPath() + "/shards";
}
inline String ShardPartition::getPartitionActiveWorkersPath() const
{
return getPartitionPath() + "/partition_active_workers";
}
inline String ShardPartition::getActiveWorkerPath() const
{
return getPartitionActiveWorkersPath() + "/" + toString(task_shard.numberInCluster());
}
inline String ShardPartition::getCommonPartitionIsDirtyPath() const
{
return getPartitionPath() + "/is_dirty";
}
inline String ShardPartition::getCommonPartitionIsCleanedPath() const
{
return getCommonPartitionIsDirtyPath() + "/cleaned";
}
using TasksPartition = std::map<String, ShardPartition, std::greater<>>;
}

View File

@ -0,0 +1,64 @@
#include "ShardPartitionPiece.h"
#include "ShardPartition.h"
#include "TaskShard.h"
#include <IO/WriteHelpers.h>
namespace DB
{
ShardPartitionPiece::ShardPartitionPiece(ShardPartition & parent, size_t current_piece_number_, bool is_present_piece_)
: is_absent_piece(!is_present_piece_)
, current_piece_number(current_piece_number_)
, shard_partition(parent)
{
}
String ShardPartitionPiece::getPartitionPiecePath() const
{
return shard_partition.getPartitionPath() + "/piece_" + toString(current_piece_number);
}
String ShardPartitionPiece::getPartitionPieceCleanStartPath() const
{
return getPartitionPiecePath() + "/clean_start";
}
String ShardPartitionPiece::getPartitionPieceIsDirtyPath() const
{
return getPartitionPiecePath() + "/is_dirty";
}
String ShardPartitionPiece::getPartitionPieceIsCleanedPath() const
{
return getPartitionPieceIsDirtyPath() + "/cleaned";
}
String ShardPartitionPiece::getPartitionPieceActiveWorkersPath() const
{
return getPartitionPiecePath() + "/partition_piece_active_workers";
}
String ShardPartitionPiece::getActiveWorkerPath() const
{
return getPartitionPieceActiveWorkersPath() + "/" + toString(shard_partition.task_shard.numberInCluster());
}
/// On what shards do we have current partition.
String ShardPartitionPiece::getPartitionPieceShardsPath() const
{
return getPartitionPiecePath() + "/shards";
}
String ShardPartitionPiece::getShardStatusPath() const
{
return getPartitionPieceShardsPath() + "/" + toString(shard_partition.task_shard.numberInCluster());
}
String ShardPartitionPiece::getPartitionPieceCleanerPath() const
{
return getPartitionPieceIsDirtyPath() + "/cleaner";
}
}

View File

@ -1,16 +1,15 @@
#pragma once
#include "Internals.h"
#include <base/types.h>
namespace DB
{
struct ShardPartition;
struct ShardPartitionPiece
{
ShardPartitionPiece(ShardPartition &parent, size_t current_piece_number_, bool is_present_piece_)
: is_absent_piece(!is_present_piece_), current_piece_number(current_piece_number_),
shard_partition(parent) {}
ShardPartitionPiece(ShardPartition & parent, size_t current_piece_number_, bool is_present_piece_);
String getPartitionPiecePath() const;
@ -37,52 +36,6 @@ struct ShardPartitionPiece
ShardPartition & shard_partition;
};
inline String ShardPartitionPiece::getPartitionPiecePath() const
{
return shard_partition.getPartitionPath() + "/piece_" + toString(current_piece_number);
}
inline String ShardPartitionPiece::getPartitionPieceCleanStartPath() const
{
return getPartitionPiecePath() + "/clean_start";
}
inline String ShardPartitionPiece::getPartitionPieceIsDirtyPath() const
{
return getPartitionPiecePath() + "/is_dirty";
}
inline String ShardPartitionPiece::getPartitionPieceIsCleanedPath() const
{
return getPartitionPieceIsDirtyPath() + "/cleaned";
}
inline String ShardPartitionPiece::getPartitionPieceActiveWorkersPath() const
{
return getPartitionPiecePath() + "/partition_piece_active_workers";
}
inline String ShardPartitionPiece::getActiveWorkerPath() const
{
return getPartitionPieceActiveWorkersPath() + "/" + toString(shard_partition.task_shard.numberInCluster());
}
/// On what shards do we have current partition.
inline String ShardPartitionPiece::getPartitionPieceShardsPath() const
{
return getPartitionPiecePath() + "/shards";
}
inline String ShardPartitionPiece::getShardStatusPath() const
{
return getPartitionPieceShardsPath() + "/" + toString(shard_partition.task_shard.numberInCluster());
}
inline String ShardPartitionPiece::getPartitionPieceCleanerPath() const
{
return getPartitionPieceIsDirtyPath() + "/cleaner";
}
using PartitionPieces = std::vector<ShardPartitionPiece>;
}

View File

@ -0,0 +1,48 @@
#include "StatusAccumulator.h"
#include <Poco/JSON/Parser.h>
#include <Poco/JSON/JSON.h>
#include <Poco/JSON/Object.h>
#include <Poco/JSON/Stringifier.h>
#include <iostream>
namespace DB
{
StatusAccumulator::MapPtr StatusAccumulator::fromJSON(String state_json)
{
Poco::JSON::Parser parser;
auto state = parser.parse(state_json).extract<Poco::JSON::Object::Ptr>();
MapPtr result_ptr = std::make_shared<Map>();
for (const auto & table_name : state->getNames())
{
auto table_status_json = state->getValue<String>(table_name);
auto table_status = parser.parse(table_status_json).extract<Poco::JSON::Object::Ptr>();
/// Map entry will be created if it is absent
auto & map_table_status = (*result_ptr)[table_name];
map_table_status.all_partitions_count += table_status->getValue<size_t>("all_partitions_count");
map_table_status.processed_partitions_count += table_status->getValue<size_t>("processed_partitions_count");
}
return result_ptr;
}
String StatusAccumulator::serializeToJSON(MapPtr statuses)
{
Poco::JSON::Object result_json;
for (const auto & [table_name, table_status] : *statuses)
{
Poco::JSON::Object status_json;
status_json.set("all_partitions_count", table_status.all_partitions_count);
status_json.set("processed_partitions_count", table_status.processed_partitions_count);
result_json.set(table_name, status_json);
}
std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
oss.exceptions(std::ios::failbit);
Poco::JSON::Stringifier::stringify(result_json, oss);
auto result = oss.str();
return result;
}
}

View File

@ -1,65 +1,27 @@
#pragma once
#include <base/types.h>
#include <Poco/JSON/Parser.h>
#include <Poco/JSON/JSON.h>
#include <Poco/JSON/Object.h>
#include <Poco/JSON/Stringifier.h>
#include <unordered_map>
#include <memory>
#include <string>
#include <iostream>
#include <unordered_map>
namespace DB
{
class StatusAccumulator
{
public:
struct TableStatus
{
size_t all_partitions_count;
size_t processed_partitions_count;
};
public:
struct TableStatus
{
size_t all_partitions_count;
size_t processed_partitions_count;
};
using Map = std::unordered_map<std::string, TableStatus>;
using MapPtr = std::shared_ptr<Map>;
using Map = std::unordered_map<String, TableStatus>;
using MapPtr = std::shared_ptr<Map>;
static MapPtr fromJSON(std::string state_json)
{
Poco::JSON::Parser parser;
auto state = parser.parse(state_json).extract<Poco::JSON::Object::Ptr>();
MapPtr result_ptr = std::make_shared<Map>();
for (const auto & table_name : state->getNames())
{
auto table_status_json = state->getValue<std::string>(table_name);
auto table_status = parser.parse(table_status_json).extract<Poco::JSON::Object::Ptr>();
/// Map entry will be created if it is absent
auto & map_table_status = (*result_ptr)[table_name];
map_table_status.all_partitions_count += table_status->getValue<size_t>("all_partitions_count");
map_table_status.processed_partitions_count += table_status->getValue<size_t>("processed_partitions_count");
}
return result_ptr;
}
static std::string serializeToJSON(MapPtr statuses)
{
Poco::JSON::Object result_json;
for (const auto & [table_name, table_status] : *statuses)
{
Poco::JSON::Object status_json;
status_json.set("all_partitions_count", table_status.all_partitions_count);
status_json.set("processed_partitions_count", table_status.processed_partitions_count);
result_json.set(table_name, status_json);
}
std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
oss.exceptions(std::ios::failbit);
Poco::JSON::Stringifier::stringify(result_json, oss);
auto result = oss.str();
return result;
}
static MapPtr fromJSON(String state_json);
static String serializeToJSON(MapPtr statuses);
};
}

View File

@ -0,0 +1,74 @@
#include "TaskCluster.h"
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
TaskCluster::TaskCluster(const String & task_zookeeper_path_, const String & default_local_database_)
: task_zookeeper_path(task_zookeeper_path_)
, default_local_database(default_local_database_)
{}
void DB::TaskCluster::loadTasks(const Poco::Util::AbstractConfiguration & config, const String & base_key)
{
String prefix = base_key.empty() ? "" : base_key + ".";
clusters_prefix = prefix + "remote_servers";
if (!config.has(clusters_prefix))
throw Exception("You should specify list of clusters in " + clusters_prefix, ErrorCodes::BAD_ARGUMENTS);
Poco::Util::AbstractConfiguration::Keys tables_keys;
config.keys(prefix + "tables", tables_keys);
for (const auto & table_key : tables_keys)
{
table_tasks.emplace_back(*this, config, prefix + "tables", table_key);
}
}
void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & config, const String & base_key)
{
String prefix = base_key.empty() ? "" : base_key + ".";
max_workers = config.getUInt64(prefix + "max_workers");
settings_common = Settings();
if (config.has(prefix + "settings"))
settings_common.loadSettingsFromConfig(prefix + "settings", config);
settings_common.prefer_localhost_replica = false;
settings_pull = settings_common;
if (config.has(prefix + "settings_pull"))
settings_pull.loadSettingsFromConfig(prefix + "settings_pull", config);
settings_push = settings_common;
if (config.has(prefix + "settings_push"))
settings_push.loadSettingsFromConfig(prefix + "settings_push", config);
auto set_default_value = [] (auto && setting, auto && default_value)
{
setting = setting.changed ? setting.value : default_value;
};
/// Override important settings
settings_pull.readonly = 1;
settings_pull.prefer_localhost_replica = false;
settings_push.insert_distributed_sync = true;
settings_push.prefer_localhost_replica = false;
set_default_value(settings_pull.load_balancing, LoadBalancing::NEAREST_HOSTNAME);
set_default_value(settings_pull.max_threads, 1);
set_default_value(settings_pull.max_block_size, 8192UL);
set_default_value(settings_pull.preferred_block_size_bytes, 0);
set_default_value(settings_push.insert_distributed_timeout, 0);
set_default_value(settings_push.replication_alter_partitions_sync, 2);
}
}

View File

@ -1,21 +1,20 @@
#pragma once
#include "Aliases.h"
#include "TaskTable.h"
#include <Core/Settings.h>
#include <base/types.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <random>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
struct TaskCluster
{
TaskCluster(const String & task_zookeeper_path_, const String & default_local_database_)
: task_zookeeper_path(task_zookeeper_path_)
, default_local_database(default_local_database_)
{}
TaskCluster(const String & task_zookeeper_path_, const String & default_local_database_);
void loadTasks(const Poco::Util::AbstractConfiguration & config, const String & base_key = "");
@ -50,61 +49,4 @@ struct TaskCluster
pcg64 random_engine;
};
inline void DB::TaskCluster::loadTasks(const Poco::Util::AbstractConfiguration & config, const String & base_key)
{
String prefix = base_key.empty() ? "" : base_key + ".";
clusters_prefix = prefix + "remote_servers";
if (!config.has(clusters_prefix))
throw Exception("You should specify list of clusters in " + clusters_prefix, ErrorCodes::BAD_ARGUMENTS);
Poco::Util::AbstractConfiguration::Keys tables_keys;
config.keys(prefix + "tables", tables_keys);
for (const auto & table_key : tables_keys)
{
table_tasks.emplace_back(*this, config, prefix + "tables", table_key);
}
}
inline void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & config, const String & base_key)
{
String prefix = base_key.empty() ? "" : base_key + ".";
max_workers = config.getUInt64(prefix + "max_workers");
settings_common = Settings();
if (config.has(prefix + "settings"))
settings_common.loadSettingsFromConfig(prefix + "settings", config);
settings_common.prefer_localhost_replica = 0;
settings_pull = settings_common;
if (config.has(prefix + "settings_pull"))
settings_pull.loadSettingsFromConfig(prefix + "settings_pull", config);
settings_push = settings_common;
if (config.has(prefix + "settings_push"))
settings_push.loadSettingsFromConfig(prefix + "settings_push", config);
auto set_default_value = [] (auto && setting, auto && default_value)
{
setting = setting.changed ? setting.value : default_value;
};
/// Override important settings
settings_pull.readonly = 1;
settings_pull.prefer_localhost_replica = false;
settings_push.insert_distributed_sync = true;
settings_push.prefer_localhost_replica = false;
set_default_value(settings_pull.load_balancing, LoadBalancing::NEAREST_HOSTNAME);
set_default_value(settings_pull.max_threads, 1);
set_default_value(settings_pull.max_block_size, 8192UL);
set_default_value(settings_pull.preferred_block_size_bytes, 0);
set_default_value(settings_push.insert_distributed_timeout, 0);
set_default_value(settings_push.replication_alter_partitions_sync, 2);
}
}

View File

@ -0,0 +1,37 @@
#include "TaskShard.h"
#include "TaskTable.h"
namespace DB
{
TaskShard::TaskShard(TaskTable & parent, const Cluster::ShardInfo & info_)
: task_table(parent)
, info(info_)
{
list_of_split_tables_on_shard.assign(task_table.number_of_splits, DatabaseAndTableName());
}
UInt32 TaskShard::numberInCluster() const
{
return info.shard_num;
}
UInt32 TaskShard::indexInCluster() const
{
return info.shard_num - 1;
}
String DB::TaskShard::getDescription() const
{
return fmt::format("N{} (having a replica {}, pull table {} of cluster {}",
numberInCluster(), getHostNameExample(), getQuotedTable(task_table.table_pull), task_table.cluster_pull_name);
}
String DB::TaskShard::getHostNameExample() const
{
const auto & replicas = task_table.cluster_pull->getShardsAddresses().at(indexInCluster());
return replicas.at(0).readableString();
}
}

View File

@ -0,0 +1,56 @@
#pragma once
#include "Aliases.h"
#include "Internals.h"
#include "ClusterPartition.h"
#include "ShardPartition.h"
namespace DB
{
struct TaskTable;
struct TaskShard
{
TaskShard(TaskTable & parent, const Cluster::ShardInfo & info_);
TaskTable & task_table;
Cluster::ShardInfo info;
UInt32 numberInCluster() const;
UInt32 indexInCluster() const;
String getDescription() const;
String getHostNameExample() const;
/// Used to sort clusters by their proximity
ShardPriority priority;
/// Column with unique destination partitions (computed from engine_push_partition_key expr.) in the shard
ColumnWithTypeAndName partition_key_column;
/// There is a task for each destination partition
TasksPartition partition_tasks;
/// Which partitions have been checked for existence
/// If some partition from this lists is exists, it is in partition_tasks
std::set<String> checked_partitions;
/// Last CREATE TABLE query of the table of the shard
ASTPtr current_pull_table_create_query;
ASTPtr current_push_table_create_query;
/// Internal distributed tables
DatabaseAndTableName table_read_shard;
DatabaseAndTableName main_table_split_shard;
ListOfDatabasesAndTableNames list_of_split_tables_on_shard;
};
using TaskShardPtr = std::shared_ptr<TaskShard>;
using TasksShard = std::vector<TaskShardPtr>;
}

View File

@ -0,0 +1,221 @@
#include "TaskTable.h"
#include "ClusterPartition.h"
#include "TaskCluster.h"
#include <Parsers/ASTFunction.h>
#include <boost/algorithm/string/join.hpp>
namespace DB
{
namespace ErrorCodes
{
extern const int UNKNOWN_ELEMENT_IN_CONFIG;
extern const int LOGICAL_ERROR;
}
TaskTable::TaskTable(TaskCluster & parent, const Poco::Util::AbstractConfiguration & config,
const String & prefix_, const String & table_key)
: task_cluster(parent)
{
String table_prefix = prefix_ + "." + table_key + ".";
name_in_config = table_key;
number_of_splits = config.getUInt64(table_prefix + "number_of_splits", 3);
allow_to_copy_alias_and_materialized_columns = config.getBool(table_prefix + "allow_to_copy_alias_and_materialized_columns", false);
allow_to_drop_target_partitions = config.getBool(table_prefix + "allow_to_drop_target_partitions", false);
cluster_pull_name = config.getString(table_prefix + "cluster_pull");
cluster_push_name = config.getString(table_prefix + "cluster_push");
table_pull.first = config.getString(table_prefix + "database_pull");
table_pull.second = config.getString(table_prefix + "table_pull");
table_push.first = config.getString(table_prefix + "database_push");
table_push.second = config.getString(table_prefix + "table_push");
/// Used as node name in ZooKeeper
table_id = escapeForFileName(cluster_push_name)
+ "." + escapeForFileName(table_push.first)
+ "." + escapeForFileName(table_push.second);
engine_push_str = config.getString(table_prefix + "engine", "rand()");
{
ParserStorage parser_storage;
engine_push_ast = parseQuery(parser_storage, engine_push_str, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH);
engine_push_partition_key_ast = extractPartitionKey(engine_push_ast);
primary_key_comma_separated = boost::algorithm::join(extractPrimaryKeyColumnNames(engine_push_ast), ", ");
is_replicated_table = isReplicatedTableEngine(engine_push_ast);
}
sharding_key_str = config.getString(table_prefix + "sharding_key");
auxiliary_engine_split_asts.reserve(number_of_splits);
{
ParserExpressionWithOptionalAlias parser_expression(false);
sharding_key_ast = parseQuery(parser_expression, sharding_key_str, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH);
main_engine_split_ast = createASTStorageDistributed(cluster_push_name, table_push.first, table_push.second,
sharding_key_ast);
for (const auto piece_number : collections::range(0, number_of_splits))
{
auxiliary_engine_split_asts.emplace_back
(
createASTStorageDistributed(cluster_push_name, table_push.first,
table_push.second + "_piece_" + toString(piece_number), sharding_key_ast)
);
}
}
where_condition_str = config.getString(table_prefix + "where_condition", "");
if (!where_condition_str.empty())
{
ParserExpressionWithOptionalAlias parser_expression(false);
where_condition_ast = parseQuery(parser_expression, where_condition_str, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH);
// Will use canonical expression form
where_condition_str = queryToString(where_condition_ast);
}
String enabled_partitions_prefix = table_prefix + "enabled_partitions";
has_enabled_partitions = config.has(enabled_partitions_prefix);
if (has_enabled_partitions)
{
Strings keys;
config.keys(enabled_partitions_prefix, keys);
if (keys.empty())
{
/// Parse list of partition from space-separated string
String partitions_str = config.getString(table_prefix + "enabled_partitions");
boost::trim_if(partitions_str, isWhitespaceASCII);
boost::split(enabled_partitions, partitions_str, isWhitespaceASCII, boost::token_compress_on);
}
else
{
/// Parse sequence of <partition>...</partition>
for (const String &key : keys)
{
if (!startsWith(key, "partition"))
throw Exception("Unknown key " + key + " in " + enabled_partitions_prefix, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG);
enabled_partitions.emplace_back(config.getString(enabled_partitions_prefix + "." + key));
}
}
std::copy(enabled_partitions.begin(), enabled_partitions.end(), std::inserter(enabled_partitions_set, enabled_partitions_set.begin()));
}
}
String TaskTable::getPartitionPath(const String & partition_name) const
{
return task_cluster.task_zookeeper_path // root
+ "/tables/" + table_id // tables/dst_cluster.merge.hits
+ "/" + escapeForFileName(partition_name); // 201701
}
String TaskTable::getPartitionAttachIsActivePath(const String & partition_name) const
{
return getPartitionPath(partition_name) + "/attach_active";
}
String TaskTable::getPartitionAttachIsDonePath(const String & partition_name) const
{
return getPartitionPath(partition_name) + "/attach_is_done";
}
String TaskTable::getPartitionPiecePath(const String & partition_name, size_t piece_number) const
{
assert(piece_number < number_of_splits);
return getPartitionPath(partition_name) + "/piece_" + toString(piece_number); // 1...number_of_splits
}
String TaskTable::getCertainPartitionIsDirtyPath(const String &partition_name) const
{
return getPartitionPath(partition_name) + "/is_dirty";
}
String TaskTable::getCertainPartitionPieceIsDirtyPath(const String & partition_name, const size_t piece_number) const
{
return getPartitionPiecePath(partition_name, piece_number) + "/is_dirty";
}
String TaskTable::getCertainPartitionIsCleanedPath(const String & partition_name) const
{
return getCertainPartitionIsDirtyPath(partition_name) + "/cleaned";
}
String TaskTable::getCertainPartitionPieceIsCleanedPath(const String & partition_name, const size_t piece_number) const
{
return getCertainPartitionPieceIsDirtyPath(partition_name, piece_number) + "/cleaned";
}
String TaskTable::getCertainPartitionTaskStatusPath(const String & partition_name) const
{
return getPartitionPath(partition_name) + "/shards";
}
String TaskTable::getCertainPartitionPieceTaskStatusPath(const String & partition_name, const size_t piece_number) const
{
return getPartitionPiecePath(partition_name, piece_number) + "/shards";
}
bool TaskTable::isReplicatedTable() const
{
return is_replicated_table;
}
String TaskTable::getStatusAllPartitionCount() const
{
return task_cluster.task_zookeeper_path + "/status/all_partitions_count";
}
String TaskTable::getStatusProcessedPartitionsCount() const
{
return task_cluster.task_zookeeper_path + "/status/processed_partitions_count";
}
ASTPtr TaskTable::rewriteReplicatedCreateQueryToPlain() const
{
ASTPtr prev_engine_push_ast = engine_push_ast->clone();
auto & new_storage_ast = prev_engine_push_ast->as<ASTStorage &>();
auto & new_engine_ast = new_storage_ast.engine->as<ASTFunction &>();
/// Remove "Replicated" from name
new_engine_ast.name = new_engine_ast.name.substr(10);
if (new_engine_ast.arguments)
{
auto & replicated_table_arguments = new_engine_ast.arguments->children;
/// In some cases of Atomic database engine usage ReplicatedMergeTree tables
/// could be created without arguments.
if (!replicated_table_arguments.empty())
{
/// Delete first two arguments of Replicated...MergeTree() table.
replicated_table_arguments.erase(replicated_table_arguments.begin());
replicated_table_arguments.erase(replicated_table_arguments.begin());
}
}
return new_storage_ast.clone();
}
ClusterPartition & TaskTable::getClusterPartition(const String & partition_name)
{
auto it = cluster_partitions.find(partition_name);
if (it == cluster_partitions.end())
throw Exception("There are no cluster partition " + partition_name + " in " + table_id,
ErrorCodes::LOGICAL_ERROR);
return it->second;
}
}

173
programs/copier/TaskTable.h Normal file
View File

@ -0,0 +1,173 @@
#pragma once
#include "Aliases.h"
#include "TaskShard.h"
namespace DB
{
struct ClusterPartition;
struct TaskCluster;
struct TaskTable
{
TaskTable(TaskCluster & parent, const Poco::Util::AbstractConfiguration & config, const String & prefix, const String & table_key);
TaskCluster & task_cluster;
/// These functions used in checkPartitionIsDone() or checkPartitionPieceIsDone()
/// They are implemented here not to call task_table.tasks_shard[partition_name].second.pieces[current_piece_number] etc.
String getPartitionPath(const String & partition_name) const;
String getPartitionAttachIsActivePath(const String & partition_name) const;
String getPartitionAttachIsDonePath(const String & partition_name) const;
String getPartitionPiecePath(const String & partition_name, size_t piece_number) const;
String getCertainPartitionIsDirtyPath(const String & partition_name) const;
String getCertainPartitionPieceIsDirtyPath(const String & partition_name, size_t piece_number) const;
String getCertainPartitionIsCleanedPath(const String & partition_name) const;
String getCertainPartitionPieceIsCleanedPath(const String & partition_name, size_t piece_number) const;
String getCertainPartitionTaskStatusPath(const String & partition_name) const;
String getCertainPartitionPieceTaskStatusPath(const String & partition_name, size_t piece_number) const;
bool isReplicatedTable() const;
/// These nodes are used for check-status option
String getStatusAllPartitionCount() const;
String getStatusProcessedPartitionsCount() const;
/// Partitions will be split into number-of-splits pieces.
/// Each piece will be copied independently. (10 by default)
size_t number_of_splits;
bool allow_to_copy_alias_and_materialized_columns{false};
bool allow_to_drop_target_partitions{false};
String name_in_config;
/// Used as task ID
String table_id;
/// Column names in primary key
String primary_key_comma_separated;
/// Source cluster and table
String cluster_pull_name;
DatabaseAndTableName table_pull;
/// Destination cluster and table
String cluster_push_name;
DatabaseAndTableName table_push;
/// Storage of destination table
/// (tables that are stored on each shard of target cluster)
String engine_push_str;
ASTPtr engine_push_ast;
ASTPtr engine_push_partition_key_ast;
/// First argument of Replicated...MergeTree()
String engine_push_zk_path;
bool is_replicated_table;
ASTPtr rewriteReplicatedCreateQueryToPlain() const;
/*
* A Distributed table definition used to split data
* Distributed table will be created on each shard of default
* cluster to perform data copying and resharding
* */
String sharding_key_str;
ASTPtr sharding_key_ast;
ASTPtr main_engine_split_ast;
/*
* To copy partition piece form one cluster to another we have to use Distributed table.
* In case of usage separate table (engine_push) for each partition piece,
* we have to use many Distributed tables.
* */
ASTs auxiliary_engine_split_asts;
/// Additional WHERE expression to filter input data
String where_condition_str;
ASTPtr where_condition_ast;
/// Resolved clusters
ClusterPtr cluster_pull;
ClusterPtr cluster_push;
/// Filter partitions that should be copied
bool has_enabled_partitions = false;
Strings enabled_partitions;
NameSet enabled_partitions_set;
/**
* Prioritized list of shards
* all_shards contains information about all shards in the table.
* So we have to check whether particular shard have current partition or not while processing.
*/
TasksShard all_shards;
TasksShard local_shards;
/// All partitions of the current table.
ClusterPartitions cluster_partitions;
NameSet finished_cluster_partitions;
/// Partition names to process in user-specified order
Strings ordered_partition_names;
ClusterPartition & getClusterPartition(const String & partition_name);
Stopwatch watch;
UInt64 bytes_copied = 0;
UInt64 rows_copied = 0;
template <typename RandomEngine>
void initShards(RandomEngine &&random_engine);
};
using TasksTable = std::list<TaskTable>;
template<typename RandomEngine>
inline void TaskTable::initShards(RandomEngine && random_engine)
{
const String & fqdn_name = getFQDNOrHostName();
std::uniform_int_distribution<uint8_t> get_urand(0, std::numeric_limits<UInt8>::max());
// Compute the priority
for (const auto & shard_info : cluster_pull->getShardsInfo())
{
TaskShardPtr task_shard = std::make_shared<TaskShard>(*this, shard_info);
const auto & replicas = cluster_pull->getShardsAddresses().at(task_shard->indexInCluster());
task_shard->priority = getReplicasPriority(replicas, fqdn_name, get_urand(random_engine));
all_shards.emplace_back(task_shard);
}
// Sort by priority
std::sort(all_shards.begin(), all_shards.end(),
[](const TaskShardPtr & lhs, const TaskShardPtr & rhs)
{
return ShardPriority::greaterPriority(lhs->priority, rhs->priority);
});
// Cut local shards
auto it_first_remote = std::lower_bound(all_shards.begin(), all_shards.end(), 1,
[](const TaskShardPtr & lhs, UInt8 is_remote)
{
return lhs->priority.is_remote < is_remote;
});
local_shards.assign(all_shards.begin(), it_first_remote);
}
}

View File

@ -1,434 +0,0 @@
#pragma once
#include "Aliases.h"
#include "Internals.h"
#include "ClusterPartition.h"
#include <Core/Defines.h>
#include <Parsers/ASTFunction.h>
#include <base/map.h>
#include <boost/algorithm/string/join.hpp>
namespace DB
{
namespace ErrorCodes
{
extern const int UNKNOWN_ELEMENT_IN_CONFIG;
extern const int LOGICAL_ERROR;
}
struct TaskShard;
struct TaskTable
{
TaskTable(TaskCluster & parent, const Poco::Util::AbstractConfiguration & config, const String & prefix,
const String & table_key);
TaskCluster & task_cluster;
/// These functions used in checkPartitionIsDone() or checkPartitionPieceIsDone()
/// They are implemented here not to call task_table.tasks_shard[partition_name].second.pieces[current_piece_number] etc.
String getPartitionPath(const String & partition_name) const;
String getPartitionAttachIsActivePath(const String & partition_name) const;
String getPartitionAttachIsDonePath(const String & partition_name) const;
String getPartitionPiecePath(const String & partition_name, size_t piece_number) const;
String getCertainPartitionIsDirtyPath(const String & partition_name) const;
String getCertainPartitionPieceIsDirtyPath(const String & partition_name, size_t piece_number) const;
String getCertainPartitionIsCleanedPath(const String & partition_name) const;
String getCertainPartitionPieceIsCleanedPath(const String & partition_name, size_t piece_number) const;
String getCertainPartitionTaskStatusPath(const String & partition_name) const;
String getCertainPartitionPieceTaskStatusPath(const String & partition_name, size_t piece_number) const;
bool isReplicatedTable() const { return is_replicated_table; }
/// These nodes are used for check-status option
String getStatusAllPartitionCount() const;
String getStatusProcessedPartitionsCount() const;
/// Partitions will be split into number-of-splits pieces.
/// Each piece will be copied independently. (10 by default)
size_t number_of_splits;
bool allow_to_copy_alias_and_materialized_columns{false};
bool allow_to_drop_target_partitions{false};
String name_in_config;
/// Used as task ID
String table_id;
/// Column names in primary key
String primary_key_comma_separated;
/// Source cluster and table
String cluster_pull_name;
DatabaseAndTableName table_pull;
/// Destination cluster and table
String cluster_push_name;
DatabaseAndTableName table_push;
/// Storage of destination table
/// (tables that are stored on each shard of target cluster)
String engine_push_str;
ASTPtr engine_push_ast;
ASTPtr engine_push_partition_key_ast;
/// First argument of Replicated...MergeTree()
String engine_push_zk_path;
bool is_replicated_table;
ASTPtr rewriteReplicatedCreateQueryToPlain() const;
/*
* A Distributed table definition used to split data
* Distributed table will be created on each shard of default
* cluster to perform data copying and resharding
* */
String sharding_key_str;
ASTPtr sharding_key_ast;
ASTPtr main_engine_split_ast;
/*
* To copy partition piece form one cluster to another we have to use Distributed table.
* In case of usage separate table (engine_push) for each partition piece,
* we have to use many Distributed tables.
* */
ASTs auxiliary_engine_split_asts;
/// Additional WHERE expression to filter input data
String where_condition_str;
ASTPtr where_condition_ast;
/// Resolved clusters
ClusterPtr cluster_pull;
ClusterPtr cluster_push;
/// Filter partitions that should be copied
bool has_enabled_partitions = false;
Strings enabled_partitions;
NameSet enabled_partitions_set;
/**
* Prioritized list of shards
* all_shards contains information about all shards in the table.
* So we have to check whether particular shard have current partition or not while processing.
*/
TasksShard all_shards;
TasksShard local_shards;
/// All partitions of the current table.
ClusterPartitions cluster_partitions;
NameSet finished_cluster_partitions;
/// Partition names to process in user-specified order
Strings ordered_partition_names;
ClusterPartition & getClusterPartition(const String & partition_name)
{
auto it = cluster_partitions.find(partition_name);
if (it == cluster_partitions.end())
throw Exception("There are no cluster partition " + partition_name + " in " + table_id,
ErrorCodes::LOGICAL_ERROR);
return it->second;
}
Stopwatch watch;
UInt64 bytes_copied = 0;
UInt64 rows_copied = 0;
template <typename RandomEngine>
void initShards(RandomEngine &&random_engine);
};
struct TaskShard
{
TaskShard(TaskTable & parent, const ShardInfo & info_) : task_table(parent), info(info_)
{
list_of_split_tables_on_shard.assign(task_table.number_of_splits, DatabaseAndTableName());
}
TaskTable & task_table;
ShardInfo info;
UInt32 numberInCluster() const { return info.shard_num; }
UInt32 indexInCluster() const { return info.shard_num - 1; }
String getDescription() const;
String getHostNameExample() const;
/// Used to sort clusters by their proximity
ShardPriority priority;
/// Column with unique destination partitions (computed from engine_push_partition_key expr.) in the shard
ColumnWithTypeAndName partition_key_column;
/// There is a task for each destination partition
TasksPartition partition_tasks;
/// Which partitions have been checked for existence
/// If some partition from this lists is exists, it is in partition_tasks
std::set<String> checked_partitions;
/// Last CREATE TABLE query of the table of the shard
ASTPtr current_pull_table_create_query;
ASTPtr current_push_table_create_query;
/// Internal distributed tables
DatabaseAndTableName table_read_shard;
DatabaseAndTableName main_table_split_shard;
ListOfDatabasesAndTableNames list_of_split_tables_on_shard;
};
inline String TaskTable::getPartitionPath(const String & partition_name) const
{
return task_cluster.task_zookeeper_path // root
+ "/tables/" + table_id // tables/dst_cluster.merge.hits
+ "/" + escapeForFileName(partition_name); // 201701
}
inline String TaskTable::getPartitionAttachIsActivePath(const String & partition_name) const
{
return getPartitionPath(partition_name) + "/attach_active";
}
inline String TaskTable::getPartitionAttachIsDonePath(const String & partition_name) const
{
return getPartitionPath(partition_name) + "/attach_is_done";
}
inline String TaskTable::getPartitionPiecePath(const String & partition_name, size_t piece_number) const
{
assert(piece_number < number_of_splits);
return getPartitionPath(partition_name) + "/piece_" + toString(piece_number); // 1...number_of_splits
}
inline String TaskTable::getCertainPartitionIsDirtyPath(const String &partition_name) const
{
return getPartitionPath(partition_name) + "/is_dirty";
}
inline String TaskTable::getCertainPartitionPieceIsDirtyPath(const String & partition_name, const size_t piece_number) const
{
return getPartitionPiecePath(partition_name, piece_number) + "/is_dirty";
}
inline String TaskTable::getCertainPartitionIsCleanedPath(const String & partition_name) const
{
return getCertainPartitionIsDirtyPath(partition_name) + "/cleaned";
}
inline String TaskTable::getCertainPartitionPieceIsCleanedPath(const String & partition_name, const size_t piece_number) const
{
return getCertainPartitionPieceIsDirtyPath(partition_name, piece_number) + "/cleaned";
}
inline String TaskTable::getCertainPartitionTaskStatusPath(const String & partition_name) const
{
return getPartitionPath(partition_name) + "/shards";
}
inline String TaskTable::getCertainPartitionPieceTaskStatusPath(const String & partition_name, const size_t piece_number) const
{
return getPartitionPiecePath(partition_name, piece_number) + "/shards";
}
inline String TaskTable::getStatusAllPartitionCount() const
{
return task_cluster.task_zookeeper_path + "/status/all_partitions_count";
}
inline String TaskTable::getStatusProcessedPartitionsCount() const
{
return task_cluster.task_zookeeper_path + "/status/processed_partitions_count";
}
inline TaskTable::TaskTable(TaskCluster & parent, const Poco::Util::AbstractConfiguration & config,
const String & prefix_, const String & table_key)
: task_cluster(parent)
{
String table_prefix = prefix_ + "." + table_key + ".";
name_in_config = table_key;
number_of_splits = config.getUInt64(table_prefix + "number_of_splits", 3);
allow_to_copy_alias_and_materialized_columns = config.getBool(table_prefix + "allow_to_copy_alias_and_materialized_columns", false);
allow_to_drop_target_partitions = config.getBool(table_prefix + "allow_to_drop_target_partitions", false);
cluster_pull_name = config.getString(table_prefix + "cluster_pull");
cluster_push_name = config.getString(table_prefix + "cluster_push");
table_pull.first = config.getString(table_prefix + "database_pull");
table_pull.second = config.getString(table_prefix + "table_pull");
table_push.first = config.getString(table_prefix + "database_push");
table_push.second = config.getString(table_prefix + "table_push");
/// Used as node name in ZooKeeper
table_id = escapeForFileName(cluster_push_name)
+ "." + escapeForFileName(table_push.first)
+ "." + escapeForFileName(table_push.second);
engine_push_str = config.getString(table_prefix + "engine", "rand()");
{
ParserStorage parser_storage;
engine_push_ast = parseQuery(parser_storage, engine_push_str, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH);
engine_push_partition_key_ast = extractPartitionKey(engine_push_ast);
primary_key_comma_separated = boost::algorithm::join(extractPrimaryKeyColumnNames(engine_push_ast), ", ");
is_replicated_table = isReplicatedTableEngine(engine_push_ast);
}
sharding_key_str = config.getString(table_prefix + "sharding_key");
auxiliary_engine_split_asts.reserve(number_of_splits);
{
ParserExpressionWithOptionalAlias parser_expression(false);
sharding_key_ast = parseQuery(parser_expression, sharding_key_str, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH);
main_engine_split_ast = createASTStorageDistributed(cluster_push_name, table_push.first, table_push.second,
sharding_key_ast);
for (const auto piece_number : collections::range(0, number_of_splits))
{
auxiliary_engine_split_asts.emplace_back
(
createASTStorageDistributed(cluster_push_name, table_push.first,
table_push.second + "_piece_" + toString(piece_number), sharding_key_ast)
);
}
}
where_condition_str = config.getString(table_prefix + "where_condition", "");
if (!where_condition_str.empty())
{
ParserExpressionWithOptionalAlias parser_expression(false);
where_condition_ast = parseQuery(parser_expression, where_condition_str, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH);
// Will use canonical expression form
where_condition_str = queryToString(where_condition_ast);
}
String enabled_partitions_prefix = table_prefix + "enabled_partitions";
has_enabled_partitions = config.has(enabled_partitions_prefix);
if (has_enabled_partitions)
{
Strings keys;
config.keys(enabled_partitions_prefix, keys);
if (keys.empty())
{
/// Parse list of partition from space-separated string
String partitions_str = config.getString(table_prefix + "enabled_partitions");
boost::trim_if(partitions_str, isWhitespaceASCII);
boost::split(enabled_partitions, partitions_str, isWhitespaceASCII, boost::token_compress_on);
}
else
{
/// Parse sequence of <partition>...</partition>
for (const String &key : keys)
{
if (!startsWith(key, "partition"))
throw Exception("Unknown key " + key + " in " + enabled_partitions_prefix, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG);
enabled_partitions.emplace_back(config.getString(enabled_partitions_prefix + "." + key));
}
}
std::copy(enabled_partitions.begin(), enabled_partitions.end(), std::inserter(enabled_partitions_set, enabled_partitions_set.begin()));
}
}
template<typename RandomEngine>
inline void TaskTable::initShards(RandomEngine && random_engine)
{
const String & fqdn_name = getFQDNOrHostName();
std::uniform_int_distribution<UInt8> get_urand(0, std::numeric_limits<UInt8>::max());
// Compute the priority
for (const auto & shard_info : cluster_pull->getShardsInfo())
{
TaskShardPtr task_shard = std::make_shared<TaskShard>(*this, shard_info);
const auto & replicas = cluster_pull->getShardsAddresses().at(task_shard->indexInCluster());
task_shard->priority = getReplicasPriority(replicas, fqdn_name, get_urand(random_engine));
all_shards.emplace_back(task_shard);
}
// Sort by priority
std::sort(all_shards.begin(), all_shards.end(),
[](const TaskShardPtr & lhs, const TaskShardPtr & rhs)
{
return ShardPriority::greaterPriority(lhs->priority, rhs->priority);
});
// Cut local shards
auto it_first_remote = std::lower_bound(all_shards.begin(), all_shards.end(), 1,
[](const TaskShardPtr & lhs, UInt8 is_remote)
{
return lhs->priority.is_remote < is_remote;
});
local_shards.assign(all_shards.begin(), it_first_remote);
}
inline ASTPtr TaskTable::rewriteReplicatedCreateQueryToPlain() const
{
ASTPtr prev_engine_push_ast = engine_push_ast->clone();
auto & new_storage_ast = prev_engine_push_ast->as<ASTStorage &>();
auto & new_engine_ast = new_storage_ast.engine->as<ASTFunction &>();
/// Remove "Replicated" from name
new_engine_ast.name = new_engine_ast.name.substr(10);
if (new_engine_ast.arguments)
{
auto & replicated_table_arguments = new_engine_ast.arguments->children;
/// In some cases of Atomic database engine usage ReplicatedMergeTree tables
/// could be created without arguments.
if (!replicated_table_arguments.empty())
{
/// Delete first two arguments of Replicated...MergeTree() table.
replicated_table_arguments.erase(replicated_table_arguments.begin());
replicated_table_arguments.erase(replicated_table_arguments.begin());
}
}
return new_storage_ast.clone();
}
inline String DB::TaskShard::getDescription() const
{
return fmt::format("N{} (having a replica {}, pull table {} of cluster {}",
numberInCluster(), getHostNameExample(), getQuotedTable(task_table.table_pull), task_table.cluster_pull_name);
}
inline String DB::TaskShard::getHostNameExample() const
{
const auto & replicas = task_table.cluster_pull->getShardsAddresses().at(indexInCluster());
return replicas.at(0).readableString();
}
}

View File

@ -90,17 +90,23 @@ std::string makeRegexpPatternFromGlobs(const std::string & initial_str_with_glob
oss_for_replacing << escaped_with_globs.substr(current_index);
std::string almost_res = oss_for_replacing.str();
WriteBufferFromOwnString buf_final_processing;
char previous = ' ';
for (const auto & letter : almost_res)
{
if ((letter == '?') || (letter == '*'))
if (previous == '*' && letter == '*')
{
buf_final_processing << "[^{}]";
}
else if ((letter == '?') || (letter == '*'))
{
buf_final_processing << "[^/]"; /// '?' is any symbol except '/'
if (letter == '?')
continue;
}
if ((letter == '.') || (letter == '{') || (letter == '}'))
else if ((letter == '.') || (letter == '{') || (letter == '}'))
buf_final_processing << '\\';
buf_final_processing << letter;
previous = letter;
}
return buf_final_processing.str();
}

View File

@ -1049,7 +1049,7 @@ INSTANTIATE_TEST_SUITE_P(RandomInt,
::testing::Combine(
DefaultCodecsToTest,
::testing::Values(
generateSeq<UInt8 >(G(RandomGenerator<UInt8>(0))),
generateSeq<UInt8 >(G(RandomGenerator<uint8_t>(0))),
generateSeq<UInt16>(G(RandomGenerator<UInt16>(0))),
generateSeq<UInt32>(G(RandomGenerator<UInt32>(0, 0, 1000'000'000))),
generateSeq<UInt64>(G(RandomGenerator<UInt64>(0, 0, 1000'000'000)))

View File

@ -69,7 +69,7 @@ static std::ostream & operator<<(std::ostream & ostr, const JSONPathAndValue & p
bool first = true;
for (const auto & part : path_and_value.path.getParts())
{
ostr << (first ? "{" : ", {") << part.key << ", " << part.is_nested << ", " << part.anonymous_array_level << "}";
ostr << (first ? "{" : ", {") << part.key << ", " << part.is_nested << ", " << static_cast<uint8_t>(part.anonymous_array_level) << "}";
first = false;
}

View File

@ -39,6 +39,7 @@
#include <Common/FieldVisitorsAccurateComparison.h>
#include <Common/assert_cast.h>
#include <Common/typeid_cast.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <Interpreters/Context.h>
#if USE_EMBEDDED_COMPILER
@ -1790,25 +1791,32 @@ public:
// const +|- variable
if (left.column && isColumnConst(*left.column))
{
auto left_type = removeNullable(removeLowCardinality(left.type));
auto right_type = removeNullable(removeLowCardinality(right.type));
auto ret_type = removeNullable(removeLowCardinality(return_type));
auto transform = [&](const Field & point)
{
ColumnsWithTypeAndName columns_with_constant
= {{left.column->cloneResized(1), left.type, left.name},
{right.type->createColumnConst(1, point), right.type, right.name}};
= {{left_type->createColumnConst(1, (*left.column)[0]), left_type, left.name},
{right_type->createColumnConst(1, point), right_type, right.name}};
auto col = Base::executeImpl(columns_with_constant, return_type, 1);
/// This is a bit dangerous to call Base::executeImpl cause it ignores `use Default Implementation For XXX` flags.
/// It was possible to check monotonicity for nullable right type which result to exception.
/// Adding removeNullable above fixes the issue, but some other inconsistency may left.
auto col = Base::executeImpl(columns_with_constant, ret_type, 1);
Field point_transformed;
col->get(0, point_transformed);
return point_transformed;
};
transform(left_point);
transform(right_point);
bool is_positive_monotonicity = applyVisitor(FieldVisitorAccurateLess(), left_point, right_point)
== applyVisitor(FieldVisitorAccurateLess(), transform(left_point), transform(right_point));
if (name_view == "plus")
{
// Check if there is an overflow
if (applyVisitor(FieldVisitorAccurateLess(), left_point, right_point)
== applyVisitor(FieldVisitorAccurateLess(), transform(left_point), transform(right_point)))
if (is_positive_monotonicity)
return {true, true, false, true};
else
return {false, true, false, false};
@ -1816,8 +1824,7 @@ public:
else
{
// Check if there is an overflow
if (applyVisitor(FieldVisitorAccurateLess(), left_point, right_point)
!= applyVisitor(FieldVisitorAccurateLess(), transform(left_point), transform(right_point)))
if (!is_positive_monotonicity)
return {true, false, false, true};
else
return {false, false, false, false};
@ -1826,13 +1833,17 @@ public:
// variable +|- constant
else if (right.column && isColumnConst(*right.column))
{
auto left_type = removeNullable(removeLowCardinality(left.type));
auto right_type = removeNullable(removeLowCardinality(right.type));
auto ret_type = removeNullable(removeLowCardinality(return_type));
auto transform = [&](const Field & point)
{
ColumnsWithTypeAndName columns_with_constant
= {{left.type->createColumnConst(1, point), left.type, left.name},
{right.column->cloneResized(1), right.type, right.name}};
= {{left_type->createColumnConst(1, point), left_type, left.name},
{right_type->createColumnConst(1, (*right.column)[0]), right_type, right.name}};
auto col = Base::executeImpl(columns_with_constant, return_type, 1);
auto col = Base::executeImpl(columns_with_constant, ret_type, 1);
Field point_transformed;
col->get(0, point_transformed);
return point_transformed;

View File

@ -2173,6 +2173,10 @@ struct ToNumberMonotonicity
const size_t size_of_from = type.getSizeOfValueInMemory();
const size_t size_of_to = sizeof(T);
/// Do not support 128 bit integers and decimals for now.
if (size_of_from > sizeof(Int64))
return {};
const bool left_in_first_half = left.isNull()
? from_is_unsigned
: (left.get<Int64>() >= 0);

View File

@ -43,10 +43,34 @@ using FunctionCutToFirstSignificantSubdomainWithWWWRFC = FunctionStringToString<
REGISTER_FUNCTION(CutToFirstSignificantSubdomain)
{
factory.registerFunction<FunctionCutToFirstSignificantSubdomain>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomainWithWWW>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomainRFC>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomainWithWWWRFC>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomain>(
{
R"(Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain" (see documentation of the `firstSignificantSubdomain`).)",
Documentation::Examples{
{"cutToFirstSignificantSubdomain1", "SELECT cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/')"},
{"cutToFirstSignificantSubdomain2", "SELECT cutToFirstSignificantSubdomain('www.tr')"},
{"cutToFirstSignificantSubdomain3", "SELECT cutToFirstSignificantSubdomain('tr')"},
},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionCutToFirstSignificantSubdomainWithWWW>(
{
R"(Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain", without stripping "www".)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionCutToFirstSignificantSubdomainRFC>(
{
R"(Similar to `cutToFirstSignificantSubdomain` but follows stricter rules to be compatible with RFC 3986 and less performant.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionCutToFirstSignificantSubdomainWithWWWRFC>(
{
R"(Similar to `cutToFirstSignificantSubdomainWithWWW` but follows stricter rules to be compatible with RFC 3986 and less performant.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
}
}

View File

@ -42,10 +42,41 @@ using FunctionCutToFirstSignificantSubdomainCustomWithWWWRFC = FunctionCutToFirs
REGISTER_FUNCTION(CutToFirstSignificantSubdomainCustom)
{
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustom>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustomWithWWW>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustomRFC>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustomWithWWWRFC>();
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustom>(
{
R"(
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain. Accepts custom TLD list name.
Can be useful if you need fresh TLD list or you have custom.
)",
Documentation::Examples{
{"cutToFirstSignificantSubdomainCustom", "SELECT cutToFirstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');"},
},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustomWithWWW>(
{
R"(
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
Accepts custom TLD list name from config.
Can be useful if you need fresh TLD list or you have custom.
)",
Documentation::Examples{{"cutToFirstSignificantSubdomainCustomWithWWW", "SELECT cutToFirstSignificantSubdomainCustomWithWWW('www.foo', 'public_suffix_list')"}},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustomRFC>(
{
R"(Similar to `cutToFirstSignificantSubdomainCustom` but follows stricter rules according to RFC 3986.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionCutToFirstSignificantSubdomainCustomWithWWWRFC>(
{
R"(Similar to `cutToFirstSignificantSubdomainCustomWithWWW` but follows stricter rules according to RFC 3986.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
}
}

View File

@ -14,8 +14,24 @@ using FunctionDomainRFC = FunctionStringToString<ExtractSubstringImpl<ExtractDom
REGISTER_FUNCTION(Domain)
{
factory.registerFunction<FunctionDomain>();
factory.registerFunction<FunctionDomainRFC>();
factory.registerFunction<FunctionDomain>(
{
R"(
Extracts the hostname from a URL.
The URL can be specified with or without a scheme.
If the argument can't be parsed as URL, the function returns an empty string.
)",
Documentation::Examples{{"domain", "SELECT domain('svn+ssh://some.svn-hosting.com:80/repo/trunk')"}},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionDomainRFC>(
{
R"(Similar to `domain` but follows stricter rules to be compatible with RFC 3986 and less performant.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
}
}

View File

@ -14,8 +14,23 @@ using FunctionDomainWithoutWWWRFC = FunctionStringToString<ExtractSubstringImpl<
REGISTER_FUNCTION(DomainWithoutWWW)
{
factory.registerFunction<FunctionDomainWithoutWWW>();
factory.registerFunction<FunctionDomainWithoutWWWRFC>();
factory.registerFunction<FunctionDomainWithoutWWW>(
{
R"(
Extracts the hostname from a URL, removing the leading "www." if present.
The URL can be specified with or without a scheme.
If the argument can't be parsed as URL, the function returns an empty string.
)",
Documentation::Examples{{"domainWithoutWWW", "SELECT domainWithoutWWW('https://www.clickhouse.com')"}},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionDomainWithoutWWWRFC>(
{
R"(Similar to `domainWithoutWWW` but follows stricter rules to be compatible with RFC 3986 and less performant.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
}
}

View File

@ -14,8 +14,28 @@ using FunctionFirstSignificantSubdomainRFC = FunctionStringToString<ExtractSubst
REGISTER_FUNCTION(FirstSignificantSubdomain)
{
factory.registerFunction<FunctionFirstSignificantSubdomain>();
factory.registerFunction<FunctionFirstSignificantSubdomainRFC>();
factory.registerFunction<FunctionFirstSignificantSubdomain>(
{
R"(
Returns the "first significant subdomain".
The first significant subdomain is a second-level domain if it is 'com', 'net', 'org', or 'co'.
Otherwise, it is a third-level domain.
For example, firstSignificantSubdomain('https://news.clickhouse.com/') = 'clickhouse', firstSignificantSubdomain ('https://news.clickhouse.com.tr/') = 'clickhouse'.
The list of "insignificant" second-level domains and other implementation details may change in the future.
)",
Documentation::Examples{{"firstSignificantSubdomain", "SELECT firstSignificantSubdomain('https://news.clickhouse.com/')"}},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionFirstSignificantSubdomainRFC>(
{
R"(Returns the "first significant subdomain" according to RFC 1034.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
}
}

View File

@ -139,8 +139,18 @@ struct FunctionPortRFC : public FunctionPortImpl<true>
REGISTER_FUNCTION(Port)
{
factory.registerFunction<FunctionPort>();
factory.registerFunction<FunctionPortRFC>();
factory.registerFunction<FunctionPort>(
{
R"(Returns the port or `default_port` if there is no port in the URL (or in case of validation error).)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionPortRFC>(
{
R"(Similar to `port`, but conforms to RFC 3986.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
}
}

View File

@ -53,8 +53,23 @@ using FunctionTopLevelDomainRFC = FunctionStringToString<ExtractSubstringImpl<Ex
REGISTER_FUNCTION(TopLevelDomain)
{
factory.registerFunction<FunctionTopLevelDomain>();
factory.registerFunction<FunctionTopLevelDomainRFC>();
factory.registerFunction<FunctionTopLevelDomain>(
{
R"(
Extracts the the top-level domain from a URL.
Returns an empty string if the argument cannot be parsed as a URL or does not contain a top-level domain.
)",
Documentation::Examples{{"topLevelDomain", "SELECT topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk')"}},
Documentation::Categories{"URL"}
});
factory.registerFunction<FunctionTopLevelDomainRFC>(
{
R"(Similar to topLevelDomain, but conforms to RFC 3986.)",
Documentation::Examples{},
Documentation::Categories{"URL"}
});
}
}

View File

@ -2146,6 +2146,8 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc
auto [limit_length, limit_offset] = getLimitLengthAndOffset(query, context);
auto local_limits = getStorageLimits(*context, options);
/** Optimization - if not specified DISTINCT, WHERE, GROUP, HAVING, ORDER, JOIN, LIMIT BY, WITH TIES
* but LIMIT is specified, and limit + offset < max_block_size,
* then as the block size we will use limit + offset (not to read more from the table than requested),
@ -2164,17 +2166,22 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc
&& !query_analyzer->hasAggregation()
&& !query_analyzer->hasWindow()
&& query.limitLength()
&& limit_length <= std::numeric_limits<UInt64>::max() - limit_offset
&& limit_length + limit_offset < max_block_size)
&& limit_length <= std::numeric_limits<UInt64>::max() - limit_offset)
{
max_block_size = std::max<UInt64>(1, limit_length + limit_offset);
max_threads_execute_query = max_streams = 1;
if (limit_length + limit_offset < max_block_size)
{
max_block_size = std::max<UInt64>(1, limit_length + limit_offset);
max_threads_execute_query = max_streams = 1;
}
if (limit_length + limit_offset < local_limits.local_limits.size_limits.max_rows)
{
query_info.limit = limit_length + limit_offset;
}
}
if (!max_block_size)
throw Exception("Setting 'max_block_size' cannot be zero", ErrorCodes::PARAMETER_OUT_OF_BOUND);
auto local_limits = getStorageLimits(*context, options);
storage_limits.emplace_back(local_limits);
/// Initialize the initial data streams to which the query transforms are superimposed. Table or subquery or prepared input?

View File

@ -131,6 +131,7 @@ void Set::setHeader(const ColumnsWithTypeAndName & header)
if (const auto * low_cardinality_type = typeid_cast<const DataTypeLowCardinality *>(data_types.back().get()))
{
data_types.back() = low_cardinality_type->getDictionaryType();
set_elements_types.back() = low_cardinality_type->getDictionaryType();
materialized_columns.emplace_back(key_columns.back()->convertToFullColumnIfLowCardinality());
key_columns.back() = materialized_columns.back().get();
}

View File

@ -1,7 +1,5 @@
#include <Processors/Merges/Algorithms/AggregatingSortedAlgorithm.h>
#include <Columns/ColumnAggregateFunction.h>
#include <Common/AlignedBuffer.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include <DataTypes/DataTypeCustomSimpleAggregateFunction.h>
#include <DataTypes/DataTypeLowCardinality.h>
@ -18,70 +16,6 @@ AggregatingSortedAlgorithm::ColumnsDefinition::ColumnsDefinition() = default;
AggregatingSortedAlgorithm::ColumnsDefinition::ColumnsDefinition(ColumnsDefinition &&) noexcept = default;
AggregatingSortedAlgorithm::ColumnsDefinition::~ColumnsDefinition() = default;
/// Stores information for aggregation of AggregateFunction columns
struct AggregatingSortedAlgorithm::AggregateDescription
{
ColumnAggregateFunction * column = nullptr;
const size_t column_number = 0; /// Position in header.
AggregateDescription() = default;
explicit AggregateDescription(size_t col_number) : column_number(col_number) {}
};
/// Stores information for aggregation of SimpleAggregateFunction columns
struct AggregatingSortedAlgorithm::SimpleAggregateDescription
{
/// An aggregate function 'anyLast', 'sum'...
AggregateFunctionPtr function;
IAggregateFunction::AddFunc add_function = nullptr;
size_t column_number = 0;
IColumn * column = nullptr;
/// For LowCardinality, convert is converted to nested type. nested_type is nullptr if no conversion needed.
const DataTypePtr nested_type; /// Nested type for LowCardinality, if it is.
const DataTypePtr real_type; /// Type in header.
AlignedBuffer state;
bool created = false;
SimpleAggregateDescription(
AggregateFunctionPtr function_, const size_t column_number_,
DataTypePtr nested_type_, DataTypePtr real_type_)
: function(std::move(function_)), column_number(column_number_)
, nested_type(std::move(nested_type_)), real_type(std::move(real_type_))
{
add_function = function->getAddressOfAddFunction();
state.reset(function->sizeOfData(), function->alignOfData());
}
void createState()
{
if (created)
return;
function->create(state.data());
created = true;
}
void destroyState()
{
if (!created)
return;
function->destroy(state.data());
created = false;
}
/// Explicitly destroy aggregation state if the stream is terminated
~SimpleAggregateDescription()
{
destroyState();
}
SimpleAggregateDescription() = default;
SimpleAggregateDescription(SimpleAggregateDescription &&) = default;
SimpleAggregateDescription(const SimpleAggregateDescription &) = delete;
};
static AggregatingSortedAlgorithm::ColumnsDefinition defineColumns(
const Block & header, const SortDescription & description)
{
@ -191,6 +125,39 @@ static void postprocessChunk(Chunk & chunk, const AggregatingSortedAlgorithm::Co
}
AggregatingSortedAlgorithm::SimpleAggregateDescription::SimpleAggregateDescription(
AggregateFunctionPtr function_, const size_t column_number_,
DataTypePtr nested_type_, DataTypePtr real_type_)
: function(std::move(function_)), column_number(column_number_)
, nested_type(std::move(nested_type_)), real_type(std::move(real_type_))
{
add_function = function->getAddressOfAddFunction();
state.reset(function->sizeOfData(), function->alignOfData());
}
void AggregatingSortedAlgorithm::SimpleAggregateDescription::createState()
{
if (created)
return;
function->create(state.data());
created = true;
}
void AggregatingSortedAlgorithm::SimpleAggregateDescription::destroyState()
{
if (!created)
return;
function->destroy(state.data());
created = false;
}
/// Explicitly destroy aggregation state if the stream is terminated
AggregatingSortedAlgorithm::SimpleAggregateDescription::~SimpleAggregateDescription()
{
destroyState();
}
AggregatingSortedAlgorithm::AggregatingMergedData::AggregatingMergedData(
MutableColumns columns_, UInt64 max_block_size_, ColumnsDefinition & def_)
: MergedData(std::move(columns_), false, max_block_size_), def(def_)

View File

@ -1,5 +1,7 @@
#pragma once
#include <Columns/ColumnAggregateFunction.h>
#include <Common/AlignedBuffer.h>
#include <Processors/Merges/Algorithms/IMergingAlgorithmWithDelayedChunk.h>
#include <Processors/Merges/Algorithms/MergedData.h>
@ -23,8 +25,48 @@ public:
void consume(Input & input, size_t source_num) override;
Status merge() override;
struct SimpleAggregateDescription;
struct AggregateDescription;
/// Stores information for aggregation of SimpleAggregateFunction columns
struct SimpleAggregateDescription
{
/// An aggregate function 'anyLast', 'sum'...
AggregateFunctionPtr function;
IAggregateFunction::AddFunc add_function = nullptr;
size_t column_number = 0;
IColumn * column = nullptr;
/// For LowCardinality, convert is converted to nested type. nested_type is nullptr if no conversion needed.
const DataTypePtr nested_type; /// Nested type for LowCardinality, if it is.
const DataTypePtr real_type; /// Type in header.
AlignedBuffer state;
bool created = false;
SimpleAggregateDescription(
AggregateFunctionPtr function_, const size_t column_number_,
DataTypePtr nested_type_, DataTypePtr real_type_);
void createState();
void destroyState();
/// Explicitly destroy aggregation state if the stream is terminated
~SimpleAggregateDescription();
SimpleAggregateDescription() = default;
SimpleAggregateDescription(SimpleAggregateDescription &&) = default;
SimpleAggregateDescription(const SimpleAggregateDescription &) = delete;
};
/// Stores information for aggregation of AggregateFunction columns
struct AggregateDescription
{
ColumnAggregateFunction * column = nullptr;
const size_t column_number = 0; /// Position in header.
AggregateDescription() = default;
explicit AggregateDescription(size_t col_number) : column_number(col_number) {}
};
/// This structure define columns into one of three types:
/// * columns which are not aggregate functions and not needed to be aggregated

View File

@ -23,10 +23,6 @@ namespace ErrorCodes
extern const int CORRUPTED_DATA;
}
SummingSortedAlgorithm::ColumnsDefinition::ColumnsDefinition() = default;
SummingSortedAlgorithm::ColumnsDefinition::ColumnsDefinition(ColumnsDefinition &&) noexcept = default;
SummingSortedAlgorithm::ColumnsDefinition::~ColumnsDefinition() = default;
/// Stores numbers of key-columns and value-columns.
struct SummingSortedAlgorithm::MapDescription
{
@ -777,4 +773,8 @@ IMergingAlgorithm::Status SummingSortedAlgorithm::merge()
return Status(merged_data.pull(), true);
}
SummingSortedAlgorithm::ColumnsDefinition::ColumnsDefinition() = default;
SummingSortedAlgorithm::ColumnsDefinition::ColumnsDefinition(ColumnsDefinition &&) noexcept = default;
SummingSortedAlgorithm::ColumnsDefinition::~ColumnsDefinition() = default;
}

View File

@ -173,6 +173,9 @@ Pipe ReadFromMergeTree::readFromPool(
total_rows += part.getRowsCount();
}
if (query_info.limit > 0 && query_info.limit < total_rows)
total_rows = query_info.limit;
const auto & settings = context->getSettingsRef();
const auto & client_info = context->getClientInfo();
MergeTreeReadPool::BackoffSettings backoff_settings(settings);
@ -246,10 +249,26 @@ ProcessorPtr ReadFromMergeTree::createSource(
};
}
return std::make_shared<TSource>(
auto total_rows = part.getRowsCount();
if (query_info.limit > 0 && query_info.limit < total_rows)
total_rows = query_info.limit;
/// Actually it means that parallel reading from replicas enabled
/// and we have to collaborate with initiator.
/// In this case we won't set approximate rows, because it will be accounted multiple times.
/// Also do not count amount of read rows if we read in order of sorting key,
/// because we don't know actual amount of read rows in case when limit is set.
bool set_rows_approx = !extension.has_value() && !reader_settings.read_in_order;
auto source = std::make_shared<TSource>(
data, storage_snapshot, part.data_part, max_block_size, preferred_block_size_bytes,
preferred_max_column_in_block_size_bytes, required_columns, part.ranges, use_uncompressed_cache, prewhere_info,
actions_settings, reader_settings, virt_column_names, part.part_index_in_query, has_limit_below_one_block, std::move(extension));
if (set_rows_approx)
source -> addTotalRowsApprox(total_rows);
return source;
}
Pipe ReadFromMergeTree::readInOrder(

View File

@ -1424,6 +1424,7 @@ public:
ColumnsWithTypeAndName new_arguments;
new_arguments.reserve(arguments.size() + 1);
new_arguments.push_back(const_arg);
new_arguments.front().column = new_arguments.front().column->cloneResized(input_rows_count);
for (const auto & arg : arguments)
new_arguments.push_back(arg);
return func->prepare(new_arguments)->execute(new_arguments, result_type, input_rows_count, dry_run);
@ -1432,6 +1433,7 @@ public:
{
auto new_arguments = arguments;
new_arguments.push_back(const_arg);
new_arguments.back().column = new_arguments.back().column->cloneResized(input_rows_count);
return func->prepare(new_arguments)->execute(new_arguments, result_type, input_rows_count, dry_run);
}
else

View File

@ -607,7 +607,7 @@ Block MergeTreeBaseSelectProcessor::transformHeader(
if (!row_level_column.type->canBeUsedInBooleanContext())
{
throw Exception("Invalid type for filter in PREWHERE: " + row_level_column.type->getName(),
ErrorCodes::LOGICAL_ERROR);
ErrorCodes::ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER);
}
block.erase(prewhere_info->row_level_column_name);
@ -620,7 +620,7 @@ Block MergeTreeBaseSelectProcessor::transformHeader(
if (!prewhere_column.type->canBeUsedInBooleanContext())
{
throw Exception("Invalid type for filter in PREWHERE: " + prewhere_column.type->getName(),
ErrorCodes::LOGICAL_ERROR);
ErrorCodes::ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER);
}
if (prewhere_info->remove_prewhere_column)
@ -628,13 +628,13 @@ Block MergeTreeBaseSelectProcessor::transformHeader(
else
{
WhichDataType which(removeNullable(recursiveRemoveLowCardinality(prewhere_column.type)));
if (which.isInt() || which.isUInt())
if (which.isNativeInt() || which.isNativeUInt())
prewhere_column.column = prewhere_column.type->createColumnConst(block.rows(), 1u)->convertToFullColumnIfConst();
else if (which.isFloat())
prewhere_column.column = prewhere_column.type->createColumnConst(block.rows(), 1.0f)->convertToFullColumnIfConst();
else
throw Exception("Illegal type " + prewhere_column.type->getName() + " of column for filter.",
ErrorCodes::ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER);
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER, "Illegal type {} of column for filter", prewhere_column.type->getName());
}
}

View File

@ -1078,6 +1078,10 @@ RangesInDataParts MergeTreeDataSelectExecutor::filterPartsByPrimaryKeyAndSkipInd
auto current_rows_estimate = ranges.getRowsCount();
size_t prev_total_rows_estimate = total_rows.fetch_add(current_rows_estimate);
size_t total_rows_estimate = current_rows_estimate + prev_total_rows_estimate;
if (query_info.limit > 0 && total_rows_estimate > query_info.limit)
{
total_rows_estimate = query_info.limit;
}
limits.check(total_rows_estimate, 0, "rows (controlled by 'max_rows_to_read' setting)", ErrorCodes::TOO_MANY_ROWS);
leaf_limits.check(
total_rows_estimate, 0, "rows (controlled by 'max_rows_to_read_leaf' setting)", ErrorCodes::TOO_MANY_ROWS);

View File

@ -38,14 +38,6 @@ MergeTreeSelectProcessor::MergeTreeSelectProcessor(
has_limit_below_one_block(has_limit_below_one_block_),
total_rows(data_part->index_granularity.getRowsCountInRanges(all_mark_ranges))
{
/// Actually it means that parallel reading from replicas enabled
/// and we have to collaborate with initiator.
/// In this case we won't set approximate rows, because it will be accounted multiple times.
/// Also do not count amount of read rows if we read in order of sorting key,
/// because we don't know actual amount of read rows in case when limit is set.
if (!extension_.has_value() && !reader_settings.read_in_order)
addTotalRowsApprox(total_rows);
ordered_names = header_without_virtual_columns.getNames();
}

View File

@ -232,6 +232,9 @@ struct SelectQueryInfo
Block minmax_count_projection_block;
MergeTreeDataSelectAnalysisResultPtr merge_tree_select_result_ptr;
// If limit is not 0, that means it's a trivial limit query.
UInt64 limit = 0;
InputOrderInfoPtr getInputOrderInfo() const
{
return input_order_info ? input_order_info : (projection ? projection->input_order_info : nullptr);

View File

@ -81,7 +81,8 @@ void listFilesWithRegexpMatchingImpl(
const std::string & path_for_ls,
const std::string & for_match,
size_t & total_bytes_to_read,
std::vector<std::string> & result)
std::vector<std::string> & result,
bool recursive = false)
{
const size_t first_glob = for_match.find_first_of("*?{");
@ -89,10 +90,17 @@ void listFilesWithRegexpMatchingImpl(
const std::string suffix_with_globs = for_match.substr(end_of_path_without_globs); /// begin with '/'
const size_t next_slash = suffix_with_globs.find('/', 1);
auto regexp = makeRegexpPatternFromGlobs(suffix_with_globs.substr(0, next_slash));
const std::string current_glob = suffix_with_globs.substr(0, next_slash);
auto regexp = makeRegexpPatternFromGlobs(current_glob);
re2::RE2 matcher(regexp);
bool skip_regex = current_glob == "/*" ? true : false;
if (!recursive)
recursive = current_glob == "/**" ;
const std::string prefix_without_globs = path_for_ls + for_match.substr(1, end_of_path_without_globs);
if (!fs::exists(prefix_without_globs))
return;
@ -107,15 +115,21 @@ void listFilesWithRegexpMatchingImpl(
/// Condition is_directory means what kind of path is it in current iteration of ls
if (!it->is_directory() && !looking_for_directory)
{
if (re2::RE2::FullMatch(file_name, matcher))
if (skip_regex || re2::RE2::FullMatch(file_name, matcher))
{
total_bytes_to_read += it->file_size();
result.push_back(it->path().string());
}
}
else if (it->is_directory() && looking_for_directory)
else if (it->is_directory())
{
if (re2::RE2::FullMatch(file_name, matcher))
if (recursive)
{
listFilesWithRegexpMatchingImpl(fs::path(full_path).append(it->path().string()) / "" ,
looking_for_directory ? suffix_with_globs.substr(next_slash) : current_glob ,
total_bytes_to_read, result, recursive);
}
else if (looking_for_directory && re2::RE2::FullMatch(file_name, matcher))
{
/// Recursion depth is limited by pattern. '*' works only for depth = 1, for depth = 2 pattern path is '*/*'. So we do not need additional check.
listFilesWithRegexpMatchingImpl(fs::path(full_path) / "", suffix_with_globs.substr(next_slash), total_bytes_to_read, result);

View File

@ -139,7 +139,9 @@ public:
request.SetBucket(globbed_uri.bucket);
request.SetPrefix(key_prefix);
matcher = std::make_unique<re2::RE2>(makeRegexpPatternFromGlobs(globbed_uri.key));
recursive = globbed_uri.key == "/**" ? true : false;
fillInternalBufferAssumeLocked();
}
@ -197,7 +199,7 @@ private:
for (const auto & row : result_batch)
{
const String & key = row.GetKey();
if (re2::RE2::FullMatch(key, *matcher))
if (recursive || re2::RE2::FullMatch(key, *matcher))
{
String path = fs::path(globbed_uri.bucket) / key;
if (object_infos)
@ -224,7 +226,7 @@ private:
for (const auto & row : result_batch)
{
String key = row.GetKey();
if (re2::RE2::FullMatch(key, *matcher))
if (recursive || re2::RE2::FullMatch(key, *matcher))
buffer.emplace_back(std::move(key));
}
}
@ -252,6 +254,7 @@ private:
Aws::S3::Model::ListObjectsV2Request request;
Aws::S3::Model::ListObjectsV2Outcome outcome;
std::unique_ptr<re2::RE2> matcher;
bool recursive{false};
bool is_finished{false};
std::unordered_map<String, S3::ObjectInfo> * object_infos;
Strings * read_keys;

View File

@ -12,6 +12,7 @@ const char * auto_contributors[] {
"821008736@qq.com",
"ANDREI STAROVEROV",
"Aaron Katz",
"Adam Rutkowski",
"Adri Fernandez",
"Ahmed Dardery",
"Aimiyoo",
@ -76,11 +77,15 @@ const char * auto_contributors[] {
"Alexey Elymanov",
"Alexey Gusev",
"Alexey Ilyukhov",
"Alexey Ivanov",
"Alexey Milovidov",
"Alexey Tronov",
"Alexey Vasiliev",
"Alexey Zatelepin",
"Alexsey Shestakov",
"AlfVII",
"Alfonso Martinez",
"Alfred Xu",
"Ali Demirci",
"Aliaksandr Pliutau",
"Aliaksandr Shylau",
@ -196,6 +201,7 @@ const char * auto_contributors[] {
"Brian Hunter",
"Bulat Gaifullin",
"Carbyn",
"Carlos Rodríguez Hernández",
"Caspian",
"Chao Ma",
"Chao Wang",
@ -222,6 +228,7 @@ const char * auto_contributors[] {
"DIAOZHAFENG",
"Dale McDiarmid",
"Dale Mcdiarmid",
"Dalitso Banda",
"Dan Roscigno",
"DanRoscigno",
"Daniel Bershatsky",
@ -267,6 +274,7 @@ const char * auto_contributors[] {
"Dmitry S..ky / skype: dvska-at-skype",
"Dmitry Ukolov",
"Doge",
"Dom Del Nano",
"Dongdong Yang",
"DoomzD",
"Dr. Strange Looker",
@ -276,6 +284,7 @@ const char * auto_contributors[] {
"Egor Savin",
"Ekaterina",
"Eldar Zaitov",
"Elena",
"Elena Baskakova",
"Elghazal Ahmed",
"Elizaveta Mironyuk",
@ -342,6 +351,7 @@ const char * auto_contributors[] {
"Grigory Pervakov",
"GruffGemini",
"Guillaume Tassery",
"Guo Wangyang",
"Guo Wei (William)",
"Haavard Kvaalen",
"Habibullah Oladepo",
@ -349,6 +359,7 @@ const char * auto_contributors[] {
"Hakob Saghatelyan",
"Hamoon",
"Han Fei",
"Han Shukai",
"Harry Lee",
"Harry-Lee",
"HarryLeeIBM",
@ -404,6 +415,7 @@ const char * auto_contributors[] {
"Jack Song",
"JackyWoo",
"Jacob Hayes",
"Jacob Herrington",
"Jake Liu",
"Jakub Kuklis",
"James Maidment",
@ -419,6 +431,7 @@ const char * auto_contributors[] {
"Jiading Guo",
"Jiang Tao",
"Jianmei Zhang",
"Jiebin Sun",
"Jochen Schalanda",
"John",
"John Hummel",
@ -432,6 +445,7 @@ const char * auto_contributors[] {
"Julian Gilyadov",
"Julian Zhou",
"Julio Jimenez",
"Jus",
"Justin Hilliard",
"Kang Liu",
"Karl Pietrzak",
@ -652,6 +666,7 @@ const char * auto_contributors[] {
"OuO",
"PHO",
"Pablo Alegre",
"Pablo Marcos",
"Paramtamtam",
"Patrick Zippenfenig",
"Paul Loyd",
@ -681,6 +696,7 @@ const char * auto_contributors[] {
"Prashant Shahi",
"Pxl",
"Pysaoke",
"Quanfa Fu",
"Quid37",
"Rafael Acevedo",
"Rafael David Tinoco",
@ -693,6 +709,7 @@ const char * auto_contributors[] {
"RedClusive",
"RegulusZ",
"Reilee",
"Reinaldy Rafli",
"Reto Kromer",
"Ri",
"Rich Raposa",
@ -726,6 +743,7 @@ const char * auto_contributors[] {
"Sachin",
"Safronov Michail",
"SaltTan",
"Salvatore Mesoraca",
"Sami Kerola",
"Samuel Chou",
"San",
@ -927,6 +945,7 @@ const char * auto_contributors[] {
"ZhiYong Wang",
"Zhichang Yu",
"Zhichun Wu",
"Zhiguo Zhou",
"Zhipeng",
"Zijie Lu",
"Zoran Pandovski",
@ -950,6 +969,7 @@ const char * auto_contributors[] {
"alexander goryanets",
"alexander kozhikhov",
"alexey-milovidov",
"alexeyerm",
"alexeypavlenko",
"alfredlu",
"amesaru",
@ -1131,6 +1151,7 @@ const char * auto_contributors[] {
"jennyma",
"jetgm",
"jewisliu",
"jferroal",
"jiahui-97",
"jianmei zhang",
"jinjunzh",
@ -1236,6 +1257,7 @@ const char * auto_contributors[] {
"mo-avatar",
"morty",
"moscas",
"mosinnik",
"mreddy017",
"msaf1980",
"msirm",
@ -1321,6 +1343,7 @@ const char * auto_contributors[] {
"simon-says",
"snyk-bot",
"songenjie",
"sperlingxx",
"spff",
"spongedc",
"spume",
@ -1422,6 +1445,7 @@ const char * auto_contributors[] {
"zhongyuankai",
"zhoubintao",
"zhukai",
"zimv",
"zkun",
"zlx19950903",
"zombee0",

View File

@ -2,6 +2,7 @@
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <Storages/System/StorageSystemNumbers.h>
#include <Storages/SelectQueryInfo.h>
#include <Processors/ISource.h>
#include <QueryPipeline/Pipe.h>
@ -125,7 +126,7 @@ StorageSystemNumbers::StorageSystemNumbers(const StorageID & table_id, bool mult
Pipe StorageSystemNumbers::read(
const Names & column_names,
const StorageSnapshotPtr & storage_snapshot,
SelectQueryInfo &,
SelectQueryInfo & query_info,
ContextPtr /*context*/,
QueryProcessingStage::Enum /*processed_stage*/,
size_t max_block_size,
@ -154,7 +155,12 @@ Pipe StorageSystemNumbers::read(
auto source = std::make_shared<NumbersMultiThreadedSource>(state, max_block_size, max_counter);
if (i == 0)
source->addTotalRowsApprox(*limit);
{
auto rows_appr = *limit;
if (query_info.limit > 0 && query_info.limit < rows_appr)
rows_appr = query_info.limit;
source->addTotalRowsApprox(rows_appr);
}
pipe.addSource(std::move(source));
}
@ -167,7 +173,12 @@ Pipe StorageSystemNumbers::read(
auto source = std::make_shared<NumbersSource>(max_block_size, offset + i * max_block_size, num_streams * max_block_size);
if (limit && i == 0)
source->addTotalRowsApprox(*limit);
{
auto rows_appr = *limit;
if (query_info.limit > 0 && query_info.limit < rows_appr)
rows_appr = query_info.limit;
source->addTotalRowsApprox(rows_appr);
}
pipe.addSource(std::move(source));
}

View File

@ -344,7 +344,7 @@ def main():
update_contributors()
return
version = get_version_from_repo(args.version_path)
version = get_version_from_repo(args.version_path, Git(True))
if args.update:
version = version.update(args.update)

View File

@ -13,10 +13,14 @@
<values>
<value>protocol</value>
<value>domain</value>
<value>domainRFC</value>
<value>domainWithoutWWW</value>
<value>domainWithoutWWWRFC</value>
<value>topLevelDomain</value>
<value>firstSignificantSubdomain</value>
<value>firstSignificantSubdomainRFC</value>
<value>cutToFirstSignificantSubdomain</value>
<value>cutToFirstSignificantSubdomainRFC</value>
<value>path</value>
<value>pathFull</value>
<value>queryString</value>

View File

@ -124,8 +124,25 @@ example.com
example.com
com
example.com
example.com
example.com
example.com
example.com
example.com
example.com
example.com
example.com
com
====CUT TO FIRST SIGNIFICANT SUBDOMAIN WITH WWW====
www.com
example.com
example.com
example.com
example.com
www.com
example.com
example.com

View File

@ -7,42 +7,28 @@ SELECT protocol('http://127.0.0.1:443/') AS Scheme;
SELECT protocol('//127.0.0.1:443/') AS Scheme;
SELECT '====HOST====';
SELECT domain('http://paul@www.example.com:80/') AS Host;
SELECT domain('user:password@example.com:8080') AS Host;
SELECT domain('http://user:password@example.com:8080') AS Host;
SELECT domain('http://user:password@example.com:8080/path?query=value#fragment') AS Host;
SELECT domain('newuser:@example.com') AS Host;
SELECT domain('http://:pass@example.com') AS Host;
SELECT domain(':newpass@example.com') AS Host;
SELECT domain('http://user:pass@example@.com') AS Host;
SELECT domain('http://user:pass:example.com') AS Host;
SELECT domain('http:/paul/example/com') AS Host;
SELECT domain('http://www.example.com?q=4') AS Host;
SELECT domain('http://127.0.0.1:443/') AS Host;
SELECT domain('//www.example.com') AS Host;
SELECT domain('//paul@www.example.com') AS Host;
SELECT domain('www.example.com') as Host;
SELECT domain('example.com') as Host;
SELECT domainWithoutWWW('//paul@www.example.com') AS Host;
SELECT domainWithoutWWW('http://paul@www.example.com:80/') AS Host;
SELECT domainRFC('http://paul@www.example.com:80/') AS Host;
SELECT domainRFC('user:password@example.com:8080') AS Host;
SELECT domainRFC('http://user:password@example.com:8080') AS Host;
SELECT domainRFC('http://user:password@example.com:8080/path?query=value#fragment') AS Host;
SELECT domainRFC('newuser:@example.com') AS Host;
SELECT domainRFC('http://:pass@example.com') AS Host;
SELECT domainRFC(':newpass@example.com') AS Host;
SELECT domainRFC('http://user:pass@example@.com') AS Host;
SELECT domainRFC('http://user:pass:example.com') AS Host;
SELECT domainRFC('http:/paul/example/com') AS Host;
SELECT domainRFC('http://www.example.com?q=4') AS Host;
SELECT domainRFC('http://127.0.0.1:443/') AS Host;
SELECT domainRFC('//www.example.com') AS Host;
SELECT domainRFC('//paul@www.example.com') AS Host;
SELECT domainRFC('www.example.com') as Host;
SELECT domainRFC('example.com') as Host;
SELECT domainWithoutWWWRFC('//paul@www.example.com') AS Host;
SELECT domainWithoutWWWRFC('http://paul@www.example.com:80/') AS Host;
{% for suffix in ['', 'RFC'] -%}
SELECT domain{{ suffix }}('http://paul@www.example.com:80/') AS Host;
SELECT domain{{ suffix }}('user:password@example.com:8080') AS Host;
SELECT domain{{ suffix }}('http://user:password@example.com:8080') AS Host;
SELECT domain{{ suffix }}('http://user:password@example.com:8080/path?query=value#fragment') AS Host;
SELECT domain{{ suffix }}('newuser:@example.com') AS Host;
SELECT domain{{ suffix }}('http://:pass@example.com') AS Host;
SELECT domain{{ suffix }}(':newpass@example.com') AS Host;
SELECT domain{{ suffix }}('http://user:pass@example@.com') AS Host;
SELECT domain{{ suffix }}('http://user:pass:example.com') AS Host;
SELECT domain{{ suffix }}('http:/paul/example/com') AS Host;
SELECT domain{{ suffix }}('http://www.example.com?q=4') AS Host;
SELECT domain{{ suffix }}('http://127.0.0.1:443/') AS Host;
SELECT domain{{ suffix }}('//www.example.com') AS Host;
SELECT domain{{ suffix }}('//paul@www.example.com') AS Host;
SELECT domain{{ suffix }}('www.example.com') as Host;
SELECT domain{{ suffix }}('example.com') as Host;
SELECT domainWithoutWWW{{ suffix }}('//paul@www.example.com') AS Host;
SELECT domainWithoutWWW{{ suffix }}('http://paul@www.example.com:80/') AS Host;
{% endfor %}
SELECT '====NETLOC====';
SELECT netloc('http://paul@www.example.com:80/') AS Netloc;
@ -121,25 +107,31 @@ SELECT decodeURLComponent(encodeURLComponent('http://paul@127.0.0.1/?query=hello
SELECT decodeURLFormComponent(encodeURLFormComponent('http://paul@127.0.0.1/?query=hello world foo+bar#a=b'));
SELECT '====CUT TO FIRST SIGNIFICANT SUBDOMAIN====';
SELECT cutToFirstSignificantSubdomain('http://www.example.com');
SELECT cutToFirstSignificantSubdomain('http://www.example.com:1234');
SELECT cutToFirstSignificantSubdomain('http://www.example.com/a/b/c');
SELECT cutToFirstSignificantSubdomain('http://www.example.com/a/b/c?a=b');
SELECT cutToFirstSignificantSubdomain('http://www.example.com/a/b/c?a=b#d=f');
SELECT cutToFirstSignificantSubdomain('http://paul@www.example.com/a/b/c?a=b#d=f');
SELECT cutToFirstSignificantSubdomain('//paul@www.example.com/a/b/c?a=b#d=f');
SELECT cutToFirstSignificantSubdomain('www.example.com');
SELECT cutToFirstSignificantSubdomain('example.com');
SELECT cutToFirstSignificantSubdomain('www.com');
SELECT cutToFirstSignificantSubdomain('com');
{% for suffix in ['', 'RFC'] -%}
SELECT cutToFirstSignificantSubdomain{{ suffix }}('http://www.example.com');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('http://www.example.com:1234');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('http://www.example.com/a/b/c');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('http://www.example.com/a/b/c?a=b');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('http://www.example.com/a/b/c?a=b#d=f');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('http://paul@www.example.com/a/b/c?a=b#d=f');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('//paul@www.example.com/a/b/c?a=b#d=f');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('www.example.com');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('example.com');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('www.com');
SELECT cutToFirstSignificantSubdomain{{ suffix }}('com');
{% endfor %}
SELECT '====CUT TO FIRST SIGNIFICANT SUBDOMAIN WITH WWW====';
SELECT cutToFirstSignificantSubdomainWithWWW('http://com');
SELECT cutToFirstSignificantSubdomainWithWWW('http://www.com');
SELECT cutToFirstSignificantSubdomainWithWWW('http://www.example.com');
SELECT cutToFirstSignificantSubdomainWithWWW('http://www.foo.example.com');
SELECT cutToFirstSignificantSubdomainWithWWW('http://www.example.com:1');
SELECT cutToFirstSignificantSubdomainWithWWW('http://www.example.com/');
{% for suffix in ['', 'RFC'] -%}
SELECT cutToFirstSignificantSubdomainWithWWW{{ suffix }}('http://com');
SELECT cutToFirstSignificantSubdomainWithWWW{{ suffix }}('http://www.com');
SELECT cutToFirstSignificantSubdomainWithWWW{{ suffix }}('http://www.example.com');
SELECT cutToFirstSignificantSubdomainWithWWW{{ suffix }}('http://www.foo.example.com');
SELECT cutToFirstSignificantSubdomainWithWWW{{ suffix }}('http://www.example.com:1');
SELECT cutToFirstSignificantSubdomainWithWWW{{ suffix }}('http://www.example.com/');
{% endfor %}
SELECT '====CUT WWW====';
SELECT cutWWW('http://www.example.com');

View File

@ -22,3 +22,27 @@ ipv6
0
host-no-dot
0
ipv4
0
80
80
80
80
hostname
0
80
80
80
80
default-port
80
80
ipv6
0
0
0
0
0
0
host-no-dot
0

View File

@ -1,34 +0,0 @@
select 'ipv4';
select port('http://127.0.0.1/');
select port('http://127.0.0.1:80');
select port('http://127.0.0.1:80/');
select port('//127.0.0.1:80/');
select port('127.0.0.1:80');
select 'hostname';
select port('http://foobar.com/');
select port('http://foobar.com:80');
select port('http://foobar.com:80/');
select port('//foobar.com:80/');
select port('foobar.com:80');
select 'default-port';
select port('http://127.0.0.1/', toUInt16(80));
select port('http://foobar.com/', toUInt16(80));
-- unsupported
/* ILLEGAL_TYPE_OF_ARGUMENT */ select port(toFixedString('', 1)); -- { serverError 43; }
/* ILLEGAL_TYPE_OF_ARGUMENT */ select port('', 1); -- { serverError 43; }
/* NUMBER_OF_ARGUMENTS_DOESNT_MATCH */ select port('', 1, 1); -- { serverError 42; }
--
-- Known limitations of domain() (getURLHost())
--
select 'ipv6';
select port('http://[2001:db8::8a2e:370:7334]/');
select port('http://[2001:db8::8a2e:370:7334]:80');
select port('http://[2001:db8::8a2e:370:7334]:80/');
select port('//[2001:db8::8a2e:370:7334]:80/');
select port('[2001:db8::8a2e:370:7334]:80');
select port('2001:db8::8a2e:370:7334:80');
select 'host-no-dot';
select port('//foobar:80/');

View File

@ -0,0 +1,39 @@
{% for suffix in ['', 'RFC'] -%}
select 'ipv4';
select port{{ suffix }}('http://127.0.0.1/');
select port{{ suffix }}('http://127.0.0.1:80');
select port{{ suffix }}('http://127.0.0.1:80/');
select port{{ suffix }}('//127.0.0.1:80/');
select port{{ suffix }}('127.0.0.1:80');
select 'hostname';
select port{{ suffix }}('http://foobar.com/');
select port{{ suffix }}('http://foobar.com:80');
select port{{ suffix }}('http://foobar.com:80/');
select port{{ suffix }}('//foobar.com:80/');
select port{{ suffix }}('foobar.com:80');
select 'default-port';
select port{{ suffix }}('http://127.0.0.1/', toUInt16(80));
select port{{ suffix }}('http://foobar.com/', toUInt16(80));
-- unsupported
/* ILLEGAL_TYPE_OF_ARGUMENT */ select port(toFixedString('', 1)); -- { serverError 43; }
/* ILLEGAL_TYPE_OF_ARGUMENT */ select port{{ suffix }}('', 1); -- { serverError 43; }
/* NUMBER_OF_ARGUMENTS_DOESNT_MATCH */ select port{{ suffix }}('', 1, 1); -- { serverError 42; }
--
-- Known limitations of domain() (getURLHost())
--
select 'ipv6';
select port{{ suffix }}('http://[2001:db8::8a2e:370:7334]/');
select port{{ suffix }}('http://[2001:db8::8a2e:370:7334]:80');
select port{{ suffix }}('http://[2001:db8::8a2e:370:7334]:80/');
select port{{ suffix }}('//[2001:db8::8a2e:370:7334]:80/');
select port{{ suffix }}('[2001:db8::8a2e:370:7334]:80');
select port{{ suffix }}('2001:db8::8a2e:370:7334:80');
select 'host-no-dot';
select port{{ suffix }}('//foobar:80/');
{%- endfor %}

View File

@ -89,3 +89,92 @@ select cutToFirstSignificantSubdomainCustom('city.kawasaki.jp', 'public_suffix_l
city.kawasaki.jp
select cutToFirstSignificantSubdomainCustom('some.city.kawasaki.jp', 'public_suffix_list');
city.kawasaki.jp
select '-- no-tld';
-- no-tld
-- even if there is no TLD, 2-nd level by default anyway
-- FIXME: make this behavior optional (so that TLD for host never changed, either empty or something real)
select cutToFirstSignificantSubdomainRFC('there-is-no-such-domain');
select cutToFirstSignificantSubdomainRFC('foo.there-is-no-such-domain');
foo.there-is-no-such-domain
select cutToFirstSignificantSubdomainRFC('bar.foo.there-is-no-such-domain');
foo.there-is-no-such-domain
select cutToFirstSignificantSubdomainCustomRFC('there-is-no-such-domain', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustomRFC('foo.there-is-no-such-domain', 'public_suffix_list');
foo.there-is-no-such-domain
select cutToFirstSignificantSubdomainCustomRFC('bar.foo.there-is-no-such-domain', 'public_suffix_list');
foo.there-is-no-such-domain
select firstSignificantSubdomainCustomRFC('bar.foo.there-is-no-such-domain', 'public_suffix_list');
foo
select '-- generic';
-- generic
select firstSignificantSubdomainCustomRFC('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel
kernel
select cutToFirstSignificantSubdomainCustomRFC('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel.biz.ss
kernel.biz.ss
select '-- difference';
-- difference
-- biz.ss is not in the default TLD list, hence:
select cutToFirstSignificantSubdomainRFC('foo.kernel.biz.ss'); -- biz.ss
biz.ss
select cutToFirstSignificantSubdomainCustomRFC('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel.biz.ss
kernel.biz.ss
select '-- 3+level';
-- 3+level
select cutToFirstSignificantSubdomainCustomRFC('xx.blogspot.co.at', 'public_suffix_list'); -- xx.blogspot.co.at
xx.blogspot.co.at
select firstSignificantSubdomainCustomRFC('xx.blogspot.co.at', 'public_suffix_list'); -- blogspot
blogspot
select cutToFirstSignificantSubdomainCustomRFC('foo.bar.xx.blogspot.co.at', 'public_suffix_list'); -- xx.blogspot.co.at
xx.blogspot.co.at
select firstSignificantSubdomainCustomRFC('foo.bar.xx.blogspot.co.at', 'public_suffix_list'); -- blogspot
blogspot
select '-- url';
-- url
select cutToFirstSignificantSubdomainCustomRFC('http://foobar.com', 'public_suffix_list');
foobar.com
select cutToFirstSignificantSubdomainCustomRFC('http://foobar.com/foo', 'public_suffix_list');
foobar.com
select cutToFirstSignificantSubdomainCustomRFC('http://bar.foobar.com/foo', 'public_suffix_list');
foobar.com
select cutToFirstSignificantSubdomainCustomRFC('http://xx.blogspot.co.at', 'public_suffix_list');
xx.blogspot.co.at
select '-- www';
-- www
select cutToFirstSignificantSubdomainCustomWithWWWRFC('http://www.foo', 'public_suffix_list');
www.foo
select cutToFirstSignificantSubdomainCustomRFC('http://www.foo', 'public_suffix_list');
foo
select '-- vector';
-- vector
select cutToFirstSignificantSubdomainCustomRFC('http://xx.blogspot.co.at/' || toString(number), 'public_suffix_list') from numbers(1);
xx.blogspot.co.at
select cutToFirstSignificantSubdomainCustomRFC('there-is-no-such-domain' || toString(number), 'public_suffix_list') from numbers(1);
select '-- no new line';
-- no new line
select cutToFirstSignificantSubdomainCustomRFC('foo.bar', 'no_new_line_list');
foo.bar
select cutToFirstSignificantSubdomainCustomRFC('a.foo.bar', 'no_new_line_list');
a.foo.bar
select cutToFirstSignificantSubdomainCustomRFC('a.foo.baz', 'no_new_line_list');
foo.baz
select '-- asterisk';
-- asterisk
select cutToFirstSignificantSubdomainCustomRFC('foo.something.sheffield.sch.uk', 'public_suffix_list');
something.sheffield.sch.uk
select cutToFirstSignificantSubdomainCustomRFC('something.sheffield.sch.uk', 'public_suffix_list');
something.sheffield.sch.uk
select cutToFirstSignificantSubdomainCustomRFC('sheffield.sch.uk', 'public_suffix_list');
sheffield.sch.uk
select '-- exclamation mark';
-- exclamation mark
select cutToFirstSignificantSubdomainCustomRFC('foo.kawasaki.jp', 'public_suffix_list');
foo.kawasaki.jp
select cutToFirstSignificantSubdomainCustomRFC('foo.foo.kawasaki.jp', 'public_suffix_list');
foo.foo.kawasaki.jp
select cutToFirstSignificantSubdomainCustomRFC('city.kawasaki.jp', 'public_suffix_list');
city.kawasaki.jp
select cutToFirstSignificantSubdomainCustomRFC('some.city.kawasaki.jp', 'public_suffix_list');
city.kawasaki.jp

View File

@ -1,57 +0,0 @@
-- { echo }
select '-- no-tld';
-- even if there is no TLD, 2-nd level by default anyway
-- FIXME: make this behavior optional (so that TLD for host never changed, either empty or something real)
select cutToFirstSignificantSubdomain('there-is-no-such-domain');
select cutToFirstSignificantSubdomain('foo.there-is-no-such-domain');
select cutToFirstSignificantSubdomain('bar.foo.there-is-no-such-domain');
select cutToFirstSignificantSubdomainCustom('there-is-no-such-domain', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('foo.there-is-no-such-domain', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');
select firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');
select '-- generic';
select firstSignificantSubdomainCustom('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel
select cutToFirstSignificantSubdomainCustom('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel.biz.ss
select '-- difference';
-- biz.ss is not in the default TLD list, hence:
select cutToFirstSignificantSubdomain('foo.kernel.biz.ss'); -- biz.ss
select cutToFirstSignificantSubdomainCustom('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel.biz.ss
select '-- 3+level';
select cutToFirstSignificantSubdomainCustom('xx.blogspot.co.at', 'public_suffix_list'); -- xx.blogspot.co.at
select firstSignificantSubdomainCustom('xx.blogspot.co.at', 'public_suffix_list'); -- blogspot
select cutToFirstSignificantSubdomainCustom('foo.bar.xx.blogspot.co.at', 'public_suffix_list'); -- xx.blogspot.co.at
select firstSignificantSubdomainCustom('foo.bar.xx.blogspot.co.at', 'public_suffix_list'); -- blogspot
select '-- url';
select cutToFirstSignificantSubdomainCustom('http://foobar.com', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('http://foobar.com/foo', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('http://bar.foobar.com/foo', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('http://xx.blogspot.co.at', 'public_suffix_list');
select '-- www';
select cutToFirstSignificantSubdomainCustomWithWWW('http://www.foo', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('http://www.foo', 'public_suffix_list');
select '-- vector';
select cutToFirstSignificantSubdomainCustom('http://xx.blogspot.co.at/' || toString(number), 'public_suffix_list') from numbers(1);
select cutToFirstSignificantSubdomainCustom('there-is-no-such-domain' || toString(number), 'public_suffix_list') from numbers(1);
select '-- no new line';
select cutToFirstSignificantSubdomainCustom('foo.bar', 'no_new_line_list');
select cutToFirstSignificantSubdomainCustom('a.foo.bar', 'no_new_line_list');
select cutToFirstSignificantSubdomainCustom('a.foo.baz', 'no_new_line_list');
select '-- asterisk';
select cutToFirstSignificantSubdomainCustom('foo.something.sheffield.sch.uk', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('something.sheffield.sch.uk', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('sheffield.sch.uk', 'public_suffix_list');
select '-- exclamation mark';
select cutToFirstSignificantSubdomainCustom('foo.kawasaki.jp', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('foo.foo.kawasaki.jp', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('city.kawasaki.jp', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom('some.city.kawasaki.jp', 'public_suffix_list');

View File

@ -0,0 +1,61 @@
-- { echo }
{% for suffix in ['', 'RFC'] -%}
select '-- no-tld';
-- even if there is no TLD, 2-nd level by default anyway
-- FIXME: make this behavior optional (so that TLD for host never changed, either empty or something real)
select cutToFirstSignificantSubdomain{{ suffix }}('there-is-no-such-domain');
select cutToFirstSignificantSubdomain{{ suffix }}('foo.there-is-no-such-domain');
select cutToFirstSignificantSubdomain{{ suffix }}('bar.foo.there-is-no-such-domain');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('there-is-no-such-domain', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.there-is-no-such-domain', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('bar.foo.there-is-no-such-domain', 'public_suffix_list');
select firstSignificantSubdomainCustom{{ suffix }}('bar.foo.there-is-no-such-domain', 'public_suffix_list');
select '-- generic';
select firstSignificantSubdomainCustom{{ suffix }}('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel.biz.ss
select '-- difference';
-- biz.ss is not in the default TLD list, hence:
select cutToFirstSignificantSubdomain{{ suffix }}('foo.kernel.biz.ss'); -- biz.ss
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.kernel.biz.ss', 'public_suffix_list'); -- kernel.biz.ss
select '-- 3+level';
select cutToFirstSignificantSubdomainCustom{{ suffix }}('xx.blogspot.co.at', 'public_suffix_list'); -- xx.blogspot.co.at
select firstSignificantSubdomainCustom{{ suffix }}('xx.blogspot.co.at', 'public_suffix_list'); -- blogspot
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.bar.xx.blogspot.co.at', 'public_suffix_list'); -- xx.blogspot.co.at
select firstSignificantSubdomainCustom{{ suffix }}('foo.bar.xx.blogspot.co.at', 'public_suffix_list'); -- blogspot
select '-- url';
select cutToFirstSignificantSubdomainCustom{{ suffix }}('http://foobar.com', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('http://foobar.com/foo', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('http://bar.foobar.com/foo', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('http://xx.blogspot.co.at', 'public_suffix_list');
select '-- www';
select cutToFirstSignificantSubdomainCustomWithWWW{{ suffix }}('http://www.foo', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('http://www.foo', 'public_suffix_list');
select '-- vector';
select cutToFirstSignificantSubdomainCustom{{ suffix }}('http://xx.blogspot.co.at/' || toString(number), 'public_suffix_list') from numbers(1);
select cutToFirstSignificantSubdomainCustom{{ suffix }}('there-is-no-such-domain' || toString(number), 'public_suffix_list') from numbers(1);
select '-- no new line';
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.bar', 'no_new_line_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('a.foo.bar', 'no_new_line_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('a.foo.baz', 'no_new_line_list');
select '-- asterisk';
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.something.sheffield.sch.uk', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('something.sheffield.sch.uk', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('sheffield.sch.uk', 'public_suffix_list');
select '-- exclamation mark';
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.kawasaki.jp', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('foo.foo.kawasaki.jp', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('city.kawasaki.jp', 'public_suffix_list');
select cutToFirstSignificantSubdomainCustom{{ suffix }}('some.city.kawasaki.jp', 'public_suffix_list');
{% endfor %}

View File

@ -62,6 +62,11 @@ SHOW CREATE TABLE 01902_db.t_merge_1;
SELECT 'SELECT _database, _table, n FROM merge(currentDatabase(), ^t) ORDER BY _database, _table, n';
SELECT _database, _table, n FROM merge(currentDatabase(), '^t') ORDER BY _database, _table, n;
--fuzzed LOGICAL_ERROR
CREATE TABLE 01902_db.t4 (n Date) ENGINE=MergeTree ORDER BY n;
INSERT INTO 01902_db.t4 SELECT * FROM numbers(10);
SELECT NULL FROM 01902_db.t_merge WHERE n ORDER BY _table DESC; -- {serverError ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER}
DROP DATABASE 01902_db;
DROP DATABASE 01902_db1;
DROP DATABASE 01902_db2;

View File

@ -219,14 +219,6 @@ cutFragment
cutIPv6
cutQueryString
cutQueryStringAndFragment
cutToFirstSignificantSubdomain
cutToFirstSignificantSubdomainCustom
cutToFirstSignificantSubdomainCustomRFC
cutToFirstSignificantSubdomainCustomWithWWW
cutToFirstSignificantSubdomainCustomWithWWWRFC
cutToFirstSignificantSubdomainRFC
cutToFirstSignificantSubdomainWithWWW
cutToFirstSignificantSubdomainWithWWWRFC
cutURLParameter
cutWWW
dateDiff
@ -284,10 +276,6 @@ dictGetUUIDOrDefault
dictHas
dictIsIn
divide
domain
domainRFC
domainWithoutWWW
domainWithoutWWWRFC
dotProduct
dumpColumnStructure
e
@ -336,10 +324,8 @@ filesystemAvailable
filesystemCapacity
filesystemFree
finalizeAggregation
firstSignificantSubdomain
firstSignificantSubdomainCustom
firstSignificantSubdomainCustomRFC
firstSignificantSubdomainRFC
flattenTuple
floor
format
@ -600,8 +586,6 @@ polygonsUnionCartesian
polygonsUnionSpherical
polygonsWithinCartesian
polygonsWithinSpherical
port
portRFC
position
positionCaseInsensitive
positionCaseInsensitiveUTF8
@ -906,8 +890,6 @@ toYear
toYearWeek
today
tokens
topLevelDomain
topLevelDomainRFC
transactionID
transactionLatestSnapshot
transactionOldestSnapshot

View File

@ -0,0 +1,14 @@
1 1
2 2
1 1
2 2
3 3
4 4
5 5
6 6
3 3
4 4
3 3
4 4
5 5
6 6

View File

@ -0,0 +1,43 @@
#!/usr/bin/env bash
# Tags: no-fasttest, no-parallel
CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CUR_DIR"/../shell_config.sh
user_files_path=$(clickhouse-client --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}')
mkdir $user_files_path/d1
touch $user_files_path/d1/text1.txt
for i in {1..2}
do
echo $i$'\t'$i >> $user_files_path/d1/text1.txt
done
mkdir $user_files_path/d1/d2
touch $user_files_path/d1/d2/text2.txt
for i in {3..4}
do
echo $i$'\t'$i >> $user_files_path/d1/d2/text2.txt
done
mkdir $user_files_path/d1/d2/d3
touch $user_files_path/d1/d2/d3/text3.txt
for i in {5..6}
do
echo $i$'\t'$i >> $user_files_path/d1/d2/d3/text3.txt
done
${CLICKHOUSE_CLIENT} -q "SELECT * from file ('d1/*','TSV', 'Index UInt8, Number UInt8')" | sort --numeric-sort
${CLICKHOUSE_CLIENT} -q "SELECT * from file ('d1/**','TSV', 'Index UInt8, Number UInt8')" | sort --numeric-sort
${CLICKHOUSE_CLIENT} -q "SELECT * from file ('d1/*/tex*','TSV', 'Index UInt8, Number UInt8')" | sort --numeric-sort
${CLICKHOUSE_CLIENT} -q "SELECT * from file ('d1/**/tex*','TSV', 'Index UInt8, Number UInt8')" | sort --numeric-sort
rm $user_files_path/d1/d2/d3/text3.txt
rmdir $user_files_path/d1/d2/d3
rm $user_files_path/d1/d2/text2.txt
rmdir $user_files_path/d1/d2
rm $user_files_path/d1/text1.txt
rmdir $user_files_path/d1

View File

@ -0,0 +1,64 @@
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
2022-02-02 00:00:01
2022-02-02 00:00:02
2022-02-02 00:00:01
2022-02-02 00:00:02
2022-02-02 00:00:01
2022-02-02 00:00:02
2022-02-02 00:00:01
2022-02-02 00:00:02
2022-02-02 00:00:01
2022-02-02 00:00:02
2022-02-02 00:00:01
2022-02-02 00:00:02
2022-02-02 00:00:01
2022-02-02 00:00:02
2022-02-02 00:00:01
2022-02-02 00:00:02

View File

@ -0,0 +1,62 @@
create table tab (x Nullable(UInt8)) engine = MergeTree order by x settings allow_nullable_key = 1, index_granularity = 2;
insert into tab select number from numbers(4);
set allow_suspicious_low_cardinality_types=1;
set max_rows_to_read = 2;
SELECT x + 1 FROM tab where plus(x, 1) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::Nullable(UInt8)) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(UInt8)) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(Nullable(UInt8))) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1, x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::Nullable(UInt8), x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(UInt8), x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(Nullable(UInt8)), x) <= 2 order by x;
drop table tab;
set max_rows_to_read = 100;
create table tab (x LowCardinality(UInt8)) engine = MergeTree order by x settings allow_nullable_key = 1, index_granularity = 2;
insert into tab select number from numbers(4);
set max_rows_to_read = 2;
SELECT x + 1 FROM tab where plus(x, 1) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::Nullable(UInt8)) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(UInt8)) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(Nullable(UInt8))) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1, x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::Nullable(UInt8), x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(UInt8), x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(Nullable(UInt8)), x) <= 2 order by x;
drop table tab;
set max_rows_to_read = 100;
create table tab (x UInt128) engine = MergeTree order by x settings allow_nullable_key = 1, index_granularity = 2;
insert into tab select number from numbers(4);
set max_rows_to_read = 2;
SELECT x + 1 FROM tab where plus(x, 1) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::Nullable(UInt8)) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(UInt8)) <= 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(Nullable(UInt8))) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1, x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::Nullable(UInt8), x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(UInt8), x) <= 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(Nullable(UInt8)), x) <= 2 order by x;
set max_rows_to_read = 100;
SELECT x + 1 FROM tab WHERE (x + 1::LowCardinality(UInt8)) <= -9223372036854775808 order by x;
drop table tab;
create table tab (x DateTime) engine = MergeTree order by x settings allow_nullable_key = 1, index_granularity = 2;
insert into tab select toDateTime('2022-02-02') + number from numbers(4);
set max_rows_to_read = 2;
SELECT x + 1 FROM tab where plus(x, 1) <= toDateTime('2022-02-02') + 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::Nullable(UInt8)) <= toDateTime('2022-02-02') + 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(UInt8)) <= toDateTime('2022-02-02') + 2 order by x;
SELECT x + 1 FROM tab where plus(x, 1::LowCardinality(Nullable(UInt8))) <= toDateTime('2022-02-02') + 2 order by x;
SELECT 1 + x FROM tab where plus(1, x) <= toDateTime('2022-02-02') + 2 order by x;
SELECT 1 + x FROM tab where plus(1::Nullable(UInt8), x) <= toDateTime('2022-02-02') + 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(UInt8), x) <= toDateTime('2022-02-02') + 2 order by x;
SELECT 1 + x FROM tab where plus(1::LowCardinality(Nullable(UInt8)), x) <= toDateTime('2022-02-02') + 2 order by x;
SELECT x + 1 FROM tab WHERE (x + CAST('1', 'Nullable(UInt8)')) <= -2147483647 ORDER BY x ASC NULLS FIRST;

View File

@ -0,0 +1,7 @@
0
0
1
2
3
4
0

View File

@ -0,0 +1,22 @@
DROP TABLE IF EXISTS t_max_rows_to_read;
CREATE TABLE t_max_rows_to_read (a UInt64)
ENGINE = MergeTree ORDER BY a
SETTINGS index_granularity = 4;
INSERT INTO t_max_rows_to_read SELECT number FROM numbers(100);
SET max_block_size = 10;
SET max_rows_to_read = 20;
SET read_overflow_mode = 'throw';
SELECT number FROM numbers(30); -- { serverError 158 }
SELECT number FROM numbers(30) LIMIT 21; -- { serverError 158 }
SELECT number FROM numbers(30) LIMIT 1;
SELECT number FROM numbers(5);
SELECT a FROM t_max_rows_to_read LIMIT 1;
SELECT a FROM t_max_rows_to_read LIMIT 11 offset 11; -- { serverError 158 }
SELECT a FROM t_max_rows_to_read WHERE a > 50 LIMIT 1; -- { serverError 158 }
DROP TABLE t_max_rows_to_read;

View File

@ -0,0 +1,2 @@
1 test
1 test

View File

@ -0,0 +1,31 @@
-- https://github.com/ClickHouse/ClickHouse/issues/42460
DROP TABLE IF EXISTS bloom_filter_nullable_index__fuzz_0;
CREATE TABLE bloom_filter_nullable_index__fuzz_0
(
`order_key` UInt64,
`str` Nullable(String),
INDEX idx str TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree ORDER BY order_key SETTINGS index_granularity = 6;
INSERT INTO bloom_filter_nullable_index__fuzz_0 VALUES (1, 'test');
INSERT INTO bloom_filter_nullable_index__fuzz_0 VALUES (2, 'test2');
DROP TABLE IF EXISTS bloom_filter_nullable_index__fuzz_1;
CREATE TABLE bloom_filter_nullable_index__fuzz_1
(
`order_key` UInt64,
`str` String,
INDEX idx str TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree ORDER BY order_key SETTINGS index_granularity = 6;
INSERT INTO bloom_filter_nullable_index__fuzz_0 VALUES (1, 'test');
INSERT INTO bloom_filter_nullable_index__fuzz_0 VALUES (2, 'test2');
DROP TABLE IF EXISTS nullable_string_value__fuzz_2;
CREATE TABLE nullable_string_value__fuzz_2 (`value` LowCardinality(String)) ENGINE = TinyLog;
INSERT INTO nullable_string_value__fuzz_2 VALUES ('test');
SELECT * FROM bloom_filter_nullable_index__fuzz_0 WHERE str IN (SELECT value FROM nullable_string_value__fuzz_2);
SELECT * FROM bloom_filter_nullable_index__fuzz_1 WHERE str IN (SELECT value FROM nullable_string_value__fuzz_2);

View File

@ -0,0 +1,24 @@
DROP TABLE IF EXISTS prewhere_int128;
DROP TABLE IF EXISTS prewhere_int256;
DROP TABLE IF EXISTS prewhere_uint128;
DROP TABLE IF EXISTS prewhere_uint256;
CREATE TABLE prewhere_int128 (a Int128) ENGINE=MergeTree ORDER BY a;
INSERT INTO prewhere_int128 VALUES (1);
SELECT a FROM prewhere_int128 PREWHERE a; -- { serverError 59 }
DROP TABLE prewhere_int128;
CREATE TABLE prewhere_int256 (a Int256) ENGINE=MergeTree ORDER BY a;
INSERT INTO prewhere_int256 VALUES (1);
SELECT a FROM prewhere_int256 PREWHERE a; -- { serverError 59 }
DROP TABLE prewhere_int256;
CREATE TABLE prewhere_uint128 (a UInt128) ENGINE=MergeTree ORDER BY a;
INSERT INTO prewhere_uint128 VALUES (1);
SELECT a FROM prewhere_uint128 PREWHERE a; -- { serverError 59 }
DROP TABLE prewhere_uint128;
CREATE TABLE prewhere_uint256 (a UInt256) ENGINE=MergeTree ORDER BY a;
INSERT INTO prewhere_uint256 VALUES (1);
SELECT a FROM prewhere_uint256 PREWHERE a; -- { serverError 59 }
DROP TABLE prewhere_uint256;

View File

@ -1,12 +1,16 @@
v22.10.1.1877-stable 2022-10-26
v22.9.4.32-stable 2022-10-26
v22.9.3.18-stable 2022-09-30
v22.9.2.7-stable 2022-09-23
v22.9.1.2603-stable 2022-09-22
v22.8.7.34-lts 2022-10-26
v22.8.6.71-lts 2022-09-30
v22.8.5.29-lts 2022-09-13
v22.8.4.7-lts 2022-08-31
v22.8.3.13-lts 2022-08-29
v22.8.2.11-lts 2022-08-23
v22.8.1.2097-lts 2022-08-18
v22.7.7.24-stable 2022-10-26
v22.7.6.74-stable 2022-09-30
v22.7.5.13-stable 2022-08-29
v22.7.4.16-stable 2022-08-23

1 v22.9.3.18-stable v22.10.1.1877-stable 2022-09-30 2022-10-26
1 v22.10.1.1877-stable 2022-10-26
2 v22.9.4.32-stable 2022-10-26
3 v22.9.3.18-stable v22.9.3.18-stable 2022-09-30 2022-09-30
4 v22.9.2.7-stable v22.9.2.7-stable 2022-09-23 2022-09-23
5 v22.9.1.2603-stable v22.9.1.2603-stable 2022-09-22 2022-09-22
6 v22.8.7.34-lts 2022-10-26
7 v22.8.6.71-lts v22.8.6.71-lts 2022-09-30 2022-09-30
8 v22.8.5.29-lts v22.8.5.29-lts 2022-09-13 2022-09-13
9 v22.8.4.7-lts v22.8.4.7-lts 2022-08-31 2022-08-31
10 v22.8.3.13-lts v22.8.3.13-lts 2022-08-29 2022-08-29
11 v22.8.2.11-lts v22.8.2.11-lts 2022-08-23 2022-08-23
12 v22.8.1.2097-lts v22.8.1.2097-lts 2022-08-18 2022-08-18
13 v22.7.7.24-stable 2022-10-26
14 v22.7.6.74-stable v22.7.6.74-stable 2022-09-30 2022-09-30
15 v22.7.5.13-stable v22.7.5.13-stable 2022-08-29 2022-08-29
16 v22.7.4.16-stable v22.7.4.16-stable 2022-08-23 2022-08-23