Merge branch 'master' into data_source_description

alesapin 2022-08-21 12:11:21 +02:00
commit 78ba732b31
93 changed files with 2100 additions and 743 deletions

View File

@@ -13,13 +13,24 @@ on: # yamllint disable-line rule:truthy
- 'v*-prestable'
- 'v*-stable'
- 'v*-lts'
workflow_dispatch:
inputs:
tag:
description: 'Test tag'
required: true
type: string
jobs:
UpdateVersions:
runs-on: [self-hosted, style-checker]
steps:
- name: Set test tag
if: github.event_name == 'workflow_dispatch'
run: |
echo "GITHUB_TAG=${{ github.event.inputs.tag }}" >> "$GITHUB_ENV"
- name: Get tag name
if: github.event_name != 'workflow_dispatch'
run: |
echo "GITHUB_TAG=${GITHUB_REF#refs/tags/}" >> "$GITHUB_ENV"
- name: Check out repository code
@@ -35,19 +46,22 @@ jobs:
GID=$(id -g "${UID}")
docker run -u "${UID}:${GID}" -e PYTHONUNBUFFERED=1 \
--volume="${GITHUB_WORKSPACE}:/ClickHouse" clickhouse/style-test \
/ClickHouse/utils/changelog/changelog.py -vv --gh-user-or-token="$GITHUB_TOKEN" \
--output="/ClickHouse/docs/changelogs/${GITHUB_TAG}.md" --jobs=5 "${GITHUB_TAG}"
/ClickHouse/utils/changelog/changelog.py -v --debug-helpers \
--gh-user-or-token="$GITHUB_TOKEN" --jobs=5 \
--output="/ClickHouse/docs/changelogs/${GITHUB_TAG}.md" "${GITHUB_TAG}"
git add "./docs/changelogs/${GITHUB_TAG}.md"
git diff HEAD
- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
with:
author: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
token: ${{ secrets.ROBOT_CLICKHOUSE_COMMIT_TOKEN }}
committer: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
commit-message: Update version_date.tsv and changelogs after ${{ env.GITHUB_TAG }}
branch: auto/${{ env.GITHUB_TAG }}
delete-branch: true
title: Update version_date.tsv and changelogs after ${{ env.GITHUB_TAG }}
labels: do not test
body: |
Update version_date.tsv and changelogs after ${{ env.GITHUB_TAG }}

View File

@@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v22.8, 2022-08-18](#228)**<br/>
**[ClickHouse release v22.7, 2022-07-21](#227)**<br/>
**[ClickHouse release v22.6, 2022-06-16](#226)**<br/>
**[ClickHouse release v22.5, 2022-05-19](#225)**<br/>
@@ -8,6 +9,148 @@
**[ClickHouse release v22.1, 2022-01-18](#221)**<br/>
**[Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021/)**<br/>
### <a id="228"></a> ClickHouse release 22.8, 2022-08-18
#### Backward Incompatible Change
* Extended range of `Date32` and `DateTime64` to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601):2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of `1899-01-01` was clamped to `1925-01-01`, in the new version it will be clamped to `1900-01-01` (see the sketch after this list). It also changes the behavior of rounding with `toStartOfInterval` by up to one quarter if you pass `INTERVAL 3 QUARTER`, because the intervals are counted from an implementation-specific point of time. Closes [#28216](https://github.com/ClickHouse/ClickHouse/issues/28216), improves [#38393](https://github.com/ClickHouse/ClickHouse/issues/38393). [#39425](https://github.com/ClickHouse/ClickHouse/pull/39425) ([Roman Vasin](https://github.com/rvasin)).
* Now, all relevant dictionary sources respect `remote_url_allow_hosts` setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. [#39184](https://github.com/ClickHouse/ClickHouse/pull/39184) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Prebuilt ClickHouse x86 binaries now require support for AVX instructions, i.e. a CPU not older than Intel Sandy Bridge / AMD Bulldozer, both released in 2011. [#39000](https://github.com/ClickHouse/ClickHouse/pull/39000) ([Robert Schulze](https://github.com/rschu1ze)).
* Make the remote filesystem cache composable: allow not evicting certain files (regarding idx, mrk, ..), delete the old cache version. Now it is possible to configure the cache over an Azure blob storage disk, over a Local disk, over a StaticWeb disk, etc. This PR is marked backward incompatible because the cache configuration changes and the config file needs to be updated for the cache to work. The old cache will still be used with the new configuration. The server will start up fine with the old cache configuration. Closes [#36140](https://github.com/ClickHouse/ClickHouse/issues/36140). Closes [#37889](https://github.com/ClickHouse/ClickHouse/issues/37889). [#36171](https://github.com/ClickHouse/ClickHouse/pull/36171) ([Kseniia Sumarokova](https://github.com/kssenii)).
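A minimal sketch of the extended `Date32` range described in the first item above (the literals and aliases are illustrative; as the entry notes, out-of-range arguments are clamped to the new bounds):

```sql
-- Before 22.8, values below 1925-01-01 were clamped to 1925-01-01;
-- in 22.8 the lower bound is 1900-01-01 and the upper bound is in 2299.
SELECT
    toDate32('1899-01-01') AS clamped_to_new_lower_bound,  -- 1900-01-01
    toDate32('2299-12-31') AS within_new_upper_bound;
```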
#### New Feature
* Support SQL standard DELETE FROM syntax on merge tree tables and lightweight delete implementation for merge tree families. [#37893](https://github.com/ClickHouse/ClickHouse/pull/37893) ([Jianmei Zhang](https://github.com/zhangjmruc)) ([Alexander Gololobov](https://github.com/davenger)). Note: this new feature does not make ClickHouse an HTAP DBMS.
* Query parameters can be set in interactive mode as `SET param_abc = 'def'` and transferred via the native protocol as settings. [#39906](https://github.com/ClickHouse/ClickHouse/pull/39906) ([Nikita Taranov](https://github.com/nickitat)).
* Quota key can be set in the native protocol ([Yakov Olkhovsky](https://github.com/ClickHouse/ClickHouse/pull/39874)).
* Added a setting `exact_rows_before_limit` (0/1). When enabled, ClickHouse will provide exact value for `rows_before_limit_at_least` statistic, but with the cost that the data before limit will have to be read completely. This closes [#6613](https://github.com/ClickHouse/ClickHouse/issues/6613). [#25333](https://github.com/ClickHouse/ClickHouse/pull/25333) ([kevin wan](https://github.com/MaxWk)).
* Added support for parallel distributed insert select with `s3Cluster` table function into tables with `Distributed` and `Replicated` engine [#34670](https://github.com/ClickHouse/ClickHouse/issues/34670). [#39107](https://github.com/ClickHouse/ClickHouse/pull/39107) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add new settings to control schema inference from text formats: - `input_format_try_infer_dates` - try infer dates from strings. - `input_format_try_infer_datetimes` - try infer datetimes from strings. - `input_format_try_infer_integers` - try infer `Int64` instead of `Float64`. - `input_format_json_try_infer_numbers_from_strings` - try infer numbers from json strings in JSON formats. [#39186](https://github.com/ClickHouse/ClickHouse/pull/39186) ([Kruglov Pavel](https://github.com/Avogar)).
* Add an option to provide JSON-formatted log output. The purpose is to allow easier ingestion and querying in log analysis tools. [#39277](https://github.com/ClickHouse/ClickHouse/pull/39277) ([Mallik Hassan](https://github.com/SadiHassan)).
* Add function `nowInBlock` which allows getting the current time during long-running and continuous queries. Closes [#39522](https://github.com/ClickHouse/ClickHouse/issues/39522). Note: there are no functions `now64InBlock` nor `todayInBlock`. [#39533](https://github.com/ClickHouse/ClickHouse/pull/39533) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add ability to specify settings for an `executable()` table function. [#39681](https://github.com/ClickHouse/ClickHouse/pull/39681) ([Constantine Peresypkin](https://github.com/pkit)).
* Implemented automatic conversion of database engine from `Ordinary` to `Atomic`. Create empty `convert_ordinary_to_atomic` file in `flags` directory and all `Ordinary` databases will be converted automatically on next server start. Resolves [#39546](https://github.com/ClickHouse/ClickHouse/issues/39546). [#39933](https://github.com/ClickHouse/ClickHouse/pull/39933) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Support `SELECT ... INTO OUTFILE '...' AND STDOUT`. [#37490](https://github.com/ClickHouse/ClickHouse/issues/37490). [#39054](https://github.com/ClickHouse/ClickHouse/pull/39054) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add formats `PrettyMonoBlock`, `PrettyNoEscapesMonoBlock`, `PrettyCompactNoEscapes`, `PrettyCompactNoEscapesMonoBlock`, `PrettySpaceNoEscapes`, `PrettySpaceMonoBlock`, `PrettySpaceNoEscapesMonoBlock`. [#39646](https://github.com/ClickHouse/ClickHouse/pull/39646) ([Kruglov Pavel](https://github.com/Avogar)).
* Add a new setting `schema_inference_hints` that allows specifying structure hints for specific columns during schema inference (see the sketch after this list). Closes [#39569](https://github.com/ClickHouse/ClickHouse/issues/39569). [#40068](https://github.com/ClickHouse/ClickHouse/pull/40068) ([Kruglov Pavel](https://github.com/Avogar)).
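As a quick, hedged illustration of the `schema_inference_hints` setting from the last item in this list (the file name and column hints are hypothetical; the setting only pins the types of the named columns, the rest are still inferred):

```sql
-- Hypothetical JSONEachRow file: pin the types of two columns and let
-- schema inference derive the remaining ones.
DESCRIBE file('events.jsonl', 'JSONEachRow')
SETTINGS schema_inference_hints = 'id UInt64, ts DateTime';
```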
#### Performance Improvement
* Improved memory usage during memory efficient merging of aggregation results. [#39429](https://github.com/ClickHouse/ClickHouse/pull/39429) ([Nikita Taranov](https://github.com/nickitat)).
* Added concurrency control logic to limit the total number of concurrent threads created by queries. [#37558](https://github.com/ClickHouse/ClickHouse/pull/37558) ([Sergei Trifonov](https://github.com/serxa)). Add the `concurrent_threads_soft_limit` parameter to increase performance in case of high QPS by limiting the total number of threads for all queries. [#37285](https://github.com/ClickHouse/ClickHouse/pull/37285) ([Roman Vasin](https://github.com/rvasin)).
* Add `SLRU` cache policy for uncompressed cache and marks cache. ([Kseniia Sumarokova](https://github.com/kssenii)). [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)). Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)).
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#36654](https://github.com/ClickHouse/ClickHouse/pull/36654) ([jasperzhu](https://github.com/jinjunzh)). [#39494](https://github.com/ClickHouse/ClickHouse/pull/39494) ([Robert Schulze](https://github.com/rschu1ze)).
* `DISTINCT` in order with `ORDER BY`: Deduce how to sort based on the input stream sort description. Skip sorting if the input stream is already sorted. [#38719](https://github.com/ClickHouse/ClickHouse/pull/38719) ([Igor Nikonov](https://github.com/devcrafter)). Improve memory usage (significantly) and query execution time: use `DistinctSortedChunkTransform` for the final distinct when `DISTINCT` columns match `ORDER BY` columns (renamed to `DistinctSortedStreamTransform` in `EXPLAIN PIPELINE`); this improves memory usage significantly and removes unnecessary allocations in the hot loop of `DistinctSortedChunkTransform`. [#39432](https://github.com/ClickHouse/ClickHouse/pull/39432) ([Igor Nikonov](https://github.com/devcrafter)). Use `DistinctSortedTransform` only when the sort description is applicable to DISTINCT columns, otherwise fall back to the ordinary DISTINCT implementation; this allows making fewer checks during `DistinctSortedTransform` execution. [#39528](https://github.com/ClickHouse/ClickHouse/pull/39528) ([Igor Nikonov](https://github.com/devcrafter)). Fix: `DistinctSortedTransform` didn't take advantage of sorting. It never cleared the HashSet since clearing_columns were detected incorrectly (always empty). So it basically worked as ordinary `DISTINCT` (`DistinctTransform`). The fix reduces memory usage significantly. [#39538](https://github.com/ClickHouse/ClickHouse/pull/39538) ([Igor Nikonov](https://github.com/devcrafter)).
* Use local node as first priority to get structure of remote table when executing `cluster` and similar table functions. [#39440](https://github.com/ClickHouse/ClickHouse/pull/39440) ([Mingliang Pan](https://github.com/liangliangpan)).
* Optimize filtering by numeric columns with AVX512VBMI2 compress store. [#39633](https://github.com/ClickHouse/ClickHouse/pull/39633) ([Guo Wangyang](https://github.com/guowangy)). For systems with AVX512 VBMI2, this PR improves performance by ca. 6% for SSB benchmark queries 3.1, 3.2 and 3.3 (SF=100). Tested on Intel Icelake Xeon 8380 * 2 socket. [#40033](https://github.com/ClickHouse/ClickHouse/pull/40033) ([Robert Schulze](https://github.com/rschu1ze)).
* Optimize index analysis with functional expressions in multi-thread scenario. [#39812](https://github.com/ClickHouse/ClickHouse/pull/39812) ([Guo Wangyang](https://github.com/guowangy)).
* Optimizations for complex queries: Don't visit the AST for UDFs if none are registered. [#40069](https://github.com/ClickHouse/ClickHouse/pull/40069) ([Raúl Marín](https://github.com/Algunenano)). Optimize CurrentMemoryTracker alloc and free. [#40078](https://github.com/ClickHouse/ClickHouse/pull/40078) ([Raúl Marín](https://github.com/Algunenano)).
* Improved Base58 encoding/decoding. [#39292](https://github.com/ClickHouse/ClickHouse/pull/39292) ([Andrey Zvonov](https://github.com/zvonand)).
* Improve bytes to bits mask transform for SSE/AVX/AVX512. [#39586](https://github.com/ClickHouse/ClickHouse/pull/39586) ([Guo Wangyang](https://github.com/guowangy)).
#### Improvement
* Normalize `AggregateFunction` types and state representations because optimizations like [#35788](https://github.com/ClickHouse/ClickHouse/pull/35788) will treat `count(not null columns)` as `count()`, which might confuse distributed interpreters with the following error: `Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported`. [#39420](https://github.com/ClickHouse/ClickHouse/pull/39420) ([Amos Bird](https://github.com/amosbird)). The functions with identical states can be used in materialized views interchangeably.
* Rework and simplify the `system.backups` table, remove the `internal` column, allow user to set the ID of operation, add columns `num_files`, `uncompressed_size`, `compressed_size`, `start_time`, `end_time`. [#39503](https://github.com/ClickHouse/ClickHouse/pull/39503) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improved structure of the DDL query result table for the `Replicated` database (separate columns with shard and replica name, clearer status) - `CREATE TABLE ... ON CLUSTER` queries can be normalized on the initiator first if `distributed_ddl_entry_format_version` is set to 3 (default value). It means that `ON CLUSTER` queries may not work if the initiator does not belong to the cluster specified in the query. Fixes [#37318](https://github.com/ClickHouse/ClickHouse/issues/37318), [#39500](https://github.com/ClickHouse/ClickHouse/issues/39500) - Ignore the `ON CLUSTER` clause if the database is `Replicated` and the cluster name equals the database name. Related to [#35570](https://github.com/ClickHouse/ClickHouse/issues/35570) - Miscellaneous minor fixes for the `Replicated` database engine - Check metadata consistency when starting up a `Replicated` database, start replica recovery in case of a mismatch of local metadata and metadata in Keeper. Resolves [#24880](https://github.com/ClickHouse/ClickHouse/issues/24880). [#37198](https://github.com/ClickHouse/ClickHouse/pull/37198) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add result_rows and result_bytes to progress reports (`X-ClickHouse-Summary`). [#39567](https://github.com/ClickHouse/ClickHouse/pull/39567) ([Raúl Marín](https://github.com/Algunenano)).
* Improve primary key analysis for MergeTree. [#25563](https://github.com/ClickHouse/ClickHouse/pull/25563) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* `timeSlots` now works with DateTime64; subsecond duration and slot size available when working with DateTime64. [#37951](https://github.com/ClickHouse/ClickHouse/pull/37951) ([Andrey Zvonov](https://github.com/zvonand)).
* Added support of `LEFT SEMI` and `LEFT ANTI` direct join with `EmbeddedRocksDB` tables. [#38956](https://github.com/ClickHouse/ClickHouse/pull/38956) ([Vladimir C](https://github.com/vdimir)).
* Add profile events for fsync operations. [#39179](https://github.com/ClickHouse/ClickHouse/pull/39179) ([Azat Khuzhin](https://github.com/azat)).
* Add a second argument to the ordinary function `file(path[, default])`, which the function returns when the file does not exist (see the sketch after this list). [#39218](https://github.com/ClickHouse/ClickHouse/pull/39218) ([Nikolay Degterinsky](https://github.com/evillique)).
* Some small fixes for reading via HTTP: allow retrying partial content in case of a 200 OK response. [#39244](https://github.com/ClickHouse/ClickHouse/pull/39244) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support queries `CREATE TEMPORARY TABLE ... (<list of columns>) AS ...`. [#39462](https://github.com/ClickHouse/ClickHouse/pull/39462) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support of `!`/`*` (exclamation/asterisk) in custom TLDs (`cutToFirstSignificantSubdomainCustom()`/`cutToFirstSignificantSubdomainCustomWithWWW()`/`firstSignificantSubdomainCustom()`). [#39496](https://github.com/ClickHouse/ClickHouse/pull/39496) ([Azat Khuzhin](https://github.com/azat)).
* Add support for TLS connections to NATS. Implements [#39525](https://github.com/ClickHouse/ClickHouse/issues/39525). [#39527](https://github.com/ClickHouse/ClickHouse/pull/39527) ([Constantine Peresypkin](https://github.com/pkit)).
* `clickhouse-obfuscator` (a tool for database obfuscation for testing and load generation) now has the new `--save` and `--load` parameters to work with pre-trained models. This closes [#39534](https://github.com/ClickHouse/ClickHouse/issues/39534). [#39541](https://github.com/ClickHouse/ClickHouse/pull/39541) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix incorrect behavior of log rotation during restart. [#39558](https://github.com/ClickHouse/ClickHouse/pull/39558) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix building aggregate projections when external aggregation is on. Mark as improvement because the case is rare and there exists easy workaround to fix it via changing settings. This fixes [#39667](https://github.com/ClickHouse/ClickHouse/issues/39667) . [#39671](https://github.com/ClickHouse/ClickHouse/pull/39671) ([Amos Bird](https://github.com/amosbird)).
* Allow to execute hash functions with arguments of type `Map`. [#39685](https://github.com/ClickHouse/ClickHouse/pull/39685) ([Anton Popov](https://github.com/CurtizJ)).
* Add a configuration parameter to hide addresses in stack traces. It may improve security a little but generally, it is harmful and should not be used. [#39690](https://github.com/ClickHouse/ClickHouse/pull/39690) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change the prefix size of AggregateFunctionDistinct to make sure nested function data memory segment is aligned. [#39696](https://github.com/ClickHouse/ClickHouse/pull/39696) ([Pxl](https://github.com/BiteTheDDDDt)).
* Properly escape credentials passed to the `clickhouse-diagnostic` tool. [#39707](https://github.com/ClickHouse/ClickHouse/pull/39707) ([Dale McDiarmid](https://github.com/gingerwizard)).
* ClickHouse Keeper improvement: create a snapshot on exit. It can be controlled with the config `keeper_server.create_snapshot_on_exit`, `true` by default. [#39755](https://github.com/ClickHouse/ClickHouse/pull/39755) ([Antonio Andelic](https://github.com/antonio2368)).
* Support primary key analysis for `row_policy_filter` and `additional_filter`. It also helps fix issues like [#37454](https://github.com/ClickHouse/ClickHouse/issues/37454) . [#39826](https://github.com/ClickHouse/ClickHouse/pull/39826) ([Amos Bird](https://github.com/amosbird)).
* Fix two usability issues in Play UI: - it was non-pixel-perfect on iPad due to parasitic border radius and margins; - the progress indication did not display after the first query. This closes [#39957](https://github.com/ClickHouse/ClickHouse/issues/39957). This closes [#39960](https://github.com/ClickHouse/ClickHouse/issues/39960). [#39961](https://github.com/ClickHouse/ClickHouse/pull/39961) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Play UI: add row numbers; add cell selection on click; add hysteresis for table cells. [#39962](https://github.com/ClickHouse/ClickHouse/pull/39962) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Play UI: recognize tab key in textarea, but at the same time don't mess up with tab navigation. [#40053](https://github.com/ClickHouse/ClickHouse/pull/40053) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The client will show server-side elapsed time. This is important for the performance comparison of ClickHouse services in remote datacenters. This closes [#38070](https://github.com/ClickHouse/ClickHouse/issues/38070). See also [this](https://github.com/ClickHouse/ClickBench/blob/main/hardware/benchmark-cloud.sh#L37) for motivation. [#39968](https://github.com/ClickHouse/ClickHouse/pull/39968) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Adds `parseDateTime64BestEffortUS`, `parseDateTime64BestEffortUSOrNull`, `parseDateTime64BestEffortUSOrZero` functions, closing [#37492](https://github.com/ClickHouse/ClickHouse/issues/37492). [#40015](https://github.com/ClickHouse/ClickHouse/pull/40015) ([Tanya Bragin](https://github.com/tbragin)).
* Extend the `system.processors_profile_log` with more information such as input rows. [#40121](https://github.com/ClickHouse/ClickHouse/pull/40121) ([Amos Bird](https://github.com/amosbird)).
* Display server-side time in `clickhouse-benchmark` by default if it is available (since ClickHouse version 22.8). This is needed to correctly compare the performance of clouds. This behavior can be changed with the new `--client-side-time` command line option. Change the `--randomize` command line option from `--randomize 1` to the form without argument. [#40193](https://github.com/ClickHouse/ClickHouse/pull/40193) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add counters (ProfileEvents) for cases when a query complexity limitation has been set and has been reached (a separate counter for `overflow_mode` = `break` and `throw`). For example, if you have set up `max_rows_to_read` with `read_overflow_mode = 'break'`, looking at the value of the `OverflowBreak` counter will allow distinguishing incomplete results. [#40205](https://github.com/ClickHouse/ClickHouse/pull/40205) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix memory accounting in case of "Memory limit exceeded" errors (previously [peak] memory usage took failed allocations into account). [#40249](https://github.com/ClickHouse/ClickHouse/pull/40249) ([Azat Khuzhin](https://github.com/azat)).
* Add metrics for filesystem cache: `FilesystemCacheSize` and `FilesystemCacheElements`. [#40260](https://github.com/ClickHouse/ClickHouse/pull/40260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support hadoop secure RPC transfer (hadoop.rpc.protection=privacy and hadoop.rpc.protection=integrity). [#39411](https://github.com/ClickHouse/ClickHouse/pull/39411) ([michael1589](https://github.com/michael1589)).
* Avoid continuously growing memory consumption of pattern cache when using functions multi(Fuzzy)Match(Any|AllIndices|AnyIndex)(). [#40264](https://github.com/ClickHouse/ClickHouse/pull/40264) ([Robert Schulze](https://github.com/rschu1ze)).
* Add cache for schema inference for file/s3/hdfs/url table functions. Now, schema inference will be performed only on the first query to the file, all subsequent queries to the same file will use the schema from cache if data wasn't changed. Add system table `system.schema_inference_cache` with all current schemas in cache and system queries `SYSTEM DROP SCHEMA CACHE [FOR FILE/S3/HDFS/URL]` to drop schemas from cache. [#38286](https://github.com/ClickHouse/ClickHouse/pull/38286) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes [#32401](https://github.com/ClickHouse/ClickHouse/issues/32401)). [#40293](https://github.com/ClickHouse/ClickHouse/pull/40293) ([Josh Taylor](https://github.com/joshuataylor)).
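A small sketch of the optional default argument for the ordinary `file` function mentioned earlier in this list (the path is hypothetical; per the entry, the second argument is returned when the file does not exist):

```sql
-- If 'missing.txt' cannot be found under user_files_path,
-- the query returns the provided default instead of throwing an error.
SELECT file('missing.txt', 'fallback contents') AS body;
```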
#### Build/Testing/Packaging Improvement
* [ClickFiddle](https://fiddle.clickhouse.com/): A new tool for testing ClickHouse versions in read/write mode (**Igor Baliuk**).
* ClickHouse binary is made self-extracting [#35775](https://github.com/ClickHouse/ClickHouse/pull/35775) ([Yakov Olkhovskiy, Arthur Filatenkov](https://github.com/yakov-olkhovskiy)).
* Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently, after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Former packages used to install systemd.service file to `/etc`. The files there are marked as `conf` and are not cleaned out, and not updated automatically. This PR cleans them out. [#39323](https://github.com/ClickHouse/ClickHouse/pull/39323) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Ensure LSan is effective. [#39430](https://github.com/ClickHouse/ClickHouse/pull/39430) ([Azat Khuzhin](https://github.com/azat)).
* TSAN has issues with clang-14 (https://github.com/google/sanitizers/issues/1552, https://github.com/google/sanitizers/issues/1540), so here we build the TSAN binaries with clang-15. [#39450](https://github.com/ClickHouse/ClickHouse/pull/39450) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove the option to build ClickHouse tools as separate executable programs. This fixes [#37847](https://github.com/ClickHouse/ClickHouse/issues/37847). [#39520](https://github.com/ClickHouse/ClickHouse/pull/39520) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Small preparations for build on s390x (which is big-endian). [#39627](https://github.com/ClickHouse/ClickHouse/pull/39627) ([Harry Lee](https://github.com/HarryLeeIBM)). [#39656](https://github.com/ClickHouse/ClickHouse/pull/39656) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed Endian issue in BitHelpers for s390x. [#39656](https://github.com/ClickHouse/ClickHouse/pull/39656) ([Harry Lee](https://github.com/HarryLeeIBM)). Implement a piece of code related to SipHash for s390x architecture (which is not supported by ClickHouse). [#39732](https://github.com/ClickHouse/ClickHouse/pull/39732) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed an Endian issue in Coordination snapshot code for s390x architecture (which is not supported by ClickHouse). [#39931](https://github.com/ClickHouse/ClickHouse/pull/39931) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed Endian issues in Codec code for s390x architecture (which is not supported by ClickHouse). [#40008](https://github.com/ClickHouse/ClickHouse/pull/40008) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed Endian issues in reading/writing BigEndian binary data in ReadHelpers and WriteHelpers code for s390x architecture (which is not supported by ClickHouse). [#40179](https://github.com/ClickHouse/ClickHouse/pull/40179) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support build with `clang-16` (trunk). This closes [#39949](https://github.com/ClickHouse/ClickHouse/issues/39949). [#40181](https://github.com/ClickHouse/ClickHouse/pull/40181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Prepare RISC-V 64 build to run in CI. This is for [#40141](https://github.com/ClickHouse/ClickHouse/issues/40141). [#40197](https://github.com/ClickHouse/ClickHouse/pull/40197) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Simplified the function registration macro interface (`FUNCTION_REGISTER*`) to eliminate the step of adding and calling an extern function in registerFunctions.cpp; this also makes incremental builds of a new function faster. [#38615](https://github.com/ClickHouse/ClickHouse/pull/38615) ([Li Yin](https://github.com/liyinsg)).
* Docker: entrypoint.sh in the Docker image now creates and chowns all folders it finds in the config for a multi-disk setup [#17717](https://github.com/ClickHouse/ClickHouse/issues/17717). [#39121](https://github.com/ClickHouse/ClickHouse/pull/39121) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### Bug Fix
* Fix possible segfault in `CapnProto` input format. This bug was found and sent through the ClickHouse bug-bounty [program](https://github.com/ClickHouse/ClickHouse/issues/38986) by *kiojj*. [#40241](https://github.com/ClickHouse/ClickHouse/pull/40241) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix a very rare case of incorrect behavior of array subscript operator. This closes [#28720](https://github.com/ClickHouse/ClickHouse/issues/28720). [#40185](https://github.com/ClickHouse/ClickHouse/pull/40185) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix insufficient argument check for encryption functions (found by query fuzzer). This closes [#39987](https://github.com/ClickHouse/ClickHouse/issues/39987). [#40194](https://github.com/ClickHouse/ClickHouse/pull/40194) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix the case when the order of columns can be incorrect if the `IN` operator is used with a table with `ENGINE = Set` containing multiple columns. This fixes [#13014](https://github.com/ClickHouse/ClickHouse/issues/13014). [#40225](https://github.com/ClickHouse/ClickHouse/pull/40225) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix seeking while reading from encrypted disk. This PR fixes [#38381](https://github.com/ClickHouse/ClickHouse/issues/38381). [#39687](https://github.com/ClickHouse/ClickHouse/pull/39687) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix duplicate columns in join plan. Finally, solve [#26809](https://github.com/ClickHouse/ClickHouse/issues/26809). [#40009](https://github.com/ClickHouse/ClickHouse/pull/40009) ([Vladimir C](https://github.com/vdimir)).
* Fixed query hanging for SELECT with ORDER BY WITH FILL with different date/time types. [#37849](https://github.com/ClickHouse/ClickHouse/pull/37849) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix ORDER BY that matches projections ORDER BY (before, it simply returned an unsorted result). [#38725](https://github.com/ClickHouse/ClickHouse/pull/38725) ([Azat Khuzhin](https://github.com/azat)).
* Do not optimise functions in GROUP BY statements if they shadow one of the table columns or expressions. Fixes [#37032](https://github.com/ClickHouse/ClickHouse/issues/37032). [#39103](https://github.com/ClickHouse/ClickHouse/pull/39103) ([Anton Kozlov](https://github.com/tonickkozlov)).
* Fix wrong table name in logs after RENAME TABLE. This fixes [#38018](https://github.com/ClickHouse/ClickHouse/issues/38018). [#39227](https://github.com/ClickHouse/ClickHouse/pull/39227) ([Amos Bird](https://github.com/amosbird)).
* Fix positional arguments in case of columns pruning when optimising the query. Closes [#38433](https://github.com/ClickHouse/ClickHouse/issues/38433). [#39293](https://github.com/ClickHouse/ClickHouse/pull/39293) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix bug in schema inference in case of empty messages in Protobuf/CapnProto formats that allowed creating a column with empty `Tuple` type. Closes [#39051](https://github.com/ClickHouse/ClickHouse/issues/39051). Add 2 new settings `input_format_{protobuf/capnproto}_skip_fields_with_unsupported_types_in_schema_inference` that allow skipping fields with unsupported types during schema inference for Protobuf and CapnProto formats. [#39357](https://github.com/ClickHouse/ClickHouse/pull/39357) ([Kruglov Pavel](https://github.com/Avogar)).
* (Window View is an experimental feature) Fix segmentation fault on `CREATE WINDOW VIEW .. ON CLUSTER ... INNER`. Closes [#39363](https://github.com/ClickHouse/ClickHouse/issues/39363). [#39384](https://github.com/ClickHouse/ClickHouse/pull/39384) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix WriteBuffer finalize when cancelling insert into function (in previous versions it could lead to std::terminate). [#39458](https://github.com/ClickHouse/ClickHouse/pull/39458) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix storing of columns of type `Object` in sparse serialization. [#39464](https://github.com/ClickHouse/ClickHouse/pull/39464) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible "Not found column in block" exception when using projections. This closes [#39469](https://github.com/ClickHouse/ClickHouse/issues/39469). [#39470](https://github.com/ClickHouse/ClickHouse/pull/39470) ([小路](https://github.com/nicelulu)).
* Fix exception on race between DROP and INSERT with materialized views. [#39477](https://github.com/ClickHouse/ClickHouse/pull/39477) ([Azat Khuzhin](https://github.com/azat)).
* A bug in Apache Avro library: fix data race and possible heap-buffer-overflow in Avro format. Closes [#39094](https://github.com/ClickHouse/ClickHouse/issues/39094) Closes [#33652](https://github.com/ClickHouse/ClickHouse/issues/33652). [#39498](https://github.com/ClickHouse/ClickHouse/pull/39498) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare bug in asynchronous reading (with setting `local_filesystem_read_method='pread_threadpool'`) with enabled `O_DIRECT` (enabled by setting `min_bytes_to_use_direct_io`). [#39506](https://github.com/ClickHouse/ClickHouse/pull/39506) ([Anton Popov](https://github.com/CurtizJ)).
* (only on FreeBSD) Fixes "Code: 49. DB::Exception: FunctionFactory: the function name '' is not unique. (LOGICAL_ERROR)" observed on FreeBSD when starting clickhouse. [#39551](https://github.com/ClickHouse/ClickHouse/pull/39551) ([Alexander Gololobov](https://github.com/davenger)).
* Fix bug with the recently introduced "maxsplit" argument for `splitByChar`, which was not working correctly. [#39552](https://github.com/ClickHouse/ClickHouse/pull/39552) ([filimonov](https://github.com/filimonov)).
* Fix bug in ASOF JOIN with `enable_optimize_predicate_expression`, close [#37813](https://github.com/ClickHouse/ClickHouse/issues/37813). [#39556](https://github.com/ClickHouse/ClickHouse/pull/39556) ([Vladimir C](https://github.com/vdimir)).
* Fixed `CREATE/DROP INDEX` query with `ON CLUSTER` or `Replicated` database and `ReplicatedMergeTree`. It used to be executed on all replicas (causing errors or the DDL queue getting stuck). Fixes [#39511](https://github.com/ClickHouse/ClickHouse/issues/39511). [#39565](https://github.com/ClickHouse/ClickHouse/pull/39565) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix "column not found" error for push down with join, close [#39505](https://github.com/ClickHouse/ClickHouse/issues/39505). [#39575](https://github.com/ClickHouse/ClickHouse/pull/39575) ([Vladimir C](https://github.com/vdimir)).
* Fix the wrong `REGEXP_REPLACE` alias. This fixes https://github.com/ClickHouse/ClickBench/issues/9. [#39592](https://github.com/ClickHouse/ClickHouse/pull/39592) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed point of origin for exponential decay window functions to the last value in window. Previously, decay was calculated by formula `exp((t - curr_row_t) / decay_length)`, which is incorrect when right boundary of window is not `CURRENT ROW`. It was changed to: `exp((t - last_row_t) / decay_length)`. There is no change in results for windows with `ROWS BETWEEN (smth) AND CURRENT ROW`. [#39593](https://github.com/ClickHouse/ClickHouse/pull/39593) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* Fix Decimal division overflow, which can be detected based on operands scale. [#39600](https://github.com/ClickHouse/ClickHouse/pull/39600) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix settings `output_format_arrow_string_as_string` and `output_format_arrow_low_cardinality_as_dictionary` work in combination. Closes [#39624](https://github.com/ClickHouse/ClickHouse/issues/39624). [#39647](https://github.com/ClickHouse/ClickHouse/pull/39647) ([Kruglov Pavel](https://github.com/Avogar)).
* Fixed a bug in default database resolution in distributed table reads. [#39674](https://github.com/ClickHouse/ClickHouse/pull/39674) ([Anton Kozlov](https://github.com/tonickkozlov)).
* (Only with the obsolete Ordinary databases) A SELECT might read data of a dropped table if the cache for mmap IO is used, the database engine is Ordinary, and a new table was created with the same name the dropped one had. It's fixed. [#39708](https://github.com/ClickHouse/ClickHouse/pull/39708) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix possible error `Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got ColumnLowCardinality` Fixes [#38460](https://github.com/ClickHouse/ClickHouse/issues/38460). [#39716](https://github.com/ClickHouse/ClickHouse/pull/39716) ([Arthur Passos](https://github.com/arthurpassos)).
* Field names in the `meta` section of JSON format were erroneously double escaped. This closes [#39693](https://github.com/ClickHouse/ClickHouse/issues/39693). [#39747](https://github.com/ClickHouse/ClickHouse/pull/39747) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix wrong index analysis with tuples and operator `IN`, which could lead to wrong query result. [#39752](https://github.com/ClickHouse/ClickHouse/pull/39752) ([Anton Popov](https://github.com/CurtizJ)).
* Fix `EmbeddedRocksDB` tables filtering by key using params. [#39757](https://github.com/ClickHouse/ClickHouse/pull/39757) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix error `Invalid number of columns in chunk pushed to OutputPort` which was caused by ARRAY JOIN optimization. Fixes [#39164](https://github.com/ClickHouse/ClickHouse/issues/39164). [#39799](https://github.com/ClickHouse/ClickHouse/pull/39799) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* A workaround for a bug in Linux kernel. Fix `CANNOT_READ_ALL_DATA` exception with `local_filesystem_read_method=pread_threadpool`. This bug affected only Linux kernel version 5.9 and 5.10 according to [man](https://manpages.debian.org/testing/manpages-dev/preadv2.2.en.html#BUGS). [#39800](https://github.com/ClickHouse/ClickHouse/pull/39800) ([Anton Popov](https://github.com/CurtizJ)).
* (Only on NFS) Fix broken NFS mkdir for root-squashed volumes. [#39898](https://github.com/ClickHouse/ClickHouse/pull/39898) ([Constantine Peresypkin](https://github.com/pkit)).
* Remove dictionaries from prometheus metrics on DETACH/DROP. [#39926](https://github.com/ClickHouse/ClickHouse/pull/39926) ([Azat Khuzhin](https://github.com/azat)).
* Fix read of StorageFile with virtual columns. Closes [#39907](https://github.com/ClickHouse/ClickHouse/issues/39907). [#39943](https://github.com/ClickHouse/ClickHouse/pull/39943) ([flynn](https://github.com/ucasfl)).
* Fix big memory usage during fetches. Fixes [#39915](https://github.com/ClickHouse/ClickHouse/issues/39915). [#39990](https://github.com/ClickHouse/ClickHouse/pull/39990) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* (experimental feature) Fix `hashId` crash and salt parameter not being used. [#40002](https://github.com/ClickHouse/ClickHouse/pull/40002) ([Raúl Marín](https://github.com/Algunenano)).
* `EXCEPT` and `INTERSECT` operators may lead to crash if a specific combination of constant and non-constant columns were used. [#40020](https://github.com/ClickHouse/ClickHouse/pull/40020) ([Duc Canh Le](https://github.com/canhld94)).
* Fixed "Part directory doesn't exist" and "`tmp_<part_name>` ... No such file or directory" errors during too slow INSERT or too long merge/mutation. Also fixed issue that may cause some replication queue entries to stuck without any errors or warnings in logs if previous attempt to fetch part failed, but `tmp-fetch_<part_name>` directory was not cleaned up. [#40031](https://github.com/ClickHouse/ClickHouse/pull/40031) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix rare cases of parsing of arrays of tuples in format `Values`. [#40034](https://github.com/ClickHouse/ClickHouse/pull/40034) ([Anton Popov](https://github.com/CurtizJ)).
* Fixes ArrowColumn format Dictionary(X) & Dictionary(Nullable(X)) conversion to ClickHouse LowCardinality(X) & LowCardinality(Nullable(X)) respectively. [#40037](https://github.com/ClickHouse/ClickHouse/pull/40037) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix potential deadlock in writing to S3 during task scheduling failure. [#40070](https://github.com/ClickHouse/ClickHouse/pull/40070) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix bug in collectFilesToSkip() by adding the correct file extension (.idx or .idx2) for indexes to be recalculated, avoid wrong hard links. Fixed [#39896](https://github.com/ClickHouse/ClickHouse/issues/39896). [#40095](https://github.com/ClickHouse/ClickHouse/pull/40095) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* A fix for reverse DNS resolution. [#40134](https://github.com/ClickHouse/ClickHouse/pull/40134) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix unexpected result of `arrayDifference` for `Array(UInt32)`. [#40211](https://github.com/ClickHouse/ClickHouse/pull/40211) ([Duc Canh Le](https://github.com/canhld94)).
### <a id="227"></a> ClickHouse release 22.7, 2022-07-21
#### Upgrade Notes
@@ -258,7 +401,7 @@
* Allows providing `NULL`/`NOT NULL` right after type in column declaration. [#37337](https://github.com/ClickHouse/ClickHouse/pull/37337) ([Igor Nikonov](https://github.com/devcrafter)).
* Optimize getting a read buffer for a file segment in `PARTIALLY_DOWNLOADED` state. [#37338](https://github.com/ClickHouse/ClickHouse/pull/37338) ([xiedeyantu](https://github.com/xiedeyantu)).
* Try to improve short circuit functions processing to fix problems with stress tests. [#37384](https://github.com/ClickHouse/ClickHouse/pull/37384) ([Kruglov Pavel](https://github.com/Avogar)).
* Closes [#37395](https://github.com/ClickHouse/ClickHouse/issues/37395). [#37415](https://github.com/ClickHouse/ClickHouse/pull/37415) ([Memo](https://github.com/Joeywzr)).
* Generate multiple columns with UUID (generateUUIDv4(1), generateUUIDv4(2)) [#37395](https://github.com/ClickHouse/ClickHouse/issues/37395). [#37415](https://github.com/ClickHouse/ClickHouse/pull/37415) ([Memo](https://github.com/Joeywzr)).
* Fix extremely rare deadlock during part fetch in zero-copy replication. Fixes [#37423](https://github.com/ClickHouse/ClickHouse/issues/37423). [#37424](https://github.com/ClickHouse/ClickHouse/pull/37424) ([metahys](https://github.com/metahys)).
* Don't allow to create storage with unknown data format. [#37450](https://github.com/ClickHouse/ClickHouse/pull/37450) ([Kruglov Pavel](https://github.com/Avogar)).
* Set `global_memory_usage_overcommit_max_wait_microseconds` default value to 5 seconds. Add info about `OvercommitTracker` to OOM exception message. Add `MemoryOvercommitWaitTimeMicroseconds` profile event. [#37460](https://github.com/ClickHouse/ClickHouse/pull/37460) ([Dmitry Novik](https://github.com/novikd)).

View File

@@ -10,9 +10,10 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 22.8 | ✔️ |
| 22.7 | ✔️ |
| 22.6 | ✔️ |
| 22.5 | ✔️ |
| 22.5 | ❌ |
| 22.4 | ❌ |
| 22.3 | ✔️ |
| 22.2 | ❌ |
@@ -21,7 +22,7 @@ The following versions of ClickHouse server are currently being supported with s
| 21.11 | ❌ |
| 21.10 | ❌ |
| 21.9 | ❌ |
| 21.8 | ✔️ |
| 21.8 | ❌ |
| 21.7 | ❌ |
| 21.6 | ❌ |
| 21.5 | ❌ |

View File

@@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54465)
SET(VERSION_REVISION 54466)
SET(VERSION_MAJOR 22)
SET(VERSION_MINOR 8)
SET(VERSION_MINOR 9)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH f4f05ec786a8b8966dd0ea2a2d7e39a8c7db24f4)
SET(VERSION_DESCRIBE v22.8.1.1-testing)
SET(VERSION_STRING 22.8.1.1)
SET(VERSION_GITHASH 09a2ff88435f79e5279745bbe1dc0e5e401df38d)
SET(VERSION_DESCRIBE v22.9.1.1-testing)
SET(VERSION_STRING 22.9.1.1)
# end of autochange

View File

@@ -107,6 +107,13 @@ fi
if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
# port is needed to check if clickhouse-server is ready for connections
HTTP_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=http_port)"
HTTPS_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=https_port)"
if [ -n "$HTTP_PORT" ]; then
URL="http://127.0.0.1:$HTTP_PORT/ping"
else
URL="https://127.0.0.1:$HTTPS_PORT/ping"
fi
# Listen only on localhost until the initialization is done
/usr/bin/clickhouse su "${USER}:${GROUP}" /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" -- --listen_host=127.0.0.1 &
@@ -115,7 +122,7 @@ if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
# check if clickhouse is ready to accept connections
# will try to send ping clickhouse via http_port (max 12 retries by default, with 1 sec timeout and 1 sec delay between retries)
tries=${CLICKHOUSE_INIT_TIMEOUT:-12}
while ! wget --spider -T 1 -q "http://127.0.0.1:$HTTP_PORT/ping" 2>/dev/null; do
while ! wget --spider --no-check-certificate -T 1 -q "$URL" 2>/dev/null; do
if [ "$tries" -le "0" ]; then
echo >&2 'ClickHouse init process failed.'
exit 1

View File

@@ -284,13 +284,21 @@ function run_tests
# Use awk because bash doesn't support floating point arithmetic.
profile_seconds=$(awk "BEGIN { print ($profile_seconds_left > 0 ? 10 : 0) }")
if [ "$(grep -c $(basename $test) changed-test-definitions.txt)" -gt 0 ]
then
# Run all queries from changed test files to ensure that all new queries will be tested.
max_queries=0
else
max_queries=$CHPC_MAX_QUERIES
fi
(
set +x
argv=(
--host localhost localhost
--port "$LEFT_SERVER_PORT" "$RIGHT_SERVER_PORT"
--runs "$CHPC_RUNS"
--max-queries "$CHPC_MAX_QUERIES"
--max-queries "$max_queries"
--profile-seconds "$profile_seconds"
"$test"

View File

@@ -387,6 +387,7 @@ else
-e "TABLE_IS_READ_ONLY" \
-e "Code: 1000, e.code() = 111, Connection refused" \
-e "UNFINISHED" \
-e "NETLINK_ERROR" \
-e "Renaming unexpected part" \
-e "PART_IS_TEMPORARILY_LOCKED" \
-e "and a merge is impossible: we didn't find" \

View File

@@ -0,0 +1,374 @@
---
sidebar_position: 1
sidebar_label: 2022
---
# 2022 Changelog
### ClickHouse release v22.8.1.2097-lts (09a2ff88435) FIXME as compared to v22.7.1.2484-stable (f4f05ec786a)
#### Backward Incompatible Change
* Make cache composable, allow not evicting certain files (regarding idx, mrk, ..), delete the old cache version. Now it is possible to configure cache over Azure blob storage disk, over Local disk, over StaticWeb disk, etc. This PR is marked backward incompatible because the cache configuration changes and the config file needs to be updated for the cache to work. The old cache will still be used with the new configuration. The server will start up fine with the old cache configuration. Closes [#36140](https://github.com/ClickHouse/ClickHouse/issues/36140). Closes [#37889](https://github.com/ClickHouse/ClickHouse/issues/37889). [#36171](https://github.com/ClickHouse/ClickHouse/pull/36171) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Now, all relevant dictionary sources respect `remote_url_allow_hosts` setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. [#39184](https://github.com/ClickHouse/ClickHouse/pull/39184) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Extended range of Date32 and DateTime64 to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601):2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of `1899-01-01` was clamped to `1925-01-01`, in the new version it will be clamped to `1900-01-01`. It changes the behavior of rounding with `toStartOfInterval` if you pass `INTERVAL 3 QUARTER` up to one quarter because the intervals are counted from an implementation-specific point of time. Closes [#28216](https://github.com/ClickHouse/ClickHouse/issues/28216), improves [#38393](https://github.com/ClickHouse/ClickHouse/issues/38393). [#39425](https://github.com/ClickHouse/ClickHouse/pull/39425) ([Roman Vasin](https://github.com/rvasin)).
#### New Feature
* Added a setting `exact_rows_before_limit` (0/1). When enabled, ClickHouse will provide exact value for `rows_before_limit_at_least` statistic, but with the cost that the data before limit will have to be read completely. This closes [#6613](https://github.com/ClickHouse/ClickHouse/issues/6613). [#25333](https://github.com/ClickHouse/ClickHouse/pull/25333) ([kevin wan](https://github.com/MaxWk)).
* Add SLRU cache policy for uncompressed cache and marks cache. [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)).
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#36654](https://github.com/ClickHouse/ClickHouse/pull/36654) ([jasperzhu](https://github.com/jinjunzh)).
* Add concurrent_threads_soft_limit parameter to increase performance in case of high RPS by means of limiting total number of threads for all queries. [#37285](https://github.com/ClickHouse/ClickHouse/pull/37285) ([Roman Vasin](https://github.com/rvasin)).
* Added concurrency control logic to limit total number of concurrent threads created by queries. [#37558](https://github.com/ClickHouse/ClickHouse/pull/37558) ([Sergei Trifonov](https://github.com/serxa)).
* Added support for parallel distributed insert select into tables with Distributed and Replicated engine [#34670](https://github.com/ClickHouse/ClickHouse/issues/34670). [#39107](https://github.com/ClickHouse/ClickHouse/pull/39107) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add new settings to control schema inference from text formats: - `input_format_try_infer_dates` - try infer dates from strings. - `input_format_try_infer_datetimes` - try infer datetimes from strings. - `input_format_try_infer_integers` - try infer `Int64` instead of `Float64`. - `input_format_json_try_infer_numbers_from_strings` - try infer numbers from json strings in JSON formats. [#39186](https://github.com/ClickHouse/ClickHouse/pull/39186) ([Kruglov Pavel](https://github.com/Avogar)).
* This feature will provide JSON formatted log output in console. The purpose is to allow easier ingestion and query in log analysis tools. [#39277](https://github.com/ClickHouse/ClickHouse/pull/39277) ([Mallik Hassan](https://github.com/SadiHassan)).
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#39494](https://github.com/ClickHouse/ClickHouse/pull/39494) ([Robert Schulze](https://github.com/rschu1ze)).
* Add function `nowInBlock` which allows getting the current time during long-running and continuous queries. Closes [#39522](https://github.com/ClickHouse/ClickHouse/issues/39522). Notes: there are no functions `now64InBlock` neither `todayInBlock`. [#39533](https://github.com/ClickHouse/ClickHouse/pull/39533) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add result_rows and result_bytes to progress reports (`X-ClickHouse-Summary`). [#39567](https://github.com/ClickHouse/ClickHouse/pull/39567) ([Raúl Marín](https://github.com/Algunenano)).
* Adds ability to specify settings for an `executable()` table function. [#39681](https://github.com/ClickHouse/ClickHouse/pull/39681) ([Constantine Peresypkin](https://github.com/pkit)).
* Implemented automatic conversion of database engine from `Ordinary` to `Atomic`. Create empty `convert_ordinary_to_atomic` file in `flags` directory and all `Ordinary` databases will be converted automatically on next server start. Resolves [#39546](https://github.com/ClickHouse/ClickHouse/issues/39546). [#39933](https://github.com/ClickHouse/ClickHouse/pull/39933) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add new setting `schema_inference_hints` that allows to specify structure hints in schema inference for specific columns. Closes [#39569](https://github.com/ClickHouse/ClickHouse/issues/39569). [#40068](https://github.com/ClickHouse/ClickHouse/pull/40068) ([Kruglov Pavel](https://github.com/Avogar)).
#### Performance Improvement
* Break on analyze stuck on complex query. [#38185](https://github.com/ClickHouse/ClickHouse/pull/38185) ([Vladimir C](https://github.com/vdimir)).
* Deduce way to sort based on input stream sort description. Skip sorting if input stream is already sorted. [#38719](https://github.com/ClickHouse/ClickHouse/pull/38719) ([Igor Nikonov](https://github.com/devcrafter)).
* `DISTINCT` in order with `ORDER BY` improves memory usage (significantly) and query execution time if `DISTINCT` columns match (or form a prefix of) `ORDER BY` columns. [#39432](https://github.com/ClickHouse/ClickHouse/pull/39432) ([Igor Nikonov](https://github.com/devcrafter)).
* Use local node as first priority to get structure of remote table when executing `cluster` and similar table functions. [#39440](https://github.com/ClickHouse/ClickHouse/pull/39440) ([Mingliang Pan](https://github.com/liangliangpan)).
* Use `DistinctSortedTransform` only when sort description is applicable to DISTINCT columns, otherwise fall back to ordinary DISTINCT implementation. It allows making less checks during `DistinctSortedTransform` execution. [#39528](https://github.com/ClickHouse/ClickHouse/pull/39528) ([Igor Nikonov](https://github.com/devcrafter)).
* `DistinctSortedTransform` didn't take advantage of sorting, i.e. it worked like ordinary `DISTINCT` implementation. The fix reduces memory usage significantly. [#39538](https://github.com/ClickHouse/ClickHouse/pull/39538) ([Igor Nikonov](https://github.com/devcrafter)).
* ColumnVector: optimize filter with AVX512VBMI2 compress store. [#39633](https://github.com/ClickHouse/ClickHouse/pull/39633) ([Guo Wangyang](https://github.com/guowangy)).
* KeyCondition: optimize `applyFunction` in multi-threaded scenarios. [#39812](https://github.com/ClickHouse/ClickHouse/pull/39812) ([Guo Wangyang](https://github.com/guowangy)).
* For systems with AVX512 VBMI2, this PR improves performance by ca. 6% for SSB benchmark queries 3.1, 3.2 and 3.3 (SF=100). Tested on a 2-socket Intel Icelake Xeon 8380. [#40033](https://github.com/ClickHouse/ClickHouse/pull/40033) ([Robert Schulze](https://github.com/rschu1ze)).
* Don't visit the AST for UDFs if none are registered. [#40069](https://github.com/ClickHouse/ClickHouse/pull/40069) ([Raúl Marín](https://github.com/Algunenano)).
* Optimize `CurrentMemoryTracker` alloc and free. [#40078](https://github.com/ClickHouse/ClickHouse/pull/40078) ([Raúl Marín](https://github.com/Algunenano)).
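
A minimal sketch of the `DISTINCT` in order case mentioned above. The table, its data, and the assumption that the optimization is gated behind the `optimize_distinct_in_order` setting are mine, not stated in the entry:

```sql
-- Data is stored sorted by (a, b); the DISTINCT columns form a prefix of the
-- ORDER BY columns, so the sorted-stream DISTINCT path can be used.
CREATE TABLE distinct_demo (a UInt32, b UInt32, c UInt32)
ENGINE = MergeTree ORDER BY (a, b);

INSERT INTO distinct_demo
SELECT number % 10, number % 100, number FROM numbers(1000000);

SELECT DISTINCT a, b
FROM distinct_demo
ORDER BY a, b
SETTINGS optimize_distinct_in_order = 1;
```
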
#### Improvement
* Change the way the primary key is analyzed for MergeTree. [#25563](https://github.com/ClickHouse/ClickHouse/pull/25563) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Improved the structure of the DDL query result table for the `Replicated` database (separate columns with shard and replica name, clearer status). `CREATE TABLE ... ON CLUSTER` queries can be normalized on the initiator first if `distributed_ddl_entry_format_version` is set to 3 (the default value); it means that `ON CLUSTER` queries may not work if the initiator does not belong to the cluster specified in the query. Fixes [#37318](https://github.com/ClickHouse/ClickHouse/issues/37318), [#39500](https://github.com/ClickHouse/ClickHouse/issues/39500). Ignore the `ON CLUSTER` clause if the database is `Replicated` and the cluster name equals the database name; related to [#35570](https://github.com/ClickHouse/ClickHouse/issues/35570). Miscellaneous minor fixes for the `Replicated` database engine. Check metadata consistency when starting up a `Replicated` database and start replica recovery in case of a mismatch between local metadata and metadata in Keeper. Resolves [#24880](https://github.com/ClickHouse/ClickHouse/issues/24880). [#37198](https://github.com/ClickHouse/ClickHouse/pull/37198) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Support the SQL-standard `DELETE FROM` syntax on MergeTree tables, with a lightweight delete implementation for the MergeTree family (see the sketch after this list). [#37893](https://github.com/ClickHouse/ClickHouse/pull/37893) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* `timeSlots` now works with `DateTime64`; sub-second durations and slot sizes are available when working with `DateTime64`. [#37951](https://github.com/ClickHouse/ClickHouse/pull/37951) ([Andrey Zvonov](https://github.com/zvonand)).
* Add a cache for schema inference for the `file`/`s3`/`hdfs`/`url` table functions. Now schema inference is performed only on the first query to the file; all subsequent queries to the same file use the schema from the cache if the data hasn't changed. Add system table `system.schema_inference_cache` with all current schemas in the cache and system queries `SYSTEM DROP SCHEMA CACHE [FOR FILE/S3/HDFS/URL]` to drop schemas from the cache. [#38286](https://github.com/ClickHouse/ClickHouse/pull/38286) ([Kruglov Pavel](https://github.com/Avogar)).
* Simplified the function registration macro interface (`FUNCTION_REGISTER*`) to eliminate the step of adding and calling an extern function in registerFunctions.cpp; it also makes incremental builds of a new function faster. [#38615](https://github.com/ClickHouse/ClickHouse/pull/38615) ([Li Yin](https://github.com/liyinsg)).
* Added support for `LEFT SEMI` and `LEFT ANTI` direct joins with RocksDB. [#38956](https://github.com/ClickHouse/ClickHouse/pull/38956) ([Vladimir C](https://github.com/vdimir)).
* Resolves [#37490](https://github.com/ClickHouse/ClickHouse/issues/37490). [#39054](https://github.com/ClickHouse/ClickHouse/pull/39054) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Store Keeper API version inside a predefined path. [#39096](https://github.com/ClickHouse/ClickHouse/pull/39096) ([Antonio Andelic](https://github.com/antonio2368)).
* Now `entrypoint.sh` in the Docker image creates and executes `chown` for all folders it finds in the config for a multi-disk setup [#17717](https://github.com/ClickHouse/ClickHouse/issues/17717). [#39121](https://github.com/ClickHouse/ClickHouse/pull/39121) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add profile events for fsync. [#39179](https://github.com/ClickHouse/ClickHouse/pull/39179) ([Azat Khuzhin](https://github.com/azat)).
* Add a second argument to the ordinary function `file(path[, default])`, which is returned when the file does not exist (see the sketch after this list). [#39218](https://github.com/ClickHouse/ClickHouse/pull/39218) ([Nikolay Degterinsky](https://github.com/evillique)).
* Some small fixes for reading via HTTP: allow retrying a partial-content read when a 200 OK response was received. [#39244](https://github.com/ClickHouse/ClickHouse/pull/39244) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improved Base58 encoding/decoding. [#39292](https://github.com/ClickHouse/ClickHouse/pull/39292) ([Andrey Zvonov](https://github.com/zvonand)).
* Normalize `AggregateFunction` types and state representations, because optimizations like https://github.com/ClickHouse/ClickHouse/pull/35788 will treat `count(not null columns)` as `count()`, which might confuse distributed interpreters with the following error: `Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported`. [#39420](https://github.com/ClickHouse/ClickHouse/pull/39420) ([Amos Bird](https://github.com/amosbird)).
* Improved memory usage during memory efficient merging of aggregation results. [#39429](https://github.com/ClickHouse/ClickHouse/pull/39429) ([Nikita Taranov](https://github.com/nickitat)).
* Support queries `CREATE TEMPORARY TABLE ... (<list of columns>) AS ...`. [#39462](https://github.com/ClickHouse/ClickHouse/pull/39462) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support of `!`/`*` (exclamation/asterisk) in custom TLDs (`cutToFirstSignificantSubdomainCustom()`/`cutToFirstSignificantSubdomainCustomWithWWW()`/`firstSignificantSubdomainCustom()`). [#39496](https://github.com/ClickHouse/ClickHouse/pull/39496) ([Azat Khuzhin](https://github.com/azat)).
* Rework and simplify the `system.backups` table, remove the `internal` column, allow user to set ID of operation, add columns `num_files`, `uncompressed_size`, `compressed_size`, `start_time`, `end_time`. [#39503](https://github.com/ClickHouse/ClickHouse/pull/39503) ([Vitaly Baranov](https://github.com/vitlibar)).
* Refactored some code and removed duplicate code. [#39509](https://github.com/ClickHouse/ClickHouse/pull/39509) ([Simon Liu](https://github.com/monadbobo)).
* Add support for TLS connections to NATS. Implements [#39525](https://github.com/ClickHouse/ClickHouse/issues/39525). [#39527](https://github.com/ClickHouse/ClickHouse/pull/39527) ([Constantine Peresypkin](https://github.com/pkit)).
* `clickhouse-obfuscator` (a tool for database obfuscation for testing and load generation) now has the new `--save` and `--load` parameters to work with pre-trained models. This closes [#39534](https://github.com/ClickHouse/ClickHouse/issues/39534). [#39541](https://github.com/ClickHouse/ClickHouse/pull/39541) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix incorrect behavior of log rotation during restart. [#39558](https://github.com/ClickHouse/ClickHouse/pull/39558) ([Nikolay Degterinsky](https://github.com/evillique)).
* Improve bytes to bits mask transform for SSE/AVX/AVX512. [#39586](https://github.com/ClickHouse/ClickHouse/pull/39586) ([Guo Wangyang](https://github.com/guowangy)).
* Add formats `PrettyMonoBlock`, `PrettyNoEscapesMonoBlock`, `PrettyCompactNoEscapes`, `PrettyCompactNoEscapesMonoBlock`, `PrettySpaceNoEscapes`, `PrettySpaceMonoBlock`, `PrettySpaceNoEscapesMonoBlock`. [#39646](https://github.com/ClickHouse/ClickHouse/pull/39646) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix building aggregate projections when external aggregation is on. Marked as an improvement because the case is rare and there is an easy workaround via changing settings. This fixes [#39667](https://github.com/ClickHouse/ClickHouse/issues/39667). [#39671](https://github.com/ClickHouse/ClickHouse/pull/39671) ([Amos Bird](https://github.com/amosbird)).
* Allow executing hash functions with arguments of type `Map`. [#39685](https://github.com/ClickHouse/ClickHouse/pull/39685) ([Anton Popov](https://github.com/CurtizJ)).
* Add a configuration parameter to hide addresses in stack traces. It may improve security a little but generally, it is harmful and should not be used. [#39690](https://github.com/ClickHouse/ClickHouse/pull/39690) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change the prefix size of `AggregateFunctionDistinct` to make sure the nested function data is memory-aligned. [#39696](https://github.com/ClickHouse/ClickHouse/pull/39696) ([Pxl](https://github.com/BiteTheDDDDt)).
* Properly escape credentials passed to the `clickhouse-diagnostic` tool. [#39707](https://github.com/ClickHouse/ClickHouse/pull/39707) ([Dale McDiarmid](https://github.com/gingerwizard)).
* Keeper improvement: create a snapshot on exit. It can be controlled with the config `keeper_server.create_snapshot_on_exit`, `true` by default. [#39755](https://github.com/ClickHouse/ClickHouse/pull/39755) ([Antonio Andelic](https://github.com/antonio2368)).
* Support primary key analysis for `row_policy_filter` and `additional_filter`. It also helps fix issues like [#37454](https://github.com/ClickHouse/ClickHouse/issues/37454). [#39826](https://github.com/ClickHouse/ClickHouse/pull/39826) ([Amos Bird](https://github.com/amosbird)).
* Parameters are now transferred in `Query` packets right after the query text in the same serialisation format as the settings. [#39906](https://github.com/ClickHouse/ClickHouse/pull/39906) ([Nikita Taranov](https://github.com/nickitat)).
* Fix two usability issues in Play UI: - it was non-pixel-perfect on iPad due to parasitic border radius and margins; - the progress indication did not display after the first query. This closes [#39957](https://github.com/ClickHouse/ClickHouse/issues/39957). This closes [#39960](https://github.com/ClickHouse/ClickHouse/issues/39960). [#39961](https://github.com/ClickHouse/ClickHouse/pull/39961) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Play UI: add row numbers; add cell selection on click; add hysteresis for table cells. [#39962](https://github.com/ClickHouse/ClickHouse/pull/39962) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The client will show server-side elapsed time. This is important for the performance comparison of ClickHouse services in remote datacenters. This closes [#38070](https://github.com/ClickHouse/ClickHouse/issues/38070). See also [this](https://github.com/ClickHouse/ClickBench/blob/main/hardware/benchmark-cloud.sh#L37) for motivation. [#39968](https://github.com/ClickHouse/ClickHouse/pull/39968) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Adds `parseDateTime64BestEffortUS`, `parseDateTime64BestEffortUSOrNull`, `parseDateTime64BestEffortUSOrZero` functions, closing [#37492](https://github.com/ClickHouse/ClickHouse/issues/37492). [#40015](https://github.com/ClickHouse/ClickHouse/pull/40015) ([Tanya Bragin](https://github.com/tbragin)).
* Add observer mode to the (Zoo)Keeper cluster discovery feature. In this mode, the node itself doesn't belong to the cluster. [#40035](https://github.com/ClickHouse/ClickHouse/pull/40035) ([Vladimir C](https://github.com/vdimir)).
* Play UI: recognize tab key in textarea, but at the same time don't mess up with tab navigation. [#40053](https://github.com/ClickHouse/ClickHouse/pull/40053) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Extend `processors_profile_log` with more information such as input rows. [#40121](https://github.com/ClickHouse/ClickHouse/pull/40121) ([Amos Bird](https://github.com/amosbird)).
* Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently, after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). [#40184](https://github.com/ClickHouse/ClickHouse/pull/40184) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Display server-side time in `clickhouse-benchmark` by default if it is available (since ClickHouse version 22.8). This is needed to correctly compare the performance of clouds. This behavior can be changed with the new `--client-side-time` command line option. Change the `--randomize` command line option from `--randomize 1` to the form without argument. [#40193](https://github.com/ClickHouse/ClickHouse/pull/40193) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add counters (ProfileEvents) for cases when a query complexity limitation has been set and has been reached (a separate counter for `overflow_mode` = `break` and `throw`). For example, if you have set up `max_rows_to_read` with `read_overflow_mode = 'break'`, looking at the value of the `OverflowBreak` counter will allow distinguishing incomplete results. [#40205](https://github.com/ClickHouse/ClickHouse/pull/40205) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix memory accounting in case of MEMORY_LIMIT_EXCEEDED errors (previously [peak] memory usage took failed allocations into account). [#40249](https://github.com/ClickHouse/ClickHouse/pull/40249) ([Azat Khuzhin](https://github.com/azat)).
* Add current metrics for fs cache: `FilesystemCacheSize` and `FilesystemCacheElements`. [#40260](https://github.com/ClickHouse/ClickHouse/pull/40260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes [#32401](https://github.com/ClickHouse/ClickHouse/issues/32401)). [#40293](https://github.com/ClickHouse/ClickHouse/pull/40293) ([Josh Taylor](https://github.com/joshuataylor)).
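
A minimal sketch of the lightweight `DELETE FROM` support and the new default argument of `file()` described above. Table and file names are hypothetical, and the assumption that lightweight deletes still need an experimental setting enabled in this release is mine:

```sql
-- Lightweight delete on a MergeTree table using the SQL-standard syntax.
CREATE TABLE visits (user_id UInt64, url String)
ENGINE = MergeTree ORDER BY user_id;

INSERT INTO visits VALUES (1, '/a'), (2, '/b'), (3, '/c');

SET allow_experimental_lightweight_delete = 1;  -- assumed gating setting
DELETE FROM visits WHERE user_id = 2;

-- file(path[, default]): the second argument is returned if the file is missing.
SELECT file('maybe_missing.txt', 'file not found') AS contents;
```
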
#### Bug Fix
* Support Hadoop secure RPC transfer (`hadoop.rpc.protection=privacy` and `hadoop.rpc.protection=integrity`). [#39411](https://github.com/ClickHouse/ClickHouse/pull/39411) ([michael1589](https://github.com/michael1589)).
* Fix seeking while reading from encrypted disk. This PR fixes [#38381](https://github.com/ClickHouse/ClickHouse/issues/38381). [#39687](https://github.com/ClickHouse/ClickHouse/pull/39687) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix duplicate columns in join plan. Finally, solve [#26809](https://github.com/ClickHouse/ClickHouse/issues/26809). [#40009](https://github.com/ClickHouse/ClickHouse/pull/40009) ([Vladimir C](https://github.com/vdimir)).
#### Build/Testing/Packaging Improvement
* Prebuilt ClickHouse x86 binaries now require support for AVX instructions, i.e. a CPU not older than Intel Sandy Bridge / AMD Bulldozer, both released in 2011. [#39000](https://github.com/ClickHouse/ClickHouse/pull/39000) ([Robert Schulze](https://github.com/rschu1ze)).
* Former packages used to install the systemd.service file to `/etc`. The files there are marked as `conf` and are neither cleaned out nor updated automatically. This PR cleans them out. [#39323](https://github.com/ClickHouse/ClickHouse/pull/39323) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix LSan by fixing getauxval(). [#39430](https://github.com/ClickHouse/ClickHouse/pull/39430) ([Azat Khuzhin](https://github.com/azat)).
* TSAN has issues with clang-14 (https://github.com/google/sanitizers/issues/1552, https://github.com/google/sanitizers/issues/1540), so here we temporarily build the TSAN binaries with clang-13. [#39450](https://github.com/ClickHouse/ClickHouse/pull/39450) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove the option to build ClickHouse tools as separate executable programs. This fixes [#37847](https://github.com/ClickHouse/ClickHouse/issues/37847). [#39520](https://github.com/ClickHouse/ClickHouse/pull/39520) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed Unit tests for wide integers on s390x. [#39627](https://github.com/ClickHouse/ClickHouse/pull/39627) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Increase max cache size for clang-tidy builds. Try to avoid flushing it out between builds. [#39652](https://github.com/ClickHouse/ClickHouse/pull/39652) ([Nikita Taranov](https://github.com/nickitat)).
* No need to use a fixed IP when using a cluster with SSL; using the same fixed IP could trigger collisions between tests. With this change, the server's certificate is generated for a designated host name (see server-ext.cnf in each test), and the client checks the server's certificate against that name accordingly. [#40007](https://github.com/ClickHouse/ClickHouse/pull/40007) ([Sema Checherinda](https://github.com/CheSema)).
* Support build with `clang-16` (trunk). This closes [#39949](https://github.com/ClickHouse/ClickHouse/issues/39949). [#40181](https://github.com/ClickHouse/ClickHouse/pull/40181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Prepare RISC-V 64 build to run in CI. This is for [#40141](https://github.com/ClickHouse/ClickHouse/issues/40141). [#40197](https://github.com/ClickHouse/ClickHouse/pull/40197) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Fixed query hanging for SELECT with ORDER BY WITH FILL with different date/time types. [#37849](https://github.com/ClickHouse/ClickHouse/pull/37849) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix ORDER BY that matches the projection's ORDER BY (before, it simply returned an unsorted result). [#38725](https://github.com/ClickHouse/ClickHouse/pull/38725) ([Azat Khuzhin](https://github.com/azat)).
* Do not optimise functions in GROUP BY statements if they shadow one of the table columns or expressions. Fixes [#37032](https://github.com/ClickHouse/ClickHouse/issues/37032). [#39103](https://github.com/ClickHouse/ClickHouse/pull/39103) ([Anton Kozlov](https://github.com/tonickkozlov)).
* Fix wrong table name in logs after RENAME TABLE. This fixes [#38018](https://github.com/ClickHouse/ClickHouse/issues/38018). [#39227](https://github.com/ClickHouse/ClickHouse/pull/39227) ([Amos Bird](https://github.com/amosbird)).
* Fix positional arguments in case of columns pruning when optimising the query. Closes [#38433](https://github.com/ClickHouse/ClickHouse/issues/38433). [#39293](https://github.com/ClickHouse/ClickHouse/pull/39293) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a bug in schema inference in case of empty messages in Protobuf/CapnProto formats that allowed creating a column with an empty `Tuple` type. Closes [#39051](https://github.com/ClickHouse/ClickHouse/issues/39051). Add 2 new settings `input_format_{protobuf/capnproto}_skip_fields_with_unsupported_types_in_schema_inference` that allow skipping fields with unsupported types during schema inference for Protobuf and CapnProto formats. [#39357](https://github.com/ClickHouse/ClickHouse/pull/39357) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix segmentation fault on `CREATE WINDOW VIEW .. ON CLUSTER ... INNER`. Closes [#39363](https://github.com/ClickHouse/ClickHouse/issues/39363). [#39384](https://github.com/ClickHouse/ClickHouse/pull/39384) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix `WriteBuffer` finalize when an insert into a function is cancelled. Proper version of https://github.com/ClickHouse/ClickHouse/pull/39396, which was reverted. [#39458](https://github.com/ClickHouse/ClickHouse/pull/39458) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix storing of columns of type `Object` in sparse serialization. [#39464](https://github.com/ClickHouse/ClickHouse/pull/39464) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible "Not found column in block" exception when using projections. This closes [#39469](https://github.com/ClickHouse/ClickHouse/issues/39469). [#39470](https://github.com/ClickHouse/ClickHouse/pull/39470) ([小路](https://github.com/nicelulu)).
* Fix LOGICAL_ERROR on race between DROP and INSERT with materialized views. [#39477](https://github.com/ClickHouse/ClickHouse/pull/39477) ([Azat Khuzhin](https://github.com/azat)).
* Fix data race and possible heap-buffer-overflow in Avro format. Closes [#39094](https://github.com/ClickHouse/ClickHouse/issues/39094). Closes [#33652](https://github.com/ClickHouse/ClickHouse/issues/33652). [#39498](https://github.com/ClickHouse/ClickHouse/pull/39498) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare bug in asynchronous reading (with setting `local_filesystem_read_method='pread_threadpool'`) with enabled `O_DIRECT` (enabled by setting `min_bytes_to_use_direct_io`). [#39506](https://github.com/ClickHouse/ClickHouse/pull/39506) ([Anton Popov](https://github.com/CurtizJ)).
* Fixes "Code: 49. DB::Exception: FunctionFactory: the function name '' is not unique. (LOGICAL_ERROR)" observed on FreeBSD when starting clickhouse. [#39551](https://github.com/ClickHouse/ClickHouse/pull/39551) ([Alexander Gololobov](https://github.com/davenger)).
* Fix a bug with the `maxsplit` argument of `splitByChar`, which was not working correctly. [#39552](https://github.com/ClickHouse/ClickHouse/pull/39552) ([filimonov](https://github.com/filimonov)).
* Fix a bug in ASOF JOIN with `enable_optimize_predicate_expression`, closes [#37813](https://github.com/ClickHouse/ClickHouse/issues/37813). [#39556](https://github.com/ClickHouse/ClickHouse/pull/39556) ([Vladimir C](https://github.com/vdimir)).
* Fixed `CREATE/DROP INDEX` query with `ON CLUSTER` or `Replicated` database and `ReplicatedMergeTree`. It used to be executed on all replicas (causing error or DDL queue stuck). Fixes [#39511](https://github.com/ClickHouse/ClickHouse/issues/39511). [#39565](https://github.com/ClickHouse/ClickHouse/pull/39565) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix "column not found" error for push down with join, close [#39505](https://github.com/ClickHouse/ClickHouse/issues/39505). [#39575](https://github.com/ClickHouse/ClickHouse/pull/39575) ([Vladimir C](https://github.com/vdimir)).
* Fix the wrong `REGEXP_REPLACE` alias. This fixes https://github.com/ClickHouse/ClickBench/issues/9. [#39592](https://github.com/ClickHouse/ClickHouse/pull/39592) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed the point of origin for exponential decay window functions to be the last value in the window. Previously, decay was calculated by the formula `exp((t - curr_row_t) / decay_length)`, which is incorrect when the right boundary of the window is not `CURRENT ROW`. It was changed to `exp((t - last_row_t) / decay_length)` (see the formulas after this list). There is no change in results for windows with `ROWS BETWEEN (smth) AND CURRENT ROW`. [#39593](https://github.com/ClickHouse/ClickHouse/pull/39593) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* Fix Decimal division overflow, which can be detected based on the operands' scale. [#39600](https://github.com/ClickHouse/ClickHouse/pull/39600) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix settings `output_format_arrow_string_as_string` and `output_format_arrow_low_cardinality_as_dictionary` so that they work in combination. Closes [#39624](https://github.com/ClickHouse/ClickHouse/issues/39624). [#39647](https://github.com/ClickHouse/ClickHouse/pull/39647) ([Kruglov Pavel](https://github.com/Avogar)).
* Fixed a bug in default database resolution in distributed table reads. [#39674](https://github.com/ClickHouse/ClickHouse/pull/39674) ([Anton Kozlov](https://github.com/tonickkozlov)).
* A SELECT might read data of a dropped table if the cache for mmap IO is used, the database engine is Ordinary, and a new table was created with the same name as the dropped one had. It's fixed. [#39708](https://github.com/ClickHouse/ClickHouse/pull/39708) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix possible error `Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got ColumnLowCardinality`. Fixes [#38460](https://github.com/ClickHouse/ClickHouse/issues/38460). [#39716](https://github.com/ClickHouse/ClickHouse/pull/39716) ([Arthur Passos](https://github.com/arthurpassos)).
* Field names in the `meta` section of JSON format were erroneously double escaped. This closes [#39693](https://github.com/ClickHouse/ClickHouse/issues/39693). [#39747](https://github.com/ClickHouse/ClickHouse/pull/39747) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix wrong index analysis with tuples and operator `IN`, which could lead to wrong query result. [#39752](https://github.com/ClickHouse/ClickHouse/pull/39752) ([Anton Popov](https://github.com/CurtizJ)).
* Fix EmbeddedRocksDB filtering by key using params. [#39757](https://github.com/ClickHouse/ClickHouse/pull/39757) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix error `Invalid number of columns in chunk pushed to OutputPort`, which was caused by the ARRAY JOIN optimization. Fixes [#39164](https://github.com/ClickHouse/ClickHouse/issues/39164). [#39799](https://github.com/ClickHouse/ClickHouse/pull/39799) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `CANNOT_READ_ALL_DATA` exception with `local_filesystem_read_method=pread_threadpool`. This bug affected only Linux kernel version 5.9 and 5.10 according to [man](https://manpages.debian.org/testing/manpages-dev/preadv2.2.en.html#BUGS). [#39800](https://github.com/ClickHouse/ClickHouse/pull/39800) ([Anton Popov](https://github.com/CurtizJ)).
* Fix quota_key application on connect. [#39874](https://github.com/ClickHouse/ClickHouse/pull/39874) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix query exceptions like `DB::Exception: Cannot open file /media/ssd1/fordata/clickhouse/data/data/perf/perf_log_local_v3_1/20220618_17233_17238_1/namespace.dict.bin, errno: 24, strerror: Too many open files`. [#39886](https://github.com/ClickHouse/ClickHouse/pull/39886) ([Fangyuan Deng](https://github.com/pzhdfy)).
* Fix broken NFS `mkdir` for root-squashed volumes. [#39898](https://github.com/ClickHouse/ClickHouse/pull/39898) ([Constantine Peresypkin](https://github.com/pkit)).
* Remove dictionaries from prometheus metrics on DETACH/DROP. [#39926](https://github.com/ClickHouse/ClickHouse/pull/39926) ([Azat Khuzhin](https://github.com/azat)).
* Fix read of StorageFile with virtual columns. Closes [#39907](https://github.com/ClickHouse/ClickHouse/issues/39907). [#39943](https://github.com/ClickHouse/ClickHouse/pull/39943) ([flynn](https://github.com/ucasfl)).
* Fix big memory usage during fetches. Fixes [#39915](https://github.com/ClickHouse/ClickHouse/issues/39915). [#39990](https://github.com/ClickHouse/ClickHouse/pull/39990) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `hashId` crash and the salt parameter not being used. [#40002](https://github.com/ClickHouse/ClickHouse/pull/40002) ([Raúl Marín](https://github.com/Algunenano)).
* Fix `HashMethodOneNumber` getting a wrong key value when the column is const. [#40020](https://github.com/ClickHouse/ClickHouse/pull/40020) ([Duc Canh Le](https://github.com/canhld94)).
* Fixed "Part directory doesn't exist" and "`tmp_<part_name>` ... No such file or directory" errors during too slow INSERT or too long merge/mutation. Also fixed an issue that may cause some replication queue entries to get stuck without any errors or warnings in logs if a previous attempt to fetch a part failed, but the `tmp-fetch_<part_name>` directory was not cleaned up. [#40031](https://github.com/ClickHouse/ClickHouse/pull/40031) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix rare cases of parsing of arrays of tuples in format `Values`. [#40034](https://github.com/ClickHouse/ClickHouse/pull/40034) ([Anton Popov](https://github.com/CurtizJ)).
* Fixes ArrowColumn format `Dictionary(X)` & `Dictionary(Nullable(X))` conversion to ClickHouse `LowCardinality(X)` & `LowCardinality(Nullable(X))` respectively. [#40037](https://github.com/ClickHouse/ClickHouse/pull/40037) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix potential deadlock in WriteBufferFromS3 during task scheduling failure. [#40070](https://github.com/ClickHouse/ClickHouse/pull/40070) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix a bug in `collectFilesToSkip()` by adding the correct file extension (`.idx` or `.idx2`) for indexes to be recalculated, avoiding wrong hard links. Fixed [#39896](https://github.com/ClickHouse/ClickHouse/issues/39896). [#40095](https://github.com/ClickHouse/ClickHouse/pull/40095) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix a reported segmentation fault that had `CaresPTRResolver::resolve` in the stack trace. [#40134](https://github.com/ClickHouse/ClickHouse/pull/40134) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix a very rare case of incorrect behavior of array subscript operator. This closes [#28720](https://github.com/ClickHouse/ClickHouse/issues/28720). [#40185](https://github.com/ClickHouse/ClickHouse/pull/40185) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix insufficient argument check for encryption functions (found by query fuzzer). This closes [#39987](https://github.com/ClickHouse/ClickHouse/issues/39987). [#40194](https://github.com/ClickHouse/ClickHouse/pull/40194) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix unexpected result of `arrayDifference` for `Array(UInt32)`. [#40211](https://github.com/ClickHouse/ClickHouse/pull/40211) ([Duc Canh Le](https://github.com/canhld94)).
* Fix the case when the order of columns can be incorrect if the `IN` operator is used with a table with `ENGINE = Set` containing multiple columns. This fixes [#13014](https://github.com/ClickHouse/ClickHouse/issues/13014). [#40225](https://github.com/ClickHouse/ClickHouse/pull/40225) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix possible segfault in CapnProto input format. This bug was found and sent through the ClickHouse bug-bounty [program](https://github.com/ClickHouse/ClickHouse/issues/38986) by *kiojj*. [#40241](https://github.com/ClickHouse/ClickHouse/pull/40241) ([Kruglov Pavel](https://github.com/Avogar)).
* Avoid continuously growing memory consumption of the pattern cache when using the `multi(Fuzzy)Match(Any|AllIndices|AnyIndex)()` functions. [#40264](https://github.com/ClickHouse/ClickHouse/pull/40264) ([Robert Schulze](https://github.com/rschu1ze)).
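
For the exponential decay fix above, the old and corrected weighting formulas written out explicitly (notation is mine; `t` denotes the timestamp of the row being weighted):

```latex
% Old behaviour (only correct when the window's right boundary is CURRENT ROW):
w_{\mathrm{old}}(t) = \exp\!\left(\frac{t - t_{\mathrm{curr\_row}}}{\mathrm{decay\_length}}\right)

% Fixed behaviour: decay is measured from the last row of the window frame:
w_{\mathrm{new}}(t) = \exp\!\left(\frac{t - t_{\mathrm{last\_row}}}{\mathrm{decay\_length}}\right)
```
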
#### Build
* Fix a macOS build error in `src/Common/waitForPid.cpp`: `error: identifier '__kevp__' is reserved because it starts with '__' [-Werror,-Wreserved-identifier]`, triggered by the `EV_SET` macro from the macOS SDK's `sys/event.h`. [#39493](https://github.com/ClickHouse/ClickHouse/pull/39493) ([小路](https://github.com/nicelulu)).
#### Build Improvement
* Fixed Endian issue in BitHelpers for s390x. [#39656](https://github.com/ClickHouse/ClickHouse/pull/39656) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Implement a piece of code related to SipHash for s390x architecture (which is not supported by ClickHouse). [#39732](https://github.com/ClickHouse/ClickHouse/pull/39732) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fixed an Endian issue in Coordination snapshot code for s390x architecture (which is not supported by ClickHouse). [#39931](https://github.com/ClickHouse/ClickHouse/pull/39931) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fixed Endian issues in Codec code for s390x architecture (which is not supported by ClickHouse). [#40008](https://github.com/ClickHouse/ClickHouse/pull/40008) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fixed Endian issues in reading/writing BigEndian binary data in ReadHelpers and WriteHelpers code for s390x architecture (which is not supported by ClickHouse). [#40179](https://github.com/ClickHouse/ClickHouse/pull/40179) ([Harry Lee](https://github.com/HarryLeeIBM)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "tests: enable back 02232_dist_insert_send_logs_level_hung"'. [#39788](https://github.com/ClickHouse/ClickHouse/pull/39788) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Update arrow to fix possible data race"'. [#39804](https://github.com/ClickHouse/ClickHouse/pull/39804) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Revert "Update arrow to fix possible data race""'. [#39811](https://github.com/ClickHouse/ClickHouse/pull/39811) ([Kruglov Pavel](https://github.com/Avogar)).
* NO CL ENTRY: 'Revert "Limit number of analyze for one query"'. [#39816](https://github.com/ClickHouse/ClickHouse/pull/39816) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Revert "tests: enable back 02232_dist_insert_send_logs_level_hung""'. [#39817](https://github.com/ClickHouse/ClickHouse/pull/39817) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Prepare library-bridge for catboost integration'. [#39904](https://github.com/ClickHouse/ClickHouse/pull/39904) ([Robert Schulze](https://github.com/rschu1ze)).
* NO CL ENTRY: 'Revert "ColumnVector: optimize filter with AVX512VBMI2 compress store"'. [#39963](https://github.com/ClickHouse/ClickHouse/pull/39963) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "copy self-extracting to output"'. [#40005](https://github.com/ClickHouse/ClickHouse/pull/40005) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Use separate counter for RSS in global memory tracker."'. [#40199](https://github.com/ClickHouse/ClickHouse/pull/40199) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "tests/performance: cover sparse_hashed dictionary"'. [#40268](https://github.com/ClickHouse/ClickHouse/pull/40268) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Test/insert deduplication token materialized views [#34662](https://github.com/ClickHouse/ClickHouse/pull/34662) ([Denny Crane](https://github.com/den-crane)).
* Merging [#34372](https://github.com/ClickHouse/ClickHouse/issues/34372) [#35968](https://github.com/ClickHouse/ClickHouse/pull/35968) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)).
* Use separate counter for RSS in global memory tracker. [#38682](https://github.com/ClickHouse/ClickHouse/pull/38682) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Build self-extracting-executable utils [#38936](https://github.com/ClickHouse/ClickHouse/pull/38936) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Improvements in integration tests [#38978](https://github.com/ClickHouse/ClickHouse/pull/38978) ([Ilya Yatsishin](https://github.com/qoega)).
* More readable regexp in `test_quota` [#39084](https://github.com/ClickHouse/ClickHouse/pull/39084) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* Fixed regexp in `test_match_process_uid_against_data_owner` [#39085](https://github.com/ClickHouse/ClickHouse/pull/39085) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* tests: enable back 02232_dist_insert_send_logs_level_hung [#39124](https://github.com/ClickHouse/ClickHouse/pull/39124) ([Azat Khuzhin](https://github.com/azat)).
* Add connection info for Distributed sends log message [#39178](https://github.com/ClickHouse/ClickHouse/pull/39178) ([Azat Khuzhin](https://github.com/azat)).
* Forbid defining non-default disk with default path from <path> [#39183](https://github.com/ClickHouse/ClickHouse/pull/39183) ([Azat Khuzhin](https://github.com/azat)).
* Fix LZ4 decompression issue for s390x [#39195](https://github.com/ClickHouse/ClickHouse/pull/39195) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Do not report "Failed communicating with" on and on for parts exchange [#39222](https://github.com/ClickHouse/ClickHouse/pull/39222) ([Azat Khuzhin](https://github.com/azat)).
* Improve logging around replicated merges [#39230](https://github.com/ClickHouse/ClickHouse/pull/39230) ([Raúl Marín](https://github.com/Algunenano)).
* Cleanup logic around join_algorithm setting, add docs [#39271](https://github.com/ClickHouse/ClickHouse/pull/39271) ([Vladimir C](https://github.com/vdimir)).
* Possible fix for flaky `test_keeper_force_recovery` [#39321](https://github.com/ClickHouse/ClickHouse/pull/39321) ([Antonio Andelic](https://github.com/antonio2368)).
* tests/performance: improve parallel_mv test [#39325](https://github.com/ClickHouse/ClickHouse/pull/39325) ([Azat Khuzhin](https://github.com/azat)).
* Update azure library (removed "harmful" function) [#39327](https://github.com/ClickHouse/ClickHouse/pull/39327) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Refactor PreparedSets/SubqueryForSet [#39343](https://github.com/ClickHouse/ClickHouse/pull/39343) ([Vladimir C](https://github.com/vdimir)).
* Small doc updates [#39362](https://github.com/ClickHouse/ClickHouse/pull/39362) ([Robert Schulze](https://github.com/rschu1ze)).
* Even less usage of StringRef [#39364](https://github.com/ClickHouse/ClickHouse/pull/39364) ([Robert Schulze](https://github.com/rschu1ze)).
* Automatic fixes for black formatting for domestic repo PRs [#39390](https://github.com/ClickHouse/ClickHouse/pull/39390) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Clickhouse-local fixes [#39404](https://github.com/ClickHouse/ClickHouse/pull/39404) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Uppercase `ROWS`, `GROUPS`, `RANGE` in queries with windows [#39410](https://github.com/ClickHouse/ClickHouse/pull/39410) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* GitHub helper [#39421](https://github.com/ClickHouse/ClickHouse/pull/39421) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* ShellCommand wait pid refactoring [#39426](https://github.com/ClickHouse/ClickHouse/pull/39426) ([Maksim Kita](https://github.com/kitaisreal)).
* Require clear style check to continue building [#39428](https://github.com/ClickHouse/ClickHouse/pull/39428) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* DirectDictionary improve performance of dictHas with duplicate keys [#39449](https://github.com/ClickHouse/ClickHouse/pull/39449) ([Maksim Kita](https://github.com/kitaisreal)).
* Commit status names: remove "actions" [#39454](https://github.com/ClickHouse/ClickHouse/pull/39454) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Improve synchronization between hosts in distributed backup and fix locks [#39455](https://github.com/ClickHouse/ClickHouse/pull/39455) ([Vitaly Baranov](https://github.com/vitlibar)).
* Remove some dead and commented code [#39460](https://github.com/ClickHouse/ClickHouse/pull/39460) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add Build Check and Special Build Check to SimpleCheck [#39467](https://github.com/ClickHouse/ClickHouse/pull/39467) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update version after release [#39474](https://github.com/ClickHouse/ClickHouse/pull/39474) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v22.7.1.2484-stable [#39475](https://github.com/ClickHouse/ClickHouse/pull/39475) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Update README.md [#39478](https://github.com/ClickHouse/ClickHouse/pull/39478) ([Dan Roscigno](https://github.com/DanRoscigno)).
* Remove unused constructor [#39491](https://github.com/ClickHouse/ClickHouse/pull/39491) ([alesapin](https://github.com/alesapin)).
* Mark new codec DEFLATE_QPL as experimental + cosmetics [#39495](https://github.com/ClickHouse/ClickHouse/pull/39495) ([Robert Schulze](https://github.com/rschu1ze)).
* Update arrow to fix possible data race [#39510](https://github.com/ClickHouse/ClickHouse/pull/39510) ([Kruglov Pavel](https://github.com/Avogar)).
* fix `-DENABLE_EXAMPLES=1` in master [#39517](https://github.com/ClickHouse/ClickHouse/pull/39517) ([Constantine Peresypkin](https://github.com/pkit)).
* LZ4_decompress_faster.cpp: remove endianness-dependent code [#39523](https://github.com/ClickHouse/ClickHouse/pull/39523) ([Ignat Loskutov](https://github.com/loskutov)).
* Fix 02286_parallel_final [#39524](https://github.com/ClickHouse/ClickHouse/pull/39524) ([Nikita Taranov](https://github.com/nickitat)).
* add Equinix metal N3 Xlarge [#39532](https://github.com/ClickHouse/ClickHouse/pull/39532) ([Tyler Hannan](https://github.com/tylerhannan)).
* Less usage of StringRef [#39535](https://github.com/ClickHouse/ClickHouse/pull/39535) ([Robert Schulze](https://github.com/rschu1ze)).
* Follow up to [#37827](https://github.com/ClickHouse/ClickHouse/issues/37827) [#39557](https://github.com/ClickHouse/ClickHouse/pull/39557) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Temporarily disable all tests with MaterializedPostgreSQL [#39564](https://github.com/ClickHouse/ClickHouse/pull/39564) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update version_date.tsv after v22.3.9.19-lts [#39576](https://github.com/ClickHouse/ClickHouse/pull/39576) ([github-actions[bot]](https://github.com/apps/github-actions)).
* free compression and decompression contexts [#39578](https://github.com/ClickHouse/ClickHouse/pull/39578) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update version_date.tsv and changelogs after v22.6.4.35-stable [#39579](https://github.com/ClickHouse/ClickHouse/pull/39579) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Merge Woboq code browser page into "Getting Started" document [#39596](https://github.com/ClickHouse/ClickHouse/pull/39596) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix Chain::addSink [#39601](https://github.com/ClickHouse/ClickHouse/pull/39601) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update NuRaft to latest master [#39609](https://github.com/ClickHouse/ClickHouse/pull/39609) ([Antonio Andelic](https://github.com/antonio2368)).
* copy self-extracting to output [#39617](https://github.com/ClickHouse/ClickHouse/pull/39617) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Replace MemoryTrackerBlockerInThread to LockMemoryExceptionInThread [#39619](https://github.com/ClickHouse/ClickHouse/pull/39619) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Combining sumIf->countIf and multiIf->if opt. [#39621](https://github.com/ClickHouse/ClickHouse/pull/39621) ([Amos Bird](https://github.com/amosbird)).
* Update README.md [#39622](https://github.com/ClickHouse/ClickHouse/pull/39622) ([Ivan Blinkov](https://github.com/blinkov)).
* Disable 02327_capnproto_protobuf_empty_messages with Ordinary [#39623](https://github.com/ClickHouse/ClickHouse/pull/39623) ([Alexander Tokmakov](https://github.com/tavplubix)).
* add Dell PowerEdge R740XD results [#39625](https://github.com/ClickHouse/ClickHouse/pull/39625) ([Tyler Hannan](https://github.com/tylerhannan)).
* Attempt to fix wrong workflow_run data for rerun [#39630](https://github.com/ClickHouse/ClickHouse/pull/39630) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Run tests with Replicated database in master [#39653](https://github.com/ClickHouse/ClickHouse/pull/39653) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Rollback request in Keeper if storing log fails [#39673](https://github.com/ClickHouse/ClickHouse/pull/39673) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix utils build on CI [#39679](https://github.com/ClickHouse/ClickHouse/pull/39679) ([Azat Khuzhin](https://github.com/azat)).
* Add duration_ms into system.zookeeper_log [#39686](https://github.com/ClickHouse/ClickHouse/pull/39686) ([Azat Khuzhin](https://github.com/azat)).
* Fix DISTINCT: handle all const columns case correctly [#39688](https://github.com/ClickHouse/ClickHouse/pull/39688) ([Igor Nikonov](https://github.com/devcrafter)).
* Update README.md [#39692](https://github.com/ClickHouse/ClickHouse/pull/39692) ([Yuko Takagi](https://github.com/yukotakagi)).
* Update Keeper version for digest [#39698](https://github.com/ClickHouse/ClickHouse/pull/39698) ([Antonio Andelic](https://github.com/antonio2368)).
* Change mysql-odbc url [#39702](https://github.com/ClickHouse/ClickHouse/pull/39702) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Avoid recursive destruction of AST. [#39705](https://github.com/ClickHouse/ClickHouse/pull/39705) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update ccache to the latest available version [#39709](https://github.com/ClickHouse/ClickHouse/pull/39709) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Join enums refactoring [#39718](https://github.com/ClickHouse/ClickHouse/pull/39718) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix flaky test `02360_send_logs_level_colors` [#39720](https://github.com/ClickHouse/ClickHouse/pull/39720) ([Anton Popov](https://github.com/CurtizJ)).
* Fix cherry-pick for cases, when assignee is not set for PR [#39723](https://github.com/ClickHouse/ClickHouse/pull/39723) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Jepsen label [#39730](https://github.com/ClickHouse/ClickHouse/pull/39730) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix redirecting of logs to stdout in clickhouse-client [#39731](https://github.com/ClickHouse/ClickHouse/pull/39731) ([Anton Popov](https://github.com/CurtizJ)).
* CI: refactor Simple Check, use statuses to make it stateful [#39735](https://github.com/ClickHouse/ClickHouse/pull/39735) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Use different root path for total-queue Jepsen test [#39738](https://github.com/ClickHouse/ClickHouse/pull/39738) ([Antonio Andelic](https://github.com/antonio2368)).
* Simple refactoring: ordinary DISTINCT implementation [#39740](https://github.com/ClickHouse/ClickHouse/pull/39740) ([Igor Nikonov](https://github.com/devcrafter)).
* Cleanup usages of `allow_experimental_projection_optimization` setting, part 1 [#39746](https://github.com/ClickHouse/ClickHouse/pull/39746) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable SQL function getOSKernelVersion() on all platforms [#39751](https://github.com/ClickHouse/ClickHouse/pull/39751) ([Robert Schulze](https://github.com/rschu1ze)).
* Try clang-15 for build with tsan [#39758](https://github.com/ClickHouse/ClickHouse/pull/39758) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Rename "splitted build" to "shared libraries build" in CI tools [#39759](https://github.com/ClickHouse/ClickHouse/pull/39759) ([Robert Schulze](https://github.com/rschu1ze)).
* Use std::popcount, ::countl_zero, ::countr_zero functions [#39760](https://github.com/ClickHouse/ClickHouse/pull/39760) ([Robert Schulze](https://github.com/rschu1ze)).
* Self-extracting - run resulting executable with execvp [#39763](https://github.com/ClickHouse/ClickHouse/pull/39763) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix non-deterministic queries in distinct_in_order test [#39772](https://github.com/ClickHouse/ClickHouse/pull/39772) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix some flaky integration tests [#39775](https://github.com/ClickHouse/ClickHouse/pull/39775) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Retry inserts with ClickHouseHelper [#39780](https://github.com/ClickHouse/ClickHouse/pull/39780) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add cloudflare DNS as a fallback [#39795](https://github.com/ClickHouse/ClickHouse/pull/39795) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update README.md [#39796](https://github.com/ClickHouse/ClickHouse/pull/39796) ([Yuko Takagi](https://github.com/yukotakagi)).
* Minor fix for Stress Tests [#39798](https://github.com/ClickHouse/ClickHouse/pull/39798) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Typos [#39813](https://github.com/ClickHouse/ClickHouse/pull/39813) ([Robert Schulze](https://github.com/rschu1ze)).
* Update settings changes history [#39839](https://github.com/ClickHouse/ClickHouse/pull/39839) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix post-build script for building utils/self-extracting-executable/compressor [#39843](https://github.com/ClickHouse/ClickHouse/pull/39843) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add hasJoin method into ASTSelectQuery [#39850](https://github.com/ClickHouse/ClickHouse/pull/39850) ([Maksim Kita](https://github.com/kitaisreal)).
* Update tweak on version part update [#39853](https://github.com/ClickHouse/ClickHouse/pull/39853) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v22.7.2.15-stable [#39854](https://github.com/ClickHouse/ClickHouse/pull/39854) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Fix typo and extra dots in exception messages from OverCommitTracker [#39858](https://github.com/ClickHouse/ClickHouse/pull/39858) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix flaky integration test test_async_backups_to_same_destination. [#39859](https://github.com/ClickHouse/ClickHouse/pull/39859) ([Vitaly Baranov](https://github.com/vitlibar)).
* Better total part size calculation on mutation [#39860](https://github.com/ClickHouse/ClickHouse/pull/39860) ([alesapin](https://github.com/alesapin)).
* typo: PostgerSQL -> PostgreSQL [#39861](https://github.com/ClickHouse/ClickHouse/pull/39861) ([nathanbegbie](https://github.com/nathanbegbie)).
* Remove prefer_localhost_replica from test [#39862](https://github.com/ClickHouse/ClickHouse/pull/39862) ([Igor Nikonov](https://github.com/devcrafter)).
* Block memory tracker in Keeper during commit [#39867](https://github.com/ClickHouse/ClickHouse/pull/39867) ([Antonio Andelic](https://github.com/antonio2368)).
* Update version_date.tsv after v22.3.10.22-lts [#39868](https://github.com/ClickHouse/ClickHouse/pull/39868) ([github-actions[bot]](https://github.com/apps/github-actions)).
* fix incorrect format for functions with settings [#39869](https://github.com/ClickHouse/ClickHouse/pull/39869) ([Constantine Peresypkin](https://github.com/pkit)).
* Get api url from event, not from const/ENV [#39871](https://github.com/ClickHouse/ClickHouse/pull/39871) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Cleanup unused dirs from `store/` on all disks [#39872](https://github.com/ClickHouse/ClickHouse/pull/39872) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update 02354_distributed_with_external_aggregation_memory_usage.sql [#39893](https://github.com/ClickHouse/ClickHouse/pull/39893) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix the race between waitMutation and updating local queue from ZK [#39900](https://github.com/ClickHouse/ClickHouse/pull/39900) ([Alexander Gololobov](https://github.com/davenger)).
* Improve 02354_distributed_with_external_aggregation_memory_usage [#39908](https://github.com/ClickHouse/ClickHouse/pull/39908) ([Nikita Taranov](https://github.com/nickitat)).
* Move username and password from URL parameters to Basic Authentication [#39910](https://github.com/ClickHouse/ClickHouse/pull/39910) ([San](https://github.com/santrancisco)).
* Remove cache flush from the Docs Check [#39911](https://github.com/ClickHouse/ClickHouse/pull/39911) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flaky tests (`Tried to commit obsolete part`) [#39922](https://github.com/ClickHouse/ClickHouse/pull/39922) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add logging to debug flaky tests [#39925](https://github.com/ClickHouse/ClickHouse/pull/39925) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix flaky test `02360_send_logs_level_colors` [#39927](https://github.com/ClickHouse/ClickHouse/pull/39927) ([Anton Popov](https://github.com/CurtizJ)).
* Don't create self-extracting clickhouse for split build [#39936](https://github.com/ClickHouse/ClickHouse/pull/39936) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* tests/stress: add dmesg output (to see OOM details) [#39939](https://github.com/ClickHouse/ClickHouse/pull/39939) ([Azat Khuzhin](https://github.com/azat)).
* Create metadata directory on CREATE for FileLog engine [#39940](https://github.com/ClickHouse/ClickHouse/pull/39940) ([Azat Khuzhin](https://github.com/azat)).
* tests: fix 02352_rwlock flakiness [#39941](https://github.com/ClickHouse/ClickHouse/pull/39941) ([Azat Khuzhin](https://github.com/azat)).
* Remove old code from the website [#39947](https://github.com/ClickHouse/ClickHouse/pull/39947) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove debug trace from DistinctStep [#39955](https://github.com/ClickHouse/ClickHouse/pull/39955) ([Igor Nikonov](https://github.com/devcrafter)).
* IAST destructor intrusive list [#39956](https://github.com/ClickHouse/ClickHouse/pull/39956) ([Maksim Kita](https://github.com/kitaisreal)).
* Remove old code from the website (part 2) [#39959](https://github.com/ClickHouse/ClickHouse/pull/39959) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add Stateful tests (release), Stateless tests (release) to Mergeable Check [#39967](https://github.com/ClickHouse/ClickHouse/pull/39967) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Change font in CI reports [#39969](https://github.com/ClickHouse/ClickHouse/pull/39969) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add setting type to support special 'auto' value [#39974](https://github.com/ClickHouse/ClickHouse/pull/39974) ([Vladimir C](https://github.com/vdimir)).
* Update 02354_distributed_with_external_aggregation_memory_usage.sql [#39979](https://github.com/ClickHouse/ClickHouse/pull/39979) ([Nikita Taranov](https://github.com/nickitat)).
* tests/stress: fix dmesg reading [#39980](https://github.com/ClickHouse/ClickHouse/pull/39980) ([Azat Khuzhin](https://github.com/azat)).
* Disable 02380_insert_mv_race.sh with Ordinary [#39985](https://github.com/ClickHouse/ClickHouse/pull/39985) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Mention how the build can be speed up by disabling self-extraction [#39988](https://github.com/ClickHouse/ClickHouse/pull/39988) ([Robert Schulze](https://github.com/rschu1ze)).
* Use different root path for Jepsen Counter test [#39992](https://github.com/ClickHouse/ClickHouse/pull/39992) ([Antonio Andelic](https://github.com/antonio2368)).
* ActionsDAG rename index to outputs [#39998](https://github.com/ClickHouse/ClickHouse/pull/39998) ([Maksim Kita](https://github.com/kitaisreal)).
* Added H literal for Hour IntervalKind [#39999](https://github.com/ClickHouse/ClickHouse/pull/39999) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Try to avoid timeouts when checking for replication consistency [#40001](https://github.com/ClickHouse/ClickHouse/pull/40001) ([Alexander Tokmakov](https://github.com/tavplubix)).
* More generic check for MergeTree table family [#40004](https://github.com/ClickHouse/ClickHouse/pull/40004) ([Alexander Gololobov](https://github.com/davenger)).
* Further preparation for catboost integration into library-bridge [#40010](https://github.com/ClickHouse/ClickHouse/pull/40010) ([Robert Schulze](https://github.com/rschu1ze)).
* Self-extracting: decompressor, extract real path of executable instead of argv[0] [#40011](https://github.com/ClickHouse/ClickHouse/pull/40011) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* copy self-extracting to output [#40017](https://github.com/ClickHouse/ClickHouse/pull/40017) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update 02354_distributed_with_external_aggregation_memory_usage.sql [#40024](https://github.com/ClickHouse/ClickHouse/pull/40024) ([Nikita Taranov](https://github.com/nickitat)).
* Fix segfault in `DataTypeAggregateFunction` [#40025](https://github.com/ClickHouse/ClickHouse/pull/40025) ([Anton Popov](https://github.com/CurtizJ)).
* tests/performance: cover sparse_hashed dictionary [#40027](https://github.com/ClickHouse/ClickHouse/pull/40027) ([Azat Khuzhin](https://github.com/azat)).
* Cleanup docs of parseDateTime*() function family [#40030](https://github.com/ClickHouse/ClickHouse/pull/40030) ([Robert Schulze](https://github.com/rschu1ze)).
* Job url [#40032](https://github.com/ClickHouse/ClickHouse/pull/40032) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v22.6.5.22-stable [#40036](https://github.com/ClickHouse/ClickHouse/pull/40036) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Non-significant changes [#40038](https://github.com/ClickHouse/ClickHouse/pull/40038) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* tests: attempt to make 02293_part_log_has_merge_reason less flaky [#40047](https://github.com/ClickHouse/ClickHouse/pull/40047) ([Azat Khuzhin](https://github.com/azat)).
* Remove documentation templates [#40048](https://github.com/ClickHouse/ClickHouse/pull/40048) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Move images to clickhouse-presentations repository. [#40049](https://github.com/ClickHouse/ClickHouse/pull/40049) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix broken image in test-visualizer [#40050](https://github.com/ClickHouse/ClickHouse/pull/40050) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test for query parameters in HTTP POST [#40055](https://github.com/ClickHouse/ClickHouse/pull/40055) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix clickhouse-test hang in case of CREATE DATABASE fails [#40057](https://github.com/ClickHouse/ClickHouse/pull/40057) ([Azat Khuzhin](https://github.com/azat)).
* tests: fix 02380_insert_mv_race for Ordinary database [#40058](https://github.com/ClickHouse/ClickHouse/pull/40058) ([Azat Khuzhin](https://github.com/azat)).
* Skip newlines before Tags in clickhouse-test [#40061](https://github.com/ClickHouse/ClickHouse/pull/40061) ([Vladimir C](https://github.com/vdimir)).
* Replace S3 URLs by parameter [#40066](https://github.com/ClickHouse/ClickHouse/pull/40066) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Finally fix `_csv.Error: field larger than field limit` [#40072](https://github.com/ClickHouse/ClickHouse/pull/40072) ([Alexander Tokmakov](https://github.com/tavplubix)).
* tests: fix 00926_adaptive_index_granularity_pk/00489_pk_subexpression flakiness [#40075](https://github.com/ClickHouse/ClickHouse/pull/40075) ([Azat Khuzhin](https://github.com/azat)).
* Changelogs and versions [#40090](https://github.com/ClickHouse/ClickHouse/pull/40090) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* A test for counting resources in subqueries [#40104](https://github.com/ClickHouse/ClickHouse/pull/40104) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Use a job ID as ref text [#40112](https://github.com/ClickHouse/ClickHouse/pull/40112) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Delete files DictionaryJoinAdapter.h/cpp [#40113](https://github.com/ClickHouse/ClickHouse/pull/40113) ([Vladimir C](https://github.com/vdimir)).
* Rework S3Helper a little bit [#40127](https://github.com/ClickHouse/ClickHouse/pull/40127) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* PODArray assign empty array fix [#40129](https://github.com/ClickHouse/ClickHouse/pull/40129) ([Maksim Kita](https://github.com/kitaisreal)).
* Disable 02390_prometheus_ClickHouseStatusInfo_DictionaryStatus with Ordinary database [#40136](https://github.com/ClickHouse/ClickHouse/pull/40136) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add tests with Ordinary database to flaky check [#40137](https://github.com/ClickHouse/ClickHouse/pull/40137) ([Alexander Tokmakov](https://github.com/tavplubix)).
* fs cache: minor change [#40138](https://github.com/ClickHouse/ClickHouse/pull/40138) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix typo [#40139](https://github.com/ClickHouse/ClickHouse/pull/40139) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix keeper-bench in case of error during scheduling a thread [#40147](https://github.com/ClickHouse/ClickHouse/pull/40147) ([Azat Khuzhin](https://github.com/azat)).
* Fix "Cannot quickly remove directory" [#40151](https://github.com/ClickHouse/ClickHouse/pull/40151) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Set sync_request_timeout to 10 to avoid reconnections in tests [#40158](https://github.com/ClickHouse/ClickHouse/pull/40158) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Disable zero-copy replication by default [#40175](https://github.com/ClickHouse/ClickHouse/pull/40175) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve assignment and logging for cherry-pick and backport steps [#40177](https://github.com/ClickHouse/ClickHouse/pull/40177) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* test for Decimal aggregateFunction normalization [#39420](https://github.com/ClickHouse/ClickHouse/issues/39420) [#40178](https://github.com/ClickHouse/ClickHouse/pull/40178) ([Denny Crane](https://github.com/den-crane)).
* Minor build changes [#40182](https://github.com/ClickHouse/ClickHouse/pull/40182) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* clickhouse-test: enable ZooKeeper tests by default [#40191](https://github.com/ClickHouse/ClickHouse/pull/40191) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove old code [#40196](https://github.com/ClickHouse/ClickHouse/pull/40196) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update README.md [#40198](https://github.com/ClickHouse/ClickHouse/pull/40198) ([clickhouse-robot-curie](https://github.com/clickhouse-robot-curie)).
* Fix a bug with symlinks detection [#40232](https://github.com/ClickHouse/ClickHouse/pull/40232) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Better error message when restoring covered parts [#40234](https://github.com/ClickHouse/ClickHouse/pull/40234) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Try to print stacktraces if query timeouts in integration tests [#40248](https://github.com/ClickHouse/ClickHouse/pull/40248) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add Unit tests to Mergeable [#40250](https://github.com/ClickHouse/ClickHouse/pull/40250) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Extract common KV storage logic [#40261](https://github.com/ClickHouse/ClickHouse/pull/40261) ([Antonio Andelic](https://github.com/antonio2368)).
* Add update_mergeable_check trigger for Unit tests [#40269](https://github.com/ClickHouse/ClickHouse/pull/40269) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* CVE-2021-3520: (negligible) rdkafka library: update lz4.c from upstream [#40272](https://github.com/ClickHouse/ClickHouse/pull/40272) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Fix build [#40297](https://github.com/ClickHouse/ClickHouse/pull/40297) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### Support CTE statement for ANTLR4 syntax file
* ... [#39814](https://github.com/ClickHouse/ClickHouse/pull/39814) ([qianmoQ](https://github.com/qianmoQ)).

View File

@ -51,10 +51,14 @@ SELECT * FROM hdfs_engine_table LIMIT 2
## Implementation Details {#implementation-details}
- Reads and writes can be parallel.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is supported.
- Not supported:
- `ALTER` and `SELECT...SAMPLE` operations.
- Indexes.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is possible, but not recommended.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
**Globs in path**

View File

@ -50,10 +50,14 @@ For more information about virtual columns see [here](../../../engines/table-eng
## Implementation Details {#implementation-details}
- Reads and writes can be parallel
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is supported.
- Not supported:
- `ALTER` and `SELECT...SAMPLE` operations.
- Indexes.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is possible, but not recommended.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
## Wildcards In Path {#wildcards-in-path}

View File

@ -1023,6 +1023,10 @@ Other parameters:
Examples of working configurations can be found in integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
## Virtual Columns {#virtual-columns}
- `_part` — Name of a part.

View File

@ -39,10 +39,53 @@ Uniqueness of rows is determined by the `ORDER BY` table section, not `PRIMARY K
`ver` — column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.
When merging, `ReplacingMergeTree` leaves only one row out of all the rows with the same sorting key:
- The last one in the selection, if `ver` is not set. A selection is a set of rows in a set of parts participating in the merge. The most recently created part (the last insert) will be the last one in the selection. Thus, after deduplication, the very last row from the most recent insert will remain for each unique sorting key.
- The one with the maximum version, if `ver` is specified. If `ver` is the same for several rows, then the "if `ver` is not specified" rule is used for them, i.e. the most recently inserted row will remain.
Example:
```sql
-- without ver - the last inserted 'wins'
CREATE TABLE myFirstReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree
ORDER BY key;
INSERT INTO myFirstReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO myFirstReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM myFirstReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ second │ 2020-01-01 00:00:00 │
└─────┴─────────┴─────────────────────┘
-- with ver - the row with the biggest ver 'wins'
CREATE TABLE mySecondReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree(eventTime)
ORDER BY key;
INSERT INTO mySecondReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO mySecondReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM mySecondReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ first │ 2020-01-01 01:01:01 │
└─────┴─────────┴─────────────────────┘
```
## Query clauses

View File

@ -743,13 +743,24 @@ On hosts with low RAM and swap, you possibly need setting `max_server_memory_usa
- [max_server_memory_usage](#max_server_memory_usage)
## concurrent_threads_soft_limit {#concurrent_threads_soft_limit}
The maximum number of query processing threads, excluding threads for retrieving data from remote servers, allowed to run all queries. This is not a hard limit. In case if the limit is reached the query will still get one thread to run.
## concurrent_threads_soft_limit_num {#concurrent_threads_soft_limit_num}
The maximum number of query processing threads, excluding threads for retrieving data from remote servers, allowed to run all queries. This is not a hard limit. If the limit is reached, the query will still get at least one thread to run. The query can upscale to the desired number of threads during execution if more threads become available.
Possible values:
- Positive integer.
- 0 — No limit.
Default value: `0`.
## concurrent_threads_soft_limit_ratio_to_cores {#concurrent_threads_soft_limit_ratio_to_cores}
The maximum number of query processing threads as a multiple of the number of logical cores.
More details: [concurrent_threads_soft_limit_num](#concurrent_threads_soft_limit_num).
Possible values:
- Positive integer.
- 0 — No limit.
- -1 — The parameter is initialized by the number of logical cores multiplied by 3, which is a good heuristic for CPU-bound tasks.
Default value: `0`.

View File

@ -218,6 +218,10 @@ Default value: 0 (seconds)
When this setting has a value greater than zero, only a single replica starts the merge immediately if the merged part is on shared storage and `allow_remote_fs_zero_copy_replication` is enabled.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
Possible values:
- Any positive integer.

View File

@ -747,7 +747,14 @@ Default value: 268435456.
Disables lagging replicas for distributed queries. See [Replication](../../engines/table-engines/mergetree-family/replication.md).
Sets the time in seconds. If a replica lags more than the set value, this replica is not used.
Sets the time in seconds. If a replica's lag is greater than or equal to the set value, this replica is not used.
Possible values:
- Positive integer.
- 0 — Replica lags are not checked.
To prevent the use of any replica with a non-zero lag, set this parameter to 1.
Default value: 300.
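
For illustration, a minimal hedged sketch (the setting name `max_replica_delay_for_distributed_queries` and the table name are assumptions based on this description, not part of the excerpt above):

```sql
-- Reject any replica whose lag is non-zero for this session.
SET max_replica_delay_for_distributed_queries = 1;

-- `distributed_hits` is a hypothetical Distributed table over replicated shards.
SELECT count() FROM distributed_hits;
```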

View File

@ -316,4 +316,8 @@ Use [http_max_single_read_retries](../operations/settings/settings.md#http-max-s
## Zero-copy Replication (not ready for production) {#zero-copy}
ClickHouse supports zero-copy replication for `S3` and `HDFS` disks, which means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.
Zero-copy replication is possible, but not recommended, with `S3` and `HDFS` disks. Zero-copy replication means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
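
As an illustration only, a hedged sketch of enabling it for a single table through the MergeTree setting referenced elsewhere in this changeset (`allow_remote_fs_zero_copy_replication`); the storage policy name is an assumption:

```sql
-- 's3_policy' is a hypothetical storage policy backed by a remote disk.
CREATE TABLE zero_copy_demo
(
    `id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/zero_copy_demo', '{replica}')
ORDER BY id
SETTINGS storage_policy = 's3_policy', allow_remote_fs_zero_copy_replication = 1;
```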

View File

@ -1068,7 +1068,10 @@ Query:
```sql
WITH toDateTime('2021-04-14 11:22:33') AS date_value
SELECT dateName('year', date_value), dateName('month', date_value), dateName('day', date_value);
SELECT
dateName('year', date_value),
dateName('month', date_value),
dateName('day', date_value);
```
Result:
@ -1076,7 +1079,44 @@ Result:
```text
┌─dateName('year', date_value)─┬─dateName('month', date_value)─┬─dateName('day', date_value)─┐
│ 2021 │ April │ 14 │
└──────────────────────────────┴───────────────────────────────┴─────────────────────────────
└──────────────────────────────┴───────────────────────────────┴─────────────────────────────┘
```
## monthName
Returns the name of the month.
**Syntax**
``` sql
monthName(date)
```
**Arguments**
- `date` — Date or date with time. [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
**Returned value**
- The name of the month.
Type: [String](../../sql-reference/data-types/string.md#string)
**Example**
Query:
```sql
WITH toDateTime('2021-04-14 11:22:33') AS date_value
SELECT monthName(date_value);
```
Result:
```text
┌─monthName(date_value)─┐
│ April │
└───────────────────────┘
```
## FROM\_UNIXTIME

View File

@ -1822,10 +1822,13 @@ Result:
Evaluate external model.
Accepts a model name and model arguments. Returns Float64.
## throwIf(x\[, custom_message\])
## throwIf(x\[, message\[, error_code\]\])
Throws an exception if the argument is non-zero.
custom_message - is an optional parameter: a constant string, provides an error message
`message` - an optional parameter: a constant string providing a custom error message
`error_code` - an optional parameter: a constant integer providing a custom error code
To use the `error_code` argument, configuration parameter `allow_custom_error_code_in_throwif` must be enabled.
``` sql
SELECT throwIf(number = 3, 'Too many') FROM numbers(10);
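-- A hedged sketch of the new three-argument form (not from the original page):
-- it assumes the allow_custom_error_code_in_throwif setting is enabled for the session
-- and that the error code is passed as an Int8/Int16/Int32 constant.
SET allow_custom_error_code_in_throwif = 1;
SELECT throwIf(number = 3, 'Too many', toInt32(42)) FROM numbers(10);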

View File

@ -28,19 +28,65 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
For a description of query parameters, see the [statement description](../../../engines/table-engines/mergetree-family/replacingmergetree.md).
:::note "Attention"
Uniqueness of rows is determined by the `ORDER BY` section of the table, not by `PRIMARY KEY`.
:::
**ReplacingMergeTree Parameters**
:::warning "Attention"
Uniqueness of rows is determined by the `ORDER BY` section of the table, not by `PRIMARY KEY`.
:::
- `ver` — column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.
## ReplacingMergeTree Parameters
When merging, `ReplacingMergeTree` keeps only one row for each unique sorting key:
### ver
`ver` — column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.
When merging, `ReplacingMergeTree` keeps only one row for each unique sorting key:
- The last one in the selection, if `ver` is not set. A selection here is a set of rows in a set of data parts participating in the merge. The most recently created part (the last insert) will be the last one in the selection. Thus, after deduplication, the very last row from the most recent insert remains for each unique sorting key.
- The one with the maximum version, if `ver` is set. If `ver` is the same for several rows, the "if `ver` is not set" rule is applied to them, i.e. the most recently inserted row remains after the merge.
**Query clauses**
Example:
```sql
-- without ver - the last inserted 'wins'
CREATE TABLE myFirstReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree
ORDER BY key;
INSERT INTO myFirstReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO myFirstReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM myFirstReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ second │ 2020-01-01 00:00:00 │
└─────┴─────────┴─────────────────────┘
-- with ver - the row with the biggest ver 'wins'
CREATE TABLE mySecondReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree(eventTime)
ORDER BY key;
INSERT INTO mySecondReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO mySecondReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM mySecondReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ first │ 2020-01-01 01:01:01 │
└─────┴─────────┴─────────────────────┘
```
## Query clauses
When creating a `ReplacingMergeTree` table, the same [clauses](mergetree.md) are used as when creating a `MergeTree` table.
@ -48,9 +94,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
<summary>Deprecated method for creating a table</summary>
:::note "Attention"
Do not use this method in new projects and, if possible, switch old projects to the method described above.
:::
:::warning "Attention"
Do not use this method in new projects and, if possible, switch old projects to the method described above.
:::
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(

View File

@ -931,6 +931,13 @@ SELECT now('Europe/Moscow');
└──────────────────────┘
```
## nowInBlock {#nowinblock}
Returns the current date and time at the moment a block of data is processed. Unlike the `now` function, the returned value is not a constant, and different values will be returned in different blocks of data during long-running queries.
It makes sense to use this function to obtain the current time in long-running INSERT SELECT queries.
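
A minimal hedged sketch of the difference (the query shape is an assumption, not taken from this page):

```sql
-- now() is evaluated once per query; nowInBlock() is re-evaluated for every block.
SELECT now(), nowInBlock(), sleep(1)
FROM system.numbers
LIMIT 3
SETTINGS max_block_size = 1;
```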
## today {#today}
Returns the current date at the moment of query execution. The function takes no arguments.

View File

@ -1727,10 +1727,13 @@ SELECT joinGet(db_test.id_val,'val',toUInt32(number)) from numbers(4) SETTINGS j
Accepts a model name and model arguments. Returns Float64.
## throwIf(x\[, custom_message\]) {#throwifx-custom-message}
## throwIf(x\[, message\[, error_code\]\]) {#throwifx-custom-message}
Throws an exception if the argument is non-zero.
custom_message - an optional parameter: a constant string that sets the error message text.
`custom_message` - an optional parameter: a constant string that sets the error message text.
`error_code` - an optional parameter: a constant integer that sets the error code.
To use the `error_code` argument, the `allow_custom_error_code_in_throwif` configuration parameter must be enabled.
``` sql
SELECT throwIf(number = 3, 'Too many') FROM numbers(10);

View File

@ -1156,22 +1156,20 @@ int Server::main(const std::vector<std::string> & /*args*/)
if (config->has("max_partition_size_to_drop"))
global_context->setMaxPartitionSizeToDrop(config->getUInt64("max_partition_size_to_drop"));
if (config->has("concurrent_threads_soft_limit"))
ConcurrencyControl::SlotCount concurrent_threads_soft_limit = ConcurrencyControl::Unlimited;
if (config->has("concurrent_threads_soft_limit_num"))
{
auto concurrent_threads_soft_limit = config->getInt("concurrent_threads_soft_limit", 0);
if (concurrent_threads_soft_limit == -1)
{
// Based on tests concurrent_threads_soft_limit has an optimal value when it's about 3 times of logical CPU cores
constexpr size_t thread_factor = 3;
concurrent_threads_soft_limit = std::thread::hardware_concurrency() * thread_factor;
}
if (concurrent_threads_soft_limit)
ConcurrencyControl::instance().setMaxConcurrency(concurrent_threads_soft_limit);
else
ConcurrencyControl::instance().setMaxConcurrency(ConcurrencyControl::Unlimited);
auto value = config->getUInt64("concurrent_threads_soft_limit_num", 0);
if (value > 0 && value < concurrent_threads_soft_limit)
concurrent_threads_soft_limit = value;
}
else
ConcurrencyControl::instance().setMaxConcurrency(ConcurrencyControl::Unlimited);
if (config->has("concurrent_threads_soft_limit_ratio_to_cores"))
{
auto value = config->getUInt64("concurrent_threads_soft_limit_ratio_to_cores", 0) * std::thread::hardware_concurrency();
if (value > 0 && value < concurrent_threads_soft_limit)
concurrent_threads_soft_limit = value;
}
ConcurrencyControl::instance().setMaxConcurrency(concurrent_threads_soft_limit);
if (config->has("max_concurrent_queries"))
global_context->getProcessList().setMaxSize(config->getInt("max_concurrent_queries", 0));

View File

@ -281,12 +281,12 @@
<http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
-->
<!-- Maximum number of query processing threads to run all queries.
Note that This is not a hard limit. In case if the limit is reached the query will still get one thread to run.
For value equals to -1 this parameter is initialized by number of logical cores multiplies by 3.
Which is a good heuristic for CPU-bound tasks.
<!-- The maximum number of query processing threads, excluding threads for retrieving data from remote servers, allowed to run all queries.
This is not a hard limit. If the limit is reached, the query will still get at least one thread to run.
The query can upscale to the desired number of threads during execution if more threads become available.
-->
<concurrent_threads_soft_limit>0</concurrent_threads_soft_limit>
<concurrent_threads_soft_limit_num>0</concurrent_threads_soft_limit_num>
<concurrent_threads_soft_limit_ratio_to_cores>0</concurrent_threads_soft_limit_ratio_to_cores>
<!-- Maximum number of concurrent queries. -->
<max_concurrent_queries>100</max_concurrent_queries>

View File

@ -30,12 +30,12 @@ FileCache::FileCache(
, max_element_size(cache_settings_.max_elements)
, max_file_segment_size(cache_settings_.max_file_segment_size)
, allow_persistent_files(cache_settings_.do_not_evict_index_and_mark_files)
, enable_cache_hits_threshold(cache_settings_.enable_cache_hits_threshold)
, enable_filesystem_query_cache_limit(cache_settings_.enable_filesystem_query_cache_limit)
, log(&Poco::Logger::get("FileCache"))
, main_priority(std::make_unique<LRUFileCachePriority>())
, stash_priority(std::make_unique<LRUFileCachePriority>())
, max_stash_element_size(cache_settings_.max_elements)
, enable_cache_hits_threshold(cache_settings_.enable_cache_hits_threshold)
, log(&Poco::Logger::get("FileCache"))
{
}
@ -77,132 +77,6 @@ void FileCache::assertInitialized() const
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cache not initialized");
}
FileCache::QueryContextPtr FileCache::getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock)
{
if (!isQueryInitialized())
return nullptr;
return getQueryContext(std::string(CurrentThread::getQueryId()), cache_lock);
}
FileCache::QueryContextPtr FileCache::getQueryContext(const String & query_id, std::lock_guard<std::mutex> & /* cache_lock */)
{
auto query_iter = query_map.find(query_id);
return (query_iter == query_map.end()) ? nullptr : query_iter->second;
}
void FileCache::removeQueryContext(const String & query_id)
{
std::lock_guard cache_lock(mutex);
auto query_iter = query_map.find(query_id);
if (query_iter == query_map.end())
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Attempt to release query context that does not exist (query_id: {})",
query_id);
}
query_map.erase(query_iter);
}
FileCache::QueryContextPtr FileCache::getOrSetQueryContext(
const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> & cache_lock)
{
if (query_id.empty())
return nullptr;
auto context = getQueryContext(query_id, cache_lock);
if (context)
return context;
auto query_context = std::make_shared<QueryContext>(settings.max_query_cache_size, settings.skip_download_if_exceeds_query_cache);
auto query_iter = query_map.emplace(query_id, query_context).first;
return query_iter->second;
}
FileCache::QueryContextHolder FileCache::getQueryContextHolder(const String & query_id, const ReadSettings & settings)
{
std::lock_guard cache_lock(mutex);
if (!enable_filesystem_query_cache_limit || settings.max_query_cache_size == 0)
return {};
/// If enable_filesystem_query_cache_limit is true and max_query_cache_size is larger than zero,
/// we create a query context for the current query.
auto context = getOrSetQueryContext(query_id, settings, cache_lock);
return QueryContextHolder(query_id, this, context);
}
void FileCache::QueryContext::remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size < size)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Deleted cache size exceeds existing cache size");
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record != records.end())
{
record->second->removeAndGetNext(cache_lock);
records.erase({key, offset});
}
}
cache_size -= size;
}
void FileCache::QueryContext::reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size + size > max_cache_size)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Reserved cache size exceeds the remaining cache size (key: {}, offset: {})",
key.toString(), offset);
}
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record == records.end())
{
auto queue_iter = priority->add(key, offset, 0, cache_lock);
record = records.insert({{key, offset}, queue_iter}).first;
}
record->second->incrementSize(size, cache_lock);
}
cache_size += size;
}
void FileCache::QueryContext::use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock)
{
if (skip_download_if_exceeds_query_cache)
return;
auto record = records.find({key, offset});
if (record != records.end())
record->second->use(cache_lock);
}
FileCache::QueryContextHolder::QueryContextHolder(
const String & query_id_,
FileCache * cache_,
FileCache::QueryContextPtr context_)
: query_id(query_id_)
, cache(cache_)
, context(context_)
{
}
FileCache::QueryContextHolder::~QueryContextHolder()
{
/// If only the query_map and the current holder hold the context_query,
/// the query has been completed and the query_context is released.
if (context && context.use_count() == 2)
cache->removeQueryContext(query_id);
}
void FileCache::initialize()
{
std::lock_guard cache_lock(mutex);
@ -1222,12 +1096,6 @@ size_t FileCache::getUsedCacheSizeUnlocked(std::lock_guard<std::mutex> & cache_l
return main_priority->getCacheSize(cache_lock);
}
size_t FileCache::getAvailableCacheSize() const
{
std::lock_guard cache_lock(mutex);
return getAvailableCacheSizeUnlocked(cache_lock);
}
size_t FileCache::getAvailableCacheSizeUnlocked(std::lock_guard<std::mutex> & cache_lock) const
{
return max_size - getUsedCacheSizeUnlocked(cache_lock);
@ -1346,4 +1214,130 @@ void FileCache::assertPriorityCorrectness(std::lock_guard<std::mutex> & cache_lo
assert(main_priority->getElementsNum(cache_lock) <= max_element_size);
}
FileCache::QueryContextPtr FileCache::getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock)
{
if (!isQueryInitialized())
return nullptr;
return getQueryContext(std::string(CurrentThread::getQueryId()), cache_lock);
}
FileCache::QueryContextPtr FileCache::getQueryContext(const String & query_id, std::lock_guard<std::mutex> & /* cache_lock */)
{
auto query_iter = query_map.find(query_id);
return (query_iter == query_map.end()) ? nullptr : query_iter->second;
}
void FileCache::removeQueryContext(const String & query_id)
{
std::lock_guard cache_lock(mutex);
auto query_iter = query_map.find(query_id);
if (query_iter == query_map.end())
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Attempt to release query context that does not exist (query_id: {})",
query_id);
}
query_map.erase(query_iter);
}
FileCache::QueryContextPtr FileCache::getOrSetQueryContext(
const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> & cache_lock)
{
if (query_id.empty())
return nullptr;
auto context = getQueryContext(query_id, cache_lock);
if (context)
return context;
auto query_context = std::make_shared<QueryContext>(settings.max_query_cache_size, settings.skip_download_if_exceeds_query_cache);
auto query_iter = query_map.emplace(query_id, query_context).first;
return query_iter->second;
}
FileCache::QueryContextHolder FileCache::getQueryContextHolder(const String & query_id, const ReadSettings & settings)
{
std::lock_guard cache_lock(mutex);
if (!enable_filesystem_query_cache_limit || settings.max_query_cache_size == 0)
return {};
/// If enable_filesystem_query_cache_limit is true and max_query_cache_size is larger than zero,
/// we create a query context for the current query.
auto context = getOrSetQueryContext(query_id, settings, cache_lock);
return QueryContextHolder(query_id, this, context);
}
void FileCache::QueryContext::remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size < size)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Deleted cache size exceeds existing cache size");
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record != records.end())
{
record->second->removeAndGetNext(cache_lock);
records.erase({key, offset});
}
}
cache_size -= size;
}
void FileCache::QueryContext::reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size + size > max_cache_size)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Reserved cache size exceeds the remaining cache size (key: {}, offset: {})",
key.toString(), offset);
}
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record == records.end())
{
auto queue_iter = priority->add(key, offset, 0, cache_lock);
record = records.insert({{key, offset}, queue_iter}).first;
}
record->second->incrementSize(size, cache_lock);
}
cache_size += size;
}
void FileCache::QueryContext::use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock)
{
if (skip_download_if_exceeds_query_cache)
return;
auto record = records.find({key, offset});
if (record != records.end())
record->second->use(cache_lock);
}
FileCache::QueryContextHolder::QueryContextHolder(
const String & query_id_,
FileCache * cache_,
FileCache::QueryContextPtr context_)
: query_id(query_id_)
, cache(cache_)
, context(context_)
{
}
FileCache::QueryContextHolder::~QueryContextHolder()
{
/// If only the query_map and the current holder hold the context_query,
/// the query has been completed and the query_context is released.
if (context && context.use_count() == 2)
cache->removeQueryContext(query_id);
}
}

View File

@ -23,13 +23,17 @@ namespace DB
{
/// Local cache for remote filesystem files, represented as a set of non-overlapping non-empty file segments.
/// Different caching algorithms are implemented based on IFileCachePriority.
/// Different caching algorithms are implemented using IFileCachePriority.
class FileCache : private boost::noncopyable
{
friend class FileSegment;
friend class IFileCachePriority;
friend struct FileSegmentsHolder;
friend class FileSegmentRangeWriter;
friend class FileSegment;
friend class IFileCachePriority;
friend struct FileSegmentsHolder;
friend class FileSegmentRangeWriter;
struct QueryContext;
using QueryContextPtr = std::shared_ptr<QueryContext>;
public:
using Key = DB::FileCacheKey;
@ -41,25 +45,8 @@ public:
/// Restore cache from local filesystem.
void initialize();
void removeIfExists(const Key & key);
void removeIfReleasable();
static bool isReadOnly();
/// Cache capacity in bytes.
size_t capacity() const { return max_size; }
static Key hash(const String & path);
String getPathInLocalCache(const Key & key, size_t offset, bool is_persistent) const;
String getPathInLocalCache(const Key & key) const;
const String & getBasePath() const { return cache_base_path; }
std::vector<String> tryGetCachePaths(const Key & key);
/**
* Given an `offset` and `size` representing [offset, offset + size) bytes interval,
* return list of cached non-overlapping non-empty
@ -84,6 +71,28 @@ public:
*/
FileSegmentsHolder get(const Key & key, size_t offset, size_t size);
/// Remove files by `key`. Removes files which might be used at the moment.
void removeIfExists(const Key & key);
/// Remove files by `key`. Will not remove files which are used at the moment.
void removeIfReleasable();
static Key hash(const String & path);
String getPathInLocalCache(const Key & key, size_t offset, bool is_persistent) const;
String getPathInLocalCache(const Key & key) const;
std::vector<String> tryGetCachePaths(const Key & key);
size_t capacity() const { return max_size; }
size_t getUsedCacheSize() const;
size_t getFileSegmentsNum() const;
static bool isReadOnly();
/**
* Create a file segment of exactly requested size with EMPTY state.
* Throw exception if requested size exceeds max allowed file segment size.
@ -102,92 +111,6 @@ public:
/// For debug.
String dumpStructure(const Key & key);
size_t getUsedCacheSize() const;
size_t getFileSegmentsNum() const;
private:
String cache_base_path;
size_t max_size;
size_t max_element_size;
size_t max_file_segment_size;
bool allow_persistent_files;
bool is_initialized = false;
mutable std::mutex mutex;
bool tryReserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void remove(Key key, size_t offset, std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock);
bool isLastFileSegmentHolder(
const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock);
void reduceSizeToDownloaded(
const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & /* segment_lock */);
void assertInitialized() const;
using AccessKeyAndOffset = std::pair<Key, size_t>;
struct KeyAndOffsetHash
{
std::size_t operator()(const AccessKeyAndOffset & key) const
{
return std::hash<UInt128>()(key.first.key) ^ std::hash<UInt64>()(key.second);
}
};
using FileCacheRecords = std::unordered_map<AccessKeyAndOffset, IFileCachePriority::WriteIterator, KeyAndOffsetHash>;
/// Used to track and control the cache access of each query.
/// It allows the cache layer to apply different policies to different queries.
struct QueryContext
{
FileCacheRecords records;
FileCachePriorityPtr priority;
size_t cache_size = 0;
size_t max_cache_size;
bool skip_download_if_exceeds_query_cache;
QueryContext(size_t max_cache_size_, bool skip_download_if_exceeds_query_cache_)
: max_cache_size(max_cache_size_), skip_download_if_exceeds_query_cache(skip_download_if_exceeds_query_cache_)
{
}
void remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock);
size_t getMaxCacheSize() const { return max_cache_size; }
size_t getCacheSize() const { return cache_size; }
FileCachePriorityPtr getPriority() { return priority; }
bool isSkipDownloadIfExceed() const { return skip_download_if_exceeds_query_cache; }
};
using QueryContextPtr = std::shared_ptr<QueryContext>;
using QueryContextMap = std::unordered_map<String, QueryContextPtr>;
QueryContextMap query_map;
bool enable_filesystem_query_cache_limit;
QueryContextPtr getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock);
QueryContextPtr getQueryContext(const String & query_id, std::lock_guard<std::mutex> & cache_lock);
void removeQueryContext(const String & query_id);
QueryContextPtr getOrSetQueryContext(const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> &);
public:
/// Saves query context information and applies different cache policies
/// to different queries through the context cache layer.
struct QueryContextHolder : private boost::noncopyable
@ -206,6 +129,43 @@ public:
QueryContextHolder getQueryContextHolder(const String & query_id, const ReadSettings & settings);
private:
String cache_base_path;
size_t max_size;
size_t max_element_size;
size_t max_file_segment_size;
bool allow_persistent_files;
size_t enable_cache_hits_threshold;
bool enable_filesystem_query_cache_limit;
Poco::Logger * log;
bool is_initialized = false;
mutable std::mutex mutex;
bool tryReserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void remove(
Key key,
size_t offset,
std::lock_guard<std::mutex> & cache_lock,
std::lock_guard<std::mutex> & segment_lock);
bool isLastFileSegmentHolder(
const Key & key,
size_t offset,
std::lock_guard<std::mutex> & cache_lock,
std::lock_guard<std::mutex> & segment_lock);
void reduceSizeToDownloaded(
const Key & key,
size_t offset,
std::lock_guard<std::mutex> & cache_lock,
std::lock_guard<std::mutex> & segment_lock);
void assertInitialized() const;
struct FileSegmentCell : private boost::noncopyable
{
FileSegmentPtr file_segment;
@ -223,24 +183,30 @@ private:
FileSegmentCell(FileSegmentPtr file_segment_, FileCache * cache, std::lock_guard<std::mutex> & cache_lock);
FileSegmentCell(FileSegmentCell && other) noexcept
: file_segment(std::move(other.file_segment)), queue_iterator(other.queue_iterator)
: file_segment(std::move(other.file_segment)), queue_iterator(std::move(other.queue_iterator)) {}
};
using AccessKeyAndOffset = std::pair<Key, size_t>;
struct KeyAndOffsetHash
{
std::size_t operator()(const AccessKeyAndOffset & key) const
{
return std::hash<UInt128>()(key.first.key) ^ std::hash<UInt64>()(key.second);
}
};
using FileSegmentsByOffset = std::map<size_t, FileSegmentCell>;
using CachedFiles = std::unordered_map<Key, FileSegmentsByOffset>;
using FileCacheRecords = std::unordered_map<AccessKeyAndOffset, IFileCachePriority::WriteIterator, KeyAndOffsetHash>;
CachedFiles files;
std::unique_ptr<IFileCachePriority> main_priority;
FileCacheRecords stash_records;
std::unique_ptr<IFileCachePriority> stash_priority;
size_t max_stash_element_size;
size_t enable_cache_hits_threshold;
Poco::Logger * log;
void loadCacheInfoIntoMemory(std::lock_guard<std::mutex> & cache_lock);
FileSegments getImpl(const Key & key, const FileSegment::Range & range, std::lock_guard<std::mutex> & cache_lock);
@ -257,11 +223,11 @@ private:
void useCell(const FileSegmentCell & cell, FileSegments & result, std::lock_guard<std::mutex> & cache_lock) const;
bool tryReserveForMainList(
const Key & key, size_t offset, size_t size, QueryContextPtr query_context, std::lock_guard<std::mutex> & cache_lock);
size_t getAvailableCacheSize() const;
void loadCacheInfoIntoMemory(std::lock_guard<std::mutex> & cache_lock);
const Key & key,
size_t offset,
size_t size,
QueryContextPtr query_context,
std::lock_guard<std::mutex> & cache_lock);
FileSegments splitRangeIntoCells(
const Key & key,
@ -289,6 +255,48 @@ private:
void assertCacheCellsCorrectness(const FileSegmentsByOffset & cells_by_offset, std::lock_guard<std::mutex> & cache_lock);
/// Used to track and control the cache access of each query.
/// It allows the cache layer to apply different policies to different queries.
struct QueryContext
{
FileCacheRecords records;
FileCachePriorityPtr priority;
size_t cache_size = 0;
size_t max_cache_size;
bool skip_download_if_exceeds_query_cache;
QueryContext(size_t max_cache_size_, bool skip_download_if_exceeds_query_cache_)
: max_cache_size(max_cache_size_)
, skip_download_if_exceeds_query_cache(skip_download_if_exceeds_query_cache_) {}
size_t getMaxCacheSize() const { return max_cache_size; }
size_t getCacheSize() const { return cache_size; }
FileCachePriorityPtr getPriority() const { return priority; }
bool isSkipDownloadIfExceed() const { return skip_download_if_exceeds_query_cache; }
void remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock);
};
using QueryContextMap = std::unordered_map<String, QueryContextPtr>;
QueryContextMap query_map;
QueryContextPtr getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock);
QueryContextPtr getQueryContext(const String & query_id, std::lock_guard<std::mutex> & cache_lock);
void removeQueryContext(const String & query_id);
QueryContextPtr getOrSetQueryContext(const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> &);
public:
void assertCacheCorrectness(const Key & key, std::lock_guard<std::mutex> & cache_lock);

View File

@ -73,6 +73,11 @@ bool BackgroundSchedulePoolTaskInfo::activateAndSchedule()
return true;
}
std::unique_lock<std::mutex> BackgroundSchedulePoolTaskInfo::getExecLock()
{
return std::unique_lock{exec_mutex};
}
void BackgroundSchedulePoolTaskInfo::execute()
{
Stopwatch watch;

View File

@ -121,6 +121,10 @@ public:
/// get Coordination::WatchCallback needed for notifications from ZooKeeper watches.
Coordination::WatchCallback getWatchCallback();
/// Returns lock that protects from concurrent task execution.
/// This lock should not be held for a long time.
std::unique_lock<std::mutex> getExecLock();
private:
friend class TaskNotification;
friend class BackgroundSchedulePool;

View File

@ -280,6 +280,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(UInt64, http_max_fields, 1000000, "Maximum number of fields in HTTP header", 0) \
M(UInt64, http_max_field_name_size, 1048576, "Maximum length of field name in HTTP header", 0) \
M(UInt64, http_max_field_value_size, 1048576, "Maximum length of field value in HTTP header", 0) \
M(UInt64, http_max_chunk_size, 100_GiB, "Maximum value of a chunk size in HTTP chunked transfer encoding", 0) \
M(Bool, http_skip_not_found_url_for_globs, true, "Skip url's for globs with HTTP_NOT_FOUND error", 0) \
M(Bool, optimize_throw_if_noop, false, "If setting is enabled and OPTIMIZE query didn't actually assign a merge then an explanatory exception is thrown", 0) \
M(Bool, use_index_for_in_with_subqueries, true, "Try using an index if there is a subquery or a table expression on the right side of the IN operator.", 0) \
@ -408,6 +409,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(UInt64, low_cardinality_max_dictionary_size, 8192, "Maximum size (in rows) of shared global dictionary for LowCardinality type.", 0) \
M(Bool, low_cardinality_use_single_dictionary_for_part, false, "LowCardinality type serialization setting. If true, then additional keys are used when the global dictionary overflows. Otherwise, several shared dictionaries are created.", 0) \
M(Bool, decimal_check_overflow, true, "Check overflow of decimal arithmetic/comparison operations", 0) \
M(Bool, allow_custom_error_code_in_throwif, false, "Enable custom error code in function throwIf(). If true, thrown exceptions may have unexpected error codes.", 0) \
\
M(Bool, prefer_localhost_replica, true, "If it's true then queries will be always sent to local replica (if it exists). If it's false then replica to send a query will be chosen between local and remote ones according to load_balancing", 0) \
M(UInt64, max_fetch_partition_retries_count, 5, "Amount of retries while fetching partition from another host.", 0) \
@ -713,7 +715,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(Bool, input_format_orc_skip_columns_with_unsupported_types_in_schema_inference, false, "Skip columns with unsupported types while schema inference for format ORC", 0) \
M(Bool, input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference, false, "Skip columns with unsupported types while schema inference for format Arrow", 0) \
M(String, column_names_for_schema_inference, "", "The list of column names to use in schema inference for formats without column names. The format: 'column1,column2,column3,...'", 0) \
M(String, schema_inference_hints, "", "The list of column names and types to use in schema inference for formats without column names. The format: 'column1,column2,column3,...'", 0) \
M(String, schema_inference_hints, "", "The list of column names and types to use in schema inference for formats without column names. The format: 'column_name1 column_type1, column_name2 column_type2, ...'", 0) \
M(Bool, input_format_json_read_bools_as_numbers, true, "Allow to parse bools as numbers in JSON input formats", 0) \
M(Bool, input_format_json_try_infer_numbers_from_strings, true, "Try to infer numbers from string fields while schema inference", 0) \
M(Bool, input_format_try_infer_integers, true, "Try to infer numbers from string fields while schema inference in text formats", 0) \

View File

@ -126,13 +126,30 @@ std::pair<String, String> DatabaseReplicated::parseFullReplicaName(const String
return {shard, replica};
}
ClusterPtr DatabaseReplicated::getCluster() const
ClusterPtr DatabaseReplicated::tryGetCluster() const
{
std::lock_guard lock{mutex};
if (cluster)
return cluster;
cluster = getClusterImpl();
/// Database is probably not created or not initialized yet, it's ok to return nullptr
if (is_readonly)
return cluster;
try
{
/// A quick fix for stateless tests with DatabaseReplicated. Its ZK
/// node can be destroyed at any time. If another test lists
/// system.clusters to get client command line suggestions, it will
/// get an error when trying to get the info about DB from ZK.
/// Just ignore these inaccessible databases. A good example of a
/// failing test is `01526_client_start_and_exit`.
cluster = getClusterImpl();
}
catch (...)
{
tryLogCurrentException(log);
}
return cluster;
}

View File

@ -60,7 +60,7 @@ public:
const String & getZooKeeperPath() const { return zookeeper_path; }
/// Returns cluster consisting of database replicas
ClusterPtr getCluster() const;
ClusterPtr tryGetCluster() const;
void drop(ContextPtr /*context*/) override;

View File

@ -611,7 +611,7 @@ public:
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
if (!isNativeNumber(arguments[1].type))
throw Exception("Second argument for function " + getName() + " (delta) must be number",
throw Exception("Second argument for function " + getName() + " (delta) must be a number",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
if (arguments.size() == 2)
@ -627,7 +627,7 @@ public:
{
throw Exception(
"Function " + getName() + " supports 2 or 3 arguments. The 1st argument "
"must be of type Date or DateTime. The 2nd argument must be number. "
"must be of type Date or DateTime. The 2nd argument must be a number. "
"The 3rd argument (optional) must be "
"a constant string with timezone name. The timezone argument is allowed "
"only when the 1st argument has the type DateTime",

View File

@ -46,7 +46,7 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isNativeNumber(arguments.front()))
throw Exception{"Argument for function " + getName() + " must be number", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
throw Exception{"Argument for function " + getName() + " must be a number", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
return std::make_shared<DataTypeUInt8>();
}

View File

@ -62,7 +62,7 @@ protected:
DataTypePtr argument_type = arguments[i].type;
if (!isNumber(argument_type))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Argument '{}' for function {} must be number", std::string(argument_names[i]), getName());
"Argument '{}' for function {} must be a number", std::string(argument_names[i]), getName());
}
}
@ -322,7 +322,7 @@ public:
const auto& fraction_argument = arguments[argument_names.size()];
if (!isNumber(fraction_argument.type))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Argument 'fraction' for function {} must be number", getName());
"Argument 'fraction' for function {} must be a number", getName());
}
/// Optional precision argument

View File

@ -4,9 +4,10 @@
#include <Columns/ColumnString.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnsCommon.h>
#include <Common/ErrorCodes.h>
#include <DataTypes/DataTypesNumber.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/Context.h>
namespace DB
{
@ -21,55 +22,56 @@ namespace ErrorCodes
namespace
{
/// The regex-based code style check script in CI complains when it sees "ErrorCodes:: ErrorCode" (space added to avoid another match).
/// Because this expression is only used in this file, don't add some suppression mechanism to the already complex style checker, instead
/// work around by creating a namespace alias.
namespace ErrorCodeAlias = ErrorCodes;
/// Throw an exception if the argument is non zero.
class FunctionThrowIf : public IFunction
{
public:
static constexpr auto name = "throwIf";
static FunctionPtr create(ContextPtr)
{
return std::make_shared<FunctionThrowIf>();
}
String getName() const override
{
return name;
}
static FunctionPtr create(ContextPtr context) { return std::make_shared<FunctionThrowIf>(context); }
explicit FunctionThrowIf(ContextPtr context_) : allow_custom_error_code_argument(context_->getSettingsRef().allow_custom_error_code_in_throwif) {}
String getName() const override { return name; }
bool isVariadic() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
size_t getNumberOfArguments() const override
{
return 0;
}
size_t getNumberOfArguments() const override { return 0; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
const size_t number_of_arguments = arguments.size();
if (number_of_arguments < 1 || number_of_arguments > 2)
if (number_of_arguments < 1 || number_of_arguments > (allow_custom_error_code_argument ? 3 : 2))
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Number of arguments for function {} doesn't match: passed {}, should be 1 or 2",
getName(),
toString(number_of_arguments));
"Number of arguments for function {} doesn't match: passed {}, should be {}",
getName(), toString(number_of_arguments), allow_custom_error_code_argument ? "1 or 2 or 3" : "1 or 2");
if (!isNativeNumber(arguments[0]))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Argument for function {} must be number",
getName());
"First argument of function {} must be a number (passed: {})", getName(), arguments[0]->getName());
if (number_of_arguments > 1 && !isString(arguments[1]))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument of function {}",
arguments[1]->getName(),
getName());
"Second argument of function {} must be a string (passed: {})", getName(), arguments[1]->getName());
if (allow_custom_error_code_argument && number_of_arguments > 2)
{
WhichDataType which(arguments[2]);
if (!(which.isInt8() || which.isInt16() || which.isInt32()))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Third argument of function {} must be Int8, Int16 or Int32 (passed: {})", getName(), arguments[2]->getName());
}
return std::make_shared<DataTypeUInt8>();
}
bool useDefaultImplementationForConstants() const override { return false; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1, 2}; }
/** Prevent constant folding for FunctionThrowIf because for short circuit evaluation
* it is unsafe to evaluate this function during DAG analysis.
@ -86,36 +88,44 @@ public:
{
const auto * message_column = checkAndGetColumnConst<ColumnString>(arguments[1].column.get());
if (!message_column)
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Second argument for function {} must be constant String",
getName());
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Second argument for function {} must be constant String", getName());
custom_message = message_column->getValue<String>();
}
std::optional<ErrorCodeAlias::ErrorCode> custom_error_code;
if (allow_custom_error_code_argument && arguments.size() == 3)
{
if (!isColumnConst(*(arguments[2].column)))
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Third argument for function {} must be constant number", getName());
custom_error_code = arguments[2].column->getInt(0);
}
auto first_argument_column = arguments.front().column;
const auto * in = first_argument_column.get();
ColumnPtr res;
if (!((res = execute<UInt8>(in, custom_message))
|| (res = execute<UInt16>(in, custom_message))
|| (res = execute<UInt32>(in, custom_message))
|| (res = execute<UInt64>(in, custom_message))
|| (res = execute<Int8>(in, custom_message))
|| (res = execute<Int16>(in, custom_message))
|| (res = execute<Int32>(in, custom_message))
|| (res = execute<Int64>(in, custom_message))
|| (res = execute<Float32>(in, custom_message))
|| (res = execute<Float64>(in, custom_message))))
if (!((res = execute<UInt8>(in, custom_message, custom_error_code))
|| (res = execute<UInt16>(in, custom_message, custom_error_code))
|| (res = execute<UInt32>(in, custom_message, custom_error_code))
|| (res = execute<UInt64>(in, custom_message, custom_error_code))
|| (res = execute<Int8>(in, custom_message, custom_error_code))
|| (res = execute<Int16>(in, custom_message, custom_error_code))
|| (res = execute<Int32>(in, custom_message, custom_error_code))
|| (res = execute<Int64>(in, custom_message, custom_error_code))
|| (res = execute<Float32>(in, custom_message, custom_error_code))
|| (res = execute<Float64>(in, custom_message, custom_error_code))))
{
throw Exception{"Illegal column " + in->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN};
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Illegal column {} of first argument of function {}", in->getName(), getName());
}
return res;
}
private:
template <typename T>
ColumnPtr execute(const IColumn * in_untyped, const std::optional<String> & message) const
ColumnPtr execute(const IColumn * in_untyped, const std::optional<String> & message, const std::optional<ErrorCodeAlias::ErrorCode> & error_code) const
{
const auto * in = checkAndGetColumn<ColumnVector<T>>(in_untyped);
@ -127,8 +137,9 @@ public:
const auto & in_data = in->getData();
if (!memoryIsZero(in_data.data(), 0, in_data.size() * sizeof(in_data[0])))
{
throw Exception(ErrorCodes::FUNCTION_THROW_IF_VALUE_IS_NON_ZERO,
message.value_or("Value passed to '" + getName() + "' function is non zero"));
throw Exception(
error_code.value_or(ErrorCodes::FUNCTION_THROW_IF_VALUE_IS_NON_ZERO),
message.value_or("Value passed to '" + getName() + "' function is non-zero"));
}
size_t result_size = in_untyped->size();
@ -139,6 +150,8 @@ public:
return nullptr;
}
bool allow_custom_error_code_argument;
};
}
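
The change above threads the optional custom error code through `execute` and falls back to the built-in code and message when the extra arguments are absent. A minimal standalone sketch of that fallback pattern (the names and the placeholder constant are illustrative, not the real values):

#include <optional>
#include <string>
#include <utility>

// Illustrative placeholder; the real code uses ErrorCodes::FUNCTION_THROW_IF_VALUE_IS_NON_ZERO.
constexpr int DEFAULT_THROWIF_ERROR_CODE = 1;

// Resolve the effective error code and message: user-supplied values win,
// otherwise the defaults are used (this mirrors the value_or calls above).
std::pair<int, std::string> resolveThrowIfError(
    const std::optional<int> & custom_code,
    const std::optional<std::string> & custom_message)
{
    return {custom_code.value_or(DEFAULT_THROWIF_ERROR_CODE),
            custom_message.value_or("Value passed to 'throwIf' function is non-zero")};
}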

View File

@ -8,6 +8,8 @@
#include <Common/Exception.h>
#include <Core/Defines.h>
#include <base/arithmeticOverflow.h>
namespace ProfileEvents
{
@ -19,6 +21,11 @@ namespace ProfileEvents
namespace DB
{
namespace ErrorCodes
{
extern const int ARGUMENT_OUT_OF_BOUND;
}
/** Replacement for std::vector<char> to use in buffers.
* Differs in that it doesn't do unneeded memset. (And also tries to do as little as possible.)
@ -38,9 +45,9 @@ struct Memory : boost::noncopyable, Allocator
Memory() = default;
/// If alignment != 0, then allocate memory aligned to specified value.
explicit Memory(size_t size_, size_t alignment_ = 0) : m_capacity(size_), m_size(m_capacity), alignment(alignment_)
explicit Memory(size_t size_, size_t alignment_ = 0) : alignment(alignment_)
{
alloc();
alloc(size_);
}
~Memory()
@ -75,57 +82,55 @@ struct Memory : boost::noncopyable, Allocator
void resize(size_t new_size)
{
if (0 == m_capacity)
if (!m_data)
{
m_size = new_size;
m_capacity = new_size;
alloc();
alloc(new_size);
return;
}
else if (new_size <= m_capacity - pad_right)
if (new_size <= m_capacity - pad_right)
{
m_size = new_size;
return;
}
else
{
size_t new_capacity = align(new_size, alignment) + pad_right;
size_t diff = new_capacity - m_capacity;
ProfileEvents::increment(ProfileEvents::IOBufferAllocBytes, diff);
size_t new_capacity = withPadding(new_size);
m_data = static_cast<char *>(Allocator::realloc(m_data, m_capacity, new_capacity, alignment));
m_capacity = new_capacity;
m_size = m_capacity - pad_right;
}
size_t diff = new_capacity - m_capacity;
ProfileEvents::increment(ProfileEvents::IOBufferAllocBytes, diff);
m_data = static_cast<char *>(Allocator::realloc(m_data, m_capacity, new_capacity, alignment));
m_capacity = new_capacity;
m_size = new_size;
}
private:
static size_t align(const size_t value, const size_t alignment)
static size_t withPadding(size_t value)
{
if (!alignment)
return value;
size_t res = 0;
if (!(value % alignment))
return value;
if (common::addOverflow<size_t>(value, pad_right, res))
throw Exception("value is too big to apply padding", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
return (value + alignment - 1) / alignment * alignment;
return res;
}
void alloc()
void alloc(size_t new_size)
{
if (!m_capacity)
if (!new_size)
{
m_data = nullptr;
return;
}
ProfileEvents::increment(ProfileEvents::IOBufferAllocs);
ProfileEvents::increment(ProfileEvents::IOBufferAllocBytes, m_capacity);
size_t new_capacity = withPadding(new_size);
ProfileEvents::increment(ProfileEvents::IOBufferAllocs);
ProfileEvents::increment(ProfileEvents::IOBufferAllocBytes, new_capacity);
size_t new_capacity = align(m_capacity, alignment) + pad_right;
m_data = static_cast<char *>(Allocator::alloc(new_capacity, alignment));
m_capacity = new_capacity;
m_size = m_capacity - pad_right;
m_size = new_size;
}
void dealloc()

View File

@ -32,6 +32,9 @@ size_t HTTPChunkedReadBuffer::readChunkHeader()
++in->position();
} while (!in->eof() && isHexDigit(*in->position()));
if (res > max_chunk_size)
throw Exception("Chunk size exceeded the limit", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
/// NOTE: If we want to read any chunk extensions, it should be done here.
skipToCarriageReturnOrEOF(*in);

View File

@ -10,9 +10,12 @@ namespace DB
class HTTPChunkedReadBuffer : public BufferWithOwnMemory<ReadBuffer>
{
public:
explicit HTTPChunkedReadBuffer(std::unique_ptr<ReadBuffer> in_) : in(std::move(in_)) {}
explicit HTTPChunkedReadBuffer(std::unique_ptr<ReadBuffer> in_, size_t max_chunk_size_)
: max_chunk_size(max_chunk_size_), in(std::move(in_))
{}
private:
const size_t max_chunk_size;
std::unique_ptr<ReadBuffer> in;
size_t readChunkHeader();

View File

@ -0,0 +1,328 @@
#include <IO/WriteHelpers.h>
#include <IO/ReadHelpers.h>
#include <IO/BufferWithOwnMemory.h>
#include <gtest/gtest.h>
#define EXPECT_THROW_ERROR_CODE(statement, expected_exception, expected_code) \
EXPECT_THROW( \
try \
{ \
statement; \
} \
catch (const expected_exception & e) \
{ \
EXPECT_EQ(expected_code, e.code()); \
throw; \
} \
, expected_exception)
namespace DB
{
namespace ErrorCodes
{
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int LOGICAL_ERROR;
extern const int CANNOT_ALLOCATE_MEMORY;
}
}
using namespace DB;
class DummyAllocator
{
void * dummy_address = reinterpret_cast<void *>(1);
public:
void * alloc(size_t size, size_t /*alignment*/ = 0)
{
checkSize(size);
if (size)
return dummy_address;
else
return nullptr;
}
void * realloc(void * /*buf*/, size_t /*old_size*/, size_t new_size, size_t /*alignment*/ = 0)
{
checkSize(new_size);
return dummy_address;
}
void free([[maybe_unused]] void * buf, size_t /*size*/)
{
assert(buf == dummy_address);
}
// the same check as in Common/Allocator.h
void static checkSize(size_t size)
{
/// More obvious exception in case of possible overflow (instead of just "Cannot mmap").
if (size >= 0x8000000000000000ULL)
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "Too large size ({}) passed to allocator. It indicates an error.", size);
}
};
TEST(MemoryResizeTest, SmallInitAndSmallResize)
{
{
auto memory = Memory<DummyAllocator>(0);
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
memory.resize(0);
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
}
{
auto memory = Memory<DummyAllocator>(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
memory.resize(0);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 0);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
}
}
TEST(MemoryResizeTest, SmallInitAndBigResizeOverflowWhenPadding)
{
{
auto memory = Memory<DummyAllocator>(0);
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
EXPECT_THROW_ERROR_CODE(memory.resize(std::numeric_limits<size_t>::max()), Exception, ErrorCodes::ARGUMENT_OUT_OF_BOUND);
ASSERT_EQ(memory.m_data, nullptr); // state is intact after exception
ASSERT_EQ(memory.m_size, 0);
ASSERT_EQ(memory.m_capacity, 0);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
memory.resize(2);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 17);
ASSERT_EQ(memory.m_size, 2);
EXPECT_THROW_ERROR_CODE(memory.resize(std::numeric_limits<size_t>::max()), Exception, ErrorCodes::ARGUMENT_OUT_OF_BOUND);
ASSERT_TRUE(memory.m_data); // state is intact after exception
ASSERT_EQ(memory.m_capacity, 17);
ASSERT_EQ(memory.m_size, 2);
memory.resize(0x8000000000000000ULL-16);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 0x8000000000000000ULL - 1);
ASSERT_EQ(memory.m_size, 0x8000000000000000ULL - 16);
#ifndef ABORT_ON_LOGICAL_ERROR
EXPECT_THROW_ERROR_CODE(memory.resize(0x8000000000000000ULL-15), Exception, ErrorCodes::LOGICAL_ERROR);
ASSERT_TRUE(memory.m_data); // state is intact after exception
ASSERT_EQ(memory.m_capacity, 0x8000000000000000ULL - 1);
ASSERT_EQ(memory.m_size, 0x8000000000000000ULL - 16);
#endif
}
{
auto memory = Memory<DummyAllocator>(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
EXPECT_THROW_ERROR_CODE(memory.resize(std::numeric_limits<size_t>::max()), Exception, ErrorCodes::ARGUMENT_OUT_OF_BOUND);
ASSERT_TRUE(memory.m_data); // state is intact after exception
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
#ifndef ABORT_ON_LOGICAL_ERROR
EXPECT_THROW_ERROR_CODE(memory.resize(0x8000000000000000ULL-15), Exception, ErrorCodes::LOGICAL_ERROR);
ASSERT_TRUE(memory.m_data); // state is intact after exception
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
#endif
}
}
TEST(MemoryResizeTest, BigInitAndSmallResizeOverflowWhenPadding)
{
{
EXPECT_THROW_ERROR_CODE(
{
auto memory = Memory<DummyAllocator>(std::numeric_limits<size_t>::max());
}
, Exception
, ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
{
EXPECT_THROW_ERROR_CODE(
{
auto memory = Memory<DummyAllocator>(std::numeric_limits<size_t>::max() - 1);
}
, Exception
, ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
{
EXPECT_THROW_ERROR_CODE(
{
auto memory = Memory<DummyAllocator>(std::numeric_limits<size_t>::max() - 10);
}
, Exception
, ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
#ifndef ABORT_ON_LOGICAL_ERROR
{
EXPECT_THROW_ERROR_CODE(
{
auto memory = Memory<DummyAllocator>(std::numeric_limits<size_t>::max() - 15);
}
, Exception
, ErrorCodes::LOGICAL_ERROR);
}
{
EXPECT_THROW_ERROR_CODE(
{
auto memory = Memory<DummyAllocator>(0x8000000000000000ULL - 15);
}
, Exception
, ErrorCodes::LOGICAL_ERROR);
}
#endif
{
auto memory = Memory<DummyAllocator>(0x8000000000000000ULL - 16);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 0x8000000000000000ULL - 1);
ASSERT_EQ(memory.m_size, 0x8000000000000000ULL - 16);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 0x8000000000000000ULL - 1);
ASSERT_EQ(memory.m_size, 1);
}
}
TEST(MemoryResizeTest, AlignmentWithRealAllocator)
{
{
auto memory = Memory<>(0, 3); // not a power of 2, but less than MALLOC_MIN_ALIGNMENT (8), so the user-defined alignment is ignored by the Allocator
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
memory.resize(2);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 17);
ASSERT_EQ(memory.m_size, 2);
memory.resize(3);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 18);
ASSERT_EQ(memory.m_size, 3);
memory.resize(4);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 19);
ASSERT_EQ(memory.m_size, 4);
memory.resize(0);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 19);
ASSERT_EQ(memory.m_size, 0);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 19);
ASSERT_EQ(memory.m_size, 1);
}
#if !defined(ADDRESS_SANITIZER) && !defined(THREAD_SANITIZER) && !defined(MEMORY_SANITIZER) && !defined(UNDEFINED_BEHAVIOR_SANITIZER)
{
auto memory = Memory<>(0, 10); // not a power of 2
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
EXPECT_THROW_ERROR_CODE(memory.resize(1), ErrnoException, ErrorCodes::CANNOT_ALLOCATE_MEMORY);
ASSERT_EQ(memory.m_data, nullptr); // state is intact after exception
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
}
#endif
{
auto memory = Memory<>(0, 32);
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
memory.resize(32);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 47);
ASSERT_EQ(memory.m_size, 32);
}
}
TEST(MemoryResizeTest, SomeAlignmentOverflowWhenAlignment)
{
{
auto memory = Memory<DummyAllocator>(0, 31);
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
memory.resize(0);
ASSERT_EQ(memory.m_data, nullptr);
ASSERT_EQ(memory.m_capacity, 0);
ASSERT_EQ(memory.m_size, 0);
memory.resize(1);
ASSERT_TRUE(memory.m_data);
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
EXPECT_THROW_ERROR_CODE(memory.resize(std::numeric_limits<size_t>::max()), Exception, ErrorCodes::ARGUMENT_OUT_OF_BOUND);
ASSERT_TRUE(memory.m_data); // state is intact after exception
ASSERT_EQ(memory.m_capacity, 16);
ASSERT_EQ(memory.m_size, 1);
}
}
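
The tests above exercise the overflow guard that the new `withPadding` adds to `Memory`: the requested size is padded by `pad_right` and allocation fails loudly instead of wrapping around. A minimal standalone sketch of that guard, using the compiler builtin that `common::addOverflow` wraps (the names and constant here are illustrative):

#include <cstddef>
#include <stdexcept>

constexpr size_t PAD_RIGHT = 15;  // illustrative; mirrors the padding idea seen in the tests above

// Pad a requested size and throw on overflow instead of silently under-allocating.
size_t withPaddingSketch(size_t value)
{
    size_t res = 0;
    if (__builtin_add_overflow(value, PAD_RIGHT, &res))  // true if value + PAD_RIGHT overflows size_t
        throw std::length_error("value is too big to apply padding");
    return res;
}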

View File

@ -1194,7 +1194,7 @@ ActionsDAGPtr ActionsDAG::merge(ActionsDAG && first, ActionsDAG && second)
if (it == first_result.end() || it->second.empty())
{
if (first.project_input)
throw Exception(ErrorCodes::LOGICAL_ERROR,
throw Exception(ErrorCodes::UNKNOWN_IDENTIFIER,
"Cannot find column {} in ActionsDAG result", input_node->result_name);
first.inputs.push_back(input_node);

View File

@ -469,7 +469,7 @@ void ZooKeeperMetadataTransaction::commit()
ClusterPtr tryGetReplicatedDatabaseCluster(const String & cluster_name)
{
if (const auto * replicated_db = dynamic_cast<const DatabaseReplicated *>(DatabaseCatalog::instance().tryGetDatabase(cluster_name).get()))
return replicated_db->getCluster();
return replicated_db->tryGetCluster();
return {};
}

View File

@ -1183,7 +1183,7 @@ void InterpreterSelectQuery::executeImpl(QueryPlan & query_plan, std::optional<P
bool to_aggregation_stage = false;
bool from_aggregation_stage = false;
/// Do I need to aggregate in a separate row rows that have not passed max_rows_to_group_by.
/// Do I need to aggregate in a separate row that has not passed max_rows_to_group_by?
bool aggregate_overflow_row =
expressions.need_aggregate &&
query.group_by_with_totals &&

View File

@ -70,6 +70,7 @@ namespace ErrorCodes
extern const int THERE_IS_NO_COLUMN;
extern const int UNKNOWN_EXCEPTION;
extern const int INCORRECT_NUMBER_OF_COLUMNS;
extern const int INCORRECT_DATA;
}
/// Inserts numeric data right into internal column data to reduce an overhead
@ -266,6 +267,9 @@ static ColumnWithTypeAndName readColumnWithDecimalData(std::shared_ptr<arrow::Ch
/// Creates a null bytemap from arrow's null bitmap
static ColumnPtr readByteMapFromArrowColumn(std::shared_ptr<arrow::ChunkedArray> & arrow_column)
{
if (!arrow_column->null_count())
return ColumnUInt8::create(arrow_column->length(), 0);
auto nullmap_column = ColumnUInt8::create();
PaddedPODArray<UInt8> & bytemap_data = assert_cast<ColumnVector<UInt8> &>(*nullmap_column).getData();
bytemap_data.reserve(arrow_column->length());
@ -298,14 +302,121 @@ static ColumnPtr readOffsetsFromArrowListColumn(std::shared_ptr<arrow::ChunkedAr
return offsets_column;
}
static ColumnPtr readColumnWithIndexesData(std::shared_ptr<arrow::ChunkedArray> & arrow_column)
/*
* Arrow Dictionary and ClickHouse LowCardinality types are a bit different.
* Dictionary(Nullable(X)) in ArrowColumn format is composed of a nullmap, dictionary and an index.
* It doesn't have the concept of null or default values.
* An empty string is just a regular value appended at any position of the dictionary.
* Null values have an index of 0, but it should be ignored since the nullmap will return null.
* In ClickHouse LowCardinality it's different: the dictionary contains null (if the dictionary type is Nullable)
* and the default value at the beginning, i.e. [default, ...] where the default value has index 0, or
* [null, default, ...] where null has index 0 and the default value has index 1.
* So, we should remap indexes while converting Arrow Dictionary to ClickHouse LowCardinality
* */
template <typename NumericType, typename VectorType = ColumnVector<NumericType>>
static ColumnWithTypeAndName readColumnWithIndexesDataImpl(std::shared_ptr<arrow::ChunkedArray> & arrow_column, const String & column_name, Int64 default_value_index, NumericType dict_size, bool is_nullable)
{
auto internal_type = std::make_shared<DataTypeNumber<NumericType>>();
auto internal_column = internal_type->createColumn();
auto & column_data = static_cast<VectorType &>(*internal_column).getData();
column_data.reserve(arrow_column->length());
NumericType shift = is_nullable ? 2 : 1;
for (size_t chunk_i = 0, num_chunks = static_cast<size_t>(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i)
{
std::shared_ptr<arrow::Array> chunk = arrow_column->chunk(chunk_i);
if (chunk->length() == 0)
continue;
/// buffers[0] is a null bitmap and buffers[1] are actual values
std::shared_ptr<arrow::Buffer> buffer = chunk->data()->buffers[1];
const auto * data = reinterpret_cast<const NumericType *>(buffer->data());
/// Check that indexes are correct (protection against corrupted files)
for (int64_t i = 0; i != chunk->length(); ++i)
{
if (data[i] < 0 || data[i] >= dict_size)
throw Exception(ErrorCodes::INCORRECT_DATA, "Index {} in Dictionary column is out of bounds, dictionary size is {}", Int64(data[i]), UInt64(dict_size));
}
/// If the dictionary type is not nullable and the Arrow dictionary contains the default value
/// at index 0, we don't need to remap anything (this is the case when the data
/// was generated by ClickHouse).
if (!is_nullable && default_value_index == 0)
{
column_data.insert_assume_reserved(data, data + chunk->length());
}
/// If the dictionary doesn't contain the default value, we should shift all indexes
/// to the right by one position (or two if the dictionary is Nullable).
/// Example:
/// Dictionary:
/// dict: ["one", "two"]
/// indexes: [0, 1, 0]
/// LowCardinality:
/// dict: ["", "one", "two"]
/// indexes: [1, 2, 1]
/// LowCardinality(Nullable):
/// dict: [null, "", "one", "two"]
/// indexes: [2, 3, 2]
else if (default_value_index == -1)
{
for (int64_t i = 0; i != chunk->length(); ++i)
{
if (chunk->IsNull(i))
column_data.push_back(0);
else
column_data.push_back(data[i] + shift);
}
}
/// If the dictionary contains the default value, we change all indexes pointing to it to
/// 0 (or 1 if the dictionary type is Nullable), shift all indexes that are
/// less than the default value index to the right by one position (or two
/// if the dictionary is Nullable), and shift all indexes that are greater
/// than the default value index by zero positions (or one if the dictionary
/// is Nullable).
/// Example:
/// Dictionary:
/// dict: ["one", "two", "", "three"]
/// indexes: [0, 1, 2, 3, 0]
/// LowCardinality :
/// dict: ["", "one", "two", "three"]
/// indexes: [1, 2, 0, 3, 1]
/// LowCardinality(Nullable):
/// dict: [null, "", "one", "two", "three"]
/// indexes: [2, 3, 1, 4, 2]
else
{
NumericType new_default_index = is_nullable ? 1 : 0;
NumericType default_index = NumericType(default_value_index);
for (int64_t i = 0; i != chunk->length(); ++i)
{
if (chunk->IsNull(i))
column_data.push_back(0);
else
{
NumericType value = data[i];
if (value == default_index)
value = new_default_index;
else if (value < default_index)
value += shift;
else
value += shift - 1;
column_data.push_back(value);
}
}
}
}
return {std::move(internal_column), std::move(internal_type), column_name};
}
static ColumnPtr readColumnWithIndexesData(std::shared_ptr<arrow::ChunkedArray> & arrow_column, Int64 default_value_index, UInt64 dict_size, bool is_nullable)
{
switch (arrow_column->type()->id())
{
# define DISPATCH(ARROW_NUMERIC_TYPE, CPP_NUMERIC_TYPE) \
case ARROW_NUMERIC_TYPE: \
{ \
return readColumnWithNumericData<CPP_NUMERIC_TYPE>(arrow_column, "").column; \
return readColumnWithIndexesDataImpl<CPP_NUMERIC_TYPE>(arrow_column, "", default_value_index, dict_size, is_nullable).column; \
}
FOR_ARROW_INDEXES_TYPES(DISPATCH)
# undef DISPATCH
@ -327,85 +438,25 @@ static std::shared_ptr<arrow::ChunkedArray> getNestedArrowColumn(std::shared_ptr
return std::make_shared<arrow::ChunkedArray>(array_vector);
}
static ColumnWithTypeAndName createLCColumnFromArrowDictionaryValues(
const std::shared_ptr<ColumnWithTypeAndName> & dict_values,
const ColumnPtr & indexes_column,
const String & column_name
)
{
auto lc_type = std::make_shared<DataTypeLowCardinality>(dict_values->type);
auto lc_column = lc_type->createColumn();
for (auto i = 0u; i < indexes_column->size(); i++)
{
Field f;
dict_values->column->get(indexes_column->getUInt(i), f);
lc_column->insert(f);
}
return {std::move(lc_column), std::move(lc_type), column_name};
}
/*
* Dictionary(Nullable(X)) in ArrowColumn format is composed of a nullmap, dictionary and an index.
* It doesn't have the concept of null or default values.
* An empty string is just a regular value appended at any position of the dictionary.
* Null values have an index of 0, but it should be ignored since the nullmap will return null.
* In ClickHouse LowCardinality, it's different. The dictionary contains null and default values at the beginning.
* [null, default, ...]. Therefore, null values have an index of 0 and default values have an index of 1.
* No nullmap is used.
* */
static ColumnWithTypeAndName createLCOfNullableColumnFromArrowDictionaryValues(
const std::shared_ptr<ColumnWithTypeAndName> & dict_values,
const ColumnPtr & indexes_column,
const ColumnPtr & nullmap_column,
const String & column_name
)
{
/*
* ArrowColumn format handles nulls by maintaining a nullmap column, there is no nullable type.
* Therefore, dict_values->type is the actual data type/ non-nullable. It needs to be transformed into nullable
* so LC column is created from nullable type and a null value at the beginning of the collection
* is automatically added.
* */
auto lc_type = std::make_shared<DataTypeLowCardinality>(makeNullable(dict_values->type));
auto lc_column = lc_type->createColumn();
for (auto i = 0u; i < indexes_column->size(); i++)
{
if (nullmap_column && nullmap_column->getBool(i))
{
lc_column->insertDefault();
}
else
{
Field f;
dict_values->column->get(indexes_column->getUInt(i), f);
lc_column->insert(f);
}
}
return {std::move(lc_column), std::move(lc_type), column_name};
}
static ColumnWithTypeAndName readColumnFromArrowColumn(
std::shared_ptr<arrow::ChunkedArray> & arrow_column,
const std::string & column_name,
const std::string & format_name,
bool is_nullable,
std::unordered_map<String, std::shared_ptr<ColumnWithTypeAndName>> & dictionary_values,
bool read_ints_as_dates,
std::unordered_map<String, ArrowColumnToCHColumn::DictionaryInfo> & dictionary_infos,
bool allow_null_type,
bool skip_columns_with_unsupported_types,
bool & skipped)
bool & skipped,
DataTypePtr type_hint = nullptr)
{
if (!is_nullable && arrow_column->null_count() && arrow_column->type()->id() != arrow::Type::LIST
if (!is_nullable && (arrow_column->null_count() || (type_hint && type_hint->isNullable())) && arrow_column->type()->id() != arrow::Type::LIST
&& arrow_column->type()->id() != arrow::Type::MAP && arrow_column->type()->id() != arrow::Type::STRUCT &&
arrow_column->type()->id() != arrow::Type::DICTIONARY)
{
auto nested_column = readColumnFromArrowColumn(arrow_column, column_name, format_name, true, dictionary_values, read_ints_as_dates, allow_null_type, skip_columns_with_unsupported_types, skipped);
DataTypePtr nested_type_hint;
if (type_hint)
nested_type_hint = removeNullable(type_hint);
auto nested_column = readColumnFromArrowColumn(arrow_column, column_name, format_name, true, dictionary_infos, allow_null_type, skip_columns_with_unsupported_types, skipped, nested_type_hint);
if (skipped)
return {};
auto nullmap_column = readByteMapFromArrowColumn(arrow_column);
@ -435,14 +486,14 @@ static ColumnWithTypeAndName readColumnFromArrowColumn(
case arrow::Type::UINT16:
{
auto column = readColumnWithNumericData<UInt16>(arrow_column, column_name);
if (read_ints_as_dates)
if (type_hint && (isDateOrDate32(type_hint) || isDateTime(type_hint) || isDateTime64(type_hint)))
column.type = std::make_shared<DataTypeDate>();
return column;
}
case arrow::Type::UINT32:
{
auto column = readColumnWithNumericData<UInt32>(arrow_column, column_name);
if (read_ints_as_dates)
if (type_hint && (isDateOrDate32(type_hint) || isDateTime(type_hint) || isDateTime64(type_hint)))
column.type = std::make_shared<DataTypeDateTime>();
return column;
}
@ -454,8 +505,15 @@ static ColumnWithTypeAndName readColumnFromArrowColumn(
return readColumnWithDecimalData<arrow::Decimal256Array>(arrow_column, column_name);
case arrow::Type::MAP:
{
DataTypePtr nested_type_hint;
if (type_hint)
{
const auto * map_type_hint = typeid_cast<const DataTypeMap *>(type_hint.get());
if (map_type_hint)
nested_type_hint = assert_cast<const DataTypeArray *>(map_type_hint->getNestedType().get())->getNestedType();
}
auto arrow_nested_column = getNestedArrowColumn(arrow_column);
auto nested_column = readColumnFromArrowColumn(arrow_nested_column, column_name, format_name, false, dictionary_values, read_ints_as_dates, allow_null_type, skip_columns_with_unsupported_types, skipped);
auto nested_column = readColumnFromArrowColumn(arrow_nested_column, column_name, format_name, false, dictionary_infos, allow_null_type, skip_columns_with_unsupported_types, skipped, nested_type_hint);
if (skipped)
return {};
@ -469,8 +527,15 @@ static ColumnWithTypeAndName readColumnFromArrowColumn(
}
case arrow::Type::LIST:
{
DataTypePtr nested_type_hint;
if (type_hint)
{
const auto * array_type_hint = typeid_cast<const DataTypeArray *>(type_hint.get());
if (array_type_hint)
nested_type_hint = array_type_hint->getNestedType();
}
auto arrow_nested_column = getNestedArrowColumn(arrow_column);
auto nested_column = readColumnFromArrowColumn(arrow_nested_column, column_name, format_name, false, dictionary_values, read_ints_as_dates, allow_null_type, skip_columns_with_unsupported_types, skipped);
auto nested_column = readColumnFromArrowColumn(arrow_nested_column, column_name, format_name, false, dictionary_infos, allow_null_type, skip_columns_with_unsupported_types, skipped, nested_type_hint);
if (skipped)
return {};
auto offsets_column = readOffsetsFromArrowListColumn(arrow_column);
@ -493,11 +558,25 @@ static ColumnWithTypeAndName readColumnFromArrowColumn(
Columns tuple_elements;
DataTypes tuple_types;
std::vector<String> tuple_names;
const auto * tuple_type_hint = type_hint ? typeid_cast<const DataTypeTuple *>(type_hint.get()) : nullptr;
for (int i = 0; i != arrow_struct_type->num_fields(); ++i)
{
auto field_name = arrow_struct_type->field(i)->name();
DataTypePtr nested_type_hint;
if (tuple_type_hint)
{
if (tuple_type_hint->haveExplicitNames())
{
auto pos = tuple_type_hint->tryGetPositionByName(field_name);
if (pos)
nested_type_hint = tuple_type_hint->getElement(*pos);
}
else if (size_t(i) < tuple_type_hint->getElements().size())
nested_type_hint = tuple_type_hint->getElement(i);
}
auto nested_arrow_column = std::make_shared<arrow::ChunkedArray>(nested_arrow_columns[i]);
auto element = readColumnFromArrowColumn(nested_arrow_column, arrow_struct_type->field(i)->name(), format_name, false, dictionary_values, read_ints_as_dates, allow_null_type, skip_columns_with_unsupported_types, skipped);
auto element = readColumnFromArrowColumn(nested_arrow_column, field_name, format_name, false, dictionary_infos, allow_null_type, skip_columns_with_unsupported_types, skipped, nested_type_hint);
if (skipped)
return {};
tuple_elements.emplace_back(std::move(element.column));
@ -511,9 +590,11 @@ static ColumnWithTypeAndName readColumnFromArrowColumn(
}
case arrow::Type::DICTIONARY:
{
auto & dict_values = dictionary_values[column_name];
auto & dict_info = dictionary_infos[column_name];
const auto is_lc_nullable = arrow_column->null_count() > 0 || (type_hint && type_hint->isLowCardinalityNullable());
/// Load dictionary values only once and reuse them.
if (!dict_values)
if (!dict_info.values)
{
arrow::ArrayVector dict_array;
for (size_t chunk_i = 0, num_chunks = static_cast<size_t>(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i)
@ -522,8 +603,22 @@ static ColumnWithTypeAndName readColumnFromArrowColumn(
dict_array.emplace_back(dict_chunk.dictionary());
}
auto arrow_dict_column = std::make_shared<arrow::ChunkedArray>(dict_array);
auto dict_column = readColumnFromArrowColumn(arrow_dict_column, column_name, format_name, false, dictionary_values, read_ints_as_dates, allow_null_type, skip_columns_with_unsupported_types, skipped);
dict_values = std::make_shared<ColumnWithTypeAndName>(std::move(dict_column));
auto dict_column = readColumnFromArrowColumn(arrow_dict_column, column_name, format_name, false, dictionary_infos, allow_null_type, skip_columns_with_unsupported_types, skipped);
for (size_t i = 0; i != dict_column.column->size(); ++i)
{
if (dict_column.column->isDefaultAt(i))
{
dict_info.default_value_index = i;
break;
}
}
auto lc_type = std::make_shared<DataTypeLowCardinality>(is_lc_nullable ? makeNullable(dict_column.type) : dict_column.type);
auto tmp_lc_column = lc_type->createColumn();
auto tmp_dict_column = IColumn::mutate(assert_cast<ColumnLowCardinality *>(tmp_lc_column.get())->getDictionaryPtr());
dynamic_cast<IColumnUnique *>(tmp_dict_column.get())->uniqueInsertRangeFrom(*dict_column.column, 0, dict_column.column->size());
dict_column.column = std::move(tmp_dict_column);
dict_info.values = std::make_shared<ColumnWithTypeAndName>(std::move(dict_column));
dict_info.dictionary_size = arrow_dict_column->length();
}
arrow::ArrayVector indexes_array;
@ -534,20 +629,10 @@ static ColumnWithTypeAndName readColumnFromArrowColumn(
}
auto arrow_indexes_column = std::make_shared<arrow::ChunkedArray>(indexes_array);
auto indexes_column = readColumnWithIndexesData(arrow_indexes_column);
const auto contains_null = arrow_column->null_count() > 0;
if (contains_null)
{
auto nullmap_column = readByteMapFromArrowColumn(arrow_column);
return createLCOfNullableColumnFromArrowDictionaryValues(dict_values, indexes_column, nullmap_column, column_name);
}
else
{
return createLCColumnFromArrowDictionaryValues(dict_values, indexes_column, column_name);
}
auto indexes_column = readColumnWithIndexesData(arrow_indexes_column, dict_info.default_value_index, dict_info.dictionary_size, is_lc_nullable);
auto lc_column = ColumnLowCardinality::create(dict_info.values->column, indexes_column);
auto lc_type = std::make_shared<DataTypeLowCardinality>(is_lc_nullable ? makeNullable(dict_info.values->type) : dict_info.values->type);
return {std::move(lc_column), std::move(lc_type), column_name};
}
# define DISPATCH(ARROW_NUMERIC_TYPE, CPP_NUMERIC_TYPE) \
case ARROW_NUMERIC_TYPE: \
@ -623,13 +708,13 @@ Block ArrowColumnToCHColumn::arrowSchemaToCHHeader(
arrow::ArrayVector array_vector = {arrow_array};
auto arrow_column = std::make_shared<arrow::ChunkedArray>(array_vector);
std::unordered_map<std::string, std::shared_ptr<ColumnWithTypeAndName>> dict_values;
std::unordered_map<std::string, DictionaryInfo> dict_infos;
bool skipped = false;
bool allow_null_type = false;
if (hint_header && hint_header->has(field->name()) && hint_header->getByName(field->name()).type->isNullable())
allow_null_type = true;
ColumnWithTypeAndName sample_column = readColumnFromArrowColumn(
arrow_column, field->name(), format_name, false, dict_values, false, allow_null_type, skip_columns_with_unsupported_types, skipped);
arrow_column, field->name(), format_name, false, dict_infos, allow_null_type, skip_columns_with_unsupported_types, skipped);
if (!skipped)
sample_columns.emplace_back(std::move(sample_column));
}
@ -700,9 +785,17 @@ void ArrowColumnToCHColumn::arrowColumnsToCHChunk(Chunk & res, NameToColumnPtr &
{
if (!nested_tables.contains(search_nested_table_name))
{
NamesAndTypesList nested_columns;
for (const auto & name_and_type : header.getNamesAndTypesList())
{
if (name_and_type.name.starts_with(nested_table_name + "."))
nested_columns.push_back(name_and_type);
}
auto nested_table_type = Nested::collect(nested_columns).front().type;
std::shared_ptr<arrow::ChunkedArray> arrow_column = name_to_column_ptr[search_nested_table_name];
ColumnsWithTypeAndName cols = {readColumnFromArrowColumn(
arrow_column, nested_table_name, format_name, false, dictionary_values, true, true, false, skipped)};
arrow_column, nested_table_name, format_name, false, dictionary_infos, true, false, skipped, nested_table_type)};
BlockPtr block_ptr = std::make_shared<Block>(cols);
auto column_extractor = std::make_shared<NestedColumnExtractHelper>(*block_ptr, case_insensitive_matching);
nested_tables[search_nested_table_name] = {block_ptr, column_extractor};
@ -735,7 +828,7 @@ void ArrowColumnToCHColumn::arrowColumnsToCHChunk(Chunk & res, NameToColumnPtr &
{
auto arrow_column = name_to_column_ptr[search_column_name];
column = readColumnFromArrowColumn(
arrow_column, header_column.name, format_name, false, dictionary_values, true, true, false, skipped);
arrow_column, header_column.name, format_name, false, dictionary_infos, true, false, skipped, header_column.type);
}
try

View File

@ -44,6 +44,14 @@ public:
const Block * hint_header = nullptr,
bool ignore_case = false);
struct DictionaryInfo
{
std::shared_ptr<ColumnWithTypeAndName> values;
Int64 default_value_index = -1;
UInt64 dictionary_size;
};
private:
const Block & header;
const std::string format_name;
@ -55,7 +63,7 @@ private:
/// Map {column name : dictionary column}.
/// To avoid converting the dictionary from Arrow Dictionary
/// to LowCardinality for every chunk, we save it and reuse it.
std::unordered_map<std::string, std::shared_ptr<ColumnWithTypeAndName>> dictionary_values;
std::unordered_map<std::string, DictionaryInfo> dictionary_infos;
};
}

View File

@ -235,27 +235,30 @@ namespace DB
}
template<typename T>
static PaddedPODArray<Int64> extractIndexesImpl(ColumnPtr column, size_t start, size_t end)
static PaddedPODArray<Int64> extractIndexesImpl(ColumnPtr column, size_t start, size_t end, bool shift)
{
const PaddedPODArray<T> & data = assert_cast<const ColumnVector<T> *>(column.get())->getData();
PaddedPODArray<Int64> result;
result.reserve(end - start);
std::transform(data.begin() + start, data.begin() + end, std::back_inserter(result), [](T value) { return Int64(value); });
if (shift)
std::transform(data.begin() + start, data.begin() + end, std::back_inserter(result), [](T value) { return Int64(value) - 1; });
else
std::transform(data.begin() + start, data.begin() + end, std::back_inserter(result), [](T value) { return Int64(value); });
return result;
}
static PaddedPODArray<Int64> extractIndexesImpl(ColumnPtr column, size_t start, size_t end)
static PaddedPODArray<Int64> extractIndexesImpl(ColumnPtr column, size_t start, size_t end, bool shift)
{
switch (column->getDataType())
{
case TypeIndex::UInt8:
return extractIndexesImpl<UInt8>(column, start, end);
return extractIndexesImpl<UInt8>(column, start, end, shift);
case TypeIndex::UInt16:
return extractIndexesImpl<UInt16>(column, start, end);
return extractIndexesImpl<UInt16>(column, start, end, shift);
case TypeIndex::UInt32:
return extractIndexesImpl<UInt32>(column, start, end);
return extractIndexesImpl<UInt32>(column, start, end, shift);
case TypeIndex::UInt64:
return extractIndexesImpl<UInt64>(column, start, end);
return extractIndexesImpl<UInt64>(column, start, end, shift);
default:
throw Exception(fmt::format("Indexes column must be ColumnUInt, got {}.", column->getName()),
ErrorCodes::LOGICAL_ERROR);
@ -267,7 +270,7 @@ namespace DB
const String & column_name,
ColumnPtr & column,
const std::shared_ptr<const IDataType> & column_type,
const PaddedPODArray<UInt8> * null_bytemap,
const PaddedPODArray<UInt8> *,
arrow::ArrayBuilder * array_builder,
String format_name,
size_t start,
@ -278,6 +281,7 @@ namespace DB
const auto * column_lc = assert_cast<const ColumnLowCardinality *>(column.get());
arrow::DictionaryBuilder<ValueType> * builder = assert_cast<arrow::DictionaryBuilder<ValueType> *>(array_builder);
auto & dict_values = dictionary_values[column_name];
bool is_nullable = column_type->isLowCardinalityNullable();
/// Convert dictionary from LowCardinality to Arrow dictionary only once and then reuse it.
if (!dict_values)
@ -288,9 +292,9 @@ namespace DB
arrow::Status status = MakeBuilder(pool, value_type, &values_builder);
checkStatus(status, column->getName(), format_name);
auto dict_column = column_lc->getDictionary().getNestedColumn();
const auto & dict_type = assert_cast<const DataTypeLowCardinality *>(column_type.get())->getDictionaryType();
fillArrowArray(column_name, dict_column, dict_type, nullptr, values_builder.get(), format_name, 0, dict_column->size(), output_string_as_string, dictionary_values);
auto dict_column = column_lc->getDictionary().getNestedNotNullableColumn();
const auto & dict_type = removeNullable(assert_cast<const DataTypeLowCardinality *>(column_type.get())->getDictionaryType());
fillArrowArray(column_name, dict_column, dict_type, nullptr, values_builder.get(), format_name, is_nullable, dict_column->size(), output_string_as_string, dictionary_values);
status = values_builder->Finish(&dict_values);
checkStatus(status, column->getName(), format_name);
}
@ -300,15 +304,14 @@ namespace DB
/// AppendIndices in DictionaryBuilder works only with int64_t data, so we cannot use
/// fillArrowArray here and should copy all indexes to int64_t container.
auto indexes = extractIndexesImpl(column_lc->getIndexesPtr(), start, end);
auto indexes = extractIndexesImpl(column_lc->getIndexesPtr(), start, end, is_nullable);
const uint8_t * arrow_null_bytemap_raw_ptr = nullptr;
PaddedPODArray<uint8_t> arrow_null_bytemap;
if (null_bytemap)
if (column_type->isLowCardinalityNullable())
{
/// Invert values since Arrow interprets 1 as a non-null value, while CH interprets it as null
arrow_null_bytemap.reserve(end - start);
for (size_t i = start; i < end; ++i)
arrow_null_bytemap.emplace_back(!(*null_bytemap)[i]);
arrow_null_bytemap.emplace_back(!column_lc->isNullAt(i));
arrow_null_bytemap_raw_ptr = arrow_null_bytemap.data();
}
@ -680,7 +683,7 @@ namespace DB
{
auto nested_type = assert_cast<const DataTypeLowCardinality *>(column_type.get())->getDictionaryType();
const auto * lc_column = assert_cast<const ColumnLowCardinality *>(column.get());
const auto & nested_column = lc_column->getDictionaryPtr();
const auto & nested_column = lc_column->getDictionary().getNestedColumn();
const auto & indexes_column = lc_column->getIndexesPtr();
return arrow::dictionary(
getArrowTypeForLowCardinalityIndexes(indexes_column),

View File

@ -46,7 +46,7 @@ HTTPServerRequest::HTTPServerRequest(ContextPtr context, HTTPServerResponse & re
readRequest(*in); /// Try parse according to RFC7230
if (getChunkedTransferEncoding())
stream = std::make_unique<HTTPChunkedReadBuffer>(std::move(in));
stream = std::make_unique<HTTPChunkedReadBuffer>(std::move(in), context->getSettingsRef().http_max_chunk_size);
else if (hasContentLength())
stream = std::make_unique<LimitReadBuffer>(std::move(in), getContentLength(), false);
else if (getMethod() != HTTPRequest::HTTP_GET && getMethod() != HTTPRequest::HTTP_HEAD && getMethod() != HTTPRequest::HTTP_DELETE)

View File

@ -79,10 +79,18 @@ NameSet IMergedBlockOutputStream::removeEmptyColumnsFromPart(
}
/// Remove files on disk and checksums
for (const String & removed_file : remove_files)
for (auto itr = remove_files.begin(); itr != remove_files.end();)
{
if (checksums.files.contains(removed_file))
checksums.files.erase(removed_file);
if (checksums.files.contains(*itr))
{
checksums.files.erase(*itr);
++itr;
}
else /// If the file is not in checksums, it doesn't exist on disk
{
LOG_TRACE(storage.log, "File {} doesn't exist in checksums, so it doesn't exist on disk; will not try to remove it", *itr);
itr = remove_files.erase(itr);
}
}
/// Remove columns from columns array
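
The rewritten loop above relies on the standard erase-while-iterating idiom for ordered containers: `erase` returns the iterator to the next element, so the iterator is only advanced manually on the keep branch. A tiny generic sketch of the same idiom (container and names are illustrative):

#include <set>
#include <string>

// Keep only the entries that are present in `known`; drop the rest in place.
void dropUnknown(std::set<std::string> & files, const std::set<std::string> & known)
{
    for (auto it = files.begin(); it != files.end();)
    {
        if (known.contains(*it))
            ++it;                   // keep and advance
        else
            it = files.erase(it);   // erase returns the next valid iterator
    }
}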

View File

@ -703,11 +703,11 @@ size_t MergeTreeBaseSelectProcessor::estimateMaxBatchSizeForHugeRanges()
{
/// This is an empirical number, chosen because
/// we have adaptive granularity by default.
const size_t average_granule_size_bytes = 8UL * 1024 * 1024 * 10; // 10 MiB
const size_t average_granule_size_bytes = 1024 * 1024 * 10; // 10 MiB
/// We want to have one RTT per one gigabyte of data read from disk
/// this could be configurable.
const size_t max_size_for_one_request = 8UL * 1024 * 1024 * 1024; // 1 GiB
const size_t max_size_for_one_request = 1024 * 1024 * 1024; // 1 GiB
size_t sum_average_marks_size = 0;
/// getColumnSize is not fully implemented for compact parts

View File

@ -94,7 +94,7 @@ void MergedBlockOutputStream::Finalizer::Impl::finish()
{
writer.finish(sync);
for (const auto & file_name: files_to_remove_after_finish)
for (const auto & file_name : files_to_remove_after_finish)
data_part_storage_builder->removeFile(file_name);
for (auto & file : written_files)

View File

@ -70,23 +70,21 @@ void ReplicatedMergeTreePartCheckThread::enqueuePart(const String & name, time_t
void ReplicatedMergeTreePartCheckThread::cancelRemovedPartsCheck(const MergeTreePartInfo & drop_range_info)
{
/// Wait for running tasks to finish and temporarily stop checking
stop();
SCOPE_EXIT({ start(); });
auto pause_checking_parts = task->getExecLock();
std::lock_guard lock(parts_mutex);
for (auto it = parts_queue.begin(); it != parts_queue.end();)
{
std::lock_guard lock(parts_mutex);
for (auto it = parts_queue.begin(); it != parts_queue.end();)
if (drop_range_info.contains(MergeTreePartInfo::fromPartName(it->first, storage.format_version)))
{
if (drop_range_info.contains(MergeTreePartInfo::fromPartName(it->first, storage.format_version)))
{
/// Remove part from the queue to avoid part resurrection
/// if we will check it and enqueue fetch after DROP/REPLACE execution.
parts_set.erase(it->first);
it = parts_queue.erase(it);
}
else
{
++it;
}
/// Remove part from the queue to avoid part resurrection
/// if we will check it and enqueue fetch after DROP/REPLACE execution.
parts_set.erase(it->first);
it = parts_queue.erase(it);
}
else
{
++it;
}
}
}

View File

@ -1024,7 +1024,7 @@ void ReplicatedMergeTreeQueue::removePartProducingOpsInRange(
[[maybe_unused]] bool called_from_alter_query_directly = covering_entry && covering_entry->replace_range_entry
&& covering_entry->replace_range_entry->columns_version < 0;
[[maybe_unused]] bool called_for_broken_part = !covering_entry;
assert(currently_executing_drop_or_replace_range || called_from_alter_query_directly || called_for_broken_part);
assert(currently_executing_drop_replace_ranges.contains(part_info) || called_from_alter_query_directly || called_for_broken_part);
for (Queue::iterator it = queue.begin(); it != queue.end();)
{
@ -1367,15 +1367,26 @@ bool ReplicatedMergeTreeQueue::shouldExecuteLogEntry(
/// DROP_RANGE and REPLACE_RANGE entries remove other entries, which produce parts in the range.
/// If such part-producing operations are currently executing, then DROP/REPLACE RANGE waits for them to finish.
/// Deadlock is possible if multiple DROP/REPLACE RANGE entries are executing in parallel and wait for each other.
/// But it should not happen if ranges are disjoint.
/// See also removePartProducingOpsInRange(...) and ReplicatedMergeTreeQueue::CurrentlyExecuting.
if (currently_executing_drop_or_replace_range)
if (auto drop_range = entry.getDropRange(format_version))
{
out_postpone_reason = fmt::format(
"Not executing log entry {} of type {} for part {} "
"because another DROP_RANGE or REPLACE_RANGE entry are currently executing.",
entry.znode_name, entry.typeToString(), entry.new_part_name);
LOG_TRACE(log, fmt::runtime(out_postpone_reason));
return false;
auto drop_range_info = MergeTreePartInfo::fromPartName(*drop_range, format_version);
for (const auto & info : currently_executing_drop_replace_ranges)
{
if (drop_range_info.isDisjoint(info))
continue;
out_postpone_reason = fmt::format(
"Not executing log entry {} of type {} for part {} "
"because another DROP_RANGE or REPLACE_RANGE entry with not disjoint range {} is currently executing.",
entry.znode_name,
entry.typeToString(),
entry.new_part_name,
info.getPartName());
LOG_TRACE(log, fmt::runtime(out_postpone_reason));
return false;
}
}
if (entry.isDropPart(format_version))
@ -1442,10 +1453,11 @@ ReplicatedMergeTreeQueue::CurrentlyExecuting::CurrentlyExecuting(
const ReplicatedMergeTreeQueue::LogEntryPtr & entry_, ReplicatedMergeTreeQueue & queue_, std::unique_lock<std::mutex> & /* state_lock */)
: entry(entry_), queue(queue_)
{
if (entry->type == ReplicatedMergeTreeLogEntry::DROP_RANGE || entry->type == ReplicatedMergeTreeLogEntry::REPLACE_RANGE)
if (auto drop_range = entry->getDropRange(queue.format_version))
{
assert(!queue.currently_executing_drop_or_replace_range);
queue.currently_executing_drop_or_replace_range = true;
auto drop_range_info = MergeTreePartInfo::fromPartName(*drop_range, queue.format_version);
[[maybe_unused]] bool inserted = queue.currently_executing_drop_replace_ranges.emplace(drop_range_info).second;
assert(inserted);
}
entry->currently_executing = true;
++entry->num_tries;
@ -1497,10 +1509,11 @@ ReplicatedMergeTreeQueue::CurrentlyExecuting::~CurrentlyExecuting()
{
std::lock_guard lock(queue.state_mutex);
if (entry->type == ReplicatedMergeTreeLogEntry::DROP_RANGE || entry->type == ReplicatedMergeTreeLogEntry::REPLACE_RANGE)
if (auto drop_range = entry->getDropRange(queue.format_version))
{
assert(queue.currently_executing_drop_or_replace_range);
queue.currently_executing_drop_or_replace_range = false;
auto drop_range_info = MergeTreePartInfo::fromPartName(*drop_range, queue.format_version);
[[maybe_unused]] bool removed = queue.currently_executing_drop_replace_ranges.erase(drop_range_info);
assert(removed);
}
entry->currently_executing = false;
entry->execution_complete.notify_all();
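
The change above replaces the single `currently_executing_drop_or_replace_range` flag with a set of currently executing ranges, so DROP_RANGE/REPLACE_RANGE entries may run in parallel as long as their part ranges do not intersect. A minimal sketch of the intent of the disjointness check, using the same helpers that appear in the hunk (the part names and format version are made up for illustration):

// Illustrative only: an incoming DROP_RANGE is postponed while a non-disjoint one is running.
auto running  = MergeTreePartInfo::fromPartName("all_0_10_999999999", format_version);
auto incoming = MergeTreePartInfo::fromPartName("all_5_7_999999999", format_version);
if (!incoming.isDisjoint(running))
{
    /// Postpone: another DROP_RANGE/REPLACE_RANGE with an intersecting range is executing.
}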

View File

@ -96,7 +96,7 @@ private:
FuturePartsSet future_parts;
/// Avoid parallel execution of queue entries, which may remove other entries from the queue.
bool currently_executing_drop_or_replace_range = false;
std::set<MergeTreePartInfo> currently_executing_drop_replace_ranges;
/** What will be the set of active parts after executing all log entries up to log_pointer.
* Used to determine which merges can be assigned (see ReplicatedMergeTreeMergePredicate)

View File

@ -151,13 +151,13 @@ bool ReplicatedMergeTreeRestartingThread::runImpl()
setNotReadonly();
/// Start queue processing
storage.part_check_thread.start();
storage.background_operations_assignee.start();
storage.queue_updating_task->activateAndSchedule();
storage.mutations_updating_task->activateAndSchedule();
storage.mutations_finalizing_task->activateAndSchedule();
storage.merge_selecting_task->activateAndSchedule();
storage.cleanup_thread.start();
storage.part_check_thread.start();
return true;
}
@ -356,6 +356,7 @@ void ReplicatedMergeTreeRestartingThread::partialShutdown(bool part_of_full_shut
storage.mutations_finalizing_task->deactivate();
storage.cleanup_thread.stop();
storage.part_check_thread.stop();
/// Stop queue processing
{
@ -365,9 +366,6 @@ void ReplicatedMergeTreeRestartingThread::partialShutdown(bool part_of_full_shut
storage.background_operations_assignee.finish();
}
/// Stop part_check_thread after queue processing, because some queue tasks may restart part_check_thread
storage.part_check_thread.stop();
LOG_TRACE(log, "Threads finished");
}

View File

@ -2,6 +2,7 @@
#include <QueryPipeline/QueryPipelineBuilder.h>
#include <Storages/StorageMerge.h>
#include <Storages/StorageFactory.h>
#include <Storages/StorageView.h>
#include <Storages/VirtualColumnUtils.h>
#include <Storages/AlterCommands.h>
#include <Storages/checkAndGetLiteralArgument.h>
@ -528,15 +529,33 @@ QueryPipelineBuilderPtr ReadFromMerge::createSources(
real_column_names.push_back(ExpressionActions::getSmallestColumn(storage_snapshot->metadata->getColumns().getAllPhysical()));
QueryPlan plan;
storage->read(
plan,
real_column_names,
storage_snapshot,
modified_query_info,
modified_context,
processed_stage,
max_block_size,
UInt32(streams_num));
if (StorageView * view = dynamic_cast<StorageView *>(storage.get()))
{
/// For view storage, we need to rewrite the `modified_query_info.view_query` to optimize read.
/// The most intuitive way is to use InterpreterSelectQuery.
/// Intercept the settings
modified_context->setSetting("max_threads", streams_num);
modified_context->setSetting("max_streams_to_max_threads_ratio", 1);
modified_context->setSetting("max_block_size", max_block_size);
InterpreterSelectQuery(
modified_query_info.query, modified_context, storage, view->getInMemoryMetadataPtr(), SelectQueryOptions(processed_stage))
.buildQueryPlan(plan);
}
else
{
storage->read(
plan,
real_column_names,
storage_snapshot,
modified_query_info,
modified_context,
processed_stage,
max_block_size,
UInt32(streams_num));
}
if (!plan.isInitialized())
return {};

View File

@ -39,20 +39,8 @@ void StorageSystemClusters::fillData(MutableColumns & res_columns, ContextPtr co
{
if (const auto * replicated = typeid_cast<const DatabaseReplicated *>(name_and_database.second.get()))
{
// A quick fix for stateless tests with DatabaseReplicated. Its ZK
// node can be destroyed at any time. If another test lists
// system.clusters to get client command line suggestions, it will
// get an error when trying to get the info about DB from ZK.
// Just ignore these inaccessible databases. A good example of a
// failing test is `01526_client_start_and_exit`.
try
{
writeCluster(res_columns, {name_and_database.first, replicated->getCluster()});
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
if (auto database_cluster = replicated->tryGetCluster())
writeCluster(res_columns, {name_and_database.first, database_cluster});
}
}
}

View File

@ -7,6 +7,7 @@ const char * auto_contributors[] {
"243f6a88 85a308d3",
"243f6a8885a308d313198a2e037",
"3ldar-nasyrov",
"546",
"7",
"821008736@qq.com",
"ANDREI STAROVEROV",
@ -15,6 +16,7 @@ const char * auto_contributors[] {
"Ahmed Dardery",
"Aimiyoo",
"Akazz",
"AlPerevyshin",
"Alain BERRIER",
"Albert Kidrachev",
"Alberto",
@ -215,6 +217,7 @@ const char * auto_contributors[] {
"DF5HSE",
"DIAOZHAFENG",
"Dale McDiarmid",
"Dale Mcdiarmid",
"Dan Roscigno",
"DanRoscigno",
"Daniel Bershatsky",
@ -261,6 +264,7 @@ const char * auto_contributors[] {
"Dongdong Yang",
"DoomzD",
"Dr. Strange Looker",
"Duc Canh Le",
"DuckSoft",
"Egor O'Sten",
"Egor Savin",
@ -293,6 +297,7 @@ const char * auto_contributors[] {
"Fabiano Francesconi",
"Fadi Hadzh",
"Fan()",
"Fangyuan Deng",
"FawnD2",
"Federico Ceratto",
"Federico Rodriguez",
@ -349,6 +354,7 @@ const char * auto_contributors[] {
"HuFuwang",
"Hui Wang",
"ILya Limarenko",
"Ignat Loskutov",
"Igor",
"Igor Hatarist",
"Igor Mineev",
@ -412,6 +418,7 @@ const char * auto_contributors[] {
"John Skopis",
"Jonatas Freitas",
"Jordi Villar",
"Josh Taylor",
"João Figueiredo",
"Julian Gilyadov",
"Julian Zhou",
@ -461,10 +468,12 @@ const char * auto_contributors[] {
"Leopold Schabel",
"Lev Borodin",
"Lewinma",
"Li Yin",
"Liu Cong",
"LiuCong",
"LiuYangkuan",
"Lopatin Konstantin",
"Lorenzo Mangani",
"Loud_Scream",
"Lucid Dreams",
"Luis Bosque",
@ -477,6 +486,7 @@ const char * auto_contributors[] {
"Maksim",
"Maksim Fedotov",
"Maksim Kita",
"Mallik Hassan",
"Malte",
"Marat IDRISOV",
"Marcelo Rodriguez",
@ -556,6 +566,7 @@ const char * auto_contributors[] {
"Mikhail f. Shiryaev",
"MikuSugar",
"Milad Arabi",
"Mingliang Pan",
"Misko Lee",
"Mohamad Fadhil",
"Mohammad Hossein Sekhavat",
@ -605,6 +616,7 @@ const char * auto_contributors[] {
"Nikolay Vasiliev",
"Nikolay Volosatov",
"Nir Peled",
"Nityananda Gohain",
"Niu Zhaojie",
"Odin Hultgren Van Der Horst",
"Okada Haruki",
@ -738,7 +750,9 @@ const char * auto_contributors[] {
"Simon Podlipsky",
"Sina",
"Sjoerd Mulder",
"SkyhotQin",
"Slach",
"Smita Kulkarni",
"Snow",
"Sofia Antipushina",
"Stanislav Pavlovichev",
@ -747,6 +761,7 @@ const char * auto_contributors[] {
"Stefan Thies",
"Stepan",
"Stepan Herold",
"Stephan",
"Steve-金勇",
"Stig Bakken",
"Storozhuk Kostiantyn",
@ -763,6 +778,7 @@ const char * auto_contributors[] {
"Tai White",
"Taleh Zaliyev",
"Tangaev",
"Tanya Bragin",
"Tatiana",
"Tatiana Kirillova",
"Teja",
@ -910,6 +926,8 @@ const char * auto_contributors[] {
"alesapin",
"alex-zaitsev",
"alex.lvxin",
"alexX512",
"alexander goryanets",
"alexander kozhikhov",
"alexey-milovidov",
"alexeypavlenko",
@ -968,6 +986,7 @@ const char * auto_contributors[] {
"chertus",
"chou.fan",
"christophe.kalenzaga",
"clickhouse-robot-curie",
"cms",
"cmsxbc",
"cn-ds",
@ -1197,6 +1216,7 @@ const char * auto_contributors[] {
"mwish",
"myrrc",
"nagorny",
"nathanbegbie",
"nauta",
"nautaa",
"ndchikin",
@ -1233,6 +1253,7 @@ const char * auto_contributors[] {
"proller",
"pufit",
"pyos",
"pzhdfy",
"qianlixiang",
"qianmoQ",
"qieqieplus",
@ -1242,6 +1263,7 @@ const char * auto_contributors[] {
"r1j1k",
"rainbowsysu",
"redclusive",
"renwujie",
"rfraposa",
"ritaank",
"rnbondarenko",
@ -1319,11 +1341,13 @@ const char * auto_contributors[] {
"vitstn",
"vivarum",
"vladimir golovchenko",
"vsrsvas",
"vxider",
"vzakaznikov",
"wangchao",
"wangdh15",
"weeds085490",
"whysage",
"wuxiaobai24",
"wzl",
"xPoSx",

View File

@ -211,7 +211,7 @@ Merge it only if you intend to backport changes to the target branch, otherwise
"Assing to assignees of the original PR: %s",
", ".join(user.login for user in self.pr.assignees),
)
self.cherrypick_pr.add_to_assignees(self.pr.assignees)
self.cherrypick_pr.add_to_assignees(*self.pr.assignees)
logging.info("Assign to the author of the original PR: %s", self.pr.user.login)
self.cherrypick_pr.add_to_assignees(self.pr.user)
@ -249,7 +249,7 @@ Merge it only if you intend to backport changes to the target branch, otherwise
"Assing to assignees of the original PR: %s",
", ".join(user.login for user in self.pr.assignees),
)
self.cherrypick_pr.add_to_assignees(self.pr.assignees)
self.cherrypick_pr.add_to_assignees(*self.pr.assignees)
logging.info("Assign to the author of the original PR: %s", self.pr.user.login)
self.backport_pr.add_to_assignees(self.pr.user)

View File

@ -98,7 +98,7 @@ class Packages:
class S3:
template = (
f"{S3_DOWNLOAD}"
f"{S3_DOWNLOAD}/"
# "clickhouse-builds/"
f"{S3_BUILDS_BUCKET}/"
# "33333/" or "21.11/" from --release, if pull request is omitted

View File

@ -34,7 +34,7 @@ def get_run_command(
# a static link, don't use S3_URL or S3_DOWNLOAD
"-e S3_URL='https://s3.amazonaws.com/clickhouse-datasets' "
# For dmesg
"--cap-add syslog "
"--privileged "
f"--volume={build_path}:/package_folder "
f"--volume={result_folder}:/test_output "
f"--volume={repo_tests_path}:/usr/share/clickhouse-test "

View File

@ -1,6 +1,6 @@
<?xml version="1.0"?>
<clickhouse>
<concurrent_threads_soft_limit>1</concurrent_threads_soft_limit>
<concurrent_threads_soft_limit_num>1</concurrent_threads_soft_limit_num>
<query_log>
<database>system</database>
<table>query_log</table>

View File

@ -1,6 +1,6 @@
<?xml version="1.0"?>
<clickhouse>
<concurrent_threads_soft_limit>50</concurrent_threads_soft_limit>
<concurrent_threads_soft_limit_num>50</concurrent_threads_soft_limit_num>
<query_log>
<database>system</database>
<table>query_log</table>

View File

@ -1,6 +1,6 @@
<?xml version="1.0"?>
<clickhouse>
<concurrent_threads_soft_limit>10</concurrent_threads_soft_limit>
<concurrent_threads_soft_limit_num>10</concurrent_threads_soft_limit_num>
<query_log>
<database>system</database>
<table>query_log</table>

View File

@ -1488,42 +1488,38 @@ def test_wrong_format_usage(started_cluster):
assert "Not a Parquet file" in result
def get_profile_event_for_query(instance, query, profile_event):
def check_profile_event_for_query(instance, query, profile_event, amount):
instance.query("system flush logs")
time.sleep(0.5)
query = query.replace("'", "\\'")
return int(
instance.query(
f"select ProfileEvents['{profile_event}'] from system.query_log where query='{query}' and type = 'QueryFinish' order by event_time desc limit 1"
attempt = 0
res = 0
while attempt < 10:
res = int(
instance.query(
f"select ProfileEvents['{profile_event}'] from system.query_log where query='{query}' and type = 'QueryFinish' order by event_time desc limit 1"
)
)
)
if res == amount:
break
attempt += 1
assert res == amount
def check_cache_misses(instance, file, storage_name, started_cluster, bucket, amount=1):
query = f"desc {storage_name}('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/{file}')"
assert (
get_profile_event_for_query(instance, query, "SchemaInferenceCacheMisses")
== amount
)
check_profile_event_for_query(instance, query, "SchemaInferenceCacheMisses", amount)
def check_cache_hits(instance, file, storage_name, started_cluster, bucket, amount=1):
query = f"desc {storage_name}('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/{file}')"
assert (
get_profile_event_for_query(instance, query, "SchemaInferenceCacheHits")
== amount
)
check_profile_event_for_query(instance, query, "SchemaInferenceCacheHits", amount)
def check_cache_invalidations(
instance, file, storage_name, started_cluster, bucket, amount=1
):
query = f"desc {storage_name}('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/{file}')"
assert (
get_profile_event_for_query(
instance, query, "SchemaInferenceCacheInvalidations"
)
== amount
check_profile_event_for_query(
instance, query, "SchemaInferenceCacheInvalidations", amount
)
@ -1531,9 +1527,8 @@ def check_cache_evictions(
instance, file, storage_name, started_cluster, bucket, amount=1
):
query = f"desc {storage_name}('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/{file}')"
assert (
get_profile_event_for_query(instance, query, "SchemaInferenceCacheEvictions")
== amount
check_profile_event_for_query(
instance, query, "SchemaInferenceCacheEvictions", amount
)

View File

@ -0,0 +1,10 @@
<test>
<create_query>CREATE TABLE test (uint32 UInt32, n_uint32 Nullable(UInt32), lc LowCardinality(String)) ENGINE=File(Arrow) SETTINGS output_format_arrow_low_cardinality_as_dictionary=1</create_query>
<fill_query>insert into test select number, number, toString(number % 10000) from numbers(10000000)</fill_query>
<query>SELECT uint32 from test format Null</query>
<query>SELECT n_uint32 from test format Null</query>
<query>SELECT lc from test format Null</query>
<drop_query>DROP TABLE IF EXISTS test</drop_query>
</test>

View File

@ -1,3 +1,13 @@
1
1
1000000
1
1
1
1
1
1
1
1
1
1
1000

View File

@ -4,9 +4,43 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
default_exception_message="Value passed to 'throwIf' function is non zero"
custom_exception_message="Number equals 1000000"
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT throwIf(number = 1000000) FROM system.numbers" 2>&1 | grep -cF "$default_exception_message"
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT throwIf(number = 1000000, '$custom_exception_message') FROM system.numbers" 2>&1 | grep -v '^(query: ' | grep -cF "$custom_exception_message"
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT sum(x = 0) FROM (SELECT throwIf(number = 1000000) AS x FROM numbers(1000000))" 2>&1
default_exception_message="Value passed to 'throwIf' function is non-zero"
custom_exception_message="Number equals 1000"
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT throwIf(number = 1000) FROM system.numbers" 2>&1 \
| grep -cF "$default_exception_message"
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT throwIf(number = 1000, '$custom_exception_message') FROM system.numbers" 2>&1 \
| grep -v '^(query: ' | grep -cF "$custom_exception_message"
# The custom error code argument is rejected unless the allow_custom_error_code_in_throwif setting is enabled.
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT throwIf(number = 1000, '$custom_exception_message', 1) FROM system.numbers" 2>&1 \
| grep -v '^(query: ' | grep -c "Number of arguments for function throwIf doesn't match: passed 3, should be 1 or 2"
# Custom error code argument enabled, but the error code value has the wrong type (UInt8 literal instead of a signed integer).
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT throwIf(number = 1000, '$custom_exception_message', 1) FROM system.numbers SETTINGS allow_custom_error_code_in_throwif=true" 2>&1 \
| grep -v '^(query: ' | grep -c "Third argument of function throwIf must be Int8, Int16 or Int32 (passed: UInt8)"
# Normal error code + some weird ones.
# Internal error codes use the upper half of 32-bit int.
custom_error_codes=(
"42"
"0" # OK
"101" # UNEXPECTED_PACKET_FROM_CLIENT (interpreted by client)
"102" # UNEXPECTED_PACKET_FROM_SERVER (interpreted by client)
"1001" # STD_EXCEPTION
"1002" # UNKNOWN_EXCEPTION
"999999" # Unused error code.
"-1") # Also unused. Weird but we should allow throwing negative errors.
for ec in "${custom_error_codes[@]}"
do
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT throwIf(number = 1000, '$custom_exception_message', toInt32($ec)) FROM system.numbers SETTINGS allow_custom_error_code_in_throwif=true" 2>&1 \
| grep -v '^(query: ' | grep -c "Code: $ec.*$custom_exception_message"
done
${CLICKHOUSE_CLIENT} --server_logs_file /dev/null --query="SELECT sum(x = 0) FROM (SELECT throwIf(number = 1000) AS x FROM numbers(1000))" 2>&1
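A rough Python equivalent of the custom-error-code checks above, assuming clickhouse-client is on PATH (the helper itself is illustrative, not part of this commit):

import subprocess

def throwif_stderr(error_code: int) -> str:
    # Same query as the shell test above; clickhouse-client reports the thrown
    # exception as "Code: <N>. ..." on stderr.
    query = (
        f"SELECT throwIf(number = 1000, 'Number equals 1000', toInt32({error_code})) "
        "FROM system.numbers "
        "SETTINGS allow_custom_error_code_in_throwif=true"
    )
    proc = subprocess.run(
        ["clickhouse-client", "--query", query],
        capture_output=True,
        text=True,
    )
    return proc.stderr

out = throwif_stderr(42)
assert "Code: 42" in out and "Number equals 1000" in out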

View File

@ -1,55 +0,0 @@
#!/usr/bin/env python3
# Tags: disabled, no-replicated-database, no-parallel, no-fasttest
import os
import sys
import signal
CURDIR = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(CURDIR, 'helpers'))
from client import client, prompt, end_of_block
log = None
# uncomment the line below for debugging
#log=sys.stdout
with client(name='client1>', log=log) as client1, client(name='client2>', log=log) as client2:
client1.expect(prompt)
client2.expect(prompt)
client1.send('SET allow_experimental_live_view = 1')
client1.expect(prompt)
client2.send('SET allow_experimental_live_view = 1')
client2.expect(prompt)
client1.send('DROP TABLE IF EXISTS test.lv')
client1.expect(prompt)
client1.send('DROP TABLE IF EXISTS test.mt')
client1.expect(prompt)
client1.send('CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple()')
client1.expect(prompt)
client1.send('CREATE LIVE VIEW test.lv WITH TIMEOUT 1 AS SELECT sum(a) FROM test.mt')
client1.expect(prompt)
client1.send('WATCH test.lv')
client1.expect('_version')
client1.expect(r'0.*1' + end_of_block)
client2.send('INSERT INTO test.mt VALUES (1),(2),(3)')
client2.expect(prompt)
client1.expect(r'6.*2' + end_of_block)
client2.send('INSERT INTO test.mt VALUES (4),(5),(6)')
client2.expect(prompt)
client1.expect(r'21.*3' + end_of_block)
# send Ctrl-C
client1.send('\x03', eol='')
match = client1.expect('(%s)|([#\$] )' % prompt)
if match.groups()[1]:
client1.send(client1.command)
client1.expect(prompt)
client1.send('SELECT sleep(1)')
client1.expect(prompt)
client1.send('DROP TABLE test.lv')
client1.expect('Table test.lv doesn\'t exist')
client1.expect(prompt)
client1.send('DROP TABLE test.mt')
client1.expect(prompt)

View File

@ -1,56 +0,0 @@
#!/usr/bin/env python3
# Tags: disabled, no-replicated-database, no-parallel, no-fasttest
import os
import sys
import signal
CURDIR = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(CURDIR, 'helpers'))
from client import client, prompt, end_of_block
log = None
# uncomment the line below for debugging
#log=sys.stdout
with client(name='client1>', log=log) as client1, client(name='client2>', log=log) as client2:
client1.expect(prompt)
client2.expect(prompt)
client1.send('SET allow_experimental_live_view = 1')
client1.expect(prompt)
client2.send('SET allow_experimental_live_view = 1')
client2.expect(prompt)
client1.send('DROP TABLE IF EXISTS test.lv')
client1.expect(prompt)
client1.send(' DROP TABLE IF EXISTS test.mt')
client1.expect(prompt)
client1.send('CREATE TABLE test.mt (a Int32, id Int32) Engine=Memory')
client1.expect(prompt)
client1.send('CREATE LIVE VIEW test.lv AS SELECT sum(a)/2 FROM (SELECT a, id FROM ( SELECT a, id FROM test.mt ORDER BY id DESC LIMIT 2 ) ORDER BY id DESC LIMIT 2)')
client1.expect(prompt)
client1.send('WATCH test.lv')
client1.expect('_version')
client1.expect(r'0.*1' + end_of_block)
client2.send('INSERT INTO test.mt VALUES (1, 1),(2, 2),(3, 3)')
client1.expect(r'2\.5.*2' + end_of_block)
client2.expect(prompt)
client2.send('INSERT INTO test.mt VALUES (4, 4),(5, 5),(6, 6)')
client1.expect(r'5\.5.*3' + end_of_block)
client2.expect(prompt)
for v, i in enumerate(range(7,129)):
client2.send('INSERT INTO test.mt VALUES (%d, %d)' % (i, i))
client1.expect(r'%.1f.*%d' % (i-0.5, 4+v) + end_of_block)
client2.expect(prompt)
# send Ctrl-C
client1.send('\x03', eol='')
match = client1.expect('(%s)|([#\$] )' % prompt)
if match.groups()[1]:
client1.send(client1.command)
client1.expect(prompt)
client1.send('DROP TABLE test.lv')
client1.expect(prompt)
client1.send('DROP TABLE test.mt')
client1.expect(prompt)

View File

@ -1,7 +0,0 @@
#!/usr/bin/env bash
# Tags: disabled, no-replicated-database, no-parallel, no-fasttest
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
python3 $CURDIR/00991_live_view_watch_event_live.python

View File

@ -1,7 +0,0 @@
#!/usr/bin/env bash
# Tags: disabled, no-replicated-database, no-parallel, no-fasttest
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
python3 $CURDIR/00991_live_view_watch_http.python

View File

@ -1,7 +0,0 @@
#!/usr/bin/env bash
# Tags: disabled, no-replicated-database, no-parallel, no-fasttest
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
python3 $CURDIR/00991_temporary_live_view_watch_events_heartbeat.python

View File

@ -1,7 +0,0 @@
#!/usr/bin/env bash
# Tags: disabled, no-replicated-database, no-parallel, no-fasttest
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
python3 $CURDIR/00991_temporary_live_view_watch_live.python

View File

@ -61,6 +61,7 @@ function thread6()
done
}
# https://stackoverflow.com/questions/9954794/execute-a-shell-function-with-timeout
export -f thread1;
export -f thread2;

View File

@ -10,6 +10,6 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS check;"
$CLICKHOUSE_CLIENT --query="CREATE TABLE check (x UInt64, y UInt64 DEFAULT throwIf(x > 1500000)) ENGINE = Memory;"
seq 1 2000000 | $CLICKHOUSE_CLIENT --query="INSERT INTO check(x) FORMAT TSV" 2>&1 | grep -q "Value passed to 'throwIf' function is non zero." && echo 'OK' || echo 'FAIL' ||:
seq 1 2000000 | $CLICKHOUSE_CLIENT --query="INSERT INTO check(x) FORMAT TSV" 2>&1 | grep -q "Value passed to 'throwIf' function is non-zero." && echo 'OK' || echo 'FAIL' ||:
$CLICKHOUSE_CLIENT --query="DROP TABLE check;"

View File

@ -9,6 +9,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=./replication.lib
. "$CURDIR"/replication.lib
declare -A engines
engines[0]="MergeTree"
engines[1]="ReplicatedMergeTree('/test/$CLICKHOUSE_TEST_ZOOKEEPER_PREFIX/{shard}/src', '{replica}_' || toString(randConstant()))"

View File

@ -1,3 +1,5 @@
-- Tags: disabled
DROP TABLE IF EXISTS t_part_log_has_merge_type_table;
CREATE TABLE t_part_log_has_merge_type_table

View File

@ -1,4 +1,4 @@
-- Tags: no-fasttest, no-parallel, no-s3-storage, no-random-settings
-- Tags: disabled
-- { echo }

View File

@ -0,0 +1,32 @@
dict LowCardinality(Nullable(String))
one
two
three
one
two
dict LowCardinality(Nullable(String))
one
two
three
one
three
dict LowCardinality(Nullable(String))
one
two
three
one
two
three
lc LowCardinality(Nullable(String))
OK
dict LowCardinality(Nullable(String))
one
two
three
one
\N
three

View File

@ -0,0 +1,29 @@
#!/usr/bin/env bash
# Tags: no-fasttest
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
USER_FILES_PATH=$(clickhouse-client --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}')
mkdir -p $USER_FILES_PATH/test_02383
cp $CURDIR/data_arrow/dictionary*.arrow $USER_FILES_PATH/test_02383/
cp $CURDIR/data_arrow/corrupted.arrow $USER_FILES_PATH/test_02383/
cp $CURDIR/data_arrow/dict_with_nulls.arrow $USER_FILES_PATH/test_02383/
$CLICKHOUSE_CLIENT -q "desc file('test_02383/dictionary1.arrow')"
$CLICKHOUSE_CLIENT -q "select * from file('test_02383/dictionary1.arrow')"
$CLICKHOUSE_CLIENT -q "desc file('test_02383/dictionary2.arrow')"
$CLICKHOUSE_CLIENT -q "select * from file('test_02383/dictionary2.arrow')"
$CLICKHOUSE_CLIENT -q "desc file('test_02383/dictionary3.arrow')"
$CLICKHOUSE_CLIENT -q "select * from file('test_02383/dictionary3.arrow')"
$CLICKHOUSE_CLIENT -q "desc file('test_02383/corrupted.arrow')"
$CLICKHOUSE_CLIENT -q "select * from file('test_02383/corrupted.arrow')" 2>&1 | grep -F -q "INCORRECT_DATA" && echo OK || echo FAIL
$CLICKHOUSE_CLIENT -q "desc file('test_02383/dict_with_nulls.arrow')"
$CLICKHOUSE_CLIENT -q "select * from file('test_02383/dict_with_nulls.arrow')"
rm -rf $USER_FILES_PATH/test_02383

View File

@ -0,0 +1,4 @@
lc LowCardinality(Nullable(String))
abc
lc LowCardinality(Nullable(String))
abc

View File

@ -0,0 +1,8 @@
-- Tags: no-fasttest
insert into function file(02384_data.arrow) select toLowCardinality(toNullable('abc')) as lc settings output_format_arrow_low_cardinality_as_dictionary=1, output_format_arrow_string_as_string=0, engine_file_truncate_on_insert=1;
desc file(02384_data.arrow);
select * from file(02384_data.arrow);
insert into function file(02384_data.arrow) select toLowCardinality(toNullable('abc')) as lc settings output_format_arrow_low_cardinality_as_dictionary=1, output_format_arrow_string_as_string=1, engine_file_truncate_on_insert=1;
desc file(02384_data.arrow);
select * from file(02384_data.arrow);

View File

@ -0,0 +1,6 @@
2
2
3
3
4
4

View File

@ -0,0 +1,10 @@
CREATE TABLE m0 (id UInt64) ENGINE=MergeTree ORDER BY id SETTINGS index_granularity = 1;
INSERT INTO m0 SELECT number FROM numbers(10);
CREATE TABLE m1 (id UInt64, s String) ENGINE=MergeTree ORDER BY id SETTINGS index_granularity = 1;
INSERT INTO m1 SELECT number, 'boo' FROM numbers(10);
CREATE VIEW m1v AS SELECT id FROM m1;
CREATE TABLE m2 (id UInt64) ENGINE=Merge(currentDatabase(),'m0|m1v');
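-- The table name regexp also matches the view m1v; force_primary_key and max_bytes_to_read
-- check that primary-key pruning still works when reading through the view.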
SELECT * FROM m2 WHERE id > 1 AND id < 5 ORDER BY id SETTINGS force_primary_key=1, max_bytes_to_read=64;

View File

@ -0,0 +1,50 @@
#!/usr/bin/env python3
from socket import socket, AF_INET, SOCK_STREAM
import os
EXCEPTION_CODE_HEADER = "X-ClickHouse-Exception-Code"
TRANSFER_ENCODING_HEADER = "Transfer-Encoding"
def main():
host = os.environ['CLICKHOUSE_HOST']
port = int(os.environ['CLICKHOUSE_PORT_HTTP'])
sock = socket(AF_INET, SOCK_STREAM)
sock.connect((host, port))
sock.settimeout(5)
s = "POST /play HTTP/1.1\r\n"
s += "Host: %s\r\n" % host
s += "Content-type: multipart/form-data\r\n"
s += "Transfer-encoding: chunked\r\n"
s += "\r\n"
s += "ffffffffffffffff"
s += "\r\n"
s += "X" * 100000
sock.sendall(s.encode())
data = sock.recv(10000).decode()
sock.close()
lines = data.splitlines()
print(lines.pop(0))
headers = {}
for x in lines:
x = x.strip()
if not x:
continue
tokens = x.split(":", 1)
if len(tokens) < 2:
continue
key, val = tokens
headers[key.strip()] = val.strip()
print("encoding type", headers[TRANSFER_ENCODING_HEADER])
print("error code", headers[EXCEPTION_CODE_HEADER])
if __name__ == "__main__":
main()

View File

@ -0,0 +1,3 @@
HTTP/1.1 200 OK
encoding type chunked
error code 1000

View File

@ -0,0 +1,8 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
# shell_config.sh sets the env vars (CLICKHOUSE_HOST, CLICKHOUSE_PORT_HTTP) that the python test below relies on.
python3 "$CURDIR"/02403_big_http_chunk_size.python

View File

@ -0,0 +1,3 @@
2020-10-01 144
2020-10-01 0
2020-10-01 0

View File

@ -0,0 +1,28 @@
DROP TABLE IF EXISTS ttl_table;
CREATE TABLE ttl_table
(
EventDate Date,
Longitude Float64 TTL EventDate + toIntervalWeek(2)
)
ENGINE = MergeTree()
ORDER BY EventDate
SETTINGS vertical_merge_algorithm_min_rows_to_activate=1, vertical_merge_algorithm_min_columns_to_activate=1;
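-- The two settings above force the vertical merge algorithm even for tiny parts,
-- so the merge has to rewrite the Longitude column after its TTL expires.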
SYSTEM STOP MERGES ttl_table;
INSERT INTO ttl_table VALUES(toDate('2020-10-01'), 144);
SELECT * FROM ttl_table;
SYSTEM START MERGES ttl_table;
OPTIMIZE TABLE ttl_table FINAL;
SELECT * FROM ttl_table;
OPTIMIZE TABLE ttl_table FINAL;
SELECT * FROM ttl_table;
DROP TABLE IF EXISTS ttl_table;

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -1,3 +1,4 @@
v22.8.1.2097-lts 2022-08-18
v22.7.3.5-stable 2022-08-10
v22.7.2.15-stable 2022-08-03
v22.7.1.2484-stable 2022-07-21
