Merge branch 'master' into keeper-listen-host

commit af02c76a3c by Antonio Andelic, 2022-08-21 18:43:55 +02:00 (committed by GitHub)
301 changed files with 8059 additions and 2895 deletions

View File

@@ -13,13 +13,24 @@ on: # yamllint disable-line rule:truthy
- 'v*-prestable'
- 'v*-stable'
- 'v*-lts'
workflow_dispatch:
inputs:
tag:
description: 'Test tag'
required: true
type: string
jobs:
UpdateVersions:
runs-on: [self-hosted, style-checker]
steps:
- name: Set test tag
if: github.event_name == 'workflow_dispatch'
run: |
echo "GITHUB_TAG=${{ github.event.inputs.tag }}" >> "$GITHUB_ENV"
- name: Get tag name
if: github.event_name != 'workflow_dispatch'
run: |
echo "GITHUB_TAG=${GITHUB_REF#refs/tags/}" >> "$GITHUB_ENV"
- name: Check out repository code
@@ -35,19 +46,22 @@ jobs:
GID=$(id -g "${UID}")
docker run -u "${UID}:${GID}" -e PYTHONUNBUFFERED=1 \
--volume="${GITHUB_WORKSPACE}:/ClickHouse" clickhouse/style-test \
/ClickHouse/utils/changelog/changelog.py -vv --gh-user-or-token="$GITHUB_TOKEN" \
--output="/ClickHouse/docs/changelogs/${GITHUB_TAG}.md" --jobs=5 "${GITHUB_TAG}"
/ClickHouse/utils/changelog/changelog.py -v --debug-helpers \
--gh-user-or-token="$GITHUB_TOKEN" --jobs=5 \
--output="/ClickHouse/docs/changelogs/${GITHUB_TAG}.md" "${GITHUB_TAG}"
git add "./docs/changelogs/${GITHUB_TAG}.md"
git diff HEAD
- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
with:
author: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
token: ${{ secrets.ROBOT_CLICKHOUSE_COMMIT_TOKEN }}
committer: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
commit-message: Update version_date.tsv and changelogs after ${{ env.GITHUB_TAG }}
branch: auto/${{ env.GITHUB_TAG }}
delete-branch: true
title: Update version_date.tsv and changelogs after ${{ env.GITHUB_TAG }}
labels: do not test
body: |
Update version_date.tsv and changelogs after ${{ env.GITHUB_TAG }}

View File

@@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v22.8, 2022-08-18](#228)**<br/>
**[ClickHouse release v22.7, 2022-07-21](#227)**<br/>
**[ClickHouse release v22.6, 2022-06-16](#226)**<br/>
**[ClickHouse release v22.5, 2022-05-19](#225)**<br/>
@@ -8,6 +9,148 @@
**[ClickHouse release v22.1, 2022-01-18](#221)**<br/>
**[Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021/)**<br/>
### <a id="228"></a> ClickHouse release 22.8, 2022-08-18
#### Backward Incompatible Change
* Extended range of `Date32` and `DateTime64` to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601):2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of `1899-01-01` was clamped to `1925-01-01`, in the new version it will be clamped to `1900-01-01` (a short example follows this list). It changes the behavior of rounding with `toStartOfInterval` if you pass `INTERVAL 3 QUARTER` up to one quarter because the intervals are counted from an implementation-specific point of time. Closes [#28216](https://github.com/ClickHouse/ClickHouse/issues/28216), improves [#38393](https://github.com/ClickHouse/ClickHouse/issues/38393). [#39425](https://github.com/ClickHouse/ClickHouse/pull/39425) ([Roman Vasin](https://github.com/rvasin)).
* Now, all relevant dictionary sources respect `remote_url_allow_hosts` setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. [#39184](https://github.com/ClickHouse/ClickHouse/pull/39184) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Prebuilt ClickHouse x86 binaries now require support for AVX instructions, i.e. a CPU not older than Intel Sandy Bridge / AMD Bulldozer, both released in 2011. [#39000](https://github.com/ClickHouse/ClickHouse/pull/39000) ([Robert Schulze](https://github.com/rschu1ze)).
* Make the remote filesystem cache composable: allow exempting certain files (e.g. index and mark files: idx, mrk, ...) from eviction, and delete the old cache version. It is now possible to configure the cache over an Azure Blob Storage disk, a local disk, a StaticWeb disk, etc. This PR is marked backward incompatible because the cache configuration changes and the config file must be updated for the cache to work. The old cache will still be used with the new configuration, and the server will start up fine with the old cache configuration. Closes [#36140](https://github.com/ClickHouse/ClickHouse/issues/36140). Closes [#37889](https://github.com/ClickHouse/ClickHouse/issues/37889). [#36171](https://github.com/ClickHouse/ClickHouse/pull/36171) ([Kseniia Sumarokova](https://github.com/kssenii)).
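A short illustration of the extended `Date32` range described in the first item above; the clamping values shown are exactly those stated there:

```sql
-- Out-of-range values are clamped to the supported interval.
-- Before 22.8 the lower bound was 1925-01-01; starting with 22.8 it is 1900-01-01.
SELECT toDate32('1899-01-01') AS clamped;  -- now returns 1900-01-01
```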
#### New Feature
* Support SQL standard DELETE FROM syntax on merge tree tables and lightweight delete implementation for merge tree families. [#37893](https://github.com/ClickHouse/ClickHouse/pull/37893) ([Jianmei Zhang](https://github.com/zhangjmruc)) ([Alexander Gololobov](https://github.com/davenger)). Note: this new feature does not make ClickHouse an HTAP DBMS.
* Query parameters can be set in interactive mode as `SET param_abc = 'def'` and transferred via the native protocol as settings (a usage sketch follows this list). [#39906](https://github.com/ClickHouse/ClickHouse/pull/39906) ([Nikita Taranov](https://github.com/nickitat)).
* Quota key can be set in the native protocol ([Yakov Olkhovsky](https://github.com/ClickHouse/ClickHouse/pull/39874)).
* Added a setting `exact_rows_before_limit` (0/1). When enabled, ClickHouse will provide exact value for `rows_before_limit_at_least` statistic, but with the cost that the data before limit will have to be read completely. This closes [#6613](https://github.com/ClickHouse/ClickHouse/issues/6613). [#25333](https://github.com/ClickHouse/ClickHouse/pull/25333) ([kevin wan](https://github.com/MaxWk)).
* Added support for parallel distributed insert select with `s3Cluster` table function into tables with `Distributed` and `Replicated` engine [#34670](https://github.com/ClickHouse/ClickHouse/issues/34670). [#39107](https://github.com/ClickHouse/ClickHouse/pull/39107) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add new settings to control schema inference from text formats: - `input_format_try_infer_dates` - try infer dates from strings. - `input_format_try_infer_datetimes` - try infer datetimes from strings. - `input_format_try_infer_integers` - try infer `Int64` instead of `Float64`. - `input_format_json_try_infer_numbers_from_strings` - try infer numbers from json strings in JSON formats. [#39186](https://github.com/ClickHouse/ClickHouse/pull/39186) ([Kruglov Pavel](https://github.com/Avogar)).
* An option to provide JSON formatted log output. The purpose is to allow easier ingestion and query in log analysis tools. [#39277](https://github.com/ClickHouse/ClickHouse/pull/39277) ([Mallik Hassan](https://github.com/SadiHassan)).
* Add function `nowInBlock` which allows getting the current time during long-running and continuous queries. Closes [#39522](https://github.com/ClickHouse/ClickHouse/issues/39522). Notes: there are no functions `now64InBlock` neither `todayInBlock`. [#39533](https://github.com/ClickHouse/ClickHouse/pull/39533) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add ability to specify settings for an `executable()` table function. [#39681](https://github.com/ClickHouse/ClickHouse/pull/39681) ([Constantine Peresypkin](https://github.com/pkit)).
* Implemented automatic conversion of database engine from `Ordinary` to `Atomic`. Create an empty `convert_ordinary_to_atomic` file in the `flags` directory, and all `Ordinary` databases will be converted automatically on the next server start. Resolves [#39546](https://github.com/ClickHouse/ClickHouse/issues/39546). [#39933](https://github.com/ClickHouse/ClickHouse/pull/39933) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Support `SELECT ... INTO OUTFILE '...' AND STDOUT`. [#37490](https://github.com/ClickHouse/ClickHouse/issues/37490). [#39054](https://github.com/ClickHouse/ClickHouse/pull/39054) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add formats `PrettyMonoBlock`, `PrettyNoEscapesMonoBlock`, `PrettyCompactNoEscapes`, `PrettyCompactNoEscapesMonoBlock`, `PrettySpaceNoEscapes`, `PrettySpaceMonoBlock`, `PrettySpaceNoEscapesMonoBlock`. [#39646](https://github.com/ClickHouse/ClickHouse/pull/39646) ([Kruglov Pavel](https://github.com/Avogar)).
* Add new setting `schema_inference_hints` that allows specifying structure hints for specific columns during schema inference. Closes [#39569](https://github.com/ClickHouse/ClickHouse/issues/39569). [#40068](https://github.com/ClickHouse/ClickHouse/pull/40068) ([Kruglov Pavel](https://github.com/Avogar)).
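A minimal usage sketch for the query-parameter item above; the table and parameter names are made up for illustration:

```sql
-- In clickhouse-client interactive mode, a parameter can now be set with
-- SET param_<name> = ... and referenced with the {name:Type} substitution syntax.
SET param_target_country = 'NL';

SELECT count()
FROM visits                              -- hypothetical table
WHERE country = {target_country:String};
```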
#### Performance Improvement
* Improved memory usage during memory efficient merging of aggregation results. [#39429](https://github.com/ClickHouse/ClickHouse/pull/39429) ([Nikita Taranov](https://github.com/nickitat)).
* Added concurrency control logic to limit total number of concurrent threads created by queries. [#37558](https://github.com/ClickHouse/ClickHouse/pull/37558) ([Sergei Trifonov](https://github.com/serxa)). Add the `concurrent_threads_soft_limit` parameter to increase performance in case of high QPS by limiting the total number of threads for all queries. [#37285](https://github.com/ClickHouse/ClickHouse/pull/37285) ([Roman Vasin](https://github.com/rvasin)).
* Add `SLRU` cache policy for uncompressed cache and marks cache. ([Kseniia Sumarokova](https://github.com/kssenii)). [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)). Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)).
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#36654](https://github.com/ClickHouse/ClickHouse/pull/36654) ([jasperzhu](https://github.com/jinjunzh)). [#39494](https://github.com/ClickHouse/ClickHouse/pull/39494) ([Robert Schulze](https://github.com/rschu1ze)).
* `DISTINCT` in order with `ORDER BY`: deduce the way to sort based on the input stream's sort description, and skip sorting if the input stream is already sorted. [#38719](https://github.com/ClickHouse/ClickHouse/pull/38719) ([Igor Nikonov](https://github.com/devcrafter)). Significantly improve memory usage and query execution time: use `DistinctSortedChunkTransform` for the final distinct when `DISTINCT` columns match `ORDER BY` columns (renamed to `DistinctSortedStreamTransform` in `EXPLAIN PIPELINE`), and remove unnecessary allocations in the hot loop of `DistinctSortedChunkTransform`. [#39432](https://github.com/ClickHouse/ClickHouse/pull/39432) ([Igor Nikonov](https://github.com/devcrafter)). Use `DistinctSortedTransform` only when the sort description is applicable to the `DISTINCT` columns, otherwise fall back to the ordinary `DISTINCT` implementation; this allows fewer checks during `DistinctSortedTransform` execution. [#39528](https://github.com/ClickHouse/ClickHouse/pull/39528) ([Igor Nikonov](https://github.com/devcrafter)). Fix: `DistinctSortedTransform` didn't take advantage of sorting: it never cleared its hash set because `clearing_columns` were detected incorrectly (always empty), so it basically worked as the ordinary `DISTINCT` (`DistinctTransform`). The fix reduces memory usage significantly. [#39538](https://github.com/ClickHouse/ClickHouse/pull/39538) ([Igor Nikonov](https://github.com/devcrafter)).
* Use local node as first priority to get structure of remote table when executing `cluster` and similar table functions. [#39440](https://github.com/ClickHouse/ClickHouse/pull/39440) ([Mingliang Pan](https://github.com/liangliangpan)).
* Optimize filtering by numeric columns with AVX512VBMI2 compress store. [#39633](https://github.com/ClickHouse/ClickHouse/pull/39633) ([Guo Wangyang](https://github.com/guowangy)). For systems with AVX512 VBMI2, this PR improves performance by ca. 6% for SSB benchmark queries 3.1, 3.2 and 3.3 (SF=100). Tested on a 2-socket Intel Icelake Xeon 8380 system. [#40033](https://github.com/ClickHouse/ClickHouse/pull/40033) ([Robert Schulze](https://github.com/rschu1ze)).
* Optimize index analysis with functional expressions in multi-thread scenario. [#39812](https://github.com/ClickHouse/ClickHouse/pull/39812) ([Guo Wangyang](https://github.com/guowangy)).
* Optimizations for complex queries: Don't visit the AST for UDFs if none are registered. [#40069](https://github.com/ClickHouse/ClickHouse/pull/40069) ([Raúl Marín](https://github.com/Algunenano)). Optimize CurrentMemoryTracker alloc and free. [#40078](https://github.com/ClickHouse/ClickHouse/pull/40078) ([Raúl Marín](https://github.com/Algunenano)).
* Improved Base58 encoding/decoding. [#39292](https://github.com/ClickHouse/ClickHouse/pull/39292) ([Andrey Zvonov](https://github.com/zvonand)).
* Improve bytes to bits mask transform for SSE/AVX/AVX512. [#39586](https://github.com/ClickHouse/ClickHouse/pull/39586) ([Guo Wangyang](https://github.com/guowangy)).
#### Improvement
* Normalize `AggregateFunction` types and state representations because optimizations like [#35788](https://github.com/ClickHouse/ClickHouse/pull/35788) will treat `count(not null columns)` as `count()`, which might confuse distributed interpreters with the following error: `Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported`. [#39420](https://github.com/ClickHouse/ClickHouse/pull/39420) ([Amos Bird](https://github.com/amosbird)). The functions with identical states can be used in materialized views interchangeably.
* Rework and simplify the `system.backups` table, remove the `internal` column, allow user to set the ID of operation, add columns `num_files`, `uncompressed_size`, `compressed_size`, `start_time`, `end_time`. [#39503](https://github.com/ClickHouse/ClickHouse/pull/39503) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improved structure of DDL query result table for `Replicated` database (separate columns with shard and replica name, clearer status) - `CREATE TABLE ... ON CLUSTER` queries can be normalized on the initiator first if `distributed_ddl_entry_format_version` is set to 3 (default value). It means that `ON CLUSTER` queries may not work if the initiator does not belong to the cluster specified in the query. Fixes [#37318](https://github.com/ClickHouse/ClickHouse/issues/37318), [#39500](https://github.com/ClickHouse/ClickHouse/issues/39500) - Ignore the `ON CLUSTER` clause if the database is `Replicated` and the cluster name equals the database name. Related to [#35570](https://github.com/ClickHouse/ClickHouse/issues/35570) - Miscellaneous minor fixes for the `Replicated` database engine - Check metadata consistency when starting up a `Replicated` database, and start replica recovery in case of a mismatch between local metadata and metadata in Keeper. Resolves [#24880](https://github.com/ClickHouse/ClickHouse/issues/24880). [#37198](https://github.com/ClickHouse/ClickHouse/pull/37198) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add result_rows and result_bytes to progress reports (`X-ClickHouse-Summary`). [#39567](https://github.com/ClickHouse/ClickHouse/pull/39567) ([Raúl Marín](https://github.com/Algunenano)).
* Improve primary key analysis for MergeTree. [#25563](https://github.com/ClickHouse/ClickHouse/pull/25563) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* `timeSlots` now works with DateTime64; subsecond duration and slot size available when working with DateTime64. [#37951](https://github.com/ClickHouse/ClickHouse/pull/37951) ([Andrey Zvonov](https://github.com/zvonand)).
* Added support of `LEFT SEMI` and `LEFT ANTI` direct join with `EmbeddedRocksDB` tables. [#38956](https://github.com/ClickHouse/ClickHouse/pull/38956) ([Vladimir C](https://github.com/vdimir)).
* Add profile events for fsync operations. [#39179](https://github.com/ClickHouse/ClickHouse/pull/39179) ([Azat Khuzhin](https://github.com/azat)).
* Add a second argument to the ordinary function `file(path[, default])`: the value that the function returns when the file does not exist (see the sketch after this list). [#39218](https://github.com/ClickHouse/ClickHouse/pull/39218) ([Nikolay Degterinsky](https://github.com/evillique)).
* Some small fixes for reading via HTTP: allow retrying partial content in the case of a 200 OK response. [#39244](https://github.com/ClickHouse/ClickHouse/pull/39244) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support queries `CREATE TEMPORARY TABLE ... (<list of columns>) AS ...`. [#39462](https://github.com/ClickHouse/ClickHouse/pull/39462) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support of `!`/`*` (exclamation/asterisk) in custom TLDs (`cutToFirstSignificantSubdomainCustom()`/`cutToFirstSignificantSubdomainCustomWithWWW()`/`firstSignificantSubdomainCustom()`). [#39496](https://github.com/ClickHouse/ClickHouse/pull/39496) ([Azat Khuzhin](https://github.com/azat)).
* Add support for TLS connections to NATS. Implements [#39525](https://github.com/ClickHouse/ClickHouse/issues/39525). [#39527](https://github.com/ClickHouse/ClickHouse/pull/39527) ([Constantine Peresypkin](https://github.com/pkit)).
* `clickhouse-obfuscator` (a tool for database obfuscation for testing and load generation) now has the new `--save` and `--load` parameters to work with pre-trained models. This closes [#39534](https://github.com/ClickHouse/ClickHouse/issues/39534). [#39541](https://github.com/ClickHouse/ClickHouse/pull/39541) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix incorrect behavior of log rotation during restart. [#39558](https://github.com/ClickHouse/ClickHouse/pull/39558) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix building aggregate projections when external aggregation is on. Marked as an improvement because the case is rare and an easy workaround exists (changing settings). This fixes [#39667](https://github.com/ClickHouse/ClickHouse/issues/39667). [#39671](https://github.com/ClickHouse/ClickHouse/pull/39671) ([Amos Bird](https://github.com/amosbird)).
* Allow to execute hash functions with arguments of type `Map`. [#39685](https://github.com/ClickHouse/ClickHouse/pull/39685) ([Anton Popov](https://github.com/CurtizJ)).
* Add a configuration parameter to hide addresses in stack traces. It may improve security a little but generally, it is harmful and should not be used. [#39690](https://github.com/ClickHouse/ClickHouse/pull/39690) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change the prefix size of AggregateFunctionDistinct to make sure nested function data memory segment is aligned. [#39696](https://github.com/ClickHouse/ClickHouse/pull/39696) ([Pxl](https://github.com/BiteTheDDDDt)).
* Properly escape credentials passed to the `clickhouse-diagnostic` tool. [#39707](https://github.com/ClickHouse/ClickHouse/pull/39707) ([Dale McDiarmid](https://github.com/gingerwizard)).
* ClickHouse Keeper improvement: create a snapshot on exit. It can be controlled with the config `keeper_server.create_snapshot_on_exit`, `true` by default. [#39755](https://github.com/ClickHouse/ClickHouse/pull/39755) ([Antonio Andelic](https://github.com/antonio2368)).
* Support primary key analysis for `row_policy_filter` and `additional_filter`. It also helps fix issues like [#37454](https://github.com/ClickHouse/ClickHouse/issues/37454) . [#39826](https://github.com/ClickHouse/ClickHouse/pull/39826) ([Amos Bird](https://github.com/amosbird)).
* Fix two usability issues in Play UI: - it was non-pixel-perfect on iPad due to parasitic border radius and margins; - the progress indication did not display after the first query. This closes [#39957](https://github.com/ClickHouse/ClickHouse/issues/39957). This closes [#39960](https://github.com/ClickHouse/ClickHouse/issues/39960). [#39961](https://github.com/ClickHouse/ClickHouse/pull/39961) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Play UI: add row numbers; add cell selection on click; add hysteresis for table cells. [#39962](https://github.com/ClickHouse/ClickHouse/pull/39962) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Play UI: recognize tab key in textarea, but at the same time don't mess up with tab navigation. [#40053](https://github.com/ClickHouse/ClickHouse/pull/40053) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The client will show server-side elapsed time. This is important for the performance comparison of ClickHouse services in remote datacenters. This closes [#38070](https://github.com/ClickHouse/ClickHouse/issues/38070). See also [this](https://github.com/ClickHouse/ClickBench/blob/main/hardware/benchmark-cloud.sh#L37) for motivation. [#39968](https://github.com/ClickHouse/ClickHouse/pull/39968) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Adds `parseDateTime64BestEffortUS`, `parseDateTime64BestEffortUSOrNull`, `parseDateTime64BestEffortUSOrZero` functions, closing [#37492](https://github.com/ClickHouse/ClickHouse/issues/37492). [#40015](https://github.com/ClickHouse/ClickHouse/pull/40015) ([Tanya Bragin](https://github.com/tbragin)).
* Extend the `system.processors_profile_log` with more information such as input rows. [#40121](https://github.com/ClickHouse/ClickHouse/pull/40121) ([Amos Bird](https://github.com/amosbird)).
* Display server-side time in `clickhouse-benchmark` by default if it is available (since ClickHouse version 22.8). This is needed to correctly compare the performance of clouds. This behavior can be changed with the new `--client-side-time` command line option. Change the `--randomize` command line option from `--randomize 1` to the form without argument. [#40193](https://github.com/ClickHouse/ClickHouse/pull/40193) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add counters (ProfileEvents) for cases when a query complexity limitation has been set and has been reached (a separate counter for `overflow_mode` = `break` and `throw`). For example, if you have set up `max_rows_to_read` with `read_overflow_mode = 'break'`, looking at the value of the `OverflowBreak` counter will allow distinguishing incomplete results. [#40205](https://github.com/ClickHouse/ClickHouse/pull/40205) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix memory accounting in case of "Memory limit exceeded" errors (previously [peak] memory usage took failed allocations into account). [#40249](https://github.com/ClickHouse/ClickHouse/pull/40249) ([Azat Khuzhin](https://github.com/azat)).
* Add metrics for filesystem cache: `FilesystemCacheSize` and `FilesystemCacheElements`. [#40260](https://github.com/ClickHouse/ClickHouse/pull/40260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support hadoop secure RPC transfer (hadoop.rpc.protection=privacy and hadoop.rpc.protection=integrity). [#39411](https://github.com/ClickHouse/ClickHouse/pull/39411) ([michael1589](https://github.com/michael1589)).
* Avoid continuously growing memory consumption of pattern cache when using functions multi(Fuzzy)Match(Any|AllIndices|AnyIndex)(). [#40264](https://github.com/ClickHouse/ClickHouse/pull/40264) ([Robert Schulze](https://github.com/rschu1ze)).
* Add a cache for schema inference for the file/s3/hdfs/url table functions. Now schema inference will be performed only on the first query to the file; all subsequent queries to the same file will use the schema from the cache if the data wasn't changed. Add the system table `system.schema_inference_cache` with all current schemas in the cache, and the system queries `SYSTEM DROP SCHEMA CACHE [FOR FILE/S3/HDFS/URL]` to drop schemas from the cache. [#38286](https://github.com/ClickHouse/ClickHouse/pull/38286) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes [#32401](https://github.com/ClickHouse/ClickHouse/issues/32401)). [#40293](https://github.com/ClickHouse/ClickHouse/pull/40293) ([Josh Taylor](https://github.com/joshuataylor)).
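A sketch of the new optional second argument of `file()` described above; the file name is hypothetical and, as usual for this function, is looked up under `user_files_path`:

```sql
-- Returns the file contents if motd.txt exists under user_files_path,
-- otherwise the provided default value.
SELECT file('motd.txt', 'no message of the day') AS motd;
```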
#### Build/Testing/Packaging Improvement
* [ClickFiddle](https://fiddle.clickhouse.com/): A new tool for testing ClickHouse versions in read/write mode (**Igor Baliuk**).
* ClickHouse binary is made self-extracting [#35775](https://github.com/ClickHouse/ClickHouse/pull/35775) ([Yakov Olkhovskiy, Arthur Filatenkov](https://github.com/yakov-olkhovskiy)).
* Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently, after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Former packages used to install systemd.service file to `/etc`. The files there are marked as `conf` and are not cleaned out, and not updated automatically. This PR cleans them out. [#39323](https://github.com/ClickHouse/ClickHouse/pull/39323) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Ensure LSan is effective. [#39430](https://github.com/ClickHouse/ClickHouse/pull/39430) ([Azat Khuzhin](https://github.com/azat)).
* TSAN has issues with clang-14 (https://github.com/google/sanitizers/issues/1552, https://github.com/google/sanitizers/issues/1540), so here we build the TSAN binaries with clang-15. [#39450](https://github.com/ClickHouse/ClickHouse/pull/39450) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove the option to build ClickHouse tools as separate executable programs. This fixes [#37847](https://github.com/ClickHouse/ClickHouse/issues/37847). [#39520](https://github.com/ClickHouse/ClickHouse/pull/39520) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Small preparations for build on s390x (which is big-endian). [#39627](https://github.com/ClickHouse/ClickHouse/pull/39627) ([Harry Lee](https://github.com/HarryLeeIBM)). [#39656](https://github.com/ClickHouse/ClickHouse/pull/39656) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed Endian issue in BitHelpers for s390x. [#39656](https://github.com/ClickHouse/ClickHouse/pull/39656) ([Harry Lee](https://github.com/HarryLeeIBM)). Implement a piece of code related to SipHash for s390x architecture (which is not supported by ClickHouse). [#39732](https://github.com/ClickHouse/ClickHouse/pull/39732) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed an Endian issue in Coordination snapshot code for s390x architecture (which is not supported by ClickHouse). [#39931](https://github.com/ClickHouse/ClickHouse/pull/39931) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed Endian issues in Codec code for s390x architecture (which is not supported by ClickHouse). [#40008](https://github.com/ClickHouse/ClickHouse/pull/40008) ([Harry Lee](https://github.com/HarryLeeIBM)). Fixed Endian issues in reading/writing BigEndian binary data in ReadHelpers and WriteHelpers code for s390x architecture (which is not supported by ClickHouse). [#40179](https://github.com/ClickHouse/ClickHouse/pull/40179) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support build with `clang-16` (trunk). This closes [#39949](https://github.com/ClickHouse/ClickHouse/issues/39949). [#40181](https://github.com/ClickHouse/ClickHouse/pull/40181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Prepare RISC-V 64 build to run in CI. This is for [#40141](https://github.com/ClickHouse/ClickHouse/issues/40141). [#40197](https://github.com/ClickHouse/ClickHouse/pull/40197) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Simplified function registration macro interface (`FUNCTION_REGISTER*`) to eliminate the step to add and call an extern function in the registerFunctions.cpp, it also makes incremental builds of a new function faster. [#38615](https://github.com/ClickHouse/ClickHouse/pull/38615) ([Li Yin](https://github.com/liyinsg)).
* Docker: now `entrypoint.sh` in the Docker image creates and chowns all folders it finds in the config, for multi-disk setups [#17717](https://github.com/ClickHouse/ClickHouse/issues/17717). [#39121](https://github.com/ClickHouse/ClickHouse/pull/39121) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### Bug Fix
* Fix possible segfault in `CapnProto` input format. This bug was found and reported through the ClickHouse bug-bounty [program](https://github.com/ClickHouse/ClickHouse/issues/38986) by *kiojj*. [#40241](https://github.com/ClickHouse/ClickHouse/pull/40241) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix a very rare case of incorrect behavior of array subscript operator. This closes [#28720](https://github.com/ClickHouse/ClickHouse/issues/28720). [#40185](https://github.com/ClickHouse/ClickHouse/pull/40185) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix insufficient argument check for encryption functions (found by query fuzzer). This closes [#39987](https://github.com/ClickHouse/ClickHouse/issues/39987). [#40194](https://github.com/ClickHouse/ClickHouse/pull/40194) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix the case when the order of columns can be incorrect if the `IN` operator is used with a table with `ENGINE = Set` containing multiple columns (a usage sketch of this pattern follows this list). This fixes [#13014](https://github.com/ClickHouse/ClickHouse/issues/13014). [#40225](https://github.com/ClickHouse/ClickHouse/pull/40225) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix seeking while reading from encrypted disk. This PR fixes [#38381](https://github.com/ClickHouse/ClickHouse/issues/38381). [#39687](https://github.com/ClickHouse/ClickHouse/pull/39687) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix duplicate columns in join plan. Finally, solve [#26809](https://github.com/ClickHouse/ClickHouse/issues/26809). [#40009](https://github.com/ClickHouse/ClickHouse/pull/40009) ([Vladimir C](https://github.com/vdimir)).
* Fixed query hanging for SELECT with ORDER BY WITH FILL with different date/time types. [#37849](https://github.com/ClickHouse/ClickHouse/pull/37849) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix ORDER BY that matches projections ORDER BY (previously it simply returned an unsorted result). [#38725](https://github.com/ClickHouse/ClickHouse/pull/38725) ([Azat Khuzhin](https://github.com/azat)).
* Do not optimise functions in GROUP BY statements if they shadow one of the table columns or expressions. Fixes [#37032](https://github.com/ClickHouse/ClickHouse/issues/37032). [#39103](https://github.com/ClickHouse/ClickHouse/pull/39103) ([Anton Kozlov](https://github.com/tonickkozlov)).
* Fix wrong table name in logs after RENAME TABLE. This fixes [#38018](https://github.com/ClickHouse/ClickHouse/issues/38018). [#39227](https://github.com/ClickHouse/ClickHouse/pull/39227) ([Amos Bird](https://github.com/amosbird)).
* Fix positional arguments in case of columns pruning when optimising the query. Closes [#38433](https://github.com/ClickHouse/ClickHouse/issues/38433). [#39293](https://github.com/ClickHouse/ClickHouse/pull/39293) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a bug in schema inference in case of empty messages in Protobuf/CapnProto formats that allowed creating a column with an empty `Tuple` type. Closes [#39051](https://github.com/ClickHouse/ClickHouse/issues/39051). Add 2 new settings `input_format_{protobuf/capnproto}_skip_fields_with_unsupported_types_in_schema_inference` that allow skipping fields with unsupported types during schema inference for Protobuf and CapnProto formats. [#39357](https://github.com/ClickHouse/ClickHouse/pull/39357) ([Kruglov Pavel](https://github.com/Avogar)).
* (Window View is an experimental feature) Fix segmentation fault on `CREATE WINDOW VIEW .. ON CLUSTER ... INNER`. Closes [#39363](https://github.com/ClickHouse/ClickHouse/issues/39363). [#39384](https://github.com/ClickHouse/ClickHouse/pull/39384) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix WriteBuffer finalize when cancelling insert into function (in previous versions it could lead to `std::terminate`). [#39458](https://github.com/ClickHouse/ClickHouse/pull/39458) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix storing of columns of type `Object` in sparse serialization. [#39464](https://github.com/ClickHouse/ClickHouse/pull/39464) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible "Not found column in block" exception when using projections. This closes [#39469](https://github.com/ClickHouse/ClickHouse/issues/39469). [#39470](https://github.com/ClickHouse/ClickHouse/pull/39470) ([小路](https://github.com/nicelulu)).
* Fix exception on race between DROP and INSERT with materialized views. [#39477](https://github.com/ClickHouse/ClickHouse/pull/39477) ([Azat Khuzhin](https://github.com/azat)).
* A bug in Apache Avro library: fix data race and possible heap-buffer-overflow in Avro format. Closes [#39094](https://github.com/ClickHouse/ClickHouse/issues/39094) Closes [#33652](https://github.com/ClickHouse/ClickHouse/issues/33652). [#39498](https://github.com/ClickHouse/ClickHouse/pull/39498) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare bug in asynchronous reading (with setting `local_filesystem_read_method='pread_threadpool'`) with enabled `O_DIRECT` (enabled by setting `min_bytes_to_use_direct_io`). [#39506](https://github.com/ClickHouse/ClickHouse/pull/39506) ([Anton Popov](https://github.com/CurtizJ)).
* (only on FreeBSD) Fixes "Code: 49. DB::Exception: FunctionFactory: the function name '' is not unique. (LOGICAL_ERROR)" observed on FreeBSD when starting clickhouse. [#39551](https://github.com/ClickHouse/ClickHouse/pull/39551) ([Alexander Gololobov](https://github.com/davenger)).
* Fix bug with the recently introduced "maxsplit" argument for `splitByChar`, which was not working correctly. [#39552](https://github.com/ClickHouse/ClickHouse/pull/39552) ([filimonov](https://github.com/filimonov)).
* Fix bug in ASOF JOIN with `enable_optimize_predicate_expression`, close [#37813](https://github.com/ClickHouse/ClickHouse/issues/37813). [#39556](https://github.com/ClickHouse/ClickHouse/pull/39556) ([Vladimir C](https://github.com/vdimir)).
* Fixed `CREATE/DROP INDEX` query with `ON CLUSTER` or `Replicated` database and `ReplicatedMergeTree`. It used to be executed on all replicas (causing errors or a stuck DDL queue). Fixes [#39511](https://github.com/ClickHouse/ClickHouse/issues/39511). [#39565](https://github.com/ClickHouse/ClickHouse/pull/39565) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix "column not found" error for push down with join, close [#39505](https://github.com/ClickHouse/ClickHouse/issues/39505). [#39575](https://github.com/ClickHouse/ClickHouse/pull/39575) ([Vladimir C](https://github.com/vdimir)).
* Fix the wrong `REGEXP_REPLACE` alias. This fixes https://github.com/ClickHouse/ClickBench/issues/9. [#39592](https://github.com/ClickHouse/ClickHouse/pull/39592) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed point of origin for exponential decay window functions to the last value in window. Previously, decay was calculated by formula `exp((t - curr_row_t) / decay_length)`, which is incorrect when right boundary of window is not `CURRENT ROW`. It was changed to: `exp((t - last_row_t) / decay_length)`. There is no change in results for windows with `ROWS BETWEEN (smth) AND CURRENT ROW`. [#39593](https://github.com/ClickHouse/ClickHouse/pull/39593) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* Fix Decimal division overflow, which can be detected based on operands scale. [#39600](https://github.com/ClickHouse/ClickHouse/pull/39600) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix settings `output_format_arrow_string_as_string` and `output_format_arrow_low_cardinality_as_dictionary` work in combination. Closes [#39624](https://github.com/ClickHouse/ClickHouse/issues/39624). [#39647](https://github.com/ClickHouse/ClickHouse/pull/39647) ([Kruglov Pavel](https://github.com/Avogar)).
* Fixed a bug in default database resolution in distributed table reads. [#39674](https://github.com/ClickHouse/ClickHouse/pull/39674) ([Anton Kozlov](https://github.com/tonickkozlov)).
* (Only with the obsolete Ordinary databases) Select might read data of a dropped table if the cache for mmap IO is used, the database engine is Ordinary, and a new table was created with the same name the dropped one had. It's fixed. [#39708](https://github.com/ClickHouse/ClickHouse/pull/39708) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix possible error `Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got ColumnLowCardinality` Fixes [#38460](https://github.com/ClickHouse/ClickHouse/issues/38460). [#39716](https://github.com/ClickHouse/ClickHouse/pull/39716) ([Arthur Passos](https://github.com/arthurpassos)).
* Field names in the `meta` section of JSON format were erroneously double escaped. This closes [#39693](https://github.com/ClickHouse/ClickHouse/issues/39693). [#39747](https://github.com/ClickHouse/ClickHouse/pull/39747) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix wrong index analysis with tuples and operator `IN`, which could lead to wrong query result. [#39752](https://github.com/ClickHouse/ClickHouse/pull/39752) ([Anton Popov](https://github.com/CurtizJ)).
* Fix `EmbeddedRocksDB` tables filtering by key using params. [#39757](https://github.com/ClickHouse/ClickHouse/pull/39757) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix error `Invalid number of columns in chunk pushed to OutputPort` which was caused by ARRAY JOIN optimization. Fixes [#39164](https://github.com/ClickHouse/ClickHouse/issues/39164). [#39799](https://github.com/ClickHouse/ClickHouse/pull/39799) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* A workaround for a bug in Linux kernel. Fix `CANNOT_READ_ALL_DATA` exception with `local_filesystem_read_method=pread_threadpool`. This bug affected only Linux kernel version 5.9 and 5.10 according to [man](https://manpages.debian.org/testing/manpages-dev/preadv2.2.en.html#BUGS). [#39800](https://github.com/ClickHouse/ClickHouse/pull/39800) ([Anton Popov](https://github.com/CurtizJ)).
* (Only on NFS) Fix broken NFS mkdir for root-squashed volumes. [#39898](https://github.com/ClickHouse/ClickHouse/pull/39898) ([Constantine Peresypkin](https://github.com/pkit)).
* Remove dictionaries from prometheus metrics on DETACH/DROP. [#39926](https://github.com/ClickHouse/ClickHouse/pull/39926) ([Azat Khuzhin](https://github.com/azat)).
* Fix read of StorageFile with virtual columns. Closes [#39907](https://github.com/ClickHouse/ClickHouse/issues/39907). [#39943](https://github.com/ClickHouse/ClickHouse/pull/39943) ([flynn](https://github.com/ucasfl)).
* Fix big memory usage during fetches. Fixes [#39915](https://github.com/ClickHouse/ClickHouse/issues/39915). [#39990](https://github.com/ClickHouse/ClickHouse/pull/39990) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* (experimental feature) Fix `hashId` crash and salt parameter not being used. [#40002](https://github.com/ClickHouse/ClickHouse/pull/40002) ([Raúl Marín](https://github.com/Algunenano)).
* `EXCEPT` and `INTERSECT` operators may lead to crash if a specific combination of constant and non-constant columns were used. [#40020](https://github.com/ClickHouse/ClickHouse/pull/40020) ([Duc Canh Le](https://github.com/canhld94)).
* Fixed "Part directory doesn't exist" and "`tmp_<part_name>` ... No such file or directory" errors during too slow INSERT or too long merge/mutation. Also fixed issue that may cause some replication queue entries to stuck without any errors or warnings in logs if previous attempt to fetch part failed, but `tmp-fetch_<part_name>` directory was not cleaned up. [#40031](https://github.com/ClickHouse/ClickHouse/pull/40031) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix rare cases of parsing of arrays of tuples in format `Values`. [#40034](https://github.com/ClickHouse/ClickHouse/pull/40034) ([Anton Popov](https://github.com/CurtizJ)).
* Fixes ArrowColumn format Dictionary(X) & Dictionary(Nullable(X)) conversion to ClickHouse LowCardinality(X) & LowCardinality(Nullable(X)) respectively. [#40037](https://github.com/ClickHouse/ClickHouse/pull/40037) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix potential deadlock in writing to S3 during task scheduling failure. [#40070](https://github.com/ClickHouse/ClickHouse/pull/40070) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix bug in collectFilesToSkip() by adding correct file extension (.idx or idx2) for indexes to be recalculated, avoid wrong hard links. Fixed [#39896](https://github.com/ClickHouse/ClickHouse/issues/39896). [#40095](https://github.com/ClickHouse/ClickHouse/pull/40095) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* A fix for reverse DNS resolution. [#40134](https://github.com/ClickHouse/ClickHouse/pull/40134) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix unexpected result of `arrayDifference` for `Array(UInt32)`. [#40211](https://github.com/ClickHouse/ClickHouse/pull/40211) ([Duc Canh Le](https://github.com/canhld94)).
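For context on the multi-column `ENGINE = Set` fix above, a minimal sketch of the usage pattern it concerns; all table and column names are illustrative:

```sql
-- A Set table with several columns, used on the right-hand side of IN.
CREATE TABLE allowed_pairs (user_id UInt64, action String) ENGINE = Set;
INSERT INTO allowed_pairs VALUES (1, 'read'), (2, 'write');

SELECT *
FROM events                               -- hypothetical source table
WHERE (user_id, action) IN allowed_pairs; -- tuple order must match the Set table columns
```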
### <a id="227"></a> ClickHouse release 22.7, 2022-07-21
#### Upgrade Notes
@@ -258,7 +401,7 @@
* Allows providing `NULL`/`NOT NULL` right after the type in a column declaration (see the sketch after this list). [#37337](https://github.com/ClickHouse/ClickHouse/pull/37337) ([Igor Nikonov](https://github.com/devcrafter)).
* Optimize getting a read buffer for file segments in the `PARTIALLY_DOWNLOADED` state. [#37338](https://github.com/ClickHouse/ClickHouse/pull/37338) ([xiedeyantu](https://github.com/xiedeyantu)).
* Try to improve short circuit functions processing to fix problems with stress tests. [#37384](https://github.com/ClickHouse/ClickHouse/pull/37384) ([Kruglov Pavel](https://github.com/Avogar)).
* Closes [#37395](https://github.com/ClickHouse/ClickHouse/issues/37395). [#37415](https://github.com/ClickHouse/ClickHouse/pull/37415) ([Memo](https://github.com/Joeywzr)).
* Generate multiple columns with UUID (generateUUIDv4(1), generateUUIDv4(2)) [#37395](https://github.com/ClickHouse/ClickHouse/issues/37395). [#37415](https://github.com/ClickHouse/ClickHouse/pull/37415) ([Memo](https://github.com/Joeywzr)).
* Fix extremely rare deadlock during part fetch in zero-copy replication. Fixes [#37423](https://github.com/ClickHouse/ClickHouse/issues/37423). [#37424](https://github.com/ClickHouse/ClickHouse/pull/37424) ([metahys](https://github.com/metahys)).
* Don't allow to create storage with unknown data format. [#37450](https://github.com/ClickHouse/ClickHouse/pull/37450) ([Kruglov Pavel](https://github.com/Avogar)).
* Set `global_memory_usage_overcommit_max_wait_microseconds` default value to 5 seconds. Add info about `OvercommitTracker` to OOM exception message. Add `MemoryOvercommitWaitTimeMicroseconds` profile event. [#37460](https://github.com/ClickHouse/ClickHouse/pull/37460) ([Dmitry Novik](https://github.com/novikd)).
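A small sketch of the `NULL`/`NOT NULL` column modifiers mentioned above; the table name is illustrative:

```sql
-- NULL right after the type is equivalent to wrapping it in Nullable(...).
CREATE TABLE t_example
(
    id   UInt64 NOT NULL,
    note String NULL        -- same as Nullable(String)
)
ENGINE = MergeTree
ORDER BY id;
```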

View File

@@ -10,9 +10,10 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 22.8 | ✔️ |
| 22.7 | ✔️ |
| 22.6 | ✔️ |
| 22.5 | ✔️ |
| 22.5 | ❌ |
| 22.4 | ❌ |
| 22.3 | ✔️ |
| 22.2 | ❌ |
@@ -21,7 +22,7 @@ The following versions of ClickHouse server are currently being supported with s
| 21.11 | ❌ |
| 21.10 | ❌ |
| 21.9 | ❌ |
| 21.8 | ✔️ |
| 21.8 | ❌ |
| 21.7 | ❌ |
| 21.6 | ❌ |
| 21.5 | ❌ |

View File

@@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54465)
SET(VERSION_REVISION 54466)
SET(VERSION_MAJOR 22)
SET(VERSION_MINOR 8)
SET(VERSION_MINOR 9)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH f4f05ec786a8b8966dd0ea2a2d7e39a8c7db24f4)
SET(VERSION_DESCRIBE v22.8.1.1-testing)
SET(VERSION_STRING 22.8.1.1)
SET(VERSION_GITHASH 09a2ff88435f79e5279745bbe1dc0e5e401df38d)
SET(VERSION_DESCRIBE v22.9.1.1-testing)
SET(VERSION_STRING 22.9.1.1)
# end of autochange

contrib/cctz vendored

@@ -1 +1 @@
Subproject commit 8c71d74bdf76c3fa401da845089ae60a6c0aeefa
Subproject commit 49c656c62fbd36a1bc20d64c476853bdb7cf7bb9

View File

@@ -107,6 +107,13 @@ fi
if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
# port is needed to check if clickhouse-server is ready for connections
HTTP_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=http_port)"
HTTPS_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=https_port)"
if [ -n "$HTTP_PORT" ]; then
URL="http://127.0.0.1:$HTTP_PORT/ping"
else
URL="https://127.0.0.1:$HTTPS_PORT/ping"
fi
# Listen only on localhost until the initialization is done
/usr/bin/clickhouse su "${USER}:${GROUP}" /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" -- --listen_host=127.0.0.1 &
@@ -115,7 +122,7 @@ if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
# check if clickhouse is ready to accept connections
# will try to send ping clickhouse via http_port (max 12 retries by default, with 1 sec timeout and 1 sec delay between retries)
tries=${CLICKHOUSE_INIT_TIMEOUT:-12}
while ! wget --spider -T 1 -q "http://127.0.0.1:$HTTP_PORT/ping" 2>/dev/null; do
while ! wget --spider --no-check-certificate -T 1 -q "$URL" 2>/dev/null; do
if [ "$tries" -le "0" ]; then
echo >&2 'ClickHouse init process failed.'
exit 1

View File

@@ -284,13 +284,21 @@ function run_tests
# Use awk because bash doesn't support floating point arithmetic.
profile_seconds=$(awk "BEGIN { print ($profile_seconds_left > 0 ? 10 : 0) }")
if [ "$(grep -c $(basename $test) changed-test-definitions.txt)" -gt 0 ]
then
# Run all queries from changed test files to ensure that all new queries will be tested.
max_queries=0
else
max_queries=$CHPC_MAX_QUERIES
fi
(
set +x
argv=(
--host localhost localhost
--port "$LEFT_SERVER_PORT" "$RIGHT_SERVER_PORT"
--runs "$CHPC_RUNS"
--max-queries "$CHPC_MAX_QUERIES"
--max-queries "$max_queries"
--profile-seconds "$profile_seconds"
"$test"

View File

@@ -7,6 +7,8 @@ RUN apt-get update -y \
&& env DEBIAN_FRONTEND=noninteractive \
apt-get install --yes --no-install-recommends \
python3-requests \
nodejs \
npm \
&& apt-get clean
COPY s3downloader /s3downloader
@@ -14,5 +16,7 @@ COPY s3downloader /s3downloader
ENV S3_URL="https://clickhouse-datasets.s3.amazonaws.com"
ENV DATASETS="hits visits"
RUN npm install -g azurite
COPY run.sh /
CMD ["/bin/bash", "/run.sh"]

View File

@@ -17,6 +17,7 @@ ln -s /usr/share/clickhouse-test/clickhouse-test /usr/bin/clickhouse-test
# install test configs
/usr/share/clickhouse-test/config/install.sh
azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --debug /azurite_log &
./setup_minio.sh stateful
function start()

View File

@@ -17,6 +17,8 @@ RUN apt-get update -y \
mysql-client=8.0* \
ncdu \
netcat-openbsd \
nodejs \
npm \
openjdk-11-jre-headless \
openssl \
postgresql-client \
@@ -75,6 +77,8 @@ ENV MINIO_ROOT_USER="clickhouse"
ENV MINIO_ROOT_PASSWORD="clickhouse"
ENV EXPORT_S3_STORAGE_POLICIES=1
RUN npm install -g azurite
COPY run.sh /
COPY setup_minio.sh /
COPY setup_hdfs_minicluster.sh /

View File

@@ -18,6 +18,12 @@ ln -s /usr/share/clickhouse-test/clickhouse-test /usr/bin/clickhouse-test
# install test configs
/usr/share/clickhouse-test/config/install.sh
if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]; then
echo "Azure is disabled"
else
azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --debug /azurite_log &
fi
./setup_minio.sh stateless
./setup_hdfs_minicluster.sh

View File

@@ -178,6 +178,7 @@ install_packages package_folder
configure
azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --debug /azurite_log &
./setup_minio.sh stateful # to have a proper environment
start
@@ -314,6 +315,11 @@ else
# Avoid "Setting allow_deprecated_database_ordinary is neither a builtin setting..."
rm -f /etc/clickhouse-server/users.d/database_ordinary.xml ||:
# Remove s3 related configs to avoid "there is no disk type `cache`"
rm -f /etc/clickhouse-server/config.d/storage_conf.xml ||:
rm -f /etc/clickhouse-server/config.d/azure_storage_conf.xml ||:
# Disable aggressive cleanup of tmp dirs (it worked incorrectly before 22.8)
rm -f /etc/clickhouse-server/config.d/merge_tree_old_dirs_cleanup.xml ||:
@@ -381,6 +387,7 @@ else
-e "TABLE_IS_READ_ONLY" \
-e "Code: 1000, e.code() = 111, Connection refused" \
-e "UNFINISHED" \
-e "NETLINK_ERROR" \
-e "Renaming unexpected part" \
-e "PART_IS_TEMPORARILY_LOCKED" \
-e "and a merge is impossible: we didn't find" \

View File

@@ -0,0 +1,374 @@
---
sidebar_position: 1
sidebar_label: 2022
---
# 2022 Changelog
### ClickHouse release v22.8.1.2097-lts (09a2ff88435) FIXME as compared to v22.7.1.2484-stable (f4f05ec786a)
#### Backward Incompatible Change
* Make cache composable, allow not to evict certain files (regarding idx, mrk, ..), delete old cache version. Now it is possible to configure cache over Azure blob storage disk, over Local disk, over StaticWeb disk, etc. This PR is marked backward incompatible because cache configuration changes and in order for cache to work need to update the config file. Old cache will still be used with new configuration. The server will startup fine with the old cache configuration. Closes [#36140](https://github.com/ClickHouse/ClickHouse/issues/36140). Closes [#37889](https://github.com/ClickHouse/ClickHouse/issues/37889). [#36171](https://github.com/ClickHouse/ClickHouse/pull/36171) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Now, all relevant dictionary sources respect `remote_url_allow_hosts` setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. [#39184](https://github.com/ClickHouse/ClickHouse/pull/39184) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Extended range of Date32 and DateTime64 to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601):2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of `1899-01-01` was clamped to `1925-01-01`, in the new version it will be clamped to `1900-01-01`. It changes the behavior of rounding with `toStartOfInterval` if you pass `INTERVAL 3 QUARTER` up to one quarter because the intervals are counted from an implementation-specific point of time. Closes [#28216](https://github.com/ClickHouse/ClickHouse/issues/28216), improves [#38393](https://github.com/ClickHouse/ClickHouse/issues/38393). [#39425](https://github.com/ClickHouse/ClickHouse/pull/39425) ([Roman Vasin](https://github.com/rvasin)).
#### New Feature
* Added a setting `exact_rows_before_limit` (0/1). When enabled, ClickHouse will provide exact value for `rows_before_limit_at_least` statistic, but with the cost that the data before limit will have to be read completely. This closes [#6613](https://github.com/ClickHouse/ClickHouse/issues/6613). [#25333](https://github.com/ClickHouse/ClickHouse/pull/25333) ([kevin wan](https://github.com/MaxWk)).
* Add SLRU cache policy for uncompressed cache and marks cache. [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)).
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#36654](https://github.com/ClickHouse/ClickHouse/pull/36654) ([jasperzhu](https://github.com/jinjunzh)).
* Add concurrent_threads_soft_limit parameter to increase performance in case of high RPS by means of limiting total number of threads for all queries. [#37285](https://github.com/ClickHouse/ClickHouse/pull/37285) ([Roman Vasin](https://github.com/rvasin)).
* Added concurrency control logic to limit total number of concurrent threads created by queries. [#37558](https://github.com/ClickHouse/ClickHouse/pull/37558) ([Sergei Trifonov](https://github.com/serxa)).
* Added support for parallel distributed insert select into tables with Distributed and Replicated engine [#34670](https://github.com/ClickHouse/ClickHouse/issues/34670). [#39107](https://github.com/ClickHouse/ClickHouse/pull/39107) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add new settings to control schema inference from text formats: - `input_format_try_infer_dates` - try infer dates from strings. - `input_format_try_infer_datetimes` - try infer datetimes from strings. - `input_format_try_infer_integers` - try infer `Int64` instead of `Float64`. - `input_format_json_try_infer_numbers_from_strings` - try infer numbers from json strings in JSON formats. [#39186](https://github.com/ClickHouse/ClickHouse/pull/39186) ([Kruglov Pavel](https://github.com/Avogar)).
* This feature will provide JSON formatted log output in console. The purpose is to allow easier ingestion and query in log analysis tools. [#39277](https://github.com/ClickHouse/ClickHouse/pull/39277) ([Mallik Hassan](https://github.com/SadiHassan)).
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#39494](https://github.com/ClickHouse/ClickHouse/pull/39494) ([Robert Schulze](https://github.com/rschu1ze)).
* Add function `nowInBlock` which allows getting the current time during long-running and continuous queries. Closes [#39522](https://github.com/ClickHouse/ClickHouse/issues/39522). Notes: there are no functions `now64InBlock` neither `todayInBlock`. [#39533](https://github.com/ClickHouse/ClickHouse/pull/39533) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add result_rows and result_bytes to progress reports (`X-ClickHouse-Summary`). [#39567](https://github.com/ClickHouse/ClickHouse/pull/39567) ([Raúl Marín](https://github.com/Algunenano)).
* Add the ability to specify settings for an `executable()` table function. [#39681](https://github.com/ClickHouse/ClickHouse/pull/39681) ([Constantine Peresypkin](https://github.com/pkit)).
* Implemented automatic conversion of database engine from `Ordinary` to `Atomic`. Create an empty `convert_ordinary_to_atomic` file in the `flags` directory and all `Ordinary` databases will be converted automatically on the next server start. Resolves [#39546](https://github.com/ClickHouse/ClickHouse/issues/39546). [#39933](https://github.com/ClickHouse/ClickHouse/pull/39933) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add new setting `schema_inference_hints` that allows specifying structure hints in schema inference for specific columns. Closes [#39569](https://github.com/ClickHouse/ClickHouse/issues/39569). A usage sketch follows this list. [#40068](https://github.com/ClickHouse/ClickHouse/pull/40068) ([Kruglov Pavel](https://github.com/Avogar)).
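
The schema-inference settings above can be toggled per session; a minimal sketch, assuming a hypothetical `events.csv` file under `user_files_path`:

```sql
-- Enable the new inference settings for text formats, then inspect
-- the schema ClickHouse infers for the file.
SET input_format_try_infer_dates = 1,
    input_format_try_infer_datetimes = 1,
    input_format_try_infer_integers = 1;

DESCRIBE file('events.csv');
```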
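
For the `nowInBlock` entry, a short sketch of the difference from `now()`: `now()` is evaluated once per query, while `nowInBlock()` is re-evaluated for every block (the timing values will vary):

```sql
-- With one row per block, nowInBlock() advances while now() stays constant.
SELECT now(), nowInBlock(), sleep(1)
FROM system.numbers
LIMIT 3
SETTINGS max_block_size = 1;
```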
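
And for `schema_inference_hints`, a sketch that pins the types of a couple of columns while the rest are inferred; the file name and hinted columns are illustrative only:

```sql
-- Hinted columns get the specified types; all other columns are inferred.
DESCRIBE file('data.jsonl', 'JSONEachRow')
SETTINGS schema_inference_hints = 'id UInt64, status LowCardinality(String)';
```
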
#### Performance Improvement
* Break out of the analyze phase when it gets stuck on a complex query. [#38185](https://github.com/ClickHouse/ClickHouse/pull/38185) ([Vladimir C](https://github.com/vdimir)).
* Deduce way to sort based on input stream sort description. Skip sorting if input stream is already sorted. [#38719](https://github.com/ClickHouse/ClickHouse/pull/38719) ([Igor Nikonov](https://github.com/devcrafter)).
* `DISTINCT` in order with `ORDER BY` improves memory usage (significantly) and query execution time if `DISTINCT` columns match (or form a prefix of) `ORDER BY` columns. [#39432](https://github.com/ClickHouse/ClickHouse/pull/39432) ([Igor Nikonov](https://github.com/devcrafter)).
* Use local node as first priority to get structure of remote table when executing `cluster` and similar table functions. [#39440](https://github.com/ClickHouse/ClickHouse/pull/39440) ([Mingliang Pan](https://github.com/liangliangpan)).
* Use `DistinctSortedTransform` only when the sort description is applicable to DISTINCT columns, otherwise fall back to the ordinary DISTINCT implementation. It allows making fewer checks during `DistinctSortedTransform` execution. [#39528](https://github.com/ClickHouse/ClickHouse/pull/39528) ([Igor Nikonov](https://github.com/devcrafter)).
* `DistinctSortedTransform` didn't take advantage of sorting, i.e. it worked like ordinary `DISTINCT` implementation. The fix reduces memory usage significantly. [#39538](https://github.com/ClickHouse/ClickHouse/pull/39538) ([Igor Nikonov](https://github.com/devcrafter)).
* ColumnVector: optimize filter with AVX512VBMI2 compress store. [#39633](https://github.com/ClickHouse/ClickHouse/pull/39633) ([Guo Wangyang](https://github.com/guowangy)).
* KeyCondition: optimize applyFunction in multi-threaded scenarios. [#39812](https://github.com/ClickHouse/ClickHouse/pull/39812) ([Guo Wangyang](https://github.com/guowangy)).
* For systems with AVX512 VBMI2, this PR improves performance by ca. 6% for SSB benchmark queries 3.1, 3.2 and 3.3 (SF=100). Tested on a 2-socket Intel Icelake Xeon 8380 system. [#40033](https://github.com/ClickHouse/ClickHouse/pull/40033) ([Robert Schulze](https://github.com/rschu1ze)).
* Don't visit the AST for UDFs if none are registered. [#40069](https://github.com/ClickHouse/ClickHouse/pull/40069) ([Raúl Marín](https://github.com/Algunenano)).
* Optimize `CurrentMemoryTracker` alloc and free. [#40078](https://github.com/ClickHouse/ClickHouse/pull/40078) ([Raúl Marín](https://github.com/Algunenano)).
#### Improvement
* Change how the primary key is analyzed for MergeTree. [#25563](https://github.com/ClickHouse/ClickHouse/pull/25563) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Improved the structure of the DDL query result table for the `Replicated` database (separate columns for shard and replica name, clearer status). `CREATE TABLE ... ON CLUSTER` queries can be normalized on the initiator first if `distributed_ddl_entry_format_version` is set to 3 (the default value); it means that `ON CLUSTER` queries may not work if the initiator does not belong to the cluster specified in the query. Fixes [#37318](https://github.com/ClickHouse/ClickHouse/issues/37318), [#39500](https://github.com/ClickHouse/ClickHouse/issues/39500). Ignore the `ON CLUSTER` clause if the database is `Replicated` and the cluster name equals the database name; related to [#35570](https://github.com/ClickHouse/ClickHouse/issues/35570). Miscellaneous minor fixes for the `Replicated` database engine. Check metadata consistency when starting up a `Replicated` database and start replica recovery in case of a mismatch between local metadata and metadata in Keeper. Resolves [#24880](https://github.com/ClickHouse/ClickHouse/issues/24880). [#37198](https://github.com/ClickHouse/ClickHouse/pull/37198) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Support the SQL standard `DELETE FROM` syntax on MergeTree tables and a lightweight delete implementation for the MergeTree family (see the sketch after this list). [#37893](https://github.com/ClickHouse/ClickHouse/pull/37893) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* `timeSlots` now works with DateTime64; sub-second duration and slot size are available when working with DateTime64. [#37951](https://github.com/ClickHouse/ClickHouse/pull/37951) ([Andrey Zvonov](https://github.com/zvonand)).
* Add a cache for schema inference for the file/s3/hdfs/url table functions. Now schema inference will be performed only on the first query to the file; all subsequent queries to the same file will use the schema from the cache if the data wasn't changed. Adds the system table `system.schema_inference_cache` with all current schemas in the cache and the system queries `SYSTEM DROP SCHEMA CACHE [FOR FILE/S3/HDFS/URL]` to drop schemas from the cache (see the sketch after this list). [#38286](https://github.com/ClickHouse/ClickHouse/pull/38286) ([Kruglov Pavel](https://github.com/Avogar)).
* Simplified the function registration macro interface (`FUNCTION_REGISTER*`) to eliminate the step of adding and calling an extern function in registerFunctions.cpp; it also makes incremental builds of a new function faster. [#38615](https://github.com/ClickHouse/ClickHouse/pull/38615) ([Li Yin](https://github.com/liyinsg)).
* Added support for `LEFT SEMI` and `LEFT ANTI` direct join with RocksDB. [#38956](https://github.com/ClickHouse/ClickHouse/pull/38956) ([Vladimir C](https://github.com/vdimir)).
* Resolves [#37490](https://github.com/ClickHouse/ClickHouse/issues/37490). [#39054](https://github.com/ClickHouse/ClickHouse/pull/39054) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Store Keeper API version inside a predefined path. [#39096](https://github.com/ClickHouse/ClickHouse/pull/39096) ([Antonio Andelic](https://github.com/antonio2368)).
* Now `entrypoint.sh` in the Docker image creates and runs `chown` for all folders it finds in the config, for multi-disk setups [#17717](https://github.com/ClickHouse/ClickHouse/issues/17717). [#39121](https://github.com/ClickHouse/ClickHouse/pull/39121) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add profile events for fsync. [#39179](https://github.com/ClickHouse/ClickHouse/pull/39179) ([Azat Khuzhin](https://github.com/azat)).
* Add a second argument to the ordinary function `file(path[, default])`, which the function returns if the file does not exist (see the sketch after this list). [#39218](https://github.com/ClickHouse/ClickHouse/pull/39218) ([Nikolay Degterinsky](https://github.com/evillique)).
* Some small fixes for reading via HTTP: allow retrying partial content if a `200 OK` response was received. [#39244](https://github.com/ClickHouse/ClickHouse/pull/39244) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improved Base58 encoding/decoding. [#39292](https://github.com/ClickHouse/ClickHouse/pull/39292) ([Andrey Zvonov](https://github.com/zvonand)).
* Normalize `AggregateFunction` types and state representations because optimizations like https://github.com/ClickHouse/ClickHouse/pull/35788 will treat `count(not null columns)` as `count()`, which might confuse distributed interpreters with the following error: `Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported`. [#39420](https://github.com/ClickHouse/ClickHouse/pull/39420) ([Amos Bird](https://github.com/amosbird)).
* Improved memory usage during memory efficient merging of aggregation results. [#39429](https://github.com/ClickHouse/ClickHouse/pull/39429) ([Nikita Taranov](https://github.com/nickitat)).
* Support queries `CREATE TEMPORARY TABLE ... (<list of columns>) AS ...` (see the sketch after this list). [#39462](https://github.com/ClickHouse/ClickHouse/pull/39462) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support of `!`/`*` (exclamation/asterisk) in custom TLDs (`cutToFirstSignificantSubdomainCustom()`/`cutToFirstSignificantSubdomainCustomWithWWW()`/`firstSignificantSubdomainCustom()`). [#39496](https://github.com/ClickHouse/ClickHouse/pull/39496) ([Azat Khuzhin](https://github.com/azat)).
* Rework and simplify the `system.backups` table, remove the `internal` column, allow the user to set the ID of the operation, add the columns `num_files`, `uncompressed_size`, `compressed_size`, `start_time`, `end_time`. [#39503](https://github.com/ClickHouse/ClickHouse/pull/39503) ([Vitaly Baranov](https://github.com/vitlibar)).
* Refactored some code and removed duplicate code. [#39509](https://github.com/ClickHouse/ClickHouse/pull/39509) ([Simon Liu](https://github.com/monadbobo)).
* Add support for TLS connections to NATS. Implements [#39525](https://github.com/ClickHouse/ClickHouse/issues/39525). [#39527](https://github.com/ClickHouse/ClickHouse/pull/39527) ([Constantine Peresypkin](https://github.com/pkit)).
* `clickhouse-obfuscator` (a tool for database obfuscation for testing and load generation) now has the new `--save` and `--load` parameters to work with pre-trained models. This closes [#39534](https://github.com/ClickHouse/ClickHouse/issues/39534). [#39541](https://github.com/ClickHouse/ClickHouse/pull/39541) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix incorrect behavior of log rotation during restart. [#39558](https://github.com/ClickHouse/ClickHouse/pull/39558) ([Nikolay Degterinsky](https://github.com/evillique)).
* Improve bytes to bits mask transform for SSE/AVX/AVX512. [#39586](https://github.com/ClickHouse/ClickHouse/pull/39586) ([Guo Wangyang](https://github.com/guowangy)).
* Add formats `PrettyMonoBlock`, `PrettyNoEscapesMonoBlock`, `PrettyCompactNoEscapes`, `PrettyCompactNoEscapesMonoBlock`, `PrettySpaceNoEscapes`, `PrettySpaceMonoBlock`, `PrettySpaceNoEscapesMonoBlock` (see the sketch after this list). [#39646](https://github.com/ClickHouse/ClickHouse/pull/39646) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix building aggregate projections when external aggregation is on. Marked as an improvement because the case is rare and there exists an easy workaround via changing settings. This fixes [#39667](https://github.com/ClickHouse/ClickHouse/issues/39667). [#39671](https://github.com/ClickHouse/ClickHouse/pull/39671) ([Amos Bird](https://github.com/amosbird)).
* Allow executing hash functions with arguments of type `Map` (see the sketch after this list). [#39685](https://github.com/ClickHouse/ClickHouse/pull/39685) ([Anton Popov](https://github.com/CurtizJ)).
* Add a configuration parameter to hide addresses in stack traces. It may improve security a little but generally, it is harmful and should not be used. [#39690](https://github.com/ClickHouse/ClickHouse/pull/39690) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change the prefix size of `AggregateFunctionDistinct` to make sure the nested function data is memory-aligned. [#39696](https://github.com/ClickHouse/ClickHouse/pull/39696) ([Pxl](https://github.com/BiteTheDDDDt)).
* Properly escape credentials passed to the `clickhouse-diagnostic` tool. [#39707](https://github.com/ClickHouse/ClickHouse/pull/39707) ([Dale McDiarmid](https://github.com/gingerwizard)).
* Keeper improvement: create a snapshot on exit. It can be controlled with the config `keeper_server.create_snapshot_on_exit`, `true` by default. [#39755](https://github.com/ClickHouse/ClickHouse/pull/39755) ([Antonio Andelic](https://github.com/antonio2368)).
* Support primary key analysis for `row_policy_filter` and `additional_filter`. It also helps fix issues like [#37454](https://github.com/ClickHouse/ClickHouse/issues/37454). [#39826](https://github.com/ClickHouse/ClickHouse/pull/39826) ([Amos Bird](https://github.com/amosbird)).
* Parameters are now transferred in `Query` packets right after the query text in the same serialisation format as the settings. [#39906](https://github.com/ClickHouse/ClickHouse/pull/39906) ([Nikita Taranov](https://github.com/nickitat)).
* Fix two usability issues in Play UI: - it was non-pixel-perfect on iPad due to parasitic border radius and margins; - the progress indication did not display after the first query. This closes [#39957](https://github.com/ClickHouse/ClickHouse/issues/39957). This closes [#39960](https://github.com/ClickHouse/ClickHouse/issues/39960). [#39961](https://github.com/ClickHouse/ClickHouse/pull/39961) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Play UI: add row numbers; add cell selection on click; add hysteresis for table cells. [#39962](https://github.com/ClickHouse/ClickHouse/pull/39962) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The client will show server-side elapsed time. This is important for the performance comparison of ClickHouse services in remote datacenters. This closes [#38070](https://github.com/ClickHouse/ClickHouse/issues/38070). See also [this](https://github.com/ClickHouse/ClickBench/blob/main/hardware/benchmark-cloud.sh#L37) for motivation. [#39968](https://github.com/ClickHouse/ClickHouse/pull/39968) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Adds `parseDateTime64BestEffortUS`, `parseDateTime64BestEffortUSOrNull`, `parseDateTime64BestEffortUSOrZero` functions, closing [#37492](https://github.com/ClickHouse/ClickHouse/issues/37492) (see the sketch after this list). [#40015](https://github.com/ClickHouse/ClickHouse/pull/40015) ([Tanya Bragin](https://github.com/tbragin)).
* Add observer mode to the (Zoo)Keeper cluster discovery feature. In this mode the node itself doesn't belong to the cluster. [#40035](https://github.com/ClickHouse/ClickHouse/pull/40035) ([Vladimir C](https://github.com/vdimir)).
* Play UI: recognize the Tab key in the textarea, but at the same time don't break tab navigation. [#40053](https://github.com/ClickHouse/ClickHouse/pull/40053) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Extend processors_profile_log with more information such as input rows. [#40121](https://github.com/ClickHouse/ClickHouse/pull/40121) ([Amos Bird](https://github.com/amosbird)).
* Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently, after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). [#40184](https://github.com/ClickHouse/ClickHouse/pull/40184) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Display server-side time in `clickhouse-benchmark` by default if it is available (since ClickHouse version 22.8). This is needed to correctly compare the performance of clouds. This behavior can be changed with the new `--client-side-time` command line option. Change the `--randomize` command line option from `--randomize 1` to the form without argument. [#40193](https://github.com/ClickHouse/ClickHouse/pull/40193) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add counters (ProfileEvents) for cases when a query complexity limitation has been set and has been reached (a separate counter for `overflow_mode` = `break` and `throw`). For example, if you have set up `max_rows_to_read` with `read_overflow_mode = 'break'`, looking at the value of the `OverflowBreak` counter will allow distinguishing incomplete results. [#40205](https://github.com/ClickHouse/ClickHouse/pull/40205) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix memory accounting in case of MEMORY_LIMIT_EXCEEDED errors (previously, [peak] memory usage took failed allocations into account). [#40249](https://github.com/ClickHouse/ClickHouse/pull/40249) ([Azat Khuzhin](https://github.com/azat)).
* Add current metrics for fs cache: `FilesystemCacheSize` and `FilesystemCacheElements`. [#40260](https://github.com/ClickHouse/ClickHouse/pull/40260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes [#32401](https://github.com/ClickHouse/ClickHouse/issues/32401)). [#40293](https://github.com/ClickHouse/ClickHouse/pull/40293) ([Josh Taylor](https://github.com/joshuataylor)).
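
For the lightweight `DELETE FROM` entry above, a minimal sketch; the table name is hypothetical, and it assumes the feature is gated behind the `allow_experimental_lightweight_delete` setting in this release:

```sql
-- Lightweight delete on a MergeTree table: rows are marked as deleted and
-- removed physically later by background merges.
SET allow_experimental_lightweight_delete = 1;  -- assumed setting name
DELETE FROM orders WHERE status = 'cancelled';  -- hypothetical table
```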
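
The schema-inference cache entry adds a system table and `SYSTEM` queries; a short sketch:

```sql
-- Inspect cached inferred schemas, then drop the cache for one source type.
SELECT * FROM system.schema_inference_cache;
SYSTEM DROP SCHEMA CACHE FOR S3;
```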
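
For the `file(path[, default])` entry, a sketch; the path is hypothetical and is resolved inside `user_files_path`:

```sql
-- Returns the file contents when the file exists, otherwise the default.
SELECT file('missing_config.txt', 'fallback contents');
```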
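
For the `CREATE TEMPORARY TABLE ... (<list of columns>) AS ...` entry, a sketch with hypothetical names:

```sql
-- Temporary table with an explicit column list, populated from a SELECT.
CREATE TEMPORARY TABLE tmp_ids (id UInt64, name String) AS
SELECT number AS id, toString(number) AS name
FROM numbers(10);
```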
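
One of the new `Pretty*` output formats from the entry above, applied to an arbitrary query:

```sql
-- No escape sequences, and the result is rendered as a single block
-- instead of one block per chunk.
SELECT number, number * number AS square
FROM numbers(5)
FORMAT PrettyCompactNoEscapesMonoBlock;
```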
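
For hash functions over `Map` arguments, a sketch; `cityHash64` is used here only as a representative hash function:

```sql
-- Hashing a Map value directly, without converting it to arrays first.
SELECT cityHash64(map('region', 1, 'zone', 2));
```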
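
And for the `parseDateTime64BestEffortUS*` family, a sketch of the US-style month-first parsing; the literals are illustrative:

```sql
-- Month/day/year input; the *OrNull variant returns NULL instead of
-- throwing on unparsable input.
SELECT
    parseDateTime64BestEffortUS('02/10/2021 08:15:30.123'),
    parseDateTime64BestEffortUSOrNull('not a date');
```
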
#### Bug Fix
* Support Hadoop secure RPC transfer (`hadoop.rpc.protection=privacy` and `hadoop.rpc.protection=integrity`). [#39411](https://github.com/ClickHouse/ClickHouse/pull/39411) ([michael1589](https://github.com/michael1589)).
* Fix seeking while reading from encrypted disk. This PR fixes [#38381](https://github.com/ClickHouse/ClickHouse/issues/38381). [#39687](https://github.com/ClickHouse/ClickHouse/pull/39687) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix duplicate columns in join plan. Finally, solve [#26809](https://github.com/ClickHouse/ClickHouse/issues/26809). [#40009](https://github.com/ClickHouse/ClickHouse/pull/40009) ([Vladimir C](https://github.com/vdimir)).
#### Build/Testing/Packaging Improvement
* Prebuilt ClickHouse x86 binaries now require support for AVX instructions, i.e. a CPU not older than Intel Sandy Bridge / AMD Bulldozer, both released in 2011. [#39000](https://github.com/ClickHouse/ClickHouse/pull/39000) ([Robert Schulze](https://github.com/rschu1ze)).
* Former packages used to install the systemd service file to `/etc`. The files there are marked as `conf`, are not cleaned out, and are not updated automatically. This PR cleans them out. [#39323](https://github.com/ClickHouse/ClickHouse/pull/39323) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix LSan by fixing getauxval(). [#39430](https://github.com/ClickHouse/ClickHouse/pull/39430) ([Azat Khuzhin](https://github.com/azat)).
* TSAN has issues with clang-14 (https://github.com/google/sanitizers/issues/1552, https://github.com/google/sanitizers/issues/1540), so here we temporarily build the TSAN binaries with clang-13. [#39450](https://github.com/ClickHouse/ClickHouse/pull/39450) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove the option to build ClickHouse tools as separate executable programs. This fixes [#37847](https://github.com/ClickHouse/ClickHouse/issues/37847). [#39520](https://github.com/ClickHouse/ClickHouse/pull/39520) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed Unit tests for wide integers on s390x. [#39627](https://github.com/ClickHouse/ClickHouse/pull/39627) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Increase max cache size for clang-tidy builds. Try to avoid flushing it out between builds. [#39652](https://github.com/ClickHouse/ClickHouse/pull/39652) ([Nikita Taranov](https://github.com/nickitat)).
* No need to use a fixed IP when you are using a cluster with SSL; using the same fixed IP could trigger collisions between tests. With this change, the server's certificate is generated for a designated host name (see server-ext.cnf in each test), and the client checks the server's certificate against that name accordingly. [#40007](https://github.com/ClickHouse/ClickHouse/pull/40007) ([Sema Checherinda](https://github.com/CheSema)).
* Support build with `clang-16` (trunk). This closes [#39949](https://github.com/ClickHouse/ClickHouse/issues/39949). [#40181](https://github.com/ClickHouse/ClickHouse/pull/40181) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Prepare RISC-V 64 build to run in CI. This is for [#40141](https://github.com/ClickHouse/ClickHouse/issues/40141). [#40197](https://github.com/ClickHouse/ClickHouse/pull/40197) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Fixed query hanging for SELECT with ORDER BY WITH FILL with different date/time types. [#37849](https://github.com/ClickHouse/ClickHouse/pull/37849) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix ORDER BY that matches projections ORDER BY (before, it simply returned an unsorted result). [#38725](https://github.com/ClickHouse/ClickHouse/pull/38725) ([Azat Khuzhin](https://github.com/azat)).
* Do not optimise functions in GROUP BY statements if they shadow one of the table columns or expressions. Fixes [#37032](https://github.com/ClickHouse/ClickHouse/issues/37032). [#39103](https://github.com/ClickHouse/ClickHouse/pull/39103) ([Anton Kozlov](https://github.com/tonickkozlov)).
* Fix wrong table name in logs after RENAME TABLE. This fixes [#38018](https://github.com/ClickHouse/ClickHouse/issues/38018). [#39227](https://github.com/ClickHouse/ClickHouse/pull/39227) ([Amos Bird](https://github.com/amosbird)).
* Fix positional arguments in case of columns pruning when optimising the query. Closes [#38433](https://github.com/ClickHouse/ClickHouse/issues/38433). [#39293](https://github.com/ClickHouse/ClickHouse/pull/39293) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a bug in schema inference in case of empty messages in Protobuf/CapnProto formats that allowed creating a column with an empty `Tuple` type. Closes [#39051](https://github.com/ClickHouse/ClickHouse/issues/39051). Adds 2 new settings `input_format_{protobuf/capnproto}_skip_fields_with_unsupported_types_in_schema_inference` that allow skipping fields with unsupported types during schema inference for Protobuf and CapnProto formats. [#39357](https://github.com/ClickHouse/ClickHouse/pull/39357) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix segmentation fault on `CREATE WINDOW VIEW .. ON CLUSTER ... INNER`. Closes [#39363](https://github.com/ClickHouse/ClickHouse/issues/39363). [#39384](https://github.com/ClickHouse/ClickHouse/pull/39384) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix `WriteBuffer` finalize when cancelling an insert into a table function. Proper version of https://github.com/ClickHouse/ClickHouse/pull/39396 that was reverted. [#39458](https://github.com/ClickHouse/ClickHouse/pull/39458) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix storing of columns of type `Object` in sparse serialization. [#39464](https://github.com/ClickHouse/ClickHouse/pull/39464) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible "Not found column in block" exception when using projections. This closes [#39469](https://github.com/ClickHouse/ClickHouse/issues/39469). [#39470](https://github.com/ClickHouse/ClickHouse/pull/39470) ([小路](https://github.com/nicelulu)).
* Fix LOGICAL_ERROR on race between DROP and INSERT with materialized views. [#39477](https://github.com/ClickHouse/ClickHouse/pull/39477) ([Azat Khuzhin](https://github.com/azat)).
* Fix data race and possible heap-buffer-overflow in Avro format. Closes [#39094](https://github.com/ClickHouse/ClickHouse/issues/39094), closes [#33652](https://github.com/ClickHouse/ClickHouse/issues/33652). [#39498](https://github.com/ClickHouse/ClickHouse/pull/39498) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare bug in asynchronous reading (with setting `local_filesystem_read_method='pread_threadpool'`) with enabled `O_DIRECT` (enabled by setting `min_bytes_to_use_direct_io`). [#39506](https://github.com/ClickHouse/ClickHouse/pull/39506) ([Anton Popov](https://github.com/CurtizJ)).
* Fixes "Code: 49. DB::Exception: FunctionFactory: the function name '' is not unique. (LOGICAL_ERROR)" observed on FreeBSD when starting clickhouse. [#39551](https://github.com/ClickHouse/ClickHouse/pull/39551) ([Alexander Gololobov](https://github.com/davenger)).
* Fix a bug with the `maxsplit` argument of `splitByChar`, which was not working correctly. [#39552](https://github.com/ClickHouse/ClickHouse/pull/39552) ([filimonov](https://github.com/filimonov)).
* Fix bug in ASOF JOIN with `enable_optimize_predicate_expression`, close [#37813](https://github.com/ClickHouse/ClickHouse/issues/37813). [#39556](https://github.com/ClickHouse/ClickHouse/pull/39556) ([Vladimir C](https://github.com/vdimir)).
* Fixed `CREATE/DROP INDEX` query with `ON CLUSTER` or `Replicated` database and `ReplicatedMergeTree`. It used to be executed on all replicas (causing error or DDL queue stuck). Fixes [#39511](https://github.com/ClickHouse/ClickHouse/issues/39511). [#39565](https://github.com/ClickHouse/ClickHouse/pull/39565) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix "column not found" error for push down with join, close [#39505](https://github.com/ClickHouse/ClickHouse/issues/39505). [#39575](https://github.com/ClickHouse/ClickHouse/pull/39575) ([Vladimir C](https://github.com/vdimir)).
* Fix the wrong `REGEXP_REPLACE` alias. This fixes https://github.com/ClickHouse/ClickBench/issues/9. [#39592](https://github.com/ClickHouse/ClickHouse/pull/39592) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fixed the point of origin for exponential-decay window functions to be the last value in the window. Previously, decay was calculated by the formula `exp((t - curr_row_t) / decay_length)`, which is incorrect when the right boundary of the window is not `CURRENT ROW`. It was changed to `exp((t - last_row_t) / decay_length)`. There is no change in results for windows with `ROWS BETWEEN (smth) AND CURRENT ROW`. [#39593](https://github.com/ClickHouse/ClickHouse/pull/39593) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* Fix Decimal division overflow, which can be detected based on operands scale. [#39600](https://github.com/ClickHouse/ClickHouse/pull/39600) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix the settings `output_format_arrow_string_as_string` and `output_format_arrow_low_cardinality_as_dictionary` to work in combination. Closes [#39624](https://github.com/ClickHouse/ClickHouse/issues/39624). [#39647](https://github.com/ClickHouse/ClickHouse/pull/39647) ([Kruglov Pavel](https://github.com/Avogar)).
* Fixed a bug in default database resolution in distributed table reads. [#39674](https://github.com/ClickHouse/ClickHouse/pull/39674) ([Anton Kozlov](https://github.com/tonickkozlov)).
* A SELECT might read data of a dropped table if the cache for mmap IO is used, the database engine is Ordinary, and a new table was created with the same name as the dropped one. It's fixed. [#39708](https://github.com/ClickHouse/ClickHouse/pull/39708) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix possible error `Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got ColumnLowCardinality`. Fixes [#38460](https://github.com/ClickHouse/ClickHouse/issues/38460). [#39716](https://github.com/ClickHouse/ClickHouse/pull/39716) ([Arthur Passos](https://github.com/arthurpassos)).
* Field names in the `meta` section of JSON format were erroneously double escaped. This closes [#39693](https://github.com/ClickHouse/ClickHouse/issues/39693). [#39747](https://github.com/ClickHouse/ClickHouse/pull/39747) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix wrong index analysis with tuples and operator `IN`, which could lead to wrong query result. [#39752](https://github.com/ClickHouse/ClickHouse/pull/39752) ([Anton Popov](https://github.com/CurtizJ)).
* Fix EmbeddedRocksDB filtering by key using params. [#39757](https://github.com/ClickHouse/ClickHouse/pull/39757) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix error `Invalid number of columns in chunk pushed to OutputPort` which was caused by the ARRAY JOIN optimization. Fixes [#39164](https://github.com/ClickHouse/ClickHouse/issues/39164). [#39799](https://github.com/ClickHouse/ClickHouse/pull/39799) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `CANNOT_READ_ALL_DATA` exception with `local_filesystem_read_method=pread_threadpool`. This bug affected only Linux kernel version 5.9 and 5.10 according to [man](https://manpages.debian.org/testing/manpages-dev/preadv2.2.en.html#BUGS). [#39800](https://github.com/ClickHouse/ClickHouse/pull/39800) ([Anton Popov](https://github.com/CurtizJ)).
* Fix quota_key application on connect. [#39874](https://github.com/ClickHouse/ClickHouse/pull/39874) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix query exceptions like `DB::Exception: Cannot open file /media/ssd1/fordata/clickhouse/data/data/perf/perf_log_local_v3_1/20220618_17233_17238_1/namespace.dict.bin, errno: 24, strerror: Too many open files`. [#39886](https://github.com/ClickHouse/ClickHouse/pull/39886) ([Fangyuan Deng](https://github.com/pzhdfy)).
* Fix broken NFS `mkdir` for root-squashed volumes. [#39898](https://github.com/ClickHouse/ClickHouse/pull/39898) ([Constantine Peresypkin](https://github.com/pkit)).
* Remove dictionaries from prometheus metrics on DETACH/DROP. [#39926](https://github.com/ClickHouse/ClickHouse/pull/39926) ([Azat Khuzhin](https://github.com/azat)).
* Fix read of StorageFile with virtual columns. Closes [#39907](https://github.com/ClickHouse/ClickHouse/issues/39907). [#39943](https://github.com/ClickHouse/ClickHouse/pull/39943) ([flynn](https://github.com/ucasfl)).
* Fix big memory usage during fetches. Fixes [#39915](https://github.com/ClickHouse/ClickHouse/issues/39915). [#39990](https://github.com/ClickHouse/ClickHouse/pull/39990) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `hashId` crash and the salt parameter not being used. [#40002](https://github.com/ClickHouse/ClickHouse/pull/40002) ([Raúl Marín](https://github.com/Algunenano)).
* Fix `HashMethodOneNumber` getting a wrong key value when the column is const. [#40020](https://github.com/ClickHouse/ClickHouse/pull/40020) ([Duc Canh Le](https://github.com/canhld94)).
* Fixed "Part directory doesn't exist" and "`tmp_<part_name>` ... No such file or directory" errors during too slow INSERT or too long merge/mutation. Also fixed an issue that may cause some replication queue entries to get stuck without any errors or warnings in logs if a previous attempt to fetch a part failed, but the `tmp-fetch_<part_name>` directory was not cleaned up. [#40031](https://github.com/ClickHouse/ClickHouse/pull/40031) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix rare cases of parsing of arrays of tuples in format `Values`. [#40034](https://github.com/ClickHouse/ClickHouse/pull/40034) ([Anton Popov](https://github.com/CurtizJ)).
* Fixes conversion of Arrow `Dictionary(X)` and `Dictionary(Nullable(X))` columns to ClickHouse `LowCardinality(X)` and `LowCardinality(Nullable(X))` respectively. [#40037](https://github.com/ClickHouse/ClickHouse/pull/40037) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix potential deadlock in WriteBufferFromS3 during task scheduling failure. [#40070](https://github.com/ClickHouse/ClickHouse/pull/40070) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix a bug in collectFilesToSkip() by adding the correct file extension (`.idx` or `.idx2`) for indexes to be recalculated, avoiding wrong hard links. Fixed [#39896](https://github.com/ClickHouse/ClickHouse/issues/39896). [#40095](https://github.com/ClickHouse/ClickHouse/pull/40095) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix a segmentation fault that was reported with `CaresPTRResolver::resolve` in the stack trace. [#40134](https://github.com/ClickHouse/ClickHouse/pull/40134) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix a very rare case of incorrect behavior of array subscript operator. This closes [#28720](https://github.com/ClickHouse/ClickHouse/issues/28720). [#40185](https://github.com/ClickHouse/ClickHouse/pull/40185) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix insufficient argument check for encryption functions (found by query fuzzer). This closes [#39987](https://github.com/ClickHouse/ClickHouse/issues/39987). [#40194](https://github.com/ClickHouse/ClickHouse/pull/40194) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix unexpected result of `arrayDifference` for `Array(UInt32)`. [#40211](https://github.com/ClickHouse/ClickHouse/pull/40211) ([Duc Canh Le](https://github.com/canhld94)).
* Fix the case when the order of columns can be incorrect if the `IN` operator is used with a table with `ENGINE = Set` containing multiple columns. This fixes [#13014](https://github.com/ClickHouse/ClickHouse/issues/13014). [#40225](https://github.com/ClickHouse/ClickHouse/pull/40225) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix possible segfault in the CapnProto input format. This bug was found and sent through the ClickHouse bug-bounty [program](https://github.com/ClickHouse/ClickHouse/issues/38986) by *kiojj*. [#40241](https://github.com/ClickHouse/ClickHouse/pull/40241) ([Kruglov Pavel](https://github.com/Avogar)).
* Avoid continuously growing memory consumption of the pattern cache when using the `multi(Fuzzy)Match(Any|AllIndices|AnyIndex)()` functions. [#40264](https://github.com/ClickHouse/ClickHouse/pull/40264) ([Robert Schulze](https://github.com/rschu1ze)).
#### Build
* Fix a build error on macOS in `src/Common/waitForPid.cpp`: `error: identifier '__kevp__' is reserved because it starts with '__' [-Werror,-Wreserved-identifier]`, triggered by the `EV_SET` macro expanded from the system header `sys/event.h`. [#39493](https://github.com/ClickHouse/ClickHouse/pull/39493) ([小路](https://github.com/nicelulu)).
#### Build Improvement
* Fixed Endian issue in BitHelpers for s390x. [#39656](https://github.com/ClickHouse/ClickHouse/pull/39656) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Implement a piece of code related to SipHash for s390x architecture (which is not supported by ClickHouse). [#39732](https://github.com/ClickHouse/ClickHouse/pull/39732) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fixed an Endian issue in Coordination snapshot code for s390x architecture (which is not supported by ClickHouse). [#39931](https://github.com/ClickHouse/ClickHouse/pull/39931) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fixed Endian issues in Codec code for s390x architecture (which is not supported by ClickHouse). [#40008](https://github.com/ClickHouse/ClickHouse/pull/40008) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fixed Endian issues in reading/writing BigEndian binary data in ReadHelpers and WriteHelpers code for s390x architecture (which is not supported by ClickHouse). [#40179](https://github.com/ClickHouse/ClickHouse/pull/40179) ([Harry Lee](https://github.com/HarryLeeIBM)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "tests: enable back 02232_dist_insert_send_logs_level_hung"'. [#39788](https://github.com/ClickHouse/ClickHouse/pull/39788) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Update arrow to fix possible data race"'. [#39804](https://github.com/ClickHouse/ClickHouse/pull/39804) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Revert "Update arrow to fix possible data race""'. [#39811](https://github.com/ClickHouse/ClickHouse/pull/39811) ([Kruglov Pavel](https://github.com/Avogar)).
* NO CL ENTRY: 'Revert "Limit number of analyze for one query"'. [#39816](https://github.com/ClickHouse/ClickHouse/pull/39816) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Revert "tests: enable back 02232_dist_insert_send_logs_level_hung""'. [#39817](https://github.com/ClickHouse/ClickHouse/pull/39817) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Prepare library-bridge for catboost integration'. [#39904](https://github.com/ClickHouse/ClickHouse/pull/39904) ([Robert Schulze](https://github.com/rschu1ze)).
* NO CL ENTRY: 'Revert "ColumnVector: optimize filter with AVX512VBMI2 compress store"'. [#39963](https://github.com/ClickHouse/ClickHouse/pull/39963) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "copy self-extracting to output"'. [#40005](https://github.com/ClickHouse/ClickHouse/pull/40005) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Use separate counter for RSS in global memory tracker."'. [#40199](https://github.com/ClickHouse/ClickHouse/pull/40199) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "tests/performance: cover sparse_hashed dictionary"'. [#40268](https://github.com/ClickHouse/ClickHouse/pull/40268) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Test/insert deduplication token materialized views [#34662](https://github.com/ClickHouse/ClickHouse/pull/34662) ([Denny Crane](https://github.com/den-crane)).
* Merging [#34372](https://github.com/ClickHouse/ClickHouse/issues/34372) [#35968](https://github.com/ClickHouse/ClickHouse/pull/35968) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)).
* Use separate counter for RSS in global memory tracker. [#38682](https://github.com/ClickHouse/ClickHouse/pull/38682) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Build self-extracting-executable utils [#38936](https://github.com/ClickHouse/ClickHouse/pull/38936) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Improvements in integration tests [#38978](https://github.com/ClickHouse/ClickHouse/pull/38978) ([Ilya Yatsishin](https://github.com/qoega)).
* More readable regexp in `test_quota` [#39084](https://github.com/ClickHouse/ClickHouse/pull/39084) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* Fixed regexp in `test_match_process_uid_against_data_owner` [#39085](https://github.com/ClickHouse/ClickHouse/pull/39085) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* tests: enable back 02232_dist_insert_send_logs_level_hung [#39124](https://github.com/ClickHouse/ClickHouse/pull/39124) ([Azat Khuzhin](https://github.com/azat)).
* Add connection info for Distributed sends log message [#39178](https://github.com/ClickHouse/ClickHouse/pull/39178) ([Azat Khuzhin](https://github.com/azat)).
* Forbid defining non-default disk with default path from <path> [#39183](https://github.com/ClickHouse/ClickHouse/pull/39183) ([Azat Khuzhin](https://github.com/azat)).
* Fix LZ4 decompression issue for s390x [#39195](https://github.com/ClickHouse/ClickHouse/pull/39195) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Do not report "Failed communicating with" on and on for parts exchange [#39222](https://github.com/ClickHouse/ClickHouse/pull/39222) ([Azat Khuzhin](https://github.com/azat)).
* Improve logging around replicated merges [#39230](https://github.com/ClickHouse/ClickHouse/pull/39230) ([Raúl Marín](https://github.com/Algunenano)).
* Cleanup logic around join_algorithm setting, add docs [#39271](https://github.com/ClickHouse/ClickHouse/pull/39271) ([Vladimir C](https://github.com/vdimir)).
* Possible fix for flaky `test_keeper_force_recovery` [#39321](https://github.com/ClickHouse/ClickHouse/pull/39321) ([Antonio Andelic](https://github.com/antonio2368)).
* tests/performance: improve parallel_mv test [#39325](https://github.com/ClickHouse/ClickHouse/pull/39325) ([Azat Khuzhin](https://github.com/azat)).
* Update azure library (removed "harmful" function) [#39327](https://github.com/ClickHouse/ClickHouse/pull/39327) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Refactor PreparedSets/SubqueryForSet [#39343](https://github.com/ClickHouse/ClickHouse/pull/39343) ([Vladimir C](https://github.com/vdimir)).
* Small doc updates [#39362](https://github.com/ClickHouse/ClickHouse/pull/39362) ([Robert Schulze](https://github.com/rschu1ze)).
* Even less usage of StringRef [#39364](https://github.com/ClickHouse/ClickHouse/pull/39364) ([Robert Schulze](https://github.com/rschu1ze)).
* Automatic fixes for black formatting for domestic repo PRs [#39390](https://github.com/ClickHouse/ClickHouse/pull/39390) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Clickhouse-local fixes [#39404](https://github.com/ClickHouse/ClickHouse/pull/39404) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Uppercase `ROWS`, `GROUPS`, `RANGE` in queries with windows [#39410](https://github.com/ClickHouse/ClickHouse/pull/39410) ([Vladimir Chebotaryov](https://github.com/quickhouse)).
* GitHub helper [#39421](https://github.com/ClickHouse/ClickHouse/pull/39421) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* ShellCommand wait pid refactoring [#39426](https://github.com/ClickHouse/ClickHouse/pull/39426) ([Maksim Kita](https://github.com/kitaisreal)).
* Require clear style check to continue building [#39428](https://github.com/ClickHouse/ClickHouse/pull/39428) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* DirectDictionary improve performance of dictHas with duplicate keys [#39449](https://github.com/ClickHouse/ClickHouse/pull/39449) ([Maksim Kita](https://github.com/kitaisreal)).
* Commit status names: remove "actions" [#39454](https://github.com/ClickHouse/ClickHouse/pull/39454) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Improve synchronization between hosts in distributed backup and fix locks [#39455](https://github.com/ClickHouse/ClickHouse/pull/39455) ([Vitaly Baranov](https://github.com/vitlibar)).
* Remove some dead and commented code [#39460](https://github.com/ClickHouse/ClickHouse/pull/39460) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add Build Check and Special Build Check to SimpleCheck [#39467](https://github.com/ClickHouse/ClickHouse/pull/39467) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update version after release [#39474](https://github.com/ClickHouse/ClickHouse/pull/39474) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v22.7.1.2484-stable [#39475](https://github.com/ClickHouse/ClickHouse/pull/39475) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Update README.md [#39478](https://github.com/ClickHouse/ClickHouse/pull/39478) ([Dan Roscigno](https://github.com/DanRoscigno)).
* Remove unused constructor [#39491](https://github.com/ClickHouse/ClickHouse/pull/39491) ([alesapin](https://github.com/alesapin)).
* Mark new codec DEFLATE_QPL as experimental + cosmetics [#39495](https://github.com/ClickHouse/ClickHouse/pull/39495) ([Robert Schulze](https://github.com/rschu1ze)).
* Update arrow to fix possible data race [#39510](https://github.com/ClickHouse/ClickHouse/pull/39510) ([Kruglov Pavel](https://github.com/Avogar)).
* fix `-DENABLE_EXAMPLES=1` in master [#39517](https://github.com/ClickHouse/ClickHouse/pull/39517) ([Constantine Peresypkin](https://github.com/pkit)).
* LZ4_decompress_faster.cpp: remove endianness-dependent code [#39523](https://github.com/ClickHouse/ClickHouse/pull/39523) ([Ignat Loskutov](https://github.com/loskutov)).
* Fix 02286_parallel_final [#39524](https://github.com/ClickHouse/ClickHouse/pull/39524) ([Nikita Taranov](https://github.com/nickitat)).
* add Equinix metal N3 Xlarge [#39532](https://github.com/ClickHouse/ClickHouse/pull/39532) ([Tyler Hannan](https://github.com/tylerhannan)).
* Less usage of StringRef [#39535](https://github.com/ClickHouse/ClickHouse/pull/39535) ([Robert Schulze](https://github.com/rschu1ze)).
* Follow up to [#37827](https://github.com/ClickHouse/ClickHouse/issues/37827) [#39557](https://github.com/ClickHouse/ClickHouse/pull/39557) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Temporarily disable all tests with MaterializedPostgreSQL [#39564](https://github.com/ClickHouse/ClickHouse/pull/39564) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update version_date.tsv after v22.3.9.19-lts [#39576](https://github.com/ClickHouse/ClickHouse/pull/39576) ([github-actions[bot]](https://github.com/apps/github-actions)).
* free compression and decompression contexts [#39578](https://github.com/ClickHouse/ClickHouse/pull/39578) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update version_date.tsv and changelogs after v22.6.4.35-stable [#39579](https://github.com/ClickHouse/ClickHouse/pull/39579) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Merge Woboq code browser page into "Getting Started" document [#39596](https://github.com/ClickHouse/ClickHouse/pull/39596) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix Chain::addSink [#39601](https://github.com/ClickHouse/ClickHouse/pull/39601) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update NuRaft to latest master [#39609](https://github.com/ClickHouse/ClickHouse/pull/39609) ([Antonio Andelic](https://github.com/antonio2368)).
* copy self-extracting to output [#39617](https://github.com/ClickHouse/ClickHouse/pull/39617) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Replace MemoryTrackerBlockerInThread to LockMemoryExceptionInThread [#39619](https://github.com/ClickHouse/ClickHouse/pull/39619) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Combining sumIf->countIf and multiIf->if opt. [#39621](https://github.com/ClickHouse/ClickHouse/pull/39621) ([Amos Bird](https://github.com/amosbird)).
* Update README.md [#39622](https://github.com/ClickHouse/ClickHouse/pull/39622) ([Ivan Blinkov](https://github.com/blinkov)).
* Disable 02327_capnproto_protobuf_empty_messages with Ordinary [#39623](https://github.com/ClickHouse/ClickHouse/pull/39623) ([Alexander Tokmakov](https://github.com/tavplubix)).
* add Dell PowerEdge R740XD results [#39625](https://github.com/ClickHouse/ClickHouse/pull/39625) ([Tyler Hannan](https://github.com/tylerhannan)).
* Attempt to fix wrong workflow_run data for rerun [#39630](https://github.com/ClickHouse/ClickHouse/pull/39630) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Run tests with Replicated database in master [#39653](https://github.com/ClickHouse/ClickHouse/pull/39653) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Rollback request in Keeper if storing log fails [#39673](https://github.com/ClickHouse/ClickHouse/pull/39673) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix utils build on CI [#39679](https://github.com/ClickHouse/ClickHouse/pull/39679) ([Azat Khuzhin](https://github.com/azat)).
* Add duration_ms into system.zookeeper_log [#39686](https://github.com/ClickHouse/ClickHouse/pull/39686) ([Azat Khuzhin](https://github.com/azat)).
* Fix DISTINCT: handle all const columns case correctly [#39688](https://github.com/ClickHouse/ClickHouse/pull/39688) ([Igor Nikonov](https://github.com/devcrafter)).
* Update README.md [#39692](https://github.com/ClickHouse/ClickHouse/pull/39692) ([Yuko Takagi](https://github.com/yukotakagi)).
* Update Keeper version for digest [#39698](https://github.com/ClickHouse/ClickHouse/pull/39698) ([Antonio Andelic](https://github.com/antonio2368)).
* Change mysql-odbc url [#39702](https://github.com/ClickHouse/ClickHouse/pull/39702) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Avoid recursive destruction of AST. [#39705](https://github.com/ClickHouse/ClickHouse/pull/39705) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update ccache to the latest available version [#39709](https://github.com/ClickHouse/ClickHouse/pull/39709) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Join enums refactoring [#39718](https://github.com/ClickHouse/ClickHouse/pull/39718) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix flaky test `02360_send_logs_level_colors` [#39720](https://github.com/ClickHouse/ClickHouse/pull/39720) ([Anton Popov](https://github.com/CurtizJ)).
* Fix cherry-pick for cases, when assignee is not set for PR [#39723](https://github.com/ClickHouse/ClickHouse/pull/39723) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Jepsen label [#39730](https://github.com/ClickHouse/ClickHouse/pull/39730) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix redirecting of logs to stdout in clickhouse-client [#39731](https://github.com/ClickHouse/ClickHouse/pull/39731) ([Anton Popov](https://github.com/CurtizJ)).
* CI: refactor Simple Check, use statuses to make it stateful [#39735](https://github.com/ClickHouse/ClickHouse/pull/39735) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Use different root path for total-queue Jepsen test [#39738](https://github.com/ClickHouse/ClickHouse/pull/39738) ([Antonio Andelic](https://github.com/antonio2368)).
* Simple refactoring: ordinary DISTINCT implementation [#39740](https://github.com/ClickHouse/ClickHouse/pull/39740) ([Igor Nikonov](https://github.com/devcrafter)).
* Cleanup usages of `allow_experimental_projection_optimization` setting, part 1 [#39746](https://github.com/ClickHouse/ClickHouse/pull/39746) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable SQL function getOSKernelVersion() on all platforms [#39751](https://github.com/ClickHouse/ClickHouse/pull/39751) ([Robert Schulze](https://github.com/rschu1ze)).
* Try clang-15 for build with tsan [#39758](https://github.com/ClickHouse/ClickHouse/pull/39758) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Rename "splitted build" to "shared libraries build" in CI tools [#39759](https://github.com/ClickHouse/ClickHouse/pull/39759) ([Robert Schulze](https://github.com/rschu1ze)).
* Use std::popcount, ::countl_zero, ::countr_zero functions [#39760](https://github.com/ClickHouse/ClickHouse/pull/39760) ([Robert Schulze](https://github.com/rschu1ze)).
* Self-extracting - run resulting executable with execvp [#39763](https://github.com/ClickHouse/ClickHouse/pull/39763) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix non-deterministic queries in distinct_in_order test [#39772](https://github.com/ClickHouse/ClickHouse/pull/39772) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix some flaky integration tests [#39775](https://github.com/ClickHouse/ClickHouse/pull/39775) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Retry inserts with ClickHouseHelper [#39780](https://github.com/ClickHouse/ClickHouse/pull/39780) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add cloudflare DNS as a fallback [#39795](https://github.com/ClickHouse/ClickHouse/pull/39795) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update README.md [#39796](https://github.com/ClickHouse/ClickHouse/pull/39796) ([Yuko Takagi](https://github.com/yukotakagi)).
* Minor fix for Stress Tests [#39798](https://github.com/ClickHouse/ClickHouse/pull/39798) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Typos [#39813](https://github.com/ClickHouse/ClickHouse/pull/39813) ([Robert Schulze](https://github.com/rschu1ze)).
* Update settings changes history [#39839](https://github.com/ClickHouse/ClickHouse/pull/39839) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix post-build script for building utils/self-extracting-executable/compressor [#39843](https://github.com/ClickHouse/ClickHouse/pull/39843) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add hasJoin method into ASTSelectQuery [#39850](https://github.com/ClickHouse/ClickHouse/pull/39850) ([Maksim Kita](https://github.com/kitaisreal)).
* Update tweak on version part update [#39853](https://github.com/ClickHouse/ClickHouse/pull/39853) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v22.7.2.15-stable [#39854](https://github.com/ClickHouse/ClickHouse/pull/39854) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Fix typo and extra dots in exception messages from OverCommitTracker [#39858](https://github.com/ClickHouse/ClickHouse/pull/39858) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix flaky integration test test_async_backups_to_same_destination. [#39859](https://github.com/ClickHouse/ClickHouse/pull/39859) ([Vitaly Baranov](https://github.com/vitlibar)).
* Better total part size calculation on mutation [#39860](https://github.com/ClickHouse/ClickHouse/pull/39860) ([alesapin](https://github.com/alesapin)).
* typo: PostgerSQL -> PostgreSQL [#39861](https://github.com/ClickHouse/ClickHouse/pull/39861) ([nathanbegbie](https://github.com/nathanbegbie)).
* Remove prefer_localhost_replica from test [#39862](https://github.com/ClickHouse/ClickHouse/pull/39862) ([Igor Nikonov](https://github.com/devcrafter)).
* Block memory tracker in Keeper during commit [#39867](https://github.com/ClickHouse/ClickHouse/pull/39867) ([Antonio Andelic](https://github.com/antonio2368)).
* Update version_date.tsv after v22.3.10.22-lts [#39868](https://github.com/ClickHouse/ClickHouse/pull/39868) ([github-actions[bot]](https://github.com/apps/github-actions)).
* fix incorrect format for functions with settings [#39869](https://github.com/ClickHouse/ClickHouse/pull/39869) ([Constantine Peresypkin](https://github.com/pkit)).
* Get api url from event, not from const/ENV [#39871](https://github.com/ClickHouse/ClickHouse/pull/39871) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Cleanup unused dirs from `store/` on all disks [#39872](https://github.com/ClickHouse/ClickHouse/pull/39872) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update 02354_distributed_with_external_aggregation_memory_usage.sql [#39893](https://github.com/ClickHouse/ClickHouse/pull/39893) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix the race between waitMutation and updating local queue from ZK [#39900](https://github.com/ClickHouse/ClickHouse/pull/39900) ([Alexander Gololobov](https://github.com/davenger)).
* Improve 02354_distributed_with_external_aggregation_memory_usage [#39908](https://github.com/ClickHouse/ClickHouse/pull/39908) ([Nikita Taranov](https://github.com/nickitat)).
* Move username and password from URL parameters to Basic Authentication [#39910](https://github.com/ClickHouse/ClickHouse/pull/39910) ([San](https://github.com/santrancisco)).
* Remove cache flush from the Docs Check [#39911](https://github.com/ClickHouse/ClickHouse/pull/39911) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flaky tests (`Tried to commit obsolete part`) [#39922](https://github.com/ClickHouse/ClickHouse/pull/39922) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add logging to debug flaky tests [#39925](https://github.com/ClickHouse/ClickHouse/pull/39925) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix flaky test `02360_send_logs_level_colors` [#39927](https://github.com/ClickHouse/ClickHouse/pull/39927) ([Anton Popov](https://github.com/CurtizJ)).
* Don't create self-extracting clickhouse for split build [#39936](https://github.com/ClickHouse/ClickHouse/pull/39936) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* tests/stress: add dmesg output (to see OOM details) [#39939](https://github.com/ClickHouse/ClickHouse/pull/39939) ([Azat Khuzhin](https://github.com/azat)).
* Create metadata directory on CREATE for FileLog engine [#39940](https://github.com/ClickHouse/ClickHouse/pull/39940) ([Azat Khuzhin](https://github.com/azat)).
* tests: fix 02352_rwlock flakiness [#39941](https://github.com/ClickHouse/ClickHouse/pull/39941) ([Azat Khuzhin](https://github.com/azat)).
* Remove old code from the website [#39947](https://github.com/ClickHouse/ClickHouse/pull/39947) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove debug trace from DistinctStep [#39955](https://github.com/ClickHouse/ClickHouse/pull/39955) ([Igor Nikonov](https://github.com/devcrafter)).
* IAST destructor intrusive list [#39956](https://github.com/ClickHouse/ClickHouse/pull/39956) ([Maksim Kita](https://github.com/kitaisreal)).
* Remove old code from the website (part 2) [#39959](https://github.com/ClickHouse/ClickHouse/pull/39959) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add Stateful tests (release), Stateless tests (release) to Mergeable Check [#39967](https://github.com/ClickHouse/ClickHouse/pull/39967) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Change font in CI reports [#39969](https://github.com/ClickHouse/ClickHouse/pull/39969) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add setting type to support special 'auto' value [#39974](https://github.com/ClickHouse/ClickHouse/pull/39974) ([Vladimir C](https://github.com/vdimir)).
* Update 02354_distributed_with_external_aggregation_memory_usage.sql [#39979](https://github.com/ClickHouse/ClickHouse/pull/39979) ([Nikita Taranov](https://github.com/nickitat)).
* tests/stress: fix dmesg reading [#39980](https://github.com/ClickHouse/ClickHouse/pull/39980) ([Azat Khuzhin](https://github.com/azat)).
* Disable 02380_insert_mv_race.sh with Ordinary [#39985](https://github.com/ClickHouse/ClickHouse/pull/39985) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Mention how the build can be speed up by disabling self-extraction [#39988](https://github.com/ClickHouse/ClickHouse/pull/39988) ([Robert Schulze](https://github.com/rschu1ze)).
* Use different root path for Jepsen Counter test [#39992](https://github.com/ClickHouse/ClickHouse/pull/39992) ([Antonio Andelic](https://github.com/antonio2368)).
* ActionsDAG rename index to outputs [#39998](https://github.com/ClickHouse/ClickHouse/pull/39998) ([Maksim Kita](https://github.com/kitaisreal)).
* Added H literal for Hour IntervalKind [#39999](https://github.com/ClickHouse/ClickHouse/pull/39999) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Try to avoid timeouts when checking for replication consistency [#40001](https://github.com/ClickHouse/ClickHouse/pull/40001) ([Alexander Tokmakov](https://github.com/tavplubix)).
* More generic check for MergeTree table family [#40004](https://github.com/ClickHouse/ClickHouse/pull/40004) ([Alexander Gololobov](https://github.com/davenger)).
* Further preparation for catboost integration into library-bridge [#40010](https://github.com/ClickHouse/ClickHouse/pull/40010) ([Robert Schulze](https://github.com/rschu1ze)).
* Self-extracting: decompressor, extract real path of executable instead of argv[0] [#40011](https://github.com/ClickHouse/ClickHouse/pull/40011) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* copy self-extracting to output [#40017](https://github.com/ClickHouse/ClickHouse/pull/40017) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update 02354_distributed_with_external_aggregation_memory_usage.sql [#40024](https://github.com/ClickHouse/ClickHouse/pull/40024) ([Nikita Taranov](https://github.com/nickitat)).
* Fix segfault in `DataTypeAggregateFunction` [#40025](https://github.com/ClickHouse/ClickHouse/pull/40025) ([Anton Popov](https://github.com/CurtizJ)).
* tests/performance: cover sparse_hashed dictionary [#40027](https://github.com/ClickHouse/ClickHouse/pull/40027) ([Azat Khuzhin](https://github.com/azat)).
* Cleanup docs of parseDateTime*() function family [#40030](https://github.com/ClickHouse/ClickHouse/pull/40030) ([Robert Schulze](https://github.com/rschu1ze)).
* Job url [#40032](https://github.com/ClickHouse/ClickHouse/pull/40032) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v22.6.5.22-stable [#40036](https://github.com/ClickHouse/ClickHouse/pull/40036) ([github-actions[bot]](https://github.com/apps/github-actions)).
* Non-significant changes [#40038](https://github.com/ClickHouse/ClickHouse/pull/40038) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* tests: attempt to make 02293_part_log_has_merge_reason less flaky [#40047](https://github.com/ClickHouse/ClickHouse/pull/40047) ([Azat Khuzhin](https://github.com/azat)).
* Remove documentation templates [#40048](https://github.com/ClickHouse/ClickHouse/pull/40048) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Move images to clickhouse-presentations repository. [#40049](https://github.com/ClickHouse/ClickHouse/pull/40049) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix broken image in test-visualizer [#40050](https://github.com/ClickHouse/ClickHouse/pull/40050) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test for query parameters in HTTP POST [#40055](https://github.com/ClickHouse/ClickHouse/pull/40055) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix clickhouse-test hang in case of CREATE DATABASE fails [#40057](https://github.com/ClickHouse/ClickHouse/pull/40057) ([Azat Khuzhin](https://github.com/azat)).
* tests: fix 02380_insert_mv_race for Ordinary database [#40058](https://github.com/ClickHouse/ClickHouse/pull/40058) ([Azat Khuzhin](https://github.com/azat)).
* Skip newlines before Tags in clickhouse-test [#40061](https://github.com/ClickHouse/ClickHouse/pull/40061) ([Vladimir C](https://github.com/vdimir)).
* Replace S3 URLs by parameter [#40066](https://github.com/ClickHouse/ClickHouse/pull/40066) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Finally fix `_csv.Error: field larger than field limit` [#40072](https://github.com/ClickHouse/ClickHouse/pull/40072) ([Alexander Tokmakov](https://github.com/tavplubix)).
* tests: fix 00926_adaptive_index_granularity_pk/00489_pk_subexpression flakiness [#40075](https://github.com/ClickHouse/ClickHouse/pull/40075) ([Azat Khuzhin](https://github.com/azat)).
* Changelogs and versions [#40090](https://github.com/ClickHouse/ClickHouse/pull/40090) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* A test for counting resources in subqueries [#40104](https://github.com/ClickHouse/ClickHouse/pull/40104) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Use a job ID as ref text [#40112](https://github.com/ClickHouse/ClickHouse/pull/40112) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Delete files DictionaryJoinAdapter.h/cpp [#40113](https://github.com/ClickHouse/ClickHouse/pull/40113) ([Vladimir C](https://github.com/vdimir)).
* Rework S3Helper a little bit [#40127](https://github.com/ClickHouse/ClickHouse/pull/40127) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* PODArray assign empty array fix [#40129](https://github.com/ClickHouse/ClickHouse/pull/40129) ([Maksim Kita](https://github.com/kitaisreal)).
* Disable 02390_prometheus_ClickHouseStatusInfo_DictionaryStatus with Ordinary database [#40136](https://github.com/ClickHouse/ClickHouse/pull/40136) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add tests with Ordinary database to flaky check [#40137](https://github.com/ClickHouse/ClickHouse/pull/40137) ([Alexander Tokmakov](https://github.com/tavplubix)).
* fs cache: minor change [#40138](https://github.com/ClickHouse/ClickHouse/pull/40138) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix typo [#40139](https://github.com/ClickHouse/ClickHouse/pull/40139) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix keeper-bench in case of error during scheduling a thread [#40147](https://github.com/ClickHouse/ClickHouse/pull/40147) ([Azat Khuzhin](https://github.com/azat)).
* Fix "Cannot quickly remove directory" [#40151](https://github.com/ClickHouse/ClickHouse/pull/40151) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Set sync_request_timeout to 10 to avoid reconnections in tests [#40158](https://github.com/ClickHouse/ClickHouse/pull/40158) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Disable zero-copy replication by default [#40175](https://github.com/ClickHouse/ClickHouse/pull/40175) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve assignment and logging for cherry-pick and backport steps [#40177](https://github.com/ClickHouse/ClickHouse/pull/40177) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* test for Decimal aggregateFunction normalization [#39420](https://github.com/ClickHouse/ClickHouse/issues/39420) [#40178](https://github.com/ClickHouse/ClickHouse/pull/40178) ([Denny Crane](https://github.com/den-crane)).
* Minor build changes [#40182](https://github.com/ClickHouse/ClickHouse/pull/40182) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* clickhouse-test: enable ZooKeeper tests by default [#40191](https://github.com/ClickHouse/ClickHouse/pull/40191) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove old code [#40196](https://github.com/ClickHouse/ClickHouse/pull/40196) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update README.md [#40198](https://github.com/ClickHouse/ClickHouse/pull/40198) ([clickhouse-robot-curie](https://github.com/clickhouse-robot-curie)).
* Fix a bug with symlinks detection [#40232](https://github.com/ClickHouse/ClickHouse/pull/40232) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Better error message when restoring covered parts [#40234](https://github.com/ClickHouse/ClickHouse/pull/40234) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Try to print stacktraces if query timeouts in integration tests [#40248](https://github.com/ClickHouse/ClickHouse/pull/40248) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add Unit tests to Mergeable [#40250](https://github.com/ClickHouse/ClickHouse/pull/40250) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Extract common KV storage logic [#40261](https://github.com/ClickHouse/ClickHouse/pull/40261) ([Antonio Andelic](https://github.com/antonio2368)).
* Add update_mergeable_check trigger for Unit tests [#40269](https://github.com/ClickHouse/ClickHouse/pull/40269) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* CVE-2021-3520: (negligible) rdkafka library: update lz4.c from upstream [#40272](https://github.com/ClickHouse/ClickHouse/pull/40272) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Fix build [#40297](https://github.com/ClickHouse/ClickHouse/pull/40297) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### Support CTE statement for ANTLR4 syntax file
* ... [#39814](https://github.com/ClickHouse/ClickHouse/pull/39814) ([qianmoQ](https://github.com/qianmoQ)).

View File

@ -192,7 +192,7 @@ ClickHouse fuzzing is implemented both using [libFuzzer](https://llvm.org/docs/L
All the fuzz testing should be performed with sanitizers (Address and Undefined).
LibFuzzer is used for isolated fuzz testing of library code. Fuzzers are implemented as part of test code and have “_fuzzer” name postfixes.
Fuzzer example can be found at `src/Parsers/tests/lexer_fuzzer.cpp`. LibFuzzer-specific configs, dictionaries and corpus are stored at `tests/fuzz`.
Fuzzer example can be found at `src/Parsers/fuzzers/lexer_fuzzer.cpp`. LibFuzzer-specific configs, dictionaries and corpus are stored at `tests/fuzz`.
We encourage you to write fuzz tests for every functionality that handles user input.
Fuzzers are not built by default. To build fuzzers both `-DENABLE_FUZZING=1` and `-DENABLE_TESTS=1` options should be set.

View File

@ -51,10 +51,14 @@ SELECT * FROM hdfs_engine_table LIMIT 2
## Implementation Details {#implementation-details}
- Reads and writes can be parallel.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is supported.
- Not supported:
- `ALTER` and `SELECT...SAMPLE` operations.
- Indexes.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is possible, but not recommended.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
**Globs in path**

View File

@ -50,10 +50,14 @@ For more information about virtual columns see [here](../../../engines/table-eng
## Implementation Details {#implementation-details}
- Reads and writes can be parallel
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is supported.
- Not supported:
- `ALTER` and `SELECT...SAMPLE` operations.
- Indexes.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is possible, but not recommended.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
## Wildcards In Path {#wildcards-in-path}

View File

@ -1023,6 +1023,10 @@ Other parameters:
Examples of working configurations can be found in integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
## Virtual Columns {#virtual-columns}
- `_part` — Name of a part.

View File

@ -39,10 +39,53 @@ Uniqueness of rows is determined by the `ORDER BY` table section, not `PRIMARY K
`ver` — column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.
When merging, `ReplacingMergeTree` leaves only one row out of all the rows with the same sorting key:
- The last one in the selection, if `ver` is not set. A selection is a set of rows in a set of parts participating in the merge. The most recently created part (the last insert) will be the last one in the selection. Thus, after deduplication, the very last row from the most recent insert will remain for each unique sorting key.
- The one with the maximum version, if `ver` is specified. If `ver` is the same for several rows, then the "if `ver` is not specified" rule is applied to them, i.e. the most recently inserted row will remain.
Example:
```sql
-- without ver - the last inserted 'wins'
CREATE TABLE myFirstReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree
ORDER BY key;
INSERT INTO myFirstReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO myFirstReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM myFirstReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ second │ 2020-01-01 00:00:00 │
└─────┴─────────┴─────────────────────┘
-- with ver - the row with the biggest ver 'wins'
CREATE TABLE mySecondReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree(eventTime)
ORDER BY key;
INSERT INTO mySecondReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO mySecondReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM mySecondReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ first │ 2020-01-01 01:01:01 │
└─────┴─────────┴─────────────────────┘
```
## Query clauses

View File

@ -19,6 +19,7 @@ Additional cache types:
- Compiled expressions cache.
- [Avro format](../interfaces/formats.md#data-format-avro) schemas cache.
- [Dictionaries](../sql-reference/dictionaries/index.md) data cache.
- Schema inference cache.
Indirectly used:

View File

@ -745,13 +745,24 @@ On hosts with low RAM and swap, you possibly need setting `max_server_memory_usa
- [max_server_memory_usage](#max_server_memory_usage)
## concurrent_threads_soft_limit {#concurrent_threads_soft_limit}
The maximum number of query processing threads, excluding threads for retrieving data from remote servers, allowed to run all queries. This is not a hard limit. In case if the limit is reached the query will still get one thread to run.
## concurrent_threads_soft_limit_num {#concurrent_threads_soft_limit_num}
The maximum number of query processing threads, excluding threads for retrieving data from remote servers, allowed to run all queries. This is not a hard limit. If the limit is reached, the query will still get at least one thread to run. The query can scale up to the desired number of threads during execution if more threads become available.
Possible values:
- Positive integer.
- 0 — No limit.
Default value: `0`.
## concurrent_threads_soft_limit_ratio_to_cores {#concurrent_threads_soft_limit_ratio_to_cores}
The maximum number of query processing threads, expressed as a multiple of the number of logical cores.
More details: [concurrent_threads_soft_limit_num](#concurrent_threads_soft_limit_num).
Possible values:
- Positive integer.
- 0 — No limit.
- -1 — The parameter is initialized to the number of logical cores multiplied by 3, which is a good heuristic for CPU-bound tasks.
Default value: `0`.

View File

@ -218,6 +218,10 @@ Default value: 0 (seconds)
When this setting has a value greater than zero, only a single replica starts the merge immediately if the merged part is on shared storage and `allow_remote_fs_zero_copy_replication` is enabled.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
Possible values:
- Any positive integer.

View File

@ -747,7 +747,14 @@ Default value: 268435456.
Disables lagging replicas for distributed queries. See [Replication](../../engines/table-engines/mergetree-family/replication.md).
Sets the time in seconds. If a replica lags more than the set value, this replica is not used.
Sets the time in seconds. If a replica's lag is greater than or equal to the set value, this replica is not used.
Possible values:
- Positive integer.
- 0 — Replica lags are not checked.
To prevent the use of any replica with a non-zero lag, set this parameter to 1.
Default value: 300.
@ -3300,7 +3307,7 @@ Possible values:
Default value: `0`.
## shutdown_wait_unfinished_queries
## shutdown_wait_unfinished_queries {#shutdown_wait_unfinished_queries}
Enables or disables waiting for unfinished queries when the server shuts down.
@ -3311,13 +3318,13 @@ Possible values:
Default value: 0.
## shutdown_wait_unfinished
## shutdown_wait_unfinished {#shutdown_wait_unfinished}
The waiting time in seconds for currently handled connections when the server shuts down.
Default Value: 5.
## memory_overcommit_ratio_denominator
## memory_overcommit_ratio_denominator {#memory_overcommit_ratio_denominator}
It represents the soft memory limit when the hard limit is reached on the user level.
This value is used to compute the overcommit ratio for the query.
@ -3326,7 +3333,7 @@ Read more about [memory overcommit](memory-overcommit.md).
Default value: `1GiB`.
## memory_usage_overcommit_max_wait_microseconds
## memory_usage_overcommit_max_wait_microseconds {#memory_usage_overcommit_max_wait_microseconds}
The maximum time a thread will wait for memory to be freed in the case of memory overcommit on the user level.
If the timeout is reached and memory is not freed, an exception is thrown.
@ -3334,7 +3341,7 @@ Read more about [memory overcommit](memory-overcommit.md).
Default value: `5000000`.
## memory_overcommit_ratio_denominator_for_user
## memory_overcommit_ratio_denominator_for_user {#memory_overcommit_ratio_denominator_for_user}
It represents the soft memory limit when the hard limit is reached on the global level.
This value is used to compute the overcommit ratio for the query.
@ -3343,6 +3350,36 @@ Read more about [memory overcommit](memory-overcommit.md).
Default value: `1GiB`.
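These overcommit limits are ordinary session-level settings, so besides profiles they can also be tuned per connection. A minimal sketch (the numeric values below are arbitrary and only illustrate the units):
```sql
-- Soft limit used to compute the overcommit ratio on the user level (in bytes).
SET memory_overcommit_ratio_denominator = 1073741824;        -- 1 GiB
-- How long a thread waits for memory to be freed before an exception is thrown.
SET memory_usage_overcommit_max_wait_microseconds = 5000000; -- 5 seconds
```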
## schema_inference_use_cache_for_file {#schema_inference_use_cache_for_file}
Enable the schema cache for schema inference in the `file` table function.
Default value: `true`.
## schema_inference_use_cache_for_s3 {#schema_inference_use_cache_for_s3}
Enable the schema cache for schema inference in the `s3` table function.
Default value: `true`.
## schema_inference_use_cache_for_url {#schema_inference_use_cache_for_url}
Enable the schema cache for schema inference in the `url` table function.
Default value: `true`.
## schema_inference_use_cache_for_hdfs {#schema_inference_use_cache_for_hdfs}
Enable the schema cache for schema inference in the `hdfs` table function.
Default value: `true`.
## schema_inference_cache_require_modification_time_for_url {#schema_inference_cache_require_modification_time_for_url}
Use the schema from the cache for URLs with last modification time validation (for URLs that have a Last-Modified header). If this setting is enabled and the URL doesn't have a Last-Modified header, the schema from the cache won't be used.
Default value: `true`.
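As a rough sketch of how this family of settings behaves (the file name is hypothetical): repeated schema inference over the same source reuses the cached schema unless the corresponding `schema_inference_use_cache_for_*` setting is disabled for the query.
```sql
-- The first call infers the schema from the data and stores it in the cache.
DESCRIBE file('events.parquet');
-- The second call reuses the cached schema.
DESCRIBE file('events.parquet');
-- Bypass the cache for this query only.
DESCRIBE file('events.parquet') SETTINGS schema_inference_use_cache_for_file = 0;
```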
## compatibility {#compatibility}
This setting changes other settings according to the provided ClickHouse version.
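A brief sketch of the intended usage (the version string is only an example):
```sql
-- Make the settings of the current session behave like they did in ClickHouse 22.3.
SET compatibility = '22.3';
```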
@ -3468,6 +3505,24 @@ Default value: `25'000`.
The list of column names to use in schema inference for formats without column names. The format: 'column1,column2,column3,...'
## schema_inference_hints {#schema_inference_hints}
The list of column names and types to use as hints in schema inference for formats without schema.
Example:
Query:
```sql
desc format(JSONEachRow, '{"x" : 1, "y" : "String", "z" : "0.0.0.0" }') settings schema_inference_hints='x UInt8, z IPv4';
```
Result:
```sql
x UInt8
y Nullable(String)
z IPv4
```
## date_time_input_format {#date_time_input_format}
Allows choosing a parser of the text representation of date and time.

View File

@ -316,4 +316,8 @@ Use [http_max_single_read_retries](../operations/settings/settings.md#http-max-s
## Zero-copy Replication (not ready for production) {#zero-copy}
ClickHouse supports zero-copy replication for `S3` and `HDFS` disks, which means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.
Zero-copy replication is possible, but not recommended, with `S3` and `HDFS` disks. Zero-copy replication means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.
:::warning Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
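Since it is now disabled by default, zero-copy replication has to be enabled explicitly, for example as a MergeTree setting. A sketch, assuming an S3-backed storage policy named `s3_policy` is already configured (the table, ZooKeeper path and policy names are hypothetical):
```sql
CREATE TABLE zero_copy_demo
(
    id UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/zero_copy_demo', '{replica}')
ORDER BY id
SETTINGS storage_policy = 's3_policy',
         allow_remote_fs_zero_copy_replication = 1;
```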

View File

@ -18,7 +18,9 @@ DateTime64(precision, [timezone])
Internally, stores data as the number of ticks since epoch start (1970-01-01 00:00:00 UTC) as Int64. The tick resolution is determined by the precision parameter. Additionally, the `DateTime64` type can store a time zone that is the same for the entire column, which affects how values of the `DateTime64` type are displayed in text format and how values specified as strings are parsed ('2020-01-01 05:00:01.000'). The time zone is not stored in the rows of the table (or in the result set), but is stored in the column metadata. See details in [DateTime](../../sql-reference/data-types/datetime.md).
Supported range of values: \[1900-01-01 00:00:00, 2299-12-31 23:59:59.99999999\] (Note: The precision of the maximum value is 8).
Supported range of values: \[1900-01-01 00:00:00, 2299-12-31 23:59:59.99999999\]
Note: The precision of the maximum value is 8. If the maximum precision of 9 digits (nanoseconds) is used, the maximum supported value is `2262-04-11 23:47:16` in UTC.
## Examples

View File

@ -1068,7 +1068,10 @@ Query:
```sql
WITH toDateTime('2021-04-14 11:22:33') AS date_value
SELECT dateName('year', date_value), dateName('month', date_value), dateName('day', date_value);
SELECT
dateName('year', date_value),
dateName('month', date_value),
dateName('day', date_value);
```
Result:
@ -1076,7 +1079,44 @@ Result:
```text
┌─dateName('year', date_value)─┬─dateName('month', date_value)─┬─dateName('day', date_value)─┐
│ 2021 │ April │ 14 │
└──────────────────────────────┴───────────────────────────────┴─────────────────────────────
└──────────────────────────────┴───────────────────────────────┴─────────────────────────────┘
```
## monthName
Returns the name of the month.
**Syntax**
``` sql
monthName(date)
```
**Arguments**
- `date` — Date or date with time. [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
**Returned value**
- The name of the month.
Type: [String](../../sql-reference/data-types/string.md#string)
**Example**
Query:
```sql
WITH toDateTime('2021-04-14 11:22:33') AS date_value
SELECT monthName(date_value);
```
Result:
```text
┌─monthName(date_value)─┐
│ April │
└───────────────────────┘
```
## FROM\_UNIXTIME

View File

@ -1822,10 +1822,13 @@ Result:
Evaluate external model.
Accepts a model name and model arguments. Returns Float64.
## throwIf(x\[, custom_message\])
## throwIf(x\[, message\[, error_code\]\])
Throws an exception if the argument is non-zero.
custom_message - is an optional parameter: a constant string, provides an error message
`message` - is an optional parameter: a constant string providing a custom error message
`error_code` - is an optional parameter: a constant integer providing a custom error code
To use the `error_code` argument, configuration parameter `allow_custom_error_code_in_throwif` must be enabled.
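A minimal sketch of the three-argument form, assuming `allow_custom_error_code_in_throwif` is enabled in the server configuration (the message text and error code below are arbitrary):
``` sql
-- Fails with the custom message and custom error code 395 as soon as number reaches 3.
SELECT throwIf(number = 3, 'Too many rows', 395) FROM numbers(10);
```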
``` sql
SELECT throwIf(number = 3, 'Too many') FROM numbers(10);

View File

@ -28,19 +28,65 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
For a description of query parameters, see the [statement description](../../../engines/table-engines/mergetree-family/replacingmergetree.md).
:::note "Attention"
Row uniqueness is determined by the `ORDER BY` table section, not `PRIMARY KEY`.
:::
**ReplacingMergeTree Parameters**
:::warning "Attention"
Row uniqueness is determined by the `ORDER BY` table section, not `PRIMARY KEY`.
:::
- `ver` — column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.
## ReplacingMergeTree Parameters
When merging, `ReplacingMergeTree` leaves only one row for each unique sorting key:
### ver
`ver` — column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.
When merging, `ReplacingMergeTree` leaves only one row for each unique sorting key:
- The last one in the selection, if `ver` is not set. A selection here is a set of rows in a set of data parts participating in the merge. The most recently created part (the last insert) will be the last one in the selection. Thus, after deduplication, the very last row from the most recent insert will remain for each unique sorting key.
- The one with the maximum version, if `ver` is set. If `ver` is the same for several rows, the "if `ver` is not set" rule is applied to them, i.e. the most recently inserted row will remain.
**Query clauses**
Example:
```sql
-- without ver - the last inserted 'wins'
CREATE TABLE myFirstReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree
ORDER BY key;
INSERT INTO myFirstReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO myFirstReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM myFirstReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ second │ 2020-01-01 00:00:00 │
└─────┴─────────┴─────────────────────┘
-- with ver - the row with the biggest ver 'wins'
CREATE TABLE mySecondReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree(eventTime)
ORDER BY key;
INSERT INTO mySecondReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO mySecondReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM mySecondReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ first │ 2020-01-01 01:01:01 │
└─────┴─────────┴─────────────────────┘
```
## Query clauses
When creating a `ReplacingMergeTree` table, the same [clauses](mergetree.md) are used as when creating a `MergeTree` table.
@ -48,9 +94,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
<summary>Deprecated method for creating a table</summary>
:::note "Attention"
Do not use this method in new projects and, if possible, switch old projects to the method described above.
:::
:::warning "Attention"
Do not use this method in new projects and, if possible, switch old projects to the method described above.
:::
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(

View File

@ -29,13 +29,13 @@ histogram(number_of_bins)(values)
- [Array](../../sql-reference/data-types/array.md) of [tuples](../../sql-reference/data-types/tuple.md) of the following format:
```
[(lower_1, upper_1, height_1), ... (lower_N, upper_N, height_N)]
```
- `lower` — lower bound of the bin.
- `upper` — upper bound of the bin.
- `height` — number of values in the bin.
**Example**
@ -91,6 +91,7 @@ sequenceMatch(pattern)(timestamp, cond1, cond2, ...)
:::danger "Предупреждение"
События, произошедшие в одну и ту же секунду, располагаются в последовательности в неопределенном порядке, что может повлиять на результат работы функции.
:::
**Arguments**
@ -176,6 +177,7 @@ SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2, number = 4) FROM
:::danger "Предупреждение"
События, произошедшие в одну и ту же секунду, располагаются в последовательности в неопределенном порядке, что может повлиять на результат работы функции.
:::
``` sql
sequenceCount(pattern)(timestamp, cond1, cond2, ...)

View File

@ -931,6 +931,13 @@ SELECT now('Europe/Moscow');
└──────────────────────┘
```
## nowInBlock {#nowinblock}
Returns the current date and time at the moment of processing a block of data. Unlike the `now` function, the returned value is not a constant, and different values will be returned in different data blocks during long-running queries.
It makes sense to use this function to get the current time in long-running INSERT SELECT queries.
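A small sketch contrasting the two functions (the `sleep` call just spreads the blocks out in time so the difference is visible):
```sql
SELECT
    now(),        -- constant for the whole query
    nowInBlock(), -- re-evaluated for every processed block
    sleep(1)
FROM system.numbers
LIMIT 3
SETTINGS max_block_size = 1;
```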
## today {#today}
Returns the current date at the moment of query execution. The function does not require arguments.

View File

@ -1727,10 +1727,13 @@ SELECT joinGet(db_test.id_val,'val',toUInt32(number)) from numbers(4) SETTINGS j
Accepts a model name and model arguments. Returns Float64.
## throwIf(x\[, custom_message\]) {#throwifx-custom-message}
## throwIf(x\[, message\[, error_code\]\]) {#throwifx-custom-message}
Throws an exception if the argument is non-zero.
custom_message - optional parameter: a constant string that sets the error message text.
`custom_message` - optional parameter: a constant string that sets the error message text.
`error_code` - optional parameter: a constant integer that sets the error code.
To use the `error_code` argument, the configuration parameter `allow_custom_error_code_in_throwif` must be enabled.
``` sql
SELECT throwIf(number = 3, 'Too many') FROM numbers(10);

View File

@ -1156,22 +1156,20 @@ int Server::main(const std::vector<std::string> & /*args*/)
if (config->has("max_partition_size_to_drop"))
global_context->setMaxPartitionSizeToDrop(config->getUInt64("max_partition_size_to_drop"));
if (config->has("concurrent_threads_soft_limit"))
ConcurrencyControl::SlotCount concurrent_threads_soft_limit = ConcurrencyControl::Unlimited;
if (config->has("concurrent_threads_soft_limit_num"))
{
auto concurrent_threads_soft_limit = config->getInt("concurrent_threads_soft_limit", 0);
if (concurrent_threads_soft_limit == -1)
{
// Based on tests concurrent_threads_soft_limit has an optimal value when it's about 3 times of logical CPU cores
constexpr size_t thread_factor = 3;
concurrent_threads_soft_limit = std::thread::hardware_concurrency() * thread_factor;
}
if (concurrent_threads_soft_limit)
ConcurrencyControl::instance().setMaxConcurrency(concurrent_threads_soft_limit);
else
ConcurrencyControl::instance().setMaxConcurrency(ConcurrencyControl::Unlimited);
auto value = config->getUInt64("concurrent_threads_soft_limit_num", 0);
if (value > 0 && value < concurrent_threads_soft_limit)
concurrent_threads_soft_limit = value;
}
else
ConcurrencyControl::instance().setMaxConcurrency(ConcurrencyControl::Unlimited);
if (config->has("concurrent_threads_soft_limit_ratio_to_cores"))
{
auto value = config->getUInt64("concurrent_threads_soft_limit_ratio_to_cores", 0) * std::thread::hardware_concurrency();
if (value > 0 && value < concurrent_threads_soft_limit)
concurrent_threads_soft_limit = value;
}
ConcurrencyControl::instance().setMaxConcurrency(concurrent_threads_soft_limit);
if (config->has("max_concurrent_queries"))
global_context->getProcessList().setMaxSize(config->getInt("max_concurrent_queries", 0));

View File

@ -281,12 +281,12 @@
<http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
-->
<!-- Maximum number of query processing threads to run all queries.
Note that This is not a hard limit. In case if the limit is reached the query will still get one thread to run.
For value equals to -1 this parameter is initialized by number of logical cores multiplies by 3.
Which is a good heuristic for CPU-bound tasks.
<!-- The maximum number of query processing threads, excluding threads for retrieving data from remote servers, allowed to run all queries.
This is not a hard limit. If the limit is reached, the query will still get at least one thread to run.
The query can scale up to the desired number of threads during execution if more threads become available.
-->
<concurrent_threads_soft_limit>0</concurrent_threads_soft_limit>
<concurrent_threads_soft_limit_num>0</concurrent_threads_soft_limit_num>
<concurrent_threads_soft_limit_ratio_to_cores>0</concurrent_threads_soft_limit_ratio_to_cores>
<!-- Maximum number of concurrent queries. -->
<max_concurrent_queries>100</max_concurrent_queries>

View File

@ -140,6 +140,7 @@ enum class AccessType
M(SYSTEM_DROP_MMAP_CACHE, "SYSTEM DROP MMAP, DROP MMAP CACHE, DROP MMAP", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_COMPILED_EXPRESSION_CACHE, "SYSTEM DROP COMPILED EXPRESSION, DROP COMPILED EXPRESSION CACHE, DROP COMPILED EXPRESSIONS", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_FILESYSTEM_CACHE, "SYSTEM DROP FILESYSTEM CACHE, DROP FILESYSTEM CACHE", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_SCHEMA_CACHE, "SYSTEM DROP SCHEMA CACHE, DROP SCHEMA CACHE", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_CACHE, "DROP CACHE", GROUP, SYSTEM) \
M(SYSTEM_RELOAD_CONFIG, "RELOAD CONFIG", GLOBAL, SYSTEM_RELOAD) \
M(SYSTEM_RELOAD_SYMBOLS, "RELOAD SYMBOLS", GLOBAL, SYSTEM_RELOAD) \

View File

@ -130,6 +130,9 @@ if (TARGET ch_contrib::hdfs)
add_headers_and_sources(dbms Disks/ObjectStorages/HDFS)
endif()
add_headers_and_sources(dbms Disks/ObjectStorages/Cached)
add_headers_and_sources(dbms Disks/ObjectStorages/Web)
add_headers_and_sources(dbms Storages/Cache)
if (TARGET ch_contrib::hivemetastore)
add_headers_and_sources(dbms Storages/Hive)

View File

@ -15,6 +15,6 @@ namespace DB::ConfigHelper
/// The behavior is like `config.getBool(key, default_)`,
/// except when the tag is empty (aka. self-closing), `empty_as` will be used instead of throwing Poco::Exception.
bool getBool(const Poco::Util::AbstractConfiguration & config, const std::string & key, bool default_, bool empty_as);
bool getBool(const Poco::Util::AbstractConfiguration & config, const std::string & key, bool default_ = false, bool empty_as = true);
}

View File

@ -634,6 +634,7 @@
M(663, INCONSISTENT_METADATA_FOR_BACKUP) \
M(664, ACCESS_STORAGE_DOESNT_ALLOW_BACKUP) \
M(665, CANNOT_CONNECT_NATS) \
M(666, CANNOT_USE_CACHE) \
\
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \

View File

@ -29,13 +29,13 @@ FileCache::FileCache(
, max_size(cache_settings_.max_size)
, max_element_size(cache_settings_.max_elements)
, max_file_segment_size(cache_settings_.max_file_segment_size)
, allow_persistent_files(cache_settings_.do_not_evict_index_and_mark_files)
, enable_cache_hits_threshold(cache_settings_.enable_cache_hits_threshold)
, enable_filesystem_query_cache_limit(cache_settings_.enable_filesystem_query_cache_limit)
, log(&Poco::Logger::get("FileCache"))
, main_priority(std::make_unique<LRUFileCachePriority>())
, stash_priority(std::make_unique<LRUFileCachePriority>())
, max_stash_element_size(cache_settings_.max_elements)
, enable_cache_hits_threshold(cache_settings_.enable_cache_hits_threshold)
, log(&Poco::Logger::get("FileCache"))
, allow_to_remove_persistent_segments_from_cache_by_default(cache_settings_.allow_to_remove_persistent_segments_from_cache_by_default)
{
}
@ -77,132 +77,6 @@ void FileCache::assertInitialized() const
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cache not initialized");
}
FileCache::QueryContextPtr FileCache::getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock)
{
if (!isQueryInitialized())
return nullptr;
return getQueryContext(std::string(CurrentThread::getQueryId()), cache_lock);
}
FileCache::QueryContextPtr FileCache::getQueryContext(const String & query_id, std::lock_guard<std::mutex> & /* cache_lock */)
{
auto query_iter = query_map.find(query_id);
return (query_iter == query_map.end()) ? nullptr : query_iter->second;
}
void FileCache::removeQueryContext(const String & query_id)
{
std::lock_guard cache_lock(mutex);
auto query_iter = query_map.find(query_id);
if (query_iter == query_map.end())
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Attempt to release query context that does not exist (query_id: {})",
query_id);
}
query_map.erase(query_iter);
}
FileCache::QueryContextPtr FileCache::getOrSetQueryContext(
const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> & cache_lock)
{
if (query_id.empty())
return nullptr;
auto context = getQueryContext(query_id, cache_lock);
if (context)
return context;
auto query_context = std::make_shared<QueryContext>(settings.max_query_cache_size, settings.skip_download_if_exceeds_query_cache);
auto query_iter = query_map.emplace(query_id, query_context).first;
return query_iter->second;
}
FileCache::QueryContextHolder FileCache::getQueryContextHolder(const String & query_id, const ReadSettings & settings)
{
std::lock_guard cache_lock(mutex);
if (!enable_filesystem_query_cache_limit || settings.max_query_cache_size == 0)
return {};
/// If enable_filesystem_query_cache_limit is true and max_query_cache_size is greater than zero,
/// we create a query context for the current query.
auto context = getOrSetQueryContext(query_id, settings, cache_lock);
return QueryContextHolder(query_id, this, context);
}
void FileCache::QueryContext::remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size < size)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Deleted cache size exceeds existing cache size");
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record != records.end())
{
record->second->removeAndGetNext(cache_lock);
records.erase({key, offset});
}
}
cache_size -= size;
}
void FileCache::QueryContext::reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size + size > max_cache_size)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Reserved cache size exceeds the remaining cache size (key: {}, offset: {})",
key.toString(), offset);
}
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record == records.end())
{
auto queue_iter = priority->add(key, offset, 0, cache_lock);
record = records.insert({{key, offset}, queue_iter}).first;
}
record->second->incrementSize(size, cache_lock);
}
cache_size += size;
}
void FileCache::QueryContext::use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock)
{
if (skip_download_if_exceeds_query_cache)
return;
auto record = records.find({key, offset});
if (record != records.end())
record->second->use(cache_lock);
}
FileCache::QueryContextHolder::QueryContextHolder(
const String & query_id_,
FileCache * cache_,
FileCache::QueryContextPtr context_)
: query_id(query_id_)
, cache(cache_)
, context(context_)
{
}
FileCache::QueryContextHolder::~QueryContextHolder()
{
/// If only the query_map and the current holder hold the context_query,
/// the query has been completed and the query_context is released.
if (context && context.use_count() == 2)
cache->removeQueryContext(query_id);
}
void FileCache::initialize()
{
std::lock_guard cache_lock(mutex);
@ -597,18 +471,20 @@ FileCache::FileSegmentCell * FileCache::addCell(
return &(it->second);
}
FileSegmentsHolder FileCache::setDownloading(
FileSegmentPtr FileCache::createFileSegmentForDownload(
const Key & key,
size_t offset,
size_t size,
bool is_persistent)
bool is_persistent,
std::lock_guard<std::mutex> & cache_lock)
{
std::lock_guard cache_lock(mutex);
#ifndef NDEBUG
assertCacheCorrectness(key, cache_lock);
#endif
if (size > max_file_segment_size)
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Requested size exceeds max file segment size");
auto * cell = getCell(key, offset, cache_lock);
if (cell)
throw Exception(
@ -616,8 +492,12 @@ FileSegmentsHolder FileCache::setDownloading(
"Cache cell already exists for key `{}` and offset {}",
key.toString(), offset);
auto file_segments = splitRangeIntoCells(key, offset, size, FileSegment::State::DOWNLOADING, is_persistent, cache_lock);
return FileSegmentsHolder(std::move(file_segments));
cell = addCell(key, offset, size, FileSegment::State::EMPTY, is_persistent, cache_lock);
if (!cell)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Failed to add a new cell for download");
return cell->file_segment;
}
bool FileCache::tryReserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
@ -691,6 +571,13 @@ bool FileCache::tryReserve(const Key & key, size_t offset, size_t size, std::loc
if (cell->releasable())
{
auto & file_segment = cell->file_segment;
if (file_segment->isPersistent() && allow_persistent_files)
{
LOG_DEBUG(log, "File segment will not be removed, because it is persistent: {}", file_segment->getInfoForLog());
continue;
}
std::lock_guard segment_lock(file_segment->mutex);
switch (file_segment->download_state)
@ -806,6 +693,12 @@ bool FileCache::tryReserveForMainList(
{
auto & file_segment = cell->file_segment;
if (file_segment->isPersistent() && allow_persistent_files)
{
LOG_DEBUG(log, "File segment will not be removed, because it is persistent: {}", file_segment->getInfoForLog());
continue;
}
std::lock_guard segment_lock(file_segment->mutex);
switch (file_segment->download_state)
@ -927,7 +820,7 @@ void FileCache::removeIfExists(const Key & key)
}
}
void FileCache::removeIfReleasable(bool remove_persistent_files)
void FileCache::removeIfReleasable()
{
/// Try remove all cached files by cache_base_path.
/// Only releasable file segments are evicted.
@ -951,10 +844,8 @@ void FileCache::removeIfReleasable(bool remove_persistent_files)
if (cell->releasable())
{
auto file_segment = cell->file_segment;
if (file_segment
&& (!file_segment->isPersistent()
|| remove_persistent_files
|| allow_to_remove_persistent_segments_from_cache_by_default))
if (file_segment)
{
to_remove.emplace_back(file_segment);
}
@ -1088,9 +979,11 @@ void FileCache::loadCacheInfoIntoMemory(std::lock_guard<std::mutex> & cache_lock
}
else
{
LOG_WARNING(log,
"Cache capacity changed (max size: {}, available: {}), cached file `{}` does not fit in cache anymore (size: {})",
max_size, getAvailableCacheSizeUnlocked(cache_lock), key_it->path().string(), size);
LOG_WARNING(
log,
"Cache capacity changed (max size: {}, available: {}), cached file `{}` does not fit in cache anymore (size: {})",
max_size, getAvailableCacheSizeUnlocked(cache_lock), key_it->path().string(), size);
fs::remove(offset_it->path());
}
}
@ -1203,12 +1096,6 @@ size_t FileCache::getUsedCacheSizeUnlocked(std::lock_guard<std::mutex> & cache_l
return main_priority->getCacheSize(cache_lock);
}
size_t FileCache::getAvailableCacheSize() const
{
std::lock_guard cache_lock(mutex);
return getAvailableCacheSizeUnlocked(cache_lock);
}
size_t FileCache::getAvailableCacheSizeUnlocked(std::lock_guard<std::mutex> & cache_lock) const
{
return max_size - getUsedCacheSizeUnlocked(cache_lock);
@ -1327,4 +1214,130 @@ void FileCache::assertPriorityCorrectness(std::lock_guard<std::mutex> & cache_lo
assert(main_priority->getElementsNum(cache_lock) <= max_element_size);
}
FileCache::QueryContextPtr FileCache::getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock)
{
if (!isQueryInitialized())
return nullptr;
return getQueryContext(std::string(CurrentThread::getQueryId()), cache_lock);
}
FileCache::QueryContextPtr FileCache::getQueryContext(const String & query_id, std::lock_guard<std::mutex> & /* cache_lock */)
{
auto query_iter = query_map.find(query_id);
return (query_iter == query_map.end()) ? nullptr : query_iter->second;
}
void FileCache::removeQueryContext(const String & query_id)
{
std::lock_guard cache_lock(mutex);
auto query_iter = query_map.find(query_id);
if (query_iter == query_map.end())
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Attempt to release query context that does not exist (query_id: {})",
query_id);
}
query_map.erase(query_iter);
}
FileCache::QueryContextPtr FileCache::getOrSetQueryContext(
const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> & cache_lock)
{
if (query_id.empty())
return nullptr;
auto context = getQueryContext(query_id, cache_lock);
if (context)
return context;
auto query_context = std::make_shared<QueryContext>(settings.max_query_cache_size, settings.skip_download_if_exceeds_query_cache);
auto query_iter = query_map.emplace(query_id, query_context).first;
return query_iter->second;
}
FileCache::QueryContextHolder FileCache::getQueryContextHolder(const String & query_id, const ReadSettings & settings)
{
std::lock_guard cache_lock(mutex);
if (!enable_filesystem_query_cache_limit || settings.max_query_cache_size == 0)
return {};
/// If enable_filesystem_query_cache_limit is true and max_query_cache_size is greater than zero,
/// we create a query context for the current query.
auto context = getOrSetQueryContext(query_id, settings, cache_lock);
return QueryContextHolder(query_id, this, context);
}
void FileCache::QueryContext::remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size < size)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Deleted cache size exceeds existing cache size");
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record != records.end())
{
record->second->removeAndGetNext(cache_lock);
records.erase({key, offset});
}
}
cache_size -= size;
}
void FileCache::QueryContext::reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock)
{
if (cache_size + size > max_cache_size)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Reserved cache size exceeds the remaining cache size (key: {}, offset: {})",
key.toString(), offset);
}
if (!skip_download_if_exceeds_query_cache)
{
auto record = records.find({key, offset});
if (record == records.end())
{
auto queue_iter = priority->add(key, offset, 0, cache_lock);
record = records.insert({{key, offset}, queue_iter}).first;
}
record->second->incrementSize(size, cache_lock);
}
cache_size += size;
}
void FileCache::QueryContext::use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock)
{
if (skip_download_if_exceeds_query_cache)
return;
auto record = records.find({key, offset});
if (record != records.end())
record->second->use(cache_lock);
}
FileCache::QueryContextHolder::QueryContextHolder(
const String & query_id_,
FileCache * cache_,
FileCache::QueryContextPtr context_)
: query_id(query_id_)
, cache(cache_)
, context(context_)
{
}
FileCache::QueryContextHolder::~QueryContextHolder()
{
/// If only the query_map and the current holder hold the context_query,
/// the query has been completed and the query_context is released.
if (context && context.use_count() == 2)
cache->removeQueryContext(query_id);
}
}

View File

@ -23,13 +23,17 @@ namespace DB
{
/// Local cache for remote filesystem files, represented as a set of non-overlapping non-empty file segments.
/// Different caching algorithms are implemented based on IFileCachePriority.
/// Different caching algorithms are implemented using IFileCachePriority.
class FileCache : private boost::noncopyable
{
friend class FileSegment;
friend class IFileCachePriority;
friend struct FileSegmentsHolder;
friend class FileSegmentRangeWriter;
struct QueryContext;
using QueryContextPtr = std::shared_ptr<QueryContext>;
public:
using Key = DB::FileCacheKey;
@ -41,25 +45,8 @@ public:
/// Restore cache from local filesystem.
void initialize();
void removeIfExists(const Key & key);
void removeIfReleasable(bool remove_persistent_files);
static bool isReadOnly();
/// Cache capacity in bytes.
size_t capacity() const { return max_size; }
static Key hash(const String & path);
String getPathInLocalCache(const Key & key, size_t offset, bool is_persistent) const;
String getPathInLocalCache(const Key & key) const;
const String & getBasePath() const { return cache_base_path; }
std::vector<String> tryGetCachePaths(const Key & key);
/**
* Given an `offset` and `size` representing [offset, offset + size) bytes interval,
* return list of cached non-overlapping non-empty
@ -84,98 +71,46 @@ public:
*/
FileSegmentsHolder get(const Key & key, size_t offset, size_t size);
FileSegmentsHolder setDownloading(const Key & key, size_t offset, size_t size, bool is_persistent);
/// Remove files by `key`. Removes files which might be used at the moment.
void removeIfExists(const Key & key);
/// Remove files by `key`. Will not remove files which are used at the moment.
void removeIfReleasable();
static Key hash(const String & path);
String getPathInLocalCache(const Key & key, size_t offset, bool is_persistent) const;
String getPathInLocalCache(const Key & key) const;
std::vector<String> tryGetCachePaths(const Key & key);
size_t capacity() const { return max_size; }
size_t getUsedCacheSize() const;
size_t getFileSegmentsNum() const;
static bool isReadOnly();
/**
* Create a file segment of exactly requested size with EMPTY state.
* Throw exception if requested size exceeds max allowed file segment size.
* This method is for protected usage: file segment range writer uses it
* to dynamically allocate file segments.
*/
FileSegmentPtr createFileSegmentForDownload(
const Key & key,
size_t offset,
size_t size,
bool is_persistent,
std::lock_guard<std::mutex> & cache_lock);
FileSegments getSnapshot() const;
/// For debug.
String dumpStructure(const Key & key);
size_t getUsedCacheSize() const;
size_t getFileSegmentsNum() const;
private:
String cache_base_path;
size_t max_size;
size_t max_element_size;
size_t max_file_segment_size;
bool is_initialized = false;
mutable std::mutex mutex;
bool tryReserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void remove(Key key, size_t offset, std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock);
bool isLastFileSegmentHolder(
const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock);
void reduceSizeToDownloaded(
const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & /* segment_lock */);
void assertInitialized() const;
using AccessKeyAndOffset = std::pair<Key, size_t>;
struct KeyAndOffsetHash
{
std::size_t operator()(const AccessKeyAndOffset & key) const
{
return std::hash<UInt128>()(key.first.key) ^ std::hash<UInt64>()(key.second);
}
};
using FileCacheRecords = std::unordered_map<AccessKeyAndOffset, IFileCachePriority::WriteIterator, KeyAndOffsetHash>;
/// Used to track and control the cache access of each query.
/// Through it, we can realize the processing of different queries by the cache layer.
struct QueryContext
{
FileCacheRecords records;
FileCachePriorityPtr priority;
size_t cache_size = 0;
size_t max_cache_size;
bool skip_download_if_exceeds_query_cache;
QueryContext(size_t max_cache_size_, bool skip_download_if_exceeds_query_cache_)
: max_cache_size(max_cache_size_), skip_download_if_exceeds_query_cache(skip_download_if_exceeds_query_cache_)
{
}
void remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock);
size_t getMaxCacheSize() const { return max_cache_size; }
size_t getCacheSize() const { return cache_size; }
FileCachePriorityPtr getPriority() { return priority; }
bool isSkipDownloadIfExceed() const { return skip_download_if_exceeds_query_cache; }
};
using QueryContextPtr = std::shared_ptr<QueryContext>;
using QueryContextMap = std::unordered_map<String, QueryContextPtr>;
QueryContextMap query_map;
bool enable_filesystem_query_cache_limit;
QueryContextPtr getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock);
QueryContextPtr getQueryContext(const String & query_id, std::lock_guard<std::mutex> & cache_lock);
void removeQueryContext(const String & query_id);
QueryContextPtr getOrSetQueryContext(const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> &);
public:
/// Save a query context information, and adopt different cache policies
/// for different queries through the context cache layer.
struct QueryContextHolder : private boost::noncopyable
@ -194,6 +129,43 @@ public:
QueryContextHolder getQueryContextHolder(const String & query_id, const ReadSettings & settings);
private:
String cache_base_path;
size_t max_size;
size_t max_element_size;
size_t max_file_segment_size;
bool allow_persistent_files;
size_t enable_cache_hits_threshold;
bool enable_filesystem_query_cache_limit;
Poco::Logger * log;
bool is_initialized = false;
mutable std::mutex mutex;
bool tryReserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void remove(
Key key,
size_t offset,
std::lock_guard<std::mutex> & cache_lock,
std::lock_guard<std::mutex> & segment_lock);
bool isLastFileSegmentHolder(
const Key & key,
size_t offset,
std::lock_guard<std::mutex> & cache_lock,
std::lock_guard<std::mutex> & segment_lock);
void reduceSizeToDownloaded(
const Key & key,
size_t offset,
std::lock_guard<std::mutex> & cache_lock,
std::lock_guard<std::mutex> & segment_lock);
void assertInitialized() const;
struct FileSegmentCell : private boost::noncopyable
{
FileSegmentPtr file_segment;
@ -211,25 +183,30 @@ private:
FileSegmentCell(FileSegmentPtr file_segment_, FileCache * cache, std::lock_guard<std::mutex> & cache_lock);
FileSegmentCell(FileSegmentCell && other) noexcept
: file_segment(std::move(other.file_segment)), queue_iterator(other.queue_iterator)
: file_segment(std::move(other.file_segment)), queue_iterator(std::move(other.queue_iterator)) {}
};
using AccessKeyAndOffset = std::pair<Key, size_t>;
struct KeyAndOffsetHash
{
std::size_t operator()(const AccessKeyAndOffset & key) const
{
return std::hash<UInt128>()(key.first.key) ^ std::hash<UInt64>()(key.second);
}
};
using FileSegmentsByOffset = std::map<size_t, FileSegmentCell>;
using CachedFiles = std::unordered_map<Key, FileSegmentsByOffset>;
using FileCacheRecords = std::unordered_map<AccessKeyAndOffset, IFileCachePriority::WriteIterator, KeyAndOffsetHash>;
CachedFiles files;
std::unique_ptr<IFileCachePriority> main_priority;
FileCacheRecords stash_records;
std::unique_ptr<IFileCachePriority> stash_priority;
size_t max_stash_element_size;
size_t enable_cache_hits_threshold;
Poco::Logger * log;
bool allow_to_remove_persistent_segments_from_cache_by_default;
void loadCacheInfoIntoMemory(std::lock_guard<std::mutex> & cache_lock);
FileSegments getImpl(const Key & key, const FileSegment::Range & range, std::lock_guard<std::mutex> & cache_lock);
@ -246,11 +223,11 @@ private:
void useCell(const FileSegmentCell & cell, FileSegments & result, std::lock_guard<std::mutex> & cache_lock) const;
bool tryReserveForMainList(
const Key & key, size_t offset, size_t size, QueryContextPtr query_context, std::lock_guard<std::mutex> & cache_lock);
size_t getAvailableCacheSize() const;
void loadCacheInfoIntoMemory(std::lock_guard<std::mutex> & cache_lock);
const Key & key,
size_t offset,
size_t size,
QueryContextPtr query_context,
std::lock_guard<std::mutex> & cache_lock);
FileSegments splitRangeIntoCells(
const Key & key,
@ -278,6 +255,48 @@ private:
void assertCacheCellsCorrectness(const FileSegmentsByOffset & cells_by_offset, std::lock_guard<std::mutex> & cache_lock);
/// Used to track and control the cache access of each query.
/// Through it, we can realize the processing of different queries by the cache layer.
struct QueryContext
{
FileCacheRecords records;
FileCachePriorityPtr priority;
size_t cache_size = 0;
size_t max_cache_size;
bool skip_download_if_exceeds_query_cache;
QueryContext(size_t max_cache_size_, bool skip_download_if_exceeds_query_cache_)
: max_cache_size(max_cache_size_)
, skip_download_if_exceeds_query_cache(skip_download_if_exceeds_query_cache_) {}
size_t getMaxCacheSize() const { return max_cache_size; }
size_t getCacheSize() const { return cache_size; }
FileCachePriorityPtr getPriority() const { return priority; }
bool isSkipDownloadIfExceed() const { return skip_download_if_exceeds_query_cache; }
void remove(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void reserve(const Key & key, size_t offset, size_t size, std::lock_guard<std::mutex> & cache_lock);
void use(const Key & key, size_t offset, std::lock_guard<std::mutex> & cache_lock);
};
using QueryContextMap = std::unordered_map<String, QueryContextPtr>;
QueryContextMap query_map;
QueryContextPtr getCurrentQueryContext(std::lock_guard<std::mutex> & cache_lock);
QueryContextPtr getQueryContext(const String & query_id, std::lock_guard<std::mutex> & cache_lock);
void removeQueryContext(const String & query_id);
QueryContextPtr getOrSetQueryContext(const String & query_id, const ReadSettings & settings, std::lock_guard<std::mutex> &);
public:
void assertCacheCorrectness(const Key & key, std::lock_guard<std::mutex> & cache_lock);
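For orientation, a minimal sketch of how the per-query accounting above might be scoped by a caller; the `cache`, `query_id` and `read_settings` names are illustrative, and `ReadSettings` is assumed to carry the per-query cache limit options:
{
    DB::ReadSettings read_settings;   /// assumed to carry max_query_cache_size, skip_download_if_exceeds_query_cache, ...
    auto holder = cache.getQueryContextHolder(query_id, read_settings);
    /// While `holder` is alive, reservations made for this query are charged
    /// against the QueryContext stored in `query_map` for `query_id`.
}
/// Destroying the holder releases the query context (presumably via removeQueryContext()).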

View File

@ -1,20 +1,37 @@
#include "FileCacheSettings.h"
#include <Poco/Util/AbstractConfiguration.h>
#include <Common/Exception.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
void FileCacheSettings::loadFromConfig(const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix)
{
max_size = config.getUInt64(config_prefix + ".data_cache_max_size", REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_CACHE_SIZE);
max_elements = config.getUInt64(config_prefix + ".data_cache_max_elements", REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_ELEMENTS);
if (!config.has(config_prefix + ".max_size"))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Expected cache size (`max_size`) in configuration");
max_size = config.getUInt64(config_prefix + ".max_size", 0);
if (max_size == 0)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Expected non-zero size for cache configuration");
auto path = config.getString(config_prefix + ".path", "");
if (path.empty())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Disk Cache requires non-empty `path` field (cache base path) in config");
max_elements = config.getUInt64(config_prefix + ".max_elements", REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_ELEMENTS);
max_file_segment_size = config.getUInt64(config_prefix + ".max_file_segment_size", REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_FILE_SEGMENT_SIZE);
cache_on_write_operations = config.getUInt64(config_prefix + ".cache_on_write_operations", false);
enable_filesystem_query_cache_limit = config.getUInt64(config_prefix + ".enable_filesystem_query_cache_limit", false);
enable_cache_hits_threshold = config.getUInt64(config_prefix + ".enable_cache_hits_threshold", REMOTE_FS_OBJECTS_CACHE_ENABLE_HITS_THRESHOLD);
do_not_evict_index_and_mark_files = config.getUInt64(config_prefix + ".do_not_evict_index_and_mark_files", true);
allow_to_remove_persistent_segments_from_cache_by_default = config.getUInt64(config_prefix + ".allow_to_remove_persistent_segments_from_cache_by_default", true);
}
}
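For reference, a hedged sketch of exercising this parsing in isolation; the XML snippet, the `cache` prefix, and the header path are illustrative assumptions, while the mandatory `path` and `max_size` keys follow from the checks above:
#include <sstream>
#include <Poco/AutoPtr.h>
#include <Poco/Util/XMLConfiguration.h>
#include <Common/FileCacheSettings.h>   /// header path assumed
void exampleLoadCacheSettings()
{
    std::istringstream xml(
        "<clickhouse>"
        "  <cache>"
        "    <path>/var/lib/clickhouse/filesystem_cache/</path>"
        "    <max_size>10737418240</max_size>"                      /// 10 GiB, in bytes
        "    <max_file_segment_size>104857600</max_file_segment_size>"
        "  </cache>"
        "</clickhouse>");
    /// Poco's XMLConfiguration excludes the root element from property keys,
    /// so the config prefix here is just "cache".
    Poco::AutoPtr<Poco::Util::XMLConfiguration> config(new Poco::Util::XMLConfiguration(xml));
    DB::FileCacheSettings settings;
    settings.loadFromConfig(*config, "cache");   /// throws BAD_ARGUMENTS if `path` or `max_size` is missing
}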

View File

@ -19,7 +19,6 @@ struct FileCacheSettings
bool enable_filesystem_query_cache_limit = false;
bool do_not_evict_index_and_mark_files = true;
bool allow_to_remove_persistent_segments_from_cache_by_default = true;
void loadFromConfig(const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix);
};

View File

@ -4,7 +4,6 @@
namespace DB
{
static constexpr int REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_CACHE_SIZE = 1024 * 1024 * 1024;
static constexpr int REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_FILE_SEGMENT_SIZE = 100 * 1024 * 1024;
static constexpr int REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_ELEMENTS = 1024 * 1024;
static constexpr int REMOTE_FS_OBJECTS_CACHE_ENABLE_HITS_THRESHOLD = 0;

View File

@ -1,11 +1,13 @@
#include "FileSegment.h"
#include <base/getThreadId.h>
#include <Common/hex.h>
#include <base/scope_guard.h>
#include <Common/logger_useful.h>
#include <Common/FileCache.h>
#include <Common/hex.h>
#include <IO/WriteBufferFromString.h>
#include <IO/Operators.h>
#include <filesystem>
#include <Common/FileCache.h>
namespace CurrentMetrics
{
@ -37,7 +39,7 @@ FileSegment::FileSegment(
#else
, log(&Poco::Logger::get("FileSegment"))
#endif
, is_persistent(is_persistent_) /// Not really used for now, see PR 36171
, is_persistent(is_persistent_)
{
/// On creation, file segment state can be EMPTY, DOWNLOADED, DOWNLOADING.
switch (download_state)
@ -55,13 +57,6 @@ FileSegment::FileSegment(
reserved_size = downloaded_size = size_;
break;
}
/// DOWNLOADING is used only for write-through caching (e.g. getOrSetDownloader() is not
/// needed, downloader is set on file segment creation).
case (State::DOWNLOADING):
{
downloader_id = getCallerId();
break;
}
case (State::SKIP_CACHE):
{
break;
@ -91,6 +86,18 @@ size_t FileSegment::getDownloadedSize() const
return getDownloadedSize(segment_lock);
}
size_t FileSegment::getRemainingSizeToDownload() const
{
std::lock_guard segment_lock(mutex);
return range().size() - downloaded_size;
}
bool FileSegment::isDetached() const
{
std::lock_guard segment_lock(mutex);
return is_detached;
}
size_t FileSegment::getDownloadedSize(std::lock_guard<std::mutex> & /* segment_lock */) const
{
if (download_state == State::DOWNLOADED)
@ -184,6 +191,22 @@ FileSegment::RemoteFileReaderPtr FileSegment::getRemoteFileReader()
return remote_file_reader;
}
FileSegment::RemoteFileReaderPtr FileSegment::extractRemoteFileReader()
{
std::lock_guard cache_lock(cache->mutex);
std::lock_guard segment_lock(mutex);
if (!is_detached)
{
bool is_last_holder = cache->isLastFileSegmentHolder(key(), offset(), cache_lock, segment_lock);
if (!downloader_id.empty() || !is_last_holder)
return nullptr;
}
LOG_TRACE(log, "Extracted reader from file segment");
return std::move(remote_file_reader);
}
void FileSegment::setRemoteFileReader(RemoteFileReaderPtr remote_file_reader_)
{
if (!isDownloader())
@ -279,80 +302,6 @@ String FileSegment::getPathInLocalCache() const
return cache->getPathInLocalCache(key(), offset(), isPersistent());
}
void FileSegment::writeInMemory(const char * from, size_t size)
{
if (!size)
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Attempt to write zero size cache file");
if (availableSize() < size)
throw Exception(
ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR,
"Not enough space is reserved. Available: {}, expected: {}", availableSize(), size);
std::lock_guard segment_lock(mutex);
assertNotDetached(segment_lock);
if (cache_writer)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cache writer already initialized");
auto download_path = getPathInLocalCache();
cache_writer = std::make_unique<WriteBufferFromFile>(download_path, size + 1);
try
{
cache_writer->write(from, size);
}
catch (Exception & e)
{
wrapWithCacheInfo(e, "while writing into cache", segment_lock);
setDownloadFailed(segment_lock);
cv.notify_all();
throw;
}
}
size_t FileSegment::finalizeWrite()
{
std::lock_guard segment_lock(mutex);
if (!cache_writer)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cache writer not initialized");
size_t size = cache_writer->offset();
if (size == 0)
throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Writing zero size is not allowed");
assertNotDetached(segment_lock);
try
{
cache_writer->next();
}
catch (Exception & e)
{
wrapWithCacheInfo(e, "while writing into cache", segment_lock);
setDownloadFailed(segment_lock);
cv.notify_all();
throw;
}
downloaded_size += size;
if (downloaded_size != range().size())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Expected downloaded size to equal file segment size ({} == {})", downloaded_size, range().size());
setDownloaded(segment_lock);
return size;
}
FileSegment::State FileSegment::wait()
{
std::unique_lock segment_lock(mutex);
@ -481,7 +430,7 @@ void FileSegment::completeBatchAndResetDownloader()
cv.notify_all();
}
void FileSegment::complete(State state)
void FileSegment::completeWithState(State state, bool auto_resize)
{
std::lock_guard cache_lock(cache->mutex);
std::lock_guard segment_lock(mutex);
@ -506,8 +455,24 @@ void FileSegment::complete(State state)
}
if (state == State::DOWNLOADED)
{
if (auto_resize && downloaded_size != range().size())
{
LOG_TEST(log, "Resize cell {} to downloaded: {}", range().toString(), downloaded_size);
assert(downloaded_size <= range().size());
segment_range = Range(segment_range.left, segment_range.left + downloaded_size - 1);
}
/// Update states and finalize cache write buffer.
setDownloaded(segment_lock);
if (downloaded_size != range().size())
throw Exception(
ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR,
"Cannot complete file segment as DOWNLOADED, because downloaded size ({}) does not match expected size ({})",
downloaded_size, range().size());
}
download_state = state;
try
@ -526,16 +491,20 @@ void FileSegment::complete(State state)
cv.notify_all();
}
void FileSegment::complete(std::lock_guard<std::mutex> & cache_lock)
void FileSegment::completeBasedOnCurrentState(std::lock_guard<std::mutex> & cache_lock)
{
std::lock_guard segment_lock(mutex);
if (is_detached)
return;
assertNotDetached(segment_lock);
completeUnlocked(cache_lock, segment_lock);
completeBasedOnCurrentStateUnlocked(cache_lock, segment_lock);
}
void FileSegment::completeUnlocked(std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock)
void FileSegment::completeBasedOnCurrentStateUnlocked(
std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock)
{
bool is_last_holder = cache->isLastFileSegmentHolder(key(), offset(), cache_lock, segment_lock);
@ -607,7 +576,10 @@ void FileSegment::completeImpl(std::lock_guard<std::mutex> & cache_lock, std::lo
* it only when nobody needs it.
*/
download_state = State::PARTIALLY_DOWNLOADED_NO_CONTINUATION;
LOG_TEST(log, "Resize cell {} to downloaded: {}", range().toString(), current_downloaded_size);
/// Resize this file segment by creating a copy file segment with DOWNLOADED state,
/// but the current file segment should remain PARTIALLY_DOWNLOADED_NO_CONTINUATION and detached,
/// because otherwise the invariant that getOrSet() returns a contiguous range of file segments would be broken
/// (this matters for other file segment holders, not for the current one).
cache->reduceSizeToDownloaded(key(), offset(), cache_lock, segment_lock);
}
@ -644,8 +616,9 @@ String FileSegment::getInfoForLogImpl(std::lock_guard<std::mutex> & segment_lock
info << "state: " << download_state << ", ";
info << "downloaded size: " << getDownloadedSize(segment_lock) << ", ";
info << "reserved size: " << reserved_size << ", ";
info << "downloader id: " << downloader_id << ", ";
info << "caller id: " << getCallerId();
info << "downloader id: " << (downloader_id.empty() ? "None" : downloader_id) << ", ";
info << "caller id: " << getCallerId() << ", ";
info << "persistent: " << is_persistent;
return info.str();
}
@ -820,7 +793,7 @@ FileSegmentsHolder::~FileSegmentsHolder()
/// under the same mutex, because complete() checks for segment pointers.
std::lock_guard cache_lock(cache->mutex);
file_segment->complete(cache_lock);
file_segment->completeBasedOnCurrentState(cache_lock);
file_segment_it = file_segments.erase(current_file_segment_it);
}
@ -843,4 +816,149 @@ String FileSegmentsHolder::toString()
return ranges;
}
FileSegmentRangeWriter::FileSegmentRangeWriter(
FileCache * cache_,
const FileSegment::Key & key_,
OnCompleteFileSegmentCallback && on_complete_file_segment_func_)
: cache(cache_)
, key(key_)
, current_file_segment_it(file_segments_holder.file_segments.end())
, on_complete_file_segment_func(on_complete_file_segment_func_)
{
}
FileSegments::iterator FileSegmentRangeWriter::allocateFileSegment(size_t offset, bool is_persistent)
{
/**
* Allocate a new file segment starting at `offset`.
* The file segment's capacity will equal `max_file_segment_size`, but its actual size is 0.
*/
std::lock_guard cache_lock(cache->mutex);
/// We allocate `max_file_segment_size` to be downloaded;
/// if less data is actually written, the file segment will be resized in the complete() method.
auto file_segment = cache->createFileSegmentForDownload(
key, offset, cache->max_file_segment_size, is_persistent, cache_lock);
return file_segments_holder.add(std::move(file_segment));
}
void FileSegmentRangeWriter::completeFileSegment(FileSegment & file_segment)
{
/**
* Complete the file segment based on the downloaded size.
*/
/// File segment can be detached if space reservation failed.
if (file_segment.isDetached())
return;
if (file_segment.getDownloadedSize() > 0)
{
/// file_segment->complete(DOWNLOADED) is not enough, because the file segment capacity
/// was initially set with a margin of `max_file_segment_size`, so we always need to
/// resize it to the actual size after the download has finished.
file_segment.getOrSetDownloader();
file_segment.completeWithState(FileSegment::State::DOWNLOADED, /* auto_resize */true);
on_complete_file_segment_func(file_segment);
}
else
{
std::lock_guard cache_lock(cache->mutex);
file_segment.completeBasedOnCurrentState(cache_lock);
}
}
bool FileSegmentRangeWriter::write(const char * data, size_t size, size_t offset, bool is_persistent)
{
/**
* Write a range of file segments: allocate a file segment of `max_file_segment_size` and write to
* it until it is full, then allocate the next file segment.
*/
if (finalized)
return false;
auto & file_segments = file_segments_holder.file_segments;
if (current_file_segment_it == file_segments.end())
{
current_file_segment_it = allocateFileSegment(current_file_segment_write_offset, is_persistent);
}
else
{
if (current_file_segment_write_offset != offset)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Cannot write file segment at offset {}, because current write offset is: {}",
offset, current_file_segment_write_offset);
}
if ((*current_file_segment_it)->getRemainingSizeToDownload() == 0)
{
completeFileSegment(**current_file_segment_it);
current_file_segment_it = allocateFileSegment(current_file_segment_write_offset, is_persistent);
}
else if ((*current_file_segment_it)->getDownloadOffset() != offset)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Cannot file segment download offset {} does not match current write offset {}",
(*current_file_segment_it)->getDownloadOffset(), offset);
}
}
auto & file_segment = *current_file_segment_it;
file_segment->getOrSetDownloader();
SCOPE_EXIT({
file_segment->resetDownloader();
});
bool reserved = file_segment->reserve(size);
if (!reserved)
{
file_segment->completeWithState(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
on_complete_file_segment_func(*file_segment);
LOG_DEBUG(
&Poco::Logger::get("FileSegmentRangeWriter"),
"Unsuccessful space reservation attempt (size: {}, file segment info: {}",
size, file_segment->getInfoForLog());
return false;
}
(*current_file_segment_it)->write(data, size, offset);
current_file_segment_write_offset += size;
return true;
}
void FileSegmentRangeWriter::finalize()
{
if (finalized)
return;
auto & file_segments = file_segments_holder.file_segments;
if (file_segments.empty() || current_file_segment_it == file_segments.end())
return;
completeFileSegment(**current_file_segment_it);
finalized = true;
}
FileSegmentRangeWriter::~FileSegmentRangeWriter()
{
try
{
if (!finalized)
finalize();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
}

View File

@ -1,7 +1,7 @@
#pragma once
#include <boost/noncopyable.hpp>
#include <Core/Types.h>
#include <IO/WriteBufferFromFile.h>
#include <IO/ReadBufferFromFileBase.h>
#include <list>
@ -18,6 +18,7 @@ namespace DB
{
class FileCache;
class ReadBufferFromFileBase;
class FileSegment;
using FileSegmentPtr = std::shared_ptr<FileSegment>;
@ -113,17 +114,10 @@ public:
void write(const char * from, size_t size, size_t offset_);
/**
* writeInMemory and finalizeWrite are used together to write a single file with delay.
* Both can be called only once, one after another. Used for writing to the cache via the threadpool
* on write operations. TODO: this solution is temporary, until a separate cache layer is added.
*/
void writeInMemory(const char * from, size_t size);
size_t finalizeWrite();
RemoteFileReaderPtr getRemoteFileReader();
RemoteFileReaderPtr extractRemoteFileReader();
void setRemoteFileReader(RemoteFileReaderPtr remote_file_reader_);
void resetRemoteFileReader();
@ -144,9 +138,11 @@ public:
size_t getDownloadedSize() const;
size_t getRemainingSizeToDownload() const;
void completeBatchAndResetDownloader();
void complete(State state);
void completeWithState(State state, bool auto_resize = false);
String getInfoForLog() const;
@ -168,6 +164,8 @@ public:
[[noreturn]] void throwIfDetached() const;
bool isDetached() const;
String getPathInLocalCache() const;
private:
@ -197,8 +195,8 @@ private:
/// FileSegmentsHolder. complete() might check if the caller of the method
/// is the last alive holder of the segment. Therefore, complete() and destruction
/// of the file segment pointer must be done under the same cache mutex.
void complete(std::lock_guard<std::mutex> & cache_lock);
void completeUnlocked(std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock);
void completeBasedOnCurrentState(std::lock_guard<std::mutex> & cache_lock);
void completeBasedOnCurrentStateUnlocked(std::lock_guard<std::mutex> & cache_lock, std::lock_guard<std::mutex> & segment_lock);
void completeImpl(
std::lock_guard<std::mutex> & cache_lock,
@ -206,7 +204,7 @@ private:
void resetDownloaderImpl(std::lock_guard<std::mutex> & segment_lock);
const Range segment_range;
Range segment_range;
State download_state;
@ -246,23 +244,71 @@ private:
std::atomic<size_t> hits_count = 0; /// cache hits.
std::atomic<size_t> ref_count = 0; /// Used for getting snapshot state
/// Currently no-op. (will be added in PR 36171)
/// Defines whether a file complies with the eviction policy.
bool is_persistent;
CurrentMetrics::Increment metric_increment{CurrentMetrics::CacheFileSegments};
};
struct FileSegmentsHolder : private boost::noncopyable
{
FileSegmentsHolder() = default;
explicit FileSegmentsHolder(FileSegments && file_segments_) : file_segments(std::move(file_segments_)) {}
FileSegmentsHolder(FileSegmentsHolder && other) noexcept : file_segments(std::move(other.file_segments)) {}
~FileSegmentsHolder();
FileSegments file_segments{};
String toString();
FileSegments::iterator add(FileSegmentPtr && file_segment)
{
return file_segments.insert(file_segments.end(), file_segment);
}
FileSegments file_segments{};
};
/**
* We want to write some amount of data whose total size is not known until the very end.
* Therefore we allocate file segments lazily. Each file segment is assigned a capacity
* of max_file_segment_size, but reserved_size remains 0 until tryReserve() is called.
* Once the current file segment is full (has reached max_file_segment_size), we allocate a
* new file segment. All allocated file segments reside in the file segments holder.
* If the last file segment is not full at the end of all writes, it is resized.
*/
class FileSegmentRangeWriter
{
public:
using OnCompleteFileSegmentCallback = std::function<void(const FileSegment & file_segment)>;
FileSegmentRangeWriter(
FileCache * cache_,
const FileSegment::Key & key_,
/// A callback which is called right after each file segment is completed.
/// It is used to write into filesystem cache log.
OnCompleteFileSegmentCallback && on_complete_file_segment_func_);
~FileSegmentRangeWriter();
bool write(const char * data, size_t size, size_t offset, bool is_persistent);
void finalize();
private:
FileSegments::iterator allocateFileSegment(size_t offset, bool is_persistent);
void completeFileSegment(FileSegment & file_segment);
FileCache * cache;
FileSegment::Key key;
FileSegmentsHolder file_segments_holder;
FileSegments::iterator current_file_segment_it;
size_t current_file_segment_write_offset = 0;
bool finalized = false;
OnCompleteFileSegmentCallback on_complete_file_segment_func;
};
}
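To make the lazy-allocation contract described above concrete, here is a hedged usage sketch of `FileSegmentRangeWriter` (FileCache/FileSegment includes omitted); the `cache`, `key`, `data` and `size` names are illustrative, and the callback only mirrors the "write into filesystem cache log" purpose mentioned above:
void exampleCacheWrite(DB::FileCache * cache, const DB::FileSegment::Key & key, const char * data, size_t size)
{
    DB::FileSegmentRangeWriter writer(
        cache,
        key,
        [](const DB::FileSegment & /* file_segment */)
        {
            /// Called right after each file segment is completed,
            /// e.g. to append an entry to the filesystem cache log.
        });
    /// Offsets must be written sequentially; write() returns false once space
    /// reservation fails, after which further data is simply not cached.
    if (writer.write(data, size, /* offset */ 0, /* is_persistent */ false))
    {
        /// More sequential writes could follow, continuing at offset `size`.
    }
    writer.finalize();   /// resizes the last, possibly not full, file segment
}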

View File

@ -278,6 +278,8 @@
M(CachedReadBufferReadFromCacheBytes, "Bytes read from filesystem cache") \
M(CachedReadBufferCacheWriteBytes, "Bytes written from source (remote fs, etc) to filesystem cache") \
M(CachedReadBufferCacheWriteMicroseconds, "Time spent writing data into filesystem cache") \
M(CachedWriteBufferCacheWriteBytes, "Bytes written from source (remote fs, etc) to filesystem cache") \
M(CachedWriteBufferCacheWriteMicroseconds, "Time spent writing data into filesystem cache") \
\
M(RemoteFSSeeks, "Total number of seeks for async buffer") \
M(RemoteFSPrefetches, "Number of prefetches made with asynchronous reading from remote filesystem") \
@ -347,7 +349,13 @@
\
M(ScalarSubqueriesGlobalCacheHit, "Number of times a read from a scalar subquery was done using the global cache") \
M(ScalarSubqueriesLocalCacheHit, "Number of times a read from a scalar subquery was done using the local cache") \
M(ScalarSubqueriesCacheMiss, "Number of times a read from a scalar subquery was not cached and had to be calculated completely") \
M(ScalarSubqueriesCacheMiss, "Number of times a read from a scalar subquery was not cached and had to be calculated completely") \
\
M(SchemaInferenceCacheHits, "Number of times a schema from cache was used for schema inference") \
M(SchemaInferenceCacheMisses, "Number of times a schema was not found in the cache during schema inference") \
M(SchemaInferenceCacheEvictions, "Number of times a schema from cache was evicted due to overflow") \
M(SchemaInferenceCacheInvalidations, "Number of times a schema in cache became invalid due to changes in data") \
\
M(KeeperPacketsSent, "Packets sent by keeper server") \
M(KeeperPacketsReceived, "Packets received by keeper server") \
M(KeeperRequestTotal, "Total requests number on keeper server") \

View File

@ -80,7 +80,7 @@ void complete(const DB::FileSegmentsHolder & holder)
{
ASSERT_TRUE(file_segment->getOrSetDownloader() == DB::FileSegment::getCallerId());
prepareAndDownload(file_segment);
file_segment->complete(DB::FileSegment::State::DOWNLOADED);
file_segment->completeWithState(DB::FileSegment::State::DOWNLOADED);
}
}
@ -125,7 +125,7 @@ TEST(FileCache, get)
assertRange(2, segments[0], DB::FileSegment::Range(0, 9), DB::FileSegment::State::DOWNLOADING);
download(segments[0]);
segments[0]->complete(DB::FileSegment::State::DOWNLOADED);
segments[0]->completeWithState(DB::FileSegment::State::DOWNLOADED);
assertRange(3, segments[0], DB::FileSegment::Range(0, 9), DB::FileSegment::State::DOWNLOADED);
}
@ -146,7 +146,7 @@ TEST(FileCache, get)
ASSERT_TRUE(segments[1]->getOrSetDownloader() == DB::FileSegment::getCallerId());
prepareAndDownload(segments[1]);
segments[1]->complete(DB::FileSegment::State::DOWNLOADED);
segments[1]->completeWithState(DB::FileSegment::State::DOWNLOADED);
assertRange(6, segments[1], DB::FileSegment::Range(10, 14), DB::FileSegment::State::DOWNLOADED);
}
@ -203,7 +203,7 @@ TEST(FileCache, get)
ASSERT_TRUE(segments[2]->getOrSetDownloader() == DB::FileSegment::getCallerId());
prepareAndDownload(segments[2]);
segments[2]->complete(DB::FileSegment::State::DOWNLOADED);
segments[2]->completeWithState(DB::FileSegment::State::DOWNLOADED);
assertRange(14, segments[3], DB::FileSegment::Range(17, 20), DB::FileSegment::State::DOWNLOADED);
@ -244,7 +244,7 @@ TEST(FileCache, get)
ASSERT_TRUE(segments[3]->getOrSetDownloader() == DB::FileSegment::getCallerId());
prepareAndDownload(segments[3]);
segments[3]->complete(DB::FileSegment::State::DOWNLOADED);
segments[3]->completeWithState(DB::FileSegment::State::DOWNLOADED);
ASSERT_TRUE(segments[3]->state() == DB::FileSegment::State::DOWNLOADED);
}
@ -267,8 +267,8 @@ TEST(FileCache, get)
ASSERT_TRUE(segments[2]->getOrSetDownloader() == DB::FileSegment::getCallerId());
prepareAndDownload(segments[0]);
prepareAndDownload(segments[2]);
segments[0]->complete(DB::FileSegment::State::DOWNLOADED);
segments[2]->complete(DB::FileSegment::State::DOWNLOADED);
segments[0]->completeWithState(DB::FileSegment::State::DOWNLOADED);
segments[2]->completeWithState(DB::FileSegment::State::DOWNLOADED);
}
/// Current cache: [____][_] [][___][__]
@ -290,8 +290,8 @@ TEST(FileCache, get)
ASSERT_TRUE(s1[0]->getOrSetDownloader() == DB::FileSegment::getCallerId());
prepareAndDownload(s5[0]);
prepareAndDownload(s1[0]);
s5[0]->complete(DB::FileSegment::State::DOWNLOADED);
s1[0]->complete(DB::FileSegment::State::DOWNLOADED);
s5[0]->completeWithState(DB::FileSegment::State::DOWNLOADED);
s1[0]->completeWithState(DB::FileSegment::State::DOWNLOADED);
/// Current cache: [___] [_][___][_] [__]
/// ^ ^ ^ ^ ^ ^ ^ ^
@ -393,7 +393,7 @@ TEST(FileCache, get)
}
prepareAndDownload(segments[2]);
segments[2]->complete(DB::FileSegment::State::DOWNLOADED);
segments[2]->completeWithState(DB::FileSegment::State::DOWNLOADED);
ASSERT_TRUE(segments[2]->state() == DB::FileSegment::State::DOWNLOADED);
other_1.join();
@ -458,7 +458,7 @@ TEST(FileCache, get)
ASSERT_TRUE(segments_2[1]->getOrSetDownloader() == DB::FileSegment::getCallerId());
prepareAndDownload(segments_2[1]);
segments_2[1]->complete(DB::FileSegment::State::DOWNLOADED);
segments_2[1]->completeWithState(DB::FileSegment::State::DOWNLOADED);
});
{

View File

@ -73,6 +73,11 @@ bool BackgroundSchedulePoolTaskInfo::activateAndSchedule()
return true;
}
std::unique_lock<std::mutex> BackgroundSchedulePoolTaskInfo::getExecLock()
{
return std::unique_lock{exec_mutex};
}
void BackgroundSchedulePoolTaskInfo::execute()
{
Stopwatch watch;

View File

@ -121,6 +121,10 @@ public:
/// get Coordination::WatchCallback needed for notifications from ZooKeeper watches.
Coordination::WatchCallback getWatchCallback();
/// Returns lock that protects from concurrent task execution.
/// This lock should not be held for a long time.
std::unique_lock<std::mutex> getExecLock();
private:
friend class TaskNotification;
friend class BackgroundSchedulePool;
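A brief hedged sketch of the intended use; `task` is an illustrative `BackgroundSchedulePoolTaskInfo` pointer:
{
    /// While this lock is held, BackgroundSchedulePool will not execute the task body,
    /// so the short operation below cannot overlap with it.
    std::unique_lock<std::mutex> exec_lock = task->getExecLock();
    /// ... short critical section ...
}   /// released promptly, as the comment above advises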

View File

@ -280,6 +280,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(UInt64, http_max_fields, 1000000, "Maximum number of fields in HTTP header", 0) \
M(UInt64, http_max_field_name_size, 1048576, "Maximum length of field name in HTTP header", 0) \
M(UInt64, http_max_field_value_size, 1048576, "Maximum length of field value in HTTP header", 0) \
M(UInt64, http_max_chunk_size, 100_GiB, "Maximum value of a chunk size in HTTP chunked transfer encoding", 0) \
M(Bool, http_skip_not_found_url_for_globs, true, "Skip URLs for globs with HTTP_NOT_FOUND error", 0) \
M(Bool, optimize_throw_if_noop, false, "If setting is enabled and OPTIMIZE query didn't actually assign a merge then an explanatory exception is thrown", 0) \
M(Bool, use_index_for_in_with_subqueries, true, "Try using an index if there is a subquery or a table expression on the right side of the IN operator.", 0) \
@ -408,6 +409,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(UInt64, low_cardinality_max_dictionary_size, 8192, "Maximum size (in rows) of shared global dictionary for LowCardinality type.", 0) \
M(Bool, low_cardinality_use_single_dictionary_for_part, false, "LowCardinality type serialization setting. If it is true, then additional keys will be used when the global dictionary overflows. Otherwise, several shared dictionaries will be created.", 0) \
M(Bool, decimal_check_overflow, true, "Check overflow of decimal arithmetic/comparison operations", 0) \
M(Bool, allow_custom_error_code_in_throwif, false, "Enable custom error code in function throwIf(). If true, thrown exceptions may have unexpected error codes.", 0) \
\
M(Bool, prefer_localhost_replica, true, "If it's true then queries will be always sent to local replica (if it exists). If it's false then replica to send a query will be chosen between local and remote ones according to load_balancing", 0) \
M(UInt64, max_fetch_partition_retries_count, 5, "Amount of retries while fetching partition from another host.", 0) \
@ -591,6 +593,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(Bool, enable_filesystem_cache_on_write_operations, false, "Write into cache on write operations. For this setting to take effect, it must also be enabled in the disk config", 0) \
M(Bool, enable_filesystem_cache_log, false, "Allows to record the filesystem caching log for each query", 0) \
M(Bool, read_from_filesystem_cache_if_exists_otherwise_bypass_cache, false, "", 0) \
M(Bool, enable_filesystem_cache_on_lower_level, true, "If the read buffer supports caching inside the threadpool, allow it to do so; otherwise cache outside of the threadpool. Do not use this setting, it is needed only for testing", 0) \
M(Bool, skip_download_if_exceeds_query_cache, true, "Skip download from remote filesystem if exceeds query cache size", 0) \
M(UInt64, max_query_cache_size, (128UL * 1024 * 1024 * 1024), "Max remote filesystem cache size that can be used by a single query", 0) \
\
@ -608,6 +611,12 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(Bool, allow_deprecated_database_ordinary, false, "Allow to create databases with deprecated Ordinary engine", 0) \
M(Bool, allow_deprecated_syntax_for_merge_tree, false, "Allow to create *MergeTree tables with deprecated engine definition syntax", 0) \
\
M(Bool, schema_inference_use_cache_for_file, true, "Use cache in schema inference while using file table function", 0) \
M(Bool, schema_inference_use_cache_for_s3, true, "Use cache in schema inference while using s3 table function", 0) \
M(Bool, schema_inference_use_cache_for_hdfs, true, "Use cache in schema inference while using hdfs table function", 0) \
M(Bool, schema_inference_use_cache_for_url, true, "Use cache in schema inference while using url table function", 0) \
M(Bool, schema_inference_cache_require_modification_time_for_url, true, "Use schema from cache for URL with last modification time validation (for urls with Last-Modified header)", 0) \
\
M(String, compatibility, "", "Changes other settings according to provided ClickHouse version. If we know that we changed some behaviour in ClickHouse by changing some settings in some version, this compatibility setting will control these settings", 0) \
\
M(Map, additional_table_filters, "", "Additional filter expression which would be applied after reading from specified table. Syntax: {'table1': 'expression', 'database.table2': 'expression'}", 0) \
@ -706,6 +715,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(Bool, input_format_orc_skip_columns_with_unsupported_types_in_schema_inference, false, "Skip columns with unsupported types while schema inference for format ORC", 0) \
M(Bool, input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference, false, "Skip columns with unsupported types while schema inference for format Arrow", 0) \
M(String, column_names_for_schema_inference, "", "The list of column names to use in schema inference for formats without column names. The format: 'column1,column2,column3,...'", 0) \
M(String, schema_inference_hints, "", "The list of column names and types to use in schema inference for formats without column names. The format: 'column_name1 column_type1, column_name2 column_type2, ...'", 0) \
M(Bool, input_format_json_read_bools_as_numbers, true, "Allow to parse bools as numbers in JSON input formats", 0) \
M(Bool, input_format_json_try_infer_numbers_from_strings, true, "Try to infer numbers from string fields while schema inference", 0) \
M(Bool, input_format_try_infer_integers, true, "Try to infer numbers from string fields while schema inference in text formats", 0) \

View File

@ -126,13 +126,30 @@ std::pair<String, String> DatabaseReplicated::parseFullReplicaName(const String
return {shard, replica};
}
ClusterPtr DatabaseReplicated::getCluster() const
ClusterPtr DatabaseReplicated::tryGetCluster() const
{
std::lock_guard lock{mutex};
if (cluster)
return cluster;
cluster = getClusterImpl();
/// The database is probably not created or not initialized yet; it's ok to return nullptr.
if (is_readonly)
return cluster;
try
{
/// A quick fix for stateless tests with DatabaseReplicated. Its ZK
/// node can be destroyed at any time. If another test lists
/// system.clusters to get client command line suggestions, it will
/// get an error when trying to get the info about DB from ZK.
/// Just ignore these inaccessible databases. A good example of a
/// failing test is `01526_client_start_and_exit`.
cluster = getClusterImpl();
}
catch (...)
{
tryLogCurrentException(log);
}
return cluster;
}
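A hedged sketch of how a caller might consume the nullable result; the `database` pointer is illustrative:
if (const auto * replicated = dynamic_cast<const DB::DatabaseReplicated *>(database.get()))
{
    if (auto cluster = replicated->tryGetCluster())
    {
        /// Use the cluster of database replicas.
    }
    else
    {
        /// The database is read-only / not initialized yet, or its ZooKeeper
        /// metadata is currently unavailable: skip it instead of failing.
    }
}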

View File

@ -60,7 +60,7 @@ public:
const String & getZooKeeperPath() const { return zookeeper_path; }
/// Returns cluster consisting of database replicas
ClusterPtr getCluster() const;
ClusterPtr tryGetCluster() const;
void drop(ContextPtr /*context*/) override;

View File

@ -1,371 +0,0 @@
#include "DiskCacheWrapper.h"
#include <IO/copyData.h>
#include <IO/ReadBufferFromFileDecorator.h>
#include <IO/WriteBufferFromFileDecorator.h>
#include <Common/quoteString.h>
#include <condition_variable>
namespace DB
{
/**
* This buffer writes to the cache; after finalize() the written file is copied from the cache to the disk.
*/
class WritingToCacheWriteBuffer final : public WriteBufferFromFileDecorator
{
public:
WritingToCacheWriteBuffer(
std::unique_ptr<WriteBufferFromFileBase> impl_,
std::function<std::unique_ptr<ReadBuffer>()> create_read_buffer_,
std::function<std::unique_ptr<WriteBuffer>()> create_write_buffer_)
: WriteBufferFromFileDecorator(std::move(impl_))
, create_read_buffer(std::move(create_read_buffer_))
, create_write_buffer(std::move(create_write_buffer_))
{
}
~WritingToCacheWriteBuffer() override
{
try
{
finalize();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
void preFinalize() override
{
impl->next();
impl->preFinalize();
impl->finalize();
read_buffer = create_read_buffer();
write_buffer = create_write_buffer();
copyData(*read_buffer, *write_buffer);
write_buffer->next();
write_buffer->preFinalize();
is_prefinalized = true;
}
void finalizeImpl() override
{
if (!is_prefinalized)
preFinalize();
write_buffer->finalize();
}
private:
std::function<std::unique_ptr<ReadBuffer>()> create_read_buffer;
std::function<std::unique_ptr<WriteBuffer>()> create_write_buffer;
std::unique_ptr<ReadBuffer> read_buffer;
std::unique_ptr<WriteBuffer> write_buffer;
bool is_prefinalized = false;
};
enum FileDownloadStatus
{
NONE,
DOWNLOADING,
DOWNLOADED,
ERROR
};
struct FileDownloadMetadata
{
/// Thread waits on this condition if download process is in progress.
std::condition_variable condition;
FileDownloadStatus status = NONE;
};
DiskCacheWrapper::DiskCacheWrapper(
std::shared_ptr<IDisk> delegate_, std::shared_ptr<DiskLocal> cache_disk_, std::function<bool(const String &)> cache_file_predicate_)
: DiskDecorator(delegate_), cache_disk(cache_disk_), cache_file_predicate(cache_file_predicate_)
{
}
std::shared_ptr<FileDownloadMetadata> DiskCacheWrapper::acquireDownloadMetadata(const String & path) const
{
std::lock_guard lock{mutex};
auto it = file_downloads.find(path);
if (it != file_downloads.end())
if (auto x = it->second.lock())
return x;
std::shared_ptr<FileDownloadMetadata> metadata(
new FileDownloadMetadata,
[this, path] (FileDownloadMetadata * p)
{
std::lock_guard erase_lock{mutex};
file_downloads.erase(path);
delete p;
});
file_downloads.emplace(path, metadata);
return metadata;
}
std::unique_ptr<ReadBufferFromFileBase>
DiskCacheWrapper::readFile(
const String & path,
const ReadSettings & settings,
std::optional<size_t> read_hint,
std::optional<size_t> file_size) const
{
if (!cache_file_predicate(path))
return DiskDecorator::readFile(path, settings, read_hint, file_size);
LOG_TEST(log, "Read file {} from cache", backQuote(path));
if (cache_disk->exists(path))
return cache_disk->readFile(path, settings, read_hint, file_size);
auto metadata = acquireDownloadMetadata(path);
{
std::unique_lock<std::mutex> lock{mutex};
if (metadata->status == NONE)
{
/// This thread will be responsible for downloading the file to the cache.
metadata->status = DOWNLOADING;
LOG_TEST(log, "File {} doesn't exist in cache. Will download it", backQuote(path));
}
else if (metadata->status == DOWNLOADING)
{
LOG_TEST(log, "Waiting for file {} download to cache", backQuote(path));
metadata->condition.wait(lock, [metadata] { return metadata->status == DOWNLOADED || metadata->status == ERROR; });
}
}
auto current_read_settings = settings;
/// Do not use RemoteFSReadMethod::threadpool for index and mark files.
/// Here it does not make sense since the files are small.
/// Note: enabling `threadpool` read requires calling setReadUntilEnd().
current_read_settings.remote_fs_method = RemoteFSReadMethod::read;
/// Disable data cache.
current_read_settings.enable_filesystem_cache = false;
if (metadata->status == DOWNLOADING)
{
FileDownloadStatus result_status = DOWNLOADED;
if (!cache_disk->exists(path))
{
try
{
auto dir_path = directoryPath(path);
if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path);
auto tmp_path = path + ".tmp";
{
auto src_buffer = DiskDecorator::readFile(path, current_read_settings, read_hint, file_size);
WriteSettings write_settings;
write_settings.enable_filesystem_cache_on_write_operations = false;
auto dst_buffer = cache_disk->writeFile(tmp_path, settings.local_fs_buffer_size, WriteMode::Rewrite, write_settings);
copyData(*src_buffer, *dst_buffer);
}
cache_disk->moveFile(tmp_path, path);
LOG_TEST(log, "File {} downloaded to cache", backQuote(path));
}
catch (...)
{
tryLogCurrentException("DiskCache", "Failed to download file + " + backQuote(path) + " to cache");
result_status = ERROR;
}
}
/// Notify all waiters that file download is finished.
std::unique_lock<std::mutex> lock{mutex};
metadata->status = result_status;
lock.unlock();
metadata->condition.notify_all();
}
if (metadata->status == DOWNLOADED)
return cache_disk->readFile(path, settings, read_hint, file_size);
return DiskDecorator::readFile(path, current_read_settings, read_hint, file_size);
}
std::unique_ptr<WriteBufferFromFileBase>
DiskCacheWrapper::writeFile(const String & path, size_t buf_size, WriteMode mode, const WriteSettings & settings)
{
if (!cache_file_predicate(path))
return DiskDecorator::writeFile(path, buf_size, mode, settings);
WriteSettings current_settings = settings;
/// There are two different cache implementations. Disable the second one if the first is enabled.
/// The first will soon be removed; this disabling is temporary.
current_settings.enable_filesystem_cache_on_write_operations = false;
LOG_TEST(log, "Write file {} to cache", backQuote(path));
auto dir_path = directoryPath(path);
if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path);
return std::make_unique<WritingToCacheWriteBuffer>(
cache_disk->writeFile(path, buf_size, mode, current_settings),
[this, path]()
{
/// Copy file from cache to actual disk when cached buffer is finalized.
return cache_disk->readFile(path, ReadSettings(), /* read_hint= */ {}, /* file_size= */ {});
},
[this, path, buf_size, mode, current_settings]()
{
return DiskDecorator::writeFile(path, buf_size, mode, current_settings);
});
}
void DiskCacheWrapper::clearDirectory(const String & path)
{
if (cache_disk->exists(path))
cache_disk->clearDirectory(path);
DiskDecorator::clearDirectory(path);
}
void DiskCacheWrapper::moveDirectory(const String & from_path, const String & to_path)
{
if (cache_disk->exists(from_path))
{
/// Destination directory may not be empty if a previous directory move attempt failed.
if (cache_disk->exists(to_path) && cache_disk->isDirectory(to_path))
cache_disk->clearDirectory(to_path);
cache_disk->moveDirectory(from_path, to_path);
}
DiskDecorator::moveDirectory(from_path, to_path);
}
void DiskCacheWrapper::moveFile(const String & from_path, const String & to_path)
{
if (cache_disk->exists(from_path))
{
auto dir_path = directoryPath(to_path);
if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path);
cache_disk->moveFile(from_path, to_path);
}
DiskDecorator::moveFile(from_path, to_path);
}
void DiskCacheWrapper::replaceFile(const String & from_path, const String & to_path)
{
if (cache_disk->exists(from_path))
{
auto dir_path = directoryPath(to_path);
if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path);
cache_disk->replaceFile(from_path, to_path);
}
DiskDecorator::replaceFile(from_path, to_path);
}
void DiskCacheWrapper::removeFile(const String & path)
{
cache_disk->removeFileIfExists(path);
DiskDecorator::removeFile(path);
}
void DiskCacheWrapper::removeFileIfExists(const String & path)
{
cache_disk->removeFileIfExists(path);
DiskDecorator::removeFileIfExists(path);
}
void DiskCacheWrapper::removeDirectory(const String & path)
{
if (cache_disk->exists(path))
cache_disk->removeDirectory(path);
DiskDecorator::removeDirectory(path);
}
void DiskCacheWrapper::removeRecursive(const String & path)
{
if (cache_disk->exists(path))
cache_disk->removeRecursive(path);
DiskDecorator::removeRecursive(path);
}
void DiskCacheWrapper::removeSharedFile(const String & path, bool keep_s3)
{
if (cache_disk->exists(path))
cache_disk->removeSharedFile(path, keep_s3);
DiskDecorator::removeSharedFile(path, keep_s3);
}
void DiskCacheWrapper::removeSharedRecursive(const String & path, bool keep_all, const NameSet & files_to_keep)
{
if (cache_disk->exists(path))
cache_disk->removeSharedRecursive(path, keep_all, files_to_keep);
DiskDecorator::removeSharedRecursive(path, keep_all, files_to_keep);
}
void DiskCacheWrapper::removeSharedFiles(const RemoveBatchRequest & files, bool keep_all, const NameSet & files_to_keep)
{
for (const auto & file : files)
{
if (cache_disk->exists(file.path))
{
bool keep_file = keep_all || files_to_keep.contains(fs::path(file.path).filename());
cache_disk->removeSharedFile(file.path, keep_file);
}
}
DiskDecorator::removeSharedFiles(files, keep_all, files_to_keep);
}
void DiskCacheWrapper::createHardLink(const String & src_path, const String & dst_path)
{
/// Don't create hardlinks for cache files in the shadow directory, as that just wastes cache disk space.
if (cache_disk->exists(src_path) && !dst_path.starts_with("shadow/"))
{
auto dir_path = directoryPath(dst_path);
if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path);
cache_disk->createHardLink(src_path, dst_path);
}
DiskDecorator::createHardLink(src_path, dst_path);
}
void DiskCacheWrapper::createDirectory(const String & path)
{
cache_disk->createDirectory(path);
DiskDecorator::createDirectory(path);
}
void DiskCacheWrapper::createDirectories(const String & path)
{
cache_disk->createDirectories(path);
DiskDecorator::createDirectories(path);
}
ReservationPtr DiskCacheWrapper::reserve(UInt64 bytes)
{
auto ptr = DiskDecorator::reserve(bytes);
if (ptr)
{
auto disk_ptr = std::static_pointer_cast<DiskCacheWrapper>(shared_from_this());
return std::make_unique<ReservationDelegate>(std::move(ptr), disk_ptr);
}
return ptr;
}
}

View File

@ -1,71 +0,0 @@
#pragma once
#include <unordered_map>
#include <Common/logger_useful.h>
#include "DiskDecorator.h"
#include "DiskLocal.h"
namespace DB
{
struct FileDownloadMetadata;
/**
* Simple cache wrapper.
* Tries to cache files matched by predicate to given local disk (cache disk).
*
* When writeFile() is invoked wrapper firstly writes file to cache.
* After write buffer is finalized actual file is stored to underlying disk.
*
* When readFile() is invoked and file exists in cache wrapper reads this file from cache.
* If file doesn't exist wrapper downloads this file from underlying disk to cache.
* readFile() invocation is thread-safe.
*/
class DiskCacheWrapper : public DiskDecorator
{
public:
DiskCacheWrapper(
std::shared_ptr<IDisk> delegate_,
std::shared_ptr<DiskLocal> cache_disk_,
std::function<bool(const String &)> cache_file_predicate_);
void createDirectory(const String & path) override;
void createDirectories(const String & path) override;
void clearDirectory(const String & path) override;
void moveDirectory(const String & from_path, const String & to_path) override;
void moveFile(const String & from_path, const String & to_path) override;
void replaceFile(const String & from_path, const String & to_path) override;
std::unique_ptr<ReadBufferFromFileBase> readFile(
const String & path,
const ReadSettings & settings,
std::optional<size_t> read_hint,
std::optional<size_t> file_size) const override;
std::unique_ptr<WriteBufferFromFileBase> writeFile(const String & path, size_t buf_size, WriteMode mode, const WriteSettings &) override;
void removeFile(const String & path) override;
void removeFileIfExists(const String & path) override;
void removeDirectory(const String & path) override;
void removeRecursive(const String & path) override;
void removeSharedFile(const String & path, bool keep_s3) override;
void removeSharedRecursive(const String & path, bool keep_all, const NameSet & files_to_keep) override;
void removeSharedFiles(const RemoveBatchRequest & files, bool keep_all, const NameSet & files_to_keep) override;
void createHardLink(const String & src_path, const String & dst_path) override;
ReservationPtr reserve(UInt64 bytes) override;
private:
std::shared_ptr<FileDownloadMetadata> acquireDownloadMetadata(const String & path) const;
/// Disk to cache files.
std::shared_ptr<DiskLocal> cache_disk;
/// Cache only files satisfies predicate.
const std::function<bool(const String &)> cache_file_predicate;
/// Contains information about currently running file downloads to cache.
mutable std::unordered_map<String, std::weak_ptr<FileDownloadMetadata>> file_downloads;
/// Protects concurrent downloading files to cache.
mutable std::mutex mutex;
Poco::Logger * log = &Poco::Logger::get("DiskCache");
};
}

View File

@ -236,9 +236,9 @@ void DiskDecorator::applyNewSettings(const Poco::Util::AbstractConfiguration & c
delegate->applyNewSettings(config, context, config_prefix, map);
}
DiskObjectStoragePtr DiskDecorator::createDiskObjectStorage(const String & name)
DiskObjectStoragePtr DiskDecorator::createDiskObjectStorage()
{
return delegate->createDiskObjectStorage(name);
return delegate->createDiskObjectStorage();
}
}

View File

@ -83,13 +83,14 @@ public:
void applyNewSettings(const Poco::Util::AbstractConfiguration & config, ContextPtr context, const String & config_prefix, const DisksMap & map) override;
bool supportsCache() const override { return delegate->supportsCache(); }
String getCacheBasePath() const override { return delegate->getCacheBasePath(); }
const String & getCacheBasePath() const override { return delegate->getCacheBasePath(); }
StoredObjects getStorageObjects(const String & path) const override { return delegate->getStorageObjects(path); }
DiskObjectStoragePtr createDiskObjectStorage(const String &) override;
void getRemotePathsRecursive(const String & path, std::vector<LocalPathWithObjectStoragePaths> & paths_map) override { return delegate->getRemotePathsRecursive(path, paths_map); }
DiskObjectStoragePtr createDiskObjectStorage() override;
NameSet getCacheLayersNames() const override { return delegate->getCacheLayersNames(); }
MetadataStoragePtr getMetadataStorage() override { return delegate->getMetadataStorage(); }
std::unordered_map<String, String> getSerializedMetadata(const std::vector<String> & file_paths) const override { return delegate->getSerializedMetadata(file_paths); }
@ -97,6 +98,7 @@ public:
UInt32 getRefCount(const String & path) const override { return delegate->getRefCount(path); }
void syncRevision(UInt64 revision) override { delegate->syncRevision(revision); }
UInt64 getRevision() const override { return delegate->getRevision(); }
bool supportsStat() const override { return delegate->supportsStat(); }

View File

@ -8,6 +8,9 @@
#include <Common/quoteString.h>
#include <Common/atomicRename.h>
#include <Disks/IO/createReadBufferFromFileBase.h>
#include <Disks/ObjectStorages/LocalObjectStorage.h>
#include <Disks/ObjectStorages/DiskObjectStorage.h>
#include <Disks/ObjectStorages/FakeMetadataStorageFromDisk.h>
#include <fstream>
#include <unistd.h>
@ -598,6 +601,26 @@ catch (...)
return false;
}
DiskObjectStoragePtr DiskLocal::createDiskObjectStorage()
{
auto object_storage = std::make_shared<LocalObjectStorage>();
auto metadata_storage = std::make_shared<FakeMetadataStorageFromDisk>(
/* metadata_storage */std::static_pointer_cast<DiskLocal>(shared_from_this()),
object_storage,
/* object_storage_root_path */getPath());
return std::make_shared<DiskObjectStorage>(
getName(),
disk_path,
"Local",
metadata_storage,
object_storage,
DiskType::Local,
false,
/* threadpool_size */16
);
}
bool DiskLocal::setup()
{
try

View File

@ -122,6 +122,8 @@ public:
bool canRead() const noexcept;
bool canWrite() const noexcept;
DiskObjectStoragePtr createDiskObjectStorage() override;
bool supportsStat() const override { return true; }
struct stat stat(const String & path) const override;

View File

@ -312,7 +312,7 @@ bool DiskRestartProxy::checkUniqueId(const String & id) const
return DiskDecorator::checkUniqueId(id);
}
String DiskRestartProxy::getCacheBasePath() const
const String & DiskRestartProxy::getCacheBasePath() const
{
ReadLock lock (mutex);
return DiskDecorator::getCacheBasePath();

View File

@ -64,7 +64,8 @@ public:
void truncateFile(const String & path, size_t size) override;
String getUniqueId(const String & path) const override;
bool checkUniqueId(const String & id) const override;
String getCacheBasePath() const override;
const String & getCacheBasePath() const override;
StoredObjects getStorageObjects(const String & path) const override;
void getRemotePathsRecursive(const String & path, std::vector<LocalPathWithObjectStoragePaths> & paths_map) override;

View File

@ -16,9 +16,18 @@ namespace ErrorCodes
{
extern const int EXCESSIVE_ELEMENT_IN_CONFIG;
extern const int UNKNOWN_DISK;
extern const int LOGICAL_ERROR;
}
DiskSelector::DiskSelector(const Poco::Util::AbstractConfiguration & config, const String & config_prefix, ContextPtr context)
void DiskSelector::assertInitialized() const
{
if (!is_initialized)
throw Exception(ErrorCodes::LOGICAL_ERROR, "DiskSelector not initialized");
}
void DiskSelector::initialize(const Poco::Util::AbstractConfiguration & config, const String & config_prefix, ContextPtr context)
{
Poco::Util::AbstractConfiguration::Keys keys;
config.keys(config_prefix, keys);
@ -46,12 +55,16 @@ DiskSelector::DiskSelector(const Poco::Util::AbstractConfiguration & config, con
std::make_shared<DiskLocal>(
default_disk_name, context->getPath(), 0, context, config.getUInt("local_disk_check_period_ms", 0)));
}
is_initialized = true;
}
DiskSelectorPtr DiskSelector::updateFromConfig(
const Poco::Util::AbstractConfiguration & config, const String & config_prefix, ContextPtr context) const
{
assertInitialized();
Poco::Util::AbstractConfiguration::Keys keys;
config.keys(config_prefix, keys);
@ -110,10 +123,33 @@ DiskSelectorPtr DiskSelector::updateFromConfig(
DiskPtr DiskSelector::get(const String & name) const
{
assertInitialized();
auto it = disks.find(name);
if (it == disks.end())
throw Exception("Unknown disk " + name, ErrorCodes::UNKNOWN_DISK);
return it->second;
}
const DisksMap & DiskSelector::getDisksMap() const
{
assertInitialized();
return disks;
}
void DiskSelector::addToDiskMap(const String & name, DiskPtr disk)
{
assertInitialized();
disks.emplace(name, disk);
}
void DiskSelector::shutdown()
{
assertInitialized();
for (auto & e : disks)
e.second->shutdown();
}
}
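The two-phase construction above might be used roughly as follows; the `storage_configuration.disks` prefix is the conventional one and, like `config` and `context`, is an assumption here:
DB::DiskSelector selector;
selector.initialize(config, "storage_configuration.disks", context);   /// must precede any other call, otherwise assertInitialized() throws
DB::DiskPtr default_disk = selector.get("default");                    /// throws UNKNOWN_DISK for unknown names
/// ... use the disk ...
selector.shutdown();                                                    /// shuts down every registered disk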

View File

@ -18,9 +18,11 @@ using DiskSelectorPtr = std::shared_ptr<const DiskSelector>;
class DiskSelector
{
public:
DiskSelector(const Poco::Util::AbstractConfiguration & config, const String & config_prefix, ContextPtr context);
DiskSelector() = default;
DiskSelector(const DiskSelector & from) = default;
void initialize(const Poco::Util::AbstractConfiguration & config, const String & config_prefix, ContextPtr context);
DiskSelectorPtr updateFromConfig(
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
@ -31,20 +33,17 @@ public:
DiskPtr get(const String & name) const;
/// Get all disks with names
const DisksMap & getDisksMap() const { return disks; }
void addToDiskMap(const String & name, DiskPtr disk)
{
disks.emplace(name, disk);
}
const DisksMap & getDisksMap() const;
void shutdown()
{
for (auto & e : disks)
e.second->shutdown();
}
void addToDiskMap(const String & name, DiskPtr disk);
void shutdown();
private:
DisksMap disks;
bool is_initialized = false;
void assertInitialized() const;
};
}

View File

@ -14,6 +14,7 @@ enum class DiskType
Encrypted,
WebServer,
AzureBlobStorage,
Cache,
};
inline String toString(DiskType disk_type)
@ -34,6 +35,8 @@ inline String toString(DiskType disk_type)
return "web";
case DiskType::AzureBlobStorage:
return "azure_blob_storage";
case DiskType::Cache:
return "cache";
}
__builtin_unreachable();
}

View File

@ -1,291 +0,0 @@
#include "DiskWebServer.h"
#include <Common/logger_useful.h>
#include <Common/escapeForFileName.h>
#include <IO/ConnectionTimeoutsContext.h>
#include <IO/ReadWriteBufferFromHTTP.h>
#include <IO/SeekAvoidingReadBuffer.h>
#include <Disks/IO/ReadBufferFromWebServer.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <IO/ReadBufferFromFile.h>
#include <Disks/IO/AsynchronousReadIndirectBufferFromRemoteFS.h>
#include <Disks/IO/ReadIndirectBufferFromRemoteFS.h>
#include <Disks/IO/WriteIndirectBufferFromRemoteFS.h>
#include <Disks/IO/ReadBufferFromRemoteFSGather.h>
#include <Storages/MergeTree/MergeTreeData.h>
#include <Poco/Exception.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
extern const int FILE_DOESNT_EXIST;
extern const int DIRECTORY_DOESNT_EXIST;
extern const int NETWORK_ERROR;
}
void DiskWebServer::initialize(const String & uri_path) const
{
std::vector<String> directories_to_load;
LOG_TRACE(log, "Loading metadata for directory: {}", uri_path);
try
{
Poco::Net::HTTPBasicCredentials credentials{};
ReadWriteBufferFromHTTP metadata_buf(Poco::URI(fs::path(uri_path) / ".index"),
Poco::Net::HTTPRequest::HTTP_GET,
ReadWriteBufferFromHTTP::OutStreamCallback(),
ConnectionTimeouts::getHTTPTimeouts(getContext()),
credentials);
String file_name;
FileData file_data{};
String dir_name = fs::path(uri_path.substr(url.size())) / "";
LOG_TRACE(&Poco::Logger::get("DiskWeb"), "Adding directory: {}", dir_name);
while (!metadata_buf.eof())
{
readText(file_name, metadata_buf);
assertChar('\t', metadata_buf);
bool is_directory;
readBoolText(is_directory, metadata_buf);
if (!is_directory)
{
assertChar('\t', metadata_buf);
readIntText(file_data.size, metadata_buf);
}
assertChar('\n', metadata_buf);
file_data.type = is_directory ? FileType::Directory : FileType::File;
String file_path = fs::path(uri_path) / file_name;
if (file_data.type == FileType::Directory)
{
directories_to_load.push_back(file_path);
}
file_path = file_path.substr(url.size());
files.emplace(std::make_pair(file_path, file_data));
LOG_TRACE(&Poco::Logger::get("DiskWeb"), "Adding file: {}, size: {}", file_path, file_data.size);
}
files.emplace(std::make_pair(dir_name, FileData({ .type = FileType::Directory })));
}
catch (Exception & e)
{
e.addMessage("while loading disk metadata");
throw;
}
for (const auto & directory_path : directories_to_load)
initialize(directory_path);
}
class DiskWebServerDirectoryIterator final : public IDirectoryIterator
{
public:
explicit DiskWebServerDirectoryIterator(std::vector<fs::path> && dir_file_paths_)
: dir_file_paths(std::move(dir_file_paths_)), iter(dir_file_paths.begin()) {}
void next() override { ++iter; }
bool isValid() const override { return iter != dir_file_paths.end(); }
String path() const override { return iter->string(); }
String name() const override { return iter->filename(); }
private:
std::vector<fs::path> dir_file_paths;
std::vector<fs::path>::iterator iter;
};
DiskWebServer::DiskWebServer(
const String & disk_name_,
const String & url_,
ContextPtr context_,
size_t min_bytes_for_seek_)
: WithContext(context_->getGlobalContext())
, log(&Poco::Logger::get("DiskWeb"))
, url(url_)
, name(disk_name_)
, min_bytes_for_seek(min_bytes_for_seek_)
{
}
bool DiskWebServer::exists(const String & path) const
{
LOG_TRACE(&Poco::Logger::get("DiskWeb"), "Checking existence of path: {}", path);
if (files.find(path) != files.end())
return true;
if (path.ends_with(MergeTreeData::FORMAT_VERSION_FILE_NAME) && files.find(fs::path(path).parent_path() / "") == files.end())
{
try
{
initialize(fs::path(url) / fs::path(path).parent_path());
return files.find(path) != files.end();
}
catch (...)
{
const auto message = getCurrentExceptionMessage(false);
bool can_throw = CurrentThread::isInitialized() && CurrentThread::get().getQueryContext();
if (can_throw)
throw Exception(ErrorCodes::NETWORK_ERROR, "Cannot load disk metadata. Error: {}", message);
LOG_TRACE(&Poco::Logger::get("DiskWeb"), "Cannot load disk metadata. Error: {}", message);
return false;
}
}
return false;
}
std::unique_ptr<ReadBufferFromFileBase> DiskWebServer::readFile(const String & path, const ReadSettings & read_settings, std::optional<size_t>, std::optional<size_t>) const
{
LOG_TRACE(log, "Read from path: {}", path);
auto iter = files.find(path);
if (iter == files.end())
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "File path {} does not exist", path);
auto fs_path = fs::path(url) / path;
auto remote_path = fs_path.parent_path() / (escapeForFileName(fs_path.stem()) + fs_path.extension().string());
remote_path = remote_path.string().substr(url.size());
StoredObjects objects;
objects.emplace_back(remote_path, iter->second.size);
auto read_buffer_creator =
[this, read_settings]
(const std::string & path_, size_t read_until_position) -> std::shared_ptr<ReadBufferFromFileBase>
{
return std::make_shared<ReadBufferFromWebServer>(
fs::path(url) / path_,
getContext(),
read_settings,
/* use_external_buffer */true,
read_until_position);
};
auto web_impl = std::make_unique<ReadBufferFromRemoteFSGather>(std::move(read_buffer_creator), objects, read_settings);
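/// With the threadpool method, reads are scheduled on the shared asynchronous remote-FS reader.
/// Otherwise a synchronous path is used, where SeekAvoidingReadBuffer serves short forward seeks
/// (below min_bytes_for_seek) by reading and discarding data instead of seeking the underlying buffer.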
if (read_settings.remote_fs_method == RemoteFSReadMethod::threadpool)
{
auto reader = IObjectStorage::getThreadPoolReader();
return std::make_unique<AsynchronousReadIndirectBufferFromRemoteFS>(reader, read_settings, std::move(web_impl), min_bytes_for_seek);
}
else
{
auto buf = std::make_unique<ReadIndirectBufferFromRemoteFS>(std::move(web_impl));
return std::make_unique<SeekAvoidingReadBuffer>(std::move(buf), min_bytes_for_seek);
}
}
DirectoryIteratorPtr DiskWebServer::iterateDirectory(const String & path) const
{
std::vector<fs::path> dir_file_paths;
if (files.find(path) == files.end())
{
try
{
initialize(fs::path(url) / path);
}
catch (...)
{
const auto message = getCurrentExceptionMessage(false);
bool can_throw = CurrentThread::isInitialized() && CurrentThread::get().getQueryContext();
if (can_throw)
throw Exception(ErrorCodes::NETWORK_ERROR, "Cannot load disk metadata. Error: {}", message);
LOG_TRACE(&Poco::Logger::get("DiskWeb"), "Cannot load disk metadata. Error: {}", message);
return std::make_unique<DiskWebServerDirectoryIterator>(std::move(dir_file_paths));
}
}
if (files.find(path) == files.end())
throw Exception("Directory '" + path + "' does not exist", ErrorCodes::DIRECTORY_DOESNT_EXIST);
for (const auto & file : files)
if (parentPath(file.first) == path)
dir_file_paths.emplace_back(file.first);
LOG_TRACE(log, "Iterate directory {} with {} files", path, dir_file_paths.size());
return std::make_unique<DiskWebServerDirectoryIterator>(std::move(dir_file_paths));
}
size_t DiskWebServer::getFileSize(const String & path) const
{
auto iter = files.find(path);
if (iter == files.end())
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "File path {} does not exist", path);
return iter->second.size;
}
bool DiskWebServer::isFile(const String & path) const
{
auto iter = files.find(path);
if (iter == files.end())
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "File path {} does not exist", path);
return iter->second.type == FileType::File;
}
bool DiskWebServer::isDirectory(const String & path) const
{
auto iter = files.find(path);
if (iter == files.end())
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "File path {} does not exist", path);
return iter->second.type == FileType::Directory;
}
void registerDiskWebServer(DiskFactory & factory)
{
auto creator = [](const String & disk_name,
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
ContextPtr context,
const DisksMap & /*map*/) -> DiskPtr
{
String uri{config.getString(config_prefix + ".endpoint")};
if (!uri.ends_with('/'))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "URI must end with '/', but '{}' doesn't.", uri);
try
{
Poco::URI poco_uri(uri);
}
catch (const Poco::Exception & e)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Bad URI: `{}`. Error: {}", uri, e.what());
}
return std::make_shared<DiskWebServer>(disk_name, uri, context, config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024));
};
factory.registerDiskType("web", creator);
}
}

View File

@ -1,216 +0,0 @@
#pragma once
#include <IO/WriteBufferFromFile.h>
#include <Core/UUID.h>
#include <set>
#include <Interpreters/Context_fwd.h>
#include <Disks/IDisk.h>
#include <IO/ReadBufferFromFile.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
}
/*
* Quick ready test: ATTACH TABLE test_hits UUID '1ae36516-d62d-4218-9ae3-6516d62da218' ( WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, URLDomain String, RefererDomain String, Refresh UInt8, IsRobot UInt8, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), UTCEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), RemoteIP UInt32, RemoteIP6 FixedString(16), WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming Int32, DNSTiming Int32, ConnectTiming Int32, ResponseStartTiming Int32, ResponseEndTiming Int32, FetchTiming Int32, RedirectTiming Int32, DOMInteractiveTiming Int32, DOMContentLoadedTiming Int32, DOMCompleteTiming Int32, LoadEventStartTiming Int32, LoadEventEndTiming Int32, NSToDOMContentLoadedTiming Int32, FirstPaintTiming Int32, RedirectCount Int8, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, GoalsReached Array(UInt32), OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32, YCLID UInt64, ShareService String, ShareURL String, ShareTitle String, ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), IslandID FixedString(16), RequestNum UInt32, RequestTry UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS storage_policy='web';
*
* <storage_configuration>
* <disks>
* <web>
* <type>web</type>
* <endpoint>https://clickhouse-datasets.s3.amazonaws.com/disk-with-static-files-tests/test-hits/</endpoint>
* </web>
* </disks>
* <policies>
* <web>
* <volumes>
* <main>
* <disk>web</disk>
* </main>
* </volumes>
* </web>
* </policies>
* </storage_configuration>
*
* If a query fails with `DB:Exception Unreachable URL`, it may help to adjust the settings: http_connection_timeout, http_receive_timeout, keep_alive_timeout.
*
* To get files for upload run:
* clickhouse static-files-disk-uploader --metadata-path <path> --output-dir <dir>
* (--metadata-path can be found in query: `select data_paths from system.tables where name='<table_name>';`) /// NOLINT
*
* When files are loaded via <endpoint>, they must reside under the <endpoint>/store/ path, but the config must contain only <endpoint>.
*
* If the URL is not reachable while the server is loading tables at startup, then all errors are caught.
* If there were errors in that case, tables can be reloaded (become visible) via detach table table_name -> attach table table_name.
* If the metadata was loaded successfully at server startup, then tables are available right away.
**/
class DiskWebServer : public IDisk, WithContext
{
public:
DiskWebServer(const String & disk_name_,
const String & url_,
ContextPtr context,
size_t min_bytes_for_seek_);
bool supportZeroCopyReplication() const override { return false; }
DiskType getType() const override { return DiskType::WebServer; }
bool isRemote() const override { return true; }
std::unique_ptr<ReadBufferFromFileBase> readFile(const String & path,
const ReadSettings & settings,
std::optional<size_t> read_hint,
std::optional<size_t> file_size) const override;
/// Disk info
const String & getName() const final override { return name; }
const String & getPath() const final override { return url; }
bool isReadOnly() const override { return true; }
UInt64 getTotalSpace() const final override { return std::numeric_limits<UInt64>::max(); }
UInt64 getAvailableSpace() const final override { return std::numeric_limits<UInt64>::max(); }
UInt64 getUnreservedSpace() const final override { return std::numeric_limits<UInt64>::max(); }
/// Read-only part
bool exists(const String & path) const override;
bool isFile(const String & path) const override;
size_t getFileSize(const String & path) const override;
void listFiles(const String & /* path */, std::vector<String> & /* file_names */) const override { }
void setReadOnly(const String & /* path */) override {}
bool isDirectory(const String & path) const override;
DirectoryIteratorPtr iterateDirectory(const String & /* path */) const override;
Poco::Timestamp getLastModified(const String &) const override { return Poco::Timestamp{}; }
time_t getLastChanged(const String &) const override { return {}; }
/// Write and modification part
std::unique_ptr<WriteBufferFromFileBase> writeFile(const String &, size_t, WriteMode, const WriteSettings &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void moveFile(const String &, const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void replaceFile(const String &, const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void removeFile(const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void removeFileIfExists(const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
ReservationPtr reserve(UInt64 /*bytes*/) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void removeRecursive(const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void removeSharedFile(const String &, bool) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void removeSharedFileIfExists(const String &, bool) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void removeSharedRecursive(const String &, bool, const NameSet &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void clearDirectory(const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void moveDirectory(const String &, const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void removeDirectory(const String &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
void setLastModified(const String &, const Poco::Timestamp &) override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Disk {} is read-only", getName());
}
StoredObjects getStorageObjects(const String &) const override { return {}; }
void getRemotePathsRecursive(const String &, std::vector<LocalPathWithObjectStoragePaths> &) override {}
/// Create part
void createFile(const String &) final override {}
void createDirectory(const String &) override {}
void createDirectories(const String &) override {}
void createHardLink(const String &, const String &) override {}
private:
void initialize(const String & uri_path) const;
enum class FileType
{
File,
Directory
};
struct FileData
{
FileType type{};
size_t size = 0;
};
using Files = std::unordered_map<String, FileData>; /// file path -> file data
mutable Files files;
Poco::Logger * log;
String url;
String name;
size_t min_bytes_for_seek;
};
}

View File

@ -102,8 +102,8 @@ class IDisk : public Space
{
public:
/// Default constructor.
explicit IDisk(std::unique_ptr<Executor> executor_ = std::make_unique<SyncExecutor>())
: executor(std::move(executor_))
explicit IDisk(std::shared_ptr<Executor> executor_ = std::make_shared<SyncExecutor>())
: executor(executor_)
{
}
@ -193,6 +193,7 @@ public:
const WriteSettings & settings = {}) = 0;
/// Remove file. Throws exception if file doesn't exist or it's a directory.
/// Return whether file was finally removed. (For remote disks it is not always removed).
virtual void removeFile(const String & path) = 0;
/// Remove file if it exists.
@ -220,10 +221,15 @@ public:
/// Second bool param is a flag to remove (true) or keep (false) shared data on S3
virtual void removeSharedFileIfExists(const String & path, bool /* keep_shared_data */) { removeFileIfExists(path); }
virtual String getCacheBasePath() const { throw Exception(ErrorCodes::NOT_IMPLEMENTED, "There is no cache path"); }
virtual const String & getCacheBasePath() const { throw Exception(ErrorCodes::NOT_IMPLEMENTED, "There is no cache path"); }
virtual bool supportsCache() const { return false; }
virtual NameSet getCacheLayersNames() const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method `getCacheLayersNames()` is not implemented for disk: {}", getType());
}
/// Returns a list of storage objects (contains path, size, ...).
/// (A list is returned because for Log family engines there might
/// be multiple files in remote fs for a single clickhouse file.)
@ -343,7 +349,10 @@ public:
/// Return current disk revision.
virtual UInt64 getRevision() const { return 0; }
virtual DiskObjectStoragePtr createDiskObjectStorage(const String &)
/// Create disk object storage according to disk type.
/// For example for DiskLocal create DiskObjectStorage(LocalObjectStorage),
/// for DiskObjectStorage create just a copy.
virtual DiskObjectStoragePtr createDiskObjectStorage()
{
throw Exception(
ErrorCodes::NOT_IMPLEMENTED,
@ -369,7 +378,7 @@ protected:
void copyThroughBuffers(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path, bool copy_root_dir = true);
private:
std::unique_ptr<Executor> executor;
std::shared_ptr<Executor> executor;
};
using Disks = std::vector<DiskPtr>;

View File

@ -53,6 +53,8 @@ public:
size_t getFileSize() override;
bool isIntegratedWithFilesystemCache() const override { return true; }
private:
bool nextImpl() override;

View File

@ -1,4 +1,4 @@
#include "CachedReadBufferFromRemoteFS.h"
#include "CachedOnDiskReadBufferFromFile.h"
#include <Disks/IO/createReadBufferFromFileBase.h>
#include <IO/ReadBufferFromFile.h>
@ -31,64 +31,73 @@ namespace DB
namespace ErrorCodes
{
extern const int CANNOT_SEEK_THROUGH_FILE;
extern const int CANNOT_USE_CACHE;
extern const int LOGICAL_ERROR;
extern const int ARGUMENT_OUT_OF_BOUND;
}
CachedReadBufferFromRemoteFS::CachedReadBufferFromRemoteFS(
const String & remote_fs_object_path_,
CachedOnDiskReadBufferFromFile::CachedOnDiskReadBufferFromFile(
const String & source_file_path_,
const FileCache::Key & cache_key_,
FileCachePtr cache_,
RemoteFSFileReaderCreator remote_file_reader_creator_,
ImplementationBufferCreator implementation_buffer_creator_,
const ReadSettings & settings_,
const String & query_id_,
size_t read_until_position_)
: SeekableReadBuffer(nullptr, 0)
size_t file_size_,
bool allow_seeks_after_first_read_,
bool use_external_buffer_,
std::optional<size_t> read_until_position_)
: ReadBufferFromFileBase(settings_.remote_fs_buffer_size, nullptr, 0, file_size_)
#ifndef NDEBUG
, log(&Poco::Logger::get("CachedReadBufferFromRemoteFS(" + remote_fs_object_path_ + ")"))
, log(&Poco::Logger::get("CachedOnDiskReadBufferFromFile(" + source_file_path_ + ")"))
#else
, log(&Poco::Logger::get("CachedReadBufferFromRemoteFS"))
, log(&Poco::Logger::get("CachedOnDiskReadBufferFromFile"))
#endif
, cache_key(cache_->hash(remote_fs_object_path_))
, remote_fs_object_path(remote_fs_object_path_)
, cache_key(cache_key_)
, source_file_path(source_file_path_)
, cache(cache_)
, settings(settings_)
, read_until_position(read_until_position_)
, remote_file_reader_creator(remote_file_reader_creator_)
, read_until_position(read_until_position_ ? *read_until_position_ : file_size_)
, implementation_buffer_creator(implementation_buffer_creator_)
, query_id(query_id_)
, enable_logging(!query_id.empty() && settings_.enable_filesystem_cache_log)
, current_buffer_id(getRandomASCIIString(8))
, allow_seeks_after_first_read(allow_seeks_after_first_read_)
, use_external_buffer(use_external_buffer_)
, query_context_holder(cache_->getQueryContextHolder(query_id, settings_))
, is_persistent(false) /// Unused for now, see PR 36171
, is_persistent(settings_.is_file_cache_persistent)
{
}
void CachedReadBufferFromRemoteFS::appendFilesystemCacheLog(
const FileSegment::Range & file_segment_range, CachedReadBufferFromRemoteFS::ReadType type)
void CachedOnDiskReadBufferFromFile::appendFilesystemCacheLog(
const FileSegment::Range & file_segment_range, CachedOnDiskReadBufferFromFile::ReadType type)
{
FilesystemCacheLogElement elem
{
.event_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()),
.query_id = query_id,
.source_file_path = remote_fs_object_path,
.source_file_path = source_file_path,
.file_segment_range = { file_segment_range.left, file_segment_range.right },
.requested_range = { first_offset, read_until_position },
.file_segment_size = file_segment_range.size(),
.cache_attempted = true,
.read_buffer_id = current_buffer_id,
.profile_counters = std::make_shared<ProfileEvents::Counters::Snapshot>(current_file_segment_counters.getPartiallyAtomicSnapshot()),
.profile_counters = std::make_shared<ProfileEvents::Counters::Snapshot>(
current_file_segment_counters.getPartiallyAtomicSnapshot()),
};
current_file_segment_counters.reset();
switch (type)
{
case CachedReadBufferFromRemoteFS::ReadType::CACHED:
elem.read_type = FilesystemCacheLogElement::ReadType::READ_FROM_CACHE;
case CachedOnDiskReadBufferFromFile::ReadType::CACHED:
elem.cache_type = FilesystemCacheLogElement::CacheType::READ_FROM_CACHE;
break;
case CachedReadBufferFromRemoteFS::ReadType::REMOTE_FS_READ_BYPASS_CACHE:
elem.read_type = FilesystemCacheLogElement::ReadType::READ_FROM_FS_BYPASSING_CACHE;
case CachedOnDiskReadBufferFromFile::ReadType::REMOTE_FS_READ_BYPASS_CACHE:
elem.cache_type = FilesystemCacheLogElement::CacheType::READ_FROM_FS_BYPASSING_CACHE;
break;
case CachedReadBufferFromRemoteFS::ReadType::REMOTE_FS_READ_AND_PUT_IN_CACHE:
elem.read_type = FilesystemCacheLogElement::ReadType::READ_FROM_FS_AND_DOWNLOADED_TO_CACHE;
case CachedOnDiskReadBufferFromFile::ReadType::REMOTE_FS_READ_AND_PUT_IN_CACHE:
elem.cache_type = FilesystemCacheLogElement::CacheType::READ_FROM_FS_AND_DOWNLOADED_TO_CACHE;
break;
}
@ -96,8 +105,13 @@ void CachedReadBufferFromRemoteFS::appendFilesystemCacheLog(
cache_log->add(elem);
}
void CachedReadBufferFromRemoteFS::initialize(size_t offset, size_t size)
void CachedOnDiskReadBufferFromFile::initialize(size_t offset, size_t size)
{
if (initialized)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Caching buffer already initialized");
implementation_buffer.reset();
if (settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache)
{
file_segments_holder.emplace(cache->get(cache_key, offset, size));
@ -114,13 +128,18 @@ void CachedReadBufferFromRemoteFS::initialize(size_t offset, size_t size)
if (file_segments_holder->file_segments.empty())
throw Exception(ErrorCodes::LOGICAL_ERROR, "List of file segments cannot be empty");
LOG_TEST(log, "Having {} file segments to read", file_segments_holder->file_segments.size());
LOG_TEST(
log,
"Having {} file segments to read: {}, current offset: {}",
file_segments_holder->file_segments.size(), file_segments_holder->toString(), file_offset_of_buffer_end);
current_file_segment_it = file_segments_holder->file_segments.begin();
initialized = true;
}
SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getCacheReadBuffer(size_t offset) const
CachedOnDiskReadBufferFromFile::ImplementationBufferPtr
CachedOnDiskReadBufferFromFile::getCacheReadBuffer(size_t offset) const
{
auto path = cache->getPathInLocalCache(cache_key, offset, is_persistent);
@ -129,14 +148,15 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getCacheReadBuffer(size_t of
local_read_settings.local_fs_method = LocalFSReadMethod::pread;
auto buf = createReadBufferFromFileBase(path, local_read_settings);
auto * from_fd = dynamic_cast<ReadBufferFromFileDescriptor*>(buf.get());
if (from_fd && from_fd->getFileSize() == 0)
if (getFileSizeFromReadBuffer(*buf) == 0)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Attempt to read from an empty cache file: {}", path);
return buf;
}
SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getRemoteFSReadBuffer(FileSegmentPtr & file_segment, ReadType read_type_)
CachedOnDiskReadBufferFromFile::ImplementationBufferPtr
CachedOnDiskReadBufferFromFile::getRemoteFSReadBuffer(FileSegmentPtr & file_segment, ReadType read_type_)
{
switch (read_type_)
{
@ -160,11 +180,17 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getRemoteFSReadBuffer(FileSe
auto remote_fs_segment_reader = file_segment->getRemoteFileReader();
if (remote_fs_segment_reader)
return remote_fs_segment_reader;
if (!remote_fs_segment_reader)
{
remote_fs_segment_reader = implementation_buffer_creator();
remote_fs_segment_reader = remote_file_reader_creator();
file_segment->setRemoteFileReader(remote_fs_segment_reader);
if (!remote_fs_segment_reader->supportsRightBoundedReads())
throw Exception(
ErrorCodes::CANNOT_USE_CACHE,
"Cache cannot be used with a ReadBuffer which does not support right bounded reads");
file_segment->setRemoteFileReader(remote_fs_segment_reader);
}
return remote_fs_segment_reader;
}
@ -175,15 +201,24 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getRemoteFSReadBuffer(FileSe
if (remote_file_reader && remote_file_reader->getFileOffsetOfBufferEnd() == file_offset_of_buffer_end)
return remote_file_reader;
remote_file_reader = remote_file_reader_creator();
auto remote_fs_segment_reader = file_segment->extractRemoteFileReader();
if (remote_fs_segment_reader)
remote_file_reader = remote_fs_segment_reader;
else
remote_file_reader = implementation_buffer_creator();
return remote_file_reader;
}
default:
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot use remote filesystem reader with read type: {}", toString(read_type));
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Cannot use remote filesystem reader with read type: {}",
toString(read_type));
}
}
SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(FileSegmentPtr & file_segment)
CachedOnDiskReadBufferFromFile::ImplementationBufferPtr
CachedOnDiskReadBufferFromFile::getReadBufferForFileSegment(FileSegmentPtr & file_segment)
{
auto range = file_segment->range();
@ -191,6 +226,7 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
size_t wait_download_tries = 0;
auto download_state = file_segment->state();
LOG_TEST(log, "getReadBufferForFileSegment: {}", file_segment->getInfoForLog());
if (settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache)
{
@ -201,6 +237,7 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
}
else
{
LOG_DEBUG(log, "Bypassing cache because `read_from_filesystem_cache_if_exists_otherwise_bypass_cache` option is used");
read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE;
return getRemoteFSReadBuffer(file_segment, read_type);
}
@ -212,6 +249,7 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
{
case FileSegment::State::SKIP_CACHE:
{
LOG_DEBUG(log, "Bypassing cache because file segment state is `SKIP_CACHE`");
read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE;
return getRemoteFSReadBuffer(file_segment, read_type);
}
@ -220,7 +258,8 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
size_t download_offset = file_segment->getDownloadOffset();
bool can_start_from_cache = download_offset > file_offset_of_buffer_end;
/// If file segment is being downloaded but we can already read from already downloaded part, do that.
/// If file segment is being downloaded but we can already read
/// from already downloaded part, do that.
if (can_start_from_cache)
{
/// segment{k} state: DOWNLOADING
@ -241,6 +280,7 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
}
else
{
LOG_DEBUG(log, "Retries to wait for file segment download exceeded ({})", wait_download_tries);
download_state = FileSegment::State::SKIP_CACHE;
}
@ -274,7 +314,10 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
size_t download_offset = file_segment->getDownloadOffset();
bool can_start_from_cache = download_offset > file_offset_of_buffer_end;
LOG_TEST(log, "Current download offset: {}, file offset of buffer end: {}", download_offset, file_offset_of_buffer_end);
LOG_TEST(
log,
"Current download offset: {}, file offset of buffer end: {}",
download_offset, file_offset_of_buffer_end);
if (can_start_from_cache)
{
@ -329,6 +372,9 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
}
else
{
LOG_DEBUG(
log,
"Bypassing cache because file segment state is `PARTIALLY_DOWNLOADED_NO_CONTINUATION` and downloaded part already used");
read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE;
return getRemoteFSReadBuffer(file_segment, read_type);
}
@ -337,7 +383,8 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getReadBufferForFileSegment(
}
}
SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getImplementationBuffer(FileSegmentPtr & file_segment)
CachedOnDiskReadBufferFromFile::ImplementationBufferPtr
CachedOnDiskReadBufferFromFile::getImplementationBuffer(FileSegmentPtr & file_segment)
{
assert(!file_segment->isDownloader());
assert(file_offset_of_buffer_end >= file_segment->range().left);
@ -350,7 +397,8 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getImplementationBuffer(File
auto read_buffer_for_file_segment = getReadBufferForFileSegment(file_segment);
watch.stop();
current_file_segment_counters.increment(ProfileEvents::FileSegmentWaitReadBufferMicroseconds, watch.elapsedMicroseconds());
current_file_segment_counters.increment(
ProfileEvents::FileSegmentWaitReadBufferMicroseconds, watch.elapsedMicroseconds());
[[maybe_unused]] auto download_current_segment = read_type == ReadType::REMOTE_FS_READ_AND_PUT_IN_CACHE;
assert(download_current_segment == file_segment->isDownloader());
@ -372,14 +420,13 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getImplementationBuffer(File
case ReadType::CACHED:
{
#ifndef NDEBUG
auto * file_reader = dynamic_cast<ReadBufferFromFileDescriptor *>(read_buffer_for_file_segment.get());
size_t file_size = file_reader->getFileSize();
size_t file_size = getFileSizeFromReadBuffer(*read_buffer_for_file_segment);
if (file_size == 0 || range.left + file_size <= file_offset_of_buffer_end)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Unexpected state of cache file. Cache file size: {}, cache file offset: {}, "
"expected file size to be non-zero and file downloaded size to exceed current file read offset (expected: {} > {})",
"expected file size to be non-zero and file downloaded size to exceed "
"current file read offset (expected: {} > {})",
file_size,
range.left,
range.left + file_size,
@ -416,22 +463,22 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getImplementationBuffer(File
else
{
read_buffer_for_file_segment->seek(file_offset_of_buffer_end, SEEK_SET);
assert(static_cast<size_t>(read_buffer_for_file_segment->getPosition()) == file_offset_of_buffer_end);
assert(static_cast<size_t>(read_buffer_for_file_segment->getFileOffsetOfBufferEnd()) == file_offset_of_buffer_end);
}
auto download_offset = file_segment->getDownloadOffset();
if (download_offset != static_cast<size_t>(read_buffer_for_file_segment->getPosition()))
{
auto impl_range = read_buffer_for_file_segment->getRemainingReadRange();
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Buffer's offsets mismatch; cached buffer offset: {}, download_offset: {}, position: {}, implementation buffer offset: "
"{}, "
"implementation buffer reading until: {}, file segment info: {}",
"Buffer's offsets mismatch; cached buffer offset: {}, download_offset: {}, "
"position: {}, implementation buffer remaining read range: {}, file segment info: {}",
file_offset_of_buffer_end,
download_offset,
read_buffer_for_file_segment->getPosition(),
impl_range.left,
*impl_range.right,
read_buffer_for_file_segment->getRemainingReadRange().toString(),
file_segment->getInfoForLog());
}
@ -442,7 +489,7 @@ SeekableReadBufferPtr CachedReadBufferFromRemoteFS::getImplementationBuffer(File
return read_buffer_for_file_segment;
}
bool CachedReadBufferFromRemoteFS::completeFileSegmentAndGetNext()
bool CachedOnDiskReadBufferFromFile::completeFileSegmentAndGetNext()
{
LOG_TEST(log, "Completed segment: {}", (*current_file_segment_it)->range().toString());
@ -481,7 +528,7 @@ bool CachedReadBufferFromRemoteFS::completeFileSegmentAndGetNext()
return true;
}
CachedReadBufferFromRemoteFS::~CachedReadBufferFromRemoteFS()
CachedOnDiskReadBufferFromFile::~CachedOnDiskReadBufferFromFile()
{
if (enable_logging
&& file_segments_holder
@ -491,12 +538,13 @@ CachedReadBufferFromRemoteFS::~CachedReadBufferFromRemoteFS()
}
}
void CachedReadBufferFromRemoteFS::predownload(FileSegmentPtr & file_segment)
void CachedOnDiskReadBufferFromFile::predownload(FileSegmentPtr & file_segment)
{
Stopwatch predownload_watch(CLOCK_MONOTONIC);
SCOPE_EXIT({
predownload_watch.stop();
current_file_segment_counters.increment(ProfileEvents::FileSegmentPredownloadMicroseconds, predownload_watch.elapsedMicroseconds());
current_file_segment_counters.increment(
ProfileEvents::FileSegmentPredownloadMicroseconds, predownload_watch.elapsedMicroseconds());
});
if (bytes_to_predownload)
@ -532,8 +580,8 @@ void CachedReadBufferFromRemoteFS::predownload(FileSegmentPtr & file_segment)
if (bytes_to_predownload)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Failed to predownload remaining {} bytes. Current file segment: {}, current download offset: {}, expected: {}, "
"eof: {}",
"Failed to predownload remaining {} bytes. Current file segment: {}, "
"current download offset: {}, expected: {}, eof: {}",
bytes_to_predownload,
current_range.toString(),
file_segment->getDownloadOffset(),
@ -551,8 +599,8 @@ void CachedReadBufferFromRemoteFS::predownload(FileSegmentPtr & file_segment)
|| download_offset != file_offset_of_buffer_end)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Buffer's offsets mismatch after predownloading; download offset: {}, cached buffer offset: {}, implementation "
"buffer offset: {}, "
"Buffer's offsets mismatch after predownloading; download offset: {}, "
"cached buffer offset: {}, implementation buffer offset: {}, "
"file segment info: {}",
download_offset,
file_offset_of_buffer_end,
@ -585,8 +633,9 @@ void CachedReadBufferFromRemoteFS::predownload(FileSegmentPtr & file_segment)
}
else
{
LOG_TEST(log, "Bypassing cache because writeCache method failed");
read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE;
file_segment->complete(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
file_segment->completeWithState(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
continue_predownload = false;
}
@ -610,8 +659,9 @@ void CachedReadBufferFromRemoteFS::predownload(FileSegmentPtr & file_segment)
/// TODO: allow seek more than once with seek avoiding.
bytes_to_predownload = 0;
file_segment->complete(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
file_segment->completeWithState(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
LOG_TEST(log, "Bypassing cache because space reservation failed");
read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE;
swap(*implementation_buffer);
@ -626,7 +676,8 @@ void CachedReadBufferFromRemoteFS::predownload(FileSegmentPtr & file_segment)
LOG_TEST(
log,
"Predownload failed because of space limit. Will read from remote filesystem starting from offset: {}",
"Predownload failed because of space limit. "
"Will read from remote filesystem starting from offset: {}",
file_offset_of_buffer_end);
break;
@ -635,7 +686,7 @@ void CachedReadBufferFromRemoteFS::predownload(FileSegmentPtr & file_segment)
}
}
bool CachedReadBufferFromRemoteFS::updateImplementationBufferIfNeeded()
bool CachedOnDiskReadBufferFromFile::updateImplementationBufferIfNeeded()
{
auto & file_segment = *current_file_segment_it;
auto current_read_range = file_segment->range();
@ -661,8 +712,26 @@ bool CachedReadBufferFromRemoteFS::updateImplementationBufferIfNeeded()
/// ^
/// file_offset_of_buffer_end
auto download_offset = file_segment->getDownloadOffset();
if (download_offset == file_offset_of_buffer_end)
size_t download_offset = file_segment->getDownloadOffset();
bool cached_part_is_finished = download_offset == file_offset_of_buffer_end;
#ifndef NDEBUG
size_t cache_file_size = getFileSizeFromReadBuffer(*implementation_buffer);
size_t cache_file_read_offset = implementation_buffer->getFileOffsetOfBufferEnd();
size_t implementation_buffer_finished = cache_file_size == cache_file_read_offset;
if (cached_part_is_finished != implementation_buffer_finished)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Incorrect state of buffers. Current download offset: {}, file offset of buffer end: {}, "
"cache file size: {}, cache file offset: {}, file segment info: {}",
download_offset, file_offset_of_buffer_end, cache_file_size, cache_file_read_offset,
file_segment->getInfoForLog());
}
#endif
if (cached_part_is_finished)
{
/// TODO: does it make sense to reuse the local file reader if we return here with CACHED read type again?
implementation_buffer = getImplementationBuffer(*current_file_segment_it);
@ -670,8 +739,12 @@ bool CachedReadBufferFromRemoteFS::updateImplementationBufferIfNeeded()
return true;
}
else if (download_offset < file_offset_of_buffer_end)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR, "Expected {} >= {} ({})", download_offset, file_offset_of_buffer_end, getInfoForLog());
ErrorCodes::LOGICAL_ERROR,
"Expected {} >= {} ({})",
download_offset, file_offset_of_buffer_end, getInfoForLog());
}
}
if (read_type == ReadType::REMOTE_FS_READ_AND_PUT_IN_CACHE)
@ -695,7 +768,7 @@ bool CachedReadBufferFromRemoteFS::updateImplementationBufferIfNeeded()
return true;
}
bool CachedReadBufferFromRemoteFS::writeCache(char * data, size_t size, size_t offset, FileSegment & file_segment)
bool CachedOnDiskReadBufferFromFile::writeCache(char * data, size_t size, size_t offset, FileSegment & file_segment)
{
Stopwatch watch(CLOCK_MONOTONIC);
@ -723,7 +796,7 @@ bool CachedReadBufferFromRemoteFS::writeCache(char * data, size_t size, size_t o
return true;
}
bool CachedReadBufferFromRemoteFS::nextImpl()
bool CachedOnDiskReadBufferFromFile::nextImpl()
{
try
{
@ -736,12 +809,18 @@ bool CachedReadBufferFromRemoteFS::nextImpl()
}
}
bool CachedReadBufferFromRemoteFS::nextImplStep()
bool CachedOnDiskReadBufferFromFile::nextImplStep()
{
last_caller_id = FileSegment::getCallerId();
assertCorrectness();
if (file_offset_of_buffer_end == read_until_position)
{
LOG_TEST(log, "Read finished on offset {}", file_offset_of_buffer_end);
return false;
}
if (!initialized)
initialize(file_offset_of_buffer_end, getTotalSizeToRead());
@ -795,6 +874,7 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
}
assert(!internal_buffer.empty());
swap(*implementation_buffer);
auto & file_segment = *current_file_segment_it;
@ -802,11 +882,12 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
LOG_TEST(
log,
"Current segment: {}, downloader: {}, current count: {}, position: {}",
"Current segment: {}, downloader: {}, current count: {}, position: {}, read range: {}",
current_read_range.toString(),
file_segment->getDownloader(),
implementation_buffer->count(),
implementation_buffer->getPosition());
implementation_buffer->getPosition(),
implementation_buffer->getRemainingReadRange().toString());
assert(current_read_range.left <= file_offset_of_buffer_end);
assert(current_read_range.right >= file_offset_of_buffer_end);
@ -835,13 +916,23 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
if (!result)
{
#ifndef NDEBUG
if (auto * cache_file_reader = dynamic_cast<ReadBufferFromFileDescriptor *>(implementation_buffer.get()))
if (read_type == ReadType::CACHED)
{
auto cache_file_size = cache_file_reader->getFileSize();
size_t cache_file_size = getFileSizeFromReadBuffer(*implementation_buffer);
if (cache_file_size == 0)
{
throw Exception(
ErrorCodes::LOGICAL_ERROR, "Attempt to read from an empty cache file: {} (just before actual read)", cache_file_size);
ErrorCodes::LOGICAL_ERROR,
"Attempt to read from an empty cache file: {} (just before actual read)",
cache_file_size);
}
}
else
{
assert(file_offset_of_buffer_end == static_cast<size_t>(implementation_buffer->getFileOffsetOfBufferEnd()));
}
assert(!implementation_buffer->hasPendingData());
#endif
Stopwatch watch(CLOCK_MONOTONIC);
@ -854,6 +945,12 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
size = implementation_buffer->buffer().size();
LOG_TEST(
log,
"Read {} bytes, read type {}, position: {}, offset: {}, remaining read range: {}",
size, toString(read_type), implementation_buffer->getPosition(),
implementation_buffer->getFileOffsetOfBufferEnd(), implementation_buffer->getRemainingReadRange().toString());
if (read_type == ReadType::CACHED)
{
ProfileEvents::increment(ProfileEvents::CachedReadBufferReadFromCacheBytes, size);
@ -888,12 +985,13 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
else
{
assert(file_segment->state() == FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
LOG_TEST(log, "Bypassing cache because writeCache method failed");
}
}
else
{
LOG_DEBUG(log, "No space left in cache, will continue without cache download");
file_segment->complete(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
file_segment->completeWithState(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION);
}
if (!success)
@ -903,20 +1001,22 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
}
}
/// - If last file segment was read from remote fs, then we read up to segment->range().right, but
/// the requested right boundary could be segment->range().left < requested_right_boundary < segment->range().right.
/// - If last file segment was read from remote fs, then we read up to segment->range().right,
/// but the requested right boundary could be
/// segment->range().left < requested_right_boundary < segment->range().right.
/// Therefore need to resize to a smaller size. And resize must be done after write into cache.
/// - If last file segment was read from local fs, then we could read more than file_segemnt->range().right, so resize is also needed.
/// - If last file segment was read from local fs, then we could read more than
/// file_segment->range().right, so resize is also needed.
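/// Illustrative example: if the last segment covers [100, 199], read_until_position == 150 and
/// file_offset_of_buffer_end == 140, then remaining_size_to_read == min(199, 149) - 140 + 1 == 10,
/// so the working buffer is trimmed to 10 bytes past nextimpl_working_buffer_offset.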
if (std::next(current_file_segment_it) == file_segments_holder->file_segments.end())
{
size_t remaining_size_to_read = std::min(current_read_range.right, read_until_position - 1) - file_offset_of_buffer_end + 1;
size_t remaining_size_to_read
= std::min(current_read_range.right, read_until_position - 1) - file_offset_of_buffer_end + 1;
size = std::min(size, remaining_size_to_read);
assert(implementation_buffer->buffer().size() >= nextimpl_working_buffer_offset + size);
implementation_buffer->buffer().resize(nextimpl_working_buffer_offset + size);
}
file_offset_of_buffer_end += size;
}
swap(*implementation_buffer);
@ -931,8 +1031,9 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
LOG_TEST(
log,
"Key: {}. Returning with {} bytes, buffer position: {} (offset: {}, predownloaded: {}), "
"buffer available: {}, current range: {}, current offset: {}, file segment state: {}, download offset: {}, read_type: {}, "
"reading until position: {}, started with offset: {}, remaining ranges: {}",
"buffer available: {}, current range: {}, current offset: {}, file segment state: {}, "
"download offset: {}, read_type: {}, reading until position: {}, started with offset: {}, "
"remaining ranges: {}",
getHexUIntLowercase(cache_key),
working_buffer.size(),
getPosition(),
@ -950,39 +1051,114 @@ bool CachedReadBufferFromRemoteFS::nextImplStep()
if (size == 0 && file_offset_of_buffer_end < read_until_position)
{
std::optional<size_t> cache_file_size;
if (auto * cache_file_reader = dynamic_cast<ReadBufferFromFileDescriptor *>(implementation_buffer.get()))
cache_file_size = cache_file_reader->getFileSize();
size_t cache_file_size = getFileSizeFromReadBuffer(*implementation_buffer);
auto cache_file_path = getFileNameFromReadBuffer(*implementation_buffer);
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Having zero bytes, but range is not finished: file offset: {}, reading until: {}, read type: {}, cache file size: {}",
"Having zero bytes, but range is not finished: file offset: {}, starting offset: {}, "
"reading until: {}, read type: {}, cache file size: {}, cache file path: {}, "
"cache file offset: {}, current file segment: {}",
file_offset_of_buffer_end,
first_offset,
read_until_position,
toString(read_type),
cache_file_size ? std::to_string(*cache_file_size) : "None");
cache_file_size ? std::to_string(cache_file_size) : "None",
cache_file_path,
implementation_buffer->getFileOffsetOfBufferEnd(),
file_segment->getInfoForLog());
}
return result;
}
off_t CachedReadBufferFromRemoteFS::seek(off_t offset, int whence)
off_t CachedOnDiskReadBufferFromFile::seek(off_t offset, int whence)
{
if (initialized)
throw Exception(ErrorCodes::CANNOT_SEEK_THROUGH_FILE, "Seek is allowed only before first read attempt from the buffer");
if (initialized && !allow_seeks_after_first_read)
{
throw Exception(
ErrorCodes::CANNOT_SEEK_THROUGH_FILE,
"Seek is allowed only before first read attempt from the buffer");
}
if (whence != SEEK_SET)
size_t new_pos = offset;
if (allow_seeks_after_first_read)
{
if (whence != SEEK_SET && whence != SEEK_CUR)
{
throw Exception("Expected SEEK_SET or SEEK_CUR as whence", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
if (whence == SEEK_CUR)
{
new_pos = file_offset_of_buffer_end - (working_buffer.end() - pos) + offset;
}
if (new_pos + (working_buffer.end() - pos) == file_offset_of_buffer_end)
return new_pos;
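/// Illustrative example: with file_offset_of_buffer_end == 100 and working_buffer.size() == 40,
/// any new_pos in [60, 100] can be served from the current working buffer by only moving `pos`;
/// e.g. new_pos == 70 gives pos == working_buffer.end() - 30.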
if (file_offset_of_buffer_end - working_buffer.size() <= new_pos && new_pos <= file_offset_of_buffer_end)
{
pos = working_buffer.end() - file_offset_of_buffer_end + new_pos;
assert(pos >= working_buffer.begin());
assert(pos <= working_buffer.end());
return new_pos;
}
}
else if (whence != SEEK_SET)
{
throw Exception(ErrorCodes::CANNOT_SEEK_THROUGH_FILE, "Only SEEK_SET allowed");
}
first_offset = offset;
file_offset_of_buffer_end = offset;
size_t size = getTotalSizeToRead();
initialize(offset, size);
first_offset = file_offset_of_buffer_end = new_pos;
resetWorkingBuffer();
return offset;
// if (file_segments_holder && current_file_segment_it != file_segments_holder->file_segments.end())
// {
// auto & file_segments = file_segments_holder->file_segments;
// LOG_TRACE(
// log,
// "Having {} file segments to read: {}, current offset: {}",
// file_segments_holder->file_segments.size(), file_segments_holder->toString(), file_offset_of_buffer_end);
// auto it = std::upper_bound(
// file_segments.begin(),
// file_segments.end(),
// new_pos,
// [](size_t pos, const FileSegmentPtr & file_segment) { return pos < file_segment->range().right; });
// if (it != file_segments.end())
// {
// if (it != file_segments.begin() && (*std::prev(it))->range().right == new_pos)
// current_file_segment_it = std::prev(it);
// else
// current_file_segment_it = it;
// [[maybe_unused]] const auto & file_segment = *current_file_segment_it;
// assert(file_offset_of_buffer_end <= file_segment->range().right);
// assert(file_offset_of_buffer_end >= file_segment->range().left);
// resetWorkingBuffer();
// swap(*implementation_buffer);
// implementation_buffer->seek(file_offset_of_buffer_end, SEEK_SET);
// swap(*implementation_buffer);
// LOG_TRACE(log, "Found suitable file segment: {}", file_segment->range().toString());
// LOG_TRACE(log, "seek2 Internal buffer size: {}", internal_buffer.size());
// return new_pos;
// }
// }
file_segments_holder.reset();
implementation_buffer.reset();
initialized = false;
return new_pos;
}
size_t CachedReadBufferFromRemoteFS::getTotalSizeToRead()
size_t CachedOnDiskReadBufferFromFile::getTotalSizeToRead()
{
/// Last position should be guaranteed to be set, as at least we always know file size.
if (!read_until_position)
@ -991,22 +1167,34 @@ size_t CachedReadBufferFromRemoteFS::getTotalSizeToRead()
/// On this level should be guaranteed that read size is non-zero.
if (file_offset_of_buffer_end >= read_until_position)
throw Exception(
ErrorCodes::LOGICAL_ERROR, "Read boundaries mismatch. Expected {} < {}", file_offset_of_buffer_end, read_until_position);
ErrorCodes::LOGICAL_ERROR,
"Read boundaries mismatch. Expected {} < {}",
file_offset_of_buffer_end, read_until_position);
return read_until_position - file_offset_of_buffer_end;
}
void CachedReadBufferFromRemoteFS::setReadUntilPosition(size_t)
void CachedOnDiskReadBufferFromFile::setReadUntilPosition(size_t position)
{
throw Exception(ErrorCodes::LOGICAL_ERROR, "Method `setReadUntilPosition()` not allowed");
if (!allow_seeks_after_first_read)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Method `setReadUntilPosition()` not allowed");
read_until_position = position;
initialized = false;
implementation_buffer.reset();
}
off_t CachedReadBufferFromRemoteFS::getPosition()
void CachedOnDiskReadBufferFromFile::setReadUntilEnd()
{
setReadUntilPosition(getFileSize());
}
off_t CachedOnDiskReadBufferFromFile::getPosition()
{
return file_offset_of_buffer_end - available();
}
std::optional<size_t> CachedReadBufferFromRemoteFS::getLastNonDownloadedOffset() const
std::optional<size_t> CachedOnDiskReadBufferFromFile::getLastNonDownloadedOffset() const
{
if (!file_segments_holder)
throw Exception(ErrorCodes::LOGICAL_ERROR, "File segments holder not initialized");
@ -1022,31 +1210,32 @@ std::optional<size_t> CachedReadBufferFromRemoteFS::getLastNonDownloadedOffset()
return std::nullopt;
}
void CachedReadBufferFromRemoteFS::assertCorrectness() const
void CachedOnDiskReadBufferFromFile::assertCorrectness() const
{
if (FileCache::isReadOnly() && !settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache)
if (FileCache::isReadOnly()
&& !settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cache usage is not allowed");
}
String CachedReadBufferFromRemoteFS::getInfoForLog()
String CachedOnDiskReadBufferFromFile::getInfoForLog()
{
String implementation_buffer_read_range_str;
if (implementation_buffer)
{
auto read_range = implementation_buffer->getRemainingReadRange();
implementation_buffer_read_range_str
= std::to_string(read_range.left) + '-' + (read_range.right ? std::to_string(*read_range.right) : "None");
}
implementation_buffer_read_range_str = implementation_buffer->getRemainingReadRange().toString();
else
implementation_buffer_read_range_str = "None";
auto current_file_segment_info
= current_file_segment_it == file_segments_holder->file_segments.end() ? "None" : (*current_file_segment_it)->getInfoForLog();
String current_file_segment_info;
if (current_file_segment_it != file_segments_holder->file_segments.end())
current_file_segment_info = (*current_file_segment_it)->getInfoForLog();
else
current_file_segment_info = "None";
return fmt::format(
"Buffer path: {}, hash key: {}, file_offset_of_buffer_end: {}, internal buffer remaining read range: {}, "
"Buffer path: {}, hash key: {}, file_offset_of_buffer_end: {}, "
"internal buffer remaining read range: {}, "
"read_type: {}, last caller: {}, file segment info: {}",
remote_fs_object_path,
source_file_path,
getHexUIntLowercase(cache_key),
file_offset_of_buffer_end,
implementation_buffer_read_range_str,

View File

@ -1,10 +1,11 @@
#pragma once
#include <Common/FileCache.h>
#include <Common/logger_useful.h>
#include <IO/SeekableReadBuffer.h>
#include <IO/WriteBufferFromFile.h>
#include <IO/ReadSettings.h>
#include <Common/logger_useful.h>
#include <IO/ReadBufferFromFileBase.h>
#include <Interpreters/FilesystemCacheLog.h>
#include <Common/FileSegment.h>
@ -17,20 +18,25 @@ extern const Metric FilesystemCacheReadBuffers;
namespace DB
{
class CachedReadBufferFromRemoteFS : public SeekableReadBuffer
class CachedOnDiskReadBufferFromFile : public ReadBufferFromFileBase
{
public:
using RemoteFSFileReaderCreator = std::function<FileSegment::RemoteFileReaderPtr()>;
using ImplementationBufferPtr = std::shared_ptr<ReadBufferFromFileBase>;
using ImplementationBufferCreator = std::function<ImplementationBufferPtr()>;
CachedReadBufferFromRemoteFS(
const String & remote_fs_object_path_,
CachedOnDiskReadBufferFromFile(
const String & source_file_path_,
const FileCache::Key & cache_key_,
FileCachePtr cache_,
RemoteFSFileReaderCreator remote_file_reader_creator_,
ImplementationBufferCreator implementation_buffer_creator_,
const ReadSettings & settings_,
const String & query_id_,
size_t read_until_position_);
size_t file_size_,
bool allow_seeks_after_first_read_,
bool use_external_buffer_,
std::optional<size_t> read_until_position_ = std::nullopt);
~CachedReadBufferFromRemoteFS() override;
~CachedOnDiskReadBufferFromFile() override;
bool nextImpl() override;
@ -44,6 +50,10 @@ public:
void setReadUntilPosition(size_t position) override;
void setReadUntilEnd() override;
String getFileName() const override { return source_file_path; }
enum class ReadType
{
CACHED,
@ -54,11 +64,11 @@ public:
private:
void initialize(size_t offset, size_t size);
SeekableReadBufferPtr getImplementationBuffer(FileSegmentPtr & file_segment);
ImplementationBufferPtr getImplementationBuffer(FileSegmentPtr & file_segment);
SeekableReadBufferPtr getReadBufferForFileSegment(FileSegmentPtr & file_segment);
ImplementationBufferPtr getReadBufferForFileSegment(FileSegmentPtr & file_segment);
SeekableReadBufferPtr getCacheReadBuffer(size_t offset) const;
ImplementationBufferPtr getCacheReadBuffer(size_t offset) const;
std::optional<size_t> getLastNonDownloadedOffset() const;
@ -70,7 +80,7 @@ private:
void assertCorrectness() const;
SeekableReadBufferPtr getRemoteFSReadBuffer(FileSegmentPtr & file_segment, ReadType read_type_);
std::shared_ptr<ReadBufferFromFileBase> getRemoteFSReadBuffer(FileSegmentPtr & file_segment, ReadType read_type_);
size_t getTotalSizeToRead();
@ -82,7 +92,8 @@ private:
Poco::Logger * log;
FileCache::Key cache_key;
String remote_fs_object_path;
String source_file_path;
FileCachePtr cache;
ReadSettings settings;
@ -90,7 +101,7 @@ private:
size_t file_offset_of_buffer_end = 0;
size_t bytes_to_predownload = 0;
RemoteFSFileReaderCreator remote_file_reader_creator;
ImplementationBufferCreator implementation_buffer_creator;
/// Remote read buffer, which can only be owned by current buffer.
FileSegment::RemoteFileReaderPtr remote_file_reader;
@ -98,7 +109,7 @@ private:
std::optional<FileSegmentsHolder> file_segments_holder;
FileSegments::iterator current_file_segment_it;
SeekableReadBufferPtr implementation_buffer;
ImplementationBufferPtr implementation_buffer;
bool initialized = false;
ReadType read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE;
@ -125,6 +136,8 @@ private:
bool enable_logging = false;
String current_buffer_id;
bool allow_seeks_after_first_read;
[[maybe_unused]] bool use_external_buffer;
CurrentMetrics::Increment metric_increment{CurrentMetrics::FilesystemCacheReadBuffers};
ProfileEvents::Counters current_file_segment_counters;

View File

@ -0,0 +1,180 @@
#include "CachedOnDiskWriteBufferFromFile.h"
#include <Common/FileCacheFactory.h>
#include <Common/FileSegment.h>
#include <Common/logger_useful.h>
#include <Interpreters/FilesystemCacheLog.h>
#include <Interpreters/Context.h>
namespace ProfileEvents
{
extern const Event CachedWriteBufferCacheWriteBytes;
extern const Event CachedWriteBufferCacheWriteMicroseconds;
}
namespace DB
{
namespace
{
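/// RAII helper: swaps the internal buffers of the two WriteBuffers on construction and swaps
/// them back on destruction, so the wrapped buffer temporarily operates on this buffer's memory.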
class SwapHelper
{
public:
SwapHelper(WriteBuffer & b1_, WriteBuffer & b2_) : b1(b1_), b2(b2_) { b1.swap(b2); }
~SwapHelper() { b1.swap(b2); }
private:
WriteBuffer & b1;
WriteBuffer & b2;
};
}
CachedOnDiskWriteBufferFromFile::CachedOnDiskWriteBufferFromFile(
std::unique_ptr<WriteBuffer> impl_,
FileCachePtr cache_,
const String & source_path_,
const FileCache::Key & key_,
bool is_persistent_cache_file_,
const String & query_id_,
const WriteSettings & settings_)
: WriteBufferFromFileDecorator(std::move(impl_))
, log(&Poco::Logger::get("CachedOnDiskWriteBufferFromFile"))
, cache(cache_)
, source_path(source_path_)
, key(key_)
, is_persistent_cache_file(is_persistent_cache_file_)
, query_id(query_id_)
, enable_cache_log(!query_id_.empty() && settings_.enable_filesystem_cache_log)
, cache_log(Context::getGlobalContextInstance()->getFilesystemCacheLog())
{
}
void CachedOnDiskWriteBufferFromFile::nextImpl()
{
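/// offset() is the number of bytes written into the working buffer since the previous flush;
/// the same bytes are first flushed through impl and then mirrored into the cache below.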
size_t size = offset();
try
{
SwapHelper swap(*this, *impl);
/// Write data to the underlying buffer.
impl->next();
}
catch (...)
{
/// If something was already written to cache, remove it.
cache_writer.reset();
cache->removeIfExists(key);
throw;
}
/// Write data to cache.
cacheData(working_buffer.begin(), size);
current_download_offset += size;
}
void CachedOnDiskWriteBufferFromFile::cacheData(char * data, size_t size)
{
if (stop_caching)
return;
if (!cache_writer)
{
cache_writer = std::make_unique<FileSegmentRangeWriter>(
cache.get(), key, [this](const FileSegment & file_segment) { appendFilesystemCacheLog(file_segment); });
}
Stopwatch watch(CLOCK_MONOTONIC);
try
{
if (!cache_writer->write(data, size, current_download_offset, is_persistent_cache_file))
{
LOG_INFO(log, "Write-through cache is stopped as cache limit is reached and nothing can be evicted");
/// No space left, disable caching.
stop_caching = true;
return;
}
}
catch (ErrnoException & e)
{
int code = e.getErrno();
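/// On Linux: 28 == ENOSPC, 122 == EDQUOT.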
if (code == /* No space left on device */28 || code == /* Quota exceeded */122)
{
LOG_INFO(log, "Insert into cache is skipped due to insufficient disk space. ({})", e.displayText());
return;
}
tryLogCurrentException(__PRETTY_FUNCTION__);
return;
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
return;
}
ProfileEvents::increment(ProfileEvents::CachedWriteBufferCacheWriteBytes, size);
ProfileEvents::increment(ProfileEvents::CachedWriteBufferCacheWriteMicroseconds, watch.elapsedMicroseconds());
}
void CachedOnDiskWriteBufferFromFile::appendFilesystemCacheLog(const FileSegment & file_segment)
{
if (cache_log)
{
auto file_segment_range = file_segment.range();
FilesystemCacheLogElement elem
{
.event_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()),
.query_id = query_id,
.source_file_path = source_path,
.file_segment_range = { file_segment_range.left, file_segment_range.right },
.requested_range = {},
.cache_type = FilesystemCacheLogElement::CacheType::WRITE_THROUGH_CACHE,
.file_segment_size = file_segment_range.size(),
.cache_attempted = false,
.read_buffer_id = {},
.profile_counters = std::make_shared<ProfileEvents::Counters::Snapshot>(current_file_segment_counters.getPartiallyAtomicSnapshot()),
};
current_file_segment_counters.reset();
cache_log->add(elem);
}
}
void CachedOnDiskWriteBufferFromFile::finalizeImpl()
{
try
{
SwapHelper swap(*this, *impl);
impl->finalize();
}
catch (...)
{
if (cache_writer)
{
try
{
cache_writer->finalize();
cache_writer.reset();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
throw;
}
if (cache_writer)
{
cache_writer->finalize();
cache_writer.reset();
}
}
}

View File

@ -0,0 +1,60 @@
#pragma once
#include <IO/WriteBufferFromFileDecorator.h>
#include <IO/WriteSettings.h>
#include <Common/FileCache.h>
#include <Interpreters/FilesystemCacheLog.h>
namespace Poco
{
class Logger;
}
namespace DB
{
/**
* Write buffer for filesystem caching on write operations.
*/
class FileSegmentRangeWriter;
class CachedOnDiskWriteBufferFromFile final : public WriteBufferFromFileDecorator
{
public:
CachedOnDiskWriteBufferFromFile(
std::unique_ptr<WriteBuffer> impl_,
FileCachePtr cache_,
const String & source_path_,
const FileCache::Key & key_,
bool is_persistent_cache_file_,
const String & query_id_,
const WriteSettings & settings_);
void nextImpl() override;
void finalizeImpl() override;
private:
void cacheData(char * data, size_t size);
void appendFilesystemCacheLog(const FileSegment & file_segment);
Poco::Logger * log;
FileCachePtr cache;
String source_path;
FileCache::Key key;
bool is_persistent_cache_file;
size_t current_download_offset = 0;
const String query_id;
bool enable_cache_log;
std::shared_ptr<FilesystemCacheLog> cache_log;
bool stop_caching = false;
ProfileEvents::Counters current_file_segment_counters;
std::unique_ptr<FileSegmentRangeWriter> cache_writer;
};
}

View File

@ -47,6 +47,19 @@ ReadBufferFromAzureBlobStorage::ReadBufferFromAzureBlobStorage(
}
}
SeekableReadBuffer::Range ReadBufferFromAzureBlobStorage::getRemainingReadRange() const
{
return Range{
.left = static_cast<size_t>(offset),
.right = read_until_position ? std::optional{read_until_position - 1} : std::nullopt
};
}
void ReadBufferFromAzureBlobStorage::setReadUntilPosition(size_t position)
{
read_until_position = position;
initialized = false;
}
bool ReadBufferFromAzureBlobStorage::nextImpl()
{

View File

@ -36,6 +36,12 @@ public:
String getFileName() const override { return path; }
void setReadUntilPosition(size_t position) override;
Range getRemainingReadRange() const override;
bool supportsRightBoundedReads() const override { return true; }
private:
void initialize();

View File

@ -2,7 +2,7 @@
#include <IO/SeekableReadBuffer.h>
#include <Disks/IO/CachedReadBufferFromRemoteFS.h>
#include <Disks/IO/CachedOnDiskReadBufferFromFile.h>
#include <Common/logger_useful.h>
#include <iostream>
#include <Common/hex.h>
@ -54,13 +54,18 @@ SeekableReadBufferPtr ReadBufferFromRemoteFSGather::createImplementationBuffer(c
if (with_cache)
{
return std::make_shared<CachedReadBufferFromRemoteFS>(
auto cache_key = settings.remote_fs_cache->hash(path);
return std::make_shared<CachedOnDiskReadBufferFromFile>(
path,
cache_key,
settings.remote_fs_cache,
current_read_buffer_creator,
std::move(current_read_buffer_creator),
settings,
query_id,
current_read_until_position);
file_size,
/* allow_seeks */false,
/* use_external_buffer */true,
read_until_position ? std::optional<size_t>(read_until_position) : std::nullopt);
}
return current_read_buffer_creator();
@ -74,7 +79,7 @@ void ReadBufferFromRemoteFSGather::appendFilesystemCacheLog()
.query_id = query_id,
.source_file_path = current_file_path,
.file_segment_range = { 0, current_file_size },
.read_type = FilesystemCacheLogElement::ReadType::READ_FROM_FS_BYPASSING_CACHE,
.cache_type = FilesystemCacheLogElement::CacheType::READ_FROM_FS_BYPASSING_CACHE,
.file_segment_size = total_bytes_read_from_current_file,
.cache_attempted = false,
};

View File

@ -82,6 +82,22 @@ std::unique_ptr<ReadBuffer> ReadBufferFromWebServer::initialize()
}
void ReadBufferFromWebServer::setReadUntilPosition(size_t position)
{
read_until_position = position;
impl.reset();
}
SeekableReadBuffer::Range ReadBufferFromWebServer::getRemainingReadRange() const
{
return Range{
.left = static_cast<size_t>(offset),
.right = read_until_position ? std::optional{read_until_position - 1} : std::nullopt
};
}
bool ReadBufferFromWebServer::nextImpl()
{
if (read_until_position)

View File

@ -31,9 +31,15 @@ public:
off_t getPosition() override;
String getFileName() const override { return url; }
void setReadUntilPosition(size_t position) override;
size_t getFileOffsetOfBufferEnd() const override { return offset; }
String getFileName() const override { return url; }
Range getRemainingReadRange() const override;
bool supportsRightBoundedReads() const override { return true; }
private:
std::unique_ptr<ReadBuffer> initialize();

View File

@ -30,6 +30,8 @@ public:
void setReadUntilEnd() override;
bool isIntegratedWithFilesystemCache() const override { return true; }
size_t getFileSize() override;
private:

View File

@ -9,6 +9,7 @@
#include <Disks/IO/ReadBufferFromRemoteFSGather.h>
#include <Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageAuth.h>
#include <Common/logger_useful.h>
namespace DB
@ -29,6 +30,7 @@ AzureObjectStorage::AzureObjectStorage(
: name(name_)
, client(std::move(client_))
, settings(std::move(settings_))
, log(&Poco::Logger::get("AzureObjectStorage"))
{
}
@ -123,6 +125,8 @@ std::unique_ptr<WriteBufferFromFileBase> AzureObjectStorage::writeObject( /// NO
if (mode != WriteMode::Rewrite)
throw Exception("Azure storage doesn't support append", ErrorCodes::UNSUPPORTED_METHOD);
LOG_TEST(log, "Writing file: {}", object.absolute_path);
auto buffer = std::make_unique<WriteBufferFromAzureBlobStorage>(
client.get(),
object.absolute_path,
@ -151,6 +155,7 @@ void AzureObjectStorage::listPrefix(const std::string & path, RelativePathsWithS
void AzureObjectStorage::removeObject(const StoredObject & object)
{
const auto & path = object.absolute_path;
LOG_TEST(log, "Removing single object: {}", path);
auto client_ptr = client.get();
auto delete_info = client_ptr->DeleteBlob(path);
if (!delete_info.Value.Deleted)
@ -162,6 +167,7 @@ void AzureObjectStorage::removeObjects(const StoredObjects & objects)
auto client_ptr = client.get();
for (const auto & object : objects)
{
LOG_TEST(log, "Removing object: {} (total: {})", object.absolute_path, objects.size());
auto delete_info = client_ptr->DeleteBlob(object.absolute_path);
if (!delete_info.Value.Deleted)
throw Exception(ErrorCodes::AZURE_BLOB_STORAGE_ERROR, "Failed to delete file in AzureBlob Storage: {}", object.absolute_path);

View File

@ -16,6 +16,11 @@
#include <azure/storage/blobs.hpp>
#endif
namespace Poco
{
class Logger;
}
namespace DB
{
@ -122,6 +127,8 @@ private:
/// client used to access the files in the Blob Storage cloud
MultiVersion<Azure::Storage::Blobs::BlobContainerClient> client;
MultiVersion<AzureObjectStorageSettings> settings;
Poco::Logger * log;
};
}

View File

@ -105,18 +105,6 @@ void registerDiskAzureBlobStorage(DiskFactory & factory)
azure_blob_storage_disk->startup(context);
#ifdef NDEBUG
bool use_cache = true;
#else
/// Current cache implementation leads to allocations in the destructor of
/// the read buffer.
bool use_cache = false;
#endif
if (config.getBool(config_prefix + ".cache_enabled", use_cache))
{
String cache_path = config.getString(config_prefix + ".cache_path", context->getPath() + "disks/" + name + "/cache/");
azure_blob_storage_disk = wrapWithCache(azure_blob_storage_disk, "azure-blob-storage-cache", cache_path, metadata_path);
}
return std::make_shared<DiskRestartProxy>(azure_blob_storage_disk);
};

View File

@ -0,0 +1,304 @@
#include "CachedObjectStorage.h"
#include <Disks/ObjectStorages/DiskObjectStorageCommon.h>
#include <IO/BoundedReadBuffer.h>
#include <Disks/IO/CachedOnDiskWriteBufferFromFile.h>
#include <Disks/IO/CachedOnDiskReadBufferFromFile.h>
#include <Common/FileCache.h>
#include <Common/FileCacheFactory.h>
#include <Common/CurrentThread.h>
#include <Common/logger_useful.h>
#include <filesystem>
namespace fs = std::filesystem;
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
CachedObjectStorage::CachedObjectStorage(
ObjectStoragePtr object_storage_,
FileCachePtr cache_,
const FileCacheSettings & cache_settings_,
const std::string & cache_config_name_)
: object_storage(object_storage_)
, cache(cache_)
, cache_settings(cache_settings_)
, cache_config_name(cache_config_name_)
, log(&Poco::Logger::get(getName()))
{
cache->initialize();
}
FileCache::Key CachedObjectStorage::getCacheKey(const std::string & path) const
{
return cache->hash(path);
}
String CachedObjectStorage::getCachePath(const std::string & path) const
{
FileCache::Key cache_key = getCacheKey(path);
return cache->getPathInLocalCache(cache_key);
}
std::string CachedObjectStorage::generateBlobNameForPath(const std::string & path)
{
return object_storage->generateBlobNameForPath(path);
}
ReadSettings CachedObjectStorage::patchSettings(const ReadSettings & read_settings) const
{
ReadSettings modified_settings{read_settings};
modified_settings.remote_fs_cache = cache;
if (FileCache::isReadOnly())
modified_settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache = true;
return IObjectStorage::patchSettings(modified_settings);
}
void CachedObjectStorage::startup()
{
object_storage->startup();
}
bool CachedObjectStorage::exists(const StoredObject & object) const
{
return object_storage->exists(object);
}
std::unique_ptr<ReadBufferFromFileBase> CachedObjectStorage::readObjects( /// NOLINT
const StoredObjects & objects,
const ReadSettings & read_settings,
std::optional<size_t> read_hint,
std::optional<size_t> file_size) const
{
if (objects.empty())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Received empty list of objects to read");
assert(!objects[0].getPathKeyForCache().empty());
/// Add cache-related settings to ReadSettings.
auto modified_read_settings = patchSettings(read_settings);
auto implementation_buffer = object_storage->readObjects(objects, modified_read_settings, read_hint, file_size);
/// If the underlying read buffer does caching on its own, do not wrap it in a caching buffer.
if (implementation_buffer->isIntegratedWithFilesystemCache()
&& modified_read_settings.enable_filesystem_cache_on_lower_level)
{
return implementation_buffer;
}
else
{
if (!file_size)
file_size = implementation_buffer->getFileSize();
auto implementation_buffer_creator = [objects, modified_read_settings, read_hint, file_size, this]()
{
return std::make_unique<BoundedReadBuffer>(
object_storage->readObjects(objects, modified_read_settings, read_hint, file_size));
};
/// TODO: A test is needed for the case of non-s3 storage and *Log family engines.
std::string path = objects[0].absolute_path;
FileCache::Key key = getCacheKey(objects[0].getPathKeyForCache());
return std::make_unique<CachedOnDiskReadBufferFromFile>(
path,
key,
cache,
implementation_buffer_creator,
modified_read_settings,
CurrentThread::isInitialized() && CurrentThread::get().getQueryContext() ? std::string(CurrentThread::getQueryId()) : "",
file_size.value(),
/* allow_seeks */true,
/* use_external_buffer */false);
}
}
std::unique_ptr<ReadBufferFromFileBase> CachedObjectStorage::readObject( /// NOLINT
const StoredObject & object,
const ReadSettings & read_settings,
std::optional<size_t> read_hint,
std::optional<size_t> file_size) const
{
/// Add cache-related settings to ReadSettings.
auto modified_read_settings = patchSettings(read_settings);
auto implementation_buffer = object_storage->readObject(object, read_settings, read_hint, file_size);
/// If the underlying read buffer does caching on its own, do not wrap it in a caching buffer.
if (implementation_buffer->isIntegratedWithFilesystemCache()
&& modified_read_settings.enable_filesystem_cache_on_lower_level)
{
return implementation_buffer;
}
else
{
if (!file_size)
file_size = implementation_buffer->getFileSize();
auto implementation_buffer_creator = [object, read_settings, read_hint, file_size, this]()
{
return std::make_unique<BoundedReadBuffer>(object_storage->readObject(object, read_settings, read_hint, file_size));
};
FileCache::Key key = getCacheKey(object.getPathKeyForCache());
LOG_TEST(log, "Reading from file `{}` with cache key `{}`", object.absolute_path, key.toString());
return std::make_unique<CachedOnDiskReadBufferFromFile>(
object.absolute_path,
key,
cache,
implementation_buffer_creator,
read_settings,
CurrentThread::isInitialized() && CurrentThread::get().getQueryContext() ? std::string(CurrentThread::getQueryId()) : "",
file_size.value(),
/* allow_seeks */true,
/* use_external_buffer */false);
}
}
std::unique_ptr<WriteBufferFromFileBase> CachedObjectStorage::writeObject( /// NOLINT
const StoredObject & object,
WriteMode mode, // Cached doesn't support append, only rewrite
std::optional<ObjectAttributes> attributes,
FinalizeCallback && finalize_callback,
size_t buf_size,
const WriteSettings & write_settings)
{
/// Add cache-related settings to WriteSettings.
auto modified_write_settings = IObjectStorage::patchSettings(write_settings);
auto implementation_buffer = object_storage->writeObject(object, mode, attributes, std::move(finalize_callback), buf_size, modified_write_settings);
bool cache_on_write = modified_write_settings.enable_filesystem_cache_on_write_operations
&& FileCacheFactory::instance().getSettings(cache->getBasePath()).cache_on_write_operations
&& fs::path(object.absolute_path).extension() != ".tmp";
auto path_key_for_cache = object.getPathKeyForCache();
/// Need to remove even if cache_on_write == false.
removeCacheIfExists(path_key_for_cache);
if (cache_on_write)
{
auto key = getCacheKey(path_key_for_cache);
LOG_TEST(log, "Caching file `{}` to `{}` with key {}", object.absolute_path, getCachePath(path_key_for_cache), key.toString());
return std::make_unique<CachedOnDiskWriteBufferFromFile>(
std::move(implementation_buffer),
cache,
implementation_buffer->getFileName(),
key,
modified_write_settings.is_file_cache_persistent,
CurrentThread::isInitialized() && CurrentThread::get().getQueryContext() ? std::string(CurrentThread::getQueryId()) : "",
modified_write_settings);
}
return implementation_buffer;
}
void CachedObjectStorage::removeCacheIfExists(const std::string & path_key_for_cache)
{
if (path_key_for_cache.empty())
return;
/// Add try catch?
cache->removeIfExists(getCacheKey(path_key_for_cache));
}
void CachedObjectStorage::removeObject(const StoredObject & object)
{
removeCacheIfExists(object.getPathKeyForCache());
object_storage->removeObject(object);
}
void CachedObjectStorage::removeObjects(const StoredObjects & objects)
{
for (const auto & object : objects)
removeCacheIfExists(object.getPathKeyForCache());
object_storage->removeObjects(objects);
}
void CachedObjectStorage::removeObjectIfExists(const StoredObject & object)
{
removeCacheIfExists(object.getPathKeyForCache());
object_storage->removeObjectIfExists(object);
}
void CachedObjectStorage::removeObjectsIfExist(const StoredObjects & objects)
{
for (const auto & object : objects)
removeCacheIfExists(object.getPathKeyForCache());
object_storage->removeObjectsIfExist(objects);
}
ReadSettings CachedObjectStorage::getAdjustedSettingsFromMetadataFile(const ReadSettings & settings, const std::string & path) const
{
ReadSettings new_settings{settings};
new_settings.is_file_cache_persistent = isFileWithPersistentCache(path) && cache_settings.do_not_evict_index_and_mark_files;
return new_settings;
}
WriteSettings CachedObjectStorage::getAdjustedSettingsFromMetadataFile(const WriteSettings & settings, const std::string & path) const
{
WriteSettings new_settings{settings};
new_settings.is_file_cache_persistent = isFileWithPersistentCache(path) && cache_settings.do_not_evict_index_and_mark_files;
return new_settings;
}
void CachedObjectStorage::copyObjectToAnotherObjectStorage( // NOLINT
const StoredObject & object_from,
const StoredObject & object_to,
IObjectStorage & object_storage_to,
std::optional<ObjectAttributes> object_to_attributes)
{
object_storage->copyObjectToAnotherObjectStorage(object_from, object_to, object_storage_to, object_to_attributes);
}
void CachedObjectStorage::copyObject( // NOLINT
const StoredObject & object_from, const StoredObject & object_to, std::optional<ObjectAttributes> object_to_attributes)
{
object_storage->copyObject(object_from, object_to, object_to_attributes);
}
std::unique_ptr<IObjectStorage> CachedObjectStorage::cloneObjectStorage(
const std::string & new_namespace,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
ContextPtr context)
{
return object_storage->cloneObjectStorage(new_namespace, config, config_prefix, context);
}
void CachedObjectStorage::listPrefix(const std::string & path, RelativePathsWithSize & children) const
{
object_storage->listPrefix(path, children);
}
ObjectMetadata CachedObjectStorage::getObjectMetadata(const std::string & path) const
{
return object_storage->getObjectMetadata(path);
}
void CachedObjectStorage::shutdown()
{
object_storage->shutdown();
}
void CachedObjectStorage::applyNewSettings(
const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, ContextPtr context)
{
object_storage->applyNewSettings(config, config_prefix, context);
}
String CachedObjectStorage::getObjectsNamespace() const
{
return object_storage->getObjectsNamespace();
}
}
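
readObject() and readObjects() above follow a simple decorator rule: delegate to the wrapped storage, and only add the caching read buffer when the inner buffer does not already integrate with the filesystem cache. A minimal standalone sketch of that decision, using hypothetical types rather than the ClickHouse interfaces:

#include <memory>

struct IReadBuffer
{
    virtual ~IReadBuffer() = default;
    virtual bool isIntegratedWithCache() const { return false; }
};

struct CachingReadBuffer : IReadBuffer
{
    explicit CachingReadBuffer(std::unique_ptr<IReadBuffer> inner_) : inner(std::move(inner_)) {}
    std::unique_ptr<IReadBuffer> inner; /// reads would consult the cache before the inner buffer
};

/// Wrap the inner buffer only when it cannot use the cache on its own.
std::unique_ptr<IReadBuffer> maybeWrapWithCache(std::unique_ptr<IReadBuffer> inner, bool cache_on_lower_level)
{
    if (inner->isIntegratedWithCache() && cache_on_lower_level)
        return inner;
    return std::make_unique<CachingReadBuffer>(std::move(inner));
}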

View File

@ -0,0 +1,126 @@
#pragma once
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <Common/FileCache.h>
#include <Common/FileCacheSettings.h>
namespace Poco
{
class Logger;
}
namespace DB
{
/**
* Wraps another object storage and adds a caching layer on top of it.
*/
class CachedObjectStorage final : public IObjectStorage
{
public:
CachedObjectStorage(ObjectStoragePtr object_storage_, FileCachePtr cache_, const FileCacheSettings & cache_settings_, const String & cache_config_name_);
std::string getName() const override { return fmt::format("CachedObjectStorage-{}({})", cache_config_name, object_storage->getName()); }
bool exists(const StoredObject & object) const override;
std::unique_ptr<ReadBufferFromFileBase> readObject( /// NOLINT
const StoredObject & object,
const ReadSettings & read_settings = ReadSettings{},
std::optional<size_t> read_hint = {},
std::optional<size_t> file_size = {}) const override;
std::unique_ptr<ReadBufferFromFileBase> readObjects( /// NOLINT
const StoredObjects & objects,
const ReadSettings & read_settings = ReadSettings{},
std::optional<size_t> read_hint = {},
std::optional<size_t> file_size = {}) const override;
/// Open the file for writing and return a WriteBufferFromFileBase object.
std::unique_ptr<WriteBufferFromFileBase> writeObject( /// NOLINT
const StoredObject & object,
WriteMode mode,
std::optional<ObjectAttributes> attributes = {},
FinalizeCallback && finalize_callback = {},
size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE,
const WriteSettings & write_settings = {}) override;
void removeObject(const StoredObject & object) override;
void removeObjects(const StoredObjects & objects) override;
void removeObjectIfExists(const StoredObject & object) override;
void removeObjectsIfExist(const StoredObjects & objects) override;
void copyObject( /// NOLINT
const StoredObject & object_from,
const StoredObject & object_to,
std::optional<ObjectAttributes> object_to_attributes = {}) override;
void copyObjectToAnotherObjectStorage( /// NOLINT
const StoredObject & object_from,
const StoredObject & object_to,
IObjectStorage & object_storage_to,
std::optional<ObjectAttributes> object_to_attributes = {}) override;
std::unique_ptr<IObjectStorage> cloneObjectStorage(
const std::string & new_namespace,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
ContextPtr context) override;
void listPrefix(const std::string & path, RelativePathsWithSize & children) const override;
ObjectMetadata getObjectMetadata(const std::string & path) const override;
void shutdown() override;
void startup() override;
void applyNewSettings(
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
ContextPtr context) override;
String getObjectsNamespace() const override;
const String & getCacheBasePath() const override { return cache->getBasePath(); }
std::string generateBlobNameForPath(const std::string & path) override;
bool isRemote() const override { return object_storage->isRemote(); }
void removeCacheIfExists(const std::string & path_key_for_cache) override;
bool supportsCache() const override { return true; }
std::string getUniqueId(const std::string & path) const override { return object_storage->getUniqueId(path); }
bool isReadOnly() const override { return object_storage->isReadOnly(); }
const std::string & getCacheConfigName() const { return cache_config_name; }
ObjectStoragePtr getWrappedObjectStorage() { return object_storage; }
bool supportParallelWrite() const override { return object_storage->supportParallelWrite(); }
ReadSettings getAdjustedSettingsFromMetadataFile(const ReadSettings & settings, const std::string & path) const override;
WriteSettings getAdjustedSettingsFromMetadataFile(const WriteSettings & settings, const std::string & path) const override;
private:
FileCache::Key getCacheKey(const std::string & path) const;
String getCachePath(const std::string & path) const;
ReadSettings patchSettings(const ReadSettings & read_settings) const override;
ObjectStoragePtr object_storage;
FileCachePtr cache;
FileCacheSettings cache_settings;
std::string cache_config_name;
Poco::Logger * log;
};
}

View File

@ -0,0 +1,65 @@
#include <Common/FileCacheSettings.h>
#include <Common/FileCacheFactory.h>
#include <Common/FileCache.h>
#include <Common/logger_useful.h>
#include <Common/assert_cast.h>
#include <Disks/DiskFactory.h>
#include <Disks/ObjectStorages/Cached/CachedObjectStorage.h>
#include <Disks/ObjectStorages/DiskObjectStorage.h>
#include <Interpreters/Context.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
void registerDiskCache(DiskFactory & factory)
{
auto creator = [](const String & name,
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
ContextPtr context,
const DisksMap & map) -> DiskPtr
{
auto disk_name = config.getString(config_prefix + ".disk", "");
if (disk_name.empty())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Disk Cache requires `disk` field in config");
auto disk_it = map.find(disk_name);
if (disk_it == map.end())
{
throw Exception(
ErrorCodes::BAD_ARGUMENTS,
"Cannot wrap disk `{}` with cache layer `{}`: there is no such disk (it should be initialized before cache disk)",
disk_name, name);
}
FileCacheSettings file_cache_settings;
file_cache_settings.loadFromConfig(config, config_prefix);
auto cache_base_path = config.getString(config_prefix + ".path", fs::path(context->getPath()) / "disks" / name / "cache/");
if (!fs::exists(cache_base_path))
fs::create_directories(cache_base_path);
auto disk = disk_it->second;
auto cache = FileCacheFactory::instance().getOrCreate(cache_base_path, file_cache_settings, name);
auto disk_object_storage = disk->createDiskObjectStorage();
disk_object_storage->wrapWithCache(cache, file_cache_settings, name);
LOG_INFO(
&Poco::Logger::get("DiskCache"),
"Registered cached disk (`{}`) with structure: {}",
name, assert_cast<DiskObjectStorage *>(disk_object_storage.get())->getStructure());
return disk_object_storage;
};
factory.registerDiskType("cache", creator);
}
}
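
registerDiskCache() above registers a creator that first resolves the wrapped disk by name and then builds the cache layer on top of it. A stripped-down sketch of that factory lookup, with hypothetical types instead of the ClickHouse DiskFactory API:

#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct Disk { std::string description; };
using DiskPtr = std::shared_ptr<Disk>;
using DisksMap = std::map<std::string, DiskPtr>;
using Creator = std::function<DiskPtr(const std::string & name, const DisksMap & disks)>;

int main()
{
    std::map<std::string, Creator> factory;

    /// A "cache" disk is created on top of an already registered disk.
    factory["cache"] = [](const std::string & name, const DisksMap & disks) -> DiskPtr
    {
        auto it = disks.find("s3_disk"); /// in reality the wrapped disk name comes from the config
        if (it == disks.end())
            throw std::runtime_error("wrapped disk must be initialized before the cache disk");
        return std::make_shared<Disk>(Disk{name + " over " + it->second->description});
    };

    DisksMap disks{{"s3_disk", std::make_shared<Disk>(Disk{"plain s3_disk"})}};
    DiskPtr cached = factory.at("cache")("cache_disk", disks);
}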

View File

@ -10,8 +10,8 @@
#include <Common/quoteString.h>
#include <Common/logger_useful.h>
#include <Common/filesystemHelpers.h>
#include <Disks/IO/ThreadPoolRemoteFSReader.h>
#include <Common/FileCache.h>
#include <Disks/ObjectStorages/Cached/CachedObjectStorage.h>
#include <Disks/ObjectStorages/DiskObjectStorageRemoteMetadataRestoreHelper.h>
#include <Disks/ObjectStorages/DiskObjectStorageTransaction.h>
#include <Disks/FakeDiskTransaction.h>
@ -91,6 +91,12 @@ DiskTransactionPtr DiskObjectStorage::createObjectStorageTransaction()
send_metadata ? metadata_helper.get() : nullptr);
}
std::shared_ptr<Executor> DiskObjectStorage::getAsyncExecutor(const std::string & log_name, size_t size)
{
static auto reader = std::make_shared<AsyncThreadPoolExecutor>(log_name, size);
return reader;
}
DiskObjectStorage::DiskObjectStorage(
const String & name_,
const String & object_storage_root_path_,
@ -100,7 +106,7 @@ DiskObjectStorage::DiskObjectStorage(
DiskType disk_type_,
bool send_metadata_,
uint64_t thread_pool_size_)
: IDisk(std::make_unique<AsyncThreadPoolExecutor>(log_name, thread_pool_size_))
: IDisk(getAsyncExecutor(log_name, thread_pool_size_))
, name(name_)
, object_storage_root_path(object_storage_root_path_)
, log (&Poco::Logger::get("DiskObjectStorage(" + log_name + ")"))
@ -258,7 +264,7 @@ bool DiskObjectStorage::checkUniqueId(const String & id) const
if (!id.starts_with(object_storage_root_path))
return false;
auto object = StoredObject::create(*object_storage, id, {}, true);
auto object = StoredObject::create(*object_storage, id, {}, {}, true);
return object_storage->exists(object);
}
@ -450,12 +456,17 @@ bool DiskObjectStorage::supportsCache() const
return object_storage->supportsCache();
}
DiskObjectStoragePtr DiskObjectStorage::createDiskObjectStorage(const String & name_)
bool DiskObjectStorage::isReadOnly() const
{
return object_storage->isReadOnly();
}
DiskObjectStoragePtr DiskObjectStorage::createDiskObjectStorage()
{
return std::make_shared<DiskObjectStorage>(
name_,
getName(),
object_storage_root_path,
name,
getName(),
metadata_storage,
object_storage,
disk_type,
@ -463,6 +474,24 @@ DiskObjectStoragePtr DiskObjectStorage::createDiskObjectStorage(const String & n
threadpool_size);
}
void DiskObjectStorage::wrapWithCache(FileCachePtr cache, const FileCacheSettings & cache_settings, const String & layer_name)
{
object_storage = std::make_shared<CachedObjectStorage>(object_storage, cache, cache_settings, layer_name);
}
NameSet DiskObjectStorage::getCacheLayersNames() const
{
NameSet cache_layers;
auto current_object_storage = object_storage;
while (current_object_storage->supportsCache())
{
auto * cached_object_storage = assert_cast<CachedObjectStorage *>(current_object_storage.get());
cache_layers.insert(cached_object_storage->getCacheConfigName());
current_object_storage = cached_object_storage->getWrappedObjectStorage();
}
return cache_layers;
}
std::unique_ptr<ReadBufferFromFileBase> DiskObjectStorage::readFile(
const String & path,
const ReadSettings & settings,
@ -471,7 +500,7 @@ std::unique_ptr<ReadBufferFromFileBase> DiskObjectStorage::readFile(
{
return object_storage->readObjects(
metadata_storage->getStorageObjects(path),
settings,
object_storage->getAdjustedSettingsFromMetadataFile(settings, path),
read_hint,
file_size);
}
@ -485,7 +514,11 @@ std::unique_ptr<WriteBufferFromFileBase> DiskObjectStorage::writeFile(
LOG_TEST(log, "Write file: {}", path);
auto transaction = createObjectStorageTransaction();
auto result = transaction->writeFile(path, buf_size, mode, settings);
auto result = transaction->writeFile(
path,
buf_size,
mode,
object_storage->getAdjustedSettingsFromMetadataFile(settings, path));
return result;
}
@ -567,5 +600,4 @@ DiskObjectStorageReservation::~DiskObjectStorageReservation()
}
}
}

View File

@ -45,7 +45,7 @@ public:
bool supportZeroCopyReplication() const override { return true; }
bool supportParallelWrite() const override { return true; }
bool supportParallelWrite() const override { return object_storage->supportParallelWrite(); }
const String & getName() const override { return name; }
@ -55,7 +55,7 @@ public:
void getRemotePathsRecursive(const String & local_path, std::vector<LocalPathWithObjectStoragePaths> & paths_map) override;
std::string getCacheBasePath() const override
const std::string & getCacheBasePath() const override
{
return object_storage->getCacheBasePath();
}
@ -164,10 +164,32 @@ public:
UInt64 getRevision() const override;
DiskObjectStoragePtr createDiskObjectStorage(const String & name_) override;
DiskObjectStoragePtr createDiskObjectStorage() override;
bool supportsCache() const override;
/// Is object storage read only?
/// For example, WebObjectStorage is read-only because it only allows reading from a web server
/// with static files, so only read-only operations are allowed for this storage.
bool isReadOnly() const override;
/// Add a cache layer.
/// Example: DiskObjectStorage(S3ObjectStorage) -> DiskObjectStorage(CachedObjectStorage(S3ObjectStorage))
/// There can be any number of cache layers:
/// DiskObjectStorage(CachedObjectStorage(...CachedObjectStorage(S3ObjectStorage)...))
void wrapWithCache(FileCachePtr cache, const FileCacheSettings & cache_settings, const String & layer_name);
/// Get the structure of the object storage this disk works with. Examples:
/// DiskObjectStorage(S3ObjectStorage)
/// DiskObjectStorage(CachedObjectStorage(S3ObjectStorage))
/// DiskObjectStorage(CachedObjectStorage(CachedObjectStorage(S3ObjectStorage)))
String getStructure() const { return fmt::format("DiskObjectStorage-{}({})", getName(), object_storage->getName()); }
/// Get the names of all cache layers. A layer's name is how the cache is defined in the configuration file.
NameSet getCacheLayersNames() const override;
static std::shared_ptr<Executor> getAsyncExecutor(const std::string & log_name, size_t size);
bool supportsStat() const override { return metadata_storage->supportsStat(); }
struct stat stat(const String & path) const override;
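
The comments above describe how cache layers are stacked onto a DiskObjectStorage and how their configuration names are collected. A minimal standalone sketch of that layering and of the walk performed by getCacheLayersNames(), with hypothetical types:

#include <memory>
#include <set>
#include <string>

struct Storage
{
    virtual ~Storage() = default;
    virtual bool supportsCache() const { return false; }
    virtual std::string getName() const { return "S3ObjectStorage"; }
};

struct CachedStorage : Storage
{
    CachedStorage(std::shared_ptr<Storage> inner_, std::string layer_name_)
        : inner(std::move(inner_)), layer_name(std::move(layer_name_)) {}

    bool supportsCache() const override { return true; }
    std::string getName() const override { return "Cached(" + inner->getName() + ")"; }

    std::shared_ptr<Storage> inner;
    std::string layer_name;
};

/// Walk the chain of cache layers and collect their names.
std::set<std::string> cacheLayerNames(std::shared_ptr<Storage> storage)
{
    std::set<std::string> names;
    while (storage && storage->supportsCache())
    {
        auto layer = std::static_pointer_cast<CachedStorage>(storage);
        names.insert(layer->layer_name);
        storage = layer->inner; /// descend to the wrapped storage
    }
    return names;
}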

View File

@ -1,35 +1,12 @@
#include <Disks/ObjectStorages/DiskObjectStorageCommon.h>
#include <Common/getRandomASCIIString.h>
#include <Common/FileCacheFactory.h>
#include <Common/FileCache.h>
#include <Common/FileCacheSettings.h>
#include <Disks/DiskLocal.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Interpreters/Context.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
std::shared_ptr<DiskCacheWrapper> wrapWithCache(
std::shared_ptr<IDisk> disk, String cache_name, String cache_path, String metadata_path)
{
if (metadata_path == cache_path)
throw Exception("Metadata and cache paths should be different: " + metadata_path, ErrorCodes::BAD_ARGUMENTS);
auto cache_disk = std::make_shared<DiskLocal>(cache_name, cache_path, 0);
auto cache_file_predicate = [] (const String & path)
{
return path.ends_with("idx") // index files.
|| path.ends_with("mrk") || path.ends_with("mrk2") || path.ends_with("mrk3") /// mark files.
|| path.ends_with("txt") || path.ends_with("dat");
};
return std::make_shared<DiskCacheWrapper>(disk, cache_disk, cache_file_predicate);
}
static String getDiskMetadataPath(
const String & name,
const Poco::Util::AbstractConfiguration & config,
@ -52,39 +29,12 @@ std::pair<String, DiskPtr> prepareForLocalMetadata(
return std::make_pair(metadata_path, metadata_disk);
}
FileCachePtr getCachePtrForDisk(
const String & name,
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
ContextPtr context)
bool isFileWithPersistentCache(const String & path)
{
bool data_cache_enabled = config.getBool(config_prefix + ".data_cache_enabled", false);
if (!data_cache_enabled)
return nullptr;
auto cache_base_path = config.getString(config_prefix + ".data_cache_path", fs::path(context->getPath()) / "disks" / name / "data_cache/");
if (!fs::exists(cache_base_path))
fs::create_directories(cache_base_path);
auto metadata_path = getDiskMetadataPath(name, config, config_prefix, context);
if (metadata_path == cache_base_path)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Metadata path and cache base path must be different: {}", metadata_path);
FileCacheSettings file_cache_settings;
file_cache_settings.loadFromConfig(config, config_prefix);
auto cache = FileCacheFactory::instance().getOrCreate(cache_base_path, file_cache_settings, name);
cache->initialize();
auto * log = &Poco::Logger::get("Disk(" + name + ")");
LOG_INFO(log, "Disk registered with cache path: {}. Cache size: {}, max cache elements size: {}, max_file_segment_size: {}",
cache_base_path,
file_cache_settings.max_size ? toString(file_cache_settings.max_size) : "UNLIMITED",
file_cache_settings.max_elements ? toString(file_cache_settings.max_elements) : "UNLIMITED",
file_cache_settings.max_file_segment_size);
return cache;
auto path_extension = std::filesystem::path(path).extension();
return path_extension == ".idx" // index files.
|| path_extension == ".mrk" || path_extension == ".mrk2" || path_extension == ".mrk3" /// mark files.
|| path_extension == ".txt" || path_extension == ".dat";
}
}
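
isFileWithPersistentCache() above classifies files by extension so that index and mark files can be kept in the cache persistently. The same check written as a tiny standalone helper (same extension list, hypothetical function name):

#include <filesystem>
#include <string>
#include <unordered_set>

bool hasPersistentCacheExtension(const std::string & path)
{
    static const std::unordered_set<std::string> persistent_extensions
        = {".idx", ".mrk", ".mrk2", ".mrk3", ".txt", ".dat"};
    return persistent_extensions.count(std::filesystem::path(path).extension().string()) > 0;
}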

View File

@ -8,27 +8,16 @@
#include <Common/getRandomASCIIString.h>
#include <Disks/IDisk.h>
#include <Disks/DiskCacheWrapper.h>
namespace DB
{
std::shared_ptr<DiskCacheWrapper> wrapWithCache(
std::shared_ptr<IDisk> disk,
String cache_name,
String cache_path,
String metadata_path);
std::pair<String, DiskPtr> prepareForLocalMetadata(
const String & name,
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
ContextPtr context);
FileCachePtr getCachePtrForDisk(
const String & name,
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
ContextPtr context);
bool isFileWithPersistentCache(const String & path);
}

View File

@ -430,6 +430,7 @@ void DiskObjectStorageRemoteMetadataRestoreHelper::processRestoreFiles(
disk->createDirectories(directoryPath(path));
auto relative_key = shrinkKey(source_path, key);
auto full_path = fs::path(disk->object_storage_root_path) / relative_key;
StoredObject object_from{key};
StoredObject object_to{fs::path(disk->object_storage_root_path) / relative_key};

View File

@ -109,7 +109,6 @@ struct RemoveObjectStorageOperation final : public IDiskObjectStorageOperation
if (hardlink_count == 0)
{
objects_to_remove = objects;
remove_from_cache = true;
}
}
catch (const Exception & e)
@ -136,12 +135,6 @@ struct RemoveObjectStorageOperation final : public IDiskObjectStorageOperation
{
if (!delete_metadata_only && !objects_to_remove.empty())
object_storage.removeObjects(objects_to_remove);
if (remove_from_cache)
{
for (const auto & object : objects_to_remove)
object_storage.removeCacheIfExists(object.getPathKeyForCache());
}
}
};
@ -186,9 +179,7 @@ struct RemoveRecursiveObjectStorageOperation final : public IDiskObjectStorageOp
if (hardlink_count == 0)
{
objects_to_remove[path_to_remove] = objects_paths;
objects_to_remove_from_cache.insert(objects_to_remove_from_cache.end(), objects_paths.begin(), objects_paths.end());
}
}
catch (const Exception & e)
{
@ -237,9 +228,6 @@ struct RemoveRecursiveObjectStorageOperation final : public IDiskObjectStorageOp
}
object_storage.removeObjects(remove_from_remote);
}
for (const auto & object : objects_to_remove_from_cache)
object_storage.removeCacheIfExists(object.getPathKeyForCache());
}
};

View File

@ -95,17 +95,15 @@ StoredObjects FakeMetadataStorageFromDisk::getStorageObjects(const std::string &
std::string blob_name = object_storage->generateBlobNameForPath(path);
std::string object_path = fs::path(object_storage_root_path) / blob_name;
size_t object_size = getFileSize(object_path);
size_t object_size = getFileSize(path);
auto object = StoredObject::create(*object_storage, object_path, object_size);
auto object = StoredObject::create(*object_storage, object_path, object_size, /* exists */true);
return {std::move(object)};
}
uint32_t FakeMetadataStorageFromDisk::getHardlinkCount(const std::string & path) const
{
size_t ref_count = disk->getRefCount(path);
assert(ref_count > 0);
return ref_count - 1;
return disk->getRefCount(path);
}
const IMetadataStorage & FakeMetadataStorageFromDiskTransaction::getStorageForNonTransactionalReads() const

View File

@ -38,7 +38,7 @@ void registerDiskHDFS(DiskFactory & factory)
/// FIXME Cache currently unsupported :(
ObjectStoragePtr hdfs_storage = std::make_unique<HDFSObjectStorage>(uri, std::move(settings), config);
auto [metadata_path, metadata_disk] = prepareForLocalMetadata(name, config, config_prefix, context_);
auto [_, metadata_disk] = prepareForLocalMetadata(name, config, config_prefix, context_);
auto metadata_storage = std::make_shared<MetadataStorageFromDisk>(metadata_disk, uri);
uint64_t copy_thread_pool_size = config.getUInt(config_prefix + ".thread_pool_size", 16);
@ -53,20 +53,6 @@ void registerDiskHDFS(DiskFactory & factory)
/* send_metadata = */ false,
copy_thread_pool_size);
#ifdef NDEBUG
bool use_cache = true;
#else
/// Current S3 cache implementation leads to allocations in the destructor of
/// the read buffer.
bool use_cache = false;
#endif
if (config.getBool(config_prefix + ".cache_enabled", use_cache))
{
String cache_path = config.getString(config_prefix + ".cache_path", context_->getPath() + "disks/" + name + "/cache/");
disk_result = wrapWithCache(disk_result, "hdfs-cache", cache_path, metadata_path);
}
return std::make_shared<DiskRestartProxy>(disk_result);
};

View File

@ -88,8 +88,6 @@ using MetadataTransactionPtr = std::shared_ptr<IMetadataTransaction>;
/// small amounts of data (strings).
class IMetadataStorage : private boost::noncopyable
{
friend class MetadataStorageFromDiskTransaction;
public:
virtual MetadataTransactionPtr createTransaction() const = 0;

View File

@ -44,9 +44,9 @@ void IObjectStorage::copyObjectToAnotherObjectStorage( // NOLINT
out->finalize();
}
std::string IObjectStorage::getCacheBasePath() const
const std::string & IObjectStorage::getCacheBasePath() const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "getCacheBasePath() is not implemented for {}", getName());
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "getCacheBasePath() is not implemented for object storage");
}
void IObjectStorage::applyRemoteThrottlingSettings(ContextPtr context)

View File

@ -41,8 +41,6 @@ struct RelativePathWithSize
using RelativePathsWithSize = std::vector<RelativePathWithSize>;
using StoredObjects = std::vector<StoredObject>;
struct ObjectMetadata
{
uint64_t size_bytes;
@ -128,7 +126,7 @@ public:
virtual ~IObjectStorage() = default;
/// Path to directory with objects cache
virtual std::string getCacheBasePath() const;
virtual const std::string & getCacheBasePath() const;
static AsynchronousReaderPtr getThreadPoolReader();
@ -170,13 +168,21 @@ public:
virtual bool supportsCache() const { return false; }
virtual bool isReadOnly() const { return false; }
virtual bool supportParallelWrite() const { return false; }
virtual ReadSettings getAdjustedSettingsFromMetadataFile(const ReadSettings & settings, const std::string & /* path */) const { return settings; }
virtual WriteSettings getAdjustedSettingsFromMetadataFile(const WriteSettings & settings, const std::string & /* path */) const { return settings; }
protected:
/// Should be called from implementation of applyNewSettings()
void applyRemoteThrottlingSettings(ContextPtr context);
/// Should be used by implementation of read* and write* methods
ReadSettings patchSettings(const ReadSettings & read_settings) const;
WriteSettings patchSettings(const WriteSettings & write_settings) const;
virtual ReadSettings patchSettings(const ReadSettings & read_settings) const;
virtual WriteSettings patchSettings(const WriteSettings & write_settings) const;
private:
mutable std::mutex throttlers_mutex;

View File

@ -107,6 +107,10 @@ void LocalObjectStorage::listPrefix(const std::string & path, RelativePathsWithS
void LocalObjectStorage::removeObject(const StoredObject & object)
{
/// For local object storage, files are actually removed when the "metadata" is removed.
if (!exists(object))
return;
if (0 != unlink(object.absolute_path.data()))
throwFromErrnoWithPath("Cannot unlink file " + object.absolute_path, object.absolute_path, ErrorCodes::CANNOT_UNLINK);
}

View File

@ -24,8 +24,6 @@
#include <aws/s3/model/UploadPartCopyRequest.h>
#include <aws/s3/model/AbortMultipartUploadRequest.h>
#include <Common/FileCache.h>
#include <Common/FileCacheFactory.h>
#include <Common/getRandomASCIIString.h>
#include <Common/logger_useful.h>
#include <Common/MultiVersion.h>
@ -115,23 +113,6 @@ bool S3ObjectStorage::exists(const StoredObject & object) const
return true;
}
String S3ObjectStorage::getCacheBasePath() const
{
if (!cache)
return "";
return cache->getBasePath();
}
void S3ObjectStorage::removeCacheIfExists(const std::string & path_key)
{
if (!cache || path_key.empty())
return;
FileCache::Key key = cache->hash(path_key);
cache->removeIfExists(key);
}
std::unique_ptr<ReadBufferFromFileBase> S3ObjectStorage::readObjects( /// NOLINT
const StoredObjects & objects,
const ReadSettings & read_settings,
@ -207,10 +188,6 @@ std::unique_ptr<WriteBufferFromFileBase> S3ObjectStorage::writeObject( /// NOLIN
if (mode != WriteMode::Rewrite)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "S3 doesn't support append to files");
bool cache_on_write = cache
&& write_settings.enable_filesystem_cache_on_write_operations
&& FileCacheFactory::instance().getSettings(getCacheBasePath()).cache_on_write_operations;
auto settings_ptr = s3_settings.get();
auto s3_buffer = std::make_unique<WriteBufferFromS3>(
client.get(),
@ -220,9 +197,7 @@ std::unique_ptr<WriteBufferFromFileBase> S3ObjectStorage::writeObject( /// NOLIN
attributes,
buf_size,
threadPoolCallbackRunner(getThreadPoolWriter()),
disk_write_settings,
cache_on_write ? cache : nullptr);
disk_write_settings);
return std::make_unique<WriteIndirectBufferFromRemoteFS>(
std::move(s3_buffer), std::move(finalize_callback), object.absolute_path);
@ -495,19 +470,6 @@ void S3ObjectStorage::copyObject( // NOLINT
}
}
ReadSettings S3ObjectStorage::patchSettings(const ReadSettings & read_settings) const
{
ReadSettings settings{read_settings};
if (cache)
{
if (FileCache::isReadOnly())
settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache = true;
settings.remote_fs_cache = cache;
}
return IObjectStorage::patchSettings(settings);
}
void S3ObjectStorage::setNewSettings(std::unique_ptr<S3ObjectStorageSettings> && s3_settings_)
{
s3_settings.set(std::move(s3_settings_));
@ -549,7 +511,7 @@ std::unique_ptr<IObjectStorage> S3ObjectStorage::cloneObjectStorage(
return std::make_unique<S3ObjectStorage>(
getClient(config, config_prefix, context),
getSettings(config, config_prefix, context),
version_id, s3_capabilities, new_namespace, nullptr);
version_id, s3_capabilities, new_namespace);
}
}

View File

@ -48,14 +48,12 @@ public:
std::unique_ptr<S3ObjectStorageSettings> && s3_settings_,
String version_id_,
const S3Capabilities & s3_capabilities_,
String bucket_,
FileCachePtr cache_)
String bucket_)
: bucket(bucket_)
, client(std::move(client_))
, s3_settings(std::move(s3_settings_))
, s3_capabilities(s3_capabilities_)
, version_id(std::move(version_id_))
, cache(cache_)
{
}
@ -136,15 +134,9 @@ public:
const std::string & config_prefix,
ContextPtr context) override;
bool supportsCache() const override { return true; }
void removeCacheIfExists(const std::string & path_key) override;
String getCacheBasePath() const override;
bool supportParallelWrite() const override { return true; }
private:
ReadSettings patchSettings(const ReadSettings & read_settings) const;
void setNewSettings(std::unique_ptr<S3ObjectStorageSettings> && s3_settings_);
void setNewClient(std::unique_ptr<Aws::S3::S3Client> && client_);
@ -177,8 +169,6 @@ private:
S3Capabilities s3_capabilities;
const String version_id;
FileCachePtr cache;
};
}

View File

@ -9,14 +9,8 @@
#if USE_AWS_S3
#include <aws/core/client/DefaultRetryStrategy.h>
#include <base/getFQDNOrHostName.h>
#include <Common/FileCacheFactory.h>
#include <IO/S3Common.h>
#include <Disks/DiskCacheWrapper.h>
#include <Disks/DiskRestartProxy.h>
#include <Disks/DiskLocal.h>
#include <Disks/ObjectStorages/DiskObjectStorage.h>
@ -27,6 +21,7 @@
#include <Disks/ObjectStorages/S3/S3ObjectStorage.h>
#include <Disks/ObjectStorages/S3/diskSettings.h>
#include <Disks/ObjectStorages/MetadataStorageFromDisk.h>
#include <IO/S3Common.h>
#include <Storages/StorageS3Settings.h>
@ -120,7 +115,8 @@ void registerDiskS3(DiskFactory & factory)
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
ContextPtr context,
const DisksMap & /*map*/) -> DiskPtr {
const DisksMap & /*map*/) -> DiskPtr
{
S3::URI uri(Poco::URI(config.getString(config_prefix + ".endpoint")));
if (uri.key.empty())
@ -132,14 +128,12 @@ void registerDiskS3(DiskFactory & factory)
auto [metadata_path, metadata_disk] = prepareForLocalMetadata(name, config, config_prefix, context);
auto metadata_storage = std::make_shared<MetadataStorageFromDisk>(metadata_disk, uri.key);
FileCachePtr cache = getCachePtrForDisk(name, config, config_prefix, context);
S3Capabilities s3_capabilities = getCapabilitiesFromConfig(config, config_prefix);
auto s3_storage = std::make_unique<S3ObjectStorage>(
getClient(config, config_prefix, context),
getSettings(config, config_prefix, context),
uri.version_id, s3_capabilities, uri.bucket, cache);
uri.version_id, s3_capabilities, uri.bucket);
bool skip_access_check = config.getBool(config_prefix + ".skip_access_check", false);
@ -184,20 +178,6 @@ void registerDiskS3(DiskFactory & factory)
std::shared_ptr<IDisk> disk_result = s3disk;
#ifdef NDEBUG
bool use_cache = true;
#else
/// Current S3 cache implementation leads to allocations in the destructor of
/// the read buffer.
bool use_cache = false;
#endif
if (config.getBool(config_prefix + ".cache_enabled", use_cache))
{
String cache_path = config.getString(config_prefix + ".cache_path", context->getPath() + "disks/" + name + "/cache/");
disk_result = wrapWithCache(disk_result, "s3-cache", cache_path, metadata_path);
}
return std::make_shared<DiskRestartProxy>(disk_result);
};
factory.registerDiskType("s3", creator);

View File

@ -27,12 +27,12 @@ std::string StoredObject::getPathKeyForCache() const
}
StoredObject StoredObject::create(
const IObjectStorage & object_storage, const std::string & object_path, size_t object_size, bool object_bypasses_cache)
const IObjectStorage & object_storage, const std::string & object_path, size_t object_size, bool exists, bool object_bypasses_cache)
{
if (object_bypasses_cache)
return StoredObject(object_path, object_size, {});
auto path_key_for_cache_creator = [&object_storage](const std::string & path) -> String
PathKeyForCacheCreator path_key_for_cache_creator = [&object_storage](const std::string & path) -> std::string
{
try
{
@ -49,6 +49,11 @@ StoredObject StoredObject::create(
}
};
if (exists)
{
path_key_for_cache_creator = [path = path_key_for_cache_creator(object_path)](const std::string &) { return path; };
}
return StoredObject(object_path, object_size, std::move(path_key_for_cache_creator));
}
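
StoredObject::create() above memoizes the cache key when the object is known to exist: the key is computed once and captured by value inside a replacement callable, so later calls never recompute it. A minimal standalone illustration of the same trick with a hypothetical key function:

#include <functional>
#include <iostream>
#include <string>

int main()
{
    std::function<std::string(const std::string &)> key_creator
        = [](const std::string & path) { return "key-for-" + path; }; /// pretend this is expensive

    const std::string object_path = "store/abc";
    const bool exists = true;

    /// The init-capture runs the old callable once; the new callable just returns the cached result.
    if (exists)
        key_creator = [key = key_creator(object_path)](const std::string &) { return key; };

    std::cout << key_creator("ignored") << '\n'; /// prints key-for-store/abc
}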

View File

@ -20,6 +20,7 @@ struct StoredObject
const IObjectStorage & object_storage,
const std::string & object_path,
size_t object_size = 0,
bool exists = false,
bool object_bypasses_cache = false);
/// Optional hint for cache. Use delayed initialization
@ -33,4 +34,6 @@ struct StoredObject
PathKeyForCacheCreator && path_key_for_cache_creator_ = {});
};
using StoredObjects = std::vector<StoredObject>;
}

View File

@ -0,0 +1,276 @@
#include "MetadataStorageFromStaticFilesWebServer.h"
#include <Disks/IDisk.h>
#include <Common/filesystemHelpers.h>
#include <Common/logger_useful.h>
#include <IO/WriteHelpers.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
extern const int FILE_DOESNT_EXIST;
extern const int NETWORK_ERROR;
}
class DiskWebServerDirectoryIterator final : public IDirectoryIterator
{
public:
explicit DiskWebServerDirectoryIterator(std::vector<fs::path> && dir_file_paths_)
: dir_file_paths(std::move(dir_file_paths_)), iter(dir_file_paths.begin()) {}
void next() override { ++iter; }
bool isValid() const override { return iter != dir_file_paths.end(); }
String path() const override { return iter->string(); }
String name() const override { return iter->filename(); }
private:
std::vector<fs::path> dir_file_paths;
std::vector<fs::path>::iterator iter;
};
MetadataStorageFromStaticFilesWebServer::MetadataStorageFromStaticFilesWebServer(
const WebObjectStorage & object_storage_)
: object_storage(object_storage_)
{
}
MetadataTransactionPtr MetadataStorageFromStaticFilesWebServer::createTransaction() const
{
return std::make_shared<MetadataStorageFromStaticFilesWebServerTransaction>(*this);
}
const std::string & MetadataStorageFromStaticFilesWebServer::getPath() const
{
return root_path;
}
bool MetadataStorageFromStaticFilesWebServer::exists(const std::string & path) const
{
return object_storage.files.contains(path);
}
void MetadataStorageFromStaticFilesWebServer::assertExists(const std::string & path) const
{
initializeIfNeeded(path);
if (!exists(path))
#ifdef NDEBUG
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "There is no path {}", path);
#else
{
std::string all_files;
for (const auto & [file, _] : object_storage.files)
{
if (!all_files.empty())
all_files += ", ";
all_files += file;
}
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "There is no path {} (available files: {})", path, all_files);
}
#endif
}
bool MetadataStorageFromStaticFilesWebServer::isFile(const std::string & path) const
{
assertExists(path);
return object_storage.files.at(path).type == WebObjectStorage::FileType::File;
}
bool MetadataStorageFromStaticFilesWebServer::isDirectory(const std::string & path) const
{
assertExists(path);
return object_storage.files.at(path).type == WebObjectStorage::FileType::Directory;
}
uint64_t MetadataStorageFromStaticFilesWebServer::getFileSize(const String & path) const
{
assertExists(path);
return object_storage.files.at(path).size;
}
StoredObjects MetadataStorageFromStaticFilesWebServer::getStorageObjects(const std::string & path) const
{
assertExists(path);
return {StoredObject::create(object_storage, path, object_storage.files.at(path).size, true)};
}
std::vector<std::string> MetadataStorageFromStaticFilesWebServer::listDirectory(const std::string & path) const
{
std::vector<std::string> result;
for (const auto & [file_path, _] : object_storage.files)
{
if (file_path.starts_with(path))
result.push_back(file_path);
}
return result;
}
bool MetadataStorageFromStaticFilesWebServer::initializeIfNeeded(const std::string & path) const
{
if (object_storage.files.find(path) == object_storage.files.end())
{
try
{
object_storage.initialize(fs::path(object_storage.url) / path);
}
catch (...)
{
const auto message = getCurrentExceptionMessage(false);
bool can_throw = CurrentThread::isInitialized() && CurrentThread::get().getQueryContext();
if (can_throw)
throw Exception(ErrorCodes::NETWORK_ERROR, "Cannot load disk metadata. Error: {}", message);
LOG_TRACE(&Poco::Logger::get("DiskWeb"), "Cannot load disk metadata. Error: {}", message);
return false;
}
}
return true;
}
DirectoryIteratorPtr MetadataStorageFromStaticFilesWebServer::iterateDirectory(const std::string & path) const
{
std::vector<fs::path> dir_file_paths;
if (!initializeIfNeeded(path))
return std::make_unique<DiskWebServerDirectoryIterator>(std::move(dir_file_paths));
assertExists(path);
for (const auto & [file_path, _] : object_storage.files)
if (parentPath(file_path) == path)
dir_file_paths.emplace_back(file_path);
LOG_TRACE(object_storage.log, "Iterate directory {} with {} files", path, dir_file_paths.size());
return std::make_unique<DiskWebServerDirectoryIterator>(std::move(dir_file_paths));
}
std::string MetadataStorageFromStaticFilesWebServer::readFileToString(const std::string &) const
{
WebObjectStorage::throwNotAllowed();
}
Poco::Timestamp MetadataStorageFromStaticFilesWebServer::getLastModified(const std::string &) const
{
return {};
}
time_t MetadataStorageFromStaticFilesWebServer::getLastChanged(const std::string &) const
{
return {};
}
uint32_t MetadataStorageFromStaticFilesWebServer::getHardlinkCount(const std::string &) const
{
return 1;
}
const IMetadataStorage & MetadataStorageFromStaticFilesWebServerTransaction::getStorageForNonTransactionalReads() const
{
return metadata_storage;
}
void MetadataStorageFromStaticFilesWebServerTransaction::writeStringToFile(const std::string &, const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::setLastModified(const std::string &, const Poco::Timestamp &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::unlinkFile(const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::removeRecursive(const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::removeDirectory(const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::moveFile(const std::string &, const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::moveDirectory(const std::string &, const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::replaceFile(const std::string &, const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::setReadOnly(const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::createHardLink(const std::string &, const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::addBlobToMetadata(const std::string &, const std::string &, uint64_t)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::unlinkMetadata(const std::string &)
{
WebObjectStorage::throwNotAllowed();
}
void MetadataStorageFromStaticFilesWebServerTransaction::createDirectory(const std::string &)
{
/// Noop.
}
void MetadataStorageFromStaticFilesWebServerTransaction::createDirectoryRecursive(const std::string &)
{
/// Noop.
}
void MetadataStorageFromStaticFilesWebServerTransaction::createEmptyMetadataFile(const std::string & /* path */)
{
/// Noop.
}
void MetadataStorageFromStaticFilesWebServerTransaction::createMetadataFile(
const std::string & /* path */, const std::string & /* blob_name */, uint64_t /* size_in_bytes */)
{
/// Noop.
}
void MetadataStorageFromStaticFilesWebServerTransaction::commit()
{
/// Noop.
}
std::unordered_map<String, String> MetadataStorageFromStaticFilesWebServer::getSerializedMetadata(const std::vector<String> &) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "getSerializedMetadata is not implemented for MetadataStorageFromStaticFilesWebServer");
}
void MetadataStorageFromStaticFilesWebServerTransaction::chmod(const String &, mode_t)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "chmod is not implemented for MetadataStorageFromStaticFilesWebServer");
}
}
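
initializeIfNeeded() above lazily loads metadata for a path on first access and only turns a load failure into an exception when a query context is available. A minimal standalone sketch of that lazy-loading pattern (all names hypothetical):

#include <cstddef>
#include <map>
#include <string>

class LazyFileIndex
{
public:
    /// Returns true when metadata for `path` is available (possibly after loading it now).
    bool ensureLoaded(const std::string & path)
    {
        if (files.count(path))
            return true;
        try
        {
            files[path] = fetchSize(path); /// hypothetical remote lookup
            return true;
        }
        catch (...)
        {
            /// Swallow the error here; callers that must fail check existence themselves.
            return false;
        }
    }

private:
    size_t fetchSize(const std::string &) { return 0; }
    std::map<std::string, size_t> files;
};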

View File

@ -0,0 +1,119 @@
#pragma once
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/Web/WebObjectStorage.h>
#include <Disks/IDisk.h>
namespace DB
{
class MetadataStorageFromStaticFilesWebServer final : public IMetadataStorage
{
private:
friend class MetadataStorageFromStaticFilesWebServerTransaction;
const WebObjectStorage & object_storage;
std::string root_path;
void assertExists(const std::string & path) const;
bool initializeIfNeeded(const std::string & path) const;
public:
explicit MetadataStorageFromStaticFilesWebServer(const WebObjectStorage & object_storage_);
MetadataTransactionPtr createTransaction() const override;
const std::string & getPath() const override;
bool exists(const std::string & path) const override;
bool isFile(const std::string & path) const override;
bool isDirectory(const std::string & path) const override;
uint64_t getFileSize(const String & path) const override;
Poco::Timestamp getLastModified(const std::string & path) const override;
time_t getLastChanged(const std::string & path) const override;
std::vector<std::string> listDirectory(const std::string & path) const override;
DirectoryIteratorPtr iterateDirectory(const std::string & path) const override;
std::string readFileToString(const std::string & path) const override;
std::unordered_map<String, String> getSerializedMetadata(const std::vector<String> & file_paths) const override;
uint32_t getHardlinkCount(const std::string & path) const override;
StoredObjects getStorageObjects(const std::string & path) const override;
std::string getObjectStorageRootPath() const override { return ""; }
bool supportsChmod() const override { return false; }
bool supportsStat() const override { return false; }
struct stat stat(const String &) const override { return {}; }
};
class MetadataStorageFromStaticFilesWebServerTransaction final : public IMetadataTransaction
{
private:
DiskPtr disk;
const MetadataStorageFromStaticFilesWebServer & metadata_storage;
public:
explicit MetadataStorageFromStaticFilesWebServerTransaction(
const MetadataStorageFromStaticFilesWebServer & metadata_storage_)
: metadata_storage(metadata_storage_)
{}
~MetadataStorageFromStaticFilesWebServerTransaction() override = default;
const IMetadataStorage & getStorageForNonTransactionalReads() const override;
void commit() override;
void writeStringToFile(const std::string & path, const std::string & data) override;
void createEmptyMetadataFile(const std::string & path) override;
void createMetadataFile(const std::string & path, const std::string & blob_name, uint64_t size_in_bytes) override;
void addBlobToMetadata(const std::string & path, const std::string & blob_name, uint64_t size_in_bytes) override;
void setLastModified(const std::string & path, const Poco::Timestamp & timestamp) override;
void setReadOnly(const std::string & path) override;
void unlinkFile(const std::string & path) override;
void createDirectory(const std::string & path) override;
void createDirectoryRecursive(const std::string & path) override;
void removeDirectory(const std::string & path) override;
void removeRecursive(const std::string & path) override;
void createHardLink(const std::string & path_from, const std::string & path_to) override;
void moveFile(const std::string & path_from, const std::string & path_to) override;
void moveDirectory(const std::string & path_from, const std::string & path_to) override;
void replaceFile(const std::string & path_from, const std::string & path_to) override;
void unlinkMetadata(const std::string & path) override;
bool supportsChmod() const override { return false; }
void chmod(const String &, mode_t) override;
};
}

Some files were not shown because too many files have changed in this diff.