Merge branch 'master' into contrib_sparse_checkout

Alexander Tokmakov 2023-03-31 16:10:15 +02:00
commit 2ab198f19d
315 changed files with 6437 additions and 2549 deletions


@@ -470,7 +470,7 @@ jobs:
           cd "$GITHUB_WORKSPACE/tests/ci"
           python3 docker_server.py --release-type head --no-push \
             --image-repo clickhouse/clickhouse-server --image-path docker/server
-          python3 docker_server.py --release-type head --no-push --no-ubuntu \
+          python3 docker_server.py --release-type head --no-push \
             --image-repo clickhouse/clickhouse-keeper --image-path docker/keeper
       - name: Cleanup
         if: always()


@@ -862,7 +862,7 @@ jobs:
           cd "$GITHUB_WORKSPACE/tests/ci"
           python3 docker_server.py --release-type head \
             --image-repo clickhouse/clickhouse-server --image-path docker/server
-          python3 docker_server.py --release-type head --no-ubuntu \
+          python3 docker_server.py --release-type head \
             --image-repo clickhouse/clickhouse-keeper --image-path docker/keeper
       - name: Cleanup
         if: always()


@@ -918,7 +918,7 @@ jobs:
           cd "$GITHUB_WORKSPACE/tests/ci"
           python3 docker_server.py --release-type head --no-push \
             --image-repo clickhouse/clickhouse-server --image-path docker/server
-          python3 docker_server.py --release-type head --no-push --no-ubuntu \
+          python3 docker_server.py --release-type head --no-push \
             --image-repo clickhouse/clickhouse-keeper --image-path docker/keeper
       - name: Cleanup
         if: always()


@@ -55,7 +55,7 @@ jobs:
           cd "$GITHUB_WORKSPACE/tests/ci"
           python3 docker_server.py --release-type auto --version "$GITHUB_TAG" \
             --image-repo clickhouse/clickhouse-server --image-path docker/server
-          python3 docker_server.py --release-type auto --version "$GITHUB_TAG" --no-ubuntu \
+          python3 docker_server.py --release-type auto --version "$GITHUB_TAG" \
            --image-repo clickhouse/clickhouse-keeper --image-path docker/keeper
       - name: Cleanup
         if: always()


@@ -527,7 +527,7 @@ jobs:
           cd "$GITHUB_WORKSPACE/tests/ci"
           python3 docker_server.py --release-type head --no-push \
             --image-repo clickhouse/clickhouse-server --image-path docker/server
-          python3 docker_server.py --release-type head --no-push --no-ubuntu \
+          python3 docker_server.py --release-type head --no-push \
             --image-repo clickhouse/clickhouse-keeper --image-path docker/keeper
       - name: Cleanup
         if: always()


@@ -1,10 +1,195 @@
### Table of Contents
**[ClickHouse release v23.3 LTS, 2023-03-30](#233)**<br/>
**[ClickHouse release v23.2, 2023-02-23](#232)**<br/>
**[ClickHouse release v23.1, 2023-01-25](#231)**<br/>
**[Changelog for 2022](https://clickhouse.com/docs/en/whats-new/changelog/2022/)**<br/>
# 2023 Changelog
### <a id="233"></a> ClickHouse release 23.3 LTS, 2023-03-30
#### Upgrade Notes
* Lightweight DELETEs are production ready and enabled by default: the `DELETE` query is now available for MergeTree tables out of the box (see the example after this list).
* The behavior of `*domain*RFC` and `netloc` functions is slightly changed: relaxed the set of symbols that are allowed in the URL authority for better conformance. [#46841](https://github.com/ClickHouse/ClickHouse/pull/46841) ([Azat Khuzhin](https://github.com/azat)).
* Prohibited creating tables based on KafkaEngine with DEFAULT/EPHEMERAL/ALIAS/MATERIALIZED statements for columns. [#47138](https://github.com/ClickHouse/ClickHouse/pull/47138) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* An "asynchronous connection drain" feature is removed. Related settings and metrics are removed as well. It was an internal feature, so the removal should not affect users who had never heard about that feature. [#47486](https://github.com/ClickHouse/ClickHouse/pull/47486) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Support 256-bit Decimal data type (more than 38 digits) in `arraySum`/`Min`/`Max`/`Avg`/`Product`, `arrayCumSum`/`CumSumNonNegative`, `arrayDifference`, array construction, IN operator, query parameters, `groupArrayMovingSum`, statistical functions, `min`/`max`/`any`/`argMin`/`argMax`, PostgreSQL wire protocol, MySQL table engine and function, `sumMap`, `mapAdd`, `mapSubtract`, `arrayIntersect`. Add support for big integers in `arrayIntersect`. Statistical aggregate functions involving moments (such as `corr` or various `TTest`s) will use `Float64` as their internal representation (they were using `Decimal128` before this change, but it was pointless), and these functions can return `nan` instead of `inf` in case of infinite variance. Some functions were allowed on `Decimal256` data types but returned `Decimal128` in previous versions - now it is fixed. This closes [#47569](https://github.com/ClickHouse/ClickHouse/issues/47569). This closes [#44864](https://github.com/ClickHouse/ClickHouse/issues/44864). This closes [#28335](https://github.com/ClickHouse/ClickHouse/issues/28335). [#47594](https://github.com/ClickHouse/ClickHouse/pull/47594) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make backup_threads/restore_threads server settings (instead of user settings). [#47881](https://github.com/ClickHouse/ClickHouse/pull/47881) ([Azat Khuzhin](https://github.com/azat)).
* Do not allow const and non-deterministic secondary indices [#46839](https://github.com/ClickHouse/ClickHouse/pull/46839) ([Anton Popov](https://github.com/CurtizJ)).
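A minimal illustration of the first upgrade note, with a hypothetical table: lightweight `DELETE` now works on MergeTree tables without first enabling any experimental setting.

```sql
-- `tbl` is a placeholder; `allow_experimental_lightweight_delete` no longer needs to be set.
CREATE TABLE tbl (id UInt64, value String) ENGINE = MergeTree ORDER BY id;
INSERT INTO tbl VALUES (1, 'a'), (2, 'b');
DELETE FROM tbl WHERE id = 2;  -- rows are masked immediately and cleaned up by later merges
```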
#### New Feature
* Add a new mode for splitting the work on replicas using settings `parallel_replicas_custom_key` and `parallel_replicas_custom_key_filter_type`. If the cluster consists of a single shard with multiple replicas, up to `max_parallel_replicas` will be randomly picked and turned into shards. For each shard, a corresponding filter is added to the query on the initiator before being sent to the shard. If the cluster consists of multiple shards, it will behave the same as `sample_key` but with the possibility to define an arbitrary key. [#45108](https://github.com/ClickHouse/ClickHouse/pull/45108) ([Antonio Andelic](https://github.com/antonio2368)).
* An option to display partial result on cancel: Added query setting `partial_result_on_first_cancel` allowing the canceled query (e.g. due to Ctrl-C) to return a partial result. [#45689](https://github.com/ClickHouse/ClickHouse/pull/45689) ([Alexey Perevyshin](https://github.com/alexX512)).
* Added support for arbitrary table engines for temporary tables (except for Replicated and KeeperMap engines). Closes [#31497](https://github.com/ClickHouse/ClickHouse/issues/31497). [#46071](https://github.com/ClickHouse/ClickHouse/pull/46071) ([Roman Vasin](https://github.com/rvasin)).
* Add support for replication of user-defined SQL functions using centralized storage in Keeper. [#46085](https://github.com/ClickHouse/ClickHouse/pull/46085) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Implement `system.server_settings` (similar to `system.settings`), which will contain server configurations. [#46550](https://github.com/ClickHouse/ClickHouse/pull/46550) ([pufit](https://github.com/pufit)).
* Support for `UNDROP TABLE` query. Closes [#46811](https://github.com/ClickHouse/ClickHouse/issues/46811). [#47241](https://github.com/ClickHouse/ClickHouse/pull/47241) ([chen](https://github.com/xiedeyantu)).
* Allow separate grants for named collections (e.g. to be able to give `SHOW/CREATE/ALTER/DROP named collection` access only to certain collections, instead of all at once). Closes [#40894](https://github.com/ClickHouse/ClickHouse/issues/40894). Add new access type `NAMED_COLLECTION_CONTROL` which is not given to user default unless explicitly added to the user config (is required to be able to do `GRANT ALL`), also `show_named_collections` is no longer obligatory to be manually specified for user default to be able to have full access rights as was in 23.2. [#46241](https://github.com/ClickHouse/ClickHouse/pull/46241) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow nested custom disks. Previously custom disks supported only flat disk structure. [#47106](https://github.com/ClickHouse/ClickHouse/pull/47106) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Introduce a function `widthBucket` (with a `WIDTH_BUCKET` alias for compatibility). [#42974](https://github.com/ClickHouse/ClickHouse/issues/42974). [#46790](https://github.com/ClickHouse/ClickHouse/pull/46790) ([avoiderboi](https://github.com/avoiderboi)).
* Add new functions `parseDateTime`/`parseDateTimeInJodaSyntax` that parse a String into a DateTime according to a specified format string: `parseDateTime` uses MySQL format syntax, `parseDateTimeInJodaSyntax` uses Joda syntax (examples of this and other new functions follow this list). [#46815](https://github.com/ClickHouse/ClickHouse/pull/46815) ([李扬](https://github.com/taiyang-li)).
* Use `dummy UInt8` for the default structure of table function `null`. Closes [#46930](https://github.com/ClickHouse/ClickHouse/issues/46930). [#47006](https://github.com/ClickHouse/ClickHouse/pull/47006) ([flynn](https://github.com/ucasfl)).
* Support for date format with a comma, like `Dec 15, 2021` in the `parseDateTimeBestEffort` function. Closes [#46816](https://github.com/ClickHouse/ClickHouse/issues/46816). [#47071](https://github.com/ClickHouse/ClickHouse/pull/47071) ([chen](https://github.com/xiedeyantu)).
* Add settings `http_wait_end_of_query` and `http_response_buffer_size` that correspond to the URL params `wait_end_of_query` and `buffer_size` for the HTTP interface. This allows changing these settings in profiles. [#47108](https://github.com/ClickHouse/ClickHouse/pull/47108) ([Vladimir C](https://github.com/vdimir)).
* Add `system.dropped_tables` table that shows tables that were dropped from `Atomic` databases but were not completely removed yet. [#47364](https://github.com/ClickHouse/ClickHouse/pull/47364) ([chen](https://github.com/xiedeyantu)).
* Add `INSTR` as alias of `positionCaseInsensitive` for MySQL compatibility. Closes [#47529](https://github.com/ClickHouse/ClickHouse/issues/47529). [#47535](https://github.com/ClickHouse/ClickHouse/pull/47535) ([flynn](https://github.com/ucasfl)).
* Added `toDecimalString` function allowing to convert numbers to string with fixed precision. [#47838](https://github.com/ClickHouse/ClickHouse/pull/47838) ([Andrey Zvonov](https://github.com/zvonand)).
* Add a merge tree setting `max_number_of_mutations_for_replica`. It limits the number of part mutations per replica to the specified amount. Zero means no limit on the number of mutations per replica (the execution can still be constrained by other settings). [#48047](https://github.com/ClickHouse/ClickHouse/pull/48047) ([Vladimir C](https://github.com/vdimir)).
* Add the Map-related function `mapFromArrays`, which allows the creation of a map from a pair of arrays. [#31125](https://github.com/ClickHouse/ClickHouse/pull/31125) ([李扬](https://github.com/taiyang-li)).
* Allow control of compression in Parquet/ORC/Arrow output formats, and add support for more compression formats on input. This closes [#13541](https://github.com/ClickHouse/ClickHouse/issues/13541). [#47114](https://github.com/ClickHouse/ClickHouse/pull/47114) ([Kruglov Pavel](https://github.com/Avogar)).
* Add SSL User Certificate authentication to the native protocol. Closes [#47077](https://github.com/ClickHouse/ClickHouse/issues/47077). [#47596](https://github.com/ClickHouse/ClickHouse/pull/47596) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add *OrNull() and *OrZero() variants for `parseDateTime`, add alias `str_to_date` for MySQL parity. [#48000](https://github.com/ClickHouse/ClickHouse/pull/48000) ([Robert Schulze](https://github.com/rschu1ze)).
* Added operator `REGEXP` (similar to operators "LIKE", "IN", "MOD" etc.) for better compatibility with MySQL [#47869](https://github.com/ClickHouse/ClickHouse/pull/47869) ([Robert Schulze](https://github.com/rschu1ze)).
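A few self-contained sketches of the new functions and operator above; all literal values are illustrative only.

```sql
-- widthBucket(operand, low, high, count): equal-width histogram bucket number.
SELECT widthBucket(10.15, 8.6, 21, 4);

-- parseDateTime uses MySQL-style format specifiers;
-- parseDateTimeInJodaSyntax is the Joda-style counterpart.
SELECT parseDateTime('2023-03-30 18:22:33', '%Y-%m-%d %H:%i:%s');

-- mapFromArrays builds a Map from a pair of equally sized arrays.
SELECT mapFromArrays(['a', 'b'], [1, 2]);

-- toDecimalString renders a number with a fixed number of fractional digits.
SELECT toDecimalString(3.14159, 2);

-- The new REGEXP operator, for MySQL compatibility.
SELECT 'clickhouse' REGEXP '^click';
```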
#### Performance Improvement
* Marks in memory are now compressed, using 3-6x less memory. [#47290](https://github.com/ClickHouse/ClickHouse/pull/47290) ([Michael Kolupaev](https://github.com/al13n321)).
* Backups for large numbers of files were unbelievably slow in previous versions. Not anymore. Now they are unbelievably fast. [#47251](https://github.com/ClickHouse/ClickHouse/pull/47251) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Introduced a separate thread pool for backup's IO operations. This will allow scaling it independently of other pools and increase performance. [#47174](https://github.com/ClickHouse/ClickHouse/pull/47174) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). Use MultiRead request and retries for collecting metadata at the final stage of backup processing. [#47243](https://github.com/ClickHouse/ClickHouse/pull/47243) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). If a backup and restoring data are both in S3 then server-side copy should be used from now on. [#47546](https://github.com/ClickHouse/ClickHouse/pull/47546) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed excessive reading in queries with `FINAL`. [#47801](https://github.com/ClickHouse/ClickHouse/pull/47801) ([Nikita Taranov](https://github.com/nickitat)).
* Setting `max_final_threads` would be set to the number of cores at server startup (by the same algorithm as used for `max_threads`). This improves the concurrency of `final` execution on servers with high number of CPUs. [#47915](https://github.com/ClickHouse/ClickHouse/pull/47915) ([Nikita Taranov](https://github.com/nickitat)).
* Allow executing the reading pipeline for a DIRECT dictionary with a CLICKHOUSE source in multiple threads. To enable it, set `dictionary_use_async_executor=1` in the `SETTINGS` section of the source in the `CREATE DICTIONARY` statement (see the sketch after this list). [#47986](https://github.com/ClickHouse/ClickHouse/pull/47986) ([Vladimir C](https://github.com/vdimir)).
* Optimize one nullable key aggregate performance. [#45772](https://github.com/ClickHouse/ClickHouse/pull/45772) ([LiuNeng](https://github.com/liuneng1994)).
* Implemented lowercase `tokenbf_v1` index utilization for `hasTokenOrNull`, `hasTokenCaseInsensitive` and `hasTokenCaseInsensitiveOrNull`. [#46252](https://github.com/ClickHouse/ClickHouse/pull/46252) ([ltrk2](https://github.com/ltrk2)).
* Optimize functions `position` and `LIKE` by searching the first two chars using SIMD. [#46289](https://github.com/ClickHouse/ClickHouse/pull/46289) ([Jiebin Sun](https://github.com/jiebinn)).
* Optimize queries from the `system.detached_parts`, which could be significantly large. Added several sources with respect to the block size limitation; in each block, an IO thread pool is used to calculate the part size, i.e. to make syscalls in parallel. [#46624](https://github.com/ClickHouse/ClickHouse/pull/46624) ([Sema Checherinda](https://github.com/CheSema)).
* Increase the default value of `max_replicated_merges_in_queue` for ReplicatedMergeTree tables from 16 to 1000. It allows faster background merge operation on clusters with a very large number of replicas, such as clusters with shared storage in ClickHouse Cloud. [#47050](https://github.com/ClickHouse/ClickHouse/pull/47050) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Updated `clickhouse-copier` to use `GROUP BY` instead of `DISTINCT` to get the list of partitions. For large tables, this reduced the select time from over 500s to under 1s. [#47386](https://github.com/ClickHouse/ClickHouse/pull/47386) ([Clayton McClure](https://github.com/cmcclure-twilio)).
* Fix performance degradation in `ASOF JOIN`. [#47544](https://github.com/ClickHouse/ClickHouse/pull/47544) ([Ongkong](https://github.com/ongkong)).
* Even more batching in Keeper. Improve performance by avoiding breaking batches on read requests. [#47978](https://github.com/ClickHouse/ClickHouse/pull/47978) ([Antonio Andelic](https://github.com/antonio2368)).
* Allow PREWHERE for Merge with different DEFAULT expressions for columns. [#46831](https://github.com/ClickHouse/ClickHouse/pull/46831) ([Azat Khuzhin](https://github.com/azat)).
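A sketch of the `dictionary_use_async_executor` setting described above, assuming a local source table named `source_table` (all identifiers are placeholders; per the entry, the `SETTINGS` clause goes inside the `SOURCE` definition):

```sql
CREATE DICTIONARY dict_sketch
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'source_table' SETTINGS(dictionary_use_async_executor = 1)))
LAYOUT(DIRECT());
```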
#### Experimental Feature
* Parallel replicas: Improved the overall performance by better utilizing the local replica, and forbid the reading with parallel replicas from non-replicated MergeTree by default. [#47858](https://github.com/ClickHouse/ClickHouse/pull/47858) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Support filter push down to left table for JOIN with `Join`, `Dictionary` and `EmbeddedRocksDB` tables if the experimental Analyzer is enabled. [#47280](https://github.com/ClickHouse/ClickHouse/pull/47280) ([Maksim Kita](https://github.com/kitaisreal)).
* Now ReplicatedMergeTree with zero copy replication has less load to Keeper. [#47676](https://github.com/ClickHouse/ClickHouse/pull/47676) ([alesapin](https://github.com/alesapin)).
* Fix create materialized view with MaterializedPostgreSQL [#40807](https://github.com/ClickHouse/ClickHouse/pull/40807) ([Maksim Buren](https://github.com/maks-buren630501)).
#### Improvement
* Enable `input_format_json_ignore_unknown_keys_in_named_tuple` by default. [#46742](https://github.com/ClickHouse/ClickHouse/pull/46742) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow errors to be ignored while pushing to MATERIALIZED VIEW (new setting `materialized_views_ignore_errors`, `false` by default; it is unconditionally set to `true` for flushing logs to `system.*_log` tables). See the example after this list. [#46658](https://github.com/ClickHouse/ClickHouse/pull/46658) ([Azat Khuzhin](https://github.com/azat)).
* Track the file queue of distributed sends in memory. [#45491](https://github.com/ClickHouse/ClickHouse/pull/45491) ([Azat Khuzhin](https://github.com/azat)).
* Now `X-ClickHouse-Query-Id` and `X-ClickHouse-Timezone` headers are added to responses in all queries via HTTP protocol. Previously it was done only for `SELECT` queries. [#46364](https://github.com/ClickHouse/ClickHouse/pull/46364) ([Anton Popov](https://github.com/CurtizJ)).
* External tables from `MongoDB`: support for connection to a replica set via a URI with a host:port enum and support for the readPreference option in MongoDB dictionaries. Example URI: mongodb://db0.example.com:27017,db1.example.com:27017,db2.example.com:27017/?replicaSet=myRepl&readPreference=primary. [#46524](https://github.com/ClickHouse/ClickHouse/pull/46524) ([artem-yadr](https://github.com/artem-yadr)).
* This improvement should be invisible for users. Re-implement projection analysis on top of query plan. Added setting `query_plan_optimize_projection=1` to switch between old and new version. Fixes [#44963](https://github.com/ClickHouse/ClickHouse/issues/44963). [#46537](https://github.com/ClickHouse/ClickHouse/pull/46537) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Use Parquet format v2 instead of v1 in output format by default. Add setting `output_format_parquet_version` to control parquet version, possible values `1.0`, `2.4`, `2.6`, `2.latest` (default). [#46617](https://github.com/ClickHouse/ClickHouse/pull/46617) ([Kruglov Pavel](https://github.com/Avogar)).
* It is now possible to use the new configuration syntax to configure Kafka topics with periods (`.`) in their name. [#46752](https://github.com/ClickHouse/ClickHouse/pull/46752) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix heuristics that check hyperscan patterns for problematic repeats. [#46819](https://github.com/ClickHouse/ClickHouse/pull/46819) ([Robert Schulze](https://github.com/rschu1ze)).
* Don't report ZK node exists to system.errors when a block was created concurrently by a different replica. [#46820](https://github.com/ClickHouse/ClickHouse/pull/46820) ([Raúl Marín](https://github.com/Algunenano)).
* Increase the limit for opened files in `clickhouse-local`. It will be able to read from `web` tables on servers with a huge number of CPU cores. Do not back off reading from the URL table engine in case of too many opened files. This closes [#46852](https://github.com/ClickHouse/ClickHouse/issues/46852). [#46853](https://github.com/ClickHouse/ClickHouse/pull/46853) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Exceptions thrown when numbers cannot be parsed now have an easier-to-read exception message. [#46917](https://github.com/ClickHouse/ClickHouse/pull/46917) ([Robert Schulze](https://github.com/rschu1ze)).
* Added update `system.backups` after every processed task to track the progress of backups. [#46989](https://github.com/ClickHouse/ClickHouse/pull/46989) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Allow types conversion in Native input format. Add settings `input_format_native_allow_types_conversion` that controls it (enabled by default). [#46990](https://github.com/ClickHouse/ClickHouse/pull/46990) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow IPv4 in the `range` function to generate IP ranges. [#46995](https://github.com/ClickHouse/ClickHouse/pull/46995) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Improve exception message when it's impossible to move a part from one volume/disk to another. [#47032](https://github.com/ClickHouse/ClickHouse/pull/47032) ([alesapin](https://github.com/alesapin)).
* Support `Bool` type in `JSONType` function. Previously `Null` type was mistakenly returned for bool values. [#47046](https://github.com/ClickHouse/ClickHouse/pull/47046) ([Anton Popov](https://github.com/CurtizJ)).
* Use `_request_body` parameter to configure predefined HTTP queries. [#47086](https://github.com/ClickHouse/ClickHouse/pull/47086) ([Constantine Peresypkin](https://github.com/pkit)).
* Automatic indentation in the built-in UI SQL editor when Enter is pressed. [#47113](https://github.com/ClickHouse/ClickHouse/pull/47113) ([Alexey Korepanov](https://github.com/alexkorep)).
* Self-extraction with 'sudo' will attempt to set uid and gid of extracted files to running user. [#47116](https://github.com/ClickHouse/ClickHouse/pull/47116) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Previously, the `repeat` function's second argument only accepted an unsigned integer type, which meant it could not accept values such as -1. This behavior differed from that of the Spark function. In this update, the repeat function has been modified to match the behavior of the Spark function. It now accepts the same types of inputs, including negative integers. Extensive testing has been performed to verify the correctness of the updated implementation. [#47134](https://github.com/ClickHouse/ClickHouse/pull/47134) ([KevinyhZou](https://github.com/KevinyhZou)). Note: the changelog entry was rewritten by ChatGPT.
* Remove `::__1` part from stacktraces. Display `std::basic_string<char, ...` as `String` in stacktraces. [#47171](https://github.com/ClickHouse/ClickHouse/pull/47171) ([Mike Kot](https://github.com/myrrc)).
* Reimplement interserver mode to avoid replay attacks (note, that change is backward compatible with older servers). [#47213](https://github.com/ClickHouse/ClickHouse/pull/47213) ([Azat Khuzhin](https://github.com/azat)).
* Improve recognition of regular expression groups and refine the regexp_tree dictionary. [#47218](https://github.com/ClickHouse/ClickHouse/pull/47218) ([Han Fei](https://github.com/hanfei1991)).
* Keeper improvement: Add new 4LW `clrs` to clean resources used by Keeper (e.g. release unused memory). [#47256](https://github.com/ClickHouse/ClickHouse/pull/47256) ([Antonio Andelic](https://github.com/antonio2368)).
* Add optional arguments to codecs `DoubleDelta(bytes_size)`, `Gorilla(bytes_size)`, `FPC(level, float_size)`, this allows using these codecs without column type in `clickhouse-compressor`. Fix possible aborts and arithmetic errors in `clickhouse-compressor` with these codecs. Fixes: https://github.com/ClickHouse/ClickHouse/discussions/47262. [#47271](https://github.com/ClickHouse/ClickHouse/pull/47271) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for big int types to the `runningDifference` function. Closes [#47194](https://github.com/ClickHouse/ClickHouse/issues/47194). [#47322](https://github.com/ClickHouse/ClickHouse/pull/47322) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add an expiration window for S3 credentials that have an expiration time to avoid `ExpiredToken` errors in some edge cases. It can be controlled with `expiration_window_seconds` config, the default is 120 seconds. [#47423](https://github.com/ClickHouse/ClickHouse/pull/47423) ([Antonio Andelic](https://github.com/antonio2368)).
* Support Decimals and Date32 in `Avro` format. [#47434](https://github.com/ClickHouse/ClickHouse/pull/47434) ([Kruglov Pavel](https://github.com/Avogar)).
* Do not start the server if an interrupted conversion from `Ordinary` to `Atomic` was detected, print a better error message with troubleshooting instructions. [#47487](https://github.com/ClickHouse/ClickHouse/pull/47487) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add a new column `kind` to the `system.opentelemetry_span_log`. This column holds the value of [SpanKind](https://opentelemetry.io/docs/reference/specification/trace/api/#spankind) defined in OpenTelemtry. [#47499](https://github.com/ClickHouse/ClickHouse/pull/47499) ([Frank Chen](https://github.com/FrankChen021)).
* Allow reading/writing nested arrays in `Protobuf` format with only the root field name as column name. Previously the column name had to contain all nested field names (like `a.b.c Array(Array(Array(UInt32)))`); now you can use just `a Array(Array(Array(UInt32)))`. [#47650](https://github.com/ClickHouse/ClickHouse/pull/47650) ([Kruglov Pavel](https://github.com/Avogar)).
* Added an optional `STRICT` modifier for `SYSTEM SYNC REPLICA` which makes the query wait for the replication queue to become empty (just like it worked before https://github.com/ClickHouse/ClickHouse/pull/45648); see the example after this list. [#47659](https://github.com/ClickHouse/ClickHouse/pull/47659) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Improve the naming of some OpenTelemetry span logs. [#47667](https://github.com/ClickHouse/ClickHouse/pull/47667) ([Frank Chen](https://github.com/FrankChen021)).
* Prevent using too long chains of aggregate function combinators (they can lead to slow queries in the analysis stage). This closes [#47715](https://github.com/ClickHouse/ClickHouse/issues/47715). [#47716](https://github.com/ClickHouse/ClickHouse/pull/47716) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support for subquery in parameterized views; resolves [#46741](https://github.com/ClickHouse/ClickHouse/issues/46741) [#47725](https://github.com/ClickHouse/ClickHouse/pull/47725) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix memory leak in MySQL integration (reproduces with `connection_auto_close=1`). [#47732](https://github.com/ClickHouse/ClickHouse/pull/47732) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improved error handling in the code related to Decimal parameters, resulting in more informative error messages. Previously, when incorrect Decimal parameters were supplied, the error message generated was unclear or unhelpful. With this update, the error message printed has been fixed to provide more detailed and useful information, making it easier to identify and correct issues related to Decimal parameters. [#47812](https://github.com/ClickHouse/ClickHouse/pull/47812) ([Yu Feng](https://github.com/Vigor-jpg)). Note: this changelog entry is rewritten by ChatGPT.
* The setting `exact_rows_before_limit` makes the `rows_before_limit_at_least` statistic accurately reflect the number of rows returned before the limit was reached. This pull request addresses cases involving distributed processing across multiple shards or sorting operations, which did not work as intended prior to this update. [#47874](https://github.com/ClickHouse/ClickHouse/pull/47874) ([Amos Bird](https://github.com/amosbird)).
* ThreadPools metrics introspection. [#47880](https://github.com/ClickHouse/ClickHouse/pull/47880) ([Azat Khuzhin](https://github.com/azat)).
* Add `WriteBufferFromS3Microseconds` and `WriteBufferFromS3RequestsErrors` profile events. [#47885](https://github.com/ClickHouse/ClickHouse/pull/47885) ([Antonio Andelic](https://github.com/antonio2368)).
* Add `--link` and `--noninteractive` (`-y`) options to ClickHouse install. Closes [#47750](https://github.com/ClickHouse/ClickHouse/issues/47750). [#47887](https://github.com/ClickHouse/ClickHouse/pull/47887) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fixed `UNKNOWN_TABLE` exception when attaching to a materialized view that has dependent tables that are not available. This might be useful when trying to restore state from a backup. [#47975](https://github.com/ClickHouse/ClickHouse/pull/47975) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Fix case when the (optional) path is not added to an encrypted disk configuration. [#47981](https://github.com/ClickHouse/ClickHouse/pull/47981) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support for CTE in parameterized views Implementation: Updated to allow query parameters while evaluating scalar subqueries. [#48065](https://github.com/ClickHouse/ClickHouse/pull/48065) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Support big integers `(U)Int128/(U)Int256`, `Map` with any key type and `DateTime64` with any precision (not only 3 and 6). [#48119](https://github.com/ClickHouse/ClickHouse/pull/48119) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow skipping errors related to unknown enum values in row input formats. [#48133](https://github.com/ClickHouse/ClickHouse/pull/48133) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
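Two of the improvements above, sketched as queries with placeholder table names:

```sql
-- STRICT makes SYSTEM SYNC REPLICA wait until the replication queue is fully empty.
SYSTEM SYNC REPLICA db.tbl STRICT;

-- materialized_views_ignore_errors lets an INSERT succeed even if a dependent
-- materialized view fails; it is `false` by default.
INSERT INTO db.tbl SETTINGS materialized_views_ignore_errors = 1 VALUES (1, 'x');
```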
#### Build/Testing/Packaging Improvement
* ClickHouse now builds with `C++23`. [#47424](https://github.com/ClickHouse/ClickHouse/pull/47424) ([Robert Schulze](https://github.com/rschu1ze)).
* Fuzz `EXPLAIN` queries in the AST Fuzzer. [#47803](https://github.com/ClickHouse/ClickHouse/pull/47803) [#47852](https://github.com/ClickHouse/ClickHouse/pull/47852) ([flynn](https://github.com/ucasfl)).
* Split stress test and the automated backward compatibility check (now Upgrade check). [#44879](https://github.com/ClickHouse/ClickHouse/pull/44879) ([Kruglov Pavel](https://github.com/Avogar)).
* Updated the Ubuntu Image for Docker to calm down some bogus security reports. [#46784](https://github.com/ClickHouse/ClickHouse/pull/46784) ([Julio Jimenez](https://github.com/juliojimenez)). Please note that ClickHouse has no dependencies and does not require Docker.
* Adds a prompt to allow the removal of an existing `clickhouse` download when using "curl | sh" download of ClickHouse. Prompt is "ClickHouse binary clickhouse already exists. Overwrite? \[y/N\]". [#46859](https://github.com/ClickHouse/ClickHouse/pull/46859) ([Dan Roscigno](https://github.com/DanRoscigno)).
* Fix error during server startup on old distros (e.g. Amazon Linux 2) and on ARM that glibc 2.28 symbols are not found. [#47008](https://github.com/ClickHouse/ClickHouse/pull/47008) ([Robert Schulze](https://github.com/rschu1ze)).
* Prepare for clang 16. [#47027](https://github.com/ClickHouse/ClickHouse/pull/47027) ([Amos Bird](https://github.com/amosbird)).
* Added a CI check which ensures ClickHouse can run with an old glibc on ARM. [#47063](https://github.com/ClickHouse/ClickHouse/pull/47063) ([Robert Schulze](https://github.com/rschu1ze)).
* Add a style check to prevent incorrect usage of the `NDEBUG` macro. [#47699](https://github.com/ClickHouse/ClickHouse/pull/47699) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Speed up the build a little. [#47714](https://github.com/ClickHouse/ClickHouse/pull/47714) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Bump `vectorscan` to 5.4.9. [#47955](https://github.com/ClickHouse/ClickHouse/pull/47955) ([Robert Schulze](https://github.com/rschu1ze)).
* Add a unit test to assert Apache Arrow's fatal logging does not abort. It covers the changes in [ClickHouse/arrow#16](https://github.com/ClickHouse/arrow/pull/16). [#47958](https://github.com/ClickHouse/ClickHouse/pull/47958) ([Arthur Passos](https://github.com/arthurpassos)).
* Restore the ability of native macOS debug server build to start. [#48050](https://github.com/ClickHouse/ClickHouse/pull/48050) ([Robert Schulze](https://github.com/rschu1ze)). Note: this change is only relevant for development, as the ClickHouse official builds are done with cross-compilation.
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix formats parser resetting, test processing bad messages in `Kafka` [#45693](https://github.com/ClickHouse/ClickHouse/pull/45693) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix data size calculation in Keeper [#46086](https://github.com/ClickHouse/ClickHouse/pull/46086) ([Antonio Andelic](https://github.com/antonio2368)).
* Fixed a bug in automatic retries of `DROP TABLE` query with `ReplicatedMergeTree` tables and `Atomic` databases. In rare cases it could lead to `Can't get data for node /zk_path/log_pointer` and `The specified key does not exist` errors if the ZooKeeper session expired during DROP and a new replicated table with the same path in ZooKeeper was created in parallel. [#46384](https://github.com/ClickHouse/ClickHouse/pull/46384) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix incorrect alias recursion while normalizing queries that prevented some queries to run. [#46609](https://github.com/ClickHouse/ClickHouse/pull/46609) ([Raúl Marín](https://github.com/Algunenano)).
* Fix IPv4/IPv6 serialization/deserialization in binary formats [#46616](https://github.com/ClickHouse/ClickHouse/pull/46616) ([Kruglov Pavel](https://github.com/Avogar)).
* ActionsDAG: do not change result of `and` during optimization [#46653](https://github.com/ClickHouse/ClickHouse/pull/46653) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Improve query cancellation when a client dies [#46681](https://github.com/ClickHouse/ClickHouse/pull/46681) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix arithmetic operations in aggregate optimization [#46705](https://github.com/ClickHouse/ClickHouse/pull/46705) ([Duc Canh Le](https://github.com/canhld94)).
* Fix possible `clickhouse-local`'s abort on JSONEachRow schema inference [#46731](https://github.com/ClickHouse/ClickHouse/pull/46731) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix changing an expired role [#46772](https://github.com/ClickHouse/ClickHouse/pull/46772) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix combined PREWHERE column accumulation from multiple steps [#46785](https://github.com/ClickHouse/ClickHouse/pull/46785) ([Alexander Gololobov](https://github.com/davenger)).
* Use initial range for fetching file size in HTTP read buffer. Without this change, some remote files couldn't be processed. [#46824](https://github.com/ClickHouse/ClickHouse/pull/46824) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix the incorrect progress bar while using the URL tables [#46830](https://github.com/ClickHouse/ClickHouse/pull/46830) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix MSan report in `maxIntersections` function [#46847](https://github.com/ClickHouse/ClickHouse/pull/46847) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix a bug in `Map` data type [#46856](https://github.com/ClickHouse/ClickHouse/pull/46856) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix wrong results of some LIKE searches when the LIKE pattern contains quoted non-quotable characters [#46875](https://github.com/ClickHouse/ClickHouse/pull/46875) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix - WITH FILL would produce abort when the Filling Transform processing an empty block [#46897](https://github.com/ClickHouse/ClickHouse/pull/46897) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix date and int inference from string in JSON [#46972](https://github.com/ClickHouse/ClickHouse/pull/46972) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix bug in zero-copy replication disk choice during fetch [#47010](https://github.com/ClickHouse/ClickHouse/pull/47010) ([alesapin](https://github.com/alesapin)).
* Fix a typo in systemd service definition [#47051](https://github.com/ClickHouse/ClickHouse/pull/47051) ([Palash Goel](https://github.com/palash-goel)).
* Fix the NOT_IMPLEMENTED error with CROSS JOIN and algorithm = auto [#47068](https://github.com/ClickHouse/ClickHouse/pull/47068) ([Vladimir C](https://github.com/vdimir)).
* Fix the problem that the 'ReplicatedMergeTree' table failed to insert two similar data when the 'part_type' is configured as 'InMemory' mode (experimental feature). [#47121](https://github.com/ClickHouse/ClickHouse/pull/47121) ([liding1992](https://github.com/liding1992)).
* External dictionaries / library-bridge: Fix error "unknown library method 'extDict_libClone'" [#47136](https://github.com/ClickHouse/ClickHouse/pull/47136) ([alex filatov](https://github.com/phil-88)).
* Fix race condition in a grace hash join with limit [#47153](https://github.com/ClickHouse/ClickHouse/pull/47153) ([Vladimir C](https://github.com/vdimir)).
* Fix concrete columns PREWHERE support [#47154](https://github.com/ClickHouse/ClickHouse/pull/47154) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible deadlock in Query Status [#47161](https://github.com/ClickHouse/ClickHouse/pull/47161) ([Kruglov Pavel](https://github.com/Avogar)).
* Forbid insert select for the same `Join` table, as it leads to a deadlock [#47260](https://github.com/ClickHouse/ClickHouse/pull/47260) ([Vladimir C](https://github.com/vdimir)).
* Skip merged partitions for `min_age_to_force_merge_seconds` merges [#47303](https://github.com/ClickHouse/ClickHouse/pull/47303) ([Antonio Andelic](https://github.com/antonio2368)).
* Modify find_first_symbols, so it works as expected for find_first_not_symbols [#47304](https://github.com/ClickHouse/ClickHouse/pull/47304) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix big numbers inference in CSV [#47410](https://github.com/ClickHouse/ClickHouse/pull/47410) ([Kruglov Pavel](https://github.com/Avogar)).
* Disable logical expression optimizer for expression with aliases. [#47451](https://github.com/ClickHouse/ClickHouse/pull/47451) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix error in `decodeURLComponent` [#47457](https://github.com/ClickHouse/ClickHouse/pull/47457) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix explain graph with projection [#47473](https://github.com/ClickHouse/ClickHouse/pull/47473) ([flynn](https://github.com/ucasfl)).
* Fix query parameters [#47488](https://github.com/ClickHouse/ClickHouse/pull/47488) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Parameterized view: a bug fix. [#47495](https://github.com/ClickHouse/ClickHouse/pull/47495) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fuzzer of data formats, and the corresponding fixes. [#47519](https://github.com/ClickHouse/ClickHouse/pull/47519) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix monotonicity check for `DateTime64` [#47526](https://github.com/ClickHouse/ClickHouse/pull/47526) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix "block structure mismatch" for a Nullable LowCardinality column [#47537](https://github.com/ClickHouse/ClickHouse/pull/47537) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Proper fix for a bug in Apache Parquet [#45878](https://github.com/ClickHouse/ClickHouse/issues/45878) [#47538](https://github.com/ClickHouse/ClickHouse/pull/47538) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix `BSONEachRow` parallel parsing when document size is invalid [#47540](https://github.com/ClickHouse/ClickHouse/pull/47540) ([Kruglov Pavel](https://github.com/Avogar)).
* Preserve error in `system.distribution_queue` on `SYSTEM FLUSH DISTRIBUTED` [#47541](https://github.com/ClickHouse/ClickHouse/pull/47541) ([Azat Khuzhin](https://github.com/azat)).
* Check for duplicate column in `BSONEachRow` format [#47609](https://github.com/ClickHouse/ClickHouse/pull/47609) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix wait for zero copy lock during move [#47631](https://github.com/ClickHouse/ClickHouse/pull/47631) ([alesapin](https://github.com/alesapin)).
* Fix aggregation by partitions [#47634](https://github.com/ClickHouse/ClickHouse/pull/47634) ([Nikita Taranov](https://github.com/nickitat)).
* Fix bug in tuple as array serialization in `BSONEachRow` format [#47690](https://github.com/ClickHouse/ClickHouse/pull/47690) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash in `polygonsSymDifferenceCartesian` [#47702](https://github.com/ClickHouse/ClickHouse/pull/47702) ([pufit](https://github.com/pufit)).
* Fix reading from storage `File` compressed files with `zlib` and `gzip` compression [#47796](https://github.com/ClickHouse/ClickHouse/pull/47796) ([Anton Popov](https://github.com/CurtizJ)).
* Improve empty query detection for PostgreSQL (for pgx golang driver) [#47854](https://github.com/ClickHouse/ClickHouse/pull/47854) ([Azat Khuzhin](https://github.com/azat)).
* Fix DateTime monotonicity check for LowCardinality types [#47860](https://github.com/ClickHouse/ClickHouse/pull/47860) ([Antonio Andelic](https://github.com/antonio2368)).
* Use restore_threads (not backup_threads) for RESTORE ASYNC [#47861](https://github.com/ClickHouse/ClickHouse/pull/47861) ([Azat Khuzhin](https://github.com/azat)).
* Fix DROP COLUMN with ReplicatedMergeTree containing projections [#47883](https://github.com/ClickHouse/ClickHouse/pull/47883) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix for Replicated database recovery [#47901](https://github.com/ClickHouse/ClickHouse/pull/47901) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Hotfix for too verbose warnings in HTTP [#47903](https://github.com/ClickHouse/ClickHouse/pull/47903) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix "Field value too long" in `catboostEvaluate` [#47970](https://github.com/ClickHouse/ClickHouse/pull/47970) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix [#36971](https://github.com/ClickHouse/ClickHouse/issues/36971): Watchdog: exit with non-zero code if child process exits [#47973](https://github.com/ClickHouse/ClickHouse/pull/47973) ([Коренберг Марк](https://github.com/socketpair)).
* Fix for "index file `cidx` is unexpectedly long" [#48010](https://github.com/ClickHouse/ClickHouse/pull/48010) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix MaterializedPostgreSQL query to get attributes (replica-identity) [#48015](https://github.com/ClickHouse/ClickHouse/pull/48015) ([Solomatov Sergei](https://github.com/solomatovs)).
* parseDateTime(): Fix UB (signed integer overflow) [#48019](https://github.com/ClickHouse/ClickHouse/pull/48019) ([Robert Schulze](https://github.com/rschu1ze)).
* Use unique names for Records in Avro to avoid reusing its schema [#48057](https://github.com/ClickHouse/ClickHouse/pull/48057) ([Kruglov Pavel](https://github.com/Avogar)).
* Correctly set TCP/HTTP socket timeouts in Keeper [#48108](https://github.com/ClickHouse/ClickHouse/pull/48108) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix possible member call on null pointer in `Avro` format [#48184](https://github.com/ClickHouse/ClickHouse/pull/48184) ([Kruglov Pavel](https://github.com/Avogar)).
### <a id="232"></a> ClickHouse release 23.2, 2023-02-23
#### Backward Incompatible Change


@@ -568,7 +568,7 @@ if (NATIVE_BUILD_TARGETS
         COMMAND ${CMAKE_COMMAND}
             "-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}"
             "-DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}"
-            "-DENABLE_CCACHE=${ENABLE_CCACHE}"
+            "-DCOMPILER_CACHE=${COMPILER_CACHE}"
             # Avoid overriding .cargo/config.toml with native toolchain.
             "-DENABLE_RUST=OFF"
             "-DENABLE_CLICKHOUSE_SELF_EXTRACTING=${ENABLE_CLICKHOUSE_SELF_EXTRACTING}"


@@ -1,5 +1,6 @@
 # Setup integration with ccache to speed up builds, see https://ccache.dev/

+# Matches both ccache and sccache
 if (CMAKE_CXX_COMPILER_LAUNCHER MATCHES "ccache" OR CMAKE_C_COMPILER_LAUNCHER MATCHES "ccache")
     # custom compiler launcher already defined, most likely because cmake was invoked with like "-DCMAKE_CXX_COMPILER_LAUNCHER=ccache" or
     # via environment variable --> respect setting and trust that the launcher was specified correctly
@@ -8,45 +9,65 @@ if (CMAKE_CXX_COMPILER_LAUNCHER MATCHES "ccache" OR CMAKE_C_COMPILER_LAUNCHER MA
     return()
 endif()

-option(ENABLE_CCACHE "Speedup re-compilations using ccache (external tool)" ON)
-
-if (NOT ENABLE_CCACHE)
-    message(STATUS "Using ccache: no (disabled via configuration)")
-    return()
+set(ENABLE_CCACHE "default" CACHE STRING "Deprecated, use COMPILER_CACHE=(auto|ccache|sccache|disabled)")
+
+if (NOT ENABLE_CCACHE STREQUAL "default")
+    message(WARNING "The -DENABLE_CCACHE is deprecated in favor of -DCOMPILER_CACHE")
 endif()

+set(COMPILER_CACHE "auto" CACHE STRING "Speedup re-compilations using the caching tools; valid options are 'auto' (ccache, then sccache), 'ccache', 'sccache', or 'disabled'")
+
+# It has pretty complex logic, because the ENABLE_CCACHE is deprecated, but still should
+# control the COMPILER_CACHE
+# After it will be completely removed, the following block will be much simpler
+if (COMPILER_CACHE STREQUAL "ccache" OR (ENABLE_CCACHE AND NOT ENABLE_CCACHE STREQUAL "default"))
+    find_program (CCACHE_EXECUTABLE ccache)
+elseif(COMPILER_CACHE STREQUAL "disabled" OR NOT ENABLE_CCACHE STREQUAL "default")
+    message(STATUS "Using *ccache: no (disabled via configuration)")
+    return()
+elseif(COMPILER_CACHE STREQUAL "auto")
+    find_program (CCACHE_EXECUTABLE ccache sccache)
+elseif(COMPILER_CACHE STREQUAL "sccache")
+    find_program (CCACHE_EXECUTABLE sccache)
+else()
+    message(${RECONFIGURE_MESSAGE_LEVEL} "The COMPILER_CACHE must be one of (auto|ccache|sccache|disabled), given '${COMPILER_CACHE}'")
+endif()
+
-find_program (CCACHE_EXECUTABLE ccache)
 if (NOT CCACHE_EXECUTABLE)
-    message(${RECONFIGURE_MESSAGE_LEVEL} "Using ccache: no (Could not find find ccache. To significantly reduce compile times for the 2nd, 3rd, etc. build, it is highly recommended to install ccache. To suppress this message, run cmake with -DENABLE_CCACHE=0)")
+    message(${RECONFIGURE_MESSAGE_LEVEL} "Using *ccache: no (Could not find find ccache or sccache. To significantly reduce compile times for the 2nd, 3rd, etc. build, it is highly recommended to install one of them. To suppress this message, run cmake with -DCOMPILER_CACHE=disabled)")
     return()
 endif()

-execute_process(COMMAND ${CCACHE_EXECUTABLE} "-V" OUTPUT_VARIABLE CCACHE_VERSION)
-string(REGEX REPLACE "ccache version ([0-9\\.]+).*" "\\1" CCACHE_VERSION ${CCACHE_VERSION})
+if (CCACHE_EXECUTABLE MATCHES "/ccache$")
+    execute_process(COMMAND ${CCACHE_EXECUTABLE} "-V" OUTPUT_VARIABLE CCACHE_VERSION)
+    string(REGEX REPLACE "ccache version ([0-9\\.]+).*" "\\1" CCACHE_VERSION ${CCACHE_VERSION})

-set (CCACHE_MINIMUM_VERSION 3.3)
+    set (CCACHE_MINIMUM_VERSION 3.3)

-if (CCACHE_VERSION VERSION_LESS_EQUAL ${CCACHE_MINIMUM_VERSION})
-    message(${RECONFIGURE_MESSAGE_LEVEL} "Using ccache: no (found ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION}), the minimum required version is ${CCACHE_MINIMUM_VERSION}")
-    return()
-endif()
+    if (CCACHE_VERSION VERSION_LESS_EQUAL ${CCACHE_MINIMUM_VERSION})
+        message(${RECONFIGURE_MESSAGE_LEVEL} "Using ccache: no (found ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION}), the minimum required version is ${CCACHE_MINIMUM_VERSION}")
+        return()
+    endif()

-message(STATUS "Using ccache: ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION})")
-set(LAUNCHER ${CCACHE_EXECUTABLE})
+    message(STATUS "Using ccache: ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION})")
+    set(LAUNCHER ${CCACHE_EXECUTABLE})

-# Work around a well-intended but unfortunate behavior of ccache 4.0 & 4.1 with
-# environment variable SOURCE_DATE_EPOCH. This variable provides an alternative
-# to source-code embedded timestamps (__DATE__/__TIME__) and therefore helps with
-# reproducible builds (*). SOURCE_DATE_EPOCH is set automatically by the
-# distribution, e.g. Debian. Ccache 4.0 & 4.1 incorporate SOURCE_DATE_EPOCH into
-# the hash calculation regardless they contain timestamps or not. This invalidates
-# the cache whenever SOURCE_DATE_EPOCH changes. As a fix, ignore SOURCE_DATE_EPOCH.
-#
-# (*) https://reproducible-builds.org/specs/source-date-epoch/
-if (CCACHE_VERSION VERSION_GREATER_EQUAL "4.0" AND CCACHE_VERSION VERSION_LESS "4.2")
-    message(STATUS "Ignore SOURCE_DATE_EPOCH for ccache 4.0 / 4.1")
-    set(LAUNCHER env -u SOURCE_DATE_EPOCH ${CCACHE_EXECUTABLE})
+    # Work around a well-intended but unfortunate behavior of ccache 4.0 & 4.1 with
+    # environment variable SOURCE_DATE_EPOCH. This variable provides an alternative
+    # to source-code embedded timestamps (__DATE__/__TIME__) and therefore helps with
+    # reproducible builds (*). SOURCE_DATE_EPOCH is set automatically by the
+    # distribution, e.g. Debian. Ccache 4.0 & 4.1 incorporate SOURCE_DATE_EPOCH into
+    # the hash calculation regardless they contain timestamps or not. This invalidates
+    # the cache whenever SOURCE_DATE_EPOCH changes. As a fix, ignore SOURCE_DATE_EPOCH.
+    #
+    # (*) https://reproducible-builds.org/specs/source-date-epoch/
+    if (CCACHE_VERSION VERSION_GREATER_EQUAL "4.0" AND CCACHE_VERSION VERSION_LESS "4.2")
+        message(STATUS "Ignore SOURCE_DATE_EPOCH for ccache 4.0 / 4.1")
+        set(LAUNCHER env -u SOURCE_DATE_EPOCH ${CCACHE_EXECUTABLE})
+    endif()
+elseif(CCACHE_EXECUTABLE MATCHES "/sccache$")
+    message(STATUS "Using sccache: ${CCACHE_EXECUTABLE}")
+    set(LAUNCHER ${CCACHE_EXECUTABLE})
 endif()

 set (CMAKE_CXX_COMPILER_LAUNCHER ${LAUNCHER} ${CMAKE_CXX_COMPILER_LAUNCHER})

contrib/boost vendored

@@ -1 +1 @@
-Subproject commit 03d9ec9cd159d14bd0b17c05138098451a1ea606
+Subproject commit 8fe7b3326ef482ee6ecdf5a4f698f2b8c2780f98

@@ -1 +1 @@
-Subproject commit 4bfaeb31dd0ef13f025221f93c138974a3e0a22a
+Subproject commit e0accd517933ebb44aff84bc8db448ffd8ef1929

@@ -1 +1 @@
-Subproject commit 400ad7152a0c7ee07756d96ab4f6a8f6d1080916
+Subproject commit 20598079891d27ef1a3ad3f66bbfa3f983c25268


@@ -1,3 +1,6 @@
+# The Dockerfile.ubuntu exists for the tests/ci/docker_server.py script
+# If the image is built from Dockerfile.alpine, then the `-alpine` suffix is added automatically,
+# so the only purpose of Dockerfile.ubuntu is to push `latest`, `head` and so on w/o suffixes
 FROM ubuntu:20.04 AS glibc-donor
 ARG TARGETARCH


@@ -0,0 +1 @@
+Dockerfile


@@ -69,13 +69,14 @@ RUN add-apt-repository ppa:ubuntu-toolchain-r/test --yes \
     libc6 \
     libc6-dev \
     libc6-dev-arm64-cross \
+    python3-boto3 \
     yasm \
     zstd \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists

 # Download toolchain and SDK for Darwin
-RUN wget -nv https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz
+RUN curl -sL -O https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz

 # Architecture of the image when BuildKit/buildx is used
 ARG TARGETARCH
@@ -97,7 +98,7 @@ ENV PATH="$PATH:/usr/local/go/bin"
 ENV GOPATH=/workdir/go
 ENV GOCACHE=/workdir/

-ARG CLANG_TIDY_SHA1=03644275e794b0587849bfc2ec6123d5ae0bdb1c
+ARG CLANG_TIDY_SHA1=c191254ea00d47ade11d7170ef82fe038c213774
 RUN curl -Lo /usr/bin/clang-tidy-cache \
     "https://raw.githubusercontent.com/matus-chochlik/ctcache/$CLANG_TIDY_SHA1/clang-tidy-cache" \
     && chmod +x /usr/bin/clang-tidy-cache


@@ -6,6 +6,7 @@ exec &> >(ts)
 ccache_status () {
     ccache --show-config ||:
     ccache --show-stats ||:
+    SCCACHE_NO_DAEMON=1 sccache --show-stats ||:
 }

 [ -O /build ] || git config --global --add safe.directory /build


@@ -5,13 +5,19 @@ import os
 import argparse
 import logging
 import sys
-from typing import List
+from pathlib import Path
+from typing import List, Optional

-SCRIPT_PATH = os.path.realpath(__file__)
+SCRIPT_PATH = Path(__file__).absolute()
 IMAGE_TYPE = "binary"
+IMAGE_NAME = f"clickhouse/{IMAGE_TYPE}-builder"

-def check_image_exists_locally(image_name):
+class BuildException(Exception):
+    pass
+
+
+def check_image_exists_locally(image_name: str) -> bool:
     try:
         output = subprocess.check_output(
             f"docker images -q {image_name} 2> /dev/null", shell=True
@@ -21,17 +27,17 @@ def check_image_exists_locally(image_name):
     return False

-def pull_image(image_name):
+def pull_image(image_name: str) -> bool:
     try:
         subprocess.check_call(f"docker pull {image_name}", shell=True)
         return True
     except subprocess.CalledProcessError:
-        logging.info(f"Cannot pull image {image_name}".format())
+        logging.info("Cannot pull image %s", image_name)
         return False

-def build_image(image_name, filepath):
-    context = os.path.dirname(filepath)
+def build_image(image_name: str, filepath: Path) -> None:
+    context = filepath.parent
     build_cmd = f"docker build --network=host -t {image_name} -f {filepath} {context}"
     logging.info("Will build image with cmd: '%s'", build_cmd)
     subprocess.check_call(
@@ -40,7 +46,7 @@ def build_image(image_name, filepath):
     )

-def pre_build(repo_path: str, env_variables: List[str]):
+def pre_build(repo_path: Path, env_variables: List[str]):
     if "WITH_PERFORMANCE=1" in env_variables:
         current_branch = subprocess.check_output(
             "git branch --show-current", shell=True, encoding="utf-8"
@@ -56,7 +62,9 @@ def pre_build(repo_path: str, env_variables: List[str]):
         # conclusion is: in the current state the easiest way to go is to force
         # unshallow repository for performance artifacts.
         # To change it we need to rework our performance tests docker image
-        raise Exception("shallow repository is not suitable for performance builds")
+        raise BuildException(
+            "shallow repository is not suitable for performance builds"
+        )
     if current_branch != "master":
         cmd = (
             f"git -C {repo_path} fetch --no-recurse-submodules "
@@ -67,14 +75,14 @@
 def run_docker_image_with_env(
-    image_name,
-    as_root,
-    output,
-    env_variables,
-    ch_root,
-    ccache_dir,
-    docker_image_version,
+    image_name: str,
+    as_root: bool,
+    output_dir: Path,
+    env_variables: List[str],
+    ch_root: Path,
+    ccache_dir: Optional[Path],
 ):
+    output_dir.mkdir(parents=True, exist_ok=True)
     env_part = " -e ".join(env_variables)
     if env_part:
         env_part = " -e " + env_part
@@ -89,10 +97,14 @@
     else:
         user = f"{os.geteuid()}:{os.getegid()}"

+    ccache_mount = f"--volume={ccache_dir}:/ccache"
+    if ccache_dir is None:
+        ccache_mount = ""
+
     cmd = (
-        f"docker run --network=host --user={user} --rm --volume={output}:/output "
-        f"--volume={ch_root}:/build --volume={ccache_dir}:/ccache {env_part} "
-        f"{interactive} {image_name}:{docker_image_version}"
+        f"docker run --network=host --user={user} --rm {ccache_mount}"
+        f"--volume={output_dir}:/output --volume={ch_root}:/build {env_part} "
+        f"{interactive} {image_name}"
     )

     logging.info("Will build ClickHouse pkg with cmd: '%s'", cmd)
@@ -100,24 +112,25 @@
     subprocess.check_call(cmd, shell=True)

-def is_release_build(build_type, package_type, sanitizer):
+def is_release_build(build_type: str, package_type: str, sanitizer: str) -> bool:
     return build_type == "" and package_type == "deb" and sanitizer == ""

 def parse_env_variables(
-    build_type,
-    compiler,
-    sanitizer,
-    package_type,
-    cache,
-    distcc_hosts,
-    clang_tidy,
-    version,
-    author,
-    official,
-    additional_pkgs,
-    with_coverage,
-    with_binaries,
+    build_type: str,
+    compiler: str,
+    sanitizer: str,
+    package_type: str,
+    cache: str,
+    s3_bucket: str,
+    s3_directory: str,
+    s3_rw_access: bool,
+    clang_tidy: bool,
+    version: str,
+    official: bool,
+    additional_pkgs: bool,
+    with_coverage: bool,
+    with_binaries: str,
 ):
     DARWIN_SUFFIX = "-darwin"
     DARWIN_ARM_SUFFIX = "-darwin-aarch64"
@ -243,32 +256,43 @@ def parse_env_variables(
else:
result.append("BUILD_TYPE=None")
if cache == "distcc":
result.append(f"CCACHE_PREFIX={cache}")
if not cache:
cmake_flags.append("-DCOMPILER_CACHE=disabled")
if cache:
if cache == "ccache":
cmake_flags.append("-DCOMPILER_CACHE=ccache")
result.append("CCACHE_DIR=/ccache")
result.append("CCACHE_COMPRESSLEVEL=5")
result.append("CCACHE_BASEDIR=/build")
result.append("CCACHE_NOHASHDIR=true")
result.append("CCACHE_COMPILERCHECK=content")
cache_maxsize = "15G"
if clang_tidy:
# 15G is not enough for tidy build
cache_maxsize = "25G"
result.append("CCACHE_MAXSIZE=15G")
# `CTCACHE_DIR` has the same purpose as the `CCACHE_DIR` above.
# It's there to have the clang-tidy cache embedded into our standard `CCACHE_DIR`
if cache == "sccache":
cmake_flags.append("-DCOMPILER_CACHE=sccache")
# see https://github.com/mozilla/sccache/blob/main/docs/S3.md
result.append(f"SCCACHE_BUCKET={s3_bucket}")
sccache_dir = "sccache"
if s3_directory:
sccache_dir = f"{s3_directory}/{sccache_dir}"
result.append(f"SCCACHE_S3_KEY_PREFIX={sccache_dir}")
if not s3_rw_access:
result.append("SCCACHE_S3_NO_CREDENTIALS=true")
if clang_tidy:
# `CTCACHE_DIR` has the same purpose as the `CCACHE_DIR` above.
# It's there to have the clang-tidy cache embedded into our standard `CCACHE_DIR`
if cache == "ccache":
result.append("CTCACHE_DIR=/ccache/clang-tidy-cache")
result.append(f"CCACHE_MAXSIZE={cache_maxsize}")
if distcc_hosts:
hosts_with_params = [f"{host}/24,lzo" for host in distcc_hosts] + [
"localhost/`nproc`"
]
result.append('DISTCC_HOSTS="' + " ".join(hosts_with_params) + '"')
elif cache == "distcc":
result.append('DISTCC_HOSTS="localhost/`nproc`"')
if s3_bucket:
# see https://github.com/matus-chochlik/ctcache#environment-variables
ctcache_dir = "clang-tidy-cache"
if s3_directory:
ctcache_dir = f"{s3_directory}/{ctcache_dir}"
result.append(f"CTCACHE_S3_BUCKET={s3_bucket}")
result.append(f"CTCACHE_S3_FOLDER={ctcache_dir}")
if not s3_rw_access:
result.append("CTCACHE_S3_NO_CREDENTIALS=true")
if additional_pkgs:
# NOTE: These are the envs for the packages/build script
@ -300,9 +324,6 @@ def parse_env_variables(
if version:
result.append(f"VERSION_STRING='{version}'")
if author:
result.append(f"AUTHOR='{author}'")
if official:
cmake_flags.append("-DCLICKHOUSE_OFFICIAL_BUILD=1")
@ -312,14 +333,14 @@ def parse_env_variables(
return result
def dir_name(name: str) -> str:
if not os.path.isabs(name):
name = os.path.abspath(os.path.join(os.getcwd(), name))
return name
def dir_name(name: str) -> Path:
path = Path(name)
if not path.is_absolute():
path = Path.cwd() / name
return path
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
description="ClickHouse building script using prebuilt Docker image",
@ -331,7 +352,7 @@ if __name__ == "__main__":
)
parser.add_argument(
"--clickhouse-repo-path",
default=os.path.join(os.path.dirname(SCRIPT_PATH), os.pardir, os.pardir),
default=SCRIPT_PATH.parents[2],
type=dir_name,
help="ClickHouse git repository",
)
@ -361,17 +382,34 @@ if __name__ == "__main__":
)
parser.add_argument("--clang-tidy", action="store_true")
parser.add_argument("--cache", choices=("ccache", "distcc", ""), default="")
parser.add_argument(
"--ccache_dir",
default=os.getenv("HOME", "") + "/.ccache",
"--cache",
choices=("ccache", "sccache", ""),
default="",
help="ccache or sccache for objects caching; sccache uses only S3 buckets",
)
parser.add_argument(
"--ccache-dir",
default=Path.home() / ".ccache",
type=dir_name,
help="a directory with ccache",
)
parser.add_argument("--distcc-hosts", nargs="+")
parser.add_argument(
"--s3-bucket",
help="an S3 bucket used for sscache and clang-tidy-cache",
)
parser.add_argument(
"--s3-directory",
default="ccache",
help="an S3 directory prefix used for sscache and clang-tidy-cache",
)
parser.add_argument(
"--s3-rw-access",
action="store_true",
help="if set, the build fails on errors writing cache to S3",
)
parser.add_argument("--force-build-image", action="store_true")
parser.add_argument("--version")
parser.add_argument("--author", default="clickhouse", help="a package author")
parser.add_argument("--official", action="store_true")
parser.add_argument("--additional-pkgs", action="store_true")
parser.add_argument("--with-coverage", action="store_true")
@ -387,34 +425,54 @@ if __name__ == "__main__":
args = parser.parse_args()
image_name = f"clickhouse/{IMAGE_TYPE}-builder"
if args.additional_pkgs and args.package_type != "deb":
raise argparse.ArgumentTypeError(
"Can build additional packages only in deb build"
)
if args.cache != "ccache":
args.ccache_dir = None
if args.with_binaries != "":
if args.package_type != "deb":
raise argparse.ArgumentTypeError(
"Can add additional binaries only in deb build"
)
logging.info("Should place %s to output", args.with_binaries)
if args.cache == "sccache":
if not args.s3_bucket:
raise argparse.ArgumentTypeError("sccache must have --s3-bucket set")
return args
def main():
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
args = parse_args()
ch_root = args.clickhouse_repo_path
if args.additional_pkgs and args.package_type != "deb":
raise Exception("Can build additional packages only in deb build")
dockerfile = ch_root / "docker/packager" / IMAGE_TYPE / "Dockerfile"
image_with_version = IMAGE_NAME + ":" + args.docker_image_version
if args.force_build_image:
build_image(image_with_version, dockerfile)
elif not (
check_image_exists_locally(image_with_version) or pull_image(image_with_version)
):
build_image(image_with_version, dockerfile)
if args.with_binaries != "" and args.package_type != "deb":
raise Exception("Can add additional binaries only in deb build")
if args.with_binaries != "" and args.package_type == "deb":
logging.info("Should place %s to output", args.with_binaries)
dockerfile = os.path.join(ch_root, "docker/packager", IMAGE_TYPE, "Dockerfile")
image_with_version = image_name + ":" + args.docker_image_version
if not check_image_exists_locally(image_name) or args.force_build_image:
if not pull_image(image_with_version) or args.force_build_image:
build_image(image_with_version, dockerfile)
env_prepared = parse_env_variables(
args.build_type,
args.compiler,
args.sanitizer,
args.package_type,
args.cache,
args.distcc_hosts,
args.s3_bucket,
args.s3_directory,
args.s3_rw_access,
args.clang_tidy,
args.version,
args.author,
args.official,
args.additional_pkgs,
args.with_coverage,
@ -423,12 +481,15 @@ if __name__ == "__main__":
pre_build(args.clickhouse_repo_path, env_prepared)
run_docker_image_with_env(
image_name,
image_with_version,
args.as_root,
args.output_dir,
env_prepared,
ch_root,
args.ccache_dir,
args.docker_image_version,
)
logging.info("Output placed into %s", args.output_dir)
if __name__ == "__main__":
main()

View File

@ -20,12 +20,6 @@ RUN apt-get update \
zstd \
--yes --no-install-recommends
# Install CMake 3.20+ for Rust compilation
RUN apt purge cmake --yes
RUN wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
RUN apt-add-repository 'deb https://apt.kitware.com/ubuntu/ focal main'
RUN apt update && apt install cmake --yes
RUN pip3 install numpy scipy pandas Jinja2
ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/download/v1.1.4.20200302/clickhouse-odbc-1.1.4-Linux.tar.gz"

View File

@ -16,7 +16,8 @@ export LLVM_VERSION=${LLVM_VERSION:-13}
# it being undefined. Also read it as array so that we can pass an empty list
# of additional variable to cmake properly, and it doesn't generate an extra
# empty parameter.
read -ra FASTTEST_CMAKE_FLAGS <<< "${FASTTEST_CMAKE_FLAGS:-}"
# Read it as CMAKE_FLAGS to not lose exported FASTTEST_CMAKE_FLAGS on subsequential launch
read -ra CMAKE_FLAGS <<< "${FASTTEST_CMAKE_FLAGS:-}"
# Run only matching tests.
FASTTEST_FOCUS=${FASTTEST_FOCUS:-""}
@ -37,6 +38,13 @@ export FASTTEST_DATA
export FASTTEST_OUT
export PATH
function ccache_status
{
ccache --show-config ||:
ccache --show-stats ||:
SCCACHE_NO_DAEMON=1 sccache --show-stats ||:
}
function start_server
{
set -m # Spawn server in its own process groups
@ -171,14 +179,14 @@ function run_cmake
export CCACHE_COMPILERCHECK=content
export CCACHE_MAXSIZE=15G
ccache --show-stats ||:
ccache_status
ccache --zero-stats ||:
mkdir "$FASTTEST_BUILD" ||:
(
cd "$FASTTEST_BUILD"
cmake "$FASTTEST_SOURCE" -DCMAKE_CXX_COMPILER="clang++-${LLVM_VERSION}" -DCMAKE_C_COMPILER="clang-${LLVM_VERSION}" "${CMAKE_LIBS_CONFIG[@]}" "${FASTTEST_CMAKE_FLAGS[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee "$FASTTEST_OUTPUT/cmake_log.txt"
cmake "$FASTTEST_SOURCE" -DCMAKE_CXX_COMPILER="clang++-${LLVM_VERSION}" -DCMAKE_C_COMPILER="clang-${LLVM_VERSION}" "${CMAKE_LIBS_CONFIG[@]}" "${CMAKE_FLAGS[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee "$FASTTEST_OUTPUT/cmake_log.txt"
)
}
@ -193,7 +201,7 @@ function build
strip programs/clickhouse -o "$FASTTEST_OUTPUT/clickhouse-stripped"
zstd --threads=0 "$FASTTEST_OUTPUT/clickhouse-stripped"
fi
ccache --show-stats ||:
ccache_status
ccache --evict-older-than 1d ||:
)
}

View File

@ -92,4 +92,17 @@ RUN mkdir /tmp/ccache \
&& cd / \
&& rm -rf /tmp/ccache
ARG TARGETARCH
ARG SCCACHE_VERSION=v0.4.1
RUN arch=${TARGETARCH:-amd64} \
&& case $arch in \
amd64) rarch=x86_64 ;; \
arm64) rarch=aarch64 ;; \
esac \
&& curl -Ls "https://github.com/mozilla/sccache/releases/download/$SCCACHE_VERSION/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl.tar.gz" | \
tar xz -C /tmp \
&& mv "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl/sccache" /usr/bin \
&& rm "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl" -r
COPY process_functional_tests_result.py /

View File

@ -9,7 +9,7 @@ sidebar_label: 2022
#### Backward Incompatible Change
* Function `yandexConsistentHash` (consistent hashing algorithm by Konstantin "kostik" Oblakov) is renamed to `kostikConsistentHash`. The old name is left as an alias for compatibility. Although this change is backward compatible, we may remove the alias in subsequent releases, that's why it's recommended to update the usages of this function in your apps. [#35553](https://github.com/ClickHouse/ClickHouse/pull/35553) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Do not allow SETTINGS after FORMAT for INSERT queries (there is compatibility setting `parser_settings_after_format_compact` to accept such queries, but it is turned OFF by default). [#35883](https://github.com/ClickHouse/ClickHouse/pull/35883) ([Azat Khuzhin](https://github.com/azat)).
* Do not allow SETTINGS after FORMAT for INSERT queries (there is compatibility setting `allow_settings_after_format_in_insert` to accept such queries, but it is turned OFF by default). [#35883](https://github.com/ClickHouse/ClickHouse/pull/35883) ([Azat Khuzhin](https://github.com/azat)).
* Changed hashed path for cache files. [#36079](https://github.com/ClickHouse/ClickHouse/pull/36079) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### New Feature

View File

@ -85,7 +85,7 @@ The build requires the following components:
- Git (is used only to checkout the sources, its not needed for the build)
- CMake 3.15 or newer
- Ninja
- C++ compiler: clang-14 or newer
- C++ compiler: clang-15 or newer
- Linker: lld
- Yasm
- Gawk

View File

@ -259,4 +259,4 @@ The number of rows in one Kafka message depends on whether the format is row-bas
**See Also**
- [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns)
- [background_message_broker_schedule_pool_size](../../../operations/settings/settings.md#background_message_broker_schedule_pool_size)
- [background_message_broker_schedule_pool_size](../../../operations/server-configuration-parameters/settings.md#background_message_broker_schedule_pool_size)

View File

@ -12,7 +12,7 @@ This engine provides integration with [Amazon S3](https://aws.amazon.com/s3/) ec
``` sql
CREATE TABLE s3_engine_table (name String, value UInt32)
ENGINE = S3(path, [aws_access_key_id, aws_secret_access_key,] format, [compression])
ENGINE = S3(path [, NOSIGN | aws_access_key_id, aws_secret_access_key,] format, [compression])
[PARTITION BY expr]
[SETTINGS ...]
```
@ -20,6 +20,7 @@ CREATE TABLE s3_engine_table (name String, value UInt32)
**Engine parameters**
- `path` — Bucket url with path to file. Supports following wildcards in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, `'abc'`, `'def'` — strings. For more information see [below](#wildcards-in-path).
- `NOSIGN` - If this keyword is provided in place of credentials, none of the requests will be signed.
- `format` — The [format](../../../interfaces/formats.md#formats) of the file.
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. Parameter is optional. If credentials are not specified, they are used from the configuration file. For more information see [Using S3 for Data Storage](../mergetree-family/mergetree.md#table_engine-mergetree-s3).
- `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. Parameter is optional. By default, it will autodetect compression by file extension.
@ -151,6 +152,7 @@ The following settings can be specified in configuration file for given endpoint
- `region` — Specifies S3 region name. Optional.
- `use_insecure_imds_request` — If set to `true`, S3 client will use insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Optional, default value is `false`.
- `expiration_window_seconds` — Grace period for checking if expiration-based credentials have expired. Optional, default value is `120`.
- `no_sign_request` - Ignore all the credentials so requests are not signed. Useful for accessing public buckets.
- `header` — Adds specified HTTP header to a request to given endpoint. Optional, can be specified multiple times.
- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set. Optional.
- `max_single_read_retries` — The maximum number of attempts during single read. Default value is `4`. Optional.
@ -168,6 +170,7 @@ The following settings can be specified in configuration file for given endpoint
<!-- <use_environment_credentials>false</use_environment_credentials> -->
<!-- <use_insecure_imds_request>false</use_insecure_imds_request> -->
<!-- <expiration_window_seconds>120</expiration_window_seconds> -->
<!-- <no_sign_request>false</no_sign_request> -->
<!-- <header>Authorization: Bearer SOME-TOKEN</header> -->
<!-- <server_side_encryption_customer_key_base64>BASE64-ENCODED-KEY</server_side_encryption_customer_key_base64> -->
<!-- <max_single_read_retries>4</max_single_read_retries> -->
@ -175,6 +178,17 @@ The following settings can be specified in configuration file for given endpoint
</s3>
```
## Accessing public buckets
ClickHouse tries to fetch credentials from many different types of sources.
Sometimes this can cause problems when accessing buckets that are public, causing the client to return a `403` error code.
This issue can be avoided by using the `NOSIGN` keyword, which forces the client to ignore all credentials and not sign the requests.
``` sql
CREATE TABLE big_table (name String, value UInt32)
ENGINE = S3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv', NOSIGN, 'CSVWithNames');
```
## See also
- [s3 table function](../../../sql-reference/table-functions/s3.md)

View File

@ -874,7 +874,7 @@ SETTINGS storage_policy = 'moving_from_ssd_to_hdd'
The `default` storage policy implies using only one volume, which consists of only one disk given in `<path>`.
You can change the storage policy after table creation with the [ALTER TABLE ... MODIFY SETTING] query; the new policy should include all the old disks and volumes with the same names (a sketch follows below).
The number of threads performing background moves of data parts can be changed by [background_move_pool_size](/docs/en/operations/settings/settings.md/#background_move_pool_size) setting.
The number of threads performing background moves of data parts can be changed by [background_move_pool_size](/docs/en/operations/server-configuration-parameters/settings.md/#background_move_pool_size) setting.
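For example, a minimal sketch of such a change, assuming a hypothetical table `my_table` and the `moving_from_ssd_to_hdd` policy shown above:
```sql
-- Hypothetical sketch: the new policy must keep all of the old policy's
-- disks and volumes under the same names.
ALTER TABLE my_table MODIFY SETTING storage_policy = 'moving_from_ssd_to_hdd';
```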
### Details {#details}

View File

@ -112,7 +112,7 @@ For each `INSERT` query, approximately ten entries are added to ZooKeeper throug
For very large clusters, you can use different ZooKeeper clusters for different shards. However, from our experience this has not proven necessary based on production clusters with approximately 300 servers.
Replication is asynchronous and multi-master. `INSERT` queries (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the query is run, and then it is copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some latency. If part of the replicas are not available, the data is written when they become available. If a replica is available, the latency is the amount of time it takes to transfer the block of compressed data over the network. The number of threads performing background tasks for replicated tables can be set by [background_schedule_pool_size](/docs/en/operations/settings/settings.md/#background_schedule_pool_size) setting.
Replication is asynchronous and multi-master. `INSERT` queries (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the query is run, and then it is copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some latency. If part of the replicas are not available, the data is written when they become available. If a replica is available, the latency is the amount of time it takes to transfer the block of compressed data over the network. The number of threads performing background tasks for replicated tables can be set by [background_schedule_pool_size](/docs/en/operations/server-configuration-parameters/settings.md/#background_schedule_pool_size) setting.
`ReplicatedMergeTree` engine uses a separate thread pool for replicated fetches. Size of the pool is limited by the [background_fetches_pool_size](/docs/en/operations/settings/settings.md/#background_fetches_pool_size) setting which can be tuned with a server restart.
@ -320,8 +320,8 @@ If the data in ClickHouse Keeper was lost or damaged, you can save data by movin
**See Also**
- [background_schedule_pool_size](/docs/en/operations/settings/settings.md/#background_schedule_pool_size)
- [background_fetches_pool_size](/docs/en/operations/settings/settings.md/#background_fetches_pool_size)
- [background_schedule_pool_size](/docs/en/operations/server-configuration-parameters/settings.md/#background_schedule_pool_size)
- [background_fetches_pool_size](/docs/en/operations/server-configuration-parameters/settings.md/#background_fetches_pool_size)
- [execute_merges_on_single_replica_time_threshold](/docs/en/operations/settings/settings.md/#execute-merges-on-single-replica-time-threshold)
- [max_replicated_fetches_network_bandwidth](/docs/en/operations/settings/merge-tree-settings.md/#max_replicated_fetches_network_bandwidth)
- [max_replicated_sends_network_bandwidth](/docs/en/operations/settings/merge-tree-settings.md/#max_replicated_sends_network_bandwidth)

View File

@ -141,6 +141,10 @@ Clusters are configured in the [server configuration file](../../../operations/c
be used as current user for the query.
-->
<!-- <secret></secret> -->
<!-- Optional. Whether distributed DDL queries (ON CLUSTER clause) are allowed for this cluster. Default: true (allowed). -->
<!-- <allow_distributed_ddl_queries>true</allow_distributed_ddl_queries> -->
<shard>
<!-- Optional. Shard weight when writing data. Default: 1. -->
<weight>1</weight>

View File

@ -0,0 +1,476 @@
---
slug: /en/getting-started/example-datasets/amazon-reviews
sidebar_label: Amazon customer reviews
---
# Amazon customer reviews dataset
[**Amazon Customer Reviews**](https://s3.amazonaws.com/amazon-reviews-pds/readme.html) (a.k.a. Product Reviews) is one of Amazons iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. By accessing the dataset, you agree to the [license terms](https://s3.amazonaws.com/amazon-reviews-pds/license.txt).
The data is in a tab-separated format in gzipped files stored in AWS S3. Let's walk through the steps to insert it into ClickHouse.
:::note
The queries below were executed on a **Production** instance of [ClickHouse Cloud](https://clickhouse.cloud).
:::
1. Without inserting the data into ClickHouse, we can query it in place. Let's grab some rows so we can see what they look like:
```sql
SELECT *
FROM s3('https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Wireless_v1_00.tsv.gz',
'TabSeparatedWithNames',
'marketplace String,
customer_id Int64,
review_id String,
product_id String,
product_parent Int64,
product_title String,
product_category String,
star_rating Int64,
helpful_votes Int64,
total_votes Int64,
vine Bool,
verified_purchase Bool,
review_headline String,
review_body String,
review_date Date'
)
LIMIT 10;
```
The rows look like:
```response
┌─marketplace─┬─customer_id─┬─review_id──────┬─product_id─┬─product_parent─┬─product_title──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─product_category─┬─star_rating─┬─helpful_votes─┬─total_votes─┬─vine──┬─verified_purchase─┬─review_headline───────────┬─review_body────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─review_date─┐
│ US │ 16414143 │ R3W4P9UBGNGH1U │ B00YL0EKWE │ 852431543 │ LG G4 Case Hard Transparent Slim Clear Cover for LG G4 │ Wireless │ 2 │ 1 │ 3 │ false │ true │ Looks good, functions meh │ 2 issues - Once I turned on the circle apps and installed this case, my battery drained twice as fast as usual. I ended up turning off the circle apps, which kind of makes the case just a case... with a hole in it. Second, the wireless charging doesn't work. I have a Motorola 360 watch and a Qi charging pad. The watch charges fine but this case doesn't. But hey, it looks nice. │ 2015-08-31 │
│ US │ 50800750 │ R15V54KBMTQWAY │ B00XK95RPQ │ 516894650 │ Selfie Stick Fiblastiq&trade; Extendable Wireless Bluetooth Selfie Stick with built-in Bluetooth Adjustable Phone Holder │ Wireless │ 4 │ 0 │ 0 │ false │ false │ A fun little gadget │ Im embarrassed to admit that until recently, I have had a very negative opinion about “selfie sticks” aka “monopods” aka “narcissticks.” But having reviewed a number of them recently, theyre growing on me. This one is pretty nice and simple to set up and with easy instructions illustrated on the back of the box (not sure why some reviewers have stated that there are no instructions when they are clearly printed on the box unless they received different packaging than I did). Once assembled, the pairing via bluetooth and use of the stick are easy and intuitive. Nothing to it.<br /><br />The stick comes with a USB charging cable but arrived with a charge so you can use it immediately, though its probably a good idea to charge it right away so that you have no interruption of use out of the box. Make sure the stick is switched to on (it will light up) and extend your stick to the length you desire up to about a yards length and snap away.<br /><br />The phone clamp held the phone sturdily so I wasnt worried about it slipping out. But the longer you extend the stick, the harder it is to maneuver. But that will happen with any stick and is not specific to this one in particular.<br /><br />Two things that could improve this: 1) add the option to clamp this in portrait orientation instead of having to try and hold the stick at the portrait angle, which makes it feel unstable; 2) add the opening for a tripod so that this can be used to sit upright on a table for skyping and facetime eliminating the need to hold the phone up with your hand, causing fatigue.<br /><br />But other than that, this is a nice quality monopod for a variety of picture taking opportunities.<br /><br />I received a sample in exchange for my honest opinion. │ 2015-08-31 │
│ US │ 15184378 │ RY8I449HNXSVF │ B00SXRXUKO │ 984297154 │ Tribe AB40 Water Resistant Sports Armband with Key Holder for 4.7-Inch iPhone 6S/6/5/5S/5C, Galaxy S4 + Screen Protector - Dark Pink │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Fits iPhone 6 well │ 2015-08-31 │
│ US │ 10203548 │ R18TLJYCKJFLSR │ B009V5X1CE │ 279912704 │ RAVPower® Element 10400mAh External Battery USB Portable Charger (Dual USB Outputs, Ultra Compact Design), Travel Charger for iPhone 6,iPhone 6 plus,iPhone 5, 5S, 5C, 4S, 4, iPad Air, 4, 3, 2, Mini 2 (Apple adapters not included); Samsung Galaxy S5, S4, S3, S2, Note 3, Note 2; HTC One, EVO, Thunderbolt, Incredible, Droid DNA, Motorola ATRIX, Droid, Moto X, Google Glass, Nexus 4, Nexus 5, Nexus 7, │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Great charger │ Great charger. I easily get 3+ charges on a Samsung Galaxy 3. Works perfectly for camping trips or long days on the boat. │ 2015-08-31 │
│ US │ 488280 │ R1NK26SWS53B8Q │ B00D93OVF0 │ 662791300 │ Fosmon Micro USB Value Pack Bundle for Samsung Galaxy Exhilarate - Includes Home / Travel Charger, Car / Vehicle Charger and USB Cable │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Great for the price :-) │ 2015-08-31 │
│ US │ 13334021 │ R11LOHEDYJALTN │ B00XVGJMDQ │ 421688488 │ iPhone 6 Case, Vofolen Impact Resistant Protective Shell iPhone 6S Wallet Cover Shockproof Rubber Bumper Case Anti-scratches Hard Cover Skin Card Slot Holder for iPhone 6 6S │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Great Case, better customer service! │ 2015-08-31 │
│ US │ 27520697 │ R3ALQVQB2P9LA7 │ B00KQW1X1C │ 554285554 │ Nokia Lumia 630 RM-978 White Factory Unlocked - International Version No Warranty │ Wireless │ 4 │ 0 │ 0 │ false │ true │ Four Stars │ Easy to set up and use. Great functions for the price │ 2015-08-31 │
│ US │ 48086021 │ R3MWLXLNO21PDQ │ B00IP1MQNK │ 488006702 │ Lumsing 10400mah external battery │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Works great │ 2015-08-31 │
│ US │ 12738196 │ R2L15IS24CX0LI │ B00HVORET8 │ 389677711 │ iPhone 5S Battery Case - iPhone 5 Battery Case , Maxboost Atomic S [MFI Certified] External Protective Battery Charging Case Power Bank Charger All Versions of Apple iPhone 5/5S [Juice Battery Pack] │ Wireless │ 5 │ 0 │ 0 │ false │ true │ So far so good │ So far so good. It is essentially identical to the one it replaced from another company. That one stopped working after 7 months so I am a bit apprehensive about this one. │ 2015-08-31 │
│ US │ 15867807 │ R1DJ8976WPWVZU │ B00HX3G6J6 │ 299654876 │ HTC One M8 Screen Protector, Skinomi TechSkin Full Coverage Screen Protector for HTC One M8 Clear HD Anti-Bubble Film │ Wireless │ 3 │ 0 │ 0 │ false │ true │ seems durable but these are always harder to get on ... │ seems durable but these are always harder to get on right than people make them out to be. also send to curl up at the edges after a while. with today's smartphones, you hardly need screen protectors anyway. │ 2015-08-31 │
└─────────────┴─────────────┴────────────────┴────────────┴────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────┴─────────────┴───────────────┴─────────────┴───────┴───────────────────┴─────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
```
:::note
Normally you would not need to pass the schema into the `s3` table function - ClickHouse can infer the names and data types of the columns. However, this particular dataset uses a non-standard tab-separated format; the `s3` function handles it fine as long as you include the schema. A sketch for inspecting the inferred schema follows this note.
:::
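To see what ClickHouse would infer on its own, `DESCRIBE` also works with the `s3` table function - a quick sketch:
```sql
-- Sketch: show the column names and types ClickHouse infers for this file
DESCRIBE s3('https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Wireless_v1_00.tsv.gz',
    'TabSeparatedWithNames');
```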
2. Let's define a new table named `amazon_reviews`. We'll optimize some of the column data types - and choose a primary key (the `ORDER BY` clause):
```sql
CREATE TABLE amazon_reviews
(
review_date Date,
marketplace LowCardinality(String),
customer_id UInt64,
review_id String,
product_id String,
product_parent UInt64,
product_title String,
product_category LowCardinality(String),
star_rating UInt8,
helpful_votes UInt32,
total_votes UInt32,
vine Bool,
verified_purchase Bool,
review_headline String,
review_body String
)
ENGINE = MergeTree
ORDER BY (marketplace, review_date, product_category);
```
3. We are now ready to insert the data into ClickHouse. Before we do, check out the [list of files in the dataset](https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt) and decide which ones you want to include.
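As a sketch, you can also peek at that index file without leaving ClickHouse, using the `url` table function with the `LineAsString` format (this assumes the index lists one file name per line):
```sql
-- Sketch: read the plain-text index of dataset files and keep only the US ones
SELECT line
FROM url('https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt',
    'LineAsString', 'line String')
WHERE line LIKE '%amazon_reviews_us_%';
```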
4. We will insert all of the US reviews - which is about 151M rows. The following `INSERT` command uses the `s3Cluster` table function, which allows the processing of multiple S3 files in parallel using all the nodes of your cluster. We also use a wildcard to insert any file that starts with the name `https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_`:
```sql
INSERT INTO amazon_reviews
WITH
transform(vine, ['Y','N'],[true, false]) AS vine,
transform(verified_purchase, ['Y','N'],[true, false]) AS verified_purchase
SELECT
*
FROM s3Cluster(
'default',
'https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_*.tsv.gz',
'TSVWithNames',
'review_date Date,
marketplace LowCardinality(String),
customer_id UInt64,
review_id String,
product_id String,
product_parent UInt64,
product_title String,
product_category LowCardinality(String),
star_rating UInt8,
helpful_votes UInt32,
total_votes UInt32,
vine FixedString(1),
verified_purchase FixedString(1),
review_headline String,
review_body String'
)
SETTINGS input_format_allow_errors_num = 1000000;
```
:::tip
In ClickHouse Cloud, there is a cluster named `default`. Change `default` to the name of your cluster...or use the `s3` table function (instead of `s3Cluster`) if you do not have a cluster - a sketch of that variant follows this tip.
:::
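Without a cluster, the insert is the same statement with only the table function swapped - a sketch:
```sql
-- Sketch: the same INSERT using the plain s3 function instead of s3Cluster
INSERT INTO amazon_reviews
WITH
   transform(vine, ['Y','N'],[true, false]) AS vine,
   transform(verified_purchase, ['Y','N'],[true, false]) AS verified_purchase
SELECT
   *
FROM s3(
        'https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_*.tsv.gz',
        'TSVWithNames',
        'review_date Date,
        marketplace LowCardinality(String),
        customer_id UInt64,
        review_id String,
        product_id String,
        product_parent UInt64,
        product_title String,
        product_category LowCardinality(String),
        star_rating UInt8,
        helpful_votes UInt32,
        total_votes UInt32,
        vine FixedString(1),
        verified_purchase FixedString(1),
        review_headline String,
        review_body String'
    )
SETTINGS input_format_allow_errors_num = 1000000;
```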
5. That query doesn't take long - within 5 minutes or so you should see all the rows inserted:
```sql
SELECT formatReadableQuantity(count())
FROM amazon_reviews
```
```response
┌─formatReadableQuantity(count())─┐
│ 150.96 million │
└─────────────────────────────────┘
```
6. Let's see how much space our data is using:
```sql
SELECT
disk_name,
formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
round(usize / size, 2) AS compr_rate,
sum(rows) AS rows,
count() AS part_count
FROM system.parts
WHERE (active = 1) AND (table = 'amazon_reviews')
GROUP BY disk_name
ORDER BY size DESC;
```
The original data was about 70G, but compressed in ClickHouse it takes up about 30G:
```response
┌─disk_name─┬─compressed─┬─uncompressed─┬─compr_rate─┬──────rows─┬─part_count─┐
│ s3disk │ 30.00 GiB │ 70.61 GiB │ 2.35 │ 150957260 │ 9 │
└───────────┴────────────┴──────────────┴────────────┴───────────┴────────────┘
```
7. Let's run some queries...here are the top 10 most-helpful reviews on Amazon:
```sql
SELECT
product_title,
review_headline
FROM amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 10;
```
Notice the query has to process all 151M rows, and it takes about 17 seconds:
```response
┌─product_title────────────────────────────────────────────────────────────────────────────┬─review_headline───────────────────────────────────────────────────────┐
│ Kindle: Amazon's Original Wireless Reading Device (1st generation) │ Why and how the Kindle changes everything │
│ BIC Cristal For Her Ball Pen, 1.0mm, Black, 16ct (MSLP16-Blk) │ FINALLY! │
│ The Mountain Kids 100% Cotton Three Wolf Moon T-Shirt │ Dual Function Design │
│ Kindle Keyboard 3G, Free 3G + Wi-Fi, 6" E Ink Display │ Kindle vs. Nook (updated) │
│ Kindle Fire HD 7", Dolby Audio, Dual-Band Wi-Fi │ You Get What You Pay For │
│ Kindle Fire (Previous Generation - 1st) │ A great device WHEN you consider price and function, with a few flaws │
│ Fifty Shades of Grey: Book One of the Fifty Shades Trilogy (Fifty Shades of Grey Series) │ Did a teenager write this??? │
│ Wheelmate Laptop Steering Wheel Desk │ Perfect for an Starfleet Helmsman │
│ Kindle Wireless Reading Device (6" Display, U.S. Wireless) │ BEWARE of the SIGNIFICANT DIFFERENCES between Kindle 1 and Kindle 2! │
│ Tuscan Dairy Whole Vitamin D Milk, Gallon, 128 oz │ Make this your only stock and store │
└──────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────┘
10 rows in set. Elapsed: 17.595 sec. Processed 150.96 million rows, 15.36 GB (8.58 million rows/s., 872.89 MB/s.)
```
8. Here are the top 10 products in Amazon with the most reviews:
```sql
SELECT
any(product_title),
count()
FROM amazon_reviews
GROUP BY product_id
ORDER BY 2 DESC
LIMIT 10;
```
```response
┌─any(product_title)────────────────────────────┬─count()─┐
│ Candy Crush Saga │ 50051 │
│ The Secret Society® - Hidden Mystery │ 41255 │
│ Google Chromecast HDMI Streaming Media Player │ 35977 │
│ Minecraft │ 35129 │
│ Bosch Season 1 │ 33610 │
│ Gone Girl: A Novel │ 33240 │
│ Subway Surfers │ 32328 │
│ The Fault in Our Stars │ 30149 │
│ Amazon.com eGift Cards │ 28879 │
│ Crossy Road │ 28111 │
└───────────────────────────────────────────────┴─────────┘
10 rows in set. Elapsed: 16.684 sec. Processed 195.05 million rows, 20.86 GB (11.69 million rows/s., 1.25 GB/s.)
```
9. Here are the average review ratings per month for each product (an actual [Amazon job interview question](https://datalemur.com/questions/sql-avg-review-ratings)!):
```sql
SELECT
toStartOfMonth(review_date) AS month,
any(product_title),
avg(star_rating) AS avg_stars
FROM amazon_reviews
GROUP BY
month,
product_id
ORDER BY
month DESC,
product_id ASC
LIMIT 20;
```
It calculates all the monthly averages for each product, but here we only return 20 rows:
```response
┌──────month─┬─any(product_title)──────────────────────────────────────────────────────────────────────┬─avg_stars─┐
│ 2015-08-01 │ Mystiqueshapes Girls Ballet Tutu Neon Lime Green │ 4 │
│ 2015-08-01 │ Adult Ballet Tutu Yellow │ 5 │
│ 2015-08-01 │ The Way Things Work: An Illustrated Encyclopedia of Technology │ 5 │
│ 2015-08-01 │ Hilda Boswell's Treasury of Poetry │ 5 │
│ 2015-08-01 │ Treasury of Poetry │ 5 │
│ 2015-08-01 │ Uncle Remus Stories │ 5 │
│ 2015-08-01 │ The Book of Daniel │ 5 │
│ 2015-08-01 │ Berenstains' B Book │ 5 │
│ 2015-08-01 │ The High Hills (Brambly Hedge) │ 4.5 │
│ 2015-08-01 │ Fuzzypeg Goes to School (The Little Grey Rabbit library) │ 5 │
│ 2015-08-01 │ Dictionary in French: The Cat in the Hat (Beginner Series) │ 5 │
│ 2015-08-01 │ Windfallen │ 5 │
│ 2015-08-01 │ The Monk Who Sold His Ferrari: A Remarkable Story About Living Your Dreams │ 5 │
│ 2015-08-01 │ Illustrissimi: The Letters of Pope John Paul I │ 5 │
│ 2015-08-01 │ Social Contract: A Personal Inquiry into the Evolutionary Sources of Order and Disorder │ 5 │
│ 2015-08-01 │ Mexico The Beautiful Cookbook: Authentic Recipes from the Regions of Mexico │ 4.5 │
│ 2015-08-01 │ Alanbrooke │ 5 │
│ 2015-08-01 │ Back to Cape Horn │ 4 │
│ 2015-08-01 │ Ovett: An Autobiography (Willow books) │ 5 │
│ 2015-08-01 │ The Birds of West Africa (Collins Field Guides) │ 4 │
└────────────┴─────────────────────────────────────────────────────────────────────────────────────────┴───────────┘
20 rows in set. Elapsed: 52.827 sec. Processed 251.46 million rows, 35.26 GB (4.76 million rows/s., 667.55 MB/s.)
```
10. Here are the total votes per product category. This query is fast because `product_category` is in the primary key:
```sql
SELECT
sum(total_votes),
product_category
FROM amazon_reviews
GROUP BY product_category
ORDER BY 1 DESC;
```
```response
┌─sum(total_votes)─┬─product_category─────────┐
│ 103877874 │ Books │
│ 25330411 │ Digital_Ebook_Purchase │
│ 23065953 │ Video DVD │
│ 18048069 │ Music │
│ 17292294 │ Mobile_Apps │
│ 15977124 │ Health & Personal Care │
│ 13554090 │ PC │
│ 13065746 │ Kitchen │
│ 12537926 │ Home │
│ 11067538 │ Beauty │
│ 10418643 │ Wireless │
│ 9089085 │ Toys │
│ 9071484 │ Sports │
│ 7335647 │ Electronics │
│ 6885504 │ Apparel │
│ 6710085 │ Video Games │
│ 6556319 │ Camera │
│ 6305478 │ Lawn and Garden │
│ 5954422 │ Office Products │
│ 5339437 │ Home Improvement │
│ 5284343 │ Outdoors │
│ 5125199 │ Pet Products │
│ 4733251 │ Grocery │
│ 4697750 │ Shoes │
│ 4666487 │ Automotive │
│ 4361518 │ Digital_Video_Download │
│ 4033550 │ Tools │
│ 3559010 │ Baby │
│ 3317662 │ Home Entertainment │
│ 2559501 │ Video │
│ 2204328 │ Furniture │
│ 2157587 │ Musical Instruments │
│ 1881662 │ Software │
│ 1676081 │ Jewelry │
│ 1499945 │ Watches │
│ 1224071 │ Digital_Music_Purchase │
│ 847918 │ Luggage │
│ 503939 │ Major Appliances │
│ 392001 │ Digital_Video_Games │
│ 348990 │ Personal_Care_Appliances │
│ 321372 │ Digital_Software │
│ 169585 │ Mobile_Electronics │
│ 72970 │ Gift Card │
└──────────────────┴──────────────────────────┘
43 rows in set. Elapsed: 0.423 sec. Processed 150.96 million rows, 756.20 MB (356.70 million rows/s., 1.79 GB/s.)
```
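As a side note (not part of the original walkthrough), `EXPLAIN ESTIMATE` is a handy way to see how many rows, marks, and parts a query is expected to read before running it:
```sql
-- Sketch: estimate the data volume the aggregation above will touch
EXPLAIN ESTIMATE
SELECT
    sum(total_votes),
    product_category
FROM amazon_reviews
GROUP BY product_category;
```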
11. Let's find the products with the word **"awful"** occurring most frequently in their reviews. This is a big task - over 151M strings have to be parsed looking for a single word:
```sql
SELECT
product_id,
any(product_title),
avg(star_rating),
count() AS count
FROM amazon_reviews
WHERE position(review_body, 'awful') > 0
GROUP BY product_id
ORDER BY count DESC
LIMIT 50;
```
The query takes a couple of minutes, but the results are a fun read:
```response
┌─product_id─┬─any(product_title)───────────────────────────────────────────────────────────────────────┬───avg(star_rating)─┬─count─┐
│ 0345803485 │ Fifty Shades of Grey: Book One of the Fifty Shades Trilogy (Fifty Shades of Grey Series) │ 1.3870967741935485 │ 248 │
│ B007J4T2G8 │ Fifty Shades of Grey (Fifty Shades, Book 1) │ 1.4439834024896265 │ 241 │
│ B006LSZECO │ Gone Girl: A Novel │ 2.2986425339366514 │ 221 │
│ B00008OWZG │ St. Anger │ 1.6565656565656566 │ 198 │
│ B00BD99JMW │ Allegiant (Divergent Trilogy, Book 3) │ 1.8342541436464088 │ 181 │
│ B0000YUXI0 │ Mavala Switzerland Mavala Stop Nail Biting │ 4.473684210526316 │ 171 │
│ B004S8F7QM │ Cards Against Humanity │ 4.753012048192771 │ 166 │
│ 031606792X │ Breaking Dawn (The Twilight Saga, Book 4) │ 1.796875 │ 128 │
│ 006202406X │ Allegiant (Divergent Series) │ 1.4242424242424243 │ 99 │
│ B0051VVOB2 │ Kindle Fire (Previous Generation - 1st) │ 2.7448979591836733 │ 98 │
│ B00I3MP3SG │ Pilot │ 1.8762886597938144 │ 97 │
│ 030758836X │ Gone Girl │ 2.15625 │ 96 │
│ B0009X29WK │ Precious Cat Ultra Premium Clumping Cat Litter │ 3.0759493670886076 │ 79 │
│ B00JB3MVCW │ Noah │ 1.2027027027027026 │ 74 │
│ B00BAXFECK │ The Goldfinch: A Novel (Pulitzer Prize for Fiction) │ 2.643835616438356 │ 73 │
│ B00N28818A │ Amazon Prime Video │ 1.4305555555555556 │ 72 │
│ B007FTE2VW │ SimCity - Limited Edition │ 1.2794117647058822 │ 68 │
│ 0439023513 │ Mockingjay (The Hunger Games) │ 2.6417910447761193 │ 67 │
│ B00178630A │ Diablo III - PC/Mac │ 1.671875 │ 64 │
│ B000OCEWGW │ Liquid Ass │ 4.8125 │ 64 │
│ B005ZOBNOI │ The Fault in Our Stars │ 4.316666666666666 │ 60 │
│ B00L9B7IKE │ The Girl on the Train: A Novel │ 2.0677966101694913 │ 59 │
│ B007S6Y6VS │ Garden of Life Raw Organic Meal │ 2.8793103448275863 │ 58 │
│ B0064X7B4A │ Words With Friends │ 2.2413793103448274 │ 58 │
│ B003WUYPPG │ Unbroken: A World War II Story of Survival, Resilience, and Redemption │ 4.620689655172414 │ 58 │
│ B00006HBUJ │ Star Wars: Episode II - Attack of the Clones (Widescreen Edition) │ 2.2982456140350878 │ 57 │
│ B000XUBFE2 │ The Book Thief │ 4.526315789473684 │ 57 │
│ B0006399FS │ How to Dismantle an Atomic Bomb │ 1.9821428571428572 │ 56 │
│ B003ZSJ212 │ Star Wars: The Complete Saga (Episodes I-VI) (Packaging May Vary) [Blu-ray] │ 2.309090909090909 │ 55 │
│ 193700788X │ Dead Ever After (Sookie Stackhouse/True Blood) │ 1.5185185185185186 │ 54 │
│ B004FYEZMQ │ Mass Effect 3 │ 2.056603773584906 │ 53 │
│ B000CFYAMC │ The Room │ 3.9615384615384617 │ 52 │
│ B0031JK95S │ Garden of Life Raw Organic Meal │ 3.3137254901960786 │ 51 │
│ B0012JY4G4 │ Color Oops Hair Color Remover Extra Strength 1 Each │ 3.9019607843137254 │ 51 │
│ B007VTVRFA │ SimCity - Limited Edition │ 1.2040816326530612 │ 49 │
│ B00CE18P0K │ Pilot │ 1.7142857142857142 │ 49 │
│ 0316015849 │ Twilight (The Twilight Saga, Book 1) │ 1.8979591836734695 │ 49 │
│ B00DR0PDNE │ Google Chromecast HDMI Streaming Media Player │ 2.5416666666666665 │ 48 │
│ B000056OWC │ The First Years: 4-Stage Bath System │ 1.2127659574468086 │ 47 │
│ B007IXWKUK │ Fifty Shades Darker (Fifty Shades, Book 2) │ 1.6304347826086956 │ 46 │
│ 1892112000 │ To Train Up a Child │ 1.4130434782608696 │ 46 │
│ 043935806X │ Harry Potter and the Order of the Phoenix (Book 5) │ 3.977272727272727 │ 44 │
│ B00BGO0Q9O │ Fitbit Flex Wireless Wristband with Sleep Function, Black │ 1.9318181818181819 │ 44 │
│ B003XF1XOQ │ Mockingjay (Hunger Games Trilogy, Book 3) │ 2.772727272727273 │ 44 │
│ B00DD2B52Y │ Spring Breakers │ 1.2093023255813953 │ 43 │
│ B0064X7FVE │ The Weather Channel: Forecast, Radar & Alerts │ 1.5116279069767442 │ 43 │
│ B0083PWAPW │ Kindle Fire HD 7", Dolby Audio, Dual-Band Wi-Fi │ 2.627906976744186 │ 43 │
│ B00192KCQ0 │ Death Magnetic │ 3.5714285714285716 │ 42 │
│ B007S6Y74O │ Garden of Life Raw Organic Meal │ 3.292682926829268 │ 41 │
│ B0052QYLUM │ Infant Optics DXR-5 Portable Video Baby Monitor │ 2.1463414634146343 │ 41 │
└────────────┴──────────────────────────────────────────────────────────────────────────────────────────┴────────────────────┴───────┘
50 rows in set. Elapsed: 60.052 sec. Processed 150.96 million rows, 68.93 GB (2.51 million rows/s., 1.15 GB/s.)
```
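Note that `position` is case-sensitive, so reviews containing "Awful" or "AWFUL" are not counted above. A variant using ClickHouse's `positionCaseInsensitive` function would catch those too - a sketch:
```sql
-- Sketch: the same search, but matching 'awful' in any letter case
SELECT
    product_id,
    any(product_title),
    avg(star_rating),
    count() AS count
FROM amazon_reviews
WHERE positionCaseInsensitive(review_body, 'awful') > 0
GROUP BY product_id
ORDER BY count DESC
LIMIT 50;
```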
12. We can run the same query again, except this time we search for **awesome** in the reviews:
```sql
SELECT
product_id,
any(product_title),
avg(star_rating),
count() AS count
FROM amazon_reviews
WHERE position(review_body, 'awesome') > 0
GROUP BY product_id
ORDER BY count DESC
LIMIT 50;
```
It runs quite a bit faster - which means the cache is helping us out here:
```response
┌─product_id─┬─any(product_title)────────────────────────────────────────────────────┬───avg(star_rating)─┬─count─┐
│ B00992CF6W │ Minecraft │ 4.848130353039482 │ 4787 │
│ B009UX2YAC │ Subway Surfers │ 4.866720955483171 │ 3684 │
│ B00QW8TYWO │ Crossy Road │ 4.935217903415784 │ 2547 │
│ B00DJFIMW6 │ Minion Rush: Despicable Me Official Game │ 4.850450450450451 │ 2220 │
│ B00AREIAI8 │ My Horse │ 4.865313653136531 │ 2168 │
│ B00I8Q77Y0 │ Flappy Wings (not Flappy Bird) │ 4.8246561886051085 │ 2036 │
│ B0054JZC6E │ 101-in-1 Games │ 4.792542016806722 │ 1904 │
│ B00G5LQ5MU │ Escape The Titanic │ 4.724673710379117 │ 1609 │
│ B0086700CM │ Temple Run │ 4.87636130685458 │ 1561 │
│ B009HKL4B8 │ The Sims Freeplay │ 4.763942931258106 │ 1542 │
│ B00I6IKSZ0 │ Pixel Gun 3D (Pocket Edition) - multiplayer shooter with skin creator │ 4.849894291754757 │ 1419 │
│ B006OC2ANS │ BLOOD & GLORY │ 4.8561538461538465 │ 1300 │
│ B00FATEJYE │ Injustice: Gods Among Us (Kindle Tablet Edition) │ 4.789265982636149 │ 1267 │
│ B00B2V66VS │ Temple Run 2 │ 4.764705882352941 │ 1173 │
│ B00JOT3HQ2 │ Geometry Dash Lite │ 4.909747292418772 │ 1108 │
│ B00DUGCLY4 │ Guess The Emoji │ 4.813606710158434 │ 1073 │
│ B00DR0PDNE │ Google Chromecast HDMI Streaming Media Player │ 4.607276119402985 │ 1072 │
│ B00FAPF5U0 │ Candy Crush Saga │ 4.825757575757576 │ 1056 │
│ B0051VVOB2 │ Kindle Fire (Previous Generation - 1st) │ 4.600407747196738 │ 981 │
│ B007JPG04E │ FRONTLINE COMMANDO │ 4.8125 │ 912 │
│ B00PTB7B34 │ Call of Duty®: Heroes │ 4.876404494382022 │ 890 │
│ B00846GKTW │ Style Me Girl - Free 3D Fashion Dressup │ 4.785714285714286 │ 882 │
│ B004S8F7QM │ Cards Against Humanity │ 4.931034482758621 │ 754 │
│ B00FAX6XQC │ DEER HUNTER CLASSIC │ 4.700272479564033 │ 734 │
│ B00PSGW79I │ Buddyman: Kick │ 4.888736263736264 │ 728 │
│ B00CTQ6SIG │ The Simpsons: Tapped Out │ 4.793948126801153 │ 694 │
│ B008JK6W5K │ Logo Quiz │ 4.782106782106782 │ 693 │
│ B00EDTSKLU │ Geometry Dash │ 4.942028985507246 │ 690 │
│ B00CSR2J9I │ Hill Climb Racing │ 4.880059970014993 │ 667 │
│ B005ZXWMUS │ Netflix │ 4.722306525037936 │ 659 │
│ B00CRFAAYC │ Fab Tattoo Artist FREE │ 4.907435508345979 │ 659 │
│ B00DHQHQCE │ Battle Beach │ 4.863287250384024 │ 651 │
│ B00BGA9WK2 │ PlayStation 4 500GB Console [Old Model] │ 4.688751926040061 │ 649 │
│ B008Y7SMQU │ Logo Quiz - Fun Plus Free │ 4.7888 │ 625 │
│ B0083PWAPW │ Kindle Fire HD 7", Dolby Audio, Dual-Band Wi-Fi │ 4.593900481540931 │ 623 │
│ B008XG1X18 │ Pinterest │ 4.8148760330578515 │ 605 │
│ B007SYWFRM │ Ice Age Village │ 4.8566666666666665 │ 600 │
│ B00K7WGUKA │ Don't Tap The White Tile (Piano Tiles) │ 4.922689075630252 │ 595 │
│ B00BWYQ9YE │ Kindle Fire HDX 7", HDX Display (Previous Generation - 3rd) │ 4.649913344887349 │ 577 │
│ B00IZLM8MY │ High School Story │ 4.840425531914893 │ 564 │
│ B004MC8CA2 │ Bible │ 4.884476534296029 │ 554 │
│ B00KNWYDU8 │ Dragon City │ 4.861111111111111 │ 540 │
│ B009ZKSPDK │ Survivalcraft │ 4.738317757009346 │ 535 │
│ B00A4O6NMG │ My Singing Monsters │ 4.845559845559846 │ 518 │
│ B002MQYOFW │ The Hunger Games (Hunger Games Trilogy, Book 1) │ 4.846899224806202 │ 516 │
│ B005ZFOOE8 │ iHeartRadio Free Music & Internet Radio │ 4.837301587301587 │ 504 │
│ B00AIUUXHC │ Hungry Shark Evolution │ 4.846311475409836 │ 488 │
│ B00E8KLWB4 │ The Secret Society® - Hidden Mystery │ 4.669438669438669 │ 481 │
│ B006D1ONE4 │ Where's My Water? │ 4.916317991631799 │ 478 │
│ B00G6ZTM3Y │ Terraria │ 4.728421052631579 │ 475 │
└────────────┴───────────────────────────────────────────────────────────────────────┴────────────────────┴───────┘
50 rows in set. Elapsed: 33.954 sec. Processed 150.96 million rows, 68.95 GB (4.45 million rows/s., 2.03 GB/s.)
```

View File

@ -0,0 +1,172 @@
---
slug: /en/getting-started/example-datasets/environmental-sensors
sidebar_label: Environmental Sensors Data
---
# Environmental Sensors Data
[Sensor.Community](https://sensor.community/en/) is a contributor-driven global sensor network that creates Open Environmental Data. The data is collected from sensors all over the globe. Anyone can purchase a sensor and place it wherever they like. The APIs to download the data are documented on [GitHub](https://github.com/opendata-stuttgart/meta/wiki/APIs), and the data is freely available under the [Database Contents License (DbCL)](https://opendatacommons.org/licenses/dbcl/1-0/).
:::important
The dataset has over 20 billion records, so be careful about simply copying-and-pasting the commands below unless your resources can handle that volume of data. The commands below were executed on a **Production** instance of [ClickHouse Cloud](https://clickhouse.cloud).
:::
1. The data is in S3, so we can use the `s3` table function to create a table from the files. We can also query the data in place. Let's look at a few rows before attempting to insert it into ClickHouse:
```sql
SELECT *
FROM s3(
'https://clickhouse-public-datasets.s3.eu-central-1.amazonaws.com/sensors/monthly/2019-06_bmp180.csv.zst',
'CSVWithNames'
)
LIMIT 10
SETTINGS format_csv_delimiter = ';';
```
The data is in CSV files but uses a semicolon as the delimiter. The rows look like:
```response
┌─sensor_id─┬─sensor_type─┬─location─┬────lat─┬────lon─┬─timestamp───────────┬──pressure─┬─altitude─┬─pressure_sealevel─┬─temperature─┐
│ 9119 │ BMP180 │ 4594 │ 50.994 │ 7.126 │ 2019-06-01T00:00:00 │ 101471 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 19.9 │
│ 21210 │ BMP180 │ 10762 │ 42.206 │ 25.326 │ 2019-06-01T00:00:00 │ 99525 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 19.3 │
│ 19660 │ BMP180 │ 9978 │ 52.434 │ 17.056 │ 2019-06-01T00:00:04 │ 101570 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 15.3 │
│ 12126 │ BMP180 │ 6126 │ 57.908 │ 16.49 │ 2019-06-01T00:00:05 │ 101802.56 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 8.07 │
│ 15845 │ BMP180 │ 8022 │ 52.498 │ 13.466 │ 2019-06-01T00:00:05 │ 101878 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 23 │
│ 16415 │ BMP180 │ 8316 │ 49.312 │ 6.744 │ 2019-06-01T00:00:06 │ 100176 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 14.7 │
│ 7389 │ BMP180 │ 3735 │ 50.136 │ 11.062 │ 2019-06-01T00:00:06 │ 98905 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 12.1 │
│ 13199 │ BMP180 │ 6664 │ 52.514 │ 13.44 │ 2019-06-01T00:00:07 │ 101855.54 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 19.74 │
│ 12753 │ BMP180 │ 6440 │ 44.616 │ 2.032 │ 2019-06-01T00:00:07 │ 99475 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 17 │
│ 16956 │ BMP180 │ 8594 │ 52.052 │ 8.354 │ 2019-06-01T00:00:08 │ 101322 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 17.2 │
└───────────┴─────────────┴──────────┴────────┴───────┴─────────────────────┴──────────┴──────────┴───────────────────┴─────────────┘
```
2. We will use the following `MergeTree` table to store the data in ClickHouse:
```sql
CREATE TABLE sensors
(
sensor_id UInt16,
sensor_type Enum('BME280', 'BMP180', 'BMP280', 'DHT22', 'DS18B20', 'HPM', 'HTU21D', 'PMS1003', 'PMS3003', 'PMS5003', 'PMS6003', 'PMS7003', 'PPD42NS', 'SDS011'),
location UInt32,
lat Float32,
lon Float32,
timestamp DateTime,
P1 Float32,
P2 Float32,
P0 Float32,
durP1 Float32,
ratioP1 Float32,
durP2 Float32,
ratioP2 Float32,
pressure Float32,
altitude Float32,
pressure_sealevel Float32,
temperature Float32,
humidity Float32,
date Date MATERIALIZED toDate(timestamp)
)
ENGINE = MergeTree
ORDER BY (timestamp, sensor_id);
```
3. ClickHouse Cloud services have a cluster named `default`. We will use the `s3Cluster` table function, which reads S3 files in parallel from the nodes in your cluster. (If you do not have a cluster, just use the `s3` function and remove the cluster name.)
This query will take a while - it's about 1.67T of data uncompressed:
```sql
INSERT INTO sensors
SELECT *
FROM s3Cluster(
'default',
'https://clickhouse-public-datasets.s3.amazonaws.com/sensors/monthly/*.csv.zst',
'CSVWithNames',
$$ sensor_id UInt16,
sensor_type String,
location UInt32,
lat Float32,
lon Float32,
timestamp DateTime,
P1 Float32,
P2 Float32,
P0 Float32,
durP1 Float32,
ratioP1 Float32,
durP2 Float32,
ratioP2 Float32,
pressure Float32,
altitude Float32,
pressure_sealevel Float32,
temperature Float32,
humidity Float32 $$
)
SETTINGS
format_csv_delimiter = ';',
input_format_allow_errors_ratio = '0.5',
input_format_allow_errors_num = 10000,
input_format_parallel_parsing = 0,
date_time_input_format = 'best_effort',
max_insert_threads = 32,
parallel_distributed_insert_select = 1;
```
Here is the response - showing the number of rows and the speed of processing. The data was ingested at a rate of over 6M rows per second!
```response
0 rows in set. Elapsed: 3419.330 sec. Processed 20.69 billion rows, 1.67 TB (6.05 million rows/s., 488.52 MB/s.)
```
4. Let's see how much disk storage is needed for the `sensors` table:
```sql
SELECT
disk_name,
formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
round(usize / size, 2) AS compr_rate,
sum(rows) AS rows,
count() AS part_count
FROM system.parts
WHERE (active = 1) AND (table = 'sensors')
GROUP BY
disk_name
ORDER BY size DESC;
```
The 1.67T is compressed down to 310 GiB, and there are 20.69 billion rows:
```response
┌─disk_name─┬─compressed─┬─uncompressed─┬─compr_rate─┬────────rows─┬─part_count─┐
│ s3disk │ 310.21 GiB │ 1.30 TiB │ 4.29 │ 20693971809 │ 472 │
└───────────┴────────────┴──────────────┴────────────┴─────────────┴────────────┘
```
5. Let's analyze the data now that it's in ClickHouse. Notice the quantity of data increases over time as more sensors are deployed:
```sql
SELECT
date,
count()
FROM sensors
GROUP BY date
ORDER BY date ASC;
```
We can create a chart in the SQL Console to visualize the results:
![Number of events per day](./images/sensors_01.png)
6. This query counts the number of overly hot and humid days:
```sql
WITH
toYYYYMMDD(timestamp) AS day
SELECT day, count() FROM sensors
WHERE temperature >= 40 AND temperature <= 50 AND humidity >= 90
GROUP BY day
ORDER BY day asc;
```
Here's a visualization of the result:
![Hot and humid days](./images/sensors_02.png)

Binary file not shown.

After

Width:  |  Height:  |  Size: 418 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 204 KiB

View File

@ -216,4 +216,276 @@ The results look like:
│ 1919 │ 63 │ 1 │ https://youtu.be/b9MeoOtAivQ │ ClickHouse v21.10 Release Webinar │
│ 8710 │ 62 │ 4 │ https://youtu.be/PeV1mC2z--M │ What is JDBC DriverManager? | JDBC │
│ 3534 │ 62 │ 1 │ https://youtu.be/8nWRhK9gw10 │ CLICKHOUSE - Arquitetura Modular │
```
## Questions
### If someone disables comments, does it lower the chance someone will actually click like or dislike?
When commenting is disabled, are people more likely to like or dislike to express their feelings about a video?
```sql
SELECT
concat('< ', formatReadableQuantity(view_range)) AS views,
is_comments_enabled,
total_clicks / num_views AS prob_like_dislike
FROM
(
SELECT
is_comments_enabled,
power(10, CEILING(log10(view_count + 1))) AS view_range,
sum(like_count + dislike_count) AS total_clicks,
sum(view_count) AS num_views
FROM youtube
GROUP BY
view_range,
is_comments_enabled
) WHERE view_range > 1
ORDER BY
is_comments_enabled ASC,
num_views ASC
```
```response
┌─views─────────────┬─is_comments_enabled─┬────prob_like_dislike─┐
│ < 10.00           │ false               │  0.08224180712685371 │
│ < 100.00          │ false               │  0.06346337759167248 │
│ < 1.00 thousand   │ false               │  0.03201883652987105 │
│ < 10.00 thousand  │ false               │  0.01716073540410903 │
│ < 10.00 billion   │ false               │ 0.004555639481829971 │
│ < 100.00 thousand │ false               │  0.01293351460515323 │
│ < 1.00 billion    │ false               │ 0.004761811192464957 │
│ < 1.00 million    │ false               │ 0.010472604018980551 │
│ < 10.00 million   │ false               │  0.00788902538420125 │
│ < 100.00 million  │ false               │  0.00579152804250582 │
│ < 10.00           │ true                │  0.09819517478134059 │
│ < 100.00          │ true                │  0.07403784478585775 │
│ < 1.00 thousand   │ true                │  0.03846294910067627 │
│ < 10.00 billion   │ true                │ 0.005615217329358215 │
│ < 10.00 thousand  │ true                │  0.02505881391701455 │
│ < 1.00 billion    │ true                │ 0.007434998802482997 │
│ < 100.00 thousand │ true                │ 0.022694648130822004 │
│ < 100.00 million  │ true                │ 0.011761563746575625 │
│ < 1.00 million    │ true                │ 0.020776022304589435 │
│ < 10.00 million   │ true                │ 0.016917095718089584 │
└───────────────────┴─────────────────────┴──────────────────────┘
22 rows in set. Elapsed: 8.460 sec. Processed 4.56 billion rows, 77.48 GB (538.73 million rows/s., 9.16 GB/s.)
```
Enabling comments seems to be correlated with a higher rate of engagement.
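The bucketing trick in that query - `power(10, CEILING(log10(view_count + 1)))` - rounds each view count up to the next power of ten, which is what produces the `< 10.00 thousand` style ranges. A tiny standalone sketch shows the effect:
```sql
-- Sketch: a few sample view counts and the range bucket each falls into
SELECT
    arrayJoin([5, 42, 999, 12345, 2000000]) AS view_count,
    power(10, ceiling(log10(view_count + 1))) AS view_range;
```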
### How does the number of videos change over time - notable events?
```sql
SELECT
toStartOfMonth(toDateTime(upload_date)) AS month,
uniq(uploader_id) AS uploaders,
count() as num_videos,
sum(view_count) as view_count
FROM youtube
WHERE (month >= '2005-01-01') AND (month < '2021-12-01')
GROUP BY month
ORDER BY month ASC
```
```response
┌──────month─┬─uploaders─┬─num_videos─┬───view_count─┐
│ 2005-04-01 │ 5 │ 6 │ 213597737 │
│ 2005-05-01 │ 6 │ 9 │ 2944005 │
│ 2005-06-01 │ 165 │ 351 │ 18624981 │
│ 2005-07-01 │ 395 │ 1168 │ 94164872 │
│ 2005-08-01 │ 1171 │ 3128 │ 124540774 │
│ 2005-09-01 │ 2418 │ 5206 │ 475536249 │
│ 2005-10-01 │ 6750 │ 13747 │ 737593613 │
│ 2005-11-01 │ 13706 │ 28078 │ 1896116976 │
│ 2005-12-01 │ 24756 │ 49885 │ 2478418930 │
│ 2006-01-01 │ 49992 │ 100447 │ 4532656581 │
│ 2006-02-01 │ 67882 │ 138485 │ 5677516317 │
│ 2006-03-01 │ 103358 │ 212237 │ 8430301366 │
│ 2006-04-01 │ 114615 │ 234174 │ 9980760440 │
│ 2006-05-01 │ 152682 │ 332076 │ 14129117212 │
│ 2006-06-01 │ 193962 │ 429538 │ 17014143263 │
│ 2006-07-01 │ 234401 │ 530311 │ 18721143410 │
│ 2006-08-01 │ 281280 │ 614128 │ 20473502342 │
│ 2006-09-01 │ 312434 │ 679906 │ 23158422265 │
│ 2006-10-01 │ 404873 │ 897590 │ 27357846117 │
```
A spike in uploaders [around COVID is noticeable](https://www.theverge.com/2020/3/27/21197642/youtube-with-me-style-videos-views-coronavirus-cook-workout-study-home-beauty).
### More subtitles over time and when
With advances in speech recognition, it's easier than ever to create subtitles for videos, and YouTube added auto-captioning in late 2009 - was there a jump then?
```sql
SELECT
toStartOfMonth(upload_date) AS month,
countIf(has_subtitles) / count() AS percent_subtitles,
percent_subtitles - any(percent_subtitles) OVER (ORDER BY month ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS previous
FROM youtube
WHERE (month >= '2015-01-01') AND (month < '2021-12-02')
GROUP BY month
ORDER BY month ASC
```
```response
┌──────month─┬───percent_subtitles─┬────────────────previous─┐
│ 2015-01-01 │ 0.2652653881082824 │ 0.2652653881082824 │
│ 2015-02-01 │ 0.3147556050309162 │ 0.049490216922633834 │
│ 2015-03-01 │ 0.32460464492371877 │ 0.009849039892802558 │
│ 2015-04-01 │ 0.33471963051468445 │ 0.010114985590965686 │
│ 2015-05-01 │ 0.3168087575501062 │ -0.017910872964578273 │
│ 2015-06-01 │ 0.3162609788438222 │ -0.0005477787062839745 │
│ 2015-07-01 │ 0.31828767677518033 │ 0.0020266979313581235 │
│ 2015-08-01 │ 0.3045551564286859 │ -0.013732520346494415 │
│ 2015-09-01 │ 0.311221133995152 │ 0.006665977566466086 │
│ 2015-10-01 │ 0.30574870926812175 │ -0.005472424727030245 │
│ 2015-11-01 │ 0.31125409712077234 │ 0.0055053878526505895 │
│ 2015-12-01 │ 0.3190967954651779 │ 0.007842698344405541 │
│ 2016-01-01 │ 0.32636021432496176 │ 0.007263418859783877 │
```
The data results show a spike in 2009. Apparently at that time, YouTube was removing their community captions feature, which allowed you to upload captions for other people's videos.
This prompted a very successful campaign to have creators add captions to their videos for hard-of-hearing and deaf viewers.
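The `previous` column in the query above is really a month-over-month delta: the window expression `any(percent_subtitles) OVER (ORDER BY month ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)` fetches the value one row back, and subtracting it gives the change. A minimal self-contained sketch of the same pattern:
```sql
-- Sketch: compute a delta against the previous row with a one-row-back frame
SELECT
    number AS step,
    number * number AS value,
    value - any(value) OVER (ORDER BY step ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS delta
FROM numbers(5);
```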
### Top uploaders over time
```sql
WITH uploaders AS
(
SELECT uploader
FROM youtube
GROUP BY uploader
ORDER BY sum(view_count) DESC
LIMIT 10
)
SELECT
month,
uploader,
sum(view_count) AS total_views,
avg(dislike_count / like_count) AS like_to_dislike_ratio
FROM youtube
WHERE uploader IN (uploaders)
GROUP BY
toStartOfMonth(upload_date) AS month,
uploader
ORDER BY
month ASC,
total_views DESC
```

```response
┌──────month─┬─uploader───────────────────┬─total_views─┬─like_to_dislike_ratio─┐
│ 1970-01-01 │ T-Series │ 10957099 │ 0.022784656361208206 │
│ 1970-01-01 │ Ryan's World │ 0 │ 0.003035559410234172 │
│ 1970-01-01 │ SET India │ 0 │ nan │
│ 2006-09-01 │ Cocomelon - Nursery Rhymes │ 256406497 │ 0.7005566715978622 │
│ 2007-06-01 │ Cocomelon - Nursery Rhymes │ 33641320 │ 0.7088650914344298 │
│ 2008-02-01 │ WWE │ 43733469 │ 0.07198856488734842 │
│ 2008-03-01 │ WWE │ 16514541 │ 0.1230603715431997 │
│ 2008-04-01 │ WWE │ 5907295 │ 0.2089399470159618 │
│ 2008-05-01 │ WWE │ 7779627 │ 0.09101676560436774 │
│ 2008-06-01 │ WWE │ 7018780 │ 0.0974184753155297 │
│ 2008-07-01 │ WWE │ 4686447 │ 0.1263845422065158 │
│ 2008-08-01 │ WWE │ 4514312 │ 0.08384574274791441 │
│ 2008-09-01 │ WWE │ 3717092 │ 0.07872802579349912 │
1001 rows in set. Elapsed: 34.917 sec. Processed 4.58 billion rows, 69.08 GB (131.15 million rows/s., 1.98 GB/s.)
```
### How does the like ratio change as views go up?
```sql
SELECT
concat('< ', formatReadableQuantity(view_range)) AS view_range,
is_comments_enabled,
round(like_ratio, 2) AS like_ratio
FROM
(
SELECT
        power(10, CEILING(log10(view_count + 1))) AS view_range,
        is_comments_enabled,
        avg(like_count / dislike_count) AS like_ratio
FROM youtube WHERE dislike_count > 0
GROUP BY
view_range,
is_comments_enabled HAVING view_range > 1
ORDER BY
view_range ASC,
is_comments_enabled ASC
)
```

```response
┌─view_range────────┬─is_comments_enabled─┬─like_ratio─┐
│ < 10.00           │ false               │       0.66 │
│ < 10.00           │ true                │       0.66 │
│ < 100.00          │ false               │          3 │
│ < 100.00          │ true                │       3.95 │
│ < 1.00 thousand   │ false               │       8.45 │
│ < 1.00 thousand   │ true                │      13.07 │
│ < 10.00 thousand  │ false               │      18.57 │
│ < 10.00 thousand  │ true                │      30.92 │
│ < 100.00 thousand │ false               │      23.55 │
│ < 100.00 thousand │ true                │      42.13 │
│ < 1.00 million    │ false               │      19.23 │
│ < 1.00 million    │ true                │      37.86 │
│ < 10.00 million   │ false               │      12.13 │
│ < 10.00 million   │ true                │      30.72 │
│ < 100.00 million  │ false               │       6.67 │
│ < 100.00 million  │ true                │      23.32 │
│ < 1.00 billion    │ false               │       3.08 │
│ < 1.00 billion    │ true                │      20.69 │
│ < 10.00 billion   │ false               │       1.77 │
│ < 10.00 billion   │ true                │       19.5 │
└───────────────────┴─────────────────────┴────────────┘
20 rows in set. Elapsed: 9.043 sec. Processed 4.56 billion rows, 77.48 GB (503.99 million rows/s., 8.57 GB/s.)
```
### How are views distributed?
```sql
SELECT
labels AS percentile,
round(quantiles) AS views
FROM
(
SELECT
quantiles(0.999, 0.99, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)(view_count) AS quantiles,
['99.9th', '99th', '95th', '90th', '80th', '70th','60th', '50th', '40th', '30th', '20th', '10th'] AS labels
FROM youtube
)
ARRAY JOIN
quantiles,
labels
```

```response
┌─percentile─┬───views─┐
│ 99.9th │ 1216624 │
│ 99th │ 143519 │
│ 95th │ 13542 │
│ 90th │ 4054 │
│ 80th │ 950 │
│ 70th │ 363 │
│ 60th │ 177 │
│ 50th │ 97 │
│ 40th │ 57 │
│ 30th │ 32 │
│ 20th │ 16 │
│ 10th │ 6 │
└────────────┴─────────┘
12 rows in set. Elapsed: 1.864 sec. Processed 4.56 billion rows, 36.46 GB (2.45 billion rows/s., 19.56 GB/s.)
```

View File

@ -1818,15 +1818,19 @@ The table below shows supported data types and how they match ClickHouse [data t
| `bytes`, `string`, `fixed` | [FixedString(N)](/docs/en/sql-reference/data-types/fixedstring.md) | `fixed(N)` |
| `enum` | [Enum(8\|16)](/docs/en/sql-reference/data-types/enum.md) | `enum` |
| `array(T)` | [Array(T)](/docs/en/sql-reference/data-types/array.md) | `array(T)` |
| `map(V, K)` | [Map(V, K)](/docs/en/sql-reference/data-types/map.md) | `map(string, K)` |
| `union(null, T)`, `union(T, null)` | [Nullable(T)](/docs/en/sql-reference/data-types/nullable.md) | `union(null, T)` |
| `null` | [Nullable(Nothing)](/docs/en/sql-reference/data-types/special-data-types/nothing.md) | `null` |
| `int (date)` \** | [Date](/docs/en/sql-reference/data-types/date.md), [Date32](/docs/en/sql-reference/data-types/date32.md) | `int (date)` \** |
| `long (timestamp-millis)` \** | [DateTime64(3)](/docs/en/sql-reference/data-types/datetime.md) | `long (timestamp-millis)` \** |
| `long (timestamp-micros)` \** | [DateTime64(6)](/docs/en/sql-reference/data-types/datetime.md) | `long (timestamp-micros)` \** |
| `bytes (decimal)` \** | [DateTime64(N)](/docs/en/sql-reference/data-types/datetime.md) | `bytes (decimal)` \** |
| `int` | [IPv4](/docs/en/sql-reference/data-types/domains/ipv4.md) | `int` |
| `fixed(16)` | [IPv6](/docs/en/sql-reference/data-types/domains/ipv6.md) | `fixed(16)` |
| `bytes (decimal)` \** | [Decimal(P, S)](/docs/en/sql-reference/data-types/decimal.md) | `bytes (decimal)` \** |
| `string (uuid)` \** | [UUID](/docs/en/sql-reference/data-types/uuid.md) | `string (uuid)` \** |
| `fixed(16)` | [Int128/UInt128](/docs/en/sql-reference/data-types/int-uint.md) | `fixed(16)` |
| `fixed(32)` | [Int256/UInt256](/docs/en/sql-reference/data-types/int-uint.md) | `fixed(32)` |
\* `bytes` is default, controlled by [output_format_avro_string_column_pattern](/docs/en/operations/settings/settings-formats.md/#output_format_avro_string_column_pattern)
@ -2281,22 +2285,28 @@ ClickHouse supports reading and writing [MessagePack](https://msgpack.org/) data
### Data Types Matching {#data-types-matching-msgpack}
| MessagePack data type (`INSERT`) | ClickHouse data type | MessagePack data type (`SELECT`) |
|--------------------------------------------------------------------|-----------------------------------------------------------------|------------------------------------|
| `uint N`, `positive fixint` | [UIntN](/docs/en/sql-reference/data-types/int-uint.md) | `uint N` |
| `int N`, `negative fixint` | [IntN](/docs/en/sql-reference/data-types/int-uint.md) | `int N` |
| `bool` | [UInt8](/docs/en/sql-reference/data-types/int-uint.md) | `uint 8` |
| `fixstr`, `str 8`, `str 16`, `str 32`, `bin 8`, `bin 16`, `bin 32` | [String](/docs/en/sql-reference/data-types/string.md) | `bin 8`, `bin 16`, `bin 32` |
| `fixstr`, `str 8`, `str 16`, `str 32`, `bin 8`, `bin 16`, `bin 32` | [FixedString](/docs/en/sql-reference/data-types/fixedstring.md) | `bin 8`, `bin 16`, `bin 32` |
| `float 32` | [Float32](/docs/en/sql-reference/data-types/float.md) | `float 32` |
| `float 64` | [Float64](/docs/en/sql-reference/data-types/float.md) | `float 64` |
| `uint 16` | [Date](/docs/en/sql-reference/data-types/date.md) | `uint 16` |
| `uint 32` | [DateTime](/docs/en/sql-reference/data-types/datetime.md) | `uint 32` |
| `uint 64` | [DateTime64](/docs/en/sql-reference/data-types/datetime.md) | `uint 64` |
| `fixarray`, `array 16`, `array 32` | [Array](/docs/en/sql-reference/data-types/array.md) | `fixarray`, `array 16`, `array 32` |
| `fixmap`, `map 16`, `map 32` | [Map](/docs/en/sql-reference/data-types/map.md) | `fixmap`, `map 16`, `map 32` |
| `uint 32` | [IPv4](/docs/en/sql-reference/data-types/domains/ipv4.md) | `uint 32` |
| `bin 8` | [String](/docs/en/sql-reference/data-types/string.md) | `bin 8` |
| MessagePack data type (`INSERT`) | ClickHouse data type | MessagePack data type (`SELECT`) |
|--------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|----------------------------------|
| `uint N`, `positive fixint` | [UIntN](/docs/en/sql-reference/data-types/int-uint.md) | `uint N` |
| `int N`, `negative fixint` | [IntN](/docs/en/sql-reference/data-types/int-uint.md) | `int N` |
| `bool` | [UInt8](/docs/en/sql-reference/data-types/int-uint.md) | `uint 8` |
| `fixstr`, `str 8`, `str 16`, `str 32`, `bin 8`, `bin 16`, `bin 32` | [String](/docs/en/sql-reference/data-types/string.md) | `bin 8`, `bin 16`, `bin 32` |
| `fixstr`, `str 8`, `str 16`, `str 32`, `bin 8`, `bin 16`, `bin 32` | [FixedString](/docs/en/sql-reference/data-types/fixedstring.md) | `bin 8`, `bin 16`, `bin 32` |
| `float 32` | [Float32](/docs/en/sql-reference/data-types/float.md) | `float 32` |
| `float 64` | [Float64](/docs/en/sql-reference/data-types/float.md) | `float 64` |
| `uint 16` | [Date](/docs/en/sql-reference/data-types/date.md) | `uint 16` |
| `int 32` | [Date32](/docs/en/sql-reference/data-types/date32.md) | `int 32` |
| `uint 32` | [DateTime](/docs/en/sql-reference/data-types/datetime.md) | `uint 32` |
| `uint 64` | [DateTime64](/docs/en/sql-reference/data-types/datetime.md) | `uint 64` |
| `fixarray`, `array 16`, `array 32` | [Array](/docs/en/sql-reference/data-types/array.md)/[Tuple](/docs/en/sql-reference/data-types/tuple.md) | `fixarray`, `array 16`, `array 32` |
| `fixmap`, `map 16`, `map 32` | [Map](/docs/en/sql-reference/data-types/map.md) | `fixmap`, `map 16`, `map 32` |
| `uint 32` | [IPv4](/docs/en/sql-reference/data-types/domains/ipv4.md) | `uint 32` |
| `bin 8` | [String](/docs/en/sql-reference/data-types/string.md) | `bin 8` |
| `int 8` | [Enum8](/docs/en/sql-reference/data-types/enum.md) | `int 8` |
| `bin 8` | [(U)Int128/(U)Int256](/docs/en/sql-reference/data-types/int-uint.md) | `bin 8` |
| `int 32` | [Decimal32](/docs/en/sql-reference/data-types/decimal.md) | `int 32` |
| `int 64` | [Decimal64](/docs/en/sql-reference/data-types/decimal.md) | `int 64` |
| `bin 8`                                                            | [Decimal128/Decimal256](/docs/en/sql-reference/data-types/decimal.md)                                    | `bin 8`                           |
Example:

View File

@ -257,6 +257,7 @@ The path to the table in ZooKeeper.
``` xml
<default_replica_path>/clickhouse/tables/{uuid}/{shard}</default_replica_path>
```
## default_replica_name {#default_replica_name}
The replica name in ZooKeeper.
@ -418,6 +419,7 @@ Opens `https://tabix.io/` when accessing `http://localhost: http_port`.
<![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]>
</http_server_default_response>
```
## hsts_max_age {#hsts-max-age}
Expiry time for HSTS in seconds. The default value is 0, which means HSTS is disabled. If you set a positive number, HSTS will be enabled and the max-age will be the number you set.
@ -1113,7 +1115,7 @@ Default value: 8.
## background_fetches_pool_size {#background_fetches_pool_size}
Sets the number of threads performing background fetches for tables with ReplicatedMergeTree engines. Could be increased at runtime and could be applied at server startup from the `default` profile for backward compatibility.
Sets the number of threads performing background fetches for tables with ReplicatedMergeTree engines. Could be increased at runtime.
Possible values:
@ -1129,7 +1131,7 @@ Default value: 8.
## background_common_pool_size {#background_common_pool_size}
Sets the number of threads performing background non-specialized operations like cleaning the filesystem etc. for tables with MergeTree engines. Could be increased at runtime and could be applied at server startup from the `default` profile for backward compatibility.
Sets the number of threads performing background non-specialized operations like cleaning the filesystem etc. for tables with MergeTree engines. Could be increased at runtime.
Possible values:
@ -1143,6 +1145,25 @@ Default value: 8.
<background_common_pool_size>36</background_common_pool_size>
```
## background_buffer_flush_schedule_pool_size {#background_buffer_flush_schedule_pool_size}
Sets the number of threads performing background flush in [Buffer](../../engines/table-engines/special/buffer.md)-engine tables.
Possible values:
- Any positive integer.
Default value: 16.
## background_schedule_pool_size {#background_schedule_pool_size}
Sets the number of threads performing background tasks for [replicated](../../engines/table-engines/mergetree-family/replication.md) tables, [Kafka](../../engines/table-engines/integrations/kafka.md) streaming, [DNS cache updates](../../operations/server-configuration-parameters/settings.md/#server-settings-dns-cache-update-period).
Possible values:
- Any positive integer.
Default value: 128.
## merge_tree {#server_configuration_parameters-merge_tree}

View File

@ -1128,7 +1128,7 @@ Default value: `2.latest`.
Compression method used in output Parquet format. Supported codecs: `snappy`, `lz4`, `brotli`, `zstd`, `gzip`, `none` (uncompressed)
Default value: `snappy`.
Default value: `lz4`.
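For illustration, a minimal sketch of selecting a codec explicitly; it assumes this section's setting is named `output_format_parquet_compression_method` and writes a local file from `clickhouse-client`:

```sql
-- Hypothetical usage sketch: choose zstd compression for Parquet output, then export.
SET output_format_parquet_compression_method = 'zstd';
SELECT number, number * 2 AS doubled
FROM numbers(10)
INTO OUTFILE 'example.parquet'
FORMAT Parquet;
```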
## Hive format settings {#hive-format-settings}

View File

@ -1552,6 +1552,7 @@ For replicated tables, by default, only the 100 most recent blocks for
For non-replicated tables, see [non_replicated_deduplication_window](merge-tree-settings.md/#non-replicated-deduplication-window).
## Asynchronous Insert settings
### async_insert {#async-insert}
Enables or disables asynchronous inserts. This makes sense only for insertion over the HTTP protocol. Note that deduplication doesn't work for such inserts.
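As a rough sketch (the table name `events` is hypothetical; over HTTP the setting can equally be passed as a URL parameter):

```sql
-- Enable asynchronous inserts for this session and send a small batch;
-- the server buffers the data and flushes it in the background.
SET async_insert = 1;
INSERT INTO events VALUES (1, 'click'), (2, 'view');
```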
@ -1645,6 +1646,7 @@ Possible values:
- 0 — Timeout disabled.
Default value: `0`.
### async_insert_deduplicate {#settings-async-insert-deduplicate}
Enables or disables insert deduplication of `ASYNC INSERT` (for Replicated\* tables).
@ -2449,43 +2451,19 @@ Default value: `1`.
## background_buffer_flush_schedule_pool_size {#background_buffer_flush_schedule_pool_size}
Sets the number of threads performing background flush in [Buffer](../../engines/table-engines/special/buffer.md)-engine tables. This setting is applied at the ClickHouse server start and can't be changed in a user session.
Possible values:
- Any positive integer.
Default value: 16.
That setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_buffer_flush_schedule_pool_size).
## background_move_pool_size {#background_move_pool_size}
Sets the number of threads performing background moves of data parts for [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-multiple-volumes)-engine tables. This setting is applied at the ClickHouse server start and can't be changed in a user session.
Possible values:
- Any positive integer.
Default value: 8.
That setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_move_pool_size).
## background_schedule_pool_size {#background_schedule_pool_size}
Sets the number of threads performing background tasks for [replicated](../../engines/table-engines/mergetree-family/replication.md) tables, [Kafka](../../engines/table-engines/integrations/kafka.md) streaming, [DNS cache updates](../../operations/server-configuration-parameters/settings.md/#server-settings-dns-cache-update-period). This setting is applied at ClickHouse server start and can't be changed in a user session.
Possible values:
- Any positive integer.
Default value: 128.
That setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_schedule_pool_size).
## background_fetches_pool_size {#background_fetches_pool_size}
Sets the number of threads performing background fetches for [replicated](../../engines/table-engines/mergetree-family/replication.md) tables. This setting is applied at the ClickHouse server start and can't be changed in a user session. For production usage with frequent small insertions or a slow ZooKeeper cluster, it is recommended to use the default value.
Possible values:
- Any positive integer.
Default value: 8.
That setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_fetches_pool_size).
## always_fetch_merged_part {#always_fetch_merged_part}
@ -2506,28 +2484,11 @@ Default value: 0.
## background_distributed_schedule_pool_size {#background_distributed_schedule_pool_size}
Sets the number of threads performing background tasks for [distributed](../../engines/table-engines/special/distributed.md) sends. This setting is applied at the ClickHouse server start and can't be changed in a user session.
Possible values:
- Any positive integer.
Default value: 16.
That setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_distributed_schedule_pool_size).
## background_message_broker_schedule_pool_size {#background_message_broker_schedule_pool_size}
Sets the number of threads performing background tasks for message streaming. This setting is applied at the ClickHouse server start and can't be changed in a user session.
Possible values:
- Any positive integer.
Default value: 16.
**See Also**
- [Kafka](../../engines/table-engines/integrations/kafka.md/#kafka) engine.
- [RabbitMQ](../../engines/table-engines/integrations/rabbitmq.md/#rabbitmq-engine) engine.
That setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_message_broker_schedule_pool_size).
## validate_polygons {#validate_polygons}
@ -2769,7 +2730,7 @@ Default value: `120` seconds.
## cast_keep_nullable {#cast_keep_nullable}
Enables or disables keeping of the `Nullable` data type in [CAST](../../sql-reference/functions/type-conversion-functions.md/#type_conversion_function-cast) operations.
Enables or disables keeping of the `Nullable` data type in [CAST](../../sql-reference/functions/type-conversion-functions.md/#castx-t) operations.
When the setting is enabled and the argument of `CAST` function is `Nullable`, the result is also transformed to `Nullable` type. When the setting is disabled, the result always has the destination type exactly.
@ -4060,8 +4021,8 @@ Possible values:
Default value: `0`.
## stop_reading_on_first_cancel {#stop_reading_on_first_cancel}
When set to `true` and the user wants to interrupt a query (for example using `Ctrl+C` on the client), then the query continues execution only on data that was already read from the table. Afterward, it will return a partial result of the query for the part of the table that was read. To fully stop the execution of a query without a partial result, the user should send 2 cancel requests.
## partial_result_on_first_cancel {#partial_result_on_first_cancel}
When set to `true` and the user wants to interrupt a query (for example using `Ctrl+C` on the client), then the query continues execution only on data that was already read from the table. Afterwards, it will return a partial result of the query for the part of the table that was read. To fully stop the execution of a query without a partial result, the user should send 2 cancel requests.
**Example without setting on Ctrl+C**
```sql
@ -4076,7 +4037,7 @@ Query was cancelled.
**Example with setting on Ctrl+C**
```sql
SELECT sum(number) FROM numbers(10000000000) SETTINGS stop_reading_on_first_cancel=true
SELECT sum(number) FROM numbers(10000000000) SETTINGS partial_result_on_first_cancel=true
┌──────sum(number)─┐
│ 1355411451286266 │

View File

@ -13,6 +13,40 @@ incompatible datatypes (for example from `String` to `Int`). Make sure to check
ClickHouse generally uses the [same behavior as C++ programs](https://en.cppreference.com/w/cpp/language/implicit_conversion).
`to<type>` functions and [cast](#castx-t) behave differently in some cases, for example in the case of [LowCardinality](../data-types/lowcardinality.md): [cast](#castx-t) removes the [LowCardinality](../data-types/lowcardinality.md) trait, while `to<type>` functions don't. The same applies to [Nullable](../data-types/nullable.md); this behaviour is not compatible with the SQL standard, and it can be changed using the [cast_keep_nullable](../../operations/settings/settings.md/#cast_keep_nullable) setting.
Example:
```sql
SELECT
toTypeName(toLowCardinality('') AS val) AS source_type,
toTypeName(toString(val)) AS to_type_result_type,
toTypeName(CAST(val, 'String')) AS cast_result_type
┌─source_type────────────┬─to_type_result_type────┬─cast_result_type─┐
│ LowCardinality(String) │ LowCardinality(String) │ String │
└────────────────────────┴────────────────────────┴──────────────────┘
SELECT
toTypeName(toNullable('') AS val) AS source_type,
toTypeName(toString(val)) AS to_type_result_type,
toTypeName(CAST(val, 'String')) AS cast_result_type
┌─source_type──────┬─to_type_result_type─┬─cast_result_type─┐
│ Nullable(String) │ Nullable(String) │ String │
└──────────────────┴─────────────────────┴──────────────────┘
SELECT
toTypeName(toNullable('') AS val) AS source_type,
toTypeName(toString(val)) AS to_type_result_type,
toTypeName(CAST(val, 'String')) AS cast_result_type
SETTINGS cast_keep_nullable = 1
┌─source_type──────┬─to_type_result_type─┬─cast_result_type─┐
│ Nullable(String) │ Nullable(String) │ Nullable(String) │
└──────────────────┴─────────────────────┴──────────────────┘
```
## toInt(8\|16\|32\|64\|128\|256)
Converts an input value to a value of the [Int](/docs/en/sql-reference/data-types/int-uint.md) data type. This function family includes:
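For instance, a quick sketch of one member of the family:

```sql
-- Convert a string to Int32 and inspect the resulting type.
SELECT toInt32('42') AS n, toTypeName(n) AS type
-- ┌──n─┬─type──┐
-- │ 42 │ Int32 │
-- └────┴───────┘
```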
@ -737,6 +771,44 @@ Result:
└────────────┴───────┘
```
## toDecimalString
Converts a numeric value to String with the number of fractional digits in the output specified by the user.
**Syntax**
``` sql
toDecimalString(number, scale)
```
**Parameters**
- `number` — Value to be represented as String, [Int, UInt](/docs/en/sql-reference/data-types/int-uint.md), [Float](/docs/en/sql-reference/data-types/float.md), [Decimal](/docs/en/sql-reference/data-types/decimal.md),
- `scale` — Number of fractional digits, [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
* Maximum scale for [Decimal](/docs/en/sql-reference/data-types/decimal.md) and [Int, UInt](/docs/en/sql-reference/data-types/int-uint.md) types is 77 (it is the maximum possible number of significant digits for Decimal),
* Maximum scale for [Float](/docs/en/sql-reference/data-types/float.md) is 60.
**Returned value**
- Input value represented as [String](/docs/en/sql-reference/data-types/string.md) with given number of fractional digits (scale).
The number is rounded up or down according to common arithmetic rules if the requested scale is smaller than the original number's scale.
**Example**
Query:
``` sql
SELECT toDecimalString(CAST('64.32', 'Float64'), 5);
```
Result:
```response
┌─toDecimalString(CAST('64.32', 'Float64'), 5)─┐
│ 64.32000 │
└─────────────────────────────────────────────┘
```
## reinterpretAsUInt(8\|16\|32\|64)
## reinterpretAsInt(8\|16\|32\|64)
@ -956,7 +1028,7 @@ Result:
**See also**
- [cast_keep_nullable](/docs/en/operations/settings/settings.md/#cast_keep_nullable) setting
- [cast_keep_nullable](../../operations/settings/settings.md/#cast_keep_nullable) setting
## accurateCast(x, T)

View File

@ -48,6 +48,39 @@ SELECT generateULID(1), generateULID(2)
└────────────────────────────┴────────────────────────────┘
```
## ULIDStringToDateTime
This function extracts the timestamp from a ULID.
**Syntax**
``` sql
ULIDStringToDateTime(ulid[, timezone])
```
**Arguments**
- `ulid` — Input ULID. [String](/docs/en/sql-reference/data-types/string.md) or [FixedString(26)](/docs/en/sql-reference/data-types/fixedstring.md).
- `timezone` — [Timezone name](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) for the returned value (optional). [String](../../sql-reference/data-types/string.md).
**Returned value**
- Timestamp with millisecond precision.
Type: [DateTime64(3)](/docs/en/sql-reference/data-types/datetime64.md).
**Usage example**
``` sql
SELECT ULIDStringToDateTime('01GNB2S2FGN2P93QPXDNB4EN2R')
```
``` text
┌─ULIDStringToDateTime('01GNB2S2FGN2P93QPXDNB4EN2R')─┐
│ 2022-12-28 00:40:37.616 │
└────────────────────────────────────────────────────┘
```
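As a sketch of the optional second argument, the same ULID can be rendered in a given timezone (the timezone name here is just an example):

``` sql
SELECT ULIDStringToDateTime('01GNB2S2FGN2P93QPXDNB4EN2R', 'America/New_York')
```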
## See Also
- [UUID](../../sql-reference/functions/uuid-functions.md)

View File

@ -21,15 +21,6 @@ DELETE FROM hits WHERE Title LIKE '%hello%';
Lightweight deletes are asynchronous by default. Set `mutations_sync` equal to 1 to wait for one replica to process the statement, and set `mutations_sync` to 2 to wait for all replicas.
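For example, a minimal sketch reusing the table and predicate from above to make the statement wait for all replicas:

```sql
-- Wait until every replica has processed the lightweight delete.
SET mutations_sync = 2;
DELETE FROM hits WHERE Title LIKE '%hello%';
```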
:::note
This feature is experimental and requires you to set `allow_experimental_lightweight_delete` to true:
```sql
SET allow_experimental_lightweight_delete = true;
```
:::
:::note
`DELETE FROM` requires the `ALTER DELETE` privilege:
```sql
@ -64,6 +55,3 @@ With the described implementation now we can see what can negatively affect 'DEL
- Table having a very large number of data parts
- Having a lot of data in Compact parts—in a Compact part, all columns are stored in one file.
:::note
This implementation might change in the future.
:::

View File

@ -47,6 +47,7 @@ The default join type can be overridden using [join_default_strictness](../../..
The behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting.
**See also**
- [join_algorithm](../../../operations/settings/settings.md#settings-join_algorithm)
@ -57,6 +58,8 @@ The behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_
- [join_on_disk_max_files_to_merge](../../../operations/settings/settings.md#join_on_disk_max_files_to_merge)
- [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys)
Use the `cross_to_inner_join_rewrite` setting to define the behavior when ClickHouse fails to rewrite a `CROSS JOIN` as an `INNER JOIN`. The default value is `1`, which allows the join to continue but it will be slower. Set `cross_to_inner_join_rewrite` to `0` if you want an error to be thrown, and set it to `2` to not run the cross joins but instead force a rewrite of all comma/cross joins. If the rewriting fails when the value is `2`, you will receive an error message stating "Please, try to simplify `WHERE` section".
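As an illustrative sketch (the tables `t1` and `t2` are hypothetical), forcing the rewrite of comma joins:

```sql
-- With cross_to_inner_join_rewrite = 2, comma/cross joins must be rewritten;
-- if the rewrite fails, an error is thrown instead of running a CROSS JOIN.
SET cross_to_inner_join_rewrite = 2;
SELECT * FROM t1, t2 WHERE t1.id = t2.id; -- executed as an INNER JOIN ON t1.id = t2.id
```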
## ON Section Conditions
An `ON` section can contain several conditions combined using the `AND` and `OR` operators. Conditions specifying join keys must refer both left and right tables and must use the equality operator. Other conditions may use other logical operators but they must refer either the left or the right table of a query.

View File

@ -12,7 +12,7 @@ Provides a table-like interface to select/insert files in [Amazon S3](https://aw
**Syntax**
``` sql
s3(path [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
s3(path [, NOSIGN | aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
```
:::tip GCS
@ -33,6 +33,7 @@ For GCS, substitute your HMAC key and HMAC secret where you see `aws_access_key_
and not ~~https://storage.cloud.google.com~~.
:::
- `NOSIGN` - If this keyword is provided in place of credentials, all the requests will not be signed.
- `format` — The [format](../../interfaces/formats.md#formats) of the file.
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
- `compression` — Parameter is optional. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. By default, it will autodetect compression by file extension.
@ -185,6 +186,21 @@ INSERT INTO TABLE FUNCTION
```
As a result, the data is written into three files in different buckets: `my_bucket_1/file.csv`, `my_bucket_10/file.csv`, and `my_bucket_20/file.csv`.
## Accessing public buckets
ClickHouse tries to fetch credentials from many different types of sources.
Sometimes it can cause problems when accessing buckets that are public, causing the client to return a `403` error code.
This issue can be avoided by using the `NOSIGN` keyword, which forces the client to ignore all credentials and not sign the requests.
``` sql
SELECT *
FROM s3(
'https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv',
NOSIGN,
'CSVWithNames'
)
LIMIT 5;
```
**See Also**

23
docs/get-clickhouse-docs.sh Normal file → Executable file
View File

@ -7,6 +7,8 @@ WORKDIR=$(dirname "$0")
WORKDIR=$(readlink -f "${WORKDIR}")
cd "$WORKDIR"
UPDATE_PERIOD_HOURS=${UPDATE_PERIOD_HOURS:=24}
if [ -d "clickhouse-docs" ]; then
git -C clickhouse-docs pull
else
@ -27,5 +29,26 @@ else
exit 1
;;
esac
if [ -n "$2" ]; then
set_git_hook="$2"
elif [ -z "$1" ]; then
read -rp "Would you like to setup git hook for automatic update? (y|n): " set_git_hook
fi
git clone "$git_url" "clickhouse-docs"
if [ "$set_git_hook" = "y" ]; then
hook_command="$(pwd)/pull-clickhouse-docs-hook.sh $UPDATE_PERIOD_HOURS ||:"
hook_file=$(realpath "$(pwd)/../.git/hooks/post-checkout")
if grep -Faq "pull-clickhouse-docs-hook.sh" "$hook_file" 2>/dev/null; then
echo "Looks like the update hook already exists, will not add another one"
else
echo "Appending '$hook_command' to $hook_file"
echo "$hook_command" >> "$hook_file"
chmod u+x "$hook_file" # Just in case it did not exist before append
fi
elif [ ! "$set_git_hook" = "n" ]; then
echo "Expected 'y' or 'n', got '$set_git_hook', will not setup git hook"
fi
fi

View File

@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -e
# The script to update user-guides documentation repo
# https://github.com/ClickHouse/clickhouse-docs
WORKDIR=$(dirname "$0")
WORKDIR=$(readlink -f "${WORKDIR}")
cd "$WORKDIR"
UPDATE_PERIOD_HOURS="${1:-24}" # By default update once per 24 hours; 0 means "always update"
if [ ! -d "clickhouse-docs" ]; then
echo "There's no clickhouse-docs/ dir, run get-clickhouse-docs.sh first to clone the repo"
exit 1
fi
# Do not update it too often
LAST_FETCH_TS=$(stat -c %Y clickhouse-docs/.git/FETCH_HEAD 2>/dev/null || echo 0)
CURRENT_TS=$(date +%s)
HOURS_SINCE_LAST_FETCH=$(( (CURRENT_TS - LAST_FETCH_TS) / 60 / 60 ))
if [ "$HOURS_SINCE_LAST_FETCH" -lt "$UPDATE_PERIOD_HOURS" ]; then
exit 0;
fi
echo "Updating clickhouse-docs..."
git -C clickhouse-docs pull

View File

@ -211,4 +211,4 @@ ClickHouse can support Kerbe
**See also**
- [Virtual columns](index.md#table_engines-virtual_columns)
- [background_message_broker_schedule_pool_size](../../../operations/settings/settings.md#background_message_broker_schedule_pool_size)
- [background_message_broker_schedule_pool_size](../../../operations/server-configuration-parameters/settings.md#background_message_broker_schedule_pool_size)

View File

@ -752,7 +752,7 @@ SETTINGS storage_policy = 'moving_from_ssd_to_hdd'
The storage policy can be changed after the table is created using the [ALTER TABLE ... MODIFY SETTING] query. Note that the new policy must contain all the volumes and disks of the previous policy with the same names.
The number of threads for background moves of data parts between disks can be changed using the [background_move_pool_size](../../../operations/settings/settings.md#background_move_pool_size) setting
The number of threads for background moves of data parts between disks can be changed using the [background_move_pool_size](../../../operations/server-configuration-parameters/settings.md#background_move_pool_size) setting
### Details {#details}

View File

@ -64,9 +64,9 @@ ClickHouse stores meta information about replicas in [Apa
For very large clusters, you can use different ZooKeeper clusters for different shards. However, this has not proven necessary even on the Yandex.Metrica cluster (approximately 300 servers).
Replication is asynchronous and multi-master. `INSERT` queries (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the query is executed and then copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some delay. If some of the replicas are unavailable, the data is written to them when they become available. If a replica is available, the latency is the time it takes to transfer a block of compressed data over the network. The number of threads performing background tasks can be set using the [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size) setting.
Replication is asynchronous and multi-master. `INSERT` queries (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the query is executed and then copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some delay. If some of the replicas are unavailable, the data is written to them when they become available. If a replica is available, the latency is the time it takes to transfer a block of compressed data over the network. The number of threads performing background tasks can be set using the [background_schedule_pool_size](../../../operations/server-configuration-parameters/settings.md#background_schedule_pool_size) setting.
The `ReplicatedMergeTree` engine uses a separate thread pool for fetching data parts. The pool size is limited by the [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size) setting, which can be changed with a server restart.
The `ReplicatedMergeTree` engine uses a separate thread pool for fetching data parts. The pool size is limited by the [background_fetches_pool_size](../../../operations/server-configuration-parameters/settings.md#background_fetches_pool_size) setting, which can be changed with a server restart.
By default, an INSERT query waits for confirmation of the write from only one replica. If the data was successfully written to only one replica and the server with this replica ceases to exist, the written data will be lost. You can enable confirmation of writes from several replicas using the `insert_quorum` setting.
@ -251,8 +251,8 @@ $ sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
**See also**
- [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size)
- [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size)
- [background_schedule_pool_size](../../../operations/server-configuration-parameters/settings.md#background_schedule_pool_size)
- [background_fetches_pool_size](../../../operations/server-configuration-parameters/settings.md#background_fetches_pool_size)
- [execute_merges_on_single_replica_time_threshold](../../../operations/settings/settings.md#execute-merges-on-single-replica-time-threshold)
- [max_replicated_fetches_network_bandwidth](../../../operations/settings/merge-tree-settings.md#max_replicated_fetches_network_bandwidth)
- [max_replicated_sends_network_bandwidth](../../../operations/settings/merge-tree-settings.md#max_replicated_sends_network_bandwidth)

View File

@ -225,6 +225,7 @@ ClickHouse checks the conditions for `min_part_size` and `min_part
``` xml
<default_replica_path>/clickhouse/tables/{uuid}/{shard}</default_replica_path>
```
## default_replica_name {#default_replica_name}
The replica name in ZooKeeper.
@ -916,6 +917,72 @@ ClickHouse uses threads from the global thread pool
<thread_pool_queue_size>12000</thread_pool_queue_size>
```
## background_buffer_flush_schedule_pool_size {#background_buffer_flush_schedule_pool_size}
Sets the number of threads performing background flushes of data in tables with the [Buffer](../../engines/table-engines/special/buffer.md) engine.
Possible values:
- Any positive integer.
Default value: 16.
## background_move_pool_size {#background_move_pool_size}
Sets the number of threads performing background moves of data parts between disks. Applies to tables with the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) engine.
Possible values:
- Any positive integer.
Default value: 8.
## background_schedule_pool_size {#background_schedule_pool_size}
Sets the number of threads performing background tasks. Applies to [replicated](../../engines/table-engines/mergetree-family/replication.md) tables, [Kafka](../../engines/table-engines/integrations/kafka.md) streaming, and updating IP addresses of records in the internal [DNS cache](../server-configuration-parameters/settings.md#server-settings-dns-cache-update-period).
Possible values:
- Any positive integer.
Default value: 128.
## background_fetches_pool_size {#background_fetches_pool_size}
Sets the number of threads for fetching data parts for [replicated](../../engines/table-engines/mergetree-family/replication.md) tables. For production usage with frequent small insertions or a slow ZooKeeper cluster, it is recommended to use the default value.
Possible values:
- Any positive integer.
Default value: 8.
## background_distributed_schedule_pool_size {#background_distributed_schedule_pool_size}
Sets the number of threads performing background tasks. Applies to tables with the [Distributed](../../engines/table-engines/special/distributed.md) engine.
Possible values:
- Any positive integer.
Default value: 16.
## background_message_broker_schedule_pool_size {#background_message_broker_schedule_pool_size}
Sets the number of threads performing background message streaming.
Possible values:
- Any positive integer.
Default value: 16.
**See also**
- The [Kafka](../../engines/table-engines/integrations/kafka.md#kafka) engine.
- The [RabbitMQ](../../engines/table-engines/integrations/rabbitmq.md#rabbitmq-engine) engine.
## merge_tree {#server_configuration_parameters-merge_tree}
Fine-tuning for tables in the [MergeTree](../../operations/server-configuration-parameters/settings.md) family.

View File

@ -1122,6 +1122,7 @@ SELECT type, query FROM system.query_log WHERE log_comment = 'log_comment test'
:::note "Warning"
This is an expert-level setting; do not use it if you are just getting started with ClickHouse.
:::
## max_query_size {#settings-max_query_size}
The maximum part of a query that can be taken into RAM for parsing with the SQL parser.
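As a quick sketch, the limit can be raised for a session before sending a long query:

```sql
SET max_query_size = 1048576; -- allow up to 1 MiB of query text to be parsed
```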
@ -2517,68 +2518,27 @@ SELECT idx, i FROM null_in WHERE i IN (1, NULL) SETTINGS transform_null_in = 1;
## background_buffer_flush_schedule_pool_size {#background_buffer_flush_schedule_pool_size}
Sets the number of threads performing background flushes of data in tables with the [Buffer](../../engines/table-engines/special/buffer.md) engine. The setting is applied at ClickHouse server startup and cannot be changed in a user session.
Possible values:
- Any positive integer.
Default value: 16.
The setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_buffer_flush_schedule_pool_size).
## background_move_pool_size {#background_move_pool_size}
Sets the number of threads performing background moves of data parts between disks. Applies to tables with the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) engine. The setting is applied at ClickHouse server startup and cannot be changed in a user session.
Possible values:
- Any positive integer.
Default value: 8.
The setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_move_pool_size).
## background_schedule_pool_size {#background_schedule_pool_size}
Sets the number of threads performing background tasks. Applies to [replicated](../../engines/table-engines/mergetree-family/replication.md) tables, [Kafka](../../engines/table-engines/integrations/kafka.md) streaming, and updating IP addresses of records in the internal [DNS cache](../server-configuration-parameters/settings.md#server-settings-dns-cache-update-period). The setting is applied at ClickHouse server startup and cannot be changed in a user session.
Possible values:
- Any positive integer.
Default value: 128.
The setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_schedule_pool_size).
## background_fetches_pool_size {#background_fetches_pool_size}
Sets the number of threads for fetching data parts for [replicated](../../engines/table-engines/mergetree-family/replication.md) tables. The setting is applied at ClickHouse server startup and cannot be changed in a user session. For production usage with frequent small insertions or a slow ZooKeeper cluster, it is recommended to use the default value.
Possible values:
- Any positive integer.
Default value: 8.
The setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_fetches_pool_size).
## background_distributed_schedule_pool_size {#background_distributed_schedule_pool_size}
Sets the number of threads performing background tasks. Applies to tables with the [Distributed](../../engines/table-engines/special/distributed.md) engine. The setting is applied at ClickHouse server startup and cannot be changed in a user session.
Possible values:
- Any positive integer.
Default value: 16.
The setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_distributed_schedule_pool_size).
## background_message_broker_schedule_pool_size {#background_message_broker_schedule_pool_size}
Sets the number of threads performing background message streaming. The setting is applied at ClickHouse server startup and cannot be changed in a user session.
Possible values:
- Any positive integer.
Default value: 16.
**See also**
- The [Kafka](../../engines/table-engines/integrations/kafka.md#kafka) engine.
- The [RabbitMQ](../../engines/table-engines/integrations/rabbitmq.md#rabbitmq-engine) engine.
The setting was moved to the [server configuration parameters](../../operations/server-configuration-parameters/settings.md/#background_message_broker_schedule_pool_size).
## format_avro_schema_registry_url {#format_avro_schema_registry_url}
@ -3388,6 +3348,7 @@ SELECT * FROM test LIMIT 10 OFFSET 100;
│ 109 │
└─────┘
```
## http_connection_timeout {#http_connection_timeout}
Timeout for the HTTP connection (in seconds).
@ -4085,7 +4046,7 @@ ALTER TABLE test FREEZE SETTINGS alter_partition_verbose_result = 1;
Default value: `''`.
## stop_reading_on_first_cancel {#stop_reading_on_first_cancel}
## partial_result_on_first_cancel {#partial_result_on_first_cancel}
If set to `true` and the user wants to interrupt a query (for example, using `Ctrl+C` on the client), the query continues execution only on the data that has already been read from the table. Afterwards, it returns a partial result of the query for the part of the table that was read. To fully stop the execution of a query without a partial result, the user should send 2 cancel requests.
**Example with the setting disabled on Ctrl+C**
@ -4101,7 +4062,7 @@ Query was cancelled.
**Example with the setting enabled on Ctrl+C**
```sql
SELECT sum(number) FROM numbers(10000000000) SETTINGS stop_reading_on_first_cancel=true
SELECT sum(number) FROM numbers(10000000000) SETTINGS partial_result_on_first_cancel=true
┌──────sum(number)─┐
│ 1355411451286266 │

View File

@ -553,6 +553,44 @@ SELECT toFixedString('foo\0bar', 8) AS s, toStringCutToZero(s) AS s_cut;
└────────────┴───────┘
```
## toDecimalString
Takes any numeric type as the first argument and returns the string decimal representation of the number, with the number of fractional digits specified by the second argument.
**Syntax**
``` sql
toDecimalString(number, scale)
```
**Parameters**
- `number` — A value of any numeric type: [Int, UInt](/docs/ru/sql-reference/data-types/int-uint.md), [Float](/docs/ru/sql-reference/data-types/float.md), [Decimal](/docs/ru/sql-reference/data-types/decimal.md),
- `scale` — The required number of fractional digits, [UInt8](/docs/ru/sql-reference/data-types/int-uint.md).
* The `scale` value for [Decimal](/docs/ru/sql-reference/data-types/decimal.md) and [Int, UInt](/docs/ru/sql-reference/data-types/int-uint.md) types must not exceed 77 (the maximum number of significant digits for these types),
* The `scale` value for the [Float](/docs/ru/sql-reference/data-types/float.md) type must not exceed 60.
**Returned value**
- A string ([String](/docs/en/sql-reference/data-types/string.md)) containing the decimal representation of the input number with the specified fractional-part length.
If necessary, the number is rounded according to standard arithmetic rules.
**Usage example**
Query:
``` sql
SELECT toDecimalString(CAST('64.32', 'Float64'), 5);
```
Result:
```response
┌─toDecimalString(CAST('64.32', 'Float64'), 5)─┐
│ 64.32000 │
└─────────────────────────────────────────────┘
```
## reinterpretAsUInt(8\|16\|32\|64) {#reinterpretasuint8163264}
## reinterpretAsInt(8\|16\|32\|64) {#reinterpretasint8163264}

View File

@ -163,4 +163,4 @@ ClickHouse also supports maintaining Kerberos credentials itself using a keyfile. Con
**See also**
- [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns)
- [background_message_broker_schedule_pool_size](../../../operations/settings/settings.md#background_message_broker_schedule_pool_size)
- [background_message_broker_schedule_pool_size](../../../operations/server-configuration-parameters/settings.md#background_message_broker_schedule_pool_size)

View File

@ -689,7 +689,7 @@ SETTINGS storage_policy = 'moving_from_ssd_to_hdd'
The `default` storage policy means using only one volume, which contains only one disk defined in `<path>`. You can change the storage policy after table creation with [ALTER TABLE ... MODIFY SETTING]; the new storage policy must contain all the previous disks and volumes with the same names.
The number of threads performing background moves of data parts can be adjusted with the [background_move_pool_size](../../../operations/settings/settings.md#background_move_pool_size) setting.
The number of threads performing background moves of data parts can be adjusted with the [background_move_pool_size](../../../operations/server-configuration-parameters/settings.md#background_move_pool_size) setting.
### Details {#details}

View File

@ -98,7 +98,7 @@ CREATE TABLE table_name ( ... ) ENGINE = ReplicatedMergeTree('zookeeper_name_con
For very large clusters, you can use different ZooKeeper clusters for different shards. However, even the Yandex.Metrica cluster (about 300 servers) has shown that this is not yet necessary.
Replication is multi-master and asynchronous. `INSERT` statements (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the statement is executed and then copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some latency. If some replicas are unavailable, the data is written to them when they become available. If a replica is available, the latency is the time it takes to transfer a block of compressed data over the network. The number of threads performing background tasks for replicated tables can be set with the [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size) setting.
Replication is multi-master and asynchronous. `INSERT` statements (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the statement is executed and then copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some latency. If some replicas are unavailable, the data is written to them when they become available. If a replica is available, the latency is the time it takes to transfer a block of compressed data over the network. The number of threads performing background tasks for replicated tables can be set with the [background_schedule_pool_size](../../../operations/server-configuration-parameters/settings.md#background_schedule_pool_size) setting.
The `ReplicatedMergeTree` engine uses a separate thread pool for replication fetches. The pool size is limited by the [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size) setting, which can be adjusted on server restart.
@ -282,8 +282,8 @@ sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
**See also**
- [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size)
- [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size)
- [background_schedule_pool_size](../../../operations/server-configuration-parameters/settings.md#background_schedule_pool_size)
- [background_fetches_pool_size](../../../operations/server-configuration-parameters/settings.md#background_fetches_pool_size)
- [execute_merges_on_single_replica_time_threshold](../../../operations/settings/settings.md#execute-merges-on-single-replica-time-threshold)
- [max_replicated_fetches_network_bandwidth](../../../operations/settings/merge-tree-settings.mdx#max_replicated_fetches_network_bandwidth)
- [max_replicated_sends_network_bandwidth](../../../operations/settings/merge-tree-settings.mdx#max_replicated_sends_network_bandwidth)

View File

@ -34,6 +34,7 @@
#include <Common/Config/configReadClient.h>
#include <Common/TerminalSize.h>
#include <Common/StudentTTest.h>
#include <Common/CurrentMetrics.h>
#include <filesystem>
@ -43,6 +44,12 @@ namespace fs = std::filesystem;
 * The tool emulates a case with a fixed number of simultaneously executing queries.
*/
namespace CurrentMetrics
{
extern const Metric LocalThread;
extern const Metric LocalThreadActive;
}
namespace DB
{
@ -103,7 +110,7 @@ public:
settings(settings_),
shared_context(Context::createShared()),
global_context(Context::createGlobal(shared_context.get())),
pool(concurrency)
pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, concurrency)
{
const auto secure = secure_ ? Protocol::Secure::Enable : Protocol::Secure::Disable;
size_t connections_cnt = std::max(ports_.size(), hosts_.size());

View File

@ -6,6 +6,7 @@
#include <Common/ZooKeeper/ZooKeeper.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <Common/setThreadName.h>
#include <Common/CurrentMetrics.h>
#include <Interpreters/InterpreterInsertQuery.h>
#include <Interpreters/InterpreterSelectWithUnionQuery.h>
#include <Parsers/ASTFunction.h>
@ -19,6 +20,12 @@
#include <Processors/QueryPlan/BuildQueryPipelineSettings.h>
#include <Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.h>
namespace CurrentMetrics
{
extern const Metric LocalThread;
extern const Metric LocalThreadActive;
}
namespace DB
{
@ -192,7 +199,7 @@ void ClusterCopier::discoverTablePartitions(const ConnectionTimeouts & timeouts,
{
/// Fetch partitions list from a shard
{
ThreadPool thread_pool(num_threads ? num_threads : 2 * getNumberOfPhysicalCPUCores());
ThreadPool thread_pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_threads ? num_threads : 2 * getNumberOfPhysicalCPUCores());
for (const TaskShardPtr & task_shard : task_table.all_shards)
thread_pool.scheduleOrThrowOnError([this, timeouts, task_shard]()

View File

@ -315,12 +315,12 @@ struct Keeper::KeeperHTTPContext : public IHTTPContext
Poco::Timespan getReceiveTimeout() const override
{
return context->getConfigRef().getUInt64("keeper_server.http_receive_timeout", DEFAULT_HTTP_READ_BUFFER_TIMEOUT);
return {context->getConfigRef().getInt64("keeper_server.http_receive_timeout", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC), 0};
}
Poco::Timespan getSendTimeout() const override
{
return context->getConfigRef().getUInt64("keeper_server.http_send_timeout", DEFAULT_HTTP_READ_BUFFER_TIMEOUT);
return {context->getConfigRef().getInt64("keeper_server.http_send_timeout", DBMS_DEFAULT_SEND_TIMEOUT_SEC), 0};
}
TinyContextPtr context;
@ -445,6 +445,9 @@ try
return tiny_context->getConfigRef();
};
auto tcp_receive_timeout = config().getInt64("keeper_server.socket_receive_timeout_sec", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC);
auto tcp_send_timeout = config().getInt64("keeper_server.socket_send_timeout_sec", DBMS_DEFAULT_SEND_TIMEOUT_SEC);
for (const auto & listen_host : listen_hosts)
{
/// TCP Keeper
@ -453,8 +456,8 @@ try
{
Poco::Net::ServerSocket socket;
auto address = socketBindListen(socket, listen_host, port);
socket.setReceiveTimeout(config().getUInt64("keeper_server.socket_receive_timeout_sec", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC));
socket.setSendTimeout(config().getUInt64("keeper_server.socket_send_timeout_sec", DBMS_DEFAULT_SEND_TIMEOUT_SEC));
socket.setReceiveTimeout(Poco::Timespan{tcp_receive_timeout, 0});
socket.setSendTimeout(Poco::Timespan{tcp_send_timeout, 0});
servers->emplace_back(
listen_host,
port_name,
@ -462,8 +465,7 @@ try
std::make_unique<TCPServer>(
new KeeperTCPHandlerFactory(
config_getter, tiny_context->getKeeperDispatcher(),
config().getUInt64("keeper_server.socket_receive_timeout_sec", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC),
config().getUInt64("keeper_server.socket_send_timeout_sec", DBMS_DEFAULT_SEND_TIMEOUT_SEC), false), server_pool, socket));
tcp_receive_timeout, tcp_send_timeout, false), server_pool, socket));
});
const char * secure_port_name = "keeper_server.tcp_port_secure";
@ -472,8 +474,8 @@ try
#if USE_SSL
Poco::Net::SecureServerSocket socket;
auto address = socketBindListen(socket, listen_host, port, /* secure = */ true);
socket.setReceiveTimeout(config().getUInt64("keeper_server.socket_receive_timeout_sec", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC));
socket.setSendTimeout(config().getUInt64("keeper_server.socket_send_timeout_sec", DBMS_DEFAULT_SEND_TIMEOUT_SEC));
socket.setReceiveTimeout(Poco::Timespan{tcp_receive_timeout, 0});
socket.setSendTimeout(Poco::Timespan{tcp_send_timeout, 0});
servers->emplace_back(
listen_host,
secure_port_name,
@ -481,8 +483,7 @@ try
std::make_unique<TCPServer>(
new KeeperTCPHandlerFactory(
config_getter, tiny_context->getKeeperDispatcher(),
config().getUInt64("keeper_server.socket_receive_timeout_sec", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC),
config().getUInt64("keeper_server.socket_send_timeout_sec", DBMS_DEFAULT_SEND_TIMEOUT_SEC), true), server_pool, socket));
tcp_receive_timeout, tcp_send_timeout, true), server_pool, socket));
#else
UNUSED(port);
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "SSL support for TCP protocol is disabled because Poco library was built without NetSSL support.");
@ -490,18 +491,18 @@ try
});
const auto & config = config_getter();
auto http_context = httpContext();
Poco::Timespan keep_alive_timeout(config.getUInt("keep_alive_timeout", 10), 0);
Poco::Net::HTTPServerParams::Ptr http_params = new Poco::Net::HTTPServerParams;
http_params->setTimeout(DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC);
http_params->setTimeout(http_context->getReceiveTimeout());
http_params->setKeepAliveTimeout(keep_alive_timeout);
/// Prometheus (if defined and not setup yet with http_port)
port_name = "prometheus.port";
createServer(listen_host, port_name, listen_try, [&](UInt16 port)
createServer(listen_host, port_name, listen_try, [&, http_context = std::move(http_context)](UInt16 port) mutable
{
Poco::Net::ServerSocket socket;
auto address = socketBindListen(socket, listen_host, port);
auto http_context = httpContext();
socket.setReceiveTimeout(http_context->getReceiveTimeout());
socket.setSendTimeout(http_context->getSendTimeout());
servers->emplace_back(

View File

@ -1271,7 +1271,7 @@ try
{
auto new_pool_size = server_settings.background_pool_size;
auto new_ratio = server_settings.background_merges_mutations_concurrency_ratio;
global_context->getMergeMutateExecutor()->increaseThreadsAndMaxTasksCount(new_pool_size, new_pool_size * new_ratio);
global_context->getMergeMutateExecutor()->increaseThreadsAndMaxTasksCount(new_pool_size, static_cast<size_t>(new_pool_size * new_ratio));
global_context->getMergeMutateExecutor()->updateSchedulingPolicy(server_settings.background_merges_mutations_scheduling_policy.toString());
}

View File

@ -104,6 +104,8 @@ enum class AccessType
M(DROP_NAMED_COLLECTION, "", NAMED_COLLECTION, NAMED_COLLECTION_CONTROL) /* allows to execute DROP NAMED COLLECTION */\
M(DROP, "", GROUP, ALL) /* allows to execute {DROP|DETACH} */\
\
M(UNDROP_TABLE, "", TABLE, ALL) /* allows to execute {UNDROP} TABLE */\
\
M(TRUNCATE, "TRUNCATE TABLE", TABLE, ALL) \
M(OPTIMIZE, "OPTIMIZE TABLE", TABLE, ALL) \
M(BACKUP, "", TABLE, ALL) /* allows to backup tables */\

View File

@ -48,7 +48,7 @@ TEST(AccessRights, Union)
ASSERT_EQ(lhs.toString(),
"GRANT INSERT ON *.*, "
"GRANT SHOW, SELECT, ALTER, CREATE DATABASE, CREATE TABLE, CREATE VIEW, "
"CREATE DICTIONARY, DROP DATABASE, DROP TABLE, DROP VIEW, DROP DICTIONARY, "
"CREATE DICTIONARY, DROP DATABASE, DROP TABLE, DROP VIEW, DROP DICTIONARY, UNDROP TABLE, "
"TRUNCATE, OPTIMIZE, BACKUP, CREATE ROW POLICY, ALTER ROW POLICY, DROP ROW POLICY, "
"SHOW ROW POLICIES, SYSTEM MERGES, SYSTEM TTL MERGES, SYSTEM FETCHES, "
"SYSTEM MOVES, SYSTEM SENDS, SYSTEM REPLICATION QUEUES, "

View File

@ -21,18 +21,21 @@ namespace ErrorCodes
namespace
{
/// TODO Proper support for Decimal256.
template <typename T, typename LimitNumberOfElements>
struct MovingSum
{
using Data = MovingSumData<std::conditional_t<is_decimal<T>, Decimal128, NearestFieldType<T>>>;
using Data = MovingSumData<std::conditional_t<is_decimal<T>,
std::conditional_t<sizeof(T) <= sizeof(Decimal128), Decimal128, Decimal256>,
NearestFieldType<T>>>;
using Function = MovingImpl<T, LimitNumberOfElements, Data>;
};
template <typename T, typename LimitNumberOfElements>
struct MovingAvg
{
using Data = MovingAvgData<std::conditional_t<is_decimal<T>, Decimal128, Float64>>;
using Data = MovingAvgData<std::conditional_t<is_decimal<T>,
std::conditional_t<sizeof(T) <= sizeof(Decimal128), Decimal128, Decimal256>,
Float64>>;
using Function = MovingImpl<T, LimitNumberOfElements, Data>;
};

View File

@ -33,11 +33,6 @@
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
struct Settings;
enum class StatisticsFunctionKind
@ -57,7 +52,7 @@ struct StatFuncOneArg
using Type1 = T;
using Type2 = T;
using ResultType = std::conditional_t<std::is_same_v<T, Float32>, Float32, Float64>;
using Data = std::conditional_t<is_decimal<T>, VarMomentsDecimal<Decimal128, _level>, VarMoments<ResultType, _level>>;
using Data = VarMoments<ResultType, _level>;
static constexpr UInt32 num_args = 1;
};
@ -89,12 +84,11 @@ public:
explicit AggregateFunctionVarianceSimple(const DataTypes & argument_types_, StatisticsFunctionKind kind_)
: IAggregateFunctionDataHelper<typename StatFunc::Data, AggregateFunctionVarianceSimple<StatFunc>>(argument_types_, {}, std::make_shared<DataTypeNumber<ResultType>>())
, src_scale(0), kind(kind_)
{}
AggregateFunctionVarianceSimple(const IDataType & data_type, const DataTypes & argument_types_, StatisticsFunctionKind kind_)
: IAggregateFunctionDataHelper<typename StatFunc::Data, AggregateFunctionVarianceSimple<StatFunc>>(argument_types_, {}, std::make_shared<DataTypeNumber<ResultType>>())
, src_scale(getDecimalScale(data_type)), kind(kind_)
{}
{
chassert(!argument_types_.empty());
if (isDecimal(argument_types_.front()))
src_scale = getDecimalScale(*argument_types_.front());
}
String getName() const override
{
@ -113,8 +107,9 @@ public:
{
if constexpr (is_decimal<T1>)
{
this->data(place).add(static_cast<ResultType>(
static_cast<const ColVecT1 &>(*columns[0]).getData()[row_num].value));
this->data(place).add(
convertFromDecimal<DataTypeDecimal<T1>, DataTypeFloat64>(
static_cast<const ColVecT1 &>(*columns[0]).getData()[row_num], src_scale));
}
else
this->data(place).add(
@ -142,161 +137,86 @@ public:
const auto & data = this->data(place);
auto & dst = static_cast<ColVecResult &>(to).getData();
if constexpr (is_decimal<T1>)
{
    switch (kind)
    {
        case StatisticsFunctionKind::varPop:
        {
            dst.push_back(data.getPopulation(src_scale * 2));
            break;
        }
        case StatisticsFunctionKind::varSamp:
        {
            dst.push_back(data.getSample(src_scale * 2));
            break;
        }
        case StatisticsFunctionKind::stddevPop:
        {
            dst.push_back(sqrt(data.getPopulation(src_scale * 2)));
            break;
        }
        case StatisticsFunctionKind::stddevSamp:
        {
            dst.push_back(sqrt(data.getSample(src_scale * 2)));
            break;
        }
        case StatisticsFunctionKind::skewPop:
        {
            Float64 var_value = data.getPopulation(src_scale * 2);
            if (var_value > 0)
                dst.push_back(data.getMoment3(src_scale * 3) / pow(var_value, 1.5));
            else
                dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
            break;
        }
        case StatisticsFunctionKind::skewSamp:
        {
            Float64 var_value = data.getSample(src_scale * 2);
            if (var_value > 0)
                dst.push_back(data.getMoment3(src_scale * 3) / pow(var_value, 1.5));
            else
                dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
            break;
        }
        case StatisticsFunctionKind::kurtPop:
        {
            Float64 var_value = data.getPopulation(src_scale * 2);
            if (var_value > 0)
                dst.push_back(data.getMoment4(src_scale * 4) / pow(var_value, 2));
            else
                dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
            break;
        }
        case StatisticsFunctionKind::kurtSamp:
        {
            Float64 var_value = data.getSample(src_scale * 2);
            if (var_value > 0)
                dst.push_back(data.getMoment4(src_scale * 4) / pow(var_value, 2));
            else
                dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
            break;
        }
        default:
            throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected statistical function kind");
    }
}
else
{
    switch (kind)
    {
        case StatisticsFunctionKind::varPop:
        {
            dst.push_back(data.getPopulation());
            break;
        }
        case StatisticsFunctionKind::varSamp:
        {
            dst.push_back(data.getSample());
            break;
        }
        case StatisticsFunctionKind::stddevPop:
        {
            dst.push_back(sqrt(data.getPopulation()));
            break;
        }
        case StatisticsFunctionKind::stddevSamp:
        {
            dst.push_back(sqrt(data.getSample()));
            break;
        }
        case StatisticsFunctionKind::skewPop:
        {
            ResultType var_value = data.getPopulation();
            if (var_value > 0)
                dst.push_back(static_cast<ResultType>(data.getMoment3() / pow(var_value, 1.5)));
            else
                dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
            break;
        }
        case StatisticsFunctionKind::skewSamp:
        {
            ResultType var_value = data.getSample();
            if (var_value > 0)
                dst.push_back(static_cast<ResultType>(data.getMoment3() / pow(var_value, 1.5)));
            else
                dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
            break;
        }
        case StatisticsFunctionKind::kurtPop:
        {
            ResultType var_value = data.getPopulation();
            if (var_value > 0)
                dst.push_back(static_cast<ResultType>(data.getMoment4() / pow(var_value, 2)));
            else
                dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
            break;
        }
        case StatisticsFunctionKind::kurtSamp:
        {
            ResultType var_value = data.getSample();
            if (var_value > 0)
                dst.push_back(static_cast<ResultType>(data.getMoment4() / pow(var_value, 2)));
            else
                dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
            break;
        }
        case StatisticsFunctionKind::covarPop:
        {
            dst.push_back(data.getPopulation());
            break;
        }
        case StatisticsFunctionKind::covarSamp:
        {
            dst.push_back(data.getSample());
            break;
        }
        case StatisticsFunctionKind::corr:
        {
            dst.push_back(data.get());
            break;
        }
    }
}
switch (kind)
{
    case StatisticsFunctionKind::varPop:
    {
        dst.push_back(data.getPopulation());
        break;
    }
    case StatisticsFunctionKind::varSamp:
    {
        dst.push_back(data.getSample());
        break;
    }
    case StatisticsFunctionKind::stddevPop:
    {
        dst.push_back(sqrt(data.getPopulation()));
        break;
    }
    case StatisticsFunctionKind::stddevSamp:
    {
        dst.push_back(sqrt(data.getSample()));
        break;
    }
    case StatisticsFunctionKind::skewPop:
    {
        ResultType var_value = data.getPopulation();
        if (var_value > 0)
            dst.push_back(static_cast<ResultType>(data.getMoment3() / pow(var_value, 1.5)));
        else
            dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
        break;
    }
    case StatisticsFunctionKind::skewSamp:
    {
        ResultType var_value = data.getSample();
        if (var_value > 0)
            dst.push_back(static_cast<ResultType>(data.getMoment3() / pow(var_value, 1.5)));
        else
            dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
        break;
    }
    case StatisticsFunctionKind::kurtPop:
    {
        ResultType var_value = data.getPopulation();
        if (var_value > 0)
            dst.push_back(static_cast<ResultType>(data.getMoment4() / pow(var_value, 2)));
        else
            dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
        break;
    }
    case StatisticsFunctionKind::kurtSamp:
    {
        ResultType var_value = data.getSample();
        if (var_value > 0)
            dst.push_back(static_cast<ResultType>(data.getMoment4() / pow(var_value, 2)));
        else
            dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
        break;
    }
    case StatisticsFunctionKind::covarPop:
    {
        dst.push_back(data.getPopulation());
        break;
    }
    case StatisticsFunctionKind::covarSamp:
    {
        dst.push_back(data.getSample());
        break;
    }
    case StatisticsFunctionKind::corr:
    {
        dst.push_back(data.get());
        break;
    }
}
}
@ -327,7 +247,7 @@ AggregateFunctionPtr createAggregateFunctionStatisticsUnary(
AggregateFunctionPtr res;
const DataTypePtr & data_type = argument_types[0];
if (isDecimal(data_type))
res.reset(createWithDecimalType<FunctionTemplate>(*data_type, *data_type, argument_types, kind));
res.reset(createWithDecimalType<FunctionTemplate>(*data_type, argument_types, kind));
else
res.reset(createWithNumericType<FunctionTemplate>(*data_type, argument_types, kind));
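With VarMomentsDecimal gone, decimal inputs are converted to Float64 on insertion (the convertFromDecimal call in add() above), so all moments are accumulated in floating point. A standalone sketch of that conversion under the usual raw-value-over-10^scale decimal representation; decimalToFloat here is a hypothetical stand-in for convertFromDecimal, not the real API:

#include <cassert>
#include <cmath>
#include <cstdint>

// A decimal with scale S stores value * 10^S as an integer; converting to
// floating point divides the raw value back out by 10^S.
double decimalToFloat(int64_t raw_value, uint32_t scale)
{
    return static_cast<double>(raw_value) / std::pow(10.0, scale);
}

int main()
{
    // Decimal64(3.14, scale = 2) is stored as raw value 314.
    assert(std::abs(decimalToFloat(314, 2) - 3.14) < 1e-9);
}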

View File

@ -19,7 +19,7 @@ namespace
template <typename T>
struct SumSimple
{
/// @note It uses slow Decimal128 (cause we need such a variant). sumWithOverflow is faster for Decimal32/64
/// @note It uses slow Decimal128/256 (cause we need such a variant). sumWithOverflow is faster for Decimal32/64
using ResultType = std::conditional_t<is_decimal<T>,
std::conditional_t<std::is_same_v<T, Decimal256>, Decimal256, Decimal128>,
NearestFieldType<T>>;

View File

@ -41,6 +41,8 @@ static IAggregateFunction * createAggregateFunctionSingleValue(const String & na
return new AggregateFunctionTemplate<Data<SingleValueDataFixed<Decimal64>>>(argument_type);
if (which.idx == TypeIndex::Decimal128)
return new AggregateFunctionTemplate<Data<SingleValueDataFixed<Decimal128>>>(argument_type);
if (which.idx == TypeIndex::Decimal256)
return new AggregateFunctionTemplate<Data<SingleValueDataFixed<Decimal256>>>(argument_type);
if (which.idx == TypeIndex::String)
return new AggregateFunctionTemplate<Data<SingleValueDataString>>(argument_type);
@ -72,6 +74,8 @@ static IAggregateFunction * createAggregateFunctionArgMinMaxSecond(const DataTyp
return new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxData<ResData, MinMaxData<SingleValueDataFixed<Decimal64>>>>(res_type, val_type);
if (which.idx == TypeIndex::Decimal128)
return new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxData<ResData, MinMaxData<SingleValueDataFixed<Decimal128>>>>(res_type, val_type);
if (which.idx == TypeIndex::Decimal256)
return new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxData<ResData, MinMaxData<SingleValueDataFixed<Decimal256>>>>(res_type, val_type);
if (which.idx == TypeIndex::String)
return new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxData<ResData, MinMaxData<SingleValueDataString>>>(res_type, val_type);
@ -106,6 +110,8 @@ static IAggregateFunction * createAggregateFunctionArgMinMax(const String & name
return createAggregateFunctionArgMinMaxSecond<MinMaxData, SingleValueDataFixed<Decimal64>>(res_type, val_type);
if (which.idx == TypeIndex::Decimal128)
return createAggregateFunctionArgMinMaxSecond<MinMaxData, SingleValueDataFixed<Decimal128>>(res_type, val_type);
if (which.idx == TypeIndex::Decimal256)
return createAggregateFunctionArgMinMaxSecond<MinMaxData, SingleValueDataFixed<Decimal256>>(res_type, val_type);
if (which.idx == TypeIndex::String)
return createAggregateFunctionArgMinMaxSecond<MinMaxData, SingleValueDataString>(res_type, val_type);

View File

@ -17,7 +17,6 @@ struct Settings;
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
extern const int DECIMAL_OVERFLOW;
extern const int LOGICAL_ERROR;
}
@ -136,114 +135,6 @@ struct VarMoments
}
};
template <typename T, size_t _level>
class VarMomentsDecimal
{
public:
using NativeType = typename T::NativeType;
void add(NativeType x)
{
++m0;
getM(1) += x;
NativeType tmp;
bool overflow = common::mulOverflow(x, x, tmp) || common::addOverflow(getM(2), tmp, getM(2));
if constexpr (_level >= 3)
overflow = overflow || common::mulOverflow(tmp, x, tmp) || common::addOverflow(getM(3), tmp, getM(3));
if constexpr (_level >= 4)
overflow = overflow || common::mulOverflow(tmp, x, tmp) || common::addOverflow(getM(4), tmp, getM(4));
if (overflow)
throw Exception(ErrorCodes::DECIMAL_OVERFLOW, "Decimal math overflow");
}
void merge(const VarMomentsDecimal & rhs)
{
m0 += rhs.m0;
getM(1) += rhs.getM(1);
bool overflow = common::addOverflow(getM(2), rhs.getM(2), getM(2));
if constexpr (_level >= 3)
overflow = overflow || common::addOverflow(getM(3), rhs.getM(3), getM(3));
if constexpr (_level >= 4)
overflow = overflow || common::addOverflow(getM(4), rhs.getM(4), getM(4));
if (overflow)
throw Exception(ErrorCodes::DECIMAL_OVERFLOW, "Decimal math overflow");
}
void write(WriteBuffer & buf) const { writePODBinary(*this, buf); }
void read(ReadBuffer & buf) { readPODBinary(*this, buf); }
Float64 get() const
{
throw Exception(ErrorCodes::LOGICAL_ERROR, "Variation moments should be obtained by either 'getSample' or 'getPopulation' method");
}
Float64 getPopulation(UInt32 scale) const
{
if (m0 == 0)
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(getM(1), getM(1), tmp) ||
common::subOverflow(getM(2), NativeType(tmp / m0), tmp))
throw Exception(ErrorCodes::DECIMAL_OVERFLOW, "Decimal math overflow");
return std::max(Float64{}, DecimalUtils::convertTo<Float64>(T(tmp / m0), scale));
}
Float64 getSample(UInt32 scale) const
{
if (m0 == 0)
return std::numeric_limits<Float64>::quiet_NaN();
if (m0 == 1)
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(getM(1), getM(1), tmp) ||
common::subOverflow(getM(2), NativeType(tmp / m0), tmp))
throw Exception(ErrorCodes::DECIMAL_OVERFLOW, "Decimal math overflow");
return std::max(Float64{}, DecimalUtils::convertTo<Float64>(T(tmp / (m0 - 1)), scale));
}
Float64 getMoment3(UInt32 scale) const
{
if (m0 == 0)
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(2 * getM(1), getM(1), tmp) ||
common::subOverflow(3 * getM(2), NativeType(tmp / m0), tmp) ||
common::mulOverflow(tmp, getM(1), tmp) ||
common::subOverflow(getM(3), NativeType(tmp / m0), tmp))
throw Exception(ErrorCodes::DECIMAL_OVERFLOW, "Decimal math overflow");
return DecimalUtils::convertTo<Float64>(T(tmp / m0), scale);
}
Float64 getMoment4(UInt32 scale) const
{
if (m0 == 0)
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(3 * getM(1), getM(1), tmp) ||
common::subOverflow(6 * getM(2), NativeType(tmp / m0), tmp) ||
common::mulOverflow(tmp, getM(1), tmp) ||
common::subOverflow(4 * getM(3), NativeType(tmp / m0), tmp) ||
common::mulOverflow(tmp, getM(1), tmp) ||
common::subOverflow(getM(4), NativeType(tmp / m0), tmp))
throw Exception(ErrorCodes::DECIMAL_OVERFLOW, "Decimal math overflow");
return DecimalUtils::convertTo<Float64>(T(tmp / m0), scale);
}
private:
UInt64 m0{};
NativeType m[_level]{};
NativeType & getM(size_t i) { return m[i - 1]; }
const NativeType & getM(size_t i) const { return m[i - 1]; }
};
/**
Calculating multivariate central moments

View File

@ -33,7 +33,7 @@ struct QuantileExactWeighted
using Weight = UInt64;
using UnderlyingType = NativeType<Value>;
using Hasher = std::conditional_t<std::is_same_v<Value, Decimal128>, Int128Hash, HashCRC32<UnderlyingType>>;
using Hasher = HashCRC32<UnderlyingType>;
/// When creating, the hash table must be small.
using Map = HashMapWithStackMemory<UnderlyingType, Weight, Hasher, 4>;

View File

@ -34,7 +34,7 @@ struct QuantileInterpolatedWeighted
using Weight = UInt64;
using UnderlyingType = NativeType<Value>;
using Hasher = std::conditional_t<std::is_same_v<Value, Decimal128>, Int128Hash, HashCRC32<UnderlyingType>>;
using Hasher = HashCRC32<UnderlyingType>;
/// When creating, the hash table must be small.
using Map = HashMapWithStackMemory<UnderlyingType, Weight, Hasher, 4>;

View File

@ -153,9 +153,11 @@ private:
}
};
if (const auto * lhs_literal = lhs->as<ConstantNode>())
if (const auto * lhs_literal = lhs->as<ConstantNode>();
lhs_literal && !lhs_literal->getValue().isNull())
add_equals_function_if_not_present(rhs, lhs_literal);
else if (const auto * rhs_literal = rhs->as<ConstantNode>())
else if (const auto * rhs_literal = rhs->as<ConstantNode>();
rhs_literal && !rhs_literal->getValue().isNull())
add_equals_function_if_not_present(lhs, rhs_literal);
else
or_operands.push_back(argument);
@ -217,7 +219,6 @@ private:
/// we can replace OR with the operand
if (or_operands[0]->getResultType()->equals(*function_node.getResultType()))
{
assert(!function_node.getResultType()->isNullable());
node = std::move(or_operands[0]);
return;
}
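A side note on the C++17 if-with-initializer pattern adopted above: the initializer keeps the cast result in scope for the condition, so the null check folds into the same statement instead of needing a nested if. A self-contained illustration (tryParse is a made-up helper, not ClickHouse code):

#include <cassert>
#include <optional>

std::optional<int> tryParse(bool ok) { return ok ? std::optional<int>(42) : std::nullopt; }

int main()
{
    // `v` is initialized once and visible only inside the if/else.
    if (const auto v = tryParse(true); v && *v > 0)
        assert(*v == 42);
    else
        assert(false);
}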

View File

@ -0,0 +1,146 @@
#include <Backups/BackupCoordinationFileInfos.h>
#include <Common/quoteString.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BACKUP_ENTRY_ALREADY_EXISTS;
extern const int BAD_ARGUMENTS;
extern const int LOGICAL_ERROR;
}
using SizeAndChecksum = std::pair<UInt64, UInt128>;
void BackupCoordinationFileInfos::addFileInfos(BackupFileInfos && file_infos_, const String & host_id_)
{
if (prepared)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addFileInfos() must not be called after preparing");
file_infos.emplace(host_id_, std::move(file_infos_));
}
BackupFileInfos BackupCoordinationFileInfos::getFileInfos(const String & host_id_) const
{
prepare();
auto it = file_infos.find(host_id_);
if (it == file_infos.end())
return {};
return it->second;
}
BackupFileInfos BackupCoordinationFileInfos::getFileInfosForAllHosts() const
{
prepare();
BackupFileInfos res;
res.reserve(file_infos_for_all_hosts.size());
for (const auto * file_info : file_infos_for_all_hosts)
res.emplace_back(*file_info);
return res;
}
BackupFileInfo BackupCoordinationFileInfos::getFileInfoByDataFileIndex(size_t data_file_index) const
{
prepare();
if (data_file_index >= file_infos_for_all_hosts.size())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Invalid data file index: {}", data_file_index);
return *(file_infos_for_all_hosts[data_file_index]);
}
void BackupCoordinationFileInfos::prepare() const
{
if (prepared)
return;
/// Make a list of all file infos from all hosts.
size_t total_num_infos = 0;
for (const auto & [_, infos] : file_infos)
total_num_infos += infos.size();
file_infos_for_all_hosts.reserve(total_num_infos);
for (auto & [_, infos] : file_infos)
for (auto & info : infos)
file_infos_for_all_hosts.emplace_back(&info);
/// Sort the list of all file infos by file name (file names must be unique).
std::sort(file_infos_for_all_hosts.begin(), file_infos_for_all_hosts.end(), BackupFileInfo::LessByFileName{});
auto adjacent_it = std::adjacent_find(file_infos_for_all_hosts.begin(), file_infos_for_all_hosts.end(), BackupFileInfo::EqualByFileName{});
if (adjacent_it != file_infos_for_all_hosts.end())
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_ALREADY_EXISTS, "Entry {} added multiple times to backup", quoteString((*adjacent_it)->file_name));
}
num_files = 0;
total_size_of_files = 0;
if (plain_backup)
{
/// For plain backup all file infos are stored as is, without checking for duplicates or skipping empty files.
for (size_t i = 0; i != file_infos_for_all_hosts.size(); ++i)
{
auto & info = *(file_infos_for_all_hosts[i]);
info.data_file_name = info.file_name;
info.data_file_index = i;
info.base_size = 0; /// Base backup must not be used while creating a plain backup.
info.base_checksum = 0;
total_size_of_files += info.size;
}
num_files = file_infos_for_all_hosts.size();
}
else
{
/// For non-plain backups, files with the same size and checksum are stored only once;
/// this map is used to find such duplicate files.
std::map<SizeAndChecksum, size_t> data_file_index_by_checksum;
for (size_t i = 0; i != file_infos_for_all_hosts.size(); ++i)
{
auto & info = *(file_infos_for_all_hosts[i]);
if (info.size == info.base_size)
{
/// A file is either empty or can be read from the base backup as a whole.
info.data_file_name.clear();
info.data_file_index = static_cast<size_t>(-1);
}
else
{
SizeAndChecksum size_and_checksum{info.size, info.checksum};
auto [it, inserted] = data_file_index_by_checksum.emplace(size_and_checksum, i);
if (inserted)
{
/// Found a new file.
info.data_file_name = info.file_name;
info.data_file_index = i;
++num_files;
total_size_of_files += info.size - info.base_size;
}
else
{
/// Found a file with the same size and checksum as a file seen before; reuse the old `data_file_index` and `data_file_name`.
info.data_file_index = it->second;
info.data_file_name = file_infos_for_all_hosts[it->second]->data_file_name;
}
}
}
}
prepared = true;
}
size_t BackupCoordinationFileInfos::getNumFiles() const
{
prepare();
return num_files;
}
size_t BackupCoordinationFileInfos::getTotalSizeOfFiles() const
{
prepare();
return total_size_of_files;
}
}
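A minimal standalone sketch of the deduplication rule prepare() applies for non-plain backups: the first file with a given (size, checksum) pair owns the data file, and later duplicates only reference it. The FileInfo struct and assignDataFiles function below are simplified stand-ins, not the real BackupFileInfo API:

#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct FileInfo
{
    std::string name;
    uint64_t size = 0;
    uint64_t checksum = 0;
    size_t data_file_index = size_t(-1);
};

void assignDataFiles(std::vector<FileInfo> & infos)
{
    std::map<std::pair<uint64_t, uint64_t>, size_t> index_by_checksum;
    for (size_t i = 0; i != infos.size(); ++i)
    {
        auto [it, inserted] = index_by_checksum.emplace(std::pair{infos[i].size, infos[i].checksum}, i);
        infos[i].data_file_index = inserted ? i : it->second;  // duplicates reuse the first occurrence
    }
}

int main()
{
    std::vector<FileInfo> infos{{"a.bin", 10, 0xAB}, {"b.bin", 10, 0xAB}, {"c.bin", 5, 0xCD}};
    assignDataFiles(infos);
    assert(infos[0].data_file_index == 0);
    assert(infos[1].data_file_index == 0);  // b.bin reuses a.bin's data file
    assert(infos[2].data_file_index == 2);
}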

View File

@ -0,0 +1,56 @@
#pragma once
#include <map>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <Backups/BackupFileInfo.h>
namespace DB
{
/// Hosts use this class to coordinate lists of files they are going to write to a backup.
/// Different hosts shouldn't write the same file twice, or even write files with different names but the same checksum.
/// Also the initiator of the BACKUP query uses this class to get a whole list of files written by all hosts to write that list
/// as a part of the contents of the .backup file (the backup metadata file).
class BackupCoordinationFileInfos
{
public:
/// plain_backup indicates that we're writing a plain backup, which means all duplicates and empty files are written as is.
/// (For normal backups only the first file amongst duplicates is actually stored, and empty files are not stored).
BackupCoordinationFileInfos(bool plain_backup_) : plain_backup(plain_backup_) {}
/// Adds file infos for the specified host.
void addFileInfos(BackupFileInfos && file_infos, const String & host_id);
/// Returns file infos for the specified host after preparation.
BackupFileInfos getFileInfos(const String & host_id) const;
/// Returns file infos for all hosts after preparation.
BackupFileInfos getFileInfosForAllHosts() const;
/// Returns a file info by data file index (see BackupFileInfo::data_file_index).
BackupFileInfo getFileInfoByDataFileIndex(size_t data_file_index) const;
/// Returns the number of files after deduplication and excluding empty files.
size_t getNumFiles() const;
/// Returns the total size of files after deduplication and excluding empty files.
size_t getTotalSizeOfFiles() const;
private:
void prepare() const;
/// before preparation
const bool plain_backup;
mutable std::unordered_map<String, BackupFileInfos> file_infos;
/// after preparation
mutable bool prepared = false;
mutable std::vector<BackupFileInfo *> file_infos_for_all_hosts;
mutable size_t num_files;
mutable size_t total_size_of_files;
};
}

View File

@ -1,16 +1,17 @@
#include <Backups/BackupCoordinationLocal.h>
#include <Common/Exception.h>
#include <Common/logger_useful.h>
#include <Common/quoteString.h>
#include <fmt/format.h>
namespace DB
{
using SizeAndChecksum = IBackupCoordination::SizeAndChecksum;
using FileInfo = IBackupCoordination::FileInfo;
BackupCoordinationLocal::BackupCoordinationLocal(bool plain_backup_) : file_infos(plain_backup_)
{
}
BackupCoordinationLocal::BackupCoordinationLocal() = default;
BackupCoordinationLocal::~BackupCoordinationLocal() = default;
void BackupCoordinationLocal::setStage(const String &, const String &)
@ -33,187 +34,94 @@ Strings BackupCoordinationLocal::waitForStage(const String &, std::chrono::milli
void BackupCoordinationLocal::addReplicatedPartNames(const String & table_shared_id, const String & table_name_for_logs, const String & replica_name, const std::vector<PartNameAndChecksum> & part_names_and_checksums)
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
replicated_tables.addPartNames(table_shared_id, table_name_for_logs, replica_name, part_names_and_checksums);
}
Strings BackupCoordinationLocal::getReplicatedPartNames(const String & table_shared_id, const String & replica_name) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
return replicated_tables.getPartNames(table_shared_id, replica_name);
}
void BackupCoordinationLocal::addReplicatedMutations(const String & table_shared_id, const String & table_name_for_logs, const String & replica_name, const std::vector<MutationInfo> & mutations)
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
replicated_tables.addMutations(table_shared_id, table_name_for_logs, replica_name, mutations);
}
std::vector<IBackupCoordination::MutationInfo> BackupCoordinationLocal::getReplicatedMutations(const String & table_shared_id, const String & replica_name) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
return replicated_tables.getMutations(table_shared_id, replica_name);
}
void BackupCoordinationLocal::addReplicatedDataPath(const String & table_shared_id, const String & data_path)
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
replicated_tables.addDataPath(table_shared_id, data_path);
}
Strings BackupCoordinationLocal::getReplicatedDataPaths(const String & table_shared_id) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
return replicated_tables.getDataPaths(table_shared_id);
}
void BackupCoordinationLocal::addReplicatedAccessFilePath(const String & access_zk_path, AccessEntityType access_entity_type, const String & file_path)
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_access_mutex};
replicated_access.addFilePath(access_zk_path, access_entity_type, "", file_path);
}
Strings BackupCoordinationLocal::getReplicatedAccessFilePaths(const String & access_zk_path, AccessEntityType access_entity_type) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_access_mutex};
return replicated_access.getFilePaths(access_zk_path, access_entity_type, "");
}
void BackupCoordinationLocal::addReplicatedSQLObjectsDir(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & dir_path)
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_sql_objects_mutex};
replicated_sql_objects.addDirectory(loader_zk_path, object_type, "", dir_path);
}
Strings BackupCoordinationLocal::getReplicatedSQLObjectsDirs(const String & loader_zk_path, UserDefinedSQLObjectType object_type) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_sql_objects_mutex};
return replicated_sql_objects.getDirectories(loader_zk_path, object_type, "");
}
void BackupCoordinationLocal::addFileInfo(const FileInfo & file_info, bool & is_data_file_required)
void BackupCoordinationLocal::addFileInfos(BackupFileInfos && file_infos_)
{
std::lock_guard lock{mutex};
file_names.emplace(file_info.file_name, std::pair{file_info.size, file_info.checksum});
if (!file_info.size)
{
is_data_file_required = false;
return;
}
bool inserted_file_info = file_infos.try_emplace(std::pair{file_info.size, file_info.checksum}, file_info).second;
is_data_file_required = inserted_file_info && (file_info.size > file_info.base_size);
std::lock_guard lock{file_infos_mutex};
file_infos.addFileInfos(std::move(file_infos_), "");
}
void BackupCoordinationLocal::updateFileInfo(const FileInfo & file_info)
BackupFileInfos BackupCoordinationLocal::getFileInfos() const
{
if (!file_info.size)
return; /// we don't keep FileInfos for empty files, nothing to update
std::lock_guard lock{mutex};
auto & dest = file_infos.at(std::pair{file_info.size, file_info.checksum});
dest.archive_suffix = file_info.archive_suffix;
std::lock_guard lock{file_infos_mutex};
return file_infos.getFileInfos("");
}
std::vector<FileInfo> BackupCoordinationLocal::getAllFileInfos() const
BackupFileInfos BackupCoordinationLocal::getFileInfosForAllHosts() const
{
std::lock_guard lock{mutex};
std::vector<FileInfo> res;
for (const auto & [file_name, size_and_checksum] : file_names)
{
FileInfo info;
UInt64 size = size_and_checksum.first;
if (size) /// we don't keep FileInfos for empty files
info = file_infos.at(size_and_checksum);
info.file_name = file_name;
res.push_back(std::move(info));
}
return res;
std::lock_guard lock{file_infos_mutex};
return file_infos.getFileInfosForAllHosts();
}
Strings BackupCoordinationLocal::listFiles(const String & directory, bool recursive) const
bool BackupCoordinationLocal::startWritingFile(size_t data_file_index)
{
std::lock_guard lock{mutex};
String prefix = directory;
if (!prefix.empty() && !prefix.ends_with('/'))
prefix += '/';
String terminator = recursive ? "" : "/";
Strings elements;
for (auto it = file_names.lower_bound(prefix); it != file_names.end(); ++it)
{
const String & name = it->first;
if (!name.starts_with(prefix))
break;
size_t start_pos = prefix.length();
size_t end_pos = String::npos;
if (!terminator.empty())
end_pos = name.find(terminator, start_pos);
std::string_view new_element = std::string_view{name}.substr(start_pos, end_pos - start_pos);
if (!elements.empty() && (elements.back() == new_element))
continue;
elements.push_back(String{new_element});
}
return elements;
std::lock_guard lock{writing_files_mutex};
/// Return false if this function was already called with this `data_file_index`.
return writing_files.emplace(data_file_index).second;
}
bool BackupCoordinationLocal::hasFiles(const String & directory) const
{
std::lock_guard lock{mutex};
String prefix = directory;
if (!prefix.empty() && !prefix.ends_with('/'))
prefix += '/';
auto it = file_names.lower_bound(prefix);
if (it == file_names.end())
return false;
const String & name = it->first;
return name.starts_with(prefix);
}
std::optional<FileInfo> BackupCoordinationLocal::getFileInfo(const String & file_name) const
{
std::lock_guard lock{mutex};
auto it = file_names.find(file_name);
if (it == file_names.end())
return std::nullopt;
const auto & size_and_checksum = it->second;
UInt64 size = size_and_checksum.first;
FileInfo info;
if (size) /// we don't keep FileInfos for empty files
info = file_infos.at(size_and_checksum);
info.file_name = file_name;
return info;
}
std::optional<FileInfo> BackupCoordinationLocal::getFileInfo(const SizeAndChecksum & size_and_checksum) const
{
std::lock_guard lock{mutex};
auto it = file_infos.find(size_and_checksum);
if (it == file_infos.end())
return std::nullopt;
return it->second;
}
String BackupCoordinationLocal::getNextArchiveSuffix()
{
std::lock_guard lock{mutex};
String new_archive_suffix = fmt::format("{:03}", ++current_archive_suffix); /// Outputs 001, 002, 003, ...
archive_suffixes.push_back(new_archive_suffix);
return new_archive_suffix;
}
Strings BackupCoordinationLocal::getAllArchiveSuffixes() const
{
std::lock_guard lock{mutex};
return archive_suffixes;
}
bool BackupCoordinationLocal::hasConcurrentBackups(const std::atomic<size_t> & num_active_backups) const
{

View File

@ -1,12 +1,13 @@
#pragma once
#include <Backups/IBackupCoordination.h>
#include <Backups/BackupCoordinationFileInfos.h>
#include <Backups/BackupCoordinationReplicatedAccess.h>
#include <Backups/BackupCoordinationReplicatedSQLObjects.h>
#include <Backups/BackupCoordinationReplicatedTables.h>
#include <base/defines.h>
#include <map>
#include <mutex>
#include <unordered_set>
namespace Poco { class Logger; }
@ -18,7 +19,7 @@ namespace DB
class BackupCoordinationLocal : public IBackupCoordination
{
public:
BackupCoordinationLocal();
BackupCoordinationLocal(bool plain_backup_);
~BackupCoordinationLocal() override;
void setStage(const String & new_stage, const String & message) override;
@ -43,31 +44,25 @@ public:
void addReplicatedSQLObjectsDir(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & dir_path) override;
Strings getReplicatedSQLObjectsDirs(const String & loader_zk_path, UserDefinedSQLObjectType object_type) const override;
void addFileInfo(const FileInfo & file_info, bool & is_data_file_required) override;
void updateFileInfo(const FileInfo & file_info) override;
std::vector<FileInfo> getAllFileInfos() const override;
Strings listFiles(const String & directory, bool recursive) const override;
bool hasFiles(const String & directory) const override;
std::optional<FileInfo> getFileInfo(const String & file_name) const override;
std::optional<FileInfo> getFileInfo(const SizeAndChecksum & size_and_checksum) const override;
String getNextArchiveSuffix() override;
Strings getAllArchiveSuffixes() const override;
void addFileInfos(BackupFileInfos && file_infos) override;
BackupFileInfos getFileInfos() const override;
BackupFileInfos getFileInfosForAllHosts() const override;
bool startWritingFile(size_t data_file_index) override;
bool hasConcurrentBackups(const std::atomic<size_t> & num_active_backups) const override;
private:
mutable std::mutex mutex;
BackupCoordinationReplicatedTables replicated_tables TSA_GUARDED_BY(mutex);
BackupCoordinationReplicatedAccess replicated_access TSA_GUARDED_BY(mutex);
BackupCoordinationReplicatedSQLObjects replicated_sql_objects TSA_GUARDED_BY(mutex);
BackupCoordinationReplicatedTables TSA_GUARDED_BY(replicated_tables_mutex) replicated_tables;
BackupCoordinationReplicatedAccess TSA_GUARDED_BY(replicated_access_mutex) replicated_access;
BackupCoordinationReplicatedSQLObjects TSA_GUARDED_BY(replicated_sql_objects_mutex) replicated_sql_objects;
BackupCoordinationFileInfos TSA_GUARDED_BY(file_infos_mutex) file_infos;
std::unordered_set<size_t> TSA_GUARDED_BY(writing_files_mutex) writing_files;
std::map<String /* file_name */, SizeAndChecksum> file_names TSA_GUARDED_BY(mutex); /// Should be ordered alphabetically, see listFiles(). For empty files we assume checksum = 0.
std::map<SizeAndChecksum, FileInfo> file_infos TSA_GUARDED_BY(mutex); /// Information about files. Without empty files.
Strings archive_suffixes TSA_GUARDED_BY(mutex);
size_t current_archive_suffix TSA_GUARDED_BY(mutex) = 0;
mutable std::mutex replicated_tables_mutex;
mutable std::mutex replicated_access_mutex;
mutable std::mutex replicated_sql_objects_mutex;
mutable std::mutex file_infos_mutex;
mutable std::mutex writing_files_mutex;
};
}
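A sketch of the fine-grained locking pattern this header adopts: one mutex per independent component instead of a single coarse mutex, with each field annotated for clang's thread-safety analysis. GUARDED_BY below stands in for ClickHouse's TSA_GUARDED_BY macro; the attribute only affects clang's -Wthread-safety diagnostics and compiles away elsewhere:

#include <mutex>
#include <vector>

#if defined(__clang__)
#    define GUARDED_BY(m) __attribute__((guarded_by(m)))
#else
#    define GUARDED_BY(m)
#endif

class Coordinator
{
public:
    // Each method takes only the mutex of the component it touches,
    // so addFile() never blocks addTable() and vice versa.
    void addTable(int t) { std::lock_guard lock(tables_mutex); tables.push_back(t); }
    void addFile(int f)  { std::lock_guard lock(files_mutex);  files.push_back(f);  }

private:
    std::mutex tables_mutex;
    std::mutex files_mutex;
    std::vector<int> tables GUARDED_BY(tables_mutex);
    std::vector<int> files  GUARDED_BY(files_mutex);
};

int main() { Coordinator c; c.addTable(1); c.addFile(2); }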

View File

@ -7,7 +7,6 @@
#include <IO/WriteHelpers.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <Common/escapeForFileName.h>
#include <base/hex.h>
#include <Backups/BackupCoordinationStage.h>
@ -16,21 +15,13 @@ namespace DB
namespace ErrorCodes
{
extern const int UNEXPECTED_NODE_IN_ZOOKEEPER;
extern const int LOGICAL_ERROR;
}
namespace Stage = BackupCoordinationStage;
/// zookeeper_path/file_names/file_name->checksum_and_size
/// zookeeper_path/file_infos/checksum_and_size->info
/// zookeeper_path/archive_suffixes
/// zookeeper_path/current_archive_suffix
namespace
{
using SizeAndChecksum = IBackupCoordination::SizeAndChecksum;
using FileInfo = IBackupCoordination::FileInfo;
using PartNameAndChecksum = IBackupCoordination::PartNameAndChecksum;
using MutationInfo = IBackupCoordination::MutationInfo;
@ -104,66 +95,46 @@ namespace
}
};
String serializeFileInfo(const FileInfo & info)
{
    WriteBufferFromOwnString out;
    writeBinary(info.file_name, out);
    writeBinary(info.size, out);
    writeBinary(info.checksum, out);
    writeBinary(info.base_size, out);
    writeBinary(info.base_checksum, out);
    writeBinary(info.data_file_name, out);
    writeBinary(info.archive_suffix, out);
    writeBinary(info.pos_in_archive, out);
    return out.str();
}

FileInfo deserializeFileInfo(const String & str)
{
    FileInfo info;
    ReadBufferFromString in{str};
    readBinary(info.file_name, in);
    readBinary(info.size, in);
    readBinary(info.checksum, in);
    readBinary(info.base_size, in);
    readBinary(info.base_checksum, in);
    readBinary(info.data_file_name, in);
    readBinary(info.archive_suffix, in);
    readBinary(info.pos_in_archive, in);
    return info;
}

String serializeSizeAndChecksum(const SizeAndChecksum & size_and_checksum)
{
    return getHexUIntLowercase(size_and_checksum.second) + '_' + std::to_string(size_and_checksum.first);
}

SizeAndChecksum deserializeSizeAndChecksum(const String & str)
{
    constexpr size_t num_chars_in_checksum = sizeof(UInt128) * 2;
    if (str.size() <= num_chars_in_checksum)
        throw Exception(
            ErrorCodes::UNEXPECTED_NODE_IN_ZOOKEEPER,
            "Unexpected size of checksum: {}, must be {}",
            str.size(),
            num_chars_in_checksum);
    UInt128 checksum = unhexUInt<UInt128>(str.data());
    UInt64 size = parseFromString<UInt64>(str.substr(num_chars_in_checksum + 1));
    return std::pair{size, checksum};
}

size_t extractCounterFromSequentialNodeName(const String & node_name)
{
    size_t pos_before_counter = node_name.find_last_not_of("0123456789");
    size_t counter_length = node_name.length() - 1 - pos_before_counter;
    auto counter = std::string_view{node_name}.substr(node_name.length() - counter_length);
    return parseFromString<UInt64>(counter);
}

String formatArchiveSuffix(size_t counter)
{
    return fmt::format("{:03}", counter); /// Outputs 001, 002, 003, ...
}

struct FileInfos
{
    BackupFileInfos file_infos;

    static String serialize(const BackupFileInfos & file_infos_)
    {
        WriteBufferFromOwnString out;
        writeBinary(file_infos_.size(), out);
        for (const auto & info : file_infos_)
        {
            writeBinary(info.file_name, out);
            writeBinary(info.size, out);
            writeBinary(info.checksum, out);
            writeBinary(info.base_size, out);
            writeBinary(info.base_checksum, out);
            /// We don't store `info.data_file_name` and `info.data_file_index` because they're determined automatically
            /// after reading file infos for all the hosts (see the class BackupCoordinationFileInfos).
        }
        return out.str();
    }

    static FileInfos deserialize(const String & str)
    {
        ReadBufferFromString in{str};
        FileInfos res;
        size_t num;
        readBinary(num, in);
        res.file_infos.resize(num);
        for (size_t i = 0; i != num; ++i)
        {
            auto & info = res.file_infos[i];
            readBinary(info.file_name, in);
            readBinary(info.size, in);
            readBinary(info.checksum, in);
            readBinary(info.base_size, in);
            readBinary(info.base_checksum, in);
        }
        return res;
    }
};
}
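A standalone sketch of the count-prefixed serialization scheme FileInfos uses: write the number of entries, then the per-entry fields, and read them back in the same order. std::ostringstream/std::istringstream and raw fixed-width writes stand in here for ClickHouse's WriteBuffer/ReadBuffer and writeBinary/readBinary:

#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Info { std::string name; uint64_t size; };

std::string serialize(const std::vector<Info> & infos)
{
    std::ostringstream out;
    uint64_t num = infos.size();
    out.write(reinterpret_cast<const char *>(&num), sizeof(num));   // count prefix
    for (const auto & info : infos)
    {
        uint64_t len = info.name.size();
        out.write(reinterpret_cast<const char *>(&len), sizeof(len));
        out.write(info.name.data(), static_cast<std::streamsize>(len));
        out.write(reinterpret_cast<const char *>(&info.size), sizeof(info.size));
    }
    return out.str();
}

std::vector<Info> deserialize(const std::string & str)
{
    std::istringstream in{str};
    uint64_t num = 0;
    in.read(reinterpret_cast<char *>(&num), sizeof(num));
    std::vector<Info> infos(num);
    for (auto & info : infos)                                       // same order as serialize()
    {
        uint64_t len = 0;
        in.read(reinterpret_cast<char *>(&len), sizeof(len));
        info.name.resize(len);
        in.read(info.name.data(), static_cast<std::streamsize>(len));
        in.read(reinterpret_cast<char *>(&info.size), sizeof(info.size));
    }
    return infos;
}

int main()
{
    auto restored = deserialize(serialize({{"a.bin", 10}, {"b.bin", 20}}));
    assert(restored.size() == 2 && restored[1].name == "b.bin" && restored[1].size == 20);
}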
size_t BackupCoordinationRemote::findCurrentHostIndex(const Strings & all_hosts, const String & current_host)
@ -181,6 +152,7 @@ BackupCoordinationRemote::BackupCoordinationRemote(
const String & backup_uuid_,
const Strings & all_hosts_,
const String & current_host_,
bool plain_backup_,
bool is_internal_)
: get_zookeeper(get_zookeeper_)
, root_zookeeper_path(root_zookeeper_path_)
@ -190,6 +162,7 @@ BackupCoordinationRemote::BackupCoordinationRemote(
, all_hosts(all_hosts_)
, current_host(current_host_)
, current_host_index(findCurrentHostIndex(all_hosts, current_host))
, plain_backup(plain_backup_)
, is_internal(is_internal_)
{
zookeeper_retries_info = ZooKeeperRetriesInfo(
@ -219,12 +192,7 @@ BackupCoordinationRemote::~BackupCoordinationRemote()
zkutil::ZooKeeperPtr BackupCoordinationRemote::getZooKeeper() const
{
std::lock_guard lock{mutex};
return getZooKeeperNoLock();
}
zkutil::ZooKeeperPtr BackupCoordinationRemote::getZooKeeperNoLock() const
{
std::lock_guard lock{zookeeper_mutex};
if (!zookeeper || zookeeper->expired())
{
zookeeper = get_zookeeper();
@ -246,9 +214,8 @@ void BackupCoordinationRemote::createRootNodes()
zk->createIfNotExists(zookeeper_path + "/repl_data_paths", "");
zk->createIfNotExists(zookeeper_path + "/repl_access", "");
zk->createIfNotExists(zookeeper_path + "/repl_sql_objects", "");
zk->createIfNotExists(zookeeper_path + "/file_names", "");
zk->createIfNotExists(zookeeper_path + "/file_infos", "");
zk->createIfNotExists(zookeeper_path + "/archive_suffixes", "");
zk->createIfNotExists(zookeeper_path + "/writing_files", "");
}
void BackupCoordinationRemote::removeAllNodes()
@ -285,6 +252,72 @@ Strings BackupCoordinationRemote::waitForStage(const String & stage_to_wait, std
}
void BackupCoordinationRemote::serializeToMultipleZooKeeperNodes(const String & path, const String & value, const String & logging_name)
{
{
ZooKeeperRetriesControl retries_ctl(logging_name + "::create", zookeeper_retries_info);
retries_ctl.retryLoop([&]
{
auto zk = getZooKeeper();
zk->createIfNotExists(path, "");
});
}
if (value.empty())
return;
size_t max_part_size = keeper_settings.keeper_value_max_size;
if (!max_part_size)
max_part_size = value.size();
size_t num_parts = (value.size() + max_part_size - 1) / max_part_size; /// round up
for (size_t i = 0; i != num_parts; ++i)
{
size_t begin = i * max_part_size;
size_t end = std::min(begin + max_part_size, value.size());
String part = value.substr(begin, end - begin);
String part_path = fmt::format("{}/{:06}", path, i);
ZooKeeperRetriesControl retries_ctl(logging_name + "::createPart", zookeeper_retries_info);
retries_ctl.retryLoop([&]
{
auto zk = getZooKeeper();
zk->createIfNotExists(part_path, part);
});
}
}
String BackupCoordinationRemote::deserializeFromMultipleZooKeeperNodes(const String & path, const String & logging_name) const
{
Strings part_names;
{
ZooKeeperRetriesControl retries_ctl(logging_name + "::getChildren", zookeeper_retries_info);
retries_ctl.retryLoop([&]{
auto zk = getZooKeeper();
part_names = zk->getChildren(path);
std::sort(part_names.begin(), part_names.end());
});
}
String res;
for (const String & part_name : part_names)
{
String part;
String part_path = path + "/" + part_name;
ZooKeeperRetriesControl retries_ctl(logging_name + "::get", zookeeper_retries_info);
retries_ctl.retryLoop([&]
{
auto zk = getZooKeeper();
part = zk->get(part_path);
});
res += part;
}
return res;
}
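A standalone sketch of the chunking rule above: a value is split into ceil(size / max_part_size) parts, one per zero-padded child node, and concatenating the parts read back in sorted order restores the original value. Plain strings stand in for ZooKeeper nodes here:

#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> splitIntoParts(const std::string & value, size_t max_part_size)
{
    if (value.empty())
        return {};                                 // nothing to store, matching the early return above
    if (max_part_size == 0)
        max_part_size = value.size();              // same fallback as the code above
    size_t num_parts = (value.size() + max_part_size - 1) / max_part_size;  // round up
    std::vector<std::string> parts;
    for (size_t i = 0; i != num_parts; ++i)
        parts.push_back(value.substr(i * max_part_size, max_part_size));
    return parts;
}

int main()
{
    auto parts = splitIntoParts("abcdefg", 3);
    assert(parts.size() == 3 && parts[2] == "g");
    std::string restored;
    for (const auto & p : parts)                   // children are read back in sorted order
        restored += p;
    assert(restored == "abcdefg");
}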
void BackupCoordinationRemote::addReplicatedPartNames(
const String & table_shared_id,
const String & table_name_for_logs,
@ -292,7 +325,7 @@ void BackupCoordinationRemote::addReplicatedPartNames(
const std::vector<PartNameAndChecksum> & part_names_and_checksums)
{
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
if (replicated_tables)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedPartNames() must not be called after preparing");
}
@ -306,7 +339,7 @@ void BackupCoordinationRemote::addReplicatedPartNames(
Strings BackupCoordinationRemote::getReplicatedPartNames(const String & table_shared_id, const String & replica_name) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
prepareReplicatedTables();
return replicated_tables->getPartNames(table_shared_id, replica_name);
}
@ -318,7 +351,7 @@ void BackupCoordinationRemote::addReplicatedMutations(
const std::vector<MutationInfo> & mutations)
{
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
if (replicated_tables)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedMutations() must not be called after preparing");
}
@ -332,7 +365,7 @@ void BackupCoordinationRemote::addReplicatedMutations(
std::vector<IBackupCoordination::MutationInfo> BackupCoordinationRemote::getReplicatedMutations(const String & table_shared_id, const String & replica_name) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
prepareReplicatedTables();
return replicated_tables->getMutations(table_shared_id, replica_name);
}
@ -342,7 +375,7 @@ void BackupCoordinationRemote::addReplicatedDataPath(
const String & table_shared_id, const String & data_path)
{
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
if (replicated_tables)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedDataPath() must not be called after preparing");
}
@ -356,7 +389,7 @@ void BackupCoordinationRemote::addReplicatedDataPath(
Strings BackupCoordinationRemote::getReplicatedDataPaths(const String & table_shared_id) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_tables_mutex};
prepareReplicatedTables();
return replicated_tables->getDataPaths(table_shared_id);
}
@ -368,7 +401,7 @@ void BackupCoordinationRemote::prepareReplicatedTables() const
return;
replicated_tables.emplace();
auto zk = getZooKeeperNoLock();
auto zk = getZooKeeper();
{
String path = zookeeper_path + "/repl_part_names";
@ -419,7 +452,7 @@ void BackupCoordinationRemote::prepareReplicatedTables() const
void BackupCoordinationRemote::addReplicatedAccessFilePath(const String & access_zk_path, AccessEntityType access_entity_type, const String & file_path)
{
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_access_mutex};
if (replicated_access)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedAccessFilePath() must not be called after preparing");
}
@ -435,7 +468,7 @@ void BackupCoordinationRemote::addReplicatedAccessFilePath(const String & access
Strings BackupCoordinationRemote::getReplicatedAccessFilePaths(const String & access_zk_path, AccessEntityType access_entity_type) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_access_mutex};
prepareReplicatedAccess();
return replicated_access->getFilePaths(access_zk_path, access_entity_type, current_host);
}
@ -446,7 +479,7 @@ void BackupCoordinationRemote::prepareReplicatedAccess() const
return;
replicated_access.emplace();
auto zk = getZooKeeperNoLock();
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_access";
for (const String & escaped_access_zk_path : zk->getChildren(path))
@ -469,7 +502,7 @@ void BackupCoordinationRemote::prepareReplicatedAccess() const
void BackupCoordinationRemote::addReplicatedSQLObjectsDir(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & dir_path)
{
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_sql_objects_mutex};
if (replicated_sql_objects)
throw Exception(ErrorCodes::LOGICAL_ERROR, "addReplicatedSQLObjectsDir() must not be called after preparing");
}
@ -493,7 +526,7 @@ void BackupCoordinationRemote::addReplicatedSQLObjectsDir(const String & loader_
Strings BackupCoordinationRemote::getReplicatedSQLObjectsDirs(const String & loader_zk_path, UserDefinedSQLObjectType object_type) const
{
std::lock_guard lock{mutex};
std::lock_guard lock{replicated_sql_objects_mutex};
prepareReplicatedSQLObjects();
return replicated_sql_objects->getDirectories(loader_zk_path, object_type, current_host);
}
@ -504,7 +537,7 @@ void BackupCoordinationRemote::prepareReplicatedSQLObjects() const
return;
replicated_sql_objects.emplace();
auto zk = getZooKeeperNoLock();
auto zk = getZooKeeper();
String path = zookeeper_path + "/repl_sql_objects";
for (const String & escaped_loader_zk_path : zk->getChildren(path))
@ -525,274 +558,80 @@ void BackupCoordinationRemote::prepareReplicatedSQLObjects() const
}
void BackupCoordinationRemote::addFileInfo(const FileInfo & file_info, bool & is_data_file_required)
{
    auto zk = getZooKeeper();

    String full_path = zookeeper_path + "/file_names/" + escapeForFileName(file_info.file_name);
    String size_and_checksum = serializeSizeAndChecksum(std::pair{file_info.size, file_info.checksum});
    zk->create(full_path, size_and_checksum, zkutil::CreateMode::Persistent);

    if (!file_info.size)
    {
        is_data_file_required = false;
        return;
    }

    full_path = zookeeper_path + "/file_infos/" + size_and_checksum;
    auto code = zk->tryCreate(full_path, serializeFileInfo(file_info), zkutil::CreateMode::Persistent);
    if ((code != Coordination::Error::ZOK) && (code != Coordination::Error::ZNODEEXISTS))
        throw zkutil::KeeperException(code, full_path);

    is_data_file_required = (code == Coordination::Error::ZOK) && (file_info.size > file_info.base_size);
}

void BackupCoordinationRemote::addFileInfos(BackupFileInfos && file_infos_)
{
    {
        std::lock_guard lock{file_infos_mutex};
        if (file_infos)
            throw Exception(ErrorCodes::LOGICAL_ERROR, "addFileInfos() must not be called after preparing");
    }

    /// Serialize `file_infos_` and write them to ZooKeeper nodes.
    String file_infos_str = FileInfos::serialize(file_infos_);
    serializeToMultipleZooKeeperNodes(zookeeper_path + "/file_infos/" + current_host, file_infos_str, "addFileInfos");
}

BackupFileInfos BackupCoordinationRemote::getFileInfos() const
{
    std::lock_guard lock{file_infos_mutex};
    prepareFileInfos();
    return file_infos->getFileInfos(current_host);
}

BackupFileInfos BackupCoordinationRemote::getFileInfosForAllHosts() const
{
    std::lock_guard lock{file_infos_mutex};
    prepareFileInfos();
    return file_infos->getFileInfosForAllHosts();
}

void BackupCoordinationRemote::prepareFileInfos() const
{
    if (file_infos)
        return;

    file_infos.emplace(plain_backup);

    Strings hosts_with_file_infos;
    {
        ZooKeeperRetriesControl retries_ctl("prepareFileInfos::get_hosts", zookeeper_retries_info);
        retries_ctl.retryLoop([&]{
            auto zk = getZooKeeper();
            hosts_with_file_infos = zk->getChildren(zookeeper_path + "/file_infos");
        });
    }

    for (const String & host : hosts_with_file_infos)
    {
        String file_infos_str = deserializeFromMultipleZooKeeperNodes(zookeeper_path + "/file_infos/" + host, "prepareFileInfos");
        auto deserialized_file_infos = FileInfos::deserialize(file_infos_str).file_infos;
        file_infos->addFileInfos(std::move(deserialized_file_infos), host);
    }
}

void BackupCoordinationRemote::updateFileInfo(const FileInfo & file_info)
{
    if (!file_info.size)
        return; /// we don't keep FileInfos for empty files, nothing to update

    auto zk = getZooKeeper();
    String size_and_checksum = serializeSizeAndChecksum(std::pair{file_info.size, file_info.checksum});
    String full_path = zookeeper_path + "/file_infos/" + size_and_checksum;
    for (size_t attempt = 0; attempt < MAX_ZOOKEEPER_ATTEMPTS; ++attempt)
    {
        Coordination::Stat stat;
        auto new_info = deserializeFileInfo(zk->get(full_path, &stat));
        new_info.archive_suffix = file_info.archive_suffix;
        auto code = zk->trySet(full_path, serializeFileInfo(new_info), stat.version);
        if (code == Coordination::Error::ZOK)
            return;
        bool is_last_attempt = (attempt == MAX_ZOOKEEPER_ATTEMPTS - 1);
        if ((code != Coordination::Error::ZBADVERSION) || is_last_attempt)
            throw zkutil::KeeperException(code, full_path);
    }
}

bool BackupCoordinationRemote::startWritingFile(size_t data_file_index)
{
    bool acquired_writing = false;
    String full_path = zookeeper_path + "/writing_files/" + std::to_string(data_file_index);
    String host_index_str = std::to_string(current_host_index);

    ZooKeeperRetriesControl retries_ctl("startWritingFile", zookeeper_retries_info);
    retries_ctl.retryLoop([&]
    {
        auto zk = getZooKeeper();
        auto code = zk->tryCreate(full_path, host_index_str, zkutil::CreateMode::Persistent);

        if (code == Coordination::Error::ZOK)
            acquired_writing = true; /// If we've just created this ZooKeeper node, the writing is acquired, i.e. we should write this data file.
        else if (code == Coordination::Error::ZNODEEXISTS)
            acquired_writing = (zk->get(full_path) == host_index_str); /// The previous retry could have created this ZooKeeper node and then failed.
        else
            throw zkutil::KeeperException(code, full_path);
    });

    return acquired_writing;
}
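A standalone sketch of the acquire-by-create idiom in startWritingFile: whichever host first creates the per-file node owns the write, and a retry that already created the node still counts as the owner. A std::map stands in for ZooKeeper here:

#include <cassert>
#include <map>
#include <string>

std::map<std::string, std::string> nodes;   // path -> value, stand-in for ZooKeeper

bool startWriting(const std::string & path, const std::string & host)
{
    auto [it, created] = nodes.emplace(path, host);
    if (created)
        return true;               // we just created the node: writing acquired
    return it->second == host;     // node exists: acquired only if we created it on an earlier attempt
}

int main()
{
    assert(startWriting("/writing_files/7", "host0"));
    assert(startWriting("/writing_files/7", "host0"));   // idempotent retry
    assert(!startWriting("/writing_files/7", "host1"));  // another host loses the race
}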
std::vector<FileInfo> BackupCoordinationRemote::getAllFileInfos() const
{
/// There could be tons of files inside /file_names or /file_infos,
/// so we use MultiRead requests to process them.
/// Using [Zoo]Keeper retries here should be safe, because
/// this function is called at the end, after the actual copying is finished.
auto split_vector = [](Strings && vec, size_t max_batch_size) -> std::vector<Strings>
{
std::vector<Strings> result;
size_t left_border = 0;
auto move_to_result = [&](auto && begin, auto && end)
{
auto batch = Strings();
batch.reserve(max_batch_size);
std::move(begin, end, std::back_inserter(batch));
result.push_back(std::move(batch));
};
if (max_batch_size == 0)
{
move_to_result(vec.begin(), vec.end());
return result;
}
for (size_t pos = 0; pos < vec.size(); ++pos)
{
if (pos >= left_border + max_batch_size)
{
move_to_result(vec.begin() + left_border, vec.begin() + pos);
left_border = pos;
}
}
if (vec.begin() + left_border != vec.end())
move_to_result(vec.begin() + left_border, vec.end());
return result;
};
std::vector<Strings> batched_escaped_names;
{
ZooKeeperRetriesControl retries_ctl("getAllFileInfos::getChildren", zookeeper_retries_info);
retries_ctl.retryLoop([&]()
{
auto zk = getZooKeeper();
batched_escaped_names = split_vector(zk->getChildren(zookeeper_path + "/file_names"), keeper_settings.batch_size_for_keeper_multiread);
});
}
std::vector<FileInfo> file_infos;
file_infos.reserve(batched_escaped_names.size());
for (auto & batch : batched_escaped_names)
{
zkutil::ZooKeeper::MultiGetResponse sizes_and_checksums;
{
Strings file_names_paths;
file_names_paths.reserve(batch.size());
for (const String & escaped_name : batch)
file_names_paths.emplace_back(zookeeper_path + "/file_names/" + escaped_name);
ZooKeeperRetriesControl retries_ctl("getAllFileInfos::getSizesAndChecksums", zookeeper_retries_info);
retries_ctl.retryLoop([&]
{
auto zk = getZooKeeper();
sizes_and_checksums = zk->get(file_names_paths);
});
}
Strings non_empty_file_names;
Strings non_empty_file_infos_paths;
std::vector<FileInfo> non_empty_files_infos;
/// Process all files and determine whether there are some empty files
/// Save non-empty file names for further batch processing
{
std::vector<FileInfo> empty_files_infos;
for (size_t i = 0; i < batch.size(); ++i)
{
auto file_name = batch[i];
if (sizes_and_checksums[i].error != Coordination::Error::ZOK)
throw zkutil::KeeperException(sizes_and_checksums[i].error);
const auto & size_and_checksum = sizes_and_checksums[i].data;
auto size = deserializeSizeAndChecksum(size_and_checksum).first;
if (size)
{
/// Save it later for batch processing
non_empty_file_names.emplace_back(file_name);
non_empty_file_infos_paths.emplace_back(zookeeper_path + "/file_infos/" + size_and_checksum);
continue;
}
/// File is empty
FileInfo empty_file_info;
empty_file_info.file_name = unescapeForFileName(file_name);
empty_files_infos.emplace_back(std::move(empty_file_info));
}
std::move(empty_files_infos.begin(), empty_files_infos.end(), std::back_inserter(file_infos));
}
zkutil::ZooKeeper::MultiGetResponse non_empty_file_infos_serialized;
ZooKeeperRetriesControl retries_ctl("getAllFileInfos::getFileInfos", zookeeper_retries_info);
retries_ctl.retryLoop([&]()
{
auto zk = getZooKeeper();
non_empty_file_infos_serialized = zk->get(non_empty_file_infos_paths);
});
/// Process non-empty files
for (size_t i = 0; i < non_empty_file_names.size(); ++i)
{
FileInfo file_info;
if (non_empty_file_infos_serialized[i].error != Coordination::Error::ZOK)
throw zkutil::KeeperException(non_empty_file_infos_serialized[i].error);
file_info = deserializeFileInfo(non_empty_file_infos_serialized[i].data);
file_info.file_name = unescapeForFileName(non_empty_file_names[i]);
non_empty_files_infos.emplace_back(std::move(file_info));
}
std::move(non_empty_files_infos.begin(), non_empty_files_infos.end(), std::back_inserter(file_infos));
}
return file_infos;
}
Strings BackupCoordinationRemote::listFiles(const String & directory, bool recursive) const
{
auto zk = getZooKeeper();
Strings escaped_names = zk->getChildren(zookeeper_path + "/file_names");
String prefix = directory;
if (!prefix.empty() && !prefix.ends_with('/'))
prefix += '/';
String terminator = recursive ? "" : "/";
Strings elements;
std::unordered_set<std::string_view> unique_elements;
for (const String & escaped_name : escaped_names)
{
String name = unescapeForFileName(escaped_name);
if (!name.starts_with(prefix))
continue;
size_t start_pos = prefix.length();
size_t end_pos = String::npos;
if (!terminator.empty())
end_pos = name.find(terminator, start_pos);
std::string_view new_element = std::string_view{name}.substr(start_pos, end_pos - start_pos);
if (unique_elements.contains(new_element))
continue;
elements.push_back(String{new_element});
unique_elements.emplace(new_element);
}
::sort(elements.begin(), elements.end());
return elements;
}
bool BackupCoordinationRemote::hasFiles(const String & directory) const
{
auto zk = getZooKeeper();
Strings escaped_names = zk->getChildren(zookeeper_path + "/file_names");
String prefix = directory;
if (!prefix.empty() && !prefix.ends_with('/'))
prefix += '/';
for (const String & escaped_name : escaped_names)
{
String name = unescapeForFileName(escaped_name);
if (name.starts_with(prefix))
return true;
}
return false;
}
std::optional<FileInfo> BackupCoordinationRemote::getFileInfo(const String & file_name) const
{
auto zk = getZooKeeper();
String size_and_checksum;
if (!zk->tryGet(zookeeper_path + "/file_names/" + escapeForFileName(file_name), size_and_checksum))
return std::nullopt;
UInt64 size = deserializeSizeAndChecksum(size_and_checksum).first;
FileInfo file_info;
if (size) /// we don't keep FileInfos for empty files
file_info = deserializeFileInfo(zk->get(zookeeper_path + "/file_infos/" + size_and_checksum));
file_info.file_name = file_name;
return file_info;
}
std::optional<FileInfo> BackupCoordinationRemote::getFileInfo(const SizeAndChecksum & size_and_checksum) const
{
auto zk = getZooKeeper();
String file_info_str;
if (!zk->tryGet(zookeeper_path + "/file_infos/" + serializeSizeAndChecksum(size_and_checksum), file_info_str))
return std::nullopt;
return deserializeFileInfo(file_info_str);
}
String BackupCoordinationRemote::getNextArchiveSuffix()
{
auto zk = getZooKeeper();
String path = zookeeper_path + "/archive_suffixes/a";
String path_created;
auto code = zk->tryCreate(path, "", zkutil::CreateMode::PersistentSequential, path_created);
if (code != Coordination::Error::ZOK)
throw zkutil::KeeperException(code, path);
return formatArchiveSuffix(extractCounterFromSequentialNodeName(path_created));
}
Strings BackupCoordinationRemote::getAllArchiveSuffixes() const
{
auto zk = getZooKeeper();
Strings node_names = zk->getChildren(zookeeper_path + "/archive_suffixes");
for (auto & node_name : node_names)
node_name = formatArchiveSuffix(extractCounterFromSequentialNodeName(node_name));
return node_names;
}
bool BackupCoordinationRemote::hasConcurrentBackups(const std::atomic<size_t> &) const
{

View File

@ -1,6 +1,7 @@
#pragma once
#include <Backups/IBackupCoordination.h>
#include <Backups/BackupCoordinationFileInfos.h>
#include <Backups/BackupCoordinationReplicatedAccess.h>
#include <Backups/BackupCoordinationReplicatedSQLObjects.h>
#include <Backups/BackupCoordinationReplicatedTables.h>
@ -23,7 +24,7 @@ public:
UInt64 keeper_max_retries;
UInt64 keeper_retry_initial_backoff_ms;
UInt64 keeper_retry_max_backoff_ms;
UInt64 batch_size_for_keeper_multiread;
UInt64 keeper_value_max_size;
};
BackupCoordinationRemote(
@ -33,6 +34,7 @@ public:
const String & backup_uuid_,
const Strings & all_hosts_,
const String & current_host_,
bool plain_backup_,
bool is_internal_);
~BackupCoordinationRemote() override;
@ -67,17 +69,10 @@ public:
void addReplicatedSQLObjectsDir(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & dir_path) override;
Strings getReplicatedSQLObjectsDirs(const String & loader_zk_path, UserDefinedSQLObjectType object_type) const override;
void addFileInfo(const FileInfo & file_info, bool & is_data_file_required) override;
void updateFileInfo(const FileInfo & file_info) override;
std::vector<FileInfo> getAllFileInfos() const override;
Strings listFiles(const String & directory, bool recursive) const override;
bool hasFiles(const String & directory) const override;
std::optional<FileInfo> getFileInfo(const String & file_name) const override;
std::optional<FileInfo> getFileInfo(const SizeAndChecksum & size_and_checksum) const override;
String getNextArchiveSuffix() override;
Strings getAllArchiveSuffixes() const override;
void addFileInfos(BackupFileInfos && file_infos) override;
BackupFileInfos getFileInfos() const override;
BackupFileInfos getFileInfosForAllHosts() const override;
bool startWritingFile(size_t data_file_index) override;
bool hasConcurrentBackups(const std::atomic<size_t> & num_active_backups) const override;
@ -85,16 +80,19 @@ public:
private:
zkutil::ZooKeeperPtr getZooKeeper() const;
zkutil::ZooKeeperPtr getZooKeeperNoLock() const;
void createRootNodes();
void removeAllNodes();
void serializeToMultipleZooKeeperNodes(const String & path, const String & value, const String & logging_name);
String deserializeFromMultipleZooKeeperNodes(const String & path, const String & logging_name) const;
/// Reads the data of all objects that replicas have added to the backup from ZooKeeper and adds it to the corresponding
/// BackupCoordinationReplicated* objects.
/// After that, calling the addReplicated* functions is not allowed and throws an exception.
void prepareReplicatedTables() const;
void prepareReplicatedAccess() const;
void prepareReplicatedSQLObjects() const;
void prepareReplicatedTables() const TSA_REQUIRES(replicated_tables_mutex);
void prepareReplicatedAccess() const TSA_REQUIRES(replicated_access_mutex);
void prepareReplicatedSQLObjects() const TSA_REQUIRES(replicated_sql_objects_mutex);
void prepareFileInfos() const TSA_REQUIRES(file_infos_mutex);
const zkutil::GetZooKeeper get_zookeeper;
const String root_zookeeper_path;
@ -104,16 +102,23 @@ private:
const Strings all_hosts;
const String current_host;
const size_t current_host_index;
const bool plain_backup;
const bool is_internal;
mutable ZooKeeperRetriesInfo zookeeper_retries_info;
std::optional<BackupCoordinationStageSync> stage_sync;
mutable std::mutex mutex;
mutable zkutil::ZooKeeperPtr zookeeper;
mutable std::optional<BackupCoordinationReplicatedTables> replicated_tables;
mutable std::optional<BackupCoordinationReplicatedAccess> replicated_access;
mutable std::optional<BackupCoordinationReplicatedSQLObjects> replicated_sql_objects;
mutable zkutil::ZooKeeperPtr TSA_GUARDED_BY(zookeeper_mutex) zookeeper;
mutable std::optional<BackupCoordinationReplicatedTables> TSA_GUARDED_BY(replicated_tables_mutex) replicated_tables;
mutable std::optional<BackupCoordinationReplicatedAccess> TSA_GUARDED_BY(replicated_access_mutex) replicated_access;
mutable std::optional<BackupCoordinationReplicatedSQLObjects> TSA_GUARDED_BY(replicated_sql_objects_mutex) replicated_sql_objects;
mutable std::optional<BackupCoordinationFileInfos> TSA_GUARDED_BY(file_infos_mutex) file_infos;
mutable std::mutex zookeeper_mutex;
mutable std::mutex replicated_tables_mutex;
mutable std::mutex replicated_access_mutex;
mutable std::mutex replicated_sql_objects_mutex;
mutable std::mutex file_infos_mutex;
};
}
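
The header above splits the former single `mutex` into per-subsystem mutexes and annotates members and methods with `TSA_GUARDED_BY`/`TSA_REQUIRES`. Below is a minimal sketch of how such annotations are checked, assuming the `TSA_*` macros wrap Clang's thread-safety-analysis attributes (the conventional definition); compile with `clang++ -c -Wthread-safety`:

class __attribute__((capability("mutex"))) Mutex
{
public:
    void lock() __attribute__((acquire_capability()));
    void unlock() __attribute__((release_capability()));
};

class Coordination
{
public:
    void use()
    {
        mutex.lock();
        prepare();      /// OK: the analysis sees the lock is held.
        ++counter;      /// OK: `counter` is guarded by the held mutex.
        mutex.unlock();
        prepare();      /// warning: calling prepare() requires holding 'mutex'.
    }

private:
    void prepare() __attribute__((requires_capability(mutex)));    /// like TSA_REQUIRES
    Mutex mutex;
    int counter __attribute__((guarded_by(mutex))) = 0;            /// like TSA_GUARDED_BY
};
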

View File

@ -23,6 +23,9 @@ namespace BackupCoordinationStage
/// Running special tasks for replicated tables which can also prepare some backup entries.
constexpr const char * RUNNING_POST_TASKS = "running post-tasks";
/// Building information about all files which will be written to a backup.
constexpr const char * BUILDING_FILE_INFOS = "building file infos";
/// Writing backup entries to the backup and removing temporary hard links.
constexpr const char * WRITING_BACKUP = "writing backup";

View File

@ -123,7 +123,6 @@ BackupEntries BackupEntriesCollector::run()
runPostTasks();
/// No more backup entries or tasks are allowed after this point.
setStage(Stage::WRITING_BACKUP);
return std::move(backup_entries);
}

View File

@ -0,0 +1,284 @@
#include <Backups/BackupFileInfo.h>
#include <Backups/IBackup.h>
#include <Backups/IBackupEntry.h>
#include <Common/CurrentThread.h>
#include <Common/logger_useful.h>
#include <Common/scope_guard_safe.h>
#include <Common/setThreadName.h>
#include <IO/HashingReadBuffer.h>
namespace DB
{
namespace
{
using SizeAndChecksum = std::pair<UInt64, UInt128>;
std::optional<SizeAndChecksum> getInfoAboutFileFromBaseBackupIfExists(const BackupPtr & base_backup, const std::string & file_path)
{
if (base_backup && base_backup->fileExists(file_path))
return base_backup->getFileSizeAndChecksum(file_path);
return std::nullopt;
}
enum class CheckBackupResult
{
HasPrefix,
HasFull,
HasNothing,
};
CheckBackupResult checkBaseBackupForFile(const SizeAndChecksum & base_backup_info, const BackupFileInfo & new_entry_info)
{
/// We cannot reuse the base backup because our file is smaller
/// than the file stored in the previous backup
if (new_entry_info.size < base_backup_info.first)
return CheckBackupResult::HasNothing;
if (base_backup_info.first == new_entry_info.size)
return CheckBackupResult::HasFull;
return CheckBackupResult::HasPrefix;
}
struct ChecksumsForNewEntry
{
UInt128 full_checksum;
UInt128 prefix_checksum;
};
/// Calculates the checksum for a backup entry if it isn't set yet.
/// Can also calculate an additional checksum of some prefix.
ChecksumsForNewEntry calculateNewEntryChecksumsIfNeeded(const BackupEntryPtr & entry, size_t prefix_size)
{
if (prefix_size > 0)
{
auto read_buffer = entry->getReadBuffer();
HashingReadBuffer hashing_read_buffer(*read_buffer);
hashing_read_buffer.ignore(prefix_size);
auto prefix_checksum = hashing_read_buffer.getHash();
if (entry->getChecksum() == std::nullopt)
{
hashing_read_buffer.ignoreAll();
auto full_checksum = hashing_read_buffer.getHash();
return ChecksumsForNewEntry{full_checksum, prefix_checksum};
}
else
{
return ChecksumsForNewEntry{*(entry->getChecksum()), prefix_checksum};
}
}
else
{
if (entry->getChecksum() == std::nullopt)
{
auto read_buffer = entry->getReadBuffer();
HashingReadBuffer hashing_read_buffer(*read_buffer);
hashing_read_buffer.ignoreAll();
return ChecksumsForNewEntry{hashing_read_buffer.getHash(), 0};
}
else
{
return ChecksumsForNewEntry{*(entry->getChecksum()), 0};
}
}
}
/// We store entries' file names in the backup without leading slashes.
String removeLeadingSlash(const String & path)
{
if (path.starts_with('/'))
return path.substr(1);
return path;
}
}
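
The helpers above drive the incremental-backup decision: if the new file is at least as large as the one in the base backup, the prefix of matching length is hashed and compared, and only the tail needs to be stored on a match. A self-contained sketch of that decision, using a stand-in FNV-1a hash in place of ClickHouse's HashingReadBuffer:

#include <cstdint>
#include <iostream>
#include <string>

/// Stand-in for HashingReadBuffer: FNV-1a over the first `limit` bytes.
uint64_t fnv1a(const std::string & data, size_t limit)
{
    uint64_t hash = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < limit && i < data.size(); ++i)
    {
        hash ^= static_cast<unsigned char>(data[i]);
        hash *= 0x100000001b3ULL;
    }
    return hash;
}

int main()
{
    std::string base_file = "part-1;part-2;";           /// file as stored in the base backup
    std::string new_file = "part-1;part-2;part-3;";     /// the same file after it grew

    /// The new file is larger, so at best the base backup holds a prefix of it (HasPrefix).
    uint64_t prefix_checksum = fnv1a(new_file, base_file.size());
    if (prefix_checksum == fnv1a(base_file, base_file.size()))
        std::cout << "base backup holds the first " << base_file.size() << " bytes; "
                  << "store only the remaining " << new_file.size() - base_file.size() << " bytes\n";
    else
        std::cout << "prefix differs; store the whole file\n";
}
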
/// Note: this format doesn't allow parsing the data back.
/// It is useful only for debugging purposes.
String BackupFileInfo::describe() const
{
String result;
result += fmt::format("file_name: {};\n", file_name);
result += fmt::format("size: {};\n", size);
result += fmt::format("checksum: {};\n", getHexUIntLowercase(checksum));
result += fmt::format("base_size: {};\n", base_size);
result += fmt::format("base_checksum: {};\n", getHexUIntLowercase(checksum));
result += fmt::format("data_file_name: {};\n", data_file_name);
result += fmt::format("data_file_index: {};\n", data_file_index);
return result;
}
BackupFileInfo buildFileInfoForBackupEntry(const String & file_name, const BackupEntryPtr & backup_entry, const BackupPtr & base_backup, Poco::Logger * log)
{
auto adjusted_path = removeLeadingSlash(file_name);
BackupFileInfo info;
info.file_name = adjusted_path;
info.size = backup_entry->getSize();
/// We don't set `info.data_file_name` and `info.data_file_index` in this function because they're set during backup coordination
/// (see the class BackupCoordinationFileInfos).
if (!info.size)
{
/// Empty file.
return info;
}
if (!log)
log = &Poco::Logger::get("FileInfoFromBackupEntry");
std::optional<SizeAndChecksum> base_backup_file_info = getInfoAboutFileFromBaseBackupIfExists(base_backup, adjusted_path);
/// We have info about this file in the base backup.
/// If the file has no checksum yet, calculate and fill it in.
if (base_backup_file_info.has_value())
{
LOG_TRACE(log, "File {} found in base backup, checking for equality", adjusted_path);
CheckBackupResult check_base = checkBaseBackupForFile(*base_backup_file_info, info);
/// A file with the same name but a smaller size exists in the previous backup
if (check_base == CheckBackupResult::HasPrefix)
{
auto checksums = calculateNewEntryChecksumsIfNeeded(backup_entry, base_backup_file_info->first);
info.checksum = checksums.full_checksum;
/// We have a prefix of this file in the base backup with the same checksum.
/// In ClickHouse this can happen for StorageLog, for example.
if (checksums.prefix_checksum == base_backup_file_info->second)
{
LOG_TRACE(log, "Found prefix of file {} in the base backup, will write rest of the file to current backup", adjusted_path);
info.base_size = base_backup_file_info->first;
info.base_checksum = base_backup_file_info->second;
}
else
{
LOG_TRACE(log, "Prefix of file {} doesn't match the file in the base backup", adjusted_path);
}
}
else
{
/// We either have the full file or nothing; first of all, let's get the checksum
/// of the current file
auto checksums = calculateNewEntryChecksumsIfNeeded(backup_entry, 0);
info.checksum = checksums.full_checksum;
if (info.checksum == base_backup_file_info->second)
{
LOG_TRACE(log, "Found whole file {} in base backup", adjusted_path);
assert(check_base == CheckBackupResult::HasFull);
assert(info.size == base_backup_file_info->first);
info.base_size = base_backup_file_info->first;
info.base_checksum = base_backup_file_info->second;
/// Actually we could add this info to the coordination and exit,
/// but we intentionally don't do it; otherwise the control flow
/// of this function would become very complex.
}
else
{
LOG_TRACE(log, "Whole file {} in base backup doesn't match by checksum", adjusted_path);
}
}
}
else
{
auto checksums = calculateNewEntryChecksumsIfNeeded(backup_entry, 0);
info.checksum = checksums.full_checksum;
}
/// We don't have info about this file_name (sic!) in the base backup;
/// however, the file could have been renamed, so we check one more time using its size and checksum
if (base_backup && base_backup->fileExists(std::pair{info.size, info.checksum}))
{
LOG_TRACE(log, "Found a file in the base backup with the same size and checksum as {}", adjusted_path);
info.base_size = info.size;
info.base_checksum = info.checksum;
}
if (base_backup && !info.base_size)
LOG_TRACE(log, "Nothing found for file {} in base backup", adjusted_path);
return info;
}
BackupFileInfos buildFileInfosForBackupEntries(const BackupEntries & backup_entries, const BackupPtr & base_backup, ThreadPool & thread_pool)
{
BackupFileInfos infos;
infos.resize(backup_entries.size());
size_t num_active_jobs = 0;
std::mutex mutex;
std::condition_variable event;
std::exception_ptr exception;
auto thread_group = CurrentThread::getGroup();
Poco::Logger * log = &Poco::Logger::get("FileInfosFromBackupEntries");
for (size_t i = 0; i != backup_entries.size(); ++i)
{
{
std::lock_guard lock{mutex};
if (exception)
break;
++num_active_jobs;
}
auto job = [&mutex, &num_active_jobs, &event, &exception, &infos, &backup_entries, &base_backup, &thread_group, i, log](bool async)
{
SCOPE_EXIT_SAFE({
std::lock_guard lock{mutex};
if (!--num_active_jobs)
event.notify_all();
if (async)
CurrentThread::detachFromGroupIfNotDetached();
});
try
{
const auto & name = backup_entries[i].first;
const auto & entry = backup_entries[i].second;
if (async && thread_group)
CurrentThread::attachToGroup(thread_group);
if (async)
setThreadName("BackupWorker");
{
std::lock_guard lock{mutex};
if (exception)
return;
}
infos[i] = buildFileInfoForBackupEntry(name, entry, base_backup, log);
}
catch (...)
{
std::lock_guard lock{mutex};
if (!exception)
exception = std::current_exception();
}
};
if (!thread_pool.trySchedule([job] { job(true); }))
job(false);
}
{
std::unique_lock lock{mutex};
event.wait(lock, [&] { return !num_active_jobs; });
if (exception)
std::rethrow_exception(exception);
}
return infos;
}
}
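
buildFileInfosForBackupEntries() above follows a recurring ClickHouse pattern: try to schedule each job on the pool, run it inline when the pool is saturated, record only the first exception, and wait for the in-flight counter to drain. A condensed sketch of that pattern, with plain std::thread standing in for ClickHouse's ThreadPool:

#include <condition_variable>
#include <exception>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main()
{
    std::vector<int> results(8);
    size_t num_active_jobs = 0;
    std::mutex mutex;
    std::condition_variable event;
    std::exception_ptr exception;
    std::vector<std::thread> threads;

    for (size_t i = 0; i != results.size(); ++i)
    {
        {
            std::lock_guard lock{mutex};
            if (exception)
                break;                       /// stop scheduling once any job has failed
            ++num_active_jobs;
        }
        auto job = [&, i]
        {
            try
            {
                results[i] = static_cast<int>(i * i);            /// the per-entry work
            }
            catch (...)
            {
                std::lock_guard lock{mutex};
                if (!exception)
                    exception = std::current_exception();        /// keep only the first failure
            }
            std::lock_guard lock{mutex};
            if (!--num_active_jobs)
                event.notify_all();
        };
        /// The real code tries thread_pool.trySchedule([job] { job(true); })
        /// and falls back to job(false) inline when the pool has no free thread.
        threads.emplace_back(job);
    }

    {
        std::unique_lock lock{mutex};
        event.wait(lock, [&] { return !num_active_jobs; });
    }
    for (auto & thread : threads)
        thread.join();
    if (exception)
        std::rethrow_exception(exception);
    std::cout << "built " << results.size() << " entries\n";
}
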

View File

@ -0,0 +1,70 @@
#pragma once
#include <Core/Types.h>
#include <Common/ThreadPool.h>
namespace DB
{
class IBackup;
class IBackupEntry;
using BackupPtr = std::shared_ptr<const IBackup>;
using BackupEntryPtr = std::shared_ptr<const IBackupEntry>;
using BackupEntries = std::vector<std::pair<String, BackupEntryPtr>>;
/// Information about a file stored in a backup.
struct BackupFileInfo
{
String file_name;
UInt64 size = 0;
UInt128 checksum{0};
/// for incremental backups
UInt64 base_size = 0;
UInt128 base_checksum{0};
/// Name of the data file. An empty string means there is no data file (that can happen if the file is empty or was taken from the base backup as a whole).
/// This field is set during backup coordination (see the class BackupCoordinationFileInfos).
String data_file_name;
/// Index of the data file. -1 means there is no data file.
/// This field is set during backup coordination (see the class BackupCoordinationFileInfos).
size_t data_file_index = static_cast<size_t>(-1);
struct LessByFileName
{
bool operator()(const BackupFileInfo & lhs, const BackupFileInfo & rhs) const { return (lhs.file_name < rhs.file_name); }
bool operator()(const BackupFileInfo * lhs, const BackupFileInfo * rhs) const { return (lhs->file_name < rhs->file_name); }
};
struct EqualByFileName
{
bool operator()(const BackupFileInfo & lhs, const BackupFileInfo & rhs) const { return (lhs.file_name == rhs.file_name); }
bool operator()(const BackupFileInfo * lhs, const BackupFileInfo * rhs) const { return (lhs->file_name == rhs->file_name); }
};
struct LessBySizeOrChecksum
{
bool operator()(const BackupFileInfo & lhs, const BackupFileInfo & rhs) const
{
return (lhs.size < rhs.size) || (lhs.size == rhs.size && lhs.checksum < rhs.checksum);
}
};
/// Note: this format doesn't allow parsing the data back.
/// Must be used only for debugging purposes.
String describe() const;
};
using BackupFileInfos = std::vector<BackupFileInfo>;
/// Builds a BackupFileInfo for a specified backup entry.
BackupFileInfo buildFileInfoForBackupEntry(const String & file_name, const BackupEntryPtr & backup_entry, const BackupPtr & base_backup, Poco::Logger * log);
/// Builds a vector of BackupFileInfos for specified backup entries.
BackupFileInfos buildFileInfosForBackupEntries(const BackupEntries & backup_entries, const BackupPtr & base_backup, ThreadPool & thread_pool);
}
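
A short usage sketch (with a hypothetical simplified struct) of how comparators such as LessByFileName and EqualByFileName support merging infos gathered from several hosts, by sorting and then deduplicating on the file name:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Info { std::string file_name; };    /// simplified stand-in for BackupFileInfo

int main()
{
    /// Infos gathered from two hosts may mention the same file name twice.
    std::vector<Info> infos{{"b.bin"}, {"a.bin"}, {"b.bin"}};

    auto less = [](const Info & lhs, const Info & rhs) { return lhs.file_name < rhs.file_name; };
    auto equal = [](const Info & lhs, const Info & rhs) { return lhs.file_name == rhs.file_name; };

    std::sort(infos.begin(), infos.end(), less);
    infos.erase(std::unique(infos.begin(), infos.end(), equal), infos.end());

    for (const auto & info : infos)
        std::cout << info.file_name << '\n';    /// prints: a.bin, b.bin
}
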

View File

@ -66,12 +66,17 @@ namespace
credentials.GetAWSSecretKey(),
settings.auth_settings.server_side_encryption_customer_key_base64,
std::move(headers),
settings.auth_settings.use_environment_credentials.value_or(
context->getConfigRef().getBool("s3.use_environment_credentials", false)),
settings.auth_settings.use_insecure_imds_request.value_or(
context->getConfigRef().getBool("s3.use_insecure_imds_request", false)),
settings.auth_settings.expiration_window_seconds.value_or(
context->getConfigRef().getUInt64("s3.expiration_window_seconds", S3::DEFAULT_EXPIRATION_WINDOW_SECONDS)));
S3::CredentialsConfiguration
{
settings.auth_settings.use_environment_credentials.value_or(
context->getConfigRef().getBool("s3.use_environment_credentials", false)),
settings.auth_settings.use_insecure_imds_request.value_or(
context->getConfigRef().getBool("s3.use_insecure_imds_request", false)),
settings.auth_settings.expiration_window_seconds.value_or(
context->getConfigRef().getUInt64("s3.expiration_window_seconds", S3::DEFAULT_EXPIRATION_WINDOW_SECONDS)),
settings.auth_settings.no_sign_request.value_or(
context->getConfigRef().getBool("s3.no_sign_request", false)),
});
}
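
Each auth setting above is a std::optional that falls back to the server-wide config value only when the backup query didn't set it explicitly. A minimal illustration of this value_or layering (names here are hypothetical):

#include <iostream>
#include <optional>

/// The pattern: per-query setting first, server config default second.
bool effectiveNoSignRequest(std::optional<bool> query_setting, bool server_config_default)
{
    return query_setting.value_or(server_config_default);
}

int main()
{
    std::cout << effectiveNoSignRequest(std::nullopt, true) << '\n';    /// 1: falls back to the config
    std::cout << effectiveNoSignRequest(false, true) << '\n';           /// 0: the explicit setting wins
}
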
Aws::Vector<Aws::S3::Model::Object> listObjects(S3::Client & client, const S3::URI & s3_uri, const String & file_name)

View File

@ -1,12 +1,11 @@
#include <Backups/BackupImpl.h>
#include <Backups/BackupFactory.h>
#include <Backups/BackupEntryFromMemory.h>
#include <Backups/BackupFileInfo.h>
#include <Backups/BackupIO.h>
#include <Backups/IBackupEntry.h>
#include <Backups/BackupCoordinationLocal.h>
#include <Backups/BackupCoordinationRemote.h>
#include <Common/StringUtils/StringUtils.h>
#include <base/hex.h>
#include <Common/logger_useful.h>
#include <Common/quoteString.h>
#include <Common/XMLUtils.h>
#include <Interpreters/Context.h>
@ -15,7 +14,6 @@
#include <IO/Archives/createArchiveReader.h>
#include <IO/Archives/createArchiveWriter.h>
#include <IO/ConcatSeekableReadBuffer.h>
#include <IO/HashingReadBuffer.h>
#include <IO/ReadHelpers.h>
#include <IO/SeekableReadBuffer.h>
#include <IO/WriteBufferFromFileBase.h>
@ -36,7 +34,6 @@ namespace ErrorCodes
extern const int BACKUP_DAMAGED;
extern const int NO_BASE_BACKUP;
extern const int WRONG_BASE_BACKUP;
extern const int BACKUP_ENTRY_ALREADY_EXISTS;
extern const int BACKUP_ENTRY_NOT_FOUND;
extern const int BACKUP_IS_EMPTY;
extern const int FAILED_TO_SYNC_BACKUP_OR_RESTORE;
@ -49,7 +46,6 @@ namespace
const int CURRENT_BACKUP_VERSION = 1;
using SizeAndChecksum = IBackup::SizeAndChecksum;
using FileInfo = IBackupCoordination::FileInfo;
String hexChecksum(UInt128 checksum)
{
@ -86,12 +82,11 @@ BackupImpl::BackupImpl(
std::shared_ptr<IBackupReader> reader_,
const ContextPtr & context_)
: backup_name_for_logging(backup_name_for_logging_)
, use_archive(!archive_params_.archive_name.empty())
, archive_params(archive_params_)
, use_archives(!archive_params.archive_name.empty())
, open_mode(OpenMode::READ)
, reader(std::move(reader_))
, is_internal_backup(false)
, coordination(std::make_shared<BackupCoordinationLocal>())
, version(INITIAL_BACKUP_VERSION)
, base_backup_info(base_backup_info_)
{
@ -110,8 +105,8 @@ BackupImpl::BackupImpl(
const std::optional<UUID> & backup_uuid_,
bool deduplicate_files_)
: backup_name_for_logging(backup_name_for_logging_)
, use_archive(!archive_params_.archive_name.empty())
, archive_params(archive_params_)
, use_archives(!archive_params.archive_name.empty())
, open_mode(OpenMode::WRITE)
, writer(std::move(writer_))
, is_internal_backup(is_internal_backup_)
@ -147,7 +142,7 @@ void BackupImpl::open(const ContextPtr & context)
timestamp = std::time(nullptr);
if (!uuid)
uuid = UUIDHelpers::generateV4();
lock_file_name = use_archives ? (archive_params.archive_name + ".lock") : ".lock";
lock_file_name = use_archive ? (archive_params.archive_name + ".lock") : ".lock";
writing_finalized = false;
/// Check that we can write a backup there and create the lock file to own this destination.
@ -157,6 +152,9 @@ void BackupImpl::open(const ContextPtr & context)
checkLockFile(true);
}
if (use_archive)
openArchive();
if (open_mode == OpenMode::READ)
readBackupMetadata();
@ -188,7 +186,7 @@ void BackupImpl::open(const ContextPtr & context)
void BackupImpl::close()
{
std::lock_guard lock{mutex};
closeArchives();
closeArchive();
if (!is_internal_backup && writer && !writing_finalized)
removeAllFilesAfterFailure();
@ -198,11 +196,33 @@ void BackupImpl::close()
coordination.reset();
}
void BackupImpl::closeArchives()
void BackupImpl::openArchive()
{
archive_readers.clear();
for (auto & archive_writer : archive_writers)
archive_writer = {"", nullptr};
if (!use_archive)
return;
const String & archive_name = archive_params.archive_name;
if (open_mode == OpenMode::READ)
{
if (!reader->fileExists(archive_name))
throw Exception(ErrorCodes::BACKUP_NOT_FOUND, "Backup {} not found", backup_name_for_logging);
size_t archive_size = reader->getFileSize(archive_name);
archive_reader = createArchiveReader(archive_name, [reader=reader, archive_name]{ return reader->readFile(archive_name); }, archive_size);
archive_reader->setPassword(archive_params.password);
}
else
{
archive_writer = createArchiveWriter(archive_name, writer->writeFile(archive_name));
archive_writer->setPassword(archive_params.password);
archive_writer->setCompression(archive_params.compression_method, archive_params.compression_level);
}
}
void BackupImpl::closeArchive()
{
archive_reader.reset();
archive_writer.reset();
}
size_t BackupImpl::getNumFiles() const
@ -260,8 +280,8 @@ void BackupImpl::writeBackupMetadata()
checkLockFile(true);
std::unique_ptr<WriteBuffer> out;
if (use_archives)
out = getArchiveWriter("")->writeFile(".backup");
if (use_archive)
out = archive_writer->writeFile(".backup");
else
out = writer->writeFile(".backup");
@ -271,7 +291,10 @@ void BackupImpl::writeBackupMetadata()
*out << "<timestamp>" << toString(LocalDateTime{timestamp}) << "</timestamp>";
*out << "<uuid>" << toString(*uuid) << "</uuid>";
auto all_file_infos = coordination->getAllFileInfos();
auto all_file_infos = coordination->getFileInfosForAllHosts();
if (all_file_infos.empty())
throw Exception(ErrorCodes::BACKUP_IS_EMPTY, "Backup must not be empty");
if (base_backup_info)
{
@ -316,10 +339,6 @@ void BackupImpl::writeBackupMetadata()
}
if (!info.data_file_name.empty() && (info.data_file_name != info.file_name))
*out << "<data_file>" << xml << info.data_file_name << "</data_file>";
if (!info.archive_suffix.empty())
*out << "<archive_suffix>" << xml << info.archive_suffix << "</archive_suffix>";
if (info.pos_in_archive != static_cast<size_t>(-1))
*out << "<pos_in_archive>" << info.pos_in_archive << "</pos_in_archive>";
}
total_size += info.size;
@ -347,12 +366,12 @@ void BackupImpl::readBackupMetadata()
using namespace XMLUtils;
std::unique_ptr<ReadBuffer> in;
if (use_archives)
if (use_archive)
{
if (!reader->fileExists(archive_params.archive_name))
throw Exception(ErrorCodes::BACKUP_NOT_FOUND, "Backup {} not found", backup_name_for_logging);
if (!archive_reader->fileExists(".backup"))
throw Exception(ErrorCodes::BACKUP_NOT_FOUND, "Archive {} is not a backup", backup_name_for_logging);
setCompressedSize();
in = getArchiveReader("")->readFile(".backup");
in = archive_reader->readFile(".backup");
}
else
{
@ -392,7 +411,7 @@ void BackupImpl::readBackupMetadata()
if (child->nodeName() == "file")
{
const Poco::XML::Node * file_config = child;
FileInfo info;
BackupFileInfo info;
info.file_name = getString(file_config, "name");
info.size = getUInt64(file_config, "size");
if (info.size)
@ -424,12 +443,12 @@ void BackupImpl::readBackupMetadata()
if (info.size > info.base_size)
{
info.data_file_name = getString(file_config, "data_file", info.file_name);
info.archive_suffix = getString(file_config, "archive_suffix", "");
info.pos_in_archive = getUInt64(file_config, "pos_in_archive", static_cast<UInt64>(-1));
}
}
coordination->addFileInfo(info);
file_names.emplace(info.file_name, std::pair{info.size, info.checksum});
if (info.size)
file_infos.try_emplace(std::pair{info.size, info.checksum}, info);
++num_files;
total_size += info.size;
@ -444,14 +463,14 @@ void BackupImpl::readBackupMetadata()
uncompressed_size = size_of_entries + str.size();
compressed_size = uncompressed_size;
if (!use_archives)
if (!use_archive)
setCompressedSize();
}
void BackupImpl::checkBackupDoesntExist() const
{
String file_name_to_check_existence;
if (use_archives)
if (use_archive)
file_name_to_check_existence = archive_params.archive_name;
else
file_name_to_check_existence = ".backup";
@ -512,69 +531,91 @@ void BackupImpl::removeLockFile()
Strings BackupImpl::listFiles(const String & directory, bool recursive) const
{
if (open_mode != OpenMode::READ)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is not opened for reading");
String prefix = removeLeadingSlash(directory);
if (!prefix.empty() && !prefix.ends_with('/'))
prefix += '/';
String terminator = recursive ? "" : "/";
Strings elements;
std::lock_guard lock{mutex};
auto adjusted_dir = removeLeadingSlash(directory);
return coordination->listFiles(adjusted_dir, recursive);
for (auto it = file_names.lower_bound(prefix); it != file_names.end(); ++it)
{
const String & name = it->first;
if (!name.starts_with(prefix))
break;
size_t start_pos = prefix.length();
size_t end_pos = String::npos;
if (!terminator.empty())
end_pos = name.find(terminator, start_pos);
std::string_view new_element = std::string_view{name}.substr(start_pos, end_pos - start_pos);
if (!elements.empty() && (elements.back() == new_element))
continue;
elements.push_back(String{new_element});
}
return elements;
}
bool BackupImpl::hasFiles(const String & directory) const
{
if (open_mode != OpenMode::READ)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is not opened for reading");
String prefix = removeLeadingSlash(directory);
if (!prefix.empty() && !prefix.ends_with('/'))
prefix += '/';
std::lock_guard lock{mutex};
auto adjusted_dir = removeLeadingSlash(directory);
return coordination->hasFiles(adjusted_dir);
auto it = file_names.lower_bound(prefix);
if (it == file_names.end())
return false;
const String & name = it->first;
return name.starts_with(prefix);
}
bool BackupImpl::fileExists(const String & file_name) const
{
std::lock_guard lock{mutex};
if (open_mode != OpenMode::READ)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is not opened for reading");
auto adjusted_path = removeLeadingSlash(file_name);
return coordination->getFileInfo(adjusted_path).has_value();
std::lock_guard lock{mutex};
return file_names.contains(adjusted_path);
}
bool BackupImpl::fileExists(const SizeAndChecksum & size_and_checksum) const
{
if (open_mode != OpenMode::READ)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is not opened for reading");
std::lock_guard lock{mutex};
return coordination->getFileInfo(size_and_checksum).has_value();
return file_infos.contains(size_and_checksum);
}
UInt64 BackupImpl::getFileSize(const String & file_name) const
{
std::lock_guard lock{mutex};
auto adjusted_path = removeLeadingSlash(file_name);
auto info = coordination->getFileInfo(adjusted_path);
if (!info)
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_NOT_FOUND,
"Backup {}: Entry {} not found in the backup",
backup_name_for_logging,
quoteString(file_name));
}
return info->size;
return getFileSizeAndChecksum(file_name).first;
}
UInt128 BackupImpl::getFileChecksum(const String & file_name) const
{
std::lock_guard lock{mutex};
auto adjusted_path = removeLeadingSlash(file_name);
auto info = coordination->getFileInfo(adjusted_path);
if (!info)
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_NOT_FOUND,
"Backup {}: Entry {} not found in the backup",
backup_name_for_logging,
quoteString(file_name));
}
return info->checksum;
return getFileSizeAndChecksum(file_name).second;
}
SizeAndChecksum BackupImpl::getFileSizeAndChecksum(const String & file_name) const
{
std::lock_guard lock{mutex};
if (open_mode != OpenMode::READ)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is not opened for reading");
auto adjusted_path = removeLeadingSlash(file_name);
auto info = coordination->getFileInfo(adjusted_path);
if (!info)
std::lock_guard lock{mutex};
auto it = file_names.find(adjusted_path);
if (it == file_names.end())
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_NOT_FOUND,
@ -582,7 +623,8 @@ SizeAndChecksum BackupImpl::getFileSizeAndChecksum(const String & file_name) con
backup_name_for_logging,
quoteString(file_name));
}
return {info->size, info->checksum};
return it->second;
}
std::unique_ptr<SeekableReadBuffer> BackupImpl::readFile(const String & file_name) const
@ -603,37 +645,31 @@ std::unique_ptr<SeekableReadBuffer> BackupImpl::readFile(const SizeAndChecksum &
return std::make_unique<ReadBufferFromMemory>(static_cast<char *>(nullptr), 0);
}
auto info_opt = coordination->getFileInfo(size_and_checksum);
if (!info_opt)
BackupFileInfo info;
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_NOT_FOUND,
"Backup {}: Entry {} not found in the backup",
backup_name_for_logging,
formatSizeAndChecksum(size_and_checksum));
std::lock_guard lock{mutex};
auto it = file_infos.find(size_and_checksum);
if (it == file_infos.end())
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_NOT_FOUND,
"Backup {}: Entry {} not found in the backup",
backup_name_for_logging,
formatSizeAndChecksum(size_and_checksum));
}
info = it->second;
}
const auto & info = *info_opt;
std::unique_ptr<SeekableReadBuffer> read_buffer;
std::unique_ptr<SeekableReadBuffer> base_read_buffer;
if (info.size > info.base_size)
{
/// Make `read_buffer` if there is data for this backup entry in this backup.
if (use_archives)
{
std::shared_ptr<IArchiveReader> archive_reader;
{
std::lock_guard lock{mutex};
archive_reader = getArchiveReader(info.archive_suffix);
}
if (use_archive)
read_buffer = archive_reader->readFile(info.data_file_name);
}
else
{
read_buffer = reader->readFile(info.data_file_name);
}
}
if (info.base_size)
@ -709,21 +745,24 @@ size_t BackupImpl::copyFileToDisk(const SizeAndChecksum & size_and_checksum, Dis
return 0;
}
auto info_opt = coordination->getFileInfo(size_and_checksum);
if (!info_opt)
BackupFileInfo info;
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_NOT_FOUND,
"Backup {}: Entry {} not found in the backup",
backup_name_for_logging,
formatSizeAndChecksum(size_and_checksum));
std::lock_guard lock{mutex};
auto it = file_infos.find(size_and_checksum);
if (it == file_infos.end())
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_NOT_FOUND,
"Backup {}: Entry {} not found in the backup",
backup_name_for_logging,
formatSizeAndChecksum(size_and_checksum));
}
info = it->second;
}
const auto & info = *info_opt;
bool file_copied = false;
if (info.size && !info.base_size && !use_archives)
if (info.size && !info.base_size && !use_archive)
{
/// Data comes completely from this backup.
reader->copyFileToDisk(info.data_file_name, info.size, destination_disk, destination_path, write_mode, write_settings);
@ -758,84 +797,7 @@ size_t BackupImpl::copyFileToDisk(const SizeAndChecksum & size_and_checksum, Dis
}
namespace
{
std::optional<SizeAndChecksum> getInfoAboutFileFromBaseBackupIfExists(std::shared_ptr<const IBackup> base_backup, const std::string & file_path)
{
if (base_backup && base_backup->fileExists(file_path))
return std::pair{base_backup->getFileSize(file_path), base_backup->getFileChecksum(file_path)};
return std::nullopt;
}
enum class CheckBackupResult
{
HasPrefix,
HasFull,
HasNothing,
};
CheckBackupResult checkBaseBackupForFile(const SizeAndChecksum & base_backup_info, const FileInfo & new_entry_info)
{
/// We cannot reuse the base backup because our file is smaller
/// than the file stored in the previous backup
if (new_entry_info.size < base_backup_info.first)
return CheckBackupResult::HasNothing;
if (base_backup_info.first == new_entry_info.size)
return CheckBackupResult::HasFull;
return CheckBackupResult::HasPrefix;
}
struct ChecksumsForNewEntry
{
UInt128 full_checksum;
UInt128 prefix_checksum;
};
/// Calculates the checksum for a backup entry if it isn't set yet.
/// Can also calculate an additional checksum of some prefix.
ChecksumsForNewEntry calculateNewEntryChecksumsIfNeeded(BackupEntryPtr entry, size_t prefix_size)
{
if (prefix_size > 0)
{
auto read_buffer = entry->getReadBuffer();
HashingReadBuffer hashing_read_buffer(*read_buffer);
hashing_read_buffer.ignore(prefix_size);
auto prefix_checksum = hashing_read_buffer.getHash();
if (entry->getChecksum() == std::nullopt)
{
hashing_read_buffer.ignoreAll();
auto full_checksum = hashing_read_buffer.getHash();
return ChecksumsForNewEntry{full_checksum, prefix_checksum};
}
else
{
return ChecksumsForNewEntry{*(entry->getChecksum()), prefix_checksum};
}
}
else
{
if (entry->getChecksum() == std::nullopt)
{
auto read_buffer = entry->getReadBuffer();
HashingReadBuffer hashing_read_buffer(*read_buffer);
hashing_read_buffer.ignoreAll();
return ChecksumsForNewEntry{hashing_read_buffer.getHash(), 0};
}
else
{
return ChecksumsForNewEntry{*(entry->getChecksum()), 0};
}
}
}
}
void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
void BackupImpl::writeFile(const BackupFileInfo & info, BackupEntryPtr entry)
{
if (open_mode != OpenMode::WRITE)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is not opened for writing");
@ -846,24 +808,6 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
std::string from_file_name = "memory buffer";
if (auto fname = entry->getFilePath(); !fname.empty())
from_file_name = "file " + fname;
LOG_TRACE(log, "Writing backup for file {} from {}", file_name, from_file_name);
auto adjusted_path = removeLeadingSlash(file_name);
if (coordination->getFileInfo(adjusted_path))
{
throw Exception(
ErrorCodes::BACKUP_ENTRY_ALREADY_EXISTS, "Backup {}: Entry {} already exists",
backup_name_for_logging, quoteString(file_name));
}
FileInfo info
{
.file_name = adjusted_path,
.size = entry->getSize(),
.base_size = 0,
.base_checksum = 0,
};
{
std::lock_guard lock{mutex};
@ -871,123 +815,29 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
total_size += info.size;
}
/// Empty file, nothing to backup
if (info.size == 0 && deduplicate_files)
if (info.data_file_name.empty())
{
coordination->addFileInfo(info);
LOG_TRACE(log, "Writing backup for file {} from {}: skipped, {}", info.data_file_name, from_file_name, !info.size ? "empty" : "base backup has it");
return;
}
std::optional<SizeAndChecksum> base_backup_file_info = getInfoAboutFileFromBaseBackupIfExists(base_backup, adjusted_path);
/// We have info about this file in the base backup.
/// If the file has no checksum yet, calculate and fill it in.
if (base_backup_file_info.has_value())
if (!coordination->startWritingFile(info.data_file_index))
{
LOG_TRACE(log, "File {} found in base backup, checking for equality", adjusted_path);
CheckBackupResult check_base = checkBaseBackupForFile(*base_backup_file_info, info);
/// A file with the same name but a smaller size exists in the previous backup
if (check_base == CheckBackupResult::HasPrefix)
{
auto checksums = calculateNewEntryChecksumsIfNeeded(entry, base_backup_file_info->first);
info.checksum = checksums.full_checksum;
/// We have a prefix of this file in the base backup with the same checksum.
/// In ClickHouse this can happen for StorageLog, for example.
if (checksums.prefix_checksum == base_backup_file_info->second)
{
LOG_TRACE(log, "File prefix of {} in base backup, will write rest part of file to current backup", adjusted_path);
info.base_size = base_backup_file_info->first;
info.base_checksum = base_backup_file_info->second;
}
else
{
LOG_TRACE(log, "Prefix checksum of file {} doesn't match with checksum in base backup", adjusted_path);
}
}
else
{
/// We either have the full file or nothing; first of all, let's get the checksum
/// of the current file
auto checksums = calculateNewEntryChecksumsIfNeeded(entry, 0);
info.checksum = checksums.full_checksum;
if (info.checksum == base_backup_file_info->second)
{
LOG_TRACE(log, "Found whole file {} in base backup", adjusted_path);
assert(check_base == CheckBackupResult::HasFull);
assert(info.size == base_backup_file_info->first);
info.base_size = base_backup_file_info->first;
info.base_checksum = base_backup_file_info->second;
/// Actually we could add this info to the coordination and exit,
/// but we intentionally don't do it; otherwise the control flow
/// of this function would become very complex.
}
else
{
LOG_TRACE(log, "Whole file {} in base backup doesn't match by checksum", adjusted_path);
}
}
}
else /// We don't have info about this file_name (sic!) in the base backup;
/// however, the file could have been renamed, so we check one more time using its size and checksum
{
LOG_TRACE(log, "Nothing found for file {} in base backup", adjusted_path);
auto checksums = calculateNewEntryChecksumsIfNeeded(entry, 0);
info.checksum = checksums.full_checksum;
}
/// Maybe we have a copy of this file in the backup already.
if (coordination->getFileInfo(std::pair{info.size, info.checksum}) && deduplicate_files)
{
LOG_TRACE(log, "File {} already exist in current backup, adding reference", adjusted_path);
coordination->addFileInfo(info);
LOG_TRACE(log, "Writing backup for file {} from {}: skipped, data file #{} is already being written", info.data_file_name, from_file_name, info.data_file_index);
return;
}
/// On the previous lines we checked whether a backup of the file with adjusted_name exists in the previous backup.
/// However, the file could have been renamed while keeping the same size and checksum; let's check for this case.
if (base_backup && base_backup->fileExists(std::pair{info.size, info.checksum}))
{
LOG_TRACE(log, "Writing backup for file {} from {}: data file #{}", info.data_file_name, from_file_name, info.data_file_index);
LOG_TRACE(log, "File {} doesn't exist in current backup, but we have file with same size and checksum", adjusted_path);
info.base_size = info.size;
info.base_checksum = info.checksum;
coordination->addFileInfo(info);
return;
}
/// All "short paths" failed. We don't have this file in the previous or the current backup,
/// or we have only a prefix of it in the previous backup. Let's take the long path.
info.data_file_name = info.file_name;
if (use_archives)
{
std::lock_guard lock{mutex};
info.archive_suffix = current_archive_suffix;
}
bool is_data_file_required;
coordination->addFileInfo(info, is_data_file_required);
if (!is_data_file_required && deduplicate_files)
{
LOG_TRACE(log, "File {} doesn't exist in current backup, but we have file with same size and checksum", adjusted_path);
return; /// We copy data only if it's a new combination of size & checksum.
}
auto writer_description = writer->getDataSourceDescription();
auto reader_description = entry->getDataSourceDescription();
/// We need to copy the whole file without an archive; we can do it faster
/// if the source and destination are compatible
if (!use_archives && writer->supportNativeCopy(reader_description))
if (!use_archive && writer->supportNativeCopy(reader_description))
{
/// Should be much faster than writing data through server.
LOG_TRACE(log, "Will copy file {} using native copy", adjusted_path);
LOG_TRACE(log, "Will copy file {} using native copy", info.data_file_name);
/// NOTE: `mutex` must be unlocked here, otherwise writing will be limited to one thread at most and hence slow.
@ -995,8 +845,6 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
}
else
{
LOG_TRACE(log, "Will copy file {}", adjusted_path);
bool has_entries = false;
{
std::lock_guard lock{mutex};
@ -1005,30 +853,10 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
if (!has_entries)
checkLockFile(true);
if (use_archives)
if (use_archive)
{
LOG_TRACE(log, "Adding file {} to archive", adjusted_path);
/// An archive must be written strictly by one thread, so it's correct to keep the mutex locked for the whole time we're writing the file
/// to the archive.
std::lock_guard lock{mutex};
String archive_suffix = current_archive_suffix;
bool next_suffix = false;
if (current_archive_suffix.empty() && is_internal_backup)
next_suffix = true;
/*if (archive_params.max_volume_size && current_archive_writer
&& (current_archive_writer->getTotalSize() + size - base_size > archive_params.max_volume_size))
next_suffix = true;*/
if (next_suffix)
current_archive_suffix = coordination->getNextArchiveSuffix();
if (info.archive_suffix != current_archive_suffix)
{
info.archive_suffix = current_archive_suffix;
coordination->updateFileInfo(info);
}
auto out = getArchiveWriter(current_archive_suffix)->writeFile(info.data_file_name);
LOG_TRACE(log, "Adding file {} to archive", info.data_file_name);
auto out = archive_writer->writeFile(info.data_file_name);
auto read_buffer = entry->getReadBuffer();
if (info.base_size != 0)
read_buffer->seek(info.base_size, SEEK_SET);
@ -1037,6 +865,7 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
}
else
{
LOG_TRACE(log, "Will copy file {}", info.data_file_name);
auto create_read_buffer = [entry] { return entry->getReadBuffer(); };
/// NOTE: `mutex` must be unlocked here, otherwise writing will be limited to one thread at most and hence slow.
@ -1062,14 +891,11 @@ void BackupImpl::finalizeWriting()
if (writing_finalized)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is already finalized");
if (!coordination->hasFiles(""))
throw Exception(ErrorCodes::BACKUP_IS_EMPTY, "Backup must not be empty");
if (!is_internal_backup)
{
LOG_TRACE(log, "Finalizing backup {}", backup_name_for_logging);
writeBackupMetadata();
closeArchives();
closeArchive();
setCompressedSize();
removeLockFile();
LOG_TRACE(log, "Finalized backup {}", backup_name_for_logging);
@ -1081,51 +907,13 @@ void BackupImpl::finalizeWriting()
void BackupImpl::setCompressedSize()
{
if (use_archives)
if (use_archive)
compressed_size = writer ? writer->getFileSize(archive_params.archive_name) : reader->getFileSize(archive_params.archive_name);
else
compressed_size = uncompressed_size;
}
String BackupImpl::getArchiveNameWithSuffix(const String & suffix) const
{
return archive_params.archive_name + (suffix.empty() ? "" : ".") + suffix;
}
std::shared_ptr<IArchiveReader> BackupImpl::getArchiveReader(const String & suffix) const
{
auto it = archive_readers.find(suffix);
if (it != archive_readers.end())
return it->second;
String archive_name_with_suffix = getArchiveNameWithSuffix(suffix);
size_t archive_size = reader->getFileSize(archive_name_with_suffix);
auto new_archive_reader = createArchiveReader(archive_params.archive_name, [reader=reader, archive_name_with_suffix]{ return reader->readFile(archive_name_with_suffix); },
archive_size);
new_archive_reader->setPassword(archive_params.password);
archive_readers.emplace(suffix, new_archive_reader);
return new_archive_reader;
}
std::shared_ptr<IArchiveWriter> BackupImpl::getArchiveWriter(const String & suffix)
{
for (const auto & archive_writer : archive_writers)
{
if ((suffix == archive_writer.first) && archive_writer.second)
return archive_writer.second;
}
String archive_name_with_suffix = getArchiveNameWithSuffix(suffix);
auto new_archive_writer = createArchiveWriter(archive_params.archive_name, writer->writeFile(archive_name_with_suffix));
new_archive_writer->setPassword(archive_params.password);
new_archive_writer->setCompression(archive_params.compression_method, archive_params.compression_level);
size_t pos = suffix.empty() ? 0 : 1;
archive_writers[pos] = {suffix, new_archive_writer};
return new_archive_writer;
}
void BackupImpl::removeAllFilesAfterFailure()
{
if (is_internal_backup)
@ -1136,19 +924,14 @@ void BackupImpl::removeAllFilesAfterFailure()
LOG_INFO(log, "Removing all files of backup {} after failure", backup_name_for_logging);
Strings files_to_remove;
if (use_archives)
if (use_archive)
{
files_to_remove.push_back(archive_params.archive_name);
for (const auto & suffix : coordination->getAllArchiveSuffixes())
{
String archive_name_with_suffix = getArchiveNameWithSuffix(suffix);
files_to_remove.push_back(std::move(archive_name_with_suffix));
}
}
else
{
files_to_remove.push_back(".backup");
for (const auto & file_info : coordination->getAllFileInfos())
for (const auto & file_info : coordination->getFileInfosForAllHosts())
files_to_remove.push_back(file_info.data_file_name);
}
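
The rewritten listFiles() above scans the alphabetically ordered file_names map starting at lower_bound(prefix) and, in the non-recursive case, cuts each name at the next '/', so a whole subdirectory collapses into a single element and sorted order makes duplicates adjacent. A self-contained sketch of the same algorithm:

#include <iostream>
#include <map>
#include <string>
#include <vector>

std::vector<std::string> listFiles(const std::map<std::string, int> & file_names, std::string prefix, bool recursive)
{
    if (!prefix.empty() && !prefix.ends_with('/'))
        prefix += '/';
    std::vector<std::string> elements;
    for (auto it = file_names.lower_bound(prefix); it != file_names.end(); ++it)
    {
        const std::string & name = it->first;
        if (!name.starts_with(prefix))
            break;                           /// left the subtree, the map is sorted
        size_t end_pos = recursive ? std::string::npos : name.find('/', prefix.size());
        std::string element = name.substr(prefix.size(), end_pos - prefix.size());
        if (elements.empty() || elements.back() != element)
            elements.push_back(element);     /// sorted order makes repeats adjacent
    }
    return elements;
}

int main()
{
    std::map<std::string, int> file_names{{"data/a.bin", 0}, {"data/sub/b.bin", 0}, {"meta.xml", 0}};
    for (const auto & element : listFiles(file_names, "data", /*recursive=*/ false))
        std::cout << element << '\n';        /// prints: a.bin, sub
}
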

View File

@ -3,8 +3,8 @@
#include <Backups/IBackup.h>
#include <Backups/IBackupCoordination.h>
#include <Backups/BackupInfo.h>
#include <map>
#include <mutex>
#include <unordered_map>
namespace DB
@ -58,6 +58,7 @@ public:
OpenMode getOpenMode() const override { return open_mode; }
time_t getTimestamp() const override { return timestamp; }
UUID getUUID() const override { return *uuid; }
BackupPtr getBaseBackup() const override { return base_backup; }
size_t getNumFiles() const override;
UInt64 getTotalSize() const override;
size_t getNumEntries() const override;
@ -79,21 +80,20 @@ public:
WriteMode write_mode, const WriteSettings & write_settings) const override;
size_t copyFileToDisk(const SizeAndChecksum & size_and_checksum, DiskPtr destination_disk, const String & destination_path,
WriteMode write_mode, const WriteSettings & write_settings) const override;
void writeFile(const String & file_name, BackupEntryPtr entry) override;
void writeFile(const BackupFileInfo & info, BackupEntryPtr entry) override;
void finalizeWriting() override;
bool supportsWritingInMultipleThreads() const override { return !use_archives; }
bool supportsWritingInMultipleThreads() const override { return !use_archive; }
private:
using FileInfo = IBackupCoordination::FileInfo;
class BackupEntryFromBackupImpl;
void open(const ContextPtr & context);
void close();
void closeArchives();
void openArchive();
void closeArchive();
/// Writes the file ".backup" containing backup's metadata.
void writeBackupMetadata();
void readBackupMetadata();
void writeBackupMetadata() TSA_REQUIRES(mutex);
void readBackupMetadata() TSA_REQUIRES(mutex);
/// Checks that a new backup doesn't exist yet.
void checkBackupDoesntExist() const;
@ -106,16 +106,12 @@ private:
void removeAllFilesAfterFailure();
String getArchiveNameWithSuffix(const String & suffix) const;
std::shared_ptr<IArchiveReader> getArchiveReader(const String & suffix) const;
std::shared_ptr<IArchiveWriter> getArchiveWriter(const String & suffix);
/// Calculates and sets `compressed_size`.
void setCompressedSize();
const String backup_name_for_logging;
const bool use_archive;
const ArchiveParams archive_params;
const bool use_archives;
const OpenMode open_mode;
std::shared_ptr<IBackupWriter> writer;
std::shared_ptr<IBackupReader> reader;
@ -123,6 +119,11 @@ private:
std::shared_ptr<IBackupCoordination> coordination;
mutable std::mutex mutex;
using SizeAndChecksum = std::pair<UInt64, UInt128>;
std::map<String /* file_name */, SizeAndChecksum> file_names TSA_GUARDED_BY(mutex); /// Should be ordered alphabetically, see listFiles(). For empty files we assume checksum = 0.
std::map<SizeAndChecksum, BackupFileInfo> file_infos TSA_GUARDED_BY(mutex); /// Information about files. Without empty files.
std::optional<UUID> uuid;
time_t timestamp = 0;
size_t num_files = 0;
@ -137,10 +138,10 @@ private:
std::optional<BackupInfo> base_backup_info;
std::shared_ptr<const IBackup> base_backup;
std::optional<UUID> base_backup_uuid;
mutable std::unordered_map<String /* archive_suffix */, std::shared_ptr<IArchiveReader>> archive_readers;
std::pair<String, std::shared_ptr<IArchiveWriter>> archive_writers[2];
String current_archive_suffix;
std::shared_ptr<IArchiveReader> archive_reader;
std::shared_ptr<IArchiveWriter> archive_writer;
String lock_file_name;
bool writing_finalized = false;
bool deduplicate_files = true;
const Poco::Logger * log;

View File

@ -20,10 +20,19 @@
#include <Common/Exception.h>
#include <Common/Macros.h>
#include <Common/logger_useful.h>
#include <Common/CurrentMetrics.h>
#include <Common/setThreadName.h>
#include <Common/scope_guard_safe.h>
namespace CurrentMetrics
{
extern const Metric BackupsThreads;
extern const Metric BackupsThreadsActive;
extern const Metric RestoreThreads;
extern const Metric RestoreThreadsActive;
}
namespace DB
{
@ -52,7 +61,7 @@ namespace
.keeper_max_retries = context->getSettingsRef().backup_keeper_max_retries,
.keeper_retry_initial_backoff_ms = context->getSettingsRef().backup_keeper_retry_initial_backoff_ms,
.keeper_retry_max_backoff_ms = context->getSettingsRef().backup_keeper_retry_max_backoff_ms,
.batch_size_for_keeper_multiread = context->getSettingsRef().backup_batch_size_for_keeper_multiread,
.keeper_value_max_size = context->getSettingsRef().backup_keeper_value_max_size,
};
auto all_hosts = BackupSettings::Util::filterHostIDs(
@ -65,11 +74,12 @@ namespace
toString(*backup_settings.backup_uuid),
all_hosts,
backup_settings.host_id,
!backup_settings.deduplicate_files,
backup_settings.internal);
}
else
{
return std::make_shared<BackupCoordinationLocal>();
return std::make_shared<BackupCoordinationLocal>(!backup_settings.deduplicate_files);
}
}
@ -152,8 +162,8 @@ namespace
BackupsWorker::BackupsWorker(size_t num_backup_threads, size_t num_restore_threads, bool allow_concurrent_backups_, bool allow_concurrent_restores_)
: backups_thread_pool(num_backup_threads, /* max_free_threads = */ 0, num_backup_threads)
, restores_thread_pool(num_restore_threads, /* max_free_threads = */ 0, num_restore_threads)
: backups_thread_pool(CurrentMetrics::BackupsThreads, CurrentMetrics::BackupsThreadsActive, num_backup_threads, /* max_free_threads = */ 0, num_backup_threads)
, restores_thread_pool(CurrentMetrics::RestoreThreads, CurrentMetrics::RestoreThreadsActive, num_restore_threads, /* max_free_threads = */ 0, num_restore_threads)
, log(&Poco::Logger::get("BackupsWorker"))
, allow_concurrent_backups(allow_concurrent_backups_)
, allow_concurrent_restores(allow_concurrent_restores_)
@ -350,7 +360,8 @@ void BackupsWorker::doBackup(
}
/// Write the backup entries to the backup.
writeBackupEntries(backup_id, backup, std::move(backup_entries), backups_thread_pool, backup_settings.internal);
buildFileInfosForBackupEntries(backup, backup_entries, backup_coordination);
writeBackupEntries(backup, std::move(backup_entries), backup_id, backup_coordination, backup_settings.internal);
/// We have written our backup entries, we need to tell other hosts (they could be waiting for it).
backup_coordination->setStage(Stage::COMPLETED, "");
@ -399,8 +410,31 @@ void BackupsWorker::doBackup(
}
void BackupsWorker::writeBackupEntries(const OperationID & backup_id, BackupMutablePtr backup, BackupEntries && backup_entries, ThreadPool & thread_pool, bool internal)
void BackupsWorker::buildFileInfosForBackupEntries(const BackupPtr & backup, const BackupEntries & backup_entries, std::shared_ptr<IBackupCoordination> backup_coordination)
{
LOG_TRACE(log, "{}", Stage::BUILDING_FILE_INFOS);
backup_coordination->setStage(Stage::BUILDING_FILE_INFOS, "");
backup_coordination->waitForStage(Stage::BUILDING_FILE_INFOS);
backup_coordination->addFileInfos(::DB::buildFileInfosForBackupEntries(backup_entries, backup->getBaseBackup(), backups_thread_pool));
}
void BackupsWorker::writeBackupEntries(BackupMutablePtr backup, BackupEntries && backup_entries, const OperationID & backup_id, std::shared_ptr<IBackupCoordination> backup_coordination, bool internal)
{
LOG_TRACE(log, "{}, num backup entries={}", Stage::WRITING_BACKUP, backup_entries.size());
backup_coordination->setStage(Stage::WRITING_BACKUP, "");
backup_coordination->waitForStage(Stage::WRITING_BACKUP);
auto file_infos = backup_coordination->getFileInfos();
if (file_infos.size() != backup_entries.size())
{
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Number of file infos ({}) doesn't match the number of backup entries ({})",
file_infos.size(),
backup_entries.size());
}
size_t num_active_jobs = 0;
std::mutex mutex;
std::condition_variable event;
@ -409,10 +443,10 @@ void BackupsWorker::writeBackupEntries(const OperationID & backup_id, BackupMuta
bool always_single_threaded = !backup->supportsWritingInMultipleThreads();
auto thread_group = CurrentThread::getGroup();
for (auto & name_and_entry : backup_entries)
for (size_t i = 0; i != backup_entries.size(); ++i)
{
auto & name = name_and_entry.first;
auto & entry = name_and_entry.second;
auto & entry = backup_entries[i].second;
const auto & file_info = file_infos[i];
{
std::unique_lock lock{mutex};
@ -445,7 +479,7 @@ void BackupsWorker::writeBackupEntries(const OperationID & backup_id, BackupMuta
return;
}
backup->writeFile(name, std::move(entry));
backup->writeFile(file_info, std::move(entry));
// Update metadata
if (!internal)
{
@ -468,7 +502,7 @@ void BackupsWorker::writeBackupEntries(const OperationID & backup_id, BackupMuta
}
};
if (always_single_threaded || !thread_pool.trySchedule([job] { job(true); }))
if (always_single_threaded || !backups_thread_pool.trySchedule([job] { job(true); }))
job(false);
}
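
The setStage()/waitForStage() pair above acts as a rendezvous: every host announces that it has reached BUILDING_FILE_INFOS (and later WRITING_BACKUP) and blocks until all hosts have done the same. A rough single-process model of that barrier, using std::barrier in place of the ZooKeeper-backed stage sync:

#include <barrier>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    constexpr int num_hosts = 3;
    std::barrier stage_barrier(num_hosts, []() noexcept
    {
        std::cout << "all hosts reached stage 'building file infos'\n";
    });

    std::vector<std::jthread> hosts;
    for (int host = 0; host < num_hosts; ++host)
        hosts.emplace_back([&stage_barrier, host]
        {
            /// setStage(): announce that this host has reached the stage.
            /// waitForStage(): block until every host has announced it.
            stage_barrier.arrive_and_wait();
            std::cout << "host " << host << " proceeds to build file infos\n";
        });
}
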

View File

@ -105,8 +105,11 @@ private:
ContextMutablePtr mutable_context,
bool called_async);
/// Builds file infos for specified backup entries.
void buildFileInfosForBackupEntries(const BackupPtr & backup, const BackupEntries & backup_entries, std::shared_ptr<IBackupCoordination> backup_coordination);
/// Write backup entries to an opened backup.
void writeBackupEntries(const OperationID & backup_id, BackupMutablePtr backup, BackupEntries && backup_entries, ThreadPool & thread_pool, bool internal);
void writeBackupEntries(BackupMutablePtr backup, BackupEntries && backup_entries, const OperationID & backup_id, std::shared_ptr<IBackupCoordination> backup_coordination, bool internal);
OperationID startRestoring(const ASTPtr & query, ContextMutablePtr context);

View File

@ -10,8 +10,9 @@
namespace DB
{
class IBackupEntry;
class IDisk;
using BackupEntryPtr = std::shared_ptr<const IBackupEntry>;
struct BackupFileInfo;
class IDisk;
using DiskPtr = std::shared_ptr<IDisk>;
class SeekableReadBuffer;
@ -42,6 +43,9 @@ public:
/// Returns UUID of the backup.
virtual UUID getUUID() const = 0;
/// Returns the base backup (can be null).
virtual std::shared_ptr<const IBackup> getBaseBackup() const = 0;
/// Returns the number of files stored in the backup. Compare with getNumEntries().
virtual size_t getNumFiles() const = 0;
@ -111,7 +115,7 @@ public:
WriteMode write_mode = WriteMode::Rewrite, const WriteSettings & write_settings = {}) const = 0;
/// Puts a new entry to the backup.
virtual void writeFile(const String & file_name, BackupEntryPtr entry) = 0;
virtual void writeFile(const BackupFileInfo & file_info, BackupEntryPtr entry) = 0;
/// Finalizes writing the backup, should be called after all entries have been successfully written.
virtual void finalizeWriting() = 0;

View File

@ -1,14 +1,13 @@
#pragma once
#include <optional>
#include <fmt/format.h>
#include <base/hex.h>
#include <Core/Types.h>
namespace DB
{
class Exception;
struct BackupFileInfo;
using BackupFileInfos = std::vector<BackupFileInfo>;
enum class AccessEntityType;
enum class UserDefinedSQLObjectType;
@ -73,70 +72,14 @@ public:
virtual void addReplicatedSQLObjectsDir(const String & loader_zk_path, UserDefinedSQLObjectType object_type, const String & dir_path) = 0;
virtual Strings getReplicatedSQLObjectsDirs(const String & loader_zk_path, UserDefinedSQLObjectType object_type) const = 0;
struct FileInfo
{
String file_name;
UInt64 size = 0;
UInt128 checksum{0};
/// for incremental backups
UInt64 base_size = 0;
UInt128 base_checksum{0};
/// Name of the data file.
String data_file_name;
/// Suffix of an archive if the backup is stored as a series of archives.
String archive_suffix;
/// Position in the archive.
UInt64 pos_in_archive = static_cast<UInt64>(-1);
/// Note: this format doesn't allow parsing the data back.
/// It is useful only for debugging purposes.
[[ maybe_unused ]] String describe()
{
String result;
result += fmt::format("file_name: {};\n", file_name);
result += fmt::format("size: {};\n", size);
result += fmt::format("checksum: {};\n", getHexUIntLowercase(checksum));
result += fmt::format("base_size: {};\n", base_size);
result += fmt::format("base_checksum: {};\n", getHexUIntLowercase(checksum));
result += fmt::format("data_file_name: {};\n", data_file_name);
result += fmt::format("archive_suffix: {};\n", archive_suffix);
result += fmt::format("pos_in_archive: {};\n", pos_in_archive);
return result;
}
};
/// Adds file information.
/// If the specified checksum+size pair is new for this IBackupCoordination, the function sets `is_data_file_required`.
virtual void addFileInfo(const FileInfo & file_info, bool & is_data_file_required) = 0;
virtual void addFileInfos(BackupFileInfos && file_infos) = 0;
virtual BackupFileInfos getFileInfos() const = 0;
virtual BackupFileInfos getFileInfosForAllHosts() const = 0;
void addFileInfo(const FileInfo & file_info)
{
bool is_data_file_required;
addFileInfo(file_info, is_data_file_required);
}
/// Updates some fields (currently only `archive_suffix`) of a stored file's information.
virtual void updateFileInfo(const FileInfo & file_info) = 0;
virtual std::vector<FileInfo> getAllFileInfos() const = 0;
virtual Strings listFiles(const String & directory, bool recursive) const = 0;
virtual bool hasFiles(const String & directory) const = 0;
using SizeAndChecksum = std::pair<UInt64, UInt128>;
virtual std::optional<FileInfo> getFileInfo(const String & file_name) const = 0;
virtual std::optional<FileInfo> getFileInfo(const SizeAndChecksum & size_and_checksum) const = 0;
/// Generates a new archive suffix, e.g. "001", "002", "003", ...
virtual String getNextArchiveSuffix() = 0;
/// Returns the list of all the archive suffixes which were generated.
virtual Strings getAllArchiveSuffixes() const = 0;
/// Starts writing a specified file; the function returns false if that file is already being written concurrently.
virtual bool startWritingFile(size_t data_file_index) = 0;
/// This function is used to check if concurrent backups are running
/// other than the backup passed to the function

View File

@ -861,7 +861,7 @@ void ClientBase::processOrdinaryQuery(const String & query_to_execute, ASTPtr pa
}
const auto & settings = global_context->getSettingsRef();
const Int32 signals_before_stop = settings.stop_reading_on_first_cancel ? 2 : 1;
const Int32 signals_before_stop = settings.partial_result_on_first_cancel ? 2 : 1;
int retries_left = 10;
while (retries_left)
@ -885,7 +885,7 @@ void ClientBase::processOrdinaryQuery(const String & query_to_execute, ASTPtr pa
if (send_external_tables)
sendExternalTables(parsed_query);
receiveResult(parsed_query, signals_before_stop, settings.stop_reading_on_first_cancel);
receiveResult(parsed_query, signals_before_stop, settings.partial_result_on_first_cancel);
break;
}
@ -910,7 +910,7 @@ void ClientBase::processOrdinaryQuery(const String & query_to_execute, ASTPtr pa
/// Receives and processes packets coming from server.
/// Also checks if query execution should be cancelled.
void ClientBase::receiveResult(ASTPtr parsed_query, Int32 signals_before_stop, bool stop_reading_on_first_cancel)
void ClientBase::receiveResult(ASTPtr parsed_query, Int32 signals_before_stop, bool partial_result_on_first_cancel)
{
// TODO: get the poll_interval from commandline.
const auto receive_timeout = connection_parameters.timeouts.receive_timeout;
@ -934,11 +934,11 @@ void ClientBase::receiveResult(ASTPtr parsed_query, Int32 signals_before_stop, b
/// to avoid losing sync.
if (!cancelled)
{
if (stop_reading_on_first_cancel && QueryInterruptHandler::cancelled_status() == signals_before_stop - 1)
if (partial_result_on_first_cancel && QueryInterruptHandler::cancelled_status() == signals_before_stop - 1)
{
connection->sendCancel();
/// The first cancel-reading request has been sent. Any further request will be a full cancel
stop_reading_on_first_cancel = false;
partial_result_on_first_cancel = false;
}
else if (QueryInterruptHandler::cancelled())
{

View File

@ -131,7 +131,7 @@ protected:
private:
void receiveResult(ASTPtr parsed_query, Int32 signals_before_stop, bool stop_reading_on_first_cancel);
void receiveResult(ASTPtr parsed_query, Int32 signals_before_stop, bool partial_result_on_first_cancel);
bool receiveAndProcessPacket(ASTPtr parsed_query, bool cancelled_);
void receiveLogsAndProfileEvents(ASTPtr parsed_query);
bool receiveSampleBlock(Block & out, ColumnsDescription & columns_description, ASTPtr parsed_query);

View File

@ -107,8 +107,9 @@ Field QueryFuzzer::fuzzField(Field field)
type_index = 1;
}
else if (type == Field::Types::Decimal32
|| type == Field::Types::Decimal64
|| type == Field::Types::Decimal128)
|| type == Field::Types::Decimal64
|| type == Field::Types::Decimal128
|| type == Field::Types::Decimal256)
{
type_index = 2;
}

View File

@ -5,6 +5,7 @@
#include <IO/WriteBufferFromString.h>
#include <IO/copyData.h>
#include <algorithm>
#include <stdexcept>
#include <chrono>
#include <cerrno>

View File

@ -72,6 +72,64 @@
M(GlobalThreadActive, "Number of threads in global thread pool running a task.") \
M(LocalThread, "Number of threads in local thread pools. The threads in local thread pools are taken from the global thread pool.") \
M(LocalThreadActive, "Number of threads in local thread pools running a task.") \
M(MergeTreeDataSelectExecutorThreads, "Number of threads in the MergeTreeDataSelectExecutor thread pool.") \
M(MergeTreeDataSelectExecutorThreadsActive, "Number of threads in the MergeTreeDataSelectExecutor thread pool running a task.") \
M(BackupsThreads, "Number of threads in the thread pool for BACKUP.") \
M(BackupsThreadsActive, "Number of threads in thread pool for BACKUP running a task.") \
M(RestoreThreads, "Number of threads in the thread pool for RESTORE.") \
M(RestoreThreadsActive, "Number of threads in the thread pool for RESTORE running a task.") \
M(IOThreads, "Number of threads in the IO thread pool.") \
M(IOThreadsActive, "Number of threads in the IO thread pool running a task.") \
M(ThreadPoolRemoteFSReaderThreads, "Number of threads in the thread pool for remote_filesystem_read_method=threadpool.") \
M(ThreadPoolRemoteFSReaderThreadsActive, "Number of threads in the thread pool for remote_filesystem_read_method=threadpool running a task.") \
M(ThreadPoolFSReaderThreads, "Number of threads in the thread pool for local_filesystem_read_method=threadpool.") \
M(ThreadPoolFSReaderThreadsActive, "Number of threads in the thread pool for local_filesystem_read_method=threadpool running a task.") \
M(BackupsIOThreads, "Number of threads in the BackupsIO thread pool.") \
M(BackupsIOThreadsActive, "Number of threads in the BackupsIO thread pool running a task.") \
M(DiskObjectStorageAsyncThreads, "Number of threads in the async thread pool for DiskObjectStorage.") \
M(DiskObjectStorageAsyncThreadsActive, "Number of threads in the async thread pool for DiskObjectStorage running a task.") \
M(StorageHiveThreads, "Number of threads in the StorageHive thread pool.") \
M(StorageHiveThreadsActive, "Number of threads in the StorageHive thread pool running a task.") \
M(TablesLoaderThreads, "Number of threads in the tables loader thread pool.") \
M(TablesLoaderThreadsActive, "Number of threads in the tables loader thread pool running a task.") \
M(DatabaseOrdinaryThreads, "Number of threads in the Ordinary database thread pool.") \
M(DatabaseOrdinaryThreadsActive, "Number of threads in the Ordinary database thread pool running a task.") \
M(DatabaseOnDiskThreads, "Number of threads in the DatabaseOnDisk thread pool.") \
M(DatabaseOnDiskThreadsActive, "Number of threads in the DatabaseOnDisk thread pool running a task.") \
M(DatabaseCatalogThreads, "Number of threads in the DatabaseCatalog thread pool.") \
M(DatabaseCatalogThreadsActive, "Number of threads in the DatabaseCatalog thread pool running a task.") \
M(DestroyAggregatesThreads, "Number of threads in the thread pool for destroy aggregate states.") \
M(DestroyAggregatesThreadsActive, "Number of threads in the thread pool for destroy aggregate states running a task.") \
M(HashedDictionaryThreads, "Number of threads in the HashedDictionary thread pool.") \
M(HashedDictionaryThreadsActive, "Number of threads in the HashedDictionary thread pool running a task.") \
M(CacheDictionaryThreads, "Number of threads in the CacheDictionary thread pool.") \
M(CacheDictionaryThreadsActive, "Number of threads in the CacheDictionary thread pool running a task.") \
M(ParallelFormattingOutputFormatThreads, "Number of threads in the ParallelFormattingOutputFormatThreads thread pool.") \
M(ParallelFormattingOutputFormatThreadsActive, "Number of threads in the ParallelFormattingOutputFormatThreads thread pool running a task.") \
M(ParallelParsingInputFormatThreads, "Number of threads in the ParallelParsingInputFormat thread pool.") \
M(ParallelParsingInputFormatThreadsActive, "Number of threads in the ParallelParsingInputFormat thread pool running a task.") \
M(MergeTreeBackgroundExecutorThreads, "Number of threads in the MergeTreeBackgroundExecutor thread pool.") \
M(MergeTreeBackgroundExecutorThreadsActive, "Number of threads in the MergeTreeBackgroundExecutor thread pool running a task.") \
M(AsynchronousInsertThreads, "Number of threads in the AsynchronousInsert thread pool.") \
M(AsynchronousInsertThreadsActive, "Number of threads in the AsynchronousInsert thread pool running a task.") \
M(StartupSystemTablesThreads, "Number of threads in the StartupSystemTables thread pool.") \
M(StartupSystemTablesThreadsActive, "Number of threads in the StartupSystemTables thread pool running a task.") \
M(AggregatorThreads, "Number of threads in the Aggregator thread pool.") \
M(AggregatorThreadsActive, "Number of threads in the Aggregator thread pool running a task.") \
M(DDLWorkerThreads, "Number of threads in the DDLWorker thread pool for ON CLUSTER queries.") \
    M(DDLWorkerThreadsActive, "Number of threads in the DDLWorker thread pool for ON CLUSTER queries running a task.") \
M(StorageDistributedThreads, "Number of threads in the StorageDistributed thread pool.") \
M(StorageDistributedThreadsActive, "Number of threads in the StorageDistributed thread pool running a task.") \
M(StorageS3Threads, "Number of threads in the StorageS3 thread pool.") \
M(StorageS3ThreadsActive, "Number of threads in the StorageS3 thread pool running a task.") \
M(MergeTreePartsLoaderThreads, "Number of threads in the MergeTree parts loader thread pool.") \
M(MergeTreePartsLoaderThreadsActive, "Number of threads in the MergeTree parts loader thread pool running a task.") \
M(MergeTreePartsCleanerThreads, "Number of threads in the MergeTree parts cleaner thread pool.") \
M(MergeTreePartsCleanerThreadsActive, "Number of threads in the MergeTree parts cleaner thread pool running a task.") \
M(SystemReplicasThreads, "Number of threads in the system.replicas thread pool.") \
M(SystemReplicasThreadsActive, "Number of threads in the system.replicas thread pool running a task.") \
M(RestartReplicaThreads, "Number of threads in the RESTART REPLICA thread pool.") \
M(RestartReplicaThreadsActive, "Number of threads in the RESTART REPLICA thread pool running a task.") \
M(DistributedFilesToInsert, "Number of pending files to process for asynchronous insertion into Distributed tables. Number of files for every shard is summed.") \
M(BrokenDistributedFilesToInsert, "Number of files for asynchronous insertion into Distributed tables that have been marked as broken. This metric starts from 0 on server start. Number of files for every shard is summed.") \
M(TablesToDropQueueSize, "Number of dropped tables that are waiting for background data removal.") \
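Each M(...) entry above both registers a metric ID and documents it; the pools keep the gauges in sync with RAII increments. A self-contained analogue of that mechanism (simplified; the real type is CurrentMetrics::Increment from src/Common/CurrentMetrics.h):

#include <atomic>
#include <cassert>
#include <cstdint>
#include <iostream>

std::atomic<int64_t> backups_threads{0};         // stands in for M(BackupsThreads, ...)
std::atomic<int64_t> backups_threads_active{0};  // stands in for M(BackupsThreadsActive, ...)

struct Increment
{
    explicit Increment(std::atomic<int64_t> & gauge_) : gauge(gauge_) { ++gauge; }
    ~Increment() { --gauge; }
    std::atomic<int64_t> & gauge;
};

int main()
{
    Increment thread_gauge(backups_threads);            // +1 while the "thread" exists
    {
        Increment active_gauge(backups_threads_active); // +1 only while running a task
        std::cout << "active: " << backups_threads_active << '\n'; // prints 1
    }
    std::cout << "active: " << backups_threads_active << '\n';     // prints 0
    assert(backups_threads == 1);
}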

View File

@ -4,6 +4,7 @@
#include <cstdint>
#include <utility>
#include <atomic>
#include <cassert>
#include <base/types.h>
/** Allows counting the number of simultaneously happening processes or the current value of some metric.
@ -73,7 +74,10 @@ namespace CurrentMetrics
public:
explicit Increment(Metric metric, Value amount_ = 1)
: Increment(&values[metric], amount_) {}
: Increment(&values[metric], amount_)
{
assert(metric < CurrentMetrics::end());
}
~Increment()
{

View File

@ -25,6 +25,13 @@ void CurrentThread::updatePerformanceCounters()
current_thread->updatePerformanceCounters();
}
void CurrentThread::updatePerformanceCountersIfNeeded()
{
if (unlikely(!current_thread))
return;
current_thread->updatePerformanceCountersIfNeeded();
}
bool CurrentThread::isInitialized()
{
return current_thread;

View File

@ -53,6 +53,7 @@ public:
/// Makes system calls to update ProfileEvents that contain info from rusage and taskstats
static void updatePerformanceCounters();
static void updatePerformanceCountersIfNeeded();
static ProfileEvents::Counters & getProfileEvents();
inline ALWAYS_INLINE static MemoryTracker * getMemoryTracker()

View File

@ -649,6 +649,7 @@
M(678, IO_URING_INIT_FAILED) \
M(679, IO_URING_SUBMIT_ERROR) \
M(690, MIXED_ACCESS_PARAMETER_TYPES) \
M(691, UNKNOWN_ELEMENT_OF_ENUM) \
\
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \

View File

@ -68,7 +68,8 @@ bool Span::addAttribute(const Exception & e) noexcept
if (!this->isTraceEnabled())
return false;
return addAttributeImpl("clickhouse.exception", getExceptionMessage(e, false));
return addAttributeImpl("clickhouse.exception", getExceptionMessage(e, false))
&& addAttributeImpl("clickhouse.exception_code", toString(e.code()));
}
bool Span::addAttribute(std::exception_ptr e) noexcept
@ -79,6 +80,15 @@ bool Span::addAttribute(std::exception_ptr e) noexcept
return addAttributeImpl("clickhouse.exception", getExceptionMessage(e, false));
}
bool Span::addAttribute(const ExecutionStatus & e) noexcept
{
if (!this->isTraceEnabled())
return false;
return addAttributeImpl("clickhouse.exception", e.message)
&& addAttributeImpl("clickhouse.exception_code", toString(e.code));
}
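The new overload mirrors addAttribute(const Exception &) but takes an ExecutionStatus, so both the message and the numeric code land on the span. A self-contained analogue with simplified stand-in types:

#include <iostream>
#include <map>
#include <string>

struct ExecutionStatus { int code = 0; std::string message; };

struct Span
{
    std::map<std::string, std::string> attributes;
    bool trace_enabled = true;

    // Mirrors the behavior added above: record message and code, never throw.
    bool addAttribute(const ExecutionStatus & e) noexcept
    {
        if (!trace_enabled)
            return false;
        attributes["clickhouse.exception"] = e.message;
        attributes["clickhouse.exception_code"] = std::to_string(e.code);
        return true;
    }
};

int main()
{
    Span span;
    span.addAttribute(ExecutionStatus{60, "Table default.t does not exist"});
    for (const auto & [k, v] : span.attributes)
        std::cout << k << " = " << v << '\n';
}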
bool Span::addAttributeImpl(std::string_view name, std::string_view value) noexcept
{
try

View File

@ -9,6 +9,7 @@ struct Settings;
class OpenTelemetrySpanLog;
class WriteBuffer;
class ReadBuffer;
struct ExecutionStatus;
namespace OpenTelemetry
{
@ -57,6 +58,7 @@ struct Span
bool addAttribute(std::string_view name, std::function<String()> value_supplier) noexcept;
bool addAttribute(const Exception & e) noexcept;
bool addAttribute(std::exception_ptr e) noexcept;
bool addAttribute(const ExecutionStatus & e) noexcept;
bool isTraceEnabled() const
{

View File

@ -11,6 +11,7 @@
#include <Poco/Util/Application.h>
#include <Poco/Util/LayeredConfiguration.h>
#include <base/demangle.h>
namespace DB
{
@ -25,27 +26,37 @@ namespace CurrentMetrics
{
extern const Metric GlobalThread;
extern const Metric GlobalThreadActive;
extern const Metric LocalThread;
extern const Metric LocalThreadActive;
}
static constexpr auto DEFAULT_THREAD_NAME = "ThreadPool";
template <typename Thread>
ThreadPoolImpl<Thread>::ThreadPoolImpl()
: ThreadPoolImpl(getNumberOfPhysicalCPUCores())
ThreadPoolImpl<Thread>::ThreadPoolImpl(Metric metric_threads_, Metric metric_active_threads_)
: ThreadPoolImpl(metric_threads_, metric_active_threads_, getNumberOfPhysicalCPUCores())
{
}
template <typename Thread>
ThreadPoolImpl<Thread>::ThreadPoolImpl(size_t max_threads_)
: ThreadPoolImpl(max_threads_, max_threads_, max_threads_)
ThreadPoolImpl<Thread>::ThreadPoolImpl(
Metric metric_threads_,
Metric metric_active_threads_,
size_t max_threads_)
: ThreadPoolImpl(metric_threads_, metric_active_threads_, max_threads_, max_threads_, max_threads_)
{
}
template <typename Thread>
ThreadPoolImpl<Thread>::ThreadPoolImpl(size_t max_threads_, size_t max_free_threads_, size_t queue_size_, bool shutdown_on_exception_)
: max_threads(max_threads_)
ThreadPoolImpl<Thread>::ThreadPoolImpl(
Metric metric_threads_,
Metric metric_active_threads_,
size_t max_threads_,
size_t max_free_threads_,
size_t queue_size_,
bool shutdown_on_exception_)
: metric_threads(metric_threads_)
, metric_active_threads(metric_active_threads_)
, max_threads(max_threads_)
, max_free_threads(std::min(max_free_threads_, max_threads))
, queue_size(queue_size_ ? std::max(queue_size_, max_threads) : 0 /* zero means the queue is unlimited */)
, shutdown_on_exception(shutdown_on_exception_)
@ -322,8 +333,7 @@ template <typename Thread>
void ThreadPoolImpl<Thread>::worker(typename std::list<Thread>::iterator thread_it)
{
DENY_ALLOCATIONS_IN_SCOPE;
CurrentMetrics::Increment metric_all_threads(
std::is_same_v<Thread, std::thread> ? CurrentMetrics::GlobalThread : CurrentMetrics::LocalThread);
CurrentMetrics::Increment metric_pool_threads(metric_threads);
/// Remove this thread from `threads` and detach it; that must be done before exiting from this worker.
/// We can't wrap the following lambda function into `SCOPE_EXIT` because it requires `mutex` to be locked.
@ -342,7 +352,7 @@ void ThreadPoolImpl<Thread>::worker(typename std::list<Thread>::iterator thread_
while (true)
{
/// This is inside the loop to also reset previous thread names set inside the jobs.
setThreadName("ThreadPool");
setThreadName(DEFAULT_THREAD_NAME);
/// A copy of parent trace context
DB::OpenTelemetry::TracingContextOnThread parent_thead_trace_context;
@ -381,18 +391,24 @@ void ThreadPoolImpl<Thread>::worker(typename std::list<Thread>::iterator thread_
try
{
CurrentMetrics::Increment metric_active_threads(
std::is_same_v<Thread, std::thread> ? CurrentMetrics::GlobalThreadActive : CurrentMetrics::LocalThreadActive);
CurrentMetrics::Increment metric_active_pool_threads(metric_active_threads);
job();
if (thread_trace_context.root_span.isTraceEnabled())
{
/// Use the thread name as operation name so that the tracing log will be more clear.
/// The thread name is usually set in the jobs, we can only get the name after the job finishes
/// The thread name is usually set in jobs, we can only get the name after the job finishes
std::string thread_name = getThreadName();
if (!thread_name.empty())
if (!thread_name.empty() && thread_name != DEFAULT_THREAD_NAME)
{
thread_trace_context.root_span.operation_name = thread_name;
}
else
{
/// If the thread name is not set, use the type name of the job instead
thread_trace_context.root_span.operation_name = demangle(job.target_type().name());
}
}
/// job should be reset before decrementing scheduled_jobs to
@ -449,6 +465,22 @@ template class ThreadFromGlobalPoolImpl<true>;
std::unique_ptr<GlobalThreadPool> GlobalThreadPool::the_instance;
GlobalThreadPool::GlobalThreadPool(
size_t max_threads_,
size_t max_free_threads_,
size_t queue_size_,
const bool shutdown_on_exception_)
: FreeThreadPool(
CurrentMetrics::GlobalThread,
CurrentMetrics::GlobalThreadActive,
max_threads_,
max_free_threads_,
queue_size_,
shutdown_on_exception_)
{
}
void GlobalThreadPool::initialize(size_t max_threads, size_t max_free_threads, size_t queue_size)
{
if (the_instance)

View File

@ -16,6 +16,7 @@
#include <Poco/Event.h>
#include <Common/ThreadStatus.h>
#include <Common/OpenTelemetryTraceContext.h>
#include <Common/CurrentMetrics.h>
#include <base/scope_guard.h>
/** Very simple thread pool similar to boost::threadpool.
@ -33,15 +34,25 @@ class ThreadPoolImpl
{
public:
using Job = std::function<void()>;
using Metric = CurrentMetrics::Metric;
/// Maximum number of threads is based on the number of physical cores.
ThreadPoolImpl();
ThreadPoolImpl(Metric metric_threads_, Metric metric_active_threads_);
/// Size is constant. Up to num_threads are created on demand and then run until shutdown.
explicit ThreadPoolImpl(size_t max_threads_);
explicit ThreadPoolImpl(
Metric metric_threads_,
Metric metric_active_threads_,
size_t max_threads_);
/// queue_size - maximum number of running plus scheduled jobs. It can be greater than max_threads. Zero means unlimited.
ThreadPoolImpl(size_t max_threads_, size_t max_free_threads_, size_t queue_size_, bool shutdown_on_exception_ = true);
ThreadPoolImpl(
Metric metric_threads_,
Metric metric_active_threads_,
size_t max_threads_,
size_t max_free_threads_,
size_t queue_size_,
bool shutdown_on_exception_ = true);
/// Add new job. Blocks until the number of scheduled jobs is less than the maximum, or until an exception is thrown in one of the threads.
/// If any thread has thrown an exception, the first exception will be rethrown from this method,
@ -96,6 +107,9 @@ private:
std::condition_variable job_finished;
std::condition_variable new_job_or_shutdown;
Metric metric_threads;
Metric metric_active_threads;
size_t max_threads;
size_t max_free_threads;
size_t queue_size;
@ -159,12 +173,11 @@ class GlobalThreadPool : public FreeThreadPool, private boost::noncopyable
{
static std::unique_ptr<GlobalThreadPool> the_instance;
GlobalThreadPool(size_t max_threads_, size_t max_free_threads_,
size_t queue_size_, const bool shutdown_on_exception_)
: FreeThreadPool(max_threads_, max_free_threads_, queue_size_,
shutdown_on_exception_)
{
}
GlobalThreadPool(
size_t max_threads_,
size_t max_free_threads_,
size_t queue_size_,
bool shutdown_on_exception_);
public:
static void initialize(size_t max_threads = 10000, size_t max_free_threads = 1000, size_t queue_size = 10000);
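With these signatures, every pool must name the metric pair it reports under before the size parameters. A construction sketch in the same style as the standalone tests at the end of this diff (compiles against the ClickHouse tree; the metric names reuse declarations from CurrentMetrics.cpp above, the rest is illustrative):

#include <Common/ThreadPool.h>
#include <Common/CurrentMetrics.h>

namespace CurrentMetrics
{
    extern const Metric IOThreads;
    extern const Metric IOThreadsActive;
}

void example()
{
    // Metric pair first, then max_threads / max_free_threads / queue_size.
    // A queue_size of zero would mean an unbounded queue, per the comment above.
    ThreadPool pool(CurrentMetrics::IOThreads, CurrentMetrics::IOThreadsActive,
                    /* max_threads */ 16, /* max_free_threads */ 4, /* queue_size */ 64);
    pool.scheduleOrThrowOnError([] { /* do work */ });
    pool.wait();
}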

View File

@ -217,6 +217,20 @@ void ThreadStatus::updatePerformanceCounters()
}
}
void ThreadStatus::updatePerformanceCountersIfNeeded()
{
if (last_rusage->thread_id == 0)
return; // Performance counters are not initialized, so there is no need to update them
constexpr UInt64 performance_counters_update_period_microseconds = 10 * 1000; // 10 milliseconds
UInt64 total_elapsed_microseconds = stopwatch.elapsedMicroseconds();
if (last_performance_counters_update_time + performance_counters_update_period_microseconds < total_elapsed_microseconds)
{
updatePerformanceCounters();
last_performance_counters_update_time = total_elapsed_microseconds;
}
}
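The method above is a generic throttle: read a cheap coarse clock, and only pay for the expensive rusage/taskstats syscalls when the 10 ms period has elapsed. A standalone sketch of the same pattern:

#include <chrono>
#include <cstdint>
#include <iostream>

int main()
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    constexpr uint64_t period_us = 10 * 1000; // 10 milliseconds, as above
    uint64_t last_update_us = 0;

    for (int i = 0; i < 1'000'000; ++i)
    {
        const uint64_t elapsed_us =
            std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - start).count();
        if (last_update_us + period_us < elapsed_us)
        {
            // Expensive part goes here (in ClickHouse: updatePerformanceCounters()).
            last_update_us = elapsed_us;
        }
    }
    std::cout << "done\n";
}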
void ThreadStatus::onFatalError()
{
if (fatal_error_callback)

View File

@ -5,6 +5,7 @@
#include <IO/Progress.h>
#include <Common/MemoryTracker.h>
#include <Common/ProfileEvents.h>
#include <Common/Stopwatch.h>
#include <base/StringRef.h>
#include <boost/noncopyable.hpp>
@ -202,6 +203,8 @@ private:
/// Use ptr not to add extra dependencies in the header
std::unique_ptr<RUsageCounters> last_rusage;
std::unique_ptr<TasksStatsCounters> taskstats;
Stopwatch stopwatch{CLOCK_MONOTONIC_COARSE};
UInt64 last_performance_counters_update_time = 0;
/// See setInternalThread()
bool internal_thread = false;
@ -265,6 +268,7 @@ public:
/// Update several ProfileEvents counters
void updatePerformanceCounters();
void updatePerformanceCountersIfNeeded();
/// Update ProfileEvents and dumps info to system.query_thread_log
void finalizePerformanceCounters();

View File

@ -17,6 +17,7 @@
#include <Common/Stopwatch.h>
#include <Common/ThreadPool.h>
#include <Common/CurrentMetrics.h>
using Key = UInt64;
@ -28,6 +29,12 @@ using Map = HashMap<Key, Value>;
using MapTwoLevel = TwoLevelHashMap<Key, Value>;
namespace CurrentMetrics
{
extern const Metric LocalThread;
extern const Metric LocalThreadActive;
}
struct SmallLock
{
std::atomic<int> locked {false};
@ -247,7 +254,7 @@ int main(int argc, char ** argv)
std::cerr << std::fixed << std::setprecision(2);
ThreadPool pool(num_threads);
ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_threads);
Source data(n);

View File

@ -17,6 +17,7 @@
#include <Common/Stopwatch.h>
#include <Common/ThreadPool.h>
#include <Common/CurrentMetrics.h>
using Key = UInt64;
@ -24,6 +25,12 @@ using Value = UInt64;
using Source = std::vector<Key>;
namespace CurrentMetrics
{
extern const Metric LocalThread;
extern const Metric LocalThreadActive;
}
template <typename Map>
struct AggregateIndependent
{
@ -274,7 +281,7 @@ int main(int argc, char ** argv)
std::cerr << std::fixed << std::setprecision(2);
ThreadPool pool(num_threads);
ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_threads);
Source data(n);

View File

@ -6,6 +6,7 @@
#include <Common/Stopwatch.h>
#include <Common/Exception.h>
#include <Common/ThreadPool.h>
#include <Common/CurrentMetrics.h>
int value = 0;
@ -14,6 +15,12 @@ static void f() { ++value; }
static void * g(void *) { f(); return {}; }
namespace CurrentMetrics
{
extern const Metric LocalThread;
extern const Metric LocalThreadActive;
}
namespace DB
{
namespace ErrorCodes
@ -65,7 +72,7 @@ int main(int argc, char ** argv)
test(n, "Create and destroy ThreadPool each iteration", []
{
ThreadPool tp(1);
ThreadPool tp(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 1);
tp.scheduleOrThrowOnError(f);
tp.wait();
});
@ -86,7 +93,7 @@ int main(int argc, char ** argv)
});
{
ThreadPool tp(1);
ThreadPool tp(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 1);
test(n, "Schedule job for Threadpool each iteration", [&tp]
{
@ -96,7 +103,7 @@ int main(int argc, char ** argv)
}
{
ThreadPool tp(128);
ThreadPool tp(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 128);
test(n, "Schedule job for Threadpool with 128 threads each iteration", [&tp]
{

Some files were not shown because too many files have changed in this diff.