Merge branch 'master' into decimal-asan

commit a7d2fd2cc5
Alexey Milovidov, 2022-10-21 01:28:34 +02:00, committed by GitHub
120 changed files with 3052 additions and 461 deletions


@ -1,6 +1,6 @@
### Table of Contents
**[ClickHouse release v22.9, 2022-09-22](#229)**<br/>
-**[ClickHouse release v22.8, 2022-08-18](#228)**<br/>
+**[ClickHouse release v22.8-lts, 2022-08-18](#228)**<br/>
**[ClickHouse release v22.7, 2022-07-21](#227)**<br/>
**[ClickHouse release v22.6, 2022-06-16](#226)**<br/>
**[ClickHouse release v22.5, 2022-05-19](#225)**<br/>
@ -10,10 +10,10 @@
**[ClickHouse release v22.1, 2022-01-18](#221)**<br/>
**[Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021/)**<br/>
### <a id="229"></a> ClickHouse release 22.9, 2022-09-22
#### Backward Incompatible Change
* Upgrade from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any `ReplicatedMergeTree` tables, otherwise server with the new version will not start. [#40641](https://github.com/ClickHouse/ClickHouse/pull/40641) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Remove the functions `accurate_Cast` and `accurate_CastOrNull` (they differ from `accurateCast` and `accurateCastOrNull` only by the underscore in the name, and they are not affected by the value of the `cast_keep_nullable` setting). These functions were undocumented, untested, unused, and unneeded. They appeared to be alive due to code generalization. [#40682](https://github.com/ClickHouse/ClickHouse/pull/40682) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test to ensure that every new table function will be documented. See [#40649](https://github.com/ClickHouse/ClickHouse/issues/40649). Rename table function `MeiliSearch` to `meilisearch`. [#40709](https://github.com/ClickHouse/ClickHouse/pull/40709) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@ -21,6 +21,7 @@
* Make the interpretation of YAML configs more conventional. [#41044](https://github.com/ClickHouse/ClickHouse/pull/41044) ([Vitaly Baranov](https://github.com/vitlibar)).
#### New Feature
* Support `insert_quorum = 'auto'` to use a majority number of replicas. [#39970](https://github.com/ClickHouse/ClickHouse/pull/39970) ([Sachin](https://github.com/SachinSetiya)).
* Add embedded dashboards to ClickHouse server. This is a demo project about how to achieve 90% results with 1% effort using ClickHouse features. [#40461](https://github.com/ClickHouse/ClickHouse/pull/40461) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added new settings constraint writability kind `changeable_in_readonly`. [#40631](https://github.com/ClickHouse/ClickHouse/pull/40631) ([Sergei Trifonov](https://github.com/serxa)).
@ -38,6 +39,7 @@
* Improvement for in-memory data parts: remove completely processed WAL files. [#40592](https://github.com/ClickHouse/ClickHouse/pull/40592) ([Azat Khuzhin](https://github.com/azat)).
#### Performance Improvement
* Implement compression of marks and primary key. Close [#34437](https://github.com/ClickHouse/ClickHouse/issues/34437). [#37693](https://github.com/ClickHouse/ClickHouse/pull/37693) ([zhongyuankai](https://github.com/zhongyuankai)).
* Allow to load marks with threadpool in advance. Regulated by setting `load_marks_asynchronously` (default: 0). [#40821](https://github.com/ClickHouse/ClickHouse/pull/40821) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Virtual filesystem over s3 will use random object names split into multiple path prefixes for better performance on AWS. [#40968](https://github.com/ClickHouse/ClickHouse/pull/40968) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@ -58,6 +60,7 @@
* Parallel hash JOIN for Float data types might be suboptimal. Make it better. [#41183](https://github.com/ClickHouse/ClickHouse/pull/41183) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Improvement
* During startup and ATTACH call, `ReplicatedMergeTree` tables will be readonly until the ZooKeeper connection is made and the setup is finished. [#40148](https://github.com/ClickHouse/ClickHouse/pull/40148) ([Antonio Andelic](https://github.com/antonio2368)).
* Add `enable_extended_results_for_datetime_functions` option to return results of type `Date32` for the functions `toStartOfYear`, `toStartOfISOYear`, `toStartOfQuarter`, `toStartOfMonth`, `toStartOfWeek`, `toMonday` and `toLastDayOfMonth` when the argument is `Date32` or `DateTime64`; otherwise, results of `Date` type are returned. For compatibility reasons, the default value is 0. [#41214](https://github.com/ClickHouse/ClickHouse/pull/41214) ([Roman Vasin](https://github.com/rvasin)).
* For security and stability reasons, CatBoost models are no longer evaluated within the ClickHouse server. Instead, the evaluation is now done in the clickhouse-library-bridge, a separate process that loads the catboost library and communicates with the server process via HTTP. [#40897](https://github.com/ClickHouse/ClickHouse/pull/40897) ([Robert Schulze](https://github.com/rschu1ze)). [#39629](https://github.com/ClickHouse/ClickHouse/pull/39629) ([Robert Schulze](https://github.com/rschu1ze)).
@ -108,6 +111,7 @@
* Add `has_lightweight_delete` to system.parts. [#41564](https://github.com/ClickHouse/ClickHouse/pull/41564) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### Build/Testing/Packaging Improvement
* Enforce documentation for every setting. [#40644](https://github.com/ClickHouse/ClickHouse/pull/40644) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enforce documentation for every current metric. [#40645](https://github.com/ClickHouse/ClickHouse/pull/40645) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enforce documentation for every profile event counter. Write the documentation where it was missing. [#40646](https://github.com/ClickHouse/ClickHouse/pull/40646) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@ -217,15 +221,16 @@
* Fix read bytes/rows in X-ClickHouse-Summary with materialized views. [#41586](https://github.com/ClickHouse/ClickHouse/pull/41586) ([Raúl Marín](https://github.com/Algunenano)).
* Fix possible `pipeline stuck` exception for queries with `OFFSET`. The error was found with `enable_optimize_predicate_expression = 0` and always false condition in `WHERE`. Fixes [#41383](https://github.com/ClickHouse/ClickHouse/issues/41383). [#41588](https://github.com/ClickHouse/ClickHouse/pull/41588) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
### <a id="228"></a> ClickHouse release 22.8, 2022-08-18
### <a id="228"></a> ClickHouse release 22.8-lts, 2022-08-18
#### Backward Incompatible Change
* Extended range of `Date32` and `DateTime64` to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601):2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of `1899-01-01` was clamped to `1925-01-01`, in the new version it will be clamped to `1900-01-01`. It changes the behavior of rounding with `toStartOfInterval` if you pass `INTERVAL 3 QUARTER` up to one quarter because the intervals are counted from an implementation-specific point of time. Closes [#28216](https://github.com/ClickHouse/ClickHouse/issues/28216), improves [#38393](https://github.com/ClickHouse/ClickHouse/issues/38393). [#39425](https://github.com/ClickHouse/ClickHouse/pull/39425) ([Roman Vasin](https://github.com/rvasin)).
* Now, all relevant dictionary sources respect `remote_url_allow_hosts` setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. [#39184](https://github.com/ClickHouse/ClickHouse/pull/39184) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Make the remote filesystem cache composable, allow certain files (e.g. idx, mrk, ..) to be excluded from eviction, delete the old cache version. It is now possible to configure the cache over an Azure Blob Storage disk, a Local disk, a StaticWeb disk, etc. This PR is marked backward incompatible because the cache configuration changes, and the config file must be updated for the cache to work. The old cache will still be used with the new configuration, and the server will start up fine with the old cache configuration. Closes https://github.com/ClickHouse/ClickHouse/issues/36140. Closes https://github.com/ClickHouse/ClickHouse/issues/37889. ([Kseniia Sumarokova](https://github.com/kssenii)). [#36171](https://github.com/ClickHouse/ClickHouse/pull/36171).
#### New Feature
* Query parameters can be set in interactive mode as `SET param_abc = 'def'` and transferred via the native protocol as settings. [#39906](https://github.com/ClickHouse/ClickHouse/pull/39906) ([Nikita Taranov](https://github.com/nickitat)).
* Quota key can be set in the native protocol ([Yakov Olkhovsky](https://github.com/ClickHouse/ClickHouse/pull/39874)).
* Added a setting `exact_rows_before_limit` (0/1). When enabled, ClickHouse will provide exact value for `rows_before_limit_at_least` statistic, but with the cost that the data before limit will have to be read completely. This closes [#6613](https://github.com/ClickHouse/ClickHouse/issues/6613). [#25333](https://github.com/ClickHouse/ClickHouse/pull/25333) ([kevin wan](https://github.com/MaxWk)).
@ -240,12 +245,14 @@
* Add a new setting `schema_inference_hints` that allows specifying structure hints in schema inference for specific columns. Closes [#39569](https://github.com/ClickHouse/ClickHouse/issues/39569). [#40068](https://github.com/ClickHouse/ClickHouse/pull/40068) ([Kruglov Pavel](https://github.com/Avogar)).
#### Experimental Feature
* Support SQL standard DELETE FROM syntax on merge tree tables and lightweight delete implementation for merge tree families. [#37893](https://github.com/ClickHouse/ClickHouse/pull/37893) ([Jianmei Zhang](https://github.com/zhangjmruc)) ([Alexander Gololobov](https://github.com/davenger)). Note: this new feature does not make ClickHouse an HTAP DBMS.
#### Performance Improvement
* Improved memory usage during memory efficient merging of aggregation results. [#39429](https://github.com/ClickHouse/ClickHouse/pull/39429) ([Nikita Taranov](https://github.com/nickitat)).
* Added concurrency control logic to limit the total number of concurrent threads created by queries. [#37558](https://github.com/ClickHouse/ClickHouse/pull/37558) ([Sergei Trifonov](https://github.com/serxa)). Add the `concurrent_threads_soft_limit` parameter to increase performance in case of high QPS by limiting the total number of threads for all queries. [#37285](https://github.com/ClickHouse/ClickHouse/pull/37285) ([Roman Vasin](https://github.com/rvasin)).
* Add `SLRU` cache policy for uncompressed cache and marks cache. ([Kseniia Sumarokova](https://github.com/kssenii)). [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)). Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)).
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#36654](https://github.com/ClickHouse/ClickHouse/pull/36654) ([jasperzhu](https://github.com/jinjunzh)). [#39494](https://github.com/ClickHouse/ClickHouse/pull/39494) ([Robert Schulze](https://github.com/rschu1ze)).
* `DISTINCT` in order with `ORDER BY`: Deduce the way to sort based on the input stream sort description. Skip sorting if the input stream is already sorted. [#38719](https://github.com/ClickHouse/ClickHouse/pull/38719) ([Igor Nikonov](https://github.com/devcrafter)). Significantly improve memory usage and query execution time: use `DistinctSortedChunkTransform` for the final distinct when `DISTINCT` columns match `ORDER BY` columns (renamed to `DistinctSortedStreamTransform` in `EXPLAIN PIPELINE`), and remove unnecessary allocations in the hot loop of `DistinctSortedChunkTransform`. [#39432](https://github.com/ClickHouse/ClickHouse/pull/39432) ([Igor Nikonov](https://github.com/devcrafter)). Use `DistinctSortedTransform` only when the sort description is applicable to the `DISTINCT` columns, otherwise fall back to the ordinary `DISTINCT` implementation; this allows fewer checks during `DistinctSortedTransform` execution. [#39528](https://github.com/ClickHouse/ClickHouse/pull/39528) ([Igor Nikonov](https://github.com/devcrafter)). Fix: `DistinctSortedTransform` didn't take advantage of sorting. It never cleared its HashSet, since clearing_columns were detected incorrectly (always empty), so it basically worked as an ordinary `DISTINCT` (`DistinctTransform`). The fix reduces memory usage significantly. [#39538](https://github.com/ClickHouse/ClickHouse/pull/39538) ([Igor Nikonov](https://github.com/devcrafter)).
* Use local node as first priority to get structure of remote table when executing `cluster` and similar table functions. [#39440](https://github.com/ClickHouse/ClickHouse/pull/39440) ([Mingliang Pan](https://github.com/liangliangpan)).
@ -256,6 +263,7 @@
* Improve bytes to bits mask transform for SSE/AVX/AVX512. [#39586](https://github.com/ClickHouse/ClickHouse/pull/39586) ([Guo Wangyang](https://github.com/guowangy)).
#### Improvement
* Normalize `AggregateFunction` types and state representations, because optimizations like [#35788](https://github.com/ClickHouse/ClickHouse/pull/35788) will treat `count(not null columns)` as `count()`, which might confuse distributed interpreters with the following error: `Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported`. [#39420](https://github.com/ClickHouse/ClickHouse/pull/39420) ([Amos Bird](https://github.com/amosbird)). Functions with identical states can be used in materialized views interchangeably.
* Rework and simplify the `system.backups` table, remove the `internal` column, allow user to set the ID of operation, add columns `num_files`, `uncompressed_size`, `compressed_size`, `start_time`, `end_time`. [#39503](https://github.com/ClickHouse/ClickHouse/pull/39503) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improved structure of the DDL query result table for `Replicated` database (separate columns with shard and replica name, clearer status). `CREATE TABLE ... ON CLUSTER` queries can be normalized on the initiator first if `distributed_ddl_entry_format_version` is set to 3 (default value). It means that `ON CLUSTER` queries may not work if the initiator does not belong to the cluster specified in the query. Fixes [#37318](https://github.com/ClickHouse/ClickHouse/issues/37318), [#39500](https://github.com/ClickHouse/ClickHouse/issues/39500). Ignore the `ON CLUSTER` clause if the database is `Replicated` and the cluster name equals the database name. Related to [#35570](https://github.com/ClickHouse/ClickHouse/issues/35570). Miscellaneous minor fixes for the `Replicated` database engine. Check metadata consistency when starting up a `Replicated` database, and start replica recovery in case of a mismatch between local metadata and metadata in Keeper. Resolves [#24880](https://github.com/ClickHouse/ClickHouse/issues/24880). [#37198](https://github.com/ClickHouse/ClickHouse/pull/37198) ([Alexander Tokmakov](https://github.com/tavplubix)).
@ -294,6 +302,7 @@
* Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes [#32401](https://github.com/ClickHouse/ClickHouse/issues/32401)). [#40293](https://github.com/ClickHouse/ClickHouse/pull/40293) ([Josh Taylor](https://github.com/joshuataylor)).
#### Build/Testing/Packaging Improvement
* [ClickFiddle](https://fiddle.clickhouse.com/): A new tool for testing ClickHouse versions in read/write mode (**Igor Baliuk**).
* ClickHouse binary is made self-extracting [#35775](https://github.com/ClickHouse/ClickHouse/pull/35775) ([Yakov Olkhovskiy, Arthur Filatenkov](https://github.com/yakov-olkhovskiy)).
* Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently, after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@ -308,6 +317,7 @@
* Docker: `entrypoint.sh` in the Docker image now creates and executes `chown` for all folders it finds in the config for multi-disk setups [#17717](https://github.com/ClickHouse/ClickHouse/issues/17717). [#39121](https://github.com/ClickHouse/ClickHouse/pull/39121) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### Bug Fix
* Fix possible segfault in the `CapnProto` input format. This bug was found and sent through the ClickHouse bug-bounty [program](https://github.com/ClickHouse/ClickHouse/issues/38986) by *kiojj*. [#40241](https://github.com/ClickHouse/ClickHouse/pull/40241) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix a very rare case of incorrect behavior of array subscript operator. This closes [#28720](https://github.com/ClickHouse/ClickHouse/issues/28720). [#40185](https://github.com/ClickHouse/ClickHouse/pull/40185) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix insufficient argument check for encryption functions (found by query fuzzer). This closes [#39987](https://github.com/ClickHouse/ClickHouse/issues/39987). [#40194](https://github.com/ClickHouse/ClickHouse/pull/40194) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@ -358,16 +368,17 @@
* A fix for reverse DNS resolution. [#40134](https://github.com/ClickHouse/ClickHouse/pull/40134) ([Arthur Passos](https://github.com/arthurpassos)).
* Fix unexpected result of `arrayDifference` for `Array(UInt32)`. [#40211](https://github.com/ClickHouse/ClickHouse/pull/40211) ([Duc Canh Le](https://github.com/canhld94)).
### <a id="227"></a> ClickHouse release 22.7, 2022-07-21
#### Upgrade Notes
* Enable the setting `enable_positional_arguments` by default. It allows queries like `SELECT ... ORDER BY 1, 2` where 1 and 2 are references to the SELECT clause. If you need to return to the old behavior, disable this setting. [#38204](https://github.com/ClickHouse/ClickHouse/pull/38204) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Disable `format_csv_allow_single_quotes` by default. See [#37096](https://github.com/ClickHouse/ClickHouse/issues/37096). ([Kruglov Pavel](https://github.com/Avogar)).
* `Ordinary` database engine and old storage definition syntax for `*MergeTree` tables are deprecated. By default it's not possible to create new databases with `Ordinary` engine. If `system` database has `Ordinary` engine it will be automatically converted to `Atomic` on server startup. There are settings to keep old behavior (`allow_deprecated_database_ordinary` and `allow_deprecated_syntax_for_merge_tree`), but these settings may be removed in future releases. [#38335](https://github.com/ClickHouse/ClickHouse/pull/38335) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Force rewriting comma join to inner join by default (set default value `cross_to_inner_join_rewrite = 2`). To get the old behavior, set `cross_to_inner_join_rewrite = 1`. [#39326](https://github.com/ClickHouse/ClickHouse/pull/39326) ([Vladimir C](https://github.com/vdimir)). If you face any incompatibilities, you can turn this setting back.
#### New Feature
* Support expressions with window functions. Closes [#19857](https://github.com/ClickHouse/ClickHouse/issues/19857). [#37848](https://github.com/ClickHouse/ClickHouse/pull/37848) ([Dmitry Novik](https://github.com/novikd)).
* Add new `direct` join algorithm for `EmbeddedRocksDB` tables, see [#33582](https://github.com/ClickHouse/ClickHouse/issues/33582). [#35363](https://github.com/ClickHouse/ClickHouse/pull/35363) ([Vladimir C](https://github.com/vdimir)).
* Added full sorting merge join algorithm. [#35796](https://github.com/ClickHouse/ClickHouse/pull/35796) ([Vladimir C](https://github.com/vdimir)).
@ -395,9 +406,11 @@
* Add `clickhouse-diagnostics` binary to the packages. [#38647](https://github.com/ClickHouse/ClickHouse/pull/38647) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Experimental Feature
* Adds new setting `implicit_transaction` to run standalone queries inside a transaction. It handles both creation and closing (via COMMIT if the query succeeded or ROLLBACK if it didn't) of the transaction automatically. [#38344](https://github.com/ClickHouse/ClickHouse/pull/38344) ([Raúl Marín](https://github.com/Algunenano)).
#### Performance Improvement
* Distinct optimization for sorted columns. Use specialized distinct transformation in case input stream is sorted by column(s) in distinct. Optimization can be applied to pre-distinct, final distinct, or both. Initial implementation by @dimarub2000. [#37803](https://github.com/ClickHouse/ClickHouse/pull/37803) ([Igor Nikonov](https://github.com/devcrafter)).
* Improve performance of `ORDER BY`, `MergeTree` merges, window functions using batch version of `BinaryHeap`. [#38022](https://github.com/ClickHouse/ClickHouse/pull/38022) ([Maksim Kita](https://github.com/kitaisreal)).
* More parallel execution for queries with `FINAL` [#36396](https://github.com/ClickHouse/ClickHouse/pull/36396) ([Nikita Taranov](https://github.com/nickitat)).
@ -407,7 +420,7 @@
* Improve performance of insertion to columns of type `JSON`. [#38320](https://github.com/ClickHouse/ClickHouse/pull/38320) ([Anton Popov](https://github.com/CurtizJ)).
* Optimized insertion and lookups in the HashTable. [#38413](https://github.com/ClickHouse/ClickHouse/pull/38413) ([Nikita Taranov](https://github.com/nickitat)).
* Fix performance degradation from [#32493](https://github.com/ClickHouse/ClickHouse/issues/32493). [#38417](https://github.com/ClickHouse/ClickHouse/pull/38417) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve performance of joining with numeric columns using SIMD instructions. [#37235](https://github.com/ClickHouse/ClickHouse/pull/37235) ([zzachimed](https://github.com/zzachimed)). [#38565](https://github.com/ClickHouse/ClickHouse/pull/38565) ([Maksim Kita](https://github.com/kitaisreal)).
* Norm and Distance functions for arrays speed up 1.2-2 times. [#38740](https://github.com/ClickHouse/ClickHouse/pull/38740) ([Alexander Gololobov](https://github.com/davenger)).
* Add AVX-512 VBMI optimized `copyOverlap32Shuffle` for LZ4 decompression. In other words, LZ4 decompression performance is improved. [#37891](https://github.com/ClickHouse/ClickHouse/pull/37891) ([Guo Wangyang](https://github.com/guowangy)).
* `ORDER BY (a, b)` will use all the same benefits as `ORDER BY a, b`. [#38873](https://github.com/ClickHouse/ClickHouse/pull/38873) ([Igor Nikonov](https://github.com/devcrafter)).
@ -419,6 +432,7 @@
* The table `system.asynchronous_metric_log` is further optimized for storage space. This closes [#38134](https://github.com/ClickHouse/ClickHouse/issues/38134). See the [YouTube video](https://www.youtube.com/watch?v=0fSp9SF8N8A). [#38428](https://github.com/ClickHouse/ClickHouse/pull/38428) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Improvement
* Support SQL standard CREATE INDEX and DROP INDEX syntax. [#35166](https://github.com/ClickHouse/ClickHouse/pull/35166) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Send profile events for INSERT queries (previously only SELECT was supported). [#37391](https://github.com/ClickHouse/ClickHouse/pull/37391) ([Azat Khuzhin](https://github.com/azat)).
* Implement in order aggregation (`optimize_aggregation_in_order`) for fully materialized projections. [#37469](https://github.com/ClickHouse/ClickHouse/pull/37469) ([Azat Khuzhin](https://github.com/azat)).
@ -464,6 +478,7 @@
* Allow to declare `RabbitMQ` queue without default arguments `x-max-length` and `x-overflow`. [#39259](https://github.com/ClickHouse/ClickHouse/pull/39259) ([rnbondarenko](https://github.com/rnbondarenko)).
#### Build/Testing/Packaging Improvement
* Apply Clang Thread Safety Analysis (TSA) annotations to ClickHouse. [#38068](https://github.com/ClickHouse/ClickHouse/pull/38068) ([Robert Schulze](https://github.com/rschu1ze)).
* Adapt universal installation script for FreeBSD. [#39302](https://github.com/ClickHouse/ClickHouse/pull/39302) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Preparation for building on `s390x` platform. [#39193](https://github.com/ClickHouse/ClickHouse/pull/39193) ([Harry Lee](https://github.com/HarryLeeIBM)).
@ -473,6 +488,7 @@
* Change `all|noarch` packages to architecture-dependent - Fix some documentation for it - Push aarch64|arm64 packages to artifactory and release assets - Fixes [#36443](https://github.com/ClickHouse/ClickHouse/issues/36443). [#38580](https://github.com/ClickHouse/ClickHouse/pull/38580) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Fix rounding for `Decimal128/Decimal256` with more than 19-digits long scale. [#38027](https://github.com/ClickHouse/ClickHouse/pull/38027) ([Igor Nikonov](https://github.com/devcrafter)).
* Fixed crash caused by data race in storage `Hive` (integration table engine). [#38887](https://github.com/ClickHouse/ClickHouse/pull/38887) ([lgbo](https://github.com/lgbo-ustc)).
* Fix crash when executing GRANT ALL ON *.* with ON CLUSTER. It was broken in https://github.com/ClickHouse/ClickHouse/pull/35767. This closes [#38618](https://github.com/ClickHouse/ClickHouse/issues/38618). [#38674](https://github.com/ClickHouse/ClickHouse/pull/38674) ([Vitaly Baranov](https://github.com/vitlibar)).
@ -529,6 +545,7 @@
### <a id="226"></a> ClickHouse release 22.6, 2022-06-16
#### Backward Incompatible Change
* Remove support for octal number literals in SQL. In previous versions they were parsed as Float64. [#37765](https://github.com/ClickHouse/ClickHouse/pull/37765) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Changes how settings using `seconds` as type are parsed to support floating point values (for example: `max_execution_time=0.5`). Infinity or NaN values will throw an exception. [#37187](https://github.com/ClickHouse/ClickHouse/pull/37187) ([Raúl Marín](https://github.com/Algunenano)).
* Changed format of binary serialization of columns of experimental type `Object`. New format is more convenient to implement by third-party clients. [#37482](https://github.com/ClickHouse/ClickHouse/pull/37482) ([Anton Popov](https://github.com/CurtizJ)).
@ -537,6 +554,7 @@
* If you run different ClickHouse versions on a cluster with AArch64 CPU or mix AArch64 and amd64 on a cluster, and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, and the size of the result is huge, the data will not be fully aggregated in the result of these queries during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.
#### New Feature
* Add `GROUPING` function. It allows to disambiguate the records in the queries with `ROLLUP`, `CUBE` or `GROUPING SETS`. Closes [#19426](https://github.com/ClickHouse/ClickHouse/issues/19426). [#37163](https://github.com/ClickHouse/ClickHouse/pull/37163) ([Dmitry Novik](https://github.com/novikd)).
* A new codec [FPC](https://userweb.cs.txstate.edu/~burtscher/papers/dcc07a.pdf) algorithm for floating point data compression. [#37553](https://github.com/ClickHouse/ClickHouse/pull/37553) ([Mikhail Guzov](https://github.com/koloshmet)).
* Add new columnar JSON formats: `JSONColumns`, `JSONCompactColumns`, `JSONColumnsWithMetadata`. Closes [#36338](https://github.com/ClickHouse/ClickHouse/issues/36338) Closes [#34509](https://github.com/ClickHouse/ClickHouse/issues/34509). [#36975](https://github.com/ClickHouse/ClickHouse/pull/36975) ([Kruglov Pavel](https://github.com/Avogar)).
@ -557,11 +575,13 @@
* Added `SYSTEM UNFREEZE` query that deletes the whole backup regardless if the corresponding table is deleted or not. [#36424](https://github.com/ClickHouse/ClickHouse/pull/36424) ([Vadim Volodin](https://github.com/PolyProgrammist)).
#### Experimental Feature
* Enables `POPULATE` for `WINDOW VIEW`. [#36945](https://github.com/ClickHouse/ClickHouse/pull/36945) ([vxider](https://github.com/Vxider)).
* `ALTER TABLE ... MODIFY QUERY` support for `WINDOW VIEW`. [#37188](https://github.com/ClickHouse/ClickHouse/pull/37188) ([vxider](https://github.com/Vxider)).
* This PR changes the behavior of the `ENGINE` syntax in `WINDOW VIEW`, to make it like in `MATERIALIZED VIEW`. [#37214](https://github.com/ClickHouse/ClickHouse/pull/37214) ([vxider](https://github.com/Vxider)).
#### Performance Improvement
* Added numerous optimizations for ARM NEON. [#38093](https://github.com/ClickHouse/ClickHouse/pull/38093) ([Daniel Kutenin](https://github.com/danlark1)), ([Alexandra Pilipyuk](https://github.com/chalice19)). Note: if you run different ClickHouse versions on a cluster with ARM CPU and use distributed queries with GROUP BY on multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, the result of the aggregation query will be wrong during the upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.
* Improve performance and memory usage when selecting a subset of columns for the formats Native, Protobuf, CapnProto, JSONEachRow, TSKV, and all formats with the WithNames/WithNamesAndTypes suffixes. Previously, when selecting only a subset of columns from files in these formats, all columns were read and stored in memory; now only the required columns are read. This PR enables the setting `input_format_skip_unknown_fields` by default, because otherwise an exception would be thrown when selecting a subset of columns. [#37192](https://github.com/ClickHouse/ClickHouse/pull/37192) ([Kruglov Pavel](https://github.com/Avogar)).
* Now more filters can be pushed down for join. [#37472](https://github.com/ClickHouse/ClickHouse/pull/37472) ([Amos Bird](https://github.com/amosbird)).
@ -592,6 +612,7 @@
* In `CompressedWriteBuffer::nextImpl()` there was an unnecessary write-copy step that happened frequently when inserting data. The difference with this patch: before, 1) compress "working_buffer" into "compressed_buffer", 2) write-copy into "out"; after, compress "working_buffer" directly into "out" (see the sketch below). [#37242](https://github.com/ClickHouse/ClickHouse/pull/37242) ([jasperzhu](https://github.com/jinjunzh)).
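A minimal sketch of the before/after difference described in this entry, with an identity copy standing in for the real codec; the names `writeBlockOld`/`writeBlockNew` are illustrative, not taken from `CompressedWriteBuffer`:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

using Buffer = std::vector<char>;

// Stand-in "codec": identity copy (a real codec would shrink the data).
std::size_t compressInto(const Buffer & src, char * dst)
{
    std::memcpy(dst, src.data(), src.size());
    return src.size();
}

// Before the patch: compress into an intermediate buffer, then copy to `out`.
void writeBlockOld(const Buffer & working_buffer, Buffer & out)
{
    Buffer compressed_buffer(working_buffer.size());
    std::size_t n = compressInto(working_buffer, compressed_buffer.data());
    // The extra write-copy this patch eliminates:
    out.insert(out.end(), compressed_buffer.begin(), compressed_buffer.begin() + n);
}

// After the patch: compress directly into `out`, no intermediate copy.
void writeBlockNew(const Buffer & working_buffer, Buffer & out)
{
    std::size_t old_size = out.size();
    out.resize(old_size + working_buffer.size());
    std::size_t n = compressInto(working_buffer, out.data() + old_size);
    out.resize(old_size + n);
}

int main()
{
    Buffer block = {'a', 'b', 'c'};
    Buffer out;
    writeBlockOld(block, out);
    writeBlockNew(block, out);
    return out.size() == 6 ? 0 : 1;
}
```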
#### Improvement
* Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS. Closes [#37360](https://github.com/ClickHouse/ClickHouse/issues/37360). [#37667](https://github.com/ClickHouse/ClickHouse/pull/37667) ([Dmitry Novik](https://github.com/novikd)).
* Fix stack traces collection on ARM. Closes [#37044](https://github.com/ClickHouse/ClickHouse/issues/37044). Closes [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638). [#37797](https://github.com/ClickHouse/ClickHouse/pull/37797) ([Maksim Kita](https://github.com/kitaisreal)).
* Client will try every IP address returned by DNS resolution until successful connection. [#37273](https://github.com/ClickHouse/ClickHouse/pull/37273) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
@ -633,6 +654,7 @@
* Add implicit grants with grant option too. For example `GRANT CREATE TABLE ON test.* TO A WITH GRANT OPTION` now allows `A` to execute `GRANT CREATE VIEW ON test.* TO B`. [#38017](https://github.com/ClickHouse/ClickHouse/pull/38017) ([Vitaly Baranov](https://github.com/vitlibar)).
#### Build/Testing/Packaging Improvement
* Use `clang-14` and LLVM infrastructure version 14 for builds. This closes [#34681](https://github.com/ClickHouse/ClickHouse/issues/34681). [#34754](https://github.com/ClickHouse/ClickHouse/pull/34754) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Note: `clang-14` has [a bug](https://github.com/google/sanitizers/issues/1540) in ThreadSanitizer that makes our CI work worse.
* Allow to drop privileges at startup. This simplifies Docker images. Closes [#36293](https://github.com/ClickHouse/ClickHouse/issues/36293). [#36341](https://github.com/ClickHouse/ClickHouse/pull/36341) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add docs spellcheck to CI. [#37790](https://github.com/ClickHouse/ClickHouse/pull/37790) ([Vladimir C](https://github.com/vdimir)).
@ -690,7 +712,6 @@
* Fix possible heap-use-after-free error when reading `system.projection_parts` and `system.projection_parts_columns`. This fixes [#37184](https://github.com/ClickHouse/ClickHouse/issues/37184). [#37185](https://github.com/ClickHouse/ClickHouse/pull/37185) ([Amos Bird](https://github.com/amosbird)).
* Fixed `DateTime64` fractional seconds behavior prior to Unix epoch. [#37697](https://github.com/ClickHouse/ClickHouse/pull/37697) ([Andrey Zvonov](https://github.com/zvonand)). [#37039](https://github.com/ClickHouse/ClickHouse/pull/37039) ([李扬](https://github.com/taiyang-li)).
### <a id="225"></a> ClickHouse release 22.5, 2022-05-19
#### Upgrade Notes
@ -743,7 +764,7 @@
* Implement partial GROUP BY key for optimize_aggregation_in_order. [#35111](https://github.com/ClickHouse/ClickHouse/pull/35111) ([Azat Khuzhin](https://github.com/azat)).
#### Improvement
* Show names of erroneous files in case of parsing errors while executing table functions `file`, `s3` and `url`. [#36314](https://github.com/ClickHouse/ClickHouse/pull/36314) ([Anton Popov](https://github.com/CurtizJ)).
* Allowed to increase the number of threads for executing background operations (merges, mutations, moves and fetches) at runtime, if they are specified in the top-level config. [#36425](https://github.com/ClickHouse/ClickHouse/pull/36425) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Date and time conversion functions that generate times before 1970-01-01 00:00:00 in time zones with partial-hour/minute offsets now saturate to zero instead of overflowing. This is the continuation of https://github.com/ClickHouse/ClickHouse/pull/29953, which addresses https://github.com/ClickHouse/ClickHouse/pull/29953#discussion_r800550280. Marked as an improvement because it is implementation-defined behavior (and a very rare case) and we are allowed to break it. [#36656](https://github.com/ClickHouse/ClickHouse/pull/36656) ([Amos Bird](https://github.com/amosbird)).
@ -852,7 +873,6 @@
* Fix ALTER DROP COLUMN of nested column with compact parts (i.e. `ALTER TABLE x DROP COLUMN n`, when there is column `n.d`). [#35797](https://github.com/ClickHouse/ClickHouse/pull/35797) ([Azat Khuzhin](https://github.com/azat)).
* Fix a range error in the `substring` function when `offset` and `length` are negative constants and `s` is not constant. [#33861](https://github.com/ClickHouse/ClickHouse/pull/33861) ([RogerYK](https://github.com/RogerYK)).
### <a id="224"></a> ClickHouse release 22.4, 2022-04-19
#### Backward Incompatible Change
@ -1004,8 +1024,7 @@
* Fix mutations in tables with enabled sparse columns. [#35284](https://github.com/ClickHouse/ClickHouse/pull/35284) ([Anton Popov](https://github.com/CurtizJ)).
* Do not delay final part writing by default (fixes possible `Memory limit exceeded` during `INSERT` by adding `max_insert_delayed_streams_for_parallel_write` with default to 1000 for writes to s3 and disabled as before otherwise). [#34780](https://github.com/ClickHouse/ClickHouse/pull/34780) ([Azat Khuzhin](https://github.com/azat)).
## <a id="223"></a> ClickHouse release v22.3-lts, 2022-03-17
### <a id="223"></a> ClickHouse release v22.3-lts, 2022-03-17
#### Backward Incompatible Change
@ -1132,7 +1151,6 @@
* Fix incorrect result of trivial count query when part movement feature is used [#34089](https://github.com/ClickHouse/ClickHouse/issues/34089). [#34385](https://github.com/ClickHouse/ClickHouse/pull/34385) ([nvartolomei](https://github.com/nvartolomei)).
* Fix inconsistency of `max_query_size` limitation in distributed subqueries. [#34078](https://github.com/ClickHouse/ClickHouse/pull/34078) ([Chao Ma](https://github.com/godliness)).
### <a id="222"></a> ClickHouse release v22.2, 2022-02-17
#### Upgrade Notes
@ -1308,7 +1326,6 @@
* Fix issue [#18206](https://github.com/ClickHouse/ClickHouse/issues/18206). [#33977](https://github.com/ClickHouse/ClickHouse/pull/33977) ([Vitaly Baranov](https://github.com/vitlibar)).
* This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). [#33574](https://github.com/ClickHouse/ClickHouse/pull/33574) ([Vitaly Baranov](https://github.com/vitlibar)).
### <a id="221"></a> ClickHouse release v22.1, 2022-01-18
#### Upgrade Notes
@ -1335,7 +1352,6 @@
* Add function `decodeURLFormComponent`, slightly different from `decodeURLComponent`. Close [#10298](https://github.com/ClickHouse/ClickHouse/issues/10298). [#33451](https://github.com/ClickHouse/ClickHouse/pull/33451) ([SuperDJY](https://github.com/cmsxbc)).
* Allow to split `GraphiteMergeTree` rollup rules for plain/tagged metrics (optional rule_type field). [#33494](https://github.com/ClickHouse/ClickHouse/pull/33494) ([Michail Safronov](https://github.com/msaf1980)).
#### Performance Improvement
* Support moving conditions to `PREWHERE` (setting `optimize_move_to_prewhere`) for tables of `Merge` engine if all of its underlying tables support `PREWHERE`. [#33300](https://github.com/ClickHouse/ClickHouse/pull/33300) ([Anton Popov](https://github.com/CurtizJ)).
@ -1351,7 +1367,6 @@
* Optimize selecting of MergeTree parts that can be moved between volumes. [#33225](https://github.com/ClickHouse/ClickHouse/pull/33225) ([OnePiece](https://github.com/zhongyuankai)).
* Fix `sparse_hashed` dict performance with sequential keys (wrong hash function). [#32536](https://github.com/ClickHouse/ClickHouse/pull/32536) ([Azat Khuzhin](https://github.com/azat)).
#### Experimental Feature
* Parallel reading from multiple replicas within a shard during distributed query without using sample key. To enable this, set `allow_experimental_parallel_reading_from_replicas = 1` and `max_parallel_replicas` to any number. This closes [#26748](https://github.com/ClickHouse/ClickHouse/issues/26748). [#29279](https://github.com/ClickHouse/ClickHouse/pull/29279) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
@ -1364,7 +1379,6 @@
* Fix ACL with explicit digit hash in `clickhouse-keeper`: the behavior is now consistent with ZooKeeper and the generated digest is always accepted. [#33249](https://github.com/ClickHouse/ClickHouse/pull/33249) ([小路](https://github.com/nicelulu)). [#33246](https://github.com/ClickHouse/ClickHouse/pull/33246).
* Fix unexpected projection removal when detaching parts. [#32067](https://github.com/ClickHouse/ClickHouse/pull/32067) ([Amos Bird](https://github.com/amosbird)).
#### Improvement
* Date and time conversion functions that generate times before `1970-01-01 00:00:00` now saturate to zero instead of overflowing. [#29953](https://github.com/ClickHouse/ClickHouse/pull/29953) ([Amos Bird](https://github.com/amosbird)). This also fixes a bug in index analysis when a date truncation function yields a result before the Unix epoch.
@ -1411,7 +1425,6 @@
* Updating `modification_time` for data part in `system.parts` after part movement [#32964](https://github.com/ClickHouse/ClickHouse/issues/32964). [#32965](https://github.com/ClickHouse/ClickHouse/pull/32965) ([save-my-heart](https://github.com/save-my-heart)).
* Potential issue, cannot be exploited: integer overflow may happen in array resize. [#33024](https://github.com/ClickHouse/ClickHouse/pull/33024) ([varadarajkumar](https://github.com/varadarajkumar)).
#### Build/Testing/Packaging Improvement
* Add packages, functional tests and Docker builds for AArch64 (ARM) version of ClickHouse. [#32911](https://github.com/ClickHouse/ClickHouse/pull/32911) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). [#32415](https://github.com/ClickHouse/ClickHouse/pull/32415)
@ -1426,7 +1439,6 @@
* Inject git information into the clickhouse binary file, so the source code revision can easily be obtained from the binary. [#33124](https://github.com/ClickHouse/ClickHouse/pull/33124) ([taiyang-li](https://github.com/taiyang-li)).
* Remove obsolete code from ConfigProcessor. Yandex specific code is not used anymore. The code contained one minor defect. This defect was reported by [Mallik Hassan](https://github.com/SadiHassan) in [#33032](https://github.com/ClickHouse/ClickHouse/issues/33032). This closes [#33032](https://github.com/ClickHouse/ClickHouse/issues/33032). [#33026](https://github.com/ClickHouse/ClickHouse/pull/33026) ([alexey-milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Several fixes for format parsing. This is relevant if `clickhouse-server` is open for write access to an adversary: specifically crafted input data for the `Native` format may lead to reading uninitialized memory or a crash. [#33050](https://github.com/ClickHouse/ClickHouse/pull/33050) ([Heena Bansal](https://github.com/HeenaBansal2009)). Fixed an Apache Avro Union type index out-of-boundary issue in the Apache Avro binary format. [#33022](https://github.com/ClickHouse/ClickHouse/pull/33022) ([Harry Lee](https://github.com/HarryLeeIBM)). Fix a null pointer dereference when deserializing `LowCardinality` data in the Native format. [#33021](https://github.com/ClickHouse/ClickHouse/pull/33021) ([Harry Lee](https://github.com/HarryLeeIBM)).
@ -1485,5 +1497,4 @@
* Fix possible crash (or incorrect result) in case of `LowCardinality` arguments of window function. Fixes [#31114](https://github.com/ClickHouse/ClickHouse/issues/31114). [#31888](https://github.com/ClickHouse/ClickHouse/pull/31888) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix hang up with command `DROP TABLE system.query_log sync`. [#33293](https://github.com/ClickHouse/ClickHouse/pull/33293) ([zhanghuajie](https://github.com/zhanghuajieHIT)).
## [Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021)


@ -3,15 +3,15 @@
# This is a workaround for bug in llvm/clang,
# that does not produce .debug_aranges with LTO
#
-# NOTE: this is a temporary solution, that should be removed once [1] will be
-# resolved.
+# NOTE: this is a temporary solution, that should be removed after upgrading to
+# clang-16/llvm-16.
#
-# [1]: https://discourse.llvm.org/t/clang-does-not-produce-full-debug-aranges-section-with-thinlto/64898/8
+# Refs: https://reviews.llvm.org/D133092
# NOTE: only -flto=thin is supported.
# NOTE: it is not possible to check whether -gdwarf-aranges was passed initially or not.
if [[ "$*" =~ -plugin-opt=thinlto ]]; then
exec "@LLD_PATH@" -mllvm -generate-arange-section "$@"
exec "@LLD_PATH@" -plugin-opt=-generate-arange-section "$@"
else
exec "@LLD_PATH@" "$@"
fi


@ -36,10 +36,7 @@ RUN arch=${TARGETARCH:-amd64} \
# repo versions don't work correctly with C++17
# also we push reports to s3, so we add index.html to subfolder urls
# https://github.com/ClickHouse-Extras/woboq_codebrowser/commit/37e15eaf377b920acb0b48dbe82471be9203f76b
-# TODO: remove branch in a few weeks after merge, e.g. in May or June 2022
-#
-# FIXME: update location of a repo
-RUN git clone https://github.com/azat/woboq_codebrowser --branch llvm-15 \
+RUN git clone https://github.com/ClickHouse/woboq_codebrowser \
&& cd woboq_codebrowser \
&& cmake . -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang\+\+-${LLVM_VERSION} -DCMAKE_C_COMPILER=clang-${LLVM_VERSION} \
&& ninja \

docker/test/stress/run.sh: 22 changed lines (Executable file → Normal file)

@ -47,7 +47,6 @@ function install_packages()
function configure()
{
-export ZOOKEEPER_FAULT_INJECTION=1
# install test configs
export USE_DATABASE_ORDINARY=1
export EXPORT_S3_STORAGE_POLICIES=1
@ -203,6 +202,7 @@ quit
install_packages package_folder
+export ZOOKEEPER_FAULT_INJECTION=1
configure
azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --debug /azurite_log &
@ -243,6 +243,7 @@ stop
# Let's enable S3 storage by default
export USE_S3_STORAGE_FOR_MERGE_TREE=1
+export ZOOKEEPER_FAULT_INJECTION=1
configure
# But we still need default disk because some tables loaded only into it
@ -375,6 +376,8 @@ else
install_packages previous_release_package_folder
# Start server from previous release
+# Previous version may not be ready for fault injections
+export ZOOKEEPER_FAULT_INJECTION=0
configure
# Avoid "Setting s3_check_objects_after_upload is neither a builtin setting..."
@ -389,12 +392,23 @@ else
clickhouse-client --query="SELECT 'Server version: ', version()"
-# Install new package before running stress test because we should use new clickhouse-client and new clickhouse-test
-# But we should leave old binary in /usr/bin/ for gdb (so it will print sane stacktarces)
+# Install new package before running stress test because we should use new
+# clickhouse-client and new clickhouse-test.
+#
+# But we should leave old binary in /usr/bin/ and debug symbols in
+# /usr/lib/debug/usr/bin (if any) for gdb and internal DWARF parser, so it
+# will print sane stacktraces and also to avoid possible crashes.
+#
+# FIXME: those files can be extracted directly from debian package, but
+# actually better solution will be to use different PATH instead of playing
+# games with files from packages.
mv /usr/bin/clickhouse previous_release_package_folder/
+mv /usr/lib/debug/usr/bin/clickhouse.debug previous_release_package_folder/
install_packages package_folder
mv /usr/bin/clickhouse package_folder/
+mv /usr/lib/debug/usr/bin/clickhouse.debug package_folder/
mv previous_release_package_folder/clickhouse /usr/bin/
+mv previous_release_package_folder/clickhouse.debug /usr/lib/debug/usr/bin/clickhouse.debug
mkdir tmp_stress_output
@ -410,6 +424,8 @@ else
# Start new server
mv package_folder/clickhouse /usr/bin/
+mv package_folder/clickhouse.debug /usr/lib/debug/usr/bin/clickhouse.debug
+export ZOOKEEPER_FAULT_INJECTION=1
configure
start 500
clickhouse-client --query "SELECT 'Backward compatibility check: Server successfully started', 'OK'" >> /test_output/test_results.tsv \


@ -49,27 +49,13 @@ When we calculate some function over columns in a block, we add another column w
Blocks are created for every processed chunk of data. Note that for the same type of calculation, the column names and types remain the same for different blocks, and only column data changes. It is better to split block data from the block header because small block sizes have a high overhead of temporary strings for copying shared_ptrs and column names.
-## Block Streams {#block-streams}
+## Processors
-Block streams are for processing data. We use streams of blocks to read data from somewhere, perform data transformations, or write data to somewhere. `IBlockInputStream` has the `read` method to fetch the next block while available. `IBlockOutputStream` has the `write` method to push the block somewhere.
-Streams are responsible for:
-1. Reading or writing to a table. The table just returns a stream for reading or writing blocks.
-2. Implementing data formats. For example, if you want to output data to a terminal in `Pretty` format, you create a block output stream where you push blocks, and it formats them.
-3. Performing data transformations. Let's say you have `IBlockInputStream` and want to create a filtered stream. You create `FilterBlockInputStream` and initialize it with your stream. Then when you pull a block from `FilterBlockInputStream`, it pulls a block from your stream, filters it, and returns the filtered block to you. Query execution pipelines are represented this way.
-There are more sophisticated transformations. For example, when you pull from `AggregatingBlockInputStream`, it reads all data from its source, aggregates it, and then returns a stream of aggregated data for you. Another example: `UnionBlockInputStream` accepts many input sources in the constructor and also a number of threads. It launches multiple threads and reads from multiple sources in parallel.
-> Block streams use the “pull” approach to control flow: when you pull a block from the first stream, it consequently pulls the required blocks from nested streams, and the entire execution pipeline will work. Neither “pull” nor “push” is the best solution, because control flow is implicit, and that limits the implementation of various features like simultaneous execution of multiple queries (merging many pipelines together). This limitation could be overcome with coroutines or just running extra threads that wait for each other. We may have more possibilities if we make control flow explicit: if we locate the logic for passing data from one calculation unit to another outside of those calculation units. Read this [article](http://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/) for more thoughts.
-We should note that the query execution pipeline creates temporary data at each step. We try to keep block size small enough so that temporary data fits in the CPU cache. With that assumption, writing and reading temporary data is almost free in comparison with other calculations. We could consider an alternative, which is to fuse many operations in the pipeline together. It could make the pipeline as short as possible and remove much of the temporary data, which could be an advantage, but it also has drawbacks. For example, a split pipeline makes it easy to implement caching intermediate data, stealing intermediate data from similar queries running at the same time, and merging pipelines for similar queries.
+See the description at [https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h).
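For readers who never saw the removed interface, here is a self-contained sketch of the pull model the deleted text describes. The class names echo `IBlockInputStream`/`FilterBlockInputStream`, but the implementation is a toy with blocks reduced to vectors of ints, not ClickHouse code:

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <memory>
#include <vector>

using Block = std::vector<int>;  // a "block" reduced to one int column

// Pull-style stream: read() returns the next block, or an empty one at EOF.
struct IBlockInputStream
{
    virtual Block read() = 0;
    virtual ~IBlockInputStream() = default;
};

// Source: hands out pre-made blocks one by one (stands in for a table read).
struct VectorInputStream : IBlockInputStream
{
    std::vector<Block> blocks;
    std::size_t pos = 0;
    explicit VectorInputStream(std::vector<Block> b) : blocks(std::move(b)) {}
    Block read() override { return pos < blocks.size() ? blocks[pos++] : Block{}; }
};

// Transformation: pulling from it pulls from the nested stream, so the whole
// pipeline advances implicitly; this is the "pull" control flow noted above.
struct FilterBlockInputStream : IBlockInputStream
{
    std::shared_ptr<IBlockInputStream> in;
    std::function<bool(int)> pred;
    FilterBlockInputStream(std::shared_ptr<IBlockInputStream> in_, std::function<bool(int)> p)
        : in(std::move(in_)), pred(std::move(p)) {}
    Block read() override
    {
        // Keep pulling until we produce a non-empty block or the source ends.
        for (Block src = in->read(); !src.empty(); src = in->read())
        {
            Block out;
            for (int x : src)
                if (pred(x))
                    out.push_back(x);
            if (!out.empty())
                return out;
        }
        return {};
    }
};

int main()
{
    auto source = std::make_shared<VectorInputStream>(std::vector<Block>{{1, 2, 3}, {4, 5, 6}});
    FilterBlockInputStream filtered(source, [](int x) { return x % 2 == 0; });
    for (Block b = filtered.read(); !b.empty(); b = filtered.read())
        for (int x : b)
            std::cout << x << '\n';  // prints 2, 4, 6
}
```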
## Formats {#formats}
-Data formats are implemented with block streams. There are “presentational” formats only suitable for the output of data to the client, such as `Pretty` format, which provides only `IBlockOutputStream`. And there are input/output formats, such as `TabSeparated` or `JSONEachRow`.
-There are also row streams: `IRowInputStream` and `IRowOutputStream`. They allow you to pull/push data by individual rows, not by blocks. And they are only needed to simplify the implementation of row-oriented formats. The wrappers `BlockInputStreamFromRowInputStream` and `BlockOutputStreamFromRowOutputStream` allow you to convert row-oriented streams to regular block-oriented streams.
+Data formats are implemented with processors.
## I/O {#io}


@ -419,6 +419,8 @@ Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `St
For the `Map` data type, the client can specify whether the index should be created for keys or for values, using the [mapKeys](../../../sql-reference/functions/tuple-map-functions.md#mapkeys) or [mapValues](../../../sql-reference/functions/tuple-map-functions.md#mapvalues) function.
There are also special-purpose and experimental indexes to support approximate nearest neighbor (ANN) queries. See [here](annindexes.md) for details.
The following functions can use the filter: [equals](../../../sql-reference/functions/comparison-functions.md), [notEquals](../../../sql-reference/functions/comparison-functions.md), [in](../../../sql-reference/functions/in-functions), [notIn](../../../sql-reference/functions/in-functions), [has](../../../sql-reference/functions/array-functions#hasarr-elem), [hasAny](../../../sql-reference/functions/array-functions#hasany), [hasAll](../../../sql-reference/functions/array-functions#hasall).
Example of index creation for `Map` data type


@ -19,7 +19,6 @@
{host}
{port}
{user}
-{database}
{display_name}
Terminal colors: https://misc.flogisoft.com/bash/tip_colors_and_formatting
See also: https://wiki.hackzine.org/development/misc/readline-color-prompt.html


@ -45,6 +45,7 @@ if (BUILD_STANDALONE_KEEPER)
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperLogStore.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperServer.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperSnapshotManager.cpp
+${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperSnapshotManagerS3.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStateMachine.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStateManager.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStorage.cpp


@ -46,7 +46,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
if (which.idx == TypeIndex::DateTime64) return std::make_shared<Function<DateTime64, false>>(argument_types, params);
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
-if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
+if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);
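The pattern being fixed here is a chain of `TypeIndex` checks where each branch must spell out the matching template argument by hand. A simplified, self-contained sketch (the names and types are illustrative, not the real `AggregateFunctionQuantile` code; `__int128` requires GCC or Clang) shows why the compiler cannot catch the `Int128`/`UInt128` mix-up:

```cpp
#include <memory>
#include <string>

enum class TypeIndex { Int64, UInt64, Int128, UInt128 };

struct IFunction
{
    virtual std::string name() const = 0;
    virtual ~IFunction() = default;
};

template <typename Value, bool is_big>
struct QuantileFunction : IFunction
{
    std::string name() const override { return "quantile"; }
};

std::shared_ptr<IFunction> createQuantile(TypeIndex idx)
{
    // Every branch compiles no matter which value type it names, so mapping
    // UInt128 to QuantileFunction<__int128, true> (signed instead of unsigned)
    // is silently accepted and only misbehaves at runtime; that is the
    // copy-paste bug the diff above fixes.
    if (idx == TypeIndex::Int64)   return std::make_shared<QuantileFunction<long long, false>>();
    if (idx == TypeIndex::UInt64)  return std::make_shared<QuantileFunction<unsigned long long, false>>();
    if (idx == TypeIndex::Int128)  return std::make_shared<QuantileFunction<__int128, true>>();
    if (idx == TypeIndex::UInt128) return std::make_shared<QuantileFunction<unsigned __int128, true>>();
    return nullptr;
}

int main()
{
    return createQuantile(TypeIndex::UInt128) ? 0 : 1;
}
```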


@ -40,7 +40,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
if (which.idx == TypeIndex::DateTime) return std::make_shared<Function<DataTypeDateTime::FieldType, false>>(argument_types, params);
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
-if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
+if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);


@ -47,7 +47,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
if (which.idx == TypeIndex::DateTime64) return std::make_shared<Function<DateTime64, false>>(argument_types, params);
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
-if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
+if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);


@ -46,7 +46,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
if (which.idx == TypeIndex::DateTime64) return std::make_shared<Function<DateTime64, false>>(argument_types, params);
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);

View File

@ -40,7 +40,15 @@ struct WelchTTestData : public TTestMoments<Float64>
Float64 denominator_x = sx2 * sx2 / (nx * nx * (nx - 1));
Float64 denominator_y = sy2 * sy2 / (ny * ny * (ny - 1));
return numerator / (denominator_x + denominator_y);
auto result = numerator / (denominator_x + denominator_y);
if (result <= 0 || std::isinf(result) || isNaN(result))
throw Exception(
ErrorCodes::BAD_ARGUMENTS,
"Cannot calculate p_value, because the t-distribution \
has inappropriate value of degrees of freedom (={}). It should be > 0", result);
return result;
}
std::tuple<Float64, Float64> getResult() const
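For reference, this hunk computes the Welch–Satterthwaite approximation of the degrees of freedom; assuming the numerator (computed above the visible lines) is (s_x^2/n_x + s_y^2/n_y)^2, the returned value is
$$ \nu \approx \frac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}{\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}} $$
which must be positive and finite for the subsequent p-value computation, hence the new check.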

View File

@ -53,9 +53,12 @@ String IAggregateFunction::getDescription() const
bool IAggregateFunction::haveEqualArgumentTypes(const IAggregateFunction & rhs) const
{
return std::equal(argument_types.begin(), argument_types.end(),
rhs.argument_types.begin(), rhs.argument_types.end(),
[](const auto & t1, const auto & t2) { return t1->equals(*t2); });
return std::equal(
argument_types.begin(),
argument_types.end(),
rhs.argument_types.begin(),
rhs.argument_types.end(),
[](const auto & t1, const auto & t2) { return t1->equals(*t2); });
}
bool IAggregateFunction::haveSameStateRepresentation(const IAggregateFunction & rhs) const
@ -67,11 +70,7 @@ bool IAggregateFunction::haveSameStateRepresentation(const IAggregateFunction &
bool IAggregateFunction::haveSameStateRepresentationImpl(const IAggregateFunction & rhs) const
{
bool res = getName() == rhs.getName()
&& parameters == rhs.parameters
&& haveEqualArgumentTypes(rhs);
assert(res == (getStateType()->getName() == rhs.getStateType()->getName()));
return res;
return getStateType()->equals(*rhs.getStateType());
}
}

View File

@ -32,10 +32,12 @@ void BackupFactory::registerBackupEngine(const String & engine_name, const Creat
}
void registerBackupEnginesFileAndDisk(BackupFactory &);
void registerBackupEngineS3(BackupFactory &);
void registerBackupEngines(BackupFactory & factory)
{
registerBackupEnginesFileAndDisk(factory);
registerBackupEngineS3(factory);
}
BackupFactory::BackupFactory()

src/Backups/BackupIO_S3.cpp (new file, 375 lines)
View File

@ -0,0 +1,375 @@
#include <Backups/BackupIO_S3.h>
#if USE_AWS_S3
#include <Common/quoteString.h>
#include <Interpreters/threadPoolCallbackRunner.h>
#include <Interpreters/Context.h>
#include <Storages/StorageS3Settings.h>
#include <IO/IOThreadPool.h>
#include <IO/ReadBufferFromS3.h>
#include <IO/WriteBufferFromS3.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <aws/core/auth/AWSCredentials.h>
#include <aws/s3/S3Client.h>
#include <filesystem>
#include <aws/s3/model/ListObjectsRequest.h>
namespace fs = std::filesystem;
namespace DB
{
namespace ErrorCodes
{
extern const int S3_ERROR;
extern const int LOGICAL_ERROR;
}
namespace
{
std::shared_ptr<Aws::S3::S3Client>
makeS3Client(const S3::URI & s3_uri, const String & access_key_id, const String & secret_access_key, const ContextPtr & context)
{
auto settings = context->getStorageS3Settings().getSettings(s3_uri.uri.toString());
Aws::Auth::AWSCredentials credentials(access_key_id, secret_access_key);
HeaderCollection headers;
if (access_key_id.empty())
{
credentials = Aws::Auth::AWSCredentials(settings.auth_settings.access_key_id, settings.auth_settings.secret_access_key);
headers = settings.auth_settings.headers;
}
S3::PocoHTTPClientConfiguration client_configuration = S3::ClientFactory::instance().createClientConfiguration(
settings.auth_settings.region,
context->getRemoteHostFilter(),
context->getGlobalContext()->getSettingsRef().s3_max_redirects,
context->getGlobalContext()->getSettingsRef().enable_s3_requests_logging,
/* for_disk_s3 = */ false);
client_configuration.endpointOverride = s3_uri.endpoint;
client_configuration.maxConnections = context->getSettingsRef().s3_max_connections;
/// Increase connect timeout
client_configuration.connectTimeoutMs = 10 * 1000;
/// Requests in backups can be extremely long, set to one hour
client_configuration.requestTimeoutMs = 60 * 60 * 1000;
return S3::ClientFactory::instance().create(
client_configuration,
s3_uri.is_virtual_hosted_style,
credentials.GetAWSAccessKeyId(),
credentials.GetAWSSecretKey(),
settings.auth_settings.server_side_encryption_customer_key_base64,
std::move(headers),
settings.auth_settings.use_environment_credentials.value_or(
context->getConfigRef().getBool("s3.use_environment_credentials", false)),
settings.auth_settings.use_insecure_imds_request.value_or(
context->getConfigRef().getBool("s3.use_insecure_imds_request", false)));
}
Aws::Vector<Aws::S3::Model::Object> listObjects(Aws::S3::S3Client & client, const S3::URI & s3_uri, const String & file_name)
{
Aws::S3::Model::ListObjectsRequest request;
request.SetBucket(s3_uri.bucket);
request.SetPrefix(fs::path{s3_uri.key} / file_name);
request.SetMaxKeys(1);
auto outcome = client.ListObjects(request);
if (!outcome.IsSuccess())
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
return outcome.GetResult().GetContents();
}
}
BackupReaderS3::BackupReaderS3(
const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_)
: s3_uri(s3_uri_)
, client(makeS3Client(s3_uri_, access_key_id_, secret_access_key_, context_))
, max_single_read_retries(context_->getSettingsRef().s3_max_single_read_retries)
, read_settings(context_->getReadSettings())
{
}
DataSourceDescription BackupReaderS3::getDataSourceDescription() const
{
return DataSourceDescription{DataSourceType::S3, s3_uri.endpoint, false, false};
}
BackupReaderS3::~BackupReaderS3() = default;
bool BackupReaderS3::fileExists(const String & file_name)
{
return !listObjects(*client, s3_uri, file_name).empty();
}
UInt64 BackupReaderS3::getFileSize(const String & file_name)
{
auto objects = listObjects(*client, s3_uri, file_name);
if (objects.empty())
throw Exception(ErrorCodes::S3_ERROR, "Object {} must exist", file_name);
return objects[0].GetSize();
}
std::unique_ptr<SeekableReadBuffer> BackupReaderS3::readFile(const String & file_name)
{
return std::make_unique<ReadBufferFromS3>(
client, s3_uri.bucket, fs::path(s3_uri.key) / file_name, s3_uri.version_id, max_single_read_retries, read_settings);
}
BackupWriterS3::BackupWriterS3(
const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_)
: s3_uri(s3_uri_)
, client(makeS3Client(s3_uri_, access_key_id_, secret_access_key_, context_))
, max_single_read_retries(context_->getSettingsRef().s3_max_single_read_retries)
, read_settings(context_->getReadSettings())
, rw_settings(context_->getStorageS3Settings().getSettings(s3_uri.uri.toString()).rw_settings)
{
rw_settings.updateFromSettingsIfEmpty(context_->getSettingsRef());
}
DataSourceDescription BackupWriterS3::getDataSourceDescription() const
{
return DataSourceDescription{DataSourceType::S3, s3_uri.endpoint, false, false};
}
bool BackupWriterS3::supportNativeCopy(DataSourceDescription data_source_description) const
{
return getDataSourceDescription() == data_source_description;
}
void BackupWriterS3::copyObjectImpl(
const String & src_bucket,
const String & src_key,
const String & dst_bucket,
const String & dst_key,
std::optional<Aws::S3::Model::HeadObjectResult> head,
std::optional<ObjectAttributes> metadata) const
{
Aws::S3::Model::CopyObjectRequest request;
request.SetCopySource(src_bucket + "/" + src_key);
request.SetBucket(dst_bucket);
request.SetKey(dst_key);
if (metadata)
{
request.SetMetadata(*metadata);
request.SetMetadataDirective(Aws::S3::Model::MetadataDirective::REPLACE);
}
auto outcome = client->CopyObject(request);
if (!outcome.IsSuccess() && outcome.GetError().GetExceptionName() == "EntityTooLarge")
{ // Can't come here with MinIO: MinIO allows single-part upload for large objects.
copyObjectMultipartImpl(src_bucket, src_key, dst_bucket, dst_key, head, metadata);
return;
}
if (!outcome.IsSuccess())
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
Aws::S3::Model::HeadObjectOutcome BackupWriterS3::requestObjectHeadData(const std::string & bucket_from, const std::string & key) const
{
Aws::S3::Model::HeadObjectRequest request;
request.SetBucket(bucket_from);
request.SetKey(key);
return client->HeadObject(request);
}
void BackupWriterS3::copyObjectMultipartImpl(
const String & src_bucket,
const String & src_key,
const String & dst_bucket,
const String & dst_key,
std::optional<Aws::S3::Model::HeadObjectResult> head,
std::optional<ObjectAttributes> metadata) const
{
if (!head)
head = requestObjectHeadData(src_bucket, src_key).GetResult();
size_t size = head->GetContentLength();
String multipart_upload_id;
{
Aws::S3::Model::CreateMultipartUploadRequest request;
request.SetBucket(dst_bucket);
request.SetKey(dst_key);
if (metadata)
request.SetMetadata(*metadata);
auto outcome = client->CreateMultipartUpload(request);
if (!outcome.IsSuccess())
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
multipart_upload_id = outcome.GetResult().GetUploadId();
}
std::vector<String> part_tags;
size_t upload_part_size = rw_settings.min_upload_part_size;
for (size_t position = 0, part_number = 1; position < size; ++part_number, position += upload_part_size)
{
Aws::S3::Model::UploadPartCopyRequest part_request;
part_request.SetCopySource(src_bucket + "/" + src_key);
part_request.SetBucket(dst_bucket);
part_request.SetKey(dst_key);
part_request.SetUploadId(multipart_upload_id);
part_request.SetPartNumber(part_number);
part_request.SetCopySourceRange(fmt::format("bytes={}-{}", position, std::min(size, position + upload_part_size) - 1));
auto outcome = client->UploadPartCopy(part_request);
if (!outcome.IsSuccess())
{
Aws::S3::Model::AbortMultipartUploadRequest abort_request;
abort_request.SetBucket(dst_bucket);
abort_request.SetKey(dst_key);
abort_request.SetUploadId(multipart_upload_id);
client->AbortMultipartUpload(abort_request);
// On error, we throw an exception later with the first error from UploadPartCopy
}
if (!outcome.IsSuccess())
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
auto etag = outcome.GetResult().GetCopyPartResult().GetETag();
part_tags.push_back(etag);
}
{
Aws::S3::Model::CompleteMultipartUploadRequest req;
req.SetBucket(dst_bucket);
req.SetKey(dst_key);
req.SetUploadId(multipart_upload_id);
Aws::S3::Model::CompletedMultipartUpload multipart_upload;
for (size_t i = 0; i < part_tags.size(); ++i)
{
Aws::S3::Model::CompletedPart part;
multipart_upload.AddParts(part.WithETag(part_tags[i]).WithPartNumber(i + 1));
}
req.SetMultipartUpload(multipart_upload);
auto outcome = client->CompleteMultipartUpload(req);
if (!outcome.IsSuccess())
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
}
void BackupWriterS3::copyFileNative(DiskPtr from_disk, const String & file_name_from, const String & file_name_to)
{
if (!from_disk)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot natively copy data to disk without source disk");
auto objects = from_disk->getStorageObjects(file_name_from);
if (objects.size() > 1)
{
copyFileThroughBuffer(from_disk->readFile(file_name_from), file_name_to);
}
else
{
auto object_storage = from_disk->getObjectStorage();
std::string source_bucket = object_storage->getObjectsNamespace();
auto file_path = fs::path(s3_uri.key) / file_name_to;
auto head = requestObjectHeadData(source_bucket, objects[0].absolute_path).GetResult();
static constexpr int64_t multipart_upload_threshold = 5UL * 1024 * 1024 * 1024;
if (head.GetContentLength() >= multipart_upload_threshold)
{
copyObjectMultipartImpl(
source_bucket, objects[0].absolute_path, s3_uri.bucket, file_path, head);
}
else
{
copyObjectImpl(
source_bucket, objects[0].absolute_path, s3_uri.bucket, file_path, head);
}
}
}
BackupWriterS3::~BackupWriterS3() = default;
bool BackupWriterS3::fileExists(const String & file_name)
{
return !listObjects(*client, s3_uri, file_name).empty();
}
UInt64 BackupWriterS3::getFileSize(const String & file_name)
{
auto objects = listObjects(*client, s3_uri, file_name);
if (objects.empty())
throw Exception(ErrorCodes::S3_ERROR, "Object {} must exist", file_name);
return objects[0].GetSize();
}
bool BackupWriterS3::fileContentsEqual(const String & file_name, const String & expected_file_contents)
{
if (listObjects(*client, s3_uri, file_name).empty())
return false;
try
{
auto in = std::make_unique<ReadBufferFromS3>(
client, s3_uri.bucket, fs::path(s3_uri.key) / file_name, s3_uri.version_id, max_single_read_retries, read_settings);
String actual_file_contents(expected_file_contents.size(), ' ');
return (in->read(actual_file_contents.data(), actual_file_contents.size()) == actual_file_contents.size())
&& (actual_file_contents == expected_file_contents) && in->eof();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
return false;
}
}
std::unique_ptr<WriteBuffer> BackupWriterS3::writeFile(const String & file_name)
{
return std::make_unique<WriteBufferFromS3>(
client,
s3_uri.bucket,
fs::path(s3_uri.key) / file_name,
rw_settings,
std::nullopt,
DBMS_DEFAULT_BUFFER_SIZE,
threadPoolCallbackRunner<void>(IOThreadPool::get(), "BackupWriterS3"));
}
void BackupWriterS3::removeFiles(const Strings & file_names)
{
/// One call of DeleteObjects() cannot remove more than 1000 keys.
size_t chunk_size_limit = 1000;
size_t current_position = 0;
while (current_position < file_names.size())
{
std::vector<Aws::S3::Model::ObjectIdentifier> current_chunk;
for (; current_position < file_names.size() && current_chunk.size() < chunk_size_limit; ++current_position)
{
Aws::S3::Model::ObjectIdentifier obj;
obj.SetKey(fs::path(s3_uri.key) / file_names[current_position]);
current_chunk.push_back(obj);
}
Aws::S3::Model::Delete delkeys;
delkeys.SetObjects(current_chunk);
Aws::S3::Model::DeleteObjectsRequest request;
request.SetBucket(s3_uri.bucket);
request.SetDelete(delkeys);
auto outcome = client->DeleteObjects(request);
if (!outcome.IsSuccess())
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
}
}
#endif

src/Backups/BackupIO_S3.h (new file, 92 lines)
View File

@ -0,0 +1,92 @@
#pragma once
#include "config.h"
#if USE_AWS_S3
#include <Backups/BackupIO.h>
#include <IO/S3Common.h>
#include <IO/ReadSettings.h>
#include <Storages/StorageS3Settings.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/CopyObjectRequest.h>
#include <aws/s3/model/ListObjectsV2Request.h>
#include <aws/s3/model/HeadObjectRequest.h>
#include <aws/s3/model/DeleteObjectRequest.h>
#include <aws/s3/model/DeleteObjectsRequest.h>
#include <aws/s3/model/CreateMultipartUploadRequest.h>
#include <aws/s3/model/CompleteMultipartUploadRequest.h>
#include <aws/s3/model/UploadPartCopyRequest.h>
#include <aws/s3/model/AbortMultipartUploadRequest.h>
#include <aws/s3/model/HeadObjectResult.h>
#include <aws/s3/model/ListObjectsV2Result.h>
namespace DB
{
/// Represents a backup stored to AWS S3.
class BackupReaderS3 : public IBackupReader
{
public:
BackupReaderS3(const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_);
~BackupReaderS3() override;
bool fileExists(const String & file_name) override;
UInt64 getFileSize(const String & file_name) override;
std::unique_ptr<SeekableReadBuffer> readFile(const String & file_name) override;
DataSourceDescription getDataSourceDescription() const override;
private:
S3::URI s3_uri;
std::shared_ptr<Aws::S3::S3Client> client;
UInt64 max_single_read_retries;
ReadSettings read_settings;
};
class BackupWriterS3 : public IBackupWriter
{
public:
BackupWriterS3(const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_);
~BackupWriterS3() override;
bool fileExists(const String & file_name) override;
UInt64 getFileSize(const String & file_name) override;
bool fileContentsEqual(const String & file_name, const String & expected_file_contents) override;
std::unique_ptr<WriteBuffer> writeFile(const String & file_name) override;
void removeFiles(const Strings & file_names) override;
DataSourceDescription getDataSourceDescription() const override;
bool supportNativeCopy(DataSourceDescription data_source_description) const override;
void copyFileNative(DiskPtr from_disk, const String & file_name_from, const String & file_name_to) override;
private:
Aws::S3::Model::HeadObjectOutcome requestObjectHeadData(const std::string & bucket_from, const std::string & key) const;
void copyObjectImpl(
const String & src_bucket,
const String & src_key,
const String & dst_bucket,
const String & dst_key,
std::optional<Aws::S3::Model::HeadObjectResult> head = std::nullopt,
std::optional<ObjectAttributes> metadata = std::nullopt) const;
void copyObjectMultipartImpl(
const String & src_bucket,
const String & src_key,
const String & dst_bucket,
const String & dst_key,
std::optional<Aws::S3::Model::HeadObjectResult> head = std::nullopt,
std::optional<ObjectAttributes> metadata = std::nullopt) const;
S3::URI s3_uri;
std::shared_ptr<Aws::S3::S3Client> client;
UInt64 max_single_read_retries;
ReadSettings read_settings;
S3Settings::ReadWriteSettings rw_settings;
};
}
#endif

View File

@ -455,6 +455,7 @@ void BackupImpl::createLockFile()
assert(uuid);
auto out = writer->writeFile(lock_file_name);
writeUUIDText(*uuid, *out);
out->finalize();
}
bool BackupImpl::checkLockFile(bool throw_if_failed) const

View File

@ -0,0 +1,129 @@
#include "config.h"
#include <Backups/BackupFactory.h>
#include <Common/Exception.h>
#if USE_AWS_S3
#include <Backups/BackupIO_S3.h>
#include <Backups/BackupImpl.h>
#include <IO/Archives/hasRegisteredArchiveFileExtension.h>
#include <Interpreters/Context.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <filesystem>
#endif
namespace DB
{
namespace fs = std::filesystem;
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int SUPPORT_IS_DISABLED;
}
#if USE_AWS_S3
namespace
{
String removeFileNameFromURL(String & url)
{
Poco::URI url2{url};
String path = url2.getPath();
size_t slash_pos = path.find_last_of('/');
String file_name = path.substr(slash_pos + 1);
path.resize(slash_pos + 1);
url2.setPath(path);
url = url2.toString();
return file_name;
}
}
#endif
void registerBackupEngineS3(BackupFactory & factory)
{
auto creator_fn = []([[maybe_unused]] const BackupFactory::CreateParams & params) -> std::unique_ptr<IBackup>
{
#if USE_AWS_S3
String backup_name = params.backup_info.toString();
const String & id_arg = params.backup_info.id_arg;
const auto & args = params.backup_info.args;
String s3_uri, access_key_id, secret_access_key;
if (!id_arg.empty())
{
const auto & config = params.context->getConfigRef();
auto config_prefix = "named_collections." + id_arg;
if (!config.has(config_prefix))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is no collection named `{}` in config", id_arg);
s3_uri = config.getString(config_prefix + ".url");
access_key_id = config.getString(config_prefix + ".access_key_id", "");
secret_access_key = config.getString(config_prefix + ".secret_access_key", "");
if (config.has(config_prefix + ".filename"))
s3_uri = fs::path(s3_uri) / config.getString(config_prefix + ".filename");
if (args.size() > 1)
throw Exception(
"Backup S3 requires 1 or 2 arguments: named_collection, [filename]",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
if (args.size() == 1)
s3_uri = fs::path(s3_uri) / args[0].safeGet<String>();
}
else
{
if ((args.size() != 1) && (args.size() != 3))
throw Exception(
"Backup S3 requires 1 or 3 arguments: url, [access_key_id, secret_access_key]",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
s3_uri = args[0].safeGet<String>();
if (args.size() >= 3)
{
access_key_id = args[1].safeGet<String>();
secret_access_key = args[2].safeGet<String>();
}
}
BackupImpl::ArchiveParams archive_params;
if (hasRegisteredArchiveFileExtension(s3_uri))
{
if (params.is_internal_backup)
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Using archives with backups on clusters is disabled");
archive_params.archive_name = removeFileNameFromURL(s3_uri);
archive_params.compression_method = params.compression_method;
archive_params.compression_level = params.compression_level;
archive_params.password = params.password;
}
else
{
if (!params.password.empty())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Password is not applicable, backup cannot be encrypted");
}
if (params.open_mode == IBackup::OpenMode::READ)
{
auto reader = std::make_shared<BackupReaderS3>(S3::URI{Poco::URI{s3_uri}}, access_key_id, secret_access_key, params.context);
return std::make_unique<BackupImpl>(backup_name, archive_params, params.base_backup_info, reader, params.context);
}
else
{
auto writer = std::make_shared<BackupWriterS3>(S3::URI{Poco::URI{s3_uri}}, access_key_id, secret_access_key, params.context);
return std::make_unique<BackupImpl>(backup_name, archive_params, params.base_backup_info, writer, params.context, params.is_internal_backup, params.backup_coordination, params.backup_uuid);
}
#else
throw Exception("S3 support is disabled", ErrorCodes::SUPPORT_IS_DISABLED);
#endif
};
factory.registerBackupEngine("S3", creator_fn);
}
}
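A hedged usage sketch of the engine registered above (the bucket URL and credentials are placeholders, not values from this PR); per the argument parsing, `S3()` accepts either a named collection plus optional filename, a URL alone, or a URL with access_key_id and secret_access_key:
BACKUP TABLE default.events TO S3('https://s3.us-east-1.amazonaws.com/my-bucket/backups/events/', 'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY');
RESTORE TABLE default.events FROM S3('https://s3.us-east-1.amazonaws.com/my-bucket/backups/events/', 'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY');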

View File

@ -1,7 +1,6 @@
#include <Client/ClientBase.h>
#include <iostream>
#include <iomanip>
#include <filesystem>
#include <map>
#include <unordered_map>
@ -9,7 +8,6 @@
#include "config.h"
#include <Common/DateLUT.h>
#include <Common/LocalDate.h>
#include <Common/MemoryTracker.h>
#include <base/argsToConfig.h>
#include <base/LineReader.h>
@ -32,7 +30,6 @@
#include <Common/clearPasswordFromCommandLine.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/filesystemHelpers.h>
#include <Common/Config/configReadClient.h>
#include <Common/NetException.h>
#include <Storages/ColumnsDescription.h>
@ -70,10 +67,10 @@
#include <IO/WriteBufferFromOStream.h>
#include <IO/CompressionMethod.h>
#include <Client/InternalTextLogs.h>
#include <boost/algorithm/string/replace.hpp>
#include <IO/ForkWriteBuffer.h>
#include <Parsers/Kusto/ParserKQLStatement.h>
namespace fs = std::filesystem;
using namespace std::literals;
@ -1925,7 +1922,7 @@ bool ClientBase::processQueryText(const String & text)
String ClientBase::prompt() const
{
return boost::replace_all_copy(prompt_by_server_display_name, "{database}", config().getString("database", "default"));
return prompt_by_server_display_name;
}

View File

@ -393,24 +393,38 @@ MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForRead
Poco::Net::Socket::SocketList write_list;
Poco::Net::Socket::SocketList except_list;
for (const ReplicaState & state : replica_states)
{
Connection * connection = state.connection;
if (connection != nullptr)
read_list.push_back(*connection->socket);
}
auto timeout = is_draining ? drain_timeout : receive_timeout;
int n = Poco::Net::Socket::select(
read_list,
write_list,
except_list,
timeout);
int n = 0;
/// EINTR loop
while (true)
{
read_list.clear();
for (const ReplicaState & state : replica_states)
{
Connection * connection = state.connection;
if (connection != nullptr)
read_list.push_back(*connection->socket);
}
/// poco returns 0 on EINTR, let's reset errno to ensure that EINTR came from select().
errno = 0;
n = Poco::Net::Socket::select(
read_list,
write_list,
except_list,
timeout);
if (n <= 0 && errno == EINTR)
continue;
break;
}
/// We treat any error as timeout for simplicity.
/// And we also check if read_list is still empty just in case.
if (n <= 0 || read_list.empty())
{
const auto & addresses = dumpAddressesUnlocked();
for (ReplicaState & state : replica_states)
{
Connection * connection = state.connection;
@ -423,7 +437,7 @@ MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForRead
throw Exception(ErrorCodes::TIMEOUT_EXCEEDED,
"Timeout ({} ms) exceeded while reading from {}",
timeout.totalMilliseconds(),
dumpAddressesUnlocked());
addresses);
}
}

View File

@ -1,14 +1,21 @@
#include <Coordination/KeeperDispatcher.h>
#include <Poco/Path.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Common/hex.h>
#include <Common/setThreadName.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <future>
#include <chrono>
#include <Poco/Path.h>
#include <Common/hex.h>
#include <filesystem>
#include <Common/checkStackSize.h>
#include <Common/CurrentMetrics.h>
#include <future>
#include <chrono>
#include <filesystem>
#include <iterator>
#include <limits>
namespace CurrentMetrics
{
extern const Metric KeeperAliveConnections;
@ -32,9 +39,7 @@ KeeperDispatcher::KeeperDispatcher()
: responses_queue(std::numeric_limits<size_t>::max())
, configuration_and_settings(std::make_shared<KeeperConfigurationAndSettings>())
, log(&Poco::Logger::get("KeeperDispatcher"))
{
}
{}
void KeeperDispatcher::requestThread()
{
@ -191,7 +196,13 @@ void KeeperDispatcher::snapshotThread()
try
{
task.create_snapshot(std::move(task.snapshot));
auto snapshot_path = task.create_snapshot(std::move(task.snapshot));
if (snapshot_path.empty())
continue;
if (isLeader())
snapshot_s3.uploadSnapshot(snapshot_path);
}
catch (...)
{
@ -285,7 +296,9 @@ void KeeperDispatcher::initialize(const Poco::Util::AbstractConfiguration & conf
responses_thread = ThreadFromGlobalPool([this] { responseThread(); });
snapshot_thread = ThreadFromGlobalPool([this] { snapshotThread(); });
server = std::make_unique<KeeperServer>(configuration_and_settings, config, responses_queue, snapshots_queue);
snapshot_s3.startup(config);
server = std::make_unique<KeeperServer>(configuration_and_settings, config, responses_queue, snapshots_queue, snapshot_s3);
try
{
@ -312,7 +325,6 @@ void KeeperDispatcher::initialize(const Poco::Util::AbstractConfiguration & conf
/// Start it after keeper server start
session_cleaner_thread = ThreadFromGlobalPool([this] { sessionCleanerTask(); });
update_configuration_thread = ThreadFromGlobalPool([this] { updateConfigurationThread(); });
updateConfiguration(config);
LOG_DEBUG(log, "Dispatcher initialized");
}
@ -415,6 +427,8 @@ void KeeperDispatcher::shutdown()
if (server)
server->shutdown();
snapshot_s3.shutdown();
CurrentMetrics::set(CurrentMetrics::KeeperAliveConnections, 0);
}
@ -678,6 +692,8 @@ void KeeperDispatcher::updateConfiguration(const Poco::Util::AbstractConfigurati
if (!push_result)
throw Exception(ErrorCodes::SYSTEM_ERROR, "Cannot push configuration update to queue");
}
snapshot_s3.updateS3Configuration(config);
}
void KeeperDispatcher::updateKeeperStatLatency(uint64_t process_time_ms)

View File

@ -14,6 +14,7 @@
#include <Coordination/CoordinationSettings.h>
#include <Coordination/Keeper4LWInfo.h>
#include <Coordination/KeeperConnectionStats.h>
#include <Coordination/KeeperSnapshotManagerS3.h>
namespace DB
{
@ -76,6 +77,8 @@ private:
/// Counter for new session_id requests.
std::atomic<int64_t> internal_session_id_counter{0};
KeeperSnapshotManagerS3 snapshot_s3;
/// Thread put requests to raft
void requestThread();
/// Thread put responses for subscribed sessions

View File

@ -8,6 +8,7 @@
#include <string>
#include <Coordination/KeeperStateMachine.h>
#include <Coordination/KeeperStateManager.h>
#include <Coordination/KeeperSnapshotManagerS3.h>
#include <Coordination/LoggerWrapper.h>
#include <Coordination/ReadBufferFromNuraftBuffer.h>
#include <Coordination/WriteBufferFromNuraftBuffer.h>
@ -105,7 +106,8 @@ KeeperServer::KeeperServer(
const KeeperConfigurationAndSettingsPtr & configuration_and_settings_,
const Poco::Util::AbstractConfiguration & config,
ResponsesQueue & responses_queue_,
SnapshotsQueue & snapshots_queue_)
SnapshotsQueue & snapshots_queue_,
KeeperSnapshotManagerS3 & snapshot_manager_s3)
: server_id(configuration_and_settings_->server_id)
, coordination_settings(configuration_and_settings_->coordination_settings)
, log(&Poco::Logger::get("KeeperServer"))
@ -125,6 +127,7 @@ KeeperServer::KeeperServer(
configuration_and_settings_->snapshot_storage_path,
coordination_settings,
keeper_context,
config.getBool("keeper_server.upload_snapshot_on_exit", true) ? &snapshot_manager_s3 : nullptr,
checkAndGetSuperdigest(configuration_and_settings_->super_digest));
state_manager = nuraft::cs_new<KeeperStateManager>(

View File

@ -71,7 +71,8 @@ public:
const KeeperConfigurationAndSettingsPtr & settings_,
const Poco::Util::AbstractConfiguration & config_,
ResponsesQueue & responses_queue_,
SnapshotsQueue & snapshots_queue_);
SnapshotsQueue & snapshots_queue_,
KeeperSnapshotManagerS3 & snapshot_manager_s3);
/// Load state machine from the latest snapshot and load log storage. Start NuRaft with required settings.
void startup(const Poco::Util::AbstractConfiguration & config, bool enable_ipv6 = true);

View File

@ -87,7 +87,7 @@ public:
};
using KeeperStorageSnapshotPtr = std::shared_ptr<KeeperStorageSnapshot>;
using CreateSnapshotCallback = std::function<void(KeeperStorageSnapshotPtr &&)>;
using CreateSnapshotCallback = std::function<std::string(KeeperStorageSnapshotPtr &&)>;
using SnapshotMetaAndStorage = std::pair<SnapshotMetadataPtr, KeeperStoragePtr>;

View File

@ -0,0 +1,311 @@
#include <Coordination/KeeperSnapshotManagerS3.h>
#if USE_AWS_S3
#include <Core/UUID.h>
#include <Common/Exception.h>
#include <Common/setThreadName.h>
#include <IO/S3Common.h>
#include <IO/WriteBufferFromS3.h>
#include <IO/ReadBufferFromS3.h>
#include <IO/ReadBufferFromFile.h>
#include <IO/ReadHelpers.h>
#include <IO/S3/PocoHTTPClient.h>
#include <IO/WriteHelpers.h>
#include <IO/copyData.h>
#include <aws/core/auth/AWSCredentials.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/S3Errors.h>
#include <aws/s3/model/HeadObjectRequest.h>
#include <aws/s3/model/DeleteObjectRequest.h>
#include <filesystem>
namespace fs = std::filesystem;
namespace DB
{
struct KeeperSnapshotManagerS3::S3Configuration
{
S3Configuration(S3::URI uri_, S3::AuthSettings auth_settings_, std::shared_ptr<const Aws::S3::S3Client> client_)
: uri(std::move(uri_))
, auth_settings(std::move(auth_settings_))
, client(std::move(client_))
{}
S3::URI uri;
S3::AuthSettings auth_settings;
std::shared_ptr<const Aws::S3::S3Client> client;
};
KeeperSnapshotManagerS3::KeeperSnapshotManagerS3()
: snapshots_s3_queue(std::numeric_limits<size_t>::max())
, log(&Poco::Logger::get("KeeperSnapshotManagerS3"))
, uuid(UUIDHelpers::generateV4())
{}
void KeeperSnapshotManagerS3::updateS3Configuration(const Poco::Util::AbstractConfiguration & config)
{
try
{
const std::string config_prefix = "keeper_server.s3_snapshot";
if (!config.has(config_prefix))
{
std::lock_guard client_lock{snapshot_s3_client_mutex};
if (snapshot_s3_client)
LOG_INFO(log, "S3 configuration was removed");
snapshot_s3_client = nullptr;
return;
}
auto auth_settings = S3::AuthSettings::loadFromConfig(config_prefix, config);
auto endpoint = config.getString(config_prefix + ".endpoint");
auto new_uri = S3::URI{Poco::URI(endpoint)};
{
std::lock_guard client_lock{snapshot_s3_client_mutex};
// if client is not changed (same auth settings, same endpoint) we don't need to update
if (snapshot_s3_client && snapshot_s3_client->client && auth_settings == snapshot_s3_client->auth_settings
&& snapshot_s3_client->uri.uri == new_uri.uri)
return;
}
LOG_INFO(log, "S3 configuration was updated");
auto credentials = Aws::Auth::AWSCredentials(auth_settings.access_key_id, auth_settings.secret_access_key);
HeaderCollection headers = auth_settings.headers;
static constexpr size_t s3_max_redirects = 10;
static constexpr bool enable_s3_requests_logging = false;
if (!new_uri.key.empty())
{
LOG_ERROR(log, "Invalid endpoint defined for S3, it shouldn't contain key, endpoint: {}", endpoint);
return;
}
S3::PocoHTTPClientConfiguration client_configuration = S3::ClientFactory::instance().createClientConfiguration(
auth_settings.region,
RemoteHostFilter(), s3_max_redirects,
enable_s3_requests_logging,
/* for_disk_s3 = */ false);
client_configuration.endpointOverride = new_uri.endpoint;
auto client = S3::ClientFactory::instance().create(
client_configuration,
new_uri.is_virtual_hosted_style,
credentials.GetAWSAccessKeyId(),
credentials.GetAWSSecretKey(),
auth_settings.server_side_encryption_customer_key_base64,
std::move(headers),
auth_settings.use_environment_credentials.value_or(false),
auth_settings.use_insecure_imds_request.value_or(false));
auto new_client = std::make_shared<KeeperSnapshotManagerS3::S3Configuration>(std::move(new_uri), std::move(auth_settings), std::move(client));
{
std::lock_guard client_lock{snapshot_s3_client_mutex};
snapshot_s3_client = std::move(new_client);
}
LOG_INFO(log, "S3 client was updated");
}
catch (...)
{
LOG_ERROR(log, "Failed to create an S3 client for snapshots");
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
std::shared_ptr<KeeperSnapshotManagerS3::S3Configuration> KeeperSnapshotManagerS3::getSnapshotS3Client() const
{
std::lock_guard lock{snapshot_s3_client_mutex};
return snapshot_s3_client;
}
void KeeperSnapshotManagerS3::uploadSnapshotImpl(const std::string & snapshot_path)
{
try
{
auto s3_client = getSnapshotS3Client();
if (s3_client == nullptr)
return;
S3Settings::ReadWriteSettings read_write_settings;
read_write_settings.upload_part_size_multiply_parts_count_threshold = 10000;
const auto create_writer = [&](const auto & key)
{
return WriteBufferFromS3
{
s3_client->client,
s3_client->uri.bucket,
key,
read_write_settings
};
};
const auto file_exists = [&](const auto & key)
{
Aws::S3::Model::HeadObjectRequest request;
request.SetBucket(s3_client->uri.bucket);
request.SetKey(key);
auto outcome = s3_client->client->HeadObject(request);
if (outcome.IsSuccess())
return true;
const auto & error = outcome.GetError();
if (error.GetErrorType() != Aws::S3::S3Errors::NO_SUCH_KEY && error.GetErrorType() != Aws::S3::S3Errors::RESOURCE_NOT_FOUND)
throw S3Exception(error.GetErrorType(), "Failed to verify existence of lock file: {}", error.GetMessage());
return false;
};
LOG_INFO(log, "Will try to upload snapshot on {} to S3", snapshot_path);
ReadBufferFromFile snapshot_file(snapshot_path);
auto snapshot_name = fs::path(snapshot_path).filename().string();
auto lock_file = fmt::format(".{}_LOCK", snapshot_name);
if (file_exists(snapshot_name))
{
LOG_ERROR(log, "Snapshot {} already exists", snapshot_name);
return;
}
// First we need to verify that there isn't already a lock file for the snapshot we want to upload
// Only leader uploads a snapshot, but there can be a rare case where we have 2 leaders in NuRaft
if (file_exists(lock_file))
{
LOG_ERROR(log, "Lock file for {} already, exists. Probably a different node is already uploading the snapshot", snapshot_name);
return;
}
// We write our UUID to lock file
LOG_DEBUG(log, "Trying to create a lock file");
WriteBufferFromS3 lock_writer = create_writer(lock_file);
writeUUIDText(uuid, lock_writer);
lock_writer.finalize();
// We read back the written UUID; if it matches, we can upload the file
ReadBufferFromS3 lock_reader
{
s3_client->client,
s3_client->uri.bucket,
lock_file,
"",
1,
{}
};
std::string read_uuid;
readStringUntilEOF(read_uuid, lock_reader);
if (read_uuid != toString(uuid))
{
LOG_ERROR(log, "Failed to create a lock file");
return;
}
SCOPE_EXIT(
{
LOG_INFO(log, "Removing lock file");
try
{
Aws::S3::Model::DeleteObjectRequest delete_request;
delete_request.SetBucket(s3_client->uri.bucket);
delete_request.SetKey(lock_file);
auto delete_outcome = s3_client->client->DeleteObject(delete_request);
if (!delete_outcome.IsSuccess())
throw S3Exception(delete_outcome.GetError().GetMessage(), delete_outcome.GetError().GetErrorType());
}
catch (...)
{
LOG_INFO(log, "Failed to delete lock file for {} from S3", snapshot_path);
tryLogCurrentException(__PRETTY_FUNCTION__);
}
});
WriteBufferFromS3 snapshot_writer = create_writer(snapshot_name);
copyData(snapshot_file, snapshot_writer);
snapshot_writer.finalize();
LOG_INFO(log, "Successfully uploaded {} to S3", snapshot_path);
}
catch (...)
{
LOG_INFO(log, "Failure during upload of {} to S3", snapshot_path);
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
void KeeperSnapshotManagerS3::snapshotS3Thread()
{
setThreadName("KeeperS3SnpT");
while (!shutdown_called)
{
std::string snapshot_path;
if (!snapshots_s3_queue.pop(snapshot_path))
break;
if (shutdown_called)
break;
uploadSnapshotImpl(snapshot_path);
}
}
void KeeperSnapshotManagerS3::uploadSnapshot(const std::string & path, bool async_upload)
{
if (getSnapshotS3Client() == nullptr)
return;
if (async_upload)
{
if (!snapshots_s3_queue.push(path))
LOG_WARNING(log, "Failed to add snapshot {} to S3 queue", path);
return;
}
uploadSnapshotImpl(path);
}
void KeeperSnapshotManagerS3::startup(const Poco::Util::AbstractConfiguration & config)
{
updateS3Configuration(config);
snapshot_s3_thread = ThreadFromGlobalPool([this] { snapshotS3Thread(); });
}
void KeeperSnapshotManagerS3::shutdown()
{
if (shutdown_called)
return;
LOG_DEBUG(log, "Shutting down KeeperSnapshotManagerS3");
shutdown_called = true;
try
{
snapshots_s3_queue.finish();
if (snapshot_s3_thread.joinable())
snapshot_s3_thread.join();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
LOG_INFO(log, "KeeperSnapshotManagerS3 shut down");
}
}
#endif

View File

@ -0,0 +1,68 @@
#pragma once
#include "config.h"
#include <Poco/Util/AbstractConfiguration.h>
#if USE_AWS_S3
#include <Common/ConcurrentBoundedQueue.h>
#include <Common/ThreadPool.h>
#include <Common/logger_useful.h>
#include <string>
#endif
namespace DB
{
#if USE_AWS_S3
class KeeperSnapshotManagerS3
{
public:
KeeperSnapshotManagerS3();
void updateS3Configuration(const Poco::Util::AbstractConfiguration & config);
void uploadSnapshot(const std::string & path, bool async_upload = true);
void startup(const Poco::Util::AbstractConfiguration & config);
void shutdown();
private:
using SnapshotS3Queue = ConcurrentBoundedQueue<std::string>;
SnapshotS3Queue snapshots_s3_queue;
/// Upload new snapshots to S3
ThreadFromGlobalPool snapshot_s3_thread;
struct S3Configuration;
mutable std::mutex snapshot_s3_client_mutex;
std::shared_ptr<S3Configuration> snapshot_s3_client;
std::atomic<bool> shutdown_called{false};
Poco::Logger * log;
UUID uuid;
std::shared_ptr<S3Configuration> getSnapshotS3Client() const;
void uploadSnapshotImpl(const std::string & snapshot_path);
/// Thread that uploads snapshots to S3 in the background
void snapshotS3Thread();
};
#else
class KeeperSnapshotManagerS3
{
public:
KeeperSnapshotManagerS3() = default;
void updateS3Configuration(const Poco::Util::AbstractConfiguration &) {}
void uploadSnapshot(const std::string &, [[maybe_unused]] bool async_upload = true) {}
void startup(const Poco::Util::AbstractConfiguration &) {}
void shutdown() {}
};
#endif
}

View File

@ -44,6 +44,7 @@ KeeperStateMachine::KeeperStateMachine(
const std::string & snapshots_path_,
const CoordinationSettingsPtr & coordination_settings_,
const KeeperContextPtr & keeper_context_,
KeeperSnapshotManagerS3 * snapshot_manager_s3_,
const std::string & superdigest_)
: coordination_settings(coordination_settings_)
, snapshot_manager(
@ -59,6 +60,7 @@ KeeperStateMachine::KeeperStateMachine(
, log(&Poco::Logger::get("KeeperStateMachine"))
, superdigest(superdigest_)
, keeper_context(keeper_context_)
, snapshot_manager_s3(snapshot_manager_s3_)
{
}
@ -400,13 +402,22 @@ void KeeperStateMachine::create_snapshot(nuraft::snapshot & s, nuraft::async_res
}
when_done(ret, exception);
return ret ? latest_snapshot_path : "";
};
if (keeper_context->server_state == KeeperContext::Phase::SHUTDOWN)
{
LOG_INFO(log, "Creating a snapshot during shutdown because 'create_snapshot_on_exit' is enabled.");
snapshot_task.create_snapshot(std::move(snapshot_task.snapshot));
auto snapshot_path = snapshot_task.create_snapshot(std::move(snapshot_task.snapshot));
if (!snapshot_path.empty() && snapshot_manager_s3)
{
LOG_INFO(log, "Uploading snapshot {} during shutdown because 'upload_snapshot_on_exit' is enabled.", snapshot_path);
snapshot_manager_s3->uploadSnapshot(snapshot_path, /* async_upload */ false);
}
return;
}

View File

@ -2,11 +2,13 @@
#include <Coordination/CoordinationSettings.h>
#include <Coordination/KeeperSnapshotManager.h>
#include <Coordination/KeeperSnapshotManagerS3.h>
#include <Coordination/KeeperContext.h>
#include <Coordination/KeeperStorage.h>
#include <libnuraft/nuraft.hxx>
#include <Common/ConcurrentBoundedQueue.h>
#include <Common/logger_useful.h>
#include <Coordination/KeeperContext.h>
namespace DB
@ -26,6 +28,7 @@ public:
const std::string & snapshots_path_,
const CoordinationSettingsPtr & coordination_settings_,
const KeeperContextPtr & keeper_context_,
KeeperSnapshotManagerS3 * snapshot_manager_s3_,
const std::string & superdigest_ = "");
/// Read state from the latest snapshot
@ -146,6 +149,8 @@ private:
const std::string superdigest;
KeeperContextPtr keeper_context;
KeeperSnapshotManagerS3 * snapshot_manager_s3;
};
}

View File

@ -1318,7 +1318,7 @@ void testLogAndStateMachine(Coordination::CoordinationSettingsPtr settings, uint
ResponsesQueue queue(std::numeric_limits<size_t>::max());
SnapshotsQueue snapshots_queue{1};
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context);
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context, nullptr);
state_machine->init();
DB::KeeperLogStore changelog("./logs", settings->rotate_log_storage_interval, true, enable_compression);
changelog.init(state_machine->last_commit_index() + 1, settings->reserved_log_items);
@ -1359,7 +1359,7 @@ void testLogAndStateMachine(Coordination::CoordinationSettingsPtr settings, uint
}
SnapshotsQueue snapshots_queue1{1};
auto restore_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue1, "./snapshots", settings, keeper_context);
auto restore_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue1, "./snapshots", settings, keeper_context, nullptr);
restore_machine->init();
EXPECT_EQ(restore_machine->last_commit_index(), total_logs - total_logs % settings->snapshot_distance);
@ -1471,7 +1471,7 @@ TEST_P(CoordinationTest, TestEphemeralNodeRemove)
ResponsesQueue queue(std::numeric_limits<size_t>::max());
SnapshotsQueue snapshots_queue{1};
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context);
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context, nullptr);
state_machine->init();
std::shared_ptr<ZooKeeperCreateRequest> request_c = std::make_shared<ZooKeeperCreateRequest>();

View File

@ -76,9 +76,9 @@ void SerializationDate::serializeTextCSV(const IColumn & column, size_t row_num,
void SerializationDate::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings &) const
{
LocalDate value;
DayNum value;
readCSV(value, istr);
assert_cast<ColumnUInt16 &>(column).getData().push_back(value.getDayNum());
assert_cast<ColumnUInt16 &>(column).getData().push_back(value);
}
}

View File

@ -241,6 +241,11 @@ DiskObjectStoragePtr DiskDecorator::createDiskObjectStorage()
return delegate->createDiskObjectStorage();
}
ObjectStoragePtr DiskDecorator::getObjectStorage()
{
return delegate->getObjectStorage();
}
DiskPtr DiskDecorator::getNestedDisk() const
{
if (const auto * decorator = dynamic_cast<const DiskDecorator *>(delegate.get()))

View File

@ -89,6 +89,7 @@ public:
void getRemotePathsRecursive(const String & path, std::vector<LocalPathWithObjectStoragePaths> & paths_map) override { return delegate->getRemotePathsRecursive(path, paths_map); }
DiskObjectStoragePtr createDiskObjectStorage() override;
ObjectStoragePtr getObjectStorage() override;
NameSet getCacheLayersNames() const override { return delegate->getCacheLayersNames(); }
MetadataStoragePtr getMetadataStorage() override { return delegate->getMetadataStorage(); }

View File

@ -366,6 +366,14 @@ public:
/// Return current disk revision.
virtual UInt64 getRevision() const { return 0; }
virtual ObjectStoragePtr getObjectStorage()
{
throw Exception(
ErrorCodes::NOT_IMPLEMENTED,
"Method getObjectStorage() is not implemented for disk type: {}",
getDataSourceDescription().type);
}
/// Create disk object storage according to disk type.
/// For example for DiskLocal create DiskObjectStorage(LocalObjectStorage),
/// for DiskObjectStorage create just a copy.

View File

@ -82,6 +82,11 @@ DiskTransactionPtr DiskObjectStorage::createTransaction()
return std::make_shared<FakeDiskTransaction>(*this);
}
ObjectStoragePtr DiskObjectStorage::getObjectStorage()
{
return object_storage;
}
DiskTransactionPtr DiskObjectStorage::createObjectStorageTransaction()
{
return std::make_shared<DiskObjectStorageTransaction>(

View File

@ -166,6 +166,8 @@ public:
UInt64 getRevision() const override;
ObjectStoragePtr getObjectStorage() override;
DiskObjectStoragePtr createDiskObjectStorage() override;
bool supportsCache() const override;

View File

@ -11,6 +11,7 @@
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime64.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/getLeastSupertype.h>
@ -875,4 +876,19 @@ String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings, Fo
return result;
}
void checkSupportedDelimiterAfterField(FormatSettings::EscapingRule escaping_rule, const String & delimiter, const DataTypePtr & type)
{
if (escaping_rule != FormatSettings::EscapingRule::Escaped)
return;
bool is_supported_delimiter_after_string = !delimiter.empty() && (delimiter.front() == '\t' || delimiter.front() == '\n');
if (is_supported_delimiter_after_string)
return;
/// Nullptr means that field is skipped and it's equivalent to String
if (!type || isString(removeNullable(removeLowCardinality(type))))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "'Escaped' serialization requires delimiter after String field to start with '\\t' or '\\n'");
}
}
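A hedged illustration of the check above (the file name, structure, and delimiter are hypothetical); with the `Escaped` rule a String field is read up to '\t' or '\n', so a field delimiter such as ';' is rejected:
SET format_custom_escaping_rule = 'Escaped';
SET format_custom_field_delimiter = ';';  -- does not start with '\t' or '\n'
SELECT * FROM file('data.txt', 'CustomSeparated', 's String, n UInt32');
-- expected to fail with BAD_ARGUMENTS per checkSupportedDelimiterAfterField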

View File

@ -77,6 +77,8 @@ void transformInferredTypesIfNeeded(DataTypePtr & first, DataTypePtr & second, c
void transformInferredJSONTypesIfNeeded(DataTypes & types, const FormatSettings & settings, const std::unordered_set<const IDataType *> * numbers_parsed_from_json_strings = nullptr);
void transformInferredJSONTypesIfNeeded(DataTypePtr & first, DataTypePtr & second, const FormatSettings & settings);
String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings,FormatSettings::EscapingRule escaping_rule);
String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings, FormatSettings::EscapingRule escaping_rule);
void checkSupportedDelimiterAfterField(FormatSettings::EscapingRule escaping_rule, const String & delimiter, const DataTypePtr & type);
}

View File

@ -0,0 +1,472 @@
#include <Functions/IFunction.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/FunctionFactory.h>
#include "Common/Exception.h"
#include <Common/NaNUtils.h>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <Common/FieldVisitorConvertToNumber.h>
#include <Common/ProfileEvents.h>
#include <Common/assert_cast.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/Context_fwd.h>
#include <random>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
extern const int BAD_ARGUMENTS;
extern const int LOGICAL_ERROR;
}
namespace
{
struct UniformDistribution
{
using ReturnType = DataTypeFloat64;
static constexpr const char * getName() { return "randUniform"; }
static constexpr size_t getNumberOfArguments() { return 2; }
static void generate(Float64 min, Float64 max, ColumnFloat64::Container & container)
{
auto distribution = std::uniform_real_distribution<>(min, max);
for (auto & elem : container)
elem = distribution(thread_local_rng);
}
};
struct NormalDistribution
{
using ReturnType = DataTypeFloat64;
static constexpr const char * getName() { return "randNormal"; }
static constexpr size_t getNumberOfArguments() { return 2; }
static void generate(Float64 mean, Float64 stddev, ColumnFloat64::Container & container)
{
auto distribution = std::normal_distribution<>(mean, stddev);
for (auto & elem : container)
elem = distribution(thread_local_rng);
}
};
struct LogNormalDistribution
{
using ReturnType = DataTypeFloat64;
static constexpr const char * getName() { return "randLogNormal"; }
static constexpr size_t getNumberOfArguments() { return 2; }
static void generate(Float64 mean, Float64 stddev, ColumnFloat64::Container & container)
{
auto distribution = std::lognormal_distribution<>(mean, stddev);
for (auto & elem : container)
elem = distribution(thread_local_rng);
}
};
struct ExponentialDistribution
{
using ReturnType = DataTypeFloat64;
static constexpr const char * getName() { return "randExponential"; }
static constexpr size_t getNumberOfArguments() { return 1; }
static void generate(Float64 lambda, ColumnFloat64::Container & container)
{
auto distribution = std::exponential_distribution<>(lambda);
for (auto & elem : container)
elem = distribution(thread_local_rng);
}
};
struct ChiSquaredDistribution
{
using ReturnType = DataTypeFloat64;
static constexpr const char * getName() { return "randChiSquared"; }
static constexpr size_t getNumberOfArguments() { return 1; }
static void generate(Float64 degree_of_freedom, ColumnFloat64::Container & container)
{
auto distribution = std::chi_squared_distribution<>(degree_of_freedom);
for (auto & elem : container)
elem = distribution(thread_local_rng);
}
};
struct StudentTDistribution
{
using ReturnType = DataTypeFloat64;
static constexpr const char * getName() { return "randStudentT"; }
static constexpr size_t getNumberOfArguments() { return 1; }
static void generate(Float64 degree_of_freedom, ColumnFloat64::Container & container)
{
auto distribution = std::student_t_distribution<>(degree_of_freedom);
for (auto & elem : container)
elem = distribution(thread_local_rng);
}
};
struct FisherFDistribution
{
using ReturnType = DataTypeFloat64;
static constexpr const char * getName() { return "randFisherF"; }
static constexpr size_t getNumberOfArguments() { return 2; }
static void generate(Float64 d1, Float64 d2, ColumnFloat64::Container & container)
{
auto distribution = std::fisher_f_distribution<>(d1, d2);
for (auto & elem : container)
elem = distribution(thread_local_rng);
}
};
struct BernoulliDistribution
{
using ReturnType = DataTypeUInt8;
static constexpr const char * getName() { return "randBernoulli"; }
static constexpr size_t getNumberOfArguments() { return 1; }
static void generate(Float64 p, ColumnUInt8::Container & container)
{
if (p < 0.0f || p > 1.0f)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of function {} should be inside [0, 1] because it is a probability", getName());
auto distribution = std::bernoulli_distribution(p);
for (auto & elem : container)
elem = static_cast<UInt8>(distribution(thread_local_rng));
}
};
struct BinomialDistribution
{
using ReturnType = DataTypeUInt64;
static constexpr const char * getName() { return "randBinomial"; }
static constexpr size_t getNumberOfArguments() { return 2; }
static void generate(UInt64 t, Float64 p, ColumnUInt64::Container & container)
{
if (p < 0.0f || p > 1.0f)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of function {} should be inside [0, 1] because it is a probability", getName());
auto distribution = std::binomial_distribution(t, p);
for (auto & elem : container)
elem = static_cast<UInt64>(distribution(thread_local_rng));
}
};
struct NegativeBinomialDistribution
{
using ReturnType = DataTypeUInt64;
static constexpr const char * getName() { return "randNegativeBinomial"; }
static constexpr size_t getNumberOfArguments() { return 2; }
static void generate(UInt64 t, Float64 p, ColumnUInt64::Container & container)
{
if (p < 0.0f || p > 1.0f)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of function {} should be inside [0, 1] because it is a probability", getName());
auto distribution = std::negative_binomial_distribution(t, p);
for (auto & elem : container)
elem = static_cast<UInt64>(distribution(thread_local_rng));
}
};
struct PoissonDistribution
{
using ReturnType = DataTypeUInt64;
static constexpr const char * getName() { return "randPoisson"; }
static constexpr size_t getNumberOfArguments() { return 1; }
static void generate(UInt64 n, ColumnUInt64::Container & container)
{
auto distribution = std::poisson_distribution(n);
for (auto & elem : container)
elem = static_cast<UInt64>(distribution(thread_local_rng));
}
};
}
/** Function which will generate values according to the specified distribution
* Accepts only constant arguments
* Similar to the functions rand and rand64, an additional 'tag' argument can be added to the
* end of the argument list (this argument is ignored); it guarantees that the calls are not
* stuck together during optimisations.
* Example: SELECT randNormal(0, 1, 1), randNormal(0, 1, 2) FROM numbers(10)
* This query will return two different columns
*/
template <typename Distribution>
class FunctionRandomDistribution : public IFunction
{
private:
template <typename ResultType>
ResultType getParameterFromConstColumn(size_t parameter_number, const ColumnsWithTypeAndName & arguments) const
{
if (parameter_number >= arguments.size())
throw Exception(
ErrorCodes::LOGICAL_ERROR, "Parameter number ({}) is greater than the size of arguments ({}). This is a bug", parameter_number, arguments.size());
const IColumn * col = arguments[parameter_number].column.get();
if (!isColumnConst(*col))
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Parameter number {} of function {} must be constant.", parameter_number, getName());
auto parameter = applyVisitor(FieldVisitorConvertToNumber<ResultType>(), assert_cast<const ColumnConst &>(*col).getField());
if (isNaN(parameter) || !std::isfinite(parameter))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Parameter number {} of function {} cannot be NaN of infinite", parameter_number, getName());
return parameter;
}
public:
static FunctionPtr create(ContextPtr)
{
return std::make_shared<FunctionRandomDistribution<Distribution>>();
}
static constexpr auto name = Distribution::getName();
String getName() const override { return name; }
size_t getNumberOfArguments() const override { return Distribution::getNumberOfArguments(); }
bool isVariadic() const override { return true; }
bool isDeterministic() const override { return false; }
bool isDeterministicInScopeOfQuery() const override { return false; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
auto desired = Distribution::getNumberOfArguments();
if (arguments.size() != desired && arguments.size() != desired + 1)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Wrong number of arguments for function {}. Should be {} or {}", getName(), desired, desired + 1);
for (size_t i = 0; i < Distribution::getNumberOfArguments(); ++i)
{
const auto & type = arguments[i];
WhichDataType which(type);
if (!which.isFloat() && !which.isNativeUInt())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument of function {}, expected Float64 or integer", type->getName(), getName());
}
return std::make_shared<typename Distribution::ReturnType>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
{
if constexpr (std::is_same_v<Distribution, BernoulliDistribution>)
{
auto res_column = ColumnUInt8::create(input_rows_count);
auto & res_data = res_column->getData();
Distribution::generate(getParameterFromConstColumn<Float64>(0, arguments), res_data);
return res_column;
}
else if constexpr (std::is_same_v<Distribution, BinomialDistribution> || std::is_same_v<Distribution, NegativeBinomialDistribution>)
{
auto res_column = ColumnUInt64::create(input_rows_count);
auto & res_data = res_column->getData();
Distribution::generate(getParameterFromConstColumn<UInt64>(0, arguments), getParameterFromConstColumn<Float64>(1, arguments), res_data);
return res_column;
}
else if constexpr (std::is_same_v<Distribution, PoissonDistribution>)
{
auto res_column = ColumnUInt64::create(input_rows_count);
auto & res_data = res_column->getData();
Distribution::generate(getParameterFromConstColumn<UInt64>(0, arguments), res_data);
return res_column;
}
else
{
auto res_column = ColumnFloat64::create(input_rows_count);
auto & res_data = res_column->getData();
if constexpr (Distribution::getNumberOfArguments() == 1)
{
Distribution::generate(getParameterFromConstColumn<Float64>(0, arguments), res_data);
}
else if constexpr (Distribution::getNumberOfArguments() == 2)
{
Distribution::generate(getParameterFromConstColumn<Float64>(0, arguments), getParameterFromConstColumn<Float64>(1, arguments), res_data);
}
else
{
throw Exception(ErrorCodes::BAD_ARGUMENTS, "More than two argument specified for function {}", getName());
}
return res_column;
}
}
};
REGISTER_FUNCTION(Distribution)
{
factory.registerFunction<FunctionRandomDistribution<UniformDistribution>>(
{
R"(
Returns a random number from the uniform distribution in the specified range.
Accepts two parameters - minimum bound and maximum bound.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randUniform(0, 1) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<NormalDistribution>>(
{
R"(
Returns a random number from the normal distribution.
Accepts two parameters - mean and variance.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randNormal(0, 5) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<LogNormalDistribution>>(
{
R"(
Returns a random number from the lognormal distribution (a distribution of a random variable whose logarithm is normally distributed).
Accepts two parameters - mean and variance.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randLogNormal(0, 5) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<ExponentialDistribution>>(
{
R"(
Returns a random number from the exponential distribution.
Accepts one parameter.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randExponential(0, 5) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<ChiSquaredDistribution>>(
{
R"(
Returns a random number from the chi-squared distribution (a distribution of a sum of the squares of k independent standard normal random variables).
Accepts one parameter - the degrees of freedom.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randChiSquared(5) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<StudentTDistribution>>(
{
R"(
Returns a random number from Student's t-distribution.
Accepts one parameter - the degrees of freedom.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randStudentT(5) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<FisherFDistribution>>(
{
R"(
Returns a random number from the F-distribution.
The F-distribution is the distribution of X = (S1 / d1) / (S2 / d2) where d1 and d2 are degrees of freedom.
Accepts two parameters - the degrees of freedom d1 and d2.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randFisherF(5) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<BernoulliDistribution>>(
{
R"(
Returns a random number from the Bernoulli distribution.
Accepts one parameter - the probability of success.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randBernoulli(0.1) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<BinomialDistribution>>(
{
R"(
Returns a random number from the binomial distribution.
Accepts two parameters - number of experiments and probability of success in each experiment.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randBinomial(10, 0.1) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<NegativeBinomialDistribution>>(
{
R"(
Returns a random number from the negative binomial distribution.
Accepts two parameters - number of experiments and probability of success in each experiment.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randNegativeBinomial(10, 0.1) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
factory.registerFunction<FunctionRandomDistribution<PoissonDistribution>>(
{
R"(
Returns a random number from the Poisson distribution.
Accepts one parameter - the mean number of occurrences.
Typical usage:
[example:typical]
)",
Documentation::Examples{
{"typical", "SELECT randPoisson(3) FROM numbers(100000);"}},
Documentation::Categories{"Distribution"}
});
}
}
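A quick sanity check of these functions (an illustrative sketch, not part of this commit; exact numbers vary with the random seed) is to aggregate the generated values and compare them with the declared parameters:

SELECT min(x), max(x), round(avg(x), 2) AS mean
FROM (SELECT randUniform(0, 1) AS x FROM numbers(100000));
-- min and max stay within [0, 1), and the mean is close to 0.5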

View File

@ -1095,6 +1095,7 @@ inline void readText(is_floating_point auto & x, ReadBuffer & buf) { readFloatTe
inline void readText(String & x, ReadBuffer & buf) { readEscapedString(x, buf); }
inline void readText(LocalDate & x, ReadBuffer & buf) { readDateText(x, buf); }
inline void readText(DayNum & x, ReadBuffer & buf) { readDateText(x, buf); }
inline void readText(LocalDateTime & x, ReadBuffer & buf) { readDateTimeText(x, buf); }
inline void readText(UUID & x, ReadBuffer & buf) { readUUIDText(x, buf); }
@ -1176,6 +1177,7 @@ inline void readCSV(T & x, ReadBuffer & buf)
inline void readCSV(String & x, ReadBuffer & buf, const FormatSettings::CSV & settings) { readCSVString(x, buf, settings); }
inline void readCSV(LocalDate & x, ReadBuffer & buf) { readCSVSimple(x, buf); }
inline void readCSV(DayNum & x, ReadBuffer & buf) { readCSVSimple(x, buf); }
inline void readCSV(LocalDateTime & x, ReadBuffer & buf) { readCSVSimple(x, buf); }
inline void readCSV(UUID & x, ReadBuffer & buf) { readCSVSimple(x, buf); }
inline void readCSV(UInt128 & x, ReadBuffer & buf) { readCSVSimple(x, buf); }

View File

@ -2,20 +2,22 @@
#include "config.h"
#include <string>
#include <vector>
#if USE_AWS_S3
#include <Common/RemoteHostFilter.h>
#include <IO/ConnectionTimeouts.h>
#include <IO/HTTPCommon.h>
#include <IO/S3/SessionAwareIOStream.h>
#include <Storages/StorageS3Settings.h>
#include <Storages/HeaderCollection.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/core/http/HttpClient.h>
#include <aws/core/http/HttpRequest.h>
#include <aws/core/http/standard/StandardHttpResponse.h>
namespace Aws::Http::Standard
{
class StandardHttpResponse;
@ -23,6 +25,7 @@ class StandardHttpResponse;
namespace DB
{
class Context;
}

View File

@ -1,9 +1,11 @@
#include <IO/S3Common.h>
#include <Common/Exception.h>
#include <Poco/Util/AbstractConfiguration.h>
#include "config.h"
#if USE_AWS_S3
# include <IO/S3Common.h>
# include <Common/quoteString.h>
# include <IO/WriteBufferFromString.h>
@ -780,25 +782,16 @@ namespace S3
boost::to_upper(name);
if (name != S3 && name != COS && name != OBS && name != OSS)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Object storage system name is unrecognized in virtual hosted style S3 URI: {}", quoteString(name));
}
if (name == S3)
{
storage_name = name;
}
else if (name == OBS)
{
storage_name = OBS;
}
else if (name == OSS)
{
storage_name = OSS;
}
else
{
storage_name = COSN;
}
}
else if (re2::RE2::PartialMatch(uri.getPath(), path_style_pattern, &bucket, &key))
{
@ -851,8 +844,82 @@ namespace S3
{
return getObjectInfo(client_ptr, bucket, key, version_id, throw_on_error, for_disk_s3).size;
}
}
}
#endif
namespace DB
{
namespace ErrorCodes
{
extern const int INVALID_CONFIG_PARAMETER;
}
namespace S3
{
AuthSettings AuthSettings::loadFromConfig(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config)
{
auto access_key_id = config.getString(config_elem + ".access_key_id", "");
auto secret_access_key = config.getString(config_elem + ".secret_access_key", "");
auto region = config.getString(config_elem + ".region", "");
auto server_side_encryption_customer_key_base64 = config.getString(config_elem + ".server_side_encryption_customer_key_base64", "");
std::optional<bool> use_environment_credentials;
if (config.has(config_elem + ".use_environment_credentials"))
use_environment_credentials = config.getBool(config_elem + ".use_environment_credentials");
std::optional<bool> use_insecure_imds_request;
if (config.has(config_elem + ".use_insecure_imds_request"))
use_insecure_imds_request = config.getBool(config_elem + ".use_insecure_imds_request");
HeaderCollection headers;
Poco::Util::AbstractConfiguration::Keys subconfig_keys;
config.keys(config_elem, subconfig_keys);
for (const std::string & subkey : subconfig_keys)
{
if (subkey.starts_with("header"))
{
auto header_str = config.getString(config_elem + "." + subkey);
auto delimiter = header_str.find(':');
if (delimiter == std::string::npos)
throw Exception("Malformed s3 header value", ErrorCodes::INVALID_CONFIG_PARAMETER);
headers.emplace_back(HttpHeader{header_str.substr(0, delimiter), header_str.substr(delimiter + 1, String::npos)});
}
}
return AuthSettings
{
std::move(access_key_id), std::move(secret_access_key),
std::move(region),
std::move(server_side_encryption_customer_key_base64),
std::move(headers),
use_environment_credentials,
use_insecure_imds_request
};
}
void AuthSettings::updateFrom(const AuthSettings & from)
{
/// Update only those parameters (with a check for emptiness) which
/// can be passed not only from the config, but also via the query AST.
if (!from.access_key_id.empty())
access_key_id = from.access_key_id;
if (!from.secret_access_key.empty())
secret_access_key = from.secret_access_key;
headers = from.headers;
region = from.region;
server_side_encryption_customer_key_base64 = from.server_side_encryption_customer_key_base64;
use_environment_credentials = from.use_environment_credentials;
use_insecure_imds_request = from.use_insecure_imds_request;
}
}
}
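For context (an illustrative sketch; the bucket and credentials are hypothetical): the fields that updateFrom overrides only when non-empty are exactly the ones a user can pass inline in a query via the s3 table function, while region, headers and the IMDS flags come from the server config:

SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/data/*.csv', 'AKIA_EXAMPLE_KEY', 'EXAMPLE_SECRET', 'CSV');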

View File

@ -1,5 +1,11 @@
#pragma once
#include <Storages/HeaderCollection.h>
#include <IO/S3/PocoHTTPClient.h>
#include <string>
#include <optional>
#include "config.h"
#if USE_AWS_S3
@ -8,7 +14,6 @@
#include <aws/core/Aws.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/s3/S3Errors.h>
#include <IO/S3/PocoHTTPClient.h>
#include <Poco/URI.h>
#include <Common/Exception.h>
@ -27,8 +32,6 @@ namespace ErrorCodes
}
class RemoteHostFilter;
struct HttpHeader;
using HeaderCollection = std::vector<HttpHeader>;
class S3Exception : public Exception
{
@ -130,5 +133,33 @@ S3::ObjectInfo getObjectInfo(std::shared_ptr<const Aws::S3::S3Client> client_ptr
size_t getObjectSize(std::shared_ptr<const Aws::S3::S3Client> client_ptr, const String & bucket, const String & key, const String & version_id, bool throw_on_error, bool for_disk_s3);
}
#endif
namespace Poco::Util
{
class AbstractConfiguration;
};
namespace DB::S3
{
struct AuthSettings
{
static AuthSettings loadFromConfig(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config);
std::string access_key_id;
std::string secret_access_key;
std::string region;
std::string server_side_encryption_customer_key_base64;
HeaderCollection headers;
std::optional<bool> use_environment_credentials;
std::optional<bool> use_insecure_imds_request;
bool operator==(const AuthSettings & other) const = default;
void updateFrom(const AuthSettings & from);
};
}

View File

@ -66,7 +66,7 @@ FileSegment::FileSegment(
{
throw Exception(
ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR,
"Can create cell with either EMPTY, DOWNLOADED, DOWNLOADING state");
"Can only create cell with either EMPTY, DOWNLOADED or SKIP_CACHE state");
}
}
}

View File

@ -66,10 +66,10 @@ public:
*/
DOWNLOADING,
/**
* Space reservation for a file segment is incremental, i.e. downaloder reads buffer_size bytes
* Space reservation for a file segment is incremental, i.e. downloader reads buffer_size bytes
* from remote fs -> tries to reserve buffer_size bytes to put them to cache -> writes to cache
* on successful reservation and stops cache write otherwise. Those, who waited for the same file
* file segment, will read downloaded part from cache and remaining part directly from remote fs.
* segment, will read downloaded part from cache and remaining part directly from remote fs.
*/
PARTIALLY_DOWNLOADED_NO_CONTINUATION,
/**

View File

@ -7,6 +7,7 @@
#include <Interpreters/ProcessList.h>
#include <Interpreters/OptimizeShardingKeyRewriteInVisitor.h>
#include <QueryPipeline/Pipe.h>
#include <Parsers/queryToString.h>
#include <Processors/QueryPlan/QueryPlan.h>
#include <Processors/QueryPlan/ReadFromRemote.h>
#include <Processors/QueryPlan/UnionStep.h>
@ -26,7 +27,7 @@ namespace ErrorCodes
namespace ClusterProxy
{
ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr context, const Settings & settings, Poco::Logger * log)
ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr context, const Settings & settings, const StorageID & main_table, const SelectQueryInfo * query_info, Poco::Logger * log)
{
Settings new_settings = settings;
new_settings.queue_max_wait_ms = Cluster::saturate(new_settings.queue_max_wait_ms, settings.max_execution_time);
@ -96,6 +97,20 @@ ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr c
new_settings.limit.changed = false;
}
/// The setting additional_table_filters may be applied to a Distributed table.
/// If the query is executed up to WithMergeableState on a remote shard, it is impossible to filter on the initiator.
/// We need to propagate the setting, but change the table name from the distributed table to the source table.
///
/// We don't try to analyze the setting again here. If query_info->additional_filter_ast is not empty, some filter was applied.
/// It's just easier to add this filter for the source table.
if (query_info && query_info->additional_filter_ast)
{
Tuple tuple;
tuple.push_back(main_table.getShortName());
tuple.push_back(queryToString(query_info->additional_filter_ast));
new_settings.additional_table_filters.value.push_back(std::move(tuple));
}
auto new_context = Context::createCopy(context);
new_context->setSettings(new_settings);
return new_context;
@ -121,7 +136,7 @@ void executeQuery(
std::vector<QueryPlanPtr> plans;
SelectStreamFactory::Shards remote_shards;
auto new_context = updateSettingsForCluster(*query_info.getCluster(), context, settings, log);
auto new_context = updateSettingsForCluster(*query_info.getCluster(), context, settings, main_table, &query_info, log);
new_context->getClientInfo().distributed_depth += 1;
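A sketch of the behaviour this enables (the table and filter are hypothetical): a filter declared for the Distributed table on the initiator is re-attached to the underlying local table name before the query is sent to the shards, so the rows are filtered remotely:

SELECT count()
FROM distributed_table
SETTINGS additional_table_filters = {'distributed_table': 'x > 0'};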

View File

@ -35,7 +35,7 @@ class SelectStreamFactory;
///
/// @return new Context with adjusted settings
ContextMutablePtr updateSettingsForCluster(
const Cluster & cluster, ContextPtr context, const Settings & settings, Poco::Logger * log = nullptr);
const Cluster & cluster, ContextPtr context, const Settings & settings, const StorageID & main_table, const SelectQueryInfo * query_info = nullptr, Poco::Logger * log = nullptr);
/// Execute a distributed query, creating a query plan, from which the query pipeline can be built.
/// `stream_factory` object encapsulates the logic of creating plans for a different type of query

View File

@ -114,7 +114,7 @@ DDLWorker::DDLWorker(
void DDLWorker::startup()
{
[[maybe_unused]] bool prev_stop_flag = stop_flag.exchange(false);
chassert(true);
chassert(prev_stop_flag);
main_thread = ThreadFromGlobalPool(&DDLWorker::runMainThread, this);
cleanup_thread = ThreadFromGlobalPool(&DDLWorker::runCleanupThread, this);
}

View File

@ -67,6 +67,19 @@ CustomSeparatedRowInputFormat::CustomSeparatedRowInputFormat(
}
}
void CustomSeparatedRowInputFormat::readPrefix()
{
RowInputFormatWithNamesAndTypes::readPrefix();
/// Provide a better error message for unsupported delimiters
for (const auto & column_index : column_mapping->column_indexes_for_input_fields)
{
if (column_index)
checkSupportedDelimiterAfterField(format_settings.custom.escaping_rule, format_settings.custom.field_delimiter, data_types[*column_index]);
else
checkSupportedDelimiterAfterField(format_settings.custom.escaping_rule, format_settings.custom.field_delimiter, nullptr);
}
}
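For illustration (a sketch; the file name and data are made up, and the exact set of rejected combinations is defined by checkSupportedDelimiterAfterField): the new readPrefix validates the escaping rule against the field delimiter before any rows are parsed, e.g. for a query like:

SELECT *
FROM file('data.txt', CustomSeparated, 'a UInt32, b String')
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV';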
bool CustomSeparatedRowInputFormat::allowSyncAfterError() const
{

View File

@ -30,6 +30,7 @@ private:
bool allowSyncAfterError() const override;
void syncAfterError() override;
void readPrefix() override;
std::unique_ptr<PeekableReadBuffer> buf;
bool ignore_spaces;

View File

@ -53,18 +53,25 @@ TemplateRowInputFormat::TemplateRowInputFormat(const Block & header_, std::uniqu
std::vector<UInt8> column_in_format(header_.columns(), false);
for (size_t i = 0; i < row_format.columnsCount(); ++i)
{
if (row_format.format_idx_to_column_idx[i])
const auto & column_index = row_format.format_idx_to_column_idx[i];
if (column_index)
{
if (header_.columns() <= *row_format.format_idx_to_column_idx[i])
row_format.throwInvalidFormat("Column index " + std::to_string(*row_format.format_idx_to_column_idx[i]) +
if (header_.columns() <= *column_index)
row_format.throwInvalidFormat("Column index " + std::to_string(*column_index) +
" must be less then number of columns (" + std::to_string(header_.columns()) + ")", i);
if (row_format.escaping_rules[i] == EscapingRule::None)
row_format.throwInvalidFormat("Column is not skipped, but deserialization type is None", i);
size_t col_idx = *row_format.format_idx_to_column_idx[i];
size_t col_idx = *column_index;
if (column_in_format[col_idx])
row_format.throwInvalidFormat("Duplicate column", i);
column_in_format[col_idx] = true;
checkSupportedDelimiterAfterField(row_format.escaping_rules[i], row_format.delimiters[i + 1], data_types[*column_index]);
}
else
{
checkSupportedDelimiterAfterField(row_format.escaping_rules[i], row_format.delimiters[i + 1], nullptr);
}
}

View File

@ -41,6 +41,7 @@ protected:
void resetParser() override;
bool isGarbageAfterField(size_t index, ReadBuffer::Position pos) override;
void setReadBuffer(ReadBuffer & in_) override;
void readPrefix() override;
const FormatSettings format_settings;
DataTypes data_types;
@ -48,7 +49,6 @@ protected:
private:
bool readRow(MutableColumns & columns, RowReadExtension & ext) override;
void readPrefix() override;
bool parseRowAndPrintDiagnosticInfo(MutableColumns & columns, WriteBuffer & out) override;
void tryDeserializeField(const DataTypePtr & type, IColumn & column, size_t file_column) override;

View File

@ -755,9 +755,10 @@ bool isMetadataOnlyConversion(const IDataType * from, const IDataType * to)
const auto * nullable_from = typeid_cast<const DataTypeNullable *>(from);
const auto * nullable_to = typeid_cast<const DataTypeNullable *>(to);
if (nullable_from && nullable_to)
if (nullable_to)
{
from = nullable_from->getNestedType().get();
/// Here we allow a conversion X -> Nullable(X) to make a metadata-only conversion.
from = nullable_from ? nullable_from->getNestedType().get() : from;
to = nullable_to->getNestedType().get();
continue;
}
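In user terms (a sketch, assuming the column was a plain UInt32): this makes an ALTER such as the following a metadata-only change, since Nullable(X) keeps the same data files and only adds a null map:

ALTER TABLE events MODIFY COLUMN value Nullable(UInt32);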

View File

@ -117,7 +117,7 @@ struct URLBasedDataSourceConfiguration
struct StorageS3Configuration : URLBasedDataSourceConfiguration
{
S3Settings::AuthSettings auth_settings;
S3::AuthSettings auth_settings;
S3Settings::ReadWriteSettings rw_settings;
};

View File

@ -406,14 +406,18 @@ void DataPartStorageOnDisk::clearDirectory(
}
}
std::string DataPartStorageOnDisk::getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached) const
std::optional<String> DataPartStorageOnDisk::getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached, bool broken) const
{
assert(!broken || detached);
String res;
auto full_relative_path = fs::path(root_path);
if (detached)
full_relative_path /= "detached";
std::optional<String> original_checksums_content;
std::optional<Strings> original_files_list;
for (int try_no = 0; try_no < 10; ++try_no)
{
res = (prefix.empty() ? "" : prefix + "_") + part_dir + (try_no ? "_try" + DB::toString(try_no) : "");
@ -421,12 +425,69 @@ std::string DataPartStorageOnDisk::getRelativePathForPrefix(Poco::Logger * log,
if (!volume->getDisk()->exists(full_relative_path / res))
return res;
if (broken && looksLikeBrokenDetachedPartHasTheSameContent(res, original_checksums_content, original_files_list))
{
LOG_WARNING(log, "Directory {} (to detach to) already exists, "
"but its content looks similar to content of the broken part which we are going to detach. "
"Assuming it was already cloned to detached, will not do it again to avoid redundant copies of broken part.", res);
return {};
}
LOG_WARNING(log, "Directory {} (to detach to) already exists. Will detach to directory with '_tryN' suffix.", res);
}
return res;
}
bool DataPartStorageOnDisk::looksLikeBrokenDetachedPartHasTheSameContent(const String & detached_part_path,
std::optional<String> & original_checksums_content,
std::optional<Strings> & original_files_list) const
{
/// We cannot know for sure that the content of the detached part is the same,
/// but in most cases it's enough to compare checksums.txt and the list of files.
if (!exists("checksums.txt"))
return false;
auto detached_full_path = fs::path(root_path) / "detached" / detached_part_path;
auto disk = volume->getDisk();
if (!disk->exists(detached_full_path / "checksums.txt"))
return false;
if (!original_checksums_content)
{
auto in = disk->readFile(detached_full_path / "checksums.txt", /* settings */ {}, /* read_hint */ {}, /* file_size */ {});
original_checksums_content.emplace();
readStringUntilEOF(*original_checksums_content, *in);
}
if (original_checksums_content->empty())
return false;
auto part_full_path = fs::path(root_path) / part_dir;
String detached_checksums_content;
{
auto in = readFile("checksums.txt", /* settings */ {}, /* read_hint */ {}, /* file_size */ {});
readStringUntilEOF(detached_checksums_content, *in);
}
if (original_checksums_content != detached_checksums_content)
return false;
if (!original_files_list)
{
original_files_list.emplace();
disk->listFiles(part_full_path, *original_files_list);
std::sort(original_files_list->begin(), original_files_list->end());
}
Strings detached_files_list;
disk->listFiles(detached_full_path, detached_files_list);
std::sort(detached_files_list.begin(), detached_files_list.end());
return original_files_list == detached_files_list;
}
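The directories this method compares are the ones exposed through system.detached_parts (a sketch; the table name is hypothetical). Without this check, detaching the same broken part repeatedly produced duplicates with _tryN suffixes, which can be inspected with:

SELECT name, reason FROM system.detached_parts WHERE table = 'events';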
void DataPartStorageBuilderOnDisk::setRelativePath(const std::string & path)
{
part_dir = path;

View File

@ -52,7 +52,12 @@ public:
MergeTreeDataPartState state,
Poco::Logger * log) override;
std::string getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached) const override;
/// Returns the path to place the detached part in, or nullopt if the part doesn't need to be detached (it already exists and has the same content)
std::optional<String> getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached, bool broken) const override;
/// Returns true if the detached part already exists and has the same content (compares checksums.txt and the list of files)
bool looksLikeBrokenDetachedPartHasTheSameContent(const String & detached_part_path, std::optional<String> & original_checksums_content,
std::optional<Strings> & original_files_list) const;
void setRelativePath(const std::string & path) override;
void onRename(const std::string & new_root_path, const std::string & new_part_dir) override;

View File

@ -129,7 +129,7 @@ public:
/// Get a name like 'prefix_partdir_tryN' which does not exist in a root dir.
/// TODO: remove it.
virtual std::string getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached) const = 0;
virtual std::optional<String> getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached, bool broken) const = 0;
/// Reset part directory, used for in-memory parts.
/// TODO: remove it.

View File

@ -1478,8 +1478,9 @@ void IMergeTreeDataPart::remove() const
data_part_storage->remove(std::move(can_remove_callback), checksums, projection_checksums, is_temp, getState(), storage.log);
}
String IMergeTreeDataPart::getRelativePathForPrefix(const String & prefix, bool detached) const
std::optional<String> IMergeTreeDataPart::getRelativePathForPrefix(const String & prefix, bool detached, bool broken) const
{
assert(!broken || detached);
String res;
/** If you need to detach a part, and directory into which we want to rename it already exists,
@ -1491,22 +1492,26 @@ String IMergeTreeDataPart::getRelativePathForPrefix(const String & prefix, bool
if (detached && parent_part)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot detach projection");
return data_part_storage->getRelativePathForPrefix(storage.log, prefix, detached);
return data_part_storage->getRelativePathForPrefix(storage.log, prefix, detached, broken);
}
String IMergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix) const
std::optional<String> IMergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix, bool broken) const
{
/// Do not allow underscores in the prefix because they are used as separators.
assert(prefix.find_first_of('_') == String::npos);
assert(prefix.empty() || std::find(DetachedPartInfo::DETACH_REASONS.begin(),
DetachedPartInfo::DETACH_REASONS.end(),
prefix) != DetachedPartInfo::DETACH_REASONS.end());
return "detached/" + getRelativePathForPrefix(prefix, /* detached */ true);
if (auto path = getRelativePathForPrefix(prefix, /* detached */ true, broken))
return "detached/" + *path;
return {};
}
void IMergeTreeDataPart::renameToDetached(const String & prefix, DataPartStorageBuilderPtr builder) const
{
renameTo(getRelativePathForDetachedPart(prefix), true, builder);
auto path_to_detach = getRelativePathForDetachedPart(prefix, /* broken */ false);
assert(path_to_detach);
renameTo(path_to_detach.value(), true, builder);
part_is_probably_removed_from_disk = true;
}
@ -1518,9 +1523,16 @@ void IMergeTreeDataPart::makeCloneInDetached(const String & prefix, const Storag
/// because hardlinks tracking doesn't work for detached parts.
bool copy_instead_of_hardlink = isStoredOnRemoteDiskWithZeroCopySupport() && storage.supportsReplication() && storage_settings->allow_remote_fs_zero_copy_replication;
/// Avoid unneeded duplicates of broken parts if we try to detach the same broken part multiple times.
/// Otherwise it may pollute detached/ with dirs with _tryN suffix and we will fail to remove broken part after 10 attempts.
bool broken = !prefix.empty();
auto maybe_path_in_detached = getRelativePathForDetachedPart(prefix, broken);
if (!maybe_path_in_detached)
return;
data_part_storage->freeze(
storage.relative_data_path,
getRelativePathForDetachedPart(prefix),
*maybe_path_in_detached,
/*make_source_readonly*/ true,
{},
copy_instead_of_hardlink,

View File

@ -347,7 +347,7 @@ public:
/// Calculate column and secondary indices sizes on disk.
void calculateColumnsAndSecondaryIndicesSizesOnDisk();
String getRelativePathForPrefix(const String & prefix, bool detached = false) const;
std::optional<String> getRelativePathForPrefix(const String & prefix, bool detached = false, bool broken = false) const;
bool isProjectionPart() const { return parent_part != nullptr; }
@ -485,7 +485,7 @@ protected:
/// disk using columns and checksums.
virtual void calculateEachColumnSizes(ColumnSizeByName & each_columns_size, ColumnSize & total_size) const = 0;
String getRelativePathForDetachedPart(const String & prefix) const;
std::optional<String> getRelativePathForDetachedPart(const String & prefix, bool broken) const;
/// Checks that part can be actually removed from disk.
/// In ordinary scenario always returns true, but in case of

View File

@ -3245,7 +3245,10 @@ void MergeTreeData::outdateBrokenPartAndCloneToDetached(const DataPartPtr & part
LOG_INFO(log, "Cloning part {} to {}_{} and making it obsolete.", part_to_detach->data_part_storage->getPartDirectory(), prefix, part_to_detach->name);
part_to_detach->makeCloneInDetached(prefix, metadata_snapshot);
removePartsFromWorkingSet(NO_TRANSACTION_RAW, {part_to_detach}, true);
DataPartsLock lock = lockParts();
if (part_to_detach->getState() == DataPartState::Active)
removePartsFromWorkingSet(NO_TRANSACTION_RAW, {part_to_detach}, true, &lock);
}
void MergeTreeData::forcefullyMovePartToDetachedAndRemoveFromMemory(const MergeTreeData::DataPartPtr & part_to_detach, const String & prefix, bool restore_covered)
@ -6250,7 +6253,7 @@ std::pair<MergeTreeData::MutableDataPartPtr, scope_guard> MergeTreeData::cloneAn
if (auto src_part_in_memory = asInMemoryPart(src_part))
{
auto flushed_part_path = src_part_in_memory->getRelativePathForPrefix(tmp_part_prefix);
src_part_storage = src_part_in_memory->flushToDisk(flushed_part_path, metadata_snapshot);
src_part_storage = src_part_in_memory->flushToDisk(*flushed_part_path, metadata_snapshot);
}
String with_copy;
@ -6434,7 +6437,7 @@ PartitionCommandsResultInfo MergeTreeData::freezePartitionsByMatcher(
if (auto part_in_memory = asInMemoryPart(part))
{
auto flushed_part_path = part_in_memory->getRelativePathForPrefix("tmp_freeze");
data_part_storage = part_in_memory->flushToDisk(flushed_part_path, metadata_snapshot);
data_part_storage = part_in_memory->flushToDisk(*flushed_part_path, metadata_snapshot);
}
auto callback = [this, &part, &backup_part_path](const DiskPtr & disk)

View File

@ -142,7 +142,7 @@ DataPartStoragePtr MergeTreeDataPartInMemory::flushToDisk(const String & new_rel
void MergeTreeDataPartInMemory::makeCloneInDetached(const String & prefix, const StorageMetadataPtr & metadata_snapshot) const
{
String detached_path = getRelativePathForDetachedPart(prefix);
String detached_path = *getRelativePathForDetachedPart(prefix, /* broken */ false);
flushToDisk(detached_path, metadata_snapshot);
}

View File

@ -9,6 +9,7 @@
#include <Interpreters/castColumn.h>
#include <Columns/ColumnArray.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
namespace DB
@ -64,9 +65,11 @@ uint64_t AnnoyIndex<Dist>::getNumOfDimensions() const
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int INCORRECT_QUERY;
extern const int ILLEGAL_COLUMN;
extern const int INCORRECT_DATA;
extern const int INCORRECT_NUMBER_OF_COLUMNS;
extern const int INCORRECT_QUERY;
extern const int LOGICAL_ERROR;
}
MergeTreeIndexGranuleAnnoy::MergeTreeIndexGranuleAnnoy(const String & index_name_, const Block & index_sample_block_)
@ -132,9 +135,7 @@ void MergeTreeIndexAggregatorAnnoy::update(const Block & block, size_t * pos, si
return;
if (index_sample_block.columns() > 1)
{
throw Exception("Only one column is supported", ErrorCodes::LOGICAL_ERROR);
}
auto index_column_name = index_sample_block.getByPosition(0).name;
const auto & column_cut = block.getByName(index_column_name).column->cut(*pos, rows_read);
@ -144,27 +145,22 @@ void MergeTreeIndexAggregatorAnnoy::update(const Block & block, size_t * pos, si
const auto & data = column_array->getData();
const auto & array = typeid_cast<const ColumnFloat32&>(data).getData();
if (array.empty())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Array have 0 rows, but {} expected", rows_read);
throw Exception(ErrorCodes::LOGICAL_ERROR, "Array has 0 rows, {} rows expected", rows_read);
const auto & offsets = column_array->getOffsets();
size_t num_rows = offsets.size();
/// All sizes are the same
/// Check all sizes are the same
size_t size = offsets[0];
for (size_t i = 0; i < num_rows - 1; ++i)
{
if (offsets[i + 1] - offsets[i] != size)
{
throw Exception(ErrorCodes::INCORRECT_DATA, "Arrays should have same length");
}
}
index = std::make_shared<AnnoyIndex>(size);
index->add_item(index->get_n_items(), array.data());
/// add all rows from 1 to num_rows - 1 (this is the same as the beginning of the last element)
for (size_t current_row = 1; current_row < num_rows; ++current_row)
{
index->add_item(index->get_n_items(), &array[offsets[current_row - 1]]);
}
}
else
{
@ -181,19 +177,13 @@ void MergeTreeIndexAggregatorAnnoy::update(const Block & block, size_t * pos, si
{
const auto& pod_array = typeid_cast<const ColumnFloat32*>(column.get())->getData();
for (size_t i = 0; i < pod_array.size(); ++i)
{
data[i].push_back(pod_array[i]);
}
}
assert(!data.empty());
if (!index)
{
index = std::make_shared<AnnoyIndex>(data[0].size());
}
for (const auto& item : data)
{
index->add_item(index->get_n_items(), item.data());
}
}
*pos += rows_read;
@ -222,7 +212,7 @@ std::vector<size_t> MergeTreeIndexConditionAnnoy::getUsefulRanges(MergeTreeIndex
{
UInt64 limit = condition.getLimit();
UInt64 index_granularity = condition.getIndexGranularity();
std::optional<float> comp_dist = condition.getQueryType() == ANN::ANNQueryInformation::Type::Where ?
std::optional<float> comp_dist = condition.getQueryType() == ApproximateNearestNeighbour::ANNQueryInformation::Type::Where ?
std::optional<float>(condition.getComparisonDistanceForWhereQuery()) : std::nullopt;
if (comp_dist && comp_dist.value() < 0)
@ -232,16 +222,13 @@ std::vector<size_t> MergeTreeIndexConditionAnnoy::getUsefulRanges(MergeTreeIndex
auto granule = std::dynamic_pointer_cast<MergeTreeIndexGranuleAnnoy>(idx_granule);
if (granule == nullptr)
{
throw Exception("Granule has the wrong type", ErrorCodes::LOGICAL_ERROR);
}
auto annoy = granule->index;
if (condition.getNumOfDimensions() != annoy->getNumOfDimensions())
{
throw Exception("The dimension of the space in the request (" + toString(condition.getNumOfDimensions()) + ") "
+ "does not match with the dimension in the index (" + toString(annoy->getNumOfDimensions()) + ")", ErrorCodes::INCORRECT_QUERY);
}
/// neighbors contains the indexes of the points that were closest to the target vector
std::vector<UInt64> neighbors;
@ -268,23 +255,25 @@ std::vector<size_t> MergeTreeIndexConditionAnnoy::getUsefulRanges(MergeTreeIndex
for (size_t i = 0; i < neighbors.size(); ++i)
{
if (comp_dist && distances[i] > comp_dist)
{
continue;
}
granule_numbers.insert(neighbors[i] / index_granularity);
}
std::vector<size_t> result_vector;
result_vector.reserve(granule_numbers.size());
for (auto granule_number : granule_numbers)
{
result_vector.push_back(granule_number);
}
return result_vector;
}
MergeTreeIndexAnnoy::MergeTreeIndexAnnoy(const IndexDescription & index_, uint64_t number_of_trees_)
: IMergeTreeIndex(index_)
, number_of_trees(number_of_trees_)
{
}
MergeTreeIndexGranulePtr MergeTreeIndexAnnoy::createIndexGranule() const
{
return std::make_shared<MergeTreeIndexGranuleAnnoy>(index.name, index.sample_block);
@ -307,6 +296,40 @@ MergeTreeIndexPtr annoyIndexCreator(const IndexDescription & index)
return std::make_shared<MergeTreeIndexAnnoy>(index, param);
}
static void assertIndexColumnsType(const Block & header)
{
DataTypePtr column_data_type_ptr = header.getDataTypes()[0];
if (const auto * array_type = typeid_cast<const DataTypeArray *>(column_data_type_ptr.get()))
{
TypeIndex nested_type_index = array_type->getNestedType()->getTypeId();
if (!WhichDataType(nested_type_index).isFloat32())
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Unexpected type {} of Annoy index. Only Array(Float32) and Tuple(Float32) are supported.",
column_data_type_ptr->getName());
}
else if (const auto * tuple_type = typeid_cast<const DataTypeTuple *>(column_data_type_ptr.get()))
{
const DataTypes & nested_types = tuple_type->getElements();
for (const auto & type : nested_types)
{
TypeIndex nested_type_index = type->getTypeId();
if (!WhichDataType(nested_type_index).isFloat32())
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Unexpected type {} of Annoy index. Only Array(Float32) and Tuple(Float32) are supported.",
column_data_type_ptr->getName());
}
}
else
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Unexpected type {} of Annoy index. Only Array(Float32) and Tuple(Float32) are supported.",
column_data_type_ptr->getName());
}
void annoyIndexValidator(const IndexDescription & index, bool /* attach */)
{
if (index.arguments.size() != 1)
@ -317,6 +340,11 @@ void annoyIndexValidator(const IndexDescription & index, bool /* attach */)
{
throw Exception("Annoy index argument must be UInt64.", ErrorCodes::INCORRECT_QUERY);
}
if (index.column_names.size() != 1 || index.data_types.size() != 1)
throw Exception("Annoy indexes must be created on a single column", ErrorCodes::INCORRECT_NUMBER_OF_COLUMNS);
assertIndexColumnsType(index.sample_block);
}
}
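A minimal sketch of an index that now passes validation (the annoy index was experimental at the time, so the setting below may be required; the table is hypothetical): a single column of type Array(Float32) or Tuple of Float32 elements:

SET allow_experimental_annoy_index = 1;

CREATE TABLE vectors
(
    id UInt64,
    embedding Array(Float32),
    INDEX ann_idx embedding TYPE annoy(100) GRANULARITY 1000
)
ENGINE = MergeTree
ORDER BY id;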

View File

@ -10,8 +10,6 @@
namespace DB
{
namespace ANN = ApproximateNearestNeighbour;
// auxiliary namespace for working with spotify-annoy library
// mainly for serialization and deserialization of the index
namespace ApproximateNearestNeighbour
@ -33,7 +31,7 @@ namespace ApproximateNearestNeighbour
struct MergeTreeIndexGranuleAnnoy final : public IMergeTreeIndexGranule
{
using AnnoyIndex = ANN::AnnoyIndex<>;
using AnnoyIndex = ApproximateNearestNeighbour::AnnoyIndex<>;
using AnnoyIndexPtr = std::shared_ptr<AnnoyIndex>;
MergeTreeIndexGranuleAnnoy(const String & index_name_, const Block & index_sample_block_);
@ -57,7 +55,7 @@ struct MergeTreeIndexGranuleAnnoy final : public IMergeTreeIndexGranule
struct MergeTreeIndexAggregatorAnnoy final : IMergeTreeIndexAggregator
{
using AnnoyIndex = ANN::AnnoyIndex<>;
using AnnoyIndex = ApproximateNearestNeighbour::AnnoyIndex<>;
using AnnoyIndexPtr = std::shared_ptr<AnnoyIndex>;
MergeTreeIndexAggregatorAnnoy(const String & index_name_, const Block & index_sample_block, uint64_t number_of_trees);
@ -74,7 +72,7 @@ struct MergeTreeIndexAggregatorAnnoy final : IMergeTreeIndexAggregator
};
class MergeTreeIndexConditionAnnoy final : public ANN::IMergeTreeIndexConditionAnn
class MergeTreeIndexConditionAnnoy final : public ApproximateNearestNeighbour::IMergeTreeIndexConditionAnn
{
public:
MergeTreeIndexConditionAnnoy(
@ -91,18 +89,14 @@ public:
~MergeTreeIndexConditionAnnoy() override = default;
private:
ANN::ANNCondition condition;
ApproximateNearestNeighbour::ANNCondition condition;
};
class MergeTreeIndexAnnoy : public IMergeTreeIndex
{
public:
MergeTreeIndexAnnoy(const IndexDescription & index_, uint64_t number_of_trees_)
: IMergeTreeIndex(index_)
, number_of_trees(number_of_trees_)
{}
MergeTreeIndexAnnoy(const IndexDescription & index_, uint64_t number_of_trees_);
~MergeTreeIndexAnnoy() override = default;
MergeTreeIndexGranulePtr createIndexGranule() const override;

View File

@ -419,14 +419,14 @@ void ReplicatedMergeTreeCleanupThread::getBlocksSortedByTime(zkutil::ZooKeeper &
LOG_TRACE(log, "Checking {} blocks ({} are not cached){}", stat.numChildren, not_cached_blocks, " to clear old ones from ZooKeeper.");
}
zkutil::AsyncResponses<Coordination::ExistsResponse> exists_futures;
std::vector<std::string> exists_paths;
for (const String & block : blocks)
{
auto it = cached_block_stats.find(block);
if (it == cached_block_stats.end())
{
/// New block. Fetch its stat asynchronously.
exists_futures.emplace_back(block, zookeeper.asyncExists(storage.zookeeper_path + "/blocks/" + block));
exists_paths.emplace_back(storage.zookeeper_path + "/blocks/" + block);
}
else
{
@ -436,14 +436,18 @@ void ReplicatedMergeTreeCleanupThread::getBlocksSortedByTime(zkutil::ZooKeeper &
}
}
auto exists_size = exists_paths.size();
auto exists_results = zookeeper.exists(exists_paths);
/// Put fetched stats into the cache
for (auto & elem : exists_futures)
for (size_t i = 0; i < exists_size; ++i)
{
auto status = elem.second.get();
auto status = exists_results[i];
if (status.error != Coordination::Error::ZNONODE)
{
cached_block_stats.emplace(elem.first, std::make_pair(status.stat.ctime, status.stat.version));
timed_blocks.emplace_back(elem.first, status.stat.ctime, status.stat.version);
auto node_name = fs::path(exists_paths[i]).filename();
cached_block_stats.emplace(node_name, std::make_pair(status.stat.ctime, status.stat.version));
timed_blocks.emplace_back(node_name, status.stat.ctime, status.stat.version);
}
}

View File

@ -41,7 +41,7 @@ ReplicatedMergeTreeQueue::ReplicatedMergeTreeQueue(StorageReplicatedMergeTree &
void ReplicatedMergeTreeQueue::clear()
{
auto locks = lockQueue();
assert(future_parts.empty());
chassert(future_parts.empty());
current_parts.clear();
virtual_parts.clear();
queue.clear();
@ -62,6 +62,7 @@ void ReplicatedMergeTreeQueue::setBrokenPartsToEnqueueFetchesOnLoading(Strings &
void ReplicatedMergeTreeQueue::initialize(zkutil::ZooKeeperPtr zookeeper)
{
clear();
std::lock_guard lock(state_mutex);
LOG_TRACE(log, "Initializing parts in queue");
@ -153,17 +154,19 @@ bool ReplicatedMergeTreeQueue::load(zkutil::ZooKeeperPtr zookeeper)
::sort(children.begin(), children.end());
zkutil::AsyncResponses<Coordination::GetResponse> futures;
futures.reserve(children.size());
auto children_num = children.size();
std::vector<std::string> paths;
paths.reserve(children_num);
for (const String & child : children)
futures.emplace_back(child, zookeeper->asyncGet(fs::path(queue_path) / child));
paths.emplace_back(fs::path(queue_path) / child);
for (auto & future : futures)
auto results = zookeeper->get(paths);
for (size_t i = 0; i < children_num; ++i)
{
Coordination::GetResponse res = future.second.get();
auto res = results[i];
LogEntryPtr entry = LogEntry::parse(res.data, res.stat);
entry->znode_name = future.first;
entry->znode_name = children[i];
std::lock_guard lock(state_mutex);
@ -641,11 +644,11 @@ int32_t ReplicatedMergeTreeQueue::pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper
LOG_DEBUG(log, "Pulling {} entries to queue: {} - {}", (end - begin), *begin, *last);
zkutil::AsyncResponses<Coordination::GetResponse> futures;
futures.reserve(end - begin);
Strings get_paths;
get_paths.reserve(end - begin);
for (auto it = begin; it != end; ++it)
futures.emplace_back(*it, zookeeper->asyncGet(fs::path(zookeeper_path) / "log" / *it));
get_paths.emplace_back(fs::path(zookeeper_path) / "log" / *it);
/// Simultaneously add all new entries to the queue and move the pointer to the log.
@ -655,9 +658,11 @@ int32_t ReplicatedMergeTreeQueue::pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper
std::optional<time_t> min_unprocessed_insert_time_changed;
for (auto & future : futures)
auto get_results = zookeeper->get(get_paths);
auto get_num = get_results.size();
for (size_t i = 0; i < get_num; ++i)
{
Coordination::GetResponse res = future.second.get();
auto res = get_results[i];
copied_entries.emplace_back(LogEntry::parse(res.data, res.stat));

View File

@ -99,19 +99,22 @@ size_t ReplicatedMergeTreeSink::checkQuorumPrecondition(zkutil::ZooKeeperPtr & z
quorum_info.status_path = storage.zookeeper_path + "/quorum/status";
Strings replicas = zookeeper->getChildren(fs::path(storage.zookeeper_path) / "replicas");
std::vector<std::future<Coordination::ExistsResponse>> replicas_status_futures;
replicas_status_futures.reserve(replicas.size());
Strings exists_paths;
for (const auto & replica : replicas)
if (replica != storage.replica_name)
replicas_status_futures.emplace_back(zookeeper->asyncExists(fs::path(storage.zookeeper_path) / "replicas" / replica / "is_active"));
exists_paths.emplace_back(fs::path(storage.zookeeper_path) / "replicas" / replica / "is_active");
std::future<Coordination::GetResponse> is_active_future = zookeeper->asyncTryGet(storage.replica_path + "/is_active");
std::future<Coordination::GetResponse> host_future = zookeeper->asyncTryGet(storage.replica_path + "/host");
auto exists_result = zookeeper->exists(exists_paths);
auto get_results = zookeeper->get(Strings{storage.replica_path + "/is_active", storage.replica_path + "/host"});
size_t active_replicas = 1; /// Assume current replica is active (will check below)
for (auto & status : replicas_status_futures)
if (status.get().error == Coordination::Error::ZOK)
for (size_t i = 0; i < exists_paths.size(); ++i)
{
auto status = exists_result[i];
if (status.error == Coordination::Error::ZOK)
++active_replicas;
}
size_t replicas_number = replicas.size();
size_t quorum_size = getQuorumSize(replicas_number);
@ -135,8 +138,8 @@ size_t ReplicatedMergeTreeSink::checkQuorumPrecondition(zkutil::ZooKeeperPtr & z
/// Both checks are implicitly made also later (otherwise there would be a race condition).
auto is_active = is_active_future.get();
auto host = host_future.get();
auto is_active = get_results[0];
auto host = get_results[1];
if (is_active.error == Coordination::Error::ZNONODE || host.error == Coordination::Error::ZNONODE)
throw Exception("Replica is not active right now", ErrorCodes::READONLY);

View File

@ -682,24 +682,20 @@ Chunk StorageKeeperMap::getBySerializedKeys(const std::span<const std::string> k
auto client = getClient();
std::vector<std::future<Coordination::GetResponse>> values;
values.reserve(keys.size());
Strings full_key_paths;
full_key_paths.reserve(keys.size());
for (const auto & key : keys)
{
const auto full_path = fullPathForKey(key);
values.emplace_back(client->asyncTryGet(full_path));
full_key_paths.emplace_back(fullPathForKey(key));
}
auto wait_until = std::chrono::system_clock::now() + std::chrono::milliseconds(Coordination::DEFAULT_OPERATION_TIMEOUT_MS);
auto values = client->tryGet(full_key_paths);
for (size_t i = 0; i < keys.size(); ++i)
{
auto & value = values[i];
if (value.wait_until(wait_until) != std::future_status::ready)
throw DB::Exception(ErrorCodes::KEEPER_EXCEPTION, "Failed to fetch values: timeout");
auto response = values[i];
auto response = value.get();
Coordination::Error code = response.error;
if (code == Coordination::Error::ZOK)

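For reference (a sketch; KeeperMap was experimental at the time and the ZooKeeper path is hypothetical), the storage served by this method answers point lookups by key:

CREATE TABLE kv (key String, value String)
ENGINE = KeeperMap('/clickhouse_kv')
PRIMARY KEY key;

SELECT value FROM kv WHERE key IN ('a', 'b');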
View File

@ -993,14 +993,6 @@ MergeMutateSelectedEntryPtr StorageMergeTree::selectPartsToMutate(
const StorageMetadataPtr & metadata_snapshot, String * /* disable_reason */, TableLockHolder & /* table_lock_holder */,
std::unique_lock<std::mutex> & /*currently_processing_in_background_mutex_lock*/)
{
size_t max_ast_elements = getContext()->getSettingsRef().max_expanded_ast_elements;
auto future_part = std::make_shared<FutureMergedMutatedPart>();
if (storage_settings.get()->assign_part_uuids)
future_part->uuid = UUIDHelpers::generateV4();
CurrentlyMergingPartsTaggerPtr tagger;
if (current_mutations_by_version.empty())
return {};
@ -1014,6 +1006,14 @@ MergeMutateSelectedEntryPtr StorageMergeTree::selectPartsToMutate(
return {};
}
size_t max_ast_elements = getContext()->getSettingsRef().max_expanded_ast_elements;
auto future_part = std::make_shared<FutureMergedMutatedPart>();
if (storage_settings.get()->assign_part_uuids)
future_part->uuid = UUIDHelpers::generateV4();
CurrentlyMergingPartsTaggerPtr tagger;
auto mutations_end_it = current_mutations_by_version.end();
for (const auto & part : getDataPartsVectorForInternalUsage())
{
@ -1152,7 +1152,8 @@ bool StorageMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assign
return false;
merge_entry = selectPartsToMerge(metadata_snapshot, false, {}, false, nullptr, share_lock, lock, txn);
if (!merge_entry)
if (!merge_entry && !current_mutations_by_version.empty())
mutate_entry = selectPartsToMutate(metadata_snapshot, nullptr, share_lock, lock);
has_mutations = !current_mutations_by_version.empty();

View File

@ -285,21 +285,32 @@ StorageReplicatedMergeTree::StorageReplicatedMergeTree(
, replicated_fetches_throttler(std::make_shared<Throttler>(getSettings()->max_replicated_fetches_network_bandwidth, getContext()->getReplicatedFetchesThrottler()))
, replicated_sends_throttler(std::make_shared<Throttler>(getSettings()->max_replicated_sends_network_bandwidth, getContext()->getReplicatedSendsThrottler()))
{
/// We create and deactivate all tasks for consistency.
/// They will all be scheduled and activated by the restarting thread.
queue_updating_task = getContext()->getSchedulePool().createTask(
getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::queueUpdatingTask)", [this]{ queueUpdatingTask(); });
queue_updating_task->deactivate();
mutations_updating_task = getContext()->getSchedulePool().createTask(
getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::mutationsUpdatingTask)", [this]{ mutationsUpdatingTask(); });
mutations_updating_task->deactivate();
merge_selecting_task = getContext()->getSchedulePool().createTask(
getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::mergeSelectingTask)", [this] { mergeSelectingTask(); });
/// Will be activated if we win leader election.
/// Will be activated if we achieve the leader state.
merge_selecting_task->deactivate();
mutations_finalizing_task = getContext()->getSchedulePool().createTask(
getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::mutationsFinalizingTask)", [this] { mutationsFinalizingTask(); });
/// This task can be scheduled by different parts of code even when storage is readonly.
/// This can lead to redundant exceptions during startup.
/// Will be activated by restarting thread.
mutations_finalizing_task->deactivate();
bool has_zookeeper = getContext()->hasZooKeeper() || getContext()->hasAuxiliaryZooKeeper(zookeeper_name);
if (has_zookeeper)
{
@ -2408,6 +2419,7 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo
std::vector<QueueEntryInfo> source_queue;
ActiveDataPartSet get_part_set{format_version};
ActiveDataPartSet drop_range_set{format_version};
std::unordered_set<String> exact_part_names;
{
std::vector<zkutil::ZooKeeper::FutureGet> queue_get_futures;
@ -2445,14 +2457,22 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo
info.parsed_entry->znode_name = source_queue_names[i];
if (info.parsed_entry->type == LogEntry::DROP_RANGE)
{
drop_range_set.add(info.parsed_entry->new_part_name);
if (info.parsed_entry->type == LogEntry::GET_PART)
}
else if (info.parsed_entry->type == LogEntry::GET_PART)
{
String maybe_covering_drop_range = drop_range_set.getContainingPart(info.parsed_entry->new_part_name);
if (maybe_covering_drop_range.empty())
get_part_set.add(info.parsed_entry->new_part_name);
}
else
{
/// We should keep local parts if they are present in the queue of the source replica.
/// There's a chance that we are the only replica that has these parts.
Strings entry_virtual_parts = info.parsed_entry->getVirtualPartNames(format_version);
std::move(entry_virtual_parts.begin(), entry_virtual_parts.end(), std::inserter(exact_part_names, exact_part_names.end()));
}
}
}
@ -2472,11 +2492,17 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo
for (const auto & part : local_parts_in_zk)
{
if (get_part_set.getContainingPart(part).empty())
{
parts_to_remove_from_zk.emplace_back(part);
LOG_WARNING(log, "Source replica does not have part {}. Removing it from ZooKeeper.", part);
}
/// We look for an exact match (and not for any covering part)
/// because our part might have been dropped while the covering part might have been merged through the gap
/// (this avoids resurrecting data that was removed a long time ago).
if (get_part_set.getContainingPart(part) == part)
continue;
if (exact_part_names.contains(part))
continue;
parts_to_remove_from_zk.emplace_back(part);
LOG_WARNING(log, "Source replica does not have part {}. Removing it from ZooKeeper.", part);
}
{
@ -2498,11 +2524,14 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo
for (const auto & part : local_active_parts)
{
if (get_part_set.getContainingPart(part->name).empty())
{
parts_to_remove_from_working_set.emplace_back(part);
LOG_WARNING(log, "Source replica does not have part {}. Removing it from working set.", part->name);
}
if (get_part_set.getContainingPart(part->name) == part->name)
continue;
if (exact_part_names.contains(part->name))
continue;
parts_to_remove_from_working_set.emplace_back(part);
LOG_WARNING(log, "Source replica does not have part {}. Removing it from working set.", part->name);
}
if (getSettings()->detach_old_local_parts_when_cloning_replica)
@ -3206,16 +3235,17 @@ StorageReplicatedMergeTree::CreateMergeEntryResult StorageReplicatedMergeTree::c
int32_t log_version,
MergeType merge_type)
{
std::vector<std::future<Coordination::ExistsResponse>> exists_futures;
exists_futures.reserve(parts.size());
Strings exists_paths;
exists_paths.reserve(parts.size());
for (const auto & part : parts)
exists_futures.emplace_back(zookeeper->asyncExists(fs::path(replica_path) / "parts" / part->name));
exists_paths.emplace_back(fs::path(replica_path) / "parts" / part->name);
auto exists_results = zookeeper->exists(exists_paths);
bool all_in_zk = true;
for (size_t i = 0; i < parts.size(); ++i)
{
/// If there is no information about part in ZK, we will not merge it.
if (exists_futures[i].get().error == Coordination::Error::ZNONODE)
if (exists_results[i].error == Coordination::Error::ZNONODE)
{
all_in_zk = false;
@ -6228,19 +6258,20 @@ void StorageReplicatedMergeTree::removePartsFromZooKeeperWithRetries(const Strin
auto zookeeper = getZooKeeper();
std::vector<std::future<Coordination::ExistsResponse>> exists_futures;
exists_futures.reserve(part_names.size());
Strings exists_paths;
exists_paths.reserve(part_names.size());
for (const String & part_name : part_names)
{
String part_path = fs::path(replica_path) / "parts" / part_name;
exists_futures.emplace_back(zookeeper->asyncExists(part_path));
exists_paths.emplace_back(fs::path(replica_path) / "parts" / part_name);
}
auto exists_results = zookeeper->exists(exists_paths);
std::vector<std::future<Coordination::MultiResponse>> remove_futures;
remove_futures.reserve(part_names.size());
for (size_t i = 0; i < part_names.size(); ++i)
{
Coordination::ExistsResponse exists_resp = exists_futures[i].get();
Coordination::ExistsResponse exists_resp = exists_results[i];
if (exists_resp.error == Coordination::Error::ZOK)
{
Coordination::Requests ops;
@ -6286,9 +6317,9 @@ void StorageReplicatedMergeTree::removePartsFromZooKeeperWithRetries(const Strin
void StorageReplicatedMergeTree::removePartsFromZooKeeper(
zkutil::ZooKeeperPtr & zookeeper, const Strings & part_names, NameSet * parts_should_be_retried)
{
std::vector<std::future<Coordination::ExistsResponse>> exists_futures;
Strings exists_paths;
std::vector<std::future<Coordination::MultiResponse>> remove_futures;
exists_futures.reserve(part_names.size());
exists_paths.reserve(part_names.size());
remove_futures.reserve(part_names.size());
try
{
@ -6296,13 +6327,14 @@ void StorageReplicatedMergeTree::removePartsFromZooKeeper(
/// if zk session will be dropped
for (const String & part_name : part_names)
{
String part_path = fs::path(replica_path) / "parts" / part_name;
exists_futures.emplace_back(zookeeper->asyncExists(part_path));
exists_paths.emplace_back(fs::path(replica_path) / "parts" / part_name);
}
auto exists_results = zookeeper->exists(exists_paths);
for (size_t i = 0; i < part_names.size(); ++i)
{
Coordination::ExistsResponse exists_resp = exists_futures[i].get();
auto exists_resp = exists_results[i];
if (exists_resp.error == Coordination::Error::ZOK)
{
Coordination::Requests ops;

View File

@ -197,7 +197,7 @@ public:
const S3::URI uri;
std::shared_ptr<const Aws::S3::S3Client> client;
S3Settings::AuthSettings auth_settings;
S3::AuthSettings auth_settings;
S3Settings::ReadWriteSettings rw_settings;
/// If s3 configuration was passed from ast, then it is static.
@ -209,7 +209,7 @@ public:
S3Configuration(
const String & url_,
const S3Settings::AuthSettings & auth_settings_,
const S3::AuthSettings & auth_settings_,
const S3Settings::ReadWriteSettings & rw_settings_,
const HeaderCollection & headers_from_ast_)
: uri(S3::URI(url_))

View File

@ -1,5 +1,7 @@
#include <Storages/StorageS3Settings.h>
#include <IO/S3Common.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Common/Exception.h>
#include <Interpreters/Context.h>
@ -9,10 +11,6 @@
namespace DB
{
namespace ErrorCodes
{
extern const int INVALID_CONFIG_PARAMETER;
}
void StorageS3Settings::loadFromConfig(const String & config_elem, const Poco::Util::AbstractConfiguration & config, const Settings & settings)
{
@ -46,41 +44,8 @@ void StorageS3Settings::loadFromConfig(const String & config_elem, const Poco::U
if (config.has(config_elem + "." + key + ".endpoint"))
{
auto endpoint = get_string_for_key(key, "endpoint", false);
auto access_key_id = get_string_for_key(key, "access_key_id");
auto secret_access_key = get_string_for_key(key, "secret_access_key");
auto region = get_string_for_key(key, "region");
auto server_side_encryption_customer_key_base64 = get_string_for_key(key, "server_side_encryption_customer_key_base64");
std::optional<bool> use_environment_credentials;
if (config.has(config_elem + "." + key + ".use_environment_credentials"))
use_environment_credentials = config.getBool(config_elem + "." + key + ".use_environment_credentials");
std::optional<bool> use_insecure_imds_request;
if (config.has(config_elem + "." + key + ".use_insecure_imds_request"))
use_insecure_imds_request = config.getBool(config_elem + "." + key + ".use_insecure_imds_request");
HeaderCollection headers;
Poco::Util::AbstractConfiguration::Keys subconfig_keys;
config.keys(config_elem + "." + key, subconfig_keys);
for (const String & subkey : subconfig_keys)
{
if (subkey.starts_with("header"))
{
auto header_str = config.getString(config_elem + "." + key + "." + subkey);
auto delimiter = header_str.find(':');
if (delimiter == String::npos)
throw Exception("Malformed s3 header value", ErrorCodes::INVALID_CONFIG_PARAMETER);
headers.emplace_back(HttpHeader{header_str.substr(0, delimiter), header_str.substr(delimiter + 1, String::npos)});
}
}
S3Settings::AuthSettings auth_settings{
std::move(access_key_id), std::move(secret_access_key),
std::move(region),
std::move(server_side_encryption_customer_key_base64),
std::move(headers),
use_environment_credentials,
use_insecure_imds_request};
auto auth_settings = S3::AuthSettings::loadFromConfig(config_elem + "." + key, config);
S3Settings::ReadWriteSettings rw_settings;
rw_settings.max_single_read_retries = get_uint_for_key(key, "max_single_read_retries", true, settings.s3_max_single_read_retries);
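
The block deleted above parsed each <header> entry by splitting on the first colon before handing it to AuthSettings; that logic now lives behind S3::AuthSettings::loadFromConfig. A short sketch of the parsing rule (illustrative Python, hypothetical names):

def parse_s3_headers(raw_values: list) -> list:
    headers = []
    for header_str in raw_values:
        delimiter = header_str.find(":")
        if delimiter == -1:
            raise ValueError("Malformed s3 header value")
        # Everything before the first ':' is the name; the rest, as-is, is the value.
        headers.append((header_str[:delimiter], header_str[delimiter + 1:]))
    return headers

assert parse_s3_headers(["X-Amz-Meta-Team: analytics"]) == [("X-Amz-Meta-Team", " analytics")]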

View File

@ -9,6 +9,8 @@
#include <Interpreters/Context_fwd.h>
#include <Storages/HeaderCollection.h>
#include <IO/S3Common.h>
namespace Poco::Util
{
class AbstractConfiguration;
@ -21,46 +23,6 @@ struct Settings;
struct S3Settings
{
struct AuthSettings
{
String access_key_id;
String secret_access_key;
String region;
String server_side_encryption_customer_key_base64;
HeaderCollection headers;
std::optional<bool> use_environment_credentials;
std::optional<bool> use_insecure_imds_request;
inline bool operator==(const AuthSettings & other) const
{
return access_key_id == other.access_key_id && secret_access_key == other.secret_access_key
&& region == other.region
&& server_side_encryption_customer_key_base64 == other.server_side_encryption_customer_key_base64
&& headers == other.headers
&& use_environment_credentials == other.use_environment_credentials
&& use_insecure_imds_request == other.use_insecure_imds_request;
}
void updateFrom(const AuthSettings & from)
{
/// Update, checking for emptiness, only those parameters which
/// can be passed not only from the config but also via the AST.
if (!from.access_key_id.empty())
access_key_id = from.access_key_id;
if (!from.secret_access_key.empty())
secret_access_key = from.secret_access_key;
headers = from.headers;
region = from.region;
server_side_encryption_customer_key_base64 = from.server_side_encryption_customer_key_base64;
use_environment_credentials = from.use_environment_credentials;
use_insecure_imds_request = from.use_insecure_imds_request;
}
};
struct ReadWriteSettings
{
size_t max_single_read_retries = 0;
@ -90,7 +52,7 @@ struct S3Settings
void updateFromSettingsIfEmpty(const Settings & settings);
};
AuthSettings auth_settings;
S3::AuthSettings auth_settings;
ReadWriteSettings rw_settings;
inline bool operator==(const S3Settings & other) const
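
The removed updateFrom() encoded an asymmetric merge: incoming credentials (possibly supplied via the AST) replace the stored ones only when non-empty, while the remaining fields are overwritten unconditionally. A condensed sketch of that rule (illustrative Python, dict-based):

def update_auth_settings(base: dict, incoming: dict) -> dict:
    merged = dict(base)
    # Credentials may arrive empty (e.g. not given in the AST): keep the old value then.
    for key in ("access_key_id", "secret_access_key"):
        if incoming.get(key):
            merged[key] = incoming[key]
    # Everything else is taken from the incoming settings as-is.
    for key in ("region", "server_side_encryption_customer_key_base64", "headers",
                "use_environment_credentials", "use_insecure_imds_request"):
        merged[key] = incoming.get(key)
    return merged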

View File

@ -58,7 +58,7 @@ ColumnsDescription getStructureOfRemoteTableInShard(
}
ColumnsDescription res;
auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef());
auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef(), table_id);
/// Expect only needed columns from the result of DESC TABLE. NOTE 'comment' column is ignored for compatibility reasons.
Block sample_block
@ -169,7 +169,7 @@ ColumnsDescriptionByShardNum getExtendedObjectsOfRemoteTables(
const auto & shards_info = cluster.getShardsInfo();
auto query = "DESC TABLE " + remote_table_id.getFullTableName();
auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef());
auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef(), remote_table_id);
new_context->setSetting("describe_extend_object_types", true);
/// Expect only needed columns from the result of DESC TABLE.

View File

@ -987,7 +987,7 @@ class TestCase:
and (proc.stderr is None)
and (proc.stdout is None or "Exception" not in proc.stdout)
)
need_drop_database = not maybe_passed
need_drop_database = maybe_passed
debug_log = ""
if os.path.exists(self.testcase_args.debug_log_file):
@ -2055,7 +2055,7 @@ if __name__ == "__main__":
parser.add_argument(
"--no-drop-if-fail",
action="store_true",
help="Do not drop database for test if test has failed",
help="Do not drop database for test if test has failed (does not work if reference file mismatch)",
)
parser.add_argument(
"--hide-db-name",

View File

@ -0,0 +1,47 @@
<?xml version="1.0"?>
<clickhouse>
<storage_configuration>
<disks>
<disk_s3>
<type>s3</type>
<endpoint>http://minio1:9001/root/data/disks/disk_s3/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</disk_s3>
<disk_s3_other_bucket>
<type>s3</type>
<endpoint>http://minio1:9001/root2/data/disks/disk_s3/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</disk_s3_other_bucket>
<disk_s3_plain>
<type>s3_plain</type>
<endpoint>http://minio1:9001/root/data/disks/disk_s3_plain/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
<s3_max_single_part_upload_size>33554432</s3_max_single_part_upload_size>
</disk_s3_plain>
</disks>
<policies>
<policy_s3>
<volumes>
<main>
<disk>disk_s3</disk>
</main>
</volumes>
</policy_s3>
<policy_s3_other_bucket>
<volumes>
<main>
<disk>disk_s3_other_bucket</disk>
</main>
</volumes>
</policy_s3_other_bucket>
</policies>
</storage_configuration>
<backups>
<allowed_disk>default</allowed_disk>
<allowed_disk>disk_s3</allowed_disk>
<allowed_disk>disk_s3_plain</allowed_disk>
</backups>
</clickhouse>

View File

@ -0,0 +1,9 @@
<clickhouse>
<named_collections>
<named_collection_s3_backups>
<url>http://minio1:9001/root/data/backups</url>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</named_collection_s3_backups>
</named_collections>
</clickhouse>

View File

@ -1,42 +0,0 @@
<clickhouse>
<storage_configuration>
<disks>
<s3>
<type>s3</type>
<endpoint>http://minio1:9001/root/data/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
<s3_max_single_part_upload_size>33554432</s3_max_single_part_upload_size>
</s3>
<s3_plain>
<type>s3_plain</type>
<endpoint>http://minio1:9001/root/data/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
<s3_max_single_part_upload_size>33554432</s3_max_single_part_upload_size>
</s3_plain>
<hdd>
<type>local</type>
<path>/</path>
</hdd>
</disks>
<policies>
<s3>
<volumes>
<main>
<disk>s3</disk>
</main>
</volumes>
</s3>
</policies>
</storage_configuration>
<backups>
<allowed_disk>default</allowed_disk>
<allowed_disk>s3</allowed_disk>
<allowed_disk>s3_plain</allowed_disk>
<allowed_path>/backups/</allowed_path>
</backups>
</clickhouse>

View File

@ -1,65 +1,40 @@
#!/usr/bin/env python3
# pylint: disable=unused-argument
import pytest
from helpers.cluster import ClickHouseCluster
cluster = ClickHouseCluster(__file__)
node = cluster.add_instance(
"node",
main_configs=["configs/storage_conf.xml"],
main_configs=["configs/disk_s3.xml", "configs/named_collection_s3_backups.xml"],
with_minio=True,
)
@pytest.fixture(scope="module")
@pytest.fixture(scope="module", autouse=True)
def start_cluster():
try:
cluster.start()
yield cluster
yield
finally:
cluster.shutdown()
@pytest.mark.parametrize(
"storage_policy,to_disk",
[
pytest.param(
"default",
"default",
id="from_local_to_local",
),
pytest.param(
"s3",
"default",
id="from_s3_to_local",
),
pytest.param(
"default",
"s3",
id="from_local_to_s3",
),
pytest.param(
"s3",
"s3_plain",
id="from_s3_to_s3_plain",
),
pytest.param(
"default",
"s3_plain",
id="from_local_to_s3_plain",
),
],
)
def test_backup_restore(start_cluster, storage_policy, to_disk):
backup_name = storage_policy + "_" + to_disk
backup_id_counter = 0
def new_backup_name():
global backup_id_counter
backup_id_counter += 1
return f"backup{backup_id_counter}"
def check_backup_and_restore(storage_policy, backup_destination):
node.query(
f"""
DROP TABLE IF EXISTS data NO DELAY;
CREATE TABLE data (key Int, value String, array Array(String)) Engine=MergeTree() ORDER BY tuple() SETTINGS storage_policy='{storage_policy}';
INSERT INTO data SELECT * FROM generateRandom('key Int, value String, array Array(String)') LIMIT 1000;
BACKUP TABLE data TO Disk('{to_disk}', '{backup_name}');
RESTORE TABLE data AS data_restored FROM Disk('{to_disk}', '{backup_name}');
BACKUP TABLE data TO {backup_destination};
RESTORE TABLE data AS data_restored FROM {backup_destination};
SELECT throwIf(
(SELECT groupArray(tuple(*)) FROM data) !=
(SELECT groupArray(tuple(*)) FROM data_restored),
@ -69,3 +44,75 @@ def test_backup_restore(start_cluster, storage_policy, to_disk):
DROP TABLE data_restored NO DELAY;
"""
)
@pytest.mark.parametrize(
"storage_policy, to_disk",
[
pytest.param(
"default",
"default",
id="from_local_to_local",
),
pytest.param(
"policy_s3",
"default",
id="from_s3_to_local",
),
pytest.param(
"default",
"disk_s3",
id="from_local_to_s3",
),
pytest.param(
"policy_s3",
"disk_s3_plain",
id="from_s3_to_s3_plain",
),
pytest.param(
"default",
"disk_s3_plain",
id="from_local_to_s3_plain",
),
],
)
def test_backup_to_disk(storage_policy, to_disk):
backup_name = new_backup_name()
backup_destination = f"Disk('{to_disk}', '{backup_name}')"
check_backup_and_restore(storage_policy, backup_destination)
def test_backup_to_s3():
storage_policy = "default"
backup_name = new_backup_name()
backup_destination = (
f"S3('http://minio1:9001/root/data/backups/{backup_name}', 'minio', 'minio123')"
)
check_backup_and_restore(storage_policy, backup_destination)
def test_backup_to_s3_named_collection():
storage_policy = "default"
backup_name = new_backup_name()
backup_destination = f"S3(named_collection_s3_backups, '{backup_name}')"
check_backup_and_restore(storage_policy, backup_destination)
def test_backup_to_s3_native_copy():
storage_policy = "policy_s3"
backup_name = new_backup_name()
backup_destination = (
f"S3('http://minio1:9001/root/data/backups/{backup_name}', 'minio', 'minio123')"
)
check_backup_and_restore(storage_policy, backup_destination)
assert node.contains_in_log("using native copy")
def test_backup_to_s3_other_bucket_native_copy():
storage_policy = "policy_s3_other_bucket"
backup_name = new_backup_name()
backup_destination = (
f"S3('http://minio1:9001/root/data/backups/{backup_name}', 'minio', 'minio123')"
)
check_backup_and_restore(storage_policy, backup_destination)
assert node.contains_in_log("using native copy")
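
The two native-copy tests assert on a log line rather than on data because the observable difference is where the bytes flow: when both the source disk and the backup destination are in S3, the server can issue a server-side copy instead of streaming the object through itself. A rough boto3 sketch of that operation (hypothetical bucket and key names):

import boto3

def s3_native_copy(client, src_bucket: str, src_key: str, dst_bucket: str, dst_key: str):
    # One request to S3; the object bytes never pass through the caller.
    client.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )

# e.g. s3_native_copy(boto3.client("s3"), "root", "data/part", "root", "data/backups/part")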

View File

@ -0,0 +1 @@
#!/usr/bin/env python3

View File

@ -0,0 +1,42 @@
<clickhouse>
<keeper_server>
<s3_snapshot>
<endpoint>http://minio1:9001/snapshots/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</s3_snapshot>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<four_letter_word_white_list>*</four_letter_word_white_list>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<min_session_timeout_ms>5000</min_session_timeout_ms>
<snapshot_distance>50</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<start_as_follower>true</start_as_follower>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<start_as_follower>true</start_as_follower>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,42 @@
<clickhouse>
<keeper_server>
<s3_snapshot>
<endpoint>http://minio1:9001/snapshots/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</s3_snapshot>
<tcp_port>9181</tcp_port>
<server_id>2</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<four_letter_word_white_list>*</four_letter_word_white_list>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<min_session_timeout_ms>5000</min_session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<start_as_follower>true</start_as_follower>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<start_as_follower>true</start_as_follower>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,42 @@
<clickhouse>
<keeper_server>
<s3_snapshot>
<endpoint>http://minio1:9001/snapshots/</endpoint>
<access_key_id>minio</access_key_id>
<secret_access_key>minio123</secret_access_key>
</s3_snapshot>
<tcp_port>9181</tcp_port>
<server_id>3</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<four_letter_word_white_list>*</four_letter_word_white_list>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<min_session_timeout_ms>5000</min_session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<start_as_follower>true</start_as_follower>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<start_as_follower>true</start_as_follower>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,120 @@
import pytest
from helpers.cluster import ClickHouseCluster
from time import sleep
from kazoo.client import KazooClient
# from kazoo.protocol.serialization import Connect, read_buffer, write_buffer
cluster = ClickHouseCluster(__file__)
node1 = cluster.add_instance(
"node1",
main_configs=["configs/keeper_config1.xml"],
stay_alive=True,
with_minio=True,
)
node2 = cluster.add_instance(
"node2",
main_configs=["configs/keeper_config2.xml"],
stay_alive=True,
with_minio=True,
)
node3 = cluster.add_instance(
"node3",
main_configs=["configs/keeper_config3.xml"],
stay_alive=True,
with_minio=True,
)
@pytest.fixture(scope="module")
def started_cluster():
try:
cluster.start()
cluster.minio_client.make_bucket("snapshots")
yield cluster
finally:
cluster.shutdown()
def get_fake_zk(nodename, timeout=30.0):
_fake_zk_instance = KazooClient(
hosts=cluster.get_instance_ip(nodename) + ":9181", timeout=timeout
)
_fake_zk_instance.start()
return _fake_zk_instance
def destroy_zk_client(zk):
try:
if zk:
zk.stop()
zk.close()
except Exception:
pass
def wait_node(node):
for _ in range(100):
zk = None
try:
zk = get_fake_zk(node.name, timeout=30.0)
zk.sync("/")
print("node", node.name, "ready")
break
except Exception as ex:
sleep(0.2)
print("Waiting until", node.name, "will be ready, exception", ex)
finally:
destroy_zk_client(zk)
else:
raise Exception("Can't wait node", node.name, "to become ready")
def test_s3_upload(started_cluster):
node1_zk = get_fake_zk(node1.name)
# snapshot_distance is set to 50 in the config,
# so a snapshot should be created after every 50 requests
for _ in range(210):
node1_zk.create("/test", sequence=True)
def get_saved_snapshots():
return [
obj.object_name
for obj in list(cluster.minio_client.list_objects("snapshots"))
]
saved_snapshots = get_saved_snapshots()
assert set(saved_snapshots) == set(
[
"snapshot_50.bin.zstd",
"snapshot_100.bin.zstd",
"snapshot_150.bin.zstd",
"snapshot_200.bin.zstd",
]
)
destroy_zk_client(node1_zk)
node1.stop_clickhouse(kill=True)
# wait for a new leader to be elected and check that it
# continues uploading snapshots
wait_node(node2)
node2_zk = get_fake_zk(node2.name)
for _ in range(200):
node2_zk.create("/test", sequence=True)
saved_snapshots = get_saved_snapshots()
assert len(saved_snapshots) > 4
success_upload_message = "Successfully uploaded"
assert node2.contains_in_log(success_upload_message) or node3.contains_in_log(
success_upload_message
)
destroy_zk_client(node2_zk)
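
Why exactly those four names: with snapshot_distance = 50 on node1, a snapshot is cut every 50 log entries, so 210 creates yield snapshots at indexes 50, 100, 150 and 200. A quick sanity check of that arithmetic (illustrative Python, mirroring the file naming seen in the bucket):

def expected_snapshots(requests: int, distance: int = 50) -> list:
    return [f"snapshot_{i}.bin.zstd" for i in range(distance, requests + 1, distance)]

assert expected_snapshots(210) == [
    "snapshot_50.bin.zstd",
    "snapshot_100.bin.zstd",
    "snapshot_150.bin.zstd",
    "snapshot_200.bin.zstd",
]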

View File

@ -0,0 +1,6 @@
<clickhouse>
<zookeeper>
<!-- Don't need real [Zoo]Keeper for this test -->
<implementation>testkeeper</implementation>
</zookeeper>
</clickhouse>

View File

@ -2,9 +2,15 @@ import pytest
import logging
from helpers.cluster import ClickHouseCluster
from helpers.test_tools import TSV
from helpers.test_tools import assert_eq_with_retry
cluster = ClickHouseCluster(__file__)
instance = cluster.add_instance("instance")
instance = cluster.add_instance(
"instance",
main_configs=[
"configs/testkeeper.xml",
],
)
q = instance.query
path_to_data = "/var/lib/clickhouse/"
@ -478,3 +484,86 @@ def test_detached_part_dir_exists(started_cluster):
== "all_1_1_0\nall_1_1_0_try1\nall_2_2_0\nall_2_2_0_try1\n"
)
q("drop table detached_part_dir_exists")
def test_make_clone_in_detached(started_cluster):
q(
"create table clone_in_detached (n int, m String) engine=ReplicatedMergeTree('/clone_in_detached', '1') order by n"
)
path = path_to_data + "data/default/clone_in_detached/"
# broken part already detached
q("insert into clone_in_detached values (42, '¯\_(ツ)_/¯')")
instance.exec_in_container(["rm", path + "all_0_0_0/data.bin"])
instance.exec_in_container(
["cp", "-r", path + "all_0_0_0", path + "detached/broken_all_0_0_0"]
)
assert_eq_with_retry(instance, "select * from clone_in_detached", "\n")
assert ["broken_all_0_0_0",] == sorted(
instance.exec_in_container(["ls", path + "detached/"]).strip().split("\n")
)
# there's a directory with the same name, but different content
q("insert into clone_in_detached values (43, '¯\_(ツ)_/¯')")
instance.exec_in_container(["rm", path + "all_1_1_0/data.bin"])
instance.exec_in_container(
["cp", "-r", path + "all_1_1_0", path + "detached/broken_all_1_1_0"]
)
instance.exec_in_container(["rm", path + "detached/broken_all_1_1_0/primary.idx"])
instance.exec_in_container(
["cp", "-r", path + "all_1_1_0", path + "detached/broken_all_1_1_0_try0"]
)
instance.exec_in_container(
[
"bash",
"-c",
"echo 'broken' > {}".format(
path + "detached/broken_all_1_1_0_try0/checksums.txt"
),
]
)
assert_eq_with_retry(instance, "select * from clone_in_detached", "\n")
assert [
"broken_all_0_0_0",
"broken_all_1_1_0",
"broken_all_1_1_0_try0",
"broken_all_1_1_0_try1",
] == sorted(
instance.exec_in_container(["ls", path + "detached/"]).strip().split("\n")
)
# there are directories with the same name, but different content, and part already detached
q("insert into clone_in_detached values (44, '¯\_(ツ)_/¯')")
instance.exec_in_container(["rm", path + "all_2_2_0/data.bin"])
instance.exec_in_container(
["cp", "-r", path + "all_2_2_0", path + "detached/broken_all_2_2_0"]
)
instance.exec_in_container(["rm", path + "detached/broken_all_2_2_0/primary.idx"])
instance.exec_in_container(
["cp", "-r", path + "all_2_2_0", path + "detached/broken_all_2_2_0_try0"]
)
instance.exec_in_container(
[
"bash",
"-c",
"echo 'broken' > {}".format(
path + "detached/broken_all_2_2_0_try0/checksums.txt"
),
]
)
instance.exec_in_container(
["cp", "-r", path + "all_2_2_0", path + "detached/broken_all_2_2_0_try1"]
)
assert_eq_with_retry(instance, "select * from clone_in_detached", "\n")
assert [
"broken_all_0_0_0",
"broken_all_1_1_0",
"broken_all_1_1_0_try0",
"broken_all_1_1_0_try1",
"broken_all_2_2_0",
"broken_all_2_2_0_try0",
"broken_all_2_2_0_try1",
] == sorted(
instance.exec_in_container(["ls", path + "detached/"]).strip().split("\n")
)
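
The directory names the test checks for follow a probing scheme: a broken part is detached as broken_<part>, and when that name is already taken by different content, numeric _tryN suffixes are attempted in order. A small sketch of that selection rule (a hypothetical helper, not the server code):

def pick_detached_name(part: str, existing: set) -> str:
    candidate = f"broken_{part}"
    n = 0
    # Probe broken_<part>, then _try0, _try1, ... until a free name is found.
    while candidate in existing:
        candidate = f"broken_{part}_try{n}"
        n += 1
    return candidate

assert pick_detached_name("all_1_1_0", {"broken_all_1_1_0", "broken_all_1_1_0_try0"}) == "broken_all_1_1_0_try1"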

View File

@ -11,11 +11,13 @@ node1 = cluster.add_instance(
"node1",
main_configs=["configs/zookeeper_config.xml", "configs/remote_servers.xml"],
with_zookeeper=True,
use_keeper=False,
)
node2 = cluster.add_instance(
"node2",
main_configs=["configs/zookeeper_config.xml", "configs/remote_servers.xml"],
with_zookeeper=True,
use_keeper=False,
)

View File

@ -1,3 +1,10 @@
import pytest
# FIXME This test is too flaky
# https://github.com/ClickHouse/ClickHouse/issues/39185
pytestmark = pytest.mark.skip
import json
import os.path as p
import random
@ -9,7 +16,6 @@ from random import randrange
import math
import asyncio
import pytest
from google.protobuf.internal.encoder import _VarintBytes
from helpers.client import QueryRuntimeException
from helpers.cluster import ClickHouseCluster, check_nats_is_available, nats_connect_ssl

View File

@ -1,39 +1,12 @@
#!/usr/bin/env bash
# Tags: no-fasttest
set -e
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
function stress()
{
# We set up a signal handler to make sure we wait for all queries to finish before exiting
CONTINUE=true
handle_interruption()
{
CONTINUE=false
}
trap handle_interruption INT
while $CONTINUE; do
${CLICKHOUSE_CLIENT} --query "CREATE TABLE IF NOT EXISTS table (x UInt8) ENGINE = MergeTree ORDER BY tuple()" 2>/dev/null
${CLICKHOUSE_CLIENT} --query "DROP TABLE table" 2>/dev/null
done
trap - INT
}
# https://stackoverflow.com/questions/9954794/execute-a-shell-function-with-timeout
export -f stress
for _ in {1..5}; do
# Ten seconds are just barely enough to reproduce the issue in most runs.
timeout -s INT 10 bash -c stress &
done
yes 'CREATE TABLE IF NOT EXISTS table (x UInt8) ENGINE = MergeTree ORDER BY tuple();' | head -n 1000 | $CLICKHOUSE_CLIENT --ignore-error -nm 2>/dev/null &
yes 'DROP TABLE table;' | head -n 1000 | $CLICKHOUSE_CLIENT --ignore-error -nm 2>/dev/null &
wait
echo
${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS table";
${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS table"

View File

@ -31,3 +31,5 @@ cv bn m","qwe,rty",456,"2016-01-02"
"zx\cv\bn m","qwe,rty","as""df'gh","",789,"2016-01-04"
"","zx
cv bn m","qwe,rty","as""df'gh",9876543210,"2016-01-03"
1
1

View File

@ -83,3 +83,13 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE template1";
$CLICKHOUSE_CLIENT --query="DROP TABLE template2";
rm "$CURDIR"/00938_template_input_format_resultset.tmp "$CURDIR"/00938_template_input_format_row.tmp
echo -ne '\${a:Escaped},\${b:Escaped}\n' > "$CURDIR"/00938_template_input_format_row.tmp
echo -ne "a,b\nc,d\n" | $CLICKHOUSE_LOCAL --structure "a String, b String" --input-format Template \
--format_template_row "$CURDIR"/00938_template_input_format_row.tmp --format_template_rows_between_delimiter '' \
-q 'select * from table' 2>&1| grep -Fac "'Escaped' serialization requires delimiter"
echo -ne '\${a:Escaped},\${:Escaped}\n' > "$CURDIR"/00938_template_input_format_row.tmp
echo -ne "a,b\nc,d\n" | $CLICKHOUSE_LOCAL --structure "a String" --input-format Template \
--format_template_row "$CURDIR"/00938_template_input_format_row.tmp --format_template_rows_between_delimiter '' \
-q 'select * from table' 2>&1| grep -Fac "'Escaped' serialization requires delimiter"
rm "$CURDIR"/00938_template_input_format_row.tmp

View File

@ -8,3 +8,4 @@
1,"2019-09-25","world"
2,"2019-09-26","custom"
3,"2019-09-27","separated"
1

View File

@ -34,3 +34,8 @@ FORMAT CustomSeparated"
$CLICKHOUSE_CLIENT --query="SELECT * FROM custom_separated ORDER BY n FORMAT CSV"
$CLICKHOUSE_CLIENT --query="DROP TABLE custom_separated"
echo -ne "a,b\nc,d\n" | $CLICKHOUSE_LOCAL --structure "a String, b String" \
--input-format CustomSeparated --format_custom_escaping_rule=Escaped \
--format_custom_field_delimiter=',' --format_custom_row_after_delimiter=$'\n' -q 'select * from table' \
2>&1| grep -Fac "'Escaped' serialization requires delimiter"

View File

@ -60,6 +60,14 @@ select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filt
0
0
select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filters={'system.one' : 'dummy != 0'};
select * from distr_table settings additional_table_filters={'distr_table' : 'x = 2'};
2 bb
2 bb
select * from distr_table settings additional_table_filters={'distr_table' : 'x != 2 and x != 3'};
1 a
4 dddd
1 a
4 dddd
select * from system.numbers limit 5;
0
1

View File

@ -1,3 +1,4 @@
-- Tags: distributed
drop table if exists table_1;
drop table if exists table_2;
drop table if exists v_numbers;
@ -6,6 +7,8 @@ drop table if exists mv_table;
create table table_1 (x UInt32, y String) engine = MergeTree order by x;
insert into table_1 values (1, 'a'), (2, 'bb'), (3, 'ccc'), (4, 'dddd');
CREATE TABLE distr_table (x UInt32, y String) ENGINE = Distributed(test_cluster_two_shards, currentDatabase(), 'table_1');
-- { echoOn }
select * from table_1;
@ -29,6 +32,9 @@ select x from table_1 prewhere x != 2 where x != 2 settings additional_table_fil
select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filters={'system.one' : 'dummy = 0'};
select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filters={'system.one' : 'dummy != 0'};
select * from distr_table settings additional_table_filters={'distr_table' : 'x = 2'};
select * from distr_table settings additional_table_filters={'distr_table' : 'x != 2 and x != 3'};
select * from system.numbers limit 5;
select * from system.numbers as t limit 5 settings additional_table_filters={'t' : 'number % 2 != 0'};
select * from system.numbers limit 5 settings additional_table_filters={'system.numbers' : 'number != 3'};

View File

@ -0,0 +1,3 @@
4 dddd
5 a
6 bb

View File

@ -0,0 +1,20 @@
-- Tags: no-parallel, distributed
create database if not exists shard_0;
create database if not exists shard_1;
drop table if exists dist_02346;
drop table if exists shard_0.data_02346;
drop table if exists shard_1.data_02346;
create table shard_0.data_02346 (x UInt32, y String) engine = MergeTree order by x settings index_granularity = 2;
insert into shard_0.data_02346 values (1, 'a'), (2, 'bb'), (3, 'ccc'), (4, 'dddd');
create table shard_1.data_02346 (x UInt32, y String) engine = MergeTree order by x settings index_granularity = 2;
insert into shard_1.data_02346 values (5, 'a'), (6, 'bb'), (7, 'ccc'), (8, 'dddd');
create table dist_02346 (x UInt32, y String) engine=Distributed('test_cluster_two_shards_different_databases', /* default_database= */ '', data_02346);
set max_rows_to_read=4;
select * from dist_02346 order by x settings additional_table_filters={'dist_02346' : 'x > 3 and x < 7'};
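
The max_rows_to_read = 4 bound is what makes this a pushdown test: each shard holds four rows in two granules of two (index_granularity = 2), and x > 3 and x < 7 matches one granule per shard, so the query fits the limit only if the additional filter is applied on the shards before reading. The arithmetic, spelled out (illustrative Python):

shards, rows_per_shard, index_granularity = 2, 4, 2
granules_per_shard = rows_per_shard // index_granularity   # 2 granules of 2 rows each
granules_matching_filter = 1                               # x > 3 and x < 7 hits one granule per shard
rows_read = shards * granules_matching_filter * index_granularity
assert rows_read == 4                                      # within max_rows_to_read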

Some files were not shown because too many files have changed in this diff.