diff --git a/CHANGELOG.md b/CHANGELOG.md index 56d117d05dd..83c1cbf1eb4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,6 @@ ### Table of Contents **[ClickHouse release v22.9, 2022-09-22](#229)**
-**[ClickHouse release v22.8, 2022-08-18](#228)**
+**[ClickHouse release v22.8-lts, 2022-08-18](#228)**
**[ClickHouse release v22.7, 2022-07-21](#227)**
**[ClickHouse release v22.6, 2022-06-16](#226)**
**[ClickHouse release v22.5, 2022-05-19](#225)**
@@ -10,10 +10,10 @@ **[ClickHouse release v22.1, 2022-01-18](#221)**
**[Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021/)**
- ### ClickHouse release 22.9, 2022-09-22 #### Backward Incompatible Change + * Upgrade from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any `ReplicatedMergeTree` tables, otherwise the server with the new version will not start. [#40641](https://github.com/ClickHouse/ClickHouse/pull/40641) ([Alexander Tokmakov](https://github.com/tavplubix)). * Remove the functions `accurate_Cast` and `accurate_CastOrNull` (they differ from `accurateCast` and `accurateCastOrNull` by the underscore in the name and are not affected by the value of the `cast_keep_nullable` setting). These functions were undocumented, untested, unused, and unneeded. They appeared to be alive due to code generalization. [#40682](https://github.com/ClickHouse/ClickHouse/pull/40682) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Add a test to ensure that every new table function will be documented. See [#40649](https://github.com/ClickHouse/ClickHouse/issues/40649). Rename table function `MeiliSearch` to `meilisearch`. [#40709](https://github.com/ClickHouse/ClickHouse/pull/40709) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -21,6 +21,7 @@ * Make interpretation of YAML configs more conventional. [#41044](https://github.com/ClickHouse/ClickHouse/pull/41044) ([Vitaly Baranov](https://github.com/vitlibar)). #### New Feature + * Support `insert_quorum = 'auto'` to use the majority number. [#39970](https://github.com/ClickHouse/ClickHouse/pull/39970) ([Sachin](https://github.com/SachinSetiya)). * Add embedded dashboards to ClickHouse server. This is a demo project about how to achieve 90% results with 1% effort using ClickHouse features. [#40461](https://github.com/ClickHouse/ClickHouse/pull/40461) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Added new settings constraint writability kind `changeable_in_readonly`. [#40631](https://github.com/ClickHouse/ClickHouse/pull/40631) ([Sergei Trifonov](https://github.com/serxa)). @@ -38,6 +39,7 @@ * Improvement for in-memory data parts: remove completely processed WAL files. [#40592](https://github.com/ClickHouse/ClickHouse/pull/40592) ([Azat Khuzhin](https://github.com/azat)). #### Performance Improvement + * Implement compression of marks and primary key. Close [#34437](https://github.com/ClickHouse/ClickHouse/issues/34437). [#37693](https://github.com/ClickHouse/ClickHouse/pull/37693) ([zhongyuankai](https://github.com/zhongyuankai)). * Allow to load marks with threadpool in advance. Regulated by setting `load_marks_asynchronously` (default: 0). [#40821](https://github.com/ClickHouse/ClickHouse/pull/40821) ([Kseniia Sumarokova](https://github.com/kssenii)). * Virtual filesystem over s3 will use random object names split into multiple path prefixes for better performance on AWS. [#40968](https://github.com/ClickHouse/ClickHouse/pull/40968) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -58,6 +60,7 @@ * Parallel hash JOIN for Float data types might be suboptimal. Make it better. [#41183](https://github.com/ClickHouse/ClickHouse/pull/41183) ([Alexey Milovidov](https://github.com/alexey-milovidov)). #### Improvement + * During startup and ATTACH call, `ReplicatedMergeTree` tables will be readonly until the ZooKeeper connection is made and the setup is finished. [#40148](https://github.com/ClickHouse/ClickHouse/pull/40148) ([Antonio Andelic](https://github.com/antonio2368)).
* Add `enable_extended_results_for_datetime_functions` option to return results of type Date32 for functions toStartOfYear, toStartOfISOYear, toStartOfQuarter, toStartOfMonth, toStartOfWeek, toMonday and toLastDayOfMonth when the argument is Date32 or DateTime64; otherwise results of Date type are returned. For compatibility reasons, the default value is `0`. [#41214](https://github.com/ClickHouse/ClickHouse/pull/41214) ([Roman Vasin](https://github.com/rvasin)). * For security and stability reasons, CatBoost models are no longer evaluated within the ClickHouse server. Instead, the evaluation is now done in the clickhouse-library-bridge, a separate process that loads the catboost library and communicates with the server process via HTTP. [#40897](https://github.com/ClickHouse/ClickHouse/pull/40897) ([Robert Schulze](https://github.com/rschu1ze)). [#39629](https://github.com/ClickHouse/ClickHouse/pull/39629) ([Robert Schulze](https://github.com/rschu1ze)). @@ -108,6 +111,7 @@ * Add `has_lightweight_delete` to system.parts. [#41564](https://github.com/ClickHouse/ClickHouse/pull/41564) ([Kseniia Sumarokova](https://github.com/kssenii)). #### Build/Testing/Packaging Improvement + * Enforce documentation for every setting. [#40644](https://github.com/ClickHouse/ClickHouse/pull/40644) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Enforce documentation for every current metric. [#40645](https://github.com/ClickHouse/ClickHouse/pull/40645) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Enforce documentation for every profile event counter. Write the documentation where it was missing. [#40646](https://github.com/ClickHouse/ClickHouse/pull/40646) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -217,15 +221,16 @@ * Fix read bytes/rows in X-ClickHouse-Summary with materialized views. [#41586](https://github.com/ClickHouse/ClickHouse/pull/41586) ([Raúl Marín](https://github.com/Algunenano)). * Fix possible `pipeline stuck` exception for queries with `OFFSET`. The error was found with `enable_optimize_predicate_expression = 0` and an always-false condition in `WHERE`. Fixes [#41383](https://github.com/ClickHouse/ClickHouse/issues/41383). [#41588](https://github.com/ClickHouse/ClickHouse/pull/41588) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). - -### ClickHouse release 22.8, 2022-08-18 +### ClickHouse release 22.8-lts, 2022-08-18 #### Backward Incompatible Change + * Extended range of `Date32` and `DateTime64` to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601):2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of `1899-01-01` was clamped to `1925-01-01`, in the new version it will be clamped to `1900-01-01`. It changes the behavior of rounding with `toStartOfInterval` if you pass `INTERVAL 3 QUARTER` up to one quarter because the intervals are counted from an implementation-specific point of time. Closes [#28216](https://github.com/ClickHouse/ClickHouse/issues/28216), improves [#38393](https://github.com/ClickHouse/ClickHouse/issues/38393). [#39425](https://github.com/ClickHouse/ClickHouse/pull/39425) ([Roman Vasin](https://github.com/rvasin)).
* Now, all relevant dictionary sources respect the `remote_url_allow_hosts` setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. [#39184](https://github.com/ClickHouse/ClickHouse/pull/39184) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Make the remote filesystem cache composable, allow certain files (e.g. idx, mrk, ...) not to be evicted, delete the old cache version. Now it is possible to configure cache over Azure blob storage disk, over Local disk, over StaticWeb disk, etc. This PR is marked backward incompatible because the cache configuration changes, and the config file must be updated for the cache to work. Old cache will still be used with new configuration. The server will start up fine with the old cache configuration. Closes https://github.com/ClickHouse/ClickHouse/issues/36140. Closes https://github.com/ClickHouse/ClickHouse/issues/37889. ([Kseniia Sumarokova](https://github.com/kssenii)). [#36171](https://github.com/ClickHouse/ClickHouse/pull/36171) #### New Feature + * Query parameters can be set in interactive mode as `SET param_abc = 'def'` and transferred via the native protocol as settings. [#39906](https://github.com/ClickHouse/ClickHouse/pull/39906) ([Nikita Taranov](https://github.com/nickitat)). * Quota key can be set in the native protocol ([Yakov Olkhovsky](https://github.com/ClickHouse/ClickHouse/pull/39874)). * Added a setting `exact_rows_before_limit` (0/1). When enabled, ClickHouse will provide an exact value for the `rows_before_limit_at_least` statistic, but with the cost that the data before the limit will have to be read completely. This closes [#6613](https://github.com/ClickHouse/ClickHouse/issues/6613). [#25333](https://github.com/ClickHouse/ClickHouse/pull/25333) ([kevin wan](https://github.com/MaxWk)). @@ -240,12 +245,14 @@ * Add new setting `schema_inference_hints` that allows to specify structure hints in schema inference for specific columns. Closes [#39569](https://github.com/ClickHouse/ClickHouse/issues/39569). [#40068](https://github.com/ClickHouse/ClickHouse/pull/40068) ([Kruglov Pavel](https://github.com/Avogar)). #### Experimental Feature + * Support SQL standard DELETE FROM syntax on merge tree tables and lightweight delete implementation for merge tree families (see the sketch after the Performance Improvement section below). [#37893](https://github.com/ClickHouse/ClickHouse/pull/37893) ([Jianmei Zhang](https://github.com/zhangjmruc)) ([Alexander Gololobov](https://github.com/davenger)). Note: this new feature does not make ClickHouse an HTAP DBMS. #### Performance Improvement + * Improved memory usage during memory efficient merging of aggregation results. [#39429](https://github.com/ClickHouse/ClickHouse/pull/39429) ([Nikita Taranov](https://github.com/nickitat)). * Added concurrency control logic to limit the total number of concurrent threads created by queries. [#37558](https://github.com/ClickHouse/ClickHouse/pull/37558) ([Sergei Trifonov](https://github.com/serxa)). Add the `concurrent_threads_soft_limit` parameter to increase performance in case of high QPS by means of limiting the total number of threads for all queries. [#37285](https://github.com/ClickHouse/ClickHouse/pull/37285) ([Roman Vasin](https://github.com/rvasin)). -* Add `SLRU` cache policy for uncompressed cache and marks cache. ([Kseniia Sumarokova](https://github.com/kssenii)). [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)).
Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)). +* Add `SLRU` cache policy for uncompressed cache and marks cache. ([Kseniia Sumarokova](https://github.com/kssenii)). [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)). Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)). * Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#36654](https://github.com/ClickHouse/ClickHouse/pull/36654) ([jasperzhu](https://github.com/jinjunzh)). [#39494](https://github.com/ClickHouse/ClickHouse/pull/39494) ([Robert Schulze](https://github.com/rschu1ze)). * `DISTINCT` in order with `ORDER BY`: Deduce way to sort based on input stream sort description. Skip sorting if input stream is already sorted. [#38719](https://github.com/ClickHouse/ClickHouse/pull/38719) ([Igor Nikonov](https://github.com/devcrafter)). Improve memory usage (significantly) and query execution time + use `DistinctSortedChunkTransform` for final distinct when `DISTINCT` columns match `ORDER BY` columns, but rename to `DistinctSortedStreamTransform` in `EXPLAIN PIPELINE` → this improves memory usage significantly + remove unnecessary allocations in hot loop in `DistinctSortedChunkTransform`. [#39432](https://github.com/ClickHouse/ClickHouse/pull/39432) ([Igor Nikonov](https://github.com/devcrafter)). Use `DistinctSortedTransform` only when sort description is applicable to DISTINCT columns, otherwise fall back to ordinary DISTINCT implementation + it allows making less checks during `DistinctSortedTransform` execution. [#39528](https://github.com/ClickHouse/ClickHouse/pull/39528) ([Igor Nikonov](https://github.com/devcrafter)). Fix: `DistinctSortedTransform` didn't take advantage of sorting. It never cleared HashSet since clearing_columns were detected incorrectly (always empty). So, it basically worked as ordinary `DISTINCT` (`DistinctTransform`). The fix reduces memory usage significantly. [#39538](https://github.com/ClickHouse/ClickHouse/pull/39538) ([Igor Nikonov](https://github.com/devcrafter)). * Use local node as first priority to get structure of remote table when executing `cluster` and similar table functions. [#39440](https://github.com/ClickHouse/ClickHouse/pull/39440) ([Mingliang Pan](https://github.com/liangliangpan)). @@ -256,6 +263,7 @@ * Improve bytes to bits mask transform for SSE/AVX/AVX512. [#39586](https://github.com/ClickHouse/ClickHouse/pull/39586) ([Guo Wangyang](https://github.com/guowangy)). 
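The Experimental Feature section above introduces SQL-standard `DELETE FROM` with a lightweight implementation for the `MergeTree` family. A minimal usage sketch — the table is hypothetical, and the `allow_experimental_lightweight_delete` setting name is an assumption based on the feature being experimental in this release:

``` sql
-- Hypothetical table, used only for illustration.
CREATE TABLE hits (CounterID UInt32, EventDate Date)
ENGINE = MergeTree ORDER BY CounterID;

-- Assumed guard setting for the experimental feature.
SET allow_experimental_lightweight_delete = 1;

-- SQL-standard DELETE: matching rows are masked immediately and
-- removed physically by subsequent merges, in contrast to the
-- heavyweight ALTER TABLE ... DELETE mutation that rewrites parts.
DELETE FROM hits WHERE EventDate < '2022-01-01';
```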
#### Improvement + * Normalize `AggregateFunction` types and state representations because optimizations like [#35788](https://github.com/ClickHouse/ClickHouse/pull/35788) will treat `count(not null columns)` as `count()`, which might confuse distributed interpreters with the following error: `Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported`. [#39420](https://github.com/ClickHouse/ClickHouse/pull/39420) ([Amos Bird](https://github.com/amosbird)). The functions with identical states can be used in materialized views interchangeably. * Rework and simplify the `system.backups` table, remove the `internal` column, allow the user to set the ID of the operation, add columns `num_files`, `uncompressed_size`, `compressed_size`, `start_time`, `end_time`. [#39503](https://github.com/ClickHouse/ClickHouse/pull/39503) ([Vitaly Baranov](https://github.com/vitlibar)). * Improved structure of the DDL query result table for `Replicated` database (separate columns with shard and replica name, clearer status) - `CREATE TABLE ... ON CLUSTER` queries can be normalized on the initiator first if `distributed_ddl_entry_format_version` is set to 3 (default value). It means that `ON CLUSTER` queries may not work if the initiator does not belong to the cluster specified in the query. Fixes [#37318](https://github.com/ClickHouse/ClickHouse/issues/37318), [#39500](https://github.com/ClickHouse/ClickHouse/issues/39500) - Ignore `ON CLUSTER` clause if the database is `Replicated` and the cluster name equals the database name. Related to [#35570](https://github.com/ClickHouse/ClickHouse/issues/35570) - Miscellaneous minor fixes for `Replicated` database engine - Check metadata consistency when starting up a `Replicated` database, start replica recovery in case of mismatch of local metadata and metadata in Keeper. Resolves [#24880](https://github.com/ClickHouse/ClickHouse/issues/24880). [#37198](https://github.com/ClickHouse/ClickHouse/pull/37198) ([Alexander Tokmakov](https://github.com/tavplubix)). @@ -294,6 +302,7 @@ * Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes [#32401](https://github.com/ClickHouse/ClickHouse/issues/32401)). [#40293](https://github.com/ClickHouse/ClickHouse/pull/40293) ([Josh Taylor](https://github.com/joshuataylor)). #### Build/Testing/Packaging Improvement + * [ClickFiddle](https://fiddle.clickhouse.com/): A new tool for testing ClickHouse versions in read/write mode (**Igor Baliuk**). * ClickHouse binary is made self-extracting [#35775](https://github.com/ClickHouse/ClickHouse/pull/35775) ([Yakov Olkhovskiy, Arthur Filatenkov](https://github.com/yakov-olkhovskiy)). * Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -308,6 +317,7 @@ * Docker: Now entrypoint.sh in the docker image creates and executes chown for all folders it finds in the config for multidisk setup [#17717](https://github.com/ClickHouse/ClickHouse/issues/17717).
[#39121](https://github.com/ClickHouse/ClickHouse/pull/39121) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). #### Bug Fix + * Fix possible segfault in `CapnProto` input format. This bug was found and sent through the ClickHouse bug-bounty [program](https://github.com/ClickHouse/ClickHouse/issues/38986) by *kiojj*. [#40241](https://github.com/ClickHouse/ClickHouse/pull/40241) ([Kruglov Pavel](https://github.com/Avogar)). * Fix a very rare case of incorrect behavior of the array subscript operator. This closes [#28720](https://github.com/ClickHouse/ClickHouse/issues/28720). [#40185](https://github.com/ClickHouse/ClickHouse/pull/40185) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Fix insufficient argument check for encryption functions (found by query fuzzer). This closes [#39987](https://github.com/ClickHouse/ClickHouse/issues/39987). [#40194](https://github.com/ClickHouse/ClickHouse/pull/40194) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -358,16 +368,17 @@ * A fix for reverse DNS resolution. [#40134](https://github.com/ClickHouse/ClickHouse/pull/40134) ([Arthur Passos](https://github.com/arthurpassos)). * Fix unexpected result of `arrayDifference` with `Array(UInt32)`. [#40211](https://github.com/ClickHouse/ClickHouse/pull/40211) ([Duc Canh Le](https://github.com/canhld94)). - ### ClickHouse release 22.7, 2022-07-21 #### Upgrade Notes + * Enable setting `enable_positional_arguments` by default. It allows queries like `SELECT ... ORDER BY 1, 2` where 1, 2 are references to the select clause. If you need to return the old behavior, disable this setting. [#38204](https://github.com/ClickHouse/ClickHouse/pull/38204) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Disable `format_csv_allow_single_quotes` by default. See [#37096](https://github.com/ClickHouse/ClickHouse/issues/37096). ([Kruglov Pavel](https://github.com/Avogar)). * `Ordinary` database engine and old storage definition syntax for `*MergeTree` tables are deprecated. By default it's not possible to create new databases with the `Ordinary` engine. If the `system` database has the `Ordinary` engine, it will be automatically converted to `Atomic` on server startup. There are settings to keep the old behavior (`allow_deprecated_database_ordinary` and `allow_deprecated_syntax_for_merge_tree`), but these settings may be removed in future releases. [#38335](https://github.com/ClickHouse/ClickHouse/pull/38335) ([Alexander Tokmakov](https://github.com/tavplubix)). * Force rewriting comma join to inner by default (set default value `cross_to_inner_join_rewrite = 2`). To have the old behavior, set `cross_to_inner_join_rewrite = 1`. [#39326](https://github.com/ClickHouse/ClickHouse/pull/39326) ([Vladimir C](https://github.com/vdimir)). If you face any incompatibilities, you can turn this setting back. #### New Feature + * Support expressions with window functions. Closes [#19857](https://github.com/ClickHouse/ClickHouse/issues/19857). [#37848](https://github.com/ClickHouse/ClickHouse/pull/37848) ([Dmitry Novik](https://github.com/novikd)). * Add new `direct` join algorithm for `EmbeddedRocksDB` tables, see [#33582](https://github.com/ClickHouse/ClickHouse/issues/33582). [#35363](https://github.com/ClickHouse/ClickHouse/pull/35363) ([Vladimir C](https://github.com/vdimir)). * Added full sorting merge join algorithm. [#35796](https://github.com/ClickHouse/ClickHouse/pull/35796) ([Vladimir C](https://github.com/vdimir)). @@ -395,9 +406,11 @@ * Add `clickhouse-diagnostics` binary to the packages.
[#38647](https://github.com/ClickHouse/ClickHouse/pull/38647) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). #### Experimental Feature + * Adds new setting `implicit_transaction` to run standalone queries inside a transaction. It handles both creation and closing (via COMMIT if the query succeeded or ROLLBACK if it didn't) of the transaction automatically. [#38344](https://github.com/ClickHouse/ClickHouse/pull/38344) ([Raúl Marín](https://github.com/Algunenano)). #### Performance Improvement + * Distinct optimization for sorted columns. Use specialized distinct transformation in case input stream is sorted by column(s) in distinct. Optimization can be applied to pre-distinct, final distinct, or both. Initial implementation by @dimarub2000. [#37803](https://github.com/ClickHouse/ClickHouse/pull/37803) ([Igor Nikonov](https://github.com/devcrafter)). * Improve performance of `ORDER BY`, `MergeTree` merges, window functions using batch version of `BinaryHeap`. [#38022](https://github.com/ClickHouse/ClickHouse/pull/38022) ([Maksim Kita](https://github.com/kitaisreal)). * More parallel execution for queries with `FINAL` [#36396](https://github.com/ClickHouse/ClickHouse/pull/36396) ([Nikita Taranov](https://github.com/nickitat)). @@ -407,7 +420,7 @@ * Improve performance of insertion to columns of type `JSON`. [#38320](https://github.com/ClickHouse/ClickHouse/pull/38320) ([Anton Popov](https://github.com/CurtizJ)). * Optimized insertion and lookups in the HashTable. [#38413](https://github.com/ClickHouse/ClickHouse/pull/38413) ([Nikita Taranov](https://github.com/nickitat)). * Fix performance degradation from [#32493](https://github.com/ClickHouse/ClickHouse/issues/32493). [#38417](https://github.com/ClickHouse/ClickHouse/pull/38417) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Improve performance of joining with numeric columns using SIMD instructions. [#37235](https://github.com/ClickHouse/ClickHouse/pull/37235) ([zzachimed](https://github.com/zzachimed)). [#38565](https://github.com/ClickHouse/ClickHouse/pull/38565) ([Maksim Kita](https://github.com/kitaisreal)). +* Improve performance of joining with numeric columns using SIMD instructions. [#37235](https://github.com/ClickHouse/ClickHouse/pull/37235) ([zzachimed](https://github.com/zzachimed)). [#38565](https://github.com/ClickHouse/ClickHouse/pull/38565) ([Maksim Kita](https://github.com/kitaisreal)). * Norm and Distance functions for arrays speed up 1.2-2 times. [#38740](https://github.com/ClickHouse/ClickHouse/pull/38740) ([Alexander Gololobov](https://github.com/davenger)). * Add AVX-512 VBMI optimized `copyOverlap32Shuffle` for LZ4 decompression. In other words, LZ4 decompression performance is improved. [#37891](https://github.com/ClickHouse/ClickHouse/pull/37891) ([Guo Wangyang](https://github.com/guowangy)). * `ORDER BY (a, b)` will use all the same benefits as `ORDER BY a, b`. [#38873](https://github.com/ClickHouse/ClickHouse/pull/38873) ([Igor Nikonov](https://github.com/devcrafter)). @@ -419,6 +432,7 @@ * The table `system.asynchronous_metric_log` is further optimized for storage space. This closes [#38134](https://github.com/ClickHouse/ClickHouse/issues/38134). See the [YouTube video](https://www.youtube.com/watch?v=0fSp9SF8N8A). [#38428](https://github.com/ClickHouse/ClickHouse/pull/38428) ([Alexey Milovidov](https://github.com/alexey-milovidov)). #### Improvement + * Support SQL standard CREATE INDEX and DROP INDEX syntax. 
[#35166](https://github.com/ClickHouse/ClickHouse/pull/35166) ([Jianmei Zhang](https://github.com/zhangjmruc)). * Send profile events for INSERT queries (previously only SELECT was supported). [#37391](https://github.com/ClickHouse/ClickHouse/pull/37391) ([Azat Khuzhin](https://github.com/azat)). * Implement in order aggregation (`optimize_aggregation_in_order`) for fully materialized projections. [#37469](https://github.com/ClickHouse/ClickHouse/pull/37469) ([Azat Khuzhin](https://github.com/azat)). @@ -464,6 +478,7 @@ * Allow to declare `RabbitMQ` queue without default arguments `x-max-length` and `x-overflow`. [#39259](https://github.com/ClickHouse/ClickHouse/pull/39259) ([rnbondarenko](https://github.com/rnbondarenko)). #### Build/Testing/Packaging Improvement + * Apply Clang Thread Safety Analysis (TSA) annotations to ClickHouse. [#38068](https://github.com/ClickHouse/ClickHouse/pull/38068) ([Robert Schulze](https://github.com/rschu1ze)). * Adapt universal installation script for FreeBSD. [#39302](https://github.com/ClickHouse/ClickHouse/pull/39302) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Preparation for building on `s390x` platform. [#39193](https://github.com/ClickHouse/ClickHouse/pull/39193) ([Harry Lee](https://github.com/HarryLeeIBM)). @@ -473,6 +488,7 @@ * Change `all|noarch` packages to architecture-dependent - Fix some documentation for it - Push aarch64|arm64 packages to artifactory and release assets - Fixes [#36443](https://github.com/ClickHouse/ClickHouse/issues/36443). [#38580](https://github.com/ClickHouse/ClickHouse/pull/38580) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). #### Bug Fix (user-visible misbehavior in official stable or prestable release) + * Fix rounding for `Decimal128/Decimal256` with more than 19-digits long scale. [#38027](https://github.com/ClickHouse/ClickHouse/pull/38027) ([Igor Nikonov](https://github.com/devcrafter)). * Fixed crash caused by data race in storage `Hive` (integration table engine). [#38887](https://github.com/ClickHouse/ClickHouse/pull/38887) ([lgbo](https://github.com/lgbo-ustc)). * Fix crash when executing GRANT ALL ON *.* with ON CLUSTER. It was broken in https://github.com/ClickHouse/ClickHouse/pull/35767. This closes [#38618](https://github.com/ClickHouse/ClickHouse/issues/38618). [#38674](https://github.com/ClickHouse/ClickHouse/pull/38674) ([Vitaly Baranov](https://github.com/vitlibar)). @@ -529,6 +545,7 @@ ### ClickHouse release 22.6, 2022-06-16 #### Backward Incompatible Change + * Remove support for octal number literals in SQL. In previous versions they were parsed as Float64. [#37765](https://github.com/ClickHouse/ClickHouse/pull/37765) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). * Changes how settings using `seconds` as type are parsed to support floating point values (for example: `max_execution_time=0.5`). Infinity or NaN values will throw an exception. [#37187](https://github.com/ClickHouse/ClickHouse/pull/37187) ([Raúl Marín](https://github.com/Algunenano)). * Changed format of binary serialization of columns of experimental type `Object`. New format is more convenient to implement by third-party clients. [#37482](https://github.com/ClickHouse/ClickHouse/pull/37482) ([Anton Popov](https://github.com/CurtizJ)). 
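The 22.6 note above about seconds-typed settings accepting floating point values can be illustrated with a short sketch (the query is only an example):

``` sql
-- Settings whose type is 'seconds' now parse fractional values.
SET max_execution_time = 0.5; -- half a second

-- An unbounded scan like this is now cancelled with a timeout
-- error after roughly half a second; passing inf or nan to the
-- setting throws an exception instead of being accepted.
SELECT count() FROM system.numbers;
```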
@@ -537,6 +554,7 @@ * If you run different ClickHouse versions on a cluster with AArch64 CPU or mix AArch64 and amd64 on a cluster, and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, and the size of the result is huge, the data will not be fully aggregated in the result of these queries during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade. #### New Feature + * Add `GROUPING` function. It allows to disambiguate the records in the queries with `ROLLUP`, `CUBE` or `GROUPING SETS`. Closes [#19426](https://github.com/ClickHouse/ClickHouse/issues/19426). [#37163](https://github.com/ClickHouse/ClickHouse/pull/37163) ([Dmitry Novik](https://github.com/novikd)). * A new codec [FPC](https://userweb.cs.txstate.edu/~burtscher/papers/dcc07a.pdf) algorithm for floating point data compression. [#37553](https://github.com/ClickHouse/ClickHouse/pull/37553) ([Mikhail Guzov](https://github.com/koloshmet)). * Add new columnar JSON formats: `JSONColumns`, `JSONCompactColumns`, `JSONColumnsWithMetadata`. Closes [#36338](https://github.com/ClickHouse/ClickHouse/issues/36338) Closes [#34509](https://github.com/ClickHouse/ClickHouse/issues/34509). [#36975](https://github.com/ClickHouse/ClickHouse/pull/36975) ([Kruglov Pavel](https://github.com/Avogar)). @@ -557,11 +575,13 @@ * Added `SYSTEM UNFREEZE` query that deletes the whole backup regardless if the corresponding table is deleted or not. [#36424](https://github.com/ClickHouse/ClickHouse/pull/36424) ([Vadim Volodin](https://github.com/PolyProgrammist)). #### Experimental Feature + * Enables `POPULATE` for `WINDOW VIEW`. [#36945](https://github.com/ClickHouse/ClickHouse/pull/36945) ([vxider](https://github.com/Vxider)). * `ALTER TABLE ... MODIFY QUERY` support for `WINDOW VIEW`. [#37188](https://github.com/ClickHouse/ClickHouse/pull/37188) ([vxider](https://github.com/Vxider)). * This PR changes the behavior of the `ENGINE` syntax in `WINDOW VIEW`, to make it like in `MATERIALIZED VIEW`. [#37214](https://github.com/ClickHouse/ClickHouse/pull/37214) ([vxider](https://github.com/Vxider)). #### Performance Improvement + * Added numerous optimizations for ARM NEON [#38093](https://github.com/ClickHouse/ClickHouse/pull/38093)([Daniel Kutenin](https://github.com/danlark1)), ([Alexandra Pilipyuk](https://github.com/chalice19)) Note: if you run different ClickHouse versions on a cluster with ARM CPU and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, the result of the aggregation query will be wrong during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade. * Improve performance and memory usage for select of subset of columns for formats Native, Protobuf, CapnProto, JSONEachRow, TSKV, all formats with suffixes WithNames/WithNamesAndTypes. Previously while selecting only subset of columns from files in these formats all columns were read and stored in memory. Now only required columns are read. This PR enables setting `input_format_skip_unknown_fields` by default, because otherwise in case of select of subset of columns exception will be thrown. [#37192](https://github.com/ClickHouse/ClickHouse/pull/37192) ([Kruglov Pavel](https://github.com/Avogar)). * Now more filters can be pushed down for join. [#37472](https://github.com/ClickHouse/ClickHouse/pull/37472) ([Amos Bird](https://github.com/amosbird)). 
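The `GROUPING` function added in the New Feature section above disambiguates subtotal rows produced by `ROLLUP`, `CUBE` or `GROUPING SETS`. A minimal sketch, with a hypothetical `sales` table:

``` sql
-- GROUPING reports which key columns were aggregated away in each
-- row: 0 for a regular row, non-zero for subtotal/total rows.
SELECT
    region,
    city,
    GROUPING(region, city) AS grouping_mask,
    sum(amount) AS total
FROM sales
GROUP BY ROLLUP(region, city);
```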
@@ -592,6 +612,7 @@ * In the function `CompressedWriteBuffer::nextImpl()`, there is an unnecessary write-copy step that happens frequently during data insertion. The difference with this patch: - Before: 1. Compress "working_buffer" into "compressed_buffer" 2. write-copy into "out" - After: Directly compress "working_buffer" into "out". [#37242](https://github.com/ClickHouse/ClickHouse/pull/37242) ([jasperzhu](https://github.com/jinjunzh)). #### Improvement + * Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS. Closes [#37360](https://github.com/ClickHouse/ClickHouse/issues/37360). [#37667](https://github.com/ClickHouse/ClickHouse/pull/37667) ([Dmitry Novik](https://github.com/novikd)). * Fix stack traces collection on ARM. Closes [#37044](https://github.com/ClickHouse/ClickHouse/issues/37044). Closes [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638). [#37797](https://github.com/ClickHouse/ClickHouse/pull/37797) ([Maksim Kita](https://github.com/kitaisreal)). * Client will try every IP address returned by DNS resolution until a successful connection. [#37273](https://github.com/ClickHouse/ClickHouse/pull/37273) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). @@ -633,6 +654,7 @@ * Add implicit grants with grant option too. For example `GRANT CREATE TABLE ON test.* TO A WITH GRANT OPTION` now allows `A` to execute `GRANT CREATE VIEW ON test.* TO B`. [#38017](https://github.com/ClickHouse/ClickHouse/pull/38017) ([Vitaly Baranov](https://github.com/vitlibar)). #### Build/Testing/Packaging Improvement + * Use `clang-14` and LLVM infrastructure version 14 for builds. This closes [#34681](https://github.com/ClickHouse/ClickHouse/issues/34681). [#34754](https://github.com/ClickHouse/ClickHouse/pull/34754) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Note: `clang-14` has [a bug](https://github.com/google/sanitizers/issues/1540) in ThreadSanitizer that makes our CI work worse. * Allow to drop privileges at startup. This simplifies Docker images. Closes [#36293](https://github.com/ClickHouse/ClickHouse/issues/36293). [#36341](https://github.com/ClickHouse/ClickHouse/pull/36341) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Add docs spellcheck to CI. [#37790](https://github.com/ClickHouse/ClickHouse/pull/37790) ([Vladimir C](https://github.com/vdimir)). @@ -690,7 +712,6 @@ * Fix possible heap-use-after-free error when reading system.projection_parts and system.projection_parts_columns. This fixes [#37184](https://github.com/ClickHouse/ClickHouse/issues/37184). [#37185](https://github.com/ClickHouse/ClickHouse/pull/37185) ([Amos Bird](https://github.com/amosbird)). * Fixed `DateTime64` fractional seconds behavior prior to Unix epoch. [#37697](https://github.com/ClickHouse/ClickHouse/pull/37697) ([Andrey Zvonov](https://github.com/zvonand)). [#37039](https://github.com/ClickHouse/ClickHouse/pull/37039) ([李扬](https://github.com/taiyang-li)). - ### ClickHouse release 22.5, 2022-05-19 #### Upgrade Notes @@ -743,7 +764,7 @@ * Implement partial GROUP BY key for optimize_aggregation_in_order. [#35111](https://github.com/ClickHouse/ClickHouse/pull/35111) ([Azat Khuzhin](https://github.com/azat)). #### Improvement - + * Show names of erroneous files in case of parsing errors while executing table functions `file`, `s3` and `url`. [#36314](https://github.com/ClickHouse/ClickHouse/pull/36314) ([Anton Popov](https://github.com/CurtizJ)).
* Allowed to increase the number of threads for executing background operations (merges, mutations, moves and fetches) at runtime if they are specified at the top level of the config. [#36425](https://github.com/ClickHouse/ClickHouse/pull/36425) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). * Now date time conversion functions that generate a time before 1970-01-01 00:00:00 with partial hours/minutes timezones will be saturated to zero instead of overflowing. This is the continuation of https://github.com/ClickHouse/ClickHouse/pull/29953 which addresses https://github.com/ClickHouse/ClickHouse/pull/29953#discussion_r800550280 . Marked as improvement because it's implementation-defined behavior (and a very rare case) and we are allowed to break it. [#36656](https://github.com/ClickHouse/ClickHouse/pull/36656) ([Amos Bird](https://github.com/amosbird)). @@ -852,7 +873,6 @@ * Fix ALTER DROP COLUMN of nested column with compact parts (i.e. `ALTER TABLE x DROP COLUMN n`, when there is column `n.d`). [#35797](https://github.com/ClickHouse/ClickHouse/pull/35797) ([Azat Khuzhin](https://github.com/azat)). * Fix substring function range error length when `offset` and `length` are negative constants and `s` is not constant. [#33861](https://github.com/ClickHouse/ClickHouse/pull/33861) ([RogerYK](https://github.com/RogerYK)). - ### ClickHouse release 22.4, 2022-04-19 #### Backward Incompatible Change @@ -1004,8 +1024,7 @@ * Fix mutations in tables with enabled sparse columns. [#35284](https://github.com/ClickHouse/ClickHouse/pull/35284) ([Anton Popov](https://github.com/CurtizJ)). * Do not delay final part writing by default (fixes possible `Memory limit exceeded` during `INSERT` by adding `max_insert_delayed_streams_for_parallel_write` with default to 1000 for writes to s3 and disabled as before otherwise). [#34780](https://github.com/ClickHouse/ClickHouse/pull/34780) ([Azat Khuzhin](https://github.com/azat)). - -## ClickHouse release v22.3-lts, 2022-03-17 +### ClickHouse release v22.3-lts, 2022-03-17 #### Backward Incompatible Change @@ -1132,7 +1151,6 @@ * Fix incorrect result of trivial count query when part movement feature is used [#34089](https://github.com/ClickHouse/ClickHouse/issues/34089). [#34385](https://github.com/ClickHouse/ClickHouse/pull/34385) ([nvartolomei](https://github.com/nvartolomei)). * Fix inconsistency of `max_query_size` limitation in distributed subqueries. [#34078](https://github.com/ClickHouse/ClickHouse/pull/34078) ([Chao Ma](https://github.com/godliness)). - ### ClickHouse release v22.2, 2022-02-17 #### Upgrade Notes @@ -1308,7 +1326,6 @@ * Fix issue [#18206](https://github.com/ClickHouse/ClickHouse/issues/18206). [#33977](https://github.com/ClickHouse/ClickHouse/pull/33977) ([Vitaly Baranov](https://github.com/vitlibar)). * This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). [#33574](https://github.com/ClickHouse/ClickHouse/pull/33574) ([Vitaly Baranov](https://github.com/vitlibar)). - ### ClickHouse release v22.1, 2022-01-18 #### Upgrade Notes @@ -1335,7 +1352,6 @@ * Add function `decodeURLFormComponent` slightly different from `decodeURLComponent`. Close [#10298](https://github.com/ClickHouse/ClickHouse/issues/10298). [#33451](https://github.com/ClickHouse/ClickHouse/pull/33451) ([SuperDJY](https://github.com/cmsxbc)). * Allow to split `GraphiteMergeTree` rollup rules for plain/tagged metrics (optional rule_type field).
[#33494](https://github.com/ClickHouse/ClickHouse/pull/33494) ([Michail Safronov](https://github.com/msaf1980)). - #### Performance Improvement * Support moving conditions to `PREWHERE` (setting `optimize_move_to_prewhere`) for tables of `Merge` engine if all its underlying tables support `PREWHERE`. [#33300](https://github.com/ClickHouse/ClickHouse/pull/33300) ([Anton Popov](https://github.com/CurtizJ)). @@ -1351,7 +1367,6 @@ * Optimize selecting of MergeTree parts that can be moved between volumes. [#33225](https://github.com/ClickHouse/ClickHouse/pull/33225) ([OnePiece](https://github.com/zhongyuankai)). * Fix `sparse_hashed` dict performance with sequential keys (wrong hash function). [#32536](https://github.com/ClickHouse/ClickHouse/pull/32536) ([Azat Khuzhin](https://github.com/azat)). - #### Experimental Feature * Parallel reading from multiple replicas within a shard during distributed query without using sample key. To enable this, set `allow_experimental_parallel_reading_from_replicas = 1` and `max_parallel_replicas` to any number. This closes [#26748](https://github.com/ClickHouse/ClickHouse/issues/26748). [#29279](https://github.com/ClickHouse/ClickHouse/pull/29279) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). @@ -1364,7 +1379,6 @@ * Fix ACL with explicit digit hash in `clickhouse-keeper`: now the behavior is consistent with ZooKeeper and the generated digest is always accepted. [#33249](https://github.com/ClickHouse/ClickHouse/pull/33249) ([小路](https://github.com/nicelulu)). [#33246](https://github.com/ClickHouse/ClickHouse/pull/33246). * Fix unexpected projection removal when detaching parts. [#32067](https://github.com/ClickHouse/ClickHouse/pull/32067) ([Amos Bird](https://github.com/amosbird)). - #### Improvement * Now date time conversion functions that generate a time before `1970-01-01 00:00:00` will be saturated to zero instead of overflowing. [#29953](https://github.com/ClickHouse/ClickHouse/pull/29953) ([Amos Bird](https://github.com/amosbird)). It also fixes a bug in index analysis if a date truncation function would yield a result before the Unix epoch. @@ -1411,7 +1425,6 @@ * Updating `modification_time` for data part in `system.parts` after part movement [#32964](https://github.com/ClickHouse/ClickHouse/issues/32964). [#32965](https://github.com/ClickHouse/ClickHouse/pull/32965) ([save-my-heart](https://github.com/save-my-heart)). * Potential issue, cannot be exploited: integer overflow may happen in array resize. [#33024](https://github.com/ClickHouse/ClickHouse/pull/33024) ([varadarajkumar](https://github.com/varadarajkumar)). - #### Build/Testing/Packaging Improvement * Add packages, functional tests and Docker builds for AArch64 (ARM) version of ClickHouse. [#32911](https://github.com/ClickHouse/ClickHouse/pull/32911) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). [#32415](https://github.com/ClickHouse/ClickHouse/pull/32415) @@ -1426,7 +1439,6 @@ * Inject git information into the clickhouse binary file, so we can get the source code revision easily from the binary. [#33124](https://github.com/ClickHouse/ClickHouse/pull/33124) ([taiyang-li](https://github.com/taiyang-li)). * Remove obsolete code from ConfigProcessor. Yandex-specific code is not used anymore. The code contained one minor defect, which was reported by [Mallik Hassan](https://github.com/SadiHassan) in [#33032](https://github.com/ClickHouse/ClickHouse/issues/33032). This closes [#33032](https://github.com/ClickHouse/ClickHouse/issues/33032).
[#33026](https://github.com/ClickHouse/ClickHouse/pull/33026) ([alexey-milovidov](https://github.com/alexey-milovidov)). - #### Bug Fix (user-visible misbehavior in official stable or prestable release) * Several fixes for format parsing. This is relevant if `clickhouse-server` is open for write access to an adversary. Specifically crafted input data for the `Native` format may lead to reading uninitialized memory or a crash. [#33050](https://github.com/ClickHouse/ClickHouse/pull/33050) ([Heena Bansal](https://github.com/HeenaBansal2009)). Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. [#33022](https://github.com/ClickHouse/ClickHouse/pull/33022) ([Harry Lee](https://github.com/HarryLeeIBM)). Fix null pointer dereference when deserializing `LowCardinality` data in the Native format. [#33021](https://github.com/ClickHouse/ClickHouse/pull/33021) ([Harry Lee](https://github.com/HarryLeeIBM)). @@ -1485,5 +1497,4 @@ * Fix possible crash (or incorrect result) in case of `LowCardinality` arguments of window function. Fixes [#31114](https://github.com/ClickHouse/ClickHouse/issues/31114). [#31888](https://github.com/ClickHouse/ClickHouse/pull/31888) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Fix hang up with command `DROP TABLE system.query_log sync`. [#33293](https://github.com/ClickHouse/ClickHouse/pull/33293) ([zhanghuajie](https://github.com/zhanghuajieHIT)). - ## [Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021) diff --git a/base/base/safeExit.cpp b/base/base/safeExit.cpp index e4f9e80759e..ddb93dac65b 100644 --- a/base/base/safeExit.cpp +++ b/base/base/safeExit.cpp @@ -1,10 +1,8 @@ #if defined(OS_LINUX) # include #endif -#include #include #include -#include [[noreturn]] void safeExit(int code) { diff --git a/cmake/ld.lld.in b/cmake/ld.lld.in index 9736dab1bc3..78a264a0089 100755 --- a/cmake/ld.lld.in +++ b/cmake/ld.lld.in @@ -3,15 +3,15 @@ # This is a workaround for bug in llvm/clang, # that does not produce .debug_aranges with LTO # -# NOTE: this is a temporary solution, that should be removed once [1] will be -# resolved. +# NOTE: this is a temporary solution, that should be removed after upgrading to +# clang-16/llvm-16. # -# [1]: https://discourse.llvm.org/t/clang-does-not-produce-full-debug-aranges-section-with-thinlto/64898/8 +# Refs: https://reviews.llvm.org/D133092 # NOTE: only -flto=thin is supported. # NOTE: it is not possible to check was there -gdwarf-aranges initially or not.
if [[ "$*" =~ -plugin-opt=thinlto ]]; then - exec "@LLD_PATH@" -mllvm -generate-arange-section "$@" + exec "@LLD_PATH@" -plugin-opt=-generate-arange-section "$@" else exec "@LLD_PATH@" "$@" fi diff --git a/cmake/tools.cmake b/cmake/tools.cmake index 57d39899a40..8a17d97cf13 100644 --- a/cmake/tools.cmake +++ b/cmake/tools.cmake @@ -117,7 +117,7 @@ endif() # Archiver if (COMPILER_GCC) - find_program (LLVM_AR_PATH NAMES "llvm-ar" "llvm-ar-14" "llvm-ar-13" "llvm-ar-12") + find_program (LLVM_AR_PATH NAMES "llvm-ar" "llvm-ar-15" "llvm-ar-14" "llvm-ar-13" "llvm-ar-12") else () find_program (LLVM_AR_PATH NAMES "llvm-ar-${COMPILER_VERSION_MAJOR}" "llvm-ar") endif () @@ -131,7 +131,7 @@ message(STATUS "Using archiver: ${CMAKE_AR}") # Ranlib if (COMPILER_GCC) - find_program (LLVM_RANLIB_PATH NAMES "llvm-ranlib" "llvm-ranlib-14" "llvm-ranlib-13" "llvm-ranlib-12") + find_program (LLVM_RANLIB_PATH NAMES "llvm-ranlib" "llvm-ranlib-15" "llvm-ranlib-14" "llvm-ranlib-13" "llvm-ranlib-12") else () find_program (LLVM_RANLIB_PATH NAMES "llvm-ranlib-${COMPILER_VERSION_MAJOR}" "llvm-ranlib") endif () @@ -145,7 +145,7 @@ message(STATUS "Using ranlib: ${CMAKE_RANLIB}") # Install Name Tool if (COMPILER_GCC) - find_program (LLVM_INSTALL_NAME_TOOL_PATH NAMES "llvm-install-name-tool" "llvm-install-name-tool-14" "llvm-install-name-tool-13" "llvm-install-name-tool-12") + find_program (LLVM_INSTALL_NAME_TOOL_PATH NAMES "llvm-install-name-tool" "llvm-install-name-tool-15" "llvm-install-name-tool-14" "llvm-install-name-tool-13" "llvm-install-name-tool-12") else () find_program (LLVM_INSTALL_NAME_TOOL_PATH NAMES "llvm-install-name-tool-${COMPILER_VERSION_MAJOR}" "llvm-install-name-tool") endif () @@ -159,7 +159,7 @@ message(STATUS "Using install-name-tool: ${CMAKE_INSTALL_NAME_TOOL}") # Objcopy if (COMPILER_GCC) - find_program (OBJCOPY_PATH NAMES "llvm-objcopy" "llvm-objcopy-14" "llvm-objcopy-13" "llvm-objcopy-12" "objcopy") + find_program (OBJCOPY_PATH NAMES "llvm-objcopy" "llvm-objcopy-15" "llvm-objcopy-14" "llvm-objcopy-13" "llvm-objcopy-12" "objcopy") else () find_program (OBJCOPY_PATH NAMES "llvm-objcopy-${COMPILER_VERSION_MAJOR}" "llvm-objcopy" "objcopy") endif () @@ -173,7 +173,7 @@ endif () # Strip if (COMPILER_GCC) - find_program (STRIP_PATH NAMES "llvm-strip" "llvm-strip-14" "llvm-strip-13" "llvm-strip-12" "strip") + find_program (STRIP_PATH NAMES "llvm-strip" "llvm-strip-15" "llvm-strip-14" "llvm-strip-13" "llvm-strip-12" "strip") else () find_program (STRIP_PATH NAMES "llvm-strip-${COMPILER_VERSION_MAJOR}" "llvm-strip" "strip") endif () diff --git a/docker/test/stress/run.sh b/docker/test/stress/run.sh old mode 100755 new mode 100644 index 6b9954c2431..27c96acbae1 --- a/docker/test/stress/run.sh +++ b/docker/test/stress/run.sh @@ -47,7 +47,6 @@ function install_packages() function configure() { - export ZOOKEEPER_FAULT_INJECTION=1 # install test configs export USE_DATABASE_ORDINARY=1 export EXPORT_S3_STORAGE_POLICIES=1 @@ -203,6 +202,7 @@ quit install_packages package_folder +export ZOOKEEPER_FAULT_INJECTION=1 configure azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --debug /azurite_log & @@ -243,6 +243,7 @@ stop # Let's enable S3 storage by default export USE_S3_STORAGE_FOR_MERGE_TREE=1 +export ZOOKEEPER_FAULT_INJECTION=1 configure # But we still need default disk because some tables loaded only into it @@ -375,6 +376,8 @@ else install_packages previous_release_package_folder # Start server from previous release + # Previous version may not be ready for fault injections + export ZOOKEEPER_FAULT_INJECTION=0 
configure # Avoid "Setting s3_check_objects_after_upload is neither a builtin setting..." @@ -389,12 +392,23 @@ else clickhouse-client --query="SELECT 'Server version: ', version()" - # Install new package before running stress test because we should use new clickhouse-client and new clickhouse-test - # But we should leave old binary in /usr/bin/ for gdb (so it will print sane stacktarces) + # Install new package before running stress test because we should use new + # clickhouse-client and new clickhouse-test. + # + # But we should leave old binary in /usr/bin/ and debug symbols in + # /usr/lib/debug/usr/bin (if any) for gdb and internal DWARF parser, so it + # will print sane stacktraces and also to avoid possible crashes. + # + # FIXME: those files can be extracted directly from debian package, but + # actually better solution will be to use different PATH instead of playing + # games with files from packages. mv /usr/bin/clickhouse previous_release_package_folder/ + mv /usr/lib/debug/usr/bin/clickhouse.debug previous_release_package_folder/ install_packages package_folder mv /usr/bin/clickhouse package_folder/ + mv /usr/lib/debug/usr/bin/clickhouse.debug package_folder/ mv previous_release_package_folder/clickhouse /usr/bin/ + mv previous_release_package_folder/clickhouse.debug /usr/lib/debug/usr/bin/clickhouse.debug mkdir tmp_stress_output @@ -410,6 +424,8 @@ else # Start new server mv package_folder/clickhouse /usr/bin/ + mv package_folder/clickhouse.debug /usr/lib/debug/usr/bin/clickhouse.debug + export ZOOKEEPER_FAULT_INJECTION=1 configure start 500 clickhouse-client --query "SELECT 'Backward compatibility check: Server successfully started', 'OK'" >> /test_output/test_results.tsv \ diff --git a/docs/en/development/architecture.md b/docs/en/development/architecture.md index c13b2519b84..fe644c43889 100644 --- a/docs/en/development/architecture.md +++ b/docs/en/development/architecture.md @@ -49,27 +49,13 @@ When we calculate some function over columns in a block, we add another column w Blocks are created for every processed chunk of data. Note that for the same type of calculation, the column names and types remain the same for different blocks, and only column data changes. It is better to split block data from the block header because small block sizes have a high overhead of temporary strings for copying shared_ptrs and column names. -## Block Streams {#block-streams} +## Processors -Block streams are for processing data. We use streams of blocks to read data from somewhere, perform data transformations, or write data to somewhere. `IBlockInputStream` has the `read` method to fetch the next block while available. `IBlockOutputStream` has the `write` method to push the block somewhere. - -Streams are responsible for: - -1. Reading or writing to a table. The table just returns a stream for reading or writing blocks. -2. Implementing data formats. For example, if you want to output data to a terminal in `Pretty` format, you create a block output stream where you push blocks, and it formats them. -3. Performing data transformations. Let’s say you have `IBlockInputStream` and want to create a filtered stream. You create `FilterBlockInputStream` and initialize it with your stream. Then when you pull a block from `FilterBlockInputStream`, it pulls a block from your stream, filters it, and returns the filtered block to you. Query execution pipelines are represented this way. - -There are more sophisticated transformations. 
For example, when you pull from `AggregatingBlockInputStream`, it reads all data from its source, aggregates it, and then returns a stream of aggregated data for you. Another example: `UnionBlockInputStream` accepts many input sources in the constructor and also a number of threads. It launches multiple threads and reads from multiple sources in parallel. - -> Block streams use the “pull” approach to control flow: when you pull a block from the first stream, it consequently pulls the required blocks from nested streams, and the entire execution pipeline will work. Neither “pull” nor “push” is the best solution, because control flow is implicit, and that limits the implementation of various features like simultaneous execution of multiple queries (merging many pipelines together). This limitation could be overcome with coroutines or just running extra threads that wait for each other. We may have more possibilities if we make control flow explicit: if we locate the logic for passing data from one calculation unit to another outside of those calculation units. Read this [article](http://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/) for more thoughts. - -We should note that the query execution pipeline creates temporary data at each step. We try to keep block size small enough so that temporary data fits in the CPU cache. With that assumption, writing and reading temporary data is almost free in comparison with other calculations. We could consider an alternative, which is to fuse many operations in the pipeline together. It could make the pipeline as short as possible and remove much of the temporary data, which could be an advantage, but it also has drawbacks. For example, a split pipeline makes it easy to implement caching intermediate data, stealing intermediate data from similar queries running at the same time, and merging pipelines for similar queries. +See the description at [https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h). ## Formats {#formats} -Data formats are implemented with block streams. There are “presentational” formats only suitable for the output of data to the client, such as `Pretty` format, which provides only `IBlockOutputStream`. And there are input/output formats, such as `TabSeparated` or `JSONEachRow`. - -There are also row streams: `IRowInputStream` and `IRowOutputStream`. They allow you to pull/push data by individual rows, not by blocks. And they are only needed to simplify the implementation of row-oriented formats. The wrappers `BlockInputStreamFromRowInputStream` and `BlockOutputStreamFromRowOutputStream` allow you to convert row-oriented streams to regular block-oriented streams. +Data formats are implemented with processors. ## I/O {#io} diff --git a/docs/en/engines/table-engines/mergetree-family/mergetree.md b/docs/en/engines/table-engines/mergetree-family/mergetree.md index 9dc7e300d45..486baac2310 100644 --- a/docs/en/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md @@ -419,6 +419,8 @@ Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `St For `Map` data type client can specify if index should be created for keys or values using [mapKeys](../../../sql-reference/functions/tuple-map-functions.md#mapkeys) or [mapValues](../../../sql-reference/functions/tuple-map-functions.md#mapvalues) function. 
+There are also special-purpose and experimental indexes to support approximate nearest neighbor (ANN) queries. See [here](annindexes.md) for details. + The following functions can use the filter: [equals](../../../sql-reference/functions/comparison-functions.md), [notEquals](../../../sql-reference/functions/comparison-functions.md), [in](../../../sql-reference/functions/in-functions), [notIn](../../../sql-reference/functions/in-functions), [has](../../../sql-reference/functions/array-functions#hasarr-elem), [hasAny](../../../sql-reference/functions/array-functions#hasany), [hasAll](../../../sql-reference/functions/array-functions#hasall). Example of index creation for `Map` data type diff --git a/docs/en/sql-reference/functions/date-time-functions.md b/docs/en/sql-reference/functions/date-time-functions.md index 76f66db924f..15fc9ef0c89 100644 --- a/docs/en/sql-reference/functions/date-time-functions.md +++ b/docs/en/sql-reference/functions/date-time-functions.md @@ -271,11 +271,7 @@ Result: The return type of `toStartOf*`, `toLastDayOfMonth`, `toMonday`, `timeSlot` functions described below is determined by the configuration parameter [enable_extended_results_for_datetime_functions](../../operations/settings/settings#enable-extended-results-for-datetime-functions) which is `0` by default. Behavior for -* `enable_extended_results_for_datetime_functions = 0`: Functions `toStartOfYear`, `toStartOfISOYear`, `toStartOfQuarter`, `toStartOfMonth`, `toStartOfWeek`, `toLastDayOfMonth`, `toMonday` return `Date` or `DateTime`. Functions `toStartOfDay`, `toStartOfHour`, `toStartOfFifteenMinutes`, `toStartOfTenMinutes`, `toStartOfFiveMinutes`, `toStartOfMinute`, `timeSlot` return `DateTime`. Though these functions can take values of the extended types `Date32` and `DateTime64` as an argument, passing them a time outside the normal range (year 1970 to 2149 for `Date` / 2106 for `DateTime`) will produce wrong results. In case argument is out of normal range: - * If the argument is smaller than 1970, the result will be calculated from the argument `1970-01-01 (00:00:00)` instead. - * If the return type is `DateTime` and the argument is larger than `2106-02-07 08:28:15`, the result will be calculated from the argument `2106-02-07 08:28:15` instead. - * If the return type is `Date` and the argument is larger than `2149-06-06`, the result will be calculated from the argument `2149-06-06` instead. - * If `toLastDayOfMonth` is called with an argument greater then `2149-05-31`, the result will be calculated from the argument `2149-05-31` instead. +* `enable_extended_results_for_datetime_functions = 0`: Functions `toStartOfYear`, `toStartOfISOYear`, `toStartOfQuarter`, `toStartOfMonth`, `toStartOfWeek`, `toLastDayOfMonth`, `toMonday` return `Date` or `DateTime`. Functions `toStartOfDay`, `toStartOfHour`, `toStartOfFifteenMinutes`, `toStartOfTenMinutes`, `toStartOfFiveMinutes`, `toStartOfMinute`, `timeSlot` return `DateTime`. Though these functions can take values of the extended types `Date32` and `DateTime64` as an argument, passing them a time outside the normal range (year 1970 to 2149 for `Date` / 2106 for `DateTime`) will produce wrong results. * `enable_extended_results_for_datetime_functions = 1`: * Functions `toStartOfYear`, `toStartOfISOYear`, `toStartOfQuarter`, `toStartOfMonth`, `toStartOfWeek`, `toLastDayOfMonth`, `toMonday` return `Date` or `DateTime` if their argument is a `Date` or `DateTime`, and they return `Date32` or `DateTime64` if their argument is a `Date32` or `DateTime64`. 
* Functions `toStartOfDay`, `toStartOfHour`, `toStartOfFifteenMinutes`, `toStartOfTenMinutes`, `toStartOfFiveMinutes`, `toStartOfMinute`, `timeSlot` return `DateTime` if their argument is a `Date` or `DateTime`, and they return `DateTime64` if their argument is a `Date32` or `DateTime64`. @@ -302,25 +298,22 @@ Returns the date. Rounds down a date or date with time to the first day of the month. Returns the date. -## toLastDayOfMonth - -Rounds up a date or date with time to the last day of the month. -Returns the date. +:::note +The behavior of parsing incorrect dates is implementation-specific. ClickHouse may return a zero date, throw an exception, or do “natural” overflow. +::: If `toLastDayOfMonth` is called with an argument of type `Date` greater than 2149-05-31, the result will be calculated from the argument 2149-05-31 instead. ## toMonday Rounds down a date or date with time to the nearest Monday. -As a special case, date arguments `1970-01-01`, `1970-01-02`, `1970-01-03` and `1970-01-04` return date `1970-01-01`. Returns the date. ## toStartOfWeek(t\[,mode\]) Rounds down a date or date with time to the nearest Sunday or Monday by mode. Returns the date. -As a special case, date arguments `1970-01-01`, `1970-01-02`, `1970-01-03` and `1970-01-04` (and `1970-01-05` if `mode` is `1`) return date `1970-01-01`. -The `mode` argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used. +The mode argument works exactly like the mode argument to toWeek(). For the single-argument syntax, a mode value of 0 is used. ## toStartOfDay @@ -671,9 +664,9 @@ Aliases: `dateDiff`, `DATE_DIFF`. - `quarter` - `year` -- `startdate` — The first time value to subtract (the subtrahend). [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md). +- `startdate` — The first time value to subtract (the subtrahend). [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md). -- `enddate` — The second time value to subtract from (the minuend). [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md). +- `enddate` — The second time value to subtract from (the minuend). [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md). - `timezone` — [Timezone name](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) (optional). If specified, it is applied to both `startdate` and `enddate`. If not specified, timezones of `startdate` and `enddate` are used. If they are not the same, the result is unspecified. [String](../../sql-reference/data-types/string.md). @@ -1075,7 +1068,7 @@ Example: SELECT timeSlots(toDateTime('2012-01-01 12:20:00'), toUInt32(600)); SELECT timeSlots(toDateTime('1980-12-12 21:01:02', 'UTC'), toUInt32(600), 299); SELECT timeSlots(toDateTime64('1980-12-12 21:01:02.1234', 4, 'UTC'), toDecimal64(600.1, 1), toDecimal64(299, 0)); -``` +``` ``` text ┌─timeSlots(toDateTime('2012-01-01 12:20:00'), toUInt32(600))─┐ │ ['2012-01-01 12:00:00','2012-01-01 12:30:00'] │ @@ -1163,7 +1156,7 @@ dateName(date_part, date) **Arguments** - `date_part` — Date part.
Possible values: 'year', 'quarter', 'month', 'week', 'dayofyear', 'day', 'weekday', 'hour', 'minute', 'second'. [String](../../sql-reference/data-types/string.md). -- `date` — Date. [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md). +- `date` — Date. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md). - `timezone` — Timezone. Optional. [String](../../sql-reference/data-types/string.md). **Returned value** diff --git a/docs/en/sql-reference/functions/other-functions.md b/docs/en/sql-reference/functions/other-functions.md index b80d75e3611..6490d4c2272 100644 --- a/docs/en/sql-reference/functions/other-functions.md +++ b/docs/en/sql-reference/functions/other-functions.md @@ -571,7 +571,7 @@ Example: ``` sql SELECT - transform(domain(Referer), ['yandex.ru', 'google.ru', 'vk.com'], ['www.yandex', 'example.com']) AS s, + transform(domain(Referer), ['yandex.ru', 'google.ru', 'vkontakte.ru'], ['www.yandex', 'example.com', 'vk.com']) AS s, count() AS c FROM test.hits GROUP BY domain(Referer) diff --git a/docs/en/sql-reference/statements/misc.md b/docs/en/sql-reference/statements/misc.md deleted file mode 100644 index d812dd2008a..00000000000 --- a/docs/en/sql-reference/statements/misc.md +++ /dev/null @@ -1,21 +0,0 @@ ---- -slug: /en/sql-reference/statements/misc -toc_hidden: true -sidebar_position: 70 ---- - -# Miscellaneous Statements - -- [ATTACH](../../sql-reference/statements/attach.md) -- [CHECK TABLE](../../sql-reference/statements/check-table.md) -- [DESCRIBE TABLE](../../sql-reference/statements/describe-table.md) -- [DETACH](../../sql-reference/statements/detach.md) -- [DROP](../../sql-reference/statements/drop.md) -- [EXISTS](../../sql-reference/statements/exists.md) -- [KILL](../../sql-reference/statements/kill.md) -- [OPTIMIZE](../../sql-reference/statements/optimize.md) -- [RENAME](../../sql-reference/statements/rename.md) -- [SET](../../sql-reference/statements/set.md) -- [SET ROLE](../../sql-reference/statements/set-role.md) -- [TRUNCATE](../../sql-reference/statements/truncate.md) -- [USE](../../sql-reference/statements/use.md) diff --git a/docs/ru/sql-reference/data-types/date.md b/docs/ru/sql-reference/data-types/date.md index 7254b82f461..185fe28d567 100644 --- a/docs/ru/sql-reference/data-types/date.md +++ b/docs/ru/sql-reference/data-types/date.md @@ -6,7 +6,7 @@ sidebar_label: Date # Date {#data-type-date} -Дата. Хранится в двух байтах в виде (беззнакового) числа дней, прошедших от 1970-01-01. Позволяет хранить значения от чуть больше, чем начала unix-эпохи до верхнего порога, определяющегося константой на этапе компиляции (сейчас - до 2149 года, последний полностью поддерживаемый год - 2148). +Дата. Хранится в двух байтах в виде (беззнакового) числа дней, прошедших от 1970-01-01. Позволяет хранить значения от чуть больше, чем начала unix-эпохи до верхнего порога, определяющегося константой на этапе компиляции (сейчас - до 2106 года, последний полностью поддерживаемый год - 2105). Диапазон значений: \[1970-01-01, 2149-06-06\]. 
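To make the stated range concrete, a small hedged sketch (it follows the `[1970-01-01, 2149-06-06]` range given above; the exact behavior at the boundaries depends on the server version and the compile-time constant mentioned in the text):

``` sql
SELECT
    toDate('1970-01-01') AS min_date,              -- lower bound of Date
    toDate('2149-06-06') AS max_date,              -- upper bound of Date
    toTypeName(toDate('1970-01-01')) AS date_type  -- returns 'Date', a 2-byte day count
```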
diff --git a/docs/ru/sql-reference/functions/date-time-functions.md b/docs/ru/sql-reference/functions/date-time-functions.md index a7d2ce49fae..80e2561a8d7 100644 --- a/docs/ru/sql-reference/functions/date-time-functions.md +++ b/docs/ru/sql-reference/functions/date-time-functions.md @@ -272,15 +272,9 @@ SELECT toUnixTimestamp('2017-11-05 08:07:47', 'Asia/Tokyo') AS unix_timestamp; Поведение для * `enable_extended_results_for_datetime_functions = 0`: Функции `toStartOf*`, `toLastDayOfMonth`, `toMonday` возвращают `Date` или `DateTime`. Функции `toStartOfDay`, `toStartOfHour`, `toStartOfFifteenMinutes`, `toStartOfTenMinutes`, `toStartOfFiveMinutes`, `toStartOfMinute`, `timeSlot` возвращают `DateTime`. Хотя эти функции могут принимать значения типа `Date32` или `DateTime64` в качестве аргумента, при обработке аргумента вне нормального диапазона значений (`1970` - `2148` для `Date` и `1970-01-01 00:00:00`-`2106-02-07 08:28:15` для `DateTime`) будет получен некорректный результат. -В случае если значение аргумента вне нормального диапазона: - * `1970-01-01 (00:00:00)` будет возвращён для моментов времени до 1970 года, - * `2106-02-07 08:28:15` будет взят в качестве аргумента, если полученный аргумент превосходит данное значение и возвращаемый тип - `DateTime`, - * `2149-06-06` будет взят в качестве аргумента, если полученный аргумент превосходит данное значение и возвращаемый тип - `Date`, - * `2149-05-31` будет результатом функции `toLastDayOfMonth` при обработке аргумента больше `2149-05-31`. * `enable_extended_results_for_datetime_functions = 1`: * Функции `toStartOfYear`, `toStartOfISOYear`, `toStartOfQuarter`, `toStartOfMonth`, `toStartOfWeek`, `toLastDayOfMonth`, `toMonday` возвращают `Date` или `DateTime` если их аргумент `Date` или `DateTime` и они возвращают `Date32` или `DateTime64` если их аргумент `Date32` или `DateTime64`. * Функции `toStartOfDay`, `toStartOfHour`, `toStartOfFifteenMinutes`, `toStartOfTenMinutes`, `toStartOfFiveMinutes`, `toStartOfMinute`, `timeSlot` возвращают `DateTime` если их аргумент `Date` или `DateTime` и они возвращают `DateTime64` если их аргумент `Date32` или `DateTime64`. - ::: ## toStartOfYear {#tostartofyear} @@ -321,20 +315,20 @@ SELECT toStartOfISOYear(toDate('2017-01-01')) AS ISOYear20170101; Округляет дату или дату-с-временем до последнего числа месяца. Возвращается дата. -Если `toLastDayOfMonth` вызывается с аргументом типа `Date` большим чем 2149-05-31, то результат будет вычислен от аргумента 2149-05-31. - +:::note "Attention" + Возвращаемое значение для некорректных дат зависит от реализации. ClickHouse может вернуть нулевую дату, выбросить исключение, или выполнить «естественное» перетекание дат между месяцами. +::: + ## toMonday {#tomonday} Округляет дату или дату-с-временем вниз до ближайшего понедельника. -Частный случай: для дат `1970-01-01`, `1970-01-02`, `1970-01-03` и `1970-01-04` результатом будет `1970-01-01`. Возвращается дата. ## toStartOfWeek(t[,mode]) {#tostartofweek} Округляет дату или дату со временем до ближайшего воскресенья или понедельника в соответствии с mode. Возвращается дата. -Частный случай: для дат `1970-01-01`, `1970-01-02`, `1970-01-03` и `1970-01-04` (и `1970-01-05`, если `mode` равен `1`) результатом будет `1970-01-01`. -Аргумент `mode` работает точно так же, как аргумент mode [toWeek()](#toweek). Если аргумент mode опущен, то используется режим 0. +Аргумент mode работает точно так же, как аргумент mode [toWeek()](#toweek). Если аргумент mode опущен, то используется режим 0. 
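A short illustration of the rounding described above (the date is an arbitrary example; 2022-09-22 is a Thursday):

``` sql
SELECT
    toMonday(toDate('2022-09-22'))         AS monday,       -- 2022-09-19
    toStartOfWeek(toDate('2022-09-22'))    AS week_mode_0,  -- mode 0: rounds down to Sunday
    toStartOfWeek(toDate('2022-09-22'), 1) AS week_mode_1   -- mode 1: rounds down to Monday, same as toMonday
```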
## toStartOfDay {#tostartofday} @@ -721,9 +715,9 @@ date_diff('unit', startdate, enddate, [timezone]) - `quarter` - `year` -- `startdate` — первая дата или дата со временем, которая вычитается из `enddate`. [Date](../../sql-reference/data-types/date.md) или [DateTime](../../sql-reference/data-types/datetime.md). +- `startdate` — первая дата или дата со временем, которая вычитается из `enddate`. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) или [DateTime64](../../sql-reference/data-types/datetime64.md). -- `enddate` — вторая дата или дата со временем, из которой вычитается `startdate`. [Date](../../sql-reference/data-types/date.md) или [DateTime](../../sql-reference/data-types/datetime.md). +- `enddate` — вторая дата или дата со временем, из которой вычитается `startdate`. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) или [DateTime64](../../sql-reference/data-types/datetime64.md). - `timezone` — [часовой пояс](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) (необязательно). Если этот аргумент указан, то он применяется как для `startdate`, так и для `enddate`. Если этот аргумент не указан, то используются часовые пояса аргументов `startdate` и `enddate`. Если часовые пояса аргументов `startdate` и `enddate` не совпадают, то результат не определен. [String](../../sql-reference/data-types/string.md). @@ -975,8 +969,7 @@ SELECT now('Europe/Moscow'); ## timeSlots(StartTime, Duration,\[, Size\]) {#timeslotsstarttime-duration-size} Для интервала, начинающегося в `StartTime` и длящегося `Duration` секунд, возвращает массив моментов времени, кратных `Size`. Параметр `Size` указывать необязательно, по умолчанию он равен 1800 секундам (30 минутам). -Данная функция может использоваться, например, для анализа количества просмотров страницы за соответствующую сессию. -Аргумент `StartTime` может иметь тип `DateTime` или `DateTime64`. В случае, если используется `DateTime`, аргументы `Duration` и `Size` должны иметь тип `UInt32`; Для DateTime64 они должны быть типа `Decimal64`. + Возвращает массив DateTime/DateTime64 (тип будет совпадать с типом параметра ’StartTime’). Для DateTime64 масштаб (scale) возвращаемой величины может отличаться от масштаба аргумента ’StartTime’ --- результат будет иметь наибольший масштаб среди всех данных аргументов. Пример использования: @@ -1085,7 +1078,7 @@ dateName(date_part, date) **Аргументы** - `date_part` — часть даты. Возможные значения: 'year', 'quarter', 'month', 'week', 'dayofyear', 'day', 'weekday', 'hour', 'minute', 'second'. [String](../../sql-reference/data-types/string.md). -- `date` — дата. [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md) или [DateTime64](../../sql-reference/data-types/datetime64.md). +- `date` — дата. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) или [DateTime64](../../sql-reference/data-types/datetime64.md). - `timezone` — часовой пояс. Необязательный аргумент. [String](../../sql-reference/data-types/string.md).
**Возвращаемое значение** diff --git a/docs/ru/sql-reference/functions/other-functions.md b/docs/ru/sql-reference/functions/other-functions.md index 5c8584cd2a0..af21ccd6bed 100644 --- a/docs/ru/sql-reference/functions/other-functions.md +++ b/docs/ru/sql-reference/functions/other-functions.md @@ -568,7 +568,7 @@ ORDER BY c DESC ``` sql SELECT - transform(domain(Referer), ['yandex.ru', 'google.ru', 'vk.com'], ['www.yandex', 'example.com']) AS s, + transform(domain(Referer), ['yandex.ru', 'google.ru', 'vkontakte.ru'], ['www.yandex', 'example.com', 'vk.com']) AS s, count() AS c FROM test.hits GROUP BY domain(Referer) diff --git a/docs/ru/sql-reference/statements/misc.md b/docs/ru/sql-reference/statements/misc.md deleted file mode 100644 index 437215f20ce..00000000000 --- a/docs/ru/sql-reference/statements/misc.md +++ /dev/null @@ -1,21 +0,0 @@ ---- -slug: /ru/sql-reference/statements/misc -sidebar_position: 41 ---- - -# Прочие виды запросов {#prochie-vidy-zaprosov} - -- [ATTACH](../../sql-reference/statements/attach.md) -- [CHECK TABLE](../../sql-reference/statements/check-table.md) -- [DESCRIBE TABLE](../../sql-reference/statements/describe-table.md) -- [DETACH](../../sql-reference/statements/detach.md) -- [DROP](../../sql-reference/statements/drop.md) -- [EXISTS](../../sql-reference/statements/exists.md) -- [KILL](../../sql-reference/statements/kill.md) -- [OPTIMIZE](../../sql-reference/statements/optimize.md) -- [RENAME](../../sql-reference/statements/rename.md) -- [SET](../../sql-reference/statements/set.md) -- [SET ROLE](../../sql-reference/statements/set-role.md) -- [TRUNCATE](../../sql-reference/statements/truncate.md) -- [USE](../../sql-reference/statements/use.md) - diff --git a/docs/zh/sql-reference/data-types/date.md b/docs/zh/sql-reference/data-types/date.md index 9b1acdbe939..a8874151e75 100644 --- a/docs/zh/sql-reference/data-types/date.md +++ b/docs/zh/sql-reference/data-types/date.md @@ -3,7 +3,7 @@ slug: /zh/sql-reference/data-types/date --- # 日期 {#date} -日期类型,用两个字节存储,表示从 1970-01-01 (无符号) 到当前的日期值。允许存储从 Unix 纪元开始到编译阶段定义的上限阈值常量(目前上限是2149年,但最终完全支持的年份为2148)。最小值输出为1970-01-01。 +日期类型,用两个字节存储,表示从 1970-01-01 (无符号) 到当前的日期值。允许存储从 Unix 纪元开始到编译阶段定义的上限阈值常量(目前上限是2106年,但最终完全支持的年份为2105)。最小值输出为1970-01-01。 值的范围: \[1970-01-01, 2149-06-06\]。 diff --git a/docs/zh/sql-reference/functions/other-functions.md b/docs/zh/sql-reference/functions/other-functions.md index a475420ba64..62d2a377ff1 100644 --- a/docs/zh/sql-reference/functions/other-functions.md +++ b/docs/zh/sql-reference/functions/other-functions.md @@ -237,7 +237,7 @@ ORDER BY c DESC ``` sql SELECT - transform(domain(Referer), ['yandex.ru', 'google.ru', 'vk.com'], ['www.yandex', 'example.com']) AS s, + transform(domain(Referer), ['yandex.ru', 'google.ru', 'vkontakte.ru'], ['www.yandex', 'example.com', 'vk.com']) AS s, count() AS c FROM test.hits GROUP BY domain(Referer) diff --git a/programs/client/clickhouse-client.xml b/programs/client/clickhouse-client.xml index 66e7afd8f8c..00f5b26eddf 100644 --- a/programs/client/clickhouse-client.xml +++ b/programs/client/clickhouse-client.xml @@ -19,7 +19,6 @@ {host} {port} {user} - {database} {display_name} Terminal colors: https://misc.flogisoft.com/bash/tip_colors_and_formatting See also: https://wiki.hackzine.org/development/misc/readline-color-prompt.html diff --git a/programs/disks/DisksApp.cpp b/programs/disks/DisksApp.cpp index 749ccb3e503..91472a8df33 100644 --- a/programs/disks/DisksApp.cpp +++ b/programs/disks/DisksApp.cpp @@ -57,7 +57,7 @@ void DisksApp::addOptions( ("config-file,C", 
po::value(), "Set config file") ("disk", po::value(), "Set disk name") ("command_name", po::value(), "Name for command to do") - ("send-logs", "Send logs") + ("save-logs", "Save logs to a file") ("log-level", po::value(), "Logging level") ; @@ -82,10 +82,10 @@ void DisksApp::processOptions() config().setString("config-file", options["config-file"].as()); if (options.count("disk")) config().setString("disk", options["disk"].as()); - if (options.count("send-logs")) - config().setBool("send-logs", true); + if (options.count("save-logs")) + config().setBool("save-logs", true); if (options.count("log-level")) - Poco::Logger::root().setLevel(options["log-level"].as()); + config().setString("log-level", options["log-level"].as()); } void DisksApp::init(std::vector & common_arguments) @@ -149,15 +149,6 @@ void DisksApp::parseAndCheckOptions( int DisksApp::main(const std::vector & /*args*/) { - if (config().has("send-logs")) - { - auto log_level = config().getString("log-level", "trace"); - Poco::Logger::root().setLevel(Poco::Logger::parseLevel(log_level)); - - auto log_path = config().getString("logger.clickhouse-disks", "/var/log/clickhouse-server/clickhouse-disks.log"); - Poco::Logger::root().setChannel(Poco::AutoPtr(new Poco::FileChannel(log_path))); - } - if (config().has("config-file") || fs::exists(getDefaultConfigFileName())) { String config_path = config().getString("config-file", getDefaultConfigFileName()); @@ -171,6 +162,20 @@ int DisksApp::main(const std::vector & /*args*/) throw Exception(ErrorCodes::BAD_ARGUMENTS, "No config-file specifiged"); } + if (config().has("save-logs")) + { + auto log_level = config().getString("log-level", "trace"); + Poco::Logger::root().setLevel(Poco::Logger::parseLevel(log_level)); + + auto log_path = config().getString("logger.clickhouse-disks", "/var/log/clickhouse-server/clickhouse-disks.log"); + Poco::Logger::root().setChannel(Poco::AutoPtr(new Poco::FileChannel(log_path))); + } + else + { + auto log_level = config().getString("log-level", "none"); + Poco::Logger::root().setLevel(Poco::Logger::parseLevel(log_level)); + } + registerDisks(); registerFormats(); diff --git a/programs/keeper/CMakeLists.txt b/programs/keeper/CMakeLists.txt index ce176ccade5..9266a4ca419 100644 --- a/programs/keeper/CMakeLists.txt +++ b/programs/keeper/CMakeLists.txt @@ -45,6 +45,7 @@ if (BUILD_STANDALONE_KEEPER) ${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperLogStore.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperServer.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperSnapshotManager.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperSnapshotManagerS3.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStateMachine.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStateManager.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStorage.cpp diff --git a/src/AggregateFunctions/AggregateFunctionQuantile.cpp b/src/AggregateFunctions/AggregateFunctionQuantile.cpp index 38b3c91be69..60e759b45a3 100644 --- a/src/AggregateFunctions/AggregateFunctionQuantile.cpp +++ b/src/AggregateFunctions/AggregateFunctionQuantile.cpp @@ -46,7 +46,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile( if (which.idx == TypeIndex::DateTime64) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int128) return std::make_shared>(argument_types, params); - if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params); + if (which.idx == TypeIndex::UInt128) return 
std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int256) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::UInt256) return std::make_shared>(argument_types, params); diff --git a/src/AggregateFunctions/AggregateFunctionQuantileDeterministic.cpp b/src/AggregateFunctions/AggregateFunctionQuantileDeterministic.cpp index a9486da25fa..1605056e5d9 100644 --- a/src/AggregateFunctions/AggregateFunctionQuantileDeterministic.cpp +++ b/src/AggregateFunctions/AggregateFunctionQuantileDeterministic.cpp @@ -40,7 +40,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile( if (which.idx == TypeIndex::DateTime) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int128) return std::make_shared>(argument_types, params); - if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params); + if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int256) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::UInt256) return std::make_shared>(argument_types, params); diff --git a/src/AggregateFunctions/AggregateFunctionQuantileExact.cpp b/src/AggregateFunctions/AggregateFunctionQuantileExact.cpp index 39de9d0eeaf..e9a3edf1e05 100644 --- a/src/AggregateFunctions/AggregateFunctionQuantileExact.cpp +++ b/src/AggregateFunctions/AggregateFunctionQuantileExact.cpp @@ -47,7 +47,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile( if (which.idx == TypeIndex::DateTime64) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int128) return std::make_shared>(argument_types, params); - if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params); + if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int256) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::UInt256) return std::make_shared>(argument_types, params); diff --git a/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp b/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp index 63e4d3df24b..e9b6012dcdb 100644 --- a/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp +++ b/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp @@ -46,7 +46,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile( if (which.idx == TypeIndex::DateTime64) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int128) return std::make_shared>(argument_types, params); - if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params); + if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::Int256) return std::make_shared>(argument_types, params); if (which.idx == TypeIndex::UInt256) return std::make_shared>(argument_types, params); diff --git a/src/AggregateFunctions/AggregateFunctionWelchTTest.cpp b/src/AggregateFunctions/AggregateFunctionWelchTTest.cpp index 74000296a2d..3a72e0e92bb 100644 --- a/src/AggregateFunctions/AggregateFunctionWelchTTest.cpp +++ b/src/AggregateFunctions/AggregateFunctionWelchTTest.cpp @@ -40,7 +40,15 @@ struct WelchTTestData : public TTestMoments Float64 denominator_x = sx2 * sx2 / (nx * nx * (nx - 1)); Float64 denominator_y = sy2 * sy2 / (ny * ny * (ny - 1)); - return numerator / (denominator_x + denominator_y); + auto result = numerator / (denominator_x + denominator_y); + + if 
(result <= 0 || std::isinf(result) || isNaN(result)) + throw Exception( + ErrorCodes::BAD_ARGUMENTS, + "Cannot calculate p_value, because the t-distribution \ + has inappropriate value of degrees of freedom (={}). It should be > 0", result); + + return result; } std::tuple getResult() const diff --git a/src/AggregateFunctions/IAggregateFunction.cpp b/src/AggregateFunctions/IAggregateFunction.cpp index 25d2a9a4530..7da341cc5b9 100644 --- a/src/AggregateFunctions/IAggregateFunction.cpp +++ b/src/AggregateFunctions/IAggregateFunction.cpp @@ -53,9 +53,12 @@ String IAggregateFunction::getDescription() const bool IAggregateFunction::haveEqualArgumentTypes(const IAggregateFunction & rhs) const { - return std::equal(argument_types.begin(), argument_types.end(), - rhs.argument_types.begin(), rhs.argument_types.end(), - [](const auto & t1, const auto & t2) { return t1->equals(*t2); }); + return std::equal( + argument_types.begin(), + argument_types.end(), + rhs.argument_types.begin(), + rhs.argument_types.end(), + [](const auto & t1, const auto & t2) { return t1->equals(*t2); }); } bool IAggregateFunction::haveSameStateRepresentation(const IAggregateFunction & rhs) const @@ -67,11 +70,7 @@ bool IAggregateFunction::haveSameStateRepresentation(const IAggregateFunction & bool IAggregateFunction::haveSameStateRepresentationImpl(const IAggregateFunction & rhs) const { - bool res = getName() == rhs.getName() - && parameters == rhs.parameters - && haveEqualArgumentTypes(rhs); - assert(res == (getStateType()->getName() == rhs.getStateType()->getName())); - return res; + return getStateType()->equals(*rhs.getStateType()); } } diff --git a/src/Client/ClientBase.cpp b/src/Client/ClientBase.cpp index 0a2fbcf9f46..0db7a9533db 100644 --- a/src/Client/ClientBase.cpp +++ b/src/Client/ClientBase.cpp @@ -1,7 +1,6 @@ #include #include -#include #include #include #include @@ -9,7 +8,6 @@ #include "config.h" #include -#include #include #include #include @@ -32,7 +30,6 @@ #include #include #include -#include #include #include @@ -70,10 +67,10 @@ #include #include #include -#include #include #include + namespace fs = std::filesystem; using namespace std::literals; @@ -1925,7 +1922,7 @@ bool ClientBase::processQueryText(const String & text) String ClientBase::prompt() const { - return boost::replace_all_copy(prompt_by_server_display_name, "{database}", config().getString("database", "default")); + return prompt_by_server_display_name; } diff --git a/src/Client/LocalConnection.cpp b/src/Client/LocalConnection.cpp index 7ac68324915..476386889d2 100644 --- a/src/Client/LocalConnection.cpp +++ b/src/Client/LocalConnection.cpp @@ -6,8 +6,6 @@ #include #include #include -#include -#include namespace DB diff --git a/src/Client/MultiplexedConnections.cpp b/src/Client/MultiplexedConnections.cpp index 72cd4c46477..87eda765a7a 100644 --- a/src/Client/MultiplexedConnections.cpp +++ b/src/Client/MultiplexedConnections.cpp @@ -393,24 +393,38 @@ MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForRead Poco::Net::Socket::SocketList write_list; Poco::Net::Socket::SocketList except_list; - for (const ReplicaState & state : replica_states) - { - Connection * connection = state.connection; - if (connection != nullptr) - read_list.push_back(*connection->socket); - } - auto timeout = is_draining ? 
drain_timeout : receive_timeout; - int n = Poco::Net::Socket::select( - read_list, - write_list, - except_list, - timeout); + int n = 0; + + /// EINTR loop + while (true) + { + read_list.clear(); + for (const ReplicaState & state : replica_states) + { + Connection * connection = state.connection; + if (connection != nullptr) + read_list.push_back(*connection->socket); + } + + /// poco returns 0 on EINTR, let's reset errno to ensure that EINTR came from select(). + errno = 0; + + n = Poco::Net::Socket::select( + read_list, + write_list, + except_list, + timeout); + if (n <= 0 && errno == EINTR) + continue; + break; + } /// We treat any error as timeout for simplicity. /// And we also check if read_list is still empty just in case. if (n <= 0 || read_list.empty()) { + const auto & addresses = dumpAddressesUnlocked(); for (ReplicaState & state : replica_states) { Connection * connection = state.connection; @@ -423,7 +437,7 @@ MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForRead throw Exception(ErrorCodes::TIMEOUT_EXCEEDED, "Timeout ({} ms) exceeded while reading from {}", timeout.totalMilliseconds(), - dumpAddressesUnlocked()); + addresses); } } diff --git a/src/Common/DateLUTImpl.h b/src/Common/DateLUTImpl.h index a0d5a976f35..2deb477ca23 100644 --- a/src/Common/DateLUTImpl.h +++ b/src/Common/DateLUTImpl.h @@ -895,6 +895,19 @@ public: return toRelativeHourNum(lut[toLUTIndex(v)].date); } + /// The same formula is used for positive time (after Unix epoch) and negative time (before Unix epoch). + /// It’s needed for correct work of dateDiff function. + inline Time toStableRelativeHourNum(Time t) const + { + return (t + DATE_LUT_ADD + 86400 - offset_at_start_of_epoch) / 3600 - (DATE_LUT_ADD / 3600); + } + + template + inline Time toStableRelativeHourNum(DateOrTime v) const + { + return toStableRelativeHourNum(lut[toLUTIndex(v)].date); + } + inline Time toRelativeMinuteNum(Time t) const /// NOLINT { return (t + DATE_LUT_ADD) / 60 - (DATE_LUT_ADD / 60); diff --git a/src/Common/OvercommitTracker.cpp b/src/Common/OvercommitTracker.cpp index c7730667f55..bb477d6019d 100644 --- a/src/Common/OvercommitTracker.cpp +++ b/src/Common/OvercommitTracker.cpp @@ -5,6 +5,7 @@ #include #include + namespace ProfileEvents { extern const Event MemoryOvercommitWaitTimeMicroseconds; @@ -170,7 +171,8 @@ void UserOvercommitTracker::pickQueryToExcludeImpl() GlobalOvercommitTracker::GlobalOvercommitTracker(DB::ProcessList * process_list_) : OvercommitTracker(process_list_) -{} +{ +} void GlobalOvercommitTracker::pickQueryToExcludeImpl() { @@ -180,16 +182,16 @@ void GlobalOvercommitTracker::pickQueryToExcludeImpl() // This is guaranteed by locking global_mutex in OvercommitTracker::needToStopQuery. 
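// The loop below selects the query to stop: queries that are already being
// killed and users without a memory soft limit are skipped, and for the rest
// the overcommit ratio (memory tracked for the query relative to the
// user-level soft limit) is computed; the query with the largest ratio is
// picked as the victim.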
for (auto const & query : process_list->processes) { - if (query.isKilled()) + if (query->isKilled()) continue; Int64 user_soft_limit = 0; - if (auto const * user_process_list = query.getUserProcessList()) + if (auto const * user_process_list = query->getUserProcessList()) user_soft_limit = user_process_list->user_memory_tracker.getSoftLimit(); if (user_soft_limit == 0) continue; - auto * memory_tracker = query.getMemoryTracker(); + auto * memory_tracker = query->getMemoryTracker(); if (!memory_tracker) continue; auto ratio = memory_tracker->getOvercommitRatio(user_soft_limit); diff --git a/src/Common/getNumberOfPhysicalCPUCores.cpp b/src/Common/getNumberOfPhysicalCPUCores.cpp index 7bb68b324b2..7a1f10b6435 100644 --- a/src/Common/getNumberOfPhysicalCPUCores.cpp +++ b/src/Common/getNumberOfPhysicalCPUCores.cpp @@ -48,7 +48,7 @@ static unsigned getNumberOfPhysicalCPUCoresImpl() /// Let's limit ourself to the number of physical cores. /// But if the number of logical cores is small - maybe it is a small machine /// or very limited cloud instance and it is reasonable to use all the cores. - if (cpu_count >= 8) + if (cpu_count >= 32) cpu_count /= 2; #endif diff --git a/src/Common/intExp.h b/src/Common/intExp.h index 3529990ef3b..04c163ff224 100644 --- a/src/Common/intExp.h +++ b/src/Common/intExp.h @@ -47,6 +47,11 @@ namespace common constexpr inline int exp10_i32(int x) { + if (x < 0) + return 0; + if (x > 9) + return std::numeric_limits::max(); + constexpr int values[] = { 1, @@ -65,6 +70,11 @@ constexpr inline int exp10_i32(int x) constexpr inline int64_t exp10_i64(int x) { + if (x < 0) + return 0; + if (x > 18) + return std::numeric_limits::max(); + constexpr int64_t values[] = { 1LL, @@ -92,6 +102,11 @@ constexpr inline int64_t exp10_i64(int x) constexpr inline Int128 exp10_i128(int x) { + if (x < 0) + return 0; + if (x > 38) + return std::numeric_limits::max(); + constexpr Int128 values[] = { static_cast(1LL), @@ -140,6 +155,11 @@ constexpr inline Int128 exp10_i128(int x) inline Int256 exp10_i256(int x) { + if (x < 0) + return 0; + if (x > 76) + return std::numeric_limits::max(); + using Int256 = Int256; static constexpr Int256 i10e18{1000000000000000000ll}; static const Int256 values[] = { @@ -231,8 +251,10 @@ inline Int256 exp10_i256(int x) template constexpr inline T intExp10OfSize(int x) { - if constexpr (sizeof(T) <= 8) - return intExp10(x); + if constexpr (sizeof(T) <= 4) + return common::exp10_i32(x); + else if constexpr (sizeof(T) <= 8) + return common::exp10_i64(x); else if constexpr (sizeof(T) <= 16) return common::exp10_i128(x); else diff --git a/src/Common/tests/gtest_DateLUTImpl.cpp b/src/Common/tests/gtest_DateLUTImpl.cpp index 49013625ed3..aca17ae4f93 100644 --- a/src/Common/tests/gtest_DateLUTImpl.cpp +++ b/src/Common/tests/gtest_DateLUTImpl.cpp @@ -134,6 +134,7 @@ TEST(DateLUTTest, TimeValuesInMiddleOfRange) EXPECT_EQ(lut.toRelativeMonthNum(time), 24237 /*unsigned*/); EXPECT_EQ(lut.toRelativeQuarterNum(time), 8078 /*unsigned*/); EXPECT_EQ(lut.toRelativeHourNum(time), 435736 /*time_t*/); + EXPECT_EQ(lut.toStableRelativeHourNum(time), 435757 /*time_t*/); EXPECT_EQ(lut.toRelativeMinuteNum(time), 26144180 /*time_t*/); EXPECT_EQ(lut.toStartOfMinuteInterval(time, 6), 1568650680 /*time_t*/); EXPECT_EQ(lut.toStartOfSecondInterval(time, 7), 1568650811 /*time_t*/); @@ -196,6 +197,7 @@ TEST(DateLUTTest, TimeValuesAtLeftBoderOfRange) EXPECT_EQ(lut.toRelativeMonthNum(time), 23641 /*unsigned*/); // ? EXPECT_EQ(lut.toRelativeQuarterNum(time), 7880 /*unsigned*/); // ? 
EXPECT_EQ(lut.toRelativeHourNum(time), 0 /*time_t*/); + EXPECT_EQ(lut.toStableRelativeHourNum(time), 24 /*time_t*/); EXPECT_EQ(lut.toRelativeMinuteNum(time), 0 /*time_t*/); EXPECT_EQ(lut.toStartOfMinuteInterval(time, 6), 0 /*time_t*/); EXPECT_EQ(lut.toStartOfSecondInterval(time, 7), 0 /*time_t*/); @@ -259,6 +261,7 @@ TEST(DateLUTTest, TimeValuesAtRightBoderOfRangeOfOldLUT) EXPECT_EQ(lut.toRelativeMonthNum(time), 25273 /*unsigned*/); EXPECT_EQ(lut.toRelativeQuarterNum(time), 8424 /*unsigned*/); EXPECT_EQ(lut.toRelativeHourNum(time), 1192873 /*time_t*/); + EXPECT_EQ(lut.toStableRelativeHourNum(time), 1192897 /*time_t*/); EXPECT_EQ(lut.toRelativeMinuteNum(time), 71572397 /*time_t*/); EXPECT_EQ(lut.toStartOfMinuteInterval(time, 6), 4294343520 /*time_t*/); EXPECT_EQ(lut.toStartOfSecondInterval(time, 7), 4294343872 /*time_t*/); diff --git a/src/Coordination/KeeperDispatcher.cpp b/src/Coordination/KeeperDispatcher.cpp index d725ecb5cfe..6e9116d4b75 100644 --- a/src/Coordination/KeeperDispatcher.cpp +++ b/src/Coordination/KeeperDispatcher.cpp @@ -1,14 +1,21 @@ #include + +#include +#include + +#include #include #include -#include -#include -#include -#include -#include #include #include + +#include +#include +#include +#include +#include + namespace CurrentMetrics { extern const Metric KeeperAliveConnections; @@ -32,9 +39,7 @@ KeeperDispatcher::KeeperDispatcher() : responses_queue(std::numeric_limits::max()) , configuration_and_settings(std::make_shared()) , log(&Poco::Logger::get("KeeperDispatcher")) -{ -} - +{} void KeeperDispatcher::requestThread() { @@ -191,7 +196,13 @@ void KeeperDispatcher::snapshotThread() try { - task.create_snapshot(std::move(task.snapshot)); + auto snapshot_path = task.create_snapshot(std::move(task.snapshot)); + + if (snapshot_path.empty()) + continue; + + if (isLeader()) + snapshot_s3.uploadSnapshot(snapshot_path); } catch (...) 
{ @@ -285,7 +296,9 @@ void KeeperDispatcher::initialize(const Poco::Util::AbstractConfiguration & conf responses_thread = ThreadFromGlobalPool([this] { responseThread(); }); snapshot_thread = ThreadFromGlobalPool([this] { snapshotThread(); }); - server = std::make_unique(configuration_and_settings, config, responses_queue, snapshots_queue); + snapshot_s3.startup(config); + + server = std::make_unique(configuration_and_settings, config, responses_queue, snapshots_queue, snapshot_s3); try { @@ -312,7 +325,6 @@ void KeeperDispatcher::initialize(const Poco::Util::AbstractConfiguration & conf /// Start it after keeper server start session_cleaner_thread = ThreadFromGlobalPool([this] { sessionCleanerTask(); }); update_configuration_thread = ThreadFromGlobalPool([this] { updateConfigurationThread(); }); - updateConfiguration(config); LOG_DEBUG(log, "Dispatcher initialized"); } @@ -415,6 +427,8 @@ void KeeperDispatcher::shutdown() if (server) server->shutdown(); + snapshot_s3.shutdown(); + CurrentMetrics::set(CurrentMetrics::KeeperAliveConnections, 0); } @@ -678,6 +692,8 @@ void KeeperDispatcher::updateConfiguration(const Poco::Util::AbstractConfigurati if (!push_result) throw Exception(ErrorCodes::SYSTEM_ERROR, "Cannot push configuration update to queue"); } + + snapshot_s3.updateS3Configuration(config); } void KeeperDispatcher::updateKeeperStatLatency(uint64_t process_time_ms) diff --git a/src/Coordination/KeeperDispatcher.h b/src/Coordination/KeeperDispatcher.h index 3b524b24ed7..0003867adbe 100644 --- a/src/Coordination/KeeperDispatcher.h +++ b/src/Coordination/KeeperDispatcher.h @@ -14,6 +14,7 @@ #include #include #include +#include namespace DB { @@ -76,6 +77,8 @@ private: /// Counter for new session_id requests. std::atomic internal_session_id_counter{0}; + KeeperSnapshotManagerS3 snapshot_s3; + /// Thread put requests to raft void requestThread(); /// Thread put responses for subscribed sessions diff --git a/src/Coordination/KeeperServer.cpp b/src/Coordination/KeeperServer.cpp index 7a0cee746c6..1c8959379da 100644 --- a/src/Coordination/KeeperServer.cpp +++ b/src/Coordination/KeeperServer.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -105,7 +106,8 @@ KeeperServer::KeeperServer( const KeeperConfigurationAndSettingsPtr & configuration_and_settings_, const Poco::Util::AbstractConfiguration & config, ResponsesQueue & responses_queue_, - SnapshotsQueue & snapshots_queue_) + SnapshotsQueue & snapshots_queue_, + KeeperSnapshotManagerS3 & snapshot_manager_s3) : server_id(configuration_and_settings_->server_id) , coordination_settings(configuration_and_settings_->coordination_settings) , log(&Poco::Logger::get("KeeperServer")) @@ -125,6 +127,7 @@ KeeperServer::KeeperServer( configuration_and_settings_->snapshot_storage_path, coordination_settings, keeper_context, + config.getBool("keeper_server.upload_snapshot_on_exit", true) ? 
&snapshot_manager_s3 : nullptr, checkAndGetSuperdigest(configuration_and_settings_->super_digest)); state_manager = nuraft::cs_new( diff --git a/src/Coordination/KeeperServer.h b/src/Coordination/KeeperServer.h index 6873ef2a01e..a33e29b4540 100644 --- a/src/Coordination/KeeperServer.h +++ b/src/Coordination/KeeperServer.h @@ -71,7 +71,8 @@ public: const KeeperConfigurationAndSettingsPtr & settings_, const Poco::Util::AbstractConfiguration & config_, ResponsesQueue & responses_queue_, - SnapshotsQueue & snapshots_queue_); + SnapshotsQueue & snapshots_queue_, + KeeperSnapshotManagerS3 & snapshot_manager_s3); /// Load state machine from the latest snapshot and load log storage. Start NuRaft with required settings. void startup(const Poco::Util::AbstractConfiguration & config, bool enable_ipv6 = true); diff --git a/src/Coordination/KeeperSnapshotManager.h b/src/Coordination/KeeperSnapshotManager.h index c00ce9421e7..52647712083 100644 --- a/src/Coordination/KeeperSnapshotManager.h +++ b/src/Coordination/KeeperSnapshotManager.h @@ -87,7 +87,7 @@ public: }; using KeeperStorageSnapshotPtr = std::shared_ptr; -using CreateSnapshotCallback = std::function; +using CreateSnapshotCallback = std::function; using SnapshotMetaAndStorage = std::pair; diff --git a/src/Coordination/KeeperSnapshotManagerS3.cpp b/src/Coordination/KeeperSnapshotManagerS3.cpp new file mode 100644 index 00000000000..2e19d496407 --- /dev/null +++ b/src/Coordination/KeeperSnapshotManagerS3.cpp @@ -0,0 +1,311 @@ +#include + +#if USE_AWS_S3 +#include + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include + +namespace fs = std::filesystem; + +namespace DB +{ + +struct KeeperSnapshotManagerS3::S3Configuration +{ + S3Configuration(S3::URI uri_, S3::AuthSettings auth_settings_, std::shared_ptr client_) + : uri(std::move(uri_)) + , auth_settings(std::move(auth_settings_)) + , client(std::move(client_)) + {} + + S3::URI uri; + S3::AuthSettings auth_settings; + std::shared_ptr client; +}; + +KeeperSnapshotManagerS3::KeeperSnapshotManagerS3() + : snapshots_s3_queue(std::numeric_limits::max()) + , log(&Poco::Logger::get("KeeperSnapshotManagerS3")) + , uuid(UUIDHelpers::generateV4()) +{} + +void KeeperSnapshotManagerS3::updateS3Configuration(const Poco::Util::AbstractConfiguration & config) +{ + try + { + const std::string config_prefix = "keeper_server.s3_snapshot"; + + if (!config.has(config_prefix)) + { + std::lock_guard client_lock{snapshot_s3_client_mutex}; + if (snapshot_s3_client) + LOG_INFO(log, "S3 configuration was removed"); + snapshot_s3_client = nullptr; + return; + } + + auto auth_settings = S3::AuthSettings::loadFromConfig(config_prefix, config); + + auto endpoint = config.getString(config_prefix + ".endpoint"); + auto new_uri = S3::URI{Poco::URI(endpoint)}; + + { + std::lock_guard client_lock{snapshot_s3_client_mutex}; + // if client is not changed (same auth settings, same endpoint) we don't need to update + if (snapshot_s3_client && snapshot_s3_client->client && auth_settings == snapshot_s3_client->auth_settings + && snapshot_s3_client->uri.uri == new_uri.uri) + return; + } + + LOG_INFO(log, "S3 configuration was updated"); + + auto credentials = Aws::Auth::AWSCredentials(auth_settings.access_key_id, auth_settings.secret_access_key); + HeaderCollection headers = auth_settings.headers; + + static constexpr size_t s3_max_redirects = 10; + static constexpr bool enable_s3_requests_logging = false; + + if 
(!new_uri.key.empty()) + { + LOG_ERROR(log, "Invalid endpoint defined for S3, it shouldn't contain a key, endpoint: {}", endpoint); + return; + } + + S3::PocoHTTPClientConfiguration client_configuration = S3::ClientFactory::instance().createClientConfiguration( + auth_settings.region, + RemoteHostFilter(), s3_max_redirects, + enable_s3_requests_logging, + /* for_disk_s3 = */ false); + + client_configuration.endpointOverride = new_uri.endpoint; + + auto client = S3::ClientFactory::instance().create( + client_configuration, + new_uri.is_virtual_hosted_style, + credentials.GetAWSAccessKeyId(), + credentials.GetAWSSecretKey(), + auth_settings.server_side_encryption_customer_key_base64, + std::move(headers), + auth_settings.use_environment_credentials.value_or(false), + auth_settings.use_insecure_imds_request.value_or(false)); + + auto new_client = std::make_shared(std::move(new_uri), std::move(auth_settings), std::move(client)); + + { + std::lock_guard client_lock{snapshot_s3_client_mutex}; + snapshot_s3_client = std::move(new_client); + } + LOG_INFO(log, "S3 client was updated"); + } + catch (...) + { + LOG_ERROR(log, "Failed to create an S3 client for snapshots"); + tryLogCurrentException(__PRETTY_FUNCTION__); + } +} +std::shared_ptr KeeperSnapshotManagerS3::getSnapshotS3Client() const +{ + std::lock_guard lock{snapshot_s3_client_mutex}; + return snapshot_s3_client; +} + +void KeeperSnapshotManagerS3::uploadSnapshotImpl(const std::string & snapshot_path) +{ + try + { + auto s3_client = getSnapshotS3Client(); + if (s3_client == nullptr) + return; + + S3Settings::ReadWriteSettings read_write_settings; + read_write_settings.upload_part_size_multiply_parts_count_threshold = 10000; + + const auto create_writer = [&](const auto & key) + { + return WriteBufferFromS3 + { + s3_client->client, + s3_client->uri.bucket, + key, + read_write_settings + }; + }; + + const auto file_exists = [&](const auto & key) + { + Aws::S3::Model::HeadObjectRequest request; + request.SetBucket(s3_client->uri.bucket); + request.SetKey(key); + auto outcome = s3_client->client->HeadObject(request); + + if (outcome.IsSuccess()) + return true; + + const auto & error = outcome.GetError(); + if (error.GetErrorType() != Aws::S3::S3Errors::NO_SUCH_KEY && error.GetErrorType() != Aws::S3::S3Errors::RESOURCE_NOT_FOUND) + throw S3Exception(error.GetErrorType(), "Failed to verify existence of lock file: {}", error.GetMessage()); + + return false; + }; + + + LOG_INFO(log, "Will try to upload snapshot {} to S3", snapshot_path); + ReadBufferFromFile snapshot_file(snapshot_path); + + auto snapshot_name = fs::path(snapshot_path).filename().string(); + auto lock_file = fmt::format(".{}_LOCK", snapshot_name); + + if (file_exists(snapshot_name)) + { + LOG_ERROR(log, "Snapshot {} already exists", snapshot_name); + return; + } + + // First we need to verify that there isn't already a lock file for the snapshot we want to upload + // Only leader uploads a snapshot, but there can be a rare case where we have 2 leaders in NuRaft + if (file_exists(lock_file)) + { + LOG_ERROR(log, "Lock file for {} already exists.
Probably a different node is already uploading the snapshot", snapshot_name); + return; + } + + // We write our UUID to lock file + LOG_DEBUG(log, "Trying to create a lock file"); + WriteBufferFromS3 lock_writer = create_writer(lock_file); + writeUUIDText(uuid, lock_writer); + lock_writer.finalize(); + + // We read back the written UUID, if it's the same we can upload the file + ReadBufferFromS3 lock_reader + { + s3_client->client, + s3_client->uri.bucket, + lock_file, + "", + 1, + {} + }; + + std::string read_uuid; + readStringUntilEOF(read_uuid, lock_reader); + + if (read_uuid != toString(uuid)) + { + LOG_ERROR(log, "Failed to create a lock file"); + return; + } + + SCOPE_EXIT( + { + LOG_INFO(log, "Removing lock file"); + try + { + Aws::S3::Model::DeleteObjectRequest delete_request; + delete_request.SetBucket(s3_client->uri.bucket); + delete_request.SetKey(lock_file); + auto delete_outcome = s3_client->client->DeleteObject(delete_request); + if (!delete_outcome.IsSuccess()) + throw S3Exception(delete_outcome.GetError().GetMessage(), delete_outcome.GetError().GetErrorType()); + } + catch (...) + { + LOG_INFO(log, "Failed to delete lock file for {} from S3", snapshot_path); + tryLogCurrentException(__PRETTY_FUNCTION__); + } + }); + + WriteBufferFromS3 snapshot_writer = create_writer(snapshot_name); + copyData(snapshot_file, snapshot_writer); + snapshot_writer.finalize(); + + LOG_INFO(log, "Successfully uploaded {} to S3", snapshot_path); + } + catch (...) + { + LOG_INFO(log, "Failure during upload of {} to S3", snapshot_path); + tryLogCurrentException(__PRETTY_FUNCTION__); + } +} + +void KeeperSnapshotManagerS3::snapshotS3Thread() +{ + setThreadName("KeeperS3SnpT"); + + while (!shutdown_called) + { + std::string snapshot_path; + if (!snapshots_s3_queue.pop(snapshot_path)) + break; + + if (shutdown_called) + break; + + uploadSnapshotImpl(snapshot_path); + } +} + +void KeeperSnapshotManagerS3::uploadSnapshot(const std::string & path, bool async_upload) +{ + if (getSnapshotS3Client() == nullptr) + return; + + if (async_upload) + { + if (!snapshots_s3_queue.push(path)) + LOG_WARNING(log, "Failed to add snapshot {} to S3 queue", path); + + return; + } + + uploadSnapshotImpl(path); +} + +void KeeperSnapshotManagerS3::startup(const Poco::Util::AbstractConfiguration & config) +{ + updateS3Configuration(config); + snapshot_s3_thread = ThreadFromGlobalPool([this] { snapshotS3Thread(); }); +} + +void KeeperSnapshotManagerS3::shutdown() +{ + if (shutdown_called) + return; + + LOG_DEBUG(log, "Shutting down KeeperSnapshotManagerS3"); + shutdown_called = true; + + try + { + snapshots_s3_queue.finish(); + if (snapshot_s3_thread.joinable()) + snapshot_s3_thread.join(); + } + catch (...) 
+ { + tryLogCurrentException(__PRETTY_FUNCTION__); + } + + LOG_INFO(log, "KeeperSnapshotManagerS3 shut down"); +} + +} + +#endif diff --git a/src/Coordination/KeeperSnapshotManagerS3.h b/src/Coordination/KeeperSnapshotManagerS3.h new file mode 100644 index 00000000000..5b62d114aae --- /dev/null +++ b/src/Coordination/KeeperSnapshotManagerS3.h @@ -0,0 +1,68 @@ +#pragma once + +#include "config.h" + +#include + +#if USE_AWS_S3 +#include +#include +#include + +#include +#endif + +namespace DB +{ + +#if USE_AWS_S3 +class KeeperSnapshotManagerS3 +{ +public: + KeeperSnapshotManagerS3(); + + void updateS3Configuration(const Poco::Util::AbstractConfiguration & config); + void uploadSnapshot(const std::string & path, bool async_upload = true); + + void startup(const Poco::Util::AbstractConfiguration & config); + void shutdown(); +private: + using SnapshotS3Queue = ConcurrentBoundedQueue; + SnapshotS3Queue snapshots_s3_queue; + + /// Upload new snapshots to S3 + ThreadFromGlobalPool snapshot_s3_thread; + + struct S3Configuration; + mutable std::mutex snapshot_s3_client_mutex; + std::shared_ptr snapshot_s3_client; + + std::atomic shutdown_called{false}; + + Poco::Logger * log; + + UUID uuid; + + std::shared_ptr getSnapshotS3Client() const; + + void uploadSnapshotImpl(const std::string & snapshot_path); + + /// Thread upload snapshots to S3 in the background + void snapshotS3Thread(); +}; +#else +class KeeperSnapshotManagerS3 +{ +public: + KeeperSnapshotManagerS3() = default; + + void updateS3Configuration(const Poco::Util::AbstractConfiguration &) {} + void uploadSnapshot(const std::string &, [[maybe_unused]] bool async_upload = true) {} + + void startup(const Poco::Util::AbstractConfiguration &) {} + + void shutdown() {} +}; +#endif + +} diff --git a/src/Coordination/KeeperStateMachine.cpp b/src/Coordination/KeeperStateMachine.cpp index c5a66ce29ca..ee5bfa48357 100644 --- a/src/Coordination/KeeperStateMachine.cpp +++ b/src/Coordination/KeeperStateMachine.cpp @@ -44,6 +44,7 @@ KeeperStateMachine::KeeperStateMachine( const std::string & snapshots_path_, const CoordinationSettingsPtr & coordination_settings_, const KeeperContextPtr & keeper_context_, + KeeperSnapshotManagerS3 * snapshot_manager_s3_, const std::string & superdigest_) : coordination_settings(coordination_settings_) , snapshot_manager( @@ -59,6 +60,7 @@ KeeperStateMachine::KeeperStateMachine( , log(&Poco::Logger::get("KeeperStateMachine")) , superdigest(superdigest_) , keeper_context(keeper_context_) + , snapshot_manager_s3(snapshot_manager_s3_) { } @@ -400,13 +402,22 @@ void KeeperStateMachine::create_snapshot(nuraft::snapshot & s, nuraft::async_res } when_done(ret, exception); + + return ret ? 
latest_snapshot_path : ""; }; if (keeper_context->server_state == KeeperContext::Phase::SHUTDOWN) { LOG_INFO(log, "Creating a snapshot during shutdown because 'create_snapshot_on_exit' is enabled."); - snapshot_task.create_snapshot(std::move(snapshot_task.snapshot)); + auto snapshot_path = snapshot_task.create_snapshot(std::move(snapshot_task.snapshot)); + + if (!snapshot_path.empty() && snapshot_manager_s3) + { + LOG_INFO(log, "Uploading snapshot {} during shutdown because 'upload_snapshot_on_exit' is enabled.", snapshot_path); + snapshot_manager_s3->uploadSnapshot(snapshot_path, /* async_upload */ false); + } + return; } diff --git a/src/Coordination/KeeperStateMachine.h b/src/Coordination/KeeperStateMachine.h index fbd4fdc5ac2..ffc7fce1cfe 100644 --- a/src/Coordination/KeeperStateMachine.h +++ b/src/Coordination/KeeperStateMachine.h @@ -2,11 +2,13 @@ #include #include +#include +#include #include + #include #include #include -#include namespace DB @@ -26,6 +28,7 @@ public: const std::string & snapshots_path_, const CoordinationSettingsPtr & coordination_settings_, const KeeperContextPtr & keeper_context_, + KeeperSnapshotManagerS3 * snapshot_manager_s3_, const std::string & superdigest_ = ""); /// Read state from the latest snapshot @@ -146,6 +149,8 @@ private: const std::string superdigest; KeeperContextPtr keeper_context; + + KeeperSnapshotManagerS3 * snapshot_manager_s3; }; } diff --git a/src/Coordination/tests/gtest_coordination.cpp b/src/Coordination/tests/gtest_coordination.cpp index 5bb1ecc7c85..b1d27d4541d 100644 --- a/src/Coordination/tests/gtest_coordination.cpp +++ b/src/Coordination/tests/gtest_coordination.cpp @@ -1318,7 +1318,7 @@ void testLogAndStateMachine(Coordination::CoordinationSettingsPtr settings, uint ResponsesQueue queue(std::numeric_limits::max()); SnapshotsQueue snapshots_queue{1}; - auto state_machine = std::make_shared(queue, snapshots_queue, "./snapshots", settings, keeper_context); + auto state_machine = std::make_shared(queue, snapshots_queue, "./snapshots", settings, keeper_context, nullptr); state_machine->init(); DB::KeeperLogStore changelog("./logs", settings->rotate_log_storage_interval, true, enable_compression); changelog.init(state_machine->last_commit_index() + 1, settings->reserved_log_items); @@ -1359,7 +1359,7 @@ void testLogAndStateMachine(Coordination::CoordinationSettingsPtr settings, uint } SnapshotsQueue snapshots_queue1{1}; - auto restore_machine = std::make_shared(queue, snapshots_queue1, "./snapshots", settings, keeper_context); + auto restore_machine = std::make_shared(queue, snapshots_queue1, "./snapshots", settings, keeper_context, nullptr); restore_machine->init(); EXPECT_EQ(restore_machine->last_commit_index(), total_logs - total_logs % settings->snapshot_distance); @@ -1471,7 +1471,7 @@ TEST_P(CoordinationTest, TestEphemeralNodeRemove) ResponsesQueue queue(std::numeric_limits::max()); SnapshotsQueue snapshots_queue{1}; - auto state_machine = std::make_shared(queue, snapshots_queue, "./snapshots", settings, keeper_context); + auto state_machine = std::make_shared(queue, snapshots_queue, "./snapshots", settings, keeper_context, nullptr); state_machine->init(); std::shared_ptr request_c = std::make_shared(); diff --git a/src/Core/Settings.h b/src/Core/Settings.h index 07618ee731d..0b8d24b1abc 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -331,8 +331,8 @@ static constexpr UInt64 operator""_GiB(unsigned long long value) M(UInt64, max_bytes_before_remerge_sort, 1000000000, "In case of ORDER BY with LIMIT, when memory
usage is higher than specified threshold, perform additional steps of merging blocks before final merge to keep just top LIMIT rows.", 0) \ M(Float, remerge_sort_lowered_memory_bytes_ratio, 2., "If memory usage after remerge does not reduced by this ratio, remerge will be disabled.", 0) \ \ - M(UInt64, max_result_rows, 0, "Limit on result size in rows. Also checked for intermediate data sent from remote servers.", 0) \ - M(UInt64, max_result_bytes, 0, "Limit on result size in bytes (uncompressed). Also checked for intermediate data sent from remote servers.", 0) \ + M(UInt64, max_result_rows, 0, "Limit on result size in rows. The query will stop after processing a block of data if the threshold is met, but it will not cut the last block of the result, therefore the result size can be larger than the threshold.", 0) \ + M(UInt64, max_result_bytes, 0, "Limit on result size in bytes (uncompressed). The query will stop after processing a block of data if the threshold is met, but it will not cut the last block of the result, therefore the result size can be larger than the threshold. Caveats: the result size in memory is taken into account for this threshold. Even if the result size is small, it can reference larger data structures in memory, representing dictionaries of LowCardinality columns, and Arenas of AggregateFunction columns, so the threshold can be exceeded despite the small result size. The setting is fairly low level and should be used with caution.", 0) \ M(OverflowMode, result_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \ \ /* TODO: Check also when merging and finalizing aggregate functions. */ \ diff --git a/src/Formats/EscapingRuleUtils.cpp b/src/Formats/EscapingRuleUtils.cpp index bfc026342eb..e47525d089a 100644 --- a/src/Formats/EscapingRuleUtils.cpp +++ b/src/Formats/EscapingRuleUtils.cpp @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -875,4 +876,19 @@ String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings, Fo return result; } + +void checkSupportedDelimiterAfterField(FormatSettings::EscapingRule escaping_rule, const String & delimiter, const DataTypePtr & type) +{ + if (escaping_rule != FormatSettings::EscapingRule::Escaped) + return; + + bool is_supported_delimiter_after_string = !delimiter.empty() && (delimiter.front() == '\t' || delimiter.front() == '\n'); + if (is_supported_delimiter_after_string) + return; + + /// Nullptr means that field is skipped and it's equivalent to String + if (!type || isString(removeNullable(removeLowCardinality(type)))) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "'Escaped' serialization requires delimiter after String field to start with '\\t' or '\\n'"); +} + } diff --git a/src/Formats/EscapingRuleUtils.h b/src/Formats/EscapingRuleUtils.h index 901679b6a05..c8b710002a5 100644 --- a/src/Formats/EscapingRuleUtils.h +++ b/src/Formats/EscapingRuleUtils.h @@ -77,6 +77,8 @@ void transformInferredTypesIfNeeded(DataTypePtr & first, DataTypePtr & second, c void transformInferredJSONTypesIfNeeded(DataTypes & types, const FormatSettings & settings, const std::unordered_set * numbers_parsed_from_json_strings = nullptr); void transformInferredJSONTypesIfNeeded(DataTypePtr & first, DataTypePtr & second, const FormatSettings & settings); -String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings,FormatSettings::EscapingRule escaping_rule); +String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings, 
FormatSettings::EscapingRule escaping_rule); + +void checkSupportedDelimiterAfterField(FormatSettings::EscapingRule escaping_rule, const String & delimiter, const DataTypePtr & type); } diff --git a/src/Formats/FormatFactory.cpp b/src/Formats/FormatFactory.cpp index bfe651dd1af..a882fcf5009 100644 --- a/src/Formats/FormatFactory.cpp +++ b/src/Formats/FormatFactory.cpp @@ -303,7 +303,7 @@ InputFormatPtr FormatFactory::getInputFormat( static void addExistingProgressToOutputFormat(OutputFormatPtr format, ContextPtr context) { - auto * element_id = context->getProcessListElement(); + auto element_id = context->getProcessListElement(); if (element_id) { /// While preparing the query there might have been progress (for example in subscalar subqueries) so add it here diff --git a/src/Functions/CustomWeekTransforms.h b/src/Functions/CustomWeekTransforms.h index b690463d456..781c18bc338 100644 --- a/src/Functions/CustomWeekTransforms.h +++ b/src/Functions/CustomWeekTransforms.h @@ -62,10 +62,7 @@ struct ToStartOfWeekImpl static inline UInt16 execute(Int64 t, UInt8 week_mode, const DateLUTImpl & time_zone) { - if (t < 0) - return 0; - - return time_zone.toFirstDayNumOfWeek(DayNum(std::min(Int32(time_zone.toDayNum(t)), Int32(DATE_LUT_MAX_DAY_NUM))), week_mode); + return time_zone.toFirstDayNumOfWeek(time_zone.toDayNum(t), week_mode); } static inline UInt16 execute(UInt32 t, UInt8 week_mode, const DateLUTImpl & time_zone) { @@ -73,10 +70,7 @@ struct ToStartOfWeekImpl } static inline UInt16 execute(Int32 d, UInt8 week_mode, const DateLUTImpl & time_zone) { - if (d < 0) - return 0; - - return time_zone.toFirstDayNumOfWeek(DayNum(std::min(d, Int32(DATE_LUT_MAX_DAY_NUM))), week_mode); + return time_zone.toFirstDayNumOfWeek(ExtendedDayNum(d), week_mode); } static inline UInt16 execute(UInt16 d, UInt8 week_mode, const DateLUTImpl & time_zone) { diff --git a/src/Functions/DateTimeTransforms.h b/src/Functions/DateTimeTransforms.h index 217f158cc8e..f7924981d09 100644 --- a/src/Functions/DateTimeTransforms.h +++ b/src/Functions/DateTimeTransforms.h @@ -55,15 +55,15 @@ struct ToDateImpl static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) { - return t < 0 ? 0 : std::min(Int32(time_zone.toDayNum(t)), Int32(DATE_LUT_MAX_DAY_NUM)); + return UInt16(time_zone.toDayNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { - return time_zone.toDayNum(t); + return UInt16(time_zone.toDayNum(t)); } - static inline UInt16 execute(Int32 t, const DateLUTImpl &) + static inline UInt16 execute(Int32, const DateLUTImpl &) { - return t < 0 ? 
0 : std::min(t, Int32(DATE_LUT_MAX_DAY_NUM)); + throwDateIsNotSupported(name); } static inline UInt16 execute(UInt16 d, const DateLUTImpl &) { @@ -104,10 +104,7 @@ struct ToStartOfDayImpl static inline UInt32 execute(const DecimalUtils::DecimalComponents & t, const DateLUTImpl & time_zone) { - if (t.whole < 0 || (t.whole >= 0 && t.fractional < 0)) - return 0; - - return time_zone.toDate(std::min(t.whole, Int64(0xffffffff))); + return time_zone.toDate(static_cast(t.whole)); } static inline UInt32 execute(UInt32 t, const DateLUTImpl & time_zone) { @@ -115,19 +112,11 @@ } static inline UInt32 execute(Int32 d, const DateLUTImpl & time_zone) { - if (d < 0) - return 0; - - auto date_time = time_zone.fromDayNum(ExtendedDayNum(d)); - if (date_time <= 0xffffffff) - return date_time; - else - return time_zone.toDate(0xffffffff); + return time_zone.toDate(ExtendedDayNum(d)); } static inline UInt32 execute(UInt16 d, const DateLUTImpl & time_zone) { - auto date_time = time_zone.fromDayNum(ExtendedDayNum(d)); - return date_time < 0xffffffff ? date_time : time_zone.toDate(0xffffffff); + return time_zone.toDate(DayNum(d)); } static inline DecimalUtils::DecimalComponents executeExtendedResult(const DecimalUtils::DecimalComponents & t, const DateLUTImpl & time_zone) { @@ -147,16 +136,17 @@ struct ToMondayImpl static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) { - return t < 0 ? 0 : time_zone.toFirstDayNumOfWeek(ExtendedDayNum( - std::min(Int32(time_zone.toDayNum(t)), Int32(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfWeek(t); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { return time_zone.toFirstDayNumOfWeek(t); } static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) { - return d < 0 ? 0 : time_zone.toFirstDayNumOfWeek(ExtendedDayNum(std::min(d, Int32(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfWeek(ExtendedDayNum(d)); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -179,15 +169,15 @@ struct ToStartOfMonthImpl static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) { - return t < 0 ? 0 : time_zone.toFirstDayNumOfMonth(ExtendedDayNum(std::min(Int32(time_zone.toDayNum(t)), Int32(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfMonth(time_zone.toDayNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { - return time_zone.toFirstDayNumOfMonth(ExtendedDayNum(time_zone.toDayNum(t))); + return time_zone.toFirstDayNumOfMonth(time_zone.toDayNum(t)); } static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) { - return d < 0 ? 0 : time_zone.toFirstDayNumOfMonth(ExtendedDayNum(std::min(d, Int32(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfMonth(ExtendedDayNum(d)); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -211,11 +201,7 @@ struct ToLastDayOfMonthImpl static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) { - if (t < 0) - return 0; - - /// 0xFFF9 is Int value for 2149-05-31 -- the last day where we can actually find LastDayOfMonth. This will also be the return value. 
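(As a cross-check of that constant: Date is a 16-bit day number counted from 1970-01-01, whose maximum 0xFFFF = 65535 corresponds to 2149-06-06; 0xFFF9 = 65529 is exactly six days earlier, i.e. 2149-05-31, the last day whose own end-of-month still fits into the Date range.)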
- return time_zone.toLastDayNumOfMonth(ExtendedDayNum(std::min(Int32(time_zone.toDayNum(t)), Int32(0xFFF9)))); + return time_zone.toLastDayNumOfMonth(time_zone.toDayNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { @@ -223,16 +209,11 @@ struct ToLastDayOfMonthImpl } static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) { - if (d < 0) - return 0; - - /// 0xFFF9 is Int value for 2149-05-31 -- the last day where we can actually find LastDayOfMonth. This will also be the return value. - return time_zone.toLastDayNumOfMonth(ExtendedDayNum(std::min(d, Int32(0xFFF9)))); + return time_zone.toLastDayNumOfMonth(ExtendedDayNum(d)); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { - /// 0xFFF9 is Int value for 2149-05-31 -- the last day where we can actually find LastDayOfMonth. This will also be the return value. - return time_zone.toLastDayNumOfMonth(DayNum(std::min(d, UInt16(0xFFF9)))); + return time_zone.toLastDayNumOfMonth(DayNum(d)); } static inline Int64 executeExtendedResult(Int64 t, const DateLUTImpl & time_zone) { @@ -251,7 +232,7 @@ struct ToStartOfQuarterImpl static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) { - return t < 0 ? 0 : time_zone.toFirstDayNumOfQuarter(ExtendedDayNum(std::min(Int64(time_zone.toDayNum(t)), Int64(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfQuarter(time_zone.toDayNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { @@ -259,7 +240,7 @@ struct ToStartOfQuarterImpl } static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) { - return d < 0 ? 0 : time_zone.toFirstDayNumOfQuarter(ExtendedDayNum(std::min(d, Int32(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfQuarter(ExtendedDayNum(d)); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -282,7 +263,7 @@ struct ToStartOfYearImpl static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) { - return t < 0 ? 0 : time_zone.toFirstDayNumOfYear(ExtendedDayNum(std::min(Int32(time_zone.toDayNum(t)), Int32(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfYear(time_zone.toDayNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { @@ -290,7 +271,7 @@ struct ToStartOfYearImpl } static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) { - return d < 0 ? 
0 : time_zone.toFirstDayNumOfYear(ExtendedDayNum(std::min(d, Int32(DATE_LUT_MAX_DAY_NUM)))); + return time_zone.toFirstDayNumOfYear(ExtendedDayNum(d)); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -340,10 +321,7 @@ struct ToStartOfMinuteImpl static inline UInt32 execute(const DecimalUtils::DecimalComponents & t, const DateLUTImpl & time_zone) { - if (t.whole < 0 || (t.whole >= 0 && t.fractional < 0)) - return 0; - - return time_zone.toStartOfMinute(std::min(t.whole, Int64(0xffffffff))); + return time_zone.toStartOfMinute(t.whole); } static inline UInt32 execute(UInt32 t, const DateLUTImpl & time_zone) { @@ -677,10 +655,7 @@ struct ToStartOfHourImpl static inline UInt32 execute(const DecimalUtils::DecimalComponents & t, const DateLUTImpl & time_zone) { - if (t.whole < 0 || (t.whole >= 0 && t.fractional < 0)) - return 0; - - return time_zone.toStartOfHour(std::min(t.whole, Int64(0xffffffff))); + return time_zone.toStartOfHour(t.whole); } static inline UInt32 execute(UInt32 t, const DateLUTImpl & time_zone) @@ -1034,21 +1009,39 @@ struct ToISOWeekImpl using FactorTransform = ToISOYearImpl; }; +enum class ResultPrecision +{ + Standard, + Extended +}; + +/// Standard precision results (precision_ == ResultPrecision::Standard) potentially lead to overflows when returning values. +/// This mode is used by SQL functions "toRelative*Num()" which cannot easily be changed due to backward compatibility. +/// According to documentation, these functions merely need to compute the time difference to a deterministic, fixed point in the past. +/// As a future TODO, we should fix their behavior in a backwards-compatible way. +/// See https://github.com/ClickHouse/ClickHouse/issues/41977#issuecomment-1267536814. +template struct ToRelativeYearNumImpl { static constexpr auto name = "toRelativeYearNum"; - static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) + static inline auto execute(Int64 t, const DateLUTImpl & time_zone) { - return time_zone.toYear(t); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toYear(t)); + else + return static_cast(time_zone.toYear(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { return time_zone.toYear(static_cast(t)); } - static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) + static inline auto execute(Int32 d, const DateLUTImpl & time_zone) { - return time_zone.toYear(ExtendedDayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toYear(ExtendedDayNum(d))); + else + return static_cast(time_zone.toYear(ExtendedDayNum(d))); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -1058,21 +1051,28 @@ struct ToRelativeYearNumImpl using FactorTransform = ZeroTransform; }; +template struct ToRelativeQuarterNumImpl { static constexpr auto name = "toRelativeQuarterNum"; - static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) + static inline auto execute(Int64 t, const DateLUTImpl & time_zone) { - return time_zone.toRelativeQuarterNum(t); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeQuarterNum(t)); + else + return static_cast(time_zone.toRelativeQuarterNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { return time_zone.toRelativeQuarterNum(static_cast(t)); } - static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) + static inline auto execute(Int32 d, const DateLUTImpl & 
time_zone) { - return time_zone.toRelativeQuarterNum(ExtendedDayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeQuarterNum(ExtendedDayNum(d))); + else + return static_cast(time_zone.toRelativeQuarterNum(ExtendedDayNum(d))); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -1082,21 +1082,28 @@ struct ToRelativeQuarterNumImpl using FactorTransform = ZeroTransform; }; +template struct ToRelativeMonthNumImpl { static constexpr auto name = "toRelativeMonthNum"; - static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) + static inline auto execute(Int64 t, const DateLUTImpl & time_zone) { - return time_zone.toRelativeMonthNum(t); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeMonthNum(t)); + else + return static_cast(time_zone.toRelativeMonthNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { return time_zone.toRelativeMonthNum(static_cast(t)); } - static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) + static inline auto execute(Int32 d, const DateLUTImpl & time_zone) { - return time_zone.toRelativeMonthNum(ExtendedDayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeMonthNum(ExtendedDayNum(d))); + else + return static_cast(time_zone.toRelativeMonthNum(ExtendedDayNum(d))); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -1106,21 +1113,28 @@ struct ToRelativeMonthNumImpl using FactorTransform = ZeroTransform; }; +template struct ToRelativeWeekNumImpl { static constexpr auto name = "toRelativeWeekNum"; - static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) + static inline auto execute(Int64 t, const DateLUTImpl & time_zone) { - return time_zone.toRelativeWeekNum(t); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeWeekNum(t)); + else + return static_cast(time_zone.toRelativeWeekNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { return time_zone.toRelativeWeekNum(static_cast(t)); } - static inline UInt16 execute(Int32 d, const DateLUTImpl & time_zone) + static inline auto execute(Int32 d, const DateLUTImpl & time_zone) { - return time_zone.toRelativeWeekNum(ExtendedDayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeWeekNum(ExtendedDayNum(d))); + else + return static_cast(time_zone.toRelativeWeekNum(ExtendedDayNum(d))); } static inline UInt16 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -1130,21 +1144,28 @@ struct ToRelativeWeekNumImpl using FactorTransform = ZeroTransform; }; +template struct ToRelativeDayNumImpl { static constexpr auto name = "toRelativeDayNum"; - static inline UInt16 execute(Int64 t, const DateLUTImpl & time_zone) + static inline auto execute(Int64 t, const DateLUTImpl & time_zone) { - return time_zone.toDayNum(t); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toDayNum(t)); + else + return static_cast(time_zone.toDayNum(t)); } static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone) { return time_zone.toDayNum(static_cast(t)); } - static inline UInt16 execute(Int32 d, const DateLUTImpl &) + static inline auto execute(Int32 d, const DateLUTImpl &) { - return static_cast(d); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(static_cast(d)); + else + return 
static_cast(static_cast(d)); } static inline UInt16 execute(UInt16 d, const DateLUTImpl &) { @@ -1154,46 +1175,65 @@ struct ToRelativeDayNumImpl using FactorTransform = ZeroTransform; }; - +template struct ToRelativeHourNumImpl { static constexpr auto name = "toRelativeHourNum"; - static inline UInt32 execute(Int64 t, const DateLUTImpl & time_zone) + static inline auto execute(Int64 t, const DateLUTImpl & time_zone) { - return time_zone.toRelativeHourNum(t); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toStableRelativeHourNum(t)); + else + return static_cast(time_zone.toRelativeHourNum(t)); } static inline UInt32 execute(UInt32 t, const DateLUTImpl & time_zone) { - return time_zone.toRelativeHourNum(static_cast(t)); + if constexpr (precision_ == ResultPrecision::Extended) + return time_zone.toStableRelativeHourNum(static_cast(t)); + else + return time_zone.toRelativeHourNum(static_cast(t)); } - static inline UInt32 execute(Int32 d, const DateLUTImpl & time_zone) + static inline auto execute(Int32 d, const DateLUTImpl & time_zone) { - return time_zone.toRelativeHourNum(ExtendedDayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toStableRelativeHourNum(ExtendedDayNum(d))); + else + return static_cast(time_zone.toRelativeHourNum(ExtendedDayNum(d))); } static inline UInt32 execute(UInt16 d, const DateLUTImpl & time_zone) { - return time_zone.toRelativeHourNum(DayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return time_zone.toStableRelativeHourNum(DayNum(d)); + else + return time_zone.toRelativeHourNum(DayNum(d)); } using FactorTransform = ZeroTransform; }; +template struct ToRelativeMinuteNumImpl { static constexpr auto name = "toRelativeMinuteNum"; - static inline UInt32 execute(Int64 t, const DateLUTImpl & time_zone) + static inline auto execute(Int64 t, const DateLUTImpl & time_zone) { - return time_zone.toRelativeMinuteNum(t); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeMinuteNum(t)); + else + return static_cast(time_zone.toRelativeMinuteNum(t)); } static inline UInt32 execute(UInt32 t, const DateLUTImpl & time_zone) { return time_zone.toRelativeMinuteNum(static_cast(t)); } - static inline UInt32 execute(Int32 d, const DateLUTImpl & time_zone) + static inline auto execute(Int32 d, const DateLUTImpl & time_zone) { - return time_zone.toRelativeMinuteNum(ExtendedDayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.toRelativeMinuteNum(ExtendedDayNum(d))); + else + return static_cast(time_zone.toRelativeMinuteNum(ExtendedDayNum(d))); } static inline UInt32 execute(UInt16 d, const DateLUTImpl & time_zone) { @@ -1203,6 +1243,7 @@ struct ToRelativeMinuteNumImpl using FactorTransform = ZeroTransform; }; +template struct ToRelativeSecondNumImpl { static constexpr auto name = "toRelativeSecondNum"; @@ -1215,9 +1256,12 @@ struct ToRelativeSecondNumImpl { return t; } - static inline UInt32 execute(Int32 d, const DateLUTImpl & time_zone) + static inline auto execute(Int32 d, const DateLUTImpl & time_zone) { - return time_zone.fromDayNum(ExtendedDayNum(d)); + if constexpr (precision_ == ResultPrecision::Extended) + return static_cast(time_zone.fromDayNum(ExtendedDayNum(d))); + else + return static_cast(time_zone.fromDayNum(ExtendedDayNum(d))); } static inline UInt32 execute(UInt16 d, const DateLUTImpl & time_zone) { diff --git a/src/Functions/FunctionsConversion.h 
b/src/Functions/FunctionsConversion.h index 8cbe3b0e532..f3c9f46097f 100644 --- a/src/Functions/FunctionsConversion.h +++ b/src/Functions/FunctionsConversion.h @@ -302,11 +302,6 @@ struct ConvertImpl } }; -/** Conversion of Date32 to Date: check bounds. - */ -template struct ConvertImpl - : DateTimeTransformImpl {}; - /** Conversion of DateTime to Date: throw off time component. */ template struct ConvertImpl @@ -325,17 +320,12 @@ struct ToDateTimeImpl static UInt32 execute(UInt16 d, const DateLUTImpl & time_zone) { - auto date_time = time_zone.fromDayNum(ExtendedDayNum(d)); - return date_time <= 0xffffffff ? UInt32(date_time) : UInt32(0xffffffff); + return time_zone.fromDayNum(DayNum(d)); } - static UInt32 execute(Int32 d, const DateLUTImpl & time_zone) + static Int64 execute(Int32 d, const DateLUTImpl & time_zone) { - if (d < 0) - return 0; - - auto date_time = time_zone.fromDayNum(ExtendedDayNum(d)); - return date_time <= 0xffffffff ? date_time : 0xffffffff; + return time_zone.fromDayNum(ExtendedDayNum(d)); } static UInt32 execute(UInt32 dt, const DateLUTImpl & /*time_zone*/) @@ -343,21 +333,10 @@ struct ToDateTimeImpl return dt; } - static UInt32 execute(Int64 d, const DateLUTImpl & time_zone) + // TODO: return UInt32 ??? + static Int64 execute(Int64 dt64, const DateLUTImpl & /*time_zone*/) { - if (d < 0) - return 0; - - auto date_time = time_zone.toDate(d); - return date_time <= 0xffffffff ? date_time : 0xffffffff; - } - - static UInt32 execute(const DecimalUtils::DecimalComponents & t, const DateLUTImpl & /*time_zone*/) - { - if (t.whole < 0 || (t.whole >= 0 && t.fractional < 0)) - return 0; - - return std::min(t.whole, Int64(0xFFFFFFFF)); + return dt64; } }; @@ -377,12 +356,9 @@ struct ToDateTransform32Or64 static NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl & time_zone) { // since converting to Date, no need in values outside of default LUT range. - if (from < 0) - return 0; - return (from < DATE_LUT_MAX_DAY_NUM) ? from - : std::min(Int32(time_zone.toDayNum(from)), Int32(DATE_LUT_MAX_DAY_NUM)); + : time_zone.toDayNum(std::min(time_t(from), time_t(0xFFFFFFFF))); } }; @@ -397,14 +373,9 @@ struct ToDateTransform32Or64Signed /// The function should be monotonic (better for query optimizations), so we saturate instead of overflow. if (from < 0) return 0; - - auto day_num = time_zone.toDayNum(ExtendedDayNum(static_cast(from))); - return day_num < DATE_LUT_MAX_DAY_NUM ? day_num : DATE_LUT_MAX_DAY_NUM; - return (from < DATE_LUT_MAX_DAY_NUM) - ? from - : std::min(Int32(time_zone.toDayNum(static_cast(from))), Int32(0xFFFFFFFF)); - + ? static_cast(from) + : time_zone.toDayNum(std::min(time_t(from), time_t(0xFFFFFFFF))); } }; @@ -435,7 +406,7 @@ struct ToDate32Transform32Or64 { return (from < DATE_LUT_MAX_EXTEND_DAY_NUM) ? from - : std::min(Int32(time_zone.toDayNum(from)), Int32(DATE_LUT_MAX_EXTEND_DAY_NUM)); + : time_zone.toDayNum(std::min(time_t(from), time_t(0xFFFFFFFF))); } }; @@ -451,7 +422,7 @@ struct ToDate32Transform32Or64Signed return daynum_min_offset; return (from < DATE_LUT_MAX_EXTEND_DAY_NUM) ? 
static_cast(from) - : time_zone.toDayNum(std::min(Int64(from), Int64(0xFFFFFFFF))); + : time_zone.toDayNum(std::min(time_t(Int64(from)), time_t(0xFFFFFFFF))); } }; @@ -477,49 +448,35 @@ struct ToDate32Transform8Or16Signed */ template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; - template struct ConvertImpl : DateTimeTransformImpl> {}; @@ -531,7 +488,7 @@ struct ToDateTimeTransform64 static NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl &) { - return std::min(Int64(from), Int64(0xFFFFFFFF)); + return std::min(time_t(from), time_t(0xFFFFFFFF)); } }; @@ -553,12 +510,11 @@ struct ToDateTimeTransform64Signed { static constexpr auto name = "toDateTime"; - static NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl & /* time_zone */) + static NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl &) { if (from < 0) return 0; - - return std::min(Int64(from), Int64(0xFFFFFFFF)); + return std::min(time_t(from), time_t(0xFFFFFFFF)); } }; @@ -678,6 +634,8 @@ struct FromDateTime64Transform } }; +/** Conversion of DateTime64 to Date or DateTime: discards fractional part. + */ template struct ConvertImpl : DateTimeTransformImpl> {}; template struct ConvertImpl @@ -701,7 +659,7 @@ struct ToDateTime64Transform DateTime64::NativeType execute(Int32 d, const DateLUTImpl & time_zone) const { - const auto dt = time_zone.fromDayNum(ExtendedDayNum(d)); + const auto dt = ToDateTimeImpl::execute(d, time_zone); return DecimalUtils::decimalFromComponentsWithMultiplier(dt, 0, scale_multiplier); } @@ -1855,7 +1813,7 @@ private: { /// Account for optional timezone argument. if (arguments.size() != 2 && arguments.size() != 3) - throw Exception{"Function " + getName() + " expects 2 or 3 arguments for DateTime64.", + throw Exception{"Function " + getName() + " expects 2 or 3 arguments for DataTypeDateTime64.", ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION}; } else if (arguments.size() != 2) diff --git a/src/Functions/GregorianDate.h b/src/Functions/GregorianDate.h index ef2b9e6eede..332069e45ed 100644 --- a/src/Functions/GregorianDate.h +++ b/src/Functions/GregorianDate.h @@ -38,7 +38,7 @@ namespace DB * integral type which should be at least 32 bits wide, and * should preferably signed. */ - explicit GregorianDate(is_integer auto mjd); + explicit GregorianDate(is_integer auto modified_julian_day); /** Convert to Modified Julian Day. The type T is an integral type * which should be at least 32 bits wide, and should preferably @@ -89,7 +89,8 @@ namespace DB * integral type which should be at least 32 bits wide, and * should preferably signed. 
*/ - explicit OrdinalDate(is_integer auto mjd); + template + explicit OrdinalDate(DayT modified_julian_day); /** Convert to Modified Julian Day. The type T is an integral * type which should be at least 32 bits wide, and should @@ -257,9 +258,9 @@ namespace DB } template - GregorianDate::GregorianDate(is_integer auto mjd) + GregorianDate::GregorianDate(is_integer auto modified_julian_day) { - const OrdinalDate ord(mjd); + const OrdinalDate ord(modified_julian_day); const MonthDay md(gd::is_leap_year(ord.year()), ord.dayOfYear()); year_ = ord.year(); month_ = md.month(); @@ -329,9 +330,24 @@ namespace DB } template - OrdinalDate::OrdinalDate(is_integer auto mjd) + template + OrdinalDate::OrdinalDate(DayT modified_julian_day) { - const auto a = mjd + 678575; + /// This function supports day number from -678941 to 2973119 (which represent 0000-01-01 and 9999-12-31 respectively). + + if constexpr (is_signed_v && std::numeric_limits::lowest() < -678941) + if (modified_julian_day < -678941) + throw Exception( + ErrorCodes::CANNOT_FORMAT_DATETIME, + "Value cannot be represented as date because it's out of range"); + + if constexpr (std::numeric_limits::max() > 2973119) + if (modified_julian_day > 2973119) + throw Exception( + ErrorCodes::CANNOT_FORMAT_DATETIME, + "Value cannot be represented as date because it's out of range"); + + const auto a = modified_julian_day + 678575; const auto quad_cent = gd::div(a, 146097); const auto b = gd::mod(a, 146097); const auto cent = gd::min(gd::div(b, 36524), 3); @@ -339,8 +355,9 @@ namespace DB const auto quad = gd::div(c, 1461); const auto d = gd::mod(c, 1461); const auto y = gd::min(gd::div(d, 365), 3); + day_of_year_ = d - y * 365 + 1; - year_ = quad_cent * 400 + cent * 100 + quad * 4 + y + 1; + year_ = quad_cent * 400 + cent * 100 + quad * 4 + y + 1; } template diff --git a/src/Functions/dateDiff.cpp b/src/Functions/dateDiff.cpp index b8bf3c11698..b33fcf32de1 100644 --- a/src/Functions/dateDiff.cpp +++ b/src/Functions/dateDiff.cpp @@ -61,25 +61,30 @@ public: DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override { if (arguments.size() != 3 && arguments.size() != 4) - throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " - + toString(arguments.size()) + ", should be 3 or 4", - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, + "Number of arguments for function {} doesn't match: passed {}, should be 3 or 4", + getName(), arguments.size()); if (!isString(arguments[0])) - throw Exception("First argument for function " + getName() + " (unit) must be String", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "First argument for function {} (unit) must be String", + getName()); - if (!isDate(arguments[1]) && !isDateTime(arguments[1]) && !isDateTime64(arguments[1])) - throw Exception("Second argument for function " + getName() + " must be Date or DateTime", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + if (!isDate(arguments[1]) && !isDate32(arguments[1]) && !isDateTime(arguments[1]) && !isDateTime64(arguments[1])) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Second argument for function {} must be Date, Date32, DateTime or DateTime64", + getName()); - if (!isDate(arguments[2]) && !isDateTime(arguments[2]) && !isDateTime64(arguments[2])) - throw Exception("Third argument for function " + getName() + " must be Date or DateTime", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + if 
(!isDate(arguments[2]) && !isDate32(arguments[2]) && !isDateTime(arguments[2]) && !isDateTime64(arguments[2])) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Third argument for function {} must be Date, Date32, DateTime or DateTime64", + getName() + ); if (arguments.size() == 4 && !isString(arguments[3])) - throw Exception("Fourth argument for function " + getName() + " (timezone) must be String", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Fourth argument for function {} (timezone) must be String", + getName()); return std::make_shared(); } @@ -91,7 +96,9 @@ public: { const auto * unit_column = checkAndGetColumnConst(arguments[0].column.get()); if (!unit_column) - throw Exception("First argument for function " + getName() + " must be constant String", ErrorCodes::ILLEGAL_COLUMN); + throw Exception(ErrorCodes::ILLEGAL_COLUMN, + "First argument for function {} must be constant String", + getName()); String unit = Poco::toLower(unit_column->getValue()); @@ -105,23 +112,24 @@ public: const auto & timezone_y = extractTimeZoneFromFunctionArguments(arguments, 3, 2); if (unit == "year" || unit == "yy" || unit == "yyyy") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else if (unit == "quarter" || unit == "qq" || unit == "q") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else if (unit == "month" || unit == "mm" || unit == "m") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else if (unit == "week" || unit == "wk" || unit == "ww") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else if (unit == "day" || unit == "dd" || unit == "d") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else if (unit == "hour" || unit == "hh" || unit == "h") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else if (unit == "minute" || unit == "mi" || unit == "n") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else if (unit == "second" || unit == "ss" || unit == "s") - dispatchForColumns(x, y, timezone_x, timezone_y, res->getData()); + dispatchForColumns>(x, y, timezone_x, timezone_y, res->getData()); else - throw Exception("Function " + getName() + " does not support '" + unit + "' unit", ErrorCodes::BAD_ARGUMENTS); + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Function {} does not support '{}' unit", getName(), unit); return res; } @@ -137,16 +145,22 @@ private: dispatchForSecondColumn(*x_vec_16, y, timezone_x, timezone_y, result); else if (const auto * x_vec_32 = checkAndGetColumn(&x)) dispatchForSecondColumn(*x_vec_32, y, timezone_x, timezone_y, result); + else if (const auto * x_vec_32_s = checkAndGetColumn(&x)) + dispatchForSecondColumn(*x_vec_32_s, y, timezone_x, timezone_y, result); else if (const auto * x_vec_64 = checkAndGetColumn(&x)) dispatchForSecondColumn(*x_vec_64, y, timezone_x, timezone_y, result); else if (const auto * x_const_16 = checkAndGetColumnConst(&x)) dispatchConstForSecondColumn(x_const_16->getValue(), y, 
timezone_x, timezone_y, result); else if (const auto * x_const_32 = checkAndGetColumnConst(&x)) dispatchConstForSecondColumn(x_const_32->getValue(), y, timezone_x, timezone_y, result); + else if (const auto * x_const_32_s = checkAndGetColumnConst(&x)) + dispatchConstForSecondColumn(x_const_32_s->getValue(), y, timezone_x, timezone_y, result); else if (const auto * x_const_64 = checkAndGetColumnConst(&x)) dispatchConstForSecondColumn(x_const_64->getValue>(), y, timezone_x, timezone_y, result); else - throw Exception("Illegal column for first argument of function " + getName() + ", must be Date, DateTime or DateTime64", ErrorCodes::ILLEGAL_COLUMN); + throw Exception(ErrorCodes::ILLEGAL_COLUMN, + "Illegal column for first argument of function {}, must be Date, Date32, DateTime or DateTime64", + getName()); } template @@ -159,16 +173,22 @@ private: vectorVector(x, *y_vec_16, timezone_x, timezone_y, result); else if (const auto * y_vec_32 = checkAndGetColumn(&y)) vectorVector(x, *y_vec_32, timezone_x, timezone_y, result); + else if (const auto * y_vec_32_s = checkAndGetColumn(&y)) + vectorVector(x, *y_vec_32_s, timezone_x, timezone_y, result); else if (const auto * y_vec_64 = checkAndGetColumn(&y)) vectorVector(x, *y_vec_64, timezone_x, timezone_y, result); else if (const auto * y_const_16 = checkAndGetColumnConst(&y)) vectorConstant(x, y_const_16->getValue(), timezone_x, timezone_y, result); else if (const auto * y_const_32 = checkAndGetColumnConst(&y)) vectorConstant(x, y_const_32->getValue(), timezone_x, timezone_y, result); + else if (const auto * y_const_32_s = checkAndGetColumnConst(&y)) + vectorConstant(x, y_const_32_s->getValue(), timezone_x, timezone_y, result); else if (const auto * y_const_64 = checkAndGetColumnConst(&y)) vectorConstant(x, y_const_64->getValue>(), timezone_x, timezone_y, result); else - throw Exception("Illegal column for second argument of function " + getName() + ", must be Date or DateTime", ErrorCodes::ILLEGAL_COLUMN); + throw Exception(ErrorCodes::ILLEGAL_COLUMN, + "Illegal column for second argument of function {}, must be Date, Date32, DateTime or DateTime64", + getName()); } template @@ -181,10 +201,14 @@ private: constantVector(x, *y_vec_16, timezone_x, timezone_y, result); else if (const auto * y_vec_32 = checkAndGetColumn(&y)) constantVector(x, *y_vec_32, timezone_x, timezone_y, result); + else if (const auto * y_vec_32_s = checkAndGetColumn(&y)) + constantVector(x, *y_vec_32_s, timezone_x, timezone_y, result); else if (const auto * y_vec_64 = checkAndGetColumn(&y)) constantVector(x, *y_vec_64, timezone_x, timezone_y, result); else - throw Exception("Illegal column for second argument of function " + getName() + ", must be Date or DateTime", ErrorCodes::ILLEGAL_COLUMN); + throw Exception(ErrorCodes::ILLEGAL_COLUMN, + "Illegal column for second argument of function {}, must be Date, Date32, DateTime or DateTime64", + getName()); } template diff --git a/src/Functions/dateName.cpp b/src/Functions/dateName.cpp index 3911b1cf838..36c0be49190 100644 --- a/src/Functions/dateName.cpp +++ b/src/Functions/dateName.cpp @@ -4,6 +4,7 @@ #include #include +#include #include #include #include @@ -34,6 +35,11 @@ template <> struct DataTypeToTimeTypeMap using TimeType = UInt16; }; +template <> struct DataTypeToTimeTypeMap +{ + using TimeType = Int32; +}; + template <> struct DataTypeToTimeTypeMap { using TimeType = UInt32; @@ -72,7 +78,7 @@ public: ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Number of arguments for function {} doesn't match: passed {}", getName(), - 
toString(arguments.size())); + arguments.size()); if (!WhichDataType(arguments[0].type).isString()) throw Exception( @@ -83,7 +89,7 @@ WhichDataType first_argument_type(arguments[1].type); - if (!(first_argument_type.isDate() || first_argument_type.isDateTime() || first_argument_type.isDateTime64())) + if (!(first_argument_type.isDate() || first_argument_type.isDateTime() || first_argument_type.isDate32() || first_argument_type.isDateTime64())) throw Exception( ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of the second argument of function {}. Must be a date or a date with time", @@ -108,6 +114,7 @@ ColumnPtr res; if (!((res = executeType(arguments, result_type)) + || (res = executeType(arguments, result_type)) || (res = executeType(arguments, result_type)) || (res = executeType(arguments, result_type)))) throw Exception( diff --git a/src/Functions/randDistribution.cpp b/src/Functions/randDistribution.cpp new file mode 100644 index 00000000000..94dad4fdc89 --- /dev/null +++ b/src/Functions/randDistribution.cpp @@ -0,0 +1,472 @@ +#include +#include +#include +#include "Common/Exception.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int ILLEGAL_COLUMN; + extern const int BAD_ARGUMENTS; + extern const int LOGICAL_ERROR; +} + +namespace +{ +struct UniformDistribution +{ + using ReturnType = DataTypeFloat64; + static constexpr const char * getName() { return "randUniform"; } + static constexpr size_t getNumberOfArguments() { return 2; } + + static void generate(Float64 min, Float64 max, ColumnFloat64::Container & container) + { + auto distribution = std::uniform_real_distribution<>(min, max); + for (auto & elem : container) + elem = distribution(thread_local_rng); + } +}; + +struct NormalDistribution +{ + using ReturnType = DataTypeFloat64; + static constexpr const char * getName() { return "randNormal"; } + static constexpr size_t getNumberOfArguments() { return 2; } + + static void generate(Float64 mean, Float64 variance, ColumnFloat64::Container & container) + { + auto distribution = std::normal_distribution<>(mean, variance); + for (auto & elem : container) + elem = distribution(thread_local_rng); + } +}; + +struct LogNormalDistribution +{ + using ReturnType = DataTypeFloat64; + static constexpr const char * getName() { return "randLogNormal"; } + static constexpr size_t getNumberOfArguments() { return 2; } + + static void generate(Float64 mean, Float64 variance, ColumnFloat64::Container & container) + { + auto distribution = std::lognormal_distribution<>(mean, variance); + for (auto & elem : container) + elem = distribution(thread_local_rng); + } +}; + +struct ExponentialDistribution +{ + using ReturnType = DataTypeFloat64; + static constexpr const char * getName() { return "randExponential"; } + static constexpr size_t getNumberOfArguments() { return 1; } + + static void generate(Float64 lambda, ColumnFloat64::Container & container) + { + auto distribution = std::exponential_distribution<>(lambda); + for (auto & elem : container) + elem = distribution(thread_local_rng); + } +}; + +struct ChiSquaredDistribution +{ + using ReturnType = DataTypeFloat64; + static constexpr const char * getName() { return "randChiSquared"; } + static constexpr size_t getNumberOfArguments() { return 1; } + + static void generate(Float64 degree_of_freedom, ColumnFloat64::Container & container) + { + auto distribution = 
std::chi_squared_distribution<>(degree_of_freedom); + for (auto & elem : container) + elem = distribution(thread_local_rng); + } +}; + +struct StudentTDistribution +{ + using ReturnType = DataTypeFloat64; + static constexpr const char * getName() { return "randStudentT"; } + static constexpr size_t getNumberOfArguments() { return 1; } + + static void generate(Float64 degree_of_freedom, ColumnFloat64::Container & container) + { + auto distribution = std::student_t_distribution<>(degree_of_freedom); + for (auto & elem : container) + elem = distribution(thread_local_rng); + } +}; + +struct FisherFDistribution +{ + using ReturnType = DataTypeFloat64; + static constexpr const char * getName() { return "randFisherF"; } + static constexpr size_t getNumberOfArguments() { return 2; } + + static void generate(Float64 d1, Float64 d2, ColumnFloat64::Container & container) + { + auto distribution = std::fisher_f_distribution<>(d1, d2); + for (auto & elem : container) + elem = distribution(thread_local_rng); + } +}; + +struct BernoulliDistribution +{ + using ReturnType = DataTypeUInt8; + static constexpr const char * getName() { return "randBernoulli"; } + static constexpr size_t getNumberOfArguments() { return 1; } + + static void generate(Float64 p, ColumnUInt8::Container & container) + { + if (p < 0.0f || p > 1.0f) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of function {} should be inside [0, 1] because it is a probability", getName()); + + auto distribution = std::bernoulli_distribution(p); + for (auto & elem : container) + elem = static_cast(distribution(thread_local_rng)); + } +}; + +struct BinomialDistribution +{ + using ReturnType = DataTypeUInt64; + static constexpr const char * getName() { return "randBinomial"; } + static constexpr size_t getNumberOfArguments() { return 2; } + + static void generate(UInt64 t, Float64 p, ColumnUInt64::Container & container) + { + if (p < 0.0f || p > 1.0f) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of function {} should be inside [0, 1] because it is a probability", getName()); + + auto distribution = std::binomial_distribution(t, p); + for (auto & elem : container) + elem = static_cast(distribution(thread_local_rng)); + } +}; + +struct NegativeBinomialDistribution +{ + using ReturnType = DataTypeUInt64; + static constexpr const char * getName() { return "randNegativeBinomial"; } + static constexpr size_t getNumberOfArguments() { return 2; } + + static void generate(UInt64 t, Float64 p, ColumnUInt64::Container & container) + { + if (p < 0.0f || p > 1.0f) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of function {} should be inside [0, 1] because it is a probability", getName()); + + auto distribution = std::negative_binomial_distribution(t, p); + for (auto & elem : container) + elem = static_cast(distribution(thread_local_rng)); + } +}; + +struct PoissonDistribution +{ + using ReturnType = DataTypeUInt64; + static constexpr const char * getName() { return "randPoisson"; } + static constexpr size_t getNumberOfArguments() { return 1; } + + static void generate(UInt64 n, ColumnUInt64::Container & container) + { + auto distribution = std::poisson_distribution(n); + for (auto & elem : container) + elem = static_cast(distribution(thread_local_rng)); + } +}; + +} + +/** Function which will generate values according to the specified distribution + * Accepts only constant arguments + * Similar to the functions rand and rand64 an additional 'tag' argument could be added to the + * end of arguments list (this argument will be ignored) 
which will guarantee that functions are not stuck together + * during optimisations. + * Example: SELECT randNormal(0, 1, 1), randNormal(0, 1, 2) FROM numbers(10) + * This query will return two different columns + */ +template +class FunctionRandomDistribution : public IFunction +{ +private: + + template + ResultType getParameterFromConstColumn(size_t parameter_number, const ColumnsWithTypeAndName & arguments) const + { + if (parameter_number >= arguments.size()) + throw Exception( + ErrorCodes::LOGICAL_ERROR, "Parameter number ({}) is greater than the size of arguments ({}). This is a bug", parameter_number, arguments.size()); + + const IColumn * col = arguments[parameter_number].column.get(); + + if (!isColumnConst(*col)) + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Parameter number {} of function {} must be constant.", parameter_number, getName()); + + auto parameter = applyVisitor(FieldVisitorConvertToNumber(), assert_cast(*col).getField()); + + if (isNaN(parameter) || !std::isfinite(parameter)) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Parameter number {} of function {} cannot be NaN or infinite", parameter_number, getName()); + + return parameter; + } + +public: + static FunctionPtr create(ContextPtr) + { + return std::make_shared>(); + } + + static constexpr auto name = Distribution::getName(); + String getName() const override { return name; } + size_t getNumberOfArguments() const override { return Distribution::getNumberOfArguments(); } + bool isVariadic() const override { return true; } + bool isDeterministic() const override { return false; } + bool isDeterministicInScopeOfQuery() const override { return false; } + bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + auto desired = Distribution::getNumberOfArguments(); + if (arguments.size() != desired && arguments.size() != desired + 1) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Wrong number of arguments for function {}. 
Should be {} or {}", getName(), desired, desired + 1); + + for (size_t i = 0; i < Distribution::getNumberOfArguments(); ++i) + { + const auto & type = arguments[i]; + WhichDataType which(type); + if (!which.isFloat() && !which.isNativeUInt()) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Illegal type {} of argument of function {}, expected Float64 or integer", type->getName(), getName()); + } + + return std::make_shared(); + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override + { + if constexpr (std::is_same_v) + { + auto res_column = ColumnUInt8::create(input_rows_count); + auto & res_data = res_column->getData(); + Distribution::generate(getParameterFromConstColumn(0, arguments), res_data); + return res_column; + } + else if constexpr (std::is_same_v || std::is_same_v) + { + auto res_column = ColumnUInt64::create(input_rows_count); + auto & res_data = res_column->getData(); + Distribution::generate(getParameterFromConstColumn(0, arguments), getParameterFromConstColumn(1, arguments), res_data); + return res_column; + } + else if constexpr (std::is_same_v) + { + auto res_column = ColumnUInt64::create(input_rows_count); + auto & res_data = res_column->getData(); + Distribution::generate(getParameterFromConstColumn(0, arguments), res_data); + return res_column; + } + else + { + auto res_column = ColumnFloat64::create(input_rows_count); + auto & res_data = res_column->getData(); + if constexpr (Distribution::getNumberOfArguments() == 1) + { + Distribution::generate(getParameterFromConstColumn(0, arguments), res_data); + } + else if constexpr (Distribution::getNumberOfArguments() == 2) + { + Distribution::generate(getParameterFromConstColumn(0, arguments), getParameterFromConstColumn(1, arguments), res_data); + } + else + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, "More than two argument specified for function {}", getName()); + } + + return res_column; + } + } +}; + + +REGISTER_FUNCTION(Distribution) +{ + factory.registerFunction>( + { + R"( +Returns a random number from the uniform distribution in the specified range. +Accepts two parameters - minimum bound and maximum bound. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randUniform(0, 1) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + factory.registerFunction>( + { + R"( +Returns a random number from the normal distribution. +Accepts two parameters - mean and variance. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randNormal(0, 5) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the lognormal distribution (a distribution of a random variable whose logarithm is normally distributed). +Accepts two parameters - mean and variance. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randLogNormal(0, 5) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the exponential distribution. +Accepts one parameter. 
+ +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randExponential(5) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the chi-squared distribution (a distribution of a sum of the squares of k independent standard normal random variables). +Accepts one parameter - degree of freedom. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randChiSquared(5) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + factory.registerFunction>( + { + R"( +Returns a random number from the t-distribution. +Accepts one parameter - degree of freedom. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randStudentT(5) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the F-distribution. +The F-distribution is the distribution of X = (S1 / d1) / (S2 / d2) where d1 and d2 are degrees of freedom. +Accepts two parameters - degrees of freedom. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randFisherF(10, 3) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the Bernoulli distribution. +Accepts one parameter - probability of success. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randBernoulli(0.1) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the binomial distribution. +Accepts two parameters - number of experiments and probability of success in each experiment. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randBinomial(10, 0.1) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the negative binomial distribution. +Accepts two parameters - number of experiments and probability of success in each experiment. + +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randNegativeBinomial(10, 0.1) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); + + + factory.registerFunction>( + { + R"( +Returns a random number from the Poisson distribution. +Accepts one parameter - the mean number of occurrences. 
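+For a Poisson distribution the mean equals the variance, so for example randPoisson(10) produces non-negative integers whose mean and variance are both 10.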
+ +Typical usage: +[example:typical] +)", + Documentation::Examples{ + {"typical", "SELECT randPoisson(3) FROM numbers(100000);"}}, + Documentation::Categories{"Distribution"} + }); +} + +} diff --git a/src/Functions/timeSlots.cpp b/src/Functions/timeSlots.cpp index 949ca7bc0e4..e986e32d76f 100644 --- a/src/Functions/timeSlots.cpp +++ b/src/Functions/timeSlots.cpp @@ -19,6 +19,7 @@ namespace ErrorCodes extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int ILLEGAL_TYPE_OF_ARGUMENT; extern const int ILLEGAL_COLUMN; + extern const int BAD_ARGUMENTS; } namespace @@ -41,6 +42,9 @@ struct TimeSlotsImpl const PaddedPODArray & starts, const PaddedPODArray & durations, UInt32 time_slot_size, PaddedPODArray & result_values, ColumnArray::Offsets & result_offsets) { + if (time_slot_size == 0) + throw Exception("Time slot size cannot be zero", ErrorCodes::BAD_ARGUMENTS); + size_t size = starts.size(); result_offsets.resize(size); @@ -63,6 +67,9 @@ struct TimeSlotsImpl const PaddedPODArray & starts, UInt32 duration, UInt32 time_slot_size, PaddedPODArray & result_values, ColumnArray::Offsets & result_offsets) { + if (time_slot_size == 0) + throw Exception("Time slot size cannot be zero", ErrorCodes::BAD_ARGUMENTS); + size_t size = starts.size(); result_offsets.resize(size); @@ -85,6 +92,9 @@ struct TimeSlotsImpl UInt32 start, const PaddedPODArray & durations, UInt32 time_slot_size, PaddedPODArray & result_values, ColumnArray::Offsets & result_offsets) { + if (time_slot_size == 0) + throw Exception("Time slot size cannot be zero", ErrorCodes::BAD_ARGUMENTS); + size_t size = durations.size(); result_offsets.resize(size); @@ -125,6 +135,9 @@ struct TimeSlotsImpl ColumnArray::Offset current_offset = 0; time_slot_size = time_slot_size.value * ts_multiplier; + if (time_slot_size == 0) + throw Exception("Time slot size cannot be zero", ErrorCodes::BAD_ARGUMENTS); + for (size_t i = 0; i < size; ++i) { for (DateTime64 value = (starts[i] * dt_multiplier) / time_slot_size, end = (starts[i] * dt_multiplier + durations[i] * dur_multiplier) / time_slot_size; value <= end; value += 1) @@ -155,6 +168,9 @@ struct TimeSlotsImpl ColumnArray::Offset current_offset = 0; duration = duration * dur_multiplier; time_slot_size = time_slot_size.value * ts_multiplier; + if (time_slot_size == 0) + throw Exception("Time slot size cannot be zero", ErrorCodes::BAD_ARGUMENTS); + for (size_t i = 0; i < size; ++i) { for (DateTime64 value = (starts[i] * dt_multiplier) / time_slot_size, end = (starts[i] * dt_multiplier + duration) / time_slot_size; value <= end; value += 1) @@ -185,6 +201,9 @@ struct TimeSlotsImpl ColumnArray::Offset current_offset = 0; start = dt_multiplier * start; time_slot_size = time_slot_size.value * ts_multiplier; + if (time_slot_size == 0) + throw Exception("Time slot size cannot be zero", ErrorCodes::BAD_ARGUMENTS); + for (size_t i = 0; i < size; ++i) { for (DateTime64 value = start / time_slot_size, end = (start + durations[i] * dur_multiplier) / time_slot_size; value <= end; value += 1) diff --git a/src/Functions/toRelativeDayNum.cpp b/src/Functions/toRelativeDayNum.cpp index 241104493cd..db3eb119dcf 100644 --- a/src/Functions/toRelativeDayNum.cpp +++ b/src/Functions/toRelativeDayNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeDayNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeDayNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeDayNum) { diff --git a/src/Functions/toRelativeHourNum.cpp b/src/Functions/toRelativeHourNum.cpp index 
2404d73c450..838b1bb1ca1 100644 --- a/src/Functions/toRelativeHourNum.cpp +++ b/src/Functions/toRelativeHourNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeHourNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeHourNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeHourNum) { diff --git a/src/Functions/toRelativeMinuteNum.cpp b/src/Functions/toRelativeMinuteNum.cpp index a5ecada1e92..e9318517119 100644 --- a/src/Functions/toRelativeMinuteNum.cpp +++ b/src/Functions/toRelativeMinuteNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeMinuteNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeMinuteNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeMinuteNum) { diff --git a/src/Functions/toRelativeMonthNum.cpp b/src/Functions/toRelativeMonthNum.cpp index 8f46e04e483..7b058c3ba12 100644 --- a/src/Functions/toRelativeMonthNum.cpp +++ b/src/Functions/toRelativeMonthNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeMonthNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeMonthNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeMonthNum) { diff --git a/src/Functions/toRelativeQuarterNum.cpp b/src/Functions/toRelativeQuarterNum.cpp index 8ea0c42ef09..c7702d47f42 100644 --- a/src/Functions/toRelativeQuarterNum.cpp +++ b/src/Functions/toRelativeQuarterNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeQuarterNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeQuarterNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeQuarterNum) { diff --git a/src/Functions/toRelativeSecondNum.cpp b/src/Functions/toRelativeSecondNum.cpp index 7af41ab8334..db80f721fbd 100644 --- a/src/Functions/toRelativeSecondNum.cpp +++ b/src/Functions/toRelativeSecondNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeSecondNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeSecondNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeSecondNum) { diff --git a/src/Functions/toRelativeWeekNum.cpp b/src/Functions/toRelativeWeekNum.cpp index fe7aec3fd9a..beca00d8cc4 100644 --- a/src/Functions/toRelativeWeekNum.cpp +++ b/src/Functions/toRelativeWeekNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeWeekNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeWeekNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeWeekNum) { diff --git a/src/Functions/toRelativeYearNum.cpp b/src/Functions/toRelativeYearNum.cpp index 4574d8513e0..b4fe3318129 100644 --- a/src/Functions/toRelativeYearNum.cpp +++ b/src/Functions/toRelativeYearNum.cpp @@ -7,7 +7,7 @@ namespace DB { -using FunctionToRelativeYearNum = FunctionDateOrDateTimeToSomething; +using FunctionToRelativeYearNum = FunctionDateOrDateTimeToSomething>; REGISTER_FUNCTION(ToRelativeYearNum) { diff --git a/src/IO/S3/PocoHTTPClient.h b/src/IO/S3/PocoHTTPClient.h index 57e4369e565..5649638285d 100644 --- a/src/IO/S3/PocoHTTPClient.h +++ b/src/IO/S3/PocoHTTPClient.h @@ -2,20 +2,22 @@ #include "config.h" +#include +#include + #if USE_AWS_S3 #include #include #include #include -#include +#include #include #include #include #include - namespace Aws::Http::Standard { class StandardHttpResponse; @@ -23,6 +25,7 @@ class StandardHttpResponse; namespace DB { + class Context; } diff --git a/src/IO/S3Common.cpp b/src/IO/S3Common.cpp index df19748b493..859f5ce796b 100644 --- a/src/IO/S3Common.cpp +++ b/src/IO/S3Common.cpp @@ -1,9 +1,11 @@ 
+#include + +#include +#include #include "config.h" #if USE_AWS_S3 -# include - # include # include @@ -780,25 +782,16 @@ namespace S3 boost::to_upper(name); if (name != S3 && name != COS && name != OBS && name != OSS) - { throw Exception(ErrorCodes::BAD_ARGUMENTS, "Object storage system name is unrecognized in virtual hosted style S3 URI: {}", quoteString(name)); - } + if (name == S3) - { storage_name = name; - } else if (name == OBS) - { storage_name = OBS; - } else if (name == OSS) - { storage_name = OSS; - } else - { storage_name = COSN; - } } else if (re2::RE2::PartialMatch(uri.getPath(), path_style_pattern, &bucket, &key)) { @@ -851,8 +844,82 @@ namespace S3 { return getObjectInfo(client_ptr, bucket, key, version_id, throw_on_error, for_disk_s3).size; } + } } #endif + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int INVALID_CONFIG_PARAMETER; +} + +namespace S3 +{ + +AuthSettings AuthSettings::loadFromConfig(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config) +{ + auto access_key_id = config.getString(config_elem + ".access_key_id", ""); + auto secret_access_key = config.getString(config_elem + ".secret_access_key", ""); + auto region = config.getString(config_elem + ".region", ""); + auto server_side_encryption_customer_key_base64 = config.getString(config_elem + ".server_side_encryption_customer_key_base64", ""); + + std::optional<bool> use_environment_credentials; + if (config.has(config_elem + ".use_environment_credentials")) + use_environment_credentials = config.getBool(config_elem + ".use_environment_credentials"); + + std::optional<bool> use_insecure_imds_request; + if (config.has(config_elem + ".use_insecure_imds_request")) + use_insecure_imds_request = config.getBool(config_elem + ".use_insecure_imds_request"); + + HeaderCollection headers; + Poco::Util::AbstractConfiguration::Keys subconfig_keys; + config.keys(config_elem, subconfig_keys); + for (const std::string & subkey : subconfig_keys) + { + if (subkey.starts_with("header")) + { + auto header_str = config.getString(config_elem + "." + subkey); + auto delimiter = header_str.find(':'); + if (delimiter == std::string::npos) + throw Exception("Malformed s3 header value", ErrorCodes::INVALID_CONFIG_PARAMETER); + headers.emplace_back(HttpHeader{header_str.substr(0, delimiter), header_str.substr(delimiter + 1, String::npos)}); + } + } + + return AuthSettings + { + std::move(access_key_id), std::move(secret_access_key), + std::move(region), + std::move(server_side_encryption_customer_key_base64), + std::move(headers), + use_environment_credentials, + use_insecure_imds_request + }; +} + + +void AuthSettings::updateFrom(const AuthSettings & from) +{ + /// Update, with an emptiness check, only the parameters which + /// can be passed not only from the config, but also via AST.
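// A usage sketch for the pair of helpers in this hunk, assuming a global "s3"
// section plus a hypothetical per-endpoint "s3.mydisk" override in the server
// configuration (`config` is any Poco::Util::AbstractConfiguration reference):
auto auth = S3::AuthSettings::loadFromConfig("s3", config);
auth.updateFrom(S3::AuthSettings::loadFromConfig("s3.mydisk", config));
// If "s3.mydisk" sets no access_key_id / secret_access_key, the global credentials
// survive; headers, region and the encryption key are overwritten unconditionally.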
+ + if (!from.access_key_id.empty()) + access_key_id = from.access_key_id; + if (!from.secret_access_key.empty()) + secret_access_key = from.secret_access_key; + + headers = from.headers; + region = from.region; + server_side_encryption_customer_key_base64 = from.server_side_encryption_customer_key_base64; + use_environment_credentials = from.use_environment_credentials; + use_insecure_imds_request = from.use_insecure_imds_request; +} + +} +} diff --git a/src/IO/S3Common.h b/src/IO/S3Common.h index 5c27b32985f..93e5eb78c7f 100644 --- a/src/IO/S3Common.h +++ b/src/IO/S3Common.h @@ -1,5 +1,11 @@ #pragma once +#include +#include + +#include +#include + #include "config.h" #if USE_AWS_S3 @@ -8,7 +14,6 @@ #include #include #include -#include #include #include @@ -27,8 +32,6 @@ namespace ErrorCodes } class RemoteHostFilter; -struct HttpHeader; -using HeaderCollection = std::vector; class S3Exception : public Exception { @@ -130,5 +133,33 @@ S3::ObjectInfo getObjectInfo(std::shared_ptr client_ptr size_t getObjectSize(std::shared_ptr client_ptr, const String & bucket, const String & key, const String & version_id, bool throw_on_error, bool for_disk_s3); } - #endif + +namespace Poco::Util +{ +class AbstractConfiguration; +}; + +namespace DB::S3 +{ + +struct AuthSettings +{ + static AuthSettings loadFromConfig(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config); + + std::string access_key_id; + std::string secret_access_key; + std::string region; + std::string server_side_encryption_customer_key_base64; + + HeaderCollection headers; + + std::optional use_environment_credentials; + std::optional use_insecure_imds_request; + + bool operator==(const AuthSettings & other) const = default; + + void updateFrom(const AuthSettings & from); +}; + +} diff --git a/src/Interpreters/Cache/FileCache.cpp b/src/Interpreters/Cache/FileCache.cpp index 20a9f6cce1d..72fa1b3c324 100644 --- a/src/Interpreters/Cache/FileCache.cpp +++ b/src/Interpreters/Cache/FileCache.cpp @@ -32,6 +32,8 @@ FileCache::FileCache( , allow_persistent_files(cache_settings_.do_not_evict_index_and_mark_files) , enable_cache_hits_threshold(cache_settings_.enable_cache_hits_threshold) , enable_filesystem_query_cache_limit(cache_settings_.enable_filesystem_query_cache_limit) + , enable_bypass_cache_with_threashold(cache_settings_.enable_bypass_cache_with_threashold) + , bypass_cache_threashold(cache_settings_.bypass_cache_threashold) , log(&Poco::Logger::get("FileCache")) , main_priority(std::make_unique()) , stash_priority(std::make_unique()) @@ -185,6 +187,20 @@ FileSegments FileCache::getImpl( /// Given range = [left, right] and non-overlapping ordered set of file segments, /// find list [segment1, ..., segmentN] of segments which intersect with given range. 
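// The hunk below short-circuits FileCache::getImpl for oversized reads: when the
// requested range exceeds bypass_cache_threashold (256 MiB by default, see
// FileCache_fwd.h further down), a single detached SKIP_CACHE segment is returned
// and nothing is admitted into the cache. The decision reduces to this predicate
// (a sketch; the real check is inlined in getImpl):
static bool shouldBypassCache(bool enabled, size_t range_size, size_t threshold)
{
    /// Bypass only when explicitly enabled and the range is strictly larger.
    return enabled && range_size > threshold;
}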
+ FileSegments result; + + if (enable_bypass_cache_with_threashold && (range.size() > bypass_cache_threashold)) + { + auto file_segment = std::make_shared( + range.left, range.size(), key, this, FileSegment::State::SKIP_CACHE, CreateFileSegmentSettings{}); + { + std::unique_lock segment_lock(file_segment->mutex); + file_segment->detachAssumeStateFinalized(segment_lock); + } + result.emplace_back(file_segment); + return result; + } + auto it = files.find(key); if (it == files.end()) return {}; @@ -197,7 +213,6 @@ FileSegments FileCache::getImpl( return {}; } - FileSegments result; auto segment_it = file_segments.lower_bound(range.left); if (segment_it == file_segments.end()) { @@ -392,7 +407,6 @@ FileSegmentsHolder FileCache::getOrSet(const Key & key, size_t offset, size_t si #endif FileSegment::Range range(offset, offset + size - 1); - /// Get all segments which intersect with the given range. auto file_segments = getImpl(key, range, cache_lock); @@ -404,7 +418,6 @@ FileSegmentsHolder FileCache::getOrSet(const Key & key, size_t offset, size_t si { fillHolesWithEmptyFileSegments(file_segments, key, range, /* fill_with_detached */false, settings, cache_lock); } - assert(!file_segments.empty()); return FileSegmentsHolder(std::move(file_segments)); } diff --git a/src/Interpreters/Cache/FileCache.h b/src/Interpreters/Cache/FileCache.h index 07aea230803..706762b6915 100644 --- a/src/Interpreters/Cache/FileCache.h +++ b/src/Interpreters/Cache/FileCache.h @@ -140,6 +140,9 @@ private: const size_t enable_cache_hits_threshold; const bool enable_filesystem_query_cache_limit; + const bool enable_bypass_cache_with_threashold; + const size_t bypass_cache_threashold; + mutable std::mutex mutex; Poco::Logger * log; diff --git a/src/Interpreters/Cache/FileCacheSettings.cpp b/src/Interpreters/Cache/FileCacheSettings.cpp index 4b8d806bb53..b13cdd2ed04 100644 --- a/src/Interpreters/Cache/FileCacheSettings.cpp +++ b/src/Interpreters/Cache/FileCacheSettings.cpp @@ -35,6 +35,13 @@ void FileCacheSettings::loadFromConfig(const Poco::Util::AbstractConfiguration & enable_filesystem_query_cache_limit = config.getUInt64(config_prefix + ".enable_filesystem_query_cache_limit", false); enable_cache_hits_threshold = config.getUInt64(config_prefix + ".enable_cache_hits_threshold", REMOTE_FS_OBJECTS_CACHE_ENABLE_HITS_THRESHOLD); + enable_bypass_cache_with_threashold = config.getUInt64(config_prefix + ".enable_bypass_cache_with_threashold", false); + + if (config.has(config_prefix + ".bypass_cache_threashold")) + bypass_cache_threashold = parseWithSizeSuffix(config.getString(config_prefix + ".bypass_cache_threashold")); + else + bypass_cache_threashold = REMOTE_FS_OBJECTS_CACHE_BYPASS_THRESHOLD; + do_not_evict_index_and_mark_files = config.getUInt64(config_prefix + ".do_not_evict_index_and_mark_files", false); } diff --git a/src/Interpreters/Cache/FileCacheSettings.h b/src/Interpreters/Cache/FileCacheSettings.h index c6155edad85..80f7b5fa93f 100644 --- a/src/Interpreters/Cache/FileCacheSettings.h +++ b/src/Interpreters/Cache/FileCacheSettings.h @@ -20,6 +20,9 @@ struct FileCacheSettings bool do_not_evict_index_and_mark_files = true; + bool enable_bypass_cache_with_threashold = false; + size_t bypass_cache_threashold = REMOTE_FS_OBJECTS_CACHE_BYPASS_THRESHOLD; + void loadFromConfig(const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix); }; diff --git a/src/Interpreters/Cache/FileCache_fwd.h b/src/Interpreters/Cache/FileCache_fwd.h index 25c16b4e840..72dc1144fb9 100644 --- 
a/src/Interpreters/Cache/FileCache_fwd.h +++ b/src/Interpreters/Cache/FileCache_fwd.h @@ -7,6 +7,7 @@ namespace DB static constexpr int REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_FILE_SEGMENT_SIZE = 100 * 1024 * 1024; static constexpr int REMOTE_FS_OBJECTS_CACHE_DEFAULT_MAX_ELEMENTS = 1024 * 1024; static constexpr int REMOTE_FS_OBJECTS_CACHE_ENABLE_HITS_THRESHOLD = 0; +static constexpr size_t REMOTE_FS_OBJECTS_CACHE_BYPASS_THRESHOLD = 256 * 1024 * 1024; class FileCache; using FileCachePtr = std::shared_ptr<FileCache>; diff --git a/src/Interpreters/ClusterProxy/executeQuery.cpp b/src/Interpreters/ClusterProxy/executeQuery.cpp index d974721627e..3c294dd7885 100644 --- a/src/Interpreters/ClusterProxy/executeQuery.cpp +++ b/src/Interpreters/ClusterProxy/executeQuery.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -26,7 +27,7 @@ namespace ErrorCodes namespace ClusterProxy { -ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr context, const Settings & settings, Poco::Logger * log) +ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr context, const Settings & settings, const StorageID & main_table, const SelectQueryInfo * query_info, Poco::Logger * log) { Settings new_settings = settings; new_settings.queue_max_wait_ms = Cluster::saturate(new_settings.queue_max_wait_ms, settings.max_execution_time); @@ -96,6 +97,20 @@ ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr c new_settings.limit.changed = false; } + /// The setting additional_table_filters may be applied to a Distributed table. + /// If the query is executed up to WithMergeableState on a remote shard, it is impossible to filter on the initiator. + /// We need to propagate the setting, but change the table name from the distributed table to the source table. + /// + /// Here we don't try to analyze the setting again. If query_info->additional_filter_ast is not empty, some filter was applied. + /// It's just easier to add this filter for the source table. + if (query_info && query_info->additional_filter_ast) + { + Tuple tuple; + tuple.push_back(main_table.getShortName()); + tuple.push_back(queryToString(query_info->additional_filter_ast)); + new_settings.additional_table_filters.value.push_back(std::move(tuple)); + } + auto new_context = Context::createCopy(context); new_context->setSettings(new_settings); return new_context; @@ -121,12 +136,12 @@ void executeQuery( std::vector plans; SelectStreamFactory::Shards remote_shards; - auto new_context = updateSettingsForCluster(*query_info.getCluster(), context, settings, log); + auto new_context = updateSettingsForCluster(*query_info.getCluster(), context, settings, main_table, &query_info, log); new_context->getClientInfo().distributed_depth += 1; ThrottlerPtr user_level_throttler; - if (auto * process_list_element = context->getProcessListElement()) + if (auto process_list_element = context->getProcessListElement()) user_level_throttler = process_list_element->getUserNetworkThrottler(); /// Network bandwidth limit, if needed. @@ -228,7 +243,7 @@ void executeQueryWithParallelReplicas( const Settings & settings = context->getSettingsRef(); ThrottlerPtr user_level_throttler; - if (auto * process_list_element = context->getProcessListElement()) + if (auto process_list_element = context->getProcessListElement()) user_level_throttler = process_list_element->getUserNetworkThrottler(); /// Network bandwidth limit, if needed.
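// Effect of the propagation above at the query level, with hypothetical table
// names: given a Distributed table `dist` that points at local table `src`,
//
//     SELECT count() FROM dist
//     SETTINGS additional_table_filters = {'dist' : 'x != 2'}
//
// is forwarded to every shard with the filter re-keyed to the source table,
//
//     SETTINGS additional_table_filters = {'src' : 'x != 2'}
//
// so rows are filtered on the shards even when the query runs up to
// WithMergeableState and the initiator never sees the unfiltered stream.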
diff --git a/src/Interpreters/ClusterProxy/executeQuery.h b/src/Interpreters/ClusterProxy/executeQuery.h index 1a5035015a7..ac88752ce74 100644 --- a/src/Interpreters/ClusterProxy/executeQuery.h +++ b/src/Interpreters/ClusterProxy/executeQuery.h @@ -35,7 +35,7 @@ class SelectStreamFactory; /// /// @return new Context with adjusted settings ContextMutablePtr updateSettingsForCluster( - const Cluster & cluster, ContextPtr context, const Settings & settings, Poco::Logger * log = nullptr); + const Cluster & cluster, ContextPtr context, const Settings & settings, const StorageID & main_table, const SelectQueryInfo * query_info = nullptr, Poco::Logger * log = nullptr); /// Execute a distributed query, creating a query plan, from which the query pipeline can be built. /// `stream_factory` object encapsulates the logic of creating plans for a different type of query diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index 1de56e950c6..721d701c9a2 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -1463,10 +1463,8 @@ void Context::setCurrentQueryId(const String & query_id) void Context::killCurrentQuery() { - if (process_list_elem) - { - process_list_elem->cancelQuery(true); - } + if (auto elem = process_list_elem.lock()) + elem->cancelQuery(true); } String Context::getDefaultFormat() const @@ -1707,15 +1705,15 @@ ProgressCallback Context::getProgressCallback() const } -void Context::setProcessListElement(ProcessList::Element * elem) +void Context::setProcessListElement(QueryStatusPtr elem) { /// Set to a session or query. In the session, only one query is processed at a time. Therefore, the lock is not needed. process_list_elem = elem; } -ProcessList::Element * Context::getProcessListElement() const +QueryStatusPtr Context::getProcessListElement() const { - return process_list_elem; + return process_list_elem.lock(); } diff --git a/src/Interpreters/Context.h b/src/Interpreters/Context.h index 233f4011ce3..eeb9e8da148 100644 --- a/src/Interpreters/Context.h +++ b/src/Interpreters/Context.h @@ -68,6 +68,7 @@ class MMappedFileCache; class UncompressedCache; class ProcessList; class QueryStatus; +using QueryStatusPtr = std::shared_ptr<QueryStatus>; class Macros; struct Progress; struct FileProgress; @@ -230,7 +231,7 @@ private: using FileProgressCallback = std::function; FileProgressCallback file_progress_callback; /// Callback for tracking progress of file loading. - QueryStatus * process_list_elem = nullptr; /// For tracking total resource usage for query. + std::weak_ptr<QueryStatus> process_list_elem; /// For tracking total resource usage for query. StorageID insertion_table = StorageID::createEmpty(); /// Saved insertion table in query context bool is_distributed = false; /// Whether the current context is used for a distributed query @@ -750,9 +751,9 @@ public: /** Set in executeQuery and InterpreterSelectQuery. Then it is used in QueryPipeline, * to update and monitor information about the total number of resources spent for the query. */ - void setProcessListElement(QueryStatus * elem); + void setProcessListElement(QueryStatusPtr elem); /// Can return an empty pointer if the query was not inserted into the ProcessList. - QueryStatus * getProcessListElement() const; + QueryStatusPtr getProcessListElement() const; /// List all queries.
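// The raw QueryStatus pointer in Context becomes a weak_ptr, so every consumer
// must upgrade it before dereferencing; an expired element yields an empty
// shared_ptr instead of a dangling pointer. The lock-before-use pattern, as in
// killCurrentQuery above (sketch):
if (auto elem = process_list_elem.lock())   // empty if ProcessList already dropped the query
    elem->cancelQuery(true);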
ProcessList & getProcessList(); diff --git a/src/Interpreters/DDLWorker.cpp b/src/Interpreters/DDLWorker.cpp index 8873d851de1..0e4c658a1ee 100644 --- a/src/Interpreters/DDLWorker.cpp +++ b/src/Interpreters/DDLWorker.cpp @@ -114,7 +114,7 @@ DDLWorker::DDLWorker( void DDLWorker::startup() { [[maybe_unused]] bool prev_stop_flag = stop_flag.exchange(false); - chassert(true); + chassert(prev_stop_flag); main_thread = ThreadFromGlobalPool(&DDLWorker::runMainThread, this); cleanup_thread = ThreadFromGlobalPool(&DDLWorker::runCleanupThread, this); } diff --git a/src/Interpreters/ProcessList.cpp b/src/Interpreters/ProcessList.cpp index d5194a02513..3c1ebe21c48 100644 --- a/src/Interpreters/ProcessList.cpp +++ b/src/Interpreters/ProcessList.cpp @@ -243,15 +243,15 @@ ProcessList::EntryPtr ProcessList::insert(const String & query_, const IAST * as } auto process_it = processes.emplace(processes.end(), - query_context, query_, client_info, priorities.insert(settings.priority), std::move(thread_group), query_kind); + std::make_shared(query_context, query_, client_info, priorities.insert(settings.priority), std::move(thread_group), query_kind)); increaseQueryKindAmount(query_kind); res = std::make_shared(*this, process_it); - process_it->setUserProcessList(&user_process_list); + (*process_it)->setUserProcessList(&user_process_list); - user_process_list.queries.emplace(client_info.current_query_id, &res->get()); + user_process_list.queries.emplace(client_info.current_query_id, res->getQueryStatus()); /// Track memory usage for all simultaneously running queries from single user. user_process_list.user_memory_tracker.setOrRaiseHardLimit(settings.max_memory_usage_for_user); @@ -280,11 +280,11 @@ ProcessListEntry::~ProcessListEntry() { auto lock = parent.safeLock(); - String user = it->getClientInfo().current_user; - String query_id = it->getClientInfo().current_query_id; - IAST::QueryKind query_kind = it->query_kind; + String user = (*it)->getClientInfo().current_user; + String query_id = (*it)->getClientInfo().current_query_id; + IAST::QueryKind query_kind = (*it)->query_kind; - const QueryStatus * process_list_element_ptr = &*it; + const QueryStatusPtr process_list_element_ptr = *it; auto user_process_list_it = parent.user_to_queries.find(user); if (user_process_list_it == parent.user_to_queries.end()) @@ -307,7 +307,7 @@ ProcessListEntry::~ProcessListEntry() } /// Wait for the query if it is in the cancellation right now. - parent.cancelled_cv.wait(lock.lock, [&]() { return it->is_cancelling == false; }); + parent.cancelled_cv.wait(lock.lock, [&]() { return process_list_element_ptr->is_cancelling == false; }); /// This removes the memory_tracker of one request. 
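// Why the erase below is safe under concurrent cancellation: elements are now
// shared_ptrs, so the destructor above copies one (process_list_element_ptr)
// before waiting, and the QueryStatus object outlives its removal from the list.
// Minimal illustration (names hypothetical):
std::list<QueryStatusPtr> queries;
QueryStatusPtr held = queries.front(); // a canceller's reference
queries.erase(queries.begin());        // the object survives via `held`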
parent.processes.erase(it); @@ -344,6 +344,7 @@ QueryStatus::QueryStatus( , client_info(client_info_) , thread_group(std::move(thread_group_)) , priority_handle(std::move(priority_handle_)) + , global_overcommit_tracker(context_->getGlobalOvercommitTracker()) , query_kind(query_kind_) , num_queries_increment(CurrentMetrics::Query) { @@ -360,8 +361,8 @@ QueryStatus::~QueryStatus() { if (user_process_list) user_process_list->user_overcommit_tracker.onQueryStop(memory_tracker); - if (auto shared_context = getContext()) - shared_context->getGlobalOvercommitTracker()->onQueryStop(memory_tracker); + if (global_overcommit_tracker) + global_overcommit_tracker->onQueryStop(memory_tracker); } } @@ -430,7 +431,7 @@ ThrottlerPtr QueryStatus::getUserNetworkThrottler() } -QueryStatus * ProcessList::tryGetProcessListElement(const String & current_query_id, const String & current_user) +QueryStatusPtr ProcessList::tryGetProcessListElement(const String & current_query_id, const String & current_user) { auto user_it = user_to_queries.find(current_user); if (user_it != user_to_queries.end()) @@ -442,13 +443,13 @@ QueryStatus * ProcessList::tryGetProcessListElement(const String & current_query return query_it->second; } - return nullptr; + return {}; } CancellationCode ProcessList::sendCancelToQuery(const String & current_query_id, const String & current_user, bool kill) { - QueryStatus * elem; + QueryStatusPtr elem; /// Cancelling the query should be done without the lock. /// @@ -484,7 +485,7 @@ CancellationCode ProcessList::sendCancelToQuery(const String & current_query_id, void ProcessList::killAllQueries() { - std::vector cancelled_processes; + std::vector cancelled_processes; SCOPE_EXIT({ auto lock = safeLock(); @@ -498,8 +499,8 @@ void ProcessList::killAllQueries() cancelled_processes.reserve(processes.size()); for (auto & process : processes) { - cancelled_processes.push_back(&process); - process.is_cancelling = true; + cancelled_processes.push_back(process); + process->is_cancelling = true; } } @@ -558,7 +559,7 @@ ProcessList::Info ProcessList::getInfo(bool get_thread_list, bool get_profile_ev per_query_infos.reserve(processes.size()); for (const auto & process : processes) - per_query_infos.emplace_back(process.getInfo(get_thread_list, get_profile_events, get_settings)); + per_query_infos.emplace_back(process->getInfo(get_thread_list, get_profile_events, get_settings)); return per_query_infos; } diff --git a/src/Interpreters/ProcessList.h b/src/Interpreters/ProcessList.h index 6943c7cfcd8..5fbdce358f9 100644 --- a/src/Interpreters/ProcessList.h +++ b/src/Interpreters/ProcessList.h @@ -133,6 +133,8 @@ protected: ProcessListForUser * user_process_list = nullptr; + OvercommitTracker * global_overcommit_tracker = nullptr; + IAST::QueryKind query_kind; /// This field is unused in this class, but it @@ -221,6 +223,8 @@ public: [[nodiscard]] bool checkTimeLimitSoft(); }; +using QueryStatusPtr = std::shared_ptr; + /// Information of process list for user. struct ProcessListForUserInfo @@ -241,7 +245,7 @@ struct ProcessListForUser ProcessListForUser(ContextPtr global_context, ProcessList * global_process_list); /// query_id -> ProcessListElement(s). There can be multiple queries with the same query_id as long as all queries except one are cancelled. 
- using QueryToElement = std::unordered_map<String, QueryStatus *>; + using QueryToElement = std::unordered_map<String, QueryStatusPtr>; QueryToElement queries; ProfileEvents::Counters user_performance_counters{VariableContext::User, &ProfileEvents::global_counters}; @@ -278,7 +282,7 @@ class ProcessList; class ProcessListEntry { private: - using Container = std::list<QueryStatus>; + using Container = std::list<QueryStatusPtr>; ProcessList & parent; Container::iterator it; @@ -289,11 +293,8 @@ public: ~ProcessListEntry(); - QueryStatus * operator->() { return &*it; } - const QueryStatus * operator->() const { return &*it; } - - QueryStatus & get() { return *it; } - const QueryStatus & get() const { return *it; } + QueryStatusPtr getQueryStatus() { return *it; } + const QueryStatusPtr getQueryStatus() const { return *it; } }; @@ -319,7 +320,7 @@ protected: class ProcessList : public ProcessListBase { public: - using Element = QueryStatus; + using Element = QueryStatusPtr; using Entry = ProcessListEntry; using QueryAmount = UInt64; @@ -358,7 +359,7 @@ protected: ThrottlerPtr total_network_throttler; /// Call under lock. Finds process with specified current_user and current_query_id. - QueryStatus * tryGetProcessListElement(const String & current_query_id, const String & current_user); + QueryStatusPtr tryGetProcessListElement(const String & current_query_id, const String & current_user); /// limit for insert. 0 means no limit. Otherwise, when limit exceeded, an exception is thrown. size_t max_insert_queries_amount = 0; diff --git a/src/Interpreters/executeQuery.cpp b/src/Interpreters/executeQuery.cpp index abca563de55..86cf3401f03 100644 --- a/src/Interpreters/executeQuery.cpp +++ b/src/Interpreters/executeQuery.cpp @@ -537,7 +537,7 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl( { /// processlist also has the query masked now, to avoid secret leaks through SHOW PROCESSLIST by other users. process_list_entry = context->getProcessList().insert(query_for_logging, ast.get(), context); - context->setProcessListElement(&process_list_entry->get()); + context->setProcessListElement(process_list_entry->getQueryStatus()); } /// Load external tables if they were provided @@ -713,9 +713,9 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl( if (process_list_entry) { /// Query was killed before execution - if ((*process_list_entry)->isKilled()) - throw Exception("Query '" + (*process_list_entry)->getInfo().client_info.current_query_id + "' is killed in pending state", - ErrorCodes::QUERY_WAS_CANCELLED); + if (process_list_entry->getQueryStatus()->isKilled()) + throw Exception(ErrorCodes::QUERY_WAS_CANCELLED, + "Query '{}' is killed in pending state", process_list_entry->getQueryStatus()->getInfo().client_info.current_query_id); } /// Hold element of process list till end of query execution.
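// Call-site migration implied by the new accessor: ProcessListEntry no longer
// exposes operator-> or get(), it hands out the shared_ptr itself, as in the
// executeQuery.cpp hunk above (before/after sketch):
//
//     context->setProcessListElement(&process_list_entry->get());       // before
context->setProcessListElement(process_list_entry->getQueryStatus());    // after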
@@ -859,7 +859,7 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl( pulling_pipeline = pipeline.pulling(), query_span](QueryPipeline & query_pipeline) mutable { - QueryStatus * process_list_elem = context->getProcessListElement(); + QueryStatusPtr process_list_elem = context->getProcessListElement(); if (process_list_elem) { @@ -1025,7 +1025,7 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl( elem.exception_code = getCurrentExceptionCode(); elem.exception = getCurrentExceptionMessage(false); - QueryStatus * process_list_elem = context->getProcessListElement(); + QueryStatusPtr process_list_elem = context->getProcessListElement(); const Settings & current_settings = context->getSettingsRef(); /// Update performance counters before logging to query_log diff --git a/src/Processors/Executors/CompletedPipelineExecutor.cpp b/src/Processors/Executors/CompletedPipelineExecutor.cpp index 9e5ea3916bc..a4c7fe2f687 100644 --- a/src/Processors/Executors/CompletedPipelineExecutor.cpp +++ b/src/Processors/Executors/CompletedPipelineExecutor.cpp @@ -72,9 +72,9 @@ void CompletedPipelineExecutor::execute() data->executor = std::make_shared<PipelineExecutor>(pipeline.processors, pipeline.process_list_element); data->executor->setReadProgressCallback(pipeline.getReadProgressCallback()); - /// Avoid passing this to labmda, copy ptr to data instead. + /// Avoid passing this to lambda, copy ptr to data instead. /// The destructor of unique_ptr copies the raw ptr into a local variable first, and only then calls the object's destructor. - auto func = [data_ptr = data.get(), num_threads = pipeline.getNumThreads(), thread_group = CurrentThread::getGroup()]() + auto func = [data_ptr = data.get(), num_threads = pipeline.getNumThreads(), thread_group = CurrentThread::getGroup()] { threadFunction(*data_ptr, thread_group, num_threads); }; diff --git a/src/Processors/Executors/CompletedPipelineExecutor.h b/src/Processors/Executors/CompletedPipelineExecutor.h index e616cd6a2b7..65fab6035b1 100644 --- a/src/Processors/Executors/CompletedPipelineExecutor.h +++ b/src/Processors/Executors/CompletedPipelineExecutor.h @@ -1,7 +1,9 @@ #pragma once + #include #include + namespace DB { diff --git a/src/Processors/Executors/ExecutingGraph.cpp b/src/Processors/Executors/ExecutingGraph.cpp index 651ede10cfd..9d69abc5e87 100644 --- a/src/Processors/Executors/ExecutingGraph.cpp +++ b/src/Processors/Executors/ExecutingGraph.cpp @@ -10,17 +10,17 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; } -ExecutingGraph::ExecutingGraph(Processors & processors_, bool profile_processors_) - : processors(processors_) +ExecutingGraph::ExecutingGraph(std::shared_ptr<Processors> processors_, bool profile_processors_) + : processors(std::move(processors_)) , profile_processors(profile_processors_) { - uint64_t num_processors = processors.size(); + uint64_t num_processors = processors->size(); nodes.reserve(num_processors); /// Create nodes.
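// ExecutingGraph (constructor above) now co-owns the processor list instead of
// referencing a vector whose owner could reallocate or destroy it first.
// Constructing a graph under the new signature (sketch; source and sink are
// placeholder processors):
auto processors = std::make_shared<Processors>();
processors->emplace_back(std::move(source));
processors->emplace_back(std::move(sink));
ExecutingGraph graph(processors, /* profile_processors = */ false);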
for (uint64_t node = 0; node < num_processors; ++node) { - IProcessor * proc = processors[node].get(); + IProcessor * proc = processors->at(node).get(); processors_map[proc] = node; nodes.emplace_back(std::make_unique<Node>(proc, node)); } @@ -109,10 +109,10 @@ bool ExecutingGraph::expandPipeline(std::stack<uint64_t> & stack, uint64_t pid) { std::lock_guard guard(processors_mutex); - processors.insert(processors.end(), new_processors.begin(), new_processors.end()); + processors->insert(processors->end(), new_processors.begin(), new_processors.end()); } - uint64_t num_processors = processors.size(); + uint64_t num_processors = processors->size(); std::vector back_edges_sizes(num_processors, 0); std::vector direct_edge_sizes(num_processors, 0); @@ -126,7 +126,7 @@ bool ExecutingGraph::expandPipeline(std::stack<uint64_t> & stack, uint64_t pid) while (nodes.size() < num_processors) { - auto * processor = processors[nodes.size()].get(); + auto * processor = processors->at(nodes.size()).get(); if (processors_map.contains(processor)) throw Exception(ErrorCodes::LOGICAL_ERROR, "Processor {} was already added to pipeline", processor->getName()); @@ -386,7 +386,7 @@ bool ExecutingGraph::updateNode(uint64_t pid, Queue & queue, Queue & async_queue void ExecutingGraph::cancel() { std::lock_guard guard(processors_mutex); - for (auto & processor : processors) + for (auto & processor : *processors) processor->cancel(); } diff --git a/src/Processors/Executors/ExecutingGraph.h b/src/Processors/Executors/ExecutingGraph.h index 587a2561ae0..b374f968122 100644 --- a/src/Processors/Executors/ExecutingGraph.h +++ b/src/Processors/Executors/ExecutingGraph.h @@ -1,4 +1,5 @@ #pragma once + #include #include #include @@ -6,6 +7,7 @@ #include #include + namespace DB { @@ -123,9 +125,9 @@ public: using ProcessorsMap = std::unordered_map; ProcessorsMap processors_map; - explicit ExecutingGraph(Processors & processors_, bool profile_processors_); + explicit ExecutingGraph(std::shared_ptr<Processors> processors_, bool profile_processors_); - const Processors & getProcessors() const { return processors; } + const Processors & getProcessors() const { return *processors; } /// Traverse graph the first time to update all the childless nodes. void initializeExecution(Queue & queue); @@ -149,7 +151,7 @@ private: /// All new nodes and nodes with updated ports are pushed into stack. bool expandPipeline(std::stack<uint64_t> & stack, uint64_t pid); - Processors & processors; + std::shared_ptr<Processors> processors; std::mutex processors_mutex; UpgradableMutex nodes_mutex; diff --git a/src/Processors/Executors/PipelineExecutor.cpp b/src/Processors/Executors/PipelineExecutor.cpp index ae20d97604b..3772381de04 100644 --- a/src/Processors/Executors/PipelineExecutor.cpp +++ b/src/Processors/Executors/PipelineExecutor.cpp @@ -15,6 +15,7 @@ #include #endif + namespace DB { @@ -24,8 +25,8 @@ namespace ErrorCodes } -PipelineExecutor::PipelineExecutor(Processors & processors, QueryStatus * elem) - : process_list_element(elem) +PipelineExecutor::PipelineExecutor(std::shared_ptr<Processors> & processors, QueryStatusPtr elem) + : process_list_element(std::move(elem)) { if (process_list_element) { @@ -41,7 +42,7 @@ PipelineExecutor::PipelineExecutor(Processors & processors, QueryStatus * elem) /// If an exception was thrown during pipeline initialization, it means that the query pipeline was not built correctly. /// It is a logical error, and we need more information about the pipeline.
WriteBufferFromOwnString buf; - printPipeline(processors, buf); + printPipeline(*processors, buf); buf.finalize(); exception.addMessage("Query pipeline:\n" + buf.str()); diff --git a/src/Processors/Executors/PipelineExecutor.h b/src/Processors/Executors/PipelineExecutor.h index cea64d309fa..21bde312cbc 100644 --- a/src/Processors/Executors/PipelineExecutor.h +++ b/src/Processors/Executors/PipelineExecutor.h @@ -10,16 +10,19 @@ #include #include + namespace DB { class QueryStatus; +using QueryStatusPtr = std::shared_ptr; class ExecutingGraph; using ExecutingGraphPtr = std::unique_ptr; class ReadProgressCallback; using ReadProgressCallbackPtr = std::unique_ptr; + /// Executes query pipeline. class PipelineExecutor { @@ -30,7 +33,7 @@ public: /// During pipeline execution new processors can appear. They will be added to existing set. /// /// Explicit graph representation is built in constructor. Throws if graph is not correct. - explicit PipelineExecutor(Processors & processors, QueryStatus * elem); + explicit PipelineExecutor(std::shared_ptr & processors, QueryStatusPtr elem); ~PipelineExecutor(); /// Execute pipeline in multiple threads. Must be called once. @@ -79,7 +82,7 @@ private: Poco::Logger * log = &Poco::Logger::get("PipelineExecutor"); /// Now it's used to check if query was killed. - QueryStatus * const process_list_element = nullptr; + QueryStatusPtr process_list_element; ReadProgressCallbackPtr read_progress_callback; diff --git a/src/Processors/Executors/PushingAsyncPipelineExecutor.cpp b/src/Processors/Executors/PushingAsyncPipelineExecutor.cpp index 7a55d26f16c..ee8e94b6f28 100644 --- a/src/Processors/Executors/PushingAsyncPipelineExecutor.cpp +++ b/src/Processors/Executors/PushingAsyncPipelineExecutor.cpp @@ -129,7 +129,7 @@ PushingAsyncPipelineExecutor::PushingAsyncPipelineExecutor(QueryPipeline & pipel pushing_source = std::make_shared(pipeline.input->getHeader()); connect(pushing_source->getPort(), *pipeline.input); - pipeline.processors.emplace_back(pushing_source); + pipeline.processors->emplace_back(pushing_source); } PushingAsyncPipelineExecutor::~PushingAsyncPipelineExecutor() diff --git a/src/Processors/Executors/PushingPipelineExecutor.cpp b/src/Processors/Executors/PushingPipelineExecutor.cpp index bf43cd327fe..d9a14704cd0 100644 --- a/src/Processors/Executors/PushingPipelineExecutor.cpp +++ b/src/Processors/Executors/PushingPipelineExecutor.cpp @@ -58,7 +58,7 @@ PushingPipelineExecutor::PushingPipelineExecutor(QueryPipeline & pipeline_) : pi pushing_source = std::make_shared(pipeline.input->getHeader(), input_wait_flag); connect(pushing_source->getPort(), *pipeline.input); - pipeline.processors.emplace_back(pushing_source); + pipeline.processors->emplace_back(pushing_source); } PushingPipelineExecutor::~PushingPipelineExecutor() diff --git a/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.cpp b/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.cpp index 1c99a5484a2..16df132b9d8 100644 --- a/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.cpp +++ b/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.cpp @@ -67,6 +67,19 @@ CustomSeparatedRowInputFormat::CustomSeparatedRowInputFormat( } } +void CustomSeparatedRowInputFormat::readPrefix() +{ + RowInputFormatWithNamesAndTypes::readPrefix(); + + /// Provide better error message for unsupported delimiters + for (const auto & column_index : column_mapping->column_indexes_for_input_fields) + { + if (column_index) + checkSupportedDelimiterAfterField(format_settings.custom.escaping_rule, 
format_settings.custom.field_delimiter, data_types[*column_index]); + else + checkSupportedDelimiterAfterField(format_settings.custom.escaping_rule, format_settings.custom.field_delimiter, nullptr); + } +} bool CustomSeparatedRowInputFormat::allowSyncAfterError() const { diff --git a/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.h b/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.h index c7e332b983f..e7e96ab87b1 100644 --- a/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.h +++ b/src/Processors/Formats/Impl/CustomSeparatedRowInputFormat.h @@ -30,6 +30,7 @@ private: bool allowSyncAfterError() const override; void syncAfterError() override; + void readPrefix() override; std::unique_ptr<PeekableReadBuffer> buf; bool ignore_spaces; diff --git a/src/Processors/Formats/Impl/MySQLOutputFormat.cpp b/src/Processors/Formats/Impl/MySQLOutputFormat.cpp index 344c5c179db..b4aafbd3d9e 100644 --- a/src/Processors/Formats/Impl/MySQLOutputFormat.cpp +++ b/src/Processors/Formats/Impl/MySQLOutputFormat.cpp @@ -74,7 +74,7 @@ void MySQLOutputFormat::finalizeImpl() { size_t affected_rows = 0; std::string human_readable_info; - if (QueryStatus * process_list_elem = getContext()->getProcessListElement()) + if (QueryStatusPtr process_list_elem = getContext()->getProcessListElement()) { CurrentThread::finalizePerformanceCounters(); QueryStatusInfo info = process_list_elem->getInfo(); diff --git a/src/Processors/Formats/Impl/TemplateRowInputFormat.cpp b/src/Processors/Formats/Impl/TemplateRowInputFormat.cpp index 785658c0fa2..76fd0d2a907 100644 --- a/src/Processors/Formats/Impl/TemplateRowInputFormat.cpp +++ b/src/Processors/Formats/Impl/TemplateRowInputFormat.cpp @@ -53,18 +53,25 @@ TemplateRowInputFormat::TemplateRowInputFormat(const Block & header_, std::uniqu std::vector<bool> column_in_format(header_.columns(), false); for (size_t i = 0; i < row_format.columnsCount(); ++i) { - if (row_format.format_idx_to_column_idx[i]) + const auto & column_index = row_format.format_idx_to_column_idx[i]; + if (column_index) { - if (header_.columns() <= *row_format.format_idx_to_column_idx[i]) - row_format.throwInvalidFormat("Column index " + std::to_string(*row_format.format_idx_to_column_idx[i]) + + if (header_.columns() <= *column_index) + row_format.throwInvalidFormat("Column index " + std::to_string(*column_index) + " must be less than the number of columns (" + std::to_string(header_.columns()) + ")", i); if (row_format.escaping_rules[i] == EscapingRule::None) row_format.throwInvalidFormat("Column is not skipped, but deserialization type is None", i); - size_t col_idx = *row_format.format_idx_to_column_idx[i]; + size_t col_idx = *column_index; if (column_in_format[col_idx]) row_format.throwInvalidFormat("Duplicate column", i); column_in_format[col_idx] = true; + + checkSupportedDelimiterAfterField(row_format.escaping_rules[i], row_format.delimiters[i + 1], data_types[*column_index]); + } + else + { + checkSupportedDelimiterAfterField(row_format.escaping_rules[i], row_format.delimiters[i + 1], nullptr); } } diff --git a/src/Processors/Formats/RowInputFormatWithNamesAndTypes.h b/src/Processors/Formats/RowInputFormatWithNamesAndTypes.h index d2dd28eb15a..9d0734f4567 100644 --- a/src/Processors/Formats/RowInputFormatWithNamesAndTypes.h +++ b/src/Processors/Formats/RowInputFormatWithNamesAndTypes.h @@ -41,6 +41,7 @@ protected: void resetParser() override; bool isGarbageAfterField(size_t index, ReadBuffer::Position pos) override; void setReadBuffer(ReadBuffer & in_) override; + void readPrefix() override; const FormatSettings
format_settings; DataTypes data_types; @@ -48,7 +49,6 @@ protected: private: bool readRow(MutableColumns & columns, RowReadExtension & ext) override; - void readPrefix() override; bool parseRowAndPrintDiagnosticInfo(MutableColumns & columns, WriteBuffer & out) override; void tryDeserializeField(const DataTypePtr & type, IColumn & column, size_t file_column) override; diff --git a/src/Processors/QueryPlan/BuildQueryPipelineSettings.h b/src/Processors/QueryPlan/BuildQueryPipelineSettings.h index fadbd061fbd..3b5e4e06953 100644 --- a/src/Processors/QueryPlan/BuildQueryPipelineSettings.h +++ b/src/Processors/QueryPlan/BuildQueryPipelineSettings.h @@ -5,16 +5,18 @@ #include + namespace DB { struct Settings; class QueryStatus; +using QueryStatusPtr = std::shared_ptr<QueryStatus>; struct BuildQueryPipelineSettings { ExpressionActionsSettings actions_settings; - QueryStatus * process_list_element = nullptr; + QueryStatusPtr process_list_element; ProgressCallback progress_callback = nullptr; const ExpressionActionsSettings & getActionsSettings() const { return actions_settings; } diff --git a/src/Processors/Transforms/CountingTransform.h b/src/Processors/Transforms/CountingTransform.h index bd2ec58a27f..05d8e2aeac8 100644 --- a/src/Processors/Transforms/CountingTransform.h +++ b/src/Processors/Transforms/CountingTransform.h @@ -9,6 +9,7 @@ namespace DB { class QueryStatus; +using QueryStatusPtr = std::shared_ptr<QueryStatus>; class ThreadStatus; /// Proxy class which counts the number of written blocks, rows, and bytes @@ -29,7 +30,7 @@ public: progress_callback = callback; } - void setProcessListElement(QueryStatus * elem) + void setProcessListElement(QueryStatusPtr elem) { process_elem = elem; } @@ -50,7 +51,7 @@ public: protected: Progress progress; ProgressCallback progress_callback; - QueryStatus * process_elem = nullptr; + QueryStatusPtr process_elem; ThreadStatus * thread_status = nullptr; /// Quota is used to limit the amount of written bytes.
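// Wiring sketch for the counting transform after this change: holding the element
// as QueryStatusPtr means a progress update that races with query teardown still
// touches a live object. Assuming a `counting` transform and a query `context`
// (the progress setter name mirrors the setter whose body is visible above):
counting->setProcessListElement(context->getProcessListElement());
counting->setProgressCallback(context->getProgressCallback());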
diff --git a/src/Processors/Transforms/buildPushingToViewsChain.cpp b/src/Processors/Transforms/buildPushingToViewsChain.cpp index 174aaf67ec5..830f400faf2 100644 --- a/src/Processors/Transforms/buildPushingToViewsChain.cpp +++ b/src/Processors/Transforms/buildPushingToViewsChain.cpp @@ -620,9 +620,10 @@ void PushingToLiveViewSink::consume(Chunk chunk) { Progress local_progress(chunk.getNumRows(), chunk.bytes(), 0); StorageLiveView::writeIntoLiveView(live_view, getHeader().cloneWithColumns(chunk.detachColumns()), context); - auto * process = context->getProcessListElement(); - if (process) + + if (auto process = context->getProcessListElement()) process->updateProgressIn(local_progress); + ProfileEvents::increment(ProfileEvents::SelectedRows, local_progress.read_rows); ProfileEvents::increment(ProfileEvents::SelectedBytes, local_progress.read_bytes); } @@ -643,9 +644,10 @@ void PushingToWindowViewSink::consume(Chunk chunk) Progress local_progress(chunk.getNumRows(), chunk.bytes(), 0); StorageWindowView::writeIntoWindowView( window_view, getHeader().cloneWithColumns(chunk.detachColumns()), context); - auto * process = context->getProcessListElement(); - if (process) + + if (auto process = context->getProcessListElement()) process->updateProgressIn(local_progress); + ProfileEvents::increment(ProfileEvents::SelectedRows, local_progress.read_rows); ProfileEvents::increment(ProfileEvents::SelectedBytes, local_progress.read_bytes); } diff --git a/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp b/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp index b137eaf0f47..40718bd968a 100644 --- a/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp +++ b/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp @@ -23,11 +23,11 @@ TEST(Processors, PortsConnected) connect(source->getPort(), sink->getPort()); - Processors processors; - processors.emplace_back(std::move(source)); - processors.emplace_back(std::move(sink)); + auto processors = std::make_shared(); + processors->emplace_back(std::move(source)); + processors->emplace_back(std::move(sink)); - QueryStatus * element = nullptr; + QueryStatusPtr element; PipelineExecutor executor(processors, element); executor.execute(1); } @@ -46,14 +46,14 @@ TEST(Processors, PortsNotConnected) /// connect(source->getPort(), sink->getPort()); - Processors processors; - processors.emplace_back(std::move(source)); - processors.emplace_back(std::move(sink)); + auto processors = std::make_shared(); + processors->emplace_back(std::move(source)); + processors->emplace_back(std::move(sink)); #ifndef ABORT_ON_LOGICAL_ERROR try { - QueryStatus * element = nullptr; + QueryStatusPtr element; PipelineExecutor executor(processors, element); executor.execute(1); ASSERT_TRUE(false) << "Should have thrown."; diff --git a/src/QueryPipeline/BlockIO.cpp b/src/QueryPipeline/BlockIO.cpp index 35463ca6be9..9e42e06c722 100644 --- a/src/QueryPipeline/BlockIO.cpp +++ b/src/QueryPipeline/BlockIO.cpp @@ -53,9 +53,8 @@ void BlockIO::setAllDataSent() const /// - internal /// - SHOW PROCESSLIST if (process_list_entry) - (*process_list_entry)->setAllDataSent(); + process_list_entry->getQueryStatus()->setAllDataSent(); } } - diff --git a/src/QueryPipeline/BlockIO.h b/src/QueryPipeline/BlockIO.h index 1f2a8f6f033..b69f86ac684 100644 --- a/src/QueryPipeline/BlockIO.h +++ b/src/QueryPipeline/BlockIO.h @@ -34,9 +34,8 @@ struct BlockIO void onFinish() { if (finish_callback) - { finish_callback(pipeline); - } + pipeline.reset(); } diff --git 
a/src/QueryPipeline/Pipe.cpp b/src/QueryPipeline/Pipe.cpp index 291739079a2..62a928d814c 100644 --- a/src/QueryPipeline/Pipe.cpp +++ b/src/QueryPipeline/Pipe.cpp @@ -102,7 +102,12 @@ static OutputPort * uniteTotals(const OutputPortRawPtrs & ports, const Block & h return totals_port; } +Pipe::Pipe() : processors(std::make_shared()) +{ +} + Pipe::Pipe(ProcessorPtr source, OutputPort * output, OutputPort * totals, OutputPort * extremes) + : processors(std::make_shared()) { if (!source->getInputs().empty()) throw Exception( @@ -155,11 +160,12 @@ Pipe::Pipe(ProcessorPtr source, OutputPort * output, OutputPort * totals, Output totals_port = totals; extremes_port = extremes; output_ports.push_back(output); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); max_parallel_streams = 1; } Pipe::Pipe(ProcessorPtr source) + : processors(std::make_shared()) { checkSource(*source); @@ -168,18 +174,18 @@ Pipe::Pipe(ProcessorPtr source) output_ports.push_back(&source->getOutputs().front()); header = output_ports.front()->getHeader(); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); max_parallel_streams = 1; } -Pipe::Pipe(Processors processors_) : processors(std::move(processors_)) +Pipe::Pipe(std::shared_ptr processors_) : processors(std::move(processors_)) { /// Create hash table with processors. std::unordered_set set; - for (const auto & processor : processors) + for (const auto & processor : *processors) set.emplace(processor.get()); - for (auto & processor : processors) + for (auto & processor : *processors) { for (const auto & port : processor->getInputs()) { @@ -225,7 +231,7 @@ Pipe::Pipe(Processors processors_) : processors(std::move(processors_)) max_parallel_streams = output_ports.size(); if (collected_processors) - for (const auto & processor : processors) + for (const auto & processor : *processors) collected_processors->emplace_back(processor); } @@ -311,7 +317,7 @@ Pipe Pipe::unitePipes(Pipes pipes, Processors * collected_processors, bool allow if (!allow_empty_header || pipe.header) assertCompatibleHeader(pipe.header, res.header, "Pipe::unitePipes"); - res.processors.insert(res.processors.end(), pipe.processors.begin(), pipe.processors.end()); + res.processors->insert(res.processors->end(), pipe.processors->begin(), pipe.processors->end()); res.output_ports.insert(res.output_ports.end(), pipe.output_ports.begin(), pipe.output_ports.end()); res.max_parallel_streams += pipe.max_parallel_streams; @@ -323,15 +329,15 @@ Pipe Pipe::unitePipes(Pipes pipes, Processors * collected_processors, bool allow extremes.emplace_back(pipe.extremes_port); } - size_t num_processors = res.processors.size(); + size_t num_processors = res.processors->size(); - res.totals_port = uniteTotals(totals, res.header, res.processors); - res.extremes_port = uniteExtremes(extremes, res.header, res.processors); + res.totals_port = uniteTotals(totals, res.header, *res.processors); + res.extremes_port = uniteExtremes(extremes, res.header, *res.processors); if (res.collected_processors) { - for (; num_processors < res.processors.size(); ++num_processors) - res.collected_processors->emplace_back(res.processors[num_processors]); + for (; num_processors < res.processors->size(); ++num_processors) + res.collected_processors->emplace_back(res.processors->at(num_processors)); } return res; @@ -351,7 +357,7 @@ void Pipe::addSource(ProcessorPtr source) collected_processors->emplace_back(source); output_ports.push_back(&source->getOutputs().front()); - 
processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); max_parallel_streams = std::max(max_parallel_streams, output_ports.size()); } @@ -373,7 +379,7 @@ void Pipe::addTotalsSource(ProcessorPtr source) collected_processors->emplace_back(source); totals_port = &source->getOutputs().front(); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); } void Pipe::addExtremesSource(ProcessorPtr source) @@ -393,7 +399,7 @@ void Pipe::addExtremesSource(ProcessorPtr source) collected_processors->emplace_back(source); extremes_port = &source->getOutputs().front(); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); } static void dropPort(OutputPort *& port, Processors & processors, Processors * collected_processors) @@ -413,12 +419,12 @@ static void dropPort(OutputPort *& port, Processors & processors, Processors * c void Pipe::dropTotals() { - dropPort(totals_port, processors, collected_processors); + dropPort(totals_port, *processors, collected_processors); } void Pipe::dropExtremes() { - dropPort(extremes_port, processors, collected_processors); + dropPort(extremes_port, *processors, collected_processors); } void Pipe::addTransform(ProcessorPtr transform) @@ -504,7 +510,7 @@ void Pipe::addTransform(ProcessorPtr transform, OutputPort * totals, OutputPort if (collected_processors) collected_processors->emplace_back(transform); - processors.emplace_back(std::move(transform)); + processors->emplace_back(std::move(transform)); max_parallel_streams = std::max(max_parallel_streams, output_ports.size()); } @@ -595,7 +601,7 @@ void Pipe::addTransform(ProcessorPtr transform, InputPort * totals, InputPort * if (collected_processors) collected_processors->emplace_back(transform); - processors.emplace_back(std::move(transform)); + processors->emplace_back(std::move(transform)); max_parallel_streams = std::max(max_parallel_streams, output_ports.size()); } @@ -647,7 +653,7 @@ void Pipe::addSimpleTransform(const ProcessorGetterWithStreamKind & getter) if (collected_processors) collected_processors->emplace_back(transform); - processors.emplace_back(std::move(transform)); + processors->emplace_back(std::move(transform)); } }; @@ -698,7 +704,7 @@ void Pipe::addChains(std::vector chains) if (collected_processors) collected_processors->emplace_back(transform); - processors.emplace_back(std::move(transform)); + processors->emplace_back(std::move(transform)); } } @@ -757,7 +763,7 @@ void Pipe::setSinks(const Pipe::ProcessorGetterWithStreamKind & getter) transform = std::make_shared(stream->getHeader()); connect(*stream, transform->getInputs().front()); - processors.emplace_back(std::move(transform)); + processors->emplace_back(std::move(transform)); }; for (auto & port : output_ports) @@ -858,7 +864,7 @@ void Pipe::transform(const Transformer & transformer, bool check_ports) collected_processors->emplace_back(processor); } - processors.insert(processors.end(), new_processors.begin(), new_processors.end()); + processors->insert(processors->end(), new_processors.begin(), new_processors.end()); max_parallel_streams = std::max(max_parallel_streams, output_ports.size()); } diff --git a/src/QueryPipeline/Pipe.h b/src/QueryPipeline/Pipe.h index 79d19a18193..7e30d9c990e 100644 --- a/src/QueryPipeline/Pipe.h +++ b/src/QueryPipeline/Pipe.h @@ -5,6 +5,7 @@ #include #include + namespace DB { @@ -27,13 +28,13 @@ class Pipe public: /// Default constructor creates empty pipe. 
Generally, you cannot do anything with it except check that it is empty(). /// You cannot get an empty pipe in any other way. All transforms check that the result pipe is not empty. - Pipe() = default; + Pipe(); /// Create from source. Source must have no input ports and single output. explicit Pipe(ProcessorPtr source); /// Create from source with specified totals and extremes (may be nullptr). Ports should be owned by source. explicit Pipe(ProcessorPtr source, OutputPort * output, OutputPort * totals, OutputPort * extremes); /// Create from processors. Use all not-connected output ports as output_ports. Check invariants. - explicit Pipe(Processors processors_); + explicit Pipe(std::shared_ptr<Processors> processors_); Pipe(const Pipe & other) = delete; Pipe(Pipe && other) = default; @@ -41,7 +42,7 @@ public: Pipe & operator=(Pipe && other) = default; const Block & getHeader() const { return header; } - bool empty() const { return processors.empty(); } + bool empty() const { return processors->empty(); } size_t numOutputPorts() const { return output_ports.size(); } size_t maxParallelStreams() const { return max_parallel_streams; } OutputPort * getOutputPort(size_t pos) const { return output_ports[pos]; } @@ -96,15 +97,15 @@ public: /// Unite several pipes together. They should have the same header. static Pipe unitePipes(Pipes pipes); - /// Get processors from Pipe. Use it with cautious, it is easy to loss totals and extremes ports. - static Processors detachProcessors(Pipe pipe) { return std::move(pipe.processors); } + /// Get processors from Pipe. Use it with caution, it is easy to lose totals and extremes ports. + static Processors detachProcessors(Pipe pipe) { return *std::move(pipe.processors); } /// Get processors from Pipe without destroying pipe (used for EXPLAIN to keep QueryPlan). - const Processors & getProcessors() const { return processors; } + const Processors & getProcessors() const { return *processors; } private: /// Header is common for all output below. Block header; - Processors processors; + std::shared_ptr<Processors> processors; /// Output ports. Totals and extremes are allowed to be empty. OutputPortRawPtrs output_ports; diff --git a/src/QueryPipeline/PipelineResourcesHolder.h b/src/QueryPipeline/PipelineResourcesHolder.h index 46b1024f384..ed9eb68b7ba 100644 --- a/src/QueryPipeline/PipelineResourcesHolder.h +++ b/src/QueryPipeline/PipelineResourcesHolder.h @@ -19,8 +19,9 @@ struct QueryPlanResourceHolder QueryPlanResourceHolder(); QueryPlanResourceHolder(QueryPlanResourceHolder &&) noexcept; ~QueryPlanResourceHolder(); + /// Custom move assignment does not destroy data from lhs. It appends data from rhs to lhs. - QueryPlanResourceHolder& operator=(QueryPlanResourceHolder &&) noexcept; + QueryPlanResourceHolder & operator=(QueryPlanResourceHolder &&) noexcept; /// Some processors may implicitly use Context or temporary Storage created by Interpreter.
/// But lifetime of Streams is not nested in lifetime of Interpreters, so we have to store it here, diff --git a/src/QueryPipeline/QueryPipeline.cpp b/src/QueryPipeline/QueryPipeline.cpp index 31b18c7f7f0..e0da4c4f0eb 100644 --- a/src/QueryPipeline/QueryPipeline.cpp +++ b/src/QueryPipeline/QueryPipeline.cpp @@ -21,6 +21,7 @@ #include #include + namespace DB { @@ -29,7 +30,11 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; } -QueryPipeline::QueryPipeline() = default; +QueryPipeline::QueryPipeline() + : processors(std::make_shared()) +{ +} + QueryPipeline::QueryPipeline(QueryPipeline &&) noexcept = default; QueryPipeline & QueryPipeline::operator=(QueryPipeline &&) noexcept = default; QueryPipeline::~QueryPipeline() = default; @@ -210,16 +215,16 @@ static void initRowsBeforeLimit(IOutputFormat * output_format) QueryPipeline::QueryPipeline( QueryPlanResourceHolder resources_, - Processors processors_) + std::shared_ptr processors_) : resources(std::move(resources_)) , processors(std::move(processors_)) { - checkCompleted(processors); + checkCompleted(*processors); } QueryPipeline::QueryPipeline( QueryPlanResourceHolder resources_, - Processors processors_, + std::shared_ptr processors_, InputPort * input_) : resources(std::move(resources_)) , processors(std::move(processors_)) @@ -231,7 +236,7 @@ QueryPipeline::QueryPipeline( "Cannot create pushing QueryPipeline because its input port is connected or null"); bool found_input = false; - for (const auto & processor : processors) + for (const auto & processor : *processors) { for (const auto & in : processor->getInputs()) { @@ -255,7 +260,7 @@ QueryPipeline::QueryPipeline(std::shared_ptr source) : QueryPipeline(Pi QueryPipeline::QueryPipeline( QueryPlanResourceHolder resources_, - Processors processors_, + std::shared_ptr processors_, OutputPort * output_, OutputPort * totals_, OutputPort * extremes_) @@ -265,7 +270,7 @@ QueryPipeline::QueryPipeline( , totals(totals_) , extremes(extremes_) { - checkPulling(processors, output, totals, extremes); + checkPulling(*processors, output, totals, extremes); } QueryPipeline::QueryPipeline(Pipe pipe) @@ -278,32 +283,34 @@ QueryPipeline::QueryPipeline(Pipe pipe) extremes = pipe.getExtremesPort(); processors = std::move(pipe.processors); - checkPulling(processors, output, totals, extremes); + checkPulling(*processors, output, totals, extremes); } else { processors = std::move(pipe.processors); - checkCompleted(processors); + checkCompleted(*processors); } } QueryPipeline::QueryPipeline(Chain chain) : resources(chain.detachResources()) + , processors(std::make_shared()) , input(&chain.getInputPort()) , num_threads(chain.getNumThreads()) { - processors.reserve(chain.getProcessors().size() + 1); + processors->reserve(chain.getProcessors().size() + 1); for (auto processor : chain.getProcessors()) - processors.emplace_back(std::move(processor)); + processors->emplace_back(std::move(processor)); auto sink = std::make_shared(chain.getOutputPort().getHeader()); connect(chain.getOutputPort(), sink->getPort()); - processors.emplace_back(std::move(sink)); + processors->emplace_back(std::move(sink)); input = &chain.getInputPort(); } QueryPipeline::QueryPipeline(std::shared_ptr format) + : processors(std::make_shared()) { auto & format_main = format->getPort(IOutputFormat::PortKind::Main); auto & format_totals = format->getPort(IOutputFormat::PortKind::Totals); @@ -313,14 +320,14 @@ QueryPipeline::QueryPipeline(std::shared_ptr format) { auto source = std::make_shared(format_totals.getHeader()); totals = 
&source->getPort(); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); } if (!extremes) { auto source = std::make_shared(format_extremes.getHeader()); extremes = &source->getPort(); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); } connect(*totals, format_totals); @@ -332,7 +339,7 @@ QueryPipeline::QueryPipeline(std::shared_ptr format) output_format = format.get(); - processors.emplace_back(std::move(format)); + processors->emplace_back(std::move(format)); } static void drop(OutputPort *& port, Processors & processors) @@ -354,11 +361,11 @@ void QueryPipeline::complete(std::shared_ptr sink) if (!pulling()) throw Exception(ErrorCodes::LOGICAL_ERROR, "Pipeline must be pulling to be completed with sink"); - drop(totals, processors); - drop(extremes, processors); + drop(totals, *processors); + drop(extremes, *processors); connect(*output, sink->getPort()); - processors.emplace_back(std::move(sink)); + processors->emplace_back(std::move(sink)); output = nullptr; } @@ -369,17 +376,17 @@ void QueryPipeline::complete(Chain chain) resources = chain.detachResources(); - drop(totals, processors); - drop(extremes, processors); + drop(totals, *processors); + drop(extremes, *processors); - processors.reserve(processors.size() + chain.getProcessors().size() + 1); + processors->reserve(processors->size() + chain.getProcessors().size() + 1); for (auto processor : chain.getProcessors()) - processors.emplace_back(std::move(processor)); + processors->emplace_back(std::move(processor)); auto sink = std::make_shared(chain.getOutputPort().getHeader()); connect(*output, chain.getInputPort()); connect(chain.getOutputPort(), sink->getPort()); - processors.emplace_back(std::move(sink)); + processors->emplace_back(std::move(sink)); output = nullptr; } @@ -400,7 +407,7 @@ void QueryPipeline::complete(Pipe pipe) input = nullptr; auto pipe_processors = Pipe::detachProcessors(std::move(pipe)); - processors.insert(processors.end(), pipe_processors.begin(), pipe_processors.end()); + processors->insert(processors->end(), pipe_processors.begin(), pipe_processors.end()); } static void addMaterializing(OutputPort *& output, Processors & processors) @@ -421,9 +428,9 @@ void QueryPipeline::complete(std::shared_ptr format) if (format->expectMaterializedColumns()) { - addMaterializing(output, processors); - addMaterializing(totals, processors); - addMaterializing(extremes, processors); + addMaterializing(output, *processors); + addMaterializing(totals, *processors); + addMaterializing(extremes, *processors); } auto & format_main = format->getPort(IOutputFormat::PortKind::Main); @@ -434,14 +441,14 @@ void QueryPipeline::complete(std::shared_ptr format) { auto source = std::make_shared(format_totals.getHeader()); totals = &source->getPort(); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); } if (!extremes) { auto source = std::make_shared(format_extremes.getHeader()); extremes = &source->getPort(); - processors.emplace_back(std::move(source)); + processors->emplace_back(std::move(source)); } connect(*output, format_main); @@ -455,7 +462,7 @@ void QueryPipeline::complete(std::shared_ptr format) initRowsBeforeLimit(format.get()); output_format = format.get(); - processors.emplace_back(std::move(format)); + processors->emplace_back(std::move(format)); } Block QueryPipeline::getHeader() const @@ -475,7 +482,7 @@ void QueryPipeline::setProgressCallback(const ProgressCallback & callback) progress_callback = 
callback; } -void QueryPipeline::setProcessListElement(QueryStatus * elem) +void QueryPipeline::setProcessListElement(QueryStatusPtr elem) { process_list_element = elem; @@ -504,7 +511,7 @@ void QueryPipeline::setLimitsAndQuota(const StreamLocalLimits & limits, std::sha transform->setQuota(quota_); connect(*output, transform->getInputPort()); output = &transform->getOutputPort(); - processors.emplace_back(std::move(transform)); + processors->emplace_back(std::move(transform)); } @@ -529,7 +536,7 @@ void QueryPipeline::addCompletedPipeline(QueryPipeline other) throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot add not completed pipeline"); resources = std::move(other.resources); - processors.insert(processors.end(), other.processors.begin(), other.processors.end()); + processors->insert(processors->end(), other.processors->begin(), other.processors->end()); } void QueryPipeline::reset() @@ -560,9 +567,9 @@ void QueryPipeline::convertStructureTo(const ColumnsWithTypeAndName & columns) ActionsDAG::MatchColumnsMode::Position); auto actions = std::make_shared(std::move(converting)); - addExpression(output, actions, processors); - addExpression(totals, actions, processors); - addExpression(extremes, actions, processors); + addExpression(output, actions, *processors); + addExpression(totals, actions, *processors); + addExpression(extremes, actions, *processors); } std::unique_ptr QueryPipeline::getReadProgressCallback() const diff --git a/src/QueryPipeline/QueryPipeline.h b/src/QueryPipeline/QueryPipeline.h index 1b88ede3349..63f444e6ec1 100644 --- a/src/QueryPipeline/QueryPipeline.h +++ b/src/QueryPipeline/QueryPipeline.h @@ -4,6 +4,7 @@ #include #include + namespace DB { @@ -15,6 +16,7 @@ using ProcessorPtr = std::shared_ptr; using Processors = std::vector; class QueryStatus; +using QueryStatusPtr = std::shared_ptr; struct Progress; using ProgressCallback = std::function; @@ -34,6 +36,7 @@ class ReadProgressCallback; struct ColumnWithTypeAndName; using ColumnsWithTypeAndName = std::vector; + class QueryPipeline { public: @@ -58,23 +61,23 @@ public: /// completed QueryPipeline( QueryPlanResourceHolder resources_, - Processors processors_); + std::shared_ptr processors_); /// pushing QueryPipeline( QueryPlanResourceHolder resources_, - Processors processors_, + std::shared_ptr processors_, InputPort * input_); /// pulling QueryPipeline( QueryPlanResourceHolder resources_, - Processors processors_, + std::shared_ptr processors_, OutputPort * output_, OutputPort * totals_ = nullptr, OutputPort * extremes_ = nullptr); - bool initialized() const { return !processors.empty(); } + bool initialized() const { return !processors->empty(); } /// When initialized, exactly one of the following is true. /// Use PullingPipelineExecutor or PullingAsyncPipelineExecutor. bool pulling() const { return output != nullptr; } @@ -97,7 +100,7 @@ public: size_t getNumThreads() const { return num_threads; } void setNumThreads(size_t num_threads_) { num_threads = num_threads_; } - void setProcessListElement(QueryStatus * elem); + void setProcessListElement(QueryStatusPtr elem); void setProgressCallback(const ProgressCallback & callback); void setLimitsAndQuota(const StreamLocalLimits & limits, std::shared_ptr quota_); bool tryGetResultRowsAndBytes(UInt64 & result_rows, UInt64 & result_bytes) const; @@ -119,7 +122,7 @@ public: /// Add processors and resources from other pipeline. Other pipeline should be completed. 
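Taken together, the Pipe and QueryPipeline hunks above switch the processor list to shared ownership: the list lives behind a `std::shared_ptr<Processors>`, accessors dereference it, and `detachProcessors` hands the vector out of the shared holder. The following is a minimal, self-contained sketch of that pattern with hypothetical names (`PipeLike` is not a real ClickHouse class), not the actual implementation:

```cpp
// Minimal sketch of the shared-ownership pattern introduced above.
#include <iostream>
#include <memory>
#include <string>
#include <vector>

using Processors = std::vector<std::string>; // stand-in for std::vector<ProcessorPtr>

class PipeLike
{
public:
    PipeLike() : processors(std::make_shared<Processors>()) {}

    void add(std::string processor) { processors->push_back(std::move(processor)); }

    /// Mirrors `return *processors;` from the diff.
    const Processors & getProcessors() const { return *processors; }

    /// Mirrors `return *std::move(pipe.processors);`: returns the processor
    /// list held by the shared pointer of a consumed pipe.
    static Processors detachProcessors(PipeLike pipe) { return *std::move(pipe.processors); }

private:
    std::shared_ptr<Processors> processors;
};

int main()
{
    PipeLike pipe;
    pipe.add("Source");
    pipe.add("LimitTransform");

    for (const auto & name : PipeLike::detachProcessors(std::move(pipe)))
        std::cout << name << '\n';
}
```

With a shared holder, code that needs the processor list after the pipeline has been consumed (the EXPLAIN path mentioned in the comments) can keep the same vector alive instead of copying it around.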
void addCompletedPipeline(QueryPipeline other); - const Processors & getProcessors() const { return processors; } + const Processors & getProcessors() const { return *processors; } /// For pulling pipeline, convert structure to expected. /// Trash, need to remove later. @@ -134,7 +137,7 @@ private: std::shared_ptr quota; bool update_profile_events = true; - Processors processors; + std::shared_ptr processors; InputPort * input = nullptr; @@ -142,7 +145,7 @@ private: OutputPort * totals = nullptr; OutputPort * extremes = nullptr; - QueryStatus * process_list_element = nullptr; + QueryStatusPtr process_list_element; IOutputFormat * output_format = nullptr; diff --git a/src/QueryPipeline/QueryPipelineBuilder.cpp b/src/QueryPipeline/QueryPipelineBuilder.cpp index 440f123e876..812bd155b42 100644 --- a/src/QueryPipeline/QueryPipelineBuilder.cpp +++ b/src/QueryPipeline/QueryPipelineBuilder.cpp @@ -327,9 +327,9 @@ QueryPipelineBuilderPtr QueryPipelineBuilder::mergePipelines( collected_processors->emplace_back(transform); left->pipe.output_ports.front() = &transform->getOutputs().front(); - left->pipe.processors.emplace_back(transform); + left->pipe.processors->emplace_back(transform); - left->pipe.processors.insert(left->pipe.processors.end(), right->pipe.processors.begin(), right->pipe.processors.end()); + left->pipe.processors->insert(left->pipe.processors->end(), right->pipe.processors->begin(), right->pipe.processors->end()); left->pipe.header = left->pipe.output_ports.front()->getHeader(); left->pipe.max_parallel_streams = std::max(left->pipe.max_parallel_streams, right->pipe.max_parallel_streams); return left; @@ -383,7 +383,7 @@ std::unique_ptr QueryPipelineBuilder::joinPipelinesRightLe /// Collect the NEW processors for the right pipeline. QueryPipelineProcessorsCollector collector(*right); /// Remember the last step of the right pipeline. - ExpressionStep* step = typeid_cast(right->pipe.processors.back()->getQueryPlanStep()); + ExpressionStep* step = typeid_cast(right->pipe.processors->back()->getQueryPlanStep()); if (!step) { throw Exception(ErrorCodes::LOGICAL_ERROR, "The top step of the right pipeline should be ExpressionStep"); @@ -467,7 +467,7 @@ std::unique_ptr QueryPipelineBuilder::joinPipelinesRightLe if (collected_processors) collected_processors->emplace_back(joining); - left->pipe.processors.emplace_back(std::move(joining)); + left->pipe.processors->emplace_back(std::move(joining)); } if (left->hasTotals()) @@ -482,14 +482,14 @@ std::unique_ptr QueryPipelineBuilder::joinPipelinesRightLe if (collected_processors) collected_processors->emplace_back(joining); - left->pipe.processors.emplace_back(std::move(joining)); + left->pipe.processors->emplace_back(std::move(joining)); } /// Move the collected processors to the last step in the right pipeline. 
Processors processors = collector.detachProcessors(); step->appendExtraProcessors(processors); - left->pipe.processors.insert(left->pipe.processors.end(), right->pipe.processors.begin(), right->pipe.processors.end()); + left->pipe.processors->insert(left->pipe.processors->end(), right->pipe.processors->begin(), right->pipe.processors->end()); left->resources = std::move(right->resources); left->pipe.header = left->pipe.output_ports.front()->getHeader(); left->pipe.max_parallel_streams = std::max(left->pipe.max_parallel_streams, right->pipe.max_parallel_streams); @@ -537,7 +537,7 @@ void QueryPipelineBuilder::addPipelineBefore(QueryPipelineBuilder pipeline) addTransform(std::move(processor)); } -void QueryPipelineBuilder::setProcessListElement(QueryStatus * elem) +void QueryPipelineBuilder::setProcessListElement(QueryStatusPtr elem) { process_list_element = elem; } diff --git a/src/QueryPipeline/QueryPipelineBuilder.h b/src/QueryPipeline/QueryPipelineBuilder.h index 13b4d681b7d..5a0694100eb 100644 --- a/src/QueryPipeline/QueryPipelineBuilder.h +++ b/src/QueryPipeline/QueryPipelineBuilder.h @@ -148,7 +148,7 @@ public: const Block & getHeader() const { return pipe.getHeader(); } - void setProcessListElement(QueryStatus * elem); + void setProcessListElement(QueryStatusPtr elem); void setProgressCallback(ProgressCallback callback); /// Recommend number of threads for pipeline execution. @@ -189,7 +189,7 @@ private: /// Sometimes, more streams are created than the number of threads for more optimal execution. size_t max_threads = 0; - QueryStatus * process_list_element = nullptr; + QueryStatusPtr process_list_element; ProgressCallback progress_callback = nullptr; void checkInitialized(); diff --git a/src/QueryPipeline/ReadProgressCallback.cpp b/src/QueryPipeline/ReadProgressCallback.cpp index bbdabb8e8d8..6692b0f96bd 100644 --- a/src/QueryPipeline/ReadProgressCallback.cpp +++ b/src/QueryPipeline/ReadProgressCallback.cpp @@ -2,6 +2,7 @@ #include #include + namespace ProfileEvents { extern const Event SelectedRows; @@ -17,7 +18,7 @@ namespace ErrorCodes { extern const int TOO_MANY_BYTES; } -void ReadProgressCallback::setProcessListElement(QueryStatus * elem) +void ReadProgressCallback::setProcessListElement(QueryStatusPtr elem) { process_list_elem = elem; if (!elem) diff --git a/src/QueryPipeline/ReadProgressCallback.h b/src/QueryPipeline/ReadProgressCallback.h index f64123ef39d..c8f0d4cf537 100644 --- a/src/QueryPipeline/ReadProgressCallback.h +++ b/src/QueryPipeline/ReadProgressCallback.h @@ -4,20 +4,23 @@ #include #include + namespace DB { class QueryStatus; +using QueryStatusPtr = std::shared_ptr<QueryStatus>; class EnabledQuota; struct StorageLimits; using StorageLimitsList = std::list<StorageLimits>; + class ReadProgressCallback { public: void setQuota(const std::shared_ptr<const EnabledQuota> & quota_) { quota = quota_; } - void setProcessListElement(QueryStatus * elem); + void setProcessListElement(QueryStatusPtr elem); void setProgressCallback(const ProgressCallback & callback) { progress_callback = callback; } void addTotalRowsApprox(size_t value) { total_rows_approx += value; } @@ -30,7 +33,7 @@ public: private: std::shared_ptr<const EnabledQuota> quota; ProgressCallback progress_callback; - QueryStatus * process_list_elem = nullptr; + QueryStatusPtr process_list_elem; /// The approximate total number of rows to read. For progress bar. 
std::atomic_size_t total_rows_approx = 0; diff --git a/src/Server/TCPHandler.cpp b/src/Server/TCPHandler.cpp index 25a832ab7e3..962d5412b48 100644 --- a/src/Server/TCPHandler.cpp +++ b/src/Server/TCPHandler.cpp @@ -377,8 +377,8 @@ void TCPHandler::runImpl() after_send_progress.restart(); if (state.io.pipeline.pushing()) - /// FIXME: check explicitly that insert query suggests to receive data via native protocol, { + /// FIXME: check explicitly that insert query suggests to receive data via native protocol, state.need_receive_data_for_insert = true; processInsertQuery(); state.io.onFinish(); @@ -390,27 +390,30 @@ void TCPHandler::runImpl() } else if (state.io.pipeline.completed()) { - CompletedPipelineExecutor executor(state.io.pipeline); - /// Should not check for cancel in case of input. - if (!state.need_receive_data_for_input) { - auto callback = [this]() + CompletedPipelineExecutor executor(state.io.pipeline); + + /// Should not check for cancel in case of input. + if (!state.need_receive_data_for_input) { - std::lock_guard lock(fatal_error_mutex); + auto callback = [this]() + { + std::lock_guard lock(fatal_error_mutex); - if (isQueryCancelled()) - return true; + if (isQueryCancelled()) + return true; - sendProgress(); - sendSelectProfileEvents(); - sendLogs(); + sendProgress(); + sendSelectProfileEvents(); + sendLogs(); - return false; - }; + return false; + }; - executor.setCancelCallback(callback, interactive_delay / 1000); + executor.setCancelCallback(callback, interactive_delay / 1000); + } + executor.execute(); } - executor.execute(); state.io.onFinish(); /// Send final progress after calling onFinish(), since it will update the progress. diff --git a/src/Storages/ExternalDataSourceConfiguration.h b/src/Storages/ExternalDataSourceConfiguration.h index 0890247eb45..5736336983a 100644 --- a/src/Storages/ExternalDataSourceConfiguration.h +++ b/src/Storages/ExternalDataSourceConfiguration.h @@ -117,7 +117,7 @@ struct URLBasedDataSourceConfiguration struct StorageS3Configuration : URLBasedDataSourceConfiguration { - S3Settings::AuthSettings auth_settings; + S3::AuthSettings auth_settings; S3Settings::ReadWriteSettings rw_settings; }; diff --git a/src/Storages/MergeTree/DataPartStorageOnDisk.cpp b/src/Storages/MergeTree/DataPartStorageOnDisk.cpp index e2a2f3f793f..efc7710f640 100644 --- a/src/Storages/MergeTree/DataPartStorageOnDisk.cpp +++ b/src/Storages/MergeTree/DataPartStorageOnDisk.cpp @@ -406,14 +406,18 @@ void DataPartStorageOnDisk::clearDirectory( } } -std::string DataPartStorageOnDisk::getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached) const +std::optional DataPartStorageOnDisk::getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached, bool broken) const { + assert(!broken || detached); String res; auto full_relative_path = fs::path(root_path); if (detached) full_relative_path /= "detached"; + std::optional original_checksums_content; + std::optional original_files_list; + for (int try_no = 0; try_no < 10; ++try_no) { res = (prefix.empty() ? "" : prefix + "_") + part_dir + (try_no ? 
"_try" + DB::toString(try_no) : ""); @@ -421,12 +425,69 @@ std::string DataPartStorageOnDisk::getRelativePathForPrefix(Poco::Logger * log, if (!volume->getDisk()->exists(full_relative_path / res)) return res; + if (broken && looksLikeBrokenDetachedPartHasTheSameContent(res, original_checksums_content, original_files_list)) + { + LOG_WARNING(log, "Directory {} (to detach to) already exists, " + "but its content looks similar to content of the broken part which we are going to detach. " + "Assuming it was already cloned to detached, will not do it again to avoid redundant copies of broken part.", res); + return {}; + } + LOG_WARNING(log, "Directory {} (to detach to) already exists. Will detach to directory with '_tryN' suffix.", res); } return res; } +bool DataPartStorageOnDisk::looksLikeBrokenDetachedPartHasTheSameContent(const String & detached_part_path, + std::optional & original_checksums_content, + std::optional & original_files_list) const +{ + /// We cannot know for sure that content of detached part is the same, + /// but in most cases it's enough to compare checksums.txt and list of files. + + if (!exists("checksums.txt")) + return false; + + auto detached_full_path = fs::path(root_path) / "detached" / detached_part_path; + auto disk = volume->getDisk(); + if (!disk->exists(detached_full_path / "checksums.txt")) + return false; + + if (!original_checksums_content) + { + auto in = disk->readFile(detached_full_path / "checksums.txt", /* settings */ {}, /* read_hint */ {}, /* file_size */ {}); + original_checksums_content.emplace(); + readStringUntilEOF(*original_checksums_content, *in); + } + + if (original_checksums_content->empty()) + return false; + + auto part_full_path = fs::path(root_path) / part_dir; + String detached_checksums_content; + { + auto in = readFile("checksums.txt", /* settings */ {}, /* read_hint */ {}, /* file_size */ {}); + readStringUntilEOF(detached_checksums_content, *in); + } + + if (original_checksums_content != detached_checksums_content) + return false; + + if (!original_files_list) + { + original_files_list.emplace(); + disk->listFiles(part_full_path, *original_files_list); + std::sort(original_files_list->begin(), original_files_list->end()); + } + + Strings detached_files_list; + disk->listFiles(detached_full_path, detached_files_list); + std::sort(detached_files_list.begin(), detached_files_list.end()); + + return original_files_list == detached_files_list; +} + void DataPartStorageBuilderOnDisk::setRelativePath(const std::string & path) { part_dir = path; diff --git a/src/Storages/MergeTree/DataPartStorageOnDisk.h b/src/Storages/MergeTree/DataPartStorageOnDisk.h index adf1b78cdfb..d325049f056 100644 --- a/src/Storages/MergeTree/DataPartStorageOnDisk.h +++ b/src/Storages/MergeTree/DataPartStorageOnDisk.h @@ -52,7 +52,12 @@ public: MergeTreeDataPartState state, Poco::Logger * log) override; - std::string getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached) const override; + /// Returns path to place detached part in or nullopt if we don't need to detach part (if it already exists and has the same content) + std::optional getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached, bool broken) const override; + + /// Returns true if detached part already exists and has the same content (compares checksums.txt and the list of files) + bool looksLikeBrokenDetachedPartHasTheSameContent(const String & detached_part_path, std::optional & original_checksums_content, + std::optional & original_files_list) 
const; void setRelativePath(const std::string & path) override; void onRename(const std::string & new_root_path, const std::string & new_part_dir) override; diff --git a/src/Storages/MergeTree/IDataPartStorage.h b/src/Storages/MergeTree/IDataPartStorage.h index 17af6dd2909..03627938348 100644 --- a/src/Storages/MergeTree/IDataPartStorage.h +++ b/src/Storages/MergeTree/IDataPartStorage.h @@ -129,7 +129,7 @@ public: /// Get a name like 'prefix_partdir_tryN' which does not exist in a root dir. /// TODO: remove it. - virtual std::string getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached) const = 0; + virtual std::optional<String> getRelativePathForPrefix(Poco::Logger * log, const String & prefix, bool detached, bool broken) const = 0; /// Reset part directory, used for in-memory parts. /// TODO: remove it. diff --git a/src/Storages/MergeTree/IMergeTreeDataPart.cpp b/src/Storages/MergeTree/IMergeTreeDataPart.cpp index 5d2e755c1ab..cc9a14162f8 100644 --- a/src/Storages/MergeTree/IMergeTreeDataPart.cpp +++ b/src/Storages/MergeTree/IMergeTreeDataPart.cpp @@ -1478,8 +1478,9 @@ void IMergeTreeDataPart::remove() const data_part_storage->remove(std::move(can_remove_callback), checksums, projection_checksums, is_temp, getState(), storage.log); } -String IMergeTreeDataPart::getRelativePathForPrefix(const String & prefix, bool detached) const +std::optional<String> IMergeTreeDataPart::getRelativePathForPrefix(const String & prefix, bool detached, bool broken) const { + assert(!broken || detached); String res; /** If you need to detach a part, and directory into which we want to rename it already exists, @@ -1491,22 +1492,26 @@ String IMergeTreeDataPart::getRelativePathForPrefix(const String & prefix, bool if (detached && parent_part) throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot detach projection"); - return data_part_storage->getRelativePathForPrefix(storage.log, prefix, detached); + return data_part_storage->getRelativePathForPrefix(storage.log, prefix, detached, broken); } -String IMergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix) const +std::optional<String> IMergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix, bool broken) const { /// Do not allow underscores in the prefix because they are used as separators. assert(prefix.find_first_of('_') == String::npos); assert(prefix.empty() || std::find(DetachedPartInfo::DETACH_REASONS.begin(), DetachedPartInfo::DETACH_REASONS.end(), prefix) != DetachedPartInfo::DETACH_REASONS.end()); - return "detached/" + getRelativePathForPrefix(prefix, /* detached */ true); + if (auto path = getRelativePathForPrefix(prefix, /* detached */ true, broken)) + return "detached/" + *path; + return {}; } void IMergeTreeDataPart::renameToDetached(const String & prefix, DataPartStorageBuilderPtr builder) const { - renameTo(getRelativePathForDetachedPart(prefix), true, builder); + auto path_to_detach = getRelativePathForDetachedPart(prefix, /* broken */ false); + assert(path_to_detach); + renameTo(path_to_detach.value(), true, builder); part_is_probably_removed_from_disk = true; } @@ -1518,9 +1523,16 @@ void IMergeTreeDataPart::makeCloneInDetached(const String & prefix, const Storag /// because hardlinks tracking doesn't work for detached parts. bool copy_instead_of_hardlink = isStoredOnRemoteDiskWithZeroCopySupport() && storage.supportsReplication() && storage_settings->allow_remote_fs_zero_copy_replication; + /// Avoid unneeded duplicates of broken parts if we try to detach the same broken part multiple times. 
+ /// Otherwise it may pollute detached/ with dirs with _tryN suffix and we will fail to remove broken part after 10 attempts. + bool broken = !prefix.empty(); + auto maybe_path_in_detached = getRelativePathForDetachedPart(prefix, broken); + if (!maybe_path_in_detached) + return; + data_part_storage->freeze( storage.relative_data_path, - getRelativePathForDetachedPart(prefix), + *maybe_path_in_detached, /*make_source_readonly*/ true, {}, copy_instead_of_hardlink, diff --git a/src/Storages/MergeTree/IMergeTreeDataPart.h b/src/Storages/MergeTree/IMergeTreeDataPart.h index 32afa2a482d..6f034574fb4 100644 --- a/src/Storages/MergeTree/IMergeTreeDataPart.h +++ b/src/Storages/MergeTree/IMergeTreeDataPart.h @@ -347,7 +347,7 @@ public: /// Calculate column and secondary indices sizes on disk. void calculateColumnsAndSecondaryIndicesSizesOnDisk(); - String getRelativePathForPrefix(const String & prefix, bool detached = false) const; + std::optional getRelativePathForPrefix(const String & prefix, bool detached = false, bool broken = false) const; bool isProjectionPart() const { return parent_part != nullptr; } @@ -485,7 +485,7 @@ protected: /// disk using columns and checksums. virtual void calculateEachColumnSizes(ColumnSizeByName & each_columns_size, ColumnSize & total_size) const = 0; - String getRelativePathForDetachedPart(const String & prefix) const; + std::optional getRelativePathForDetachedPart(const String & prefix, bool broken) const; /// Checks that part can be actually removed from disk. /// In ordinary scenario always returns true, but in case of diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index 8957f134053..66950734d5f 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -3245,7 +3245,10 @@ void MergeTreeData::outdateBrokenPartAndCloneToDetached(const DataPartPtr & part LOG_INFO(log, "Cloning part {} to {}_{} and making it obsolete.", part_to_detach->data_part_storage->getPartDirectory(), prefix, part_to_detach->name); part_to_detach->makeCloneInDetached(prefix, metadata_snapshot); - removePartsFromWorkingSet(NO_TRANSACTION_RAW, {part_to_detach}, true); + + DataPartsLock lock = lockParts(); + if (part_to_detach->getState() == DataPartState::Active) + removePartsFromWorkingSet(NO_TRANSACTION_RAW, {part_to_detach}, true, &lock); } void MergeTreeData::forcefullyMovePartToDetachedAndRemoveFromMemory(const MergeTreeData::DataPartPtr & part_to_detach, const String & prefix, bool restore_covered) @@ -6250,7 +6253,7 @@ std::pair MergeTreeData::cloneAn if (auto src_part_in_memory = asInMemoryPart(src_part)) { auto flushed_part_path = src_part_in_memory->getRelativePathForPrefix(tmp_part_prefix); - src_part_storage = src_part_in_memory->flushToDisk(flushed_part_path, metadata_snapshot); + src_part_storage = src_part_in_memory->flushToDisk(*flushed_part_path, metadata_snapshot); } String with_copy; @@ -6434,7 +6437,7 @@ PartitionCommandsResultInfo MergeTreeData::freezePartitionsByMatcher( if (auto part_in_memory = asInMemoryPart(part)) { auto flushed_part_path = part_in_memory->getRelativePathForPrefix("tmp_freeze"); - data_part_storage = part_in_memory->flushToDisk(flushed_part_path, metadata_snapshot); + data_part_storage = part_in_memory->flushToDisk(*flushed_part_path, metadata_snapshot); } auto callback = [this, &part, &backup_part_path](const DiskPtr & disk) diff --git a/src/Storages/MergeTree/MergeTreeDataPartInMemory.cpp b/src/Storages/MergeTree/MergeTreeDataPartInMemory.cpp index 
c7c831c23ec..7a3c5f11c81 100644 --- a/src/Storages/MergeTree/MergeTreeDataPartInMemory.cpp +++ b/src/Storages/MergeTree/MergeTreeDataPartInMemory.cpp @@ -142,7 +142,7 @@ DataPartStoragePtr MergeTreeDataPartInMemory::flushToDisk(const String & new_rel void MergeTreeDataPartInMemory::makeCloneInDetached(const String & prefix, const StorageMetadataPtr & metadata_snapshot) const { - String detached_path = getRelativePathForDetachedPart(prefix); + String detached_path = *getRelativePathForDetachedPart(prefix, /* broken */ false); flushToDisk(detached_path, metadata_snapshot); } diff --git a/src/Storages/MergeTree/MergeTreeIndexAnnoy.cpp b/src/Storages/MergeTree/MergeTreeIndexAnnoy.cpp index 3b16998337e..595e790ea3b 100644 --- a/src/Storages/MergeTree/MergeTreeIndexAnnoy.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexAnnoy.cpp @@ -9,6 +9,7 @@ #include #include #include +#include namespace DB @@ -64,9 +65,11 @@ uint64_t AnnoyIndex::getNumOfDimensions() const namespace ErrorCodes { - extern const int LOGICAL_ERROR; - extern const int INCORRECT_QUERY; + extern const int ILLEGAL_COLUMN; extern const int INCORRECT_DATA; + extern const int INCORRECT_NUMBER_OF_COLUMNS; + extern const int INCORRECT_QUERY; + extern const int LOGICAL_ERROR; } MergeTreeIndexGranuleAnnoy::MergeTreeIndexGranuleAnnoy(const String & index_name_, const Block & index_sample_block_) @@ -132,9 +135,7 @@ void MergeTreeIndexAggregatorAnnoy::update(const Block & block, size_t * pos, si return; if (index_sample_block.columns() > 1) - { throw Exception("Only one column is supported", ErrorCodes::LOGICAL_ERROR); - } auto index_column_name = index_sample_block.getByPosition(0).name; const auto & column_cut = block.getByName(index_column_name).column->cut(*pos, rows_read); @@ -144,27 +145,22 @@ void MergeTreeIndexAggregatorAnnoy::update(const Block & block, size_t * pos, si const auto & data = column_array->getData(); const auto & array = typeid_cast(data).getData(); if (array.empty()) - throw Exception(ErrorCodes::LOGICAL_ERROR, "Array have 0 rows, but {} expected", rows_read); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Array has 0 rows, {} rows expected", rows_read); const auto & offsets = column_array->getOffsets(); size_t num_rows = offsets.size(); - /// All sizes are the same + /// Check all sizes are the same size_t size = offsets[0]; for (size_t i = 0; i < num_rows - 1; ++i) - { if (offsets[i + 1] - offsets[i] != size) - { throw Exception(ErrorCodes::INCORRECT_DATA, "Arrays should have same length"); - } - } + index = std::make_shared(size); index->add_item(index->get_n_items(), array.data()); /// add all rows from 1 to num_rows - 1 (this is the same as the beginning of the last element) for (size_t current_row = 1; current_row < num_rows; ++current_row) - { index->add_item(index->get_n_items(), &array[offsets[current_row - 1]]); - } } else { @@ -181,19 +177,13 @@ void MergeTreeIndexAggregatorAnnoy::update(const Block & block, size_t * pos, si { const auto& pod_array = typeid_cast(column.get())->getData(); for (size_t i = 0; i < pod_array.size(); ++i) - { data[i].push_back(pod_array[i]); - } } assert(!data.empty()); if (!index) - { index = std::make_shared(data[0].size()); - } for (const auto& item : data) - { index->add_item(index->get_n_items(), item.data()); - } } *pos += rows_read; @@ -222,7 +212,7 @@ std::vector MergeTreeIndexConditionAnnoy::getUsefulRanges(MergeTreeIndex { UInt64 limit = condition.getLimit(); UInt64 index_granularity = condition.getIndexGranularity(); - std::optional comp_dist = condition.getQueryType() == 
ANN::ANNQueryInformation::Type::Where ? + std::optional comp_dist = condition.getQueryType() == ApproximateNearestNeighbour::ANNQueryInformation::Type::Where ? std::optional(condition.getComparisonDistanceForWhereQuery()) : std::nullopt; if (comp_dist && comp_dist.value() < 0) @@ -232,16 +222,13 @@ std::vector MergeTreeIndexConditionAnnoy::getUsefulRanges(MergeTreeIndex auto granule = std::dynamic_pointer_cast(idx_granule); if (granule == nullptr) - { throw Exception("Granule has the wrong type", ErrorCodes::LOGICAL_ERROR); - } + auto annoy = granule->index; if (condition.getNumOfDimensions() != annoy->getNumOfDimensions()) - { throw Exception("The dimension of the space in the request (" + toString(condition.getNumOfDimensions()) + ") " + "does not match with the dimension in the index (" + toString(annoy->getNumOfDimensions()) + ")", ErrorCodes::INCORRECT_QUERY); - } /// neighbors contain indexes of dots which were closest to target vector std::vector neighbors; @@ -268,23 +255,25 @@ std::vector MergeTreeIndexConditionAnnoy::getUsefulRanges(MergeTreeIndex for (size_t i = 0; i < neighbors.size(); ++i) { if (comp_dist && distances[i] > comp_dist) - { continue; - } granule_numbers.insert(neighbors[i] / index_granularity); } std::vector result_vector; result_vector.reserve(granule_numbers.size()); for (auto granule_number : granule_numbers) - { result_vector.push_back(granule_number); - } return result_vector; } +MergeTreeIndexAnnoy::MergeTreeIndexAnnoy(const IndexDescription & index_, uint64_t number_of_trees_) + : IMergeTreeIndex(index_) + , number_of_trees(number_of_trees_) +{ +} + MergeTreeIndexGranulePtr MergeTreeIndexAnnoy::createIndexGranule() const { return std::make_shared(index.name, index.sample_block); @@ -307,6 +296,40 @@ MergeTreeIndexPtr annoyIndexCreator(const IndexDescription & index) return std::make_shared(index, param); } +static void assertIndexColumnsType(const Block & header) +{ + DataTypePtr column_data_type_ptr = header.getDataTypes()[0]; + + if (const auto * array_type = typeid_cast(column_data_type_ptr.get())) + { + TypeIndex nested_type_index = array_type->getNestedType()->getTypeId(); + if (!WhichDataType(nested_type_index).isFloat32()) + throw Exception( + ErrorCodes::ILLEGAL_COLUMN, + "Unexpected type {} of Annoy index. Only Array(Float32) and Tuple(Float32) are supported.", + column_data_type_ptr->getName()); + } + else if (const auto * tuple_type = typeid_cast(column_data_type_ptr.get())) + { + const DataTypes & nested_types = tuple_type->getElements(); + for (const auto & type : nested_types) + { + TypeIndex nested_type_index = type->getTypeId(); + if (!WhichDataType(nested_type_index).isFloat32()) + throw Exception( + ErrorCodes::ILLEGAL_COLUMN, + "Unexpected type {} of Annoy index. Only Array(Float32) and Tuple(Float32) are supported.", + column_data_type_ptr->getName()); + } + } + else + throw Exception( + ErrorCodes::ILLEGAL_COLUMN, + "Unexpected type {} of Annoy index. 
Only Array(Float32) and Tuple(Float32) are supported.", + column_data_type_ptr->getName()); + +} + void annoyIndexValidator(const IndexDescription & index, bool /* attach */) { if (index.arguments.size() != 1) @@ -317,6 +340,11 @@ void annoyIndexValidator(const IndexDescription & index, bool /* attach */) { throw Exception("Annoy index argument must be UInt64.", ErrorCodes::INCORRECT_QUERY); } + + if (index.column_names.size() != 1 || index.data_types.size() != 1) + throw Exception("Annoy indexes must be created on a single column", ErrorCodes::INCORRECT_NUMBER_OF_COLUMNS); + + assertIndexColumnsType(index.sample_block); } } diff --git a/src/Storages/MergeTree/MergeTreeIndexAnnoy.h b/src/Storages/MergeTree/MergeTreeIndexAnnoy.h index 85bbb0a1bd2..6a844947bd2 100644 --- a/src/Storages/MergeTree/MergeTreeIndexAnnoy.h +++ b/src/Storages/MergeTree/MergeTreeIndexAnnoy.h @@ -10,8 +10,6 @@ namespace DB { -namespace ANN = ApproximateNearestNeighbour; - // auxiliary namespace for working with spotify-annoy library // mainly for serialization and deserialization of the index namespace ApproximateNearestNeighbour @@ -33,7 +31,7 @@ namespace ApproximateNearestNeighbour struct MergeTreeIndexGranuleAnnoy final : public IMergeTreeIndexGranule { - using AnnoyIndex = ANN::AnnoyIndex<>; + using AnnoyIndex = ApproximateNearestNeighbour::AnnoyIndex<>; using AnnoyIndexPtr = std::shared_ptr; MergeTreeIndexGranuleAnnoy(const String & index_name_, const Block & index_sample_block_); @@ -57,7 +55,7 @@ struct MergeTreeIndexGranuleAnnoy final : public IMergeTreeIndexGranule struct MergeTreeIndexAggregatorAnnoy final : IMergeTreeIndexAggregator { - using AnnoyIndex = ANN::AnnoyIndex<>; + using AnnoyIndex = ApproximateNearestNeighbour::AnnoyIndex<>; using AnnoyIndexPtr = std::shared_ptr; MergeTreeIndexAggregatorAnnoy(const String & index_name_, const Block & index_sample_block, uint64_t number_of_trees); @@ -74,7 +72,7 @@ struct MergeTreeIndexAggregatorAnnoy final : IMergeTreeIndexAggregator }; -class MergeTreeIndexConditionAnnoy final : public ANN::IMergeTreeIndexConditionAnn +class MergeTreeIndexConditionAnnoy final : public ApproximateNearestNeighbour::IMergeTreeIndexConditionAnn { public: MergeTreeIndexConditionAnnoy( @@ -91,18 +89,14 @@ public: ~MergeTreeIndexConditionAnnoy() override = default; private: - ANN::ANNCondition condition; + ApproximateNearestNeighbour::ANNCondition condition; }; class MergeTreeIndexAnnoy : public IMergeTreeIndex { public: - MergeTreeIndexAnnoy(const IndexDescription & index_, uint64_t number_of_trees_) - : IMergeTreeIndex(index_) - , number_of_trees(number_of_trees_) - {} - + MergeTreeIndexAnnoy(const IndexDescription & index_, uint64_t number_of_trees_); ~MergeTreeIndexAnnoy() override = default; MergeTreeIndexGranulePtr createIndexGranule() const override; diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp b/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp index 3936ee61b70..7993840f1d9 100644 --- a/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp +++ b/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp @@ -419,14 +419,14 @@ void ReplicatedMergeTreeCleanupThread::getBlocksSortedByTime(zkutil::ZooKeeper & LOG_TRACE(log, "Checking {} blocks ({} are not cached){}", stat.numChildren, not_cached_blocks, " to clear old ones from ZooKeeper."); } - zkutil::AsyncResponses exists_futures; + std::vector exists_paths; for (const String & block : blocks) { auto it = cached_block_stats.find(block); if (it == cached_block_stats.end()) { /// New 
block. Fetch its stat asynchronously. - exists_futures.emplace_back(block, zookeeper.asyncExists(storage.zookeeper_path + "/blocks/" + block)); + exists_paths.emplace_back(storage.zookeeper_path + "/blocks/" + block); } else { @@ -436,14 +436,18 @@ void ReplicatedMergeTreeCleanupThread::getBlocksSortedByTime(zkutil::ZooKeeper & } } + auto exists_size = exists_paths.size(); + auto exists_results = zookeeper.exists(exists_paths); + /// Put fetched stats into the cache - for (auto & elem : exists_futures) + for (size_t i = 0; i < exists_size; ++i) { - auto status = elem.second.get(); + auto status = exists_results[i]; if (status.error != Coordination::Error::ZNONODE) { - cached_block_stats.emplace(elem.first, std::make_pair(status.stat.ctime, status.stat.version)); - timed_blocks.emplace_back(elem.first, status.stat.ctime, status.stat.version); + auto node_name = fs::path(exists_paths[i]).filename(); + cached_block_stats.emplace(node_name, std::make_pair(status.stat.ctime, status.stat.version)); + timed_blocks.emplace_back(node_name, status.stat.ctime, status.stat.version); } } diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp index 0305ce440f9..6ffcde161da 100644 --- a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp +++ b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp @@ -41,7 +41,7 @@ ReplicatedMergeTreeQueue::ReplicatedMergeTreeQueue(StorageReplicatedMergeTree & void ReplicatedMergeTreeQueue::clear() { auto locks = lockQueue(); - assert(future_parts.empty()); + chassert(future_parts.empty()); current_parts.clear(); virtual_parts.clear(); queue.clear(); @@ -62,6 +62,7 @@ void ReplicatedMergeTreeQueue::setBrokenPartsToEnqueueFetchesOnLoading(Strings & void ReplicatedMergeTreeQueue::initialize(zkutil::ZooKeeperPtr zookeeper) { + clear(); std::lock_guard lock(state_mutex); LOG_TRACE(log, "Initializing parts in queue"); @@ -153,17 +154,19 @@ bool ReplicatedMergeTreeQueue::load(zkutil::ZooKeeperPtr zookeeper) ::sort(children.begin(), children.end()); - zkutil::AsyncResponses futures; - futures.reserve(children.size()); + auto children_num = children.size(); + std::vector paths; + paths.reserve(children_num); for (const String & child : children) - futures.emplace_back(child, zookeeper->asyncGet(fs::path(queue_path) / child)); + paths.emplace_back(fs::path(queue_path) / child); - for (auto & future : futures) + auto results = zookeeper->get(paths); + for (size_t i = 0; i < children_num; ++i) { - Coordination::GetResponse res = future.second.get(); + auto res = results[i]; LogEntryPtr entry = LogEntry::parse(res.data, res.stat); - entry->znode_name = future.first; + entry->znode_name = children[i]; std::lock_guard lock(state_mutex); @@ -641,11 +644,11 @@ int32_t ReplicatedMergeTreeQueue::pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper LOG_DEBUG(log, "Pulling {} entries to queue: {} - {}", (end - begin), *begin, *last); - zkutil::AsyncResponses futures; - futures.reserve(end - begin); + Strings get_paths; + get_paths.reserve(end - begin); for (auto it = begin; it != end; ++it) - futures.emplace_back(*it, zookeeper->asyncGet(fs::path(zookeeper_path) / "log" / *it)); + get_paths.emplace_back(fs::path(zookeeper_path) / "log" / *it); /// Simultaneously add all new entries to the queue and move the pointer to the log. 
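The ZooKeeper hunks above and below all apply the same refactoring: instead of issuing one asynchronous request per znode and draining a vector of futures, the paths are collected first and fetched with a single batched call whose results align with the inputs by index. A schematic sketch follows; `Client` and `Response` are stubs standing in for the real zkutil API:

```cpp
// Schematic sketch of the batching pattern: gather paths, one batched call,
// then rely on results[i] corresponding to paths[i].
#include <iostream>
#include <string>
#include <vector>

struct Response
{
    std::string data; // stand-in for znode payload + stat
};

struct Client
{
    /// Batched read: results are returned in the same order as the input paths.
    std::vector<Response> get(const std::vector<std::string> & paths)
    {
        std::vector<Response> results;
        results.reserve(paths.size());
        for (const auto & path : paths)
            results.push_back(Response{"payload of " + path}); // stubbed fetch
        return results;
    }
};

int main()
{
    Client client;

    std::vector<std::string> nodes = {"queue-0000000001", "queue-0000000002"};
    std::vector<std::string> paths;
    paths.reserve(nodes.size());
    for (const auto & node : nodes)
        paths.push_back("/clickhouse/tables/t/queue/" + node);

    auto results = client.get(paths);
    for (size_t i = 0; i < results.size(); ++i)
        std::cout << nodes[i] << " -> " << results[i].data << '\n'; // index pairing
}
```

Batching removes one future object per znode and lets a single round serve the whole list, which matters for tables with many queue entries or blocks.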
@@ -655,9 +658,11 @@ int32_t ReplicatedMergeTreeQueue::pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper std::optional min_unprocessed_insert_time_changed; - for (auto & future : futures) + auto get_results = zookeeper->get(get_paths); + auto get_num = get_results.size(); + for (size_t i = 0; i < get_num; ++i) { - Coordination::GetResponse res = future.second.get(); + auto res = get_results[i]; copied_entries.emplace_back(LogEntry::parse(res.data, res.stat)); diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp b/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp index 0abea5977c3..158cbfca9fd 100644 --- a/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp +++ b/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp @@ -99,19 +99,22 @@ size_t ReplicatedMergeTreeSink::checkQuorumPrecondition(zkutil::ZooKeeperPtr & z quorum_info.status_path = storage.zookeeper_path + "/quorum/status"; Strings replicas = zookeeper->getChildren(fs::path(storage.zookeeper_path) / "replicas"); - std::vector> replicas_status_futures; - replicas_status_futures.reserve(replicas.size()); + + Strings exists_paths; for (const auto & replica : replicas) if (replica != storage.replica_name) - replicas_status_futures.emplace_back(zookeeper->asyncExists(fs::path(storage.zookeeper_path) / "replicas" / replica / "is_active")); + exists_paths.emplace_back(fs::path(storage.zookeeper_path) / "replicas" / replica / "is_active"); - std::future is_active_future = zookeeper->asyncTryGet(storage.replica_path + "/is_active"); - std::future host_future = zookeeper->asyncTryGet(storage.replica_path + "/host"); + auto exists_result = zookeeper->exists(exists_paths); + auto get_results = zookeeper->get(Strings{storage.replica_path + "/is_active", storage.replica_path + "/host"}); size_t active_replicas = 1; /// Assume current replica is active (will check below) - for (auto & status : replicas_status_futures) - if (status.get().error == Coordination::Error::ZOK) + for (size_t i = 0; i < exists_paths.size(); ++i) + { + auto status = exists_result[i]; + if (status.error == Coordination::Error::ZOK) ++active_replicas; + } size_t replicas_number = replicas.size(); size_t quorum_size = getQuorumSize(replicas_number); @@ -135,8 +138,8 @@ size_t ReplicatedMergeTreeSink::checkQuorumPrecondition(zkutil::ZooKeeperPtr & z /// Both checks are implicitly made also later (otherwise there would be a race condition). 
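For the checkQuorumPrecondition() change in ReplicatedMergeTreeSink above, the accounting itself is simple: count the current replica as active, add every other replica whose is_active node exists, and compare against the quorum size. A toy version follows; the majority formula shown is only an illustrative assumption, since the real value comes from getQuorumSize():

```cpp
// Toy version of the quorum precondition check (majority rule is assumed here
// for illustration only).
#include <iostream>
#include <vector>

int main()
{
    /// One flag per *other* replica: whether its "is_active" znode exists.
    std::vector<bool> other_replica_is_active = {true, false, true};

    size_t active_replicas = 1; /// Assume the current replica is active (verified later).
    for (bool is_active : other_replica_is_active)
        if (is_active)
            ++active_replicas;

    size_t replicas_number = other_replica_is_active.size() + 1;
    size_t quorum_size = replicas_number / 2 + 1; // hypothetical majority rule

    if (active_replicas < quorum_size)
        std::cout << "Number of alive replicas (" << active_replicas
                  << ") is less than requested quorum (" << quorum_size << ")\n";
    else
        std::cout << "Quorum precondition holds\n";
}
```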
- auto is_active = is_active_future.get(); - auto host = host_future.get(); + auto is_active = get_results[0]; + auto host = get_results[1]; if (is_active.error == Coordination::Error::ZNONODE || host.error == Coordination::Error::ZNONODE) throw Exception("Replica is not active right now", ErrorCodes::READONLY); diff --git a/src/Storages/StorageKeeperMap.cpp b/src/Storages/StorageKeeperMap.cpp index e62874490f8..f0bf4e431ae 100644 --- a/src/Storages/StorageKeeperMap.cpp +++ b/src/Storages/StorageKeeperMap.cpp @@ -682,24 +682,20 @@ Chunk StorageKeeperMap::getBySerializedKeys(const std::span k auto client = getClient(); - std::vector> values; - values.reserve(keys.size()); + Strings full_key_paths; + full_key_paths.reserve(keys.size()); for (const auto & key : keys) { - const auto full_path = fullPathForKey(key); - values.emplace_back(client->asyncTryGet(full_path)); + full_key_paths.emplace_back(fullPathForKey(key)); } - auto wait_until = std::chrono::system_clock::now() + std::chrono::milliseconds(Coordination::DEFAULT_OPERATION_TIMEOUT_MS); + auto values = client->tryGet(full_key_paths); for (size_t i = 0; i < keys.size(); ++i) { - auto & value = values[i]; - if (value.wait_until(wait_until) != std::future_status::ready) - throw DB::Exception(ErrorCodes::KEEPER_EXCEPTION, "Failed to fetch values: timeout"); + auto response = values[i]; - auto response = value.get(); Coordination::Error code = response.error; if (code == Coordination::Error::ZOK) diff --git a/src/Storages/StorageMergeTree.cpp b/src/Storages/StorageMergeTree.cpp index a65af1cf69e..7cfce882e7a 100644 --- a/src/Storages/StorageMergeTree.cpp +++ b/src/Storages/StorageMergeTree.cpp @@ -993,14 +993,6 @@ MergeMutateSelectedEntryPtr StorageMergeTree::selectPartsToMutate( const StorageMetadataPtr & metadata_snapshot, String * /* disable_reason */, TableLockHolder & /* table_lock_holder */, std::unique_lock & /*currently_processing_in_background_mutex_lock*/) { - size_t max_ast_elements = getContext()->getSettingsRef().max_expanded_ast_elements; - - auto future_part = std::make_shared(); - if (storage_settings.get()->assign_part_uuids) - future_part->uuid = UUIDHelpers::generateV4(); - - CurrentlyMergingPartsTaggerPtr tagger; - if (current_mutations_by_version.empty()) return {}; @@ -1014,6 +1006,14 @@ MergeMutateSelectedEntryPtr StorageMergeTree::selectPartsToMutate( return {}; } + size_t max_ast_elements = getContext()->getSettingsRef().max_expanded_ast_elements; + + auto future_part = std::make_shared(); + if (storage_settings.get()->assign_part_uuids) + future_part->uuid = UUIDHelpers::generateV4(); + + CurrentlyMergingPartsTaggerPtr tagger; + auto mutations_end_it = current_mutations_by_version.end(); for (const auto & part : getDataPartsVectorForInternalUsage()) { @@ -1152,7 +1152,8 @@ bool StorageMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assign return false; merge_entry = selectPartsToMerge(metadata_snapshot, false, {}, false, nullptr, share_lock, lock, txn); - if (!merge_entry) + + if (!merge_entry && !current_mutations_by_version.empty()) mutate_entry = selectPartsToMutate(metadata_snapshot, nullptr, share_lock, lock); has_mutations = !current_mutations_by_version.empty(); diff --git a/src/Storages/StorageReplicatedMergeTree.cpp b/src/Storages/StorageReplicatedMergeTree.cpp index 37bed80bfb4..eddb6f6bfcd 100644 --- a/src/Storages/StorageReplicatedMergeTree.cpp +++ b/src/Storages/StorageReplicatedMergeTree.cpp @@ -286,21 +286,32 @@ StorageReplicatedMergeTree::StorageReplicatedMergeTree( , 
replicated_fetches_throttler(std::make_shared<Throttler>(getSettings()->max_replicated_fetches_network_bandwidth, getContext()->getReplicatedFetchesThrottler())) , replicated_sends_throttler(std::make_shared<Throttler>(getSettings()->max_replicated_sends_network_bandwidth, getContext()->getReplicatedSendsThrottler())) { + /// We create and deactivate all tasks for consistency. + /// They will all be scheduled and activated by the restarting thread. queue_updating_task = getContext()->getSchedulePool().createTask( getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::queueUpdatingTask)", [this]{ queueUpdatingTask(); }); + queue_updating_task->deactivate(); + mutations_updating_task = getContext()->getSchedulePool().createTask( getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::mutationsUpdatingTask)", [this]{ mutationsUpdatingTask(); }); + mutations_updating_task->deactivate(); + merge_selecting_task = getContext()->getSchedulePool().createTask( getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::mergeSelectingTask)", [this] { mergeSelectingTask(); }); - /// Will be activated if we win leader election. + /// Will be activated if we achieve the leader state. merge_selecting_task->deactivate(); mutations_finalizing_task = getContext()->getSchedulePool().createTask( getStorageID().getFullTableName() + " (StorageReplicatedMergeTree::mutationsFinalizingTask)", [this] { mutationsFinalizingTask(); }); + /// This task can be scheduled by different parts of code even when storage is readonly. + /// This can lead to redundant exceptions during startup. + /// Will be activated by the restarting thread. + mutations_finalizing_task->deactivate(); + bool has_zookeeper = getContext()->hasZooKeeper() || getContext()->hasAuxiliaryZooKeeper(zookeeper_name); if (has_zookeeper) { @@ -2409,6 +2420,7 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo std::vector source_queue; ActiveDataPartSet get_part_set{format_version}; ActiveDataPartSet drop_range_set{format_version}; + std::unordered_set<String> exact_part_names; { std::vector queue_get_futures; @@ -2446,14 +2458,22 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo info.parsed_entry->znode_name = source_queue_names[i]; if (info.parsed_entry->type == LogEntry::DROP_RANGE) + { drop_range_set.add(info.parsed_entry->new_part_name); - - if (info.parsed_entry->type == LogEntry::GET_PART) + } + else if (info.parsed_entry->type == LogEntry::GET_PART) { String maybe_covering_drop_range = drop_range_set.getContainingPart(info.parsed_entry->new_part_name); if (maybe_covering_drop_range.empty()) get_part_set.add(info.parsed_entry->new_part_name); } + else + { + /// We should keep local parts if they are present in the queue of the source replica. + /// There's a chance that we are the only replica that has these parts. + Strings entry_virtual_parts = info.parsed_entry->getVirtualPartNames(format_version); + std::move(entry_virtual_parts.begin(), entry_virtual_parts.end(), std::inserter(exact_part_names, exact_part_names.end())); + } } } @@ -2473,11 +2493,17 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo for (const auto & part : local_parts_in_zk) { - if (get_part_set.getContainingPart(part).empty()) - { - parts_to_remove_from_zk.emplace_back(part); - LOG_WARNING(log, "Source replica does not have part {}. 
Removing it from ZooKeeper.", part); - } + /// We look for an exact match (and not for any covering part) + /// because our part might be dropped and covering part might be merged through a gap. + /// (avoid resurrection of data that was removed a long time ago) + if (get_part_set.getContainingPart(part) == part) + continue; + + if (exact_part_names.contains(part)) + continue; + + parts_to_remove_from_zk.emplace_back(part); + LOG_WARNING(log, "Source replica does not have part {}. Removing it from ZooKeeper.", part); } { @@ -2499,11 +2525,14 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo for (const auto & part : local_active_parts) { - if (get_part_set.getContainingPart(part->name).empty()) - { - parts_to_remove_from_working_set.emplace_back(part); - LOG_WARNING(log, "Source replica does not have part {}. Removing it from working set.", part->name); - } + if (get_part_set.getContainingPart(part->name) == part->name) + continue; + + if (exact_part_names.contains(part->name)) + continue; + + parts_to_remove_from_working_set.emplace_back(part); + LOG_WARNING(log, "Source replica does not have part {}. Removing it from working set.", part->name); } if (getSettings()->detach_old_local_parts_when_cloning_replica) @@ -3207,16 +3236,17 @@ StorageReplicatedMergeTree::CreateMergeEntryResult StorageReplicatedMergeTree::c int32_t log_version, MergeType merge_type) { - std::vector<std::future<Coordination::ExistsResponse>> exists_futures; - exists_futures.reserve(parts.size()); + Strings exists_paths; + exists_paths.reserve(parts.size()); for (const auto & part : parts) - exists_futures.emplace_back(zookeeper->asyncExists(fs::path(replica_path) / "parts" / part->name)); + exists_paths.emplace_back(fs::path(replica_path) / "parts" / part->name); + auto exists_results = zookeeper->exists(exists_paths); bool all_in_zk = true; for (size_t i = 0; i < parts.size(); ++i) { /// If there is no information about part in ZK, we will not merge it. 
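Condensing the cloneReplica() retention rules from the hunk above: a local part survives only if the source replica has exactly the same part, or if the part is referenced by an entry in the source queue; anything else is removed rather than resurrected via a covering part. A stand-alone sketch with stub containers (the real code uses ActiveDataPartSet):

```cpp
// Stand-alone sketch of the retention predicate (std::set stands in for
// ActiveDataPartSet; getContainingPart() is stubbed to exact matches only).
#include <iostream>
#include <set>
#include <string>
#include <vector>

std::string getContainingPart(const std::set<std::string> & parts, const std::string & part)
{
    return parts.count(part) ? part : std::string{};
}

int main()
{
    std::set<std::string> get_part_set = {"all_1_1_0"};     // parts the source replica has
    std::set<std::string> exact_part_names = {"all_2_2_0"}; // parts referenced by its queue

    std::vector<std::string> local_parts = {"all_1_1_0", "all_2_2_0", "all_3_3_0"};
    std::vector<std::string> parts_to_remove;

    for (const auto & part : local_parts)
    {
        /// Keep only on an exact match, not on any covering part, to avoid
        /// resurrecting data that was removed a long time ago.
        if (getContainingPart(get_part_set, part) == part)
            continue;
        if (exact_part_names.count(part))
            continue;
        parts_to_remove.push_back(part);
    }

    for (const auto & part : parts_to_remove)
        std::cout << "would remove: " << part << '\n'; // prints only all_3_3_0
}
```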
- if (exists_futures[i].get().error == Coordination::Error::ZNONODE) + if (exists_results[i].error == Coordination::Error::ZNONODE) { all_in_zk = false; @@ -6229,19 +6259,20 @@ void StorageReplicatedMergeTree::removePartsFromZooKeeperWithRetries(const Strin auto zookeeper = getZooKeeper(); - std::vector> exists_futures; - exists_futures.reserve(part_names.size()); + Strings exists_paths; + exists_paths.reserve(part_names.size()); for (const String & part_name : part_names) { - String part_path = fs::path(replica_path) / "parts" / part_name; - exists_futures.emplace_back(zookeeper->asyncExists(part_path)); + exists_paths.emplace_back(fs::path(replica_path) / "parts" / part_name); } + auto exists_results = zookeeper->exists(exists_paths); + std::vector> remove_futures; remove_futures.reserve(part_names.size()); for (size_t i = 0; i < part_names.size(); ++i) { - Coordination::ExistsResponse exists_resp = exists_futures[i].get(); + Coordination::ExistsResponse exists_resp = exists_results[i]; if (exists_resp.error == Coordination::Error::ZOK) { Coordination::Requests ops; @@ -6287,9 +6318,9 @@ void StorageReplicatedMergeTree::removePartsFromZooKeeperWithRetries(const Strin void StorageReplicatedMergeTree::removePartsFromZooKeeper( zkutil::ZooKeeperPtr & zookeeper, const Strings & part_names, NameSet * parts_should_be_retried) { - std::vector> exists_futures; + Strings exists_paths; std::vector> remove_futures; - exists_futures.reserve(part_names.size()); + exists_paths.reserve(part_names.size()); remove_futures.reserve(part_names.size()); try { @@ -6297,13 +6328,14 @@ void StorageReplicatedMergeTree::removePartsFromZooKeeper( /// if zk session will be dropped for (const String & part_name : part_names) { - String part_path = fs::path(replica_path) / "parts" / part_name; - exists_futures.emplace_back(zookeeper->asyncExists(part_path)); + exists_paths.emplace_back(fs::path(replica_path) / "parts" / part_name); } + auto exists_results = zookeeper->exists(exists_paths); + for (size_t i = 0; i < part_names.size(); ++i) { - Coordination::ExistsResponse exists_resp = exists_futures[i].get(); + auto exists_resp = exists_results[i]; if (exists_resp.error == Coordination::Error::ZOK) { Coordination::Requests ops; diff --git a/src/Storages/StorageS3.h b/src/Storages/StorageS3.h index a983a59d98c..c74a8501964 100644 --- a/src/Storages/StorageS3.h +++ b/src/Storages/StorageS3.h @@ -197,7 +197,7 @@ public: const S3::URI uri; std::shared_ptr client; - S3Settings::AuthSettings auth_settings; + S3::AuthSettings auth_settings; S3Settings::ReadWriteSettings rw_settings; /// If s3 configuration was passed from ast, then it is static. 
@@ -209,7 +209,7 @@ public: S3Configuration( const String & url_, - const S3Settings::AuthSettings & auth_settings_, + const S3::AuthSettings & auth_settings_, const S3Settings::ReadWriteSettings & rw_settings_, const HeaderCollection & headers_from_ast_) : uri(S3::URI(url_)) diff --git a/src/Storages/StorageS3Settings.cpp b/src/Storages/StorageS3Settings.cpp index 4ab3375e188..65e9bb1ab8c 100644 --- a/src/Storages/StorageS3Settings.cpp +++ b/src/Storages/StorageS3Settings.cpp @@ -1,5 +1,7 @@ #include +#include + #include #include #include @@ -9,10 +11,6 @@ namespace DB { -namespace ErrorCodes -{ - extern const int INVALID_CONFIG_PARAMETER; -} void StorageS3Settings::loadFromConfig(const String & config_elem, const Poco::Util::AbstractConfiguration & config, const Settings & settings) { @@ -46,41 +44,8 @@ void StorageS3Settings::loadFromConfig(const String & config_elem, const Poco::U if (config.has(config_elem + "." + key + ".endpoint")) { auto endpoint = get_string_for_key(key, "endpoint", false); - auto access_key_id = get_string_for_key(key, "access_key_id"); - auto secret_access_key = get_string_for_key(key, "secret_access_key"); - auto region = get_string_for_key(key, "region"); - auto server_side_encryption_customer_key_base64 = get_string_for_key(key, "server_side_encryption_customer_key_base64"); - std::optional use_environment_credentials; - if (config.has(config_elem + "." + key + ".use_environment_credentials")) - use_environment_credentials = config.getBool(config_elem + "." + key + ".use_environment_credentials"); - - std::optional use_insecure_imds_request; - if (config.has(config_elem + "." + key + ".use_insecure_imds_request")) - use_insecure_imds_request = config.getBool(config_elem + "." + key + ".use_insecure_imds_request"); - - HeaderCollection headers; - Poco::Util::AbstractConfiguration::Keys subconfig_keys; - config.keys(config_elem + "." + key, subconfig_keys); - for (const String & subkey : subconfig_keys) - { - if (subkey.starts_with("header")) - { - auto header_str = config.getString(config_elem + "." + key + "." + subkey); - auto delimiter = header_str.find(':'); - if (delimiter == String::npos) - throw Exception("Malformed s3 header value", ErrorCodes::INVALID_CONFIG_PARAMETER); - headers.emplace_back(HttpHeader{header_str.substr(0, delimiter), header_str.substr(delimiter + 1, String::npos)}); - } - } - - S3Settings::AuthSettings auth_settings{ - std::move(access_key_id), std::move(secret_access_key), - std::move(region), - std::move(server_side_encryption_customer_key_base64), - std::move(headers), - use_environment_credentials, - use_insecure_imds_request}; + auto auth_settings = S3::AuthSettings::loadFromConfig(config_elem + "." 
+ key, config); S3Settings::ReadWriteSettings rw_settings; rw_settings.max_single_read_retries = get_uint_for_key(key, "max_single_read_retries", true, settings.s3_max_single_read_retries); diff --git a/src/Storages/StorageS3Settings.h b/src/Storages/StorageS3Settings.h index 80ef4f52deb..2da4a1d7590 100644 --- a/src/Storages/StorageS3Settings.h +++ b/src/Storages/StorageS3Settings.h @@ -9,6 +9,8 @@ #include #include +#include <IO/S3Common.h> + namespace Poco::Util { class AbstractConfiguration; } @@ -21,46 +23,6 @@ struct Settings; struct S3Settings { - struct AuthSettings - { - String access_key_id; - String secret_access_key; - String region; - String server_side_encryption_customer_key_base64; - - HeaderCollection headers; - - std::optional<bool> use_environment_credentials; - std::optional<bool> use_insecure_imds_request; - - inline bool operator==(const AuthSettings & other) const - { - return access_key_id == other.access_key_id && secret_access_key == other.secret_access_key - && region == other.region - && server_side_encryption_customer_key_base64 == other.server_side_encryption_customer_key_base64 - && headers == other.headers - && use_environment_credentials == other.use_environment_credentials - && use_insecure_imds_request == other.use_insecure_imds_request; - } - - void updateFrom(const AuthSettings & from) - { - /// Update with check for emptyness only parameters which - /// can be passed not only from config, but via ast. - - if (!from.access_key_id.empty()) - access_key_id = from.access_key_id; - if (!from.secret_access_key.empty()) - secret_access_key = from.secret_access_key; - - headers = from.headers; - region = from.region; - server_side_encryption_customer_key_base64 = from.server_side_encryption_customer_key_base64; - use_environment_credentials = from.use_environment_credentials; - use_insecure_imds_request = from.use_insecure_imds_request; - } - }; - struct ReadWriteSettings { size_t max_single_read_retries = 0; @@ -90,7 +52,7 @@ struct S3Settings void updateFromSettingsIfEmpty(const Settings & settings); }; - AuthSettings auth_settings; + S3::AuthSettings auth_settings; ReadWriteSettings rw_settings; inline bool operator==(const S3Settings & other) const diff --git a/src/Storages/getStructureOfRemoteTable.cpp b/src/Storages/getStructureOfRemoteTable.cpp index 3d104ada0b6..a93a480adb0 100644 --- a/src/Storages/getStructureOfRemoteTable.cpp +++ b/src/Storages/getStructureOfRemoteTable.cpp @@ -58,7 +58,7 @@ ColumnsDescription getStructureOfRemoteTableInShard( } ColumnsDescription res; - auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef()); + auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef(), table_id); /// Expect only needed columns from the result of DESC TABLE. NOTE 'comment' column is ignored for compatibility reasons. Block sample_block @@ -169,7 +169,7 @@ ColumnsDescriptionByShardNum getExtendedObjectsOfRemoteTables( const auto & shards_info = cluster.getShardsInfo(); auto query = "DESC TABLE " + remote_table_id.getFullTableName(); - auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef()); + auto new_context = ClusterProxy::updateSettingsForCluster(cluster, context, context->getSettingsRef(), remote_table_id); new_context->setSetting("describe_extend_object_types", true); /// Expect only needed columns from the result of DESC TABLE.
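Note: the AuthSettings struct removed from this header now lives as S3::AuthSettings (hence the new include), and the merge rule it carried is worth spelling out: credentials that can also arrive via the query AST override the config-derived values only when non-empty, while the remaining fields are copied unconditionally. A rough Python rendering of that updateFrom rule, for illustration only:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class AuthSettings:
        access_key_id: str = ""
        secret_access_key: str = ""
        region: str = ""
        server_side_encryption_customer_key_base64: str = ""
        headers: list = field(default_factory=list)
        use_environment_credentials: Optional[bool] = None
        use_insecure_imds_request: Optional[bool] = None

        def update_from(self, other: "AuthSettings") -> None:
            # Only non-empty credentials win, since they may come from the AST.
            if other.access_key_id:
                self.access_key_id = other.access_key_id
            if other.secret_access_key:
                self.secret_access_key = other.secret_access_key
            # Everything else is taken from the incoming settings as-is.
            self.headers = other.headers
            self.region = other.region
            self.server_side_encryption_customer_key_base64 = (
                other.server_side_encryption_customer_key_base64
            )
            self.use_environment_credentials = other.use_environment_credentials
            self.use_insecure_imds_request = other.use_insecure_imds_request

Keeping the empties-don't-override rule in a single shared struct is the point of moving it next to the rest of the S3 code.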
diff --git a/tests/clickhouse-test b/tests/clickhouse-test index 12f85a5adbf..20e63412d91 100755 --- a/tests/clickhouse-test +++ b/tests/clickhouse-test @@ -987,7 +987,7 @@ class TestCase: and (proc.stderr is None) and (proc.stdout is None or "Exception" not in proc.stdout) ) - need_drop_database = not maybe_passed + need_drop_database = maybe_passed debug_log = "" if os.path.exists(self.testcase_args.debug_log_file): @@ -2055,7 +2055,7 @@ if __name__ == "__main__": parser.add_argument( "--no-drop-if-fail", action="store_true", - help="Do not drop database for test if test has failed", + help="Do not drop database for test if test has failed (does not work if reference file mismatch)", ) parser.add_argument( "--hide-db-name", diff --git a/tests/config/config.d/storage_conf.xml b/tests/config/config.d/storage_conf.xml index a2a7f5cc750..8226d801cef 100644 --- a/tests/config/config.d/storage_conf.xml +++ b/tests/config/config.d/storage_conf.xml @@ -93,6 +93,15 @@ <max_size>22548578304</max_size> <do_not_evict_index_and_mark_files>0</do_not_evict_index_and_mark_files> + <s3_cache_6> + <type>cache</type> + <disk>s3_disk_6</disk> + <path>s3_cache_6/</path> + <max_size>22548578304</max_size> + <do_not_evict_index_and_mark_files>0</do_not_evict_index_and_mark_files> + <enable_bypass_cache_with_threashold>1</enable_bypass_cache_with_threashold> + <bypass_cache_threashold>100</bypass_cache_threashold> + </s3_cache_6> <type>cache</type> <disk>s3_disk_6</disk> @@ -183,6 +192,13 @@ + <s3_cache_6> + <volumes> + <main> + <disk>s3_cache_6</disk> + </main> + </volumes> + </s3_cache_6>
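Note: the new s3_cache_6 disk and policy above back the 02240_filesystem_cache_bypass_cache_threshold test added later in this patch. Assuming the usual reading of a bypass threshold (an assumption; this sketch is not the FileCache implementation, and the tag names in the block above are reconstructed from context), the decision being configured is roughly:

    def should_bypass_cache(range_size: int, bypass_threshold: int = 100) -> bool:
        # Assumed semantics: read ranges larger than the threshold skip the
        # cache and go straight to S3, leaving no row in
        # system.filesystem_cache; smaller ranges are cached as usual.
        return range_size > bypass_threshold

The test's repeated SELECTs against system.filesystem_cache probe exactly which reads left cache entries behind.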
diff --git a/tests/integration/helpers/cluster.py b/tests/integration/helpers/cluster.py index c987ca292c1..666833013c8 100644 --- a/tests/integration/helpers/cluster.py +++ b/tests/integration/helpers/cluster.py @@ -2678,7 +2678,9 @@ class ClickHouseCluster: # Check server logs for Fatal messages and sanitizer failures. # NOTE: we cannot do this via docker since in case of Fatal message container may already die. for name, instance in self.instances.items(): - if instance.contains_in_log(SANITIZER_SIGN, from_host=True): + if instance.contains_in_log( + SANITIZER_SIGN, from_host=True, filename="stderr.log" + ): sanitizer_assert_instance = instance.grep_in_log( SANITIZER_SIGN, from_host=True, filename="stderr.log" ) diff --git a/tests/integration/test_disks_app_func/test.py b/tests/integration/test_disks_app_func/test.py index d87f387e122..de9b23abd5e 100644 --- a/tests/integration/test_disks_app_func/test.py +++ b/tests/integration/test_disks_app_func/test.py @@ -37,7 +37,7 @@ def test_disks_app_func_ld(started_cluster): source = cluster.instances["disks_app_test"] out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "list-disks"] + ["/usr/bin/clickhouse", "disks", "--save-logs", "list-disks"] ) disks = out.split("\n") @@ -51,7 +51,7 @@ def test_disks_app_func_ls(started_cluster): init_data(source) out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test1", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test1", "list", "."] ) files = out.split("\n") @@ -62,7 +62,7 @@ def test_disks_app_func_ls(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test1", "list", @@ -89,7 +89,7 @@ def test_disks_app_func_cp(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test1", "write", @@ -114,7 +114,7 @@ def test_disks_app_func_cp(started_cluster): ) out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test2", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test2", "list", "."] ) assert "path1" in out @@ -123,7 +123,7 @@ def test_disks_app_func_cp(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test2", "remove", @@ -135,7 +135,7 @@ def test_disks_app_func_cp(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test1", "remove", @@ -146,13 +146,13 @@ def test_disks_app_func_cp(started_cluster): # alesapin: Why we need list one more time? 
# kssenii: it is an assertion that the file is indeed deleted out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test2", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test2", "list", "."] ) assert "path1" not in out out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test1", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test1", "list", "."] ) assert "path1" not in out @@ -174,7 +174,7 @@ def test_disks_app_func_ln(started_cluster): ) out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "list", "data/default/"] + ["/usr/bin/clickhouse", "disks", "--save-logs", "list", "data/default/"] ) files = out.split("\n") @@ -196,7 +196,7 @@ def test_disks_app_func_rm(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test2", "write", @@ -207,7 +207,7 @@ def test_disks_app_func_rm(started_cluster): ) out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test2", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test2", "list", "."] ) assert "path3" in out @@ -216,7 +216,7 @@ def test_disks_app_func_rm(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test2", "remove", @@ -225,7 +225,7 @@ def test_disks_app_func_rm(started_cluster): ) out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test2", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test2", "list", "."] ) assert "path3" not in out @@ -237,7 +237,7 @@ def test_disks_app_func_mv(started_cluster): init_data(source) out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test1", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test1", "list", "."] ) files = out.split("\n") @@ -257,7 +257,7 @@ def test_disks_app_func_mv(started_cluster): ) out = source.exec_in_container( - ["/usr/bin/clickhouse", "disks", "--send-logs", "--disk", "test1", "list", "."] + ["/usr/bin/clickhouse", "disks", "--save-logs", "--disk", "test1", "list", "."] ) files = out.split("\n") @@ -277,7 +277,7 @@ def test_disks_app_func_read_write(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test1", "write", @@ -291,7 +291,7 @@ def test_disks_app_func_read_write(started_cluster): [ "/usr/bin/clickhouse", "disks", - "--send-logs", + "--save-logs", "--disk", "test1", "read", diff --git a/tests/integration/test_keeper_s3_snapshot/__init__.py b/tests/integration/test_keeper_s3_snapshot/__init__.py new file mode 100644 index 00000000000..e5a0d9b4834 --- /dev/null +++ b/tests/integration/test_keeper_s3_snapshot/__init__.py @@ -0,0 +1 @@ +#!/usr/bin/env python3 diff --git a/tests/integration/test_keeper_s3_snapshot/configs/keeper_config1.xml b/tests/integration/test_keeper_s3_snapshot/configs/keeper_config1.xml new file mode 100644 index 00000000000..8459ea3e068 --- /dev/null +++ b/tests/integration/test_keeper_s3_snapshot/configs/keeper_config1.xml @@ -0,0 +1,42 @@ + + + + http://minio1:9001/snapshots/ + minio + minio123 + + 9181 + 1 + /var/lib/clickhouse/coordination/log + /var/lib/clickhouse/coordination/snapshots + * + + + 5000 + 10000 + 5000 + 50 + trace + + + + + 1 + node1 + 9234 + + + 2 + node2 + 9234 + true + + + 3 + node3 + 9234 + true + + + + diff --git 
a/tests/integration/test_keeper_s3_snapshot/configs/keeper_config2.xml b/tests/integration/test_keeper_s3_snapshot/configs/keeper_config2.xml new file mode 100644 index 00000000000..dfe73628f66 --- /dev/null +++ b/tests/integration/test_keeper_s3_snapshot/configs/keeper_config2.xml @@ -0,0 +1,42 @@ + + + + http://minio1:9001/snapshots/ + minio + minio123 + + 9181 + 2 + /var/lib/clickhouse/coordination/log + /var/lib/clickhouse/coordination/snapshots + * + + + 5000 + 10000 + 5000 + 75 + trace + + + + + 1 + node1 + 9234 + + + 2 + node2 + 9234 + true + + + 3 + node3 + 9234 + true + + + + diff --git a/tests/integration/test_keeper_s3_snapshot/configs/keeper_config3.xml b/tests/integration/test_keeper_s3_snapshot/configs/keeper_config3.xml new file mode 100644 index 00000000000..948d9527718 --- /dev/null +++ b/tests/integration/test_keeper_s3_snapshot/configs/keeper_config3.xml @@ -0,0 +1,42 @@ + + + + http://minio1:9001/snapshots/ + minio + minio123 + + 9181 + 3 + /var/lib/clickhouse/coordination/log + /var/lib/clickhouse/coordination/snapshots + * + + + 5000 + 10000 + 5000 + 75 + trace + + + + + 1 + node1 + 9234 + + + 2 + node2 + 9234 + true + + + 3 + node3 + 9234 + true + + + + diff --git a/tests/integration/test_keeper_s3_snapshot/test.py b/tests/integration/test_keeper_s3_snapshot/test.py new file mode 100644 index 00000000000..3e19bc4822c --- /dev/null +++ b/tests/integration/test_keeper_s3_snapshot/test.py @@ -0,0 +1,120 @@ +import pytest +from helpers.cluster import ClickHouseCluster +from time import sleep + +from kazoo.client import KazooClient + +# from kazoo.protocol.serialization import Connect, read_buffer, write_buffer + +cluster = ClickHouseCluster(__file__) +node1 = cluster.add_instance( + "node1", + main_configs=["configs/keeper_config1.xml"], + stay_alive=True, + with_minio=True, +) +node2 = cluster.add_instance( + "node2", + main_configs=["configs/keeper_config2.xml"], + stay_alive=True, + with_minio=True, +) +node3 = cluster.add_instance( + "node3", + main_configs=["configs/keeper_config3.xml"], + stay_alive=True, + with_minio=True, +) + + +@pytest.fixture(scope="module") +def started_cluster(): + try: + cluster.start() + + cluster.minio_client.make_bucket("snapshots") + + yield cluster + + finally: + cluster.shutdown() + + +def get_fake_zk(nodename, timeout=30.0): + _fake_zk_instance = KazooClient( + hosts=cluster.get_instance_ip(nodename) + ":9181", timeout=timeout + ) + _fake_zk_instance.start() + return _fake_zk_instance + + +def destroy_zk_client(zk): + try: + if zk: + zk.stop() + zk.close() + except: + pass + + +def wait_node(node): + for _ in range(100): + zk = None + try: + zk = get_fake_zk(node.name, timeout=30.0) + zk.sync("/") + print("node", node.name, "ready") + break + except Exception as ex: + sleep(0.2) + print("Waiting until", node.name, "will be ready, exception", ex) + finally: + destroy_zk_client(zk) + else: + raise Exception("Can't wait node", node.name, "to become ready") + + +def test_s3_upload(started_cluster): + node1_zk = get_fake_zk(node1.name) + + # we defined in configs snapshot_distance as 50 + # so after 50 requests we should generate a snapshot + for _ in range(210): + node1_zk.create("/test", sequence=True) + + def get_saved_snapshots(): + return [ + obj.object_name + for obj in list(cluster.minio_client.list_objects("snapshots")) + ] + + saved_snapshots = get_saved_snapshots() + assert set(saved_snapshots) == set( + [ + "snapshot_50.bin.zstd", + "snapshot_100.bin.zstd", + "snapshot_150.bin.zstd", + "snapshot_200.bin.zstd", + ] + ) + + 
destroy_zk_client(node1_zk) + node1.stop_clickhouse(kill=True) + + # wait for new leader to be picked and that it continues + # uploading snapshots + wait_node(node2) + node2_zk = get_fake_zk(node2.name) + for _ in range(200): + node2_zk.create("/test", sequence=True) + + saved_snapshots = get_saved_snapshots() + + assert len(saved_snapshots) > 4 + + success_upload_message = "Successfully uploaded" + assert node2.contains_in_log(success_upload_message) or node3.contains_in_log( + success_upload_message + ) + + destroy_zk_client(node2_zk) diff --git a/tests/integration/test_partition/configs/testkeeper.xml b/tests/integration/test_partition/configs/testkeeper.xml new file mode 100644 index 00000000000..5200b789a9b --- /dev/null +++ b/tests/integration/test_partition/configs/testkeeper.xml @@ -0,0 +1,6 @@ + + + + testkeeper + + \ No newline at end of file diff --git a/tests/integration/test_partition/test.py b/tests/integration/test_partition/test.py index f3df66631a5..320209b5d7e 100644 --- a/tests/integration/test_partition/test.py +++ b/tests/integration/test_partition/test.py @@ -2,9 +2,15 @@ import pytest import logging from helpers.cluster import ClickHouseCluster from helpers.test_tools import TSV +from helpers.test_tools import assert_eq_with_retry cluster = ClickHouseCluster(__file__) -instance = cluster.add_instance("instance") +instance = cluster.add_instance( + "instance", + main_configs=[ + "configs/testkeeper.xml", + ], +) q = instance.query path_to_data = "/var/lib/clickhouse/" @@ -478,3 +484,86 @@ def test_detached_part_dir_exists(started_cluster): == "all_1_1_0\nall_1_1_0_try1\nall_2_2_0\nall_2_2_0_try1\n" ) q("drop table detached_part_dir_exists") + + +def test_make_clone_in_detached(started_cluster): + q( + "create table clone_in_detached (n int, m String) engine=ReplicatedMergeTree('/clone_in_detached', '1') order by n" + ) + + path = path_to_data + "data/default/clone_in_detached/" + + # broken part already detached + q("insert into clone_in_detached values (42, '¯\_(ツ)_/¯')") + instance.exec_in_container(["rm", path + "all_0_0_0/data.bin"]) + instance.exec_in_container( + ["cp", "-r", path + "all_0_0_0", path + "detached/broken_all_0_0_0"] + ) + assert_eq_with_retry(instance, "select * from clone_in_detached", "\n") + assert ["broken_all_0_0_0",] == sorted( + instance.exec_in_container(["ls", path + "detached/"]).strip().split("\n") + ) + + # there's a directory with the same name, but different content + q("insert into clone_in_detached values (43, '¯\_(ツ)_/¯')") + instance.exec_in_container(["rm", path + "all_1_1_0/data.bin"]) + instance.exec_in_container( + ["cp", "-r", path + "all_1_1_0", path + "detached/broken_all_1_1_0"] + ) + instance.exec_in_container(["rm", path + "detached/broken_all_1_1_0/primary.idx"]) + instance.exec_in_container( + ["cp", "-r", path + "all_1_1_0", path + "detached/broken_all_1_1_0_try0"] + ) + instance.exec_in_container( + [ + "bash", + "-c", + "echo 'broken' > {}".format( + path + "detached/broken_all_1_1_0_try0/checksums.txt" + ), + ] + ) + assert_eq_with_retry(instance, "select * from clone_in_detached", "\n") + assert [ + "broken_all_0_0_0", + "broken_all_1_1_0", + "broken_all_1_1_0_try0", + "broken_all_1_1_0_try1", + ] == sorted( + instance.exec_in_container(["ls", path + "detached/"]).strip().split("\n") + ) + + # there are directories with the same name, but different content, and part already detached + q("insert into clone_in_detached values (44, '¯\_(ツ)_/¯')") + instance.exec_in_container(["rm", path + "all_2_2_0/data.bin"]) + 
instance.exec_in_container( + ["cp", "-r", path + "all_2_2_0", path + "detached/broken_all_2_2_0"] + ) + instance.exec_in_container(["rm", path + "detached/broken_all_2_2_0/primary.idx"]) + instance.exec_in_container( + ["cp", "-r", path + "all_2_2_0", path + "detached/broken_all_2_2_0_try0"] + ) + instance.exec_in_container( + [ + "bash", + "-c", + "echo 'broken' > {}".format( + path + "detached/broken_all_2_2_0_try0/checksums.txt" + ), + ] + ) + instance.exec_in_container( + ["cp", "-r", path + "all_2_2_0", path + "detached/broken_all_2_2_0_try1"] + ) + assert_eq_with_retry(instance, "select * from clone_in_detached", "\n") + assert [ + "broken_all_0_0_0", + "broken_all_1_1_0", + "broken_all_1_1_0_try0", + "broken_all_1_1_0_try1", + "broken_all_2_2_0", + "broken_all_2_2_0_try0", + "broken_all_2_2_0_try1", + ] == sorted( + instance.exec_in_container(["ls", path + "detached/"]).strip().split("\n") + ) diff --git a/tests/integration/test_replicated_merge_tree_hdfs_zero_copy/test.py b/tests/integration/test_replicated_merge_tree_hdfs_zero_copy/test.py index 7d65bed3901..1f81421f93c 100644 --- a/tests/integration/test_replicated_merge_tree_hdfs_zero_copy/test.py +++ b/tests/integration/test_replicated_merge_tree_hdfs_zero_copy/test.py @@ -1,8 +1,14 @@ +import pytest + +# FIXME This test is too flaky +# https://github.com/ClickHouse/ClickHouse/issues/42561 + +pytestmark = pytest.mark.skip + import logging from string import Template import time -import pytest from helpers.cluster import ClickHouseCluster from helpers.test_tools import assert_eq_with_retry diff --git a/tests/integration/test_replicated_merge_tree_with_auxiliary_zookeepers/test.py b/tests/integration/test_replicated_merge_tree_with_auxiliary_zookeepers/test.py index c46e6840153..cf76d47157a 100644 --- a/tests/integration/test_replicated_merge_tree_with_auxiliary_zookeepers/test.py +++ b/tests/integration/test_replicated_merge_tree_with_auxiliary_zookeepers/test.py @@ -11,11 +11,13 @@ node1 = cluster.add_instance( "node1", main_configs=["configs/zookeeper_config.xml", "configs/remote_servers.xml"], with_zookeeper=True, + use_keeper=False, ) node2 = cluster.add_instance( "node2", main_configs=["configs/zookeeper_config.xml", "configs/remote_servers.xml"], with_zookeeper=True, + use_keeper=False, ) diff --git a/tests/integration/test_storage_nats/test.py b/tests/integration/test_storage_nats/test.py index 63dde8922a6..77db3008524 100644 --- a/tests/integration/test_storage_nats/test.py +++ b/tests/integration/test_storage_nats/test.py @@ -1,3 +1,10 @@ +import pytest + +# FIXME This test is too flaky +# https://github.com/ClickHouse/ClickHouse/issues/39185 + +pytestmark = pytest.mark.skip + import json import os.path as p import random @@ -9,7 +16,6 @@ from random import randrange import math import asyncio -import pytest from google.protobuf.internal.encoder import _VarintBytes from helpers.client import QueryRuntimeException from helpers.cluster import ClickHouseCluster, check_nats_is_available, nats_connect_ssl diff --git a/tests/queries/0_stateless/00463_long_sessions_in_http_interface.reference b/tests/queries/0_stateless/00463_long_sessions_in_http_interface.reference index 53cdf1e9393..a14d334a483 100644 --- a/tests/queries/0_stateless/00463_long_sessions_in_http_interface.reference +++ b/tests/queries/0_stateless/00463_long_sessions_in_http_interface.reference @@ -1 +1,28 @@ -PASSED +Using non-existent session with the 'session_check' flag will throw exception: +1 +Using non-existent session without the 'session_check' flag 
will create a new session: +1 +1 +The 'session_timeout' parameter is checked for validity and for the maximum value: +1 +1 +1 +Valid cases are accepted: +1 +1 +1 +Sessions are local per user: +1 +Hello +World +And cannot be accessed for a non-existent user: +1 +The temporary tables created in a session are not accessible without entering this session: +1 +A session successfully expire after a timeout: +111 +A session successfully expire after a timeout and the session's temporary table shadows the permanent table: +HelloWorld +A session cannot be used by concurrent connections: +1 +1 diff --git a/tests/queries/0_stateless/00463_long_sessions_in_http_interface.sh b/tests/queries/0_stateless/00463_long_sessions_in_http_interface.sh index e9f486fbb73..89da84a5bdd 100755 --- a/tests/queries/0_stateless/00463_long_sessions_in_http_interface.sh +++ b/tests/queries/0_stateless/00463_long_sessions_in_http_interface.sh @@ -1,113 +1,87 @@ #!/usr/bin/env bash # Tags: long, no-parallel +# shellcheck disable=SC2015 CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh -request() { - local url="$1" - local select="$2" - ${CLICKHOUSE_CURL} --silent "$url" --data "$select" -} +echo "Using non-existent session with the 'session_check' flag will throw exception:" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=nonexistent&session_check=1" --data-binary "SELECT 1" | grep -c -F 'Session not found' -create_temporary_table() { - local url="$1" - request "$url" "CREATE TEMPORARY TABLE temp (x String)" - request "$url" "INSERT INTO temp VALUES ('Hello'), ('World')" -} +echo "Using non-existent session without the 'session_check' flag will create a new session:" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_1" --data-binary "SELECT 1" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_1&session_check=0" --data-binary "SELECT 1" +echo "The 'session_timeout' parameter is checked for validity and for the maximum value:" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_2&session_timeout=string" --data-binary "SELECT 1" | grep -c -F 'Invalid session timeout' +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_2&session_timeout=3601" --data-binary "SELECT 1" | grep -c -F 'Maximum session timeout' +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_2&session_timeout=-1" --data-binary "SELECT 1" | grep -c -F 'Invalid session timeout' -check() { - local url="$1" - local select="$2" - local output="$3" - local expected_result="$4" - local message="$5" - result=$(request "$url" "$select" | grep --count "$output") - if [ "$result" -ne "$expected_result" ]; then - echo "FAILED: $message" - exit 1 - fi -} +echo "Valid cases are accepted:" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_3&session_timeout=0" --data-binary "SELECT 1" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_4&session_timeout=3600" --data-binary "SELECT 1" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_5&session_timeout=60" --data-binary "SELECT 1" +echo "Sessions are local per user:" +${CLICKHOUSE_CLIENT} --multiquery --query "DROP USER IF EXISTS test_00463; CREATE USER test_00463; GRANT ALL ON *.* TO test_00463;" -address=${CLICKHOUSE_HOST} -port=${CLICKHOUSE_PORT_HTTP} -url="${CLICKHOUSE_PORT_HTTP_PROTO}://$address:$port/" -session="?session_id=test_$$" # use PID 
for session ID -select="SELECT * FROM system.settings WHERE name = 'max_rows_to_read'" -select_from_temporary_table="SELECT * FROM temp ORDER BY x" -select_from_non_existent_table="SELECT * FROM no_such_table ORDER BY x" +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_6&session_timeout=600" --data-binary "CREATE TEMPORARY TABLE t (s String)" +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_6" --data-binary "INSERT INTO t VALUES ('Hello')" +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&user=test_00463&session_id=${CLICKHOUSE_DATABASE}_6&session_check=1" --data-binary "SELECT 1" | grep -c -F 'Session not found' +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&user=test_00463&session_id=${CLICKHOUSE_DATABASE}_6&session_timeout=600" --data-binary "CREATE TEMPORARY TABLE t (s String)" +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&user=test_00463&session_id=${CLICKHOUSE_DATABASE}_6" --data-binary "INSERT INTO t VALUES ('World')" -check "$url?session_id=no_such_session_$$&session_check=1" "$select" "Exception.*Session not found" 1 "session_check=1 does not work." -check "$url$session&session_check=0" "$select" "Exception" 0 "session_check=0 does not work." +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_6" --data-binary "SELECT * FROM t" +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&user=test_00463&session_id=${CLICKHOUSE_DATABASE}_6" --data-binary "SELECT * FROM t" -request "$url""$session" "SET max_rows_to_read=7777777" +${CLICKHOUSE_CLIENT} --multiquery --query "DROP USER test_00463"; -check "$url$session&session_timeout=string" "$select" "Exception.*Invalid session timeout" 1 "Non-numeric value accepted as a timeout." -check "$url$session&session_timeout=3601" "$select" "Exception.*Maximum session timeout*" 1 "More then 3600 seconds accepted as a timeout." -check "$url$session&session_timeout=-1" "$select" "Exception.*Invalid session timeout" 1 "Negative timeout accepted." -check "$url$session&session_timeout=0" "$select" "Exception" 0 "Zero timeout not accepted." -check "$url$session&session_timeout=3600" "$select" "Exception" 0 "3600 second timeout not accepted." -check "$url$session&session_timeout=60" "$select" "Exception" 0 "60 second timeout not accepted." +echo "And cannot be accessed for a non-existent user:" +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&user=test_00463&session_id=${CLICKHOUSE_DATABASE}_6" --data-binary "SELECT * FROM t" | grep -c -F 'Exception' -check "$url""$session" "$select" "7777777" 1 "Failed to reuse session." -# Workaround here -# TODO: move the test to integration test or add readonly user to test environment -if [[ -z $(request "$url?user=readonly" "SELECT ''") ]]; then - # We have readonly user - check "$url$session&user=readonly&session_check=1" "$select" "Exception.*Session not found" 1 "Session is accessable for another user." -else - check "$url$session&user=readonly&session_check=1" "$select" "Exception.*Unknown user*" 1 "Session is accessable for unknown user." -fi +echo "The temporary tables created in a session are not accessible without entering this session:" +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}" --data-binary "SELECT * FROM t" | grep -c -F 'Exception' -create_temporary_table "$url""$session" -check "$url""$session" "$select_from_temporary_table" "Hello" 1 "Failed to reuse a temporary table for session." 
- -check "$url?session_id=another_session_$$" "$select_from_temporary_table" "Exception.*Table .* doesn't exist." 1 "Temporary table is visible for another table." - - -( ( -cat </dev/null 2>/dev/null) & -sleep 1 -check "$url""$session" "$select" "Exception.*Session is locked" 1 "Double access to the same session." - - -session="?session_id=test_timeout_$$" - -create_temporary_table "$url$session&session_timeout=1" -check "$url$session&session_timeout=1" "$select_from_temporary_table" "Hello" 1 "Failed to reuse a temporary table for session." -sleep 3 -check "$url$session&session_check=1" "$select" "Exception.*Session not found" 1 "Session did not expire on time." - -create_temporary_table "$url$session&session_timeout=2" -for _ in $(seq 1 3); do - check "$url$session&session_timeout=2" "$select_from_temporary_table" "Hello" 1 "Session expired too early." - sleep 1 +echo "A session successfully expire after a timeout:" +# An infinite loop is required to make the test reliable. We will check that the timeout corresponds to the observed time at least once +while true +do + ( + ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_7&session_timeout=1" --data-binary "SELECT 1" + ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_7&session_check=1" --data-binary "SELECT 1" + sleep 3 + ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_7&session_check=1" --data-binary "SELECT 1" | grep -c -F 'Session not found' + ) | tr -d '\n' | grep -F '111' && break || sleep 1 done -sleep 3 -check "$url$session&session_check=1" "$select" "Exception.*Session not found" 1 "Session did not expire on time." -create_temporary_table "$url$session&session_timeout=2" -for _ in $(seq 1 5); do - check "$url$session&session_timeout=2" "$select_from_non_existent_table" "Exception.*Table .* doesn't exist." 1 "Session expired too early." - sleep 1 +echo "A session successfully expire after a timeout and the session's temporary table shadows the permanent table:" +# An infinite loop is required to make the test reliable. We will check that the timeout corresponds to the observed time at least once +${CLICKHOUSE_CLIENT} --multiquery --query "DROP TABLE IF EXISTS t; CREATE TABLE t (s String) ENGINE = Memory; INSERT INTO t VALUES ('World');" +while true +do + ( + ${CLICKHOUSE_CURL} -X POST -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_8&session_timeout=1" --data-binary "CREATE TEMPORARY TABLE t (s String)" + ${CLICKHOUSE_CURL} -X POST -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_8" --data-binary "INSERT INTO t VALUES ('Hello')" + ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_8" --data-binary "SELECT * FROM t" + sleep 3 + ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_8" --data-binary "SELECT * FROM t" + ) | tr -d '\n' | grep -F 'HelloWorld' && break || sleep 1 done -check "$url$session&session_timeout=2" "$select_from_temporary_table" "Hello" 1 "Session expired too early. Failed to update timeout in case of exceptions." -sleep 4 -check "$url$session&session_check=1" "$select" "Exception.*Session not found" 1 "Session did not expire on time." 
+${CLICKHOUSE_CLIENT} --multiquery --query "DROP TABLE t" +echo "A session cannot be used by concurrent connections:" -echo "PASSED" +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_9&query_id=${CLICKHOUSE_DATABASE}_9" --data-binary "SELECT count() FROM system.numbers" >/dev/null & + +# An infinite loop is required to make the test reliable. We will ensure that at least once the query on the line above has started before this check +while true +do + ${CLICKHOUSE_CLIENT} --query "SELECT count() > 0 FROM system.processes WHERE query_id = '${CLICKHOUSE_DATABASE}_9'" | grep -F '1' && break || sleep 1 +done + +${CLICKHOUSE_CURL} -sS -X POST "${CLICKHOUSE_URL}&session_id=${CLICKHOUSE_DATABASE}_9" --data-binary "SELECT 1" | grep -c -F 'Session is locked' +${CLICKHOUSE_CLIENT} --multiquery --query "KILL QUERY WHERE query_id = '${CLICKHOUSE_DATABASE}_9' SYNC FORMAT Null"; +wait diff --git a/tests/queries/0_stateless/00705_drop_create_merge_tree.reference b/tests/queries/0_stateless/00705_drop_create_merge_tree.reference index 8b137891791..e69de29bb2d 100644 --- a/tests/queries/0_stateless/00705_drop_create_merge_tree.reference +++ b/tests/queries/0_stateless/00705_drop_create_merge_tree.reference @@ -1 +0,0 @@ - diff --git a/tests/queries/0_stateless/00705_drop_create_merge_tree.sh b/tests/queries/0_stateless/00705_drop_create_merge_tree.sh index 146d6e54c0b..d7754091290 100755 --- a/tests/queries/0_stateless/00705_drop_create_merge_tree.sh +++ b/tests/queries/0_stateless/00705_drop_create_merge_tree.sh @@ -1,39 +1,12 @@ #!/usr/bin/env bash # Tags: no-fasttest -set -e - CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh -function stress() -{ - # We set up a signal handler to make sure to wait for all queries to be finished before ending - CONTINUE=true - handle_interruption() - { - CONTINUE=false - } - trap handle_interruption INT - - while $CONTINUE; do - ${CLICKHOUSE_CLIENT} --query "CREATE TABLE IF NOT EXISTS table (x UInt8) ENGINE = MergeTree ORDER BY tuple()" 2>/dev/null - ${CLICKHOUSE_CLIENT} --query "DROP TABLE table" 2>/dev/null - done - - trap - INT -} - -# https://stackoverflow.com/questions/9954794/execute-a-shell-function-with-timeout -export -f stress - -for _ in {1..5}; do - # Ten seconds are just barely enough to reproduce the issue in most of runs. 
- timeout -s INT 10 bash -c stress & -done - +yes 'CREATE TABLE IF NOT EXISTS table (x UInt8) ENGINE = MergeTree ORDER BY tuple();' | head -n 1000 | $CLICKHOUSE_CLIENT --ignore-error -nm 2>/dev/null & +yes 'DROP TABLE table;' | head -n 1000 | $CLICKHOUSE_CLIENT --ignore-error -nm 2>/dev/null & wait -echo -${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS table"; +${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS table" diff --git a/tests/queries/0_stateless/00900_long_parquet.reference b/tests/queries/0_stateless/00900_long_parquet.reference index 4dfc726145e..bbdad7243bd 100644 --- a/tests/queries/0_stateless/00900_long_parquet.reference +++ b/tests/queries/0_stateless/00900_long_parquet.reference @@ -44,12 +44,12 @@ converted: diff: dest: 79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 2004-05-06 00:00:00 2004-05-06 07:08:09.012000000 -80 81 82 83 84 85 86 87 88 89 str02 fstr2\0\0\0\0\0\0\0\0\0\0 2149-06-06 2006-08-09 10:11:12 2006-08-09 10:11:12.345000000 +80 81 82 83 84 85 86 87 88 89 str02 fstr2\0\0\0\0\0\0\0\0\0\0 2005-03-04 2006-08-09 10:11:12 2006-08-09 10:11:12.345000000 min: --128 0 0 0 0 0 0 0 -1 -1 string-1\0\0\0\0\0\0\0 fixedstring-1\0\0 2003-04-05 2149-06-06 2003-02-03 04:05:06.789000000 --108 108 8 92 -8 108 -40 -116 -1 -1 string-0\0\0\0\0\0\0\0 fixedstring\0\0\0\0 2001-02-03 2149-06-06 2002-02-03 04:05:06.789000000 +-128 0 0 0 0 0 0 0 -1 -1 string-1\0\0\0\0\0\0\0 fixedstring-1\0\0 2003-04-05 2003-02-03 2003-02-03 04:05:06.789000000 +-108 108 8 92 -8 108 -40 -116 -1 -1 string-0\0\0\0\0\0\0\0 fixedstring\0\0\0\0 2001-02-03 2002-02-03 2002-02-03 04:05:06.789000000 79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 2004-05-06 2004-05-06 07:08:09.012000000 -127 -1 -1 -1 -1 -1 -1 -1 -1 -1 string-2\0\0\0\0\0\0\0 fixedstring-2\0\0 2004-06-07 2149-06-06 2004-02-03 04:05:06.789000000 +127 -1 -1 -1 -1 -1 -1 -1 -1 -1 string-2\0\0\0\0\0\0\0 fixedstring-2\0\0 2004-06-07 2004-02-03 2004-02-03 04:05:06.789000000 max: -128 0 -32768 0 -2147483648 0 -9223372036854775808 0 -1 -1 string-1 fixedstring-1\0\0 2003-04-05 00:00:00 2003-02-03 04:05:06 2003-02-03 04:05:06.789000000 -108 108 -1016 1116 -1032 1132 -1064 1164 -1 -1 string-0 fixedstring\0\0\0\0 2001-02-03 00:00:00 2002-02-03 04:05:06 2002-02-03 04:05:06.789000000 diff --git a/tests/queries/0_stateless/00938_template_input_format.reference b/tests/queries/0_stateless/00938_template_input_format.reference index e1f77d9a581..ec8cd7a21f0 100644 --- a/tests/queries/0_stateless/00938_template_input_format.reference +++ b/tests/queries/0_stateless/00938_template_input_format.reference @@ -31,3 +31,5 @@ cv bn m","qwe,rty",456,"2016-01-02" "zx\cv\bn m","qwe,rty","as""df'gh","",789,"2016-01-04" "","zx cv bn m","qwe,rty","as""df'gh",9876543210,"2016-01-03" +1 +1 diff --git a/tests/queries/0_stateless/00938_template_input_format.sh b/tests/queries/0_stateless/00938_template_input_format.sh index e99f59614da..be75edcdb61 100755 --- a/tests/queries/0_stateless/00938_template_input_format.sh +++ b/tests/queries/0_stateless/00938_template_input_format.sh @@ -83,3 +83,13 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE template1"; $CLICKHOUSE_CLIENT --query="DROP TABLE template2"; rm "$CURDIR"/00938_template_input_format_resultset.tmp "$CURDIR"/00938_template_input_format_row.tmp +echo -ne '\${a:Escaped},\${b:Escaped}\n' > "$CURDIR"/00938_template_input_format_row.tmp +echo -ne "a,b\nc,d\n" | $CLICKHOUSE_LOCAL --structure "a String, b String" --input-format Template \ + 
--format_template_row "$CURDIR"/00938_template_input_format_row.tmp --format_template_rows_between_delimiter '' \ + -q 'select * from table' 2>&1| grep -Fac "'Escaped' serialization requires delimiter" +echo -ne '\${a:Escaped},\${:Escaped}\n' > "$CURDIR"/00938_template_input_format_row.tmp +echo -ne "a,b\nc,d\n" | $CLICKHOUSE_LOCAL --structure "a String" --input-format Template \ + --format_template_row "$CURDIR"/00938_template_input_format_row.tmp --format_template_rows_between_delimiter '' \ + -q 'select * from table' 2>&1| grep -Fac "'Escaped' serialization requires delimiter" +rm "$CURDIR"/00938_template_input_format_row.tmp + diff --git a/tests/queries/0_stateless/00941_to_custom_week.sql b/tests/queries/0_stateless/00941_to_custom_week.sql index 04ff08d4117..4dd5d209306 100644 --- a/tests/queries/0_stateless/00941_to_custom_week.sql +++ b/tests/queries/0_stateless/00941_to_custom_week.sql @@ -49,3 +49,4 @@ SELECT toStartOfWeek(x, 3) AS w3, toStartOfWeek(x_t, 3) AS wt3 FROM numbers(10); + diff --git a/tests/queries/0_stateless/01014_format_custom_separated.reference b/tests/queries/0_stateless/01014_format_custom_separated.reference index d46a6fdf5b1..626d6ed66b8 100644 --- a/tests/queries/0_stateless/01014_format_custom_separated.reference +++ b/tests/queries/0_stateless/01014_format_custom_separated.reference @@ -8,3 +8,4 @@ 1,"2019-09-25","world" 2,"2019-09-26","custom" 3,"2019-09-27","separated" +1 diff --git a/tests/queries/0_stateless/01014_format_custom_separated.sh b/tests/queries/0_stateless/01014_format_custom_separated.sh index 4e88419d125..655607c8c9b 100755 --- a/tests/queries/0_stateless/01014_format_custom_separated.sh +++ b/tests/queries/0_stateless/01014_format_custom_separated.sh @@ -34,3 +34,8 @@ FORMAT CustomSeparated" $CLICKHOUSE_CLIENT --query="SELECT * FROM custom_separated ORDER BY n FORMAT CSV" $CLICKHOUSE_CLIENT --query="DROP TABLE custom_separated" + +echo -ne "a,b\nc,d\n" | $CLICKHOUSE_LOCAL --structure "a String, b String" \ + --input-format CustomSeparated --format_custom_escaping_rule=Escaped \ + --format_custom_field_delimiter=',' --format_custom_row_after_delimiter=$'\n' -q 'select * from table' \ + 2>&1| grep -Fac "'Escaped' serialization requires delimiter" diff --git a/tests/queries/0_stateless/01440_to_date_monotonicity.reference b/tests/queries/0_stateless/01440_to_date_monotonicity.reference index dd8545b721d..2dbec540fbb 100644 --- a/tests/queries/0_stateless/01440_to_date_monotonicity.reference +++ b/tests/queries/0_stateless/01440_to_date_monotonicity.reference @@ -1,4 +1,4 @@ 0 -1970-01-01 2120-07-26 1970-04-11 1970-01-01 2149-06-06 +1970-01-01 2106-02-07 1970-04-11 1970-01-01 2149-06-06 1970-01-01 02:00:00 2106-02-07 09:28:15 1970-01-01 02:16:40 2000-01-01 13:12:12 diff --git a/tests/queries/0_stateless/01811_datename.reference b/tests/queries/0_stateless/01811_datename.reference index 2968fde301a..29bf05750e7 100644 --- a/tests/queries/0_stateless/01811_datename.reference +++ b/tests/queries/0_stateless/01811_datename.reference @@ -1,10 +1,10 @@ -2021 2021 2021 -2 2 2 -April April April -104 104 104 -14 14 14 -15 15 15 -Wednesday Wednesday Wednesday +2021 2021 2021 2021 +2 2 2 2 +April April April April +104 104 104 104 +14 14 14 14 +15 15 15 15 +Wednesday Wednesday Wednesday Wednesday 11 11 22 22 33 33 diff --git a/tests/queries/0_stateless/01811_datename.sql b/tests/queries/0_stateless/01811_datename.sql index b757d9ae018..fe9f5d20238 100644 --- a/tests/queries/0_stateless/01811_datename.sql +++ 
b/tests/queries/0_stateless/01811_datename.sql @@ -1,44 +1,51 @@ WITH toDate('2021-04-14') AS date_value, + toDate32('2021-04-14') AS date_32_value, toDateTime('2021-04-14 11:22:33') AS date_time_value, toDateTime64('2021-04-14 11:22:33', 3) AS date_time_64_value -SELECT dateName('year', date_value), dateName('year', date_time_value), dateName('year', date_time_64_value); +SELECT dateName('year', date_value), dateName('year', date_32_value), dateName('year', date_time_value), dateName('year', date_time_64_value); WITH toDate('2021-04-14') AS date_value, + toDate32('2021-04-14') AS date_32_value, toDateTime('2021-04-14 11:22:33') AS date_time_value, toDateTime64('2021-04-14 11:22:33', 3) AS date_time_64_value -SELECT dateName('quarter', date_value), dateName('quarter', date_time_value), dateName('quarter', date_time_64_value); +SELECT dateName('quarter', date_value), dateName('quarter', date_32_value), dateName('quarter', date_time_value), dateName('quarter', date_time_64_value); WITH toDate('2021-04-14') AS date_value, + toDate32('2021-04-14') AS date_32_value, toDateTime('2021-04-14 11:22:33') AS date_time_value, toDateTime64('2021-04-14 11:22:33', 3) AS date_time_64_value -SELECT dateName('month', date_value), dateName('month', date_time_value), dateName('month', date_time_64_value); +SELECT dateName('month', date_value), dateName('month', date_32_value), dateName('month', date_time_value), dateName('month', date_time_64_value); WITH toDate('2021-04-14') AS date_value, + toDate32('2021-04-14') AS date_32_value, toDateTime('2021-04-14 11:22:33') AS date_time_value, toDateTime64('2021-04-14 11:22:33', 3) AS date_time_64_value -SELECT dateName('dayofyear', date_value), dateName('dayofyear', date_time_value), dateName('dayofyear', date_time_64_value); +SELECT dateName('dayofyear', date_value), dateName('dayofyear', date_32_value), dateName('dayofyear', date_time_value), dateName('dayofyear', date_time_64_value); WITH toDate('2021-04-14') AS date_value, + toDate32('2021-04-14') AS date_32_value, toDateTime('2021-04-14 11:22:33') AS date_time_value, toDateTime64('2021-04-14 11:22:33', 3) AS date_time_64_value -SELECT dateName('day', date_value), dateName('day', date_time_value), dateName('day', date_time_64_value); +SELECT dateName('day', date_value), dateName('day', date_32_value), dateName('day', date_time_value), dateName('day', date_time_64_value); WITH toDate('2021-04-14') AS date_value, + toDate32('2021-04-14') AS date_32_value, toDateTime('2021-04-14 11:22:33') AS date_time_value, toDateTime64('2021-04-14 11:22:33', 3) AS date_time_64_value -SELECT dateName('week', date_value), dateName('week', date_time_value), dateName('week', date_time_64_value); +SELECT dateName('week', date_value), dateName('week', date_32_value), dateName('week', date_time_value), dateName('week', date_time_64_value); WITH toDate('2021-04-14') AS date_value, + toDate32('2021-04-14') AS date_32_value, toDateTime('2021-04-14 11:22:33') AS date_time_value, toDateTime64('2021-04-14 11:22:33', 3) AS date_time_64_value -SELECT dateName('weekday', date_value), dateName('weekday', date_time_value), dateName('weekday', date_time_64_value); +SELECT dateName('weekday', date_value), dateName('weekday', date_32_value), dateName('weekday', date_time_value), dateName('weekday', date_time_64_value); WITH toDateTime('2021-04-14 11:22:33') AS date_time_value, diff --git a/tests/queries/0_stateless/01921_datatype_date32.reference b/tests/queries/0_stateless/01921_datatype_date32.reference index dcfc193e119..14079b906cb 100644 --- 
a/tests/queries/0_stateless/01921_datatype_date32.reference +++ b/tests/queries/0_stateless/01921_datatype_date32.reference @@ -43,16 +43,16 @@ -------toMinute--------- -------toSecond--------- -------toStartOfDay--------- -1970-01-01 02:00:00 -1970-01-01 02:00:00 -2106-02-07 00:00:00 -2106-02-07 00:00:00 +2036-02-07 07:31:20 +2036-02-07 07:31:20 +2027-10-01 11:03:28 +2027-10-17 11:03:28 2021-06-22 00:00:00 -------toMonday--------- -1970-01-01 -1970-01-01 -2149-06-02 -2149-06-02 +2079-06-07 +2079-06-07 +2120-07-06 +2120-07-20 2021-06-21 -------toISOWeek--------- 1 @@ -79,28 +79,28 @@ 229953 202125 -------toStartOfWeek--------- -1970-01-01 -1970-01-01 -2149-06-01 -2149-06-01 +2079-06-06 +2079-06-06 +2120-07-05 +2120-07-26 2021-06-20 -------toStartOfMonth--------- -1970-01-01 -1970-01-01 -2149-06-01 -2149-06-01 +2079-06-07 +2079-06-07 +2120-06-26 +2120-06-26 2021-06-01 -------toStartOfQuarter--------- -1970-01-01 -1970-01-01 -2149-04-01 -2149-04-01 +2079-06-07 +2079-06-07 +2120-04-26 +2120-04-26 2021-04-01 -------toStartOfYear--------- -1970-01-01 -1970-01-01 -2149-01-01 -2149-01-01 +2079-06-07 +2079-06-07 +2119-07-28 +2119-07-28 2021-01-01 -------toStartOfSecond--------- -------toStartOfMinute--------- diff --git a/tests/queries/0_stateless/02240_filesystem_cache_bypass_cache_threshold.reference b/tests/queries/0_stateless/02240_filesystem_cache_bypass_cache_threshold.reference new file mode 100644 index 00000000000..de9ac10f641 --- /dev/null +++ b/tests/queries/0_stateless/02240_filesystem_cache_bypass_cache_threshold.reference @@ -0,0 +1,16 @@ +-- { echo } + +SYSTEM DROP FILESYSTEM CACHE; +SET enable_filesystem_cache_on_write_operations=0; +DROP TABLE IF EXISTS test; +CREATE TABLE test (key UInt32, value String) Engine=MergeTree() ORDER BY key SETTINGS storage_policy='s3_cache_6', min_bytes_for_wide_part = 10485760; +INSERT INTO test SELECT number, toString(number) FROM numbers(100); +SELECT * FROM test FORMAT Null; +SELECT file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache ORDER BY file_segment_range_end, size; +0 79 80 +SYSTEM DROP FILESYSTEM CACHE; +SELECT file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache; +SELECT * FROM test FORMAT Null; +SELECT file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache; +SYSTEM DROP FILESYSTEM CACHE; +SELECT file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache; diff --git a/tests/queries/0_stateless/02240_filesystem_cache_bypass_cache_threshold.sql b/tests/queries/0_stateless/02240_filesystem_cache_bypass_cache_threshold.sql new file mode 100644 index 00000000000..d3b3d3d7f4c --- /dev/null +++ b/tests/queries/0_stateless/02240_filesystem_cache_bypass_cache_threshold.sql @@ -0,0 +1,19 @@ +-- Tags: no-parallel, no-fasttest, no-s3-storage, no-random-settings + +-- { echo } + +SYSTEM DROP FILESYSTEM CACHE; +SET enable_filesystem_cache_on_write_operations=0; + +DROP TABLE IF EXISTS test; +CREATE TABLE test (key UInt32, value String) Engine=MergeTree() ORDER BY key SETTINGS storage_policy='s3_cache_6', min_bytes_for_wide_part = 10485760; +INSERT INTO test SELECT number, toString(number) FROM numbers(100); + +SELECT * FROM test FORMAT Null; +SELECT file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache ORDER BY file_segment_range_end, size; +SYSTEM DROP FILESYSTEM CACHE; +SELECT file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache; +SELECT * FROM test FORMAT Null; +SELECT 
file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache; +SYSTEM DROP FILESYSTEM CACHE; +SELECT file_segment_range_begin, file_segment_range_end, size FROM system.filesystem_cache; diff --git a/tests/queries/0_stateless/02240_system_remote_filesystem_query_cache.reference b/tests/queries/0_stateless/02240_filesystem_query_cache.reference similarity index 100% rename from tests/queries/0_stateless/02240_system_remote_filesystem_query_cache.reference rename to tests/queries/0_stateless/02240_filesystem_query_cache.reference diff --git a/tests/queries/0_stateless/02240_system_remote_filesystem_query_cache.sql b/tests/queries/0_stateless/02240_filesystem_query_cache.sql similarity index 100% rename from tests/queries/0_stateless/02240_system_remote_filesystem_query_cache.sql rename to tests/queries/0_stateless/02240_filesystem_query_cache.sql diff --git a/tests/queries/0_stateless/02344_show_caches.reference b/tests/queries/0_stateless/02344_show_caches.reference index 0c5957edb82..68882f63e1f 100644 --- a/tests/queries/0_stateless/02344_show_caches.reference +++ b/tests/queries/0_stateless/02344_show_caches.reference @@ -1,12 +1,13 @@ cached_azure s3_cache_2 +s3_cache +s3_cache_3 +s3_cache_multi s3_cache_4 s3_cache_5 local_cache +s3_cache_6 s3_cache_small local_cache_2 local_cache_3 -s3_cache_multi -s3_cache_3 -s3_cache s3_cache_multi_2 diff --git a/tests/queries/0_stateless/02346_additional_filters.reference b/tests/queries/0_stateless/02346_additional_filters.reference index 22d53173e71..0a08995223d 100644 --- a/tests/queries/0_stateless/02346_additional_filters.reference +++ b/tests/queries/0_stateless/02346_additional_filters.reference @@ -60,6 +60,14 @@ select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filt 0 0 select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filters={'system.one' : 'dummy != 0'}; +select * from distr_table settings additional_table_filters={'distr_table' : 'x = 2'}; +2 bb +2 bb +select * from distr_table settings additional_table_filters={'distr_table' : 'x != 2 and x != 3'}; +1 a +4 dddd +1 a +4 dddd select * from system.numbers limit 5; 0 1 diff --git a/tests/queries/0_stateless/02346_additional_filters.sql b/tests/queries/0_stateless/02346_additional_filters.sql index 9e0bee4549b..f6b665713ec 100644 --- a/tests/queries/0_stateless/02346_additional_filters.sql +++ b/tests/queries/0_stateless/02346_additional_filters.sql @@ -1,3 +1,4 @@ +-- Tags: distributed drop table if exists table_1; drop table if exists table_2; drop table if exists v_numbers; @@ -6,6 +7,8 @@ drop table if exists mv_table; create table table_1 (x UInt32, y String) engine = MergeTree order by x; insert into table_1 values (1, 'a'), (2, 'bb'), (3, 'ccc'), (4, 'dddd'); +CREATE TABLE distr_table (x UInt32, y String) ENGINE = Distributed(test_cluster_two_shards, currentDatabase(), 'table_1'); + -- { echoOn } select * from table_1; @@ -29,6 +32,9 @@ select x from table_1 prewhere x != 2 where x != 2 settings additional_table_fil select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filters={'system.one' : 'dummy = 0'}; select * from remote('127.0.0.{1,2}', system.one) settings additional_table_filters={'system.one' : 'dummy != 0'}; +select * from distr_table settings additional_table_filters={'distr_table' : 'x = 2'}; +select * from distr_table settings additional_table_filters={'distr_table' : 'x != 2 and x != 3'}; + select * from system.numbers limit 5; select * from system.numbers as t limit 5 
settings additional_table_filters={'t' : 'number % 2 != 0'}; select * from system.numbers limit 5 settings additional_table_filters={'system.numbers' : 'number != 3'}; diff --git a/tests/queries/0_stateless/02346_additional_filters_distr.reference b/tests/queries/0_stateless/02346_additional_filters_distr.reference new file mode 100644 index 00000000000..81814b5e7bb --- /dev/null +++ b/tests/queries/0_stateless/02346_additional_filters_distr.reference @@ -0,0 +1,3 @@ +4 dddd +5 a +6 bb diff --git a/tests/queries/0_stateless/02346_additional_filters_distr.sql b/tests/queries/0_stateless/02346_additional_filters_distr.sql new file mode 100644 index 00000000000..bc9c1715c72 --- /dev/null +++ b/tests/queries/0_stateless/02346_additional_filters_distr.sql @@ -0,0 +1,20 @@ +-- Tags: no-parallel, distributed + +create database if not exists shard_0; +create database if not exists shard_1; + +drop table if exists dist_02346; +drop table if exists shard_0.data_02346; +drop table if exists shard_1.data_02346; + +create table shard_0.data_02346 (x UInt32, y String) engine = MergeTree order by x settings index_granularity = 2; +insert into shard_0.data_02346 values (1, 'a'), (2, 'bb'), (3, 'ccc'), (4, 'dddd'); + +create table shard_1.data_02346 (x UInt32, y String) engine = MergeTree order by x settings index_granularity = 2; +insert into shard_1.data_02346 values (5, 'a'), (6, 'bb'), (7, 'ccc'), (8, 'dddd'); + +create table dist_02346 (x UInt32, y String) engine=Distributed('test_cluster_two_shards_different_databases', /* default_database= */ '', data_02346); + +set max_rows_to_read=4; + +select * from dist_02346 order by x settings additional_table_filters={'dist_02346' : 'x > 3 and x < 7'}; diff --git a/tests/queries/0_stateless/02346_additional_filters_index.reference b/tests/queries/0_stateless/02346_additional_filters_index.reference new file mode 100644 index 00000000000..d4b9509cb3c --- /dev/null +++ b/tests/queries/0_stateless/02346_additional_filters_index.reference @@ -0,0 +1,30 @@ +-- { echoOn } +set max_rows_to_read = 2; +select * from table_1 order by x settings additional_table_filters={'table_1' : 'x > 3'}; +4 dddd +select * from table_1 order by x settings additional_table_filters={'table_1' : 'x < 3'}; +1 a +2 bb +select * from table_1 order by x settings additional_table_filters={'table_1' : 'length(y) >= 3'}; +3 ccc +4 dddd +select * from table_1 order by x settings additional_table_filters={'table_1' : 'length(y) < 3'}; +1 a +2 bb +set max_rows_to_read = 4; +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'x > 3'}; +4 dddd +4 dddd +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'x < 3'}; +1 a +1 a +2 bb +2 bb +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'length(y) > 3'}; +4 dddd +4 dddd +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'length(y) < 3'}; +1 a +1 a +2 bb +2 bb diff --git a/tests/queries/0_stateless/02346_additional_filters_index.sql b/tests/queries/0_stateless/02346_additional_filters_index.sql new file mode 100644 index 00000000000..0d40cc1f898 --- /dev/null +++ b/tests/queries/0_stateless/02346_additional_filters_index.sql @@ -0,0 +1,24 @@ +-- Tags: distributed + +create table table_1 (x UInt32, y String, INDEX a (length(y)) TYPE minmax GRANULARITY 1) engine = MergeTree order by x settings index_granularity = 2; +insert into table_1 values (1, 'a'), (2, 'bb'), (3, 'ccc'), (4, 'dddd'); + +CREATE TABLE 
distr_table (x UInt32, y String) ENGINE = Distributed(test_cluster_two_shards, currentDatabase(), 'table_1'); + +-- { echoOn } +set max_rows_to_read = 2; + +select * from table_1 order by x settings additional_table_filters={'table_1' : 'x > 3'}; +select * from table_1 order by x settings additional_table_filters={'table_1' : 'x < 3'}; + +select * from table_1 order by x settings additional_table_filters={'table_1' : 'length(y) >= 3'}; +select * from table_1 order by x settings additional_table_filters={'table_1' : 'length(y) < 3'}; + +set max_rows_to_read = 4; + +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'x > 3'}; +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'x < 3'}; + +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'length(y) > 3'}; +select * from distr_table order by x settings additional_table_filters={'distr_table' : 'length(y) < 3'}; + diff --git a/tests/queries/0_stateless/02354_annoy.sql b/tests/queries/0_stateless/02354_annoy.sql index 8a8d023a104..654a4b545ea 100644 --- a/tests/queries/0_stateless/02354_annoy.sql +++ b/tests/queries/0_stateless/02354_annoy.sql @@ -44,3 +44,71 @@ ORDER BY L2Distance(embedding, [0.0, 0.0]) LIMIT 3; -- { serverError 80 } DROP TABLE IF EXISTS 02354_annoy; + +-- ------------------------------------ +-- Check that weird base columns are rejected + +-- Index spans >1 column + +CREATE TABLE 02354_annoy +( + id Int32, + embedding Array(Float32), + INDEX annoy_index (embedding, id) TYPE annoy(100) GRANULARITY 1 +) +ENGINE = MergeTree +ORDER BY id +SETTINGS index_granularity=5; -- {serverError 7 } + +-- Index must be created on Array(Float32) or Tuple(Float32) + +CREATE TABLE 02354_annoy +( + id Int32, + embedding Float32, + INDEX annoy_index embedding TYPE annoy(100) GRANULARITY 1 +) +ENGINE = MergeTree +ORDER BY id +SETTINGS index_granularity=5; -- {serverError 44 } + + +CREATE TABLE 02354_annoy +( + id Int32, + embedding Array(Float64), + INDEX annoy_index embedding TYPE annoy(100) GRANULARITY 1 +) +ENGINE = MergeTree +ORDER BY id +SETTINGS index_granularity=5; -- {serverError 44 } + +CREATE TABLE 02354_annoy +( + id Int32, + embedding Tuple(Float32, Float64), + INDEX annoy_index embedding TYPE annoy(100) GRANULARITY 1 +) +ENGINE = MergeTree +ORDER BY id +SETTINGS index_granularity=5; -- {serverError 44 } + +CREATE TABLE 02354_annoy +( + id Int32, + embedding Array(LowCardinality(Float32)), + INDEX annoy_index embedding TYPE annoy(100) GRANULARITY 1 +) +ENGINE = MergeTree +ORDER BY id +SETTINGS index_granularity=5; -- {serverError 44 } + +CREATE TABLE 02354_annoy +( + id Int32, + embedding Array(Nullable(Float32)), + INDEX annoy_index embedding TYPE annoy(100) GRANULARITY 1 +) +ENGINE = MergeTree +ORDER BY id +SETTINGS index_granularity=5; -- {serverError 44 } diff --git a/tests/queries/0_stateless/02403_date_time_narrowing.reference b/tests/queries/0_stateless/02403_date_time_narrowing.reference deleted file mode 100644 index 7d6e91c61b8..00000000000 --- a/tests/queries/0_stateless/02403_date_time_narrowing.reference +++ /dev/null @@ -1,20 +0,0 @@ -1970-01-01 2149-06-06 1970-01-01 2149-06-06 1900-01-01 1970-01-02 1970-01-01 00:00:00 2106-02-07 06:28:15 -1970-01-01 2149-06-06 -1970-01-01 2149-06-06 -1970-01-01 00:00:00 2106-02-07 06:28:15 -1970-01-01 00:00:00 2106-02-07 06:28:15 -2106-02-07 06:28:15 -toStartOfDay -2106-02-07 00:00:00 1970-01-01 00:00:00 2106-02-07 00:00:00 1970-01-01 00:00:00 2106-02-07 00:00:00 
-toStartOfWeek -1970-01-01 1970-01-01 1970-01-01 1970-01-01 1970-01-01 2149-06-01 1970-01-01 2149-06-02 -toMonday -1970-01-01 1970-01-01 2149-06-02 1970-01-01 2149-06-02 -toStartOfMonth -1970-01-01 2149-06-01 1970-01-01 2149-06-01 -toLastDayOfMonth -2149-05-31 1970-01-01 2149-05-31 1970-01-01 2149-05-31 -toStartOfQuarter -1970-01-01 2149-04-01 1970-01-01 2149-04-01 -toStartOfYear -1970-01-01 2149-01-01 1970-01-01 2149-01-01 diff --git a/tests/queries/0_stateless/02403_date_time_narrowing.sql b/tests/queries/0_stateless/02403_date_time_narrowing.sql deleted file mode 100644 index 07cbba6f31c..00000000000 --- a/tests/queries/0_stateless/02403_date_time_narrowing.sql +++ /dev/null @@ -1,74 +0,0 @@ --- check conversion of numbers to date/time -- -SELECT toDate(toInt32(toDate32('1930-01-01', 'UTC')), 'UTC'), - toDate(toInt32(toDate32('2151-01-01', 'UTC')), 'UTC'), - toDate(toInt64(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC')), 'UTC'), - toDate(toInt64(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC')), 'UTC'), - toDate32(toInt32(toDate32('1900-01-01', 'UTC')) - 1, 'UTC'), - toDate32(toInt32(toDate32('2299-12-31', 'UTC')) + 1, 'UTC'), - toDateTime(toInt64(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC')), 'UTC'), - toDateTime(toInt64(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC')), 'UTC'); - --- check conversion of extended range type to normal range type -- -SELECT toDate(toDate32('1930-01-01', 'UTC'), 'UTC'), - toDate(toDate32('2151-01-01', 'UTC'), 'UTC'); - -SELECT toDate(toDateTime64('1930-01-01 12:12:12.12', 3, 'UTC'), 'UTC'), - toDate(toDateTime64('2151-01-01 12:12:12.12', 3, 'UTC'), 'UTC'); - -SELECT toDateTime(toDateTime64('1930-01-01 12:12:12.12', 3, 'UTC'), 'UTC'), - toDateTime(toDateTime64('2151-01-01 12:12:12.12', 3, 'UTC'), 'UTC'); - -SELECT toDateTime(toDate32('1930-01-01', 'UTC'), 'UTC'), - toDateTime(toDate32('2151-01-01', 'UTC'), 'UTC'); - -SELECT toDateTime(toDate('2141-01-01', 'UTC'), 'UTC'); - --- test DateTimeTransforms -- -SELECT 'toStartOfDay'; -SELECT toStartOfDay(toDate('2141-01-01', 'UTC'), 'UTC'), - toStartOfDay(toDate32('1930-01-01', 'UTC'), 'UTC'), - toStartOfDay(toDate32('2141-01-01', 'UTC'), 'UTC'), - toStartOfDay(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC'), 'UTC'), - toStartOfDay(toDateTime64('2141-01-01 12:12:12.123', 3, 'UTC'), 'UTC'); - -SELECT 'toStartOfWeek'; -SELECT toStartOfWeek(toDate('1970-01-01', 'UTC')), - toStartOfWeek(toDate32('1970-01-01', 'UTC')), - toStartOfWeek(toDateTime('1970-01-01 10:10:10', 'UTC'), 0, 'UTC'), - toStartOfWeek(toDateTime64('1970-01-01 10:10:10.123', 3, 'UTC'), 1, 'UTC'), - toStartOfWeek(toDate32('1930-01-01', 'UTC')), - toStartOfWeek(toDate32('2151-01-01', 'UTC')), - toStartOfWeek(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC'), 2, 'UTC'), - toStartOfWeek(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC'), 3, 'UTC'); - -SELECT 'toMonday'; -SELECT toMonday(toDate('1970-01-02', 'UTC')), - toMonday(toDate32('1930-01-01', 'UTC')), - toMonday(toDate32('2151-01-01', 'UTC')), - toMonday(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC'), 'UTC'), - toMonday(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC'), 'UTC'); - -SELECT 'toStartOfMonth'; -SELECT toStartOfMonth(toDate32('1930-01-01', 'UTC')), - toStartOfMonth(toDate32('2151-01-01', 'UTC')), - toStartOfMonth(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC'), 'UTC'), - toStartOfMonth(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC'), 'UTC'); - -SELECT 'toLastDayOfMonth'; -SELECT toLastDayOfMonth(toDate('2149-06-03', 'UTC')), - toLastDayOfMonth(toDate32('1930-01-01', 
'UTC')), - toLastDayOfMonth(toDate32('2151-01-01', 'UTC')), - toLastDayOfMonth(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC'), 'UTC'), - toLastDayOfMonth(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC'), 'UTC'); - -SELECT 'toStartOfQuarter'; -SELECT toStartOfQuarter(toDate32('1930-01-01', 'UTC')), - toStartOfQuarter(toDate32('2151-01-01', 'UTC')), - toStartOfQuarter(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC'), 'UTC'), - toStartOfQuarter(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC'), 'UTC'); - -SELECT 'toStartOfYear'; -SELECT toStartOfYear(toDate32('1930-01-01', 'UTC')), - toStartOfYear(toDate32('2151-01-01', 'UTC')), - toStartOfYear(toDateTime64('1930-01-01 12:12:12.123', 3, 'UTC'), 'UTC'), - toStartOfYear(toDateTime64('2151-01-01 12:12:12.123', 3, 'UTC'), 'UTC'); diff --git a/tests/queries/0_stateless/02403_enable_extended_results_for_datetime_functions.reference b/tests/queries/0_stateless/02403_enable_extended_results_for_datetime_functions.reference index 5773810bf64..025191c234a 100644 --- a/tests/queries/0_stateless/02403_enable_extended_results_for_datetime_functions.reference +++ b/tests/queries/0_stateless/02403_enable_extended_results_for_datetime_functions.reference @@ -42,39 +42,39 @@ timeSlot;toDateTime64;true 1920-02-02 10:00:00.000 type;timeSlot;toDateTime64;true DateTime64(3, \'UTC\') toStartOfDay;toDate32;true 1920-02-02 00:00:00.000 type;toStartOfDay;toDate32;true DateTime64(3, \'UTC\') -toStartOfYear;toDate32;false 1970-01-01 +toStartOfYear;toDate32;false 2099-06-06 type;toStartOfYear;toDate32;false Date -toStartOfYear;toDateTime64;false 1970-01-01 +toStartOfYear;toDateTime64;false 2099-06-06 type;toStartOfYear;toDateTime64;false Date toStartOfISOYear;toDate32;false 1970-01-01 type;toStartOfISOYear;toDate32;false Date toStartOfISOYear;toDateTime64;false 1970-01-01 type;toStartOfISOYear;toDateTime64;false Date -toStartOfQuarter;toDate32;false 1970-01-01 +toStartOfQuarter;toDate32;false 2099-06-06 type;toStartOfQuarter;toDate32;false Date -toStartOfQuarter;toDateTime64;false 1970-01-01 +toStartOfQuarter;toDateTime64;false 2099-06-06 type;toStartOfQuarter;toDateTime64;false Date -toStartOfMonth;toDate32;false 1970-01-01 +toStartOfMonth;toDate32;false 2099-07-07 type;toStartOfMonth;toDate32;false Date -toStartOfMonth;toDateTime64;false 1970-01-01 +toStartOfMonth;toDateTime64;false 2099-07-07 type;toStartOfMonth;toDateTime64;false Date -toStartOfWeek;toDate32;false 1970-01-01 +toStartOfWeek;toDate32;false 2099-07-07 type;toStartOfWeek;toDate32;false Date -toStartOfWeek;toDateTime64;false 1970-01-01 +toStartOfWeek;toDateTime64;false 2099-07-07 type;toStartOfWeek;toDateTime64;false Date -toMonday;toDate32;false 1970-01-01 +toMonday;toDate32;false 2099-07-08 type;toMonday;toDate32;false Date -toMonday;toDateTime64;false 1970-01-01 +toMonday;toDateTime64;false 2099-07-08 type;toMonday;toDateTime64;false Date -toLastDayOfMonth;toDate32;false 1970-01-01 +toLastDayOfMonth;toDate32;false 2099-08-04 type;toLastDayOfMonth;toDate32;false Date -toLastDayOfMonth;toDateTime64;false 1970-01-01 +toLastDayOfMonth;toDateTime64;false 2099-08-04 type;toLastDayOfMonth;toDateTime64;false Date -toStartOfDay;toDateTime64;false 1970-01-01 00:00:00 +toStartOfDay;toDateTime64;false 2056-03-09 06:28:16 type;toStartOfDay;toDateTime64;false DateTime(\'UTC\') -toStartOfHour;toDateTime64;false 1970-01-01 00:00:00 +toStartOfHour;toDateTime64;false 2056-03-09 16:28:16 type;toStartOfHour;toDateTime64;false DateTime(\'UTC\') -toStartOfMinute;toDateTime64;false 1970-01-01 00:00:00 
+toStartOfMinute;toDateTime64;false 2056-03-09 16:51:16 type;toStartOfMinute;toDateTime64;false DateTime(\'UTC\') toStartOfFiveMinutes;toDateTime64;false 2056-03-09 16:48:16 type;toStartOfFiveMinutes;toDateTime64;false DateTime(\'UTC\') @@ -84,5 +84,5 @@ toStartOfFifteenMinutes;toDateTime64;false 2056-03-09 16:43:16 type;toStartOfFifteenMinutes;toDateTime64;false DateTime(\'UTC\') timeSlot;toDateTime64;false 2056-03-09 16:58:16 type;timeSlot;toDateTime64;false DateTime(\'UTC\') -toStartOfDay;toDate32;false 1970-01-01 00:00:00 +toStartOfDay;toDate32;false 2056-03-09 06:28:16 type;toStartOfDay;toDate32;false DateTime(\'UTC\') diff --git a/tests/queries/0_stateless/02448_clone_replica_lost_part.reference b/tests/queries/0_stateless/02448_clone_replica_lost_part.reference new file mode 100644 index 00000000000..26c6cbf438b --- /dev/null +++ b/tests/queries/0_stateless/02448_clone_replica_lost_part.reference @@ -0,0 +1,11 @@ +1 [2,3,4,5] +2 [1,2,3,4,5] +3 [1,2,3,4,5] +4 [3,4,5] +5 [1,2,3,4,5] +6 [1,2,3,4,5] +7 [1,2,3,4,5,20,30,40,50] +8 [1,2,3,4,5,10,20,30,40,50] +9 [1,2,3,4,5,10,20,30,40,50] +11 [1,2,3,4,5,10,20,30,40,50,100,300,400,500,600] +12 [1,2,3,4,5,10,20,30,40,50,100,300,400,500,600] diff --git a/tests/queries/0_stateless/02448_clone_replica_lost_part.sql b/tests/queries/0_stateless/02448_clone_replica_lost_part.sql new file mode 100644 index 00000000000..371f7389837 --- /dev/null +++ b/tests/queries/0_stateless/02448_clone_replica_lost_part.sql @@ -0,0 +1,147 @@ +-- Tags: long + +drop table if exists rmt1; +drop table if exists rmt2; +create table rmt1 (n int) engine=ReplicatedMergeTree('/test/02448/{database}/rmt', '1') order by tuple() + settings min_replicated_logs_to_keep=1, max_replicated_logs_to_keep=2, cleanup_delay_period=0, cleanup_delay_period_random_add=1, old_parts_lifetime=0, max_parts_to_merge_at_once=5; +create table rmt2 (n int) engine=ReplicatedMergeTree('/test/02448/{database}/rmt', '2') order by tuple() + settings min_replicated_logs_to_keep=1, max_replicated_logs_to_keep=2, cleanup_delay_period=0, cleanup_delay_period_random_add=1, old_parts_lifetime=0, max_parts_to_merge_at_once=5; + +-- insert part only on one replica +system stop replicated sends rmt1; +insert into rmt1 values (1); +detach table rmt1; -- make replica inactive +system start replicated sends rmt1; + +-- trigger log rotation, rmt1 will be lost +insert into rmt2 values (2); +insert into rmt2 values (3); +insert into rmt2 values (4); +insert into rmt2 values (5); +-- check that entry was not removed from the queue (part is not lost) +set receive_timeout=5; +system sync replica rmt2; -- {serverError TIMEOUT_EXCEEDED} +set receive_timeout=300; + +select 1, arraySort(groupArray(n)) from rmt2; + +-- rmt1 will mimic rmt2 +attach table rmt1; +system sync replica rmt1; +system sync replica rmt2; + +-- check that no parts are lost +select 2, arraySort(groupArray(n)) from rmt1; +select 3, arraySort(groupArray(n)) from rmt2; + + +truncate table rmt1; +truncate table rmt2; + + +-- insert parts only on one replica and merge them +system stop replicated sends rmt2; +insert into rmt2 values (1); +insert into rmt2 values (2); +system sync replica rmt2; +optimize table rmt2 final; +system sync replica rmt2; +-- give it a chance to remove source parts +select sleep(2) format Null; -- increases probability of reproducing the issue +detach table rmt2; +system start replicated sends rmt2; + + +-- trigger log rotation, rmt2 will be lost +insert into rmt1 values (3); +insert into rmt1 values (4); +insert into rmt1 values 
(5); +set receive_timeout=5; +-- check that entry was not removed from the queue (part is not lost) +system sync replica rmt1; -- {serverError TIMEOUT_EXCEEDED} +set receive_timeout=300; + +select 4, arraySort(groupArray(n)) from rmt1; + +-- rmt2 will mimic rmt1 +system stop fetches rmt1; +attach table rmt2; +system sync replica rmt2; +-- give rmt2 a chance to remove merged part (but it should not do it) +select sleep(2) format Null; -- increases probability of reproducing the issue +system start fetches rmt1; +system sync replica rmt1; + +-- check that no parts are lost +select 5, arraySort(groupArray(n)) from rmt1; +select 6, arraySort(groupArray(n)) from rmt2; + + +-- insert part only on one replica +system stop replicated sends rmt1; +insert into rmt1 values (123); +alter table rmt1 update n=10 where n=123 settings mutations_sync=1; +-- give it a chance to remove source part +select sleep(2) format Null; -- increases probability of reproducing the issue +detach table rmt1; -- make replica inactive +system start replicated sends rmt1; + +-- trigger log rotation, rmt1 will be lost +insert into rmt2 values (20); +insert into rmt2 values (30); +insert into rmt2 values (40); +insert into rmt2 values (50); +-- check that entry was not removed from the queue (part is not lost) +set receive_timeout=5; +system sync replica rmt2; -- {serverError TIMEOUT_EXCEEDED} +set receive_timeout=300; + +select 7, arraySort(groupArray(n)) from rmt2; + +-- rmt1 will mimic rmt2 +system stop fetches rmt2; +attach table rmt1; +system sync replica rmt1; +-- give rmt1 a chance to remove mutated part (but it should not do it) +select sleep(2) format Null; -- increases probability of reproducing the issue +system start fetches rmt2; +system sync replica rmt2; + +-- check that no parts are lost +select 8, arraySort(groupArray(n)) from rmt1; +select 9, arraySort(groupArray(n)) from rmt2; + +-- avoid arbitrary merges after inserting +optimize table rmt2 final; +-- insert parts (all_18_18_0, all_19_19_0) on both replicas (will be deduplicated, but it does not matter) +insert into rmt1 values (100); +insert into rmt2 values (100); +insert into rmt1 values (200); +insert into rmt2 values (200); +detach table rmt1; + +-- create a gap in block numbers by dropping a part +insert into rmt2 values (300); +alter table rmt2 drop part 'all_19_19_0'; -- remove 200 +insert into rmt2 values (400); +insert into rmt2 values (500); +insert into rmt2 values (600); +system sync replica rmt2; +-- merge through gap +optimize table rmt2; +-- give it a chance to clean up the log +select sleep(2) format Null; -- increases probability of reproducing the issue + +-- rmt1 will mimic rmt2, but will not be able to fetch parts for a while +system stop replicated sends rmt2; +attach table rmt1; +-- rmt1 should not show the value (200) from dropped part +select throwIf(n = 200) from rmt1 format Null; +select 11, arraySort(groupArray(n)) from rmt2; + +system start replicated sends rmt2; +system sync replica rmt1; +select 12, arraySort(groupArray(n)) from rmt1; + +drop table rmt1; +drop table rmt2; diff --git a/tests/queries/0_stateless/02456_aggregate_state_conversion.reference b/tests/queries/0_stateless/02456_aggregate_state_conversion.reference new file mode 100644 index 00000000000..abf55dde8a7 --- /dev/null +++ b/tests/queries/0_stateless/02456_aggregate_state_conversion.reference @@ -0,0 +1 @@ +1027000000000000000000000000000000000000000000000000000000000000 diff --git a/tests/queries/0_stateless/02456_aggregate_state_conversion.sql
b/tests/queries/0_stateless/02456_aggregate_state_conversion.sql new file mode 100644 index 00000000000..3c05c59de59 --- /dev/null +++ b/tests/queries/0_stateless/02456_aggregate_state_conversion.sql @@ -0,0 +1 @@ +SELECT hex(CAST(x, 'AggregateFunction(sum, Decimal(50, 10))')) FROM (SELECT arrayReduce('sumState', [toDecimal256('0.0000010.000001', 10)]) AS x) GROUP BY x; diff --git a/tests/queries/0_stateless/02457_datediff_via_unix_epoch.reference b/tests/queries/0_stateless/02457_datediff_via_unix_epoch.reference new file mode 100644 index 00000000000..ba12c868037 --- /dev/null +++ b/tests/queries/0_stateless/02457_datediff_via_unix_epoch.reference @@ -0,0 +1,16 @@ +year 1 +year 1 +quarter 1 +quarter 1 +month 1 +month 1 +week 1 +week 1 +day 11 +day 11 +hour 264 +hour 264 +minute 1440 +minute 20 +second 86400 +second 1200 diff --git a/tests/queries/0_stateless/02457_datediff_via_unix_epoch.sql b/tests/queries/0_stateless/02457_datediff_via_unix_epoch.sql new file mode 100644 index 00000000000..796b4cc6e8f --- /dev/null +++ b/tests/queries/0_stateless/02457_datediff_via_unix_epoch.sql @@ -0,0 +1,23 @@ +select 'year', date_diff('year', toDate32('1969-12-25'), toDate32('1970-01-05')); +select 'year', date_diff('year', toDateTime64('1969-12-25 10:00:00.000', 3), toDateTime64('1970-01-05 10:00:00.000', 3)); + +select 'quarter', date_diff('quarter', toDate32('1969-12-25'), toDate32('1970-01-05')); +select 'quarter', date_diff('quarter', toDateTime64('1969-12-25 10:00:00.000', 3), toDateTime64('1970-01-05 10:00:00.000', 3)); + +select 'month', date_diff('month', toDate32('1969-12-25'), toDate32('1970-01-05')); +select 'month', date_diff('month', toDateTime64('1969-12-25 10:00:00.000', 3), toDateTime64('1970-01-05 10:00:00.000', 3)); + +select 'week', date_diff('week', toDate32('1969-12-25'), toDate32('1970-01-05')); +select 'week', date_diff('week', toDateTime64('1969-12-25 10:00:00.000', 3), toDateTime64('1970-01-05 10:00:00.000', 3)); + +select 'day', date_diff('day', toDate32('1969-12-25'), toDate32('1970-01-05')); +select 'day', date_diff('day', toDateTime64('1969-12-25 10:00:00.000', 3), toDateTime64('1970-01-05 10:00:00.000', 3)); + +select 'hour', date_diff('hour', toDate32('1969-12-25'), toDate32('1970-01-05')); +select 'hour', date_diff('hour', toDateTime64('1969-12-25 10:00:00.000', 3), toDateTime64('1970-01-05 10:00:00.000', 3)); + +select 'minute', date_diff('minute', toDate32('1969-12-31'), toDate32('1970-01-01')); +select 'minute', date_diff('minute', toDateTime64('1969-12-31 23:50:00.000', 3), toDateTime64('1970-01-01 00:10:00.000', 3)); + +select 'second', date_diff('second', toDate32('1969-12-31'), toDate32('1970-01-01')); +select 'second', date_diff('second', toDateTime64('1969-12-31 23:50:00.000', 3), toDateTime64('1970-01-01 00:10:00.000', 3)); diff --git a/tests/queries/0_stateless/02458_datediff_date32.reference b/tests/queries/0_stateless/02458_datediff_date32.reference new file mode 100644 index 00000000000..67bfa895199 --- /dev/null +++ b/tests/queries/0_stateless/02458_datediff_date32.reference @@ -0,0 +1,169 @@ +-- { echo } + +-- Date32 vs Date32 +SELECT dateDiff('second', toDate32('1900-01-01'), toDate32('1900-01-02')); +86400 +SELECT dateDiff('minute', toDate32('1900-01-01'), toDate32('1900-01-02')); +1440 +SELECT dateDiff('hour', toDate32('1900-01-01'), toDate32('1900-01-02')); +24 +SELECT dateDiff('day', toDate32('1900-01-01'), toDate32('1900-01-02')); +1 +SELECT dateDiff('week', toDate32('1900-01-01'), toDate32('1900-01-08')); +1 +SELECT dateDiff('month', 
toDate32('1900-01-01'), toDate32('1900-02-01')); +1 +SELECT dateDiff('quarter', toDate32('1900-01-01'), toDate32('1900-04-01')); +1 +SELECT dateDiff('year', toDate32('1900-01-01'), toDate32('1901-01-01')); +1 +-- With DateTime64 +-- Date32 vs DateTime64 +SELECT dateDiff('second', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +86400 +SELECT dateDiff('minute', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +1440 +SELECT dateDiff('hour', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +24 +SELECT dateDiff('day', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +1 +SELECT dateDiff('week', toDate32('1900-01-01'), toDateTime64('1900-01-08 00:00:00', 3)); +1 +SELECT dateDiff('month', toDate32('1900-01-01'), toDateTime64('1900-02-01 00:00:00', 3)); +1 +SELECT dateDiff('quarter', toDate32('1900-01-01'), toDateTime64('1900-04-01 00:00:00', 3)); +1 +SELECT dateDiff('year', toDate32('1900-01-01'), toDateTime64('1901-01-01 00:00:00', 3)); +1 +-- DateTime64 vs Date32 +SELECT dateDiff('second', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +86400 +SELECT dateDiff('minute', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +1440 +SELECT dateDiff('hour', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +24 +SELECT dateDiff('day', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +1 +SELECT dateDiff('week', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-08')); +1 +SELECT dateDiff('month', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-02-01')); +1 +SELECT dateDiff('quarter', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-04-01')); +1 +SELECT dateDiff('year', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1901-01-01')); +1 +-- With DateTime +-- Date32 vs DateTime +SELECT dateDiff('second', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +86400 +SELECT dateDiff('minute', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +1440 +SELECT dateDiff('hour', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +24 +SELECT dateDiff('day', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +1 +SELECT dateDiff('week', toDate32('2015-08-18'), toDateTime('2015-08-25 00:00:00')); +1 +SELECT dateDiff('month', toDate32('2015-08-18'), toDateTime('2015-09-18 00:00:00')); +1 +SELECT dateDiff('quarter', toDate32('2015-08-18'), toDateTime('2015-11-18 00:00:00')); +1 +SELECT dateDiff('year', toDate32('2015-08-18'), toDateTime('2016-08-18 00:00:00')); +1 +-- DateTime vs Date32 +SELECT dateDiff('second', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +86400 +SELECT dateDiff('minute', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +1440 +SELECT dateDiff('hour', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +24 +SELECT dateDiff('day', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +1 +SELECT dateDiff('week', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-25')); +1 +SELECT dateDiff('month', toDateTime('2015-08-18 00:00:00'), toDate32('2015-09-18')); +1 +SELECT dateDiff('quarter', toDateTime('2015-08-18 00:00:00'), toDate32('2015-11-18')); +1 +SELECT dateDiff('year', toDateTime('2015-08-18 00:00:00'), toDate32('2016-08-18')); +1 +-- With Date +-- Date32 vs Date +SELECT dateDiff('second', toDate32('2015-08-18'), toDate('2015-08-19')); +86400 +SELECT dateDiff('minute', toDate32('2015-08-18'), toDate('2015-08-19')); +1440 +SELECT dateDiff('hour', toDate32('2015-08-18'), 
toDate('2015-08-19')); +24 +SELECT dateDiff('day', toDate32('2015-08-18'), toDate('2015-08-19')); +1 +SELECT dateDiff('week', toDate32('2015-08-18'), toDate('2015-08-25')); +1 +SELECT dateDiff('month', toDate32('2015-08-18'), toDate('2015-09-18')); +1 +SELECT dateDiff('quarter', toDate32('2015-08-18'), toDate('2015-11-18')); +1 +SELECT dateDiff('year', toDate32('2015-08-18'), toDate('2016-08-18')); +1 +-- Date vs Date32 +SELECT dateDiff('second', toDate('2015-08-18'), toDate32('2015-08-19')); +86400 +SELECT dateDiff('minute', toDate('2015-08-18'), toDate32('2015-08-19')); +1440 +SELECT dateDiff('hour', toDate('2015-08-18'), toDate32('2015-08-19')); +24 +SELECT dateDiff('day', toDate('2015-08-18'), toDate32('2015-08-19')); +1 +SELECT dateDiff('week', toDate('2015-08-18'), toDate32('2015-08-25')); +1 +SELECT dateDiff('month', toDate('2015-08-18'), toDate32('2015-09-18')); +1 +SELECT dateDiff('quarter', toDate('2015-08-18'), toDate32('2015-11-18')); +1 +SELECT dateDiff('year', toDate('2015-08-18'), toDate32('2016-08-18')); +1 +-- Const vs non-const columns +SELECT dateDiff('day', toDate32('1900-01-01'), materialize(toDate32('1900-01-02'))); +1 +SELECT dateDiff('day', toDate32('1900-01-01'), materialize(toDateTime64('1900-01-02 00:00:00', 3))); +1 +SELECT dateDiff('day', toDateTime64('1900-01-01 00:00:00', 3), materialize(toDate32('1900-01-02'))); +1 +SELECT dateDiff('day', toDate32('2015-08-18'), materialize(toDateTime('2015-08-19 00:00:00'))); +1 +SELECT dateDiff('day', toDateTime('2015-08-18 00:00:00'), materialize(toDate32('2015-08-19'))); +1 +SELECT dateDiff('day', toDate32('2015-08-18'), materialize(toDate('2015-08-19'))); +1 +SELECT dateDiff('day', toDate('2015-08-18'), materialize(toDate32('2015-08-19'))); +1 +-- Non-const vs const columns +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), toDate32('1900-01-02')); +1 +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), toDateTime64('1900-01-02 00:00:00', 3)); +1 +SELECT dateDiff('day', materialize(toDateTime64('1900-01-01 00:00:00', 3)), toDate32('1900-01-02')); +1 +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), toDateTime('2015-08-19 00:00:00')); +1 +SELECT dateDiff('day', materialize(toDateTime('2015-08-18 00:00:00')), toDate32('2015-08-19')); +1 +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), toDate('2015-08-19')); +1 +SELECT dateDiff('day', materialize(toDate('2015-08-18')), toDate32('2015-08-19')); +1 +-- Non-const vs non-const columns +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), materialize(toDate32('1900-01-02'))); +1 +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), materialize(toDateTime64('1900-01-02 00:00:00', 3))); +1 +SELECT dateDiff('day', materialize(toDateTime64('1900-01-01 00:00:00', 3)), materialize(toDate32('1900-01-02'))); +1 +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), materialize(toDateTime('2015-08-19 00:00:00'))); +1 +SELECT dateDiff('day', materialize(toDateTime('2015-08-18 00:00:00')), materialize(toDate32('2015-08-19'))); +1 +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), materialize(toDate('2015-08-19'))); +1 +SELECT dateDiff('day', materialize(toDate('2015-08-18')), materialize(toDate32('2015-08-19'))); +1 diff --git a/tests/queries/0_stateless/02458_datediff_date32.sql b/tests/queries/0_stateless/02458_datediff_date32.sql new file mode 100644 index 00000000000..4c26e04ac27 --- /dev/null +++ b/tests/queries/0_stateless/02458_datediff_date32.sql @@ -0,0 +1,101 @@ +-- { echo } + +-- Date32 vs Date32 +SELECT 
dateDiff('second', toDate32('1900-01-01'), toDate32('1900-01-02')); +SELECT dateDiff('minute', toDate32('1900-01-01'), toDate32('1900-01-02')); +SELECT dateDiff('hour', toDate32('1900-01-01'), toDate32('1900-01-02')); +SELECT dateDiff('day', toDate32('1900-01-01'), toDate32('1900-01-02')); +SELECT dateDiff('week', toDate32('1900-01-01'), toDate32('1900-01-08')); +SELECT dateDiff('month', toDate32('1900-01-01'), toDate32('1900-02-01')); +SELECT dateDiff('quarter', toDate32('1900-01-01'), toDate32('1900-04-01')); +SELECT dateDiff('year', toDate32('1900-01-01'), toDate32('1901-01-01')); + +-- With DateTime64 +-- Date32 vs DateTime64 +SELECT dateDiff('second', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +SELECT dateDiff('minute', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +SELECT dateDiff('hour', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +SELECT dateDiff('day', toDate32('1900-01-01'), toDateTime64('1900-01-02 00:00:00', 3)); +SELECT dateDiff('week', toDate32('1900-01-01'), toDateTime64('1900-01-08 00:00:00', 3)); +SELECT dateDiff('month', toDate32('1900-01-01'), toDateTime64('1900-02-01 00:00:00', 3)); +SELECT dateDiff('quarter', toDate32('1900-01-01'), toDateTime64('1900-04-01 00:00:00', 3)); +SELECT dateDiff('year', toDate32('1900-01-01'), toDateTime64('1901-01-01 00:00:00', 3)); + +-- DateTime64 vs Date32 +SELECT dateDiff('second', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +SELECT dateDiff('minute', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +SELECT dateDiff('hour', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +SELECT dateDiff('day', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-02')); +SELECT dateDiff('week', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-01-08')); +SELECT dateDiff('month', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-02-01')); +SELECT dateDiff('quarter', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1900-04-01')); +SELECT dateDiff('year', toDateTime64('1900-01-01 00:00:00', 3), toDate32('1901-01-01')); + +-- With DateTime +-- Date32 vs DateTime +SELECT dateDiff('second', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +SELECT dateDiff('minute', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +SELECT dateDiff('hour', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +SELECT dateDiff('day', toDate32('2015-08-18'), toDateTime('2015-08-19 00:00:00')); +SELECT dateDiff('week', toDate32('2015-08-18'), toDateTime('2015-08-25 00:00:00')); +SELECT dateDiff('month', toDate32('2015-08-18'), toDateTime('2015-09-18 00:00:00')); +SELECT dateDiff('quarter', toDate32('2015-08-18'), toDateTime('2015-11-18 00:00:00')); +SELECT dateDiff('year', toDate32('2015-08-18'), toDateTime('2016-08-18 00:00:00')); + +-- DateTime vs Date32 +SELECT dateDiff('second', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +SELECT dateDiff('minute', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +SELECT dateDiff('hour', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +SELECT dateDiff('day', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-19')); +SELECT dateDiff('week', toDateTime('2015-08-18 00:00:00'), toDate32('2015-08-25')); +SELECT dateDiff('month', toDateTime('2015-08-18 00:00:00'), toDate32('2015-09-18')); +SELECT dateDiff('quarter', toDateTime('2015-08-18 00:00:00'), toDate32('2015-11-18')); +SELECT dateDiff('year', toDateTime('2015-08-18 00:00:00'), 
toDate32('2016-08-18')); + +-- With Date +-- Date32 vs Date +SELECT dateDiff('second', toDate32('2015-08-18'), toDate('2015-08-19')); +SELECT dateDiff('minute', toDate32('2015-08-18'), toDate('2015-08-19')); +SELECT dateDiff('hour', toDate32('2015-08-18'), toDate('2015-08-19')); +SELECT dateDiff('day', toDate32('2015-08-18'), toDate('2015-08-19')); +SELECT dateDiff('week', toDate32('2015-08-18'), toDate('2015-08-25')); +SELECT dateDiff('month', toDate32('2015-08-18'), toDate('2015-09-18')); +SELECT dateDiff('quarter', toDate32('2015-08-18'), toDate('2015-11-18')); +SELECT dateDiff('year', toDate32('2015-08-18'), toDate('2016-08-18')); + +-- Date vs Date32 +SELECT dateDiff('second', toDate('2015-08-18'), toDate32('2015-08-19')); +SELECT dateDiff('minute', toDate('2015-08-18'), toDate32('2015-08-19')); +SELECT dateDiff('hour', toDate('2015-08-18'), toDate32('2015-08-19')); +SELECT dateDiff('day', toDate('2015-08-18'), toDate32('2015-08-19')); +SELECT dateDiff('week', toDate('2015-08-18'), toDate32('2015-08-25')); +SELECT dateDiff('month', toDate('2015-08-18'), toDate32('2015-09-18')); +SELECT dateDiff('quarter', toDate('2015-08-18'), toDate32('2015-11-18')); +SELECT dateDiff('year', toDate('2015-08-18'), toDate32('2016-08-18')); + +-- Const vs non-const columns +SELECT dateDiff('day', toDate32('1900-01-01'), materialize(toDate32('1900-01-02'))); +SELECT dateDiff('day', toDate32('1900-01-01'), materialize(toDateTime64('1900-01-02 00:00:00', 3))); +SELECT dateDiff('day', toDateTime64('1900-01-01 00:00:00', 3), materialize(toDate32('1900-01-02'))); +SELECT dateDiff('day', toDate32('2015-08-18'), materialize(toDateTime('2015-08-19 00:00:00'))); +SELECT dateDiff('day', toDateTime('2015-08-18 00:00:00'), materialize(toDate32('2015-08-19'))); +SELECT dateDiff('day', toDate32('2015-08-18'), materialize(toDate('2015-08-19'))); +SELECT dateDiff('day', toDate('2015-08-18'), materialize(toDate32('2015-08-19'))); + +-- Non-const vs const columns +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), toDate32('1900-01-02')); +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), toDateTime64('1900-01-02 00:00:00', 3)); +SELECT dateDiff('day', materialize(toDateTime64('1900-01-01 00:00:00', 3)), toDate32('1900-01-02')); +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), toDateTime('2015-08-19 00:00:00')); +SELECT dateDiff('day', materialize(toDateTime('2015-08-18 00:00:00')), toDate32('2015-08-19')); +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), toDate('2015-08-19')); +SELECT dateDiff('day', materialize(toDate('2015-08-18')), toDate32('2015-08-19')); + +-- Non-const vs non-const columns +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), materialize(toDate32('1900-01-02'))); +SELECT dateDiff('day', materialize(toDate32('1900-01-01')), materialize(toDateTime64('1900-01-02 00:00:00', 3))); +SELECT dateDiff('day', materialize(toDateTime64('1900-01-01 00:00:00', 3)), materialize(toDate32('1900-01-02'))); +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), materialize(toDateTime('2015-08-19 00:00:00'))); +SELECT dateDiff('day', materialize(toDateTime('2015-08-18 00:00:00')), materialize(toDate32('2015-08-19'))); +SELECT dateDiff('day', materialize(toDate32('2015-08-18')), materialize(toDate('2015-08-19'))); +SELECT dateDiff('day', materialize(toDate('2015-08-18')), materialize(toDate32('2015-08-19'))); diff --git a/tests/queries/0_stateless/02461_cancel_finish_race.reference b/tests/queries/0_stateless/02461_cancel_finish_race.reference new file mode 100644 
index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/02461_cancel_finish_race.sh b/tests/queries/0_stateless/02461_cancel_finish_race.sh new file mode 100755 index 00000000000..7e775437da1 --- /dev/null +++ b/tests/queries/0_stateless/02461_cancel_finish_race.sh @@ -0,0 +1,59 @@ +#!/usr/bin/env bash +# Tags: no-fasttest + + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +function thread_query() +{ + while true; do + $CLICKHOUSE_CLIENT --query "SELECT count() FROM numbers_mt(10000) WHERE rand() = 0 FORMAT Null"; + done +} + +function thread_cancel() +{ + while true; do + $CLICKHOUSE_CLIENT --query "KILL QUERY WHERE current_database = '$CLICKHOUSE_DATABASE' SYNC FORMAT Null"; + done +} + +# https://stackoverflow.com/questions/9954794/execute-a-shell-function-with-timeout +export -f thread_query; +export -f thread_cancel; + +TIMEOUT=30 + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +timeout $TIMEOUT bash -c thread_query 2> /dev/null & +timeout $TIMEOUT bash -c thread_cancel 2> /dev/null & + +wait diff --git a/tests/queries/0_stateless/02461_welch_t_test_fuzz.reference b/tests/queries/0_stateless/02461_welch_t_test_fuzz.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/02461_welch_t_test_fuzz.sql b/tests/queries/0_stateless/02461_welch_t_test_fuzz.sql new file mode 100644 index 00000000000..b22dc49dec3 --- /dev/null +++ b/tests/queries/0_stateless/02461_welch_t_test_fuzz.sql @@ -0,0 +1,8 @@ + +DROP TABLE IF EXISTS welch_ttest__fuzz_7; +CREATE TABLE welch_ttest__fuzz_7 (left UInt128, right UInt128) ENGINE = Memory; + +INSERT INTO welch_ttest__fuzz_7 VALUES (0.010268, 0), (0.000167, 0), (0.000167, 0), (0.159258, 1), (0.136278, 1), (0.122389, 1); + +SELECT roundBankers(welchTTest(left, right).2, 6) from welch_ttest__fuzz_7; -- { serverError 36 } +SELECT roundBankers(studentTTest(left, right).2, 6) from welch_ttest__fuzz_7; -- { serverError 36 } diff --git a/tests/queries/0_stateless/02462_distributions.reference b/tests/queries/0_stateless/02462_distributions.reference new file mode 100644 index 00000000000..56b04bcb856 --- /dev/null +++ b/tests/queries/0_stateless/02462_distributions.reference @@ -0,0 +1,12 @@ +Ok +Ok +Ok +Ok +Ok +Ok +Ok +0 +1 +Ok +Ok +Ok diff --git a/tests/queries/0_stateless/02462_distributions.sql b/tests/queries/0_stateless/02462_distributions.sql new file mode 100644 index 00000000000..b45dc897f2a --- /dev/null +++ b/tests/queries/0_stateless/02462_distributions.sql @@ -0,0 +1,24 @@ +# Values 
should be between 0 and 1 +SELECT DISTINCT if (a >= toFloat64(0) AND a <= toFloat64(1), 'Ok', 'Fail') FROM (SELECT randUniform(0, 1) AS a FROM numbers(100000)); +# Mean should be around 0 +SELECT DISTINCT if (m >= toFloat64(-0.2) AND m <= toFloat64(0.2), 'Ok', 'Fail') FROM (SELECT avg(a) as m FROM (SELECT randNormal(0, 5) AS a FROM numbers(100000))); +# Values should be >= 0 +SELECT DISTINCT if (a >= toFloat64(0), 'Ok', 'Fail') FROM (SELECT randLogNormal(0, 5) AS a FROM numbers(100000)); +# Values should be >= 0 +SELECT DISTINCT if (a >= toFloat64(0), 'Ok', 'Fail') FROM (SELECT randExponential(15) AS a FROM numbers(100000)); +# Values should be >= 0 +SELECT DISTINCT if (a >= toFloat64(0), 'Ok', 'Fail') FROM (SELECT randChiSquared(3) AS a FROM numbers(100000)); +# Mean should be around 0 +SELECT DISTINCT if (m > toFloat64(-0.2) AND m < toFloat64(0.2), 'Ok', 'Fail') FROM (SELECT avg(a) as m FROM (SELECT randStudentT(5) AS a FROM numbers(100000))); +# Values should be >= 0 +SELECT DISTINCT if (a >= toFloat64(0), 'Ok', 'Fail') FROM (SELECT randFisherF(3, 4) AS a FROM numbers(100000)); +# There should be only 0s and 1s +SELECT a FROM (SELECT DISTINCT randBernoulli(0.5) AS a FROM numbers(100000)) ORDER BY a; +# Values should be >= 0 +SELECT DISTINCT if (a >= toFloat64(0), 'Ok', 'Fail') FROM (SELECT randBinomial(3, 0.5) AS a FROM numbers(100000)); +# Values should be >= 0 +SELECT DISTINCT if (a >= toFloat64(0), 'Ok', 'Fail') FROM (SELECT randNegativeBinomial(3, 0.5) AS a FROM numbers(100000)); +# Values should be >= 0 +SELECT DISTINCT if (a >= toFloat64(0), 'Ok', 'Fail') FROM (SELECT randPoisson(44) AS a FROM numbers(100000)); +# No errors +SELECT randUniform(1, 2, 1), randNormal(0, 1, 'abacaba'), randLogNormal(0, 10, 'b'), randChiSquared(1, 1), randStudentT(7, '8'), randFisherF(23, 42, 100), randBernoulli(0.5, 2), randBinomial(3, 0.5, 1), randNegativeBinomial(3, 0.5, 2), randPoisson(44, 44) FORMAT Null; diff --git a/tests/queries/0_stateless/02462_int_to_date.reference b/tests/queries/0_stateless/02462_int_to_date.reference new file mode 100644 index 00000000000..f31441cf3b8 --- /dev/null +++ b/tests/queries/0_stateless/02462_int_to_date.reference @@ -0,0 +1,4 @@ +20221011 2022-10-11 1665519765 +20221011 2022-10-11 1665519765 +20221011 2022-10-11 1665519765 Int32 +20221011 2022-10-11 1665519765 UInt32 diff --git a/tests/queries/0_stateless/02462_int_to_date.sql b/tests/queries/0_stateless/02462_int_to_date.sql new file mode 100644 index 00000000000..cd470ca12f6 --- /dev/null +++ b/tests/queries/0_stateless/02462_int_to_date.sql @@ -0,0 +1,4 @@ +select toYYYYMMDD(toDate(recordTimestamp, 'Europe/Amsterdam')), toDate(recordTimestamp, 'Europe/Amsterdam'), toInt64(1665519765) as recordTimestamp; +select toYYYYMMDD(toDate(recordTimestamp, 'Europe/Amsterdam')), toDate(recordTimestamp, 'Europe/Amsterdam'), toUInt64(1665519765) as recordTimestamp; +select toYYYYMMDD(toDate(recordTimestamp, 'Europe/Amsterdam')), toDate(recordTimestamp, 'Europe/Amsterdam'), toInt32(1665519765) as recordTimestamp, toTypeName(recordTimestamp); +select toYYYYMMDD(toDate(recordTimestamp, 'Europe/Amsterdam')), toDate(recordTimestamp, 'Europe/Amsterdam'), toUInt32(1665519765) as recordTimestamp, toTypeName(recordTimestamp); diff --git a/tests/queries/0_stateless/02463_julian_day_ubsan.reference b/tests/queries/0_stateless/02463_julian_day_ubsan.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/02463_julian_day_ubsan.sql b/tests/queries/0_stateless/02463_julian_day_ubsan.sql new 
file mode 100644 index 00000000000..a8583d7b0a8 --- /dev/null +++ b/tests/queries/0_stateless/02463_julian_day_ubsan.sql @@ -0,0 +1 @@ +SELECT fromModifiedJulianDay(9223372036854775807 :: Int64); -- { serverError 490 } diff --git a/tests/queries/0_stateless/02464_decimal_scale_buffer_overflow.reference b/tests/queries/0_stateless/02464_decimal_scale_buffer_overflow.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/02464_decimal_scale_buffer_overflow.sql b/tests/queries/0_stateless/02464_decimal_scale_buffer_overflow.sql new file mode 100644 index 00000000000..355d9012f1f --- /dev/null +++ b/tests/queries/0_stateless/02464_decimal_scale_buffer_overflow.sql @@ -0,0 +1,5 @@ +DROP TABLE IF EXISTS series__fuzz_35; +CREATE TABLE series__fuzz_35 (`i` UInt8, `x_value` Decimal(18, 14), `y_value` DateTime) ENGINE = Memory; +INSERT INTO series__fuzz_35(i, x_value, y_value) VALUES (1, 5.6,-4.4),(2, -9.6,3),(3, -1.3,-4),(4, 5.3,9.7),(5, 4.4,0.037),(6, -8.6,-7.8),(7, 5.1,9.3),(8, 7.9,-3.6),(9, -8.2,0.62),(10, -3,7.3); +SELECT skewSamp(x_value) FROM (SELECT x_value as x_value FROM series__fuzz_35 LIMIT 2) FORMAT Null; +DROP TABLE series__fuzz_35; diff --git a/tests/queries/0_stateless/02466_distributed_query_profiler.reference b/tests/queries/0_stateless/02466_distributed_query_profiler.reference new file mode 100644 index 00000000000..4521d575ff3 --- /dev/null +++ b/tests/queries/0_stateless/02466_distributed_query_profiler.reference @@ -0,0 +1,10 @@ +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 diff --git a/tests/queries/0_stateless/02466_distributed_query_profiler.sql b/tests/queries/0_stateless/02466_distributed_query_profiler.sql new file mode 100644 index 00000000000..9fc2fe7b4bd --- /dev/null +++ b/tests/queries/0_stateless/02466_distributed_query_profiler.sql @@ -0,0 +1,21 @@ +-- This is a regression test for EINTR handling in MultiplexedConnections::getReplicaForReading() + +select * from remote('127.{2,4}', view( + -- This emulates a slow query: the server will return a row every 0.1 seconds + select sleep(0.1) from numbers(20) settings max_block_size=1) +) +-- LIMIT is to activate query cancellation once enough rows have already been read. +limit 10 +settings + -- This is to avoid draining in the background, so that the exception is received during query execution + drain_timeout=-1, + -- This is to activate as many signals as possible to trigger EINTR + query_profiler_real_time_period_ns=1, + -- This is to use MultiplexedConnections + use_hedged_requests=0, + -- This is to make the initiator wait for the cancel packet in MultiplexedConnections::getReplicaForReading() + -- + -- NOTE: even a smaller sleep would be enough to trigger this problem + -- with 100% probability; however, just to make it more reliable, increase + -- it to 2 seconds.
+ sleep_in_receive_cancel_ms=2000; diff --git a/tests/queries/0_stateless/02467_cross_join_three_table_functions.reference b/tests/queries/0_stateless/02467_cross_join_three_table_functions.reference new file mode 100644 index 00000000000..0718dd8e65f --- /dev/null +++ b/tests/queries/0_stateless/02467_cross_join_three_table_functions.reference @@ -0,0 +1 @@ +1320 diff --git a/tests/queries/0_stateless/02467_cross_join_three_table_functions.sql b/tests/queries/0_stateless/02467_cross_join_three_table_functions.sql new file mode 100644 index 00000000000..5c7da815bbe --- /dev/null +++ b/tests/queries/0_stateless/02467_cross_join_three_table_functions.sql @@ -0,0 +1 @@ +SELECT count(*) FROM numbers(10) AS a, numbers(11) AS b, numbers(12) AS c; diff --git a/tests/queries/0_stateless/02468_has_any_tuple.reference b/tests/queries/0_stateless/02468_has_any_tuple.reference new file mode 100644 index 00000000000..252a9293563 --- /dev/null +++ b/tests/queries/0_stateless/02468_has_any_tuple.reference @@ -0,0 +1,4 @@ +1 +1 +[(3,3)] +1 diff --git a/tests/queries/0_stateless/02468_has_any_tuple.sql b/tests/queries/0_stateless/02468_has_any_tuple.sql new file mode 100644 index 00000000000..12c7222d593 --- /dev/null +++ b/tests/queries/0_stateless/02468_has_any_tuple.sql @@ -0,0 +1,4 @@ +select [(toUInt8(3), toUInt8(3))] = [(toInt16(3), toInt16(3))]; +select hasAny([(toInt16(3), toInt16(3))],[(toInt16(3), toInt16(3))]); +select arrayFilter(x -> x = (toInt16(3), toInt16(3)), arrayZip([toUInt8(3)], [toUInt8(3)])); +select hasAny([(toUInt8(3), toUInt8(3))],[(toInt16(3), toInt16(3))]); diff --git a/tests/queries/1_stateful/00097_constexpr_in_index.reference b/tests/queries/1_stateful/00097_constexpr_in_index.reference new file mode 100644 index 00000000000..5080d6d4cd8 --- /dev/null +++ b/tests/queries/1_stateful/00097_constexpr_in_index.reference @@ -0,0 +1 @@ +1803 diff --git a/tests/queries/1_stateful/00097_constexpr_in_index.sql b/tests/queries/1_stateful/00097_constexpr_in_index.sql new file mode 100644 index 00000000000..b5cac75c767 --- /dev/null +++ b/tests/queries/1_stateful/00097_constexpr_in_index.sql @@ -0,0 +1,3 @@ +-- Even in the presence of OR, we evaluate "0 IN (1, 2, 3)" as a constant expression, therefore it does not prevent the index analysis. + +SELECT count() FROM test.hits WHERE CounterID IN (14917930, 33034174) OR 0 IN (1, 2, 3) SETTINGS max_rows_to_read = 1000000, force_primary_key = 1; diff --git a/tests/queries/1_stateful/00168_parallel_processing_on_replicas_part_1.reference b/tests/queries/1_stateful/00168_parallel_processing_on_replicas_part_1.reference deleted file mode 100644 index 2675904dea0..00000000000 --- a/tests/queries/1_stateful/00168_parallel_processing_on_replicas_part_1.reference +++ /dev/null @@ -1,110 +0,0 @@ -Testing 00001_count_hits.sql ----> Ok! ✅ -Testing 00002_count_visits.sql ----> Ok! ✅ -Testing 00004_top_counters.sql ----> Ok! ✅ -Testing 00005_filtering.sql ----> Ok! ✅ -Testing 00006_agregates.sql ----> Ok! ✅ -Testing 00007_uniq.sql ----> Ok! ✅ -Testing 00008_uniq.sql ----> Ok! ✅ -Testing 00009_uniq_distributed.sql ----> Ok! ✅ -Testing 00010_quantiles_segfault.sql ----> Ok! ✅ -Testing 00011_sorting.sql ----> Ok! ✅ -Testing 00012_sorting_distributed.sql ----> Ok! ✅ -Skipping 00013_sorting_of_nested.sql -Testing 00014_filtering_arrays.sql ----> Ok! ✅ -Testing 00015_totals_and_no_aggregate_functions.sql ----> Ok! ✅ -Testing 00016_any_if_distributed_cond_always_false.sql ----> Ok! ✅ -Testing 00017_aggregation_uninitialized_memory.sql ----> Ok!
✅ -Testing 00020_distinct_order_by_distributed.sql ----> Ok! ✅ -Testing 00021_1_select_with_in.sql ----> Ok! ✅ -Testing 00021_2_select_with_in.sql ----> Ok! ✅ -Testing 00021_3_select_with_in.sql ----> Ok! ✅ -Testing 00022_merge_prewhere.sql ----> Ok! ✅ -Testing 00023_totals_limit.sql ----> Ok! ✅ -Testing 00024_random_counters.sql ----> Ok! ✅ -Testing 00030_array_enumerate_uniq.sql ----> Ok! ✅ -Testing 00031_array_enumerate_uniq.sql ----> Ok! ✅ -Testing 00032_aggregate_key64.sql ----> Ok! ✅ -Testing 00033_aggregate_key_string.sql ----> Ok! ✅ -Testing 00034_aggregate_key_fixed_string.sql ----> Ok! ✅ -Testing 00035_aggregate_keys128.sql ----> Ok! ✅ -Testing 00036_aggregate_hashed.sql ----> Ok! ✅ -Testing 00037_uniq_state_merge1.sql ----> Ok! ✅ -Testing 00038_uniq_state_merge2.sql ----> Ok! ✅ -Testing 00039_primary_key.sql ----> Ok! ✅ -Testing 00040_aggregating_materialized_view.sql ----> Ok! ✅ -Testing 00041_aggregating_materialized_view.sql ----> Ok! ✅ -Testing 00042_any_left_join.sql ----> Ok! ✅ -Testing 00043_any_left_join.sql ----> Ok! ✅ -Testing 00044_any_left_join_string.sql ----> Ok! ✅ -Testing 00045_uniq_upto.sql ----> Ok! ✅ -Testing 00046_uniq_upto_distributed.sql ----> Ok! ✅ -Testing 00047_bar.sql ----> Ok! ✅ -Testing 00048_min_max.sql ----> Ok! ✅ -Testing 00049_max_string_if.sql ----> Ok! ✅ -Testing 00050_min_max.sql ----> Ok! ✅ -Testing 00051_min_max_array.sql ----> Ok! ✅ -Testing 00052_group_by_in.sql ----> Ok! ✅ -Testing 00053_replicate_segfault.sql ----> Ok! ✅ -Testing 00054_merge_tree_partitions.sql ----> Ok! ✅ -Testing 00055_index_and_not.sql ----> Ok! ✅ -Testing 00056_view.sql ----> Ok! ✅ -Testing 00059_merge_sorting_empty_array_joined.sql ----> Ok! ✅ -Testing 00060_move_to_prewhere_and_sets.sql ----> Ok! ✅ -Skipping 00061_storage_buffer.sql -Testing 00062_loyalty.sql ----> Ok! ✅ -Testing 00063_loyalty_joins.sql ----> Ok! ✅ -Testing 00065_loyalty_with_storage_join.sql ----> Ok! ✅ -Testing 00066_sorting_distributed_many_replicas.sql ----> Ok! ✅ -Testing 00067_union_all.sql ----> Ok! ✅ -Testing 00068_subquery_in_prewhere.sql ----> Ok! ✅ -Testing 00069_duplicate_aggregation_keys.sql ----> Ok! ✅ -Testing 00071_merge_tree_optimize_aio.sql ----> Ok! ✅ -Testing 00072_compare_date_and_string_index.sql ----> Ok! ✅ -Testing 00073_uniq_array.sql ----> Ok! ✅ -Testing 00074_full_join.sql ----> Ok! ✅ -Testing 00075_left_array_join.sql ----> Ok! ✅ -Testing 00076_system_columns_bytes.sql ----> Ok! ✅ -Testing 00077_log_tinylog_stripelog.sql ----> Ok! ✅ -Testing 00078_group_by_arrays.sql ----> Ok! ✅ -Testing 00079_array_join_not_used_joined_column.sql ----> Ok! ✅ -Testing 00080_array_join_and_union.sql ----> Ok! ✅ -Testing 00081_group_by_without_key_and_totals.sql ----> Ok! ✅ -Testing 00082_quantiles.sql ----> Ok! ✅ -Testing 00083_array_filter.sql ----> Ok! ✅ -Testing 00084_external_aggregation.sql ----> Ok! ✅ -Testing 00085_monotonic_evaluation_segfault.sql ----> Ok! ✅ -Testing 00086_array_reduce.sql ----> Ok! ✅ -Testing 00087_where_0.sql ----> Ok! ✅ -Testing 00088_global_in_one_shard_and_rows_before_limit.sql ----> Ok! ✅ -Testing 00089_position_functions_with_non_constant_arg.sql ----> Ok! ✅ -Testing 00091_prewhere_two_conditions.sql ----> Ok! ✅ -Testing 00093_prewhere_array_join.sql ----> Ok! ✅ -Testing 00094_order_by_array_join_limit.sql ----> Ok! ✅ -Skipping 00095_hyperscan_profiler.sql -Testing 00139_like.sql ----> Ok! ✅ -Skipping 00140_rename.sql -Testing 00141_transform.sql ----> Ok! ✅ -Testing 00142_system_columns.sql ----> Ok! 
✅ -Testing 00143_transform_non_const_default.sql ----> Ok! ✅ -Testing 00144_functions_of_aggregation_states.sql ----> Ok! ✅ -Testing 00145_aggregate_functions_statistics.sql ----> Ok! ✅ -Testing 00146_aggregate_function_uniq.sql ----> Ok! ✅ -Testing 00147_global_in_aggregate_function.sql ----> Ok! ✅ -Testing 00148_monotonic_functions_and_index.sql ----> Ok! ✅ -Testing 00149_quantiles_timing_distributed.sql ----> Ok! ✅ -Testing 00150_quantiles_timing_precision.sql ----> Ok! ✅ -Testing 00151_order_by_read_in_order.sql ----> Ok! ✅ -Skipping 00151_replace_partition_with_different_granularity.sql -Skipping 00152_insert_different_granularity.sql -Testing 00153_aggregate_arena_race.sql ----> Ok! ✅ -Skipping 00154_avro.sql -Testing 00156_max_execution_speed_sample_merge.sql ----> Ok! ✅ -Skipping 00157_cache_dictionary.sql -Skipping 00158_cache_dictionary_has.sql -Testing 00160_decode_xml_component.sql ----> Ok! ✅ -Testing 00162_mmap_compression_none.sql ----> Ok! ✅ -Testing 00164_quantileBfloat16.sql ----> Ok! ✅ -Testing 00165_jit_aggregate_functions.sql ----> Ok! ✅ -Skipping 00166_explain_estimate.sql -Testing 00167_read_bytes_from_fs.sql ----> Ok! ✅ -Total failed tests: diff --git a/tests/queries/1_stateful/00168_parallel_processing_on_replicas_part_1.sh b/tests/queries/1_stateful/00168_parallel_processing_on_replicas_part_1.sh deleted file mode 100755 index ecd0d281b53..00000000000 --- a/tests/queries/1_stateful/00168_parallel_processing_on_replicas_part_1.sh +++ /dev/null @@ -1,102 +0,0 @@ -#!/usr/bin/env bash -# Tags: no-tsan, no-random-settings - -CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) -# shellcheck source=../shell_config.sh -. "$CURDIR"/../shell_config.sh - -# set -e - -# All replicas are localhost, disable `prefer_localhost_replica` option to test network interface -# Currently this feature could not work with hedged requests -# Enabling `enable_sample_offset_parallel_processing` feature could lead to intersecting marks, so some of them would be thrown away and it will lead to incorrect result of SELECT query -SETTINGS="--max_parallel_replicas=3 --use_hedged_requests=false --allow_experimental_parallel_reading_from_replicas=true" - -# Prepare tables -$CLICKHOUSE_CLIENT $SETTINGS -nm -q ''' - drop table if exists test.dist_hits SYNC; - drop table if exists test.dist_visits SYNC; - - create table test.dist_hits as test.hits engine = Distributed("test_cluster_one_shard_three_replicas_localhost", test, hits, rand()); - create table test.dist_visits as test.visits engine = Distributed("test_cluster_one_shard_three_replicas_localhost", test, visits, rand()); -'''; - -FAILED=() - -# PreviouslyFailed=( -# ) - -SkipList=( - "00013_sorting_of_nested.sql" # It contains FINAL, which is not allowed together with parallel reading - - "00061_storage_buffer.sql" - "00095_hyperscan_profiler.sql" # too long in debug (there is a --no-debug tag inside a test) - - "00140_rename.sql" # Multiple renames are not allowed with DatabaseReplicated and tags are not forwarded through this test - - "00154_avro.sql" # Plain select * with limit with Distributed table is not deterministic - "00151_replace_partition_with_different_granularity.sql" # Replace partition from Distributed is not allowed - "00152_insert_different_granularity.sql" # The same as above - - "00157_cache_dictionary.sql" # Too long in debug mode, but result is correct - "00158_cache_dictionary_has.sql" # The same as above - - "00166_explain_estimate.sql" # Distributed table returns nothing -) - -# for TESTPATH in "${PreviouslyFailed[@]}" 
-for TESTPATH in "$CURDIR"/*.sql; -do - TESTNAME=$(basename $TESTPATH) - NUM=$(echo "${TESTNAME}" | grep -o -P '^\d+' | sed 's/^0*//') - if [[ "${NUM}" -ge 168 ]]; then - continue - fi - - if [[ " ${SkipList[*]} " =~ ${TESTNAME} ]]; then - echo "Skipping $TESTNAME " - continue - fi - - echo -n "Testing $TESTNAME ----> " - - # prepare test - NEW_TESTNAME="/tmp/dist_$TESTNAME" - # Added g to sed command to replace all tables, not the first - cat $TESTPATH | sed -e 's/test.hits/test.dist_hits/g' | sed -e 's/test.visits/test.dist_visits/g' > $NEW_TESTNAME - - TESTNAME_RESULT="/tmp/result_$TESTNAME" - NEW_TESTNAME_RESULT="/tmp/result_dist_$TESTNAME" - - $CLICKHOUSE_CLIENT $SETTINGS -nm < $TESTPATH > $TESTNAME_RESULT - $CLICKHOUSE_CLIENT $SETTINGS -nm < $NEW_TESTNAME > $NEW_TESTNAME_RESULT - - expected=$(cat $TESTNAME_RESULT | md5sum) - actual=$(cat $NEW_TESTNAME_RESULT | md5sum) - - if [[ "$expected" != "$actual" ]]; then - FAILED+=("$TESTNAME") - echo "Failed! ❌" - echo "Plain:" - cat $TESTNAME_RESULT - echo "Distributed:" - cat $NEW_TESTNAME_RESULT - else - echo "Ok! ✅" - fi -done - - -echo "Total failed tests: " -# Iterate the loop to read and print each array element -for value in "${FAILED[@]}" -do - echo "🔺 $value" -done - -# Drop tables - -$CLICKHOUSE_CLIENT $SETTINGS -nm -q ''' - drop table if exists test.dist_hits SYNC; - drop table if exists test.dist_visits SYNC; -'''; diff --git a/utils/keeper-data-dumper/main.cpp b/utils/keeper-data-dumper/main.cpp index 0762c740ac1..dd3c3a4e2ad 100644 --- a/utils/keeper-data-dumper/main.cpp +++ b/utils/keeper-data-dumper/main.cpp @@ -63,7 +63,7 @@ int main(int argc, char *argv[]) SnapshotsQueue snapshots_queue{1}; CoordinationSettingsPtr settings = std::make_shared<CoordinationSettings>(); KeeperContextPtr keeper_context = std::make_shared<KeeperContext>(); - auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, argv[1], settings, keeper_context); + auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, argv[1], settings, keeper_context, nullptr); state_machine->init(); size_t last_commited_index = state_machine->last_commit_index();
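A usage note on the `additional_table_filters` setting exercised by the 02346 tests earlier in this patch: the setting maps a table name to an extra predicate that is applied on top of the query's own WHERE clause, and, as the 02346_additional_filters_index test verifies via max_rows_to_read, the injected predicate still participates in primary key and skip index analysis. A minimal sketch of ad-hoc usage follows; the `hits` table and its columns are illustrative assumptions, not part of this patch:

-- Filter a table through an extra predicate without editing the query text;
-- the map key names the table the predicate is applied to (hypothetical schema).
SELECT count()
FROM hits
WHERE CounterID != 0
SETTINGS additional_table_filters = {'hits' : 'EventDate >= today() - 7'};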