diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4200d583a3f..9cb545f94e7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,43 +13,43 @@
* The `PREALLOCATE` option for `HASHED`/`SPARSE_HASHED` dictionaries becomes a no-op. [#45388](https://github.com/ClickHouse/ClickHouse/pull/45388) ([Azat Khuzhin](https://github.com/azat)). It does not give significant advantages anymore.
* Disallow `Gorilla` codec on columns of non-Float32 or non-Float64 type. [#45252](https://github.com/ClickHouse/ClickHouse/pull/45252) ([Robert Schulze](https://github.com/rschu1ze)). It was pointless and led to inconsistencies.
* Parallel quorum inserts might work incorrectly with `*MergeTree` tables created with the deprecated syntax. Therefore, parallel quorum inserts support is completely disabled for such tables. It does not affect tables created with a new syntax. [#45430](https://github.com/ClickHouse/ClickHouse/pull/45430) ([Alexander Tokmakov](https://github.com/tavplubix)).
-* Use `GetObjectAttributes` request instead of `HeadObject` request to get the size of an object in AWS S3. This change fixes handling endpoints without explicit region after updating the AWS SDK, for example. [#45288](https://github.com/ClickHouse/ClickHouse/pull/45288) ([Vitaly Baranov](https://github.com/vitlibar)). AWS S3 and Minio are tested, but keep in mind that various S3-compatible services (GCS, R2, B2) may have subtle incompatibilities. This change also may require you to adjust the ACL to allow the `GetObjectAttributes` request.
+* Use the `GetObjectAttributes` request instead of the `HeadObject` request to get the size of an object in AWS S3. For example, this change fixes the handling of endpoints without an explicit region after updating the AWS SDK. [#45288](https://github.com/ClickHouse/ClickHouse/pull/45288) ([Vitaly Baranov](https://github.com/vitlibar)). AWS S3 and MinIO are tested, but keep in mind that various S3-compatible services (GCS, R2, B2) may have subtle incompatibilities. This change may also require you to adjust the ACL to allow the `GetObjectAttributes` request.
* Forbid paths in timezone names. For example, a timezone name like `/usr/share/zoneinfo/Asia/Aden` is not allowed; the IANA timezone database name like `Asia/Aden` should be used. [#44225](https://github.com/ClickHouse/ClickHouse/pull/44225) ([Kruglov Pavel](https://github.com/Avogar)).
#### New Feature
* Dictionary source for extracting keys by traversing a tree of regular expressions. It can be used for User-Agent parsing. [#40878](https://github.com/ClickHouse/ClickHouse/pull/40878) ([Vage Ogannisian](https://github.com/nooblose)). [#43858](https://github.com/ClickHouse/ClickHouse/pull/43858) ([Han Fei](https://github.com/hanfei1991)).
-* Added parametrized view functionality, now it's possible to specify query parameters for View table engine. resolves [#40907](https://github.com/ClickHouse/ClickHouse/issues/40907). [#41687](https://github.com/ClickHouse/ClickHouse/pull/41687) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
+* Added parametrized view functionality: it is now possible to specify query parameters for the View table engine (an illustrative SQL sketch follows this list). Resolves [#40907](https://github.com/ClickHouse/ClickHouse/issues/40907). [#41687](https://github.com/ClickHouse/ClickHouse/pull/41687) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add `quantileInterpolatedWeighted`/`quantilesInterpolatedWeighted` functions. [#38252](https://github.com/ClickHouse/ClickHouse/pull/38252) ([Bharat Nallan](https://github.com/bharatnc)).
* Array join support for the `Map` type, like the function "explode" in Spark. [#43239](https://github.com/ClickHouse/ClickHouse/pull/43239) ([李扬](https://github.com/taiyang-li)).
* Support SQL standard binary and hex string literals. [#43785](https://github.com/ClickHouse/ClickHouse/pull/43785) ([Mo Xuan](https://github.com/mo-avatar)).
-* Allow to format `DateTime` in Joda-Time style. Refer to [the Joda-Time docs](https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html). [#43818](https://github.com/ClickHouse/ClickHouse/pull/43818) ([李扬](https://github.com/taiyang-li)).
+* Allow formatting `DateTime` in Joda-Time style. Refer to [the Joda-Time docs](https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html). [#43818](https://github.com/ClickHouse/ClickHouse/pull/43818) ([李扬](https://github.com/taiyang-li)).
* Implemented a fractional second formatter (`%f`) for `formatDateTime`. [#44060](https://github.com/ClickHouse/ClickHouse/pull/44060) ([ltrk2](https://github.com/ltrk2)). [#44497](https://github.com/ClickHouse/ClickHouse/pull/44497) ([Alexander Gololobov](https://github.com/davenger)).
-* Added `age` function to calculate difference between two dates or dates with time values expressed as number of full units. Closes [#41115](https://github.com/ClickHouse/ClickHouse/issues/41115). [#44421](https://github.com/ClickHouse/ClickHouse/pull/44421) ([Robert Schulze](https://github.com/rschu1ze)).
+* Added the `age` function to calculate the difference between two dates (or dates with time), expressed as the number of full units. Closes [#41115](https://github.com/ClickHouse/ClickHouse/issues/41115). [#44421](https://github.com/ClickHouse/ClickHouse/pull/44421) ([Robert Schulze](https://github.com/rschu1ze)).
* Add `Null` source for dictionaries. Closes [#44240](https://github.com/ClickHouse/ClickHouse/issues/44240). [#44502](https://github.com/ClickHouse/ClickHouse/pull/44502) ([mayamika](https://github.com/mayamika)).
* Allow configuring the S3 storage class with the `s3_storage_class` configuration option, such as `STANDARD`/`INTELLIGENT_TIERING`. Closes [#44443](https://github.com/ClickHouse/ClickHouse/issues/44443). [#44707](https://github.com/ClickHouse/ClickHouse/pull/44707) ([chen](https://github.com/xiedeyantu)).
* Insert default values in case of missing elements in a JSON object while parsing a named tuple. Add the setting `input_format_json_defaults_for_missing_elements_in_named_tuple` that controls this behaviour. Closes [#45142](https://github.com/ClickHouse/ClickHouse/issues/45142#issuecomment-1380153217). [#45231](https://github.com/ClickHouse/ClickHouse/pull/45231) ([Kruglov Pavel](https://github.com/Avogar)).
* Record server startup time in ProfileEvents (`ServerStartupMilliseconds`). Resolves [#43188](https://github.com/ClickHouse/ClickHouse/issues/43188). [#45250](https://github.com/ClickHouse/ClickHouse/pull/45250) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
-* Refactor and Improve streaming engines Kafka/RabbitMQ/NATS and add support for all formats, also refactor formats a bit: - Fix producing messages in row-based formats with suffixes/prefixes. Now every message is formatted completely with all delimiters and can be parsed back using input format. - Support block-based formats like Native, Parquet, ORC, etc. Every block is formatted as a separated message. The number of rows in one message depends on the block size, so you can control it via setting `max_block_size`. - Add new engine settings `kafka_max_rows_per_message/rabbitmq_max_rows_per_message/nats_max_rows_per_message`. They control the number of rows formatted in one message in row-based formats. Default value: 1. - Fix high memory consumption in NATS table engine. - Support arbitrary binary data in NATS producer (previously it worked only with strings contained \0 at the end) - Add missing Kafka/RabbitMQ/NATS engine settings in documentation. - Refactor producing and consuming in Kafka/RabbitMQ/NATS, separate it from WriteBuffers/ReadBuffers semantic. - Refactor output formats: remove callbacks on each row used in Kafka/RabbitMQ/NATS (now we don't use callbacks there), allow to use IRowOutputFormat directly, clarify row end and row between delimiters, make it possible to reset output format to start formatting again - Add proper implementation in formatRow function (bonus after formats refactoring). [#42777](https://github.com/ClickHouse/ClickHouse/pull/42777) ([Kruglov Pavel](https://github.com/Avogar)).
+* Refactor and improve the streaming engines Kafka/RabbitMQ/NATS, add support for all formats, and refactor formats a bit: - Fix producing messages in row-based formats with suffixes/prefixes. Now every message is formatted completely with all delimiters and can be parsed back using the input format. - Support block-based formats like Native, Parquet, ORC, etc. Every block is formatted as a separate message. The number of rows in one message depends on the block size, so you can control it via the setting `max_block_size`. - Add new engine settings `kafka_max_rows_per_message/rabbitmq_max_rows_per_message/nats_max_rows_per_message`. They control the number of rows formatted in one message in row-based formats. Default value: 1. - Fix high memory consumption in the NATS table engine. - Support arbitrary binary data in the NATS producer (previously it worked only with strings that contained \0 at the end). - Add missing Kafka/RabbitMQ/NATS engine settings to the documentation. - Refactor producing and consuming in Kafka/RabbitMQ/NATS and separate it from the WriteBuffer/ReadBuffer semantics. - Refactor output formats: remove the per-row callbacks used in Kafka/RabbitMQ/NATS (now we don't use callbacks there), allow using IRowOutputFormat directly, clarify row-end and row-between delimiters, and make it possible to reset the output format to start formatting again. - Add a proper implementation in the formatRow function (a bonus after the formats refactoring). [#42777](https://github.com/ClickHouse/ClickHouse/pull/42777) ([Kruglov Pavel](https://github.com/Avogar)).
* Support reading/writing `Nested` tables as `List` of `Struct` in `CapnProto` format. Read/write `Decimal32/64` as `Int32/64`. Closes [#43319](https://github.com/ClickHouse/ClickHouse/issues/43319). [#43379](https://github.com/ClickHouse/ClickHouse/pull/43379) ([Kruglov Pavel](https://github.com/Avogar)).
-* Added a `message_format_string` column to `system.text_log`. The column contains a pattern that was used to format the message. [#44543](https://github.com/ClickHouse/ClickHouse/pull/44543) ([Alexander Tokmakov](https://github.com/tavplubix)). This allows various analytics over ClickHouse own logs.
-* Try to autodetect header with column names (and maybe types) for CSV/TSV/CustomSeparated input formats.
-Add settings input_format_tsv/csv/custom_detect_header that enables this behaviour (enabled by default). Closes [#44640](https://github.com/ClickHouse/ClickHouse/issues/44640). [#44953](https://github.com/ClickHouse/ClickHouse/pull/44953) ([Kruglov Pavel](https://github.com/Avogar)).
+* Added a `message_format_string` column to `system.text_log`. The column contains a pattern that was used to format the message. [#44543](https://github.com/ClickHouse/ClickHouse/pull/44543) ([Alexander Tokmakov](https://github.com/tavplubix)). This allows various analytics over ClickHouse's own logs.
+* Try to autodetect the header with column names (and maybe types) for CSV/TSV/CustomSeparated input formats.
Add settings `input_format_tsv/csv/custom_detect_header` that enable this behaviour (enabled by default). Closes [#44640](https://github.com/ClickHouse/ClickHouse/issues/44640). [#44953](https://github.com/ClickHouse/ClickHouse/pull/44953) ([Kruglov Pavel](https://github.com/Avogar)).
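A short, hedged sketch of how a few of the new features above can be used (parametrized views, Joda-Time formatting, the `age` function, and header autodetection). The table, column, and parameter names are purely illustrative and not taken from the release notes:

```sql
-- Parametrized view: the query parameter {id_max:UInt32} is bound when the view is queried.
CREATE VIEW events_filtered AS
    SELECT event_id, event_time
    FROM events
    WHERE event_id <= {id_max:UInt32};

SELECT count() FROM events_filtered(id_max = 1000);

-- Joda-Time style formatting of DateTime values.
SELECT formatDateTimeInJodaSyntax(now(), 'yyyy-MM-dd HH:mm:ss');

-- Difference between two dates expressed as the number of full units ('day' here).
SELECT age('day', toDate('2023-01-01'), toDate('2023-01-25'));  -- 24

-- Header autodetection for CSV is on by default; it can be turned off per session.
SET input_format_csv_detect_header = 0;
```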
#### Experimental Feature
* Add an experimental inverted index as a new secondary index type for efficient text search. [#38667](https://github.com/ClickHouse/ClickHouse/pull/38667) ([larryluogit](https://github.com/larryluogit)).
* Add experimental query result cache. [#43797](https://github.com/ClickHouse/ClickHouse/pull/43797) ([Robert Schulze](https://github.com/rschu1ze)).
* Added extendable and configurable scheduling subsystem for IO requests (not yet integrated with IO code itself). [#41840](https://github.com/ClickHouse/ClickHouse/pull/41840) ([Sergei Trifonov](https://github.com/serxa)). This feature does nothing at all, enjoy.
-* Added `SYSTEM DROP DATABASE REPLICA` that removes metadata of dead replica of `Replicated` database. Resolves [#41794](https://github.com/ClickHouse/ClickHouse/issues/41794). [#42807](https://github.com/ClickHouse/ClickHouse/pull/42807) ([Alexander Tokmakov](https://github.com/tavplubix)).
+* Added `SYSTEM DROP DATABASE REPLICA` that removes metadata of a dead replica of a `Replicated` database. Resolves [#41794](https://github.com/ClickHouse/ClickHouse/issues/41794). [#42807](https://github.com/ClickHouse/ClickHouse/pull/42807) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### Performance Improvement
* Do not load inactive parts at startup of `MergeTree` tables. [#42181](https://github.com/ClickHouse/ClickHouse/pull/42181) ([Anton Popov](https://github.com/CurtizJ)).
-* Improved latency of reading from storage `S3` and table function `s3` with large number of small files. Now settings `remote_filesystem_read_method` and `remote_filesystem_read_prefetch` take effect while reading from storage `S3`. [#43726](https://github.com/ClickHouse/ClickHouse/pull/43726) ([Anton Popov](https://github.com/CurtizJ)).
+* Improved latency of reading from storage `S3` and table function `s3` with a large number of small files. Now the settings `remote_filesystem_read_method` and `remote_filesystem_read_prefetch` take effect while reading from storage `S3`. [#43726](https://github.com/ClickHouse/ClickHouse/pull/43726) ([Anton Popov](https://github.com/CurtizJ)).
* Optimization for reading struct fields in Parquet/ORC files. Only the required fields are loaded. [#44484](https://github.com/ClickHouse/ClickHouse/pull/44484) ([lgbo](https://github.com/lgbo-ustc)).
-* Two-level aggregation algorithm was mistakenly disabled for queries over HTTP interface. It was enabled back, and it leads to a major performance improvement. [#45450](https://github.com/ClickHouse/ClickHouse/pull/45450) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* The two-level aggregation algorithm was mistakenly disabled for queries over the HTTP interface. It has been re-enabled, which leads to a major performance improvement. [#45450](https://github.com/ClickHouse/ClickHouse/pull/45450) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Added mmap support for StorageFile, which should improve the performance of clickhouse-local. [#43927](https://github.com/ClickHouse/ClickHouse/pull/43927) ([pufit](https://github.com/pufit)).
* Added sharding support in HashedDictionary to allow parallel load (almost linear scaling based on number of shards). [#40003](https://github.com/ClickHouse/ClickHouse/pull/40003) ([Azat Khuzhin](https://github.com/azat)).
* Speed up query parsing. [#42284](https://github.com/ClickHouse/ClickHouse/pull/42284) ([Raúl Marín](https://github.com/Algunenano)).
-* Always replace OR chain `expr = x1 OR ... OR expr = xN` to `expr IN (x1, ..., xN)` in case if `expr` is a `LowCardinality` column. Setting `optimize_min_equality_disjunction_chain_length` is ignored in this case. [#42889](https://github.com/ClickHouse/ClickHouse/pull/42889) ([Guo Wangyang](https://github.com/guowangy)).
+* Always replace the OR chain `expr = x1 OR ... OR expr = xN` with `expr IN (x1, ..., xN)` when `expr` is a `LowCardinality` column (see the sketch after this list). The setting `optimize_min_equality_disjunction_chain_length` is ignored in this case. [#42889](https://github.com/ClickHouse/ClickHouse/pull/42889) ([Guo Wangyang](https://github.com/guowangy)).
* Slightly improve performance by optimizing the code around ThreadStatus. [#43586](https://github.com/ClickHouse/ClickHouse/pull/43586) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
* Optimize the column-wise ternary logic evaluation by achieving auto-vectorization. In the performance test of this [microbenchmark](https://github.com/ZhiguoZh/ClickHouse/blob/20221123-ternary-logic-opt-example/src/Functions/examples/associative_applier_perf.cpp), we've observed a peak **performance gain** of **21x** on the ICX device (Intel Xeon Platinum 8380 CPU). [#43669](https://github.com/ClickHouse/ClickHouse/pull/43669) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
* Avoid acquiring read locks in the `system.tables` table if possible. [#43840](https://github.com/ClickHouse/ClickHouse/pull/43840) ([Raúl Marín](https://github.com/Algunenano)).
@@ -59,10 +59,10 @@ Add settings input_format_tsv/csv/custom_detect_header that enables this behavio
* Add fast path for: - `col like '%%'`; - `col like '%'`; - `col not like '%%'`; - `col not like '%'`; - `match(col, '.*')`. [#45244](https://github.com/ClickHouse/ClickHouse/pull/45244) ([李扬](https://github.com/taiyang-li)).
* Slightly improve happy path optimisation in filtering (WHERE clause). [#45289](https://github.com/ClickHouse/ClickHouse/pull/45289) ([Nikita Taranov](https://github.com/nickitat)).
* Provide monotonicity info for `toUnixTimestamp64*` to enable more algebraic optimizations for index analysis. [#44116](https://github.com/ClickHouse/ClickHouse/pull/44116) ([Nikita Taranov](https://github.com/nickitat)).
-* Allow to configure temporary data for query processing (spilling to disk) to cooperate with filesystem cache (taking up the space from the cache disk) [#43972](https://github.com/ClickHouse/ClickHouse/pull/43972) ([Vladimir C](https://github.com/vdimir)). This mainly improves [ClickHouse Cloud](https://clickhouse.cloud/), but can be used for self-managed setups as well, if you know what to do.
+* Allow configuring temporary data for query processing (spilling to disk) to cooperate with the filesystem cache (taking up space from the cache disk). [#43972](https://github.com/ClickHouse/ClickHouse/pull/43972) ([Vladimir C](https://github.com/vdimir)). This mainly improves [ClickHouse Cloud](https://clickhouse.cloud/), but can be used for self-managed setups as well, if you know what to do.
* Make the `system.replicas` table do parallel fetches of replica statuses. Closes [#43918](https://github.com/ClickHouse/ClickHouse/issues/43918). [#43998](https://github.com/ClickHouse/ClickHouse/pull/43998) ([Nikolay Degterinsky](https://github.com/evillique)).
* Optimize memory consumption during backup to S3: files will now be copied to S3 directly, without using `WriteBufferFromS3` (which could use a lot of memory). [#45188](https://github.com/ClickHouse/ClickHouse/pull/45188) ([Vitaly Baranov](https://github.com/vitlibar)).
-* Add a cache for async block ids. This will reduce the requests of zookeeper when we enable async inserts deduplication. [#45106](https://github.com/ClickHouse/ClickHouse/pull/45106) ([Han Fei](https://github.com/hanfei1991)).
+* Add a cache for async block IDs. This will reduce the number of requests to ZooKeeper when async insert deduplication is enabled. [#45106](https://github.com/ClickHouse/ClickHouse/pull/45106) ([Han Fei](https://github.com/hanfei1991)).
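A hedged sketch of the OR-chain rewrite for `LowCardinality` columns mentioned above; the `events` table and its columns are hypothetical:

```sql
-- The key column uses LowCardinality, so the rewrite now applies unconditionally.
CREATE TABLE events
(
    event LowCardinality(String),
    value UInt64
)
ENGINE = MergeTree
ORDER BY event;

-- This disjunction chain over the LowCardinality column `event` ...
SELECT sum(value)
FROM events
WHERE event = 'click' OR event = 'view' OR event = 'purchase';

-- ... is now always executed as the equivalent IN expression:
SELECT sum(value)
FROM events
WHERE event IN ('click', 'view', 'purchase');
```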
#### Improvement
@@ -71,25 +71,25 @@ Add settings input_format_tsv/csv/custom_detect_header that enables this behavio
* Added fields `supports_parallel_parsing` and `supports_parallel_formatting` to table `system.formats` for better introspection. [#45499](https://github.com/ClickHouse/ClickHouse/pull/45499) ([Anton Popov](https://github.com/CurtizJ)).
* Improve reading CSV fields in CustomSeparated/Template formats. Closes [#42352](https://github.com/ClickHouse/ClickHouse/issues/42352). Closes [#39620](https://github.com/ClickHouse/ClickHouse/issues/39620). [#43332](https://github.com/ClickHouse/ClickHouse/pull/43332) ([Kruglov Pavel](https://github.com/Avogar)).
* Unify query elapsed time measurements. [#43455](https://github.com/ClickHouse/ClickHouse/pull/43455) ([Raúl Marín](https://github.com/Algunenano)).
-* Improve automatic usage of structure from insertion table in table functions file/hdfs/s3 when virtual columns present in select query, it fixes possible error `Block structure mismatch` or `number of columns mismatch`. [#43695](https://github.com/ClickHouse/ClickHouse/pull/43695) ([Kruglov Pavel](https://github.com/Avogar)).
-* Add support for signed arguments in function `range`. Fixes [#43333](https://github.com/ClickHouse/ClickHouse/issues/43333). [#43733](https://github.com/ClickHouse/ClickHouse/pull/43733) ([sanyu](https://github.com/wineternity)).
+* Improve automatic usage of the structure from the insertion table in the table functions file/hdfs/s3 when virtual columns are present in a SELECT query. This fixes the possible errors `Block structure mismatch` and `number of columns mismatch`. [#43695](https://github.com/ClickHouse/ClickHouse/pull/43695) ([Kruglov Pavel](https://github.com/Avogar)).
+* Add support for signed arguments in the function `range`. Fixes [#43333](https://github.com/ClickHouse/ClickHouse/issues/43333). [#43733](https://github.com/ClickHouse/ClickHouse/pull/43733) ([sanyu](https://github.com/wineternity)).
* Remove redundant sorting, for example, sorting steps related to ORDER BY clauses in subqueries. Implemented on top of the query plan. It performs a similar optimization to `optimize_duplicate_order_by_and_distinct` regarding `ORDER BY` clauses, but is more generic, since it's applied to any redundant sorting steps (not only those caused by an ORDER BY clause) and to subqueries of any depth. Related to [#42648](https://github.com/ClickHouse/ClickHouse/issues/42648). [#43905](https://github.com/ClickHouse/ClickHouse/pull/43905) ([Igor Nikonov](https://github.com/devcrafter)).
-* Add ability to disable deduplication of files for BACKUP (for backups wiithout deduplication ATTACH can be used instead of full RESTORE), example `BACKUP foo TO S3(...) SETTINGS deduplicate_files=0` (default `deduplicate_files=1`). [#43947](https://github.com/ClickHouse/ClickHouse/pull/43947) ([Azat Khuzhin](https://github.com/azat)).
+* Add the ability to disable deduplication of files for BACKUP (for backups without deduplication, ATTACH can be used instead of a full RESTORE). For example, `BACKUP foo TO S3(...) SETTINGS deduplicate_files=0` (default `deduplicate_files=1`). [#43947](https://github.com/ClickHouse/ClickHouse/pull/43947) ([Azat Khuzhin](https://github.com/azat)).
* Refactor and improve schema inference for text formats. Add a new setting `schema_inference_make_columns_nullable` that controls making result types `Nullable` (enabled by default). [#44019](https://github.com/ClickHouse/ClickHouse/pull/44019) ([Kruglov Pavel](https://github.com/Avogar)).
* Better support for `PROXYv1` protocol. [#44135](https://github.com/ClickHouse/ClickHouse/pull/44135) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add information about the latest part check by cleanup threads into `system.parts` table. [#44244](https://github.com/ClickHouse/ClickHouse/pull/44244) ([Dmitry Novik](https://github.com/novikd)).
* Disable table functions in readonly mode for inserts. [#44290](https://github.com/ClickHouse/ClickHouse/pull/44290) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
-* Add a setting `simultaneous_parts_removal_limit` to allow to limit the number of parts being processed by one iteration of CleanupThread. [#44461](https://github.com/ClickHouse/ClickHouse/pull/44461) ([Dmitry Novik](https://github.com/novikd)).
-* If user only need virtual columns, we don't need to initialize ReadBufferFromS3. May be helpful to [#44246](https://github.com/ClickHouse/ClickHouse/issues/44246). [#44493](https://github.com/ClickHouse/ClickHouse/pull/44493) ([chen](https://github.com/xiedeyantu)).
+* Add a setting `simultaneous_parts_removal_limit` to allow limiting the number of parts being processed by one iteration of CleanupThread. [#44461](https://github.com/ClickHouse/ClickHouse/pull/44461) ([Dmitry Novik](https://github.com/novikd)).
+* Do not initialize ReadBufferFromS3 when only virtual columns are needed in a query. This may be helpful to [#44246](https://github.com/ClickHouse/ClickHouse/issues/44246). [#44493](https://github.com/ClickHouse/ClickHouse/pull/44493) ([chen](https://github.com/xiedeyantu)).
* Prevent duplicate column name hints. Closes [#44130](https://github.com/ClickHouse/ClickHouse/issues/44130). [#44519](https://github.com/ClickHouse/ClickHouse/pull/44519) ([Joanna Hulboj](https://github.com/jh0x)).
* Allow macro substitution in the endpoint of disks. Resolves [#40951](https://github.com/ClickHouse/ClickHouse/issues/40951). [#44533](https://github.com/ClickHouse/ClickHouse/pull/44533) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Improve schema inference when `input_format_json_read_object_as_string` is enabled. [#44546](https://github.com/ClickHouse/ClickHouse/pull/44546) ([Kruglov Pavel](https://github.com/Avogar)).
-* Add a user-level setting `database_replicated_allow_replicated_engine_arguments` which allow to ban creation of `ReplicatedMergeTree` tables with arguments in `DatabaseReplicated`. [#44566](https://github.com/ClickHouse/ClickHouse/pull/44566) ([alesapin](https://github.com/alesapin)).
+* Add a user-level setting `database_replicated_allow_replicated_engine_arguments` which allows banning the creation of `ReplicatedMergeTree` tables with arguments in `DatabaseReplicated`. [#44566](https://github.com/ClickHouse/ClickHouse/pull/44566) ([alesapin](https://github.com/alesapin)).
* Prevent users from mistakenly specifying zero (invalid) value for `index_granularity`. This closes [#44536](https://github.com/ClickHouse/ClickHouse/issues/44536). [#44578](https://github.com/ClickHouse/ClickHouse/pull/44578) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added the possibility to set the path to the service keytab file via the `keytab` parameter in the `kerberos` section of config.xml. [#44594](https://github.com/ClickHouse/ClickHouse/pull/44594) ([Roman Vasin](https://github.com/rvasin)).
* Use already written part of the query for fuzzy search (pass to the `skim` library, which is written in Rust and linked statically to ClickHouse). [#44600](https://github.com/ClickHouse/ClickHouse/pull/44600) ([Azat Khuzhin](https://github.com/azat)).
* Enable `input_format_json_read_objects_as_strings` by default to be able to read nested JSON objects while JSON Object type is experimental. [#44657](https://github.com/ClickHouse/ClickHouse/pull/44657) ([Kruglov Pavel](https://github.com/Avogar)).
-* Improvement for deduplication of async inserts: when users do duplicate async inserts, we should dedup inside the memory before we query keeper. [#44682](https://github.com/ClickHouse/ClickHouse/pull/44682) ([Han Fei](https://github.com/hanfei1991)).
+* Improvement for deduplication of async inserts: when users do duplicate async inserts, we should deduplicate inside the memory before we query Keeper. [#44682](https://github.com/ClickHouse/ClickHouse/pull/44682) ([Han Fei](https://github.com/hanfei1991)).
* Input/output `Avro` format will parse the bool type as the ClickHouse bool type. [#44684](https://github.com/ClickHouse/ClickHouse/pull/44684) ([Kruglov Pavel](https://github.com/Avogar)).
* Support Bool type in Arrow/Parquet/ORC. Closes [#43970](https://github.com/ClickHouse/ClickHouse/issues/43970). [#44698](https://github.com/ClickHouse/ClickHouse/pull/44698) ([Kruglov Pavel](https://github.com/Avogar)).
* Don't greedily parse beyond the quotes when reading UUIDs - it may lead to mistakenly successful parsing of incorrect data. [#44686](https://github.com/ClickHouse/ClickHouse/pull/44686) ([Raúl Marín](https://github.com/Algunenano)).
@@ -98,7 +98,7 @@ Add settings input_format_tsv/csv/custom_detect_header that enables this behavio
* Fix `output_format_pretty_row_numbers` not preserving the counter across blocks. Closes [#44815](https://github.com/ClickHouse/ClickHouse/issues/44815). [#44832](https://github.com/ClickHouse/ClickHouse/pull/44832) ([flynn](https://github.com/ucasfl)).
* Don't report errors in `system.errors` due to parts being merged concurrently with the background cleanup process. [#44874](https://github.com/ClickHouse/ClickHouse/pull/44874) ([Raúl Marín](https://github.com/Algunenano)).
* Optimize and fix metrics for Distributed async INSERT. [#44922](https://github.com/ClickHouse/ClickHouse/pull/44922) ([Azat Khuzhin](https://github.com/azat)).
-* Added settings to disallow concurrent backups and restores resolves [#43891](https://github.com/ClickHouse/ClickHouse/issues/43891) Implementation: * Added server level settings to disallow concurrent backups and restores, which are read and set when BackupWorker is created in Context. * Settings are set to true by default. * Before starting backup or restores, added a check to see if any other backups/restores are running. For internal request it checks if its from the self node using backup_uuid. [#45072](https://github.com/ClickHouse/ClickHouse/pull/45072) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
+* Added settings to disallow concurrent backups and restores. Resolves [#43891](https://github.com/ClickHouse/ClickHouse/issues/43891). Implementation: * Added server-level settings to disallow concurrent backups and restores, which are read and set when BackupWorker is created in Context. * The settings are set to true by default. * Before starting a backup or restore, a check was added to see if any other backups/restores are running. For internal requests, it checks if the request is from the same node using backup_uuid. [#45072](https://github.com/ClickHouse/ClickHouse/pull/45072) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add `` config parameter for system logs. [#45320](https://github.com/ClickHouse/ClickHouse/pull/45320) ([Stig Bakken](https://github.com/stigsb)).
#### Build/Testing/Packaging Improvement
@@ -114,22 +114,22 @@ Add settings input_format_tsv/csv/custom_detect_header that enables this behavio
#### Bug Fix
* Replace domain IP types (IPv4, IPv6) with native ones. [#43221](https://github.com/ClickHouse/ClickHouse/pull/43221) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). It automatically fixes some missing implementations in the code.
-* Fix backup process if mutations get killed during the backup process. [#45351](https://github.com/ClickHouse/ClickHouse/pull/45351) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix the backup process if mutations get killed while the backup is running. [#45351](https://github.com/ClickHouse/ClickHouse/pull/45351) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix the `Invalid number of rows in Chunk` exception message. [#41404](https://github.com/ClickHouse/ClickHouse/issues/41404). [#42126](https://github.com/ClickHouse/ClickHouse/pull/42126) ([Alexander Gololobov](https://github.com/davenger)).
-* Fix possible use of uninitialized value after executing expressions after sorting. Closes [#43386](https://github.com/ClickHouse/ClickHouse/issues/43386) [#43635](https://github.com/ClickHouse/ClickHouse/pull/43635) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix possible use of an uninitialized value after executing expressions after sorting. Closes [#43386](https://github.com/ClickHouse/ClickHouse/issues/43386) [#43635](https://github.com/ClickHouse/ClickHouse/pull/43635) ([Kruglov Pavel](https://github.com/Avogar)).
* Better handling of NULL in aggregate combinators, fix possible segfault/logical error while using an obscure optimization `optimize_rewrite_sum_if_to_count_if`. Closes [#43758](https://github.com/ClickHouse/ClickHouse/issues/43758). [#43813](https://github.com/ClickHouse/ClickHouse/pull/43813) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix CREATE USER/ROLE query settings constraints. [#43993](https://github.com/ClickHouse/ClickHouse/pull/43993) ([Nikolay Degterinsky](https://github.com/evillique)).
-* Fix wrong behavior of `JOIN ON t1.x = t2.x AND 1 = 1`, forbid such queries. [#44016](https://github.com/ClickHouse/ClickHouse/pull/44016) ([Vladimir C](https://github.com/vdimir)).
+* Fix incorrect behavior of `JOIN ON t1.x = t2.x AND 1 = 1`, forbid such queries. [#44016](https://github.com/ClickHouse/ClickHouse/pull/44016) ([Vladimir C](https://github.com/vdimir)).
* Fixed bug with non-parsable default value for `EPHEMERAL` column in table metadata. [#44026](https://github.com/ClickHouse/ClickHouse/pull/44026) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix parsing of bad version from compatibility setting. [#44224](https://github.com/ClickHouse/ClickHouse/pull/44224) ([Kruglov Pavel](https://github.com/Avogar)).
* Bring interval subtraction from datetime in line with addition. [#44241](https://github.com/ClickHouse/ClickHouse/pull/44241) ([ltrk2](https://github.com/ltrk2)).
-* Remove limits on maximum size of the result for view. [#44261](https://github.com/ClickHouse/ClickHouse/pull/44261) ([lizhuoyu5](https://github.com/lzydmxy)).
+* Remove limits on the maximum size of the result for views. [#44261](https://github.com/ClickHouse/ClickHouse/pull/44261) ([lizhuoyu5](https://github.com/lzydmxy)).
* Fix possible logical error in cache if `do_not_evict_index_and_mrk_files=1`. Closes [#42142](https://github.com/ClickHouse/ClickHouse/issues/42142). [#44268](https://github.com/ClickHouse/ClickHouse/pull/44268) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible too early cache write interruption in write-through cache (caching could be stopped due to false assumption when it shouldn't have). [#44289](https://github.com/ClickHouse/ClickHouse/pull/44289) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix possible crash in case function `IN` with constant arguments was used as a constant argument together with `LowCardinality`. Fixes [#44221](https://github.com/ClickHouse/ClickHouse/issues/44221). [#44346](https://github.com/ClickHouse/ClickHouse/pull/44346) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix a possible crash in the case when the function `IN` with constant arguments was used as a constant argument together with `LowCardinality`. Fixes [#44221](https://github.com/ClickHouse/ClickHouse/issues/44221). [#44346](https://github.com/ClickHouse/ClickHouse/pull/44346) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix support for complex parameters (like arrays) of parametric aggregate functions. This closes [#30975](https://github.com/ClickHouse/ClickHouse/issues/30975). The aggregate function `sumMapFiltered` was unusable in distributed queries before this change. [#44358](https://github.com/ClickHouse/ClickHouse/pull/44358) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix reading ObjectId in BSON schema inference. [#44382](https://github.com/ClickHouse/ClickHouse/pull/44382) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix race which can lead to premature temp parts removal before merge finished in ReplicatedMergeTree. This issue could lead to errors like `No such file or directory: xxx`. Fixes [#43983](https://github.com/ClickHouse/ClickHouse/issues/43983). [#44383](https://github.com/ClickHouse/ClickHouse/pull/44383) ([alesapin](https://github.com/alesapin)).
+* Fix race which can lead to premature temp parts removal before merge finishes in ReplicatedMergeTree. This issue could lead to errors like `No such file or directory: xxx`. Fixes [#43983](https://github.com/ClickHouse/ClickHouse/issues/43983). [#44383](https://github.com/ClickHouse/ClickHouse/pull/44383) ([alesapin](https://github.com/alesapin)).
* Some invalid `SYSTEM ... ON CLUSTER` queries worked in an unexpected way if a cluster name was not specified. It's fixed, now invalid queries throw `SYNTAX_ERROR` as they should. Fixes [#44264](https://github.com/ClickHouse/ClickHouse/issues/44264). [#44387](https://github.com/ClickHouse/ClickHouse/pull/44387) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix reading Map type in ORC format. [#44400](https://github.com/ClickHouse/ClickHouse/pull/44400) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix reading columns that are not present in the input data in Parquet/ORC formats. Previously it could lead to the error `INCORRECT_NUMBER_OF_COLUMNS`. Closes [#44333](https://github.com/ClickHouse/ClickHouse/issues/44333). [#44405](https://github.com/ClickHouse/ClickHouse/pull/44405) ([Kruglov Pavel](https://github.com/Avogar)).
@@ -137,49 +137,49 @@ Add settings input_format_tsv/csv/custom_detect_header that enables this behavio
* Placing profile settings after profile settings constraints in the configuration file made constraints ineffective. [#44411](https://github.com/ClickHouse/ClickHouse/pull/44411) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix `SYNTAX_ERROR` while running `EXPLAIN AST INSERT` queries with data. Closes [#44207](https://github.com/ClickHouse/ClickHouse/issues/44207). [#44413](https://github.com/ClickHouse/ClickHouse/pull/44413) ([save-my-heart](https://github.com/save-my-heart)).
* Fix reading bool value with CRLF in CSV format. Closes [#44401](https://github.com/ClickHouse/ClickHouse/issues/44401). [#44442](https://github.com/ClickHouse/ClickHouse/pull/44442) ([Kruglov Pavel](https://github.com/Avogar)).
-* Don't execute and/or/if/multiIf on LowCardinality dictionary, so the result type cannot be LowCardinality. It could lead to error `Illegal column ColumnLowCardinality` in some cases. Fixes [#43603](https://github.com/ClickHouse/ClickHouse/issues/43603). [#44469](https://github.com/ClickHouse/ClickHouse/pull/44469) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix mutations with setting `max_streams_for_merge_tree_reading`. [#44472](https://github.com/ClickHouse/ClickHouse/pull/44472) ([Anton Popov](https://github.com/CurtizJ)).
+* Don't execute and/or/if/multiIf on a LowCardinality dictionary, so the result type cannot be LowCardinality. It could lead to the error `Illegal column ColumnLowCardinality` in some cases. Fixes [#43603](https://github.com/ClickHouse/ClickHouse/issues/43603). [#44469](https://github.com/ClickHouse/ClickHouse/pull/44469) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix mutations with the setting `max_streams_for_merge_tree_reading`. [#44472](https://github.com/ClickHouse/ClickHouse/pull/44472) ([Anton Popov](https://github.com/CurtizJ)).
* Fix potential null pointer dereference with GROUPING SETS in ASTSelectQuery::formatImpl ([#43049](https://github.com/ClickHouse/ClickHouse/issues/43049)). [#44479](https://github.com/ClickHouse/ClickHouse/pull/44479) ([Robert Schulze](https://github.com/rschu1ze)).
* Validate types in table function arguments, CAST function arguments, JSONAsObject schema inference according to settings. [#44501](https://github.com/ClickHouse/ClickHouse/pull/44501) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix IN function with LowCardinality and const column, close [#44503](https://github.com/ClickHouse/ClickHouse/issues/44503). [#44506](https://github.com/ClickHouse/ClickHouse/pull/44506) ([Duc Canh Le](https://github.com/canhld94)).
-* Fixed a bug in normalization of a `DEFAULT` expression in `CREATE TABLE` statement. The second argument of function `in` (or the right argument of operator `IN`) might be replaced with the result of its evaluation during CREATE query execution. Fixes [#44496](https://github.com/ClickHouse/ClickHouse/issues/44496). [#44547](https://github.com/ClickHouse/ClickHouse/pull/44547) ([Alexander Tokmakov](https://github.com/tavplubix)).
+* Fixed a bug in the normalization of a `DEFAULT` expression in `CREATE TABLE` statement. The second argument of the function `in` (or the right argument of operator `IN`) might be replaced with the result of its evaluation during CREATE query execution. Fixes [#44496](https://github.com/ClickHouse/ClickHouse/issues/44496). [#44547](https://github.com/ClickHouse/ClickHouse/pull/44547) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Projections do not work in the presence of WITH ROLLUP, WITH CUBE and WITH TOTALS. In previous versions, a query produced an exception instead of skipping the usage of projections. This closes [#44614](https://github.com/ClickHouse/ClickHouse/issues/44614). This closes [#42772](https://github.com/ClickHouse/ClickHouse/issues/42772). [#44615](https://github.com/ClickHouse/ClickHouse/pull/44615) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Async blocks were not cleaned because the function `get all blocks sorted by time` didn't get async blocks. [#44651](https://github.com/ClickHouse/ClickHouse/pull/44651) ([Han Fei](https://github.com/hanfei1991)).
* Fix `LOGICAL_ERROR` `The top step of the right pipeline should be ExpressionStep` for JOIN with subquery, UNION, and TOTALS. Fixes [#43687](https://github.com/ClickHouse/ClickHouse/issues/43687). [#44673](https://github.com/ClickHouse/ClickHouse/pull/44673) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Avoid `std::out_of_range` exception in the Executable table engine. [#44681](https://github.com/ClickHouse/ClickHouse/pull/44681) ([Kruglov Pavel](https://github.com/Avogar)).
* Do not apply `optimize_syntax_fuse_functions` to quantiles on AST, close [#44712](https://github.com/ClickHouse/ClickHouse/issues/44712). [#44713](https://github.com/ClickHouse/ClickHouse/pull/44713) ([Vladimir C](https://github.com/vdimir)).
* Fix bug with wrong type in Merge table and PREWHERE, close [#43324](https://github.com/ClickHouse/ClickHouse/issues/43324). [#44716](https://github.com/ClickHouse/ClickHouse/pull/44716) ([Vladimir C](https://github.com/vdimir)).
-* Fix possible crash during shutdown (while destroying TraceCollector). Fixes [#44757](https://github.com/ClickHouse/ClickHouse/issues/44757). [#44758](https://github.com/ClickHouse/ClickHouse/pull/44758) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fix a possible crash in distributed query processing. The crash could happen if a query with totals or extremes returned an empty result and there are mismatched types in the Distrubuted and the local tables. Fixes [#44738](https://github.com/ClickHouse/ClickHouse/issues/44738). [#44760](https://github.com/ClickHouse/ClickHouse/pull/44760) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix a possible crash during shutdown (while destroying TraceCollector). Fixes [#44757](https://github.com/ClickHouse/ClickHouse/issues/44757). [#44758](https://github.com/ClickHouse/ClickHouse/pull/44758) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix a possible crash in distributed query processing. The crash could happen if a query with totals or extremes returned an empty result and there are mismatched types in the Distributed and the local tables. Fixes [#44738](https://github.com/ClickHouse/ClickHouse/issues/44738). [#44760](https://github.com/ClickHouse/ClickHouse/pull/44760) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix fsync for fetches (`min_compressed_bytes_to_fsync_after_fetch`)/small files (ttl.txt, columns.txt) in mutations (`min_rows_to_fsync_after_merge`/`min_compressed_bytes_to_fsync_after_merge`). [#44781](https://github.com/ClickHouse/ClickHouse/pull/44781) ([Azat Khuzhin](https://github.com/azat)).
* A rare race condition was possible when querying the `system.parts` or `system.parts_columns` tables in the presence of parts being moved between disks. Introduced in [#41145](https://github.com/ClickHouse/ClickHouse/issues/41145). [#44809](https://github.com/ClickHouse/ClickHouse/pull/44809) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix the error `Context has expired` which could appear with enabled projections optimization. Can be reproduced for queries with specific functions, like `dictHas/dictGet` which use context in runtime. Fixes [#44844](https://github.com/ClickHouse/ClickHouse/issues/44844). [#44850](https://github.com/ClickHouse/ClickHouse/pull/44850) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* A fix for `Cannot read all data` error which could happen while reading `LowCardinality` dictionary from remote fs. Fixes [#44709](https://github.com/ClickHouse/ClickHouse/issues/44709). [#44875](https://github.com/ClickHouse/ClickHouse/pull/44875) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Ignore cases when hardware monitor sensors cannot be read instead of showing a full exception message in logs. [#44895](https://github.com/ClickHouse/ClickHouse/pull/44895) ([Raúl Marín](https://github.com/Algunenano)).
-* Use `max_delay_to_insert` value in case calculated time to delay INSERT exceeds the setting value. Related to [#44902](https://github.com/ClickHouse/ClickHouse/issues/44902). [#44916](https://github.com/ClickHouse/ClickHouse/pull/44916) ([Igor Nikonov](https://github.com/devcrafter)).
+* Use `max_delay_to_insert` value in case the calculated time to delay INSERT exceeds the setting value. Related to [#44902](https://github.com/ClickHouse/ClickHouse/issues/44902). [#44916](https://github.com/ClickHouse/ClickHouse/pull/44916) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix error `Different order of columns in UNION subquery` for queries with `UNION`. Fixes [#44866](https://github.com/ClickHouse/ClickHouse/issues/44866). [#44920](https://github.com/ClickHouse/ClickHouse/pull/44920) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* The delay for INSERT could be calculated incorrectly, which could lead to always using the `max_delay_to_insert` setting as the delay instead of the correct value. Now the simple formula `max_delay_to_insert * (parts_over_threshold/max_allowed_parts_over_threshold)` is used, i.e. the delay grows proportionally to the number of parts over the threshold (see the worked example below). Closes [#44902](https://github.com/ClickHouse/ClickHouse/issues/44902). [#44954](https://github.com/ClickHouse/ClickHouse/pull/44954) ([Igor Nikonov](https://github.com/devcrafter)).
-* fix alter table ttl error when wide part has light weight delete mask. [#44959](https://github.com/ClickHouse/ClickHouse/pull/44959) ([Mingliang Pan](https://github.com/liangliangpan)).
+* Fix alter table TTL error when a wide part has the lightweight delete mask. [#44959](https://github.com/ClickHouse/ClickHouse/pull/44959) ([Mingliang Pan](https://github.com/liangliangpan)).
* Follow-up fix for Replace domain IP types (IPv4, IPv6) with native [#43221](https://github.com/ClickHouse/ClickHouse/issues/43221). [#45024](https://github.com/ClickHouse/ClickHouse/pull/45024) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Follow-up fix for Replace domain IP types (IPv4, IPv6) with native [#43221](https://github.com/ClickHouse/ClickHouse/pull/43221). [#45043](https://github.com/ClickHouse/ClickHouse/pull/45043) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* A buffer overflow was possible in the parser. Found by fuzzer. [#45047](https://github.com/ClickHouse/ClickHouse/pull/45047) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix possible cannot-read-all-data error in storage FileLog. Closes [#45051](https://github.com/ClickHouse/ClickHouse/issues/45051), [#38257](https://github.com/ClickHouse/ClickHouse/issues/38257). [#45057](https://github.com/ClickHouse/ClickHouse/pull/45057) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Memory efficient aggregation (setting `distributed_aggregation_memory_efficient`) is disabled when grouping sets are present in the query. [#45058](https://github.com/ClickHouse/ClickHouse/pull/45058) ([Nikita Taranov](https://github.com/nickitat)).
-* Fix `RANGE_HASHED` dictionary to count range columns as part of primary key during updates when `update_field` is specified. Closes [#44588](https://github.com/ClickHouse/ClickHouse/issues/44588). [#45061](https://github.com/ClickHouse/ClickHouse/pull/45061) ([Maksim Kita](https://github.com/kitaisreal)).
-* Fix error `Cannot capture column` for `LowCardinality` captured argument of nested labmda. Fixes [#45028](https://github.com/ClickHouse/ClickHouse/issues/45028). [#45065](https://github.com/ClickHouse/ClickHouse/pull/45065) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fix the wrong query result of `additional_table_filters` (additional filter was not applied) in case if minmax/count projection is used. [#45133](https://github.com/ClickHouse/ClickHouse/pull/45133) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix `RANGE_HASHED` dictionary to count range columns as part of the primary key during updates when `update_field` is specified. Closes [#44588](https://github.com/ClickHouse/ClickHouse/issues/44588). [#45061](https://github.com/ClickHouse/ClickHouse/pull/45061) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix error `Cannot capture column` for `LowCardinality` captured argument of nested lambda. Fixes [#45028](https://github.com/ClickHouse/ClickHouse/issues/45028). [#45065](https://github.com/ClickHouse/ClickHouse/pull/45065) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix the wrong query result of `additional_table_filters` (additional filter was not applied) in case the minmax/count projection is used. [#45133](https://github.com/ClickHouse/ClickHouse/pull/45133) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed bug in `histogram` function accepting negative values. [#45147](https://github.com/ClickHouse/ClickHouse/pull/45147) ([simpleton](https://github.com/rgzntrade)).
* Fix wrong column nullability in StorageJoin, close [#44940](https://github.com/ClickHouse/ClickHouse/issues/44940). [#45184](https://github.com/ClickHouse/ClickHouse/pull/45184) ([Vladimir C](https://github.com/vdimir)).
* Fix `background_fetches_pool_size` settings reload (increase at runtime). [#45189](https://github.com/ClickHouse/ClickHouse/pull/45189) ([Raúl Marín](https://github.com/Algunenano)).
* Correctly process `SELECT` queries on KV engines (e.g. KeeperMap, EmbeddedRocksDB) using `IN` on the key with a subquery producing a different type. [#45215](https://github.com/ClickHouse/ClickHouse/pull/45215) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix logical error in SEMI JOIN & join_use_nulls in some cases, close [#45163](https://github.com/ClickHouse/ClickHouse/issues/45163), close [#45209](https://github.com/ClickHouse/ClickHouse/issues/45209). [#45230](https://github.com/ClickHouse/ClickHouse/pull/45230) ([Vladimir C](https://github.com/vdimir)).
* Fix heap-use-after-free in reading from s3. [#45253](https://github.com/ClickHouse/ClickHouse/pull/45253) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix bug when the Avro Union type is ['null', Nested type], closes [#45275](https://github.com/ClickHouse/ClickHouse/issues/45275). Fix bug that incorrectly infer `bytes` type to `Float`. [#45276](https://github.com/ClickHouse/ClickHouse/pull/45276) ([flynn](https://github.com/ucasfl)).
-* Throw a correct exception when explicit PREWHERE cannot be used with table using storage engine `Merge`. [#45319](https://github.com/ClickHouse/ClickHouse/pull/45319) ([Antonio Andelic](https://github.com/antonio2368)).
-* Under WSL1 Ubuntu self-extracting clickhouse fails to decompress due to inconsistency - /proc/self/maps reporting 32bit file's inode, while stat reporting 64bit inode. [#45339](https://github.com/ClickHouse/ClickHouse/pull/45339) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fix bug when the Avro Union type is ['null', Nested type], closes [#45275](https://github.com/ClickHouse/ClickHouse/issues/45275). Fix bug that incorrectly infers `bytes` type to `Float`. [#45276](https://github.com/ClickHouse/ClickHouse/pull/45276) ([flynn](https://github.com/ucasfl)).
+* Throw a correct exception when explicit PREWHERE cannot be used with a table using the storage engine `Merge`. [#45319](https://github.com/ClickHouse/ClickHouse/pull/45319) ([Antonio Andelic](https://github.com/antonio2368)).
+* Under WSL1 Ubuntu, self-extracting ClickHouse failed to decompress due to an inconsistency: /proc/self/maps reported a 32-bit file inode, while stat reported a 64-bit one. [#45339](https://github.com/ClickHouse/ClickHouse/pull/45339) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix race in Distributed table startup (that could lead to processing file of async INSERT multiple times). [#45360](https://github.com/ClickHouse/ClickHouse/pull/45360) ([Azat Khuzhin](https://github.com/azat)).
-* Fix possible crash while reading from storage `S3` and table function `s3` in case when `ListObject` request has failed. [#45371](https://github.com/ClickHouse/ClickHouse/pull/45371) ([Anton Popov](https://github.com/CurtizJ)).
-* Fix `SELECT ... FROM system.dictionaries` exception when there is a dictionary with a bad structure (e.g. incorrect type in xml config). [#45399](https://github.com/ClickHouse/ClickHouse/pull/45399) ([Aleksei Filatov](https://github.com/aalexfvk)).
+* Fix a possible crash while reading from storage `S3` and table function `s3` in the case when `ListObject` request has failed. [#45371](https://github.com/ClickHouse/ClickHouse/pull/45371) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix `SELECT ... FROM system.dictionaries` exception when there is a dictionary with a bad structure (e.g. incorrect type in XML config). [#45399](https://github.com/ClickHouse/ClickHouse/pull/45399) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Fix s3Cluster schema inference when structure from insertion table is used in `INSERT INTO ... SELECT * FROM s3Cluster` queries. [#45422](https://github.com/ClickHouse/ClickHouse/pull/45422) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix bug in JSON/BSONEachRow parsing with HTTP that could lead to using default values for some columns instead of values from data. [#45424](https://github.com/ClickHouse/ClickHouse/pull/45424) ([Kruglov Pavel](https://github.com/Avogar)).
* Fixed bug (Code: 632. DB::Exception: Unexpected data ... after parsed IPv6 value ...) with typed parsing of IP types from text source. [#45425](https://github.com/ClickHouse/ClickHouse/pull/45425) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
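A brief worked instance of the INSERT-delay formula from the fix above; the numbers are made up purely for illustration:

```sql
-- delay = max_delay_to_insert * (parts_over_threshold / max_allowed_parts_over_threshold)
-- With max_delay_to_insert = 1 second, 150 parts over the threshold, and a limit of 300:
SELECT 1 * (150 / 300) AS delay_seconds;  -- 0.5
```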
@@ -187,7 +187,7 @@ Add settings input_format_tsv/csv/custom_detect_header that enables this behavio
* Fix possible (likely distributed) query hung. [#45448](https://github.com/ClickHouse/ClickHouse/pull/45448) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible deadlock with `allow_asynchronous_read_from_io_pool_for_merge_tree` enabled in case of exception from `ThreadPool::schedule`. [#45481](https://github.com/ClickHouse/ClickHouse/pull/45481) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix possible in-use table after DETACH. [#45493](https://github.com/ClickHouse/ClickHouse/pull/45493) ([Azat Khuzhin](https://github.com/azat)).
-* Fix rare abort in case when query is canceled and parallel parsing was used during its execution. [#45498](https://github.com/ClickHouse/ClickHouse/pull/45498) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix rare abort in the case when a query is canceled and parallel parsing was used during its execution. [#45498](https://github.com/ClickHouse/ClickHouse/pull/45498) ([Anton Popov](https://github.com/CurtizJ)).
* Fix a race between Distributed table creation and INSERT into it (could lead to CANNOT_LINK during INSERT into the table). [#45502](https://github.com/ClickHouse/ClickHouse/pull/45502) ([Azat Khuzhin](https://github.com/azat)).
* Add proper default (SLRU) to cache policy getter. Closes [#45514](https://github.com/ClickHouse/ClickHouse/issues/45514). [#45524](https://github.com/ClickHouse/ClickHouse/pull/45524) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Disallow array join in mutations. Closes [#42637](https://github.com/ClickHouse/ClickHouse/issues/42637). [#44447](https://github.com/ClickHouse/ClickHouse/pull/44447) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
diff --git a/contrib/krb5 b/contrib/krb5
index b89e20367b0..f8262a1b548 160000
--- a/contrib/krb5
+++ b/contrib/krb5
@@ -1 +1 @@
-Subproject commit b89e20367b074bd02dd118a6534099b21e88b3c3
+Subproject commit f8262a1b548eb29d97e059260042036255d07f8d
diff --git a/contrib/krb5-cmake/CMakeLists.txt b/contrib/krb5-cmake/CMakeLists.txt
index 7e184d424aa..ceaa270ad85 100644
--- a/contrib/krb5-cmake/CMakeLists.txt
+++ b/contrib/krb5-cmake/CMakeLists.txt
@@ -15,6 +15,10 @@ if(NOT AWK_PROGRAM)
message(FATAL_ERROR "You need the awk program to build ClickHouse with krb5 enabled.")
endif()
+if (NOT (ENABLE_OPENSSL OR ENABLE_OPENSSL_DYNAMIC))
+ add_compile_definitions(USE_BORINGSSL=1)
+endif ()
+
set(KRB5_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/krb5/src")
set(KRB5_ET_BIN_DIR "${CMAKE_CURRENT_BINARY_DIR}/include_private")
@@ -578,12 +582,6 @@ if(CMAKE_SYSTEM_NAME MATCHES "Darwin")
list(APPEND ALL_SRCS "${CMAKE_CURRENT_BINARY_DIR}/include_private/kcmrpc.c")
endif()
-if (ENABLE_OPENSSL OR ENABLE_OPENSSL_DYNAMIC)
- list(REMOVE_ITEM ALL_SRCS "${KRB5_SOURCE_DIR}/lib/crypto/openssl/enc_provider/aes.c")
- list(APPEND ALL_SRCS "${CMAKE_CURRENT_SOURCE_DIR}/aes.c")
-endif ()
-
-
target_sources(_krb5 PRIVATE
${ALL_SRCS}
)
diff --git a/contrib/krb5-cmake/aes.c b/contrib/krb5-cmake/aes.c
deleted file mode 100644
index c0c8c728bff..00000000000
--- a/contrib/krb5-cmake/aes.c
+++ /dev/null
@@ -1,302 +0,0 @@
-/* -*- mode: c; c-basic-offset: 4; indent-tabs-mode: nil -*- */
-/* lib/crypto/openssl/enc_provider/aes.c */
-/*
- * Copyright (C) 2003, 2007, 2008, 2009 by the Massachusetts Institute of Technology.
- * All rights reserved.
- *
- * Export of this software from the United States of America may
- * require a specific license from the United States Government.
- * It is the responsibility of any person or organization contemplating
- * export to obtain such a license before exporting.
- *
- * WITHIN THAT CONSTRAINT, permission to use, copy, modify, and
- * distribute this software and its documentation for any purpose and
- * without fee is hereby granted, provided that the above copyright
- * notice appear in all copies and that both that copyright notice and
- * this permission notice appear in supporting documentation, and that
- * the name of M.I.T. not be used in advertising or publicity pertaining
- * to distribution of the software without specific, written prior
- * permission. Furthermore if you modify this software you must label
- * your software as modified software and not distribute it in such a
- * fashion that it might be confused with the original M.I.T. software.
- * M.I.T. makes no representations about the suitability of
- * this software for any purpose. It is provided "as is" without express
- * or implied warranty.
- */
-
-#include "crypto_int.h"
-#include
-#include
-
-/* proto's */
-static krb5_error_code
-cbc_enc(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data);
-static krb5_error_code
-cbc_decr(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data);
-static krb5_error_code
-cts_encr(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data, size_t dlen);
-static krb5_error_code
-cts_decr(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data, size_t dlen);
-
-#define BLOCK_SIZE 16
-#define NUM_BITS 8
-#define IV_CTS_BUF_SIZE 16 /* 16 - hardcoded in CRYPTO_cts128_en/decrypt */
-
-static const EVP_CIPHER *
-map_mode(unsigned int len)
-{
- if (len==16)
- return EVP_aes_128_cbc();
- if (len==32)
- return EVP_aes_256_cbc();
- else
- return NULL;
-}
-
-/* Encrypt one block using CBC. */
-static krb5_error_code
-cbc_enc(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data)
-{
- int ret, olen = BLOCK_SIZE;
- unsigned char iblock[BLOCK_SIZE], oblock[BLOCK_SIZE];
- EVP_CIPHER_CTX *ctx;
- struct iov_cursor cursor;
-
- ctx = EVP_CIPHER_CTX_new();
- if (ctx == NULL)
- return ENOMEM;
-
- ret = EVP_EncryptInit_ex(ctx, map_mode(key->keyblock.length),
- NULL, key->keyblock.contents, (ivec) ? (unsigned char*)ivec->data : NULL);
- if (ret == 0) {
- EVP_CIPHER_CTX_free(ctx);
- return KRB5_CRYPTO_INTERNAL;
- }
-
- k5_iov_cursor_init(&cursor, data, num_data, BLOCK_SIZE, FALSE);
- k5_iov_cursor_get(&cursor, iblock);
- EVP_CIPHER_CTX_set_padding(ctx,0);
- ret = EVP_EncryptUpdate(ctx, oblock, &olen, iblock, BLOCK_SIZE);
- if (ret == 1)
- k5_iov_cursor_put(&cursor, oblock);
- EVP_CIPHER_CTX_free(ctx);
-
- zap(iblock, BLOCK_SIZE);
- zap(oblock, BLOCK_SIZE);
- return (ret == 1) ? 0 : KRB5_CRYPTO_INTERNAL;
-}
-
-/* Decrypt one block using CBC. */
-static krb5_error_code
-cbc_decr(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data)
-{
- int ret = 0, olen = BLOCK_SIZE;
- unsigned char iblock[BLOCK_SIZE], oblock[BLOCK_SIZE];
- EVP_CIPHER_CTX *ctx;
- struct iov_cursor cursor;
-
- ctx = EVP_CIPHER_CTX_new();
- if (ctx == NULL)
- return ENOMEM;
-
- ret = EVP_DecryptInit_ex(ctx, map_mode(key->keyblock.length),
- NULL, key->keyblock.contents, (ivec) ? (unsigned char*)ivec->data : NULL);
- if (ret == 0) {
- EVP_CIPHER_CTX_free(ctx);
- return KRB5_CRYPTO_INTERNAL;
- }
-
- k5_iov_cursor_init(&cursor, data, num_data, BLOCK_SIZE, FALSE);
- k5_iov_cursor_get(&cursor, iblock);
- EVP_CIPHER_CTX_set_padding(ctx,0);
- ret = EVP_DecryptUpdate(ctx, oblock, &olen, iblock, BLOCK_SIZE);
- if (ret == 1)
- k5_iov_cursor_put(&cursor, oblock);
- EVP_CIPHER_CTX_free(ctx);
-
- zap(iblock, BLOCK_SIZE);
- zap(oblock, BLOCK_SIZE);
- return (ret == 1) ? 0 : KRB5_CRYPTO_INTERNAL;
-}
-
-static krb5_error_code
-cts_encr(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data, size_t dlen)
-{
- int ret = 0;
- size_t size = 0;
- unsigned char *oblock = NULL, *dbuf = NULL;
- unsigned char iv_cts[IV_CTS_BUF_SIZE];
- struct iov_cursor cursor;
- AES_KEY enck;
-
- memset(iv_cts,0,sizeof(iv_cts));
- if (ivec && ivec->data){
- if (ivec->length != sizeof(iv_cts))
- return KRB5_CRYPTO_INTERNAL;
- memcpy(iv_cts, ivec->data,ivec->length);
- }
-
- oblock = OPENSSL_malloc(dlen);
- if (!oblock){
- return ENOMEM;
- }
- dbuf = OPENSSL_malloc(dlen);
- if (!dbuf){
- OPENSSL_free(oblock);
- return ENOMEM;
- }
-
- k5_iov_cursor_init(&cursor, data, num_data, dlen, FALSE);
- k5_iov_cursor_get(&cursor, dbuf);
-
- AES_set_encrypt_key(key->keyblock.contents,
- NUM_BITS * key->keyblock.length, &enck);
-
- size = CRYPTO_cts128_encrypt((unsigned char *)dbuf, oblock, dlen, &enck,
- iv_cts, AES_cbc_encrypt);
- if (size <= 0)
- ret = KRB5_CRYPTO_INTERNAL;
- else
- k5_iov_cursor_put(&cursor, oblock);
-
- if (!ret && ivec && ivec->data)
- memcpy(ivec->data, iv_cts, sizeof(iv_cts));
-
- zap(oblock, dlen);
- zap(dbuf, dlen);
- OPENSSL_free(oblock);
- OPENSSL_free(dbuf);
-
- return ret;
-}
-
-static krb5_error_code
-cts_decr(krb5_key key, const krb5_data *ivec, krb5_crypto_iov *data,
- size_t num_data, size_t dlen)
-{
- int ret = 0;
- size_t size = 0;
- unsigned char *oblock = NULL;
- unsigned char *dbuf = NULL;
- unsigned char iv_cts[IV_CTS_BUF_SIZE];
- struct iov_cursor cursor;
- AES_KEY deck;
-
- memset(iv_cts,0,sizeof(iv_cts));
- if (ivec && ivec->data){
- if (ivec->length != sizeof(iv_cts))
- return KRB5_CRYPTO_INTERNAL;
- memcpy(iv_cts, ivec->data,ivec->length);
- }
-
- oblock = OPENSSL_malloc(dlen);
- if (!oblock)
- return ENOMEM;
- dbuf = OPENSSL_malloc(dlen);
- if (!dbuf){
- OPENSSL_free(oblock);
- return ENOMEM;
- }
-
- AES_set_decrypt_key(key->keyblock.contents,
- NUM_BITS * key->keyblock.length, &deck);
-
- k5_iov_cursor_init(&cursor, data, num_data, dlen, FALSE);
- k5_iov_cursor_get(&cursor, dbuf);
-
- size = CRYPTO_cts128_decrypt((unsigned char *)dbuf, oblock,
- dlen, &deck,
- iv_cts, AES_cbc_encrypt);
- if (size <= 0)
- ret = KRB5_CRYPTO_INTERNAL;
- else
- k5_iov_cursor_put(&cursor, oblock);
-
- if (!ret && ivec && ivec->data)
- memcpy(ivec->data, iv_cts, sizeof(iv_cts));
-
- zap(oblock, dlen);
- zap(dbuf, dlen);
- OPENSSL_free(oblock);
- OPENSSL_free(dbuf);
-
- return ret;
-}
-
-krb5_error_code
-krb5int_aes_encrypt(krb5_key key, const krb5_data *ivec,
- krb5_crypto_iov *data, size_t num_data)
-{
- int ret = 0;
- size_t input_length, nblocks;
-
- input_length = iov_total_length(data, num_data, FALSE);
- nblocks = (input_length + BLOCK_SIZE - 1) / BLOCK_SIZE;
- if (nblocks == 1) {
- if (input_length != BLOCK_SIZE)
- return KRB5_BAD_MSIZE;
- ret = cbc_enc(key, ivec, data, num_data);
- } else if (nblocks > 1) {
- ret = cts_encr(key, ivec, data, num_data, input_length);
- }
-
- return ret;
-}
-
-krb5_error_code
-krb5int_aes_decrypt(krb5_key key, const krb5_data *ivec,
- krb5_crypto_iov *data, size_t num_data)
-{
- int ret = 0;
- size_t input_length, nblocks;
-
- input_length = iov_total_length(data, num_data, FALSE);
- nblocks = (input_length + BLOCK_SIZE - 1) / BLOCK_SIZE;
- if (nblocks == 1) {
- if (input_length != BLOCK_SIZE)
- return KRB5_BAD_MSIZE;
- ret = cbc_decr(key, ivec, data, num_data);
- } else if (nblocks > 1) {
- ret = cts_decr(key, ivec, data, num_data, input_length);
- }
-
- return ret;
-}
-
-static krb5_error_code
-krb5int_aes_init_state (const krb5_keyblock *key, krb5_keyusage usage,
- krb5_data *state)
-{
- state->length = 16;
- state->data = (void *) malloc(16);
- if (state->data == NULL)
- return ENOMEM;
- memset(state->data, 0, state->length);
- return 0;
-}
-const struct krb5_enc_provider krb5int_enc_aes128 = {
- 16,
- 16, 16,
- krb5int_aes_encrypt,
- krb5int_aes_decrypt,
- NULL,
- krb5int_aes_init_state,
- krb5int_default_free_state
-};
-
-const struct krb5_enc_provider krb5int_enc_aes256 = {
- 16,
- 32, 32,
- krb5int_aes_encrypt,
- krb5int_aes_decrypt,
- NULL,
- krb5int_aes_init_state,
- krb5int_default_free_state
-};
diff --git a/contrib/openssl-cmake/CMakeLists.txt b/contrib/openssl-cmake/CMakeLists.txt
index dff5dff0936..92739ff3608 100644
--- a/contrib/openssl-cmake/CMakeLists.txt
+++ b/contrib/openssl-cmake/CMakeLists.txt
@@ -1,3 +1,9 @@
+# Note: ClickHouse uses BoringSSL. The presence of OpenSSL is only due to IBM's port of ClickHouse to s390x. BoringSSL does not support
+# s390x. Also, FIPS validation provided by the OS vendor (Red Hat, Ubuntu) requires (preferably dynamic) linking with OS packages, which
+# ClickHouse generally avoids.
+#
+# Furthermore, the in-source OpenSSL dump in this directory is for development purposes only and is not FIPS-compliant.
+
if(ENABLE_OPENSSL_DYNAMIC OR ENABLE_OPENSSL)
set(ENABLE_SSL 1 CACHE INTERNAL "")
set(OPENSSL_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/openssl)
diff --git a/docs/en/engines/table-engines/mergetree-family/invertedindexes.md b/docs/en/engines/table-engines/mergetree-family/invertedindexes.md
index 4d7a0050c76..2899476b847 100644
--- a/docs/en/engines/table-engines/mergetree-family/invertedindexes.md
+++ b/docs/en/engines/table-engines/mergetree-family/invertedindexes.md
@@ -49,20 +49,20 @@ where `N` specifies the tokenizer:
Being a type of skipping index, inverted indexes can be dropped or added to a column after table creation:
``` sql
-ALTER TABLE tbl DROP INDEX inv_idx;
-ALTER TABLE tbl ADD INDEX inv_idx(s) TYPE inverted(2) GRANULARITY 1;
+ALTER TABLE tab DROP INDEX inv_idx;
+ALTER TABLE tab ADD INDEX inv_idx(s) TYPE inverted(2) GRANULARITY 1;
```
To use the index, no special functions or syntax are required. Typical string search predicates automatically leverage the index. As
examples, consider:
```sql
-SELECT * from tab WHERE s == 'Hello World;
-SELECT * from tab WHERE s IN (‘Hello’, ‘World’);
-SELECT * from tab WHERE s LIKE ‘%Hello%’;
-SELECT * from tab WHERE multiSearchAny(s, ‘Hello’, ‘World’);
-SELECT * from tab WHERE hasToken(s, ‘Hello’);
-SELECT * from tab WHERE multiSearchAll(s, [‘Hello’, ‘World’]);
+INSERT INTO tab(key, str) values (1, 'Hello World');
+SELECT * from tab WHERE str == 'Hello World';
+SELECT * from tab WHERE str IN ('Hello', 'World');
+SELECT * from tab WHERE str LIKE '%Hello%';
+SELECT * from tab WHERE multiSearchAny(str, ['Hello', 'World']);
+SELECT * from tab WHERE hasToken(str, 'Hello');
```
The inverted index also works on columns of type `Array(String)`, `Array(FixedString)`, `Map(String)` and `Map(FixedString)`.
diff --git a/docs/en/engines/table-engines/special/generate.md b/docs/en/engines/table-engines/special/generate.md
index 32fa2cd9b2b..77d90082ddc 100644
--- a/docs/en/engines/table-engines/special/generate.md
+++ b/docs/en/engines/table-engines/special/generate.md
@@ -19,7 +19,7 @@ ENGINE = GenerateRandom([random_seed] [,max_string_length] [,max_array_length])
```
The `max_array_length` and `max_string_length` parameters specify maximum length of all
-array columns and strings correspondingly in generated data.
+array or map columns and strings, respectively, in the generated data.
Generate table engine supports only `SELECT` queries.
diff --git a/docs/en/operations/_troubleshooting.md b/docs/en/operations/_troubleshooting.md
index aed63ec4d0f..a5c07ed18bd 100644
--- a/docs/en/operations/_troubleshooting.md
+++ b/docs/en/operations/_troubleshooting.md
@@ -56,6 +56,19 @@ sudo apt-get clean
sudo apt-get autoclean
```
+### You Can't Get Packages With Yum Because Of Wrong Signature
+
+Possible issue: the cache is corrupted, perhaps because the GPG key was updated in September 2022.
+
+The solution is to clean out the yum cache and lib directories:
+
+```
+sudo find /var/lib/yum/repos/ /var/cache/yum/ -name 'clickhouse-*' -type d -exec rm -rf {} +
+sudo rm -f /etc/yum.repos.d/clickhouse.repo
+```
+
+After that, follow the [install guide](../getting-started/install.md#from-rpm-packages).
+
## Connecting to the Server {#troubleshooting-accepts-no-connections}
Possible issues:
diff --git a/docs/en/operations/backup.md b/docs/en/operations/backup.md
index 4feb434d762..f1a5649cd4c 100644
--- a/docs/en/operations/backup.md
+++ b/docs/en/operations/backup.md
@@ -79,7 +79,7 @@ The BACKUP and RESTORE statements take a list of DATABASE and TABLE names, a des
- ASYNC: backup or restore asynchronously
- PARTITIONS: a list of partitions to restore
- SETTINGS:
- - [`compression_method`](en/sql-reference/statements/create/table/#column-compression-codecs) and compression_level
+ - [`compression_method`](/docs/en/sql-reference/statements/create/table.md/#column-compression-codecs) and compression_level
- `password` for the file on disk
- `base_backup`: the destination of the previous backup of this source. For example, `Disk('backups', '1.zip')`
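+
+For illustration, a follow-up backup can reference the previous one via `base_backup`; this is a sketch, and the table and archive names are placeholders rather than values from this document:
+
+```sql
+BACKUP TABLE test.table TO Disk('backups', 'incremental-1.zip')
+    SETTINGS base_backup = Disk('backups', '1.zip')
+```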
diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md
index b8376f3449e..b5b75b6ba03 100644
--- a/docs/en/operations/settings/settings.md
+++ b/docs/en/operations/settings/settings.md
@@ -1632,6 +1632,49 @@ SELECT * FROM test_table
└───┘
```
+## insert_keeper_max_retries
+
+This setting sets the maximum number of retries for ClickHouse Keeper (or ZooKeeper) requests during an INSERT into a replicated MergeTree table. Only Keeper requests that failed due to a network error, a Keeper session timeout, or a request timeout are considered for retries.
+
+Possible values:
+
+- Positive integer.
+- 0 — Retries are disabled
+
+Default value: 0
+
+Keeper request retries are performed after a timeout. The timeout is controlled by the following settings: `insert_keeper_retry_initial_backoff_ms` and `insert_keeper_retry_max_backoff_ms`.
+The first retry is done after the `insert_keeper_retry_initial_backoff_ms` timeout. Subsequent timeouts are calculated as follows:
+```
+timeout = min(insert_keeper_retry_max_backoff_ms, latest_timeout * 2)
+```
+
+For example, if `insert_keeper_retry_initial_backoff_ms=100`, `insert_keeper_retry_max_backoff_ms=10000` and `insert_keeper_max_retries=8` then timeouts will be `100, 200, 400, 800, 1600, 3200, 6400, 10000`.
+
+Apart from fault tolerance, the retries aim to provide a better user experience: they help avoid returning an error during INSERT execution if Keeper is restarted, for example, due to an upgrade.
+
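+As a minimal illustration (the table name and values are placeholders), the retries can be enabled for a session before running an INSERT:
+
+```sql
+SET insert_keeper_max_retries = 20;
+INSERT INTO replicated_table VALUES (1, 'a');
+```
+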
+## insert_keeper_retry_initial_backoff_ms {#insert_keeper_retry_initial_backoff_ms}
+
+Initial timeout (in milliseconds) to retry a failed Keeper request during INSERT query execution
+
+Possible values:
+
+- Positive integer.
+- 0 — No timeout
+
+Default value: 100
+
+## insert_keeper_retry_max_backoff_ms {#insert_keeper_retry_max_backoff_ms}
+
+Maximum timeout (in milliseconds) to retry a failed Keeper request during INSERT query execution
+
+Possible values:
+
+- Positive integer.
+- 0 — Maximum timeout is not limited
+
+Default value: 10000
+
## max_network_bytes {#settings-max-network-bytes}
Limits the data volume (in bytes) that is received or transmitted over the network when executing a query. This setting applies to every individual query.
diff --git a/docs/en/sql-reference/functions/hash-functions.md b/docs/en/sql-reference/functions/hash-functions.md
index 730b494fcb5..ae6cdb7052d 100644
--- a/docs/en/sql-reference/functions/hash-functions.md
+++ b/docs/en/sql-reference/functions/hash-functions.md
@@ -45,37 +45,38 @@ SELECT halfMD5(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')
Calculates the MD4 from a string and returns the resulting set of bytes as FixedString(16).
-## MD5
+## MD5 {#hash_functions-md5}
Calculates the MD5 from a string and returns the resulting set of bytes as FixedString(16).
If you do not need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the ‘sipHash128’ function instead.
If you want to get the same result as output by the md5sum utility, use lower(hex(MD5(s))).
-## sipHash64
+## sipHash64 {#hash_functions-siphash64}
-Produces a 64-bit [SipHash](https://131002.net/siphash/) hash value.
+Produces a 64-bit [SipHash](https://en.wikipedia.org/wiki/SipHash) hash value.
```sql
sipHash64(par1,...)
```
-This is a cryptographic hash function. It works at least three times faster than the [MD5](#hash_functions-md5) function.
+This is a cryptographic hash function. It works at least three times faster than the [MD5](#hash_functions-md5) hash function.
-Function [interprets](/docs/en/sql-reference/functions/type-conversion-functions.md/#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the hash value for each of them. Then combines hashes by the following algorithm:
+The function [interprets](/docs/en/sql-reference/functions/type-conversion-functions.md/#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the hash value for each of them. It then combines the hashes by the following algorithm:
-1. After hashing all the input parameters, the function gets the array of hashes.
-2. Function takes the first and the second elements and calculates a hash for the array of them.
-3. Then the function takes the hash value, calculated at the previous step, and the third element of the initial hash array, and calculates a hash for the array of them.
-4. The previous step is repeated for all the remaining elements of the initial hash array.
+1. The first and the second hash values are concatenated into an array, which is hashed.
+2. The previously calculated hash value and the hash of the third input parameter are hashed in a similar way.
+3. This calculation is repeated for all remaining hash values of the original input.
**Arguments**
-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+The function takes a variable number of input parameters of any of the [supported data types](/docs/en/sql-reference/data-types/index.md).
**Returned Value**
A [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.
+Note that the calculated hash values may be equal for the same input values of different argument types. This affects, for example, integer types of different sizes, named and unnamed `Tuple` with the same data, and `Map` and the corresponding `Array(Tuple(key, value))` type with the same data.
+
**Example**
```sql
@@ -84,13 +85,45 @@ SELECT sipHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00
```response
┌──────────────SipHash─┬─type───┐
-│ 13726873534472839665 │ UInt64 │
+│ 11400366955626497465 │ UInt64 │
└──────────────────────┴────────┘
```
+## sipHash64Keyed
+
+Same as [sipHash64](#hash_functions-siphash64) but additionally takes an explicit key argument instead of using a fixed key.
+
+**Syntax**
+
+```sql
+sipHash64Keyed((k0, k1), par1,...)
+```
+
+**Arguments**
+
+Same as [sipHash64](#hash_functions-siphash64), but the first argument is a tuple of two UInt64 values representing the key.
+
+**Returned value**
+
+A [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.
+
+**Example**
+
+Query:
+
+```sql
+SELECT sipHash64Keyed((506097522914230528, 1084818905618843912), array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS SipHash, toTypeName(SipHash) AS type;
+```
+
+```response
+┌─────────────SipHash─┬─type───┐
+│ 8017656310194184311 │ UInt64 │
+└─────────────────────┴────────┘
+```
+
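+Since the key must be a tuple of two UInt64 values, small literals are cast explicitly below. Based on the implementation in this release, where the un-keyed variant uses a zero key internally, the keyed function with key `(0, 0)` should reproduce plain `sipHash64`; a quick sanity check:
+
+```sql
+SELECT sipHash64Keyed((toUInt64(0), toUInt64(0)), 'ClickHouse') = sipHash64('ClickHouse');
+```
+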
## sipHash128
-Produces a 128-bit [SipHash](https://131002.net/siphash/) hash value. Differs from [sipHash64](#hash_functions-siphash64) in that the final xor-folding state is done up to 128 bits.
+Like [sipHash64](#hash_functions-siphash64) but produces a 128-bit hash value, i.e. the final xor-folding of the internal state is done up to 128 bits.
**Syntax**
@@ -100,13 +133,11 @@ sipHash128(par1,...)
**Arguments**
-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+Same as for [sipHash64](#hash_functions-siphash64).
**Returned value**
-A 128-bit `SipHash` hash value.
-
-Type: [FixedString(16)](/docs/en/sql-reference/data-types/fixedstring.md).
+A 128-bit `SipHash` hash value of type [FixedString(16)](/docs/en/sql-reference/data-types/fixedstring.md).
**Example**
@@ -124,6 +155,40 @@ Result:
└──────────────────────────────────┘
```
+## sipHash128Keyed
+
+Same as [sipHash128](#hash_functions-siphash128) but additionally takes an explicit key argument instead of using a fixed key.
+
+**Syntax**
+
+```sql
+sipHash128Keyed((k0, k1), par1,...)
+```
+
+**Arguments**
+
+Same as [sipHash128](#hash_functions-siphash128), but the first argument is a tuple of two UInt64 values representing the key.
+
+**Returned value**
+
+A 128-bit `SipHash` hash value of type [FixedString(16)](/docs/en/sql-reference/data-types/fixedstring.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT hex(sipHash128Keyed((506097522914230528, 1084818905618843912),'foo', '\x01', 3));
+```
+
+Result:
+
+```response
+┌─hex(sipHash128Keyed((506097522914230528, 1084818905618843912), 'foo', '', 3))─┐
+│ B8467F65C8B4CFD9A5F8BD733917D9BF │
+└───────────────────────────────────────────────────────────────────────────────┘
+```
+
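+Analogously, and under the same assumption about the default zero key, `sipHash128Keyed` with key `(0, 0)` should match plain `sipHash128`:
+
+```sql
+SELECT hex(sipHash128Keyed((toUInt64(0), toUInt64(0)), 'ClickHouse')) = hex(sipHash128('ClickHouse'));
+```
+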
## cityHash64
Produces a 64-bit [CityHash](https://github.com/google/cityhash) hash value.
diff --git a/docs/en/sql-reference/statements/create/dictionary.md b/docs/en/sql-reference/statements/create/dictionary.md
index a470b071971..e789dd9257f 100644
--- a/docs/en/sql-reference/statements/create/dictionary.md
+++ b/docs/en/sql-reference/statements/create/dictionary.md
@@ -110,7 +110,7 @@ LIFETIME(MIN 0 MAX 1000)
### Create a dictionary from a file available by HTTP(S)
```sql
-statement: CREATE DICTIONARY default.taxi_zone_dictionary
+CREATE DICTIONARY default.taxi_zone_dictionary
(
`LocationID` UInt16 DEFAULT 0,
`Borough` String,
diff --git a/docs/en/sql-reference/table-functions/generate.md b/docs/en/sql-reference/table-functions/generate.md
index 90d81e4f74e..b53ccdd42b5 100644
--- a/docs/en/sql-reference/table-functions/generate.md
+++ b/docs/en/sql-reference/table-functions/generate.md
@@ -8,7 +8,7 @@ sidebar_label: generateRandom
Generates random data with given schema.
Allows to populate test tables with data.
-Supports all data types that can be stored in table except `LowCardinality` and `AggregateFunction`.
+Not all types are supported.
``` sql
generateRandom('name TypeName[, name TypeName]...', [, 'random_seed'[, 'max_string_length'[, 'max_array_length']]])
@@ -18,7 +18,7 @@ generateRandom('name TypeName[, name TypeName]...', [, 'random_seed'[, 'max_stri
- `name` — Name of corresponding column.
- `TypeName` — Type of corresponding column.
-- `max_array_length` — Maximum array length for all generated arrays. Defaults to `10`.
+- `max_array_length` — Maximum number of elements for all generated arrays or maps. Defaults to `10`.
- `max_string_length` — Maximum string length for all generated strings. Defaults to `10`.
- `random_seed` — Specify random seed manually to produce stable results. If NULL — seed is randomly generated.
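+
+A minimal illustration (the schema is arbitrary), using seed `1`, strings of up to 10 characters, and arrays or maps of up to 3 elements:
+
+``` sql
+SELECT * FROM generateRandom('id UInt64, s String, a Array(Int32)', 1, 10, 3) LIMIT 5;
+```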
diff --git a/src/Backups/BackupImpl.cpp b/src/Backups/BackupImpl.cpp
index bd9ac45d0f5..4d96bb2a20c 100644
--- a/src/Backups/BackupImpl.cpp
+++ b/src/Backups/BackupImpl.cpp
@@ -271,6 +271,18 @@ size_t BackupImpl::getNumFiles() const
return num_files;
}
+size_t BackupImpl::getNumProcessedFiles() const
+{
+ std::lock_guard lock{mutex};
+ return num_processed_files;
+}
+
+UInt64 BackupImpl::getProcessedFilesSize() const
+{
+ std::lock_guard lock{mutex};
+ return processed_files_size;
+}
+
UInt64 BackupImpl::getUncompressedSize() const
{
std::lock_guard lock{mutex};
@@ -355,6 +367,7 @@ void BackupImpl::writeBackupMetadata()
out->finalize();
increaseUncompressedSize(str.size());
+ increaseProcessedSize(str.size());
}
@@ -380,6 +393,7 @@ void BackupImpl::readBackupMetadata()
String str;
readStringUntilEOF(str, *in);
increaseUncompressedSize(str.size());
+ increaseProcessedSize(str.size());
Poco::XML::DOMParser dom_parser;
Poco::AutoPtr config = dom_parser.parseMemory(str.data(), str.size());
const Poco::XML::Node * config_root = getRootNode(config);
@@ -598,6 +612,8 @@ BackupEntryPtr BackupImpl::readFile(const SizeAndChecksum & size_and_checksum) c
if (open_mode != OpenMode::READ)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is not opened for reading");
+ increaseProcessedSize(size_and_checksum.first);
+
if (!size_and_checksum.first)
{
/// Entry's data is empty.
@@ -762,6 +778,11 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
.base_checksum = 0,
};
+ {
+ std::lock_guard lock{mutex};
+ increaseProcessedSize(info);
+ }
+
/// Empty file, nothing to backup
if (info.size == 0 && deduplicate_files)
{
@@ -972,6 +993,17 @@ void BackupImpl::increaseUncompressedSize(const FileInfo & info)
increaseUncompressedSize(info.size - info.base_size);
}
+void BackupImpl::increaseProcessedSize(UInt64 file_size) const
+{
+ processed_files_size += file_size;
+ ++num_processed_files;
+}
+
+void BackupImpl::increaseProcessedSize(const FileInfo & info)
+{
+ increaseProcessedSize(info.size);
+}
+
void BackupImpl::setCompressedSize()
{
if (use_archives)
@@ -1011,7 +1043,7 @@ std::shared_ptr BackupImpl::getArchiveWriter(const String & suff
String archive_name_with_suffix = getArchiveNameWithSuffix(suffix);
auto new_archive_writer = createArchiveWriter(archive_params.archive_name, writer->writeFile(archive_name_with_suffix));
new_archive_writer->setPassword(archive_params.password);
-
+ new_archive_writer->setCompression(archive_params.compression_method, archive_params.compression_level);
size_t pos = suffix.empty() ? 0 : 1;
archive_writers[pos] = {suffix, new_archive_writer};
diff --git a/src/Backups/BackupImpl.h b/src/Backups/BackupImpl.h
index 9fc881bf680..45c471aa825 100644
--- a/src/Backups/BackupImpl.h
+++ b/src/Backups/BackupImpl.h
@@ -59,6 +59,8 @@ public:
time_t getTimestamp() const override { return timestamp; }
UUID getUUID() const override { return *uuid; }
size_t getNumFiles() const override;
+ size_t getNumProcessedFiles() const override;
+ UInt64 getProcessedFilesSize() const override;
UInt64 getUncompressedSize() const override;
UInt64 getCompressedSize() const override;
Strings listFiles(const String & directory, bool recursive) const override;
@@ -101,10 +103,16 @@ private:
std::shared_ptr getArchiveReader(const String & suffix) const;
std::shared_ptr getArchiveWriter(const String & suffix);
- /// Increases `uncompressed_size` by a specific value and `num_files` by 1.
+ /// Increases `uncompressed_size` by a specific value,
+ /// also increases `num_files` by 1.
void increaseUncompressedSize(UInt64 file_size);
void increaseUncompressedSize(const FileInfo & info);
+ /// Increases `processed_files_size` by a specific value,
+ /// also increases `num_processed_files` by 1.
+ void increaseProcessedSize(UInt64 file_size) const;
+ void increaseProcessedSize(const FileInfo & info);
+
/// Calculates and sets `compressed_size`.
void setCompressedSize();
@@ -121,6 +129,8 @@ private:
std::optional uuid;
time_t timestamp = 0;
size_t num_files = 0;
+ mutable size_t num_processed_files = 0;
+ mutable UInt64 processed_files_size = 0;
UInt64 uncompressed_size = 0;
UInt64 compressed_size = 0;
int version;
diff --git a/src/Backups/BackupsWorker.cpp b/src/Backups/BackupsWorker.cpp
index 1c56e35e2f6..5348f4ec81a 100644
--- a/src/Backups/BackupsWorker.cpp
+++ b/src/Backups/BackupsWorker.cpp
@@ -338,16 +338,20 @@ void BackupsWorker::doBackup(
}
size_t num_files = 0;
+ size_t num_processed_files = 0;
UInt64 uncompressed_size = 0;
UInt64 compressed_size = 0;
+ UInt64 processed_files_size = 0;
/// Finalize backup (write its metadata).
if (!backup_settings.internal)
{
backup->finalizeWriting();
num_files = backup->getNumFiles();
+ num_processed_files = backup->getNumProcessedFiles();
uncompressed_size = backup->getUncompressedSize();
compressed_size = backup->getCompressedSize();
+ processed_files_size = backup->getProcessedFilesSize();
}
/// Close the backup.
@@ -355,7 +359,7 @@ void BackupsWorker::doBackup(
LOG_INFO(log, "{} {} was created successfully", (backup_settings.internal ? "Internal backup" : "Backup"), backup_name_for_logging);
setStatus(backup_id, BackupStatus::BACKUP_CREATED);
- setNumFilesAndSize(backup_id, num_files, uncompressed_size, compressed_size);
+ setNumFilesAndSize(backup_id, num_files, num_processed_files, processed_files_size, uncompressed_size, compressed_size);
}
catch (...)
{
@@ -496,8 +500,6 @@ void BackupsWorker::doRestore(
backup_open_params.password = restore_settings.password;
BackupPtr backup = BackupFactory::instance().createBackup(backup_open_params);
- setNumFilesAndSize(restore_id, backup->getNumFiles(), backup->getUncompressedSize(), backup->getCompressedSize());
-
String current_database = context->getCurrentDatabase();
/// Checks access rights if this is ON CLUSTER query.
@@ -578,6 +580,13 @@ void BackupsWorker::doRestore(
LOG_INFO(log, "Restored from {} {} successfully", (restore_settings.internal ? "internal backup" : "backup"), backup_name_for_logging);
setStatus(restore_id, BackupStatus::RESTORED);
+ setNumFilesAndSize(
+ restore_id,
+ backup->getNumFiles(),
+ backup->getNumProcessedFiles(),
+ backup->getProcessedFilesSize(),
+ backup->getUncompressedSize(),
+ backup->getCompressedSize());
}
catch (...)
{
@@ -658,7 +667,7 @@ void BackupsWorker::setStatus(const String & id, BackupStatus status, bool throw
}
-void BackupsWorker::setNumFilesAndSize(const String & id, size_t num_files, UInt64 uncompressed_size, UInt64 compressed_size)
+void BackupsWorker::setNumFilesAndSize(const String & id, size_t num_files, size_t num_processed_files, UInt64 processed_files_size, UInt64 uncompressed_size, UInt64 compressed_size)
{
std::lock_guard lock{infos_mutex};
auto it = infos.find(id);
@@ -667,6 +676,8 @@ void BackupsWorker::setNumFilesAndSize(const String & id, size_t num_files, UInt
auto & info = it->second;
info.num_files = num_files;
+ info.num_processed_files = num_processed_files;
+ info.processed_files_size = processed_files_size;
info.uncompressed_size = uncompressed_size;
info.compressed_size = compressed_size;
}
diff --git a/src/Backups/BackupsWorker.h b/src/Backups/BackupsWorker.h
index 98ba9d10e93..462a9033251 100644
--- a/src/Backups/BackupsWorker.h
+++ b/src/Backups/BackupsWorker.h
@@ -56,6 +56,14 @@ public:
/// Number of files in the backup (including backup's metadata; only unique files are counted).
size_t num_files = 0;
+ /// Number of processed files during backup or restore process
+ /// For restore it includes files from base backups
+ size_t num_processed_files = 0;
+
+ /// Size of processed files during backup or restore
+ /// For restore it includes sizes from base backups
+ UInt64 processed_files_size = 0;
+
/// Size of all files in the backup (including backup's metadata; only unique files are counted).
UInt64 uncompressed_size = 0;
@@ -102,7 +110,7 @@ private:
void addInfo(const OperationID & id, const String & name, bool internal, BackupStatus status);
void setStatus(const OperationID & id, BackupStatus status, bool throw_if_error = true);
void setStatusSafe(const String & id, BackupStatus status) { setStatus(id, status, false); }
- void setNumFilesAndSize(const OperationID & id, size_t num_files, UInt64 uncompressed_size, UInt64 compressed_size);
+ void setNumFilesAndSize(const OperationID & id, size_t num_files, size_t num_processed_files, UInt64 processed_files_size, UInt64 uncompressed_size, UInt64 compressed_size);
std::vector getAllActiveBackupInfos() const;
std::vector getAllActiveRestoreInfos() const;
bool hasConcurrentBackups(const BackupSettings & backup_settings) const;
diff --git a/src/Backups/IBackup.h b/src/Backups/IBackup.h
index 43763c5bfde..13c21fb0287 100644
--- a/src/Backups/IBackup.h
+++ b/src/Backups/IBackup.h
@@ -40,6 +40,12 @@ public:
/// Returns the number of unique files in the backup.
virtual size_t getNumFiles() const = 0;
+ /// Returns the number of files that were processed for backup or restore
+ virtual size_t getNumProcessedFiles() const = 0;
+
+ /// Returns the total size of processed files for backup or restore
+ virtual UInt64 getProcessedFilesSize() const = 0;
+
/// Returns the total size of unique files in the backup.
virtual UInt64 getUncompressedSize() const = 0;
diff --git a/src/Common/NamedCollections/NamedCollectionConfiguration.cpp b/src/Common/NamedCollections/NamedCollectionConfiguration.cpp
index 5d65ec10c17..1c42b001ceb 100644
--- a/src/Common/NamedCollections/NamedCollectionConfiguration.cpp
+++ b/src/Common/NamedCollections/NamedCollectionConfiguration.cpp
@@ -52,11 +52,13 @@ template T getConfigValueOrDefault(
return config.getInt64(path);
else if constexpr (std::is_same_v)
return config.getDouble(path);
+ else if constexpr (std::is_same_v)
+ return config.getBool(path);
else
throw Exception(
ErrorCodes::NOT_IMPLEMENTED,
"Unsupported type in getConfigValueOrDefault(). "
- "Supported types are String, UInt64, Int64, Float64");
+ "Supported types are String, UInt64, Int64, Float64, bool");
}
catch (const Poco::SyntaxException &)
{
@@ -85,11 +87,13 @@ template void setConfigValue(
config.setInt64(path, value);
else if constexpr (std::is_same_v)
config.setDouble(path, value);
+ else if constexpr (std::is_same_v)
+ config.setBool(path, value);
else
throw Exception(
ErrorCodes::NOT_IMPLEMENTED,
"Unsupported type in setConfigValue(). "
- "Supported types are String, UInt64, Int64, Float64");
+ "Supported types are String, UInt64, Int64, Float64, bool");
}
template void copyConfigValue(
@@ -206,6 +210,8 @@ template Int64 getConfigValue(const Poco::Util::AbstractConfiguration & c
const std::string & path);
template Float64 getConfigValue(const Poco::Util::AbstractConfiguration & config,
const std::string & path);
+template bool getConfigValue(const Poco::Util::AbstractConfiguration & config,
+ const std::string & path);
template String getConfigValueOrDefault(const Poco::Util::AbstractConfiguration & config,
const std::string & path, const String * default_value);
@@ -215,6 +221,8 @@ template Int64 getConfigValueOrDefault(const Poco::Util::AbstractConfigur
const std::string & path, const Int64 * default_value);
template Float64 getConfigValueOrDefault(const Poco::Util::AbstractConfiguration & config,
const std::string & path, const Float64 * default_value);
+template bool getConfigValueOrDefault(const Poco::Util::AbstractConfiguration & config,
+ const std::string & path, const bool * default_value);
template void setConfigValue(Poco::Util::AbstractConfiguration & config,
const std::string & path, const String & value, bool update);
@@ -224,6 +232,8 @@ template void setConfigValue(Poco::Util::AbstractConfiguration & config,
const std::string & path, const Int64 & value, bool update);
template void setConfigValue(Poco::Util::AbstractConfiguration & config,
const std::string & path, const Float64 & value, bool update);
+template void setConfigValue(Poco::Util::AbstractConfiguration & config,
+ const std::string & path, const bool & value, bool update);
template void copyConfigValue(const Poco::Util::AbstractConfiguration & from_config, const std::string & from_path,
Poco::Util::AbstractConfiguration & to_config, const std::string & to_path);
@@ -233,6 +243,8 @@ template void copyConfigValue(const Poco::Util::AbstractConfiguration & f
Poco::Util::AbstractConfiguration & to_config, const std::string & to_path);
template void copyConfigValue(const Poco::Util::AbstractConfiguration & from_config, const std::string & from_path,
Poco::Util::AbstractConfiguration & to_config, const std::string & to_path);
+template void copyConfigValue(const Poco::Util::AbstractConfiguration & from_config, const std::string & from_path,
+ Poco::Util::AbstractConfiguration & to_config, const std::string & to_path);
}
}
diff --git a/src/Common/NamedCollections/NamedCollections.cpp b/src/Common/NamedCollections/NamedCollections.cpp
index 760c1ff91a8..5db46826b19 100644
--- a/src/Common/NamedCollections/NamedCollections.cpp
+++ b/src/Common/NamedCollections/NamedCollections.cpp
@@ -436,11 +436,13 @@ template String NamedCollection::get(const NamedCollection::Key & key) c
template UInt64 NamedCollection::get(const NamedCollection::Key & key) const;
template Int64 NamedCollection::get(const NamedCollection::Key & key) const;
template Float64 NamedCollection::get(const NamedCollection::Key & key) const;
+template bool NamedCollection::get(const NamedCollection::Key & key) const;
template String NamedCollection::getOrDefault(const NamedCollection::Key & key, const String & default_value) const;
template UInt64 NamedCollection::getOrDefault(const NamedCollection::Key & key, const UInt64 & default_value) const;
template Int64 NamedCollection::getOrDefault(const NamedCollection::Key & key, const Int64 & default_value) const;
template Float64 NamedCollection::getOrDefault(const NamedCollection::Key & key, const Float64 & default_value) const;
+template bool NamedCollection::getOrDefault(const NamedCollection::Key & key, const bool & default_value) const;
template void NamedCollection::set(const NamedCollection::Key & key, const String & value);
template void NamedCollection::set(const NamedCollection::Key & key, const String & value);
@@ -450,6 +452,7 @@ template void NamedCollection::set(const NamedCollection::Key & key
template void NamedCollection::set(const NamedCollection::Key & key, const Int64 & value);
template void NamedCollection::set(const NamedCollection::Key & key, const Float64 & value);
template void NamedCollection::set(const NamedCollection::Key & key, const Float64 & value);
+template void NamedCollection::set(const NamedCollection::Key & key, const bool & value);
template void NamedCollection::setOrUpdate(const NamedCollection::Key & key, const String & value);
template void NamedCollection::setOrUpdate(const NamedCollection::Key & key, const String & value);
@@ -459,6 +462,7 @@ template void NamedCollection::setOrUpdate(const NamedCollection::K
template void NamedCollection::setOrUpdate(const NamedCollection::Key & key, const Int64 & value);
template void NamedCollection::setOrUpdate(const NamedCollection::Key & key, const Float64 & value);
template void NamedCollection::setOrUpdate(const NamedCollection::Key & key, const Float64 & value);
+template void NamedCollection::setOrUpdate(const NamedCollection::Key & key, const bool & value);
template void NamedCollection::remove(const Key & key);
template void NamedCollection::remove(const Key & key);
diff --git a/src/Common/ProfileEvents.cpp b/src/Common/ProfileEvents.cpp
index f4589108e66..d03e8c8d002 100644
--- a/src/Common/ProfileEvents.cpp
+++ b/src/Common/ProfileEvents.cpp
@@ -326,7 +326,6 @@ The server successfully detected this situation and will download merged part fr
M(S3ListObjects, "Number of S3 API ListObjects calls.") \
M(S3HeadObject, "Number of S3 API HeadObject calls.") \
M(S3GetObjectAttributes, "Number of S3 API GetObjectAttributes calls.") \
- M(S3GetObjectMetadata, "Number of S3 API GetObject calls for getting metadata.") \
M(S3CreateMultipartUpload, "Number of S3 API CreateMultipartUpload calls.") \
M(S3UploadPartCopy, "Number of S3 API UploadPartCopy calls.") \
M(S3UploadPart, "Number of S3 API UploadPart calls.") \
@@ -340,7 +339,6 @@ The server successfully detected this situation and will download merged part fr
M(DiskS3ListObjects, "Number of DiskS3 API ListObjects calls.") \
M(DiskS3HeadObject, "Number of DiskS3 API HeadObject calls.") \
M(DiskS3GetObjectAttributes, "Number of DiskS3 API GetObjectAttributes calls.") \
- M(DiskS3GetObjectMetadata, "Number of DiskS3 API GetObject calls for getting metadata.") \
M(DiskS3CreateMultipartUpload, "Number of DiskS3 API CreateMultipartUpload calls.") \
M(DiskS3UploadPartCopy, "Number of DiskS3 API UploadPartCopy calls.") \
M(DiskS3UploadPart, "Number of DiskS3 API UploadPart calls.") \
diff --git a/src/Common/SipHash.h b/src/Common/SipHash.h
index 96b095724c2..1d602a3b191 100644
--- a/src/Common/SipHash.h
+++ b/src/Common/SipHash.h
@@ -78,13 +78,13 @@ private:
public:
/// Arguments - seed.
- SipHash(UInt64 k0 = 0, UInt64 k1 = 0) /// NOLINT
+ SipHash(UInt64 key0 = 0, UInt64 key1 = 0) /// NOLINT
{
/// Initialize the state with some random bytes and seed.
- v0 = 0x736f6d6570736575ULL ^ k0;
- v1 = 0x646f72616e646f6dULL ^ k1;
- v2 = 0x6c7967656e657261ULL ^ k0;
- v3 = 0x7465646279746573ULL ^ k1;
+ v0 = 0x736f6d6570736575ULL ^ key0;
+ v1 = 0x646f72616e646f6dULL ^ key1;
+ v2 = 0x6c7967656e657261ULL ^ key0;
+ v3 = 0x7465646279746573ULL ^ key1;
cnt = 0;
current_word = 0;
@@ -216,20 +216,30 @@ inline void sipHash128(const char * data, const size_t size, char * out)
hash.get128(out);
}
-inline UInt128 sipHash128(const char * data, const size_t size)
+inline UInt128 sipHash128Keyed(UInt64 key0, UInt64 key1, const char * data, const size_t size)
{
- SipHash hash;
+ SipHash hash(key0, key1);
hash.update(data, size);
return hash.get128();
}
-inline UInt64 sipHash64(const char * data, const size_t size)
+inline UInt128 sipHash128(const char * data, const size_t size)
{
- SipHash hash;
+ return sipHash128Keyed(0, 0, data, size);
+}
+
+inline UInt64 sipHash64Keyed(UInt64 key0, UInt64 key1, const char * data, const size_t size)
+{
+ SipHash hash(key0, key1);
hash.update(data, size);
return hash.get64();
}
+inline UInt64 sipHash64(const char * data, const size_t size)
+{
+ return sipHash64Keyed(0, 0, data, size);
+}
+
template
UInt64 sipHash64(const T & x)
{
diff --git a/src/Common/shuffle.h b/src/Common/shuffle.h
new file mode 100644
index 00000000000..c21a3e4ea33
--- /dev/null
+++ b/src/Common/shuffle.h
@@ -0,0 +1,54 @@
+#pragma once
+
+#include <iterator>
+#include <random>
+#include <utility>
+
+/* Reorders the elements in the given range [first, last) such that each
+ * possible permutation of those elements has equal probability of appearance.
+ *
+ * for i ∈ [0, n-1):
+ * j ← random index from [i, n)
+ * swap arr[i] ↔ arr[j]
+ */
+template
+void shuffle(Iter first, Iter last, Rng && rng)
+{
+ using diff_t = typename std::iterator_traits::difference_type;
+ using distr_t = std::uniform_int_distribution;
+ using param_t = typename distr_t::param_type;
+ distr_t d;
+ diff_t n = last - first;
+ for (ssize_t i = 0; i < n - 1; ++i)
+ {
+ using std::swap;
+ auto j = d(rng, param_t(i, n - 1));
+ swap(first[i], first[j]);
+ }
+}
+
+
+/* Partially shuffle elements in range [first, last) in such a way that
+ * [first, first + limit) is a random subset of the original range.
+ * [first + limit, last) shall contain the elements not in [first, first + limit)
+ * in undefined order.
+ *
+ * for i ∈ [0, limit):
+ * j ← random index from [i, n)
+ * swap arr[i] ↔ arr[j]
+ */
+template
+void partial_shuffle(Iter first, Iter last, size_t limit, Rng && rng)
+{
+ using diff_t = typename std::iterator_traits::difference_type;
+ using distr_t = std::uniform_int_distribution;
+ using param_t = typename distr_t::param_type;
+ distr_t d;
+ diff_t n = last - first;
+ for (size_t i = 0; i < limit; ++i)
+ {
+ using std::swap;
+ auto j = d(rng, param_t(i, n - 1));
+ swap(first[i], first[j]);
+ }
+}
diff --git a/src/Coordination/KeeperSnapshotManagerS3.cpp b/src/Coordination/KeeperSnapshotManagerS3.cpp
index 0f794a2876c..c4cbbb7f28a 100644
--- a/src/Coordination/KeeperSnapshotManagerS3.cpp
+++ b/src/Coordination/KeeperSnapshotManagerS3.cpp
@@ -6,7 +6,7 @@
#include
#include
-#include
+#include
#include
#include
#include
diff --git a/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp b/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp
index f0433762150..fca8f4111b3 100644
--- a/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp
+++ b/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp
@@ -16,6 +16,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -109,7 +110,8 @@ std::string S3ObjectStorage::generateBlobNameForPath(const std::string & /* path
bool S3ObjectStorage::exists(const StoredObject & object) const
{
- return S3::objectExists(*client.get(), bucket, object.absolute_path, {}, /* for_disk_s3= */ true);
+ auto settings_ptr = s3_settings.get();
+ return S3::objectExists(*client.get(), bucket, object.absolute_path, {}, settings_ptr->request_settings, /* for_disk_s3= */ true);
}
std::unique_ptr S3ObjectStorage::readObjects( /// NOLINT
@@ -384,12 +386,13 @@ void S3ObjectStorage::removeObjectsIfExist(const StoredObjects & objects)
ObjectMetadata S3ObjectStorage::getObjectMetadata(const std::string & path) const
{
- ObjectMetadata result;
+ auto settings_ptr = s3_settings.get();
+ auto object_info = S3::getObjectInfo(*client.get(), bucket, path, {}, settings_ptr->request_settings, /* with_metadata= */ true, /* for_disk_s3= */ true);
- auto object_info = S3::getObjectInfo(*client.get(), bucket, path, {}, /* for_disk_s3= */ true);
+ ObjectMetadata result;
result.size_bytes = object_info.size;
result.last_modified = object_info.last_modification_time;
- result.attributes = S3::getObjectMetadata(*client.get(), bucket, path, {}, /* for_disk_s3= */ true);
+ result.attributes = object_info.metadata;
return result;
}
@@ -404,8 +407,8 @@ void S3ObjectStorage::copyObjectToAnotherObjectStorage( // NOLINT
if (auto * dest_s3 = dynamic_cast(&object_storage_to); dest_s3 != nullptr)
{
auto client_ptr = client.get();
- auto size = S3::getObjectSize(*client_ptr, bucket, object_from.absolute_path, {}, /* for_disk_s3= */ true);
auto settings_ptr = s3_settings.get();
+ auto size = S3::getObjectSize(*client_ptr, bucket, object_from.absolute_path, {}, settings_ptr->request_settings, /* for_disk_s3= */ true);
auto scheduler = threadPoolCallbackRunner(getThreadPoolWriter(), "S3ObjStor_copy");
copyS3File(client_ptr, bucket, object_from.absolute_path, 0, size, dest_s3->bucket, object_to.absolute_path,
settings_ptr->request_settings, object_to_attributes, scheduler, /* for_disk_s3= */ true);
@@ -420,8 +423,8 @@ void S3ObjectStorage::copyObject( // NOLINT
const StoredObject & object_from, const StoredObject & object_to, std::optional object_to_attributes)
{
auto client_ptr = client.get();
- auto size = S3::getObjectSize(*client_ptr, bucket, object_from.absolute_path, {}, /* for_disk_s3= */ true);
auto settings_ptr = s3_settings.get();
+ auto size = S3::getObjectSize(*client_ptr, bucket, object_from.absolute_path, {}, settings_ptr->request_settings, /* for_disk_s3= */ true);
auto scheduler = threadPoolCallbackRunner(getThreadPoolWriter(), "S3ObjStor_copy");
copyS3File(client_ptr, bucket, object_from.absolute_path, 0, size, bucket, object_to.absolute_path,
settings_ptr->request_settings, object_to_attributes, scheduler, /* for_disk_s3= */ true);
diff --git a/src/Formats/FormatFactory.cpp b/src/Formats/FormatFactory.cpp
index 55621142702..a9045733cac 100644
--- a/src/Formats/FormatFactory.cpp
+++ b/src/Formats/FormatFactory.cpp
@@ -219,7 +219,7 @@ InputFormatPtr FormatFactory::getInput(
if (!getCreators(name).input_creator)
{
- throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_INPUT, "Format {} is not suitable for input (with processors)", name);
+ throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_INPUT, "Format {} is not suitable for input", name);
}
const Settings & settings = context->getSettingsRef();
@@ -337,7 +337,7 @@ OutputFormatPtr FormatFactory::getOutputFormatParallelIfPossible(
{
const auto & output_getter = getCreators(name).output_creator;
if (!output_getter)
- throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT, "Format {} is not suitable for output (with processors)", name);
+ throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT, "Format {} is not suitable for output", name);
auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context);
@@ -374,7 +374,7 @@ OutputFormatPtr FormatFactory::getOutputFormat(
{
const auto & output_getter = getCreators(name).output_creator;
if (!output_getter)
- throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT, "Format {} is not suitable for output (with processors)", name);
+ throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT, "Format {} is not suitable for output", name);
if (context->hasQueryContext() && context->getSettingsRef().log_queries)
context->getQueryContext()->addQueryFactoriesInfo(Context::QueryLogFactories::Format, name);
@@ -406,7 +406,7 @@ String FormatFactory::getContentType(
{
const auto & output_getter = getCreators(name).output_creator;
if (!output_getter)
- throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT, "Format {} is not suitable for output (with processors)", name);
+ throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT, "Format {} is not suitable for output", name);
auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context);
diff --git a/src/Functions/FunctionsHashing.h b/src/Functions/FunctionsHashing.h
index 239e497c7d6..031d6e3b586 100644
--- a/src/Functions/FunctionsHashing.h
+++ b/src/Functions/FunctionsHashing.h
@@ -71,6 +71,38 @@ namespace ErrorCodes
extern const int SUPPORT_IS_DISABLED;
}
+namespace impl
+{
+ struct SipHashKey
+ {
+ UInt64 key0 = 0;
+ UInt64 key1 = 0;
+ };
+
+ static SipHashKey parseSipHashKey(const ColumnWithTypeAndName & key)
+ {
+ SipHashKey ret;
+
+ const auto * tuple = checkAndGetColumn(key.column.get());
+ if (!tuple)
+ throw Exception(ErrorCodes::NOT_IMPLEMENTED, "key must be a tuple");
+
+ if (tuple->tupleSize() != 2)
+ throw Exception(ErrorCodes::NOT_IMPLEMENTED, "wrong tuple size: key must be a tuple of 2 UInt64");
+
+ if (const auto * key0col = checkAndGetColumn(&(tuple->getColumn(0))))
+ ret.key0 = key0col->get64(0);
+ else
+ throw Exception(ErrorCodes::NOT_IMPLEMENTED, "first element of the key tuple is not UInt64");
+
+ if (const auto * key1col = checkAndGetColumn(&(tuple->getColumn(1))))
+ ret.key1 = key1col->get64(0);
+ else
+ throw Exception(ErrorCodes::NOT_IMPLEMENTED, "second element of the key tuple is not UInt64");
+
+ return ret;
+ }
+}
/** Hashing functions.
*
@@ -274,6 +306,25 @@ struct SipHash64Impl
static constexpr bool use_int_hash_for_pods = false;
};
+struct SipHash64KeyedImpl
+{
+ static constexpr auto name = "sipHash64Keyed";
+ using ReturnType = UInt64;
+ using Key = impl::SipHashKey;
+
+ static Key parseKey(const ColumnWithTypeAndName & key) { return impl::parseSipHashKey(key); }
+
+ static UInt64 applyKeyed(const Key & key, const char * begin, size_t size) { return sipHash64Keyed(key.key0, key.key1, begin, size); }
+
+ static UInt64 combineHashesKeyed(const Key & key, UInt64 h1, UInt64 h2)
+ {
+ UInt64 hashes[] = {h1, h2};
+ return applyKeyed(key, reinterpret_cast(hashes), 2 * sizeof(UInt64));
+ }
+
+ static constexpr bool use_int_hash_for_pods = false;
+};
+
struct SipHash128Impl
{
static constexpr auto name = "sipHash128";
@@ -293,6 +344,25 @@ struct SipHash128Impl
static constexpr bool use_int_hash_for_pods = false;
};
+struct SipHash128KeyedImpl
+{
+ static constexpr auto name = "sipHash128Keyed";
+ using ReturnType = UInt128;
+ using Key = impl::SipHashKey;
+
+ static Key parseKey(const ColumnWithTypeAndName & key) { return impl::parseSipHashKey(key); }
+
+ static UInt128 applyKeyed(const Key & key, const char * begin, size_t size) { return sipHash128Keyed(key.key0, key.key1, begin, size); }
+
+ static UInt128 combineHashesKeyed(const Key & key, UInt128 h1, UInt128 h2)
+ {
+ UInt128 hashes[] = {h1, h2};
+ return applyKeyed(key, reinterpret_cast(hashes), 2 * sizeof(UInt128));
+ }
+
+ static constexpr bool use_int_hash_for_pods = false;
+};
+
/** Why we need MurmurHash2?
* MurmurHash2 is an outdated hash function, superseded by MurmurHash3 and subsequently by CityHash, xxHash, HighwayHash.
* Usually there is no reason to use MurmurHash.
@@ -896,7 +966,7 @@ private:
DECLARE_MULTITARGET_CODE(
-template
+template
class FunctionAnyHash : public IFunction
{
public:
@@ -906,7 +976,7 @@ private:
using ToType = typename Impl::ReturnType;
template
- void executeIntType(const IColumn * column, typename ColumnVector::Container & vec_to) const
+ void executeIntType(const KeyType & key, const IColumn * column, typename ColumnVector::Container & vec_to) const
{
using ColVecType = ColumnVectorOrDecimal;
@@ -930,13 +1000,13 @@ private:
if (std::is_same_v)
h = JavaHashImpl::apply(vec_from[i]);
else
- h = Impl::apply(reinterpret_cast(&vec_from[i]), sizeof(vec_from[i]));
+ h = apply(key, reinterpret_cast(&vec_from[i]), sizeof(vec_from[i]));
}
if constexpr (first)
vec_to[i] = h;
else
- vec_to[i] = Impl::combineHashes(vec_to[i], h);
+ vec_to[i] = combineHashes(key, vec_to[i], h);
}
}
else if (auto col_from_const = checkAndGetColumnConst(column))
@@ -956,7 +1026,7 @@ private:
else
{
for (size_t i = 0; i < size; ++i)
- vec_to[i] = Impl::combineHashes(vec_to[i], hash);
+ vec_to[i] = combineHashes(key, vec_to[i], hash);
}
}
else
@@ -965,7 +1035,7 @@ private:
}
template
- void executeBigIntType(const IColumn * column, typename ColumnVector::Container & vec_to) const
+ void executeBigIntType(const KeyType & key, const IColumn * column, typename ColumnVector::Container & vec_to) const
{
using ColVecType = ColumnVectorOrDecimal;
@@ -975,19 +1045,19 @@ private:
size_t size = vec_from.size();
for (size_t i = 0; i < size; ++i)
{
- ToType h = Impl::apply(reinterpret_cast(&vec_from[i]), sizeof(vec_from[i]));
+ ToType h = apply(key, reinterpret_cast(&vec_from[i]), sizeof(vec_from[i]));
if constexpr (first)
vec_to[i] = h;
else
- vec_to[i] = Impl::combineHashes(vec_to[i], h);
+ vec_to[i] = combineHashes(key, vec_to[i], h);
}
}
else if (auto col_from_const = checkAndGetColumnConst(column))
{
auto value = col_from_const->template getValue();
- ToType h = Impl::apply(reinterpret_cast(&value), sizeof(value));
+ ToType h = apply(key, reinterpret_cast(&value), sizeof(value));
size_t size = vec_to.size();
if constexpr (first)
@@ -997,7 +1067,7 @@ private:
else
{
for (size_t i = 0; i < size; ++i)
- vec_to[i] = Impl::combineHashes(vec_to[i], h);
+ vec_to[i] = combineHashes(key, vec_to[i], h);
}
}
else
@@ -1006,21 +1076,21 @@ private:
}
template
- void executeGeneric(const IColumn * column, typename ColumnVector::Container & vec_to) const
+ void executeGeneric(const KeyType & key, const IColumn * column, typename ColumnVector::Container & vec_to) const
{
for (size_t i = 0, size = column->size(); i < size; ++i)
{
StringRef bytes = column->getDataAt(i);
- const ToType h = Impl::apply(bytes.data, bytes.size);
+ const ToType h = apply(key, bytes.data, bytes.size);
if constexpr (first)
vec_to[i] = h;
else
- vec_to[i] = Impl::combineHashes(vec_to[i], h);
+ vec_to[i] = combineHashes(key, vec_to[i], h);
}
}
template
- void executeString(const IColumn * column, typename ColumnVector::Container & vec_to) const
+ void executeString(const KeyType & key, const IColumn * column, typename ColumnVector::Container & vec_to) const
{
if (const ColumnString * col_from = checkAndGetColumn(column))
{
@@ -1031,14 +1101,14 @@ private:
ColumnString::Offset current_offset = 0;
for (size_t i = 0; i < size; ++i)
{
- const ToType h = Impl::apply(
+ const ToType h = apply(key,
reinterpret_cast(&data[current_offset]),
offsets[i] - current_offset - 1);
if constexpr (first)
vec_to[i] = h;
else
- vec_to[i] = Impl::combineHashes(vec_to[i], h);
+ vec_to[i] = combineHashes(key, vec_to[i], h);
current_offset = offsets[i];
}
@@ -1051,17 +1121,17 @@ private:
for (size_t i = 0; i < size; ++i)
{
- const ToType h = Impl::apply(reinterpret_cast(&data[i * n]), n);
+ const ToType h = apply(key, reinterpret_cast(&data[i * n]), n);
if constexpr (first)
vec_to[i] = h;
else
- vec_to[i] = Impl::combineHashes(vec_to[i], h);
+ vec_to[i] = combineHashes(key, vec_to[i], h);
}
}
else if (const ColumnConst * col_from_const = checkAndGetColumnConstStringOrFixedString(column))
{
String value = col_from_const->getValue();
- const ToType hash = Impl::apply(value.data(), value.size());
+ const ToType hash = apply(key, value.data(), value.size());
const size_t size = vec_to.size();
if constexpr (first)
@@ -1072,7 +1142,7 @@ private:
{
for (size_t i = 0; i < size; ++i)
{
- vec_to[i] = Impl::combineHashes(vec_to[i], hash);
+ vec_to[i] = combineHashes(key, vec_to[i], hash);
}
}
}
@@ -1082,7 +1152,7 @@ private:
}
template <bool first>
- void executeArray(const IDataType * type, const IColumn * column, typename ColumnVector<ToType>::Container & vec_to) const
+ void executeArray(const KeyType & key, const IDataType * type, const IColumn * column, typename ColumnVector<ToType>::Container & vec_to) const
{
const IDataType * nested_type = typeid_cast<const DataTypeArray *>(type)->getNestedType().get();
@@ -1094,7 +1164,7 @@ private:
typename ColumnVector<ToType>::Container vec_temp(nested_size);
bool nested_is_first = true;
- executeForArgument(nested_type, nested_column, vec_temp, nested_is_first);
+ executeForArgument(key, nested_type, nested_column, vec_temp, nested_is_first);
const size_t size = offsets.size();
@@ -1112,10 +1182,10 @@ private:
if constexpr (first)
vec_to[i] = h;
else
- vec_to[i] = Impl::combineHashes(vec_to[i], h);
+ vec_to[i] = combineHashes(key, vec_to[i], h);
for (size_t j = current_offset; j < next_offset; ++j)
- vec_to[i] = Impl::combineHashes(vec_to[i], vec_temp[j]);
+ vec_to[i] = combineHashes(key, vec_to[i], vec_temp[j]);
current_offset = offsets[i];
}
@@ -1124,7 +1194,7 @@ private:
{
/// NOTE: here, of course, you can do without the materialization of the column.
ColumnPtr full_column = col_from_const->convertToFullColumn();
- executeArray<first>(type, &*full_column, vec_to);
+ executeArray<first>(key, type, &*full_column, vec_to);
}
else
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Illegal column {} of first argument of function {}",
@@ -1132,44 +1202,44 @@ private:
}
template <bool first>
- void executeAny(const IDataType * from_type, const IColumn * icolumn, typename ColumnVector<ToType>::Container & vec_to) const
+ void executeAny(const KeyType & key, const IDataType * from_type, const IColumn * icolumn, typename ColumnVector<ToType>::Container & vec_to) const
{
WhichDataType which(from_type);
- if (which.isUInt8()) executeIntType<UInt8, first>(icolumn, vec_to);
- else if (which.isUInt16()) executeIntType<UInt16, first>(icolumn, vec_to);
- else if (which.isUInt32()) executeIntType<UInt32, first>(icolumn, vec_to);
- else if (which.isUInt64()) executeIntType<UInt64, first>(icolumn, vec_to);
- else if (which.isUInt128()) executeBigIntType<UInt128, first>(icolumn, vec_to);
- else if (which.isUInt256()) executeBigIntType<UInt256, first>(icolumn, vec_to);
- else if (which.isInt8()) executeIntType<Int8, first>(icolumn, vec_to);
- else if (which.isInt16()) executeIntType<Int16, first>(icolumn, vec_to);
- else if (which.isInt32()) executeIntType<Int32, first>(icolumn, vec_to);
- else if (which.isInt64()) executeIntType<Int64, first>(icolumn, vec_to);
- else if (which.isInt128()) executeBigIntType<Int128, first>(icolumn, vec_to);
- else if (which.isInt256()) executeBigIntType<Int256, first>(icolumn, vec_to);
- else if (which.isUUID()) executeBigIntType<UUID, first>(icolumn, vec_to);
- else if (which.isIPv4()) executeIntType<IPv4, first>(icolumn, vec_to);
- else if (which.isIPv6()) executeBigIntType<IPv6, first>(icolumn, vec_to);
- else if (which.isEnum8()) executeIntType<Int8, first>(icolumn, vec_to);
- else if (which.isEnum16()) executeIntType<Int16, first>(icolumn, vec_to);
- else if (which.isDate()) executeIntType<UInt16, first>(icolumn, vec_to);
- else if (which.isDate32()) executeIntType<Int32, first>(icolumn, vec_to);
- else if (which.isDateTime()) executeIntType<UInt32, first>(icolumn, vec_to);
+ if (which.isUInt8()) executeIntType<UInt8, first>(key, icolumn, vec_to);
+ else if (which.isUInt16()) executeIntType<UInt16, first>(key, icolumn, vec_to);
+ else if (which.isUInt32()) executeIntType<UInt32, first>(key, icolumn, vec_to);
+ else if (which.isUInt64()) executeIntType<UInt64, first>(key, icolumn, vec_to);
+ else if (which.isUInt128()) executeBigIntType<UInt128, first>(key, icolumn, vec_to);
+ else if (which.isUInt256()) executeBigIntType<UInt256, first>(key, icolumn, vec_to);
+ else if (which.isInt8()) executeIntType<Int8, first>(key, icolumn, vec_to);
+ else if (which.isInt16()) executeIntType<Int16, first>(key, icolumn, vec_to);
+ else if (which.isInt32()) executeIntType<Int32, first>(key, icolumn, vec_to);
+ else if (which.isInt64()) executeIntType<Int64, first>(key, icolumn, vec_to);
+ else if (which.isInt128()) executeBigIntType<Int128, first>(key, icolumn, vec_to);
+ else if (which.isInt256()) executeBigIntType<Int256, first>(key, icolumn, vec_to);
+ else if (which.isUUID()) executeBigIntType<UUID, first>(key, icolumn, vec_to);
+ else if (which.isIPv4()) executeIntType<IPv4, first>(key, icolumn, vec_to);
+ else if (which.isIPv6()) executeBigIntType<IPv6, first>(key, icolumn, vec_to);
+ else if (which.isEnum8()) executeIntType<Int8, first>(key, icolumn, vec_to);
+ else if (which.isEnum16()) executeIntType<Int16, first>(key, icolumn, vec_to);
+ else if (which.isDate()) executeIntType<UInt16, first>(key, icolumn, vec_to);
+ else if (which.isDate32()) executeIntType<Int32, first>(key, icolumn, vec_to);
+ else if (which.isDateTime()) executeIntType<UInt32, first>(key, icolumn, vec_to);
/// TODO: executeIntType() for Decimal32/64 leads to incompatible result
- else if (which.isDecimal32()) executeBigIntType<Decimal32, first>(icolumn, vec_to);
- else if (which.isDecimal64()) executeBigIntType<Decimal64, first>(icolumn, vec_to);
- else if (which.isDecimal128()) executeBigIntType<Decimal128, first>(icolumn, vec_to);
- else if (which.isDecimal256()) executeBigIntType<Decimal256, first>(icolumn, vec_to);
- else if (which.isFloat32()) executeIntType<Float32, first>(icolumn, vec_to);
- else if (which.isFloat64()) executeIntType<Float64, first>(icolumn, vec_to);
- else if (which.isString()) executeString<first>(icolumn, vec_to);
- else if (which.isFixedString()) executeString<first>(icolumn, vec_to);
- else if (which.isArray()) executeArray<first>(from_type, icolumn, vec_to);
- else executeGeneric<first>(icolumn, vec_to);
+ else if (which.isDecimal32()) executeBigIntType<Decimal32, first>(key, icolumn, vec_to);
+ else if (which.isDecimal64()) executeBigIntType<Decimal64, first>(key, icolumn, vec_to);
+ else if (which.isDecimal128()) executeBigIntType<Decimal128, first>(key, icolumn, vec_to);
+ else if (which.isDecimal256()) executeBigIntType<Decimal256, first>(key, icolumn, vec_to);
+ else if (which.isFloat32()) executeIntType<Float32, first>(key, icolumn, vec_to);
+ else if (which.isFloat64()) executeIntType<Float64, first>(key, icolumn, vec_to);
+ else if (which.isString()) executeString<first>(key, icolumn, vec_to);
+ else if (which.isFixedString()) executeString<first>(key, icolumn, vec_to);
+ else if (which.isArray()) executeArray<first>(key, from_type, icolumn, vec_to);
+ else executeGeneric<first>(key, icolumn, vec_to);
}
- void executeForArgument(const IDataType * type, const IColumn * column, typename ColumnVector<ToType>::Container & vec_to, bool & is_first) const
+ void executeForArgument(const KeyType & key, const IDataType * type, const IColumn * column, typename ColumnVector<ToType>::Container & vec_to, bool & is_first) const
{
/// Flattening of tuples.
if (const ColumnTuple * tuple = typeid_cast<const ColumnTuple *>(column))
@@ -1178,7 +1248,7 @@ private:
const DataTypes & tuple_types = typeid_cast<const DataTypeTuple &>(*type).getElements();
size_t tuple_size = tuple_columns.size();
for (size_t i = 0; i < tuple_size; ++i)
- executeForArgument(tuple_types[i].get(), tuple_columns[i].get(), vec_to, is_first);
+ executeForArgument(key, tuple_types[i].get(), tuple_columns[i].get(), vec_to, is_first);
}
else if (const ColumnTuple * tuple_const = checkAndGetColumnConstData<ColumnTuple>(column))
{
@@ -1188,25 +1258,25 @@ private:
for (size_t i = 0; i < tuple_size; ++i)
{
auto tmp = ColumnConst::create(tuple_columns[i], column->size());
- executeForArgument(tuple_types[i].get(), tmp.get(), vec_to, is_first);
+ executeForArgument(key, tuple_types[i].get(), tmp.get(), vec_to, is_first);
}
}
else if (const auto * map = checkAndGetColumn<ColumnMap>(column))
{
const auto & type_map = assert_cast<const DataTypeMap &>(*type);
- executeForArgument(type_map.getNestedType().get(), map->getNestedColumnPtr().get(), vec_to, is_first);
+ executeForArgument(key, type_map.getNestedType().get(), map->getNestedColumnPtr().get(), vec_to, is_first);
}
else if (const auto * const_map = checkAndGetColumnConstData<ColumnMap>(column))
{
const auto & type_map = assert_cast<const DataTypeMap &>(*type);
- executeForArgument(type_map.getNestedType().get(), const_map->getNestedColumnPtr().get(), vec_to, is_first);
+ executeForArgument(key, type_map.getNestedType().get(), const_map->getNestedColumnPtr().get(), vec_to, is_first);
}
else
{
if (is_first)
- executeAny<true>(type, column, vec_to);
+ executeAny<true>(key, type, column, vec_to);
else
- executeAny<false>(type, column, vec_to);
+ executeAny<false>(key, type, column, vec_to);
}
is_first = false;
@@ -1240,17 +1310,29 @@ public:
typename ColumnVector<ToType>::Container & vec_to = col_to->getData();
- if (arguments.empty())
+ /// If using a "keyed" algorithm, the first argument is the key and
+ /// the data starts from the second argument.
+ /// Otherwise there is no key and all arguments are interpreted as data.
+ constexpr size_t first_data_argument = Keyed;
+
+ if (arguments.size() <= first_data_argument)
{
- /// Constant random number from /dev/urandom is used as a hash value of empty list of arguments.
+ /// Return a fixed random-looking magic number when input is empty
vec_to.assign(rows, static_cast<ToType>(0xe28dbde7fe22e41c));
}
- /// The function supports arbitrary number of arguments of arbitrary types.
+ KeyType key{};
+ if constexpr (Keyed)
+ if (!arguments.empty())
+ key = Impl::parseKey(arguments[0]);
+ /// The function supports arbitrary number of arguments of arbitrary types.
bool is_first_argument = true;
- for (const auto & col : arguments)
- executeForArgument(col.type.get(), col.column.get(), vec_to, is_first_argument);
+ for (size_t i = first_data_argument; i < arguments.size(); ++i)
+ {
+ const auto & col = arguments[i];
+ executeForArgument(key, col.type.get(), col.column.get(), vec_to, is_first_argument);
+ }
if constexpr (std::is_same_v<ToType, UInt128>) /// backward-compatible
{
@@ -1261,25 +1343,38 @@ public:
return col_to;
}
+
+ static ToType apply(const KeyType & key, const char * begin, size_t size)
+ {
+ if constexpr (Keyed)
+ return Impl::applyKeyed(key, begin, size);
+ else
+ return Impl::apply(begin, size);
+ }
+
+ static ToType combineHashes(const KeyType & key, ToType h1, ToType h2)
+ {
+ if constexpr (Keyed)
+ return Impl::combineHashesKeyed(key, h1, h2);
+ else
+ return Impl::combineHashes(h1, h2);
+ }
};
) // DECLARE_MULTITARGET_CODE
-template <typename Impl>
-class FunctionAnyHash : public TargetSpecific::Default::FunctionAnyHash<Impl>
+template <typename Impl, bool Keyed = false, typename KeyType = char>
+class FunctionAnyHash : public TargetSpecific::Default::FunctionAnyHash<Impl, Keyed, KeyType>
{
public:
explicit FunctionAnyHash(ContextPtr context) : selector(context)
{
- selector.registerImplementation<TargetArch::Default, TargetSpecific::Default::FunctionAnyHash<Impl>>();
+ selector.registerImplementation<TargetArch::Default, TargetSpecific::Default::FunctionAnyHash<Impl, Keyed, KeyType>>();
- #if USE_MULTITARGET_CODE
- selector.registerImplementation<TargetArch::AVX2, TargetSpecific::AVX2::FunctionAnyHash<Impl>>();
- selector.registerImplementation<TargetArch::AVX512F, TargetSpecific::AVX512F::FunctionAnyHash<Impl>>();
- #endif
+#if USE_MULTITARGET_CODE
+ selector.registerImplementation<TargetArch::AVX2, TargetSpecific::AVX2::FunctionAnyHash<Impl, Keyed, KeyType>>();
+ selector.registerImplementation<TargetArch::AVX512F, TargetSpecific::AVX512F::FunctionAnyHash<Impl, Keyed, KeyType>>();
+#endif
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
@@ -1514,6 +1609,7 @@ struct NameIntHash32 { static constexpr auto name = "intHash32"; };
struct NameIntHash64 { static constexpr auto name = "intHash64"; };
using FunctionSipHash64 = FunctionAnyHash<SipHash64Impl>;
+using FunctionSipHash64Keyed = FunctionAnyHash<SipHash64KeyedImpl, true, SipHash64KeyedImpl::Key>;
using FunctionIntHash32 = FunctionIntHash<IntHash32Impl, NameIntHash32>;
using FunctionIntHash64 = FunctionIntHash<IntHash64Impl, NameIntHash64>;
#if USE_SSL
@@ -1527,6 +1623,7 @@ using FunctionSHA384 = FunctionStringHashFixedString<SHA384Impl>;
using FunctionSHA512 = FunctionStringHashFixedString<SHA512Impl>;
#endif
using FunctionSipHash128 = FunctionAnyHash<SipHash128Impl>;
+using FunctionSipHash128Keyed = FunctionAnyHash<SipHash128KeyedImpl, true, SipHash128KeyedImpl::Key>;
using FunctionCityHash64 = FunctionAnyHash<ImplCityHash64>;
using FunctionFarmFingerprint64 = FunctionAnyHash<ImplFarmFingerprint64>;
using FunctionFarmHash64 = FunctionAnyHash<ImplFarmHash64>;
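
The static `apply`/`combineHashes` helpers above forward to `Impl::applyKeyed`/`Impl::combineHashesKeyed` whenever `Keyed` is true, so a keyed implementation only has to provide a key type plus those two static methods (and a `parseKey` for the first argument). The following standalone sketch illustrates that contract with a hypothetical `ExampleKeyedHashImpl` and a toy FNV-1a-style mixer; the real `SipHash64KeyedImpl` in FunctionsHashing.h is based on SipHash and is not reproduced here.

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>

/// Hypothetical stand-in for a keyed hash implementation. A toy FNV-1a-style
/// mixer is used only to illustrate the interface that
/// FunctionAnyHash<Impl, /*Keyed=*/true, KeyType> relies on:
///   - a key type,
///   - applyKeyed(key, data, size),
///   - combineHashesKeyed(key, h1, h2).
struct ExampleKeyedHashImpl
{
    struct Key { uint64_t k0 = 0; uint64_t k1 = 0; };
    using ReturnType = uint64_t;

    static uint64_t applyKeyed(const Key & key, const char * data, size_t size)
    {
        uint64_t h = 0xcbf29ce484222325ULL ^ key.k0;
        for (size_t i = 0; i < size; ++i)
        {
            h ^= static_cast<unsigned char>(data[i]);
            h *= 0x100000001b3ULL;
        }
        return h ^ key.k1;
    }

    static uint64_t combineHashesKeyed(const Key & key, uint64_t h1, uint64_t h2)
    {
        /// One possible way to combine: hash the concatenated pair of hashes under the same key.
        char bytes[16];
        std::memcpy(bytes, &h1, 8);
        std::memcpy(bytes + 8, &h2, 8);
        return applyKeyed(key, bytes, sizeof(bytes));
    }
};

int main()
{
    ExampleKeyedHashImpl::Key key{1, 2};
    std::string s = "ClickHouse";
    /// Different keys give different hashes for the same input bytes.
    std::cout << ExampleKeyedHashImpl::applyKeyed(key, s.data(), s.size()) << '\n';
}
```
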
diff --git a/src/Functions/FunctionsHashingMisc.cpp b/src/Functions/FunctionsHashingMisc.cpp
index b33d9366094..2a705e87a1e 100644
--- a/src/Functions/FunctionsHashingMisc.cpp
+++ b/src/Functions/FunctionsHashingMisc.cpp
@@ -12,7 +12,9 @@ namespace DB
REGISTER_FUNCTION(Hashing)
{
factory.registerFunction<FunctionSipHash64>();
+ factory.registerFunction<FunctionSipHash64Keyed>();
factory.registerFunction<FunctionSipHash128>();
+ factory.registerFunction<FunctionSipHash128Keyed>();
factory.registerFunction<FunctionCityHash64>();
factory.registerFunction<FunctionFarmFingerprint64>();
factory.registerFunction<FunctionFarmHash64>();
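
The next file in the diff adds the `arrayShuffle`/`arrayPartialShuffle` functions, implemented as one `FunctionArrayShuffleImpl` parameterized by two traits structs. At its core such a function performs a seeded (partial) Fisher-Yates shuffle over the array elements; the sketch below shows that permutation step in isolation, using a hypothetical `partialShuffle` helper and `std::mt19937_64` rather than whatever RNG the actual implementation uses.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

/// Hypothetical helper sketching a seeded partial Fisher-Yates shuffle:
/// after the call, the first `limit` positions hold a uniformly chosen random
/// prefix of a permutation of `values`; limit == values.size() is an ordinary
/// full shuffle. A fixed seed makes the result reproducible, mirroring the
/// optional `seed` argument of the SQL functions.
template <typename T>
void partialShuffle(std::vector<T> & values, size_t limit, uint64_t seed)
{
    std::mt19937_64 rng(seed);
    const size_t size = values.size();
    limit = std::min(limit, size);
    for (size_t i = 0; i < limit; ++i)
    {
        /// Fisher-Yates step: swap position i with a random position in [i, size).
        std::uniform_int_distribution<size_t> dist(i, size - 1);
        std::swap(values[i], values[dist(rng)]);
    }
}

int main()
{
    std::vector<int> arr{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    partialShuffle(arr, 3, 0xDEADBEEF); /// only the first 3 positions are guaranteed shuffled
    for (int x : arr)
        std::cout << x << ' ';
    std::cout << '\n';
}
```

`arrayShuffle` (whose traits below set `has_limit = false` and permute the whole array) corresponds to the full-shuffle case, while `arrayPartialShuffle` exposes the limit as its second argument.
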
diff --git a/src/Functions/array/arrayShuffle.cpp b/src/Functions/array/arrayShuffle.cpp
new file mode 100644
index 00000000000..9cf3ac8f3fe
--- /dev/null
+++ b/src/Functions/array/arrayShuffle.cpp
@@ -0,0 +1,227 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+
+namespace DB
+{
+
+namespace ErrorCodes
+{
+ extern const int ILLEGAL_COLUMN;
+ extern const int ILLEGAL_TYPE_OF_ARGUMENT;
+ extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
+}
+
+/** Shuffle array elements
+ * arrayShuffle(arr)
+ * arrayShuffle(arr, seed)
+ */
+struct FunctionArrayShuffleTraits
+{
+ static constexpr auto name = "arrayShuffle";
+ static constexpr auto has_limit = false; // Permute the whole array
+ static ColumnNumbers getArgumentsThatAreAlwaysConstant() { return {1}; }
+ static constexpr auto max_num_params = 2; // array[, seed]
+ static constexpr auto seed_param_idx = 1; // --------^^^^
+};
+
+/** Partial shuffle array elements
+ * arrayPartialShuffle(arr)
+ * arrayPartialShuffle(arr, limit)
+ * arrayPartialShuffle(arr, limit, seed)
+ */
+struct FunctionArrayPartialShuffleTraits
+{
+ static constexpr auto name = "arrayPartialShuffle";
+ static constexpr auto has_limit = true;
+ static ColumnNumbers getArgumentsThatAreAlwaysConstant() { return {1, 2}; }
+ static constexpr auto max_num_params = 3; // array[, limit[, seed]]
+ static constexpr auto seed_param_idx = 2; // ----------------^^^^
+};
+
+template <typename Traits>
+class FunctionArrayShuffleImpl : public IFunction
+{
+public:
+ static constexpr auto name = Traits::name;
+
+ String getName() const override { return name; }
+ bool isVariadic() const override { return true; }
+ size_t getNumberOfArguments() const override { return 0; }
+ ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return Traits::getArgumentsThatAreAlwaysConstant(); }
+ bool useDefaultImplementationForConstants() const override { return true; }
+ bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
+
+ static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionArrayShuffleImpl<Traits>>(); }
+
+ DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
+ {
+ if (arguments.size() > Traits::max_num_params || arguments.empty())
+ {
+ throw Exception(
+ ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
+ "Function '{}' needs from 1 to {} arguments, passed {}.",
+ getName(),
+ Traits::max_num_params,
+ arguments.size());
+ }
+
+ const DataTypeArray * array_type = checkAndGetDataType