Merge remote-tracking branch 'origin/master' into live_view_support_for_subquery

2024-11-23 16:12:01 +00:00 · 2019-12-05 06:54:04 -05:00 · e20793b2dc
commit e20793b2dc
parent 1e0e00b7e4 86ff01d3aa
443 changed files with 12127 additions and 3049 deletions
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,6 +1,7 @@
 [submodule "contrib/poco"]
 	path = contrib/poco
 	url = https://github.com/ClickHouse-Extras/poco
+	branch = clickhouse
 [submodule "contrib/zstd"]
 	path = contrib/zstd
 	url = https://github.com/facebook/zstd.git
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,82 @@
+## ClickHouse release v19.17.4.11, 2019-11-22
+
+### Backward Incompatible Change
+* Using column instead of AST to store scalar subquery results for better performance. Setting `enable_scalar_subquery_optimization` was added in 19.17 and it was enabled by default. It leads to errors like [this](https://github.com/ClickHouse/ClickHouse/issues/7851) during upgrade to 19.17.2 or 19.17.3 from previous versions. This setting was disabled by default in 19.17.4, to make possible upgrading from 19.16 and older versions without errors. [#7392](https://github.com/ClickHouse/ClickHouse/pull/7392) ([Amos Bird](https://github.com/amosbird))
+
+### New Feature
+* Add the ability to create dictionaries with DDL queries. [#7360](https://github.com/ClickHouse/ClickHouse/pull/7360) ([alesapin](https://github.com/alesapin))
+* Make `bloom_filter` type of index supporting `LowCardinality` and `Nullable` [#7363](https://github.com/ClickHouse/ClickHouse/issues/7363) [#7561](https://github.com/ClickHouse/ClickHouse/pull/7561) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+* Add function `isValidJSON` to check that passed string is a valid json. [#5910](https://github.com/ClickHouse/ClickHouse/issues/5910) [#7293](https://github.com/ClickHouse/ClickHouse/pull/7293) ([Vdimir](https://github.com/Vdimir))
+* Implement `arrayCompact` function [#7328](https://github.com/ClickHouse/ClickHouse/pull/7328) ([Memo](https://github.com/Joeywzr))
+* Created function `hex` for Decimal numbers. It works like `hex(reinterpretAsString())`, but doesn't delete last zero bytes. [#7355](https://github.com/ClickHouse/ClickHouse/pull/7355) ([Mikhail Korotov](https://github.com/millb))
+* Add `arrayFill` and `arrayReverseFill` functions, which replace elements by other elements in front/back of them in the array. [#7380](https://github.com/ClickHouse/ClickHouse/pull/7380) ([hcz](https://github.com/hczhcz))
+* Add `CRC32IEEE()`/`CRC64()` support [#7480](https://github.com/ClickHouse/ClickHouse/pull/7480) ([Azat Khuzhin](https://github.com/azat))
+* Implement `char` function similar to one in [mysql](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char)  [#7486](https://github.com/ClickHouse/ClickHouse/pull/7486) ([sundyli](https://github.com/sundy-li))
+* Add `bitmapTransform` function. It transforms an array of values in a bitmap to another array of values, the result is a new bitmap [#7598](https://github.com/ClickHouse/ClickHouse/pull/7598) ([Zhichang Yu](https://github.com/yuzhichang))
+* Implemented `javaHashUTF16LE()` function [#7651](https://github.com/ClickHouse/ClickHouse/pull/7651) ([achimbab](https://github.com/achimbab))
+* Add `_shard_num` virtual column for the Distributed engine [#7624](https://github.com/ClickHouse/ClickHouse/pull/7624) ([Azat Khuzhin](https://github.com/azat))
+
+### Experimental Feature
+* Support for processors (new query execution pipeline) in `MergeTree`. [#7181](https://github.com/ClickHouse/ClickHouse/pull/7181) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+
+### Bug Fix
+* Fix incorrect float parsing in `Values` [#7817](https://github.com/ClickHouse/ClickHouse/issues/7817) [#7870](https://github.com/ClickHouse/ClickHouse/pull/7870) ([tavplubix](https://github.com/tavplubix))
+* Fix rare deadlock which can happen when trace_log is enabled. [#7838](https://github.com/ClickHouse/ClickHouse/pull/7838) ([filimonov](https://github.com/filimonov))
+* Prevent message duplication when producing Kafka table has any MVs selecting from it [#7265](https://github.com/ClickHouse/ClickHouse/pull/7265) ([Ivan](https://github.com/abyss7))
+* Support for `Array(LowCardinality(Nullable(String)))` in `IN`. Resolves [#7364](https://github.com/ClickHouse/ClickHouse/issues/7364)  [#7366](https://github.com/ClickHouse/ClickHouse/pull/7366) ([achimbab](https://github.com/achimbab))
+* Add handling of `SQL_TINYINT` and `SQL_BIGINT`, and fix handling of `SQL_FLOAT` data source types in ODBC Bridge. [#7491](https://github.com/ClickHouse/ClickHouse/pull/7491) ([Denis Glazachev](https://github.com/traceon))
+* Fix aggregation (`avg` and quantiles) over empty decimal columns [#7431](https://github.com/ClickHouse/ClickHouse/pull/7431) ([Andrey Konyaev](https://github.com/akonyaev90))
+* Fix `INSERT` into Distributed with `MATERIALIZED` columns [#7377](https://github.com/ClickHouse/ClickHouse/pull/7377) ([Azat Khuzhin](https://github.com/azat))
+* Make `MOVE PARTITION` work if some parts of partition are already on destination disk or volume [#7434](https://github.com/ClickHouse/ClickHouse/pull/7434) ([Vladimir Chebotarev](https://github.com/excitoon))
+* Fixed bug with hardlinks failing to be created during mutations in `ReplicatedMergeTree` in multi-disk configurations. [#7558](https://github.com/ClickHouse/ClickHouse/pull/7558) ([Vladimir Chebotarev](https://github.com/excitoon))
+* Fixed a bug with a mutation on a MergeTree when whole part remains unchanged and best space is being found on another disk [#7602](https://github.com/ClickHouse/ClickHouse/pull/7602) ([Vladimir Chebotarev](https://github.com/excitoon))
+* Fixed bug with `keep_free_space_ratio` not being read from disks configuration [#7645](https://github.com/ClickHouse/ClickHouse/pull/7645) ([Vladimir Chebotarev](https://github.com/excitoon))
+* Fix bug with table contains only `Tuple` columns or columns with complex paths. Fixes [7541](https://github.com/ClickHouse/ClickHouse/issues/7541). [#7545](https://github.com/ClickHouse/ClickHouse/pull/7545) ([alesapin](https://github.com/alesapin))
+* Do not account memory for Buffer engine in max_memory_usage limit [#7552](https://github.com/ClickHouse/ClickHouse/pull/7552) ([Azat Khuzhin](https://github.com/azat))
+* Fix final mark usage in `MergeTree` tables ordered by `tuple()`. In rare cases it could lead to `Can't adjust last granule` error while select. [#7639](https://github.com/ClickHouse/ClickHouse/pull/7639) ([Anton Popov](https://github.com/CurtizJ))
+* Fix bug in mutations that have predicate with actions that require context (for example functions for json), which may lead to crashes or strange exceptions. [#7664](https://github.com/ClickHouse/ClickHouse/pull/7664) ([alesapin](https://github.com/alesapin))
+* Fix mismatch of database and table names escaping in `data/` and `shadow/` directories [#7575](https://github.com/ClickHouse/ClickHouse/pull/7575) ([Alexander Burmak](https://github.com/Alex-Burmak))
+* Support duplicated keys in RIGHT|FULL JOINs, e.g. ```ON t.x = u.x AND t.x = u.y```. Fix crash in this case. [#7586](https://github.com/ClickHouse/ClickHouse/pull/7586) ([Artem Zuikov](https://github.com/4ertus2))
+* Fix `Not found column <expression> in block` when joining on expression with RIGHT or FULL JOIN. [#7641](https://github.com/ClickHouse/ClickHouse/pull/7641) ([Artem Zuikov](https://github.com/4ertus2))
+* One more attempt to fix infinite loop in `PrettySpace` format [#7591](https://github.com/ClickHouse/ClickHouse/pull/7591) ([Olga Khvostikova](https://github.com/stavrolia))
+* Fix bug in `concat` function when all arguments were `FixedString` of the same size. [#7635](https://github.com/ClickHouse/ClickHouse/pull/7635) ([alesapin](https://github.com/alesapin))
+* Fixed exception in case of using 1 argument while defining S3, URL and HDFS storages. [#7618](https://github.com/ClickHouse/ClickHouse/pull/7618) ([Vladimir Chebotarev](https://github.com/excitoon))
+* Fix scope of the InterpreterSelectQuery for views with query [#7601](https://github.com/ClickHouse/ClickHouse/pull/7601) ([Azat Khuzhin](https://github.com/azat))
+
+### Improvement
+* `Nullable` columns recognized and NULL-values handled correctly by ODBC-bridge [#7402](https://github.com/ClickHouse/ClickHouse/pull/7402) ([Vasily Nemkov](https://github.com/Enmk))
+* Write current batch for distributed send atomically [#7600](https://github.com/ClickHouse/ClickHouse/pull/7600) ([Azat Khuzhin](https://github.com/azat))
+* Throw an exception if we cannot detect table for column name in query. [#7358](https://github.com/ClickHouse/ClickHouse/pull/7358) ([Artem Zuikov](https://github.com/4ertus2))
+* Add `merge_max_block_size` setting to `MergeTreeSettings` [#7412](https://github.com/ClickHouse/ClickHouse/pull/7412) ([Artem Zuikov](https://github.com/4ertus2))
+* Queries with `HAVING` and without `GROUP BY` assume group by constant. So, `SELECT 1 HAVING 1` now returns a result. [#7496](https://github.com/ClickHouse/ClickHouse/pull/7496) ([Amos Bird](https://github.com/amosbird))
+* Support parsing `(X,)` as tuple similar to python. [#7501](https://github.com/ClickHouse/ClickHouse/pull/7501), [#7562](https://github.com/ClickHouse/ClickHouse/pull/7562) ([Amos Bird](https://github.com/amosbird))
+* Make `range` function behaviors almost like pythonic one. [#7518](https://github.com/ClickHouse/ClickHouse/pull/7518) ([sundyli](https://github.com/sundy-li))
+* Add `constraints` columns to table `system.settings` [#7553](https://github.com/ClickHouse/ClickHouse/pull/7553) ([Vitaly Baranov](https://github.com/vitlibar))
+* Better Null format for tcp handler, so that it's possible to use `select ignore(<expression>) from table format Null` for perf measure via clickhouse-client [#7606](https://github.com/ClickHouse/ClickHouse/pull/7606) ([Amos Bird](https://github.com/amosbird))
+* Queries like `CREATE TABLE ... AS (SELECT (1, 2))` are parsed correctly [#7542](https://github.com/ClickHouse/ClickHouse/pull/7542) ([hcz](https://github.com/hczhcz))
+
+### Performance Improvement
+* The performance of aggregation over short string keys is improved. [#6243](https://github.com/ClickHouse/ClickHouse/pull/6243) ([Alexander Kuzmenkov](https://github.com/akuzm), [Amos Bird](https://github.com/amosbird))
+* Run another pass of syntax/expression analysis to get potential optimizations after constant predicates are folded. [#7497](https://github.com/ClickHouse/ClickHouse/pull/7497) ([Amos Bird](https://github.com/amosbird))
+* Use storage meta info to evaluate trivial `SELECT count() FROM table;` [#7510](https://github.com/ClickHouse/ClickHouse/pull/7510) ([Amos Bird](https://github.com/amosbird), [alexey-milovidov](https://github.com/alexey-milovidov))
+* Vectorize processing `arrayReduce` similar to Aggregator `addBatch`. [#7608](https://github.com/ClickHouse/ClickHouse/pull/7608) ([Amos Bird](https://github.com/amosbird))
+* Minor improvements in performance of `Kafka` consumption [#7475](https://github.com/ClickHouse/ClickHouse/pull/7475) ([Ivan](https://github.com/abyss7))
+
+### Build/Testing/Packaging Improvement
+* Add support for cross-compiling to the CPU architecture AARCH64. Refactor packager script. [#7370](https://github.com/ClickHouse/ClickHouse/pull/7370) [#7539](https://github.com/ClickHouse/ClickHouse/pull/7539) ([Ivan](https://github.com/abyss7))
+* Unpack darwin-x86_64 and linux-aarch64 toolchains into mounted Docker volume when building packages [#7534](https://github.com/ClickHouse/ClickHouse/pull/7534) ([Ivan](https://github.com/abyss7))
+* Update Docker Image for Binary Packager [#7474](https://github.com/ClickHouse/ClickHouse/pull/7474) ([Ivan](https://github.com/abyss7))
+* Fixed compile errors on MacOS Catalina [#7585](https://github.com/ClickHouse/ClickHouse/pull/7585) ([Ernest Poletaev](https://github.com/ernestp))
+* Some refactoring in query analysis logic: split complex class into several simple ones. [#7454](https://github.com/ClickHouse/ClickHouse/pull/7454) ([Artem Zuikov](https://github.com/4ertus2))
+* Fix build without submodules [#7295](https://github.com/ClickHouse/ClickHouse/pull/7295) ([proller](https://github.com/proller))
+* Better `add_globs` in CMake files [#7418](https://github.com/ClickHouse/ClickHouse/pull/7418) ([Amos Bird](https://github.com/amosbird))
+* Remove hardcoded paths in `unwind` target [#7460](https://github.com/ClickHouse/ClickHouse/pull/7460) ([Konstantin Podshumok](https://github.com/podshumok))
+* Allow to use mysql format without ssl [#7524](https://github.com/ClickHouse/ClickHouse/pull/7524) ([proller](https://github.com/proller))
+
+### Other
+* Added ANTLR4 grammar for ClickHouse SQL dialect [#7595](https://github.com/ClickHouse/ClickHouse/issues/7595) [#7596](https://github.com/ClickHouse/ClickHouse/pull/7596) ([alexey-milovidov](https://github.com/alexey-milovidov))
+
+
 ## ClickHouse release v19.16.2.2, 2019-10-30

 ### Backward Incompatible Change
@@ -128,7 +207,7 @@ Kuzmenkov](https://github.com/akuzm))
 Zuikov](https://github.com/4ertus2))
 * Optimize partial merge join. [#7070](https://github.com/ClickHouse/ClickHouse/pull/7070)
  ([Artem Zuikov](https://github.com/4ertus2))
-* Do not use more then 98K of memory in uniqCombined functions.
+* Do not use more than 98K of memory in uniqCombined functions.
  [#7236](https://github.com/ClickHouse/ClickHouse/pull/7236),
 [#7270](https://github.com/ClickHouse/ClickHouse/pull/7270) ([Azat
 Khuzhin](https://github.com/azat))
@@ -396,7 +475,7 @@ fix comments to make obvious that it may throw.
 * Fix segfault with enabled `optimize_skip_unused_shards` and missing sharding key. [#6384](https://github.com/ClickHouse/ClickHouse/pull/6384) ([Anton Popov](https://github.com/CurtizJ))
 * Fixed wrong code in mutations that may lead to memory corruption. Fixed segfault with read of address `0x14c0` that may happed due to concurrent `DROP TABLE` and `SELECT` from `system.parts` or `system.parts_columns`. Fixed race condition in preparation of mutation queries. Fixed deadlock caused by `OPTIMIZE` of Replicated tables and concurrent modification operations like ALTERs. [#6514](https://github.com/ClickHouse/ClickHouse/pull/6514) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Removed extra verbose logging in MySQL interface [#6389](https://github.com/ClickHouse/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov))
-* Return ability to parse boolean settings from 'true' and 'false' in configuration file. [#6278](https://github.com/ClickHouse/ClickHouse/pull/6278) ([alesapin](https://github.com/alesapin))
+* Return the ability to parse boolean settings from 'true' and 'false' in the configuration file. [#6278](https://github.com/ClickHouse/ClickHouse/pull/6278) ([alesapin](https://github.com/alesapin))
 * Fix crash in `quantile` and `median` function over `Nullable(Decimal128)`. [#6378](https://github.com/ClickHouse/ClickHouse/pull/6378) ([Artem Zuikov](https://github.com/4ertus2))
 * Fixed possible incomplete result returned by `SELECT` query with `WHERE` condition on primary key contained conversion to Float type. It was caused by incorrect checking of monotonicity in `toFloat` function. [#6248](https://github.com/ClickHouse/ClickHouse/issues/6248) [#6374](https://github.com/ClickHouse/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
 * Check `max_expanded_ast_elements` setting for mutations. Clear mutations after `TRUNCATE TABLE`. [#6205](https://github.com/ClickHouse/ClickHouse/pull/6205) ([Winter Zhang](https://github.com/zhang2014))
@@ -424,8 +503,8 @@ fix comments to make obvious that it may throw.
 * Fix bug with writing secondary indices marks with adaptive granularity. [#6126](https://github.com/ClickHouse/ClickHouse/pull/6126) ([alesapin](https://github.com/alesapin))
 * Fix initialization order while server startup. Since `StorageMergeTree::background_task_handle` is initialized in `startup()` the `MergeTreeBlockOutputStream::write()` may try to use it before initialization. Just check if it is initialized. [#6080](https://github.com/ClickHouse/ClickHouse/pull/6080) ([Ivan](https://github.com/abyss7))
 * Clearing the data buffer from the previous read operation that was completed with an error. [#6026](https://github.com/ClickHouse/ClickHouse/pull/6026) ([Nikolay](https://github.com/bopohaa))
-* Fix bug with enabling adaptive granularity when creating new replica for Replicated*MergeTree table. [#6394](https://github.com/ClickHouse/ClickHouse/issues/6394) [#6452](https://github.com/ClickHouse/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
-* Fixed possible crash during server startup in case of exception happened in `libunwind` during exception at access to uninitialised `ThreadStatus` structure. [#6456](https://github.com/ClickHouse/ClickHouse/pull/6456) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
+* Fix bug with enabling adaptive granularity when creating a new replica for Replicated*MergeTree table. [#6394](https://github.com/ClickHouse/ClickHouse/issues/6394) [#6452](https://github.com/ClickHouse/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
+* Fixed possible crash during server startup in case of exception happened in `libunwind` during exception at access to uninitialized `ThreadStatus` structure. [#6456](https://github.com/ClickHouse/ClickHouse/pull/6456) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
 * Fix crash in `yandexConsistentHash` function. Found by fuzz test. [#6304](https://github.com/ClickHouse/ClickHouse/issues/6304) [#6305](https://github.com/ClickHouse/ClickHouse/pull/6305) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed the possibility of hanging queries when server is overloaded and global thread pool becomes near full. This have higher chance to happen on clusters with large number of shards (hundreds), because distributed queries allocate a thread per connection to each shard. For example, this issue may reproduce if a cluster of 330 shards is processing 30 concurrent distributed queries. This issue affects all versions starting from 19.2. [#6301](https://github.com/ClickHouse/ClickHouse/pull/6301) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed logic of `arrayEnumerateUniqRanked` function. [#6423](https://github.com/ClickHouse/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov))
@@ -598,7 +677,7 @@ fix comments to make obvious that it may throw.

 ### Backward Incompatible Change
 * Removed rarely used table function `catBoostPool` and storage `CatBoostPool`. If you have used this table function, please write email to `clickhouse-feedback@yandex-team.com`. Note that CatBoost integration remains and will be supported. [#6279](https://github.com/ClickHouse/ClickHouse/pull/6279) ([alexey-milovidov](https://github.com/alexey-milovidov))
-* Disable `ANY RIGHT JOIN` and `ANY FULL JOIN` by default. Set `any_join_get_any_from_right_table` setting to enable them. [#5126](https://github.com/ClickHouse/ClickHouse/issues/5126) [#6351](https://github.com/ClickHouse/ClickHouse/pull/6351) ([Artem Zuikov](https://github.com/4ertus2))
+* Disable `ANY RIGHT JOIN` and `ANY FULL JOIN` by default. Set `any_join_distinct_right_table_keys` setting to enable them. [#5126](https://github.com/ClickHouse/ClickHouse/issues/5126) [#6351](https://github.com/ClickHouse/ClickHouse/pull/6351) ([Artem Zuikov](https://github.com/4ertus2))

 ## ClickHouse release 19.13.6.51, 2019-10-02

@@ -669,7 +748,7 @@ fix comments to make obvious that it may throw.
 * Fix kafka tests. [#6805](https://github.com/ClickHouse/ClickHouse/pull/6805) ([Ivan](https://github.com/abyss7))

 ### Security Fix
-* If the attacker has write access to ZooKeeper and is able to run custom server available from the network where ClickHouse run, it can create custom-built malicious server that will act as ClickHouse replica and register it in ZooKeeper. When another replica will fetch data part from malicious replica, it can force clickhouse-server to write to arbitrary path on filesystem. Found by Eldar Zaitov, information security team at Yandex. [#6247](https://github.com/ClickHouse/ClickHouse/pull/6247) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* If the attacker has write access to ZooKeeper and is able to run custom server available from the network where ClickHouse runs, it can create custom-built malicious server that will act as ClickHouse replica and register it in ZooKeeper. When another replica will fetch data part from malicious replica, it can force clickhouse-server to write to arbitrary path on filesystem. Found by Eldar Zaitov, information security team at Yandex. [#6247](https://github.com/ClickHouse/ClickHouse/pull/6247) ([alexey-milovidov](https://github.com/alexey-milovidov))

 ## ClickHouse release 19.13.3.26, 2019-08-22

@@ -697,7 +776,7 @@ fix comments to make obvious that it may throw.
 * Now client receive logs from server with any desired level by setting `send_logs_level` regardless to the log level specified in server settings. [#5964](https://github.com/ClickHouse/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))

 ### Backward Incompatible Change
-* The setting `input_format_defaults_for_omitted_fields` is enabled by default. Inserts in Distibuted tables need this setting to be the same on cluster (you need to set it before rolling update). It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats. It should be the expected behaviour but may lead to negligible performance difference. [#6043](https://github.com/ClickHouse/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/ClickHouse/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))
+* The setting `input_format_defaults_for_omitted_fields` is enabled by default. Inserts in Distributed tables need this setting to be the same on cluster (you need to set it before rolling update). It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats. It should be the expected behavior but may lead to negligible performance difference. [#6043](https://github.com/ClickHouse/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/ClickHouse/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))

 ### Experimental features
 * New query processing pipeline. Use `experimental_use_processors=1` option to enable it. Use for your own trouble. [#4914](https://github.com/ClickHouse/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
@@ -1478,7 +1557,7 @@ lee](https://github.com/neverlee))

 ### Bug fixes

-* Fixed error in #3920. This error manifestate itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. This bug first appeared in version 19.1 and is present in versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/ClickHouse/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed error in #3920. This error manifests itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. This bug first appeared in version 19.1 and is present in versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/ClickHouse/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))


 ## ClickHouse release 19.3.6, 2019-03-02
@@ -2335,7 +2414,7 @@ The expression must be a chain of equalities joined by the AND operator. Each si

 ### Improvements:

-* Changed the numbering scheme for release versions. Now the first part contains the year of release (A.D., Moscow timezone, minus 2000), the second part contains the number for major changes (increases for most releases), and the third part is the patch version. Releases are still backwards compatible, unless otherwise stated in the changelog.
+* Changed the numbering scheme for release versions. Now the first part contains the year of release (A.D., Moscow timezone, minus 2000), the second part contains the number for major changes (increases for most releases), and the third part is the patch version. Releases are still backward compatible, unless otherwise stated in the changelog.
 * Faster conversions of floating-point numbers to a string ([Amos Bird](https://github.com/ClickHouse/ClickHouse/pull/2664)).
 * If some rows were skipped during an insert due to parsing errors (this is possible with the `input_allow_errors_num` and `input_allow_errors_ratio` settings enabled), the number of skipped rows is now written to the server log ([Leonardo Cecchi](https://github.com/ClickHouse/ClickHouse/pull/2669)).

@@ -2534,7 +2613,7 @@ The expression must be a chain of equalities joined by the AND operator. Each si
 * Configuration of the table level for the `ReplicatedMergeTree` family in order to minimize the amount of data stored in Zookeeper: : `use_minimalistic_checksums_in_zookeeper = 1`
 * Configuration of the `clickhouse-client` prompt. By default, server names are now output to the prompt. The server's display name can be changed. It's also sent in the `X-ClickHouse-Display-Name` HTTP header (Kirill Shvakov).
 * Multiple comma-separated `topics` can be specified for the `Kafka` engine  (Tobias Adamson)
-* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was cancelled` exception instead of an incomplete result.
+* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was canceled` exception instead of an incomplete result.

 ### Improvements:

--- a/README.md
+++ b/README.md
@@ -13,9 +13,5 @@ ClickHouse is an open-source column-oriented database management system that all
 * You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.

 ## Upcoming Events
-* [ClickHouse Meetup in Tokyo](https://clickhouse.connpass.com/event/147001/) on November 14.
-* [ClickHouse Meetup in Istanbul](https://www.eventbrite.com/e/clickhouse-meetup-istanbul-create-blazing-fast-experiences-w-clickhouse-tickets-73101120419) on November 19.
-* [ClickHouse Meetup in Ankara](https://www.eventbrite.com/e/clickhouse-meetup-ankara-create-blazing-fast-experiences-w-clickhouse-tickets-73100530655) on November 21.
-* [ClickHouse Meetup in Singapore](https://www.meetup.com/Singapore-Clickhouse-Meetup-Group/events/265085331/) on November 23.
-* [ClickHouse Meetup in San Francisco](https://www.eventbrite.com/e/clickhouse-december-meetup-registration-78642047481) on December 3.

+* [ClickHouse Meetup in Moscow](https://yandex.ru/promo/clickhouse/moscow-december-2019) on December 11.
--- a/contrib/libhdfs3-cmake/CMakeLists.txt
+++ b/contrib/libhdfs3-cmake/CMakeLists.txt
@@ -182,6 +182,9 @@ set(SRCS
    ${HDFS3_SOURCE_DIR}/common/FileWrapper.h
    )

+# old kernels (< 3.17) don't have SYS_getrandom. Always use POSIX implementation to have better compatibility
+set_source_files_properties(${HDFS3_SOURCE_DIR}/rpc/RpcClient.cpp PROPERTIES COMPILE_FLAGS "-DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX=1")
+
 # target
 add_library(hdfs3 ${SRCS} ${PROTO_SOURCES} ${PROTO_HEADERS})

--- a/contrib/libunwind
+++ b/contrib/libunwind
@@ -1 +1 @@
-Subproject commit 5afe6d87ae9e66485c7fcb106d2f7c2c0359c8f6
+Subproject commit 68cffcbbd1840e14664a5f7f19c5e43f65c525b5
--- a/contrib/libunwind-cmake/CMakeLists.txt
+++ b/contrib/libunwind-cmake/CMakeLists.txt
@@ -11,7 +11,9 @@ endif ()
 set(LIBUNWIND_C_SOURCES
    ${LIBUNWIND_SOURCE_DIR}/src/UnwindLevel1.c
    ${LIBUNWIND_SOURCE_DIR}/src/UnwindLevel1-gcc-ext.c
-    ${LIBUNWIND_SOURCE_DIR}/src/Unwind-sjlj.c)
+    ${LIBUNWIND_SOURCE_DIR}/src/Unwind-sjlj.c
+    # Use unw_backtrace to override libgcc's backtrace symbol for better ABI compatibility
+    unwind-override.c)
 set_source_files_properties(${LIBUNWIND_C_SOURCES} PROPERTIES COMPILE_FLAGS "-std=c99")

 set(LIBUNWIND_ASM_SOURCES
--- a/contrib/libunwind-cmake/unwind-override.c
+++ b/contrib/libunwind-cmake/unwind-override.c
@@ -0,0 +1,6 @@
+#include <libunwind.h>
+
+int backtrace(void ** buffer, int size)
+{
+    return unw_backtrace(buffer, size);
+}
--- a/contrib/poco
+++ b/contrib/poco
@@ -1 +1 @@
-Subproject commit 6216cc01a107ce149863411ca29013a224f80343
+Subproject commit 2b273bfe9db89429b2040c024484dee0197e48c7
--- a/contrib/protobuf
+++ b/contrib/protobuf
@@ -1 +1 @@
-Subproject commit 12735370922a35f03999afff478e1c6d7aa917a4
+Subproject commit 0795fa6bc443666068bec56bf700e1f488f592f1
--- a/dbms/CMakeLists.txt
+++ b/dbms/CMakeLists.txt
@@ -76,7 +76,7 @@ elseif (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
 endif()

 if (USE_DEBUG_HELPERS)
-    set (INCLUDE_DEBUG_HELPERS "-include ${ClickHouse_SOURCE_DIR}/libs/libcommon/include/common/iostream_debug_helpers.h")
+    set (INCLUDE_DEBUG_HELPERS "-I${ClickHouse_SOURCE_DIR}/libs/libcommon/include -include ${ClickHouse_SOURCE_DIR}/dbms/src/Core/iostream_debug_helpers.h")
    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${INCLUDE_DEBUG_HELPERS}")
 endif ()

@@ -376,6 +376,10 @@ if (USE_POCO_MONGODB)
    dbms_target_link_libraries (PRIVATE ${Poco_MongoDB_LIBRARY})
 endif()

+if (USE_POCO_REDIS)
+    dbms_target_link_libraries (PRIVATE ${Poco_Redis_LIBRARY})
+endif()
+
 if (USE_POCO_NETSSL)
    target_link_libraries (clickhouse_common_io PRIVATE ${Poco_NetSSL_LIBRARY} ${Poco_Crypto_LIBRARY})
    dbms_target_link_libraries (PRIVATE ${Poco_NetSSL_LIBRARY} ${Poco_Crypto_LIBRARY})
@@ -428,6 +432,8 @@ if (USE_JEMALLOC)

    if(NOT MAKE_STATIC_LIBRARIES AND ${JEMALLOC_LIBRARIES} MATCHES "${CMAKE_STATIC_LIBRARY_SUFFIX}$")
        # mallctl in dbms/src/Interpreters/AsynchronousMetrics.cpp
+        # Actually we link JEMALLOC to almost all libraries.
+        # This is just hotfix for some uninvestigated problem.
        target_link_libraries(clickhouse_interpreters PRIVATE ${JEMALLOC_LIBRARIES})
    endif()
 endif ()
--- a/dbms/programs/client/Client.cpp
+++ b/dbms/programs/client/Client.cpp
@@ -97,8 +97,7 @@
 #define BRACK_PASTE_LAST '~'
 #define BRACK_PASTE_SLEN 6

-/// Make sure we don't get ^J for the enter character.
-/// This handler also bypasses some unused macro/event checkings.
+/// This handler bypasses some unused macro/event checkings.
 static int clickhouse_rl_bracketed_paste_begin(int /* count */, int /* key */)
 {
    std::string buf;
@@ -106,10 +105,10 @@ static int clickhouse_rl_bracketed_paste_begin(int /* count */, int /* key */)

    RL_SETSTATE(RL_STATE_MOREINPUT);
    SCOPE_EXIT(RL_UNSETSTATE(RL_STATE_MOREINPUT));
-    char c;
+    int c;
    while ((c = rl_read_key()) >= 0)
    {
-        if (c == '\r' || c == '\n')
+        if (c == '\r')
            c = '\n';
        buf.push_back(c);
        if (buf.size() >= BRACK_PASTE_SLEN && c == BRACK_PASTE_LAST && buf.substr(buf.size() - BRACK_PASTE_SLEN) == BRACK_PASTE_SUFF)
@@ -1112,7 +1111,14 @@ private:
            /// Check if server send Exception packet
            auto packet_type = connection->checkPacket();
            if (packet_type && *packet_type == Protocol::Server::Exception)
+            {
+                /*
+                 * We're exiting with error, so it makes sense to kill the
+                 * input stream without waiting for it to complete.
+                 */
+                async_block_input->cancel(true);
                return;
+            }

            connection->sendData(block);
            processed_rows += block.rows();
@@ -1226,7 +1232,7 @@ private:
    /// Returns true if one should continue receiving packets.
    bool receiveAndProcessPacket()
    {
-        Connection::Packet packet = connection->receivePacket();
+        Packet packet = connection->receivePacket();

        switch (packet.type)
        {
@@ -1274,7 +1280,7 @@ private:
    {
        while (true)
        {
-            Connection::Packet packet = connection->receivePacket();
+            Packet packet = connection->receivePacket();

            switch (packet.type)
            {
@ -1308,7 +1314,7 @@ private:
    {
        while (true)
        {
-            Connection::Packet packet = connection->receivePacket();
+            Packet packet = connection->receivePacket();

            switch (packet.type)
            {
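A note on the `char c` to `int c` change in `clickhouse_rl_bracketed_paste_begin` above: `rl_read_key()` returns an `int`, and the loop relies on a negative value to signal the end of input. A plausible motivation (an assumption; the commit does not state it) is that narrowing the result to `char` makes bytes of 0x80 and above look negative where `char` is signed, and makes the `>= 0` test always true where it is unsigned. The sketch below mirrors the shape of the patched loop against a hypothetical `FakeKeySource` that stands in for readline. `FakeKeySource` and `collectPaste` are illustrative names, not part of the client, and the bracketed-paste suffix check is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for rl_read_key(): yields each byte as 0..255,
// then -1 once the input is exhausted.
struct FakeKeySource
{
    std::string data;
    std::size_t pos = 0;

    int readKey()
    {
        if (pos >= data.size())
            return -1;
        return static_cast<unsigned char>(data[pos++]);
    }
};

// Same loop shape as the patched handler: keep the key in an int so that
// the end-of-input marker (-1) cannot collide with a high byte such as 0xD0.
std::string collectPaste(FakeKeySource & source)
{
    std::string buf;
    int c;
    while ((c = source.readKey()) >= 0)
    {
        assert(c <= 255);  // readKey() only yields byte values or -1
        if (c == '\r')
            c = '\n';
        buf.push_back(static_cast<char>(c));
    }
    return buf;
}
```

With `int c`, a pasted UTF-8 sequence passes through the loop unchanged and only the carriage return is rewritten to a newline.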
--- a/dbms/programs/client/Suggest.h
+++ b/dbms/programs/client/Suggest.h
@ -113,7 +113,7 @@ private:

        while (true)
        {
-            Connection::Packet packet = connection.receivePacket();
+            Packet packet = connection.receivePacket();
            switch (packet.type)
            {
                case Protocol::Server::Data:
--- a/dbms/programs/copier/ClusterCopier.cpp
+++ b/dbms/programs/copier/ClusterCopier.cpp
@ -1,6 +1,7 @@
 #include "ClusterCopier.h"

 #include <chrono>
+#include <optional>
 #include <Poco/Util/XMLConfiguration.h>
 #include <Poco/Logger.h>
 #include <Poco/ConsoleChannel.h>
@ -178,7 +179,9 @@ struct ShardPartition
    ShardPartition(TaskShard & parent, const String & name_quoted_) : task_shard(parent), name(name_quoted_) {}

    String getPartitionPath() const;
+    String getPartitionCleanStartPath() const;
    String getCommonPartitionIsDirtyPath() const;
+    String getCommonPartitionIsCleanedPath() const;
    String getPartitionActiveWorkersPath() const;
    String getActiveWorkerPath() const;
    String getPartitionShardsPath() const;
@ -259,6 +262,8 @@ struct TaskTable

    String getPartitionPath(const String & partition_name) const;
    String getPartitionIsDirtyPath(const String & partition_name) const;
+    String getPartitionIsCleanedPath(const String & partition_name) const;
+    String getPartitionTaskStatusPath(const String & partition_name) const;

    String name_in_config;

@ -369,23 +374,6 @@ struct MultiTransactionInfo
    Coordination::Responses responses;
 };

-
-/// Atomically checks that is_dirty node is not exists, and made the remaining op
-/// Returns relative number of failed operation in the second field (the passed op has 0 index)
-static MultiTransactionInfo checkNoNodeAndCommit(
-    const zkutil::ZooKeeperPtr & zookeeper,
-    const String & checking_node_path,
-    Coordination::RequestPtr && op)
-{
-    MultiTransactionInfo info;
-    info.requests.emplace_back(zkutil::makeCreateRequest(checking_node_path, "", zkutil::CreateMode::Persistent));
-    info.requests.emplace_back(zkutil::makeRemoveRequest(checking_node_path, -1));
-    info.requests.emplace_back(std::move(op));
-    info.code = zookeeper->tryMulti(info.requests, info.responses);
-    return info;
-}
-
-
 // Creates AST representing 'ENGINE = Distributed(cluster, db, table, [sharding_key])
 std::shared_ptr<ASTStorage> createASTStorageDistributed(
    const String & cluster_name, const String & database, const String & table, const ASTPtr & sharding_key_ast = nullptr)
@ -431,6 +419,11 @@ String TaskTable::getPartitionPath(const String & partition_name) const
           + "/" + escapeForFileName(partition_name);   // 201701
 }

+String ShardPartition::getPartitionCleanStartPath() const
+{
+    return getPartitionPath() + "/clean_start";
+}
+
 String ShardPartition::getPartitionPath() const
 {
    return task_shard.task_table.getPartitionPath(name);
@ -438,8 +431,9 @@ String ShardPartition::getPartitionPath() const

 String ShardPartition::getShardStatusPath() const
 {
-    // /root/table_test.hits/201701/1
-    return getPartitionPath() + "/shards/" + toString(task_shard.numberInCluster());
+    // schema: /<root...>/tables/<table>/<partition>/shards/<shard>
+    // e.g. /root/table_test.hits/201701/shards/1
+    return getPartitionShardsPath() + "/" + toString(task_shard.numberInCluster());
 }

 String ShardPartition::getPartitionShardsPath() const
@ -462,11 +456,26 @@ String ShardPartition::getCommonPartitionIsDirtyPath() const
    return getPartitionPath() + "/is_dirty";
 }

+String ShardPartition::getCommonPartitionIsCleanedPath() const
+{
+    return getCommonPartitionIsDirtyPath() + "/cleaned";
+}
+
 String TaskTable::getPartitionIsDirtyPath(const String & partition_name) const
 {
    return getPartitionPath(partition_name) + "/is_dirty";
 }

+String TaskTable::getPartitionIsCleanedPath(const String & partition_name) const
+{
+    return getPartitionIsDirtyPath(partition_name) + "/cleaned";
+}
+
+String TaskTable::getPartitionTaskStatusPath(const String & partition_name) const
+{
+    return getPartitionPath(partition_name) + "/shards";
+}
+
 String DB::TaskShard::getDescription() const
 {
    std::stringstream ss;
@ -1129,9 +1138,9 @@ protected:
    }

    /** Checks that the whole partition of a table was copied. We should do it carefully due to dirty lock.
-     * State of some task could be changed during the processing.
-     * We have to ensure that all shards have the finished state and there are no dirty flag.
-     * Moreover, we have to check status twice and check zxid, because state could be changed during the checking.
+     * State of some task could change during the processing.
+     * We have to ensure that all shards have the finished state and there is no dirty flag.
+     * Moreover, we have to check status twice and check zxid, because state can change during the checking.
     */
    bool checkPartitionIsDone(const TaskTable & task_table, const String & partition_name, const TasksShard & shards_with_partition)
    {
@ -1170,10 +1179,22 @@ protected:
            }

            // Check that partition is not dirty
-            if (zookeeper->exists(task_table.getPartitionIsDirtyPath(partition_name)))
            {
-                LOG_INFO(log, "Partition " << partition_name << " become dirty");
-                return false;
+                CleanStateClock clean_state_clock (
+                                                   zookeeper,
+                                                   task_table.getPartitionIsDirtyPath(partition_name),
+                                                   task_table.getPartitionIsCleanedPath(partition_name)
+                                                   );
+                Coordination::Stat stat;
+                LogicalClock task_start_clock;
+                if (zookeeper->exists(task_table.getPartitionTaskStatusPath(partition_name), &stat))
+                    task_start_clock = LogicalClock(stat.mzxid);
+                zookeeper->get(task_table.getPartitionTaskStatusPath(partition_name), &stat);
+                if (!clean_state_clock.is_clean() || task_start_clock <= clean_state_clock.discovery_zxid)
+                {
+                    LOG_INFO(log, "Partition " << partition_name << " became dirty");
+                    return false;
+                }
            }

            get_futures.clear();
@ -1260,17 +1281,135 @@ protected:
        return res;
    }

-    bool tryDropPartition(ShardPartition & task_partition, const zkutil::ZooKeeperPtr & zookeeper)
+    class LogicalClock
+    {
+    public:
+        std::optional<UInt64> zxid;
+
+        LogicalClock() = default;
+
+        LogicalClock(UInt64 _zxid)
+            : zxid(_zxid)
+        {}
+
+        bool hasHappened() const
+        {
+            return bool(zxid);
+        }
+
+        // happens-before relation with a reasonable time bound
+        bool happensBefore(const LogicalClock & other) const
+        {
+            const UInt64 HALF = 1ull << 63;
+            return
+                !zxid ||
+                (other.zxid && *zxid <= *other.zxid && *other.zxid - *zxid < HALF) ||
+                (other.zxid && *zxid >= *other.zxid && *zxid - *other.zxid > HALF);
+        }
+
+        bool operator<=(const LogicalClock & other) const
+        {
+            return happensBefore(other);
+        }
+
+        // strict equality check
+        bool operator==(const LogicalClock & other) const
+        {
+            return zxid == other.zxid;
+        }
+    };
+
+    class CleanStateClock
+    {
+    public:
+        LogicalClock discovery_zxid;
+        std::optional<UInt32> discovery_version;
+
+        LogicalClock clean_state_zxid;
+        std::optional<UInt32> clean_state_version;
+
+        std::shared_ptr<std::atomic_bool> stale;
+
+        bool is_clean() const
+        {
+            return
+                !is_stale()
+                && (
+                    !discovery_zxid.hasHappened()
+                    || (clean_state_zxid.hasHappened() && discovery_zxid <= clean_state_zxid));
+        }
+
+        bool is_stale() const
+        {
+            return stale->load();
+        }
+
+        CleanStateClock(
+                        const zkutil::ZooKeeperPtr & zookeeper,
+                        const String & discovery_path,
+                        const String & clean_state_path)
+            : stale(std::make_shared<std::atomic_bool>(false))
+        {
+            Coordination::Stat stat;
+            String _some_data;
+            auto watch_callback =
+                [stale = stale] (const Coordination::WatchResponse & rsp)
+                {
+                    auto logger = &Poco::Logger::get("ClusterCopier");
+                    if (rsp.error == Coordination::ZOK)
+                    {
+                        switch (rsp.type)
+                        {
+                        case Coordination::CREATED:
+                            LOG_DEBUG(logger, "CleanStateClock change: CREATED, at " << rsp.path);
+                            stale->store(true);
+                            break;
+                        case Coordination::CHANGED:
+                            LOG_DEBUG(logger, "CleanStateClock change: CHANGED, at " << rsp.path);
+                            stale->store(true);
+                        }
+                    }
+                };
+            if (zookeeper->tryGetWatch(discovery_path, _some_data, &stat, watch_callback))
+            {
+                discovery_zxid = LogicalClock(stat.mzxid);
+                discovery_version = stat.version;
+            }
+            if (zookeeper->tryGetWatch(clean_state_path, _some_data, &stat, watch_callback))
+            {
+                clean_state_zxid = LogicalClock(stat.mzxid);
+                clean_state_version = stat.version;
+            }
+        }
+
+        bool operator==(const CleanStateClock & other) const
+        {
+            return !is_stale()
+                && !other.is_stale()
+                && discovery_zxid == other.discovery_zxid
+                && discovery_version == other.discovery_version
+                && clean_state_zxid == other.clean_state_zxid
+                && clean_state_version == other.clean_state_version;
+        }
+
+        bool operator!=(const CleanStateClock & other) const
+        {
+            return !(*this == other);
+        }
+    };
+
+    bool tryDropPartition(ShardPartition & task_partition, const zkutil::ZooKeeperPtr & zookeeper, const CleanStateClock & clean_state_clock)
    {
        if (is_safe_mode)
            throw Exception("DROP PARTITION is prohibited in safe mode", ErrorCodes::NOT_IMPLEMENTED);

        TaskTable & task_table = task_partition.task_shard.task_table;

-        String current_shards_path = task_partition.getPartitionShardsPath();
-        String current_partition_active_workers_dir = task_partition.getPartitionActiveWorkersPath();
-        String is_dirty_flag_path = task_partition.getCommonPartitionIsDirtyPath();
-        String dirt_cleaner_path = is_dirty_flag_path + "/cleaner";
+        const String current_shards_path = task_partition.getPartitionShardsPath();
+        const String current_partition_active_workers_dir = task_partition.getPartitionActiveWorkersPath();
+        const String is_dirty_flag_path = task_partition.getCommonPartitionIsDirtyPath();
+        const String dirt_cleaner_path = is_dirty_flag_path + "/cleaner";
+        const String is_dirt_cleaned_path = task_partition.getCommonPartitionIsCleanedPath();

        zkutil::EphemeralNodeHolder::Ptr cleaner_holder;
        try
@ -1294,44 +1433,92 @@ protected:
        {
            if (stat.numChildren != 0)
            {
-                LOG_DEBUG(log, "Partition " << task_partition.name << " contains " << stat.numChildren << " active workers, sleep");
+                LOG_DEBUG(log, "Partition " << task_partition.name << " contains " << stat.numChildren << " active workers while trying to drop it. Going to sleep.");
                std::this_thread::sleep_for(default_sleep_time);
                return false;
            }
+            else
+            {
+                zookeeper->remove(current_partition_active_workers_dir);
+            }
        }

-        /// Remove all status nodes
-        zookeeper->tryRemoveRecursive(current_shards_path);
-
-        String query = "ALTER TABLE " + getQuotedTable(task_table.table_push);
-        query += " DROP PARTITION " + task_partition.name + "";
-
-        /// TODO: use this statement after servers will be updated up to 1.1.54310
-        // query += " DROP PARTITION ID '" + task_partition.name + "'";
-
-        ClusterPtr & cluster_push = task_table.cluster_push;
-        Settings settings_push = task_cluster->settings_push;
-
-        /// It is important, DROP PARTITION must be done synchronously
-        settings_push.replication_alter_partitions_sync = 2;
-
-        LOG_DEBUG(log, "Execute distributed DROP PARTITION: " << query);
-        /// Limit number of max executing replicas to 1
-        UInt64 num_shards = executeQueryOnCluster(cluster_push, query, nullptr, &settings_push, PoolMode::GET_ONE, 1);
-
-        if (num_shards < cluster_push->getShardCount())
        {
-            LOG_INFO(log, "DROP PARTITION wasn't successfully executed on " << cluster_push->getShardCount() - num_shards << " shards");
-            return false;
+            zkutil::EphemeralNodeHolder::Ptr active_workers_lock;
+            try
+            {
+                active_workers_lock = zkutil::EphemeralNodeHolder::create(current_partition_active_workers_dir, *zookeeper, host_id);
+            }
+            catch (const Coordination::Exception & e)
+            {
+                if (e.code == Coordination::ZNODEEXISTS)
+                {
+                    LOG_DEBUG(log, "Partition " << task_partition.name << " is being filled now by somebody, sleep");
+                    return false;
+                }
+
+                throw;
+            }
+
+            // Lock the dirty flag
+            zookeeper->set(is_dirty_flag_path, host_id, clean_state_clock.discovery_version.value());
+            zookeeper->tryRemove(task_partition.getPartitionCleanStartPath());
+            CleanStateClock my_clock(zookeeper, is_dirty_flag_path, is_dirt_cleaned_path);
+
+            /// Remove all status nodes
+            {
+                Strings children;
+                if (zookeeper->tryGetChildren(current_shards_path, children) == Coordination::ZOK)
+                    for (const auto & child : children)
+                    {
+                        zookeeper->removeRecursive(current_shards_path + "/" + child);
+                    }
+            }
+
+            String query = "ALTER TABLE " + getQuotedTable(task_table.table_push);
+            query += " DROP PARTITION " + task_partition.name + "";
+
+            /// TODO: use this statement after servers will be updated up to 1.1.54310
+            // query += " DROP PARTITION ID '" + task_partition.name + "'";
+
+            ClusterPtr & cluster_push = task_table.cluster_push;
+            Settings settings_push = task_cluster->settings_push;
+
+            /// It is important, DROP PARTITION must be done synchronously
+            settings_push.replication_alter_partitions_sync = 2;
+
+            LOG_DEBUG(log, "Execute distributed DROP PARTITION: " << query);
+            /// Limit number of max executing replicas to 1
+            UInt64 num_shards = executeQueryOnCluster(cluster_push, query, nullptr, &settings_push, PoolMode::GET_ONE, 1);
+
+            if (num_shards < cluster_push->getShardCount())
+            {
+                LOG_INFO(log, "DROP PARTITION wasn't successfully executed on " << cluster_push->getShardCount() - num_shards << " shards");
+                return false;
+            }
+
+            /// Update the locking node
+            if (!my_clock.is_stale())
+            {
+                zookeeper->set(is_dirty_flag_path, host_id, my_clock.discovery_version.value());
+                if (my_clock.clean_state_version)
+                    zookeeper->set(is_dirt_cleaned_path, host_id, my_clock.clean_state_version.value());
+                else
+                    zookeeper->create(is_dirt_cleaned_path, host_id, zkutil::CreateMode::Persistent);
+            }
+            else
+            {
+                LOG_DEBUG(log, "Clean state is altered when dropping the partition, cowardly bailing");
+                /// clean state is stale
+                return false;
+            }
+
+            LOG_INFO(log, "Partition " << task_partition.name << " was dropped on cluster " << task_table.cluster_push_name);
+            if (zookeeper->tryCreate(current_shards_path, host_id, zkutil::CreateMode::Persistent) == Coordination::ZNODEEXISTS)
+                zookeeper->set(current_shards_path, host_id);
        }

-        /// Remove the locking node
-        Coordination::Requests requests;
-        requests.emplace_back(zkutil::makeRemoveRequest(dirt_cleaner_path, -1));
-        requests.emplace_back(zkutil::makeRemoveRequest(is_dirty_flag_path, -1));
-        zookeeper->multi(requests);
-
-        LOG_INFO(log, "Partition " << task_partition.name << " was dropped on cluster " << task_table.cluster_push_name);
+        LOG_INFO(log, "Partition " << task_partition.name << " is safe for work now.");
        return true;
    }

@ -1362,6 +1549,7 @@ protected:

            /// Process each source shard having current partition and copy current partition
            /// NOTE: shards are sorted by "distance" to current host
+            bool has_shard_to_process = false;
            for (const TaskShardPtr & shard : task_table.all_shards)
            {
                /// Does shard have a node with current partition?
@ -1405,6 +1593,7 @@ protected:
                bool is_unprioritized_task = !previous_shard_is_instantly_finished && shard->priority.is_remote;
                PartitionTaskStatus task_status = PartitionTaskStatus::Error;
                bool was_error = false;
+                has_shard_to_process = true;
                for (UInt64 try_num = 0; try_num < max_shard_partition_tries; ++try_num)
                {
                    task_status = tryProcessPartitionTask(timeouts, partition, is_unprioritized_task);
@ -1432,11 +1621,13 @@ protected:
            cluster_partition.elapsed_time_seconds += watch.elapsedSeconds();

            /// Check that whole cluster partition is done
-            /// Firstly check number failed partition tasks, than look into ZooKeeper and ensure that each partition is done
+            /// Firstly check the number of failed partition tasks, then look into ZooKeeper and ensure that each partition is done
            bool partition_is_done = num_failed_shards == 0;
            try
            {
-                partition_is_done = partition_is_done && checkPartitionIsDone(task_table, partition_name, expected_shards);
+                partition_is_done =
+                    !has_shard_to_process
+                    || (partition_is_done && checkPartitionIsDone(task_table, partition_name, expected_shards));
            }
            catch (...)
            {
@ -1526,20 +1717,35 @@ protected:
        TaskTable & task_table = task_shard.task_table;
        ClusterPartition & cluster_partition = task_table.getClusterPartition(task_partition.name);

+        /// We need to update table definitions for each partition, because they could have changed after ALTER
+        createShardInternalTables(timeouts, task_shard);
+
        auto zookeeper = context.getZooKeeper();

-        String is_dirty_flag_path = task_partition.getCommonPartitionIsDirtyPath();
-        String current_task_is_active_path = task_partition.getActiveWorkerPath();
-        String current_task_status_path = task_partition.getShardStatusPath();
+        const String is_dirty_flag_path = task_partition.getCommonPartitionIsDirtyPath();
+        const String is_dirt_cleaned_path = task_partition.getCommonPartitionIsCleanedPath();
+        const String current_task_is_active_path = task_partition.getActiveWorkerPath();
+        const String current_task_status_path = task_partition.getShardStatusPath();

        /// Auxiliary functions:

        /// Creates is_dirty node to initialize DROP PARTITION
-        auto create_is_dirty_node = [&] ()
+        auto create_is_dirty_node = [&, this] (const CleanStateClock & clock)
        {
-            auto code = zookeeper->tryCreate(is_dirty_flag_path, current_task_status_path, zkutil::CreateMode::Persistent);
-            if (code && code != Coordination::ZNODEEXISTS)
-                throw Coordination::Exception(code, is_dirty_flag_path);
+            if (clock.is_stale())
+                LOG_DEBUG(log, "Clean state clock is stale while setting dirty flag, cowardly bailing");
+            else if (!clock.is_clean())
+                LOG_DEBUG(log, "Partition is already marked dirty, nothing to update");
+            else if (clock.discovery_version)
+            {
+                LOG_DEBUG(log, "Updating clean state clock");
+                zookeeper->set(is_dirty_flag_path, host_id, clock.discovery_version.value());
+            }
+            else
+            {
+                LOG_DEBUG(log, "Creating clean state clock");
+                zookeeper->create(is_dirty_flag_path, host_id, zkutil::CreateMode::Persistent);
+            }
        };

        /// Returns SELECT query filtering current partition and applying user filter
@ -1563,14 +1769,29 @@ protected:

        LOG_DEBUG(log, "Processing " << current_task_status_path);

+        CleanStateClock clean_state_clock (zookeeper, is_dirty_flag_path, is_dirt_cleaned_path);
+
+        LogicalClock task_start_clock;
+        {
+            Coordination::Stat stat;
+            if (zookeeper->exists(task_partition.getPartitionShardsPath(), &stat))
+                task_start_clock = LogicalClock(stat.mzxid);
+        }
+
        /// Do not start if partition is dirty, try to clean it
-        if (zookeeper->exists(is_dirty_flag_path))
+        if (clean_state_clock.is_clean()
+            && (!task_start_clock.hasHappened() || clean_state_clock.discovery_zxid <= task_start_clock))
+        {
+            LOG_DEBUG(log, "Partition " << task_partition.name << " appears to be clean");
+            zookeeper->createAncestors(current_task_status_path);
+        }
+        else
        {
            LOG_DEBUG(log, "Partition " << task_partition.name << " is dirty, try to drop it");

            try
            {
-                tryDropPartition(task_partition, zookeeper);
+                tryDropPartition(task_partition, zookeeper, clean_state_clock);
            }
            catch (...)
            {
@ -1598,7 +1819,8 @@ protected:
            throw;
        }

-        /// Exit if task has been already processed, create blocking node if it is abandoned
+        /// Exit if task has been already processed;
+        /// create blocking node to signal cleaning up if it is abandoned
        {
            String status_data;
            if (zookeeper->tryGet(current_task_status_path, status_data))
@ -1611,21 +1833,21 @@ protected:
                }

                // Task is abandoned, initialize DROP PARTITION
-                LOG_DEBUG(log, "Task " << current_task_status_path << " has not been successfully finished by " << status.owner);
+                LOG_DEBUG(log, "Task " << current_task_status_path << " has not been successfully finished by " << status.owner << ". Partition will be dropped and refilled.");

-                create_is_dirty_node();
+                create_is_dirty_node(clean_state_clock);
                return PartitionTaskStatus::Error;
            }
        }

-        zookeeper->createAncestors(current_task_status_path);
-
-        /// We need to update table definitions for each partition, it could be changed after ALTER
-        createShardInternalTables(timeouts, task_shard);
-
        /// Check that destination partition is empty if we are first worker
        /// NOTE: this check is incorrect if pull and push tables have different partition key!
+        String clean_start_status;
+        if (!zookeeper->tryGet(task_partition.getPartitionCleanStartPath(), clean_start_status) || clean_start_status != "ok")
        {
+            zookeeper->createIfNotExists(task_partition.getPartitionCleanStartPath(), "");
+            auto checker = zkutil::EphemeralNodeHolder::create(task_partition.getPartitionCleanStartPath() + "/checker", *zookeeper, host_id);
+            // Maybe we are the first worker
            ASTPtr query_select_ast = get_select_query(task_shard.table_split_shard, "count()");
            UInt64 count;
            {
@ -1643,36 +1865,38 @@ protected:
                Coordination::Stat stat_shards;
                zookeeper->get(task_partition.getPartitionShardsPath(), &stat_shards);

+                /// NOTE: partition is still fresh if dirt discovery happens before cleaning
                if (stat_shards.numChildren == 0)
                {
-                    LOG_WARNING(log, "There are no any workers for partition " << task_partition.name
+                    LOG_WARNING(log, "There are no workers for partition " << task_partition.name
                                     << ", but destination table contains " << count << " rows"
                                     << ". Partition will be dropped and refilled.");

-                    create_is_dirty_node();
+                    create_is_dirty_node(clean_state_clock);
                    return PartitionTaskStatus::Error;
                }
            }
+            zookeeper->set(task_partition.getPartitionCleanStartPath(), "ok");
        }
+        /// At this point, we need to sync that the destination table is clean
+        /// before any actual work

        /// Try start processing, create node about it
        {
            String start_state = TaskStateWithOwner::getData(TaskState::Started, host_id);
-            auto op_create = zkutil::makeCreateRequest(current_task_status_path, start_state, zkutil::CreateMode::Persistent);
-            MultiTransactionInfo info = checkNoNodeAndCommit(zookeeper, is_dirty_flag_path, std::move(op_create));
-
-            if (info.code)
+            CleanStateClock new_clean_state_clock (zookeeper, is_dirty_flag_path, is_dirt_cleaned_path);
+            if (clean_state_clock != new_clean_state_clock)
            {
-                zkutil::KeeperMultiException exception(info.code, info.requests, info.responses);
-
-                if (exception.getPathForFirstFailedOp() == is_dirty_flag_path)
-                {
-                    LOG_INFO(log, "Partition " << task_partition.name << " is dirty and will be dropped and refilled");
-                    return PartitionTaskStatus::Error;
-                }
-
-                throw exception;
+                LOG_INFO(log, "Partition " << task_partition.name << " clean state changed, cowardly bailing");
+                return PartitionTaskStatus::Error;
            }
+            else if (!new_clean_state_clock.is_clean())
+            {
+                LOG_INFO(log, "Partition " << task_partition.name << " is dirty and will be dropped and refilled");
+                create_is_dirty_node(new_clean_state_clock);
+                return PartitionTaskStatus::Error;
+            }
+            zookeeper->create(current_task_status_path, start_state, zkutil::CreateMode::Persistent);
        }

        /// Try create table (if not exists) on each shard
@ -1733,12 +1957,13 @@ protected:
                    output = io_insert.out;
                }

+                /// Fail-fast optimization to abort copying when the current clean state expires
                std::future<Coordination::ExistsResponse> future_is_dirty_checker;

                Stopwatch watch(CLOCK_MONOTONIC_COARSE);
                constexpr UInt64 check_period_milliseconds = 500;

-                /// Will asynchronously check that ZooKeeper connection and is_dirty flag appearing while copy data
+                /// Asynchronously checks whether the ZooKeeper session has expired or the is_dirty flag has appeared while copying data
                auto cancel_check = [&] ()
                {
                    if (zookeeper->expired())
@ -1754,7 +1979,12 @@ protected:
                        Coordination::ExistsResponse status = future_is_dirty_checker.get();

                        if (status.error != Coordination::ZNONODE)
+                        {
+                            LogicalClock dirt_discovery_epoch (status.stat.mzxid);
+                            if (dirt_discovery_epoch == clean_state_clock.discovery_zxid)
+                                return false;
                            throw Exception("Partition is dirty, cancel INSERT SELECT", ErrorCodes::UNFINISHED);
+                        }
                    }

                    return false;
@ -1789,20 +2019,19 @@ protected:
        /// Finalize the processing, change state of current partition task (and also check is_dirty flag)
        {
            String state_finished = TaskStateWithOwner::getData(TaskState::Finished, host_id);
-            auto op_set = zkutil::makeSetRequest(current_task_status_path, state_finished, 0);
-            MultiTransactionInfo info = checkNoNodeAndCommit(zookeeper, is_dirty_flag_path, std::move(op_set));
-
-            if (info.code)
+            CleanStateClock new_clean_state_clock (zookeeper, is_dirty_flag_path, is_dirt_cleaned_path);
+            if (clean_state_clock != new_clean_state_clock)
            {
-                zkutil::KeeperMultiException exception(info.code, info.requests, info.responses);
-
-                if (exception.getPathForFirstFailedOp() == is_dirty_flag_path)
-                    LOG_INFO(log, "Partition " << task_partition.name << " became dirty and will be dropped and refilled");
-                else
-                    LOG_INFO(log, "Someone made the node abandoned. Will refill partition. " << zkutil::ZooKeeper::error2string(info.code));
-
+                LOG_INFO(log, "Partition " << task_partition.name << " clean state changed, cowardly bailing");
                return PartitionTaskStatus::Error;
            }
+            else if (!new_clean_state_clock.is_clean())
+            {
+                LOG_INFO(log, "Partition " << task_partition.name << " became dirty and will be dropped and refilled");
+                create_is_dirty_node(new_clean_state_clock);
+                return PartitionTaskStatus::Error;
+            }
+            zookeeper->set(current_task_status_path, state_finished, 0);
        }

        LOG_INFO(log, "Partition " << task_partition.name << " copied");
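A note on `LogicalClock::happensBefore` in the ClusterCopier.cpp changes above: the comparison treats zxids as points on a 64-bit ring, so a clock counts as earlier when it is less than half the range behind the other one, even across a wrap-around. The standalone sketch below restates that rule with early returns. `LogicalClockSketch` and `logicalClockSketchExamples` are illustrative names that do not appear in the patch, and the sketch assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Illustrative restatement of the patch's LogicalClock ordering rule.
struct LogicalClockSketch
{
    std::optional<std::uint64_t> zxid;

    LogicalClockSketch() = default;
    explicit LogicalClockSketch(std::uint64_t zxid_) : zxid(zxid_) {}

    bool hasHappened() const { return zxid.has_value(); }

    // An event that never happened precedes everything; otherwise compare
    // on a 64-bit ring with a window of half the range.
    bool happensBefore(const LogicalClockSketch & other) const
    {
        constexpr std::uint64_t half = 1ull << 63;
        if (!zxid)
            return true;
        if (!other.zxid)
            return false;
        if (*zxid <= *other.zxid)
            return *other.zxid - *zxid < half;
        return *zxid - *other.zxid > half;
    }
};

// Usage example: plain ordering, equality, and one wrap-around case.
inline void logicalClockSketchExamples()
{
    assert(LogicalClockSketch(5).happensBefore(LogicalClockSketch(10)));
    assert(!LogicalClockSketch(10).happensBefore(LogicalClockSketch(5)));
    assert(LogicalClockSketch(5).happensBefore(LogicalClockSketch(5)));
    // A value just below the maximum is treated as preceding a small value.
    assert(LogicalClockSketch(UINT64_MAX - 1).happensBefore(LogicalClockSketch(3)));
}
```

Because `operator<=` in the patch forwards to `happensBefore`, an unset clock compares as `<=` to any other clock, which is worth keeping in mind when reading the `discovery_zxid <= task_start_clock` checks.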
--- a/dbms/programs/odbc-bridge/CMakeLists.txt
+++ b/dbms/programs/odbc-bridge/CMakeLists.txt
@ -30,6 +30,11 @@ if (Poco_Data_FOUND)
    set(CLICKHOUSE_ODBC_BRIDGE_LINK ${CLICKHOUSE_ODBC_BRIDGE_LINK} PRIVATE ${Poco_Data_LIBRARY})
    set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${Poco_Data_INCLUDE_DIR})
 endif ()
+if (USE_JEMALLOC)
+    # Link jemalloc directly into the odbc-bridge library; otherwise
+    # it would be built with the default malloc.
+    set(CLICKHOUSE_ODBC_BRIDGE_LINK ${CLICKHOUSE_ODBC_BRIDGE_LINK} PRIVATE ${JEMALLOC_LIBRARIES})
+endif()

 clickhouse_program_add_library(odbc-bridge)

--- a/dbms/programs/performance-test/PerformanceTest.cpp
+++ b/dbms/programs/performance-test/PerformanceTest.cpp
@ -35,7 +35,7 @@ void waitQuery(Connection & connection)
        if (!connection.poll(1000000))
            continue;

-        Connection::Packet packet = connection.receivePacket();
+        Packet packet = connection.receivePacket();
        switch (packet.type)
        {
            case Protocol::Server::EndOfStream:
@ -120,7 +120,7 @@ bool PerformanceTest::checkPreconditions() const

            while (true)
            {
-                Connection::Packet packet = connection.receivePacket();
+                Packet packet = connection.receivePacket();

                if (packet.type == Protocol::Server::Data)
                {
--- a/dbms/programs/server/HTTPHandler.cpp
+++ b/dbms/programs/server/HTTPHandler.cpp
@ -407,16 +407,16 @@ void HTTPHandler::processQuery(
    {
        if (http_request_compression_method_str == "gzip")
        {
-            in_post = std::make_unique<ZlibInflatingReadBuffer>(*in_post_raw, CompressionMethod::Gzip);
+            in_post = std::make_unique<ZlibInflatingReadBuffer>(std::move(in_post_raw), CompressionMethod::Gzip);
        }
        else if (http_request_compression_method_str == "deflate")
        {
-            in_post = std::make_unique<ZlibInflatingReadBuffer>(*in_post_raw, CompressionMethod::Zlib);
+            in_post = std::make_unique<ZlibInflatingReadBuffer>(std::move(in_post_raw), CompressionMethod::Zlib);
        }
 #if USE_BROTLI
        else if (http_request_compression_method_str == "br")
        {
-            in_post = std::make_unique<BrotliReadBuffer>(*in_post_raw);
+            in_post = std::make_unique<BrotliReadBuffer>(std::move(in_post_raw));
        }
 #endif
        else
--- a/dbms/programs/server/MySQLHandler.cpp
+++ b/dbms/programs/server/MySQLHandler.cpp
@ -15,6 +15,7 @@
 #include <IO/ReadBufferFromString.h>
 #include <IO/WriteBufferFromPocoSocket.h>
 #include <Storages/IStorage.h>
+#include <boost/algorithm/string/replace.hpp>

 #if USE_POCO_NETSSL
 #include <Poco/Net/SecureStreamSocket.h>
@ -216,15 +217,15 @@ void MySQLHandler::finishHandshake(MySQLProtocol::HandshakeResponse & packet)

 void MySQLHandler::authenticate(const String & user_name, const String & auth_plugin_name, const String & initial_auth_response)
 {
-    // For compatibility with JavaScript MySQL client, Native41 authentication plugin is used when possible (if password is specified using double SHA1). Otherwise SHA256 plugin is used.
-    auto user = connection_context.getUser(user_name);
-    if (user->authentication.getType() != DB::Authentication::DOUBLE_SHA1_PASSWORD)
-    {
-        authPluginSSL();
-    }
-
    try
    {
+        // For compatibility with JavaScript MySQL client, Native41 authentication plugin is used when possible (if password is specified using double SHA1). Otherwise SHA256 plugin is used.
+        auto user = connection_context.getUser(user_name);
+        if (user->authentication.getType() != DB::Authentication::DOUBLE_SHA1_PASSWORD)
+        {
+            authPluginSSL();
+        }
+
        std::optional<String> auth_response = auth_plugin_name == auth_plugin->getName() ? std::make_optional<String>(initial_auth_response) : std::nullopt;
        auth_plugin->authenticate(user_name, auth_response, connection_context, packet_sender, secure_connection, socket().peerAddress());
    }
@ -267,39 +268,59 @@ void MySQLHandler::comPing()
    packet_sender->sendPacket(OK_Packet(0x0, client_capability_flags, 0, 0, 0), true);
 }

+static bool isFederatedServerSetupCommand(const String & query);
+
 void MySQLHandler::comQuery(ReadBuffer & payload)
 {
-    bool with_output = false;
-    std::function<void(const String &)> set_content_type = [&with_output](const String &) -> void {
-        with_output = true;
-    };
+    String query = String(payload.position(), payload.buffer().end());

-    const String query("select ''");
-    ReadBufferFromString empty_select(query);
-
-    bool should_replace = false;
-    // Translate query from MySQL to ClickHouse.
-    // This is a temporary workaround until ClickHouse supports the syntax "@@var_name".
-    if (std::string(payload.position(), payload.buffer().end()) == "select @@version_comment limit 1")  // MariaDB client starts session with that query
+    // This is a workaround to support adding ClickHouse to MySQL as a federated server.
+    // As ClickHouse doesn't support these statements, we just send an OK packet in response.
+    if (isFederatedServerSetupCommand(query))
    {
-        should_replace = true;
-    }
-
-    Context query_context = connection_context;
-    executeQuery(should_replace ? empty_select : payload, *out, true, query_context, set_content_type, nullptr);
-
-    if (!with_output)
        packet_sender->sendPacket(OK_Packet(0x00, client_capability_flags, 0, 0, 0), true);
+    }
+    else
+    {
+        bool with_output = false;
+        std::function<void(const String &)> set_content_type = [&with_output](const String &) -> void {
+            with_output = true;
+        };
+
+        String replacement_query = "select ''";
+        bool should_replace = false;
+
+        // Translate query from MySQL to ClickHouse.
+        // This is a temporary workaround until ClickHouse supports the syntax "@@var_name".
+        if (query == "select @@version_comment limit 1")  // MariaDB client starts session with that query
+        {
+            should_replace = true;
+        }
+        // This is a workaround in order to support adding ClickHouse to MySQL using federated server.
+        if (0 == strncasecmp("SHOW TABLE STATUS LIKE", query.c_str(), 22))
+        {
+            should_replace = true;
+            replacement_query = boost::replace_all_copy(query, "SHOW TABLE STATUS LIKE ", show_table_status_replacement_query);
+        }
+
+        ReadBufferFromString replacement(replacement_query);
+
+        Context query_context = connection_context;
+        executeQuery(should_replace ? replacement : payload, *out, true, query_context, set_content_type, nullptr);
+
+        if (!with_output)
+            packet_sender->sendPacket(OK_Packet(0x00, client_capability_flags, 0, 0, 0), true);
+    }
 }

 void MySQLHandler::authPluginSSL()
 {
-    throw Exception("Compiled without SSL", ErrorCodes::SUPPORT_IS_DISABLED);
+    throw Exception("ClickHouse was built without SSL support. Try specifying password using double SHA1 in users.xml.", ErrorCodes::SUPPORT_IS_DISABLED);
 }

 void MySQLHandler::finishHandshakeSSL([[maybe_unused]] size_t packet_size, [[maybe_unused]] char * buf, [[maybe_unused]] size_t pos, [[maybe_unused]] std::function<void(size_t)> read_bytes, [[maybe_unused]] MySQLProtocol::HandshakeResponse & packet)
 {
-    throw Exception("Compiled without SSL", ErrorCodes::SUPPORT_IS_DISABLED);
+    throw Exception("Client requested SSL, while it is disabled.", ErrorCodes::SUPPORT_IS_DISABLED);
 }

 #if USE_SSL && USE_POCO_NETSSL
@ -335,4 +356,33 @@ void MySQLHandlerSSL::finishHandshakeSSL(size_t packet_size, char * buf, size_t

 #endif

+static bool isFederatedServerSetupCommand(const String & query)
+{
+    return 0 == strncasecmp("SET NAMES", query.c_str(), 9) || 0 == strncasecmp("SET character_set_results", query.c_str(), 25)
+        || 0 == strncasecmp("SET FOREIGN_KEY_CHECKS", query.c_str(), 22) || 0 == strncasecmp("SET AUTOCOMMIT", query.c_str(), 14)
+        || 0 == strncasecmp("SET SESSION TRANSACTION ISOLATION LEVEL", query.c_str(), 39);
+}
+
+const String MySQLHandler::show_table_status_replacement_query("SELECT"
+                                                               " name AS Name,"
+                                                               " engine AS Engine,"
+                                                               " '10' AS Version,"
+                                                               " 'Dynamic' AS Row_format,"
+                                                               " 0 AS Rows,"
+                                                               " 0 AS Avg_row_length,"
+                                                               " 0 AS Data_length,"
+                                                               " 0 AS Max_data_length,"
+                                                               " 0 AS Index_length,"
+                                                               " 0 AS Data_free,"
+                                                               " 'NULL' AS Auto_increment,"
+                                                               " metadata_modification_time AS Create_time,"
+                                                               " metadata_modification_time AS Update_time,"
+                                                               " metadata_modification_time AS Check_time,"
+                                                               " 'utf8_bin' AS Collation,"
+                                                               " 'NULL' AS Checksum,"
+                                                               " '' AS Create_options,"
+                                                               " '' AS Comment"
+                                                               " FROM system.tables"
+                                                               " WHERE name LIKE ");
+
 }
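The prefix check in `isFederatedServerSetupCommand` can be sketched without the POSIX `strncasecmp` call. This is a minimal, portable illustration only: the helper names `startsWithIgnoreCase` and `isSetupCommand` are hypothetical and do not appear in the commit, and the sketch additionally rejects queries shorter than the prefix.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: case-insensitive "text starts with prefix" test.
inline bool startsWithIgnoreCase(std::string_view text, std::string_view prefix)
{
    if (text.size() < prefix.size())
        return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
    {
        const auto a = static_cast<unsigned char>(text[i]);
        const auto b = static_cast<unsigned char>(prefix[i]);
        if (std::tolower(a) != std::tolower(b))
            return false;
    }
    return true;
}

// Hypothetical counterpart of isFederatedServerSetupCommand: true when the
// query begins with one of the statements that are answered with a plain OK.
inline bool isSetupCommand(std::string_view query)
{
    static constexpr std::string_view prefixes[] = {
        "SET NAMES",
        "SET character_set_results",
        "SET FOREIGN_KEY_CHECKS",
        "SET AUTOCOMMIT",
        "SET SESSION TRANSACTION ISOLATION LEVEL",
    };
    for (const auto prefix : prefixes)
        if (startsWithIgnoreCase(query, prefix))
            return true;
    return false;
}
```

Keeping the prefixes in one table avoids repeating each literal's length by hand, which is where the `strncasecmp` form is easiest to get wrong.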
--- a/dbms/programs/server/MySQLHandler.h
+++ b/dbms/programs/server/MySQLHandler.h
@ -11,7 +11,6 @@

 namespace DB
 {
-
 /// Handler for MySQL wire protocol connections. Allows to connect to ClickHouse using MySQL client.
 class MySQLHandler : public Poco::Net::TCPServerConnection
 {
@ -59,6 +58,9 @@ protected:
    std::shared_ptr<WriteBuffer> out;

    bool secure_connection = false;
+
+private:
+    static const String show_table_status_replacement_query;
 };

 #if USE_SSL && USE_POCO_NETSSL
--- a/dbms/programs/server/Server.cpp
+++ b/dbms/programs/server/Server.cpp
@ -243,6 +243,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
    }
 #endif

+    global_context->setRemoteHostFilter(config());
+
    std::string path = getCanonicalPath(config().getString("path", DBMS_DEFAULT_PATH));
    std::string default_database = config().getString("default_database", "default");

@ -438,6 +440,13 @@ int Server::main(const std::vector<std::string> & /*args*/)
            buildLoggers(*config, logger());
            global_context->setClustersConfig(config);
            global_context->setMacros(std::make_unique<Macros>(*config, "macros"));
+
+            /// Setup protection to avoid accidental DROP for big tables (that are greater than 50 GB by default)
+            if (config->has("max_table_size_to_drop"))
+                global_context->setMaxTableSizeToDrop(config->getUInt64("max_table_size_to_drop"));
+
+            if (config->has("max_partition_size_to_drop"))
+                global_context->setMaxPartitionSizeToDrop(config->getUInt64("max_partition_size_to_drop"));
        },
        /* already_loaded = */ true);

@ -469,13 +478,6 @@ int Server::main(const std::vector<std::string> & /*args*/)
    /// Limit on total number of concurrently executed queries.
    global_context->getProcessList().setMaxSize(config().getInt("max_concurrent_queries", 0));

-    /// Setup protection to avoid accidental DROP for big tables (that are greater than 50 GB by default)
-    if (config().has("max_table_size_to_drop"))
-        global_context->setMaxTableSizeToDrop(config().getUInt64("max_table_size_to_drop"));
-
-    if (config().has("max_partition_size_to_drop"))
-        global_context->setMaxPartitionSizeToDrop(config().getUInt64("max_partition_size_to_drop"));
-
    /// Set up caches.

    /// Lower cache size on low-memory systems.
@ -814,7 +816,6 @@ int Server::main(const std::vector<std::string> & /*args*/)

            create_server("mysql_port", [&](UInt16 port)
            {
-#if USE_SSL
                Poco::Net::ServerSocket socket;
                auto address = socket_bind_listen(socket, listen_host, port, /* secure = */ true);
                socket.setReceiveTimeout(Poco::Timespan());
@ -826,11 +827,6 @@ int Server::main(const std::vector<std::string> & /*args*/)
                    new Poco::Net::TCPServerParams));

                LOG_INFO(log, "Listening for MySQL compatibility protocol: " + address.toString());
-#else
-                UNUSED(port);
-                throw Exception{"SSL support for MySQL protocol is disabled because Poco library was built without NetSSL support.",
-                        ErrorCodes::SUPPORT_IS_DISABLED};
-#endif
            });
        }

--- a/dbms/programs/server/TCPHandler.cpp
+++ b/dbms/programs/server/TCPHandler.cpp
@ -924,7 +924,9 @@ void TCPHandler::receiveQuery()

    /// Per query settings.
    Settings & settings = query_context->getSettingsRef();
-    settings.deserialize(*in);
+    auto settings_format = (client_revision >= DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS) ? SettingsBinaryFormat::STRINGS
+                                                                                                      : SettingsBinaryFormat::OLD;
+    settings.deserialize(*in, settings_format);

    /// Sync timeouts on client and server during current query to avoid dangling queries on server
    /// NOTE: We use settings.send_timeout for the receive timeout and vice versa (change arguments ordering in TimeoutSetter),
@ -953,7 +955,9 @@ void TCPHandler::receiveUnexpectedQuery()
        skip_client_info.read(*in, client_revision);

    Settings & skip_settings = query_context->getSettingsRef();
-    skip_settings.deserialize(*in);
+    auto settings_format = (client_revision >= DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS) ? SettingsBinaryFormat::STRINGS
+                                                                                                      : SettingsBinaryFormat::OLD;
+    skip_settings.deserialize(*in, settings_format);

    readVarUInt(skip_uint_64, *in);
    readVarUInt(skip_uint_64, *in);
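The revision-gated choice of settings format in the two hunks above can be isolated into one function. This is only a sketch: `chooseSettingsFormat` and `kMinRevisionWithStringSettings` are hypothetical names, and the threshold value is a placeholder because the real value of `DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS` is not shown in this diff.

```cpp
#include <cstdint>

enum class SettingsBinaryFormat
{
    OLD,
    STRINGS,
};

// Placeholder threshold, chosen only for illustration.
constexpr std::uint64_t kMinRevisionWithStringSettings = 100;

// Pick the wire format that the peer (client or server) understands:
// older peers get the old binary format, newer peers get string values.
constexpr SettingsBinaryFormat chooseSettingsFormat(std::uint64_t peer_revision)
{
    return peer_revision >= kMinRevisionWithStringSettings
        ? SettingsBinaryFormat::STRINGS
        : SettingsBinaryFormat::OLD;
}
```

A single helper like this would keep the sending and receiving sides from drifting apart, since both must make the same decision from the same revision number.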
--- a/dbms/programs/server/config.xml
+++ b/dbms/programs/server/config.xml
@ -3,6 +3,25 @@
  NOTE: User and query level settings are set up in "users.xml" file.
 -->
 <yandex>
+	<!-- The list of hosts allowed to use in URL-related storage engines and table functions.
+		If this section is not present in configuration, all hosts are allowed.
+	-->
+	<remote_url_allow_hosts>
+		<!-- Host should be specified exactly as in URL. The name is checked before DNS resolution.
+			Example: "yandex.ru", "yandex.ru." and "www.yandex.ru" are different hosts.
+			If a port is explicitly specified in the URL, host:port is checked as a whole.
+			If a host is specified here without a port, any port on that host is allowed:
+			"yandex.ru" allows "yandex.ru:443", "yandex.ru:80", etc., but "yandex.ru:80" allows only "yandex.ru:80".
+			If the host is specified as an IP address, it is checked as specified in the URL. Example: "[2a02:6b8:a::a]".
+			If there are redirects and support for redirects is enabled, every redirect (the Location field) is checked.
+		-->
+
+		<!-- Regular expression can be specified. RE2 engine is used for regexps.
+			Regexps are not anchored: don't forget to add ^ and $. Also don't forget to escape the dot (.) metacharacter
+			(forgetting to do so is a common source of error).
+		-->
+	</remote_url_allow_hosts>
+
    <logger>
        <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
        <level>trace</level>
@ -15,7 +34,6 @@
    <!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
-
    <!-- For HTTPS and SSL over native protocol. -->
    <!--
    <https_port>8443</https_port>
@ -411,7 +429,7 @@

    <!-- Protection from accidental DROP.
         If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
-         If you want do delete one table and don't want to restart clickhouse-server, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
+         If you want to delete one table and don't want to change the clickhouse-server config, you can create a special file <clickhouse-path>/flags/force_drop_table and run DROP once.
         By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
         The same for max_partition_size_to_drop.
         Uncomment to disable protection.
--- a/dbms/src/Client/Connection.cpp
+++ b/dbms/src/Client/Connection.cpp
@ -409,7 +409,11 @@ void Connection::sendQuery(

    /// Per query settings.
    if (settings)
-        settings->serialize(*out);
+    {
+        auto settings_format = (server_revision >= DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS) ? SettingsBinaryFormat::STRINGS
+                                                                                                          : SettingsBinaryFormat::OLD;
+        settings->serialize(*out, settings_format);
+    }
    else
        writeStringBinary("" /* empty string is a marker of the end of settings */, *out);

@ -612,7 +616,7 @@ std::optional<UInt64> Connection::checkPacket(size_t timeout_microseconds)
 }


-Connection::Packet Connection::receivePacket()
+Packet Connection::receivePacket()
 {
    try
    {
--- a/dbms/src/Client/Connection.h
+++ b/dbms/src/Client/Connection.h
@ -42,6 +42,21 @@ using ConnectionPtr = std::shared_ptr<Connection>;
 using Connections = std::vector<ConnectionPtr>;


+/// Packet that could be received from server.
+struct Packet
+{
+    UInt64 type;
+
+    Block block;
+    std::unique_ptr<Exception> exception;
+    std::vector<String> multistring_message;
+    Progress progress;
+    BlockStreamProfileInfo profile_info;
+
+    Packet() : type(Protocol::Server::Hello) {}
+};
+
+
 /** Connection with database server, to use by client.
  * How to use - see Core/Protocol.h
  * (Implementation of server end - see Server/TCPHandler.h)
@ -87,20 +102,6 @@ public:
    }


-    /// Packet that could be received from server.
-    struct Packet
-    {
-        UInt64 type;
-
-        Block block;
-        std::unique_ptr<Exception> exception;
-        std::vector<String> multistring_message;
-        Progress progress;
-        BlockStreamProfileInfo profile_info;
-
-        Packet() : type(Protocol::Server::Hello) {}
-    };
-
    /// Change default database. Changes will take effect on next reconnect.
    void setDefaultDatabase(const String & database);

--- a/dbms/src/Client/MultiplexedConnections.cpp
+++ b/dbms/src/Client/MultiplexedConnections.cpp
@ -138,10 +138,10 @@ void MultiplexedConnections::sendQuery(
    sent_query = true;
 }

-Connection::Packet MultiplexedConnections::receivePacket()
+Packet MultiplexedConnections::receivePacket()
 {
    std::lock_guard lock(cancel_mutex);
-    Connection::Packet packet = receivePacketUnlocked();
+    Packet packet = receivePacketUnlocked();
    return packet;
 }

@ -177,19 +177,19 @@ void MultiplexedConnections::sendCancel()
    cancelled = true;
 }

-Connection::Packet MultiplexedConnections::drain()
+Packet MultiplexedConnections::drain()
 {
    std::lock_guard lock(cancel_mutex);

    if (!cancelled)
        throw Exception("Cannot drain connections: cancel first.", ErrorCodes::LOGICAL_ERROR);

-    Connection::Packet res;
+    Packet res;
    res.type = Protocol::Server::EndOfStream;

    while (hasActiveConnections())
    {
-        Connection::Packet packet = receivePacketUnlocked();
+        Packet packet = receivePacketUnlocked();

        switch (packet.type)
        {
@ -235,7 +235,7 @@ std::string MultiplexedConnections::dumpAddressesUnlocked() const
    return os.str();
 }

-Connection::Packet MultiplexedConnections::receivePacketUnlocked()
+Packet MultiplexedConnections::receivePacketUnlocked()
 {
    if (!sent_query)
        throw Exception("Cannot receive packets: no query sent.", ErrorCodes::LOGICAL_ERROR);
@ -247,7 +247,7 @@ Connection::Packet MultiplexedConnections::receivePacketUnlocked()
    if (current_connection == nullptr)
        throw Exception("Logical error: no available replica", ErrorCodes::NO_AVAILABLE_REPLICA);

-    Connection::Packet packet = current_connection->receivePacket();
+    Packet packet = current_connection->receivePacket();

    switch (packet.type)
    {
--- a/dbms/src/Client/MultiplexedConnections.h
+++ b/dbms/src/Client/MultiplexedConnections.h
@ -42,7 +42,7 @@ public:
        bool with_pending_data = false);

    /// Get packet from any replica.
-    Connection::Packet receivePacket();
+    Packet receivePacket();

    /// Break all active connections.
    void disconnect();
@ -54,7 +54,7 @@ public:
      * Returns EndOfStream if no exception has been received. Otherwise
      * returns the last received packet of type Exception.
      */
-    Connection::Packet drain();
+    Packet drain();

    /// Get the replica addresses as a string.
    std::string dumpAddresses() const;
@ -69,7 +69,7 @@ public:

 private:
    /// Internal version of `receivePacket` function without locking.
-    Connection::Packet receivePacketUnlocked();
+    Packet receivePacketUnlocked();

    /// Internal version of `dumpAddresses` function without locking.
    std::string dumpAddressesUnlocked() const;
--- a/dbms/src/Columns/ColumnConst.h
+++ b/dbms/src/Columns/ColumnConst.h
@ -105,6 +105,11 @@ public:
        return data->getFloat64(0);
    }

+    Float32 getFloat32(size_t) const override
+    {
+        return data->getFloat32(0);
+    }
+
    bool isNullAt(size_t) const override
    {
        return data->isNullAt(0);
@ -219,6 +224,7 @@ public:

    Field getField() const { return getDataColumn()[0]; }

+    /// The constant value. It is valid even if the size of the column is 0.
    template <typename T>
    T getValue() const { return getField().safeGet<NearestFieldType<T>>(); }
 };
--- a/dbms/src/Columns/ColumnDecimal.h
+++ b/dbms/src/Columns/ColumnDecimal.h
@ -144,7 +144,7 @@ public:
    }


-    void insert(const T value) { data.push_back(value); }
+    void insertValue(const T value) { data.push_back(value); }
    Container & getData() { return data; }
    const Container & getData() const { return data; }
    const T & getElement(size_t n) const { return data[n]; }
--- a/dbms/src/Columns/ColumnLowCardinality.h
+++ b/dbms/src/Columns/ColumnLowCardinality.h
@ -59,6 +59,7 @@ public:
    UInt64 getUInt(size_t n) const override { return getDictionary().getUInt(getIndexes().getUInt(n)); }
    Int64 getInt(size_t n) const override { return getDictionary().getInt(getIndexes().getUInt(n)); }
    Float64 getFloat64(size_t n) const override { return getDictionary().getInt(getIndexes().getFloat64(n)); }
+    Float32 getFloat32(size_t n) const override { return getDictionary().getFloat32(getIndexes().getUInt(n)); }
    bool getBool(size_t n) const override { return getDictionary().getInt(getIndexes().getBool(n)); }
    bool isNullAt(size_t n) const override { return getDictionary().isNullAt(getIndexes().getUInt(n)); }
    ColumnPtr cut(size_t start, size_t length) const override
--- a/dbms/src/Columns/ColumnUnique.h
+++ b/dbms/src/Columns/ColumnUnique.h
@ -66,6 +66,7 @@ public:
    UInt64 getUInt(size_t n) const override { return getNestedColumn()->getUInt(n); }
    Int64 getInt(size_t n) const override { return getNestedColumn()->getInt(n); }
    Float64 getFloat64(size_t n) const override { return getNestedColumn()->getFloat64(n); }
+    Float32 getFloat32(size_t n) const override { return getNestedColumn()->getFloat32(n); }
    bool getBool(size_t n) const override { return getNestedColumn()->getBool(n); }
    bool isNullAt(size_t n) const override { return is_nullable && n == getNullValueIndex(); }
    StringRef serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const override;
--- a/dbms/src/Columns/ColumnVector.cpp
+++ b/dbms/src/Columns/ColumnVector.cpp
@ -222,6 +222,12 @@ Float64 ColumnVector<T>::getFloat64(size_t n) const
    return static_cast<Float64>(data[n]);
 }

+template <typename T>
+Float32 ColumnVector<T>::getFloat32(size_t n) const
+{
+    return static_cast<Float32>(data[n]);
+}
+
 template <typename T>
 void ColumnVector<T>::insertRangeFrom(const IColumn & src, size_t start, size_t length)
 {
--- a/dbms/src/Columns/ColumnVector.h
+++ b/dbms/src/Columns/ColumnVector.h
@ -205,6 +205,7 @@ public:
    UInt64 get64(size_t n) const override;

    Float64 getFloat64(size_t n) const override;
+    Float32 getFloat32(size_t n) const override;

    UInt64 getUInt(size_t n) const override
    {
--- a/dbms/src/Columns/IColumn.h
+++ b/dbms/src/Columns/IColumn.h
@ -100,6 +100,11 @@ public:
        throw Exception("Method getFloat64 is not supported for " + getName(), ErrorCodes::NOT_IMPLEMENTED);
    }

+    virtual Float32 getFloat32(size_t /*n*/) const
+    {
+        throw Exception("Method getFloat32 is not supported for " + getName(), ErrorCodes::NOT_IMPLEMENTED);
+    }
+
    /** If column is numeric, return value of n-th element, casted to UInt64.
      * For NULL values of Nullable column it is allowed to return arbitrary value.
      * Otherwise throw an exception.
--- a/dbms/src/Columns/getLeastSuperColumn.cpp
+++ b/dbms/src/Columns/getLeastSuperColumn.cpp
@ -18,7 +18,7 @@ static bool sameConstants(const IColumn & a, const IColumn & b)
    return assert_cast<const ColumnConst &>(a).getField() == assert_cast<const ColumnConst &>(b).getField();
 }

-ColumnWithTypeAndName getLeastSuperColumn(std::vector<const ColumnWithTypeAndName *> columns)
+ColumnWithTypeAndName getLeastSuperColumn(const std::vector<const ColumnWithTypeAndName *> & columns)
 {
    if (columns.empty())
        throw Exception("Logical error: no src columns for supercolumn", ErrorCodes::LOGICAL_ERROR);
--- a/dbms/src/Columns/getLeastSuperColumn.h
+++ b/dbms/src/Columns/getLeastSuperColumn.h
@ -7,6 +7,6 @@ namespace DB
 {

 /// getLeastSupertype + related column changes
-ColumnWithTypeAndName getLeastSuperColumn(std::vector<const ColumnWithTypeAndName *> columns);
+ColumnWithTypeAndName getLeastSuperColumn(const std::vector<const ColumnWithTypeAndName *> & columns);

 }
--- a/dbms/src/Common/ErrorCodes.cpp
+++ b/dbms/src/Common/ErrorCodes.cpp
@ -464,12 +464,13 @@ namespace ErrorCodes
    extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY = 487;
    extern const int UNKNOWN_DICTIONARY = 488;
    extern const int INCORRECT_DICTIONARY_DEFINITION = 489;
+    extern const int CANNOT_FORMAT_DATETIME = 490;
+    extern const int UNACCEPTABLE_URL = 491;

    extern const int KEEPER_EXCEPTION = 999;
    extern const int POCO_EXCEPTION = 1000;
    extern const int STD_EXCEPTION = 1001;
    extern const int UNKNOWN_EXCEPTION = 1002;
-    extern const int METRIKA_OTHER_ERROR = 1003;

    extern const int CONDITIONAL_TREE_PARENT_NOT_FOUND = 2001;
    extern const int ILLEGAL_PROJECTION_MANIPULATOR = 2002;
--- a/dbms/src/Common/Exception.cpp
+++ b/dbms/src/Common/Exception.cpp
@ -261,7 +261,7 @@ std::string getExceptionMessage(const Exception & e, bool with_stacktrace, bool
        stream << "Code: " << e.code() << ", e.displayText() = " << text;

        if (with_stacktrace && !has_embedded_stack_trace)
-            stream << ", Stack trace:\n\n" << e.getStackTrace().toString();
+            stream << ", Stack trace (when copying this message, always include the lines below):\n\n" << e.getStackTrace().toString();
    }
    catch (...) {}

--- a/dbms/src/Common/Exception.h
+++ b/dbms/src/Common/Exception.h
@ -17,7 +17,6 @@ namespace DB
 namespace ErrorCodes
 {
    extern const int POCO_EXCEPTION;
-    extern const int METRIKA_OTHER_ERROR;
 }

 class Exception : public Poco::Exception
--- a/dbms/src/Common/HashTable/Hash.h
+++ b/dbms/src/Common/HashTable/Hash.h
@ -84,6 +84,23 @@ struct DefaultHash<T, std::enable_if_t<is_arithmetic_v<T>>>
    }
 };

+template <typename T>
+struct DefaultHash<T, std::enable_if_t<DB::IsDecimalNumber<T> && sizeof(T) <= 8>>
+{
+    size_t operator() (T key) const
+    {
+        return DefaultHash64<typename T::NativeType>(key);
+    }
+};
+
+template <typename T>
+struct DefaultHash<T, std::enable_if_t<DB::IsDecimalNumber<T> && sizeof(T) == 16>>
+{
+    size_t operator() (T key) const
+    {
+        return DefaultHash64<Int64>(key >> 64) ^ DefaultHash64<Int64>(key);
+    }
+};

 template <typename T> struct HashCRC32;

--- a/dbms/src/Common/PODArray.h
+++ b/dbms/src/Common/PODArray.h
@ -430,11 +430,11 @@ public:
    template <typename It1, typename It2>
    void insert(iterator it, It1 from_begin, It2 from_end)
    {
-        insertPrepare(from_begin, from_end);
-
        size_t bytes_to_copy = this->byte_size(from_end - from_begin);
        size_t bytes_to_move = (end() - it) * sizeof(T);

+        insertPrepare(from_begin, from_end);
+
        if (unlikely(bytes_to_move))
            memcpy(this->c_end + bytes_to_copy - bytes_to_move, this->c_end - bytes_to_move, bytes_to_move);

--- a/dbms/src/Common/RemoteHostFilter.cpp
+++ b/dbms/src/Common/RemoteHostFilter.cpp
@ -0,0 +1,62 @@
+#include <re2/re2.h>
+#include <Common/RemoteHostFilter.h>
+#include <Poco/URI.h>
+#include <Formats/FormatFactory.h>
+#include <Poco/Util/AbstractConfiguration.h>
+#include <Common/StringUtils/StringUtils.h>
+#include <Common/Exception.h>
+#include <IO/WriteHelpers.h>
+
+namespace DB
+{
+namespace ErrorCodes
+{
+    extern const int UNACCEPTABLE_URL;
+}
+
+void RemoteHostFilter::checkURL(const Poco::URI & uri) const
+{
+    if (!checkForDirectEntry(uri.getHost()) &&
+        !checkForDirectEntry(uri.getHost() + ":" + toString(uri.getPort())))
+        throw Exception("URL \"" + uri.toString() + "\" is not allowed in config.xml", ErrorCodes::UNACCEPTABLE_URL);
+}
+
+void RemoteHostFilter::checkHostAndPort(const std::string & host, const std::string & port) const
+{
+    if (!checkForDirectEntry(host) &&
+        !checkForDirectEntry(host + ":" + port))
+        throw Exception("URL \"" + host + ":" + port + "\" is not allowed in config.xml", ErrorCodes::UNACCEPTABLE_URL);
+}
+
+void RemoteHostFilter::setValuesFromConfig(const Poco::Util::AbstractConfiguration & config)
+{
+    if (config.has("remote_url_allow_hosts"))
+    {
+        std::vector<std::string> keys;
+        config.keys("remote_url_allow_hosts", keys);
+        for (const auto & key : keys)
+        {
+            if (startsWith(key, "host_regexp"))
+                regexp_hosts.push_back(config.getString("remote_url_allow_hosts." + key));
+            else if (startsWith(key, "host"))
+                primary_hosts.insert(config.getString("remote_url_allow_hosts." + key));
+        }
+    }
+}
+
+bool RemoteHostFilter::checkForDirectEntry(const std::string & str) const
+{
+    if (!primary_hosts.empty() || !regexp_hosts.empty())
+    {
+        if (primary_hosts.find(str) == primary_hosts.end())
+        {
+            for (size_t i = 0; i < regexp_hosts.size(); ++i)
+                if (re2::RE2::FullMatch(str, regexp_hosts[i]))
+                    return true;
+            return false;
+        }
+        return true;
+    }
+    return true;
+}
+}
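The allow-list logic of `RemoteHostFilter` can be summarized in a small standalone sketch. Assumptions to note: `HostAllowList`, `matches` and `isAllowed` are hypothetical names, and `std::regex_match` is swapped in for `re2::RE2::FullMatch` so the example needs no external library; both require the whole string to match.

```cpp
#include <regex>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for RemoteHostFilter; std::regex replaces RE2 here.
struct HostAllowList
{
    std::unordered_set<std::string> exact_hosts;
    std::vector<std::regex> regexp_hosts;

    // An empty allow-list permits everything, as in checkForDirectEntry.
    bool matches(const std::string & s) const
    {
        if (exact_hosts.empty() && regexp_hosts.empty())
            return true;
        if (exact_hosts.count(s) != 0)
            return true;
        for (const auto & re : regexp_hosts)
            if (std::regex_match(s, re)) // whole-string match
                return true;
        return false;
    }

    // A target is allowed if either "host" or "host:port" is listed.
    bool isAllowed(const std::string & host, const std::string & port) const
    {
        return matches(host) || matches(host + ":" + port);
    }
};
```

With `yandex.ru` listed as an exact host, any port on that host passes; with `example.com:80` listed, only port 80 passes, which matches the behaviour described in the `config.xml` comment.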
--- a/dbms/src/Common/RemoteHostFilter.h
+++ b/dbms/src/Common/RemoteHostFilter.h
@ -0,0 +1,30 @@
+#pragma once
+
+#include <vector>
+#include <unordered_set>
+#include <Poco/URI.h>
+#include <Poco/Util/AbstractConfiguration.h>
+
+
+namespace DB
+{
+class RemoteHostFilter
+{
+/**
+ * This class checks whether a URL is allowed.
+ * If both primary_hosts and regexp_hosts are empty, all URLs are allowed.
+ */
+public:
+    void checkURL(const Poco::URI & uri) const; /// Throws an UNACCEPTABLE_URL exception if the URL is not allowed in config.xml.
+
+    void setValuesFromConfig(const Poco::Util::AbstractConfiguration & config);
+
+    void checkHostAndPort(const std::string & host, const std::string & port) const; /// Does the same as checkURL, but for host and port.
+
+private:
+    std::unordered_set<std::string> primary_hosts;      /// Allowed exact hosts (<host>) from config.xml
+    std::vector<std::string> regexp_hosts;              /// Allowed host patterns (<host_regexp>) from config.xml
+
+    bool checkForDirectEntry(const std::string & str) const; /// Checks whether str is in primary_hosts or fully matches one of regexp_hosts. Returns true if both are empty.
+};
+}
--- a/dbms/src/Common/StackTrace.cpp
+++ b/dbms/src/Common/StackTrace.cpp
@ -158,7 +158,7 @@ std::string signalToErrorMessage(int sig, const siginfo_t & info, const ucontext
            break;
        }

-        case SIGPROF:
+        case SIGTSTP:
        {
            error << "This is a signal used for debugging purposes by the user.";
            break;
--- a/dbms/src/Common/ThreadStatus.h
+++ b/dbms/src/Common/ThreadStatus.h
@ -4,7 +4,7 @@
 #include <Common/ProfileEvents.h>
 #include <Common/MemoryTracker.h>

-#include <Core/SettingsCommon.h>
+#include <Core/SettingsCollection.h>

 #include <IO/Progress.h>

--- a/dbms/src/Common/tests/gtest_pod_array.cpp
+++ b/dbms/src/Common/tests/gtest_pod_array.cpp
@ -0,0 +1,34 @@
+#include <gtest/gtest.h>
+
+#include <Common/PODArray.h>
+
+using namespace DB;
+
+TEST(Common, PODArray_Insert)
+{
+    std::string str = "test_string_abacaba";
+    PODArray<char> chars;
+    chars.insert(chars.end(), str.begin(), str.end());
+    EXPECT_EQ(str, std::string(chars.data(), chars.size()));
+
+    std::string insert_in_the_middle = "insert_in_the_middle";
+    auto pos = str.size() / 2;
+    str.insert(str.begin() + pos, insert_in_the_middle.begin(), insert_in_the_middle.end());
+    chars.insert(chars.begin() + pos, insert_in_the_middle.begin(), insert_in_the_middle.end());
+    EXPECT_EQ(str, std::string(chars.data(), chars.size()));
+
+    std::string insert_with_resize;
+    insert_with_resize.reserve(chars.capacity() * 2);
+    char cur_char = 'a';
+    while (insert_with_resize.size() < insert_with_resize.capacity())
+    {
+        insert_with_resize += cur_char;
+        if (cur_char == 'z')
+            cur_char = 'a';
+        else
+            ++cur_char;
+    }
+    str.insert(str.begin(), insert_with_resize.begin(), insert_with_resize.end());
+    chars.insert(chars.begin(), insert_with_resize.begin(), insert_with_resize.end());
+    EXPECT_EQ(str, std::string(chars.data(), chars.size()));
+}
--- a/dbms/src/Core/Defines.h
+++ b/dbms/src/Core/Defines.h
@ -59,9 +59,11 @@
 #define DBMS_MIN_REVISION_WITH_COLUMN_DEFAULTS_METADATA 54410

 #define DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE 54405
-
 #define DBMS_MIN_REVISION_WITH_CLIENT_WRITE_INFO 54420

+/// Minimum revision supporting SettingsBinaryFormat::STRINGS.
+#define DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS 54429
+
 /// Version of ClickHouse TCP protocol. Set to git tag with latest protocol change.
 #define DBMS_TCP_PROTOCOL_VERSION 54226

@ -148,9 +150,9 @@
    #define OPTIMIZE(x)
 #endif

-/// This number is only used for distributed version compatible.
-/// It could be any magic number.
-#define DBMS_DISTRIBUTED_SENDS_MAGIC_NUMBER 0xCAFECABE
+/// Marks that extra information is sent to a shard. These could be any magic numbers.
+#define DBMS_DISTRIBUTED_SIGNATURE_EXTRA_INFO 0xCAFEDACEull
+#define DBMS_DISTRIBUTED_SIGNATURE_SETTINGS_OLD_FORMAT 0xCAFECABEull

 #if !__has_include(<sanitizer/asan_interface.h>)
 #   define ASAN_UNPOISON_MEMORY_REGION(a, b)
--- a/dbms/src/Core/MySQLProtocol.cpp
+++ b/dbms/src/Core/MySQLProtocol.cpp
@ -100,4 +100,71 @@ size_t getLengthEncodedStringSize(const String & s)
    return getLengthEncodedNumberSize(s.size()) + s.size();
 }

+ColumnDefinition getColumnDefinition(const String & column_name, const TypeIndex type_index)
+{
+    ColumnType column_type;
+    int flags = 0;
+    switch (type_index)
+    {
+        case TypeIndex::UInt8:
+            column_type = ColumnType::MYSQL_TYPE_TINY;
+            flags = ColumnDefinitionFlags::BINARY_FLAG | ColumnDefinitionFlags::UNSIGNED_FLAG;
+            break;
+        case TypeIndex::UInt16:
+            column_type = ColumnType::MYSQL_TYPE_SHORT;
+            flags = ColumnDefinitionFlags::BINARY_FLAG | ColumnDefinitionFlags::UNSIGNED_FLAG;
+            break;
+        case TypeIndex::UInt32:
+            column_type = ColumnType::MYSQL_TYPE_LONG;
+            flags = ColumnDefinitionFlags::BINARY_FLAG | ColumnDefinitionFlags::UNSIGNED_FLAG;
+            break;
+        case TypeIndex::UInt64:
+            column_type = ColumnType::MYSQL_TYPE_LONGLONG;
+            flags = ColumnDefinitionFlags::BINARY_FLAG | ColumnDefinitionFlags::UNSIGNED_FLAG;
+            break;
+        case TypeIndex::Int8:
+            column_type = ColumnType::MYSQL_TYPE_TINY;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::Int16:
+            column_type = ColumnType::MYSQL_TYPE_SHORT;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::Int32:
+            column_type = ColumnType::MYSQL_TYPE_LONG;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::Int64:
+            column_type = ColumnType::MYSQL_TYPE_LONGLONG;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::Float32:
+            column_type = ColumnType::MYSQL_TYPE_FLOAT;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::Float64:
+            column_type = ColumnType::MYSQL_TYPE_DOUBLE;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::Date:
+            column_type = ColumnType::MYSQL_TYPE_DATE;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::DateTime:
+            column_type = ColumnType::MYSQL_TYPE_DATETIME;
+            flags = ColumnDefinitionFlags::BINARY_FLAG;
+            break;
+        case TypeIndex::String:
+            column_type = ColumnType::MYSQL_TYPE_STRING;
+            break;
+        case TypeIndex::FixedString:
+            column_type = ColumnType::MYSQL_TYPE_STRING;
+            break;
+        default:
+            column_type = ColumnType::MYSQL_TYPE_STRING;
+            break;
+    }
+    return ColumnDefinition(column_name, CharacterSet::binary, 0, column_type, flags, 0);
+}
+
 }
--- a/dbms/src/Core/MySQLProtocol.h
+++ b/dbms/src/Core/MySQLProtocol.h
@ -130,6 +130,14 @@ enum ColumnType
 };


+// https://dev.mysql.com/doc/dev/mysql-server/latest/group__group__cs__column__definition__flags.html
+enum ColumnDefinitionFlags
+{
+    UNSIGNED_FLAG = 32,
+    BINARY_FLAG = 128
+};
+
+
 class ProtocolError : public DB::Exception
 {
 public:
@ -824,19 +832,40 @@ protected:
    }
 };

+
+ColumnDefinition getColumnDefinition(const String & column_name, const TypeIndex index);
+
+
+namespace ProtocolText
+{
+
 class ResultsetRow : public WritePacket
 {
-    std::vector<String> columns;
+    const Columns & columns;
+    int row_num;
    size_t payload_size = 0;
+    std::vector<String> serialized;
 public:
-    ResultsetRow() = default;
-
-    void appendColumn(String && value)
+    ResultsetRow(const DataTypes & data_types, const Columns & columns_, int row_num_)
+        : columns(columns_)
+        , row_num(row_num_)
    {
-        payload_size += getLengthEncodedStringSize(value);
-        columns.emplace_back(std::move(value));
+        for (size_t i = 0; i < columns.size(); i++)
+        {
+            if (columns[i]->isNullAt(row_num))
+            {
+                payload_size += 1;
+                serialized.emplace_back("\xfb");
+            }
+            else
+            {
+                WriteBufferFromOwnString ostr;
+                data_types[i]->serializeAsText(*columns[i], row_num, ostr, FormatSettings());
+                payload_size += getLengthEncodedStringSize(ostr.str());
+                serialized.push_back(std::move(ostr.str()));
+            }
+        }
    }
-
 protected:
    size_t getPayloadSize() const override
    {
@ -845,11 +874,18 @@ protected:

    void writePayloadImpl(WriteBuffer & buffer) const override
    {
-        for (const String & column : columns)
-            writeLengthEncodedString(column, buffer);
+        for (size_t i = 0; i < columns.size(); i++)
+        {
+            if (columns[i]->isNullAt(row_num))
+                buffer.write(serialized[i].data(), 1);
+            else
+                writeLengthEncodedString(serialized[i], buffer);
+        }
    }
 };

+}
+
 namespace Authentication
 {

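The row layout that `ProtocolText::ResultsetRow` writes above can be sketched without the ClickHouse types. This is an illustrative sketch under stated assumptions: `appendLengthEncodedInteger` and `encodeTextRow` are hypothetical helpers, and cells are modelled as `std::optional<std::string>` rather than columns and data types. It follows the MySQL text protocol convention that a NULL cell is the single byte `0xFB` and any other cell is a length-encoded string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

/// Appends `value` as a MySQL length-encoded integer (extra bytes are little-endian).
void appendLengthEncodedInteger(std::string & out, uint64_t value)
{
    size_t extra_bytes = 0;
    if (value < 251)
    {
        out.push_back(static_cast<char>(value)); /// the value fits in the prefix byte itself
        return;
    }
    else if (value < (1ULL << 16))
    {
        out.push_back('\xfc');
        extra_bytes = 2;
    }
    else if (value < (1ULL << 24))
    {
        out.push_back('\xfd');
        extra_bytes = 3;
    }
    else
    {
        out.push_back('\xfe');
        extra_bytes = 8;
    }
    for (size_t i = 0; i < extra_bytes; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xff));
}

/// Hypothetical helper: builds the payload of one text-protocol row.
std::string encodeTextRow(const std::vector<std::optional<std::string>> & cells)
{
    std::string payload;
    for (const auto & cell : cells)
    {
        if (!cell)
        {
            payload.push_back('\xfb'); /// NULL marker, one byte
        }
        else
        {
            appendLengthEncodedInteger(payload, cell->size());
            payload += *cell;
        }
    }
    return payload;
}
```

Values below 251 occupy the prefix byte directly, so `0xFB` can never be confused with the length of a real string; this is what lets the NULL marker share the same position as a length prefix.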
--- a/dbms/src/Core/Settings.h
+++ b/dbms/src/Core/Settings.h
@ -1,6 +1,6 @@
 #pragma once

-#include "SettingsCommon.h"
+#include <Core/SettingsCollection.h>
 #include <Core/Defines.h>


@ -35,219 +35,226 @@ struct Settings : public SettingsCollection<Settings>
    /// http://en.cppreference.com/w/cpp/language/aggregate_initialization
    Settings() {}

-    /** List of settings: type, name, default value.
+    /** List of settings: type, name, default value, description, flags
      *
      * This looks rather unconvenient. It is done that way to avoid repeating settings in different places.
      * Note: as an alternative, we could implement settings to be completely dynamic in form of map: String -> Field,
      *  but we are not going to do it, because settings is used everywhere as static struct fields.
+      *
+      * `flags` can be either 0 or IMPORTANT.
+      * A setting is "IMPORTANT" if it affects the results of queries and can't be ignored by older versions.
      */

 #define LIST_OF_SETTINGS(M)                                            \
-    M(SettingUInt64, min_compress_block_size, 65536, "The actual size of the block to compress, if the uncompressed data less than max_compress_block_size is no less than this value and no less than the volume of data for one mark.") \
-    M(SettingUInt64, max_compress_block_size, 1048576, "The maximum size of blocks of uncompressed data before compressing for writing to a table.") \
-    M(SettingUInt64, max_block_size, DEFAULT_BLOCK_SIZE, "Maximum block size for reading") \
-    M(SettingUInt64, max_insert_block_size, DEFAULT_INSERT_BLOCK_SIZE, "The maximum block size for insertion, if we control the creation of blocks for insertion.") \
-    M(SettingUInt64, min_insert_block_size_rows, DEFAULT_INSERT_BLOCK_SIZE, "Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.") \
-    M(SettingUInt64, min_insert_block_size_bytes, (DEFAULT_INSERT_BLOCK_SIZE * 256), "Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.") \
-    M(SettingMaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.") \
-    M(SettingMaxThreads, max_alter_threads, 0, "The maximum number of threads to execute the ALTER requests. By default, it is determined automatically.") \
-    M(SettingUInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.") \
-    M(SettingUInt64, max_distributed_connections, 1024, "The maximum number of connections for distributed processing of one query (should be greater than max_threads).") \
-    M(SettingUInt64, max_query_size, 262144, "Which part of the query can be read into RAM for parsing (the remaining data for INSERT, if any, is read later)") \
-    M(SettingUInt64, interactive_delay, 100000, "The interval in microseconds to check if the request is cancelled, and to send progress info.") \
-    M(SettingSeconds, connect_timeout, DBMS_DEFAULT_CONNECT_TIMEOUT_SEC, "Connection timeout if there are no replicas.") \
-    M(SettingMilliseconds, connect_timeout_with_failover_ms, DBMS_DEFAULT_CONNECT_TIMEOUT_WITH_FAILOVER_MS, "Connection timeout for selecting first healthy replica.") \
-    M(SettingSeconds, receive_timeout, DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC, "") \
-    M(SettingSeconds, send_timeout, DBMS_DEFAULT_SEND_TIMEOUT_SEC, "") \
-    M(SettingSeconds, tcp_keep_alive_timeout, 0, "The time in seconds the connection needs to remain idle before TCP starts sending keepalive probes") \
-    M(SettingMilliseconds, queue_max_wait_ms, 0, "The wait time in the request queue, if the number of concurrent requests exceeds the maximum.") \
-    M(SettingMilliseconds, connection_pool_max_wait_ms, 0, "The wait time when connection pool is full.") \
-    M(SettingMilliseconds, replace_running_query_max_wait_ms, 5000, "The wait time for running query with the same query_id to finish when setting 'replace_running_query' is active.") \
-    M(SettingMilliseconds, kafka_max_wait_ms, 5000, "The wait time for reading from Kafka before retry.") \
-    M(SettingUInt64, poll_interval, DBMS_DEFAULT_POLL_INTERVAL, "Block at the query wait loop on the server for the specified number of seconds.") \
-    M(SettingUInt64, idle_connection_timeout, 3600, "Close idle TCP connections after specified number of seconds.") \
-    M(SettingUInt64, distributed_connections_pool_size, DBMS_DEFAULT_DISTRIBUTED_CONNECTIONS_POOL_SIZE, "Maximum number of connections with one remote server in the pool.") \
-    M(SettingUInt64, connections_with_failover_max_tries, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES, "The maximum number of attempts to connect to replicas.") \
-    M(SettingUInt64, s3_min_upload_part_size, 512*1024*1024, "The mininum size of part to upload during multipart upload to S3.") \
-    M(SettingBool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.") \
-    M(SettingBool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.") \
-    M(SettingBool, replace_running_query, false, "Whether the running request should be canceled with the same id as the new one.") \
-    M(SettingUInt64, background_pool_size, 16, "Number of threads performing background work for tables (for example, merging in merge tree). Only has meaning at server startup.") \
-    M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables. Only has meaning at server startup.") \
+    M(SettingUInt64, min_compress_block_size, 65536, "The actual size of the block to compress, if the uncompressed data less than max_compress_block_size is no less than this value and no less than the volume of data for one mark.", 0) \
+    M(SettingUInt64, max_compress_block_size, 1048576, "The maximum size of blocks of uncompressed data before compressing for writing to a table.", 0) \
+    M(SettingUInt64, max_block_size, DEFAULT_BLOCK_SIZE, "Maximum block size for reading", 0) \
+    M(SettingUInt64, max_insert_block_size, DEFAULT_INSERT_BLOCK_SIZE, "The maximum block size for insertion, if we control the creation of blocks for insertion.", 0) \
+    M(SettingUInt64, min_insert_block_size_rows, DEFAULT_INSERT_BLOCK_SIZE, "Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.", 0) \
+    M(SettingUInt64, min_insert_block_size_bytes, (DEFAULT_INSERT_BLOCK_SIZE * 256), "Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.", 0) \
+    M(SettingMaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
+    M(SettingMaxThreads, max_alter_threads, 0, "The maximum number of threads to execute the ALTER requests. By default, it is determined automatically.", 0) \
+    M(SettingUInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
+    M(SettingUInt64, max_distributed_connections, 1024, "The maximum number of connections for distributed processing of one query (should be greater than max_threads).", 0) \
+    M(SettingUInt64, max_query_size, 262144, "Which part of the query can be read into RAM for parsing (the remaining data for INSERT, if any, is read later)", 0) \
+    M(SettingUInt64, interactive_delay, 100000, "The interval in microseconds to check if the request is cancelled, and to send progress info.", 0) \
+    M(SettingSeconds, connect_timeout, DBMS_DEFAULT_CONNECT_TIMEOUT_SEC, "Connection timeout if there are no replicas.", 0) \
+    M(SettingMilliseconds, connect_timeout_with_failover_ms, DBMS_DEFAULT_CONNECT_TIMEOUT_WITH_FAILOVER_MS, "Connection timeout for selecting first healthy replica.", 0) \
+    M(SettingSeconds, receive_timeout, DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC, "", 0) \
+    M(SettingSeconds, send_timeout, DBMS_DEFAULT_SEND_TIMEOUT_SEC, "", 0) \
+    M(SettingSeconds, tcp_keep_alive_timeout, 0, "The time in seconds the connection needs to remain idle before TCP starts sending keepalive probes", 0) \
+    M(SettingMilliseconds, queue_max_wait_ms, 0, "The wait time in the request queue, if the number of concurrent requests exceeds the maximum.", 0) \
+    M(SettingMilliseconds, connection_pool_max_wait_ms, 0, "The wait time when connection pool is full.", 0) \
+    M(SettingMilliseconds, replace_running_query_max_wait_ms, 5000, "The wait time for running query with the same query_id to finish when setting 'replace_running_query' is active.", 0) \
+    M(SettingMilliseconds, kafka_max_wait_ms, 5000, "The wait time for reading from Kafka before retry.", 0) \
+    M(SettingUInt64, poll_interval, DBMS_DEFAULT_POLL_INTERVAL, "Block at the query wait loop on the server for the specified number of seconds.", 0) \
+    M(SettingUInt64, idle_connection_timeout, 3600, "Close idle TCP connections after specified number of seconds.", 0) \
+    M(SettingUInt64, distributed_connections_pool_size, DBMS_DEFAULT_DISTRIBUTED_CONNECTIONS_POOL_SIZE, "Maximum number of connections with one remote server in the pool.", 0) \
+    M(SettingUInt64, connections_with_failover_max_tries, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES, "The maximum number of attempts to connect to replicas.", 0) \
+    M(SettingUInt64, s3_min_upload_part_size, 512*1024*1024, "The minimum size of part to upload during multipart upload to S3.", 0) \
+    M(SettingBool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.", IMPORTANT) \
+    M(SettingBool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.", 0) \
+    M(SettingBool, replace_running_query, false, "Whether the running request should be canceled with the same id as the new one.", 0) \
+    M(SettingUInt64, background_pool_size, 16, "Number of threads performing background work for tables (for example, merging in merge tree). Only has meaning at server startup.", 0) \
+    M(SettingUInt64, background_move_pool_size, 8, "Number of threads performing background moves for tables. Only has meaning at server startup.", 0) \
+    M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables. Only has meaning at server startup.", 0) \
    \
-    M(SettingMilliseconds, distributed_directory_monitor_sleep_time_ms, 100, "Sleep time for StorageDistributed DirectoryMonitors, in case of any errors delay grows exponentially.") \
-    M(SettingMilliseconds, distributed_directory_monitor_max_sleep_time_ms, 30000, "Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.") \
+    M(SettingMilliseconds, distributed_directory_monitor_sleep_time_ms, 100, "Sleep time for StorageDistributed DirectoryMonitors, in case of any errors delay grows exponentially.", 0) \
+    M(SettingMilliseconds, distributed_directory_monitor_max_sleep_time_ms, 30000, "Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.", 0) \
    \
-    M(SettingBool, distributed_directory_monitor_batch_inserts, false, "Should StorageDistributed DirectoryMonitors try to batch individual inserts into bigger ones.") \
+    M(SettingBool, distributed_directory_monitor_batch_inserts, false, "Should StorageDistributed DirectoryMonitors try to batch individual inserts into bigger ones.", 0) \
    \
-    M(SettingBool, optimize_move_to_prewhere, true, "Allows disabling WHERE to PREWHERE optimization in SELECT queries from MergeTree.") \
+    M(SettingBool, optimize_move_to_prewhere, true, "Allows disabling WHERE to PREWHERE optimization in SELECT queries from MergeTree.", 0) \
    \
-    M(SettingUInt64, replication_alter_partitions_sync, 1, "Wait for actions to manipulate the partitions. 0 - do not wait, 1 - wait for execution only of itself, 2 - wait for everyone.") \
-    M(SettingUInt64, replication_alter_columns_timeout, 60, "Wait for actions to change the table structure within the specified number of seconds. 0 - wait unlimited time.") \
+    M(SettingUInt64, replication_alter_partitions_sync, 1, "Wait for actions to manipulate the partitions. 0 - do not wait, 1 - wait for execution only of itself, 2 - wait for everyone.", 0) \
+    M(SettingUInt64, replication_alter_columns_timeout, 60, "Wait for actions to change the table structure within the specified number of seconds. 0 - wait unlimited time.", 0) \
    \
-    M(SettingLoadBalancing, load_balancing, LoadBalancing::RANDOM, "Which replicas (among healthy replicas) to preferably send a query to (on the first attempt) for distributed processing.") \
+    M(SettingLoadBalancing, load_balancing, LoadBalancing::RANDOM, "Which replicas (among healthy replicas) to preferably send a query to (on the first attempt) for distributed processing.", 0) \
    \
-    M(SettingTotalsMode, totals_mode, TotalsMode::AFTER_HAVING_EXCLUSIVE, "How to calculate TOTALS when HAVING is present, as well as when max_rows_to_group_by and group_by_overflow_mode = ‘any’ are present.") \
-    M(SettingFloat, totals_auto_threshold, 0.5, "The threshold for totals_mode = 'auto'.") \
+    M(SettingTotalsMode, totals_mode, TotalsMode::AFTER_HAVING_EXCLUSIVE, "How to calculate TOTALS when HAVING is present, as well as when max_rows_to_group_by and group_by_overflow_mode = ‘any’ are present.", IMPORTANT) \
+    M(SettingFloat, totals_auto_threshold, 0.5, "The threshold for totals_mode = 'auto'.", 0) \
    \
-    M(SettingBool, allow_suspicious_low_cardinality_types, false, "In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption.") \
-    M(SettingBool, compile_expressions, false, "Compile some scalar functions and operators to native code.") \
-    M(SettingUInt64, min_count_to_compile, 3, "The number of structurally identical queries before they are compiled.") \
-    M(SettingUInt64, min_count_to_compile_expression, 3, "The number of identical expressions before they are JIT-compiled") \
-    M(SettingUInt64, group_by_two_level_threshold, 100000, "From what number of keys, a two-level aggregation starts. 0 - the threshold is not set.") \
-    M(SettingUInt64, group_by_two_level_threshold_bytes, 100000000, "From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered.") \
-    M(SettingBool, distributed_aggregation_memory_efficient, false, "Is the memory-saving mode of distributed aggregation enabled.") \
-    M(SettingUInt64, aggregation_memory_efficient_merge_threads, 0, "Number of threads to use for merge intermediate aggregation results in memory efficient mode. When bigger, then more memory is consumed. 0 means - same as 'max_threads'.") \
+    M(SettingBool, allow_suspicious_low_cardinality_types, false, "In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption.", 0) \
+    M(SettingBool, compile_expressions, false, "Compile some scalar functions and operators to native code.", 0) \
+    M(SettingUInt64, min_count_to_compile, 3, "The number of structurally identical queries before they are compiled.", 0) \
+    M(SettingUInt64, min_count_to_compile_expression, 3, "The number of identical expressions before they are JIT-compiled", 0) \
+    M(SettingUInt64, group_by_two_level_threshold, 100000, "From what number of keys, a two-level aggregation starts. 0 - the threshold is not set.", 0) \
+    M(SettingUInt64, group_by_two_level_threshold_bytes, 100000000, "From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered.", 0) \
+    M(SettingBool, distributed_aggregation_memory_efficient, false, "Is the memory-saving mode of distributed aggregation enabled.", 0) \
+    M(SettingUInt64, aggregation_memory_efficient_merge_threads, 0, "Number of threads to use for merge intermediate aggregation results in memory efficient mode. When bigger, then more memory is consumed. 0 means - same as 'max_threads'.", 0) \
    \
-    M(SettingUInt64, max_parallel_replicas, 1, "The maximum number of replicas of each shard used when the query is executed. For consistency (to get different parts of the same partition), this option only works for the specified sampling key. The lag of the replicas is not controlled.") \
-    M(SettingUInt64, parallel_replicas_count, 0, "") \
-    M(SettingUInt64, parallel_replica_offset, 0, "") \
+    M(SettingUInt64, max_parallel_replicas, 1, "The maximum number of replicas of each shard used when the query is executed. For consistency (to get different parts of the same partition), this option only works for the specified sampling key. The lag of the replicas is not controlled.", 0) \
+    M(SettingUInt64, parallel_replicas_count, 0, "", 0) \
+    M(SettingUInt64, parallel_replica_offset, 0, "", 0) \
    \
-    M(SettingBool, skip_unavailable_shards, false, "If 1, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.") \
+    M(SettingBool, skip_unavailable_shards, false, "If 1, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
    \
-    M(SettingBool, distributed_group_by_no_merge, false, "Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.") \
-    M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.") \
+    M(SettingBool, distributed_group_by_no_merge, false, "Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.", 0) \
+    M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.", 0) \
    \
-    M(SettingUInt64, merge_tree_min_rows_for_concurrent_read, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized.") \
-    M(SettingUInt64, merge_tree_min_bytes_for_concurrent_read, (24 * 10 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized.") \
-    M(SettingUInt64, merge_tree_min_rows_for_seek, 0, "You can skip reading more than that number of rows at the price of one seek per file.") \
-    M(SettingUInt64, merge_tree_min_bytes_for_seek, 0, "You can skip reading more than that number of bytes at the price of one seek per file.") \
-    M(SettingUInt64, merge_tree_coarse_index_granularity, 8, "If the index segment can contain the required keys, divide it into as many parts and recursively check them.") \
-    M(SettingUInt64, merge_tree_max_rows_to_use_cache, (128 * 8192), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
-    M(SettingUInt64, merge_tree_max_bytes_to_use_cache, (192 * 10 * 1024 * 1024), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
+    M(SettingBool, input_format_parallel_parsing, true, "Enable parallel parsing for some data formats.", 0) \
+    M(SettingUInt64, min_chunk_bytes_for_parallel_parsing, (1024 * 1024), "The minimum chunk size in bytes, which each thread will parse in parallel.", 0) \
    \
-    M(SettingBool, merge_tree_uniform_read_distribution, true, "Distribute read from MergeTree over threads evenly, ensuring stable average execution time of each thread within one read operation.") \
+    M(SettingUInt64, merge_tree_min_rows_for_concurrent_read, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized.", 0) \
+    M(SettingUInt64, merge_tree_min_bytes_for_concurrent_read, (24 * 10 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized.", 0) \
+    M(SettingUInt64, merge_tree_min_rows_for_seek, 0, "You can skip reading more than that number of rows at the price of one seek per file.", 0) \
+    M(SettingUInt64, merge_tree_min_bytes_for_seek, 0, "You can skip reading more than that number of bytes at the price of one seek per file.", 0) \
+    M(SettingUInt64, merge_tree_coarse_index_granularity, 8, "If the index segment can contain the required keys, divide it into as many parts and recursively check them.", 0) \
+    M(SettingUInt64, merge_tree_max_rows_to_use_cache, (128 * 8192), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)", 0) \
+    M(SettingUInt64, merge_tree_max_bytes_to_use_cache, (192 * 10 * 1024 * 1024), "The maximum number of bytes per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)", 0) \
    \
-    M(SettingUInt64, mysql_max_rows_to_insert, 65536, "The maximum number of rows in MySQL batch insertion of the MySQL storage engine") \
+    M(SettingBool, merge_tree_uniform_read_distribution, true, "Distribute read from MergeTree over threads evenly, ensuring stable average execution time of each thread within one read operation.", 0) \
    \
-    M(SettingUInt64, optimize_min_equality_disjunction_chain_length, 3, "The minimum length of the expression `expr = x1 OR ... expr = xN` for optimization ") \
+    M(SettingUInt64, mysql_max_rows_to_insert, 65536, "The maximum number of rows in MySQL batch insertion of the MySQL storage engine", 0) \
    \
-    M(SettingUInt64, min_bytes_to_use_direct_io, 0, "The minimum number of bytes for reading the data with O_DIRECT option during SELECT queries execution. 0 - disabled.") \
+    M(SettingUInt64, optimize_min_equality_disjunction_chain_length, 3, "The minimum length of the expression `expr = x1 OR ... expr = xN` for optimization ", 0) \
    \
-    M(SettingBool, force_index_by_date, 0, "Throw an exception if there is a partition key in a table, and it is not used.") \
-    M(SettingBool, force_primary_key, 0, "Throw an exception if there is primary key in a table, and it is not used.") \
+    M(SettingUInt64, min_bytes_to_use_direct_io, 0, "The minimum number of bytes for reading the data with O_DIRECT option during SELECT queries execution. 0 - disabled.", 0) \
    \
-    M(SettingUInt64, mark_cache_min_lifetime, 10000, "If the maximum size of mark_cache is exceeded, delete only records older than mark_cache_min_lifetime seconds.") \
+    M(SettingBool, force_index_by_date, 0, "Throw an exception if there is a partition key in a table, and it is not used.", 0) \
+    M(SettingBool, force_primary_key, 0, "Throw an exception if there is primary key in a table, and it is not used.", 0) \
    \
-    M(SettingFloat, max_streams_to_max_threads_ratio, 1, "Allows you to use more sources than the number of threads - to more evenly distribute work across threads. It is assumed that this is a temporary solution, since it will be possible in the future to make the number of sources equal to the number of threads, but for each source to dynamically select available work for itself.") \
-    M(SettingFloat, max_streams_multiplier_for_merge_tables, 5, "Ask more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and especially helpful when merged tables differ in size.") \
+    M(SettingUInt64, mark_cache_min_lifetime, 10000, "If the maximum size of mark_cache is exceeded, delete only records older than mark_cache_min_lifetime seconds.", 0) \
    \
-    M(SettingString, network_compression_method, "LZ4", "Allows you to select the method of data compression when writing.") \
+    M(SettingFloat, max_streams_to_max_threads_ratio, 1, "Allows you to use more sources than the number of threads - to more evenly distribute work across threads. It is assumed that this is a temporary solution, since it will be possible in the future to make the number of sources equal to the number of threads, but for each source to dynamically select available work for itself.", 0) \
+    M(SettingFloat, max_streams_multiplier_for_merge_tables, 5, "Ask more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and especially helpful when merged tables differ in size.", 0) \
    \
-    M(SettingInt64, network_zstd_compression_level, 1, "Allows you to select the level of ZSTD compression.") \
+    M(SettingString, network_compression_method, "LZ4", "Allows you to select the method of data compression when writing.", 0) \
    \
-    M(SettingUInt64, priority, 0, "Priority of the query. 1 - the highest, higher value - lower priority; 0 - do not use priorities.") \
-    M(SettingInt64, os_thread_priority, 0, "If non zero - set corresponding 'nice' value for query processing threads. Can be used to adjust query priority for OS scheduler.") \
+    M(SettingInt64, network_zstd_compression_level, 1, "Allows you to select the level of ZSTD compression.", 0) \
    \
-    M(SettingBool, log_queries, 0, "Log requests and write the log to the system table.") \
+    M(SettingUInt64, priority, 0, "Priority of the query. 1 - the highest, higher value - lower priority; 0 - do not use priorities.", 0) \
+    M(SettingInt64, os_thread_priority, 0, "If non zero - set corresponding 'nice' value for query processing threads. Can be used to adjust query priority for OS scheduler.", 0) \
    \
-    M(SettingUInt64, log_queries_cut_to_length, 100000, "If query length is greater than specified threshold (in bytes), then cut query when writing to query log. Also limit length of printed query in ordinary text log.") \
+    M(SettingBool, log_queries, 0, "Log requests and write the log to the system table.", 0) \
    \
-    M(SettingDistributedProductMode, distributed_product_mode, DistributedProductMode::DENY, "How are distributed subqueries performed inside IN or JOIN sections?") \
+    M(SettingUInt64, log_queries_cut_to_length, 100000, "If query length is greater than specified threshold (in bytes), then cut query when writing to query log. Also limit length of printed query in ordinary text log.", 0) \
    \
-    M(SettingUInt64, max_concurrent_queries_for_user, 0, "The maximum number of concurrent requests per user.") \
+    M(SettingDistributedProductMode, distributed_product_mode, DistributedProductMode::DENY, "How are distributed subqueries performed inside IN or JOIN sections?", IMPORTANT) \
    \
-    M(SettingBool, insert_deduplicate, true, "For INSERT queries in the replicated table, specifies that deduplication of inserted blocks should be performed") \
+    M(SettingUInt64, max_concurrent_queries_for_user, 0, "The maximum number of concurrent requests per user.", 0) \
    \
-    M(SettingUInt64, insert_quorum, 0, "For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled.") \
-    M(SettingMilliseconds, insert_quorum_timeout, 600000, "") \
-    M(SettingUInt64, select_sequential_consistency, 0, "For SELECT queries from the replicated table, throw an exception if the replica does not have a chunk written with the quorum; do not read the parts that have not yet been written with the quorum.") \
-    M(SettingUInt64, table_function_remote_max_addresses, 1000, "The maximum number of different shards and the maximum number of replicas of one shard in the `remote` function.") \
-    M(SettingMilliseconds, read_backoff_min_latency_ms, 1000, "Setting to reduce the number of threads in case of slow reads. Pay attention only to reads that took at least that much time.") \
-    M(SettingUInt64, read_backoff_max_throughput, 1048576, "Settings to reduce the number of threads in case of slow reads. Count events when the read bandwidth is less than that many bytes per second.") \
-    M(SettingMilliseconds, read_backoff_min_interval_between_events_ms, 1000, "Settings to reduce the number of threads in case of slow reads. Do not pay attention to the event, if the previous one has passed less than a certain amount of time.") \
-    M(SettingUInt64, read_backoff_min_events, 2, "Settings to reduce the number of threads in case of slow reads. The number of events after which the number of threads will be reduced.") \
+    M(SettingBool, insert_deduplicate, true, "For INSERT queries in the replicated table, specifies that deduplication of inserted blocks should be performed", 0) \
    \
-    M(SettingFloat, memory_tracker_fault_probability, 0., "For testing of `exception safety` - throw an exception every time you allocate memory with the specified probability.") \
+    M(SettingUInt64, insert_quorum, 0, "For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled.", 0) \
+    M(SettingMilliseconds, insert_quorum_timeout, 600000, "", 0) \
+    M(SettingUInt64, select_sequential_consistency, 0, "For SELECT queries from the replicated table, throw an exception if the replica does not have a chunk written with the quorum; do not read the parts that have not yet been written with the quorum.", 0) \
+    M(SettingUInt64, table_function_remote_max_addresses, 1000, "The maximum number of different shards and the maximum number of replicas of one shard in the `remote` function.", 0) \
+    M(SettingMilliseconds, read_backoff_min_latency_ms, 1000, "Setting to reduce the number of threads in case of slow reads. Pay attention only to reads that took at least that much time.", 0) \
+    M(SettingUInt64, read_backoff_max_throughput, 1048576, "Settings to reduce the number of threads in case of slow reads. Count events when the read bandwidth is less than that many bytes per second.", 0) \
+    M(SettingMilliseconds, read_backoff_min_interval_between_events_ms, 1000, "Settings to reduce the number of threads in case of slow reads. Do not pay attention to the event, if the previous one has passed less than a certain amount of time.", 0) \
+    M(SettingUInt64, read_backoff_min_events, 2, "Settings to reduce the number of threads in case of slow reads. The number of events after which the number of threads will be reduced.", 0) \
    \
-    M(SettingBool, enable_http_compression, 0, "Compress the result if the client over HTTP said that it understands data compressed by gzip or deflate.") \
-    M(SettingInt64, http_zlib_compression_level, 3, "Compression level - used if the client on HTTP said that it understands data compressed by gzip or deflate.") \
+    M(SettingFloat, memory_tracker_fault_probability, 0., "For testing of `exception safety` - throw an exception every time you allocate memory with the specified probability.", 0) \
    \
-    M(SettingBool, http_native_compression_disable_checksumming_on_decompress, 0, "If you uncompress the POST data from the client compressed by the native format, do not check the checksum.") \
+    M(SettingBool, enable_http_compression, 0, "Compress the result if the client over HTTP said that it understands data compressed by gzip or deflate.", 0) \
+    M(SettingInt64, http_zlib_compression_level, 3, "Compression level - used if the client on HTTP said that it understands data compressed by gzip or deflate.", 0) \
    \
-    M(SettingString, count_distinct_implementation, "uniqExact", "What aggregate function to use for implementation of count(DISTINCT ...)") \
+    M(SettingBool, http_native_compression_disable_checksumming_on_decompress, 0, "If you uncompress the POST data from the client compressed by the native format, do not check the checksum.", 0) \
    \
-    M(SettingBool, output_format_write_statistics, true, "Write statistics about read rows, bytes, time elapsed in suitable output formats.") \
+    M(SettingString, count_distinct_implementation, "uniqExact", "What aggregate function to use for implementation of count(DISTINCT ...)", 0) \
    \
-    M(SettingBool, add_http_cors_header, false, "Add HTTP CORS header.") \
+    M(SettingBool, output_format_write_statistics, true, "Write statistics about read rows, bytes, time elapsed in suitable output formats.", 0) \
    \
-    M(SettingUInt64, max_http_get_redirects, 0, "Max number of http GET redirects hops allowed. Make sure additional security measures are in place to prevent a malicious server to redirect your requests to unexpected services.") \
+    M(SettingBool, add_http_cors_header, false, "Add HTTP CORS header.", 0) \
    \
-    M(SettingBool, input_format_skip_unknown_fields, false, "Skip columns with unknown names from input data (it works for JSONEachRow, CSVWithNames, TSVWithNames and TSKV formats).") \
-    M(SettingBool, input_format_with_names_use_header, false, "For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header.") \
-    M(SettingBool, input_format_import_nested_json, false, "Map nested JSON data to nested tables (it works for JSONEachRow format).") \
-    M(SettingBool, input_format_defaults_for_omitted_fields, true, "For input data calculate default expressions for omitted fields (it works for JSONEachRow, CSV and TSV formats).") \
-    M(SettingBool, input_format_tsv_empty_as_default, false, "Treat empty fields in TSV input as default values.") \
-    M(SettingBool, input_format_null_as_default, false, "For text input formats initialize null fields with default values if data type of this field is not nullable") \
+    M(SettingUInt64, max_http_get_redirects, 0, "Max number of http GET redirects hops allowed. Make sure additional security measures are in place to prevent a malicious server to redirect your requests to unexpected services.", 0) \
    \
-    M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.") \
-    M(SettingBool, input_format_values_deduce_templates_of_expressions, false, "For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.") \
-    M(SettingBool, input_format_values_accurate_types_of_literals, true, "For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.") \
+    M(SettingBool, input_format_skip_unknown_fields, false, "Skip columns with unknown names from input data (it works for JSONEachRow, CSVWithNames, TSVWithNames and TSKV formats).", 0) \
+    M(SettingBool, input_format_with_names_use_header, false, "For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header.", 0) \
+    M(SettingBool, input_format_import_nested_json, false, "Map nested JSON data to nested tables (it works for JSONEachRow format).", 0) \
+    M(SettingBool, input_format_defaults_for_omitted_fields, true, "For input data calculate default expressions for omitted fields (it works for JSONEachRow, CSV and TSV formats).", IMPORTANT) \
+    M(SettingBool, input_format_tsv_empty_as_default, false, "Treat empty fields in TSV input as default values.", 0) \
+    M(SettingBool, input_format_null_as_default, false, "For text input formats initialize null fields with default values if data type of this field is not nullable", 0) \
    \
-    M(SettingBool, output_format_json_quote_64bit_integers, true, "Controls quoting of 64-bit integers in JSON output format.") \
+    M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.", 0) \
+    M(SettingBool, input_format_values_deduce_templates_of_expressions, false, "For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.", 0) \
+    M(SettingBool, input_format_values_accurate_types_of_literals, true, "For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.", 0) \
    \
-    M(SettingBool, output_format_json_quote_denormals, false, "Enables '+nan', '-nan', '+inf', '-inf' outputs in JSON output format.") \
+    M(SettingBool, output_format_json_quote_64bit_integers, true, "Controls quoting of 64-bit integers in JSON output format.", 0) \
    \
-    M(SettingBool, output_format_json_escape_forward_slashes, true, "Controls escaping forward slashes for string outputs in JSON output format. This is intended for compatibility with JavaScript. Don't confuse with backslashes that are always escaped.") \
+    M(SettingBool, output_format_json_quote_denormals, false, "Enables '+nan', '-nan', '+inf', '-inf' outputs in JSON output format.", 0) \
    \
-    M(SettingUInt64, output_format_pretty_max_rows, 10000, "Rows limit for Pretty formats.") \
-    M(SettingUInt64, output_format_pretty_max_column_pad_width, 250, "Maximum width to pad all values in a column in Pretty formats.") \
-    M(SettingBool, output_format_pretty_color, true, "Use ANSI escape sequences to paint colors in Pretty formats") \
-    M(SettingUInt64, output_format_parquet_row_group_size, 1000000, "Row group size in rows.") \
+    M(SettingBool, output_format_json_escape_forward_slashes, true, "Controls escaping forward slashes for string outputs in JSON output format. This is intended for compatibility with JavaScript. Don't confuse with backslashes that are always escaped.", 0) \
    \
-    M(SettingBool, use_client_time_zone, false, "Use client timezone for interpreting DateTime string values, instead of adopting server timezone.") \
+    M(SettingUInt64, output_format_pretty_max_rows, 10000, "Rows limit for Pretty formats.", 0) \
+    M(SettingUInt64, output_format_pretty_max_column_pad_width, 250, "Maximum width to pad all values in a column in Pretty formats.", 0) \
+    M(SettingBool, output_format_pretty_color, true, "Use ANSI escape sequences to paint colors in Pretty formats", 0) \
+    M(SettingUInt64, output_format_parquet_row_group_size, 1000000, "Row group size in rows.", 0) \
    \
-    M(SettingBool, send_progress_in_http_headers, false, "Send progress notifications using X-ClickHouse-Progress headers. Some clients do not support high amount of HTTP headers (Python requests in particular), so it is disabled by default.") \
+    M(SettingBool, use_client_time_zone, false, "Use client timezone for interpreting DateTime string values, instead of adopting server timezone.", 0) \
    \
-    M(SettingUInt64, http_headers_progress_interval_ms, 100, "Do not send HTTP headers X-ClickHouse-Progress more frequently than at each specified interval.") \
+    M(SettingBool, send_progress_in_http_headers, false, "Send progress notifications using X-ClickHouse-Progress headers. Some clients do not support high amount of HTTP headers (Python requests in particular), so it is disabled by default.", 0) \
    \
-    M(SettingBool, fsync_metadata, 1, "Do fsync after changing metadata for tables and databases (.sql files). Could be disabled in case of poor latency on server with high load of DDL queries and high load of disk subsystem.") \
+    M(SettingUInt64, http_headers_progress_interval_ms, 100, "Do not send HTTP headers X-ClickHouse-Progress more frequently than at each specified interval.", 0) \
    \
-    M(SettingUInt64, input_format_allow_errors_num, 0, "Maximum absolute amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.") \
-    M(SettingFloat, input_format_allow_errors_ratio, 0, "Maximum relative amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.") \
+    M(SettingBool, fsync_metadata, 1, "Do fsync after changing metadata for tables and databases (.sql files). Could be disabled in case of poor latency on server with high load of DDL queries and high load of disk subsystem.", 0) \
    \
-    M(SettingBool, join_use_nulls, 0, "Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type.") \
+    M(SettingUInt64, input_format_allow_errors_num, 0, "Maximum absolute amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.", 0) \
+    M(SettingFloat, input_format_allow_errors_ratio, 0, "Maximum relative amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.", 0) \
    \
-    M(SettingJoinStrictness, join_default_strictness, JoinStrictness::ALL, "Set default strictness in JOIN query. Possible values: empty string, 'ANY', 'ALL'. If empty, query without strictness will throw exception.") \
-    M(SettingBool, any_join_distinct_right_table_keys, false, "Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for 't1 ANY LEFT JOIN t2' and 't2 ANY RIGHT JOIN t1'. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with LEFT one.") \
+    M(SettingBool, join_use_nulls, 0, "Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type.", IMPORTANT) \
    \
-    M(SettingUInt64, preferred_block_size_bytes, 1000000, "") \
+    M(SettingJoinStrictness, join_default_strictness, JoinStrictness::ALL, "Set default strictness in JOIN query. Possible values: empty string, 'ANY', 'ALL'. If empty, query without strictness will throw exception.", 0) \
+    M(SettingBool, any_join_distinct_right_table_keys, false, "Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for 't1 ANY LEFT JOIN t2' and 't2 ANY RIGHT JOIN t1'. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with LEFT one.", IMPORTANT) \
    \
-    M(SettingUInt64, max_replica_delay_for_distributed_queries, 300, "If set, distributed queries of Replicated tables will choose servers with replication delay in seconds less than the specified value (not inclusive). Zero means do not take delay into account.") \
-    M(SettingBool, fallback_to_stale_replicas_for_distributed_queries, 1, "Suppose max_replica_delay_for_distributed_queries is set and all replicas for the queried table are stale. If this setting is enabled, the query will be performed anyway, otherwise the error will be reported.") \
-    M(SettingUInt64, preferred_max_column_in_block_size_bytes, 0, "Limit on max column size in block while reading. Helps to decrease cache misses count. Should be close to L2 cache size.") \
+    M(SettingUInt64, preferred_block_size_bytes, 1000000, "", 0) \
    \
-    M(SettingBool, insert_distributed_sync, false, "If setting is enabled, insert query into distributed waits until data will be sent to all nodes in cluster.") \
-    M(SettingUInt64, insert_distributed_timeout, 0, "Timeout for insert query into distributed. Setting is used only with insert_distributed_sync enabled. Zero value means no timeout.") \
-    M(SettingInt64, distributed_ddl_task_timeout, 180, "Timeout for DDL query responses from all hosts in cluster. If a ddl request has not been performed on all hosts, a response will contain a timeout error and a request will be executed in an async mode. Negative value means infinite.") \
-    M(SettingMilliseconds, stream_flush_interval_ms, 7500, "Timeout for flushing data from streaming storages.") \
-    M(SettingMilliseconds, stream_poll_timeout_ms, 500, "Timeout for polling data from/to streaming storages.") \
+    M(SettingUInt64, max_replica_delay_for_distributed_queries, 300, "If set, distributed queries of Replicated tables will choose servers with replication delay in seconds less than the specified value (not inclusive). Zero means do not take delay into account.", 0) \
+    M(SettingBool, fallback_to_stale_replicas_for_distributed_queries, 1, "Suppose max_replica_delay_for_distributed_queries is set and all replicas for the queried table are stale. If this setting is enabled, the query will be performed anyway, otherwise the error will be reported.", 0) \
+    M(SettingUInt64, preferred_max_column_in_block_size_bytes, 0, "Limit on max column size in block while reading. Helps to decrease cache misses count. Should be close to L2 cache size.", 0) \
    \
-    M(SettingString, format_schema, "", "Schema identifier (used by schema-based formats)") \
-    M(SettingString, format_template_resultset, "", "Path to file which contains format string for result set (for Template format)") \
-    M(SettingString, format_template_row, "", "Path to file which contains format string for rows (for Template format)") \
-    M(SettingString, format_template_rows_between_delimiter, "\n", "Delimiter between rows (for Template format)") \
+    M(SettingBool, insert_distributed_sync, false, "If setting is enabled, insert query into distributed waits until data will be sent to all nodes in cluster.", 0) \
+    M(SettingUInt64, insert_distributed_timeout, 0, "Timeout for insert query into distributed. Setting is used only with insert_distributed_sync enabled. Zero value means no timeout.", 0) \
+    M(SettingInt64, distributed_ddl_task_timeout, 180, "Timeout for DDL query responses from all hosts in cluster. If a ddl request has not been performed on all hosts, a response will contain a timeout error and a request will be executed in an async mode. Negative value means infinite.", 0) \
+    M(SettingMilliseconds, stream_flush_interval_ms, 7500, "Timeout for flushing data from streaming storages.", 0) \
+    M(SettingMilliseconds, stream_poll_timeout_ms, 500, "Timeout for polling data from/to streaming storages.", 0) \
    \
-    M(SettingString, format_custom_escaping_rule, "Escaped", "Field escaping rule (for CustomSeparated format)") \
-    M(SettingString, format_custom_field_delimiter, "\t", "Delimiter between fields (for CustomSeparated format)") \
-    M(SettingString, format_custom_row_before_delimiter, "", "Delimiter before field of the first column (for CustomSeparated format)") \
-    M(SettingString, format_custom_row_after_delimiter, "\n", "Delimiter after field of the last column (for CustomSeparated format)") \
-    M(SettingString, format_custom_row_between_delimiter, "", "Delimiter between rows (for CustomSeparated format)") \
-    M(SettingString, format_custom_result_before_delimiter, "", "Prefix before result set (for CustomSeparated format)") \
-    M(SettingString, format_custom_result_after_delimiter, "", "Suffix after result set (for CustomSeparated format)") \
+    M(SettingString, format_schema, "", "Schema identifier (used by schema-based formats)", 0) \
+    M(SettingString, format_template_resultset, "", "Path to file which contains format string for result set (for Template format)", 0) \
+    M(SettingString, format_template_row, "", "Path to file which contains format string for rows (for Template format)", 0) \
+    M(SettingString, format_template_rows_between_delimiter, "\n", "Delimiter between rows (for Template format)", 0) \
    \
-    M(SettingBool, insert_allow_materialized_columns, 0, "If setting is enabled, Allow materialized columns in INSERT.") \
-    M(SettingSeconds, http_connection_timeout, DEFAULT_HTTP_READ_BUFFER_CONNECTION_TIMEOUT, "HTTP connection timeout.") \
-    M(SettingSeconds, http_send_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP send timeout") \
-    M(SettingSeconds, http_receive_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP receive timeout") \
-    M(SettingBool, optimize_throw_if_noop, false, "If setting is enabled and OPTIMIZE query didn't actually assign a merge then an explanatory exception is thrown") \
-    M(SettingBool, use_index_for_in_with_subqueries, true, "Try using an index if there is a subquery or a table expression on the right side of the IN operator.") \
-    M(SettingBool, joined_subquery_requires_alias, false, "Force joined subqueries to have aliases for correct name qualification.") \
-    M(SettingBool, empty_result_for_aggregation_by_empty_set, false, "Return empty result when aggregating without keys on empty set.") \
-    M(SettingBool, allow_distributed_ddl, true, "If it is set to true, then a user is allowed to execute distributed DDL queries.") \
-    M(SettingUInt64, odbc_max_field_size, 1024, "Max size of field that can be read from ODBC dictionary. Long strings are truncated.") \
-    M(SettingUInt64, query_profiler_real_time_period_ns, 1000000000, "Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.") \
-    M(SettingUInt64, query_profiler_cpu_time_period_ns, 1000000000, "Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.") \
+    M(SettingString, format_custom_escaping_rule, "Escaped", "Field escaping rule (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_field_delimiter, "\t", "Delimiter between fields (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_row_before_delimiter, "", "Delimiter before field of the first column (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_row_after_delimiter, "\n", "Delimiter after field of the last column (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_row_between_delimiter, "", "Delimiter between rows (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_result_before_delimiter, "", "Prefix before result set (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_result_after_delimiter, "", "Suffix after result set (for CustomSeparated format)", 0) \
+    \
+    M(SettingBool, insert_allow_materialized_columns, 0, "If setting is enabled, Allow materialized columns in INSERT.", 0) \
+    M(SettingSeconds, http_connection_timeout, DEFAULT_HTTP_READ_BUFFER_CONNECTION_TIMEOUT, "HTTP connection timeout.", 0) \
+    M(SettingSeconds, http_send_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP send timeout", 0) \
+    M(SettingSeconds, http_receive_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP receive timeout", 0) \
+    M(SettingBool, optimize_throw_if_noop, false, "If setting is enabled and OPTIMIZE query didn't actually assign a merge then an explanatory exception is thrown", 0) \
+    M(SettingBool, use_index_for_in_with_subqueries, true, "Try using an index if there is a subquery or a table expression on the right side of the IN operator.", 0) \
+    M(SettingBool, joined_subquery_requires_alias, false, "Force joined subqueries to have aliases for correct name qualification.", 0) \
+    M(SettingBool, empty_result_for_aggregation_by_empty_set, false, "Return empty result when aggregating without keys on empty set.", 0) \
+    M(SettingBool, allow_distributed_ddl, true, "If it is set to true, then a user is allowed to execute distributed DDL queries.", 0) \
+    M(SettingUInt64, odbc_max_field_size, 1024, "Max size of field that can be read from ODBC dictionary. Long strings are truncated.", 0) \
+    M(SettingUInt64, query_profiler_real_time_period_ns, 1000000000, "Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
+    M(SettingUInt64, query_profiler_cpu_time_period_ns, 1000000000, "Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
    \
    \
    /** Limits during query execution are part of the settings. \
@ -257,135 +264,135 @@ struct Settings : public SettingsCollection<Settings>
      * Almost all limits apply to each stream individually. \
      */ \
    \
-    M(SettingUInt64, max_rows_to_read, 0, "Limit on read rows from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.") \
-    M(SettingUInt64, max_bytes_to_read, 0, "Limit on read bytes (after decompression) from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.") \
-    M(SettingOverflowMode, read_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_to_read, 0, "Limit on read rows from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.", 0) \
+    M(SettingUInt64, max_bytes_to_read, 0, "Limit on read bytes (after decompression) from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.", 0) \
+    M(SettingOverflowMode, read_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_rows_to_group_by, 0, "") \
-    M(SettingOverflowModeGroupBy, group_by_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
-    M(SettingUInt64, max_bytes_before_external_group_by, 0, "") \
+    M(SettingUInt64, max_rows_to_group_by, 0, "", 0) \
+    M(SettingOverflowModeGroupBy, group_by_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
+    M(SettingUInt64, max_bytes_before_external_group_by, 0, "", 0) \
    \
-    M(SettingUInt64, max_rows_to_sort, 0, "") \
-    M(SettingUInt64, max_bytes_to_sort, 0, "") \
-    M(SettingOverflowMode, sort_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
-    M(SettingUInt64, max_bytes_before_external_sort, 0, "") \
-    M(SettingUInt64, max_bytes_before_remerge_sort, 1000000000, "In case of ORDER BY with LIMIT, when memory usage is higher than specified threshold, perform additional steps of merging blocks before final merge to keep just top LIMIT rows.") \
+    M(SettingUInt64, max_rows_to_sort, 0, "", 0) \
+    M(SettingUInt64, max_bytes_to_sort, 0, "", 0) \
+    M(SettingOverflowMode, sort_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
+    M(SettingUInt64, max_bytes_before_external_sort, 0, "", 0) \
+    M(SettingUInt64, max_bytes_before_remerge_sort, 1000000000, "In case of ORDER BY with LIMIT, when memory usage is higher than specified threshold, perform additional steps of merging blocks before final merge to keep just top LIMIT rows.", 0) \
    \
-    M(SettingUInt64, max_result_rows, 0, "Limit on result size in rows. Also checked for intermediate data sent from remote servers.") \
-    M(SettingUInt64, max_result_bytes, 0, "Limit on result size in bytes (uncompressed). Also checked for intermediate data sent from remote servers.") \
-    M(SettingOverflowMode, result_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_result_rows, 0, "Limit on result size in rows. Also checked for intermediate data sent from remote servers.", 0) \
+    M(SettingUInt64, max_result_bytes, 0, "Limit on result size in bytes (uncompressed). Also checked for intermediate data sent from remote servers.", 0) \
+    M(SettingOverflowMode, result_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
    /* TODO: Check also when merging and finalizing aggregate functions. */ \
-    M(SettingSeconds, max_execution_time, 0, "") \
-    M(SettingOverflowMode, timeout_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingSeconds, max_execution_time, 0, "", 0) \
+    M(SettingOverflowMode, timeout_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, min_execution_speed, 0, "Minimum number of execution rows per second.") \
-    M(SettingUInt64, max_execution_speed, 0, "Maximum number of execution rows per second.") \
-    M(SettingUInt64, min_execution_speed_bytes, 0, "Minimum number of execution bytes per second.") \
-    M(SettingUInt64, max_execution_speed_bytes, 0, "Maximum number of execution bytes per second.") \
-    M(SettingSeconds, timeout_before_checking_execution_speed, 0, "Check that the speed is not too low after the specified time has elapsed.") \
+    M(SettingUInt64, min_execution_speed, 0, "Minimum number of execution rows per second.", 0) \
+    M(SettingUInt64, max_execution_speed, 0, "Maximum number of execution rows per second.", 0) \
+    M(SettingUInt64, min_execution_speed_bytes, 0, "Minimum number of execution bytes per second.", 0) \
+    M(SettingUInt64, max_execution_speed_bytes, 0, "Maximum number of execution bytes per second.", 0) \
+    M(SettingSeconds, timeout_before_checking_execution_speed, 0, "Check that the speed is not too low after the specified time has elapsed.", 0) \
    \
-    M(SettingUInt64, max_columns_to_read, 0, "") \
-    M(SettingUInt64, max_temporary_columns, 0, "") \
-    M(SettingUInt64, max_temporary_non_const_columns, 0, "") \
+    M(SettingUInt64, max_columns_to_read, 0, "", 0) \
+    M(SettingUInt64, max_temporary_columns, 0, "", 0) \
+    M(SettingUInt64, max_temporary_non_const_columns, 0, "", 0) \
    \
-    M(SettingUInt64, max_subquery_depth, 100, "") \
-    M(SettingUInt64, max_pipeline_depth, 1000, "") \
-    M(SettingUInt64, max_ast_depth, 1000, "Maximum depth of query syntax tree. Checked after parsing.") \
-    M(SettingUInt64, max_ast_elements, 50000, "Maximum size of query syntax tree in number of nodes. Checked after parsing.") \
-    M(SettingUInt64, max_expanded_ast_elements, 500000, "Maximum size of query syntax tree in number of nodes after expansion of aliases and the asterisk.") \
+    M(SettingUInt64, max_subquery_depth, 100, "", 0) \
+    M(SettingUInt64, max_pipeline_depth, 1000, "", 0) \
+    M(SettingUInt64, max_ast_depth, 1000, "Maximum depth of query syntax tree. Checked after parsing.", 0) \
+    M(SettingUInt64, max_ast_elements, 50000, "Maximum size of query syntax tree in number of nodes. Checked after parsing.", 0) \
+    M(SettingUInt64, max_expanded_ast_elements, 500000, "Maximum size of query syntax tree in number of nodes after expansion of aliases and the asterisk.", 0) \
    \
-    M(SettingUInt64, readonly, 0, "0 - everything is allowed. 1 - only read requests. 2 - only read requests, as well as changing settings, except for the 'readonly' setting.") \
+    M(SettingUInt64, readonly, 0, "0 - everything is allowed. 1 - only read requests. 2 - only read requests, as well as changing settings, except for the 'readonly' setting.", 0) \
    \
-    M(SettingUInt64, max_rows_in_set, 0, "Maximum size of the set (in number of elements) resulting from the execution of the IN section.") \
-    M(SettingUInt64, max_bytes_in_set, 0, "Maximum size of the set (in bytes in memory) resulting from the execution of the IN section.") \
-    M(SettingOverflowMode, set_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_in_set, 0, "Maximum size of the set (in number of elements) resulting from the execution of the IN section.", 0) \
+    M(SettingUInt64, max_bytes_in_set, 0, "Maximum size of the set (in bytes in memory) resulting from the execution of the IN section.", 0) \
+    M(SettingOverflowMode, set_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_rows_in_join, 0, "Maximum size of the hash table for JOIN (in number of rows).") \
-    M(SettingUInt64, max_bytes_in_join, 0, "Maximum size of the hash table for JOIN (in number of bytes in memory).") \
-    M(SettingOverflowMode, join_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
-    M(SettingBool, join_any_take_last_row, false, "When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key.") \
-    M(SettingBool, partial_merge_join, false, "Use partial merge join instead of hash join for LEFT and INNER JOINs.") \
-    M(SettingBool, partial_merge_join_optimizations, false, "Enable optimizations in partial merge join") \
-    M(SettingUInt64, default_max_bytes_in_join, 100000000, "Maximum size of right-side table if limit's required but max_bytes_in_join is not set.") \
-    M(SettingUInt64, partial_merge_join_rows_in_right_blocks, 10000, "Split right-hand joining data in blocks of specified size. It's a portion of data indexed by min-max values and possibly unloaded on disk.") \
-    M(SettingUInt64, partial_merge_join_rows_in_left_blocks, 10000, "Group left-hand joining data in bigger blocks. Setting it to a bigger value increase JOIN performance and memory usage.") \
+    M(SettingUInt64, max_rows_in_join, 0, "Maximum size of the hash table for JOIN (in number of rows).", 0) \
+    M(SettingUInt64, max_bytes_in_join, 0, "Maximum size of the hash table for JOIN (in number of bytes in memory).", 0) \
+    M(SettingOverflowMode, join_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
+    M(SettingBool, join_any_take_last_row, false, "When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key.", IMPORTANT) \
+    M(SettingBool, partial_merge_join, false, "Use partial merge join instead of hash join for LEFT and INNER JOINs.", 0) \
+    M(SettingBool, partial_merge_join_optimizations, false, "Enable optimizations in partial merge join", 0) \
+    M(SettingUInt64, default_max_bytes_in_join, 100000000, "Maximum size of right-side table if limit's required but max_bytes_in_join is not set.", 0) \
+    M(SettingUInt64, partial_merge_join_rows_in_right_blocks, 10000, "Split right-hand joining data in blocks of specified size. It's a portion of data indexed by min-max values and possibly unloaded on disk.", 0) \
+    M(SettingUInt64, partial_merge_join_rows_in_left_blocks, 10000, "Group left-hand joining data in bigger blocks. Setting it to a bigger value increases JOIN performance and memory usage.", 0) \
    \
-    M(SettingUInt64, max_rows_to_transfer, 0, "Maximum size (in rows) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.") \
-    M(SettingUInt64, max_bytes_to_transfer, 0, "Maximum size (in uncompressed bytes) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.") \
-    M(SettingOverflowMode, transfer_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_to_transfer, 0, "Maximum size (in rows) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.", 0) \
+    M(SettingUInt64, max_bytes_to_transfer, 0, "Maximum size (in uncompressed bytes) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.", 0) \
+    M(SettingOverflowMode, transfer_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_rows_in_distinct, 0, "Maximum number of elements during execution of DISTINCT.") \
-    M(SettingUInt64, max_bytes_in_distinct, 0, "Maximum total size of state (in uncompressed bytes) in memory for the execution of DISTINCT.") \
-    M(SettingOverflowMode, distinct_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_in_distinct, 0, "Maximum number of elements during execution of DISTINCT.", 0) \
+    M(SettingUInt64, max_bytes_in_distinct, 0, "Maximum total size of state (in uncompressed bytes) in memory for the execution of DISTINCT.", 0) \
+    M(SettingOverflowMode, distinct_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_memory_usage, 0, "Maximum memory usage for processing of single query. Zero means unlimited.") \
-    M(SettingUInt64, max_memory_usage_for_user, 0, "Maximum memory usage for processing all concurrently running queries for the user. Zero means unlimited.") \
-    M(SettingUInt64, max_memory_usage_for_all_queries, 0, "Maximum memory usage for processing all concurrently running queries on the server. Zero means unlimited.") \
+    M(SettingUInt64, max_memory_usage, 0, "Maximum memory usage for processing of single query. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_memory_usage_for_user, 0, "Maximum memory usage for processing all concurrently running queries for the user. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_memory_usage_for_all_queries, 0, "Maximum memory usage for processing all concurrently running queries on the server. Zero means unlimited.", 0) \
    \
-    M(SettingUInt64, max_network_bandwidth, 0, "The maximum speed of data exchange over the network in bytes per second for a query. Zero means unlimited.") \
-    M(SettingUInt64, max_network_bytes, 0, "The maximum number of bytes (compressed) to receive or transmit over the network for execution of the query.") \
-    M(SettingUInt64, max_network_bandwidth_for_user, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running user queries. Zero means unlimited.")\
-    M(SettingUInt64, max_network_bandwidth_for_all_users, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running queries. Zero means unlimited.") \
-    M(SettingChar, format_csv_delimiter, ',', "The character to be considered as a delimiter in CSV data. If setting with a string, a string has to have a length of 1.") \
-    M(SettingBool, format_csv_allow_single_quotes, 1, "If it is set to true, allow strings in single quotes.") \
-    M(SettingBool, format_csv_allow_double_quotes, 1, "If it is set to true, allow strings in double quotes.") \
-    M(SettingBool, input_format_csv_unquoted_null_literal_as_null, false, "Consider unquoted NULL literal as \\N") \
+    M(SettingUInt64, max_network_bandwidth, 0, "The maximum speed of data exchange over the network in bytes per second for a query. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_network_bytes, 0, "The maximum number of bytes (compressed) to receive or transmit over the network for execution of the query.", 0) \
+    M(SettingUInt64, max_network_bandwidth_for_user, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running user queries. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_network_bandwidth_for_all_users, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running queries. Zero means unlimited.", 0) \
+    M(SettingChar, format_csv_delimiter, ',', "The character to be considered as a delimiter in CSV data. If setting with a string, a string has to have a length of 1.", 0) \
+    M(SettingBool, format_csv_allow_single_quotes, 1, "If it is set to true, allow strings in single quotes.", 0) \
+    M(SettingBool, format_csv_allow_double_quotes, 1, "If it is set to true, allow strings in double quotes.", 0) \
+    M(SettingBool, input_format_csv_unquoted_null_literal_as_null, false, "Consider unquoted NULL literal as \\N", 0) \
    \
-    M(SettingDateTimeInputFormat, date_time_input_format, FormatSettings::DateTimeInputFormat::Basic, "Method to read DateTime from text input formats. Possible values: 'basic' and 'best_effort'.") \
-    M(SettingBool, log_profile_events, true, "Log query performance statistics into the query_log and query_thread_log.") \
-    M(SettingBool, log_query_settings, true, "Log query settings into the query_log.") \
-    M(SettingBool, log_query_threads, true, "Log query threads into system.query_thread_log table. This setting have effect only when 'log_queries' is true.") \
-    M(SettingLogsLevel, send_logs_level, LogsLevel::none, "Send server text logs with specified minimum level to client. Valid values: 'trace', 'debug', 'information', 'warning', 'error', 'none'") \
-    M(SettingBool, enable_optimize_predicate_expression, 1, "If it is set to true, optimize predicates to subqueries.") \
-    M(SettingBool, enable_optimize_predicate_expression_to_final_subquery, 1, "Allow push predicate to final subquery.") \
+    M(SettingDateTimeInputFormat, date_time_input_format, FormatSettings::DateTimeInputFormat::Basic, "Method to read DateTime from text input formats. Possible values: 'basic' and 'best_effort'.", 0) \
+    M(SettingBool, log_profile_events, true, "Log query performance statistics into the query_log and query_thread_log.", 0) \
+    M(SettingBool, log_query_settings, true, "Log query settings into the query_log.", 0) \
+    M(SettingBool, log_query_threads, true, "Log query threads into system.query_thread_log table. This setting has an effect only when 'log_queries' is true.", 0) \
+    M(SettingLogsLevel, send_logs_level, LogsLevel::none, "Send server text logs with specified minimum level to client. Valid values: 'trace', 'debug', 'information', 'warning', 'error', 'none'", 0) \
+    M(SettingBool, enable_optimize_predicate_expression, 1, "If it is set to true, optimize predicates to subqueries.", 0) \
+    M(SettingBool, enable_optimize_predicate_expression_to_final_subquery, 1, "Allow push predicate to final subquery.", 0) \
    \
-    M(SettingUInt64, low_cardinality_max_dictionary_size, 8192, "Maximum size (in rows) of shared global dictionary for LowCardinality type.") \
-    M(SettingBool, low_cardinality_use_single_dictionary_for_part, false, "LowCardinality type serialization setting. If is true, than will use additional keys when global dictionary overflows. Otherwise, will create several shared dictionaries.") \
-    M(SettingBool, decimal_check_overflow, true, "Check overflow of decimal arithmetic/comparison operations") \
+    M(SettingUInt64, low_cardinality_max_dictionary_size, 8192, "Maximum size (in rows) of shared global dictionary for LowCardinality type.", 0) \
+    M(SettingBool, low_cardinality_use_single_dictionary_for_part, false, "LowCardinality type serialization setting. If it is true, then additional keys will be used when the global dictionary overflows. Otherwise, several shared dictionaries will be created.", 0) \
+    M(SettingBool, decimal_check_overflow, true, "Check overflow of decimal arithmetic/comparison operations", 0) \
    \
-    M(SettingBool, prefer_localhost_replica, 1, "1 - always send query to local replica, if it exists. 0 - choose replica to send query between local and remote ones according to load_balancing") \
-    M(SettingUInt64, max_fetch_partition_retries_count, 5, "Amount of retries while fetching partition from another host.") \
-    M(SettingUInt64, http_max_multipart_form_data_size, 1024 * 1024 * 1024, "Limit on size of multipart/form-data content. This setting cannot be parsed from URL parameters and should be set in user profile. Note that content is parsed and external tables are created in memory before start of query execution. And this is the only limit that has effect on that stage (limits on max memory usage and max execution time have no effect while reading HTTP form data).") \
-    M(SettingBool, calculate_text_stack_trace, 1, "Calculate text stack trace in case of exceptions during query execution. This is the default. It requires symbol lookups that may slow down fuzzing tests when huge amount of wrong queries are executed. In normal cases you should not disable this option.") \
-    M(SettingBool, allow_ddl, true, "If it is set to true, then a user is allowed to executed DDL queries.") \
-    M(SettingBool, parallel_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.") \
-    M(SettingBool, enable_debug_queries, false, "Enables debug queries such as AST.") \
-    M(SettingBool, enable_unaligned_array_join, false, "Allow ARRAY JOIN with multiple arrays that have different sizes. When this settings is enabled, arrays will be resized to the longest one.") \
-    M(SettingBool, optimize_read_in_order, true, "Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.") \
-    M(SettingBool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.") \
-    M(SettingBool, allow_experimental_multiple_joins_emulation, true, "Emulate multiple joins using subselects") \
-    M(SettingBool, allow_experimental_cross_to_join_conversion, true, "Convert CROSS JOIN to INNER JOIN if possible") \
-    M(SettingBool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.") \
-    M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only by 'mysql' and 'odbc' table functions.") \
-    M(SettingBool, allow_experimental_data_skipping_indices, false, "If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.") \
+    M(SettingBool, prefer_localhost_replica, 1, "1 - always send query to local replica, if it exists. 0 - choose replica to send query between local and remote ones according to load_balancing", 0) \
+    M(SettingUInt64, max_fetch_partition_retries_count, 5, "Amount of retries while fetching partition from another host.", 0) \
+    M(SettingUInt64, http_max_multipart_form_data_size, 1024 * 1024 * 1024, "Limit on size of multipart/form-data content. This setting cannot be parsed from URL parameters and should be set in user profile. Note that content is parsed and external tables are created in memory before start of query execution. And this is the only limit that has effect on that stage (limits on max memory usage and max execution time have no effect while reading HTTP form data).", 0) \
+    M(SettingBool, calculate_text_stack_trace, 1, "Calculate text stack trace in case of exceptions during query execution. This is the default. It requires symbol lookups that may slow down fuzzing tests when huge amount of wrong queries are executed. In normal cases you should not disable this option.", 0) \
+    M(SettingBool, allow_ddl, true, "If it is set to true, then a user is allowed to execute DDL queries.", 0) \
+    M(SettingBool, parallel_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.", 0) \
+    M(SettingBool, enable_debug_queries, false, "Enables debug queries such as AST.", 0) \
+    M(SettingBool, enable_unaligned_array_join, false, "Allow ARRAY JOIN with multiple arrays that have different sizes. When this setting is enabled, arrays will be resized to the longest one.", 0) \
+    M(SettingBool, optimize_read_in_order, true, "Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.", 0) \
+    M(SettingBool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.", 0) \
+    M(SettingBool, allow_experimental_multiple_joins_emulation, true, "Emulate multiple joins using subselects", 0) \
+    M(SettingBool, allow_experimental_cross_to_join_conversion, true, "Convert CROSS JOIN to INNER JOIN if possible", 0) \
+    M(SettingBool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.", 0) \
+    M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only by 'mysql' and 'odbc' table functions.", 0) \
+    M(SettingBool, allow_experimental_data_skipping_indices, false, "If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.", 0) \
    \
-    M(SettingBool, experimental_use_processors, false, "Use processors pipeline.") \
+    M(SettingBool, experimental_use_processors, false, "Use processors pipeline.", 0) \
    \
-    M(SettingBool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.") \
-    M(SettingBool, allow_simdjson, true, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.") \
-    M(SettingBool, allow_introspection_functions, false, "Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.") \
+    M(SettingBool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.", 0) \
+    M(SettingBool, allow_simdjson, true, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.", 0) \
+    M(SettingBool, allow_introspection_functions, false, "Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.", 0) \
    \
-    M(SettingUInt64, max_partitions_per_insert_block, 100, "Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.") \
-    M(SettingBool, check_query_single_value_result, true, "Return check query result as single 1/0 value") \
-    M(SettingBool, allow_drop_detached, false, "Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries") \
+    M(SettingUInt64, max_partitions_per_insert_block, 100, "Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.", 0) \
+    M(SettingBool, check_query_single_value_result, true, "Return check query result as single 1/0 value", 0) \
+    M(SettingBool, allow_drop_detached, false, "Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries", 0) \
    \
-    M(SettingSeconds, distributed_replica_error_half_life, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_DECREASE_ERROR_PERIOD, "Time period reduces replica error counter by 2 times.") \
-    M(SettingUInt64, distributed_replica_error_cap, DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT, "Max number of errors per replica, prevents piling up increadible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time.") \
+    M(SettingSeconds, distributed_replica_error_half_life, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_DECREASE_ERROR_PERIOD, "Time period reduces replica error counter by 2 times.", 0) \
+    M(SettingUInt64, distributed_replica_error_cap, DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT, "Max number of errors per replica, prevents piling up an incredible amount of errors if the replica was offline for some time and allows it to be reconsidered in a shorter amount of time.", 0) \
    \
-    M(SettingBool, allow_experimental_live_view, false, "Enable LIVE VIEW. Not mature enough.") \
-    M(SettingSeconds, live_view_heartbeat_interval, DEFAULT_LIVE_VIEW_HEARTBEAT_INTERVAL_SEC, "The heartbeat interval in seconds to indicate live query is alive.") \
-    M(SettingSeconds, temporary_live_view_timeout, DEFAULT_TEMPORARY_LIVE_VIEW_TIMEOUT_SEC, "Timeout after which temporary live view is deleted.") \
-    M(SettingUInt64, max_live_view_insert_blocks_before_refresh, 64, "Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed.") \
-    M(SettingUInt64, min_free_disk_space_for_temporary_data, 0, "The minimum disk space to keep while writing temporary data used in external sorting and aggregation.") \
+    M(SettingBool, allow_experimental_live_view, false, "Enable LIVE VIEW. Not mature enough.", 0) \
+    M(SettingSeconds, live_view_heartbeat_interval, DEFAULT_LIVE_VIEW_HEARTBEAT_INTERVAL_SEC, "The heartbeat interval in seconds to indicate live query is alive.", 0) \
+    M(SettingSeconds, temporary_live_view_timeout, DEFAULT_TEMPORARY_LIVE_VIEW_TIMEOUT_SEC, "Timeout after which temporary live view is deleted.", 0) \
+    M(SettingUInt64, max_live_view_insert_blocks_before_refresh, 64, "Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed.", 0) \
+    M(SettingUInt64, min_free_disk_space_for_temporary_data, 0, "The minimum disk space to keep while writing temporary data used in external sorting and aggregation.", 0) \
    \
-    M(SettingBool, enable_scalar_subquery_optimization, true, "If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once.") \
-    M(SettingBool, optimize_trivial_count_query, true, "Process trivial 'SELECT count() FROM table' query from metadata.") \
+    M(SettingBool, enable_scalar_subquery_optimization, true, "If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once.", 0) \
+    M(SettingBool, optimize_trivial_count_query, true, "Process trivial 'SELECT count() FROM table' query from metadata.", 0) \
    \
    /** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. */ \
    \
-    M(SettingBool, allow_experimental_low_cardinality_type, true, "Obsolete setting, does nothing. Will be removed after 2019-08-13") \
-    M(SettingBool, compile, false, "Whether query compilation is enabled. Will be removed after 2020-03-13") \
+    M(SettingBool, allow_experimental_low_cardinality_type, true, "Obsolete setting, does nothing. Will be removed after 2019-08-13", 0) \
+    M(SettingBool, compile, false, "Whether query compilation is enabled. Will be removed after 2020-03-13", 0) \

    DECLARE_SETTINGS_COLLECTION(LIST_OF_SETTINGS)

--- a/dbms/src/Core/SettingsCollection.cpp
+++ b/dbms/src/Core/SettingsCollection.cpp
@ -1,17 +1,17 @@
-#include "SettingsCommon.h"
+#include <Core/SettingsCollection.h>
+#include <Core/SettingsCollectionImpl.h>

 #include <Core/Field.h>
 #include <Common/getNumberOfPhysicalCPUCores.h>
 #include <Common/FieldVisitors.h>
+#include <common/logger_useful.h>
 #include <IO/ReadHelpers.h>
 #include <IO/ReadBufferFromString.h>
 #include <IO/WriteHelpers.h>


-
 namespace DB
 {
-
 namespace ErrorCodes
 {
    extern const int TYPE_MISMATCH;
@ -62,7 +62,7 @@ void SettingNumber<Type>::set(const Field & x)
 template <typename Type>
 void SettingNumber<Type>::set(const String & x)
 {
-    set(parse<Type>(x));
+    set(completeParse<Type>(x));
 }

 template <>
@ -90,8 +90,14 @@ void SettingNumber<bool>::set(const String & x)
 }

 template <typename Type>
-void SettingNumber<Type>::serialize(WriteBuffer & buf) const
+void SettingNumber<Type>::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        writeStringBinary(toString(), buf);
+        return;
+    }
+
    if constexpr (is_integral_v<Type> && is_unsigned_v<Type>)
        writeVarUInt(static_cast<UInt64>(value), buf);
    else if constexpr (is_integral_v<Type> && is_signed_v<Type>)
@ -99,13 +105,21 @@ void SettingNumber<Type>::serialize(WriteBuffer & buf) const
    else
    {
        static_assert(std::is_floating_point_v<Type>);
-        writeBinary(toString(), buf);
+        writeStringBinary(toString(), buf);
    }
 }

 template <typename Type>
-void SettingNumber<Type>::deserialize(ReadBuffer & buf)
+void SettingNumber<Type>::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        String x;
+        readStringBinary(x, buf);
+        set(x);
+        return;
+    }
+
    if constexpr (is_integral_v<Type> && is_unsigned_v<Type>)
    {
        UInt64 x;
@ -122,7 +136,7 @@ void SettingNumber<Type>::deserialize(ReadBuffer & buf)
    {
        static_assert(std::is_floating_point_v<Type>);
        String x;
-        readBinary(x, buf);
+        readStringBinary(x, buf);
        set(x);
    }
 }
@ -167,13 +181,27 @@ void SettingMaxThreads::set(const String & x)
        set(parse<UInt64>(x));
 }

-void SettingMaxThreads::serialize(WriteBuffer & buf) const
+void SettingMaxThreads::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        writeStringBinary(is_auto ? "auto" : DB::toString(value), buf);
+        return;
+    }
+
    writeVarUInt(is_auto ? 0 : value, buf);
 }

-void SettingMaxThreads::deserialize(ReadBuffer & buf)
+void SettingMaxThreads::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        String x;
+        readStringBinary(x, buf);
+        set(x);
+        return;
+    }
+
    UInt64 x = 0;
    readVarUInt(x, buf);
    set(x);
@ -233,14 +261,28 @@ void SettingTimespan<io_unit>::set(const String & x)
 }

 template <SettingTimespanIO io_unit>
-void SettingTimespan<io_unit>::serialize(WriteBuffer & buf) const
+void SettingTimespan<io_unit>::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        writeStringBinary(toString(), buf);
+        return;
+    }
+
    writeVarUInt(value.totalMicroseconds() / microseconds_per_io_unit, buf);
 }

 template <SettingTimespanIO io_unit>
-void SettingTimespan<io_unit>::deserialize(ReadBuffer & buf)
+void SettingTimespan<io_unit>::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        String x;
+        readStringBinary(x, buf);
+        set(x);
+        return;
+    }
+
    UInt64 x = 0;
    readVarUInt(x, buf);
    set(x);
@ -271,15 +313,15 @@ void SettingString::set(const Field & x)
    set(safeGet<const String &>(x));
 }

-void SettingString::serialize(WriteBuffer & buf) const
+void SettingString::serialize(WriteBuffer & buf, SettingsBinaryFormat) const
 {
-    writeBinary(value, buf);
+    writeStringBinary(value, buf);
 }

-void SettingString::deserialize(ReadBuffer & buf)
+void SettingString::deserialize(ReadBuffer & buf, SettingsBinaryFormat)
 {
    String s;
-    readBinary(s, buf);
+    readStringBinary(s, buf);
    set(s);
 }

@ -314,30 +356,30 @@ void SettingChar::set(const Field & x)
    set(s);
 }

-void SettingChar::serialize(WriteBuffer & buf) const
+void SettingChar::serialize(WriteBuffer & buf, SettingsBinaryFormat) const
 {
-    writeBinary(toString(), buf);
+    writeStringBinary(toString(), buf);
 }

-void SettingChar::deserialize(ReadBuffer & buf)
+void SettingChar::deserialize(ReadBuffer & buf, SettingsBinaryFormat)
 {
    String s;
-    readBinary(s, buf);
+    readStringBinary(s, buf);
    set(s);
 }


 template <typename EnumType, typename Tag>
-void SettingEnum<EnumType, Tag>::serialize(WriteBuffer & buf) const
+void SettingEnum<EnumType, Tag>::serialize(WriteBuffer & buf, SettingsBinaryFormat) const
 {
-    writeBinary(toString(), buf);
+    writeStringBinary(toString(), buf);
 }

 template <typename EnumType, typename Tag>
-void SettingEnum<EnumType, Tag>::deserialize(ReadBuffer & buf)
+void SettingEnum<EnumType, Tag>::deserialize(ReadBuffer & buf, SettingsBinaryFormat)
 {
    String s;
-    readBinary(s, buf);
+    readStringBinary(s, buf);
    set(s);
 }

@ -462,14 +504,43 @@ IMPLEMENT_SETTING_ENUM(LogsLevel, LOGS_LEVEL_LIST_OF_NAMES, ErrorCodes::BAD_ARGU

 namespace details
 {
+    void SettingsCollectionUtils::serializeName(const StringRef & name, WriteBuffer & buf)
+    {
+        writeStringBinary(name, buf);
+    }
+
    String SettingsCollectionUtils::deserializeName(ReadBuffer & buf)
    {
        String name;
-        readBinary(name, buf);
+        readStringBinary(name, buf);
        return name;
    }

-    void SettingsCollectionUtils::serializeName(const StringRef & name, WriteBuffer & buf) { writeBinary(name, buf); }
+    void SettingsCollectionUtils::serializeFlag(bool flag, WriteBuffer & buf)
+    {
+        buf.write(flag);
+    }
+
+    bool SettingsCollectionUtils::deserializeFlag(ReadBuffer & buf)
+    {
+        char c;
+        buf.readStrict(c);
+        return c;
+    }
+
+    void SettingsCollectionUtils::skipValue(ReadBuffer & buf)
+    {
+        /// Ignore a string written by the function writeStringBinary().
+        UInt64 size;
+        readVarUInt(size, buf);
+        buf.ignore(size);
+    }
+
+    void SettingsCollectionUtils::warningNameNotFound(const StringRef & name)
+    {
+        static auto * log = &Logger::get("Settings");
+        LOG_WARNING(log, "Unknown setting " << name << ", skipping");
+    }

    void SettingsCollectionUtils::throwNameNotFound(const StringRef & name)
    {
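Review note on `skipValue()` above: it works because every value in the `STRINGS` format is written as a length-prefixed string, so a reader can step over a value without knowing its type. The sketch below illustrates that idea on a plain `std::string` buffer. The helpers `putVarUInt`, `getVarUInt`, `putString`, `getString` and `skipString` are hypothetical stand-ins, not the real `WriteBuffer`/`ReadBuffer` API, and the 7-bits-per-byte varint layout is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for writeVarUInt(): 7 payload bits per byte,
// high bit set while more bytes follow (assumed layout).
void putVarUInt(uint64_t x, std::string & out)
{
    while (x >= 0x80)
    {
        out.push_back(static_cast<char>((x & 0x7F) | 0x80));
        x >>= 7;
    }
    out.push_back(static_cast<char>(x));
}

// Hypothetical stand-in for readVarUInt(); advances pos past the varint.
uint64_t getVarUInt(const std::string & in, size_t & pos)
{
    uint64_t x = 0;
    for (int shift = 0; shift < 64; shift += 7)
    {
        uint8_t byte = static_cast<uint8_t>(in.at(pos++));
        x |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if (!(byte & 0x80))
            break;
    }
    return x;
}

// Length-prefixed string: varint size followed by the raw bytes.
void putString(const std::string & s, std::string & out)
{
    putVarUInt(s.size(), out);
    out += s;
}

std::string getString(const std::string & in, size_t & pos)
{
    uint64_t size = getVarUInt(in, pos);
    assert(pos + size <= in.size());
    std::string s = in.substr(pos, size);
    pos += size;
    return s;
}

// Same shape as skipValue(): read the size, then ignore that many bytes.
void skipString(const std::string & in, size_t & pos)
{
    uint64_t size = getVarUInt(in, pos);
    assert(pos + size <= in.size());
    pos += size;
}
```

Skipping costs one varint read plus a position bump, which is why an unknown setting can be ignored only when its value is guaranteed to be string-encoded.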
--- a/dbms/src/Core/SettingsCollection.h
+++ b/dbms/src/Core/SettingsCollection.h
@ -6,7 +6,6 @@
 #include <common/StringRef.h>
 #include <Core/Types.h>
 #include <unordered_map>
-#include <boost/noncopyable.hpp>


 namespace DB
@ -17,6 +16,8 @@ struct SettingChange;
 using SettingsChanges = std::vector<SettingChange>;
 class ReadBuffer;
 class WriteBuffer;
+enum class SettingsBinaryFormat;
+

 /** One setting for any type.
  * Stores a value within itself, as well as a flag - whether the value was changed.
@ -51,10 +52,10 @@ struct SettingNumber
    void set(const String & x);

    /// Serialize to binary stream suitable for transfer over network.
-    void serialize(WriteBuffer & buf) const;
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;

    /// Read from binary stream.
-    void deserialize(ReadBuffer & buf);
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };

 using SettingUInt64 = SettingNumber<UInt64>;
@ -85,8 +86,8 @@ struct SettingMaxThreads
    void set(const Field & x);
    void set(const String & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);

    void setAuto();
    UInt64 getAutoValue() const;
@ -118,8 +119,8 @@ struct SettingTimespan
    void set(const Field & x);
    void set(const String & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);

    static constexpr UInt64 microseconds_per_io_unit = (io_unit == SettingTimespanIO::MILLISECOND) ? 1000 : 1000000;
 };
@ -144,8 +145,8 @@ struct SettingString
    void set(const String & x);
    void set(const Field & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };


@ -167,8 +168,8 @@ public:
    void set(const String & x);
    void set(const Field & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };


@ -191,8 +192,8 @@ struct SettingEnum
    void set(const Field & x);
    void set(const String & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };


@ -269,15 +270,12 @@ enum class LogsLevel
 using SettingLogsLevel = SettingEnum<LogsLevel>;


-namespace details
+enum class SettingsBinaryFormat
 {
-    struct SettingsCollectionUtils
-    {
-        static void serializeName(const StringRef & name, WriteBuffer & buf);
-        static String deserializeName(ReadBuffer & buf);
-        [[noreturn]] static void throwNameNotFound(const StringRef & name);
-    };
-}
+    OLD,     /// Some settings are serialized as strings and others as varints. This is the old behaviour.
+    STRINGS, /// All settings are serialized as strings. Before each value the flag `is_important` is serialized.
+    DEFAULT = STRINGS,
+};


 /** Template class to define collections of settings.
@ -287,9 +285,9 @@ namespace details
  * struct MySettings : public SettingsCollection<MySettings>
  * {
  * #   define APPLY_FOR_MYSETTINGS(M) \
-  *         M(SettingUInt64, a, 100, "Description of a") \
-  *         M(SettingFloat, f, 3.11, "Description of f") \
-  *         M(SettingString, s, "default", "Description of s")
+  *         M(SettingUInt64, a, 100, "Description of a", 0) \
+  *         M(SettingFloat, f, 3.11, "Description of f", IMPORTANT) // IMPORTANT means the setting can't be ignored by older versions \
+  *         M(SettingString, s, "default", "Description of s", 0)
  *
  *     DECLARE_SETTINGS_COLLECTION(APPLY_FOR_MYSETTINGS)
  * };
@ -304,21 +302,22 @@ private:
    Derived & castToDerived() { return *static_cast<Derived *>(this); }
    const Derived & castToDerived() const { return *static_cast<const Derived *>(this); }

-    using IsChangedFunction = bool (*)(const Derived &);
-    using GetStringFunction = String (*)(const Derived &);
-    using GetFieldFunction = Field (*)(const Derived &);
-    using SetStringFunction = void (*)(Derived &, const String &);
-    using SetFieldFunction = void (*)(Derived &, const Field &);
-    using SerializeFunction = void (*)(const Derived &, WriteBuffer & buf);
-    using DeserializeFunction = void (*)(Derived &, ReadBuffer & buf);
-    using ValueToStringFunction = String (*)(const Field &);
-    using ValueToCorrespondingTypeFunction = Field (*)(const Field &);
-
    struct MemberInfo
    {
-        IsChangedFunction is_changed;
+        using IsChangedFunction = bool (*)(const Derived &);
+        using GetStringFunction = String (*)(const Derived &);
+        using GetFieldFunction = Field (*)(const Derived &);
+        using SetStringFunction = void (*)(Derived &, const String &);
+        using SetFieldFunction = void (*)(Derived &, const Field &);
+        using SerializeFunction = void (*)(const Derived &, WriteBuffer & buf, SettingsBinaryFormat);
+        using DeserializeFunction = void (*)(Derived &, ReadBuffer & buf, SettingsBinaryFormat);
+        using ValueToStringFunction = String (*)(const Field &);
+        using ValueToCorrespondingTypeFunction = Field (*)(const Field &);
+
        StringRef name;
        StringRef description;
+        bool is_important;
+        IsChangedFunction is_changed;
        GetStringFunction get_string;
        GetFieldFunction get_field;
        SetStringFunction set_string;
@ -329,52 +328,22 @@ private:
        ValueToCorrespondingTypeFunction value_to_corresponding_type;
    };

-    class MemberInfos : private boost::noncopyable
+    class MemberInfos
    {
    public:
-        static const MemberInfos & instance();
-
-        size_t size() const { return infos.size(); }
-        const MemberInfo & operator[](size_t index) const { return infos[index]; }
-        const MemberInfo * begin() const { return infos.data(); }
-        const MemberInfo * end() const { return infos.data() + infos.size(); }
-
-        size_t findIndex(const StringRef & name) const
-        {
-            auto it = by_name_map.find(name);
-            if (it == by_name_map.end())
-                return static_cast<size_t>(-1); // npos
-            return it->second;
-        }
-
-        size_t findIndexStrict(const StringRef & name) const
-        {
-            auto it = by_name_map.find(name);
-            if (it == by_name_map.end())
-                details::SettingsCollectionUtils::throwNameNotFound(name);
-            return it->second;
-        }
-
-        const MemberInfo * find(const StringRef & name) const
-        {
-            auto it = by_name_map.find(name);
-            if (it == by_name_map.end())
-                return end();
-            else
-                return &infos[it->second];
-        }
-
-        const MemberInfo * findStrict(const StringRef & name) const { return &infos[findIndexStrict(name)]; }
-
-    private:
        MemberInfos();

-        void add(MemberInfo && member)
-        {
-            size_t index = infos.size();
-            infos.emplace_back(member);
-            by_name_map.emplace(infos.back().name, index);
-        }
+        size_t size() const { return infos.size(); }
+        const MemberInfo * data() const { return infos.data(); }
+        const MemberInfo & operator[](size_t index) const { return infos[index]; }
+
+        const MemberInfo * find(const StringRef & name) const;
+        const MemberInfo & findStrict(const StringRef & name) const;
+        size_t findIndex(const StringRef & name) const;
+        size_t findIndexStrict(const StringRef & name) const;
+
+    private:
+        void add(MemberInfo && member);

        std::vector<MemberInfo> infos;
        std::unordered_map<StringRef, size_t> by_name_map;
@ -396,6 +365,7 @@ public:
        bool isChanged() const { return member->is_changed(*collection); }
        Field getValue() const;
        String getValueAsString() const { return member->get_string(*collection); }
+
    protected:
        friend class SettingsCollection<Derived>::const_iterator;
        const_reference() : collection(nullptr), member(nullptr) {}
@ -410,7 +380,7 @@ public:
    public:
        reference(Derived & collection_, const MemberInfo & member_) : const_reference(collection_, member_) {}
        reference(const const_reference & src) : const_reference(src) {}
-        void setValue(const Field & value);
+        void setValue(const Field & value) { this->member->set_field(*const_cast<Derived *>(this->collection), value); }
        void setValue(const String & value) { this->member->set_string(*const_cast<Derived *>(this->collection), value); }
    };

@ -453,7 +423,7 @@ public:

    /// Returns description of a setting.
    static StringRef getDescription(size_t index) { return members()[index].description; }
-    static StringRef getDescription(const String & name) { return members().findStrict(name)->description; }
+    static StringRef getDescription(const String & name) { return members().findStrict(name).description; }

    /// Searches a setting by its name; returns `npos` if not found.
    static size_t findIndex(const StringRef & name) { return members().findIndex(name); }
@ -463,36 +433,36 @@ public:
    static size_t findIndexStrict(const StringRef & name) { return members().findIndexStrict(name); }

    /// Casts a value to a string according to a specified setting without actually changing the setting.
-    static String valueToString(size_t index, const Field & value);
-    static String valueToString(const StringRef & name, const Field & value);
+    static String valueToString(size_t index, const Field & value) { return members()[index].value_to_string(value); }
+    static String valueToString(const StringRef & name, const Field & value) { return members().findStrict(name).value_to_string(value); }

    /// Casts a value to a type according to a specified setting without actually changing the setting.
    /// E.g. for SettingInt64 it casts Field to Field::Types::Int64.
    static Field valueToCorrespondingType(size_t index, const Field & value);
    static Field valueToCorrespondingType(const StringRef & name, const Field & value);

-    iterator begin() { return iterator(castToDerived(), members().begin()); }
-    const_iterator begin() const { return const_iterator(castToDerived(), members().begin()); }
-    iterator end() { return iterator(castToDerived(), members().end()); }
-    const_iterator end() const { return const_iterator(castToDerived(), members().end()); }
+    iterator begin() { return iterator(castToDerived(), members().data()); }
+    const_iterator begin() const { return const_iterator(castToDerived(), members().data()); }
+    iterator end() { const auto & the_members = members(); return iterator(castToDerived(), the_members.data() + the_members.size()); }
+    const_iterator end() const { const auto & the_members = members(); return const_iterator(castToDerived(), the_members.data() + the_members.size()); }

    /// Returns a proxy object for accessing a setting. Throws an exception if there is no setting with such a name.
    reference operator[](size_t index) { return reference(castToDerived(), members()[index]); }
-    reference operator[](const StringRef & name) { return reference(castToDerived(), *(members().findStrict(name))); }
+    reference operator[](const StringRef & name) { return reference(castToDerived(), members().findStrict(name)); }
    const_reference operator[](size_t index) const { return const_reference(castToDerived(), members()[index]); }
-    const_reference operator[](const StringRef & name) const { return const_reference(castToDerived(), *(members().findStrict(name))); }
+    const_reference operator[](const StringRef & name) const { return const_reference(castToDerived(), members().findStrict(name)); }

    /// Searches a setting by its name; returns end() if not found.
-    iterator find(const StringRef & name) { return iterator(castToDerived(), members().find(name)); }
-    const_iterator find(const StringRef & name) const { return const_iterator(castToDerived(), members().find(name)); }
+    iterator find(const StringRef & name);
+    const_iterator find(const StringRef & name) const;

    /// Searches a setting by its name; throws an exception if not found.
-    iterator findStrict(const StringRef & name) { return iterator(castToDerived(), members().findStrict(name)); }
-    const_iterator findStrict(const StringRef & name) const { return const_iterator(castToDerived(), members().findStrict(name)); }
+    iterator findStrict(const StringRef & name);
+    const_iterator findStrict(const StringRef & name) const;

    /// Sets setting's value.
-    void set(size_t index, const Field & value);
-    void set(const StringRef & name, const Field & value);
+    void set(size_t index, const Field & value) { (*this)[index].setValue(value); }
+    void set(const StringRef & name, const Field & value) { (*this)[name].setValue(value); }

    /// Sets a setting's value, reading it in text form from a string (for example, from a configuration file or a URL parameter).
    void set(size_t index, const String & value) { (*this)[index].setValue(value); }
@ -514,11 +484,7 @@ public:

    /// Compares two collections of settings.
    bool operator ==(const Derived & rhs) const;
-
-    bool operator !=(const Derived & rhs) const
-    {
-        return !(*this == rhs);
-    }
+    bool operator!=(const Derived & rhs) const { return !(*this == rhs); }

    /// Gathers all changed values (e.g. for applying them later to another collection of settings).
    SettingsChanges changes() const;
@ -536,82 +502,16 @@ public:
    /// Writes the settings to buffer (e.g. to be sent to remote server).
    /// Only changed settings are written. They are written as list of contiguous name-value pairs,
    /// finished with empty name.
-    void serialize(WriteBuffer & buf) const
-    {
-        for (const auto & member : members())
-        {
-            if (member.is_changed(castToDerived()))
-            {
-                details::SettingsCollectionUtils::serializeName(member.name, buf);
-                member.serialize(castToDerived(), buf);
-            }
-        }
-        details::SettingsCollectionUtils::serializeName(StringRef{} /* empty string is a marker of the end of settings */, buf);
-    }
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format = SettingsBinaryFormat::DEFAULT) const;

    /// Reads the settings from buffer.
-    void deserialize(ReadBuffer & buf)
-    {
-        const auto & the_members = members();
-        while (true)
-        {
-            String name = details::SettingsCollectionUtils::deserializeName(buf);
-            if (name.empty() /* empty string is a marker of the end of settings */)
-                break;
-            the_members.findStrict(name)->deserialize(castToDerived(), buf);
-        }
-    }
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format = SettingsBinaryFormat::DEFAULT);
 };

+
 #define DECLARE_SETTINGS_COLLECTION(LIST_OF_SETTINGS_MACRO) \
    LIST_OF_SETTINGS_MACRO(DECLARE_SETTINGS_COLLECTION_DECLARE_VARIABLES_HELPER_)

-
-#define IMPLEMENT_SETTINGS_COLLECTION(DERIVED_CLASS_NAME, LIST_OF_SETTINGS_MACRO) \
-    template<> \
-    SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos::MemberInfos() \
-    { \
-        using Derived = DERIVED_CLASS_NAME; \
-        struct Functions \
-        { \
-            LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_) \
-        }; \
-        LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_) \
-    } \
-    template <> \
-    const SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos & SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos::instance() \
-    { \
-        static const SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos single_instance; \
-        return single_instance; \
-    } \
-    /** \
-      * Instantiation should happen when all method definitions from SettingsCollectionImpl.h \
-      * are accessible, so we instantiate explicitly. \
-      */ \
-    template class SettingsCollection<DERIVED_CLASS_NAME>;
-
-
-
-#define DECLARE_SETTINGS_COLLECTION_DECLARE_VARIABLES_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION) \
+#define DECLARE_SETTINGS_COLLECTION_DECLARE_VARIABLES_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \
    TYPE NAME {DEFAULT};
-
-
-#define IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION) \
-    static String NAME##_getString(const Derived & collection) { return collection.NAME.toString(); } \
-    static Field NAME##_getField(const Derived & collection) { return collection.NAME.toField(); } \
-    static void NAME##_setString(Derived & collection, const String & value) { collection.NAME.set(value); } \
-    static void NAME##_setField(Derived & collection, const Field & value) { collection.NAME.set(value); } \
-    static void NAME##_serialize(const Derived & collection, WriteBuffer & buf) { collection.NAME.serialize(buf); } \
-    static void NAME##_deserialize(Derived & collection, ReadBuffer & buf) { collection.NAME.deserialize(buf); } \
-    static String NAME##_valueToString(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toString(); } \
-    static Field NAME##_valueToCorrespondingType(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toField(); } \
-
-
-#define IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION) \
-    add({[](const Derived & d) { return d.NAME.changed; },          \
-         StringRef(#NAME, strlen(#NAME)), StringRef(DESCRIPTION, strlen(DESCRIPTION)), \
-         &Functions::NAME##_getString, &Functions::NAME##_getField, \
-         &Functions::NAME##_setString, &Functions::NAME##_setField, \
-         &Functions::NAME##_serialize, &Functions::NAME##_deserialize, \
-         &Functions::NAME##_valueToString, &Functions::NAME##_valueToCorrespondingType});
 }
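Review note on the `IMPORTANT` flag introduced in this header: the per-setting flag lets a receiver decide what to do with a name it does not recognise. The sketch below is a simplified model of that policy, not the code from this commit. `WireSetting` and `applySettings` are hypothetical names, and the already-decoded record list stands in for the binary stream.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical decoded form of one transmitted setting.
struct WireSetting
{
    std::string name;
    bool is_important;
    std::string value;
};

// Applies received settings to the locally known ones.
// Unknown important settings are an error; unknown unimportant ones are
// skipped and reported back so the caller can log a warning.
std::vector<std::string> applySettings(
    const std::vector<WireSetting> & received,
    std::map<std::string, std::string> & known)
{
    std::vector<std::string> skipped;
    for (const auto & setting : received)
    {
        // On the wire an empty name marks the end of the list,
        // so it should never show up as a record.
        assert(!setting.name.empty());

        auto it = known.find(setting.name);
        if (it != known.end())
            it->second = setting.value;
        else if (setting.is_important)
            throw std::runtime_error("Unknown setting " + setting.name);
        else
            skipped.push_back(setting.name);
    }
    return skipped;
}
```

Under this policy a newer sender can add optional settings without breaking older receivers, while a setting that changes query semantics can still force a hard failure.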
--- a/dbms/src/Core/SettingsCollectionImpl.h
+++ b/dbms/src/Core/SettingsCollectionImpl.h
@ -2,15 +2,84 @@

 /**
  * This file implements some functions that are dependent on Field type.
-  * Unlinke SettingsCommon.h, we only have to include it once for each
-  * instantiation of SettingsCollection<>. This allows to work on Field without
-  * always recompiling the entire project.
+  * Unlike SettingsCollection.h, we only have to include it once for each
+  * instantiation of SettingsCollection<>.
  */

 #include <Common/SettingsChanges.h>

 namespace DB
 {
+namespace details
+{
+    struct SettingsCollectionUtils
+    {
+        static void serializeName(const StringRef & name, WriteBuffer & buf);
+        static String deserializeName(ReadBuffer & buf);
+        static void serializeFlag(bool flag, WriteBuffer & buf);
+        static bool deserializeFlag(ReadBuffer & buf);
+        static void skipValue(ReadBuffer & buf);
+        static void warningNameNotFound(const StringRef & name);
+        [[noreturn]] static void throwNameNotFound(const StringRef & name);
+    };
+}
+
+
+template <class Derived>
+size_t SettingsCollection<Derived>::MemberInfos::findIndex(const StringRef & name) const
+{
+    auto it = by_name_map.find(name);
+    if (it == by_name_map.end())
+        return static_cast<size_t>(-1); // npos
+    return it->second;
+}
+
+
+template <class Derived>
+size_t SettingsCollection<Derived>::MemberInfos::findIndexStrict(const StringRef & name) const
+{
+    auto it = by_name_map.find(name);
+    if (it == by_name_map.end())
+        details::SettingsCollectionUtils::throwNameNotFound(name);
+    return it->second;
+}
+
+
+template <class Derived>
+const typename SettingsCollection<Derived>::MemberInfo * SettingsCollection<Derived>::MemberInfos::find(const StringRef & name) const
+{
+    auto it = by_name_map.find(name);
+    if (it == by_name_map.end())
+        return nullptr;
+    else
+        return &infos[it->second];
+}
+
+
+template <class Derived>
+const typename SettingsCollection<Derived>::MemberInfo & SettingsCollection<Derived>::MemberInfos::findStrict(const StringRef & name) const
+{
+    return infos[findIndexStrict(name)];
+}
+
+
+template <class Derived>
+void SettingsCollection<Derived>::MemberInfos::add(MemberInfo && member)
+{
+    size_t index = infos.size();
+    infos.emplace_back(member);
+    by_name_map.emplace(infos.back().name, index);
+}
+
+
+template <class Derived>
+const typename SettingsCollection<Derived>::MemberInfos &
+SettingsCollection<Derived>::members()
+{
+    static const MemberInfos the_instance;
+    return the_instance;
+}
+

 template <class Derived>
 Field SettingsCollection<Derived>::const_reference::getValue() const
@ -18,23 +87,6 @@ Field SettingsCollection<Derived>::const_reference::getValue() const
    return member->get_field(*collection);
 }

-template <class Derived>
-void SettingsCollection<Derived>::reference::setValue(const Field & value)
-{
-    this->member->set_field(*const_cast<Derived *>(this->collection), value);
-}
-
-template <class Derived>
-String SettingsCollection<Derived>::valueToString(size_t index, const Field & value)
-{
-    return members()[index].value_to_string(value);
-}
-
-template <class Derived>
-String SettingsCollection<Derived>::valueToString(const StringRef & name, const Field & value)
-{
-    return members().findStrict(name)->value_to_string(value);
-}

 template <class Derived>
 Field SettingsCollection<Derived>::valueToCorrespondingType(size_t index, const Field & value)
@ -42,36 +94,62 @@ Field SettingsCollection<Derived>::valueToCorrespondingType(size_t index, const
    return members()[index].value_to_corresponding_type(value);
 }

+
 template <class Derived>
 Field SettingsCollection<Derived>::valueToCorrespondingType(const StringRef & name, const Field & value)
 {
-    return members().findStrict(name)->value_to_corresponding_type(value);
+    return members().findStrict(name).value_to_corresponding_type(value);
 }

-template <class Derived>
-void SettingsCollection<Derived>::set(size_t index, const Field & value)
-{
-    (*this)[index].setValue(value);
-}

 template <class Derived>
-void SettingsCollection<Derived>::set(const StringRef & name, const Field & value)
+typename SettingsCollection<Derived>::iterator SettingsCollection<Derived>::find(const StringRef & name)
 {
-    (*this)[name].setValue(value);
+    const auto * member = members().find(name);
+    if (member)
+        return iterator(castToDerived(), member);
+    return end();
 }

+
+template <class Derived>
+typename SettingsCollection<Derived>::const_iterator SettingsCollection<Derived>::find(const StringRef & name) const
+{
+    const auto * member = members().find(name);
+    if (member)
+        return const_iterator(castToDerived(), member);
+    return end();
+}
+
+
+template <class Derived>
+typename SettingsCollection<Derived>::iterator SettingsCollection<Derived>::findStrict(const StringRef & name)
+{
+    return iterator(castToDerived(), &members().findStrict(name));
+}
+
+
+template <class Derived>
+typename SettingsCollection<Derived>::const_iterator SettingsCollection<Derived>::findStrict(const StringRef & name) const
+{
+    return const_iterator(castToDerived(), &members().findStrict(name));
+}
+
+
 template <class Derived>
 Field SettingsCollection<Derived>::get(size_t index) const
 {
    return (*this)[index].getValue();
 }

+
 template <class Derived>
 Field SettingsCollection<Derived>::get(const StringRef & name) const
 {
    return (*this)[name].getValue();
 }

+
 template <class Derived>
 bool SettingsCollection<Derived>::tryGet(const StringRef & name, Field & value) const
 {
@ -82,6 +160,7 @@ bool SettingsCollection<Derived>::tryGet(const StringRef & name, Field & value)
    return true;
 }

+
 template <class Derived>
 bool SettingsCollection<Derived>::tryGet(const StringRef & name, String & value) const
 {
@ -92,11 +171,14 @@ bool SettingsCollection<Derived>::tryGet(const StringRef & name, String & value)
    return true;
 }

+
 template <class Derived>
 bool SettingsCollection<Derived>::operator ==(const Derived & rhs) const
 {
-    for (const auto & member : members())
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
    {
+        const auto & member = the_members[i];
        bool left_changed = member.is_changed(castToDerived());
        bool right_changed = member.is_changed(rhs);
        if (left_changed || right_changed)
@ -110,27 +192,29 @@ bool SettingsCollection<Derived>::operator ==(const Derived & rhs) const
    return true;
 }

-/// Gathers all changed values (e.g. for applying them later to another collection of settings).
+
 template <class Derived>
 SettingsChanges SettingsCollection<Derived>::changes() const
 {
    SettingsChanges found_changes;
-    for (const auto & member : members())
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
    {
+        const auto & member = the_members[i];
        if (member.is_changed(castToDerived()))
            found_changes.push_back({member.name.toString(), member.get_field(castToDerived())});
    }
    return found_changes;
 }

-/// Applies change to concrete setting.
+
 template <class Derived>
 void SettingsCollection<Derived>::applyChange(const SettingChange & change)
 {
    set(change.name, change.value);
 }

-/// Applies changes to the settings.
+
 template <class Derived>
 void SettingsCollection<Derived>::applyChanges(const SettingsChanges & changes)
 {
@ -138,25 +222,112 @@ void SettingsCollection<Derived>::applyChanges(const SettingsChanges & changes)
        applyChange(change);
 }

+
 template <class Derived>
 void SettingsCollection<Derived>::copyChangesFrom(const Derived & src)
 {
-    for (const auto & member : members())
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
+    {
+        const auto & member = the_members[i];
        if (member.is_changed(src))
            member.set_field(castToDerived(), member.get_field(src));
+    }
 }

+
 template <class Derived>
 void SettingsCollection<Derived>::copyChangesTo(Derived & dest) const
 {
    dest.copyChangesFrom(castToDerived());
 }

+
 template <class Derived>
-const typename SettingsCollection<Derived>::MemberInfos &
-SettingsCollection<Derived>::members()
+void SettingsCollection<Derived>::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
-    return MemberInfos::instance();
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
+    {
+        const auto & member = the_members[i];
+        if (member.is_changed(castToDerived()))
+        {
+            details::SettingsCollectionUtils::serializeName(member.name, buf);
+            if (format >= SettingsBinaryFormat::STRINGS)
+                details::SettingsCollectionUtils::serializeFlag(member.is_important, buf);
+            member.serialize(castToDerived(), buf, format);
+        }
+    }
+    details::SettingsCollectionUtils::serializeName(StringRef{} /* empty string is a marker of the end of settings */, buf);
 }

-} /* namespace DB */
+
+template <class Derived>
+void SettingsCollection<Derived>::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
+{
+    const auto & the_members = members();
+    while (true)
+    {
+        String name = details::SettingsCollectionUtils::deserializeName(buf);
+        if (name.empty() /* empty string is a marker of the end of settings */)
+            break;
+        auto * member = the_members.find(name);
+        bool is_important = (format >= SettingsBinaryFormat::STRINGS) ? details::SettingsCollectionUtils::deserializeFlag(buf) : true;
+        if (member)
+        {
+            member->deserialize(castToDerived(), buf, format);
+        }
+        else if (is_important)
+        {
+            details::SettingsCollectionUtils::throwNameNotFound(name);
+        }
+        else
+        {
+            details::SettingsCollectionUtils::warningNameNotFound(name);
+            details::SettingsCollectionUtils::skipValue(buf);
+        }
+    }
+}
+
+
+//-V:IMPLEMENT_SETTINGS_COLLECTION:501
+#define IMPLEMENT_SETTINGS_COLLECTION(DERIVED_CLASS_NAME, LIST_OF_SETTINGS_MACRO) \
+    template<> \
+    SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos::MemberInfos() \
+    { \
+        using Derived = DERIVED_CLASS_NAME; \
+        struct Functions \
+        { \
+            LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_) \
+        }; \
+        constexpr int IMPORTANT = 1; \
+        UNUSED(IMPORTANT); \
+        LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_) \
+    } \
+    /** \
+      * Instantiation should happen when all method definitions from SettingsCollectionImpl.h \
+      * are accessible, so we instantiate explicitly. \
+      */ \
+    template class SettingsCollection<DERIVED_CLASS_NAME>;
+
+
+#define IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \
+    static String NAME##_getString(const Derived & collection) { return collection.NAME.toString(); } \
+    static Field NAME##_getField(const Derived & collection) { return collection.NAME.toField(); } \
+    static void NAME##_setString(Derived & collection, const String & value) { collection.NAME.set(value); } \
+    static void NAME##_setField(Derived & collection, const Field & value) { collection.NAME.set(value); } \
+    static void NAME##_serialize(const Derived & collection, WriteBuffer & buf, SettingsBinaryFormat format) { collection.NAME.serialize(buf, format); } \
+    static void NAME##_deserialize(Derived & collection, ReadBuffer & buf, SettingsBinaryFormat format) { collection.NAME.deserialize(buf, format); } \
+    static String NAME##_valueToString(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toString(); } \
+    static Field NAME##_valueToCorrespondingType(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toField(); } \
+
+
+#define IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \
+    add({StringRef(#NAME, strlen(#NAME)), StringRef(DESCRIPTION, strlen(DESCRIPTION)), \
+         FLAGS & IMPORTANT, \
+         [](const Derived & d) { return d.NAME.changed; }, \
+         &Functions::NAME##_getString, &Functions::NAME##_getField, \
+         &Functions::NAME##_setString, &Functions::NAME##_setField, \
+         &Functions::NAME##_serialize, &Functions::NAME##_deserialize, \
+         &Functions::NAME##_valueToString, &Functions::NAME##_valueToCorrespondingType});
+}
--- a/dbms/src/Core/TypeListNumber.h
+++ b/dbms/src/Core/TypeListNumber.h
@ -5,6 +5,9 @@
 namespace DB
 {

-using TypeListNumbers = TypeList<UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64>;
+using TypeListNativeNumbers = TypeList<UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64>;
+using TypeListDecimalNumbers = TypeList<Decimal32, Decimal64, Decimal128>;
+using TypeListNumbers = TypeList<UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64,
+    Decimal32, Decimal64, Decimal128>;

 }
--- a/dbms/src/Core/iostream_debug_helpers.cpp
+++ b/dbms/src/Core/iostream_debug_helpers.cpp
@ -1,6 +1,7 @@
 #include "iostream_debug_helpers.h"

 #include <iostream>
+#include <Client/Connection.h>
 #include <Core/Block.h>
 #include <Core/ColumnWithTypeAndName.h>
 #include <Core/Field.h>
@ -92,9 +93,9 @@ std::ostream & operator<<(std::ostream & stream, const IColumn & what)
    return stream;
 }

-std::ostream & operator<<(std::ostream & stream, const Connection::Packet & what)
+std::ostream & operator<<(std::ostream & stream, const Packet & what)
 {
-    stream << "Connection::Packet("
+    stream << "Packet("
           << "type = " << what.type;
    // types description: Core/Protocol.h
    if (what.exception)
--- a/dbms/src/Core/iostream_debug_helpers.h
+++ b/dbms/src/Core/iostream_debug_helpers.h
@ -1,9 +1,6 @@
 #pragma once
 #include <iostream>

-#include <Client/Connection.h>
-
-
 namespace DB
 {

@ -40,7 +37,8 @@ std::ostream & operator<<(std::ostream & stream, const ColumnWithTypeAndName & w
 class IColumn;
 std::ostream & operator<<(std::ostream & stream, const IColumn & what);

-std::ostream & operator<<(std::ostream & stream, const Connection::Packet & what);
+struct Packet;
+std::ostream & operator<<(std::ostream & stream, const Packet & what);

 struct ExpressionAction;
 std::ostream & operator<<(std::ostream & stream, const ExpressionAction & what);
--- a/dbms/src/DataStreams/IBlockInputStream.h
+++ b/dbms/src/DataStreams/IBlockInputStream.h
@ -1,7 +1,6 @@
 #pragma once

 #include <Core/Block.h>
-#include <Core/SettingsCommon.h>
 #include <DataStreams/BlockStreamProfileInfo.h>
 #include <DataStreams/IBlockStream_fwd.h>
 #include <DataStreams/SizeLimits.h>
--- a/dbms/src/DataStreams/NativeBlockInputStream.cpp
+++ b/dbms/src/DataStreams/NativeBlockInputStream.cpp
@ -57,6 +57,13 @@ NativeBlockInputStream::NativeBlockInputStream(ReadBuffer & istr_, UInt64 server
    }
 }

+void NativeBlockInputStream::resetParser()
+{
+    istr_concrete = nullptr;
+    use_index = false;
+    header.clear();
+    avg_value_size_hints.clear();
+}

 void NativeBlockInputStream::readData(const IDataType & type, IColumn & column, ReadBuffer & istr, size_t rows, double avg_value_size_hint)
 {
@ -159,7 +166,7 @@ Block NativeBlockInputStream::readImpl()
            auto & header_column = header.getByName(column.name);
            if (!header_column.type->equals(*column.type))
            {
-                column.column = recursiveLowCardinalityConversion(column.column, column.type, header.getByPosition(i).type);
+                column.column = recursiveTypeConversion(column.column, column.type, header.getByPosition(i).type);
                column.type = header.getByPosition(i).type;
            }
        }
--- a/dbms/src/DataStreams/NativeBlockInputStream.h
+++ b/dbms/src/DataStreams/NativeBlockInputStream.h
@ -78,6 +78,9 @@ public:

    Block getHeader() const override;

+    void resetParser();
+
+
 protected:
    Block readImpl() override;

--- a/dbms/src/DataStreams/ParallelParsingBlockInputStream.cpp
+++ b/dbms/src/DataStreams/ParallelParsingBlockInputStream.cpp
@ -0,0 +1,203 @@
+#include <DataStreams/ParallelParsingBlockInputStream.h>
+#include "ParallelParsingBlockInputStream.h"
+
+namespace DB
+{
+
+void ParallelParsingBlockInputStream::segmentatorThreadFunction()
+{
+    setThreadName("Segmentator");
+    try
+    {
+        while (!finished)
+        {
+            const auto current_unit_number = segmentator_ticket_number % processing_units.size();
+            auto & unit = processing_units[current_unit_number];
+
+            {
+                std::unique_lock lock(mutex);
+                segmentator_condvar.wait(lock,
+                    [&]{ return unit.status == READY_TO_INSERT || finished; });
+            }
+
+            if (finished)
+            {
+                break;
+            }
+
+            assert(unit.status == READY_TO_INSERT);
+
+            // Segmenting the original input.
+            unit.segment.resize(0);
+
+            const bool have_more_data = file_segmentation_engine(original_buffer,
+                unit.segment, min_chunk_bytes);
+
+            unit.is_last = !have_more_data;
+            unit.status = READY_TO_PARSE;
+            scheduleParserThreadForUnitWithNumber(current_unit_number);
+            ++segmentator_ticket_number;
+
+            if (!have_more_data)
+            {
+                break;
+            }
+        }
+    }
+    catch (...)
+    {
+        onBackgroundException();
+    }
+}
+
+void ParallelParsingBlockInputStream::parserThreadFunction(size_t current_unit_number)
+{
+    try
+    {
+        setThreadName("ChunkParser");
+
+        auto & unit = processing_units[current_unit_number];
+
+        /*
+         * This is kind of suspicious -- the input_processor_creator contract with
+         * respect to multithreaded use is not clear, but we hope that it is
+         * just a 'normal' factory class that doesn't have any state, and so we
+         * can use it from multiple threads simultaneously.
+         */
+        ReadBuffer read_buffer(unit.segment.data(), unit.segment.size(), 0);
+        auto parser = std::make_unique<InputStreamFromInputFormat>(
+            input_processor_creator(read_buffer, header, context,
+                row_input_format_params, format_settings));
+
+        unit.block_ext.block.clear();
+        unit.block_ext.block_missing_values.clear();
+
+        // We don't know how many blocks there will be, so we have to read them
+        // all until an empty block occurs.
+        Block block;
+        while (!finished && (block = parser->read()) != Block())
+        {
+            unit.block_ext.block.emplace_back(block);
+            unit.block_ext.block_missing_values.emplace_back(parser->getMissingValues());
+        }
+
+        // We suppose we will get at least some blocks for a non-empty buffer,
+        // except at the end of file. Also see a matching assert in readImpl().
+        assert(unit.is_last || unit.block_ext.block.size() > 0);
+
+        std::unique_lock lock(mutex);
+        unit.status = READY_TO_READ;
+        reader_condvar.notify_all();
+    }
+    catch (...)
+    {
+        onBackgroundException();
+    }
+}
+
+void ParallelParsingBlockInputStream::onBackgroundException()
+{
+    tryLogCurrentException(__PRETTY_FUNCTION__);
+
+    std::unique_lock lock(mutex);
+    if (!background_exception)
+    {
+        background_exception = std::current_exception();
+    }
+    finished = true;
+    reader_condvar.notify_all();
+    segmentator_condvar.notify_all();
+}
+
+Block ParallelParsingBlockInputStream::readImpl()
+{
+    if (isCancelledOrThrowIfKilled() || finished)
+    {
+        /**
+          * Check for background exception and rethrow it before we return.
+          */
+        std::unique_lock lock(mutex);
+        if (background_exception)
+        {
+            lock.unlock();
+            cancel(false);
+            std::rethrow_exception(background_exception);
+        }
+
+        return Block{};
+    }
+
+    const auto current_unit_number = reader_ticket_number % processing_units.size();
+    auto & unit = processing_units[current_unit_number];
+
+    if (!next_block_in_current_unit.has_value())
+    {
+        // We have read out all the Blocks from the previous Processing Unit,
+        // wait for the current one to become ready.
+        std::unique_lock lock(mutex);
+        reader_condvar.wait(lock, [&](){ return unit.status == READY_TO_READ || finished; });
+
+        if (finished)
+        {
+            /**
+              * Check for background exception and rethrow it before we return.
+              */
+            if (background_exception)
+            {
+                lock.unlock();
+                cancel(false);
+                std::rethrow_exception(background_exception);
+            }
+
+            return Block{};
+        }
+
+        assert(unit.status == READY_TO_READ);
+        next_block_in_current_unit = 0;
+    }
+
+    if (unit.block_ext.block.size() == 0)
+    {
+        /*
+         * Can we get zero blocks for an entire segment, when the format parser
+         * skips its entire content and does not create any blocks? Probably not,
+         * but if we ever do, we should add a loop around the above if, to skip
+         * these. Also see a matching assert in the parser thread.
+         */
+        assert(unit.is_last);
+        finished = true;
+        return Block{};
+    }
+
+    assert(next_block_in_current_unit.value() < unit.block_ext.block.size());
+
+    Block res = std::move(unit.block_ext.block.at(*next_block_in_current_unit));
+    last_block_missing_values = std::move(unit.block_ext.block_missing_values[*next_block_in_current_unit]);
+
+    next_block_in_current_unit.value() += 1;
+
+    if (*next_block_in_current_unit == unit.block_ext.block.size())
+    {
+        // Finished reading this Processing Unit, move to the next one.
+        next_block_in_current_unit.reset();
+        ++reader_ticket_number;
+
+        if (unit.is_last)
+        {
+            // If it was the last unit, we're finished.
+            finished = true;
+        }
+        else
+        {
+            // Pass the unit back to the segmentator.
+            std::unique_lock lock(mutex);
+            unit.status = READY_TO_INSERT;
+            segmentator_condvar.notify_all();
+        }
+    }
+
+    return res;
+}
+
+
+}
--- a/dbms/src/DataStreams/ParallelParsingBlockInputStream.h
+++ b/dbms/src/DataStreams/ParallelParsingBlockInputStream.h
@ -0,0 +1,258 @@
+#pragma once
+
+#include <DataStreams/IBlockInputStream.h>
+#include <Formats/FormatFactory.h>
+#include <Common/ThreadPool.h>
+#include <Common/setThreadName.h>
+#include <IO/BufferWithOwnMemory.h>
+#include <IO/ReadBuffer.h>
+#include <Processors/Formats/IRowInputFormat.h>
+#include <Processors/Formats/InputStreamFromInputFormat.h>
+#include <Interpreters/Context.h>
+
+namespace DB
+{
+
+/**
+ * ORDER-PRESERVING parallel parsing of data formats.
+ * It splits the original data into chunks. Then each chunk is parsed by a different thread.
+ * The number of chunks equals the number of parser threads.
+ * The size of a chunk is equal to the min_chunk_bytes_for_parallel_parsing setting.
+ *
+ * This stream has three kinds of threads: one segmentator, multiple parsers,
+ * and one reader thread -- that is, the one from which readImpl() is called.
+ * They operate one after another on parts of data called "processing units".
+ * One unit consists of a buffer with raw data from the file, filled by the segmentator
+ * thread. This raw data is then parsed by a parser thread to form a number of
+ * Blocks. These Blocks are returned to the parent stream from readImpl().
+ * After being read out, a processing unit is reused, to save on allocating
+ * memory for the raw buffer. The processing units are organized into a circular
+ * array to facilitate reuse and to apply backpressure on the segmentator thread
+ * -- after it runs out of processing units, it has to wait for the reader to
+ * read out the previous blocks.
+ * The outline of what the threads do is as follows:
+ * segmentator thread:
+ *  1) wait for the next processing unit to become empty
+ *  2) fill it with a part of input file
+ *  3) start a parser thread
+ *  4) repeat until eof
+ * parser thread:
+ *  1) parse the given raw buffer without any synchronization
+ *  2) signal that the given unit is ready to read
+ *  3) finish
+ * readImpl():
+ *  1) wait for the next processing unit to become ready to read
+ *  2) take the blocks from the processing unit to return them to the caller
+ *  3) signal that the processing unit is empty
+ *  4) repeat until it encounters a unit that is marked as the last one (is_last)
+ * All threads must also check for cancel/eof/exception flags.
+ */
+class ParallelParsingBlockInputStream : public IBlockInputStream
+{
+private:
+    using ReadCallback = std::function<void()>;
+
+    using InputProcessorCreator = std::function<InputFormatPtr(
+            ReadBuffer & buf,
+            const Block & header,
+            const Context & context,
+            const RowInputFormatParams & params,
+            const FormatSettings & settings)>;
+public:
+    struct InputCreatorParams
+    {
+        const Block &sample;
+        const Context &context;
+        const RowInputFormatParams& row_input_format_params;
+        const FormatSettings &settings;
+    };
+
+    struct Params
+    {
+        ReadBuffer & read_buffer;
+        const InputProcessorCreator &input_processor_creator;
+        const InputCreatorParams &input_creator_params;
+        FormatFactory::FileSegmentationEngine file_segmentation_engine;
+        int max_threads;
+        size_t min_chunk_bytes;
+    };
+
+    explicit ParallelParsingBlockInputStream(const Params & params)
+            : header(params.input_creator_params.sample),
+              context(params.input_creator_params.context),
+              row_input_format_params(params.input_creator_params.row_input_format_params),
+              format_settings(params.input_creator_params.settings),
+              input_processor_creator(params.input_processor_creator),
+              min_chunk_bytes(params.min_chunk_bytes),
+              original_buffer(params.read_buffer),
+              // Subtract one thread that we use for segmentation and one for
+              // reading. After that, must have at least two threads left for
+              // parsing. See the assertion below.
+              pool(std::max(2, params.max_threads - 2)),
+              file_segmentation_engine(params.file_segmentation_engine)
+    {
+        // See comment above.
+        assert(params.max_threads >= 4);
+
+        // One unit for each thread, including segmentator and reader, plus a
+        // couple more units so that the segmentation thread doesn't spuriously
+        // bump into reader thread on wraparound.
+        processing_units.resize(params.max_threads + 2);
+
+        segmentator_thread = ThreadFromGlobalPool([this] { segmentatorThreadFunction(); });
+    }
+
+    String getName() const override { return "ParallelParsing"; }
+
+    ~ParallelParsingBlockInputStream() override
+    {
+        finishAndWait();
+    }
+
+    void cancel(bool kill) override
+    {
+        /**
+          * Can be called multiple times, from different threads. Saturate
+          * the kill flag with OR.
+          */
+        if (kill)
+            is_killed = true;
+        is_cancelled = true;
+
+        /*
+         * The format parsers themselves are not being cancelled here, so we'll
+         * have to wait until they process the current block. Given that the
+         * chunk size is on the order of megabytes, this shouldn't be too long.
+         * We can't call IInputFormat->cancel here, because the parser object is
+         * local to the parser thread, and we don't want to introduce any
+         * synchronization between parser threads and the other threads to get
+         * better performance. An ideal solution would be to add a callback to
+         * IInputFormat that checks whether it was cancelled.
+         */
+
+        finishAndWait();
+    }
+
+    Block getHeader() const override
+    {
+        return header;
+    }
+
+protected:
+    //Reader routine
+    Block readImpl() override;
+
+    const BlockMissingValues & getMissingValues() const override
+    {
+        return last_block_missing_values;
+    }
+
+private:
+    const Block header;
+    const Context context;
+    const RowInputFormatParams row_input_format_params;
+    const FormatSettings format_settings;
+    const InputProcessorCreator input_processor_creator;
+
+    const size_t min_chunk_bytes;
+
+    /*
+     * This is declared as atomic to avoid UB, because parser threads access it
+     * without synchronization.
+     */
+    std::atomic<bool> finished{false};
+
+    BlockMissingValues last_block_missing_values;
+
+    // Original ReadBuffer to read from.
+    ReadBuffer & original_buffer;
+
+    //Non-atomic because it is used in one thread.
+    std::optional<size_t> next_block_in_current_unit;
+    size_t segmentator_ticket_number{0};
+    size_t reader_ticket_number{0};
+
+    std::mutex mutex;
+    std::condition_variable reader_condvar;
+    std::condition_variable segmentator_condvar;
+
+    // There are multiple "parsers", that's why we use thread pool.
+    ThreadPool pool;
+    // Reads and segments the file.
+    ThreadFromGlobalPool segmentator_thread;
+
+    // Function to segment the file. Then "parsers" will parse those segments.
+    FormatFactory::FileSegmentationEngine file_segmentation_engine;
+
+    enum ProcessingUnitStatus
+    {
+        READY_TO_INSERT,
+        READY_TO_PARSE,
+        READY_TO_READ
+    };
+
+    struct BlockExt
+    {
+        std::vector<Block> block;
+        std::vector<BlockMissingValues> block_missing_values;
+    };
+
+    struct ProcessingUnit
+    {
+        explicit ProcessingUnit()
+            : status(ProcessingUnitStatus::READY_TO_INSERT)
+        {
+        }
+
+        BlockExt block_ext;
+        Memory<> segment;
+        std::atomic<ProcessingUnitStatus> status;
+        bool is_last{false};
+    };
+
+    std::exception_ptr background_exception = nullptr;
+
+    // We use deque instead of vector, because it does not require a move
+    // constructor, which is absent for atomics that are inside ProcessingUnit.
+    std::deque<ProcessingUnit> processing_units;
+
+
+    void scheduleParserThreadForUnitWithNumber(size_t unit_number)
+    {
+        pool.scheduleOrThrowOnError(std::bind(&ParallelParsingBlockInputStream::parserThreadFunction, this, unit_number));
+    }
+
+    void finishAndWait()
+    {
+        finished = true;
+
+        {
+            std::unique_lock lock(mutex);
+            segmentator_condvar.notify_all();
+            reader_condvar.notify_all();
+        }
+
+        if (segmentator_thread.joinable())
+            segmentator_thread.join();
+
+        try
+        {
+            pool.wait();
+        }
+        catch (...)
+        {
+            tryLogCurrentException(__PRETTY_FUNCTION__);
+        }
+    }
+
+    void segmentatorThreadFunction();
+    void parserThreadFunction(size_t current_unit_number);
+
+    // Save/log a background exception, set termination flag, wake up all
+    // threads. This function is used by segmentator and parser threads.
+    // readImpl() is called from the main thread, so the exception handling
+    // is different.
+    void onBackgroundException();
+};
+
+};
--- a/dbms/src/DataStreams/RemoteBlockInputStream.cpp
+++ b/dbms/src/DataStreams/RemoteBlockInputStream.cpp
@ -222,7 +222,7 @@ Block RemoteBlockInputStream::readImpl()
        if (isCancelledOrThrowIfKilled())
            return Block();

-        Connection::Packet packet = multiplexed_connections->receivePacket();
+        Packet packet = multiplexed_connections->receivePacket();

        switch (packet.type)
        {
@ -301,7 +301,7 @@ void RemoteBlockInputStream::readSuffixImpl()
    tryCancel("Cancelling query because enough data has been read");

    /// Get the remaining packets so that there is no out of sync in the connections to the replicas.
-    Connection::Packet packet = multiplexed_connections->drain();
+    Packet packet = multiplexed_connections->drain();
    switch (packet.type)
    {
        case Protocol::Server::EndOfStream:
--- a/dbms/src/DataStreams/RemoteBlockOutputStream.cpp
+++ b/dbms/src/DataStreams/RemoteBlockOutputStream.cpp
@ -32,7 +32,7 @@ RemoteBlockOutputStream::RemoteBlockOutputStream(Connection & connection_,

    while (true)
    {
-        Connection::Packet packet = connection.receivePacket();
+        Packet packet = connection.receivePacket();

        if (Protocol::Server::Data == packet.type)
        {
@ -77,7 +77,7 @@ void RemoteBlockOutputStream::write(const Block & block)
        auto packet_type = connection.checkPacket();
        if (packet_type && *packet_type == Protocol::Server::Exception)
        {
-            Connection::Packet packet = connection.receivePacket();
+            Packet packet = connection.receivePacket();
            packet.exception->rethrow();
        }

@ -101,7 +101,7 @@ void RemoteBlockOutputStream::writeSuffix()
    /// Wait for EndOfStream or Exception packet, skip Log packets.
    while (true)
    {
-        Connection::Packet packet = connection.receivePacket();
+        Packet packet = connection.receivePacket();

        if (Protocol::Server::EndOfStream == packet.type)
            break;
--- a/dbms/src/DataStreams/TTLBlockInputStream.cpp
+++ b/dbms/src/DataStreams/TTLBlockInputStream.cpp
@ -203,8 +203,15 @@ UInt32 TTLBlockInputStream::getTimestampByIndex(const IColumn * column, size_t i
        return date_lut.fromDayNum(DayNum(column_date->getData()[ind]));
    else if (const ColumnUInt32 * column_date_time = typeid_cast<const ColumnUInt32 *>(column))
        return column_date_time->getData()[ind];
-    else
-        throw Exception("Unexpected type of result ttl column", ErrorCodes::LOGICAL_ERROR);
+    else if (const ColumnConst * column_const = typeid_cast<const ColumnConst *>(column))
+    {
+        if (typeid_cast<const ColumnUInt16 *>(&column_const->getDataColumn()))
+            return date_lut.fromDayNum(DayNum(column_const->getValue<UInt16>()));
+        else if (typeid_cast<const ColumnUInt32 *>(&column_const->getDataColumn()))
+            return column_const->getValue<UInt32>();
+    }
+
+    throw Exception("Unexpected type of result TTL column", ErrorCodes::LOGICAL_ERROR);
 }

 }
--- a/dbms/src/DataStreams/TotalsHavingBlockInputStream.h
+++ b/dbms/src/DataStreams/TotalsHavingBlockInputStream.h
@ -10,7 +10,7 @@ class Arena;
 using ArenaPtr = std::shared_ptr<Arena>;

 class ExpressionActions;
-
+enum class TotalsMode;

 /** Takes blocks after grouping, with non-finalized aggregate functions.
  * Calculates total values according to totals_mode.
--- a/dbms/src/DataTypes/DataTypeLowCardinality.cpp
+++ b/dbms/src/DataTypes/DataTypeLowCardinality.cpp
@ -894,7 +894,7 @@ MutableColumnUniquePtr DataTypeLowCardinality::createColumnUniqueImpl(const IDat
    if (isColumnedAsNumber(type))
    {
        MutableColumnUniquePtr column;
-        TypeListNumbers::forEach(CreateColumnVector(column, *type, creator));
+        TypeListNativeNumbers::forEach(CreateColumnVector(column, *type, creator));

        if (!column)
            throw Exception("Unexpected numeric type: " + type->getName(), ErrorCodes::LOGICAL_ERROR);
--- a/dbms/src/DataTypes/DataTypeLowCardinality.h
+++ b/dbms/src/DataTypes/DataTypeLowCardinality.h
@ -126,6 +126,6 @@ DataTypePtr recursiveRemoveLowCardinality(const DataTypePtr & type);
 ColumnPtr recursiveRemoveLowCardinality(const ColumnPtr & column);

 /// Convert column of type from_type to type to_type by converting nested LowCardinality columns.
-ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const DataTypePtr & from_type, const DataTypePtr & to_type);
+ColumnPtr recursiveTypeConversion(const ColumnPtr & column, const DataTypePtr & from_type, const DataTypePtr & to_type);

 }
--- a/dbms/src/DataTypes/DataTypeLowCardinalityHelpers.cpp
+++ b/dbms/src/DataTypes/DataTypeLowCardinalityHelpers.cpp
@ -84,7 +84,7 @@ ColumnPtr recursiveRemoveLowCardinality(const ColumnPtr & column)
    return column;
 }

-ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const DataTypePtr & from_type, const DataTypePtr & to_type)
+ColumnPtr recursiveTypeConversion(const ColumnPtr & column, const DataTypePtr & from_type, const DataTypePtr & to_type)
 {
    if (!column)
        return column;
@ -92,10 +92,14 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data
    if (from_type->equals(*to_type))
        return column;

+    /// We can allow inserting an enum column if its numeric type is the same as the column's type in the table.
+    if (WhichDataType(to_type).isEnum() && from_type->getTypeId() == to_type->getTypeId())
+        return column;
+
    if (const auto * column_const = typeid_cast<const ColumnConst *>(column.get()))
    {
        auto & nested = column_const->getDataColumnPtr();
-        auto nested_no_lc = recursiveLowCardinalityConversion(nested, from_type, to_type);
+        auto nested_no_lc = recursiveTypeConversion(nested, from_type, to_type);
        if (nested.get() == nested_no_lc.get())
            return column;

@ -131,7 +135,7 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data
            auto & nested_to = to_array_type->getNestedType();

            return ColumnArray::create(
-                    recursiveLowCardinalityConversion(column_array->getDataPtr(), nested_from, nested_to),
+                    recursiveTypeConversion(column_array->getDataPtr(), nested_from, nested_to),
                    column_array->getOffsetsPtr());
        }
    }
@ -154,7 +158,7 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data
            for (size_t i = 0; i < columns.size(); ++i)
            {
                auto & element = columns[i];
-                auto element_no_lc = recursiveLowCardinalityConversion(element, from_elements.at(i), to_elements.at(i));
+                auto element_no_lc = recursiveTypeConversion(element, from_elements.at(i), to_elements.at(i));
                if (element.get() != element_no_lc.get())
                {
                    element = element_no_lc;
--- a/dbms/src/Databases/DatabaseLazy.cpp
+++ b/dbms/src/Databases/DatabaseLazy.cpp
@ -361,9 +361,8 @@ StoragePtr DatabaseLazy::loadTable(const Context & context, const String & table
    }
    catch (const Exception & e)
    {
-        throw Exception("Cannot create table from metadata file " + table_metadata_path + ", error: " + e.displayText() +
-            ", stack trace:\n" + e.getStackTrace().toString(),
-            ErrorCodes::CANNOT_CREATE_TABLE_FROM_METADATA);
+        throw Exception("Cannot create table from metadata file " + table_metadata_path + ". Error: " + DB::getCurrentExceptionMessage(true),
+                e, DB::ErrorCodes::CANNOT_CREATE_TABLE_FROM_METADATA);
    }
 }

--- a/dbms/src/Databases/DatabaseOrdinary.cpp
+++ b/dbms/src/Databases/DatabaseOrdinary.cpp
@ -27,6 +27,7 @@
 #include <Poco/Event.h>
 #include <Common/Stopwatch.h>
 #include <Common/StringUtils/StringUtils.h>
+#include <Common/quoteString.h>
 #include <Common/ThreadPool.h>
 #include <Common/escapeForFileName.h>
 #include <Common/typeid_cast.h>
@ -81,9 +82,8 @@ try
 catch (const Exception & e)
 {
    throw Exception(
-        "Cannot create object '" + query.table + "' from query " + serializeAST(query) + ", error: " + e.displayText() + ", stack trace:\n"
-            + e.getStackTrace().toString(),
-        ErrorCodes::CANNOT_CREATE_TABLE_FROM_METADATA);
+        "Cannot create object '" + query.table + "' from query " + serializeAST(query) + ". Error: " + DB::getCurrentExceptionMessage(true),
+        e, DB::ErrorCodes::CANNOT_CREATE_TABLE_FROM_METADATA);
 }


@ -138,8 +138,7 @@ void DatabaseOrdinary::loadStoredObjects(
        catch (const Exception & e)
        {
            throw Exception(
-                "Cannot parse definition from metadata file " + full_path + ", error: " + e.displayText() + ", stack trace:\n"
-                    + e.getStackTrace().toString(), ErrorCodes::CANNOT_PARSE_TEXT);
+                "Cannot parse definition from metadata file " + full_path + ". Error: " + DB::getCurrentExceptionMessage(true), e, ErrorCodes::CANNOT_PARSE_TEXT);
        }

    });
@ -180,7 +179,15 @@ void DatabaseOrdinary::loadStoredObjects(
    auto & external_loader = context.getExternalDictionariesLoader();
    external_loader.addConfigRepository(getDatabaseName(), std::move(dictionaries_repository));
    bool lazy_load = context.getConfigRef().getBool("dictionaries_lazy_load", true);
-    external_loader.reload(!lazy_load);
+
+    auto filter = [this](const std::string & dictionary_name) -> bool
+    {
+        if (!startsWith(dictionary_name, name + "." /* db name */))
+            return false;
+        LOG_INFO(log, "Loading dictionary " << backQuote(dictionary_name) << ", for database " << backQuote(name));
+        return true;
+    };
+    external_loader.reload(filter, !lazy_load);
 }


--- a/dbms/src/Dictionaries/CacheDictionary.h
+++ b/dbms/src/Dictionaries/CacheDictionary.h
@ -48,7 +48,7 @@ public:

    double getLoadFactor() const override { return static_cast<double>(element_count.load(std::memory_order_relaxed)) / size; }

-    bool isCached() const override { return true; }
+    bool supportUpdates() const override { return false; }

    std::shared_ptr<const IExternalLoadable> clone() const override
    {
--- a/dbms/src/Dictionaries/ClickHouseDictionarySource.cpp
+++ b/dbms/src/Dictionaries/ClickHouseDictionarySource.cpp
@ -125,7 +125,11 @@ BlockInputStreamPtr ClickHouseDictionarySource::loadAll()
      *    the necessity of holding process_list_element shared pointer.
      */
    if (is_local)
-        return executeQuery(load_all_query, context, true).in;
+    {
+        BlockIO res = executeQuery(load_all_query, context, true);
+        /// FIXME res.in may implicitly use some objects owned by res, but they will be destroyed after return
+        return res.in;
+    }
    return std::make_shared<RemoteBlockInputStream>(pool, load_all_query, sample_block, context);
 }

--- a/dbms/src/Dictionaries/ComplexKeyCacheDictionary.h
+++ b/dbms/src/Dictionaries/ComplexKeyCacheDictionary.h
@ -71,7 +71,7 @@ public:

    double getLoadFactor() const override { return static_cast<double>(element_count.load(std::memory_order_relaxed)) / size; }

-    bool isCached() const override { return true; }
+    bool supportUpdates() const override { return false; }

    std::shared_ptr<const IExternalLoadable> clone() const override
    {
--- a/dbms/src/Dictionaries/ComplexKeyHashedDictionary.h
+++ b/dbms/src/Dictionaries/ComplexKeyHashedDictionary.h
@ -46,8 +46,6 @@ public:

    double getLoadFactor() const override { return static_cast<double>(element_count) / bucket_count; }

-    bool isCached() const override { return false; }
-
    std::shared_ptr<const IExternalLoadable> clone() const override
    {
        return std::make_shared<ComplexKeyHashedDictionary>(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, saved_block);
--- a/dbms/src/Dictionaries/Embedded/RegionsNames.cpp
+++ b/dbms/src/Dictionaries/Embedded/RegionsNames.cpp
@ -19,7 +19,7 @@ RegionsNames::RegionsNames(IRegionsNamesDataProviderPtr data_provider)
 {
    for (size_t language_id = 0; language_id < SUPPORTED_LANGUAGES_COUNT; ++language_id)
    {
-        const std::string & language = getSupportedLanguages()[language_id];
+        const std::string & language = supported_languages[language_id];
        names_sources[language_id] = data_provider->getLanguageRegionsNamesSource(language);
    }

@ -34,7 +34,7 @@ std::string RegionsNames::dumpSupportedLanguagesNames()
        if (i > 0)
            res += ", ";
        res += '\'';
-        res += getLanguageAliases()[i].name;
+        res += language_aliases[i].first;
        res += '\'';
    }
    return res;
@ -48,7 +48,7 @@ void RegionsNames::reload()
    RegionID max_region_id = 0;
    for (size_t language_id = 0; language_id < SUPPORTED_LANGUAGES_COUNT; ++language_id)
    {
-        const std::string & language = getSupportedLanguages()[language_id];
+        const std::string & language = supported_languages[language_id];

        auto names_source = names_sources[language_id];

--- a/dbms/src/Dictionaries/Embedded/RegionsNames.h
+++ b/dbms/src/Dictionaries/Embedded/RegionsNames.h
@ -20,7 +20,7 @@
 class RegionsNames
 {
 public:
-    enum class Language
+    enum class Language : size_t
    {
        RU = 0,
        EN,
@ -28,36 +28,35 @@ public:
        BY,
        KZ,
        TR,
+
+        END
    };

 private:
-    static const size_t ROOT_LANGUAGE = 0;
-    static const size_t SUPPORTED_LANGUAGES_COUNT = 6;
-    static const size_t LANGUAGE_ALIASES_COUNT = 7;
-
-    static const char ** getSupportedLanguages()
+    static inline constexpr const char * supported_languages[] =
    {
-        static const char * res[]{"ru", "en", "ua", "by", "kz", "tr"};
-        return res;
-    }
-
-    struct language_alias
-    {
-        const char * const name;
-        const Language lang;
+        "ru",
+        "en",
+        "ua",
+        "by",
+        "kz",
+        "tr"
    };
-    static const language_alias * getLanguageAliases()
-    {
-        static constexpr const language_alias language_aliases[]{{"ru", Language::RU},
-                                                                 {"en", Language::EN},
-                                                                 {"ua", Language::UA},
-                                                                 {"uk", Language::UA},
-                                                                 {"by", Language::BY},
-                                                                 {"kz", Language::KZ},
-                                                                 {"tr", Language::TR}};

-        return language_aliases;
-    }
+    static inline constexpr std::pair<const char *, Language> language_aliases[] =
+    {
+        {"ru", Language::RU},
+        {"en", Language::EN},
+        {"ua", Language::UA},
+        {"uk", Language::UA},
+        {"by", Language::BY},
+        {"kz", Language::KZ},
+        {"tr", Language::TR}
+    };
+
+    static constexpr size_t ROOT_LANGUAGE = 0;
+    static constexpr size_t SUPPORTED_LANGUAGES_COUNT = size_t(Language::END);
+    static constexpr size_t LANGUAGE_ALIASES_COUNT = sizeof(language_aliases) / sizeof(language_aliases[0]);

    using NamesSources = std::vector<std::shared_ptr<ILanguageRegionsNamesDataSource>>;

@ -94,9 +93,9 @@ public:
        {
            for (size_t i = 0; i < LANGUAGE_ALIASES_COUNT; ++i)
            {
-                const auto & alias = getLanguageAliases()[i];
-                if (language[0] == alias.name[0] && language[1] == alias.name[1])
-                    return alias.lang;
+                const auto & alias = language_aliases[i];
+                if (language[0] == alias.first[0] && language[1] == alias.first[1])
+                    return alias.second;
            }
        }
        throw Poco::Exception("Unsupported language for region name. Supported languages are: " + dumpSupportedLanguagesNames() + ".");
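Note on the array-length constants in this header: the alias lookup loop iterates `LANGUAGE_ALIASES_COUNT` times, so that constant has to be an element count, not a byte size. A minimal sketch of the difference, using a shortened, hypothetical alias table (`aliases`, `alias_count`, `alias_bytes` and `findLanguage` are illustrative names, not part of the patch):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>

enum class Language : std::size_t { RU = 0, EN, UA, END };

// Hypothetical, shortened alias table in the same shape as the one in the patch.
inline constexpr std::pair<const char *, Language> aliases[] =
{
    {"ru", Language::RU},
    {"en", Language::EN},
    {"ua", Language::UA},
    {"uk", Language::UA},
};

// std::size yields the number of elements; sizeof yields the size in bytes.
inline constexpr std::size_t alias_count = std::size(aliases);
inline constexpr std::size_t alias_bytes = sizeof(aliases);

static_assert(alias_count == 4);
static_assert(alias_bytes == alias_count * sizeof(aliases[0]));
static_assert(alias_bytes > alias_count);

// Looks up a two-letter code; returns Language::END when nothing matches.
constexpr Language findLanguage(const char * code)
{
    for (std::size_t i = 0; i < alias_count; ++i)
        if (code[0] == aliases[i].first[0] && code[1] == aliases[i].first[1])
            return aliases[i].second;
    return Language::END;
}
```

Using `alias_bytes` as the loop bound would read past the end of the array. `std::size(a)` is equivalent to `sizeof(a) / sizeof(a[0])` for a built-in array.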
--- a/dbms/src/Dictionaries/FlatDictionary.h
+++ b/dbms/src/Dictionaries/FlatDictionary.h
@ -43,8 +43,6 @@ public:

    double getLoadFactor() const override { return static_cast<double>(element_count) / bucket_count; }

-    bool isCached() const override { return false; }
-
    std::shared_ptr<const IExternalLoadable> clone() const override
    {
        return std::make_shared<FlatDictionary>(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, saved_block);
--- a/dbms/src/Dictionaries/HashedDictionary.h
+++ b/dbms/src/Dictionaries/HashedDictionary.h
@ -48,8 +48,6 @@ public:

    double getLoadFactor() const override { return static_cast<double>(element_count) / bucket_count; }

-    bool isCached() const override { return false; }
-
    std::shared_ptr<const IExternalLoadable> clone() const override
    {
        return std::make_shared<HashedDictionary>(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, sparse, saved_block);
--- a/dbms/src/Dictionaries/IDictionary.h
+++ b/dbms/src/Dictionaries/IDictionary.h
@ -37,8 +37,6 @@ struct IDictionaryBase : public IExternalLoadable

    virtual double getLoadFactor() const = 0;

-    virtual bool isCached() const = 0;
-
    virtual const IDictionarySource * getSource() const = 0;

    virtual const DictionaryStructure & getStructure() const = 0;
@ -47,7 +45,7 @@ struct IDictionaryBase : public IExternalLoadable

    virtual BlockInputStreamPtr getBlockInputStream(const Names & column_names, size_t max_block_size) const = 0;

-    bool supportUpdates() const override { return !isCached(); }
+    bool supportUpdates() const override { return true; }

    bool isModified() const override
    {
--- a/dbms/src/Dictionaries/RangeHashedDictionary.h
+++ b/dbms/src/Dictionaries/RangeHashedDictionary.h
@ -38,8 +38,6 @@ public:

    double getLoadFactor() const override { return static_cast<double>(element_count) / bucket_count; }

-    bool isCached() const override { return false; }
-
    std::shared_ptr<const IExternalLoadable> clone() const override
    {
        return std::make_shared<RangeHashedDictionary>(dictionary_name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty);
--- a/dbms/src/Dictionaries/TrieDictionary.h
+++ b/dbms/src/Dictionaries/TrieDictionary.h
@ -47,8 +47,6 @@ public:

    double getLoadFactor() const override { return static_cast<double>(element_count) / bucket_count; }

-    bool isCached() const override { return false; }
-
    std::shared_ptr<const IExternalLoadable> clone() const override
    {
        return std::make_shared<TrieDictionary>(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty);
--- a/dbms/src/Dictionaries/readInvalidateQuery.cpp
+++ b/dbms/src/Dictionaries/readInvalidateQuery.cpp
@ -1,6 +1,7 @@
 #include "readInvalidateQuery.h"
 #include <DataStreams/IBlockInputStream.h>
 #include <IO/WriteBufferFromString.h>
+#include <Formats/FormatSettings.h>


 namespace DB
--- a/dbms/src/Formats/FormatFactory.cpp
+++ b/dbms/src/Formats/FormatFactory.cpp
@ -1,8 +1,10 @@
+#include <algorithm>
 #include <Common/config.h>
 #include <Common/Exception.h>
 #include <Interpreters/Context.h>
 #include <Core/Settings.h>
 #include <DataStreams/MaterializingBlockOutputStream.h>
+#include <DataStreams/ParallelParsingBlockInputStream.h>
 #include <Formats/FormatSettings.h>
 #include <Formats/FormatFactory.h>
 #include <Processors/Formats/IRowInputFormat.h>
@ -93,7 +95,7 @@ BlockInputStreamPtr FormatFactory::getInput(

    if (!getCreators(name).input_processor_creator)
    {
-        const auto & input_getter = getCreators(name).inout_creator;
+        const auto & input_getter = getCreators(name).input_creator;
        if (!input_getter)
            throw Exception("Format " + name + " is not suitable for input", ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_INPUT);

@ -103,6 +105,37 @@ BlockInputStreamPtr FormatFactory::getInput(
        return input_getter(buf, sample, context, max_block_size, callback ? callback : ReadCallback(), format_settings);
    }

+    const Settings & settings = context.getSettingsRef();
+    const auto & file_segmentation_engine = getCreators(name).file_segmentation_engine;
+
+    // It doesn't make sense to use parallel parsing with fewer than four threads
+    // (segmentator + two parsers + reader).
+    if (settings.input_format_parallel_parsing
+        && file_segmentation_engine
+        && settings.max_threads >= 4)
+    {
+        const auto & input_getter = getCreators(name).input_processor_creator;
+        if (!input_getter)
+            throw Exception("Format " + name + " is not suitable for input", ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_INPUT);
+
+        FormatSettings format_settings = getInputFormatSetting(settings);
+
+        RowInputFormatParams row_input_format_params;
+        row_input_format_params.max_block_size = max_block_size;
+        row_input_format_params.allow_errors_num = format_settings.input_allow_errors_num;
+        row_input_format_params.allow_errors_ratio = format_settings.input_allow_errors_ratio;
+        row_input_format_params.callback = std::move(callback);
+        row_input_format_params.max_execution_time = settings.max_execution_time;
+        row_input_format_params.timeout_overflow_mode = settings.timeout_overflow_mode;
+
+        auto input_creator_params = ParallelParsingBlockInputStream::InputCreatorParams{sample, context, row_input_format_params, format_settings};
+        ParallelParsingBlockInputStream::Params params{buf, input_getter,
+            input_creator_params, file_segmentation_engine,
+            static_cast<int>(settings.max_threads),
+            settings.min_chunk_bytes_for_parallel_parsing};
+        return std::make_shared<ParallelParsingBlockInputStream>(params);
+    }
+
    auto format = getInputFormat(name, buf, sample, context, max_block_size, std::move(callback));
    return std::make_shared<InputStreamFromInputFormat>(std::move(format));
 }
@ -191,7 +224,7 @@ OutputFormatPtr FormatFactory::getOutputFormat(

 void FormatFactory::registerInputFormat(const String & name, InputCreator input_creator)
 {
-    auto & target = dict[name].inout_creator;
+    auto & target = dict[name].input_creator;
    if (target)
        throw Exception("FormatFactory: Input format " + name + " is already registered", ErrorCodes::LOGICAL_ERROR);
    target = std::move(input_creator);
@ -221,6 +254,13 @@ void FormatFactory::registerOutputFormatProcessor(const String & name, OutputPro
    target = std::move(output_creator);
 }

+void FormatFactory::registerFileSegmentationEngine(const String & name, FileSegmentationEngine file_segmentation_engine)
+{
+    auto & target = dict[name].file_segmentation_engine;
+    if (target)
+        throw Exception("FormatFactory: File segmentation engine " + name + " is already registered", ErrorCodes::LOGICAL_ERROR);
+    target = file_segmentation_engine;
+}

 /// Formats for both input/output.

@ -241,6 +281,8 @@ void registerInputFormatProcessorTSKV(FormatFactory & factory);
 void registerOutputFormatProcessorTSKV(FormatFactory & factory);
 void registerInputFormatProcessorJSONEachRow(FormatFactory & factory);
 void registerOutputFormatProcessorJSONEachRow(FormatFactory & factory);
+void registerInputFormatProcessorJSONCompactEachRow(FormatFactory & factory);
+void registerOutputFormatProcessorJSONCompactEachRow(FormatFactory & factory);
 void registerInputFormatProcessorParquet(FormatFactory & factory);
 void registerInputFormatProcessorORC(FormatFactory & factory);
 void registerOutputFormatProcessorParquet(FormatFactory & factory);
@ -249,6 +291,12 @@ void registerOutputFormatProcessorProtobuf(FormatFactory & factory);
 void registerInputFormatProcessorTemplate(FormatFactory & factory);
 void registerOutputFormatProcessorTemplate(FormatFactory &factory);

+/// File Segmentation Engines for parallel reading
+
+void registerFileSegmentationEngineTabSeparated(FormatFactory & factory);
+void registerFileSegmentationEngineCSV(FormatFactory & factory);
+void registerFileSegmentationEngineJSONEachRow(FormatFactory & factory);
+
 /// Output only (presentational) formats.

 void registerOutputFormatNull(FormatFactory & factory);
@ -290,6 +338,8 @@ FormatFactory::FormatFactory()
    registerOutputFormatProcessorTSKV(*this);
    registerInputFormatProcessorJSONEachRow(*this);
    registerOutputFormatProcessorJSONEachRow(*this);
+    registerInputFormatProcessorJSONCompactEachRow(*this);
+    registerOutputFormatProcessorJSONCompactEachRow(*this);
    registerInputFormatProcessorProtobuf(*this);
    registerOutputFormatProcessorProtobuf(*this);
    registerInputFormatProcessorCapnProto(*this);
@ -299,6 +349,9 @@ FormatFactory::FormatFactory()
    registerInputFormatProcessorTemplate(*this);
    registerOutputFormatProcessorTemplate(*this);

+    registerFileSegmentationEngineTabSeparated(*this);
+    registerFileSegmentationEngineCSV(*this);
+    registerFileSegmentationEngineJSONEachRow(*this);

    registerOutputFormatNull(*this);

--- a/dbms/src/Formats/FormatFactory.h
+++ b/dbms/src/Formats/FormatFactory.h
@ -2,6 +2,7 @@

 #include <Core/Types.h>
 #include <DataStreams/IBlockStream_fwd.h>
+#include <IO/BufferWithOwnMemory.h>

 #include <functional>
 #include <memory>
@ -41,6 +42,15 @@ public:
    /// It's initial purpose was to extract payload for virtual columns from Kafka Consumer ReadBuffer.
    using ReadCallback = std::function<void()>;

+    /** Quickly reads data from the buffer and saves the result to memory.
+      * Reads at least min_chunk_bytes, then continues to the end of the chunk, which depends on the format.
+      * Used in ParallelParsingBlockInputStream.
+      */
+    using FileSegmentationEngine = std::function<bool(
+        ReadBuffer & buf,
+        DB::Memory<> & memory,
+        size_t min_chunk_bytes)>;
+
    /// This callback allows to perform some additional actions after writing a single row.
    /// It's initial purpose was to flush Kafka message for each row.
    using WriteCallback = std::function<void()>;
@ -77,10 +87,11 @@ private:

    struct Creators
    {
-        InputCreator inout_creator;
+        InputCreator input_creator;
        OutputCreator output_creator;
        InputProcessorCreator input_processor_creator;
        OutputProcessorCreator output_processor_creator;
+        FileSegmentationEngine file_segmentation_engine;
    };

    using FormatsDictionary = std::unordered_map<String, Creators>;
@ -114,6 +125,7 @@ public:
    /// Register format by its name.
    void registerInputFormat(const String & name, InputCreator input_creator);
    void registerOutputFormat(const String & name, OutputCreator output_creator);
+    void registerFileSegmentationEngine(const String & name, FileSegmentationEngine file_segmentation_engine);

    void registerInputFormatProcessor(const String & name, InputProcessorCreator input_creator);
    void registerOutputFormatProcessor(const String & name, OutputProcessorCreator output_creator);
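Note on `FileSegmentationEngine`: the comment in this header describes an engine that reads at least `min_chunk_bytes` and then continues to a format-specific chunk boundary. A minimal sketch of that idea for a line-oriented format follows. It uses `std::istream` and `std::string` as stand-ins for `ReadBuffer` and `DB::Memory<>`, the name `segmentByLines` is hypothetical, and the meaning of the returned `bool` (whether more input remains) is an assumption rather than something this diff states:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a line-oriented segmentation engine.
// Fills `chunk` with at least min_chunk_bytes from `in`, then keeps reading
// up to and including the next '\n' so that no row is split between chunks.
// Returns true if more input remains, false once the stream is exhausted.
bool segmentByLines(std::istream & in, std::string & chunk, std::size_t min_chunk_bytes)
{
    chunk.clear();
    char c;
    while (in.get(c))
    {
        chunk.push_back(c);
        // Only stop on a row boundary, and only once the minimum size is reached.
        if (c == '\n' && chunk.size() >= min_chunk_bytes)
            return in.peek() != std::istream::traits_type::eof();
    }
    return false;
}
```

Each chunk produced this way contains only whole rows, so it can be handed to an independent parser. Formats with quoting or escaping (CSV, JSON) would need a boundary test that is aware of those rules.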
--- a/dbms/src/Functions/FunctionsConversion.h
+++ b/dbms/src/Functions/FunctionsConversion.h
@ -971,8 +971,16 @@ public:
                ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);

        if (!isStringOrFixedString(arguments[0].type))
-            throw Exception("Illegal type " + arguments[0].type->getName() + " of first argument of function " + getName(),
-                ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
+        {
+            if (this->getName().find("OrZero") != std::string::npos ||
+                this->getName().find("OrNull") != std::string::npos)
+                throw Exception("Illegal type " + arguments[0].type->getName() + " of first argument of function " + getName() +
+                        ". Conversion functions with postfix 'OrZero' or 'OrNull' should take a String argument",
+                        ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
+            else
+                throw Exception("Illegal type " + arguments[0].type->getName() + " of first argument of function " + getName(),
+                        ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
+        }

        if (arguments.size() == 2)
        {
--- a/dbms/src/Functions/GatherUtils/Algorithms.h
+++ b/dbms/src/Functions/GatherUtils/Algorithms.h
@ -1,5 +1,6 @@
 #pragma once

+#include <Core/Types.h>
 #include <Common/FieldVisitors.h>
 #include "Sources.h"
 #include "Sinks.h"
@ -79,8 +80,16 @@ inline ALWAYS_INLINE void writeSlice(const NumericArraySlice<T> & slice, Generic
 {
    for (size_t i = 0; i < slice.size; ++i)
    {
-        Field field = T(slice.data[i]);
-        sink.elements.insert(field);
+        if constexpr (IsDecimalNumber<T>)
+        {
+            DecimalField field(T(slice.data[i]), 0); /// TODO: Decimal scale
+            sink.elements.insert(field);
+        }
+        else
+        {
+            Field field = T(slice.data[i]);
+            sink.elements.insert(field);
+        }
    }
    sink.current_offset += slice.size;
 }
@ -422,9 +431,18 @@ bool sliceHasImpl(const FirstSliceType & first, const SecondSliceType & second,
 }

 template <typename T, typename U>
-bool sliceEqualElements(const NumericArraySlice<T> & first, const NumericArraySlice<U> & second, size_t first_ind, size_t second_ind)
+bool sliceEqualElements(const NumericArraySlice<T> & first [[maybe_unused]],
+                        const NumericArraySlice<U> & second [[maybe_unused]],
+                        size_t first_ind [[maybe_unused]],
+                        size_t second_ind [[maybe_unused]])
 {
-    return accurate::equalsOp(first.data[first_ind], second.data[second_ind]);
+    /// TODO: Decimal scale
+    if constexpr (IsDecimalNumber<T> && IsDecimalNumber<U>)
+        return accurate::equalsOp(typename T::NativeType(first.data[first_ind]), typename U::NativeType(second.data[second_ind]));
+    else if constexpr (IsDecimalNumber<T> || IsDecimalNumber<U>)
+        return false;
+    else
+        return accurate::equalsOp(first.data[first_ind], second.data[second_ind]);
 }

 template <typename T>
--- a/dbms/src/Functions/GatherUtils/Sinks.h
+++ b/dbms/src/Functions/GatherUtils/Sinks.h
@ -3,6 +3,7 @@
 #include "IArraySink.h"

 #include <Columns/ColumnVector.h>
+#include <Columns/ColumnDecimal.h>
 #include <Columns/ColumnArray.h>
 #include <Columns/ColumnString.h>
 #include <Columns/ColumnFixedString.h>
@ -33,17 +34,18 @@ struct NullableValueSource;
 template <typename T>
 struct NumericArraySink : public ArraySinkImpl<NumericArraySink<T>>
 {
+    using ColVecType = std::conditional_t<IsDecimalNumber<T>, ColumnDecimal<T>, ColumnVector<T>>;
    using CompatibleArraySource = NumericArraySource<T>;
    using CompatibleValueSource = NumericValueSource<T>;

-    typename ColumnVector<T>::Container & elements;
+    typename ColVecType::Container & elements;
    typename ColumnArray::Offsets & offsets;

    size_t row_num = 0;
    ColumnArray::Offset current_offset = 0;

    NumericArraySink(ColumnArray & arr, size_t column_size)
-            : elements(typeid_cast<ColumnVector<T> &>(arr.getData()).getData()), offsets(arr.getOffsets())
+            : elements(typeid_cast<ColVecType &>(arr.getData()).getData()), offsets(arr.getOffsets())
    {
        offsets.resize(column_size);
    }
--- a/dbms/src/Functions/GatherUtils/Sources.h
+++ b/dbms/src/Functions/GatherUtils/Sources.h
@ -1,6 +1,7 @@
 #pragma once

 #include <Columns/ColumnVector.h>
+#include <Columns/ColumnDecimal.h>
 #include <Columns/ColumnArray.h>
 #include <Columns/ColumnString.h>
 #include <Columns/ColumnFixedString.h>
@ -30,17 +31,18 @@ namespace GatherUtils
 template <typename T>
 struct NumericArraySource : public ArraySourceImpl<NumericArraySource<T>>
 {
+    using ColVecType = std::conditional_t<IsDecimalNumber<T>, ColumnDecimal<T>, ColumnVector<T>>;
    using Slice = NumericArraySlice<T>;
    using Column = ColumnArray;

-    const typename ColumnVector<T>::Container & elements;
+    const typename ColVecType::Container & elements;
    const typename ColumnArray::Offsets & offsets;

    size_t row_num = 0;
    ColumnArray::Offset prev_offset = 0;

    explicit NumericArraySource(const ColumnArray & arr)
-            : elements(typeid_cast<const ColumnVector<T> &>(arr.getData()).getData()), offsets(arr.getOffsets())
+            : elements(typeid_cast<const ColVecType &>(arr.getData()).getData()), offsets(arr.getOffsets())
    {
    }

@ -650,7 +652,7 @@ template <typename T>
 struct NumericValueSource : ValueSourceImpl<NumericValueSource<T>>
 {
    using Slice = NumericValueSlice<T>;
-    using Column = ColumnVector<T>;
+    using Column = std::conditional_t<IsDecimalNumber<T>, ColumnDecimal<T>, ColumnVector<T>>;

    const T * begin;
    size_t total_rows;
--- a/dbms/src/Functions/GatherUtils/createArraySink.cpp
+++ b/dbms/src/Functions/GatherUtils/createArraySink.cpp
@ -14,7 +14,9 @@ struct ArraySinkCreator<Type, Types...>
 {
    static std::unique_ptr<IArraySink> create(ColumnArray & col, NullMap * null_map, size_t column_size)
    {
-        if (typeid_cast<ColumnVector<Type> *>(&col.getData()))
+        using ColVecType = std::conditional_t<IsDecimalNumber<Type>, ColumnDecimal<Type>, ColumnVector<Type>>;
+
+        if (typeid_cast<ColVecType *>(&col.getData()))
        {
            if (null_map)
                return std::make_unique<NullableArraySink<NumericArraySink<Type>>>(col, *null_map, column_size);
--- a/dbms/src/Functions/GatherUtils/createArraySource.cpp
+++ b/dbms/src/Functions/GatherUtils/createArraySource.cpp
@ -14,7 +14,9 @@ struct ArraySourceCreator<Type, Types...>
 {
    static std::unique_ptr<IArraySource> create(const ColumnArray & col, const NullMap * null_map, bool is_const, size_t total_rows)
    {
-        if (typeid_cast<const ColumnVector<Type> *>(&col.getData()))
+        using ColVecType = std::conditional_t<IsDecimalNumber<Type>, ColumnDecimal<Type>, ColumnVector<Type>>;
+
+        if (typeid_cast<const ColVecType *>(&col.getData()))
        {
            if (null_map)
            {
--- a/dbms/src/Functions/GatherUtils/createValueSource.cpp
+++ b/dbms/src/Functions/GatherUtils/createValueSource.cpp
@ -14,7 +14,9 @@ struct ValueSourceCreator<Type, Types...>
 {
    static std::unique_ptr<IValueSource> create(const IColumn & col, const NullMap * null_map, bool is_const, size_t total_rows)
    {
-        if (auto column_vector = typeid_cast<const ColumnVector<Type> *>(&col))
+        using ColVecType = std::conditional_t<IsDecimalNumber<Type>, ColumnDecimal<Type>, ColumnVector<Type>>;
+
+        if (auto column_vector = typeid_cast<const ColVecType *>(&col))
        {
            if (null_map)
            {
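Note on the `ColVecType` alias repeated across the GatherUtils files: it selects the column class at compile time, so the same template body serves both plain numeric and decimal element types. A minimal sketch of the pattern with hypothetical stand-in types (`Decimal`, `PlainColumn`, `DecimalColumn`, `is_decimal`, `ColumnFor` and `scaleOf` are illustrative and are not the ClickHouse definitions):

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for a decimal value type and two column classes.
template <typename T> struct Decimal { T value; };
template <typename T> struct PlainColumn { std::vector<T> data; };
template <typename T> struct DecimalColumn { std::vector<T> data; unsigned scale = 0; };

// Trait that is true only for Decimal<...> (plays the role of IsDecimalNumber).
template <typename T> inline constexpr bool is_decimal = false;
template <typename T> inline constexpr bool is_decimal<Decimal<T>> = true;

// Compile-time choice of column class, in the spirit of ColVecType.
template <typename T>
using ColumnFor = std::conditional_t<is_decimal<T>, DecimalColumn<T>, PlainColumn<T>>;

static_assert(std::is_same_v<ColumnFor<int>, PlainColumn<int>>);
static_assert(std::is_same_v<ColumnFor<Decimal<int>>, DecimalColumn<Decimal<int>>>);

// The decimal-only member is touched only inside the `if constexpr` branch,
// so the function still compiles for plain numeric types.
template <typename T>
unsigned scaleOf(const ColumnFor<T> & column)
{
    if constexpr (is_decimal<T>)
        return column.scale;
    else
        return 0;
}
```

Because `ColumnFor<T>` is a non-deduced context, callers name the element type explicitly, for example `scaleOf<Decimal<int>>(column)`.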
--- a/dbms/src/Functions/GeoUtils.h
+++ b/dbms/src/Functions/GeoUtils.h
@ -590,7 +590,7 @@ struct CallPointInPolygon<Type, Types ...>
    template <typename PointInPolygonImpl>
    static ColumnPtr call(const IColumn & x, const IColumn & y, PointInPolygonImpl && impl)
    {
-        using Impl = typename ApplyTypeListForClass<::DB::GeoUtils::CallPointInPolygon, TypeListNumbers>::Type;
+        using Impl = typename ApplyTypeListForClass<::DB::GeoUtils::CallPointInPolygon, TypeListNativeNumbers>::Type;
        if (auto column = typeid_cast<const ColumnVector<Type> *>(&x))
            return Impl::template call<Type>(*column, y, impl);
        return CallPointInPolygon<Types ...>::call(x, y, impl);
@ -616,7 +616,7 @@ struct CallPointInPolygon<>
 template <typename PointInPolygonImpl>
 ColumnPtr pointInPolygon(const IColumn & x, const IColumn & y, PointInPolygonImpl && impl)
 {
-    using Impl = typename ApplyTypeListForClass<::DB::GeoUtils::CallPointInPolygon, TypeListNumbers>::Type;
+    using Impl = typename ApplyTypeListForClass<::DB::GeoUtils::CallPointInPolygon, TypeListNativeNumbers>::Type;
    return Impl::call(x, y, impl);
 }

--- a/dbms/src/Functions/array/arrayCompact.cpp
+++ b/dbms/src/Functions/array/arrayCompact.cpp
@ -1,5 +1,7 @@
 #include <DataTypes/DataTypesNumber.h>
+#include <DataTypes/DataTypesDecimal.h>
 #include <Columns/ColumnsNumber.h>
+#include <Columns/ColumnDecimal.h>
 #include <Functions/array/FunctionArrayMapped.h>
 #include <Functions/FunctionFactory.h>

@ -27,16 +29,23 @@ struct ArrayCompactImpl
    template <typename T>
    static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr)
    {
-        const ColumnVector<T> * src_values_column = checkAndGetColumn<ColumnVector<T>>(mapped.get());
+        using ColVecType = std::conditional_t<IsDecimalNumber<T>, ColumnDecimal<T>, ColumnVector<T>>;
+
+        const ColVecType * src_values_column = checkAndGetColumn<ColVecType>(mapped.get());

        if (!src_values_column)
            return false;

        const IColumn::Offsets & src_offsets = array.getOffsets();
-        const typename ColumnVector<T>::Container & src_values = src_values_column->getData();
+        const typename ColVecType::Container & src_values = src_values_column->getData();

-        auto res_values_column = ColumnVector<T>::create(src_values.size());
-        typename ColumnVector<T>::Container & res_values = res_values_column->getData();
+        typename ColVecType::MutablePtr res_values_column;
+        if constexpr (IsDecimalNumber<T>)
+            res_values_column = ColVecType::create(src_values.size(), src_values.getScale());
+        else
+            res_values_column = ColVecType::create(src_values.size());
+
+        typename ColVecType::Container & res_values = res_values_column->getData();
        size_t src_offsets_size = src_offsets.size();
        auto res_offsets_column = ColumnArray::ColumnOffsets::create(src_offsets_size);
        IColumn::Offsets & res_offsets = res_offsets_column->getData();
@ -129,7 +138,10 @@ struct ArrayCompactImpl
            executeType< Int32 >(mapped, array, res) ||
            executeType< Int64 >(mapped, array, res) ||
            executeType<Float32>(mapped, array, res) ||
-            executeType<Float64>(mapped, array, res)))
+            executeType<Float64>(mapped, array, res) ||
+            executeType<Decimal32>(mapped, array, res) ||
+            executeType<Decimal64>(mapped, array, res) ||
+            executeType<Decimal128>(mapped, array, res)))
        {
            executeGeneric(mapped, array, res);
        }
--- a/dbms/src/Functions/array/arrayCumSum.cpp
+++ b/dbms/src/Functions/array/arrayCumSum.cpp
@ -1,5 +1,7 @@
 #include <DataTypes/DataTypesNumber.h>
+#include <DataTypes/DataTypesDecimal.h>
 #include <Columns/ColumnsNumber.h>
+#include <Columns/ColumnDecimal.h>
 #include "FunctionArrayMapped.h"
 #include <Functions/FunctionFactory.h>

@ -31,6 +33,13 @@ struct ArrayCumSumImpl
        if (which.isFloat())
            return std::make_shared<DataTypeArray>(std::make_shared<DataTypeFloat64>());

+        if (which.isDecimal())
+        {
+            UInt32 scale = getDecimalScale(*expression_return);
+            DataTypePtr nested = std::make_shared<DataTypeDecimal<Decimal128>>(maxDecimalPrecision<Decimal128>(), scale);
+            return std::make_shared<DataTypeArray>(nested);
+        }
+
        throw Exception("arrayCumSum cannot add values of type " + expression_return->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
    }

@ -38,11 +47,14 @@ struct ArrayCumSumImpl
    template <typename Element, typename Result>
    static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr)
    {
-        const ColumnVector<Element> * column = checkAndGetColumn<ColumnVector<Element>>(&*mapped);
+        using ColVecType = std::conditional_t<IsDecimalNumber<Element>, ColumnDecimal<Element>, ColumnVector<Element>>;
+        using ColVecResult = std::conditional_t<IsDecimalNumber<Result>, ColumnDecimal<Result>, ColumnVector<Result>>;
+
+        const ColVecType * column = checkAndGetColumn<ColVecType>(&*mapped);

        if (!column)
        {
-            const ColumnConst * column_const = checkAndGetColumnConst<ColumnVector<Element>>(&*mapped);
+            const ColumnConst * column_const = checkAndGetColumnConst<ColVecType>(&*mapped);

            if (!column_const)
                return false;
@ -50,8 +62,17 @@ struct ArrayCumSumImpl
            const Element x = column_const->template getValue<Element>();
            const IColumn::Offsets & offsets = array.getOffsets();

-            auto res_nested = ColumnVector<Result>::create();
-            typename ColumnVector<Result>::Container & res_values = res_nested->getData();
+            typename ColVecResult::MutablePtr res_nested;
+            if constexpr (IsDecimalNumber<Element>)
+            {
+                const typename ColVecType::Container & data =
+                    checkAndGetColumn<ColVecType>(&column_const->getDataColumn())->getData();
+                res_nested = ColVecResult::create(0, data.getScale());
+            }
+            else
+                res_nested = ColVecResult::create();
+
+            typename ColVecResult::Container & res_values = res_nested->getData();
            res_values.resize(column_const->size());

            size_t pos = 0;
@ -72,11 +93,16 @@ struct ArrayCumSumImpl
            return true;
        }

+        const typename ColVecType::Container & data = column->getData();
        const IColumn::Offsets & offsets = array.getOffsets();
-        const typename ColumnVector<Element>::Container & data = column->getData();

-        auto res_nested = ColumnVector<Result>::create();
-        typename ColumnVector<Result>::Container & res_values = res_nested->getData();
+        typename ColVecResult::MutablePtr res_nested;
+        if constexpr (IsDecimalNumber<Element>)
+            res_nested = ColVecResult::create(0, data.getScale());
+        else
+            res_nested = ColVecResult::create();
+
+        typename ColVecResult::Container & res_values = res_nested->getData();
        res_values.resize(data.size());

        size_t pos = 0;
@ -110,7 +136,10 @@ struct ArrayCumSumImpl
            executeType<  Int32,  Int64>(mapped, array, res) ||
            executeType<  Int64,  Int64>(mapped, array, res) ||
            executeType<Float32,Float64>(mapped, array, res) ||
-            executeType<Float64,Float64>(mapped, array, res))
+            executeType<Float64,Float64>(mapped, array, res) ||
+            executeType<Decimal32, Decimal128>(mapped, array, res) ||
+            executeType<Decimal64, Decimal128>(mapped, array, res) ||
+            executeType<Decimal128, Decimal128>(mapped, array, res))
            return res;
        else
            throw Exception("Unexpected column for arrayCumSum: " + mapped->getName(), ErrorCodes::ILLEGAL_COLUMN);
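Note on the hunks above: they pick the column class at compile time with `std::conditional_t`, and pass a scale to `create` only on the decimal branch. A minimal standalone sketch of that pattern follows. The `ColumnVector`, `ColumnDecimal`, `Decimal64` and `IsDecimalNumber` definitions are simplified, hypothetical stand-ins and not ClickHouse's real classes. `makeResult` is also an illustrative helper that does not exist in the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the real ClickHouse types (assumptions for illustration only).
struct Decimal64 { int64_t value = 0; };

template <typename T> inline constexpr bool IsDecimalNumber = false;
template <> inline constexpr bool IsDecimalNumber<Decimal64> = true;

template <typename T>
struct ColumnVector
{
    std::vector<T> data;
    static ColumnVector create() { return {}; }
};

template <typename T>
struct ColumnDecimal
{
    std::vector<T> data;
    uint32_t scale;
    static ColumnDecimal create(size_t size, uint32_t scale) { return {std::vector<T>(size), scale}; }
};

// Same selection as in the patch: decimal elements get ColumnDecimal, everything else ColumnVector.
template <typename Element>
using ColVecType = std::conditional_t<IsDecimalNumber<Element>, ColumnDecimal<Element>, ColumnVector<Element>>;

// Illustrative helper: only the decimal branch needs (and compiles) the scale argument.
template <typename Element>
ColVecType<Element> makeResult([[maybe_unused]] uint32_t scale)
{
    if constexpr (IsDecimalNumber<Element>)
        return ColVecType<Element>::create(0, scale);
    else
        return ColVecType<Element>::create();
}
```

Because `if constexpr` discards the untaken branch, `ColumnVector<T>::create(0, scale)` is never instantiated for non-decimal types, which is why the patch can keep one `executeType` template for both families.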
--- a/dbms/src/Functions/array/arrayCumSumNonNegative.cpp
+++ b/dbms/src/Functions/array/arrayCumSumNonNegative.cpp
@ -1,5 +1,7 @@
 #include <DataTypes/DataTypesNumber.h>
+#include <DataTypes/DataTypesDecimal.h>
 #include <Columns/ColumnsNumber.h>
+#include <Columns/ColumnDecimal.h>
 #include "FunctionArrayMapped.h"
 #include <Functions/FunctionFactory.h>

@ -34,6 +36,13 @@ struct ArrayCumSumNonNegativeImpl
        if (which.isFloat())
            return std::make_shared<DataTypeArray>(std::make_shared<DataTypeFloat64>());

+        if (which.isDecimal())
+        {
+            UInt32 scale = getDecimalScale(*expression_return);
+            DataTypePtr nested = std::make_shared<DataTypeDecimal<Decimal128>>(maxDecimalPrecision<Decimal128>(), scale);
+            return std::make_shared<DataTypeArray>(nested);
+        }
+
        throw Exception("arrayCumSumNonNegativeImpl cannot add values of type " + expression_return->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
    }

@ -41,16 +50,24 @@ struct ArrayCumSumNonNegativeImpl
    template <typename Element, typename Result>
    static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr)
    {
-        const ColumnVector<Element> * column = checkAndGetColumn<ColumnVector<Element>>(&*mapped);
+        using ColVecType = std::conditional_t<IsDecimalNumber<Element>, ColumnDecimal<Element>, ColumnVector<Element>>;
+        using ColVecResult = std::conditional_t<IsDecimalNumber<Result>, ColumnDecimal<Result>, ColumnVector<Result>>;
+
+        const ColVecType * column = checkAndGetColumn<ColVecType>(&*mapped);

        if (!column)
            return false;

        const IColumn::Offsets & offsets = array.getOffsets();
-        const typename ColumnVector<Element>::Container & data = column->getData();
+        const typename ColVecType::Container & data = column->getData();

-        auto res_nested = ColumnVector<Result>::create();
-        typename ColumnVector<Result>::Container & res_values = res_nested->getData();
+        typename ColVecResult::MutablePtr res_nested;
+        if constexpr (IsDecimalNumber<Element>)
+            res_nested = ColVecResult::create(0, data.getScale());
+        else
+            res_nested = ColVecResult::create();
+
+        typename ColVecResult::Container & res_values = res_nested->getData();
        res_values.resize(data.size());

        size_t pos = 0;
@ -60,7 +77,7 @@ struct ArrayCumSumNonNegativeImpl
            // skip empty arrays
            if (pos < offsets[i])
            {
-                accum_sum = data[pos] > 0 ? data[pos] : 0;
+                accum_sum = data[pos] > 0 ? data[pos] : Element(0);
                res_values[pos] = accum_sum;
                for (++pos; pos < offsets[i]; ++pos)
                {
@ -90,7 +107,10 @@ struct ArrayCumSumNonNegativeImpl
            executeType<  Int32,  Int64>(mapped, array, res) ||
            executeType<  Int64,  Int64>(mapped, array, res) ||
            executeType<Float32,Float64>(mapped, array, res) ||
-            executeType<Float64,Float64>(mapped, array, res))
+            executeType<Float64,Float64>(mapped, array, res) ||
+            executeType<Decimal32, Decimal128>(mapped, array, res) ||
+            executeType<Decimal64, Decimal128>(mapped, array, res) ||
+            executeType<Decimal128, Decimal128>(mapped, array, res))
            return res;
        else
            throw Exception("Unexpected column for arrayCumSumNonNegativeImpl: " + mapped->getName(), ErrorCodes::ILLEGAL_COLUMN);
--- a/dbms/src/Functions/array/arrayDifference.cpp
+++ b/dbms/src/Functions/array/arrayDifference.cpp
@ -1,5 +1,7 @@
 #include <DataTypes/DataTypesNumber.h>
+#include <DataTypes/DataTypesDecimal.h>
 #include <Columns/ColumnsNumber.h>
+#include <Columns/ColumnDecimal.h>
 #include "FunctionArrayMapped.h"
 #include <Functions/FunctionFactory.h>

@ -37,6 +39,9 @@ struct ArrayDifferenceImpl
        if (which.isFloat32() || which.isFloat64())
            return std::make_shared<DataTypeArray>(std::make_shared<DataTypeFloat64>());

+        if (which.isDecimal())
+            return std::make_shared<DataTypeArray>(expression_return);
+
        throw Exception("arrayDifference cannot process values of type " + expression_return->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
    }

@ -44,16 +49,24 @@ struct ArrayDifferenceImpl
    template <typename Element, typename Result>
    static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr)
    {
-        const ColumnVector<Element> * column = checkAndGetColumn<ColumnVector<Element>>(&*mapped);
+        using ColVecType = std::conditional_t<IsDecimalNumber<Element>, ColumnDecimal<Element>, ColumnVector<Element>>;
+        using ColVecResult = std::conditional_t<IsDecimalNumber<Result>, ColumnDecimal<Result>, ColumnVector<Result>>;
+
+        const ColVecType * column = checkAndGetColumn<ColVecType>(&*mapped);

        if (!column)
            return false;

        const IColumn::Offsets & offsets = array.getOffsets();
-        const typename ColumnVector<Element>::Container & data = column->getData();
+        const typename ColVecType::Container & data = column->getData();

-        auto res_nested = ColumnVector<Result>::create();
-        typename ColumnVector<Result>::Container & res_values = res_nested->getData();
+        typename ColVecResult::MutablePtr res_nested;
+        if constexpr (IsDecimalNumber<Element>)
+            res_nested = ColVecResult::create(0, data.getScale());
+        else
+            res_nested = ColVecResult::create();
+
+        typename ColVecResult::Container & res_values = res_nested->getData();
        res_values.resize(data.size());

        size_t pos = 0;
@ -87,7 +100,10 @@ struct ArrayDifferenceImpl
            executeType<  Int32,  Int64>(mapped, array, res) ||
            executeType<  Int64,  Int64>(mapped, array, res) ||
            executeType<Float32,Float64>(mapped, array, res) ||
-            executeType<Float64,Float64>(mapped, array, res))
+            executeType<Float64,Float64>(mapped, array, res) ||
+            executeType<Decimal32, Decimal32>(mapped, array, res) ||
+            executeType<Decimal64, Decimal64>(mapped, array, res) ||
+            executeType<Decimal128, Decimal128>(mapped, array, res))
            return res;
        else
            throw Exception("Unexpected column for arrayDifference: " + mapped->getName(), ErrorCodes::ILLEGAL_COLUMN);
--- a/dbms/src/Functions/array/arrayIntersect.cpp
+++ b/dbms/src/Functions/array/arrayIntersect.cpp
@ -4,6 +4,7 @@
 #include <DataTypes/DataTypeArray.h>
 #include <DataTypes/DataTypeNothing.h>
 #include <DataTypes/DataTypesNumber.h>
+#include <DataTypes/DataTypesDecimal.h>
 #include <DataTypes/DataTypeDate.h>
 #include <DataTypes/DataTypeDateTime.h>
 #include <DataTypes/DataTypeNullable.h>
@ -12,6 +13,7 @@
 #include <Columns/ColumnArray.h>
 #include <Columns/ColumnString.h>
 #include <Columns/ColumnFixedString.h>
+#include <Columns/ColumnDecimal.h>
 #include <Columns/ColumnNullable.h>
 #include <Columns/ColumnTuple.h>
 #include <Common/HashTable/ClearableHashMap.h>
@ -58,10 +60,19 @@ private:
    struct UnpackedArrays
    {
        size_t base_rows = 0;
-        std::vector<char> is_const;
-        std::vector<const NullMap *> null_maps;
-        std::vector<const ColumnArray::ColumnOffsets::Container *> offsets;
-        ColumnRawPtrs nested_columns;
+
+        struct UnpackedArray
+        {
+            bool is_const = false;
+            const NullMap * null_map = nullptr;
+            const NullMap * overflow_mask = nullptr;
+            const ColumnArray::ColumnOffsets::Container * offsets = nullptr;
+            const IColumn * nested_column = nullptr;
+
+        };
+
+        std::vector<UnpackedArray> args;
+        Columns column_holders;

        UnpackedArrays() = default;
    };
@ -69,9 +80,16 @@ private:
    /// Cast column to data_type removing nullable if data_type hasn't.
    /// It's expected that column can represent data_type after removing some NullMap's.
    ColumnPtr castRemoveNullable(const ColumnPtr & column, const DataTypePtr & data_type) const;
-    Columns castColumns(Block & block, const ColumnNumbers & arguments,
+
+    struct CastArgumentsResult
+    {
+        ColumnsWithTypeAndName initial;
+        ColumnsWithTypeAndName casted;
+    };
+
+    CastArgumentsResult castColumns(Block & block, const ColumnNumbers & arguments,
                        const DataTypePtr & return_type, const DataTypePtr & return_type_with_nulls) const;
-    UnpackedArrays prepareArrays(const Columns & columns) const;
+    UnpackedArrays prepareArrays(const ColumnsWithTypeAndName & columns, ColumnsWithTypeAndName & initial_columns) const;

    template <typename Map, typename ColumnType, bool is_numeric_column>
    static ColumnPtr execute(const UnpackedArrays & arrays, MutableColumnPtr result_data);
@ -88,6 +106,19 @@ private:
        template <typename T, size_t>
        void operator()();
    };
+
+    struct DecimalExecutor
+    {
+        const UnpackedArrays & arrays;
+        const DataTypePtr & data_type;
+        ColumnPtr & result;
+
+        DecimalExecutor(const UnpackedArrays & arrays_, const DataTypePtr & data_type_, ColumnPtr & result_)
+            : arrays(arrays_), data_type(data_type_), result(result_) {}
+
+        template <typename T, size_t>
+        void operator()();
+    };
 };


@ -173,12 +204,13 @@ ColumnPtr FunctionArrayIntersect::castRemoveNullable(const ColumnPtr & column, c
    return column;
 }

-Columns FunctionArrayIntersect::castColumns(
+FunctionArrayIntersect::CastArgumentsResult FunctionArrayIntersect::castColumns(
        Block & block, const ColumnNumbers & arguments, const DataTypePtr & return_type,
        const DataTypePtr & return_type_with_nulls) const
 {
    size_t num_args = arguments.size();
-    Columns columns(num_args);
+    ColumnsWithTypeAndName initial_columns(num_args);
+    ColumnsWithTypeAndName columns(num_args);

    auto type_array = checkAndGetDataType<DataTypeArray>(return_type.get());
    auto & type_nested = type_array->getNestedType();
@ -201,6 +233,8 @@ Columns FunctionArrayIntersect::castColumns(
    for (size_t i = 0; i < num_args; ++i)
    {
        const ColumnWithTypeAndName & arg = block.getByPosition(arguments[i]);
+        initial_columns[i] = arg;
+        columns[i] = arg;
        auto & column = columns[i];

        if (is_numeric_or_string)
@ -208,68 +242,120 @@ Columns FunctionArrayIntersect::castColumns(
            /// Cast to Array(T) or Array(Nullable(T)).
            if (nested_is_nullable)
            {
-                if (arg.type->equals(*return_type))
-                    column = arg.column;
-                else
-                    column = castColumn(arg, return_type, context);
+                if (!arg.type->equals(*return_type))
+                {
+                    column.column = castColumn(arg, return_type, context);
+                    column.type = return_type;
+                }
            }
            else
            {
-                /// If result has array type Array(T) still cast Array(Nullable(U)) to Array(Nullable(T))
-                ///  because cannot cast Nullable(T) to T.
-                if (arg.type->equals(*return_type) || arg.type->equals(*nullable_return_type))
-                    column = arg.column;
-                else if (static_cast<const DataTypeArray &>(*arg.type).getNestedType()->isNullable())
-                    column = castColumn(arg, nullable_return_type, context);
-                else
-                    column = castColumn(arg, return_type, context);
+
+                if (!arg.type->equals(*return_type) && !arg.type->equals(*nullable_return_type))
+                {
+                    /// If the result type is Array(T), still cast Array(Nullable(U)) to Array(Nullable(T)),
+                    ///  because Nullable(T) cannot be cast to T.
+                    if (static_cast<const DataTypeArray &>(*arg.type).getNestedType()->isNullable())
+                    {
+                        column.column = castColumn(arg, nullable_return_type, context);
+                        column.type = nullable_return_type;
+                    }
+                    else
+                    {
+                        column.column = castColumn(arg, return_type, context);
+                        column.type = return_type;
+                    }
+                }
            }
        }
        else
        {
            /// return_type_with_nulls is the most common subtype with possible nullable parts.
-            if (arg.type->equals(*return_type_with_nulls))
-                column = arg.column;
-            else
-                column = castColumn(arg, return_type_with_nulls, context);
+            if (!arg.type->equals(*return_type_with_nulls))
+            {
+                column.column = castColumn(arg, return_type_with_nulls, context);
+                column.type = return_type_with_nulls;
+            }
        }
    }

-    return columns;
+    return {.initial = initial_columns, .casted = columns};
 }

-FunctionArrayIntersect::UnpackedArrays FunctionArrayIntersect::prepareArrays(const Columns & columns) const
+static ColumnPtr callFunctionNotEquals(ColumnWithTypeAndName first, ColumnWithTypeAndName second, const Context & context)
+{
+    ColumnsWithTypeAndName args;
+    args.reserve(2);
+    args.emplace_back(std::move(first));
+    args.emplace_back(std::move(second));
+
+    auto eq_func = FunctionFactory::instance().get("notEquals", context)->build(args);
+
+    Block block = args;
+    block.insert({nullptr, eq_func->getReturnType(), ""});
+
+    eq_func->execute(block, {0, 1}, 2, args.front().column->size());
+
+    return block.getByPosition(2).column;
+}
+
+FunctionArrayIntersect::UnpackedArrays FunctionArrayIntersect::prepareArrays(
+    const ColumnsWithTypeAndName & columns, ColumnsWithTypeAndName & initial_columns) const
 {
    UnpackedArrays arrays;

    size_t columns_number = columns.size();
-    arrays.is_const.assign(columns_number, false);
-    arrays.null_maps.resize(columns_number);
-    arrays.offsets.resize(columns_number);
-    arrays.nested_columns.resize(columns_number);
+    arrays.args.resize(columns_number);

    bool all_const = true;

    for (auto i : ext::range(0, columns_number))
    {
-        auto argument_column = columns[i].get();
+        auto & arg = arrays.args[i];
+        auto argument_column = columns[i].column.get();
+        auto initial_column = initial_columns[i].column.get();
+
        if (auto argument_column_const = typeid_cast<const ColumnConst *>(argument_column))
        {
-            arrays.is_const[i] = true;
+            arg.is_const = true;
            argument_column = argument_column_const->getDataColumnPtr().get();
+            initial_column = typeid_cast<const ColumnConst *>(initial_column)->getDataColumnPtr().get();
        }

        if (auto argument_column_array = typeid_cast<const ColumnArray *>(argument_column))
        {
-            if (!arrays.is_const[i])
+            if (!arg.is_const)
                all_const = false;

-            arrays.offsets[i] = &argument_column_array->getOffsets();
-            arrays.nested_columns[i] = &argument_column_array->getData();
-            if (auto column_nullable = typeid_cast<const ColumnNullable *>(arrays.nested_columns[i]))
+            arg.offsets = &argument_column_array->getOffsets();
+            arg.nested_column = &argument_column_array->getData();
+
+            initial_column = &typeid_cast<const ColumnArray *>(initial_column)->getData();
+
+            if (auto column_nullable = typeid_cast<const ColumnNullable *>(arg.nested_column))
            {
-                arrays.null_maps[i] = &column_nullable->getNullMapData();
-                arrays.nested_columns[i] = &column_nullable->getNestedColumn();
+                arg.null_map = &column_nullable->getNullMapData();
+                arg.nested_column = &column_nullable->getNestedColumn();
+                initial_column = &typeid_cast<const ColumnNullable *>(initial_column)->getNestedColumn();
+            }
+
+            /// If the column was cast, an overflow mask has to be created for integer types.
+            if (arg.nested_column != initial_column)
+            {
+                auto & nested_init_type = typeid_cast<const DataTypeArray *>(removeNullable(initial_columns[i].type).get())->getNestedType();
+                auto & nested_cast_type = typeid_cast<const DataTypeArray *>(removeNullable(columns[i].type).get())->getNestedType();
+
+                if (isInteger(nested_init_type) || isDateOrDateTime(nested_init_type))
+                {
+                    /// Compare the original and cast columns. This seems to be the easiest way.
+                    auto overflow_mask = callFunctionNotEquals(
+                            {arg.nested_column->getPtr(), nested_init_type, ""},
+                            {initial_column->getPtr(), nested_cast_type, ""},
+                            context);
+
+                    arg.overflow_mask = &typeid_cast<const ColumnUInt8 *>(overflow_mask.get())->getData();
+                    arrays.column_holders.emplace_back(std::move(overflow_mask));
+                }
            }
        }
        else
@ -278,16 +364,16 @@ FunctionArrayIntersect::UnpackedArrays FunctionArrayIntersect::prepareArrays(con

    if (all_const)
    {
-        arrays.base_rows = arrays.offsets.front()->size();
+        arrays.base_rows = arrays.args.front().offsets->size();
    }
    else
    {
        for (auto i : ext::range(0, columns_number))
        {
-            if (arrays.is_const[i])
+            if (arrays.args[i].is_const)
                continue;

-            size_t rows = arrays.offsets[i]->size();
+            size_t rows = arrays.args[i].offsets->size();
            if (arrays.base_rows == 0 && rows > 0)
                arrays.base_rows = rows;
            else if (arrays.base_rows != rows)
@ -322,13 +408,14 @@ void FunctionArrayIntersect::executeImpl(Block & block, const ColumnNumbers & ar

    auto return_type_with_nulls = getMostSubtype(data_types, true, true);

-    Columns columns = castColumns(block, arguments, return_type, return_type_with_nulls);
+    auto columns = castColumns(block, arguments, return_type, return_type_with_nulls);

-    UnpackedArrays arrays = prepareArrays(columns);
+    UnpackedArrays arrays = prepareArrays(columns.casted, columns.initial);

    ColumnPtr result_column;
    auto not_nullable_nested_return_type = removeNullable(nested_return_type);
-    TypeListNumbers::forEach(NumberExecutor(arrays, not_nullable_nested_return_type, result_column));
+    TypeListNativeNumbers::forEach(NumberExecutor(arrays, not_nullable_nested_return_type, result_column));
+    TypeListDecimalNumbers::forEach(DecimalExecutor(arrays, not_nullable_nested_return_type, result_column));

    using DateMap = ClearableHashMap<DataTypeDate::FieldType, size_t, DefaultHash<DataTypeDate::FieldType>,
            HashTableGrower<INITIAL_SIZE_DEGREE>,
@ -356,7 +443,7 @@ void FunctionArrayIntersect::executeImpl(Block & block, const ColumnNumbers & ar
            result_column = execute<StringMap, ColumnFixedString, false>(arrays, std::move(column));
        else
        {
-            column = static_cast<const DataTypeArray &>(*return_type_with_nulls).getNestedType()->createColumn();
+            column = assert_cast<const DataTypeArray &>(*return_type_with_nulls).getNestedType()->createColumn();
            result_column = castRemoveNullable(execute<StringMap, IColumn, false>(arrays, std::move(column)), return_type);
        }
    }
@ -374,27 +461,38 @@ void FunctionArrayIntersect::NumberExecutor::operator()()
        result = execute<Map, ColumnVector<T>, true>(arrays, ColumnVector<T>::create());
 }

+template <typename T, size_t>
+void FunctionArrayIntersect::DecimalExecutor::operator()()
+{
+    using Map = ClearableHashMap<T, size_t, DefaultHash<T>, HashTableGrower<INITIAL_SIZE_DEGREE>,
+            HashTableAllocatorWithStackMemory<(1ULL << INITIAL_SIZE_DEGREE) * sizeof(T)>>;
+
+    if (!result)
+        if (auto * decimal = typeid_cast<const DataTypeDecimal<T> *>(data_type.get()))
+            result = execute<Map, ColumnDecimal<T>, true>(arrays, ColumnDecimal<T>::create(0, decimal->getScale()));
+}
+
 template <typename Map, typename ColumnType, bool is_numeric_column>
 ColumnPtr FunctionArrayIntersect::execute(const UnpackedArrays & arrays, MutableColumnPtr result_data_ptr)
 {
-    auto args = arrays.nested_columns.size();
+    auto args = arrays.args.size();
    auto rows = arrays.base_rows;

    bool all_nullable = true;

    std::vector<const ColumnType *> columns;
    columns.reserve(args);
-    for (auto arg : ext::range(0, args))
+    for (auto & arg : arrays.args)
    {
        if constexpr (std::is_same<ColumnType, IColumn>::value)
-            columns.push_back(arrays.nested_columns[arg]);
+            columns.push_back(arg.nested_column);
        else
-            columns.push_back(checkAndGetColumn<ColumnType>(arrays.nested_columns[arg]));
+            columns.push_back(checkAndGetColumn<ColumnType>(arg.nested_column));

        if (!columns.back())
            throw Exception("Unexpected array type for function arrayIntersect", ErrorCodes::LOGICAL_ERROR);

-        if (!arrays.null_maps[arg])
+        if (!arg.null_map)
            all_nullable = false;
    }

@ -415,44 +513,45 @@ ColumnPtr FunctionArrayIntersect::execute(const UnpackedArrays & arrays, Mutable

        bool all_has_nullable = all_nullable;

-        for (auto arg : ext::range(0, args))
+        for (auto arg_num : ext::range(0, args))
        {
+            auto & arg = arrays.args[arg_num];
            bool current_has_nullable = false;

            size_t off;
            // const array has only one row
-            bool const_arg = arrays.is_const[arg];
-            if (const_arg)
-                off = (*arrays.offsets[arg])[0];
+            if (arg.is_const)
+                off = (*arg.offsets)[0];
            else
-                off = (*arrays.offsets[arg])[row];
+                off = (*arg.offsets)[row];

-            for (auto i : ext::range(prev_off[arg], off))
+            for (auto i : ext::range(prev_off[arg_num], off))
            {
-                if (arrays.null_maps[arg] && (*arrays.null_maps[arg])[i])
+                if (arg.null_map && (*arg.null_map)[i])
                    current_has_nullable = true;
-                else
+                else if (!arg.overflow_mask || (*arg.overflow_mask)[i] == 0)
                {
                    typename Map::mapped_type * value = nullptr;

                    if constexpr (is_numeric_column)
-                        value = &map[columns[arg]->getElement(i)];
+                        value = &map[columns[arg_num]->getElement(i)];
                    else if constexpr (std::is_same<ColumnType, ColumnString>::value || std::is_same<ColumnType, ColumnFixedString>::value)
-                        value = &map[columns[arg]->getDataAt(i)];
+                        value = &map[columns[arg_num]->getDataAt(i)];
                    else
                    {
                        const char * data = nullptr;
-                        value = &map[columns[arg]->serializeValueIntoArena(i, arena, data)];
+                        value = &map[columns[arg_num]->serializeValueIntoArena(i, arena, data)];
                    }

-                    if (*value == arg)
+                    /// Here we count the number of element appearances, but no more than once per array.
+                    if (*value == arg_num)
                        ++(*value);
                }
            }

-            prev_off[arg] = off;
-            if (const_arg)
-                prev_off[arg] = 0;
+            prev_off[arg_num] = off;
+            if (arg.is_const)
+                prev_off[arg_num] = 0;

            if (!current_has_nullable)
                all_has_nullable = false;
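Note on the counting loop above: a value's counter is incremented only when it equals the index of the array being scanned, so each array contributes at most once and a value must have appeared in every earlier array. A minimal standalone sketch of that idea follows, assuming plain `std::vector<int>` inputs and `std::unordered_map` in place of `ClearableHashMap`. Null maps, overflow masks and constant columns are left out, and `intersect` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Sketch only: returns the values present in every input array, sorted for determinism.
std::vector<int> intersect(const std::vector<std::vector<int>> & arrays)
{
    std::unordered_map<int, size_t> seen;

    for (size_t arg_num = 0; arg_num < arrays.size(); ++arg_num)
    {
        for (int x : arrays[arg_num])
        {
            size_t & count = seen[x];
            // Count each value at most once per array, and only if all previous arrays had it.
            if (count == arg_num)
                ++count;
        }
    }

    std::vector<int> result;
    for (const auto & [value, count] : seen)
        if (count == arrays.size())
            result.push_back(value);

    std::sort(result.begin(), result.end());
    return result;
}
```

For example, `intersect({{1, 2, 2, 3}, {2, 3, 4}, {3, 2, 2}})` yields `{2, 3}`: the duplicate `2` in the first array is counted once, and `4` never advances because it is missing from the first array.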
--- a/dbms/src/Functions/array/arraySplit.cpp
+++ b/dbms/src/Functions/array/arraySplit.cpp
@ -37,20 +37,24 @@ struct ArraySplitImpl

            size_t pos = 0;

-            out_offsets_2.reserve(in_offsets.size()); // the actual size would be equal or larger
+            out_offsets_2.reserve(in_offsets.size()); // assume the actual size to be equal or larger
            out_offsets_1.reserve(in_offsets.size());

            for (size_t i = 0; i < in_offsets.size(); ++i)
            {
-                pos += !reverse;
-                for (; pos < in_offsets[i] - reverse; ++pos)
+                if (pos < in_offsets[i])
                {
-                    if (cut[pos])
-                        out_offsets_2.push_back(pos + reverse);
-                }
-                pos += reverse;
+                    pos += !reverse;
+                    for (; pos < in_offsets[i] - reverse; ++pos)
+                    {
+                        if (cut[pos])
+                            out_offsets_2.push_back(pos + reverse);
+                    }
+                    pos += reverse;
+
+                    out_offsets_2.push_back(pos);
+                }

-                out_offsets_2.push_back(pos);
                out_offsets_1.push_back(out_offsets_2.size());
            }
        }
@ -73,13 +77,21 @@ struct ArraySplitImpl
            }
            else
            {
+                size_t pos = 0;
+
                out_offsets_2.reserve(in_offsets.size());
                out_offsets_1.reserve(in_offsets.size());

                for (size_t i = 0; i < in_offsets.size(); ++i)
                {
-                    out_offsets_2.push_back(in_offsets[i]);
-                    out_offsets_1.push_back(i + 1);
+                    if (pos < in_offsets[i])
+                    {
+                        pos = in_offsets[i];
+
+                        out_offsets_2.push_back(pos);
+                    }
+
+                    out_offsets_1.push_back(out_offsets_2.size());
                }
            }
        }
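Note on the arraySplit hunks above: the added `if (pos < in_offsets[i])` guard keeps an empty input array from emitting an inner offset, so it maps to zero sub-arrays. A minimal standalone sketch of the cut-driven loop follows, assuming plain vectors instead of ClickHouse column containers. `splitOffsets` and `SplitOffsets` are hypothetical names used only here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SplitOffsets
{
    std::vector<size_t> out_offsets_1; // per input row: cumulative number of sub-arrays
    std::vector<size_t> out_offsets_2; // per sub-array: end position in the flat data
};

// Sketch of the patched loop; `reverse` selects arrayReverseSplit-style cutting (cut after the marked element).
SplitOffsets splitOffsets(const std::vector<size_t> & in_offsets, const std::vector<uint8_t> & cut, bool reverse)
{
    SplitOffsets out;
    size_t pos = 0;

    for (size_t i = 0; i < in_offsets.size(); ++i)
    {
        // Empty rows are skipped entirely, so they produce no sub-array.
        if (pos < in_offsets[i])
        {
            pos += reverse ? 0 : 1;
            for (; pos < in_offsets[i] - (reverse ? 1 : 0); ++pos)
                if (cut[pos])
                    out.out_offsets_2.push_back(pos + (reverse ? 1 : 0));
            pos += reverse ? 1 : 0;

            out.out_offsets_2.push_back(pos);
        }

        out.out_offsets_1.push_back(out.out_offsets_2.size());
    }

    return out;
}
```

With `in_offsets = {3, 3, 5}` (the second row is empty) and `cut = {0, 1, 0, 0, 1}`, the forward variant gives inner offsets `{1, 3, 4, 5}` and outer offsets `{2, 2, 4}`, so the empty row contributes no sub-arrays.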
--- a/dbms/src/Functions/array/arraySum.cpp
+++ b/dbms/src/Functions/array/arraySum.cpp
@ -1,5 +1,7 @@
 #include <DataTypes/DataTypesNumber.h>
+#include <DataTypes/DataTypesDecimal.h>
 #include <Columns/ColumnsNumber.h>
+#include <Columns/ColumnDecimal.h>
 #include "FunctionArrayMapped.h"
 #include <Functions/FunctionFactory.h>

@ -31,25 +33,43 @@ struct ArraySumImpl
        if (which.isFloat())
            return std::make_shared<DataTypeFloat64>();

+        if (which.isDecimal())
+        {
+            UInt32 scale = getDecimalScale(*expression_return);
+            return std::make_shared<DataTypeDecimal<Decimal128>>(maxDecimalPrecision<Decimal128>(), scale);
+        }
+
        throw Exception("arraySum cannot add values of type " + expression_return->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
    }

    template <typename Element, typename Result>
    static bool executeType(const ColumnPtr & mapped, const ColumnArray::Offsets & offsets, ColumnPtr & res_ptr)
    {
-        const ColumnVector<Element> * column = checkAndGetColumn<ColumnVector<Element>>(&*mapped);
+        using ColVecType = std::conditional_t<IsDecimalNumber<Element>, ColumnDecimal<Element>, ColumnVector<Element>>;
+        using ColVecResult = std::conditional_t<IsDecimalNumber<Result>, ColumnDecimal<Result>, ColumnVector<Result>>;
+
+        const ColVecType * column = checkAndGetColumn<ColVecType>(&*mapped);

        if (!column)
        {
-            const ColumnConst * column_const = checkAndGetColumnConst<ColumnVector<Element>>(&*mapped);
+            const ColumnConst * column_const = checkAndGetColumnConst<ColVecType>(&*mapped);

            if (!column_const)
                return false;

            const Element x = column_const->template getValue<Element>();

-            auto res_column = ColumnVector<Result>::create(offsets.size());
-            typename ColumnVector<Result>::Container & res = res_column->getData();
+            typename ColVecResult::MutablePtr res_column;
+            if constexpr (IsDecimalNumber<Element>)
+            {
+                const typename ColVecType::Container & data =
+                    checkAndGetColumn<ColVecType>(&column_const->getDataColumn())->getData();
+                res_column = ColVecResult::create(offsets.size(), data.getScale());
+            }
+            else
+                res_column = ColVecResult::create(offsets.size());
+
+            typename ColVecResult::Container & res = res_column->getData();

            size_t pos = 0;
            for (size_t i = 0; i < offsets.size(); ++i)
@ -62,9 +82,15 @@ struct ArraySumImpl
            return true;
        }

-        const typename ColumnVector<Element>::Container & data = column->getData();
-        auto res_column = ColumnVector<Result>::create(offsets.size());
-        typename ColumnVector<Result>::Container & res = res_column->getData();
+        const typename ColVecType::Container & data = column->getData();
+
+        typename ColVecResult::MutablePtr res_column;
+        if constexpr (IsDecimalNumber<Element>)
+            res_column = ColVecResult::create(offsets.size(), data.getScale());
+        else
+            res_column = ColVecResult::create(offsets.size());
+
+        typename ColVecResult::Container & res = res_column->getData();

        size_t pos = 0;
        for (size_t i = 0; i < offsets.size(); ++i)
@ -95,7 +121,10 @@ struct ArraySumImpl
            executeType<  Int32,  Int64>(mapped, offsets, res) ||
            executeType<  Int64,  Int64>(mapped, offsets, res) ||
            executeType<Float32,Float64>(mapped, offsets, res) ||
-            executeType<Float64,Float64>(mapped, offsets, res))
+            executeType<Float64,Float64>(mapped, offsets, res) ||
+            executeType<Decimal32, Decimal128>(mapped, offsets, res) ||
+            executeType<Decimal64, Decimal128>(mapped, offsets, res) ||
+            executeType<Decimal128, Decimal128>(mapped, offsets, res))
            return res;
        else
            throw Exception("Unexpected column for arraySum: " + mapped->getName(), ErrorCodes::ILLEGAL_COLUMN);
--- a/dbms/src/Functions/formatDateTime.cpp
+++ b/dbms/src/Functions/formatDateTime.cpp
@ -91,19 +91,7 @@ private:
        template <typename T>
        static inline void writeNumber2(char * p, T v)
        {
-            static const char digits[201] =
-                "00010203040506070809"
-                "10111213141516171819"
-                "20212223242526272829"
-                "30313233343536373839"
-                "40414243444546474849"
-                "50515253545556575859"
-                "60616263646566676869"
-                "70717273747576777879"
-                "80818283848586878889"
-                "90919293949596979899";
-
-            memcpy(p, &digits[v * 2], 2);
+            memcpy(p, &digits100[v * 2], 2);
        }

        template <typename T>
--- a/dbms/src/Functions/greatCircleDistance.cpp
+++ b/dbms/src/Functions/greatCircleDistance.cpp
@ -7,12 +7,9 @@
 #include <Functions/FunctionHelpers.h>
 #include <Functions/FunctionFactory.h>
 #include <ext/range.h>
-#include <math.h>
+#include <cmath>
 #include <array>

-#define DEGREES_IN_RADIANS (M_PI / 180.0)
-#define EARTH_RADIUS_IN_METERS 6372797.560856
-

 namespace DB
 {
@ -24,142 +21,196 @@ namespace ErrorCodes
    extern const int LOGICAL_ERROR;
 }

-static inline Float64 degToRad(Float64 angle) { return angle * DEGREES_IN_RADIANS; }
-
-/**
- *  The function calculates distance in meters between two points on Earth specified by longitude and latitude in degrees.
- *  The function uses great circle distance formula https://en.wikipedia.org/wiki/Great-circle_distance.
- *  Throws exception when one or several input values are not within reasonable bounds.
- *  Latitude must be in [-90, 90], longitude must be [-180, 180]
+/** https://en.wikipedia.org/wiki/Great-circle_distance
 *
+ *  The function calculates distance in meters between two points on Earth specified by longitude and latitude in degrees.
+ *  The function uses great circle distance formula https://en.wikipedia.org/wiki/Great-circle_distance .
+ *  Throws exception when one or several input values are not within reasonable bounds.
+ *  Latitude must be in [-90, 90], longitude must be [-180, 180].
+ *  The original code of this implementation is here: https://github.com/sphinxsearch/sphinx/blob/409f2c2b5b2ff70b04e38f92b6b1a890326bad65/src/sphinxexpr.cpp#L3825
+ *  Andrey Aksenov, the author of the original code, permitted its use in ClickHouse under the Apache 2.0 license.
+ *  A presentation about this code from Highload++ Siberia 2019 is here: https://github.com/ClickHouse/ClickHouse/files/3324740/1_._._GEODIST_._.pdf
+ *  The main idea of this implementation is optimisation based on Taylor series, trigonometric identities, and lookup tables for cosine and arcsine(sqrt) that are calculated once.
 */
+
+namespace
+{
+
+constexpr double PI = 3.14159265358979323846;
+constexpr float TO_RADF = static_cast<float>(PI / 180.0);
+constexpr float TO_RADF2 = static_cast<float>(PI / 360.0);
+
+constexpr size_t GEODIST_TABLE_COS = 1024; // maxerr 0.00063%
+constexpr size_t GEODIST_TABLE_ASIN = 512;
+constexpr size_t GEODIST_TABLE_K = 1024;
+
+float g_GeoCos[GEODIST_TABLE_COS + 1];        /// cos(x) table
+float g_GeoAsin[GEODIST_TABLE_ASIN + 1];    /// asin(sqrt(x)) table
+float g_GeoFlatK[GEODIST_TABLE_K + 1][2];    /// geodistAdaptive() flat ellipsoid method k1, k2 coeffs table
+
+inline double sqr(double v)
+{
+    return v * v;
+}
+
+inline float fsqr(float v)
+{
+    return v * v;
+}
+
+void geodistInit()
+{
+    for (size_t i = 0; i <= GEODIST_TABLE_COS; ++i)
+        g_GeoCos[i] = static_cast<float>(cos(2 * PI * i / GEODIST_TABLE_COS)); // [0, 2 * pi] -> [0, COSTABLE]
+
+    for (size_t i = 0; i <= GEODIST_TABLE_ASIN; ++i)
+        g_GeoAsin[i] = static_cast<float>(asin(
+                sqrt(static_cast<double>(i) / GEODIST_TABLE_ASIN))); // [0, 1] -> [0, ASINTABLE]
+
+    for (size_t i = 0; i <= GEODIST_TABLE_K; ++i)
+    {
+        double x = PI * i / GEODIST_TABLE_K - PI * 0.5; // [-pi / 2, pi / 2] -> [0, KTABLE]
+        g_GeoFlatK[i][0] = static_cast<float>(sqr(111132.09 - 566.05 * cos(2 * x) + 1.20 * cos(4 * x)));
+        g_GeoFlatK[i][1] = static_cast<float>(sqr(111415.13 * cos(x) - 94.55 * cos(3 * x) + 0.12 * cos(5 * x)));
+    }
+}
+
+inline float geodistDegDiff(float f)
+{
+    f = static_cast<float>(fabs(f));
+    while (f > 360)
+        f -= 360;
+    if (f > 180)
+        f = 360 - f;
+    return f;
+}
+
+inline float geodistFastCos(float x)
+{
+    float y = static_cast<float>(fabs(x) * GEODIST_TABLE_COS / PI / 2);
+    int i = static_cast<int>(y);
+    y -= i;
+    i &= (GEODIST_TABLE_COS - 1);
+    return g_GeoCos[i] + (g_GeoCos[i + 1] - g_GeoCos[i]) * y;
+}
+
+inline float geodistFastSin(float x)
+{
+    float y = static_cast<float>(fabs(x) * GEODIST_TABLE_COS / PI / 2);
+    int i = static_cast<int>(y);
+    y -= i;
+    i = (i - GEODIST_TABLE_COS / 4) & (GEODIST_TABLE_COS - 1); // cos(x - pi / 2) = sin(x), costable / 4 = pi / 2
+    return g_GeoCos[i] + (g_GeoCos[i + 1] - g_GeoCos[i]) * y;
+}
+
+/// fast implementation of asin(sqrt(x))
+/// max error in floats 0.00369%, in doubles 0.00072%
+inline float geodistFastAsinSqrt(float x)
+{
+    if (x < 0.122)
+    {
+        // distance under 4546km, Taylor error under 0.00072%
+        float y = static_cast<float>(sqrt(x));
+        return y + x * y * 0.166666666666666f + x * x * y * 0.075f + x * x * x * y * 0.044642857142857f;
+    }
+    if (x < 0.948)
+    {
+        // distance under 17083km, 512-entry LUT error under 0.00072%
+        x *= GEODIST_TABLE_ASIN;
+        int i = static_cast<int>(x);
+        return g_GeoAsin[i] + (g_GeoAsin[i + 1] - g_GeoAsin[i]) * (x - i);
+    }
+    return static_cast<float>(asin(sqrt(x))); // distance over 17083km, just compute honestly
+}
+
+}
+
+
 class FunctionGreatCircleDistance : public IFunction
 {
 public:
-
    static constexpr auto name = "greatCircleDistance";
    static FunctionPtr create(const Context &) { return std::make_shared<FunctionGreatCircleDistance>(); }

 private:
-
-    enum class instr_type : uint8_t
-    {
-        get_float_64,
-        get_const_float_64
-    };
-
-    using instr_t = std::pair<instr_type, const IColumn *>;
-    using instrs_t = std::array<instr_t, 4>;
-
    String getName() const override { return name; }
-
    size_t getNumberOfArguments() const override { return 4; }

+    bool useDefaultImplementationForConstants() const override { return true; }
+
    DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
    {
        for (const auto arg_idx : ext::range(0, arguments.size()))
        {
            const auto arg = arguments[arg_idx].get();
-            if (!WhichDataType(arg).isFloat64())
+            if (!WhichDataType(arg).isFloat())
                throw Exception(
                    "Illegal type " + arg->getName() + " of argument " + std::to_string(arg_idx + 1) + " of function " + getName() + ". Must be Float64",
                    ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
        }

-        return std::make_shared<DataTypeFloat64>();
+        return std::make_shared<DataTypeFloat32>();
    }

-    instrs_t getInstructions(const Block & block, const ColumnNumbers & arguments, bool & out_const)
+    Float32 greatCircleDistance(Float32 lon1deg, Float32 lat1deg, Float32 lon2deg, Float32 lat2deg)
    {
-        instrs_t result;
-        out_const = true;
-
-        for (const auto arg_idx : ext::range(0, arguments.size()))
+        if (lon1deg < -180 || lon1deg > 180 ||
+            lon2deg < -180 || lon2deg > 180 ||
+            lat1deg < -90 || lat1deg > 90 ||
+            lat2deg < -90 || lat2deg > 90)
        {
-            const auto column = block.getByPosition(arguments[arg_idx]).column.get();
-
-            if (const auto col = checkAndGetColumn<ColumnVector<Float64>>(column))
-            {
-                out_const = false;
-                result[arg_idx] = instr_t{instr_type::get_float_64, col};
-            }
-            else if (const auto col_const = checkAndGetColumnConst<ColumnVector<Float64>>(column))
-            {
-                result[arg_idx] = instr_t{instr_type::get_const_float_64, col_const};
-            }
-            else
-                throw Exception("Illegal column " + column->getName() + " of argument of function " + getName(),
-                    ErrorCodes::ILLEGAL_COLUMN);
+            throw Exception("Arguments values out of bounds for function " + getName(),
+                            ErrorCodes::ARGUMENT_OUT_OF_BOUND);
        }

-        return result;
-    }
+        float lat_diff = geodistDegDiff(lat1deg - lat2deg);
+        float lon_diff = geodistDegDiff(lon1deg - lon2deg);

-    /// https://en.wikipedia.org/wiki/Great-circle_distance
-    Float64 greatCircleDistance(Float64 lon1Deg, Float64 lat1Deg, Float64 lon2Deg, Float64 lat2Deg)
-    {
-        if (lon1Deg < -180 || lon1Deg > 180 ||
-            lon2Deg < -180 || lon2Deg > 180 ||
-            lat1Deg < -90 || lat1Deg > 90 ||
-            lat2Deg < -90 || lat2Deg > 90)
+        if (lon_diff < 13)
        {
-            throw Exception("Arguments values out of bounds for function " + getName(), ErrorCodes::ARGUMENT_OUT_OF_BOUND);
-        }
-
-        Float64 lon1Rad = degToRad(lon1Deg);
-        Float64 lat1Rad = degToRad(lat1Deg);
-        Float64 lon2Rad = degToRad(lon2Deg);
-        Float64 lat2Rad = degToRad(lat2Deg);
-        Float64 u = sin((lat2Rad - lat1Rad) / 2);
-        Float64 v = sin((lon2Rad - lon1Rad) / 2);
-        return 2.0 * EARTH_RADIUS_IN_METERS * asin(sqrt(u * u + cos(lat1Rad) * cos(lat2Rad) * v * v));
-    }
-
-
-    void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override
-    {
-        const auto size = input_rows_count;
-
-        bool result_is_const{};
-        auto instrs = getInstructions(block, arguments, result_is_const);
-
-        if (result_is_const)
-        {
-            const auto & colLon1 = assert_cast<const ColumnConst *>(block.getByPosition(arguments[0]).column.get())->getValue<Float64>();
-            const auto & colLat1 = assert_cast<const ColumnConst *>(block.getByPosition(arguments[1]).column.get())->getValue<Float64>();
-            const auto & colLon2 = assert_cast<const ColumnConst *>(block.getByPosition(arguments[2]).column.get())->getValue<Float64>();
-            const auto & colLat2 = assert_cast<const ColumnConst *>(block.getByPosition(arguments[3]).column.get())->getValue<Float64>();
-
-            Float64 res = greatCircleDistance(colLon1, colLat1, colLon2, colLat2);
-            block.getByPosition(result).column = block.getByPosition(result).type->createColumnConst(size, res);
+            // points are close enough; use flat ellipsoid model
+            // interpolate sqr(k1), sqr(k2) coefficients using latitudes midpoint
+            float m = (lat1deg + lat2deg + 180) * GEODIST_TABLE_K / 360; // [-90, 90] degrees -> [0, KTABLE] indexes
+            size_t i = static_cast<size_t>(m) & (GEODIST_TABLE_K - 1);
+            float kk1 = g_GeoFlatK[i][0] + (g_GeoFlatK[i + 1][0] - g_GeoFlatK[i][0]) * (m - i);
+            float kk2 = g_GeoFlatK[i][1] + (g_GeoFlatK[i + 1][1] - g_GeoFlatK[i][1]) * (m - i);
+            return static_cast<float>(sqrt(kk1 * lat_diff * lat_diff + kk2 * lon_diff * lon_diff));
        }
        else
        {
-            auto dst = ColumnVector<Float64>::create();
-            auto & dst_data = dst->getData();
-            dst_data.resize(size);
-            Float64 vals[instrs.size()];
-            for (const auto row : ext::range(0, size))
-            {
-                for (const auto idx : ext::range(0, instrs.size()))
-                {
-                    if (instr_type::get_float_64 == instrs[idx].first)
-                        vals[idx] = assert_cast<const ColumnVector<Float64> *>(instrs[idx].second)->getData()[row];
-                    else if (instr_type::get_const_float_64 == instrs[idx].first)
-                        vals[idx] = assert_cast<const ColumnConst *>(instrs[idx].second)->getValue<Float64>();
-                    else
-                        throw Exception{"Unknown instruction type in implementation of greatCircleDistance function", ErrorCodes::LOGICAL_ERROR};
-                }
-                dst_data[row] = greatCircleDistance(vals[0], vals[1], vals[2], vals[3]);
-            }
-            block.getByPosition(result).column = std::move(dst);
+            // points too far away; use haversine
+            static const float d = 2 * 6371000;
+            float a = fsqr(geodistFastSin(lat_diff * TO_RADF2)) +
+                geodistFastCos(lat1deg * TO_RADF) * geodistFastCos(lat2deg * TO_RADF) *
+                fsqr(geodistFastSin(lon_diff * TO_RADF2));
+            return static_cast<float>(d * geodistFastAsinSqrt(a));
        }
    }
+
+    void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override
+    {
+        auto dst = ColumnVector<Float32>::create();
+        auto & dst_data = dst->getData();
+        dst_data.resize(input_rows_count);
+
+        const IColumn & col_lon1 = *block.getByPosition(arguments[0]).column;
+        const IColumn & col_lat1 = *block.getByPosition(arguments[1]).column;
+        const IColumn & col_lon2 = *block.getByPosition(arguments[2]).column;
+        const IColumn & col_lat2 = *block.getByPosition(arguments[3]).column;
+
+        for (size_t row_num = 0; row_num < input_rows_count; ++row_num)
+            dst_data[row_num] = greatCircleDistance(
+                col_lon1.getFloat32(row_num), col_lat1.getFloat32(row_num),
+                col_lon2.getFloat32(row_num), col_lat2.getFloat32(row_num));
+
+        block.getByPosition(result).column = std::move(dst);
+    }
 };


 void registerFunctionGreatCircleDistance(FunctionFactory & factory)
 {
+    geodistInit();
    factory.registerFunction<FunctionGreatCircleDistance>();
 }
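
The greatCircleDistance.cpp change above replaces exact trigonometry with two approximations: a cosine lookup table read with linear interpolation, and a four-term Taylor expansion of asin(sqrt(x)) for small x. The following is a minimal standalone sketch of both ideas, not the code from the commit. The names `sketch::cosTable`, `sketch::fastCos` and `sketch::smallAsinSqrt` are hypothetical, and the table is held in a lazily initialised `std::vector` instead of the global arrays filled by `geodistInit()`, which is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch
{

constexpr double PI = 3.14159265358979323846;
constexpr std::size_t COS_TABLE_SIZE = 1024; // power of two, so "& (size - 1)" wraps the index

// Build the table once: entry i holds cos(2 * pi * i / COS_TABLE_SIZE).
// The extra last entry lets interpolation read table[i + 1] without a bounds check.
inline const std::vector<float> & cosTable()
{
    static const std::vector<float> table = []
    {
        std::vector<float> t(COS_TABLE_SIZE + 1);
        for (std::size_t i = 0; i <= COS_TABLE_SIZE; ++i)
            t[i] = static_cast<float>(std::cos(2 * PI * i / COS_TABLE_SIZE));
        return t;
    }();
    return table;
}

// Approximate cos(x), x in radians, by linear interpolation between neighbouring entries.
// cos is even, so |x| is used; the index wraps around every full period.
inline float fastCos(float x)
{
    const auto & table = cosTable();
    float pos = static_cast<float>(std::fabs(x) * COS_TABLE_SIZE / (2 * PI));
    std::size_t i = static_cast<std::size_t>(pos);
    float frac = pos - static_cast<float>(i);
    i &= COS_TABLE_SIZE - 1;
    return table[i] + (table[i + 1] - table[i]) * frac;
}

// Approximate asin(sqrt(x)) for small x with the first four Taylor terms of asin(y), y = sqrt(x):
// asin(y) ~ y + y^3 / 6 + 3 * y^5 / 40 + 15 * y^7 / 336.
inline float smallAsinSqrt(float x)
{
    assert(x >= 0.0f && x < 0.122f); // same cut-off as the small-x branch in the diff
    float y = std::sqrt(x);
    return y + x * y * (1.0f / 6) + x * x * y * (3.0f / 40) + x * x * x * y * (15.0f / 336);
}

}
```

Calling `sketch::fastCos(1.0f)` and comparing it with `std::cos(1.0f)` gives a quick feel for the interpolation error of a 1024-entry table. A larger table or a higher-order interpolation would trade memory or arithmetic for accuracy.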

--- a/dbms/src/Functions/if.cpp
+++ b/dbms/src/Functions/if.cpp
@ -175,9 +175,7 @@ public:

 private:
    template <typename T0, typename T1>
-    static constexpr bool allow_arrays =
-        !IsDecimalNumber<T0> && !IsDecimalNumber<T1> &&
-        !std::is_same_v<T0, UInt128> && !std::is_same_v<T1, UInt128>;
+    static constexpr bool allow_arrays = !std::is_same_v<T0, UInt128> && !std::is_same_v<T1, UInt128>;

    template <typename T0, typename T1>
    static UInt32 decimalScale(Block & block [[maybe_unused]], const ColumnNumbers & arguments [[maybe_unused]])