Merge branch 'master' of github.com:ClickHouse/ClickHouse into pipe_reading

2024-11-21 07:01:59 +00:00 · 2021-07-09 12:36:00 +03:00 · 2021-07-09 12:36:00 +03:00 · 21f1e0e626
commit 21f1e0e626
parent 407652457f 581e79ced9
139 changed files with 3902 additions and 1031 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,159 @@
+### ClickHouse release v21.7, 2021-07-09
+
+#### Backward Incompatible Change
+
+* Improved performance of queries with explicitly defined large sets. Added compatibility setting `legacy_column_name_of_tuple_literal`. It makes sense to set it to `true` while doing a rolling update of a cluster from a version lower than 21.7 to any higher version. Otherwise distributed queries with explicitly defined sets in the `IN` clause may fail during the update. [#25371](https://github.com/ClickHouse/ClickHouse/pull/25371) ([Anton Popov](https://github.com/CurtizJ)).
+* Forward/backward incompatible change of maximum buffer size in clickhouse-keeper (an experimental alternative to ZooKeeper). Better to do it now (before production), than later. [#25421](https://github.com/ClickHouse/ClickHouse/pull/25421) ([alesapin](https://github.com/alesapin)).
+
+#### New Feature
+
+* Support configuration in YAML format as alternative to XML. This closes [#3607](https://github.com/ClickHouse/ClickHouse/issues/3607). [#21858](https://github.com/ClickHouse/ClickHouse/pull/21858) ([BoloniniD](https://github.com/BoloniniD)).
+* Provide a way to restore a replicated table when the data is (possibly) present, but the ZooKeeper metadata is lost. Resolves [#13458](https://github.com/ClickHouse/ClickHouse/issues/13458). [#13652](https://github.com/ClickHouse/ClickHouse/pull/13652) ([Mike Kot](https://github.com/myrrc)).
+* Support structs and maps in Arrow/Parquet/ORC and dictionaries in Arrow input/output formats. Add new setting `output_format_arrow_low_cardinality_as_dictionary`. [#24341](https://github.com/ClickHouse/ClickHouse/pull/24341) ([Kruglov Pavel](https://github.com/Avogar)).
+* Added support for `Array` type in dictionaries. [#25119](https://github.com/ClickHouse/ClickHouse/pull/25119) ([Maksim Kita](https://github.com/kitaisreal)).
+* Added function `bitPositionsToArray`. Closes [#23792](https://github.com/ClickHouse/ClickHouse/issues/23792). Author [Kevin Wan] (@MaxWk). [#25394](https://github.com/ClickHouse/ClickHouse/pull/25394) ([Maksim Kita](https://github.com/kitaisreal)).
+* Added function `dateName` to return names like 'Friday' or 'April'. Author [Daniil Kondratyev] (@dankondr). [#25372](https://github.com/ClickHouse/ClickHouse/pull/25372) ([Maksim Kita](https://github.com/kitaisreal)).
+* Add `toJSONString` function to serialize columns to their JSON representations. [#25164](https://github.com/ClickHouse/ClickHouse/pull/25164) ([Amos Bird](https://github.com/amosbird)).
+* Now `query_log` has two new columns: `initial_query_start_time`, `initial_query_start_time_microsecond` that record the starting time of a distributed query if any. [#25022](https://github.com/ClickHouse/ClickHouse/pull/25022) ([Amos Bird](https://github.com/amosbird)).
+* Add aggregate function `segmentLengthSum`. [#24250](https://github.com/ClickHouse/ClickHouse/pull/24250) ([flynn](https://github.com/ucasfl)).
+* Add a new boolean setting `prefer_global_in_and_join` which defaults all IN/JOIN as GLOBAL IN/JOIN. [#23434](https://github.com/ClickHouse/ClickHouse/pull/23434) ([Amos Bird](https://github.com/amosbird)).
+* Support `ALTER DELETE` queries for `Join` table engine. [#23260](https://github.com/ClickHouse/ClickHouse/pull/23260) ([foolchi](https://github.com/foolchi)).
+* Add `quantileBFloat16` aggregate function as well as the corresponding `quantilesBFloat16` and `medianBFloat16`. It is a very simple and fast quantile estimator with relative error not more than 0.390625%. This closes [#16641](https://github.com/ClickHouse/ClickHouse/issues/16641). [#23204](https://github.com/ClickHouse/ClickHouse/pull/23204) ([Ivan Novitskiy](https://github.com/RedClusive)).
+* Implement `sequenceNextNode()` function useful for `flow analysis`. [#19766](https://github.com/ClickHouse/ClickHouse/pull/19766) ([achimbab](https://github.com/achimbab)).
+
+#### Experimental Feature
+
+* Add support for virtual filesystem over HDFS. [#11058](https://github.com/ClickHouse/ClickHouse/pull/11058) ([overshov](https://github.com/overshov)) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Now clickhouse-keeper (an experimental alternative to ZooKeeper) supports ZooKeeper-like `digest` ACLs. [#24448](https://github.com/ClickHouse/ClickHouse/pull/24448) ([alesapin](https://github.com/alesapin)).
+
+#### Performance Improvement
+
+* Added optimization that transforms some functions to reading of subcolumns to reduce amount of read data. E.g., statement `col IS NULL` is transformed to reading of subcolumn `col.null`. Optimization can be enabled by setting `optimize_functions_to_subcolumns` which is currently off by default. [#24406](https://github.com/ClickHouse/ClickHouse/pull/24406) ([Anton Popov](https://github.com/CurtizJ)).
+* Rewrite more columns to possible alias expressions. This may enable better optimization, such as projections. [#24405](https://github.com/ClickHouse/ClickHouse/pull/24405) ([Amos Bird](https://github.com/amosbird)).
+* Index of type `bloom_filter` can be used for expressions with `hasAny` function with constant arrays. This closes: [#24291](https://github.com/ClickHouse/ClickHouse/issues/24291). [#24900](https://github.com/ClickHouse/ClickHouse/pull/24900) ([Vasily Nemkov](https://github.com/Enmk)).
+* Add exponential backoff to reschedule read attempt in case RabbitMQ queues are empty. (ClickHouse has support for importing data from RabbitMQ). Closes [#24340](https://github.com/ClickHouse/ClickHouse/issues/24340). [#24415](https://github.com/ClickHouse/ClickHouse/pull/24415) ([Kseniia Sumarokova](https://github.com/kssenii)).
+
+#### Improvement
+
+* Allow to limit bandwidth for replication. Add two Replicated\*MergeTree settings: `max_replicated_fetches_network_bandwidth` and `max_replicated_sends_network_bandwidth` which limit the maximum speed of replicated fetches/sends for a table. Add two server-wide settings (in `default` user profile): `max_replicated_fetches_network_bandwidth_for_server` and `max_replicated_sends_network_bandwidth_for_server` which limit the maximum speed of replication for all tables. The settings are not followed perfectly accurately. Turned off by default. Fixes [#1821](https://github.com/ClickHouse/ClickHouse/issues/1821). [#24573](https://github.com/ClickHouse/ClickHouse/pull/24573) ([alesapin](https://github.com/alesapin)).
+* Resource constraints and isolation for ODBC and Library bridges. Use separate `clickhouse-bridge` group and user for bridge processes. Set oom_score_adj so the bridges will be first subjects for OOM killer. Set maximum RSS to 1 GiB. Closes [#23861](https://github.com/ClickHouse/ClickHouse/issues/23861). [#25280](https://github.com/ClickHouse/ClickHouse/pull/25280) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Add standalone `clickhouse-keeper` symlink to the main `clickhouse` binary. Now it's possible to run coordination without the main clickhouse server. [#24059](https://github.com/ClickHouse/ClickHouse/pull/24059) ([alesapin](https://github.com/alesapin)).
+* Use global settings for query to `VIEW`. Fixed the behavior when queries to `VIEW` use local settings, which leads to errors if the settings on `CREATE VIEW` and `SELECT` were different. As for now, `VIEW` won't use these modified settings, but you can still pass additional settings in `SETTINGS` section of `CREATE VIEW` query. Closes [#20551](https://github.com/ClickHouse/ClickHouse/issues/20551). [#24095](https://github.com/ClickHouse/ClickHouse/pull/24095) ([Vladimir](https://github.com/vdimir)).
+* On server start, parts with incorrect partition ID are never removed, but always detached. [#25070](https://github.com/ClickHouse/ClickHouse/issues/25070). [#25166](https://github.com/ClickHouse/ClickHouse/pull/25166) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Increase size of background schedule pool to 128 (`background_schedule_pool_size` setting). This helps avoid replication queue hangs on a slow ZooKeeper connection. [#25072](https://github.com/ClickHouse/ClickHouse/pull/25072) ([alesapin](https://github.com/alesapin)).
+* Add merge tree setting `max_parts_to_merge_at_once` which limits the number of parts that can be merged in the background at once. Doesn't affect `OPTIMIZE FINAL` query. Fixes [#1820](https://github.com/ClickHouse/ClickHouse/issues/1820). [#24496](https://github.com/ClickHouse/ClickHouse/pull/24496) ([alesapin](https://github.com/alesapin)).
+* Allow `NOT IN` operator to be used in partition pruning. [#24894](https://github.com/ClickHouse/ClickHouse/pull/24894) ([Amos Bird](https://github.com/amosbird)).
+* Recognize IPv4 addresses like `127.0.1.1` as local. This is controversial and closes [#23504](https://github.com/ClickHouse/ClickHouse/issues/23504). Michael Filimonov will test this feature. [#24316](https://github.com/ClickHouse/ClickHouse/pull/24316) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* ClickHouse database created with MaterializeMySQL (it is an experimental feature) now contains all column comments from the MySQL database that was materialized. [#25199](https://github.com/ClickHouse/ClickHouse/pull/25199) ([Storozhuk Kostiantyn](https://github.com/sand6255)).
+* Add settings (`connection_auto_close`/`connection_max_tries`/`connection_pool_size`) for MySQL storage engine. [#24146](https://github.com/ClickHouse/ClickHouse/pull/24146) ([Azat Khuzhin](https://github.com/azat)).
+* Improve startup time of Distributed engine. [#25663](https://github.com/ClickHouse/ClickHouse/pull/25663) ([Azat Khuzhin](https://github.com/azat)).
+* Improvement for Distributed tables. Drop replicas from dirname for internal_replication=true (allows INSERT into Distributed with cluster from any number of replicas; previously only 15 replicas were supported, and anything more failed with ENAMETOOLONG while creating directory for async blocks). [#25513](https://github.com/ClickHouse/ClickHouse/pull/25513) ([Azat Khuzhin](https://github.com/azat)).
+* Added support for `Interval` type in `LowCardinality`. It is needed for intermediate values of some expressions. Closes [#21730](https://github.com/ClickHouse/ClickHouse/issues/21730). [#25410](https://github.com/ClickHouse/ClickHouse/pull/25410) ([Vladimir](https://github.com/vdimir)).
+* Add `==` operator on time conditions for `sequenceMatch` and `sequenceCount` functions. For example: `sequenceMatch('(?1)(?t==1)(?2)')(time, data = 1, data = 2)`. [#25299](https://github.com/ClickHouse/ClickHouse/pull/25299) ([Christophe Kalenzaga](https://github.com/mga-chka)).
+* Add settings `http_max_fields`, `http_max_field_name_size`, `http_max_field_value_size`. [#25296](https://github.com/ClickHouse/ClickHouse/pull/25296) ([Ivan](https://github.com/abyss7)).
+* Add support for function `if` with `Decimal` and `Int` types on its branches. This closes [#20549](https://github.com/ClickHouse/ClickHouse/issues/20549). This closes [#10142](https://github.com/ClickHouse/ClickHouse/issues/10142). [#25283](https://github.com/ClickHouse/ClickHouse/pull/25283) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Update prompt in `clickhouse-client` and display a message when reconnecting. This closes [#10577](https://github.com/ClickHouse/ClickHouse/issues/10577). [#25281](https://github.com/ClickHouse/ClickHouse/pull/25281) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Correct memory tracking in aggregate function `topK`. This closes [#25259](https://github.com/ClickHouse/ClickHouse/issues/25259). [#25260](https://github.com/ClickHouse/ClickHouse/pull/25260) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix `topLevelDomain` for IDN hosts (e.g. `example.рф`); previously it returned an empty string for such hosts. [#25103](https://github.com/ClickHouse/ClickHouse/pull/25103) ([Azat Khuzhin](https://github.com/azat)).
+* Detect Linux kernel version at runtime (for working nested epoll, which is required for `async_socket_for_remote`/`use_hedged_requests`; otherwise remote queries may get stuck). [#25067](https://github.com/ClickHouse/ClickHouse/pull/25067) ([Azat Khuzhin](https://github.com/azat)).
+* For distributed query, when `optimize_skip_unused_shards=1`, allow to skip shard with condition like `(sharding key) IN (one-element-tuple)`. (Tuples with many elements were supported. Tuple with single element did not work because it is parsed as literal). [#24930](https://github.com/ClickHouse/ClickHouse/pull/24930) ([Amos Bird](https://github.com/amosbird)).
+* Improved log messages of S3 errors, no more double whitespaces in case of empty keys and buckets. [#24897](https://github.com/ClickHouse/ClickHouse/pull/24897) ([Vladimir Chebotarev](https://github.com/excitoon)).
+* Some queries require multi-pass semantic analysis. Try reusing built sets for `IN` in this case. [#24874](https://github.com/ClickHouse/ClickHouse/pull/24874) ([Amos Bird](https://github.com/amosbird)).
+* Respect `max_distributed_connections` for `insert_distributed_sync` (otherwise for huge clusters and sync insert it may run out of `max_thread_pool_size`). [#24754](https://github.com/ClickHouse/ClickHouse/pull/24754) ([Azat Khuzhin](https://github.com/azat)).
+* Avoid hiding errors like `Limit for rows or bytes to read exceeded` for scalar subqueries. [#24545](https://github.com/ClickHouse/ClickHouse/pull/24545) ([nvartolomei](https://github.com/nvartolomei)).
+* Make String-to-Int parser stricter so that `toInt64('+')` will throw. [#24475](https://github.com/ClickHouse/ClickHouse/pull/24475) ([Amos Bird](https://github.com/amosbird)).
+* If `SSD_CACHE` is created with DDL query, it can be created only inside `user_files` directory. [#24466](https://github.com/ClickHouse/ClickHouse/pull/24466) ([Maksim Kita](https://github.com/kitaisreal)).
+* PostgreSQL support for specifying non default schema for insert queries. Closes [#24149](https://github.com/ClickHouse/ClickHouse/issues/24149). [#24413](https://github.com/ClickHouse/ClickHouse/pull/24413) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix IPv6 addresses resolving (i.e. fixes `select * from remote('[::1]', system.one)`). [#24319](https://github.com/ClickHouse/ClickHouse/pull/24319) ([Azat Khuzhin](https://github.com/azat)).
+* Fix trailing whitespaces in FROM clause with subqueries in multiline mode, and also changes the output of the queries slightly in a more human friendly way. [#24151](https://github.com/ClickHouse/ClickHouse/pull/24151) ([Azat Khuzhin](https://github.com/azat)).
+* Improvement for Distributed tables. Add ability to split distributed batch on failures (i.e. due to memory limits, corruptions), under `distributed_directory_monitor_split_batch_on_failure` (OFF by default). [#23864](https://github.com/ClickHouse/ClickHouse/pull/23864) ([Azat Khuzhin](https://github.com/azat)).
+* Handle column name clashes for `Join` table engine. Closes [#20309](https://github.com/ClickHouse/ClickHouse/issues/20309). [#23769](https://github.com/ClickHouse/ClickHouse/pull/23769) ([Vladimir](https://github.com/vdimir)).
+* Display progress for `File` table engine in `clickhouse-local` and on INSERT query in `clickhouse-client` when data is passed to stdin. Closes [#18209](https://github.com/ClickHouse/ClickHouse/issues/18209). [#23656](https://github.com/ClickHouse/ClickHouse/pull/23656) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Bugfixes and improvements of `clickhouse-copier`. Allow to copy tables with different (but compatible) schemas. Closes [#9159](https://github.com/ClickHouse/ClickHouse/issues/9159). Added test to copy ReplacingMergeTree. Closes [#22711](https://github.com/ClickHouse/ClickHouse/issues/22711). Support TTL on columns and Data Skipping Indices. It simply removes it to create internal Distributed table (underlying table will have TTL and skipping indices). Closes [#19384](https://github.com/ClickHouse/ClickHouse/issues/19384). Allow to copy MATERIALIZED and ALIAS columns. There are some cases in which it could be helpful (e.g. if this column is in PRIMARY KEY). Now it could be allowed by setting `allow_to_copy_alias_and_materialized_columns` property to true in task configuration. Closes [#9177](https://github.com/ClickHouse/ClickHouse/issues/9177). Closes [#11007](https://github.com/ClickHouse/ClickHouse/issues/11007). Closes [#9514](https://github.com/ClickHouse/ClickHouse/issues/9514). Added a property `allow_to_drop_target_partitions` in task configuration to drop partition in original table before moving helping tables. Closes [#20957](https://github.com/ClickHouse/ClickHouse/issues/20957). Get rid of `OPTIMIZE DEDUPLICATE` query. This hack was needed, because `ALTER TABLE MOVE PARTITION` was retried many times and plain MergeTree tables don't have deduplication. Closes [#17966](https://github.com/ClickHouse/ClickHouse/issues/17966). Write progress to ZooKeeper node on path `task_path + /status` in JSON format. Closes [#20955](https://github.com/ClickHouse/ClickHouse/issues/20955). Support for ReplicatedTables without arguments. Closes [#24834](https://github.com/ClickHouse/ClickHouse/issues/24834). [#23518](https://github.com/ClickHouse/ClickHouse/pull/23518) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Added sleep with backoff between read retries from S3. [#23461](https://github.com/ClickHouse/ClickHouse/pull/23461) ([Vladimir Chebotarev](https://github.com/excitoon)).
+* Respect `insert_allow_materialized_columns` (allows materialized columns) for INSERT into `Distributed` table. [#23349](https://github.com/ClickHouse/ClickHouse/pull/23349) ([Azat Khuzhin](https://github.com/azat)).
+* Add ability to push down LIMIT for distributed queries. [#23027](https://github.com/ClickHouse/ClickHouse/pull/23027) ([Azat Khuzhin](https://github.com/azat)).
+* Fix zero-copy replication with several S3 volumes (Fixes [#22679](https://github.com/ClickHouse/ClickHouse/issues/22679)). [#22864](https://github.com/ClickHouse/ClickHouse/pull/22864) ([ianton-ru](https://github.com/ianton-ru)).
+* Resolve the actual port number bound when a user requests any available port from the operating system to show it in the log message. [#25569](https://github.com/ClickHouse/ClickHouse/pull/25569) ([bnaecker](https://github.com/bnaecker)).
+* Fixed a case where conversion of postgres arrays sometimes resulted in String data type, not n-dimensional array, because `attndims` works incorrectly in some cases. Closes [#24804](https://github.com/ClickHouse/ClickHouse/issues/24804). [#25538](https://github.com/ClickHouse/ClickHouse/pull/25538) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix conversion of DateTime with timezone for MySQL, PostgreSQL, ODBC. Closes [#5057](https://github.com/ClickHouse/ClickHouse/issues/5057). [#25528](https://github.com/ClickHouse/ClickHouse/pull/25528) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Distinguish KILL MUTATION for different tables (fixes unexpected `Cancelled mutating parts` error). [#25025](https://github.com/ClickHouse/ClickHouse/pull/25025) ([Azat Khuzhin](https://github.com/azat)).
+* Allow to declare S3 disk at root of bucket (S3 virtual filesystem is an experimental feature under development). [#24898](https://github.com/ClickHouse/ClickHouse/pull/24898) ([Vladimir Chebotarev](https://github.com/excitoon)).
+* Enable reading of subcolumns (e.g. components of Tuples) for distributed tables. [#24472](https://github.com/ClickHouse/ClickHouse/pull/24472) ([Anton Popov](https://github.com/CurtizJ)).
+* A feature for MySQL compatibility protocol: make the `user` function return correct output. Closes [#25697](https://github.com/ClickHouse/ClickHouse/pull/25697). [#25697](https://github.com/ClickHouse/ClickHouse/pull/25697) ([sundyli](https://github.com/sundy-li)).
+
+#### Bug Fix
+
+* Improvement for backward compatibility. Use old modulo function version when used in partition key. Closes [#23508](https://github.com/ClickHouse/ClickHouse/issues/23508). [#24157](https://github.com/ClickHouse/ClickHouse/pull/24157) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix extremely rare bug on low-memory servers which can lead to the inability to perform merges without restart. Possibly fixes [#24603](https://github.com/ClickHouse/ClickHouse/issues/24603). [#24872](https://github.com/ClickHouse/ClickHouse/pull/24872) ([alesapin](https://github.com/alesapin)).
+* Fix extremely rare error `Tagging already tagged part` in replication queue during concurrent `alter move/replace partition`. Possibly fixes [#22142](https://github.com/ClickHouse/ClickHouse/issues/22142). [#24961](https://github.com/ClickHouse/ClickHouse/pull/24961) ([alesapin](https://github.com/alesapin)).
+* Fix potential crash when calculating aggregate function states by aggregation of aggregate function states of other aggregate functions (not a practical use case). See [#24523](https://github.com/ClickHouse/ClickHouse/issues/24523). [#25015](https://github.com/ClickHouse/ClickHouse/pull/25015) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fixed the behavior when query `SYSTEM RESTART REPLICA` or `SYSTEM SYNC REPLICA` does not finish. This was detected on server with extremely low amount of RAM. [#24457](https://github.com/ClickHouse/ClickHouse/pull/24457) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Fix bug which can lead to a ZooKeeper client hang inside clickhouse-server. [#24721](https://github.com/ClickHouse/ClickHouse/pull/24721) ([alesapin](https://github.com/alesapin)).
+* If ZooKeeper connection was lost and replica was cloned after restoring the connection, its replication queue might contain outdated entries. Fixed failed assertion when replication queue contains intersecting virtual parts. It may rarely happen if some data part was lost. Print error in log instead of terminating. [#24777](https://github.com/ClickHouse/ClickHouse/pull/24777) ([tavplubix](https://github.com/tavplubix)).
+* Fix lost `WHERE` condition in expression-push-down optimization of query plan (setting `query_plan_filter_push_down = 1` by default). Fixes [#25368](https://github.com/ClickHouse/ClickHouse/issues/25368). [#25370](https://github.com/ClickHouse/ClickHouse/pull/25370) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix bug which can lead to intersecting parts after merges with TTL: `Part all_40_40_0 is covered by all_40_40_1 but should be merged into all_40_41_1. This shouldn't happen often.`. [#25549](https://github.com/ClickHouse/ClickHouse/pull/25549) ([alesapin](https://github.com/alesapin)).
+* On ZooKeeper connection loss `ReplicatedMergeTree` table might wait for background operations to complete before trying to reconnect. It's fixed, now background operations are stopped forcefully. [#25306](https://github.com/ClickHouse/ClickHouse/pull/25306) ([tavplubix](https://github.com/tavplubix)).
+* Fix error `Key expression contains comparison between inconvertible types` for queries with `ARRAY JOIN` in case if array is used in primary key. Fixes [#8247](https://github.com/ClickHouse/ClickHouse/issues/8247). [#25546](https://github.com/ClickHouse/ClickHouse/pull/25546) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix wrong totals for query `WITH TOTALS` and `WITH FILL`. Fixes [#20872](https://github.com/ClickHouse/ClickHouse/issues/20872). [#25539](https://github.com/ClickHouse/ClickHouse/pull/25539) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix data race when querying `system.clusters` while reloading the cluster configuration at the same time. [#25737](https://github.com/ClickHouse/ClickHouse/pull/25737) ([Amos Bird](https://github.com/amosbird)).
+* Fixed `No such file or directory` error on moving `Distributed` table between databases. Fixes [#24971](https://github.com/ClickHouse/ClickHouse/issues/24971). [#25667](https://github.com/ClickHouse/ClickHouse/pull/25667) ([tavplubix](https://github.com/tavplubix)).
+* `REPLACE PARTITION` might be ignored in rare cases if the source partition was empty. It's fixed. Fixes [#24869](https://github.com/ClickHouse/ClickHouse/issues/24869). [#25665](https://github.com/ClickHouse/ClickHouse/pull/25665) ([tavplubix](https://github.com/tavplubix)).
+* Fixed a bug in `Replicated` database engine that might rarely cause some replica to skip enqueued DDL query. [#24805](https://github.com/ClickHouse/ClickHouse/pull/24805) ([tavplubix](https://github.com/tavplubix)).
+* Fix null pointer dereference in `EXPLAIN AST` without query. [#25631](https://github.com/ClickHouse/ClickHouse/pull/25631) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix waiting for automatic dropping of empty parts. It could lead to the background pool filling up completely and replication getting stuck. [#23315](https://github.com/ClickHouse/ClickHouse/pull/23315) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix restore of a table stored in S3 virtual filesystem (it is an experimental feature not ready for production). [#25601](https://github.com/ClickHouse/ClickHouse/pull/25601) ([ianton-ru](https://github.com/ianton-ru)).
+* Fix nullptr dereference in `Arrow` format when using `Decimal256`. Add `Decimal256` support for `Arrow` format. [#25531](https://github.com/ClickHouse/ClickHouse/pull/25531) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix excessive underscore before the names of the preprocessed configuration files. [#25431](https://github.com/ClickHouse/ClickHouse/pull/25431) ([Vitaly Baranov](https://github.com/vitlibar)).
+* A fix for `clickhouse-copier` tool: Fix segfault when sharding_key is absent in task config for copier. [#25419](https://github.com/ClickHouse/ClickHouse/pull/25419) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Fix `REPLACE` column transformer when used in DDL by correctly quoting the formatted query. This fixes [#23925](https://github.com/ClickHouse/ClickHouse/issues/23925). [#25391](https://github.com/ClickHouse/ClickHouse/pull/25391) ([Amos Bird](https://github.com/amosbird)).
+* Fix the possibility of non-deterministic behaviour of the `quantileDeterministic` function and similar. This closes [#20480](https://github.com/ClickHouse/ClickHouse/issues/20480). [#25313](https://github.com/ClickHouse/ClickHouse/pull/25313) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Support `SimpleAggregateFunction(LowCardinality)` for `SummingMergeTree`. Fixes [#25134](https://github.com/ClickHouse/ClickHouse/issues/25134). [#25300](https://github.com/ClickHouse/ClickHouse/pull/25300) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix logical error with exception message "Cannot sum Array/Tuple in min/maxMap". [#25298](https://github.com/ClickHouse/ClickHouse/pull/25298) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix error `Bad cast from type DB::ColumnLowCardinality to DB::ColumnVector<char8_t>` for queries where `LowCardinality` argument was used for IN (this bug appeared in 21.6). Fixes [#25187](https://github.com/ClickHouse/ClickHouse/issues/25187). [#25290](https://github.com/ClickHouse/ClickHouse/pull/25290) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix incorrect behaviour of `joinGetOrNull` with not-nullable columns. This fixes [#24261](https://github.com/ClickHouse/ClickHouse/issues/24261). [#25288](https://github.com/ClickHouse/ClickHouse/pull/25288) ([Amos Bird](https://github.com/amosbird)).
+* Fix incorrect behaviour and UBSan report in big integers. In previous versions `CAST(1e19 AS UInt128)` returned zero. [#25279](https://github.com/ClickHouse/ClickHouse/pull/25279) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fixed an error which occurred while inserting a subset of columns using CSVWithNames format. Fixes [#25129](https://github.com/ClickHouse/ClickHouse/issues/25129). [#25169](https://github.com/ClickHouse/ClickHouse/pull/25169) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Do not use table's projection for `SELECT` with `FINAL`. It is not supported yet. [#25163](https://github.com/ClickHouse/ClickHouse/pull/25163) ([Amos Bird](https://github.com/amosbird)).
+* Fix possible parts loss after updating up to 21.5 in case table used `UUID` in partition key. (It is not recommended to use `UUID` in partition key). Fixes [#25070](https://github.com/ClickHouse/ClickHouse/issues/25070). [#25127](https://github.com/ClickHouse/ClickHouse/pull/25127) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix crash in query with cross join and `joined_subquery_requires_alias = 0`. Fixes [#24011](https://github.com/ClickHouse/ClickHouse/issues/24011). [#25082](https://github.com/ClickHouse/ClickHouse/pull/25082) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix bug with constant maps in `mapContains` function that led to error `empty column was returned by function mapContains`. Closes [#25077](https://github.com/ClickHouse/ClickHouse/issues/25077). [#25080](https://github.com/ClickHouse/ClickHouse/pull/25080) ([Kruglov Pavel](https://github.com/Avogar)).
+* Remove possibility to create tables with columns referencing themselves like `a UInt32 ALIAS a + 1` or `b UInt32 MATERIALIZED b`. Fixes [#24910](https://github.com/ClickHouse/ClickHouse/issues/24910), [#24292](https://github.com/ClickHouse/ClickHouse/issues/24292). [#25059](https://github.com/ClickHouse/ClickHouse/pull/25059) ([alesapin](https://github.com/alesapin)).
+* Fix wrong result when using aggregate projection with *not empty* `GROUP BY` key to execute query with `GROUP BY` by *empty* key. [#25055](https://github.com/ClickHouse/ClickHouse/pull/25055) ([Amos Bird](https://github.com/amosbird)).
+* Fix serialization of split nested messages in Protobuf format. This PR fixes [#24647](https://github.com/ClickHouse/ClickHouse/issues/24647). [#25000](https://github.com/ClickHouse/ClickHouse/pull/25000) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix limit/offset settings for distributed queries (ignore on the remote nodes). [#24940](https://github.com/ClickHouse/ClickHouse/pull/24940) ([Azat Khuzhin](https://github.com/azat)).
+* Fix possible heap-buffer-overflow in `Arrow` format. [#24922](https://github.com/ClickHouse/ClickHouse/pull/24922) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fixed possible error 'Cannot read from istream at offset 0' when reading a file from DiskS3 (S3 virtual filesystem is an experimental feature under development that should not be used in production). [#24885](https://github.com/ClickHouse/ClickHouse/pull/24885) ([Pavel Kovalenko](https://github.com/Jokser)).
+* Fix "Missing columns" exception when joining Distributed Materialized View. [#24870](https://github.com/ClickHouse/ClickHouse/pull/24870) ([Azat Khuzhin](https://github.com/azat)).
+* Allow `NULL` values in postgresql compatibility protocol. Closes [#22622](https://github.com/ClickHouse/ClickHouse/issues/22622). [#24857](https://github.com/ClickHouse/ClickHouse/pull/24857) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix bug when exception `Mutation was killed` can be thrown to the client on mutation wait when the mutation is not loaded into memory yet. [#24809](https://github.com/ClickHouse/ClickHouse/pull/24809) ([alesapin](https://github.com/alesapin)).
+* Fixed bug in deserialization of random generator state which might cause some data types such as `AggregateFunction(groupArraySample(N), T)` to behave in a non-deterministic way. [#24538](https://github.com/ClickHouse/ClickHouse/pull/24538) ([tavplubix](https://github.com/tavplubix)).
+* Disallow building uniqXXXXStates of other aggregation states. [#24523](https://github.com/ClickHouse/ClickHouse/pull/24523) ([Raúl Marín](https://github.com/Algunenano)). Then allow it back by actually eliminating the root cause of the related issue. ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix usage of tuples in `CREATE .. AS SELECT` queries. [#24464](https://github.com/ClickHouse/ClickHouse/pull/24464) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix computation of total bytes in `Buffer` table. In the current ClickHouse version, the `total_writes.bytes` counter decreases too much during the buffer flush. This leads to counter overflow, and `totalBytes` returns something around 17.44 EB some time after the flush. [#24450](https://github.com/ClickHouse/ClickHouse/pull/24450) ([DimasKovas](https://github.com/DimasKovas)).
+* Fix incorrect information about the monotonicity of the `toWeek` function. This fixes [#24422](https://github.com/ClickHouse/ClickHouse/issues/24422). This bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/5212, and was exposed later by a smarter partition pruner. [#24446](https://github.com/ClickHouse/ClickHouse/pull/24446) ([Amos Bird](https://github.com/amosbird)).
+* Fixed a potential deadlock that can happen during LDAP role (re)mapping when user authentication is managed by LDAP and an LDAP group is mapped to a nonexistent local role. [#24431](https://github.com/ClickHouse/ClickHouse/pull/24431) ([Denis Glazachev](https://github.com/traceon)).
+* In "multipart/form-data" message consider the CRLF preceding a boundary as part of it. Fixes [#23905](https://github.com/ClickHouse/ClickHouse/issues/23905). [#24399](https://github.com/ClickHouse/ClickHouse/pull/24399) ([Ivan](https://github.com/abyss7)).
+* Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number. [#24321](https://github.com/ClickHouse/ClickHouse/pull/24321) ([Amos Bird](https://github.com/amosbird)).
+* Fixed a bug in moving Materialized View from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with Materialized View. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)).
+* Allow empty HTTP headers. Fixes [#23901](https://github.com/ClickHouse/ClickHouse/issues/23901). [#24285](https://github.com/ClickHouse/ClickHouse/pull/24285) ([Ivan](https://github.com/abyss7)).
+* Correct processing of mutations (ALTER UPDATE/DELETE) in Memory tables. Closes [#24274](https://github.com/ClickHouse/ClickHouse/issues/24274). [#24275](https://github.com/ClickHouse/ClickHouse/pull/24275) ([flynn](https://github.com/ucasfl)).
+* Make column LowCardinality property in JOIN output the same as in the input, close [#23351](https://github.com/ClickHouse/ClickHouse/issues/23351), close [#20315](https://github.com/ClickHouse/ClickHouse/issues/20315). [#24061](https://github.com/ClickHouse/ClickHouse/pull/24061) ([Vladimir](https://github.com/vdimir)).
+* A fix for Kafka tables. Fix the bug in failover behavior when Engine = Kafka was not able to start consumption if the same consumer had an empty assignment previously. Closes [#21118](https://github.com/ClickHouse/ClickHouse/issues/21118). [#21267](https://github.com/ClickHouse/ClickHouse/pull/21267) ([filimonov](https://github.com/filimonov)).
+
+#### Build/Testing/Packaging Improvement
+
+* Add `darwin-aarch64` (Mac M1 / Apple Silicon) builds in CI [#25560](https://github.com/ClickHouse/ClickHouse/pull/25560) ([Ivan](https://github.com/abyss7)) and put the links to the docs and website ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Adds cross-platform embedding of binary resources into executables. It works on Illumos. [#25146](https://github.com/ClickHouse/ClickHouse/pull/25146) ([bnaecker](https://github.com/bnaecker)).
+* Add join related options to stress tests to improve fuzzing. [#25200](https://github.com/ClickHouse/ClickHouse/pull/25200) ([Vladimir](https://github.com/vdimir)).
+* Enable build with s3 module in osx [#25217](https://github.com/ClickHouse/ClickHouse/issues/25217). [#25218](https://github.com/ClickHouse/ClickHouse/pull/25218) ([kevin wan](https://github.com/MaxWk)).
+* Add integration test cases to cover JDBC bridge. [#25047](https://github.com/ClickHouse/ClickHouse/pull/25047) ([Zhichun Wu](https://github.com/zhicwu)).
+* Integration tests configuration has special treatment for dictionaries. Removed remaining dictionaries manual setup. [#24728](https://github.com/ClickHouse/ClickHouse/pull/24728) ([Ilya Yatsishin](https://github.com/qoega)).
+* Add libfuzzer tests for YAMLParser class. [#24480](https://github.com/ClickHouse/ClickHouse/pull/24480) ([BoloniniD](https://github.com/BoloniniD)).
+* Ubuntu 20.04 is now used to run integration tests, docker-compose version used to run integration tests is updated to 1.28.2. Environment variables now take effect on docker-compose. Rework test_dictionaries_all_layouts_separate_sources to allow parallel run. [#20393](https://github.com/ClickHouse/ClickHouse/pull/20393) ([Ilya Yatsishin](https://github.com/qoega)).
+* Fix TOCTOU error in installation script. [#25277](https://github.com/ClickHouse/ClickHouse/pull/25277) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+
+
 ### ClickHouse release 21.6, 2021-06-05

 #### Upgrade Notes
--- a/cmake/embed_binary.cmake
+++ b/cmake/embed_binary.cmake
@ -53,5 +53,6 @@ macro(clickhouse_embed_binaries)
        set_property(SOURCE "${CMAKE_CURRENT_BINARY_DIR}/${ASSEMBLY_FILE_NAME}" APPEND PROPERTY INCLUDE_DIRECTORIES "${EMBED_RESOURCE_DIR}")

        target_sources("${EMBED_TARGET}" PRIVATE "${CMAKE_CURRENT_BINARY_DIR}/${ASSEMBLY_FILE_NAME}")
+        set_target_properties("${EMBED_TARGET}" PROPERTIES OBJECT_DEPENDS "${RESOURCE_FILE}")
    endforeach()
 endmacro()
--- a/docker/test/performance-comparison/compare.sh
+++ b/docker/test/performance-comparison/compare.sh
@ -1178,11 +1178,11 @@ create view right_async_metric_log as
 -- Use the right log as time reference because it may have higher precision.
 create table metrics engine File(TSV, 'metrics/metrics.tsv') as
    with (select min(event_time) from right_async_metric_log) as min_time
-    select name metric, r.event_time - min_time event_time, l.value as left, r.value as right
+    select metric, r.event_time - min_time event_time, l.value as left, r.value as right
    from right_async_metric_log r
    asof join file('left-async-metric-log.tsv', TSVWithNamesAndTypes,
        '$(cat left-async-metric-log.tsv.columns)') l
-    on l.name = r.name and r.event_time <= l.event_time
+    on l.metric = r.metric and r.event_time <= l.event_time
    order by metric, event_time
    ;

--- a/docs/en/images/play.png
+++ b/docs/en/images/play.png
--- a/docs/en/interfaces/http.md
+++ b/docs/en/interfaces/http.md
@ -7,16 +7,21 @@ toc_title: HTTP Interface

 The HTTP interface lets you use ClickHouse on any platform from any programming language. We use it for working from Java and Perl, as well as shell scripts. In other departments, the HTTP interface is used from Perl, Python, and Go. The HTTP interface is more limited than the native interface, but it has better compatibility.

-By default, clickhouse-server listens for HTTP on port 8123 (this can be changed in the config).
+By default, `clickhouse-server` listens for HTTP on port 8123 (this can be changed in the config).

-If you make a GET / request without parameters, it returns 200 response code and the string which defined in [http_server_default_response](../operations/server-configuration-parameters/settings.md#server_configuration_parameters-http_server_default_response) default value “Ok.” (with a line feed at the end)
+If you make a `GET /` request without parameters, it returns a 200 response code and the string defined in [http_server_default_response](../operations/server-configuration-parameters/settings.md#server_configuration_parameters-http_server_default_response), which defaults to “Ok.” (with a line feed at the end).

 ``` bash
 $ curl 'http://localhost:8123/'
 Ok.
 ```

-Use GET /ping request in health-check scripts. This handler always returns “Ok.” (with a line feed at the end). Available from version 18.12.13.
+The web UI is available at `http://localhost:8123/play`.
+
+![Web UI](../images/play.png)
+
+
+In health-check scripts, use the `GET /ping` request. This handler always returns “Ok.” (with a line feed at the end). Available from version 18.12.13.

 ``` bash
 $ curl 'http://localhost:8123/ping'
@ -51,8 +56,8 @@ X-ClickHouse-Summary: {"read_rows":"0","read_bytes":"0","written_rows":"0","writ
 1
 ```

-As you can see, curl is somewhat inconvenient in that spaces must be URL escaped.
-Although wget escapes everything itself, we do not recommend using it because it does not work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
+As you can see, `curl` is somewhat inconvenient in that spaces must be URL escaped.
+Although `wget` escapes everything itself, we do not recommend using it because it does not work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
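
As an illustration of the URL-escaping point above, here is a minimal Python sketch that percent-encodes a query before placing it in the `query` URL parameter. The helper name `build_query_url` is hypothetical and not part of ClickHouse; the host and port are the defaults mentioned in this document.

```python
from urllib.parse import quote


def build_query_url(query: str, base: str = "http://localhost:8123/") -> str:
    """Return a URL whose `query` parameter holds the percent-encoded SQL text."""
    # safe="" makes quote() escape every reserved character, including "/".
    return f"{base}?query={quote(query, safe='')}"


print(build_query_url("SELECT 1"))  # http://localhost:8123/?query=SELECT%201
```

The resulting string can be passed to any HTTP client; sending the query in the POST body instead avoids escaping entirely.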

 ``` bash
 $ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
@ -75,7 +80,7 @@ ECT 1
 , expected One of: SHOW TABLES, SHOW DATABASES, SELECT, INSERT, CREATE, ATTACH, RENAME, DROP, DETACH, USE, SET, OPTIMIZE., e.what() = DB::Exception
 ```

-By default, data is returned in TabSeparated format (for more information, see the “Formats” section).
+By default, data is returned in [TabSeparated](formats.md#tabseparated) format.

 You use the FORMAT clause of the query to request any other format.

@ -90,9 +95,11 @@ $ echo 'SELECT 1 FORMAT Pretty' | curl 'http://localhost:8123/?' --data-binary @
 └───┘
 ```

-The POST method of transmitting data is necessary for INSERT queries. In this case, you can write the beginning of the query in the URL parameter, and use POST to pass the data to insert. The data to insert could be, for example, a tab-separated dump from MySQL. In this way, the INSERT query replaces LOAD DATA LOCAL INFILE from MySQL.
+The POST method of transmitting data is necessary for `INSERT` queries. In this case, you can write the beginning of the query in the URL parameter, and use POST to pass the data to insert. The data to insert could be, for example, a tab-separated dump from MySQL. In this way, the `INSERT` query replaces `LOAD DATA LOCAL INFILE` from MySQL.

-Examples: Creating a table:
+**Examples**
+
+Creating a table:

 ``` bash
 $ echo 'CREATE TABLE t (a UInt8) ENGINE = Memory' | curl 'http://localhost:8123/' --data-binary @-
@ -632,6 +639,4 @@ $ curl -vv -H 'XXX:xxx' 'http://localhost:8123/get_relative_path_static_handler'
 <
 <html><body>Relative Path File</body></html>
 * Connection #0 to host localhost left intact
-```
-
-[Original article](https://clickhouse.tech/docs/en/interfaces/http_interface/) <!--hide-->
+```
--- a/docs/en/introduction/adopters.md
+++ b/docs/en/introduction/adopters.md
@ -59,6 +59,7 @@ toc_title: Adopters
 | <a href="https://www.huya.com/" class="favicon">HUYA</a> | Video Streaming | Analytics | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/7.%20ClickHouse万亿数据分析实践%20李本旺(sundy-li)%20虎牙.pdf) |
 | <a href="https://www.the-ica.com/" class="favicon">ICA</a> | FinTech | Risk Management | — | — | [Blog Post in English, Sep 2020](https://altinity.com/blog/clickhouse-vs-redshift-performance-for-fintech-risk-management?utm_campaign=ClickHouse%20vs%20RedShift&utm_content=143520807&utm_medium=social&utm_source=twitter&hss_channel=tw-3894792263) |
 | <a href="https://www.idealista.com" class="favicon">Idealista</a> | Real Estate | Analytics | — | — | [Blog Post in English, April 2019](https://clickhouse.tech/blog/en/clickhouse-meetup-in-madrid-on-april-2-2019) |
+| <a href="https://infobaleen.com" class="favicon">Infobaleen</a> | AI marketing tool | Analytics | — | — | [Official site](https://infobaleen.com) |
 | <a href="https://www.infovista.com/" class="favicon">Infovista</a> | Networks | Analytics | — | — | [Slides in English, October 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup30/infovista.pdf) |
 | <a href="https://www.innogames.com" class="favicon">InnoGames</a> | Games | Metrics, Logging | — | — | [Slides in Russian, September 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup28/graphite_and_clickHouse.pdf) |
 | <a href="https://instabug.com/" class="favicon">Instabug</a> | APM Platform | Main product | — | — | [A quote from Co-Founder](https://altinity.com/) |
--- a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md
+++ b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md
@ -129,7 +129,7 @@ That dictionary source can be configured only via XML configuration. Creating di

 ## Executable Pool {#dicts-external_dicts_dict_sources-executable_pool}

-Executable pool allows loading data from pool of processes. This source does not work with dictionary layouts that need to load all data from source. Executable pool works if the dictionary [is stored](external-dicts-dict-layout.md#ways-to-store-dictionaries-in-memory) using `cache`, `complex_key_cache`, `ssd_cache`, `complex_key_ssd_cache`, `direct`, `complex_key_direct` layouts. 
+Executable pool allows loading data from pool of processes. This source does not work with dictionary layouts that need to load all data from source. Executable pool works if the dictionary [is stored](external-dicts-dict-layout.md#ways-to-store-dictionaries-in-memory) using `cache`, `complex_key_cache`, `ssd_cache`, `complex_key_ssd_cache`, `direct`, `complex_key_direct` layouts.

 Executable pool will spawn pool of processes with specified command and keep them running until they exit. The program should read data from STDIN while it is available and output result to STDOUT, and it can wait for next block of data on STDIN. ClickHouse will not close STDIN after processing a block of data but will pipe another chunk of data when needed. The executable script should be ready for this way of data processing — it should poll STDIN and flush data to STDOUT early.

@ -581,6 +581,7 @@ Example of settings:
        <db>default</db>
        <table>ids</table>
        <where>id=10</where>
+        <secure>1</secure>
    </clickhouse>
 </source>
 ```
@ -596,6 +597,7 @@ SOURCE(CLICKHOUSE(
    db 'default'
    table 'ids'
    where 'id=10'
+    secure 1
 ))
 ```

@ -609,6 +611,7 @@ Setting fields:
 -   `table` – Name of the table.
 -   `where` – The selection criteria. May be omitted.
 -   `invalidate_query` – Query for checking the dictionary status. Optional parameter. Read more in the section [Updating dictionaries](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md).
+-   `secure` – Use SSL for the connection.

 ### Mongodb {#dicts-external_dicts_dict_sources-mongodb}

--- a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure.md
+++ b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure.md
@ -159,7 +159,7 @@ Configuration fields:
 | Tag                                                  | Description                                                                                                                                                                                                                                                                                                                                     | Required |
 |------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
 | `name`                                               | Column name.                                                                                                                                                                                                                                                                                                                                    | Yes      |
-| `type`                                               | ClickHouse data type: [UInt8](../../../sql-reference/data-types/int-uint.md), [UInt16](../../../sql-reference/data-types/int-uint.md), [UInt32](../../../sql-reference/data-types/int-uint.md), [UInt64](../../../sql-reference/data-types/int-uint.md), [Int8](../../../sql-reference/data-types/int-uint.md), [Int16](../../../sql-reference/data-types/int-uint.md), [Int32](../../../sql-reference/data-types/int-uint.md), [Int64](../../../sql-reference/data-types/int-uint.md), [Float32](../../../sql-reference/data-types/float.md), [Float64](../../../sql-reference/data-types/float.md), [UUID](../../../sql-reference/data-types/uuid.md), [Decimal32](../../../sql-reference/data-types/decimal.md), [Decimal64](../../../sql-reference/data-types/decimal.md), [Decimal128](../../../sql-reference/data-types/decimal.md), [Decimal256](../../../sql-reference/data-types/decimal.md), [String](../../../sql-reference/data-types/string.md).<br/>ClickHouse tries to cast value from dictionary to the specified data type. For example, for MySQL, the field might be `TEXT`, `VARCHAR`, or `BLOB` in the MySQL source table, but it can be uploaded as `String` in ClickHouse.<br/>[Nullable](../../../sql-reference/data-types/nullable.md) is currently supported for [Flat](external-dicts-dict-layout.md#flat), [Hashed](external-dicts-dict-layout.md#dicts-external_dicts_dict_layout-hashed), [ComplexKeyHashed](external-dicts-dict-layout.md#complex-key-hashed), [Direct](external-dicts-dict-layout.md#direct), [ComplexKeyDirect](external-dicts-dict-layout.md#complex-key-direct), [RangeHashed](external-dicts-dict-layout.md#range-hashed), [Polygon](external-dicts-dict-polygon.md), [Cache](external-dicts-dict-layout.md#cache), [ComplexKeyCache](external-dicts-dict-layout.md#complex-key-cache), [SSDCache](external-dicts-dict-layout.md#ssd-cache), [SSDComplexKeyCache](external-dicts-dict-layout.md#complex-key-ssd-cache) dictionaries. 
In [IPTrie](external-dicts-dict-layout.md#ip-trie) dictionaries `Nullable` types are not supported.       | Yes      |
+| `type`                                               | ClickHouse data type: [UInt8](../../../sql-reference/data-types/int-uint.md), [UInt16](../../../sql-reference/data-types/int-uint.md), [UInt32](../../../sql-reference/data-types/int-uint.md), [UInt64](../../../sql-reference/data-types/int-uint.md), [Int8](../../../sql-reference/data-types/int-uint.md), [Int16](../../../sql-reference/data-types/int-uint.md), [Int32](../../../sql-reference/data-types/int-uint.md), [Int64](../../../sql-reference/data-types/int-uint.md), [Float32](../../../sql-reference/data-types/float.md), [Float64](../../../sql-reference/data-types/float.md), [UUID](../../../sql-reference/data-types/uuid.md), [Decimal32](../../../sql-reference/data-types/decimal.md), [Decimal64](../../../sql-reference/data-types/decimal.md), [Decimal128](../../../sql-reference/data-types/decimal.md), [Decimal256](../../../sql-reference/data-types/decimal.md), [String](../../../sql-reference/data-types/string.md), [Array](../../../sql-reference/data-types/array.md).<br/>ClickHouse tries to cast value from dictionary to the specified data type. For example, for MySQL, the field might be `TEXT`, `VARCHAR`, or `BLOB` in the MySQL source table, but it can be uploaded as `String` in ClickHouse.<br/>[Nullable](../../../sql-reference/data-types/nullable.md) is currently supported for [Flat](external-dicts-dict-layout.md#flat), [Hashed](external-dicts-dict-layout.md#dicts-external_dicts_dict_layout-hashed), [ComplexKeyHashed](external-dicts-dict-layout.md#complex-key-hashed), [Direct](external-dicts-dict-layout.md#direct), [ComplexKeyDirect](external-dicts-dict-layout.md#complex-key-direct), [RangeHashed](external-dicts-dict-layout.md#range-hashed), [Polygon](external-dicts-dict-polygon.md), [Cache](external-dicts-dict-layout.md#cache), [ComplexKeyCache](external-dicts-dict-layout.md#complex-key-cache), [SSDCache](external-dicts-dict-layout.md#ssd-cache), [SSDComplexKeyCache](external-dicts-dict-layout.md#complex-key-ssd-cache) dictionaries. In [IPTrie](external-dicts-dict-layout.md#ip-trie) dictionaries `Nullable` types are not supported.       | Yes      |
 | `null_value`                                         | Default value for a non-existing element.<br/>In the example, it is an empty string. [NULL](../../syntax.md#null-literal) value can be used only for the `Nullable` types (see the previous line with types description).                                                                                                                                                                                                                       | Yes      |
 | `expression`                                         | [Expression](../../../sql-reference/syntax.md#syntax-expressions) that ClickHouse executes on the value.<br/>The expression can be a column name in the remote SQL database. Thus, you can use it to create an alias for the remote column.<br/><br/>Default value: no expression.                                                              | No       |
 | <a name="hierarchical-dict-attr"></a> `hierarchical` | If `true`, the attribute contains the value of a parent key for the current key. See [Hierarchical Dictionaries](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-hierarchical.md).<br/><br/>Default value: `false`.                                                                                               | No       |
--- a/docs/en/sql-reference/functions/logical-functions.md
+++ b/docs/en/sql-reference/functions/logical-functions.md
@ -5,15 +5,186 @@ toc_title: Logical

 # Logical Functions {#logical-functions}

-Logical functions accept any numeric types, but return a UInt8 number equal to 0 or 1.
+Logical functions perform logical operations on arguments of any numeric type and return a [UInt8](../../sql-reference/data-types/int-uint.md) number equal to 0 or 1, or `NULL` in some cases.

-Zero as an argument is considered “false,” while any non-zero value is considered “true”.
+Zero as an argument is considered `false`, while any non-zero value is considered `true`.

-## and, AND operator {#and-and-operator}
+## and {#logical-and-function}

-## or, OR operator {#or-or-operator}
+Calculates the result of the logical conjunction between two or more values. Corresponds to [Logical AND Operator](../../sql-reference/operators/index.md#logical-and-operator).

-## not, NOT operator {#not-not-operator}
+**Syntax**

-## xor {#xor}
+``` sql
+and(val1, val2...)
+```

+**Arguments**
+
+-   `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Returned value**
+
+-   `0`, if there is at least one zero value argument.
+-   `NULL`, if there are no zero-value arguments and there is at least one `NULL` argument.
+-   `1`, otherwise.
+
+Type: [UInt8](../../sql-reference/data-types/int-uint.md) or [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT and(0, 1, -2);
+```
+
+Result:
+
+``` text
+┌─and(0, 1, -2)─┐
+│             0 │
+└───────────────┘
+```
+
+With `NULL`:
+
+``` sql
+SELECT and(NULL, 1, 10, -2);
+```
+
+Result:
+
+``` text
+┌─and(NULL, 1, 10, -2)─┐
+│                 ᴺᵁᴸᴸ │
+└──────────────────────┘
+```
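
The return-value rules for `and` listed above can be modelled in a few lines of Python. This is only a sketch of the documented semantics, not ClickHouse's implementation; the name `logical_and` is hypothetical, and `None` stands in for `NULL`.

```python
from typing import Optional


def logical_and(*vals: Optional[float]) -> Optional[int]:
    """Model of the documented and() rules; None plays the role of NULL."""
    if len(vals) < 2:
        raise ValueError("and expects at least two values")
    # A single zero argument decides the result, even if NULLs are present.
    if any(v is not None and v == 0 for v in vals):
        return 0
    # No zeros: any NULL makes the result NULL.
    if any(v is None for v in vals):
        return None
    return 1


print(logical_and(0, 1, -2))         # 0
print(logical_and(None, 1, 10, -2))  # None
```

Note that a zero takes precedence over `NULL`, so in this model `logical_and(None, 0)` yields `0`.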
+
+## or {#logical-or-function}
+
+Calculates the result of the logical disjunction between two or more values. Corresponds to [Logical OR Operator](../../sql-reference/operators/index.md#logical-or-operator).
+
+**Syntax**
+
+``` sql
+or(val1, val2...)
+```
+
+**Arguments**
+
+-   `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Returned value**
+
+-   `1`, if there is at least one non-zero value.
+-   `0`, if there are only zero values.
+-   `NULL`, if there are only zero values and `NULL`.
+
+Type: [UInt8](../../sql-reference/data-types/int-uint.md) or [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT or(1, 0, 0, 2, NULL);
+```
+
+Result:
+
+``` text
+┌─or(1, 0, 0, 2, NULL)─┐
+│                    1 │
+└──────────────────────┘
+```
+
+With `NULL`:
+
+``` sql
+SELECT or(0, NULL);
+```
+
+Result:
+
+``` text
+┌─or(0, NULL)─┐
+│        ᴺᵁᴸᴸ │
+└─────────────┘
+```
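
The return-value rules for `or` listed above can be modelled the same way. This is a sketch of the documented semantics under the assumption that an all-`NULL` argument list also yields `NULL`; the name `logical_or` is hypothetical, and `None` stands in for `NULL`.

```python
from typing import Optional


def logical_or(*vals: Optional[float]) -> Optional[int]:
    """Model of the documented or() rules; None plays the role of NULL."""
    if len(vals) < 2:
        raise ValueError("or expects at least two values")
    # A single non-zero argument decides the result, even if NULLs are present.
    if any(v is not None and v != 0 for v in vals):
        return 1
    # Only zeros and NULLs remain: any NULL makes the result NULL.
    if any(v is None for v in vals):
        return None
    return 0


print(logical_or(1, 0, 0, 2, None))  # 1
print(logical_or(0, None))           # None
```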
+
+## not {#logical-not-function}
+
+Calculates the result of the logical negation of the value. Corresponds to [Logical Negation Operator](../../sql-reference/operators/index.md#logical-negation-operator).
+
+**Syntax**
+
+``` sql
+not(val);
+```
+
+**Arguments**
+
+-   `val` — The value. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Returned value**
+
+-   `1`, if the `val` is `0`.
+-   `0`, if the `val` is a non-zero value.
+-   `NULL`, if the `val` is a `NULL` value.
+
+Type: [UInt8](../../sql-reference/data-types/int-uint.md) or [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT NOT(1);
+```
+
+Result:
+
+``` text
+┌─not(1)─┐
+│      0 │
+└────────┘
+```
+
+## xor {#logical-xor-function}
+
+Calculates the result of the logical exclusive disjunction between two or more values. For more than two values the function works as if it calculates `XOR` of the first two values and then uses the result with the next value to calculate `XOR` and so on.
+
+**Syntax**
+
+``` sql
+xor(val1, val2...)
+```
+
+**Arguments**
+
+-   `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Returned value**
+
+-   `1`, for two values: if one of the values is zero and the other is not.
+-   `0`, for two values: if both values are zero or non-zero at the same time.
+-   `NULL`, if there is at least one `NULL` value.
+
+Type: [UInt8](../../sql-reference/data-types/int-uint.md) or [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT xor(0, 1, 1);
+```
+
+Result:
+
+``` text
+┌─xor(0, 1, 1)─┐
+│            0 │
+└──────────────┘
+```
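
The left-to-right evaluation of `xor` for more than two values, as described above, can be sketched as a fold. This models the documented rules only and is not ClickHouse's implementation; the name `logical_xor` is hypothetical, and `None` stands in for `NULL`.

```python
from functools import reduce
from typing import Optional


def logical_xor(*vals: Optional[float]) -> Optional[int]:
    """Model of the documented xor() rules; None plays the role of NULL."""
    if len(vals) < 2:
        raise ValueError("xor expects at least two values")
    # Any NULL argument makes the whole result NULL.
    if any(v is None for v in vals):
        return None
    # Fold left to right: XOR the running result with the truth value of the next argument.
    return reduce(lambda acc, v: acc ^ int(v != 0), vals, 0)


print(logical_xor(0, 1, 1))  # 0, matching the documented example
```

Equivalently, the result is `1` exactly when an odd number of arguments are non-zero.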
--- a/docs/en/sql-reference/operators/index.md
+++ b/docs/en/sql-reference/operators/index.md
@ -211,17 +211,17 @@ SELECT toDateTime('2014-10-26 00:00:00', 'Europe/Moscow') AS time, time + 60 * 6
 -   [Interval](../../sql-reference/data-types/special-data-types/interval.md) data type
 -   [toInterval](../../sql-reference/functions/type-conversion-functions.md#function-tointerval) type conversion functions

-## Logical Negation Operator {#logical-negation-operator}
-
-`NOT a` – The `not(a)` function.
-
 ## Logical AND Operator {#logical-and-operator}

-`a AND b` – The`and(a, b)` function.
+Syntax `SELECT a AND b` — calculates logical conjunction of `a` and `b` with the function [and](../../sql-reference/functions/logical-functions.md#logical-and-function).

 ## Logical OR Operator {#logical-or-operator}

-`a OR b` – The `or(a, b)` function.
+Syntax `SELECT a OR b` — calculates logical disjunction of `a` and `b` with the function [or](../../sql-reference/functions/logical-functions.md#logical-or-function).
+
+## Logical Negation Operator {#logical-negation-operator}
+
+Syntax `SELECT NOT a` — calculates logical negation of `a` with the function [not](../../sql-reference/functions/logical-functions.md#logical-not-function).

 ## Conditional Operator {#conditional-operator}

--- a/docs/ru/getting-started/playground.md
+++ b/docs/ru/getting-started/playground.md
@ -61,4 +61,4 @@ clickhouse client --secure -h play-api.clickhouse.tech --port 9440 -u playground
 The Playground backend is just a ClickHouse cluster without any additional server-side application. As mentioned above, the HTTPS and TCP/TLS endpoints are publicly available as part of the Playground. They are proxied through [Cloudflare Spectrum](https://www.cloudflare.com/products/cloudflare-spectrum/) to add an extra layer of protection and improved global connectivity.

 !!! warning "Warning"
-Exposing the ClickHouse server to the public internet in any other situation is **strongly not recommended**. Make sure it listens only on a private network and is protected by a firewall.
+    Exposing the ClickHouse server to the public internet in any other situation is **strongly not recommended**. Make sure it listens only on a private network and is protected by a firewall.
--- a/docs/ru/images/play.png
+++ b/docs/ru/images/play.png
--- a/docs/ru/interfaces/http.md
+++ b/docs/ru/interfaces/http.md
@ -5,30 +5,33 @@ toc_title: "HTTP Interface"

 # HTTP Interface {#http-interface}

-The HTTP interface lets you use ClickHouse on any platform from any programming language. We use it for working from Java and Perl, as well as shell scripts. In other departments, the HTTP interface is used from Perl, Python, and Go. The HTTP interface is more limited than the native interface, but it has better compatibility.
+The HTTP interface lets you use ClickHouse on any platform from any programming language. We use it for working from Java and Perl, as well as shell scripts. In other departments the HTTP interface is used from Perl, Python, and Go. The HTTP interface is more limited than the native interface, but it has better compatibility.

-By default, clickhouse-server listens for HTTP on port 8123 (this can be changed in the config).
-If you make a GET / request without parameters, it returns the string defined by the [http_server_default_response](../operations/server-configuration-parameters/settings.md#server_configuration_parameters-http_server_default_response) setting. The default value is “Ok.” (with a line feed at the end).
+By default `clickhouse-server` listens for HTTP on port 8123 (this can be changed in the config).
+If you make a `GET /` request without parameters, it returns the string defined by the [http_server_default_response](../operations/server-configuration-parameters/settings.md#server_configuration_parameters-http_server_default_response) setting. The default value is “Ok.” (with a line feed at the end).

 ``` bash
 $ curl 'http://localhost:8123/'
 Ok.
 ```

-In health-check scripts you can use a GET /ping request without parameters. If the server is available it always returns “Ok.” (with a line feed at the end).
+The web UI is available at `http://localhost:8123/play`.
+
+![Web UI](../images/play.png)
+
+In health-check scripts you can use a `GET /ping` request without parameters. If the server is available, it always returns “Ok.” (with a line feed at the end).

 ``` bash
 $ curl 'http://localhost:8123/ping'
 Ok.
 ```

-The query is sent as a URL parameter named query, or as the request body when the POST method is used.
+The query is sent as a URL parameter named `query`, or as the request body when the POST method is used.
 Alternatively, the beginning of the query goes in the query URL parameter and the rest is sent via POST (why this is needed is explained below). The URL size is limited to 16 KB, so keep this in mind when sending large queries.

-If successful, you receive the 200 response code and the result of the query in the response body.
-If an error occurs, you receive the 500 response code and an error description text in the response body.
+If successful, the 200 response code and the result of the query are returned in the response body; if an error occurs, the 500 response code and an error description text are returned in the response body.

-When using the GET method, the readonly setting is set. In other words, for queries that modify data, you can only use the POST method. The query itself can be sent either in the POST body or in the URL parameter.
+When the GET method is used, the readonly setting is set. In other words, for queries that modify data, you can only use the POST method. The query itself can be sent either in the body of the POST request or in the URL parameter.

 Examples:

@ -51,8 +54,8 @@ X-ClickHouse-Summary: {"read_rows":"0","read_bytes":"0","written_rows":"0","writ
 1
 ```

-Как видно, curl немного неудобен тем, что надо URL-эскейпить пробелы.
-Хотя wget сам всё эскейпит, но его не рекомендуется использовать, так как он плохо работает по HTTP 1.1 при использовании keep-alive и Transfer-Encoding: chunked.
+Как видно, `curl` немного неудобен тем, что надо URL-эскейпить пробелы.
+Хотя `wget` сам всё эскейпит, но его не рекомендуется использовать, так как он плохо работает по HTTP 1.1 при использовании `keep-alive` и `Transfer-Encoding: chunked`.

 ``` bash
 $ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
@ -65,7 +68,7 @@ $ echo '1' | curl 'http://localhost:8123/?query=SELECT' --data-binary @-
 1
 ```

-Если часть запроса отправляется в параметре, а часть POST-ом, то между этими двумя кусками данных ставится перевод строки.
+Если часть запроса отправляется в параметре, а часть POST запросом, то между этими двумя кусками данных ставится перевод строки.
 Пример (так работать не будет):

 ``` bash
@ -75,9 +78,9 @@ ECT 1
 , expected One of: SHOW TABLES, SHOW DATABASES, SELECT, INSERT, CREATE, ATTACH, RENAME, DROP, DETACH, USE, SET, OPTIMIZE., e.what() = DB::Exception
 ```

-По умолчанию, данные возвращаются в формате TabSeparated (подробнее смотри раздел «Форматы»).
+По умолчанию данные возвращаются в формате [TabSeparated](formats.md#tabseparated).

-Можно попросить любой другой формат - с помощью секции FORMAT запроса.
+Можно указать любой другой формат с помощью секции FORMAT запроса.

 Кроме того, вы можете использовать параметр URL-адреса `default_format` или заголовок `X-ClickHouse-Format`, чтобы указать формат по умолчанию, отличный от `TabSeparated`.

@ -90,9 +93,10 @@ $ echo 'SELECT 1 FORMAT Pretty' | curl 'http://localhost:8123/?' --data-binary @
 └───┘
 ```

-Возможность передавать данные POST-ом нужна для INSERT-запросов. В этом случае вы можете написать начало запроса в параметре URL, а вставляемые данные передать POST-ом. Вставляемыми данными может быть, например, tab-separated дамп, полученный из MySQL. Таким образом, запрос INSERT заменяет LOAD DATA LOCAL INFILE из MySQL.
+Возможность передавать данные с помощью POST нужна для запросов `INSERT`. В этом случае вы можете написать начало запроса в параметре URL, а вставляемые данные передать POST запросом. Вставляемыми данными может быть, например, tab-separated дамп, полученный из MySQL. Таким образом, запрос `INSERT` заменяет `LOAD DATA LOCAL INFILE` из MySQL.
+
+**Примеры**

-Примеры:
 Создаём таблицу:

 ``` bash
@ -147,7 +151,7 @@ $ curl 'http://localhost:8123/?query=SELECT%20a%20FROM%20t'
 $ echo 'DROP TABLE t' | curl 'http://localhost:8123/' --data-binary @-
 ```

-Для запросов, которые не возвращают таблицу с данными, в случае успеха, выдаётся пустое тело ответа.
+Для запросов, которые не возвращают таблицу с данными, в случае успеха выдаётся пустое тело ответа.


 ## Сжатие {#compression}
@ -165,7 +169,7 @@ $ echo 'DROP TABLE t' | curl 'http://localhost:8123/' --data-binary @-
 - `deflate`
 - `xz`

-Для отправки сжатого запроса `POST`, добавьте заголовок `Content-Encoding: compression_method`. 
+Для отправки сжатого запроса `POST` добавьте заголовок `Content-Encoding: compression_method`. 
 Чтобы ClickHouse сжимал ответ, разрешите сжатие настройкой [enable_http_compression](../operations/settings/settings.md#settings-enable_http_compression) и добавьте заголовок `Accept-Encoding: compression_method`. Уровень сжатия данных для всех методов сжатия можно задать с помощью настройки [http_zlib_compression_level](../operations/settings/settings.md#settings-http_zlib_compression_level).

 !!! note "Примечание"
@ -281,13 +285,13 @@ X-ClickHouse-Progress: {"read_rows":"8783786","read_bytes":"819092887","total_ro

 HTTP интерфейс позволяет передать внешние данные (внешние временные таблицы) для использования запроса. Подробнее смотрите раздел «Внешние данные для обработки запроса»

-## Буферизация ответа {#buferizatsiia-otveta}
+## Буферизация ответа {#response-buffering}

 Существует возможность включить буферизацию ответа на стороне сервера. Для этого предусмотрены параметры URL `buffer_size` и `wait_end_of_query`.

 `buffer_size` определяет количество байт результата которые будут буферизованы в памяти сервера. Если тело результата больше этого порога, то буфер будет переписан в HTTP канал, а оставшиеся данные будут отправляться в HTTP-канал напрямую.

-Чтобы гарантировать буферизацию всего ответа необходимо выставить `wait_end_of_query=1`. В этом случае данные, не поместившиеся в памяти, будут буферизованы во временном файле сервера.
+Чтобы гарантировать буферизацию всего ответа, необходимо выставить `wait_end_of_query=1`. В этом случае данные, не поместившиеся в памяти, будут буферизованы во временном файле сервера.

 Пример:

@ -295,7 +299,7 @@ HTTP интерфейс позволяет передать внешние да
 $ curl -sS 'http://localhost:8123/?max_result_bytes=4000000&buffer_size=3000000&wait_end_of_query=1' -d 'SELECT toUInt8(number) FROM system.numbers LIMIT 9000000 FORMAT RowBinary'
 ```

-Буферизация позволяет избежать ситуации когда код ответа и HTTP-заголовки были отправлены клиенту, после чего возникла ошибка выполнения запроса. В такой ситуации сообщение об ошибке записывается в конце тела ответа, и на стороне клиента ошибка может быть обнаружена только на этапе парсинга.
+Буферизация позволяет избежать ситуации, когда код ответа и HTTP-заголовки были отправлены клиенту, после чего возникла ошибка выполнения запроса. В такой ситуации сообщение об ошибке записывается в конце тела ответа, и на стороне клиента ошибка может быть обнаружена только на этапе парсинга.

 ### Запросы с параметрами {#cli-queries-with-parameters}

@ -634,4 +638,3 @@ $ curl -vv -H 'XXX:xxx' 'http://localhost:8123/get_relative_path_static_handler'
 <html><body>Relative Path File</body></html>
 * Connection #0 to host localhost left intact
 ```
-
--- a/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure.md
+++ b/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure.md
@ -159,7 +159,7 @@ CREATE DICTIONARY somename (
 | Тег                                                  | Описание                                                                                                                                                                                                                                                                                                                                                      | Обязательный |
 |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
 | `name`                                               | Имя столбца.                                                                                                                                                                                                                                                                                                                                                  | Да           |
-| `type`                                               | Тип данных ClickHouse: [UInt8](../../../sql-reference/data-types/int-uint.md), [UInt16](../../../sql-reference/data-types/int-uint.md), [UInt32](../../../sql-reference/data-types/int-uint.md), [UInt64](../../../sql-reference/data-types/int-uint.md), [Int8](../../../sql-reference/data-types/int-uint.md), [Int16](../../../sql-reference/data-types/int-uint.md), [Int32](../../../sql-reference/data-types/int-uint.md), [Int64](../../../sql-reference/data-types/int-uint.md), [Float32](../../../sql-reference/data-types/float.md), [Float64](../../../sql-reference/data-types/float.md), [UUID](../../../sql-reference/data-types/uuid.md), [Decimal32](../../../sql-reference/data-types/decimal.md), [Decimal64](../../../sql-reference/data-types/decimal.md), [Decimal128](../../../sql-reference/data-types/decimal.md), [Decimal256](../../../sql-reference/data-types/decimal.md), [String](../../../sql-reference/data-types/string.md).<br/>ClickHouse пытается привести значение из словаря к заданному типу данных. Например, в случае MySQL, в таблице-источнике поле может быть `TEXT`, `VARCHAR`, `BLOB`, но загружено может быть как `String`. <br/>[Nullable](../../../sql-reference/data-types/nullable.md) в настоящее время поддерживается для словарей [Flat](external-dicts-dict-layout.md#flat), [Hashed](external-dicts-dict-layout.md#dicts-external_dicts_dict_layout-hashed), [ComplexKeyHashed](external-dicts-dict-layout.md#complex-key-hashed), [Direct](external-dicts-dict-layout.md#direct), [ComplexKeyDirect](external-dicts-dict-layout.md#complex-key-direct), [RangeHashed](external-dicts-dict-layout.md#range-hashed), [Polygon](external-dicts-dict-polygon.md), [Cache](external-dicts-dict-layout.md#cache), [ComplexKeyCache](external-dicts-dict-layout.md#complex-key-cache), [SSDCache](external-dicts-dict-layout.md#ssd-cache), [SSDComplexKeyCache](external-dicts-dict-layout.md#complex-key-ssd-cache). Для словарей [IPTrie](external-dicts-dict-layout.md#ip-trie) `Nullable`-типы не поддерживаются. | Да           |
+| `type`                                               | Тип данных ClickHouse: [UInt8](../../../sql-reference/data-types/int-uint.md), [UInt16](../../../sql-reference/data-types/int-uint.md), [UInt32](../../../sql-reference/data-types/int-uint.md), [UInt64](../../../sql-reference/data-types/int-uint.md), [Int8](../../../sql-reference/data-types/int-uint.md), [Int16](../../../sql-reference/data-types/int-uint.md), [Int32](../../../sql-reference/data-types/int-uint.md), [Int64](../../../sql-reference/data-types/int-uint.md), [Float32](../../../sql-reference/data-types/float.md), [Float64](../../../sql-reference/data-types/float.md), [UUID](../../../sql-reference/data-types/uuid.md), [Decimal32](../../../sql-reference/data-types/decimal.md), [Decimal64](../../../sql-reference/data-types/decimal.md), [Decimal128](../../../sql-reference/data-types/decimal.md), [Decimal256](../../../sql-reference/data-types/decimal.md), [String](../../../sql-reference/data-types/string.md), [Array](../../../sql-reference/data-types/array.md).<br/>ClickHouse пытается привести значение из словаря к заданному типу данных. Например, в случае MySQL, в таблице-источнике поле может быть `TEXT`, `VARCHAR`, `BLOB`, но загружено может быть как `String`. <br/>[Nullable](../../../sql-reference/data-types/nullable.md) в настоящее время поддерживается для словарей [Flat](external-dicts-dict-layout.md#flat), [Hashed](external-dicts-dict-layout.md#dicts-external_dicts_dict_layout-hashed), [ComplexKeyHashed](external-dicts-dict-layout.md#complex-key-hashed), [Direct](external-dicts-dict-layout.md#direct), [ComplexKeyDirect](external-dicts-dict-layout.md#complex-key-direct), [RangeHashed](external-dicts-dict-layout.md#range-hashed), [Polygon](external-dicts-dict-polygon.md), [Cache](external-dicts-dict-layout.md#cache), [ComplexKeyCache](external-dicts-dict-layout.md#complex-key-cache), [SSDCache](external-dicts-dict-layout.md#ssd-cache), [SSDComplexKeyCache](external-dicts-dict-layout.md#complex-key-ssd-cache). Для словарей [IPTrie](external-dicts-dict-layout.md#ip-trie) `Nullable`-типы не поддерживаются. | Да           |
 | `null_value`                                         | Значение по умолчанию для несуществующего элемента.<br/>В примере это пустая строка. Значение [NULL](../../syntax.md#null-literal) можно указывать только для типов `Nullable` (см. предыдущую строку с описанием типов).                                                                                                                                                                                                                                          | Да           |
 | `expression`                                         | [Выражение](../../syntax.md#syntax-expressions), которое ClickHouse выполняет со значением.<br/>Выражением может быть имя столбца в удаленной SQL базе. Таким образом, вы можете использовать его для создания псевдонима удаленного столбца.<br/><br/>Значение по умолчанию: нет выражения.                                                                  | Нет          |
 | <a name="hierarchical-dict-attr"></a> `hierarchical` | Если `true`, то атрибут содержит ключ предка для текущего элемента. Смотрите [Иерархические словари](external-dicts-dict-hierarchical.md).<br/><br/>Значение по умолчанию: `false`.                                                                                                                                                                                   | Нет           |
--- a/docs/ru/sql-reference/functions/logical-functions.md
+++ b/docs/ru/sql-reference/functions/logical-functions.md
@ -5,15 +5,186 @@ toc_title: "Логические функции"

 # Логические функции {#logicheskie-funktsii}

-Логические функции принимают любые числовые типы, а возвращают число типа UInt8, равное 0 или 1.
+Логические функции производят логические операции над любыми числовыми типами, а возвращают число типа [UInt8](../../sql-reference/data-types/int-uint.md), равное 0, 1, а в некоторых случаях `NULL`.

-Ноль в качестве аргумента считается «ложью», а любое ненулевое значение - «истиной».
+Ноль в качестве аргумента считается `ложью`, а любое ненулевое значение — `истиной`.

-## and, оператор AND {#and-operator-and}
+## and {#logical-and-function}

-## or, оператор OR {#or-operator-or}
+Вычисляет результат логической конъюнкции между двумя и более значениями. Соответствует [оператору логического "И"](../../sql-reference/operators/index.md#logical-and-operator).

-## not, оператор NOT {#not-operator-not}
+**Синтаксис**

-## xor {#xor}
+``` sql
+and(val1, val2...)
+```

+**Аргументы**
+
+-   `val1, val2, ...` — список из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Возвращаемое значение**
+
+-   `0`, если среди аргументов есть хотя бы один нуль.
+-   `NULL`, если среди аргументов нет нулей, но есть хотя бы один `NULL`.
+-   `1`, в остальных случаях.
+
+Тип: [UInt8](../../sql-reference/data-types/int-uint.md) или [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Пример**
+
+Запрос:
+
+``` sql
+SELECT and(0, 1, -2);
+```
+
+Результат:
+
+``` text
+┌─and(0, 1, -2)─┐
+│             0 │
+└───────────────┘
+```
+
+Со значениями `NULL`:
+
+``` sql
+SELECT and(NULL, 1, 10, -2);
+```
+
+Результат:
+
+``` text
+┌─and(NULL, 1, 10, -2)─┐
+│                 ᴺᵁᴸᴸ │
+└──────────────────────┘
+```
+
+## or {#logical-or-function}
+
+Вычисляет результат логической дизъюнкции между двумя и более значениями. Соответствует [оператору логического "ИЛИ"](../../sql-reference/operators/index.md#logical-or-operator).
+
+**Синтаксис**
+
+``` sql
+or(val1, val2...)
+```
+
+**Аргументы**
+
+-   `val1, val2, ...` — список из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Возвращаемое значение**
+
+-   `1`, если среди аргументов есть хотя бы одно ненулевое число.
+-   `0`, если среди аргументов только нули.
+-   `NULL`, если среди аргументов нет ненулевых значений, и есть `NULL`.
+
+Тип: [UInt8](../../sql-reference/data-types/int-uint.md) или [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Пример**
+
+Запрос:
+
+``` sql
+SELECT or(1, 0, 0, 2, NULL);
+```
+
+Результат:
+
+``` text
+┌─or(1, 0, 0, 2, NULL)─┐
+│                    1 │
+└──────────────────────┘
+```
+
+Со значениями `NULL`:
+
+``` sql
+SELECT or(0, NULL);
+```
+
+Результат:
+
+``` text
+┌─or(0, NULL)─┐
+│        ᴺᵁᴸᴸ │
+└─────────────┘
+```
+
+## not {#logical-not-function}
+
+Вычисляет результат логического отрицания аргумента. Соответствует [оператору логического отрицания](../../sql-reference/operators/index.md#logical-negation-operator).
+
+**Синтаксис**
+
+``` sql
+not(val);
+```
+
+**Аргументы**
+
+-   `val` — значение. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Возвращаемое значение**
+
+-   `1`, если `val` — это `0`.
+-   `0`, если `val` — это ненулевое число.
+-   `NULL`, если `val` — это `NULL`.
+
+Тип: [UInt8](../../sql-reference/data-types/int-uint.md) или [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Пример**
+
+Запрос:
+
+``` sql
+SELECT NOT(1);
+```
+
+Результат:
+
+``` text
+┌─not(1)─┐
+│      0 │
+└────────┘
+```
+
+## xor {#logical-xor-function}
+
+Вычисляет результат логической исключающей дизъюнкции между двумя и более значениями. При более чем двух значениях функция работает так: сначала вычисляет `XOR` для первых двух значений, а потом использует полученный результат при вычислении `XOR` со следующим значением и так далее.
+
+**Синтаксис**
+
+``` sql
+xor(val1, val2...)
+```
+
+**Аргументы**
+
+-   `val1, val2, ...` — список из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). 
+
+**Возвращаемое значение**
+
+-   `1`, для двух значений: если одно из значений является нулем, а второе нет.
+-   `0`, для двух значений: если оба значения одновременно нули или ненулевые числа.
+-   `NULL`, если среди аргументов хотя бы один `NULL`.
+
+Тип: [UInt8](../../sql-reference/data-types/int-uint.md) или [Nullable](../../sql-reference/data-types/nullable.md)([UInt8](../../sql-reference/data-types/int-uint.md)).
+
+**Пример**
+
+Запрос:
+
+``` sql
+SELECT xor(0, 1, 1);
+```
+
+Результат:
+
+``` text
+┌─xor(0, 1, 1)─┐
+│            0 │
+└──────────────┘
+```
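The return-value rules documented above for `and`, `or` and `xor` amount to three-valued logic. The following is a minimal C++ sketch of those rules only, not ClickHouse's implementation: the names `Tri`, `logicalAnd`, `logicalOr` and `logicalXor` are hypothetical, and `std::nullopt` is assumed to stand in for SQL `NULL`.

```cpp
#include <optional>
#include <vector>

// Hypothetical model: std::nullopt plays the role of NULL,
// 0 is false and any other number is true.
using Tri = std::optional<long long>;

// 0 if any argument is zero; NULL if there is no zero but at least one NULL; 1 otherwise.
Tri logicalAnd(const std::vector<Tri> & args)
{
    bool has_null = false;
    for (const auto & arg : args)
    {
        if (!arg)
            has_null = true;
        else if (*arg == 0)
            return 0;
    }
    if (has_null)
        return std::nullopt;
    return 1;
}

// 1 if any argument is non-zero; NULL if there is no non-zero value but at least one NULL; 0 otherwise.
Tri logicalOr(const std::vector<Tri> & args)
{
    bool has_null = false;
    for (const auto & arg : args)
    {
        if (!arg)
            has_null = true;
        else if (*arg != 0)
            return 1;
    }
    if (has_null)
        return std::nullopt;
    return 0;
}

// NULL if any argument is NULL; otherwise a left fold of pairwise XOR.
Tri logicalXor(const std::vector<Tri> & args)
{
    bool result = false;
    for (const auto & arg : args)
    {
        if (!arg)
            return std::nullopt;
        result = (result != (*arg != 0));
    }
    return result ? 1 : 0;
}
```

With this model, `logicalAnd({std::nullopt, 1, 10, -2})` yields no value and `logicalOr({1, 0, 0, 2, std::nullopt})` yields `1`, matching the documented examples.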
--- a/docs/ru/sql-reference/operators/index.md
+++ b/docs/ru/sql-reference/operators/index.md
@ -211,17 +211,17 @@ SELECT toDateTime('2014-10-26 00:00:00', 'Europe/Moscow') AS time, time + 60 * 6
 -   Тип данных [Interval](../../sql-reference/operators/index.md)
 -   Функции преобразования типов [toInterval](../../sql-reference/operators/index.md#function-tointerval)

-## Оператор логического отрицания {#operator-logicheskogo-otritsaniia}
+## Оператор логического "И" {#logical-and-operator}

-`NOT a` - функция `not(a)`
+Синтаксис `SELECT a AND b` — вычисляет логическую конъюнкцию между `a` и `b` функцией [and](../../sql-reference/functions/logical-functions.md#logical-and-function).

-## Оператор логического ‘И’ {#operator-logicheskogo-i}
+## Оператор логического "ИЛИ" {#logical-or-operator}

-`a AND b` - функция `and(a, b)`
+Синтаксис `SELECT a OR b` — вычисляет логическую дизъюнкцию между `a` и `b` функцией [or](../../sql-reference/functions/logical-functions.md#logical-or-function).

-## Оператор логического ‘ИЛИ’ {#operator-logicheskogo-ili}
+## Оператор логического отрицания {#logical-negation-operator}

-`a OR b` - функция `or(a, b)`
+Синтаксис `SELECT NOT a` — вычисляет логическое отрицание `a` функцией [not](../../sql-reference/functions/logical-functions.md#logical-not-function).

 ## Условный оператор {#uslovnyi-operator}

--- a/docs/ru/sql-reference/statements/select/distinct.md
+++ b/docs/ru/sql-reference/statements/select/distinct.md
@ -6,7 +6,7 @@ toc_title: DISTINCT

 Если указан `SELECT DISTINCT`, то в результате запроса останутся только уникальные строки. Таким образом, из всех наборов полностью совпадающих строк в результате останется только одна строка.

-## Обработк NULL {#null-processing}
+## Обработка NULL {#null-processing}

 `DISTINCT` работает с [NULL](../../syntax.md#null-literal) как-будто `NULL` — обычное значение и `NULL==NULL`. Другими словами, в результате `DISTINCT`, различные комбинации с `NULL` встретятся только один раз. Это отличается от обработки `NULL` в большинстве других контекстов.

--- a/docs/zh/sql-reference/functions/bitmap-functions.md
+++ b/docs/zh/sql-reference/functions/bitmap-functions.md
@ -81,7 +81,7 @@ SELECT bitmapToArray(bitmapSubsetInRange(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,
 **示例**

 ``` sql
-SELECT bitmapToArray(bitmapSubsetInRange(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(30), toUInt32(200))) AS res
+SELECT bitmapToArray(bitmapSubsetLimit(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(30), toUInt32(200))) AS res
 ```

    ┌─res───────────────────────┐
@ -174,7 +174,7 @@ SELECT bitmapToArray(bitmapAnd(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS re
    │ [3] │
    └─────┘

-## 位图 {#bitmapor}
+## 位图或 {#bitmapor}

 为两个位图对象进行或操作，返回一个新的位图对象。

--- a/programs/server/Server.cpp
+++ b/programs/server/Server.cpp
@ -1159,7 +1159,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
    {
        /// This object will periodically calculate some metrics.
        AsynchronousMetrics async_metrics(
-            global_context, config().getUInt("asynchronous_metrics_update_period_s", 60), servers_to_start_before_tables, servers);
+            global_context, config().getUInt("asynchronous_metrics_update_period_s", 1), servers_to_start_before_tables, servers);
        attachSystemTablesAsync(*DatabaseCatalog::instance().getSystemDatabase(), async_metrics);

        for (const auto & listen_host : listen_hosts)
--- a/programs/server/config.xml
+++ b/programs/server/config.xml
@ -583,7 +583,7 @@
        <port>9019</port>
    </jdbc_bridge>
    -->
-  
+
    <!-- Configuration of clusters that could be used in Distributed tables.
         https://clickhouse.tech/docs/en/operations/table_engines/distributed/
      -->
@ -917,7 +917,7 @@
            Asynchronous metrics are updated once a minute, so there is
            no need to flush more often.
        -->
-        <flush_interval_milliseconds>60000</flush_interval_milliseconds>
+        <flush_interval_milliseconds>7000</flush_interval_milliseconds>
    </asynchronous_metric_log>

    <!--
--- a/programs/server/play.html
+++ b/programs/server/play.html
@ -283,6 +283,29 @@
            color: var(--link-color);
            text-decoration: none;
        }
+
+        /* This is for graph in svg */
+        text
+        {
+            font-size: 14px;
+            fill: var(--text-color);
+        }
+
+        .node rect
+        {
+            fill: var(--element-background-color);
+            filter: drop-shadow(.2rem .2rem .2rem var(--shadow-color));
+        }
+
+        .edgePath path
+        {
+            stroke: var(--text-color);
+        }
+
+        marker
+        {
+            fill: var(--text-color);
+        }
    </style>
 </head>

@ -305,6 +328,7 @@
        <table class="monospace shadow" id="data-table"></table>
        <pre class="monospace shadow" id="data-unparsed"></pre>
    </div>
+    <svg id="graph" fill="none"></svg>
    <p id="error" class="monospace shadow">
    </p>
 </body>
@ -447,6 +471,12 @@
            table.removeChild(table.lastChild);
        }

+        let graph = document.getElementById('graph');
+        while (graph.firstChild) {
+            graph.removeChild(graph.lastChild);
+        }
+        graph.style.display = 'none';
+
        document.getElementById('data-unparsed').innerText = '';
        document.getElementById('data-unparsed').style.display = 'none';

@ -461,12 +491,21 @@

    function renderResult(response)
    {
-        //console.log(response);
        clear();

        let stats = document.getElementById('stats');
        stats.innerText = 'Elapsed: ' + response.statistics.elapsed.toFixed(3) + " sec, read " + response.statistics.rows_read + " rows.";

+        /// We can also render graphs if user performed EXPLAIN PIPELINE graph=1.
+        if (response.data.length > 3 && response.data[0][0] === "digraph" && document.getElementById('query').value.match(/^\s*EXPLAIN/i)) {
+            renderGraph(response);
+        } else {
+            renderTable(response);
+        }
+    }
+
+    function renderTable(response)
+    {
        let thead = document.createElement('thead');
        for (let idx in response.meta) {
            let th = document.createElement('th');
@ -559,6 +598,53 @
         document.getElementById('error').style.display = 'block';
     }
 
+    /// Huge JS libraries should be loaded only if needed.
+    function loadJS(src) {
+        return new Promise((resolve, reject) => {
+            const script = document.createElement('script');
+            script.src = src;
+            script.addEventListener('load', function() { resolve(true); });
+            /// Reject on failure so that the promise does not stay pending forever.
+            script.addEventListener('error', function() { reject(new Error('Failed to load ' + src)); });
+            document.head.appendChild(script);
+        });
+    }
+
+    let load_dagre_promise;
+    function loadDagre() {
+        if (load_dagre_promise) { return load_dagre_promise; }
+
+        load_dagre_promise = Promise.all([
+            loadJS('https://dagrejs.github.io/project/dagre/v0.8.5/dagre.min.js'),
+            loadJS('https://dagrejs.github.io/project/graphlib-dot/v0.6.4/graphlib-dot.min.js'),
+            loadJS('https://dagrejs.github.io/project/dagre-d3/v0.6.4/dagre-d3.min.js'),
+            loadJS('https://cdn.jsdelivr.net/npm/d3@7.0.0'),
+        ]);
+
+        return load_dagre_promise;
+    }
+
+    async function renderGraph(response)
+    {
+        await loadDagre();
+
+        /// https://github.com/dagrejs/dagre-d3/issues/131
+        const dot = response.data.map(row => row[0].replace(/shape\s*=\s*box/g, 'shape=rect')).join('\n');
+
+        let graph = graphlibDot.read(dot);
+        graph.graph().rankdir = 'TB';
+
+        let render = new dagreD3.render();
+
+        let svg = document.getElementById('graph');
+        svg.style.display = 'block';
+
+        render(d3.select("#graph"), graph);
+
+        svg.style.width = graph.graph().width;
+        svg.style.height = graph.graph().height;
+    }
+
    function setColorTheme(theme)
    {
        window.localStorage.setItem('theme', theme);
--- a/src/Common/CurrentMetrics.cpp
+++ b/src/Common/CurrentMetrics.cpp
@ -30,6 +30,8 @@
    M(OpenFileForWrite, "Number of files open for writing") \
    M(Read, "Number of read (read, pread, io_getevents, etc.) syscalls in fly") \
    M(Write, "Number of write (write, pwrite, io_getevents, etc.) syscalls in fly") \
+    M(NetworkReceive, "Number of threads receiving data from network. Only ClickHouse-related network interaction is included, not by 3rd party libraries.") \
+    M(NetworkSend, "Number of threads sending data to network. Only ClickHouse-related network interaction is included, not by 3rd party libraries.") \
    M(SendScalars, "Number of connections that are sending data for scalars to remote servers.") \
    M(SendExternalTables, "Number of connections that are sending data for external tables to remote servers. External tables are used to implement GLOBAL IN and GLOBAL JOIN operators with distributed subqueries.") \
    M(QueryThread, "Number of query processing threads") \
--- a/src/Common/ErrorCodes.cpp
+++ b/src/Common/ErrorCodes.cpp
@ -557,6 +557,7 @@
    M(587, CONCURRENT_ACCESS_NOT_SUPPORTED) \
    M(588, DISTRIBUTED_BROKEN_BATCH_INFO) \
    M(589, DISTRIBUTED_BROKEN_BATCH_FILES) \
+    M(590, CANNOT_SYSCONF) \
    \
    M(998, POSTGRESQL_CONNECTION_FAILURE) \
    M(999, KEEPER_EXCEPTION) \
--- a/src/Common/FieldVisitorsAccurateComparison.h
+++ b/src/Common/FieldVisitorsAccurateComparison.h
@ -117,4 +117,16 @@ public:
    }
 };

+
+class FieldVisitorAccurateLessOrEqual : public StaticVisitor<bool>
+{
+public:
+    template <typename T, typename U>
+    bool operator()(const T & l, const U & r) const
+    {
+        auto less_cmp = FieldVisitorAccurateLess();
+        return !less_cmp(r, l);
+    }
+};
+
 }
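The new visitor derives "less or equal" from an existing "less" comparison by swapping the arguments and negating the result. A minimal standalone sketch of that identity follows; `LessOrEqual` is a hypothetical helper built on `std::less<>`, not the ClickHouse `Field` visitor itself.

```cpp
#include <functional>

// Hypothetical helper: l <= r is computed as !(r < l),
// reusing whatever strict "less" comparison is supplied.
template <typename Less = std::less<>>
struct LessOrEqual
{
    template <typename T, typename U>
    bool operator()(const T & l, const U & r) const
    {
        return !Less{}(r, l);
    }
};
```

The identity only holds when the values are totally ordered. For an unordered pair such as a floating-point NaN and a number, `!(r < l)` is true even though `l <= r` is false; how `FieldVisitorAccurateLess` treats such values is not visible in this diff.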
--- a/src/Common/ProfileEvents.cpp
+++ b/src/Common/ProfileEvents.cpp
@ -49,8 +49,10 @@
    M(CreatedReadBufferMMapFailed, "") \
    M(DiskReadElapsedMicroseconds, "Total time spent waiting for read syscall. This include reads from page cache.") \
    M(DiskWriteElapsedMicroseconds, "Total time spent waiting for write syscall. This include writes to page cache.") \
-    M(NetworkReceiveElapsedMicroseconds, "") \
-    M(NetworkSendElapsedMicroseconds, "") \
+    M(NetworkReceiveElapsedMicroseconds, "Total time spent waiting for data to receive or receiving data from network. Only ClickHouse-related network interaction is included, not by 3rd party libraries.") \
+    M(NetworkSendElapsedMicroseconds, "Total time spent waiting for data to send to network or sending data to network. Only ClickHouse-related network interaction is included, not by 3rd party libraries.") \
+    M(NetworkReceiveBytes, "Total number of bytes received from network. Only ClickHouse-related network interaction is included, not by 3rd party libraries.") \
+    M(NetworkSendBytes, "Total number of bytes sent to network. Only ClickHouse-related network interaction is included, not by 3rd party libraries.") \
    M(ThrottlerSleepMicroseconds, "Total time a query was sleeping to conform the 'max_network_bandwidth' setting.") \
    \
    M(QueryMaskingRulesMatch, "Number of times query masking rules was successfully matched.") \
--- a/src/Common/ZooKeeper/ZooKeeperImpl.cpp
+++ b/src/Common/ZooKeeper/ZooKeeperImpl.cpp
@ -566,7 +566,6 @@ void ZooKeeper::sendThread()
                    if (info.watch)
                    {
                        info.request->has_watch = true;
-                        CurrentMetrics::add(CurrentMetrics::ZooKeeperWatch);
                    }

                    if (expired)
@ -773,6 +772,8 @@ void ZooKeeper::receiveEvent()

            if (add_watch)
            {
+                CurrentMetrics::add(CurrentMetrics::ZooKeeperWatch);
+
                /// The key of wathces should exclude the root_path
                String req_path = request_info.request->getPath();
                removeRootPath(req_path, root_path);
@ -852,7 +853,8 @@ void ZooKeeper::finalize(bool error_send, bool error_receive)
            }

            /// Send thread will exit after sending close request or on expired flag
-            send_thread.join();
+            if (send_thread.joinable())
+                send_thread.join();
        }

        /// Set expired flag after we sent close event
@ -869,7 +871,7 @@ void ZooKeeper::finalize(bool error_send, bool error_receive)
            tryLogCurrentException(__PRETTY_FUNCTION__);
        }

-        if (!error_receive)
+        if (!error_receive && receive_thread.joinable())
            receive_thread.join();

        {
@ -905,6 +907,7 @@ void ZooKeeper::finalize(bool error_send, bool error_receive)
        {
            std::lock_guard lock(watches_mutex);

+            Int64 watch_callback_count = 0;
            for (auto & path_watches : watches)
            {
                WatchResponse response;
@ -914,6 +917,7 @@ void ZooKeeper::finalize(bool error_send, bool error_receive)

                for (auto & callback : path_watches.second)
                {
+                    watch_callback_count += 1;
                    if (callback)
                    {
                        try
@ -928,7 +932,7 @@ void ZooKeeper::finalize(bool error_send, bool error_receive)
                }
            }

-            CurrentMetrics::sub(CurrentMetrics::ZooKeeperWatch, watches.size());
+            CurrentMetrics::sub(CurrentMetrics::ZooKeeperWatch, watch_callback_count);
            watches.clear();
        }
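The hunks above appear to move the `ZooKeeperWatch` increment to the point where a callback is registered, and to make `finalize` subtract the number of callbacks rather than `watches.size()`, which only counts distinct paths. A minimal sketch of the difference, assuming a plain `std::map` from path to callback list (the `Watches` alias and `countWatchCallbacks` are hypothetical names, not ClickHouse code):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real watch containers.
using WatchCallbacks = std::vector<std::function<void()>>;
using Watches = std::map<std::string, WatchCallbacks>;

// Counts every registered callback, including empty ones, which is what a
// metric incremented once per registered watch has to be decremented by.
// watches.size() would only give the number of distinct paths.
inline std::size_t countWatchCallbacks(const Watches & watches)
{
    std::size_t count = 0;
    for (const auto & [path, callbacks] : watches)
        count += callbacks.size();
    return count;
}
```

With two callbacks on one path and one on another, `watches.size()` is 2 while `countWatchCallbacks` gives 3, so subtracting the former would leave the gauge permanently too high.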

--- a/src/Common/hex.cpp
+++ b/src/Common/hex.cpp
@ -56,3 +56,37 @@ const char * const hex_char_to_digit_table =
    "\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"
    "\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"
    "\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff";
+
+const char * const bin_byte_to_char_table =
+    "0000000000000001000000100000001100000100000001010000011000000111"
+    "0000100000001001000010100000101100001100000011010000111000001111"
+    "0001000000010001000100100001001100010100000101010001011000010111"
+    "0001100000011001000110100001101100011100000111010001111000011111"
+    "0010000000100001001000100010001100100100001001010010011000100111"
+    "0010100000101001001010100010101100101100001011010010111000101111"
+    "0011000000110001001100100011001100110100001101010011011000110111"
+    "0011100000111001001110100011101100111100001111010011111000111111"
+    "0100000001000001010000100100001101000100010001010100011001000111"
+    "0100100001001001010010100100101101001100010011010100111001001111"
+    "0101000001010001010100100101001101010100010101010101011001010111"
+    "0101100001011001010110100101101101011100010111010101111001011111"
+    "0110000001100001011000100110001101100100011001010110011001100111"
+    "0110100001101001011010100110101101101100011011010110111001101111"
+    "0111000001110001011100100111001101110100011101010111011001110111"
+    "0111100001111001011110100111101101111100011111010111111001111111"
+    "1000000010000001100000101000001110000100100001011000011010000111"
+    "1000100010001001100010101000101110001100100011011000111010001111"
+    "1001000010010001100100101001001110010100100101011001011010010111"
+    "1001100010011001100110101001101110011100100111011001111010011111"
+    "1010000010100001101000101010001110100100101001011010011010100111"
+    "1010100010101001101010101010101110101100101011011010111010101111"
+    "1011000010110001101100101011001110110100101101011011011010110111"
+    "1011100010111001101110101011101110111100101111011011111010111111"
+    "1100000011000001110000101100001111000100110001011100011011000111"
+    "1100100011001001110010101100101111001100110011011100111011001111"
+    "1101000011010001110100101101001111010100110101011101011011010111"
+    "1101100011011001110110101101101111011100110111011101111011011111"
+    "1110000011100001111000101110001111100100111001011110011011100111"
+    "1110100011101001111010101110101111101100111011011110111011101111"
+    "1111000011110001111100101111001111110100111101011111011011110111"
+    "1111100011111001111110101111101111111100111111011111111011111111";
--- a/src/Common/hex.h
+++ b/src/Common/hex.h
@ -39,6 +39,12 @@ inline void writeHexByteLowercase(UInt8 byte, void * out)
    memcpy(out, &hex_byte_to_char_lowercase_table[static_cast<size_t>(byte) * 2], 2);
 }

+extern const char * const bin_byte_to_char_table;
+
+inline void writeBinByte(UInt8 byte, void * out)
+{
+    memcpy(out, &bin_byte_to_char_table[static_cast<size_t>(byte) * 8], 8);
+}

 /// Produces hex representation of an unsigned int with leading zeros (for checksums)
 template <typename TUInt>
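`writeBinByte` relies on a 256-entry table in which entry `b` holds the eight ASCII characters of `b` in binary, most significant bit first. A rough sketch of the same lookup, assuming the table is generated at startup instead of written out as a literal (`makeBinTable`, `writeBinByteSketch` and `binOfByte` are hypothetical names used only for illustration):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: builds the 256 * 8 character table that the diff
// spells out as a string literal.
static std::array<char, 256 * 8> makeBinTable()
{
    std::array<char, 256 * 8> table{};
    for (std::size_t byte = 0; byte < 256; ++byte)
        for (std::size_t bit = 0; bit < 8; ++bit)
            table[byte * 8 + bit] = ((byte >> (7 - bit)) & 1) ? '1' : '0';
    return table;
}

static const std::array<char, 256 * 8> bin_table = makeBinTable();

// Same shape as writeBinByte in the diff: copy 8 precomputed characters.
inline void writeBinByteSketch(std::uint8_t byte, void * out)
{
    std::memcpy(out, &bin_table[static_cast<std::size_t>(byte) * 8], 8);
}

// Hypothetical convenience wrapper returning the 8-character string.
inline std::string binOfByte(std::uint8_t byte)
{
    std::string result(8, '\0');
    writeBinByteSketch(byte, result.data());
    return result;
}
```

The lookup replaces eight shift-and-mask steps per byte with a single 8-byte copy, at the cost of a 2 KiB table.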
--- a/src/DataStreams/TTLAggregationAlgorithm.cpp
+++ b/src/DataStreams/TTLAggregationAlgorithm.cpp
@ -37,88 +37,115 @@ TTLAggregationAlgorithm::TTLAggregationAlgorithm(
        settings.compile_aggregate_expressions, settings.min_count_to_compile_aggregate_expression);

    aggregator = std::make_unique<Aggregator>(params);
+
+    if (isMaxTTLExpired())
+        new_ttl_info.finished = true;
 }

 void TTLAggregationAlgorithm::execute(Block & block)
 {
-    if (!block)
-    {
-        if (!aggregation_result.empty())
-        {
-            MutableColumns result_columns = header.cloneEmptyColumns();
-            finalizeAggregates(result_columns);
-            block = header.cloneWithColumns(std::move(result_columns));
-        }

-        return;
-    }
-
-    const auto & column_names = header.getNames();
+    bool some_rows_were_aggregated = false;
    MutableColumns result_columns = header.cloneEmptyColumns();
-    MutableColumns aggregate_columns = header.cloneEmptyColumns();

-    auto ttl_column = executeExpressionAndGetColumn(description.expression, block, description.result_column);
-    auto where_column = executeExpressionAndGetColumn(description.where_expression, block, description.where_result_column);
-
-    size_t rows_aggregated = 0;
-    size_t current_key_start = 0;
-    size_t rows_with_current_key = 0;
-
-    for (size_t i = 0; i < block.rows(); ++i)
+    if (!block) /// Empty block -- no more data, but we may still have some accumulated rows
    {
-        UInt32 cur_ttl = getTimestampByIndex(ttl_column.get(), i);
-        bool where_filter_passed = !where_column || where_column->getBool(i);
-        bool ttl_expired = isTTLExpired(cur_ttl) && where_filter_passed;
-
-        bool same_as_current = true;
-        for (size_t j = 0; j < description.group_by_keys.size(); ++j)
+        if (!aggregation_result.empty()) /// Still have some aggregated data, let's update TTL
        {
-            const String & key_column = description.group_by_keys[j];
-            const IColumn * values_column = block.getByName(key_column).column.get();
-            if (!same_as_current || (*values_column)[i] != current_key_value[j])
-            {
-                values_column->get(i, current_key_value[j]);
-                same_as_current = false;
-            }
-        }
-
-        if (!same_as_current)
-        {
-            if (rows_with_current_key)
-                calculateAggregates(aggregate_columns, current_key_start, rows_with_current_key);
            finalizeAggregates(result_columns);
-
-            current_key_start = rows_aggregated;
-            rows_with_current_key = 0;
+            some_rows_were_aggregated = true;
        }
-
-        if (ttl_expired)
+        else /// No block, all aggregated, just finish
        {
-            ++rows_with_current_key;
-            ++rows_aggregated;
-            for (const auto & name : column_names)
-            {
-                const IColumn * values_column = block.getByName(name).column.get();
-                auto & column = aggregate_columns[header.getPositionByName(name)];
-                column->insertFrom(*values_column, i);
-            }
-        }
-        else
-        {
-            new_ttl_info.update(cur_ttl);
-            for (const auto & name : column_names)
-            {
-                const IColumn * values_column = block.getByName(name).column.get();
-                auto & column = result_columns[header.getPositionByName(name)];
-                column->insertFrom(*values_column, i);
-            }
+            return;
        }
    }
+    else
+    {
+        const auto & column_names = header.getNames();
+        MutableColumns aggregate_columns = header.cloneEmptyColumns();

-    if (rows_with_current_key)
-        calculateAggregates(aggregate_columns, current_key_start, rows_with_current_key);
+        auto ttl_column = executeExpressionAndGetColumn(description.expression, block, description.result_column);
+        auto where_column = executeExpressionAndGetColumn(description.where_expression, block, description.where_result_column);
+
+        size_t rows_aggregated = 0;
+        size_t current_key_start = 0;
+        size_t rows_with_current_key = 0;
+
+        for (size_t i = 0; i < block.rows(); ++i)
+        {
+            UInt32 cur_ttl = getTimestampByIndex(ttl_column.get(), i);
+            bool where_filter_passed = !where_column || where_column->getBool(i);
+            bool ttl_expired = isTTLExpired(cur_ttl) && where_filter_passed;
+
+            bool same_as_current = true;
+            for (size_t j = 0; j < description.group_by_keys.size(); ++j)
+            {
+                const String & key_column = description.group_by_keys[j];
+                const IColumn * values_column = block.getByName(key_column).column.get();
+                if (!same_as_current || (*values_column)[i] != current_key_value[j])
+                {
+                    values_column->get(i, current_key_value[j]);
+                    same_as_current = false;
+                }
+            }
+
+            if (!same_as_current)
+            {
+                if (rows_with_current_key)
+                {
+                    some_rows_were_aggregated = true;
+                    calculateAggregates(aggregate_columns, current_key_start, rows_with_current_key);
+                }
+                finalizeAggregates(result_columns);
+
+                current_key_start = rows_aggregated;
+                rows_with_current_key = 0;
+            }
+
+            if (ttl_expired)
+            {
+                ++rows_with_current_key;
+                ++rows_aggregated;
+                for (const auto & name : column_names)
+                {
+                    const IColumn * values_column = block.getByName(name).column.get();
+                    auto & column = aggregate_columns[header.getPositionByName(name)];
+                    column->insertFrom(*values_column, i);
+                }
+            }
+            else
+            {
+                for (const auto & name : column_names)
+                {
+                    const IColumn * values_column = block.getByName(name).column.get();
+                    auto & column = result_columns[header.getPositionByName(name)];
+                    column->insertFrom(*values_column, i);
+                }
+            }
+        }
+
+        if (rows_with_current_key)
+        {
+            some_rows_were_aggregated = true;
+            calculateAggregates(aggregate_columns, current_key_start, rows_with_current_key);
+        }
+    }

    block = header.cloneWithColumns(std::move(result_columns));
+
+    /// If some rows were aggregated, we have to recalculate the TTL info
+    if (some_rows_were_aggregated)
+    {
+        auto ttl_column_after_aggregation = executeExpressionAndGetColumn(description.expression, block, description.result_column);
+        auto where_column_after_aggregation = executeExpressionAndGetColumn(description.where_expression, block, description.where_result_column);
+        for (size_t i = 0; i < block.rows(); ++i)
+        {
+            bool where_filter_passed = !where_column_after_aggregation || where_column_after_aggregation->getBool(i);
+            if (where_filter_passed)
+                new_ttl_info.update(getTimestampByIndex(ttl_column_after_aggregation.get(), i));
+        }
+    }
 }

 void TTLAggregationAlgorithm::calculateAggregates(const MutableColumns & aggregate_columns, size_t start_pos, size_t length)
@ -134,6 +161,7 @@ void TTLAggregationAlgorithm::calculateAggregates(const MutableColumns & aggrega

    aggregator->executeOnBlock(aggregate_chunk, length, aggregation_result, key_columns,
                               columns_for_aggregator, no_more_keys);
+
 }

 void TTLAggregationAlgorithm::finalizeAggregates(MutableColumns & result_columns)
@ -141,6 +169,7 @@ void TTLAggregationAlgorithm::finalizeAggregates(MutableColumns & result_columns
    if (!aggregation_result.empty())
    {
        auto aggregated_res = aggregator->convertToBlocks(aggregation_result, true, 1);
+
        for (auto & agg_block : aggregated_res)
        {
            for (const auto & it : description.set_parts)
--- a/src/DataStreams/TTLColumnAlgorithm.cpp
+++ b/src/DataStreams/TTLColumnAlgorithm.cpp
@ -21,6 +21,9 @@ TTLColumnAlgorithm::TTLColumnAlgorithm(
        new_ttl_info = old_ttl_info;
        is_fully_empty = false;
    }
+
+    if (isMaxTTLExpired())
+        new_ttl_info.finished = true;
 }

 void TTLColumnAlgorithm::execute(Block & block)
--- a/src/DataStreams/TTLDeleteAlgorithm.cpp
+++ b/src/DataStreams/TTLDeleteAlgorithm.cpp
@ -9,6 +9,9 @@ TTLDeleteAlgorithm::TTLDeleteAlgorithm(
 {
    if (!isMinTTLExpired())
        new_ttl_info = old_ttl_info;
+
+    if (isMaxTTLExpired())
+        new_ttl_info.finished = true;
 }

 void TTLDeleteAlgorithm::execute(Block & block)
--- a/src/DataTypes/Serializations/SerializationMap.cpp
+++ b/src/DataTypes/Serializations/SerializationMap.cpp
@ -80,8 +80,13 @@ void SerializationMap::deserializeBinary(IColumn & column, ReadBuffer & istr) co
 }


-template <typename Writer>
-void SerializationMap::serializeTextImpl(const IColumn & column, size_t row_num, bool quote_key, WriteBuffer & ostr, Writer && writer) const
+template <typename KeyWriter, typename ValueWriter>
+void SerializationMap::serializeTextImpl(
+    const IColumn & column,
+    size_t row_num,
+    WriteBuffer & ostr,
+    KeyWriter && key_writer,
+    ValueWriter && value_writer) const
 {
    const auto & column_map = assert_cast<const ColumnMap &>(column);

@ -98,17 +103,9 @@ void SerializationMap::serializeTextImpl(const IColumn & column, size_t row_num,
        if (i != offset)
            writeChar(',', ostr);

-        if (quote_key)
-        {
-            writeChar('"', ostr);
-            writer(key, nested_tuple.getColumn(0), i);
-            writeChar('"', ostr);
-        }
-        else
-            writer(key, nested_tuple.getColumn(0), i);
-
+        key_writer(ostr, key, nested_tuple.getColumn(0), i);
        writeChar(':', ostr);
-        writer(value, nested_tuple.getColumn(1), i);
+        value_writer(ostr, value, nested_tuple.getColumn(1), i);
    }
    writeChar('}', ostr);
 }
@ -148,13 +145,13 @@ void SerializationMap::deserializeTextImpl(IColumn & column, ReadBuffer & istr,
            if (*istr.position() == '}')
                break;

-            reader(key, key_column);
+            reader(istr, key, key_column);
            skipWhitespaceIfAny(istr);
            assertChar(':', istr);

            ++size;
            skipWhitespaceIfAny(istr);
-            reader(value, value_column);
+            reader(istr, value, value_column);

            skipWhitespaceIfAny(istr);
        }
@ -170,41 +167,45 @@ void SerializationMap::deserializeTextImpl(IColumn & column, ReadBuffer & istr,

 void SerializationMap::serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
 {
-    serializeTextImpl(column, row_num, /*quote_key=*/ false, ostr,
-        [&](const SerializationPtr & subcolumn_serialization, const IColumn & subcolumn, size_t pos)
-        {
-            subcolumn_serialization->serializeTextQuoted(subcolumn, pos, ostr, settings);
-        });
+    auto writer = [&settings](WriteBuffer & buf, const SerializationPtr & subcolumn_serialization, const IColumn & subcolumn, size_t pos)
+    {
+        subcolumn_serialization->serializeTextQuoted(subcolumn, pos, buf, settings);
+    };
+
+    serializeTextImpl(column, row_num, ostr, writer, writer);
 }

 void SerializationMap::deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
 {
    deserializeTextImpl(column, istr,
-        [&](const SerializationPtr & subcolumn_serialization, IColumn & subcolumn)
+        [&settings](ReadBuffer & buf, const SerializationPtr & subcolumn_serialization, IColumn & subcolumn)
        {
-            subcolumn_serialization->deserializeTextQuoted(subcolumn, istr, settings);
+            subcolumn_serialization->deserializeTextQuoted(subcolumn, buf, settings);
        });
 }

 void SerializationMap::serializeTextJSON(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
 {
-    /// We need to double-quote integer keys to produce valid JSON.
-    const auto & column_key = assert_cast<const ColumnMap &>(column).getNestedData().getColumn(0);
-    bool quote_key = !WhichDataType(column_key.getDataType()).isStringOrFixedString();
-
-    serializeTextImpl(column, row_num, quote_key, ostr,
-        [&](const SerializationPtr & subcolumn_serialization, const IColumn & subcolumn, size_t pos)
+    serializeTextImpl(column, row_num, ostr,
+        [&settings](WriteBuffer & buf, const SerializationPtr & subcolumn_serialization, const IColumn & subcolumn, size_t pos)
        {
-            subcolumn_serialization->serializeTextJSON(subcolumn, pos, ostr, settings);
+            /// We need to double-quote all keys (including integers) to produce valid JSON.
+            WriteBufferFromOwnString str_buf;
+            subcolumn_serialization->serializeText(subcolumn, pos, str_buf, settings);
+            writeJSONString(str_buf.str(), buf, settings);
+        },
+        [&settings](WriteBuffer & buf, const SerializationPtr & subcolumn_serialization, const IColumn & subcolumn, size_t pos)
+        {
+            subcolumn_serialization->serializeTextJSON(subcolumn, pos, buf, settings);
        });
 }

 void SerializationMap::deserializeTextJSON(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
 {
    deserializeTextImpl(column, istr,
-        [&](const SerializationPtr & subcolumn_serialization, IColumn & subcolumn)
+        [&settings](ReadBuffer & buf, const SerializationPtr & subcolumn_serialization, IColumn & subcolumn)
        {
-            subcolumn_serialization->deserializeTextJSON(subcolumn, istr, settings);
+            subcolumn_serialization->deserializeTextJSON(subcolumn, buf, settings);
        });
 }
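The `serializeTextJSON` change above writes every map key as a JSON string, since JSON object keys must be strings even when the ClickHouse key type is numeric. A simplified sketch of that idea, assuming a `std::map` with string values and an escaper that handles only quotes and backslashes (`quoteJSON` and `mapToJSON` are hypothetical names; the real code uses `writeJSONString`):

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical, simplified escaper: handles only '"' and '\\'.
// A complete JSON writer must also escape control characters.
inline std::string quoteJSON(const std::string & s)
{
    std::string out = "\"";
    for (char c : s)
    {
        if (c == '"' || c == '\\')
            out += '\\';
        out += c;
    }
    out += '"';
    return out;
}

// Renders the key as plain text first, then quotes it, so integer keys
// come out as "1" rather than 1.
template <typename Key>
std::string mapToJSON(const std::map<Key, std::string> & m)
{
    std::string out = "{";
    bool first = true;
    for (const auto & [key, value] : m)
    {
        if (!first)
            out += ',';
        first = false;

        std::ostringstream key_text;
        key_text << key;
        out += quoteJSON(key_text.str());
        out += ':';
        out += quoteJSON(value);
    }
    out += '}';
    return out;
}
```

Going through a text rendering of the key, as the diff does with `serializeText` followed by `writeJSONString`, means string keys are escaped once and numeric keys are quoted by the same code path.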

--- a/src/DataTypes/Serializations/SerializationMap.h
+++ b/src/DataTypes/Serializations/SerializationMap.h
@ -60,8 +60,8 @@ public:
        SubstreamsCache * cache) const override;

 private:
-    template <typename Writer>
-    void serializeTextImpl(const IColumn & column, size_t row_num, bool quote_key, WriteBuffer & ostr, Writer && writer) const;
+    template <typename KeyWriter, typename ValueWriter>
+    void serializeTextImpl(const IColumn & column, size_t row_num, WriteBuffer & ostr, KeyWriter && key_writer, ValueWriter && value_writer) const;

    template <typename Reader>
    void deserializeTextImpl(IColumn & column, ReadBuffer & istr, Reader && reader) const;
--- a/src/Dictionaries/ClickHouseDictionarySource.cpp
+++ b/src/Dictionaries/ClickHouseDictionarySource.cpp
@ -224,9 +224,7 @@ void registerDictionarySourceClickHouse(DictionarySourceFactory & factory)

        ClickHouseDictionarySource::Configuration configuration
        {
-            .secure = config.getBool(settings_config_prefix + ".secure", false),
            .host = host,
-            .port = port,
            .user = config.getString(settings_config_prefix + ".user", "default"),
            .password = config.getString(settings_config_prefix + ".password", ""),
            .db = config.getString(settings_config_prefix + ".db", default_database),
@ -235,7 +233,9 @@ void registerDictionarySourceClickHouse(DictionarySourceFactory & factory)
            .invalidate_query = config.getString(settings_config_prefix + ".invalidate_query", ""),
            .update_field = config.getString(settings_config_prefix + ".update_field", ""),
            .update_lag = config.getUInt64(settings_config_prefix + ".update_lag", 1),
-            .is_local = isLocalAddress({host, port}, default_port)
+            .port = port,
+            .is_local = isLocalAddress({host, port}, default_port),
+            .secure = config.getBool(settings_config_prefix + ".secure", false)
        };

        /// We should set user info even for the case when the dictionary is loaded in-process (without TCP communication).
--- a/src/Dictionaries/ClickHouseDictionarySource.h
+++ b/src/Dictionaries/ClickHouseDictionarySource.h
@ -20,9 +20,7 @@ class ClickHouseDictionarySource final : public IDictionarySource
 public:
    struct Configuration
    {
-        const bool secure;
        const std::string host;
-        const UInt16 port;
        const std::string user;
        const std::string password;
        const std::string db;
@ -31,7 +29,9 @@ public:
        const std::string invalidate_query;
        const std::string update_field;
        const UInt64 update_lag;
+        const UInt16 port;
        const bool is_local;
+        const bool secure;
    };

    ClickHouseDictionarySource(
--- a/src/Disks/IDiskRemote.cpp
+++ b/src/Disks/IDiskRemote.cpp
@ -417,7 +417,11 @@ void IDiskRemote::removeDirectory(const String & path)

 DiskDirectoryIteratorPtr IDiskRemote::iterateDirectory(const String & path)
 {
-    return std::make_unique<RemoteDiskDirectoryIterator>(metadata_path + path, path);
+    fs::path meta_path = fs::path(metadata_path) / path;
+    if (fs::exists(meta_path) && fs::is_directory(meta_path))
+        return std::make_unique<RemoteDiskDirectoryIterator>(meta_path, path);
+    else
+        return std::make_unique<RemoteDiskDirectoryIterator>();
 }
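The new `iterateDirectory` returns an empty iterator when the metadata path is missing or is not a directory, rather than constructing a directory iterator that would throw. The same guard can be sketched with `std::filesystem` alone (`iterateIfDirectory` is a hypothetical name, and the real code wraps the iterator in `RemoteDiskDirectoryIterator`):

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns an iterator over `path` if it is an existing directory, and the
// end iterator otherwise, so callers can loop without special-casing
// missing directories.
inline fs::directory_iterator iterateIfDirectory(const fs::path & path)
{
    std::error_code ec;
    if (fs::is_directory(path, ec))
        return fs::directory_iterator(path, ec); // end iterator if opening fails
    return fs::directory_iterator{};
}
```

A default-constructed `std::filesystem::directory_iterator` is the end iterator, which plays the role of the default-constructed `RemoteDiskDirectoryIterator` added in the header.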


--- a/src/Disks/IDiskRemote.h
+++ b/src/Disks/IDiskRemote.h
@ -193,6 +193,7 @@ struct IDiskRemote::Metadata
 class RemoteDiskDirectoryIterator final : public IDiskDirectoryIterator
 {
 public:
+    RemoteDiskDirectoryIterator() {}
    RemoteDiskDirectoryIterator(const String & full_path, const String & folder_path_) : iter(full_path), folder_path(folder_path_) {}

    void next() override { ++iter; }
--- a/src/Functions/FunctionsCoding.cpp
+++ b/src/Functions/FunctionsCoding.cpp
@ -21,6 +21,8 @@ void registerFunctionsCoding(FunctionFactory & factory)
    factory.registerFunction<FunctionUUIDStringToNum>();
    factory.registerFunction<FunctionHex>(FunctionFactory::CaseInsensitive);
    factory.registerFunction<FunctionUnhex>(FunctionFactory::CaseInsensitive);
+    factory.registerFunction<FunctionBin>(FunctionFactory::CaseInsensitive);
+    factory.registerFunction<FunctionUnbin>(FunctionFactory::CaseInsensitive);
    factory.registerFunction<FunctionChar>(FunctionFactory::CaseInsensitive);
    factory.registerFunction<FunctionBitmaskToArray>();
    factory.registerFunction<FunctionBitPositionsToArray>();
--- a/src/Functions/FunctionsCoding.h
+++ b/src/Functions/FunctionsCoding.h
@ -65,7 +65,6 @@ namespace ErrorCodes
 constexpr size_t uuid_bytes_length = 16;
 constexpr size_t uuid_text_length = 36;

-
 class FunctionIPv6NumToString : public IFunction
 {
 public:
@ -951,19 +950,22 @@ public:
    }
 };

-
-class FunctionHex : public IFunction
+/// Encode number or string to string with binary or hexadecimal representation
+template <typename Impl>
+class EncodeToBinaryRepr : public IFunction
 {
 public:
-    static constexpr auto name = "hex";
-    static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionHex>(); }
+    static constexpr auto name = Impl::name;
+    static constexpr size_t word_size = Impl::word_size;

-    String getName() const override
-    {
-        return name;
-    }
+    static FunctionPtr create(ContextPtr) { return std::make_shared<EncodeToBinaryRepr>(); }
+
+    String getName() const override { return name; }

    size_t getNumberOfArguments() const override { return 1; }
+
+    bool useDefaultImplementationForConstants() const override { return true; }
+
    bool isInjective(const ColumnsWithTypeAndName &) const override { return true; }

    DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
@ -983,235 +985,6 @@ public:
        return std::make_shared<DataTypeString>();
    }

-    template <typename T>
-    void executeOneUInt(T x, char *& out) const
-    {
-        bool was_nonzero = false;
-        for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8)
-        {
-            UInt8 byte = x >> offset;
-
-            /// Leading zeros.
-            if (byte == 0 && !was_nonzero && offset)  // -V560
-                continue;
-
-            was_nonzero = true;
-
-            writeHexByteUppercase(byte, out);
-            out += 2;
-        }
-        *out = '\0';
-        ++out;
-    }
-
-    template <typename T>
-    bool tryExecuteUInt(const IColumn * col, ColumnPtr & col_res) const
-    {
-        const ColumnVector<T> * col_vec = checkAndGetColumn<ColumnVector<T>>(col);
-
-        static constexpr size_t MAX_UINT_HEX_LENGTH = sizeof(T) * 2 + 1;    /// Including trailing zero byte.
-
-        if (col_vec)
-        {
-            auto col_str = ColumnString::create();
-            ColumnString::Chars & out_vec = col_str->getChars();
-            ColumnString::Offsets & out_offsets = col_str->getOffsets();
-
-            const typename ColumnVector<T>::Container & in_vec = col_vec->getData();
-
-            size_t size = in_vec.size();
-            out_offsets.resize(size);
-            out_vec.resize(size * 3 + MAX_UINT_HEX_LENGTH); /// 3 is length of one byte in hex plus zero byte.
-
-            size_t pos = 0;
-            for (size_t i = 0; i < size; ++i)
-            {
-                /// Manual exponential growth, so as not to rely on the linear amortized work time of `resize` (no one guarantees it).
-                if (pos + MAX_UINT_HEX_LENGTH > out_vec.size())
-                    out_vec.resize(out_vec.size() * 2 + MAX_UINT_HEX_LENGTH);
-
-                char * begin = reinterpret_cast<char *>(&out_vec[pos]);
-                char * end = begin;
-                executeOneUInt<T>(in_vec[i], end);
-
-                pos += end - begin;
-                out_offsets[i] = pos;
-            }
-
-            out_vec.resize(pos);
-
-            col_res = std::move(col_str);
-            return true;
-        }
-        else
-        {
-            return false;
-        }
-    }
-
-    template <typename T>
-    void executeFloatAndDecimal(const T & in_vec, ColumnPtr & col_res, const size_t type_size_in_bytes) const
-    {
-        const size_t hex_length = type_size_in_bytes * 2 + 1; /// Including trailing zero byte.
-        auto col_str = ColumnString::create();
-
-        ColumnString::Chars & out_vec = col_str->getChars();
-        ColumnString::Offsets & out_offsets = col_str->getOffsets();
-
-        size_t size = in_vec.size();
-        out_offsets.resize(size);
-        out_vec.resize(size * hex_length);
-
-        size_t pos = 0;
-        char * out = reinterpret_cast<char *>(&out_vec[0]);
-        for (size_t i = 0; i < size; ++i)
-        {
-            const UInt8 * in_pos = reinterpret_cast<const UInt8 *>(&in_vec[i]);
-            executeOneString(in_pos, in_pos + type_size_in_bytes, out);
-
-            pos += hex_length;
-            out_offsets[i] = pos;
-        }
-        col_res = std::move(col_str);
-    }
-
-    template <typename T>
-    bool tryExecuteFloat(const IColumn * col, ColumnPtr & col_res) const
-    {
-        const ColumnVector<T> * col_vec = checkAndGetColumn<ColumnVector<T>>(col);
-        if (col_vec)
-        {
-            const typename ColumnVector<T>::Container & in_vec = col_vec->getData();
-            executeFloatAndDecimal<typename ColumnVector<T>::Container>(in_vec, col_res, sizeof(T));
-            return true;
-        }
-        else
-        {
-            return false;
-        }
-    }
-
-    template <typename T>
-    bool tryExecuteDecimal(const IColumn * col, ColumnPtr & col_res) const
-    {
-        const ColumnDecimal<T> * col_dec = checkAndGetColumn<ColumnDecimal<T>>(col);
-        if (col_dec)
-        {
-            const typename ColumnDecimal<T>::Container & in_vec = col_dec->getData();
-            executeFloatAndDecimal<typename ColumnDecimal<T>::Container>(in_vec, col_res, sizeof(T));
-            return true;
-        }
-        else
-        {
-            return false;
-        }
-    }
-
-
-    static void executeOneString(const UInt8 * pos, const UInt8 * end, char *& out)
-    {
-        while (pos < end)
-        {
-            writeHexByteUppercase(*pos, out);
-            ++pos;
-            out += 2;
-        }
-        *out = '\0';
-        ++out;
-    }
-
-    static bool tryExecuteString(const IColumn * col, ColumnPtr & col_res)
-    {
-        const ColumnString * col_str_in = checkAndGetColumn<ColumnString>(col);
-
-        if (col_str_in)
-        {
-            auto col_str = ColumnString::create();
-            ColumnString::Chars & out_vec = col_str->getChars();
-            ColumnString::Offsets & out_offsets = col_str->getOffsets();
-
-            const ColumnString::Chars & in_vec = col_str_in->getChars();
-            const ColumnString::Offsets & in_offsets = col_str_in->getOffsets();
-
-            size_t size = in_offsets.size();
-            out_offsets.resize(size);
-            out_vec.resize(in_vec.size() * 2 - size);
-
-            char * begin = reinterpret_cast<char *>(out_vec.data());
-            char * pos = begin;
-            size_t prev_offset = 0;
-
-            for (size_t i = 0; i < size; ++i)
-            {
-                size_t new_offset = in_offsets[i];
-
-                executeOneString(&in_vec[prev_offset], &in_vec[new_offset - 1], pos);
-
-                out_offsets[i] = pos - begin;
-
-                prev_offset = new_offset;
-            }
-
-            if (!out_offsets.empty() && out_offsets.back() != out_vec.size())
-                throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR);
-
-            col_res = std::move(col_str);
-            return true;
-        }
-        else
-        {
-            return false;
-        }
-    }
-
-    static bool tryExecuteFixedString(const IColumn * col, ColumnPtr & col_res)
-    {
-        const ColumnFixedString * col_fstr_in = checkAndGetColumn<ColumnFixedString>(col);
-
-        if (col_fstr_in)
-        {
-            auto col_str = ColumnString::create();
-            ColumnString::Chars & out_vec = col_str->getChars();
-            ColumnString::Offsets & out_offsets = col_str->getOffsets();
-
-            const ColumnString::Chars & in_vec = col_fstr_in->getChars();
-
-            size_t size = col_fstr_in->size();
-
-            out_offsets.resize(size);
-            out_vec.resize(in_vec.size() * 2 + size);
-
-            char * begin = reinterpret_cast<char *>(out_vec.data());
-            char * pos = begin;
-
-            size_t n = col_fstr_in->getN();
-
-            size_t prev_offset = 0;
-
-            for (size_t i = 0; i < size; ++i)
-            {
-                size_t new_offset = prev_offset + n;
-
-                executeOneString(&in_vec[prev_offset], &in_vec[new_offset], pos);
-
-                out_offsets[i] = pos - begin;
-                prev_offset = new_offset;
-            }
-
-            if (!out_offsets.empty() && out_offsets.back() != out_vec.size())
-                throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR);
-
-            col_res = std::move(col_str);
-            return true;
-        }
-        else
-        {
-            return false;
-        }
-    }
-
-    bool useDefaultImplementationForConstants() const override { return true; }
-
    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
    {
        const IColumn * column = arguments[0].column.get();
@ -1234,19 +1007,185 @@ public:
                        + " of argument of function " + getName(),
                        ErrorCodes::ILLEGAL_COLUMN);
    }
+
+    template <typename T>
+    bool tryExecuteUInt(const IColumn * col, ColumnPtr & col_res) const
+    {
+        const ColumnVector<T> * col_vec = checkAndGetColumn<ColumnVector<T>>(col);
+
+        static constexpr size_t MAX_LENGTH = sizeof(T) * word_size + 1;    /// Including trailing zero byte.
+
+        if (col_vec)
+        {
+            auto col_str = ColumnString::create();
+            ColumnString::Chars & out_vec = col_str->getChars();
+            ColumnString::Offsets & out_offsets = col_str->getOffsets();
+
+            const typename ColumnVector<T>::Container & in_vec = col_vec->getData();
+
+            size_t size = in_vec.size();
+            out_offsets.resize(size);
+            out_vec.resize(size * (word_size+1) + MAX_LENGTH); /// word_size+1 is length of one byte in hex/bin plus zero byte.
+
+            size_t pos = 0;
+            for (size_t i = 0; i < size; ++i)
+            {
+                /// Manual exponential growth, so as not to rely on the linear amortized work time of `resize` (no one guarantees it).
+                if (pos + MAX_LENGTH > out_vec.size())
+                    out_vec.resize(out_vec.size() * word_size + MAX_LENGTH);
+
+                char * begin = reinterpret_cast<char *>(&out_vec[pos]);
+                char * end = begin;
+                Impl::executeOneUInt(in_vec[i], end);
+
+                pos += end - begin;
+                out_offsets[i] = pos;
+            }
+            out_vec.resize(pos);
+
+            col_res = std::move(col_str);
+            return true;
+        }
+        else
+        {
+            return false;
+        }
+    }
+
+    bool tryExecuteString(const IColumn *col, ColumnPtr &col_res) const
+    {
+        const ColumnString * col_str_in = checkAndGetColumn<ColumnString>(col);
+
+        if (col_str_in)
+        {
+            auto col_str = ColumnString::create();
+            ColumnString::Chars & out_vec = col_str->getChars();
+            ColumnString::Offsets & out_offsets = col_str->getOffsets();
+
+            const ColumnString::Chars & in_vec = col_str_in->getChars();
+            const ColumnString::Offsets & in_offsets = col_str_in->getOffsets();
+
+            size_t size = in_offsets.size();
+
+            out_offsets.resize(size);
+            /// Reserve `word_size` bytes for each input byte that is not a trailing zero, plus `size` bytes for the trailing zeros.
+            out_vec.resize((in_vec.size() - size) * word_size + size);
+
+            char * begin = reinterpret_cast<char *>(out_vec.data());
+            char * pos = begin;
+            size_t prev_offset = 0;
+
+            for (size_t i = 0; i < size; ++i)
+            {
+                size_t new_offset = in_offsets[i];
+
+                Impl::executeOneString(&in_vec[prev_offset], &in_vec[new_offset - 1], pos);
+
+                out_offsets[i] = pos - begin;
+
+                prev_offset = new_offset;
+            }
+            if (!out_offsets.empty() && out_offsets.back() != out_vec.size())
+                throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR);
+
+            col_res = std::move(col_str);
+            return true;
+        }
+        else
+        {
+            return false;
+        }
+    }
+
+    template <typename T>
+    bool tryExecuteDecimal(const IColumn * col, ColumnPtr & col_res) const
+    {
+        const ColumnDecimal<T> * col_dec = checkAndGetColumn<ColumnDecimal<T>>(col);
+        if (col_dec)
+        {
+            const typename ColumnDecimal<T>::Container & in_vec = col_dec->getData();
+            Impl::executeFloatAndDecimal(in_vec, col_res, sizeof(T));
+            return true;
+        }
+        else
+        {
+            return false;
+        }
+    }
+
+    static bool tryExecuteFixedString(const IColumn * col, ColumnPtr & col_res)
+    {
+        const ColumnFixedString * col_fstr_in = checkAndGetColumn<ColumnFixedString>(col);
+
+        if (col_fstr_in)
+        {
+            auto col_str = ColumnString::create();
+            ColumnString::Chars & out_vec = col_str->getChars();
+            ColumnString::Offsets & out_offsets = col_str->getOffsets();
+
+            const ColumnString::Chars & in_vec = col_fstr_in->getChars();
+
+            size_t size = col_fstr_in->size();
+
+            out_offsets.resize(size);
+            out_vec.resize(in_vec.size() * word_size + size);
+
+            char * begin = reinterpret_cast<char *>(out_vec.data());
+            char * pos = begin;
+
+            size_t n = col_fstr_in->getN();
+
+            size_t prev_offset = 0;
+
+            for (size_t i = 0; i < size; ++i)
+            {
+                size_t new_offset = prev_offset + n;
+
+                Impl::executeOneString(&in_vec[prev_offset], &in_vec[new_offset], pos);
+
+                out_offsets[i] = pos - begin;
+                prev_offset = new_offset;
+            }
+
+            if (!out_offsets.empty() && out_offsets.back() != out_vec.size())
+                throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR);
+
+            col_res = std::move(col_str);
+            return true;
+        }
+        else
+        {
+            return false;
+        }
+    }
+
+    template <typename T>
+    bool tryExecuteFloat(const IColumn * col, ColumnPtr & col_res) const
+    {
+        const ColumnVector<T> * col_vec = checkAndGetColumn<ColumnVector<T>>(col);
+        if (col_vec)
+        {
+            const typename ColumnVector<T>::Container & in_vec = col_vec->getData();
+            Impl::executeFloatAndDecimal(in_vec, col_res, sizeof(T));
+            return true;
+        }
+        else
+        {
+            return false;
+        }
+    }
 };

-
-class FunctionUnhex : public IFunction
+/// Decode number or string from string with binary or hexadecimal representation
+template <typename Impl>
+class DecodeFromBinaryRepr : public IFunction
 {
 public:
-    static constexpr auto name = "unhex";
-    static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionUnhex>(); }
+    static constexpr auto name = Impl::name;
+    static constexpr size_t word_size = Impl::word_size;
+    static FunctionPtr create(ContextPtr) { return std::make_shared<DecodeFromBinaryRepr>(); }

-    String getName() const override
-    {
-        return name;
-    }
+    String getName() const override { return name; }

    size_t getNumberOfArguments() const override { return 1; }
    bool isInjective(const ColumnsWithTypeAndName &) const override { return true; }
@ -1255,29 +1194,11 @@ public:
    {
        if (!isString(arguments[0]))
            throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(),
-            ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
+                            ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

        return std::make_shared<DataTypeString>();
    }

-    static void unhexOne(const char * pos, const char * end, char *& out)
-    {
-        if ((end - pos) & 1)
-        {
-            *out = unhex(*pos);
-            ++out;
-            ++pos;
-        }
-        while (pos < end)
-        {
-            *out = unhex2(pos);
-            pos += 2;
-            ++out;
-        }
-        *out = '\0';
-        ++out;
-    }
-
    bool useDefaultImplementationForConstants() const override { return true; }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
@ -1296,7 +1217,7 @@ public:

            size_t size = in_offsets.size();
            out_offsets.resize(size);
-            out_vec.resize(in_vec.size() / 2 + size);
+            out_vec.resize(in_vec.size() / word_size + size);

            char * begin = reinterpret_cast<char *>(out_vec.data());
            char * pos = begin;
@ -1306,7 +1227,7 @@ public:
            {
                size_t new_offset = in_offsets[i];

-                unhexOne(reinterpret_cast<const char *>(&in_vec[prev_offset]), reinterpret_cast<const char *>(&in_vec[new_offset - 1]), pos);
+                Impl::decode(reinterpret_cast<const char *>(&in_vec[prev_offset]), reinterpret_cast<const char *>(&in_vec[new_offset - 1]), pos);

                out_offsets[i] = pos - begin;

@ -1326,6 +1247,219 @@ public:
    }
 };

+struct HexImpl
+{
+    static constexpr auto name = "hex";
+    static constexpr size_t word_size = 2;
+
+    template <typename T>
+    static void executeOneUInt(T x, char *& out)
+    {
+        bool was_nonzero = false;
+        for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8)
+        {
+            UInt8 byte = x >> offset;
+
+            /// Skip leading zeros
+            if (byte == 0 && !was_nonzero && offset)
+                continue;
+
+            was_nonzero = true;
+            writeHexByteUppercase(byte, out);
+            out += word_size;
+        }
+        *out = '\0';
+        ++out;
+    }
+
+    static void executeOneString(const UInt8 * pos, const UInt8 * end, char *& out)
+    {
+        while (pos < end)
+        {
+            writeHexByteUppercase(*pos, out);
+            ++pos;
+            out += word_size;
+        }
+        *out = '\0';
+        ++out;
+    }
+
+    template <typename T>
+    static void executeFloatAndDecimal(const T & in_vec, ColumnPtr & col_res, const size_t type_size_in_bytes)
+    {
+        const size_t hex_length = type_size_in_bytes * word_size + 1; /// Including trailing zero byte.
+        auto col_str = ColumnString::create();
+
+        ColumnString::Chars & out_vec = col_str->getChars();
+        ColumnString::Offsets & out_offsets = col_str->getOffsets();
+
+        size_t size = in_vec.size();
+        out_offsets.resize(size);
+        out_vec.resize(size * hex_length);
+
+        size_t pos = 0;
+        char * out = reinterpret_cast<char *>(&out_vec[0]);
+        for (size_t i = 0; i < size; ++i)
+        {
+            const UInt8 * in_pos = reinterpret_cast<const UInt8 *>(&in_vec[i]);
+            executeOneString(in_pos, in_pos + type_size_in_bytes, out);
+
+            pos += hex_length;
+            out_offsets[i] = pos;
+        }
+        col_res = std::move(col_str);
+    }
+};
+
+struct UnhexImpl
+{
+    static constexpr auto name = "unhex";
+    static constexpr size_t word_size = 2;
+
+    static void decode(const char * pos, const char * end, char *& out)
+    {
+        if ((end - pos) & 1)
+        {
+            *out = unhex(*pos);
+            ++out;
+            ++pos;
+        }
+        while (pos < end)
+        {
+            *out = unhex2(pos);
+            pos += word_size;
+            ++out;
+        }
+        *out = '\0';
+        ++out;
+    }
+};
+
+struct BinImpl
+{
+    static constexpr auto name = "bin";
+    static constexpr size_t word_size = 8;
+
+    template <typename T>
+    static void executeOneUInt(T x, char *& out)
+    {
+        bool was_nonzero = false;
+        for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8)
+        {
+            UInt8 byte = x >> offset;
+
+            /// Skip leading zeros
+            if (byte == 0 && !was_nonzero && offset)
+                continue;
+
+            was_nonzero = true;
+            writeBinByte(byte, out);
+            out += word_size;
+        }
+        *out = '\0';
+        ++out;
+    }
+
+    template <typename T>
+    static void executeFloatAndDecimal(const T & in_vec, ColumnPtr & col_res, const size_t type_size_in_bytes)
+    {
+        const size_t hex_length = type_size_in_bytes * word_size + 1; /// Including trailing zero byte.
+        auto col_str = ColumnString::create();
+
+        ColumnString::Chars & out_vec = col_str->getChars();
+        ColumnString::Offsets & out_offsets = col_str->getOffsets();
+
+        size_t size = in_vec.size();
+        out_offsets.resize(size);
+        out_vec.resize(size * hex_length);
+
+        size_t pos = 0;
+        char * out = reinterpret_cast<char *>(out_vec.data());
+        for (size_t i = 0; i < size; ++i)
+        {
+            const UInt8 * in_pos = reinterpret_cast<const UInt8 *>(&in_vec[i]);
+            executeOneString(in_pos, in_pos + type_size_in_bytes, out);
+
+            pos += hex_length;
+            out_offsets[i] = pos;
+        }
+        col_res = std::move(col_str);
+    }
+
+    static void executeOneString(const UInt8 * pos, const UInt8 * end, char *& out)
+    {
+        while (pos < end)
+        {
+            writeBinByte(*pos, out);
+            ++pos;
+            out += word_size;
+        }
+        *out = '\0';
+        ++out;
+    }
+};
+
+struct UnbinImpl
+{
+    static constexpr auto name = "unbin";
+    static constexpr size_t word_size = 8;
+
+    static void decode(const char * pos, const char * end, char *& out)
+    {
+        if (pos == end)
+        {
+            *out = '\0';
+            ++out;
+            return;
+        }
+
+        UInt8 left = 0;
+
+        /// end - pos is the length of the input.
+        /// The first (length & 7) characters are decoded separately, so that the length of the rest is a multiple of 8.
+        /// E.g. for the 9-character input "101000001", left_cnt starts at 1:
+        /// the first character '1' is consumed and left becomes 1;
+        /// the remaining input "01000001" is then decoded as one whole byte.
+        for (UInt8 left_cnt = (end - pos) & 7; left_cnt > 0; --left_cnt)
+        {
+            left = left << 1;
+            if (*pos != '0')
+                left += 1;
+            ++pos;
+        }
+
+        if (left != 0 || end - pos == 0)
+        {
+            *out = left;
+            ++out;
+        }
+
+        assert((end - pos) % 8 == 0);
+
+        while (end - pos != 0)
+        {
+            UInt8 c = 0;
+            for (UInt8 i = 0; i < 8; ++i)
+            {
+                c = c << 1;
+                if (*pos != '0')
+                    c += 1;
+                ++pos;
+            }
+            *out = c;
+            ++out;
+        }
+
+        *out = '\0';
+        ++out;
+    }
+};
+
+using FunctionHex = EncodeToBinaryRepr<HexImpl>;
+using FunctionUnhex = DecodeFromBinaryRepr<UnhexImpl>;
+using FunctionBin = EncodeToBinaryRepr<BinImpl>;
+using FunctionUnbin = DecodeFromBinaryRepr<UnbinImpl>;
+
 class FunctionChar : public IFunction
 {
 public:
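Note on the hunk above: the bit-by-bit decoding in `UnbinImpl::decode` can be sketched outside ClickHouse roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `unbinSketch` is a hypothetical helper name, it works on `std::string` instead of raw column buffers, and it does not write the trailing zero byte that `ColumnString` requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical standalone helper (not part of ClickHouse).
// As in the diff, any character other than '0' counts as a set bit.
std::string unbinSketch(const std::string & bits)
{
    std::string out;
    std::size_t pos = 0;
    const std::size_t end = bits.size();
    if (pos == end)
        return out;

    // Decode the leading (length % 8) characters as one partial byte,
    // so that the rest of the input splits evenly into groups of 8.
    unsigned char left = 0;
    for (std::size_t left_cnt = end & 7; left_cnt > 0; --left_cnt)
    {
        left = static_cast<unsigned char>(left << 1);
        if (bits[pos] != '0')
            left += 1;
        ++pos;
    }

    // Emit the partial byte unless it is zero and full bytes still follow.
    if (left != 0 || pos == end)
        out.push_back(static_cast<char>(left));

    // Decode the remaining input, 8 characters per output byte.
    while (pos != end)
    {
        unsigned char c = 0;
        for (int i = 0; i < 8; ++i)
        {
            c = static_cast<unsigned char>(c << 1);
            if (bits[pos] != '0')
                c += 1;
            ++pos;
        }
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

For the 9-character input from the comment in the diff, the sketch yields two bytes: 0x01 for the leading bit and 0x41 ('A') for the remaining eight.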
--- a/src/Functions/FunctionsJSON.h
+++ b/src/Functions/FunctionsJSON.h
@ -607,6 +607,8 @@ public:
    }
 };

+template <typename JSONParser>
+class JSONExtractRawImpl;

 /// Nodes of the extract tree. We need the extract tree to extract from JSON complex values containing array, tuples or nullables.
 template <typename JSONParser>
@ -691,7 +693,10 @@ struct JSONExtractTree
    public:
        bool insertResultToColumn(IColumn & dest, const Element & element) override
        {
-            return JSONExtractStringImpl<JSONParser>::insertResultToColumn(dest, element, {});
+            if (element.isString())
+                return JSONExtractStringImpl<JSONParser>::insertResultToColumn(dest, element, {});
+            else
+                return JSONExtractRawImpl<JSONParser>::insertResultToColumn(dest, element, {});
        }
    };

--- a/src/Functions/FunctionsLogical.cpp
+++ b/src/Functions/FunctionsLogical.cpp
@ -575,12 +575,12 @@ ColumnPtr FunctionAnyArityLogical<Impl, Name>::getConstantResultForNonConstArgum
    if constexpr (std::is_same_v<Impl, AndImpl>)
    {
        if (has_false_constant)
-            result_type->createColumnConst(0, static_cast<UInt8>(false));
+            result_column = result_type->createColumnConst(0, static_cast<UInt8>(false));
    }
    else if constexpr (std::is_same_v<Impl, OrImpl>)
    {
        if (has_true_constant)
-            result_type->createColumnConst(0, static_cast<UInt8>(true));
+            result_column = result_type->createColumnConst(0, static_cast<UInt8>(true));
    }

    return result_column;
--- a/src/Functions/GatherUtils/Sources.h
+++ b/src/Functions/GatherUtils/Sources.h
@ -755,6 +755,7 @@ struct GenericValueSource : public ValueSourceImpl<GenericValueSource>
 {
    using Slice = GenericValueSlice;
    using SinkType = GenericArraySink;
+    using Column = IColumn;

    const IColumn * column;
    size_t total_rows;
--- a/src/Functions/padString.cpp
+++ b/src/Functions/padString.cpp
@ -0,0 +1,308 @@
+#include <Columns/ColumnFixedString.h>
+#include <Columns/ColumnString.h>
+#include <Functions/FunctionFactory.h>
+#include <Functions/FunctionHelpers.h>
+#include <Functions/GatherUtils/Algorithms.h>
+#include <Functions/GatherUtils/Sinks.h>
+#include <Functions/GatherUtils/Sources.h>
+#include <common/bit_cast.h>
+
+namespace DB
+{
+using namespace GatherUtils;
+
+namespace ErrorCodes
+{
+    extern const int ILLEGAL_COLUMN;
+    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
+    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
+    extern const int TOO_LARGE_STRING_SIZE;
+}
+
+namespace
+{
+    /// The maximum new padded length.
+    constexpr size_t MAX_NEW_LENGTH = 1000000;
+
+    /// Appends padding characters to a sink based on a pad string.
+    /// Depending on how many padding characters are required,
+    /// the pad string may be copied only partially or repeated multiple times.
+    template <bool is_utf8>
+    class PaddingChars
+    {
+    public:
+        explicit PaddingChars(const String & pad_string_) : pad_string(pad_string_) { init(); }
+
+        ALWAYS_INLINE size_t numCharsInPadString() const
+        {
+            if constexpr (is_utf8)
+                return utf8_offsets.size() - 1;
+            else
+                return pad_string.length();
+        }
+
+        ALWAYS_INLINE size_t numCharsToNumBytes(size_t count) const
+        {
+            if constexpr (is_utf8)
+                return utf8_offsets[count];
+            else
+                return count;
+        }
+
+        void appendTo(StringSink & res_sink, size_t num_chars) const
+        {
+            if (!num_chars)
+                return;
+
+            const size_t step = numCharsInPadString();
+            while (true)
+            {
+                if (num_chars <= step)
+                {
+                    writeSlice(StringSource::Slice{bit_cast<const UInt8 *>(pad_string.data()), numCharsToNumBytes(num_chars)}, res_sink);
+                    break;
+                }
+                writeSlice(StringSource::Slice{bit_cast<const UInt8 *>(pad_string.data()), numCharsToNumBytes(step)}, res_sink);
+                num_chars -= step;
+            }
+        }
+
+    private:
+        void init()
+        {
+            if (pad_string.empty())
+                pad_string = " ";
+
+            if constexpr (is_utf8)
+            {
+                size_t offset = 0;
+                utf8_offsets.reserve(pad_string.length() + 1);
+                while (true)
+                {
+                    utf8_offsets.push_back(offset);
+                    if (offset == pad_string.length())
+                        break;
+                    offset += UTF8::seqLength(pad_string[offset]);
+                    if (offset > pad_string.length())
+                        offset = pad_string.length();
+                }
+            }
+
+            /// Not necessary, but good for performance.
+            while (numCharsInPadString() < 16)
+            {
+                pad_string += pad_string;
+                if constexpr (is_utf8)
+                {
+                    size_t old_size = utf8_offsets.size();
+                    utf8_offsets.reserve((old_size - 1) * 2);
+                    size_t base = utf8_offsets.back();
+                    for (size_t i = 1; i != old_size; ++i)
+                        utf8_offsets.push_back(utf8_offsets[i] + base);
+                }
+            }
+        }
+
+        String pad_string;
+        std::vector<size_t> utf8_offsets;
+    };
+
+    /// Returns the number of characters in a slice.
+    template <bool is_utf8>
+    inline ALWAYS_INLINE size_t getLengthOfSlice(const StringSource::Slice & slice)
+    {
+        if constexpr (is_utf8)
+            return UTF8::countCodePoints(slice.data, slice.size);
+        else
+            return slice.size;
+    }
+
+    /// Moves the end of a slice back by n characters.
+    template <bool is_utf8>
+    inline ALWAYS_INLINE StringSource::Slice removeSuffixFromSlice(const StringSource::Slice & slice, size_t suffix_length)
+    {
+        StringSource::Slice res = slice;
+        if constexpr (is_utf8)
+            res.size = UTF8StringSource::skipCodePointsBackward(slice.data + slice.size, suffix_length, slice.data) - res.data;
+        else
+            res.size -= std::min(suffix_length, res.size);
+        return res;
+    }
+
+    /// If `is_right_pad` - it's the rightPad() function instead of leftPad().
+    /// If `is_utf8` - lengths are measured in code points instead of bytes.
+    template <bool is_right_pad, bool is_utf8>
+    class FunctionPadString : public IFunction
+    {
+    public:
+        static constexpr auto name = is_right_pad ? (is_utf8 ? "rightPadUTF8" : "rightPad") : (is_utf8 ? "leftPadUTF8" : "leftPad");
+        static FunctionPtr create(const ContextPtr) { return std::make_shared<FunctionPadString>(); }
+
+        String getName() const override { return name; }
+
+        bool isVariadic() const override { return true; }
+        size_t getNumberOfArguments() const override { return 0; }
+
+        bool useDefaultImplementationForConstants() const override { return false; }
+
+        DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
+        {
+            size_t number_of_arguments = arguments.size();
+
+            if (number_of_arguments != 2 && number_of_arguments != 3)
+                throw Exception(
+                    ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
+                    "Number of arguments for function {} doesn't match: passed {}, should be 2 or 3",
+                    getName(),
+                    std::to_string(number_of_arguments));
+
+            if (!isStringOrFixedString(arguments[0]))
+                throw Exception(
+                    ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
+                    "Illegal type {} of the first argument of function {}, should be string",
+                    arguments[0]->getName(),
+                    getName());
+
+            if (!isUnsignedInteger(arguments[1]))
+                throw Exception(
+                    ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
+                    "Illegal type {} of the second argument of function {}, should be unsigned integer",
+                    arguments[1]->getName(),
+                    getName());
+
+            if (number_of_arguments == 3 && !isStringOrFixedString(arguments[2]))
+                throw Exception(
+                    ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
+                    "Illegal type {} of the third argument of function {}, should be const string",
+                    arguments[2]->getName(),
+                    getName());
+
+            return arguments[0];
+        }
+
+        ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
+        {
+            auto column_string = arguments[0].column;
+            auto column_length = arguments[1].column;
+
+            String pad_string;
+            if (arguments.size() == 3)
+            {
+                auto column_pad = arguments[2].column;
+                const ColumnConst * column_pad_const = checkAndGetColumnConst<ColumnString>(column_pad.get());
+                if (!column_pad_const)
+                    throw Exception(
+                        ErrorCodes::ILLEGAL_COLUMN,
+                        "Illegal column {}, third argument of function {} must be a constant string",
+                        column_pad->getName(),
+                        getName());
+
+                pad_string = column_pad_const->getValue<String>();
+            }
+            PaddingChars<is_utf8> padding_chars{pad_string};
+
+            auto col_res = ColumnString::create();
+            StringSink res_sink{*col_res, input_rows_count};
+
+            if (const ColumnString * col = checkAndGetColumn<ColumnString>(column_string.get()))
+                executeForSource(StringSource{*col}, column_length, padding_chars, res_sink);
+            else if (const ColumnFixedString * col_fixed = checkAndGetColumn<ColumnFixedString>(column_string.get()))
+                executeForSource(FixedStringSource{*col_fixed}, column_length, padding_chars, res_sink);
+            else if (const ColumnConst * col_const = checkAndGetColumnConst<ColumnString>(column_string.get()))
+                executeForSource(ConstSource<StringSource>{*col_const}, column_length, padding_chars, res_sink);
+            else if (const ColumnConst * col_const_fixed = checkAndGetColumnConst<ColumnFixedString>(column_string.get()))
+                executeForSource(ConstSource<FixedStringSource>{*col_const_fixed}, column_length, padding_chars, res_sink);
+            else
+                throw Exception(
+                    ErrorCodes::ILLEGAL_COLUMN,
+                    "Illegal column {}, first argument of function {} must be a string",
+                    arguments[0].column->getName(),
+                    getName());
+
+            return col_res;
+        }
+
+    private:
+        template <typename SourceStrings>
+        void executeForSource(
+            SourceStrings && strings,
+            const ColumnPtr & column_length,
+            const PaddingChars<is_utf8> & padding_chars,
+            StringSink & res_sink) const
+        {
+            if (const auto * col_const = checkAndGetColumn<ColumnConst>(column_length.get()))
+                executeForSourceAndLength(std::forward<SourceStrings>(strings), ConstSource<GenericValueSource>{*col_const}, padding_chars, res_sink);
+            else
+                executeForSourceAndLength(std::forward<SourceStrings>(strings), GenericValueSource{*column_length}, padding_chars, res_sink);
+        }
+
+        template <typename SourceStrings, typename SourceLengths>
+        void executeForSourceAndLength(
+            SourceStrings && strings,
+            SourceLengths && lengths,
+            const PaddingChars<is_utf8> & padding_chars,
+            StringSink & res_sink) const
+        {
+            bool is_const_length = lengths.isConst();
+            bool need_check_length = true;
+
+            for (; !res_sink.isEnd(); res_sink.next(), strings.next(), lengths.next())
+            {
+                auto str = strings.getWhole();
+                size_t current_length = getLengthOfSlice<is_utf8>(str);
+
+                auto new_length_slice = lengths.getWhole();
+                size_t new_length = new_length_slice.elements->getUInt(new_length_slice.position);
+
+                if (need_check_length)
+                {
+                    if (new_length > MAX_NEW_LENGTH)
+                    {
+                        throw Exception(
+                            "New padded length (" + std::to_string(new_length) + ") is too big, maximum is: " + std::to_string(MAX_NEW_LENGTH),
+                            ErrorCodes::TOO_LARGE_STRING_SIZE);
+                    }
+                    if (is_const_length)
+                    {
+                        size_t rows_count = res_sink.offsets.size();
+                        res_sink.reserve((new_length + 1 /* zero terminator */) * rows_count);
+                        need_check_length = false;
+                    }
+                }
+
+                if (new_length == current_length)
+                {
+                    writeSlice(str, res_sink);
+                }
+                else if (new_length < current_length)
+                {
+                    str = removeSuffixFromSlice<is_utf8>(str, current_length - new_length);
+                    writeSlice(str, res_sink);
+                }
+                else if (new_length > current_length)
+                {
+                    if constexpr (!is_right_pad)
+                        padding_chars.appendTo(res_sink, new_length - current_length);
+
+                    writeSlice(str, res_sink);
+
+                    if constexpr (is_right_pad)
+                        padding_chars.appendTo(res_sink, new_length - current_length);
+                }
+            }
+        }
+    };
+}
+
+void registerFunctionPadString(FunctionFactory & factory)
+{
+    factory.registerFunction<FunctionPadString<false, false>>(); /// leftPad
+    factory.registerFunction<FunctionPadString<false, true>>();  /// leftPadUTF8
+    factory.registerFunction<FunctionPadString<true, false>>();  /// rightPad
+    factory.registerFunction<FunctionPadString<true, true>>();   /// rightPadUTF8
+
+    factory.registerAlias("lpad", "leftPad", FunctionFactory::CaseInsensitive);
+    factory.registerAlias("rpad", "rightPad", FunctionFactory::CaseInsensitive);
+}
+
+}
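Note on padString.cpp above: the byte-based `leftPad` behavior (truncate when the target length is shorter, otherwise repeat the pad string and cut its last copy) can be sketched in plain C++ roughly as follows. This is a simplified illustration under stated assumptions: `leftPadSketch` is a hypothetical name, it uses `std::string` instead of the GatherUtils sources and sinks, it skips the `MAX_NEW_LENGTH` check, and it does not cover the UTF-8 variants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical standalone helper (not part of ClickHouse); lengths are in bytes.
std::string leftPadSketch(const std::string & str, std::size_t new_length, std::string pad = " ")
{
    // As in PaddingChars::init(), an empty pad string falls back to a single space.
    if (pad.empty())
        pad = " ";

    // A target length that is not longer than the input keeps only a prefix.
    if (new_length <= str.size())
        return str.substr(0, new_length);

    std::size_t num_chars = new_length - str.size();
    std::string res;
    res.reserve(new_length);

    // Repeat the pad string; the final copy may be partial.
    while (num_chars > 0)
    {
        const std::size_t chunk = std::min(num_chars, pad.size());
        res.append(pad, 0, chunk);
        num_chars -= chunk;
    }

    res += str;
    return res;
}
```

A right-padding variant would append `str` before the padding loop instead of after it, which corresponds to the `is_right_pad` branches in `executeForSourceAndLength`.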
--- a/src/Functions/registerFunctionsString.cpp
+++ b/src/Functions/registerFunctionsString.cpp
@ -29,6 +29,7 @@ void registerFunctionAppendTrailingCharIfAbsent(FunctionFactory &);
 void registerFunctionStartsWith(FunctionFactory &);
 void registerFunctionEndsWith(FunctionFactory &);
 void registerFunctionTrim(FunctionFactory &);
+void registerFunctionPadString(FunctionFactory &);
 void registerFunctionRegexpQuoteMeta(FunctionFactory &);
 void registerFunctionNormalizeQuery(FunctionFactory &);
 void registerFunctionNormalizedQueryHash(FunctionFactory &);
@ -68,6 +69,7 @@ void registerFunctionsString(FunctionFactory & factory)
    registerFunctionStartsWith(factory);
    registerFunctionEndsWith(factory);
    registerFunctionTrim(factory);
+    registerFunctionPadString(factory);
    registerFunctionRegexpQuoteMeta(factory);
    registerFunctionNormalizeQuery(factory);
    registerFunctionNormalizedQueryHash(factory);
--- a/src/Functions/ya.make
+++ b/src/Functions/ya.make
@ -387,6 +387,7 @@ SRCS(
    now.cpp
    now64.cpp
    nullIf.cpp
+    padString.cpp
    partitionId.cpp
    pi.cpp
    plus.cpp
--- a/src/IO/ReadBufferFromFileDescriptor.cpp
+++ b/src/IO/ReadBufferFromFileDescriptor.cpp
@ -149,7 +149,7 @@ off_t ReadBufferFromFileDescriptor::seek(off_t offset, int whence)
        off_t res = ::lseek(fd, new_pos, SEEK_SET);
        if (-1 == res)
            throwFromErrnoWithPath("Cannot seek through file " + getFileName(), getFileName(),
-                                   ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
+                ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
        file_offset_of_buffer_end = new_pos;

        watch.stop();
@ -160,6 +160,20 @@ off_t ReadBufferFromFileDescriptor::seek(off_t offset, int whence)
 }


+void ReadBufferFromFileDescriptor::rewind()
+{
+    ProfileEvents::increment(ProfileEvents::Seek);
+    off_t res = ::lseek(fd, 0, SEEK_SET);
+    if (-1 == res)
+        throwFromErrnoWithPath("Cannot seek through file " + getFileName(), getFileName(),
+            ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
+
+    /// Discard the data already in the buffer. New data will be read on the subsequent call to 'next'.
+    working_buffer.resize(0);
+    pos = working_buffer.begin();
+}
+
+
 /// Assuming file descriptor supports 'select', check that we have data to read or wait until timeout.
 bool ReadBufferFromFileDescriptor::poll(size_t timeout_microseconds)
 {
--- a/src/IO/ReadBufferFromFileDescriptor.h
+++ b/src/IO/ReadBufferFromFileDescriptor.h
@ -39,6 +39,9 @@ public:
    /// If 'offset' is small enough to stay in buffer after seek, then true seek in file does not happen.
    off_t seek(off_t off, int whence) override;

+    /// Seek to the beginning, discarding already read data if any. Useful to reread file that changes on every read.
+    void rewind();
+
    off_t size();

    void setProgressCallback(ContextPtr context);
--- a/src/IO/ReadBufferFromPocoSocket.cpp
+++ b/src/IO/ReadBufferFromPocoSocket.cpp
@ -5,11 +5,19 @@
 #include <Common/Exception.h>
 #include <Common/NetException.h>
 #include <Common/Stopwatch.h>
+#include <Common/ProfileEvents.h>
+#include <Common/CurrentMetrics.h>


 namespace ProfileEvents
 {
    extern const Event NetworkReceiveElapsedMicroseconds;
+    extern const Event NetworkReceiveBytes;
+}
+
+namespace CurrentMetrics
+{
+    extern const Metric NetworkReceive;
 }


@ -31,6 +39,8 @@ bool ReadBufferFromPocoSocket::nextImpl()
    /// Add more details to exceptions.
    try
    {
+        CurrentMetrics::Increment metric_increment(CurrentMetrics::NetworkReceive);
+
        /// If async_callback is specified, and read will block, run async_callback and try again later.
        /// It is expected that file descriptor may be polled externally.
        /// Note that receive timeout is not checked here. External code should check it while polling.
@ -57,6 +67,7 @@ bool ReadBufferFromPocoSocket::nextImpl()

    /// NOTE: it is quite inaccurate on high loads since the thread could be replaced by another one
    ProfileEvents::increment(ProfileEvents::NetworkReceiveElapsedMicroseconds, watch.elapsedMicroseconds());
+    ProfileEvents::increment(ProfileEvents::NetworkReceiveBytes, bytes_read);

    if (bytes_read)
        working_buffer.resize(bytes_read);
--- a/src/IO/WriteBufferFromPocoSocket.cpp
+++ b/src/IO/WriteBufferFromPocoSocket.cpp
@ -6,11 +6,19 @@
 #include <Common/NetException.h>
 #include <Common/Stopwatch.h>
 #include <Common/MemoryTracker.h>
+#include <Common/ProfileEvents.h>
+#include <Common/CurrentMetrics.h>


 namespace ProfileEvents
 {
    extern const Event NetworkSendElapsedMicroseconds;
+    extern const Event NetworkSendBytes;
+}
+
+namespace CurrentMetrics
+{
+    extern const Metric NetworkSend;
 }


@ -40,6 +48,7 @@ void WriteBufferFromPocoSocket::nextImpl()
        /// Add more details to exceptions.
        try
        {
+            CurrentMetrics::Increment metric_increment(CurrentMetrics::NetworkSend);
            res = socket.impl()->sendBytes(working_buffer.begin() + bytes_written, offset() - bytes_written);
        }
        catch (const Poco::Net::NetException & e)
@ -62,6 +71,7 @@ void WriteBufferFromPocoSocket::nextImpl()
    }

    ProfileEvents::increment(ProfileEvents::NetworkSendElapsedMicroseconds, watch.elapsedMicroseconds());
+    ProfileEvents::increment(ProfileEvents::NetworkSendBytes, bytes_written);
 }

 WriteBufferFromPocoSocket::WriteBufferFromPocoSocket(Poco::Net::Socket & socket_, size_t buf_size)
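
Note on the two kinds of counters added here: `CurrentMetrics::Increment` is a scoped object, so `NetworkSend`/`NetworkReceive` reflect operations in progress, while the `ProfileEvents` byte counters only grow. A rough standalone sketch of that split is below. All names in it (`ScopedIncrement`, `fakeSend`, the two globals) are hypothetical stand-ins, not the ClickHouse classes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

/// Hypothetical stand-ins for a CurrentMetrics gauge and a ProfileEvents counter.
std::atomic<int64_t> network_send_in_progress{0}; /// goes up and down
std::atomic<uint64_t> network_send_bytes{0};      /// only grows

/// Increments the gauge for the lifetime of the object, including when
/// the scope is left through an exception.
class ScopedIncrement
{
public:
    explicit ScopedIncrement(std::atomic<int64_t> & gauge_) : gauge(gauge_)
    {
        gauge.fetch_add(1, std::memory_order_relaxed);
    }

    ~ScopedIncrement() { gauge.fetch_sub(1, std::memory_order_relaxed); }

    ScopedIncrement(const ScopedIncrement &) = delete;
    ScopedIncrement & operator=(const ScopedIncrement &) = delete;

private:
    std::atomic<int64_t> & gauge;
};

/// Pretends to send `size` bytes and accounts for them.
size_t fakeSend(size_t size)
{
    ScopedIncrement in_progress(network_send_in_progress);
    assert(network_send_in_progress.load() >= 1); /// visible while "sending"
    network_send_bytes.fetch_add(size, std::memory_order_relaxed);
    return size;
}
```

After `fakeSend` returns, the gauge is back at its previous value and only the byte counter keeps the history.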
--- a/src/Interpreters/AsynchronousMetricLog.cpp
+++ b/src/Interpreters/AsynchronousMetricLog.cpp
@ -18,7 +18,7 @@ NamesAndTypesList AsynchronousMetricLogElement::getNamesAndTypes()
        {"event_date", std::make_shared<DataTypeDate>()},
        {"event_time", std::make_shared<DataTypeDateTime>()},
        {"event_time_microseconds", std::make_shared<DataTypeDateTime64>(6)},
-        {"name", std::make_shared<DataTypeLowCardinality>(std::make_shared<DataTypeString>())},
+        {"metric", std::make_shared<DataTypeLowCardinality>(std::make_shared<DataTypeString>())},
        {"value", std::make_shared<DataTypeFloat64>(),}
    };
 }
--- a/src/Interpreters/AsynchronousMetrics.cpp
+++ b/src/Interpreters/AsynchronousMetrics.cpp
--- a/src/Interpreters/AsynchronousMetrics.h
+++ b/src/Interpreters/AsynchronousMetrics.h
@ -3,11 +3,15 @@
 #include <Interpreters/Context_fwd.h>
 #include <Common/MemoryStatisticsOS.h>
 #include <Common/ThreadPool.h>
+#include <IO/ReadBufferFromFile.h>

 #include <condition_variable>
+#include <map>
 #include <mutex>
 #include <string>
 #include <thread>
+#include <vector>
+#include <optional>
 #include <unordered_map>


@ -15,6 +19,7 @@ namespace DB
 {

 class ProtocolServerAdapter;
+class ReadBuffer;

 using AsynchronousMetricValue = double;
 using AsynchronousMetricValues = std::unordered_map<std::string, AsynchronousMetricValue>;
@ -23,10 +28,30 @@ using AsynchronousMetricValues = std::unordered_map<std::string, AsynchronousMet
 /** Periodically (by default, each minute, starting at 30 seconds offset)
  *  calculates and updates some metrics,
  *  that are not updated automatically (so, need to be asynchronously calculated).
+  *
+  * This includes both ClickHouse-related metrics (like memory usage of ClickHouse process)
+  *  and common OS-related metrics (like total memory usage on the server).
  */
 class AsynchronousMetrics : WithContext
 {
 public:
+    /// The default value of update_period_seconds is for ClickHouse-over-YT
+    /// in Arcadia -- it uses its own server implementation that also uses these
+    /// metrics.
+    AsynchronousMetrics(
+        ContextPtr global_context_,
+        int update_period_seconds,
+        std::shared_ptr<std::vector<ProtocolServerAdapter>> servers_to_start_before_tables_,
+        std::shared_ptr<std::vector<ProtocolServerAdapter>> servers_);
+
+    ~AsynchronousMetrics();
+
+    /// Separate method allows to initialize the `servers` variable beforehand.
+    void start();
+
+    /// Returns copy of all values.
+    AsynchronousMetricValues getValues() const;
+
 #if defined(ARCADIA_BUILD)
    /// This constructor needs only to provide backward compatibility with some other projects (hello, Arcadia).
    /// Never use this in the ClickHouse codebase.
@ -39,35 +64,6 @@ public:
    }
 #endif

-    /// The default value of update_period_seconds is for ClickHouse-over-YT
-    /// in Arcadia -- it uses its own server implementation that also uses these
-    /// metrics.
-    AsynchronousMetrics(
-        ContextPtr global_context_,
-        int update_period_seconds,
-        std::shared_ptr<std::vector<ProtocolServerAdapter>> servers_to_start_before_tables_,
-        std::shared_ptr<std::vector<ProtocolServerAdapter>> servers_)
-        : WithContext(global_context_)
-        , update_period(update_period_seconds)
-        , servers_to_start_before_tables(servers_to_start_before_tables_)
-        , servers(servers_)
-    {
-    }
-
-    ~AsynchronousMetrics();
-
-    /// Separate method allows to initialize the `servers` variable beforehand.
-    void start()
-    {
-        /// Update once right now, to make metrics available just after server start
-        /// (without waiting for asynchronous_metrics_update_period_s).
-        update();
-        thread = std::make_unique<ThreadFromGlobalPool>([this] { run(); });
-    }
-
-    /// Returns copy of all values.
-    AsynchronousMetricValues getValues() const;
-
 private:
    const std::chrono::seconds update_period;
    std::shared_ptr<std::vector<ProtocolServerAdapter>> servers_to_start_before_tables{nullptr};
@ -78,14 +74,113 @@ private:
    bool quit {false};
    AsynchronousMetricValues values;

+    /// Some values are incremental and we have to calculate the difference.
+    /// On first run we will only collect the values to subtract later.
+    bool first_run = true;
+    std::chrono::system_clock::time_point previous_update_time;
+
 #if defined(OS_LINUX)
    MemoryStatisticsOS memory_stat;
+
+    std::optional<ReadBufferFromFile> meminfo;
+    std::optional<ReadBufferFromFile> loadavg;
+    std::optional<ReadBufferFromFile> proc_stat;
+    std::optional<ReadBufferFromFile> cpuinfo;
+    std::optional<ReadBufferFromFile> file_nr;
+    std::optional<ReadBufferFromFile> uptime;
+    std::optional<ReadBufferFromFile> net_dev;
+
+    std::vector<std::unique_ptr<ReadBufferFromFile>> thermal;
+
+    std::unordered_map<String /* device name */,
+        std::unordered_map<String /* label name */,
+            std::unique_ptr<ReadBufferFromFile>>> hwmon_devices;
+
+    std::vector<std::pair<
+        std::unique_ptr<ReadBufferFromFile> /* correctable errors */,
+        std::unique_ptr<ReadBufferFromFile> /* uncorrectable errors */>> edac;
+
+    std::unordered_map<String /* device name */, std::unique_ptr<ReadBufferFromFile>> block_devs;
+
+    /// TODO: socket statistics.
+
+    struct ProcStatValuesCPU
+    {
+        uint64_t user;
+        uint64_t nice;
+        uint64_t system;
+        uint64_t idle;
+        uint64_t iowait;
+        uint64_t irq;
+        uint64_t softirq;
+        uint64_t steal;
+        uint64_t guest;
+        uint64_t guest_nice;
+
+        void read(ReadBuffer & in);
+        ProcStatValuesCPU operator-(const ProcStatValuesCPU & other) const;
+    };
+
+    struct ProcStatValuesOther
+    {
+        uint64_t interrupts;
+        uint64_t context_switches;
+        uint64_t processes_created;
+
+        ProcStatValuesOther operator-(const ProcStatValuesOther & other) const;
+    };
+
+    ProcStatValuesCPU proc_stat_values_all_cpus{};
+    ProcStatValuesOther proc_stat_values_other{};
+    std::vector<ProcStatValuesCPU> proc_stat_values_per_cpu;
+
+    /// https://www.kernel.org/doc/Documentation/block/stat.txt
+    struct BlockDeviceStatValues
+    {
+        uint64_t read_ios;
+        uint64_t read_merges;
+        uint64_t read_sectors;
+        uint64_t read_ticks;
+        uint64_t write_ios;
+        uint64_t write_merges;
+        uint64_t write_sectors;
+        uint64_t write_ticks;
+        uint64_t in_flight_ios;
+        uint64_t io_ticks;
+        uint64_t time_in_queue;
+        uint64_t discard_ops;
+        uint64_t discard_merges;
+        uint64_t discard_sectors;
+        uint64_t discard_ticks;
+
+        void read(ReadBuffer & in);
+        BlockDeviceStatValues operator-(const BlockDeviceStatValues & other) const;
+    };
+
+    std::unordered_map<String /* device name */, BlockDeviceStatValues> block_device_stats;
+
+    struct NetworkInterfaceStatValues
+    {
+        uint64_t recv_bytes;
+        uint64_t recv_packets;
+        uint64_t recv_errors;
+        uint64_t recv_drop;
+        uint64_t send_bytes;
+        uint64_t send_packets;
+        uint64_t send_errors;
+        uint64_t send_drop;
+
+        NetworkInterfaceStatValues operator-(const NetworkInterfaceStatValues & other) const;
+    };
+
+    std::unordered_map<String /* device name */, NetworkInterfaceStatValues> network_interface_stats;
+
 #endif

    std::unique_ptr<ThreadFromGlobalPool> thread;

    void run();
-    void update();
+    void update(std::chrono::system_clock::time_point update_time);
 };

 }
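
Note on `first_run`, `previous_update_time` and the `operator-` declarations: the header only declares them, so the following is a guess at the intended usage, not the actual implementation. The sketch keeps the previous snapshot of cumulative counters, reports nothing on the first call, and afterwards returns the difference plus the elapsed time. `CounterSnapshot` and `DeltaTracker` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

/// Hypothetical, reduced counterpart of the *StatValues structs in the diff.
struct CounterSnapshot
{
    uint64_t recv_bytes = 0;
    uint64_t send_bytes = 0;

    CounterSnapshot operator-(const CounterSnapshot & other) const
    {
        return {recv_bytes - other.recv_bytes, send_bytes - other.send_bytes};
    }
};

class DeltaTracker
{
public:
    using TimePoint = std::chrono::system_clock::time_point;

    /// The first call only stores the snapshot and returns false.
    /// Later calls fill `delta` and `elapsed_seconds` and return true.
    bool update(const CounterSnapshot & current, TimePoint now, CounterSnapshot & delta, double & elapsed_seconds)
    {
        const bool has_delta = !first_run;
        if (has_delta)
        {
            delta = current - previous;
            elapsed_seconds = std::chrono::duration<double>(now - previous_time).count();
        }

        previous = current;
        previous_time = now;
        first_run = false;
        return has_delta;
    }

private:
    bool first_run = true;
    CounterSnapshot previous;
    TimePoint previous_time;
};
```

Dividing a delta by `elapsed_seconds` would give a per-second rate; whether the real `update(update_time)` does exactly that is not visible in this header.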
--- a/src/Interpreters/DatabaseAndTableWithAlias.h
+++ b/src/Interpreters/DatabaseAndTableWithAlias.h
@ -61,7 +61,7 @@ struct TableWithColumnNamesAndTypes
            names.insert(col.name);
    }

-    bool hasColumn(const String & name) const { return names.count(name); }
+    bool hasColumn(const String & name) const { return names.contains(name); }

    void addHiddenColumns(const NamesAndTypesList & addition)
    {
@ -86,8 +86,6 @@ private:
            names.insert(col.name);
    }

-
-private:
    NameSet names;
 };

--- a/src/Interpreters/ExpressionAnalyzer.cpp
+++ b/src/Interpreters/ExpressionAnalyzer.cpp
@ -11,7 +11,6 @@
 #include <Parsers/DumpASTNode.h>

 #include <DataTypes/DataTypeNullable.h>
-#include <DataTypes/DataTypesNumber.h>
 #include <Columns/IColumn.h>

 #include <Interpreters/ArrayJoinAction.h>
@ -813,7 +812,8 @@ JoinPtr SelectQueryExpressionAnalyzer::appendJoin(ExpressionActionsChain & chain
    }

    ExpressionActionsChain::Step & step = chain.lastStep(columns_after_array_join);
-    chain.steps.push_back(std::make_unique<ExpressionActionsChain::JoinStep>(syntax->analyzed_join, table_join, step.getResultColumns()));
+    chain.steps.push_back(std::make_unique<ExpressionActionsChain::JoinStep>(
+        syntax->analyzed_join, table_join, step.getResultColumns()));
    chain.addStep();
    return table_join;
 }
@ -906,8 +906,8 @@ JoinPtr SelectQueryExpressionAnalyzer::makeTableJoin(
            *   in the subquery_for_set object this subquery is exposed as source and the temporary table _data1 as the `table`.
            * - this function shows the expression JOIN _data1.
            */
-        auto interpreter = interpretSubquery(join_element.table_expression, getContext(), original_right_columns, query_options);
-
+        auto interpreter = interpretSubquery(
+            join_element.table_expression, getContext(), original_right_columns, query_options.copy().setWithAllColumns());
        {
            joined_plan = std::make_unique<QueryPlan>();
            interpreter->buildQueryPlan(*joined_plan);
--- a/src/Interpreters/IdentifierSemantic.cpp
+++ b/src/Interpreters/IdentifierSemantic.cpp
@ -1,6 +1,8 @@
+#include <Interpreters/IdentifierSemantic.h>
+
 #include <Common/typeid_cast.h>

-#include <Interpreters/IdentifierSemantic.h>
+#include <Interpreters/Context.h>
 #include <Interpreters/StorageID.h>

 #include <Parsers/ASTFunction.h>
@ -280,7 +282,10 @@ IdentifierMembershipCollector::IdentifierMembershipCollector(const ASTSelectQuer
        QueryAliasesNoSubqueriesVisitor(aliases).visit(with);
    QueryAliasesNoSubqueriesVisitor(aliases).visit(select.select());

-    tables = getDatabaseAndTablesWithColumns(getTableExpressions(select), context);
+    const auto & settings = context->getSettingsRef();
+    tables = getDatabaseAndTablesWithColumns(getTableExpressions(select), context,
+                                             settings.asterisk_include_alias_columns,
+                                             settings.asterisk_include_materialized_columns);
 }

 std::optional<size_t> IdentifierMembershipCollector::getIdentsMembership(ASTPtr ast) const
--- a/src/Interpreters/InterpreterSelectQuery.cpp
+++ b/src/Interpreters/InterpreterSelectQuery.cpp
@ -30,7 +30,6 @@
 #include <Interpreters/JoinToSubqueryTransformVisitor.h>
 #include <Interpreters/CrossToInnerJoinVisitor.h>
 #include <Interpreters/TableJoin.h>
-#include <Interpreters/JoinSwitcher.h>
 #include <Interpreters/JoinedTables.h>
 #include <Interpreters/OpenTelemetrySpanLog.h>
 #include <Interpreters/QueryAliasesVisitor.h>
@ -68,7 +67,6 @@
 #include <Processors/Transforms/AggregatingTransform.h>
 #include <Processors/Transforms/ExpressionTransform.h>
 #include <Processors/Transforms/FilterTransform.h>
-#include <Processors/Transforms/JoiningTransform.h>

 #include <Storages/MergeTree/MergeTreeWhereOptimizer.h>
 #include <Storages/IStorage.h>
@ -313,7 +311,7 @@ InterpreterSelectQuery::InterpreterSelectQuery(
        ApplyWithSubqueryVisitor().visit(query_ptr);
    }

-    JoinedTables joined_tables(getSubqueryContext(context), getSelectQuery());
+    JoinedTables joined_tables(getSubqueryContext(context), getSelectQuery(), options.with_all_cols);

    bool got_storage_from_query = false;
    if (!has_input && !storage)
--- a/src/Interpreters/JoinedTables.cpp
+++ b/src/Interpreters/JoinedTables.cpp
@ -161,9 +161,10 @@ using RenameQualifiedIdentifiersVisitor = InDepthNodeVisitor<RenameQualifiedIden

 }

-JoinedTables::JoinedTables(ContextPtr context_, const ASTSelectQuery & select_query)
+JoinedTables::JoinedTables(ContextPtr context_, const ASTSelectQuery & select_query, bool include_all_columns_)
    : context(context_)
    , table_expressions(getTableExpressions(select_query))
+    , include_all_columns(include_all_columns_)
    , left_table_expression(extractTableExpression(select_query, 0))
    , left_db_and_table(getDatabaseAndTable(select_query, 0))
 {}
@ -220,11 +221,13 @@ StoragePtr JoinedTables::getLeftTableStorage()

 bool JoinedTables::resolveTables()
 {
-    tables_with_columns = getDatabaseAndTablesWithColumns(table_expressions, context);
+    const auto & settings = context->getSettingsRef();
+    bool include_alias_cols = include_all_columns || settings.asterisk_include_alias_columns;
+    bool include_materialized_cols = include_all_columns || settings.asterisk_include_materialized_columns;
+    tables_with_columns = getDatabaseAndTablesWithColumns(table_expressions, context, include_alias_cols, include_materialized_cols);
    if (tables_with_columns.size() != table_expressions.size())
        throw Exception("Unexpected tables count", ErrorCodes::LOGICAL_ERROR);

-    const auto & settings = context->getSettingsRef();
    if (settings.joined_subquery_requires_alias && tables_with_columns.size() > 1)
    {
        for (size_t i = 0; i < tables_with_columns.size(); ++i)
@ -312,4 +315,11 @@ std::shared_ptr<TableJoin> JoinedTables::makeTableJoin(const ASTSelectQuery & se
    return table_join;
 }

+void JoinedTables::reset(const ASTSelectQuery & select_query)
+{
+    table_expressions = getTableExpressions(select_query);
+    left_table_expression = extractTableExpression(select_query, 0);
+    left_db_and_table = getDatabaseAndTable(select_query, 0);
+}
+
 }
--- a/src/Interpreters/JoinedTables.h
+++ b/src/Interpreters/JoinedTables.h
@ -22,12 +22,9 @@ using StorageMetadataPtr = std::shared_ptr<const StorageInMemoryMetadata>;
 class JoinedTables
 {
 public:
-    JoinedTables(ContextPtr context, const ASTSelectQuery & select_query);
+    JoinedTables(ContextPtr context, const ASTSelectQuery & select_query, bool include_all_columns_ = false);

-    void reset(const ASTSelectQuery & select_query)
-    {
-        *this = JoinedTables(Context::createCopy(context), select_query);
-    }
+    void reset(const ASTSelectQuery & select_query);

    StoragePtr getLeftTableStorage();
    bool resolveTables();
@ -37,7 +34,6 @@ public:
    std::shared_ptr<TableJoin> makeTableJoin(const ASTSelectQuery & select_query);

    const TablesWithColumns & tablesWithColumns() const { return tables_with_columns; }
-    TablesWithColumns moveTablesWithColumns() { return std::move(tables_with_columns); }

    bool isLeftTableSubquery() const;
    bool isLeftTableFunction() const;
@ -51,6 +47,7 @@ private:
    ContextPtr context;
    std::vector<const ASTTableExpression *> table_expressions;
    TablesWithColumns tables_with_columns;
+    const bool include_all_columns;

    /// Legacy (duplicated left table values)
    ASTPtr left_table_expression;
--- a/src/Interpreters/SelectQueryOptions.h
+++ b/src/Interpreters/SelectQueryOptions.h
@ -42,11 +42,14 @@ struct SelectQueryOptions
    bool ignore_alias = false;
    bool is_internal = false;
    bool is_subquery = false; // non-subquery can also have subquery_depth > 0, e.g. insert select
+    bool with_all_cols = false; /// asterisk include materialized and aliased columns

-    SelectQueryOptions(QueryProcessingStage::Enum stage = QueryProcessingStage::Complete, size_t depth = 0, bool is_subquery_ = false)
+    SelectQueryOptions(
+        QueryProcessingStage::Enum stage = QueryProcessingStage::Complete,
+        size_t depth = 0,
+        bool is_subquery_ = false)
        : to_stage(stage), subquery_depth(depth), is_subquery(is_subquery_)
-    {
-    }
+    {}

    SelectQueryOptions copy() const { return *this; }

@ -114,6 +117,12 @@ struct SelectQueryOptions
        is_internal = value;
        return *this;
    }
+
+    SelectQueryOptions & setWithAllColumns(bool value = true)
+    {
+        with_all_cols = value;
+        return *this;
+    }
 };

 }
--- a/src/Interpreters/TreeRewriter.cpp
+++ b/src/Interpreters/TreeRewriter.cpp
@ -1,5 +1,4 @@
 #include <Core/Settings.h>
-#include <Core/Defines.h>
 #include <Core/NamesAndTypes.h>

 #include <Interpreters/TreeRewriter.h>
@ -32,7 +31,6 @@
 #include <DataTypes/DataTypeNullable.h>

 #include <IO/WriteHelpers.h>
-#include <IO/WriteBufferFromOStream.h>
 #include <Storages/IStorage.h>

 #include <AggregateFunctions/AggregateFunctionFactory.h>
@ -510,14 +508,10 @@ void setJoinStrictness(ASTSelectQuery & select_query, JoinStrictness join_defaul
 }

 /// Find the columns that are obtained by JOIN.
-void collectJoinedColumns(TableJoin & analyzed_join, const ASTSelectQuery & select_query,
+void collectJoinedColumns(TableJoin & analyzed_join, const ASTTableJoin & table_join,
                          const TablesWithColumns & tables, const Aliases & aliases)
 {
-    const ASTTablesInSelectQueryElement * node = select_query.join();
-    if (!node || tables.size() < 2)
-        return;
-
-    const auto & table_join = node->table_join->as<ASTTableJoin &>();
+    assert(tables.size() >= 2);

    if (table_join.using_expression_list)
    {
@ -896,9 +890,15 @@ TreeRewriterResultPtr TreeRewriter::analyzeSelect(

    if (tables_with_columns.size() > 1)
    {
-        result.analyzed_join->columns_from_joined_table = tables_with_columns[1].columns;
+        const auto & right_table = tables_with_columns[1];
+        auto & cols_from_joined = result.analyzed_join->columns_from_joined_table;
+        cols_from_joined = right_table.columns;
+        /// query can use materialized or aliased columns from right joined table,
+        /// we want to request it for right table
+        cols_from_joined.insert(cols_from_joined.end(), right_table.hidden_columns.begin(), right_table.hidden_columns.end());
+
        result.analyzed_join->deduplicateAndQualifyColumnNames(
-            source_columns_set, tables_with_columns[1].table.getQualifiedNamePrefix());
+            source_columns_set, right_table.table.getQualifiedNamePrefix());
    }

    translateQualifiedNames(query, *select_query, source_columns_set, tables_with_columns);
@ -932,7 +932,16 @@ TreeRewriterResultPtr TreeRewriter::analyzeSelect(
    setJoinStrictness(
        *select_query, settings.join_default_strictness, settings.any_join_distinct_right_table_keys, result.analyzed_join->table_join);

-    collectJoinedColumns(*result.analyzed_join, *select_query, tables_with_columns, result.aliases);
+    if (const auto * join_ast = select_query->join(); join_ast && tables_with_columns.size() >= 2)
+    {
+        auto & table_join_ast = join_ast->table_join->as<ASTTableJoin &>();
+        if (table_join_ast.using_expression_list && result.metadata_snapshot)
+            replaceAliasColumnsInQuery(table_join_ast.using_expression_list, result.metadata_snapshot->getColumns(), result.array_join_result_to_source, getContext());
+        if (table_join_ast.on_expression && result.metadata_snapshot)
+            replaceAliasColumnsInQuery(table_join_ast.on_expression, result.metadata_snapshot->getColumns(), result.array_join_result_to_source, getContext());
+
+        collectJoinedColumns(*result.analyzed_join, table_join_ast, tables_with_columns, result.aliases);
+    }

    result.aggregates = getAggregates(query, *select_query);
    result.window_function_asts = getWindowFunctions(query, *select_query);
--- a/src/Interpreters/WindowDescription.cpp
+++ b/src/Interpreters/WindowDescription.cpp
@ -1,6 +1,7 @@
 #include <Interpreters/WindowDescription.h>

 #include <Core/Field.h>
+#include <Common/FieldVisitorsAccurateComparison.h>
 #include <Common/FieldVisitorToString.h>
 #include <IO/Operators.h>
 #include <Parsers/ASTFunction.h>
@ -99,7 +100,7 @@ void WindowFrame::checkValid() const
                && begin_offset.get<Int64>() < INT_MAX))
        {
            throw Exception(ErrorCodes::BAD_ARGUMENTS,
-                "Frame start offset for '{}' frame must be a nonnegative 32-bit integer, '{}' of type '{}' given.",
+                "Frame start offset for '{}' frame must be a nonnegative 32-bit integer, '{}' of type '{}' given",
                toString(type),
                applyVisitor(FieldVisitorToString(), begin_offset),
                Field::Types::toString(begin_offset.getType()));
@ -112,7 +113,7 @@ void WindowFrame::checkValid() const
                && end_offset.get<Int64>() < INT_MAX))
        {
            throw Exception(ErrorCodes::BAD_ARGUMENTS,
-                "Frame end offset for '{}' frame must be a nonnegative 32-bit integer, '{}' of type '{}' given.",
+                "Frame end offset for '{}' frame must be a nonnegative 32-bit integer, '{}' of type '{}' given",
                toString(type),
                applyVisitor(FieldVisitorToString(), end_offset),
                Field::Types::toString(end_offset.getType()));
@ -160,7 +161,8 @@ void WindowFrame::checkValid() const
        bool begin_less_equal_end;
        if (begin_preceding && end_preceding)
        {
-            begin_less_equal_end = begin_offset >= end_offset;
+            /// we can't compare Fields using operator<= if fields have different types
+            begin_less_equal_end = applyVisitor(FieldVisitorAccurateLessOrEqual(), end_offset, begin_offset);
        }
        else if (begin_preceding && !end_preceding)
        {
@ -172,7 +174,7 @@ void WindowFrame::checkValid() const
        }
        else /* if (!begin_preceding && !end_preceding) */
        {
-            begin_less_equal_end = begin_offset <= end_offset;
+            begin_less_equal_end = applyVisitor(FieldVisitorAccurateLessOrEqual(), begin_offset, end_offset);
        }

        if (!begin_less_equal_end)
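
Note on the `FieldVisitorAccurateLessOrEqual` change: the added comment says that `operator<=` cannot be used when the two offsets have different types. The effect can be reproduced outside ClickHouse with `std::variant`, whose built-in comparison looks at the alternative index before the value. The sketch below is an analogy under that assumption, not the `Field` implementation; `Offset`, `lessOrEqual` and `accurateLessOrEqual` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

/// Hypothetical stand-in for a frame offset that may be signed or unsigned.
using Offset = std::variant<int64_t, uint64_t>;

/// Value-based comparisons for every combination of alternatives.
bool lessOrEqual(int64_t a, int64_t b) { return a <= b; }
bool lessOrEqual(uint64_t a, uint64_t b) { return a <= b; }
bool lessOrEqual(int64_t a, uint64_t b) { return a < 0 || static_cast<uint64_t>(a) <= b; }
bool lessOrEqual(uint64_t a, int64_t b) { return b >= 0 && a <= static_cast<uint64_t>(b); }

/// Dispatches on the actual alternatives of both operands, so the result
/// depends only on the numeric values.
bool accurateLessOrEqual(const Offset & a, const Offset & b)
{
    return std::visit([](auto x, auto y) { return lessOrEqual(x, y); }, a, b);
}
```

With these definitions `accurateLessOrEqual(Offset(int64_t{5}), Offset(uint64_t{3}))` is false, while the built-in `Offset(int64_t{5}) <= Offset(uint64_t{3})` is true because the first alternative has the lower index.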
--- a/src/Interpreters/getTableExpressions.cpp
+++ b/src/Interpreters/getTableExpressions.cpp
@ -113,50 +113,42 @@ static NamesAndTypesList getColumnsFromTableExpression(
    return names_and_type_list;
 }

-NamesAndTypesList getColumnsFromTableExpression(const ASTTableExpression & table_expression, ContextPtr context)
-{
-    NamesAndTypesList materialized;
-    NamesAndTypesList aliases;
-    NamesAndTypesList virtuals;
-    return getColumnsFromTableExpression(table_expression, context, materialized, aliases, virtuals);
-}
-
-TablesWithColumns getDatabaseAndTablesWithColumns(const std::vector<const ASTTableExpression *> & table_expressions, ContextPtr context)
+TablesWithColumns getDatabaseAndTablesWithColumns(
+        const ASTTableExprConstPtrs & table_expressions,
+        ContextPtr context,
+        bool include_alias_cols,
+        bool include_materialized_cols)
 {
    TablesWithColumns tables_with_columns;

-    if (!table_expressions.empty())
+    String current_database = context->getCurrentDatabase();
+
+    for (const ASTTableExpression * table_expression : table_expressions)
    {
-        String current_database = context->getCurrentDatabase();
-        bool include_alias_cols = context->getSettingsRef().asterisk_include_alias_columns;
-        bool include_materialized_cols = context->getSettingsRef().asterisk_include_materialized_columns;
+        NamesAndTypesList materialized;
+        NamesAndTypesList aliases;
+        NamesAndTypesList virtuals;
+        NamesAndTypesList names_and_types = getColumnsFromTableExpression(
+            *table_expression, context, materialized, aliases, virtuals);

-        for (const ASTTableExpression * table_expression : table_expressions)
+        removeDuplicateColumns(names_and_types);
+
+        tables_with_columns.emplace_back(
+            DatabaseAndTableWithAlias(*table_expression, current_database), names_and_types);
+
+        auto & table = tables_with_columns.back();
+        table.addHiddenColumns(materialized);
+        table.addHiddenColumns(aliases);
+        table.addHiddenColumns(virtuals);
+
+        if (include_alias_cols)
        {
-            NamesAndTypesList materialized;
-            NamesAndTypesList aliases;
-            NamesAndTypesList virtuals;
-            NamesAndTypesList names_and_types = getColumnsFromTableExpression(*table_expression, context, materialized, aliases, virtuals);
+            table.addAliasColumns(aliases);
+        }

-            removeDuplicateColumns(names_and_types);
-
-            tables_with_columns.emplace_back(
-                DatabaseAndTableWithAlias(*table_expression, current_database), names_and_types);
-
-            auto & table = tables_with_columns.back();
-            table.addHiddenColumns(materialized);
-            table.addHiddenColumns(aliases);
-            table.addHiddenColumns(virtuals);
-
-            if (include_alias_cols)
-            {
-                table.addAliasColumns(aliases);
-            }
-
-            if (include_materialized_cols)
-            {
-                table.addMaterializedColumns(materialized);
-            }
+        if (include_materialized_cols)
+        {
+            table.addMaterializedColumns(materialized);
        }
    }

--- a/src/Interpreters/getTableExpressions.h
+++ b/src/Interpreters/getTableExpressions.h
@ -10,13 +10,17 @@ namespace DB
 struct ASTTableExpression;
 class ASTSelectQuery;

+using ASTTableExprConstPtrs = std::vector<const ASTTableExpression *>;
+
 NameSet removeDuplicateColumns(NamesAndTypesList & columns);

-std::vector<const ASTTableExpression *> getTableExpressions(const ASTSelectQuery & select_query);
+ASTTableExprConstPtrs getTableExpressions(const ASTSelectQuery & select_query);
+
 const ASTTableExpression * getTableExpression(const ASTSelectQuery & select, size_t table_number);
+
 ASTPtr extractTableExpression(const ASTSelectQuery & select, size_t table_number);

-NamesAndTypesList getColumnsFromTableExpression(const ASTTableExpression & table_expression, ContextPtr context);
-TablesWithColumns getDatabaseAndTablesWithColumns(const std::vector<const ASTTableExpression *> & table_expressions, ContextPtr context);
+TablesWithColumns getDatabaseAndTablesWithColumns(
+    const ASTTableExprConstPtrs & table_expressions, ContextPtr context, bool include_alias_cols, bool include_materialized_cols);

 }
--- a/src/Parsers/ParserSelectQuery.cpp
+++ b/src/Parsers/ParserSelectQuery.cpp
@ -1,4 +1,5 @@
 #include <memory>
+#include <Parsers/ASTLiteral.h>
 #include <Parsers/ASTSelectQuery.h>
 #include <Parsers/IParserBase.h>
 #include <Parsers/CommonParsers.h>
@ -16,11 +17,12 @@ namespace DB

 namespace ErrorCodes
 {
-    extern const int TOP_AND_LIMIT_TOGETHER;
-    extern const int WITH_TIES_WITHOUT_ORDER_BY;
+    extern const int FIRST_AND_NEXT_TOGETHER;
    extern const int LIMIT_BY_WITH_TIES_IS_NOT_SUPPORTED;
    extern const int ROW_AND_ROWS_TOGETHER;
-    extern const int FIRST_AND_NEXT_TOGETHER;
+    extern const int SYNTAX_ERROR;
+    extern const int TOP_AND_LIMIT_TOGETHER;
+    extern const int WITH_TIES_WITHOUT_ORDER_BY;
 }


@ -32,6 +34,7 @@ bool ParserSelectQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
    ParserKeyword s_select("SELECT");
    ParserKeyword s_all("ALL");
    ParserKeyword s_distinct("DISTINCT");
+    ParserKeyword s_distinct_on("DISTINCT ON");
    ParserKeyword s_from("FROM");
    ParserKeyword s_prewhere("PREWHERE");
    ParserKeyword s_where("WHERE");
@ -77,12 +80,13 @@ bool ParserSelectQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
    ASTPtr limit_by_length;
    ASTPtr limit_by_offset;
    ASTPtr limit_by_expression_list;
+    ASTPtr distinct_on_expression_list;
    ASTPtr limit_offset;
    ASTPtr limit_length;
    ASTPtr top_length;
    ASTPtr settings;

-    /// WITH expr list
+    /// WITH expr_list
    {
        if (s_with.ignore(pos, expected))
        {
@ -94,7 +98,7 @@ bool ParserSelectQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
        }
    }

-    /// SELECT [ALL/DISTINCT] [TOP N [WITH TIES]] expr list
+    /// SELECT [ALL/DISTINCT [ON (expr_list)]] [TOP N [WITH TIES]] expr_list
    {
        bool has_all = false;
        if (!s_select.ignore(pos, expected))
@ -103,13 +107,27 @@ bool ParserSelectQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
        if (s_all.ignore(pos, expected))
            has_all = true;

-        if (s_distinct.ignore(pos, expected))
+        if (s_distinct_on.ignore(pos, expected))
+        {
+            if (open_bracket.ignore(pos, expected))
+            {
+                if (!exp_list.parse(pos, distinct_on_expression_list, expected))
+                    return false;
+                if (!close_bracket.ignore(pos, expected))
+                    return false;
+            }
+            else
+                return false;
+        }
+        else if (s_distinct.ignore(pos, expected))
+        {
            select_query->distinct = true;
+        }

        if (!has_all && s_all.ignore(pos, expected))
            has_all = true;

-        if (has_all && select_query->distinct)
+        if (has_all && (select_query->distinct || distinct_on_expression_list))
            return false;

        if (s_top.ignore(pos, expected))
@ -256,13 +274,19 @@ bool ParserSelectQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
            select_query->limit_with_ties = true;
        }

+        if (limit_with_ties_occured && distinct_on_expression_list)
+            throw Exception("Can not use WITH TIES alongside LIMIT BY/DISTINCT ON", ErrorCodes::LIMIT_BY_WITH_TIES_IS_NOT_SUPPORTED);
+
        if (s_by.ignore(pos, expected))
        {
            /// WITH TIES was used alongside LIMIT BY
            /// But there are other kind of queries like LIMIT n BY smth LIMIT m WITH TIES which are allowed.
            /// So we have to ignore WITH TIES exactly in LIMIT BY state.
            if (limit_with_ties_occured)
-                throw Exception("Can not use WITH TIES alongside LIMIT BY", ErrorCodes::LIMIT_BY_WITH_TIES_IS_NOT_SUPPORTED);
+                throw Exception("Can not use WITH TIES alongside LIMIT BY/DISTINCT ON", ErrorCodes::LIMIT_BY_WITH_TIES_IS_NOT_SUPPORTED);
+
+            if (distinct_on_expression_list)
+                throw Exception("Can not use DISTINCT ON alongside LIMIT BY", ErrorCodes::SYNTAX_ERROR);

            limit_by_length = limit_length;
            limit_by_offset = limit_offset;
@ -335,6 +359,17 @@ bool ParserSelectQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
        }
    }

+    if (distinct_on_expression_list)
+    {
+        /// DISTINCT ON and LIMIT BY are mutually exclusive, checked before
+        assert (limit_by_expression_list == nullptr);
+
+        /// Transform `DISTINCT ON expr` to `LIMIT 1 BY expr`
+        limit_by_expression_list = distinct_on_expression_list;
+        limit_by_length = std::make_shared<ASTLiteral>(Field{UInt8(1)});
+        distinct_on_expression_list = nullptr;
+    }
+
    /// Because TOP n in totally equals LIMIT n
    if (top_length)
        limit_length = top_length;
--- a/src/Processors/QueryPipeline.cpp
+++ b/src/Processors/QueryPipeline.cpp
@ -350,6 +350,7 @@ std::unique_ptr<QueryPipeline> QueryPipeline::joinPipelines(
    left->pipe.processors.insert(left->pipe.processors.end(), right->pipe.processors.begin(), right->pipe.processors.end());
    left->pipe.holder = std::move(right->pipe.holder);
    left->pipe.header = left->pipe.output_ports.front()->getHeader();
+    left->pipe.max_parallel_streams = std::max(left->pipe.max_parallel_streams, right->pipe.max_parallel_streams);
    return left;
 }

--- a/src/Processors/printPipeline.cpp
+++ b/src/Processors/printPipeline.cpp
@ -103,7 +103,7 @@ void printPipelineCompact(const Processors & processors, WriteBuffer & out, bool

    out << "digraph\n{\n";
    out << "  rankdir=\"LR\";\n";
-    out << "  { node [shape = box]\n";
+    out << "  { node [shape = rect]\n";

    /// Nodes // TODO quoting and escaping
    size_t next_step = 0;
--- a/src/Processors/printPipeline.h
+++ b/src/Processors/printPipeline.h
@ -16,7 +16,7 @@ void printPipeline(const Processors & processors, const Statuses & statuses, Wri
 {
    out << "digraph\n{\n";
    out << "  rankdir=\"LR\";\n";
-    out << "  { node [shape = box]\n";
+    out << "  { node [shape = rect]\n";

    auto get_proc_id = [](const IProcessor & proc) -> UInt64
    {
--- a/src/Storages/HDFS/StorageHDFS.cpp
+++ b/src/Storages/HDFS/StorageHDFS.cpp
@ -34,6 +34,7 @@ namespace DB
 namespace ErrorCodes
 {
    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
+    extern const int ACCESS_DENIED;
 }

 StorageHDFS::StorageHDFS(
@ -280,15 +281,7 @@ Pipe StorageHDFS::read(
    size_t max_block_size,
    unsigned num_streams)
 {
-    size_t begin_of_path;
-    /// This uri is checked for correctness in constructor of StorageHDFS and never modified afterwards
-    auto two_slash = uri.find("//");
-
-    if (two_slash == std::string::npos)
-        begin_of_path = uri.find('/');
-    else
-        begin_of_path = uri.find('/', two_slash + 2);
-
+    const size_t begin_of_path = uri.find('/', uri.find("//") + 2);
    const String path_from_uri = uri.substr(begin_of_path);
    const String uri_without_path = uri.substr(0, begin_of_path);

@ -330,6 +323,21 @@ BlockOutputStreamPtr StorageHDFS::write(const ASTPtr & /*query*/, const StorageM
        chooseCompressionMethod(uri, compression_method));
 }

+void StorageHDFS::truncate(const ASTPtr & /* query */, const StorageMetadataPtr &, ContextPtr context_, TableExclusiveLockHolder &)
+{
+    const size_t begin_of_path = uri.find('/', uri.find("//") + 2);
+    const String path = uri.substr(begin_of_path);
+    const String url = uri.substr(0, begin_of_path);
+
+    HDFSBuilderWrapper builder = createHDFSBuilder(url + "/", context_->getGlobalContext()->getConfigRef());
+    HDFSFSPtr fs = createHDFSFS(builder.get());
+
+    int ret = hdfsDelete(fs.get(), path.data(), 0);
+    if (ret)
+        throw Exception(ErrorCodes::ACCESS_DENIED, "Unable to truncate hdfs table: {}", std::string(hdfsGetLastError()));
+}
+
+
 void registerStorageHDFS(StorageFactory & factory)
 {
    factory.registerStorage("HDFS", [](const StorageFactory::Arguments & args)
--- a/src/Storages/HDFS/StorageHDFS.h
+++ b/src/Storages/HDFS/StorageHDFS.h
@ -34,6 +34,8 @@ public:

    BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override;

+    void truncate(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context_, TableExclusiveLockHolder &) override;
+
    NamesAndTypesList getVirtuals() const override;

 protected:
--- a/src/Storages/MergeTree/ActiveDataPartSet.cpp
+++ b/src/Storages/MergeTree/ActiveDataPartSet.cpp
@ -8,6 +8,11 @@
 namespace DB
 {

+namespace ErrorCodes
+{
+    extern const int LOGICAL_ERROR;
+}
+

 ActiveDataPartSet::ActiveDataPartSet(MergeTreeDataFormatVersion format_version_, const Strings & names)
    : format_version(format_version_)
@ -16,8 +21,7 @@ ActiveDataPartSet::ActiveDataPartSet(MergeTreeDataFormatVersion format_version_,
        add(name);
 }

-/// FIXME replace warnings with logical errors
-bool ActiveDataPartSet::add(const String & name, Strings * out_replaced_parts, Poco::Logger * log)
+bool ActiveDataPartSet::add(const String & name, Strings * out_replaced_parts)
 {
    /// TODO make it exception safe (out_replaced_parts->push_back(...) may throw)
    auto part_info = MergeTreePartInfo::fromPartName(name, format_version);
@ -38,11 +42,7 @@ bool ActiveDataPartSet::add(const String & name, Strings * out_replaced_parts, P
        if (!part_info.contains(it->first))
        {
            if (!part_info.isDisjoint(it->first))
-            {
-                if (log)
-                    LOG_ERROR(log, "Part {} intersects previous part {}. It is a bug.", name, it->first.getPartName());
-                assert(false);
-            }
+                throw Exception(ErrorCodes::LOGICAL_ERROR, "Part {} intersects previous part {}. It is a bug.", name, it->first.getPartName());
            ++it;
            break;
        }
@ -65,11 +65,7 @@ bool ActiveDataPartSet::add(const String & name, Strings * out_replaced_parts, P
    }

    if (it != part_info_to_name.end() && !part_info.isDisjoint(it->first))
-    {
-        if (log)
-            LOG_ERROR(log, "Part {} intersects next part {}. It is a bug.", name, it->first.getPartName());
-        assert(false);
-    }
+        throw Exception(ErrorCodes::LOGICAL_ERROR, "Part {} intersects next part {}. It is a bug.", name, it->first.getPartName());

    part_info_to_name.emplace(part_info, name);
    return true;
--- a/src/Storages/MergeTree/ActiveDataPartSet.h
+++ b/src/Storages/MergeTree/ActiveDataPartSet.h
@ -50,7 +50,7 @@ public:

    /// Returns true if the part was actually added. If out_replaced_parts != nullptr, it will contain
    /// parts that were replaced from the set by the newly added part.
-    bool add(const String & name, Strings * out_replaced_parts = nullptr, Poco::Logger * log = nullptr);
+    bool add(const String & name, Strings * out_replaced_parts = nullptr);

    bool remove(const MergeTreePartInfo & part_info)
    {
--- a/src/Storages/MergeTree/DropPartsRanges.cpp
+++ b/src/Storages/MergeTree/DropPartsRanges.cpp
@ -0,0 +1,65 @@
+#include <Storages/MergeTree/DropPartsRanges.h>
+
+namespace DB
+{
+
+namespace ErrorCodes
+{
+    extern const int LOGICAL_ERROR;
+}
+
+
+bool DropPartsRanges::isAffectedByDropRange(const std::string & new_part_name, std::string & postpone_reason) const
+{
+    if (new_part_name.empty())
+        return false;
+
+    MergeTreePartInfo entry_info = MergeTreePartInfo::fromPartName(new_part_name, format_version);
+    for (const auto & [znode, drop_range] : drop_ranges)
+    {
+        if (!drop_range.isDisjoint(entry_info))
+        {
+            postpone_reason = fmt::format("Has DROP RANGE affecting entry {} producing part {}. Will postpone its execution.", drop_range.getPartName(), new_part_name);
+            return true;
+        }
+    }
+
+    return false;
+}
+
+bool DropPartsRanges::isAffectedByDropRange(const ReplicatedMergeTreeLogEntry & entry, std::string & postpone_reason) const
+{
+    return isAffectedByDropRange(entry.new_part_name, postpone_reason);
+}
+
+void DropPartsRanges::addDropRange(const ReplicatedMergeTreeLogEntryPtr & entry)
+{
+    if (entry->type != ReplicatedMergeTreeLogEntry::DROP_RANGE)
+        throw Exception(ErrorCodes::LOGICAL_ERROR, "Trying to add entry of type {} to drop ranges, expected DROP_RANGE", entry->typeToString());
+
+    MergeTreePartInfo entry_info = MergeTreePartInfo::fromPartName(*entry->getDropRange(format_version), format_version);
+    drop_ranges.emplace(entry->znode_name, entry_info);
+}
+
+void DropPartsRanges::removeDropRange(const ReplicatedMergeTreeLogEntryPtr & entry)
+{
+    if (entry->type != ReplicatedMergeTreeLogEntry::DROP_RANGE)
+        throw Exception(ErrorCodes::LOGICAL_ERROR, "Trying to remove entry of type {} from drop ranges, expected DROP_RANGE", entry->typeToString());
+
+    auto it = drop_ranges.find(entry->znode_name);
+    assert(it != drop_ranges.end());
+    drop_ranges.erase(it);
+}
+
+bool DropPartsRanges::hasDropRange(const MergeTreePartInfo & new_drop_range_info) const
+{
+    for (const auto & [znode_name, drop_range] : drop_ranges)
+    {
+        if (drop_range.contains(new_drop_range_info))
+            return true;
+    }
+
+    return false;
+}
+
+}
--- a/src/Storages/MergeTree/DropPartsRanges.h
+++ b/src/Storages/MergeTree/DropPartsRanges.h
@ -0,0 +1,43 @@
+#pragma once
+
+#include <unordered_map>
+#include <Storages/MergeTree/MergeTreePartInfo.h>
+#include <Storages/MergeTree/MergeTreeDataFormatVersion.h>
+#include <Storages/MergeTree/ReplicatedMergeTreeLogEntry.h>
+
+namespace DB
+{
+
+/// All drop ranges in ReplicatedQueue.
+/// Used to postpone execution of entries affected by DROP RANGE
+class DropPartsRanges
+{
+private:
+    MergeTreeDataFormatVersion format_version;
+
+    /// znode_name -> drop_range
+    std::unordered_map<std::string, MergeTreePartInfo> drop_ranges;
+public:
+
+    explicit DropPartsRanges(MergeTreeDataFormatVersion format_version_)
+        : format_version(format_version_)
+    {}
+
+    /// Entry is affected by DROP_RANGE and must be postponed
+    bool isAffectedByDropRange(const ReplicatedMergeTreeLogEntry & entry, std::string & postpone_reason) const;
+
+    /// Part is affected by DROP_RANGE and must be postponed
+    bool isAffectedByDropRange(const std::string & new_part_name, std::string & postpone_reason) const;
+
+    /// Already has equal DROP_RANGE. Don't need to assign new one
+    bool hasDropRange(const MergeTreePartInfo & new_drop_range_info) const;
+
+    /// Add DROP_RANGE to map
+    void addDropRange(const ReplicatedMergeTreeLogEntryPtr & entry);
+
+    /// Remove DROP_RANGE from map
+    void removeDropRange(const ReplicatedMergeTreeLogEntryPtr & entry);
+
+};
+
+}
--- a/src/Storages/MergeTree/MergeTreeData.cpp
+++ b/src/Storages/MergeTree/MergeTreeData.cpp
@ -1818,11 +1818,10 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, Context
            if (MergeTreeSettings::isPartFormatSetting(setting_name) && !new_value)
            {
                /// Use default settings + new and check if doesn't affect part format settings
-                MergeTreeSettings copy = *getSettings();
-                copy.resetToDefault();
-                copy.applyChanges(new_changes);
+                auto copy = getDefaultSettings();
+                copy->applyChanges(new_changes);
                String reason;
-                if (!canUsePolymorphicParts(copy, &reason) && !reason.empty())
+                if (!canUsePolymorphicParts(*copy, &reason) && !reason.empty())
                    throw Exception("Can't change settings. Reason: " + reason, ErrorCodes::NOT_IMPLEMENTED);
            }

@ -1984,14 +1983,12 @@ void MergeTreeData::changeSettings(
            }
        }

-        MergeTreeSettings copy = *getSettings();
-        /// reset to default settings before applying existing
-        copy.resetToDefault();
-        copy.applyChanges(new_changes);
+        /// Reset to default settings before applying existing.
+        auto copy = getDefaultSettings();
+        copy->applyChanges(new_changes);
+        copy->sanityCheck(getContext()->getSettingsRef());

-        copy.sanityCheck(getContext()->getSettingsRef());
-
-        storage_settings.set(std::make_unique<const MergeTreeSettings>(copy));
+        storage_settings.set(std::move(copy));
        StorageInMemoryMetadata new_metadata = getInMemoryMetadata();
        new_metadata.setSettingsChanges(new_settings);
        setInMemoryMetadata(new_metadata);
@ -3980,13 +3977,11 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(
            candidate.where_column_name = analysis_result.where_column_name;
            candidate.remove_where_filter = analysis_result.remove_where_filter;
            candidate.before_where = analysis_result.before_where->clone();
-            // std::cerr << fmt::format("before_where_actions = \n{}", candidate.before_where->dumpDAG()) << std::endl;
+
            required_columns = candidate.before_where->foldActionsByProjection(
                required_columns,
                projection.sample_block_for_keys,
                candidate.where_column_name);
-            // std::cerr << fmt::format("before_where_actions = \n{}", candidate.before_where->dumpDAG()) << std::endl;
-            // std::cerr << fmt::format("where_required_columns = \n{}", fmt::join(required_columns, ", ")) << std::endl;

            if (required_columns.empty())
                return false;
@ -4002,12 +3997,11 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(
            // required_columns should not contain columns generated by prewhere
            for (const auto & column : prewhere_actions->getResultColumns())
                required_columns.erase(column.name);
-            // std::cerr << fmt::format("prewhere_actions = \n{}", prewhere_actions->dumpDAG()) << std::endl;
+
            // Prewhere_action should not add missing keys.
            prewhere_required_columns = prewhere_actions->foldActionsByProjection(
                prewhere_required_columns, projection.sample_block_for_keys, candidate.prewhere_info->prewhere_column_name, false);
-            // std::cerr << fmt::format("prewhere_actions = \n{}", prewhere_actions->dumpDAG()) << std::endl;
-            // std::cerr << fmt::format("prewhere_required_columns = \n{}", fmt::join(prewhere_required_columns, ", ")) << std::endl;
+
            if (prewhere_required_columns.empty())
                return false;
            candidate.prewhere_info->prewhere_actions = prewhere_actions;
@ -4017,7 +4011,7 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(
                auto row_level_filter_actions = candidate.prewhere_info->row_level_filter->clone();
                prewhere_required_columns = row_level_filter_actions->foldActionsByProjection(
                    prewhere_required_columns, projection.sample_block_for_keys, candidate.prewhere_info->row_level_column_name, false);
-                // std::cerr << fmt::format("row_level_filter_required_columns = \n{}", fmt::join(prewhere_required_columns, ", ")) << std::endl;
+
                if (prewhere_required_columns.empty())
                    return false;
                candidate.prewhere_info->row_level_filter = row_level_filter_actions;
@ -4026,11 +4020,9 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(
            if (candidate.prewhere_info->alias_actions)
            {
                auto alias_actions = candidate.prewhere_info->alias_actions->clone();
-                // std::cerr << fmt::format("alias_actions = \n{}", alias_actions->dumpDAG()) << std::endl;
                prewhere_required_columns
                    = alias_actions->foldActionsByProjection(prewhere_required_columns, projection.sample_block_for_keys, {}, false);
-                // std::cerr << fmt::format("alias_actions = \n{}", alias_actions->dumpDAG()) << std::endl;
-                // std::cerr << fmt::format("alias_required_columns = \n{}", fmt::join(prewhere_required_columns, ", ")) << std::endl;
+
                if (prewhere_required_columns.empty())
                    return false;
                candidate.prewhere_info->alias_actions = alias_actions;
@ -4058,7 +4050,6 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(

        if (projection.type == ProjectionDescription::Type::Aggregate && analysis_result.need_aggregate && can_use_aggregate_projection)
        {
-            // std::cerr << fmt::format("====== aggregate projection analysis: {} ======", projection.name) << std::endl;
            bool match = true;
            Block aggregates;
            // Let's first check if all aggregates are provided by current projection
@ -4084,11 +4075,8 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(
            // needs to provide aggregation keys, and certain children DAG might be substituted by
            // some keys in projection.
            candidate.before_aggregation = analysis_result.before_aggregation->clone();
-            // std::cerr << fmt::format("keys = {}", fmt::join(keys, ", ")) << std::endl;
-            // std::cerr << fmt::format("before_aggregation = \n{}", candidate.before_aggregation->dumpDAG()) << std::endl;
            auto required_columns = candidate.before_aggregation->foldActionsByProjection(keys, projection.sample_block_for_keys);
-            // std::cerr << fmt::format("before_aggregation = \n{}", candidate.before_aggregation->dumpDAG()) << std::endl;
-            // std::cerr << fmt::format("aggregate_required_columns = \n{}", fmt::join(required_columns, ", ")) << std::endl;
+
            if (required_columns.empty() && !keys.empty())
                continue;

@ -4113,12 +4101,10 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(
                    candidate.required_columns.push_back(aggregate.name);
                candidates.push_back(std::move(candidate));
            }
-            // std::cerr << fmt::format("====== aggregate projection analysis end: {} ======", projection.name) << std::endl;
        }

        if (projection.type == ProjectionDescription::Type::Normal && (analysis_result.hasWhere() || analysis_result.hasPrewhere()))
        {
-            // std::cerr << fmt::format("====== normal projection analysis: {} ======", projection.name) << std::endl;
            const auto & actions
                = analysis_result.before_aggregation ? analysis_result.before_aggregation : analysis_result.before_order_by;
            NameSet required_columns;
@ -4130,16 +4116,12 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection(
                candidate.required_columns = {required_columns.begin(), required_columns.end()};
                candidates.push_back(std::move(candidate));
            }
-            // std::cerr << fmt::format("====== normal projection analysis end: {} ======", projection.name) << std::endl;
        }
    }

    // Let's select the best projection to execute the query.
    if (!candidates.empty())
    {
-        // First build a MergeTreeDataSelectCache to check if a projection is indeed better than base
-        // query_info.merge_tree_data_select_cache = std::make_unique<MergeTreeDataSelectCache>();
-
        std::shared_ptr<PartitionIdToMaxBlock> max_added_blocks;
        if (settings.select_sequential_consistency)
        {
--- a/src/Storages/MergeTree/MergeTreeData.h
+++ b/src/Storages/MergeTree/MergeTreeData.h
@ -1087,6 +1087,9 @@ private:

    // Get partition matcher for FREEZE / UNFREEZE queries.
    MatcherFn getPartitionMatcher(const ASTPtr & partition, ContextPtr context) const;
+
+    /// Returns default settings for storage with possible changes from global config.
+    virtual std::unique_ptr<MergeTreeSettings> getDefaultSettings() const = 0;
 };

 /// RAII struct to record big parts that are submerging or emerging.
--- a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
+++ b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
@ -752,13 +752,16 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
    bool force_ttl = false;
    for (const auto & part : parts)
    {
-        new_data_part->ttl_infos.update(part->ttl_infos);
        if (metadata_snapshot->hasAnyTTL() && !part->checkAllTTLCalculated(metadata_snapshot))
        {
            LOG_INFO(log, "Some TTL values were not calculated for part {}. Will calculate them forcefully during merge.", part->name);
            need_remove_expired_values = true;
            force_ttl = true;
        }
+        else
+        {
+            new_data_part->ttl_infos.update(part->ttl_infos);
+        }
    }

    const auto & part_min_ttl = new_data_part->ttl_infos.part_min_ttl;
@ -939,7 +942,10 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
        merged_stream = std::make_shared<DistinctSortedBlockInputStream>(merged_stream, sort_description, SizeLimits(), 0 /*limit_hint*/, deduplicate_by_columns);

    if (need_remove_expired_values)
+    {
+        LOG_DEBUG(log, "Outdated rows found in source parts, TTLs processing enabled for merge");
        merged_stream = std::make_shared<TTLBlockInputStream>(merged_stream, data, metadata_snapshot, new_data_part, time_of_merge, force_ttl);
+    }

    if (metadata_snapshot->hasSecondaryIndices())
    {
--- a/src/Storages/MergeTree/MergeTreeDataPartTTLInfo.cpp
+++ b/src/Storages/MergeTree/MergeTreeDataPartTTLInfo.cpp
@ -55,6 +55,10 @@ void MergeTreeDataPartTTLInfos::read(ReadBuffer & in)
            MergeTreeDataPartTTLInfo ttl_info;
            ttl_info.min = col["min"].getUInt();
            ttl_info.max = col["max"].getUInt();
+
+            if (col.has("finished"))
+                ttl_info.finished = col["finished"].getUInt();
+
            String name = col["name"].getString();
            columns_ttl.emplace(name, ttl_info);

@ -67,6 +71,9 @@ void MergeTreeDataPartTTLInfos::read(ReadBuffer & in)
        table_ttl.min = table["min"].getUInt();
        table_ttl.max = table["max"].getUInt();

+        if (table.has("finished"))
+            table_ttl.finished = table["finished"].getUInt();
+
        updatePartMinMaxTTL(table_ttl.min, table_ttl.max);
    }

@ -77,6 +84,10 @@ void MergeTreeDataPartTTLInfos::read(ReadBuffer & in)
            MergeTreeDataPartTTLInfo ttl_info;
            ttl_info.min = elem["min"].getUInt();
            ttl_info.max = elem["max"].getUInt();
+
+            if (elem.has("finished"))
+                ttl_info.finished = elem["finished"].getUInt();
+
            String expression = elem["expression"].getString();
            ttl_info_map.emplace(expression, ttl_info);

@ -126,6 +137,8 @@ void MergeTreeDataPartTTLInfos::write(WriteBuffer & out) const
            writeIntText(it->second.min, out);
            writeString(",\"max\":", out);
            writeIntText(it->second.max, out);
+            writeString(R"(,"finished":)", out);
+            writeIntText(static_cast<uint8_t>(it->second.finished), out);
            writeString("}", out);
        }
        writeString("]", out);
@ -138,6 +151,8 @@ void MergeTreeDataPartTTLInfos::write(WriteBuffer & out) const
        writeIntText(table_ttl.min, out);
        writeString(R"(,"max":)", out);
        writeIntText(table_ttl.max, out);
+        writeString(R"(,"finished":)", out);
+        writeIntText(static_cast<uint8_t>(table_ttl.finished), out);
        writeString("}", out);
    }

@ -159,6 +174,8 @@ void MergeTreeDataPartTTLInfos::write(WriteBuffer & out) const
            writeIntText(it->second.min, out);
            writeString(R"(,"max":)", out);
            writeIntText(it->second.max, out);
+            writeString(R"(,"finished":)", out);
+            writeIntText(static_cast<uint8_t>(it->second.finished), out);
            writeString("}", out);
        }
        writeString("]", out);
@ -202,6 +219,39 @@ time_t MergeTreeDataPartTTLInfos::getMinimalMaxRecompressionTTL() const
    return max;
 }

+bool MergeTreeDataPartTTLInfos::hasAnyNonFinishedTTLs() const
+{
+    auto has_non_finished_ttl = [] (const TTLInfoMap & map) -> bool
+    {
+        for (const auto & [name, info] : map)
+        {
+            if (!info.finished)
+                return true;
+        }
+        return false;
+    };
+
+    if (!table_ttl.finished)
+        return true;
+
+    if (has_non_finished_ttl(columns_ttl))
+        return true;
+
+    if (has_non_finished_ttl(rows_where_ttl))
+        return true;
+
+    if (has_non_finished_ttl(moves_ttl))
+        return true;
+
+    if (has_non_finished_ttl(recompression_ttl))
+        return true;
+
+    if (has_non_finished_ttl(group_by_ttl))
+        return true;
+
+    return false;
+}
+
 std::optional<TTLDescription> selectTTLDescriptionForTTLInfos(const TTLDescriptions & descriptions, const TTLInfoMap & ttl_info_map, time_t current_time, bool use_max)
 {
    time_t best_ttl_time = 0;
@ -232,4 +282,5 @@ std::optional<TTLDescription> selectTTLDescriptionForTTLInfos(const TTLDescripti
    return best_ttl_time ? *best_entry_it : std::optional<TTLDescription>();
 }

+
 }
--- a/src/Storages/MergeTree/MergeTreeDataPartTTLInfo.h
+++ b/src/Storages/MergeTree/MergeTreeDataPartTTLInfo.h
@ -14,6 +14,11 @@ struct MergeTreeDataPartTTLInfo
    time_t min = 0;
    time_t max = 0;

+    /// This TTL was computed on a completely expired part. It doesn't make sense
+    /// to select such parts for TTL again, but it does make sense to recalculate
+    /// the TTL for a merge with multiple parts.
+    bool finished = false;
+
    void update(time_t time)
    {
        if (time && (!min || time < min))
@ -28,6 +33,7 @@ struct MergeTreeDataPartTTLInfo
            min = other_info.min;

        max = std::max(other_info.max, max);
+        finished &= other_info.finished;
    }
 };

@ -60,6 +66,9 @@ struct MergeTreeDataPartTTLInfos
    void write(WriteBuffer & out) const;
    void update(const MergeTreeDataPartTTLInfos & other_infos);

+    /// Has any TTLs which are not calculated on completely expired parts.
+    bool hasAnyNonFinishedTTLs() const;
+
    void updatePartMinMaxTTL(time_t time_min, time_t time_max)
    {
        if (time_min && (!part_min_ttl || time_min < part_min_ttl))
--- a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
+++ b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
@ -201,7 +201,6 @@ QueryPlanPtr MergeTreeDataSelectExecutor::read(
            // NOTE: prewhere is executed inside readFromParts
            if (query_info.projection->before_where)
            {
-                // std::cerr << fmt::format("projection before_where: {}", query_info.projection->before_where->dumpDAG());
                auto where_step = std::make_unique<FilterStep>(
                    plan->getCurrentDataStream(),
                    query_info.projection->before_where,
@ -214,7 +213,6 @@ QueryPlanPtr MergeTreeDataSelectExecutor::read(

            if (query_info.projection->before_aggregation)
            {
-                // std::cerr << fmt::format("projection before_aggregation: {}", query_info.projection->before_aggregation->dumpDAG());
                auto expression_before_aggregation
                    = std::make_unique<ExpressionStep>(plan->getCurrentDataStream(), query_info.projection->before_aggregation);
                expression_before_aggregation->setStepDescription("Before GROUP BY");
@ -268,9 +266,6 @@ QueryPlanPtr MergeTreeDataSelectExecutor::read(
        {
            const auto & header_before_aggregation = pipe.getHeader();

-            // std::cerr << "============ header before aggregation" << std::endl;
-            // std::cerr << header_before_aggregation.dumpStructure() << std::endl;
-
            ColumnNumbers keys;
            for (const auto & key : query_info.projection->aggregation_keys)
                keys.push_back(header_before_aggregation.getPositionByName(key.name));
@ -350,9 +345,6 @@ QueryPlanPtr MergeTreeDataSelectExecutor::read(
                return std::make_shared<AggregatingTransform>(
                    header, transform_params, many_data, counter++, merge_threads, temporary_data_merge_threads);
            });
-
-            // std::cerr << "============ header after aggregation" << std::endl;
-            // std::cerr << pipe.getHeader().dumpStructure() << std::endl;
        };

        if (!projection_pipe.empty())
--- a/src/Storages/MergeTree/ReplicatedMergeTreeLogEntry.cpp
+++ b/src/Storages/MergeTree/ReplicatedMergeTreeLogEntry.cpp
@ -431,6 +431,16 @@ std::optional<String> ReplicatedMergeTreeLogEntryData::getDropRange(MergeTreeDat
    return {};
 }

+bool ReplicatedMergeTreeLogEntryData::isDropPart(MergeTreeDataFormatVersion format_version) const
+{
+    if (type == DROP_RANGE)
+    {
+        auto drop_range_info = MergeTreePartInfo::fromPartName(new_part_name, format_version);
+        return !drop_range_info.isFakeDropRangePart();
+    }
+    return false;
+}
+
 Strings ReplicatedMergeTreeLogEntryData::getVirtualPartNames(MergeTreeDataFormatVersion format_version) const
 {
    /// Doesn't produce any part
@ -439,7 +449,30 @@ Strings ReplicatedMergeTreeLogEntryData::getVirtualPartNames(MergeTreeDataFormat

    /// DROP_RANGE does not add a real part, but we must disable merges in that range
    if (type == DROP_RANGE)
+    {
+        auto drop_range_part_info = MergeTreePartInfo::fromPartName(new_part_name, format_version);
+
+        /// It's DROP PART and we don't want to add it into virtual parts
+        /// because it can lead to intersecting parts on stale replicas and this
+        /// problem is fundamental. So we have very weak guarantees for DROP
+        /// PART. If a concurrent merge is assigned, DROP PART will delete
+        /// nothing and the part will be successfully merged into a bigger part.
+        ///
+        /// dropPart is used in the following cases:
+        /// 1) Remove empty parts after TTL.
+        /// 2) Remove parts after move between shards.
+        /// 3) User queries: ALTER TABLE DROP PART 'part_name'.
+        ///
+        /// In the first case, merging an empty part is even better than DROP. In
+        /// the second case, part UUIDs are used to forbid merges for moving parts, so
+        /// there is no problem with concurrent merges. The third case is quite
+        /// rare and we give a very weak guarantee: there will be no active part
+        /// with this name, but possibly it was merged into some other part.
+        if (!drop_range_part_info.isFakeDropRangePart())
+            return {};
+
        return {new_part_name};
+    }

    if (type == REPLACE_RANGE)
    {
--- a/src/Storages/MergeTree/ReplicatedMergeTreeLogEntry.h
+++ b/src/Storages/MergeTree/ReplicatedMergeTreeLogEntry.h
@ -143,6 +143,10 @@ struct ReplicatedMergeTreeLogEntryData
    /// Returns fake part for drop range (for DROP_RANGE and REPLACE_RANGE)
    std::optional<String> getDropRange(MergeTreeDataFormatVersion format_version) const;

+    /// This entry is DROP PART, not DROP PARTITION. They both have the same
+    /// DROP_RANGE entry type, but differ in the information about the drop range.
+    bool isDropPart(MergeTreeDataFormatVersion format_version) const;
+
    /// Access under queue_mutex, see ReplicatedMergeTreeQueue.
    bool currently_executing = false;    /// Whether the action is executing now.
    bool removed_by_other_entry = false;
--- a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp
+++ b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp
@ -26,6 +26,7 @@ ReplicatedMergeTreeQueue::ReplicatedMergeTreeQueue(StorageReplicatedMergeTree &
    , format_version(storage.format_version)
    , current_parts(format_version)
    , virtual_parts(format_version)
+    , drop_ranges(format_version)
 {
    zookeeper_path = storage.zookeeper_path;
    replica_path = storage.replica_path;
@ -52,8 +53,8 @@ void ReplicatedMergeTreeQueue::initialize(const MergeTreeData::DataParts & parts
    std::lock_guard lock(state_mutex);
    for (const auto & part : parts)
    {
-        current_parts.add(part->name, nullptr, log);
-        virtual_parts.add(part->name, nullptr, log);
+        current_parts.add(part->name, nullptr);
+        virtual_parts.add(part->name, nullptr);
    }
 }

@ -154,7 +155,7 @@ void ReplicatedMergeTreeQueue::insertUnlocked(
 {
    for (const String & virtual_part_name : entry->getVirtualPartNames(format_version))
    {
-        virtual_parts.add(virtual_part_name, nullptr, log);
+        virtual_parts.add(virtual_part_name, nullptr);
        /// Don't add drop range parts to mutations
        /// they don't produce any useful parts
        if (entry->type != LogEntry::DROP_RANGE)
@ -168,6 +169,13 @@ void ReplicatedMergeTreeQueue::insertUnlocked(
    }
    else
    {
+        drop_ranges.addDropRange(entry);
+
+        /// DROP PART remove parts, so we remove it from virtual parts to
+        /// preserve invariant virtual_parts = current_parts + queue
+        if (entry->isDropPart(format_version))
+            virtual_parts.removePartAndCoveredParts(*entry->getDropRange(format_version));
+
        queue.push_front(entry);
    }

@ -248,7 +256,7 @@ void ReplicatedMergeTreeQueue::updateStateOnQueueEntryRemoval(

        for (const String & virtual_part_name : entry->getVirtualPartNames(format_version))
        {
-            current_parts.add(virtual_part_name, nullptr, log);
+            current_parts.add(virtual_part_name, nullptr);

            /// These parts are already covered by newer part, we don't have to
            /// mutate it.
@ -257,10 +265,23 @@ void ReplicatedMergeTreeQueue::updateStateOnQueueEntryRemoval(

        if (auto drop_range_part_name = entry->getDropRange(format_version))
        {
-            current_parts.remove(*drop_range_part_name);
+            MergeTreePartInfo drop_range_info = MergeTreePartInfo::fromPartName(*drop_range_part_name, format_version);
+
+            /// DROP PART doesn't have virtual parts so remove from current
+            /// parts all covered parts.
+            if (entry->isDropPart(format_version))
+                current_parts.removePartAndCoveredParts(*drop_range_part_name);
+            else
+                current_parts.remove(*drop_range_part_name);
+
            virtual_parts.remove(*drop_range_part_name);
        }

+        if (entry->type == LogEntry::DROP_RANGE)
+        {
+            drop_ranges.removeDropRange(entry);
+        }
+
        if (entry->type == LogEntry::ALTER_METADATA)
        {
            LOG_TRACE(log, "Finishing metadata alter with version {}", entry->alter_version);
@ -269,6 +290,11 @@ void ReplicatedMergeTreeQueue::updateStateOnQueueEntryRemoval(
    }
    else
    {
+        if (entry->type == LogEntry::DROP_RANGE)
+        {
+            drop_ranges.removeDropRange(entry);
+        }
+
        for (const String & virtual_part_name : entry->getVirtualPartNames(format_version))
        {
            /// Because execution of the entry is unsuccessful,
@ -978,6 +1004,11 @@ bool ReplicatedMergeTreeQueue::addFuturePartIfNotCoveredByThem(const String & pa
 {
    std::lock_guard lock(state_mutex);

+    /// FIXME get rid of actual_part_name.
+    /// If new covering part jumps over DROP_RANGE we should execute drop range first
+    if (drop_ranges.isAffectedByDropRange(part_name, reject_reason))
+        return false;
+
    if (isNotCoveredByFuturePartsImpl(entry.znode_name, part_name, reject_reason, lock))
    {
        CurrentlyExecuting::setActualPartName(entry, part_name, *this);
@ -1003,6 +1034,9 @@ bool ReplicatedMergeTreeQueue::shouldExecuteLogEntry(
            return false;
    }

+    if (entry.type != LogEntry::DROP_RANGE && drop_ranges.isAffectedByDropRange(entry, out_postpone_reason))
+        return false;
+
    /// Check that fetches pool is not overloaded
    if ((entry.type == LogEntry::GET_PART || entry.type == LogEntry::ATTACH_PART)
        && !storage.canExecuteFetch(entry, out_postpone_reason))
@ -2074,6 +2108,12 @@ bool ReplicatedMergeTreeMergePredicate::isMutationFinished(const ReplicatedMerge
    return true;
 }

+bool ReplicatedMergeTreeMergePredicate::hasDropRange(const MergeTreePartInfo & new_drop_range_info) const
+{
+    std::lock_guard lock(queue.state_mutex);
+    return queue.drop_ranges.hasDropRange(new_drop_range_info);
+}
+

 ReplicatedMergeTreeQueue::SubscriberHandler
 ReplicatedMergeTreeQueue::addSubscriber(ReplicatedMergeTreeQueue::SubscriberCallBack && callback)
--- a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.h
+++ b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.h
@ -11,6 +11,7 @@
 #include <Storages/MergeTree/PinnedPartUUIDs.h>
 #include <Storages/MergeTree/ReplicatedMergeTreeQuorumAddedParts.h>
 #include <Storages/MergeTree/ReplicatedMergeTreeAltersSequence.h>
+#include <Storages/MergeTree/DropPartsRanges.h>

 #include <Common/ZooKeeper/ZooKeeper.h>

@ -100,6 +101,10 @@ private:
      */
    ActiveDataPartSet virtual_parts;

+
+    /// Dropped ranges inserted into queue
+    DropPartsRanges drop_ranges;
+
    /// A set of mutations loaded from ZooKeeper.
    /// mutations_by_partition is an index partition ID -> block ID -> mutation into this set.
    /// Note that mutations are updated in such a way that they are always more recent than
@ -475,6 +480,8 @@ public:
    /// The version of "log" node that is used to check that no new merges have appeared.
    int32_t getVersion() const { return merges_version; }

+    bool hasDropRange(const MergeTreePartInfo & new_drop_range_info) const;
+
 private:
    const ReplicatedMergeTreeQueue & queue;

--- a/src/Storages/MergeTree/StorageFromBasePartsOfProjection.h
+++ b/src/Storages/MergeTree/StorageFromBasePartsOfProjection.h
@ -1,75 +0,0 @@
-#pragma once
-
-#include <Core/Defines.h>
-#include <Processors/QueryPipeline.h>
-#include <Processors/QueryPlan/BuildQueryPipelineSettings.h>
-#include <Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.h>
-#include <Processors/QueryPlan/QueryPlan.h>
-#include <Storages/IStorage.h>
-#include <Storages/MergeTree/IMergeTreeDataPart.h>
-#include <Storages/MergeTree/MergeTreeDataSelectExecutor.h>
-
-#include <common/shared_ptr_helper.h>
-
-
-namespace DB
-{
-/// A Storage that allows reading from a single MergeTree data part.
-class StorageFromBasePartsOfProjection final : public shared_ptr_helper<StorageFromBasePartsOfProjection>, public IStorage
-{
-    friend struct shared_ptr_helper<StorageFromBasePartsOfProjection>;
-
-public:
-    String getName() const override { return "FromBasePartsOfProjection"; }
-
-    Pipe read(
-        const Names & column_names,
-        const StorageMetadataPtr & metadata_snapshot,
-        SelectQueryInfo & query_info,
-        ContextPtr context,
-        QueryProcessingStage::Enum /*processed_stage*/,
-        size_t max_block_size,
-        unsigned num_streams) override
-    {
-        // NOTE: It's used to read normal parts only
-        QueryPlan query_plan = std::move(*MergeTreeDataSelectExecutor(storage).readFromParts(
-            {},
-            column_names,
-            metadata_snapshot,
-            metadata_snapshot,
-            query_info,
-            context,
-            max_block_size,
-            num_streams,
-            nullptr,
-            query_info.projection ? query_info.projection->merge_tree_data_select_base_cache.get()
-                                  : query_info.merge_tree_data_select_cache.get()));
-
-        return query_plan.convertToPipe(
-            QueryPlanOptimizationSettings::fromContext(context), BuildQueryPipelineSettings::fromContext(context));
-    }
-
-
-    bool supportsIndexForIn() const override { return true; }
-
-    bool mayBenefitFromIndexForIn(
-        const ASTPtr & left_in_operand, ContextPtr query_context, const StorageMetadataPtr & metadata_snapshot) const override
-    {
-        return storage.mayBenefitFromIndexForIn(left_in_operand, query_context, metadata_snapshot);
-    }
-
-    NamesAndTypesList getVirtuals() const override { return storage.getVirtuals(); }
-
-protected:
-    StorageFromBasePartsOfProjection(const MergeTreeData & storage_, const StorageMetadataPtr & metadata_snapshot)
-        : IStorage(storage_.getStorageID()), storage(storage_)
-    {
-        setInMemoryMetadata(*metadata_snapshot);
-    }
-
-
-private:
-    const MergeTreeData & storage;
-};
-
-}
--- a/src/Storages/MergeTree/TTLMergeSelector.cpp
+++ b/src/Storages/MergeTree/TTLMergeSelector.cpp
@ -111,6 +111,10 @@ bool TTLDeleteMergeSelector::isTTLAlreadySatisfied(const IMergeSelector::Part &
    if (only_drop_parts)
        return false;

+    /// All TTL satisfied
+    if (!part.ttl_infos->hasAnyNonFinishedTTLs())
+        return true;
+
    return !part.shall_participate_in_merges;
 }

--- a/src/Storages/SelectQueryInfo.h
+++ b/src/Storages/SelectQueryInfo.h
@ -99,8 +99,6 @@ class IMergeTreeDataPart;

 using ManyExpressionActions = std::vector<ExpressionActionsPtr>;

-struct MergeTreeDataSelectCache;
-
 // The projection selected to execute current query
 struct ProjectionCandidate
 {
@ -119,8 +117,6 @@ struct ProjectionCandidate
    ReadInOrderOptimizerPtr order_optimizer;
    InputOrderInfoPtr input_order_info;
    ManyExpressionActions group_by_elements_actions;
-    // std::shared_ptr<MergeTreeDataSelectCache> merge_tree_data_select_base_cache;
-    // std::shared_ptr<MergeTreeDataSelectCache> merge_tree_data_select_projection_cache;
 };

 /** Query along with some additional data,
@ -160,7 +156,6 @@ struct SelectQueryInfo
    /// If not null, it means we choose a projection to execute current query.
    std::optional<ProjectionCandidate> projection;
    bool ignore_projections = false;
-    std::shared_ptr<MergeTreeDataSelectCache> merge_tree_data_select_cache;
 };

 }
--- a/src/Storages/StorageMergeTree.cpp
+++ b/src/Storages/StorageMergeTree.cpp
@ -1604,4 +1604,9 @@ void StorageMergeTree::startBackgroundMovesIfNeeded()
        background_moves_executor.start();
 }

+std::unique_ptr<MergeTreeSettings> StorageMergeTree::getDefaultSettings() const
+{
+    return std::make_unique<MergeTreeSettings>(getContext()->getMergeTreeSettings());
+}
+
 }
--- a/src/Storages/StorageMergeTree.h
+++ b/src/Storages/StorageMergeTree.h
@ -236,6 +236,8 @@ private:

    void startBackgroundMovesIfNeeded() override;

+    std::unique_ptr<MergeTreeSettings> getDefaultSettings() const override;
+
    friend class MergeTreeProjectionBlockOutputStream;
    friend class MergeTreeBlockOutputStream;
    friend class MergeTreeData;
--- a/src/Storages/StorageReplicatedMergeTree.cpp
+++ b/src/Storages/StorageReplicatedMergeTree.cpp
@ -552,6 +552,10 @@ void StorageReplicatedMergeTree::waitMutationToFinishOnReplicas(
                break;
        }

+        /// This replica is inactive, don't check anything
+        if (!inactive_replicas.empty() && inactive_replicas.count(replica))
+            break;
+
        /// It maybe already removed from zk, but local in-memory mutations
        /// state was not updated.
        if (!getZooKeeper()->exists(fs::path(zookeeper_path) / "mutations" / mutation_id))
@ -2104,6 +2108,10 @@ bool StorageReplicatedMergeTree::executeFetch(LogEntry & entry)
        try
        {
            String part_name = entry.actual_new_part_name.empty() ? entry.new_part_name : entry.actual_new_part_name;
+
+            if (!entry.actual_new_part_name.empty())
+                LOG_DEBUG(log, "Will fetch part {} instead of {}", entry.actual_new_part_name, entry.new_part_name);
+
            if (!fetchPart(part_name, metadata_snapshot, fs::path(zookeeper_path) / "replicas" / replica, false, entry.quorum))
                return false;
        }
@ -6986,6 +6994,14 @@ bool StorageReplicatedMergeTree::dropPartImpl(
            return false;
        }

+        if (merge_pred.hasDropRange(part->info))
+        {
+            if (throw_if_noop)
+                throw Exception("Already has DROP RANGE for part " + part_name + " in queue.", ErrorCodes::PART_IS_TEMPORARILY_LOCKED);
+
+            return false;
+        }
+
        /// There isn't a lot we can do otherwise. Can't cancel merges because it is possible that a replica already
        /// finished the merge.
        if (partIsAssignedToBackgroundOperation(part))
@ -7174,6 +7190,11 @@ void StorageReplicatedMergeTree::startBackgroundMovesIfNeeded()
        background_moves_executor.start();
 }

+std::unique_ptr<MergeTreeSettings> StorageReplicatedMergeTree::getDefaultSettings() const
+{
+    return std::make_unique<MergeTreeSettings>(getContext()->getReplicatedMergeTreeSettings());
+}
+

 void StorageReplicatedMergeTree::lockSharedData(const IMergeTreeDataPart & part) const
 {
--- a/src/Storages/StorageReplicatedMergeTree.h
+++ b/src/Storages/StorageReplicatedMergeTree.h
@ -688,6 +688,24 @@ private:
        bool fetch_part,
        ContextPtr query_context) override;

+    /// NOTE: there are no guarantees for concurrent merges. Dropping part can
+    /// be concurrently merged into some covering part and dropPart will do
+    /// nothing. There are some fundamental problems with it. But this is OK
+    /// because:
+    ///
+    /// dropPart used in the following cases:
+    /// 1) Remove empty parts after TTL.
+    /// 2) Remove parts after move between shards.
+    /// 3) User queries: ALTER TABLE DROP PART 'part_name'.
+    ///
+    /// In the first case merge of empty part is even better than DROP. In the
+    /// second case part UUIDs used to forbid merges for moving parts so there
+    /// is no problem with concurrent merges. The third case is quite rare and
+    /// we give very weak guarantee: there will be no active part with this
+    /// name, but possibly it was merged to some other part.
+    ///
+    /// NOTE: don't rely on dropPart if you 100% need to remove non-empty part
+    /// and don't use any explicit locking mechanism for merges.
    bool dropPartImpl(zkutil::ZooKeeperPtr & zookeeper, String part_name, LogEntry & entry, bool detach, bool throw_if_noop);

    /// Check granularity of already existing replicated table in zookeeper if it exists
@ -702,6 +720,8 @@ private:

    void startBackgroundMovesIfNeeded() override;

+    std::unique_ptr<MergeTreeSettings> getDefaultSettings() const override;
+
    std::set<String> getPartitionIdsAffectedByCommands(const MutationCommands & commands, ContextPtr query_context) const;
    PartitionBlockNumbersHolder allocateBlockNumbersInAffectedPartitions(
        const MutationCommands & commands, ContextPtr query_context, const zkutil::ZooKeeperPtr & zookeeper) const;
--- a/src/Storages/StorageS3.cpp
+++ b/src/Storages/StorageS3.cpp
@ -27,6 +27,8 @@
 #include <aws/core/auth/AWSCredentials.h>
 #include <aws/s3/S3Client.h>
 #include <aws/s3/model/ListObjectsV2Request.h>
+#include <aws/s3/model/CopyObjectRequest.h>
+#include <aws/s3/model/DeleteObjectsRequest.h>

 #include <Common/parseGlobs.h>
 #include <Common/quoteString.h>
@ -434,6 +436,30 @@ BlockOutputStreamPtr StorageS3::write(const ASTPtr & /*query*/, const StorageMet
        max_single_part_upload_size);
 }

+
+void StorageS3::truncate(const ASTPtr & /* query */, const StorageMetadataPtr &, ContextPtr local_context, TableExclusiveLockHolder &)
+{
+    updateClientAndAuthSettings(local_context, client_auth);
+
+    Aws::S3::Model::ObjectIdentifier obj;
+    obj.SetKey(client_auth.uri.key);
+
+    Aws::S3::Model::Delete delkeys;
+    delkeys.AddObjects(std::move(obj));
+
+    Aws::S3::Model::DeleteObjectsRequest request;
+    request.SetBucket(client_auth.uri.bucket);
+    request.SetDelete(delkeys);
+
+    auto response = client_auth.client->DeleteObjects(request);
+    if (!response.IsSuccess())
+    {
+        const auto & err = response.GetError();
+        throw Exception(std::to_string(static_cast<int>(err.GetErrorType())) + ": " + err.GetMessage(), ErrorCodes::S3_ERROR);
+    }
+}
+
+
 void StorageS3::updateClientAndAuthSettings(ContextPtr ctx, StorageS3::ClientAuthentication & upd)
 {
    auto settings = ctx->getStorageS3Settings().getSettings(upd.uri.uri.toString());
--- a/src/Storages/StorageS3.h
+++ b/src/Storages/StorageS3.h
@ -130,6 +130,8 @@ public:

    BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override;

+    void truncate(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context, TableExclusiveLockHolder &) override;
+
    NamesAndTypesList getVirtuals() const override;

 private:
--- a/src/Storages/ya.make
+++ b/src/Storages/ya.make
@ -30,6 +30,7 @@ SRCS(
    MergeTree/BackgroundJobsExecutor.cpp
    MergeTree/BoolMask.cpp
    MergeTree/DataPartsExchange.cpp
+    MergeTree/DropPartsRanges.cpp
    MergeTree/EphemeralLockInZooKeeper.cpp
    MergeTree/IMergeTreeDataPart.cpp
    MergeTree/IMergeTreeDataPartWriter.cpp
--- a/tests/clickhouse-test
+++ b/tests/clickhouse-test
@ -29,7 +29,7 @@ import string
 import multiprocessing
 from contextlib import closing

-DISTRIBUTED_DDL_TIMEOUT_MSG = "is executing longer than distributed_ddl_task_timeout (=120)"
+DISTRIBUTED_DDL_TIMEOUT_MSG = "is executing longer than distributed_ddl_task_timeout"

 MESSAGES_TO_RETRY = [
    "DB::Exception: ZooKeeper session has been expired",
@ -41,6 +41,7 @@ MESSAGES_TO_RETRY = [
    "Operation timed out",
    "ConnectionPoolWithFailover: Connection failed at try",
    "DB::Exception: New table appeared in database being dropped or detached. Try again",
+    "is already started to be removing by another replica right now",
    DISTRIBUTED_DDL_TIMEOUT_MSG # FIXME
 ]

@ -48,15 +49,23 @@ MAX_RETRIES = 3

 class Terminated(KeyboardInterrupt):
    pass
+
 def signal_handler(sig, frame):
    raise Terminated(f'Terminated with {sig} signal')

 def stop_tests():
-    # send signal to all processes in group to avoid hung check triggering
-    # (to avoid terminating clickhouse-test itself, the signal should be ignored)
-    signal.signal(signal.SIGTERM, signal.SIG_IGN)
-    os.killpg(os.getpgid(os.getpid()), signal.SIGTERM)
-    signal.signal(signal.SIGTERM, signal.SIG_DFL)
+    global stop_tests_triggered_lock
+    global stop_tests_triggered
+
+    with stop_tests_triggered_lock:
+        if not stop_tests_triggered.is_set():
+            stop_tests_triggered.set()
+
+            # send signal to all processes in group to avoid hung check triggering
+            # (to avoid terminating clickhouse-test itself, the signal should be ignored)
+            signal.signal(signal.SIGTERM, signal.SIG_IGN)
+            os.killpg(os.getpgid(os.getpid()), signal.SIGTERM)
+            signal.signal(signal.SIGTERM, signal.SIG_DFL)

 def json_minify(string):
    """
@ -327,18 +336,20 @@ def colored(text, args, color=None, on_color=None, attrs=None):
        return text


-SERVER_DIED = False
-exit_code = 0
 stop_time = None
+exit_code = multiprocessing.Value("i", 0)
+server_died = multiprocessing.Event()
+stop_tests_triggered_lock = multiprocessing.Lock()
+stop_tests_triggered = multiprocessing.Event()
 queue = multiprocessing.Queue(maxsize=1)
 restarted_tests = []  # (test, stderr)

 # def run_tests_array(all_tests, suite, suite_dir, suite_tmp_dir, run_total):
 def run_tests_array(all_tests_with_params):
    all_tests, num_tests, suite, suite_dir, suite_tmp_dir = all_tests_with_params
-    global exit_code
-    global SERVER_DIED
    global stop_time
+    global exit_code
+    global server_died

    OP_SQUARE_BRACKET = colored("[", args, attrs=['bold'])
    CL_SQUARE_BRACKET = colored("]", args, attrs=['bold'])
@ -380,7 +391,7 @@ def run_tests_array(all_tests_with_params):
            else:
                break

-        if SERVER_DIED:
+        if server_died.is_set():
            stop_tests()
            break

@ -442,7 +453,7 @@ def run_tests_array(all_tests_with_params):
                        if failed_to_check or clickhouse_proc.returncode != 0:
                            failures += 1
                            print("Server does not respond to health check")
-                            SERVER_DIED = True
+                            server_died.set()
                            stop_tests()
                            break

@ -480,7 +491,7 @@ def run_tests_array(all_tests_with_params):
                            if MAX_RETRIES < counter:
                                if args.replicated_database:
                                    if DISTRIBUTED_DDL_TIMEOUT_MSG in stderr:
-                                        SERVER_DIED = True
+                                        server_died.set()
                                break

                        if proc.returncode != 0:
@ -495,10 +506,10 @@ def run_tests_array(all_tests_with_params):

                            # Stop on fatal errors like segmentation fault. They are sent to client via logs.
                            if ' <Fatal> ' in stderr:
-                                SERVER_DIED = True
+                                server_died.set()

                            if testcase_args.stop and ('Connection refused' in stderr or 'Attempt to read after eof' in stderr) and not 'Received exception from server' in stderr:
-                                SERVER_DIED = True
+                                server_died.set()

                            if os.path.isfile(stdout_file):
                                status += ", result:\n\n"
@ -584,7 +595,7 @@ def run_tests_array(all_tests_with_params):
            f" {skipped_total} tests skipped. {(datetime.now() - start_time).total_seconds():.2f} s elapsed"
            f' ({multiprocessing.current_process().name}).',
            args, "red", attrs=["bold"]))
-        exit_code = 1
+        exit_code.value = 1
    else:
        print(colored(f"\n{passed_total} tests passed. {skipped_total} tests skipped."
            f" {(datetime.now() - start_time).total_seconds():.2f} s elapsed"
@ -750,7 +761,7 @@ def do_run_tests(jobs, suite, suite_dir, suite_tmp_dir, all_tests, parallel_test


 def main(args):
-    global SERVER_DIED
+    global server_died
    global stop_time
    global exit_code
    global server_logs_level
@ -857,7 +868,7 @@ def main(args):

    total_tests_run = 0
    for suite in sorted(os.listdir(base_dir), key=sute_key_func):
-        if SERVER_DIED:
+        if server_died.is_set():
            break

        suite_dir = os.path.join(base_dir, suite)
@ -957,8 +968,7 @@ def main(args):
            else:
                print(bt)

-
-            exit_code = 1
+            exit_code.value = 1
        else:
            print(colored("\nNo queries hung.", args, "green", attrs=["bold"]))

@ -975,7 +985,7 @@ def main(args):
    else:
        print("All tests have finished.")

-    sys.exit(exit_code)
+    sys.exit(exit_code.value)


 def find_binary(name):
--- a/tests/integration/test_cluster_copier/test_two_nodes.py
+++ b/tests/integration/test_cluster_copier/test_two_nodes.py
@ -473,17 +473,17 @@ def execute_task(started_cluster, task, cmd_options):


 # Tests
-@pytest.mark.timeout(600)
+@pytest.mark.skip(reason="Too flaky :(")
 def test_different_schema(started_cluster):
    execute_task(started_cluster, TaskWithDifferentSchema(started_cluster), [])


-@pytest.mark.timeout(600)
+@pytest.mark.skip(reason="Too flaky :(")
 def test_ttl_columns(started_cluster):
    execute_task(started_cluster, TaskTTL(started_cluster), [])


-@pytest.mark.timeout(600)
+@pytest.mark.skip(reason="Too flaky :(")
 def test_skip_index(started_cluster):
    execute_task(started_cluster, TaskSkipIndex(started_cluster), [])

--- a/tests/integration/test_replicated_mutations/test.py
+++ b/tests/integration/test_replicated_mutations/test.py
@ -1,3 +1,4 @@
+import logging
 import random
 import threading
 import time
@ -90,7 +91,7 @@ class Runner:
                i += 1

            try:
-                print('thread {}: insert for {}: {}'.format(thread_num, date_str, ','.join(str(x) for x in xs)))
+                logging.debug(f"thread {thread_num}: insert for {date_str}: {xs}")
                random.choice(self.nodes).query("INSERT INTO test_mutations FORMAT TSV", payload)

                with self.mtx:
@ -100,7 +101,7 @@ class Runner:
                    self.total_inserted_rows += len(xs)

            except Exception as e:
-                print('Exception while inserting,', e)
+                logging.debug(f"Exception while inserting: {e}")
                self.exceptions.append(e)
            finally:
                with self.mtx:
@ -128,7 +129,7 @@ class Runner:
                continue

            try:
-                print('thread {}: delete {} * {}'.format(thread_num, to_delete_count, x))
+                logging.debug(f"thread {thread_num}: delete {to_delete_count} * {x}")
                random.choice(self.nodes).query("ALTER TABLE test_mutations DELETE WHERE x = {}".format(x))

                with self.mtx:
@ -138,7 +139,7 @@ class Runner:
                    self.total_deleted_rows += to_delete_count

            except Exception as e:
-                print('Exception while deleting,', e)
+                logging.debug(f"Exception while deleting: {e}")
            finally:
                with self.mtx:
                    self.currently_deleting_xs.remove(x)
@ -185,10 +186,9 @@ def test_mutations(started_cluster):
    assert runner.total_mutations > 0

    all_done = wait_for_mutations(nodes, runner.total_mutations)
-
-    print("Total mutations: ", runner.total_mutations)
+    logging.debug(f"Total mutations: {runner.total_mutations}")
    for node in nodes:
-        print(node.query(
+        logging.debug(node.query(
            "SELECT mutation_id, command, parts_to_do, is_done FROM system.mutations WHERE table = 'test_mutations' FORMAT TSVWithNames"))
    assert all_done

@ -233,9 +233,9 @@ def test_mutations_dont_prevent_merges(started_cluster, nodes):
        t.join()

    for node in nodes:
-        print(node.query(
+        logging.debug(node.query(
            "SELECT mutation_id, command, parts_to_do, is_done FROM system.mutations WHERE table = 'test_mutations' FORMAT TSVWithNames"))
-        print(node.query(
+        logging.debug(node.query(
            "SELECT partition, count(name), sum(active), sum(active*rows) FROM system.parts WHERE table ='test_mutations' GROUP BY partition FORMAT TSVWithNames"))

    assert all_done
--- a/tests/integration/test_storage_hdfs/test.py
+++ b/tests/integration/test_storage_hdfs/test.py
@ -15,7 +15,6 @@ def started_cluster():
    finally:
        cluster.shutdown()

-
 def test_read_write_storage(started_cluster):
    hdfs_api = started_cluster.hdfs_api

@ -235,7 +234,7 @@ def test_virtual_columns(started_cluster):
    expected = "1\tfile1\thdfs://hdfs1:9000//file1\n2\tfile2\thdfs://hdfs1:9000//file2\n3\tfile3\thdfs://hdfs1:9000//file3\n"
    assert node1.query("select id, _file as file_name, _path as file_path from virtual_cols order by id") == expected

-    
+
 def test_read_files_with_spaces(started_cluster):
    hdfs_api = started_cluster.hdfs_api

@ -246,6 +245,18 @@ def test_read_files_with_spaces(started_cluster):
    assert node1.query("select * from test order by id") == "1\n2\n3\n"


+def test_truncate_table(started_cluster):
+    hdfs_api = started_cluster.hdfs_api
+    node1.query(
+        "create table test_truncate (id UInt32, name String, weight Float64) ENGINE = HDFS('hdfs://hdfs1:9000/tr', 'TSV')")
+    node1.query("insert into test_truncate values (1, 'Mark', 72.53)")
+    assert hdfs_api.read_data("/tr") == "1\tMark\t72.53\n"
+    assert node1.query("select * from test_truncate") == "1\tMark\t72.53\n"
+    node1.query("truncate table test_truncate")
+    assert node1.query("select * from test_truncate") == ""
+    node1.query("drop table test_truncate")
+
+
 if __name__ == '__main__':
    cluster.start()
    input("Cluster created, press any key to destroy...")
--- a/tests/integration/test_storage_s3/test.py
+++ b/tests/integration/test_storage_s3/test.py
@ -646,3 +646,28 @@ def test_storage_s3_put_gzip(started_cluster, extension, method):
    f = gzip.GzipFile(fileobj=buf, mode="rb")
    uncompressed_content = f.read().decode()
    assert sum([ int(i.split(',')[1]) for i in uncompressed_content.splitlines() ]) == 708
+
+
+def test_truncate_table(started_cluster):
+    bucket = started_cluster.minio_bucket
+    instance = started_cluster.instances["dummy"]  # type: ClickHouseInstance
+    name = "truncate"
+
+    instance.query("CREATE TABLE {} (id UInt32) ENGINE = S3('http://{}:{}/{}/{}', 'CSV')".format(
+        name, started_cluster.minio_ip, MINIO_INTERNAL_PORT, bucket, name))
+
+    instance.query("INSERT INTO {} SELECT number FROM numbers(10)".format(name))
+    result = instance.query("SELECT * FROM {}".format(name))
+    assert result == instance.query("SELECT number FROM numbers(10)")
+    instance.query("TRUNCATE TABLE {}".format(name))
+
+    minio = started_cluster.minio_client
+    timeout = 30
+    while timeout > 0:
+        if len(list(minio.list_objects(started_cluster.minio_bucket, 'truncate/'))) == 0:
+            return
+        timeout -= 1
+        time.sleep(1)
+    assert(len(list(minio.list_objects(started_cluster.minio_bucket, 'truncate/'))) == 0)
+    assert instance.query("SELECT * FROM {}".format(name)) == ""
+
--- a/tests/integration/test_ttl_replicated/test.py
+++ b/tests/integration/test_ttl_replicated/test.py
@ -351,6 +351,7 @@ def test_ttl_compatibility(started_cluster, node_left, node_right, num_run):
                ENGINE = ReplicatedMergeTree('/clickhouse/tables/test/test_ttl_delete_{suff}', '{replica}')
                ORDER BY id PARTITION BY toDayOfMonth(date)
                TTL date + INTERVAL 3 SECOND
+                SETTINGS max_number_of_merges_with_ttl_in_pool=100, max_replicated_merges_with_ttl_in_queue=100
            '''.format(suff=num_run, replica=node.name))

        node.query(
@ -359,6 +360,7 @@ def test_ttl_compatibility(started_cluster, node_left, node_right, num_run):
                ENGINE = ReplicatedMergeTree('/clickhouse/tables/test/test_ttl_group_by_{suff}', '{replica}')
                ORDER BY id PARTITION BY toDayOfMonth(date)
                TTL date + INTERVAL 3 SECOND GROUP BY id SET val = sum(val)
+                SETTINGS max_number_of_merges_with_ttl_in_pool=100, max_replicated_merges_with_ttl_in_queue=100
            '''.format(suff=num_run, replica=node.name))

        node.query(
@ -367,6 +369,7 @@ def test_ttl_compatibility(started_cluster, node_left, node_right, num_run):
                ENGINE = ReplicatedMergeTree('/clickhouse/tables/test/test_ttl_where_{suff}', '{replica}')
                ORDER BY id PARTITION BY toDayOfMonth(date)
                TTL date + INTERVAL 3 SECOND DELETE WHERE id % 2 = 1
+                SETTINGS max_number_of_merges_with_ttl_in_pool=100, max_replicated_merges_with_ttl_in_queue=100
            '''.format(suff=num_run, replica=node.name))

    node_left.query("INSERT INTO test_ttl_delete VALUES (now(), 1)")
@ -397,9 +400,9 @@ def test_ttl_compatibility(started_cluster, node_left, node_right, num_run):
    node_right.query("OPTIMIZE TABLE test_ttl_group_by FINAL")
    node_right.query("OPTIMIZE TABLE test_ttl_where FINAL")

-    exec_query_with_retry(node_left, "SYSTEM SYNC REPLICA test_ttl_delete")
-    node_left.query("SYSTEM SYNC REPLICA test_ttl_group_by", timeout=20)
-    node_left.query("SYSTEM SYNC REPLICA test_ttl_where", timeout=20)
+    exec_query_with_retry(node_left, "OPTIMIZE TABLE test_ttl_delete FINAL")
+    node_left.query("OPTIMIZE TABLE test_ttl_group_by FINAL", timeout=20)
+    node_left.query("OPTIMIZE TABLE test_ttl_where FINAL", timeout=20)

    # After OPTIMIZE TABLE, it is not guaranteed that everything is merged.
    # Possible scenario (for test_ttl_group_by):
@ -414,6 +417,10 @@ def test_ttl_compatibility(started_cluster, node_left, node_right, num_run):
    node_right.query("SYSTEM SYNC REPLICA test_ttl_group_by", timeout=20)
    node_right.query("SYSTEM SYNC REPLICA test_ttl_where", timeout=20)

+    exec_query_with_retry(node_left, "SYSTEM SYNC REPLICA test_ttl_delete")
+    node_left.query("SYSTEM SYNC REPLICA test_ttl_group_by", timeout=20)
+    node_left.query("SYSTEM SYNC REPLICA test_ttl_where", timeout=20)
+
    assert node_left.query("SELECT id FROM test_ttl_delete ORDER BY id") == "2\n4\n"
    assert node_right.query("SELECT id FROM test_ttl_delete ORDER BY id") == "2\n4\n"

--- a/tests/performance/join_max_streams.xml
+++ b/tests/performance/join_max_streams.xml
@ -0,0 +1,5 @@
+<test>
+    <query>SELECT * FROM (SELECT 1 AS k FROM numbers_mt(1)) t1 LEFT JOIN (SELECT 1 AS k FROM numbers_mt(10000000000) WHERE number = 1) t2 USING k</query>
+    <query>SELECT * FROM (SELECT 1 AS k FROM numbers_mt(1)) t1 LEFT JOIN (SELECT 1 AS k FROM numbers_mt(10000000000) GROUP BY k) t2 USING k</query>
+    <query>SELECT * FROM (SELECT 1 AS k FROM numbers_mt(1)) t1 LEFT JOIN (SELECT 1 AS k FROM numbers_mt(10000000000) WHERE number = 1) t2 ON t1.k = t2.k</query>
+</test>