Merge branch 'master' into Vxider-disable_set_and_join_persistency

Alexey Milovidov 2020-09-19 15:51:40 +03:00
commit 3003ebabc2
861 changed files with 13455 additions and 6403 deletions

2
.gitmodules vendored

@ -37,7 +37,7 @@
url = https://github.com/ClickHouse-Extras/mariadb-connector-c.git
[submodule "contrib/jemalloc"]
path = contrib/jemalloc
url = https://github.com/jemalloc/jemalloc.git
url = https://github.com/ClickHouse-Extras/jemalloc.git
[submodule "contrib/unixodbc"]
path = contrib/unixodbc
url = https://github.com/ClickHouse-Extras/UnixODBC.git


@ -1,3 +1,145 @@
## ClickHouse release 20.8
### ClickHouse release v20.8.2.3-stable, 2020-09-08
#### Backward Incompatible Change
* Now the `OPTIMIZE FINAL` query doesn't recalculate TTL for parts that were added before the TTL was created. Use `ALTER TABLE ... MATERIALIZE TTL` once to calculate them; after that, `OPTIMIZE FINAL` will evaluate TTLs properly. This behavior never worked for replicated tables. [#14220](https://github.com/ClickHouse/ClickHouse/pull/14220) ([alesapin](https://github.com/alesapin)).
* Extend `parallel_distributed_insert_select` setting, adding an option to run `INSERT` into local table. The setting changes type from `Bool` to `UInt64`, so the values `false` and `true` are no longer supported. If you have these values in server configuration, the server will not start. Please replace them with `0` and `1`, respectively. [#14060](https://github.com/ClickHouse/ClickHouse/pull/14060) ([Azat Khuzhin](https://github.com/azat)).
* Remove support for the `ODBCDriver` input/output format. This was a deprecated format once used for communication with the ClickHouse ODBC driver, now long superseded by the `ODBCDriver2` format. Resolves [#13629](https://github.com/ClickHouse/ClickHouse/issues/13629). [#13847](https://github.com/ClickHouse/ClickHouse/pull/13847) ([hexiaoting](https://github.com/hexiaoting)).
#### New Feature
* ClickHouse can work as a MySQL replica; this is implemented by the `MaterializeMySQL` database engine. Implements [#4006](https://github.com/ClickHouse/ClickHouse/issues/4006). [#10851](https://github.com/ClickHouse/ClickHouse/pull/10851) ([Winter Zhang](https://github.com/zhang2014)).
* Add the ability to specify `Default` compression codec for columns that correspond to settings specified in `config.xml`. Implements: [#9074](https://github.com/ClickHouse/ClickHouse/issues/9074). [#14049](https://github.com/ClickHouse/ClickHouse/pull/14049) ([alesapin](https://github.com/alesapin)).
* Support Kerberos authentication in Kafka, using `krb5` and `cyrus-sasl` libraries. [#12771](https://github.com/ClickHouse/ClickHouse/pull/12771) ([Ilya Golshtein](https://github.com/ilejn)).
* Add function `normalizeQuery` that replaces literals, sequences of literals and complex aliases with placeholders. Add function `normalizedQueryHash` that returns identical 64bit hash values for similar queries. It helps to analyze query log. This closes [#11271](https://github.com/ClickHouse/ClickHouse/issues/11271). [#13816](https://github.com/ClickHouse/ClickHouse/pull/13816) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add `time_zones` table. [#13880](https://github.com/ClickHouse/ClickHouse/pull/13880) ([Bharat Nallan](https://github.com/bharatnc)).
* Add function `defaultValueOfTypeName` that returns the default value for a given type. [#13877](https://github.com/ClickHouse/ClickHouse/pull/13877) ([hcz](https://github.com/hczhcz)).
* Add `countDigits(x)` function that counts the number of decimal digits in an integer or decimal column. Add `isDecimalOverflow(d, [p])` function that checks whether the value in a Decimal column is out of its (or the specified) precision. [#14151](https://github.com/ClickHouse/ClickHouse/pull/14151) ([Artem Zuikov](https://github.com/4ertus2)).
* Add `quantileExactLow` and `quantileExactHigh` implementations with respective aliases for `medianExactLow` and `medianExactHigh`. [#13818](https://github.com/ClickHouse/ClickHouse/pull/13818) ([Bharat Nallan](https://github.com/bharatnc)).
* Added `date_trunc` function that truncates a date/time value to a specified date/time part. [#13888](https://github.com/ClickHouse/ClickHouse/pull/13888) ([Vladimir Golovchenko](https://github.com/vladimir-golovchenko)).
* Add new optional section `<user_directories>` to the main config. [#13425](https://github.com/ClickHouse/ClickHouse/pull/13425) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add `ALTER SAMPLE BY` statement that allows changing the table sampling clause. [#13280](https://github.com/ClickHouse/ClickHouse/pull/13280) ([Amos Bird](https://github.com/amosbird)).
* Function `position` now supports optional `start_pos` argument. [#13237](https://github.com/ClickHouse/ClickHouse/pull/13237) ([vdimir](https://github.com/vdimir)).
#### Bug Fix
* Fix visible data clobbering by progress bar in client in interactive mode. This fixes [#12562](https://github.com/ClickHouse/ClickHouse/issues/12562) and [#13369](https://github.com/ClickHouse/ClickHouse/issues/13369) and [#13584](https://github.com/ClickHouse/ClickHouse/issues/13584) and fixes [#12964](https://github.com/ClickHouse/ClickHouse/issues/12964). [#13691](https://github.com/ClickHouse/ClickHouse/pull/13691) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fixed incorrect sorting order for `LowCardinality` columns when sorting by multiple columns. This fixes [#13958](https://github.com/ClickHouse/ClickHouse/issues/13958). [#14223](https://github.com/ClickHouse/ClickHouse/pull/14223) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Check for array size overflow in the `topK` aggregate function. Without this check the user may send a query with carefully crafted parameters that will lead to a server crash. This closes [#14452](https://github.com/ClickHouse/ClickHouse/issues/14452). [#14467](https://github.com/ClickHouse/ClickHouse/pull/14467) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix a bug which could lead to wrong merge assignment if a table has partitions with a single part. [#14444](https://github.com/ClickHouse/ClickHouse/pull/14444) ([alesapin](https://github.com/alesapin)).
* Stop query execution if an exception happened in `PipelineExecutor` itself. This prevents a rare possible query hang. Continuation of [#14334](https://github.com/ClickHouse/ClickHouse/issues/14334). [#14402](https://github.com/ClickHouse/ClickHouse/pull/14402) [#14334](https://github.com/ClickHouse/ClickHouse/pull/14334) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix crash during an `ALTER` query for a table which was created `AS table_function`. Fixes [#14212](https://github.com/ClickHouse/ClickHouse/issues/14212). [#14326](https://github.com/ClickHouse/ClickHouse/pull/14326) ([alesapin](https://github.com/alesapin)).
* Fix exception during ALTER LIVE VIEW query with REFRESH command. Live view is an experimental feature. [#14320](https://github.com/ClickHouse/ClickHouse/pull/14320) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix QueryPlan lifetime (for EXPLAIN PIPELINE graph=1) for queries with nested interpreter. [#14315](https://github.com/ClickHouse/ClickHouse/pull/14315) ([Azat Khuzhin](https://github.com/azat)).
* Fix segfault in `clickhouse-odbc-bridge` during schema fetch from some external sources. This PR fixes https://github.com/ClickHouse/ClickHouse/issues/13861. [#14267](https://github.com/ClickHouse/ClickHouse/pull/14267) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix crash in mark inclusion search introduced in https://github.com/ClickHouse/ClickHouse/pull/12277. [#14225](https://github.com/ClickHouse/ClickHouse/pull/14225) ([Amos Bird](https://github.com/amosbird)).
* Fix creation of tables with named tuples. This fixes [#13027](https://github.com/ClickHouse/ClickHouse/issues/13027). [#14143](https://github.com/ClickHouse/ClickHouse/pull/14143) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix formatting of minimal negative decimal numbers. This fixes https://github.com/ClickHouse/ClickHouse/issues/14111. [#14119](https://github.com/ClickHouse/ClickHouse/pull/14119) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Fix `DistributedFilesToInsert` metric (zeroed when it should not). [#14095](https://github.com/ClickHouse/ClickHouse/pull/14095) ([Azat Khuzhin](https://github.com/azat)).
* Fix `pointInPolygon` with const 2d array as polygon. [#14079](https://github.com/ClickHouse/ClickHouse/pull/14079) ([Alexey Ilyukhov](https://github.com/livace)).
* Fixed wrong mount point in extra info for `Poco::Exception: no space left on device`. [#14050](https://github.com/ClickHouse/ClickHouse/pull/14050) ([tavplubix](https://github.com/tavplubix)).
* Fix GRANT ALL statement when executed on a non-global level. [#13987](https://github.com/ClickHouse/ClickHouse/pull/13987) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix the parser to reject `CREATE TABLE ... AS table_function()` statements that also specify an engine. [#13940](https://github.com/ClickHouse/ClickHouse/pull/13940) ([hcz](https://github.com/hczhcz)).
* Fix wrong results in select queries with `DISTINCT` keyword and subqueries with UNION ALL in case `optimize_duplicate_order_by_and_distinct` setting is enabled. [#13925](https://github.com/ClickHouse/ClickHouse/pull/13925) ([Artem Zuikov](https://github.com/4ertus2)).
* Fixed potential deadlock when renaming `Distributed` table. [#13922](https://github.com/ClickHouse/ClickHouse/pull/13922) ([tavplubix](https://github.com/tavplubix)).
* Fix incorrect sorting for `FixedString` columns when sorting by multiple columns. Fixes [#13182](https://github.com/ClickHouse/ClickHouse/issues/13182). [#13887](https://github.com/ClickHouse/ClickHouse/pull/13887) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix potentially imprecise result of `topK`/`topKWeighted` merge (with non-default parameters). [#13817](https://github.com/ClickHouse/ClickHouse/pull/13817) ([Azat Khuzhin](https://github.com/azat)).
* Fix a failure when reading from a MergeTree table with an INDEX of type SET while comparing against NULL. This fixes [#13686](https://github.com/ClickHouse/ClickHouse/issues/13686). [#13793](https://github.com/ClickHouse/ClickHouse/pull/13793) ([Amos Bird](https://github.com/amosbird)).
* Fix `arrayJoin` capturing in lambda (LOGICAL_ERROR). [#13792](https://github.com/ClickHouse/ClickHouse/pull/13792) ([Azat Khuzhin](https://github.com/azat)).
* Add step overflow check in function `range`. [#13790](https://github.com/ClickHouse/ClickHouse/pull/13790) ([Azat Khuzhin](https://github.com/azat)).
* Fixed `Directory not empty` error when concurrently executing `DROP DATABASE` and `CREATE TABLE`. [#13756](https://github.com/ClickHouse/ClickHouse/pull/13756) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add range check for `h3KRing` function. This fixes [#13633](https://github.com/ClickHouse/ClickHouse/issues/13633). [#13752](https://github.com/ClickHouse/ClickHouse/pull/13752) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix race condition between DETACH and background merges. Parts may revive after detach. This is continuation of [#8602](https://github.com/ClickHouse/ClickHouse/issues/8602) that did not fix the issue but introduced a test that started to fail in very rare cases, demonstrating the issue. [#13746](https://github.com/ClickHouse/ClickHouse/pull/13746) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix logging Settings.Names/Values when log_queries_min_type > QUERY_START. [#13737](https://github.com/ClickHouse/ClickHouse/pull/13737) ([Azat Khuzhin](https://github.com/azat)).
* Fixes `/replicas_status` endpoint response status code when verbose=1. [#13722](https://github.com/ClickHouse/ClickHouse/pull/13722) ([javi santana](https://github.com/javisantana)).
* Fix incorrect message in `clickhouse-server.init` while checking user and group. [#13711](https://github.com/ClickHouse/ClickHouse/pull/13711) ([ylchou](https://github.com/ylchou)).
* Do not optimize any(arrayJoin()) -> arrayJoin() under `optimize_move_functions_out_of_any` setting. [#13681](https://github.com/ClickHouse/ClickHouse/pull/13681) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash in JOIN with StorageMerge and `set enable_optimize_predicate_expression=1`. [#13679](https://github.com/ClickHouse/ClickHouse/pull/13679) ([Artem Zuikov](https://github.com/4ertus2)).
* Fix typo in error message about `The value of 'number_of_free_entries_in_pool_to_lower_max_size_of_merge' setting`. [#13678](https://github.com/ClickHouse/ClickHouse/pull/13678) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Concurrent `ALTER ... REPLACE/MOVE PARTITION ...` queries might cause deadlock. It's fixed. [#13626](https://github.com/ClickHouse/ClickHouse/pull/13626) ([tavplubix](https://github.com/tavplubix)).
* Fixed the behaviour where a cache dictionary sometimes returned the default value instead of the value present in the source. [#13624](https://github.com/ClickHouse/ClickHouse/pull/13624) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix secondary indices corruption in compact parts. Compact parts are experimental feature. [#13538](https://github.com/ClickHouse/ClickHouse/pull/13538) ([Anton Popov](https://github.com/CurtizJ)).
* Fix premature `ON CLUSTER` timeouts for queries that must be executed on a single replica. Fixes [#6704](https://github.com/ClickHouse/ClickHouse/issues/6704), [#7228](https://github.com/ClickHouse/ClickHouse/issues/7228), [#13361](https://github.com/ClickHouse/ClickHouse/issues/13361), [#11884](https://github.com/ClickHouse/ClickHouse/issues/11884). [#13450](https://github.com/ClickHouse/ClickHouse/pull/13450) ([alesapin](https://github.com/alesapin)).
* Fix wrong code in function `netloc`. This fixes [#13335](https://github.com/ClickHouse/ClickHouse/issues/13335). [#13446](https://github.com/ClickHouse/ClickHouse/pull/13446) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix possible race in `StorageMemory`. [#13416](https://github.com/ClickHouse/ClickHouse/pull/13416) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix missing or excessive headers in `TSV/CSVWithNames` formats in HTTP protocol. This fixes [#12504](https://github.com/ClickHouse/ClickHouse/issues/12504). [#13343](https://github.com/ClickHouse/ClickHouse/pull/13343) ([Azat Khuzhin](https://github.com/azat)).
* Fix parsing row policies from users.xml when names of databases or tables contain dots. This fixes https://github.com/ClickHouse/ClickHouse/issues/5779, https://github.com/ClickHouse/ClickHouse/issues/12527. [#13199](https://github.com/ClickHouse/ClickHouse/pull/13199) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix access to `redis` dictionary after connection was dropped once. It may happen with `cache` and `direct` dictionary layouts. [#13082](https://github.com/ClickHouse/ClickHouse/pull/13082) ([Anton Popov](https://github.com/CurtizJ)).
* Removed wrong auth access check when using ClickHouseDictionarySource to query remote tables. [#12756](https://github.com/ClickHouse/ClickHouse/pull/12756) ([sundyli](https://github.com/sundy-li)).
* Properly distinguish subqueries in some cases for common subexpression elimination. https://github.com/ClickHouse/ClickHouse/issues/8333. [#8367](https://github.com/ClickHouse/ClickHouse/pull/8367) ([Amos Bird](https://github.com/amosbird)).
#### Improvement
* Disallows `CODEC` on `ALIAS` column type. Fixes [#13911](https://github.com/ClickHouse/ClickHouse/issues/13911). [#14263](https://github.com/ClickHouse/ClickHouse/pull/14263) ([Bharat Nallan](https://github.com/bharatnc)).
* When waiting for a dictionary update to complete, use the timeout specified by `query_wait_timeout_milliseconds` setting instead of a hard-coded value. [#14105](https://github.com/ClickHouse/ClickHouse/pull/14105) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add setting `min_index_granularity_bytes` that protects against accidentally creating a table with very low `index_granularity_bytes` setting. [#14139](https://github.com/ClickHouse/ClickHouse/pull/14139) ([Bharat Nallan](https://github.com/bharatnc)).
* Now it's possible to fetch partitions from clusters that use different ZooKeeper: `ALTER TABLE table_name FETCH PARTITION partition_expr FROM 'zk-name:/path-in-zookeeper'`. It's useful for shipping data to new clusters. [#14155](https://github.com/ClickHouse/ClickHouse/pull/14155) ([Amos Bird](https://github.com/amosbird)).
* Slightly better performance of Memory table if it was constructed from a huge number of very small blocks (that's unlikely). Author of the idea: [Mark Papadakis](https://github.com/markpapadakis). Closes [#14043](https://github.com/ClickHouse/ClickHouse/issues/14043). [#14056](https://github.com/ClickHouse/ClickHouse/pull/14056) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Conditional aggregate functions (for example: `avgIf`, `sumIf`, `maxIf`) now return `NULL` when no rows match and nullable arguments are used. [#13964](https://github.com/ClickHouse/ClickHouse/pull/13964) ([Winter Zhang](https://github.com/zhang2014)).
* Increase limit in -Resample combinator to 1M. [#13947](https://github.com/ClickHouse/ClickHouse/pull/13947) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Corrected an error in the AvroConfluent format that caused the Kafka table engine to stop processing messages when an abnormally small, malformed message was received. [#13941](https://github.com/ClickHouse/ClickHouse/pull/13941) ([Gervasio Varela](https://github.com/gervarela)).
* Fix wrong error for long queries. It was possible to get a syntax error other than `Max query size exceeded` for a correct query. [#13928](https://github.com/ClickHouse/ClickHouse/pull/13928) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Better error message for null value of `TabSeparated` format. [#13906](https://github.com/ClickHouse/ClickHouse/pull/13906) ([jiang tao](https://github.com/tomjiang1987)).
* Function `arrayCompact` will compare NaNs bitwise if the type of array elements is Float32/Float64. In previous versions NaNs were never considered equal if the type of array elements is Float32/Float64, and were always considered equal if the type is more complex, like Nullable(Float64). This closes [#13857](https://github.com/ClickHouse/ClickHouse/issues/13857). [#13868](https://github.com/ClickHouse/ClickHouse/pull/13868) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix data race in the `lgamma` function. This race was caught only by `tsan`; no side effects really happened. [#13842](https://github.com/ClickHouse/ClickHouse/pull/13842) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Avoid too slow queries when arrays are manipulated as fields. Throw exception instead. [#13753](https://github.com/ClickHouse/ClickHouse/pull/13753) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Added Redis requirepass authorization (for redis dictionary source). [#13688](https://github.com/ClickHouse/ClickHouse/pull/13688) ([Ivan Torgashov](https://github.com/it1804)).
* Add MergeTree Write-Ahead-Log (WAL) dump tool. WAL is an experimental feature. [#13640](https://github.com/ClickHouse/ClickHouse/pull/13640) ([BohuTANG](https://github.com/BohuTANG)).
* In previous versions `lcm` function may produce assertion violation in debug build if called with specifically crafted arguments. This fixes [#13368](https://github.com/ClickHouse/ClickHouse/issues/13368). [#13510](https://github.com/ClickHouse/ClickHouse/pull/13510) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Provide monotonicity for `toDate/toDateTime` functions in more cases. Monotonicity information is used for index analysis (more complex queries will be able to use the index). Now the input arguments are saturated more naturally, which provides better monotonicity. [#13497](https://github.com/ClickHouse/ClickHouse/pull/13497) ([Amos Bird](https://github.com/amosbird)).
* Support compound identifiers for custom settings. Custom settings are an integration point of the ClickHouse codebase with other codebases (no benefit for ClickHouse itself). [#13496](https://github.com/ClickHouse/ClickHouse/pull/13496) ([Vitaly Baranov](https://github.com/vitlibar)).
* Move parts from DiskLocal to DiskS3 in parallel. `DiskS3` is an experimental feature. [#13459](https://github.com/ClickHouse/ClickHouse/pull/13459) ([Pavel Kovalenko](https://github.com/Jokser)).
* Enable mixed granularity parts by default. [#13449](https://github.com/ClickHouse/ClickHouse/pull/13449) ([alesapin](https://github.com/alesapin)).
* Proper remote host checking in S3 redirects (security-related thing). [#13404](https://github.com/ClickHouse/ClickHouse/pull/13404) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Add `QueryTimeMicroseconds`, `SelectQueryTimeMicroseconds` and `InsertQueryTimeMicroseconds` to system.events. [#13336](https://github.com/ClickHouse/ClickHouse/pull/13336) ([ianton-ru](https://github.com/ianton-ru)).
* Fix debug assertion when Decimal has too large negative exponent. Fixes [#13188](https://github.com/ClickHouse/ClickHouse/issues/13188). [#13228](https://github.com/ClickHouse/ClickHouse/pull/13228) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Added cache layer for DiskS3 (cache to local disk mark and index files). `DiskS3` is an experimental feature. [#13076](https://github.com/ClickHouse/ClickHouse/pull/13076) ([Pavel Kovalenko](https://github.com/Jokser)).
* Fix readline so that it now dumps history to a file. [#13600](https://github.com/ClickHouse/ClickHouse/pull/13600) ([Amos Bird](https://github.com/amosbird)).
* Create `system` database with `Atomic` engine by default (a preparation to enable `Atomic` database engine by default everywhere). [#13680](https://github.com/ClickHouse/ClickHouse/pull/13680) ([tavplubix](https://github.com/tavplubix)).
#### Performance Improvement
* Slightly optimize very short queries with `LowCardinality`. [#14129](https://github.com/ClickHouse/ClickHouse/pull/14129) ([Anton Popov](https://github.com/CurtizJ)).
* Enable parallel INSERTs for table engines `Null`, `Memory`, `Distributed` and `Buffer` when the setting `max_insert_threads` is set. [#14120](https://github.com/ClickHouse/ClickHouse/pull/14120) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fail fast if the `max_rows_to_read` limit is exceeded on parts scan. The motivation behind this change is to skip the ranges scan for all selected parts if it is clear that `max_rows_to_read` is already exceeded. The change is quite noticeable for queries over a big number of parts. [#13677](https://github.com/ClickHouse/ClickHouse/pull/13677) ([Roman Khavronenko](https://github.com/hagen1778)).
* Slightly improve performance of aggregation by UInt8/UInt16 keys. [#13099](https://github.com/ClickHouse/ClickHouse/pull/13099) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Optimize `has()`, `indexOf()` and `countEqual()` functions for `Array(LowCardinality(T))` and constant right arguments. [#12550](https://github.com/ClickHouse/ClickHouse/pull/12550) ([myrrc](https://github.com/myrrc)).
* When performing trivial `INSERT SELECT` queries, automatically set `max_threads` to 1 or `max_insert_threads`, and set `max_block_size` to `min_insert_block_size_rows`. Related to [#5907](https://github.com/ClickHouse/ClickHouse/issues/5907). [#12195](https://github.com/ClickHouse/ClickHouse/pull/12195) ([flynn](https://github.com/ucasFL)).
#### Experimental Feature
* Add types `Int128`, `Int256`, `UInt256` and related functions for them. Extend Decimals with Decimal256 (precision up to 76 digits). New types are under the setting `allow_experimental_bigint_types`. They work extremely slowly and badly. The implementation is incomplete. Please don't use this feature. [#13097](https://github.com/ClickHouse/ClickHouse/pull/13097) ([Artem Zuikov](https://github.com/4ertus2)).
#### Build/Testing/Packaging Improvement
* Added `clickhouse install` script, that is useful if you only have a single binary. [#13528](https://github.com/ClickHouse/ClickHouse/pull/13528) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow to run `clickhouse` binary without configuration. [#13515](https://github.com/ClickHouse/ClickHouse/pull/13515) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Enable check for typos in code with `codespell`. [#13513](https://github.com/ClickHouse/ClickHouse/pull/13513) [#13511](https://github.com/ClickHouse/ClickHouse/pull/13511) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Enable Shellcheck in CI as a linter of .sh tests. This closes [#13168](https://github.com/ClickHouse/ClickHouse/issues/13168). [#13530](https://github.com/ClickHouse/ClickHouse/pull/13530) [#13529](https://github.com/ClickHouse/ClickHouse/pull/13529) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add a CMake option to fail configuration instead of auto-reconfiguration, enabled by default. [#13687](https://github.com/ClickHouse/ClickHouse/pull/13687) ([Konstantin](https://github.com/podshumok)).
* Expose version of embedded tzdata via TZDATA_VERSION in system.build_options. [#13648](https://github.com/ClickHouse/ClickHouse/pull/13648) ([filimonov](https://github.com/filimonov)).
* Improve generation of system.time_zones table during build. Closes [#14209](https://github.com/ClickHouse/ClickHouse/issues/14209). [#14215](https://github.com/ClickHouse/ClickHouse/pull/14215) ([filimonov](https://github.com/filimonov)).
* Build ClickHouse with the most fresh tzdata from package repository. [#13623](https://github.com/ClickHouse/ClickHouse/pull/13623) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add the ability to write js-style comments in skip_list.json. [#14159](https://github.com/ClickHouse/ClickHouse/pull/14159) ([alesapin](https://github.com/alesapin)).
* Ensure that there is no copy-pasted GPL code. [#13514](https://github.com/ClickHouse/ClickHouse/pull/13514) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Switch tests docker images to use test-base parent. [#14167](https://github.com/ClickHouse/ClickHouse/pull/14167) ([Ilya Yatsishin](https://github.com/qoega)).
* Adding retry logic when bringing up docker-compose cluster; Increasing COMPOSE_HTTP_TIMEOUT. [#14112](https://github.com/ClickHouse/ClickHouse/pull/14112) ([vzakaznikov](https://github.com/vzakaznikov)).
* Enabled `system.text_log` in stress test to find more bugs. [#13855](https://github.com/ClickHouse/ClickHouse/pull/13855) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Testflows LDAP module: adding missing certificates and dhparam.pem for openldap4. [#13780](https://github.com/ClickHouse/ClickHouse/pull/13780) ([vzakaznikov](https://github.com/vzakaznikov)).
* ZooKeeper cannot work reliably in unit tests in CI infrastructure. Using unit tests for ZooKeeper interaction with a real ZooKeeper is a bad idea from the start (unit tests are not supposed to verify complex distributed systems). We already use integration tests for this purpose, and they are better suited. [#13745](https://github.com/ClickHouse/ClickHouse/pull/13745) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Added docker image for style check. Added style check that all docker and docker compose files are located in docker directory. [#13724](https://github.com/ClickHouse/ClickHouse/pull/13724) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix cassandra build on Mac OS. [#13708](https://github.com/ClickHouse/ClickHouse/pull/13708) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix link error in shared build. [#13700](https://github.com/ClickHouse/ClickHouse/pull/13700) ([Amos Bird](https://github.com/amosbird)).
* Updating LDAP user authentication suite to check that it works with RBAC. [#13656](https://github.com/ClickHouse/ClickHouse/pull/13656) ([vzakaznikov](https://github.com/vzakaznikov)).
* Removed `-DENABLE_CURL_CLIENT` for `contrib/aws`. [#13628](https://github.com/ClickHouse/ClickHouse/pull/13628) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Increasing health-check timeouts for ClickHouse nodes and adding support to dump docker-compose logs if unhealthy containers found. [#13612](https://github.com/ClickHouse/ClickHouse/pull/13612) ([vzakaznikov](https://github.com/vzakaznikov)).
* Make sure https://github.com/ClickHouse/ClickHouse/issues/10977 is invalid. [#13539](https://github.com/ClickHouse/ClickHouse/pull/13539) ([Amos Bird](https://github.com/amosbird)).
* Skip PRs from robot-clickhouse. [#13489](https://github.com/ClickHouse/ClickHouse/pull/13489) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Move Dockerfiles from integration tests to `docker/test` directory. docker_compose files are available in `runner` docker container. Docker images are built in CI and not in integration tests. [#13448](https://github.com/ClickHouse/ClickHouse/pull/13448) ([Ilya Yatsishin](https://github.com/qoega)).
## ClickHouse release 20.7
### ClickHouse release v20.7.2.30-stable, 2020-08-31


@ -15,6 +15,7 @@ ClickHouse is an open-source column-oriented database management system that all
* [Contacts](https://clickhouse.tech/#contacts) can help to get your questions answered if there are any.
* You can also [fill this form](https://clickhouse.tech/#meet) to meet Yandex ClickHouse team in person.
## Upcoming Events
* [ClickHouse talk at Ya.Subbotnik (in Russian)](https://ya.cc/t/cIBI-3yECj5JF) on September 12, 2020.
* [eBay migrating from Druid](https://us02web.zoom.us/webinar/register/tZMkfu6rpjItHtaQ1DXcgPWcSOnmM73HLGKL) on September 23, 2020.
* [ClickHouse for Edge Analytics](https://ones2020.sched.com/event/bWPs) on September 29, 2020.


@ -18,6 +18,7 @@ set (SRCS
terminalColors.cpp
errnoToString.cpp
getResource.cpp
StringRef.cpp
)
if (ENABLE_REPLXX)

13
base/common/StringRef.cpp Normal file

@ -0,0 +1,13 @@
#include <ostream>
#include "StringRef.h"
std::ostream & operator<<(std::ostream & os, const StringRef & str)
{
if (str.data)
os.write(str.data, str.size);
return os;
}
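Below is a minimal usage sketch, not part of the commit, showing how the now out-of-line `operator<<` is used; the `<common/StringRef.h>` include path and the `(pointer, size)` constructor are assumptions based on the surrounding code.

```cpp
#include <iostream>
#include <common/StringRef.h>   // assumed include path, matching other <common/...> headers

int main()
{
    const char * buffer = "hello, world";
    StringRef ref{buffer, 5};    // assumed (pointer, size) constructor; refers to "hello"
    std::cout << ref << '\n';    // streams exactly ref.size bytes without copying
}
```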


@ -4,7 +4,7 @@
#include <string>
#include <vector>
#include <functional>
#include <ostream>
#include <iosfwd>
#include <common/types.h>
#include <common/unaligned.h>
@ -322,10 +322,4 @@ inline bool operator==(StringRef lhs, const char * rhs)
return true;
}
inline std::ostream & operator<<(std::ostream & os, const StringRef & str)
{
if (str.data)
os.write(str.data, str.size);
return os;
}
std::ostream & operator<<(std::ostream & os, const StringRef & str);


@ -1,6 +1,6 @@
#pragma once
#include <common/types.h>
#include <common/extended_types.h>
namespace common
{


@ -0,0 +1,108 @@
#pragma once
#include <type_traits>
#include <common/types.h>
#include <common/wide_integer.h>
using Int128 = __int128;
using wInt256 = wide::integer<256, signed>;
using wUInt256 = wide::integer<256, unsigned>;
static_assert(sizeof(wInt256) == 32);
static_assert(sizeof(wUInt256) == 32);
/// The standard library type traits, such as std::is_arithmetic, with one exception
/// (std::common_type), are "set in stone". Attempting to specialize them causes undefined behavior.
/// So instead of using the std type_traits, we use our own version which allows extension.
template <typename T>
struct is_signed
{
static constexpr bool value = std::is_signed_v<T>;
};
template <> struct is_signed<Int128> { static constexpr bool value = true; };
template <> struct is_signed<wInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_signed_v = is_signed<T>::value;
template <typename T>
struct is_unsigned
{
static constexpr bool value = std::is_unsigned_v<T>;
};
template <> struct is_unsigned<wUInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_unsigned_v = is_unsigned<T>::value;
/// TODO: is_integral includes char, char8_t and wchar_t.
template <typename T>
struct is_integer
{
static constexpr bool value = std::is_integral_v<T>;
};
template <> struct is_integer<Int128> { static constexpr bool value = true; };
template <> struct is_integer<wInt256> { static constexpr bool value = true; };
template <> struct is_integer<wUInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_integer_v = is_integer<T>::value;
template <typename T>
struct is_arithmetic
{
static constexpr bool value = std::is_arithmetic_v<T>;
};
template <> struct is_arithmetic<__int128> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_arithmetic_v = is_arithmetic<T>::value;
template <typename T>
struct make_unsigned
{
typedef std::make_unsigned_t<T> type;
};
template <> struct make_unsigned<Int128> { using type = unsigned __int128; };
template <> struct make_unsigned<wInt256> { using type = wUInt256; };
template <> struct make_unsigned<wUInt256> { using type = wUInt256; };
template <typename T> using make_unsigned_t = typename make_unsigned<T>::type;
template <typename T>
struct make_signed
{
typedef std::make_signed_t<T> type;
};
template <> struct make_signed<wInt256> { using type = wInt256; };
template <> struct make_signed<wUInt256> { using type = wInt256; };
template <typename T> using make_signed_t = typename make_signed<T>::type;
template <typename T>
struct is_big_int
{
static constexpr bool value = false;
};
template <> struct is_big_int<wInt256> { static constexpr bool value = true; };
template <> struct is_big_int<wUInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_big_int_v = is_big_int<T>::value;
template <typename To, typename From>
inline To bigint_cast(const From & x [[maybe_unused]])
{
return static_cast<To>(x);
}
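A small illustrative sketch, not part of the commit, of how generic code is expected to use these extended traits; the `toUnsigned` helper is hypothetical.

```cpp
#include <type_traits>
#include <common/extended_types.h>   // path as included elsewhere in this commit

// Hypothetical helper: the extended make_unsigned covers built-in integers,
// Int128 and the wide 256-bit types alike.
template <typename T>
make_unsigned_t<T> toUnsigned(T value)
{
    static_assert(is_integer_v<T>, "toUnsigned expects an integer-like type");
    return static_cast<make_unsigned_t<T>>(value);
}

// The traits extend, rather than replace, the standard ones.
static_assert(is_signed_v<Int128> && is_signed_v<wInt256>);
static_assert(is_big_int_v<wUInt256> && !is_big_int_v<UInt64>);
static_assert(std::is_same_v<make_unsigned_t<Int128>, unsigned __int128>);
```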

13
base/common/throwError.h Normal file

@ -0,0 +1,13 @@
#pragma once
#include <stdexcept>
/// Throw DB::Exception-like exception before its definition.
/// DB::Exception derived from Poco::Exception derived from std::exception.
/// DB::Exception is generally caught as Poco::Exception. std::exception generally has other catch blocks and could lead to other outcomes.
/// DB::Exception is not defined yet. It would be better to throw Poco::Exception, but we do not want to include any big header here, even <string>.
/// So we throw some std::exception instead, in the hope that its catch block is the same as the DB::Exception one.
template <typename T>
inline void throwError(const T & err)
{
throw std::runtime_error(err);
}
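A minimal usage sketch, not part of the commit; the `checkedShift` helper is hypothetical and only illustrates signaling an error before `DB::Exception` is available.

```cpp
#include <common/throwError.h>   // assumed include path for the new header

// Hypothetical low-level helper that cannot depend on DB::Exception yet.
inline unsigned checkedShift(unsigned value, int bits)
{
    if (bits < 0 || bits >= 32)
        throwError("checkedShift: shift amount out of range");   // throws std::runtime_error
    return value << bits;
}
```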


@ -2,9 +2,6 @@
#include <cstdint>
#include <string>
#include <type_traits>
#include <common/wide_integer.h>
using Int8 = int8_t;
using Int16 = int16_t;
@ -21,112 +18,24 @@ using UInt16 = uint16_t;
using UInt32 = uint32_t;
using UInt64 = uint64_t;
using Int128 = __int128;
using String = std::string;
using wInt256 = std::wide_integer<256, signed>;
using wUInt256 = std::wide_integer<256, unsigned>;
namespace DB
{
static_assert(sizeof(wInt256) == 32);
static_assert(sizeof(wUInt256) == 32);
using UInt8 = ::UInt8;
using UInt16 = ::UInt16;
using UInt32 = ::UInt32;
using UInt64 = ::UInt64;
using Int8 = ::Int8;
using Int16 = ::Int16;
using Int32 = ::Int32;
using Int64 = ::Int64;
using Float32 = float;
using Float64 = double;
using String = std::string;
/// The standard library type traits, such as std::is_arithmetic, with one exception
/// (std::common_type), are "set in stone". Attempting to specialize them causes undefined behavior.
/// So instead of using the std type_traits, we use our own version which allows extension.
template <typename T>
struct is_signed
{
static constexpr bool value = std::is_signed_v<T>;
};
template <> struct is_signed<Int128> { static constexpr bool value = true; };
template <> struct is_signed<wInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_signed_v = is_signed<T>::value;
template <typename T>
struct is_unsigned
{
static constexpr bool value = std::is_unsigned_v<T>;
};
template <> struct is_unsigned<wUInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_unsigned_v = is_unsigned<T>::value;
/// TODO: is_integral includes char, char8_t and wchar_t.
template <typename T>
struct is_integer
{
static constexpr bool value = std::is_integral_v<T>;
};
template <> struct is_integer<Int128> { static constexpr bool value = true; };
template <> struct is_integer<wInt256> { static constexpr bool value = true; };
template <> struct is_integer<wUInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_integer_v = is_integer<T>::value;
template <typename T>
struct is_arithmetic
{
static constexpr bool value = std::is_arithmetic_v<T>;
};
template <> struct is_arithmetic<__int128> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_arithmetic_v = is_arithmetic<T>::value;
template <typename T>
struct make_unsigned
{
typedef std::make_unsigned_t<T> type;
};
template <> struct make_unsigned<Int128> { using type = unsigned __int128; };
template <> struct make_unsigned<wInt256> { using type = wUInt256; };
template <> struct make_unsigned<wUInt256> { using type = wUInt256; };
template <typename T> using make_unsigned_t = typename make_unsigned<T>::type;
template <typename T>
struct make_signed
{
typedef std::make_signed_t<T> type;
};
template <> struct make_signed<wInt256> { using type = wInt256; };
template <> struct make_signed<wUInt256> { using type = wInt256; };
template <typename T> using make_signed_t = typename make_signed<T>::type;
template <typename T>
struct is_big_int
{
static constexpr bool value = false;
};
template <> struct is_big_int<wInt256> { static constexpr bool value = true; };
template <> struct is_big_int<wUInt256> { static constexpr bool value = true; };
template <typename T>
inline constexpr bool is_big_int_v = is_big_int<T>::value;
template <typename T>
inline std::string bigintToString(const T & x)
{
return to_string(x);
}
template <typename To, typename From>
inline To bigint_cast(const From & x [[maybe_unused]])
{
return static_cast<To>(x);
}


@ -22,79 +22,87 @@
* without express or implied warranty.
*/
#include <climits> // CHAR_BIT
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <initializer_list>
namespace wide
{
template <size_t Bits, typename Signed>
class integer;
}
namespace std
{
template <size_t Bits, typename Signed>
class wide_integer;
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
struct common_type<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>>;
struct common_type<wide::integer<Bits, Signed>, wide::integer<Bits2, Signed2>>;
template <size_t Bits, typename Signed, typename Arithmetic>
struct common_type<wide_integer<Bits, Signed>, Arithmetic>;
struct common_type<wide::integer<Bits, Signed>, Arithmetic>;
template <typename Arithmetic, size_t Bits, typename Signed>
struct common_type<Arithmetic, wide_integer<Bits, Signed>>;
struct common_type<Arithmetic, wide::integer<Bits, Signed>>;
}
namespace wide
{
template <size_t Bits, typename Signed>
class wide_integer
class integer
{
public:
using base_type = uint8_t;
using signed_base_type = int8_t;
using base_type = uint64_t;
using signed_base_type = int64_t;
// ctors
wide_integer() = default;
integer() = default;
template <typename T>
constexpr wide_integer(T rhs) noexcept;
constexpr integer(T rhs) noexcept;
template <typename T>
constexpr wide_integer(std::initializer_list<T> il) noexcept;
constexpr integer(std::initializer_list<T> il) noexcept;
// assignment
template <size_t Bits2, typename Signed2>
constexpr wide_integer<Bits, Signed> & operator=(const wide_integer<Bits2, Signed2> & rhs) noexcept;
constexpr integer<Bits, Signed> & operator=(const integer<Bits2, Signed2> & rhs) noexcept;
template <typename Arithmetic>
constexpr wide_integer<Bits, Signed> & operator=(Arithmetic rhs) noexcept;
constexpr integer<Bits, Signed> & operator=(Arithmetic rhs) noexcept;
template <typename Arithmetic>
constexpr wide_integer<Bits, Signed> & operator*=(const Arithmetic & rhs);
constexpr integer<Bits, Signed> & operator*=(const Arithmetic & rhs);
template <typename Arithmetic>
constexpr wide_integer<Bits, Signed> & operator/=(const Arithmetic & rhs);
constexpr integer<Bits, Signed> & operator/=(const Arithmetic & rhs);
template <typename Arithmetic>
constexpr wide_integer<Bits, Signed> & operator+=(const Arithmetic & rhs) noexcept(is_same<Signed, unsigned>::value);
constexpr integer<Bits, Signed> & operator+=(const Arithmetic & rhs) noexcept(std::is_same_v<Signed, unsigned>);
template <typename Arithmetic>
constexpr wide_integer<Bits, Signed> & operator-=(const Arithmetic & rhs) noexcept(is_same<Signed, unsigned>::value);
constexpr integer<Bits, Signed> & operator-=(const Arithmetic & rhs) noexcept(std::is_same_v<Signed, unsigned>);
template <typename Integral>
constexpr wide_integer<Bits, Signed> & operator%=(const Integral & rhs);
constexpr integer<Bits, Signed> & operator%=(const Integral & rhs);
template <typename Integral>
constexpr wide_integer<Bits, Signed> & operator&=(const Integral & rhs) noexcept;
constexpr integer<Bits, Signed> & operator&=(const Integral & rhs) noexcept;
template <typename Integral>
constexpr wide_integer<Bits, Signed> & operator|=(const Integral & rhs) noexcept;
constexpr integer<Bits, Signed> & operator|=(const Integral & rhs) noexcept;
template <typename Integral>
constexpr wide_integer<Bits, Signed> & operator^=(const Integral & rhs) noexcept;
constexpr integer<Bits, Signed> & operator^=(const Integral & rhs) noexcept;
constexpr wide_integer<Bits, Signed> & operator<<=(int n);
constexpr wide_integer<Bits, Signed> & operator>>=(int n) noexcept;
constexpr integer<Bits, Signed> & operator<<=(int n) noexcept;
constexpr integer<Bits, Signed> & operator>>=(int n) noexcept;
constexpr wide_integer<Bits, Signed> & operator++() noexcept(is_same<Signed, unsigned>::value);
constexpr wide_integer<Bits, Signed> operator++(int) noexcept(is_same<Signed, unsigned>::value);
constexpr wide_integer<Bits, Signed> & operator--() noexcept(is_same<Signed, unsigned>::value);
constexpr wide_integer<Bits, Signed> operator--(int) noexcept(is_same<Signed, unsigned>::value);
constexpr integer<Bits, Signed> & operator++() noexcept(std::is_same_v<Signed, unsigned>);
constexpr integer<Bits, Signed> operator++(int) noexcept(std::is_same_v<Signed, unsigned>);
constexpr integer<Bits, Signed> & operator--() noexcept(std::is_same_v<Signed, unsigned>);
constexpr integer<Bits, Signed> operator--(int) noexcept(std::is_same_v<Signed, unsigned>);
// observers
@ -114,12 +122,12 @@ public:
private:
template <size_t Bits2, typename Signed2>
friend class wide_integer;
friend class integer;
friend class numeric_limits<wide_integer<Bits, signed>>;
friend class numeric_limits<wide_integer<Bits, unsigned>>;
friend class std::numeric_limits<integer<Bits, signed>>;
friend class std::numeric_limits<integer<Bits, unsigned>>;
base_type m_arr[_impl::arr_size];
base_type items[_impl::item_count];
};
template <typename T>
@ -134,115 +142,117 @@ using __only_integer = typename std::enable_if<IntegralConcept<T>() && IntegralC
// Unary operators
template <size_t Bits, typename Signed>
constexpr wide_integer<Bits, Signed> operator~(const wide_integer<Bits, Signed> & lhs) noexcept;
constexpr integer<Bits, Signed> operator~(const integer<Bits, Signed> & lhs) noexcept;
template <size_t Bits, typename Signed>
constexpr wide_integer<Bits, Signed> operator-(const wide_integer<Bits, Signed> & lhs) noexcept(is_same<Signed, unsigned>::value);
constexpr integer<Bits, Signed> operator-(const integer<Bits, Signed> & lhs) noexcept(std::is_same_v<Signed, unsigned>);
template <size_t Bits, typename Signed>
constexpr wide_integer<Bits, Signed> operator+(const wide_integer<Bits, Signed> & lhs) noexcept(is_same<Signed, unsigned>::value);
constexpr integer<Bits, Signed> operator+(const integer<Bits, Signed> & lhs) noexcept(std::is_same_v<Signed, unsigned>);
// Binary operators
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator*(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator*(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
std::common_type_t<Arithmetic, Arithmetic2> constexpr operator*(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator/(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator/(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
std::common_type_t<Arithmetic, Arithmetic2> constexpr operator/(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator+(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator+(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
std::common_type_t<Arithmetic, Arithmetic2> constexpr operator+(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator-(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator-(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
std::common_type_t<Arithmetic, Arithmetic2> constexpr operator-(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator%(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator%(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Integral, typename Integral2, class = __only_integer<Integral, Integral2>>
std::common_type_t<Integral, Integral2> constexpr operator%(const Integral & rhs, const Integral2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator&(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator&(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Integral, typename Integral2, class = __only_integer<Integral, Integral2>>
std::common_type_t<Integral, Integral2> constexpr operator&(const Integral & rhs, const Integral2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator|(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator|(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Integral, typename Integral2, class = __only_integer<Integral, Integral2>>
std::common_type_t<Integral, Integral2> constexpr operator|(const Integral & rhs, const Integral2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
std::common_type_t<wide_integer<Bits, Signed>, wide_integer<Bits2, Signed2>> constexpr
operator^(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
std::common_type_t<integer<Bits, Signed>, integer<Bits2, Signed2>> constexpr
operator^(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Integral, typename Integral2, class = __only_integer<Integral, Integral2>>
std::common_type_t<Integral, Integral2> constexpr operator^(const Integral & rhs, const Integral2 & lhs);
// TODO: Integral
template <size_t Bits, typename Signed>
constexpr wide_integer<Bits, Signed> operator<<(const wide_integer<Bits, Signed> & lhs, int n) noexcept;
constexpr integer<Bits, Signed> operator<<(const integer<Bits, Signed> & lhs, int n) noexcept;
template <size_t Bits, typename Signed>
constexpr wide_integer<Bits, Signed> operator>>(const wide_integer<Bits, Signed> & lhs, int n) noexcept;
constexpr integer<Bits, Signed> operator>>(const integer<Bits, Signed> & lhs, int n) noexcept;
template <size_t Bits, typename Signed, typename Int, typename = std::enable_if_t<!std::is_same_v<Int, int>>>
constexpr wide_integer<Bits, Signed> operator<<(const wide_integer<Bits, Signed> & lhs, Int n) noexcept
constexpr integer<Bits, Signed> operator<<(const integer<Bits, Signed> & lhs, Int n) noexcept
{
return lhs << int(n);
}
template <size_t Bits, typename Signed, typename Int, typename = std::enable_if_t<!std::is_same_v<Int, int>>>
constexpr wide_integer<Bits, Signed> operator>>(const wide_integer<Bits, Signed> & lhs, Int n) noexcept
constexpr integer<Bits, Signed> operator>>(const integer<Bits, Signed> & lhs, Int n) noexcept
{
return lhs >> int(n);
}
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
constexpr bool operator<(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
constexpr bool operator<(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
constexpr bool operator<(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
constexpr bool operator>(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
constexpr bool operator>(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
constexpr bool operator>(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
constexpr bool operator<=(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
constexpr bool operator<=(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
constexpr bool operator<=(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
constexpr bool operator>=(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
constexpr bool operator>=(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
constexpr bool operator>=(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
constexpr bool operator==(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
constexpr bool operator==(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
constexpr bool operator==(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed, size_t Bits2, typename Signed2>
constexpr bool operator!=(const wide_integer<Bits, Signed> & lhs, const wide_integer<Bits2, Signed2> & rhs);
constexpr bool operator!=(const integer<Bits, Signed> & lhs, const integer<Bits2, Signed2> & rhs);
template <typename Arithmetic, typename Arithmetic2, class = __only_arithmetic<Arithmetic, Arithmetic2>>
constexpr bool operator!=(const Arithmetic & rhs, const Arithmetic2 & lhs);
template <size_t Bits, typename Signed>
std::string to_string(const wide_integer<Bits, Signed> & n);
}
namespace std
{
template <size_t Bits, typename Signed>
struct hash<wide_integer<Bits, Signed>>;
struct hash<wide::integer<Bits, Signed>>;
}
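A brief usage sketch, not part of the diff, of the renamed `wide::integer` with its new 64-bit limbs; the alias and values are illustrative only.

```cpp
#include <cassert>
#include <common/wide_integer.h>   // assumed include path, matching other <common/...> headers

using U256 = wide::integer<256, unsigned>;   // local alias for this sketch only
static_assert(sizeof(U256) == 32);           // 256 bits stored in four 64-bit limbs

int main()
{
    U256 x = 1;
    x <<= 200;      // value far beyond the range of UInt64
    U256 y = x;
    y += 1;         // compound assignment with built-in integers, as declared above
    assert(y > x);  // comparison operators are declared for wide operands
}
```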

File diff suppressed because it is too large


@ -0,0 +1,35 @@
#pragma once
#include <string>
#include "wide_integer.h"
namespace wide
{
template <size_t Bits, typename Signed>
inline std::string to_string(const integer<Bits, Signed> & n)
{
std::string res;
if (integer<Bits, Signed>::_impl::operator_eq(n, 0U))
return "0";
integer<Bits, unsigned> t;
bool is_neg = integer<Bits, Signed>::_impl::is_negative(n);
if (is_neg)
t = integer<Bits, Signed>::_impl::operator_unary_minus(n);
else
t = n;
while (!integer<Bits, unsigned>::_impl::operator_eq(t, 0U))
{
res.insert(res.begin(), '0' + char(integer<Bits, unsigned>::_impl::operator_percent(t, 10U)));
t = integer<Bits, unsigned>::_impl::operator_slash(t, 10U);
}
if (is_neg)
res.insert(res.begin(), '-');
return res;
}
}
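A short usage sketch, not part of the diff. The diff view dropped this header's file name; `<common/wide_integer_to_string.h>` is assumed below.

```cpp
#include <iostream>
#include <common/wide_integer_to_string.h>   // assumed name/path of the header shown above

int main()
{
    wide::integer<128, signed> n = -42;
    std::cout << wide::to_string(n) << '\n';   // prints -42 via the decimal loop above
}
```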


@ -53,6 +53,7 @@ SRCS(
setTerminalEcho.cpp
shift10.cpp
sleep.cpp
StringRef.cpp
terminalColors.cpp
)


@ -1,5 +1,5 @@
# This strings autochanged from release_lib.sh:
SET(VERSION_REVISION 54440)
SET(VERSION_REVISION 54441)
SET(VERSION_MAJOR 20)
SET(VERSION_MINOR 10)
SET(VERSION_PATCH 1)


@ -36,7 +36,15 @@ if (SANITIZE)
endif ()
elseif (SANITIZE STREQUAL "thread")
set (TSAN_FLAGS "-fsanitize=thread -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt")
set (TSAN_FLAGS "-fsanitize=thread")
if (COMPILER_CLANG)
set (TSAN_FLAGS "${TSAN_FLAGS} -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt")
else()
message (WARNING "TSAN suppressions were not passed to the compiler (since the compiler is not clang)")
message (WARNING "Use the following command to pass them manually:")
message (WARNING " export TSAN_OPTIONS=\"$TSAN_OPTIONS suppressions=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt\"")
endif()
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${TSAN_FLAGS}")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${TSAN_FLAGS}")


@ -28,7 +28,7 @@ elseif (COMPILER_CLANG)
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fchar8_t")
endif ()
else ()
set (CLANG_MINIMUM_VERSION 8)
set (CLANG_MINIMUM_VERSION 9)
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS ${CLANG_MINIMUM_VERSION})
message (FATAL_ERROR "Clang version must be at least ${CLANG_MINIMUM_VERSION}.")
endif ()


@ -23,7 +23,7 @@ option (WEVERYTHING "Enables -Weverything option with some exceptions. This is i
# Control maximum size of stack frames. It can be important if the code is run in fibers with small stack size.
# Only in release build because debug has too large stack frames.
if ((NOT CMAKE_BUILD_TYPE_UC STREQUAL "DEBUG") AND (NOT SANITIZE))
add_warning(frame-larger-than=16384)
add_warning(frame-larger-than=32768)
endif ()
if (COMPILER_CLANG)
@ -169,9 +169,16 @@ elseif (COMPILER_GCC)
# Warn if vector operation is not implemented via SIMD capabilities of the architecture
add_cxx_compile_options(-Wvector-operation-performance)
# XXX: gcc10 stuck with this option while compiling GatherUtils code
# (anyway there are builds with clang, that will warn)
if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 10)
# XXX: gcc10 stuck with this option while compiling GatherUtils code
# (anyway there are builds with clang, that will warn)
add_cxx_compile_options(-Wno-sequence-point)
# XXX: gcc10 false positive with this warning in MergeTreePartition.cpp
# inlined from 'void writeHexByteLowercase(UInt8, void*)' at ../src/Common/hex.h:39:11,
# inlined from 'DB::String DB::MergeTreePartition::getID(const DB::Block&) const' at ../src/Storages/MergeTree/MergeTreePartition.cpp:85:30:
# ../contrib/libc-headers/x86_64-linux-gnu/bits/string_fortified.h:34:33: error: writing 2 bytes into a region of size 0 [-Werror=stringop-overflow=]
# 34 | return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
# For some reason (bug in gcc?) macro 'GCC diagnostic ignored "-Wstringop-overflow"' doesn't help.
add_cxx_compile_options(-Wno-stringop-overflow)
endif()
endif ()

2
contrib/jemalloc vendored

@ -1 +1 @@
Subproject commit ea6b3e973b477b8061e0076bb257dbd7f3faa756
Subproject commit 026764f19995c53583ab25a3b9c06a2fd74e4689

2
contrib/llvm vendored

@ -1 +1 @@
Subproject commit 3d6c7e916760b395908f28a1c885c8334d4fa98b
Subproject commit 8f24d507c1cfeec66d27f48fe74518fd278e2d25

2
debian/rules vendored
View File

@ -18,7 +18,7 @@ ifeq ($(CCACHE_PREFIX),distcc)
THREADS_COUNT=$(shell distcc -j)
endif
ifeq ($(THREADS_COUNT),)
THREADS_COUNT=$(shell echo $$(( $$(nproc || grep -c ^processor /proc/cpuinfo || sysctl -n hw.ncpu || echo 8) / 2 )) )
THREADS_COUNT=$(shell nproc || grep -c ^processor /proc/cpuinfo || sysctl -n hw.ncpu || echo 4)
endif
DEB_BUILD_OPTIONS+=parallel=$(THREADS_COUNT)

View File

@ -11,7 +11,7 @@ RUN apt-get update \
&& echo "${LLVM_PUBKEY_HASH} /tmp/llvm-snapshot.gpg.key" | sha384sum -c \
&& apt-key add /tmp/llvm-snapshot.gpg.key \
&& export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \
&& echo "deb [trusted=yes] http://apt.llvm.org/${CODENAME}/ llvm-toolchain-${CODENAME}-${LLVM_VERSION} main" >> \
&& echo "deb [trusted=yes] http://apt.llvm.org/${CODENAME}/ llvm-toolchain-${CODENAME}-11 main" >> \
/etc/apt/sources.list
# initial packages
@ -32,18 +32,15 @@ RUN apt-get update \
curl \
gcc-9 \
g++-9 \
gcc-10 \
g++-10 \
llvm-${LLVM_VERSION} \
clang-${LLVM_VERSION} \
lld-${LLVM_VERSION} \
clang-tidy-${LLVM_VERSION} \
clang-9 \
lld-9 \
clang-tidy-9 \
clang-8 \
lld-8 \
clang-tidy-8 \
clang-11 \
clang-tidy-11 \
lld-11 \
llvm-11 \
llvm-11-dev \
libicu-dev \
libreadline-dev \
ninja-build \
@ -93,5 +90,16 @@ RUN wget -nv "https://developer.arm.com/-/media/Files/downloads/gnu-a/8.3-2019.0
# Download toolchain for FreeBSD 11.3
RUN wget -nv https://clickhouse-datasets.s3.yandex.net/toolchains/toolchains/freebsd-11.3-toolchain.tar.xz
# NOTE: For some reason we have an outdated version of gcc-10 in ubuntu 20.04 stable.
# The current workaround is to use the latest version from the proposed repo. Remove as soon as
# gcc-10.2 appears in the stable repo.
RUN echo 'deb http://archive.ubuntu.com/ubuntu/ focal-proposed restricted main multiverse universe' > /etc/apt/sources.list.d/proposed-repositories.list
RUN apt-get update \
&& apt-get install gcc-10 g++-10 --yes
RUN rm /etc/apt/sources.list.d/proposed-repositories.list && apt-get update
COPY build.sh /
CMD ["/bin/bash", "/build.sh"]

View File

@ -18,9 +18,9 @@ ccache --zero-stats ||:
ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so ||:
rm -f CMakeCache.txt
cmake --debug-trycompile --verbose=1 -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DSANITIZE=$SANITIZER $CMAKE_FLAGS ..
ninja -j $(($(nproc) / 2)) $NINJA_FLAGS clickhouse-bundle
ninja $NINJA_FLAGS clickhouse-bundle
mv ./programs/clickhouse* /output
mv ./src/unit_tests_dbms /output
mv ./src/unit_tests_dbms /output ||: # may not exist for some binary builds
find . -name '*.so' -print -exec mv '{}' /output \;
find . -name '*.so.*' -print -exec mv '{}' /output \;

View File

@ -42,8 +42,6 @@ RUN export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \
# Libraries from OS are only needed to test the "unbundled" build (this is not used in production).
RUN apt-get update \
&& apt-get install \
gcc-10 \
g++-10 \
gcc-9 \
g++-9 \
clang-11 \
@ -75,6 +73,16 @@ RUN apt-get update \
pigz \
--yes --no-install-recommends
# NOTE: For some reason we have an outdated version of gcc-10 in ubuntu 20.04 stable.
# The current workaround is to use the latest version from the proposed repo. Remove as soon as
# gcc-10.2 appears in the stable repo.
RUN echo 'deb http://archive.ubuntu.com/ubuntu/ focal-proposed restricted main multiverse universe' > /etc/apt/sources.list.d/proposed-repositories.list
RUN apt-get update \
&& apt-get install gcc-10 g++-10 --yes --no-install-recommends
RUN rm /etc/apt/sources.list.d/proposed-repositories.list && apt-get update
# This symlink required by gcc to find lld compiler
RUN ln -s /usr/bin/lld-${LLVM_VERSION} /usr/bin/ld.lld

View File

@ -93,7 +93,7 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
cxx = cc.replace('gcc', 'g++').replace('clang', 'clang++')
if image_type == "deb":
if image_type == "deb" or image_type == "unbundled":
result.append("DEB_CC={}".format(cc))
result.append("DEB_CXX={}".format(cxx))
elif image_type == "binary":
@ -105,6 +105,7 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
# Create combined output archive for split build and for performance tests.
if package_type == "performance":
result.append("COMBINED_OUTPUT=performance")
cmake_flags.append("-DENABLE_TESTS=0")
elif split_binary:
result.append("COMBINED_OUTPUT=shared_build")

View File

@ -35,7 +35,7 @@ function download
# wget -O- -nv -nd -c "https://clickhouse-builds.s3.yandex.net/$PR_TO_TEST/$SHA_TO_TEST/clickhouse_build_check/performance/performance.tgz" \
# | tar --strip-components=1 -zxv
wget -nv -nd -c "https://clickhouse-builds.s3.yandex.net/$PR_TO_TEST/$SHA_TO_TEST/clickhouse_build_check/clang-10_debug_none_bundled_unsplitted_disable_False_binary/clickhouse"
wget -nv -nd -c "https://clickhouse-builds.s3.yandex.net/$PR_TO_TEST/$SHA_TO_TEST/clickhouse_build_check/clang-11_debug_none_bundled_unsplitted_disable_False_binary/clickhouse"
chmod +x clickhouse
ln -s ./clickhouse ./clickhouse-server
ln -s ./clickhouse ./clickhouse-client
@ -227,4 +227,4 @@ EOF
;&
esac
exit $task_exit_code
exit $task_exit_code

View File

@ -16,7 +16,7 @@ We also consider the test to be unstable, if the observed difference is less tha
performance differences above 5% more often than in 5% of runs, so the test is likely
to have false positives.
### How to read the report
### How to Read the Report
The check status summarizes the report in a short text message like `1 faster, 10 unstable`:
* `1 faster` -- how many queries became faster,
@ -27,28 +27,50 @@ The check status summarizes the report in a short text message like `1 faster, 1
The report page itself consists of several tables. Some of them always signify errors, e.g. "Run errors" -- the very presence of this table indicates that there were errors during the test that are not normal and must be fixed. Some tables are mostly informational, e.g. "Test times" -- they reflect normal test results. But if a cell in such a table is marked in red, this also means an error, e.g., a test is taking too long to run.
#### Tested commits
#### Tested Commits
Informational, no action required. Log messages for the commits that are tested. Note that for the right commit, we show the nominal tested commit `pull/*/head` and the real tested commit `pull/*/merge`, which is generated by GitHub by merging the latest master into `pull/*/head` and which we actually build and test in CI.
#### Run errors
Action required for every item -- these are errors that must be fixed. The errors that occurred when running some test queries. For more information about the error, download test output archive and see `test-name-err.log`. To reproduce, see 'How to run' below.
#### Error Summary
Action required for every item.
#### Slow on client
Action required for every item -- these are errors that must be fixed. This table shows queries that take significantly longer to process on the client than on the server. A possible reason might be sending too much data to the client, e.g., a forgotten `format Null`.
This table summarizes all errors that occurred during the test. Click the links to go to the description of a particular error.
#### Short queries not marked as short
Action required for every item -- these are errors that must be fixed. This table shows queries that are "short" but not explicitly marked as such. "Short" queries are too fast to meaningfully compare performance, because the changes are drowned by the noise. We consider all queries that run faster than 0.02 s to be "short", and only check the performance if they became slower than this threshold. Probably this mode is not what you want, so you have to increase the query run time to be between 1 and 0.1 s, so that the performance can be compared. You do want this "short" mode for queries that complete "immediately", such as some varieties of `select count(*)`. You have to mark them as "short" explicitly by writing `<query short="1">...`. The value of "short" attribute is evaluated as a python expression, and substitutions are performed, so you can write something like `<query short="{column1} = {column2}">select count(*) from table where {column1} > {column2}</query>`, to mark only a particular combination of variables as short.
#### Run Errors
Action required for every item -- these are errors that must be fixed.
#### Partial queries
Action required for the cells marked in red. Shows the queries we are unable to run on an old server -- probably because they contain a new function. You should see this table when you add a new function and a performance test for it. Check that the run time and variance are acceptable (run time between 0.1 and 1 seconds, variance below 10%). If not, they will be highlighted in red.
The errors that occurred when running some test queries. For more information about the error, download the test output archive and see `test-name-err.log`. To reproduce, see 'How to Run' below.
#### Changes in performance
Action required for the cells marked in red, and some cheering is appropriate for the cells marked in green. These are the queries for which we observe a statistically significant change in performance. Note that there will always be some false positives -- we try to filter by p < 0.001, and have 2000 queries, so two false positives per run are expected. In practice we have more -- e.g. code layout changed because of some unknowable jitter in compiler internals, so the change we observe is real, but it is a 'false positive' in the sense that it is not directly caused by your changes. If, based on your knowledge of ClickHouse internals, you can decide that the observed test changes are not relevant to the changes made in the tested PR, you can ignore them.
#### Slow on Client
Action required for every item -- these are errors that must be fixed.
This table shows queries that take significantly longer to process on the client than on the server. A possible reason might be sending too much data to the client, e.g., a forgotten `format Null`.
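A common fix, shown here as a minimal hedged sketch (the table and column names are hypothetical), is to discard the result set on the client side:

``` sql
-- Without FORMAT Null the client would receive and parse every row,
-- inflating client time relative to server time.
SELECT URL, count() FROM hits GROUP BY URL FORMAT Null
```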
#### Inconsistent Short Marking
Action required for every item -- these are errors that must be fixed.
Queries that have "short" duration (on the order of 0.1 s) can't be reliably tested in a normal way, where we perform a small number (about ten) of measurements for each server, because the signal-to-noise ratio is much smaller. There is a special mode for such queries that instead runs them for a fixed amount of time, normally with a much higher number of measurements (up to thousands). This mode must be explicitly enabled by the test author to avoid accidental errors. It must be used only for queries that are meant to complete "immediately", such as `select count(*)`. If your query is not supposed to be "immediate", try to make it run longer, e.g. by processing more data.
This table shows queries for which the "short" marking is not consistent with the actual query run time -- i.e., a query runs for a long time but is marked as short, or it runs very fast but is not marked as short.
If your query is really supposed to complete "immediately" and can't be made to run longer, you have to mark it as "short". To do so, write `<query short="1">...` in the test file. The value of the "short" attribute is evaluated as a Python expression, and substitutions are performed, so you can write something like `<query short="{column1} = {column2}">select count(*) from table where {column1} > {column2}</query>` to mark only a particular combination of variables as short.
#### Partial Queries
Action required for the cells marked in red.
Shows the queries we are unable to run on an old server -- probably because they contain a new function. You should see this table when you add a new function and a performance test for it. Check that the run time and variance are acceptable (run time between 0.1 and 1 seconds, variance below 10%). If not, they will be highlighted in red.
#### Changes in Performance
Action required for the cells marked in red, and some cheering is appropriate for the cells marked in green.
These are the queries for which we observe a statistically significant change in performance. Note that there will always be some false positives -- we try to filter by p < 0.001, and have 2000 queries, so two false positives per run are expected. In practice we have more -- e.g. code layout changed because of some unknowable jitter in compiler internals, so the change we observe is real, but it is a 'false positive' in the sense that it is not directly caused by your changes. If, based on your knowledge of ClickHouse internals, you can decide that the observed test changes are not relevant to the changes made in the tested PR, you can ignore them.
You can find flame graphs for queries with performance changes in the test output archive, in files named as 'my_test_0_Cpu_SELECT 1 FROM....FORMAT Null.left.svg'. First goes the test name, then the query number in the test, then the trace type (same as in `system.trace_log`), and then the server version (left is old and right is new).
#### Unstable queries
Action required for the cells marked in red. These are queries for which we did not observe a statistically significant change in performance, but for which the variance in query performance is very high. This means that we are likely to observe big changes in performance even in the absence of real changes, e.g. when comparing the server to itself. Such queries are going to have bad sensitivity as performance tests -- if a query has, say, 50% expected variability, this means we are going to see changes in performance up to 50%, even when there were no real changes in the code. And because of this, we won't be able to detect changes less than 50% with such a query, which is pretty bad. The reasons for the high variability must be investigated and fixed; ideally, the variability should be brought under 5-10%.
#### Unstable Queries
Action required for the cells marked in red.
These are the queries for which we did not observe a statistically significant change in performance, but for which the variance in query performance is very high. This means that we are likely to observe big changes in performance even in the absence of real changes, e.g. when comparing the server to itself. Such queries are going to have bad sensitivity as performance tests -- if a query has, say, 50% expected variability, this means we are going to see changes in performance up to 50%, even when there were no real changes in the code. And because of this, we won't be able to detect changes less than 50% with such a query, which is pretty bad. The reasons for the high variability must be investigated and fixed; ideally, the variability should be brought under 5-10%.
The most frequent reason for instability is that the query is just too short -- e.g. below 0.1 seconds. Bringing query time to 0.2 seconds or above usually helps.
Other reasons may include:
@ -57,24 +79,33 @@ Other reasons may include:
Investigating the instability is the hardest problem in performance testing, and we still have not been able to understand the reasons behind the instability of some queries. There is some data that can help you in the performance test output archive. Look for files named 'my_unstable_test_0_SELECT 1...FORMAT Null.{left,right}.metrics.rep'. They contain metrics from `system.query_log.ProfileEvents` and functions from stack traces from `system.trace_log` that vary significantly between query runs. The second column is an array of \[min, med, max] values for the metric. Say, if you see `PerfCacheMisses` there, it may mean that the code being tested has a not-so-cache-local memory access pattern that is sensitive to memory layout.
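For example, once you know the query_id of a suspicious run (the value below is only a placeholder in the format produced by `perf.py`), a query along these lines can show which profile events dominated it:

``` sql
SELECT ProfileEvents.Names AS name, ProfileEvents.Values AS value
FROM system.query_log
ARRAY JOIN ProfileEvents
WHERE query_id = 'my_test.query0.run1' AND type = 'QueryFinish'
ORDER BY value DESC
```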
#### Skipped tests
Informational, no action required. Shows the tests that were skipped, and the reason for it. Normally it is because the data set required for the test was not loaded, or the test is marked as 'long' -- both cases mean that the test is too big to be run per-commit.
#### Skipped Tests
Informational, no action required.
#### Test performance changes
Informational, no action required. This table summarizes the changes in performance of queries in each test -- how many queries have changed, how many are unstable, and what is the magnitude of the changes.
Shows the tests that were skipped, and the reason for it. Normally it is because the data set required for the test was not loaded, or the test is marked as 'long' -- both cases mean that the test is too big to be run per-commit.
#### Test times
Action required for the cells marked in red. This table shows the run times for all the tests. You may have to fix two kinds of errors in this table:
#### Test Performance Changes
Informational, no action required.
This table summarizes the changes in performance of queries in each test -- how many queries have changed, how many are unstable, and what is the magnitude of the changes.
#### Test Times
Action required for the cells marked in red.
This table shows the run times for all the tests. You may have to fix two kinds of errors in this table:
1) Average query run time is too long -- probably means that the preparatory steps such as creating the tables and filling them with data are taking too long. Try to make them faster.
2) Longest query run time is too long -- some particular queries are taking too long, try to make them faster. The ideal query run time is between 0.1 and 1 s.
#### Concurrent benchmarks
No action required. This table shows the results of a concurrent benchmark where queries from `website` are run in parallel using `clickhouse-benchmark`, and requests per second values are compared for old and new servers. It shows variability up to 20% for no apparent reason, so it's probably safe to disregard it. We have it for special cases like investigating concurrency effects in memory allocators, where it may be important.
#### Metric Changes
No action required.
#### Metric changes
No action required. These are changes in median values of metrics from `system.asynchronous_metrics_log`. Again, they are prone to unexplained variation and you can safely ignore this table unless it's interesting to you for some particular reason (e.g. you want to compare memory usage). There are also graphs of these metrics in the performance test output archive, in the `metrics` folder.
These are changes in median values of metrics from `system.asynchronous_metrics_log`. These metrics are prone to unexplained variation and you can safely ignore this table unless it's interesting to you for some particular reason (e.g. you want to compare memory usage). There are also graphs of these metrics in the performance test output archive, in the `metrics` folder.
### How to run
#### Errors while Building the Report
Ask a maintainer for help. These errors normally indicate a problem with testing infrastructure.
### How to Run
Run the entire docker container, specifying PR number (0 for master)
and SHA of the commit to test. The reference revision is determined as the nearest
ancestor testing release tag. It is possible to specify the reference revision and

View File

@ -63,7 +63,7 @@ function configure
# Make copies of the original db for both servers. Use hardlinks instead
# of copying to save space. Before that, remove preprocessed configs and
# system tables, because sharing them between servers with hardlinks may
# lead to weird effects.
# lead to weird effects.
rm -r left/db ||:
rm -r right/db ||:
rm -r db0/preprocessed_configs ||:
@ -121,7 +121,7 @@ function run_tests
then
# Use the explicitly set path to directory with test files.
test_prefix="$CHPC_TEST_PATH"
elif [ "$PR_TO_TEST" = "0" ]
elif [ "$PR_TO_TEST" == "0" ]
then
# When testing commits from master, use the older test files. This
# allows the tests to pass even when we add new functions and tests for
@ -155,6 +155,20 @@ function run_tests
test_files=$(ls "$test_prefix"/*.xml)
fi
# For PRs, test only a subset of queries, and run them less times.
# If the corresponding environment variables are already set, keep
# those values.
if [ "$PR_TO_TEST" == "0" ]
then
CHPC_RUNS=${CHPC_RUNS:-13}
CHPC_MAX_QUERIES=${CHPC_MAX_QUERIES:-0}
else
CHPC_RUNS=${CHPC_RUNS:-7}
CHPC_MAX_QUERIES=${CHPC_MAX_QUERIES:-20}
fi
export CHPC_RUNS
export CHPC_MAX_QUERIES
# Determine which concurrent benchmarks to run. For now, the only test
# we run as a concurrent benchmark is 'website'. Run it as benchmark if we
# are also going to run it as a normal test.
@ -184,11 +198,13 @@ function run_tests
echo test "$test_name"
TIMEFORMAT=$(printf "$test_name\t%%3R\t%%3U\t%%3S\n")
# the grep is to filter out set -x output and keep only time output
# The grep is to filter out set -x output and keep only time output.
# The '2>&1 >/dev/null' redirects stderr to stdout, and discards stdout.
{ \
time "$script_dir/perf.py" --host localhost localhost --port 9001 9002 \
--runs "$CHPC_RUNS" --max-queries "$CHPC_MAX_QUERIES" \
-- "$test" > "$test_name-raw.tsv" 2> "$test_name-err.log" ; \
} 2>&1 >/dev/null | grep -v ^+ >> "wall-clock-times.tsv" \
} 2>&1 >/dev/null | tee >(grep -v ^+ >> "wall-clock-times.tsv") \
|| echo "Test $test_name failed with error code $?" >> "$test_name-err.log"
done
@ -197,33 +213,9 @@ function run_tests
wait
}
# Run some queries concurrently and report the resulting TPS. This additional
# (relatively) short test helps detect concurrency-related effects, because the
# main performance comparison testing is done query-by-query.
function run_benchmark
{
rm -rf benchmark ||:
mkdir benchmark ||:
# The list is built by run_tests.
while IFS= read -r file
do
name=$(basename "$file" ".xml")
"$script_dir/perf.py" --print-queries "$file" > "benchmark/$name-queries.txt"
"$script_dir/perf.py" --print-settings "$file" > "benchmark/$name-settings.txt"
readarray -t settings < "benchmark/$name-settings.txt"
command=(clickhouse-benchmark --concurrency 6 --cumulative --iterations 1000 --randomize 1 --delay 0 --continue_on_errors "${settings[@]}")
"${command[@]}" --port 9001 --json "benchmark/$name-left.json" < "benchmark/$name-queries.txt"
"${command[@]}" --port 9002 --json "benchmark/$name-right.json" < "benchmark/$name-queries.txt"
done < benchmarks-to-run.txt
}
function get_profiles_watchdog
{
sleep 6000
sleep 600
echo "The trace collection did not finish in time." >> profile-errors.log
@ -394,12 +386,24 @@ create table query_run_metrics_denorm engine File(TSV, 'analyze/query-run-metric
order by test, query_index, metric_names, version, query_id
;
-- Filter out tests that don't have an even number of runs, to avoid breaking
-- the further calculations. This may happen if there was an error during the
-- test runs, e.g. the server died. It will be reported in test errors, so we
-- don't have to report it again.
create view broken_queries as
select test, query_index
from query_runs
group by test, query_index
having count(*) % 2 != 0
;
-- This is for statistical processing with eqmed.sql
create table query_run_metrics_for_stats engine File(
TSV, -- do not add header -- will parse with grep
'analyze/query-run-metrics-for-stats.tsv')
as select test, query_index, 0 run, version, metric_values
from query_run_metric_arrays
where (test, query_index) not in broken_queries
order by test, query_index, run, version
;
@ -478,8 +482,6 @@ build_log_column_definitions
cat analyze/errors.log >> report/errors.log ||:
cat profile-errors.log >> report/errors.log ||:
short_query_threshold="0.02"
clickhouse-local --query "
create view query_display_names as select * from
file('analyze/query-display-names.tsv', TSV,
@ -512,18 +514,11 @@ create view query_metric_stats as
-- Main statistics for queries -- query time as reported in query log.
create table queries engine File(TSVWithNamesAndTypes, 'report/queries.tsv')
as select
-- Comparison mode doesn't make sense for queries that complete
-- immediately (on the same order of time as noise). If query duration is
-- less than some threshold, we just skip it. If there is a significant
-- regression in such query, the time will exceed the threshold, and we
-- will process it normally and detect the regression.
right < $short_query_threshold as short,
not short and abs(diff) > report_threshold and abs(diff) > stat_threshold as changed_fail,
not short and abs(diff) > report_threshold - 0.05 and abs(diff) > stat_threshold as changed_show,
abs(diff) > report_threshold and abs(diff) > stat_threshold as changed_fail,
abs(diff) > report_threshold - 0.05 and abs(diff) > stat_threshold as changed_show,
not short and not changed_fail and stat_threshold > report_threshold + 0.10 as unstable_fail,
not short and not changed_show and stat_threshold > report_threshold - 0.05 as unstable_show,
not changed_fail and stat_threshold > report_threshold + 0.10 as unstable_fail,
not changed_show and stat_threshold > report_threshold - 0.05 as unstable_show,
left, right, diff, stat_threshold,
if(report_threshold > 0, report_threshold, 0.10) as report_threshold,
@ -628,9 +623,9 @@ create table wall_clock_time_per_test engine Memory as select *
create table test_time engine Memory as
select test, sum(client) total_client_time,
maxIf(client, not short) query_max,
minIf(client, not short) query_min,
count(*) queries, sum(short) short_queries
max(client) query_max,
min(client) query_min,
count(*) queries
from total_client_time_per_query full join queries using (test, query_index)
group by test;
@ -638,7 +633,6 @@ create table test_times_report engine File(TSV, 'report/test-times.tsv') as
select wall_clock_time_per_test.test, real,
toDecimal64(total_client_time, 3),
queries,
short_queries,
toDecimal64(query_max, 3),
toDecimal64(real / queries, 3) avg_real_per_query,
toDecimal64(query_min, 3)
@ -673,32 +667,47 @@ create table queries_for_flamegraph engine File(TSVWithNamesAndTypes,
select test, query_index from queries where unstable_show or changed_show
;
-- List of queries that have 'short' duration, but are not marked as 'short' by
-- the test author (we report them).
create table unmarked_short_queries_report
engine File(TSV, 'report/unmarked-short-queries.tsv')
as select time, test, query_index, query_display_name
create view shortness
as select
(test, query_index) in
(select * from file('analyze/marked-short-queries.tsv', TSV,
'test text, query_index int'))
as marked_short,
time, test, query_index, query_display_name
from (
select right time, test, query_index from queries where short
select right time, test, query_index from queries
union all
select time_median, test, query_index from partial_query_times
where time_median < $short_query_threshold
) times
left join query_display_names
on times.test = query_display_names.test
and times.query_index = query_display_names.query_index
where (test, query_index) not in
(select * from file('analyze/marked-short-queries.tsv', TSV,
'test text, query_index int'))
order by test, query_index
;
-- Report of queries that have inconsistent 'short' markings:
-- 1) have short duration, but are not marked as 'short'
-- 2) the reverse -- marked 'short' but take too long.
-- The threshold for 2) is twice the threshold for 1), to avoid jitter.
create table inconsistent_short_marking_report
engine File(TSV, 'report/inconsistent-short-marking.tsv')
as select
multiIf(marked_short and time > 0.1, 'marked as short but is too long',
not marked_short and time < 0.02, 'is short but not marked as such',
'') problem,
marked_short, time,
test, query_index, query_display_name
from shortness
where problem != ''
;
--------------------------------------------------------------------------------
-- various compatibility data formats follow, not related to the main report
-- keep the table in old format so that we can analyze new and old data together
create table queries_old_format engine File(TSVWithNamesAndTypes, 'queries.rep')
as select short, changed_fail, unstable_fail, left, right, diff,
as select 0 short, changed_fail, unstable_fail, left, right, diff,
stat_threshold, test, query_display_name query
from queries
;
@ -915,13 +924,15 @@ done
function report_metrics
{
build_log_column_definitions
rm -rf metrics ||:
mkdir metrics
clickhouse-local --query "
create view right_async_metric_log as
select * from file('right-async-metric-log.tsv', TSVWithNamesAndTypes,
'event_date Date, event_time DateTime, name String, value Float64')
'$(cat right-async-metric-log.tsv.columns)')
;
-- Use the right log as time reference because it may have higher precision.
@ -930,7 +941,7 @@ create table metrics engine File(TSV, 'metrics/metrics.tsv') as
select name metric, r.event_time - min_time event_time, l.value as left, r.value as right
from right_async_metric_log r
asof join file('left-async-metric-log.tsv', TSVWithNamesAndTypes,
'event_date Date, event_time DateTime, name String, value Float64') l
'$(cat left-async-metric-log.tsv.columns)') l
on l.name = r.name and r.event_time <= l.event_time
order by metric, event_time
;
@ -994,9 +1005,6 @@ case "$stage" in
# Ignore the errors to collect the log and build at least some report, anyway
time run_tests ||:
;&
"run_benchmark")
time run_benchmark 2> >(tee -a run-errors.tsv 1>&2) ||:
;&
"get_profiles")
# Check for huge pages.
cat /sys/kernel/mm/transparent_hugepage/enabled > thp-enabled.txt ||:

View File

@ -1,4 +1,4 @@
<yandex>
<yandex>
<http_port remove="remove"/>
<mysql_port remove="remove"/>
<interserver_http_port remove="remove"/>
@ -22,4 +22,6 @@
<uncompressed_cache_size>1000000000</uncompressed_cache_size>
<asynchronous_metrics_update_period_s>10</asynchronous_metrics_update_period_s>
<remap_executable replace="replace">true</remap_executable>
</yandex>

View File

@ -8,7 +8,7 @@ select
from
(
-- quantiles of randomization distributions
select quantileExactForEach(0.999)(
select quantileExactForEach(0.99)(
arrayMap(x, y -> abs(x - y), metrics_by_label[1], metrics_by_label[2]) as d
) threshold
---- uncomment to see what the distribution is really like
@ -33,7 +33,7 @@ from
-- strip the query away before the join -- it might be several kB long;
(select metrics, run, version from table) no_query,
-- duplicate input measurements into many virtual runs
numbers(1, 100000) nn
numbers(1, 10000) nn
-- for each virtual run, randomly reorder measurements
order by virtual_run, rand()
) virtual_runs

View File

@ -1,16 +1,20 @@
#!/usr/bin/python3
import os
import sys
import itertools
import clickhouse_driver
import xml.etree.ElementTree as et
import argparse
import clickhouse_driver
import itertools
import functools
import math
import os
import pprint
import random
import re
import statistics
import string
import sys
import time
import traceback
import xml.etree.ElementTree as et
def tsv_escape(s):
return s.replace('\\', '\\\\').replace('\t', '\\t').replace('\n', '\\n').replace('\r','')
@ -20,7 +24,8 @@ parser = argparse.ArgumentParser(description='Run performance test.')
parser.add_argument('file', metavar='FILE', type=argparse.FileType('r', encoding='utf-8'), nargs=1, help='test description file')
parser.add_argument('--host', nargs='*', default=['localhost'], help="Server hostname(s). Corresponds to '--port' options.")
parser.add_argument('--port', nargs='*', default=[9000], help="Server port(s). Corresponds to '--host' options.")
parser.add_argument('--runs', type=int, default=int(os.environ.get('CHPC_RUNS', 13)), help='Number of query runs per server. Defaults to CHPC_RUNS environment variable.')
parser.add_argument('--runs', type=int, default=1, help='Number of query runs per server.')
parser.add_argument('--max-queries', type=int, default=None, help='Test no more than this number of queries, chosen at random.')
parser.add_argument('--long', action='store_true', help='Do not skip the tests tagged as long.')
parser.add_argument('--print-queries', action='store_true', help='Print test queries and exit.')
parser.add_argument('--print-settings', action='store_true', help='Print test settings and exit.')
@ -62,18 +67,13 @@ def substitute_parameters(query_templates, other_templates = []):
# Build a list of test queries, substituting parameters to query templates,
# and reporting the queries marked as short.
test_queries = []
is_short = []
for e in root.findall('query'):
new_queries = []
if 'short' in e.attrib:
new_queries, [is_short] = substitute_parameters([e.text], [[e.attrib['short']]])
for i, s in enumerate(is_short):
# Don't print this if we only need to print the queries.
if eval(s) and not args.print_queries:
print(f'short\t{i + len(test_queries)}')
else:
new_queries = substitute_parameters([e.text])
new_queries, [new_is_short] = substitute_parameters([e.text], [[e.attrib.get('short', '0')]])
test_queries += new_queries
is_short += [eval(s) for s in new_is_short]
assert(len(test_queries) == len(is_short))
# If we're only asked to print the queries, do that and exit
@ -82,6 +82,11 @@ if args.print_queries:
print(q)
exit(0)
# Print short queries
for i, s in enumerate(is_short):
if s:
print(f'short\t{i}')
# If we're only asked to print the settings, do that and exit. These are settings
# for clickhouse-benchmark, so we print them as command line arguments, e.g.
# '--max_memory_usage=10000000'.
@ -116,7 +121,7 @@ if 'max_ignored_relative_change' in root.attrib:
# Open connections
servers = [{'host': host, 'port': port} for (host, port) in zip(args.host, args.port)]
connections = [clickhouse_driver.Client(**server) for server in servers]
all_connections = [clickhouse_driver.Client(**server) for server in servers]
for s in servers:
print('server\t{}\t{}'.format(s['host'], s['port']))
@ -126,7 +131,7 @@ for s in servers:
# connection loses the changes in settings.
drop_query_templates = [q.text for q in root.findall('drop_query')]
drop_queries = substitute_parameters(drop_query_templates)
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for q in drop_queries:
try:
c.execute(q)
@ -142,7 +147,7 @@ for conn_index, c in enumerate(connections):
# configurable). So the end result is uncertain, but hopefully we'll be able to
# run at least some queries.
settings = root.findall('settings/*')
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for s in settings:
try:
q = f"set {s.tag} = '{s.text}'"
@ -154,7 +159,7 @@ for conn_index, c in enumerate(connections):
# Check tables that should exist. If they don't exist, just skip this test.
tables = [e.text for e in root.findall('preconditions/table_exists')]
for t in tables:
for c in connections:
for c in all_connections:
try:
res = c.execute("select 1 from {} limit 1".format(t))
except:
@ -176,7 +181,7 @@ for q in create_queries:
file = sys.stderr)
sys.exit(1)
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for q in create_queries:
c.execute(q)
print(f'create\t{conn_index}\t{c.last_query.elapsed}\t{tsv_escape(q)}')
@ -184,13 +189,19 @@ for conn_index, c in enumerate(connections):
# Run fill queries
fill_query_templates = [q.text for q in root.findall('fill_query')]
fill_queries = substitute_parameters(fill_query_templates)
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for q in fill_queries:
c.execute(q)
print(f'fill\t{conn_index}\t{c.last_query.elapsed}\t{tsv_escape(q)}')
# Run the queries in randomized order, but preserve their indexes as specified
# in the test XML. To avoid using too much time, limit the number of queries
# we run per test.
queries_to_run = random.sample(range(0, len(test_queries)), min(len(test_queries), args.max_queries or len(test_queries)))
# Run test queries.
for query_index, q in enumerate(test_queries):
for query_index in queries_to_run:
q = test_queries[query_index]
query_prefix = f'{test_name}.query{query_index}'
# We have some crazy long queries (about 100kB), so trim them to a sane
@ -208,8 +219,8 @@ for query_index, q in enumerate(test_queries):
# new one. We want to run them on the new server only, so that the PR author
# can ensure that the test works properly. Remember the errors we had on
# each server.
query_error_on_connection = [None] * len(connections);
for conn_index, c in enumerate(connections):
query_error_on_connection = [None] * len(all_connections);
for conn_index, c in enumerate(all_connections):
try:
prewarm_id = f'{query_prefix}.prewarm0'
res = c.execute(q, query_id = prewarm_id)
@ -236,21 +247,22 @@ for query_index, q in enumerate(test_queries):
if len(no_errors) == 0:
continue
elif len(no_errors) < len(connections):
elif len(no_errors) < len(all_connections):
print(f'partial\t{query_index}\t{no_errors}')
this_query_connections = [all_connections[index] for index in no_errors]
# Now, perform measured runs.
# Track the time spent by the client to process this query, so that we can
# notice the queries that take long to process on the client side, e.g. by
# sending excessive data.
start_seconds = time.perf_counter()
server_seconds = 0
for run in range(0, args.runs):
run = 0
while True:
run_id = f'{query_prefix}.run{run}'
for conn_index, c in enumerate(connections):
if query_error_on_connection[conn_index]:
continue
for conn_index, c in enumerate(this_query_connections):
try:
res = c.execute(q, query_id = run_id)
except Exception as e:
@ -259,8 +271,8 @@ for query_index, q in enumerate(test_queries):
e.message = run_id + ': ' + e.message
raise
print(f'query\t{query_index}\t{run_id}\t{conn_index}\t{c.last_query.elapsed}')
server_seconds += c.last_query.elapsed
print(f'query\t{query_index}\t{run_id}\t{conn_index}\t{c.last_query.elapsed}')
if c.last_query.elapsed > 10:
# Stop processing pathologically slow queries, to avoid timing out
@ -269,12 +281,37 @@ for query_index, q in enumerate(test_queries):
print(f'The query no. {query_index} is taking too long to run ({c.last_query.elapsed} s)', file=sys.stderr)
exit(2)
# Be careful with the counter, after this line it's the next iteration
# already.
run += 1
# Try to run any query for at least the specified number of times,
# before considering other stop conditions.
if run < args.runs:
continue
# For very short queries we have a special mode where we run them for at
# least some time. The recommended lower bound of run time for "normal"
# queries is about 0.1 s, and we run them about 10 times, giving the
# time per query per server of about one second. Use this value as a
# reference for "short" queries.
if is_short[query_index]:
if server_seconds >= 2 * len(this_query_connections):
break
# Also limit the number of runs, so that we don't go crazy processing
# the results -- 'eqmed.sql' is really suboptimal.
if run >= 500:
break
else:
if run >= args.runs:
break
client_seconds = time.perf_counter() - start_seconds
print(f'client-time\t{query_index}\t{client_seconds}\t{server_seconds}')
# Run drop queries
drop_queries = substitute_parameters(drop_query_templates)
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for q in drop_queries:
c.execute(q)
print(f'drop\t{conn_index}\t{c.last_query.elapsed}\t{tsv_escape(q)}')

View File

@ -98,6 +98,9 @@ th {{
tr:nth-child(odd) td {{filter: brightness(90%);}}
.inconsistent-short-marking tr :nth-child(2),
.inconsistent-short-marking tr :nth-child(3),
.inconsistent-short-marking tr :nth-child(5),
.all-query-times tr :nth-child(1),
.all-query-times tr :nth-child(2),
.all-query-times tr :nth-child(3),
@ -126,7 +129,6 @@ tr:nth-child(odd) td {{filter: brightness(90%);}}
.test-times tr :nth-child(5),
.test-times tr :nth-child(6),
.test-times tr :nth-child(7),
.test-times tr :nth-child(8),
.concurrent-benchmarks tr :nth-child(2),
.concurrent-benchmarks tr :nth-child(3),
.concurrent-benchmarks tr :nth-child(4),
@ -205,9 +207,11 @@ def tableStart(title):
global table_anchor
table_anchor = cls
anchor = currentTableAnchor()
help_anchor = '-'.join(title.lower().split(' '));
return f"""
<h2 id="{anchor}">
<a class="cancela" href="#{anchor}">{title}</a>
<a class="cancela" href="https://github.com/ClickHouse/ClickHouse/tree/master/docker/test/performance-comparison#{help_anchor}"><sup style="color: #888">?</sup></a>
</h2>
<table class="{cls}">
"""
@ -250,7 +254,7 @@ def addSimpleTable(caption, columns, rows, pos=None):
def add_tested_commits():
global report_errors
try:
addSimpleTable('Tested commits', ['Old', 'New'],
addSimpleTable('Tested Commits', ['Old', 'New'],
[['<pre>{}</pre>'.format(x) for x in
[open('left-commit.txt').read(),
open('right-commit.txt').read()]]])
@ -276,7 +280,7 @@ def add_report_errors():
if not report_errors:
return
text = tableStart('Errors while building the report')
text = tableStart('Errors while Building the Report')
text += tableHeader(['Error'])
for x in report_errors:
text += tableRow([x])
@ -290,7 +294,7 @@ def add_errors_explained():
return
text = '<a name="fail1"/>'
text += tableStart('Error summary')
text += tableStart('Error Summary')
text += tableHeader(['Description'])
for row in errors_explained:
text += tableRow(row)
@ -308,26 +312,26 @@ if args.report == 'main':
run_error_rows = tsvRows('run-errors.tsv')
error_tests += len(run_error_rows)
addSimpleTable('Run errors', ['Test', 'Error'], run_error_rows)
addSimpleTable('Run Errors', ['Test', 'Error'], run_error_rows)
if run_error_rows:
errors_explained.append([f'<a href="#{currentTableAnchor()}">There were some errors while running the tests</a>']);
slow_on_client_rows = tsvRows('report/slow-on-client.tsv')
error_tests += len(slow_on_client_rows)
addSimpleTable('Slow on client',
addSimpleTable('Slow on Client',
['Client time,&nbsp;s', 'Server time,&nbsp;s', 'Ratio', 'Test', 'Query'],
slow_on_client_rows)
if slow_on_client_rows:
errors_explained.append([f'<a href="#{currentTableAnchor()}">Some queries are taking noticeable time client-side (missing `FORMAT Null`?)</a>']);
unmarked_short_rows = tsvRows('report/unmarked-short-queries.tsv')
unmarked_short_rows = tsvRows('report/inconsistent-short-marking.tsv')
error_tests += len(unmarked_short_rows)
addSimpleTable('Short queries not marked as short',
['New client time, s', 'Test', '#', 'Query'],
addSimpleTable('Inconsistent Short Marking',
['Problem', 'Is marked as short', 'New client time, s', 'Test', '#', 'Query'],
unmarked_short_rows)
if unmarked_short_rows:
errors_explained.append([f'<a href="#{currentTableAnchor()}">Some queries have short duration but are not explicitly marked as "short"</a>']);
errors_explained.append([f'<a href="#{currentTableAnchor()}">Some queries have inconsistent short marking</a>']);
def add_partial():
rows = tsvRows('report/partial-queries-report.tsv')
@ -335,7 +339,7 @@ if args.report == 'main':
return
global unstable_partial_queries, slow_average_tests, tables
text = tableStart('Partial queries')
text = tableStart('Partial Queries')
columns = ['Median time, s', 'Relative time variance', 'Test', '#', 'Query']
text += tableHeader(columns)
attrs = ['' for c in columns]
@ -366,13 +370,13 @@ if args.report == 'main':
global faster_queries, slower_queries, tables
text = tableStart('Changes in performance')
text = tableStart('Changes in Performance')
columns = [
'Old,&nbsp;s', # 0
'New,&nbsp;s', # 1
'Ratio of speedup&nbsp;(-) or slowdown&nbsp;(+)', # 2
'Relative difference (new&nbsp;&minus;&nbsp;old) / old', # 3
'p&nbsp;<&nbsp;0.001 threshold', # 4
'p&nbsp;<&nbsp;0.01 threshold', # 4
# Failed # 5
'Test', # 6
'#', # 7
@ -416,14 +420,14 @@ if args.report == 'main':
'Old,&nbsp;s', #0
'New,&nbsp;s', #1
'Relative difference (new&nbsp;-&nbsp;old)/old', #2
'p&nbsp;&lt;&nbsp;0.001 threshold', #3
'p&nbsp;&lt;&nbsp;0.01 threshold', #3
# Failed #4
'Test', #5
'#', #6
'Query' #7
]
text = tableStart('Unstable queries')
text = tableStart('Unstable Queries')
text += tableHeader(columns)
attrs = ['' for c in columns]
@ -444,9 +448,9 @@ if args.report == 'main':
add_unstable_queries()
skipped_tests_rows = tsvRows('analyze/skipped-tests.tsv')
addSimpleTable('Skipped tests', ['Test', 'Reason'], skipped_tests_rows)
addSimpleTable('Skipped Tests', ['Test', 'Reason'], skipped_tests_rows)
addSimpleTable('Test performance changes',
addSimpleTable('Test Performance Changes',
['Test', 'Ratio of speedup&nbsp;(-) or slowdown&nbsp;(+)', 'Queries', 'Total not OK', 'Changed perf', 'Unstable'],
tsvRows('report/test-perf-changes.tsv'))
@ -461,34 +465,34 @@ if args.report == 'main':
'Wall clock time,&nbsp;s', #1
'Total client time,&nbsp;s', #2
'Total queries', #3
'Ignored short queries', #4
'Longest query<br>(sum for all runs),&nbsp;s', #5
'Avg wall clock time<br>(sum for all runs),&nbsp;s', #6
'Shortest query<br>(sum for all runs),&nbsp;s', #7
'Longest query<br>(sum for all runs),&nbsp;s', #4
'Avg wall clock time<br>(sum for all runs),&nbsp;s', #5
'Shortest query<br>(sum for all runs),&nbsp;s', #6
]
text = tableStart('Test times')
text = tableStart('Test Times')
text += tableHeader(columns)
nominal_runs = 13 # FIXME pass this as an argument
nominal_runs = 7 # FIXME pass this as an argument
total_runs = (nominal_runs + 1) * 2 # one prewarm run, two servers
allowed_average_run_time = allowed_single_run_time + 60 / total_runs; # some allowance for fill/create queries
attrs = ['' for c in columns]
for r in rows:
anchor = f'{currentTableAnchor()}.{r[0]}'
if float(r[6]) > 1.5 * total_runs:
if float(r[5]) > allowed_average_run_time * total_runs:
# FIXME should be 15s max -- investigate parallel_insert
slow_average_tests += 1
attrs[6] = f'style="background: {color_bad}"'
attrs[5] = f'style="background: {color_bad}"'
errors_explained.append([f'<a href="#{anchor}">The test \'{r[0]}\' is too slow to run as a whole. Investigate whether the create and fill queries can be sped up'])
else:
attrs[6] = ''
attrs[5] = ''
if float(r[5]) > allowed_single_run_time * total_runs:
if float(r[4]) > allowed_single_run_time * total_runs:
slow_average_tests += 1
attrs[5] = f'style="background: {color_bad}"'
attrs[4] = f'style="background: {color_bad}"'
errors_explained.append([f'<a href="./all-queries.html#all-query-times.{r[0]}.0">Some query of the test \'{r[0]}\' is too slow to run. See the all queries report'])
else:
attrs[5] = ''
attrs[4] = ''
text += tableRow(r, attrs, anchor)
@ -497,74 +501,7 @@ if args.report == 'main':
add_test_times()
def add_benchmark_results():
if not os.path.isfile('benchmark/website-left.json'):
return
json_reports = [json.load(open(f'benchmark/website-{x}.json')) for x in ['left', 'right']]
stats = [next(iter(x.values()))["statistics"] for x in json_reports]
qps = [x["QPS"] for x in stats]
queries = [x["num_queries"] for x in stats]
errors = [x["num_errors"] for x in stats]
relative_diff = (qps[1] - qps[0]) / max(0.01, qps[0]);
times_diff = max(qps) / max(0.01, min(qps))
all_rows = []
header = ['Benchmark', 'Metric', 'Old', 'New', 'Relative difference', 'Times difference'];
attrs = ['' for x in header]
row = ['website', 'queries', f'{queries[0]:d}', f'{queries[1]:d}', '--', '--']
attrs[0] = 'rowspan=2'
all_rows.append([row, attrs])
attrs = ['' for x in header]
row = [None, 'queries/s', f'{qps[0]:.3f}', f'{qps[1]:.3f}', f'{relative_diff:.3f}', f'x{times_diff:.3f}']
if abs(relative_diff) > 0.1:
# More queries per second is better.
if relative_diff > 0.:
attrs[4] = f'style="background: {color_good}"'
else:
attrs[4] = f'style="background: {color_bad}"'
else:
attrs[4] = ''
all_rows.append([row, attrs]);
if max(errors):
all_rows[0][1][0] = "rowspan=3"
row = [''] * (len(header))
attrs = ['' for x in header]
attrs[0] = None
row[1] = 'errors'
row[2] = f'{errors[0]:d}'
row[3] = f'{errors[1]:d}'
row[4] = '--'
row[5] = '--'
if errors[0]:
attrs[2] += f' style="background: {color_bad}" '
if errors[1]:
attrs[3] += f' style="background: {color_bad}" '
all_rows.append([row, attrs])
text = tableStart('Concurrent benchmarks')
text += tableHeader(header)
for row, attrs in all_rows:
text += tableRow(row, attrs)
text += tableEnd()
global tables
tables.append(text)
try:
add_benchmark_results()
except:
report_errors.append(
traceback.format_exception_only(
*sys.exc_info()[:2])[-1])
pass
addSimpleTable('Metric changes',
addSimpleTable('Metric Changes',
['Metric', 'Old median value', 'New median value',
'Relative difference', 'Times difference'],
tsvRows('metrics/changes.tsv'))
@ -649,13 +586,13 @@ elif args.report == 'all-queries':
'New,&nbsp;s', #3
'Ratio of speedup&nbsp;(-) or slowdown&nbsp;(+)', #4
'Relative difference (new&nbsp;&minus;&nbsp;old) / old', #5
'p&nbsp;&lt;&nbsp;0.001 threshold', #6
'p&nbsp;&lt;&nbsp;0.01 threshold', #6
'Test', #7
'#', #8
'Query', #9
]
text = tableStart('All query times')
text = tableStart('All Query Times')
text += tableHeader(columns)
attrs = ['' for c in columns]

View File

@ -28,7 +28,7 @@ def get_options(i):
options = ""
if 0 < i:
options += " --order=random"
if i == 1:
if i % 2 == 1:
options += " --atomic-db-engine"
return options

View File

@ -45,6 +45,18 @@ Clusters are set like this:
<remote_servers>
<logs>
<shard>
<!-- Inter-server per-cluster secret for Distributed queries
default: no secret (no authentication will be performed)
If set, then Distributed queries will be validated on shards, so at least:
- such a cluster should exist on the shard,
- such a cluster should have the same secret.
Also, and more importantly, the initial_user will
be used as the current user for the query.
-->
<!-- <secret></secret> -->
<!-- Optional. Shard weight when writing data. Default: 1. -->
<weight>1</weight>
<!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->

View File

@ -6,11 +6,11 @@ toc_title: Playground
# ClickHouse Playground {#clickhouse-playground}
[ClickHouse Playground](https://play.clickhouse.tech) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
Several example datasets are available in the Playground as well as sample queries that show ClickHouse features. Theres also a selection of ClickHouse LTS releases to experiment with.
Several example datasets are available in Playground as well as sample queries that show ClickHouse features. Theres also a selection of ClickHouse LTS releases to experiment with.
ClickHouse Playground gives the experience of an m2.small [Managed Service for ClickHouse](https://cloud.yandex.com/services/managed-clickhouse) instance (4 vCPU, 32 GB RAM) hosted in [Yandex.Cloud](https://cloud.yandex.com/). More information about [cloud providers](../commercial/cloud.md).
You can make queries to playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
## Credentials {#credentials}
@ -60,7 +60,7 @@ clickhouse client --secure -h play-api.clickhouse.tech --port 9440 -u playground
## Implementation Details {#implementation-details}
ClickHouse Playground web interface makes requests via ClickHouse [HTTP API](../interfaces/http.md).
The Playground backend is just a ClickHouse cluster without any additional server-side application. As mentioned above, ClickHouse HTTPS and TCP/TLS endpoints are also publicly available as a part of the Playground, both are proxied through [Cloudflare Spectrum](https://www.cloudflare.com/products/cloudflare-spectrum/) to add extra layer of protection and improved global connectivity.
The Playground backend is just a ClickHouse cluster without any additional server-side application. As mentioned above, ClickHouse HTTPS and TCP/TLS endpoints are also publicly available as a part of the Playground, both are proxied through [Cloudflare Spectrum](https://www.cloudflare.com/products/cloudflare-spectrum/) to add an extra layer of protection and improved global connectivity.
!!! warning "Warning"
Exposing ClickHouse server to public internet in any other situation is **strongly not recommended**. Make sure it listens only on private network and is covered by properly configured firewall.
Exposing the ClickHouse server to the public internet in any other situation is **strongly not recommended**. Make sure it listens only on a private network and is covered by a properly configured firewall.

View File

@ -60,6 +60,31 @@ A maximum number of bytes (uncompressed data) that can be read from a table when
What to do when the volume of data read exceeds one of the limits: throw or break. By default, throw.
## max\_rows\_to\_read_leaf {#max-rows-to-read-leaf}
The following restrictions can be checked on each block (instead of on each row). That is, the restrictions can be broken a little.
A maximum number of rows that can be read from a local table on a leaf node when running a distributed query. While
distributed queries can issue multiple sub-queries to each shard (leaf), this limit is checked only at the read
stage on the leaf nodes and ignored at the result-merging stage on the root node. For example, a cluster consists of 2 shards
and each shard contains a table with 100 rows. A distributed query that is supposed to read all the data from both
tables with the setting `max_rows_to_read=150` will fail, because in total there are 200 rows, while a query
with `max_rows_to_read_leaf=150` will succeed, since each leaf node reads at most 100 rows.
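A minimal sketch of the difference for that example (the table name `distributed_table` is illustrative):

``` sql
-- Counts rows across both shards: 200 rows in total exceed the limit, so this fails.
SELECT count() FROM distributed_table SETTINGS max_rows_to_read = 150;

-- The leaf limit is checked on each shard separately: 100 <= 150, so this succeeds.
SELECT count() FROM distributed_table SETTINGS max_rows_to_read_leaf = 150;
```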
## max\_bytes\_to\_read_leaf {#max-bytes-to-read-leaf}
A maximum number of bytes (uncompressed data) that can be read from a local table on a leaf node when running
a distributed query. While distributed queries can issue multiple sub-queries to each shard (leaf), this limit is
checked only at the read stage on the leaf nodes and ignored at the result-merging stage on the root node.
For example, a cluster consists of 2 shards and each shard contains a table with 100 bytes of data.
A distributed query that is supposed to read all the data from both tables with the setting `max_bytes_to_read=150` will fail,
because in total there are 200 bytes, while a query with `max_bytes_to_read_leaf=150` will succeed, since each leaf node reads
at most 100 bytes.
## read\_overflow\_mode_leaf {#read-overflow-mode-leaf}
What to do when the volume of data read exceeds one of the leaf limits: throw or break. By default, throw.
## max\_rows\_to\_group\_by {#settings-max-rows-to-group-by}
A maximum number of unique keys received from aggregation. This setting lets you limit memory consumption when aggregating.

View File

@ -940,6 +940,8 @@ This algorithm chooses the first replica in the set or a random replica if the f
The `first_or_random` algorithm solves the problem of the `in_order` algorithm. With `in_order`, if one replica goes down, the next one gets a double load while the remaining replicas handle the usual amount of traffic. When using the `first_or_random` algorithm, the load is evenly distributed among replicas that are still available.
It's possible to explicitly define what the first replica is by using the setting `load_balancing_first_offset`. This gives more control to rebalance query workloads among replicas.
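A hedged sketch of selecting this policy at the session level (the offset value is arbitrary):

``` sql
SET load_balancing = 'first_or_random';
-- Treat the replica at offset 2 of the configured replica list as the "first" one,
-- so different clients can spread their primary load across different replicas.
SET load_balancing_first_offset = 2;
```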
### Round Robin {#load_balancing-round_robin}
``` sql

View File

@ -10,12 +10,16 @@ Columns:
- `progress` (Float64) — The percentage of completed work from 0 to 1.
- `num_parts` (UInt64) — The number of pieces to be merged.
- `result_part_name` (String) — The name of the part that will be formed as the result of merging.
- `is_mutation` (UInt8) - 1 if this process is a part mutation.
- `is_mutation` (UInt8) 1 if this process is a part mutation.
- `total_size_bytes_compressed` (UInt64) — The total size of the compressed data in the merged chunks.
- `total_size_marks` (UInt64) — The total number of marks in the merged parts.
- `bytes_read_uncompressed` (UInt64) — Number of bytes read, uncompressed.
- `rows_read` (UInt64) — Number of rows read.
- `bytes_written_uncompressed` (UInt64) — Number of bytes written, uncompressed.
- `rows_written` (UInt64) — Number of rows written.
- `memory_usage` (UInt64) — Memory consumption of the merge process.
- `thread_id` (UInt64) — Thread ID of the merge process.
- `merge_type` — The type of the current merge. Empty if it's a mutation.
- `merge_algorithm` — The algorithm used in the current merge. Empty if it's a mutation.
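For example, a quick way to inspect these columns might be:

``` sql
SELECT table, round(progress, 2) AS progress_ratio, memory_usage, merge_type, merge_algorithm
FROM system.merges
```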
[Original article](https://clickhouse.tech/docs/en/operations/system_tables/merges) <!--hide-->

View File

@ -38,7 +38,7 @@ clickhouse-benchmark [keys] < queries_file
- `-d N`, `--delay=N` — Interval in seconds between intermediate reports (set 0 to disable reports). Default value: 1.
- `-h WORD`, `--host=WORD` — Server host. Default value: `localhost`. For the [comparison mode](#clickhouse-benchmark-comparison-mode) you can use multiple `-h` keys.
- `-p N`, `--port=N` — Server port. Default value: 9000. For the [comparison mode](#clickhouse-benchmark-comparison-mode) you can use multiple `-p` keys.
- `-i N`, `--iterations=N` — Total number of queries. Default value: 0.
- `-i N`, `--iterations=N` — Total number of queries. Default value: 0 (repeat forever).
- `-r`, `--randomize` — Random order of query execution if there is more than one input query.
- `-s`, `--secure` — Use a TLS connection.
- `-t N`, `--timelimit=N` — Time limit in seconds. `clickhouse-benchmark` stops sending queries when the specified time limit is reached. Default value: 0 (time limit disabled).

View File

@ -6,7 +6,7 @@ toc_title: ANSI Compatibility
# ANSI SQL Compatibility of ClickHouse SQL Dialect {#ansi-sql-compatibility-of-clickhouse-sql-dialect}
!!! note "Note"
This article relies on Table 38, “Feature taxonomy and definition for mandatory features”, Annex F of ISO/IEC CD 9075-2:2013.
This article relies on Table 38, “Feature taxonomy and definition for mandatory features”, Annex F of [ISO/IEC CD 9075-2:2011](https://www.iso.org/obp/ui/#iso:std:iso-iec:9075:-2:ed-4:v1:en:sec:8).
## Differences in Behaviour {#differences-in-behaviour}
@ -77,6 +77,16 @@ The following table lists cases when query feature works in ClickHouse, but beha
| E071-05 | Columns combined via table operators need not have exactly the same data type | Yes{.text-success} | |
| E071-06 | Table operators in subqueries | Yes{.text-success} | |
| **E081** | **Basic privileges** | **Partial**{.text-warning} | Work in progress |
| E081-01 | SELECT privilege at the table level | | |
| E081-02 | DELETE privilege | | |
| E081-03 | INSERT privilege at the table level | | |
| E081-04 | UPDATE privilege at the table level | | |
| E081-05 | UPDATE privilege at the column level | | |
| E081-06 | REFERENCES privilege at the table level | | |
| E081-07 | REFERENCES privilege at the column level | | |
| E081-08 | WITH GRANT OPTION | | |
| E081-09 | USAGE privilege | | |
| E081-10 | EXECUTE privilege | | |
| **E091** | **Set functions** | **Yes**{.text-success} | |
| E091-01 | AVG | Yes{.text-success} | |
| E091-02 | COUNT | Yes{.text-success} | |
@ -169,6 +179,7 @@ The following table lists cases when query feature works in ClickHouse, but beha
| **F471** | **Scalar subquery values** | **Yes**{.text-success} | |
| **F481** | **Expanded NULL predicate** | **Yes**{.text-success} | |
| **F812** | **Basic flagging** | **No**{.text-danger} | |
| **S011** | **Distinct data types** | | |
| **T321** | **Basic SQL-invoked routines** | **No**{.text-danger} | |
| T321-01 | User-defined functions with no overloading | No{.text-danger} | |
| T321-02 | User-defined stored procedures with no overloading | No{.text-danger} | |

View File

@ -246,7 +246,7 @@ Installing unixODBC and the ODBC driver for PostgreSQL:
$ sudo apt-get install -y unixodbc odbcinst odbc-postgresql
```
Configuring `/etc/odbc.ini` (or `~/.odbc.ini`):
Configuring `/etc/odbc.ini` (or `~/.odbc.ini` if you are signed in under the user that runs ClickHouse):
``` text
[DEFAULT]
@ -321,7 +321,7 @@ You may need to edit `odbc.ini` to specify the full path to the library with the
Ubuntu OS.
Installing the driver: :
Installing the ODBC driver for connecting to MS SQL:
``` bash
$ sudo apt-get install tdsodbc freetds-bin sqsh
@ -329,7 +329,7 @@ $ sudo apt-get install tdsodbc freetds-bin sqsh
Configuring the driver:
``` bash
```bash
$ cat /etc/freetds/freetds.conf
...
@ -339,8 +339,11 @@ Configuring the driver:
tds version = 7.0
client charset = UTF-8
# test TDS connection
$ sqsh -S MSSQL -D database -U user -P password
$ cat /etc/odbcinst.ini
...
[FreeTDS]
Description = FreeTDS
@ -349,8 +352,8 @@ Configuring the driver:
FileUsage = 1
UsageCount = 5
$ cat ~/.odbc.ini
...
$ cat /etc/odbc.ini
# $ cat ~/.odbc.ini # if you are signed in under the user that runs ClickHouse
[MSSQL]
Description = FreeTDS
@ -360,8 +363,15 @@ Configuring the driver:
UID = test
PWD = test
Port = 1433
# (optional) test the ODBC connection (to use the isql tool, install the [unixodbc](https://packages.debian.org/sid/unixodbc) package)
$ isql -v MSSQL "user" "password"
```
Remarks:
- to determine the earliest TDS version that is supported by a particular SQL Server version, refer to the product documentation or look at [MS-TDS Product Behavior](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/135d0ebe-5c4c-4a94-99bf-1811eccb9f4a)
Configuring the dictionary in ClickHouse:
``` xml

View File

@ -111,7 +111,7 @@ dictHas('dict_name', id_expr)
**Parameters**
- `dict_name` — Name of the dictionary. [String literal](../../sql-reference/syntax.md#syntax-string-literal).
- `id_expr` — Key value. [Expression](../../sql-reference/syntax.md#syntax-expressions) returning a [UInt64](../../sql-reference/data-types/int-uint.md)-type value.
- `id_expr` — Key value. [Expression](../../sql-reference/syntax.md#syntax-expressions) returning a [UInt64](../../sql-reference/data-types/int-uint.md) or [Tuple](../../sql-reference/data-types/tuple.md)-type value depending on the dictionary configuration.
**Returned value**
@ -189,8 +189,8 @@ dictGet[Type]OrDefault('dict_name', 'attr_name', id_expr, default_value_expr)
- `dict_name` — Name of the dictionary. [String literal](../../sql-reference/syntax.md#syntax-string-literal).
- `attr_name` — Name of the column of the dictionary. [String literal](../../sql-reference/syntax.md#syntax-string-literal).
- `id_expr` — Key value. [Expression](../../sql-reference/syntax.md#syntax-expressions) returning a [UInt64](../../sql-reference/data-types/int-uint.md)-type value.
- `default_value_expr` — Value which is returned if the dictionary doesn't contain a row with the `id_expr` key. [Expression](../../sql-reference/syntax.md#syntax-expressions) returning a value in the data type configured for the `attr_name` attribute.
- `id_expr` — Key value. [Expression](../../sql-reference/syntax.md#syntax-expressions) returning a [UInt64](../../sql-reference/data-types/int-uint.md) or [Tuple](../../sql-reference/data-types/tuple.md)-type value depending on the dictionary configuration.
- `default_value_expr` — Value returned if the dictionary doesn't contain a row with the `id_expr` key. [Expression](../../sql-reference/syntax.md#syntax-expressions) returning the value in the data type configured for the `attr_name` attribute.
**Returned value**

View File

@ -46,3 +46,25 @@ SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt3
│ ([1,2],[-1,0]) │ Tuple(Array(UInt8), Array(Int64)) │
└────────────────┴───────────────────────────────────┘
```
## mapPopulateSeries {#function-mappopulateseries}
Syntax: `mapPopulateSeries(keys : Array(<IntegerType>), values : Array(<IntegerType>)[, max : <IntegerType>])`
Generates a map where the keys are a series of numbers, from the minimum key to the maximum key (or to the `max` argument, if it is specified) taken from the `keys` array with a step size of one,
and the corresponding values are taken from the `values` array. If no value is specified for a key, the default value is used in the resulting map.
For repeated keys, only the first value (in order of appearance) gets associated with the key.
The number of elements in `keys` and `values` must be the same for each row.
Returns a tuple of two arrays: keys in sorted order, and the values for the corresponding keys.
``` sql
select mapPopulateSeries([1,2,4], [11,22,44], 5) as res, toTypeName(res) as type;
```
``` text
┌─res──────────────────────────┬─type──────────────────────────────┐
│ ([1,2,3,4,5],[11,22,0,44,0]) │ Tuple(Array(UInt8), Array(UInt8)) │
└──────────────────────────────┴───────────────────────────────────┘
```
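Without the `max` argument the series is expected to stop at the largest key present in `keys`; a small sketch (the result in the comment is illustrative):

``` sql
SELECT mapPopulateSeries([1, 2, 4], [11, 22, 44]) AS res;
-- expected: ([1,2,3,4],[11,22,0,44])
```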

View File

@ -1,20 +1,18 @@
---
machine_translated: true
machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
toc_priority: 49
toc_title: Copia de seguridad de datos
---
# Copia de seguridad de datos {#data-backup}
Mientras [replicación](../engines/table-engines/mergetree-family/replication.md) provides protection from hardware failures, it does not protect against human errors: accidental deletion of data, deletion of the wrong table or a table on the wrong cluster, and software bugs that result in incorrect data processing or data corruption. In many cases mistakes like these will affect all replicas. ClickHouse has built-in safeguards to prevent some types of mistakes — for example, by default [no puede simplemente eliminar tablas con un motor similar a MergeTree que contenga más de 50 Gb de datos](https://github.com/ClickHouse/ClickHouse/blob/v18.14.18-stable/programs/server/config.xml#L322-L330). Sin embargo, estas garantías no cubren todos los casos posibles y pueden eludirse.
Mientras que la [replicación](../engines/table-engines/mergetree-family/replication.md) proporciona protección contra fallos de hardware, no protege de errores humanos: el borrado accidental de datos, eliminar la tabla equivocada o una tabla en el clúster equivocado, y bugs de software que dan como resultado un procesado incorrecto de los datos o la corrupción de los datos. En muchos casos, errores como estos afectarán a todas las réplicas. ClickHouse dispone de salvaguardas para prevenir algunos tipos de errores — por ejemplo, por defecto [no se puede simplemente eliminar tablas con un motor similar a MergeTree que contenga más de 50 Gb de datos](https://github.com/ClickHouse/ClickHouse/blob/v18.14.18-stable/programs/server/config.xml#L322-L330). Sin embargo, estas salvaguardas no cubren todos los casos posibles y pueden eludirse.
Para mitigar eficazmente los posibles errores humanos, debe preparar cuidadosamente una estrategia para realizar copias de seguridad y restaurar sus datos **previamente**.
Cada empresa tiene diferentes recursos disponibles y requisitos comerciales, por lo que no existe una solución universal para las copias de seguridad y restauraciones de ClickHouse que se adapten a cada situación. Lo que funciona para un gigabyte de datos probablemente no funcionará para decenas de petabytes. Hay una variedad de posibles enfoques con sus propios pros y contras, que se discutirán a continuación. Es una buena idea utilizar varios enfoques en lugar de solo uno para compensar sus diversas deficiencias.
Cada empresa tiene diferentes recursos disponibles y requisitos comerciales, por lo que no existe una solución universal para las copias de seguridad y restauraciones de ClickHouse que se adapten a cada situación. Lo que funciona para un gigabyte de datos probablemente no funcionará para decenas de petabytes. Hay una variedad de posibles enfoques con sus propios pros y contras, que se discutirán a continuación. Es una buena idea utilizar varios enfoques en lugar de uno solo para compensar sus diversas deficiencias.
!!! note "Nota"
Tenga en cuenta que si realizó una copia de seguridad de algo y nunca intentó restaurarlo, es probable que la restauración no funcione correctamente cuando realmente la necesite (o al menos tomará más tiempo de lo que las empresas pueden tolerar). Por lo tanto, cualquiera que sea el enfoque de copia de seguridad que elija, asegúrese de automatizar el proceso de restauración también y practicarlo en un clúster de ClickHouse de repuesto regularmente.
Tenga en cuenta que si realizó una copia de seguridad de algo y nunca intentó restaurarlo, es probable que la restauración no funcione correctamente cuando realmente la necesite (o al menos tomará más tiempo de lo que las empresas pueden tolerar). Por lo tanto, cualquiera que sea el enfoque de copia de seguridad que elija, asegúrese de automatizar el proceso de restauración también y ponerlo en práctica en un clúster de ClickHouse de repuesto regularmente.
## Duplicar datos de origen en otro lugar {#duplicating-source-data-somewhere-else}
@ -32,7 +30,7 @@ Para volúmenes de datos más pequeños, un simple `INSERT INTO ... SELECT ...`
## Manipulaciones con piezas {#manipulations-with-parts}
ClickHouse permite usar el `ALTER TABLE ... FREEZE PARTITION ...` consulta para crear una copia local de particiones de tabla. Esto se implementa utilizando enlaces duros al `/var/lib/clickhouse/shadow/` carpeta, por lo que generalmente no consume espacio adicional en disco para datos antiguos. Las copias creadas de archivos no son manejadas por el servidor ClickHouse, por lo que puede dejarlas allí: tendrá una copia de seguridad simple que no requiere ningún sistema externo adicional, pero seguirá siendo propenso a problemas de hardware. Por esta razón, es mejor copiarlos de forma remota en otra ubicación y luego eliminar las copias locales. Los sistemas de archivos distribuidos y los almacenes de objetos siguen siendo una buena opción para esto, pero los servidores de archivos conectados normales con una capacidad lo suficientemente grande podrían funcionar también (en este caso, la transferencia ocurrirá a través del sistema de archivos de red o tal vez [rsync](https://en.wikipedia.org/wiki/Rsync)).
ClickHouse permite usar la consulta `ALTER TABLE ... FREEZE PARTITION ...` para crear una copia local de particiones de tabla. Esto se implementa utilizando enlaces duros a la carpeta `/var/lib/clickhouse/shadow/`, por lo que generalmente no consume espacio adicional en disco para datos antiguos. Las copias creadas de archivos no son manejadas por el servidor ClickHouse, por lo que puede dejarlas allí: tendrá una copia de seguridad simple que no requiere ningún sistema externo adicional, pero seguirá siendo propenso a problemas de hardware. Por esta razón, es mejor copiarlos de forma remota en otra ubicación y luego eliminar las copias locales. Los sistemas de archivos distribuidos y los almacenes de objetos siguen siendo una buena opción para esto, pero los servidores de archivos conectados normales con una capacidad lo suficientemente grande podrían funcionar también (en este caso, la transferencia ocurrirá a través del sistema de archivos de red o tal vez [rsync](https://en.wikipedia.org/wiki/Rsync)).
Para obtener más información sobre las consultas relacionadas con las manipulaciones de particiones, consulte [Documentación de ALTER](../sql-reference/statements/alter.md#alter_manipulations-with-partitions).

View File

@ -1,38 +1,59 @@
# ClickHouse Playground {#clickhouse-playground}
ClickHouse Playground позволяет моментально выполнить запросы к ClickHouse из бразуера.
В Playground доступны несколько тестовых массивов данных и примеры запросов, которые показывают некоторые отличительные черты ClickHouse.
[ClickHouse Playground](https://play.clickhouse.tech) позволяет пользователям экспериментировать с ClickHouse, мгновенно выполняя запросы без настройки своего сервера или кластера.
В Playground доступны несколько тестовых массивов данных, а также примеры запросов, которые показывают возможности ClickHouse. Кроме того, вы можете выбрать LTS релиз ClickHouse, который хотите протестировать.
Запросы выполняются под пользователем с правами `readonly` для которого есть следующие ограничения:
ClickHouse Playground дает возможность поработать с [Managed Service for ClickHouse](https://cloud.yandex.com/services/managed-clickhouse) в конфигурации m2.small (4 vCPU, 32 ГБ ОЗУ), которую предоставляет [Яндекс.Облако](https://cloud.yandex.com/). Дополнительную информацию об облачных провайдерах читайте в разделе [Поставщики облачных услуг ClickHouse](../commercial/cloud.md).
Вы можете отправлять запросы к Playground с помощью любого HTTP-клиента, например [curl](https://curl.haxx.se) или [wget](https://www.gnu.org/software/wget/), также можно установить соединение с помощью драйверов [JDBC](../interfaces/jdbc.md) или [ODBC](../interfaces/odbc.md). Более подробная информация о программных продуктах, поддерживающих ClickHouse, доступна [здесь](../interfaces/index.md).
## Параметры доступа {#credentials}
| Параметр | Значение |
|:--------------------|:----------------------------------------|
| Конечная точка HTTPS| `https://play-api.clickhouse.tech:8443` |
| Конечная точка TCP | `play-api.clickhouse.tech:9440` |
| Пользователь | `playground` |
| Пароль | `clickhouse` |
Также можно подключаться к ClickHouse определённых релизов, чтобы протестировать их различия (порты и пользователь / пароль остаются неизменными):
- 20.3 LTS: `play-api-v20-3.clickhouse.tech`
- 19.14 LTS: `play-api-v19-14.clickhouse.tech`
!!! note "Примечание"
Для всех этих конечных точек требуется безопасное соединение TLS.
## Ограничения {#limitations}
Запросы выполняются под пользователем с правами `readonly`, для которого есть следующие ограничения:
- запрещены DDL запросы
- запрещены INSERT запросы
Также установлены следующие опции:
- [`max_result_bytes=10485760`](../operations/settings/query_complexity/#max-result-bytes)
- [`max_result_rows=2000`](../operations/settings/query_complexity/#setting-max_result_rows)
- [`result_overflow_mode=break`](../operations/settings/query_complexity/#result-overflow-mode)
- [`max_execution_time=60000`](../operations/settings/query_complexity/#max-execution-time)
- [max\_result\_bytes=10485760](../operations/settings/query_complexity/#max-result-bytes)
- [max\_result\_rows=2000](../operations/settings/query_complexity/#setting-max_result_rows)
- [result\_overflow\_mode=break](../operations/settings/query_complexity/#result-overflow-mode)
- [max\_execution\_time=60000](../operations/settings/query_complexity/#max-execution-time)
ClickHouse Playground соответствует конфигурации m2.small хосту
[Managed Service for ClickHouse](https://cloud.yandex.com/services/managed-clickhouse)
запущеному в [Яндекс.Облаке](https://cloud.yandex.com/).
Больше информации про [облачных провайдерах](../commercial/cloud.md).
## Примеры {#examples}
Веб интерфейс ClickHouse Playground делает запросы через ClickHouse HTTP API.
Бекендом служит обычный кластер ClickHouse.
ClickHouse HTTP интерфейс также доступен как часть Playground.
Запросы к Playground могут быть выполнены с помощью curl/wget, а также через соединеие JDBC/ODBC драйвера
Больше информации про приложения с поддержкой ClickHouse доступно в разделе [Интерфейсы](../interfaces/index.md).
| Параметр | Значение |
|:-----------------|:--------------------------------------|
| Адрес | https://play-api.clickhouse.tech:8443 |
| Имя пользователя | `playground` |
| Пароль | `clickhouse` |
Требуется SSL соединение.
Пример конечной точки HTTPS с `curl`:
``` bash
curl "https://play-api.clickhouse.tech:8443/?query=SELECT+'Play+ClickHouse!';&user=playground&password=clickhouse&database=datasets"
curl "https://play-api.clickhouse.tech:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
```
Пример конечной точки TCP с [CLI](../interfaces/cli.md):
``` bash
clickhouse client --secure -h play-api.clickhouse.tech --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
```
## Детали реализации {#implementation-details}
Веб-интерфейс ClickHouse Playground выполняет запросы через ClickHouse [HTTP API](../interfaces/http.md).
Бэкэнд Playground - это кластер ClickHouse без дополнительных серверных приложений. Как упоминалось выше, способы подключения по HTTPS и TCP/TLS общедоступны как часть Playground. Они проксируются через [Cloudflare Spectrum](https://www.cloudflare.com/products/cloudflare-spectrum/) для добавления дополнительного уровня защиты и улучшенного глобального подключения.
!!! warning "Предупреждение"
Открывать сервер ClickHouse для публичного доступа в любой другой ситуации **настоятельно не рекомендуется**. Убедитесь, что он настроен только на частную сеть и защищен брандмауэром.

View File

@ -56,6 +56,32 @@
Что делать, когда количество прочитанных данных превысило одно из ограничений: throw или break. По умолчанию: throw.
## max\_rows\_to\_read_leaf {#max-rows-to-read-leaf}
Следующие ограничения могут проверяться на каждый блок (а не на каждую строку). То есть, ограничения могут быть немного нарушены.
Максимальное количество строчек, которое можно прочитать из таблицы на удалённом сервере при выполнении
распределенного запроса. Распределенные запросы могут создавать несколько подзапросов к каждому из шардов в кластере и
тогда этот лимит будет применен при выполнении чтения на удаленных серверах (включая и сервер-инициатор) и проигнорирован
на сервере-инициаторе запроса во время объединения полученных результатов. Например, кластер состоит из 2 шардов и каждый
из них хранит таблицу со 100 строками. Тогда распределённый запрос для получения всех данных из этих таблиц и установленной
настройкой `max_rows_to_read=150` выбросит исключение, т.к. в общем он прочитает 200 строк. Но запрос
с настройкой `max_rows_to_read_leaf=150` завершится успешно, потому что каждый из шардов прочитает максимум 100 строк.
## max\_bytes\_to\_read_leaf {#max-bytes-to-read-leaf}
Максимальное количество байт (несжатых данных), которое можно прочитать из таблицы на удалённом сервере при
выполнении распределенного запроса. Распределенные запросы могут создавать несколько подзапросов к каждому из шардов в
кластере и тогда этот лимит будет применен при выполнении чтения на удаленных серверах (включая и сервер-инициатор)
и проигнорирован на сервере-инициаторе запроса во время объединения полученных результатов. Например, кластер состоит
из 2 шардов и каждый из них хранит таблицу со 100 байтами. Тогда распределённый запрос для получения всех данных из этих таблиц
и установленной настройкой `max_bytes_to_read=150` выбросит исключение, т.к. в общем он прочитает 200 байт. Но запрос
с настройкой `max_bytes_to_read_leaf=150` завершится успешно, потому что каждый из шардов прочитает максимум 100 байт.
## read\_overflow\_mode_leaf {#read-overflow-mode-leaf}
Что делать, когда количество прочитанных данных на удаленном сервере превысило одно из ограничений: throw или break. По умолчанию: throw.
## max\_rows\_to\_group\_by {#settings-max-rows-to-group-by}
Максимальное количество уникальных ключей, получаемых в процессе агрегации. Позволяет ограничить потребление оперативки при агрегации.

View File

@ -103,7 +103,7 @@ dictHas('dict_name', id)
**Параметры**
- `dict_name` — имя словаря. [Строковый литерал](../syntax.md#syntax-string-literal).
- `id_expr` — значение ключа словаря. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md).
- `id_expr` — значение ключа словаря. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md) или [Tuple](../../sql-reference/functions/ext-dict-functions.md) в зависимости от конфигурации словаря.
**Возвращаемое значение**
@ -179,7 +179,7 @@ dictGet[Type]OrDefault('dict_name', 'attr_name', id_expr, default_value_expr)
- `dict_name` — имя словаря. [Строковый литерал](../syntax.md#syntax-string-literal).
- `attr_name` — имя столбца словаря. [Строковый литерал](../syntax.md#syntax-string-literal).
- `id_expr` — значение ключа словаря. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md).
- `id_expr` — значение ключа словаря. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md) или [Tuple](../../sql-reference/functions/ext-dict-functions.md) в зависимости от конфигурации словаря.
- `default_value_expr` — значение, возвращаемое в том случае, когда словарь не содержит строки с заданным ключом `id_expr`. [Выражение](../syntax.md#syntax-expressions) возвращающее значение с типом данных, сконфигурированным для атрибута `attr_name`.
**Возвращаемое значение**

View File

@ -5,13 +5,15 @@ toc_title: Представление
# CREATE VIEW {#create-view}
``` sql
CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
```
Создаёт представление. Представления бывают двух видов - обычные и материализованные (MATERIALIZED).
Обычные представления не хранят никаких данных, а всего лишь производят чтение из другой таблицы. То есть, обычное представление - не более чем сохранённый запрос. При чтении из представления, этот сохранённый запрос, используется в качестве подзапроса в секции FROM.
## Обычные представления {#normal}
``` sql
CREATE [OR REPLACE] VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] AS SELECT ...
```
Normal views don't store any data, they just perform a read from another table on each access. In other words, a normal view is nothing more than a saved query. When reading from a view, this saved query is used as a subquery in the [FROM](../../../sql-reference/statements/select/from.md) clause.
Для примера, пусть вы создали представление:
@ -31,15 +33,24 @@ SELECT a, b, c FROM view
SELECT a, b, c FROM (SELECT ...)
```
Материализованные (MATERIALIZED) представления хранят данные, преобразованные соответствующим запросом SELECT.
## Материализованные представления {#materialized}
При создании материализованного представления без использования `TO [db].[table]`, нужно обязательно указать ENGINE - движок таблицы для хранения данных.
``` sql
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] [TO[db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
```
Материализованные (MATERIALIZED) представления хранят данные, преобразованные соответствующим запросом [SELECT](../../../sql-reference/statements/select/index.md).
При создании материализованного представления без использования `TO [db].[table]`, нужно обязательно указать `ENGINE` - движок таблицы для хранения данных.
При создании материализованного представления с использованием `TO [db].[table]` нельзя указывать `POPULATE`.
Материализованное представление устроено следующим образом: при вставке данных в таблицу, указанную в SELECT-е, кусок вставляемых данных преобразуется этим запросом SELECT, и полученный результат вставляется в представление.
Если указано POPULATE, то при создании представления, в него будут вставлены имеющиеся данные таблицы, как если бы был сделан запрос `CREATE TABLE ... AS SELECT ...` . Иначе, представление будет содержать только данные, вставляемые в таблицу после создания представления. Не рекомендуется использовать POPULATE, так как вставляемые в таблицу данные во время создания представления, не попадут в него.
!!! important "Важно"
Материализованные представления в ClickHouse больше похожи на `after insert` триггеры. Если в запросе материализованного представления есть агрегирование, оно применяется только к вставляемому блоку записей. Любые изменения существующих данных исходной таблицы (например, обновление, удаление, удаление раздела и т.д.) не изменяют материализованное представление.
Если указано `POPULATE`, то при создании представления, в него будут вставлены имеющиеся данные таблицы, как если бы был сделан запрос `CREATE TABLE ... AS SELECT ...` . Иначе, представление будет содержать только данные, вставляемые в таблицу после создания представления. Не рекомендуется использовать POPULATE, так как вставляемые в таблицу данные во время создания представления, не попадут в него.
Запрос `SELECT` может содержать `DISTINCT`, `GROUP BY`, `ORDER BY`, `LIMIT`… Следует иметь в виду, что соответствующие преобразования будут выполняться независимо, на каждый блок вставляемых данных. Например, при наличии `GROUP BY`, данные будут агрегироваться при вставке, но только в рамках одной пачки вставляемых данных. Далее, данные не будут доагрегированы. Исключение - использование ENGINE, производящего агрегацию данных самостоятельно, например, `SummingMergeTree`.
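For illustration, a minimal sketch (with hypothetical table and column names) of a materialized view that pre-aggregates inserted blocks into a `SummingMergeTree` target, which then re-aggregates rows across blocks during merges:

``` sql
CREATE MATERIALIZED VIEW hits_daily
ENGINE = SummingMergeTree
ORDER BY (site_id, day)
AS SELECT
    site_id,
    toDate(event_time) AS day,
    count() AS hits
FROM hits
GROUP BY site_id, day;
```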
@ -50,4 +61,4 @@ SELECT a, b, c FROM (SELECT ...)
Отсутствует отдельный запрос для удаления представлений. Чтобы удалить представление, следует использовать `DROP TABLE`.
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/statements/create/view)
<!--hide-->
<!--hide-->

View File

@ -5,18 +5,35 @@ toc_title: DROP
# DROP {#drop}
Запрос имеет два вида: `DROP DATABASE` и `DROP TABLE`.
Удаляет существующий объект.
Если указано `IF EXISTS` - не выдавать ошибку, если объекта не существует.
## DROP DATABASE {#drop-database}
``` sql
DROP DATABASE [IF EXISTS] db [ON CLUSTER cluster]
```
Удаляет все таблицы в базе данных db, затем удаляет саму базу данных db.
## DROP TABLE {#drop-table}
``` sql
DROP [TEMPORARY] TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
```
Удаляет таблицу.
Если указано `IF EXISTS` - не выдавать ошибку, если таблица не существует или база данных не существует.
## DROP DICTIONARY {#drop-dictionary}
``` sql
DROP DICTIONARY [IF EXISTS] [db.]name
```
Удаляет словарь.
## DROP USER {#drop-user-statement}
@ -41,6 +58,7 @@ DROP USER [IF EXISTS] name [,...] [ON CLUSTER cluster_name]
DROP ROLE [IF EXISTS] name [,...] [ON CLUSTER cluster_name]
```
## DROP ROW POLICY {#drop-row-policy-statement}
Удаляет политику доступа к строкам.
@ -80,5 +98,13 @@ DROP [SETTINGS] PROFILE [IF EXISTS] name [,...] [ON CLUSTER cluster_name]
```
## DROP VIEW {#drop-view}
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/statements/drop/) <!--hide-->
``` sql
DROP VIEW [IF EXISTS] [db.]name [ON CLUSTER cluster]
```
Удаляет представление. Представления могут быть удалены и командой `DROP TABLE`, но команда `DROP VIEW` проверяет, что `[db.]name` является представлением.
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/statements/drop/) <!--hide-->

View File

@ -80,7 +80,7 @@ clickhouse-client --query='INSERT INTO table FORMAT TabSeparated' < data.tsv
## 导入示例数据集 {#import-sample-dataset}
现在是时候用一些示例数据填充我们的ClickHouse服务器。 在本教程中我们将使用Yandex的匿名数据。Metrica在成为开源之前以生产方式运行ClickHouse的第一个服务更多关于这一点 [历史科](../introduction/history.md)). 有 [多种导入Yandex的方式。梅里卡数据集](example-datasets/metrica.md),为了本教程,我们将使用最现实的一个。
现在是时候用一些示例数据填充我们的ClickHouse服务端。 在本教程中,我们将使用Yandex.Metrica的匿名数据,它是在ClickHouse成为开源之前作为生产环境运行的第一个服务(关于这一点的更多内容请参阅[ClickHouse历史](../introduction/history.md))。有 [多种导入Yandex.Metrica数据集的方法](example-datasets/metrica.md),为了本教程,我们将使用最现实的一个。
### 下载并提取表数据 {#download-and-extract-table-data}
@ -93,22 +93,22 @@ curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unx
### 创建表 {#create-tables}
与大多数数据库管理系统一样ClickHouse在逻辑上将表分组为 “databases”. 有一个 `default` 数据库,但我们将创建一个名为新的 `tutorial`:
与大多数数据库管理系统一样,ClickHouse在逻辑上将表分组为数据库。其中包含一个 `default` 数据库,但我们将创建一个新的数据库 `tutorial`:
``` bash
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
```
与数据库相比,创建表的语法要复杂得多(请参阅 [参考资料](../sql-reference/statements/create.md). 一般 `CREATE TABLE` 声明必须指定三个关键的事情:
与创建数据库相比,创建表的语法要复杂得多(请参阅 [参考资料](../sql-reference/statements/create.md))。一般来说,`CREATE TABLE` 语句必须指定三个关键的事情:
1. 要创建的表的名称。
2. Table schema, i.e. list of columns and their [数据类型](../sql-reference/data-types/index.md).
3. [表引擎](../engines/table-engines/index.md) 及其设置,这决定了如何物理执行对此表的查询的所有细节。
2. 表结构,例如:列名和对应的[数据类型](../sql-reference/data-types/index.md)。
3. [表引擎](../engines/table-engines/index.md) 及其设置,这决定了对此表的查询操作是如何在物理层面执行的所有细节。
YandexMetrica是一个网络分析服务样本数据集不包括其全部功能因此只有两个表可以创建:
Yandex.Metrica是一个网络分析服务样本数据集不包括其全部功能因此只有两个表可以创建:
- `hits` 是一个格,其中包含所有用户在服务所涵盖的所有网站上完成的每个操作。
- `visits` 是一个包含预先构建的会话而不是单个操作的表
- `hits` 表包含所有用户在服务所涵盖的所有网站上完成的每个操作。
- `visits` 表包含预先构建的会话,而不是单个操作
让我们看看并执行这些表的实际创建表查询:
@ -453,9 +453,9 @@ SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192
```
您可以使用以下交互模式执行这些查询 `clickhouse-client` (只需在终端中启动它,而不需要提前指定查询)或尝试一些 [替代接口](../interfaces/index.md) 如果你愿意的话
您可以使用`clickhouse-client`的交互模式执行这些查询(只需在终端中启动它,而不需要提前指定查询)。或者如果你愿意,可以尝试一些[替代接口](../interfaces/index.md)。
正如我们所看到的, `hits_v1` 使用 [基本MergeTree引擎](../engines/table-engines/mergetree-family/mergetree.md),而 `visits_v1` 使用 [崩溃](../engines/table-engines/mergetree-family/collapsingmergetree.md) 变体。
正如我们所看到的, `hits_v1` 使用 [基本MergeTree引擎](../engines/table-engines/mergetree-family/mergetree.md),而 `visits_v1` 使用 [折叠树](../engines/table-engines/mergetree-family/collapsingmergetree.md) 变体。
### 导入数据 {#import-data}

View File

@ -1,6 +1,6 @@
---
toc_priority: 33
toc_title: 简介
toc_title: 聚合函数
---
# 聚合函数 {#aggregate-functions}

View File

@ -34,7 +34,7 @@
│ 2 │ 3 │
└───┴──────┘
执行查询 `SELECT multiIf(isNull(y) x, y < 3, y, NULL) FROM t_null`。结果:
执行查询 `SELECT multiIf(isNull(y), x, y < 3, y, NULL) FROM t_null`。结果:
┌─multiIf(isNull(y), x, less(y, 3), y, NULL)─┐
│ 1 │

View File

@ -1,13 +1,6 @@
---
machine_translated: true
machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
toc_priority: 40
toc_title: "\u8FDC\u7A0B"
---
# 远程,远程安全 {#remote-remotesecure}
允许您访问远程服务器,而无需创建 `Distributed` 桌子
允许您访问远程服务器,而无需创建 `Distributed`
签名:
@ -18,10 +11,10 @@ remoteSecure('addresses_expr', db, table[, 'user'[, 'password']])
remoteSecure('addresses_expr', db.table[, 'user'[, 'password']])
```
`addresses_expr` An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`,或者只是 `host`. 主机可以指定为服务器名称也可以指定为IPv4或IPv6地址。 IPv6地址在方括号中指定。 端口是远程服务器上的TCP端口。 如果省略端口,它使用 `tcp_port` 从服务器的配置文件(默认情况下9000
`addresses_expr` 代表远程服务器地址的一个表达式。可以只是单个服务器地址。 服务器地址可以是 `host:port``host`。`host` 可以指定为服务器域名或是IPV4或IPV6地址。IPv6地址在方括号中指定。`port` 是远程服务器上的TCP端口。 如果省略端口,则使用服务器配置文件中的 `tcp_port` (默认情况为9000
!!! important "重要事项"
IPv6地址需要端口。
IPv6地址需要指定端口。
例:
@ -34,7 +27,7 @@ localhost
[2a02:6b8:0:1111::11]:9000
```
多个地址可以用逗号分隔。 在这种情况下ClickHouse将使用分布式处理因此它将将查询发送到所有指定的地址如具有不同数据的分片
多个地址可以用逗号分隔。在这种情况下,ClickHouse将使用分布式处理,因此它将查询发送到所有指定的地址(如具有不同数据的分片)。
示例:
@ -56,7 +49,7 @@ example01-{01..02}-1
如果您有多对大括号,它会生成相应集合的直接乘积。
大括号中的地址和部分地址可以用管道符号(\|)分隔。 在这种情况下,相应的地址集被解释为副本,并且查询将被发送到第一个正常副本。 但是,副本将按照当前设置的顺序进行迭代 [load\_balancing](../../operations/settings/settings.md) 设置。
大括号中的地址和部分地址可以用管道符号(\|)分隔。 在这种情况下,相应的地址集被解释为副本,并且查询将被发送到第一个正常副本。 但是,副本将按照当前[load\_balancing](../../operations/settings/settings.md)设置的顺序进行迭代
示例:
@ -66,20 +59,20 @@ example01-{01..02}-{1|2}
此示例指定两个分片,每个分片都有两个副本。
生成的地址数由常量限制。 现在这是1000个地址。
生成的地址数由常量限制。目前这是1000个地址。
使用 `remote` 表函数比创建一个不太优化 `Distributed` 表,因为在这种情况下,服务器连接被重新建立为每个请求。 此外,如果设置了主机名,则会解析这些名称,并且在使用各种副本时不会计算错误。 在处理大量查询时,始终创建 `Distributed`的时间提前,不要使用 `remote` 表功能。
使用 `remote` 表函数不如创建一个 `Distributed` 表更优,因为在这种情况下,将为每个请求重新建立服务器连接。此外,如果设置了主机名,则会解析这些名称,并且在使用各种副本时不会统计错误。在处理大量查询时,始终优先创建 `Distributed` 表,不要使用 `remote` 表函数。
`remote` 表函数可以在以下情况下是有用的:
- 访问特定服务器进行数据比较、调试和测试。
- 查询之间的各种ClickHouse群集用于研究目的
- 手动发出的罕见分布式请求。
- 出于研究目的,在多个ClickHouse集群之间进行查询。
- 手动发出的不频繁分布式请求。
- 每次重新定义服务器集的分布式请求。
如果未指定用户, `default` 被使用
如果未指定用户, 将会使用`default`
如果未指定密码,则使用空密码。
`remoteSecure` -相同 `remote` but with secured connection. Default port — [tcp\_port\_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) 从配置或9440.
`remoteSecure` -`remote` 相同,但是会使用加密链接。默认端口为配置文件中的[tcp\_port\_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure)或9440。
[原始文章](https://clickhouse.tech/docs/en/query_language/table_functions/remote/) <!--hide-->

View File

@ -85,7 +85,12 @@ public:
std::string cur_host = i >= hosts_.size() ? "localhost" : hosts_[i];
connections.emplace_back(std::make_unique<ConnectionPool>(
concurrency, cur_host, cur_port, default_database_, user_, password_, "benchmark", Protocol::Compression::Enable, secure));
concurrency,
cur_host, cur_port,
default_database_, user_, password_,
"", /* cluster */
"", /* cluster_secret */
"benchmark", Protocol::Compression::Enable, secure));
comparison_info_per_interval.emplace_back(std::make_shared<Stats>());
comparison_info_total.emplace_back(std::make_shared<Stats>());
}

View File

@ -701,6 +701,8 @@ private:
connection_parameters.default_database,
connection_parameters.user,
connection_parameters.password,
"", /* cluster */
"", /* cluster_secret */
"client",
connection_parameters.compression,
connection_parameters.security);
@ -958,7 +960,31 @@ private:
// Try to parse the query.
const char * this_query_end = this_query_begin;
parsed_query = parseQuery(this_query_end, all_queries_end, true);
try
{
parsed_query = parseQuery(this_query_end, all_queries_end, true);
}
catch (Exception & e)
{
if (!test_mode)
throw;
/// Try find test hint for syntax error
const char * end_of_line = find_first_symbols<'\n'>(this_query_begin,all_queries_end);
TestHint hint(true, String(this_query_end, end_of_line - this_query_end));
if (hint.serverError()) /// Syntax errors are considered as client errors
throw;
if (hint.clientError() != e.code())
{
if (hint.clientError())
e.addMessage("\nExpected clinet error: " + std::to_string(hint.clientError()));
throw;
}
/// It's expected syntax error, skip the line
this_query_begin = end_of_line;
continue;
}
if (!parsed_query)
{
@ -1478,7 +1504,18 @@ private:
{
/// Send data contained in the query.
ReadBufferFromMemory data_in(parsed_insert_query->data, parsed_insert_query->end - parsed_insert_query->data);
sendDataFrom(data_in, sample, columns_description);
try
{
sendDataFrom(data_in, sample, columns_description);
}
catch (Exception & e)
{
/// The following query will use data from input
// "INSERT INTO data FORMAT TSV\n " < data.csv
// And may be pretty hard to debug, so add information about data source to make it easier.
e.addMessage("data for INSERT was parsed from query");
throw;
}
// Remember where the data ended. We use this info later to determine
// where the next query begins.
parsed_insert_query->end = data_in.buffer().begin() + data_in.count();
@ -1486,7 +1523,15 @@ private:
else if (!is_interactive)
{
/// Send data read from stdin.
sendDataFrom(std_in, sample, columns_description);
try
{
sendDataFrom(std_in, sample, columns_description);
}
catch (Exception & e)
{
e.addMessage("data for INSERT was parsed from stdin");
throw;
}
}
else
throw Exception("No data to insert", ErrorCodes::NO_DATA_TO_INSERT);

View File

@ -26,6 +26,8 @@ void Suggest::load(const ConnectionParameters & connection_parameters, size_t su
connection_parameters.default_database,
connection_parameters.user,
connection_parameters.password,
"" /* cluster */,
"" /* cluster_secret */,
"client",
connection_parameters.compression,
connection_parameters.security);

View File

@ -32,6 +32,8 @@
#include <Common/getExecutablePath.h>
#include <Common/ThreadProfileEvents.h>
#include <Common/ThreadStatus.h>
#include <Common/getMappedArea.h>
#include <Common/remapExecutable.h>
#include <IO/HTTPCommon.h>
#include <IO/UseSSL.h>
#include <Interpreters/AsynchronousMetrics.h>
@ -89,6 +91,23 @@ namespace CurrentMetrics
extern const Metric MemoryTracking;
}
int mainEntryClickHouseServer(int argc, char ** argv)
{
DB::Server app;
try
{
return app.run(argc, argv);
}
catch (...)
{
std::cerr << DB::getCurrentExceptionMessage(true) << "\n";
auto code = DB::getCurrentExceptionCode();
return code ? code : 1;
}
}
namespace
{
@ -305,15 +324,27 @@ int Server::main(const std::vector<std::string> & /*args*/)
/// After full config loaded
{
if (config().getBool("remap_executable", false))
{
LOG_DEBUG(log, "Will remap executable in memory.");
remapExecutable();
LOG_DEBUG(log, "The code in memory has been successfully remapped.");
}
if (config().getBool("mlock_executable", false))
{
if (hasLinuxCapability(CAP_IPC_LOCK))
{
LOG_TRACE(log, "Will mlockall to prevent executable memory from being paged out. It may take a few seconds.");
if (0 != mlockall(MCL_CURRENT))
LOG_WARNING(log, "Failed mlockall: {}", errnoToString(ErrorCodes::SYSTEM_ERROR));
/// Get the memory area with (current) code segment.
/// It's better to lock only the code segment instead of calling "mlockall",
/// because otherwise debug info will be also locked in memory, and it can be huge.
auto [addr, len] = getMappedArea(reinterpret_cast<void *>(mainEntryClickHouseServer));
LOG_TRACE(log, "Will do mlock to prevent executable memory from being paged out. It may take a few seconds.");
if (0 != mlock(addr, len))
LOG_WARNING(log, "Failed mlock: {}", errnoToString(ErrorCodes::SYSTEM_ERROR));
else
LOG_TRACE(log, "The memory map of clickhouse executable has been mlock'ed");
LOG_TRACE(log, "The memory map of clickhouse executable has been mlock'ed, total {}", ReadableSize(len));
}
else
{
@ -530,6 +561,9 @@ int Server::main(const std::vector<std::string> & /*args*/)
if (config->has("max_partition_size_to_drop"))
global_context->setMaxPartitionSizeToDrop(config->getUInt64("max_partition_size_to_drop"));
if (config->has("zookeeper"))
global_context->reloadZooKeeperIfChanged(config);
global_context->updateStorageConfiguration(*config);
},
/* already_loaded = */ true);
@ -708,7 +742,10 @@ int Server::main(const std::vector<std::string> & /*args*/)
{
/// DDL worker should be started after all tables were loaded
String ddl_zookeeper_path = config().getString("distributed_ddl.path", "/clickhouse/task_queue/ddl/");
global_context->setDDLWorker(std::make_unique<DDLWorker>(ddl_zookeeper_path, *global_context, &config(), "distributed_ddl"));
int pool_size = config().getInt("distributed_ddl.pool_size", 1);
if (pool_size < 1)
throw Exception("distributed_ddl.pool_size should be greater then 0", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
global_context->setDDLWorker(std::make_unique<DDLWorker>(pool_size, ddl_zookeeper_path, *global_context, &config(), "distributed_ddl"));
}
std::unique_ptr<DNSCacheUpdater> dns_cache_updater;
@ -1124,21 +1161,3 @@ int Server::main(const std::vector<std::string> & /*args*/)
return Application::EXIT_OK;
}
}
#pragma GCC diagnostic ignored "-Wunused-function"
#pragma GCC diagnostic ignored "-Wmissing-declarations"
int mainEntryClickHouseServer(int argc, char ** argv)
{
DB::Server app;
try
{
return app.run(argc, argv);
}
catch (...)
{
std::cerr << DB::getCurrentExceptionMessage(true) << "\n";
auto code = DB::getCurrentExceptionCode();
return code ? code : 1;
}
}

View File

@ -0,0 +1,13 @@
<yandex>
<!-- Sources to read users, roles, access rights, profiles of settings, quotas. -->
<user_directories replace="replace">
<users_xml>
<!-- Path to configuration file with predefined users. -->
<path>users.xml</path>
</users_xml>
<local_directory>
<!-- Path to folder where users created by SQL commands are stored. -->
<path>access/</path>
</local_directory>
</user_directories>
</yandex>

View File

@ -212,8 +212,17 @@
<!-- Directory with user provided files that are accessible by 'file' table function. -->
<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
<!-- Path to folder where users and roles created by SQL commands are stored. -->
<access_control_path>/var/lib/clickhouse/access/</access_control_path>
<!-- Sources to read users, roles, access rights, profiles of settings, quotas. -->
<user_directories>
<users_xml>
<!-- Path to configuration file with predefined users. -->
<path>users.xml</path>
</users_xml>
<local_directory>
<!-- Path to folder where users created by SQL commands are stored. -->
<path>/var/lib/clickhouse/access/</path>
</local_directory>
</user_directories>
<!-- External user directories (LDAP). -->
<ldap_servers>
@ -256,9 +265,6 @@
-->
</ldap_servers>
<!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
<users_config>users.xml</users_config>
<!-- Default profile of settings. -->
<default_profile>default</default_profile>
@ -296,12 +302,37 @@
-->
<mlock_executable>true</mlock_executable>
<!-- Reallocate memory for machine code ("text") using huge pages. Highly experimental. -->
<remap_executable>false</remap_executable>
<!-- Configuration of clusters that could be used in Distributed tables.
https://clickhouse.tech/docs/en/operations/table_engines/distributed/
-->
<remote_servers incl="clickhouse_remote_servers" >
<!-- Test only shard config for testing distributed storage -->
<test_shard_localhost>
<!-- Inter-server per-cluster secret for Distributed queries
default: no secret (no authentication will be performed)
If set, then Distributed queries will be validated on shards, so at least:
- such cluster should exist on the shard,
- such cluster should have the same secret.
And also (which is more important), the initial_user will
be used as the current user for the query.
Right now the protocol is pretty simple and it only takes into account:
- cluster name
- query
Also it would be nice if the following were implemented:
- source hostname (see interserver_http_host), but then it will depend on DNS,
it can use the IP address instead, but then you need to get it correct on the initiator node.
- target hostname / ip address (same notes as for source hostname)
- time-based security tokens
-->
<!-- <secret></secret> -->
<shard>
<!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
<!-- <internal_replication>false</internal_replication> -->
@ -615,6 +646,9 @@
<!-- Settings from this profile will be used to execute DDL queries -->
<!-- <profile>default</profile> -->
<!-- Controls how much ON CLUSTER queries can be run simultaneously. -->
<!-- <pool_size>1</pool_size> -->
</distributed_ddl>
<!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->

View File

@ -181,6 +181,15 @@ void AccessControlManager::addUsersConfigStorage(
const String & preprocessed_dir_,
const zkutil::GetZooKeeper & get_zookeeper_function_)
{
auto storages = getStoragesPtr();
for (const auto & storage : *storages)
{
if (auto users_config_storage = typeid_cast<std::shared_ptr<UsersConfigAccessStorage>>(storage))
{
if (users_config_storage->isPathEqual(users_config_path_))
return;
}
}
auto check_setting_name_function = [this](const std::string_view & setting_name) { checkSettingNameIsAllowed(setting_name); };
auto new_storage = std::make_shared<UsersConfigAccessStorage>(storage_name_, check_setting_name_function);
new_storage->load(users_config_path_, include_from_path_, preprocessed_dir_, get_zookeeper_function_);
@ -210,17 +219,36 @@ void AccessControlManager::startPeriodicReloadingUsersConfigs()
void AccessControlManager::addDiskStorage(const String & directory_, bool readonly_)
{
addStorage(std::make_shared<DiskAccessStorage>(directory_, readonly_));
addDiskStorage(DiskAccessStorage::STORAGE_TYPE, directory_, readonly_);
}
void AccessControlManager::addDiskStorage(const String & storage_name_, const String & directory_, bool readonly_)
{
auto storages = getStoragesPtr();
for (const auto & storage : *storages)
{
if (auto disk_storage = typeid_cast<std::shared_ptr<DiskAccessStorage>>(storage))
{
if (disk_storage->isPathEqual(directory_))
{
if (readonly_)
disk_storage->setReadOnly(readonly_);
return;
}
}
}
addStorage(std::make_shared<DiskAccessStorage>(storage_name_, directory_, readonly_));
}
void AccessControlManager::addMemoryStorage(const String & storage_name_)
{
auto storages = getStoragesPtr();
for (const auto & storage : *storages)
{
if (auto memory_storage = typeid_cast<std::shared_ptr<MemoryAccessStorage>>(storage))
return;
}
addStorage(std::make_shared<MemoryAccessStorage>(storage_name_));
}

View File

@ -1,7 +1,7 @@
#pragma once
#include <Access/AccessType.h>
#include <Core/Types.h>
#include <common/types.h>
#include <Common/Exception.h>
#include <ext/range.h>
#include <ext/push_back.h>

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <Access/AccessRightsElement.h>
#include <memory>
#include <vector>

View File

@ -1,13 +1,17 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <boost/algorithm/string/case_conv.hpp>
#include <boost/algorithm/string/replace.hpp>
#include <array>
#include <vector>
namespace DB
{
using Strings = std::vector<String>;
/// Represents an access type which can be granted on databases, tables, columns, etc.
enum class AccessType
{

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <Poco/Net/IPAddress.h>
#include <memory>
#include <vector>
@ -11,6 +11,9 @@
namespace DB
{
using Strings = std::vector<String>;
/// Represents lists of hosts a user is allowed to connect to the server from.
class AllowedClientHosts
{

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <Common/Exception.h>
#include <Common/OpenSSLHelpers.h>
#include <Poco/SHA1Engine.h>

View File

@ -33,6 +33,9 @@
#include <Interpreters/InterpreterShowGrantsQuery.h>
#include <Common/quoteString.h>
#include <Core/Defines.h>
#include <Poco/JSON/JSON.h>
#include <Poco/JSON/Object.h>
#include <Poco/JSON/Stringifier.h>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/algorithm_ext/push_back.hpp>
@ -218,6 +221,16 @@ namespace
}
/// Converts a path to an absolute path and appends a separator to it.
String makeDirectoryPathCanonical(const String & directory_path)
{
auto canonical_directory_path = std::filesystem::weakly_canonical(directory_path);
if (canonical_directory_path.has_filename())
canonical_directory_path += std::filesystem::path::preferred_separator;
return canonical_directory_path;
}
/// Calculates the path to a file named <id>.sql for saving an access entity.
String getEntityFilePath(const String & directory_path, const UUID & id)
{
@ -298,22 +311,17 @@ DiskAccessStorage::DiskAccessStorage(const String & directory_path_, bool readon
{
}
DiskAccessStorage::DiskAccessStorage(const String & storage_name_, const String & directory_path_, bool readonly_)
: IAccessStorage(storage_name_)
{
auto canonical_directory_path = std::filesystem::weakly_canonical(directory_path_);
if (canonical_directory_path.has_filename())
canonical_directory_path += std::filesystem::path::preferred_separator;
directory_path = makeDirectoryPathCanonical(directory_path_);
readonly = readonly_;
std::error_code create_dir_error_code;
std::filesystem::create_directories(canonical_directory_path, create_dir_error_code);
std::filesystem::create_directories(directory_path, create_dir_error_code);
if (!std::filesystem::exists(canonical_directory_path) || !std::filesystem::is_directory(canonical_directory_path) || create_dir_error_code)
throw Exception("Couldn't create directory " + canonical_directory_path.string() + " reason: '" + create_dir_error_code.message() + "'", ErrorCodes::DIRECTORY_DOESNT_EXIST);
directory_path = canonical_directory_path;
readonly = readonly_;
if (!std::filesystem::exists(directory_path) || !std::filesystem::is_directory(directory_path) || create_dir_error_code)
throw Exception("Couldn't create directory " + directory_path + " reason: '" + create_dir_error_code.message() + "'", ErrorCodes::DIRECTORY_DOESNT_EXIST);
bool should_rebuild_lists = std::filesystem::exists(getNeedRebuildListsMarkFilePath(directory_path));
if (!should_rebuild_lists)
@ -337,6 +345,25 @@ DiskAccessStorage::~DiskAccessStorage()
}
String DiskAccessStorage::getStorageParamsJSON() const
{
std::lock_guard lock{mutex};
Poco::JSON::Object json;
json.set("path", directory_path);
if (readonly)
json.set("readonly", readonly.load());
std::ostringstream oss;
Poco::JSON::Stringifier::stringify(json, oss);
return oss.str();
}
bool DiskAccessStorage::isPathEqual(const String & directory_path_) const
{
return getPath() == makeDirectoryPathCanonical(directory_path_);
}
void DiskAccessStorage::clear()
{
entries_by_id.clear();

View File

@ -18,8 +18,13 @@ public:
~DiskAccessStorage() override;
const char * getStorageType() const override { return STORAGE_TYPE; }
String getStoragePath() const override { return directory_path; }
bool isStorageReadOnly() const override { return readonly; }
String getStorageParamsJSON() const override;
String getPath() const { return directory_path; }
bool isPathEqual(const String & directory_path_) const;
void setReadOnly(bool readonly_) { readonly = readonly_; }
bool isReadOnly() const { return readonly; }
private:
std::optional<UUID> findImpl(EntityType type, const String & name) const override;
@ -66,7 +71,7 @@ private:
void prepareNotifications(const UUID & id, const Entry & entry, bool remove, Notifications & notifications) const;
String directory_path;
bool readonly;
std::atomic<bool> readonly;
std::unordered_map<UUID, Entry> entries_by_id;
std::unordered_map<std::string_view, Entry *> entries_by_name_and_type[static_cast<size_t>(EntityType::MAX)];
boost::container::flat_set<EntityType> types_of_lists_to_write;

View File

@ -1,7 +1,7 @@
#pragma once
#include <Access/RowPolicy.h>
#include <Core/Types.h>
#include <common/types.h>
#include <Core/UUID.h>
#include <boost/smart_ptr/atomic_shared_ptr.hpp>
#include <unordered_map>

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <Core/UUID.h>
#include <Access/SettingsConstraints.h>
#include <Access/SettingsProfileElement.h>

View File

@ -1,7 +1,7 @@
#pragma once
#include <Access/LDAPParams.h>
#include <Core/Types.h>
#include <common/types.h>
#include <map>
#include <memory>

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <Common/typeid_cast.h>
#include <Common/quoteString.h>
#include <boost/algorithm/string.hpp>

View File

@ -1,7 +1,7 @@
#pragma once
#include <Access/IAccessEntity.h>
#include <Core/Types.h>
#include <common/types.h>
#include <Core/UUID.h>
#include <ext/scope_guard.h>
#include <functional>
@ -25,8 +25,9 @@ public:
/// Returns the name of this storage.
const String & getStorageName() const { return storage_name; }
virtual const char * getStorageType() const = 0;
virtual String getStoragePath() const { return {}; }
virtual bool isStorageReadOnly() const { return false; }
/// Returns a JSON with the parameters of the storage. It's up to the storage type to fill the JSON.
virtual String getStorageParamsJSON() const { return "{}"; }
using EntityType = IAccessEntity::Type;
using EntityTypeInfo = IAccessEntity::TypeInfo;

View File

@ -5,7 +5,7 @@
#endif
#include <Access/LDAPParams.h>
#include <Core/Types.h>
#include <common/types.h>
#if USE_LDAP
# include <ldap.h>

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <chrono>

View File

@ -2,7 +2,7 @@
#include <Access/EnabledSettings.h>
#include <Core/UUID.h>
#include <Core/Types.h>
#include <common/types.h>
#include <ext/scope_guard.h>
#include <map>
#include <unordered_map>

View File

@ -10,6 +10,9 @@
#include <Core/Settings.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Poco/MD5Engine.h>
#include <Poco/JSON/JSON.h>
#include <Poco/JSON/Object.h>
#include <Poco/JSON/Stringifier.h>
#include <common/logger_useful.h>
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/adaptor/map.hpp>
@ -482,12 +485,29 @@ UsersConfigAccessStorage::UsersConfigAccessStorage(const String & storage_name_,
UsersConfigAccessStorage::~UsersConfigAccessStorage() = default;
String UsersConfigAccessStorage::getStoragePath() const
String UsersConfigAccessStorage::getStorageParamsJSON() const
{
std::lock_guard lock{load_mutex};
Poco::JSON::Object json;
if (!path.empty())
json.set("path", path);
std::ostringstream oss;
Poco::JSON::Stringifier::stringify(json, oss);
return oss.str();
}
String UsersConfigAccessStorage::getPath() const
{
std::lock_guard lock{load_mutex};
return path;
}
bool UsersConfigAccessStorage::isPathEqual(const String & path_) const
{
return getPath() == path_;
}
void UsersConfigAccessStorage::setConfig(const Poco::Util::AbstractConfiguration & config)
{

View File

@ -26,8 +26,10 @@ public:
~UsersConfigAccessStorage() override;
const char * getStorageType() const override { return STORAGE_TYPE; }
String getStoragePath() const override;
bool isStorageReadOnly() const override { return true; }
String getStorageParamsJSON() const override;
String getPath() const;
bool isPathEqual(const String & path_) const;
void setConfig(const Poco::Util::AbstractConfiguration & config);

View File

@ -6,7 +6,7 @@
#include <Columns/ColumnTuple.h>
#include <Common/assert_cast.h>
#include <Common/FieldVisitors.h>
#include <Core/Types.h>
#include <common/types.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypesNumber.h>

View File

@ -5,7 +5,7 @@
#include <vector>
#include <type_traits>
#include <Core/Types.h>
#include <common/types.h>
#include <Core/ColumnNumbers.h>
#include <Core/Block.h>
#include <Common/Exception.h>

View File

@ -1,7 +1,7 @@
#pragma once
#include <algorithm>
#include <Core/Types.h>
#include <common/types.h>
#include <IO/ReadBuffer.h>
#include <IO/VarInt.h>
#include <IO/WriteBuffer.h>

View File

@ -117,6 +117,10 @@ endif ()
add_library(clickhouse_common_io ${clickhouse_common_io_headers} ${clickhouse_common_io_sources})
if (SPLIT_SHARED_LIBRARIES)
target_compile_definitions(clickhouse_common_io PRIVATE SPLIT_SHARED_LIBRARIES)
endif ()
add_library (clickhouse_malloc OBJECT Common/malloc.cpp)
set_source_files_properties(Common/malloc.cpp PROPERTIES COMPILE_FLAGS "-fno-builtin")

View File

@ -17,12 +17,15 @@
#include <Common/CurrentMetrics.h>
#include <Common/DNSResolver.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/OpenSSLHelpers.h>
#include <Common/randomSeed.h>
#include <Interpreters/ClientInfo.h>
#include <Compression/CompressionFactory.h>
#include <Processors/Pipe.h>
#include <Processors/ISink.h>
#include <Processors/Executors/PipelineExecutor.h>
#include <Processors/ConcatProcessor.h>
#include <pcg_random.hpp>
#if !defined(ARCADIA_BUILD)
# include <Common/config_version.h>
@ -171,8 +174,26 @@ void Connection::sendHello()
// NOTE For backward compatibility of the protocol, client cannot send its version_patch.
writeVarUInt(client_revision, *out);
writeStringBinary(default_database, *out);
writeStringBinary(user, *out);
writeStringBinary(password, *out);
/// If the interserver secret is used, there is no need for a password
/// (NOTE we do not check for DBMS_MIN_REVISION_WITH_INTERSERVER_SECRET, since we cannot ignore inter-server secret if it was requested)
if (!cluster_secret.empty())
{
writeStringBinary(USER_INTERSERVER_MARKER, *out);
writeStringBinary("" /* password */, *out);
#if USE_SSL
sendClusterNameAndSalt();
#else
throw Exception(
"Inter-server secret support is disabled, because ClickHouse was built without SSL library",
ErrorCodes::SUPPORT_IS_DISABLED);
#endif
}
else
{
writeStringBinary(user, *out);
writeStringBinary(password, *out);
}
out->next();
}
@ -288,6 +309,19 @@ void Connection::forceConnected(const ConnectionTimeouts & timeouts)
}
}
#if USE_SSL
void Connection::sendClusterNameAndSalt()
{
pcg64_fast rng(randomSeed());
UInt64 rand = rng();
salt = encodeSHA256(&rand, sizeof(rand));
writeStringBinary(cluster, *out);
writeStringBinary(salt, *out);
}
#endif
bool Connection::ping()
{
// LOG_TRACE(log_wrapper.get(), "Ping");
@ -406,6 +440,37 @@ void Connection::sendQuery(
else
writeStringBinary("" /* empty string is a marker of the end of settings */, *out);
/// Interserver secret
if (server_revision >= DBMS_MIN_REVISION_WITH_INTERSERVER_SECRET)
{
/// Hash
///
/// Send correct hash only for !INITIAL_QUERY, due to:
/// - this will avoid extra protocol complexity for simplest cases
/// - there is no need for a hash for the INITIAL_QUERY anyway
/// (since there are no secure/insecure changes)
if (client_info && !cluster_secret.empty() && client_info->query_kind != ClientInfo::QueryKind::INITIAL_QUERY)
{
#if USE_SSL
std::string data(salt);
data += cluster_secret;
data += query;
data += query_id;
data += client_info->initial_user;
/// TODO: add source/target host/ip-address
std::string hash = encodeSHA256(data);
writeStringBinary(hash, *out);
#else
throw Exception(
"Inter-server secret support is disabled, because ClickHouse was built without SSL library",
ErrorCodes::SUPPORT_IS_DISABLED);
#endif
}
else
writeStringBinary("", *out);
}
writeVarUInt(stage, *out);
writeVarUInt(static_cast<bool>(compression), *out);

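Taken together, the two Connection.cpp hunks above work like this: when a cluster secret is configured, sendHello() sends USER_INTERSERVER_MARKER with an empty password plus the cluster name and a random salt, and for non-initial queries sendQuery() sends SHA-256 over salt + cluster_secret + query + query_id + initial_user, so the receiving server (which knows the same secret) can recompute and verify the hash without the secret ever crossing the wire. A rough sketch of that check-string, using plain OpenSSL instead of the encodeSHA256 helper and returning the raw digest (the real helper may encode it differently):

#include <openssl/sha.h>
#include <string>

/// Sketch only: concatenate the same fields the hunk lists and hash them.
std::string interserverHash(
    const std::string & salt,
    const std::string & cluster_secret,
    const std::string & query,
    const std::string & query_id,
    const std::string & initial_user)
{
    std::string data = salt + cluster_secret + query + query_id + initial_user;
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256(reinterpret_cast<const unsigned char *>(data.data()), data.size(), digest);
    return std::string(reinterpret_cast<const char *>(digest), SHA256_DIGEST_LENGTH);
}
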
View File

@ -83,6 +83,8 @@ public:
Connection(const String & host_, UInt16 port_,
const String & default_database_,
const String & user_, const String & password_,
const String & cluster_,
const String & cluster_secret_,
const String & client_name_ = "client",
Protocol::Compression compression_ = Protocol::Compression::Enable,
Protocol::Secure secure_ = Protocol::Secure::Disable,
@ -90,6 +92,8 @@ public:
:
host(host_), port(port_), default_database(default_database_),
user(user_), password(password_),
cluster(cluster_),
cluster_secret(cluster_secret_),
client_name(client_name_),
compression(compression_),
secure(secure_),
@ -191,6 +195,11 @@ private:
String user;
String password;
/// For inter-server authorization
String cluster;
String cluster_secret;
String salt;
/// Address is resolved during the first connection (or the following reconnects)
/// Use it only for logging purposes
std::optional<Poco::Net::SocketAddress> current_resolved_address;
@ -269,6 +278,10 @@ private:
void connect(const ConnectionTimeouts & timeouts);
void sendHello();
void receiveHello();
#if USE_SSL
void sendClusterNameAndSalt();
#endif
bool ping();
Block receiveData();

View File

@ -54,6 +54,8 @@ public:
const String & default_database_,
const String & user_,
const String & password_,
const String & cluster_,
const String & cluster_secret_,
const String & client_name_ = "client",
Protocol::Compression compression_ = Protocol::Compression::Enable,
Protocol::Secure secure_ = Protocol::Secure::Disable,
@ -65,6 +67,8 @@ public:
default_database(default_database_),
user(user_),
password(password_),
cluster(cluster_),
cluster_secret(cluster_secret_),
client_name(client_name_),
compression(compression_),
secure(secure_),
@ -109,6 +113,7 @@ protected:
return std::make_shared<Connection>(
host, port,
default_database, user, password,
cluster, cluster_secret,
client_name, compression, secure);
}
@ -119,6 +124,10 @@ private:
String user;
String password;
/// For inter-server authorization
String cluster;
String cluster_secret;
String client_name;
Protocol::Compression compression; /// Whether to compress data when interacting with the server.
Protocol::Secure secure; /// Whether to encrypt data when interacting with the server.

View File

@ -56,6 +56,9 @@ IConnectionPool::Entry ConnectionPoolWithFailover::get(const ConnectionTimeouts
return tryGetEntry(pool, timeouts, fail_message, settings);
};
size_t offset = 0;
if (settings)
offset = settings->load_balancing_first_offset % nested_pools.size();
GetPriorityFunc get_priority;
switch (settings ? LoadBalancing(settings->load_balancing) : default_load_balancing)
{
@ -68,7 +71,7 @@ IConnectionPool::Entry ConnectionPoolWithFailover::get(const ConnectionTimeouts
case LoadBalancing::RANDOM:
break;
case LoadBalancing::FIRST_OR_RANDOM:
get_priority = [](size_t i) -> size_t { return i >= 1; };
get_priority = [offset](size_t i) -> size_t { return i != offset; };
break;
case LoadBalancing::ROUND_ROBIN:
if (last_used >= nested_pools.size())
@ -190,6 +193,9 @@ std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::g
else
throw DB::Exception("Unknown pool allocation mode", DB::ErrorCodes::LOGICAL_ERROR);
size_t offset = 0;
if (settings)
offset = settings->load_balancing_first_offset % nested_pools.size();
GetPriorityFunc get_priority;
switch (settings ? LoadBalancing(settings->load_balancing) : default_load_balancing)
{
@ -202,7 +208,7 @@ std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::g
case LoadBalancing::RANDOM:
break;
case LoadBalancing::FIRST_OR_RANDOM:
get_priority = [](size_t i) -> size_t { return i >= 1; };
get_priority = [offset](size_t i) -> size_t { return i != offset; };
break;
case LoadBalancing::ROUND_ROBIN:
if (last_used >= nested_pools.size())

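The two ConnectionPoolWithFailover hunks above change FIRST_OR_RANDOM from always preferring replica 0 to preferring the replica selected by load_balancing_first_offset, taken modulo the pool size. A tiny sketch of the resulting priority function, with hypothetical names:

#include <cstddef>
#include <functional>

/// Priority 0 means "preferred"; every replica except `offset` gets 1.
std::function<size_t(size_t)> firstOrRandomPriority(size_t first_offset, size_t pool_size)
{
    const size_t offset = pool_size ? first_offset % pool_size : 0;
    return [offset](size_t i) -> size_t { return i != offset; };
}

With three replicas and load_balancing_first_offset = 2, replica 2 becomes the preferred one; with the old lambda only replica 0 could ever be preferred.
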
View File

@ -781,18 +781,21 @@ void ColumnArray::getPermutation(bool reverse, size_t limit, int nan_direction_h
void ColumnArray::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res, EqualRanges & equal_range) const
{
if (equal_range.empty())
return;
if (limit >= size() || limit >= equal_range.back().second)
limit = 0;
size_t n = equal_range.size();
size_t number_of_ranges = equal_range.size();
if (limit)
--n;
--number_of_ranges;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
for (size_t i = 0; i < number_of_ranges; ++i)
{
const auto& [first, last] = equal_range[i];
const auto & [first, last] = equal_range[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, Less<false>(*this, nan_direction_hint));
@ -817,7 +820,13 @@ void ColumnArray::updatePermutation(bool reverse, size_t limit, int nan_directio
if (limit)
{
const auto& [first, last] = equal_range.back();
const auto & [first, last] = equal_range.back();
if (limit < first || limit > last)
return;
/// Since then we are working inside the interval.
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, Less<false>(*this, nan_direction_hint));
else

View File

@ -7,6 +7,7 @@
#include <Core/BigInt.h>
#include <common/unaligned.h>
#include <ext/scope_guard.h>
#include <IO/WriteHelpers.h>
@ -142,25 +143,31 @@ void ColumnDecimal<T>::getPermutation(bool reverse, size_t limit, int , IColumn:
}
template <typename T>
void ColumnDecimal<T>::updatePermutation(bool reverse, size_t limit, int, IColumn::Permutation & res, EqualRanges & equal_range) const
void ColumnDecimal<T>::updatePermutation(bool reverse, size_t limit, int, IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
if (limit >= data.size() || limit >= equal_range.back().second)
if (equal_ranges.empty())
return;
if (limit >= data.size() || limit >= equal_ranges.back().second)
limit = 0;
size_t n = equal_range.size();
size_t number_of_ranges = equal_ranges.size();
if (limit)
--n;
--number_of_ranges;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
SCOPE_EXIT({equal_ranges = std::move(new_ranges);});
for (size_t i = 0; i < number_of_ranges; ++i)
{
const auto& [first, last] = equal_range[i];
const auto& [first, last] = equal_ranges[i];
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + last, res.begin() + last,
[this](size_t a, size_t b) { return data[a] > data[b]; });
else
std::partial_sort(res.begin() + first, res.begin() + last, res.begin() + last,
[this](size_t a, size_t b) { return data[a] < data[b]; });
auto new_first = first;
for (auto j = first + 1; j < last; ++j)
{
@ -178,13 +185,20 @@ void ColumnDecimal<T>::updatePermutation(bool reverse, size_t limit, int, IColum
if (limit)
{
const auto& [first, last] = equal_range.back();
const auto & [first, last] = equal_ranges.back();
if (limit < first || limit > last)
return;
/// Since then we are working inside the interval.
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] > data[b]; });
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] < data[b]; });
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
@ -208,7 +222,6 @@ void ColumnDecimal<T>::updatePermutation(bool reverse, size_t limit, int, IColum
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
template <typename T>

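The ColumnDecimal hunk (and the similar hunks for the other column types below) adds two defensive changes to updatePermutation: an early return when equal_ranges is empty, and a SCOPE_EXIT that moves new_ranges back into equal_ranges on every exit path, including the early return inside the limit branch that previously skipped the final assignment. A minimal, self-contained sketch of that scope-guard idiom, assuming a hand-rolled guard instead of the ext::scope_guard used in the source:

#include <cstddef>
#include <utility>
#include <vector>

using EqualRanges = std::vector<std::pair<size_t, size_t>>;

/// Hand-rolled stand-in for SCOPE_EXIT: runs the callback when the scope ends.
template <typename F>
struct ScopeExit
{
    F callback;
    ~ScopeExit() { callback(); }
};
template <typename F>
ScopeExit(F) -> ScopeExit<F>;

void refineRanges(EqualRanges & equal_ranges, bool bail_out_early)
{
    if (equal_ranges.empty())
        return;                       /// nothing to refine, nothing to publish
    EqualRanges new_ranges;
    ScopeExit guard{[&] { equal_ranges = std::move(new_ranges); }};
    if (bail_out_early)
        return;                       /// the guard still publishes new_ranges
    new_ranges.emplace_back(0, 1);    /// normally filled while splitting ranges
}
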
View File

@ -9,6 +9,8 @@
#include <Common/WeakHash.h>
#include <Common/HashTable/Hash.h>
#include <ext/scope_guard.h>
#include <DataStreams/ColumnGathererStream.h>
#include <IO/WriteHelpers.h>
@ -168,24 +170,29 @@ void ColumnFixedString::getPermutation(bool reverse, size_t limit, int /*nan_dir
}
}
void ColumnFixedString::updatePermutation(bool reverse, size_t limit, int, Permutation & res, EqualRanges & equal_range) const
void ColumnFixedString::updatePermutation(bool reverse, size_t limit, int, Permutation & res, EqualRanges & equal_ranges) const
{
if (limit >= size() || limit >= equal_range.back().second)
if (equal_ranges.empty())
return;
if (limit >= size() || limit >= equal_ranges.back().second)
limit = 0;
size_t k = equal_range.size();
size_t number_of_ranges = equal_ranges.size();
if (limit)
--k;
--number_of_ranges;
EqualRanges new_ranges;
SCOPE_EXIT({equal_ranges = std::move(new_ranges);});
for (size_t i = 0; i < k; ++i)
for (size_t i = 0; i < number_of_ranges; ++i)
{
const auto& [first, last] = equal_range[i];
const auto& [first, last] = equal_ranges[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, less<false>(*this));
else
std::sort(res.begin() + first, res.begin() + last, less<true>(*this));
auto new_first = first;
for (auto j = first + 1; j < last; ++j)
{
@ -202,11 +209,18 @@ void ColumnFixedString::updatePermutation(bool reverse, size_t limit, int, Permu
}
if (limit)
{
const auto& [first, last] = equal_range.back();
const auto & [first, last] = equal_ranges.back();
if (limit < first || limit > last)
return;
/// Since then we are working inside the interval.
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<false>(*this));
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<true>(*this));
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
@ -230,7 +244,6 @@ void ColumnFixedString::updatePermutation(bool reverse, size_t limit, int, Permu
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
void ColumnFixedString::insertRangeFrom(const IColumn & src, size_t start, size_t length)

View File

@ -6,6 +6,7 @@
#include <Common/assert_cast.h>
#include <Common/WeakHash.h>
#include <ext/scope_guard.h>
namespace DB
{
@ -329,19 +330,24 @@ void ColumnLowCardinality::getPermutation(bool reverse, size_t limit, int nan_di
}
}
void ColumnLowCardinality::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
void ColumnLowCardinality::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
if (limit >= size() || limit >= equal_range.back().second)
if (equal_ranges.empty())
return;
if (limit >= size() || limit >= equal_ranges.back().second)
limit = 0;
size_t n = equal_range.size();
size_t number_of_ranges = equal_ranges.size();
if (limit)
--n;
--number_of_ranges;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
SCOPE_EXIT({equal_ranges = std::move(new_ranges);});
for (size_t i = 0; i < number_of_ranges; ++i)
{
const auto& [first, last] = equal_range[i];
const auto& [first, last] = equal_ranges[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, [this, nan_direction_hint](size_t a, size_t b)
{return getDictionary().compareAt(getIndexes().getUInt(a), getIndexes().getUInt(b), getDictionary(), nan_direction_hint) > 0; });
@ -366,7 +372,13 @@ void ColumnLowCardinality::updatePermutation(bool reverse, size_t limit, int nan
if (limit)
{
const auto& [first, last] = equal_range.back();
const auto & [first, last] = equal_ranges.back();
if (limit < first || limit > last)
return;
/// Since then we are working inside the interval.
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, [this, nan_direction_hint](size_t a, size_t b)
{return getDictionary().compareAt(getIndexes().getUInt(a), getIndexes().getUInt(b), getDictionary(), nan_direction_hint) > 0; });
@ -374,6 +386,7 @@ void ColumnLowCardinality::updatePermutation(bool reverse, size_t limit, int nan
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, [this, nan_direction_hint](size_t a, size_t b)
{return getDictionary().compareAt(getIndexes().getUInt(a), getIndexes().getUInt(b), getDictionary(), nan_direction_hint) < 0; });
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
if (getDictionary().compareAt(getIndexes().getUInt(res[new_first]), getIndexes().getUInt(res[j]), getDictionary(), nan_direction_hint) != 0)
@ -384,6 +397,7 @@ void ColumnLowCardinality::updatePermutation(bool reverse, size_t limit, int nan
new_first = j;
}
}
auto new_last = limit;
for (auto j = limit; j < last; ++j)
{
@ -396,7 +410,6 @@ void ColumnLowCardinality::updatePermutation(bool reverse, size_t limit, int nan
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
std::vector<MutableColumnPtr> ColumnLowCardinality::scatter(ColumnIndex num_columns, const Selector & selector) const

View File

@ -329,73 +329,113 @@ void ColumnNullable::getPermutation(bool reverse, size_t limit, int null_directi
}
}
void ColumnNullable::updatePermutation(bool reverse, size_t limit, int null_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
void ColumnNullable::updatePermutation(bool reverse, size_t limit, int null_direction_hint, IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
if (limit >= equal_range.back().second || limit >= size())
limit = 0;
if (equal_ranges.empty())
return;
EqualRanges new_ranges, temp_ranges;
/// We will sort nested columns into `new_ranges` and call updatePermutation in next columns with `null_ranges`.
EqualRanges new_ranges, null_ranges;
for (const auto &[first, last] : equal_range)
const auto is_nulls_last = ((null_direction_hint > 0) != reverse);
if (is_nulls_last)
{
bool direction = ((null_direction_hint > 0) != reverse);
/// Shift all NULL values to the end.
size_t read_idx = first;
size_t write_idx = first;
while (read_idx < last && (isNullAt(res[read_idx])^direction))
for (const auto & [first, last] : equal_ranges)
{
++read_idx;
++write_idx;
}
/// The current interval lies to the right of the limit.
if (limit && first > limit)
break;
++read_idx;
/// Consider a half interval [first, last)
size_t read_idx = first;
size_t write_idx = first;
size_t end_idx = last;
/// Invariants:
/// write_idx < read_idx
/// write_idx points to NULL
/// read_idx will be incremented to position of next not-NULL
/// there is a range of NULLs between write_idx and read_idx - 1,
/// We are moving elements from the end to the beginning of this range,
/// so the range will "bubble" towards the end.
/// Relative order of NULL elements could be changed,
/// but relative order of non-NULLs is preserved.
while (read_idx < last && write_idx < last)
{
if (isNullAt(res[read_idx])^direction)
/// We can't check the limit here because the interval is not sorted by nested column.
while (read_idx < end_idx && !isNullAt(res[read_idx]))
{
std::swap(res[read_idx], res[write_idx]);
++read_idx;
++write_idx;
}
++read_idx;
}
if (write_idx - first > 1)
{
if (direction)
temp_ranges.emplace_back(first, write_idx);
else
++read_idx;
/// Invariants:
/// write_idx < read_idx
/// write_idx points to NULL
/// read_idx will be incremented to position of next not-NULL
/// there is a range of NULLs between write_idx and read_idx - 1,
/// We are moving elements from the end to the beginning of this range,
/// so the range will "bubble" towards the end.
/// Relative order of NULL elements could be changed,
/// but relative order of non-NULLs is preserved.
while (read_idx < end_idx && write_idx < end_idx)
{
if (!isNullAt(res[read_idx]))
{
std::swap(res[read_idx], res[write_idx]);
++write_idx;
}
++read_idx;
}
/// We have a range [first, write_idx) of non-NULL values
if (first != write_idx)
new_ranges.emplace_back(first, write_idx);
}
if (last - write_idx > 1)
{
if (direction)
new_ranges.emplace_back(write_idx, last);
else
temp_ranges.emplace_back(write_idx, last);
/// We have a range [write_idx, last) of NULL values
if (write_idx != last)
null_ranges.emplace_back(write_idx, last);
}
}
while (!new_ranges.empty() && limit && limit <= new_ranges.back().first)
new_ranges.pop_back();
else
{
/// Shift all NULL values to the beginning.
for (const auto & [first, last] : equal_ranges)
{
/// The current interval lies to the right of the limit.
if (limit && first > limit)
break;
if (!temp_ranges.empty())
getNestedColumn().updatePermutation(reverse, limit, null_direction_hint, res, temp_ranges);
ssize_t read_idx = last - 1;
ssize_t write_idx = last - 1;
ssize_t begin_idx = first;
equal_range.resize(temp_ranges.size() + new_ranges.size());
std::merge(temp_ranges.begin(), temp_ranges.end(), new_ranges.begin(), new_ranges.end(), equal_range.begin());
while (read_idx >= begin_idx && !isNullAt(res[read_idx]))
{
--read_idx;
--write_idx;
}
--read_idx;
while (read_idx >= begin_idx && write_idx >= begin_idx)
{
if (!isNullAt(res[read_idx]))
{
std::swap(res[read_idx], res[write_idx]);
--write_idx;
}
--read_idx;
}
/// We have a range [write_idx+1, last) of non-NULL values
if (write_idx != static_cast<ssize_t>(last))
new_ranges.emplace_back(write_idx + 1, last);
/// We have a range [first, write_idx+1) of NULL values
if (static_cast<ssize_t>(first) != write_idx)
null_ranges.emplace_back(first, write_idx + 1);
}
}
getNestedColumn().updatePermutation(reverse, limit, null_direction_hint, res, new_ranges);
equal_ranges = std::move(new_ranges);
std::move(null_ranges.begin(), null_ranges.end(), std::back_inserter(equal_ranges));
}
void ColumnNullable::gather(ColumnGathererStream & gatherer)

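The rewritten ColumnNullable::updatePermutation above splits every equal range into a non-NULL part and a NULL part: depending on the nulls-last / nulls-first decision it bubbles the NULL row indexes to one end while keeping the relative order of the non-NULL indexes, lets the nested column sort only the non-NULL sub-ranges, and finally appends the NULL sub-ranges back to equal_ranges. A simplified sketch of the nulls-last pass over the permutation, assuming a plain is_null bitmap instead of the real column:

#include <cstddef>
#include <utility>
#include <vector>

/// Move indexes of NULL rows to the end of [first, last) in res, preserving
/// the relative order of the non-NULL indexes (NULLs themselves may be reordered).
void shiftNullsToEnd(std::vector<size_t> & res, const std::vector<bool> & is_null,
                     size_t first, size_t last)
{
    size_t write_idx = first;
    for (size_t read_idx = first; read_idx < last; ++read_idx)
    {
        if (!is_null[res[read_idx]])
        {
            std::swap(res[read_idx], res[write_idx]);
            ++write_idx;
        }
    }
    /// [first, write_idx) now holds non-NULL rows, [write_idx, last) the NULLs.
}
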
View File

@ -9,7 +9,7 @@
#include <DataStreams/ColumnGathererStream.h>
#include <common/unaligned.h>
#include <ext/scope_guard.h>
namespace DB
{
@ -325,25 +325,30 @@ void ColumnString::getPermutation(bool reverse, size_t limit, int /*nan_directio
}
}
void ColumnString::updatePermutation(bool reverse, size_t limit, int /*nan_direction_hint*/, Permutation & res, EqualRanges & equal_range) const
void ColumnString::updatePermutation(bool reverse, size_t limit, int /*nan_direction_hint*/, Permutation & res, EqualRanges & equal_ranges) const
{
if (limit >= size() || limit > equal_range.back().second)
if (equal_ranges.empty())
return;
if (limit >= size() || limit > equal_ranges.back().second)
limit = 0;
EqualRanges new_ranges;
auto less_true = less<true>(*this);
auto less_false = less<false>(*this);
size_t n = equal_range.size();
if (limit)
--n;
SCOPE_EXIT({equal_ranges = std::move(new_ranges);});
for (size_t i = 0; i < n; ++i)
size_t number_of_ranges = equal_ranges.size();
if (limit)
--number_of_ranges;
for (size_t i = 0; i < number_of_ranges; ++i)
{
const auto &[first, last] = equal_range[i];
const auto & [first, last] = equal_ranges[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, less_false);
std::sort(res.begin() + first, res.begin() + last, less<false>(*this));
else
std::sort(res.begin() + first, res.begin() + last, less_true);
std::sort(res.begin() + first, res.begin() + last, less<true>(*this));
size_t new_first = first;
for (size_t j = first + 1; j < last; ++j)
{
@ -363,11 +368,18 @@ void ColumnString::updatePermutation(bool reverse, size_t limit, int /*nan_direc
if (limit)
{
const auto &[first, last] = equal_range.back();
const auto & [first, last] = equal_ranges.back();
if (limit < first || limit > last)
return;
/// Since then we are working inside the interval.
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less_false);
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<false>(*this));
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less_true);
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<true>(*this));
size_t new_first = first;
for (size_t j = first + 1; j < limit; ++j)
{
@ -394,7 +406,6 @@ void ColumnString::updatePermutation(bool reverse, size_t limit, int /*nan_direc
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
ColumnPtr ColumnString::replicate(const Offsets & replicate_offsets) const
@ -534,19 +545,25 @@ void ColumnString::getPermutationWithCollation(const Collator & collator, bool r
}
}
void ColumnString::updatePermutationWithCollation(const Collator & collator, bool reverse, size_t limit, int, Permutation &res, EqualRanges &equal_range) const
void ColumnString::updatePermutationWithCollation(const Collator & collator, bool reverse, size_t limit, int, Permutation & res, EqualRanges & equal_ranges) const
{
if (limit >= size() || limit >= equal_range.back().second)
if (equal_ranges.empty())
return;
if (limit >= size() || limit >= equal_ranges.back().second)
limit = 0;
size_t n = equal_range.size();
size_t number_of_ranges = equal_ranges.size();
if (limit)
--n;
--number_of_ranges;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
SCOPE_EXIT({equal_ranges = std::move(new_ranges);});
for (size_t i = 0; i < number_of_ranges; ++i)
{
const auto& [first, last] = equal_range[i];
const auto& [first, last] = equal_ranges[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, lessWithCollation<false>(*this, collator));
else
@ -566,16 +583,22 @@ void ColumnString::updatePermutationWithCollation(const Collator & collator, boo
}
if (last - new_first > 1)
new_ranges.emplace_back(new_first, last);
}
if (limit)
{
const auto& [first, last] = equal_range.back();
const auto & [first, last] = equal_ranges.back();
if (limit < first || limit > last)
return;
/// Since then we are working inside the interval.
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, lessWithCollation<false>(*this, collator));
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, lessWithCollation<true>(*this, collator));
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
@ -603,7 +626,6 @@ void ColumnString::updatePermutationWithCollation(const Collator & collator, boo
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
void ColumnString::protect()

View File

@ -344,15 +344,19 @@ void ColumnTuple::getPermutation(bool reverse, size_t limit, int nan_direction_h
}
}
void ColumnTuple::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
void ColumnTuple::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
for (const auto& column : columns)
{
column->updatePermutation(reverse, limit, nan_direction_hint, res, equal_range);
while (limit && !equal_range.empty() && limit <= equal_range.back().first)
equal_range.pop_back();
if (equal_ranges.empty())
return;
if (equal_range.empty())
for (const auto & column : columns)
{
column->updatePermutation(reverse, limit, nan_direction_hint, res, equal_ranges);
while (limit && !equal_ranges.empty() && limit <= equal_ranges.back().first)
equal_ranges.pop_back();
if (equal_ranges.empty())
break;
}
}

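ColumnTuple::updatePermutation above now also starts with the empty-ranges early return and then refines the permutation column by column: each element column sorts only inside the equal ranges left by the previous columns, ranges that begin at or past the limit are dropped, and the loop stops once nothing is left. A generic sketch of that loop, with the per-column sort abstracted into a callback:

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Permutation = std::vector<size_t>;
using EqualRanges = std::vector<std::pair<size_t, size_t>>;
using UpdatePermutation = std::function<void(Permutation &, EqualRanges &)>;

void refineByColumns(const std::vector<UpdatePermutation> & columns,
                     size_t limit, Permutation & res, EqualRanges & equal_ranges)
{
    if (equal_ranges.empty())
        return;
    for (const auto & update_permutation : columns)
    {
        update_permutation(res, equal_ranges);
        /// Ranges that start at or past the limit cannot affect the first `limit` rows.
        while (limit && !equal_ranges.empty() && limit <= equal_ranges.back().first)
            equal_ranges.pop_back();
        if (equal_ranges.empty())
            break;
    }
}
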
View File

@ -382,17 +382,20 @@ int ColumnUnique<ColumnType>::compareAt(size_t n, size_t m, const IColumn & rhs,
}
}
auto & column_unique = static_cast<const IColumnUnique &>(rhs);
const auto & column_unique = static_cast<const IColumnUnique &>(rhs);
return getNestedColumn()->compareAt(n, m, *column_unique.getNestedColumn(), nan_direction_hint);
}
template <typename ColumnType>
void ColumnUnique<ColumnType>::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
void ColumnUnique<ColumnType>::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
if (equal_ranges.empty())
return;
bool found_null_value_index = false;
for (size_t i = 0; i < equal_range.size() && !found_null_value_index; ++i)
for (size_t i = 0; i < equal_ranges.size() && !found_null_value_index; ++i)
{
auto& [first, last] = equal_range[i];
auto & [first, last] = equal_ranges[i];
for (auto j = first; j < last; ++j)
{
if (res[j] == getNullValueIndex())
@ -409,14 +412,14 @@ void ColumnUnique<ColumnType>::updatePermutation(bool reverse, size_t limit, int
}
if (last - first <= 1)
{
equal_range.erase(equal_range.begin() + i);
equal_ranges.erase(equal_ranges.begin() + i);
}
found_null_value_index = true;
break;
}
}
}
getNestedColumn()->updatePermutation(reverse, limit, nan_direction_hint, res, equal_range);
getNestedColumn()->updatePermutation(reverse, limit, nan_direction_hint, res, equal_ranges);
}
template <typename IndexType>

View File

@ -15,8 +15,9 @@
#include <Columns/ColumnsCommon.h>
#include <DataStreams/ColumnGathererStream.h>
#include <ext/bit_cast.h>
#include <ext/scope_guard.h>
#include <pdqsort.h>
#include <numeric>
#if !defined(ARCADIA_BUILD)
# include <Common/config.h>
@ -243,10 +244,14 @@ void ColumnVector<T>::getPermutation(bool reverse, size_t limit, int nan_directi
template <typename T>
void ColumnVector<T>::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
{
if (equal_range.empty())
return;
if (limit >= data.size() || limit >= equal_range.back().second)
limit = 0;
EqualRanges new_ranges;
SCOPE_EXIT({equal_range = std::move(new_ranges);});
for (size_t i = 0; i < equal_range.size() - bool(limit); ++i)
{
@ -275,6 +280,12 @@ void ColumnVector<T>::updatePermutation(bool reverse, size_t limit, int nan_dire
if (limit)
{
const auto & [first, last] = equal_range.back();
if (limit < first || limit > last)
return;
/// Since then, we are working inside the interval.
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, greater(*this, nan_direction_hint));
else
@ -307,7 +318,6 @@ void ColumnVector<T>::updatePermutation(bool reverse, size_t limit, int nan_dire
new_ranges.emplace_back(new_first, new_last);
}
}
equal_range = std::move(new_ranges);
}
template <typename T>

View File

@ -7,6 +7,7 @@
#include <common/unaligned.h>
#include <Core/Field.h>
#include <Core/BigInt.h>
#include <Common/assert_cast.h>
namespace DB
@ -130,7 +131,7 @@ public:
void insertFrom(const IColumn & src, size_t n) override
{
data.push_back(static_cast<const Self &>(src).getData()[n]);
data.push_back(assert_cast<const Self &>(src).getData()[n]);
}
void insertData(const char * pos, size_t) override
@ -205,14 +206,14 @@ public:
/// This method implemented in header because it could be possibly devirtualized.
int compareAt(size_t n, size_t m, const IColumn & rhs_, int nan_direction_hint) const override
{
return CompareHelper<T>::compare(data[n], static_cast<const Self &>(rhs_).data[m], nan_direction_hint);
return CompareHelper<T>::compare(data[n], assert_cast<const Self &>(rhs_).data[m], nan_direction_hint);
}
void compareColumn(const IColumn & rhs, size_t rhs_row_num,
PaddedPODArray<UInt64> * row_indexes, PaddedPODArray<Int8> & compare_results,
int direction, int nan_direction_hint) const override
{
return this->template doCompareColumn<Self>(static_cast<const Self &>(rhs), rhs_row_num, row_indexes,
return this->template doCompareColumn<Self>(assert_cast<const Self &>(rhs), rhs_row_num, row_indexes,
compare_results, direction, nan_direction_hint);
}

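The ColumnVector.h hunk swaps static_cast for assert_cast when downcasting the IColumn arguments. A hedged sketch of what such a checked downcast can look like: in debug builds it verifies the dynamic type, in release builds it degrades to a plain static_cast (the real helper in Common/assert_cast.h may differ in details, for example by throwing instead of asserting):

#include <cassert>
#include <type_traits>
#include <typeinfo>

/// Sketch of a checked reference downcast; `To` is expected to be a reference type.
template <typename To, typename From>
To checked_cast(From & from)
{
#ifndef NDEBUG
    /// In debug builds, verify the dynamic type really is the requested one.
    assert(typeid(from) == typeid(std::decay_t<To>));
#endif
    return static_cast<To>(from);
}
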
View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#include <Columns/ColumnVector.h>

View File

@ -2,6 +2,8 @@
LIBRARY()
ADDINCL(
contrib/libs/icu/common
contrib/libs/icu/i18n
contrib/libs/pdqsort
)

View File

@ -12,7 +12,7 @@
#endif
#include <ext/bit_cast.h>
#include <Core/Types.h>
#include <common/types.h>
#include <Core/Defines.h>
#include <Common/PODArray.h>
#include <Columns/ColumnsCommon.h>

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
namespace Poco::Util
{

View File

@ -1,6 +1,6 @@
#pragma once
#include <Core/Types.h>
#include <common/types.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

Some files were not shown because too many files have changed in this diff.