Merge remote-tracking branch 'origin/master' into HEAD

This commit is contained in:
Alexander Kuzmenkov 2020-05-26 09:07:01 +03:00
commit 9d25ae22b5
93 changed files with 1769 additions and 601 deletions

contrib/cppkafka vendored

@ -1 +1 @@
Subproject commit 9b184d881c15cc50784b28688c7c99d3d764db24
Subproject commit f555ee36aaa74d17ca0dab3ce472070a610b2966

View File

@ -1,6 +1,6 @@
---
toc_priority: 62
toc_title: Overview of ClickHouse Architecture
toc_title: Architecture Overview
---
# Overview of ClickHouse Architecture {#overview-of-clickhouse-architecture}

View File

@ -1,6 +1,6 @@
---
toc_priority: 63
toc_title: Browse Source Code
toc_priority: 71
toc_title: Source Code
---
# Browse ClickHouse Source Code {#browse-clickhouse-source-code}

View File

@ -1,19 +1,17 @@
---
toc_priority: 61
toc_title: The Beginner ClickHouse Developer Instruction
toc_title: For Beginners
---
# The Beginner ClickHouse Developer Instruction
Building of ClickHouse is supported on Linux, FreeBSD and Mac OS X.
# If You Use Windows {#if-you-use-windows}
If you use Windows, you need to create a virtual machine with Ubuntu. To start working with a virtual machine please install VirtualBox. You can download Ubuntu from the website: https://www.ubuntu.com/#download. Please create a virtual machine from the downloaded image (you should reserve at least 4GB of RAM for it). To run a command-line terminal in Ubuntu, please locate a program containing the word “terminal” in its name (gnome-terminal, konsole etc.) or just press Ctrl+Alt+T.
# If You Use a 32-bit System {#if-you-use-a-32-bit-system}
ClickHouse cannot work or be built on a 32-bit system. You should acquire access to a 64-bit system, and then you can continue reading.
# Creating a Repository on GitHub {#creating-a-repository-on-github}
## Creating a Repository on GitHub {#creating-a-repository-on-github}
To start working with ClickHouse repository you will need a GitHub account.
@ -33,7 +31,7 @@ To do that in Ubuntu you would run in the command line terminal:
A brief manual on using Git can be found here: https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf.
For a detailed manual on Git see https://git-scm.com/book/en/v2.
# Cloning a Repository to Your Development Machine {#cloning-a-repository-to-your-development-machine}
## Cloning a Repository to Your Development Machine {#cloning-a-repository-to-your-development-machine}
Next, you need to download the source files onto your working machine. This is called “to clone a repository” because it creates a local copy of the repository on your working machine.
@ -77,7 +75,7 @@ You can also add original ClickHouse repos address to your local repository t
After successfully running this command you will be able to pull updates from the main ClickHouse repo by running `git pull upstream master`.
## Working with Submodules {#working-with-submodules}
### Working with Submodules {#working-with-submodules}
Working with submodules in Git can be painful. The following commands will help to manage them:
@ -107,7 +105,7 @@ The next commands would help you to reset all submodules to the initial state (!
git submodule foreach git submodule foreach git reset --hard
git submodule foreach git submodule foreach git clean -xfd
# Build System {#build-system}
## Build System {#build-system}
ClickHouse uses CMake and Ninja for building.
@ -127,11 +125,11 @@ For installing CMake and Ninja on Mac OS X first install Homebrew and then insta
Next, check the version of CMake: `cmake --version`. If it is below 3.3, you should install a newer version from the website: https://cmake.org/download/.
# Optional External Libraries {#optional-external-libraries}
## Optional External Libraries {#optional-external-libraries}
ClickHouse uses several external libraries for building. None of them need to be installed separately, as they are built together with ClickHouse from the sources located in the submodules. You can check the list in `contrib`.
# C++ Compiler {#c-compiler}
## C++ Compiler {#c-compiler}
GCC starting from version 9 and Clang version 8 or above are supported for building ClickHouse.
@ -145,7 +143,7 @@ Mac OS X build is supported only for Clang. Just run `brew install llvm`
If you decide to use Clang, you can also install `libc++` and `lld`, if you know what it is. Using `ccache` is also recommended.
# The Building Process {#the-building-process}
## The Building Process {#the-building-process}
Now that you are ready to build ClickHouse, we recommend creating a separate directory `build` inside `ClickHouse` to contain all of the build artefacts:
@ -202,7 +200,7 @@ Upon successful build you get an executable file `ClickHouse/<build_dir>/program
ls -l programs/clickhouse
# Running the Built Executable of ClickHouse {#running-the-built-executable-of-clickhouse}
## Running the Built Executable of ClickHouse {#running-the-built-executable-of-clickhouse}
To run the server under the current user you need to navigate to `ClickHouse/programs/server/` (located outside of `build`) and run:
@ -229,7 +227,7 @@ You can also run your custom-built ClickHouse binary with the config file from t
sudo service clickhouse-server stop
sudo -u clickhouse ClickHouse/build/programs/clickhouse server --config-file /etc/clickhouse-server/config.xml
# IDE (Integrated Development Environment) {#ide-integrated-development-environment}
## IDE (Integrated Development Environment) {#ide-integrated-development-environment}
If you do not know which IDE to use, we recommend that you use CLion. CLion is commercial software, but it offers 30 days free trial period. It is also free of charge for students. CLion can be used both on Linux and on Mac OS X.
@ -239,7 +237,7 @@ As simple code editors, you can use Sublime Text or Visual Studio Code, or Kate
Just in case, it is worth mentioning that CLion creates the `build` path on its own, selects `debug` as the build type on its own, uses the version of CMake that is defined in CLion rather than the one installed by you, and finally runs build tasks with `make` instead of `ninja`. This is normal behaviour; just keep it in mind to avoid confusion.
# Writing Code {#writing-code}
## Writing Code {#writing-code}
The description of ClickHouse architecture can be found here: https://clickhouse.tech/docs/en/development/architecture/
@ -249,7 +247,7 @@ Writing tests: https://clickhouse.tech/docs/en/development/tests/
List of tasks: https://github.com/ClickHouse/ClickHouse/blob/master/tests/instructions/easy\_tasks\_sorted\_en.md
# Test Data {#test-data}
## Test Data {#test-data}
Developing ClickHouse often requires loading realistic datasets. It is particularly important for performance testing. We have a specially prepared set of anonymized data from Yandex.Metrica. It additionally requires some 3 GB of free disk space. Note that this data is not required to accomplish most of the development tasks.
@ -272,7 +270,7 @@ Developing ClickHouse often requires loading realistic datasets. It is particula
clickhouse-client --max_insert_block_size 100000 --query "INSERT INTO test.hits FORMAT TSV" < hits_v1.tsv
clickhouse-client --max_insert_block_size 100000 --query "INSERT INTO test.visits FORMAT TSV" < visits_v1.tsv
# Creating Pull Request {#creating-pull-request}
## Creating Pull Request {#creating-pull-request}
Navigate to your fork repository in GitHub's UI. If you have been developing in a branch, you need to select that branch. There will be a “Pull request” button on the screen. In essence, this means “create a request for accepting my changes into the main repository”.

View File

@ -1265,4 +1265,63 @@ Possible values:
Default value: 16.
## low_cardinality_max_dictionary_size {#low_cardinality_max_dictionary_size}
Sets a maximum size in rows of a shared global dictionary for the [LowCardinality](../../sql-reference/data-types/lowcardinality.md) data type that can be written to a storage file system. This setting prevents issues with RAM in case of unlimited dictionary growth. ClickHouse writes all the data that can't be encoded due to the maximum dictionary size limitation in an ordinary way.
Possible values:
- Any positive integer.
Default value: 8192.
## low_cardinality_use_single_dictionary_for_part {#low_cardinality_use_single_dictionary_for_part}
Turns the use of a single dictionary for a data part on or off.
By default, the ClickHouse server monitors the size of dictionaries, and if a dictionary overflows, the server starts to write the next one. To prohibit creating several dictionaries, set `low_cardinality_use_single_dictionary_for_part = 1`.
Possible values:
- 1 — Creating several dictionaries for the data part is prohibited.
- 0 — Creating several dictionaries for the data part is not prohibited.
Default value: 0.
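As a rough illustration, both settings can be adjusted per session with `SET`; the values below are arbitrary examples rather than recommendations:

```sql
-- Cap the shared dictionary at 50000 rows and keep a single dictionary per data part.
SET low_cardinality_max_dictionary_size = 50000;
SET low_cardinality_use_single_dictionary_for_part = 1;
```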
## low_cardinality_allow_in_native_format {#low_cardinality_allow_in_native_format}
Allows or restricts using the [LowCardinality](../../sql-reference/data-types/lowcardinality.md) data type with the [Native](../../interfaces/formats.md#native) format.
If usage of `LowCardinality` is restricted, the ClickHouse server converts `LowCardinality`-columns to ordinary ones for `SELECT` queries, and converts ordinary columns to `LowCardinality`-columns for `INSERT` queries.
This setting is required mainly for third-party clients which don't support the `LowCardinality` data type.
Possible values:
- 1 — Usage of `LowCardinality` is not restricted.
- 0 — Usage of `LowCardinality` is restricted.
Default value: 1.
## allow_suspicious_low_cardinality_types {#allow_suspicious_low_cardinality_types}
Allows or restricts using [LowCardinality](../../sql-reference/data-types/lowcardinality.md) with data types with a fixed size of 8 bytes or less: numeric data types and `FixedString(8_bytes_or_less)`.
For small fixed values, using `LowCardinality` is usually inefficient, because ClickHouse stores a numeric index for each row. As a result:
- Disk space usage can rise.
- RAM consumption can be higher, depending on the dictionary size.
- Some functions can work slower due to extra encoding/decoding operations.
Merge times in [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md)-engine tables can grow due to all the reasons described above.
Possible values:
- 1 — Usage of `LowCardinality` is not restricted.
- 0 — Usage of `LowCardinality` is restricted.
Default value: 0.
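As with the other `LowCardinality` settings, a minimal sketch of changing these two per session (the values are illustrative only):

```sql
-- Keep LowCardinality columns in the Native format (the default) and forbid suspicious LowCardinality types.
SET low_cardinality_allow_in_native_format = 1;
SET allow_suspicious_low_cardinality_types = 0;
```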
[Original article](https://clickhouse.tech/docs/en/operations/settings/settings/) <!-- hide -->

View File

@ -5,11 +5,41 @@ toc_title: System Tables
# System Tables {#system-tables}
System tables are used for implementing part of the system's functionality, and for providing access to information about how the system is working.
You can't delete a system table (but you can perform DETACH).
System tables don't have files with data on the disk or files with metadata. The server creates all the system tables when it starts.
System tables are read-only.
They are located in the `system` database.
## Introduction
System tables provide information about:
- Server states, processes, and environment.
- Server's internal processes.
System tables:
- Located in the `system` database.
- Available only for reading data.
- Can't be dropped or altered, but can be detached.
The `metric_log`, `query_log`, `query_thread_log`, and `trace_log` system tables store data in the storage file system. Other system tables store their data in RAM. The ClickHouse server creates such system tables at startup.
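System tables are queried like ordinary tables; a minimal sketch:

```sql
-- List the system tables and peek at a few of the current metrics.
SHOW TABLES FROM system;
SELECT metric, value FROM system.metrics LIMIT 5;
```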
### Sources of System Metrics
For collecting system metrics ClickHouse server uses:
- `CAP_NET_ADMIN` capability.
- [procfs](https://en.wikipedia.org/wiki/Procfs) (only in Linux).
**procfs**
If the ClickHouse server doesn't have the `CAP_NET_ADMIN` capability, it tries to fall back to `ProcfsMetricsProvider`. `ProcfsMetricsProvider` allows collecting per-query system metrics (for CPU and I/O).
If procfs is supported and enabled on the system, ClickHouse server collects these metrics:
- `OSCPUVirtualTimeMicroseconds`
- `OSCPUWaitMicroseconds`
- `OSIOWaitMicroseconds`
- `OSReadChars`
- `OSWriteChars`
- `OSReadBytes`
- `OSWriteBytes`
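One possible way to inspect these per-query metrics, assuming the `ProfileEvents.Names`/`ProfileEvents.Values` nested columns of `system.query_log` (the exact layout may differ between versions):

```sql
SELECT
    ProfileEvents.Names AS metric,
    ProfileEvents.Values AS value
FROM system.query_log
ARRAY JOIN ProfileEvents
WHERE type = 'QueryFinish' AND metric LIKE 'OS%'
ORDER BY event_time DESC
LIMIT 10;
```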
## system.asynchronous\_metrics {#system_tables-asynchronous_metrics}

View File

@ -116,7 +116,7 @@ Check:
Check:
- The [tcp\_port\_secure](server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) setting.
- Settings for [SSL sertificates](server-configuration-parameters/settings.md#server_configuration_parameters-openssl).
- Settings for [SSL certificates](server-configuration-parameters/settings.md#server_configuration_parameters-openssl).
Use proper parameters while connecting. For example, use the `port_secure` parameter with `clickhouse_client`.

View File

@ -1,5 +1,5 @@
---
toc_priority: 52
toc_priority: 53
toc_title: AggregateFunction
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 51
toc_priority: 52
toc_title: Array(T)
---

View File

@ -0,0 +1,59 @@
---
toc_priority: 51
toc_title: LowCardinality
---
# LowCardinality Data Type {#lowcardinality-data-type}
Changes the internal representation of other data types to be dictionary-encoded.
## Syntax {#lowcardinality-syntax}
```sql
LowCardinality(data_type)
```
**Parameters**
- `data_type` — [String](string.md), [FixedString](fixedstring.md), [Date](date.md), [DateTime](datetime.md), and numbers excepting [Decimal](decimal.md). `LowCardinality` is not efficient for some data types, see the [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types) setting description.
## Description {#lowcardinality-dscr}
`LowCardinality` is a superstructure that changes a data storage method and rules of data processing. ClickHouse applies [dictionary coding](https://en.wikipedia.org/wiki/Dictionary_coder) to `LowCardinality`-columns. Operating with dictionary encoded data significantly increases performance of [SELECT](../statements/select/index.md) queries for many applications.
The efficiency of using the `LowCardinality` data type depends on data diversity. If a dictionary contains fewer than 10,000 distinct values, ClickHouse mostly shows higher efficiency of data reading and storing. If a dictionary contains more than 100,000 distinct values, ClickHouse can perform worse in comparison with using ordinary data types.
Consider using `LowCardinality` instead of [Enum](enum.md) when working with strings. `LowCardinality` provides more flexibility in use and often reveals the same or higher efficiency.
## Example
Create a table with a `LowCardinality`-column:
```sql
CREATE TABLE lc_t
(
`id` UInt16,
`strings` LowCardinality(String)
)
ENGINE = MergeTree()
ORDER BY id
```
## Related Settings and Functions
Settings:
- [low_cardinality_max_dictionary_size](../../operations/settings/settings.md#low_cardinality_max_dictionary_size)
- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part)
- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format)
- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types)
Functions:
- [toLowCardinality](../functions/type-conversion-functions.md#tolowcardinality)
## See Also
- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
- [Reducing Clickhouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).

View File

@ -1,5 +1,5 @@
---
toc_priority: 54
toc_priority: 55
toc_title: Nullable
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 53
toc_priority: 54
toc_title: Tuple(T1, T2, ...)
---

View File

@ -516,7 +516,7 @@ Result:
**See Also**
- \[ISO 8601 announcement by @xkcd\](https://xkcd.com/1179/)
- [ISO 8601 announcement by @xkcd](https://xkcd.com/1179/)
- [RFC 1123](https://tools.ietf.org/html/rfc1123)
- [toDate](#todate)
- [toDateTime](#todatetime)
@ -529,4 +529,43 @@ Same as for [parseDateTimeBestEffort](#parsedatetimebesteffort) except that it r
Same as for [parseDateTimeBestEffort](#parsedatetimebesteffort) except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
## toLowCardinality {#tolowcardinality}
Converts an input parameter to the [LowCardinality](../data-types/lowcardinality.md) version of the same data type.
To convert data from the `LowCardinality` data type use the [CAST](#type_conversion_function-cast) function. For example, `CAST(x as String)`.
**Syntax**
```sql
toLowCardinality(expr)
```
**Parameters**
- `expr` — [Expression](../syntax.md#syntax-expressions) resulting in one of the [supported data types](../data-types/index.md#data_types).
**Returned values**
- Result of `expr`.
Type: `LowCardinality(expr_result_type)`
**Example**
Query:
```sql
SELECT toLowCardinality('1')
```
Result:
```text
┌─toLowCardinality('1')─┐
│ 1 │
└───────────────────────┘
```
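To go the other way, the `CAST` function mentioned above converts a `LowCardinality` value back to its ordinary type; a minimal sketch:

```sql
SELECT toTypeName(CAST(toLowCardinality('1') AS String));
```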
[Original article](https://clickhouse.tech/docs/en/query_language/functions/type_conversion_functions/) <!--hide-->

View File

@ -1,101 +1,146 @@
# Access Rights {#access-rights}
---
toc_priority: 48
toc_title: "Access Rights and Account Management"
---
Users and access rights are set up in the user config. This is usually `users.xml`.
# Access Rights and Account Management {#access-rights}
ClickHouse supports access control management based on the [RBAC](https://en.wikipedia.org/wiki/Role-based_access_control) approach.
Users are recorded in the `users` section. Here is a fragment of the `users.xml` file:
ClickHouse access entities include:
- [User accounts](#user-account-management)
- [Roles](#role-management)
- [Row policies](#row-policy-management)
- [Settings profiles](#settings-profiles-management)
- [Quotas](#quotas-management)
``` xml
<!-- Users and ACL. -->
<users>
<!-- If the user name is not specified, the 'default' user is used. -->
<default>
<!-- Password could be specified in plaintext or in SHA256 (in hex format).
You can configure access entities using:
If you want to specify password in plaintext (not recommended), place it in 'password' element.
Example: <password>qwerty</password>.
Password could be empty.
- SQL-driven workflow.
If you want to specify SHA256, place it in 'password_sha256_hex' element.
Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
You need to [enable](#enabling-access-control) this functionality.
How to generate decent password:
Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
In first line will be password and in second - corresponding SHA256.
-->
<password></password>
- Server [configuration files](configuration-files.md) `users.xml` and `config.xml`.
<!-- A list of networks that access is allowed from.
Each list item has one of the following forms:
<ip> The IP address or subnet mask. For example: 198.51.100.0/24 or 2001:DB8::/32.
<host> Host name. For example: example01. A DNS query is made for verification, and all addresses obtained are compared with the address of the customer.
<host_regexp> Regular expression for host names. For example, ^example\d\d-\d\d-\d\.yandex\.ru$
To check it, a DNS PTR request is made for the client's address and a regular expression is applied to the result.
Then another DNS query is made for the result of the PTR query, and all received address are compared to the client address.
We strongly recommend that the regex ends with \.yandex\.ru$.
We recommend using the SQL-driven workflow. Both configuration methods work simultaneously, so if you are managing permissions and accounts with the server configuration method, you can smoothly switch to the SQL-driven workflow.
If you are installing ClickHouse yourself, specify here:
<networks>
<ip>::/0</ip>
</networks>
-->
<networks incl="networks" />
!!! note "Warning"
    You can't manage the same access entity with both configuration methods at the same time.
<!-- Settings profile for the user. -->
<profile>default</profile>
<!-- Quota for the user. -->
<quota>default</quota>
</default>
## Usage {#access-control-usage}
<!-- For requests from the Yandex.Metrica user interface via the API for data on specific counters. -->
<web>
<password></password>
<networks incl="networks" />
<profile>web</profile>
<quota>default</quota>
<allow_databases>
<database>test</database>
</allow_databases>
</web>
```
By default, ClickHouse provides a `default` account that has all privileges but cannot use SQL-driven access control and account management. The `default` account is used whenever the username is not defined, for example, at login from a client or in distributed queries. In distributed query processing, the default account is used if the configuration of the server or cluster does not specify the [username and password](../engines/table-engines/special/distributed.md).
You can see declarations of two users: `default` and `web`. We added the `web` user separately.
If you have just started using ClickHouse, consider the following scenario:
The `default` user is chosen when the username is not passed. The `default` user is also used for distributed query processing if the configuration of the server or cluster does not specify the `user` and `password` (see the section on the [Distributed](../engines/table-engines/special/distributed.md) engine).
1. [Enable](#enabling-access-control) SQL-driven access control and account management for the `default` user.
2. Log in as the `default` user and create all the required users. Don't forget to create an administrator account (`GRANT ALL ON *.* WITH GRANT OPTION TO admin_user_account`).
3. [Restrict](settings/permissions-for-queries.md#permissions_for_queries) permissions for the `default` user and disable SQL-driven access control and account management for it.
The user that is used for exchanging information between servers combined in a cluster must not have substantial restrictions or quotas; otherwise, distributed queries will fail.
### Properties of the Current Solution {#access-control-properties}
Passwords are specified in plain text (not recommended) or as SHA-256 hashes. The hash is not salted. In this respect, you should not consider these passwords as providing security against potential malicious attacks. Rather, they are necessary for protection from employees.
- You can grant permissions even for databases and tables that do not exist.
- If a table is deleted, the privileges associated with it are not revoked. This means that if you later create a table with the same name, all the privileges remain valid. To revoke the privileges associated with a deleted table, run the `REVOKE ALL PRIVILEGES ON db.table FROM ALL` query.
- Privileges have no expiration time.
Specify the list of networks from which access is allowed. In this example, the list of networks for both users is loaded from a separate file (`/etc/metrika.xml`) that contains the `networks` substitution. Here is a fragment of it:
## User Accounts {#user-account-management}
``` xml
<yandex>
...
<networks>
<ip>::/64</ip>
<ip>203.0.113.0/24</ip>
<ip>2001:DB8::/32</ip>
...
</networks>
</yandex>
```
A user account is an access entity that allows authorized operations in ClickHouse. A user account contains:
You can define this list of networks directly in `users.xml`, or in a file in the `users.d` directory (for more information, see the section «[Configuration Files](configuration-files.md#configuration_files)»).
- Identification information.
- [Privileges](../sql-reference/statements/grant.md#grant-privileges) that define the scope of queries the user can run.
- Hosts that are allowed to connect to ClickHouse.
- Assigned and default roles.
- Settings that apply by default at user login.
- Assigned settings profiles.
The config includes comments explaining how to open access from everywhere.
Privileges can be granted to a user account with the [GRANT](../sql-reference/statements/grant.md) query or by assigning [roles](#role-management). To revoke privileges, use the [REVOKE](../sql-reference/statements/revoke.md) query. To list all the privileges of a user, use the [SHOW GRANTS](../sql-reference/statements/show.md#show-grants-statement) statement (see the sketch after the list of management queries below).
For use in production, specify only `ip` elements (IP addresses and their masks), since using `host` and `host_regexp` may cause extra latency.
Management queries:
Next, the user settings profile is specified (see the section «[Settings Profiles](settings/settings-profiles.md)»). You can specify the default profile, `default`. The profile can have any name. You can specify the same profile for different users. The most important thing you can write in a settings profile is `readonly=1`, which ensures read-only access.
Then specify the quota to be used (see the section «[Quotas](quotas.md#quotas)»). You can specify the default quota, `default`. It is set in the config by default to only count resource usage, without restricting it. The quota can have any name. You can specify the same quota for different users; in this case, resource usage is calculated for each user individually.
- [CREATE USER](../sql-reference/statements/create.md#create-user-statement)
- [ALTER USER](../sql-reference/statements/alter.md#alter-user-statement)
- [DROP USER](../sql-reference/statements/misc.md#drop-user-statement)
- [SHOW CREATE USER](../sql-reference/statements/show.md#show-create-user-statement)
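A hedged sketch of this workflow; `admin_user_account` is the illustrative name used earlier, and the password is a placeholder:

```sql
CREATE USER admin_user_account IDENTIFIED WITH sha256_password BY 'strong_password';
GRANT ALL ON *.* TO admin_user_account WITH GRANT OPTION;
SHOW GRANTS FOR admin_user_account;
```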
In the optional `<allow_databases>` section, you can also specify a list of databases that the user can access. By default, all databases are available to the user. You can specify the `default` database; in this case, the user will receive access to the database by default.
### Settings Applying Rules {#access-control-settings-applying}
Access to the `system` database is always allowed (since this database is used for processing queries).
Settings for a user account can be configured in multiple ways: through the account itself, through its assigned roles, and through settings profiles. At user login, if a setting is configured in several different access entities, it is applied according to the following rules (from highest to lowest priority):
The user can get a list of all databases and tables in them by using `SHOW` queries or system tables, even if access to individual databases isn't allowed.
1. User account settings.
2. Settings of the default roles of the user account. If a setting is configured in several roles, the order in which it is applied is undefined.
3. Settings from settings profiles assigned to the user or to their default roles. If a setting is configured in several profiles, the order in which it is applied is undefined.
4. Settings applied to the whole server by default, or settings from the [default profile](server-configuration-parameters/settings.md#default-profile).
Database access is not related to the [readonly](settings/permissions-for-queries.md#settings_readonly) setting. You can't grant full access to one database and `readonly` access to another.
[Original article](https://clickhouse.tech/docs/en/operations/access_rights/) <!--hide-->
## Roles {#role-management}
A role is a container of access entities that can be granted to a user account.
A role contains:
- [Privileges](../sql-reference/statements/grant.md#grant-privileges)
- Settings and constraints
- A list of assigned roles
Management queries:
- [CREATE ROLE](../sql-reference/statements/create.md#create-role-statement)
- [ALTER ROLE](../sql-reference/statements/alter.md#alter-role-statement)
- [DROP ROLE](../sql-reference/statements/misc.md#drop-role-statement)
- [SET ROLE](../sql-reference/statements/misc.md#set-role-statement)
- [SET DEFAULT ROLE](../sql-reference/statements/misc.md#set-default-role-statement)
- [SHOW CREATE ROLE](../sql-reference/statements/show.md#show-create-role-statement)
Privileges can be granted to a role with the [GRANT](../sql-reference/statements/grant.md) query. To revoke privileges from a role, use the [REVOKE](../sql-reference/statements/revoke.md) query.
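A hedged sketch of role management; the role, database, and account names are made up for illustration:

```sql
CREATE ROLE accountant;
GRANT SELECT ON accounting.* TO accountant;
GRANT accountant TO admin_user_account;
```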
## Row Policies {#row-policy-management}
A row policy is a filter that defines which rows are available to an account or a role. A row policy contains a filter for one particular table, as well as a list of the accounts and roles that should use this policy (see the sketch after the list of management queries below).
Management queries:
- [CREATE ROW POLICY](../sql-reference/statements/create.md#create-row-policy-statement)
- [ALTER ROW POLICY](../sql-reference/statements/alter.md#alter-row-policy-statement)
- [DROP ROW POLICY](../sql-reference/statements/misc.md#drop-row-policy-statement)
- [SHOW CREATE ROW POLICY](../sql-reference/statements/show.md#show-create-row-policy-statement)
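As referenced above, a possible row policy, assuming a hypothetical table `accounting.payments` with a `user_id` column and the `accountant` role from the previous sketch:

```sql
CREATE ROW POLICY visible_rows ON accounting.payments
FOR SELECT USING user_id = 42 TO accountant;
```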
## Settings Profiles {#settings-profiles-management}
A settings profile is a collection of [settings](settings/index.md). A settings profile contains settings and constraints, as well as a list of the roles and accounts to which the profile is applied.
Management queries:
- [CREATE SETTINGS PROFILE](../sql-reference/statements/create.md#create-settings-profile-statement)
- [ALTER SETTINGS PROFILE](../sql-reference/statements/alter.md#alter-settings-profile-statement)
- [DROP SETTINGS PROFILE](../sql-reference/statements/misc.md#drop-settings-profile-statement)
- [SHOW CREATE SETTINGS PROFILE](../sql-reference/statements/show.md#show-create-settings-profile-statement)
## Quotas {#quotas-management}
A quota limits resource usage. See [Quotas](quotas.md).
A quota contains a set of limits for certain durations, as well as a list of the accounts and roles that should use this quota.
Management queries:
- [CREATE QUOTA](../sql-reference/statements/create.md#create-quota-statement)
- [ALTER QUOTA](../sql-reference/statements/alter.md#alter-quota-statement)
- [DROP QUOTA](../sql-reference/statements/misc.md#drop-quota-statement)
- [SHOW CREATE QUOTA](../sql-reference/statements/show.md#show-create-quota-statement)
## Enabling SQL-driven Access Control and Account Management {#enabling-access-control}
- Set up a directory for configuration storage.
ClickHouse stores access entity configurations in the folder set by the [access control path](server-configuration-parameters/settings.md#access_control_path) server configuration parameter.
- Enable SQL-driven access control and account management for at least one user account.
By default, SQL-driven access control and account management is disabled for all users. You need to configure at least one user in `users.xml` and set the value of the [access_management](settings/settings-users.md#access_management-user-setting) setting to 1.
[Original article](https://clickhouse.tech/docs/en/operations/access_rights/) <!--hide-->

View File

@ -1,3 +1,13 @@
---
toc_priority: 43
toc_title: "Operations"
---
# Operations {#operations}
The ClickHouse operations manual consists of the following main sections:
- Requirements
[Original article](https://clickhouse.tech/docs/en/operations/) <!--hide-->

View File

@ -1,3 +1,8 @@
---
toc_priority: 45
toc_title: "Monitoring"
---
# Monitoring {#jian-kong}
You can monitor:
@ -21,17 +26,17 @@ ClickHouse does not monitor the state of hardware resources by itself.
The ClickHouse server has embedded instruments for self-state monitoring.
To track server events, use server logs. See the \[logger\]server\_settings/settings.md\#server\_settings-logger section of the configuration file.
To track server events, use server logs. See the [logger](server-configuration-parameters/settings.md#server_configuration_parameters-logger) section of the configuration file.
ClickHouse collects:
- Different metrics of how the server uses computational resources.
- Common statistics on query processing.
You can find all the metrics in the [system.metrics](system-tables.md#system_tables-metrics), [system.events](system-tables.md#system_tables-events), and [system.asynchronous_metrics](system-tables.md#system_tables-asynchronous_metrics) system tables.
You can find all the metrics in the [system.metrics](system-tables.md#system_tables-metrics), [system.events](system-tables.md#system_tables-events), and [system.asynchronous_metrics](system-tables.md#system_tables-asynchronous_metrics) system tables.
You can configure ClickHouse to export metrics to [Graphite](https://github.com/graphite-project). See the [Graphite section](server-configuration-parameters/settings.md#server_configuration_parameters-graphite) of the configuration file. Before configuring the export of metrics, set up Graphite by following its [official guide](https://graphite.readthedocs.io/en/latest/install.html).
Additionally, you can monitor server availability through the HTTP API. Send an HTTP GET request to `/ping`. If the server is available, it responds with `200 OK`.
To monitor servers in a cluster configuration, set the [max_replica_delay_for_distributed_queries](settings/settings.md#settings-max_replica_delay_for_distributed_queries) parameter and use the HTTP resource `/replicas_status`. A request to `/replicas_status` returns `200 OK` if the replica is available and is not lagging behind the other replicas. If a replica lags, the request returns `503 HTTP_SERVICE_UNAVAILABLE` with information about the size of the lag.
To monitor servers in a cluster configuration, set the [max_replica_delay_for_distributed_queries](settings/settings.md#settings-max_replica_delay_for_distributed_queries) parameter and use the HTTP resource `/replicas_status`. A request to `/replicas_status` returns `200 OK` if the replica is available and is not lagging behind the other replicas. If a replica lags, the request returns `503 HTTP_SERVICE_UNAVAILABLE` with information about the size of the lag.

View File

@ -1,8 +1,6 @@
---
machine_translated: true
machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
toc_priority: 44
toc_title: "\u8981\u6C42"
toc_title: "Requirements"
---
# Requirements {#requirements}
@ -13,16 +11,16 @@ toc_title: "\u8981\u6C42"
ClickHouse implements parallel data processing and uses all available hardware resources. When choosing a processor, take into account that ClickHouse works more efficiently in configurations with a large number of cores but a lower clock rate than in configurations with fewer cores and a higher clock rate. For example, 16 cores with 2600 MHz is preferable to 8 cores with 3600 MHz.
Use of **turbocharging** and **hyper-threading** technologies is recommended. They significantly improve performance with a typical workload.
Use of **Turbo Boost** and **hyper-threading** technologies is recommended. They significantly improve performance with a typical workload.
## RAM {#ram}
We recommend using a minimum of 4 GB of RAM to perform non-trivial queries. The ClickHouse server can run with a much smaller amount of RAM, but it requires memory for processing queries.
We recommend using a minimum of 4 GB of RAM to perform significant queries. The ClickHouse server can run with a much smaller amount of RAM, but it requires memory for processing queries.
The required volume of RAM depends on:
- The complexity of queries.
- The amount of data that is processed in queries.
- The amount of data that is processed in queries.
To calculate the required volume of RAM, you should estimate the size of temporary data for [GROUP BY](../sql-reference/statements/select/group-by.md#select-group-by-clause), [DISTINCT](../sql-reference/statements/select/distinct.md#select-distinct), [JOIN](../sql-reference/statements/select/join.md#select-join), and the other operations you use.
@ -42,20 +40,20 @@ ClickHouse can use external memory to store temporary data. See [
You can take a sample of the data and get the average row size from it. Then multiply the value by the number of rows you plan to store.
- The data compression coefficient.
- The data compression coefficient.
To estimate the data compression coefficient, load a sample of your data into ClickHouse and compare the actual size of the data with the size of the stored table. For example, clickstream data is usually compressed 6-10 times.
To estimate the data compression coefficient, load a sample of your data into ClickHouse and compare the actual size of the data with the size of the stored table. For example, clickstream data is usually compressed 6-10 times.
To calculate the final volume of data to be stored, apply the compression coefficient to the estimated data volume. If you plan to store data in several replicas, multiply the estimated volume by the number of replicas.
To calculate the final volume of data to be stored, apply the compression coefficient to the estimated data volume. If you plan to store data in several replicas, multiply the estimated volume by the number of replicas.
## Network {#network}
If possible, use networks of 10G or a higher class.
Network bandwidth is critical for processing distributed queries with a large amount of intermediate data. Besides, network speed affects replication processes.
Network bandwidth is critical for processing distributed queries with a large amount of intermediate result data. Besides, network speed affects replication processes.
## Software {#software}
ClickHouse is developed primarily for the Linux family of operating systems. The recommended Linux distribution is Ubuntu. The `tzdata` package should be installed on the system.
ClickHouse is developed primarily for the Linux family of operating systems. The recommended Linux distribution is Ubuntu. The `tzdata` package should be installed on the system.
ClickHouse can also work in other operating system families. See details in the [Getting Started](../getting-started/index.md) section of the documentation.

View File

@ -1,23 +1,21 @@
---
machine_translated: true
machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
toc_priority: 46
toc_title: "\u7591\u96BE\u89E3\u7B54"
toc_title: "Common Issues"
---
# Troubleshooting {#troubleshooting}
# Common Issues {#troubleshooting}
- [Installation method](#troubleshooting-installation-errors)
- [Installation](#troubleshooting-installation-errors)
- [Connecting to the server](#troubleshooting-accepts-no-connections)
- [Query processing](#troubleshooting-does-not-process-queries)
- [Efficiency of query processing](#troubleshooting-too-slow)
## Installation Method {#troubleshooting-installation-errors}
## Installation {#troubleshooting-installation-errors}
### You Cannot Get Deb Packages from the ClickHouse Repository with Apt-get {#you-cannot-get-deb-packages-from-clickhouse-repository-with-apt-get}
- Check firewall settings.
- If you cannot access the repository for any reason, download the packages as described in the [Getting Started](../getting-started/index.md) article and install them manually using the `sudo dpkg -i <packages>` command. You will also need the `tzdata` package.
- If you cannot access the repository for any reason, download the packages as described in [Getting Started](../getting-started/index.md) and install them manually with the `sudo dpkg -i <packages>` command. In addition, you will need the `tzdata` package.
## Connecting to the Server {#troubleshooting-accepts-no-connections}
@ -44,7 +42,7 @@ $ sudo service clickhouse-server start
**Check the logs**
The main log of `clickhouse-server` is in `/var/log/clickhouse-server/clickhouse-server.log` by default.
By default, the main log of `clickhouse-server` is in `/var/log/clickhouse-server/clickhouse-server.log`.
If the server started successfully, you should see the strings:
@ -57,13 +55,13 @@ $ sudo service clickhouse-server start
2019.01.11 15:23:25.549505 [ 45 ] {} <Error> ExternalDictionaries: Failed reloading 'event2id' external dictionary: Poco::Exception. Code: 1000, e.code() = 111, e.displayText() = Connection refused, e.what() = Connection refused
```
If you don't see an error at the end of the file, look through the entire file starting from the string:
If you don't see an error at the end of the file, look through the entire file starting from the following string:
``` text
<Information> Application: starting up.
```
If you try to start a second instance of `clickhouse-server` on the server, you will see the following log:
If you try to start a second instance of `clickhouse-server` on the server, you will see the following log:
``` text
2019.01.11 15:25:11.151730 [ 1 ] {} <Information> : Starting ClickHouse 19.1.0 with revision 54413
@ -79,9 +77,9 @@ Revision: 54413
2019.01.11 15:25:11.156716 [ 2 ] {} <Information> BaseDaemon: Stop SignalListener thread
```
**See the system.d logs**
**Check the system logs**
If you didn't find any useful information in the `clickhouse-server` logs, or there aren't any logs, you can view the `system.d` logs using the command:
If you didn't find any useful information in the `clickhouse-server` logs, or there aren't any logs at all, you can view the `system.d` logs using the command:
``` bash
$ sudo journalctl -u clickhouse-server
@ -99,9 +97,9 @@ $ sudo -u clickhouse /usr/bin/clickhouse-server --config-file /etc/clickhouse-se
Check:
- Docker settings.
- Docker settings.
If you run ClickHouse in Docker in an IPv6 network, make sure that `network=host` is set.
If you run ClickHouse in Docker in an IPv6 network, make sure that `network=host` is set.
- Endpoint settings.
@ -117,10 +115,10 @@ $ sudo -u clickhouse /usr/bin/clickhouse-server --config-file /etc/clickhouse-se
Check:
- The [tcp\_port\_secure](server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) setting.
- Settings for [SSL certificates](server-configuration-parameters/settings.md#server_configuration_parameters-openssl).
- The [tcp\_port\_secure](server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) setting.
- The [SSL certificates](server-configuration-parameters/settings.md#server_configuration_parameters-openssl) settings.
Use the proper parameters while connecting. For example, use the `port_secure` parameter with `clickhouse_client`.
Use the proper parameters while connecting. For example, use the `port_secure` parameter when using `clickhouse_client`.
- User settings.
@ -135,7 +133,7 @@ $ curl 'http://localhost:8123/' --data-binary "SELECT a"
Code: 47, e.displayText() = DB::Exception: Unknown identifier: a. Note that there are no tables (FROM clause) in your query, context: required_names: 'a' source_tables: table_aliases: private_aliases: column_aliases: public_columns: 'a' masked_columns: array_join_columns: source_columns: , e.what() = DB::Exception
```
If you start `clickhouse-client` with the `stack-trace` parameter, ClickHouse returns the server stack trace with a description of the error.
If you set the `stack-trace` parameter when using `clickhouse-client`, ClickHouse returns the server stack trace with a description of the error.
You might see a message about a broken connection. In this case, you can repeat the query. If the connection breaks every time you perform the query, check the server logs for errors.

View File

@ -1,11 +1,9 @@
---
machine_translated: true
machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
toc_priority: 47
toc_title: "\u70B9\u51FB\u66F4\u65B0"
toc_title: "Update"
---
# ClickHouse Update {#clickhouse-update}
# Update {#clickhouse-update}
If you installed ClickHouse from deb packages, execute the following commands on the server:
@ -15,6 +13,6 @@ $ sudo apt-get install clickhouse-client clickhouse-server
$ sudo service clickhouse-server restart
```
If you installed ClickHouse using anything other than the recommended deb packages, use the appropriate update method.
If you installed ClickHouse using a method other than the recommended deb packages, use the appropriate update method.
ClickHouse does not support distributed updates. The operation should be performed consecutively on each separate server. Do not update all the servers in a cluster at the same time, otherwise the cluster will be unavailable for some time.

View File

@ -207,7 +207,7 @@ if (TARGET clickhouse-server AND TARGET copy-headers)
endif ()
if (ENABLE_TESTS AND USE_GTEST)
set (CLICKHOUSE_ALL_TESTS_TARGETS local_date_time_comparison unit_tests_libcommon unit_tests_dbms hashing_write_buffer hashing_read_buffer in_join_subqueries_preprocessor expression_analyzer)
set (CLICKHOUSE_ALL_TESTS_TARGETS local_date_time_comparison unit_tests_libcommon unit_tests_dbms hashing_write_buffer hashing_read_buffer in_join_subqueries_preprocessor)
add_custom_target (clickhouse-tests ALL DEPENDS ${CLICKHOUSE_ALL_TESTS_TARGETS})
add_dependencies(clickhouse-bundle clickhouse-tests)
endif()

View File

@ -549,6 +549,8 @@ void ColumnAggregateFunction::getPermutation(bool /*reverse*/, size_t /*limit*/,
res[i] = i;
}
void ColumnAggregateFunction::updatePermutation(bool, size_t, int, Permutation &, EqualRanges&) const {}
void ColumnAggregateFunction::gather(ColumnGathererStream & gatherer)
{
gatherer.gather(*this);

View File

@ -193,6 +193,7 @@ public:
}
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int, Permutation & res, EqualRanges & equal_range) const override;
/** More efficient manipulation methods */
Container & getData()

View File

@ -737,6 +737,76 @@ void ColumnArray::getPermutation(bool reverse, size_t limit, int nan_direction_h
}
}
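/// Refines an already computed permutation: inside every equal range produced by the previous
/// sort key, rows of this column are sorted and the range is split into new, smaller ranges of
/// equal rows. When a limit is given, only the last range is partially sorted up to the limit,
/// and rows beyond the limit that are equal to the last group before the limit are gathered next to it.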
void ColumnArray::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res, EqualRanges & equal_range) const
{
if (limit >= size() || limit >= equal_range.back().second)
limit = 0;
size_t n = equal_range.size();
if (limit)
--n;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
{
const auto& [first, last] = equal_range[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, Less<false>(*this, nan_direction_hint));
else
std::sort(res.begin() + first, res.begin() + last, Less<true>(*this, nan_direction_hint));
auto new_first = first;
for (auto j = first + 1; j < last; ++j)
{
if (compareAt(res[new_first], res[j], *this, nan_direction_hint) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
if (last - new_first > 1)
new_ranges.emplace_back(new_first, last);
}
if (limit)
{
const auto& [first, last] = equal_range.back();
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, Less<false>(*this, nan_direction_hint));
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, Less<true>(*this, nan_direction_hint));
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
if (compareAt(res[new_first], res[j], *this, nan_direction_hint) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
auto new_last = limit;
for (auto j = limit; j < last; ++j)
{
if (compareAt(res[new_first], res[j], *this, nan_direction_hint) == 0)
{
std::swap(res[new_last], res[j]);
++new_last;
}
}
if (new_last - new_first > 1)
{
new_ranges.emplace_back(new_first, new_last);
}
}
equal_range = std::move(new_ranges);
}
ColumnPtr ColumnArray::replicate(const Offsets & replicate_offsets) const
{

View File

@ -73,6 +73,7 @@ public:
template <typename Type> ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const;
int compareAt(size_t n, size_t m, const IColumn & rhs_, int nan_direction_hint) const override;
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res, EqualRanges & equal_range) const override;
void reserve(size_t n) override;
size_t byteSize() const override;
size_t allocatedBytes() const override;

View File

@ -120,6 +120,8 @@ void ColumnConst::getPermutation(bool /*reverse*/, size_t /*limit*/, int /*nan_d
res[i] = i;
}
void ColumnConst::updatePermutation(bool, size_t, int, Permutation &, EqualRanges &) const {}
void ColumnConst::updateWeakHash32(WeakHash32 & hash) const
{
if (hash.getData().size() != s)

View File

@ -170,6 +170,7 @@ public:
ColumnPtr permute(const Permutation & perm, size_t limit) const override;
ColumnPtr index(const IColumn & indexes, size_t limit) const override;
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res, EqualRanges & equal_range) const override;
size_t byteSize() const override
{

View File

@ -108,6 +108,76 @@ void ColumnDecimal<T>::getPermutation(bool reverse, size_t limit, int , IColumn:
permutation(reverse, limit, res);
}
template <typename T>
void ColumnDecimal<T>::updatePermutation(bool reverse, size_t limit, int, IColumn::Permutation & res, EqualRanges & equal_range) const
{
if (limit >= data.size() || limit >= equal_range.back().second)
limit = 0;
size_t n = equal_range.size();
if (limit)
--n;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
{
const auto& [first, last] = equal_range[i];
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + last, res.begin() + last,
[this](size_t a, size_t b) { return data[a] > data[b]; });
else
std::partial_sort(res.begin() + first, res.begin() + last, res.begin() + last,
[this](size_t a, size_t b) { return data[a] < data[b]; });
auto new_first = first;
for (auto j = first + 1; j < last; ++j)
{
if (data[res[new_first]] != data[res[j]])
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
if (last - new_first > 1)
new_ranges.emplace_back(new_first, last);
}
if (limit)
{
const auto& [first, last] = equal_range.back();
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] > data[b]; });
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] < data[b]; });
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
if (data[res[new_first]] != data[res[j]])
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
auto new_last = limit;
for (auto j = limit; j < last; ++j)
{
if (data[res[new_first]] == data[res[j]])
{
std::swap(res[new_last], res[j]);
++new_last;
}
}
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
template <typename T>
ColumnPtr ColumnDecimal<T>::permute(const IColumn::Permutation & perm, size_t limit) const
{

View File

@ -108,6 +108,7 @@ public:
void updateWeakHash32(WeakHash32 & hash) const override;
int compareAt(size_t n, size_t m, const IColumn & rhs_, int nan_direction_hint) const override;
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int, IColumn::Permutation & res, EqualRanges& equal_range) const override;
MutableColumnPtr cloneResized(size_t size) const override;

View File

@ -162,6 +162,71 @@ void ColumnFixedString::getPermutation(bool reverse, size_t limit, int /*nan_dir
}
}
void ColumnFixedString::updatePermutation(bool reverse, size_t limit, int, Permutation & res, EqualRanges & equal_range) const
{
if (limit >= size() || limit >= equal_range.back().second)
limit = 0;
size_t k = equal_range.size();
if (limit)
--k;
EqualRanges new_ranges;
for (size_t i = 0; i < k; ++i)
{
const auto& [first, last] = equal_range[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, less<false>(*this));
else
std::sort(res.begin() + first, res.begin() + last, less<true>(*this));
auto new_first = first;
for (auto j = first + 1; j < last; ++j)
{
if (memcmpSmallAllowOverflow15(chars.data() + j * n, chars.data() + new_first * n, n) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
if (last - new_first > 1)
new_ranges.emplace_back(new_first, last);
}
if (limit)
{
const auto& [first, last] = equal_range.back();
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<false>(*this));
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<true>(*this));
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
if (memcmpSmallAllowOverflow15(chars.data() + j * n, chars.data() + new_first * n, n) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
auto new_last = limit;
for (auto j = limit; j < last; ++j)
{
if (memcmpSmallAllowOverflow15(chars.data() + j * n, chars.data() + new_first * n, n) == 0)
{
std::swap(res[new_last], res[j]);
++new_last;
}
}
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
void ColumnFixedString::insertRangeFrom(const IColumn & src, size_t start, size_t length)
{
const ColumnFixedString & src_concrete = assert_cast<const ColumnFixedString &>(src);

View File

@ -118,6 +118,8 @@ public:
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res, EqualRanges & equal_range) const override;
void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
ColumnPtr filter(const IColumn::Filter & filt, ssize_t result_size_hint) const override;

View File

@ -121,6 +121,11 @@ public:
throw Exception("getPermutation is not implemented for " + getName(), ErrorCodes::NOT_IMPLEMENTED);
}
void updatePermutation(bool, size_t, int, Permutation &, EqualRanges &) const override
{
throw Exception("updatePermutation is not implemented for " + getName(), ErrorCodes::NOT_IMPLEMENTED);
}
void gather(ColumnGathererStream &) override
{
throw Exception("Method gather is not supported for " + getName(), ErrorCodes::NOT_IMPLEMENTED);

View File

@ -314,6 +314,76 @@ void ColumnLowCardinality::getPermutation(bool reverse, size_t limit, int nan_di
}
}
void ColumnLowCardinality::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
{
if (limit >= size() || limit >= equal_range.back().second)
limit = 0;
size_t n = equal_range.size();
if (limit)
--n;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
{
const auto& [first, last] = equal_range[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, [this, nan_direction_hint](size_t a, size_t b)
{return getDictionary().compareAt(getIndexes().getUInt(a), getIndexes().getUInt(b), getDictionary(), nan_direction_hint) > 0; });
else
std::sort(res.begin() + first, res.begin() + last, [this, nan_direction_hint](size_t a, size_t b)
{return getDictionary().compareAt(getIndexes().getUInt(a), getIndexes().getUInt(b), getDictionary(), nan_direction_hint) < 0; });
auto new_first = first;
for (auto j = first + 1; j < last; ++j)
{
if (compareAt(new_first, j, *this, nan_direction_hint) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
if (last - new_first > 1)
new_ranges.emplace_back(new_first, last);
}
if (limit)
{
const auto& [first, last] = equal_range.back();
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, [this, nan_direction_hint](size_t a, size_t b)
{return getDictionary().compareAt(getIndexes().getUInt(a), getIndexes().getUInt(b), getDictionary(), nan_direction_hint) > 0; });
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, [this, nan_direction_hint](size_t a, size_t b)
{return getDictionary().compareAt(getIndexes().getUInt(a), getIndexes().getUInt(b), getDictionary(), nan_direction_hint) < 0; });
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
if (getDictionary().compareAt(getIndexes().getUInt(new_first), getIndexes().getUInt(j), getDictionary(), nan_direction_hint) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
auto new_last = limit;
for (auto j = limit; j < last; ++j)
{
if (getDictionary().compareAt(getIndexes().getUInt(new_first), getIndexes().getUInt(j), getDictionary(), nan_direction_hint) == 0)
{
std::swap(res[new_last], res[j]);
++new_last;
}
}
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
std::vector<MutableColumnPtr> ColumnLowCardinality::scatter(ColumnIndex num_columns, const Selector & selector) const
{
auto columns = getIndexes().scatter(num_columns, selector);

View File

@ -111,6 +111,8 @@ public:
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int, IColumn::Permutation & res, EqualRanges & equal_range) const override;
ColumnPtr replicate(const Offsets & offsets) const override
{
return ColumnLowCardinality::create(dictionary.getColumnUniquePtr(), getIndexes().replicate(offsets));

View File

@ -321,6 +321,75 @@ void ColumnNullable::getPermutation(bool reverse, size_t limit, int null_directi
}
}
void ColumnNullable::updatePermutation(bool reverse, size_t limit, int null_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
{
if (limit >= equal_range.back().second || limit >= size())
limit = 0;
EqualRanges new_ranges, temp_ranges;
for (const auto &[first, last] : equal_range)
{
bool direction = ((null_direction_hint > 0) != reverse);
/// Shift all NULL values to the end.
size_t read_idx = first;
size_t write_idx = first;
while (read_idx < last && (isNullAt(res[read_idx])^direction))
{
++read_idx;
++write_idx;
}
++read_idx;
/// Invariants:
/// write_idx < read_idx
/// write_idx points to NULL
/// read_idx will be incremented to position of next not-NULL
/// there are range of NULLs between write_idx and read_idx - 1,
/// We are moving elements from end to begin of this range,
/// so range will "bubble" towards the end.
/// Relative order of NULL elements could be changed,
/// but relative order of non-NULLs is preserved.
while (read_idx < last && write_idx < last)
{
if (isNullAt(res[read_idx])^direction)
{
std::swap(res[read_idx], res[write_idx]);
++write_idx;
}
++read_idx;
}
if (write_idx - first > 1)
{
if (direction)
temp_ranges.emplace_back(first, write_idx);
else
new_ranges.emplace_back(first, write_idx);
}
if (last - write_idx > 1)
{
if (direction)
new_ranges.emplace_back(write_idx, last);
else
temp_ranges.emplace_back(write_idx, last);
}
}
while (!new_ranges.empty() && limit && limit <= new_ranges.back().first)
new_ranges.pop_back();
if (!temp_ranges.empty())
getNestedColumn().updatePermutation(reverse, limit, null_direction_hint, res, temp_ranges);
equal_range.resize(temp_ranges.size() + new_ranges.size());
std::merge(temp_ranges.begin(), temp_ranges.end(), new_ranges.begin(), new_ranges.end(), equal_range.begin());
}
void ColumnNullable::gather(ColumnGathererStream & gatherer)
{
gatherer.gather(*this);

View File

@ -78,6 +78,7 @@ public:
ColumnPtr index(const IColumn & indexes, size_t limit) const override;
int compareAt(size_t n, size_t m, const IColumn & rhs_, int null_direction_hint) const override;
void getPermutation(bool reverse, size_t limit, int null_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int, Permutation & res, EqualRanges & equal_range) const override;
void reserve(size_t n) override;
size_t byteSize() const override;
size_t allocatedBytes() const override;

View File

@ -302,6 +302,77 @@ void ColumnString::getPermutation(bool reverse, size_t limit, int /*nan_directio
}
}
void ColumnString::updatePermutation(bool reverse, size_t limit, int /*nan_direction_hint*/, Permutation & res, EqualRanges & equal_range) const
{
if (limit >= size() || limit > equal_range.back().second)
limit = 0;
EqualRanges new_ranges;
auto less_true = less<true>(*this);
auto less_false = less<false>(*this);
size_t n = equal_range.size();
if (limit)
--n;
for (size_t i = 0; i < n; ++i)
{
const auto &[first, last] = equal_range[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, less_false);
else
std::sort(res.begin() + first, res.begin() + last, less_true);
size_t new_first = first;
for (size_t j = first + 1; j < last; ++j)
{
if (memcmpSmallAllowOverflow15(
chars.data() + offsetAt(res[j]), sizeAt(res[j]) - 1,
chars.data() + offsetAt(res[new_first]), sizeAt(res[new_first]) - 1) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
if (last - new_first > 1)
new_ranges.emplace_back(new_first, last);
}
if (limit)
{
const auto &[first, last] = equal_range.back();
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less_false);
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less_true);
size_t new_first = first;
for (size_t j = first + 1; j < limit; ++j)
{
if (memcmpSmallAllowOverflow15(
chars.data() + offsetAt(res[j]), sizeAt(res[j]) - 1,
chars.data() + offsetAt(res[new_first]), sizeAt(res[new_first]) - 1) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
size_t new_last = limit;
for (size_t j = limit; j < last; ++j)
{
if (memcmpSmallAllowOverflow15(
chars.data() + offsetAt(res[j]), sizeAt(res[j]) - 1,
chars.data() + offsetAt(res[new_first]), sizeAt(res[new_first]) - 1) == 0)
{
std::swap(res[j], res[new_last]);
++new_last;
}
}
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
ColumnPtr ColumnString::replicate(const Offsets & replicate_offsets) const
{
@ -440,6 +511,77 @@ void ColumnString::getPermutationWithCollation(const Collator & collator, bool r
}
}
void ColumnString::updatePermutationWithCollation(const Collator & collator, bool reverse, size_t limit, int, Permutation &res, EqualRanges &equal_range) const
{
if (limit >= size() || limit >= equal_range.back().second)
limit = 0;
size_t n = equal_range.size();
if (limit)
--n;
EqualRanges new_ranges;
for (size_t i = 0; i < n; ++i)
{
const auto& [first, last] = equal_range[i];
if (reverse)
std::sort(res.begin() + first, res.begin() + last, lessWithCollation<false>(*this, collator));
else
std::sort(res.begin() + first, res.begin() + last, lessWithCollation<true>(*this, collator));
auto new_first = first;
for (auto j = first + 1; j < last; ++j)
{
if (collator.compare(
reinterpret_cast<const char *>(&chars[offsetAt(res[new_first])]), sizeAt(res[new_first]),
reinterpret_cast<const char *>(&chars[offsetAt(res[j])]), sizeAt(res[j])) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
if (last - new_first > 1)
new_ranges.emplace_back(new_first, last);
}
if (limit)
{
const auto& [first, last] = equal_range.back();
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, lessWithCollation<false>(*this, collator));
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, lessWithCollation<true>(*this, collator));
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{
if (collator.compare(
reinterpret_cast<const char *>(&chars[offsetAt(res[new_first])]), sizeAt(res[new_first]),
reinterpret_cast<const char *>(&chars[offsetAt(res[j])]), sizeAt(res[j])) != 0)
{
if (j - new_first > 1)
new_ranges.emplace_back(new_first, j);
new_first = j;
}
}
auto new_last = limit;
for (auto j = limit; j < last; ++j)
{
if (collator.compare(
reinterpret_cast<const char *>(&chars[offsetAt(res[new_first])]), sizeAt(res[new_first]),
reinterpret_cast<const char *>(&chars[offsetAt(res[j])]), sizeAt(res[j])) == 0)
{
std::swap(res[new_last], res[j]);
++new_last;
}
}
if (new_last - new_first > 1)
new_ranges.emplace_back(new_first, new_last);
}
equal_range = std::move(new_ranges);
}
void ColumnString::protect()
{

View File

@ -225,9 +225,13 @@ public:
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int, Permutation & res, EqualRanges & equal_range) const override;
/// Sorting with respect of collation.
void getPermutationWithCollation(const Collator & collator, bool reverse, size_t limit, Permutation & res) const;
void updatePermutationWithCollation(const Collator & collator, bool reverse, size_t limit, int, Permutation & res, EqualRanges& equal_range) const;
ColumnPtr replicate(const Offsets & replicate_offsets) const override;
MutableColumns scatter(ColumnIndex num_columns, const Selector & selector) const override

View File

@ -329,6 +329,19 @@ void ColumnTuple::getPermutation(bool reverse, size_t limit, int nan_direction_h
}
}
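/// For tuples the permutation is refined column by column: each element column updates the
/// permutation within the current equal ranges, ranges that start at or past the limit are
/// dropped, and the loop stops as soon as no equal ranges remain.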
void ColumnTuple::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
{
for (const auto& column : columns)
{
column->updatePermutation(reverse, limit, nan_direction_hint, res, equal_range);
while (limit && limit <= equal_range.back().first)
equal_range.pop_back();
if (equal_range.empty())
break;
}
}
void ColumnTuple::gather(ColumnGathererStream & gatherer)
{
gatherer.gather(*this);

View File

@ -72,6 +72,7 @@ public:
int compareAt(size_t n, size_t m, const IColumn & rhs, int nan_direction_hint) const override;
void getExtremes(Field & min, Field & max) const override;
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const override;
void reserve(size_t n) override;
size_t byteSize() const override;
size_t allocatedBytes() const override;

View File

@ -77,6 +77,7 @@ public:
}
int compareAt(size_t n, size_t m, const IColumn & rhs, int nan_direction_hint) const override;
void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const override;
void getExtremes(Field & min, Field & max) const override { column_holder->getExtremes(min, max); }
bool valuesHaveFixedSize() const override { return column_holder->valuesHaveFixedSize(); }
@ -374,6 +375,39 @@ int ColumnUnique<ColumnType>::compareAt(size_t n, size_t m, const IColumn & rhs,
return getNestedColumn()->compareAt(n, m, *column_unique.getNestedColumn(), nan_direction_hint);
}
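/// Places the index of the NULL value at the proper end of its equal range (depending on the
/// requested direction), shrinking or removing that range, and then delegates the rest of the
/// permutation update to the nested column.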
template <typename ColumnType>
void ColumnUnique<ColumnType>::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
{
bool found_null_value_index = false;
for (size_t i = 0; i < equal_range.size() && !found_null_value_index; ++i)
{
auto& [first, last] = equal_range[i];
for (auto j = first; j < last; ++j)
{
if (res[j] == getNullValueIndex())
{
if ((nan_direction_hint > 0) != reverse)
{
std::swap(res[j], res[last - 1]);
--last;
}
else
{
std::swap(res[j], res[first]);
++first;
}
if (last - first <= 1)
{
equal_range.erase(equal_range.begin() + i);
}
found_null_value_index = true;
break;
}
}
}
getNestedColumn()->updatePermutation(reverse, limit, nan_direction_hint, res, equal_range);
}
template <typename IndexType>
static void checkIndexes(const ColumnVector<IndexType> & indexes, size_t max_dictionary_size)
{


@ -219,6 +219,76 @@ void ColumnVector<T>::getPermutation(bool reverse, size_t limit, int nan_directi
}
}
template <typename T>
void ColumnVector<T>::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
{
if (limit >= data.size() || limit >= equal_range.back().second)
limit = 0;
EqualRanges new_ranges;
for (size_t i = 0; i < equal_range.size() - bool(limit); ++i)
{
const auto & [first, last] = equal_range[i];
if (reverse)
pdqsort(res.begin() + first, res.begin() + last, greater(*this, nan_direction_hint));
else
pdqsort(res.begin() + first, res.begin() + last, less(*this, nan_direction_hint));
size_t new_first = first;
for (size_t j = first + 1; j < last; ++j)
{
if (less(*this, nan_direction_hint)(res[j], res[new_first]) || greater(*this, nan_direction_hint)(res[j], res[new_first]))
{
if (j - new_first > 1)
{
new_ranges.emplace_back(new_first, j);
}
new_first = j;
}
}
if (last - new_first > 1)
{
new_ranges.emplace_back(new_first, last);
}
}
if (limit)
{
const auto & [first, last] = equal_range.back();
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, greater(*this, nan_direction_hint));
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less(*this, nan_direction_hint));
size_t new_first = first;
for (size_t j = first + 1; j < limit; ++j)
{
if (less(*this, nan_direction_hint)(res[j], res[new_first]) || greater(*this, nan_direction_hint)(res[j], res[new_first]))
{
if (j - new_first > 1)
{
new_ranges.emplace_back(new_first, j);
}
new_first = j;
}
}
size_t new_last = limit;
for (size_t j = limit; j < last; ++j)
{
if (!less(*this, nan_direction_hint)(res[j], res[new_first]) && !greater(*this, nan_direction_hint)(res[j], res[new_first]))
{
std::swap(res[j], res[new_last]);
++new_last;
}
}
if (new_last - new_first > 1)
{
new_ranges.emplace_back(new_first, new_last);
}
}
equal_range = std::move(new_ranges);
}
template <typename T>
const char * ColumnVector<T>::getFamilyName() const


@ -192,6 +192,8 @@ public:
void getSpecialPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res,
IColumn::SpecialSort) const override;
void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges& equal_range) const override;
void reserve(size_t n) override
{
data.reserve(n);


@ -25,6 +25,13 @@ class ColumnGathererStream;
class Field;
class WeakHash32;
/*
 * Represents a set of equal ranges in the previous column that still have to be ordered by the current column.
 * Used when sorting by tuples (several columns).
 */
using EqualRanges = std::vector<std::pair<size_t, size_t> >;
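For example (the row numbers are made up), if after processing some columns rows 3–6 and rows 10–11 still compare equal, the remaining work is described by two half-open intervals into the permutation:

    EqualRanges ranges = {{3, 7}, {10, 12}};   /// [first, last) positions in the permutation, not row values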
/// Declares interface to store columns in memory.
class IColumn : public COW<IColumn>
{
@ -256,6 +263,16 @@ public:
getPermutation(reverse, limit, nan_direction_hint, res);
}
/* In updatePermutation we pass the current permutation and the intervals at which it should be sorted.
 * Each interval (except the last one, if a limit is set) is sorted separately using the data of the
 * current column, and within it we find all sub-intervals whose values are equal in this column.
 * Since this column cannot decide the relative order of equal values, those sub-intervals form the
 * new set of intervals that the following columns still have to sort.
 * If a limit is set, the last interval is only partially sorted; in addition to the above, we also
 * collect all elements equal to the largest sorted one, because they may still need to be sorted.
 */
virtual void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res, EqualRanges & equal_ranges) const = 0;
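A minimal sketch of how this contract is meant to be driven when sorting by several columns (a free-standing helper written purely for illustration; the helper name and the hard-coded `reverse`, `limit` and `nan_direction_hint` values are assumptions):

    /// Illustrative helper, not part of IColumn.
    void sortByColumns(const std::vector<const IColumn *> & columns, size_t rows, IColumn::Permutation & perm)
    {
        perm.resize(rows);
        for (size_t i = 0; i < rows; ++i)
            perm[i] = i;                              /// start from the identity permutation
        EqualRanges ranges;
        ranges.emplace_back(0, rows);                 /// initially the whole permutation is one unresolved range
        for (const IColumn * column : columns)
        {
            if (ranges.empty())
                break;                                /// the order is already fully determined
            column->updatePermutation(/* reverse = */ false, /* limit = */ 0, /* nan_direction_hint = */ 1, perm, ranges);
            /// After the call, `ranges` holds only the intervals whose values are equal in this column;
            /// they are the only parts the next column still has to order.
        }
    }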
/** Copies each element according offsets parameter.
* (i-th element should be copied offsets[i] - offsets[i - 1] times.)
* It is necessary in ARRAY JOIN operation.


@ -107,6 +107,8 @@ public:
res[i] = i;
}
void updatePermutation(bool, size_t, int, Permutation &, EqualRanges&) const override {}
ColumnPtr replicate(const Offsets & offsets) const override
{
if (s != offsets.size())


@ -2,7 +2,8 @@
namespace DB
{
/// Used for left padding of PODArray when empty
const char EmptyPODArray[EmptyPODArraySize]{};
const char empty_pod_array[empty_pod_array_size]{};
}


@ -63,8 +63,8 @@ namespace ErrorCodes
* TODO Pass alignment to Allocator.
* TODO Allow greater alignment than alignof(T). Example: array of char aligned to page size.
*/
static constexpr size_t EmptyPODArraySize = 1024;
extern const char EmptyPODArray[EmptyPODArraySize];
static constexpr size_t empty_pod_array_size = 1024;
extern const char empty_pod_array[empty_pod_array_size];
/** Base class that depends only on the size of the element, not on the element itself.
* You can static_cast to this class if you want to insert some data regardless of the actual type T.
@ -81,9 +81,9 @@ protected:
/// pad_left is also rounded up to 16 bytes to maintain alignment of allocated memory.
static constexpr size_t pad_left = integerRoundUp(integerRoundUp(pad_left_, ELEMENT_SIZE), 16);
/// Empty array will point to this static memory as padding.
static constexpr char * null = pad_left ? const_cast<char *>(EmptyPODArray) + EmptyPODArraySize : nullptr;
static constexpr char * null = pad_left ? const_cast<char *>(empty_pod_array) + empty_pod_array_size : nullptr;
static_assert(pad_left <= EmptyPODArraySize && "Left Padding exceeds EmptyPODArraySize. Is the element size too large?");
static_assert(pad_left <= empty_pod_array_size && "Left Padding exceeds empty_pod_array_size. Is the element size too large?");
// If we are using allocator with inline memory, the minimal size of
// array must be in sync with the size of this memory.


@ -1533,7 +1533,7 @@ void InterpreterSelectQuery::executeFetchColumns(
if constexpr (pipeline_with_processors)
{
if (streams.size() == 1 || pipes.size() == 1)
pipeline.setMaxThreads(streams.size());
pipeline.setMaxThreads(1);
/// Unify streams. They must have same headers.
if (streams.size() > 1)


@ -271,6 +271,11 @@ QueryPipeline InterpreterSelectWithUnionQuery::executeWithProcessors()
{
auto common_header = getCommonHeaderForUnion(headers);
main_pipeline.unitePipelines(std::move(pipelines), common_header);
// Nested queries can force a single thread (for simplicity),
// but in the case of UNION this must not be done.
UInt64 max_threads = context->getSettingsRef().max_threads;
main_pipeline.setMaxThreads(std::min<UInt64>(nested_interpreters.size(), max_threads));
}
main_pipeline.addInterpreterContext(context);


@ -221,13 +221,10 @@ static NameSet getKeyColumns(const StoragePtr & storage)
NameSet key_columns;
if (merge_tree_data->partition_key_expr)
for (const String & col : merge_tree_data->partition_key_expr->getRequiredColumns())
for (const String & col : merge_tree_data->getColumnsRequiredForPartitionKey())
key_columns.insert(col);
auto sorting_key_expr = merge_tree_data->sorting_key_expr;
if (sorting_key_expr)
for (const String & col : sorting_key_expr->getRequiredColumns())
for (const String & col : merge_tree_data->getColumnsRequiredForSortingKey())
key_columns.insert(col);
/// We don't process sample_by_ast separately because it must be among the primary key columns.


@ -101,6 +101,7 @@ struct PartialSortingLessWithCollation
}
};
void sortBlock(Block & block, const SortDescription & description, UInt64 limit)
{
if (!block)
@ -178,21 +179,47 @@ void sortBlock(Block & block, const SortDescription & description, UInt64 limit)
if (need_collation)
{
PartialSortingLessWithCollation less_with_collation(columns_with_sort_desc);
EqualRanges ranges;
ranges.emplace_back(0, perm.size());
for (const auto& column : columns_with_sort_desc)
{
while (!ranges.empty() && limit && limit <= ranges.back().first)
ranges.pop_back();
if (limit)
std::partial_sort(perm.begin(), perm.begin() + limit, perm.end(), less_with_collation);
else
pdqsort(perm.begin(), perm.end(), less_with_collation);
if (ranges.empty())
break;
if (isCollationRequired(column.description))
{
const ColumnString & column_string = assert_cast<const ColumnString &>(*column.column);
column_string.updatePermutationWithCollation(*column.description.collator, column.description.direction < 0, limit, column.description.nulls_direction, perm, ranges);
}
else
{
PartialSortingLess less(columns_with_sort_desc);
if (limit)
std::partial_sort(perm.begin(), perm.begin() + limit, perm.end(), less);
column.column->updatePermutation(
column.description.direction < 0, limit, column.description.nulls_direction, perm, ranges);
}
}
}
else
pdqsort(perm.begin(), perm.end(), less);
{
EqualRanges ranges;
ranges.emplace_back(0, perm.size());
for (const auto& column : columns_with_sort_desc)
{
while (!ranges.empty() && limit && limit <= ranges.back().first)
{
ranges.pop_back();
}
if (ranges.empty())
{
break;
}
column.column->updatePermutation(
column.description.direction < 0, limit, column.description.nulls_direction, perm, ranges);
}
}
size_t columns = block.columns();
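As a usage sketch of this function (the block, the column names, the locale and the exact SortColumnDescription constructor arguments are assumptions for illustration): sort by an integer key, then by a string key under a collation, and require only the first 100 rows to be fully ordered.

    SortDescription description;
    description.emplace_back("id", 1 /* direction: ascending */, 1 /* nulls_direction */);
    description.emplace_back("name", 1, 1, std::make_shared<Collator>("en"));
    sortBlock(block, description, /* limit = */ 100);   // rows past the first 100 are left in unspecified order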


@ -51,10 +51,6 @@ add_executable (in_join_subqueries_preprocessor in_join_subqueries_preprocessor.
target_link_libraries (in_join_subqueries_preprocessor PRIVATE dbms clickhouse_parsers)
add_check(in_join_subqueries_preprocessor)
add_executable (expression_analyzer expression_analyzer.cpp)
target_link_libraries (expression_analyzer PRIVATE dbms clickhouse_storages_system clickhouse_parsers clickhouse_common_io)
add_check(expression_analyzer)
add_executable (users users.cpp)
target_link_libraries (users PRIVATE dbms clickhouse_common_config)


@ -1,128 +0,0 @@
#include <DataTypes/DataTypesNumber.h>
#include <Storages/System/StorageSystemOne.h>
#include <Storages/System/StorageSystemNumbers.h>
#include <Databases/DatabaseMemory.h>
#include <Parsers/ParserSelectQuery.h>
#include <Parsers/parseQuery.h>
#include <Interpreters/Context.h>
#include <Interpreters/SyntaxAnalyzer.h>
#include <IO/WriteBufferFromFileDescriptor.h>
#include <IO/ReadBufferFromFileDescriptor.h>
#include <vector>
#include <unordered_map>
#include <iostream>
using namespace DB;
namespace DB
{
namespace ErrorCodes
{
extern const int SYNTAX_ERROR;
}
}
struct TestEntry
{
String query;
std::unordered_map<String, String> expected_aliases; /// alias -> AST.getID()
NamesAndTypesList source_columns = {};
bool check(const Context & context)
{
ASTPtr ast = parse(query);
auto res = SyntaxAnalyzer(context).analyze(ast, source_columns);
return checkAliases(*res);
}
private:
bool checkAliases(const SyntaxAnalyzerResult & res)
{
for (const auto & alias : res.aliases)
{
const String & alias_name = alias.first;
if (expected_aliases.count(alias_name) == 0 ||
expected_aliases[alias_name] != alias.second->getID())
{
std::cout << "unexpected alias: " << alias_name << ' ' << alias.second->getID() << std::endl;
return false;
}
else
expected_aliases.erase(alias_name);
}
if (!expected_aliases.empty())
{
std::cout << "missing aliases: " << expected_aliases.size() << std::endl;
return false;
}
return true;
}
static ASTPtr parse(const std::string & query)
{
ParserSelectQuery parser;
std::string message;
const auto * text = query.data();
if (ASTPtr ast = tryParseQuery(parser, text, text + query.size(), message, false, "", false, 0, 0))
return ast;
throw Exception(message, ErrorCodes::SYNTAX_ERROR);
}
};
int main()
{
std::vector<TestEntry> queries =
{
{
"SELECT number AS n FROM system.numbers LIMIT 0",
{{"n", "Identifier_number"}},
{ NameAndTypePair("number", std::make_shared<DataTypeUInt64>()) }
},
{
"SELECT number AS n FROM system.numbers LIMIT 0",
{{"n", "Identifier_number"}}
}
};
SharedContextHolder shared_context = Context::createShared();
Context context = Context::createGlobal(shared_context.get());
context.makeGlobalContext();
auto system_database = std::make_shared<DatabaseMemory>("system");
DatabaseCatalog::instance().attachDatabase("system", system_database);
//context.setCurrentDatabase("system");
system_database->attachTable("one", StorageSystemOne::create("one"), {});
system_database->attachTable("numbers", StorageSystemNumbers::create(StorageID("system", "numbers"), false), {});
size_t success = 0;
for (auto & entry : queries)
{
try
{
if (entry.check(context))
{
++success;
std::cout << "[OK] " << entry.query << std::endl;
}
else
std::cout << "[Failed] " << entry.query << std::endl;
}
catch (Exception & e)
{
std::cout << "[Error] " << entry.query << std::endl << e.displayText() << std::endl;
}
}
return success != queries.size();
}


@ -9,6 +9,7 @@
#include <Interpreters/Context.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/quoteString.h>
#include <Interpreters/ExpressionActions.h>
#include <Processors/Executors/TreeExecutorBlockInputStream.h>
@ -433,4 +434,111 @@ NamesAndTypesList IStorage::getVirtuals() const
return {};
}
const StorageMetadataKeyField & IStorage::getPartitionKey() const
{
return partition_key;
}
void IStorage::setPartitionKey(const StorageMetadataKeyField & partition_key_)
{
partition_key = partition_key_;
}
bool IStorage::hasPartitionKey() const
{
return partition_key.expression != nullptr;
}
Names IStorage::getColumnsRequiredForPartitionKey() const
{
if (hasPartitionKey())
return partition_key.expression->getRequiredColumns();
return {};
}
const StorageMetadataKeyField & IStorage::getSortingKey() const
{
return sorting_key;
}
void IStorage::setSortingKey(const StorageMetadataKeyField & sorting_key_)
{
sorting_key = sorting_key_;
}
bool IStorage::hasSortingKey() const
{
return sorting_key.expression != nullptr;
}
Names IStorage::getColumnsRequiredForSortingKey() const
{
if (hasSortingKey())
return sorting_key.expression->getRequiredColumns();
return {};
}
Names IStorage::getSortingKeyColumns() const
{
if (hasSortingKey())
return sorting_key.column_names;
return {};
}
const StorageMetadataKeyField & IStorage::getPrimaryKey() const
{
return primary_key;
}
void IStorage::setPrimaryKey(const StorageMetadataKeyField & primary_key_)
{
primary_key = primary_key_;
}
bool IStorage::isPrimaryKeyDefined() const
{
return primary_key.definition_ast != nullptr;
}
bool IStorage::hasPrimaryKey() const
{
return primary_key.expression != nullptr;
}
Names IStorage::getColumnsRequiredForPrimaryKey() const
{
if (hasPrimaryKey())
return primary_key.expression->getRequiredColumns();
return {};
}
Names IStorage::getPrimaryKeyColumns() const
{
if (hasPrimaryKey())
return primary_key.column_names;
return {};
}
const StorageMetadataKeyField & IStorage::getSamplingKey() const
{
return sampling_key;
}
void IStorage::setSamplingKey(const StorageMetadataKeyField & sampling_key_)
{
sampling_key = sampling_key_;
}
bool IStorage::hasSamplingKey() const
{
return sampling_key.expression != nullptr;
}
Names IStorage::getColumnsRequiredForSampling() const
{
if (hasSamplingKey())
return sampling_key.expression->getRequiredColumns();
return {};
}
}


@ -101,7 +101,7 @@ public:
virtual bool isView() const { return false; }
/// Returns true if the storage supports queries with the SAMPLE section.
virtual bool supportsSampling() const { return false; }
virtual bool supportsSampling() const { return hasSamplingKey(); }
/// Returns true if the storage supports queries with the FINAL section.
virtual bool supportsFinal() const { return false; }
@ -195,10 +195,16 @@ protected: /// still thread-unsafe part.
private:
StorageID storage_id;
mutable std::mutex id_mutex;
ColumnsDescription columns;
IndicesDescription indices;
ConstraintsDescription constraints;
StorageMetadataKeyField partition_key;
StorageMetadataKeyField primary_key;
StorageMetadataKeyField sorting_key;
StorageMetadataKeyField sampling_key;
private:
RWLockImpl::LockHolder tryLockTimed(
const RWLock & rwlock, RWLockImpl::Type type, const String & query_id, const SettingSeconds & acquire_timeout) const;
@ -440,36 +446,66 @@ public:
/// Returns data paths if storage supports it, empty vector otherwise.
virtual Strings getDataPaths() const { return {}; }
/// Returns structure with partition key.
const StorageMetadataKeyField & getPartitionKey() const;
/// Set the partition key for the storage (the methods below are just wrappers for this
/// struct).
void setPartitionKey(const StorageMetadataKeyField & partition_key_);
/// Returns ASTExpressionList of partition key expression for storage or nullptr if there is none.
virtual ASTPtr getPartitionKeyAST() const { return nullptr; }
/// Returns ASTExpressionList of sorting key expression for storage or nullptr if there is none.
virtual ASTPtr getSortingKeyAST() const { return nullptr; }
/// Returns ASTExpressionList of primary key expression for storage or nullptr if there is none.
virtual ASTPtr getPrimaryKeyAST() const { return nullptr; }
/// Returns sampling expression AST for storage or nullptr if there is none.
virtual ASTPtr getSamplingKeyAST() const { return nullptr; }
ASTPtr getPartitionKeyAST() const { return partition_key.definition_ast; }
/// Storage has partition key.
bool hasPartitionKey() const;
/// Returns column names that need to be read to calculate partition key.
virtual Names getColumnsRequiredForPartitionKey() const { return {}; }
Names getColumnsRequiredForPartitionKey() const;
/// Returns structure with sorting key.
const StorageMetadataKeyField & getSortingKey() const;
/// Set the sorting key for the storage (the methods below are just wrappers for this
/// struct).
void setSortingKey(const StorageMetadataKeyField & sorting_key_);
/// Returns ASTExpressionList of sorting key expression for storage or nullptr if there is none.
ASTPtr getSortingKeyAST() const { return sorting_key.definition_ast; }
/// Storage has sorting key.
bool hasSortingKey() const;
/// Returns column names that need to be read to calculate sorting key.
virtual Names getColumnsRequiredForSortingKey() const { return {}; }
/// Returns column names that need to be read to calculate primary key.
virtual Names getColumnsRequiredForPrimaryKey() const { return {}; }
/// Returns column names that need to be read to calculate sampling key.
virtual Names getColumnsRequiredForSampling() const { return {}; }
/// Returns column names that need to be read for FINAL to work.
virtual Names getColumnsRequiredForFinal() const { return {}; }
Names getColumnsRequiredForSortingKey() const;
/// Returns column names of the sorting key specified by the user in the ORDER BY
/// expression. For example: 'a', 'x * y', 'toStartOfMonth(date)', etc.
virtual Names getSortingKeyColumns() const { return {}; }
Names getSortingKeyColumns() const;
/// Returns structure with primary key.
const StorageMetadataKeyField & getPrimaryKey() const;
/// Set the primary key for the storage (the methods below are just wrappers for this
/// struct).
void setPrimaryKey(const StorageMetadataKeyField & primary_key_);
/// Returns ASTExpressionList of primary key expression for storage or nullptr if there is none.
ASTPtr getPrimaryKeyAST() const { return primary_key.definition_ast; }
/// Storage has a primary key explicitly defined by the user in the CREATE query (not just derived from ORDER BY).
bool isPrimaryKeyDefined() const;
/// Storage has primary key (maybe part of some other key).
bool hasPrimaryKey() const;
/// Returns column names that need to be read to calculate primary key.
Names getColumnsRequiredForPrimaryKey() const;
/// Returns column names of the primary key. For example: 'a',
/// 'x * y', 'toStartOfMonth(date)', etc.
Names getPrimaryKeyColumns() const;
/// Returns structure with sampling key.
const StorageMetadataKeyField & getSamplingKey() const;
/// Set the sampling key for the storage (the methods below are just wrappers for this
/// struct).
void setSamplingKey(const StorageMetadataKeyField & sampling_key_);
/// Returns sampling expression AST for storage or nullptr if there is none.
ASTPtr getSamplingKeyAST() const { return sampling_key.definition_ast; }
/// Storage has sampling key.
bool hasSamplingKey() const;
/// Returns column names that need to be read to calculate sampling key.
Names getColumnsRequiredForSampling() const;
/// Returns column names that need to be read for FINAL to work.
Names getColumnsRequiredForFinal() const { return getColumnsRequiredForSortingKey(); }
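A sketch of how a hypothetical engine could populate these fields (the class name, the method name and the availability of `metadata` and `context` are assumptions; MergeTree does the equivalent work in its constructor and in setProperties):

    void MyStorage::initializeKeys(const StorageInMemoryMetadata & metadata, const Context & context)
    {
        /// Build each key once from its definition AST and expose it through the uniform getters above.
        if (metadata.partition_by_ast)
            setPartitionKey(StorageMetadataKeyField::getKeyFromAST(metadata.partition_by_ast, getColumns(), context));
        if (metadata.order_by_ast)
            setSortingKey(StorageMetadataKeyField::getKeyFromAST(metadata.order_by_ast, getColumns(), context));
        if (metadata.sample_by_ast)
            setSamplingKey(StorageMetadataKeyField::getKeyFromAST(metadata.sample_by_ast, getColumns(), context));
    }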
/// Returns columns, which will be needed to calculate dependencies
/// (skip indices, TTL expressions) if we update @updated_columns set of columns.


@ -15,6 +15,7 @@ namespace ErrorCodes
using namespace std::chrono_literals;
const auto MAX_TIME_TO_WAIT_FOR_ASSIGNMENT_MS = 15000;
const auto DRAIN_TIMEOUT_MS = 5000ms;
ReadBufferFromKafkaConsumer::ReadBufferFromKafkaConsumer(
@ -80,9 +81,72 @@ ReadBufferFromKafkaConsumer::ReadBufferFromKafkaConsumer(
});
}
// NOTE on the removed destructor: there is no need to unsubscribe prior to calling rd_kafka_consumer_close().
// See https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#termination
// Manual destruction was a source of weird errors (hangs while dropping a Kafka table, etc.).
ReadBufferFromKafkaConsumer::~ReadBufferFromKafkaConsumer()
{
try
{
if (!consumer->get_subscription().empty())
{
try
{
consumer->unsubscribe();
}
catch (const cppkafka::HandleException & e)
{
LOG_ERROR(log, "Error during unsubscribe: {}", e.what());
}
drain();
}
}
catch (const cppkafka::HandleException & e)
{
LOG_ERROR(log, "Error while destructing consumer: {}", e.what());
}
}
// Needed to drain the rest of the messages / queued callback calls from the consumer
// after unsubscribe; otherwise the consumer will hang on destruction.
// See https://github.com/edenhill/librdkafka/issues/2077
// and https://github.com/confluentinc/confluent-kafka-go/issues/189 etc.
void ReadBufferFromKafkaConsumer::drain()
{
auto start_time = std::chrono::steady_clock::now();
cppkafka::Error last_error(RD_KAFKA_RESP_ERR_NO_ERROR);
while (true)
{
auto msg = consumer->poll(100ms);
if (!msg)
break;
auto error = msg.get_error();
if (error)
{
if (msg.is_eof() || error == last_error)
{
break;
}
else
{
LOG_ERROR(log, "Error during draining: {}", error);
}
}
// We don't stop draining on the first error;
// we stop only if the same error repeats on the next poll.
last_error = error;
auto ts = std::chrono::steady_clock::now();
if (std::chrono::duration_cast<std::chrono::milliseconds>(ts-start_time) > DRAIN_TIMEOUT_MS)
{
LOG_ERROR(log, "Timeout during draining.");
break;
}
}
}
void ReadBufferFromKafkaConsumer::commit()
{


@ -28,7 +28,7 @@ public:
const std::atomic<bool> & stopped_,
const Names & _topics
);
~ReadBufferFromKafkaConsumer() override;
void allowNext() { allowed = true; } // Allow to read next message.
void commit(); // Commit all processed messages.
void subscribe(); // Subscribe internal consumer to topics.
@ -75,6 +75,8 @@ private:
cppkafka::TopicPartitionList assignment;
const Names topics;
void drain();
bool nextImpl() override;
};


@ -293,6 +293,7 @@ ConsumerBufferPtr StorageKafka::createReadBuffer()
// Create a consumer and subscribe to topics
auto consumer = std::make_shared<cppkafka::Consumer>(conf);
consumer->set_destroy_flags(RD_KAFKA_DESTROY_F_NO_CONSUMER_CLOSE);
// Limit the number of batched messages to allow early cancellations
const Settings & settings = global_context.getSettingsRef();


@ -418,7 +418,7 @@ void IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool require_columns_checks
loadIndexGranularity();
calculateColumnsSizesOnDisk();
loadIndex(); /// Must be called after loadIndexGranularity as it uses the value of `index_granularity`
loadRowsCount(); /// Must be called after loadIndex() as it uses the value of `index_granularity`.
loadRowsCount(); /// Must be called after loadIndexGranularity() as it uses the value of `index_granularity`.
loadPartitionAndMinMaxIndex();
loadTTLInfos();
@ -437,7 +437,8 @@ void IMergeTreeDataPart::loadIndex()
if (!index_granularity.isInitialized())
throw Exception("Index granularity is not loaded before index loading", ErrorCodes::LOGICAL_ERROR);
size_t key_size = storage.primary_key_columns.size();
const auto & primary_key = storage.getPrimaryKey();
size_t key_size = primary_key.column_names.size();
if (key_size)
{
@ -446,23 +447,25 @@ void IMergeTreeDataPart::loadIndex()
for (size_t i = 0; i < key_size; ++i)
{
loaded_index[i] = storage.primary_key_data_types[i]->createColumn();
loaded_index[i] = primary_key.data_types[i]->createColumn();
loaded_index[i]->reserve(index_granularity.getMarksCount());
}
String index_path = getFullRelativePath() + "primary.idx";
auto index_file = openForReading(volume->getDisk(), index_path);
for (size_t i = 0; i < index_granularity.getMarksCount(); ++i) //-V756
size_t marks_count = index_granularity.getMarksCount();
for (size_t i = 0; i < marks_count; ++i) //-V756
for (size_t j = 0; j < key_size; ++j)
storage.primary_key_data_types[j]->deserializeBinary(*loaded_index[j], *index_file);
primary_key.data_types[j]->deserializeBinary(*loaded_index[j], *index_file);
for (size_t i = 0; i < key_size; ++i)
{
loaded_index[i]->protect();
if (loaded_index[i]->size() != index_granularity.getMarksCount())
if (loaded_index[i]->size() != marks_count)
throw Exception("Cannot read all data from index file " + index_path
+ "(expected size: " + toString(index_granularity.getMarksCount()) + ", read: " + toString(loaded_index[i]->size()) + ")",
+ "(expected size: " + toString(marks_count) + ", read: " + toString(loaded_index[i]->size()) + ")",
ErrorCodes::CANNOT_READ_ALL_DATA);
}
@ -493,7 +496,7 @@ void IMergeTreeDataPart::loadPartitionAndMinMaxIndex()
minmax_idx.load(storage, volume->getDisk(), path);
}
String calculated_partition_id = partition.getID(storage.partition_key_sample);
String calculated_partition_id = partition.getID(storage.getPartitionKey().sample_block);
if (calculated_partition_id != info.partition_id)
throw Exception(
"While loading part " + getFullPath() + ": calculated partition ID: " + calculated_partition_id
@ -838,7 +841,7 @@ void IMergeTreeDataPart::checkConsistencyBase() const
if (!checksums.empty())
{
if (!storage.primary_key_columns.empty() && !checksums.files.count("primary.idx"))
if (storage.hasPrimaryKey() && !checksums.files.count("primary.idx"))
throw Exception("No checksum for primary.idx", ErrorCodes::NO_FILE_IN_DATA_PART);
if (storage.format_version >= MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING)
@ -846,7 +849,7 @@ void IMergeTreeDataPart::checkConsistencyBase() const
if (!checksums.files.count("count.txt"))
throw Exception("No checksum for count.txt", ErrorCodes::NO_FILE_IN_DATA_PART);
if (storage.partition_key_expr && !checksums.files.count("partition.dat"))
if (storage.hasPartitionKey() && !checksums.files.count("partition.dat"))
throw Exception("No checksum for partition.dat", ErrorCodes::NO_FILE_IN_DATA_PART);
if (!isEmpty())
@ -872,14 +875,14 @@ void IMergeTreeDataPart::checkConsistencyBase() const
};
/// Check that the primary key index is not empty.
if (!storage.primary_key_columns.empty())
if (storage.hasPrimaryKey())
check_file_not_empty(volume->getDisk(), path + "primary.idx");
if (storage.format_version >= MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING)
{
check_file_not_empty(volume->getDisk(), path + "count.txt");
if (storage.partition_key_expr)
if (storage.hasPartitionKey())
check_file_not_empty(volume->getDisk(), path + "partition.dat");
for (const String & col_name : storage.minmax_idx_columns)


@ -131,8 +131,6 @@ MergeTreeData::MergeTreeData(
: IStorage(table_id_)
, global_context(context_)
, merging_params(merging_params_)
, partition_by_ast(metadata.partition_by_ast)
, sample_by_ast(metadata.sample_by_ast)
, settings_ast(metadata.settings_ast)
, require_part_metadata(require_part_metadata_)
, relative_data_path(relative_data_path_)
@ -153,16 +151,16 @@ MergeTreeData::MergeTreeData(
/// NOTE: using the same columns list as is read when performing actual merges.
merging_params.check(getColumns().getAllPhysical());
if (sample_by_ast)
if (metadata.sample_by_ast != nullptr)
{
sampling_expr_column_name = sample_by_ast->getColumnName();
StorageMetadataKeyField candidate_sampling_key = StorageMetadataKeyField::getKeyFromAST(metadata.sample_by_ast, getColumns(), global_context);
if (!primary_key_sample.has(sampling_expr_column_name)
&& !attach && !settings->compatibility_allow_sampling_expression_not_in_primary_key) /// This is for backward compatibility.
const auto & pk_sample_block = getPrimaryKey().sample_block;
if (!pk_sample_block.has(candidate_sampling_key.column_names[0]) && !attach
&& !settings->compatibility_allow_sampling_expression_not_in_primary_key) /// This is for backward compatibility.
throw Exception("Sampling expression must be present in the primary key", ErrorCodes::BAD_ARGUMENTS);
auto syntax = SyntaxAnalyzer(global_context).analyze(sample_by_ast, getColumns().getAllPhysical());
columns_required_for_sampling = syntax->requiredSourceColumns();
setSamplingKey(candidate_sampling_key);
}
MergeTreeDataFormatVersion min_format_version(0);
@ -170,8 +168,8 @@ MergeTreeData::MergeTreeData(
{
try
{
partition_by_ast = makeASTFunction("toYYYYMM", std::make_shared<ASTIdentifier>(date_column_name));
initPartitionKey();
auto partition_by_ast = makeASTFunction("toYYYYMM", std::make_shared<ASTIdentifier>(date_column_name));
initPartitionKey(partition_by_ast);
if (minmax_idx_date_column_pos == -1)
throw Exception("Could not find Date column", ErrorCodes::BAD_TYPE_OF_FIELD);
@ -186,7 +184,7 @@ MergeTreeData::MergeTreeData(
else
{
is_custom_partitioned = true;
initPartitionKey();
initPartitionKey(metadata.partition_by_ast);
min_format_version = MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING;
}
@ -252,20 +250,20 @@ StorageInMemoryMetadata MergeTreeData::getInMemoryMetadata() const
{
StorageInMemoryMetadata metadata(getColumns(), getIndices(), getConstraints());
if (partition_by_ast)
metadata.partition_by_ast = partition_by_ast->clone();
if (hasPartitionKey())
metadata.partition_by_ast = getPartitionKeyAST()->clone();
if (order_by_ast)
metadata.order_by_ast = order_by_ast->clone();
if (hasSortingKey())
metadata.order_by_ast = getSortingKeyAST()->clone();
if (primary_key_ast)
metadata.primary_key_ast = primary_key_ast->clone();
if (isPrimaryKeyDefined())
metadata.primary_key_ast = getPrimaryKeyAST()->clone();
if (ttl_table_ast)
metadata.ttl_for_table_ast = ttl_table_ast->clone();
if (sample_by_ast)
metadata.sample_by_ast = sample_by_ast->clone();
if (hasSamplingKey())
metadata.sample_by_ast = getSamplingKeyAST()->clone();
if (settings_ast)
metadata.settings_ast = settings_ast->clone();
@ -352,17 +350,18 @@ void MergeTreeData::setProperties(const StorageInMemoryMetadata & metadata, bool
auto all_columns = metadata.columns.getAllPhysical();
/// Order by check AST
if (order_by_ast && only_check)
if (hasSortingKey() && only_check)
{
/// This is ALTER, not CREATE/ATTACH TABLE. Let us check that all new columns used in the sorting key
/// expression have just been added (so that the sorting order is guaranteed to be valid with the new key).
ASTPtr added_key_column_expr_list = std::make_shared<ASTExpressionList>();
const auto & old_sorting_key_columns = getSortingKeyColumns();
for (size_t new_i = 0, old_i = 0; new_i < sorting_key_size; ++new_i)
{
if (old_i < sorting_key_columns.size())
if (old_i < old_sorting_key_columns.size())
{
if (new_sorting_key_columns[new_i] != sorting_key_columns[old_i])
if (new_sorting_key_columns[new_i] != old_sorting_key_columns[old_i])
added_key_column_expr_list->children.push_back(new_sorting_key_expr_list->children[new_i]);
else
++old_i;
@ -417,6 +416,12 @@ void MergeTreeData::setProperties(const StorageInMemoryMetadata & metadata, bool
new_primary_key_data_types.push_back(elem.type);
}
DataTypes new_sorting_key_data_types;
for (size_t i = 0; i < sorting_key_size; ++i)
{
new_sorting_key_data_types.push_back(new_sorting_key_sample.getByPosition(i).type);
}
ASTPtr skip_indices_with_primary_key_expr_list = new_primary_key_expr_list->clone();
ASTPtr skip_indices_with_sorting_key_expr_list = new_sorting_key_expr_list->clone();
@ -466,17 +471,23 @@ void MergeTreeData::setProperties(const StorageInMemoryMetadata & metadata, bool
{
setColumns(std::move(metadata.columns));
order_by_ast = metadata.order_by_ast;
sorting_key_columns = std::move(new_sorting_key_columns);
sorting_key_expr_ast = std::move(new_sorting_key_expr_list);
sorting_key_expr = std::move(new_sorting_key_expr);
StorageMetadataKeyField new_sorting_key;
new_sorting_key.definition_ast = metadata.order_by_ast;
new_sorting_key.column_names = std::move(new_sorting_key_columns);
new_sorting_key.expression_list_ast = std::move(new_sorting_key_expr_list);
new_sorting_key.expression = std::move(new_sorting_key_expr);
new_sorting_key.sample_block = std::move(new_sorting_key_sample);
new_sorting_key.data_types = std::move(new_sorting_key_data_types);
setSortingKey(new_sorting_key);
primary_key_ast = metadata.primary_key_ast;
primary_key_columns = std::move(new_primary_key_columns);
primary_key_expr_ast = std::move(new_primary_key_expr_list);
primary_key_expr = std::move(new_primary_key_expr);
primary_key_sample = std::move(new_primary_key_sample);
primary_key_data_types = std::move(new_primary_key_data_types);
StorageMetadataKeyField new_primary_key;
new_primary_key.definition_ast = metadata.primary_key_ast;
new_primary_key.column_names = std::move(new_primary_key_columns);
new_primary_key.expression_list_ast = std::move(new_primary_key_expr_list);
new_primary_key.expression = std::move(new_primary_key_expr);
new_primary_key.sample_block = std::move(new_primary_key_sample);
new_primary_key.data_types = std::move(new_primary_key_data_types);
setPrimaryKey(new_primary_key);
setIndices(metadata.indices);
skip_indices = std::move(new_indices);
@ -511,28 +522,17 @@ ASTPtr MergeTreeData::extractKeyExpressionList(const ASTPtr & node)
}
void MergeTreeData::initPartitionKey()
void MergeTreeData::initPartitionKey(ASTPtr partition_by_ast)
{
ASTPtr partition_key_expr_list = extractKeyExpressionList(partition_by_ast);
StorageMetadataKeyField new_partition_key = StorageMetadataKeyField::getKeyFromAST(partition_by_ast, getColumns(), global_context);
if (partition_key_expr_list->children.empty())
if (new_partition_key.expression_list_ast->children.empty())
return;
{
auto syntax_result = SyntaxAnalyzer(global_context).analyze(partition_key_expr_list, getColumns().getAllPhysical());
partition_key_expr = ExpressionAnalyzer(partition_key_expr_list, syntax_result, global_context).getActions(false);
}
for (const ASTPtr & ast : partition_key_expr_list->children)
{
String col_name = ast->getColumnName();
partition_key_sample.insert(partition_key_expr->getSampleBlock().getByName(col_name));
}
checkKeyExpression(*partition_key_expr, partition_key_sample, "Partition");
checkKeyExpression(*new_partition_key.expression, new_partition_key.sample_block, "Partition");
/// Add all columns used in the partition key to the min-max index.
const NamesAndTypesList & minmax_idx_columns_with_types = partition_key_expr->getRequiredColumnsWithTypes();
const NamesAndTypesList & minmax_idx_columns_with_types = new_partition_key.expression->getRequiredColumnsWithTypes();
minmax_idx_expr = std::make_shared<ExpressionActions>(minmax_idx_columns_with_types, global_context);
for (const NameAndTypePair & column : minmax_idx_columns_with_types)
{
@ -577,6 +577,7 @@ void MergeTreeData::initPartitionKey()
}
}
}
setPartitionKey(new_partition_key);
}
namespace
@ -631,12 +632,12 @@ void MergeTreeData::setTTLExpressions(const ColumnsDescription & new_columns,
{
NameSet columns_ttl_forbidden;
if (partition_key_expr)
for (const auto & col : partition_key_expr->getRequiredColumns())
if (hasPartitionKey())
for (const auto & col : getColumnsRequiredForPartitionKey())
columns_ttl_forbidden.insert(col);
if (sorting_key_expr)
for (const auto & col : sorting_key_expr->getRequiredColumns())
if (hasSortingKey())
for (const auto & col : getColumnsRequiredForSortingKey())
columns_ttl_forbidden.insert(col);
for (const auto & [name, ast] : new_column_ttls)
@ -1418,12 +1419,12 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, const S
/// (and not as a part of some expression) and if the ALTER only affects column metadata.
NameSet columns_alter_type_metadata_only;
if (partition_key_expr)
if (hasPartitionKey())
{
/// Forbid altering partition key columns because it can change partition ID format.
/// TODO: in some cases (e.g. adding an Enum value) a partition key column can still be ALTERed.
/// We should allow it.
for (const String & col : partition_key_expr->getRequiredColumns())
for (const String & col : getColumnsRequiredForPartitionKey())
columns_alter_type_forbidden.insert(col);
}
@ -1433,8 +1434,9 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, const S
columns_alter_type_forbidden.insert(col);
}
if (sorting_key_expr)
if (hasSortingKey())
{
auto sorting_key_expr = getSortingKey().expression;
for (const ExpressionAction & action : sorting_key_expr->getActions())
{
auto action_columns = action.getNeededColumns();
@ -2616,7 +2618,7 @@ String MergeTreeData::getPartitionIDFromQuery(const ASTPtr & ast, const Context
/// Re-parse partition key fields using the information about expected field types.
size_t fields_count = partition_key_sample.columns();
size_t fields_count = getPartitionKey().sample_block.columns();
if (partition_ast.fields_count != fields_count)
throw Exception(
"Wrong number of fields in the partition expression: " + toString(partition_ast.fields_count) +
@ -2633,7 +2635,7 @@ String MergeTreeData::getPartitionIDFromQuery(const ASTPtr & ast, const Context
ReadBufferFromMemory right_paren_buf(")", 1);
ConcatReadBuffer buf({&left_paren_buf, &fields_buf, &right_paren_buf});
auto input_stream = FormatFactory::instance().getInput("Values", buf, partition_key_sample, context, context.getSettingsRef().max_block_size);
auto input_stream = FormatFactory::instance().getInput("Values", buf, getPartitionKey().sample_block, context, context.getSettingsRef().max_block_size);
auto block = input_stream->read();
if (!block || !block.rows())
@ -3084,7 +3086,7 @@ bool MergeTreeData::isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(const A
{
const String column_name = node->getColumnName();
for (const auto & name : primary_key_columns)
for (const auto & name : getPrimaryKeyColumns())
if (column_name == name)
return true;
@ -3144,10 +3146,10 @@ MergeTreeData & MergeTreeData::checkStructureAndGetMergeTreeData(IStorage & sour
return ast ? queryToString(ast) : "";
};
if (query_to_string(order_by_ast) != query_to_string(src_data->order_by_ast))
if (query_to_string(getSortingKeyAST()) != query_to_string(src_data->getSortingKeyAST()))
throw Exception("Tables have different ordering", ErrorCodes::BAD_ARGUMENTS);
if (query_to_string(partition_by_ast) != query_to_string(src_data->partition_by_ast))
if (query_to_string(getPartitionKeyAST()) != query_to_string(src_data->getPartitionKeyAST()))
throw Exception("Tables have different partition key", ErrorCodes::BAD_ARGUMENTS);
if (format_version != src_data->format_version)


@ -335,24 +335,11 @@ public:
/// See comments about methods below in IStorage interface
StorageInMemoryMetadata getInMemoryMetadata() const override;
ASTPtr getPartitionKeyAST() const override { return partition_by_ast; }
ASTPtr getSortingKeyAST() const override { return sorting_key_expr_ast; }
ASTPtr getPrimaryKeyAST() const override { return primary_key_expr_ast; }
ASTPtr getSamplingKeyAST() const override { return sample_by_ast; }
Names getColumnsRequiredForPartitionKey() const override { return (partition_key_expr ? partition_key_expr->getRequiredColumns() : Names{}); }
Names getColumnsRequiredForSortingKey() const override { return sorting_key_expr->getRequiredColumns(); }
Names getColumnsRequiredForPrimaryKey() const override { return primary_key_expr->getRequiredColumns(); }
Names getColumnsRequiredForSampling() const override { return columns_required_for_sampling; }
Names getColumnsRequiredForFinal() const override { return sorting_key_expr->getRequiredColumns(); }
Names getSortingKeyColumns() const override { return sorting_key_columns; }
ColumnDependencies getColumnDependencies(const NameSet & updated_columns) const override;
StoragePolicyPtr getStoragePolicy() const override;
bool supportsPrewhere() const override { return true; }
bool supportsSampling() const override { return sample_by_ast != nullptr; }
bool supportsFinal() const override
{
@ -530,8 +517,6 @@ public:
*/
static ASTPtr extractKeyExpressionList(const ASTPtr & node);
bool hasSortingKey() const { return !sorting_key_columns.empty(); }
bool hasPrimaryKey() const { return !primary_key_columns.empty(); }
bool hasSkipIndices() const { return !skip_indices.empty(); }
bool hasAnyColumnTTL() const { return !column_ttl_entries_by_name.empty(); }
@ -649,8 +634,6 @@ public:
const MergingParams merging_params;
bool is_custom_partitioned = false;
ExpressionActionsPtr partition_key_expr;
Block partition_key_sample;
ExpressionActionsPtr minmax_idx_expr;
Names minmax_idx_columns;
@ -664,19 +647,6 @@ public:
ExpressionActionsPtr primary_key_and_skip_indices_expr;
ExpressionActionsPtr sorting_key_and_skip_indices_expr;
/// Names of sorting key columns in ORDER BY expression. For example: 'a',
/// 'x * y', 'toStartOfMonth(date)', etc.
Names sorting_key_columns;
ASTPtr sorting_key_expr_ast;
ExpressionActionsPtr sorting_key_expr;
/// Names of columns for primary key.
Names primary_key_columns;
ASTPtr primary_key_expr_ast;
ExpressionActionsPtr primary_key_expr;
Block primary_key_sample;
DataTypes primary_key_data_types;
struct TTLEntry
{
ExpressionActionsPtr expression;
@ -710,9 +680,6 @@ public:
/// Vector rw operations have to be done under "move_ttl_entries_mutex".
std::vector<TTLEntry> move_ttl_entries;
String sampling_expr_column_name;
Names columns_required_for_sampling;
/// Limiting parallel sends per one table, used in DataPartsExchange
std::atomic_uint current_table_sends {0};
@ -739,10 +706,6 @@ protected:
friend struct ReplicatedMergeTreeTableMetadata;
friend class StorageReplicatedMergeTree;
ASTPtr partition_by_ast;
ASTPtr order_by_ast;
ASTPtr primary_key_ast;
ASTPtr sample_by_ast;
ASTPtr ttl_table_ast;
ASTPtr settings_ast;
@ -854,7 +817,7 @@ protected:
void setProperties(const StorageInMemoryMetadata & metadata, bool only_check = false, bool attach = false);
void initPartitionKey();
void initPartitionKey(ASTPtr partition_by_ast);
void setTTLExpressions(const ColumnsDescription & columns,
const ASTPtr & new_ttl_table_ast, bool only_check = false);


@ -608,7 +608,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
NamesAndTypesList merging_columns;
Names gathering_column_names, merging_column_names;
extractMergingAndGatheringColumns(
storage_columns, data.sorting_key_expr, data.skip_indices,
storage_columns, data.getSortingKey().expression, data.skip_indices,
data.merging_params, gathering_columns, gathering_column_names, merging_columns, merging_column_names);
auto single_disk_volume = std::make_shared<SingleDiskVolume>("volume_" + future_part.name, disk);
@ -727,7 +727,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
pipes.emplace_back(std::move(pipe));
}
Names sort_columns = data.sorting_key_columns;
Names sort_columns = data.getSortingKeyColumns();
SortDescription sort_description;
size_t sort_columns_size = sort_columns.size();
sort_description.reserve(sort_columns_size);


@ -223,9 +223,10 @@ Pipes MergeTreeDataSelectExecutor::readFromParts(
data.check(real_column_names);
const Settings & settings = context.getSettingsRef();
Names primary_key_columns = data.primary_key_columns;
const auto & primary_key = data.getPrimaryKey();
Names primary_key_columns = primary_key.column_names;
KeyCondition key_condition(query_info, context, primary_key_columns, data.primary_key_expr);
KeyCondition key_condition(query_info, context, primary_key_columns, primary_key.expression);
if (settings.force_primary_key && key_condition.alwaysUnknownOrTrue())
{
@ -388,7 +389,8 @@ Pipes MergeTreeDataSelectExecutor::readFromParts(
used_sample_factor = 1.0 / boost::rational_cast<Float64>(relative_sample_size);
RelativeSize size_of_universum = 0;
DataTypePtr sampling_column_type = data.primary_key_sample.getByName(data.sampling_expr_column_name).type;
const auto & sampling_key = data.getSamplingKey();
DataTypePtr sampling_column_type = sampling_key.data_types[0];
if (typeid_cast<const DataTypeUInt64 *>(sampling_column_type.get()))
size_of_universum = RelativeSize(std::numeric_limits<UInt64>::max()) + RelativeSize(1);
@ -457,17 +459,17 @@ Pipes MergeTreeDataSelectExecutor::readFromParts(
/// In the FINAL case it was already calculated, because the sampling key is a part of the PK.
/// So assume that we already have the calculated column.
ASTPtr sampling_key_ast = data.getSamplingKeyAST();
if (select.final())
{
sampling_key_ast = std::make_shared<ASTIdentifier>(data.sampling_expr_column_name);
sampling_key_ast = std::make_shared<ASTIdentifier>(sampling_key.column_names[0]);
/// We do spoil available_real_columns here, but it is not used later.
available_real_columns.emplace_back(data.sampling_expr_column_name, std::move(sampling_column_type));
available_real_columns.emplace_back(sampling_key.column_names[0], std::move(sampling_column_type));
}
if (has_lower_limit)
{
if (!key_condition.addCondition(data.sampling_expr_column_name, Range::createLeftBounded(lower, true)))
if (!key_condition.addCondition(sampling_key.column_names[0], Range::createLeftBounded(lower, true)))
throw Exception("Sampling column not in primary key", ErrorCodes::ILLEGAL_COLUMN);
ASTPtr args = std::make_shared<ASTExpressionList>();
@ -484,7 +486,7 @@ Pipes MergeTreeDataSelectExecutor::readFromParts(
if (has_upper_limit)
{
if (!key_condition.addCondition(data.sampling_expr_column_name, Range::createRightBounded(upper, false)))
if (!key_condition.addCondition(sampling_key.column_names[0], Range::createRightBounded(upper, false)))
throw Exception("Sampling column not in primary key", ErrorCodes::ILLEGAL_COLUMN);
ASTPtr args = std::make_shared<ASTExpressionList>();
@ -612,7 +614,7 @@ Pipes MergeTreeDataSelectExecutor::readFromParts(
if (select.final())
{
/// Add columns needed to calculate the sorting expression and the sign.
std::vector<String> add_columns = data.sorting_key_expr->getRequiredColumns();
std::vector<String> add_columns = data.getColumnsRequiredForSortingKey();
column_names_to_read.insert(column_names_to_read.end(), add_columns.begin(), add_columns.end());
if (!data.merging_params.sign_column.empty())
@ -638,7 +640,7 @@ Pipes MergeTreeDataSelectExecutor::readFromParts(
else if (settings.optimize_read_in_order && query_info.input_sorting_info)
{
size_t prefix_size = query_info.input_sorting_info->order_key_prefix_descr.size();
auto order_key_prefix_ast = data.sorting_key_expr_ast->clone();
auto order_key_prefix_ast = data.getSortingKey().expression_list_ast->clone();
order_key_prefix_ast->children.resize(prefix_size);
auto syntax_result = SyntaxAnalyzer(context).analyze(order_key_prefix_ast, data.getColumns().getAllPhysical());
@ -1023,7 +1025,7 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsWithOrder(
{
SortDescription sort_description;
for (size_t j = 0; j < input_sorting_info->order_key_prefix_descr.size(); ++j)
sort_description.emplace_back(data.sorting_key_columns[j],
sort_description.emplace_back(data.getSortingKey().column_names[j],
input_sorting_info->direction, 1);
/// Drop temporary columns, added by 'sorting_key_prefix_expr'
@ -1096,11 +1098,11 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsFinal(
if (!out_projection)
out_projection = createProjection(pipe, data);
pipe.addSimpleTransform(std::make_shared<ExpressionTransform>(pipe.getHeader(), data.sorting_key_expr));
pipe.addSimpleTransform(std::make_shared<ExpressionTransform>(pipe.getHeader(), data.getSortingKey().expression));
pipes.emplace_back(std::move(pipe));
}
Names sort_columns = data.sorting_key_columns;
Names sort_columns = data.getSortingKeyColumns();
SortDescription sort_description;
size_t sort_columns_size = sort_columns.size();
sort_description.reserve(sort_columns_size);
@ -1293,11 +1295,12 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange(
std::function<void(size_t, size_t, FieldRef &)> create_field_ref;
/// If there are no monotonic functions, there is no need to save the block reference.
/// Passing an explicit field to FieldRef allows optimizing ranges and shows better performance.
const auto & primary_key = data.getPrimaryKey();
if (key_condition.hasMonotonicFunctionsChain())
{
auto index_block = std::make_shared<Block>();
for (size_t i = 0; i < used_key_size; ++i)
index_block->insert({index[i], data.primary_key_data_types[i], data.primary_key_columns[i]});
index_block->insert({index[i], primary_key.data_types[i], primary_key.column_names[i]});
create_field_ref = [index_block](size_t row, size_t column, FieldRef & field)
{
@ -1328,7 +1331,7 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange(
create_field_ref(range.begin, i, index_left[i]);
may_be_true = key_condition.mayBeTrueAfter(
used_key_size, index_left.data(), data.primary_key_data_types);
used_key_size, index_left.data(), primary_key.data_types);
}
else
{
@ -1342,7 +1345,7 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange(
}
may_be_true = key_condition.mayBeTrueInRange(
used_key_size, index_left.data(), index_right.data(), data.primary_key_data_types);
used_key_size, index_left.data(), index_right.data(), primary_key.data_types);
}
if (!may_be_true)


@ -139,18 +139,19 @@ BlocksWithPartition MergeTreeDataWriter::splitBlockIntoParts(const Block & block
data.check(block, true);
block.checkNumberOfRows();
if (!data.partition_key_expr) /// Table is not partitioned.
if (!data.hasPartitionKey()) /// Table is not partitioned.
{
result.emplace_back(Block(block), Row());
return result;
}
Block block_copy = block;
data.partition_key_expr->execute(block_copy);
const auto & partition_key = data.getPartitionKey();
partition_key.expression->execute(block_copy);
ColumnRawPtrs partition_columns;
partition_columns.reserve(data.partition_key_sample.columns());
for (const ColumnWithTypeAndName & element : data.partition_key_sample)
partition_columns.reserve(partition_key.sample_block.columns());
for (const ColumnWithTypeAndName & element : partition_key.sample_block)
partition_columns.emplace_back(block_copy.getByName(element.name).column.get());
PODArray<size_t> partition_num_to_first_row;
@ -204,7 +205,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithPa
MergeTreePartition partition(std::move(block_with_partition.partition));
MergeTreePartInfo new_part_info(partition.getID(data.partition_key_sample), temp_index, temp_index, 0);
MergeTreePartInfo new_part_info(partition.getID(data.getPartitionKey().sample_block), temp_index, temp_index, 0);
String part_name;
if (data.format_version < MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING)
{
@ -262,7 +263,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithPa
if (data.hasSortingKey() || data.hasSkipIndices())
data.sorting_key_and_skip_indices_expr->execute(block);
Names sort_columns = data.sorting_key_columns;
Names sort_columns = data.getSortingKeyColumns();
SortDescription sort_description;
size_t sort_columns_size = sort_columns.size();
sort_description.reserve(sort_columns_size);


@ -26,7 +26,7 @@ static std::unique_ptr<ReadBufferFromFileBase> openForReading(const DiskPtr & di
String MergeTreePartition::getID(const MergeTreeData & storage) const
{
return getID(storage.partition_key_sample);
return getID(storage.getPartitionKey().sample_block);
}
/// NOTE: This ID is used to create part names which are then persisted in ZK and as directory names on the file system.
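For illustration (the values are an assumption): with a partition key of toYYYYMM(EventDate), a block of rows from May 2020 gets the partition ID "202005", which then appears in part directory names such as 202005_1_1_0.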
@ -89,7 +89,8 @@ String MergeTreePartition::getID(const Block & partition_key_sample) const
void MergeTreePartition::serializeText(const MergeTreeData & storage, WriteBuffer & out, const FormatSettings & format_settings) const
{
size_t key_size = storage.partition_key_sample.columns();
const auto & partition_key_sample = storage.getPartitionKey().sample_block;
size_t key_size = partition_key_sample.columns();
if (key_size == 0)
{
@ -97,7 +98,7 @@ void MergeTreePartition::serializeText(const MergeTreeData & storage, WriteBuffe
}
else if (key_size == 1)
{
const DataTypePtr & type = storage.partition_key_sample.getByPosition(0).type;
const DataTypePtr & type = partition_key_sample.getByPosition(0).type;
auto column = type->createColumn();
column->insert(value[0]);
type->serializeAsText(*column, 0, out, format_settings);
@ -108,7 +109,7 @@ void MergeTreePartition::serializeText(const MergeTreeData & storage, WriteBuffe
Columns columns;
for (size_t i = 0; i < key_size; ++i)
{
const auto & type = storage.partition_key_sample.getByPosition(i).type;
const auto & type = partition_key_sample.getByPosition(i).type;
types.push_back(type);
auto column = type->createColumn();
column->insert(value[i]);
@ -123,19 +124,20 @@ void MergeTreePartition::serializeText(const MergeTreeData & storage, WriteBuffe
void MergeTreePartition::load(const MergeTreeData & storage, const DiskPtr & disk, const String & part_path)
{
if (!storage.partition_key_expr)
if (!storage.hasPartitionKey())
return;
const auto & partition_key_sample = storage.getPartitionKey().sample_block;
auto partition_file_path = part_path + "partition.dat";
auto file = openForReading(disk, partition_file_path);
value.resize(storage.partition_key_sample.columns());
for (size_t i = 0; i < storage.partition_key_sample.columns(); ++i)
storage.partition_key_sample.getByPosition(i).type->deserializeBinary(value[i], *file);
value.resize(partition_key_sample.columns());
for (size_t i = 0; i < partition_key_sample.columns(); ++i)
partition_key_sample.getByPosition(i).type->deserializeBinary(value[i], *file);
}
void MergeTreePartition::store(const MergeTreeData & storage, const DiskPtr & disk, const String & part_path, MergeTreeDataPartChecksums & checksums) const
{
store(storage.partition_key_sample, disk, part_path, checksums);
store(storage.getPartitionKey().sample_block, disk, part_path, checksums);
}
void MergeTreePartition::store(const Block & partition_key_sample, const DiskPtr & disk, const String & part_path, MergeTreeDataPartChecksums & checksums) const


@ -39,8 +39,9 @@ MergeTreeWhereOptimizer::MergeTreeWhereOptimizer(
block_with_constants{KeyCondition::getBlockWithConstants(query_info.query, query_info.syntax_analyzer_result, context)},
log{log_}
{
if (!data.primary_key_columns.empty())
first_primary_key_column = data.primary_key_columns[0];
const auto & primary_key = data.getPrimaryKey();
if (!primary_key.column_names.empty())
first_primary_key_column = primary_key.column_names[0];
calculateColumnSizes(data, queried_columns);
determineArrayJoinedNames(query_info.query->as<ASTSelectQuery &>());


@ -162,7 +162,7 @@ void MergedBlockOutputStream::writeImpl(const Block & block, const IColumn::Perm
std::inserter(skip_indexes_column_names_set, skip_indexes_column_names_set.end()));
Names skip_indexes_column_names(skip_indexes_column_names_set.begin(), skip_indexes_column_names_set.end());
Block primary_key_block = getBlockAndPermute(block, storage.primary_key_columns, permutation);
Block primary_key_block = getBlockAndPermute(block, storage.getPrimaryKeyColumns(), permutation);
Block skip_indexes_block = getBlockAndPermute(block, skip_indexes_column_names, permutation);
writer->write(block, permutation, primary_key_block, skip_indexes_block);


@ -29,7 +29,7 @@ ReplicatedMergeTreeTableMetadata::ReplicatedMergeTreeTableMetadata(const MergeTr
date_column = data.minmax_idx_columns[data.minmax_idx_date_column_pos];
const auto data_settings = data.getSettings();
sampling_expression = formattedAST(data.sample_by_ast);
sampling_expression = formattedAST(data.getSamplingKeyAST());
index_granularity = data_settings->index_granularity;
merging_params_mode = static_cast<int>(data.merging_params.mode);
sign_column = data.merging_params.sign_column;
@ -40,18 +40,18 @@ ReplicatedMergeTreeTableMetadata::ReplicatedMergeTreeTableMetadata(const MergeTr
/// So the rules for the ZooKeeper metadata are as follows:
/// - When we have only ORDER BY, store it in the "primary key:" row of /metadata
/// - When we have both, store PRIMARY KEY in the "primary key:" row and ORDER BY in the "sorting key:" row of /metadata
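/// Illustrative example of the resulting /metadata rows: a table created with only "ORDER BY (a, b)"
/// stores primary key: a, b; a table with "PRIMARY KEY a ORDER BY (a, b)" stores primary key: a
/// and sorting key: a, b.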
if (!data.primary_key_ast)
primary_key = formattedAST(MergeTreeData::extractKeyExpressionList(data.order_by_ast));
if (!data.isPrimaryKeyDefined())
primary_key = formattedAST(data.getSortingKey().expression_list_ast);
else
{
primary_key = formattedAST(MergeTreeData::extractKeyExpressionList(data.primary_key_ast));
sorting_key = formattedAST(MergeTreeData::extractKeyExpressionList(data.order_by_ast));
primary_key = formattedAST(data.getPrimaryKey().expression_list_ast);
sorting_key = formattedAST(data.getSortingKey().expression_list_ast);
}
data_format_version = data.format_version;
if (data.format_version >= MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING)
partition_key = formattedAST(MergeTreeData::extractKeyExpressionList(data.partition_by_ast));
partition_key = formattedAST(data.getPartitionKey().expression_list_ast);
ttl_table = formattedAST(data.ttl_table_ast);

View File

@ -52,10 +52,6 @@ public:
return part->storage.getInMemoryMetadata();
}
bool hasSortingKey() const { return part->storage.hasSortingKey(); }
Names getSortingKeyColumns() const override { return part->storage.getSortingKeyColumns(); }
NamesAndTypesList getVirtuals() const override
{
return part->storage.getVirtuals();
@ -68,6 +64,7 @@ protected:
{
setColumns(part_->storage.getColumns());
setIndices(part_->storage.getIndices());
setSortingKey(part_->storage.getSortingKey());
}
private:

View File

@ -1,7 +1,15 @@
#include <Storages/StorageInMemoryMetadata.h>
#include <Interpreters/ExpressionActions.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Interpreters/SyntaxAnalyzer.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTFunction.h>
#include <Parsers/queryToString.h>
namespace DB
{
StorageInMemoryMetadata::StorageInMemoryMetadata(
const ColumnsDescription & columns_,
const IndicesDescription & indices_,
@ -79,4 +87,55 @@ StorageInMemoryMetadata & StorageInMemoryMetadata::operator=(const StorageInMemo
return *this;
}
namespace
{
ASTPtr extractKeyExpressionList(const ASTPtr & node)
{
if (!node)
return std::make_shared<ASTExpressionList>();
const auto * expr_func = node->as<ASTFunction>();
if (expr_func && expr_func->name == "tuple")
{
/// Primary key is specified as a tuple; extract its arguments.
return expr_func->arguments->clone();
}
else
{
/// Primary key consists of one column.
auto res = std::make_shared<ASTExpressionList>();
res->children.push_back(node);
return res;
}
}
}
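/// For example, an AST for "(a, toStartOfMonth(date))" (i.e. the function tuple(...)) yields the
/// list [a, toStartOfMonth(date)], while a single expression such as toYYYYMM(date) is wrapped
/// into a one-element list.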
StorageMetadataKeyField StorageMetadataKeyField::getKeyFromAST(const ASTPtr & definition_ast, const ColumnsDescription & columns, const Context & context)
{
StorageMetadataKeyField result;
result.definition_ast = definition_ast;
result.expression_list_ast = extractKeyExpressionList(definition_ast);
if (result.expression_list_ast->children.empty())
return result;
const auto & children = result.expression_list_ast->children;
for (const auto & child : children)
result.column_names.emplace_back(child->getColumnName());
{
auto expr = result.expression_list_ast->clone();
auto syntax_result = SyntaxAnalyzer(context).analyze(expr, columns.getAllPhysical());
result.expression = ExpressionAnalyzer(expr, syntax_result, context).getActions(true);
result.sample_block = result.expression->getSampleBlock();
}
for (size_t i = 0; i < result.sample_block.columns(); ++i)
result.data_types.emplace_back(result.sample_block.getByPosition(i).type);
return result;
}
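/// Illustrative usage (a sketch; `partition_by_ast`, `columns` and `context` are assumed to come
/// from the storage that owns the key):
///     auto key = StorageMetadataKeyField::getKeyFromAST(partition_by_ast, columns, context);
///     /// key.column_names lists the key expressions, key.sample_block carries their names and
///     /// types, and key.expression can compute the key values from the source columns.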
}

View File

@ -43,4 +43,34 @@ struct StorageInMemoryMetadata
StorageInMemoryMetadata & operator=(const StorageInMemoryMetadata & other);
};
/// Common structure for primary, partition and other storage keys
struct StorageMetadataKeyField
{
/// User-defined AST from the CREATE/ALTER query. This field may be empty even though the key
/// exists, because some keys may be set implicitly (for example, the primary key
/// in MergeTree can be part of the sorting key).
ASTPtr definition_ast;
/// ASTExpressionList with key fields, for example: (x, toStartOfMonth(date)).
ASTPtr expression_list_ast;
/// Expression built from expression_list_ast by ExpressionAnalyzer. Useful
/// when you need the columns required by the key, for example: a, date, b.
ExpressionActionsPtr expression;
/// Sample block with key columns (names, types, empty column)
Block sample_block;
/// Column names in key definition, example: x, toStartOfMonth(date), a * b.
Names column_names;
/// Types from the sample block, in the same order as the columns.
DataTypes data_types;
/// Parse the key structure from the key definition. Requires all columns available
/// in the storage.
static StorageMetadataKeyField getKeyFromAST(const ASTPtr & definition_ast, const ColumnsDescription & columns, const Context & context);
};
}

View File

@ -492,11 +492,11 @@ void StorageReplicatedMergeTree::setTableStructure(ColumnsDescription new_column
metadata.order_by_ast = tuple;
}
if (!primary_key_ast)
if (!isPrimaryKeyDefined())
{
/// Primary and sorting key become independent after this ALTER so we have to
/// save the old ORDER BY expression as the new primary key.
metadata.primary_key_ast = order_by_ast->clone();
metadata.primary_key_ast = getSortingKeyAST()->clone();
}
}

View File

@ -24,7 +24,7 @@
#include <DataTypes/DataTypeString.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/ListObjectsRequest.h>
#include <aws/s3/model/ListObjectsV2Request.h>
#include <Common/parseGlobs.h>
#include <Common/quoteString.h>
@ -228,18 +228,18 @@ Strings listFilesWithRegexpMatching(Aws::S3::S3Client & client, const S3::URI &
return {globbed_uri.key};
}
Aws::S3::Model::ListObjectsRequest request;
Aws::S3::Model::ListObjectsV2Request request;
request.SetBucket(globbed_uri.bucket);
request.SetPrefix(key_prefix);
re2::RE2 matcher(makeRegexpPatternFromGlobs(globbed_uri.key));
Strings result;
Aws::S3::Model::ListObjectsOutcome outcome;
Aws::S3::Model::ListObjectsV2Outcome outcome;
int page = 0;
do
{
++page;
outcome = client.ListObjects(request);
outcome = client.ListObjectsV2(request);
if (!outcome.IsSuccess())
{
throw Exception("Could not list objects in bucket " + quoteString(request.GetBucket())
@ -256,7 +256,7 @@ Strings listFilesWithRegexpMatching(Aws::S3::S3Client & client, const S3::URI &
result.emplace_back(std::move(key));
}
request.SetMarker(outcome.GetResult().GetNextMarker());
request.SetContinuationToken(outcome.GetResult().GetNextContinuationToken());
}
while (outcome.GetResult().GetIsTruncated());
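/// ListObjectsV2 paginates with continuation tokens: while GetIsTruncated() is true, the token from
/// GetNextContinuationToken() is passed into the next request, replacing the marker used by the V1 API.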

View File

@ -372,7 +372,7 @@ protected:
if (columns_mask[src_index++])
{
assert(table != nullptr);
if ((expression_ptr = table->getSortingKeyAST()))
if ((expression_ptr = table->getSortingKey().expression_list_ast))
res_columns[res_index++]->insert(queryToString(expression_ptr));
else
res_columns[res_index++]->insertDefault();
@ -381,7 +381,7 @@ protected:
if (columns_mask[src_index++])
{
assert(table != nullptr);
if ((expression_ptr = table->getPrimaryKeyAST()))
if ((expression_ptr = table->getPrimaryKey().expression_list_ast))
res_columns[res_index++]->insert(queryToString(expression_ptr));
else
res_columns[res_index++]->insertDefault();

View File

@ -234,14 +234,6 @@ def run_tests_array(all_tests_with_params):
clickhouse_proc = Popen(shlex.split(args.client), stdin=PIPE, stdout=PIPE, stderr=PIPE)
clickhouse_proc.communicate("SELECT 'Running test {suite}/{case} from pid={pid}';".format(pid = os.getpid(), case = case, suite = suite))
if not args.no_system_log_cleanup:
clickhouse_proc = Popen(shlex.split(args.client), stdin=PIPE, stdout=PIPE, stderr=PIPE)
clickhouse_proc.communicate("SYSTEM FLUSH LOGS")
for table in ['query_log', 'query_thread_log', 'trace_log', 'metric_log']:
clickhouse_proc = Popen(shlex.split(args.client), stdin=PIPE, stdout=PIPE, stderr=PIPE)
clickhouse_proc.communicate("TRUNCATE TABLE IF EXISTS system.{}".format(table))
reference_file = os.path.join(suite_dir, name) + '.reference'
stdout_file = os.path.join(suite_tmp_dir, name) + '.stdout'
stderr_file = os.path.join(suite_tmp_dir, name) + '.stderr'
@ -572,7 +564,6 @@ if __name__ == '__main__':
parser.add_argument('--stop', action='store_true', default=None, dest='stop', help='Stop on network errors')
parser.add_argument('--order', default='desc', choices=['asc', 'desc', 'random'], help='Run order')
parser.add_argument('--testname', action='store_true', default=None, dest='testname', help='Make query with test name before test run')
parser.add_argument('--no-system-log-cleanup', action='store_true', default=None, help='Do not cleanup system.*_log tables')
parser.add_argument('--hung-check', action='store_true', default=False)
parser.add_argument('--force-color', action='store_true', default=False)
parser.add_argument('--database', help='Database for tests (random name test_XXXXXX by default)')

View File

@ -360,7 +360,6 @@ def test_max_data_part_size(start_cluster, name, engine):
finally:
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("mt_with_overflow","MergeTree()"),
("replicated_mt_with_overflow","ReplicatedMergeTree('/clickhouse/replicated_mt_with_overflow', '1')",),
@ -455,7 +454,6 @@ def test_background_move(start_cluster, name, engine):
finally:
node1.query("DROP TABLE IF EXISTS {name}".format(name=name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("stopped_moving_mt","MergeTree()"),
("stopped_moving_replicated_mt","ReplicatedMergeTree('/clickhouse/stopped_moving_replicated_mt', '1')",),
@ -722,7 +720,6 @@ def produce_alter_move(node, name):
pass
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("concurrently_altering_mt","MergeTree()"),
("concurrently_altering_replicated_mt","ReplicatedMergeTree('/clickhouse/concurrently_altering_replicated_mt', '1')",),
@ -776,7 +773,6 @@ def test_concurrent_alter_move(start_cluster, name, engine):
finally:
node1.query("DROP TABLE IF EXISTS {name}".format(name=name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("concurrently_dropping_mt","MergeTree()"),
("concurrently_dropping_replicated_mt","ReplicatedMergeTree('/clickhouse/concurrently_dropping_replicated_mt', '1')",),
@ -905,8 +901,6 @@ def test_mutate_to_another_disk(start_cluster, name, engine):
finally:
node1.query("DROP TABLE IF EXISTS {name}".format(name=name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("alter_modifying_mt","MergeTree()"),
("replicated_alter_modifying_mt","ReplicatedMergeTree('/clickhouse/replicated_alter_modifying_mt', '1')",),
@ -939,7 +933,11 @@ def test_concurrent_alter_modify(start_cluster, name, engine):
def alter_modify(num):
for i in range(num):
column_type = random.choice(["UInt64", "String"])
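# ALTERs from several threads may conflict; for Replicated engines such failures are expected
# and ignored (see the except clause below).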
try:
node1.query("ALTER TABLE {} MODIFY COLUMN number {}".format(name, column_type))
except:
if "Replicated" not in engine:
raise
insert(100)

View File

@ -246,6 +246,50 @@ def test_kafka_consumer_hang(kafka_cluster):
# 'dr'||'op' to avoid self matching
assert int(instance.query("select count() from system.processes where position(lower(query),'dr'||'op')>0")) == 0
@pytest.mark.timeout(180)
def test_kafka_consumer_hang2(kafka_cluster):
instance.query('''
DROP TABLE IF EXISTS test.kafka;
CREATE TABLE test.kafka (key UInt64, value UInt64)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:19092',
kafka_topic_list = 'consumer_hang2',
kafka_group_name = 'consumer_hang2',
kafka_format = 'JSONEachRow';
CREATE TABLE test.kafka2 (key UInt64, value UInt64)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:19092',
kafka_topic_list = 'consumer_hang2',
kafka_group_name = 'consumer_hang2',
kafka_format = 'JSONEachRow';
''')
# the first consumer subscribes to the topic, tries to poll some data, and goes idle
instance.query('SELECT * FROM test.kafka')
# the second consumer does the same, leading to a rebalance in the first
# consumer while it tries to poll some data
instance.query('SELECT * FROM test.kafka2')
#echo 'SELECT * FROM test.kafka; SELECT * FROM test.kafka2; DROP TABLE test.kafka;' | clickhouse client -mn &
# kafka_cluster.open_bash_shell('instance')
# the first consumer still has an unprocessed pending rebalance callback (no poll after the select)
# one of the DROP queries below used to fail because of
# https://github.com/edenhill/librdkafka/issues/2077
# https://github.com/edenhill/librdkafka/issues/2898
instance.query('DROP TABLE test.kafka')
instance.query('DROP TABLE test.kafka2')
# from a user perspective: we expect no hanging 'drop' queries
# 'dr'||'op' to avoid self matching
assert int(instance.query("select count() from system.processes where position(lower(query),'dr'||'op')>0")) == 0
@pytest.mark.timeout(180)
def test_kafka_csv_with_delimiter(kafka_cluster):
instance.query('''
@ -1130,6 +1174,7 @@ def test_kafka_rebalance(kafka_cluster):
print(instance.query('SELECT count(), uniqExact(key), max(key) + 1 FROM test.destination'))
# Some queries to debug...
# SELECT * FROM test.destination where key in (SELECT key FROM test.destination group by key having count() <> 1)
# select number + 1 as key from numbers(4141) left join test.destination using (key) where test.destination.key = 0;
# SELECT * FROM test.destination WHERE key between 2360 and 2370 order by key;
@ -1137,6 +1182,18 @@ def test_kafka_rebalance(kafka_cluster):
# select toUInt64(0) as _partition, number + 1 as _offset from numbers(400) left join test.destination using (_partition,_offset) where test.destination.key = 0 order by _offset;
# SELECT * FROM test.destination WHERE _partition = 0 and _offset between 220 and 240 order by _offset;
# CREATE TABLE test.reference (key UInt64, value UInt64) ENGINE = Kafka SETTINGS kafka_broker_list = 'kafka1:19092',
# kafka_topic_list = 'topic_with_multiple_partitions',
# kafka_group_name = 'rebalance_test_group_reference',
# kafka_format = 'JSONEachRow',
# kafka_max_block_size = 100000;
#
# CREATE MATERIALIZED VIEW test.reference_mv Engine=Log AS
# SELECT key, value, _topic,_key,_offset, _partition, _timestamp, 'reference' as _consumed_by
# FROM test.reference;
#
# select * from test.reference_mv left join test.destination using (key,_topic,_offset,_partition) where test.destination._consumed_by = '';
result = int(instance.query('SELECT count() == uniqExact(key) FROM test.destination'))
for consumer_index in range(NUMBER_OF_CONSURRENT_CONSUMERS):

View File

@ -1,6 +1,7 @@
import json
import logging
import random
import threading
import pytest
@ -278,3 +279,31 @@ def test_wrong_s3_syntax(cluster, s3_storage_args):
query = "create table test_table_s3_syntax (id UInt32) ENGINE = S3({})".format(s3_storage_args)
assert expected_err_msg in instance.query_and_get_error(query)
# https://en.wikipedia.org/wiki/One_Thousand_and_One_Nights
def test_s3_glob_scheherazade(cluster):
bucket = cluster.minio_bucket
instance = cluster.instances["dummy"] # type: ClickHouseInstance
table_format = "column1 UInt32, column2 UInt32, column3 UInt32"
max_path = ""
values = "(1, 1, 1)"
nights_per_job = 1001 // 30
jobs = []
for night in range(0, 1001, nights_per_job):
def add_tales(start, end):
for i in range(start, end):
path = "night_{}/tale.csv".format(i)
query = "insert into table function s3('http://{}:{}/{}/{}', 'CSV', '{}') values {}".format(
cluster.minio_host, cluster.minio_port, bucket, path, table_format, values)
run_query(instance, query)
jobs.append(threading.Thread(target=add_tales, args=(night, min(night+nights_per_job, 1001))))
jobs[-1].start()
for job in jobs:
job.join()
query = "select count(), sum(column1), sum(column2), sum(column3) from s3('http://{}:{}/{}/night_*/tale.csv', 'CSV', '{}')".format(
cluster.minio_redirect_host, cluster.minio_redirect_port, bucket, table_format)
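# The glob night_*/tale.csv matches all 1001 uploaded files, each holding (1, 1, 1), so every aggregate equals 1001.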
assert run_query(instance, query).splitlines() == ["1001\t1001\t1001\t1001"]

View File

@ -59,7 +59,6 @@ def get_used_disks_for_table(node, table_name, partition=None):
""".format(name=table_name, suffix=suffix)).strip().split('\n')
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine,alter", [
("mt_test_rule_with_invalid_destination","MergeTree()",0),
("replicated_mt_test_rule_with_invalid_destination","ReplicatedMergeTree('/clickhouse/replicated_test_rule_with_invalid_destination', '1')",0),
@ -119,7 +118,6 @@ def test_rule_with_invalid_destination(started_cluster, name, engine, alter):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine,positive", [
("mt_test_inserts_to_disk_do_not_work","MergeTree()",0),
("replicated_mt_test_inserts_to_disk_do_not_work","ReplicatedMergeTree('/clickhouse/replicated_test_inserts_to_disk_do_not_work', '1')",0),
@ -149,10 +147,12 @@ def test_inserts_to_disk_work(started_cluster, name, engine, positive):
assert node1.query("SELECT count() FROM {name}".format(name=name)).strip() == "10"
finally:
try:
node1.query("DROP TABLE IF EXISTS {}".format(name))
except:
pass
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine,positive", [
("mt_test_moves_to_disk_do_not_work","MergeTree()",0),
("replicated_mt_test_moves_to_disk_do_not_work","ReplicatedMergeTree('/clickhouse/replicated_test_moves_to_disk_do_not_work', '1')",0),
@ -171,7 +171,7 @@ def test_moves_to_disk_work(started_cluster, name, engine, positive):
SETTINGS storage_policy='small_jbod_with_external'
""".format(name=name, engine=engine))
wait_expire_1 = 6
wait_expire_1 = 12
wait_expire_2 = 4
time_1 = time.time() + wait_expire_1
time_2 = time.time() + wait_expire_1 + wait_expire_2
@ -199,7 +199,6 @@ def test_moves_to_disk_work(started_cluster, name, engine, positive):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("mt_test_moves_to_volume_work","MergeTree()"),
("replicated_mt_test_moves_to_volume_work","ReplicatedMergeTree('/clickhouse/replicated_test_moves_to_volume_work', '1')"),
@ -246,7 +245,6 @@ def test_moves_to_volume_work(started_cluster, name, engine):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine,positive", [
("mt_test_inserts_to_volume_do_not_work","MergeTree()",0),
("replicated_mt_test_inserts_to_volume_do_not_work","ReplicatedMergeTree('/clickhouse/replicated_test_inserts_to_volume_do_not_work', '1')",0),
@ -285,7 +283,6 @@ def test_inserts_to_volume_work(started_cluster, name, engine, positive):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("mt_test_moves_to_disk_eventually_work","MergeTree()"),
("replicated_mt_test_moves_to_disk_eventually_work","ReplicatedMergeTree('/clickhouse/replicated_test_moves_to_disk_eventually_work', '1')"),
@ -374,7 +371,6 @@ def test_replicated_download_ttl_info(started_cluster):
continue
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine,positive", [
("mt_test_merges_to_disk_do_not_work","MergeTree()",0),
("replicated_mt_test_merges_to_disk_do_not_work","ReplicatedMergeTree('/clickhouse/replicated_test_merges_to_disk_do_not_work', '1')",0),
@ -396,7 +392,7 @@ def test_merges_to_disk_work(started_cluster, name, engine, positive):
node1.query("SYSTEM STOP MERGES {}".format(name))
node1.query("SYSTEM STOP MOVES {}".format(name))
wait_expire_1 = 10
wait_expire_1 = 16
wait_expire_2 = 4
time_1 = time.time() + wait_expire_1
time_2 = time.time() + wait_expire_1 + wait_expire_2
@ -432,7 +428,6 @@ def test_merges_to_disk_work(started_cluster, name, engine, positive):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("mt_test_merges_with_full_disk_work","MergeTree()"),
("replicated_mt_test_merges_with_full_disk_work","ReplicatedMergeTree('/clickhouse/replicated_test_merges_with_full_disk_work', '1')"),
@ -499,7 +494,6 @@ def test_merges_with_full_disk_work(started_cluster, name, engine):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine,positive", [
("mt_test_moves_after_merges_do_not_work","MergeTree()",0),
("replicated_mt_test_moves_after_merges_do_not_work","ReplicatedMergeTree('/clickhouse/replicated_test_moves_after_merges_do_not_work', '1')",0),
@ -518,7 +512,7 @@ def test_moves_after_merges_work(started_cluster, name, engine, positive):
SETTINGS storage_policy='small_jbod_with_external'
""".format(name=name, engine=engine))
wait_expire_1 = 10
wait_expire_1 = 16
wait_expire_2 = 4
time_1 = time.time() + wait_expire_1
time_2 = time.time() + wait_expire_1 + wait_expire_2
@ -552,7 +546,6 @@ def test_moves_after_merges_work(started_cluster, name, engine, positive):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine,positive,bar", [
("mt_test_moves_after_alter_do_not_work","MergeTree()",0,"DELETE"),
("replicated_mt_test_moves_after_alter_do_not_work","ReplicatedMergeTree('/clickhouse/replicated_test_moves_after_alter_do_not_work', '1')",0,"DELETE"),
@ -658,7 +651,12 @@ def test_materialize_ttl_in_partition(started_cluster, name, engine):
node1.query("DROP TABLE IF EXISTS {}".format(name))
@pytest.mark.skip(reason="Flappy test")
def start_thread(*args, **kwargs):
thread = threading.Thread(*args, **kwargs)
thread.start()
return thread
@pytest.mark.parametrize("name,engine,positive", [
("mt_test_alter_multiple_ttls_positive", "MergeTree()", True),
("mt_replicated_test_alter_multiple_ttls_positive", "ReplicatedMergeTree('/clickhouse/replicated_test_alter_multiple_ttls_positive', '1')", True),
@ -689,6 +687,8 @@ limitations under the License."""
"""
now = time.time()
try:
sleeps = { delay : start_thread(target=time.sleep, args=(delay,)) for delay in [16, 26] }
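# Start wall-clock timers up front; joining them later waits only for the time remaining,
# instead of stacking fixed sleeps on top of the work already done.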
node1.query("""
CREATE TABLE {name} (
p1 Int64,
@ -697,16 +697,16 @@ limitations under the License."""
) ENGINE = {engine}
ORDER BY tuple()
PARTITION BY p1
TTL d1 + INTERVAL 30 SECOND TO DISK 'jbod2',
d1 + INTERVAL 60 SECOND TO VOLUME 'external'
TTL d1 + INTERVAL 34 SECOND TO DISK 'jbod2',
d1 + INTERVAL 64 SECOND TO VOLUME 'external'
SETTINGS storage_policy='jbods_with_external', merge_with_ttl_timeout=0
""".format(name=name, engine=engine))
node1.query("""
ALTER TABLE {name} MODIFY
TTL d1 + INTERVAL 0 SECOND TO DISK 'jbod2',
d1 + INTERVAL 5 SECOND TO VOLUME 'external',
d1 + INTERVAL 10 SECOND DELETE
d1 + INTERVAL 14 SECOND TO VOLUME 'external',
d1 + INTERVAL 24 SECOND DELETE
""".format(name=name))
for p in range(3):
@ -724,14 +724,14 @@ limitations under the License."""
assert node1.query("SELECT count() FROM {name}".format(name=name)).splitlines() == ["6"]
time.sleep(5)
sleeps[16].join()
used_disks = get_used_disks_for_table(node1, name)
assert set(used_disks) == {"external"} if positive else {"jbod1", "jbod2"}
assert node1.query("SELECT count() FROM {name}".format(name=name)).splitlines() == ["6"]
time.sleep(5)
sleeps[26].join()
node1.query("OPTIMIZE TABLE {name} FINAL".format(name=name))
@ -741,7 +741,6 @@ limitations under the License."""
node1.query("DROP TABLE IF EXISTS {name}".format(name=name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,engine", [
("concurrently_altering_ttl_mt","MergeTree()"),
("concurrently_altering_ttl_replicated_mt","ReplicatedMergeTree('/clickhouse/concurrently_altering_ttl_replicated_mt', '1')",),
@ -792,7 +791,7 @@ def test_concurrent_alter_with_ttl_move(started_cluster, name, engine):
try:
node1.query("ALTER TABLE {} MOVE {mt} {mp} TO {md} {mv}".format(
name, mt=move_type, mp=move_part, md=move_disk, mv=move_volume))
except QueryRuntimeException as ex:
except QueryRuntimeException:
pass
for i in range(num):
@ -809,7 +808,10 @@ def test_concurrent_alter_with_ttl_move(started_cluster, name, engine):
what = random.choice(["TO VOLUME 'main'", "TO VOLUME 'external'", "TO DISK 'jbod1'", "TO DISK 'jbod2'", "TO DISK 'external'"])
when = "now()+{}".format(random.randint(-1, 5))
ttls.append("{} {}".format(when, what))
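# Concurrent TTL modifications can conflict with merges and moves; such failures are expected here and ignored.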
try:
node1.query("ALTER TABLE {} MODIFY TTL {}".format(name, ", ".join(ttls)))
except QueryRuntimeException:
pass
def optimize_table(num):
for i in range(num):
@ -832,7 +834,6 @@ def test_concurrent_alter_with_ttl_move(started_cluster, name, engine):
finally:
node1.query("DROP TABLE IF EXISTS {name}".format(name=name))
@pytest.mark.skip(reason="Flappy test")
@pytest.mark.parametrize("name,positive", [
("test_double_move_while_select_negative", 0),
("test_double_move_while_select_positive", 1),
@ -870,6 +871,8 @@ def test_double_move_while_select(started_cluster, name, positive):
node1.query("INSERT INTO {name} VALUES (3, '{string}')".format(name=name, string=get_random_string(9 * 1024 * 1024)))
node1.query("INSERT INTO {name} VALUES (4, '{string}')".format(name=name, string=get_random_string(9 * 1024 * 1024)))
time.sleep(1)
# If the SELECT locked the old part on 'external', the move should fail.
assert node1.query("SELECT disk_name FROM system.parts WHERE table = '{name}' AND active = 1 AND name = '{part}'"
.format(name=name, part=parts[0])).splitlines() == ["jbod1" if positive else "external"]

View File

@ -47,7 +47,7 @@ SELECT
threads_realtime >= threads_time_user_system_io,
any(length(thread_ids)) >= 1
FROM
(SELECT * FROM system.query_log PREWHERE query='$heavy_cpu_query' WHERE type='QueryFinish' ORDER BY event_time DESC LIMIT 1)
(SELECT * FROM system.query_log PREWHERE query='$heavy_cpu_query' WHERE event_date >= today()-1 AND type=2 ORDER BY event_time DESC LIMIT 1)
ARRAY JOIN ProfileEvents.Names AS PN, ProfileEvents.Values AS PV"
# Check per-thread and per-query ProfileEvents consistency
@ -58,7 +58,7 @@ SELECT PN, PVq, PVt FROM
SELECT PN, sum(PV) AS PVt
FROM system.query_thread_log
ARRAY JOIN ProfileEvents.Names AS PN, ProfileEvents.Values AS PV
WHERE query_id='$query_id'
WHERE event_date >= today()-1 AND query_id='$query_id'
GROUP BY PN
) js1
ANY INNER JOIN
@ -66,7 +66,7 @@ ANY INNER JOIN
SELECT PN, PV AS PVq
FROM system.query_log
ARRAY JOIN ProfileEvents.Names AS PN, ProfileEvents.Values AS PV
WHERE query_id='$query_id'
WHERE event_date >= today()-1 AND query_id='$query_id'
) js2
USING PN
WHERE

View File

@ -19,7 +19,7 @@ $CLICKHOUSE_CLIENT --use_uncompressed_cache=1 --query_id="test-query-uncompresse
sleep 1
$CLICKHOUSE_CLIENT --query="SYSTEM FLUSH LOGS"
$CLICKHOUSE_CLIENT --query="SELECT ProfileEvents.Values[indexOf(ProfileEvents.Names, 'Seek')], ProfileEvents.Values[indexOf(ProfileEvents.Names, 'ReadCompressedBytes')], ProfileEvents.Values[indexOf(ProfileEvents.Names, 'UncompressedCacheHits')] AS hit FROM system.query_log WHERE (query_id = 'test-query-uncompressed-cache') AND (type = 'QueryFinish') ORDER BY event_time DESC LIMIT 1"
$CLICKHOUSE_CLIENT --query="SELECT ProfileEvents.Values[indexOf(ProfileEvents.Names, 'Seek')], ProfileEvents.Values[indexOf(ProfileEvents.Names, 'ReadCompressedBytes')], ProfileEvents.Values[indexOf(ProfileEvents.Names, 'UncompressedCacheHits')] AS hit FROM system.query_log WHERE (query_id = 'test-query-uncompressed-cache') AND (type = 2) AND event_date >= yesterday() ORDER BY event_time DESC LIMIT 1"
$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS small_table"

View File

@ -95,7 +95,7 @@ echo 7
# and finally querylog
$CLICKHOUSE_CLIENT \
--server_logs_file=/dev/null \
--query="select * from system.query_log where query like '%TOPSECRET%';"
--query="select * from system.query_log where event_time>now() - 10 and query like '%TOPSECRET%';"
rm -f $tmp_file >/dev/null 2>&1
@ -117,8 +117,8 @@ sleep 0.1;
echo 9
$CLICKHOUSE_CLIENT \
--server_logs_file=/dev/null \
--query="SELECT if( count() > 0, 'text_log non empty', 'text_log empty') FROM system.text_log WHERE message like '%find_me%';
select * from system.text_log where message like '%TOPSECRET=TOPSECRET%';" --ignore-error --multiquery
--query="SELECT if( count() > 0, 'text_log non empty', 'text_log empty') FROM system.text_log WHERE event_time>now() - 60 and message like '%find_me%';
select * from system.text_log where event_time>now() - 60 and message like '%TOPSECRET=TOPSECRET%';" --ignore-error --multiquery
echo 'finish'
rm -f $tmp_file >/dev/null 2>&1

View File

@ -10,7 +10,7 @@ do
${CLICKHOUSE_CLIENT} --query="SYSTEM FLUSH LOGS"
sleep 0.1;
if [[ $($CLICKHOUSE_CURL -sS "$CLICKHOUSE_URL" -d "SELECT count() > 0 FROM system.text_log WHERE position(system.text_log.message, 'SELECT 6103') > 0") == 1 ]]; then echo 1; exit; fi;
if [[ $($CLICKHOUSE_CURL -sS "$CLICKHOUSE_URL" -d "SELECT count() > 0 FROM system.text_log WHERE position(system.text_log.message, 'SELECT 6103') > 0 AND event_date >= yesterday()") == 1 ]]; then echo 1; exit; fi;
done;

View File

@ -3,5 +3,5 @@ SELECT * FROM test_table_for_01070_exception_code_in_query_log_table; -- { serve
CREATE TABLE test_table_for_01070_exception_code_in_query_log_table (value UInt64) ENGINE=Memory();
SELECT * FROM test_table_for_01070_exception_code_in_query_log_table;
SYSTEM FLUSH LOGS;
SELECT exception_code FROM system.query_log WHERE query = 'SELECT * FROM test_table_for_01070_exception_code_in_query_log_table' ORDER BY exception_code;
SELECT exception_code FROM system.query_log WHERE query = 'SELECT * FROM test_table_for_01070_exception_code_in_query_log_table' AND event_date >= yesterday() AND event_time > now() - INTERVAL 5 MINUTE ORDER BY exception_code;
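-- The event_date/event_time filters limit the scan to fresh system.query_log entries (the table is
-- typically partitioned by month of event_date), so stale rows from earlier runs do not interfere.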
DROP TABLE IF EXISTS test_table_for_01070_exception_code_in_query_log_table;

View File

@ -8,13 +8,13 @@ WITH
(
SELECT query_id
FROM system.query_log
WHERE (query = 'SELECT 1')
WHERE (query = 'SELECT 1') AND (event_date >= (today() - 1))
ORDER BY event_time DESC
LIMIT 1
) AS id
SELECT uniqExact(thread_id)
FROM system.query_thread_log
WHERE (query_id = id) AND (thread_id != master_thread_id);
WHERE (event_date >= (today() - 1)) AND (query_id = id) AND (thread_id != master_thread_id);
select sum(number) from numbers(1000000);
SYSTEM FLUSH LOGS;
@ -23,13 +23,13 @@ WITH
(
SELECT query_id
FROM system.query_log
WHERE (query = 'SELECT sum(number) FROM numbers(1000000)')
WHERE (query = 'SELECT sum(number) FROM numbers(1000000)') AND (event_date >= (today() - 1))
ORDER BY event_time DESC
LIMIT 1
) AS id
SELECT uniqExact(thread_id)
FROM system.query_thread_log
WHERE (query_id = id) AND (thread_id != master_thread_id);
WHERE (event_date >= (today() - 1)) AND (query_id = id) AND (thread_id != master_thread_id);
select sum(number) from numbers_mt(1000000);
SYSTEM FLUSH LOGS;
@ -38,10 +38,10 @@ WITH
(
SELECT query_id
FROM system.query_log
WHERE (query = 'SELECT sum(number) FROM numbers_mt(1000000)')
WHERE (query = 'SELECT sum(number) FROM numbers_mt(1000000)') AND (event_date >= (today() - 1))
ORDER BY event_time DESC
LIMIT 1
) AS id
SELECT uniqExact(thread_id) > 2
FROM system.query_thread_log
WHERE (query_id = id) AND (thread_id != master_thread_id);
WHERE (event_date >= (today() - 1)) AND (query_id = id) AND (thread_id != master_thread_id);

View File

@ -3,4 +3,4 @@ SET allow_introspection_functions = 1;
SET memory_profiler_step = 1000000;
SELECT ignore(groupArray(number), 'test memory profiler') FROM numbers(10000000);
SYSTEM FLUSH LOGS;
WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE trace_type = 'Memory' AND query_id = (SELECT query_id FROM system.query_log WHERE query LIKE '%test memory profiler%' ORDER BY event_time DESC LIMIT 1);
WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE event_date >= yesterday() AND trace_type = 'Memory' AND query_id = (SELECT query_id FROM system.query_log WHERE event_date >= yesterday() AND query LIKE '%test memory profiler%' ORDER BY event_time DESC LIMIT 1);

View File

@ -3,4 +3,4 @@
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
$CLICKHOUSE_CLIENT --quota_key Hello --query_id test_quota_key --log_queries 1 --multiquery --query "SELECT 1; SYSTEM FLUSH LOGS; SELECT DISTINCT quota_key FROM system.query_log WHERE query_id = 'test_quota_key'"
$CLICKHOUSE_CLIENT --quota_key Hello --query_id test_quota_key --log_queries 1 --multiquery --query "SELECT 1; SYSTEM FLUSH LOGS; SELECT DISTINCT quota_key FROM system.query_log WHERE event_date >= yesterday() AND event_time >= now() - 300 AND query_id = 'test_quota_key'"

View File

@ -2,14 +2,14 @@ set log_queries=1;
select '01231_log_queries_min_type/QUERY_START';
system flush logs;
select count() from system.query_log where query like '%01231_log_queries_min_type/%' and query not like '%system.query_log%';
select count() from system.query_log where query like '%01231_log_queries_min_type/%' and query not like '%system.query_log%' and event_date = today() and event_time >= now() - interval 1 minute;
set log_queries_min_type='EXCEPTION_BEFORE_START';
select '01231_log_queries_min_type/EXCEPTION_BEFORE_START';
system flush logs;
select count() from system.query_log where query like '%01231_log_queries_min_type/%' and query not like '%system.query_log%';
select count() from system.query_log where query like '%01231_log_queries_min_type/%' and query not like '%system.query_log%' and event_date = today() and event_time >= now() - interval 1 minute;
set log_queries_min_type='EXCEPTION_WHILE_PROCESSING';
select '01231_log_queries_min_type/', max(number) from system.numbers limit 1e6 settings max_rows_to_read='100K'; -- { serverError 158; }
system flush logs;
select count() from system.query_log where query like '%01231_log_queries_min_type/%' and query not like '%system.query_log%';
select count() from system.query_log where query like '%01231_log_queries_min_type/%' and query not like '%system.query_log%' and event_date = today() and event_time >= now() - interval 1 minute;

View File

@ -0,0 +1,21 @@
DROP TABLE IF EXISTS data_01283;
CREATE TABLE data_01283 engine=MergeTree()
ORDER BY key
PARTITION BY key
AS SELECT number key FROM numbers(10);
SET log_queries=1;
SELECT * FROM data_01283 LIMIT 1 FORMAT Null;
SET log_queries=0;
SYSTEM FLUSH LOGS;
-- 1 for PullingAsyncPipelineExecutor::pull
-- 1 for AsynchronousBlockInputStream
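-- throwIf() raises a server-side exception when its condition is non-zero, so a regression fails
-- the test loudly instead of silently changing the reference output.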
SELECT
throwIf(count() != 1, 'no query was logged'),
throwIf(length(thread_ids) != 2, 'too many threads used')
FROM system.query_log
WHERE type = 'QueryFinish' AND query LIKE '%data_01283 LIMIT 1%'
GROUP BY thread_ids
FORMAT Null;

View File

@ -1,3 +1 @@
# -*- coding: utf-8 -*-
# REMOVE ME

View File

@ -7,15 +7,15 @@
- All pull-requests must be squash-merged or explicitly merged without rebase.
- All pull-requests to master must have at least one label prefixed with `pr-`.
- Labels that require a pull-request to be backported must be colored red (#ff0000).
- Stable branch name must be of form `YY.NUMBER`.
- All stable branches must be forked directly from the master branch and never be merged back,
- Release branch names must be of the form `YY.NUMBER`.
- All release branches must be forked directly from the master branch and never merged back,
nor merged with any other branch based on the master branch (including the master branch itself).
Output of this script:
- Commits without references from pull-requests.
- Pull-requests to master without proper labels.
- Pull-requests that need to be backported, with statuses per stable branch.
- Pull-requests that need to be backported, with statuses per release branch.
'''
@ -29,7 +29,7 @@ import sys
try:
from termcolor import colored # `pip install termcolor`
except ImportError:
sys.exit("Package 'termcolor' not found. Try run: `pip3 install termcolor`")
sys.exit("Package 'termcolor' not found. Try run: `pip3 install [--user] termcolor`")
CHECK_MARK = colored('🗸', 'green')
@ -45,8 +45,6 @@ parser.add_argument('--repo', '-r', type=str, default='', metavar='PATH',
help='path to the root of the ClickHouse repository')
parser.add_argument('--remote', type=str, default='origin',
help='remote name of the "ClickHouse/ClickHouse" upstream')
parser.add_argument('-n', type=int, default=3, dest='number',
help='number of last stable branches to consider')
parser.add_argument('--token', type=str, required=True,
help='token for Github access')
parser.add_argument('--login', type=str,
@ -54,31 +52,46 @@ parser.add_argument('--login', type=str,
parser.add_argument('--auto-label', action='store_true', dest='autolabel', default=True,
help='try to automatically parse PR description and put labels')
# Either select last N release branches, or specify them manually.
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument('-n', type=int, default=3, dest='number',
help='number of last release branches to consider')
group.add_argument('--branch', type=str, action='append', metavar='BRANCH',
help='specific release branch name to consider')
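# For example, pass either `-n 5` (the five most recent release branches) or one or more
# `--branch 20.3` options; the two are mutually exclusive and one of them must be given.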
args = parser.parse_args()
github = query.Query(args.token, 30)
repo = local.Local(args.repo, args.remote, github.get_default_branch())
stables = repo.get_stables()[-args.number:] # [(branch name, base)]
if not stables:
if not args.branch:
release_branches = repo.get_release_branches()[-args.number:] # [(branch name, base)]
else:
release_branches = []
all_release_branches = repo.get_release_branches()
for branch in all_release_branches:
if branch[0] in args.branch:
release_branches.append(branch)
if not release_branches:
sys.exit('No release branches found!')
else:
print('Found release branches:')
for stable in stables:
print(f'{CHECK_MARK} {stable[0]} forked from {stable[1]}')
for branch in release_branches:
print(f'{CHECK_MARK} {branch[0]} forked from {branch[1]}')
first_commit = stables[0][1]
first_commit = release_branches[0][1]
pull_requests = github.get_pull_requests(first_commit, args.login)
good_commits = set(pull_request['mergeCommit']['oid'] for pull_request in pull_requests)
bad_commits = [] # collect and print them in the end
from_commit = repo.get_head_commit()
for i in reversed(range(len(stables))):
for commit in repo.iterate(from_commit, stables[i][1]):
for i in reversed(range(len(release_branches))):
for commit in repo.iterate(from_commit, release_branches[i][1]):
if str(commit) not in good_commits and commit.author.name != 'robot-clickhouse':
bad_commits.append(commit)
from_commit = stables[i][1]
from_commit = release_branches[i][1]
members = set(github.get_members("ClickHouse", "ClickHouse"))
def print_responsible(pull_request):
@ -146,22 +159,22 @@ if need_backporting:
no_backport_labeled = set()
wait = set()
for stable in stables:
if repo.comparator(stable[1]) < repo.comparator(pull_request['mergeCommit']['oid']):
targets.append(stable[0])
for branch in release_branches:
if repo.comparator(branch[1]) < repo.comparator(pull_request['mergeCommit']['oid']):
targets.append(branch[0])
# FIXME: compatibility logic - check for a manually set label, that indicates status 'backported'.
# FIXME: O(n²) - no need to iterate all labels for every `stable`
# FIXME: O(n²) - no need to iterate all labels for every `branch`
for label in github.get_labels(pull_request):
if re_vlabel.match(label['name']) or re_vlabel_backported.match(label['name']):
if f'v{stable[0]}' == label['name'] or f'v{stable[0]}-backported' == label['name']:
backport_labeled.add(stable[0])
if f'v{branch[0]}' == label['name'] or f'v{branch[0]}-backported' == label['name']:
backport_labeled.add(branch[0])
if re_vlabel_conflicts.match(label['name']):
if f'v{stable[0]}-conflicts' == label['name']:
conflict_labeled.add(stable[0])
if f'v{branch[0]}-conflicts' == label['name']:
conflict_labeled.add(branch[0])
if re_vlabel_no_backport.match(label['name']):
if f'v{stable[0]}-no-backport' == label['name']:
no_backport_labeled.add(stable[0])
if f'v{branch[0]}-no-backport' == label['name']:
no_backport_labeled.add(branch[0])
for event in github.get_timeline(pull_request):
if(event['isCrossRepository'] or

View File

@ -1,7 +1,9 @@
# -*- coding: utf-8 -*-
# `pip install …`
import git # gitpython
import sys # needed for sys.exit below (assumption: not already imported elsewhere in this file)
try:
import git # `pip3 install gitpython`
except ImportError:
sys.exit("Package 'gitpython' not found. Try running: `pip3 install [--user] gitpython`")
import functools
import os
@ -11,7 +13,7 @@ import re
class Local:
'''Implements some useful methods atop the local repository
'''
RE_STABLE_REF = re.compile(r'^refs/remotes/.+/\d+\.\d+$')
RE_RELEASE_BRANCH_REF = re.compile(r'^refs/remotes/.+/\d+\.\d+$')
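# Matches remote refs such as 'refs/remotes/origin/20.3'; other refs (e.g. feature branches) are ignored.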
def __init__(self, repo_path, remote_name, default_branch_name):
self._repo = git.Repo(repo_path, search_parent_directories=(not repo_path))
@ -42,16 +44,16 @@ class Local:
* head (git.Commit)).
List is sorted by commits in ascending order.
'''
def get_stables(self):
stables = []
def get_release_branches(self):
release_branches = []
for stable in [r for r in self._remote.refs if Local.RE_STABLE_REF.match(r.path)]:
base = self._repo.merge_base(self._default, self._repo.commit(stable))
for branch in [r for r in self._remote.refs if Local.RE_RELEASE_BRANCH_REF.match(r.path)]:
base = self._repo.merge_base(self._default, self._repo.commit(branch))
if not base:
print(f'Branch {stable.path} is not based on branch {self._default}. Ignoring.')
print(f'Branch {branch.path} is not based on branch {self._default}. Ignoring.')
elif len(base) > 1:
print(f'Branch {stable.path} has more than one base commit. Ignoring.')
print(f'Branch {branch.path} has more than one base commit. Ignoring.')
else:
stables.append((os.path.basename(stable.name), base[0]))
release_branches.append((os.path.basename(branch.name), base[0]))
return sorted(stables, key=lambda x : self.comparator(x[1]))
return sorted(release_branches, key=lambda x : self.comparator(x[1]))

View File

@ -1,7 +1,7 @@
{% set description = description or _('ClickHouse is a fast open-source column-oriented database management system that allows generating analytical data reports in real-time using SQL queries') %}
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta name="viewport" content="width=device-width,initial-scale=1" />
<title>{% if title %}{{ title }}{% else %}{{ _('ClickHouse - fast open-source OLAP DBMS') }}{% endif %}</title>
@ -14,7 +14,7 @@
<meta property="og:url" content="{{ url or 'https://clickhouse.tech/' }}"/>
<link rel="canonical" href="{{ canonical_url or 'https://clickhouse.tech/' }}" />
{% if page and not single_page %}
<link rel="amphtml" href="{{ url or 'https://clickhouse.tech/' }}amp/">
<link rel="amphtml" href="{{ url or 'https://clickhouse.tech/' }}amp/" />
{% endif %}
<link rel="search" href="/opensearch.xml" title="ClickHouse" type="application/opensearchdescription+xml" />
{% include "templates/docs/ld_json.html" %}