Merge branch 'master' into windowview-multi-column-groupby

This commit is contained in:
mergify[bot] 2022-02-26 00:50:49 +00:00 committed by GitHub
commit 8d84d22618
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
96 changed files with 1459 additions and 737 deletions

View File

@ -27,6 +27,8 @@ jobs:
- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
with:
author: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
committer: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
commit-message: Update version_date.tsv after ${{ env.GITHUB_TAG }}
branch: auto/${{ env.GITHUB_TAG }}
delete-branch: true

View File

@ -3,25 +3,25 @@ compilers and build settings. A correctly configured Docker daemon is the single dependency.
Usage:
Build deb package with `clang-11` in `debug` mode:
Build deb package with `clang-14` in `debug` mode:
```
$ mkdir deb/test_output
$ ./packager --output-dir deb/test_output/ --package-type deb --compiler=clang-11 --build-type=debug
$ ./packager --output-dir deb/test_output/ --package-type deb --compiler=clang-14 --build-type=debug
$ ls -l deb/test_output
-rw-r--r-- 1 root root 3730 clickhouse-client_18.14.2+debug_all.deb
-rw-r--r-- 1 root root 84221888 clickhouse-common-static_18.14.2+debug_amd64.deb
-rw-r--r-- 1 root root 255967314 clickhouse-common-static-dbg_18.14.2+debug_amd64.deb
-rw-r--r-- 1 root root 14940 clickhouse-server_18.14.2+debug_all.deb
-rw-r--r-- 1 root root 340206010 clickhouse-server-base_18.14.2+debug_amd64.deb
-rw-r--r-- 1 root root 7900 clickhouse-server-common_18.14.2+debug_all.deb
-rw-r--r-- 1 root root 3730 clickhouse-client_22.2.2+debug_all.deb
-rw-r--r-- 1 root root 84221888 clickhouse-common-static_22.2.2+debug_amd64.deb
-rw-r--r-- 1 root root 255967314 clickhouse-common-static-dbg_22.2.2+debug_amd64.deb
-rw-r--r-- 1 root root 14940 clickhouse-server_22.2.2+debug_all.deb
-rw-r--r-- 1 root root 340206010 clickhouse-server-base_22.2.2+debug_amd64.deb
-rw-r--r-- 1 root root 7900 clickhouse-server-common_22.2.2+debug_all.deb
```
Build ClickHouse binary with `clang-11` and `address` sanitizer in `relwithdebuginfo`
Build ClickHouse binary with `clang-14` and `address` sanitizer in `relwithdebuginfo`
mode:
```
$ mkdir $HOME/some_clickhouse
$ ./packager --output-dir=$HOME/some_clickhouse --package-type binary --compiler=clang-11 --sanitizer=address
$ ./packager --output-dir=$HOME/some_clickhouse --package-type binary --compiler=clang-14 --sanitizer=address
$ ls -l $HOME/some_clickhouse
-rwxr-xr-x 1 root root 787061952 clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-benchmark -> clickhouse

View File

@ -322,7 +322,7 @@ std::string getName() const override { return "Memory"; }
class StorageMemory : public IStorage
```
**4.** `using` aliases are named the same way as classes, or with `_t` on the end.
**4.** `using` aliases are named the same way as classes.
**5.** Names of template type arguments: in simple cases, use `T`; `T`, `U`; `T1`, `T2`.
@ -490,7 +490,7 @@ if (0 != close(fd))
throwFromErrno("Cannot close file " + file_name, ErrorCodes::CANNOT_CLOSE_FILE);
```
`Do not use assert`.
You can use assert to check invariants in code.
**4.** Exception types.
@ -571,7 +571,7 @@ Don't use these types for numbers: `signed/unsigned long`, `long long`, `short
**13.** Passing arguments.
Pass complex values by reference (including `std::string`).
Pass complex values by value if they are going to be moved, and use `std::move`; pass by reference if you want to update the value in a loop.
If a function captures ownership of an object created in the heap, make the argument type `shared_ptr` or `unique_ptr`.
@ -581,7 +581,7 @@ In most cases, just use `return`. Do not write `return std::move(res)`.
If the function allocates an object on heap and returns it, use `shared_ptr` or `unique_ptr`.
In rare cases you might need to return the value via an argument. In this case, the argument should be a reference.
In rare cases (updating a value in a loop) you might need to return the value via an argument. In this case, the argument should be a reference.
``` cpp
using AggregateFunctionPtr = std::shared_ptr<IAggregateFunction>;

View File

@ -887,6 +887,57 @@ S3 disk can be configured as `main` or `cold` storage:
With the `cold` option, data can be moved to S3 when the free space on the local disk becomes smaller than `move_factor * disk_size`, or by a TTL move rule.
## Using Azure Blob Storage for Data Storage {#table_engine-mergetree-azure-blob-storage}
`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.
As of February 2022, this feature is still a fresh addition, so some Azure Blob Storage functionality may not be implemented yet.
Configuration markup:
``` xml
<storage_configuration>
...
<disks>
<blob_storage_disk>
<type>azure_blob_storage</type>
<storage_account_url>http://account.blob.core.windows.net</storage_account_url>
<container_name>container</container_name>
<account_name>account</account_name>
<account_key>pass123</account_key>
<metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
<cache_enabled>true</cache_enabled>
<cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
<skip_access_check>false</skip_access_check>
</blob_storage_disk>
</disks>
...
</storage_configuration>
```
Connection parameters:
* `storage_account_url` - **Required**, Azure Blob Storage account URL, like `http://account.blob.core.windows.net` or `http://azurite1:10000/devstoreaccount1`.
* `container_name` - Target container name, defaults to `default-container`.
* `container_already_exists` - If set to `false`, a new container `container_name` is created in the storage account; if set to `true`, the disk connects to the container directly; if left unset, the disk connects to the account, checks whether the container `container_name` exists, and creates it if it does not yet exist.
Authentication parameters (the disk will try all available methods **and** Managed Identity Credential):
* `connection_string` - For authentication using a connection string.
* `account_name` and `account_key` - For authentication using Shared Key.
Limit parameters (mainly for internal usage):
* `max_single_part_upload_size` - Limits the size of a single block upload to Blob Storage.
* `min_bytes_for_seek` - Limits the size of a seekable region.
* `max_single_read_retries` - Limits the number of attempts to read a chunk of data from Blob Storage.
* `max_single_download_retries` - Limits the number of attempts to download a readable buffer from Blob Storage.
* `thread_pool_size` - Limits the number of threads with which `IDiskRemote` is instantiated.
Other parameters:
* `metadata_path` - Path on the local FS where metadata files for Blob Storage are kept. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
* `cache_enabled` - Allows caching mark and index files on the local FS. Default value is `true`.
* `cache_path` - Path on the local FS where cached mark and index files are stored. Default value is `/var/lib/clickhouse/disks/<disk_name>/cache/`.
* `skip_access_check` - If true, disk access checks are not performed on disk start-up. Default value is `false`.
Examples of working configurations can be found in the integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).
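The disk is then referenced from a storage policy in the server configuration, the same way as for S3. As a minimal sketch (assuming a policy named `azure_policy` has been declared over `blob_storage_disk`; the policy and table names below are illustrative, not part of this change), a table can be placed on it like this:
```sql
-- Hypothetical table stored on the azure_blob_storage-backed disk via a storage policy.
CREATE TABLE azure_test
(
    `id` UInt64,
    `value` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'azure_policy';

INSERT INTO azure_test VALUES (1, 'hello');
SELECT * FROM azure_test;
```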
## Virtual Columns {#virtual-columns}
- `_part` — Name of a part.

View File

@ -143,6 +143,10 @@ Features:
- Backup and restore.
- RBAC.
### Zeppelin-Interpreter-for-ClickHouse {#zeppelin-interpreter-for-clickhouse}
[Zeppelin-Interpreter-for-ClickHouse](https://github.com/SiderZhang/Zeppelin-Interpreter-for-ClickHouse) is a [Zeppelin](https://zeppelin.apache.org) interpreter for ClickHouse. Compared with the JDBC interpreter, it provides better timeout control for long-running queries.
## Commercial {#commercial}
### DataGrip {#datagrip}

View File

@ -5,7 +5,7 @@ toc_title: Array(T)
# Array(t) {#data-type-array}
An array of `T`-type items. `T` can be any data type, including an array.
An array of `T`-type items, indexed starting from 1. `T` can be any data type, including an array.
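A quick illustration of the 1-based indexing (a minimal sketch, not taken from the documented examples):
```sql
-- Array elements are addressed starting from index 1.
SELECT [10, 20, 30] AS arr, arr[1] AS first, arr[3] AS last;
-- first = 10, last = 30
```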
## Creating an Array {#creating-an-array}

View File

@ -7,6 +7,8 @@ toc_title: Date
A date. Stored in two bytes as the number of days since 1970-01-01 (unsigned). Allows storing values from just after the beginning of the Unix Epoch to the upper threshold defined by a constant at the compilation stage (currently, this is until the year 2149, but the final fully-supported year is 2148).
Supported range of values: \[1970-01-01, 2149-06-06\].
The date value is stored without the time zone.
**Example**

View File

@ -13,7 +13,7 @@ Syntax:
DateTime([timezone])
```
Supported range of values: \[1970-01-01 00:00:00, 2105-12-31 23:59:59\].
Supported range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
Resolution: 1 second.
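The new upper bound matches the maximum of the underlying UInt32 seconds counter (4294967295 seconds after the Unix epoch). A quick check, assuming a UTC time zone:
```sql
-- 4294967295 is the largest value a UInt32 Unix timestamp can hold.
SELECT toDateTime(4294967295, 'UTC') AS max_datetime;
-- Returns 2106-02-07 06:28:15
```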

View File

@ -18,7 +18,7 @@ DateTime64(precision, [timezone])
Internally, stores data as a number of ticks since epoch start (1970-01-01 00:00:00 UTC) as Int64. The tick resolution is determined by the precision parameter. Additionally, the `DateTime64` type can store a time zone that is the same for the entire column, which affects how values of the `DateTime64` type are displayed in text format and how values specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table (or in the resultset), but is stored in the column metadata. See details in [DateTime](../../sql-reference/data-types/datetime.md).
Supported range from January 1, 1925 till November 11, 2283.
Supported range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: The precision of the maximum value is 8).
## Examples {#examples}

View File

@ -127,7 +127,7 @@ ARRAY JOIN [1, 2, 3] AS arr_external;
└─────────────┴──────────────┘
```
Multiple arrays can be comma-separated in the `ARRAY JOIN` clause. In this case, `JOIN` is performed with them simultaneously (the direct sum, not the cartesian product). Note that all the arrays must have the same size. Example:
Multiple arrays can be comma-separated in the `ARRAY JOIN` clause. In this case, `JOIN` is performed with them simultaneously (the direct sum, not the cartesian product). Note that all the arrays must have the same size by default. Example:
``` sql
SELECT s, arr, a, num, mapped
@ -162,6 +162,25 @@ ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num;
│ World │ [3,4,5] │ 5 │ 3 │ [1,2,3] │
└───────┴─────────┴───┴─────┴─────────────────────┘
```
Multiple arrays with different sizes can be joined by using `SETTINGS enable_unaligned_array_join = 1`. Example:
```sql
SELECT s, arr, a, b
FROM arrays_test ARRAY JOIN arr as a, [['a','b'],['c']] as b
SETTINGS enable_unaligned_array_join = 1;
```
```text
┌─s───────┬─arr─────┬─a─┬─b─────────┐
│ Hello │ [1,2] │ 1 │ ['a','b'] │
│ Hello │ [1,2] │ 2 │ ['c'] │
│ World │ [3,4,5] │ 3 │ ['a','b'] │
│ World │ [3,4,5] │ 4 │ ['c'] │
│ World │ [3,4,5] │ 5 │ [] │
│ Goodbye │ [] │ 0 │ ['a','b'] │
│ Goodbye │ [] │ 0 │ ['c'] │
└─────────┴─────────┴───┴───────────┘
```
## ARRAY JOIN with Nested Data Structure {#array-join-with-nested-data-structure}

View File

@ -7,6 +7,8 @@ toc_title: "\u65E5\u4ED8"
A date. Stored as the number of days since 1970-01-01 in two bytes as an unsigned integer. Allows storing values from just after the start of the Unix epoch up to an upper threshold defined as a constant at the compilation stage (currently up to the year 2106, but the last fully supported year is 2105).
Supported range of values: \[1970-01-01, 2149-06-06\].
The date value is stored without the time zone.
[元の記事](https://clickhouse.com/docs/en/data_types/date/) <!--hide-->

View File

@ -15,7 +15,7 @@ toc_title: DateTime
DateTime([timezone])
```
Supported range of values: \[1970-01-01 00:00:00, 2105-12-31 23:59:59\].
Supported range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
Resolution: 1 second.

View File

@ -19,6 +19,8 @@ DateTime64(precision, [timezone])
Internally, stores data as the number of 'ticks' since the start of the epoch (1970-01-01 00:00:00 UTC) as Int64. The tick resolution is determined by the precision parameter. In addition, the `DateTime64` type can store a time zone that is the same for the entire column; it affects how values of the `DateTime64` type are displayed in text format and how values specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table (or in the result set), but in the column metadata. See [DateTime](datetime.md) for details.
Supported range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: the precision of the maximum value is 8).
## Examples {#examples}
**1.** Creating a table with a `DateTime64`-type column and inserting data into it:

View File

@ -5,7 +5,7 @@ toc_title: Array(T)
# Array(T) {#data-type-array}
An array of elements of type `T`. `T` can be any type, including an array. Multidimensional arrays are thus supported.
An array of elements of type `T`. `T` can be any type, including an array. Multidimensional arrays are thus supported. The first element of the array has index 1.
## Creating an Array {#creating-an-array}

View File

@ -7,6 +7,8 @@ toc_title: Date
A date. Stored in two bytes as the (unsigned) number of days elapsed since 1970-01-01. Allows storing values from just after the start of the Unix epoch up to an upper threshold defined by a constant at the compilation stage (currently, up to the year 2106; the last fully supported year is 2105).
Range of values: \[1970-01-01, 2149-06-06\].
The date is stored without regard to the time zone.
**Example**

View File

@ -13,7 +13,7 @@ toc_title: DateTime
DateTime([timezone])
```
Range of values: \[1970-01-01 00:00:00, 2105-12-31 23:59:59\].
Range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
Resolution: 1 second.

View File

@ -18,7 +18,7 @@ DateTime64(precision, [timezone])
Data is stored as the number of 'ticks' elapsed since the start of the epoch (1970-01-01 00:00:00 UTC), as Int64. The tick size is determined by the precision parameter. Additionally, the `DateTime64` type can store a time zone that is the same for the entire column and that affects how values of the `DateTime64` type are displayed in text form and how values specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table (or in the result set), but in the column metadata. See [DateTime](datetime.md) for details.
Supported values range from January 1, 1925 to November 11, 2283.
Range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: the precision of the maximum value is 8).
## Examples {#examples}

View File

@ -1 +0,0 @@
../../../en/faq/general/who-is-using-clickhouse.md

View File

@ -0,0 +1,19 @@
---
title: Who Is Using ClickHouse?
toc_hidden: true
toc_priority: 9
---
# Who Is Using ClickHouse? {#who-is-using-clickhouse}
As an open-source product, the answer to this question is not that simple. If you want to start using ClickHouse, you do not need to tell anyone; you just grab the source code or pre-compiled packages. There is no contract to sign, and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows unconstrained software distribution.
Also, the technology stack is often in a gray zone covered by NDAs. Some companies consider the technologies they use a competitive advantage, even if those technologies are open source, and do not allow employees to share any details publicly. Others see PR risks and allow employees to share implementation details only with approval from their PR department.
So how do you tell who is using ClickHouse?
One way is to ask around. If it is not in writing, people are much more willing to share what technologies their companies use, the use cases, the kind of hardware, data volumes, and so on. We talk with users regularly at [ClickHouse meetups](https://www.youtube.com/channel/UChtmrD-dsdpspr42P_PyRAw/playlists) all over the world and have heard stories from more than 1000 companies that use ClickHouse. Unfortunately, that is not reproducible, and we try to treat such stories as if they were told under NDA, to avoid any potential trouble. But you can come to any of our future meetups and talk with other users in person. Meetups are announced in several ways; for example, you can subscribe to [our Twitter](http://twitter.com/ClickHouseDB/).
The second way is to look for companies that **publicly state** that they use ClickHouse, as there is usually hard evidence such as a blog post, a recorded talk, a slide deck, and so on. We collect links to such evidence on our [**Adopters**](../../introduction/adopters.md) page. Feel free to contribute the story of your employer, or just some links you stumbled upon (but try not to violate an NDA in the process).
You can find some very large companies in the adopters list, like Bloomberg, Cisco, China Telecom, Tencent, or Uber, but with the first approach we found that there are many more. For example, if you look at the [list of the largest IT companies published by Forbes (2020)](https://www.forbes.com/sites/hanktucker/2020/05/13/worlds-largest-technology-companies-2020-apple-stays-on-top-zoom-and-uber-debut/), more than half of them use ClickHouse in some way. It would also be unfair not to mention [Yandex](../../introduction/history.md), the company that originally open-sourced ClickHouse in 2016 and happens to be one of the largest IT companies in Europe.

View File

@ -1 +0,0 @@
../../../en/getting-started/example-datasets/github-events.md

View File

@ -0,0 +1,10 @@
---
toc_priority: 11
toc_title: GitHub Events Dataset
---
# GitHub Events Dataset
The dataset contains all events on GitHub from 2011 until December 6, 2020, amounting to 3.1 billion records. The download size is 75 GB, and it requires up to 200 GB of disk space if stored in a table with lz4 compression.
For the full dataset description, insights, download instructions, and interactive queries, see [here](https://ghe.clickhouse.tech/).
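As an illustration only (assuming the dataset has been loaded into a table named `github_events` with the schema published on the linked site; both are assumptions, not part of this page), a simple query might look like this:
```sql
-- Count new stars (WatchEvent) per year; table and column names follow the linked site's schema.
SELECT toYear(created_at) AS year, count() AS stars
FROM github_events
WHERE event_type = 'WatchEvent'
GROUP BY year
ORDER BY year;
```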

View File

@ -1,30 +1,50 @@
---
machine_translated: true
machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
---
# system.mutations {#system_tables-mutations}
# System. Mutations {#system_tables-mutations}
The table contains information about [mutations](../../sql-reference/statements/alter.md#alter-mutations) of MergeTree tables and their progress. Each mutation command is represented by a single row.
The table contains the following information about [mutations](../../sql-reference/statements/alter.md#alter-mutations) of MergeTree tables and their progress. Each mutation command is represented by one row. The table has the following columns:
The table has the following columns:
**database**, **table** - The name of the database and table to which the mutation was applied.
- `database` ([String](../../sql-reference/data-types/string.md)) — Name of the database to which the mutation was applied.
**mutation_id** - The ID of the mutation. For replicated tables, these IDs correspond to znode names in the `<table_path_in_zookeeper>/mutations/` directory in ZooKeeper. For non-replicated tables, the IDs correspond to file names in the data directory of the table.
- `table` ([String](../../sql-reference/data-types/string.md)) — Name of the table to which the mutation was applied.
**command** - The mutation command string (the part of the query after `ALTER TABLE [db.]table`).
- `mutation_id` ([String](../../sql-reference/data-types/string.md)) — ID of the mutation. For replicated tables, these IDs correspond to znode names in the `<table_path_in_zookeeper>/mutations/` directory in ZooKeeper. For non-replicated tables, the IDs correspond to file names in the data directory of the table.
**create_time** - When this mutation command was submitted for execution.
- `command` ([String](../../sql-reference/data-types/string.md)) — The mutation command string (the part of the query after the `ALTER TABLE [db.]table` statement).
**block_numbers.partition_id**, **block_numbers.number** - Nested column. For mutations of replicated tables, it contains one record for each partition: the partition ID and the block number acquired by the mutation (in each partition, only parts that contain blocks with numbers less than the block number acquired by the mutation in that partition will be mutated). In non-replicated tables, block numbers in all partitions form a single sequence. This means that for mutations of non-replicated tables, the column will contain one record with the single block number acquired by the mutation.
- `create_time` ([Datetime](../../sql-reference/data-types/datetime.md)) — Date and time when the mutation command was submitted for execution.
**parts_to_do** - The number of data parts that need to be mutated for the mutation to finish.
- `block_numbers.partition_id` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — For mutations of replicated tables, the array contains the IDs of the partitions (one record per partition). For mutations of non-replicated tables, the array is empty.
**is_done** - Is the mutation done? Note that even if `parts_to_do = 0`, a mutation of a replicated table may not be done yet, because a long-running INSERT can create a new data part that needs to be mutated.
- `block_numbers.number` ([Array](../../sql-reference/data-types/array.md)([Int64](../../sql-reference/data-types/int-uint.md))) — For mutations of replicated tables, the array contains one record for each partition with the block number acquired by the mutation. Only parts that contain blocks with numbers less than this number will have the mutation applied in that partition.
In non-replicated tables, block numbers in all partitions form a single sequence. This means that for mutations of non-replicated tables, the column will contain one record with the single block number acquired by the mutation.
- `parts_to_do_names` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — An array of names of parts that need to be mutated for the mutation to complete.
If there were problems with mutating some parts, the following columns contain additional information:
- `parts_to_do` ([Int64](../../sql-reference/data-types/int-uint.md)) — The number of parts that need to be mutated for the mutation to complete.
**latest_failed_part** - The name of the most recent part that could not be mutated.
- `is_done` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag indicating whether the mutation is done. Possible values:
- 1 — the mutation is completed.
- 0 — the mutation is still in progress.
**latest_fail_time** - The time of the most recent part mutation failure.
**latest_fail_reason** - The exception message that caused the most recent part mutation failure.
!!! info "Note"
    Even if `parts_to_do = 0`, a mutation of a replicated table may not be completed yet, because a long-running `INSERT` query can create a new part that needs to be mutated.
If there were problems mutating some parts, the following columns contain additional information:
- `latest_failed_part` ([String](../../sql-reference/data-types/string.md)) — The name of the most recent part that could not be mutated.
- `latest_fail_time` ([Datetime](../../sql-reference/data-types/datetime.md)) — The time of the most recent part mutation failure.
- `latest_fail_reason` ([String](../../sql-reference/data-types/string.md)) — The exception message that caused the most recent part mutation failure.
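For example, unfinished mutations and their latest failure reasons can be inspected with a query like the following (a minimal sketch using the columns described above):
```sql
SELECT database, table, mutation_id, command, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0;
```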
**See Also**
- Mutations
- [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table engine
- [ReplicatedMergeTree](../../engines/table-engines/mergetree-family/replication.md) family
[Original article](https://clickhouse.com/docs/en/operations/system_tables/mutations) <!--hide-->

View File

@ -2,4 +2,6 @@
A date type, stored in two bytes as the (unsigned) number of days since 1970-01-01. Allows storing values from the start of the Unix epoch up to an upper-threshold constant defined at the compilation stage (currently the upper limit is the year 2106, but the last fully supported year is 2105). The minimum value is output as 1970-01-01.
Range of values: \[1970-01-01, 2149-06-06\].
No time zone information is stored with the date.

View File

@ -2,6 +2,8 @@
A timestamp type. Stores a Unix timestamp in four (unsigned) bytes. Allows storing values in the same range as the Date type. The minimum value is 1970-01-01 00:00:00. Timestamp values are precise to the second (without leap seconds).
Range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
## Time Zones {#shi-qu}
The system time zone in effect when the client or server starts is used to convert timestamps from text (decomposed into components) to binary and back. In text format, information about daylight saving time is lost.

View File

@ -19,6 +19,8 @@ DateTime64(precision, [timezone])
Internally, this type stores data as the number of time ticks since the start of the Linux epoch (1970-01-01 00:00:00 UTC), as Int64. The tick resolution is determined by the precision parameter. Additionally, the `DateTime64` type can store time zone information for the whole column, just like other data columns; the time zone affects how `DateTime64` values are displayed in text format and how times specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table or in the result set, but in the column metadata. See the [DateTime](datetime.md) data type for details.
Range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: the precision of the maximum value is 8).
## Examples {#examples}
**1.** Creating a table with a `DateTime64`-type column and inserting data into it:

View File

@ -486,12 +486,19 @@ void Client::connect()
UInt64 server_version_minor = 0;
UInt64 server_version_patch = 0;
if (hosts_and_ports.empty())
{
String host = config().getString("host", "localhost");
UInt16 port = static_cast<UInt16>(ConnectionParameters::getPortFromConfig(config()));
hosts_and_ports.emplace_back(HostAndPort{host, port});
}
for (size_t attempted_address_index = 0; attempted_address_index < hosts_and_ports.size(); ++attempted_address_index)
{
try
{
connection_parameters
= ConnectionParameters(config(), hosts_and_ports[attempted_address_index].host, hosts_and_ports[attempted_address_index].port);
connection_parameters = ConnectionParameters(
config(), hosts_and_ports[attempted_address_index].host, hosts_and_ports[attempted_address_index].port);
if (is_interactive)
std::cout << "Connecting to "
@ -1085,22 +1092,15 @@ void Client::processOptions(const OptionsDescription & options_description,
}
}
if (hosts_and_ports_arguments.empty())
for (const auto & hosts_and_ports_argument : hosts_and_ports_arguments)
{
hosts_and_ports.emplace_back(HostAndPort{"localhost", DBMS_DEFAULT_PORT});
}
else
{
for (const auto & hosts_and_ports_argument : hosts_and_ports_arguments)
{
/// Parse commandline options related to external tables.
po::parsed_options parsed_hosts_and_ports
= po::command_line_parser(hosts_and_ports_argument).options(options_description.hosts_and_ports_description.value()).run();
po::variables_map host_and_port_options;
po::store(parsed_hosts_and_ports, host_and_port_options);
hosts_and_ports.emplace_back(
HostAndPort{host_and_port_options["host"].as<std::string>(), host_and_port_options["port"].as<UInt16>()});
}
/// Parse commandline options related to external tables.
po::parsed_options parsed_hosts_and_ports
= po::command_line_parser(hosts_and_ports_argument).options(options_description.hosts_and_ports_description.value()).run();
po::variables_map host_and_port_options;
po::store(parsed_hosts_and_ports, host_and_port_options);
hosts_and_ports.emplace_back(
HostAndPort{host_and_port_options["host"].as<std::string>(), host_and_port_options["port"].as<UInt16>()});
}
send_external_tables = true;

View File

@ -149,8 +149,6 @@ void ODBCSource::insertValue(
DateTime64 time = 0;
const auto * datetime_type = assert_cast<const DataTypeDateTime64 *>(data_type.get());
readDateTime64Text(time, datetime_type->getScale(), in, datetime_type->getTimeZone());
if (time < 0)
time = 0;
assert_cast<DataTypeDateTime64::ColumnType &>(column).insertValue(time);
break;
}

View File

@ -238,28 +238,39 @@ void ProgressIndication::writeProgress()
/// at right after progress bar or at left on top of the progress bar.
if (width_of_progress_bar <= 1 + 2 * static_cast<int64_t>(profiling_msg.size()))
profiling_msg.clear();
else
width_of_progress_bar -= profiling_msg.size();
if (width_of_progress_bar > 0)
{
double bar_width = UnicodeBar::getWidth(current_count, 0, max_count, width_of_progress_bar);
std::string bar = UnicodeBar::render(bar_width);
size_t bar_width_in_terminal = bar.size() / UNICODE_BAR_CHAR_SIZE;
/// Render profiling_msg at left on top of the progress bar.
bool render_profiling_msg_at_left = current_count * 2 >= max_count;
if (!profiling_msg.empty() && render_profiling_msg_at_left)
message << "\033[30;42m" << profiling_msg << "\033[0m";
if (profiling_msg.empty())
{
message << "\033[0;32m" << bar << "\033[0m"
<< std::string(width_of_progress_bar - bar_width_in_terminal, ' ');
}
else
{
bool render_profiling_msg_at_left = current_count * 2 >= max_count;
message << "\033[0;32m" << bar << "\033[0m";
if (render_profiling_msg_at_left)
{
/// Render profiling_msg at left on top of the progress bar.
/// Whitespaces after the progress bar.
if (width_of_progress_bar > static_cast<int64_t>(bar.size() / UNICODE_BAR_CHAR_SIZE))
message << std::string(width_of_progress_bar - bar.size() / UNICODE_BAR_CHAR_SIZE, ' ');
message << "\033[30;42m" << profiling_msg << "\033[0m"
<< "\033[0;32m" << bar.substr(profiling_msg.size() * UNICODE_BAR_CHAR_SIZE) << "\033[0m"
<< std::string(width_of_progress_bar - bar_width_in_terminal, ' ');
}
else
{
/// Render profiling_msg at right after the progress bar.
/// Render profiling_msg at right after the progress bar.
if (!profiling_msg.empty() && !render_profiling_msg_at_left)
message << "\033[2m" << profiling_msg << "\033[0m";
message << "\033[0;32m" << bar << "\033[0m"
<< std::string(width_of_progress_bar - bar_width_in_terminal - profiling_msg.size(), ' ')
<< "\033[2m" << profiling_msg << "\033[0m";
}
}
}
}
}

View File

@ -51,7 +51,7 @@ public:
/// Seek is lazy. It doesn't move the position anywhere, just remember them and perform actual
/// seek inside nextImpl.
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block);
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block) override;
void setProfileCallback(const ReadBufferFromFileBase::ProfileCallback & profile_callback_, clockid_t clock_type_ = CLOCK_MONOTONIC_COARSE)
{

View File

@ -48,8 +48,8 @@ protected:
public:
/// 'compressed_in' could be initialized lazily, but before first call of 'readCompressedData'.
CompressedReadBufferBase(ReadBuffer * in = nullptr, bool allow_different_codecs_ = false);
~CompressedReadBufferBase();
explicit CompressedReadBufferBase(ReadBuffer * in = nullptr, bool allow_different_codecs_ = false);
virtual ~CompressedReadBufferBase();
/** Disable checksums.
* For example, may be used when
@ -60,7 +60,9 @@ public:
disable_checksum = true;
}
public:
/// Some compressed read buffer can do useful seek operation
virtual void seek(size_t /* offset_in_compressed_file */, size_t /* offset_in_decompressed_block */) {}
CompressionCodecPtr codec;
};

View File

@ -51,7 +51,7 @@ public:
/// Seek is lazy in some sense. We move position in compressed file_in to offset_in_compressed_file, but don't
/// read data into working_buffer and don't shift our position to offset_in_decompressed_block. Instead
/// we store this offset inside nextimpl_working_buffer_offset.
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block);
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block) override;
size_t readBig(char * to, size_t n) override;

View File

@ -293,6 +293,8 @@ Changelog::Changelog(
if (existing_changelogs.empty())
LOG_WARNING(log, "No logs exists in {}. It's Ok if it's the first run of clickhouse-keeper.", changelogs_dir);
clean_log_thread = ThreadFromGlobalPool([this] { cleanLogThread(); });
}
void Changelog::readChangelogAndInitWriter(uint64_t last_commited_log_index, uint64_t logs_to_keep)
@ -581,7 +583,17 @@ void Changelog::compact(uint64_t up_to_log_index)
}
LOG_INFO(log, "Removing changelog {} because of compaction", itr->second.path);
std::filesystem::remove(itr->second.path);
/// If pushing to the queue for background removal fails, remove the file right away
if (!log_files_to_delete_queue.tryPush(itr->second.path, 1))
{
std::error_code ec;
std::filesystem::remove(itr->second.path, ec);
if (ec)
LOG_WARNING(log, "Failed to remove changelog {} in compaction, error message: {}", itr->second.path, ec.message());
else
LOG_INFO(log, "Removed changelog {} because of compaction", itr->second.path);
}
itr = existing_changelogs.erase(itr);
}
else /// Files are ordered, so all subsequent should exist
@ -705,6 +717,9 @@ Changelog::~Changelog()
try
{
flush();
log_files_to_delete_queue.finish();
if (clean_log_thread.joinable())
clean_log_thread.join();
}
catch (...)
{
@ -712,4 +727,20 @@ Changelog::~Changelog()
}
}
void Changelog::cleanLogThread()
{
while (!log_files_to_delete_queue.isFinishedAndEmpty())
{
std::string path;
if (log_files_to_delete_queue.tryPop(path))
{
std::error_code ec;
if (std::filesystem::remove(path, ec))
LOG_INFO(log, "Removed changelog {} because of compaction.", path);
else
LOG_WARNING(log, "Failed to remove changelog {} in compaction, error message: {}", path, ec.message());
}
}
}
}

View File

@ -7,6 +7,7 @@
#include <IO/HashingWriteBuffer.h>
#include <IO/CompressionMethod.h>
#include <Disks/IDisk.h>
#include <Common/ConcurrentBoundedQueue.h>
namespace DB
{
@ -142,6 +143,9 @@ private:
/// Init writer for existing log with some entries already written
void initWriter(const ChangelogFileDescription & description);
/// Clean useless log files in a background thread
void cleanLogThread();
private:
const std::string changelogs_dir;
const uint64_t rotate_interval;
@ -160,6 +164,10 @@ private:
/// min_log_id + 1 == max_log_id means empty log storage for NuRaft
uint64_t min_log_id = 0;
uint64_t max_log_id = 0;
/// For compaction: queue of unused log files to be deleted
/// 128 is enough; even if a log file is not removed right away, it is not a problem
ConcurrentBoundedQueue<std::string> log_files_to_delete_queue{128};
ThreadFromGlobalPool clean_log_thread;
};
}

View File

@ -240,6 +240,8 @@ bool KeeperDispatcher::putRequest(const Coordination::ZooKeeperRequestPtr & requ
KeeperStorage::RequestForSession request_info;
request_info.request = request;
using namespace std::chrono;
request_info.time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
request_info.session_id = session_id;
std::lock_guard lock(push_request_mutex);
@ -400,6 +402,8 @@ void KeeperDispatcher::sessionCleanerTask()
request->xid = Coordination::CLOSE_XID;
KeeperStorage::RequestForSession request_info;
request_info.request = request;
using namespace std::chrono;
request_info.time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
request_info.session_id = dead_session;
{
std::lock_guard lock(push_request_mutex);
@ -433,7 +437,7 @@ void KeeperDispatcher::finishSession(int64_t session_id)
void KeeperDispatcher::addErrorResponses(const KeeperStorage::RequestsForSessions & requests_for_sessions, Coordination::Error error)
{
for (const auto & [session_id, request] : requests_for_sessions)
for (const auto & [session_id, time, request] : requests_for_sessions)
{
KeeperStorage::ResponsesForSessions responses;
auto response = request->makeResponse();
@ -477,6 +481,8 @@ int64_t KeeperDispatcher::getSessionID(int64_t session_timeout_ms)
request->server_id = server->getServerID();
request_info.request = request;
using namespace std::chrono;
request_info.time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
request_info.session_id = -1;
auto promise = std::make_shared<std::promise<int64_t>>();

View File

@ -260,11 +260,12 @@ void KeeperServer::shutdown()
namespace
{
nuraft::ptr<nuraft::buffer> getZooKeeperLogEntry(int64_t session_id, const Coordination::ZooKeeperRequestPtr & request)
nuraft::ptr<nuraft::buffer> getZooKeeperLogEntry(int64_t session_id, int64_t time, const Coordination::ZooKeeperRequestPtr & request)
{
DB::WriteBufferFromNuraftBuffer buf;
DB::writeIntBinary(session_id, buf);
request->write(buf);
DB::writeIntBinary(time, buf);
return buf.getBuffer();
}
@ -283,8 +284,8 @@ RaftAppendResult KeeperServer::putRequestBatch(const KeeperStorage::RequestsForS
{
std::vector<nuraft::ptr<nuraft::buffer>> entries;
for (const auto & [session_id, request] : requests_for_sessions)
entries.push_back(getZooKeeperLogEntry(session_id, request));
for (const auto & [session_id, time, request] : requests_for_sessions)
entries.push_back(getZooKeeperLogEntry(session_id, time, request));
return raft_instance->append_entries(entries);
}

View File

@ -337,8 +337,9 @@ KeeperStorageSnapshot::KeeperStorageSnapshot(KeeperStorage * storage_, uint64_t
, session_id(storage->session_id_counter)
, cluster_config(cluster_config_)
{
snapshot_container_size = storage->container.snapshotSize();
storage->enableSnapshotMode(snapshot_container_size);
auto [size, ver] = storage->container.snapshotSizeWithVersion();
snapshot_container_size = size;
storage->enableSnapshotMode(ver);
begin = storage->getSnapshotIteratorBegin();
session_and_timeout = storage->getActiveSessions();
acl_map = storage->acl_map.getMapping();
@ -351,8 +352,9 @@ KeeperStorageSnapshot::KeeperStorageSnapshot(KeeperStorage * storage_, const Sna
, session_id(storage->session_id_counter)
, cluster_config(cluster_config_)
{
snapshot_container_size = storage->container.snapshotSize();
storage->enableSnapshotMode(snapshot_container_size);
auto [size, ver] = storage->container.snapshotSizeWithVersion();
snapshot_container_size = size;
storage->enableSnapshotMode(ver);
begin = storage->getSnapshotIteratorBegin();
session_and_timeout = storage->getActiveSessions();
acl_map = storage->acl_map.getMapping();

View File

@ -38,6 +38,8 @@ namespace
request_for_session.request = Coordination::ZooKeeperRequestFactory::instance().get(opnum);
request_for_session.request->xid = xid;
request_for_session.request->readImpl(buffer);
readIntBinary(request_for_session.time, buffer);
return request_for_session;
}
}
@ -133,7 +135,7 @@ nuraft::ptr<nuraft::buffer> KeeperStateMachine::commit(const uint64_t log_idx, n
else
{
std::lock_guard lock(storage_and_responses_lock);
KeeperStorage::ResponsesForSessions responses_for_sessions = storage->processRequest(request_for_session.request, request_for_session.session_id, log_idx);
KeeperStorage::ResponsesForSessions responses_for_sessions = storage->processRequest(request_for_session.request, request_for_session.session_id, request_for_session.time, log_idx);
for (auto & response_for_session : responses_for_sessions)
if (!responses_queue.push(response_for_session))
throw Exception(ErrorCodes::SYSTEM_ERROR, "Could not push response with session id {} into responses queue", response_for_session.session_id);
@ -358,7 +360,7 @@ void KeeperStateMachine::processReadRequest(const KeeperStorage::RequestForSessi
{
/// Pure local request, just process it with storage
std::lock_guard lock(storage_and_responses_lock);
auto responses = storage->processRequest(request_for_session.request, request_for_session.session_id, std::nullopt);
auto responses = storage->processRequest(request_for_session.request, request_for_session.session_id, request_for_session.time, std::nullopt);
for (const auto & response : responses)
if (!responses_queue.push(response))
throw Exception(ErrorCodes::SYSTEM_ERROR, "Could not push response with session id {} into responses queue", response.session_id);

View File

@ -191,7 +191,7 @@ struct KeeperStorageRequestProcessor
explicit KeeperStorageRequestProcessor(const Coordination::ZooKeeperRequestPtr & zk_request_)
: zk_request(zk_request_)
{}
virtual std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id) const = 0;
virtual std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id, int64_t time) const = 0;
virtual KeeperStorage::ResponsesForSessions processWatches(KeeperStorage::Watches & /*watches*/, KeeperStorage::Watches & /*list_watches*/) const { return {}; }
virtual bool checkAuth(KeeperStorage & /*storage*/, int64_t /*session_id*/) const { return true; }
@ -201,7 +201,7 @@ struct KeeperStorageRequestProcessor
struct KeeperStorageHeartbeatRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */, int64_t /* time */) const override
{
return {zk_request->makeResponse(), {}};
}
@ -210,7 +210,7 @@ struct KeeperStorageHeartbeatRequestProcessor final : public KeeperStorageReques
struct KeeperStorageSyncRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */, int64_t /* time */) const override
{
auto response = zk_request->makeResponse();
dynamic_cast<Coordination::ZooKeeperSyncResponse &>(*response).path
@ -246,7 +246,7 @@ struct KeeperStorageCreateRequestProcessor final : public KeeperStorageRequestPr
return checkACL(Coordination::ACL::Create, node_acls, session_auths);
}
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id, int64_t time) const override
{
auto & container = storage.container;
auto & ephemerals = storage.ephemerals;
@ -309,8 +309,8 @@ struct KeeperStorageCreateRequestProcessor final : public KeeperStorageRequestPr
created_node.stat.czxid = zxid;
created_node.stat.mzxid = zxid;
created_node.stat.pzxid = zxid;
created_node.stat.ctime = std::chrono::system_clock::now().time_since_epoch() / std::chrono::milliseconds(1);
created_node.stat.mtime = created_node.stat.ctime;
created_node.stat.ctime = time;
created_node.stat.mtime = time;
created_node.stat.numChildren = 0;
created_node.stat.dataLength = request.data.length();
created_node.stat.ephemeralOwner = request.is_ephemeral ? session_id : 0;
@ -394,7 +394,7 @@ struct KeeperStorageGetRequestProcessor final : public KeeperStorageRequestProce
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /* zxid */, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /* zxid */, int64_t /* session_id */, int64_t /* time */) const override
{
auto & container = storage.container;
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
@ -453,7 +453,7 @@ struct KeeperStorageRemoveRequestProcessor final : public KeeperStorageRequestPr
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /*session_id*/, int64_t /* time */) const override
{
auto & container = storage.container;
auto & ephemerals = storage.ephemerals;
@ -538,7 +538,7 @@ struct KeeperStorageRemoveRequestProcessor final : public KeeperStorageRequestPr
struct KeeperStorageExistsRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /* session_id */, int64_t /* time */) const override
{
auto & container = storage.container;
@ -579,7 +579,7 @@ struct KeeperStorageSetRequestProcessor final : public KeeperStorageRequestProce
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /* session_id */, int64_t time) const override
{
auto & container = storage.container;
@ -598,11 +598,11 @@ struct KeeperStorageSetRequestProcessor final : public KeeperStorageRequestProce
auto prev_node = it->value;
auto itr = container.updateValue(request.path, [zxid, request] (KeeperStorage::Node & value)
auto itr = container.updateValue(request.path, [zxid, request, time] (KeeperStorage::Node & value)
{
value.stat.version++;
value.stat.mzxid = zxid;
value.stat.mtime = std::chrono::system_clock::now().time_since_epoch() / std::chrono::milliseconds(1);
value.stat.mtime = time;
value.stat.dataLength = request.data.length();
value.size_bytes = value.size_bytes + request.data.size() - value.data.size();
value.data = request.data;
@ -657,7 +657,7 @@ struct KeeperStorageListRequestProcessor final : public KeeperStorageRequestProc
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/, int64_t /* time */) const override
{
auto & container = storage.container;
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
@ -706,7 +706,7 @@ struct KeeperStorageCheckRequestProcessor final : public KeeperStorageRequestPro
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/, int64_t /* time */) const override
{
auto & container = storage.container;
@ -751,7 +751,7 @@ struct KeeperStorageSetACLRequestProcessor final : public KeeperStorageRequestPr
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id, int64_t /* time */) const override
{
auto & container = storage.container;
@ -815,7 +815,7 @@ struct KeeperStorageGetACLRequestProcessor final : public KeeperStorageRequestPr
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/, int64_t /* time */) const override
{
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
Coordination::ZooKeeperGetACLResponse & response = dynamic_cast<Coordination::ZooKeeperGetACLResponse &>(*response_ptr);
@ -877,7 +877,7 @@ struct KeeperStorageMultiRequestProcessor final : public KeeperStorageRequestPro
}
}
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id, int64_t time) const override
{
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
Coordination::ZooKeeperMultiResponse & response = dynamic_cast<Coordination::ZooKeeperMultiResponse &>(*response_ptr);
@ -888,7 +888,7 @@ struct KeeperStorageMultiRequestProcessor final : public KeeperStorageRequestPro
size_t i = 0;
for (const auto & concrete_request : concrete_requests)
{
auto [ cur_response, undo_action ] = concrete_request->process(storage, zxid, session_id);
auto [ cur_response, undo_action ] = concrete_request->process(storage, zxid, session_id, time);
response.responses[i] = cur_response;
if (cur_response->error != Coordination::Error::ZOK)
@ -945,7 +945,7 @@ struct KeeperStorageMultiRequestProcessor final : public KeeperStorageRequestPro
struct KeeperStorageCloseRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage &, int64_t, int64_t) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage &, int64_t, int64_t, int64_t /* time */) const override
{
throw DB::Exception("Called process on close request", ErrorCodes::LOGICAL_ERROR);
}
@ -954,7 +954,7 @@ struct KeeperStorageCloseRequestProcessor final : public KeeperStorageRequestPro
struct KeeperStorageAuthRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id, int64_t /* time */) const override
{
Coordination::ZooKeeperAuthRequest & auth_request = dynamic_cast<Coordination::ZooKeeperAuthRequest &>(*zk_request);
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
@ -1067,7 +1067,7 @@ KeeperStorageRequestProcessorsFactory::KeeperStorageRequestProcessorsFactory()
}
KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordination::ZooKeeperRequestPtr & zk_request, int64_t session_id, std::optional<int64_t> new_last_zxid, bool check_acl)
KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordination::ZooKeeperRequestPtr & zk_request, int64_t session_id, int64_t time, std::optional<int64_t> new_last_zxid, bool check_acl)
{
KeeperStorage::ResponsesForSessions results;
if (new_last_zxid)
@ -1119,7 +1119,7 @@ KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordina
else if (zk_request->getOpNum() == Coordination::OpNum::Heartbeat) /// Heartbeat request is also special
{
KeeperStorageRequestProcessorPtr storage_request = KeeperStorageRequestProcessorsFactory::instance().get(zk_request);
auto [response, _] = storage_request->process(*this, zxid, session_id);
auto [response, _] = storage_request->process(*this, zxid, session_id, time);
response->xid = zk_request->xid;
response->zxid = getZXID();
@ -1138,7 +1138,7 @@ KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordina
}
else
{
std::tie(response, std::ignore) = request_processor->process(*this, zxid, session_id);
std::tie(response, std::ignore) = request_processor->process(*this, zxid, session_id, time);
}
/// Watches for these requests are added to the watches lists

View File

@ -66,6 +66,7 @@ public:
struct RequestForSession
{
int64_t session_id;
int64_t time;
Coordination::ZooKeeperRequestPtr request;
};
@ -153,16 +154,17 @@ public:
/// Process user request and return response.
/// check_acl = false only when converting data from ZooKeeper.
ResponsesForSessions processRequest(const Coordination::ZooKeeperRequestPtr & request, int64_t session_id, std::optional<int64_t> new_last_zxid, bool check_acl = true);
ResponsesForSessions processRequest(const Coordination::ZooKeeperRequestPtr & request, int64_t session_id, int64_t time, std::optional<int64_t> new_last_zxid, bool check_acl = true);
void finalize();
/// Set of methods for creating snapshots
/// Turn on snapshot mode, so data inside Container is not deleted, but replaced with new version.
void enableSnapshotMode(size_t up_to_size)
void enableSnapshotMode(size_t up_to_version)
{
container.enableSnapshotMode(up_to_size);
container.enableSnapshotMode(up_to_version);
}
/// Turn off snapshot mode.

View File

@ -17,6 +17,8 @@ struct ListNode
StringRef key;
V value;
/// Monotonically increasing version info for snapshot
size_t version{0};
bool active_in_map{true};
bool free_key{false};
};
@ -35,7 +37,8 @@ private:
IndexMap map;
bool snapshot_mode{false};
/// Allows to avoid additional copies in updateValue function
size_t snapshot_up_to_size = 0;
size_t current_version{0};
size_t snapshot_up_to_version{0};
ArenaWithFreeLists arena;
/// Collect invalid iterators to avoid traversing the whole list
std::vector<Mapped> snapshot_invalid_iters;
@ -129,8 +132,9 @@ public:
if (!it)
{
ListElem elem{copyStringInArena(arena, key), value, true};
auto itr = list.insert(list.end(), elem);
ListElem elem{copyStringInArena(arena, key), value, current_version};
auto itr = list.insert(list.end(), std::move(elem));
bool inserted;
map.emplace(itr->key, it, inserted, hash_value);
assert(inserted);
@ -151,8 +155,8 @@ public:
if (it == map.end())
{
ListElem elem{copyStringInArena(arena, key), value, true};
auto itr = list.insert(list.end(), elem);
ListElem elem{copyStringInArena(arena, key), value, current_version};
auto itr = list.insert(list.end(), std::move(elem));
bool inserted;
map.emplace(itr->key, it, inserted, hash_value);
assert(inserted);
@ -163,9 +167,9 @@ public:
auto list_itr = it->getMapped();
if (snapshot_mode)
{
ListElem elem{list_itr->key, value, true};
ListElem elem{list_itr->key, value, current_version};
list_itr->active_in_map = false;
auto new_list_itr = list.insert(list.end(), elem);
auto new_list_itr = list.insert(list.end(), std::move(elem));
it->getMapped() = new_list_itr;
snapshot_invalid_iters.push_back(list_itr);
}
@ -224,14 +228,14 @@ public:
/// We in snapshot mode but updating some node which is already more
/// fresh than snapshot distance. So it will not participate in
/// snapshot and we don't need to copy it.
size_t distance = std::distance(list.begin(), list_itr);
if (distance < snapshot_up_to_size)
if (snapshot_mode && list_itr->version <= snapshot_up_to_version)
{
auto elem_copy = *(list_itr);
list_itr->active_in_map = false;
snapshot_invalid_iters.push_back(list_itr);
updater(elem_copy.value);
auto itr = list.insert(list.end(), elem_copy);
elem_copy.version = current_version;
auto itr = list.insert(list.end(), std::move(elem_copy));
it->getMapped() = itr;
ret = itr;
}
@ -289,16 +293,16 @@ public:
updateDataSize(CLEAR, 0, 0, 0);
}
void enableSnapshotMode(size_t up_to_size)
void enableSnapshotMode(size_t version)
{
snapshot_mode = true;
snapshot_up_to_size = up_to_size;
snapshot_up_to_version = version;
++current_version;
}
void disableSnapshotMode()
{
snapshot_mode = false;
snapshot_up_to_size = 0;
}
size_t size() const
@ -306,9 +310,9 @@ public:
return map.size();
}
size_t snapshotSize() const
std::pair<size_t, size_t> snapshotSizeWithVersion() const
{
return list.size();
return std::make_pair(list.size(), current_version);
}
uint64_t getApproximateDataSize() const

View File

@ -518,7 +518,7 @@ bool deserializeTxn(KeeperStorage & storage, ReadBuffer & in, Poco::Logger * /*l
if (request->getOpNum() == Coordination::OpNum::Multi && hasErrorsInMultiRequest(request))
return true;
storage.processRequest(request, session_id, zxid, /* check_acl = */ false);
storage.processRequest(request, session_id, time, zxid, /* check_acl = */ false);
}
}

View File

@ -1,3 +1,4 @@
#include <chrono>
#include <gtest/gtest.h>
#include "config_core.h"
@ -406,6 +407,7 @@ TEST_P(CoordinationTest, ChangelogTestCompaction)
EXPECT_TRUE(fs::exists("./logs/changelog_6_10.bin" + params.extension));
changelog.compact(6);
std::this_thread::sleep_for(std::chrono::microseconds(200));
EXPECT_FALSE(fs::exists("./logs/changelog_1_5.bin" + params.extension));
EXPECT_TRUE(fs::exists("./logs/changelog_6_10.bin" + params.extension));
@ -865,7 +867,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
EXPECT_FALSE(map_snp.insert("/hello", 145).second);
map_snp.updateValue("/hello", [](IntNode & value) { value = 554; });
EXPECT_EQ(map_snp.getValue("/hello"), 554);
EXPECT_EQ(map_snp.snapshotSize(), 2);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 2);
EXPECT_EQ(map_snp.size(), 1);
auto itr = map_snp.begin();
@ -884,7 +886,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
}
EXPECT_EQ(map_snp.getValue("/hello3"), 3);
EXPECT_EQ(map_snp.snapshotSize(), 7);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 7);
EXPECT_EQ(map_snp.size(), 6);
itr = std::next(map_snp.begin(), 2);
for (size_t i = 0; i < 5; ++i)
@ -898,7 +900,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
EXPECT_TRUE(map_snp.erase("/hello3"));
EXPECT_TRUE(map_snp.erase("/hello2"));
EXPECT_EQ(map_snp.snapshotSize(), 7);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 7);
EXPECT_EQ(map_snp.size(), 4);
itr = std::next(map_snp.begin(), 2);
for (size_t i = 0; i < 5; ++i)
@ -910,7 +912,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
}
map_snp.clearOutdatedNodes();
EXPECT_EQ(map_snp.snapshotSize(), 4);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 4);
EXPECT_EQ(map_snp.size(), 4);
itr = map_snp.begin();
EXPECT_EQ(itr->key, "/hello");
@ -1164,14 +1166,15 @@ TEST_P(CoordinationTest, TestStorageSnapshotMode)
storage.container.erase("/hello_" + std::to_string(i));
}
EXPECT_EQ(storage.container.size(), 26);
EXPECT_EQ(storage.container.snapshotSize(), 101);
EXPECT_EQ(storage.container.snapshotSizeWithVersion().first, 101);
EXPECT_EQ(storage.container.snapshotSizeWithVersion().second, 1);
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, 50);
}
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin" + params.extension));
EXPECT_EQ(storage.container.size(), 26);
storage.clearGarbageAfterSnapshot();
EXPECT_EQ(storage.container.snapshotSize(), 26);
EXPECT_EQ(storage.container.snapshotSizeWithVersion().first, 26);
for (size_t i = 0; i < 50; ++i)
{
if (i % 2 != 0)
@ -1219,6 +1222,9 @@ nuraft::ptr<nuraft::buffer> getBufferFromZKRequest(int64_t session_id, const Coo
DB::WriteBufferFromNuraftBuffer buf;
DB::writeIntBinary(session_id, buf);
request->write(buf);
using namespace std::chrono;
auto time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
DB::writeIntBinary(time, buf);
return buf.getBuffer();
}
@ -1459,6 +1465,7 @@ TEST_P(CoordinationTest, TestRotateIntervalChanges)
}
changelog_2.compact(105);
std::this_thread::sleep_for(std::chrono::microseconds(200));
EXPECT_FALSE(fs::exists("./logs/changelog_1_100.bin" + params.extension));
EXPECT_TRUE(fs::exists("./logs/changelog_101_110.bin" + params.extension));
@ -1478,6 +1485,7 @@ TEST_P(CoordinationTest, TestRotateIntervalChanges)
}
changelog_3.compact(125);
std::this_thread::sleep_for(std::chrono::microseconds(200));
EXPECT_FALSE(fs::exists("./logs/changelog_101_110.bin" + params.extension));
EXPECT_FALSE(fs::exists("./logs/changelog_111_117.bin" + params.extension));
EXPECT_FALSE(fs::exists("./logs/changelog_118_124.bin" + params.extension));

View File

@ -108,8 +108,6 @@ void insertPostgreSQLValue(
ReadBufferFromString in(value);
DateTime64 time = 0;
readDateTime64Text(time, 6, in, assert_cast<const DataTypeDateTime64 *>(data_type.get())->getTimeZone());
if (time < 0)
time = 0;
assert_cast<DataTypeDateTime64::ColumnType &>(column).insertValue(time);
break;
}

View File

@ -498,7 +498,6 @@ class IColumn;
/** Experimental feature for moving data between shards. */ \
\
M(Bool, allow_experimental_query_deduplication, false, "Experimental data deduplication for SELECT queries based on part UUIDs", 0) \
M(Bool, experimental_query_deduplication_send_all_part_uuids, false, "If false only part UUIDs for currently moving parts are sent. If true all read part UUIDs are sent (useful only for testing).", 0) \
\
M(Bool, engine_file_empty_if_not_exists, false, "Allows to select data from a file engine table without file", 0) \
M(Bool, engine_file_truncate_on_insert, false, "Enables or disables truncate before insert in file engine tables", 0) \

View File

@ -141,9 +141,6 @@ void DatabaseAtomic::dropTable(ContextPtr local_context, const String & table_na
if (table->storesDataOnDisk())
tryRemoveSymlink(table_name);
if (table->dropTableImmediately())
table->drop();
/// Notify DatabaseCatalog that table was dropped. It will remove table data in background.
/// Cleanup is performed outside of database to allow easily DROP DATABASE without waiting for cleanup to complete.
DatabaseCatalog::instance().enqueueDroppedTableCleanup(table->getStorageID(), table, table_metadata_path_drop, no_delay);

View File

@ -110,7 +110,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskAzureBlobStorage::writeFile(
readOrCreateUpdateAndStoreMetadata(path, mode, false, [blob_path, count] (Metadata & metadata) { metadata.addObject(blob_path, count); return true; });
};
return std::make_unique<WriteIndirectBufferFromRemoteFS<WriteBufferFromAzureBlobStorage>>(std::move(buffer), std::move(create_metadata_callback), path);
return std::make_unique<WriteIndirectBufferFromRemoteFS>(std::move(buffer), std::move(create_metadata_callback), path);
}

View File

@ -20,11 +20,26 @@ public:
RestartAwareReadBuffer(const DiskRestartProxy & disk, std::unique_ptr<ReadBufferFromFileBase> impl_)
: ReadBufferFromFileDecorator(std::move(impl_)), lock(disk.mutex) { }
void prefetch() override { impl->prefetch(); }
void prefetch() override
{
swap(*impl);
impl->prefetch();
swap(*impl);
}
void setReadUntilPosition(size_t position) override { impl->setReadUntilPosition(position); }
void setReadUntilPosition(size_t position) override
{
swap(*impl);
impl->setReadUntilPosition(position);
swap(*impl);
}
void setReadUntilEnd() override { impl->setReadUntilEnd(); }
void setReadUntilEnd() override
{
swap(*impl);
impl->setReadUntilEnd();
swap(*impl);
}
private:
ReadLock lock;

View File

@ -106,8 +106,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskHDFS::writeFile(const String & path
readOrCreateUpdateAndStoreMetadata(path, mode, false, [file_name, count] (Metadata & metadata) { metadata.addObject(file_name, count); return true; });
};
return std::make_unique<WriteIndirectBufferFromRemoteFS<WriteBufferFromHDFS>>(
std::move(hdfs_buffer), std::move(create_metadata_callback), path);
return std::make_unique<WriteIndirectBufferFromRemoteFS>(std::move(hdfs_buffer), std::move(create_metadata_callback), path);
}

View File

@ -32,7 +32,7 @@ namespace DB
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int CANNOT_SEEK_THROUGH_FILE;
extern const int ARGUMENT_OUT_OF_BOUND;
}
@ -146,125 +146,127 @@ bool AsynchronousReadIndirectBufferFromRemoteFS::nextImpl()
return false;
size_t size = 0;
if (prefetch_future.valid())
{
ProfileEvents::increment(ProfileEvents::RemoteFSPrefetchedReads);
CurrentMetrics::Increment metric_increment{CurrentMetrics::AsynchronousReadWait};
Stopwatch watch;
size_t offset = 0;
{
Stopwatch watch;
CurrentMetrics::Increment metric_increment{CurrentMetrics::AsynchronousReadWait};
auto result = prefetch_future.get();
size = result.size;
auto offset = result.offset;
assert(offset < size);
if (size)
{
memory.swap(prefetch_buffer);
size -= offset;
set(memory.data() + offset, size);
working_buffer.resize(size);
file_offset_of_buffer_end += size;
}
offset = result.offset;
/// If prefetch_future is valid, size should always be greater than zero.
assert(offset < size && size > 0);
ProfileEvents::increment(ProfileEvents::AsynchronousReadWaitMicroseconds, watch.elapsedMicroseconds());
}
watch.stop();
ProfileEvents::increment(ProfileEvents::AsynchronousReadWaitMicroseconds, watch.elapsedMicroseconds());
prefetch_buffer.swap(memory);
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
}
else
{
ProfileEvents::increment(ProfileEvents::RemoteFSUnprefetchedReads);
auto result = readInto(memory.data(), memory.size()).get();
size = result.size;
auto offset = result.offset;
assert(offset < size);
assert(offset < size || size == 0);
if (size)
{
size -= offset;
set(memory.data() + offset, size);
working_buffer.resize(size);
file_offset_of_buffer_end += size;
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
}
}
if (file_offset_of_buffer_end != impl->offset())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Expected equality {} == {}. It's a bug", file_offset_of_buffer_end, impl->offset());
file_offset_of_buffer_end = impl->offset();
prefetch_future = {};
return size;
}
off_t AsynchronousReadIndirectBufferFromRemoteFS::seek(off_t offset_, int whence)
off_t AsynchronousReadIndirectBufferFromRemoteFS::seek(off_t offset, int whence)
{
ProfileEvents::increment(ProfileEvents::RemoteFSSeeks);
if (whence == SEEK_CUR)
size_t new_pos;
if (whence == SEEK_SET)
{
/// If position within current working buffer - shift pos.
if (!working_buffer.empty() && static_cast<size_t>(getPosition() + offset_) < file_offset_of_buffer_end)
{
pos += offset_;
return getPosition();
}
else
{
file_offset_of_buffer_end += offset_;
}
assert(offset >= 0);
new_pos = offset;
}
else if (whence == SEEK_SET)
else if (whence == SEEK_CUR)
{
/// If position is within current working buffer - shift pos.
if (!working_buffer.empty()
&& static_cast<size_t>(offset_) >= file_offset_of_buffer_end - working_buffer.size()
&& size_t(offset_) < file_offset_of_buffer_end)
{
pos = working_buffer.end() - (file_offset_of_buffer_end - offset_);
new_pos = file_offset_of_buffer_end - (working_buffer.end() - pos) + offset;
}
else
{
throw Exception("ReadBufferFromFileDescriptor::seek expects SEEK_SET or SEEK_CUR as whence", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
/// Position is unchanged.
if (new_pos + (working_buffer.end() - pos) == file_offset_of_buffer_end)
return new_pos;
bool read_from_prefetch = false;
while (true)
{
if (file_offset_of_buffer_end - working_buffer.size() <= new_pos && new_pos <= file_offset_of_buffer_end)
{
/// Position is still inside the buffer.
/// Probably it is at the end of the buffer - then we will load data on the following 'next' call.
pos = working_buffer.end() - file_offset_of_buffer_end + new_pos;
assert(pos >= working_buffer.begin());
assert(pos <= working_buffer.end());
return getPosition();
return new_pos;
}
else
else if (prefetch_future.valid())
{
file_offset_of_buffer_end = offset_;
/// Read from prefetch buffer and recheck if the new position is valid inside.
if (nextImpl())
{
read_from_prefetch = true;
continue;
}
}
}
else
throw Exception("Only SEEK_SET or SEEK_CUR modes are allowed.", ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
if (prefetch_future.valid())
{
ProfileEvents::increment(ProfileEvents::RemoteFSCancelledPrefetches);
prefetch_future.wait();
prefetch_future = {};
/// Prefetch is cancelled because of seek.
if (read_from_prefetch)
ProfileEvents::increment(ProfileEvents::RemoteFSCancelledPrefetches);
break;
}
assert(!prefetch_future.valid());
/// First reset the buffer so the next read will fetch new data to the buffer.
resetWorkingBuffer();
/**
* Lazy ignore. Save the number of bytes to ignore and apply it either to the prefetch buffer or to the current buffer.
* Note: we read in range [file_offset_of_buffer_end, read_until_position).
*/
off_t file_offset_before_seek = impl->offset();
if (impl->initialized()
&& read_until_position && file_offset_of_buffer_end < *read_until_position
&& static_cast<off_t>(file_offset_of_buffer_end) > file_offset_before_seek
&& static_cast<off_t>(file_offset_of_buffer_end) < file_offset_before_seek + static_cast<off_t>(min_bytes_for_seek))
&& read_until_position && new_pos < *read_until_position
&& new_pos > file_offset_of_buffer_end
&& new_pos < file_offset_of_buffer_end + min_bytes_for_seek)
{
ProfileEvents::increment(ProfileEvents::RemoteFSLazySeeks);
bytes_to_ignore = file_offset_of_buffer_end - file_offset_before_seek;
bytes_to_ignore = new_pos - file_offset_of_buffer_end;
}
else
{
ProfileEvents::increment(ProfileEvents::RemoteFSSeeksWithReset);
impl->reset();
file_offset_of_buffer_end = new_pos;
}
return file_offset_of_buffer_end;
return new_pos;
}

View File

@ -157,6 +157,7 @@ bool ReadBufferFromRemoteFSGather::readImpl()
if (bytes_to_ignore)
{
current_buf->ignore(bytes_to_ignore);
file_offset_of_buffer_end += bytes_to_ignore;
bytes_to_ignore = 0;
}

View File

@ -9,9 +9,8 @@
namespace DB
{
template <typename T>
WriteIndirectBufferFromRemoteFS<T>::WriteIndirectBufferFromRemoteFS(
std::unique_ptr<T> impl_,
WriteIndirectBufferFromRemoteFS::WriteIndirectBufferFromRemoteFS(
std::unique_ptr<WriteBuffer> impl_,
CreateMetadataCallback && create_callback_,
const String & metadata_file_path_)
: WriteBufferFromFileDecorator(std::move(impl_))
@ -20,8 +19,8 @@ WriteIndirectBufferFromRemoteFS<T>::WriteIndirectBufferFromRemoteFS(
{
}
template <typename T>
WriteIndirectBufferFromRemoteFS<T>::~WriteIndirectBufferFromRemoteFS()
WriteIndirectBufferFromRemoteFS::~WriteIndirectBufferFromRemoteFS()
{
try
{
@ -33,29 +32,12 @@ WriteIndirectBufferFromRemoteFS<T>::~WriteIndirectBufferFromRemoteFS()
}
}
template <typename T>
void WriteIndirectBufferFromRemoteFS<T>::finalizeImpl()
void WriteIndirectBufferFromRemoteFS::finalizeImpl()
{
WriteBufferFromFileDecorator::finalizeImpl();
create_metadata_callback(count());
}
#if USE_AWS_S3
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromS3>;
#endif
#if USE_AZURE_BLOB_STORAGE
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromAzureBlobStorage>;
#endif
#if USE_HDFS
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromHDFS>;
#endif
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromHTTP>;
}

View File

@ -12,12 +12,11 @@ namespace DB
using CreateMetadataCallback = std::function<void(size_t bytes_count)>;
/// Stores data in S3/HDFS and adds the object path and object size to metadata file on local FS.
template <typename T>
class WriteIndirectBufferFromRemoteFS final : public WriteBufferFromFileDecorator
{
public:
WriteIndirectBufferFromRemoteFS(
std::unique_ptr<T> impl_,
std::unique_ptr<WriteBuffer> impl_,
CreateMetadataCallback && create_callback_,
const String & metadata_file_path_);

View File

@ -293,7 +293,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path,
readOrCreateUpdateAndStoreMetadata(path, mode, false, [blob_name, count] (Metadata & metadata) { metadata.addObject(blob_name, count); return true; });
};
return std::make_unique<WriteIndirectBufferFromRemoteFS<WriteBufferFromS3>>(std::move(s3_buffer), std::move(create_metadata_callback), path);
return std::make_unique<WriteIndirectBufferFromRemoteFS>(std::move(s3_buffer), std::move(create_metadata_callback), path);
}
void DiskS3::createHardLink(const String & src_path, const String & dst_path)

View File

@ -121,7 +121,8 @@ public:
auto set = column_set->getData();
auto set_types = set->getDataTypes();
if (tuple && (set_types.size() != 1 || !set_types[0]->equals(*type_tuple)))
if (tuple && set_types.size() != 1 && set_types.size() == tuple->tupleSize())
{
const auto & tuple_columns = tuple->getColumns();
const DataTypes & tuple_types = type_tuple->getElements();

View File

@ -26,6 +26,7 @@ namespace DB
namespace ErrorCodes
{
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int LOGICAL_ERROR;
}
@ -43,6 +44,8 @@ std::future<IAsynchronousReader::Result> AsynchronousReadBufferFromFileDescripto
request.size = size;
request.offset = file_offset_of_buffer_end;
request.priority = priority;
request.ignore = bytes_to_ignore;
bytes_to_ignore = 0;
/// This is a workaround of a read pass EOF bug in linux kernel with pread()
if (file_size.has_value() && file_offset_of_buffer_end >= *file_size)
@ -75,11 +78,14 @@ bool AsynchronousReadBufferFromFileDescriptor::nextImpl()
/// Read request already in flight. Wait for its completion.
size_t size = 0;
size_t offset = 0;
{
Stopwatch watch;
CurrentMetrics::Increment metric_increment{CurrentMetrics::AsynchronousReadWait};
auto result = prefetch_future.get();
size = result.size;
offset = result.offset;
assert(offset < size || size == 0);
ProfileEvents::increment(ProfileEvents::AsynchronousReadWaitMicroseconds, watch.elapsedMicroseconds());
}
@ -89,8 +95,8 @@ bool AsynchronousReadBufferFromFileDescriptor::nextImpl()
if (size)
{
prefetch_buffer.swap(memory);
set(memory.data(), memory.size());
working_buffer.resize(size);
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
return true;
}
@ -100,13 +106,13 @@ bool AsynchronousReadBufferFromFileDescriptor::nextImpl()
{
/// No pending request. Do synchronous read.
auto [size, _] = readInto(memory.data(), memory.size()).get();
auto [size, offset] = readInto(memory.data(), memory.size()).get();
file_offset_of_buffer_end += size;
if (size)
{
set(memory.data(), memory.size());
working_buffer.resize(size);
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
return true;
}
@ -125,6 +131,30 @@ void AsynchronousReadBufferFromFileDescriptor::finalize()
}
AsynchronousReadBufferFromFileDescriptor::AsynchronousReadBufferFromFileDescriptor(
AsynchronousReaderPtr reader_,
Int32 priority_,
int fd_,
size_t buf_size,
char * existing_memory,
size_t alignment,
std::optional<size_t> file_size_)
: ReadBufferFromFileBase(buf_size, existing_memory, alignment, file_size_)
, reader(std::move(reader_))
, priority(priority_)
, required_alignment(alignment)
, fd(fd_)
{
if (required_alignment > buf_size)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Too large alignment. Cannot have required_alignment greater than buf_size: {} > {}. It is a bug",
required_alignment,
buf_size);
prefetch_buffer.alignment = alignment;
}
AsynchronousReadBufferFromFileDescriptor::~AsynchronousReadBufferFromFileDescriptor()
{
finalize();
@ -153,46 +183,48 @@ off_t AsynchronousReadBufferFromFileDescriptor::seek(off_t offset, int whence)
if (new_pos + (working_buffer.end() - pos) == file_offset_of_buffer_end)
return new_pos;
if (file_offset_of_buffer_end - working_buffer.size() <= static_cast<size_t>(new_pos)
&& new_pos <= file_offset_of_buffer_end)
while (true)
{
/// Position is still inside the buffer.
/// Probably it is at the end of the buffer - then we will load data on the following 'next' call.
pos = working_buffer.end() - file_offset_of_buffer_end + new_pos;
assert(pos >= working_buffer.begin());
assert(pos <= working_buffer.end());
return new_pos;
}
else
{
if (prefetch_future.valid())
if (file_offset_of_buffer_end - working_buffer.size() <= new_pos && new_pos <= file_offset_of_buffer_end)
{
//std::cerr << "Ignoring prefetched data" << "\n";
prefetch_future.wait();
prefetch_future = {};
/// Position is still inside the buffer.
/// Probably it is at the end of the buffer - then we will load data on the following 'next' call.
pos = working_buffer.end() - file_offset_of_buffer_end + new_pos;
assert(pos >= working_buffer.begin());
assert(pos <= working_buffer.end());
return new_pos;
}
else if (prefetch_future.valid())
{
/// Read from prefetch buffer and recheck if the new position is valid inside.
if (nextImpl())
continue;
}
/// Position is out of the buffer, we need to do real seek.
off_t seek_pos = required_alignment > 1
? new_pos / required_alignment * required_alignment
: new_pos;
off_t offset_after_seek_pos = new_pos - seek_pos;
/// First reset the buffer so the next read will fetch new data to the buffer.
resetWorkingBuffer();
/// Just update the info about the next position in file.
file_offset_of_buffer_end = seek_pos;
if (offset_after_seek_pos > 0)
ignore(offset_after_seek_pos);
return seek_pos;
break;
}
assert(!prefetch_future.valid());
/// Position is out of the buffer, we need to do real seek.
off_t seek_pos = required_alignment > 1
? new_pos / required_alignment * required_alignment
: new_pos;
/// First reset the buffer so the next read will fetch new data to the buffer.
resetWorkingBuffer();
/// Just update the info about the next position in file.
file_offset_of_buffer_end = seek_pos;
bytes_to_ignore = new_pos - seek_pos;
assert(bytes_to_ignore < internal_buffer.size());
return seek_pos;
}

View File

@ -24,6 +24,7 @@ protected:
const size_t required_alignment = 0; /// For O_DIRECT both file offsets and memory addresses have to be aligned.
size_t file_offset_of_buffer_end = 0; /// What offset in file corresponds to working_buffer.end().
size_t bytes_to_ignore = 0; /// How many bytes should we ignore upon a new read request.
int fd;
bool nextImpl() override;
@ -41,15 +42,7 @@ public:
size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE,
char * existing_memory = nullptr,
size_t alignment = 0,
std::optional<size_t> file_size_ = std::nullopt)
: ReadBufferFromFileBase(buf_size, existing_memory, alignment, file_size_)
, reader(std::move(reader_))
, priority(priority_)
, required_alignment(alignment)
, fd(fd_)
{
prefetch_buffer.alignment = alignment;
}
std::optional<size_t> file_size_ = std::nullopt);
~AsynchronousReadBufferFromFileDescriptor() override;

View File

@ -50,6 +50,29 @@ public:
// FIXME: behavior differs greatly from `BufferBase::set()` and it's very confusing.
void set(Position ptr, size_t size) { BufferBase::set(ptr, size, 0); working_buffer.resize(0); }
/// Set buffer to given piece of memory but with certain bytes ignored from beginning.
///
/// internal_buffer: |__________________|
/// working_buffer: |xxxxx|____________|
/// ^ ^
/// bytes_to_ignore
///
/// It's used for lazy seek. We also have another lazy seek mechanism that uses
/// `nextimpl_working_buffer_offset` to set the offset in the `next` method. It's important that we
/// don't do a double lazy seek, which means `nextimpl_working_buffer_offset` should be zero. It's
/// useful to keep internal_buffer pointing to the real span of the underlying memory, because its
/// size might be used to allocate other buffers. It's also important to have pos start at
/// working_buffer.begin(), because some buffers assume this condition to be true and use
/// offset() to check read bytes.
void setWithBytesToIgnore(Position ptr, size_t size, size_t bytes_to_ignore)
{
assert(bytes_to_ignore < size);
assert(nextimpl_working_buffer_offset == 0);
internal_buffer = Buffer(ptr, ptr + size);
working_buffer = Buffer(ptr + bytes_to_ignore, ptr + size);
pos = ptr + bytes_to_ignore;
}
/** read next data and fill a buffer with it; set position to the beginning;
* return `false` in case of end, `true` otherwise; throw an exception, if something is wrong
*/

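For context on the new `setWithBytesToIgnore()` helper above, here is a minimal, hypothetical Python sketch of the idea (names are illustrative, not part of the codebase): the internal buffer keeps covering the whole allocated span, while the working buffer exposed to readers starts `bytes_to_ignore` bytes later.
```
def set_with_bytes_to_ignore(data: bytes, bytes_to_ignore: int):
    assert bytes_to_ignore < len(data)
    internal_buffer = memoryview(data)                  # full span of the underlying memory
    working_buffer = internal_buffer[bytes_to_ignore:]  # the part visible to the reader
    pos = 0                                             # reading starts at working_buffer[0]
    return internal_buffer, working_buffer, pos

internal, working, pos = set_with_bytes_to_ignore(b"XXXXXpayload", 5)
assert bytes(working) == b"payload"           # the ignored prefix is skipped lazily
assert len(internal) == len(b"XXXXXpayload")  # the allocation still spans the whole memory
```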
View File

@ -82,8 +82,7 @@ std::future<IAsynchronousReader::Result> SynchronousReader::submit(Request reque
watch.stop();
ProfileEvents::increment(ProfileEvents::DiskReadElapsedMicroseconds, watch.elapsedMicroseconds());
return Result{ .size = bytes_read, .offset = 0};
return Result{ .size = bytes_read, .offset = request.ignore };
});
}

View File

@ -176,7 +176,7 @@ std::future<IAsynchronousReader::Result> ThreadPoolReader::submit(Request reques
ProfileEvents::increment(ProfileEvents::ThreadPoolReaderPageCacheHitElapsedMicroseconds, watch.elapsedMicroseconds());
ProfileEvents::increment(ProfileEvents::DiskReadElapsedMicroseconds, watch.elapsedMicroseconds());
promise.set_value({bytes_read, 0});
promise.set_value({bytes_read, request.ignore});
return future;
}
}
@ -219,7 +219,7 @@ std::future<IAsynchronousReader::Result> ThreadPoolReader::submit(Request reques
ProfileEvents::increment(ProfileEvents::ThreadPoolReaderPageCacheMissElapsedMicroseconds, watch.elapsedMicroseconds());
ProfileEvents::increment(ProfileEvents::DiskReadElapsedMicroseconds, watch.elapsedMicroseconds());
return Result{ .size = bytes_read, .offset = 0 };
return Result{ .size = bytes_read, .offset = request.ignore };
});
auto future = task->get_future();

View File

@ -95,8 +95,9 @@ StorageFileLog::StorageFileLog(
void StorageFileLog::loadMetaFiles(bool attach)
{
const auto & storage = getStorageID();
/// FIXME Why do we need a separate directory? Why not use the data directory?
root_meta_path
= std::filesystem::path(getContext()->getPath()) / ".filelog_storage_metadata" / storage.getDatabaseName() / storage.getTableName();
= std::filesystem::path(getContext()->getPath()) / "stream_engines/filelog/" / DatabaseCatalog::getPathForUUID(storage.uuid);
/// Attach table
if (attach)

View File

@ -52,12 +52,6 @@ public:
void drop() override;
/// We need to call drop() immediately to remove meta data directory,
/// otherwise, if another filelog table with same name created before
/// the table be dropped finally, then its meta data directory will
/// be deleted by this table drop finally
bool dropTableImmediately() override { return true; }
const auto & getFormatName() const { return format_name; }
enum class FileStatus

View File

@ -598,10 +598,6 @@ public:
/// Does not takes underlying Storage (if any) into account.
virtual std::optional<UInt64> lifetimeBytes() const { return {}; }
/// Should table->drop be called at once or with delay (in case of atomic database engine).
/// Needed for integration engines, when there must be no delay for calling drop() method.
virtual bool dropTableImmediately() { return false; }
private:
/// Lock required for alter queries (lockForAlter).
/// Allows to execute only one simultaneous alter query.

View File

@ -1751,8 +1751,6 @@ void MergeTreeDataSelectExecutor::selectPartsToReadWithUUIDFilter(
PartFilterCounters & counters,
Poco::Logger * log)
{
const Settings & settings = query_context->getSettings();
/// process_parts prepare parts that have to be read for the query,
/// returns false if duplicated parts' UUID have been met
auto select_parts = [&] (MergeTreeData::DataPartsVector & selected_parts) -> bool
@ -1807,14 +1805,11 @@ void MergeTreeDataSelectExecutor::selectPartsToReadWithUUIDFilter(
counters.num_granules_after_partition_pruner += num_granules;
/// populate UUIDs and exclude ignored parts if enabled
if (part->uuid != UUIDHelpers::Nil)
if (part->uuid != UUIDHelpers::Nil && pinned_part_uuids->contains(part->uuid))
{
if (settings.experimental_query_deduplication_send_all_part_uuids || pinned_part_uuids->contains(part->uuid))
{
auto result = temp_part_uuids.insert(part->uuid);
if (!result.second)
throw Exception("Found a part with the same UUID on the same replica.", ErrorCodes::LOGICAL_ERROR);
}
auto result = temp_part_uuids.insert(part->uuid);
if (!result.second)
throw Exception("Found a part with the same UUID on the same replica.", ErrorCodes::LOGICAL_ERROR);
}
selected_parts.push_back(part_or_projection);

View File

@ -54,7 +54,7 @@ MergeTreeIndexReader::MergeTreeIndexReader(
std::move(settings));
version = index_format.version;
stream->adjustForRange(MarkRange(0, getLastMark(all_mark_ranges_)));
stream->adjustRightMark(getLastMark(all_mark_ranges_));
stream->seekToStart();
}

View File

@ -695,10 +695,10 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar
{
auto block = prev_reader->sample_block.cloneWithColumns(read_result.columns);
auto block_before_prewhere = read_result.block_before_prewhere;
for (auto & ctn : block)
for (const auto & column : block)
{
if (block_before_prewhere.has(ctn.name))
block_before_prewhere.erase(ctn.name);
if (block_before_prewhere.has(column.name))
block_before_prewhere.erase(column.name);
}
if (block_before_prewhere)
@ -710,8 +710,8 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar
block_before_prewhere.setColumns(std::move(old_columns));
}
for (auto && ctn : block_before_prewhere)
block.insert(std::move(ctn));
for (auto & column : block_before_prewhere)
block.insert(std::move(column));
}
merge_tree_reader->evaluateMissingDefaults(block, columns);
}

View File

@ -96,6 +96,7 @@ MergeTreeReaderCompact::MergeTreeReaderCompact(
cached_buffer = std::move(buffer);
data_buffer = cached_buffer.get();
compressed_data_buffer = cached_buffer.get();
}
else
{
@ -114,6 +115,7 @@ MergeTreeReaderCompact::MergeTreeReaderCompact(
non_cached_buffer = std::move(buffer);
data_buffer = non_cached_buffer.get();
compressed_data_buffer = non_cached_buffer.get();
}
}
catch (...)
@ -260,10 +262,7 @@ void MergeTreeReaderCompact::seekToMark(size_t row_index, size_t column_index)
MarkInCompressedFile mark = marks_loader.getMark(row_index, column_index);
try
{
if (cached_buffer)
cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
if (non_cached_buffer)
non_cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
compressed_data_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
}
catch (Exception & e)
{
@ -288,10 +287,7 @@ void MergeTreeReaderCompact::adjustUpperBound(size_t last_mark)
return;
last_right_offset = 0; // Zero value means the end of file.
if (cached_buffer)
cached_buffer->setReadUntilEnd();
if (non_cached_buffer)
non_cached_buffer->setReadUntilEnd();
data_buffer->setReadUntilEnd();
}
else
{
@ -299,10 +295,7 @@ void MergeTreeReaderCompact::adjustUpperBound(size_t last_mark)
return;
last_right_offset = right_offset;
if (cached_buffer)
cached_buffer->setReadUntilPosition(right_offset);
if (non_cached_buffer)
non_cached_buffer->setReadUntilPosition(right_offset);
data_buffer->setReadUntilPosition(right_offset);
}
}

View File

@ -41,6 +41,7 @@ private:
bool isContinuousReading(size_t mark, size_t column_position);
ReadBuffer * data_buffer;
CompressedReadBufferBase * compressed_data_buffer;
std::unique_ptr<CachedCompressedReadBuffer> cached_buffer;
std::unique_ptr<CompressedReadBufferFromFile> non_cached_buffer;

View File

@ -42,7 +42,8 @@ MergeTreeReaderStream::MergeTreeReaderStream(
{
size_t left_mark = mark_range.begin;
size_t right_mark = mark_range.end;
auto [_, mark_range_bytes] = getRightOffsetAndBytesRange(left_mark, right_mark);
size_t left_offset = left_mark < marks_count ? marks_loader.getMark(left_mark).offset_in_compressed_file : 0;
auto mark_range_bytes = getRightOffset(right_mark) - left_offset;
max_mark_range_bytes = std::max(max_mark_range_bytes, mark_range_bytes);
sum_mark_range_bytes += mark_range_bytes;
@ -85,6 +86,7 @@ MergeTreeReaderStream::MergeTreeReaderStream(
cached_buffer = std::move(buffer);
data_buffer = cached_buffer.get();
compressed_data_buffer = cached_buffer.get();
}
else
{
@ -102,22 +104,21 @@ MergeTreeReaderStream::MergeTreeReaderStream(
non_cached_buffer = std::move(buffer);
data_buffer = non_cached_buffer.get();
compressed_data_buffer = non_cached_buffer.get();
}
}
std::pair<size_t, size_t> MergeTreeReaderStream::getRightOffsetAndBytesRange(size_t left_mark, size_t right_mark_non_included)
size_t MergeTreeReaderStream::getRightOffset(size_t right_mark_non_included)
{
/// NOTE: if we are reading the whole file, then right_mark == marks_count
/// and we will use max_read_buffer_size for buffer size, thus avoiding the need to load marks.
/// Special case, can happen in Collapsing/Replacing engines
if (marks_count == 0)
return std::make_pair(0, 0);
return 0;
assert(left_mark < marks_count);
assert(right_mark_non_included <= marks_count);
assert(left_mark <= right_mark_non_included);
size_t result_right_offset;
if (0 < right_mark_non_included && right_mark_non_included < marks_count)
@ -177,30 +178,20 @@ std::pair<size_t, size_t> MergeTreeReaderStream::getRightOffsetAndBytesRange(siz
}
}
else if (right_mark_non_included == 0)
{
result_right_offset = marks_loader.getMark(right_mark_non_included).offset_in_compressed_file;
}
else
{
result_right_offset = file_size;
}
size_t mark_range_bytes = result_right_offset - (left_mark < marks_count ? marks_loader.getMark(left_mark).offset_in_compressed_file : 0);
return std::make_pair(result_right_offset, mark_range_bytes);
return result_right_offset;
}
void MergeTreeReaderStream::seekToMark(size_t index)
{
MarkInCompressedFile mark = marks_loader.getMark(index);
try
{
if (cached_buffer)
cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
if (non_cached_buffer)
non_cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
compressed_data_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
}
catch (Exception & e)
{
@ -220,10 +211,7 @@ void MergeTreeReaderStream::seekToStart()
{
try
{
if (cached_buffer)
cached_buffer->seek(0, 0);
if (non_cached_buffer)
non_cached_buffer->seek(0, 0);
compressed_data_buffer->seek(0, 0);
}
catch (Exception & e)
{
@ -236,24 +224,21 @@ void MergeTreeReaderStream::seekToStart()
}
void MergeTreeReaderStream::adjustForRange(MarkRange range)
void MergeTreeReaderStream::adjustRightMark(size_t right_mark)
{
/**
* Note: this method is called multiple times for the same range of marks -- each time we
* read from stream, but we must update last_right_offset only if it is bigger than
* the last one to avoid redundantly cancelling prefetches.
*/
auto [right_offset, _] = getRightOffsetAndBytesRange(range.begin, range.end);
auto right_offset = getRightOffset(right_mark);
if (!right_offset)
{
if (last_right_offset && *last_right_offset == 0)
return;
last_right_offset = 0; // Zero value means the end of file.
if (cached_buffer)
cached_buffer->setReadUntilEnd();
if (non_cached_buffer)
non_cached_buffer->setReadUntilEnd();
data_buffer->setReadUntilEnd();
}
else
{
@ -261,10 +246,7 @@ void MergeTreeReaderStream::adjustForRange(MarkRange range)
return;
last_right_offset = right_offset;
if (cached_buffer)
cached_buffer->setReadUntilPosition(right_offset);
if (non_cached_buffer)
non_cached_buffer->setReadUntilPosition(right_offset);
data_buffer->setReadUntilPosition(right_offset);
}
}

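A rough sketch of what `getRightOffset()` and the new byte-range computation above amount to, assuming marks are plain offsets into the compressed file (simplified; the real implementation goes through the marks loader and handles more cases):
```
def right_offset(marks, right_mark_non_included, file_size):
    if not marks:                        # special case, e.g. Collapsing/Replacing engines
        return 0
    if right_mark_non_included < len(marks):
        return marks[right_mark_non_included]
    return file_size                     # reading up to the end of the file

def mark_range_bytes(marks, left_mark, right_mark_non_included, file_size):
    left_offset = marks[left_mark] if left_mark < len(marks) else 0
    return right_offset(marks, right_mark_non_included, file_size) - left_offset

marks = [0, 100, 250, 400]   # offset_in_compressed_file of each mark
assert mark_range_bytes(marks, 1, 3, file_size=1000) == 300    # marks 1..2 cover [100, 400)
assert mark_range_bytes(marks, 0, 4, file_size=1000) == 1000   # the whole file
```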
View File

@ -34,12 +34,13 @@ public:
* Does the buffer need to know anything about the bounds of the mark ranges it is going to read?
* (In case of MergeTree* tables.) Mostly needed for reading from a remote fs.
*/
void adjustForRange(MarkRange range);
void adjustRightMark(size_t right_mark);
ReadBuffer * data_buffer;
CompressedReadBufferBase * compressed_data_buffer;
private:
std::pair<size_t, size_t> getRightOffsetAndBytesRange(size_t left_mark, size_t right_mark_non_included);
size_t getRightOffset(size_t right_mark_non_included);
DiskPtr disk;
std::string path_prefix;

View File

@ -212,7 +212,7 @@ static ReadBuffer * getStream(
return nullptr;
MergeTreeReaderStream & stream = *it->second;
stream.adjustForRange(MarkRange(seek_to_start ? 0 : from_mark, current_task_last_mark));
stream.adjustRightMark(current_task_last_mark);
if (seek_to_start)
stream.seekToStart();

View File

@ -349,14 +349,20 @@ def parse_args() -> argparse.Namespace:
help="list of image paths to build instead of using pr_info + diff URL, "
"e.g. 'docker/packager/binary'",
)
parser.add_argument("--reports", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-reports",
action="store_true",
action="store_false",
dest="reports",
default=argparse.SUPPRESS,
help="don't push reports to S3 and github",
)
parser.add_argument("--push", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-push-images",
action="store_true",
action="store_false",
dest="push",
default=argparse.SUPPRESS,
help="don't push images to docker hub",
)
@ -375,8 +381,7 @@ def main():
else:
changed_json = os.path.join(TEMP_PATH, "changed_images.json")
push = not args.no_push_images
if push:
if args.push:
subprocess.check_output( # pylint: disable=unexpected-keyword-arg
"docker login --username 'robotclickhouse' --password-stdin",
input=get_parameter_from_ssm("dockerhub_robot_password"),
@ -408,7 +413,7 @@ def main():
images_processing_result = []
for image in changed_images:
images_processing_result += process_image_with_parents(
image, image_versions, push
image, image_versions, args.push
)
result_images[image.repo] = result_version
@ -437,7 +442,7 @@ def main():
print(f"::notice ::Report url: {url}")
print(f'::set-output name=url_output::"{url}"')
if args.no_reports:
if not args.reports:
return
gh = Github(get_best_robot_token())

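The argparse change above relies on a paired-flag idiom: a hidden positive option carries the default, and the visible `--no-*` option flips the same `dest` with `action="store_false"`. A small self-contained sketch of the pattern (option names are illustrative):
```
#!/usr/bin/env python3
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--push", default=True, help=argparse.SUPPRESS)
parser.add_argument(
    "--no-push-images",
    dest="push",
    action="store_false",
    default=argparse.SUPPRESS,   # don't override the default set just above
    help="don't push images to docker hub",
)

assert parser.parse_args([]).push is True
assert parser.parse_args(["--no-push-images"]).push is False
```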
View File

@ -44,14 +44,20 @@ def parse_args() -> argparse.Namespace:
default=RUNNER_TEMP,
help="path to changed_images_*.json files",
)
parser.add_argument("--reports", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-reports",
action="store_true",
action="store_false",
dest="reports",
default=argparse.SUPPRESS,
help="don't push reports to S3 and github",
)
parser.add_argument("--push", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-push-images",
action="store_true",
action="store_false",
dest="push",
default=argparse.SUPPRESS,
help="don't push images to docker hub",
)
@ -63,7 +69,7 @@ def parse_args() -> argparse.Namespace:
def load_images(path: str, suffix: str) -> Images:
with open(os.path.join(path, CHANGED_IMAGES.format(suffix)), "r") as images:
with open(os.path.join(path, CHANGED_IMAGES.format(suffix)), "rb") as images:
return json.load(images)
@ -125,39 +131,37 @@ def merge_images(to_merge: Dict[str, Images]) -> Dict[str, List[List[str]]]:
def create_manifest(image: str, tags: List[str], push: bool) -> Tuple[str, str]:
tag = tags[0]
manifest = f"{image}:{tag}"
cmd = "docker manifest create --amend {}".format(
" ".join((f"{image}:{t}" for t in tags))
)
cmd = "docker manifest create --amend " + " ".join((f"{image}:{t}" for t in tags))
logging.info("running: %s", cmd)
popen = subprocess.Popen(
with subprocess.Popen(
cmd,
shell=True,
stderr=subprocess.STDOUT,
stdout=subprocess.PIPE,
universal_newlines=True,
)
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to create manifest for %s:\n %s\n", manifest, output)
return manifest, "FAIL"
if not push:
return manifest, "OK"
) as popen:
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to create manifest for %s:\n %s\n", manifest, output)
return manifest, "FAIL"
if not push:
return manifest, "OK"
cmd = f"docker manifest push {manifest}"
logging.info("running: %s", cmd)
popen = subprocess.Popen(
with subprocess.Popen(
cmd,
shell=True,
stderr=subprocess.STDOUT,
stdout=subprocess.PIPE,
universal_newlines=True,
)
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to push %s:\n %s\n", manifest, output)
return manifest, "FAIL"
) as popen:
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to push %s:\n %s\n", manifest, output)
return manifest, "FAIL"
return manifest, "OK"
@ -167,8 +171,7 @@ def main():
stopwatch = Stopwatch()
args = parse_args()
push = not args.no_push_images
if push:
if args.push:
subprocess.check_output( # pylint: disable=unexpected-keyword-arg
"docker login --username 'robotclickhouse' --password-stdin",
input=get_parameter_from_ssm("dockerhub_robot_password"),
@ -189,12 +192,14 @@ def main():
test_results = [] # type: List[Tuple[str, str]]
for image, versions in merged.items():
for tags in versions:
manifest, test_result = create_manifest(image, tags, push)
manifest, test_result = create_manifest(image, tags, args.push)
test_results.append((manifest, test_result))
if test_result != "OK":
status = "failure"
with open(os.path.join(args.path, "changed_images.json"), "w") as ci:
with open(
os.path.join(args.path, "changed_images.json"), "w", encoding="utf-8"
) as ci:
json.dump(changed_images, ci)
pr_info = PRInfo()
@ -202,10 +207,10 @@ def main():
url = upload_results(s3_helper, pr_info.number, pr_info.sha, test_results, [], NAME)
print("::notice ::Report url: {}".format(url))
print('::set-output name=url_output::"{}"'.format(url))
print(f"::notice ::Report url: {url}")
print(f'::set-output name=url_output::"{url}"')
if args.no_reports:
if not args.reports:
return
if changed_images:

View File
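The switch to `with subprocess.Popen(...)` above uses the process object as a context manager, so its pipes are closed and the child is reaped even if an exception is raised while handling the output. A minimal sketch of the pattern (the command is a placeholder):
```
import subprocess

def run_and_capture(cmd: str):
    with subprocess.Popen(
        cmd,
        shell=True,
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ) as popen:
        output = popen.stdout.read()   # read first to avoid blocking on a full pipe
        retcode = popen.wait()
    return retcode, output

retcode, output = run_and_capture("echo hello")
assert retcode == 0 and output.strip() == "hello"
```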

@ -5,8 +5,13 @@ import re
import subprocess
from typing import Optional
TAG_REGEXP = r"^v\d{2}[.][1-9]\d*[.][1-9]\d*[.][1-9]\d*-(testing|prestable|stable|lts)$"
SHA_REGEXP = r"^([0-9]|[a-f]){40}$"
# ^ and $ match subline in `multiple\nlines`
# \A and \Z match only start and end of the whole string
RELEASE_BRANCH_REGEXP = r"\A\d+[.]\d+\Z"
TAG_REGEXP = (
r"\Av\d{2}[.][1-9]\d*[.][1-9]\d*[.][1-9]\d*-(testing|prestable|stable|lts)\Z"
)
SHA_REGEXP = r"\A([0-9]|[a-f]){40}\Z"
# Py 3.8 removeprefix and removesuffix
@ -31,6 +36,13 @@ def commit(name: str):
return name
def release_branch(name: str):
r = re.compile(RELEASE_BRANCH_REGEXP)
if not r.match(name):
raise argparse.ArgumentTypeError("release branch should look like '12.1'")
return name
class Runner:
"""lightweight check_output wrapper that strips the trailing NEW_LINE"""

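The move from `^`/`$` to `\A`/`\Z` above matters because, even without `re.MULTILINE`, `$` also matches just before a trailing newline. A short illustration with a simplified tag pattern (not the exact TAG_REGEXP):
```
import re

caret_dollar = re.compile(r"^v\d{2}[.]\d+[.]\d+[.]\d+-stable$")
anchored = re.compile(r"\Av\d{2}[.]\d+[.]\d+[.]\d+-stable\Z")

tag_with_newline = "v21.12.333.22222-stable\n"
assert caret_dollar.match(tag_with_newline)        # accepted: $ tolerates the trailing newline
assert anchored.match(tag_with_newline) is None    # rejected: \Z requires the true end of string
assert anchored.match("v21.12.333.22222-stable")   # a clean tag still matches
```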
68
tests/ci/git_test.py Normal file
View File

@ -0,0 +1,68 @@
#!/usr/bin/env python
from unittest.mock import patch
import os.path as p
import unittest
from git_helper import Git, Runner
class TestRunner(unittest.TestCase):
def test_init(self):
runner = Runner()
self.assertEqual(runner.cwd, p.realpath(p.dirname(__file__)))
runner = Runner("/")
self.assertEqual(runner.cwd, "/")
def test_run(self):
runner = Runner()
output = runner.run("echo 1")
self.assertEqual(output, "1")
class TestGit(unittest.TestCase):
def setUp(self):
"""we use dummy git object"""
run_patcher = patch("git_helper.Runner.run", return_value="")
self.run_mock = run_patcher.start()
self.addCleanup(run_patcher.stop)
update_patcher = patch("git_helper.Git.update")
update_mock = update_patcher.start()
self.addCleanup(update_patcher.stop)
self.git = Git()
update_mock.assert_called_once()
self.run_mock.assert_called_once()
self.git.new_branch = "NEW_BRANCH_NAME"
self.git.new_tag = "v21.12.333.22222-stable"
self.git.branch = "old_branch"
self.git.sha = ""
self.git.sha_short = ""
self.git.latest_tag = ""
self.git.description = ""
self.git.commits_since_tag = 0
def test_tags(self):
self.git.new_tag = "v21.12.333.22222-stable"
self.git.latest_tag = "v21.12.333.22222-stable"
for tag_attr in ("new_tag", "latest_tag"):
self.assertEqual(getattr(self.git, tag_attr), "v21.12.333.22222-stable")
setattr(self.git, tag_attr, "")
self.assertEqual(getattr(self.git, tag_attr), "")
for tag in (
"v21.12.333-stable",
"v21.12.333-prestable",
"21.12.333.22222-stable",
"v21.12.333.22222-production",
):
with self.assertRaises(Exception):
setattr(self.git, tag_attr, tag)
def test_tweak(self):
self.git.commits_since_tag = 0
self.assertEqual(self.git.tweak, 1)
self.git.commits_since_tag = 2
self.assertEqual(self.git.tweak, 2)
self.git.latest_tag = "v21.12.333.22222-testing"
self.assertEqual(self.git.tweak, 22224)
self.git.commits_since_tag = 0
self.assertEqual(self.git.tweak, 22222)

View File

@ -253,15 +253,21 @@ def parse_args() -> argparse.Namespace:
default="https://clickhousedb.jfrog.io/artifactory",
help="SaaS Artifactory url",
)
parser.add_argument("--artifactory", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"-n",
"--no-artifactory",
action="store_true",
action="store_false",
dest="artifactory",
default=argparse.SUPPRESS,
help="do not push packages to artifactory",
)
parser.add_argument("--force-download", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-force-download",
action="store_true",
action="store_false",
dest="force_download",
default=argparse.SUPPRESS,
help="do not download packages again if they exist already",
)
@ -303,10 +309,10 @@ def main():
args.commit,
args.check_name,
args.release.version,
not args.no_force_download,
args.force_download,
)
art_client = None
if not args.no_artifactory:
if args.artifactory:
art_client = Artifactory(args.artifactory_url, args.release.type)
if args.deb:

View File

@ -6,7 +6,7 @@ from typing import List, Optional
import argparse
import logging
from git_helper import commit
from git_helper import commit, release_branch
from version_helper import (
FILE_WITH_VERSION_PATH,
ClickHouseVersion,
@ -18,14 +18,45 @@ from version_helper import (
)
class Repo:
VALID = ("ssh", "https", "origin")
def __init__(self, repo: str, protocol: str):
self._repo = repo
self._url = ""
self.url = protocol
@property
def url(self) -> str:
return self._url
@url.setter
def url(self, protocol: str):
if protocol == "ssh":
self._url = f"git@github.com:{self}.git"
elif protocol == "https":
self._url = f"https://github.com/{self}.git"
elif protocol == "origin":
self._url = protocol
else:
raise Exception(f"protocol must be in {self.VALID}")
def __str__(self):
return self._repo
class Release:
BIG = ("major", "minor")
SMALL = ("patch",)
def __init__(self, version: ClickHouseVersion):
self._version = version
self._git = version._git
def __init__(self, repo: Repo, release_commit: str, release_type: str):
self.repo = repo
self._release_commit = ""
self.release_commit = release_commit
self.release_type = release_type
self._version = get_version_from_repo()
self._git = self._version._git
self._release_branch = ""
self._rollback_stack = [] # type: List[str]
def run(self, cmd: str, cwd: Optional[str] = None) -> str:
@ -35,32 +66,45 @@ class Release:
logging.info("Running command%s:\n %s", cwd_text, cmd)
return self._git.run(cmd, cwd)
def update(self):
def set_release_branch(self):
# Get the actual version for the commit before check
with self._checkout(self.release_commit, True):
self.read_version()
self.release_branch = f"{self.version.major}.{self.version.minor}"
self.read_version()
def read_version(self):
self._git.update()
self.version = get_version_from_repo()
def do(self, args: argparse.Namespace):
self.release_commit = args.commit
def do(self, check_dirty: bool, check_branch: bool, with_prestable: bool):
if not args.no_check_dirty:
if check_dirty:
logging.info("Checking if repo is clean")
self.run("git diff HEAD --exit-code")
if not args.no_check_branch:
self.check_branch(args.release_type)
self.set_release_branch()
if args.release_type in self.BIG:
# Checkout to the commit, it will provide the correct current version
with self._checkout(self.release_commit, True):
if args.no_prestable:
if check_branch:
self.check_branch()
with self._checkout(self.release_commit, True):
if self.release_type in self.BIG:
# Checkout to the commit, it will provide the correct current version
if with_prestable:
logging.info("Skipping prestable stage")
else:
with self.prestable(args):
with self.prestable():
logging.info("Prestable part of the releasing is done")
with self.testing(args):
with self.testing():
logging.info("Testing part of the releasing is done")
elif self.release_type in self.SMALL:
with self.stable():
logging.info("Stable part of the releasing is done")
self.log_rollback()
def check_no_tags_after(self):
@ -71,19 +115,27 @@ class Release:
f"{tags_after_commit}\nChoose another commit"
)
def check_branch(self, release_type: str):
if release_type in self.BIG:
def check_branch(self):
if self.release_type in self.BIG:
# Commit to spin up the release must belong to a main branch
output = self.run(f"git branch --contains={self.release_commit} master")
if "master" not in output:
branch = "master"
output = self.run(f"git branch --contains={self.release_commit} {branch}")
if branch not in output:
raise Exception(
f"commit {self.release_commit} must belong to 'master' for "
f"{release_type} release"
f"commit {self.release_commit} must belong to {branch} for "
f"{self.release_type} release"
)
if release_type in self.SMALL:
branch = f"{self.version.major}.{self.version.minor}"
if self._git.branch != branch:
raise Exception(f"branch must be '{branch}' for {release_type} release")
return
elif self.release_type in self.SMALL:
output = self.run(
f"git branch --contains={self.release_commit} {self.release_branch}"
)
if self.release_branch not in output:
raise Exception(
f"commit {self.release_commit} must be in "
f"'{self.release_branch}' branch for {self.release_type} release"
)
return
def log_rollback(self):
if self._rollback_stack:
@ -95,31 +147,60 @@ class Release:
)
@contextmanager
def prestable(self, args: argparse.Namespace):
def prestable(self):
self.check_no_tags_after()
# Create release branch
self.update()
release_branch = f"{self.version.major}.{self.version.minor}"
with self._create_branch(release_branch, self.release_commit):
with self._checkout(release_branch, True):
self.update()
self.read_version()
with self._create_branch(self.release_branch, self.release_commit):
with self._checkout(self.release_branch, True):
self.read_version()
self.version.with_description(VersionType.PRESTABLE)
with self._create_gh_release(args):
with self._bump_prestable_version(release_branch, args):
with self._create_gh_release(True):
with self._bump_prestable_version():
# At this point everything will rollback automatically
yield
@contextmanager
def testing(self, args: argparse.Namespace):
def stable(self):
self.check_no_tags_after()
self.read_version()
version_type = VersionType.STABLE
if self.version.minor % 5 == 3: # our 3 and 8 are LTS
version_type = VersionType.LTS
self.version.with_description(version_type)
with self._create_gh_release(False):
self.version = self.version.update(self.release_type)
self.version.with_description(version_type)
update_cmake_version(self.version)
cmake_path = get_abs_path(FILE_WITH_VERSION_PATH)
# Checking out the commit of the branch, and not the branch itself,
# so we are able to skip the rollback
with self._checkout(f"{self.release_branch}@{{0}}", False):
current_commit = self.run("git rev-parse HEAD")
self.run(
f"git commit -m "
f"'Update version to {self.version.string}' '{cmake_path}'"
)
with self._push(
"HEAD", with_rollback_on_fail=False, remote_ref=self.release_branch
):
# DO NOT PUT ANYTHING ELSE HERE
# The push must be the last action and mean the successful release
self._rollback_stack.append(
f"git push {self.repo.url} "
f"+{current_commit}:{self.release_branch}"
)
yield
@contextmanager
def testing(self):
# Create branch for a version bump
self.update()
self.version = self.version.update(args.release_type)
self.read_version()
self.version = self.version.update(self.release_type)
helper_branch = f"{self.version.major}.{self.version.minor}-prepare"
with self._create_branch(helper_branch, self.release_commit):
with self._checkout(helper_branch, True):
self.update()
self.version = self.version.update(args.release_type)
with self._bump_testing_version(helper_branch, args):
with self._bump_testing_version(helper_branch):
yield
@property
@ -132,6 +213,14 @@ class Release:
raise ValueError(f"version must be ClickHouseVersion, not {type(version)}")
self._version = version
@property
def release_branch(self) -> str:
return self._release_branch
@release_branch.setter
def release_branch(self, branch: str):
self._release_branch = release_branch(branch)
@property
def release_commit(self) -> str:
return self._release_commit
@ -141,7 +230,8 @@ class Release:
self._release_commit = commit(release_commit)
@contextmanager
def _bump_prestable_version(self, release_branch: str, args: argparse.Namespace):
def _bump_prestable_version(self):
# Update only git, the original version stays the same
self._git.update()
new_version = self.version.patch_update()
new_version.with_description("prestable")
@ -150,35 +240,38 @@ class Release:
self.run(
f"git commit -m 'Update version to {new_version.string}' '{cmake_path}'"
)
with self._push(release_branch, args):
with self._push(self.release_branch):
with self._create_gh_label(
f"v{release_branch}-must-backport", "10dbed", args
f"v{self.release_branch}-must-backport", "10dbed"
):
with self._create_gh_label(
f"v{release_branch}-affected", "c2bfff", args
f"v{self.release_branch}-affected", "c2bfff"
):
self.run(
f"gh pr create --repo {args.repo} --title 'Release pull "
f"request for branch {release_branch}' --head {release_branch} "
f"gh pr create --repo {self.repo} --title "
f"'Release pull request for branch {self.release_branch}' "
f"--head {self.release_branch} --label release "
"--body 'This PullRequest is a part of ClickHouse release "
"cycle. It is used by CI system only. Do not perform any "
"changes with it.' --label release"
"changes with it.'"
)
# Here the prestable part is done
yield
@contextmanager
def _bump_testing_version(self, helper_branch: str, args: argparse.Namespace):
def _bump_testing_version(self, helper_branch: str):
self.read_version()
self.version = self.version.update(self.release_type)
self.version.with_description("testing")
update_cmake_version(self.version)
cmake_path = get_abs_path(FILE_WITH_VERSION_PATH)
self.run(
f"git commit -m 'Update version to {self.version.string}' '{cmake_path}'"
)
with self._push(helper_branch, args):
with self._push(helper_branch):
body_file = get_abs_path(".github/PULL_REQUEST_TEMPLATE.md")
self.run(
f"gh pr create --repo {args.repo} --title 'Update version after "
f"gh pr create --repo {self.repo} --title 'Update version after "
f"release' --head {helper_branch} --body-file '{body_file}'"
)
# Here the prestable part is done
@ -216,9 +309,12 @@ class Release:
raise
@contextmanager
def _create_gh_label(self, label: str, color: str, args: argparse.Namespace):
self.run(f"gh api repos/{args.repo}/labels -f name={label} -f color={color}")
rollback_cmd = f"gh api repos/{args.repo}/labels/{label} -X DELETE"
def _create_gh_label(self, label: str, color_hex: str):
# API call, https://docs.github.com/en/rest/reference/issues#create-a-label
self.run(
f"gh api repos/{self.repo}/labels -f name={label} -f color={color_hex}"
)
rollback_cmd = f"gh api repos/{self.repo}/labels/{label} -X DELETE"
self._rollback_stack.append(rollback_cmd)
try:
yield
@ -228,15 +324,18 @@ class Release:
raise
@contextmanager
def _create_gh_release(self, args: argparse.Namespace):
with self._create_tag(args):
def _create_gh_release(self, as_prerelease: bool):
with self._create_tag():
# Preserve tag if version is changed
tag = self.version.describe
prerelease = ""
if as_prerelease:
prerelease = "--prerelease"
self.run(
f"gh release create --prerelease --draft --repo {args.repo} "
f"gh release create {prerelease} --draft --repo {self.repo} "
f"--title 'Release {tag}' '{tag}'"
)
rollback_cmd = f"gh release delete --yes --repo {args.repo} '{tag}'"
rollback_cmd = f"gh release delete --yes --repo {self.repo} '{tag}'"
self._rollback_stack.append(rollback_cmd)
try:
yield
@ -246,13 +345,13 @@ class Release:
raise
@contextmanager
def _create_tag(self, args: argparse.Namespace):
def _create_tag(self):
tag = self.version.describe
self.run(f"git tag -a -m 'Release {tag}' '{tag}'")
rollback_cmd = f"git tag -d '{tag}'"
self._rollback_stack.append(rollback_cmd)
try:
with self._push(f"'{tag}'", args):
with self._push(f"'{tag}'"):
yield
except BaseException:
logging.warning("Rolling back tag %s", tag)
@ -260,15 +359,22 @@ class Release:
raise
@contextmanager
def _push(self, ref: str, args: argparse.Namespace):
self.run(f"git push git@github.com:{args.repo}.git {ref}")
rollback_cmd = f"git push -d git@github.com:{args.repo}.git {ref}"
self._rollback_stack.append(rollback_cmd)
def _push(self, ref: str, with_rollback_on_fail: bool = True, remote_ref: str = ""):
if remote_ref == "":
remote_ref = ref
self.run(f"git push {self.repo.url} {ref}:{remote_ref}")
if with_rollback_on_fail:
rollback_cmd = f"git push -d {self.repo.url} {remote_ref}"
self._rollback_stack.append(rollback_cmd)
try:
yield
except BaseException:
logging.warning("Rolling back pushed ref %s", ref)
self.run(rollback_cmd)
if with_rollback_on_fail:
logging.warning("Rolling back pushed ref %s", ref)
self.run(rollback_cmd)
raise
@ -284,52 +390,66 @@ def parse_args() -> argparse.Namespace:
default="ClickHouse/ClickHouse",
help="repository to create the release",
)
parser.add_argument(
"--remote-protocol",
"-p",
default="ssh",
choices=Repo.VALID,
help="protocol for the git commands' remote; 'origin' is a special case and "
"uses 'origin' as the remote",
)
parser.add_argument(
"--type",
default="minor",
# choices=Release.BIG+Release.SMALL, # add support later
choices=Release.BIG + Release.SMALL,
dest="release_type",
help="a release type, new branch is created only for 'major' and 'minor'",
)
parser.add_argument(
"--no-prestable",
action="store_true",
help=f"for release types in {Release.BIG} skip creating prestable release and "
"release branch",
)
parser.add_argument(
"--commit",
default=git.sha,
type=commit,
help="commit create a release, default to HEAD",
)
parser.add_argument("--with-prestable", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-prestable",
dest="with_prestable",
action="store_false",
default=argparse.SUPPRESS,
help=f"if set, for release types in {Release.BIG} skip creating prestable "
"release and release branch",
)
parser.add_argument("--check-dirty", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-check-dirty",
action="store_true",
help="skip check repository for uncommited changes",
dest="check_dirty",
action="store_false",
default=argparse.SUPPRESS,
help="(dangerous) if set, skip checking the repository for uncommitted changes",
)
parser.add_argument("--check-branch", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-check-branch",
action="store_true",
help="by default, 'major' and 'minor' types work only for master, and 'patch' "
"works only for a release branches, that name should be the same as "
"'$MAJOR.$MINOR' version, e.g. 22.2",
dest="check_branch",
action="store_false",
default=argparse.SUPPRESS,
help="(debug or development only) if set, skip the branch check for a run. "
"By default, 'major' and 'minor' types work only for master, and 'patch' works "
"only for release branches, whose name "
"should be the same as the '$MAJOR.$MINOR' version, e.g. 22.2",
)
return parser.parse_args()
def prestable():
pass
def main():
logging.basicConfig(level=logging.INFO)
args = parse_args()
release = Release(get_version_from_repo())
repo = Repo(args.repo, args.remote_protocol)
release = Release(repo, args.commit, args.release_type)
release.do(args)
release.do(args.check_dirty, args.check_branch, args.with_prestable)
if __name__ == "__main__":

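The `_rollback_stack` plus nested context managers in release.py implement a simple undo stack: each successful step registers a rollback command, and a failure unwinds the registered commands in reverse order. A condensed, self-contained sketch with dummy commands (not the actual Release class):
```
from contextlib import contextmanager

class RollbackStack:
    def __init__(self):
        self.stack = []          # rollback commands, most recent last
        self.log = []            # what actually ran (for the demo)

    def run(self, cmd: str):
        self.log.append(cmd)

    @contextmanager
    def step(self, do_cmd: str, undo_cmd: str):
        self.run(do_cmd)
        self.stack.append(undo_cmd)
        try:
            yield
        except BaseException:
            self.run(self.stack.pop())   # roll back this step, then re-raise
            raise

r = RollbackStack()
try:
    with r.step("git tag v1", "git tag -d v1"):
        with r.step("git push origin v1", "git push -d origin v1"):
            raise RuntimeError("release failed")
except RuntimeError:
    pass

assert r.log == [
    "git tag v1",
    "git push origin v1",
    "git push -d origin v1",   # the inner step is rolled back first
    "git tag -d v1",           # then the outer one
]
```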
View File

@ -105,4 +105,4 @@ if __name__ == "__main__":
args = parser.parse_args()
keys = main(args.token, args.organization, args.team)
print(f"Just showing off the keys:\n{keys}")
print(f"# Just showing off the keys:\n{keys}")

View File

@ -0,0 +1 @@
#!/usr/bin/env python3

View File

@ -0,0 +1,41 @@
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<priority>3</priority>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>2</priority>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>1</priority>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,41 @@
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>2</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<priority>3</priority>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>2</priority>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>1</priority>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,41 @@
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>3</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<priority>3</priority>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>2</priority>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>1</priority>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,16 @@
<clickhouse>
<zookeeper>
<node index="1">
<host>node1</host>
<port>9181</port>
</node>
<node index="2">
<host>node2</host>
<port>9181</port>
</node>
<node index="3">
<host>node3</host>
<port>9181</port>
</node>
</zookeeper>
</clickhouse>

View File

@ -0,0 +1,129 @@
import pytest
from helpers.cluster import ClickHouseCluster
import random
import string
import os
import time
from multiprocessing.dummy import Pool
from helpers.network import PartitionManager
from helpers.test_tools import assert_eq_with_retry
cluster = ClickHouseCluster(__file__)
node1 = cluster.add_instance('node1', main_configs=['configs/enable_keeper1.xml', 'configs/use_keeper.xml'], stay_alive=True)
node2 = cluster.add_instance('node2', main_configs=['configs/enable_keeper2.xml', 'configs/use_keeper.xml'], stay_alive=True)
node3 = cluster.add_instance('node3', main_configs=['configs/enable_keeper3.xml', 'configs/use_keeper.xml'], stay_alive=True)
from kazoo.client import KazooClient, KazooState
@pytest.fixture(scope="module")
def started_cluster():
try:
cluster.start()
yield cluster
finally:
cluster.shutdown()
def smaller_exception(ex):
return '\n'.join(str(ex).split('\n')[0:2])
def wait_node(node):
for _ in range(100):
zk = None
try:
node.query("SELECT * FROM system.zookeeper WHERE path = '/'")
zk = get_fake_zk(node.name, timeout=30.0)
zk.create("/test", sequence=True)
print("node", node.name, "ready")
break
except Exception as ex:
time.sleep(0.2)
print("Waiting until", node.name, "will be ready, exception", ex)
finally:
if zk:
zk.stop()
zk.close()
else:
raise Exception("Can't wait node", node.name, "to become ready")
def wait_nodes():
for node in [node1, node2, node3]:
wait_node(node)
def get_fake_zk(nodename, timeout=30.0):
_fake_zk_instance = KazooClient(hosts=cluster.get_instance_ip(nodename) + ":9181", timeout=timeout)
_fake_zk_instance.start()
return _fake_zk_instance
def assert_eq_stats(stat1, stat2):
assert stat1.version == stat2.version
assert stat1.cversion == stat2.cversion
assert stat1.aversion == stat2.aversion
assert stat1.dataLength == stat2.dataLength
assert stat1.numChildren == stat2.numChildren
assert stat1.ctime == stat2.ctime
assert stat1.mtime == stat2.mtime
def test_between_servers(started_cluster):
try:
wait_nodes()
node1_zk = get_fake_zk("node1")
node2_zk = get_fake_zk("node2")
node3_zk = get_fake_zk("node3")
node1_zk.create("/test_between_servers")
for child_node in range(1000):
node1_zk.create("/test_between_servers/" + str(child_node))
for child_node in range(1000):
node1_zk.set("/test_between_servers/" + str(child_node), b"somevalue")
for child_node in range(1000):
stats1 = node1_zk.exists("/test_between_servers/" + str(child_node))
stats2 = node2_zk.exists("/test_between_servers/" + str(child_node))
stats3 = node3_zk.exists("/test_between_servers/" + str(child_node))
assert_eq_stats(stats1, stats2)
assert_eq_stats(stats2, stats3)
finally:
try:
for zk_conn in [node1_zk, node2_zk, node3_zk]:
zk_conn.stop()
zk_conn.close()
except:
pass
def test_server_restart(started_cluster):
try:
wait_nodes()
node1_zk = get_fake_zk("node1")
node1_zk.create("/test_server_restart")
for child_node in range(1000):
node1_zk.create("/test_server_restart/" + str(child_node))
for child_node in range(1000):
node1_zk.set("/test_server_restart/" + str(child_node), b"somevalue")
node3.restart_clickhouse(kill=True)
node2_zk = get_fake_zk("node2")
node3_zk = get_fake_zk("node3")
for child_node in range(1000):
stats1 = node1_zk.exists("/test_server_restart/" + str(child_node))
stats2 = node2_zk.exists("/test_server_restart/" + str(child_node))
stats3 = node3_zk.exists("/test_server_restart/" + str(child_node))
assert_eq_stats(stats1, stats2)
assert_eq_stats(stats2, stats3)
finally:
try:
for zk_conn in [node1_zk, node2_zk, node3_zk]:
zk_conn.stop()
zk_conn.close()
except:
pass

View File

@ -1,5 +0,0 @@
<clickhouse>
<merge_tree>
<assign_part_uuids>1</assign_part_uuids>
</merge_tree>
</clickhouse>

View File

@ -1,7 +0,0 @@
<clickhouse>
<profiles>
<default>
<experimental_query_deduplication_send_all_part_uuids>1</experimental_query_deduplication_send_all_part_uuids>
</default>
</profiles>
</clickhouse>

View File

@ -1,24 +0,0 @@
<clickhouse>
<remote_servers>
<test_cluster>
<shard>
<replica>
<host>node1</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>node2</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>node3</host>
<port>9000</port>
</replica>
</shard>
</test_cluster>
</remote_servers>
</clickhouse>

View File

@ -1,168 +0,0 @@
import uuid
import pytest
from helpers.cluster import ClickHouseCluster
from helpers.test_tools import TSV
DUPLICATED_UUID = uuid.uuid4()
cluster = ClickHouseCluster(__file__)
node1 = cluster.add_instance(
'node1',
main_configs=['configs/remote_servers.xml', 'configs/deduplication_settings.xml'],
user_configs=['configs/profiles.xml'])
node2 = cluster.add_instance(
'node2',
main_configs=['configs/remote_servers.xml', 'configs/deduplication_settings.xml'],
user_configs=['configs/profiles.xml'])
node3 = cluster.add_instance(
'node3',
main_configs=['configs/remote_servers.xml', 'configs/deduplication_settings.xml'],
user_configs=['configs/profiles.xml'])
@pytest.fixture(scope="module")
def started_cluster():
try:
cluster.start()
yield cluster
finally:
cluster.shutdown()
def prepare_node(node, parts_uuid=None):
node.query("""
CREATE TABLE t(_prefix UInt8 DEFAULT 0, key UInt64, value UInt64)
ENGINE MergeTree()
ORDER BY tuple()
PARTITION BY _prefix
SETTINGS index_granularity = 1
""")
node.query("""
CREATE TABLE d AS t ENGINE=Distributed(test_cluster, default, t)
""")
# Stop merges while populating test data
node.query("SYSTEM STOP MERGES")
# Create 5 parts
for i in range(1, 6):
node.query("INSERT INTO t VALUES ({}, {}, {})".format(i, i, i))
node.query("DETACH TABLE t")
if parts_uuid:
for part, part_uuid in parts_uuid:
script = """
echo -n '{}' > /var/lib/clickhouse/data/default/t/{}/uuid.txt
""".format(part_uuid, part)
node.exec_in_container(["bash", "-c", script])
# Attach table back
node.query("ATTACH TABLE t")
# NOTE:
# due to the absence of the ability to lock parts, we need to operate on parts with merges prevented
# node.query("SYSTEM START MERGES")
# node.query("OPTIMIZE TABLE t FINAL")
print(node.name)
print(node.query("SELECT name, uuid, partition FROM system.parts WHERE table = 't' AND active ORDER BY name"))
assert '5' == node.query("SELECT count() FROM system.parts WHERE table = 't' AND active").strip()
if parts_uuid:
for part, part_uuid in parts_uuid:
assert '1' == node.query(
"SELECT count() FROM system.parts WHERE table = 't' AND uuid = '{}' AND active".format(
part_uuid)).strip()
@pytest.fixture(scope="module")
def prepared_cluster(started_cluster):
print("duplicated UUID: {}".format(DUPLICATED_UUID))
prepare_node(node1, parts_uuid=[("3_3_3_0", DUPLICATED_UUID)])
prepare_node(node2, parts_uuid=[("3_3_3_0", DUPLICATED_UUID)])
prepare_node(node3)
def test_virtual_column(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# we expect it to be included only once in the end result.
# The select query uses the virtual column _part_uuid to filter out the duplicated part in one shard.
expected = """
1 2
2 2
3 1
4 2
5 2
"""
assert TSV(expected) == TSV(node1.query("""
SELECT
key,
count() AS c
FROM d
WHERE ((_shard_num = 1) AND (_part_uuid != '{}')) OR (_shard_num = 2)
GROUP BY key
ORDER BY
key ASC
""".format(DUPLICATED_UUID)))
def test_with_deduplication(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# we expect it to be included only once in the end result
expected = """
1 3
2 3
3 2
4 3
5 3
"""
assert TSV(expected) == TSV(node1.query(
"SET allow_experimental_query_deduplication=1; SELECT key, count() c FROM d GROUP BY key ORDER BY key"))
def test_no_merge_with_deduplication(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# we expect it to be included only once in the end result.
# even with distributed_group_by_no_merge=1 the duplicated part should be excluded from the final result
expected = """
1 1
2 1
3 1
4 1
5 1
1 1
2 1
3 1
4 1
5 1
1 1
2 1
4 1
5 1
"""
assert TSV(expected) == TSV(node1.query("SELECT key, count() c FROM d GROUP BY key ORDER BY key", settings={
"allow_experimental_query_deduplication": 1,
"distributed_group_by_no_merge": 1,
}))
def test_without_deduplication(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# but allow_experimental_query_deduplication is disabled,
# so it will not be excluded
expected = """
1 3
2 3
3 3
4 3
5 3
"""
assert TSV(expected) == TSV(node1.query(
"SET allow_experimental_query_deduplication=0; SELECT key, count() c FROM d GROUP BY key ORDER BY key"))

View File

@ -447,6 +447,16 @@ def test_where_false(started_cluster):
cursor.execute("DROP TABLE test")
def test_datetime64(started_cluster):
cursor = started_cluster.postgres_conn.cursor()
cursor.execute("drop table if exists test")
cursor.execute("create table test (ts timestamp)")
cursor.execute("insert into test select '1960-01-01 20:00:00';")
result = node1.query("select * from postgresql(postgres1, table='test')")
assert(result.strip() == '1960-01-01 20:00:00.000000')
if __name__ == '__main__':
cluster.start()
input("Cluster created, press any key to destroy...")

View File

@ -1,14 +1,29 @@
1
1
1
1
1
1
1
1
1
1
1
1
=== Backward compatibility test
1
=== Cannot resolve host
1
1
=== Bad arguments
1
1
=== Not alive host
1
1
1
1
1
=== Code 210 with ipv6
1
1
1
1
1
1
=== Values from config
1
1
===
1
1
1

View File

@ -7,55 +7,75 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# default values test
${CLICKHOUSE_CLIENT} --query "SELECT 1"
# backward compatibility test
echo '=== Backward compatibility test'
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
echo '=== Cannot resolve host'
not_resolvable_host="notlocalhost"
exception_msg="Cannot resolve host (${not_resolvable_host}), error 0: ${not_resolvable_host}.
Code: 198. DB::Exception: Not found address of host: ${not_resolvable_host}. (DNS_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --host "${not_resolvable_host}" --query "SELECT 1" 2>&1 > /dev/null)";
[ "${error}" == "${exception_msg}" ]; echo "$?"
error="$(${CLICKHOUSE_CLIENT} --host "${not_resolvable_host}" --query "SELECT 1" 2>&1 > /dev/null)";
echo "${error}" | grep -Fc "DNS_ERROR"
echo "${error}" | grep -Fq "${not_resolvable_host}" && echo 1 || echo 0
echo '=== Bad arguments'
not_number_port="abc"
exception_msg="Bad arguments: the argument ('${CLICKHOUSE_HOST}:${not_number_port}') for option '--host' is invalid."
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_number_port}" --query "SELECT 1" 2>&1 > /dev/null)";
[ "${error}" == "${exception_msg}" ]; echo "$?"
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_number_port}" --query "SELECT 1" 2>&1 > /dev/null)";
echo "${error}" | grep -Fc "Bad arguments"
echo "${error}" | grep -Fc "${not_number_port}"
echo '=== Not alive host'
not_alive_host="10.100.0.0"
${CLICKHOUSE_CLIENT} --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
not_alive_port="1"
exception_msg="Code: 210. DB::NetException: Connection refused (${CLICKHOUSE_HOST}:${not_alive_port}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_alive_port}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${CLICKHOUSE_HOST}:${not_alive_port}"
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_alive_port}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${CLICKHOUSE_PORT_TCP}" --port "${not_alive_port}" --query "SELECT 1";
echo '=== Code 210 with ipv6'
ipv6_host_without_brackets="2001:3984:3989::1:1000"
exception_msg="Code: 210. DB::NetException: Connection refused (${ipv6_host_without_brackets}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${ipv6_host_without_brackets}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${ipv6_host_without_brackets}"
ipv6_host_with_brackets="[2001:3984:3989::1:1000]"
exception_msg="Code: 210. DB::NetException: Connection refused (${ipv6_host_with_brackets}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${ipv6_host_with_brackets}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${ipv6_host_with_brackets}"
exception_msg="Code: 210. DB::NetException: Connection refused (${ipv6_host_with_brackets}:${not_alive_port}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${ipv6_host_with_brackets}" --port "${not_alive_port}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${ipv6_host_with_brackets}:${not_alive_port}"
echo '=== Values from config'
${CLICKHOUSE_CLIENT} --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --host "${not_alive_host}" --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1" 2> /dev/null;
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
CUSTOM_CONFIG="$CURDIR/02100_config.xml"
rm -f ${CUSTOM_CONFIG}
cat << EOF > ${CUSTOM_CONFIG}
<config>
<host>${not_alive_host}</host>
<port>${not_alive_port}</port>
</config>
EOF
error="$(${CLICKHOUSE_CLIENT} --config ${CUSTOM_CONFIG} --query "SELECT 1" 2>&1 > /dev/null)"
echo "${error}" | grep -Fc "DB::NetException"
echo "${error}" | grep -Fc "${not_alive_host}:${not_alive_port}"
rm -f ${CUSTOM_CONFIG}
echo '==='
${CLICKHOUSE_CLIENT} --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --host "${not_alive_host}" --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1" 2> /dev/null;
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";

View File

@ -0,0 +1 @@
2001 2

View File

@ -0,0 +1,13 @@
DROP TABLE IF EXISTS calendar;
DROP TABLE IF EXISTS events32;
CREATE TABLE calendar ( `year` Int64, `month` Int64 ) ENGINE = TinyLog;
INSERT INTO calendar VALUES (2000, 1), (2001, 2), (2000, 3);
CREATE TABLE events32 ( `year` Int32, `month` Int32 ) ENGINE = TinyLog;
INSERT INTO events32 VALUES (2001, 2), (2001, 3);
SELECT * FROM calendar WHERE (year, month) IN ( SELECT (year, month) FROM events32 );
DROP TABLE IF EXISTS calendar;
DROP TABLE IF EXISTS events32;

View File

@ -1,4 +1,6 @@
v22.2.3.5-stable 2022-02-25
v22.2.2.1-stable 2022-02-17
v22.1.4.30-stable 2022-02-25
v22.1.3.7-stable 2022-01-23
v22.1.2.2-stable 2022-01-19
v21.12.4.1-stable 2022-01-23


View File

@ -0,0 +1,90 @@
---
title: 'ClickHouse 22.2 Released'
image: 'https://blog-images.clickhouse.com/en/2022/clickhouse-v22-2/featured.jpg'
date: '2022-02-23'
author: 'Alexey Milovidov'
tags: ['company', 'community']
---
We have prepared the new ClickHouse release 22.2, so it would have been nice if you had tried it on 2022-02-22. If not, you can try it today. This latest release includes 2,140 new commits from 118 contributors, including 41 new contributors:
> Aaron Katz, Andre Marianiello, Andrew, Andrii Buriachevskyi, Brian Hunter, CoolT2, Federico Rodriguez, Filippov Denis, Gaurav Kumar, Geoff Genz, HarryLeeIBM, Heena Bansal, ILya Limarenko, Igor Nikonov, IlyaTsoi, Jake Liu, JaySon-Huang, Lemore, Leonid Krylov, Michail Safronov, Mikhail Fursov, Nikita, RogerYK, Roy Bellingan, Saad Ur Rahman, W, Yakov Olkhovskiy, alexeypavlenko, cnmade, grantovsky, hanqf-git, liuneng1994, mlkui, s-kat, tesw yew isal, vahid-sohrabloo, yakov-olkhovskiy, zhifeng, zkun, zxealous, 박동철.
Let me tell you what is most interesting in 22.2...
## Projections are production ready
Projections allow you to have multiple data representations in the same table. For example, you can have data aggregations along with the raw data. There are no restrictions on which aggregate functions can be used - you can have count distinct, quantiles, or whatever you want. You can have data in multiple different sorting orders. ClickHouse will automatically select the most suitable projection for your query, so the query will be automatically optimized.
Projections are somewhat similar to Materialized Views, which also allow you to have incremental aggregation and multiple sorting orders. But unlike Materialized Views, projections are updated atomically and consistently with the main table. The data for projections is stored in the same "data parts" of the table and is merged in the same way as the main data.
The feature was developed by **Amos Bird**, a prominent ClickHouse contributor. The [prototype](https://github.com/ClickHouse/ClickHouse/pull/20202) has been available since Feb 2021; it was merged into the main codebase by **Nikolai Kochetov** in May 2021 under an experimental flag, and after 21 follow-up pull requests we ensured that it passes the full set of test suites and enabled it by default.
Read an example of how to optimize queries with projections [in our docs](https://clickhouse.com/docs/en/getting-started/example-datasets/uk-price-paid/#speedup-with-projections).
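As a rough illustration of the syntax (the table, column, and projection names below are made up, not taken from a real dataset):
```
-- keep a pre-aggregated representation of the data alongside the raw rows
ALTER TABLE hits ADD PROJECTION daily_totals
(
    SELECT toDate(event_time) AS day, count()
    GROUP BY day
);

-- build the projection for the data parts that already exist
ALTER TABLE hits MATERIALIZE PROJECTION daily_totals;

-- queries that aggregate by day can now be answered from the projection automatically
SELECT toDate(event_time) AS day, count() FROM hits GROUP BY day;
```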
## Control of file creation and rewriting on data export
When you export your data with an `INSERT INTO TABLE FUNCTION` statement into `file`, `s3` or `hdfs` and the target file already exists, you can now control how to deal with it: you can append new data into the file if it is possible, rewrite it with new data, or create another file with a similar name like 'data.1.parquet.gz'.
Some storage systems like `s3` and some formats like `Parquet` don't support data appending. In previous ClickHouse versions, if you insert multiple times into a file with Parquet data format, you will end up with a file that is not recognized by other systems. Now you can choose between throwing exceptions on subsequent inserts or creating more files.
So, new settings were introduced: `s3_truncate_on_insert`, `s3_create_new_file_on_insert`, `hdfs_truncate_on_insert`, `hdfs_create_new_file_on_insert`, `engine_file_allow_create_multiple_files`.
This feature [was developed](https://github.com/ClickHouse/ClickHouse/pull/33302) by **Pavel Kruglov**.
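As a hedged sketch of how one of these settings is used (the file name, schema, and data are placeholders):
```
-- running this INSERT twice with the setting enabled produces data.parquet
-- and then data.1.parquet instead of failing or corrupting the first file
INSERT INTO TABLE FUNCTION file('data.parquet', 'Parquet', 'key UInt64, value String')
SELECT number AS key, toString(number) AS value FROM numbers(10)
SETTINGS engine_file_allow_create_multiple_files = 1;
```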
## Custom deduplication token
`ReplicatedMergeTree` and `MergeTree` types of tables implement block-level deduplication. When a block of data is inserted, its cryptographic hash is calculated and if the same block was already inserted before, then the duplicate is skipped and the insert query succeeds. This makes it possible to implement exactly-once semantics for inserts.
In ClickHouse version 22.2 you can provide your own deduplication token instead of an automatically calculated hash. This makes sense if you already have batch identifiers from some other system and you want to reuse them. It also makes sense when blocks can be identical but they should actually be inserted multiple times. Or the opposite - when blocks contain some random data and you want to deduplicate only by significant columns.
This is implemented by adding the setting `insert_deduplication_token`. The feature was contributed by **Igor Nikonov**.
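A minimal sketch, assuming `events` is a (Replicated)MergeTree table with insert deduplication enabled; the token value is just a placeholder:
```
-- both inserts carry the same token, so the second one is treated as a duplicate
-- and skipped even though its payload differs from the first
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-2022-02-20-001' VALUES (1, 'first attempt');
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-2022-02-20-001' VALUES (2, 'retry of the same batch');
```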
## DEFAULT keyword for INSERT
A small addition for SQL compatibility - now we allow using the `DEFAULT` keyword instead of a value in an `INSERT INTO ... VALUES` statement. It looks like this:
`INSERT INTO test VALUES (1, 'Hello', DEFAULT)`
Thanks to **Andrii Buriachevskyi** for this feature.
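To make the example self-contained, here is a small sketch (the table definition is illustrative; any table with a `DEFAULT` column behaves the same way):
```
CREATE TABLE test (id UInt64, greeting String, created DateTime DEFAULT now())
ENGINE = MergeTree ORDER BY id;

-- DEFAULT in the VALUES list tells the server to fill the column from its DEFAULT expression
INSERT INTO test VALUES (1, 'Hello', DEFAULT);
```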
## EPHEMERAL columns
A column in a table can have a `DEFAULT` expression like `c INT DEFAULT a + b`. In ClickHouse you can also use `MATERIALIZED` instead of `DEFAULT` if you want the column to be always calculated with the provided expression instead of allowing a user to insert data. And you can use `ALIAS` if you don't want the column to be stored at all but instead to be calculated on the fly if referenced.
Since version 22.2, a new column type is available: the `EPHEMERAL` column. The user can insert data into this column, but the column is not stored in the table - it is ephemeral. The purpose of this column is to provide data for calculating other columns that reference it in their `DEFAULT` or `MATERIALIZED` expressions.
This feature was made by **Yakov Olkhovskiy**.
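A minimal sketch of how an ephemeral column might be used (the names are made up, and the exact `EPHEMERAL` syntax may vary slightly between versions):
```
CREATE TABLE visits
(
    raw_url String EPHEMERAL '',
    domain String DEFAULT domain(raw_url),
    path String DEFAULT path(raw_url)
)
ENGINE = MergeTree ORDER BY domain;

-- raw_url is accepted on INSERT but never stored; only domain and path end up on disk
INSERT INTO visits (raw_url) VALUES ('https://clickhouse.com/docs/en/');
```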
## Improvements for multi-disk configuration
You can configure multiple disks to store ClickHouse data instead of managing RAID, and ClickHouse will automatically manage the data placement.
Since version 22.2 ClickHouse can automatically repair broken disks without server restart by downloading the missing parts from replicas and placing them on the healthy disks.
This feature was implemented by **Amos Bird** and is already being used for more than 1.5 years in production at Kuaishou.
Another improvement is the option to specify TTL MOVE TO DISK/VOLUME **IF EXISTS**. It allows replicas with non-uniform disk configurations: one replica can move old data to cold storage while another replica keeps all the data on hot storage. Data will be moved only on replicas that have the specified disk or volume, hence *if exists*. This was developed by **Anton Popov**.
## Flexible memory limits
We split per-query and per-user memory limits into a pair of hard and soft limits. The settings `max_memory_usage` and `max_memory_usage_for_user` act as hard limits: when memory consumption approaches the hard limit, an exception is thrown. Two other settings, `max_guaranteed_memory_usage` and `max_guaranteed_memory_usage_for_user`, act as soft limits.
A query is allowed to use more memory than its soft limit if memory is available. But if there is a memory shortage (relative to the per-user hard limit or the total per-server memory consumption), we calculate the "overcommit ratio" - how much more memory each query is consuming relative to its soft limit - and kill the most overcommitted query to let the other queries run.
In short, your query will not be limited to a few gigabytes of RAM if you have hundreds of gigabytes available.
This experimental feature was implemented by **Dmitry Novik** and is continuing to be developed.
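For illustration, this is how the two kinds of limits can be combined in a session (the values are arbitrary placeholders):
```
-- hard limit: exceeding this amount throws an exception
SET max_memory_usage = 20000000000;
-- soft limit: above this amount the query becomes a candidate for the
-- "overcommit killer" when the server runs short of memory
SET max_guaranteed_memory_usage = 5000000000;
```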
## Shell-style comments in SQL
Now we allow comments starting with `# ` or `#!`, similar to MySQL. The variant with `#!` allows writing shell scripts with a "shebang" line interpreted by `clickhouse-local`.
This feature was contributed by **Aaron Katz**. Very nice.
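A tiny illustration of the new comment syntax:
```
# this whole line is a comment, MySQL style
SELECT 1;
SELECT 2; -- the usual SQL comment syntax still works as before
```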
## And many more...
Maxim Kita, Danila Kutenin, Anton Popov, zhanglistar, Federico Rodriguez, Raúl Marín, Amos Bird and Alexey Milovidov have contributed a ton of performance optimizations for this release. We are obsessed with high performance, as usual. :)
Read the [full changelog](https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md) for the 22.2 release and follow [the roadmap](https://github.com/ClickHouse/ClickHouse/issues/32513).

View File

@ -0,0 +1,75 @@
---
title: 'Opensee: Analyzing Terabytes of Financial Data a Day With ClickHouse'
image: 'https://blog-images.clickhouse.com/en/2022/opensee/featured.png'
date: '2022-02-22'
author: 'Christophe Rivoire, Elena Bessis'
tags: ['company', 'community']
---
We’d like to welcome Christophe Rivoire (UK Country Manager) and Elena Bessis (Product Marketing Assistant) from Opensee as guests to our blog. Today, they’re telling us how their product, powered by ClickHouse, allows financial institutions’ business users to directly harness 100% of their vast quantities of data instantly and on demand, with no size limitations.
Opensee is a financial technology company providing real time self-service analytics solutions to financial institutions, which help them turn their big data challenges into a competitive advantage — unlocking vital opportunities led by business users. Opensee, formerly ICA, was started by a team of financial industry and technology experts frustrated that no simple big data analytics solution enabled them to dive deeper into all their data easily and efficiently, or perform what-if analysis on the hundreds of terabytes of data they were handling.
So they built their own.
## ClickHouse For Trillions Of Financial Data Points
Financial institutions have always been storing a lot of data (customer data, risk data, transaction data...) for their own decision processes and for regulatory reasons. Since the financial crisis, regulators all around the world have been significantly increasing the reporting requirements, insisting on longer historical ranges and deeper granularity. This combination has generated an exponential amount of data, which has forced financial institutions to review and upgrade their infrastructure. Opensee offers a solution to navigate all these very large data cubes, based on millions, billions or even trillions of data points. In order to build it, a data storage system capable of scaling horizontally with data and with fast OLAP query response time was required. In 2016, after thorough evaluation, Opensee concluded ClickHouse was the obvious solution.
There are many use cases that involve storing and leveraging massive amounts of data on a daily basis, but Opensee built on the strength of its own expertise in one of them: evaluating risk linked to activities in the financial markets. There are various types of risk (market risk, credit risk, liquidity risk…) and all of them need to aggregate a lot of data in order to calculate linear or non-linear indicators, both business and regulatory, and analyze all those numbers on the fly.
!["Dashboard in Opensee for a Market Risk use case"](https://blog-images.clickhouse.com/en/2022/opensee/dashboard.png)
_Dashboard in Opensee for a Market Risk use case_
## ClickHouse for Scalability, Granularity, Speed and Cost Control
Financial institutions have sometimes believed that their ability to craft efficient storage solutions like data lakes for their vast amounts of data, typically built on a Hadoop stack, would make real-time analytics available. Unfortunately, many of these systems are too slow for at-scale analytics.
Running a query on a Hadoop data lake is just not an option for users with real-time needs! Banks experimented with different types of analytical layers between the data lakes and the users, in order to allow access to their stored data and to run analytics, but ran into new challenges: in-memory computing solutions have a lack of scalability and high hardware costs. Others tried query accelerators but were forced to analyze only prepared data (pre-aggregated or specifically indexed data), losing the granularity which is always required to understand things like daily changes. More recently, financial institutions have been contemplating cloud database management systems, but for very large datasets and calculations the speed of these services is far from what ClickHouse can achieve for their specific use cases.
Ultimately, none of these technologies could simultaneously combine scalability, granularity, speed and cost control, forcing financial institutions into a series of compromises. With Opensee, there is no need to compromise: the platform leverages ClickHouse's capacity to handle the huge volume that data lakes require and the fast response that in-memory databases can give, without the need to pre-aggregate the data.
!["Dashboard in Opensee for a Market Risk use case"](https://blog-images.clickhouse.com/en/2022/opensee/pivot-table.png)
_Pivot table from the Opensee UI on a liquidity use case_
## Opensee Architecture
Opensee provides a series of APIs which allow users to fully abstract away all the complexity, in particular the physical data model. These APIs are typically used for data ingestion, data query, model management, etc. Thanks to Opensee’s low-code API, users don’t need to access data through complex quasi-SQL queries, but rather through simple business queries that are optimized by Opensee to deliver performance. Opensee’s back end, which provides indirect access to ClickHouse, is written in Scala, while PostgreSQL contains all the configuration and context data that must be managed transactionally. Opensee also provides various options for front ends (dedicated Opensee web or rich user interface, Excel, others…) to interact with the data, navigate through the cube and leverage functionality like data versioning — built for the financial institutions’ use.
!["Dashboard in Opensee for a Market Risk use case"](https://blog-images.clickhouse.com/en/2022/opensee/architecture-chart.png)
_Opensee architecture chart_
## Advantages of ClickHouse
For Opensee, the most valuable feature is horizontal scalability, the capability to shard the data. Next comes the very fast dictionary lookup, rapid calculations with vectorization and the capability to manage array values. In the financial industry, where time series or historical data is everywhere, this capacity to calculate vectors and manage array values is critical.
On top of being a solution that is extremely fast and efficient, other advantages include:
- distributed and replicated, with high availability and a performant map/reduce system
- wide range of features fit for analytics
- really good and extensive format support (CSV, JSON, Parquet, ORC, Protobuf, ...)
- very rapid evolution driven by the contributions of a wide community to a very popular open source technology
On top of these native ClickHouse strengths and functionalities, Opensee has developed a lot of other functionalities dedicated to financial institutions. To name only a few, a data versioning mechanism has been created, allowing business users to either correct inaccurate data on the fly or simulate new values. This What If simulation feature can be used to add, amend or delete transactions, with full auditability and traceability, without deleting any data.
Another key feature is a Python processor which is available to define more complex calculations. Furthermore, the abstraction model layer has been built to remove the complexity of the physical data model for the users and optimize the queries. And, last but not least, in terms of visualization, a UI dedicated to financial institutions has been developed with and for its users.
## Dividing Hardware Costs By 10+
Cost efficiency is a key improvement for large financial institutions that typically use in-memory computing technology. Dividing the hardware cost by ten (and sometimes more) is no small achievement! Being able to work with very large datasets on standard servers, on premises or in the cloud, is a major step forward. With Opensee powered by ClickHouse, financial institutions are able to alleviate critical limitations of their existing solutions, avoiding legacy compromises and a lack of flexibility. Finally, these organizations are able to give their users a turn-key solution for analyzing all their data sets, which used to be siloed, in one single place, with one single data model and one single infrastructure - and all of that in real time, combining very granular data and very long historical ranges.
## About Opensee
Opensee empowers financial data divers to analyze deeper and faster. Headquartered in Paris, with offices in London and New York, Opensee is working with a trusted client base across global Tier 1 banks, asset managers, hedge funds and trading platforms.
For more information please visit [www.opensee.io](http://www.opensee.io) or follow them on [LinkedIn](https://www.linkedin.com/company/opensee-company) and [Twitter](https://twitter.com/opensee_io).

View File

@ -1,6 +1,6 @@
<div class="banner bg-light">
<div class="container">
<p class="text-center text-dark mb-0 mx-auto pt-1 pb-1">{{ _('ClickHouse v22.2 is coming soon! Add the Feb 17 Release Webinar to your calendar') }} <a
href="https://www.google.com/calendar/render?action=TEMPLATE&text=ClickHouse+v22.2+Release+Webinar&details=Join+from+a+PC%2C+Mac%2C+iPad%2C+iPhone+or+Android+device%3A%0A%C2%A0+%C2%A0+Please+click+this+URL+to+join.%C2%A0%0Ahttps%3A%2F%2Fzoom.us%2Fj%2F92785669470%3Fpwd%3DMkpCMU9KSmpNTGp6WmZmK2JqV0NwQT09%0A%0A%C2%A0+%C2%A0+Passcode%3A+139285%0A%0A%C2%A0Description%3A+Connect+with+ClickHouse+experts+and+test+out+the+newest+features+and+performance+gains+in+the+v22.2+release.%0A%0AOr+One+tap+mobile%3A%0A%C2%A0+%C2%A0+%2B12532158782%2C%2C92785669470%23%2C%2C%2C%2C%2A139285%23+US+%28Tacoma%29%0A%C2%A0+%C2%A0+%2B13462487799%2C%2C92785669470%23%2C%2C%2C%2C%2A139285%23+US+%28Houston%29%0A%0AOr+join+by+phone%3A%0A%C2%A0+%C2%A0+Dial%28for+higher+quality%2C+dial+a+number+based+on+your+current+location%29%3A%0A%C2%A0+%C2%A0+%C2%A0+%C2%A0+US%3A+%2B1+253+215+8782+or+%2B1+346+248+7799+or+%2B1+669+900+9128+or+%2B1+301+715+8592+or+%2B1+312+626+6799+or+%2B1+646+558+8656%C2%A0%0A%C2%A0+%C2%A0%C2%A0%0A%C2%A0+%C2%A0+Webinar+ID%3A+927+8566+9470%0A%C2%A0+%C2%A0+Passcode%3A+139285%0A%C2%A0+%C2%A0+International+numbers+available%3A+https%3A%2F%2Fzoom.us%2Fu%2FalqvP0je9&location=https%3A%2F%2Fzoom.us%2Fj%2F92785669470%3Fpwd%3DMkpCMU9KSmpNTGp6WmZmK2JqV0NwQT09&dates=20220217T170000Z%2F20220217T180000Z" target="_blank">here</a></p>
<p class="text-center text-dark mb-0 mx-auto pt-1 pb-1">{{ _('ClickHouse v22.3 is coming soon! Add the Mar 17 Release Webinar to your calendar') }} <a
href="http://www.google.com/calendar/event?action=TEMPLATE&dates=20220317T160000Z/20220317T170000Z&text=ClickHouse+v22.3+Release+Webinar&location=https%3A%2F%2Fzoom.us%2Fj%2F91955953263%3Fpwd%3DSXBKWW5ETkNMc1dmVWUxTUJKNm5hUT09&details=Please+click+the+link+below+to+join+the+webinar%3A%0D%0Ahttps%3A%2F%2Fzoom.us%2Fj%2F91955953263%3Fpwd%3DSXBKWW5ETkNMc1dmVWUxTUJKNm5hUT09%0D%0A%0D%0APasscode%3A+139285%0D%0A%0D%0AOr+One+tap+mobile+%3A+%0D%0A++++US%3A+%2B12532158782%2C%2C91955953263%23%2C%2C%2C%2C%2A139285%23++or+%2B13462487799%2C%2C91955953263%23%2C%2C%2C%2C%2A139285%23+%0D%0A%0D%0AOr+Telephone%3A%0D%0A++++Dial%28for+higher+quality%2C+dial+a+number+based+on+your+current+location%29%3A%0D%0A++++++++US%3A+%2B1+253+215+8782++or+%2B1+346+248+7799++or+%2B1+669+900+9128++or+%2B1+301+715+8592++or+%2B1+312+626+6799++or+%2B1+646+558+8656+%0D%0A%0D%0AWebinar+ID%3A+919+5595+3263%0D%0APasscode%3A+139285%0D%0A++++International+numbers+available%3A+https%3A%2F%2Fzoom.us%2Fu%2FasrDyM28Q" target="_blank">here</a></p>
</div>
</div>

View File

@ -3,7 +3,7 @@
<div class="container pt-5 pt-lg-7 pt-xl-15 pb-5 pb-lg-7">
<h1 class="display-1 mb-2 mb-xl-3 mx-auto text-center">
ClickHouse <span class="text-orange">v22.1 Released</span>
ClickHouse <span class="text-orange">v22.2 Released</span>
</h1>
<p class="lead mb-3 mb-lg-5 mb-xl-7 mx-auto text-muted text-center" style="max-width:780px;">
@ -11,7 +11,7 @@
</p>
<p class="d-flex justify-content-center mb-0">
<a href="https://www.youtube.com/watch?v=gP7I2SUBXig&ab_channel=ClickHouse" target="_blank" class="btn btn-primary trailing-link">Watch the Release Webinar on YouTube</a>
<a href="https://www.youtube.com/watch?v=6EG1gwhSTPg" target="_blank" class="btn btn-primary trailing-link">Watch the Release Webinar on YouTube</a>
</p>
</div>