Merge branch 'master' into UniqSketch

2024-11-21 23:21:59 +00:00 · 2022-09-01 19:31:53 +08:00 · 2022-09-01 19:31:53 +08:00 · bc7d661668
commit bc7d661668
parent acec516271 e3af5a7a11
118 changed files with 1804 additions and 344 deletions
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@ -1,4 +1,4 @@
-name: ReleaseWorkflow
+name: PublishedReleaseCI
 # - Gets artifacts from S3
 # - Sends it to JFROG Artifactory
 # - Adds them to the release assets
@ -15,7 +15,7 @@ jobs:
    - name: Set envs
      run: |
        cat >> "$GITHUB_ENV" << 'EOF'
-        JFROG_API_KEY=${{ secrets.JFROG_KEY_API_PACKAGES }}
+        JFROG_API_KEY=${{ secrets.JFROG_ARTIFACTORY_API_KEY }}
        TEMP_PATH=${{runner.temp}}/release_packages
        REPO_COPY=${{runner.temp}}/release_packages/ClickHouse
        EOF
@ -30,7 +30,7 @@ jobs:
        cp -r "$GITHUB_WORKSPACE" "$TEMP_PATH"
        cd "$REPO_COPY"
        python3 ./tests/ci/push_to_artifactory.py --release "${{ github.ref }}" \
-          --commit '${{ github.sha }}' --all
+          --commit '${{ github.sha }}' --artifactory-url "${{ secrets.JFROG_ARTIFACTORY_URL }}" --all
    - name: Upload packages to release assets
      uses: svenstaro/upload-release-action@v2
      with:
--- a/.github/workflows/release_branches.yml
+++ b/.github/workflows/release_branches.yml
@ -1,4 +1,4 @@
-name: ReleaseCI
+name: ReleaseBranchCI

 env:
  # Force the stdout and stderr streams to be unbuffered
--- a/contrib/NuRaft
+++ b/contrib/NuRaft
@ -1 +1 @@
-Subproject commit 33f60f961d4914441b684af43e9e5535078ba54b
+Subproject commit bdba298189e29995892de78dcecf64d127444e81
--- a/docs/changelogs/v22.8.4.7-lts.md
+++ b/docs/changelogs/v22.8.4.7-lts.md
@ -0,0 +1,18 @@
+---
+sidebar_position: 1
+sidebar_label: 2022
+---
+
+# 2022 Changelog
+
+### ClickHouse release v22.8.4.7-lts (baad27bcd2f) FIXME as compared to v22.8.3.13-lts (6a15b73faea)
+
+#### Bug Fix (user-visible misbehavior in official stable or prestable release)
+
+* Backported in [#40760](https://github.com/ClickHouse/ClickHouse/issues/40760): Fix possible error 'Decimal math overflow' while parsing DateTime64. [#40546](https://github.com/ClickHouse/ClickHouse/pull/40546) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#40811](https://github.com/ClickHouse/ClickHouse/issues/40811): In [#40595](https://github.com/ClickHouse/ClickHouse/issues/40595) it was reported that the `host_regexp` functionality was not working properly with a name to address resolution in `/etc/hosts`. It's fixed. [#40769](https://github.com/ClickHouse/ClickHouse/pull/40769) ([Arthur Passos](https://github.com/arthurpassos)).
+
+#### NOT FOR CHANGELOG / INSIGNIFICANT
+
+* Migrate artifactory [#40831](https://github.com/ClickHouse/ClickHouse/pull/40831) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
+
--- a/docs/en/operations/backup.md
+++ b/docs/en/operations/backup.md
@ -1,10 +1,10 @@
 ---
 slug: /en/operations/backup
 sidebar_position: 49
-sidebar_label: Data Backup
+sidebar_label: Data backup and restore
 ---

-# Data Backup
+# Data backup and restore

 While [replication](../engines/table-engines/mergetree-family/replication.md) provides protection from hardware failures, it does not protect against human errors: accidental deletion of data, deletion of the wrong table or a table on the wrong cluster, and software bugs that result in incorrect data processing or data corruption. In many cases mistakes like these will affect all replicas. ClickHouse has built-in safeguards to prevent some types of mistakes — for example, by default [you can’t just drop tables with a MergeTree-like engine containing more than 50 Gb of data](server-configuration-parameters/settings.md#max-table-size-to-drop). However, these safeguards do not cover all possible cases and can be circumvented.

@ -16,21 +16,181 @@ Each company has different resources available and business requirements, so the
 Keep in mind that if you backed something up and never tried to restore it, chances are that restore will not work properly when you actually need it (or at least it will take longer than business can tolerate). So whatever backup approach you choose, make sure to automate the restore process as well, and practice it on a spare ClickHouse cluster regularly.
 :::

-## Duplicating Source Data Somewhere Else {#duplicating-source-data-somewhere-else}
+## Configure a backup destination
+
+The examples below specify the backup destination as `Disk('backups', '1.zip')`. To prepare the destination, add a file such as `/etc/clickhouse-server/config.d/backup_disk.xml` that defines it. For example, this file defines a disk named `backups` and then adds that disk to the **backups > allowed_disk** list:
+
+```xml
+<clickhouse>
+    <storage_configuration>
+        <disks>
+<!--highlight-next-line -->
+            <backups>
+                <type>local</type>
+                <path>/backups/</path>
+            </backups>
+        </disks>
+    </storage_configuration>
+<!--highlight-start -->
+    <backups>
+        <allowed_disk>backups</allowed_disk>
+        <allowed_path>/backups/</allowed_path>
+    </backups>
+<!--highlight-end -->
+</clickhouse>
+```
+
+## Parameters
+
+Backups can be either full or incremental, and can include tables (including materialized views, projections, and dictionaries) and databases. They can be synchronous (the default) or asynchronous, and they can be compressed and password protected.
+
+The BACKUP and RESTORE statements take a list of DATABASE and TABLE names, a destination (or source), options and settings:
+- The destination for the backup, or the source for the restore.  This is based on the disk defined earlier.  For example `Disk('backups', 'filename.zip')`
+- ASYNC: backup or restore asynchronously
+- PARTITIONS: a list of partitions to restore
+- SETTINGS:
+    - [`compression_method`](en/sql-reference/statements/create/table/#column-compression-codecs) and `compression_level`
+    - `password` for the file on disk
+    - `base_backup`: the destination of the previous backup of this source.  For example, `Disk('backups', '1.zip')` 
+
+## Usage examples
+
+Backup and then restore a table:
+```
+BACKUP TABLE test.table TO Disk('backups', '1.zip')
+```
+
+Corresponding restore:
+```
+RESTORE TABLE test.table FROM Disk('backups', '1.zip')
+```
+
+:::note
+The above RESTORE would fail if the table `test.table` contains data. To test the RESTORE, you would have to drop the table first or use the setting `allow_non_empty_tables=true`:
+```
+RESTORE TABLE test.table FROM Disk('backups', '1.zip') 
+SETTINGS allow_non_empty_tables=true
+```
+:::
+
+Tables can be restored, or backed up, with new names:
+```
+RESTORE TABLE test.table AS test.table2 FROM Disk('backups', '1.zip')
+```
+
+```
+BACKUP TABLE test.table3 AS test.table4 TO Disk('backups', '2.zip')
+```
+
+## Incremental backups
+
+Incremental backups can be taken by specifying the `base_backup`.
+:::note
+Incremental backups depend on the base backup.  The base backup must be kept available in order to be able to restore from an incremental backup.
+:::
+
+Incrementally store new data. The setting `base_backup` causes data added since the previous backup at `Disk('backups', 'd.zip')` to be stored in `Disk('backups', 'incremental-a.zip')`:
+```
+BACKUP TABLE test.table TO Disk('backups', 'incremental-a.zip')
+  SETTINGS base_backup = Disk('backups', 'd.zip')
+```
+
+Restore all data from the incremental backup and the base_backup into a new table `test.table2`:
+```
+RESTORE TABLE test.table AS test.table2 
+  FROM Disk('backups', 'incremental-a.zip');
+```
+
+## Assign a password to the backup
+
+Backups written to disk can have a password applied to the file:
+```
+BACKUP TABLE test.table
+  TO Disk('backups', 'password-protected.zip')
+  SETTINGS password='qwerty'
+```
+
+Restore:
+```
+RESTORE TABLE test.table
+  FROM Disk('backups', 'password-protected.zip')
+  SETTINGS password='qwerty'
+```
+
+## Compression settings
+
+If you would like to specify the compression method or level:
+```
+BACKUP TABLE test.table
+  TO Disk('backups', 'filename.zip')
+  SETTINGS compression_method='lzma', compression_level=3
+```
+
+## Restore specific partitions
+If only specific partitions of a table need to be restored, they can be listed explicitly. To restore partitions 2 and 3 from a backup:
+```
+RESTORE TABLE test.table PARTITIONS '2', '3'
+  FROM Disk('backups', 'filename.zip')
+```
+
+## Check the status of backups
+
+The backup command returns an `id` and `status`, and that `id` can be used to get the status of the backup.  This is very useful to check the progress of long ASYNC backups.  The example below shows a failure that happened when trying to overwrite an existing backup file:
+```sql
+BACKUP TABLE helloworld.my_first_table TO Disk('backups', '1.zip') ASYNC
+```
+```response
+┌─id───────────────────────────────────┬─status──────────┐
+│ 7678b0b3-f519-4e6e-811f-5a0781a4eb52 │ CREATING_BACKUP │
+└──────────────────────────────────────┴─────────────────┘
+
+1 row in set. Elapsed: 0.001 sec.
+```
+
+```
+SELECT
+    *
+FROM system.backups
+WHERE id = '7678b0b3-f519-4e6e-811f-5a0781a4eb52'
+FORMAT Vertical
+```
+```response
+Row 1:
+──────
+id:                7678b0b3-f519-4e6e-811f-5a0781a4eb52
+name:              Disk('backups', '1.zip')
+#highlight-next-line
+status:            BACKUP_FAILED
+num_files:         0
+uncompressed_size: 0
+compressed_size:   0
+#highlight-next-line
+error:             Code: 598. DB::Exception: Backup Disk('backups', '1.zip') already exists. (BACKUP_ALREADY_EXISTS) (version 22.8.2.11 (official build))
+start_time:        2022-08-30 09:21:46
+end_time:          2022-08-30 09:21:46
+
+1 row in set. Elapsed: 0.002 sec.
+```
+
+## Alternatives
+
+ClickHouse stores data on disk, and there are many ways to back up disks. These are some alternatives that have been used in the past and that may fit well in your environment.
+
+### Duplicating Source Data Somewhere Else {#duplicating-source-data-somewhere-else}

 Often data that is ingested into ClickHouse is delivered through some sort of persistent queue, such as [Apache Kafka](https://kafka.apache.org). In this case it is possible to configure an additional set of subscribers that will read the same data stream while it is being written to ClickHouse and store it in cold storage somewhere. Most companies already have some default recommended cold storage, which could be an object store or a distributed filesystem like [HDFS](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html).

-## Filesystem Snapshots {#filesystem-snapshots}
+### Filesystem Snapshots {#filesystem-snapshots}

 Some local filesystems provide snapshot functionality (for example, [ZFS](https://en.wikipedia.org/wiki/ZFS)), but they might not be the best choice for serving live queries. A possible solution is to create additional replicas with this kind of filesystem and exclude them from the [Distributed](../engines/table-engines/special/distributed.md) tables that are used for `SELECT` queries. Snapshots on such replicas will be out of reach of any queries that modify data. As a bonus, these replicas might have special hardware configurations with more disks attached per server, which would be cost-effective.

-## clickhouse-copier {#clickhouse-copier}
+### clickhouse-copier {#clickhouse-copier}

 [clickhouse-copier](../operations/utilities/clickhouse-copier.md) is a versatile tool that was initially created to re-shard petabyte-sized tables. It can also be used for backup and restore purposes because it reliably copies data between ClickHouse tables and clusters.

 For smaller volumes of data, a simple `INSERT INTO ... SELECT ...` to remote tables might work as well.

-## Manipulations with Parts {#manipulations-with-parts}
+### Manipulations with Parts {#manipulations-with-parts}

 ClickHouse allows using the `ALTER TABLE ... FREEZE PARTITION ...` query to create a local copy of table partitions. This is implemented using hardlinks to the `/var/lib/clickhouse/shadow/` folder, so it usually does not consume extra disk space for old data. The created copies of files are not handled by ClickHouse server, so you can just leave them there: you will have a simple backup that does not require any additional external system, but it will still be prone to hardware issues. For this reason, it’s better to remotely copy them to another location and then remove the local copies. Distributed filesystems and object stores are still good options for this, but normal attached file servers with a large enough capacity might work as well (in this case the transfer will occur via the network filesystem or maybe [rsync](https://en.wikipedia.org/wiki/Rsync)).
 Data can be restored from backup using the `ALTER TABLE ... ATTACH PARTITION ...`
@ -39,4 +199,3 @@ For more information about queries related to partition manipulations, see the [

 A third-party tool is available to automate this approach: [clickhouse-backup](https://github.com/AlexAkulov/clickhouse-backup).

-[Original article](https://clickhouse.com/docs/en/operations/backup/) <!--hide-->
--- a/docs/en/sql-reference/functions/date-time-functions.md
+++ b/docs/en/sql-reference/functions/date-time-functions.md
@ -1069,7 +1069,7 @@ Formats a Time according to the given Format string. Format is a constant expres
 **Syntax**

 ``` sql
-formatDateTime(Time, Format\[, Timezone\])
+formatDateTime(Time, Format[, Timezone])
 ```

 **Returned value(s)**
@ -1105,6 +1105,7 @@ Using replacement fields, you can define a pattern for the resulting string. “
 | %w       | weekday as a decimal number with Sunday as 0 (0-6)      | 2          |
 | %y       | Year, last two digits (00-99)                           | 18         |
 | %Y       | Year                                                    | 2018       |
+| %z       | Time offset from UTC as +HHMM or -HHMM                  | -0500      |
 | %%       | a % sign                                                | %          |

 **Example**
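
Review note on the new `%z` row: the table describes the output as `+HHMM` or `-HHMM`. The sketch below only illustrates that textual shape. `formatUtcOffset` is a hypothetical helper written for this note, not the ClickHouse implementation, and it assumes the offset is already known in seconds.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not ClickHouse code): renders a UTC offset given in
// seconds in the +HHMM / -HHMM form described by the %z row.
std::string formatUtcOffset(int offset_seconds)
{
    // Offsets of a full day or more are out of scope for this sketch.
    assert(std::abs(offset_seconds) < 24 * 3600);

    const char sign = offset_seconds < 0 ? '-' : '+';
    const int total_minutes = std::abs(offset_seconds) / 60;

    char buf[8];
    std::snprintf(buf, sizeof(buf), "%c%02d%02d", sign, total_minutes / 60, total_minutes % 60);
    return buf;
}

// Usage example: UTC-5 matches the sample value in the table.
void exampleFormatUtcOffset()
{
    assert(formatUtcOffset(-5 * 3600) == "-0500");
}
```

A zero offset is rendered with a plus sign here (`+0000`); whether ClickHouse does the same is not stated in this diff.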
--- a/docs/en/sql-reference/statements/alter/constraint.md
+++ b/docs/en/sql-reference/statements/alter/constraint.md
@ -9,8 +9,8 @@ sidebar_label: CONSTRAINT
 Constraints could be added or deleted using following syntax:

 ``` sql
-ALTER TABLE [db].name ADD CONSTRAINT constraint_name CHECK expression;
-ALTER TABLE [db].name DROP CONSTRAINT constraint_name;
+ALTER TABLE [db].name [ON CLUSTER cluster] ADD CONSTRAINT constraint_name CHECK expression;
+ALTER TABLE [db].name [ON CLUSTER cluster] DROP CONSTRAINT constraint_name;
 ```

 See more on [constraints](../../../sql-reference/statements/create/table.md#constraints).
--- a/docs/en/sql-reference/statements/alter/ttl.md
+++ b/docs/en/sql-reference/statements/alter/ttl.md
@ -11,7 +11,7 @@ sidebar_label: TTL
 You can change [table TTL](../../../engines/table-engines/mergetree-family/mergetree.md#mergetree-table-ttl) with a request of the following form:

 ``` sql
-ALTER TABLE table_name MODIFY TTL ttl_expression;
+ALTER TABLE [db.]table_name [ON CLUSTER cluster] MODIFY TTL ttl_expression;
 ```

 ## REMOVE TTL
@ -19,7 +19,7 @@ ALTER TABLE table_name MODIFY TTL ttl_expression;
 TTL-property can be removed from table with the following query:

 ```sql
-ALTER TABLE table_name REMOVE TTL
+ALTER TABLE [db.]table_name [ON CLUSTER cluster] REMOVE TTL
 ```

 **Example**
--- a/docs/ru/sql-reference/functions/date-time-functions.md
+++ b/docs/ru/sql-reference/functions/date-time-functions.md
@ -1017,7 +1017,7 @@ SELECT timeSlots(toDateTime64('1980-12-12 21:01:02.1234', 4, 'UTC'), toDecimal64
 **Синтаксис**

 ``` sql
-formatDateTime(Time, Format\[, Timezone\])
+formatDateTime(Time, Format[, Timezone])
 ```

 **Возвращаемое значение**
--- a/docs/ru/sql-reference/statements/alter/constraint.md
+++ b/docs/ru/sql-reference/statements/alter/constraint.md
@ -11,8 +11,8 @@ sidebar_label: "Манипуляции с ограничениями"
 Добавить или удалить ограничение можно с помощью запросов

 ``` sql
-ALTER TABLE [db].name ADD CONSTRAINT constraint_name CHECK expression;
-ALTER TABLE [db].name DROP CONSTRAINT constraint_name;
+ALTER TABLE [db].name [ON CLUSTER cluster] ADD CONSTRAINT constraint_name CHECK expression;
+ALTER TABLE [db].name [ON CLUSTER cluster] DROP CONSTRAINT constraint_name;
 ```

 Запросы выполняют добавление или удаление метаданных об ограничениях таблицы `[db].name`, поэтому выполняются мгновенно.
--- a/docs/ru/sql-reference/statements/alter/ttl.md
+++ b/docs/ru/sql-reference/statements/alter/ttl.md
@ -11,7 +11,7 @@ sidebar_label: TTL
 Вы можете изменить [TTL для таблицы](../../../engines/table-engines/mergetree-family/mergetree.md#mergetree-column-ttl) запросом следующего вида:

 ``` sql
-ALTER TABLE table-name MODIFY TTL ttl-expression
+ALTER TABLE [db.]table-name [ON CLUSTER cluster] MODIFY TTL ttl-expression
 ```

 ## REMOVE TTL {#remove-ttl}
@ -19,7 +19,7 @@ ALTER TABLE table-name MODIFY TTL ttl-expression
 Удалить табличный TTL можно запросом следующего вида:

 ```sql
-ALTER TABLE table_name REMOVE TTL
+ALTER TABLE [db.]table_name [ON CLUSTER cluster] REMOVE TTL
 ```

 **Пример**
--- a/docs/zh/sql-reference/functions/date-time-functions.md
+++ b/docs/zh/sql-reference/functions/date-time-functions.md
@ -956,7 +956,7 @@ SELECT
 **语法**

 ``` sql
-formatDateTime(Time, Format\[, Timezone\])
+formatDateTime(Time, Format[, Timezone])
 ```

 **返回值**
--- a/programs/server/Server.cpp
+++ b/programs/server/Server.cpp
@ -736,7 +736,9 @@ int Server::main(const std::vector<std::string> & /*args*/)
    std::vector<ProtocolServerAdapter> servers_to_start_before_tables;
    /// This object will periodically calculate some metrics.
    AsynchronousMetrics async_metrics(
-        global_context, config().getUInt("asynchronous_metrics_update_period_s", 1),
+        global_context,
+        config().getUInt("asynchronous_metrics_update_period_s", 1),
+        config().getUInt("asynchronous_heavy_metrics_update_period_s", 120),
        [&]() -> std::vector<ProtocolServerMetrics>
        {
            std::vector<ProtocolServerMetrics> metrics;
--- a/programs/server/config.xml
+++ b/programs/server/config.xml
@ -65,9 +65,31 @@
        in specified format like JSON.
        For example, as below:
        {"date_time":"1650918987.180175","thread_name":"#1","thread_id":"254545","level":"Trace","query_id":"","logger_name":"BaseDaemon","message":"Received signal 2","source_file":"../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()","source_line":"192"}
-        To enable JSON logging support, just uncomment <formatting> tag below.
+        To enable JSON logging support, please uncomment the entire <formatting> tag below.
+        
+        a) You can modify key names by changing the values of the tags inside the <names> tag.
+        For example, to change DATE_TIME to MY_DATE_TIME, use:
+            <date_time>MY_DATE_TIME</date_time>
+        b) You can stop unwanted log properties from appearing in logs. To do so, simply comment out (recommended)
+        that property in this file.
+        For example, if you do not want your log to print query_id, comment out only the <query_id> tag.
+        However, if you comment out all the tags under <names>, the program will print the default values
+        shown below.
        -->
-        <!-- <formatting>json</formatting> -->
+        <!-- <formatting>
+            <type>json</type>
+            <names>
+                <date_time>date_time</date_time>
+                <thread_name>thread_name</thread_name>
+                <thread_id>thread_id</thread_id>
+                <level>level</level>
+                <query_id>query_id</query_id>
+                <logger_name>logger_name</logger_name>
+                <message>message</message>
+                <source_file>source_file</source_file>
+                <source_line>source_line</source_line>
+            </names>
+        </formatting> -->
    </logger>

    <!-- Add headers to response in options request. OPTIONS method is used in CORS preflight requests. -->
--- a/src/Backups/BackupImpl.cpp
+++ b/src/Backups/BackupImpl.cpp
@ -625,7 +625,7 @@ CheckBackupResult checkBaseBackupForFile(const SizeAndChecksum & base_backup_inf
 {
    /// We cannot reuse base backup because our file is smaller
    /// than file stored in previous backup
-    if (new_entry_info.size > base_backup_info.first)
+    if (new_entry_info.size < base_backup_info.first)
        return CheckBackupResult::HasNothing;

    if (base_backup_info.first == new_entry_info.size)
@ -682,8 +682,6 @@ ChecksumsForNewEntry calculateNewEntryChecksumsIfNeeded(BackupEntryPtr entry, si

 void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
 {
-
-    std::lock_guard lock{mutex};
    if (open_mode != OpenMode::WRITE)
        throw Exception("Backup is not opened for writing", ErrorCodes::LOGICAL_ERROR);

@ -802,7 +800,12 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
    /// or have only prefix of it in previous backup. Let's go long path.

    info.data_file_name = info.file_name;
+
+    if (use_archives)
+    {
+        std::lock_guard lock{mutex};
        info.archive_suffix = current_archive_suffix;
+    }

    bool is_data_file_required;
    coordination->addFileInfo(info, is_data_file_required);
@ -818,9 +821,11 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
    /// if source and destination are compatible
    if (!use_archives && info.base_size == 0 && writer->supportNativeCopy(reader_description))
    {
-
+        /// Should be much faster than writing data through server.
        LOG_TRACE(log, "Will copy file {} using native copy", adjusted_path);
-        /// Should be much faster than writing data through server
+
+        /// NOTE: `mutex` must not be held here; otherwise only one thread could write at a time, which would be slow.
+
        writer->copyFileNative(entry->tryGetDiskIfExists(), entry->getFilePath(), info.data_file_name);
    }
    else
@ -838,6 +843,11 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
        if (use_archives)
        {
            LOG_TRACE(log, "Adding file {} to archive", adjusted_path);
+
+            /// An archive must be written strictly in one thread, so it's correct to lock the mutex for all the time we're writing the file
+            /// to the archive.
+            std::lock_guard lock{mutex};
+
            String archive_suffix = current_archive_suffix;
            bool next_suffix = false;
            if (current_archive_suffix.empty() && is_internal_backup)
@ -859,6 +869,7 @@ void BackupImpl::writeFile(const String & file_name, BackupEntryPtr entry)
        }
        else
        {
+            /// NOTE: `mutex` must not be held here; otherwise only one thread could write at a time, which would be slow.
            writer->copyFileThroughBuffer(std::move(read_buffer), info.data_file_name);
        }
    }
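
Review note on the `checkBaseBackupForFile` hunk: the comment says a base backup cannot be reused when the new file is smaller than the stored one, and the corrected `<` now matches that. The old `>` rejected files that had grown, which is exactly the case where a stored prefix could help. The sketch below is a rough illustration under stated assumptions: only `HasNothing` appears in this diff, so `HasPrefix`, `HasFull`, `checkReuse`, and the `prefix_checksum` parameter are hypothetical names, and the checksum comparisons are a guess at the elided remainder of the function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; only HasNothing is visible in the diff above.
enum class ReuseResult { HasNothing, HasPrefix, HasFull };

struct SizeAndChecksum
{
    std::uint64_t size;
    std::uint64_t checksum;
};

// base:            what the base backup stores for this file
// current:         size and checksum of the whole file being backed up now
// prefix_checksum: checksum of the first base.size bytes of the current file
//                  (assumed to be computed by the caller)
ReuseResult checkReuse(const SizeAndChecksum & base, const SizeAndChecksum & current, std::uint64_t prefix_checksum)
{
    // A file that is now smaller than the stored copy cannot extend it.
    if (current.size < base.size)
        return ReuseResult::HasNothing;

    // Same size: reusable only if the content is identical.
    if (current.size == base.size)
        return current.checksum == base.checksum ? ReuseResult::HasFull : ReuseResult::HasNothing;

    // The file grew: reusable only if its first base.size bytes are unchanged.
    return prefix_checksum == base.checksum ? ReuseResult::HasPrefix : ReuseResult::HasNothing;
}

// Usage example: a grown file whose prefix is unchanged reuses the stored prefix.
void exampleCheckReuse()
{
    const SizeAndChecksum base{100, 0xAAAA};
    const SizeAndChecksum grown{150, 0xCCCC};
    assert(checkReuse(base, grown, 0xAAAA) == ReuseResult::HasPrefix);
}
```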
--- a/src/Backups/BackupImpl.h
+++ b/src/Backups/BackupImpl.h
@ -130,7 +130,7 @@ private:
    std::pair<String, std::shared_ptr<IArchiveWriter>> archive_writers[2];
    String current_archive_suffix;
    String lock_file_name;
-    size_t num_files_written = 0;
+    std::atomic<size_t> num_files_written = 0;
    bool writing_finalized = false;
    const Poco::Logger * log;
 };
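
Review note on the locking change in `BackupImpl`: the mutex is no longer taken for the whole of `writeFile`, only around archive writes, and `num_files_written` becomes atomic because it is now updated outside the lock. The sketch below shows that pattern in isolation. `SketchBackup` and `writeConcurrently` are hypothetical names, and the "copy" of a non-archive file is left as a comment rather than real I/O.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a backup writer; not the ClickHouse class.
class SketchBackup
{
public:
    explicit SketchBackup(bool use_archives_) : use_archives(use_archives_) {}

    void writeFile(const std::string & name)
    {
        if (use_archives)
        {
            // An archive is a single stream, so writers must take turns.
            std::lock_guard<std::mutex> lock(mutex);
            archive_entries.push_back(name);
        }
        // Without archives each file is independent, so no lock is taken
        // (a real writer would copy the file at this point).

        // Updated outside the lock, hence std::atomic.
        ++num_files_written;
    }

    std::size_t filesWritten() const { return num_files_written; }

    // Only call this after all writer threads have been joined.
    std::size_t archiveEntries() const { return archive_entries.size(); }

private:
    const bool use_archives;
    std::mutex mutex;
    std::vector<std::string> archive_entries;
    std::atomic<std::size_t> num_files_written{0};
};

// Writes files from several threads and returns the final counter value.
std::size_t writeConcurrently(SketchBackup & backup, std::size_t num_threads, std::size_t files_per_thread)
{
    assert(num_threads > 0);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t)
        threads.emplace_back([&backup, t, files_per_thread]
        {
            for (std::size_t i = 0; i < files_per_thread; ++i)
                backup.writeFile("file_" + std::to_string(t) + "_" + std::to_string(i));
        });
    for (auto & thread : threads)
        thread.join();
    return backup.filesWritten();
}
```

With a plain `size_t` counter the concurrent increments would be a data race once the function-wide lock is removed, which is presumably why the header change accompanies the `.cpp` change.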
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@ -247,6 +247,7 @@ add_object_library(clickhouse_databases Databases)
 add_object_library(clickhouse_databases_mysql Databases/MySQL)
 add_object_library(clickhouse_disks Disks)
 add_object_library(clickhouse_interpreters Interpreters)
+add_object_library(clickhouse_interpreters_cache Interpreters/Cache)
 add_object_library(clickhouse_interpreters_access Interpreters/Access)
 add_object_library(clickhouse_interpreters_mysql Interpreters/MySQL)
 add_object_library(clickhouse_interpreters_clusterproxy Interpreters/ClusterProxy)
--- a/src/Client/ClientBase.cpp
+++ b/src/Client/ClientBase.cpp
@ -91,13 +91,6 @@ static const NameSet exit_strings
    "q", "й", "\\q", "\\Q", "\\й", "\\Й", ":q", "Жй"
 };

-static const std::initializer_list<std::pair<String, String>> backslash_aliases
-{
-    { "\\l", "SHOW DATABASES" },
-    { "\\d", "SHOW TABLES" },
-    { "\\c", "USE" },
-};
-

 namespace ErrorCodes
 {
@ -1999,6 +1992,21 @@ void ClientBase::runInteractive()
    /// Enable bracketed-paste-mode so that we are able to paste multiline queries as a whole.
    lr.enableBracketedPaste();

+    static const std::initializer_list<std::pair<String, String>> backslash_aliases =
+        {
+            { "\\l", "SHOW DATABASES" },
+            { "\\d", "SHOW TABLES" },
+            { "\\c", "USE" },
+        };
+
+    static const std::initializer_list<String> repeat_last_input_aliases =
+        {
+            ".",  /// Vim shortcut
+            "/"   /// Oracle SQL Plus shortcut
+        };
+
+    String last_input;
+
    do
    {
        auto input = lr.readLine(prompt(), ":-] ");
@ -2034,10 +2042,20 @@ void ClientBase::runInteractive()
            }
        }

+        for (const auto & alias : repeat_last_input_aliases)
+        {
+            if (input == alias)
+            {
+                input = last_input;
+                break;
+            }
+        }
+
        try
        {
            if (!processQueryText(input))
                break;
+            last_input = input;
        }
        catch (const Exception & e)
        {
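
Review note on the repeat-last-input aliases: `.` and `/` are replaced by the previous input before the query is processed, and the previous input is remembered only after processing. The sketch below replays that logic on a list of typed lines. `resolveInput` and `replaySession` are hypothetical helpers, and the sketch assumes every query is processed successfully, whereas the real loop updates `last_input` only when `processQueryText` returns true without throwing.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Aliases taken from the diff above: "." (Vim) and "/" (Oracle SQL Plus).
const std::array<std::string, 2> repeat_last_input_aliases{{".", "/"}};

// Hypothetical helper: returns the text that should actually be executed.
std::string resolveInput(const std::string & input, const std::string & last_input)
{
    for (const auto & alias : repeat_last_input_aliases)
        if (input == alias)
            return last_input;
    return input;
}

// Simulates the interactive loop and returns the queries that would run.
std::vector<std::string> replaySession(const std::vector<std::string> & typed)
{
    std::vector<std::string> executed;
    std::string last_input;
    for (const auto & line : typed)
    {
        const std::string query = resolveInput(line, last_input);
        executed.push_back(query);
        last_input = query; // remembered only after the query has been processed
    }
    return executed;
}

// Usage example: both aliases repeat the previous query.
void exampleReplaySession()
{
    const auto executed = replaySession({"SELECT 1", ".", "SHOW TABLES", "/"});
    assert(executed[1] == "SELECT 1");
    assert(executed[3] == "SHOW TABLES");
}
```

Note that typing an alias before any query has run resolves to an empty string in this sketch, since `last_input` starts out empty.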
--- a/src/Columns/ColumnObject.cpp
+++ b/src/Columns/ColumnObject.cpp
@ -12,6 +12,7 @@
 #include <Interpreters/castColumn.h>
 #include <Interpreters/convertFieldToType.h>
 #include <Common/HashTable/HashSet.h>
+#include <Processors/Transforms/ColumnGathererTransform.h>

 namespace DB
 {
@ -154,13 +155,15 @@ FieldInfo getFieldInfo(const Field & field)
 {
    FieldVisitorToScalarType to_scalar_type_visitor;
    applyVisitor(to_scalar_type_visitor, field);
+    FieldVisitorToNumberOfDimensions to_number_dimension_visitor;

    return
    {
        to_scalar_type_visitor.getScalarType(),
        to_scalar_type_visitor.haveNulls(),
        to_scalar_type_visitor.needConvertField(),
-        applyVisitor(FieldVisitorToNumberOfDimensions(), field),
+        applyVisitor(to_number_dimension_visitor, field),
+        to_number_dimension_visitor.need_fold_dimension
    };
 }

@ -821,6 +824,44 @@ MutableColumnPtr ColumnObject::cloneResized(size_t new_size) const
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.cloneResized(new_size); });
 }

+void ColumnObject::getPermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation & res) const
+{
+    res.resize(num_rows);
+    std::iota(res.begin(), res.end(), 0);
+}
+
+void ColumnObject::compareColumn(const IColumn & rhs, size_t rhs_row_num,
+                                 PaddedPODArray<UInt64> * row_indexes, PaddedPODArray<Int8> & compare_results,
+                                 int direction, int nan_direction_hint) const
+{
+    return doCompareColumn<ColumnObject>(assert_cast<const ColumnObject &>(rhs), rhs_row_num, row_indexes,
+                                        compare_results, direction, nan_direction_hint);
+}
+
+void ColumnObject::getExtremes(Field & min, Field & max) const
+{
+    if (num_rows == 0)
+    {
+        min = Object();
+        max = Object();
+    }
+    else
+    {
+        get(0, min);
+        get(0, max);
+    }
+}
+
+MutableColumns ColumnObject::scatter(ColumnIndex num_columns, const Selector & selector) const
+{
+    return scatterImpl<ColumnObject>(num_columns, selector);
+}
+
+void ColumnObject::gather(ColumnGathererStream & gatherer)
+{
+    gatherer.gather(*this);
+}
+
 const ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key) const
 {
    if (const auto * node = subcolumns.findLeaf(key))
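
Review note on `ColumnObject::getPermutation`: since the header states that the order of rows in `ColumnObject` is undefined, the new implementation returns the identity permutation instead of throwing. A minimal standalone sketch of that idea follows. `identityPermutation` is a hypothetical free function, and `std::vector<std::size_t>` stands in for the `Permutation` type, whose definition is not part of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in: every row keeps its position, i.e. res[i] == i.
std::vector<std::size_t> identityPermutation(std::size_t num_rows)
{
    std::vector<std::size_t> res(num_rows);
    std::iota(res.begin(), res.end(), std::size_t{0});
    return res;
}

// Usage example: four rows map to positions 0, 1, 2, 3.
void exampleIdentityPermutation()
{
    const auto perm = identityPermutation(4);
    assert((perm == std::vector<std::size_t>{0, 1, 2, 3}));
}
```

This pairs with `compareAt` returning 0 in the header: all rows compare as equal, so any order is a valid sort result.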
--- a/src/Columns/ColumnObject.h
+++ b/src/Columns/ColumnObject.h
@ -15,7 +15,7 @@ namespace DB

 namespace ErrorCodes
 {
-    extern const int LOGICAL_ERROR;
+    extern const int NOT_IMPLEMENTED;
 }

 /// Info that represents a scalar or array field in a decomposed view.
@ -35,6 +35,10 @@ struct FieldInfo

    /// Number of dimension in array. 0 if field is scalar.
    size_t num_dimensions;
+
+    /// If true, this field is an array whose elements have differing numbers of dimensions,
+    /// so the dimensions need to be normalized (folded).
+    bool need_fold_dimension;
 };

 FieldInfo getFieldInfo(const Field & field);
@ -220,6 +224,19 @@ public:
    ColumnPtr replicate(const Offsets & offsets) const override;
    MutableColumnPtr cloneResized(size_t new_size) const override;

+    /// Order of rows in ColumnObject is undefined.
+    void getPermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation & res) const override;
+    void compareColumn(const IColumn & rhs, size_t rhs_row_num,
+                       PaddedPODArray<UInt64> * row_indexes, PaddedPODArray<Int8> & compare_results,
+                       int direction, int nan_direction_hint) const override;
+
+    void updatePermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &, EqualRanges &) const override {}
+    int compareAt(size_t, size_t, const IColumn &, int) const override { return 0; }
+    void getExtremes(Field & min, Field & max) const override;
+
+    MutableColumns scatter(ColumnIndex num_columns, const Selector & selector) const override;
+    void gather(ColumnGathererStream & gatherer) override;
+
    /// All other methods throw exception.

    StringRef getDataAt(size_t) const override { throwMustBeConcrete(); }
@ -232,14 +249,7 @@ public:
    void updateWeakHash32(WeakHash32 &) const override { throwMustBeConcrete(); }
    void updateHashFast(SipHash &) const override { throwMustBeConcrete(); }
    void expand(const Filter &, bool) override { throwMustBeConcrete(); }
-    int compareAt(size_t, size_t, const IColumn &, int) const override { throwMustBeConcrete(); }
-    void compareColumn(const IColumn &, size_t, PaddedPODArray<UInt64> *, PaddedPODArray<Int8> &, int, int) const override { throwMustBeConcrete(); }
    bool hasEqualValues() const override { throwMustBeConcrete(); }
-    void getPermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &) const override { throwMustBeConcrete(); }
-    void updatePermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &, EqualRanges &) const override { throwMustBeConcrete(); }
-    MutableColumns scatter(ColumnIndex, const Selector &) const override { throwMustBeConcrete(); }
-    void gather(ColumnGathererStream &) override { throwMustBeConcrete(); }
-    void getExtremes(Field &, Field &) const override { throwMustBeConcrete(); }
    size_t byteSizeAt(size_t) const override { throwMustBeConcrete(); }
    double getRatioOfDefaultRows(double) const override { throwMustBeConcrete(); }
    void getIndicesOfNonDefaultRows(Offsets &, size_t, size_t) const override { throwMustBeConcrete(); }
@ -247,7 +257,7 @@ public:
 private:
    [[noreturn]] static void throwMustBeConcrete()
    {
-        throw Exception("ColumnObject must be converted to ColumnTuple before use", ErrorCodes::LOGICAL_ERROR);
+        throw Exception("ColumnObject must be converted to ColumnTuple before use", ErrorCodes::NOT_IMPLEMENTED);
    }

    template <typename Func>
--- a/src/Common/ProfileEvents.cpp
+++ b/src/Common/ProfileEvents.cpp
@ -318,6 +318,7 @@ The server successfully detected this situation and will download merged part fr
    \
    M(FileSegmentWaitReadBufferMicroseconds, "Metric per file segment. Time spent waiting for internal read buffer (includes cache waiting)") \
    M(FileSegmentReadMicroseconds, "Metric per file segment. Time spent reading from file") \
+    M(FileSegmentWriteMicroseconds, "Metric per file segment. Time spent writing cache") \
    M(FileSegmentCacheWriteMicroseconds, "Metric per file segment. Time spent writing data to cache") \
    M(FileSegmentPredownloadMicroseconds, "Metric per file segment. Time spent predownloading data to cache (predownloading - finishing file segment download (after someone who failed to do that) up to the point current thread was requested to do)") \
    M(FileSegmentUsedBytes, "Metric per file segment. How many bytes were actually used from current file segment") \
--- a/src/Coordination/KeeperServer.cpp
+++ b/src/Coordination/KeeperServer.cpp
@ -370,6 +370,7 @@ void KeeperServer::startup(const Poco::Util::AbstractConfiguration & config, boo
    {
        auto log_entries = log_store->log_entries(state_machine->last_commit_index() + 1, next_log_idx);

+        size_t preprocessed = 0;
        LOG_INFO(log, "Preprocessing {} log entries", log_entries->size());
        auto idx = state_machine->last_commit_index() + 1;
        for (const auto & entry : *log_entries)
@ -378,7 +379,12 @@ void KeeperServer::startup(const Poco::Util::AbstractConfiguration & config, boo
                state_machine->pre_commit(idx, entry->get_buf());

            ++idx;
+            ++preprocessed;
+
+            if (preprocessed % 50000 == 0)
+                LOG_TRACE(log, "Preprocessed {}/{} entries", preprocessed, log_entries->size());
        }
+        LOG_INFO(log, "Preprocessing done");
    }

    loadLatestConfig();
--- a/src/Coordination/KeeperStorage.cpp
+++ b/src/Coordination/KeeperStorage.cpp
@ -369,8 +369,16 @@ void KeeperStorage::UncommittedState::addDeltas(std::vector<Delta> new_deltas)
        const auto & added_delta = deltas.emplace_back(std::move(delta));

        if (!added_delta.path.empty())
+        {
+            deltas_for_path[added_delta.path].push_back(&added_delta);
            applyDelta(added_delta);
        }
+        else if (const auto * auth_delta = std::get_if<AddAuthDelta>(&added_delta.operation))
+        {
+            auto & uncommitted_auth = session_and_auth[auth_delta->session_id];
+            uncommitted_auth.emplace_back(&auth_delta->auth_id);
+        }
+    }
 }

 void KeeperStorage::UncommittedState::commit(int64_t commit_zxid)
@ -385,6 +393,26 @@ void KeeperStorage::UncommittedState::commit(int64_t commit_zxid)
            break;
        }

+        auto & front_delta = deltas.front();
+
+        if (!front_delta.path.empty())
+        {
+            auto & path_deltas = deltas_for_path.at(front_delta.path);
+            assert(path_deltas.front() == &front_delta);
+            path_deltas.pop_front();
+            if (path_deltas.empty())
+                deltas_for_path.erase(front_delta.path);
+        }
+        else if (auto * add_auth = std::get_if<AddAuthDelta>(&front_delta.operation))
+        {
+            auto & uncommitted_auth = session_and_auth[add_auth->session_id];
+            assert(!uncommitted_auth.empty() && uncommitted_auth.front() == &add_auth->auth_id);
+            uncommitted_auth.pop_front();
+            if (uncommitted_auth.empty())
+                session_and_auth.erase(add_auth->session_id);
+
+        }
+
        deltas.pop_front();
    }

@ -405,10 +433,12 @@ void KeeperStorage::UncommittedState::rollback(int64_t rollback_zxid)
            deltas.back().zxid,
            rollback_zxid);

+    auto delta_it = deltas.rbegin();
+
    // we need to undo ephemeral mapping modifications
    // CreateNodeDelta added ephemeral for session id -> we need to remove it
    // RemoveNodeDelta removed ephemeral for session id -> we need to add it back
-    for (auto delta_it = deltas.rbegin(); delta_it != deltas.rend(); ++delta_it)
+    for (; delta_it != deltas.rend(); ++delta_it)
    {
        if (delta_it->zxid < rollback_zxid)
            break;
@ -431,29 +461,56 @@ void KeeperStorage::UncommittedState::rollback(int64_t rollback_zxid)
                    }
                },
                delta_it->operation);
+
+            auto & path_deltas = deltas_for_path.at(delta_it->path);
+            if (path_deltas.back() == &*delta_it)
+            {
+                path_deltas.pop_back();
+                if (path_deltas.empty())
+                    deltas_for_path.erase(delta_it->path);
+            }
+        }
+        else if (auto * add_auth = std::get_if<AddAuthDelta>(&delta_it->operation))
+        {
+            auto & uncommitted_auth = session_and_auth[add_auth->session_id];
+            if (uncommitted_auth.back() == &add_auth->auth_id)
+            {
+                uncommitted_auth.pop_back();
+                if (uncommitted_auth.empty())
+                    session_and_auth.erase(add_auth->session_id);
+            }
        }
    }

-    std::erase_if(deltas, [rollback_zxid](const auto & delta) { return delta.zxid == rollback_zxid; });
+    if (delta_it == deltas.rend())
+        deltas.clear();
+    else
+        deltas.erase(delta_it.base(), deltas.end());

-    std::unordered_set<std::string> deleted_nodes;
+    absl::flat_hash_set<std::string> deleted_nodes;
    std::erase_if(
        nodes,
        [&, rollback_zxid](const auto & node)
        {
            if (node.second.zxid == rollback_zxid)
            {
-                deleted_nodes.emplace(node.first);
+                deleted_nodes.emplace(std::move(node.first));
                return true;
            }
            return false;
        });

    // recalculate all the uncommitted deleted nodes
-    for (const auto & delta : deltas)
+    for (const auto & deleted_node : deleted_nodes)
    {
-        if (!delta.path.empty() && deleted_nodes.contains(delta.path))
-            applyDelta(delta);
+        auto path_delta_it = deltas_for_path.find(deleted_node);
+        if (path_delta_it != deltas_for_path.end())
+        {
+            for (const auto & delta : path_delta_it->second)
+            {
+                applyDelta(*delta);
+            }
+        }
    }
 }
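
The tail erase in `rollback` above (`deltas.erase(delta_it.base(), deltas.end())`) can be sketched in isolation. This is a minimal illustration, not the Keeper code: `Delta` here is a hypothetical stand-in that carries only a `zxid`, `rollbackTail` is an assumed helper name, and the list is assumed to be ordered by non-decreasing `zxid`.

```cpp
#include <cstdint>
#include <list>

// Hypothetical stand-in for a Keeper delta: only the transaction id matters here.
struct Delta
{
    int64_t zxid;
};

// Removes every delta at the tail of `deltas` whose zxid is >= rollback_zxid.
// Assumes the list is ordered by non-decreasing zxid.
void rollbackTail(std::list<Delta> & deltas, int64_t rollback_zxid)
{
    auto it = deltas.rbegin();
    for (; it != deltas.rend(); ++it)
    {
        if (it->zxid < rollback_zxid)
            break;
    }

    // it.base() refers to the element that follows *it in forward order,
    // which is the first element to erase. If the loop reached rend(),
    // base() equals begin() and the whole list is erased.
    deltas.erase(it.base(), deltas.end());
}
```

Erasing one contiguous range avoids the full scan that `std::erase_if` performs, which is presumably why the diff switches to it.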

--- a/src/Coordination/KeeperStorage.h
+++ b/src/Coordination/KeeperStorage.h
@ -229,27 +229,42 @@ public:

        bool hasACL(int64_t session_id, bool is_local, std::function<bool(const AuthID &)> predicate)
        {
-            for (const auto & session_auth : storage.session_and_auth[session_id])
+            const auto check_auth = [&](const auto & auth_ids)
            {
-                if (predicate(session_auth))
+                for (const auto & auth : auth_ids)
+                {
+                    using TAuth = std::remove_reference_t<decltype(auth)>;
+
+                    const AuthID * auth_ptr = nullptr;
+                    if constexpr (std::is_pointer_v<TAuth>)
+                        auth_ptr = auth;
+                    else
+                        auth_ptr = &auth;
+
+                    if (predicate(*auth_ptr))
                        return true;
                }
+                return false;
+            };

            if (is_local)
-                return false;
+                return check_auth(storage.session_and_auth[session_id]);

-            for (const auto & delta : deltas)
-            {
-                if (const auto * auth_delta = std::get_if<KeeperStorage::AddAuthDelta>(&delta.operation);
-                    auth_delta && auth_delta->session_id == session_id && predicate(auth_delta->auth_id))
+            if (check_auth(storage.session_and_auth[session_id]))
                return true;
-            }

+            // check whether this session has uncommitted auth ids
+            const auto auth_it = session_and_auth.find(session_id);
+            if (auth_it == session_and_auth.end())
                return false;
+
+            return check_auth(auth_it->second);
        }

        std::shared_ptr<Node> tryGetNodeFromStorage(StringRef path) const;

+        std::unordered_map<int64_t, std::list<const AuthID *>> session_and_auth;
+
        struct UncommittedNode
        {
            std::shared_ptr<Node> node{nullptr};
@ -257,7 +272,32 @@ public:
            int64_t zxid{0};
        };

-        mutable std::unordered_map<std::string, UncommittedNode> nodes;
+        struct Hash
+        {
+            auto operator()(const std::string_view view) const
+            {
+                SipHash hash;
+                hash.update(view);
+                return hash.get64();
+            }
+
+            using is_transparent = void; // required to make find() work with different type than key_type
+        };
+
+        struct Equal
+        {
+            auto operator()(const std::string_view a,
+                            const std::string_view b) const
+            {
+                return a == b;
+            }
+
+            using is_transparent = void; // required to make find() work with different type than key_type
+        };
+
+        mutable std::unordered_map<std::string, UncommittedNode, Hash, Equal> nodes;
+        std::unordered_map<std::string, std::list<const Delta *>, Hash, Equal> deltas_for_path;
+
        std::list<Delta> deltas;
        KeeperStorage & storage;
    };
--- a/src/Daemon/BaseDaemon.cpp
+++ b/src/Daemon/BaseDaemon.cpp
@ -1016,8 +1016,8 @@ void BaseDaemon::setupWatchdog()
        if (config().getRawString("logger.stream_compress", "false") == "true")
        {
            Poco::AutoPtr<OwnPatternFormatter> pf;
-            if (config().getString("logger.formatting", "") == "json")
-                pf = new OwnJSONPatternFormatter;
+            if (config().getString("logger.formatting.type", "") == "json")
+                pf = new OwnJSONPatternFormatter(config());
            else
                pf = new OwnPatternFormatter;
            Poco::AutoPtr<DB::OwnFormattingChannel> log = new DB::OwnFormattingChannel(pf, new Poco::ConsoleChannel(std::cerr));
--- a/src/DataTypes/ObjectUtils.cpp
+++ b/src/DataTypes/ObjectUtils.cpp
@ -737,14 +737,31 @@ Field FieldVisitorReplaceScalars::operator()(const Array & x) const
    return res;
 }

-size_t FieldVisitorToNumberOfDimensions::operator()(const Array & x) const
+size_t FieldVisitorToNumberOfDimensions::operator()(const Array & x)
 {
    const size_t size = x.size();
    size_t dimensions = 0;
    for (size_t i = 0; i < size; ++i)
-        dimensions = std::max(dimensions, applyVisitor(*this, x[i]));
+    {
+        size_t element_dimensions = applyVisitor(*this, x[i]);
+        if (i > 0 && element_dimensions != dimensions)
+            need_fold_dimension = true;
+        dimensions = std::max(dimensions, element_dimensions);
+    }

    return 1 + dimensions;
 }

+Field FieldVisitorFoldDimension::operator()(const Array & x) const
+{
+    if (num_dimensions_to_fold == 0)
+        return x;
+    const size_t size = x.size();
+    Array res(size);
+    for (size_t i = 0; i < size; ++i)
+    {
+        res[i] = applyVisitor(FieldVisitorFoldDimension(num_dimensions_to_fold - 1), x[i]);
+    }
+    return res;
+}
 }
--- a/src/DataTypes/ObjectUtils.h
+++ b/src/DataTypes/ObjectUtils.h
@ -114,10 +114,42 @@ private:
 class FieldVisitorToNumberOfDimensions : public StaticVisitor<size_t>
 {
 public:
-    size_t operator()(const Array & x) const;
+    size_t operator()(const Array & x);

    template <typename T>
    size_t operator()(const T &) const { return 0; }
+
+    bool need_fold_dimension = false;
+};
+
+/// Fold a field (except Null) to a higher dimension, e.g. `1` -- fold 2 --> `[[1]]`.
+/// Used to normalize the dimensions of elements in an array, e.g. `[1, [2]]` --> `[[1], [2]]`.
+class FieldVisitorFoldDimension : public StaticVisitor<Field>
+{
+public:
+    explicit FieldVisitorFoldDimension(size_t num_dimensions_to_fold_) : num_dimensions_to_fold(num_dimensions_to_fold_) { }
+
+    Field operator()(const Array & x) const;
+
+    Field operator()(const Null & x) const { return x; }
+
+    template <typename T>
+    Field operator()(const T & x) const
+    {
+        if (num_dimensions_to_fold == 0)
+            return x;
+        Array res(1, x);
+        for (size_t i = 1; i < num_dimensions_to_fold; ++i)
+        {
+            Array new_res;
+            new_res.push_back(std::move(res));
+            res = std::move(new_res);
+        }
+        return res;
+    }
+
+private:
+    size_t num_dimensions_to_fold;
 };

 /// Receives range of objects, which contains collections
--- a/src/DataTypes/Serializations/SerializationObject.cpp
+++ b/src/DataTypes/Serializations/SerializationObject.cpp
@ -65,6 +65,8 @@ void SerializationObject<Parser>::deserializeTextImpl(IColumn & column, Reader &
    for (size_t i = 0; i < paths.size(); ++i)
    {
        auto field_info = getFieldInfo(values[i]);
+        if (field_info.need_fold_dimension)
+            values[i] = applyVisitor(FieldVisitorFoldDimension(field_info.num_dimensions), std::move(values[i]));
        if (isNothing(field_info.scalar_type))
            continue;

@ -258,7 +260,12 @@ void SerializationObject<Parser>::serializeBinaryBulkWithMultipleStreams(
    auto * state_object = checkAndGetState<SerializeStateObject>(state);

    if (!column_object.isFinalized())
-        throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot write non-finalized ColumnObject");
+    {
+        auto finalized_object = column_object.clone();
+        assert_cast<ColumnObject &>(*finalized_object).finalize();
+        serializeBinaryBulkWithMultipleStreams(*finalized_object, offset, limit, settings, state);
+        return;
+    }

    auto [tuple_column, tuple_type] = unflattenObjectToTuple(column_object);

--- a/src/Disks/IO/CachedOnDiskReadBufferFromFile.cpp
+++ b/src/Disks/IO/CachedOnDiskReadBufferFromFile.cpp
@ -80,7 +80,7 @@ void CachedOnDiskReadBufferFromFile::appendFilesystemCacheLog(
        .file_segment_range = { file_segment_range.left, file_segment_range.right },
        .requested_range = { first_offset, read_until_position },
        .file_segment_size = file_segment_range.size(),
-        .cache_attempted = true,
+        .read_from_cache_attempted = true,
        .read_buffer_id = current_buffer_id,
        .profile_counters = std::make_shared<ProfileEvents::Counters::Snapshot>(
            current_file_segment_counters.getPartiallyAtomicSnapshot()),
--- a/src/Disks/IO/CachedOnDiskReadBufferFromFile.h
+++ b/src/Disks/IO/CachedOnDiskReadBufferFromFile.h
@ -1,13 +1,13 @@
 #pragma once

-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCache.h>
 #include <Common/logger_useful.h>
 #include <IO/SeekableReadBuffer.h>
 #include <IO/WriteBufferFromFile.h>
 #include <IO/ReadSettings.h>
 #include <IO/ReadBufferFromFileBase.h>
 #include <Interpreters/FilesystemCacheLog.h>
-#include <Common/FileSegment.h>
+#include <Interpreters/Cache/FileSegment.h>


 namespace CurrentMetrics
--- a/src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp
+++ b/src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp
@ -1,7 +1,7 @@
 #include "CachedOnDiskWriteBufferFromFile.h"

-#include <Common/FileCacheFactory.h>
-#include <Common/FileSegment.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
+#include <Interpreters/Cache/FileSegment.h>
 #include <Common/logger_useful.h>
 #include <Interpreters/FilesystemCacheLog.h>
 #include <Interpreters/Context.h>
@ -11,6 +11,7 @@ namespace ProfileEvents
 {
    extern const Event CachedWriteBufferCacheWriteBytes;
    extern const Event CachedWriteBufferCacheWriteMicroseconds;
+    extern const Event FileSegmentWriteMicroseconds;
 }

 namespace DB
@ -118,6 +119,9 @@ void CachedOnDiskWriteBufferFromFile::cacheData(char * data, size_t size)

    ProfileEvents::increment(ProfileEvents::CachedWriteBufferCacheWriteBytes, size);
    ProfileEvents::increment(ProfileEvents::CachedWriteBufferCacheWriteMicroseconds, watch.elapsedMicroseconds());
+
+    current_file_segment_counters.increment(
+        ProfileEvents::FileSegmentWriteMicroseconds, watch.elapsedMicroseconds());
 }

 void CachedOnDiskWriteBufferFromFile::appendFilesystemCacheLog(const FileSegment & file_segment)
@ -134,7 +138,7 @@ void CachedOnDiskWriteBufferFromFile::appendFilesystemCacheLog(const FileSegment
            .requested_range = {},
            .cache_type = FilesystemCacheLogElement::CacheType::WRITE_THROUGH_CACHE,
            .file_segment_size = file_segment_range.size(),
-            .cache_attempted = false,
+            .read_from_cache_attempted = false,
            .read_buffer_id = {},
            .profile_counters = std::make_shared<ProfileEvents::Counters::Snapshot>(current_file_segment_counters.getPartiallyAtomicSnapshot()),
        };
--- a/src/Disks/IO/CachedOnDiskWriteBufferFromFile.h
+++ b/src/Disks/IO/CachedOnDiskWriteBufferFromFile.h
@ -2,7 +2,7 @@

 #include <IO/WriteBufferFromFileDecorator.h>
 #include <IO/WriteSettings.h>
-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCache.h>
 #include <Interpreters/FilesystemCacheLog.h>

 namespace Poco
--- a/src/Disks/IO/ReadBufferFromRemoteFSGather.cpp
+++ b/src/Disks/IO/ReadBufferFromRemoteFSGather.cpp
@ -81,7 +81,7 @@ void ReadBufferFromRemoteFSGather::appendFilesystemCacheLog()
        .file_segment_range = { 0, current_file_size },
        .cache_type = FilesystemCacheLogElement::CacheType::READ_FROM_FS_BYPASSING_CACHE,
        .file_segment_size = total_bytes_read_from_current_file,
-        .cache_attempted = false,
+        .read_from_cache_attempted = false,
    };

    if (auto cache_log = Context::getGlobalContextInstance()->getFilesystemCacheLog())
--- a/src/Disks/ObjectStorages/Cached/CachedObjectStorage.cpp
+++ b/src/Disks/ObjectStorages/Cached/CachedObjectStorage.cpp
@ -4,8 +4,8 @@
 #include <IO/BoundedReadBuffer.h>
 #include <Disks/IO/CachedOnDiskWriteBufferFromFile.h>
 #include <Disks/IO/CachedOnDiskReadBufferFromFile.h>
-#include <Common/FileCache.h>
-#include <Common/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCache.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
 #include <Common/CurrentThread.h>
 #include <Common/logger_useful.h>
 #include <filesystem>
--- a/src/Disks/ObjectStorages/Cached/CachedObjectStorage.h
+++ b/src/Disks/ObjectStorages/Cached/CachedObjectStorage.h
@ -1,8 +1,8 @@
 #pragma once

 #include <Disks/ObjectStorages/IObjectStorage.h>
-#include <Common/FileCache.h>
-#include <Common/FileCacheSettings.h>
+#include <Interpreters/Cache/FileCache.h>
+#include <Interpreters/Cache/FileCacheSettings.h>

 namespace Poco
 {
--- a/src/Disks/ObjectStorages/Cached/registerDiskCache.cpp
+++ b/src/Disks/ObjectStorages/Cached/registerDiskCache.cpp
@ -1,6 +1,6 @@
-#include <Common/FileCacheSettings.h>
-#include <Common/FileCacheFactory.h>
-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCacheSettings.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCache.h>
 #include <Common/logger_useful.h>
 #include <Common/assert_cast.h>
 #include <Disks/DiskFactory.h>
--- a/src/Disks/ObjectStorages/DiskObjectStorage.cpp
+++ b/src/Disks/ObjectStorages/DiskObjectStorage.cpp
@ -10,7 +10,6 @@
 #include <Common/quoteString.h>
 #include <Common/logger_useful.h>
 #include <Common/filesystemHelpers.h>
-#include <Common/FileCache.h>
 #include <Disks/ObjectStorages/Cached/CachedObjectStorage.h>
 #include <Disks/ObjectStorages/DiskObjectStorageRemoteMetadataRestoreHelper.h>
 #include <Disks/ObjectStorages/DiskObjectStorageTransaction.h>
--- a/src/Disks/ObjectStorages/DiskObjectStorage.h
+++ b/src/Disks/ObjectStorages/DiskObjectStorage.h
@ -2,7 +2,6 @@

 #include <Disks/IDisk.h>
 #include <Disks/ObjectStorages/IObjectStorage.h>
-#include <Common/FileCache_fwd.h>
 #include <Disks/ObjectStorages/DiskObjectStorageRemoteMetadataRestoreHelper.h>
 #include <Disks/ObjectStorages/IMetadataStorage.h>
 #include <Disks/ObjectStorages/DiskObjectStorageTransaction.h>
--- a/src/Disks/ObjectStorages/DiskObjectStorageTransaction.cpp
+++ b/src/Disks/ObjectStorages/DiskObjectStorageTransaction.cpp
@ -3,6 +3,7 @@
 #include <Common/checkStackSize.h>
 #include <ranges>
 #include <Common/logger_useful.h>
+#include <Common/Exception.h>


 namespace DB
@ -633,9 +634,11 @@ void DiskObjectStorageTransaction::commit()
        {
            operations_to_execute[i]->execute(metadata_transaction);
        }
-        catch (Exception & ex)
+        catch (...)
        {
-            ex.addMessage(fmt::format("While executing operation #{} ({})", i, operations_to_execute[i]->getInfoForLog()));
+            tryLogCurrentException(
+                &Poco::Logger::get("DiskObjectStorageTransaction"),
+                fmt::format("An error occurred while executing transaction's operation #{} ({})", i, operations_to_execute[i]->getInfoForLog()));

            for (int64_t j = i; j >= 0; --j)
            {
@ -643,9 +646,12 @@ void DiskObjectStorageTransaction::commit()
                {
                    operations_to_execute[j]->undo();
                }
-                catch (Exception & rollback_ex)
+                catch (...)
                {
-                    rollback_ex.addMessage(fmt::format("While undoing operation #{}", i));
+                    tryLogCurrentException(
+                        &Poco::Logger::get("DiskObjectStorageTransaction"),
+                        fmt::format("An error occurred while undoing transaction's operation #{}", j));
+
                    throw;
                }
            }
--- a/src/Disks/ObjectStorages/IObjectStorage.h
+++ b/src/Disks/ObjectStorages/IObjectStorage.h
@ -17,7 +17,6 @@
 #include <Disks/ObjectStorages/StoredObject.h>
 #include <Disks/DiskType.h>
 #include <Common/ThreadPool.h>
-#include <Common/FileCache.h>
 #include <Disks/WriteMode.h>


--- a/src/Disks/ObjectStorages/LocalObjectStorage.cpp
+++ b/src/Disks/ObjectStorages/LocalObjectStorage.cpp
@ -1,8 +1,6 @@
 #include <Disks/ObjectStorages/LocalObjectStorage.h>

 #include <Disks/ObjectStorages/DiskObjectStorageCommon.h>
-#include <Common/FileCache.h>
-#include <Common/FileCacheFactory.h>
 #include <Common/filesystemHelpers.h>
 #include <Common/logger_useful.h>
 #include <Disks/IO/createReadBufferFromFileBase.h>
--- a/src/Disks/ObjectStorages/S3/diskSettings.cpp
+++ b/src/Disks/ObjectStorages/S3/diskSettings.cpp
@ -22,8 +22,6 @@
 #include <Disks/DiskRestartProxy.h>
 #include <Disks/DiskLocal.h>

-#include <Common/FileCacheFactory.h>
-
 namespace DB
 {

--- a/src/Disks/ObjectStorages/S3/parseConfig.h
+++ b/src/Disks/ObjectStorages/S3/parseConfig.h
@ -6,7 +6,6 @@

 #include <aws/core/client/DefaultRetryStrategy.h>
 #include <IO/S3Common.h>
-#include <Disks/DiskCacheWrapper.h>
 #include <Storages/StorageS3Settings.h>
 #include <Disks/ObjectStorages/S3/ProxyConfiguration.h>
 #include <Disks/ObjectStorages/S3/ProxyListConfiguration.h>
@ -14,7 +13,6 @@
 #include <Disks/DiskRestartProxy.h>
 #include <Disks/DiskLocal.h>
 #include <Disks/ObjectStorages/DiskObjectStorageCommon.h>
-#include <Common/FileCacheFactory.h>


 namespace DB
--- a/src/Functions/formatDateTime.cpp
+++ b/src/Functions/formatDateTime.cpp
@ -272,6 +272,19 @@ private:
            writeNumber2(target + 6, ToSecondImpl::execute(source, timezone));
        }

+        static void timezoneOffset(char * target, Time source, const DateLUTImpl & timezone)
+        {
+            auto offset = TimezoneOffsetImpl::execute(source, timezone);
+            if (offset < 0)
+            {
+                *target = '-';
+                offset = -offset;
+            }
+
+            writeNumber2(target + 1, offset / 3600);
+            writeNumber2(target + 3, offset % 3600 / 60);
+        }
+
        static void quarter(char * target, Time source, const DateLUTImpl & timezone)
        {
            *target += ToQuarterImpl::execute(source, timezone);
@ -632,6 +645,12 @@ public:
                        result.append("0");
                        break;

+                    // Offset from UTC timezone as +hhmm or -hhmm
+                    case 'z':
+                        instructions.emplace_back(&Action<T>::timezoneOffset, 5);
+                        result.append("+0000");
+                        break;
+
                    /// Time components. If the argument is Date, not a DateTime, then this components will have default value.

                    // Minute (00-59)
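
The `+hhmm` / `-hhmm` layout written by `timezoneOffset` above can be sketched as a standalone function. This is an illustration under assumptions: `formatUtcOffset` is a hypothetical name, it takes the offset in seconds directly instead of calling `TimezoneOffsetImpl`, and it returns a `std::string` rather than writing into a preallocated target buffer.

```cpp
#include <string>

// Hypothetical helper: formats a UTC offset given in seconds as "+hhmm" or "-hhmm".
std::string formatUtcOffset(long offset_seconds)
{
    // Start from the same template the diff appends to `result`;
    // the sign is only overwritten for negative offsets.
    std::string out = "+0000";
    if (offset_seconds < 0)
    {
        out[0] = '-';
        offset_seconds = -offset_seconds;
    }

    const long hours = offset_seconds / 3600;
    const long minutes = offset_seconds % 3600 / 60;

    // Two-digit, zero-padded fields, mirroring writeNumber2.
    out[1] = static_cast<char>('0' + hours / 10);
    out[2] = static_cast<char>('0' + hours % 10);
    out[3] = static_cast<char>('0' + minutes / 10);
    out[4] = static_cast<char>('0' + minutes % 10);
    return out;
}
```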
--- a/src/IO/ConcatSeekableReadBuffer.cpp
+++ b/src/IO/ConcatSeekableReadBuffer.cpp
@ -9,6 +9,11 @@ namespace ErrorCodes
    extern const int ARGUMENT_OUT_OF_BOUND;
 }

+ConcatSeekableReadBuffer::BufferInfo::BufferInfo(BufferInfo && src) noexcept
+    : in(std::exchange(src.in, nullptr)), own_in(std::exchange(src.own_in, false)), size(std::exchange(src.size, 0))
+{
+}
+
 ConcatSeekableReadBuffer::BufferInfo::~BufferInfo()
 {
    if (own_in)
--- a/src/IO/ConcatSeekableReadBuffer.h
+++ b/src/IO/ConcatSeekableReadBuffer.h
@ -30,7 +30,7 @@ private:
    struct BufferInfo
    {
        BufferInfo() = default;
-        BufferInfo(BufferInfo &&) = default;
+        BufferInfo(BufferInfo && src) noexcept;
        ~BufferInfo();
        SeekableReadBuffer * in = nullptr;
        bool own_in = false;
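
Why the defaulted move constructor is replaced can be shown with a reduced sketch. A defaulted move copies `in` and `own_in` without resetting the source, so both objects would delete the same buffer. `Resource`, `Owner`, and `live_resources` below are hypothetical names used only for illustration.

```cpp
#include <utility>

// Counts live Resource instances so that a double delete or a leak is observable.
int live_resources = 0;

// Hypothetical resource standing in for SeekableReadBuffer.
struct Resource
{
    Resource() { ++live_resources; }
    ~Resource() { --live_resources; }
};

struct Owner
{
    Owner() = default;
    Owner(Resource * in_, bool own_in_) : in(in_), own_in(own_in_) {}

    // Reset the source so that its destructor does not delete `in` a second time.
    Owner(Owner && src) noexcept
        : in(std::exchange(src.in, nullptr)), own_in(std::exchange(src.own_in, false))
    {
    }

    Owner(const Owner &) = delete;

    ~Owner()
    {
        if (own_in)
            delete in;
    }

    Resource * in = nullptr;
    bool own_in = false;
};
```

After a move, the source holds a null pointer with `own_in == false`, so only the destination releases the resource.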
--- a/src/IO/HashingReadBuffer.h
+++ b/src/IO/HashingReadBuffer.h
@ -18,29 +18,38 @@ public:
    {
        working_buffer = in.buffer();
        pos = in.position();
-
-        /// calculate hash from the data already read
-        if (!working_buffer.empty())
-        {
-            calculateHash(pos, working_buffer.end() - pos);
+        hashing_begin = pos;
    }
+
+    uint128 getHash()
+    {
+        if (pos > hashing_begin)
+        {
+            calculateHash(hashing_begin, pos - hashing_begin);
+            hashing_begin = pos;
+        }
+        return IHashingBuffer<ReadBuffer>::getHash();
    }

 private:
    bool nextImpl() override
    {
+        if (pos > hashing_begin)
+            calculateHash(hashing_begin, pos - hashing_begin);
+
        in.position() = pos;
        bool res = in.next();
        working_buffer = in.buffer();
-        pos = in.position();

        // `pos` may be different from working_buffer.begin() when using sophisticated ReadBuffers.
-        calculateHash(pos, working_buffer.end() - pos);
+        pos = in.position();
+        hashing_begin = pos;

        return res;
    }

    ReadBuffer & in;
+    BufferBase::Position hashing_begin;
 };

 }
--- a/src/IO/ReadHelpers.cpp
+++ b/src/IO/ReadHelpers.cpp
@ -636,8 +636,9 @@ concept WithResize = requires (T value)
 template <typename Vector>
 void readCSVStringInto(Vector & s, ReadBuffer & buf, const FormatSettings::CSV & settings)
 {
+    /// Empty string
    if (buf.eof())
-        throwReadAfterEOF();
+        return;

    const char delimiter = settings.delimiter;
    const char maybe_quote = *buf.position();
--- a/src/IO/ReadSettings.h
+++ b/src/IO/ReadSettings.h
@ -3,7 +3,7 @@
 #include <cstddef>
 #include <string>
 #include <Core/Defines.h>
-#include <Common/FileCache_fwd.h>
+#include <Interpreters/Cache/FileCache_fwd.h>
 #include <Common/Throttler_fwd.h>

 namespace DB
--- a/src/IO/S3/PocoHTTPClient.cpp
+++ b/src/IO/S3/PocoHTTPClient.cpp
@ -120,11 +120,31 @@ std::shared_ptr<Aws::Http::HttpResponse> PocoHTTPClient::MakeRequest(
    const std::shared_ptr<Aws::Http::HttpRequest> & request,
    Aws::Utils::RateLimits::RateLimiterInterface * readLimiter,
    Aws::Utils::RateLimits::RateLimiterInterface * writeLimiter) const
+{
+    try
    {
        auto response = Aws::MakeShared<PocoHTTPResponse>("PocoHTTPClient", request);
        makeRequestInternal(*request, response, readLimiter, writeLimiter);
        return response;
    }
+    catch (const Exception &)
+    {
+        throw;
+    }
+    catch (const Poco::Exception & e)
+    {
+        throw Exception(Exception::CreateFromPocoTag{}, e);
+    }
+    catch (const std::exception & e)
+    {
+        throw Exception(Exception::CreateFromSTDTag{}, e);
+    }
+    catch (...)
+    {
+        tryLogCurrentException(__PRETTY_FUNCTION__);
+        throw;
+    }
+}

 namespace
 {
--- a/src/IO/WriteBufferFromS3.cpp
+++ b/src/IO/WriteBufferFromS3.cpp
@ -4,7 +4,7 @@

 #include <Common/logger_useful.h>
 #include <Common/Throttler.h>
-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCache.h>

 #include <IO/WriteBufferFromS3.h>
 #include <IO/WriteHelpers.h>
--- a/src/Interpreters/AsynchronousMetrics.cpp
+++ b/src/Interpreters/AsynchronousMetrics.cpp
@ -11,10 +11,10 @@
 #include <Common/CurrentMetrics.h>
 #include <Common/typeid_cast.h>
 #include <Common/filesystemHelpers.h>
-#include <Common/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
 #include <Common/getCurrentProcessFDCount.h>
 #include <Common/getMaxFileDescriptorCount.h>
-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCache.h>
 #include <Server/ProtocolServerAdapter.h>
 #include <Storages/MarkCache.h>
 #include <Storages/StorageMergeTree.h>
@ -77,9 +77,11 @@ static std::unique_ptr<ReadBufferFromFilePRead> openFileIfExists(const std::stri
 AsynchronousMetrics::AsynchronousMetrics(
    ContextPtr global_context_,
    int update_period_seconds,
+    int heavy_metrics_update_period_seconds,
    const ProtocolServerMetricsFunc & protocol_server_metrics_func_)
    : WithContext(global_context_)
    , update_period(update_period_seconds)
+    , heavy_metric_update_period(heavy_metrics_update_period_seconds)
    , protocol_server_metrics_func(protocol_server_metrics_func_)
    , log(&Poco::Logger::get("AsynchronousMetrics"))
 {
@ -563,7 +565,7 @@ AsynchronousMetrics::NetworkInterfaceStatValues::operator-(const AsynchronousMet
 #endif


-void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_time)
+void AsynchronousMetrics::update(TimePoint update_time)
 {
    Stopwatch watch;

@ -1584,6 +1586,8 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti
    saveAllArenasMetric<size_t>(new_values, "muzzy_purged");
 #endif

+    updateHeavyMetricsIfNeeded(current_time, update_time, new_values);
+
    /// Add more metrics as you wish.

    new_values["AsynchronousMetricsCalculationTimeSpent"] = watch.elapsedSeconds();
@ -1601,4 +1605,76 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti
    values = new_values;
 }

+void AsynchronousMetrics::updateDetachedPartsStats()
+{
+    DetachedPartsStats current_values{};
+
+    for (const auto & db : DatabaseCatalog::instance().getDatabases())
+    {
+        if (!db.second->canContainMergeTreeTables())
+            continue;
+
+        for (auto iterator = db.second->getTablesIterator(getContext()); iterator->isValid(); iterator->next())
+        {
+            const auto & table = iterator->table();
+            if (!table)
+                continue;
+
+            if (MergeTreeData * table_merge_tree = dynamic_cast<MergeTreeData *>(table.get()))
+            {
+                for (const auto & detached_part: table_merge_tree->getDetachedParts())
+                {
+                    if (!detached_part.valid_name)
+                        continue;
+
+                    if (detached_part.prefix.empty())
+                        ++current_values.detached_by_user;
+
+                    ++current_values.count;
+                }
+            }
+        }
+    }
+
+    detached_parts_stats = current_values;
+}
+
+void AsynchronousMetrics::updateHeavyMetricsIfNeeded(TimePoint current_time, TimePoint update_time, AsynchronousMetricValues & new_values)
+{
+    const auto time_after_previous_update = current_time - heavy_metric_previous_update_time;
+    const bool update_heavy_metric = time_after_previous_update >= heavy_metric_update_period || first_run;
+
+    if (update_heavy_metric)
+    {
+        heavy_metric_previous_update_time = update_time;
+
+        Stopwatch watch;
+
+        /// Tests show that listing 100000 entries takes around 0.15 sec.
+        updateDetachedPartsStats();
+
+        watch.stop();
+
+        /// Normally heavy metrics don't delay the rest of the metrics calculation;
+        /// otherwise raise the log level to debug or warning.
+        auto log_level = std::make_pair(DB::LogsLevel::trace, Poco::Message::PRIO_TRACE);
+        if (watch.elapsedSeconds() > (update_period.count() / 4. * 3))
+            log_level = std::make_pair(DB::LogsLevel::warning, Poco::Message::PRIO_WARNING);
+        else if (watch.elapsedSeconds() > (update_period.count() / 2.))
+            log_level = std::make_pair(DB::LogsLevel::debug, Poco::Message::PRIO_DEBUG);
+        LOG_IMPL(log, log_level.first, log_level.second,
+                 "Update heavy metrics. "
+                 "Update period {} sec. "
+                 "Update heavy metrics period {} sec. "
+                 "Heavy metrics calculation elapsed: {} sec.",
+                 update_period.count(),
+                 heavy_metric_update_period.count(),
+                 watch.elapsedSeconds());
+
+    }
+
+    new_values["NumberOfDetachedParts"] = detached_parts_stats.count;
+    new_values["NumberOfDetachedByUserParts"] = detached_parts_stats.detached_by_user;
+}
+
 }
--- a/src/Interpreters/AsynchronousMetrics.h
+++ b/src/Interpreters/AsynchronousMetrics.h
@ -50,6 +50,7 @@ public:
    AsynchronousMetrics(
        ContextPtr global_context_,
        int update_period_seconds,
+        int heavy_metrics_update_period_seconds,
        const ProtocolServerMetricsFunc & protocol_server_metrics_func_);

    ~AsynchronousMetrics();
@ -63,7 +64,11 @@ public:
    AsynchronousMetricValues getValues() const;

 private:
-    const std::chrono::seconds update_period;
+    using Duration = std::chrono::seconds;
+    using TimePoint = std::chrono::system_clock::time_point;
+
+    const Duration update_period;
+    const Duration heavy_metric_update_period;
    ProtocolServerMetricsFunc protocol_server_metrics_func;

    mutable std::mutex mutex;
@ -74,7 +79,16 @@ private:
    /// Some values are incremental and we have to calculate the difference.
    /// On first run we will only collect the values to subtract later.
    bool first_run = true;
-    std::chrono::system_clock::time_point previous_update_time;
+    TimePoint previous_update_time;
+    TimePoint heavy_metric_previous_update_time;
+
+    struct DetachedPartsStats
+    {
+        size_t count;
+        size_t detached_by_user;
+    };
+
+    DetachedPartsStats detached_parts_stats{};

 #if defined(OS_LINUX) || defined(OS_FREEBSD)
    MemoryStatisticsOS memory_stat;
@ -185,7 +199,10 @@ private:
    std::unique_ptr<ThreadFromGlobalPool> thread;

    void run();
-    void update(std::chrono::system_clock::time_point update_time);
+    void update(TimePoint update_time);
+
+    void updateDetachedPartsStats();
+    void updateHeavyMetricsIfNeeded(TimePoint current_time, TimePoint update_time, AsynchronousMetricValues & new_values);

    Poco::Logger * log;
 };
--- a/src/Interpreters/Cache/FileCache.cpp
+++ b/src/Interpreters/Cache/FileCache.cpp
@ -2,7 +2,8 @@

 #include <Common/randomSeed.h>
 #include <Common/SipHash.h>
-#include <Common/FileCacheSettings.h>
+#include <Interpreters/Cache/FileCacheSettings.h>
+#include <Interpreters/Cache/LRUFileCachePriority.h>
 #include <IO/ReadHelpers.h>
 #include <IO/WriteBufferFromFile.h>
 #include <IO/ReadSettings.h>
@ -10,7 +11,6 @@
 #include <IO/Operators.h>
 #include <pcg-random/pcg_random.hpp>
 #include <filesystem>
-#include <Common/LRUFileCachePriority.h>

 namespace fs = std::filesystem;

@ -59,6 +59,24 @@ String FileCache::getPathInLocalCache(const Key & key) const
    return fs::path(cache_base_path) / key_str.substr(0, 3) / key_str;
 }

+void FileCache::removeKeyDirectoryIfExists(const Key & key, std::lock_guard<std::mutex> & /* cache_lock */) const
+{
+    /// Note: there is guaranteed to be no concurrency with file deletion here,
+    /// because cache key directories are created only in the FileCache class under cache_lock.
+
+    auto key_str = key.toString();
+    auto key_prefix_path = fs::path(cache_base_path) / key_str.substr(0, 3);
+    auto key_path = key_prefix_path / key_str;
+
+    if (!fs::exists(key_path))
+        return;
+
+    fs::remove_all(key_path);
+
+    if (fs::is_empty(key_prefix_path))
+        fs::remove(key_prefix_path);
+}
+
 static bool isQueryInitialized()
 {
    return CurrentThread::isInitialized()
@ -174,15 +192,8 @@ FileSegments FileCache::getImpl(
    const auto & file_segments = it->second;
    if (file_segments.empty())
    {
-        auto key_path = getPathInLocalCache(key);
-
        files.erase(key);
-
-        /// Note: it is guaranteed that there is no concurrency with files deletion,
-        /// because cache files are deleted only inside FileCache and under cache lock.
-        if (fs::exists(key_path))
-            fs::remove_all(key_path);
-
+        removeKeyDirectoryIfExists(key, cache_lock);
        return {};
    }

@ -827,14 +838,10 @@ void FileCache::removeIfExists(const Key & key)
        }
    }

-    auto key_path = getPathInLocalCache(key);
-
    if (!some_cells_were_skipped)
    {
        files.erase(key);
-
-        if (fs::exists(key_path))
-            fs::remove_all(key_path);
+        removeKeyDirectoryIfExists(key, cache_lock);
    }
 }

@ -924,12 +931,8 @@ void FileCache::remove(

            if (is_initialized && offsets.empty())
            {
-                auto key_path = getPathInLocalCache(key);
-
                files.erase(key);
-
-                if (fs::exists(key_path))
-                    fs::remove_all(key_path);
+                removeKeyDirectoryIfExists(key, cache_lock);
            }
        }
        catch (...)
--- a/src/Interpreters/Cache/FileCache.h
+++ b/src/Interpreters/Cache/FileCache.h
@ -13,11 +13,11 @@

 #include <Core/Types.h>
 #include <IO/ReadSettings.h>
-#include <Common/FileCache_fwd.h>
-#include <Common/FileSegment.h>
-#include <Common/IFileCachePriority.h>
+#include <Interpreters/Cache/FileCache_fwd.h>
+#include <Interpreters/Cache/FileSegment.h>
+#include <Interpreters/Cache/IFileCachePriority.h>
 #include <Common/logger_useful.h>
-#include <Common/FileCacheType.h>
+#include <Interpreters/Cache/FileCacheKey.h>

 namespace DB
 {
@ -261,6 +261,8 @@ private:

    void assertCacheCellsCorrectness(const FileSegmentsByOffset & cells_by_offset, std::lock_guard<std::mutex> & cache_lock);

+    void removeKeyDirectoryIfExists(const Key & key, std::lock_guard<std::mutex> & cache_lock) const;
+
    /// Used to track and control the cache access of each query.
    /// Through it, we can realize the processing of different queries by the cache layer.
    struct QueryContext
--- a/src/Interpreters/Cache/FileCacheFactory.cpp
+++ b/src/Interpreters/Cache/FileCacheFactory.cpp
--- a/src/Interpreters/Cache/FileCacheFactory.h
+++ b/src/Interpreters/Cache/FileCacheFactory.h
@ -1,7 +1,7 @@
 #pragma once

-#include <Common/FileCache_fwd.h>
-#include <Common/FileCacheSettings.h>
+#include <Interpreters/Cache/FileCache_fwd.h>
+#include <Interpreters/Cache/FileCacheSettings.h>

 #include <boost/noncopyable.hpp>
 #include <unordered_map>
--- a/src/Interpreters/Cache/FileCacheKey.h
+++ b/src/Interpreters/Cache/FileCacheKey.h
--- a/src/Interpreters/Cache/FileCacheSettings.cpp
+++ b/src/Interpreters/Cache/FileCacheSettings.cpp
--- a/src/Interpreters/Cache/FileCacheSettings.h
+++ b/src/Interpreters/Cache/FileCacheSettings.h
@ -1,6 +1,6 @@
 #pragma once

-#include <Common/FileCache_fwd.h>
+#include <Interpreters/Cache/FileCache_fwd.h>

 namespace Poco { namespace Util { class AbstractConfiguration; } } // NOLINT(cppcoreguidelines-virtual-class-destructor)

--- a/src/Interpreters/Cache/FileCache_fwd.h
+++ b/src/Interpreters/Cache/FileCache_fwd.h
--- a/src/Interpreters/Cache/FileSegment.cpp
+++ b/src/Interpreters/Cache/FileSegment.cpp
@ -3,7 +3,7 @@
 #include <base/getThreadId.h>
 #include <base/scope_guard.h>
 #include <Common/logger_useful.h>
-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCache.h>
 #include <Common/hex.h>
 #include <IO/WriteBufferFromString.h>
 #include <IO/Operators.h>
@ -840,37 +840,31 @@ void FileSegmentRangeWriter::completeFileSegment(FileSegment & file_segment)
        return;

    size_t current_downloaded_size = file_segment.getDownloadedSize();
-    if (current_downloaded_size > 0)
-    {
-        file_segment.getOrSetDownloader();

-        {
    /// file_segment->complete(DOWNLOADED) is not enough, because file segment capacity
    /// was initially set with a margin as `max_file_segment_size`. => We need to always
    /// resize to actual size after download finished.
-
+    if (current_downloaded_size != file_segment.range().size())
+    {
        /// Current file segment is downloaded as a part of write-through cache
        /// and therefore cannot be concurrently accessed. Nevertheless, it can be
        /// accessed by cache system tables if someone read from them,
        /// therefore we need a mutex.
        std::lock_guard segment_lock(file_segment.mutex);

-            assert(file_segment.downloaded_size <= file_segment.range().size());
+        assert(current_downloaded_size <= file_segment.range().size());
        file_segment.segment_range = FileSegment::Range(
            file_segment.segment_range.left,
-                file_segment.segment_range.left + file_segment.downloaded_size - 1);
-            file_segment.reserved_size = file_segment.downloaded_size;
+            file_segment.segment_range.left + current_downloaded_size - 1);
+        file_segment.reserved_size = current_downloaded_size;
    }

-        file_segment.completeWithState(FileSegment::State::DOWNLOADED);
-
-        on_complete_file_segment_func(file_segment);
-    }
-    else
    {
        std::lock_guard cache_lock(cache->mutex);
        file_segment.completeWithoutState(cache_lock);
    }
+
+    on_complete_file_segment_func(file_segment);
 }

 bool FileSegmentRangeWriter::write(const char * data, size_t size, size_t offset, bool is_persistent)
@ -899,22 +893,27 @@ bool FileSegmentRangeWriter::write(const char * data, size_t size, size_t offset
                offset, current_file_segment_write_offset);
        }

-        if ((*current_file_segment_it)->getRemainingSizeToDownload() == 0)
+        auto current_file_segment = *current_file_segment_it;
+        if (current_file_segment->getRemainingSizeToDownload() == 0)
        {
-            completeFileSegment(**current_file_segment_it);
+            completeFileSegment(*current_file_segment);
            current_file_segment_it = allocateFileSegment(current_file_segment_write_offset, is_persistent);
        }
-        else if ((*current_file_segment_it)->getDownloadOffset() != offset)
+        else if (current_file_segment->getDownloadOffset() != offset)
        {
            throw Exception(
                ErrorCodes::LOGICAL_ERROR,
                "Cannot file segment download offset {} does not match current write offset {}",
-                (*current_file_segment_it)->getDownloadOffset(), offset);
+                current_file_segment->getDownloadOffset(), offset);
        }
    }

    auto & file_segment = *current_file_segment_it;
-    file_segment->getOrSetDownloader();
+
+    auto downloader = file_segment->getOrSetDownloader();
+    if (downloader != FileSegment::getCallerId())
+        throw Exception(ErrorCodes::LOGICAL_ERROR, "Failed to set a downloader. ({})", file_segment->getInfoForLog());
+
    SCOPE_EXIT({
        file_segment->resetDownloader();
    });
--- a/src/Interpreters/Cache/FileSegment.h
+++ b/src/Interpreters/Cache/FileSegment.h
@ -5,7 +5,8 @@
 #include <IO/WriteBufferFromFile.h>
 #include <IO/ReadBufferFromFileBase.h>
 #include <list>
-#include <Common/FileCacheType.h>
+#include <Interpreters/Cache/FileCacheKey.h>
+

 namespace Poco { class Logger; }

--- a/src/Interpreters/Cache/IFileCachePriority.h
+++ b/src/Interpreters/Cache/IFileCachePriority.h
@ -4,7 +4,7 @@
 #include <mutex>
 #include <Core/Types.h>
 #include <Common/Exception.h>
-#include <Common/FileCacheType.h>
+#include <Interpreters/Cache/FileCacheKey.h>

 namespace DB
 {
--- a/src/Interpreters/Cache/LRUFileCachePriority.cpp
+++ b/src/Interpreters/Cache/LRUFileCachePriority.cpp
@ -1,4 +1,4 @@
-#include <Common/LRUFileCachePriority.h>
+#include <Interpreters/Cache/LRUFileCachePriority.h>
 #include <Common/CurrentMetrics.h>

 namespace CurrentMetrics
--- a/src/Interpreters/Cache/LRUFileCachePriority.h
+++ b/src/Interpreters/Cache/LRUFileCachePriority.h
@ -1,7 +1,7 @@
 #pragma once

 #include <list>
-#include <Common/IFileCachePriority.h>
+#include <Interpreters/Cache/IFileCachePriority.h>
 #include <Common/logger_useful.h>

 namespace DB
--- a/src/Interpreters/FilesystemCacheLog.cpp
+++ b/src/Interpreters/FilesystemCacheLog.cpp
@ -42,7 +42,7 @@ NamesAndTypesList FilesystemCacheLogElement::getNamesAndTypes()
        {"total_requested_range", std::make_shared<DataTypeTuple>(types)},
        {"size", std::make_shared<DataTypeUInt64>()},
        {"read_type", std::make_shared<DataTypeString>()},
-        {"cache_attempted", std::make_shared<DataTypeUInt8>()},
+        {"read_from_cache_attempted", std::make_shared<DataTypeUInt8>()},
        {"ProfileEvents", std::make_shared<DataTypeMap>(std::make_shared<DataTypeString>(), std::make_shared<DataTypeUInt64>())},
        {"read_buffer_id", std::make_shared<DataTypeString>()},
    };
@ -62,7 +62,7 @@ void FilesystemCacheLogElement::appendToBlock(MutableColumns & columns) const
    columns[i++]->insert(Tuple{requested_range.first, requested_range.second});
    columns[i++]->insert(file_segment_size);
    columns[i++]->insert(typeToString(cache_type));
-    columns[i++]->insert(cache_attempted);
+    columns[i++]->insert(read_from_cache_attempted);

    if (profile_counters)
    {
--- a/src/Interpreters/FilesystemCacheLog.h
+++ b/src/Interpreters/FilesystemCacheLog.h
@ -41,7 +41,7 @@ struct FilesystemCacheLogElement
    std::pair<size_t, size_t> requested_range{};
    CacheType cache_type{};
    size_t file_segment_size;
-    bool cache_attempted;
+    bool read_from_cache_attempted;
    String read_buffer_id;
    std::shared_ptr<ProfileEvents::Counters::Snapshot> profile_counters;

--- a/src/Interpreters/InterpreterDescribeCacheQuery.cpp
+++ b/src/Interpreters/InterpreterDescribeCacheQuery.cpp
@ -5,8 +5,8 @@
 #include <DataTypes/DataTypesNumber.h>
 #include <DataTypes/DataTypeString.h>
 #include <Storages/ColumnsDescription.h>
-#include <Common/FileCacheFactory.h>
-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCache.h>
 #include <Access/Common/AccessFlags.h>
 #include <Core/Block.h>

--- a/src/Interpreters/InterpreterSelectQuery.cpp
+++ b/src/Interpreters/InterpreterSelectQuery.cpp
@ -2592,7 +2592,7 @@ void InterpreterSelectQuery::executeOrderOptimized(QueryPlan & query_plan, Input

    auto finish_sorting_step = std::make_unique<SortingStep>(
        query_plan.getCurrentDataStream(),
-        input_sorting_info->order_key_prefix_descr,
+        input_sorting_info->sort_description_for_merging,
        output_order_descr,
        settings.max_block_size,
        limit);
--- a/src/Interpreters/InterpreterShowTablesQuery.cpp
+++ b/src/Interpreters/InterpreterShowTablesQuery.cpp
@ -7,7 +7,7 @@
 #include <Interpreters/InterpreterShowTablesQuery.h>
 #include <DataTypes/DataTypeString.h>
 #include <Storages/ColumnsDescription.h>
-#include <Common/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
 #include <Processors/Sources/SourceFromSingleChunk.h>
 #include <Access/Common/AccessFlags.h>
 #include <Common/typeid_cast.h>
--- a/src/Interpreters/InterpreterSystemQuery.cpp
+++ b/src/Interpreters/InterpreterSystemQuery.cpp
@ -7,8 +7,8 @@
 #include <Common/ThreadPool.h>
 #include <Common/escapeForFileName.h>
 #include <Common/ShellCommand.h>
-#include <Common/FileCacheFactory.h>
-#include <Common/FileCache.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCache.h>
 #include <Interpreters/Context.h>
 #include <Interpreters/DatabaseCatalog.h>
 #include <Interpreters/ExternalDictionariesLoader.h>
--- a/src/Interpreters/tests/gtest_lru_file_cache.cpp
+++ b/src/Interpreters/tests/gtest_lru_file_cache.cpp
@ -1,11 +1,11 @@
 #include <iomanip>
 #include <iostream>
 #include <gtest/gtest.h>
-#include <Common/FileCache.h>
-#include <Common/FileSegment.h>
+#include <Interpreters/Cache/FileCache.h>
+#include <Interpreters/Cache/FileSegment.h>
+#include <Interpreters/Cache/FileCacheSettings.h>
 #include <Common/CurrentThread.h>
 #include <Common/filesystemHelpers.h>
-#include <Common/FileCacheSettings.h>
 #include <Common/tests/gtest_global_context.h>
 #include <Common/SipHash.h>
 #include <Common/hex.h>
--- a/src/Loggers/Loggers.cpp
+++ b/src/Loggers/Loggers.cpp
@ -99,8 +99,8 @@ void Loggers::buildLoggers(Poco::Util::AbstractConfiguration & config, Poco::Log

        Poco::AutoPtr<OwnPatternFormatter> pf;

-        if (config.getString("logger.formatting", "") == "json")
-            pf = new OwnJSONPatternFormatter;
+        if (config.getString("logger.formatting.type", "") == "json")
+            pf = new OwnJSONPatternFormatter(config);
        else
            pf = new OwnPatternFormatter;

@ -140,8 +140,8 @@ void Loggers::buildLoggers(Poco::Util::AbstractConfiguration & config, Poco::Log

        Poco::AutoPtr<OwnPatternFormatter> pf;

-        if (config.getString("logger.formatting", "") == "json")
-            pf = new OwnJSONPatternFormatter;
+        if (config.getString("logger.formatting.type", "") == "json")
+            pf = new OwnJSONPatternFormatter(config);
        else
            pf = new OwnPatternFormatter;

@ -184,8 +184,8 @@ void Loggers::buildLoggers(Poco::Util::AbstractConfiguration & config, Poco::Log

        Poco::AutoPtr<OwnPatternFormatter> pf;

-        if (config.getString("logger.formatting", "") == "json")
-            pf = new OwnJSONPatternFormatter;
+        if (config.getString("logger.formatting.type", "") == "json")
+            pf = new OwnJSONPatternFormatter(config);
        else
            pf = new OwnPatternFormatter;

@ -211,8 +211,8 @@ void Loggers::buildLoggers(Poco::Util::AbstractConfiguration & config, Poco::Log
        }

        Poco::AutoPtr<OwnPatternFormatter> pf;
-        if (config.getString("logger.formatting", "") == "json")
-            pf = new OwnJSONPatternFormatter;
+        if (config.getString("logger.formatting.type", "") == "json")
+            pf = new OwnJSONPatternFormatter(config);
        else
            pf = new OwnPatternFormatter(color_enabled);
        Poco::AutoPtr<DB::OwnFormattingChannel> log = new DB::OwnFormattingChannel(pf, new Poco::ConsoleChannel);
--- a/src/Loggers/OwnJSONPatternFormatter.cpp
+++ b/src/Loggers/OwnJSONPatternFormatter.cpp
@ -8,21 +8,63 @@
 #include <Common/CurrentThread.h>
 #include <Common/HashTable/Hash.h>

-OwnJSONPatternFormatter::OwnJSONPatternFormatter() : OwnPatternFormatter("")
+OwnJSONPatternFormatter::OwnJSONPatternFormatter(Poco::Util::AbstractConfiguration & config)
 {
-}
+    if (config.has("logger.formatting.names.date_time"))
+        date_time = config.getString("logger.formatting.names.date_time", "");

+    if (config.has("logger.formatting.names.thread_name"))
+        thread_name = config.getString("logger.formatting.names.thread_name", "");
+
+    if (config.has("logger.formatting.names.thread_id"))
+        thread_id = config.getString("logger.formatting.names.thread_id", "");
+
+    if (config.has("logger.formatting.names.level"))
+        level = config.getString("logger.formatting.names.level", "");
+
+    if (config.has("logger.formatting.names.query_id"))
+        query_id = config.getString("logger.formatting.names.query_id", "");
+
+    if (config.has("logger.formatting.names.logger_name"))
+        logger_name = config.getString("logger.formatting.names.logger_name", "");
+
+    if (config.has("logger.formatting.names.message"))
+        message = config.getString("logger.formatting.names.message", "");
+
+    if (config.has("logger.formatting.names.source_file"))
+        source_file = config.getString("logger.formatting.names.source_file", "");
+
+    if (config.has("logger.formatting.names.source_line"))
+        source_line = config.getString("logger.formatting.names.source_line", "");
+
+    if (date_time.empty() && thread_name.empty() && thread_id.empty() && level.empty() && query_id.empty()
+        && logger_name.empty() && message.empty() && source_file.empty() && source_line.empty())
+    {
+        date_time = "date_time";
+        thread_name = "thread_name";
+        thread_id = "thread_id";
+        level = "level";
+        query_id = "query_id";
+        logger_name = "logger_name";
+        message = "message";
+        source_file = "source_file";
+        source_line = "source_line";
+    }
+}

 void OwnJSONPatternFormatter::formatExtended(const DB::ExtendedLogMessage & msg_ext, std::string & text) const
 {
    DB::WriteBufferFromString wb(text);

    DB::FormatSettings settings;
+    bool print_comma = false;

    const Poco::Message & msg = msg_ext.base;
    DB::writeChar('{', wb);

-    writeJSONString("date_time", wb, settings);
+    if (!date_time.empty())
+    {
+        writeJSONString(date_time, wb, settings);
        DB::writeChar(':', wb);

        DB::writeChar('\"', wb);
@ -36,65 +78,116 @@ void OwnJSONPatternFormatter::formatExtended(const DB::ExtendedLogMessage & msg_
        DB::writeChar('0' + ((msg_ext.time_microseconds / 10) % 10), wb);
        DB::writeChar('0' + ((msg_ext.time_microseconds / 1) % 10), wb);
        DB::writeChar('\"', wb);
+        print_comma = true;
+    }

+    if (!thread_name.empty())
+    {
+        if (print_comma)
            DB::writeChar(',', wb);
+        else
+            print_comma = true;

-    writeJSONString("thread_name", wb, settings);
+        writeJSONString(thread_name, wb, settings);
        DB::writeChar(':', wb);

        writeJSONString(msg.getThread(), wb, settings);
+    }

+    if (!thread_id.empty())
+    {
+        if (print_comma)
            DB::writeChar(',', wb);
+        else
+            print_comma = true;

-    writeJSONString("thread_id", wb, settings);
+        writeJSONString(thread_id, wb, settings);
        DB::writeChar(':', wb);
        DB::writeChar('\"', wb);
        DB::writeIntText(msg_ext.thread_id, wb);
        DB::writeChar('\"', wb);
+    }

+    if (!level.empty())
+    {
+        if (print_comma)
            DB::writeChar(',', wb);
+        else
+            print_comma = true;

-    writeJSONString("level", wb, settings);
+        writeJSONString(level, wb, settings);
        DB::writeChar(':', wb);
        int priority = static_cast<int>(msg.getPriority());
        writeJSONString(std::to_string(priority), wb, settings);
+    }
+
+    if (!query_id.empty())
+    {
+        if (print_comma)
            DB::writeChar(',', wb);
+        else
+            print_comma = true;

        /// We write query_id even in case when it is empty (no query context)
        /// just to be convenient for various log parsers.

-    writeJSONString("query_id", wb, settings);
+        writeJSONString(query_id, wb, settings);
        DB::writeChar(':', wb);
        writeJSONString(msg_ext.query_id, wb, settings);
+    }

+    if (!logger_name.empty())
+    {
+        if (print_comma)
            DB::writeChar(',', wb);
+        else
+            print_comma = true;

-    writeJSONString("logger_name", wb, settings);
+        writeJSONString(logger_name, wb, settings);
        DB::writeChar(':', wb);

        writeJSONString(msg.getSource(), wb, settings);
-    DB::writeChar(',', wb);
+    }

-    writeJSONString("message", wb, settings);
+    if (!message.empty())
+    {
+        if (print_comma)
+            DB::writeChar(',', wb);
+        else
+            print_comma = true;
+
+        writeJSONString(message, wb, settings);
        DB::writeChar(':', wb);
        writeJSONString(msg.getText(), wb, settings);
-    DB::writeChar(',', wb);
+    }
+
+    if (!source_file.empty())
+    {
+        if (print_comma)
+            DB::writeChar(',', wb);
+        else
+            print_comma = true;

-    writeJSONString("source_file", wb, settings);
-    DB::writeChar(':', wb);
-    const char * source_file = msg.getSourceFile();
-    if (source_file != nullptr)
        writeJSONString(source_file, wb, settings);
+        DB::writeChar(':', wb);
+        const char * source_file_name = msg.getSourceFile();
+        if (source_file_name != nullptr)
+            writeJSONString(source_file_name, wb, settings);
        else
            writeJSONString("", wb, settings);
+    }
+
+    if (!source_line.empty())
+    {
+        if (print_comma)
            DB::writeChar(',', wb);

-    writeJSONString("source_line", wb, settings);
+        writeJSONString(source_line, wb, settings);
        DB::writeChar(':', wb);
        DB::writeChar('\"', wb);
        DB::writeIntText(msg.getSourceLine(), wb);
        DB::writeChar('\"', wb);
-
+    }
    DB::writeChar('}', wb);
 }

--- a/src/Loggers/OwnJSONPatternFormatter.h
+++ b/src/Loggers/OwnJSONPatternFormatter.h
@ -2,6 +2,7 @@


 #include <Poco/PatternFormatter.h>
+#include <Poco/Util/AbstractConfiguration.h>
 #include "ExtendedLogChannel.h"
 #include "OwnPatternFormatter.h"

@ -25,8 +26,19 @@ class Loggers;
 class OwnJSONPatternFormatter : public OwnPatternFormatter
 {
 public:
-    OwnJSONPatternFormatter();
+    explicit OwnJSONPatternFormatter(Poco::Util::AbstractConfiguration & config);

    void format(const Poco::Message & msg, std::string & text) override;
    void formatExtended(const DB::ExtendedLogMessage & msg_ext, std::string & text) const override;
+
+private:
+    std::string date_time;
+    std::string thread_name;
+    std::string thread_id;
+    std::string level;
+    std::string query_id;
+    std::string logger_name;
+    std::string message;
+    std::string source_file;
+    std::string source_line;
 };
--- a/src/Processors/QueryPlan/Optimizations/reuseStorageOrderingForWindowFunctions.cpp
+++ b/src/Processors/QueryPlan/Optimizations/reuseStorageOrderingForWindowFunctions.cpp
@ -104,7 +104,7 @@ size_t tryReuseStorageOrderingForWindowFunctions(QueryPlan::Node * parent_node,
    if (order_info)
    {
        read_from_merge_tree->setQueryInfoInputOrderInfo(order_info);
-        sorting->convertToFinishSorting(order_info->order_key_prefix_descr);
+        sorting->convertToFinishSorting(order_info->sort_description_for_merging);
    }

    return 0;
--- a/src/Processors/QueryPlan/ReadFromMergeTree.cpp
+++ b/src/Processors/QueryPlan/ReadFromMergeTree.cpp
@ -548,9 +548,7 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsWithOrder(

    if (need_preliminary_merge)
    {
-        size_t fixed_prefix_size = input_order_info->order_key_fixed_prefix_descr.size();
-        size_t prefix_size = fixed_prefix_size + input_order_info->order_key_prefix_descr.size();
-
+        size_t prefix_size = input_order_info->used_prefix_of_sorting_key_size;
        auto order_key_prefix_ast = metadata_for_reading->getSortingKey().expression_list_ast->clone();
        order_key_prefix_ast->children.resize(prefix_size);

@ -865,6 +863,7 @@ MergeTreeDataSelectAnalysisResultPtr ReadFromMergeTree::selectRangesToRead(

    size_t total_parts = parts.size();

+    /// TODO Support row_policy_filter and additional_filters
    auto part_values = MergeTreeDataSelectExecutor::filterPartsByVirtualColumns(data, parts, query_info.query, context);
    if (part_values && part_values->empty())
        return std::make_shared<MergeTreeDataSelectAnalysisResult>(MergeTreeDataSelectAnalysisResult{.result = std::move(result)});
@ -923,6 +922,9 @@ MergeTreeDataSelectAnalysisResultPtr ReadFromMergeTree::selectRangesToRead(
    }
    LOG_DEBUG(log, "Key condition: {}", key_condition->toString());

+    if (key_condition->alwaysFalse())
+        return std::make_shared<MergeTreeDataSelectAnalysisResult>(MergeTreeDataSelectAnalysisResult{.result = std::move(result)});
+
    const auto & select = query_info.query->as<ASTSelectQuery &>();

    size_t total_marks_pk = 0;
--- a/src/Processors/Transforms/AggregatingInOrderTransform.cpp
+++ b/src/Processors/Transforms/AggregatingInOrderTransform.cpp
@ -41,13 +41,13 @@ AggregatingInOrderTransform::AggregatingInOrderTransform(
    /// We won't finalize states in order to merge same states (generated due to multi-thread execution) in AggregatingSortedTransform
    res_header = params->getCustomHeader(/* final_= */ false);

-    for (size_t i = 0; i < group_by_info->order_key_prefix_descr.size(); ++i)
+    for (size_t i = 0; i < group_by_info->sort_description_for_merging.size(); ++i)
    {
        const auto & column_description = group_by_description_[i];
        group_by_description.emplace_back(column_description, res_header.getPositionByName(column_description.column_name));
    }

-    if (group_by_info->order_key_prefix_descr.size() < group_by_description_.size())
+    if (group_by_info->sort_description_for_merging.size() < group_by_description_.size())
    {
        group_by_key = true;
        /// group_by_description may contains duplicates, so we use keys_size from Aggregator::params
--- a/src/Storages/MergeTree/DataPartsExchange.cpp
+++ b/src/Storages/MergeTree/DataPartsExchange.cpp
@ -773,6 +773,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPartToDisk(
    ThrottlerPtr throttler)
 {
    assert(!tmp_prefix.empty());
+    const auto data_settings = data.getSettings();

    /// We will remove directory if it's already exists. Make precautions.
    if (tmp_prefix.empty() //-V560
@ -800,7 +801,14 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPartToDisk(
    {
        LOG_WARNING(log, "Directory {} already exists, probably result of a failed fetch. Will remove it before fetching part.",
            data_part_storage_builder->getFullPath());
-        data_part_storage_builder->removeRecursive();
+
+        /// Even a temporary part could have been downloaded with zero-copy replication, and this function
+        /// is executed as a callback.
+        ///
+        /// We don't control the number of references to temporary parts, so we cannot decide whether the blobs
+        /// can be removed. Therefore we keep them.
+        bool keep_shared = disk->supportZeroCopyReplication() && data_settings->allow_remote_fs_zero_copy_replication;
+        data_part_storage_builder->removeSharedRecursive(keep_shared);
    }

    data_part_storage_builder->createDirectories();
--- a/src/Storages/MergeTree/KeyCondition.cpp
+++ b/src/Storages/MergeTree/KeyCondition.cpp
@ -2463,7 +2463,6 @@ BoolMask KeyCondition::checkInHyperrectangle(
    return rpn_stack[0];
 }

-
 bool KeyCondition::mayBeTrueInRange(
    size_t used_key_size,
    const FieldRef * left_keys,
@ -2474,6 +2473,7 @@ bool KeyCondition::mayBeTrueInRange(
 }

 String KeyCondition::RPNElement::toString() const { return toString("column " + std::to_string(key_column), false); }
+
 String KeyCondition::RPNElement::toString(std::string_view column_name, bool print_constants) const
 {
    auto print_wrapped_column = [this, &column_name, print_constants](WriteBuffer & buf)
@ -2563,10 +2563,12 @@ bool KeyCondition::alwaysUnknownOrTrue() const
 {
    return unknownOrAlwaysTrue(false);
 }
+
 bool KeyCondition::anyUnknownOrAlwaysTrue() const
 {
    return unknownOrAlwaysTrue(true);
 }
+
 bool KeyCondition::unknownOrAlwaysTrue(bool unknown_any) const
 {
    std::vector<UInt8> rpn_stack;
@ -2627,6 +2629,80 @@ bool KeyCondition::unknownOrAlwaysTrue(bool unknown_any) const
    return rpn_stack[0];
 }

+bool KeyCondition::alwaysFalse() const
+{
+    /// 0: always_false, 1: always_true, 2: non_const
+    std::vector<UInt8> rpn_stack;
+
+    for (const auto & element : rpn)
+    {
+        if (element.function == RPNElement::ALWAYS_TRUE)
+        {
+            rpn_stack.push_back(1);
+        }
+        else if (element.function == RPNElement::ALWAYS_FALSE)
+        {
+            rpn_stack.push_back(0);
+        }
+        else if (element.function == RPNElement::FUNCTION_NOT_IN_RANGE
+            || element.function == RPNElement::FUNCTION_IN_RANGE
+            || element.function == RPNElement::FUNCTION_IN_SET
+            || element.function == RPNElement::FUNCTION_NOT_IN_SET
+            || element.function == RPNElement::FUNCTION_IS_NULL
+            || element.function == RPNElement::FUNCTION_IS_NOT_NULL
+            || element.function == RPNElement::FUNCTION_UNKNOWN)
+        {
+            rpn_stack.push_back(2);
+        }
+        else if (element.function == RPNElement::FUNCTION_NOT)
+        {
+            assert(!rpn_stack.empty());
+
+            auto & arg = rpn_stack.back();
+            if (arg == 0)
+                arg = 1;
+            else if (arg == 1)
+                arg = 0;
+        }
+        else if (element.function == RPNElement::FUNCTION_AND)
+        {
+            assert(!rpn_stack.empty());
+
+            auto arg1 = rpn_stack.back();
+            rpn_stack.pop_back();
+            auto arg2 = rpn_stack.back();
+
+            if (arg1 == 0 || arg2 == 0)
+                rpn_stack.back() = 0;
+            else if (arg1 == 1 && arg2 == 1)
+                rpn_stack.back() = 1;
+            else
+                rpn_stack.back() = 2;
+        }
+        else if (element.function == RPNElement::FUNCTION_OR)
+        {
+            assert(!rpn_stack.empty());
+
+            auto arg1 = rpn_stack.back();
+            rpn_stack.pop_back();
+            auto arg2 = rpn_stack.back();
+
+            if (arg1 == 1 || arg2 == 1)
+                rpn_stack.back() = 1;
+            else if (arg1 == 0 && arg2 == 0)
+                rpn_stack.back() = 0;
+            else
+                rpn_stack.back() = 2;
+        }
+        else
+            throw Exception("Unexpected function type in KeyCondition::RPNElement", ErrorCodes::LOGICAL_ERROR);
+    }
+
+    if (rpn_stack.size() != 1)
+        throw Exception("Unexpected stack size in KeyCondition::alwaysFalse", ErrorCodes::LOGICAL_ERROR);
+
+    return rpn_stack[0] == 0;
+}

 size_t KeyCondition::getMaxKeyColumn() const
 {
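Note on the hunk above: the three-valued evaluation added in `KeyCondition::alwaysFalse` can be sketched outside of ClickHouse roughly as follows. This is a minimal illustration, not the actual implementation. The token names (`"TRUE"`, `"FALSE"`, `"ATOM"`, `"NOT"`, `"AND"`, `"OR"`) and the function `always_false` are hypothetical stand-ins for the `RPNElement` function kinds.

```python
# Hypothetical constants mirroring the "0: always_false, 1: always_true, 2: non_const" comment.
ALWAYS_FALSE, ALWAYS_TRUE, NON_CONST = 0, 1, 2


def always_false(rpn):
    """Return True if the RPN expression is false regardless of the data."""
    stack = []
    for token in rpn:
        if token == "TRUE":
            stack.append(ALWAYS_TRUE)
        elif token == "FALSE":
            stack.append(ALWAYS_FALSE)
        elif token == "ATOM":
            # Any range/set/null check or unknown function: depends on the data.
            stack.append(NON_CONST)
        elif token == "NOT":
            arg = stack.pop()
            # Negation swaps the two constants; NON_CONST stays NON_CONST.
            swapped = {ALWAYS_FALSE: ALWAYS_TRUE, ALWAYS_TRUE: ALWAYS_FALSE}
            stack.append(swapped.get(arg, NON_CONST))
        elif token in ("AND", "OR"):
            rhs = stack.pop()
            lhs = stack.pop()
            # AND is dominated by false, OR is dominated by true.
            dominant = ALWAYS_FALSE if token == "AND" else ALWAYS_TRUE
            neutral = ALWAYS_TRUE if token == "AND" else ALWAYS_FALSE
            if lhs == dominant or rhs == dominant:
                stack.append(dominant)
            elif lhs == neutral and rhs == neutral:
                stack.append(neutral)
            else:
                stack.append(NON_CONST)
        else:
            raise ValueError(f"unexpected token: {token!r}")
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0] == ALWAYS_FALSE
```

For example, `always_false(["ATOM", "FALSE", "AND"])` folds to false without looking at the data, while `["ATOM", "FALSE", "OR"]` stays data-dependent.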
--- a/src/Storages/MergeTree/KeyCondition.h
+++ b/src/Storages/MergeTree/KeyCondition.h
@ -279,6 +279,8 @@ public:
    /// Does not allow any FUNCTION_UNKNOWN (will instantly return true).
    bool anyUnknownOrAlwaysTrue() const;

+    bool alwaysFalse() const;
+
    /// Get the maximum number of the key element used in the condition.
    size_t getMaxKeyColumn() const;

--- a/src/Storages/MergeTree/MergeTreeData.cpp
+++ b/src/Storages/MergeTree/MergeTreeData.cpp
@ -1592,7 +1592,21 @@ size_t MergeTreeData::clearOldTemporaryDirectories(size_t custom_directories_lif
                    else
                    {
                        LOG_WARNING(log, "Removing temporary directory {}", full_path);
-                        disk->removeRecursive(it->path());
+
+                        /// Even a temporary part could have been downloaded with zero-copy replication, and this function
+                        /// is executed as a callback.
+                        ///
+                        /// We don't track the number of references to temporary parts, so we cannot decide whether
+                        /// it is safe to remove the blobs. Therefore we do not remove them.
+                        bool keep_shared = false;
+                        if (it->path().find("fetch") != std::string::npos)
+                        {
+                            keep_shared = disk->supportZeroCopyReplication() && settings->allow_remote_fs_zero_copy_replication;
+                            if (keep_shared)
+                                LOG_WARNING(log, "Since zero-copy replication is enabled we are not going to remove blobs from shared storage for {}", full_path);
+                        }
+
+                        disk->removeSharedRecursive(it->path(), keep_shared, {});
                        ++cleared_count;
                    }
                }
--- a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
+++ b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
@ -242,7 +242,7 @@ QueryPlanPtr MergeTreeDataSelectExecutor::read(

            auto sorting_step = std::make_unique<SortingStep>(
                projection_plan->getCurrentDataStream(),
-                query_info.projection->input_order_info->order_key_prefix_descr,
+                query_info.projection->input_order_info->sort_description_for_merging,
                output_order_descr,
                settings.max_block_size,
                limit);
--- a/src/Storages/MergeTree/PartMetadataManagerWithCache.cpp
+++ b/src/Storages/MergeTree/PartMetadataManagerWithCache.cpp
@ -191,6 +191,7 @@ void PartMetadataManagerWithCache::getKeysAndCheckSums(Strings & keys, std::vect
    {
        ReadBufferFromString rbuf(values[i]);
        HashingReadBuffer hbuf(rbuf);
+        hbuf.ignoreAll();
        checksums.push_back(hbuf.getHash());
    }
 }
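Note on the hunk above: the added `hbuf.ignoreAll()` call matters if, as the fix suggests, a hashing read buffer only hashes the bytes that have actually been read through it. The sketch below illustrates that behaviour under that assumption. `HashingReader` and its methods are hypothetical, and SHA-256 is swapped in purely for illustration; it is not the hash ClickHouse uses here.

```python
import hashlib
import io


class HashingReader:
    """Hypothetical wrapper that hashes only the bytes read through it."""

    def __init__(self, raw):
        self._raw = raw
        self._hash = hashlib.sha256()

    def read(self, size=-1):
        chunk = self._raw.read(size)
        self._hash.update(chunk)
        return chunk

    def ignore_all(self):
        # Drain the underlying stream so that every byte gets hashed.
        while self.read(4096):
            pass

    def hexdigest(self):
        return self._hash.hexdigest()


payload = b"example part metadata"

# Without draining, the digest describes zero bytes of input.
unread = HashingReader(io.BytesIO(payload))

# After draining, the digest covers the whole payload.
drained = HashingReader(io.BytesIO(payload))
drained.ignore_all()
```

In this sketch `unread.hexdigest()` equals the digest of an empty input, which is the kind of silently wrong checksum the one-line change appears to prevent.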
--- a/src/Storages/ReadInOrderOptimizer.cpp
+++ b/src/Storages/ReadInOrderOptimizer.cpp
@ -5,10 +5,12 @@
 #include <Interpreters/TreeRewriter.h>
 #include <Interpreters/replaceAliasColumnsInQuery.h>
 #include <Functions/IFunction.h>
+#include <Functions/FunctionFactory.h>
 #include <Interpreters/TableJoin.h>
 #include <Interpreters/Context.h>
 #include <Parsers/ASTSelectQuery.h>
 #include <Parsers/ASTFunction.h>
+#include <Parsers/ASTIdentifier.h>

 namespace DB
 {
@ -21,26 +23,46 @@ namespace ErrorCodes
 namespace
 {

-ASTPtr getFixedPoint(const ASTPtr & ast)
+/// Finds an expression like x = 'y' or f(x) = 'y',
+/// where `x` is an identifier, 'y' is a literal and `f` is an injective function (possibly nested).
+ASTPtr getFixedPoint(const ASTPtr & ast, const ContextPtr & context)
 {
    const auto * func = ast->as<ASTFunction>();
    if (!func || func->name != "equals")
        return nullptr;

+    if (!func->arguments || func->arguments->children.size() != 2)
+        return nullptr;
+
    const auto & lhs = func->arguments->children[0];
    const auto & rhs = func->arguments->children[1];

-    if (lhs->as<ASTLiteral>())
-        return rhs;
-
-    if (rhs->as<ASTLiteral>())
-        return lhs;
-
+    if (!lhs->as<ASTLiteral>() && !rhs->as<ASTLiteral>())
        return nullptr;
+
+    /// Case of two literals doesn't make sense.
+    if (lhs->as<ASTLiteral>() && rhs->as<ASTLiteral>())
+        return nullptr;
+
+    /// If the identifier is wrapped in injective functions, unwrap them.
+    auto argument = lhs->as<ASTLiteral>() ? rhs : lhs;
+    while (const auto * arg_func = argument->as<ASTFunction>())
+    {
+        if (!arg_func->arguments || arg_func->arguments->children.size() != 1)
+            return nullptr;
+
+        auto func_resolver = FunctionFactory::instance().tryGet(arg_func->name, context);
+        if (!func_resolver || !func_resolver->isInjective({}))
+            return nullptr;
+
+        argument = arg_func->arguments->children[0];
    }

-size_t calculateFixedPrefixSize(
-    const ASTSelectQuery & query, const Names & sorting_key_columns)
+    return argument->as<ASTIdentifier>() ? argument : nullptr;
+}
+
+NameSet getFixedSortingColumns(
+    const ASTSelectQuery & query, const Names & sorting_key_columns, const ContextPtr & context)
 {
    ASTPtr condition;
    if (query.where() && query.prewhere())
@ -51,14 +73,15 @@ size_t calculateFixedPrefixSize(
        condition = query.prewhere();

    if (!condition)
-        return 0;
+        return {};

    /// Convert condition to CNF for more convenient analysis.
    auto cnf = TreeCNFConverter::tryConvertToCNF(condition);
    if (!cnf)
-        return 0;
+        return {};

    NameSet fixed_points;
+    NameSet sorting_key_columns_set(sorting_key_columns.begin(), sorting_key_columns.end());

    /// If we met expression like 'column = x', where 'x' is literal,
    /// in clause of size 1 in CNF, then we can guarantee
@ -67,22 +90,17 @@ size_t calculateFixedPrefixSize(
    {
        if (group.size() == 1 && !group.begin()->negative)
        {
-            auto fixed_point = getFixedPoint(group.begin()->ast);
+            auto fixed_point = getFixedPoint(group.begin()->ast, context);
            if (fixed_point)
-                fixed_points.insert(fixed_point->getColumnName());
+            {
+                auto column_name = fixed_point->getColumnName();
+                if (sorting_key_columns_set.contains(column_name))
+                    fixed_points.insert(column_name);
+            }
        }
    });

-    size_t prefix_size = 0;
-    for (const auto & column_name : sorting_key_columns)
-    {
-        if (!fixed_points.contains(column_name))
-            break;
-
-        ++prefix_size;
-    }
-
-    return prefix_size;
+    return fixed_points;
 }

 /// Optimize in case of exact match with order key element
@ -181,46 +199,54 @@ InputOrderInfoPtr ReadInOrderOptimizer::getInputOrderImpl(
    const StorageMetadataPtr & metadata_snapshot,
    const SortDescription & description,
    const ManyExpressionActions & actions,
+    const ContextPtr & context,
    UInt64 limit) const
 {
    auto sorting_key_columns = metadata_snapshot->getSortingKeyColumns();
    int read_direction = description.at(0).direction;

-    size_t fixed_prefix_size = calculateFixedPrefixSize(query, sorting_key_columns);
-    size_t descr_prefix_size = std::min(description.size(), sorting_key_columns.size() - fixed_prefix_size);
+    auto fixed_sorting_columns = getFixedSortingColumns(query, sorting_key_columns, context);

-    SortDescription order_key_prefix_descr;
-    order_key_prefix_descr.reserve(descr_prefix_size);
+    SortDescription sort_description_for_merging;
+    sort_description_for_merging.reserve(description.size());

-    for (size_t i = 0; i < descr_prefix_size; ++i)
+    size_t desc_pos = 0;
+    size_t key_pos = 0;
+
+    while (desc_pos < description.size() && key_pos < sorting_key_columns.size())
    {
-        if (forbidden_columns.contains(description[i].column_name))
+        if (forbidden_columns.contains(description[desc_pos].column_name))
            break;

-        int current_direction = matchSortDescriptionAndKey(
-            actions[i]->getActions(), description[i], sorting_key_columns[i + fixed_prefix_size]);
+        int current_direction = matchSortDescriptionAndKey(actions[desc_pos]->getActions(), description[desc_pos], sorting_key_columns[key_pos]);
+        bool is_matched = current_direction && (desc_pos == 0 || current_direction == read_direction);

-        if (!current_direction || (i > 0 && current_direction != read_direction))
-            break;
-
-        if (i == 0)
-            read_direction = current_direction;
-
-        order_key_prefix_descr.push_back(required_sort_description[i]);
+        if (!is_matched)
+        {
+            /// If one of the sorting columns is constant after filtering,
+            /// skip it, because it won't affect order anymore.
+            if (fixed_sorting_columns.contains(sorting_key_columns[key_pos]))
+            {
+                ++key_pos;
+                continue;
            }

-    if (order_key_prefix_descr.empty())
+            break;
+        }
+
+        if (desc_pos == 0)
+            read_direction = current_direction;
+
+        sort_description_for_merging.push_back(description[desc_pos]);
+
+        ++desc_pos;
+        ++key_pos;
+    }
+
+    if (sort_description_for_merging.empty())
        return {};

-    SortDescription order_key_fixed_prefix_descr;
-    order_key_fixed_prefix_descr.reserve(fixed_prefix_size);
-    for (size_t i = 0; i < fixed_prefix_size; ++i)
-        order_key_fixed_prefix_descr.emplace_back(sorting_key_columns[i], read_direction);
-
-    return std::make_shared<InputOrderInfo>(
-        std::move(order_key_fixed_prefix_descr),
-        std::move(order_key_prefix_descr),
-        read_direction, limit);
+    return std::make_shared<InputOrderInfo>(std::move(sort_description_for_merging), key_pos, read_direction, limit);
 }

 InputOrderInfoPtr ReadInOrderOptimizer::getInputOrder(
@ -255,10 +281,10 @@ InputOrderInfoPtr ReadInOrderOptimizer::getInputOrder(
            aliases_actions[i] = expression_analyzer.getActions(true);
        }

-        return getInputOrderImpl(metadata_snapshot, aliases_sort_description, aliases_actions, limit);
+        return getInputOrderImpl(metadata_snapshot, aliases_sort_description, aliases_actions, context, limit);
    }

-    return getInputOrderImpl(metadata_snapshot, required_sort_description, elements_actions, limit);
+    return getInputOrderImpl(metadata_snapshot, required_sort_description, elements_actions, context, limit);
 }

 }
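Note on `getFixedPoint` in the diff above: the unwrapping of injective functions can be sketched on a toy AST as follows. This is a simplified illustration, not ClickHouse code. The tuple-based AST, the set `INJECTIVE_FUNCTIONS` and the function names `wrap` and `lossy` are all invented for the example; in the real code injectivity is queried through `FunctionFactory`.

```python
from typing import Optional

# Hypothetical registry standing in for FunctionFactory + isInjective().
INJECTIVE_FUNCTIONS = {"wrap"}

# Toy AST nodes:
#   ("literal", value)
#   ("identifier", name)
#   ("function", name, [arguments])


def get_fixed_point(ast) -> Optional[str]:
    """Return the identifier pinned by `x = literal` or `f(x) = literal`."""
    if ast[0] != "function" or ast[1] != "equals" or len(ast[2]) != 2:
        return None

    lhs, rhs = ast[2]
    lhs_is_literal = lhs[0] == "literal"
    rhs_is_literal = rhs[0] == "literal"

    # Exactly one side must be a literal.
    if lhs_is_literal == rhs_is_literal:
        return None

    argument = rhs if lhs_is_literal else lhs

    # Peel off single-argument injective wrappers one by one.
    while argument[0] == "function":
        name, args = argument[1], argument[2]
        if len(args) != 1 or name not in INJECTIVE_FUNCTIONS:
            return None
        argument = args[0]

    return argument[1] if argument[0] == "identifier" else None
```

Injectivity is what makes the unwrapping sound: if `f` maps distinct inputs to distinct outputs, then `f(x) = 'y'` can hold for at most one value of `x`, so `x` is constant after filtering.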
--- a/src/Storages/ReadInOrderOptimizer.h
+++ b/src/Storages/ReadInOrderOptimizer.h
@ -12,8 +12,6 @@ namespace DB
 *   common prefix, which is needed for
 *   performing reading in order of PK.
 */
-class Context;
-
 class ReadInOrderOptimizer
 {
 public:
@ -30,6 +28,7 @@ private:
        const StorageMetadataPtr & metadata_snapshot,
        const SortDescription & description,
        const ManyExpressionActions & actions,
+        const ContextPtr & context,
        UInt64 limit) const;

    /// Actions for every element of order expression to analyze functions for monotonicity
--- a/src/Storages/SelectQueryInfo.h
+++ b/src/Storages/SelectQueryInfo.h
@ -101,17 +101,33 @@ struct FilterDAGInfo

 struct InputOrderInfo
 {
-    SortDescription order_key_fixed_prefix_descr;
-    SortDescription order_key_prefix_descr;
+    /// Sort description for merging of already sorted streams.
+    /// Always a prefix of ORDER BY or GROUP BY description specified in query.
+    SortDescription sort_description_for_merging;
+
+    /** Size of prefix of sorting key that is already
+     * sorted before execution of sorting or aggregation.
+     *
+     * Contains both columns that are specified in
+     * ORDER BY or GROUP BY clause of query
+     * and columns that turned out to be already sorted.
+     *
+     * E.g. if we have sorting key ORDER BY (a, b, c, d)
+     * and query with `WHERE a = 'x' AND b = 'y' ORDER BY c, d` clauses,
+     * sort_description_for_merging will be equal to (c, d) and
+     * used_prefix_of_sorting_key_size will be equal to 4.
+     */
+    size_t used_prefix_of_sorting_key_size;
+
    int direction;
    UInt64 limit;

    InputOrderInfo(
-        const SortDescription & order_key_fixed_prefix_descr_,
-        const SortDescription & order_key_prefix_descr_,
+        const SortDescription & sort_description_for_merging_,
+        size_t used_prefix_of_sorting_key_size_,
        int direction_, UInt64 limit_)
-        : order_key_fixed_prefix_descr(order_key_fixed_prefix_descr_)
-        , order_key_prefix_descr(order_key_prefix_descr_)
+        : sort_description_for_merging(sort_description_for_merging_)
+        , used_prefix_of_sorting_key_size(used_prefix_of_sorting_key_size_)
        , direction(direction_), limit(limit_)
    {
    }
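Note on `InputOrderInfo` above: the example in the comment (sorting key `(a, b, c, d)`, `WHERE a = 'x' AND b = 'y' ORDER BY c, d`) can be traced with a simplified version of the two-position loop from `getInputOrderImpl`. The sketch below is an assumption-laden reduction: it matches columns by plain name equality and ignores sort directions, monotonic functions and forbidden columns. `match_sorting_key` is a hypothetical name.

```python
def match_sorting_key(order_by, sorting_key, fixed_columns):
    """Return (columns_for_merging, used_prefix_of_sorting_key_size).

    order_by      -- column names from the query's ORDER BY
    sorting_key   -- column names of the table's sorting key
    fixed_columns -- sorting key columns pinned to a constant by the filter
    """
    merged = []
    desc_pos = 0
    key_pos = 0

    while desc_pos < len(order_by) and key_pos < len(sorting_key):
        if order_by[desc_pos] == sorting_key[key_pos]:
            merged.append(order_by[desc_pos])
            desc_pos += 1
            key_pos += 1
        elif sorting_key[key_pos] in fixed_columns:
            # The column is constant after filtering, so it cannot affect order.
            key_pos += 1
        else:
            break

    return merged, key_pos


result = match_sorting_key(["c", "d"], ["a", "b", "c", "d"], {"a", "b"})
```

Here `result` is `(["c", "d"], 4)`, matching the values stated in the comment for `sort_description_for_merging` and `used_prefix_of_sorting_key_size`. Unlike the previous prefix-based approach, fixed columns may also be skipped in the middle of the key, not only at its start.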
--- a/src/Storages/StorageView.cpp
+++ b/src/Storages/StorageView.cpp
@ -179,12 +179,14 @@ void StorageView::replaceWithSubquery(ASTSelectQuery & outer_query, ASTPtr view_

    if (!table_expression->database_and_table_name)
    {
-        // If it's a view table function, add a fake db.table name.
+        // If it's a view or merge table function, add a fake db.table name.
        if (table_expression->table_function)
        {
            auto table_function_name = table_expression->table_function->as<ASTFunction>()->name;
-            if ((table_function_name == "view") || (table_function_name == "viewIfPermitted"))
+            if (table_function_name == "view" || table_function_name == "viewIfPermitted")
                table_expression->database_and_table_name = std::make_shared<ASTTableIdentifier>("__view");
+            if (table_function_name == "merge")
+                table_expression->database_and_table_name = std::make_shared<ASTTableIdentifier>("__merge");
        }
        if (!table_expression->database_and_table_name)
            throw Exception("Logical error: incorrect table expression", ErrorCodes::LOGICAL_ERROR);
--- a/src/Storages/System/StorageSystemFilesystemCache.cpp
+++ b/src/Storages/System/StorageSystemFilesystemCache.cpp
@ -2,9 +2,9 @@
 #include <DataTypes/DataTypeString.h>
 #include <DataTypes/DataTypesNumber.h>
 #include <DataTypes/DataTypeTuple.h>
-#include <Common/FileCache.h>
-#include <Common/FileSegment.h>
-#include <Common/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCache.h>
+#include <Interpreters/Cache/FileSegment.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
 #include <Interpreters/Context.h>
 #include <Disks/IDisk.h>

--- a/src/Storages/System/StorageSystemRemoteDataPaths.cpp
+++ b/src/Storages/System/StorageSystemRemoteDataPaths.cpp
@ -1,8 +1,8 @@
 #include "StorageSystemRemoteDataPaths.h"
 #include <DataTypes/DataTypeString.h>
 #include <DataTypes/DataTypeArray.h>
-#include <Common/FileCache.h>
-#include <Common/FileCacheFactory.h>
+#include <Interpreters/Cache/FileCache.h>
+#include <Interpreters/Cache/FileCacheFactory.h>
 #include <Columns/ColumnString.h>
 #include <Columns/ColumnArray.h>
 #include <Interpreters/Context.h>
--- a/tests/integration/test_backup_restore_new/test.py
+++ b/tests/integration/test_backup_restore_new/test.py
@ -224,6 +224,89 @@ def test_incremental_backup_after_renaming_table():
    assert instance.query("SELECT count(), sum(x) FROM test.table2") == "100\t4950\n"


+def test_incremental_backup_for_log_family():
+    backup_name = new_backup_name()
+    create_and_fill_table(engine="Log")
+
+    assert instance.query("SELECT count(), sum(x) FROM test.table") == "100\t4950\n"
+    instance.query(f"BACKUP TABLE test.table TO {backup_name}")
+
+    instance.query("INSERT INTO test.table VALUES (65, 'a'), (66, 'b')")
+
+    assert instance.query("SELECT count(), sum(x) FROM test.table") == "102\t5081\n"
+
+    backup_name2 = new_backup_name()
+    instance.query(f"BACKUP TABLE test.table TO {backup_name2}")
+
+    backup_name_inc = new_backup_name()
+    instance.query(
+        f"BACKUP TABLE test.table TO {backup_name_inc} SETTINGS base_backup = {backup_name}"
+    )
+
+    metadata_path = os.path.join(
+        get_path_to_backup(backup_name), "metadata/test/table.sql"
+    )
+
+    metadata_path2 = os.path.join(
+        get_path_to_backup(backup_name2), "metadata/test/table.sql"
+    )
+
+    metadata_path_inc = os.path.join(
+        get_path_to_backup(backup_name_inc), "metadata/test/table.sql"
+    )
+
+    assert os.path.isfile(metadata_path)
+    assert os.path.isfile(metadata_path2)
+    assert not os.path.isfile(metadata_path_inc)
+    assert os.path.getsize(metadata_path) > 0
+    assert os.path.getsize(metadata_path) == os.path.getsize(metadata_path2)
+
+    x_bin_path = os.path.join(get_path_to_backup(backup_name), "data/test/table/x.bin")
+    y_bin_path = os.path.join(get_path_to_backup(backup_name), "data/test/table/y.bin")
+
+    x_bin_path2 = os.path.join(
+        get_path_to_backup(backup_name2), "data/test/table/x.bin"
+    )
+    y_bin_path2 = os.path.join(
+        get_path_to_backup(backup_name2), "data/test/table/y.bin"
+    )
+
+    x_bin_path_inc = os.path.join(
+        get_path_to_backup(backup_name_inc), "data/test/table/x.bin"
+    )
+
+    y_bin_path_inc = os.path.join(
+        get_path_to_backup(backup_name_inc), "data/test/table/y.bin"
+    )
+
+    assert os.path.isfile(x_bin_path)
+    assert os.path.isfile(y_bin_path)
+    assert os.path.isfile(x_bin_path2)
+    assert os.path.isfile(y_bin_path2)
+    assert os.path.isfile(x_bin_path_inc)
+    assert os.path.isfile(y_bin_path_inc)
+
+    x_bin_size = os.path.getsize(x_bin_path)
+    y_bin_size = os.path.getsize(y_bin_path)
+    x_bin_size2 = os.path.getsize(x_bin_path2)
+    y_bin_size2 = os.path.getsize(y_bin_path2)
+    x_bin_size_inc = os.path.getsize(x_bin_path_inc)
+    y_bin_size_inc = os.path.getsize(y_bin_path_inc)
+
+    assert x_bin_size > 0
+    assert y_bin_size > 0
+    assert x_bin_size2 > 0
+    assert y_bin_size2 > 0
+    assert x_bin_size_inc > 0
+    assert y_bin_size_inc > 0
+    assert x_bin_size2 == x_bin_size + x_bin_size_inc
+    assert y_bin_size2 == y_bin_size + y_bin_size_inc
+
+    instance.query(f"RESTORE TABLE test.table AS test.table2 FROM {backup_name_inc}")
+
+    assert instance.query("SELECT count(), sum(x) FROM test.table2") == "102\t5081\n"
+
+
 def test_backup_not_found_or_already_exists():
    backup_name = new_backup_name()

--- a/tests/integration/test_detached_parts_metrics/__init__.py
+++ b/tests/integration/test_detached_parts_metrics/__init__.py
--- a/tests/integration/test_detached_parts_metrics/configs/asynchronous_metrics_update_period_s.xml
+++ b/tests/integration/test_detached_parts_metrics/configs/asynchronous_metrics_update_period_s.xml
@ -0,0 +1,4 @@
+<clickhouse>
+    <asynchronous_metrics_update_period_s>1</asynchronous_metrics_update_period_s>
+    <asynchronous_heavy_metrics_update_period_s>1</asynchronous_heavy_metrics_update_period_s>
+</clickhouse>
--- a/tests/integration/test_detached_parts_metrics/test.py
+++ b/tests/integration/test_detached_parts_metrics/test.py
@ -0,0 +1,133 @@
+import time
+import pytest
+from helpers.cluster import ClickHouseCluster
+from helpers.test_tools import assert_eq_with_retry
+
+
+cluster = ClickHouseCluster(__file__)
+node1 = cluster.add_instance(
+    "node1",
+    main_configs=["configs/asynchronous_metrics_update_period_s.xml"],
+)
+
+
+@pytest.fixture(scope="module")
+def started_cluster():
+    try:
+        cluster.start()
+        yield cluster
+    finally:
+        cluster.shutdown()
+
+
+def test_event_time_microseconds_field(started_cluster):
+    cluster.start()
+    query_create = """
+    CREATE TABLE t
+    (
+       id Int64,
+       event_time Date
+    )
+    Engine=MergeTree()
+    PARTITION BY toYYYYMMDD(event_time)
+    ORDER BY id;
+    """
+    node1.query(query_create)
+
+    # gives us 2 partitions with 3 parts in total
+    node1.query("INSERT INTO t VALUES (1, toDate('2022-09-01'));")
+    node1.query("INSERT INTO t VALUES (2, toDate('2022-08-29'));")
+    node1.query("INSERT INTO t VALUES (3, toDate('2022-09-01'));")
+
+    query_number_detached_parts_in_async_metric = """
+    SELECT value
+    FROM system.asynchronous_metrics
+    WHERE metric LIKE 'NumberOfDetachedParts';
+    """
+    query_number_detached_by_user_parts_in_async_metric = """
+    SELECT value
+    FROM system.asynchronous_metrics
+    WHERE metric LIKE 'NumberOfDetachedByUserParts';
+    """
+    query_count_active_parts = """
+    SELECT count(*) FROM system.parts WHERE table = 't' AND active
+    """
+    query_count_detached_parts = """
+    SELECT count(*) FROM system.detached_parts WHERE table = 't'
+    """
+
+    query_one_partition_name = """
+    SELECT name FROM system.parts WHERE table = 't' AND active AND partition = '20220829'
+    """
+    partition_name = node1.query(query_one_partition_name).strip()
+
+    assert 0 == int(node1.query(query_count_detached_parts))
+    assert 3 == int(node1.query(query_count_active_parts))
+    assert 0 == int(node1.query(query_number_detached_parts_in_async_metric))
+    assert 0 == int(node1.query(query_number_detached_by_user_parts_in_async_metric))
+
+    # detach some parts and wait until asynchronous metrics notice it
+    node1.query("ALTER TABLE t DETACH PARTITION '20220901';")
+
+    assert 2 == int(node1.query(query_count_detached_parts))
+    assert 1 == int(node1.query(query_count_active_parts))
+
+    assert_eq_with_retry(
+        node1,
+        query_number_detached_parts_in_async_metric,
+        "2\n",
+    )
+    assert 2 == int(node1.query(query_number_detached_by_user_parts_in_async_metric))
+
+    # detach the remaining parts and wait until asynchronous metrics notice it
+    node1.query("ALTER TABLE t DETACH PARTITION ALL")
+
+    assert 3 == int(node1.query(query_count_detached_parts))
+    assert 0 == int(node1.query(query_count_active_parts))
+
+    assert_eq_with_retry(
+        node1,
+        query_number_detached_parts_in_async_metric,
+        "3\n",
+    )
+    assert 3 == int(node1.query(query_number_detached_by_user_parts_in_async_metric))
+
+    # inject some data directly and wait until asynchronous metrics notice it
+    node1.exec_in_container(
+        [
+            "bash",
+            "-c",
+            "mkdir /var/lib/clickhouse/data/default/t/detached/unexpected_all_0_0_0",
+        ]
+    )
+
+    assert 4 == int(node1.query(query_count_detached_parts))
+    assert 0 == int(node1.query(query_count_active_parts))
+
+    assert_eq_with_retry(
+        node1,
+        query_number_detached_parts_in_async_metric,
+        "4\n",
+    )
+    assert 3 == int(node1.query(query_number_detached_by_user_parts_in_async_metric))
+
+    # drop some data directly and wait until asynchronous metrics notice it
+    node1.exec_in_container(
+        [
+            "bash",
+            "-c",
+            "rm -rf /var/lib/clickhouse/data/default/t/detached/{}".format(
+                partition_name
+            ),
+        ]
+    )
+
+    assert 3 == int(node1.query(query_count_detached_parts))
+    assert 0 == int(node1.query(query_count_active_parts))
+
+    assert_eq_with_retry(
+        node1,
+        query_number_detached_parts_in_async_metric,
+        "3\n",
+    )
+    assert 2 == int(node1.query(query_number_detached_by_user_parts_in_async_metric))
--- a/tests/integration/test_structured_logging_json/configs/config_all_keys_json.xml
+++ b/tests/integration/test_structured_logging_json/configs/config_all_keys_json.xml
@ -0,0 +1,27 @@
+<?xml version="1.0"?>
+<clickhouse>
+    <logger>
+        <!-- Structured log formatting:
+        You can specify log format(for now, JSON only). In that case, the console log will be printed
+        in specified format like JSON.
+        For example, as below:
+        {"date_time":"1650918987.180175","thread_name":"#1","thread_id":"254545","level":"Trace","query_id":"","logger_name":"BaseDaemon","message":"Received signal 2","source_file":"../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()","source_line":"192"}
+        To enable JSON logging support, just uncomment <formatting> tag below.
+        -->
+        <formatting>
+            <type>json</type>
+            <names>
+                <date_time>DATE_TIME</date_time>
+                <thread_name>THREAD_NAME</thread_name>
+                <thread_id>THREAD_ID</thread_id>
+                <level>LEVEL</level>
+                <query_id>QUERY_ID</query_id>
+                <logger_name>LOGGER_NAME</logger_name>
+                <message>MESSAGE</message>
+                <source_file>SOURCE_FILE</source_file>
+                <source_line>SOURCE_LINE</source_line>
+            </names>
+        </formatting>
+    </logger>
+
+</clickhouse>
--- a/tests/integration/test_structured_logging_json/configs/config_json.xml
+++ b/tests/integration/test_structured_logging_json/configs/config_json.xml
@ -8,7 +8,20 @@
        {"date_time":"1650918987.180175","thread_name":"#1","thread_id":"254545","level":"Trace","query_id":"","logger_name":"BaseDaemon","message":"Received signal 2","source_file":"../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()","source_line":"192"}
        To enable JSON logging support, just uncomment <formatting> tag below.
        -->
-        <formatting>json</formatting>
+        <formatting>
+            <type>json</type>
+            <names>
+                <date_time>DATE_TIME</date_time>
+                <thread_name>THREAD_NAME</thread_name>
+                <thread_id>THREAD_ID</thread_id>
+                <level>LEVEL</level>
+                <query_id>QUERY_ID</query_id>
+                <logger_name>LOGGER_NAME</logger_name>
+                <message>MESSAGE</message>
+                <source_file>SOURCE_FILE</source_file>
+                <source_line>SOURCE_LINE</source_line>
+            </names>
+        </formatting>
    </logger>

 </clickhouse>
--- a/tests/integration/test_structured_logging_json/configs/config_no_keys_json.xml
+++ b/tests/integration/test_structured_logging_json/configs/config_no_keys_json.xml
@ -0,0 +1,27 @@
+<?xml version="1.0"?>
+<clickhouse>
+    <logger>
+        <!-- Structured log formatting:
+        You can specify log format(for now, JSON only). In that case, the console log will be printed
+        in specified format like JSON.
+        For example, as below:
+        {"date_time":"1650918987.180175","thread_name":"#1","thread_id":"254545","level":"Trace","query_id":"","logger_name":"BaseDaemon","message":"Received signal 2","source_file":"../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()","source_line":"192"}
+        To enable JSON logging support, just uncomment <formatting> tag below.
+        -->
+        <formatting>
+            <type>json</type>
+            <!--<names>
+                <date_time>DATE_TIME</date_time>
+                <thread_name>THREAD_NAME</thread_name>
+                <thread_id>THREAD_ID</thread_id>
+                <level>LEVEL</level>
+                <query_id>QUERY_ID</query_id>
+                <logger_name>LOGGER_NAME</logger_name>
+                <message>MESSAGE</message>
+                <source_file>SOURCE_FILE</source_file>
+                <source_line>SOURCE_LINE</source_line>
+            </names>-->
+        </formatting>
+    </logger>
+
+</clickhouse>
--- a/tests/integration/test_structured_logging_json/configs/config_some_keys_json.xml
+++ b/tests/integration/test_structured_logging_json/configs/config_some_keys_json.xml
@ -0,0 +1,27 @@
+<?xml version="1.0"?>
+<clickhouse>
+    <logger>
+        <!-- Structured log formatting:
+        You can specify log format(for now, JSON only). In that case, the console log will be printed
+        in specified format like JSON.
+        For example, as below:
+        {"date_time":"1650918987.180175","thread_name":"#1","thread_id":"254545","level":"Trace","query_id":"","logger_name":"BaseDaemon","message":"Received signal 2","source_file":"../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()","source_line":"192"}
+        To enable JSON logging support, just uncomment <formatting> tag below.
+        -->
+        <formatting>
+            <type>json</type>
+            <names>
+                <date_time>DATE_TIME</date_time>
+                <thread_name>THREAD_NAME</thread_name>
+                <thread_id>THREAD_ID</thread_id>
+                <level>LEVEL</level>
+                <!--<query_id>QUERY_ID</query_id>
+                <logger_name>LOGGER_NAME</logger_name>-->
+                <message>MESSAGE</message>
+                <source_file>SOURCE_FILE</source_file>
+                <!--<source_line>SOURCE_LINE</source_line>-->
+            </names>
+        </formatting>
+    </logger>
+
+</clickhouse>
--- a/tests/integration/test_structured_logging_json/test.py
+++ b/tests/integration/test_structured_logging_json/test.py
@ -1,9 +1,18 @@
 import pytest
 from helpers.cluster import ClickHouseCluster
 import json
+from xml.etree import ElementTree as ET

 cluster = ClickHouseCluster(__file__)
-node = cluster.add_instance("node", main_configs=["configs/config_json.xml"])
+node_all_keys = cluster.add_instance(
+    "node_all_keys", main_configs=["configs/config_all_keys_json.xml"]
+)
+node_some_keys = cluster.add_instance(
+    "node_some_keys", main_configs=["configs/config_some_keys_json.xml"]
+)
+node_no_keys = cluster.add_instance(
+    "node_no_keys", main_configs=["configs/config_no_keys_json.xml"]
+)


@pytest.fixture(scope="module")
@ -23,10 +32,70 @@ def is_json(log_json):
    return True


-def test_structured_logging_json_format(start_cluster):
-    node.query("SELECT 1")
+def validate_log_config_relation(config, logs, config_type):
+    root = ET.fromstring(config)
+    keys_in_config = set()

-    logs = node.grep_in_log("").split("\n")
+    if config_type == "config_no_keys":
+        keys_in_config.add("date_time")
+        keys_in_config.add("thread_name")
+        keys_in_config.add("thread_id")
+        keys_in_config.add("level")
+        keys_in_config.add("query_id")
+        keys_in_config.add("logger_name")
+        keys_in_config.add("message")
+        keys_in_config.add("source_file")
+        keys_in_config.add("source_line")
+    else:
+        for child in root.findall(".//names/*"):
+            keys_in_config.add(child.text)
+
+    try:
        length = min(10, len(logs))
        for i in range(0, length):
-        assert is_json(logs[i])
+            json_log = json.loads(logs[i])
+            keys_in_log = set()
+            for log_key in json_log.keys():
+                keys_in_log.add(log_key)
+                if log_key not in keys_in_config:
+                    return False
+            for config_key in keys_in_config:
+                if config_key not in keys_in_log:
+                    return False
+    except ValueError as e:
+        return False
+    return True
+
+
+def validate_logs(logs):
+    length = min(10, len(logs))
+    result = True
+    for i in range(0, length):
+        result = result and is_json(logs[i])
+    return result
+
+
+def validate_everything(config, node, config_type):
+    node.query("SELECT 1")
+    logs = node.grep_in_log("").split("\n")
+    return validate_logs(logs) and validate_log_config_relation(
+        config, logs, config_type
+    )
+
+
+def test_structured_logging_json_format(start_cluster):
+    config_all_keys = node_all_keys.exec_in_container(
+        ["cat", "/etc/clickhouse-server/config.d/config_all_keys_json.xml"]
+    )
+    config_some_keys = node_some_keys.exec_in_container(
+        ["cat", "/etc/clickhouse-server/config.d/config_some_keys_json.xml"]
+    )
+    config_no_keys = node_no_keys.exec_in_container(
+        ["cat", "/etc/clickhouse-server/config.d/config_no_keys_json.xml"]
+    )
+
+    assert validate_everything(config_all_keys, node_all_keys, "config_all_keys") == True
+    assert (
+        validate_everything(config_some_keys, node_some_keys, "config_some_keys") == True
+    )
+    assert validate_everything(config_no_keys, node_no_keys, "config_no_keys") == True
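Note on the test above: the relation it checks between the `<names>` section of the config and the keys of each JSON log line can be expressed as a set comparison. The sketch below is a standalone variant, not the test's own code. It detects a missing `<names>` section from the XML itself instead of taking a `config_type` argument, and `expected_keys`, `log_line_matches` and `DEFAULT_KEYS` are hypothetical names; the default key list is copied from the `config_no_keys` branch of the test.

```python
import json
from xml.etree import ElementTree as ET

# Keys the test expects when no <names> section is configured.
DEFAULT_KEYS = {
    "date_time", "thread_name", "thread_id", "level", "query_id",
    "logger_name", "message", "source_file", "source_line",
}


def expected_keys(config_xml):
    """Collect the configured output names, or fall back to the defaults."""
    root = ET.fromstring(config_xml)
    # XML comments are skipped by the parser, so a commented-out <names>
    # section yields no children here.
    names = root.findall(".//names/*")
    if not names:
        return set(DEFAULT_KEYS)
    return {child.text for child in names}


def log_line_matches(config_xml, line):
    """Check that a log line is a JSON object with exactly the expected keys."""
    try:
        record = json.loads(line)
    except ValueError:
        return False
    if not isinstance(record, dict):
        return False
    return set(record) == expected_keys(config_xml)
```

Comparing sets in one step covers both directions at once: no unexpected key in the log line and no configured key missing from it.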