merged master

2024-11-22 23:52:03 +00:00 · 2019-11-20 02:34:05 +03:00 · 2019-11-20 02:34:05 +03:00 · 5be62948bc
commit 5be62948bc
parent 1bf4d21c67 7b6e61abb6
142 changed files with 3827 additions and 1024 deletions
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@ -12,7 +12,7 @@ Changelog category (leave one):
 - Non-significant (changelog entry is not needed)


-Changelog entry (up to few sentences, not needed for non-significant PRs):
+Changelog entry (up to a few sentences, required except for Non-significant/Documentation categories):

 ...

--- a/.gitmodules
+++ b/.gitmodules
@ -1,6 +1,7 @@
 [submodule "contrib/poco"]
 	path = contrib/poco
 	url = https://github.com/ClickHouse-Extras/poco
+	branch = clickhouse
 [submodule "contrib/zstd"]
 	path = contrib/zstd
 	url = https://github.com/facebook/zstd.git
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -128,7 +128,7 @@ Kuzmenkov](https://github.com/akuzm))
 Zuikov](https://github.com/4ertus2))
 * Optimize partial merge join. [#7070](https://github.com/ClickHouse/ClickHouse/pull/7070)
  ([Artem Zuikov](https://github.com/4ertus2))
-* Do not use more then 98K of memory in uniqCombined functions.
+* Do not use more than 98K of memory in uniqCombined functions.
  [#7236](https://github.com/ClickHouse/ClickHouse/pull/7236),
 [#7270](https://github.com/ClickHouse/ClickHouse/pull/7270) ([Azat
 Khuzhin](https://github.com/azat))
@ -396,7 +396,7 @@ fix comments to make obvious that it may throw.
 * Fix segfault with enabled `optimize_skip_unused_shards` and missing sharding key. [#6384](https://github.com/ClickHouse/ClickHouse/pull/6384) ([Anton Popov](https://github.com/CurtizJ))
 * Fixed wrong code in mutations that may lead to memory corruption. Fixed segfault with read of address `0x14c0` that may happed due to concurrent `DROP TABLE` and `SELECT` from `system.parts` or `system.parts_columns`. Fixed race condition in preparation of mutation queries. Fixed deadlock caused by `OPTIMIZE` of Replicated tables and concurrent modification operations like ALTERs. [#6514](https://github.com/ClickHouse/ClickHouse/pull/6514) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Removed extra verbose logging in MySQL interface [#6389](https://github.com/ClickHouse/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov))
-* Return ability to parse boolean settings from 'true' and 'false' in configuration file. [#6278](https://github.com/ClickHouse/ClickHouse/pull/6278) ([alesapin](https://github.com/alesapin))
+* Return the ability to parse boolean settings from 'true' and 'false' in the configuration file. [#6278](https://github.com/ClickHouse/ClickHouse/pull/6278) ([alesapin](https://github.com/alesapin))
 * Fix crash in `quantile` and `median` function over `Nullable(Decimal128)`. [#6378](https://github.com/ClickHouse/ClickHouse/pull/6378) ([Artem Zuikov](https://github.com/4ertus2))
 * Fixed possible incomplete result returned by `SELECT` query with `WHERE` condition on primary key contained conversion to Float type. It was caused by incorrect checking of monotonicity in `toFloat` function. [#6248](https://github.com/ClickHouse/ClickHouse/issues/6248) [#6374](https://github.com/ClickHouse/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
 * Check `max_expanded_ast_elements` setting for mutations. Clear mutations after `TRUNCATE TABLE`. [#6205](https://github.com/ClickHouse/ClickHouse/pull/6205) ([Winter Zhang](https://github.com/zhang2014))
@ -424,8 +424,8 @@ fix comments to make obvious that it may throw.
 * Fix bug with writing secondary indices marks with adaptive granularity. [#6126](https://github.com/ClickHouse/ClickHouse/pull/6126) ([alesapin](https://github.com/alesapin))
 * Fix initialization order while server startup. Since `StorageMergeTree::background_task_handle` is initialized in `startup()` the `MergeTreeBlockOutputStream::write()` may try to use it before initialization. Just check if it is initialized. [#6080](https://github.com/ClickHouse/ClickHouse/pull/6080) ([Ivan](https://github.com/abyss7))
 * Clearing the data buffer from the previous read operation that was completed with an error. [#6026](https://github.com/ClickHouse/ClickHouse/pull/6026) ([Nikolay](https://github.com/bopohaa))
-* Fix bug with enabling adaptive granularity when creating new replica for Replicated*MergeTree table. [#6394](https://github.com/ClickHouse/ClickHouse/issues/6394) [#6452](https://github.com/ClickHouse/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
-* Fixed possible crash during server startup in case of exception happened in `libunwind` during exception at access to uninitialised `ThreadStatus` structure. [#6456](https://github.com/ClickHouse/ClickHouse/pull/6456) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
+* Fix bug with enabling adaptive granularity when creating a new replica for a Replicated*MergeTree table. [#6394](https://github.com/ClickHouse/ClickHouse/issues/6394) [#6452](https://github.com/ClickHouse/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
+* Fixed possible crash during server startup in case of an exception in `libunwind` during an exception at access to an uninitialized `ThreadStatus` structure. [#6456](https://github.com/ClickHouse/ClickHouse/pull/6456) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
 * Fix crash in `yandexConsistentHash` function. Found by fuzz test. [#6304](https://github.com/ClickHouse/ClickHouse/issues/6304) [#6305](https://github.com/ClickHouse/ClickHouse/pull/6305) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed the possibility of hanging queries when server is overloaded and global thread pool becomes near full. This have higher chance to happen on clusters with large number of shards (hundreds), because distributed queries allocate a thread per connection to each shard. For example, this issue may reproduce if a cluster of 330 shards is processing 30 concurrent distributed queries. This issue affects all versions starting from 19.2. [#6301](https://github.com/ClickHouse/ClickHouse/pull/6301) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed logic of `arrayEnumerateUniqRanked` function. [#6423](https://github.com/ClickHouse/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov))
@ -669,7 +669,7 @@ fix comments to make obvious that it may throw.
 * Fix kafka tests. [#6805](https://github.com/ClickHouse/ClickHouse/pull/6805) ([Ivan](https://github.com/abyss7))

 ### Security Fix
-* If the attacker has write access to ZooKeeper and is able to run custom server available from the network where ClickHouse run, it can create custom-built malicious server that will act as ClickHouse replica and register it in ZooKeeper. When another replica will fetch data part from malicious replica, it can force clickhouse-server to write to arbitrary path on filesystem. Found by Eldar Zaitov, information security team at Yandex. [#6247](https://github.com/ClickHouse/ClickHouse/pull/6247) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* If an attacker has write access to ZooKeeper and is able to run a custom server available from the network where ClickHouse runs, they can create a custom-built malicious server that acts as a ClickHouse replica and register it in ZooKeeper. When another replica fetches a data part from the malicious replica, it can force clickhouse-server to write to an arbitrary path on the filesystem. Found by Eldar Zaitov, information security team at Yandex. [#6247](https://github.com/ClickHouse/ClickHouse/pull/6247) ([alexey-milovidov](https://github.com/alexey-milovidov))

 ## ClickHouse release 19.13.3.26, 2019-08-22

@ -697,7 +697,7 @@ fix comments to make obvious that it may throw.
 * Now client receive logs from server with any desired level by setting `send_logs_level` regardless to the log level specified in server settings. [#5964](https://github.com/ClickHouse/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))

 ### Backward Incompatible Change
-* The setting `input_format_defaults_for_omitted_fields` is enabled by default. Inserts in Distibuted tables need this setting to be the same on cluster (you need to set it before rolling update). It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats. It should be the expected behaviour but may lead to negligible performance difference. [#6043](https://github.com/ClickHouse/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/ClickHouse/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))
+* The setting `input_format_defaults_for_omitted_fields` is enabled by default. Inserts in Distributed tables need this setting to be the same on all servers of the cluster (you need to set it before a rolling update). It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats. It should be the expected behavior but may lead to a negligible performance difference. [#6043](https://github.com/ClickHouse/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/ClickHouse/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))

 ### Experimental features
 * New query processing pipeline. Use `experimental_use_processors=1` option to enable it. Use for your own trouble. [#4914](https://github.com/ClickHouse/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
@ -1478,7 +1478,7 @@ lee](https://github.com/neverlee))

 ### Bug fixes

-* Fixed error in #3920. This error manifestate itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. This bug first appeared in version 19.1 and is present in versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/ClickHouse/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed error in #3920. This error manifests itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. This bug first appeared in version 19.1 and is present in versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/ClickHouse/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))


 ## ClickHouse release 19.3.6, 2019-03-02
@ -2335,7 +2335,7 @@ The expression must be a chain of equalities joined by the AND operator. Each si

 ### Improvements:

-* Changed the numbering scheme for release versions. Now the first part contains the year of release (A.D., Moscow timezone, minus 2000), the second part contains the number for major changes (increases for most releases), and the third part is the patch version. Releases are still backwards compatible, unless otherwise stated in the changelog.
+* Changed the numbering scheme for release versions. Now the first part contains the year of release (A.D., Moscow timezone, minus 2000), the second part contains the number for major changes (increases for most releases), and the third part is the patch version. Releases are still backward compatible, unless otherwise stated in the changelog.
 * Faster conversions of floating-point numbers to a string ([Amos Bird](https://github.com/ClickHouse/ClickHouse/pull/2664)).
 * If some rows were skipped during an insert due to parsing errors (this is possible with the `input_allow_errors_num` and `input_allow_errors_ratio` settings enabled), the number of skipped rows is now written to the server log ([Leonardo Cecchi](https://github.com/ClickHouse/ClickHouse/pull/2669)).

@ -2534,7 +2534,7 @@ The expression must be a chain of equalities joined by the AND operator. Each si
 * Configuration of the table level for the `ReplicatedMergeTree` family in order to minimize the amount of data stored in Zookeeper: : `use_minimalistic_checksums_in_zookeeper = 1`
 * Configuration of the `clickhouse-client` prompt. By default, server names are now output to the prompt. The server's display name can be changed. It's also sent in the `X-ClickHouse-Display-Name` HTTP header (Kirill Shvakov).
 * Multiple comma-separated `topics` can be specified for the `Kafka` engine  (Tobias Adamson)
-* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was cancelled` exception instead of an incomplete result.
+* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was canceled` exception instead of an incomplete result.

 ### Improvements:

--- a/README.md
+++ b/README.md
@ -13,8 +13,7 @@ ClickHouse is an open-source column-oriented database management system that all
 * You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.

 ## Upcoming Events
-* [ClickHouse Meetup in Tokyo](https://clickhouse.connpass.com/event/147001/) on November 14.
-* [ClickHouse Meetup in Istanbul](https://www.eventbrite.com/e/clickhouse-meetup-istanbul-create-blazing-fast-experiences-w-clickhouse-tickets-73101120419) on November 19.
+
 * [ClickHouse Meetup in Ankara](https://www.eventbrite.com/e/clickhouse-meetup-ankara-create-blazing-fast-experiences-w-clickhouse-tickets-73100530655) on November 21.
 * [ClickHouse Meetup in Singapore](https://www.meetup.com/Singapore-Clickhouse-Meetup-Group/events/265085331/) on November 23.
 * [ClickHouse Meetup in San Francisco](https://www.eventbrite.com/e/clickhouse-december-meetup-registration-78642047481) on December 3.
--- a/contrib/poco
+++ b/contrib/poco
@ -1 +1 @@
-Subproject commit 6216cc01a107ce149863411ca29013a224f80343
+Subproject commit 2b273bfe9db89429b2040c024484dee0197e48c7
--- a/dbms/CMakeLists.txt
+++ b/dbms/CMakeLists.txt
@ -76,7 +76,7 @@ elseif (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
 endif()

 if (USE_DEBUG_HELPERS)
-    set (INCLUDE_DEBUG_HELPERS "-include ${ClickHouse_SOURCE_DIR}/libs/libcommon/include/common/iostream_debug_helpers.h")
+    set (INCLUDE_DEBUG_HELPERS "-I${ClickHouse_SOURCE_DIR}/libs/libcommon/include -include ${ClickHouse_SOURCE_DIR}/dbms/src/Core/iostream_debug_helpers.h")
    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${INCLUDE_DEBUG_HELPERS}")
 endif ()

--- a/dbms/programs/client/Client.cpp
+++ b/dbms/programs/client/Client.cpp
@ -497,15 +497,21 @@ private:
                throw Exception("Cannot initialize readline", ErrorCodes::CANNOT_READLINE);

 #if RL_VERSION_MAJOR >= 7
-            /// When bracketed paste mode is set, pasted text is bracketed with control sequences so
-            ///  that the program can differentiate pasted text from typed-in text. This helps
-            ///  clickhouse-client so that without -m flag, one can still paste multiline queries, and
-            ///  possibly get better pasting performance. See https://cirw.in/blog/bracketed-paste for
-            ///  more details.
-            rl_variable_bind("enable-bracketed-paste", "on");
+            /// Enable bracketed-paste-mode only when multiquery is enabled and multiline is
+            ///  disabled, so that we are able to paste and execute multiline queries as a whole
+            ///  instead of erroring out, while being less intrusive.
+            if (config().has("multiquery") && !config().has("multiline"))
+            {
+                /// When bracketed paste mode is set, pasted text is bracketed with control sequences so
+                ///  that the program can differentiate pasted text from typed-in text. This helps
+                ///  clickhouse-client so that without -m flag, one can still paste multiline queries, and
+                ///  possibly get better pasting performance. See https://cirw.in/blog/bracketed-paste for
+                ///  more details.
+                rl_variable_bind("enable-bracketed-paste", "on");

-            /// Use our bracketed paste handler to get better user experience. See comments above.
-            rl_bind_keyseq(BRACK_PASTE_PREF, clickhouse_rl_bracketed_paste_begin);
+                /// Use our bracketed paste handler to get better user experience. See comments above.
+                rl_bind_keyseq(BRACK_PASTE_PREF, clickhouse_rl_bracketed_paste_begin);
+            }
 #endif

            auto clear_prompt_or_exit = [](int)
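Bracketed paste mode, as the comment above describes, makes the terminal wrap pasted text in the escape sequences `ESC [200~` and `ESC [201~` (the xterm convention). The sketch below is a simplified illustration under that assumption and is not readline's or clickhouse-client's implementation. `shouldEnableBracketedPaste` mirrors the new gating condition (multiquery on, multiline off), and `extractBracketedPaste` shows how a line editor could recognize a complete paste and treat its contents, newlines included, as one input. Both function names are hypothetical.

```cpp
#include <optional>
#include <string>

// Escape sequences used by xterm-style bracketed paste mode.
const std::string kPasteBegin = "\x1b[200~";
const std::string kPasteEnd = "\x1b[201~";

// Mirrors the condition in the hunk above: bracketed paste is enabled only
// when multiquery mode is on and multiline mode is off.
bool shouldEnableBracketedPaste(bool multiquery, bool multiline)
{
    return multiquery && !multiline;
}

// If `input` starts with a complete bracketed paste, return the pasted text
// (possibly containing newlines) without the markers; otherwise std::nullopt.
std::optional<std::string> extractBracketedPaste(const std::string & input)
{
    if (input.compare(0, kPasteBegin.size(), kPasteBegin) != 0)
        return std::nullopt; // not a paste
    size_t end = input.find(kPasteEnd, kPasteBegin.size());
    if (end == std::string::npos)
        return std::nullopt; // paste has not finished yet
    return input.substr(kPasteBegin.size(), end - kPasteBegin.size());
}
```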
@ -751,6 +757,9 @@ private:

    bool process(const String & text)
    {
+        if (exit_strings.end() != exit_strings.find(trim(text, [](char c){ return isWhitespaceASCII(c) || c == ';'; })))
+            return false;
+
        const bool test_mode = config().has("testmode");
        if (config().has("multiquery"))
        {
@ -845,9 +854,6 @@ private:

    bool processSingleQuery(const String & line, ASTPtr parsed_query_ = nullptr)
    {
-        if (exit_strings.end() != exit_strings.find(trim(line, [](char c){ return isWhitespaceASCII(c) || c == ';'; })))
-            return false;
-
        resetOutput();
        got_exception = false;

@ -1220,7 +1226,7 @@ private:
    /// Returns true if one should continue receiving packets.
    bool receiveAndProcessPacket()
    {
-        Connection::Packet packet = connection->receivePacket();
+        Packet packet = connection->receivePacket();

        switch (packet.type)
        {
@ -1268,7 +1274,7 @@ private:
    {
        while (true)
        {
-            Connection::Packet packet = connection->receivePacket();
+            Packet packet = connection->receivePacket();

            switch (packet.type)
            {
@ -1302,7 +1308,7 @@ private:
    {
        while (true)
        {
-            Connection::Packet packet = connection->receivePacket();
+            Packet packet = connection->receivePacket();

            switch (packet.type)
            {
--- a/dbms/programs/client/Suggest.h
+++ b/dbms/programs/client/Suggest.h
@ -113,7 +113,7 @@ private:

        while (true)
        {
-            Connection::Packet packet = connection.receivePacket();
+            Packet packet = connection.receivePacket();
            switch (packet.type)
            {
                case Protocol::Server::Data:
--- a/dbms/programs/odbc-bridge/ColumnInfoHandler.cpp
+++ b/dbms/programs/odbc-bridge/ColumnInfoHandler.cpp
@ -18,6 +18,7 @@
 #include <Poco/Net/HTTPServerRequest.h>
 #include <Poco/Net/HTTPServerResponse.h>
 #include <Poco/Net/HTMLForm.h>
+#include <Poco/NumberParser.h>
 #include <DataTypes/DataTypeFactory.h>
 #include <DataTypes/DataTypeNullable.h>
 #include <IO/WriteBufferFromHTTPServerResponse.h>
@ -95,6 +96,7 @@ void ODBCColumnsInfoHandler::handleRequest(Poco::Net::HTTPServerRequest & reques
    std::string schema_name = "";
    std::string table_name = params.get("table");
    std::string connection_string = params.get("connection_string");
+
    if (params.has("schema"))
    {
        schema_name = params.get("schema");
@ -106,6 +108,8 @@ void ODBCColumnsInfoHandler::handleRequest(Poco::Net::HTTPServerRequest & reques

    try
    {
+        const bool external_table_functions_use_nulls = Poco::NumberParser::parseBool(params.get("external_table_functions_use_nulls", "false"));
+
        POCO_SQL_ODBC_CLASS::SessionImpl session(validateODBCConnectionString(connection_string), DBMS_DEFAULT_CONNECT_TIMEOUT_SEC);
        SQLHDBC hdbc = session.dbc().handle();

@ -160,13 +164,13 @@ void ODBCColumnsInfoHandler::handleRequest(Poco::Net::HTTPServerRequest & reques
            /// TODO Why 301?
            SQLCHAR column_name[301];

-            SQLSMALLINT nullable;
-            const auto result = POCO_SQL_ODBC_CLASS::SQLDescribeCol(hstmt, ncol, column_name, sizeof(column_name), nullptr, &type, nullptr, nullptr, &nullable);
+            SQLSMALLINT is_nullable;
+            const auto result = POCO_SQL_ODBC_CLASS::SQLDescribeCol(hstmt, ncol, column_name, sizeof(column_name), nullptr, &type, nullptr, nullptr, &is_nullable);
            if (POCO_SQL_ODBC_CLASS::Utility::isError(result))
                throw POCO_SQL_ODBC_CLASS::StatementException(hstmt);

            auto column_type = getDataType(type);
-            if (nullable == SQL_NULLABLE)
+            if (external_table_functions_use_nulls && is_nullable == SQL_NULLABLE)
            {
                column_type = std::make_shared<DataTypeNullable>(column_type);
            }
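After this change, the ODBC bridge wraps a column type in `Nullable` only when two conditions hold: the request sets `external_table_functions_use_nulls` (which defaults to `"false"`), and the driver reports the column as `SQL_NULLABLE`. The minimal sketch below models that decision with type names as plain strings. `parseBoolParam` is a simplified, hypothetical stand-in for `Poco::NumberParser::parseBool`, which also accepts other numeric values. The enum mirrors the ODBC nullability codes `SQL_NO_NULLS`, `SQL_NULLABLE` and `SQL_NULLABLE_UNKNOWN`.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Mirrors the ODBC nullability codes reported by SQLDescribeCol.
enum OdbcNullability { kNoNulls = 0, kNullable = 1, kNullableUnknown = 2 };

// Simplified, case-insensitive boolean parsing for a request parameter.
bool parseBoolParam(std::string value)
{
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (value == "true" || value == "yes" || value == "on" || value == "1")
        return true;
    if (value == "false" || value == "no" || value == "off" || value == "0")
        return false;
    throw std::invalid_argument("not a boolean: " + value);
}

// Wrap in Nullable only if the flag is set AND the driver says the column is nullable.
std::string columnTypeName(const std::string & base_type, int nullability, bool use_nulls)
{
    if (use_nulls && nullability == kNullable)
        return "Nullable(" + base_type + ")";
    return base_type;
}
```

Columns reported as `SQL_NULLABLE_UNKNOWN` stay non-Nullable, which matches the `is_nullable == SQL_NULLABLE` comparison in the hunk.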
--- a/dbms/programs/performance-test/PerformanceTest.cpp
+++ b/dbms/programs/performance-test/PerformanceTest.cpp
@ -35,7 +35,7 @@ void waitQuery(Connection & connection)
        if (!connection.poll(1000000))
            continue;

-        Connection::Packet packet = connection.receivePacket();
+        Packet packet = connection.receivePacket();
        switch (packet.type)
        {
            case Protocol::Server::EndOfStream:
@ -120,7 +120,7 @@ bool PerformanceTest::checkPreconditions() const

            while (true)
            {
-                Connection::Packet packet = connection.receivePacket();
+                Packet packet = connection.receivePacket();

                if (packet.type == Protocol::Server::Data)
                {
--- a/dbms/programs/server/MySQLHandler.cpp
+++ b/dbms/programs/server/MySQLHandler.cpp
@ -225,7 +225,7 @@ void MySQLHandler::authenticate(const String & user_name, const String & auth_pl
        }

        std::optional<String> auth_response = auth_plugin_name == auth_plugin->getName() ? std::make_optional<String>(initial_auth_response) : std::nullopt;
-        auth_plugin->authenticate(user_name, auth_response, connection_context, packet_sender, secure_connection, socket().address());
+        auth_plugin->authenticate(user_name, auth_response, connection_context, packet_sender, secure_connection, socket().peerAddress());
    }
    catch (const Exception & exc)
    {
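The fix passes `socket().peerAddress()` instead of `socket().address()` to the authentication plugin. In Poco, `address()` reports the local end of a connected socket and `peerAddress()` reports the remote end, which is presumably the client address the plugin needs. The corresponding POSIX calls are `getsockname` and `getpeername`. The POSIX-only sketch below opens a loopback connection on an ephemeral port and shows that, on the server side of the connection, the two calls return different endpoints. The helper and struct names are hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <stdexcept>

// Port of the local end of a socket (what Poco's address() reports).
uint16_t localPort(int fd)
{
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0)
        throw std::runtime_error("getsockname failed");
    return ntohs(addr.sin_port);
}

// Port of the remote end of a connected socket (what Poco's peerAddress() reports).
uint16_t peerPort(int fd)
{
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getpeername(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0)
        throw std::runtime_error("getpeername failed");
    return ntohs(addr.sin_port);
}

struct LoopbackConnection
{
    int listener = -1; // listening socket
    int client = -1;   // client end
    int server = -1;   // server end returned by accept()
};

// Listen on 127.0.0.1 with a kernel-chosen port, connect to it and accept.
LoopbackConnection openLoopbackConnection()
{
    LoopbackConnection c;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; // 0 lets the kernel pick a free port

    c.listener = socket(AF_INET, SOCK_STREAM, 0);
    if (c.listener < 0 || bind(c.listener, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) != 0
        || listen(c.listener, 1) != 0)
        throw std::runtime_error("cannot listen");

    addr.sin_port = htons(localPort(c.listener));
    c.client = socket(AF_INET, SOCK_STREAM, 0);
    if (c.client < 0 || connect(c.client, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) != 0)
        throw std::runtime_error("cannot connect");

    c.server = accept(c.listener, nullptr, nullptr);
    if (c.server < 0)
        throw std::runtime_error("cannot accept");
    return c;
}
```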
--- a/dbms/programs/server/TCPHandler.cpp
+++ b/dbms/programs/server/TCPHandler.cpp
@ -924,7 +924,9 @@ void TCPHandler::receiveQuery()

    /// Per query settings.
    Settings & settings = query_context->getSettingsRef();
-    settings.deserialize(*in);
+    auto settings_format = (client_revision >= DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS) ? SettingsBinaryFormat::STRINGS
+                                                                                                      : SettingsBinaryFormat::OLD;
+    settings.deserialize(*in, settings_format);

    /// Sync timeouts on client and server during current query to avoid dangling queries on server
    /// NOTE: We use settings.send_timeout for the receive timeout and vice versa (change arguments ordering in TimeoutSetter),
@ -953,7 +955,9 @@ void TCPHandler::receiveUnexpectedQuery()
        skip_client_info.read(*in, client_revision);

    Settings & skip_settings = query_context->getSettingsRef();
-    skip_settings.deserialize(*in);
+    auto settings_format = (client_revision >= DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS) ? SettingsBinaryFormat::STRINGS
+                                                                                                      : SettingsBinaryFormat::OLD;
+    skip_settings.deserialize(*in, settings_format);

    readVarUInt(skip_uint_64, *in);
    readVarUInt(skip_uint_64, *in);
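Both ends now choose the settings wire format from the revision reported by the other side. `TCPHandler::receiveQuery` compares `client_revision` against `DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS`, and `Connection::sendQuery` does the same with `server_revision`. The sketch below models that negotiation to show why the two ends always agree. The threshold value is a placeholder, not the real constant, and `formatFor` is a hypothetical helper.

```cpp
#include <cstdint>

enum class SettingsBinaryFormat { OLD, STRINGS };

// Placeholder value; NOT the real DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS.
constexpr uint64_t kMinRevisionWithStringSettings = 1000;

// Format used by a peer at `own_revision` when talking to a peer at `other_revision`.
SettingsBinaryFormat formatFor(uint64_t own_revision, uint64_t other_revision)
{
    // Peers older than the threshold predate the new format and always use OLD.
    if (own_revision < kMinRevisionWithStringSettings)
        return SettingsBinaryFormat::OLD;
    // Newer peers switch to STRINGS only if the other side is new enough too.
    return other_revision >= kMinRevisionWithStringSettings ? SettingsBinaryFormat::STRINGS
                                                            : SettingsBinaryFormat::OLD;
}
```

Because `STRINGS` is chosen only when both revisions reach the threshold, the choice is symmetric. The sender and the receiver therefore never disagree about the format.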
--- a/dbms/src/AggregateFunctions/AggregateFunctionArray.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionArray.h
@ -129,6 +129,8 @@ public:
        return nested_func->allocatesMemoryInArena();
    }

+    AggregateFunctionPtr getNestedFunction() const { return nested_func; }
+
    const char * getHeaderFilePath() const override { return __FILE__; }
 };

--- a/dbms/src/AggregateFunctions/AggregateFunctionUniq.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionUniq.h
@ -219,7 +219,8 @@ public:
        return std::make_shared<DataTypeUInt64>();
    }

-    void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
+    /// ALWAYS_INLINE is required to have better code layout for uniqHLL12 function
+    void ALWAYS_INLINE add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
    {
        detail::OneAdder<T, Data>::add(this->data(place), *columns[0], row_num);
    }
--- a/dbms/src/AggregateFunctions/AggregateFunctionUniqUpTo.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionUniqUpTo.h
@ -48,7 +48,8 @@ struct __attribute__((__packed__)) AggregateFunctionUniqUpToData
    }

    /// threshold - for how many elements there is room in a `data`.
-    void insert(T x, UInt8 threshold)
+    /// ALWAYS_INLINE is required to have better code layout for uniqUpTo function
+    void ALWAYS_INLINE insert(T x, UInt8 threshold)
    {
        /// The state is already full - nothing needs to be done.
        if (count > threshold)
@ -100,7 +101,8 @@ struct __attribute__((__packed__)) AggregateFunctionUniqUpToData
            rb.read(reinterpret_cast<char *>(data), count * sizeof(data[0]));
    }

-    void add(const IColumn & column, size_t row_num, UInt8 threshold)
+    /// ALWAYS_INLINE is required to have better code layout for uniqUpTo function
+    void ALWAYS_INLINE add(const IColumn & column, size_t row_num, UInt8 threshold)
    {
        insert(assert_cast<const ColumnVector<T> &>(column).getData()[row_num], threshold);
    }
@ -111,7 +113,8 @@ struct __attribute__((__packed__)) AggregateFunctionUniqUpToData
 template <>
 struct AggregateFunctionUniqUpToData<String> : AggregateFunctionUniqUpToData<UInt64>
 {
-    void add(const IColumn & column, size_t row_num, UInt8 threshold)
+    /// ALWAYS_INLINE is required to have better code layout for uniqUpTo function
+    void ALWAYS_INLINE add(const IColumn & column, size_t row_num, UInt8 threshold)
    {
        /// Keep in mind that calculations are approximate.
        StringRef value = column.getDataAt(row_num);
@ -122,7 +125,8 @@ struct AggregateFunctionUniqUpToData<String> : AggregateFunctionUniqUpToData<UIn
 template <>
 struct AggregateFunctionUniqUpToData<UInt128> : AggregateFunctionUniqUpToData<UInt64>
 {
-    void add(const IColumn & column, size_t row_num, UInt8 threshold)
+    /// ALWAYS_INLINE is required to have better code layout for uniqUpTo function
+    void ALWAYS_INLINE add(const IColumn & column, size_t row_num, UInt8 threshold)
    {
        UInt128 value = assert_cast<const ColumnVector<UInt128> &>(column).getData()[row_num];
        insert(sipHash64(value), threshold);
@ -155,7 +159,8 @@ public:
        return std::make_shared<DataTypeUInt64>();
    }

-    void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
+    /// ALWAYS_INLINE is required to have better code layout for uniqUpTo function
+    void ALWAYS_INLINE add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
    {
        this->data(place).add(*columns[0], row_num, threshold);
    }
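`ALWAYS_INLINE` forces the compiler to inline these small per-row functions into their callers. Below is a simplified sketch of a uniqUpTo-style state. It assumes `ALWAYS_INLINE` expands to the GCC/Clang `always_inline` attribute. Only the early return for a full state and the meaning of `threshold` come from the hunk; the fixed capacity, linear search and `size()` accessor are a hypothetical reconstruction for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed definition: force inlining with the GCC/Clang attribute.
#define ALWAYS_INLINE __attribute__((__always_inline__))

// Simplified uniqUpTo-style state; thresholds must not exceed MaxThreshold.
template <typename T, size_t MaxThreshold = 100>
struct UniqUpToState
{
    uint8_t count = 0;    // distinct values seen, capped at threshold + 1
    T data[MaxThreshold]; // room for up to `threshold` values

    void ALWAYS_INLINE insert(T x, uint8_t threshold)
    {
        // The state is already full - nothing needs to be done.
        if (count > threshold)
            return;
        // Linear search is cheap because the set is small.
        for (size_t i = 0; i < count; ++i)
            if (data[i] == x)
                return;
        if (count < threshold)
            data[count] = x;
        ++count; // reaching threshold + 1 marks the state as full
    }

    uint64_t size() const { return count; }
};
```

The state stores at most `threshold` values. A `count` of `threshold + 1` means "more than `threshold` distinct values", and any further insert returns immediately.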
--- a/dbms/src/AggregateFunctions/IAggregateFunction.h
+++ b/dbms/src/AggregateFunctions/IAggregateFunction.h
@ -131,12 +131,23 @@ public:
    /** Contains a loop with calls to "add" function. You can collect arguments into array "places"
      *  and do a single call to "addBatch" for devirtualization and inlining.
      */
-    virtual void addBatch(size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, Arena * arena) const = 0;
+    virtual void
+    addBatch(size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, Arena * arena)
+        const = 0;

    /** The same for single place.
      */
    virtual void addBatchSinglePlace(size_t batch_size, AggregateDataPtr place, const IColumn ** columns, Arena * arena) const = 0;

+    /** In addition to addBatch, this method collects multiple rows of arguments into array "places"
+      *  as long as they are between offsets[i-1] and offsets[i]. This is used for arrayReduce and
+      *  -Array combinator. It might also be used generally to break data dependency when array
+      *  "places" contains a large number of same values consecutively.
+      */
+    virtual void
+    addBatchArray(size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, const UInt64 * offsets, Arena * arena)
+        const = 0;
+
    /** This is used for runtime code generation to determine, which header files to include in generated source.
      * Always implement it as
      * const char * getHeaderFilePath() const override { return __FILE__; }
@ -179,6 +190,20 @@ public:
        for (size_t i = 0; i < batch_size; ++i)
            static_cast<const Derived *>(this)->add(place, columns, i, arena);
    }
+
+    void addBatchArray(
+        size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, const UInt64 * offsets, Arena * arena)
+        const override
+    {
+        size_t current_offset = 0;
+        for (size_t i = 0; i < batch_size; ++i)
+        {
+            size_t next_offset = offsets[i];
+            for (size_t j = current_offset; j < next_offset; ++j)
+                static_cast<const Derived *>(this)->add(places[i] + place_offset, columns, j, arena);
+            current_offset = next_offset;
+        }
+    }
 };
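The default `addBatchArray` in the helper treats `offsets` as the cumulative end positions of arrays in a flattened column. Rows `[offsets[i-1], offsets[i])` are added into `places[i] + place_offset`, with the range for `i = 0` starting at 0. Below is a reduced sketch of the same loop with a hypothetical sum aggregate. The single `const double *` column and the names `HelperSketch` and `SumSketch` are simplifications, not ClickHouse types.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using AggregateDataPtr = char *;

// Simplified CRTP helper following the loop in the hunk above.
template <typename Derived>
struct HelperSketch
{
    void addBatchArray(size_t batch_size, AggregateDataPtr * places, size_t place_offset,
                       const double * column, const uint64_t * offsets) const
    {
        size_t current_offset = 0;
        for (size_t i = 0; i < batch_size; ++i)
        {
            size_t next_offset = offsets[i];
            // Rows [current_offset, next_offset) belong to array number i.
            for (size_t j = current_offset; j < next_offset; ++j)
                static_cast<const Derived *>(this)->add(places[i] + place_offset, column, j);
            current_offset = next_offset;
        }
    }
};

// Hypothetical aggregate whose state is a single double holding a running sum.
struct SumSketch : HelperSketch<SumSketch>
{
    void add(AggregateDataPtr place, const double * column, size_t row) const
    {
        double state;
        std::memcpy(&state, place, sizeof(state));
        state += column[row];
        std::memcpy(place, &state, sizeof(state));
    }
};
```

For a flattened Array column `[[1, 2], [], [3, 4, 5]]`, the offsets are `{2, 2, 5}`. The three sum states therefore end up as 3, 0 and 12, and the empty array leaves its state untouched.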


--- a/dbms/src/Client/Connection.cpp
+++ b/dbms/src/Client/Connection.cpp
@ -409,7 +409,11 @@ void Connection::sendQuery(

    /// Per query settings.
    if (settings)
-        settings->serialize(*out);
+    {
+        auto settings_format = (server_revision >= DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS) ? SettingsBinaryFormat::STRINGS
+                                                                                                          : SettingsBinaryFormat::OLD;
+        settings->serialize(*out, settings_format);
+    }
    else
        writeStringBinary("" /* empty string is a marker of the end of settings */, *out);

@ -612,7 +616,7 @@ std::optional<UInt64> Connection::checkPacket(size_t timeout_microseconds)
 }


-Connection::Packet Connection::receivePacket()
+Packet Connection::receivePacket()
 {
    try
    {
--- a/dbms/src/Client/Connection.h
+++ b/dbms/src/Client/Connection.h
@ -42,6 +42,21 @@ using ConnectionPtr = std::shared_ptr<Connection>;
 using Connections = std::vector<ConnectionPtr>;


+/// Packet that could be received from server.
+struct Packet
+{
+    UInt64 type;
+
+    Block block;
+    std::unique_ptr<Exception> exception;
+    std::vector<String> multistring_message;
+    Progress progress;
+    BlockStreamProfileInfo profile_info;
+
+    Packet() : type(Protocol::Server::Hello) {}
+};
+
+
 /** Connection with database server, to use by client.
  * How to use - see Core/Protocol.h
  * (Implementation of server end - see Server/TCPHandler.h)
@ -87,20 +102,6 @@ public:
    }


-    /// Packet that could be received from server.
-    struct Packet
-    {
-        UInt64 type;
-
-        Block block;
-        std::unique_ptr<Exception> exception;
-        std::vector<String> multistring_message;
-        Progress progress;
-        BlockStreamProfileInfo profile_info;
-
-        Packet() : type(Protocol::Server::Hello) {}
-    };
-
    /// Change default database. Changes will take effect on next reconnect.
    void setDefaultDatabase(const String & database);

--- a/dbms/src/Client/MultiplexedConnections.cpp
+++ b/dbms/src/Client/MultiplexedConnections.cpp
@ -138,10 +138,10 @@ void MultiplexedConnections::sendQuery(
    sent_query = true;
 }

-Connection::Packet MultiplexedConnections::receivePacket()
+Packet MultiplexedConnections::receivePacket()
 {
    std::lock_guard lock(cancel_mutex);
-    Connection::Packet packet = receivePacketUnlocked();
+    Packet packet = receivePacketUnlocked();
    return packet;
 }

@ -177,19 +177,19 @@ void MultiplexedConnections::sendCancel()
    cancelled = true;
 }

-Connection::Packet MultiplexedConnections::drain()
+Packet MultiplexedConnections::drain()
 {
    std::lock_guard lock(cancel_mutex);

    if (!cancelled)
        throw Exception("Cannot drain connections: cancel first.", ErrorCodes::LOGICAL_ERROR);

-    Connection::Packet res;
+    Packet res;
    res.type = Protocol::Server::EndOfStream;

    while (hasActiveConnections())
    {
-        Connection::Packet packet = receivePacketUnlocked();
+        Packet packet = receivePacketUnlocked();

        switch (packet.type)
        {
@ -235,7 +235,7 @@ std::string MultiplexedConnections::dumpAddressesUnlocked() const
    return os.str();
 }

-Connection::Packet MultiplexedConnections::receivePacketUnlocked()
+Packet MultiplexedConnections::receivePacketUnlocked()
 {
    if (!sent_query)
        throw Exception("Cannot receive packets: no query sent.", ErrorCodes::LOGICAL_ERROR);
@ -247,7 +247,7 @@ Connection::Packet MultiplexedConnections::receivePacketUnlocked()
    if (current_connection == nullptr)
        throw Exception("Logical error: no available replica", ErrorCodes::NO_AVAILABLE_REPLICA);

-    Connection::Packet packet = current_connection->receivePacket();
+    Packet packet = current_connection->receivePacket();

    switch (packet.type)
    {
--- a/dbms/src/Client/MultiplexedConnections.h
+++ b/dbms/src/Client/MultiplexedConnections.h
@ -42,7 +42,7 @@ public:
        bool with_pending_data = false);

    /// Get packet from any replica.
-    Connection::Packet receivePacket();
+    Packet receivePacket();

    /// Break all active connections.
    void disconnect();
@ -54,7 +54,7 @@ public:
      * Returns EndOfStream if no exception has been received. Otherwise
      * returns the last received packet of type Exception.
      */
-    Connection::Packet drain();
+    Packet drain();

    /// Get the replica addresses as a string.
    std::string dumpAddresses() const;
@ -69,7 +69,7 @@ public:

 private:
    /// Internal version of `receivePacket` function without locking.
-    Connection::Packet receivePacketUnlocked();
+    Packet receivePacketUnlocked();

    /// Internal version of `dumpAddresses` function without locking.
    std::string dumpAddressesUnlocked() const;
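The header comment on `drain()` states its contract. After `sendCancel()`, it reads every outstanding packet and returns `EndOfStream` if no exception arrived; otherwise it returns the last `Exception` packet. Below is a minimal sketch of that contract. It assumes a simplified packet type and a single queue of pending packets in place of several replica connections. `PacketType`, `SimplePacket` and `drainSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical subset of packet kinds; the real codes live in Core/Protocol.h.
enum class PacketType : uint8_t
{
    Data,
    Exception,
    EndOfStream,
};

struct SimplePacket
{
    PacketType type = PacketType::EndOfStream;
    std::string message;
};

// Consume every pending packet. Return EndOfStream if no exception was seen,
// otherwise the last Exception packet (later ones overwrite earlier ones).
SimplePacket drainSketch(std::deque<SimplePacket> & pending)
{
    SimplePacket res;  // default-constructed as EndOfStream

    while (!pending.empty())
    {
        SimplePacket packet = pending.front();
        pending.pop_front();

        if (packet.type == PacketType::Exception)
            res = packet;
        // Data, EndOfStream and any other packets are discarded.
    }

    return res;
}
```

Keeping the *last* exception rather than the first mirrors the wording of the comment; the real implementation also has to track which replicas are still active, which the single queue here glosses over.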
--- a/dbms/src/Common/HyperLogLogCounter.h
+++ b/dbms/src/Common/HyperLogLogCounter.h
@ -293,7 +293,8 @@ private:
 public:
    using value_type = Value;

-    void insert(Value value)
+    /// ALWAYS_INLINE is required to have better code layout for uniqCombined function
+    void ALWAYS_INLINE insert(Value value)
    {
        HashValueType hash = getHash(value);

@ -420,7 +421,8 @@ private:
    }

    /// Update maximum rank for current bucket.
-    void update(HashValueType bucket, UInt8 rank)
+    /// ALWAYS_INLINE is required to have better code layout for uniqCombined function
+    void ALWAYS_INLINE update(HashValueType bucket, UInt8 rank)
    {
        typename RankStore::Locus content = rank_store[bucket];
        UInt8 cur_rank = static_cast<UInt8>(content);
--- a/dbms/src/Common/HyperLogLogWithSmallSetOptimization.h
+++ b/dbms/src/Common/HyperLogLogWithSmallSetOptimization.h
@ -56,7 +56,8 @@ public:
            delete large;
    }

-    void insert(Key value)
+    /// ALWAYS_INLINE is required to have better code layout for uniqHLL12 function
+    void ALWAYS_INLINE insert(Key value)
    {
        if (!isLarge())
        {
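`ALWAYS_INLINE` asks the compiler to inline `insert()` and `update()` into their callers even where its heuristics would not. The attribute changes only code generation, not the estimates. The sketch below is a toy HyperLogLog register array (16 buckets, hypothetical `TinyHLL`) that keeps the maximum rank per bucket, as the comment on `update()` describes. It defines its own `SKETCH_ALWAYS_INLINE` macro from the GCC/Clang `always_inline` attribute. It assumes this is roughly what the project's `ALWAYS_INLINE` expands to. The toy ranks values by trailing zero bits. That choice is illustrative and need not match `HyperLogLogCounter` exactly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
#    define SKETCH_ALWAYS_INLINE __attribute__((__always_inline__))
#else
#    define SKETCH_ALWAYS_INLINE
#endif

// Toy HyperLogLog with 2^4 = 16 registers (hypothetical, for illustration only).
struct TinyHLL
{
    std::array<uint8_t, 16> ranks{};

    // Rank = 1 + number of trailing zero bits (capped for an all-zero input).
    static uint8_t rankOf(uint64_t bits)
    {
        uint8_t rank = 1;
        while ((bits & 1) == 0 && rank < 61)
        {
            bits >>= 1;
            ++rank;
        }
        return rank;
    }

    // Keep the maximum rank seen for a bucket.
    void SKETCH_ALWAYS_INLINE update(uint64_t bucket, uint8_t rank)
    {
        if (rank > ranks[bucket])
            ranks[bucket] = rank;
    }

    // Low 4 bits of the hash select the bucket; the rest determine the rank.
    void SKETCH_ALWAYS_INLINE insertHash(uint64_t hash)
    {
        update(hash & 15, rankOf(hash >> 4));
    }
};
```

Member functions defined inside the class body are implicitly `inline`, which is what GCC expects before honouring `always_inline` without a warning.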
--- a/dbms/src/Common/PODArray.h
+++ b/dbms/src/Common/PODArray.h
@ -430,11 +430,11 @@ public:
    template <typename It1, typename It2>
    void insert(iterator it, It1 from_begin, It2 from_end)
    {
-        insertPrepare(from_begin, from_end);
-
        size_t bytes_to_copy = this->byte_size(from_end - from_begin);
        size_t bytes_to_move = (end() - it) * sizeof(T);

+        insertPrepare(from_begin, from_end);
+
        if (unlikely(bytes_to_move))
            memcpy(this->c_end + bytes_to_copy - bytes_to_move, this->c_end - bytes_to_move, bytes_to_move);

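The `PODArray::insert` fix reorders two steps. `insertPrepare()` may reallocate the buffer, after which the iterator `it` points into freed memory, so `end() - it` must be measured before the buffer grows. The new `gtest_pod_array.cpp` test covers this case with an insert that forces a resize. Below is a minimal stand-alone sketch of the corrected ordering on a hypothetical `CharArray` built on `realloc`; it is not PODArray's real implementation. It shifts the tail with `std::memmove` because the source and destination ranges can overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <string>

// Hypothetical stand-in for PODArray<char>: three raw pointers over a realloc'ed buffer.
struct CharArray
{
    char * c_start = nullptr;
    char * c_end = nullptr;
    char * c_end_of_storage = nullptr;

    CharArray() = default;
    CharArray(const CharArray &) = delete;
    CharArray & operator=(const CharArray &) = delete;
    ~CharArray() { std::free(c_start); }

    size_t size() const { return static_cast<size_t>(c_end - c_start); }
    size_t capacity() const { return static_cast<size_t>(c_end_of_storage - c_start); }

    void reserve(size_t new_capacity)
    {
        size_t old_size = size();
        char * p = static_cast<char *>(std::realloc(c_start, new_capacity));
        if (!p)
            throw std::bad_alloc();
        // realloc may move the data: every pointer into the old buffer is now dangling.
        c_start = p;
        c_end = p + old_size;
        c_end_of_storage = p + new_capacity;
    }

    // Insert the n bytes at `from` before position `it` (which must point into this array).
    void insert(char * it, const char * from, size_t n)
    {
        if (n == 0)
            return;

        // Measure the tail first; after reserve(), `it` may refer to freed memory.
        size_t bytes_to_move = static_cast<size_t>(c_end - it);

        if (size() + n > capacity())
            reserve((size() + n) * 2);

        // Shift the tail right by n bytes; the ranges may overlap, so use memmove.
        std::memmove(c_end + n - bytes_to_move, c_end - bytes_to_move, bytes_to_move);
        // Copy the new bytes into the gap.
        std::memcpy(c_end - bytes_to_move, from, n);
        c_end += n;
    }
};
```

Only the tail length is carried across the reallocation; the insertion point is recomputed from `c_end`, so the stale `it` is never dereferenced.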
--- a/dbms/src/Common/ThreadStatus.h
+++ b/dbms/src/Common/ThreadStatus.h
@ -4,7 +4,7 @@
 #include <Common/ProfileEvents.h>
 #include <Common/MemoryTracker.h>

-#include <Core/SettingsCommon.h>
+#include <Core/SettingsCollection.h>

 #include <IO/Progress.h>

--- a/dbms/src/Common/getNumberOfPhysicalCPUCores.cpp
+++ b/dbms/src/Common/getNumberOfPhysicalCPUCores.cpp
@ -16,15 +16,11 @@ unsigned getNumberOfPhysicalCPUCores()
 {
 #if USE_CPUID
    cpu_raw_data_t raw_data;
-    if (0 != cpuid_get_raw_data(&raw_data))
-        throw DB::Exception("Cannot cpuid_get_raw_data: " + std::string(cpuid_error()), DB::ErrorCodes::CPUID_ERROR);
-
    cpu_id_t data;
-    if (0 != cpu_identify(&raw_data, &data))
-        throw DB::Exception("Cannot cpu_identify: " + std::string(cpuid_error()), DB::ErrorCodes::CPUID_ERROR);

    /// On Xen VMs, libcpuid returns wrong info (zero number of cores). Fallback to alternative method.
-    if (data.num_logical_cpus == 0)
+    /// Also, libcpuid does not support some CPUs like AMD Hygon C86 7151.
+    if (0 != cpuid_get_raw_data(&raw_data) || 0 != cpu_identify(&raw_data, &data) || data.num_logical_cpus == 0)
        return std::thread::hardware_concurrency();

    unsigned res = data.num_cores * data.total_logical_cpus / data.num_logical_cpus;
@ -38,13 +34,14 @@ unsigned getNumberOfPhysicalCPUCores()

    if (res != 0)
        return res;
+
 #elif USE_CPUINFO
    uint32_t cores = 0;
    if (cpuinfo_initialize())
        cores = cpuinfo_get_cores_count();

    if (cores)
-            return cores;
+        return cores;
 #endif

    /// As a fallback (also for non-x86 architectures) assume there are no hyper-threading on the system.
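After this change, any libcpuid failure falls back to `std::thread::hardware_concurrency()` instead of throwing. This includes CPUs that libcpuid does not recognize, such as the Hygon model named in the comment. Below is a sketch of that decision using a hypothetical `CpuInfoSketch` that holds the three `cpu_id_t` fields the function reads. The field meanings in the comments are assumed from libcpuid's documentation and should be checked against it. The physical-core estimate is cores per package times the number of packages, where the package count is `total_logical_cpus / num_logical_cpus`. For example, two sockets with 8 cores and 16 logical CPUs each (32 in total) give 8 * 32 / 16 = 16.

```cpp
#include <cassert>
#include <thread>

// Hypothetical subset of libcpuid's cpu_id_t, plus a success flag.
struct CpuInfoSketch
{
    bool ok;                      // false if cpuid_get_raw_data() or cpu_identify() failed
    unsigned num_cores;           // physical cores in one processor package (assumed meaning)
    unsigned num_logical_cpus;    // logical CPUs in one package (assumed meaning)
    unsigned total_logical_cpus;  // logical CPUs in the whole system (assumed meaning)
};

unsigned physicalCoresSketch(const CpuInfoSketch & info)
{
    // Detection failure or the Xen zero-count quirk: fall back to the number of
    // hardware threads, which overcounts physical cores when SMT is enabled.
    if (!info.ok || info.num_logical_cpus == 0)
        return std::thread::hardware_concurrency();

    // Cores per package times the number of packages.
    return info.num_cores * info.total_logical_cpus / info.num_logical_cpus;
}
```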
--- a/dbms/src/Common/tests/gtest_pod_array.cpp
+++ b/dbms/src/Common/tests/gtest_pod_array.cpp
@ -0,0 +1,34 @@
+#include <gtest/gtest.h>
+
+#include <Common/PODArray.h>
+
+using namespace DB;
+
+TEST(Common, PODArray_Insert)
+{
+    std::string str = "test_string_abacaba";
+    PODArray<char> chars;
+    chars.insert(chars.end(), str.begin(), str.end());
+    EXPECT_EQ(str, std::string(chars.data(), chars.size()));
+
+    std::string insert_in_the_middle = "insert_in_the_middle";
+    auto pos = str.size() / 2;
+    str.insert(str.begin() + pos, insert_in_the_middle.begin(), insert_in_the_middle.end());
+    chars.insert(chars.begin() + pos, insert_in_the_middle.begin(), insert_in_the_middle.end());
+    EXPECT_EQ(str, std::string(chars.data(), chars.size()));
+
+    std::string insert_with_resize;
+    insert_with_resize.reserve(chars.capacity() * 2);
+    char cur_char = 'a';
+    while (insert_with_resize.size() < insert_with_resize.capacity())
+    {
+        insert_with_resize += cur_char;
+        if (cur_char == 'z')
+            cur_char = 'a';
+        else
+            ++cur_char;
+    }
+    str.insert(str.begin(), insert_with_resize.begin(), insert_with_resize.end());
+    chars.insert(chars.begin(), insert_with_resize.begin(), insert_with_resize.end());
+    EXPECT_EQ(str, std::string(chars.data(), chars.size()));
+}
--- a/dbms/src/Core/Defines.h
+++ b/dbms/src/Core/Defines.h
@ -59,9 +59,11 @@
 #define DBMS_MIN_REVISION_WITH_COLUMN_DEFAULTS_METADATA 54410

 #define DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE 54405
-
 #define DBMS_MIN_REVISION_WITH_CLIENT_WRITE_INFO 54420

+/// Minimum revision supporting SettingsBinaryFormat::STRINGS.
+#define DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS 54429
+
 /// Version of ClickHouse TCP protocol. Set to git tag with latest protocol change.
 #define DBMS_TCP_PROTOCOL_VERSION 54226

@ -148,9 +150,9 @@
    #define OPTIMIZE(x)
 #endif

-/// This number is only used for distributed version compatible.
-/// It could be any magic number.
-#define DBMS_DISTRIBUTED_SENDS_MAGIC_NUMBER 0xCAFECABE
+/// Marks that extra information is sent to a shard. These can be any magic numbers.
+#define DBMS_DISTRIBUTED_SIGNATURE_EXTRA_INFO 0xCAFEDACEull
+#define DBMS_DISTRIBUTED_SIGNATURE_SETTINGS_OLD_FORMAT 0xCAFECABEull

 #if !__has_include(<sanitizer/asan_interface.h>)
 #   define ASAN_UNPOISON_MEMORY_REGION(a, b)
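`DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS` presumably follows the usual revision-gating pattern: the new encoding is used only when the peer's revision is at least this value. The two `DBMS_DISTRIBUTED_SIGNATURE_*` constants are 64-bit magic numbers that let a reader tell which layout follows them. Below is a minimal sketch of both checks. The enums and functions are hypothetical, and `OLD` is assumed as the counterpart of `SettingsBinaryFormat::STRINGS`. The constant values are copied from the hunk above.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from Defines.h.
constexpr uint64_t MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS = 54429;
constexpr uint64_t SIGNATURE_EXTRA_INFO = 0xCAFEDACEull;
constexpr uint64_t SIGNATURE_SETTINGS_OLD_FORMAT = 0xCAFECABEull;

// Hypothetical mirror of SettingsBinaryFormat.
enum class SettingsFormat
{
    OLD,
    STRINGS,
};

// Pick a settings encoding the peer is known to understand.
SettingsFormat chooseSettingsFormat(uint64_t peer_revision)
{
    return peer_revision >= MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS
        ? SettingsFormat::STRINGS
        : SettingsFormat::OLD;
}

enum class HeaderKind
{
    ExtraInfo,
    SettingsOldFormat,
    Unknown,
};

// Classify a leading 64-bit word by comparing it with the known signatures.
HeaderKind classifyHeader(uint64_t first_word)
{
    switch (first_word)
    {
        case SIGNATURE_EXTRA_INFO:
            return HeaderKind::ExtraInfo;
        case SIGNATURE_SETTINGS_OLD_FORMAT:
            return HeaderKind::SettingsOldFormat;
        default:
            return HeaderKind::Unknown;
    }
}
```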
--- a/dbms/src/Core/Settings.h
+++ b/dbms/src/Core/Settings.h
@ -1,6 +1,6 @@
 #pragma once

-#include "SettingsCommon.h"
+#include <Core/SettingsCollection.h>
 #include <Core/Defines.h>


@ -35,219 +35,222 @@ struct Settings : public SettingsCollection<Settings>
    /// http://en.cppreference.com/w/cpp/language/aggregate_initialization
    Settings() {}

-    /** List of settings: type, name, default value.
+    /** List of settings: type, name, default value, description, flags
      *
      * This looks rather unconvenient. It is done that way to avoid repeating settings in different places.
      * Note: as an alternative, we could implement settings to be completely dynamic in form of map: String -> Field,
      *  but we are not going to do it, because settings is used everywhere as static struct fields.
+      *
+      * `flags` can be either 0 or IGNORABLE.
+      * A setting is "IGNORABLE" if it doesn't affect the results of queries and can be ignored without throwing an exception.
      */

 #define LIST_OF_SETTINGS(M)                                            \
-    M(SettingUInt64, min_compress_block_size, 65536, "The actual size of the block to compress, if the uncompressed data less than max_compress_block_size is no less than this value and no less than the volume of data for one mark.") \
-    M(SettingUInt64, max_compress_block_size, 1048576, "The maximum size of blocks of uncompressed data before compressing for writing to a table.") \
-    M(SettingUInt64, max_block_size, DEFAULT_BLOCK_SIZE, "Maximum block size for reading") \
-    M(SettingUInt64, max_insert_block_size, DEFAULT_INSERT_BLOCK_SIZE, "The maximum block size for insertion, if we control the creation of blocks for insertion.") \
-    M(SettingUInt64, min_insert_block_size_rows, DEFAULT_INSERT_BLOCK_SIZE, "Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.") \
-    M(SettingUInt64, min_insert_block_size_bytes, (DEFAULT_INSERT_BLOCK_SIZE * 256), "Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.") \
-    M(SettingMaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.") \
-    M(SettingMaxThreads, max_alter_threads, 0, "The maximum number of threads to execute the ALTER requests. By default, it is determined automatically.") \
-    M(SettingUInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.") \
-    M(SettingUInt64, max_distributed_connections, 1024, "The maximum number of connections for distributed processing of one query (should be greater than max_threads).") \
-    M(SettingUInt64, max_query_size, 262144, "Which part of the query can be read into RAM for parsing (the remaining data for INSERT, if any, is read later)") \
-    M(SettingUInt64, interactive_delay, 100000, "The interval in microseconds to check if the request is cancelled, and to send progress info.") \
-    M(SettingSeconds, connect_timeout, DBMS_DEFAULT_CONNECT_TIMEOUT_SEC, "Connection timeout if there are no replicas.") \
-    M(SettingMilliseconds, connect_timeout_with_failover_ms, DBMS_DEFAULT_CONNECT_TIMEOUT_WITH_FAILOVER_MS, "Connection timeout for selecting first healthy replica.") \
-    M(SettingSeconds, receive_timeout, DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC, "") \
-    M(SettingSeconds, send_timeout, DBMS_DEFAULT_SEND_TIMEOUT_SEC, "") \
-    M(SettingSeconds, tcp_keep_alive_timeout, 0, "The time in seconds the connection needs to remain idle before TCP starts sending keepalive probes") \
-    M(SettingMilliseconds, queue_max_wait_ms, 0, "The wait time in the request queue, if the number of concurrent requests exceeds the maximum.") \
-    M(SettingMilliseconds, connection_pool_max_wait_ms, 0, "The wait time when connection pool is full.") \
-    M(SettingMilliseconds, replace_running_query_max_wait_ms, 5000, "The wait time for running query with the same query_id to finish when setting 'replace_running_query' is active.") \
-    M(SettingMilliseconds, kafka_max_wait_ms, 5000, "The wait time for reading from Kafka before retry.") \
-    M(SettingUInt64, poll_interval, DBMS_DEFAULT_POLL_INTERVAL, "Block at the query wait loop on the server for the specified number of seconds.") \
-    M(SettingUInt64, idle_connection_timeout, 3600, "Close idle TCP connections after specified number of seconds.") \
-    M(SettingUInt64, distributed_connections_pool_size, DBMS_DEFAULT_DISTRIBUTED_CONNECTIONS_POOL_SIZE, "Maximum number of connections with one remote server in the pool.") \
-    M(SettingUInt64, connections_with_failover_max_tries, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES, "The maximum number of attempts to connect to replicas.") \
-    M(SettingUInt64, s3_min_upload_part_size, 512*1024*1024, "The mininum size of part to upload during multipart upload to S3.") \
-    M(SettingBool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.") \
-    M(SettingBool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.") \
-    M(SettingBool, replace_running_query, false, "Whether the running request should be canceled with the same id as the new one.") \
-    M(SettingUInt64, background_pool_size, 16, "Number of threads performing background work for tables (for example, merging in merge tree). Only has meaning at server startup.") \
-    M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables. Only has meaning at server startup.") \
+    M(SettingUInt64, min_compress_block_size, 65536, "The actual size of the block to compress, if the uncompressed data less than max_compress_block_size is no less than this value and no less than the volume of data for one mark.", 0) \
+    M(SettingUInt64, max_compress_block_size, 1048576, "The maximum size of blocks of uncompressed data before compressing for writing to a table.", 0) \
+    M(SettingUInt64, max_block_size, DEFAULT_BLOCK_SIZE, "Maximum block size for reading", 0) \
+    M(SettingUInt64, max_insert_block_size, DEFAULT_INSERT_BLOCK_SIZE, "The maximum block size for insertion, if we control the creation of blocks for insertion.", 0) \
+    M(SettingUInt64, min_insert_block_size_rows, DEFAULT_INSERT_BLOCK_SIZE, "Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.", 0) \
+    M(SettingUInt64, min_insert_block_size_bytes, (DEFAULT_INSERT_BLOCK_SIZE * 256), "Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.", 0) \
+    M(SettingMaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
+    M(SettingMaxThreads, max_alter_threads, 0, "The maximum number of threads to execute the ALTER requests. By default, it is determined automatically.", 0) \
+    M(SettingUInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
+    M(SettingUInt64, max_distributed_connections, 1024, "The maximum number of connections for distributed processing of one query (should be greater than max_threads).", 0) \
+    M(SettingUInt64, max_query_size, 262144, "Which part of the query can be read into RAM for parsing (the remaining data for INSERT, if any, is read later)", 0) \
+    M(SettingUInt64, interactive_delay, 100000, "The interval in microseconds to check if the request is cancelled, and to send progress info.", 0) \
+    M(SettingSeconds, connect_timeout, DBMS_DEFAULT_CONNECT_TIMEOUT_SEC, "Connection timeout if there are no replicas.", 0) \
+    M(SettingMilliseconds, connect_timeout_with_failover_ms, DBMS_DEFAULT_CONNECT_TIMEOUT_WITH_FAILOVER_MS, "Connection timeout for selecting first healthy replica.", 0) \
+    M(SettingSeconds, receive_timeout, DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC, "", 0) \
+    M(SettingSeconds, send_timeout, DBMS_DEFAULT_SEND_TIMEOUT_SEC, "", 0) \
+    M(SettingSeconds, tcp_keep_alive_timeout, 0, "The time in seconds the connection needs to remain idle before TCP starts sending keepalive probes", 0) \
+    M(SettingMilliseconds, queue_max_wait_ms, 0, "The wait time in the request queue, if the number of concurrent requests exceeds the maximum.", 0) \
+    M(SettingMilliseconds, connection_pool_max_wait_ms, 0, "The wait time when connection pool is full.", 0) \
+    M(SettingMilliseconds, replace_running_query_max_wait_ms, 5000, "The wait time for running query with the same query_id to finish when setting 'replace_running_query' is active.", 0) \
+    M(SettingMilliseconds, kafka_max_wait_ms, 5000, "The wait time for reading from Kafka before retry.", 0) \
+    M(SettingUInt64, poll_interval, DBMS_DEFAULT_POLL_INTERVAL, "Block at the query wait loop on the server for the specified number of seconds.", 0) \
+    M(SettingUInt64, idle_connection_timeout, 3600, "Close idle TCP connections after specified number of seconds.", 0) \
+    M(SettingUInt64, distributed_connections_pool_size, DBMS_DEFAULT_DISTRIBUTED_CONNECTIONS_POOL_SIZE, "Maximum number of connections with one remote server in the pool.", 0) \
+    M(SettingUInt64, connections_with_failover_max_tries, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES, "The maximum number of attempts to connect to replicas.", 0) \
+    M(SettingUInt64, s3_min_upload_part_size, 512*1024*1024, "The minimum size of part to upload during multipart upload to S3.", 0) \
+    M(SettingBool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.", 0) \
+    M(SettingBool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.", 0) \
+    M(SettingBool, replace_running_query, false, "Whether the running request should be canceled with the same id as the new one.", 0) \
+    M(SettingUInt64, background_pool_size, 16, "Number of threads performing background work for tables (for example, merging in merge tree). Only has meaning at server startup.", 0) \
+    M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables. Only has meaning at server startup.", 0) \
    \
-    M(SettingMilliseconds, distributed_directory_monitor_sleep_time_ms, 100, "Sleep time for StorageDistributed DirectoryMonitors, in case of any errors delay grows exponentially.") \
-    M(SettingMilliseconds, distributed_directory_monitor_max_sleep_time_ms, 30000, "Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.") \
+    M(SettingMilliseconds, distributed_directory_monitor_sleep_time_ms, 100, "Sleep time for StorageDistributed DirectoryMonitors, in case of any errors delay grows exponentially.", 0) \
+    M(SettingMilliseconds, distributed_directory_monitor_max_sleep_time_ms, 30000, "Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.", 0) \
    \
-    M(SettingBool, distributed_directory_monitor_batch_inserts, false, "Should StorageDistributed DirectoryMonitors try to batch individual inserts into bigger ones.") \
+    M(SettingBool, distributed_directory_monitor_batch_inserts, false, "Should StorageDistributed DirectoryMonitors try to batch individual inserts into bigger ones.", 0) \
    \
-    M(SettingBool, optimize_move_to_prewhere, true, "Allows disabling WHERE to PREWHERE optimization in SELECT queries from MergeTree.") \
+    M(SettingBool, optimize_move_to_prewhere, true, "Allows disabling WHERE to PREWHERE optimization in SELECT queries from MergeTree.", 0) \
    \
-    M(SettingUInt64, replication_alter_partitions_sync, 1, "Wait for actions to manipulate the partitions. 0 - do not wait, 1 - wait for execution only of itself, 2 - wait for everyone.") \
-    M(SettingUInt64, replication_alter_columns_timeout, 60, "Wait for actions to change the table structure within the specified number of seconds. 0 - wait unlimited time.") \
+    M(SettingUInt64, replication_alter_partitions_sync, 1, "Wait for actions to manipulate the partitions. 0 - do not wait, 1 - wait for execution only of itself, 2 - wait for everyone.", 0) \
+    M(SettingUInt64, replication_alter_columns_timeout, 60, "Wait for actions to change the table structure within the specified number of seconds. 0 - wait unlimited time.", 0) \
    \
-    M(SettingLoadBalancing, load_balancing, LoadBalancing::RANDOM, "Which replicas (among healthy replicas) to preferably send a query to (on the first attempt) for distributed processing.") \
+    M(SettingLoadBalancing, load_balancing, LoadBalancing::RANDOM, "Which replicas (among healthy replicas) to preferably send a query to (on the first attempt) for distributed processing.", 0) \
    \
-    M(SettingTotalsMode, totals_mode, TotalsMode::AFTER_HAVING_EXCLUSIVE, "How to calculate TOTALS when HAVING is present, as well as when max_rows_to_group_by and group_by_overflow_mode = ‘any’ are present.") \
-    M(SettingFloat, totals_auto_threshold, 0.5, "The threshold for totals_mode = 'auto'.") \
+    M(SettingTotalsMode, totals_mode, TotalsMode::AFTER_HAVING_EXCLUSIVE, "How to calculate TOTALS when HAVING is present, as well as when max_rows_to_group_by and group_by_overflow_mode = ‘any’ are present.", 0) \
+    M(SettingFloat, totals_auto_threshold, 0.5, "The threshold for totals_mode = 'auto'.", 0) \
    \
-    M(SettingBool, allow_suspicious_low_cardinality_types, false, "In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption.") \
-    M(SettingBool, compile_expressions, false, "Compile some scalar functions and operators to native code.") \
-    M(SettingUInt64, min_count_to_compile, 3, "The number of structurally identical queries before they are compiled.") \
-    M(SettingUInt64, min_count_to_compile_expression, 3, "The number of identical expressions before they are JIT-compiled") \
-    M(SettingUInt64, group_by_two_level_threshold, 100000, "From what number of keys, a two-level aggregation starts. 0 - the threshold is not set.") \
-    M(SettingUInt64, group_by_two_level_threshold_bytes, 100000000, "From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered.") \
-    M(SettingBool, distributed_aggregation_memory_efficient, false, "Is the memory-saving mode of distributed aggregation enabled.") \
-    M(SettingUInt64, aggregation_memory_efficient_merge_threads, 0, "Number of threads to use for merge intermediate aggregation results in memory efficient mode. When bigger, then more memory is consumed. 0 means - same as 'max_threads'.") \
+    M(SettingBool, allow_suspicious_low_cardinality_types, false, "In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption.", 0) \
+    M(SettingBool, compile_expressions, false, "Compile some scalar functions and operators to native code.", 0) \
+    M(SettingUInt64, min_count_to_compile, 3, "The number of structurally identical queries before they are compiled.", 0) \
+    M(SettingUInt64, min_count_to_compile_expression, 3, "The number of identical expressions before they are JIT-compiled", 0) \
+    M(SettingUInt64, group_by_two_level_threshold, 100000, "From what number of keys, a two-level aggregation starts. 0 - the threshold is not set.", 0) \
+    M(SettingUInt64, group_by_two_level_threshold_bytes, 100000000, "From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered.", 0) \
+    M(SettingBool, distributed_aggregation_memory_efficient, false, "Is the memory-saving mode of distributed aggregation enabled.", 0) \
+    M(SettingUInt64, aggregation_memory_efficient_merge_threads, 0, "Number of threads to use for merge intermediate aggregation results in memory efficient mode. When bigger, then more memory is consumed. 0 means - same as 'max_threads'.", 0) \
    \
-    M(SettingUInt64, max_parallel_replicas, 1, "The maximum number of replicas of each shard used when the query is executed. For consistency (to get different parts of the same partition), this option only works for the specified sampling key. The lag of the replicas is not controlled.") \
-    M(SettingUInt64, parallel_replicas_count, 0, "") \
-    M(SettingUInt64, parallel_replica_offset, 0, "") \
+    M(SettingUInt64, max_parallel_replicas, 1, "The maximum number of replicas of each shard used when the query is executed. For consistency (to get different parts of the same partition), this option only works for the specified sampling key. The lag of the replicas is not controlled.", 0) \
+    M(SettingUInt64, parallel_replicas_count, 0, "", 0) \
+    M(SettingUInt64, parallel_replica_offset, 0, "", 0) \
    \
-    M(SettingBool, skip_unavailable_shards, false, "If 1, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.") \
+    M(SettingBool, skip_unavailable_shards, false, "If 1, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
    \
-    M(SettingBool, distributed_group_by_no_merge, false, "Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.") \
-    M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.") \
+    M(SettingBool, distributed_group_by_no_merge, false, "Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.", 0) \
+    M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.", 0) \
    \
-    M(SettingUInt64, merge_tree_min_rows_for_concurrent_read, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized.") \
-    M(SettingUInt64, merge_tree_min_bytes_for_concurrent_read, (24 * 10 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized.") \
-    M(SettingUInt64, merge_tree_min_rows_for_seek, 0, "You can skip reading more than that number of rows at the price of one seek per file.") \
-    M(SettingUInt64, merge_tree_min_bytes_for_seek, 0, "You can skip reading more than that number of bytes at the price of one seek per file.") \
-    M(SettingUInt64, merge_tree_coarse_index_granularity, 8, "If the index segment can contain the required keys, divide it into as many parts and recursively check them.") \
-    M(SettingUInt64, merge_tree_max_rows_to_use_cache, (128 * 8192), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
-    M(SettingUInt64, merge_tree_max_bytes_to_use_cache, (192 * 10 * 1024 * 1024), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
+    M(SettingUInt64, merge_tree_min_rows_for_concurrent_read, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized.", 0) \
+    M(SettingUInt64, merge_tree_min_bytes_for_concurrent_read, (24 * 10 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized.", 0) \
+    M(SettingUInt64, merge_tree_min_rows_for_seek, 0, "You can skip reading more than that number of rows at the price of one seek per file.", 0) \
+    M(SettingUInt64, merge_tree_min_bytes_for_seek, 0, "You can skip reading more than that number of bytes at the price of one seek per file.", 0) \
+    M(SettingUInt64, merge_tree_coarse_index_granularity, 8, "If the index segment can contain the required keys, divide it into as many parts and recursively check them.", 0) \
+    M(SettingUInt64, merge_tree_max_rows_to_use_cache, (128 * 8192), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)", 0) \
+    M(SettingUInt64, merge_tree_max_bytes_to_use_cache, (192 * 10 * 1024 * 1024), "The maximum number of bytes per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)", 0) \
    \
-    M(SettingBool, merge_tree_uniform_read_distribution, true, "Distribute read from MergeTree over threads evenly, ensuring stable average execution time of each thread within one read operation.") \
+    M(SettingBool, merge_tree_uniform_read_distribution, true, "Distribute read from MergeTree over threads evenly, ensuring stable average execution time of each thread within one read operation.", 0) \
    \
-    M(SettingUInt64, mysql_max_rows_to_insert, 65536, "The maximum number of rows in MySQL batch insertion of the MySQL storage engine") \
+    M(SettingUInt64, mysql_max_rows_to_insert, 65536, "The maximum number of rows in MySQL batch insertion of the MySQL storage engine", 0) \
    \
-    M(SettingUInt64, optimize_min_equality_disjunction_chain_length, 3, "The minimum length of the expression `expr = x1 OR ... expr = xN` for optimization ") \
+    M(SettingUInt64, optimize_min_equality_disjunction_chain_length, 3, "The minimum length of the expression `expr = x1 OR ... expr = xN` for optimization ", 0) \
    \
-    M(SettingUInt64, min_bytes_to_use_direct_io, 0, "The minimum number of bytes for reading the data with O_DIRECT option during SELECT queries execution. 0 - disabled.") \
+    M(SettingUInt64, min_bytes_to_use_direct_io, 0, "The minimum number of bytes for reading the data with O_DIRECT option during SELECT queries execution. 0 - disabled.", 0) \
    \
-    M(SettingBool, force_index_by_date, 0, "Throw an exception if there is a partition key in a table, and it is not used.") \
-    M(SettingBool, force_primary_key, 0, "Throw an exception if there is primary key in a table, and it is not used.") \
+    M(SettingBool, force_index_by_date, 0, "Throw an exception if there is a partition key in a table, and it is not used.", 0) \
+    M(SettingBool, force_primary_key, 0, "Throw an exception if there is primary key in a table, and it is not used.", 0) \
    \
-    M(SettingUInt64, mark_cache_min_lifetime, 10000, "If the maximum size of mark_cache is exceeded, delete only records older than mark_cache_min_lifetime seconds.") \
+    M(SettingUInt64, mark_cache_min_lifetime, 10000, "If the maximum size of mark_cache is exceeded, delete only records older than mark_cache_min_lifetime seconds.", 0) \
    \
-    M(SettingFloat, max_streams_to_max_threads_ratio, 1, "Allows you to use more sources than the number of threads - to more evenly distribute work across threads. It is assumed that this is a temporary solution, since it will be possible in the future to make the number of sources equal to the number of threads, but for each source to dynamically select available work for itself.") \
-    M(SettingFloat, max_streams_multiplier_for_merge_tables, 5, "Ask more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and especially helpful when merged tables differ in size.") \
+    M(SettingFloat, max_streams_to_max_threads_ratio, 1, "Allows you to use more sources than the number of threads - to more evenly distribute work across threads. It is assumed that this is a temporary solution, since it will be possible in the future to make the number of sources equal to the number of threads, but for each source to dynamically select available work for itself.", 0) \
+    M(SettingFloat, max_streams_multiplier_for_merge_tables, 5, "Ask more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and is especially helpful when merged tables differ in size.", 0) \
    \
-    M(SettingString, network_compression_method, "LZ4", "Allows you to select the method of data compression when writing.") \
+    M(SettingString, network_compression_method, "LZ4", "Allows you to select the method of data compression when writing.", 0) \
    \
-    M(SettingInt64, network_zstd_compression_level, 1, "Allows you to select the level of ZSTD compression.") \
+    M(SettingInt64, network_zstd_compression_level, 1, "Allows you to select the level of ZSTD compression.", 0) \
    \
-    M(SettingUInt64, priority, 0, "Priority of the query. 1 - the highest, higher value - lower priority; 0 - do not use priorities.") \
-    M(SettingInt64, os_thread_priority, 0, "If non zero - set corresponding 'nice' value for query processing threads. Can be used to adjust query priority for OS scheduler.") \
+    M(SettingUInt64, priority, 0, "Priority of the query. 1 - the highest, higher value - lower priority; 0 - do not use priorities.", 0) \
+    M(SettingInt64, os_thread_priority, 0, "If non zero - set corresponding 'nice' value for query processing threads. Can be used to adjust query priority for OS scheduler.", 0) \
    \
-    M(SettingBool, log_queries, 0, "Log requests and write the log to the system table.") \
+    M(SettingBool, log_queries, 0, "Log requests and write the log to the system table.", 0) \
    \
-    M(SettingUInt64, log_queries_cut_to_length, 100000, "If query length is greater than specified threshold (in bytes), then cut query when writing to query log. Also limit length of printed query in ordinary text log.") \
+    M(SettingUInt64, log_queries_cut_to_length, 100000, "If query length is greater than specified threshold (in bytes), then cut query when writing to query log. Also limit length of printed query in ordinary text log.", 0) \
    \
-    M(SettingDistributedProductMode, distributed_product_mode, DistributedProductMode::DENY, "How are distributed subqueries performed inside IN or JOIN sections?") \
+    M(SettingDistributedProductMode, distributed_product_mode, DistributedProductMode::DENY, "How are distributed subqueries performed inside IN or JOIN sections?", 0) \
    \
-    M(SettingUInt64, max_concurrent_queries_for_user, 0, "The maximum number of concurrent requests per user.") \
+    M(SettingUInt64, max_concurrent_queries_for_user, 0, "The maximum number of concurrent requests per user.", 0) \
    \
-    M(SettingBool, insert_deduplicate, true, "For INSERT queries in the replicated table, specifies that deduplication of insertings blocks should be preformed") \
+    M(SettingBool, insert_deduplicate, true, "For INSERT queries in the replicated table, specifies that deduplication of inserted blocks should be performed.", 0) \
    \
-    M(SettingUInt64, insert_quorum, 0, "For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled.") \
-    M(SettingMilliseconds, insert_quorum_timeout, 600000, "") \
-    M(SettingUInt64, select_sequential_consistency, 0, "For SELECT queries from the replicated table, throw an exception if the replica does not have a chunk written with the quorum; do not read the parts that have not yet been written with the quorum.") \
-    M(SettingUInt64, table_function_remote_max_addresses, 1000, "The maximum number of different shards and the maximum number of replicas of one shard in the `remote` function.") \
-    M(SettingMilliseconds, read_backoff_min_latency_ms, 1000, "Setting to reduce the number of threads in case of slow reads. Pay attention only to reads that took at least that much time.") \
-    M(SettingUInt64, read_backoff_max_throughput, 1048576, "Settings to reduce the number of threads in case of slow reads. Count events when the read bandwidth is less than that many bytes per second.") \
-    M(SettingMilliseconds, read_backoff_min_interval_between_events_ms, 1000, "Settings to reduce the number of threads in case of slow reads. Do not pay attention to the event, if the previous one has passed less than a certain amount of time.") \
-    M(SettingUInt64, read_backoff_min_events, 2, "Settings to reduce the number of threads in case of slow reads. The number of events after which the number of threads will be reduced.") \
+    M(SettingUInt64, insert_quorum, 0, "For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled.", 0) \
+    M(SettingMilliseconds, insert_quorum_timeout, 600000, "", 0) \
+    M(SettingUInt64, select_sequential_consistency, 0, "For SELECT queries from the replicated table, throw an exception if the replica does not have a chunk written with the quorum; do not read the parts that have not yet been written with the quorum.", 0) \
+    M(SettingUInt64, table_function_remote_max_addresses, 1000, "The maximum number of different shards and the maximum number of replicas of one shard in the `remote` function.", 0) \
+    M(SettingMilliseconds, read_backoff_min_latency_ms, 1000, "Setting to reduce the number of threads in case of slow reads. Pay attention only to reads that took at least that much time.", 0) \
+    M(SettingUInt64, read_backoff_max_throughput, 1048576, "Settings to reduce the number of threads in case of slow reads. Count events when the read bandwidth is less than that many bytes per second.", 0) \
+    M(SettingMilliseconds, read_backoff_min_interval_between_events_ms, 1000, "Settings to reduce the number of threads in case of slow reads. Do not pay attention to the event, if the previous one has passed less than a certain amount of time.", 0) \
+    M(SettingUInt64, read_backoff_min_events, 2, "Settings to reduce the number of threads in case of slow reads. The number of events after which the number of threads will be reduced.", 0) \
    \
-    M(SettingFloat, memory_tracker_fault_probability, 0., "For testing of `exception safety` - throw an exception every time you allocate memory with the specified probability.") \
+    M(SettingFloat, memory_tracker_fault_probability, 0., "For testing of `exception safety` - throw an exception every time you allocate memory with the specified probability.", 0) \
    \
-    M(SettingBool, enable_http_compression, 0, "Compress the result if the client over HTTP said that it understands data compressed by gzip or deflate.") \
-    M(SettingInt64, http_zlib_compression_level, 3, "Compression level - used if the client on HTTP said that it understands data compressed by gzip or deflate.") \
+    M(SettingBool, enable_http_compression, 0, "Compress the result if the client over HTTP said that it understands data compressed by gzip or deflate.", 0) \
+    M(SettingInt64, http_zlib_compression_level, 3, "Compression level - used if the client on HTTP said that it understands data compressed by gzip or deflate.", 0) \
    \
-    M(SettingBool, http_native_compression_disable_checksumming_on_decompress, 0, "If you uncompress the POST data from the client compressed by the native format, do not check the checksum.") \
+    M(SettingBool, http_native_compression_disable_checksumming_on_decompress, 0, "If you uncompress the POST data from the client compressed by the native format, do not check the checksum.", 0) \
    \
-    M(SettingString, count_distinct_implementation, "uniqExact", "What aggregate function to use for implementation of count(DISTINCT ...)") \
+    M(SettingString, count_distinct_implementation, "uniqExact", "What aggregate function to use for implementation of count(DISTINCT ...)", 0) \
    \
-    M(SettingBool, output_format_write_statistics, true, "Write statistics about read rows, bytes, time elapsed in suitable output formats.") \
+    M(SettingBool, output_format_write_statistics, true, "Write statistics about read rows, bytes, time elapsed in suitable output formats.", 0) \
    \
-    M(SettingBool, add_http_cors_header, false, "Write add http CORS header.") \
+    M(SettingBool, add_http_cors_header, false, "Write the HTTP CORS header.", 0) \
    \
-    M(SettingUInt64, max_http_get_redirects, 0, "Max number of http GET redirects hops allowed. Make sure additional security measures are in place to prevent a malicious server to redirect your requests to unexpected services.") \
+    M(SettingUInt64, max_http_get_redirects, 0, "Max number of HTTP GET redirect hops allowed. Make sure additional security measures are in place to prevent a malicious server from redirecting your requests to unexpected services.", 0) \
    \
-    M(SettingBool, input_format_skip_unknown_fields, false, "Skip columns with unknown names from input data (it works for JSONEachRow, CSVWithNames, TSVWithNames and TSKV formats).") \
-    M(SettingBool, input_format_with_names_use_header, false, "For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header.") \
-    M(SettingBool, input_format_import_nested_json, false, "Map nested JSON data to nested tables (it works for JSONEachRow format).") \
-    M(SettingBool, input_format_defaults_for_omitted_fields, true, "For input data calculate default expressions for omitted fields (it works for JSONEachRow, CSV and TSV formats).") \
-    M(SettingBool, input_format_tsv_empty_as_default, false, "Treat empty fields in TSV input as default values.") \
-    M(SettingBool, input_format_null_as_default, false, "For text input formats initialize null fields with default values if data type of this field is not nullable") \
+    M(SettingBool, input_format_skip_unknown_fields, false, "Skip columns with unknown names from input data (it works for JSONEachRow, CSVWithNames, TSVWithNames and TSKV formats).", 0) \
+    M(SettingBool, input_format_with_names_use_header, false, "For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header.", 0) \
+    M(SettingBool, input_format_import_nested_json, false, "Map nested JSON data to nested tables (it works for JSONEachRow format).", 0) \
+    M(SettingBool, input_format_defaults_for_omitted_fields, true, "For input data calculate default expressions for omitted fields (it works for JSONEachRow, CSV and TSV formats).", 0) \
+    M(SettingBool, input_format_tsv_empty_as_default, false, "Treat empty fields in TSV input as default values.", 0) \
+    M(SettingBool, input_format_null_as_default, false, "For text input formats initialize null fields with default values if data type of this field is not nullable", 0) \
    \
-    M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.") \
-    M(SettingBool, input_format_values_deduce_templates_of_expressions, false, "For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.") \
-    M(SettingBool, input_format_values_accurate_types_of_literals, true, "For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.") \
+    M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.", 0) \
+    M(SettingBool, input_format_values_deduce_templates_of_expressions, false, "For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.", 0) \
+    M(SettingBool, input_format_values_accurate_types_of_literals, true, "For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.", 0) \
    \
-    M(SettingBool, output_format_json_quote_64bit_integers, true, "Controls quoting of 64-bit integers in JSON output format.") \
+    M(SettingBool, output_format_json_quote_64bit_integers, true, "Controls quoting of 64-bit integers in JSON output format.", 0) \
    \
-    M(SettingBool, output_format_json_quote_denormals, false, "Enables '+nan', '-nan', '+inf', '-inf' outputs in JSON output format.") \
+    M(SettingBool, output_format_json_quote_denormals, false, "Enables '+nan', '-nan', '+inf', '-inf' outputs in JSON output format.", 0) \
    \
-    M(SettingBool, output_format_json_escape_forward_slashes, true, "Controls escaping forward slashes for string outputs in JSON output format. This is intended for compatibility with JavaScript. Don't confuse with backslashes that are always escaped.") \
+    M(SettingBool, output_format_json_escape_forward_slashes, true, "Controls escaping forward slashes for string outputs in JSON output format. This is intended for compatibility with JavaScript. Don't confuse with backslashes that are always escaped.", 0) \
    \
-    M(SettingUInt64, output_format_pretty_max_rows, 10000, "Rows limit for Pretty formats.") \
-    M(SettingUInt64, output_format_pretty_max_column_pad_width, 250, "Maximum width to pad all values in a column in Pretty formats.") \
-    M(SettingBool, output_format_pretty_color, true, "Use ANSI escape sequences to paint colors in Pretty formats") \
-    M(SettingUInt64, output_format_parquet_row_group_size, 1000000, "Row group size in rows.") \
+    M(SettingUInt64, output_format_pretty_max_rows, 10000, "Rows limit for Pretty formats.", 0) \
+    M(SettingUInt64, output_format_pretty_max_column_pad_width, 250, "Maximum width to pad all values in a column in Pretty formats.", 0) \
+    M(SettingBool, output_format_pretty_color, true, "Use ANSI escape sequences to paint colors in Pretty formats", 0) \
+    M(SettingUInt64, output_format_parquet_row_group_size, 1000000, "Row group size in rows.", 0) \
    \
-    M(SettingBool, use_client_time_zone, false, "Use client timezone for interpreting DateTime string values, instead of adopting server timezone.") \
+    M(SettingBool, use_client_time_zone, false, "Use client timezone for interpreting DateTime string values, instead of adopting server timezone.", 0) \
    \
-    M(SettingBool, send_progress_in_http_headers, false, "Send progress notifications using X-ClickHouse-Progress headers. Some clients do not support high amount of HTTP headers (Python requests in particular), so it is disabled by default.") \
+    M(SettingBool, send_progress_in_http_headers, false, "Send progress notifications using X-ClickHouse-Progress headers. Some clients do not support a large number of HTTP headers (Python requests in particular), so it is disabled by default.", 0) \
    \
-    M(SettingUInt64, http_headers_progress_interval_ms, 100, "Do not send HTTP headers X-ClickHouse-Progress more frequently than at each specified interval.") \
+    M(SettingUInt64, http_headers_progress_interval_ms, 100, "Do not send HTTP headers X-ClickHouse-Progress more frequently than at each specified interval.", 0) \
    \
-    M(SettingBool, fsync_metadata, 1, "Do fsync after changing metadata for tables and databases (.sql files). Could be disabled in case of poor latency on server with high load of DDL queries and high load of disk subsystem.") \
+    M(SettingBool, fsync_metadata, 1, "Do fsync after changing metadata for tables and databases (.sql files). Could be disabled in case of poor latency on server with high load of DDL queries and high load of disk subsystem.", 0) \
    \
-    M(SettingUInt64, input_format_allow_errors_num, 0, "Maximum absolute amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.") \
-    M(SettingFloat, input_format_allow_errors_ratio, 0, "Maximum relative amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.") \
+    M(SettingUInt64, input_format_allow_errors_num, 0, "Maximum absolute amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.", 0) \
+    M(SettingFloat, input_format_allow_errors_ratio, 0, "Maximum relative amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.", 0) \
    \
-    M(SettingBool, join_use_nulls, 0, "Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type.") \
+    M(SettingBool, join_use_nulls, 0, "Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type.", 0) \
    \
-    M(SettingJoinStrictness, join_default_strictness, JoinStrictness::ALL, "Set default strictness in JOIN query. Possible values: empty string, 'ANY', 'ALL'. If empty, query without strictness will throw exception.") \
-    M(SettingBool, any_join_distinct_right_table_keys, false, "Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for 't1 ANY LEFT JOIN t2' and 't2 ANY RIGHT JOIN t1'. ANY RIGHT JOIN needs one-to-many keys maping to be consistent with LEFT one.") \
+    M(SettingJoinStrictness, join_default_strictness, JoinStrictness::ALL, "Set default strictness in JOIN query. Possible values: empty string, 'ANY', 'ALL'. If empty, query without strictness will throw exception.", 0) \
+    M(SettingBool, any_join_distinct_right_table_keys, false, "Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing, non-equal results for 't1 ANY LEFT JOIN t2' and 't2 ANY RIGHT JOIN t1'. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with the LEFT one.", 0) \
    \
-    M(SettingUInt64, preferred_block_size_bytes, 1000000, "") \
+    M(SettingUInt64, preferred_block_size_bytes, 1000000, "", 0) \
    \
-    M(SettingUInt64, max_replica_delay_for_distributed_queries, 300, "If set, distributed queries of Replicated tables will choose servers with replication delay in seconds less than the specified value (not inclusive). Zero means do not take delay into account.") \
-    M(SettingBool, fallback_to_stale_replicas_for_distributed_queries, 1, "Suppose max_replica_delay_for_distributed_queries is set and all replicas for the queried table are stale. If this setting is enabled, the query will be performed anyway, otherwise the error will be reported.") \
-    M(SettingUInt64, preferred_max_column_in_block_size_bytes, 0, "Limit on max column size in block while reading. Helps to decrease cache misses count. Should be close to L2 cache size.") \
+    M(SettingUInt64, max_replica_delay_for_distributed_queries, 300, "If set, distributed queries of Replicated tables will choose servers with replication delay in seconds less than the specified value (not inclusive). Zero means do not take delay into account.", 0) \
+    M(SettingBool, fallback_to_stale_replicas_for_distributed_queries, 1, "Suppose max_replica_delay_for_distributed_queries is set and all replicas for the queried table are stale. If this setting is enabled, the query will be performed anyway, otherwise the error will be reported.", 0) \
+    M(SettingUInt64, preferred_max_column_in_block_size_bytes, 0, "Limit on max column size in block while reading. Helps to decrease cache misses count. Should be close to L2 cache size.", 0) \
    \
-    M(SettingBool, insert_distributed_sync, false, "If setting is enabled, insert query into distributed waits until data will be sent to all nodes in cluster.") \
-    M(SettingUInt64, insert_distributed_timeout, 0, "Timeout for insert query into distributed. Setting is used only with insert_distributed_sync enabled. Zero value means no timeout.") \
-    M(SettingInt64, distributed_ddl_task_timeout, 180, "Timeout for DDL query responses from all hosts in cluster. If a ddl request has not been performed on all hosts, a response will contain a timeout error and a request will be executed in an async mode. Negative value means infinite.") \
-    M(SettingMilliseconds, stream_flush_interval_ms, 7500, "Timeout for flushing data from streaming storages.") \
-    M(SettingMilliseconds, stream_poll_timeout_ms, 500, "Timeout for polling data from/to streaming storages.") \
+    M(SettingBool, insert_distributed_sync, false, "If setting is enabled, an INSERT query into a Distributed table waits until the data has been sent to all nodes in the cluster.", 0) \
+    M(SettingUInt64, insert_distributed_timeout, 0, "Timeout for insert query into distributed. Setting is used only with insert_distributed_sync enabled. Zero value means no timeout.", 0) \
+    M(SettingInt64, distributed_ddl_task_timeout, 180, "Timeout for DDL query responses from all hosts in cluster. If a ddl request has not been performed on all hosts, a response will contain a timeout error and a request will be executed in an async mode. Negative value means infinite.", 0) \
+    M(SettingMilliseconds, stream_flush_interval_ms, 7500, "Timeout for flushing data from streaming storages.", 0) \
+    M(SettingMilliseconds, stream_poll_timeout_ms, 500, "Timeout for polling data from/to streaming storages.", 0) \
    \
-    M(SettingString, format_schema, "", "Schema identifier (used by schema-based formats)") \
-    M(SettingString, format_template_resultset, "", "Path to file which contains format string for result set (for Template format)") \
-    M(SettingString, format_template_row, "", "Path to file which contains format string for rows (for Template format)") \
-    M(SettingString, format_template_rows_between_delimiter, "\n", "Delimiter between rows (for Template format)") \
+    M(SettingString, format_schema, "", "Schema identifier (used by schema-based formats)", 0) \
+    M(SettingString, format_template_resultset, "", "Path to file which contains format string for result set (for Template format)", 0) \
+    M(SettingString, format_template_row, "", "Path to file which contains format string for rows (for Template format)", 0) \
+    M(SettingString, format_template_rows_between_delimiter, "\n", "Delimiter between rows (for Template format)", 0) \
    \
-    M(SettingString, format_custom_escaping_rule, "Escaped", "Field escaping rule (for CustomSeparated format)") \
-    M(SettingString, format_custom_field_delimiter, "\t", "Delimiter between fields (for CustomSeparated format)") \
-    M(SettingString, format_custom_row_before_delimiter, "", "Delimiter before field of the first column (for CustomSeparated format)") \
-    M(SettingString, format_custom_row_after_delimiter, "\n", "Delimiter after field of the last column (for CustomSeparated format)") \
-    M(SettingString, format_custom_row_between_delimiter, "", "Delimiter between rows (for CustomSeparated format)") \
-    M(SettingString, format_custom_result_before_delimiter, "", "Prefix before result set (for CustomSeparated format)") \
-    M(SettingString, format_custom_result_after_delimiter, "", "Suffix after result set (for CustomSeparated format)") \
+    M(SettingString, format_custom_escaping_rule, "Escaped", "Field escaping rule (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_field_delimiter, "\t", "Delimiter between fields (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_row_before_delimiter, "", "Delimiter before field of the first column (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_row_after_delimiter, "\n", "Delimiter after field of the last column (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_row_between_delimiter, "", "Delimiter between rows (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_result_before_delimiter, "", "Prefix before result set (for CustomSeparated format)", 0) \
+    M(SettingString, format_custom_result_after_delimiter, "", "Suffix after result set (for CustomSeparated format)", 0) \
    \
-    M(SettingBool, insert_allow_materialized_columns, 0, "If setting is enabled, Allow materialized columns in INSERT.") \
-    M(SettingSeconds, http_connection_timeout, DEFAULT_HTTP_READ_BUFFER_CONNECTION_TIMEOUT, "HTTP connection timeout.") \
-    M(SettingSeconds, http_send_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP send timeout") \
-    M(SettingSeconds, http_receive_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP receive timeout") \
-    M(SettingBool, optimize_throw_if_noop, false, "If setting is enabled and OPTIMIZE query didn't actually assign a merge then an explanatory exception is thrown") \
-    M(SettingBool, use_index_for_in_with_subqueries, true, "Try using an index if there is a subquery or a table expression on the right side of the IN operator.") \
-    M(SettingBool, joined_subquery_requires_alias, false, "Force joined subqueries to have aliases for correct name qualification.") \
-    M(SettingBool, empty_result_for_aggregation_by_empty_set, false, "Return empty result when aggregating without keys on empty set.") \
-    M(SettingBool, allow_distributed_ddl, true, "If it is set to true, then a user is allowed to executed distributed DDL queries.") \
-    M(SettingUInt64, odbc_max_field_size, 1024, "Max size of filed can be read from ODBC dictionary. Long strings are truncated.") \
-    M(SettingUInt64, query_profiler_real_time_period_ns, 1000000000, "Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.") \
-    M(SettingUInt64, query_profiler_cpu_time_period_ns, 1000000000, "Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.") \
+    M(SettingBool, insert_allow_materialized_columns, 0, "If setting is enabled, allow materialized columns in INSERT.", 0) \
+    M(SettingSeconds, http_connection_timeout, DEFAULT_HTTP_READ_BUFFER_CONNECTION_TIMEOUT, "HTTP connection timeout.", 0) \
+    M(SettingSeconds, http_send_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP send timeout", 0) \
+    M(SettingSeconds, http_receive_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP receive timeout", 0) \
+    M(SettingBool, optimize_throw_if_noop, false, "If setting is enabled and OPTIMIZE query didn't actually assign a merge then an explanatory exception is thrown", 0) \
+    M(SettingBool, use_index_for_in_with_subqueries, true, "Try using an index if there is a subquery or a table expression on the right side of the IN operator.", 0) \
+    M(SettingBool, joined_subquery_requires_alias, false, "Force joined subqueries to have aliases for correct name qualification.", 0) \
+    M(SettingBool, empty_result_for_aggregation_by_empty_set, false, "Return empty result when aggregating without keys on empty set.", 0) \
+    M(SettingBool, allow_distributed_ddl, true, "If it is set to true, then a user is allowed to execute distributed DDL queries.", 0) \
+    M(SettingUInt64, odbc_max_field_size, 1024, "Max size of a field that can be read from an ODBC dictionary. Long strings are truncated.", 0) \
+    M(SettingUInt64, query_profiler_real_time_period_ns, 1000000000, "Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
+    M(SettingUInt64, query_profiler_cpu_time_period_ns, 1000000000, "Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
    \
    \
    /** Limits during query execution are part of the settings. \
@ -257,135 +260,135 @@ struct Settings : public SettingsCollection<Settings>
      * Almost all limits apply to each stream individually. \
      */ \
    \
-    M(SettingUInt64, max_rows_to_read, 0, "Limit on read rows from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.") \
-    M(SettingUInt64, max_bytes_to_read, 0, "Limit on read bytes (after decompression) from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.") \
-    M(SettingOverflowMode, read_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_to_read, 0, "Limit on read rows from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.", 0) \
+    M(SettingUInt64, max_bytes_to_read, 0, "Limit on read bytes (after decompression) from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.", 0) \
+    M(SettingOverflowMode, read_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_rows_to_group_by, 0, "") \
-    M(SettingOverflowModeGroupBy, group_by_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
-    M(SettingUInt64, max_bytes_before_external_group_by, 0, "") \
+    M(SettingUInt64, max_rows_to_group_by, 0, "", 0) \
+    M(SettingOverflowModeGroupBy, group_by_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
+    M(SettingUInt64, max_bytes_before_external_group_by, 0, "", 0) \
    \
-    M(SettingUInt64, max_rows_to_sort, 0, "") \
-    M(SettingUInt64, max_bytes_to_sort, 0, "") \
-    M(SettingOverflowMode, sort_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
-    M(SettingUInt64, max_bytes_before_external_sort, 0, "") \
-    M(SettingUInt64, max_bytes_before_remerge_sort, 1000000000, "In case of ORDER BY with LIMIT, when memory usage is higher than specified threshold, perform additional steps of merging blocks before final merge to keep just top LIMIT rows.") \
+    M(SettingUInt64, max_rows_to_sort, 0, "", 0) \
+    M(SettingUInt64, max_bytes_to_sort, 0, "", 0) \
+    M(SettingOverflowMode, sort_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
+    M(SettingUInt64, max_bytes_before_external_sort, 0, "", 0) \
+    M(SettingUInt64, max_bytes_before_remerge_sort, 1000000000, "In case of ORDER BY with LIMIT, when memory usage is higher than specified threshold, perform additional steps of merging blocks before final merge to keep just top LIMIT rows.", 0) \
    \
-    M(SettingUInt64, max_result_rows, 0, "Limit on result size in rows. Also checked for intermediate data sent from remote servers.") \
-    M(SettingUInt64, max_result_bytes, 0, "Limit on result size in bytes (uncompressed). Also checked for intermediate data sent from remote servers.") \
-    M(SettingOverflowMode, result_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_result_rows, 0, "Limit on result size in rows. Also checked for intermediate data sent from remote servers.", 0) \
+    M(SettingUInt64, max_result_bytes, 0, "Limit on result size in bytes (uncompressed). Also checked for intermediate data sent from remote servers.", 0) \
+    M(SettingOverflowMode, result_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
    /* TODO: Check also when merging and finalizing aggregate functions. */ \
-    M(SettingSeconds, max_execution_time, 0, "") \
-    M(SettingOverflowMode, timeout_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingSeconds, max_execution_time, 0, "", 0) \
+    M(SettingOverflowMode, timeout_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, min_execution_speed, 0, "Minimum number of execution rows per second.") \
-    M(SettingUInt64, max_execution_speed, 0, "Maximum number of execution rows per second.") \
-    M(SettingUInt64, min_execution_speed_bytes, 0, "Minimum number of execution bytes per second.") \
-    M(SettingUInt64, max_execution_speed_bytes, 0, "Maximum number of execution bytes per second.") \
-    M(SettingSeconds, timeout_before_checking_execution_speed, 0, "Check that the speed is not too low after the specified time has elapsed.") \
+    M(SettingUInt64, min_execution_speed, 0, "Minimum number of execution rows per second.", 0) \
+    M(SettingUInt64, max_execution_speed, 0, "Maximum number of execution rows per second.", 0) \
+    M(SettingUInt64, min_execution_speed_bytes, 0, "Minimum number of execution bytes per second.", 0) \
+    M(SettingUInt64, max_execution_speed_bytes, 0, "Maximum number of execution bytes per second.", 0) \
+    M(SettingSeconds, timeout_before_checking_execution_speed, 0, "Check that the speed is not too low after the specified time has elapsed.", 0) \
    \
-    M(SettingUInt64, max_columns_to_read, 0, "") \
-    M(SettingUInt64, max_temporary_columns, 0, "") \
-    M(SettingUInt64, max_temporary_non_const_columns, 0, "") \
+    M(SettingUInt64, max_columns_to_read, 0, "", 0) \
+    M(SettingUInt64, max_temporary_columns, 0, "", 0) \
+    M(SettingUInt64, max_temporary_non_const_columns, 0, "", 0) \
    \
-    M(SettingUInt64, max_subquery_depth, 100, "") \
-    M(SettingUInt64, max_pipeline_depth, 1000, "") \
-    M(SettingUInt64, max_ast_depth, 1000, "Maximum depth of query syntax tree. Checked after parsing.") \
-    M(SettingUInt64, max_ast_elements, 50000, "Maximum size of query syntax tree in number of nodes. Checked after parsing.") \
-    M(SettingUInt64, max_expanded_ast_elements, 500000, "Maximum size of query syntax tree in number of nodes after expansion of aliases and the asterisk.") \
+    M(SettingUInt64, max_subquery_depth, 100, "", 0) \
+    M(SettingUInt64, max_pipeline_depth, 1000, "", 0) \
+    M(SettingUInt64, max_ast_depth, 1000, "Maximum depth of query syntax tree. Checked after parsing.", 0) \
+    M(SettingUInt64, max_ast_elements, 50000, "Maximum size of query syntax tree in number of nodes. Checked after parsing.", 0) \
+    M(SettingUInt64, max_expanded_ast_elements, 500000, "Maximum size of query syntax tree in number of nodes after expansion of aliases and the asterisk.", 0) \
    \
-    M(SettingUInt64, readonly, 0, "0 - everything is allowed. 1 - only read requests. 2 - only read requests, as well as changing settings, except for the 'readonly' setting.") \
+    M(SettingUInt64, readonly, 0, "0 - everything is allowed. 1 - only read requests. 2 - only read requests, as well as changing settings, except for the 'readonly' setting.", 0) \
    \
-    M(SettingUInt64, max_rows_in_set, 0, "Maximum size of the set (in number of elements) resulting from the execution of the IN section.") \
-    M(SettingUInt64, max_bytes_in_set, 0, "Maximum size of the set (in bytes in memory) resulting from the execution of the IN section.") \
-    M(SettingOverflowMode, set_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_in_set, 0, "Maximum size of the set (in number of elements) resulting from the execution of the IN section.", 0) \
+    M(SettingUInt64, max_bytes_in_set, 0, "Maximum size of the set (in bytes in memory) resulting from the execution of the IN section.", 0) \
+    M(SettingOverflowMode, set_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_rows_in_join, 0, "Maximum size of the hash table for JOIN (in number of rows).") \
-    M(SettingUInt64, max_bytes_in_join, 0, "Maximum size of the hash table for JOIN (in number of bytes in memory).") \
-    M(SettingOverflowMode, join_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
-    M(SettingBool, join_any_take_last_row, false, "When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key.") \
-    M(SettingBool, partial_merge_join, false, "Use partial merge join instead of hash join for LEFT and INNER JOINs.") \
-    M(SettingBool, partial_merge_join_optimizations, false, "Enable optimizations in partial merge join") \
-    M(SettingUInt64, default_max_bytes_in_join, 100000000, "Maximum size of right-side table if limit's required but max_bytes_in_join is not set.") \
-    M(SettingUInt64, partial_merge_join_rows_in_right_blocks, 10000, "Split right-hand joining data in blocks of specified size. It's a portion of data indexed by min-max values and possibly unloaded on disk.") \
-    M(SettingUInt64, partial_merge_join_rows_in_left_blocks, 10000, "Group left-hand joining data in bigger blocks. Setting it to a bigger value increase JOIN performance and memory usage.") \
+    M(SettingUInt64, max_rows_in_join, 0, "Maximum size of the hash table for JOIN (in number of rows).", 0) \
+    M(SettingUInt64, max_bytes_in_join, 0, "Maximum size of the hash table for JOIN (in number of bytes in memory).", 0) \
+    M(SettingOverflowMode, join_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
+    M(SettingBool, join_any_take_last_row, false, "When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key.", 0) \
+    M(SettingBool, partial_merge_join, false, "Use partial merge join instead of hash join for LEFT and INNER JOINs.", 0) \
+    M(SettingBool, partial_merge_join_optimizations, false, "Enable optimizations in partial merge join", 0) \
+    M(SettingUInt64, default_max_bytes_in_join, 100000000, "Maximum size of the right-side table if a limit is required but max_bytes_in_join is not set.", 0) \
+    M(SettingUInt64, partial_merge_join_rows_in_right_blocks, 10000, "Split right-hand joining data into blocks of the specified size. Each block is a portion of data indexed by min-max values and possibly unloaded to disk.", 0) \
+    M(SettingUInt64, partial_merge_join_rows_in_left_blocks, 10000, "Group left-hand joining data into bigger blocks. Setting it to a bigger value increases JOIN performance and memory usage.", 0) \
    \
-    M(SettingUInt64, max_rows_to_transfer, 0, "Maximum size (in rows) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.") \
-    M(SettingUInt64, max_bytes_to_transfer, 0, "Maximum size (in uncompressed bytes) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.") \
-    M(SettingOverflowMode, transfer_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_to_transfer, 0, "Maximum size (in rows) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.", 0) \
+    M(SettingUInt64, max_bytes_to_transfer, 0, "Maximum size (in uncompressed bytes) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.", 0) \
+    M(SettingOverflowMode, transfer_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_rows_in_distinct, 0, "Maximum number of elements during execution of DISTINCT.") \
-    M(SettingUInt64, max_bytes_in_distinct, 0, "Maximum total size of state (in uncompressed bytes) in memory for the execution of DISTINCT.") \
-    M(SettingOverflowMode, distinct_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
+    M(SettingUInt64, max_rows_in_distinct, 0, "Maximum number of elements during execution of DISTINCT.", 0) \
+    M(SettingUInt64, max_bytes_in_distinct, 0, "Maximum total size of state (in uncompressed bytes) in memory for the execution of DISTINCT.", 0) \
+    M(SettingOverflowMode, distinct_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.", 0) \
    \
-    M(SettingUInt64, max_memory_usage, 0, "Maximum memory usage for processing of single query. Zero means unlimited.") \
-    M(SettingUInt64, max_memory_usage_for_user, 0, "Maximum memory usage for processing all concurrently running queries for the user. Zero means unlimited.") \
-    M(SettingUInt64, max_memory_usage_for_all_queries, 0, "Maximum memory usage for processing all concurrently running queries on the server. Zero means unlimited.") \
+    M(SettingUInt64, max_memory_usage, 0, "Maximum memory usage for processing of a single query. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_memory_usage_for_user, 0, "Maximum memory usage for processing all concurrently running queries for the user. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_memory_usage_for_all_queries, 0, "Maximum memory usage for processing all concurrently running queries on the server. Zero means unlimited.", 0) \
    \
-    M(SettingUInt64, max_network_bandwidth, 0, "The maximum speed of data exchange over the network in bytes per second for a query. Zero means unlimited.") \
-    M(SettingUInt64, max_network_bytes, 0, "The maximum number of bytes (compressed) to receive or transmit over the network for execution of the query.") \
-    M(SettingUInt64, max_network_bandwidth_for_user, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running user queries. Zero means unlimited.")\
-    M(SettingUInt64, max_network_bandwidth_for_all_users, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running queries. Zero means unlimited.") \
-    M(SettingChar, format_csv_delimiter, ',', "The character to be considered as a delimiter in CSV data. If setting with a string, a string has to have a length of 1.") \
-    M(SettingBool, format_csv_allow_single_quotes, 1, "If it is set to true, allow strings in single quotes.") \
-    M(SettingBool, format_csv_allow_double_quotes, 1, "If it is set to true, allow strings in double quotes.") \
-    M(SettingBool, input_format_csv_unquoted_null_literal_as_null, false, "Consider unquoted NULL literal as \\N") \
+    M(SettingUInt64, max_network_bandwidth, 0, "The maximum speed of data exchange over the network in bytes per second for a query. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_network_bytes, 0, "The maximum number of bytes (compressed) to receive or transmit over the network for execution of the query.", 0) \
+    M(SettingUInt64, max_network_bandwidth_for_user, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running user queries. Zero means unlimited.", 0) \
+    M(SettingUInt64, max_network_bandwidth_for_all_users, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running queries. Zero means unlimited.", 0) \
+    M(SettingChar, format_csv_delimiter, ',', "The character to be considered as a delimiter in CSV data. If set as a string, the string must have a length of 1.", 0) \
+    M(SettingBool, format_csv_allow_single_quotes, 1, "If it is set to true, allow strings in single quotes.", 0) \
+    M(SettingBool, format_csv_allow_double_quotes, 1, "If it is set to true, allow strings in double quotes.", 0) \
+    M(SettingBool, input_format_csv_unquoted_null_literal_as_null, false, "Consider unquoted NULL literal as \\N", 0) \
    \
-    M(SettingDateTimeInputFormat, date_time_input_format, FormatSettings::DateTimeInputFormat::Basic, "Method to read DateTime from text input formats. Possible values: 'basic' and 'best_effort'.") \
-    M(SettingBool, log_profile_events, true, "Log query performance statistics into the query_log and query_thread_log.") \
-    M(SettingBool, log_query_settings, true, "Log query settings into the query_log.") \
-    M(SettingBool, log_query_threads, true, "Log query threads into system.query_thread_log table. This setting have effect only when 'log_queries' is true.") \
-    M(SettingLogsLevel, send_logs_level, LogsLevel::none, "Send server text logs with specified minimum level to client. Valid values: 'trace', 'debug', 'information', 'warning', 'error', 'none'") \
-    M(SettingBool, enable_optimize_predicate_expression, 1, "If it is set to true, optimize predicates to subqueries.") \
-    M(SettingBool, enable_optimize_predicate_expression_to_final_subquery, 1, "Allow push predicate to final subquery.") \
+    M(SettingDateTimeInputFormat, date_time_input_format, FormatSettings::DateTimeInputFormat::Basic, "Method to read DateTime from text input formats. Possible values: 'basic' and 'best_effort'.", 0) \
+    M(SettingBool, log_profile_events, true, "Log query performance statistics into the query_log and query_thread_log.", 0) \
+    M(SettingBool, log_query_settings, true, "Log query settings into the query_log.", 0) \
+    M(SettingBool, log_query_threads, true, "Log query threads into system.query_thread_log table. This setting has effect only when 'log_queries' is true.", 0) \
+    M(SettingLogsLevel, send_logs_level, LogsLevel::none, "Send server text logs with specified minimum level to client. Valid values: 'trace', 'debug', 'information', 'warning', 'error', 'none'", 0) \
+    M(SettingBool, enable_optimize_predicate_expression, 1, "If it is set to true, optimize predicates to subqueries.", 0) \
+    M(SettingBool, enable_optimize_predicate_expression_to_final_subquery, 1, "Allow push predicate to final subquery.", 0) \
    \
-    M(SettingUInt64, low_cardinality_max_dictionary_size, 8192, "Maximum size (in rows) of shared global dictionary for LowCardinality type.") \
-    M(SettingBool, low_cardinality_use_single_dictionary_for_part, false, "LowCardinality type serialization setting. If is true, than will use additional keys when global dictionary overflows. Otherwise, will create several shared dictionaries.") \
-    M(SettingBool, decimal_check_overflow, true, "Check overflow of decimal arithmetic/comparison operations") \
+    M(SettingUInt64, low_cardinality_max_dictionary_size, 8192, "Maximum size (in rows) of shared global dictionary for LowCardinality type.", 0) \
+    M(SettingBool, low_cardinality_use_single_dictionary_for_part, false, "LowCardinality type serialization setting. If true, additional keys will be used when the global dictionary overflows. Otherwise, several shared dictionaries will be created.", 0) \
+    M(SettingBool, decimal_check_overflow, true, "Check overflow of decimal arithmetic/comparison operations", 0) \
    \
-    M(SettingBool, prefer_localhost_replica, 1, "1 - always send query to local replica, if it exists. 0 - choose replica to send query between local and remote ones according to load_balancing") \
-    M(SettingUInt64, max_fetch_partition_retries_count, 5, "Amount of retries while fetching partition from another host.") \
-    M(SettingUInt64, http_max_multipart_form_data_size, 1024 * 1024 * 1024, "Limit on size of multipart/form-data content. This setting cannot be parsed from URL parameters and should be set in user profile. Note that content is parsed and external tables are created in memory before start of query execution. And this is the only limit that has effect on that stage (limits on max memory usage and max execution time have no effect while reading HTTP form data).") \
-    M(SettingBool, calculate_text_stack_trace, 1, "Calculate text stack trace in case of exceptions during query execution. This is the default. It requires symbol lookups that may slow down fuzzing tests when huge amount of wrong queries are executed. In normal cases you should not disable this option.") \
-    M(SettingBool, allow_ddl, true, "If it is set to true, then a user is allowed to executed DDL queries.") \
-    M(SettingBool, parallel_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.") \
-    M(SettingBool, enable_debug_queries, false, "Enables debug queries such as AST.") \
-    M(SettingBool, enable_unaligned_array_join, false, "Allow ARRAY JOIN with multiple arrays that have different sizes. When this settings is enabled, arrays will be resized to the longest one.") \
-    M(SettingBool, optimize_read_in_order, true, "Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.") \
-    M(SettingBool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.") \
-    M(SettingBool, allow_experimental_multiple_joins_emulation, true, "Emulate multiple joins using subselects") \
-    M(SettingBool, allow_experimental_cross_to_join_conversion, true, "Convert CROSS JOIN to INNER JOIN if possible") \
-    M(SettingBool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.") \
-    M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only for 'mysql' table function.") \
-    M(SettingBool, allow_experimental_data_skipping_indices, false, "If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.") \
+    M(SettingBool, prefer_localhost_replica, 1, "1 - always send query to local replica, if it exists. 0 - choose replica to send query between local and remote ones according to load_balancing", 0) \
+    M(SettingUInt64, max_fetch_partition_retries_count, 5, "Number of retries while fetching a partition from another host.", 0) \
+    M(SettingUInt64, http_max_multipart_form_data_size, 1024 * 1024 * 1024, "Limit on size of multipart/form-data content. This setting cannot be parsed from URL parameters and should be set in user profile. Note that content is parsed and external tables are created in memory before start of query execution. And this is the only limit that has effect on that stage (limits on max memory usage and max execution time have no effect while reading HTTP form data).", 0) \
+    M(SettingBool, calculate_text_stack_trace, 1, "Calculate text stack trace in case of exceptions during query execution. This is the default. It requires symbol lookups that may slow down fuzzing tests when a huge number of wrong queries is executed. In normal cases you should not disable this option.", 0) \
+    M(SettingBool, allow_ddl, true, "If it is set to true, then a user is allowed to execute DDL queries.", 0) \
+    M(SettingBool, parallel_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.", 0) \
+    M(SettingBool, enable_debug_queries, false, "Enables debug queries such as AST.", 0) \
+    M(SettingBool, enable_unaligned_array_join, false, "Allow ARRAY JOIN with multiple arrays that have different sizes. When this setting is enabled, arrays will be resized to the longest one.", 0) \
+    M(SettingBool, optimize_read_in_order, true, "Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.", 0) \
+    M(SettingBool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.", 0) \
+    M(SettingBool, allow_experimental_multiple_joins_emulation, true, "Emulate multiple joins using subselects", 0) \
+    M(SettingBool, allow_experimental_cross_to_join_conversion, true, "Convert CROSS JOIN to INNER JOIN if possible", 0) \
+    M(SettingBool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.", 0) \
+    M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only by 'mysql' and 'odbc' table functions.", 0) \
+    M(SettingBool, allow_experimental_data_skipping_indices, false, "If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.", 0) \
    \
-    M(SettingBool, experimental_use_processors, false, "Use processors pipeline.") \
+    M(SettingBool, experimental_use_processors, false, "Use processors pipeline.", 0) \
    \
-    M(SettingBool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.") \
-    M(SettingBool, allow_simdjson, true, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.") \
-    M(SettingBool, allow_introspection_functions, false, "Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.") \
+    M(SettingBool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.", 0) \
+    M(SettingBool, allow_simdjson, true, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.", 0) \
+    M(SettingBool, allow_introspection_functions, false, "Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.", 0) \
    \
-    M(SettingUInt64, max_partitions_per_insert_block, 100, "Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.") \
-    M(SettingBool, check_query_single_value_result, true, "Return check query result as single 1/0 value") \
-    M(SettingBool, allow_drop_detached, false, "Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries") \
+    M(SettingUInt64, max_partitions_per_insert_block, 100, "Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using a large number of partitions is a common misconception.", 0) \
+    M(SettingBool, check_query_single_value_result, true, "Return check query result as single 1/0 value", 0) \
+    M(SettingBool, allow_drop_detached, false, "Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries", 0) \
    \
-    M(SettingSeconds, distributed_replica_error_half_life, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_DECREASE_ERROR_PERIOD, "Time period reduces replica error counter by 2 times.") \
-    M(SettingUInt64, distributed_replica_error_cap, DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT, "Max number of errors per replica, prevents piling up increadible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time.") \
+    M(SettingSeconds, distributed_replica_error_half_life, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_DECREASE_ERROR_PERIOD, "Time period after which the replica error counter is halved.", 0) \
+    M(SettingUInt64, distributed_replica_error_cap, DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT, "Max number of errors per replica. Prevents piling up an incredible amount of errors if a replica was offline for some time, and allows it to be reconsidered in a shorter amount of time.", 0) \
    \
-    M(SettingBool, allow_experimental_live_view, false, "Enable LIVE VIEW. Not mature enough.") \
-    M(SettingSeconds, live_view_heartbeat_interval, DEFAULT_LIVE_VIEW_HEARTBEAT_INTERVAL_SEC, "The heartbeat interval in seconds to indicate live query is alive.") \
-    M(SettingSeconds, temporary_live_view_timeout, DEFAULT_TEMPORARY_LIVE_VIEW_TIMEOUT_SEC, "Timeout after which temporary live view is deleted.") \
-    M(SettingUInt64, max_live_view_insert_blocks_before_refresh, 64, "Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed.") \
-    M(SettingUInt64, min_free_disk_space_for_temporary_data, 0, "The minimum disk space to keep while writing temporary data used in external sorting and aggregation.") \
+    M(SettingBool, allow_experimental_live_view, false, "Enable LIVE VIEW. Not mature enough.", 0) \
+    M(SettingSeconds, live_view_heartbeat_interval, DEFAULT_LIVE_VIEW_HEARTBEAT_INTERVAL_SEC, "The heartbeat interval in seconds to indicate live query is alive.", 0) \
+    M(SettingSeconds, temporary_live_view_timeout, DEFAULT_TEMPORARY_LIVE_VIEW_TIMEOUT_SEC, "Timeout after which temporary live view is deleted.", 0) \
+    M(SettingUInt64, max_live_view_insert_blocks_before_refresh, 64, "Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed.", 0) \
+    M(SettingUInt64, min_free_disk_space_for_temporary_data, 0, "The minimum disk space to keep while writing temporary data used in external sorting and aggregation.", 0) \
    \
-    M(SettingBool, enable_scalar_subquery_optimization, true, "If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once.") \
-    M(SettingBool, optimize_trivial_count_query, true, "Process trivial 'SELECT count() FROM table' query from metadata.") \
+    M(SettingBool, enable_scalar_subquery_optimization, true, "If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once.", 0) \
+    M(SettingBool, optimize_trivial_count_query, true, "Process trivial 'SELECT count() FROM table' query from metadata.", 0) \
    \
    /** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. */ \
    \
-    M(SettingBool, allow_experimental_low_cardinality_type, true, "Obsolete setting, does nothing. Will be removed after 2019-08-13") \
-    M(SettingBool, compile, false, "Whether query compilation is enabled. Will be removed after 2020-03-13") \
+    M(SettingBool, allow_experimental_low_cardinality_type, true, "Obsolete setting, does nothing. Will be removed after 2019-08-13", 0) \
+    M(SettingBool, compile, false, "Whether query compilation is enabled. Will be removed after 2020-03-13", 0) \

    DECLARE_SETTINGS_COLLECTION(LIST_OF_SETTINGS)

--- a/dbms/src/Core/SettingsCollection.cpp
+++ b/dbms/src/Core/SettingsCollection.cpp
@ -1,17 +1,17 @@
-#include "SettingsCommon.h"
+#include <Core/SettingsCollection.h>
+#include <Core/SettingsCollectionImpl.h>

 #include <Core/Field.h>
 #include <Common/getNumberOfPhysicalCPUCores.h>
 #include <Common/FieldVisitors.h>
+#include <common/logger_useful.h>
 #include <IO/ReadHelpers.h>
 #include <IO/ReadBufferFromString.h>
 #include <IO/WriteHelpers.h>


-
 namespace DB
 {
-
 namespace ErrorCodes
 {
    extern const int TYPE_MISMATCH;
@ -90,8 +90,14 @@ void SettingNumber<bool>::set(const String & x)
 }

 template <typename Type>
-void SettingNumber<Type>::serialize(WriteBuffer & buf) const
+void SettingNumber<Type>::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        writeStringBinary(toString(), buf);
+        return;
+    }
+
    if constexpr (is_integral_v<Type> && is_unsigned_v<Type>)
        writeVarUInt(static_cast<UInt64>(value), buf);
    else if constexpr (is_integral_v<Type> && is_signed_v<Type>)
@ -99,13 +105,21 @@ void SettingNumber<Type>::serialize(WriteBuffer & buf) const
    else
    {
        static_assert(std::is_floating_point_v<Type>);
-        writeBinary(toString(), buf);
+        writeStringBinary(toString(), buf);
    }
 }

 template <typename Type>
-void SettingNumber<Type>::deserialize(ReadBuffer & buf)
+void SettingNumber<Type>::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        String x;
+        readStringBinary(x, buf);
+        set(x);
+        return;
+    }
+
    if constexpr (is_integral_v<Type> && is_unsigned_v<Type>)
    {
        UInt64 x;
@ -122,7 +136,7 @@ void SettingNumber<Type>::deserialize(ReadBuffer & buf)
    {
        static_assert(std::is_floating_point_v<Type>);
        String x;
-        readBinary(x, buf);
+        readStringBinary(x, buf);
        set(x);
    }
 }
@ -167,13 +181,27 @@ void SettingMaxThreads::set(const String & x)
        set(parse<UInt64>(x));
 }

-void SettingMaxThreads::serialize(WriteBuffer & buf) const
+void SettingMaxThreads::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        writeStringBinary(is_auto ? "auto" : DB::toString(value), buf);
+        return;
+    }
+
    writeVarUInt(is_auto ? 0 : value, buf);
 }

-void SettingMaxThreads::deserialize(ReadBuffer & buf)
+void SettingMaxThreads::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        String x;
+        readStringBinary(x, buf);
+        set(x);
+        return;
+    }
+
    UInt64 x = 0;
    readVarUInt(x, buf);
    set(x);
@ -233,14 +261,28 @@ void SettingTimespan<io_unit>::set(const String & x)
 }

 template <SettingTimespanIO io_unit>
-void SettingTimespan<io_unit>::serialize(WriteBuffer & buf) const
+void SettingTimespan<io_unit>::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        writeStringBinary(toString(), buf);
+        return;
+    }
+
    writeVarUInt(value.totalMicroseconds() / microseconds_per_io_unit, buf);
 }

 template <SettingTimespanIO io_unit>
-void SettingTimespan<io_unit>::deserialize(ReadBuffer & buf)
+void SettingTimespan<io_unit>::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
 {
+    if (format >= SettingsBinaryFormat::STRINGS)
+    {
+        String x;
+        readStringBinary(x, buf);
+        set(x);
+        return;
+    }
+
    UInt64 x = 0;
    readVarUInt(x, buf);
    set(x);
@ -271,15 +313,15 @@ void SettingString::set(const Field & x)
    set(safeGet<const String &>(x));
 }

-void SettingString::serialize(WriteBuffer & buf) const
+void SettingString::serialize(WriteBuffer & buf, SettingsBinaryFormat) const
 {
-    writeBinary(value, buf);
+    writeStringBinary(value, buf);
 }

-void SettingString::deserialize(ReadBuffer & buf)
+void SettingString::deserialize(ReadBuffer & buf, SettingsBinaryFormat)
 {
    String s;
-    readBinary(s, buf);
+    readStringBinary(s, buf);
    set(s);
 }

@ -314,30 +356,30 @@ void SettingChar::set(const Field & x)
    set(s);
 }

-void SettingChar::serialize(WriteBuffer & buf) const
+void SettingChar::serialize(WriteBuffer & buf, SettingsBinaryFormat) const
 {
-    writeBinary(toString(), buf);
+    writeStringBinary(toString(), buf);
 }

-void SettingChar::deserialize(ReadBuffer & buf)
+void SettingChar::deserialize(ReadBuffer & buf, SettingsBinaryFormat)
 {
    String s;
-    readBinary(s, buf);
+    readStringBinary(s, buf);
    set(s);
 }


 template <typename EnumType, typename Tag>
-void SettingEnum<EnumType, Tag>::serialize(WriteBuffer & buf) const
+void SettingEnum<EnumType, Tag>::serialize(WriteBuffer & buf, SettingsBinaryFormat) const
 {
-    writeBinary(toString(), buf);
+    writeStringBinary(toString(), buf);
 }

 template <typename EnumType, typename Tag>
-void SettingEnum<EnumType, Tag>::deserialize(ReadBuffer & buf)
+void SettingEnum<EnumType, Tag>::deserialize(ReadBuffer & buf, SettingsBinaryFormat)
 {
    String s;
-    readBinary(s, buf);
+    readStringBinary(s, buf);
    set(s);
 }

@ -462,14 +504,43 @@ IMPLEMENT_SETTING_ENUM(LogsLevel, LOGS_LEVEL_LIST_OF_NAMES, ErrorCodes::BAD_ARGU

 namespace details
 {
+    void SettingsCollectionUtils::serializeName(const StringRef & name, WriteBuffer & buf)
+    {
+        writeStringBinary(name, buf);
+    }
+
    String SettingsCollectionUtils::deserializeName(ReadBuffer & buf)
    {
        String name;
-        readBinary(name, buf);
+        readStringBinary(name, buf);
        return name;
    }

-    void SettingsCollectionUtils::serializeName(const StringRef & name, WriteBuffer & buf) { writeBinary(name, buf); }
+    void SettingsCollectionUtils::serializeFlag(bool flag, WriteBuffer & buf)
+    {
+        buf.write(flag);
+    }
+
+    bool SettingsCollectionUtils::deserializeFlag(ReadBuffer & buf)
+    {
+        char c;
+        buf.readStrict(c);
+        return c;
+    }
+
+    void SettingsCollectionUtils::skipValue(ReadBuffer & buf)
+    {
+        /// Ignore a string written by the function writeStringBinary().
+        UInt64 size;
+        readVarUInt(size, buf);
+        buf.ignore(size);
+    }
+
+    void SettingsCollectionUtils::warningNameNotFound(const StringRef & name)
+    {
+        static auto * log = &Logger::get("Settings");
+        LOG_WARNING(log, "Unknown setting " << name << ", skipping");
+    }

    void SettingsCollectionUtils::throwNameNotFound(const StringRef & name)
    {
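In the STRINGS format every value goes through `writeStringBinary()`/`readStringBinary()`, and `skipValue()` above relies on that layout: a varint length followed by the raw bytes. The sketch below models the layout with a plain `std::string` standing in for the buffers. It assumes `writeVarUInt()` is an unsigned LEB128-style encoding (7 payload bits per byte, high bit as the continuation marker). That length prefix is what lets a receiver jump over a value it cannot interpret. All `...Sketch` names are hypothetical, not ClickHouse API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for WriteBuffer/ReadBuffer: a std::string plus a read position.

// Unsigned LEB128 varint (assumed to match writeVarUInt()): 7 payload bits per byte,
// high bit set on every byte except the last one.
void writeVarUIntSketch(std::uint64_t x, std::string & out)
{
    while (x > 0x7F)
    {
        out.push_back(static_cast<char>((x & 0x7F) | 0x80));
        x >>= 7;
    }
    out.push_back(static_cast<char>(x));
}

std::uint64_t readVarUIntSketch(const std::string & in, std::size_t & pos)
{
    std::uint64_t x = 0;
    for (unsigned shift = 0; shift < 64; shift += 7)
    {
        if (pos >= in.size())
            throw std::runtime_error("unexpected end of buffer");
        const auto byte = static_cast<unsigned char>(in[pos++]);
        x |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if (!(byte & 0x80))
            return x;
    }
    throw std::runtime_error("varint is too long");
}

// Length-prefixed string: varint size, then the bytes (assumed writeStringBinary() layout).
void writeStringBinarySketch(const std::string & s, std::string & out)
{
    writeVarUIntSketch(s.size(), out);
    out.append(s);
}

std::string readStringBinarySketch(const std::string & in, std::size_t & pos)
{
    const std::uint64_t size = readVarUIntSketch(in, pos);
    if (size > in.size() - pos)
        throw std::runtime_error("string is truncated");
    std::string s = in.substr(pos, size);
    pos += size;
    return s;
}

// Same idea as SettingsCollectionUtils::skipValue(): read the length, skip the payload.
void skipValueSketch(const std::string & in, std::size_t & pos)
{
    const std::uint64_t size = readVarUIntSketch(in, pos);
    if (size > in.size() - pos)
        throw std::runtime_error("value is truncated");
    pos += size;
}
```

Skipping never looks at the payload itself, so it works the same for numbers, enums, or timespans once they are all sent as strings.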
--- a/dbms/src/Core/SettingsCollection.h
+++ b/dbms/src/Core/SettingsCollection.h
@ -6,7 +6,6 @@
 #include <common/StringRef.h>
 #include <Core/Types.h>
 #include <unordered_map>
-#include <boost/noncopyable.hpp>


 namespace DB
@ -17,6 +16,8 @@ struct SettingChange;
 using SettingsChanges = std::vector<SettingChange>;
 class ReadBuffer;
 class WriteBuffer;
+enum class SettingsBinaryFormat;
+

 /** One setting for any type.
  * Stores a value within itself, as well as a flag - whether the value was changed.
@ -51,10 +52,10 @@ struct SettingNumber
    void set(const String & x);

    /// Serialize to binary stream suitable for transfer over network.
-    void serialize(WriteBuffer & buf) const;
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;

    /// Read from binary stream.
-    void deserialize(ReadBuffer & buf);
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };

 using SettingUInt64 = SettingNumber<UInt64>;
@ -85,8 +86,8 @@ struct SettingMaxThreads
    void set(const Field & x);
    void set(const String & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);

    void setAuto();
    UInt64 getAutoValue() const;
@ -118,8 +119,8 @@ struct SettingTimespan
    void set(const Field & x);
    void set(const String & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);

    static constexpr UInt64 microseconds_per_io_unit = (io_unit == SettingTimespanIO::MILLISECOND) ? 1000 : 1000000;
 };
@ -144,8 +145,8 @@ struct SettingString
    void set(const String & x);
    void set(const Field & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };


@ -167,8 +168,8 @@ public:
    void set(const String & x);
    void set(const Field & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };


@ -191,8 +192,8 @@ struct SettingEnum
    void set(const Field & x);
    void set(const String & x);

-    void serialize(WriteBuffer & buf) const;
-    void deserialize(ReadBuffer & buf);
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format) const;
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format);
 };


@ -269,15 +270,12 @@ enum class LogsLevel
 using SettingLogsLevel = SettingEnum<LogsLevel>;


-namespace details
+enum class SettingsBinaryFormat
 {
-    struct SettingsCollectionUtils
-    {
-        static void serializeName(const StringRef & name, WriteBuffer & buf);
-        static String deserializeName(ReadBuffer & buf);
-        [[noreturn]] static void throwNameNotFound(const StringRef & name);
-    };
-}
+    OLD,     /// Some settings are serialized as strings and others as varints. This is the old behaviour.
+    STRINGS, /// All settings are serialized as strings. Each value is preceded by the `is_ignorable` flag.
+    DEFAULT = STRINGS,
+};


 /** Template class to define collections of settings.
@ -287,9 +285,9 @@ namespace details
  * struct MySettings : public SettingsCollection<MySettings>
  * {
  * #   define APPLY_FOR_MYSETTINGS(M) \
-  *         M(SettingUInt64, a, 100, "Description of a") \
-  *         M(SettingFloat, f, 3.11, "Description of f") \
-  *         M(SettingString, s, "default", "Description of s")
+  *         M(SettingUInt64, a, 100, "Description of a", 0) \
+  *         M(SettingFloat, f, 3.11, "Description of f", IGNORABLE) // IGNORABLE means the setting can be ignored by older versions \
+  *         M(SettingString, s, "default", "Description of s", 0)
  *
  *     DECLARE_SETTINGS_COLLECTION(MySettings, APPLY_FOR_MYSETTINGS)
  * };
@ -304,21 +302,22 @@ private:
    Derived & castToDerived() { return *static_cast<Derived *>(this); }
    const Derived & castToDerived() const { return *static_cast<const Derived *>(this); }

-    using IsChangedFunction = bool (*)(const Derived &);
-    using GetStringFunction = String (*)(const Derived &);
-    using GetFieldFunction = Field (*)(const Derived &);
-    using SetStringFunction = void (*)(Derived &, const String &);
-    using SetFieldFunction = void (*)(Derived &, const Field &);
-    using SerializeFunction = void (*)(const Derived &, WriteBuffer & buf);
-    using DeserializeFunction = void (*)(Derived &, ReadBuffer & buf);
-    using ValueToStringFunction = String (*)(const Field &);
-    using ValueToCorrespondingTypeFunction = Field (*)(const Field &);
-
    struct MemberInfo
    {
-        IsChangedFunction is_changed;
+        using IsChangedFunction = bool (*)(const Derived &);
+        using GetStringFunction = String (*)(const Derived &);
+        using GetFieldFunction = Field (*)(const Derived &);
+        using SetStringFunction = void (*)(Derived &, const String &);
+        using SetFieldFunction = void (*)(Derived &, const Field &);
+        using SerializeFunction = void (*)(const Derived &, WriteBuffer & buf, SettingsBinaryFormat);
+        using DeserializeFunction = void (*)(Derived &, ReadBuffer & buf, SettingsBinaryFormat);
+        using ValueToStringFunction = String (*)(const Field &);
+        using ValueToCorrespondingTypeFunction = Field (*)(const Field &);
+
        StringRef name;
        StringRef description;
+        bool is_ignorable;
+        IsChangedFunction is_changed;
        GetStringFunction get_string;
        GetFieldFunction get_field;
        SetStringFunction set_string;
@ -329,52 +328,22 @@ private:
        ValueToCorrespondingTypeFunction value_to_corresponding_type;
    };

-    class MemberInfos : private boost::noncopyable
+    class MemberInfos
    {
    public:
-        static const MemberInfos & instance();
-
-        size_t size() const { return infos.size(); }
-        const MemberInfo & operator[](size_t index) const { return infos[index]; }
-        const MemberInfo * begin() const { return infos.data(); }
-        const MemberInfo * end() const { return infos.data() + infos.size(); }
-
-        size_t findIndex(const StringRef & name) const
-        {
-            auto it = by_name_map.find(name);
-            if (it == by_name_map.end())
-                return static_cast<size_t>(-1); // npos
-            return it->second;
-        }
-
-        size_t findIndexStrict(const StringRef & name) const
-        {
-            auto it = by_name_map.find(name);
-            if (it == by_name_map.end())
-                details::SettingsCollectionUtils::throwNameNotFound(name);
-            return it->second;
-        }
-
-        const MemberInfo * find(const StringRef & name) const
-        {
-            auto it = by_name_map.find(name);
-            if (it == by_name_map.end())
-                return end();
-            else
-                return &infos[it->second];
-        }
-
-        const MemberInfo * findStrict(const StringRef & name) const { return &infos[findIndexStrict(name)]; }
-
-    private:
        MemberInfos();

-        void add(MemberInfo && member)
-        {
-            size_t index = infos.size();
-            infos.emplace_back(member);
-            by_name_map.emplace(infos.back().name, index);
-        }
+        size_t size() const { return infos.size(); }
+        const MemberInfo * data() const { return infos.data(); }
+        const MemberInfo & operator[](size_t index) const { return infos[index]; }
+
+        const MemberInfo * find(const StringRef & name) const;
+        const MemberInfo & findStrict(const StringRef & name) const;
+        size_t findIndex(const StringRef & name) const;
+        size_t findIndexStrict(const StringRef & name) const;
+
+    private:
+        void add(MemberInfo && member);

        std::vector<MemberInfo> infos;
        std::unordered_map<StringRef, size_t> by_name_map;
@ -396,6 +365,7 @@ public:
        bool isChanged() const { return member->is_changed(*collection); }
        Field getValue() const;
        String getValueAsString() const { return member->get_string(*collection); }
+
    protected:
        friend class SettingsCollection<Derived>::const_iterator;
        const_reference() : collection(nullptr), member(nullptr) {}
@ -410,7 +380,7 @@ public:
    public:
        reference(Derived & collection_, const MemberInfo & member_) : const_reference(collection_, member_) {}
        reference(const const_reference & src) : const_reference(src) {}
-        void setValue(const Field & value);
+        void setValue(const Field & value) { this->member->set_field(*const_cast<Derived *>(this->collection), value); }
        void setValue(const String & value) { this->member->set_string(*const_cast<Derived *>(this->collection), value); }
    };

@ -453,7 +423,7 @@ public:

    /// Returns description of a setting.
    static StringRef getDescription(size_t index) { return members()[index].description; }
-    static StringRef getDescription(const String & name) { return members().findStrict(name)->description; }
+    static StringRef getDescription(const String & name) { return members().findStrict(name).description; }

    /// Searches a setting by its name; returns `npos` if not found.
    static size_t findIndex(const StringRef & name) { return members().findIndex(name); }
@ -463,36 +433,36 @@ public:
    static size_t findIndexStrict(const StringRef & name) { return members().findIndexStrict(name); }

    /// Casts a value to a string according to a specified setting without actual changing this settings.
-    static String valueToString(size_t index, const Field & value);
-    static String valueToString(const StringRef & name, const Field & value);
+    static String valueToString(size_t index, const Field & value) { return members()[index].value_to_string(value); }
+    static String valueToString(const StringRef & name, const Field & value) { return members().findStrict(name).value_to_string(value); }

    /// Casts a value to a type according to a specified setting without actual changing this settings.
    /// E.g. for SettingInt64 it casts Field to Field::Types::Int64.
    static Field valueToCorrespondingType(size_t index, const Field & value);
    static Field valueToCorrespondingType(const StringRef & name, const Field & value);

-    iterator begin() { return iterator(castToDerived(), members().begin()); }
-    const_iterator begin() const { return const_iterator(castToDerived(), members().begin()); }
-    iterator end() { return iterator(castToDerived(), members().end()); }
-    const_iterator end() const { return const_iterator(castToDerived(), members().end()); }
+    iterator begin() { return iterator(castToDerived(), members().data()); }
+    const_iterator begin() const { return const_iterator(castToDerived(), members().data()); }
+    iterator end() { const auto & the_members = members(); return iterator(castToDerived(), the_members.data() + the_members.size()); }
+    const_iterator end() const { const auto & the_members = members(); return const_iterator(castToDerived(), the_members.data() + the_members.size()); }

    /// Returns a proxy object for accessing to a setting. Throws an exception if there is not setting with such name.
    reference operator[](size_t index) { return reference(castToDerived(), members()[index]); }
-    reference operator[](const StringRef & name) { return reference(castToDerived(), *(members().findStrict(name))); }
+    reference operator[](const StringRef & name) { return reference(castToDerived(), members().findStrict(name)); }
    const_reference operator[](size_t index) const { return const_reference(castToDerived(), members()[index]); }
-    const_reference operator[](const StringRef & name) const { return const_reference(castToDerived(), *(members().findStrict(name))); }
+    const_reference operator[](const StringRef & name) const { return const_reference(castToDerived(), members().findStrict(name)); }

    /// Searches a setting by its name; returns end() if not found.
-    iterator find(const StringRef & name) { return iterator(castToDerived(), members().find(name)); }
-    const_iterator find(const StringRef & name) const { return const_iterator(castToDerived(), members().find(name)); }
+    iterator find(const StringRef & name);
+    const_iterator find(const StringRef & name) const;

    /// Searches a setting by its name; throws an exception if not found.
-    iterator findStrict(const StringRef & name) { return iterator(castToDerived(), members().findStrict(name)); }
-    const_iterator findStrict(const StringRef & name) const { return const_iterator(castToDerived(), members().findStrict(name)); }
+    iterator findStrict(const StringRef & name);
+    const_iterator findStrict(const StringRef & name) const;

    /// Sets setting's value.
-    void set(size_t index, const Field & value);
-    void set(const StringRef & name, const Field & value);
+    void set(size_t index, const Field & value) { (*this)[index].setValue(value); }
+    void set(const StringRef & name, const Field & value) { (*this)[name].setValue(value); }

    /// Sets setting's value. Read value in text form from string (for example, from configuration file or from URL parameter).
    void set(size_t index, const String & value) { (*this)[index].setValue(value); }
@ -514,11 +484,7 @@ public:

    /// Compares two collections of settings.
    bool operator ==(const Derived & rhs) const;
-
-    bool operator !=(const Derived & rhs) const
-    {
-        return !(*this == rhs);
-    }
+    bool operator!=(const Derived & rhs) const { return !(*this == rhs); }

    /// Gathers all changed values (e.g. for applying them later to another collection of settings).
    SettingsChanges changes() const;
@ -536,82 +502,16 @@ public:
    /// Writes the settings to buffer (e.g. to be sent to remote server).
    /// Only changed settings are written. They are written as list of contiguous name-value pairs,
    /// finished with empty name.
-    void serialize(WriteBuffer & buf) const
-    {
-        for (const auto & member : members())
-        {
-            if (member.is_changed(castToDerived()))
-            {
-                details::SettingsCollectionUtils::serializeName(member.name, buf);
-                member.serialize(castToDerived(), buf);
-            }
-        }
-        details::SettingsCollectionUtils::serializeName(StringRef{} /* empty string is a marker of the end of settings */, buf);
-    }
+    void serialize(WriteBuffer & buf, SettingsBinaryFormat format = SettingsBinaryFormat::DEFAULT) const;

    /// Reads the settings from buffer.
-    void deserialize(ReadBuffer & buf)
-    {
-        const auto & the_members = members();
-        while (true)
-        {
-            String name = details::SettingsCollectionUtils::deserializeName(buf);
-            if (name.empty() /* empty string is a marker of the end of settings */)
-                break;
-            the_members.findStrict(name)->deserialize(castToDerived(), buf);
-        }
-    }
+    void deserialize(ReadBuffer & buf, SettingsBinaryFormat format = SettingsBinaryFormat::DEFAULT);
 };

+
 #define DECLARE_SETTINGS_COLLECTION(LIST_OF_SETTINGS_MACRO) \
    LIST_OF_SETTINGS_MACRO(DECLARE_SETTINGS_COLLECTION_DECLARE_VARIABLES_HELPER_)

-
-#define IMPLEMENT_SETTINGS_COLLECTION(DERIVED_CLASS_NAME, LIST_OF_SETTINGS_MACRO) \
-    template<> \
-    SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos::MemberInfos() \
-    { \
-        using Derived = DERIVED_CLASS_NAME; \
-        struct Functions \
-        { \
-            LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_) \
-        }; \
-        LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_) \
-    } \
-    template <> \
-    const SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos & SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos::instance() \
-    { \
-        static const SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos single_instance; \
-        return single_instance; \
-    } \
-    /** \
-      * Instantiation should happen when all method definitions from SettingsCollectionImpl.h \
-      * are accessible, so we instantiate explicitly. \
-      */ \
-    template class SettingsCollection<DERIVED_CLASS_NAME>;
-
-
-
-#define DECLARE_SETTINGS_COLLECTION_DECLARE_VARIABLES_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION) \
+#define DECLARE_SETTINGS_COLLECTION_DECLARE_VARIABLES_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \
    TYPE NAME {DEFAULT};
-
-
-#define IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION) \
-    static String NAME##_getString(const Derived & collection) { return collection.NAME.toString(); } \
-    static Field NAME##_getField(const Derived & collection) { return collection.NAME.toField(); } \
-    static void NAME##_setString(Derived & collection, const String & value) { collection.NAME.set(value); } \
-    static void NAME##_setField(Derived & collection, const Field & value) { collection.NAME.set(value); } \
-    static void NAME##_serialize(const Derived & collection, WriteBuffer & buf) { collection.NAME.serialize(buf); } \
-    static void NAME##_deserialize(Derived & collection, ReadBuffer & buf) { collection.NAME.deserialize(buf); } \
-    static String NAME##_valueToString(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toString(); } \
-    static Field NAME##_valueToCorrespondingType(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toField(); } \
-
-
-#define IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION) \
-    add({[](const Derived & d) { return d.NAME.changed; },          \
-         StringRef(#NAME, strlen(#NAME)), StringRef(DESCRIPTION, strlen(DESCRIPTION)), \
-         &Functions::NAME##_getString, &Functions::NAME##_getField, \
-         &Functions::NAME##_setString, &Functions::NAME##_setField, \
-         &Functions::NAME##_serialize, &Functions::NAME##_deserialize, \
-         &Functions::NAME##_valueToString, &Functions::NAME##_valueToCorrespondingType});
 }
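To illustrate how one setting switches between the two `SettingsBinaryFormat` variants, here is a simplified stand-in for `SettingMaxThreads` that mirrors the `serialize()`/`deserialize()` branches earlier in this diff. It is a sketch, not the real class. `MaxThreadsSketch`, `FormatSketch`, the helpers, and the fixed `auto_value` are hypothetical. Treating `0` as "auto" is inferred from the OLD branch writing `is_auto ? 0 : value`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical mirror of SettingsBinaryFormat: later formats compare greater, so
// `format >= FormatSketch::STRINGS` also covers any future format.
enum class FormatSketch
{
    OLD,      // numeric value written as a varint
    STRINGS,  // value written as a length-prefixed string
};

// Simplified stand-in for SettingMaxThreads (not the real class).
struct MaxThreadsSketch
{
    std::uint64_t auto_value = 8;  // assumption: what "auto" resolves to on this machine
    bool is_auto = true;
    std::uint64_t value = auto_value;

    void set(std::uint64_t x)  // assumption: 0 means "auto", as the OLD encoding suggests
    {
        is_auto = (x == 0);
        value = is_auto ? auto_value : x;
    }

    void set(const std::string & s)
    {
        if (s == "auto")
            set(std::uint64_t{0});
        else
            set(static_cast<std::uint64_t>(std::stoull(s)));
    }
};

// Minimal unsigned-LEB128 varint and length-prefixed string helpers.
void writeVarUInt(std::uint64_t x, std::string & out)
{
    for (; x > 0x7F; x >>= 7)
        out.push_back(static_cast<char>((x & 0x7F) | 0x80));
    out.push_back(static_cast<char>(x));
}

std::uint64_t readVarUInt(const std::string & in, std::size_t & pos)
{
    std::uint64_t x = 0;
    for (unsigned shift = 0; shift < 64; shift += 7)
    {
        if (pos >= in.size())
            throw std::runtime_error("unexpected end of buffer");
        const auto byte = static_cast<unsigned char>(in[pos++]);
        x |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if (!(byte & 0x80))
            return x;
    }
    throw std::runtime_error("varint is too long");
}

void writeString(const std::string & s, std::string & out)
{
    writeVarUInt(s.size(), out);
    out.append(s);
}

std::string readString(const std::string & in, std::size_t & pos)
{
    const std::uint64_t size = readVarUInt(in, pos);
    if (size > in.size() - pos)
        throw std::runtime_error("string is truncated");
    std::string s = in.substr(pos, size);
    pos += size;
    return s;
}

// Same branching as SettingMaxThreads::serialize()/deserialize() in this diff.
void serializeSketch(const MaxThreadsSketch & s, std::string & out, FormatSketch format)
{
    if (format >= FormatSketch::STRINGS)
    {
        writeString(s.is_auto ? "auto" : std::to_string(s.value), out);
        return;
    }
    writeVarUInt(s.is_auto ? 0 : s.value, out);
}

void deserializeSketch(MaxThreadsSketch & s, const std::string & in, std::size_t & pos, FormatSketch format)
{
    if (format >= FormatSketch::STRINGS)
        s.set(readString(in, pos));
    else
        s.set(readVarUInt(in, pos));
}
```

In the STRINGS format "auto" survives the round trip as text instead of being encoded as a magic `0`.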
--- a/dbms/src/Core/SettingsCollectionImpl.h
+++ b/dbms/src/Core/SettingsCollectionImpl.h
@ -2,15 +2,84 @@

 /**
  * This file implements some functions that are dependent on Field type.
-  * Unlinke SettingsCommon.h, we only have to include it once for each
-  * instantiation of SettingsCollection<>. This allows to work on Field without
-  * always recompiling the entire project.
+  * Unlike SettingsCollection.h, we only have to include it once for each
+  * instantiation of SettingsCollection<>.
  */

 #include <Common/SettingsChanges.h>

 namespace DB
 {
+namespace details
+{
+    struct SettingsCollectionUtils
+    {
+        static void serializeName(const StringRef & name, WriteBuffer & buf);
+        static String deserializeName(ReadBuffer & buf);
+        static void serializeFlag(bool flag, WriteBuffer & buf);
+        static bool deserializeFlag(ReadBuffer & buf);
+        static void skipValue(ReadBuffer & buf);
+        static void warningNameNotFound(const StringRef & name);
+        [[noreturn]] static void throwNameNotFound(const StringRef & name);
+    };
+}
+
+
+template <class Derived>
+size_t SettingsCollection<Derived>::MemberInfos::findIndex(const StringRef & name) const
+{
+    auto it = by_name_map.find(name);
+    if (it == by_name_map.end())
+        return static_cast<size_t>(-1); // npos
+    return it->second;
+}
+
+
+template <class Derived>
+size_t SettingsCollection<Derived>::MemberInfos::findIndexStrict(const StringRef & name) const
+{
+    auto it = by_name_map.find(name);
+    if (it == by_name_map.end())
+        details::SettingsCollectionUtils::throwNameNotFound(name);
+    return it->second;
+}
+
+
+template <class Derived>
+const typename SettingsCollection<Derived>::MemberInfo * SettingsCollection<Derived>::MemberInfos::find(const StringRef & name) const
+{
+    auto it = by_name_map.find(name);
+    if (it == by_name_map.end())
+        return nullptr;
+    else
+        return &infos[it->second];
+}
+
+
+template <class Derived>
+const typename SettingsCollection<Derived>::MemberInfo & SettingsCollection<Derived>::MemberInfos::findStrict(const StringRef & name) const
+{
+    return infos[findIndexStrict(name)];
+}
+
+
+template <class Derived>
+void SettingsCollection<Derived>::MemberInfos::add(MemberInfo && member)
+{
+    size_t index = infos.size();
+    infos.emplace_back(member);
+    by_name_map.emplace(infos.back().name, index);
+}
+
+
+template <class Derived>
+const typename SettingsCollection<Derived>::MemberInfos &
+SettingsCollection<Derived>::members()
+{
+    static const MemberInfos the_instance;
+    return the_instance;
+}
+

 template <class Derived>
 Field SettingsCollection<Derived>::const_reference::getValue() const
@ -18,23 +87,6 @@ Field SettingsCollection<Derived>::const_reference::getValue() const
    return member->get_field(*collection);
 }

-template <class Derived>
-void SettingsCollection<Derived>::reference::setValue(const Field & value)
-{
-    this->member->set_field(*const_cast<Derived *>(this->collection), value);
-}
-
-template <class Derived>
-String SettingsCollection<Derived>::valueToString(size_t index, const Field & value)
-{
-    return members()[index].value_to_string(value);
-}
-
-template <class Derived>
-String SettingsCollection<Derived>::valueToString(const StringRef & name, const Field & value)
-{
-    return members().findStrict(name)->value_to_string(value);
-}

 template <class Derived>
 Field SettingsCollection<Derived>::valueToCorrespondingType(size_t index, const Field & value)
@ -42,36 +94,62 @@ Field SettingsCollection<Derived>::valueToCorrespondingType(size_t index, const
    return members()[index].value_to_corresponding_type(value);
 }

+
 template <class Derived>
 Field SettingsCollection<Derived>::valueToCorrespondingType(const StringRef & name, const Field & value)
 {
-    return members().findStrict(name)->value_to_corresponding_type(value);
+    return members().findStrict(name).value_to_corresponding_type(value);
 }

-template <class Derived>
-void SettingsCollection<Derived>::set(size_t index, const Field & value)
-{
-    (*this)[index].setValue(value);
-}

 template <class Derived>
-void SettingsCollection<Derived>::set(const StringRef & name, const Field & value)
+typename SettingsCollection<Derived>::iterator SettingsCollection<Derived>::find(const StringRef & name)
 {
-    (*this)[name].setValue(value);
+    const auto * member = members().find(name);
+    if (member)
+        return iterator(castToDerived(), member);
+    return end();
 }

+
+template <class Derived>
+typename SettingsCollection<Derived>::const_iterator SettingsCollection<Derived>::find(const StringRef & name) const
+{
+    const auto * member = members().find(name);
+    if (member)
+        return const_iterator(castToDerived(), member);
+    return end();
+}
+
+
+template <class Derived>
+typename SettingsCollection<Derived>::iterator SettingsCollection<Derived>::findStrict(const StringRef & name)
+{
+    return iterator(castToDerived(), &members().findStrict(name));
+}
+
+
+template <class Derived>
+typename SettingsCollection<Derived>::const_iterator SettingsCollection<Derived>::findStrict(const StringRef & name) const
+{
+    return const_iterator(castToDerived(), &members().findStrict(name));
+}
+
+
 template <class Derived>
 Field SettingsCollection<Derived>::get(size_t index) const
 {
    return (*this)[index].getValue();
 }

+
 template <class Derived>
 Field SettingsCollection<Derived>::get(const StringRef & name) const
 {
    return (*this)[name].getValue();
 }

+
 template <class Derived>
 bool SettingsCollection<Derived>::tryGet(const StringRef & name, Field & value) const
 {
@ -82,6 +160,7 @@ bool SettingsCollection<Derived>::tryGet(const StringRef & name, Field & value)
    return true;
 }

+
 template <class Derived>
 bool SettingsCollection<Derived>::tryGet(const StringRef & name, String & value) const
 {
@ -92,11 +171,14 @@ bool SettingsCollection<Derived>::tryGet(const StringRef & name, String & value)
    return true;
 }

+
 template <class Derived>
 bool SettingsCollection<Derived>::operator ==(const Derived & rhs) const
 {
-    for (const auto & member : members())
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
    {
+        const auto & member = the_members[i];
        bool left_changed = member.is_changed(castToDerived());
        bool right_changed = member.is_changed(rhs);
        if (left_changed || right_changed)
@ -110,27 +192,29 @@ bool SettingsCollection<Derived>::operator ==(const Derived & rhs) const
    return true;
 }

-/// Gathers all changed values (e.g. for applying them later to another collection of settings).
+
 template <class Derived>
 SettingsChanges SettingsCollection<Derived>::changes() const
 {
    SettingsChanges found_changes;
-    for (const auto & member : members())
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
    {
+        const auto & member = the_members[i];
        if (member.is_changed(castToDerived()))
            found_changes.push_back({member.name.toString(), member.get_field(castToDerived())});
    }
    return found_changes;
 }

-/// Applies change to concrete setting.
+
 template <class Derived>
 void SettingsCollection<Derived>::applyChange(const SettingChange & change)
 {
    set(change.name, change.value);
 }

-/// Applies changes to the settings.
+
 template <class Derived>
 void SettingsCollection<Derived>::applyChanges(const SettingsChanges & changes)
 {
@ -138,25 +222,110 @@ void SettingsCollection<Derived>::applyChanges(const SettingsChanges & changes)
        applyChange(change);
 }

+
 template <class Derived>
 void SettingsCollection<Derived>::copyChangesFrom(const Derived & src)
 {
-    for (const auto & member : members())
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
+    {
+        const auto & member = the_members[i];
        if (member.is_changed(src))
            member.set_field(castToDerived(), member.get_field(src));
+    }
 }

+
 template <class Derived>
 void SettingsCollection<Derived>::copyChangesTo(Derived & dest) const
 {
    dest.copyChangesFrom(castToDerived());
 }

+
 template <class Derived>
-const typename SettingsCollection<Derived>::MemberInfos &
-SettingsCollection<Derived>::members()
+void SettingsCollection<Derived>::serialize(WriteBuffer & buf, SettingsBinaryFormat format) const
 {
-    return MemberInfos::instance();
+    const auto & the_members = members();
+    for (size_t i = 0; i != the_members.size(); ++i)
+    {
+        const auto & member = the_members[i];
+        if (member.is_changed(castToDerived()))
+        {
+            details::SettingsCollectionUtils::serializeName(member.name, buf);
+            if (format >= SettingsBinaryFormat::STRINGS)
+                details::SettingsCollectionUtils::serializeFlag(member.is_ignorable, buf);
+            member.serialize(castToDerived(), buf, format);
+        }
+    }
+    details::SettingsCollectionUtils::serializeName(StringRef{} /* empty string is a marker of the end of settings */, buf);
 }

-} /* namespace DB */
+
+template <class Derived>
+void SettingsCollection<Derived>::deserialize(ReadBuffer & buf, SettingsBinaryFormat format)
+{
+    const auto & the_members = members();
+    while (true)
+    {
+        String name = details::SettingsCollectionUtils::deserializeName(buf);
+        if (name.empty() /* empty string is a marker of the end of settings */)
+            break;
+        auto * member = the_members.find(name);
+        bool is_ignorable = (format >= SettingsBinaryFormat::STRINGS) ? details::SettingsCollectionUtils::deserializeFlag(buf) : false;
+        if (member)
+        {
+            member->deserialize(castToDerived(), buf, format);
+        }
+        else if (is_ignorable)
+        {
+            details::SettingsCollectionUtils::warningNameNotFound(name);
+            details::SettingsCollectionUtils::skipValue(buf);
+        }
+        else
+            details::SettingsCollectionUtils::throwNameNotFound(name);
+    }
+}
+
+
+//-V:IMPLEMENT_SETTINGS_COLLECTION:501
+#define IMPLEMENT_SETTINGS_COLLECTION(DERIVED_CLASS_NAME, LIST_OF_SETTINGS_MACRO) \
+    template<> \
+    SettingsCollection<DERIVED_CLASS_NAME>::MemberInfos::MemberInfos() \
+    { \
+        using Derived = DERIVED_CLASS_NAME; \
+        struct Functions \
+        { \
+            LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_) \
+        }; \
+        constexpr int IGNORABLE = 1; \
+        UNUSED(IGNORABLE); \
+        LIST_OF_SETTINGS_MACRO(IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_) \
+    } \
+    /** \
+      * Instantiation should happen when all method definitions from SettingsCollectionImpl.h \
+      * are accessible, so we instantiate explicitly. \
+      */ \
+    template class SettingsCollection<DERIVED_CLASS_NAME>;
+
+
+#define IMPLEMENT_SETTINGS_COLLECTION_DEFINE_FUNCTIONS_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \
+    static String NAME##_getString(const Derived & collection) { return collection.NAME.toString(); } \
+    static Field NAME##_getField(const Derived & collection) { return collection.NAME.toField(); } \
+    static void NAME##_setString(Derived & collection, const String & value) { collection.NAME.set(value); } \
+    static void NAME##_setField(Derived & collection, const Field & value) { collection.NAME.set(value); } \
+    static void NAME##_serialize(const Derived & collection, WriteBuffer & buf, SettingsBinaryFormat format) { collection.NAME.serialize(buf, format); } \
+    static void NAME##_deserialize(Derived & collection, ReadBuffer & buf, SettingsBinaryFormat format) { collection.NAME.deserialize(buf, format); } \
+    static String NAME##_valueToString(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toString(); } \
+    static Field NAME##_valueToCorrespondingType(const Field & value) { TYPE temp{DEFAULT}; temp.set(value); return temp.toField(); } \
+
+
+#define IMPLEMENT_SETTINGS_COLLECTION_ADD_MEMBER_INFO_HELPER_(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \
+    add({StringRef(#NAME, strlen(#NAME)), StringRef(DESCRIPTION, strlen(DESCRIPTION)), \
+         FLAGS & IGNORABLE, \
+         [](const Derived & d) { return d.NAME.changed; }, \
+         &Functions::NAME##_getString, &Functions::NAME##_getField, \
+         &Functions::NAME##_setString, &Functions::NAME##_setField, \
+         &Functions::NAME##_serialize, &Functions::NAME##_deserialize, \
+         &Functions::NAME##_valueToString, &Functions::NAME##_valueToCorrespondingType});
+}
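`IMPLEMENT_SETTINGS_COLLECTION` relies on the X-macro idiom. A single list macro is expanded several times with different helper macros: once to declare the members, and again to register per-setting accessors keyed by the stringified name. The toy version below shows the idiom with a two-entry list and plain `int` values. `LIST_OF_DEMO_SETTINGS`, `DemoSettings` and `DemoMemberInfo` are invented for illustration and are much simpler than the real `SettingsCollection`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical settings list in the same style: M(TYPE, NAME, DEFAULT, DESCRIPTION).
#define LIST_OF_DEMO_SETTINGS(M) \
    M(int, max_threads, 8, "Maximum number of threads") \
    M(int, max_block_size, 65536, "Maximum block size")

// First expansion: declare one member per setting, initialized to its default.
struct DemoSettings
{
#define DEMO_DECLARE_MEMBER_(TYPE, NAME, DEFAULT, DESCRIPTION) TYPE NAME = DEFAULT;
    LIST_OF_DEMO_SETTINGS(DEMO_DECLARE_MEMBER_)
#undef DEMO_DECLARE_MEMBER_
};

struct DemoMemberInfo
{
    const char * description;
    std::function<std::string(const DemoSettings &)> get_string;
    std::function<void(DemoSettings &, const std::string &)> set_string;
};

// Second expansion: build a name -> accessor table, like the ADD_MEMBER_INFO helper.
const std::map<std::string, DemoMemberInfo> & demoMemberInfos()
{
    static const std::map<std::string, DemoMemberInfo> infos = []
    {
        std::map<std::string, DemoMemberInfo> res;
#define DEMO_ADD_MEMBER_INFO_(TYPE, NAME, DEFAULT, DESCRIPTION) \
        res[#NAME] = DemoMemberInfo{DESCRIPTION, \
            [](const DemoSettings & s) { return std::to_string(s.NAME); }, \
            [](DemoSettings & s, const std::string & v) { s.NAME = static_cast<TYPE>(std::stoi(v)); }};
        LIST_OF_DEMO_SETTINGS(DEMO_ADD_MEMBER_INFO_)
#undef DEMO_ADD_MEMBER_INFO_
        return res;
    }();
    return infos;
}
```

Adding a setting means editing only the list macro. Both the member and its accessors are then generated from that single entry.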
--- a/dbms/src/Core/iostream_debug_helpers.cpp
+++ b/dbms/src/Core/iostream_debug_helpers.cpp
@ -1,6 +1,7 @@
 #include "iostream_debug_helpers.h"

 #include <iostream>
+#include <Client/Connection.h>
 #include <Core/Block.h>
 #include <Core/ColumnWithTypeAndName.h>
 #include <Core/Field.h>
@ -92,9 +93,9 @@ std::ostream & operator<<(std::ostream & stream, const IColumn & what)
    return stream;
 }

-std::ostream & operator<<(std::ostream & stream, const Connection::Packet & what)
+std::ostream & operator<<(std::ostream & stream, const Packet & what)
 {
-    stream << "Connection::Packet("
+    stream << "Packet("
           << "type = " << what.type;
    // types description: Core/Protocol.h
    if (what.exception)
--- a/dbms/src/Core/iostream_debug_helpers.h
+++ b/dbms/src/Core/iostream_debug_helpers.h
@ -1,9 +1,6 @@
 #pragma once
 #include <iostream>

-#include <Client/Connection.h>
-
-
 namespace DB
 {

@ -40,7 +37,8 @@ std::ostream & operator<<(std::ostream & stream, const ColumnWithTypeAndName & w
 class IColumn;
 std::ostream & operator<<(std::ostream & stream, const IColumn & what);

-std::ostream & operator<<(std::ostream & stream, const Connection::Packet & what);
+struct Packet;
+std::ostream & operator<<(std::ostream & stream, const Packet & what);

 struct ExpressionAction;
 std::ostream & operator<<(std::ostream & stream, const ExpressionAction & what);
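The header change replaces `#include <Client/Connection.h>` with the forward declaration `struct Packet;`. That is enough to declare an `operator<<` taking `const Packet &`, because only the `.cpp` file that prints the fields needs the full definition. The single-file sketch below shows the same split, with a made-up `DemoPacket` standing in for `Packet`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// "Header" part: an incomplete type is enough to declare a function taking a reference to it.
struct DemoPacket;
std::ostream & operator<<(std::ostream & stream, const DemoPacket & what);

// "Source" part: the full definition is needed only where members are accessed.
struct DemoPacket
{
    int type = 0;
};

std::ostream & operator<<(std::ostream & stream, const DemoPacket & what)
{
    stream << "Packet(type = " << what.type << ")";
    return stream;
}

// Small helper to capture the printed form as a string.
std::string describe(const DemoPacket & p)
{
    std::ostringstream out;
    out << p;
    return out.str();
}
```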
--- a/dbms/src/DataStreams/IBlockInputStream.h
+++ b/dbms/src/DataStreams/IBlockInputStream.h
@ -1,7 +1,6 @@
 #pragma once

 #include <Core/Block.h>
-#include <Core/SettingsCommon.h>
 #include <DataStreams/BlockStreamProfileInfo.h>
 #include <DataStreams/IBlockStream_fwd.h>
 #include <DataStreams/SizeLimits.h>
--- a/dbms/src/DataStreams/RemoteBlockInputStream.cpp
+++ b/dbms/src/DataStreams/RemoteBlockInputStream.cpp
@ -2,6 +2,7 @@
 #include <DataStreams/OneBlockInputStream.h>
 #include <Common/NetException.h>
 #include <Common/CurrentThread.h>
+#include <Columns/ColumnConst.h>
 #include <Interpreters/Context.h>
 #include <Interpreters/castColumn.h>
 #include <Interpreters/InternalTextLogsQueue.h>
@ -173,9 +174,30 @@ static Block adaptBlockStructure(const Block & block, const Block & header, cons
        ColumnPtr column;

        if (elem.column && isColumnConst(*elem.column))
-            /// TODO: check that column from block contains the same value.
-            /// TODO: serialize const columns.
-            column = elem.column->cloneResized(block.rows());
+        {
+            /// We expect a constant column in the block.
+            /// If the block is not empty, take the constant's value from it,
+            /// because it may differ on the remote server for functions like version(), uptime(), ...
+            if (block.rows() > 0 && block.has(elem.name))
+            {
+                /// Const column is passed as materialized. Get first value from it.
+                ///
+                /// TODO: check that column contains the same value.
+                /// TODO: serialize const columns.
+                auto col = block.getByName(elem.name);
+                col.column = block.getByName(elem.name).column->cut(0, 1);
+
+                column = castColumn(col, elem.type, context);
+
+                if (!isColumnConst(*column))
+                    column = ColumnConst::create(column, block.rows());
+                else
+                    /// Not possible at the moment; kept in case const column serialization is supported later.
+                    column = column->cloneResized(block.rows());
+            }
+            else
+                column = elem.column->cloneResized(block.rows());
+        }
        else
            column = castColumn(block.getByName(elem.name), elem.type, context);

@ -200,7 +222,7 @@ Block RemoteBlockInputStream::readImpl()
        if (isCancelledOrThrowIfKilled())
            return Block();

-        Connection::Packet packet = multiplexed_connections->receivePacket();
+        Packet packet = multiplexed_connections->receivePacket();

        switch (packet.type)
        {
@ -279,7 +301,7 @@ void RemoteBlockInputStream::readSuffixImpl()
    tryCancel("Cancelling query because enough data has been read");

    /// Get the remaining packets so that there is no out of sync in the connections to the replicas.
-    Connection::Packet packet = multiplexed_connections->drain();
+    Packet packet = multiplexed_connections->drain();
    switch (packet.type)
    {
        case Protocol::Server::EndOfStream:
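When the local header expects a constant, `adaptBlockStructure` now takes its value from the first row received from the remote server, provided that block has rows. Functions such as `version()` or `uptime()` can return different values on each server, so the locally computed value may be wrong. The sketch below models only that decision. It assumes a toy `DemoConstColumn` (a value plus a row count) in place of `ColumnConst`, and plain vectors in place of blocks. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a constant column: one value logically repeated `rows` times.
template <typename T>
struct DemoConstColumn
{
    T value;
    size_t rows;
};

// The remote server sends the constant materialized (one copy per row).
// If any rows arrived, trust the first received value; otherwise fall back to the local one.
template <typename T>
DemoConstColumn<T> adaptConstColumn(const std::vector<T> & received_materialized, const T & local_value, size_t rows)
{
    if (rows > 0 && !received_materialized.empty())
        return {received_materialized.front(), rows};
    return {local_value, rows};
}
```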
--- a/dbms/src/DataStreams/RemoteBlockOutputStream.cpp
+++ b/dbms/src/DataStreams/RemoteBlockOutputStream.cpp
@ -32,7 +32,7 @@ RemoteBlockOutputStream::RemoteBlockOutputStream(Connection & connection_,

    while (true)
    {
-        Connection::Packet packet = connection.receivePacket();
+        Packet packet = connection.receivePacket();

        if (Protocol::Server::Data == packet.type)
        {
@ -77,7 +77,7 @@ void RemoteBlockOutputStream::write(const Block & block)
        auto packet_type = connection.checkPacket();
        if (packet_type && *packet_type == Protocol::Server::Exception)
        {
-            Connection::Packet packet = connection.receivePacket();
+            Packet packet = connection.receivePacket();
            packet.exception->rethrow();
        }

@ -101,7 +101,7 @@ void RemoteBlockOutputStream::writeSuffix()
    /// Wait for EndOfStream or Exception packet, skip Log packets.
    while (true)
    {
-        Connection::Packet packet = connection.receivePacket();
+        Packet packet = connection.receivePacket();

        if (Protocol::Server::EndOfStream == packet.type)
            break;
--- a/dbms/src/DataStreams/TotalsHavingBlockInputStream.h
+++ b/dbms/src/DataStreams/TotalsHavingBlockInputStream.h
@ -10,7 +10,7 @@ class Arena;
 using ArenaPtr = std::shared_ptr<Arena>;

 class ExpressionActions;
-
+enum class TotalsMode;

 /** Takes blocks after grouping, with non-finalized aggregate functions.
  * Calculates total values according to totals_mode.
--- a/dbms/src/Dictionaries/Embedded/RegionsNames.cpp
+++ b/dbms/src/Dictionaries/Embedded/RegionsNames.cpp
@ -19,7 +19,7 @@ RegionsNames::RegionsNames(IRegionsNamesDataProviderPtr data_provider)
 {
    for (size_t language_id = 0; language_id < SUPPORTED_LANGUAGES_COUNT; ++language_id)
    {
-        const std::string & language = getSupportedLanguages()[language_id];
+        const std::string & language = supported_languages[language_id];
        names_sources[language_id] = data_provider->getLanguageRegionsNamesSource(language);
    }

@ -34,7 +34,7 @@ std::string RegionsNames::dumpSupportedLanguagesNames()
        if (i > 0)
            res += ", ";
        res += '\'';
-        res += getLanguageAliases()[i].name;
+        res += language_aliases[i].first;
        res += '\'';
    }
    return res;
@ -48,7 +48,7 @@ void RegionsNames::reload()
    RegionID max_region_id = 0;
    for (size_t language_id = 0; language_id < SUPPORTED_LANGUAGES_COUNT; ++language_id)
    {
-        const std::string & language = getSupportedLanguages()[language_id];
+        const std::string & language = supported_languages[language_id];

        auto names_source = names_sources[language_id];

--- a/dbms/src/Dictionaries/Embedded/RegionsNames.h
+++ b/dbms/src/Dictionaries/Embedded/RegionsNames.h
@ -20,7 +20,7 @@
 class RegionsNames
 {
 public:
-    enum class Language
+    enum class Language : size_t
    {
        RU = 0,
        EN,
@ -28,36 +28,35 @@ public:
        BY,
        KZ,
        TR,
+
+        END
    };

 private:
-    static const size_t ROOT_LANGUAGE = 0;
-    static const size_t SUPPORTED_LANGUAGES_COUNT = 6;
-    static const size_t LANGUAGE_ALIASES_COUNT = 7;
-
-    static const char ** getSupportedLanguages()
+    static inline constexpr const char * supported_languages[] =
    {
-        static const char * res[]{"ru", "en", "ua", "by", "kz", "tr"};
-        return res;
-    }
-
-    struct language_alias
-    {
-        const char * const name;
-        const Language lang;
+        "ru",
+        "en",
+        "ua",
+        "by",
+        "kz",
+        "tr"
    };
-    static const language_alias * getLanguageAliases()
-    {
-        static constexpr const language_alias language_aliases[]{{"ru", Language::RU},
-                                                                 {"en", Language::EN},
-                                                                 {"ua", Language::UA},
-                                                                 {"uk", Language::UA},
-                                                                 {"by", Language::BY},
-                                                                 {"kz", Language::KZ},
-                                                                 {"tr", Language::TR}};

-        return language_aliases;
-    }
+    static inline constexpr std::pair<const char *, Language> language_aliases[] =
+    {
+        {"ru", Language::RU},
+        {"en", Language::EN},
+        {"ua", Language::UA},
+        {"uk", Language::UA},
+        {"by", Language::BY},
+        {"kz", Language::KZ},
+        {"tr", Language::TR}
+    };
+
+    static constexpr size_t ROOT_LANGUAGE = 0;
+    static constexpr size_t SUPPORTED_LANGUAGES_COUNT = size_t(Language::END);
+    static constexpr size_t LANGUAGE_ALIASES_COUNT = std::size(language_aliases);

    using NamesSources = std::vector<std::shared_ptr<ILanguageRegionsNamesDataSource>>;

@ -94,9 +93,9 @@ public:
        {
            for (size_t i = 0; i < LANGUAGE_ALIASES_COUNT; ++i)
            {
-                const auto & alias = getLanguageAliases()[i];
-                if (language[0] == alias.name[0] && language[1] == alias.name[1])
-                    return alias.lang;
+                const auto & alias = language_aliases[i];
+                if (language[0] == alias.first[0] && language[1] == alias.first[1])
+                    return alias.second;
            }
        }
        throw Poco::Exception("Unsupported language for region name. Supported languages are: " + dumpSupportedLanguagesNames() + ".");
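The alias table is a plain C array, so its element count must come from `std::size` (or `sizeof(arr) / sizeof(arr[0])`). Plain `sizeof(arr)` gives the size in bytes, and a loop bounded by it would read past the end of the table. The sketch below shows the count computation alongside the same two-character alias comparison used in the lookup loop. The names `DemoLanguage`, `demo_aliases` and `demoLanguageFromAlias` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>

enum class DemoLanguage : size_t { RU = 0, EN, UA, END };

static constexpr std::pair<const char *, DemoLanguage> demo_aliases[] =
{
    {"ru", DemoLanguage::RU},
    {"en", DemoLanguage::EN},
    {"ua", DemoLanguage::UA},
    {"uk", DemoLanguage::UA},
};

// Element count, not byte size: sizeof(demo_aliases) is 4 * sizeof(std::pair<...>).
static constexpr size_t DEMO_ALIASES_COUNT = std::size(demo_aliases);
static_assert(DEMO_ALIASES_COUNT == 4);

// The END enumerator doubles as the number of supported languages.
static constexpr size_t DEMO_LANGUAGES_COUNT = size_t(DemoLanguage::END);

// Compares the first two characters, like the lookup loop above; END means "not found".
DemoLanguage demoLanguageFromAlias(const char * language)
{
    for (size_t i = 0; i < DEMO_ALIASES_COUNT; ++i)
    {
        const auto & alias = demo_aliases[i];
        if (language[0] == alias.first[0] && language[1] == alias.first[1])
            return alias.second;
    }
    return DemoLanguage::END;
}
```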
--- a/dbms/src/Dictionaries/readInvalidateQuery.cpp
+++ b/dbms/src/Dictionaries/readInvalidateQuery.cpp
@ -1,6 +1,7 @@
 #include "readInvalidateQuery.h"
 #include <DataStreams/IBlockInputStream.h>
 #include <IO/WriteBufferFromString.h>
+#include <Formats/FormatSettings.h>


 namespace DB
--- a/dbms/src/Functions/FunctionHelpers.cpp
+++ b/dbms/src/Functions/FunctionHelpers.cpp
@ -18,6 +18,7 @@ namespace ErrorCodes
 {
    extern const int ILLEGAL_COLUMN;
    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
+    extern const int SIZES_OF_ARRAYS_DOESNT_MATCH;
 }

 const ColumnConst * checkAndGetColumnConstStringOrFixedString(const IColumn * column)
@ -118,4 +119,28 @@ void validateArgumentType(const IFunction & func, const DataTypes & arguments,
                        ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
 }

+std::pair<std::vector<const IColumn *>, const ColumnArray::Offset *>
+checkAndGetNestedArrayOffset(const IColumn ** columns, size_t num_arguments)
+{
+    assert(num_arguments > 0);
+    std::vector<const IColumn *> nested_columns(num_arguments);
+    const ColumnArray::Offsets * offsets = nullptr;
+    for (size_t i = 0; i < num_arguments; ++i)
+    {
+        const ColumnArray::Offsets * offsets_i = nullptr;
+        if (const ColumnArray * arr = checkAndGetColumn<const ColumnArray>(columns[i]))
+        {
+            nested_columns[i] = &arr->getData();
+            offsets_i = &arr->getOffsets();
+        }
+        else
+            throw Exception("Illegal column " + columns[i]->getName() + " as argument of function", ErrorCodes::ILLEGAL_COLUMN);
+        if (i == 0)
+            offsets = offsets_i;
+        else if (*offsets_i != *offsets)
+            throw Exception("Lengths of all arrays passed to aggregate function must be equal.", ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH);
+    }
+    return {nested_columns, offsets->data()};
+}
+
 }
--- a/dbms/src/Functions/FunctionHelpers.h
+++ b/dbms/src/Functions/FunctionHelpers.h
@ -4,6 +4,7 @@
 #include <Common/assert_cast.h>
 #include <DataTypes/IDataType.h>
 #include <Columns/IColumn.h>
+#include <Columns/ColumnArray.h>
 #include <Columns/ColumnConst.h>
 #include <Core/Block.h>
 #include <Core/ColumnNumbers.h>
@ -89,4 +90,8 @@ void validateArgumentType(const IFunction & func, const DataTypes & arguments,
        size_t argument_index, bool (* validator_func)(const IDataType &),
        const char * expected_type_description);

+/// Checks that all array columns in the list have equal offsets. Returns a pair of the nested columns and the offsets if so; otherwise throws.
+std::pair<std::vector<const IColumn *>, const ColumnArray::Offset *>
+checkAndGetNestedArrayOffset(const IColumn ** columns, size_t num_arguments);
+
 }
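`checkAndGetNestedArrayOffset` relies on the `ColumnArray` layout. All elements of all rows are stored in one flattened nested column, and `offsets[i]` is the end position of row `i`. Two array arguments therefore have equal per-row lengths exactly when their offset vectors are equal. The sketch below performs the same check over a hypothetical `DemoArrayColumn` (flattened `data` plus cumulative `offsets`). It throws `std::invalid_argument` instead of a ClickHouse `Exception`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy array column: flattened values plus cumulative end offsets (one per row).
struct DemoArrayColumn
{
    std::vector<int> data;
    std::vector<uint64_t> offsets;
};

// Returns pointers to every nested data vector plus the shared offsets,
// or throws if some row has arrays of different lengths.
std::pair<std::vector<const std::vector<int> *>, const uint64_t *>
demoCheckAndGetNestedArrayOffset(const std::vector<const DemoArrayColumn *> & columns)
{
    assert(!columns.empty());
    std::vector<const std::vector<int> *> nested(columns.size());
    const std::vector<uint64_t> * offsets = nullptr;
    for (size_t i = 0; i < columns.size(); ++i)
    {
        nested[i] = &columns[i]->data;
        if (i == 0)
            offsets = &columns[i]->offsets;
        else if (columns[i]->offsets != *offsets)
            throw std::invalid_argument("Lengths of all arrays must be equal");
    }
    return {nested, offsets->data()};
}
```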
--- a/dbms/src/Functions/array/arrayIntersect.cpp
+++ b/dbms/src/Functions/array/arrayIntersect.cpp
@ -58,10 +58,19 @@ private:
    struct UnpackedArrays
    {
        size_t base_rows = 0;
-        std::vector<char> is_const;
-        std::vector<const NullMap *> null_maps;
-        std::vector<const ColumnArray::ColumnOffsets::Container *> offsets;
-        ColumnRawPtrs nested_columns;
+
+        struct UnpackedArray
+        {
+            bool is_const = false;
+            const NullMap * null_map = nullptr;
+            const NullMap * overflow_mask = nullptr;
+            const ColumnArray::ColumnOffsets::Container * offsets = nullptr;
+            const IColumn * nested_column = nullptr;
+
+        };
+
+        std::vector<UnpackedArray> args;
+        Columns column_holders;

        UnpackedArrays() = default;
    };
@ -69,9 +78,16 @@ private:
    /// Cast column to data_type removing nullable if data_type hasn't.
    /// It's expected that column can represent data_type after removing some NullMap's.
    ColumnPtr castRemoveNullable(const ColumnPtr & column, const DataTypePtr & data_type) const;
-    Columns castColumns(Block & block, const ColumnNumbers & arguments,
+
+    struct CastArgumentsResult
+    {
+        ColumnsWithTypeAndName initial;
+        ColumnsWithTypeAndName casted;
+    };
+
+    CastArgumentsResult castColumns(Block & block, const ColumnNumbers & arguments,
                        const DataTypePtr & return_type, const DataTypePtr & return_type_with_nulls) const;
-    UnpackedArrays prepareArrays(const Columns & columns) const;
+    UnpackedArrays prepareArrays(const ColumnsWithTypeAndName & columns, ColumnsWithTypeAndName & initial_columns) const;

    template <typename Map, typename ColumnType, bool is_numeric_column>
    static ColumnPtr execute(const UnpackedArrays & arrays, MutableColumnPtr result_data);
@ -173,12 +189,13 @@ ColumnPtr FunctionArrayIntersect::castRemoveNullable(const ColumnPtr & column, c
    return column;
 }

-Columns FunctionArrayIntersect::castColumns(
+FunctionArrayIntersect::CastArgumentsResult FunctionArrayIntersect::castColumns(
        Block & block, const ColumnNumbers & arguments, const DataTypePtr & return_type,
        const DataTypePtr & return_type_with_nulls) const
 {
    size_t num_args = arguments.size();
-    Columns columns(num_args);
+    ColumnsWithTypeAndName initial_columns(num_args);
+    ColumnsWithTypeAndName columns(num_args);

    auto type_array = checkAndGetDataType<DataTypeArray>(return_type.get());
    auto & type_nested = type_array->getNestedType();
@ -201,6 +218,8 @@ Columns FunctionArrayIntersect::castColumns(
    for (size_t i = 0; i < num_args; ++i)
    {
        const ColumnWithTypeAndName & arg = block.getByPosition(arguments[i]);
+        initial_columns[i] = arg;
+        columns[i] = arg;
        auto & column = columns[i];

        if (is_numeric_or_string)
@ -208,68 +227,120 @@ Columns FunctionArrayIntersect::castColumns(
            /// Cast to Array(T) or Array(Nullable(T)).
            if (nested_is_nullable)
            {
-                if (arg.type->equals(*return_type))
-                    column = arg.column;
-                else
-                    column = castColumn(arg, return_type, context);
+                if (!arg.type->equals(*return_type))
+                {
+                    column.column = castColumn(arg, return_type, context);
+                    column.type = return_type;
+                }
            }
            else
            {
-                /// If result has array type Array(T) still cast Array(Nullable(U)) to Array(Nullable(T))
-                ///  because cannot cast Nullable(T) to T.
-                if (arg.type->equals(*return_type) || arg.type->equals(*nullable_return_type))
-                    column = arg.column;
-                else if (static_cast<const DataTypeArray &>(*arg.type).getNestedType()->isNullable())
-                    column = castColumn(arg, nullable_return_type, context);
-                else
-                    column = castColumn(arg, return_type, context);
+
+                if (!arg.type->equals(*return_type) && !arg.type->equals(*nullable_return_type))
+                {
+                    /// If result has array type Array(T) still cast Array(Nullable(U)) to Array(Nullable(T))
+                    ///  because cannot cast Nullable(T) to T.
+                    if (static_cast<const DataTypeArray &>(*arg.type).getNestedType()->isNullable())
+                    {
+                        column.column = castColumn(arg, nullable_return_type, context);
+                        column.type = nullable_return_type;
+                    }
+                    else
+                    {
+                        column.column = castColumn(arg, return_type, context);
+                        column.type = return_type;
+                    }
+                }
            }
        }
        else
        {
            /// return_type_with_nulls is the most common subtype with possible nullable parts.
-            if (arg.type->equals(*return_type_with_nulls))
-                column = arg.column;
-            else
-                column = castColumn(arg, return_type_with_nulls, context);
+            if (!arg.type->equals(*return_type_with_nulls))
+            {
+                column.column = castColumn(arg, return_type_with_nulls, context);
+                column.type = return_type_with_nulls;
+            }
        }
    }

-    return columns;
+    return {.initial = initial_columns, .casted = columns};
 }

-FunctionArrayIntersect::UnpackedArrays FunctionArrayIntersect::prepareArrays(const Columns & columns) const
+static ColumnPtr callFunctionNotEquals(ColumnWithTypeAndName first, ColumnWithTypeAndName second, const Context & context)
+{
+    ColumnsWithTypeAndName args;
+    args.reserve(2);
+    args.emplace_back(std::move(first));
+    args.emplace_back(std::move(second));
+
+    auto eq_func = FunctionFactory::instance().get("notEquals", context)->build(args);
+
+    Block block = args;
+    block.insert({nullptr, eq_func->getReturnType(), ""});
+
+    eq_func->execute(block, {0, 1}, 2, args.front().column->size());
+
+    return block.getByPosition(2).column;
+}
+
+FunctionArrayIntersect::UnpackedArrays FunctionArrayIntersect::prepareArrays(
+    const ColumnsWithTypeAndName & columns, ColumnsWithTypeAndName & initial_columns) const
 {
    UnpackedArrays arrays;

    size_t columns_number = columns.size();
-    arrays.is_const.assign(columns_number, false);
-    arrays.null_maps.resize(columns_number);
-    arrays.offsets.resize(columns_number);
-    arrays.nested_columns.resize(columns_number);
+    arrays.args.resize(columns_number);

    bool all_const = true;

    for (auto i : ext::range(0, columns_number))
    {
-        auto argument_column = columns[i].get();
+        auto & arg = arrays.args[i];
+        auto argument_column = columns[i].column.get();
+        auto initial_column = initial_columns[i].column.get();
+
        if (auto argument_column_const = typeid_cast<const ColumnConst *>(argument_column))
        {
-            arrays.is_const[i] = true;
+            arg.is_const = true;
            argument_column = argument_column_const->getDataColumnPtr().get();
+            initial_column = typeid_cast<const ColumnConst *>(initial_column)->getDataColumnPtr().get();
        }

        if (auto argument_column_array = typeid_cast<const ColumnArray *>(argument_column))
        {
-            if (!arrays.is_const[i])
+            if (!arg.is_const)
                all_const = false;

-            arrays.offsets[i] = &argument_column_array->getOffsets();
-            arrays.nested_columns[i] = &argument_column_array->getData();
-            if (auto column_nullable = typeid_cast<const ColumnNullable *>(arrays.nested_columns[i]))
+            arg.offsets = &argument_column_array->getOffsets();
+            arg.nested_column = &argument_column_array->getData();
+
+            initial_column = &typeid_cast<const ColumnArray *>(initial_column)->getData();
+
+            if (auto column_nullable = typeid_cast<const ColumnNullable *>(arg.nested_column))
            {
-                arrays.null_maps[i] = &column_nullable->getNullMapData();
-                arrays.nested_columns[i] = &column_nullable->getNestedColumn();
+                arg.null_map = &column_nullable->getNullMapData();
+                arg.nested_column = &column_nullable->getNestedColumn();
+                initial_column = &typeid_cast<const ColumnNullable *>(initial_column)->getNestedColumn();
+            }
+
+            /// If the column was cast, create an overflow mask for integer and Date/DateTime types.
+            if (arg.nested_column != initial_column)
+            {
+                auto & nested_init_type = typeid_cast<const DataTypeArray *>(removeNullable(initial_columns[i].type).get())->getNestedType();
+                auto & nested_cast_type = typeid_cast<const DataTypeArray *>(removeNullable(columns[i].type).get())->getNestedType();
+
+                if (isInteger(nested_init_type) || isDateOrDateTime(nested_init_type))
+                {
+                    /// Compare original and casted columns. It seems to be the easiest way.
+                    auto overflow_mask = callFunctionNotEquals(
+                            {arg.nested_column->getPtr(), nested_cast_type, ""},
+                            {initial_column->getPtr(), nested_init_type, ""},
+                            context);
+
+                    arg.overflow_mask = &typeid_cast<const ColumnUInt8 *>(overflow_mask.get())->getData();
+                    arrays.column_holders.emplace_back(std::move(overflow_mask));
+                }
            }
        }
        else
@ -278,16 +349,16 @@ FunctionArrayIntersect::UnpackedArrays FunctionArrayIntersect::prepareArrays(con

    if (all_const)
    {
-        arrays.base_rows = arrays.offsets.front()->size();
+        arrays.base_rows = arrays.args.front().offsets->size();
    }
    else
    {
        for (auto i : ext::range(0, columns_number))
        {
-            if (arrays.is_const[i])
+            if (arrays.args[i].is_const)
                continue;

-            size_t rows = arrays.offsets[i]->size();
+            size_t rows = arrays.args[i].offsets->size();
            if (arrays.base_rows == 0 && rows > 0)
                arrays.base_rows = rows;
            else if (arrays.base_rows != rows)
@ -322,9 +393,9 @@ void FunctionArrayIntersect::executeImpl(Block & block, const ColumnNumbers & ar

    auto return_type_with_nulls = getMostSubtype(data_types, true, true);

-    Columns columns = castColumns(block, arguments, return_type, return_type_with_nulls);
+    auto columns = castColumns(block, arguments, return_type, return_type_with_nulls);

-    UnpackedArrays arrays = prepareArrays(columns);
+    UnpackedArrays arrays = prepareArrays(columns.casted, columns.initial);

    ColumnPtr result_column;
    auto not_nullable_nested_return_type = removeNullable(nested_return_type);
@ -356,7 +427,7 @@ void FunctionArrayIntersect::executeImpl(Block & block, const ColumnNumbers & ar
            result_column = execute<StringMap, ColumnFixedString, false>(arrays, std::move(column));
        else
        {
-            column = static_cast<const DataTypeArray &>(*return_type_with_nulls).getNestedType()->createColumn();
+            column = assert_cast<const DataTypeArray &>(*return_type_with_nulls).getNestedType()->createColumn();
            result_column = castRemoveNullable(execute<StringMap, IColumn, false>(arrays, std::move(column)), return_type);
        }
    }
@ -377,24 +448,24 @@ void FunctionArrayIntersect::NumberExecutor::operator()()
 template <typename Map, typename ColumnType, bool is_numeric_column>
 ColumnPtr FunctionArrayIntersect::execute(const UnpackedArrays & arrays, MutableColumnPtr result_data_ptr)
 {
-    auto args = arrays.nested_columns.size();
+    auto args = arrays.args.size();
    auto rows = arrays.base_rows;

    bool all_nullable = true;

    std::vector<const ColumnType *> columns;
    columns.reserve(args);
-    for (auto arg : ext::range(0, args))
+    for (auto & arg : arrays.args)
    {
        if constexpr (std::is_same<ColumnType, IColumn>::value)
-            columns.push_back(arrays.nested_columns[arg]);
+            columns.push_back(arg.nested_column);
        else
-            columns.push_back(checkAndGetColumn<ColumnType>(arrays.nested_columns[arg]));
+            columns.push_back(checkAndGetColumn<ColumnType>(arg.nested_column));

        if (!columns.back())
            throw Exception("Unexpected array type for function arrayIntersect", ErrorCodes::LOGICAL_ERROR);

-        if (!arrays.null_maps[arg])
+        if (!arg.null_map)
            all_nullable = false;
    }

@ -415,44 +486,45 @@ ColumnPtr FunctionArrayIntersect::execute(const UnpackedArrays & arrays, Mutable

        bool all_has_nullable = all_nullable;

-        for (auto arg : ext::range(0, args))
+        for (auto arg_num : ext::range(0, args))
        {
+            auto & arg = arrays.args[arg_num];
            bool current_has_nullable = false;

            size_t off;
            // const array has only one row
-            bool const_arg = arrays.is_const[arg];
-            if (const_arg)
-                off = (*arrays.offsets[arg])[0];
+            if (arg.is_const)
+                off = (*arg.offsets)[0];
            else
-                off = (*arrays.offsets[arg])[row];
+                off = (*arg.offsets)[row];

-            for (auto i : ext::range(prev_off[arg], off))
+            for (auto i : ext::range(prev_off[arg_num], off))
            {
-                if (arrays.null_maps[arg] && (*arrays.null_maps[arg])[i])
+                if (arg.null_map && (*arg.null_map)[i])
                    current_has_nullable = true;
-                else
+                else if (!arg.overflow_mask || (*arg.overflow_mask)[i] == 0)
                {
                    typename Map::mapped_type * value = nullptr;

                    if constexpr (is_numeric_column)
-                        value = &map[columns[arg]->getElement(i)];
+                        value = &map[columns[arg_num]->getElement(i)];
                    else if constexpr (std::is_same<ColumnType, ColumnString>::value || std::is_same<ColumnType, ColumnFixedString>::value)
-                        value = &map[columns[arg]->getDataAt(i)];
+                        value = &map[columns[arg_num]->getDataAt(i)];
                    else
                    {
                        const char * data = nullptr;
-                        value = &map[columns[arg]->serializeValueIntoArena(i, arena, data)];
+                        value = &map[columns[arg_num]->serializeValueIntoArena(i, arena, data)];
                    }

-                    if (*value == arg)
+                    /// Here we count the number of element appearances, but no more than once per array.
+                    if (*value == arg_num)
                        ++(*value);
                }
            }

-            prev_off[arg] = off;
-            if (const_arg)
-                prev_off[arg] = 0;
+            prev_off[arg_num] = off;
+            if (arg.is_const)
+                prev_off[arg_num] = 0;

            if (!current_has_nullable)
                all_has_nullable = false;
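The counting rule in `execute` is compact but subtle. A value's counter is incremented while processing argument `arg_num` only if the counter currently equals `arg_num`. The counter therefore reaches the number of arrays only when the value occurred in every array, and duplicates within one array are counted once. The sketch below applies the rule to a single row of `int` arrays. It ignores the null maps and overflow masks the real code also consults. `demoIntersectRow` is a hypothetical name, and emitting results in first-array order is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Intersects the arrays of one row using the "increment only if counter == arg_num" rule.
std::vector<int> demoIntersectRow(const std::vector<std::vector<int>> & arrays)
{
    if (arrays.empty())
        return {};

    std::unordered_map<int, size_t> counts;
    for (size_t arg_num = 0; arg_num < arrays.size(); ++arg_num)
    {
        for (int v : arrays[arg_num])
        {
            size_t & value = counts[v]; // zero-initialized on first access
            if (value == arg_num)       // seen in all previous arrays, not yet in this one
                ++value;
        }
    }

    std::vector<int> result;
    for (int v : arrays.front())
    {
        auto it = counts.find(v);
        if (it != counts.end() && it->second == arrays.size())
        {
            result.push_back(v);
            it->second = 0; // emit each value once
        }
    }
    return result;
}
```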
--- a/dbms/src/Functions/array/arrayReduce.cpp
+++ b/dbms/src/Functions/array/arrayReduce.cpp
@ -7,11 +7,14 @@
 #include <Columns/ColumnAggregateFunction.h>
 #include <IO/WriteHelpers.h>
 #include <AggregateFunctions/AggregateFunctionFactory.h>
+#include <AggregateFunctions/AggregateFunctionState.h>
 #include <AggregateFunctions/IAggregateFunction.h>
 #include <AggregateFunctions/parseAggregateFunctionParameters.h>
 #include <Common/AlignedBuffer.h>
 #include <Common/Arena.h>

+#include <ext/scope_guard.h>
+

 namespace DB
 {
@ -106,10 +109,7 @@ DataTypePtr FunctionArrayReduce::getReturnTypeImpl(const ColumnsWithTypeAndName
 void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count)
 {
    IAggregateFunction & agg_func = *aggregate_function.get();
-    AlignedBuffer place_holder(agg_func.sizeOfData(), agg_func.alignOfData());
-    AggregateDataPtr place = place_holder.data();
-
-    std::unique_ptr<Arena> arena = agg_func.allocatesMemoryInArena() ? std::make_unique<Arena>() : nullptr;
+    std::unique_ptr<Arena> arena = std::make_unique<Arena>();

    /// Aggregate functions do not support constant columns. Therefore, we materialize them.
    std::vector<ColumnPtr> materialized_columns;
@ -157,32 +157,40 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum
        throw Exception("State function " + agg_func.getName() + " inserts results into non-state column "
                        + block.getByPosition(result).type->getName(), ErrorCodes::ILLEGAL_COLUMN);

-    ColumnArray::Offset current_offset = 0;
+    PODArray<AggregateDataPtr> places(input_rows_count);
    for (size_t i = 0; i < input_rows_count; ++i)
    {
-        agg_func.create(place);
-        ColumnArray::Offset next_offset = (*offsets)[i];
-
+        places[i] = arena->alignedAlloc(agg_func.sizeOfData(), agg_func.alignOfData());
        try
        {
-            for (size_t j = current_offset; j < next_offset; ++j)
-                agg_func.add(place, aggregate_arguments, j, arena.get());
-
-            if (!res_col_aggregate_function)
-                agg_func.insertResultInto(place, res_col);
-            else
-                res_col_aggregate_function->insertFrom(place);
+            agg_func.create(places[i]);
        }
        catch (...)
        {
-            agg_func.destroy(place);
+            agg_func.destroy(places[i]);
            throw;
        }
-
-        agg_func.destroy(place);
-        current_offset = next_offset;
    }

+    SCOPE_EXIT({
+        for (size_t i = 0; i < input_rows_count; ++i)
+            agg_func.destroy(places[i]);
+    });
+
+    {
+        auto that = &agg_func;
+        /// Unnest consecutive trailing -State combinators
+        while (auto func = typeid_cast<AggregateFunctionState *>(that))
+            that = func->getNestedFunction().get();
+
+        that->addBatchArray(input_rows_count, places.data(), 0, aggregate_arguments, offsets->data(), arena.get());
+    }
+
+    for (size_t i = 0; i < input_rows_count; ++i)
+        if (!res_col_aggregate_function)
+            agg_func.insertResultInto(places[i], res_col);
+        else
+            res_col_aggregate_function->insertFrom(places[i]);
    block.getByPosition(result).column = std::move(result_holder);
 }

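The rewritten `FunctionArrayReduce::executeImpl` allocates one state per row in the arena and passes the whole batch, together with the array offsets, to `addBatchArray`. Row `i` then aggregates the nested elements in `[offsets[i-1], offsets[i])`, with the first row starting at 0. The Aggregator's no-key path instead feeds every nested element into a single state, so the element count is the last offset. Below is a minimal standalone sketch of that offset convention. It assumes a plain `sum` and `std::vector` in place of ClickHouse columns. `SumState`, `addBatchArraySketch` and `addBatchSinglePlaceSketch` are hypothetical names, not ClickHouse APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an aggregate state (here: a running sum).
struct SumState
{
    int64_t sum = 0;
};

// offsets[i] is the end (exclusive) of row i's array inside the flattened
// `values`; row i starts where row i - 1 ended (row 0 starts at 0).
void addBatchArraySketch(size_t rows, std::vector<SumState> & places,
                         const std::vector<int64_t> & values, const std::vector<uint64_t> & offsets)
{
    assert(places.size() >= rows && offsets.size() >= rows);
    uint64_t current_offset = 0;
    for (size_t i = 0; i < rows; ++i)
    {
        uint64_t next_offset = offsets[i];
        for (uint64_t j = current_offset; j < next_offset; ++j)
            places[i].sum += values[j];
        current_offset = next_offset;
    }
}

// Without GROUP BY keys all elements go into one state, so the number of
// nested elements to add is the last offset, offsets[rows - 1].
void addBatchSinglePlaceSketch(size_t rows, SumState & place,
                               const std::vector<int64_t> & values, const std::vector<uint64_t> & offsets)
{
    if (rows == 0)
        return;   // guard explicitly instead of reading offsets[-1]
    uint64_t total = offsets[rows - 1];
    for (uint64_t j = 0; j < total; ++j)
        place.sum += values[j];
}
```

For the arrays `[1, 2]`, `[]` and `[3, 4, 5]`, the flattened values are `{1, 2, 3, 4, 5}` and the offsets are `{2, 2, 5}`. The per-row sums are 3, 0 and 12, and the single-place sum is 15.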
--- a/dbms/src/Functions/modulo.cpp
+++ b/dbms/src/Functions/modulo.cpp
@ -61,8 +61,23 @@ struct ModuloByConstantImpl

        /// Here we failed to make the SSE variant from libdivide give an advantage.
        size_t size = a.size();
-        for (size_t i = 0; i < size; ++i)
-            c[i] = a[i] - (a[i] / divider) * b; /// NOTE: perhaps, the division semantics with the remainder of negative numbers is not preserved.
+
+        /// __restrict is needed for char-like element types: strict aliasing alone lets char pointers alias anything.
+        auto * __restrict src = a.data();
+        auto * __restrict dst = c.data();
+
+        if (b & (b - 1))
+        {
+            for (size_t i = 0; i < size; ++i)
+                dst[i] = src[i] - (src[i] / divider) * b; /// NOTE: perhaps, the division semantics with the remainder of negative numbers is not preserved.
+        }
+        else
+        {
+            /// libdivide (with gcc) does not work well for power-of-two divisors, so use a bit mask instead.
+            auto mask = b - 1;
+            for (size_t i = 0; i < size; ++i)
+                dst[i] = src[i] & mask;
+        }
    }
 };

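The new branch in `ModuloByConstantImpl` replaces division by a power-of-two constant `b` with a mask `b - 1`. The test `b & (b - 1)` is zero exactly when `b` is a power of two, or zero, which must be rejected before this point. Below is a standalone sketch of both branches with hypothetical names and a single element type. It also shows a caveat for signed inputs. For a negative dividend, `x & (b - 1)` yields the non-negative residue. C++ `%`, and `x - (x / b) * b` with truncating division, yield a negative remainder instead. Whether negative dividends can reach this code depends on operand types that the diff does not show. `__restrict` is a GCC/Clang extension, not standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if b is a power of two: b & (b - 1) clears the lowest set bit.
template <typename T>
constexpr bool isPowerOfTwo(T b)
{
    return b > 0 && (b & (b - 1)) == 0;
}

// Hypothetical remainder loop mirroring the two branches in the diff.
template <typename T>
void moduloByConstant(const T * __restrict src, T * __restrict dst, size_t size, T b)
{
    assert(b > 0);
    if (!isPowerOfTwo(b))
    {
        for (size_t i = 0; i < size; ++i)
            dst[i] = src[i] - (src[i] / b) * b;   // same as src[i] % b
    }
    else
    {
        const T mask = b - 1;
        for (size_t i = 0; i < size; ++i)
            dst[i] = src[i] & mask;               // equals src[i] % b only for src[i] >= 0
    }
}
```

For unsigned data the two branches agree. For example, 17 modulo 8 gives 1 either way. For `int32_t` with value -5 and divisor 4, the mask gives 3, while `-5 % 4` is -1.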
--- a/dbms/src/Functions/substring.cpp
+++ b/dbms/src/Functions/substring.cpp
@ -138,15 +138,9 @@ public:
        Int64 length_value = 0;

        if (column_start_const)
-        {
            start_value = column_start_const->getInt(0);
-        }
        if (column_length_const)
-        {
            length_value = column_length_const->getInt(0);
-            if (length_value < 0)
-                throw Exception("Third argument provided for function substring could not be negative.", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
-        }

        if constexpr (is_utf8)
        {
--- a/dbms/src/IO/BufferWithOwnMemory.h
+++ b/dbms/src/IO/BufferWithOwnMemory.h
@ -77,7 +77,7 @@ struct Memory : boost::noncopyable, Allocator
            m_capacity = new_size;
            alloc();
        }
-        else if (new_size <= m_size)
+        else if (new_size <= m_capacity - pad_right)
        {
            m_size = new_size;
            return;
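The one-line change in `Memory::resize` widens the in-place path. The old check (`new_size <= m_size`) reused the buffer only when shrinking. The new check (`new_size <= m_capacity - pad_right`) also reuses it when growing, as long as the new size still fits in front of the right padding. Below is a simplified sketch under assumed conditions: a fixed `pad_right` and `new[]`-based allocation. `PaddedBuffer` and its `allocations` counter are hypothetical, and the sketch omits the real class's allocator and alignment handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical simplified buffer; `capacity` counts the usable bytes plus
// `pad_right` spare bytes at the end of the allocation.
struct PaddedBuffer
{
    static constexpr size_t pad_right = 15;

    std::unique_ptr<char[]> data;
    size_t size = 0;
    size_t capacity = 0;
    size_t allocations = 0;   // illustration only: how often we (re)allocated

    void resize(size_t new_size)
    {
        // Reuse the allocation whenever the new size fits in front of the padding,
        // whether the buffer shrinks or grows.
        if (capacity != 0 && new_size <= capacity - pad_right)
        {
            size = new_size;
            return;
        }

        size_t new_capacity = new_size + pad_right;
        auto new_data = std::make_unique<char[]>(new_capacity);
        if (size != 0)
            std::memcpy(new_data.get(), data.get(), size);   // keep existing contents
        data = std::move(new_data);
        capacity = new_capacity;
        size = new_size;
        ++allocations;
    }
};
```

Under the old condition, shrinking from 100 to 50 bytes and then growing back to 100 would reallocate. With the new condition, the sketch reuses the existing 115-byte allocation.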
--- a/dbms/src/Interpreters/Aggregator.cpp
+++ b/dbms/src/Interpreters/Aggregator.cpp
@ -26,6 +26,8 @@
 #include <Common/assert_cast.h>
 #include <common/demangle.h>
 #include <common/config_common.h>
+#include <AggregateFunctions/AggregateFunctionArray.h>
+#include <AggregateFunctions/AggregateFunctionState.h>


 namespace ProfileEvents
@ -492,7 +494,12 @@ void NO_INLINE Aggregator::executeImplBatch(

    /// Add values to the aggregate functions.
    for (AggregateFunctionInstruction * inst = aggregate_instructions; inst->that; ++inst)
-        inst->that->addBatch(rows, places.data(), inst->state_offset, inst->arguments, aggregates_pool);
+    {
+        if (inst->offsets)
+            inst->batch_that->addBatchArray(rows, places.data(), inst->state_offset, inst->batch_arguments, inst->offsets, aggregates_pool);
+        else
+            inst->batch_that->addBatch(rows, places.data(), inst->state_offset, inst->batch_arguments, aggregates_pool);
+    }
 }


@ -504,7 +511,13 @@ void NO_INLINE Aggregator::executeWithoutKeyImpl(
 {
    /// Adding values
    for (AggregateFunctionInstruction * inst = aggregate_instructions; inst->that; ++inst)
-        inst->that->addBatchSinglePlace(rows, res + inst->state_offset, inst->arguments, arena);
+    {
+        if (inst->offsets)
+            inst->batch_that->addBatchSinglePlace(
+                inst->offsets[static_cast<ssize_t>(rows - 1)], res + inst->state_offset, inst->batch_arguments, arena);
+        else
+            inst->batch_that->addBatchSinglePlace(rows, res + inst->state_offset, inst->batch_arguments, arena);
+    }
 }


@ -564,6 +577,7 @@ bool Aggregator::executeOnBlock(Columns columns, UInt64 num_rows, AggregatedData
    AggregateFunctionInstructions aggregate_functions_instructions(params.aggregates_size + 1);
    aggregate_functions_instructions[params.aggregates_size].that = nullptr;

+    std::vector<std::vector<const IColumn *>> nested_columns_holder;
    for (size_t i = 0; i < params.aggregates_size; ++i)
    {
        for (size_t j = 0; j < aggregate_columns[i].size(); ++j)
@ -579,10 +593,30 @@ bool Aggregator::executeOnBlock(Columns columns, UInt64 num_rows, AggregatedData
            }
        }

-        aggregate_functions_instructions[i].that = aggregate_functions[i];
-        aggregate_functions_instructions[i].func = aggregate_functions[i]->getAddressOfAddFunction();
-        aggregate_functions_instructions[i].state_offset = offsets_of_aggregate_states[i];
        aggregate_functions_instructions[i].arguments = aggregate_columns[i].data();
+        aggregate_functions_instructions[i].state_offset = offsets_of_aggregate_states[i];
+        auto that = aggregate_functions[i];
+        /// Unnest consecutive trailing -State combinators
+        while (auto func = typeid_cast<const AggregateFunctionState *>(that))
+            that = func->getNestedFunction().get();
+        aggregate_functions_instructions[i].that = that;
+        aggregate_functions_instructions[i].func = that->getAddressOfAddFunction();
+
+        if (auto func = typeid_cast<const AggregateFunctionArray *>(that))
+        {
+            /// Unnest consecutive -State combinators before -Array
+            that = func->getNestedFunction().get();
+            while (auto nested_func = typeid_cast<const AggregateFunctionState *>(that))
+                that = nested_func->getNestedFunction().get();
+            auto [nested_columns, offsets] = checkAndGetNestedArrayOffset(aggregate_columns[i].data(), that->getArgumentTypes().size());
+            nested_columns_holder.push_back(std::move(nested_columns));
+            aggregate_functions_instructions[i].batch_arguments = nested_columns_holder.back().data();
+            aggregate_functions_instructions[i].offsets = offsets;
+        }
+        else
+            aggregate_functions_instructions[i].batch_arguments = aggregate_columns[i].data();
+
+        aggregate_functions_instructions[i].batch_that = that;
    }

    if (isCancelled())
--- a/dbms/src/Interpreters/Aggregator.h
+++ b/dbms/src/Interpreters/Aggregator.h
@ -1008,6 +1008,9 @@ protected:
        IAggregateFunction::AddFunc func;
        size_t state_offset;
        const IColumn ** arguments;
+        const IAggregateFunction * batch_that;
+        const IColumn ** batch_arguments;
+        const UInt64 * offsets = nullptr;
    };

    using AggregateFunctionInstructions = std::vector<AggregateFunctionInstruction>;
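Both `arrayReduce` and `Aggregator::executeOnBlock` now use a `typeid_cast` loop to peel off outer `-State` combinators before choosing the function that runs the batch. The Aggregator also looks through one `-Array` wrapper, and any `-State` wrappers beneath it, so the batch receives the nested columns plus the array offsets. This appears to rely on each wrapper's add step forwarding to the nested function on the same state. Below is a minimal sketch of the peeling loop on a toy hierarchy. `dynamic_cast` stands in for ClickHouse's exact-type `typeid_cast`, which behaves the same for this flat hierarchy. `IFunc`, `StateWrapper`, `ArrayWrapper`, `BatchTarget` and `unwrapForBatch` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy aggregate-function hierarchy (hypothetical names).
struct IFunc
{
    virtual ~IFunc() = default;
    virtual std::string name() const = 0;
};

struct Sum : IFunc
{
    std::string name() const override { return "sum"; }
};

// Combinators wrap another function; the suffix is appended to its name.
struct StateWrapper : IFunc
{
    std::shared_ptr<IFunc> nested;
    explicit StateWrapper(std::shared_ptr<IFunc> n) : nested(std::move(n)) {}
    std::string name() const override { return nested->name() + "State"; }
};

struct ArrayWrapper : IFunc
{
    std::shared_ptr<IFunc> nested;
    explicit ArrayWrapper(std::shared_ptr<IFunc> n) : nested(std::move(n)) {}
    std::string name() const override { return nested->name() + "Array"; }
};

struct BatchTarget
{
    const IFunc * func;
    bool uses_array_offsets;
};

// Skip outer -State wrappers; if an -Array wrapper follows, look through it and
// any -State wrappers beneath it, and record that array offsets are needed.
BatchTarget unwrapForBatch(const IFunc * f)
{
    while (auto s = dynamic_cast<const StateWrapper *>(f))
        f = s->nested.get();

    if (auto a = dynamic_cast<const ArrayWrapper *>(f))
    {
        f = a->nested.get();
        while (auto s = dynamic_cast<const StateWrapper *>(f))
            f = s->nested.get();
        return {f, true};
    }
    return {f, false};
}
```

For `sumStateArrayState`, the loop reaches the innermost `sum` and reports that offsets are needed. A plain `sum` is returned unchanged.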
--- a/dbms/src/Interpreters/AnalyzedJoin.h
+++ b/dbms/src/Interpreters/AnalyzedJoin.h
@ -2,7 +2,6 @@

 #include <Core/Names.h>
 #include <Core/NamesAndTypes.h>
-#include <Core/SettingsCommon.h>
 #include <Parsers/ASTTablesInSelectQuery.h>
 #include <Interpreters/IJoin.h>
 #include <Interpreters/asof.h>
--- a/dbms/src/Interpreters/ClusterProxy/SelectStreamFactory.cpp
+++ b/dbms/src/Interpreters/ClusterProxy/SelectStreamFactory.cpp
@ -11,6 +11,7 @@
 #include <TableFunctions/TableFunctionFactory.h>

 #include <common/logger_useful.h>
+#include <DataStreams/ConvertingBlockInputStream.h>


 namespace ProfileEvents
@ -66,7 +67,7 @@ SelectStreamFactory::SelectStreamFactory(
 namespace
 {

-BlockInputStreamPtr createLocalStream(const ASTPtr & query_ast, const Context & context, QueryProcessingStage::Enum processed_stage)
+BlockInputStreamPtr createLocalStream(const ASTPtr & query_ast, const Block & header, const Context & context, QueryProcessingStage::Enum processed_stage)
 {
    checkStackSize();

@ -83,7 +84,7 @@ BlockInputStreamPtr createLocalStream(const ASTPtr & query_ast, const Context &
     */
    /// return std::make_shared<MaterializingBlockInputStream>(stream);

-    return stream;
+    return std::make_shared<ConvertingBlockInputStream>(context, stream, header, ConvertingBlockInputStream::MatchColumnsMode::Name);
 }

 static String formattedAST(const ASTPtr & ast)
@ -109,7 +110,7 @@ void SelectStreamFactory::createForShard(

    auto emplace_local_stream = [&]()
    {
-        res.emplace_back(createLocalStream(modified_query_ast, context, processed_stage));
+        res.emplace_back(createLocalStream(modified_query_ast, header, context, processed_stage));
    };

    String modified_query = formattedAST(modified_query_ast);
@ -249,7 +250,7 @@ void SelectStreamFactory::createForShard(
            }

            if (try_results.empty() || local_delay < max_remote_delay)
-                return createLocalStream(modified_query_ast, context, stage);
+                return createLocalStream(modified_query_ast, header, context, stage);
            else
            {
                std::vector<IConnectionPool::Entry> connections;
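`createLocalStream` now wraps the local stream in `ConvertingBlockInputStream` with `MatchColumnsMode::Name`. As a result, the local shard's result is presented with the same `header` that the remote shards use, matching columns by name rather than by position. Below is a rough sketch of the name-matching part only. It assumes a block is a list of named integer columns, and `NamedColumn`, `Block` and `reorderByName` are hypothetical. The real stream also converts column types, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct NamedColumn
{
    std::string name;
    std::vector<int> values;
};

using Block = std::vector<NamedColumn>;

// Return the columns of `block` in the order given by `header_names`, matched by name.
Block reorderByName(const Block & block, const std::vector<std::string> & header_names)
{
    std::unordered_map<std::string, size_t> position;
    for (size_t i = 0; i < block.size(); ++i)
        position[block[i].name] = i;

    Block result;
    result.reserve(header_names.size());
    for (const auto & name : header_names)
    {
        auto it = position.find(name);
        if (it == position.end())
            throw std::runtime_error("Cannot find column " + name + " in source stream");
        result.push_back(block[it->second]);
    }
    return result;
}
```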
--- a/dbms/src/Interpreters/IdentifierSemantic.cpp
+++ b/dbms/src/Interpreters/IdentifierSemantic.cpp
@ -132,6 +132,15 @@ std::pair<String, String> IdentifierSemantic::extractDatabaseAndTable(const ASTI
    return { "", identifier.name };
 }

+std::optional<String> IdentifierSemantic::extractNestedName(const ASTIdentifier & identifier, const String & table_name)
+{
+    if (identifier.name_parts.size() == 3 && table_name == identifier.name_parts[0])
+        return identifier.name_parts[1] + '.' + identifier.name_parts[2];
+    else if (identifier.name_parts.size() == 2)
+        return identifier.name_parts[0] + '.' + identifier.name_parts[1];
+    return {};
+}
+
 bool IdentifierSemantic::doesIdentifierBelongTo(const ASTIdentifier & identifier, const String & database, const String & table)
 {
    size_t num_components = identifier.name_parts.size();
--- a/dbms/src/Interpreters/IdentifierSemantic.h
+++ b/dbms/src/Interpreters/IdentifierSemantic.h
@ -36,6 +36,7 @@ struct IdentifierSemantic
    static std::optional<String> getTableName(const ASTIdentifier & node);
    static std::optional<String> getTableName(const ASTPtr & ast);
    static std::pair<String, String> extractDatabaseAndTable(const ASTIdentifier & identifier);
+    static std::optional<String> extractNestedName(const ASTIdentifier & identifier, const String & table_name);

    static ColumnMatch canReferColumnToTable(const ASTIdentifier & identifier, const DatabaseAndTableWithAlias & db_and_table);
    static void setColumnShortName(ASTIdentifier & identifier, const DatabaseAndTableWithAlias & db_and_table);
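`extractNestedName` converts a qualified identifier into the dotted name that a Nested subcolumn would have in a table's column list. A three-part `t.n.x` becomes `n.x` when `t` equals the given table name. A two-part `n.x` is returned as-is. Anything else yields no candidate. Below is a standalone sketch of the same branching that uses a plain vector of name parts instead of `ASTIdentifier`. The function name is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Mirrors the branching of IdentifierSemantic::extractNestedName on plain strings.
std::optional<std::string> extractNestedNameSketch(const std::vector<std::string> & name_parts,
                                                   const std::string & table_name)
{
    if (name_parts.size() == 3 && table_name == name_parts[0])
        return name_parts[1] + '.' + name_parts[2];   // table.nested.field -> nested.field
    else if (name_parts.size() == 2)
        return name_parts[0] + '.' + name_parts[1];   // nested.field (or table.column)
    return std::nullopt;
}
```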
--- a/dbms/src/Interpreters/InJoinSubqueriesPreprocessor.h
+++ b/dbms/src/Interpreters/InJoinSubqueriesPreprocessor.h
@ -1,6 +1,5 @@
 #pragma once

-#include <Core/SettingsCommon.h>
 #include <Core/Types.h>
 #include <Parsers/IAST_fwd.h>
 #include <Storages/IStorage_fwd.h>
--- a/dbms/src/Interpreters/InterpreterSelectQuery.cpp
+++ b/dbms/src/Interpreters/InterpreterSelectQuery.cpp
@ -286,8 +286,9 @@ InterpreterSelectQuery::InterpreterSelectQuery(
    {
        if (is_table_func)
        {
-            /// Read from table function.
-            storage = context.getQueryContext().executeTableFunction(table_expression);
+            /// Read from table function. Propagate all settings from initSettings();
+            /// the alternative is to call executeTableFunction on the current `context`, but that could pollute it.
+            storage = getSubqueryContext(context).executeTableFunction(table_expression);
        }
        else
        {
@ -924,14 +925,14 @@ static UInt64 getLimitForSorting(const ASTSelectQuery & query, const Context & c
 }


-static SortingInfoPtr optimizeReadInOrder(const MergeTreeData & merge_tree, const ASTSelectQuery & query,
+static InputSortingInfoPtr optimizeReadInOrder(const MergeTreeData & merge_tree, const ASTSelectQuery & query,
    const Context & context, const SyntaxAnalyzerResultPtr & global_syntax_result)
 {
    if (!merge_tree.hasSortingKey())
        return {};

    auto order_descr = getSortDescription(query, context);
-    SortDescription prefix_order_descr;
+    SortDescription order_key_prefix_descr;
    int read_direction = order_descr.at(0).direction;

    const auto & sorting_key_columns = merge_tree.getSortingKeyColumns();
@ -946,7 +947,7 @@ static SortingInfoPtr optimizeReadInOrder(const MergeTreeData & merge_tree, cons
        ///  or in some simple cases when order key element is wrapped into monotonic function.
        int current_direction = order_descr[i].direction;
        if (order_descr[i].column_name == sorting_key_columns[i] && current_direction == read_direction)
-            prefix_order_descr.push_back(order_descr[i]);
+            order_key_prefix_descr.push_back(order_descr[i]);
        else
        {
            auto ast = query.orderBy()->children[i]->children.at(0);
@ -994,14 +995,14 @@ static SortingInfoPtr optimizeReadInOrder(const MergeTreeData & merge_tree, cons
            if (i == 0)
                read_direction = current_direction;

-            prefix_order_descr.push_back(order_descr[i]);
+            order_key_prefix_descr.push_back(order_descr[i]);
        }
    }

-    if (prefix_order_descr.empty())
+    if (order_key_prefix_descr.empty())
        return {};

-    return std::make_shared<SortingInfo>(std::move(prefix_order_descr), read_direction);
+    return std::make_shared<InputSortingInfo>(std::move(order_key_prefix_descr), read_direction);
 }


@ -1025,11 +1026,11 @@ void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputS
    const Settings & settings = context.getSettingsRef();
    auto & expressions = analysis_result;

-    SortingInfoPtr sorting_info;
+    InputSortingInfoPtr input_sorting_info;
    if (settings.optimize_read_in_order && storage && query.orderBy() && !query_analyzer->hasAggregation() && !query.final() && !query.join())
    {
        if (const auto * merge_tree_data = dynamic_cast<const MergeTreeData *>(storage.get()))
-            sorting_info = optimizeReadInOrder(*merge_tree_data, query, context, syntax_analyzer_result);
+            input_sorting_info = optimizeReadInOrder(*merge_tree_data, query, context, syntax_analyzer_result);
    }

    if (options.only_analyze)
@ -1089,7 +1090,7 @@ void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputS
            throw Exception("PREWHERE is not supported if the table is filtered by row-level security expression", ErrorCodes::ILLEGAL_PREWHERE);

        /** Read the data from Storage. from_stage - to what stage the request was completed in Storage. */
-        executeFetchColumns(from_stage, pipeline, sorting_info, expressions.prewhere_info, expressions.columns_to_remove_after_prewhere);
+        executeFetchColumns(from_stage, pipeline, input_sorting_info, expressions.prewhere_info, expressions.columns_to_remove_after_prewhere);

        LOG_TRACE(log, QueryProcessingStage::toString(from_stage) << " -> " << QueryProcessingStage::toString(options.to_stage));
    }
@ -1215,7 +1216,7 @@ void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputS
            if (!expressions.second_stage && !expressions.need_aggregate && !expressions.has_having)
            {
                if (expressions.has_order_by)
-                    executeOrder(pipeline, query_info.sorting_info);
+                    executeOrder(pipeline, query_info.input_sorting_info);

                if (expressions.has_order_by && query.limitLength())
                    executeDistinct(pipeline, false, expressions.selected_columns);
@ -1288,7 +1289,7 @@ void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputS
                if (!expressions.first_stage && !expressions.need_aggregate && !(query.group_by_with_totals && !aggregate_final))
                    executeMergeSorted(pipeline);
                else    /// Otherwise, just sort.
-                    executeOrder(pipeline, query_info.sorting_info);
+                    executeOrder(pipeline, query_info.input_sorting_info);
            }

            /** Optimization - if there are several sources and there is LIMIT, then first apply the preliminary LIMIT,
@ -1348,7 +1349,7 @@ void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputS
 template <typename TPipeline>
 void InterpreterSelectQuery::executeFetchColumns(
        QueryProcessingStage::Enum processing_stage, TPipeline & pipeline,
-        const SortingInfoPtr & sorting_info, const PrewhereInfoPtr & prewhere_info, const Names & columns_to_remove_after_prewhere)
+        const InputSortingInfoPtr & input_sorting_info, const PrewhereInfoPtr & prewhere_info, const Names & columns_to_remove_after_prewhere)
 {
    constexpr bool pipeline_with_processors = std::is_same<TPipeline, QueryPipeline>::value;

@ -1665,7 +1666,7 @@ void InterpreterSelectQuery::executeFetchColumns(
        query_info.syntax_analyzer_result = syntax_analyzer_result;
        query_info.sets = query_analyzer->getPreparedSets();
        query_info.prewhere_info = prewhere_info;
-        query_info.sorting_info = sorting_info;
+        query_info.input_sorting_info = input_sorting_info;

        BlockInputStreams streams;
        Pipes pipes;
@ -1813,12 +1814,12 @@ void InterpreterSelectQuery::executeFetchColumns(
            }

            /// Pin sources for merge tree tables.
-            bool pin_sources = dynamic_cast<const MergeTreeData *>(storage.get()) != nullptr;
-            if (pin_sources)
-            {
-                for (size_t i = 0; i < pipes.size(); ++i)
-                    pipes[i].pinSources(i);
-            }
+//            bool pin_sources = dynamic_cast<const MergeTreeData *>(storage.get()) != nullptr;
+//            if (pin_sources)
+//            {
+//                for (size_t i = 0; i < pipes.size(); ++i)
+//                    pipes[i].pinSources(i);
+//            }

            pipeline.init(std::move(pipes));
        }
@ -2247,46 +2248,46 @@ void InterpreterSelectQuery::executeExpression(QueryPipeline & pipeline, const E
    });
 }

-void InterpreterSelectQuery::executeOrder(Pipeline & pipeline, SortingInfoPtr sorting_info)
+void InterpreterSelectQuery::executeOrder(Pipeline & pipeline, InputSortingInfoPtr input_sorting_info)
 {
    auto & query = getSelectQuery();
-    SortDescription order_descr = getSortDescription(query, context);
+    SortDescription output_order_descr = getSortDescription(query, context);
    const Settings & settings = context.getSettingsRef();
    UInt64 limit = getLimitForSorting(query, context);

-    if (sorting_info)
+    if (input_sorting_info)
    {
        /* Case of sorting with optimization using sorting key.
         * We have several threads, each of them reads batch of parts in direct
         *  or reverse order of sorting key using one input stream per part
         *  and then merge them into one sorted stream.
         * At this stage we merge per-thread streams into one.
+         * If the input is sorted by some prefix of the sorting key required for output,
+         * we have to finish sorting after the merge.
         */

-        bool need_finish_sorting = (sorting_info->prefix_order_descr.size() < order_descr.size());
+        bool need_finish_sorting = (input_sorting_info->order_key_prefix_descr.size() < output_order_descr.size());
+
+        UInt64 limit_for_merging = (need_finish_sorting ? 0 : limit);
+        executeMergeSorted(pipeline, input_sorting_info->order_key_prefix_descr, limit_for_merging);
+
        if (need_finish_sorting)
        {
            pipeline.transform([&](auto & stream)
            {
-                stream = std::make_shared<PartialSortingBlockInputStream>(stream, order_descr, limit);
+                stream = std::make_shared<PartialSortingBlockInputStream>(stream, output_order_descr, limit);
            });
-        }

-        UInt64 limit_for_merging = (need_finish_sorting ? 0 : limit);
-        executeMergeSorted(pipeline, sorting_info->prefix_order_descr, limit_for_merging);
-
-        if (need_finish_sorting)
-        {
            pipeline.firstStream() = std::make_shared<FinishSortingBlockInputStream>(
-                pipeline.firstStream(), sorting_info->prefix_order_descr,
-                order_descr, settings.max_block_size, limit);
+                pipeline.firstStream(), input_sorting_info->order_key_prefix_descr,
+                output_order_descr, settings.max_block_size, limit);
        }
    }
    else
    {
        pipeline.transform([&](auto & stream)
        {
-            auto sorting_stream = std::make_shared<PartialSortingBlockInputStream>(stream, order_descr, limit);
+            auto sorting_stream = std::make_shared<PartialSortingBlockInputStream>(stream, output_order_descr, limit);

            /// Limits on sorting
            IBlockInputStream::LocalLimits limits;
@ -2302,16 +2303,16 @@ void InterpreterSelectQuery::executeOrder(Pipeline & pipeline, SortingInfoPtr so

        /// Merge the sorted blocks.
        pipeline.firstStream() = std::make_shared<MergeSortingBlockInputStream>(
-            pipeline.firstStream(), order_descr, settings.max_block_size, limit,
+            pipeline.firstStream(), output_order_descr, settings.max_block_size, limit,
            settings.max_bytes_before_remerge_sort,
            settings.max_bytes_before_external_sort, context.getTemporaryPath(), settings.min_free_disk_space_for_temporary_data);
    }
 }

-void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, SortingInfoPtr sorting_info)
+void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, InputSortingInfoPtr input_sorting_info)
 {
    auto & query = getSelectQuery();
-    SortDescription order_descr = getSortDescription(query, context);
+    SortDescription output_order_descr = getSortDescription(query, context);
    UInt64 limit = getLimitForSorting(query, context);

    const Settings & settings = context.getSettingsRef();
@ -2321,7 +2322,7 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, SortingInfoP
 //    limits.mode = IBlockInputStream::LIMITS_TOTAL;
 //    limits.size_limits = SizeLimits(settings.max_rows_to_sort, settings.max_bytes_to_sort, settings.sort_overflow_mode);

-    if (sorting_info)
+    if (input_sorting_info)
    {
        /* Case of sorting with optimization using sorting key.
         * We have several threads, each of them reads batch of parts in direct
@ -2330,16 +2331,7 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, SortingInfoP
         * At this stage we merge per-thread streams into one.
         */

-        bool need_finish_sorting = (sorting_info->prefix_order_descr.size() < order_descr.size());
-
-        if (need_finish_sorting)
-        {
-            pipeline.addSimpleTransform([&](const Block & header, QueryPipeline::StreamType stream_type)
-            {
-                bool do_count_rows = stream_type == QueryPipeline::StreamType::Main;
-                return std::make_shared<PartialSortingTransform>(header, order_descr, limit, do_count_rows);
-            });
-        }
+        bool need_finish_sorting = (input_sorting_info->order_key_prefix_descr.size() < output_order_descr.size());

        if (pipeline.getNumStreams() > 1)
        {
@ -2347,7 +2339,7 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, SortingInfoP
            auto transform = std::make_shared<MergingSortedTransform>(
                pipeline.getHeader(),
                pipeline.getNumStreams(),
-                sorting_info->prefix_order_descr,
+                input_sorting_info->order_key_prefix_descr,
                settings.max_block_size, limit_for_merging);

            pipeline.addPipe({ std::move(transform) });
@ -2355,11 +2347,17 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, SortingInfoP

        if (need_finish_sorting)
        {
+            pipeline.addSimpleTransform([&](const Block & header, QueryPipeline::StreamType stream_type)
+            {
+                bool do_count_rows = stream_type == QueryPipeline::StreamType::Main;
+                return std::make_shared<PartialSortingTransform>(header, output_order_descr, limit, do_count_rows);
+            });
+
            pipeline.addSimpleTransform([&](const Block & header) -> ProcessorPtr
            {
                return std::make_shared<FinishSortingTransform>(
-                    header, sorting_info->prefix_order_descr,
-                    order_descr, settings.max_block_size, limit);
+                    header, input_sorting_info->order_key_prefix_descr,
+                    output_order_descr, settings.max_block_size, limit);
            });
        }

@ -2369,7 +2367,7 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, SortingInfoP
    pipeline.addSimpleTransform([&](const Block & header, QueryPipeline::StreamType stream_type)
    {
        bool do_count_rows = stream_type == QueryPipeline::StreamType::Main;
-        return std::make_shared<PartialSortingTransform>(header, order_descr, limit, do_count_rows);
+        return std::make_shared<PartialSortingTransform>(header, output_order_descr, limit, do_count_rows);
    });

    /// If there are several streams, we merge them into one
@ -2382,7 +2380,7 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, SortingInfoP
            return nullptr;

        return std::make_shared<MergeSortingTransform>(
-                header, order_descr, settings.max_block_size, limit,
+                header, output_order_descr, settings.max_block_size, limit,
                settings.max_bytes_before_remerge_sort,
                settings.max_bytes_before_external_sort, context.getTemporaryPath(), settings.min_free_disk_space_for_temporary_data);
    });
@ -2807,11 +2805,11 @@ void InterpreterSelectQuery::executeExtremes(QueryPipeline & pipeline)
 void InterpreterSelectQuery::executeSubqueriesInSetsAndJoins(Pipeline & pipeline, SubqueriesForSets & subqueries_for_sets)
 {
    /// Merge streams to one. Use MergeSorting if data was read in sorted order, Union otherwise.
-    if (query_info.sorting_info)
+    if (query_info.input_sorting_info)
    {
        if (pipeline.stream_with_non_joined_data)
            throw Exception("Using read in order optimization, but has stream with non-joined data in pipeline", ErrorCodes::LOGICAL_ERROR);
-        executeMergeSorted(pipeline, query_info.sorting_info->prefix_order_descr, 0);
+        executeMergeSorted(pipeline, query_info.input_sorting_info->order_key_prefix_descr, 0);
    }
    else
        executeUnion(pipeline, {});
@ -2822,11 +2820,11 @@ void InterpreterSelectQuery::executeSubqueriesInSetsAndJoins(Pipeline & pipeline

 void InterpreterSelectQuery::executeSubqueriesInSetsAndJoins(QueryPipeline & pipeline, SubqueriesForSets & subqueries_for_sets)
 {
-    if (query_info.sorting_info)
+    if (query_info.input_sorting_info)
    {
        if (pipeline.hasDelayedStream())
            throw Exception("Using read in order optimization, but has delayed stream in pipeline", ErrorCodes::LOGICAL_ERROR);
-        executeMergeSorted(pipeline, query_info.sorting_info->prefix_order_descr, 0);
+        executeMergeSorted(pipeline, query_info.input_sorting_info->order_key_prefix_descr, 0);
    }

    const Settings & settings = context.getSettingsRef();
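When the data is already read in order of a prefix of the requested `ORDER BY` (`order_key_prefix_descr`), `executeOrder` now merges the per-thread streams by that prefix first. Only afterwards does it finish sorting by the full `output_order_descr`. The finishing step only has to reorder rows within runs that share the same prefix value. Below is a minimal in-memory sketch of that idea on two-key rows already sorted by the first key. `Row` and `finishSorting` are hypothetical, and the sketch ignores block boundaries, limits and streaming.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::pair<int, int>;   // (prefix key, remaining key)

// Input is sorted by .first only; sorting each run of equal .first by .second
// yields rows sorted by (first, second) without re-sorting across runs.
void finishSorting(std::vector<Row> & rows)
{
    size_t run_begin = 0;
    while (run_begin < rows.size())
    {
        size_t run_end = run_begin + 1;
        while (run_end < rows.size() && rows[run_end].first == rows[run_begin].first)
            ++run_end;
        std::sort(rows.begin() + static_cast<std::ptrdiff_t>(run_begin),
                  rows.begin() + static_cast<std::ptrdiff_t>(run_end),
                  [](const Row & l, const Row & r) { return l.second < r.second; });
        run_begin = run_end;
    }
}
```

Because each run is sorted independently, the cost depends on run sizes rather than on the total row count. This is why merging by the prefix before finishing is cheaper than a full sort.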
--- a/dbms/src/Interpreters/InterpreterSelectQuery.h
+++ b/dbms/src/Interpreters/InterpreterSelectQuery.h
@ -198,7 +198,7 @@ private:

    template <typename TPipeline>
    void executeFetchColumns(QueryProcessingStage::Enum processing_stage, TPipeline & pipeline,
-        const SortingInfoPtr & sorting_info, const PrewhereInfoPtr & prewhere_info,
+        const InputSortingInfoPtr & sorting_info, const PrewhereInfoPtr & prewhere_info,
        const Names & columns_to_remove_after_prewhere);

    void executeWhere(Pipeline & pipeline, const ExpressionActionsPtr & expression, bool remove_filter);
@ -207,7 +207,7 @@ private:
    void executeTotalsAndHaving(Pipeline & pipeline, bool has_having, const ExpressionActionsPtr & expression, bool overflow_row, bool final);
    void executeHaving(Pipeline & pipeline, const ExpressionActionsPtr & expression);
    void executeExpression(Pipeline & pipeline, const ExpressionActionsPtr & expression);
-    void executeOrder(Pipeline & pipeline, SortingInfoPtr sorting_info);
+    void executeOrder(Pipeline & pipeline, InputSortingInfoPtr sorting_info);
    void executeWithFill(Pipeline & pipeline);
    void executeMergeSorted(Pipeline & pipeline);
    void executePreLimit(Pipeline & pipeline);
@ -226,7 +226,7 @@ private:
    void executeTotalsAndHaving(QueryPipeline & pipeline, bool has_having, const ExpressionActionsPtr & expression, bool overflow_row, bool final);
    void executeHaving(QueryPipeline & pipeline, const ExpressionActionsPtr & expression);
    void executeExpression(QueryPipeline & pipeline, const ExpressionActionsPtr & expression);
-    void executeOrder(QueryPipeline & pipeline, SortingInfoPtr sorting_info);
+    void executeOrder(QueryPipeline & pipeline, InputSortingInfoPtr sorting_info);
    void executeWithFill(QueryPipeline & pipeline);
    void executeMergeSorted(QueryPipeline & pipeline);
    void executePreLimit(QueryPipeline & pipeline);
--- a/dbms/src/Interpreters/Join.h
+++ b/dbms/src/Interpreters/Join.h
@ -10,7 +10,6 @@
 #include <Interpreters/IJoin.h>
 #include <Interpreters/AggregationCommon.h>
 #include <Interpreters/RowRefs.h>
-#include <Core/SettingsCommon.h>

 #include <Common/Arena.h>
 #include <Common/ColumnsHashing.h>
--- a/dbms/src/Interpreters/PredicateExpressionsOptimizer.cpp
+++ b/dbms/src/Interpreters/PredicateExpressionsOptimizer.cpp
@ -142,7 +142,7 @@ bool PredicateExpressionsOptimizer::allowPushDown(
    if (!subquery
        || (!settings.enable_optimize_predicate_expression_to_final_subquery && subquery->final())
        || subquery->limitBy() || subquery->limitLength()
-        || subquery->with())
+        || subquery->with() || subquery->withFill())
        return false;
    else
    {
--- a/dbms/src/Interpreters/SystemLog.cpp
+++ b/dbms/src/Interpreters/SystemLog.cpp
@ -32,7 +32,7 @@ std::shared_ptr<TSystemLog> createSystemLog(
    String database = config.getString(config_prefix + ".database", default_database_name);
    String table = config.getString(config_prefix + ".table", default_table_name);
    String partition_by = config.getString(config_prefix + ".partition_by", "toYYYYMM(event_date)");
-    String engine = "ENGINE = MergeTree PARTITION BY (" + partition_by + ") ORDER BY (event_date, event_time) SETTINGS index_granularity = 1024";
+    String engine = "ENGINE = MergeTree PARTITION BY (" + partition_by + ") ORDER BY (event_date, event_time)";

    size_t flush_interval_milliseconds = config.getUInt64(config_prefix + ".flush_interval_milliseconds", DEFAULT_SYSTEM_LOG_FLUSH_INTERVAL_MILLISECONDS);

--- a/dbms/src/Interpreters/TranslateQualifiedNamesVisitor.cpp
+++ b/dbms/src/Interpreters/TranslateQualifiedNamesVisitor.cpp
@@ -29,6 +29,26 @@ namespace ErrorCodes
    extern const int LOGICAL_ERROR;
 }

+bool TranslateQualifiedNamesMatcher::Data::unknownColumn(size_t table_pos, const ASTIdentifier & identifier) const
+{
+    const auto & table = tables[table_pos].first;
+    auto nested1 = IdentifierSemantic::extractNestedName(identifier, table.table);
+    auto nested2 = IdentifierSemantic::extractNestedName(identifier, table.alias);
+
+    String short_name = identifier.shortName();
+    const Names & column_names = tables[table_pos].second;
+    for (auto & known_name : column_names)
+    {
+        if (short_name == known_name)
+            return false;
+        if (nested1 && *nested1 == known_name)
+            return false;
+        if (nested2 && *nested2 == known_name)
+            return false;
+    }
+    return !column_names.empty();
+}
+
 bool TranslateQualifiedNamesMatcher::needChildVisit(ASTPtr & node, const ASTPtr & child)
 {
    /// Do not go to FROM, JOIN, subqueries.
@@ -66,6 +86,13 @@ void TranslateQualifiedNamesMatcher::visit(ASTIdentifier & identifier, ASTPtr &,
        bool allow_ambiguous = data.join_using_columns.count(short_name);
        if (IdentifierSemantic::chooseTable(identifier, data.tables, table_pos, allow_ambiguous))
        {
+            if (data.unknownColumn(table_pos, identifier))
+            {
+                String table_name = data.tables[table_pos].first.getQualifiedNamePrefix(false);
+                throw Exception("There's no column '" + identifier.name + "' in table '" + table_name + "'",
+                                ErrorCodes::UNKNOWN_IDENTIFIER);
+            }
+
            IdentifierSemantic::setMembership(identifier, table_pos);

            /// In case if column from the joined table are in source columns, change it's name to qualified.
--- a/dbms/src/Interpreters/TranslateQualifiedNamesVisitor.h
+++ b/dbms/src/Interpreters/TranslateQualifiedNamesVisitor.h
@@ -38,6 +38,7 @@ public:
        bool hasColumn(const String & name) const { return source_columns.count(name); }
        bool hasTable() const { return !tables.empty(); }
        bool processAsterisks() const { return hasTable() && has_columns; }
+        bool unknownColumn(size_t table_pos, const ASTIdentifier & node) const;

        static std::vector<TableWithColumnNames> tablesOnly(const std::vector<DatabaseAndTableWithAlias> & tables)
        {
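The new `unknownColumn()` check can be modeled outside the AST as follows. This is a simplified sketch, not the ClickHouse code: the identifier is reduced to its short name, and the two nested-name candidates (which the diff obtains from `IdentifierSemantic::extractNestedName()` for the table name and for the alias) are passed in directly. It shows the two outcomes the visitor relies on: a name that matches no known column is reported as unknown, while an empty column list disables the check.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified model of TranslateQualifiedNamesMatcher::Data::unknownColumn().
// nested_by_table / nested_by_alias stand in for the results of
// IdentifierSemantic::extractNestedName() in the real code.
bool unknownColumn(const std::string & short_name,
                   const std::optional<std::string> & nested_by_table,
                   const std::optional<std::string> & nested_by_alias,
                   const std::vector<std::string> & column_names)
{
    for (const auto & known_name : column_names)
    {
        if (short_name == known_name)
            return false;
        if (nested_by_table && *nested_by_table == known_name)
            return false;
        if (nested_by_alias && *nested_by_alias == known_name)
            return false;
    }
    // With no column list there is nothing to check against,
    // so the identifier is not reported as unknown.
    return !column_names.empty();
}
```

In the visitor a `true` result turns into the `UNKNOWN_IDENTIFIER` exception instead of silently attaching the identifier to the chosen table.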
--- a/dbms/src/Interpreters/Users.cpp
+++ b/dbms/src/Interpreters/Users.cpp
@@ -75,14 +75,15 @@ User::User(const String & name_, const String & config_elem, const Poco::Util::A
    const auto config_sub_elem = config_elem + ".allow_databases";
    if (config.has(config_sub_elem))
    {
+        databases = DatabaseSet();
        Poco::Util::AbstractConfiguration::Keys config_keys;
        config.keys(config_sub_elem, config_keys);

-        databases.reserve(config_keys.size());
+        databases->reserve(config_keys.size());
        for (const auto & key : config_keys)
        {
            const auto database_name = config.getString(config_sub_elem + "." + key);
-            databases.insert(database_name);
+            databases->insert(database_name);
        }
    }

@@ -90,14 +91,15 @@ User::User(const String & name_, const String & config_elem, const Poco::Util::A
    const auto config_dictionary_sub_elem = config_elem + ".allow_dictionaries";
    if (config.has(config_dictionary_sub_elem))
    {
+        dictionaries = DictionarySet();
        Poco::Util::AbstractConfiguration::Keys config_keys;
        config.keys(config_dictionary_sub_elem, config_keys);

-        dictionaries.reserve(config_keys.size());
+        dictionaries->reserve(config_keys.size());
        for (const auto & key : config_keys)
        {
            const auto dictionary_name = config.getString(config_dictionary_sub_elem + "." + key);
-            dictionaries.insert(dictionary_name);
+            dictionaries->insert(dictionary_name);
        }
    }

--- a/dbms/src/Interpreters/Users.h
+++ b/dbms/src/Interpreters/Users.h
@@ -36,11 +36,11 @@ struct User

    /// List of allowed databases.
    using DatabaseSet = std::unordered_set<std::string>;
-    DatabaseSet databases;
+    std::optional<DatabaseSet> databases;

    /// List of allowed dictionaries.
    using DictionarySet = std::unordered_set<std::string>;
-    DictionarySet dictionaries;
+    std::optional<DictionarySet> dictionaries;

    /// Table properties.
    using PropertyMap = std::unordered_map<std::string /* name */, std::string /* value */>;
--- a/dbms/src/Interpreters/UsersManager.cpp
+++ b/dbms/src/Interpreters/UsersManager.cpp
@@ -63,7 +63,7 @@ bool UsersManager::hasAccessToDatabase(const std::string & user_name, const std:
        throw Exception("Unknown user " + user_name, ErrorCodes::UNKNOWN_USER);

    auto user = it->second;
-    return user->databases.empty() || user->databases.count(database_name);
+    return !user->databases.has_value() || user->databases->count(database_name);
 }

 bool UsersManager::hasAccessToDictionary(const std::string & user_name, const std::string & dictionary_name) const
@@ -74,6 +74,6 @@ bool UsersManager::hasAccessToDictionary(const std::string & user_name, const st
        throw Exception("Unknown user " + user_name, ErrorCodes::UNKNOWN_USER);

    auto user = it->second;
-    return user->dictionaries.empty() || user->dictionaries.count(dictionary_name);
+    return !user->dictionaries.has_value() || user->dictionaries->count(dictionary_name);
 }
 }
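Switching `databases` and `dictionaries` to `std::optional` separates two cases that an empty set used to conflate: no `allow_databases` element in the config at all (no restriction) and an element that is present but lists nothing. Following the `Users.cpp` change, a present element always engages the optional, so with the new check an empty allow list grants access to no database. A minimal sketch of that check, using a hypothetical `UserAccess` struct in place of `User`:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the relevant part of User.
// std::nullopt: no <allow_databases> element, i.e. no restriction.
// Engaged (possibly empty) set: only the listed databases are allowed.
struct UserAccess
{
    std::optional<std::unordered_set<std::string>> databases;
};

// Same shape as the new UsersManager::hasAccessToDatabase() condition.
bool hasAccessToDatabase(const UserAccess & user, const std::string & database_name)
{
    return !user.databases.has_value() || user.databases->count(database_name) > 0;
}
```

The same reasoning applies to `dictionaries` and `hasAccessToDictionary()`.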
--- a/dbms/src/Parsers/ASTSelectQuery.cpp
+++ b/dbms/src/Parsers/ASTSelectQuery.cpp
@@ -3,6 +3,7 @@
 #include <Parsers/ASTFunction.h>
 #include <Parsers/ASTIdentifier.h>
 #include <Parsers/ASTSelectQuery.h>
+#include <Parsers/ASTOrderByElement.h>
 #include <Parsers/ASTTablesInSelectQuery.h>


@@ -276,6 +277,18 @@ bool ASTSelectQuery::final() const
    return table_expression->final;
 }

+bool ASTSelectQuery::withFill() const
+{
+    if (!orderBy())
+        return false;
+
+    for (const auto & order_expression_element : orderBy()->children)
+        if (order_expression_element->as<ASTOrderByElement &>().with_fill)
+            return true;
+
+    return false;
+}
+

 ASTPtr ASTSelectQuery::array_join_expression_list(bool & is_left) const
 {
--- a/dbms/src/Parsers/ASTSelectQuery.h
+++ b/dbms/src/Parsers/ASTSelectQuery.h
@@ -83,6 +83,7 @@ public:
    ASTPtr array_join_expression_list() const;
    const ASTTablesInSelectQueryElement * join() const;
    bool final() const;
+    bool withFill() const;
    void replaceDatabaseAndTable(const String & database_name, const String & table_name);
    void addTableFunction(ASTPtr & table_function_ptr);

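`withFill()` makes predicate push-down bail out for subqueries whose ORDER BY uses WITH FILL. A plausible reason, given here as an illustration rather than taken from the PR: filling generates rows between existing sort values, so filtering before the fill can change the result. For example, assuming the default step of 1 and no FROM/TO bounds, source rows 1 and 10 with an outer `WHERE n > 5` yield 6 through 10 when the fill happens first, but only 10 if the filter is pushed into the subquery. The types below are hypothetical stand-ins that only mirror the shape of the check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, stripped-down stand-ins for ASTOrderByElement / ASTSelectQuery.
struct OrderByElement
{
    bool with_fill = false;
};

struct SelectQuery
{
    std::vector<OrderByElement> order_by;  // empty means no ORDER BY clause
};

// Same shape as ASTSelectQuery::withFill(): true if any ORDER BY element uses WITH FILL.
bool withFill(const SelectQuery & query)
{
    for (const auto & element : query.order_by)
        if (element.with_fill)
            return true;
    return false;
}

// Fragment of the push-down decision: a gap-filling subquery never receives the outer predicate.
bool allowPushDown(const SelectQuery & subquery)
{
    return !withFill(subquery);
}
```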
--- a/dbms/src/Processors/Executors/PipelineExecutor.cpp
+++ b/dbms/src/Processors/Executors/PipelineExecutor.cpp
@@ -10,6 +10,7 @@
 #include <boost/lockfree/queue.hpp>
 #include <Common/Stopwatch.h>
 #include <Processors/ISource.h>
+#include <Common/setThreadName.h>

 namespace DB
 {
@@ -750,6 +751,8 @@ void PipelineExecutor::executeImpl(size_t num_threads)
        {
            /// ThreadStatus thread_status;

+            setThreadName("QueryPipelineEx");
+
            if (thread_group)
                CurrentThread::attachTo(thread_group);

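The executor's worker threads now name themselves `QueryPipelineEx`, which makes them recognizable in tools such as `top -H` or a debugger. On Linux a thread name is limited to 16 bytes including the terminating NUL, so at most 15 visible characters; `QueryPipelineEx` is exactly 15. The sketch below calls the glibc extensions `pthread_setname_np`/`pthread_getname_np` directly. It is Linux-specific and is not ClickHouse's `setThreadName` implementation; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <pthread.h>
#include <string>
#include <thread>

// Linux-only: names the calling thread. Names longer than 15 characters
// would be rejected by pthread_setname_np with ERANGE, so refuse them up front.
bool setCurrentThreadName(const char * name)
{
    if (std::strlen(name) > 15)
        return false;
    return pthread_setname_np(pthread_self(), name) == 0;
}

std::string currentThreadName()
{
    char buf[16] = {};
    pthread_getname_np(pthread_self(), buf, sizeof(buf));
    return buf;
}

// Usage: a fresh worker names itself and reports the name it sees (empty on failure).
std::string nameSeenByWorker(const char * name)
{
    std::string seen;
    std::thread worker([&] {
        if (setCurrentThreadName(name))
            seen = currentThreadName();
    });
    worker.join();
    return seen;
}
```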
--- a/dbms/src/Processors/Transforms/FilterTransform.cpp
+++ b/dbms/src/Processors/Transforms/FilterTransform.cpp
@@ -131,7 +131,7 @@ void FilterTransform::transform(Chunk & chunk)
    size_t first_non_constant_column = num_columns;
    for (size_t i = 0; i < num_columns; ++i)
    {
-        if (!isColumnConst(*columns[i]))
+        if (i != filter_column_position && !isColumnConst(*columns[i]))
        {
            first_non_constant_column = i;
            break;
--- a/dbms/src/Processors/Transforms/TotalsHavingTransform.h
+++ b/dbms/src/Processors/Transforms/TotalsHavingTransform.h
@@ -1,7 +1,6 @@
 #include <Processors/ISimpleTransform.h>

 #include <Common/Arena.h>
-#include <Core/SettingsCommon.h>

 namespace DB
 {
@@ -12,6 +11,8 @@ using ArenaPtr = std::shared_ptr<Arena>;
 class ExpressionActions;
 using ExpressionActionsPtr = std::shared_ptr<ExpressionActions>;

+enum class TotalsMode;
+
 /** Takes blocks after grouping, with non-finalized aggregate functions.
  * Calculates total values according to totals_mode.
  * If necessary, evaluates the expression from HAVING and filters rows. Returns the finalized and filtered blocks.
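`TotalsHavingTransform.h` drops the `Core/SettingsCommon.h` include and forward-declares `enum class TotalsMode;` instead. This works because a scoped enum without an explicit underlying type defaults to `int`, so the opaque declaration already names a complete type that can be used in declarations; the full definition elsewhere only has to agree with it. The sketch keeps both parts in one file for brevity; the function and the enumerator names and values are placeholders.

```cpp
#include <cassert>
#include <type_traits>

// "Header" part: the opaque declaration is enough for signatures.
enum class TotalsMode;                 // underlying type is int by default
int totalsModeCode(TotalsMode mode);   // hypothetical function using it by value

// "Source" part: the full definition must keep the same underlying type (int).
enum class TotalsMode
{
    MODE_A = 0,  // placeholder enumerators
    MODE_B = 1,
};

int totalsModeCode(TotalsMode mode)
{
    return static_cast<int>(mode);
}

static_assert(std::is_same<std::underlying_type<TotalsMode>::type, int>::value,
              "a scoped enum without an enum-base uses int");
```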
--- a/dbms/src/Storages/Distributed/DirectoryMonitor.cpp
+++ b/dbms/src/Storages/Distributed/DirectoryMonitor.cpp
@@ -10,6 +10,7 @@
 #include <Interpreters/Context.h>
 #include <Storages/Distributed/DirectoryMonitor.h>
 #include <IO/ReadBufferFromFile.h>
+#include <IO/ReadBufferFromString.h>
 #include <IO/WriteBufferFromFile.h>
 #include <Compression/CompressedReadBuffer.h>
 #include <IO/ConnectionTimeouts.h>
@@ -269,17 +270,41 @@ void StorageDistributedDirectoryMonitor::processFile(const std::string & file_pa
 void StorageDistributedDirectoryMonitor::readQueryAndSettings(
    ReadBuffer & in, Settings & insert_settings, std::string & insert_query) const
 {
-    UInt64 magic_number_or_query_size;
+    UInt64 query_size;
+    readVarUInt(query_size, in);

-    readVarUInt(magic_number_or_query_size, in);
-
-    if (magic_number_or_query_size == UInt64(DBMS_DISTRIBUTED_SENDS_MAGIC_NUMBER))
+    if (query_size == DBMS_DISTRIBUTED_SIGNATURE_EXTRA_INFO)
    {
-        insert_settings.deserialize(in);
-        readVarUInt(magic_number_or_query_size, in);
+        /// Read extra information.
+        String extra_info_as_string;
+        readStringBinary(extra_info_as_string, in);
+        readVarUInt(query_size, in);
+        ReadBufferFromString extra_info(extra_info_as_string);
+
+        UInt64 initiator_revision;
+        readVarUInt(initiator_revision, extra_info);
+        if (ClickHouseRevision::get() < initiator_revision)
+        {
+            LOG_WARNING(
+                log,
+                "ClickHouse shard version is older than ClickHouse initiator version. "
+                    << "It may lack support for new features.");
+        }
+
+        insert_settings.deserialize(extra_info);
+
+        /// Add handling new data here, for example:
+        /// if (initiator_revision >= DBMS_MIN_REVISION_WITH_MY_NEW_DATA)
+        ///    readVarUInt(my_new_data, extra_info);
    }
-    insert_query.resize(magic_number_or_query_size);
-    in.readStrict(insert_query.data(), magic_number_or_query_size);
+    else if (query_size == DBMS_DISTRIBUTED_SIGNATURE_SETTINGS_OLD_FORMAT)
+    {
+        insert_settings.deserialize(in, SettingsBinaryFormat::OLD);
+        readVarUInt(query_size, in);
+    }
+
+    insert_query.resize(query_size);
+    in.readStrict(insert_query.data(), query_size);
 }

 struct StorageDistributedDirectoryMonitor::BatchHeader
--- a/dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp
+++ b/dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp
@@ -588,8 +588,19 @@ void DistributedBlockOutputStream::writeToShard(const Block & block, const std::
            CompressedWriteBuffer compress{out};
            NativeBlockOutputStream stream{compress, ClickHouseRevision::get(), block.cloneEmpty()};

-            writeVarUInt(UInt64(DBMS_DISTRIBUTED_SENDS_MAGIC_NUMBER), out);
-            context.getSettingsRef().serialize(out);
+            /// We wrap the extra information into a string for compatibility with older versions:
+            /// a shard will be able to read this information partly and ignore other parts
+            /// based on its version.
+            WriteBufferFromOwnString extra_info;
+            writeVarUInt(ClickHouseRevision::get(), extra_info);
+            context.getSettingsRef().serialize(extra_info);
+
+            /// Add new fields here, for example:
+            /// writeVarUInt(my_new_data, extra_info);
+
+            writeVarUInt(DBMS_DISTRIBUTED_SIGNATURE_EXTRA_INFO, out);
+            writeStringBinary(extra_info.str(), out);
+
            writeStringBinary(query_string, out);

            stream.writePrefix();
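Taken together, the writer and reader changes define a small framing protocol: a signature varint, then the extra information wrapped in a length-prefixed string (sender revision followed by settings), then the query. Because the reader consumes the whole blob before parsing it, an older shard stays aligned with the rest of the file even when a newer initiator appends fields it does not understand. The sketch models this with LEB128-style varints and "size, then bytes" strings, the layout used by `writeVarUInt`/`writeStringBinary`. Settings are reduced to one opaque string, the old-settings-format branch is omitted, and `SIGNATURE_EXTRA_INFO` is a placeholder, not the real `DBMS_DISTRIBUTED_SIGNATURE_EXTRA_INFO` value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Placeholder marker; the real DBMS_DISTRIBUTED_SIGNATURE_EXTRA_INFO value is not shown in the diff.
constexpr uint64_t SIGNATURE_EXTRA_INFO = 0xCAFEDACEull;

// LEB128-style varint: 7 payload bits per byte, high bit set on all but the last byte.
void writeVarUInt(uint64_t x, std::string & out)
{
    while (x >= 0x80)
    {
        out.push_back(static_cast<char>((x & 0x7F) | 0x80));
        x >>= 7;
    }
    out.push_back(static_cast<char>(x));
}

uint64_t readVarUInt(const std::string & in, size_t & pos)
{
    uint64_t x = 0;
    for (unsigned shift = 0; shift < 64; shift += 7)
    {
        if (pos >= in.size())
            throw std::runtime_error("unexpected end of buffer");
        auto byte = static_cast<unsigned char>(in[pos++]);
        x |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if (!(byte & 0x80))
            return x;
    }
    throw std::runtime_error("varint is too long");
}

// Length-prefixed string: varint size, then the raw bytes.
void writeString(const std::string & s, std::string & out)
{
    writeVarUInt(s.size(), out);
    out += s;
}

std::string readString(const std::string & in, size_t & pos)
{
    uint64_t size = readVarUInt(in, pos);
    if (size > in.size() - pos)
        throw std::runtime_error("unexpected end of buffer");
    std::string s = in.substr(pos, size);
    pos += size;
    return s;
}

// Writer: [signature][extra_info][query]; `newer_field` simulates a field added by a newer version.
std::string writeHeader(uint64_t revision, const std::string & settings,
                        const std::string & query, const std::string & newer_field = "")
{
    std::string extra_info;
    writeVarUInt(revision, extra_info);
    writeString(settings, extra_info);
    if (!newer_field.empty())
        writeString(newer_field, extra_info);

    std::string out;
    writeVarUInt(SIGNATURE_EXTRA_INFO, out);
    writeString(extra_info, out);
    writeString(query, out);
    return out;
}

struct Header
{
    uint64_t revision = 0;  // stays 0 for the legacy layout without a signature
    std::string settings;
    std::string query;
};

// Reader: consume the whole extra_info blob first, then parse only the fields known here.
Header readHeader(const std::string & in)
{
    Header header;
    size_t pos = 0;
    uint64_t query_size = readVarUInt(in, pos);
    if (query_size == SIGNATURE_EXTRA_INFO)
    {
        std::string extra_info = readString(in, pos);
        query_size = readVarUInt(in, pos);
        size_t extra_pos = 0;
        header.revision = readVarUInt(extra_info, extra_pos);
        header.settings = readString(extra_info, extra_pos);
        // Remaining bytes in extra_info belong to newer fields and are ignored.
    }
    if (query_size > in.size() - pos)
        throw std::runtime_error("unexpected end of buffer");
    header.query = in.substr(pos, query_size);
    return header;
}
```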
--- a/dbms/src/Storages/Kafka/KafkaSettings.h
+++ b/dbms/src/Storages/Kafka/KafkaSettings.h
@@ -1,6 +1,6 @@
 #pragma once

-#include <Core/SettingsCommon.h>
+#include <Core/SettingsCollection.h>


 namespace DB
@@ -16,16 +16,16 @@ struct KafkaSettings : public SettingsCollection<KafkaSettings>


 #define LIST_OF_KAFKA_SETTINGS(M)                                      \
-    M(SettingString, kafka_broker_list, "", "A comma-separated list of brokers for Kafka engine.") \
-    M(SettingString, kafka_topic_list, "", "A list of Kafka topics.") \
-    M(SettingString, kafka_group_name, "", "A group of Kafka consumers.") \
-    M(SettingString, kafka_format, "", "The message format for Kafka engine.") \
-    M(SettingChar, kafka_row_delimiter, '\0', "The character to be considered as a delimiter in Kafka message.") \
-    M(SettingString, kafka_schema, "", "Schema identifier (used by schema-based formats) for Kafka engine") \
-    M(SettingUInt64, kafka_num_consumers, 1, "The number of consumers per table for Kafka engine.") \
-    M(SettingUInt64, kafka_max_block_size, 0, "The maximum block size per table for Kafka engine.") \
-    M(SettingUInt64, kafka_skip_broken_messages, 0, "Skip at least this number of broken messages from Kafka topic per block") \
-    M(SettingUInt64, kafka_commit_every_batch, 0, "Commit every consumed and handled batch instead of a single commit after writing a whole block")
+    M(SettingString, kafka_broker_list, "", "A comma-separated list of brokers for Kafka engine.", 0) \
+    M(SettingString, kafka_topic_list, "", "A list of Kafka topics.", 0) \
+    M(SettingString, kafka_group_name, "", "A group of Kafka consumers.", 0) \
+    M(SettingString, kafka_format, "", "The message format for Kafka engine.", 0) \
+    M(SettingChar, kafka_row_delimiter, '\0', "The character to be considered as a delimiter in Kafka message.", 0) \
+    M(SettingString, kafka_schema, "", "Schema identifier (used by schema-based formats) for Kafka engine", 0) \
+    M(SettingUInt64, kafka_num_consumers, 1, "The number of consumers per table for Kafka engine.", 0) \
+    M(SettingUInt64, kafka_max_block_size, 0, "The maximum block size per table for Kafka engine.", 0) \
+    M(SettingUInt64, kafka_skip_broken_messages, 0, "Skip at least this number of broken messages from Kafka topic per block", 0) \
+    M(SettingUInt64, kafka_commit_every_batch, 0, "Commit every consumed and handled batch instead of a single commit after writing a whole block", 0)

    DECLARE_SETTINGS_COLLECTION(LIST_OF_KAFKA_SETTINGS)

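Every `M(...)` entry in the Kafka and MergeTree settings lists gains a fifth argument (`0`). In this X-macro pattern a list is typically expanded by more than one macro, and every macro passed as `M` must accept the same number of parameters; a four-parameter macro given a five-argument entry is a preprocessor error, so the new argument has to be added to all entries at once. The diff does not show what the fifth argument means, so the sketch treats it as an opaque `FLAGS` parameter; all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical list in the same M(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) shape.
#define LIST_OF_DEMO_SETTINGS(M) \
    M(uint64_t, num_consumers, 1, "The number of consumers per table.", 0) \
    M(uint64_t, max_block_size, 0, "The maximum block size per table.", 0)

// Expansion 1: one data member per entry, initialized to its default.
#define DECLARE_MEMBER(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) TYPE NAME = DEFAULT;
struct DemoSettings
{
    LIST_OF_DEMO_SETTINGS(DECLARE_MEMBER)
};
#undef DECLARE_MEMBER

// Expansion 2: count the entries at compile time.
#define COUNT_ONE(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) +1
constexpr int demo_settings_count = 0 LIST_OF_DEMO_SETTINGS(COUNT_ONE);
#undef COUNT_ONE

// Expansion 3: look up a description by setting name.
#define DESCRIBE(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) if (name == #NAME) return DESCRIPTION;
inline std::string describe(const std::string & name)
{
    LIST_OF_DEMO_SETTINGS(DESCRIBE)
    return "";
}
#undef DESCRIBE
```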
--- a/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
+++ b/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
@@ -604,9 +604,9 @@ Pipes MergeTreeDataSelectExecutor::readFromParts(
            virt_column_names,
            settings);
    }
-    else if (settings.optimize_read_in_order && query_info.sorting_info)
+    else if (settings.optimize_read_in_order && query_info.input_sorting_info)
    {
-        size_t prefix_size = query_info.sorting_info->prefix_order_descr.size();
+        size_t prefix_size = query_info.input_sorting_info->order_key_prefix_descr.size();
        auto order_key_prefix_ast = data.sorting_key_expr_ast->clone();
        order_key_prefix_ast->children.resize(prefix_size);

@@ -853,7 +853,7 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsWithOrder(
    const Settings & settings) const
 {
    size_t sum_marks = 0;
-    SortingInfoPtr sorting_info = query_info.sorting_info;
+    const InputSortingInfoPtr & input_sorting_info = query_info.input_sorting_info;
    size_t adaptive_parts = 0;
    std::vector<size_t> sum_marks_in_parts(parts.size());
    const auto data_settings = data.getSettings();
@@ -1004,9 +1004,9 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsWithOrder(
                parts.emplace_back(part);
            }

-            ranges_to_get_from_part = split_ranges(ranges_to_get_from_part, sorting_info->direction);
+            ranges_to_get_from_part = split_ranges(ranges_to_get_from_part, input_sorting_info->direction);

-            if (sorting_info->direction == 1)
+            if (input_sorting_info->direction == 1)
            {
                pipes.emplace_back(std::make_shared<MergeTreeSelectProcessor>(
                    data, part.data_part, max_block_size, settings.preferred_block_size_bytes,
@@ -1029,9 +1029,9 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsWithOrder(
        if (pipes.size() > 1)
        {
            SortDescription sort_description;
-            for (size_t j = 0; j < query_info.sorting_info->prefix_order_descr.size(); ++j)
+            for (size_t j = 0; j < input_sorting_info->order_key_prefix_descr.size(); ++j)
                sort_description.emplace_back(data.sorting_key_columns[j],
-                    sorting_info->direction, 1);
+                    input_sorting_info->direction, 1);

            for (auto & pipe : pipes)
                pipe.addSimpleTransform(std::make_shared<ExpressionTransform>(pipe.getHeader(), sorting_key_prefix_expr));
--- a/dbms/src/Storages/MergeTree/MergeTreeSettings.h
+++ b/dbms/src/Storages/MergeTree/MergeTreeSettings.h
@@ -1,7 +1,7 @@
 #pragma once

 #include <Core/Defines.h>
-#include <Core/SettingsCommon.h>
+#include <Core/SettingsCollection.h>
 #include <Common/SettingsChanges.h>


@@ -26,70 +26,70 @@ struct MergeTreeSettings : public SettingsCollection<MergeTreeSettings>
 {

 #define LIST_OF_MERGE_TREE_SETTINGS(M)                                 \
-    M(SettingUInt64, index_granularity, 8192, "How many rows correspond to one primary key value.") \
+    M(SettingUInt64, index_granularity, 8192, "How many rows correspond to one primary key value.", 0) \
    \
    /** Merge settings. */ \
-    M(SettingUInt64, merge_max_block_size, DEFAULT_MERGE_BLOCK_SIZE, "How many rows in blocks should be formed for merge operations.") \
-    M(SettingUInt64, max_bytes_to_merge_at_max_space_in_pool, 150ULL * 1024 * 1024 * 1024, "Maximum in total size of parts to merge, when there are maximum free threads in background pool (or entries in replication queue).") \
-    M(SettingUInt64, max_bytes_to_merge_at_min_space_in_pool, 1024 * 1024, "Maximum in total size of parts to merge, when there are minimum free threads in background pool (or entries in replication queue).") \
-    M(SettingUInt64, max_replicated_merges_in_queue, 16, "How many tasks of merging and mutating parts are allowed simultaneously in ReplicatedMergeTree queue.") \
-    M(SettingUInt64, max_replicated_mutations_in_queue, 8, "How many tasks of mutating parts are allowed simultaneously in ReplicatedMergeTree queue.") \
-    M(SettingUInt64, number_of_free_entries_in_pool_to_lower_max_size_of_merge, 8, "When there is less than specified number of free entries in pool (or replicated queue), start to lower maximum size of merge to process (or to put in queue). This is to allow small merges to process - not filling the pool with long running merges.") \
-    M(SettingUInt64, number_of_free_entries_in_pool_to_execute_mutation, 10, "When there is less than specified number of free entries in pool, do not execute part mutations. This is to leave free threads for regular merges and avoid \"Too many parts\"") \
-    M(SettingSeconds, old_parts_lifetime, 8 * 60, "How many seconds to keep obsolete parts.") \
-    M(SettingSeconds, temporary_directories_lifetime, 86400, "How many seconds to keep tmp_-directories.") \
+    M(SettingUInt64, merge_max_block_size, DEFAULT_MERGE_BLOCK_SIZE, "How many rows in blocks should be formed for merge operations.", 0) \
+    M(SettingUInt64, max_bytes_to_merge_at_max_space_in_pool, 150ULL * 1024 * 1024 * 1024, "Maximum in total size of parts to merge, when there are maximum free threads in background pool (or entries in replication queue).", 0) \
+    M(SettingUInt64, max_bytes_to_merge_at_min_space_in_pool, 1024 * 1024, "Maximum in total size of parts to merge, when there are minimum free threads in background pool (or entries in replication queue).", 0) \
+    M(SettingUInt64, max_replicated_merges_in_queue, 16, "How many tasks of merging and mutating parts are allowed simultaneously in ReplicatedMergeTree queue.", 0) \
+    M(SettingUInt64, max_replicated_mutations_in_queue, 8, "How many tasks of mutating parts are allowed simultaneously in ReplicatedMergeTree queue.", 0) \
+    M(SettingUInt64, number_of_free_entries_in_pool_to_lower_max_size_of_merge, 8, "When there is less than specified number of free entries in pool (or replicated queue), start to lower maximum size of merge to process (or to put in queue). This is to allow small merges to process - not filling the pool with long running merges.", 0) \
+    M(SettingUInt64, number_of_free_entries_in_pool_to_execute_mutation, 10, "When there is less than specified number of free entries in pool, do not execute part mutations. This is to leave free threads for regular merges and avoid \"Too many parts\"", 0) \
+    M(SettingSeconds, old_parts_lifetime, 8 * 60, "How many seconds to keep obsolete parts.", 0) \
+    M(SettingSeconds, temporary_directories_lifetime, 86400, "How many seconds to keep tmp_-directories.", 0) \
    \
    /** Inserts settings. */ \
-    M(SettingUInt64, parts_to_delay_insert, 150, "If table contains at least that many active parts in single partition, artificially slow down insert into table.") \
-    M(SettingUInt64, parts_to_throw_insert, 300, "If more than this number active parts in single partition, throw 'Too many parts ...' exception.") \
-    M(SettingUInt64, max_delay_to_insert, 1, "Max delay of inserting data into MergeTree table in seconds, if there are a lot of unmerged parts in single partition.") \
-    M(SettingUInt64, max_parts_in_total, 100000, "If more than this number active parts in all partitions in total, throw 'Too many parts ...' exception.") \
+    M(SettingUInt64, parts_to_delay_insert, 150, "If table contains at least that many active parts in single partition, artificially slow down insert into table.", 0) \
+    M(SettingUInt64, parts_to_throw_insert, 300, "If more than this number active parts in single partition, throw 'Too many parts ...' exception.", 0) \
+    M(SettingUInt64, max_delay_to_insert, 1, "Max delay of inserting data into MergeTree table in seconds, if there are a lot of unmerged parts in single partition.", 0) \
+    M(SettingUInt64, max_parts_in_total, 100000, "If more than this number active parts in all partitions in total, throw 'Too many parts ...' exception.", 0) \
    \
    /** Replication settings. */ \
-    M(SettingUInt64, replicated_deduplication_window, 100, "How many last blocks of hashes should be kept in ZooKeeper (old blocks will be deleted).") \
-    M(SettingUInt64, replicated_deduplication_window_seconds, 7 * 24 * 60 * 60 /* one week */, "Similar to \"replicated_deduplication_window\", but determines old blocks by their lifetime. Hash of an inserted block will be deleted (and the block will not be deduplicated after) if it outside of one \"window\". You can set very big replicated_deduplication_window to avoid duplicating INSERTs during that period of time.") \
-    M(SettingUInt64, max_replicated_logs_to_keep, 10000, "How many records may be in log, if there is inactive replica.") \
-    M(SettingUInt64, min_replicated_logs_to_keep, 100, "Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn't affect work of tables: used only to diagnose ZooKeeper log before cleaning.") \
-    M(SettingSeconds, prefer_fetch_merged_part_time_threshold, 3600, "If time passed after replication log entry creation exceeds this threshold and sum size of parts is greater than \"prefer_fetch_merged_part_size_threshold\", prefer fetching merged part from replica instead of doing merge locally. To speed up very long merges.") \
-    M(SettingUInt64, prefer_fetch_merged_part_size_threshold, 10ULL * 1024 * 1024 * 1024, "If sum size of parts exceeds this threshold and time passed after replication log entry creation is greater than \"prefer_fetch_merged_part_time_threshold\", prefer fetching merged part from replica instead of doing merge locally. To speed up very long merges.") \
-    M(SettingUInt64, max_suspicious_broken_parts, 10, "Max broken parts, if more - deny automatic deletion.") \
-    M(SettingUInt64, max_files_to_modify_in_alter_columns, 75, "Not apply ALTER if number of files for modification(deletion, addition) more than this.") \
-    M(SettingUInt64, max_files_to_remove_in_alter_columns, 50, "Not apply ALTER, if number of files for deletion more than this.") \
-    M(SettingFloat, replicated_max_ratio_of_wrong_parts, 0.5, "If ratio of wrong parts to total number of parts is less than this - allow to start.") \
-    M(SettingUInt64, replicated_max_parallel_fetches, 0, "Limit parallel fetches.") \
-    M(SettingUInt64, replicated_max_parallel_fetches_for_table, 0, "Limit parallel fetches for one table.") \
-    M(SettingUInt64, replicated_max_parallel_fetches_for_host, DEFAULT_COUNT_OF_HTTP_CONNECTIONS_PER_ENDPOINT, "Limit parallel fetches from endpoint (actually pool size).") \
-    M(SettingUInt64, replicated_max_parallel_sends, 0, "Limit parallel sends.") \
-    M(SettingUInt64, replicated_max_parallel_sends_for_table, 0, "Limit parallel sends for one table.") \
-    M(SettingBool, replicated_can_become_leader, true, "If true, Replicated tables replicas on this node will try to acquire leadership.") \
-    M(SettingSeconds, zookeeper_session_expiration_check_period, 60, "ZooKeeper session expiration check period, in seconds.") \
+    M(SettingUInt64, replicated_deduplication_window, 100, "How many last blocks of hashes should be kept in ZooKeeper (old blocks will be deleted).", 0) \
+    M(SettingUInt64, replicated_deduplication_window_seconds, 7 * 24 * 60 * 60 /* one week */, "Similar to \"replicated_deduplication_window\", but determines old blocks by their lifetime. Hash of an inserted block will be deleted (and the block will not be deduplicated after) if it outside of one \"window\". You can set very big replicated_deduplication_window to avoid duplicating INSERTs during that period of time.", 0) \
+    M(SettingUInt64, max_replicated_logs_to_keep, 10000, "How many records may be in log, if there is inactive replica.", 0) \
+    M(SettingUInt64, min_replicated_logs_to_keep, 100, "Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn't affect work of tables: used only to diagnose ZooKeeper log before cleaning.", 0) \
+    M(SettingSeconds, prefer_fetch_merged_part_time_threshold, 3600, "If time passed after replication log entry creation exceeds this threshold and sum size of parts is greater than \"prefer_fetch_merged_part_size_threshold\", prefer fetching merged part from replica instead of doing merge locally. To speed up very long merges.", 0) \
+    M(SettingUInt64, prefer_fetch_merged_part_size_threshold, 10ULL * 1024 * 1024 * 1024, "If sum size of parts exceeds this threshold and time passed after replication log entry creation is greater than \"prefer_fetch_merged_part_time_threshold\", prefer fetching merged part from replica instead of doing merge locally. To speed up very long merges.", 0) \
+    M(SettingUInt64, max_suspicious_broken_parts, 10, "Max broken parts, if more - deny automatic deletion.", 0) \
+    M(SettingUInt64, max_files_to_modify_in_alter_columns, 75, "Not apply ALTER if number of files for modification(deletion, addition) more than this.", 0) \
+    M(SettingUInt64, max_files_to_remove_in_alter_columns, 50, "Not apply ALTER, if number of files for deletion more than this.", 0) \
+    M(SettingFloat, replicated_max_ratio_of_wrong_parts, 0.5, "If ratio of wrong parts to total number of parts is less than this - allow to start.", 0) \
+    M(SettingUInt64, replicated_max_parallel_fetches, 0, "Limit parallel fetches.", 0) \
+    M(SettingUInt64, replicated_max_parallel_fetches_for_table, 0, "Limit parallel fetches for one table.", 0) \
+    M(SettingUInt64, replicated_max_parallel_fetches_for_host, DEFAULT_COUNT_OF_HTTP_CONNECTIONS_PER_ENDPOINT, "Limit parallel fetches from endpoint (actually pool size).", 0) \
+    M(SettingUInt64, replicated_max_parallel_sends, 0, "Limit parallel sends.", 0) \
+    M(SettingUInt64, replicated_max_parallel_sends_for_table, 0, "Limit parallel sends for one table.", 0) \
+    M(SettingBool, replicated_can_become_leader, true, "If true, Replicated tables replicas on this node will try to acquire leadership.", 0) \
+    M(SettingSeconds, zookeeper_session_expiration_check_period, 60, "ZooKeeper session expiration check period, in seconds.", 0) \
    \
    /** Check delay of replicas settings. */ \
-    M(SettingUInt64, check_delay_period, 60, "Period to check replication delay and compare with other replicas.") \
-    M(SettingUInt64, cleanup_delay_period, 30, "Period to clean old queue logs, blocks hashes and parts.") \
-    M(SettingUInt64, cleanup_delay_period_random_add, 10, "Add uniformly distributed value from 0 to x seconds to cleanup_delay_period to avoid thundering herd effect and subsequent DoS of ZooKeeper in case of very large number of tables.") \
-    M(SettingUInt64, min_relative_delay_to_yield_leadership, 120, "Minimal delay from other replicas to yield leadership. Here and further 0 means unlimited.") \
-    M(SettingUInt64, min_relative_delay_to_close, 300, "Minimal delay from other replicas to close, stop serving requests and not return Ok during status check.") \
-    M(SettingUInt64, min_absolute_delay_to_close, 0, "Minimal absolute delay to close, stop serving requests and not return Ok during status check.") \
-    M(SettingUInt64, enable_vertical_merge_algorithm, 1, "Enable usage of Vertical merge algorithm.") \
-    M(SettingUInt64, vertical_merge_algorithm_min_rows_to_activate, 16 * DEFAULT_MERGE_BLOCK_SIZE, "Minimal (approximate) sum of rows in merging parts to activate Vertical merge algorithm.") \
-    M(SettingUInt64, vertical_merge_algorithm_min_columns_to_activate, 11, "Minimal amount of non-PK columns to activate Vertical merge algorithm.") \
+    M(SettingUInt64, check_delay_period, 60, "Period to check replication delay and compare with other replicas.", 0) \
+    M(SettingUInt64, cleanup_delay_period, 30, "Period to clean old queue logs, blocks hashes and parts.", 0) \
+    M(SettingUInt64, cleanup_delay_period_random_add, 10, "Add uniformly distributed value from 0 to x seconds to cleanup_delay_period to avoid thundering herd effect and subsequent DoS of ZooKeeper in case of very large number of tables.", 0) \
+    M(SettingUInt64, min_relative_delay_to_yield_leadership, 120, "Minimal delay from other replicas to yield leadership. Here and further 0 means unlimited.", 0) \
+    M(SettingUInt64, min_relative_delay_to_close, 300, "Minimal delay from other replicas to close, stop serving requests and not return Ok during status check.", 0) \
+    M(SettingUInt64, min_absolute_delay_to_close, 0, "Minimal absolute delay to close, stop serving requests and not return Ok during status check.", 0) \
+    M(SettingUInt64, enable_vertical_merge_algorithm, 1, "Enable usage of Vertical merge algorithm.", 0) \
+    M(SettingUInt64, vertical_merge_algorithm_min_rows_to_activate, 16 * DEFAULT_MERGE_BLOCK_SIZE, "Minimal (approximate) sum of rows in merging parts to activate Vertical merge algorithm.", 0) \
+    M(SettingUInt64, vertical_merge_algorithm_min_columns_to_activate, 11, "Minimal amount of non-PK columns to activate Vertical merge algorithm.", 0) \
    \
    /** Compatibility settings */ \
-    M(SettingBool, compatibility_allow_sampling_expression_not_in_primary_key, false, "Allow to create a table with sampling expression not in primary key. This is needed only to temporarily allow to run the server with wrong tables for backward compatibility.") \
-    M(SettingBool, use_minimalistic_checksums_in_zookeeper, true, "Use small format (dozens bytes) for part checksums in ZooKeeper instead of ordinary ones (dozens KB). Before enabling check that all replicas support new format.") \
-    M(SettingBool, use_minimalistic_part_header_in_zookeeper, false, "Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (<part>/columns and <part>/checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling check that all replicas support new format.") \
-    M(SettingUInt64, finished_mutations_to_keep, 100, "How many records about mutations that are done to keep. If zero, then keep all of them.") \
-    M(SettingUInt64, min_merge_bytes_to_use_direct_io, 10ULL * 1024 * 1024 * 1024, "Minimal amount of bytes to enable O_DIRECT in merge (0 - disabled).") \
-    M(SettingUInt64, index_granularity_bytes, 10 * 1024 * 1024, "Approximate amount of bytes in single granule (0 - disabled).") \
-    M(SettingInt64, merge_with_ttl_timeout, 3600 * 24, "Minimal time in seconds, when merge with TTL can be repeated.") \
-    M(SettingBool, ttl_only_drop_parts, false, "Only drop altogether the expired parts and not partially prune them.") \
-    M(SettingBool, write_final_mark, 1, "Write final mark after end of column (0 - disabled, do nothing if index_granularity_bytes=0)") \
-    M(SettingBool, enable_mixed_granularity_parts, 0, "Enable parts with adaptive and non adaptive granularity") \
-    M(SettingMaxThreads, max_part_loading_threads, 0, "The number of theads to load data parts at startup.") \
-    M(SettingMaxThreads, max_part_removal_threads, 0, "The number of theads for concurrent removal of inactive data parts. One is usually enough, but in 'Google Compute Environment SSD Persistent Disks' file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).") \
-    M(SettingUInt64, concurrent_part_removal_threshold, 100, "Activate concurrent part removal (see 'max_part_removal_threads') only if the number of inactive data parts is at least this.") \
-    M(SettingString, storage_policy, "default", "Name of storage disk policy")
+    M(SettingBool, compatibility_allow_sampling_expression_not_in_primary_key, false, "Allow to create a table with sampling expression not in primary key. This is needed only to temporarily allow to run the server with wrong tables for backward compatibility.", 0) \
+    M(SettingBool, use_minimalistic_checksums_in_zookeeper, true, "Use small format (dozens bytes) for part checksums in ZooKeeper instead of ordinary ones (dozens KB). Before enabling check that all replicas support new format.", 0) \
+    M(SettingBool, use_minimalistic_part_header_in_zookeeper, false, "Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (<part>/columns and <part>/checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling check that all replicas support new format.", 0) \
+    M(SettingUInt64, finished_mutations_to_keep, 100, "How many records about mutations that are done to keep. If zero, then keep all of them.", 0) \
+    M(SettingUInt64, min_merge_bytes_to_use_direct_io, 10ULL * 1024 * 1024 * 1024, "Minimal amount of bytes to enable O_DIRECT in merge (0 - disabled).", 0) \
+    M(SettingUInt64, index_granularity_bytes, 10 * 1024 * 1024, "Approximate amount of bytes in single granule (0 - disabled).", 0) \
+    M(SettingInt64, merge_with_ttl_timeout, 3600 * 24, "Minimal time in seconds, when merge with TTL can be repeated.", 0) \
+    M(SettingBool, ttl_only_drop_parts, false, "Only drop altogether the expired parts and not partially prune them.", 0) \
+    M(SettingBool, write_final_mark, 1, "Write final mark after end of column (0 - disabled, do nothing if index_granularity_bytes=0)", 0) \
+    M(SettingBool, enable_mixed_granularity_parts, 0, "Enable parts with adaptive and non adaptive granularity", 0) \
+    M(SettingMaxThreads, max_part_loading_threads, 0, "The number of threads to load data parts at startup.", 0) \
+    M(SettingMaxThreads, max_part_removal_threads, 0, "The number of threads for concurrent removal of inactive data parts. One is usually enough, but in 'Google Compute Environment SSD Persistent Disks' file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).", 0) \
+    M(SettingUInt64, concurrent_part_removal_threshold, 100, "Activate concurrent part removal (see 'max_part_removal_threads') only if the number of inactive data parts is at least this.", 0) \
+    M(SettingString, storage_policy, "default", "Name of storage disk policy", 0)

    DECLARE_SETTINGS_COLLECTION(LIST_OF_MERGE_TREE_SETTINGS)

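Every `M(...)` entry in `LIST_OF_MERGE_TREE_SETTINGS` gains a trailing fifth argument, `0`. The list is an X-macro: it is expanded with different `M` definitions (for example inside `DECLARE_SETTINGS_COLLECTION`), so every entry must have the same arity as every macro passed in, which is why the whole list changes at once. This diff does not say what the new argument means; the sketch below assumes it is a per-setting flags value and uses a hypothetical two-entry list, `LIST_OF_DEMO_SETTINGS`, to show one list driving two expansions.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down list in the same shape as the patched
// LIST_OF_MERGE_TREE_SETTINGS: M(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS).
// Treating the fifth argument as a flags value is an assumption of this sketch.
#define LIST_OF_DEMO_SETTINGS(M) \
    M(std::uint64_t, check_delay_period, 60, "Period to check replication delay.", 0) \
    M(bool, ttl_only_drop_parts, false, "Only drop expired parts.", 0)

struct DemoSettings
{
    // Expansion 1: one data member per entry, initialised to its default.
#define DECLARE_MEMBER(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) TYPE NAME = DEFAULT;
    LIST_OF_DEMO_SETTINGS(DECLARE_MEMBER)
#undef DECLARE_MEMBER
};

// Expansion 2: one line of documentation per entry.
std::string describeSettings()
{
    std::ostringstream out;
#define DESCRIBE_ENTRY(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \
    out << #NAME << " (flags=" << (FLAGS) << "): " << (DESCRIPTION) << '\n';
    LIST_OF_DEMO_SETTINGS(DESCRIBE_ENTRY)
#undef DESCRIBE_ENTRY
    return out.str();
}
```

Because each expansion macro takes exactly five parameters, an entry left with only four arguments would fail to compile rather than silently misbehave.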
--- a/dbms/src/Storages/SelectQueryInfo.h
+++ b/dbms/src/Storages/SelectQueryInfo.h
@ -34,18 +34,18 @@ struct FilterInfo
    bool do_remove_column = false;
 };

-struct SortingInfo
+struct InputSortingInfo
 {
-    SortDescription prefix_order_descr;
+    SortDescription order_key_prefix_descr;
    int direction;

-    SortingInfo(const SortDescription & prefix_order_descr_, int direction_)
-        : prefix_order_descr(prefix_order_descr_), direction(direction_) {}
+    InputSortingInfo(const SortDescription & order_key_prefix_descr_, int direction_)
+        : order_key_prefix_descr(order_key_prefix_descr_), direction(direction_) {}
 };

 using PrewhereInfoPtr = std::shared_ptr<PrewhereInfo>;
 using FilterInfoPtr = std::shared_ptr<FilterInfo>;
-using SortingInfoPtr = std::shared_ptr<SortingInfo>;
+using InputSortingInfoPtr = std::shared_ptr<InputSortingInfo>;

 struct SyntaxAnalyzerResult;
 using SyntaxAnalyzerResultPtr = std::shared_ptr<const SyntaxAnalyzerResult>;
@ -62,7 +62,7 @@ struct SelectQueryInfo

    PrewhereInfoPtr prewhere_info;

-    SortingInfoPtr sorting_info;
+    InputSortingInfoPtr input_sorting_info;

    /// Prepared sets are used for indices by storage engine.
    /// Example: x IN (1, 2, 3)
--- a/dbms/src/Storages/StorageFile.cpp
+++ b/dbms/src/Storages/StorageFile.cpp
@ -178,41 +178,41 @@ StorageFile::StorageFile(
 class StorageFileBlockInputStream : public IBlockInputStream
 {
 public:
-    StorageFileBlockInputStream(StorageFile & storage_, const Context & context, UInt64 max_block_size, std::string file_path)
-        : storage(storage_)
+    StorageFileBlockInputStream(std::shared_ptr<StorageFile> storage_, const Context & context, UInt64 max_block_size, std::string file_path)
+        : storage(std::move(storage_))
    {
-        if (storage.use_table_fd)
+        if (storage->use_table_fd)
        {
-            unique_lock = std::unique_lock(storage.rwlock);
+            unique_lock = std::unique_lock(storage->rwlock);

            /// We could use common ReadBuffer and WriteBuffer in storage to leverage cache
            ///  and add ability to seek unseekable files, but cache sync isn't supported.

-            if (storage.table_fd_was_used) /// We need seek to initial position
+            if (storage->table_fd_was_used) /// We need seek to initial position
            {
-                if (storage.table_fd_init_offset < 0)
-                    throw Exception("File descriptor isn't seekable, inside " + storage.getName(), ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
+                if (storage->table_fd_init_offset < 0)
+                    throw Exception("File descriptor isn't seekable, inside " + storage->getName(), ErrorCodes::CANNOT_SEEK_THROUGH_FILE);

                /// ReadBuffer's seek() doesn't make sense, since cache is empty
-                if (lseek(storage.table_fd, storage.table_fd_init_offset, SEEK_SET) < 0)
-                    throwFromErrno("Cannot seek file descriptor, inside " + storage.getName(), ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
+                if (lseek(storage->table_fd, storage->table_fd_init_offset, SEEK_SET) < 0)
+                    throwFromErrno("Cannot seek file descriptor, inside " + storage->getName(), ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
            }

-            storage.table_fd_was_used = true;
-            read_buf = std::make_unique<ReadBufferFromFileDescriptor>(storage.table_fd);
+            storage->table_fd_was_used = true;
+            read_buf = std::make_unique<ReadBufferFromFileDescriptor>(storage->table_fd);
        }
        else
        {
-            shared_lock = std::shared_lock(storage.rwlock);
+            shared_lock = std::shared_lock(storage->rwlock);
            read_buf = std::make_unique<ReadBufferFromFile>(file_path);
        }

-        reader = FormatFactory::instance().getInput(storage.format_name, *read_buf, storage.getSampleBlock(), context, max_block_size);
+        reader = FormatFactory::instance().getInput(storage->format_name, *read_buf, storage->getSampleBlock(), context, max_block_size);
    }

    String getName() const override
    {
-        return storage.getName();
+        return storage->getName();
    }

    Block readImpl() override
@ -233,7 +233,7 @@ public:
    }

 private:
-    StorageFile & storage;
+    std::shared_ptr<StorageFile> storage;
    Block sample_block;
    std::unique_ptr<ReadBufferFromFileDescriptor> read_buf;
    BlockInputStreamPtr reader;
@ -259,7 +259,8 @@ BlockInputStreams StorageFile::read(
    blocks_input.reserve(paths.size());
    for (const auto & file_path : paths)
    {
-        BlockInputStreamPtr cur_block = std::make_shared<StorageFileBlockInputStream>(*this, context, max_block_size, file_path);
+        BlockInputStreamPtr cur_block = std::make_shared<StorageFileBlockInputStream>(
+                std::static_pointer_cast<StorageFile>(shared_from_this()), context, max_block_size, file_path);
        blocks_input.push_back(column_defaults.empty() ? cur_block : std::make_shared<AddingDefaultsBlockInputStream>(cur_block, column_defaults, context));
    }
    return blocks_input;
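In the `StorageFile` change above, each `StorageFileBlockInputStream` now co-owns its storage through a `std::shared_ptr<StorageFile>` obtained with `shared_from_this()`, instead of holding a plain `StorageFile &`. A reference carries no lifetime guarantee, so a stream still being read after the table object was released could touch freed memory; a shared pointer keeps the object alive until the last stream is gone. The `static_pointer_cast` is needed because `shared_from_this()` is presumably provided by a base class and returns a base-type pointer. The sketch below illustrates the pattern under simplified assumptions; `IStorageLike`, `FileStorage` and `FileStream` are hypothetical stand-ins, not ClickHouse classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: IStorageLike owns the enable_shared_from_this base
// (as IStorage appears to), FileStorage plays the role of StorageFile.
class IStorageLike : public std::enable_shared_from_this<IStorageLike>
{
public:
    virtual ~IStorageLike() = default;
    virtual std::string getName() const = 0;
};

class FileStream;

class FileStorage : public IStorageLike
{
public:
    std::string getName() const override { return "File"; }
    std::vector<std::shared_ptr<FileStream>> read(std::size_t num_streams);
};

class FileStream
{
public:
    explicit FileStream(std::shared_ptr<FileStorage> storage_) : storage(std::move(storage_)) {}
    // Safe even after every other owner has released the storage:
    // this stream is itself a co-owner.
    std::string getName() const { return storage->getName(); }

private:
    std::shared_ptr<FileStorage> storage;
};

std::vector<std::shared_ptr<FileStream>> FileStorage::read(std::size_t num_streams)
{
    std::vector<std::shared_ptr<FileStream>> streams;
    for (std::size_t i = 0; i < num_streams; ++i)
        // shared_from_this() yields shared_ptr<IStorageLike>; the cast recovers the
        // derived type. It requires *this to already be owned by a shared_ptr.
        streams.push_back(std::make_shared<FileStream>(
            std::static_pointer_cast<FileStorage>(shared_from_this())));
    return streams;
}

// Streams outlive the caller's own handle to the storage.
std::string readAfterOwnerReleased()
{
    std::vector<std::shared_ptr<FileStream>> streams;
    {
        auto storage = std::make_shared<FileStorage>();
        streams = storage->read(2);
        assert(storage.use_count() == 3); // the caller's handle plus one per stream
    } // `storage` is destroyed here, but the object lives on inside the streams.
    return streams.at(1)->getName();
}
```

`shared_from_this()` only works on an object that is already managed by a `std::shared_ptr` (here created with `std::make_shared`); since C++17 calling it otherwise throws `std::bad_weak_ptr`.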
--- a/dbms/src/Storages/StorageMerge.cpp
+++ b/dbms/src/Storages/StorageMerge.cpp
@ -25,7 +25,6 @@
 #include <Common/typeid_cast.h>
 #include <Common/checkStackSize.h>
 #include <Databases/IDatabase.h>
-#include <Core/SettingsCommon.h>
 #include <ext/range.h>
 #include <algorithm>
 #include <Parsers/ASTFunction.h>
--- a/dbms/src/Storages/StorageXDBC.cpp
+++ b/dbms/src/Storages/StorageXDBC.cpp
@ -105,7 +105,7 @@ namespace
    template <typename BridgeHelperMixin>
    void registerXDBCStorage(StorageFactory & factory, const std::string & name)
    {
-        factory.registerStorage(name, [&name](const StorageFactory::Arguments & args)
+        factory.registerStorage(name, [name](const StorageFactory::Arguments & args)
        {
            ASTs & engine_args = args.engine_args;

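The `StorageXDBC` change switches the registered lambda from capturing `name` by reference (`[&name]`) to capturing it by value (`[name]`). The factory stores the lambda and calls it later, when a table with that engine is created, but `name` is a `const std::string &` parameter of `registerXDBCStorage`. If the caller passed a temporary (for example a string literal converted to `std::string`), a by-reference capture would dangle once registration returns. The sketch below shows the by-value version against a hypothetical registry; `registry`, `registerEngine` and `create` are illustrative names, not the `StorageFactory` API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a storage factory: creator callbacks are stored
// now and invoked later, after the registering function has returned.
using Creator = std::function<std::string()>;

std::map<std::string, Creator> & registry()
{
    static std::map<std::string, Creator> creators;
    return creators;
}

void registerEngine(const std::string & name)
{
    // [name] copies the string into the closure. With [&name] the closure would
    // keep a reference to this parameter; if the caller passed a temporary
    // (e.g. registerEngine("ODBC")), that reference dangles once this call returns.
    registry()[name] = [name]() { return "created storage for engine " + name; };
}

std::string create(const std::string & engine)
{
    auto it = registry().find(engine);
    assert(it != registry().end() && "engine must be registered first");
    return it->second(); // runs the stored lambda long after registerEngine() returned
}
```

Copying a short string once per registered engine is negligible, which makes by-value capture the safe default for callbacks that outlive the registering call.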
--- a/dbms/src/TableFunctions/ITableFunctionXDBC.cpp
+++ b/dbms/src/TableFunctions/ITableFunctionXDBC.cpp
@ -16,6 +16,7 @@
 #include <Poco/Net/HTTPRequest.h>
 #include <Common/Exception.h>
 #include <Common/typeid_cast.h>
+#include <Poco/NumberFormatter.h>


 namespace DB
@ -70,6 +71,10 @@ StoragePtr ITableFunctionXDBC::executeImpl(const ASTPtr & ast_function, const Co
        columns_info_uri.addQueryParameter("schema", schema_name);
    columns_info_uri.addQueryParameter("table", remote_table_name);

+    const auto use_nulls = context.getSettingsRef().external_table_functions_use_nulls;
+    columns_info_uri.addQueryParameter("external_table_functions_use_nulls",
+        Poco::NumberFormatter::format(use_nulls));
+
    ReadWriteBufferFromHTTP buf(columns_info_uri, Poco::Net::HTTPRequest::HTTP_POST, nullptr);

    std::string columns_info;
--- a/dbms/tests/clickhouse-test
+++ b/dbms/tests/clickhouse-test
@ -502,13 +502,13 @@ if __name__ == '__main__':
    parser.add_argument('--no-stateful', action='store_true', help='Disable all stateful tests')
    parser.add_argument('--skip', nargs='+', help="Skip these tests")
    parser.add_argument('--no-long', action='store_false', dest='no_long', help='Do not run long tests')
+    parser.add_argument('--client-option', nargs='+', help='Specify additional client argument')
    group=parser.add_mutually_exclusive_group(required=False)
    group.add_argument('--zookeeper', action='store_true', default=None, dest='zookeeper', help='Run zookeeper related tests')
    group.add_argument('--no-zookeeper', action='store_false', default=None, dest='zookeeper', help='Do not run zookeeper related tests')
    group=parser.add_mutually_exclusive_group(required=False)
    group.add_argument('--shard', action='store_true', default=None, dest='shard', help='Run sharding related tests (required to clickhouse-server listen 127.0.0.2 127.0.0.3)')
    group.add_argument('--no-shard', action='store_false', default=None, dest='shard', help='Do not run shard related tests')
-    group.add_argument('--client-option', nargs='+', help='Specify additional client argument')

    args = parser.parse_args()

--- a/dbms/tests/integration/test_odbc_interaction/test.py
+++ b/dbms/tests/integration/test_odbc_interaction/test.py
@ -91,7 +91,8 @@ def test_mysql_simple_select_works(started_cluster):
    with conn.cursor() as cursor:
        cursor.execute("INSERT INTO clickhouse.{} VALUES(50, 'null-guy', 127, 255, NULL), (100, 'non-null-guy', 127, 255, 511);".format(table_name))
        conn.commit()
-    assert node1.query("SELECT column_x FROM odbc('DSN={}', '{}')".format(mysql_setup["DSN"], table_name)) == '\\N\n511\n'
+    assert node1.query("SELECT column_x FROM odbc('DSN={}', '{}') SETTINGS external_table_functions_use_nulls=1".format(mysql_setup["DSN"], table_name)) == '\\N\n511\n'
+    assert node1.query("SELECT column_x FROM odbc('DSN={}', '{}') SETTINGS external_table_functions_use_nulls=0".format(mysql_setup["DSN"], table_name)) == '0\n511\n'

    node1.query('''
 CREATE TABLE {}(id UInt32, name String, age UInt32, money UInt32, column_x Nullable(UInt32)) ENGINE = MySQL('mysql1:3306', 'clickhouse', '{}', 'root', 'clickhouse');
--- a/dbms/tests/integration/test_old_versions_client/init.py
+++ b/dbms/tests/integration/test_old_versions_client/init.py
--- a/dbms/tests/integration/test_old_versions/configs/config.d/test_cluster.xml
+++ b/dbms/tests/integration/test_old_versions/configs/config.d/test_cluster.xml
@ -0,0 +1,13 @@
+<yandex>
+    <remote_servers>
+        <test_cluster>
+            <shard>
+                <weight>1</weight>
+                <replica>
+                    <host>node_new</host>
+                    <port>9000</port>
+                </replica>
+            </shard>
+        </test_cluster>
+    </remote_servers>
+</yandex>
--- a/dbms/tests/integration/test_old_versions/test.py
+++ b/dbms/tests/integration/test_old_versions/test.py
@ -0,0 +1,73 @@
+import time
+import os
+import pytest
+
+from helpers.cluster import ClickHouseCluster
+from multiprocessing.dummy import Pool
+from helpers.client import QueryRuntimeException, QueryTimeoutExceedException
+from helpers.test_tools import assert_eq_with_retry
+
+
+cluster = ClickHouseCluster(__file__)
+node18_14 = cluster.add_instance('node18_14', image='yandex/clickhouse-server:18.14.19', with_installed_binary=True, config_dir="configs")
+node19_1 = cluster.add_instance('node19_1', image='yandex/clickhouse-server:19.1.16', with_installed_binary=True, config_dir="configs")
+node19_4 = cluster.add_instance('node19_4', image='yandex/clickhouse-server:19.4.5.35', with_installed_binary=True, config_dir="configs")
+node19_8 = cluster.add_instance('node19_8', image='yandex/clickhouse-server:19.8.3.8', with_installed_binary=True, config_dir="configs")
+node19_11 = cluster.add_instance('node19_11', image='yandex/clickhouse-server:19.11.13.74', with_installed_binary=True, config_dir="configs")
+node19_13 = cluster.add_instance('node19_13', image='yandex/clickhouse-server:19.13.7.57', with_installed_binary=True, config_dir="configs")
+node19_16 = cluster.add_instance('node19_16', image='yandex/clickhouse-server:19.16.2.2', with_installed_binary=True, config_dir="configs")
+old_nodes = [node18_14, node19_1, node19_4, node19_8, node19_11, node19_13, node19_16]
+new_node = cluster.add_instance('node_new')
+
+
+def query_from_one_node_to_another(client_node, server_node, query):
+    client_node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --host {} --query {!r}".format(server_node.name, query)])
+
+
+@pytest.fixture(scope="module")
+def setup_nodes():
+    try:
+        cluster.start()
+
+        for n in old_nodes + [new_node]:
+            n.query('''CREATE TABLE test_table (id UInt32, value UInt64) ENGINE = MergeTree() ORDER BY tuple()''')
+
+        for n in old_nodes:
+            n.query('''CREATE TABLE dist_table AS test_table ENGINE = Distributed('test_cluster', 'default', 'test_table')''')
+
+        yield cluster
+    finally:
+        cluster.shutdown()
+
+
+def test_client_is_older_than_server(setup_nodes):
+    server = new_node
+    for i, client in enumerate(old_nodes):
+        query_from_one_node_to_another(client, server, "INSERT INTO test_table VALUES (1, {})".format(i))
+
+    for client in old_nodes:
+        query_from_one_node_to_another(client, server, "SELECT COUNT() FROM test_table")
+
+    assert server.query("SELECT COUNT() FROM test_table WHERE id=1") == str(len(old_nodes)) + "\n"
+
+
+def test_server_is_older_than_client(setup_nodes):
+    client = new_node
+    for i, server in enumerate(old_nodes):
+        query_from_one_node_to_another(client, server, "INSERT INTO test_table VALUES (2, {})".format(i))
+
+    for server in old_nodes:
+        query_from_one_node_to_another(client, server, "SELECT COUNT() FROM test_table")
+
+    for server in old_nodes:
+        assert server.query("SELECT COUNT() FROM test_table WHERE id=2") == "1\n"
+
+
+def test_distributed_query_initiator_is_older_than_shard(setup_nodes):
+    distributed_query_initiator_old_nodes = [node18_14, node19_13, node19_16]
+    shard = new_node
+    for i, initiator in enumerate(distributed_query_initiator_old_nodes):
+        initiator.query("INSERT INTO dist_table VALUES (3, {})".format(i))
+
+    assert_eq_with_retry(shard, "SELECT COUNT() FROM test_table WHERE id=3", str(len(distributed_query_initiator_old_nodes)))
+    assert_eq_with_retry(initiator, "SELECT COUNT() FROM dist_table WHERE id=3", str(len(distributed_query_initiator_old_nodes)))
--- a/dbms/tests/integration/test_old_versions_client/test.py
+++ b/dbms/tests/integration/test_old_versions_client/test.py
@ -1,51 +0,0 @@
-import time
-import pytest
-
-from helpers.cluster import ClickHouseCluster
-from multiprocessing.dummy import Pool
-from helpers.client import QueryRuntimeException, QueryTimeoutExceedException
-
-from helpers.test_tools import assert_eq_with_retry
-cluster = ClickHouseCluster(__file__)
-node18_14 = cluster.add_instance('node18_14', image='yandex/clickhouse-server:18.14.19', with_installed_binary=True)
-node19_1 = cluster.add_instance('node19_1', image='yandex/clickhouse-server:19.1.16', with_installed_binary=True)
-node19_4 = cluster.add_instance('node19_4', image='yandex/clickhouse-server:19.4.5.35', with_installed_binary=True)
-node19_6 = cluster.add_instance('node19_6', image='yandex/clickhouse-server:19.6.3.18', with_installed_binary=True)
-node19_8 = cluster.add_instance('node19_8', image='yandex/clickhouse-server:19.8.3.8', with_installed_binary=True)
-node_new = cluster.add_instance('node_new')
-
-@pytest.fixture(scope="module")
-def setup_nodes():
-    try:
-        cluster.start()
-        for n in (node18_14, node19_1, node19_4, node19_6, node19_8, node_new):
-            n.query('''CREATE TABLE test_table (id UInt32, value UInt64) ENGINE = MergeTree() ORDER BY tuple()''')
-
-        yield cluster
-    finally:
-        cluster.shutdown()
-
-
-def query_from_one_node_to_another(client_node, server_node, query):
-    client_node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --host {} --query '{}'".format(server_node.name, query)])
-
-def test_client_from_different_versions(setup_nodes):
-    old_nodes = (node18_14, node19_1, node19_4, node19_6, node19_8)
-    # from new to old
-    for n in old_nodes:
-        query_from_one_node_to_another(node_new, n, "INSERT INTO test_table VALUES (1, 1)")
-
-    for n in old_nodes:
-        query_from_one_node_to_another(node_new, n, "SELECT COUNT() FROM test_table")
-
-    for n in old_nodes:
-        assert n.query("SELECT COUNT() FROM test_table") == "1\n"
-
-    # from old to new
-    for i, n in enumerate(old_nodes):
-        query_from_one_node_to_another(n, node_new, "INSERT INTO test_table VALUES ({i}, {i})".format(i=i))
-
-    for n in old_nodes:
-        query_from_one_node_to_another(n, node_new, "SELECT COUNT() FROM test_table")
-
-    assert node_new.query("SELECT COUNT() FROM test_table") == str(len(old_nodes)) + "\n"
--- a/dbms/tests/integration/test_replicating_constants/init.py
+++ b/dbms/tests/integration/test_replicating_constants/init.py
--- a/dbms/tests/integration/test_replicating_constants/test.py
+++ b/dbms/tests/integration/test_replicating_constants/test.py
@ -0,0 +1,21 @@
+import pytest
+
+from helpers.cluster import ClickHouseCluster
+
+cluster = ClickHouseCluster(__file__)
+
+node1 = cluster.add_instance('node1', with_zookeeper=True)
+node2 = cluster.add_instance('node2', with_zookeeper=True, image='yandex/clickhouse-server:19.1.14', with_installed_binary=True)
+
+@pytest.fixture(scope="module")
+def start_cluster():
+    try:
+        cluster.start()
+
+        yield cluster
+    finally:
+        cluster.shutdown()
+
+def test_different_versions(start_cluster):
+
+    assert node1.query("SELECT uniqExact(x) FROM (SELECT version() as x from remote('node{1,2}', system.one))") == "2\n"
--- a/dbms/tests/integration/test_storage_s3/test_server.py
+++ b/dbms/tests/integration/test_storage_s3/test_server.py
@ -23,9 +23,10 @@ import time
 import uuid
 import xml.etree.ElementTree

+BASE_DIR = os.path.dirname(__file__)

 logging.getLogger().setLevel(logging.INFO)
-file_handler = logging.FileHandler("/var/log/clickhouse-server/test-server.log", "a", encoding="utf-8")
+file_handler = logging.FileHandler(os.path.join(BASE_DIR, "test-server.log"), "a", encoding="utf-8")
 file_handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
 logging.getLogger().addHandler(file_handler)
 logging.getLogger().addHandler(logging.StreamHandler())
--- a/dbms/tests/integration/test_storage_s3/test.py
+++ b/dbms/tests/integration/test_storage_s3/test.py
@ -38,7 +38,7 @@ def started_cluster():
        cluster.start()

        cluster.communication_port = 10000
-        instance.copy_file_to_container(os.path.join(os.path.dirname(__file__), "test_server.py"), "test_server.py")
+        instance.copy_file_to_container(os.path.join(os.path.dirname(__file__), "server.py"), "test_server.py")
        cluster.bucket = "abc"
        instance.exec_in_container(["python", "test_server.py", str(cluster.communication_port), cluster.bucket], detach=True)
        cluster.mock_host = instance.ip_address
--- a/dbms/tests/integration/test_user_zero_database_access/init.py
+++ b/dbms/tests/integration/test_user_zero_database_access/init.py
--- a/dbms/tests/integration/test_user_zero_database_access/configs/config.xml
+++ b/dbms/tests/integration/test_user_zero_database_access/configs/config.xml
@ -0,0 +1,31 @@
+<?xml version="1.0"?>
+<yandex>
+    <logger>
+        <level>trace</level>
+        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
+        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
+        <size>1000M</size>
+        <count>10</count>
+    </logger>
+
+    <tcp_port>9000</tcp_port>
+    <listen_host>127.0.0.1</listen_host>
+
+    <openSSL>
+        <client>
+            <cacheSessions>true</cacheSessions>
+            <verificationMode>none</verificationMode>
+            <invalidCertificateHandler>
+                <name>AcceptCertificateHandler</name>
+            </invalidCertificateHandler>
+        </client>
+    </openSSL>
+
+    <max_concurrent_queries>500</max_concurrent_queries>
+    <mark_cache_size>5368709120</mark_cache_size>
+    <path>./clickhouse/</path>
+    <users_config>users.xml</users_config>
+
+    <max_table_size_to_drop>1</max_table_size_to_drop>
+    <max_partition_size_to_drop>1</max_partition_size_to_drop>
+</yandex>
--- a/dbms/tests/integration/test_user_zero_database_access/configs/users.xml
+++ b/dbms/tests/integration/test_user_zero_database_access/configs/users.xml
@ -0,0 +1,46 @@
+<?xml version="1.0"?>
+<yandex>
+    <profiles>
+        <default>
+        </default>
+    </profiles>
+
+    <users>
+        <default>
+            <password></password>
+            <networks incl="networks" replace="replace">
+                <ip>::/0</ip>
+            </networks>
+            <profile>default</profile>
+            <quota>default</quota>
+        </default>
+
+        <no_access>
+            <password></password>
+            <networks incl="networks" replace="replace">
+                <ip>::/0</ip>
+            </networks>
+            <profile>default</profile>
+            <quota>default</quota>
+            <allow_databases></allow_databases>
+        </no_access>
+
+        <has_access>
+            <password></password>
+            <networks incl="networks" replace="replace">
+                <ip>::/0</ip>
+            </networks>
+            <profile>default</profile>
+            <quota>default</quota>
+            <allow_databases>
+                <database>test</database>
+                <database>db1</database>
+            </allow_databases>
+        </has_access>
+    </users>
+
+    <quotas>
+        <default>
+        </default>
+    </quotas>
+</yandex>
--- a/dbms/tests/integration/test_user_zero_database_access/test_user_zero_database_access.py
+++ b/dbms/tests/integration/test_user_zero_database_access/test_user_zero_database_access.py
@ -0,0 +1,64 @@
+import time
+import pytest
+
+from helpers.cluster import ClickHouseCluster
+
+
+cluster = ClickHouseCluster(__file__)
+node = cluster.add_instance('node', config_dir="configs")
+
+
+@pytest.fixture(scope="module")
+def start_cluster():
+    try:
+        cluster.start()
+        node.query("CREATE DATABASE test;")
+        yield cluster
+    finally:
+        cluster.shutdown()
+
+
+def test_user_zero_database_access(start_cluster):
+    try:
+        node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --user 'no_access' --query 'DROP DATABASE test'"], user='root')
+        assert False, "user with no access rights dropped database test"
+    except AssertionError:
+        raise
+    except Exception as ex:
+        print ex
+
+    try:
+        node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --user 'has_access' --query 'DROP DATABASE test'"], user='root')
+    except Exception as ex:
+        assert False, "user with access rights can't drop database test"
+
+    try:
+        node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --user 'has_access' --query 'CREATE DATABASE test'"], user='root')
+    except Exception as ex:
+        assert False, "user with access rights can't create database test"
+
+    try:
+        node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --user 'no_access' --query 'CREATE DATABASE test2'"], user='root')
+        assert False, "user with no access rights created database test2"
+    except AssertionError:
+        raise
+    except Exception as ex:
+        print ex
+
+    try:
+        node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --user 'has_access' --query 'CREATE DATABASE test2'"], user='root')
+        assert False, "user with limited access rights created database test2 which is outside of his scope of rights"
+    except AssertionError:
+        raise
+    except Exception as ex:
+        print ex
+
+    try:
+        node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --user 'default' --query 'CREATE DATABASE test2'"], user='root')
+    except Exception as ex:
+        assert False, "user with full access rights can't create database test2"
+
+    try:
+        node.exec_in_container(["bash", "-c", "/usr/bin/clickhouse client --user 'default' --query 'DROP DATABASE test2'"], user='root')
+    except Exception as ex:
+        assert False, "user with full access rights can't drop database test2"
--- a/dbms/tests/performance/modulo.xml
+++ b/dbms/tests/performance/modulo.xml
@ -0,0 +1,17 @@
+<test>
+    <type>loop</type>
+
+    <stop_conditions>
+        <any_of>
+            <iterations>10</iterations>
+        </any_of>
+    </stop_conditions>
+
+    <main_metric>
+        <min_time />
+    </main_metric>
+
+    <query>SELECT number % 128 FROM numbers(300000000) FORMAT Null</query>
+    <query>SELECT number % 255 FROM numbers(300000000) FORMAT Null</query>
+    <query>SELECT number % 256 FROM numbers(300000000) FORMAT Null</query>
+</test>
--- a/dbms/tests/performance/prewhere.xml
+++ b/dbms/tests/performance/prewhere.xml
@ -0,0 +1,28 @@
+<test>
+    <type>loop</type>
+
+    <stop_conditions>
+        <all_of>
+            <iterations>5</iterations>
+            <min_time_not_changing_for_ms>10000</min_time_not_changing_for_ms>
+        </all_of>
+        <any_of>
+            <iterations>50</iterations>
+            <total_time_ms>60000</total_time_ms>
+        </any_of>
+    </stop_conditions>
+
+    <main_metric>
+        <min_time />
+    </main_metric>
+
+    <preconditions>
+        <table_exists>default.hits_10m_single</table_exists>
+    </preconditions>
+
+    <settings>
+        <max_threads>1</max_threads>
+    </settings>
+
+    <query>SELECT Title, URL FROM hits_10m_single PREWHERE WatchID % 2 = 1 WHERE UserID = 10000 FORMAT Null</query>
+</test>
--- a/dbms/tests/performance/string_set.xml
+++ b/dbms/tests/performance/string_set.xml
@ -0,0 +1,38 @@
+<test>
+    <type>loop</type>
+
+    <stop_conditions>
+        <any_of>
+            <iterations>10</iterations>
+        </any_of>
+    </stop_conditions>
+
+    <main_metric>
+        <rows_per_second />
+    </main_metric>
+
+    <preconditions>
+        <table_exists>default.hits_10m_single</table_exists>
+    </preconditions>
+
+    <create_query>CREATE TABLE hits_10m_words (word String, UserID UInt64) ENGINE Memory</create_query>
+    <create_query>CREATE TABLE strings (short String, long String) ENGINE Memory</create_query>
+
+    <fill_query>INSERT INTO hits_10m_words SELECT DISTINCT arrayJoin(splitByString(' ', SearchPhrase)) AS word, UserID FROM hits_10m_single WHERE length(word) > 0</fill_query>
+    <fill_query>INSERT INTO strings SELECT toString(rand()) a, a || a || a || a || a || a || a || a || a || a || a || a FROM numbers(1000000)</fill_query>
+
+    <settings>
+        <max_threads>1</max_threads>
+    </settings>
+
+    <query>SELECT 1 FROM hits_10m_words WHERE word IN (SELECT word FROM hits_10m_words) FORMAT Null</query>
+    <query>SELECT 1 FROM strings WHERE short IN (SELECT short FROM strings) FORMAT Null</query>
+    <query>SELECT 1 FROM strings WHERE long IN (SELECT long FROM strings) FORMAT Null</query>
+    <query>SELECT 1 FROM strings WHERE short IN (SELECT long FROM strings) FORMAT Null</query>
+    <query>SELECT 1 FROM strings WHERE long IN (SELECT short FROM strings) FORMAT Null</query>
+    <query>SELECT 1 FROM hits_10m_words WHERE word IN (SELECT short FROM strings) FORMAT Null</query>
+    <query>SELECT 1 FROM hits_10m_words WHERE word IN (SELECT long FROM strings) FORMAT Null</query>
+
+    <drop_query>DROP TABLE IF EXISTS hits_10m_words</drop_query>
+    <drop_query>DROP TABLE IF EXISTS strings</drop_query>
+</test>
--- a/dbms/tests/performance/vectorize_aggregation_combinators.xml
+++ b/dbms/tests/performance/vectorize_aggregation_combinators.xml
@ -0,0 +1,35 @@
+<test>
+
+    <type>loop</type>
+
+    <stop_conditions>
+        <all_of>
+            <total_time_ms>30000</total_time_ms>
+        </all_of>
+        <any_of>
+            <average_speed_not_changing_for_ms>6000</average_speed_not_changing_for_ms>
+            <total_time_ms>60000</total_time_ms>
+        </any_of>
+    </stop_conditions>
+
+    <main_metric>
+        <min_time/>
+    </main_metric>
+
+    <settings>
+        <max_threads>1</max_threads>
+    </settings>
+
+    <create_query>CREATE TABLE array_data(k UInt16, v Array(UInt64)) ENGINE Log</create_query>
+
+    <fill_query>INSERT INTO array_data SELECT number % 1024, arrayWithConstant(16, number) from numbers(10000000)</fill_query>
+
+    <query>SELECT countMerge(v) FROM (SELECT countState() v FROM numbers(1000000000)) FORMAT Null</query>
+    <query>SELECT countMerge(v) FROM (SELECT number % 1024 k, countState() v FROM numbers(1000000000) GROUP BY k) FORMAT Null</query>
+
+    <query>SELECT sumArray(v) FROM array_data FORMAT Null</query>
+    <query>SELECT k, sumArray(v) FROM array_data GROUP BY k FORMAT Null</query>
+    <query>SELECT arrayReduce('avg', v) FROM array_data FORMAT Null</query>
+
+    <drop_query>DROP TABLE IF EXISTS array_data</drop_query>
+</test>
--- a/dbms/tests/queries/0_stateless/00717_merge_and_distributed.reference
+++ b/dbms/tests/queries/0_stateless/00717_merge_and_distributed.reference
@ -54,3 +54,4 @@
 2018-08-01	-1
 2018-08-01	1
 2018-08-01	1
+2018-08-01	1
--- a/dbms/tests/queries/0_stateless/00717_merge_and_distributed.sql
+++ b/dbms/tests/queries/0_stateless/00717_merge_and_distributed.sql
@ -80,7 +80,7 @@ SELECT '--------------Implicit type conversion------------';
 SELECT * FROM merge(currentDatabase(), 'test_s64_distributed|test_u64_distributed') ORDER BY value;
 SELECT * FROM merge(currentDatabase(), 'test_s64_distributed|test_u64_distributed') WHERE date = '2018-08-01' ORDER BY value;
 SELECT * FROM merge(currentDatabase(), 'test_s64_distributed|test_u64_distributed') WHERE _table = 'test_u64_distributed' ORDER BY value;
-SELECT * FROM merge(currentDatabase(), 'test_s64_distributed|test_u64_distributed') WHERE value = 1; -- { serverError 171 }
+SELECT * FROM merge(currentDatabase(), 'test_s64_distributed|test_u64_distributed') WHERE value = 1;

 DROP TABLE IF EXISTS test_u64_local;
 DROP TABLE IF EXISTS test_s64_local;
--- a/dbms/tests/queries/0_stateless/00930_arrayIntersect.reference
+++ b/dbms/tests/queries/0_stateless/00930_arrayIntersect.reference
@ -46,3 +46,6 @@
 []
 []
 []
+-
+[]
+[]