Merge pull request #1 from yandex/master

merge with master
2024-11-19 14:11:58 +00:00 · 2019-04-17 23:57:01 +03:00 · 2019-04-17 23:57:01 +03:00 · 7a4dc5a212
commit 7a4dc5a212
parent f8a1eaba3a 345ae9aaa5
690 changed files with 18379 additions and 9648 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -1,3 +1,109 @@
 ## ClickHouse release 19.5.2.6, 2019-04-15
 ### New Features
 * [Hyperscan](https://github.com/intel/hyperscan) multiple regular expression matching was added (functions `multiMatchAny`, `multiMatchAnyIndex`, `multiFuzzyMatchAny`, `multiFuzzyMatchAnyIndex`). [#4780](https://github.com/yandex/ClickHouse/pull/4780), [#4841](https://github.com/yandex/ClickHouse/pull/4841) ([Danila Kutenin](https://github.com/danlark1))
 * `multiSearchFirstPosition` function was added. [#4780](https://github.com/yandex/ClickHouse/pull/4780) ([Danila Kutenin](https://github.com/danlark1))
 * Implement the predefined expression filter per row for tables. [#4792](https://github.com/yandex/ClickHouse/pull/4792) ([Ivan](https://github.com/abyss7))
 * A new type of data skipping indices based on bloom filters (can be used for `equal`, `in` and `like` functions). [#4499](https://github.com/yandex/ClickHouse/pull/4499) ([Nikita Vasilev](https://github.com/nikvas0))
 * Added `ASOF JOIN`, which allows running queries that join to the most recent known value. [#4774](https://github.com/yandex/ClickHouse/pull/4774) [#4867](https://github.com/yandex/ClickHouse/pull/4867) [#4863](https://github.com/yandex/ClickHouse/pull/4863) [#4875](https://github.com/yandex/ClickHouse/pull/4875) ([Martijn Bakker](https://github.com/Gladdy), [Artem Zuikov](https://github.com/4ertus2))
 * Rewrite multiple `COMMA JOIN` to `CROSS JOIN`. Then rewrite them to `INNER JOIN` if possible. [#4661](https://github.com/yandex/ClickHouse/pull/4661) ([Artem Zuikov](https://github.com/4ertus2))
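The "most recent value known" semantics of the `ASOF JOIN` entry above can be pictured as a sorted lookup. The following is only a sketch, not ClickHouse code: the `Row` alias and `asofLookup` are hypothetical names, and it assumes that a right-hand row with a timestamp equal to the probe counts as a match, which the changelog does not state.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical right-hand table: (timestamp, value) rows sorted by timestamp.
using Row = std::pair<std::int64_t, double>;

// Return the value of the latest row whose timestamp is <= t, if there is one.
// Assumption: equality counts as a match.
std::optional<double> asofLookup(const std::vector<Row>& sorted_rows, std::int64_t t) {
    // Find the first row whose timestamp is strictly greater than t.
    auto it = std::upper_bound(
        sorted_rows.begin(), sorted_rows.end(), t,
        [](std::int64_t key, const Row& row) { return key < row.first; });
    if (it == sorted_rows.begin()) {
        return std::nullopt; // every row is newer than t
    }
    // The row just before it is the most recent one not newer than t.
    return std::prev(it)->second;
}
```

With rows at timestamps 10, 20 and 30, a probe at 25 would pick the row at 20, and a probe at 5 would find no match.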
 ### Improvement
 * `topK` and `topKWeighted` now support custom `loadFactor` (fixes issue [#4252](https://github.com/yandex/ClickHouse/issues/4252)). [#4634](https://github.com/yandex/ClickHouse/pull/4634) ([Kirill Danshin](https://github.com/kirillDanshin))
 * Allow using `parallel_replicas_count > 1` even for tables without sampling (the setting is simply ignored for them). In previous versions this led to an exception. [#4637](https://github.com/yandex/ClickHouse/pull/4637) ([Alexey Elymanov](https://github.com/digitalist))
 * Support for `CREATE OR REPLACE VIEW`. Allow to create a view or set a new definition in a single statement. [#4654](https://github.com/yandex/ClickHouse/pull/4654) ([Boris Granveaud](https://github.com/bgranvea))
 * `Buffer` table engine now supports `PREWHERE`. [#4671](https://github.com/yandex/ClickHouse/pull/4671) ([Yangkuan Liu](https://github.com/LiuYangkuan))
 * Add the ability to start a replicated table without metadata in ZooKeeper in `readonly` mode. [#4691](https://github.com/yandex/ClickHouse/pull/4691) ([alesapin](https://github.com/alesapin))
 * Fixed flicker of progress bar in clickhouse-client. The issue was most noticeable when using `FORMAT Null` with streaming queries. [#4811](https://github.com/yandex/ClickHouse/pull/4811) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Allow to disable functions with `hyperscan` library on per user basis to limit potentially excessive and uncontrolled resource usage. [#4816](https://github.com/yandex/ClickHouse/pull/4816) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Add version number logging in all errors. [#4824](https://github.com/yandex/ClickHouse/pull/4824) ([proller](https://github.com/proller))
 * Added restriction to the `multiMatch` functions which requires string size to fit into `unsigned int`. Also added the number of arguments limit to the `multiSearch` functions. [#4834](https://github.com/yandex/ClickHouse/pull/4834) ([Danila Kutenin](https://github.com/danlark1))
 * Improved usage of scratch space and error handling in Hyperscan. [#4866](https://github.com/yandex/ClickHouse/pull/4866) ([Danila Kutenin](https://github.com/danlark1))
 * Fill `system.graphite_retentions` from a table config of `*GraphiteMergeTree` engine tables. [#4584](https://github.com/yandex/ClickHouse/pull/4584) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
 * Rename `trigramDistance` function to `ngramDistance` and add more functions with `CaseInsensitive` and `UTF`. [#4602](https://github.com/yandex/ClickHouse/pull/4602) ([Danila Kutenin](https://github.com/danlark1))
 * Improved data skipping indices calculation. [#4640](https://github.com/yandex/ClickHouse/pull/4640) ([Nikita Vasilev](https://github.com/nikvas0))
 ### Bug Fix
 * Avoid `std::terminate` in case of memory allocation failure. Now `std::bad_alloc` exception is thrown as expected. [#4665](https://github.com/yandex/ClickHouse/pull/4665) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fix capnproto reading from buffer. Sometimes files weren't loaded successfully over HTTP. [#4674](https://github.com/yandex/ClickHouse/pull/4674) ([Vladislav](https://github.com/smirnov-vs))
 * Fix error `Unknown log entry type: 0` after `OPTIMIZE TABLE FINAL` query. [#4683](https://github.com/yandex/ClickHouse/pull/4683) ([Amos Bird](https://github.com/amosbird))
 * Wrong arguments to `hasAny` or `hasAll` functions may lead to segfault. [#4698](https://github.com/yandex/ClickHouse/pull/4698) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Deadlock may happen while executing `DROP DATABASE dictionary` query. [#4701](https://github.com/yandex/ClickHouse/pull/4701) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fix undefined behavior in `median` and `quantile` functions. [#4702](https://github.com/yandex/ClickHouse/pull/4702) ([hcz](https://github.com/hczhcz))
 * Fix compression level detection when `network_compression_method` is in lowercase. Broken in v19.1. [#4706](https://github.com/yandex/ClickHouse/pull/4706) ([proller](https://github.com/proller))
 * Keep ordinary, `DEFAULT`, `MATERIALIZED` and `ALIAS` columns in a single list (fixes issue [#2867](https://github.com/yandex/ClickHouse/issues/2867)). [#4707](https://github.com/yandex/ClickHouse/pull/4707) ([Alex Zatelepin](https://github.com/ztlpn))
 * Fixed the `<timezone>UTC</timezone>` setting being ignored (fixes issue [#4658](https://github.com/yandex/ClickHouse/issues/4658)). [#4718](https://github.com/yandex/ClickHouse/pull/4718) ([proller](https://github.com/proller))
 * Fix `histogram` function behaviour with `Distributed` tables. [#4741](https://github.com/yandex/ClickHouse/pull/4741) ([olegkv](https://github.com/olegkv))
 * Fixed tsan report `destroy of a locked mutex`. [#4742](https://github.com/yandex/ClickHouse/pull/4742) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed TSan report on shutdown due to race condition in system logs usage. Fixed potential use-after-free on shutdown when part_log is enabled. [#4758](https://github.com/yandex/ClickHouse/pull/4758) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fix recheck parts in `ReplicatedMergeTreeAlterThread` in case of error. [#4772](https://github.com/yandex/ClickHouse/pull/4772) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 * Arithmetic operations on intermediate aggregate function states were not working for constant arguments (such as subquery results). [#4776](https://github.com/yandex/ClickHouse/pull/4776) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Always backquote column names in metadata. Otherwise it's impossible to create a table with column named `index` (server won't restart due to malformed `ATTACH` query in metadata). [#4782](https://github.com/yandex/ClickHouse/pull/4782) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fix crash in `ALTER ... MODIFY ORDER BY` on `Distributed` table. [#4790](https://github.com/yandex/ClickHouse/pull/4790) ([TCeason](https://github.com/TCeason))
 * Fix segfault in `JOIN ON` with enabled `enable_optimize_predicate_expression`. [#4794](https://github.com/yandex/ClickHouse/pull/4794) ([Winter Zhang](https://github.com/zhang2014))
 * Fix bug with adding an extraneous row after consuming a protobuf message from Kafka. [#4808](https://github.com/yandex/ClickHouse/pull/4808) ([Vitaly Baranov](https://github.com/vitlibar))
 * Fix crash of `JOIN` on not-nullable vs nullable column. Fix `NULLs` in right keys in `ANY JOIN` + `join_use_nulls`. [#4815](https://github.com/yandex/ClickHouse/pull/4815) ([Artem Zuikov](https://github.com/4ertus2))
 * Fix segmentation fault in `clickhouse-copier`. [#4835](https://github.com/yandex/ClickHouse/pull/4835) ([proller](https://github.com/proller))
 * Fixed race condition in `SELECT` from `system.tables` if the table is renamed or altered concurrently. [#4836](https://github.com/yandex/ClickHouse/pull/4836) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed data race when fetching data part that is already obsolete. [#4839](https://github.com/yandex/ClickHouse/pull/4839) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed rare data race that can happen during `RENAME` table of MergeTree family. [#4844](https://github.com/yandex/ClickHouse/pull/4844) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed segmentation fault in function `arrayIntersect`. Segmentation fault could happen if function was called with mixed constant and ordinary arguments. [#4847](https://github.com/yandex/ClickHouse/pull/4847) ([Lixiang Qian](https://github.com/fancyqlx))
 * Fixed reading from `Array(LowCardinality)` column in rare case when column contained a long sequence of empty arrays. [#4850](https://github.com/yandex/ClickHouse/pull/4850) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 * Fix crash in `FULL/RIGHT JOIN` when joining on nullable vs not nullable. [#4855](https://github.com/yandex/ClickHouse/pull/4855) ([Artem Zuikov](https://github.com/4ertus2))
 * Fix `No message received` exception while fetching parts between replicas. [#4856](https://github.com/yandex/ClickHouse/pull/4856) ([alesapin](https://github.com/alesapin))
 * Fixed `arrayIntersect` function wrong result in case of several repeated values in single array. [#4871](https://github.com/yandex/ClickHouse/pull/4871) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 * Fix a race condition during concurrent `ALTER COLUMN` queries that could lead to a server crash (fixes issue [#3421](https://github.com/yandex/ClickHouse/issues/3421)). [#4592](https://github.com/yandex/ClickHouse/pull/4592) ([Alex Zatelepin](https://github.com/ztlpn))
 * Fix incorrect result in `FULL/RIGHT JOIN` with const column. [#4723](https://github.com/yandex/ClickHouse/pull/4723) ([Artem Zuikov](https://github.com/4ertus2))
 * Fix duplicates in `GLOBAL JOIN` with asterisk. [#4705](https://github.com/yandex/ClickHouse/pull/4705) ([Artem Zuikov](https://github.com/4ertus2))
 * Fix parameter deduction in `ALTER MODIFY` of column `CODEC` when column type is not specified. [#4883](https://github.com/yandex/ClickHouse/pull/4883) ([alesapin](https://github.com/alesapin))
 * Functions `cutQueryStringAndFragment()` and `queryStringAndFragment()` now work correctly when `URL` contains a fragment and no query. [#4894](https://github.com/yandex/ClickHouse/pull/4894) ([Vitaly Baranov](https://github.com/vitlibar))
 * Fix rare bug when setting `min_bytes_to_use_direct_io` is greater than zero, which occurs when a thread has to seek backward in a column file. [#4897](https://github.com/yandex/ClickHouse/pull/4897) ([alesapin](https://github.com/alesapin))
 * Fix wrong argument types for aggregate functions with `LowCardinality` arguments (fixes issue [#4919](https://github.com/yandex/ClickHouse/issues/4919)). [#4922](https://github.com/yandex/ClickHouse/pull/4922) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 * Fix wrong name qualification in `GLOBAL JOIN`. [#4969](https://github.com/yandex/ClickHouse/pull/4969) ([Artem Zuikov](https://github.com/4ertus2))
 * Fix function `toISOWeek` result for year 1970. [#4988](https://github.com/yandex/ClickHouse/pull/4988) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fix duplication of `DROP`, `TRUNCATE` and `OPTIMIZE` queries when executed with `ON CLUSTER` for the `ReplicatedMergeTree*` family of tables. [#4991](https://github.com/yandex/ClickHouse/pull/4991) ([alesapin](https://github.com/alesapin))
 ### Backward Incompatible Change
 * Rename setting `insert_sample_with_metadata` to setting `input_format_defaults_for_omitted_fields`. [#4771](https://github.com/yandex/ClickHouse/pull/4771) ([Artem Zuikov](https://github.com/4ertus2))
 * Added setting `max_partitions_per_insert_block` (with value 100 by default). If inserted block contains larger number of partitions, an exception is thrown. Set it to 0 if you want to remove the limit (not recommended). [#4845](https://github.com/yandex/ClickHouse/pull/4845) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Multi-search functions were renamed (`multiPosition` to `multiSearchAllPositions`, `multiSearch` to `multiSearchAny`, `firstMatch` to `multiSearchFirstIndex`). [#4780](https://github.com/yandex/ClickHouse/pull/4780) ([Danila Kutenin](https://github.com/danlark1))
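The `max_partitions_per_insert_block` rule described above (throw when a block spans more partitions than the limit, with 0 meaning no limit) can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: `exceedsPartitionLimit`, `checkPartitionLimit`, the string partition ids and the error message are all hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical input: the partition id of every row in one inserted block.
// Returns true if the block touches more distinct partitions than allowed.
bool exceedsPartitionLimit(const std::vector<std::string>& partition_ids,
                           std::size_t max_partitions) {
    if (max_partitions == 0) {
        return false; // 0 removes the limit, as the changelog entry describes
    }
    std::set<std::string> distinct(partition_ids.begin(), partition_ids.end());
    return distinct.size() > max_partitions;
}

// Throwing variant, mirroring "an exception is thrown" (message is made up).
void checkPartitionLimit(const std::vector<std::string>& partition_ids,
                         std::size_t max_partitions) {
    if (exceedsPartitionLimit(partition_ids, max_partitions)) {
        throw std::runtime_error("too many partitions in a single inserted block");
    }
}
```

Counting distinct ids rather than rows matters here: a block with many rows that all fall into one partition stays within any non-zero limit.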
 ### Performance Improvement
 * Optimize Volnitsky searcher by inlining, giving about 5-10% search improvement for queries with many needles or many similar bigrams. [#4862](https://github.com/yandex/ClickHouse/pull/4862) ([Danila Kutenin](https://github.com/danlark1))
 * Fix performance issue when setting `use_uncompressed_cache` is greater than zero, which appeared when all read data was contained in the cache. [#4913](https://github.com/yandex/ClickHouse/pull/4913) ([alesapin](https://github.com/alesapin))
 ### Build/Testing/Packaging Improvement
 * Hardening debug build: more granular memory mappings and ASLR; add memory protection for mark cache and index. This allows finding more memory stomping bugs in cases where ASan and MSan cannot do it. [#4632](https://github.com/yandex/ClickHouse/pull/4632) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Add support for cmake variables `ENABLE_PROTOBUF`, `ENABLE_PARQUET` and `ENABLE_BROTLI`, which allow enabling/disabling the above features (same as we can do for librdkafka, mysql, etc). [#4669](https://github.com/yandex/ClickHouse/pull/4669) ([Silviu Caragea](https://github.com/silviucpp))
 * Add ability to print process list and stacktraces of all threads if some queries are hung after test run. [#4675](https://github.com/yandex/ClickHouse/pull/4675) ([alesapin](https://github.com/alesapin))
 * Add retries on `Connection loss` error in `clickhouse-test`. [#4682](https://github.com/yandex/ClickHouse/pull/4682) ([alesapin](https://github.com/alesapin))
 * Add FreeBSD build with vagrant and build with thread sanitizer to packager script. [#4712](https://github.com/yandex/ClickHouse/pull/4712) [#4748](https://github.com/yandex/ClickHouse/pull/4748) ([alesapin](https://github.com/alesapin))
 * Now the user is asked for a password for user `'default'` during installation. [#4725](https://github.com/yandex/ClickHouse/pull/4725) ([proller](https://github.com/proller))
 * Suppress warning in `rdkafka` library. [#4740](https://github.com/yandex/ClickHouse/pull/4740) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Allow building without SSL. [#4750](https://github.com/yandex/ClickHouse/pull/4750) ([proller](https://github.com/proller))
 * Add a way to launch clickhouse-server image from a custom user. [#4753](https://github.com/yandex/ClickHouse/pull/4753) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
 * Upgrade contrib boost to 1.69. [#4793](https://github.com/yandex/ClickHouse/pull/4793) ([proller](https://github.com/proller))
 * Disable usage of `mremap` when compiled with Thread Sanitizer. Surprisingly enough, TSan does not intercept `mremap` (though it does intercept `mmap`, `munmap`), which leads to false positives. Fixed TSan report in stateful tests. [#4859](https://github.com/yandex/ClickHouse/pull/4859) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Add test checking using format schema via HTTP interface. [#4864](https://github.com/yandex/ClickHouse/pull/4864) ([Vitaly Baranov](https://github.com/vitlibar))
 ## ClickHouse release 19.4.3.11, 2019-04-02
 ### Bug Fixes
 * Fix crash in `FULL/RIGHT JOIN` when joining on nullable vs not nullable. [#4855](https://github.com/yandex/ClickHouse/pull/4855) ([Artem Zuikov](https://github.com/4ertus2))
 * Fix segmentation fault in `clickhouse-copier`. [#4835](https://github.com/yandex/ClickHouse/pull/4835) ([proller](https://github.com/proller))
 ### Build/Testing/Packaging Improvement
 * Add a way to launch clickhouse-server image from a custom user. [#4753](https://github.com/yandex/ClickHouse/pull/4753) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
 ## ClickHouse release 19.4.2.7, 2019-03-30
 ### Bug Fixes
@ -13,11 +119,11 @@
 ### New Features
 * Added full support for `Protobuf` format (input and output, nested data structures). [#4174](https://github.com/yandex/ClickHouse/pull/4174) [#4493](https://github.com/yandex/ClickHouse/pull/4493) ([Vitaly Baranov](https://github.com/vitlibar))
 * Added bitmap functions with Roaring Bitmaps. [#4207](https://github.com/yandex/ClickHouse/pull/4207) ([Andy Yang](https://github.com/andyyzh)) [#4568](https://github.com/yandex/ClickHouse/pull/4568) ([Vitaly Baranov](https://github.com/vitlibar))
-* Parquet format support [#4448](https://github.com/yandex/ClickHouse/pull/4448) ([proller](https://github.com/proller))
+* Parquet format support. [#4448](https://github.com/yandex/ClickHouse/pull/4448) ([proller](https://github.com/proller))
 * N-gram distance was added for fuzzy string comparison. It is similar to q-gram metrics in R language. [#4466](https://github.com/yandex/ClickHouse/pull/4466) ([Danila Kutenin](https://github.com/danlark1))
 * Combine rules for graphite rollup from dedicated aggregation and retention patterns. [#4426](https://github.com/yandex/ClickHouse/pull/4426) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
 * Added `max_execution_speed` and `max_execution_speed_bytes` to limit resource usage. Added `min_execution_speed_bytes` setting to complement the `min_execution_speed`. [#4430](https://github.com/yandex/ClickHouse/pull/4430) ([Winter Zhang](https://github.com/zhang2014))
-* Implemented function `flatten` [#4555](https://github.com/yandex/ClickHouse/pull/4555) [#4409](https://github.com/yandex/ClickHouse/pull/4409) ([alexey-milovidov](https://github.com/alexey-milovidov), [kzon](https://github.com/kzon))
+* Implemented function `flatten`. [#4555](https://github.com/yandex/ClickHouse/pull/4555) [#4409](https://github.com/yandex/ClickHouse/pull/4409) ([alexey-milovidov](https://github.com/alexey-milovidov), [kzon](https://github.com/kzon))
 * Added functions `arrayEnumerateDenseRanked` and `arrayEnumerateUniqRanked` (it's like `arrayEnumerateUniq` but allows to fine tune array depth to look inside multidimensional arrays). [#4475](https://github.com/yandex/ClickHouse/pull/4475) ([proller](https://github.com/proller)) [#4601](https://github.com/yandex/ClickHouse/pull/4601) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Multiple JOINS with some restrictions: no asterisks, no complex aliases in ON/WHERE/GROUP BY/... [#4462](https://github.com/yandex/ClickHouse/pull/4462) ([Artem Zuikov](https://github.com/4ertus2))
@ -26,25 +132,25 @@
 * Fixed bug in data skipping indices: order of granules after INSERT was incorrect. [#4407](https://github.com/yandex/ClickHouse/pull/4407) ([Nikita Vasilev](https://github.com/nikvas0))
 * Fixed `set` index for `Nullable` and `LowCardinality` columns. Before it, `set` index with `Nullable` or `LowCardinality` column led to error `Data type must be deserialized with multiple streams` while selecting. [#4594](https://github.com/yandex/ClickHouse/pull/4594) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 * Correctly set update_time on full `executable` dictionary update. [#4551](https://github.com/yandex/ClickHouse/pull/4551) ([Tema Novikov](https://github.com/temoon))
-* Fix broken progress bar in 19.3 [#4627](https://github.com/yandex/ClickHouse/pull/4627) ([filimonov](https://github.com/filimonov))
+* Fix broken progress bar in 19.3. [#4627](https://github.com/yandex/ClickHouse/pull/4627) ([filimonov](https://github.com/filimonov))
 * Fixed inconsistent values of MemoryTracker when memory region was shrunk, in certain cases. [#4619](https://github.com/yandex/ClickHouse/pull/4619) ([alexey-milovidov](https://github.com/alexey-milovidov))
-* Fixed undefined behaviour in ThreadPool [#4612](https://github.com/yandex/ClickHouse/pull/4612) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed undefined behaviour in ThreadPool. [#4612](https://github.com/yandex/ClickHouse/pull/4612) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed a very rare crash with the message `mutex lock failed: Invalid argument` that could happen when a MergeTree table was dropped concurrently with a SELECT. [#4608](https://github.com/yandex/ClickHouse/pull/4608) ([Alex Zatelepin](https://github.com/ztlpn))
-* ODBC driver compatibility with `LowCardinality` data type [#4381](https://github.com/yandex/ClickHouse/pull/4381) ([proller](https://github.com/proller))
+* ODBC driver compatibility with `LowCardinality` data type. [#4381](https://github.com/yandex/ClickHouse/pull/4381) ([proller](https://github.com/proller))
-* FreeBSD: Fixup for `AIOcontextPool: Found io_event with unknown id 0` error [#4438](https://github.com/yandex/ClickHouse/pull/4438) ([urgordeadbeef](https://github.com/urgordeadbeef))
+* FreeBSD: Fixup for `AIOcontextPool: Found io_event with unknown id 0` error. [#4438](https://github.com/yandex/ClickHouse/pull/4438) ([urgordeadbeef](https://github.com/urgordeadbeef))
 * `system.part_log` table was created regardless of configuration. [#4483](https://github.com/yandex/ClickHouse/pull/4483) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fix undefined behaviour in `dictIsIn` function for cache dictionaries. [#4515](https://github.com/yandex/ClickHouse/pull/4515) ([alesapin](https://github.com/alesapin))
 * Fixed a deadlock when a SELECT query locks the same table multiple times (e.g. from different threads or when executing multiple subqueries) and there is a concurrent DDL query. [#4535](https://github.com/yandex/ClickHouse/pull/4535) ([Alex Zatelepin](https://github.com/ztlpn))
 * Disable compile_expressions by default until we get own `llvm` contrib and can test it with `clang` and `asan`. [#4579](https://github.com/yandex/ClickHouse/pull/4579) ([alesapin](https://github.com/alesapin))
 * Prevent `std::terminate` when `invalidate_query` for `clickhouse` external dictionary source has returned wrong resultset (empty or more than one row or more than one column). Fixed issue when the `invalidate_query` was performed every five seconds regardless of the `lifetime`. [#4583](https://github.com/yandex/ClickHouse/pull/4583) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Avoid deadlock when the `invalidate_query` for a dictionary with `clickhouse` source was involving `system.dictionaries` table or `Dictionaries` database (rare case). [#4599](https://github.com/yandex/ClickHouse/pull/4599) ([alexey-milovidov](https://github.com/alexey-milovidov))
-* Fixes for CROSS JOIN with empty WHERE [#4598](https://github.com/yandex/ClickHouse/pull/4598) ([Artem Zuikov](https://github.com/4ertus2))
+* Fixes for CROSS JOIN with empty WHERE. [#4598](https://github.com/yandex/ClickHouse/pull/4598) ([Artem Zuikov](https://github.com/4ertus2))
 * Fixed segfault in function "replicate" when constant argument is passed. [#4603](https://github.com/yandex/ClickHouse/pull/4603) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fix lambda function with predicate optimizer. [#4408](https://github.com/yandex/ClickHouse/pull/4408) ([Winter Zhang](https://github.com/zhang2014))
 * Multiple fixes for multiple JOINs. [#4595](https://github.com/yandex/ClickHouse/pull/4595) ([Artem Zuikov](https://github.com/4ertus2))
 ### Improvements
-* Support aliases in JOIN ON section for right table columns [#4412](https://github.com/yandex/ClickHouse/pull/4412) ([Artem Zuikov](https://github.com/4ertus2))
+* Support aliases in JOIN ON section for right table columns. [#4412](https://github.com/yandex/ClickHouse/pull/4412) ([Artem Zuikov](https://github.com/4ertus2))
 * Result of multiple JOINs needs correct result names to be used in subselects. Replace flat aliases with source names in result. [#4474](https://github.com/yandex/ClickHouse/pull/4474) ([Artem Zuikov](https://github.com/4ertus2))
 * Improve push-down logic for joined statements. [#4387](https://github.com/yandex/ClickHouse/pull/4387) ([Ivan](https://github.com/abyss7))
@ -67,6 +173,18 @@
 * Fix compilation on Mac. [#4371](https://github.com/yandex/ClickHouse/pull/4371) ([Vitaly Baranov](https://github.com/vitlibar))
 * Build fixes for FreeBSD and various unusual build configurations. [#4444](https://github.com/yandex/ClickHouse/pull/4444) ([proller](https://github.com/proller))
 ## ClickHouse release 19.3.9.1, 2019-04-02
 ### Bug Fixes
 * Fix crash in `FULL/RIGHT JOIN` when joining on nullable vs not nullable. [#4855](https://github.com/yandex/ClickHouse/pull/4855) ([Artem Zuikov](https://github.com/4ertus2))
 * Fix segmentation fault in `clickhouse-copier`. [#4835](https://github.com/yandex/ClickHouse/pull/4835) ([proller](https://github.com/proller))
 * Fixed reading from `Array(LowCardinality)` column in rare case when column contained a long sequence of empty arrays. [#4850](https://github.com/yandex/ClickHouse/pull/4850) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 ### Build/Testing/Packaging Improvement
 * Add a way to launch clickhouse-server image from a custom user. [#4753](https://github.com/yandex/ClickHouse/pull/4753) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
 ## ClickHouse release 19.3.7, 2019-03-12
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -317,6 +317,7 @@ include (cmake/find_hdfs3.cmake) # uses protobuf
 include (cmake/find_consistent-hashing.cmake)
 include (cmake/find_base64.cmake)
 include (cmake/find_hyperscan.cmake)
 include (cmake/find_lfalloc.cmake)
 find_contrib_lib(cityhash)
 find_contrib_lib(farmhash)
 find_contrib_lib(metrohash)
--- a/README.md
+++ b/README.md
@ -10,3 +10,10 @@ ClickHouse is an open-source column-oriented database management system that all
 * [Blog](https://clickhouse.yandex/blog/en/) contains various ClickHouse-related articles, as well as announcements and reports about events.
 * [Contacts](https://clickhouse.yandex/#contacts) can help to get your questions answered if there are any.
 * You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.
 ## Upcoming Events
 * [ClickHouse Community Meetup in Limassol](https://www.facebook.com/events/386638262181785/) on May 7.
 * ClickHouse at [Percona Live 2019](https://www.percona.com/live/19/other-open-source-databases-track) in Austin on May 28-30.
 * [ClickHouse Community Meetup in Beijing](https://www.huodongxing.com/event/2483759276200) on June 8.
 * [ClickHouse Community Meetup in Shenzhen](https://www.huodongxing.com/event/3483759917300) on October 20.
 * [ClickHouse Community Meetup in Shanghai](https://www.huodongxing.com/event/4483760336000) on October 27.
--- a/cmake/find_lfalloc.cmake
+++ b/cmake/find_lfalloc.cmake
@ -0,0 +1,10 @@
 if (NOT SANITIZE AND NOT ARCH_ARM AND NOT ARCH_32 AND NOT ARCH_PPC64LE AND NOT OS_FREEBSD)
    option (ENABLE_LFALLOC "Set to FALSE to disable the bundled lfalloc allocator" ${NOT_UNBUNDLED})
 endif ()
 if (ENABLE_LFALLOC)
    set (USE_LFALLOC 1)
    set (USE_LFALLOC_RANDOM_HINT 1)
    set (LFALLOC_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/lfalloc/src)
    message (STATUS "Using lfalloc=${USE_LFALLOC}: ${LFALLOC_INCLUDE_DIR}")
 endif ()
--- a/cmake/find_poco.cmake
+++ b/cmake/find_poco.cmake
@ -36,6 +36,8 @@ elseif (NOT MISSING_INTERNAL_POCO_LIBRARY)
    set (ENABLE_DATA_SQLITE 0 CACHE BOOL "")
    set (ENABLE_DATA_MYSQL 0 CACHE BOOL "")
    set (ENABLE_DATA_POSTGRESQL 0 CACHE BOOL "")
    set (ENABLE_ENCODINGS 0 CACHE BOOL "")
    # new after 2.0.0:
    set (POCO_ENABLE_ZIP 0 CACHE BOOL "")
    set (POCO_ENABLE_PAGECOMPILER 0 CACHE BOOL "")
--- a/contrib/lfalloc/src/lf_allocX64.h
+++ b/contrib/lfalloc/src/lf_allocX64.h
--- a/contrib/lfalloc/src/lfmalloc.h
+++ b/contrib/lfalloc/src/lfmalloc.h
@ -0,0 +1,23 @@
 #pragma once
 #include <string.h>
 #include <stdlib.h>
 #include "util/system/compiler.h"
 namespace NMalloc {
    volatile inline bool IsAllocatorCorrupted = false;
    static inline void AbortFromCorruptedAllocator() {
        IsAllocatorCorrupted = true;
        abort();
    }
    struct TAllocHeader {
        void* Block;
        size_t AllocSize;
        void Y_FORCE_INLINE Encode(void* block, size_t size, size_t signature) {
            Block = block;
            AllocSize = size | signature;
        }
    };
 }
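`TAllocHeader::Encode` stores `size | signature` in a single word, which only round-trips if the signature bits never overlap the size bits. The sketch below shows one way such packing can be decoded. The mask value, the `AllocHeaderSketch` type and its accessors are assumptions made for illustration; the actual layout lives in `lf_allocX64.h`, which is not shown in this diff.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for illustration: the top 4 bits of a 64-bit word hold the
// signature and the remaining 60 bits hold the allocation size.
constexpr std::uint64_t kSignatureMask = 0xFull << 60;

struct AllocHeaderSketch {
    void* block = nullptr;
    std::uint64_t alloc_size = 0;

    // Same OR-packing as TAllocHeader::Encode in the hunk above.
    void encode(void* b, std::uint64_t size, std::uint64_t signature) {
        block = b;
        alloc_size = size | signature;
    }

    // Hypothetical accessors: split the packed word back into its two parts.
    std::uint64_t size() const { return alloc_size & ~kSignatureMask; }
    std::uint64_t signature() const { return alloc_size & kSignatureMask; }
};
```

Encoding a size of 4096 with signature `0x5ull << 60` and reading both back returns the original values, as long as the size stays below `1ull << 60`.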
--- a/contrib/lfalloc/src/util/README.md
+++ b/contrib/lfalloc/src/util/README.md
@ -0,0 +1,33 @@
 Style guide for the util folder is a stricter version of general style guide (mostly in terms of ambiguity resolution).
 * all {} must be in K&R style
 * &, * tied closer to a type, not to variable
 * always use `using` not `typedef`
 * even a single line block must be in braces {}:
   ```
   if (A) {
       B();
   }
   ```
 * _ at the end of private data member of a class - `First_`, `Second_`
 * every .h file must be accompanied with corresponding .cpp to avoid a leakage and check that it is self contained
 * prohibited to use `printf`-like functions
 Things declared in the general style guide, which sometimes are missed:
 * `template <`, not `template<`
 * `noexcept`, not `throw ()` nor `throw()`, not required for destructors
 * indents inside `namespace` same as inside `class`
 Requirements for new code (and for corrections to old code that involve a change of behaviour) in util:
 * presence of UNIT-tests
 * presence of comments in Doxygen style
 * accessors without Get prefix (`Length()`, but not `GetLength()`)
 This guide is not mandatory, since the general style guide already applies.
 Nevertheless if it is not followed, then a next `ya style .` run in the util folder will undeservedly update authors of some lines of code.
 Thus before a commit it is recommended to run `ya style .` in the util folder.
--- a/contrib/lfalloc/src/util/system/atomic.h
+++ b/contrib/lfalloc/src/util/system/atomic.h
@ -0,0 +1,51 @@
 #pragma once
 #include "defaults.h"
 using TAtomicBase = intptr_t;
 using TAtomic = volatile TAtomicBase;
 #if defined(__GNUC__)
 #include "atomic_gcc.h"
 #elif defined(_MSC_VER)
 #include "atomic_win.h"
 #else
 #error unsupported platform
 #endif
 #if !defined(ATOMIC_COMPILER_BARRIER)
 #define ATOMIC_COMPILER_BARRIER()
 #endif
 static inline TAtomicBase AtomicSub(TAtomic& a, TAtomicBase v) {
    return AtomicAdd(a, -v);
 }
 static inline TAtomicBase AtomicGetAndSub(TAtomic& a, TAtomicBase v) {
    return AtomicGetAndAdd(a, -v);
 }
 #if defined(USE_GENERIC_SETGET)
 static inline TAtomicBase AtomicGet(const TAtomic& a) {
    return a;
 }
 static inline void AtomicSet(TAtomic& a, TAtomicBase v) {
    a = v;
 }
 #endif
 static inline bool AtomicTryLock(TAtomic* a) {
    return AtomicCas(a, 1, 0);
 }
 static inline bool AtomicTryAndTryLock(TAtomic* a) {
    return (AtomicGet(*a) == 0) && AtomicTryLock(a);
 }
 static inline void AtomicUnlock(TAtomic* a) {
    ATOMIC_COMPILER_BARRIER();
    AtomicSet(*a, 0);
 }
 #include "atomic_ops.h"
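`AtomicTryLock`, `AtomicTryAndTryLock` and `AtomicUnlock` above together form a test-and-test-and-set spinlock: read the word cheaply first, and attempt the 0 to 1 compare-and-swap only when it looks free. The sketch below restates that pattern with `std::atomic` so it compiles on its own. It is not part of lfalloc; `LockWord`, `tryLock`, `tryAndTryLock`, `unlock` and `SpinGuard` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>

using LockWord = std::atomic<std::intptr_t>;

// Mirrors AtomicTryLock: succeeds only if the word changes from 0 to 1.
inline bool tryLock(LockWord& a) {
    std::intptr_t expected = 0;
    return a.compare_exchange_strong(expected, 1);
}

// Mirrors AtomicTryAndTryLock: plain load first, CAS only if the lock looks free.
inline bool tryAndTryLock(LockWord& a) {
    return a.load(std::memory_order_acquire) == 0 && tryLock(a);
}

// Mirrors AtomicUnlock: a release store of 0.
inline void unlock(LockWord& a) {
    a.store(0, std::memory_order_release);
}

// Hypothetical RAII guard that spins until the lock is acquired.
class SpinGuard {
public:
    explicit SpinGuard(LockWord& a)
        : Word_(a)
    {
        while (!tryAndTryLock(Word_)) {
            // busy-wait; a real implementation would likely pause or yield here
        }
    }
    ~SpinGuard() {
        unlock(Word_);
    }
    SpinGuard(const SpinGuard&) = delete;
    SpinGuard& operator=(const SpinGuard&) = delete;

private:
    LockWord& Word_;
};
```

The read before the compare-and-swap keeps waiting threads from issuing write attempts while the lock is held, which is the usual motivation for the "try and try" variant.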
--- a/contrib/lfalloc/src/util/system/atomic_gcc.h
+++ b/contrib/lfalloc/src/util/system/atomic_gcc.h
@ -0,0 +1,90 @@
 #pragma once
 #define ATOMIC_COMPILER_BARRIER() __asm__ __volatile__("" \
                                                       :  \
                                                       :  \
                                                       : "memory")
 static inline TAtomicBase AtomicGet(const TAtomic& a) {
    TAtomicBase tmp;
 #if defined(_arm64_)
    __asm__ __volatile__(
        "ldar %x[value], %[ptr]  \n\t"
        : [value] "=r"(tmp)
        : [ptr] "Q"(a)
        : "memory");
 #else
    __atomic_load(&a, &tmp, __ATOMIC_ACQUIRE);
 #endif
    return tmp;
 }
 static inline void AtomicSet(TAtomic& a, TAtomicBase v) {
 #if defined(_arm64_)
    __asm__ __volatile__(
        "stlr %x[value], %[ptr]  \n\t"
        : [ptr] "=Q"(a)
        : [value] "r"(v)
        : "memory");
 #else
    __atomic_store(&a, &v, __ATOMIC_RELEASE);
 #endif
 }
 static inline intptr_t AtomicIncrement(TAtomic& p) {
    return __atomic_add_fetch(&p, 1, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicGetAndIncrement(TAtomic& p) {
    return __atomic_fetch_add(&p, 1, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicDecrement(TAtomic& p) {
    return __atomic_sub_fetch(&p, 1, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicGetAndDecrement(TAtomic& p) {
    return __atomic_fetch_sub(&p, 1, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicAdd(TAtomic& p, intptr_t v) {
    return __atomic_add_fetch(&p, v, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicGetAndAdd(TAtomic& p, intptr_t v) {
    return __atomic_fetch_add(&p, v, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicSwap(TAtomic* p, intptr_t v) {
    (void)p; // disable strange 'parameter set but not used' warning on gcc
    intptr_t ret;
    __atomic_exchange(p, &v, &ret, __ATOMIC_SEQ_CST);
    return ret;
 }
 static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
    (void)a; // disable strange 'parameter set but not used' warning on gcc
    return __atomic_compare_exchange(a, &compare, &exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
    (void)a; // disable strange 'parameter set but not used' warning on gcc
    __atomic_compare_exchange(a, &compare, &exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return compare;
 }
 static inline intptr_t AtomicOr(TAtomic& a, intptr_t b) {
    return __atomic_or_fetch(&a, b, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicXor(TAtomic& a, intptr_t b) {
    return __atomic_xor_fetch(&a, b, __ATOMIC_SEQ_CST);
 }
 static inline intptr_t AtomicAnd(TAtomic& a, intptr_t b) {
    return __atomic_and_fetch(&a, b, __ATOMIC_SEQ_CST);
 }
 static inline void AtomicBarrier() {
    __sync_synchronize();
 }
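Two conventions in the wrappers above are easy to get wrong at call sites: `AtomicAdd` returns the new value while `AtomicGetAndAdd` returns the previous one, and `AtomicCas` takes the new value (`exchange`) before the expected value (`compare`). The sketch below illustrates both. It assumes a GCC-compatible compiler, puts local copies in a hypothetical `NSketch` namespace so that it compiles on its own, and uses the `__atomic_compare_exchange_n` builtin rather than the generic form used in the header.

```cpp
#include <cstdint>

// Hypothetical local copies that mirror the wrappers above.
namespace NSketch {
    using TAtomicBase = std::intptr_t;
    using TAtomic = volatile TAtomicBase;

    inline TAtomicBase AtomicAdd(TAtomic& p, TAtomicBase v) {
        return __atomic_add_fetch(&p, v, __ATOMIC_SEQ_CST); // returns the new value
    }

    inline TAtomicBase AtomicGetAndAdd(TAtomic& p, TAtomicBase v) {
        return __atomic_fetch_add(&p, v, __ATOMIC_SEQ_CST); // returns the previous value
    }

    inline bool AtomicCas(TAtomic* a, TAtomicBase exchange, TAtomicBase compare) {
        // Argument order: new value first, expected value second.
        return __atomic_compare_exchange_n(a, &compare, exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }

    inline TAtomicBase AtomicGetAndCas(TAtomic* a, TAtomicBase exchange, TAtomicBase compare) {
        // On failure the builtin writes the observed value into `compare`;
        // on success `compare` already equals the previous value.
        __atomic_compare_exchange_n(a, &compare, exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return compare;
    }
}
```

Either way, `AtomicGetAndCas` returns the value that was observed in memory, so the caller can compare it against `compare` to learn whether the swap happened.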
--- a/contrib/lfalloc/src/util/system/atomic_ops.h
+++ b/contrib/lfalloc/src/util/system/atomic_ops.h
@ -0,0 +1,189 @@
 #pragma once
 #include <type_traits>
 template <typename T>
 inline TAtomic* AsAtomicPtr(T volatile* target) {
    return reinterpret_cast<TAtomic*>(target);
 }
 template <typename T>
 inline const TAtomic* AsAtomicPtr(T const volatile* target) {
    return reinterpret_cast<const TAtomic*>(target);
 }
 // integral types
 template <typename T>
 struct TAtomicTraits {
    enum {
        Castable = std::is_integral<T>::value && sizeof(T) == sizeof(TAtomicBase) && !std::is_const<T>::value,
    };
 };
 template <typename T, typename TT>
 using TEnableIfCastable = std::enable_if_t<TAtomicTraits<T>::Castable, TT>;
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicGet(T const volatile& target) {
    return static_cast<T>(AtomicGet(*AsAtomicPtr(&target)));
 }
 template <typename T>
 inline TEnableIfCastable<T, void> AtomicSet(T volatile& target, TAtomicBase value) {
    AtomicSet(*AsAtomicPtr(&target), value);
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicIncrement(T volatile& target) {
    return static_cast<T>(AtomicIncrement(*AsAtomicPtr(&target)));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicGetAndIncrement(T volatile& target) {
    return static_cast<T>(AtomicGetAndIncrement(*AsAtomicPtr(&target)));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicDecrement(T volatile& target) {
    return static_cast<T>(AtomicDecrement(*AsAtomicPtr(&target)));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicGetAndDecrement(T volatile& target) {
    return static_cast<T>(AtomicGetAndDecrement(*AsAtomicPtr(&target)));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicAdd(T volatile& target, TAtomicBase value) {
    return static_cast<T>(AtomicAdd(*AsAtomicPtr(&target), value));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicGetAndAdd(T volatile& target, TAtomicBase value) {
    return static_cast<T>(AtomicGetAndAdd(*AsAtomicPtr(&target), value));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicSub(T volatile& target, TAtomicBase value) {
    return static_cast<T>(AtomicSub(*AsAtomicPtr(&target), value));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicGetAndSub(T volatile& target, TAtomicBase value) {
    return static_cast<T>(AtomicGetAndSub(*AsAtomicPtr(&target), value));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicSwap(T volatile* target, TAtomicBase exchange) {
    return static_cast<T>(AtomicSwap(AsAtomicPtr(target), exchange));
 }
 template <typename T>
 inline TEnableIfCastable<T, bool> AtomicCas(T volatile* target, TAtomicBase exchange, TAtomicBase compare) {
    return AtomicCas(AsAtomicPtr(target), exchange, compare);
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicGetAndCas(T volatile* target, TAtomicBase exchange, TAtomicBase compare) {
    return static_cast<T>(AtomicGetAndCas(AsAtomicPtr(target), exchange, compare));
 }
 template <typename T>
 inline TEnableIfCastable<T, bool> AtomicTryLock(T volatile* target) {
    return AtomicTryLock(AsAtomicPtr(target));
 }
 template <typename T>
 inline TEnableIfCastable<T, bool> AtomicTryAndTryLock(T volatile* target) {
    return AtomicTryAndTryLock(AsAtomicPtr(target));
 }
 template <typename T>
 inline TEnableIfCastable<T, void> AtomicUnlock(T volatile* target) {
    AtomicUnlock(AsAtomicPtr(target));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicOr(T volatile& target, TAtomicBase value) {
    return static_cast<T>(AtomicOr(*AsAtomicPtr(&target), value));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicAnd(T volatile& target, TAtomicBase value) {
    return static_cast<T>(AtomicAnd(*AsAtomicPtr(&target), value));
 }
 template <typename T>
 inline TEnableIfCastable<T, T> AtomicXor(T volatile& target, TAtomicBase value) {
    return static_cast<T>(AtomicXor(*AsAtomicPtr(&target), value));
 }
 // pointer types
 template <typename T>
 inline T* AtomicGet(T* const volatile& target) {
    return reinterpret_cast<T*>(AtomicGet(*AsAtomicPtr(&target)));
 }
 template <typename T>
 inline void AtomicSet(T* volatile& target, T* value) {
    AtomicSet(*AsAtomicPtr(&target), reinterpret_cast<TAtomicBase>(value));
 }
 using TNullPtr = decltype(nullptr);
 template <typename T>
 inline void AtomicSet(T* volatile& target, TNullPtr) {
    AtomicSet(*AsAtomicPtr(&target), 0);
 }
 template <typename T>
 inline T* AtomicSwap(T* volatile* target, T* exchange) {
    return reinterpret_cast<T*>(AtomicSwap(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange)));
 }
 template <typename T>
 inline T* AtomicSwap(T* volatile* target, TNullPtr) {
    return reinterpret_cast<T*>(AtomicSwap(AsAtomicPtr(target), 0));
 }
 template <typename T>
 inline bool AtomicCas(T* volatile* target, T* exchange, T* compare) {
    return AtomicCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), reinterpret_cast<TAtomicBase>(compare));
 }
 template <typename T>
 inline T* AtomicGetAndCas(T* volatile* target, T* exchange, T* compare) {
    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), reinterpret_cast<TAtomicBase>(compare)));
 }
 template <typename T>
 inline bool AtomicCas(T* volatile* target, T* exchange, TNullPtr) {
    return AtomicCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), 0);
 }
 template <typename T>
 inline T* AtomicGetAndCas(T* volatile* target, T* exchange, TNullPtr) {
    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), 0));
 }
 template <typename T>
 inline bool AtomicCas(T* volatile* target, TNullPtr, T* compare) {
    return AtomicCas(AsAtomicPtr(target), 0, reinterpret_cast<TAtomicBase>(compare));
 }
 template <typename T>
 inline T* AtomicGetAndCas(T* volatile* target, TNullPtr, T* compare) {
    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), 0, reinterpret_cast<TAtomicBase>(compare)));
 }
 template <typename T>
 inline bool AtomicCas(T* volatile* target, TNullPtr, TNullPtr) {
    return AtomicCas(AsAtomicPtr(target), 0, 0);
 }
 template <typename T>
 inline T* AtomicGetAndCas(T* volatile* target, TNullPtr, TNullPtr) {
    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), 0, 0));
 }
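The integral overloads above are enabled only when `TAtomicTraits<T>::Castable` holds, that is, for non-const integral types of the same size as `TAtomicBase`. The sketch below repeats the trait so that it compiles on its own and checks a few types against it. The helper `IsCastable` is hypothetical, and the expectations assume a platform where `intptr_t` is wider than `char`.

```cpp
#include <cstdint>
#include <type_traits>

using TAtomicBase = std::intptr_t;

// Copy of the trait above.
template <typename T>
struct TAtomicTraits {
    enum {
        Castable = std::is_integral<T>::value && sizeof(T) == sizeof(TAtomicBase) && !std::is_const<T>::value,
    };
};

// Hypothetical helper that exposes the trait as a bool.
template <typename T>
constexpr bool IsCastable() {
    return TAtomicTraits<T>::Castable != 0;
}

static_assert(IsCastable<std::intptr_t>(), "same size, integral, non-const");
static_assert(IsCastable<std::uintptr_t>(), "signedness does not matter");
static_assert(!IsCastable<char>(), "too narrow to reinterpret as TAtomicBase");
static_assert(!IsCastable<const std::intptr_t>(), "const objects are rejected");
static_assert(!IsCastable<double>(), "not an integral type");
static_assert(!IsCastable<void*>(), "pointers go through the separate pointer overloads");
```

The size check matters because the overloads `reinterpret_cast` the target to `TAtomic*`; a narrower type would let the atomic operation touch neighbouring bytes.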
--- a/contrib/lfalloc/src/util/system/atomic_win.h
+++ b/contrib/lfalloc/src/util/system/atomic_win.h
@ -0,0 +1,114 @@
 #pragma once
 #include <intrin.h>
 #define USE_GENERIC_SETGET
 #if defined(_i386_)
 #pragma intrinsic(_InterlockedIncrement)
 #pragma intrinsic(_InterlockedDecrement)
 #pragma intrinsic(_InterlockedExchangeAdd)
 #pragma intrinsic(_InterlockedExchange)
 #pragma intrinsic(_InterlockedCompareExchange)
 static inline intptr_t AtomicIncrement(TAtomic& a) {
    return _InterlockedIncrement((volatile long*)&a);
 }
 static inline intptr_t AtomicGetAndIncrement(TAtomic& a) {
    return _InterlockedIncrement((volatile long*)&a) - 1;
 }
 static inline intptr_t AtomicDecrement(TAtomic& a) {
    return _InterlockedDecrement((volatile long*)&a);
 }
 static inline intptr_t AtomicGetAndDecrement(TAtomic& a) {
    return _InterlockedDecrement((volatile long*)&a) + 1;
 }
 static inline intptr_t AtomicAdd(TAtomic& a, intptr_t b) {
    return _InterlockedExchangeAdd((volatile long*)&a, b) + b;
 }
 static inline intptr_t AtomicGetAndAdd(TAtomic& a, intptr_t b) {
    return _InterlockedExchangeAdd((volatile long*)&a, b);
 }
 static inline intptr_t AtomicSwap(TAtomic* a, intptr_t b) {
    return _InterlockedExchange((volatile long*)a, b);
 }
 static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
    return _InterlockedCompareExchange((volatile long*)a, exchange, compare) == compare;
 }
 static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
    return _InterlockedCompareExchange((volatile long*)a, exchange, compare);
 }
 #else // _x86_64_
 #pragma intrinsic(_InterlockedIncrement64)
 #pragma intrinsic(_InterlockedDecrement64)
 #pragma intrinsic(_InterlockedExchangeAdd64)
 #pragma intrinsic(_InterlockedExchange64)
 #pragma intrinsic(_InterlockedCompareExchange64)
 static inline intptr_t AtomicIncrement(TAtomic& a) {
    return _InterlockedIncrement64((volatile __int64*)&a);
 }
 static inline intptr_t AtomicGetAndIncrement(TAtomic& a) {
    return _InterlockedIncrement64((volatile __int64*)&a) - 1;
 }
 static inline intptr_t AtomicDecrement(TAtomic& a) {
    return _InterlockedDecrement64((volatile __int64*)&a);
 }
 static inline intptr_t AtomicGetAndDecrement(TAtomic& a) {
    return _InterlockedDecrement64((volatile __int64*)&a) + 1;
 }
 static inline intptr_t AtomicAdd(TAtomic& a, intptr_t b) {
    return _InterlockedExchangeAdd64((volatile __int64*)&a, b) + b;
 }
 static inline intptr_t AtomicGetAndAdd(TAtomic& a, intptr_t b) {
    return _InterlockedExchangeAdd64((volatile __int64*)&a, b);
 }
 static inline intptr_t AtomicSwap(TAtomic* a, intptr_t b) {
    return _InterlockedExchange64((volatile __int64*)a, b);
 }
 static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
    return _InterlockedCompareExchange64((volatile __int64*)a, exchange, compare) == compare;
 }
 static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
    return _InterlockedCompareExchange64((volatile __int64*)a, exchange, compare);
 }
 static inline intptr_t AtomicOr(TAtomic& a, intptr_t b) {
    return _InterlockedOr64(&a, b) | b;
 }
 static inline intptr_t AtomicAnd(TAtomic& a, intptr_t b) {
    return _InterlockedAnd64(&a, b) & b;
 }
 static inline intptr_t AtomicXor(TAtomic& a, intptr_t b) {
    return _InterlockedXor64(&a, b) ^ b;
 }
 #endif // _x86_
 //TODO
 static inline void AtomicBarrier() {
    TAtomic val = 0;
    AtomicSwap(&val, 0);
 }
--- a/contrib/lfalloc/src/util/system/compiler.h
+++ b/contrib/lfalloc/src/util/system/compiler.h
@ -0,0 +1,617 @@
 #pragma once
 // useful cross-platform definitions for compilers
 /**
 * @def Y_FUNC_SIGNATURE
 *
 * Use this macro to get a pretty function name (see example).
 *
 * @code
 * void Hi() {
 *     Cout << Y_FUNC_SIGNATURE << Endl;
 * }
 * template <typename T>
 * void Do() {
 *     Cout << Y_FUNC_SIGNATURE << Endl;
 * }
 * int main() {
 *    Hi();         // void Hi()
 *    Do<int>();    // void Do() [T = int]
 *    Do<TString>(); // void Do() [T = TString]
 * }
 * @endcode
 */
 #if defined(__GNUC__)
 #define Y_FUNC_SIGNATURE __PRETTY_FUNCTION__
 #elif defined(_MSC_VER)
 #define Y_FUNC_SIGNATURE __FUNCSIG__
 #else
 #define Y_FUNC_SIGNATURE ""
 #endif
 #ifdef __GNUC__
 #define Y_PRINTF_FORMAT(n, m) __attribute__((__format__(__printf__, n, m)))
 #endif
 #ifndef Y_PRINTF_FORMAT
 #define Y_PRINTF_FORMAT(n, m)
 #endif
 #if defined(__clang__)
 #define Y_NO_SANITIZE(...) __attribute__((no_sanitize(__VA_ARGS__)))
 #endif
 #if !defined(Y_NO_SANITIZE)
 #define Y_NO_SANITIZE(...)
 #endif
 /**
 * @def Y_DECLARE_UNUSED
 *
 * This macro silences compiler warnings about unused entities (e.g. a function or an argument).
 *
 * @code
 * Y_DECLARE_UNUSED int FunctionUsedSolelyForDebugPurposes();
 * assert(FunctionUsedSolelyForDebugPurposes() == 42);
 *
 * void Foo(const int argumentUsedOnlyForDebugPurposes Y_DECLARE_UNUSED) {
 *     assert(argumentUsedOnlyForDebugPurposes == 42);
 *     // however you may as well omit `Y_DECLARE_UNUSED` and use the `Y_UNUSED` macro instead
 *     Y_UNUSED(argumentUsedOnlyForDebugPurposes);
 * }
 * @endcode
 */
 #ifdef __GNUC__
 #define Y_DECLARE_UNUSED __attribute__((unused))
 #endif
 #ifndef Y_DECLARE_UNUSED
 #define Y_DECLARE_UNUSED
 #endif
 #if defined(__GNUC__)
 #define Y_LIKELY(Cond) __builtin_expect(!!(Cond), 1)
 #define Y_UNLIKELY(Cond) __builtin_expect(!!(Cond), 0)
 #define Y_PREFETCH_READ(Pointer, Priority) __builtin_prefetch((const void*)(Pointer), 0, Priority)
 #define Y_PREFETCH_WRITE(Pointer, Priority) __builtin_prefetch((const void*)(Pointer), 1, Priority)
 #endif
 /**
 * @def Y_FORCE_INLINE
 *
 * Macro to use in place of 'inline' in function declaration/definition to force
 * it to be inlined.
 */
 #if !defined(Y_FORCE_INLINE)
 #if defined(CLANG_COVERAGE)
 #/* excessive __always_inline__ might significantly slow down compilation of an instrumented unit */
 #define Y_FORCE_INLINE inline
 #elif defined(_MSC_VER)
 #define Y_FORCE_INLINE __forceinline
 #elif defined(__GNUC__)
 #/* Clang also defines __GNUC__ (as 4) */
 #define Y_FORCE_INLINE inline __attribute__((__always_inline__))
 #else
 #define Y_FORCE_INLINE inline
 #endif
 #endif
 /**
 * @def Y_NO_INLINE
 *
 * Macro to use in place of 'inline' in function declaration/definition to
 * prevent it from being inlined.
 */
 #if !defined(Y_NO_INLINE)
 #if defined(_MSC_VER)
 #define Y_NO_INLINE __declspec(noinline)
 #elif defined(__GNUC__) || defined(__INTEL_COMPILER)
 #/* Clang also defines __GNUC__ (as 4) */
 #define Y_NO_INLINE __attribute__((__noinline__))
 #else
 #define Y_NO_INLINE
 #endif
 #endif
 //to cheat compiler about strict aliasing or similar problems
 #if defined(__GNUC__)
 #define Y_FAKE_READ(X)                  \
    do {                                \
        __asm__ __volatile__(""         \
                             :          \
                             : "m"(X)); \
    } while (0)
 #define Y_FAKE_WRITE(X)                  \
    do {                                 \
        __asm__ __volatile__(""          \
                             : "=m"(X)); \
    } while (0)
 #endif
 #if !defined(Y_FAKE_READ)
 #define Y_FAKE_READ(X)
 #endif
 #if !defined(Y_FAKE_WRITE)
 #define Y_FAKE_WRITE(X)
 #endif
 #ifndef Y_PREFETCH_READ
 #define Y_PREFETCH_READ(Pointer, Priority) (void)(const void*)(Pointer), (void)Priority
 #endif
 #ifndef Y_PREFETCH_WRITE
 #define Y_PREFETCH_WRITE(Pointer, Priority) (void)(const void*)(Pointer), (void)Priority
 #endif
 #ifndef Y_LIKELY
 #define Y_LIKELY(Cond) (Cond)
 #define Y_UNLIKELY(Cond) (Cond)
 #endif
 #ifdef __GNUC__
 #define _packed __attribute__((packed))
 #else
 #define _packed
 #endif
 #if defined(__GNUC__)
 #define Y_WARN_UNUSED_RESULT __attribute__((warn_unused_result))
 #endif
 #ifndef Y_WARN_UNUSED_RESULT
 #define Y_WARN_UNUSED_RESULT
 #endif
 #if defined(__GNUC__)
 #define Y_HIDDEN __attribute__((visibility("hidden")))
 #endif
 #if !defined(Y_HIDDEN)
 #define Y_HIDDEN
 #endif
 #if defined(__GNUC__)
 #define Y_PUBLIC __attribute__((visibility("default")))
 #endif
 #if !defined(Y_PUBLIC)
 #define Y_PUBLIC
 #endif
 #if !defined(Y_UNUSED) && !defined(__cplusplus)
 #define Y_UNUSED(var) (void)(var)
 #endif
 #if !defined(Y_UNUSED) && defined(__cplusplus)
 template <class... Types>
 constexpr Y_FORCE_INLINE int Y_UNUSED(Types&&...) {
    return 0;
 };
 #endif
 /**
 * @def Y_ASSUME
 *
 * Macro that tells the compiler that it can generate optimized code
 * as if the given expression will always evaluate true.
 * The behavior is undefined if it ever evaluates false.
 *
 * @code
 * // factored into a function so that it's testable
 * inline int Avg(int x, int y) {
 *     if (x >= 0 && y >= 0) {
 *         return (static_cast<unsigned>(x) + static_cast<unsigned>(y)) >> 1;
 *     } else {
 *         // a slower implementation
 *     }
 * }
 *
 * // we know that xs and ys are non-negative from domain knowledge,
 * // but we can't change the types of xs and ys because of API constraints
 * TVector<int> Foo(const TVector<int>& xs, const TVector<int>& ys) {
 *     TVector<int> avgs;
 *     avgs.resize(xs.size());
 *     for (size_t i = 0; i < xs.size(); ++i) {
 *         auto x = xs[i];
 *         auto y = ys[i];
 *         Y_ASSUME(x >= 0);
 *         Y_ASSUME(y >= 0);
 *         avgs[i] = Avg(x, y);
 *     }
 *     return avgs;
 * }
 * @endcode
 */
 #if defined(__GNUC__)
 #define Y_ASSUME(condition) ((condition) ? (void)0 : __builtin_unreachable())
 #elif defined(_MSC_VER)
 #define Y_ASSUME(condition) __assume(condition)
 #else
 #define Y_ASSUME(condition) Y_UNUSED(condition)
 #endif
 #ifdef __cplusplus
 [[noreturn]]
 #endif
 Y_HIDDEN void _YandexAbort();
 /**
 * @def Y_UNREACHABLE
 *
 * Macro that marks the rest of the code branch unreachable.
 * The behavior is undefined if it's ever reached.
 *
 * @code
 * switch (i % 3) {
 * case 0:
 *     return foo;
 * case 1:
 *     return bar;
 * case 2:
 *     return baz;
 * default:
 *     Y_UNREACHABLE();
 * }
 * @endcode
 */
 #if defined(__GNUC__) || defined(_MSC_VER)
 #define Y_UNREACHABLE() Y_ASSUME(0)
 #else
 #define Y_UNREACHABLE() _YandexAbort()
 #endif
 #if defined(undefined_sanitizer_enabled)
 #define _ubsan_enabled_
 #endif
 #ifdef __clang__
 #if __has_feature(thread_sanitizer)
 #define _tsan_enabled_
 #endif
 #if __has_feature(memory_sanitizer)
 #define _msan_enabled_
 #endif
 #if __has_feature(address_sanitizer)
 #define _asan_enabled_
 #endif
 #else
 #if defined(thread_sanitizer_enabled) || defined(__SANITIZE_THREAD__)
 #define _tsan_enabled_
 #endif
 #if defined(memory_sanitizer_enabled)
 #define _msan_enabled_
 #endif
 #if defined(address_sanitizer_enabled) || defined(__SANITIZE_ADDRESS__)
 #define _asan_enabled_
 #endif
 #endif
 #if defined(_asan_enabled_) || defined(_msan_enabled_) || defined(_tsan_enabled_) || defined(_ubsan_enabled_)
 #define _san_enabled_
 #endif
 #if defined(_MSC_VER)
 #define __PRETTY_FUNCTION__ __FUNCSIG__
 #endif
 #if defined(__GNUC__)
 #define Y_WEAK __attribute__((weak))
 #else
 #define Y_WEAK
 #endif
 #if defined(__CUDACC_VER_MAJOR__)
 #define Y_CUDA_AT_LEAST(x, y) (__CUDACC_VER_MAJOR__ > x || (__CUDACC_VER_MAJOR__ == x && __CUDACC_VER_MINOR__ >= y))
 #else
 #define Y_CUDA_AT_LEAST(x, y) 0
 #endif
 // NVidia CUDA C++ Compiler did not know about noexcept keyword until version 9.0
 #if !Y_CUDA_AT_LEAST(9, 0)
 #if defined(__CUDACC__) && !defined(noexcept)
 #define noexcept throw ()
 #endif
 #endif
 #if defined(__GNUC__)
 #define Y_COLD __attribute__((cold))
 #define Y_LEAF __attribute__((leaf))
 #define Y_WRAPPER __attribute__((artificial))
 #else
 #define Y_COLD
 #define Y_LEAF
 #define Y_WRAPPER
 #endif
 /**
 * @def Y_PRAGMA
 *
 * Macro for use in other macros to define compiler pragma
 * See below for other usage examples
 *
 * @code
 * #if defined(__clang__) || defined(__GNUC__)
 * #define Y_PRAGMA_NO_WSHADOW \
 *     Y_PRAGMA("GCC diagnostic ignored \"-Wshadow\"")
 * #elif defined(_MSC_VER)
 * #define Y_PRAGMA_NO_WSHADOW \
 *     Y_PRAGMA(warning(disable : 4456 4457))
 * #else
 * #define Y_PRAGMA_NO_WSHADOW
 * #endif
 * @endcode
 */
 #if defined(__clang__) || defined(__GNUC__)
 #define Y_PRAGMA(x) _Pragma(x)
 #elif defined(_MSC_VER)
 #define Y_PRAGMA(x) __pragma(x)
 #else
 #define Y_PRAGMA(x)
 #endif
 /**
 * @def Y_PRAGMA_DIAGNOSTIC_PUSH
 *
 * Cross-compiler pragma to save diagnostic settings
 *
 * @see
 *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html
 *     MSVC: https://msdn.microsoft.com/en-us/library/2c8f766e.aspx
 *     Clang: https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas
 *
 * @code
 * Y_PRAGMA_DIAGNOSTIC_PUSH
 * @endcode
 */
 #if defined(__clang__) || defined(__GNUC__)
 #define Y_PRAGMA_DIAGNOSTIC_PUSH \
    Y_PRAGMA("GCC diagnostic push")
 #elif defined(_MSC_VER)
 #define Y_PRAGMA_DIAGNOSTIC_PUSH \
    Y_PRAGMA(warning(push))
 #else
 #define Y_PRAGMA_DIAGNOSTIC_PUSH
 #endif
 /**
 * @def Y_PRAGMA_DIAGNOSTIC_POP
 *
 * Cross-compiler pragma to restore diagnostic settings
 *
 * @see
 *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html
 *     MSVC: https://msdn.microsoft.com/en-us/library/2c8f766e.aspx
 *     Clang: https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas
 *
 * @code
 * Y_PRAGMA_DIAGNOSTIC_POP
 * @endcode
 */
 #if defined(__clang__) || defined(__GNUC__)
 #define Y_PRAGMA_DIAGNOSTIC_POP \
    Y_PRAGMA("GCC diagnostic pop")
 #elif defined(_MSC_VER)
 #define Y_PRAGMA_DIAGNOSTIC_POP \
    Y_PRAGMA(warning(pop))
 #else
 #define Y_PRAGMA_DIAGNOSTIC_POP
 #endif
 /**
 * @def Y_PRAGMA_NO_WSHADOW
 *
 * Cross-compiler pragma to disable warnings about shadowing variables
 *
 * @code
 * Y_PRAGMA_DIAGNOSTIC_PUSH
 * Y_PRAGMA_NO_WSHADOW
 *
 * // some code which uses variable shadowing, e.g.:
 *
 * for (int i = 0; i < 100; ++i) {
 *   Use(i);
 *
 *   for (int i = 42; i < 100500; ++i) { // this i is shadowing previous i
 *       AnotherUse(i);
 *    }
 * }
 *
 * Y_PRAGMA_DIAGNOSTIC_POP
 * @endcode
 */
 #if defined(__clang__) || defined(__GNUC__)
 #define Y_PRAGMA_NO_WSHADOW \
    Y_PRAGMA("GCC diagnostic ignored \"-Wshadow\"")
 #elif defined(_MSC_VER)
 #define Y_PRAGMA_NO_WSHADOW \
    Y_PRAGMA(warning(disable : 4456 4457))
 #else
 #define Y_PRAGMA_NO_WSHADOW
 #endif
 /**
 * @def Y_PRAGMA_NO_UNUSED_FUNCTION
 *
 * Cross-compiler pragma to disable warnings about unused functions
 *
 * @see
 *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
 *     Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-function
 *     MSVC: there is no such warning
 *
 * @code
 * Y_PRAGMA_DIAGNOSTIC_PUSH
 * Y_PRAGMA_NO_UNUSED_FUNCTION
 *
 * // some code which introduces a function which later will not be used, e.g.:
 *
 * void Foo() {
 * }
 *
 * int main() {
 *     return 0; // Foo() never called
 * }
 *
 * Y_PRAGMA_DIAGNOSTIC_POP
 * @endcode
 */
 #if defined(__clang__) || defined(__GNUC__)
 #define Y_PRAGMA_NO_UNUSED_FUNCTION \
    Y_PRAGMA("GCC diagnostic ignored \"-Wunused-function\"")
 #else
 #define Y_PRAGMA_NO_UNUSED_FUNCTION
 #endif
 /**
 * @def Y_PRAGMA_NO_UNUSED_PARAMETER
 *
 * Cross-compiler pragma to disable warnings about unused function parameters
 *
 * @see
 *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
 *     Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-parameter
 *     MSVC: https://msdn.microsoft.com/en-us/library/26kb9fy0.aspx
 *
 * @code
 * Y_PRAGMA_DIAGNOSTIC_PUSH
 * Y_PRAGMA_NO_UNUSED_PARAMETER
 *
 * // some code which introduces a function with unused parameter, e.g.:
 *
 * void foo(int a) {
 *     // a is not referenced
 * }
 *
 * int main() {
 *     foo(1);
 *     return 0;
 * }
 *
 * Y_PRAGMA_DIAGNOSTIC_POP
 * @endcode
 */
 #if defined(__clang__) || defined(__GNUC__)
 #define Y_PRAGMA_NO_UNUSED_PARAMETER \
    Y_PRAGMA("GCC diagnostic ignored \"-Wunused-parameter\"")
 #elif defined(_MSC_VER)
 #define Y_PRAGMA_NO_UNUSED_PARAMETER \
    Y_PRAGMA(warning(disable : 4100))
 #else
 #define Y_PRAGMA_NO_UNUSED_PARAMETER
 #endif
 /**
 * @def Y_PRAGMA_NO_DEPRECATED
 *
 * Cross-compiler pragma to disable warnings and errors about the use of deprecated entities
 *
 * @see
 *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
 *     Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wdeprecated
 *     MSVC: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-3-c4996?view=vs-2017
 *
 * @code
 * Y_PRAGMA_DIAGNOSTIC_PUSH
 * Y_PRAGMA_NO_DEPRECATED
 *
 * [[deprecated]] void foo() {
 *     // ...
 * }
 *
 * int main() {
 *     foo();
 *     return 0;
 * }
 *
 * Y_PRAGMA_DIAGNOSTIC_POP
 * @endcode
 */
 #if defined(__clang__) || defined(__GNUC__)
 #define Y_PRAGMA_NO_DEPRECATED \
    Y_PRAGMA("GCC diagnostic ignored \"-Wdeprecated\"")
 #elif defined(_MSC_VER)
 #define Y_PRAGMA_NO_DEPRECATED \
    Y_PRAGMA(warning(disable : 4996))
 #else
 #define Y_PRAGMA_NO_DEPRECATED
 #endif
 #if defined(__clang__) || defined(__GNUC__)
 /**
 * @def Y_CONST_FUNCTION
   Methods and functions marked with this attribute promise that they:
     1. have no side effects
     2. do not read global memory
   NOTE: this attribute can't be set for methods that depend on data pointed to by `this`.
   It allows compilers to optimize such functions aggressively.
   NOTE: in the common case this attribute can't be set if the method has pointer arguments
   NOTE: as a result there is no reason to discard the result of such a method
 */
 #define Y_CONST_FUNCTION [[gnu::const]]
 #endif
 #if !defined(Y_CONST_FUNCTION)
 #define Y_CONST_FUNCTION
 #endif
 #if defined(__clang__) || defined(__GNUC__)
 /**
 * @def Y_PURE_FUNCTION
   Methods and functions marked with this attribute promise that:
     1. they have no side effects
     2. their result is the same as long as no global memory has changed
   This allows compilers to optimize such functions aggressively.
   NOTE: as a result there is no reason to discard the result of such a method
 */
 #define Y_PURE_FUNCTION [[gnu::pure]]
 #endif
 #if !defined(Y_PURE_FUNCTION)
 #define Y_PURE_FUNCTION
 #endif
 /**
 * @def Y_HAVE_INT128
 *
 * Defined when the compiler supports __int128 extension
 *
 * @code
 *
 * #if defined(Y_HAVE_INT128)
 *     __int128 myVeryBigInt = 12345678901234567890;
 * #endif
 *
 * @endcode
 */
 #if defined(__SIZEOF_INT128__)
 #define Y_HAVE_INT128 1
 #endif
 /**
 * XRAY macro must be passed to compiler if XRay is enabled.
 *
 * Define everything XRay-specific as a macro so that it doesn't cause errors
 * for compilers that don't support XRay.
 */
 #if defined(XRAY) && defined(__cplusplus)
 #include <xray/xray_interface.h>
 #define Y_XRAY_ALWAYS_INSTRUMENT [[clang::xray_always_instrument]]
 #define Y_XRAY_NEVER_INSTRUMENT [[clang::xray_never_instrument]]
 #define Y_XRAY_CUSTOM_EVENT(__string, __length) \
    do {                                        \
        __xray_customevent(__string, __length); \
    } while (0)
 #else
 #define Y_XRAY_ALWAYS_INSTRUMENT
 #define Y_XRAY_NEVER_INSTRUMENT
 #define Y_XRAY_CUSTOM_EVENT(__string, __length) \
    do {                                        \
    } while (0)
 #endif
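Unlike most macros in this header, `Y_LIKELY` and `Y_UNLIKELY` carry no usage example. The sketch below repeats their definitions so that it compiles on its own and marks a branch the caller expects to be rare. `CountNegative` is a hypothetical helper, not part of the vendored code, and the hint only affects code layout, never the result.

```cpp
#include <cstddef>

// Copies of the definitions above, with the same fallback for other compilers.
#if defined(__GNUC__)
#define Y_LIKELY(Cond) __builtin_expect(!!(Cond), 1)
#define Y_UNLIKELY(Cond) __builtin_expect(!!(Cond), 0)
#else
#define Y_LIKELY(Cond) (Cond)
#define Y_UNLIKELY(Cond) (Cond)
#endif

// Hypothetical helper: counts negative entries, which the caller expects to be rare.
inline std::size_t CountNegative(const int* values, std::size_t count) {
    std::size_t negatives = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (Y_UNLIKELY(values[i] < 0)) {
            ++negatives; // cold path
        }
    }
    return negatives;
}
```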
--- a/contrib/lfalloc/src/util/system/defaults.h
+++ b/contrib/lfalloc/src/util/system/defaults.h
@ -0,0 +1,168 @@
 #pragma once
 #include "platform.h"
 #if defined _unix_
 #define LOCSLASH_C '/'
 #define LOCSLASH_S "/"
 #else
 #define LOCSLASH_C '\\'
 #define LOCSLASH_S "\\"
 #endif // _unix_
 #if defined(__INTEL_COMPILER) && defined(__cplusplus)
 #include <new>
 #endif
 // low and high parts of integers
 #if !defined(_win_)
 #include <sys/param.h>
 #endif
 #if defined(BSD) || defined(_android_)
 #if defined(BSD)
 #include <machine/endian.h>
 #endif
 #if defined(_android_)
 #include <endian.h>
 #endif
 #if (BYTE_ORDER == LITTLE_ENDIAN)
 #define _little_endian_
 #elif (BYTE_ORDER == BIG_ENDIAN)
 #define _big_endian_
 #else
 #error unknown endian not supported
 #endif
 #elif (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(WHATEVER_THAT_HAS_BIG_ENDIAN)
 #define _big_endian_
 #else
 #define _little_endian_
 #endif
 // alignment
 #if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_QUADS)
 #define _must_align8_
 #endif
 #if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_LONGS)
 #define _must_align4_
 #endif
 #if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_SHORTS)
 #define _must_align2_
 #endif
 #if defined(__GNUC__)
 #define alias_hack __attribute__((__may_alias__))
 #endif
 #ifndef alias_hack
 #define alias_hack
 #endif
 #include "types.h"
 #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
 #define PRAGMA(x) _Pragma(#x)
 #define RCSID(idstr) PRAGMA(comment(exestr, idstr))
 #else
 #define RCSID(idstr) static const char rcsid[] = idstr
 #endif
 #include "compiler.h"
 #ifdef _win_
 #include <malloc.h>
 #elif defined(_sun_)
 #include <alloca.h>
 #endif
 #ifdef NDEBUG
 #define Y_IF_DEBUG(X)
 #else
 #define Y_IF_DEBUG(X) X
 #endif
 /**
 * @def Y_ARRAY_SIZE
 *
 * This macro returns the number of elements in a statically allocated fixed-size array. The
 * expression is a compile-time constant and therefore can be used in compile-time computations.
 *
 * @code
 * enum ENumbers {
 *     EN_ONE,
 *     EN_TWO,
 *     EN_SIZE
 * };
 *
 * const char* NAMES[] = {
 *     "one",
 *     "two"
 * };
 *
 * static_assert(Y_ARRAY_SIZE(NAMES) == EN_SIZE, "you should define `NAME` for each enumeration");
 * @endcode
 *
 * This macro also catches type errors. If you see a compiler error like "warning: division by zero
 * is undefined" when using `Y_ARRAY_SIZE` then you are probably giving it a pointer.
 *
 * Since all of our code is expected to work on a 64 bit platform where pointers are 8 bytes we may
 * falsely accept pointers to types whose sizes are divisors of 8 (1, 2, 4 and 8).
 */
 #if defined(__cplusplus)
 namespace NArraySizePrivate {
    template <class T>
    struct TArraySize;
    template <class T, size_t N>
    struct TArraySize<T[N]> {
        enum {
            Result = N
        };
    };
    template <class T, size_t N>
    struct TArraySize<T (&)[N]> {
        enum {
            Result = N
        };
    };
 }
 #define Y_ARRAY_SIZE(arr) ((size_t)::NArraySizePrivate::TArraySize<decltype(arr)>::Result)
 #else
 #undef Y_ARRAY_SIZE
 #define Y_ARRAY_SIZE(arr) \
    ((sizeof(arr) / sizeof((arr)[0])) / (size_t)(!(sizeof(arr) % sizeof((arr)[0]))))
 #endif
 #undef Y_ARRAY_BEGIN
 #define Y_ARRAY_BEGIN(arr) (arr)
 #undef Y_ARRAY_END
 #define Y_ARRAY_END(arr) ((arr) + Y_ARRAY_SIZE(arr))
 /**
 * Concatenates two symbols, even if one of them is itself a macro.
 */
 #define Y_CAT(X, Y) Y_CAT_I(X, Y)
 #define Y_CAT_I(X, Y) Y_CAT_II(X, Y)
 #define Y_CAT_II(X, Y) X##Y
 #define Y_STRINGIZE(X) UTIL_PRIVATE_STRINGIZE_AUX(X)
 #define UTIL_PRIVATE_STRINGIZE_AUX(X) #X
 #if defined(__COUNTER__)
 #define Y_GENERATE_UNIQUE_ID(N) Y_CAT(N, __COUNTER__)
 #endif
 #if !defined(Y_GENERATE_UNIQUE_ID)
 #define Y_GENERATE_UNIQUE_ID(N) Y_CAT(N, __LINE__)
 #endif
 #define NPOS ((size_t)-1)
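The C++ branch of `Y_ARRAY_SIZE` above relies on template deduction of the array bound. A minimal sketch of the same idea, written as a function template rather than a macro; `ArraySize` is a hypothetical name and is not part of this diff:

```cpp
#include <cstddef>

// Hypothetical helper (not in the diff): N is deduced from a reference to an array.
// Passing a pointer does not compile, because no array bound can be deduced from it.
template <class T, std::size_t N>
constexpr std::size_t ArraySize(const T (&)[N]) noexcept {
    return N;
}

enum ENumbers { EN_ONE, EN_TWO, EN_SIZE };

static const char* const NAMES[] = {"one", "two"};

// Fails to compile if an enumerator is added without a matching name.
static_assert(ArraySize(NAMES) == static_cast<std::size_t>(EN_SIZE),
              "define a name for each enumerator");
```

Unlike the `sizeof`-based C fallback, this form cannot silently accept a pointer, whatever the pointee size.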
--- a/contrib/lfalloc/src/util/system/platform.h
+++ b/contrib/lfalloc/src/util/system/platform.h
@ -0,0 +1,242 @@
 #pragma once
 // What OS ?
 // our definition has the form _{osname}_
 #if defined(_WIN64)
 #define _win64_
 #define _win32_
 #elif defined(__WIN32__) || defined(_WIN32) // _WIN32 is also defined by the 64-bit compiler for backward compatibility
 #define _win32_
 #else
 #define _unix_
 #if defined(__sun__) || defined(sun) || defined(sparc) || defined(__sparc)
 #define _sun_
 #endif
 #if defined(__hpux__)
 #define _hpux_
 #endif
 #if defined(__linux__)
 #define _linux_
 #endif
 #if defined(__FreeBSD__)
 #define _freebsd_
 #endif
 #if defined(__CYGWIN__)
 #define _cygwin_
 #endif
 #if defined(__APPLE__)
 #define _darwin_
 #endif
 #if defined(__ANDROID__)
 #define _android_
 #endif
 #endif
 #if defined(__IOS__)
 #define _ios_
 #endif
 #if defined(_linux_)
 #if defined(_musl_)
 //nothing to do
 #elif defined(_android_)
 #define _bionic_
 #else
 #define _glibc_
 #endif
 #endif
 #if defined(_darwin_)
 #define unix
 #define __unix__
 #endif
 #if defined(_win32_) || defined(_win64_)
 #define _win_
 #endif
 #if defined(__arm__) || defined(__ARM__) || defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM)
 #if defined(__arm64) || defined(__arm64__) || defined(__aarch64__)
 #define _arm64_
 #else
 #define _arm32_
 #endif
 #endif
 #if defined(_arm64_) || defined(_arm32_)
 #define _arm_
 #endif
 /* __ia64__ and __x86_64__      - defined by GNU C.
 * _M_IA64, _M_X64, _M_AMD64    - defined by Visual Studio.
 *
 * Microsoft can define _M_IX86, _M_AMD64 (before Visual Studio 8)
 * or _M_X64 (starting in Visual Studio 8).
 */
 #if defined(__x86_64__) || defined(_M_X64) || defined(_M_AMD64)
 #define _x86_64_
 #endif
 #if defined(__i386__) || defined(_M_IX86)
 #define _i386_
 #endif
 #if defined(__ia64__) || defined(_M_IA64)
 #define _ia64_
 #endif
 #if defined(__powerpc__)
 #define _ppc_
 #endif
 #if defined(__powerpc64__)
 #define _ppc64_
 #endif
 #if !defined(sparc) && !defined(__sparc) && !defined(__hpux__) && !defined(__alpha__) && !defined(_ia64_) && !defined(_x86_64_) && !defined(_arm_) && !defined(_i386_) && !defined(_ppc_) && !defined(_ppc64_)
 #error "platform not defined, please, define one"
 #endif
 #if defined(_x86_64_) || defined(_i386_)
 #define _x86_
 #endif
 #if defined(__MIC__)
 #define _mic_
 #define _k1om_
 #endif
 // stdio or MessageBox
 #if defined(__CONSOLE__) || defined(_CONSOLE)
 #define _console_
 #endif
 #if (defined(_win_) && !defined(_console_))
 #define _windows_
 #elif !defined(_console_)
 #define _console_
 #endif
 #if defined(__SSE__) || defined(SSE_ENABLED)
 #define _sse_
 #endif
 #if defined(__SSE2__) || defined(SSE2_ENABLED)
 #define _sse2_
 #endif
 #if defined(__SSE3__) || defined(SSE3_ENABLED)
 #define _sse3_
 #endif
 #if defined(__SSSE3__) || defined(SSSE3_ENABLED)
 #define _ssse3_
 #endif
 #if defined(POPCNT_ENABLED)
 #define _popcnt_
 #endif
 #if defined(__DLL__) || defined(_DLL)
 #define _dll_
 #endif
 // 16, 32 or 64
 #if defined(__sparc_v9__) || defined(_x86_64_) || defined(_ia64_) || defined(_arm64_) || defined(_ppc64_)
 #define _64_
 #else
 #define _32_
 #endif
 /* All modern 64-bit Unix systems use the LP64 scheme (long and pointers are 64-bit).
 * Microsoft uses a different scheme: LLP64 (long long and pointers are 64-bit).
 *
 * Scheme          LP64   LLP64
 * char              8      8
 * short            16     16
 * int              32     32
 * long             64     32
 * long long        64     64
 * pointer          64     64
 */
 #if defined(_32_)
 #define SIZEOF_PTR 4
 #elif defined(_64_)
 #define SIZEOF_PTR 8
 #endif
 #define PLATFORM_DATA_ALIGN SIZEOF_PTR
 #if !defined(SIZEOF_PTR)
 #error todo
 #endif
 #define SIZEOF_CHAR 1
 #define SIZEOF_UNSIGNED_CHAR 1
 #define SIZEOF_SHORT 2
 #define SIZEOF_UNSIGNED_SHORT 2
 #define SIZEOF_INT 4
 #define SIZEOF_UNSIGNED_INT 4
 #if defined(_32_)
 #define SIZEOF_LONG 4
 #define SIZEOF_UNSIGNED_LONG 4
 #elif defined(_64_)
 #if defined(_win_)
 #define SIZEOF_LONG 4
 #define SIZEOF_UNSIGNED_LONG 4
 #else
 #define SIZEOF_LONG 8
 #define SIZEOF_UNSIGNED_LONG 8
 #endif // _win_
 #endif // _32_
 #if !defined(SIZEOF_LONG)
 #error todo
 #endif
 #define SIZEOF_LONG_LONG 8
 #define SIZEOF_UNSIGNED_LONG_LONG 8
 #undef SIZEOF_SIZE_T // in case we include <Python.h> which defines it, too
 #define SIZEOF_SIZE_T SIZEOF_PTR
 #if defined(__INTEL_COMPILER)
 #pragma warning(disable 1292)
 #pragma warning(disable 1469)
 #pragma warning(disable 193)
 #pragma warning(disable 271)
 #pragma warning(disable 383)
 #pragma warning(disable 424)
 #pragma warning(disable 444)
 #pragma warning(disable 584)
 #pragma warning(disable 593)
 #pragma warning(disable 981)
 #pragma warning(disable 1418)
 #pragma warning(disable 304)
 #pragma warning(disable 810)
 #pragma warning(disable 1029)
 #pragma warning(disable 1419)
 #pragma warning(disable 177)
 #pragma warning(disable 522)
 #pragma warning(disable 858)
 #pragma warning(disable 111)
 #pragma warning(disable 1599)
 #pragma warning(disable 411)
 #pragma warning(disable 304)
 #pragma warning(disable 858)
 #pragma warning(disable 444)
 #pragma warning(disable 913)
 #pragma warning(disable 310)
 #pragma warning(disable 167)
 #pragma warning(disable 180)
 #pragma warning(disable 1572)
 #endif
 #if defined(_MSC_VER)
 #undef _WINSOCKAPI_
 #define _WINSOCKAPI_
 #undef NOMINMAX
 #define NOMINMAX
 #endif
--- a/contrib/lfalloc/src/util/system/types.h
+++ b/contrib/lfalloc/src/util/system/types.h
@ -0,0 +1,117 @@
 #pragma once
 // DO_NOT_STYLE
 #include "platform.h"
 #include <inttypes.h>
 typedef int8_t i8;
 typedef int16_t i16;
 typedef uint8_t ui8;
 typedef uint16_t ui16;
 typedef int yssize_t;
 #define PRIYSZT "d"
 #if defined(_darwin_) && defined(_32_)
 typedef unsigned long ui32;
 typedef long i32;
 #else
 typedef uint32_t ui32;
 typedef int32_t i32;
 #endif
 #if defined(_darwin_) && defined(_64_)
 typedef unsigned long ui64;
 typedef long i64;
 #else
 typedef uint64_t ui64;
 typedef int64_t i64;
 #endif
 #define LL(number) INT64_C(number)
 #define ULL(number) UINT64_C(number)
 // Macro for size_t and ptrdiff_t types
 #if defined(_32_)
 #   if defined(_darwin_)
 #       define PRISZT "lu"
 #       undef PRIi32
 #       define PRIi32 "li"
 #       undef SCNi32
 #       define SCNi32 "li"
 #       undef PRId32
 #       define PRId32 "li"
 #       undef SCNd32
 #       define SCNd32 "li"
 #       undef PRIu32
 #       define PRIu32 "lu"
 #       undef SCNu32
 #       define SCNu32 "lu"
 #       undef PRIx32
 #       define PRIx32 "lx"
 #       undef SCNx32
 #       define SCNx32 "lx"
 #   elif !defined(_cygwin_)
 #       define PRISZT PRIu32
 #   else
 #       define PRISZT "u"
 #   endif
 #   define SCNSZT SCNu32
 #   define PRIPDT PRIi32
 #   define SCNPDT SCNi32
 #   define PRITMT PRIi32
 #   define SCNTMT SCNi32
 #elif defined(_64_)
 #   if defined(_darwin_)
 #       define PRISZT "lu"
 #       undef PRIu64
 #       define PRIu64 PRISZT
 #       undef PRIx64
 #       define PRIx64 "lx"
 #       undef PRIX64
 #       define PRIX64 "lX"
 #       undef PRId64
 #       define PRId64 "ld"
 #       undef PRIi64
 #       define PRIi64 "li"
 #       undef SCNi64
 #       define SCNi64 "li"
 #       undef SCNu64
 #       define SCNu64 "lu"
 #       undef SCNx64
 #       define SCNx64 "lx"
 #   else
 #       define PRISZT PRIu64
 #   endif
 #   define SCNSZT SCNu64
 #   define PRIPDT PRIi64
 #   define SCNPDT SCNi64
 #   define PRITMT PRIi64
 #   define SCNTMT SCNi64
 #else
 #   error "Unsupported platform"
 #endif
 // SUPERLONG
 #if !defined(DONT_USE_SUPERLONG) && !defined(SUPERLONG_MAX)
 #define SUPERLONG_MAX ~LL(0)
 typedef i64 SUPERLONG;
 #endif
 // UNICODE
 // UCS-2, native byteorder
 typedef ui16 wchar16;
 // internal symbol type: UTF-16LE
 typedef wchar16 TChar;
 typedef ui32 wchar32;
 #if defined(_MSC_VER)
 #include <basetsd.h>
 typedef SSIZE_T ssize_t;
 #define HAVE_SSIZE_T 1
 #include <wchar.h>
 #endif
 #include <sys/types.h>
--- a/contrib/libmetrohash/src/platform.h
+++ b/contrib/libmetrohash/src/platform.h
@ -18,6 +18,7 @@
 #define METROHASH_PLATFORM_H
 #include <stdint.h>
 #include <cstring>
 // rotate right idiom recognized by most compilers
 inline static uint64_t rotate_right(uint64_t v, unsigned k)
@ -25,20 +26,28 @@ inline static uint64_t rotate_right(uint64_t v, unsigned k)
    return (v >> k) | (v << (64 - k));
 }
 // unaligned reads, fast and safe on Nehalem and later microarchitectures
 inline static uint64_t read_u64(const void * const ptr)
 {
-    return static_cast<uint64_t>(*reinterpret_cast<const uint64_t*>(ptr));
+    uint64_t result;
    // An assignment like `result = *reinterpret_cast<const uint64_t *>(ptr)` here would be undefined behaviour (unaligned read),
    // so we use memcpy(), which is the most portable option. clang and gcc usually translate `memcpy()` into a single `load` instruction
    // when the hardware supports it, so using memcpy() is efficient too.
    memcpy(&result, ptr, sizeof(result));
    return result;
 }
 inline static uint64_t read_u32(const void * const ptr)
 {
-    return static_cast<uint64_t>(*reinterpret_cast<const uint32_t*>(ptr));
+    uint32_t result;
    memcpy(&result, ptr, sizeof(result));
    return result;
 }
 inline static uint64_t read_u16(const void * const ptr)
 {
-    return static_cast<uint64_t>(*reinterpret_cast<const uint16_t*>(ptr));
+    uint16_t result;
    memcpy(&result, ptr, sizeof(result));
    return result;
 }
 inline static uint64_t read_u8 (const void * const ptr)
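The `memcpy`-based reads introduced in this hunk can be exercised in isolation. The following is a small sketch under stated assumptions: `LoadU32` and `RoundTripAtOddOffset` are hypothetical names that only mirror the shape of the patched `read_u32`, they are not metrohash functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring the patched read_u32: copies 4 bytes from an
// arbitrarily aligned address instead of dereferencing a cast pointer.
inline std::uint32_t LoadU32(const void* ptr) {
    assert(ptr != nullptr);
    std::uint32_t result;
    std::memcpy(&result, ptr, sizeof(result));
    return result;
}

// Stores a value at an odd offset inside a byte buffer and reads it back.
inline bool RoundTripAtOddOffset(std::uint32_t value) {
    unsigned char buffer[sizeof(value) + 1] = {};
    std::memcpy(buffer + 1, &value, sizeof(value));
    return LoadU32(buffer + 1) == value;
}
```

The read at `buffer + 1` is not guaranteed to be 4-byte aligned, which is exactly the case where the old `reinterpret_cast` dereference had undefined behaviour.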
--- a/contrib/poco
+++ b/contrib/poco
@ -1 +1 @@
-Subproject commit fe5505e56c27b6ecb0dcbc40c49dc2caf4e9637f
+Subproject commit 29439cf7fa32c1a2d62d925bb6d6a3f14668a4a2
--- a/dbms/CMakeLists.txt
+++ b/dbms/CMakeLists.txt
@ -155,7 +155,6 @@ if (USE_EMBEDDED_COMPILER)
    target_include_directories (dbms SYSTEM BEFORE PUBLIC ${LLVM_INCLUDE_DIRS})
 endif ()
 if (CMAKE_BUILD_TYPE_UC STREQUAL "RELEASE" OR CMAKE_BUILD_TYPE_UC STREQUAL "RELWITHDEBINFO" OR CMAKE_BUILD_TYPE_UC STREQUAL "MINSIZEREL")
    # Won't generate debug info for files with heavy template instantiation to achieve faster linking and lower size.
    set_source_files_properties(
@ -186,8 +185,6 @@ target_link_libraries (clickhouse_common_io
    ${LINK_LIBRARIES_ONLY_ON_X86_64}
        PUBLIC
    ${DOUBLE_CONVERSION_LIBRARIES}
        PRIVATE
    pocoext
        PUBLIC
    ${Poco_Net_LIBRARY}
    ${Poco_Util_LIBRARY}
@ -214,6 +211,10 @@ target_link_libraries (clickhouse_common_io
 target_include_directories(clickhouse_common_io SYSTEM BEFORE PUBLIC ${RE2_INCLUDE_DIR})
 if (USE_LFALLOC)
    target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${LFALLOC_INCLUDE_DIR})
 endif ()
 if(CPUID_LIBRARY)
    target_link_libraries(clickhouse_common_io PRIVATE ${CPUID_LIBRARY})
 endif()
@ -223,8 +224,9 @@ if(CPUINFO_LIBRARY)
 endif()
 target_link_libraries (dbms
-        PRIVATE
+        PUBLIC
    clickhouse_compression
        PRIVATE
    clickhouse_parsers
    clickhouse_common_config
        PUBLIC
@ -232,7 +234,6 @@ target_link_libraries (dbms
        PRIVATE
    clickhouse_dictionaries_embedded
        PUBLIC
    pocoext
    ${MYSQLXX_LIBRARY}
        PRIVATE
    ${BTRIE_LIBRARIES}
--- a/dbms/programs/benchmark/Benchmark.cpp
+++ b/dbms/programs/benchmark/Benchmark.cpp
@ -439,7 +439,7 @@ int mainEntryClickHouseBenchmark(int argc, char ** argv)
            ("help",                                                            "produce help message")
            ("concurrency,c", value<unsigned>()->default_value(1),              "number of parallel queries")
            ("delay,d",       value<double>()->default_value(1),                "delay between intermediate reports in seconds (set 0 to disable reports)")
-            ("stage",         value<std::string>()->default_value("complete"),  "request query processing up to specified stage")
+            ("stage",         value<std::string>()->default_value("complete"),  "request query processing up to specified stage: complete,fetch_columns,with_mergeable_state")
            ("iterations,i",  value<size_t>()->default_value(0),                "amount of queries to be executed")
            ("timelimit,t",   value<double>()->default_value(0.),               "stop launch of queries after specified time limit")
            ("randomize,r",   value<bool>()->default_value(false),              "randomize order of execution")
--- a/dbms/programs/odbc-bridge/HandlerFactory.cpp
+++ b/dbms/programs/odbc-bridge/HandlerFactory.cpp
@ -2,7 +2,6 @@
 #include "PingHandler.h"
 #include "ColumnInfoHandler.h"
 #include <Poco/URI.h>
 #include <Poco/Ext/SessionPoolHelpers.h>
 #include <Poco/Net/HTTPServerRequest.h>
 #include <common/logger_useful.h>
--- a/dbms/programs/odbc-bridge/MainHandler.cpp
+++ b/dbms/programs/odbc-bridge/MainHandler.cpp
@ -11,11 +11,12 @@
 #include <IO/WriteHelpers.h>
 #include <IO/ReadHelpers.h>
 #include <Interpreters/Context.h>
 #include <Poco/Ext/SessionPoolHelpers.h>
 #include <Poco/Net/HTTPServerRequest.h>
 #include <Poco/Net/HTTPServerResponse.h>
 #include <Poco/Net/HTMLForm.h>
 #include <common/logger_useful.h>
 #include <mutex>
 #include <Poco/ThreadPool.h>
 namespace DB
 {
@ -31,6 +32,24 @@ namespace
    }
 }
 using PocoSessionPoolConstructor = std::function<std::shared_ptr<Poco::Data::SessionPool>()>;
 /** Used to adjust the max size of the default Poco thread pool. See issue #750.
  * Acquires the lock, resizes the pool and constructs a new session pool.
  */
 std::shared_ptr<Poco::Data::SessionPool> createAndCheckResizePocoSessionPool(PocoSessionPoolConstructor pool_constr)
 {
    static std::mutex mutex;
    Poco::ThreadPool & pool = Poco::ThreadPool::defaultPool();
    /// NOTE: The lock doesn't guarantee that external users of the pool don't change its capacity
    std::unique_lock lock(mutex);
    if (pool.available() == 0)
        pool.addCapacity(2 * std::max(pool.capacity(), 1));
    return pool_constr();
 }
 ODBCHandler::PoolPtr ODBCHandler::getPool(const std::string & connection_str)
 {
--- a/dbms/src/Columns/ColumnString.h
+++ b/dbms/src/Columns/ColumnString.h
@ -21,6 +21,7 @@ namespace DB
 class ColumnString final : public COWPtrHelper<IColumn, ColumnString>
 {
 public:
    using Char = UInt8;
    using Chars = PaddedPODArray<UInt8>;
 private:
--- a/dbms/src/Columns/IColumn.h
+++ b/dbms/src/Columns/IColumn.h
@ -250,7 +250,7 @@ public:
    /// Size of memory, allocated for column.
    /// This is greater than or equal to byteSize due to memory reservation in containers.
-    /// Zero, if could be determined.
+    /// Zero, if could not be determined.
    virtual size_t allocatedBytes() const = 0;
    /// Make memory region readonly with mprotect if it is large enough.
--- a/dbms/src/Common/CurrentThread.cpp
+++ b/dbms/src/Common/CurrentThread.cpp
@ -7,7 +7,7 @@
 #include <Common/TaskStatsInfoGetter.h>
 #include <Interpreters/ProcessList.h>
 #include <Interpreters/Context.h>
-#include <Poco/Ext/ThreadNumber.h>
+#include <common/getThreadNumber.h>
 #include <Poco/Logger.h>
@ -29,7 +29,7 @@ void CurrentThread::updatePerformanceCounters()
 ThreadStatus & CurrentThread::get()
 {
    if (unlikely(!current_thread))
-        throw Exception("Thread #" + std::to_string(Poco::ThreadNumber::get()) + " status was not initialized", ErrorCodes::LOGICAL_ERROR);
+        throw Exception("Thread #" + std::to_string(getThreadNumber()) + " status was not initialized", ErrorCodes::LOGICAL_ERROR);
    return *current_thread;
 }
--- a/dbms/src/Common/ErrorCodes.cpp
+++ b/dbms/src/Common/ErrorCodes.cpp
@ -424,6 +424,8 @@ namespace ErrorCodes
    extern const int HYPERSCAN_CANNOT_SCAN_TEXT = 447;
    extern const int BROTLI_READ_FAILED = 448;
    extern const int BROTLI_WRITE_FAILED = 449;
    extern const int BAD_TTL_EXPRESSION = 450;
    extern const int BAD_TTL_FILE = 451;
    extern const int KEEPER_EXCEPTION = 999;
    extern const int POCO_EXCEPTION = 1000;
--- a/dbms/src/Common/LFAllocator.cpp
+++ b/dbms/src/Common/LFAllocator.cpp
@ -0,0 +1,53 @@
 #include <Common/config.h>
 #if USE_LFALLOC
 #include "LFAllocator.h"
 #include <cstring>
 #include <lf_allocX64.h>
 namespace DB
 {
 void * LFAllocator::alloc(size_t size, size_t alignment)
 {
    if (alignment == 0)
        return LFAlloc(size);
    else
    {
        void * ptr;
        int res = LFPosixMemalign(&ptr, alignment, size);
        return res ? nullptr : ptr;
    }
 }
 void LFAllocator::free(void * buf, size_t)
 {
    LFFree(buf);
 }
 void * LFAllocator::realloc(void * old_ptr, size_t, size_t new_size, size_t alignment)
 {
    if (old_ptr == nullptr)
    {
        void * result = LFAllocator::alloc(new_size, alignment);
        return result;
    }
    if (new_size == 0)
    {
        LFFree(old_ptr);
        return nullptr;
    }
    void * new_ptr = LFAllocator::alloc(new_size, alignment);
    if (new_ptr == nullptr)
        return nullptr;
    size_t old_size = LFGetSize(old_ptr);
    memcpy(new_ptr, old_ptr, ((old_size < new_size) ? old_size : new_size));
    LFFree(old_ptr);
    return new_ptr;
 }
 }
 #endif
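`LFAllocator::realloc` above is an allocate, copy, free sequence rather than an in-place resize. A minimal sketch of that pattern follows, with `std::malloc`/`std::free` standing in for the `LF*` calls. `ReallocByCopy` is a hypothetical name, and the old size is passed explicitly because `malloc` has no portable equivalent of `LFGetSize`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for the pattern in the diff; not the LFAlloc API.
inline void* ReallocByCopy(void* old_ptr, std::size_t old_size, std::size_t new_size) {
    // A null block has no bytes to preserve.
    assert(old_ptr != nullptr || old_size == 0);
    if (old_ptr == nullptr)
        return std::malloc(new_size);
    if (new_size == 0) {
        std::free(old_ptr);
        return nullptr;
    }
    void* new_ptr = std::malloc(new_size);
    if (new_ptr == nullptr)
        return nullptr;  // the old block is left untouched on failure
    // Copy only the bytes that exist in both the old and the new block.
    std::memcpy(new_ptr, old_ptr, std::min(old_size, new_size));
    std::free(old_ptr);
    return new_ptr;
}
```

As in the diff, shrinking truncates the copied prefix and growing leaves the tail uninitialized.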
--- a/dbms/src/Common/LFAllocator.h
+++ b/dbms/src/Common/LFAllocator.h
@ -0,0 +1,22 @@
 #pragma once
 #include <Common/config.h>
 #if !USE_LFALLOC
 #error "do not include this file unless USE_LFALLOC is set to 1"
 #endif
 #include <cstddef>
 namespace DB
 {
 struct LFAllocator
 {
    static void * alloc(size_t size, size_t alignment = 0);
    static void free(void * buf, size_t);
    static void * realloc(void * buf, size_t, size_t new_size, size_t alignment = 0);
 };
 }
--- a/dbms/src/Common/RWLock.cpp
+++ b/dbms/src/Common/RWLock.cpp
@ -1,7 +1,6 @@
 #include "RWLock.h"
 #include <Common/Stopwatch.h>
 #include <Common/Exception.h>
 #include <Poco/Ext/ThreadNumber.h>
 #include <Common/CurrentMetrics.h>
 #include <Common/ProfileEvents.h>
--- a/dbms/src/Common/RadixSort.h
+++ b/dbms/src/Common/RadixSort.h
@ -64,15 +64,15 @@ struct RadixSortFloatTransform
 };
-template <typename Float>
+template <typename _Element, typename _Key = _Element>
 struct RadixSortFloatTraits
 {
-    using Element = Float;        /// The type of the element. It can be a structure with a key and some other payload. Or just a key.
+    using Element = _Element;     /// The type of the element. It can be a structure with a key and some other payload. Or just a key.
-    using Key = Float;            /// The key to sort.
+    using Key = _Key;             /// The key to sort.
    using CountType = uint32_t;   /// Type for calculating histograms. In the case of a known small number of elements, it can be less than size_t.
    /// The type to which the key is transformed to do bit operations. This UInt is the same size as the key.
-    using KeyBits = std::conditional_t<sizeof(Float) == 8, uint64_t, uint32_t>;
+    using KeyBits = std::conditional_t<sizeof(_Key) == 8, uint64_t, uint32_t>;
    static constexpr size_t PART_SIZE_BITS = 8;    /// With what pieces of the key, in bits, to do one pass - reshuffle of the array.
@ -85,7 +85,13 @@ struct RadixSortFloatTraits
    using Allocator = RadixSortMallocAllocator;
    /// The function to get the key from an array element.
-    static Key & extractKey(Element & elem) { return elem; }
+    static Key & extractKey(Element & elem)
    {
        if constexpr (std::is_same_v<Element, Key>)
            return elem;
        else
            return *reinterpret_cast<Key *>(&elem);
    }
 };
@ -109,13 +115,13 @@ struct RadixSortSignedTransform
 };
-template <typename UInt>
+template <typename _Element, typename _Key = _Element>
 struct RadixSortUIntTraits
 {
-    using Element = UInt;
+    using Element = _Element;
-    using Key = UInt;
+    using Key = _Key;
    using CountType = uint32_t;
-    using KeyBits = UInt;
+    using KeyBits = _Key;
    static constexpr size_t PART_SIZE_BITS = 8;
@ -123,16 +129,22 @@ struct RadixSortUIntTraits
    using Allocator = RadixSortMallocAllocator;
    /// The function to get the key from an array element.
-    static Key & extractKey(Element & elem) { return elem; }
+    static Key & extractKey(Element & elem)
    {
        if constexpr (std::is_same_v<Element, Key>)
            return elem;
        else
            return *reinterpret_cast<Key *>(&elem);
    }
 };
-template <typename Int>
+template <typename _Element, typename _Key = _Element>
 struct RadixSortIntTraits
 {
-    using Element = Int;
+    using Element = _Element;
-    using Key = Int;
+    using Key = _Key;
    using CountType = uint32_t;
-    using KeyBits = std::make_unsigned_t<Int>;
+    using KeyBits = std::make_unsigned_t<_Key>;
    static constexpr size_t PART_SIZE_BITS = 8;
@ -140,7 +152,13 @@ struct RadixSortIntTraits
    using Allocator = RadixSortMallocAllocator;
    /// The function to get the key from an array element.
-    static Key & extractKey(Element & elem) { return elem; }
+    static Key & extractKey(Element & elem)
    {
        if constexpr (std::is_same_v<Element, Key>)
            return elem;
        else
            return *reinterpret_cast<Key *>(&elem);
    }
 };
@ -261,3 +279,16 @@ radixSort(T * arr, size_t size)
    return RadixSort<RadixSortFloatTraits<T>>::execute(arr, size);
 }
 template <typename _Element, typename _Key>
 std::enable_if_t<std::is_integral_v<_Key>, void>
 radixSort(_Element * arr, size_t size)
 {
    return RadixSort<RadixSortUIntTraits<_Element, _Key>>::execute(arr, size);
 }
 template <typename _Element, typename _Key>
 std::enable_if_t<std::is_floating_point_v<_Key>, void>
 radixSort(_Element * arr, size_t size)
 {
    return RadixSort<RadixSortFloatTraits<_Element, _Key>>::execute(arr, size);
 }
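The new `extractKey` overloads assume that, when `Element` differs from `Key`, the key sits at the very start of the element. A sketch of that assumption made explicit; `TKeyedRow`, `ExtractKey` and `ExampleExtractKey` are hypothetical names, and the `static_assert` checks are additions that the diff itself does not contain:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical element type: the sort key is the first member, followed by a payload.
struct TKeyedRow {
    std::uint32_t key;
    std::uint32_t payload;
};

// Same shape as extractKey in the diff: identity when Element is the Key itself,
// otherwise the element's address is reinterpreted as a pointer to its leading key.
template <typename Element, typename Key = Element>
Key& ExtractKey(Element& elem) {
    if constexpr (std::is_same_v<Element, Key>) {
        return elem;
    } else {
        static_assert(std::is_standard_layout_v<Element>,
                      "the key must be the first member of a standard-layout type");
        static_assert(sizeof(Key) <= sizeof(Element), "the key cannot be larger than the element");
        return *reinterpret_cast<Key*>(&elem);
    }
}

inline void ExampleExtractKey() {
    TKeyedRow row{7, 42};
    ExtractKey<TKeyedRow, std::uint32_t>(row) = 9;  // writes through the returned reference
    assert(row.key == 9 && row.payload == 42);
}
```

With this layout a caller would instantiate the two-parameter `radixSort` with the row type as the element and `std::uint32_t` as the key.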
--- a/dbms/src/Common/ThreadStatus.cpp
+++ b/dbms/src/Common/ThreadStatus.cpp
@ -7,7 +7,7 @@
 #include <Common/ThreadStatus.h>
 #include <Poco/Logger.h>
-#include <Poco/Ext/ThreadNumber.h>
+#include <common/getThreadNumber.h>
 namespace DB
@ -33,7 +33,7 @@ TasksStatsCounters TasksStatsCounters::current()
 ThreadStatus::ThreadStatus()
 {
-    thread_number = Poco::ThreadNumber::get();
+    thread_number = getThreadNumber();
    os_thread_id = TaskStatsInfoGetter::getCurrentTID();
    last_rusage = std::make_unique<RUsageCounters>();
--- a/dbms/src/Common/config.h.in
+++ b/dbms/src/Common/config.h.in
@ -25,6 +25,8 @@
 #cmakedefine01 USE_BROTLI
 #cmakedefine01 USE_SSL
 #cmakedefine01 USE_HYPERSCAN
 #cmakedefine01 USE_LFALLOC
 #cmakedefine01 USE_LFALLOC_RANDOM_HINT
 #cmakedefine01 CLICKHOUSE_SPLIT_BINARY
 #cmakedefine01 LLVM_HAS_RTTI
--- a/dbms/src/Core/AccurateComparison.h
+++ b/dbms/src/Core/AccurateComparison.h
@ -426,6 +426,21 @@ inline bool_if_safe_conversion<A, B> greaterOrEqualsOp(A a, B b)
    return a >= b;
 }
 /// Converts numeric to an equal numeric of other type.
 template <typename From, typename To>
 inline bool NO_SANITIZE_UNDEFINED convertNumeric(From value, To & result)
 {
    /// Note that NaNs don't compare equal to anything, but they are still in the range of any Float type.
    if (isNaN(value) && std::is_floating_point_v<To>)
    {
        result = value;
        return true;
    }
    result = static_cast<To>(value);
    return equalsOp(value, result);
 }
 }
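`convertNumeric` converts and then checks that the result still equals the input. A narrowed sketch of that idea for one concrete pair of types follows. `ConvertDoubleToInt32` is a hypothetical name; it adds an explicit range guard so the cast stays well defined, whereas the diff relies on `NO_SANITIZE_UNDEFINED` and `equalsOp`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Returns true only when `value` is exactly representable as int32_t.
// Simplified stand-in for convertNumeric<double, Int32>; not the code from the diff.
inline bool ConvertDoubleToInt32(double value, std::int32_t& result) {
    if (std::isnan(value))
        return false;  // NaN has no integer equivalent
    // Both bounds are exactly representable as double, so this guard is precise.
    if (value < -2147483648.0 || value > 2147483647.0)
        return false;
    result = static_cast<std::int32_t>(value);  // truncates toward zero
    return static_cast<double>(result) == value;  // false if a fraction was dropped
}

inline void ExampleConvert() {
    std::int32_t out = 0;
    assert(ConvertDoubleToInt32(3.0, out) && out == 3);
    assert(!ConvertDoubleToInt32(2.5, out));
}
```

The generic version in the diff covers arbitrary numeric pairs, where a plain `==` after the cast would be wrong for mixed signed and unsigned operands. That is why it compares with `equalsOp`.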
--- a/dbms/src/Core/Settings.h
+++ b/dbms/src/Core/Settings.h
@ -93,9 +93,12 @@ struct Settings
    M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.") \
    \
    M(SettingUInt64, merge_tree_min_rows_for_concurrent_read, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized.") \
    M(SettingUInt64, merge_tree_min_bytes_for_concurrent_read, (100 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized.") \
    M(SettingUInt64, merge_tree_min_rows_for_seek, 0, "You can skip reading more than that number of rows at the price of one seek per file.") \
    M(SettingUInt64, merge_tree_min_bytes_for_seek, 0, "You can skip reading more than that number of bytes at the price of one seek per file.") \
    M(SettingUInt64, merge_tree_coarse_index_granularity, 8, "If the index segment can contain the required keys, divide it into as many parts and recursively check them.") \
    M(SettingUInt64, merge_tree_max_rows_to_use_cache, (1024 * 1024), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
    M(SettingUInt64, merge_tree_max_bytes_to_use_cache, (600 * 1024 * 1024), "The maximum number of bytes per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
    \
    M(SettingBool, merge_tree_uniform_read_distribution, true, "Distribute read from MergeTree over threads evenly, ensuring stable average execution time of each thread within one read operation.") \
    \
--- a/dbms/src/DataStreams/BlocksListBlockInputStream.h
+++ b/dbms/src/DataStreams/BlocksListBlockInputStream.h
@ -23,6 +23,8 @@ public:
    String getName() const override { return "BlocksList"; }
 protected:
    Block getHeader() const override { return list.empty() ? Block() : *list.begin(); }
    Block readImpl() override
    {
        if (it == end)
--- a/dbms/src/DataStreams/CollapsingSortedBlockInputStream.cpp
+++ b/dbms/src/DataStreams/CollapsingSortedBlockInputStream.cpp
@ -40,7 +40,7 @@ void CollapsingSortedBlockInputStream::reportIncorrectData()
 }
-void CollapsingSortedBlockInputStream::insertRows(MutableColumns & merged_columns, size_t & merged_rows)
+void CollapsingSortedBlockInputStream::insertRows(MutableColumns & merged_columns, size_t block_size, MergeStopCondition & condition)
 {
    if (count_positive == 0 && count_negative == 0)
    {
@ -52,7 +52,7 @@ void CollapsingSortedBlockInputStream::insertRows(MutableColumns & merged_column
    {
        if (count_positive <= count_negative)
        {
-            ++merged_rows;
+            condition.addRowWithGranularity(block_size);
            for (size_t i = 0; i < num_columns; ++i)
                merged_columns[i]->insertFrom(*(*first_negative.columns)[i], first_negative.row_num);
@ -62,7 +62,7 @@ void CollapsingSortedBlockInputStream::insertRows(MutableColumns & merged_column
        if (count_positive >= count_negative)
        {
-            ++merged_rows;
+            condition.addRowWithGranularity(block_size);
            for (size_t i = 0; i < num_columns; ++i)
                merged_columns[i]->insertFrom(*(*last_positive.columns)[i], last_positive.row_num);
@ -106,12 +106,14 @@ Block CollapsingSortedBlockInputStream::readImpl()
 void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue<SortCursor> & queue)
 {
    size_t merged_rows = 0;
    MergeStopCondition stop_condition(average_block_sizes, max_block_size);
    size_t current_block_granularity;
    /// Take rows in the correct order and put them into `merged_columns` while there are no more than `max_block_size` rows
    for (; !queue.empty(); ++current_pos)
    {
        SortCursor current = queue.top();
        current_block_granularity = current->rows;
        if (current_key.empty())
            setPrimaryKeyRef(current_key, current);
@ -122,7 +124,7 @@ void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, st
        bool key_differs = next_key != current_key;
        /// if there are enough rows and the last one is calculated completely
-        if (key_differs && merged_rows >= max_block_size)
+        if (key_differs && stop_condition.checkStop())
        {
            ++blocks_written;
            return;
@ -133,7 +135,7 @@ void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, st
        if (key_differs)
        {
            /// We write data for the previous primary key.
-            insertRows(merged_columns, merged_rows);
+            insertRows(merged_columns, current_block_granularity, stop_condition);
            current_key.swap(next_key);
@ -167,7 +169,7 @@ void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, st
                first_negative_pos = current_pos;
            }
-            if (!blocks_written && !merged_rows)
+            if (!blocks_written && stop_condition.empty())
            {
                setRowRef(last_negative, current);
                last_negative_pos = current_pos;
@ -193,7 +195,7 @@ void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, st
    }
    /// Write data for last primary key.
-    insertRows(merged_columns, merged_rows);
+    insertRows(merged_columns, /*some_granularity*/ 0, stop_condition);
    finished = true;
 }
--- a/dbms/src/DataStreams/CollapsingSortedBlockInputStream.h
+++ b/dbms/src/DataStreams/CollapsingSortedBlockInputStream.h
@ -26,8 +26,9 @@ class CollapsingSortedBlockInputStream : public MergingSortedBlockInputStream
 public:
    CollapsingSortedBlockInputStream(
            BlockInputStreams inputs_, const SortDescription & description_,
-            const String & sign_column, size_t max_block_size_, WriteBuffer * out_row_sources_buf_ = nullptr)
+            const String & sign_column, size_t max_block_size_,
-        : MergingSortedBlockInputStream(inputs_, description_, max_block_size_, 0, out_row_sources_buf_)
+            WriteBuffer * out_row_sources_buf_ = nullptr, bool average_block_sizes_ = false)
        : MergingSortedBlockInputStream(inputs_, description_, max_block_size_, 0, out_row_sources_buf_, false, average_block_sizes_)
    {
        sign_column_number = header.getPositionByName(sign_column);
    }
@ -75,7 +76,7 @@ private:
    void merge(MutableColumns & merged_columns, std::priority_queue<SortCursor> & queue);
    /// Output to result rows for the current primary key.
-    void insertRows(MutableColumns & merged_columns, size_t & merged_rows);
+    void insertRows(MutableColumns & merged_columns, size_t block_size, MergeStopCondition & condition);
    void reportIncorrectData();
 };
--- a/dbms/src/DataStreams/MarkInCompressedFile.h
+++ b/dbms/src/DataStreams/MarkInCompressedFile.h
@ -6,6 +6,10 @@
 #include <IO/WriteHelpers.h>
 #include <Common/PODArray.h>
 #include <Common/config.h>
 #if USE_LFALLOC
 #include <Common/LFAllocator.h>
 #endif
 namespace DB
 {
@ -32,8 +36,16 @@ struct MarkInCompressedFile
    {
        return "(" + DB::toString(offset_in_compressed_file) + "," + DB::toString(offset_in_decompressed_block) + ")";
    }
 };
 using MarksInCompressedFile = PODArray<MarkInCompressedFile>;
    String toStringWithRows(size_t rows_num)
    {
        return "(" + DB::toString(offset_in_compressed_file) + "," + DB::toString(offset_in_decompressed_block) + "," + DB::toString(rows_num) + ")";
    }
 };
 #if USE_LFALLOC
 using MarksInCompressedFile = PODArray<MarkInCompressedFile, 4096, LFAllocator>;
 #else
 using MarksInCompressedFile = PODArray<MarkInCompressedFile>;
 #endif
 }
--- a/dbms/src/DataStreams/MergingSortedBlockInputStream.cpp
+++ b/dbms/src/DataStreams/MergingSortedBlockInputStream.cpp
@ -18,9 +18,10 @@ namespace ErrorCodes
 MergingSortedBlockInputStream::MergingSortedBlockInputStream(
    const BlockInputStreams & inputs_, const SortDescription & description_,
-    size_t max_block_size_, UInt64 limit_, WriteBuffer * out_row_sources_buf_, bool quiet_)
+    size_t max_block_size_, UInt64 limit_, WriteBuffer * out_row_sources_buf_, bool quiet_, bool average_block_sizes_)
    : description(description_), max_block_size(max_block_size_), limit(limit_), quiet(quiet_)
-    , source_blocks(inputs_.size()), cursors(inputs_.size()), out_row_sources_buf(out_row_sources_buf_)
+    , average_block_sizes(average_block_sizes_), source_blocks(inputs_.size())
+    , cursors(inputs_.size()), out_row_sources_buf(out_row_sources_buf_)
 {
    children.insert(children.end(), inputs_.begin(), inputs_.end());
    header = children.at(0)->getHeader();
@ -116,7 +117,7 @@ Block MergingSortedBlockInputStream::readImpl()
 template <typename TSortCursor>
 void MergingSortedBlockInputStream::fetchNextBlock(const TSortCursor & current, std::priority_queue<TSortCursor> & queue)
 {
-    size_t order = current.impl->order;
+    size_t order = current->order;
    size_t size = cursors.size();
    if (order >= size || &cursors[order] != current.impl)
@ -132,6 +133,19 @@ void MergingSortedBlockInputStream::fetchNextBlock(const TSortCursor & current,
    }
 }
 bool MergingSortedBlockInputStream::MergeStopCondition::checkStop() const
 {
    if (!count_average)
        return sum_rows_count == max_block_size;
    if (sum_rows_count == 0)
        return false;
    size_t average = sum_blocks_granularity / sum_rows_count;
    return sum_rows_count >= average;
 }
 template
 void MergingSortedBlockInputStream::fetchNextBlock<SortCursor>(const SortCursor & current, std::priority_queue<SortCursor> & queue);
@ -144,10 +158,11 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
 {
    size_t merged_rows = 0;
    MergeStopCondition stop_condition(average_block_sizes, max_block_size);
    /** Increase row counters.
      * Return true if it's time to finish generating the current data block.
      */
-    auto count_row_and_check_limit = [&, this]()
+    auto count_row_and_check_limit = [&, this](size_t current_granularity)
    {
        ++total_merged_rows;
        if (limit && total_merged_rows == limit)
@ -159,19 +174,15 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
        }
        ++merged_rows;
-        if (merged_rows == max_block_size)
-        {
-    //        std::cerr << "max_block_size reached\n";
-            return true;
-        }
-        return false;
+        stop_condition.addRowWithGranularity(current_granularity);
+        return stop_condition.checkStop();
    };
    };
    /// Take rows in required order and put them into `merged_columns`, while the rows are no more than `max_block_size`
    while (!queue.empty())
    {
        TSortCursor current = queue.top();
        size_t current_block_granularity = current->rows;
        queue.pop();
        while (true)
@ -179,7 +190,7 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
            /** And what if the block is totally less or equal than the rest for the current cursor?
              * Or is there only one data source left in the queue? Then you can take the entire block on current cursor.
              */
-            if (current.impl->isFirst() && (queue.empty() || current.totallyLessOrEquals(queue.top())))
+            if (current->isFirst() && (queue.empty() || current.totallyLessOrEquals(queue.top())))
            {
    //            std::cerr << "current block is totally less or equals\n";
@ -191,8 +202,8 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
                    return;
                }
-                /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl)
+                /// Actually, current->order stores source number (i.e. cursors[current->order] == current)
-                size_t source_num = current.impl->order;
+                size_t source_num = current->order;
                if (source_num >= cursors.size())
                    throw Exception("Logical error in MergingSortedBlockInputStream", ErrorCodes::LOGICAL_ERROR);
@ -204,6 +215,7 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
                merged_rows = merged_columns.at(0)->size();
                /// Limit output
                if (limit && total_merged_rows + merged_rows > limit)
                {
                    merged_rows = limit - total_merged_rows;
@ -217,6 +229,8 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
                    finished = true;
                }
                /// Write order of rows for other columns
                /// this data will be used in gather stream
                if (out_row_sources_buf)
                {
                    RowSourcePart row_source(source_num);
@ -239,7 +253,7 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
            if (out_row_sources_buf)
            {
                /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl)
-                RowSourcePart row_source(current.impl->order);
+                RowSourcePart row_source(current->order);
                out_row_sources_buf->write(row_source.data);
            }
@ -250,7 +264,7 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
                if (queue.empty() || !(current.greater(queue.top())))
                {
-                    if (count_row_and_check_limit())
+                    if (count_row_and_check_limit(current_block_granularity))
                    {
    //                    std::cerr << "pushing back to queue\n";
                        queue.push(current);
@ -277,7 +291,7 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
            break;
        }
-        if (count_row_and_check_limit())
+        if (count_row_and_check_limit(current_block_granularity))
            return;
    }
--- a/dbms/src/DataStreams/MergingSortedBlockInputStream.h
+++ b/dbms/src/DataStreams/MergingSortedBlockInputStream.h
@ -68,7 +68,7 @@ public:
      */
    MergingSortedBlockInputStream(
        const BlockInputStreams & inputs_, const SortDescription & description_, size_t max_block_size_,
-        UInt64 limit_ = 0, WriteBuffer * out_row_sources_buf_ = nullptr, bool quiet_ = false);
+        UInt64 limit_ = 0, WriteBuffer * out_row_sources_buf_ = nullptr, bool quiet_ = false, bool average_block_sizes_ = false);
    String getName() const override { return "MergingSorted"; }
@ -116,6 +116,38 @@ protected:
        size_t size() const { return empty() ? 0 : columns->size(); }
    };
    /// Simple class that checks the stop condition during the merge process.
    /// In the simple case it just compares the number of merged rows with max_block_size;
    /// in the `count_average` case it compares the number of merged rows with the average
    /// size of the blocks from which these rows were taken.
    struct MergeStopCondition
    {
        size_t sum_blocks_granularity = 0;
        size_t sum_rows_count = 0;
        bool count_average;
        size_t max_block_size;
        MergeStopCondition(bool count_average_, size_t max_block_size_)
            : count_average(count_average_)
            , max_block_size(max_block_size_)
        {}
        /// Add a single row taken from a block of size `granularity`
        void addRowWithGranularity(size_t granularity)
        {
            sum_blocks_granularity += granularity;
            sum_rows_count++;
        }
        /// Check whether enough rows have been accumulated to finish the current block
        bool checkStop() const;
        bool empty() const
        {
            return sum_blocks_granularity == 0;
        }
    };
    Block readImpl() override;
@ -139,6 +171,7 @@ protected:
    bool first = true;
    bool has_collation = false;
    bool quiet = false;
    bool average_block_sizes = false;
    /// May be smaller or equal to max_block_size. To do 'reserve' for columns.
    size_t expected_block_size = 0;
--- a/dbms/src/DataStreams/ReplacingSortedBlockInputStream.cpp
+++ b/dbms/src/DataStreams/ReplacingSortedBlockInputStream.cpp
@ -12,7 +12,7 @@ namespace ErrorCodes
 }
-void ReplacingSortedBlockInputStream::insertRow(MutableColumns & merged_columns, size_t & merged_rows)
+void ReplacingSortedBlockInputStream::insertRow(MutableColumns & merged_columns)
 {
    if (out_row_sources_buf)
    {
@ -24,7 +24,6 @@ void ReplacingSortedBlockInputStream::insertRow(MutableColumns & merged_columns,
        current_row_sources.resize(0);
    }
-    ++merged_rows;
    for (size_t i = 0; i < num_columns; ++i)
        merged_columns[i]->insertFrom(*(*selected_row.columns)[i], selected_row.row_num);
 }
@ -51,12 +50,12 @@ Block ReplacingSortedBlockInputStream::readImpl()
 void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue<SortCursor> & queue)
 {
-    size_t merged_rows = 0;
+    MergeStopCondition stop_condition(average_block_sizes, max_block_size);
    /// Take the rows in needed order and put them into `merged_columns` until rows no more than `max_block_size`
    while (!queue.empty())
    {
        SortCursor current = queue.top();
        size_t current_block_granularity = current->rows;
        if (current_key.empty())
            setPrimaryKeyRef(current_key, current);
@ -66,7 +65,7 @@ void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, std
        bool key_differs = next_key != current_key;
        /// if there are enough rows and the last one is calculated completely
-        if (key_differs && merged_rows >= max_block_size)
+        if (key_differs && stop_condition.checkStop())
            return;
        queue.pop();
@ -74,7 +73,8 @@ void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, std
        if (key_differs)
        {
            /// Write the data for the previous primary key.
-            insertRow(merged_columns, merged_rows);
+            insertRow(merged_columns);
            stop_condition.addRowWithGranularity(current_block_granularity);
            selected_row.reset();
            current_key.swap(next_key);
        }
@ -110,7 +110,7 @@ void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, std
    /// We will write the data for the last primary key.
    if (!selected_row.empty())
-        insertRow(merged_columns, merged_rows);
+        insertRow(merged_columns);
    finished = true;
 }
--- a/dbms/src/DataStreams/ReplacingSortedBlockInputStream.h
+++ b/dbms/src/DataStreams/ReplacingSortedBlockInputStream.h
@ -18,8 +18,9 @@ class ReplacingSortedBlockInputStream : public MergingSortedBlockInputStream
 public:
    ReplacingSortedBlockInputStream(
        const BlockInputStreams & inputs_, const SortDescription & description_,
-        const String & version_column, size_t max_block_size_, WriteBuffer * out_row_sources_buf_ = nullptr)
-        const String & version_column, size_t max_block_size_, WriteBuffer * out_row_sources_buf_ = nullptr)
-        : MergingSortedBlockInputStream(inputs_, description_, max_block_size_, 0, out_row_sources_buf_)
+        const String & version_column, size_t max_block_size_, WriteBuffer * out_row_sources_buf_ = nullptr,
+        bool average_block_sizes_ = false)
+        : MergingSortedBlockInputStream(inputs_, description_, max_block_size_, 0, out_row_sources_buf_, false, average_block_sizes_)
    {
        if (!version_column.empty())
            version_column_number = header.getPositionByName(version_column);
@ -54,7 +55,7 @@ private:
    void merge(MutableColumns & merged_columns, std::priority_queue<SortCursor> & queue);
    /// Output into result the rows for current primary key.
-    void insertRow(MutableColumns & merged_columns, size_t & merged_rows);
+    void insertRow(MutableColumns & merged_columns);
 };
 }
--- a/dbms/src/DataStreams/TTLBlockInputStream.cpp
+++ b/dbms/src/DataStreams/TTLBlockInputStream.cpp
@ -0,0 +1,208 @@
 #include <DataStreams/TTLBlockInputStream.h>
 #include <DataTypes/DataTypeDate.h>
 #include <Interpreters/evaluateMissingDefaults.h>
 #include <Interpreters/SyntaxAnalyzer.h>
 #include <Interpreters/ExpressionAnalyzer.h>
 namespace DB
 {
 namespace ErrorCodes
 {
    extern const int LOGICAL_ERROR;
 }
 TTLBlockInputStream::TTLBlockInputStream(
    const BlockInputStreamPtr & input_,
    const MergeTreeData & storage_,
    const MergeTreeData::MutableDataPartPtr & data_part_,
    time_t current_time_)
    : storage(storage_)
    , data_part(data_part_)
    , current_time(current_time_)
    , old_ttl_infos(data_part->ttl_infos)
    , log(&Logger::get(storage.getLogName() + " (TTLBlockInputStream)"))
    , date_lut(DateLUT::instance())
 {
    children.push_back(input_);
    const auto & column_defaults = storage.getColumns().getDefaults();
    ASTPtr default_expr_list = std::make_shared<ASTExpressionList>();
    for (const auto & [name, ttl_info] : old_ttl_infos.columns_ttl)
    {
        if (ttl_info.min <= current_time)
        {
            new_ttl_infos.columns_ttl.emplace(name, MergeTreeDataPart::TTLInfo{});
            empty_columns.emplace(name);
            auto it = column_defaults.find(name);
            if (it != column_defaults.end())
                default_expr_list->children.emplace_back(
                    setAlias(it->second.expression, it->first));
        }
        else
            new_ttl_infos.columns_ttl.emplace(name, ttl_info);
    }
    if (old_ttl_infos.table_ttl.min > current_time)
        new_ttl_infos.table_ttl = old_ttl_infos.table_ttl;
    if (!default_expr_list->children.empty())
    {
        auto syntax_result = SyntaxAnalyzer(storage.global_context).analyze(
            default_expr_list, storage.getColumns().getAllPhysical());
        defaults_expression = ExpressionAnalyzer{default_expr_list, syntax_result, storage.global_context}.getActions(true);
    }
 }
 Block TTLBlockInputStream::getHeader() const
 {
    return children.at(0)->getHeader();
 }
 Block TTLBlockInputStream::readImpl()
 {
    Block block = children.at(0)->read();
    if (!block)
        return block;
    if (storage.hasTableTTL())
    {
        /// Skip all data if table ttl is expired for part
        if (old_ttl_infos.table_ttl.max <= current_time)
        {
            rows_removed = data_part->rows_count;
            return {};
        }
        if (old_ttl_infos.table_ttl.min <= current_time)
            removeRowsWithExpiredTableTTL(block);
    }
    removeValuesWithExpiredColumnTTL(block);
    return block;
 }
 void TTLBlockInputStream::readSuffixImpl()
 {
    for (const auto & elem : new_ttl_infos.columns_ttl)
        new_ttl_infos.updatePartMinTTL(elem.second.min);
    new_ttl_infos.updatePartMinTTL(new_ttl_infos.table_ttl.min);
    data_part->ttl_infos = std::move(new_ttl_infos);
    data_part->empty_columns = std::move(empty_columns);
    if (rows_removed)
        LOG_INFO(log, "Removed " << rows_removed << " rows with expired ttl from part " << data_part->name);
 }
 void TTLBlockInputStream::removeRowsWithExpiredTableTTL(Block & block)
 {
    storage.ttl_table_entry.expression->execute(block);
    const auto & current = block.getByName(storage.ttl_table_entry.result_column);
    const IColumn * ttl_column = current.column.get();
    MutableColumns result_columns;
    result_columns.reserve(getHeader().columns());
    for (const auto & name : storage.getColumns().getNamesOfPhysical())
    {
        auto & column_with_type = block.getByName(name);
        const IColumn * values_column = column_with_type.column.get();
        MutableColumnPtr result_column = values_column->cloneEmpty();
        result_column->reserve(block.rows());
        for (size_t i = 0; i < block.rows(); ++i)
        {
            UInt32 cur_ttl = getTimestampByIndex(ttl_column, i);
            if (cur_ttl > current_time)
            {
                new_ttl_infos.table_ttl.update(cur_ttl);
                result_column->insertFrom(*values_column, i);
            }
            else
                ++rows_removed;
        }
        result_columns.emplace_back(std::move(result_column));
    }
    block = getHeader().cloneWithColumns(std::move(result_columns));
 }
 void TTLBlockInputStream::removeValuesWithExpiredColumnTTL(Block & block)
 {
    Block block_with_defaults;
    if (defaults_expression)
    {
        block_with_defaults = block;
        defaults_expression->execute(block_with_defaults);
    }
    for (const auto & [name, ttl_entry] : storage.ttl_entries_by_name)
    {
        const auto & old_ttl_info = old_ttl_infos.columns_ttl[name];
        auto & new_ttl_info = new_ttl_infos.columns_ttl[name];
        if (old_ttl_info.min > current_time)
            continue;
        if (old_ttl_info.max <= current_time)
            continue;
        if (!block.has(ttl_entry.result_column))
            ttl_entry.expression->execute(block);
        ColumnPtr default_column = nullptr;
        if (block_with_defaults.has(name))
            default_column = block_with_defaults.getByName(name).column->convertToFullColumnIfConst();
        auto & column_with_type = block.getByName(name);
        const IColumn * values_column = column_with_type.column.get();
        MutableColumnPtr result_column = values_column->cloneEmpty();
        result_column->reserve(block.rows());
        const auto & current = block.getByName(ttl_entry.result_column);
        const IColumn * ttl_column = current.column.get();
        for (size_t i = 0; i < block.rows(); ++i)
        {
            UInt32 cur_ttl = getTimestampByIndex(ttl_column, i);
            if (cur_ttl <= current_time)
            {
                if (default_column)
                    result_column->insertFrom(*default_column, i);
                else
                    result_column->insertDefault();
            }
            else
            {
                new_ttl_info.update(cur_ttl);
                empty_columns.erase(name);
                result_column->insertFrom(*values_column, i);
            }
        }
        column_with_type.column = std::move(result_column);
    }
    for (const auto & elem : storage.ttl_entries_by_name)
        if (block.has(elem.second.result_column))
            block.erase(elem.second.result_column);
 }
 UInt32 TTLBlockInputStream::getTimestampByIndex(const IColumn * column, size_t ind)
 {
    if (const ColumnUInt16 * column_date = typeid_cast<const ColumnUInt16 *>(column))
        return date_lut.fromDayNum(DayNum(column_date->getData()[ind]));
    else if (const ColumnUInt32 * column_date_time = typeid_cast<const ColumnUInt32 *>(column))
        return column_date_time->getData()[ind];
    else
        throw Exception("Unexpected type of result ttl column", ErrorCodes::LOGICAL_ERROR);
 }
 }
--- a/dbms/src/DataStreams/TTLBlockInputStream.h
+++ b/dbms/src/DataStreams/TTLBlockInputStream.h
@ -0,0 +1,60 @@
 #pragma once
 #include <DataStreams/IBlockInputStream.h>
 #include <Storages/MergeTree/MergeTreeData.h>
 #include <Storages/MergeTree/MergeTreeDataPart.h>
 #include <Core/Block.h>
 #include <common/DateLUT.h>
 namespace DB
 {
 class TTLBlockInputStream : public IBlockInputStream
 {
 public:
    TTLBlockInputStream(
        const BlockInputStreamPtr & input_,
        const MergeTreeData & storage_,
        const MergeTreeData::MutableDataPartPtr & data_part_,
        time_t current_time
    );
    String getName() const override { return "TTLBlockInputStream"; }
    Block getHeader() const override;
 protected:
    Block readImpl() override;
    /// Finalizes ttl infos and updates data part
    void readSuffixImpl() override;
 private:
    const MergeTreeData & storage;
    /// ttl_infos and empty_columns are updated while reading
    const MergeTreeData::MutableDataPartPtr & data_part;
    time_t current_time;
    MergeTreeDataPart::TTLInfos old_ttl_infos;
    MergeTreeDataPart::TTLInfos new_ttl_infos;
    NameSet empty_columns;
    size_t rows_removed = 0;
    Logger * log;
    DateLUTImpl date_lut;
    std::unordered_map<String, String> defaults_result_column;
    ExpressionActionsPtr defaults_expression;
 private:
    /// Removes values with expired ttl and computes new min_ttl and empty_columns for part
    void removeValuesWithExpiredColumnTTL(Block & block);
    /// Removes rows with expired table ttl and computes new min_ttl for part
    void removeRowsWithExpiredTableTTL(Block & block);
    UInt32 getTimestampByIndex(const IColumn * column, size_t ind);
 };
 }
--- a/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.cpp
+++ b/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.cpp
@ -16,8 +16,8 @@ namespace ErrorCodes
 VersionedCollapsingSortedBlockInputStream::VersionedCollapsingSortedBlockInputStream(
    const BlockInputStreams & inputs_, const SortDescription & description_,
    const String & sign_column_, size_t max_block_size_,
-    WriteBuffer * out_row_sources_buf_)
+    WriteBuffer * out_row_sources_buf_, bool average_block_sizes_)
-    : MergingSortedBlockInputStream(inputs_, description_, max_block_size_, 0, out_row_sources_buf_)
+    : MergingSortedBlockInputStream(inputs_, description_, max_block_size_, 0, out_row_sources_buf_, false, average_block_sizes_)
    , max_rows_in_queue(std::min(std::max<size_t>(3, max_block_size_), MAX_ROWS_IN_MULTIVERSION_QUEUE) - 2)
    , current_keys(max_rows_in_queue + 1)
 {
@ -83,7 +83,7 @@ Block VersionedCollapsingSortedBlockInputStream::readImpl()
 void VersionedCollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue<SortCursor> & queue)
 {
-    size_t merged_rows = 0;
+    MergeStopCondition stop_condition(average_block_sizes, max_block_size);
    auto update_queue = [this, & queue](SortCursor & cursor)
    {
@ -108,6 +108,7 @@ void VersionedCollapsingSortedBlockInputStream::merge(MutableColumns & merged_co
    while (!queue.empty())
    {
        SortCursor current = queue.top();
        size_t current_block_granularity = current->rows;
        RowRef next_key;
@ -154,10 +155,10 @@ void VersionedCollapsingSortedBlockInputStream::merge(MutableColumns & merged_co
            current_keys.popFront();
-            ++merged_rows;
+            stop_condition.addRowWithGranularity(current_block_granularity);
            --rows_to_merge;
-            if (merged_rows >= max_block_size)
+            if (stop_condition.checkStop())
            {
                ++blocks_written;
                return;
@ -173,7 +174,6 @@ void VersionedCollapsingSortedBlockInputStream::merge(MutableColumns & merged_co
        insertRow(gap, row, merged_columns);
        current_keys.popFront();
-        ++merged_rows;
    }
    /// Write information about last collapsed rows.
--- a/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.h
+++ b/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.h
@ -178,7 +178,7 @@ public:
    VersionedCollapsingSortedBlockInputStream(
        const BlockInputStreams & inputs_, const SortDescription & description_,
        const String & sign_column_, size_t max_block_size_,
-        WriteBuffer * out_row_sources_buf_ = nullptr);
+        WriteBuffer * out_row_sources_buf_ = nullptr, bool average_block_sizes_ = false);
    String getName() const override { return "VersionedCollapsingSorted"; }
--- a/dbms/src/DataStreams/tests/gtest_blocks_size_merging_streams.cpp
+++ b/dbms/src/DataStreams/tests/gtest_blocks_size_merging_streams.cpp
@ -0,0 +1,138 @@
 #pragma GCC diagnostic ignored "-Wsign-compare"
 #ifdef __clang__
 #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
 #pragma clang diagnostic ignored "-Wundef"
 #endif
 #include <gtest/gtest.h>
 #include <Core/Block.h>
 #include <Columns/ColumnVector.h>
 #include <DataStreams/MergingSortedBlockInputStream.h>
 #include <DataStreams/BlocksListBlockInputStream.h>
 #include <DataTypes/DataTypesNumber.h>
 #include <Columns/ColumnsNumber.h>
 using namespace DB;
 Block getBlockWithSize(const std::vector<std::string> & columns, size_t rows, size_t stride, size_t & start)
 {
    ColumnsWithTypeAndName cols;
    size_t size_of_row_in_bytes = columns.size() * sizeof(UInt64);
    for (size_t i = 0; i * sizeof(UInt64) < size_of_row_in_bytes; i++)
    {
        auto column = ColumnUInt64::create(rows, 0);
        for (size_t j = 0; j < rows; ++j)
        {
            column->getElement(j) = start;
            start += stride;
        }
        cols.emplace_back(std::move(column), std::make_shared<DataTypeUInt64>(), columns[i]);
    }
    return Block(cols);
 }
 BlockInputStreams getInputStreams(const std::vector<std::string> & column_names, const std::vector<std::tuple<size_t, size_t, size_t>> & block_sizes)
 {
    BlockInputStreams result;
    for (auto [block_size_in_bytes, blocks_count, stride] : block_sizes)
    {
        BlocksList blocks;
        size_t start = stride;
        while (blocks_count--)
            blocks.push_back(getBlockWithSize(column_names, block_size_in_bytes, stride, start));
        result.push_back(std::make_shared<BlocksListBlockInputStream>(std::move(blocks)));
    }
    return result;
 }
 BlockInputStreams getInputStreamsEqualStride(const std::vector<std::string> & column_names, const std::vector<std::tuple<size_t, size_t, size_t>> & block_sizes)
 {
    BlockInputStreams result;
    size_t i = 0;
    for (auto [block_size_in_bytes, blocks_count, stride] : block_sizes)
    {
        BlocksList blocks;
        size_t start = i;
        while (blocks_count--)
            blocks.push_back(getBlockWithSize(column_names, block_size_in_bytes, stride, start));
        result.push_back(std::make_shared<BlocksListBlockInputStream>(std::move(blocks)));
        i++;
    }
    return result;
 }
 SortDescription getSortDescription(const std::vector<std::string> & column_names)
 {
    SortDescription descr;
    for (const auto & column : column_names)
    {
        descr.emplace_back(column, 1, 1);
    }
    return descr;
 }
 TEST(MergingSortedTest, SimpleBlockSizeTest)
 {
    std::vector<std::string> key_columns{"K1", "K2", "K3"};
    auto sort_description = getSortDescription(key_columns);
    auto streams = getInputStreams(key_columns, {{5, 1, 1}, {10, 1, 2}, {21, 1, 3}});
    EXPECT_EQ(streams.size(), 3);
    MergingSortedBlockInputStream stream(streams, sort_description, DEFAULT_MERGE_BLOCK_SIZE, 0, nullptr, false, true);
    size_t total_rows = 0;
    auto block1 = stream.read();
    auto block2 = stream.read();
    auto block3 = stream.read();
    EXPECT_EQ(stream.read(), Block());
    for (auto & block : {block1, block2, block3})
        total_rows += block.rows();
    /**
      * First block consists of 1 row from block3 with 21 rows + 2 rows from block2 with 10 rows
      * + 5 rows from block 1 with 5 rows granularity
      */
    EXPECT_EQ(block1.rows(), 8);
    /**
      * Combination of 10 and 21 rows blocks
      */
    EXPECT_EQ(block2.rows(), 14);
    /**
      * Combination of 10 and 21 rows blocks
      */
    EXPECT_EQ(block3.rows(), 14);
    EXPECT_EQ(total_rows, 5 + 10 + 21);
 }
 TEST(MergingSortedTest, MoreInterestingBlockSizes)
 {
    std::vector<std::string> key_columns{"K1", "K2", "K3"};
    auto sort_description = getSortDescription(key_columns);
    auto streams = getInputStreamsEqualStride(key_columns, {{1000, 1, 3}, {1500, 1, 3}, {1400, 1, 3}});
    EXPECT_EQ(streams.size(), 3);
    MergingSortedBlockInputStream stream(streams, sort_description, DEFAULT_MERGE_BLOCK_SIZE, 0, nullptr, false, true);
    auto block1 = stream.read();
    auto block2 = stream.read();
    auto block3 = stream.read();
    EXPECT_EQ(stream.read(), Block());
    EXPECT_EQ(block1.rows(), (1000 + 1500 + 1400) / 3);
    EXPECT_EQ(block2.rows(), (1000 + 1500 + 1400) / 3);
    EXPECT_EQ(block3.rows(), (1000 + 1500 + 1400) / 3);
    EXPECT_EQ(block1.rows() + block2.rows() + block3.rows(), 1000 + 1500 + 1400);
 }
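The expected size of 8 rows for the first block in `SimpleBlockSizeTest` can be checked by hand with a standalone sketch of the averaging stop condition. This sketch is not part of the commit: `StopConditionSketch` and `rowsUntilStop` are hypothetical names, the logic mirrors `MergeStopCondition::checkStop` as it appears in this diff, and the whole-block fast path in `merge` is deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// Standalone copy of the stop condition, for illustration only (hypothetical name).
struct StopConditionSketch
{
    size_t sum_blocks_granularity = 0;
    size_t sum_rows_count = 0;
    bool count_average;
    size_t max_block_size;

    StopConditionSketch(bool count_average_, size_t max_block_size_)
        : count_average(count_average_), max_block_size(max_block_size_)
    {}

    /// Account for one merged row that came from a source block of `granularity` rows.
    void addRowWithGranularity(size_t granularity)
    {
        sum_blocks_granularity += granularity;
        ++sum_rows_count;
    }

    /// Same arithmetic as checkStop() in the diff: stop once the number of merged rows
    /// reaches the (integer) average size of the blocks the rows were taken from.
    bool checkStop() const
    {
        if (!count_average)
            return sum_rows_count == max_block_size;
        if (sum_rows_count == 0)
            return false;
        size_t average = sum_blocks_granularity / sum_rows_count;
        return sum_rows_count >= average;
    }
};

/// Hypothetical helper: feed rows one by one and return how many rows were taken
/// when the condition first fires (or the total number of rows if it never fires).
size_t rowsUntilStop(const std::vector<size_t> & granularities, bool count_average, size_t max_block_size)
{
    StopConditionSketch condition(count_average, max_block_size);
    for (size_t granularity : granularities)
    {
        condition.addRowWithGranularity(granularity);
        if (condition.checkStop())
            return condition.sum_rows_count;
    }
    return condition.sum_rows_count;
}
```

Feeding the granularities of the first merged rows in key order (5, 5, 10, 5, 21, 5, 10, 5) gives a running sum of 66 after eight rows, so the integer average is 8 and the condition fires exactly at the eighth row, which agrees with the `EXPECT_EQ(block1.rows(), 8)` assertion above.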
--- a/dbms/src/Databases/DatabaseDictionary.cpp
+++ b/dbms/src/Databases/DatabaseDictionary.cpp
@ -65,22 +65,13 @@ StoragePtr DatabaseDictionary::tryGetTable(
    const Context & context,
    const String & table_name) const
 {
-    auto objects_map = context.getExternalDictionaries().getObjectsMap();
+    auto dict_ptr = context.getExternalDictionaries().tryGetDictionary(table_name);
-    const auto & dictionaries = objects_map.get();
-    {
-        auto it = dictionaries.find(table_name);
-        if (it != dictionaries.end())
-        {
-            const auto & dict_ptr = std::static_pointer_cast<IDictionaryBase>(it->second.loadable);
    if (dict_ptr)
    {
        const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
        auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
        return StorageDictionary::create(table_name, ColumnsDescription{columns}, context, true, table_name);
    }
-        }
-    }
    return {};
 }
--- a/dbms/src/Dictionaries/CMakeLists.txt
+++ b/dbms/src/Dictionaries/CMakeLists.txt
@ -15,7 +15,7 @@ list(REMOVE_ITEM clickhouse_dictionaries_sources DictionaryFactory.cpp Dictionar
 list(REMOVE_ITEM clickhouse_dictionaries_headers DictionaryFactory.h DictionarySourceFactory.h DictionaryStructure.h)
 add_library(clickhouse_dictionaries ${LINK_MODE} ${clickhouse_dictionaries_sources})
-target_link_libraries(clickhouse_dictionaries PRIVATE dbms clickhouse_common_io pocoext ${BTRIE_LIBRARIES} PUBLIC Threads::Threads)
+target_link_libraries(clickhouse_dictionaries PRIVATE dbms clickhouse_common_io ${BTRIE_LIBRARIES} PUBLIC Threads::Threads)
 if(Poco_SQL_FOUND AND NOT USE_INTERNAL_POCO_LIBRARY)
    target_include_directories(clickhouse_dictionaries SYSTEM PRIVATE ${Poco_SQL_INCLUDE_DIR})
--- a/dbms/src/Dictionaries/XDBCDictionarySource.cpp
+++ b/dbms/src/Dictionaries/XDBCDictionarySource.cpp
@ -7,7 +7,6 @@
 #include <IO/ReadWriteBufferFromHTTP.h>
 #include <IO/WriteHelpers.h>
 #include <Interpreters/Context.h>
-#include <Poco/Ext/SessionPoolHelpers.h>
 #include <Poco/Net/HTTPRequest.h>
 #include <Poco/Util/AbstractConfiguration.h>
 #include <Common/XDBCBridgeHelper.h>
--- a/dbms/src/Functions/GatherUtils/Sources.h
+++ b/dbms/src/Functions/GatherUtils/Sources.h
@ -8,12 +8,14 @@
 #include <Columns/ColumnNullable.h>
 #include <Common/typeid_cast.h>
 #include <Common/UTF8Helpers.h>
 #include <Functions/GatherUtils/IArraySource.h>
 #include <Functions/GatherUtils/IValueSource.h>
 #include <Functions/GatherUtils/Slices.h>
 #include <Functions/FunctionHelpers.h>
 namespace DB
 {
@ -276,6 +278,92 @@ struct StringSource
 };
 /// Differs from StringSource by having 'offset' and 'length' in code points instead of bytes in getSlice* methods.
 /** NOTE: The behaviour of substring and substringUTF8 is inconsistent when negative offset is greater than string size:
  * substring:
  *      hello
  * ^-----^ - offset -10, length 7, result: "he"
  * substringUTF8:
  *      hello
  *      ^-----^ - offset -10, length 7, result: "hello"
  * This may be subject to change.
  */
 struct UTF8StringSource : public StringSource
 {
    using StringSource::StringSource;
    static const ColumnString::Char * skipCodePointsForward(const ColumnString::Char * pos, size_t size, const ColumnString::Char * end)
    {
        for (size_t i = 0; i < size && pos < end; ++i)
            pos += UTF8::seqLength(*pos);   /// NOTE pos may become greater than end. It is Ok due to padding in PaddedPODArray.
        return pos;
    }
    static const ColumnString::Char * skipCodePointsBackward(const ColumnString::Char * pos, size_t size, const ColumnString::Char * begin)
    {
        for (size_t i = 0; i < size && pos > begin; ++i)
        {
            --pos;
            if (pos == begin)
                break;
            UTF8::syncBackward(pos, begin);
        }
        return pos;
    }
    Slice getSliceFromLeft(size_t offset) const
    {
        auto begin = &elements[prev_offset];
        auto end = elements.data() + offsets[row_num] - 1;
        auto res_begin = skipCodePointsForward(begin, offset, end);
        if (res_begin >= end)
            return {begin, 0};
        return {res_begin, size_t(end - res_begin)};
    }
    Slice getSliceFromLeft(size_t offset, size_t length) const
    {
        auto begin = &elements[prev_offset];
        auto end = elements.data() + offsets[row_num] - 1;
        auto res_begin = skipCodePointsForward(begin, offset, end);
        if (res_begin >= end)
            return {begin, 0};
        auto res_end = skipCodePointsForward(res_begin, length, end);
        if (res_end >= end)
            return {res_begin, size_t(end - res_begin)};
        return {res_begin, size_t(res_end - res_begin)};
    }
    Slice getSliceFromRight(size_t offset) const
    {
        auto begin = &elements[prev_offset];
        auto end = elements.data() + offsets[row_num] - 1;
        auto res_begin = skipCodePointsBackward(end, offset, begin);
        return {res_begin, size_t(end - res_begin)};
    }
    Slice getSliceFromRight(size_t offset, size_t length) const
    {
        auto begin = &elements[prev_offset];
        auto end = elements.data() + offsets[row_num] - 1;
        auto res_begin = skipCodePointsBackward(end, offset, begin);
        auto res_end = skipCodePointsForward(res_begin, length, end);
        if (res_end >= end)
            return {res_begin, size_t(end - res_begin)};
        return {res_begin, size_t(res_end - res_begin)};
    }
 };
 struct FixedStringSource
 {
    using Slice = NumericArraySlice<UInt8>;
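The `UTF8StringSource` methods above slice by code points rather than bytes by stepping over whole UTF-8 sequences. A minimal standalone sketch of the same idea follows. The names `utf8SeqLength` and `substrCodePoints` are hypothetical and not part of ClickHouse, the offset is 0-based here, and the input is assumed to be valid UTF-8; unlike the original, this version clamps to the buffer end instead of relying on padding.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: byte length of a UTF-8 sequence, judged from its lead byte.
// Continuation bytes and invalid lead bytes count as 1 so the scan always advances.
inline std::size_t utf8SeqLength(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical helper: take `length` code points starting at 0-based code point `offset`.
inline std::string_view substrCodePoints(std::string_view s, std::size_t offset, std::size_t length)
{
    // Skip `offset` code points from the left.
    std::size_t pos = 0;
    for (std::size_t i = 0; i < offset && pos < s.size(); ++i)
        pos += utf8SeqLength(static_cast<unsigned char>(s[pos]));
    if (pos >= s.size())
        return {};

    // Skip `length` more code points to find the end of the slice.
    std::size_t end = pos;
    for (std::size_t i = 0; i < length && end < s.size(); ++i)
        end += utf8SeqLength(static_cast<unsigned char>(s[end]));
    if (end > s.size())
        end = s.size(); // a truncated trailing sequence is clamped to the buffer

    return s.substr(pos, end - pos);
}
```

For example, `substrCodePoints("h\xC3\xA9llo", 1, 2)` selects the two-byte `é` and the following `l`, three bytes in total.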
--- a/dbms/src/Functions/bitSwapLastTwo.cpp
+++ b/dbms/src/Functions/bitSwapLastTwo.cpp
@ -10,7 +10,7 @@ struct BitSwapLastTwoImpl
 {
    using ResultType = UInt8;
-    static inline ResultType apply(A a)
+    static inline ResultType NO_SANITIZE_UNDEFINED apply(A a)
    {
        return static_cast<ResultType>(
                ((static_cast<ResultType>(a) & 1) << 1) | ((static_cast<ResultType>(a) >> 1) & 1));
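The expression in `apply` swaps the two lowest bits of its argument and discards all higher bits. A small standalone sketch of that arithmetic, using the hypothetical name `swapLastTwoBits` and a fixed `uint8_t` argument instead of the template parameter `A`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the bit swap: bit 0 moves to bit 1,
// bit 1 moves to bit 0, and every higher bit is dropped from the result.
inline std::uint8_t swapLastTwoBits(std::uint8_t a)
{
    return static_cast<std::uint8_t>(((a & 1u) << 1) | ((a >> 1) & 1u));
}
```

So `0b01` becomes `0b10`, `0b10` becomes `0b01`, and `0b110` becomes `0b01` because only the two lowest bits survive.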
--- a/dbms/src/Functions/isValidUTF8.cpp
+++ b/dbms/src/Functions/isValidUTF8.cpp
@ -283,8 +283,7 @@ SOFTWARE.
        memset(buf + len + 1, 0, 16);
        check_packed(_mm_loadu_si128(reinterpret_cast<__m128i *>(buf + 1)));
-        /* Reduce error vector, error_reduced = 0xFFFF if error == 0 */
+        return _mm_testz_si128(error, error);
        return _mm_movemask_epi8(_mm_cmpeq_epi8(error, _mm_set1_epi8(0))) == 0xFFFF;
    }
 #endif
--- a/dbms/src/Functions/registerFunctionsString.cpp
+++ b/dbms/src/Functions/registerFunctionsString.cpp
@ -18,7 +18,6 @@ void registerFunctionReverse(FunctionFactory &);
 void registerFunctionReverseUTF8(FunctionFactory &);
 void registerFunctionsConcat(FunctionFactory &);
 void registerFunctionSubstring(FunctionFactory &);
 void registerFunctionSubstringUTF8(FunctionFactory &);
 void registerFunctionAppendTrailingCharIfAbsent(FunctionFactory &);
 void registerFunctionStartsWith(FunctionFactory &);
 void registerFunctionEndsWith(FunctionFactory &);
@ -46,7 +45,6 @@ void registerFunctionsString(FunctionFactory & factory)
    registerFunctionReverseUTF8(factory);
    registerFunctionsConcat(factory);
    registerFunctionSubstring(factory);
    registerFunctionSubstringUTF8(factory);
    registerFunctionAppendTrailingCharIfAbsent(factory);
    registerFunctionStartsWith(factory);
    registerFunctionEndsWith(factory);
--- a/dbms/src/Functions/substring.cpp
+++ b/dbms/src/Functions/substring.cpp
@ -28,10 +28,13 @@ namespace ErrorCodes
 }
 /// If 'is_utf8' - measure offset and length in code points instead of bytes.
 /// UTF8 variant is not available for FixedString arguments.
 template <bool is_utf8>
 class FunctionSubstring : public IFunction
 {
 public:
-    static constexpr auto name = "substring";
+    static constexpr auto name = is_utf8 ? "substringUTF8" : "substring";
    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionSubstring>();
@ -56,7 +59,7 @@ public:
                + toString(number_of_arguments) + ", should be 2 or 3",
                ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
-        if (!isStringOrFixedString(arguments[0]))
+        if ((is_utf8 && !isString(arguments[0])) || !isStringOrFixedString(arguments[0]))
            throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
        if (!isNumber(arguments[1]))
@ -145,6 +148,21 @@ public:
                throw Exception("Third argument provided for function substring could not be negative.", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
        }
        if constexpr (is_utf8)
        {
            if (const ColumnString * col = checkAndGetColumn<ColumnString>(column_string.get()))
                executeForSource(column_start, column_length, column_start_const, column_length_const, start_value,
                                length_value, block, result, UTF8StringSource(*col), input_rows_count);
            else if (const ColumnConst * col_const = checkAndGetColumnConst<ColumnString>(column_string.get()))
                executeForSource(column_start, column_length, column_start_const, column_length_const, start_value,
                                length_value, block, result, ConstSource<UTF8StringSource>(*col_const), input_rows_count);
            else
                throw Exception(
                    "Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(),
                    ErrorCodes::ILLEGAL_COLUMN);
        }
        else
        {
            if (const ColumnString * col = checkAndGetColumn<ColumnString>(column_string.get()))
                executeForSource(column_start, column_length, column_start_const, column_length_const, start_value,
                                length_value, block, result, StringSource(*col), input_rows_count);
@ -162,13 +180,16 @@ public:
                    "Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(),
                    ErrorCodes::ILLEGAL_COLUMN);
        }
    }
 };
 void registerFunctionSubstring(FunctionFactory & factory)
 {
-    factory.registerFunction<FunctionSubstring>(FunctionFactory::CaseInsensitive);
+    factory.registerFunction<FunctionSubstring<false>>(FunctionFactory::CaseInsensitive);
-    factory.registerAlias("substr", FunctionSubstring::name, FunctionFactory::CaseInsensitive);
+    factory.registerAlias("substr", "substring", FunctionFactory::CaseInsensitive);
-    factory.registerAlias("mid", FunctionSubstring::name, FunctionFactory::CaseInsensitive); /// from MySQL dialect
+    factory.registerAlias("mid", "substring", FunctionFactory::CaseInsensitive); /// from MySQL dialect
    factory.registerFunction<FunctionSubstring<true>>(FunctionFactory::CaseSensitive);
 }
 }
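The change above folds `substringUTF8` into `FunctionSubstring` by means of a `bool` template parameter: the function name is chosen at compile time and `if constexpr` selects the byte-based or code-point-based branch. A reduced sketch of that pattern, assuming a hypothetical `SubstringSketch` type that only measures length rather than slicing:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical reduction of the pattern: one template, two registered behaviours.
template <bool is_utf8>
struct SubstringSketch
{
    // The name is picked at compile time from the template parameter.
    static constexpr const char * name = is_utf8 ? "substringUTF8" : "substring";

    // Length in code points for the UTF-8 variant, in bytes otherwise.
    static std::size_t lengthOf(std::string_view s)
    {
        if constexpr (is_utf8)
        {
            std::size_t n = 0;
            for (unsigned char c : s)
                if ((c & 0xC0) != 0x80) // count every byte that is not a continuation byte
                    ++n;
            return n;
        }
        else
        {
            return s.size();
        }
    }
};
```

Only the branch matching `is_utf8` is instantiated, so each specialization carries no runtime check for the other mode.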
--- a/dbms/src/Functions/substringUTF8.cpp
+++ b/dbms/src/Functions/substringUTF8.cpp
@ -1,166 +0,0 @@
 #include <DataTypes/DataTypeString.h>
 #include <Columns/ColumnString.h>
 #include <Core/ColumnNumbers.h>
 #include <Functions/FunctionFactory.h>
 #include <Functions/FunctionHelpers.h>
 namespace DB
 {
 namespace ErrorCodes
 {
    extern const int ILLEGAL_COLUMN;
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
    extern const int ARGUMENT_OUT_OF_BOUND;
 }
 /** If the string is encoded in UTF-8, then it selects a substring of code points in it.
  * Otherwise, the behavior is undefined.
  */
 struct SubstringUTF8Impl
 {
    static void vector(const ColumnString::Chars & data,
        const ColumnString::Offsets & offsets,
        size_t start,
        size_t length,
        ColumnString::Chars & res_data,
        ColumnString::Offsets & res_offsets)
    {
        res_data.reserve(data.size());
        size_t size = offsets.size();
        res_offsets.resize(size);
        ColumnString::Offset prev_offset = 0;
        ColumnString::Offset res_offset = 0;
        for (size_t i = 0; i < size; ++i)
        {
            ColumnString::Offset j = prev_offset;
            ColumnString::Offset pos = 1;
            ColumnString::Offset bytes_start = 0;
            ColumnString::Offset bytes_length = 0;
            while (j < offsets[i] - 1)
            {
                if (pos == start)
                    bytes_start = j - prev_offset + 1;
                if (data[j] < 0xBF)
                    j += 1;
                else if (data[j] < 0xE0)
                    j += 2;
                else if (data[j] < 0xF0)
                    j += 3;
                else
                    j += 1;
                if (pos >= start && pos < start + length)
                    bytes_length = j - prev_offset + 1 - bytes_start;
                else if (pos >= start + length)
                    break;
                ++pos;
            }
            if (bytes_start == 0)
            {
                res_data.resize(res_data.size() + 1);
                res_data[res_offset] = 0;
                ++res_offset;
            }
            else
            {
                size_t bytes_to_copy = std::min(offsets[i] - prev_offset - bytes_start, bytes_length);
                res_data.resize(res_data.size() + bytes_to_copy + 1);
                memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &data[prev_offset + bytes_start - 1], bytes_to_copy);
                res_offset += bytes_to_copy + 1;
                res_data[res_offset - 1] = 0;
            }
            res_offsets[i] = res_offset;
            prev_offset = offsets[i];
        }
    }
 };
 class FunctionSubstringUTF8 : public IFunction
 {
 public:
    static constexpr auto name = "substringUTF8";
    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionSubstringUTF8>();
    }
    String getName() const override
    {
        return name;
    }
    size_t getNumberOfArguments() const override
    {
        return 3;
    }
    bool useDefaultImplementationForConstants() const override { return true; }
    ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1, 2}; }
    DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
    {
        if (!isString(arguments[0]))
            throw Exception(
                "Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
        if (!isNumber(arguments[1]) || !isNumber(arguments[2]))
            throw Exception("Illegal type " + (isNumber(arguments[1]) ? arguments[2]->getName() : arguments[1]->getName())
                    + " of argument of function "
                    + getName(),
                ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
        return std::make_shared<DataTypeString>();
    }
    void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/) override
    {
        const ColumnPtr column_string = block.getByPosition(arguments[0]).column;
        const ColumnPtr column_start = block.getByPosition(arguments[1]).column;
        const ColumnPtr column_length = block.getByPosition(arguments[2]).column;
        if (!column_start->isColumnConst() || !column_length->isColumnConst())
            throw Exception("2nd and 3rd arguments of function " + getName() + " must be constants.", ErrorCodes::ILLEGAL_COLUMN);
        Field start_field = (*block.getByPosition(arguments[1]).column)[0];
        Field length_field = (*block.getByPosition(arguments[2]).column)[0];
        if (start_field.getType() != Field::Types::UInt64 || length_field.getType() != Field::Types::UInt64)
            throw Exception("2nd and 3rd arguments of function " + getName() + " must be non-negative and must have UInt type.", ErrorCodes::ILLEGAL_COLUMN);
        UInt64 start = start_field.get<UInt64>();
        UInt64 length = length_field.get<UInt64>();
        if (start == 0)
            throw Exception("Second argument of function substring must be greater than 0.", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
        /// Otherwise may lead to overflow and pass bounds check inside inner loop.
        if (start >= 0x8000000000000000ULL || length >= 0x8000000000000000ULL)
            throw Exception("Too large values of 2nd or 3rd argument provided for function substring.", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
        if (const ColumnString * col = checkAndGetColumn<ColumnString>(column_string.get()))
        {
            auto col_res = ColumnString::create();
            SubstringUTF8Impl::vector(col->getChars(), col->getOffsets(), start, length, col_res->getChars(), col_res->getOffsets());
            block.getByPosition(result).column = std::move(col_res);
        }
        else
            throw Exception(
                "Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(),
                ErrorCodes::ILLEGAL_COLUMN);
    }
 };
 void registerFunctionSubstringUTF8(FunctionFactory & factory)
 {
    factory.registerFunction<FunctionSubstringUTF8>();
 }
 }
--- a/dbms/src/IO/UncompressedCache.h
+++ b/dbms/src/IO/UncompressedCache.h
@ -6,6 +6,11 @@
 #include <Common/ProfileEvents.h>
 #include <IO/BufferWithOwnMemory.h>
 #include <Common/config.h>
 #if USE_LFALLOC
 #include <Common/LFAllocator.h>
 #endif
 namespace ProfileEvents
 {
@ -20,7 +25,11 @@ namespace DB
 struct UncompressedCacheCell
 {
 #if USE_LFALLOC
    Memory<LFAllocator> data;
 #else
    Memory<> data;
 #endif
    size_t compressed_size;
    UInt32 additional_bytes;
 };
--- a/dbms/src/Interpreters/ClientInfo.h
+++ b/dbms/src/Interpreters/ClientInfo.h
@ -37,7 +37,7 @@ public:
    {
        NO_QUERY = 0,            /// Uninitialized object.
        INITIAL_QUERY = 1,
-        SECONDARY_QUERY = 2,    /// Query that was initiated by another query for distributed query execution.
+        SECONDARY_QUERY = 2,    /// Query that was initiated by another query for distributed or ON CLUSTER query execution.
    };
--- a/dbms/src/Interpreters/Compiler.cpp
+++ b/dbms/src/Interpreters/Compiler.cpp
@ -2,6 +2,7 @@
 #include <Poco/Util/Application.h>
 #include <ext/unlock_guard.h>
 #include <Common/ClickHouseRevision.h>
 #include <Common/config.h>
 #include <Common/SipHash.h>
 #include <Common/ShellCommand.h>
 #include <Common/StringUtils/StringUtils.h>
@ -261,6 +262,9 @@ void Compiler::compile(
            " -I " << compiler_headers << "/dbms/src/"
            " -isystem " << compiler_headers << "/contrib/cityhash102/include/"
            " -isystem " << compiler_headers << "/contrib/libpcg-random/include/"
        #if USE_LFALLOC
            " -isystem " << compiler_headers << "/contrib/lfalloc/src/"
        #endif
            " -isystem " << compiler_headers << INTERNAL_DOUBLE_CONVERSION_INCLUDE_DIR
            " -isystem " << compiler_headers << INTERNAL_Poco_Foundation_INCLUDE_DIR
            " -isystem " << compiler_headers << INTERNAL_Boost_INCLUDE_DIRS
--- a/dbms/src/Interpreters/Context.cpp
+++ b/dbms/src/Interpreters/Context.cpp
@ -104,7 +104,7 @@ struct ContextShared
    mutable std::recursive_mutex mutex;
    /// Separate mutex for access to dictionaries. Separate mutex to avoid locks when the server is doing a request to itself.
    mutable std::mutex embedded_dictionaries_mutex;
-    mutable std::mutex external_dictionaries_mutex;
+    mutable std::recursive_mutex external_dictionaries_mutex;
    mutable std::mutex external_models_mutex;
    /// Separate mutex for re-initialization of the ZooKeeper session. This operation could take a long time and must not interfere with other operations.
    mutable std::mutex zookeeper_mutex;
@ -1240,44 +1240,38 @@ EmbeddedDictionaries & Context::getEmbeddedDictionariesImpl(const bool throw_on_
 ExternalDictionaries & Context::getExternalDictionariesImpl(const bool throw_on_error) const
 {
-    const auto & config = getConfigRef();
+    {
        std::lock_guard lock(shared->external_dictionaries_mutex);
        if (shared->external_dictionaries)
            return *shared->external_dictionaries;
    }
    const auto & config = getConfigRef();
    std::lock_guard lock(shared->external_dictionaries_mutex);
    if (!shared->external_dictionaries)
    {
        if (!this->global_context)
            throw Exception("Logical error: there is no global context", ErrorCodes::LOGICAL_ERROR);
        auto config_repository = shared->runtime_components_factory->createExternalDictionariesConfigRepository();
-
+        shared->external_dictionaries.emplace(std::move(config_repository), config, *this->global_context);
-        shared->external_dictionaries.emplace(
+        shared->external_dictionaries->init(throw_on_error);
            std::move(config_repository),
            config,
            *this->global_context,
            throw_on_error);
    }
    return *shared->external_dictionaries;
 }
 ExternalModels & Context::getExternalModelsImpl(bool throw_on_error) const
 {
    std::lock_guard lock(shared->external_models_mutex);
    if (!shared->external_models)
    {
        if (!this->global_context)
            throw Exception("Logical error: there is no global context", ErrorCodes::LOGICAL_ERROR);
        auto config_repository = shared->runtime_components_factory->createExternalModelsConfigRepository();
-
+        shared->external_models.emplace(std::move(config_repository), *this->global_context);
-        shared->external_models.emplace(
+        shared->external_models->init(throw_on_error);
            std::move(config_repository),
            *this->global_context,
            throw_on_error);
    }
    return *shared->external_models;
 }
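`external_dictionaries_mutex` becomes a `std::recursive_mutex` in this hunk. The diff does not state the reason for this particular mutex; assuming it mirrors the rationale given for `all_mutex` in `ExternalLoader.h` (creating one object may trigger creating another on the same thread), the effect can be sketched as follows. `RegistrySketch` and `load` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch: a method that re-enters itself while still holding the lock.
// With std::mutex the nested lock would be undefined behaviour (typically a deadlock);
// std::recursive_mutex lets the owning thread lock again.
struct RegistrySketch
{
    mutable std::recursive_mutex mutex;

    // Returns how many nested levels were loaded below this call.
    int load(int depth) const
    {
        std::lock_guard lock(mutex); // re-acquired on every nested call by the same thread
        if (depth > 0)
            return load(depth - 1) + 1;
        return 0;
    }
};
```

A recursive mutex only helps with re-entry from the same thread; other threads still block until the outermost lock is released.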
--- a/dbms/src/Interpreters/DDLWorker.cpp
+++ b/dbms/src/Interpreters/DDLWorker.cpp
@ -547,6 +547,7 @@ bool DDLWorker::tryExecuteQuery(const String & query, const DDLTask & task, Exec
    try
    {
        current_context = std::make_unique<Context>(context);
        current_context->getClientInfo().query_kind = ClientInfo::QueryKind::SECONDARY_QUERY;
        current_context->setCurrentQueryId(""); // generate random query_id
        executeQuery(istr, ostr, false, *current_context, {}, {});
    }
--- a/dbms/src/Interpreters/ExternalDictionaries.cpp
+++ b/dbms/src/Interpreters/ExternalDictionaries.cpp
@ -30,8 +30,7 @@ namespace
 ExternalDictionaries::ExternalDictionaries(
    std::unique_ptr<IExternalLoaderConfigRepository> config_repository,
    const Poco::Util::AbstractConfiguration & config,
-    Context & context,
+    Context & context)
    bool throw_on_error)
        : ExternalLoader(config,
                         externalDictionariesUpdateSettings,
                         getExternalDictionariesConfigSettings(),
@ -40,11 +39,11 @@ ExternalDictionaries::ExternalDictionaries(
                         "external dictionary"),
        context(context)
 {
    init(throw_on_error);
 }
 std::unique_ptr<IExternalLoadable> ExternalDictionaries::create(
-        const std::string & name, const Configuration & config, const std::string & config_prefix)
+        const std::string & name, const Configuration & config, const std::string & config_prefix) const
 {
    return DictionaryFactory::instance().create(name, config, config_prefix, context);
 }
--- a/dbms/src/Interpreters/ExternalDictionaries.h
+++ b/dbms/src/Interpreters/ExternalDictionaries.h
@ -21,8 +21,7 @@ public:
    ExternalDictionaries(
        std::unique_ptr<IExternalLoaderConfigRepository> config_repository,
        const Poco::Util::AbstractConfiguration & config,
-        Context & context,
+        Context & context);
        bool throw_on_error);
    /// Forcibly reloads specified dictionary.
    void reloadDictionary(const std::string & name) { reload(name); }
@ -40,7 +39,7 @@ public:
 protected:
    std::unique_ptr<IExternalLoadable> create(const std::string & name, const Configuration & config,
-                                              const std::string & config_prefix) override;
+                                              const std::string & config_prefix) const override;
    using ExternalLoader::getObjectsMap;
--- a/dbms/src/Interpreters/ExternalLoader.cpp
+++ b/dbms/src/Interpreters/ExternalLoader.cpp
@ -57,12 +57,16 @@ ExternalLoader::ExternalLoader(const Poco::Util::AbstractConfiguration & config_
 {
 }
 void ExternalLoader::init(bool throw_on_error)
 {
-    if (is_initialized)
+    std::call_once(is_initialized_flag, &ExternalLoader::initImpl, this, throw_on_error);
-        return;
+}
-    is_initialized = true;
+
 void ExternalLoader::initImpl(bool throw_on_error)
 {
    std::lock_guard all_lock(all_mutex);
    {
        /// During synchronous loading of external dictionaries at moment of query execution,
@ -87,13 +91,13 @@ ExternalLoader::~ExternalLoader()
 void ExternalLoader::reloadAndUpdate(bool throw_on_error)
 {
    std::lock_guard all_lock(all_mutex);
    reloadFromConfigFiles(throw_on_error);
    /// list of recreated loadable objects to perform delayed removal from unordered_map
    std::list<std::string> recreated_failed_loadable_objects;
    std::lock_guard all_lock(all_mutex);
    /// retry loading failed loadable objects
    for (auto & failed_loadable_object : failed_loadable_objects)
    {
@ -250,15 +254,17 @@ void ExternalLoader::reloadAndUpdate(bool throw_on_error)
    }
 }
 void ExternalLoader::reloadFromConfigFiles(const bool throw_on_error, const bool force_reload, const std::string & only_dictionary)
 {
-    const auto config_paths = config_repository->list(config_main, config_settings.path_setting_name);
+    std::lock_guard all_lock{all_mutex};
    const auto config_paths = config_repository->list(config_main, config_settings.path_setting_name);
    for (const auto & config_path : config_paths)
    {
        try
        {
-            reloadFromConfigFile(config_path, throw_on_error, force_reload, only_dictionary);
+            reloadFromConfigFile(config_path, force_reload, only_dictionary);
        }
        catch (...)
        {
@ -270,8 +276,8 @@ void ExternalLoader::reloadFromConfigFiles(const bool throw_on_error, const bool
    }
    /// erase removed from config loadable objects
    {
        std::lock_guard lock{map_mutex};
        std::list<std::string> removed_loadable_objects;
        for (const auto & loadable : loadable_objects)
        {
@ -283,17 +289,21 @@ void ExternalLoader::reloadFromConfigFiles(const bool throw_on_error, const bool
            loadable_objects.erase(name);
    }
-void ExternalLoader::reloadFromConfigFile(const std::string & config_path, const bool throw_on_error,
+    /// create all loadable objects which were read from config
-                                          const bool force_reload, const std::string & loadable_name)
+    finishAllReloads(throw_on_error);
 }
 void ExternalLoader::reloadFromConfigFile(const std::string & config_path, const bool force_reload, const std::string & loadable_name)
 {
    // We assume `all_mutex` is already locked.
    if (config_path.empty() || !config_repository->exists(config_path))
    {
        LOG_WARNING(log, "config file '" + config_path + "' does not exist");
    }
    else
    {
        std::lock_guard all_lock(all_mutex);
        auto modification_time_it = last_modification_times.find(config_path);
        if (modification_time_it == std::end(last_modification_times))
            modification_time_it = last_modification_times.emplace(config_path, Poco::Timestamp{0}).first;
@ -329,8 +339,6 @@ void ExternalLoader::reloadFromConfigFile(const std::string & config_path, const
                    continue;
                }
                try
                {
                name = loaded_config->getString(key + "." + config_settings.external_name);
                if (name.empty())
                {
@ -342,7 +350,80 @@ void ExternalLoader::reloadFromConfigFile(const std::string & config_path, const
                if (!loadable_name.empty() && name != loadable_name)
                    continue;
-                    decltype(loadable_objects.begin()) object_it;
+                objects_to_reload.emplace(name, LoadableCreationInfo{name, loaded_config, config_path, key});
            }
        }
    }
 }
 void ExternalLoader::reload()
 {
    reloadFromConfigFiles(true, true);
 }
 void ExternalLoader::reload(const std::string & name)
 {
    reloadFromConfigFiles(true, true, name);
    /// Check that specified object was loaded
    std::lock_guard lock{map_mutex};
    if (!loadable_objects.count(name))
        throw Exception("Failed to load " + object_name + " '" + name + "' during the reload process", ErrorCodes::BAD_ARGUMENTS);
 }
 void ExternalLoader::finishReload(const std::string & loadable_name, bool throw_on_error) const
 {
    // We assume `all_mutex` is already locked.
    auto it = objects_to_reload.find(loadable_name);
    if (it == objects_to_reload.end())
        return;
    LoadableCreationInfo creation_info = std::move(it->second);
    objects_to_reload.erase(it);
    finishReloadImpl(creation_info, throw_on_error);
 }
 void ExternalLoader::finishAllReloads(bool throw_on_error) const
 {
    // We assume `all_mutex` is already locked.
    // We cannot just go through the map `objects_to_reload` from begin to end and create every object
    // because these objects can depend on each other.
    // For example, if the first object's data depends on the second object's data then
    // creating the first object will cause creating the second object too.
    while (!objects_to_reload.empty())
    {
        auto it = objects_to_reload.begin();
        LoadableCreationInfo creation_info = std::move(it->second);
        objects_to_reload.erase(it);
        try
        {
            finishReloadImpl(creation_info, throw_on_error);
        }
        catch (...)
        {
            objects_to_reload.clear(); // no more objects to create after an exception
            throw;
        }
    }
 }
 void ExternalLoader::finishReloadImpl(const LoadableCreationInfo & creation_info, bool throw_on_error) const
 {
    // We assume `all_mutex` is already locked.
    const std::string & name = creation_info.name;
    const std::string & config_path = creation_info.config_path;
    try
    {
        ObjectsMap::iterator object_it;
        {
            std::lock_guard lock{map_mutex};
            object_it = loadable_objects.find(name);
@ -354,7 +435,7 @@ void ExternalLoader::reloadFromConfigFile(const std::string & config_path, const
                                ErrorCodes::EXTERNAL_LOADABLE_ALREADY_EXISTS);
        }
-                    auto object_ptr = create(name, *loaded_config, key);
+        auto object_ptr = create(name, *creation_info.config, creation_info.config_prefix);
        /// If the object could not be loaded.
        if (const auto exception_ptr = object_ptr->getCreationException())
@ -422,29 +503,18 @@ void ExternalLoader::reloadFromConfigFile(const std::string & config_path, const
            throw;
    }
 }
        }
    }
 }
 void ExternalLoader::reload()
 {
    reloadFromConfigFiles(true, true);
 }
 void ExternalLoader::reload(const std::string & name)
 {
    reloadFromConfigFiles(true, true, name);
    /// Check that specified object was loaded
    std::lock_guard lock{map_mutex};
    if (!loadable_objects.count(name))
        throw Exception("Failed to load " + object_name + " '" + name + "' during the reload process", ErrorCodes::BAD_ARGUMENTS);
 }
 ExternalLoader::LoadablePtr ExternalLoader::getLoadableImpl(const std::string & name, bool throw_on_error) const
 {
-    std::lock_guard lock{map_mutex};
+    /// We try to finish the reloading of the object `name` here, before searching it in the map `loadable_objects` later in this function.
    /// If some other thread is already doing this reload work we don't want to wait until it finishes, because it's faster to just use
    /// the current version of this loadable object. That's why we use try_lock() instead of lock() here.
    std::unique_lock all_lock{all_mutex, std::defer_lock};
    if (all_lock.try_lock())
        finishReload(name, throw_on_error);
    std::lock_guard lock{map_mutex};
    const auto it = loadable_objects.find(name);
    if (it == std::end(loadable_objects))
    {
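`getLoadableImpl` above tries to finish a pending reload only if `all_mutex` is free, and otherwise serves the current version of the object. A simplified sketch of that "try, but never wait" locking pattern, assuming a hypothetical `LoaderSketch` class that stores plain integers instead of loadable objects:

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sketch of the pattern, not the ClickHouse implementation.
class LoaderSketch
{
public:
    // Queue a new value to be applied later (stands in for `objects_to_reload`).
    void schedule(const std::string & name, int value)
    {
        std::lock_guard lock(all_mutex);
        pending[name] = value;
    }

    // Returns the current value for `name`, or -1 if nothing is loaded.
    int get(const std::string & name)
    {
        // Do not block: if another thread holds all_mutex, use whatever is loaded now.
        std::unique_lock all_lock{all_mutex, std::defer_lock};
        if (all_lock.try_lock())
            finishPending(name);

        std::lock_guard lock(map_mutex);
        auto it = loaded.find(name);
        return it == loaded.end() ? -1 : it->second;
    }

private:
    // Assumes all_mutex is already locked by the caller.
    void finishPending(const std::string & name)
    {
        auto it = pending.find(name);
        if (it == pending.end())
            return;
        int value = it->second;
        pending.erase(it);

        std::lock_guard lock(map_mutex);
        loaded[name] = value;
    }

    std::recursive_mutex all_mutex;
    std::mutex map_mutex;
    std::unordered_map<std::string, int> pending;
    std::unordered_map<std::string, int> loaded;
};
```

The trade-off is that a caller may see a slightly stale object while another thread is reloading, in exchange for never waiting on the reload.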
--- a/dbms/src/Interpreters/ExternalLoader.h
+++ b/dbms/src/Interpreters/ExternalLoader.h
@ -89,7 +89,7 @@ public:
    using Configuration = Poco::Util::AbstractConfiguration;
    using ObjectsMap = std::unordered_map<std::string, LoadableInfo>;
-    /// Objects will be loaded immediately and then will be updated in separate thread, each 'reload_period' seconds.
+    /// Call init() after constructing the instance of any derived class.
    ExternalLoader(const Configuration & config_main,
                   const ExternalLoaderUpdateSettings & update_settings,
                   const ExternalLoaderConfigSettings & config_settings,
@ -97,6 +97,11 @@ public:
                   Logger * log, const std::string & loadable_object_name);
    virtual ~ExternalLoader();
    /// Should be called after creating an instance of a derived class.
    /// Loads the objects immediately and starts a separate thread to update them once in each 'reload_period' seconds.
    /// This function does nothing if called again.
    void init(bool throw_on_error);
    /// Forcibly reloads all loadable objects.
    void reload();
@ -108,7 +113,7 @@ public:
 protected:
    virtual std::unique_ptr<IExternalLoadable> create(const std::string & name, const Configuration & config,
-                                                      const std::string & config_prefix) = 0;
+                                                      const std::string & config_prefix) const = 0;
    class LockedObjectsMap
    {
@ -123,12 +128,8 @@ protected:
    /// Direct access to objects.
    LockedObjectsMap getObjectsMap() const;
    /// Should be called in derived constructor (to avoid pure virtual call).
    void init(bool throw_on_error);
 private:
-
+    std::once_flag is_initialized_flag;
    bool is_initialized = false;
    /// Protects only objects map.
    /** Reading and assignment of "loadable" should be done under mutex.
@ -136,22 +137,35 @@ private:
      */
    mutable std::mutex map_mutex;
-    /// Protects all data, currently used to avoid races between updating thread and SYSTEM queries
+    /// Protects all data, currently used to avoid races between updating thread and SYSTEM queries.
-    mutable std::mutex all_mutex;
+    /// The mutex is recursive because creating of objects might be recursive, i.e.
    /// creating objects might cause creating other objects.
    mutable std::recursive_mutex all_mutex;
    /// name -> loadable.
-    ObjectsMap loadable_objects;
+    mutable ObjectsMap loadable_objects;
    struct LoadableCreationInfo
    {
        std::string name;
        Poco::AutoPtr<Poco::Util::AbstractConfiguration> config;
        std::string config_path;
        std::string config_prefix;
    };
    /// Objects which should be reloaded soon.
    mutable std::unordered_map<std::string, LoadableCreationInfo> objects_to_reload;
    /// Here are loadable objects that have never been loaded successfully.
    /// They are also in 'loadable_objects', but with nullptr as 'loadable'.
-    std::unordered_map<std::string, FailedLoadableInfo> failed_loadable_objects;
+    mutable std::unordered_map<std::string, FailedLoadableInfo> failed_loadable_objects;
    /// Both for loadable_objects and failed_loadable_objects.
-    std::unordered_map<std::string, std::chrono::system_clock::time_point> update_times;
+    mutable std::unordered_map<std::string, std::chrono::system_clock::time_point> update_times;
    std::unordered_map<std::string, std::unordered_set<std::string>> loadable_objects_defined_in_config;
-    pcg64 rnd_engine{randomSeed()};
+    mutable pcg64 rnd_engine{randomSeed()};
    const Configuration & config_main;
    const ExternalLoaderUpdateSettings & update_settings;
@ -168,17 +182,22 @@ private:
    std::unordered_map<std::string, Poco::Timestamp> last_modification_times;
    void initImpl(bool throw_on_error);
    /// Check objects definitions in config files and reload or/and add new ones if the definition is changed
    /// If loadable_name is not empty, load only loadable object with name loadable_name
    void reloadFromConfigFiles(bool throw_on_error, bool force_reload = false, const std::string & loadable_name = "");
-    void reloadFromConfigFile(const std::string & config_path, const bool throw_on_error,
+    void reloadFromConfigFile(const std::string & config_path, const bool force_reload, const std::string & loadable_name);
                                const bool force_reload, const std::string & loadable_name);
    /// Check config files and update expired loadable objects
    void reloadAndUpdate(bool throw_on_error = false);
    void reloadPeriodically();
    void finishReload(const std::string & loadable_name, bool throw_on_error) const;
    void finishAllReloads(bool throw_on_error) const;
    void finishReloadImpl(const LoadableCreationInfo & creation_info, bool throw_on_error) const;
    LoadablePtr getLoadableImpl(const std::string & name, bool throw_on_error) const;
 };
--- a/dbms/src/Interpreters/ExternalModels.cpp
+++ b/dbms/src/Interpreters/ExternalModels.cpp
@ -33,8 +33,7 @@ namespace
 ExternalModels::ExternalModels(
    std::unique_ptr<IExternalLoaderConfigRepository> config_repository,
-    Context & context,
-    bool throw_on_error)
+    Context & context)
        : ExternalLoader(context.getConfigRef(),
                         externalModelsUpdateSettings,
                         getExternalModelsConfigSettings(),
@ -43,11 +42,10 @@ ExternalModels::ExternalModels(
                         "external model"),
          context(context)
 {
    init(throw_on_error);
 }
 std::unique_ptr<IExternalLoadable> ExternalModels::create(
-        const std::string & name, const Configuration & config, const std::string & config_prefix)
+        const std::string & name, const Configuration & config, const std::string & config_prefix) const
 {
    String type = config.getString(config_prefix + ".type");
    ExternalLoadableLifetime lifetime(config, config_prefix + ".lifetime");
--- a/dbms/src/Interpreters/ExternalModels.h
+++ b/dbms/src/Interpreters/ExternalModels.h
@ -20,8 +20,7 @@ public:
    /// Models will be loaded immediately and then will be updated in separate thread, each 'reload_period' seconds.
    ExternalModels(
        std::unique_ptr<IExternalLoaderConfigRepository> config_repository,
-        Context & context,
-        bool throw_on_error);
+        Context & context);
    /// Forcibly reloads specified model.
    void reloadModel(const std::string & name) { reload(name); }
@ -34,7 +33,7 @@ public:
 protected:
    std::unique_ptr<IExternalLoadable> create(const std::string & name, const Configuration & config,
-                                              const std::string & config_prefix) override;
+                                              const std::string & config_prefix) const override;
    using ExternalLoader::getObjectsMap;
--- a/dbms/src/Interpreters/InJoinSubqueriesPreprocessor.cpp
+++ b/dbms/src/Interpreters/InJoinSubqueriesPreprocessor.cpp
@ -2,6 +2,7 @@
 #include <Interpreters/Context.h>
 #include <Interpreters/DatabaseAndTableWithAlias.h>
 #include <Interpreters/IdentifierSemantic.h>
 #include <Interpreters/InDepthNodeVisitor.h>
 #include <Storages/StorageDistributed.h>
 #include <Parsers/ASTIdentifier.h>
 #include <Parsers/ASTSelectQuery.h>
@ -23,114 +24,37 @@ namespace ErrorCodes
 namespace
 {
 /** Call a function for each non-GLOBAL subquery in IN or JOIN.
  * Pass to function: AST node with subquery, and AST node with corresponding IN function or JOIN.
  * Consider only first-level subqueries (do not go recursively into subqueries).
  */
 template <typename F>
 void forEachNonGlobalSubquery(IAST * node, F && f)
 {
    if (auto * function = node->as<ASTFunction>())
    {
        if (function->name == "in" || function->name == "notIn")
        {
            f(function->arguments->children.at(1).get(), function, nullptr);
            return;
        }
        /// Pass into other functions, as subquery could be in aggregate or in lambda functions.
    }
    else if (const auto * join = node->as<ASTTablesInSelectQueryElement>())
    {
        if (join->table_join && join->table_expression)
        {
            auto & table_join = join->table_join->as<ASTTableJoin &>();
            if (table_join.locality != ASTTableJoin::Locality::Global)
            {
                auto & subquery = join->table_expression->as<ASTTableExpression>()->subquery;
                if (subquery)
                    f(subquery.get(), nullptr, &table_join);
            }
            return;
        }
        /// Pass into other kind of JOINs, as subquery could be in ARRAY JOIN.
    }
    /// Descent into all children, but not into subqueries of other kind (scalar subqueries), that are irrelevant to us.
    for (auto & child : node->children)
        if (!child->as<ASTSelectQuery>())
            forEachNonGlobalSubquery(child.get(), f);
 }
 /** Find all (ordinary) tables in any nesting level in AST.
  */
 template <typename F>
 void forEachTable(IAST * node, F && f)
 {
    if (auto * table_expression = node->as<ASTTableExpression>())
    {
        auto & database_and_table = table_expression->database_and_table_name;
        if (database_and_table)
            f(database_and_table);
    }
    for (auto & child : node->children)
        forEachTable(child.get(), f);
 }
 StoragePtr tryGetTable(const ASTPtr & database_and_table, const Context & context)
 {
    DatabaseAndTableWithAlias db_and_table(database_and_table);
    return context.tryGetTable(db_and_table.database, db_and_table.table);
 }
 using CheckShardsAndTables = InJoinSubqueriesPreprocessor::CheckShardsAndTables;
 struct NonGlobalTableData
 {
    using TypeToVisit = ASTTableExpression;
    const CheckShardsAndTables & checker;
    const Context & context;
    ASTFunction * function = nullptr;
    ASTTableJoin * table_join = nullptr;
    void visit(ASTTableExpression & node, ASTPtr &)
    {
        ASTPtr & database_and_table = node.database_and_table_name;
        if (database_and_table)
            renameIfNeeded(database_and_table);
    }
-
+private:
-void InJoinSubqueriesPreprocessor::process(ASTSelectQuery * query) const
+    void renameIfNeeded(ASTPtr & database_and_table)
    {
    if (!query)
        return;
        const SettingDistributedProductMode distributed_product_mode = context.getSettingsRef().distributed_product_mode;
    if (distributed_product_mode == DistributedProductMode::ALLOW)
        return;
    if (!query->tables())
        return;
    const auto & tables_in_select_query = query->tables()->as<ASTTablesInSelectQuery &>();
    if (tables_in_select_query.children.empty())
        return;
    const auto & tables_element = tables_in_select_query.children[0]->as<ASTTablesInSelectQueryElement &>();
    if (!tables_element.table_expression)
        return;
    const auto * table_expression = tables_element.table_expression->as<ASTTableExpression>();
    /// If not ordinary table, skip it.
    if (!table_expression->database_and_table_name)
        return;
    /// If not really distributed table, skip it.
    {
        StoragePtr storage = tryGetTable(table_expression->database_and_table_name, context);
        if (!storage || !hasAtLeastTwoShards(*storage))
            return;
    }
    forEachNonGlobalSubquery(query, [&] (IAST * subquery, IAST * function, IAST * table_join)
    {
        forEachTable(subquery, [&] (ASTPtr & database_and_table)
        {
        StoragePtr storage = tryGetTable(database_and_table, context);
-
+        if (!storage || !checker.hasAtLeastTwoShards(*storage))
            if (!storage || !hasAtLeastTwoShards(*storage))
            return;
        if (distributed_product_mode == DistributedProductMode::DENY)
@ -157,7 +81,7 @@ void InJoinSubqueriesPreprocessor::process(ASTSelectQuery * query) const
                    throw Exception("Logical error: unexpected function name " + concrete->name, ErrorCodes::LOGICAL_ERROR);
            }
            else if (table_join)
-                    table_join->as<ASTTableJoin &>().locality = ASTTableJoin::Locality::Global;
+                table_join->locality = ASTTableJoin::Locality::Global;
            else
                throw Exception("Logical error: unexpected AST node", ErrorCodes::LOGICAL_ERROR);
        }
@ -167,18 +91,131 @@ void InJoinSubqueriesPreprocessor::process(ASTSelectQuery * query) const
            std::string database;
            std::string table;
-                std::tie(database, table) = getRemoteDatabaseAndTableName(*storage);
+            std::tie(database, table) = checker.getRemoteDatabaseAndTableName(*storage);
            String alias = database_and_table->tryGetAlias();
            if (alias.empty())
                throw Exception("Distributed table should have an alias when distributed_product_mode set to local.",
                                ErrorCodes::DISTRIBUTED_IN_JOIN_SUBQUERY_DENIED);
            database_and_table = createTableIdentifier(database, table);
            database_and_table->setAlias(alias);
        }
        else
-                throw Exception("InJoinSubqueriesPreprocessor: unexpected value of 'distributed_product_mode' setting", ErrorCodes::LOGICAL_ERROR);
+            throw Exception("InJoinSubqueriesPreprocessor: unexpected value of 'distributed_product_mode' setting",
-        });
+                            ErrorCodes::LOGICAL_ERROR);
-    });
+    }
 };
 using NonGlobalTableMatcher = OneTypeMatcher<NonGlobalTableData>;
 using NonGlobalTableVisitor = InDepthNodeVisitor<NonGlobalTableMatcher, true>;
 class NonGlobalSubqueryMatcher
 {
 public:
    struct Data
    {
        const CheckShardsAndTables & checker;
        const Context & context;
    };
    static void visit(ASTPtr & node, Data & data)
    {
        if (auto * function = node->as<ASTFunction>())
            visit(*function, node, data);
        if (const auto * tables = node->as<ASTTablesInSelectQueryElement>())
            visit(*tables, node, data);
    }
    static bool needChildVisit(ASTPtr & node, const ASTPtr & child)
    {
        if (auto * function = node->as<ASTFunction>())
            if (function->name == "in" || function->name == "notIn")
                return false; /// Processed, process others
        if (const auto * t = node->as<ASTTablesInSelectQueryElement>())
            if (t->table_join && t->table_expression)
                return false; /// Processed, process others
        /// Descent into all children, but not into subqueries of other kind (scalar subqueries), that are irrelevant to us.
        if (child->as<ASTSelectQuery>())
            return false;
        return true;
    }
 private:
    static void visit(ASTFunction & node, ASTPtr &, Data & data)
    {
        if (node.name == "in" || node.name == "notIn")
        {
            auto & subquery = node.arguments->children.at(1);
            NonGlobalTableVisitor::Data table_data{data.checker, data.context, &node, nullptr};
            NonGlobalTableVisitor(table_data).visit(subquery);
        }
    }
    static void visit(const ASTTablesInSelectQueryElement & node, ASTPtr &, Data & data)
    {
        if (!node.table_join || !node.table_expression)
            return;
        ASTTableJoin * table_join = node.table_join->as<ASTTableJoin>();
        if (table_join->locality != ASTTableJoin::Locality::Global)
        {
            if (auto & subquery = node.table_expression->as<ASTTableExpression>()->subquery)
            {
                NonGlobalTableVisitor::Data table_data{data.checker, data.context, nullptr, table_join};
                NonGlobalTableVisitor(table_data).visit(subquery);
            }
        }
    }
 };
 using NonGlobalSubqueryVisitor = InDepthNodeVisitor<NonGlobalSubqueryMatcher, true>;
 }
-bool InJoinSubqueriesPreprocessor::hasAtLeastTwoShards(const IStorage & table) const
+void InJoinSubqueriesPreprocessor::visit(ASTPtr & ast) const
 {
    if (!ast)
        return;
    ASTSelectQuery * query = ast->as<ASTSelectQuery>();
    if (!query || !query->tables())
        return;
    if (context.getSettingsRef().distributed_product_mode == DistributedProductMode::ALLOW)
        return;
    const auto & tables_in_select_query = query->tables()->as<ASTTablesInSelectQuery &>();
    if (tables_in_select_query.children.empty())
        return;
    const auto & tables_element = tables_in_select_query.children[0]->as<ASTTablesInSelectQueryElement &>();
    if (!tables_element.table_expression)
        return;
    const auto * table_expression = tables_element.table_expression->as<ASTTableExpression>();
    /// If not ordinary table, skip it.
    if (!table_expression->database_and_table_name)
        return;
    /// If not really distributed table, skip it.
    {
        StoragePtr storage = tryGetTable(table_expression->database_and_table_name, context);
        if (!storage || !checker->hasAtLeastTwoShards(*storage))
            return;
    }
    NonGlobalSubqueryVisitor::Data visitor_data{*checker, context};
    NonGlobalSubqueryVisitor(visitor_data).visit(ast);
 }
 bool InJoinSubqueriesPreprocessor::CheckShardsAndTables::hasAtLeastTwoShards(const IStorage & table) const
 {
    const StorageDistributed * distributed = dynamic_cast<const StorageDistributed *>(&table);
    if (!distributed)
@ -189,7 +226,7 @@ bool InJoinSubqueriesPreprocessor::hasAtLeastTwoShards(const IStorage & table) c
 std::pair<std::string, std::string>
-InJoinSubqueriesPreprocessor::getRemoteDatabaseAndTableName(const IStorage & table) const
+InJoinSubqueriesPreprocessor::CheckShardsAndTables::getRemoteDatabaseAndTableName(const IStorage & table) const
 {
    const StorageDistributed & distributed = dynamic_cast<const StorageDistributed &>(table);
    return { distributed.getRemoteDatabaseName(), distributed.getRemoteTableName() };
--- a/dbms/src/Interpreters/InJoinSubqueriesPreprocessor.h
+++ b/dbms/src/Interpreters/InJoinSubqueriesPreprocessor.h
@ -1,14 +1,15 @@
 #pragma once
 #include <string>
 #include <memory>
 #include <Core/Types.h>
 #include <Core/SettingsCommon.h>
 #include <Parsers/IAST_fwd.h>
 namespace DB
 {
 class IAST;
 class IStorage;
 class ASTSelectQuery;
 class Context;
@ -34,16 +35,26 @@ class Context;
 class InJoinSubqueriesPreprocessor
 {
 public:
-    InJoinSubqueriesPreprocessor(const Context & context) : context(context) {}
+    struct CheckShardsAndTables
-    void process(ASTSelectQuery * query) const;
+    {
        using Ptr = std::unique_ptr<CheckShardsAndTables>;
        /// These methods could be overriden for the need of the unit test.
        virtual bool hasAtLeastTwoShards(const IStorage & table) const;
        virtual std::pair<std::string, std::string> getRemoteDatabaseAndTableName(const IStorage & table) const;
-    virtual ~InJoinSubqueriesPreprocessor() {}
+        virtual ~CheckShardsAndTables() {}
    };
    InJoinSubqueriesPreprocessor(const Context & context, CheckShardsAndTables::Ptr _checker = std::make_unique<CheckShardsAndTables>())
        : context(context)
        , checker(std::move(_checker))
    {}
    void visit(ASTPtr & query) const;
 private:
    const Context & context;
    CheckShardsAndTables::Ptr checker;
 };
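Note on the header change above: the shard checks move into a small polymorphic `CheckShardsAndTables` object that the preprocessor owns through a `std::unique_ptr`, with the production implementation supplied as a default argument. A minimal sketch of that injection pattern follows. All names ending in `Sketch` or `Mock` are hypothetical stand-ins and not the ClickHouse types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for DB::IStorage; only what the sketch needs.
struct StorageSketch
{
    std::string name;
    bool remote = false;
    int shards = 1;
};

// Default policy. Methods are virtual so a test can substitute its own answers.
struct CheckShardsSketch
{
    using Ptr = std::unique_ptr<CheckShardsSketch>;
    virtual bool hasAtLeastTwoShards(const StorageSketch & table) const { return table.shards >= 2; }
    virtual ~CheckShardsSketch() = default;
};

// The preprocessor owns the policy; production callers rely on the default argument.
class PreprocessorSketch
{
public:
    explicit PreprocessorSketch(CheckShardsSketch::Ptr checker_ = std::make_unique<CheckShardsSketch>())
        : checker(std::move(checker_))
    {
        assert(checker);  // a null policy would make every later call invalid
    }

    // True when a subquery over this table would have to be rewritten.
    bool needsRewrite(const StorageSketch & table) const { return checker->hasAtLeastTwoShards(table); }

private:
    CheckShardsSketch::Ptr checker;
};

// Test double: treats every remote table as sharded, whatever the real topology is.
struct CheckShardsMock : CheckShardsSketch
{
    bool hasAtLeastTwoShards(const StorageSketch & table) const override { return table.remote; }
};
```

With this shape a test no longer subclasses the preprocessor itself. It passes `std::make_unique<CheckShardsMock>()` to the constructor, which is what the updated unit test in this commit does with the real classes.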
--- a/dbms/src/Interpreters/InterpreterCreateQuery.cpp
+++ b/dbms/src/Interpreters/InterpreterCreateQuery.cpp
@ -233,6 +233,9 @@ ASTPtr InterpreterCreateQuery::formatColumns(const ColumnsDescription & columns)
            column_declaration->codec = parseQuery(codec_p, codec_desc_pos, codec_desc_end, "column codec", 0);
        }
        if (column.ttl)
            column_declaration->ttl = column.ttl;
        columns_list->children.push_back(column_declaration_ptr);
    }
@ -347,6 +350,9 @@ ColumnsDescription InterpreterCreateQuery::getColumnsDescription(const ASTExpres
        if (col_decl.codec)
            column.codec = CompressionCodecFactory::instance().get(col_decl.codec, column.type);
        if (col_decl.ttl)
            column.ttl = col_decl.ttl;
        res.add(std::move(column));
    }
--- a/dbms/src/Interpreters/InterpreterDescribeQuery.cpp
+++ b/dbms/src/Interpreters/InterpreterDescribeQuery.cpp
@ -52,6 +52,9 @@ Block InterpreterDescribeQuery::getSampleBlock()
    col.name = "codec_expression";
    block.insert(col);
    col.name = "ttl_expression";
    block.insert(col);
    return block;
 }
@ -118,6 +121,11 @@ BlockInputStreamPtr InterpreterDescribeQuery::executeImpl()
            res_columns[5]->insert(column.codec->getCodecDesc());
        else
            res_columns[5]->insertDefault();
        if (column.ttl)
            res_columns[6]->insert(queryToString(column.ttl));
        else
            res_columns[6]->insertDefault();
    }
    return std::make_shared<OneBlockInputStream>(sample_block.cloneWithColumns(std::move(res_columns)));
--- a/dbms/src/Interpreters/InterpreterSelectQuery.cpp
+++ b/dbms/src/Interpreters/InterpreterSelectQuery.cpp
@ -760,11 +760,11 @@ void InterpreterSelectQuery::executeImpl(Pipeline & pipeline, const BlockInputSt
                executeExpression(pipeline, expressions.before_order_and_select);
                executeDistinct(pipeline, true, expressions.selected_columns);
-                need_second_distinct_pass = query.distinct && pipeline.hasMoreThanOneStream();
+                need_second_distinct_pass = query.distinct && pipeline.hasMixedStreams();
            }
            else
            {
-                need_second_distinct_pass = query.distinct && pipeline.hasMoreThanOneStream();
+                need_second_distinct_pass = query.distinct && pipeline.hasMixedStreams();
                if (query.group_by_with_totals && !aggregate_final)
                {
@ -1533,6 +1533,7 @@ void InterpreterSelectQuery::executeUnion(Pipeline & pipeline)
        pipeline.firstStream() = std::make_shared<UnionBlockInputStream>(pipeline.streams, pipeline.stream_with_non_joined_data, max_streams);
        pipeline.stream_with_non_joined_data = nullptr;
        pipeline.streams.resize(1);
        pipeline.union_stream = true;
    }
    else if (pipeline.stream_with_non_joined_data)
    {
--- a/dbms/src/Interpreters/InterpreterSelectQuery.h
+++ b/dbms/src/Interpreters/InterpreterSelectQuery.h
@ -100,6 +100,7 @@ private:
          * It is appended to the main streams in UnionBlockInputStream or ParallelAggregatingBlockInputStream.
          */
        BlockInputStreamPtr stream_with_non_joined_data;
        bool union_stream = false;
        BlockInputStreamPtr & firstStream() { return streams.at(0); }
@ -117,6 +118,12 @@ private:
        {
            return streams.size() + (stream_with_non_joined_data ? 1 : 0) > 1;
        }
        /// Resulting stream is mix of other streams data. Distinct and/or order guaranties are broken.
        bool hasMixedStreams() const
        {
            return hasMoreThanOneStream() || union_stream;
        }
    };
    void executeImpl(Pipeline & pipeline, const BlockInputStreamPtr & prepared_input, bool dry_run);
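Note on `hasMixedStreams()`: after a union the pipeline holds a single stream, so `hasMoreThanOneStream()` alone reports false even though rows from several sources are interleaved. The sketch below reduces the struct to the fields involved. `PipelineSketch` and `has_non_joined` are hypothetical simplifications, and the guard inside `executeUnion` is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduction of the Pipeline struct: streams are plain ids here.
struct PipelineSketch
{
    std::vector<int> streams;
    bool has_non_joined = false;  // stands in for stream_with_non_joined_data
    bool union_stream = false;

    bool hasMoreThanOneStream() const
    {
        return streams.size() + (has_non_joined ? 1 : 0) > 1;
    }

    // The result mixes data of several streams, so DISTINCT/ORDER guarantees no longer hold.
    bool hasMixedStreams() const
    {
        return hasMoreThanOneStream() || union_stream;
    }

    // Mimics the union step: collapse to one stream and remember that it is a union.
    void executeUnion()
    {
        if (hasMoreThanOneStream())
        {
            streams.resize(1);
            has_non_joined = false;
            union_stream = true;
        }
    }
};
```

Checking the flag instead of the stream count is what lets the second DISTINCT pass still run after the streams have already been merged.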
--- a/dbms/src/Interpreters/RowRefs.h
+++ b/dbms/src/Interpreters/RowRefs.h
@ -1,5 +1,6 @@
 #pragma once
 #include <Common/RadixSort.h>
 #include <Columns/IColumn.h>
 #include <optional>
@ -39,11 +40,11 @@ struct RowRefList : RowRef
 * references that can be returned by the lookup methods
 */
-template <typename T>
+template <typename _Entry, typename _Key>
 class SortedLookupVector
 {
 public:
-    using Base = std::vector<T>;
+    using Base = std::vector<_Entry>;
    // First stage, insertions into the vector
    template <typename U, typename ... TAllocatorParams>
@ -54,7 +55,7 @@ public:
    }
    // Transition into second stage, ensures that the vector is sorted
-    typename Base::const_iterator upper_bound(const T & k)
+    typename Base::const_iterator upper_bound(const _Entry & k)
    {
        sort();
        return std::upper_bound(array.cbegin(), array.cend(), k);
@ -81,7 +82,15 @@ private:
            std::lock_guard<std::mutex> l(lock);
            if (!sorted.load(std::memory_order_relaxed))
            {
                /// TODO: It has been tested only for UInt32 yet. It needs to check UInt64, Float32/64.
                if constexpr (std::is_same_v<_Key, UInt32>)
                {
                    if (!array.empty())
                        radixSort<_Entry, _Key>(&array[0], array.size());
                }
                else
                    std::sort(array.begin(), array.end());
                sorted.store(true, std::memory_order_release);
            }
        }
@ -94,7 +103,7 @@ public:
    template <typename T>
    struct Entry
    {
-        using LookupType = SortedLookupVector<Entry<T>>;
+        using LookupType = SortedLookupVector<Entry<T>, T>;
        using LookupPtr = std::unique_ptr<LookupType>;
        T asof_value;
        RowRef row_ref;
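Note on the extra `_Key` template parameter: it lets `sort()` choose the algorithm at compile time, radix sort for `UInt32` keys and `std::sort` otherwise. A self-contained sketch of that dispatch is below. `EntrySketch`, `radixSortByKey32` and `sortEntries` are hypothetical names, and the byte-wise LSD radix sort is a simple stand-in, not the `radixSort` from `Common/RadixSort.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical entry: ordered by `key`, carries a row number as payload.
template <typename Key>
struct EntrySketch
{
    Key key;
    int row;
    bool operator<(const EntrySketch & other) const { return key < other.key; }
};

// Stable LSD radix sort on a 32-bit unsigned key, one byte per pass.
template <typename Entry>
void radixSortByKey32(std::vector<Entry> & data)
{
    std::vector<Entry> buffer(data.size());
    for (int shift = 0; shift < 32; shift += 8)
    {
        std::size_t counts[257] = {};
        for (const auto & e : data)
            ++counts[((e.key >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i)
            counts[i + 1] += counts[i];  // counts[b] becomes the first output slot for byte b
        for (const auto & e : data)
            buffer[counts[(e.key >> shift) & 0xFF]++] = e;
        data.swap(buffer);
    }
}

// Compile-time choice of algorithm, keyed on the key type rather than the entry type.
template <typename Entry, typename Key>
void sortEntries(std::vector<Entry> & data)
{
    if constexpr (std::is_same_v<Key, std::uint32_t>)
        radixSortByKey32(data);
    else
        std::sort(data.begin(), data.end());
}
```

Because the branch is `if constexpr`, the radix path is never instantiated for key types such as `double`, where the shift and mask operations would not compile.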
--- a/dbms/src/Interpreters/SyntaxAnalyzer.cpp
+++ b/dbms/src/Interpreters/SyntaxAnalyzer.cpp
@ -710,9 +710,8 @@ SyntaxAnalyzerResultPtr SyntaxAnalyzer::analyze(
                                (storage ? storage->getColumns().getOrdinary().getNames() : source_columns_list), source_columns_set,
                                result.analyzed_join.columns_from_joined_table);
-        /// Depending on the user's profile, check for the execution rights
-        /// distributed subqueries inside the IN or JOIN sections and process these subqueries.
-        InJoinSubqueriesPreprocessor(context).process(select_query);
+        /// Rewrite IN and/or JOIN for distributed tables according to distributed_product_mode setting.
+        InJoinSubqueriesPreprocessor(context).visit(query);
        /// Optimizes logical expressions.
        LogicalExpressionsOptimizer(select_query, settings.optimize_min_equality_disjunction_chain_length.value).perform();
--- a/dbms/src/Interpreters/convertFieldToType.cpp
+++ b/dbms/src/Interpreters/convertFieldToType.cpp
@ -51,16 +51,10 @@ namespace
 template <typename From, typename To>
 static Field convertNumericTypeImpl(const Field & from)
 {
-    From value = from.get<From>();
-
-    /// Note that NaNs doesn't compare equal to anything, but they are still in range of any Float type.
-    if (isNaN(value) && std::is_floating_point_v<To>)
-        return value;
-
-    if (!accurate::equalsOp(value, To(value)))
+    To result;
+    if (!accurate::convertNumeric(from.get<From>(), result))
         return {};
-
-    return To(value);
+    return result;
 }
 template <typename To>
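Note on the conversion change: the call site now hands the value and an output variable to one checked conversion and returns an empty `Field` when it reports failure. The sketch below shows that "convert or report failure" shape for the narrow case of `double` to an integer of at most 32 bits. `convertDoubleChecked` is a hypothetical helper written for illustration; it does not reproduce `accurate::convertNumeric`, whose exact rules (for example for NaN with floating-point targets) are not visible in this diff.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: writes `result` and returns true only if `value` is exactly representable in To.
template <typename To>
bool convertDoubleChecked(double value, To & result)
{
    static_assert(std::is_integral_v<To> && sizeof(To) <= 4, "sketch covers integers up to 32 bits");

    if (std::isnan(value))
        return false;  // NaN has no integer representation

    // For types up to 32 bits both limits are exactly representable as double.
    const double lo = static_cast<double>(std::numeric_limits<To>::min());
    const double hi = static_cast<double>(std::numeric_limits<To>::max());
    if (value < lo || value > hi)
        return false;  // out of range, which also rejects +/- infinity

    if (value != std::trunc(value))
        return false;  // a fractional part would be lost

    result = static_cast<To>(value);
    return true;
}
```

The range check runs before the cast on purpose: converting an out-of-range floating-point value to an integer type is undefined behaviour in C++, so a cast followed by a round-trip comparison is not a safe test on its own.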
--- a/dbms/src/Interpreters/tests/in_join_subqueries_preprocessor.cpp
+++ b/dbms/src/Interpreters/tests/in_join_subqueries_preprocessor.cpp
@ -52,11 +52,9 @@ private:
 };
-class InJoinSubqueriesPreprocessorMock : public DB::InJoinSubqueriesPreprocessor
+class CheckShardsAndTablesMock : public DB::InJoinSubqueriesPreprocessor::CheckShardsAndTables
 {
 public:
-    using DB::InJoinSubqueriesPreprocessor::InJoinSubqueriesPreprocessor;
    bool hasAtLeastTwoShards(const DB::IStorage & table) const override
    {
        if (!table.isRemote())
@ -1181,13 +1179,11 @@ TestResult check(const TestEntry & entry)
        if (!parse(ast_input, entry.input))
            return TestResult(false, "parse error");
-        auto select_query = typeid_cast<DB::ASTSelectQuery *>(&*ast_input);
        bool success = true;
        try
        {
-            InJoinSubqueriesPreprocessorMock(context).process(select_query);
+            DB::InJoinSubqueriesPreprocessor(context, std::make_unique<CheckShardsAndTablesMock>()).visit(ast_input);
        }
        catch (const DB::Exception & ex)
        {
--- a/dbms/src/Parsers/ASTAlterQuery.cpp
+++ b/dbms/src/Parsers/ASTAlterQuery.cpp
@ -40,6 +40,11 @@ ASTPtr ASTAlterCommand::clone() const
        res->predicate = predicate->clone();
        res->children.push_back(res->predicate);
    }
    if (ttl)
    {
        res->ttl = ttl->clone();
        res->children.push_back(res->ttl);
    }
    return res;
 }
@ -174,6 +179,11 @@ void ASTAlterCommand::formatImpl(
        settings.ostr << " " << (settings.hilite ? hilite_none : "");
        comment->formatImpl(settings, state, frame);
    }
    else if (type == ASTAlterCommand::MODIFY_TTL)
    {
        settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << "MODIFY TTL " << (settings.hilite ? hilite_none : "");
        ttl->formatImpl(settings, state, frame);
    }
    else
        throw Exception("Unexpected type of ALTER", ErrorCodes::UNEXPECTED_AST_STRUCTURE);
 }
--- a/dbms/src/Parsers/ASTAlterQuery.h
+++ b/dbms/src/Parsers/ASTAlterQuery.h
@ -27,6 +27,7 @@ public:
        MODIFY_COLUMN,
        COMMENT_COLUMN,
        MODIFY_ORDER_BY,
        MODIFY_TTL,
        ADD_INDEX,
        DROP_INDEX,
@ -84,6 +85,9 @@ public:
    /// A column comment
    ASTPtr comment;
    /// For MODIFY TTL query
    ASTPtr ttl;
    bool detach = false;        /// true for DETACH PARTITION
    bool part = false;          /// true for ATTACH PART
--- a/dbms/src/Parsers/ASTColumnDeclaration.cpp
+++ b/dbms/src/Parsers/ASTColumnDeclaration.cpp
@ -33,6 +33,12 @@ ASTPtr ASTColumnDeclaration::clone() const
        res->children.push_back(res->comment);
    }
    if (ttl)
    {
        res->ttl = ttl->clone();
        res->children.push_back(res->ttl);
    }
    return res;
 }
@ -69,6 +75,12 @@ void ASTColumnDeclaration::formatImpl(const FormatSettings & settings, FormatSta
        settings.ostr << ' ';
        codec->formatImpl(settings, state, frame);
    }
    if (ttl)
    {
        settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << "TTL" << (settings.hilite ? hilite_none : "") << ' ';
        ttl->formatImpl(settings, state, frame);
    }
 }
 }
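Note on the `clone()` hunks: a named member such as `ttl` is a shortcut to one of the node's children, so a deep copy has to clone the child and register the new pointer in `children` again. Otherwise the copy would keep pointing into the original tree. A minimal sketch under that assumption follows. `NodeSketch` and `ColumnDeclSketch` are hypothetical types and not the ClickHouse `IAST` hierarchy.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct NodeSketch;
using NodePtr = std::shared_ptr<NodeSketch>;

// Hypothetical AST node: some text plus an ordered list of children.
struct NodeSketch
{
    std::string text;
    std::vector<NodePtr> children;

    virtual ~NodeSketch() = default;

    virtual NodePtr clone() const
    {
        auto res = std::make_shared<NodeSketch>(*this);
        res->children.clear();
        for (const auto & child : children)
            res->children.push_back(child->clone());
        return res;
    }
};

// Hypothetical column declaration with one named child.
struct ColumnDeclSketch : NodeSketch
{
    NodePtr ttl;  // shortcut to the TTL expression, which is also stored in `children`

    NodePtr clone() const override
    {
        auto res = std::make_shared<ColumnDeclSketch>(*this);
        res->children.clear();  // the shallow copy still shares children with the original
        if (ttl)
        {
            res->ttl = ttl->clone();            // deep-copy the named child
            res->children.push_back(res->ttl);  // and register the copy as a child
        }
        return res;
    }
};
```

The same two steps, clone then `push_back`, appear for `ttl` in both `ASTAlterCommand::clone` and `ASTColumnDeclaration::clone` in this commit.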
--- a/dbms/src/Parsers/ASTColumnDeclaration.h
+++ b/dbms/src/Parsers/ASTColumnDeclaration.h
@ -17,6 +17,7 @@ public:
    ASTPtr default_expression;
    ASTPtr codec;
    ASTPtr comment;
    ASTPtr ttl;
    String getID(char delim) const override { return "ColumnDeclaration" + (delim + name); }
--- a/dbms/src/Parsers/ASTCreateQuery.cpp
+++ b/dbms/src/Parsers/ASTCreateQuery.cpp
@ -23,6 +23,8 @@ ASTPtr ASTStorage::clone() const
        res->set(res->order_by, order_by->clone());
    if (sample_by)
        res->set(res->sample_by, sample_by->clone());
    if (ttl_table)
        res->set(res->ttl_table, ttl_table->clone());
    if (settings)
        res->set(res->settings, settings->clone());
@ -57,6 +59,11 @@ void ASTStorage::formatImpl(const FormatSettings & s, FormatState & state, Forma
        s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SAMPLE BY " << (s.hilite ? hilite_none : "");
        sample_by->formatImpl(s, state, frame);
    }
    if (ttl_table)
    {
        s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "TTL " << (s.hilite ? hilite_none : "");
        ttl_table->formatImpl(s, state, frame);
    }
    if (settings)
    {
        s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SETTINGS " << (s.hilite ? hilite_none : "");
--- a/dbms/src/Parsers/ASTCreateQuery.h
+++ b/dbms/src/Parsers/ASTCreateQuery.h
@ -18,6 +18,7 @@ public:
    IAST * primary_key = nullptr;
    IAST * order_by = nullptr;
    IAST * sample_by = nullptr;
    IAST * ttl_table = nullptr;
    ASTSetQuery * settings = nullptr;
    String getID(char) const override { return "Storage definition"; }
--- a/dbms/src/Parsers/ParserAlterQuery.cpp
+++ b/dbms/src/Parsers/ParserAlterQuery.cpp
@ -27,6 +27,7 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
    ParserKeyword s_modify_column("MODIFY COLUMN");
    ParserKeyword s_comment_column("COMMENT COLUMN");
    ParserKeyword s_modify_order_by("MODIFY ORDER BY");
    ParserKeyword s_modify_ttl("MODIFY TTL");
    ParserKeyword s_add_index("ADD INDEX");
    ParserKeyword s_drop_index("DROP INDEX");
@ -282,6 +283,12 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
        command->type = ASTAlterCommand::COMMENT_COLUMN;
    }
    else if (s_modify_ttl.ignore(pos, expected))
    {
        if (!parser_exp_elem.parse(pos, command->ttl, expected))
            return false;
        command->type = ASTAlterCommand::MODIFY_TTL;
    }
    else
        return false;
@ -299,6 +306,8 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
        command->children.push_back(command->update_assignments);
    if (command->comment)
        command->children.push_back(command->comment);
    if (command->ttl)
        command->children.push_back(command->ttl);
    return true;
 }
--- a/dbms/src/Parsers/ParserCreateQuery.cpp
+++ b/dbms/src/Parsers/ParserCreateQuery.cpp
@ -210,6 +210,7 @@ bool ParserStorage::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
    ParserKeyword s_primary_key("PRIMARY KEY");
    ParserKeyword s_order_by("ORDER BY");
    ParserKeyword s_sample_by("SAMPLE BY");
    ParserKeyword s_ttl("TTL");
    ParserKeyword s_settings("SETTINGS");
    ParserIdentifierWithOptionalParameters ident_with_optional_params_p;
@ -221,6 +222,7 @@ bool ParserStorage::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
    ASTPtr primary_key;
    ASTPtr order_by;
    ASTPtr sample_by;
    ASTPtr ttl_table;
    ASTPtr settings;
    if (!s_engine.ignore(pos, expected))
@ -265,6 +267,14 @@ bool ParserStorage::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
                return false;
        }
        if (!ttl_table && s_ttl.ignore(pos, expected))
        {
            if (expression_p.parse(pos, ttl_table, expected))
                continue;
            else
                return false;
        }
        if (s_settings.ignore(pos, expected))
        {
            if (!settings_p.parse(pos, settings, expected))
@ -280,6 +290,7 @@ bool ParserStorage::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
    storage->set(storage->primary_key, primary_key);
    storage->set(storage->order_by, order_by);
    storage->set(storage->sample_by, sample_by);
    storage->set(storage->ttl_table, ttl_table);
    storage->set(storage->settings, settings);
--- a/dbms/src/Parsers/ParserCreateQuery.h
+++ b/dbms/src/Parsers/ParserCreateQuery.h
@ -123,9 +123,11 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
    ParserKeyword s_alias{"ALIAS"};
    ParserKeyword s_comment{"COMMENT"};
    ParserKeyword s_codec{"CODEC"};
    ParserKeyword s_ttl{"TTL"};
    ParserTernaryOperatorExpression expr_parser;
    ParserStringLiteral string_literal_parser;
    ParserCodec codec_parser;
    ParserExpression expression_parser;
    /// mandatory column name
    ASTPtr name;
@ -140,6 +142,7 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
    ASTPtr default_expression;
    ASTPtr comment_expression;
    ASTPtr codec_expression;
    ASTPtr ttl_expression;
    if (!s_default.check_without_moving(pos, expected) &&
        !s_materialized.check_without_moving(pos, expected) &&
@ -178,6 +181,12 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
            return false;
    }
    if (s_ttl.ignore(pos, expected))
    {
        if (!expression_parser.parse(pos, ttl_expression, expected))
            return false;
    }
    const auto column_declaration = std::make_shared<ASTColumnDeclaration>();
    node = column_declaration;
    getIdentifierName(name, column_declaration->name);
@ -207,6 +216,12 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
        column_declaration->children.push_back(std::move(codec_expression));
    }
    if (ttl_expression)
    {
        column_declaration->ttl = ttl_expression;
        column_declaration->children.push_back(std::move(ttl_expression));
    }
    return true;
 }
--- a/dbms/src/Storages/AlterCommands.cpp
+++ b/dbms/src/Storages/AlterCommands.cpp
@ -17,6 +17,8 @@
 #include <Common/typeid_cast.h>
 #include <Compression/CompressionFactory.h>
 #include <Parsers/queryToString.h>
 namespace DB
 {
@ -64,6 +66,9 @@ std::optional<AlterCommand> AlterCommand::parse(const ASTAlterCommand * command_
        if (command_ast->column)
            command.after_column = *getIdentifierName(command_ast->column);
        if (ast_col_decl.ttl)
            command.ttl = ast_col_decl.ttl;
        command.if_not_exists = command_ast->if_not_exists;
        return command;
@ -104,6 +109,9 @@ std::optional<AlterCommand> AlterCommand::parse(const ASTAlterCommand * command_
            command.comment = ast_comment.value.get<String>();
        }
        if (ast_col_decl.ttl)
            command.ttl = ast_col_decl.ttl;
        if (ast_col_decl.codec)
            command.codec = compression_codec_factory.get(ast_col_decl.codec, command.data_type);
@ -157,13 +165,20 @@ std::optional<AlterCommand> AlterCommand::parse(const ASTAlterCommand * command_
        return command;
    }
    else if (command_ast->type == ASTAlterCommand::MODIFY_TTL)
    {
        AlterCommand command;
        command.type = AlterCommand::MODIFY_TTL;
        command.ttl = command_ast->ttl;
        return command;
    }
    else
        return {};
 }
 void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescription & indices_description,
-        ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const
+        ASTPtr & order_by_ast, ASTPtr & primary_key_ast, ASTPtr & ttl_table_ast) const
 {
    if (type == ADD_COLUMN)
    {
@ -175,6 +190,7 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri
        }
        column.comment = comment;
        column.codec = codec;
        column.ttl = ttl;
        columns_description.add(column, after_column);
@ -204,6 +220,9 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri
            return;
        }
        if (ttl)
            column.ttl = ttl;
        column.type = data_type;
        column.default_desc.kind = default_kind;
@ -278,6 +297,10 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri
        indices_description.indices.erase(erase_it);
    }
    else if (type == MODIFY_TTL)
    {
        ttl_table_ast = ttl;
    }
    else
        throw Exception("Wrong parameter type in ALTER query", ErrorCodes::LOGICAL_ERROR);
 }
@ -293,20 +316,22 @@ bool AlterCommand::is_mutable() const
 }
 void AlterCommands::apply(ColumnsDescription & columns_description, IndicesDescription & indices_description,
-        ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const
+        ASTPtr & order_by_ast, ASTPtr & primary_key_ast, ASTPtr & ttl_table_ast) const
 {
    auto new_columns_description = columns_description;
    auto new_indices_description = indices_description;
    auto new_order_by_ast = order_by_ast;
    auto new_primary_key_ast = primary_key_ast;
    auto new_ttl_table_ast = ttl_table_ast;
    for (const AlterCommand & command : *this)
        if (!command.ignore)
-            command.apply(new_columns_description, new_indices_description, new_order_by_ast, new_primary_key_ast);
+            command.apply(new_columns_description, new_indices_description, new_order_by_ast, new_primary_key_ast, new_ttl_table_ast);
    columns_description = std::move(new_columns_description);
    indices_description = std::move(new_indices_description);
    order_by_ast = std::move(new_order_by_ast);
    primary_key_ast = std::move(new_primary_key_ast);
    ttl_table_ast = std::move(new_ttl_table_ast);
 }
 void AlterCommands::validate(const IStorage & table, const Context & context)
@ -493,7 +518,8 @@ void AlterCommands::apply(ColumnsDescription & columns_description) const
    IndicesDescription indices_description;
    ASTPtr out_order_by;
    ASTPtr out_primary_key;
-    apply(out_columns_description, indices_description, out_order_by, out_primary_key);
+    ASTPtr out_ttl_table;
    apply(out_columns_description, indices_description, out_order_by, out_primary_key, out_ttl_table);
    if (out_order_by)
        throw Exception("Storage doesn't support modifying ORDER BY expression", ErrorCodes::NOT_IMPLEMENTED);
@ -501,6 +527,8 @@ void AlterCommands::apply(ColumnsDescription & columns_description) const
        throw Exception("Storage doesn't support modifying PRIMARY KEY expression", ErrorCodes::NOT_IMPLEMENTED);
    if (!indices_description.indices.empty())
        throw Exception("Storage doesn't support modifying indices", ErrorCodes::NOT_IMPLEMENTED);
    if (out_ttl_table)
        throw Exception("Storage doesn't support modifying TTL expression", ErrorCodes::NOT_IMPLEMENTED);
    columns_description = std::move(out_columns_description);
 }
--- a/dbms/src/Storages/AlterCommands.h
+++ b/dbms/src/Storages/AlterCommands.h
@ -24,6 +24,7 @@ struct AlterCommand
        MODIFY_ORDER_BY,
        ADD_INDEX,
        DROP_INDEX,
        MODIFY_TTL,
        UKNOWN_TYPE,
    };
@ -60,6 +61,9 @@ struct AlterCommand
    /// For ADD/DROP INDEX
    String index_name;
    /// For MODIFY TTL
    ASTPtr ttl;
    /// indicates that this command should not be applied, for example in case of if_exists=true and column doesn't exist.
    bool ignore = false;
@ -79,7 +83,7 @@ struct AlterCommand
    static std::optional<AlterCommand> parse(const ASTAlterCommand * command);
    void apply(ColumnsDescription & columns_description, IndicesDescription & indices_description,
-            ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const;
+            ASTPtr & order_by_ast, ASTPtr & primary_key_ast, ASTPtr & ttl_table_ast) const;
    /// Checks whether this command touches more than just metadata.
    bool is_mutable() const;
 };
@ -91,7 +95,7 @@ class AlterCommands : public std::vector<AlterCommand>
 {
 public:
    void apply(ColumnsDescription & columns_description, IndicesDescription & indices_description, ASTPtr & order_by_ast,
-            ASTPtr & primary_key_ast) const;
+            ASTPtr & primary_key_ast, ASTPtr & ttl_table_ast) const;
    /// For storages that don't support MODIFY_ORDER_BY.
    void apply(ColumnsDescription & columns_description) const;
--- a/dbms/src/Storages/ColumnsDescription.cpp
+++ b/dbms/src/Storages/ColumnsDescription.cpp
@ -37,12 +37,14 @@ namespace ErrorCodes
 bool ColumnDescription::operator==(const ColumnDescription & other) const
 {
    auto codec_str = [](const CompressionCodecPtr & codec_ptr) { return codec_ptr ? codec_ptr->getCodecDesc() : String(); };
    auto ttl_str = [](const ASTPtr & ttl_ast) { return ttl_ast ? queryToString(ttl_ast) : String{}; };
    return name == other.name
        && type->equals(*other.type)
        && default_desc == other.default_desc
        && comment == other.comment
-        && codec_str(codec) == codec_str(other.codec);
+        && codec_str(codec) == codec_str(other.codec)
        && ttl_str(ttl) == ttl_str(other.ttl);
 }
 void ColumnDescription::writeText(WriteBuffer & buf) const
@ -74,6 +76,13 @@ void ColumnDescription::writeText(WriteBuffer & buf) const
        DB::writeText(")", buf);
    }
    if (ttl)
    {
        writeChar('\t', buf);
        DB::writeText("TTL ", buf);
        DB::writeText(queryToString(ttl), buf);
    }
    writeChar('\n', buf);
 }
@ -99,6 +108,9 @@ void ColumnDescription::readText(ReadBuffer & buf)
        if (col_ast->codec)
            codec = CompressionCodecFactory::instance().get(col_ast->codec, type);
        if (col_ast->ttl)
            ttl = col_ast->ttl;
    }
    else
        throw Exception("Cannot parse column description", ErrorCodes::CANNOT_PARSE_TEXT);
@ -388,6 +400,18 @@ CompressionCodecPtr ColumnsDescription::getCodecOrDefault(const String & column_
    return getCodecOrDefault(column_name, CompressionCodecFactory::instance().getDefaultCodec());
 }
 ColumnsDescription::ColumnTTLs ColumnsDescription::getColumnTTLs() const
 {
    ColumnTTLs ret;
    for (const auto & column : columns)
    {
        if (column.ttl)
            ret.emplace(column.name, column.ttl);
    }
    return ret;
 }
 String ColumnsDescription::toString() const
 {
--- a/dbms/src/Storages/ColumnsDescription.h
+++ b/dbms/src/Storages/ColumnsDescription.h
@ -18,6 +18,7 @@ struct ColumnDescription
    ColumnDefault default_desc;
    String comment;
    CompressionCodecPtr codec;
    ASTPtr ttl;
    ColumnDescription() = default;
    ColumnDescription(String name_, DataTypePtr type_) : name(std::move(name_)), type(std::move(type_)) {}
@ -58,6 +59,9 @@ public:
    /// ordinary + materialized + aliases.
    NamesAndTypesList getAll() const;
    using ColumnTTLs = std::unordered_map<String, ASTPtr>;
    ColumnTTLs getColumnTTLs() const;
    bool has(const String & column_name) const;
    bool hasNested(const String & column_name) const;
    ColumnDescription & get(const String & column_name);
--- a/dbms/src/Storages/MergeTree/MergeList.cpp
+++ b/dbms/src/Storages/MergeTree/MergeList.cpp
@ -1,7 +1,7 @@
 #include <Storages/MergeTree/MergeList.h>
 #include <Storages/MergeTree/MergeTreeDataMergerMutator.h>
 #include <Common/CurrentMetrics.h>
-#include <Poco/Ext/ThreadNumber.h>
+#include <common/getThreadNumber.h>
 #include <Common/CurrentThread.h>
@ -19,7 +19,7 @@ MergeListElement::MergeListElement(const std::string & database, const std::stri
    , result_part_name{future_part.name}
    , result_data_version{future_part.part_info.getDataVersion()}
    , num_parts{future_part.parts.size()}
-    , thread_number{Poco::ThreadNumber::get()}
+    , thread_number{getThreadNumber()}
 {
    for (const auto & source_part : future_part.parts)
    {
@ -28,7 +28,8 @@ MergeListElement::MergeListElement(const std::string & database, const std::stri
        std::shared_lock<std::shared_mutex> part_lock(source_part->columns_lock);
        total_size_bytes_compressed += source_part->bytes_on_disk;
-        total_size_marks += source_part->marks_count;
+        total_size_marks += source_part->getMarksCount();
        total_rows_count += source_part->index_granularity.getTotalRows();
    }
    if (!future_part.parts.empty())
@ -60,6 +61,7 @@ MergeInfo MergeListElement::getInfo() const
    res.num_parts = num_parts;
    res.total_size_bytes_compressed = total_size_bytes_compressed;
    res.total_size_marks = total_size_marks;
    res.total_rows_count = total_rows_count;
    res.bytes_read_uncompressed = bytes_read_uncompressed.load(std::memory_order_relaxed);
    res.bytes_written_uncompressed = bytes_written_uncompressed.load(std::memory_order_relaxed);
    res.rows_read = rows_read.load(std::memory_order_relaxed);
--- a/dbms/src/Storages/MergeTree/MergeList.h
+++ b/dbms/src/Storages/MergeTree/MergeList.h
@ -36,6 +36,7 @@ struct MergeInfo
    UInt64 num_parts;
    UInt64 total_size_bytes_compressed;
    UInt64 total_size_marks;
    UInt64 total_rows_count;
    UInt64 bytes_read_uncompressed;
    UInt64 bytes_written_uncompressed;
    UInt64 rows_read;
@ -67,6 +68,7 @@ struct MergeListElement : boost::noncopyable
    UInt64 total_size_bytes_compressed{};
    UInt64 total_size_marks{};
    UInt64 total_rows_count{};
    std::atomic<UInt64> bytes_read_uncompressed{};
    std::atomic<UInt64> bytes_written_uncompressed{};
--- a/dbms/src/Storages/MergeTree/MergeSelector.h
+++ b/dbms/src/Storages/MergeTree/MergeSelector.h
@ -39,6 +39,9 @@ public:
        /// Opaque pointer to avoid dependencies (it is not possible to do forward declaration of typedef).
        const void * data;
        /// Earliest time at which some data in this part must be deleted.
        time_t min_ttl;
    };
    /// Parts belong to partitions. Only parts within the same partition can be merged.
--- a/dbms/src/Storages/MergeTree/MergeTreeBaseSelectBlockInputStream.cpp
+++ b/dbms/src/Storages/MergeTree/MergeTreeBaseSelectBlockInputStream.cpp
@ -5,6 +5,7 @@
 #include <Columns/FilterDescription.h>
 #include <Columns/ColumnArray.h>
 #include <Common/typeid_cast.h>
 #include <Common/StackTrace.h>
 #include <ext/range.h>
 #include <DataTypes/DataTypeNothing.h>
@ -40,8 +41,7 @@ MergeTreeBaseSelectBlockInputStream::MergeTreeBaseSelectBlockInputStream(
    max_read_buffer_size(max_read_buffer_size),
    use_uncompressed_cache(use_uncompressed_cache),
    save_marks_in_cache(save_marks_in_cache),
-    virt_column_names(virt_column_names),
+    virt_column_names(virt_column_names)
    max_block_size_marks(max_block_size_rows / storage.index_granularity)
 {
 }
@ -76,7 +76,7 @@ Block MergeTreeBaseSelectBlockInputStream::readFromPart()
    const auto current_max_block_size_rows = max_block_size_rows;
    const auto current_preferred_block_size_bytes = preferred_block_size_bytes;
    const auto current_preferred_max_column_in_block_size_bytes = preferred_max_column_in_block_size_bytes;
-    const auto index_granularity = storage.index_granularity;
+    const auto & index_granularity = task->data_part->index_granularity;
    const double min_filtration_ratio = 0.00001;
    auto estimateNumRows = [current_preferred_block_size_bytes, current_max_block_size_rows,
@ -87,11 +87,12 @@ Block MergeTreeBaseSelectBlockInputStream::readFromPart()
            return current_max_block_size_rows;
        /// Calculates the number of rows that will be read using preferred_block_size_bytes.
-        /// Can't be less than index_granularity.
+        /// Can't be less than avg_index_granularity.
        UInt64 rows_to_read = current_task.size_predictor->estimateNumRows(current_preferred_block_size_bytes);
        if (!rows_to_read)
            return rows_to_read;
-        rows_to_read = std::max<UInt64>(index_granularity, rows_to_read);
+        UInt64 total_row_in_current_granule = current_reader.numRowsInCurrentGranule();
        rows_to_read = std::max<UInt64>(total_row_in_current_granule, rows_to_read);
        if (current_preferred_max_column_in_block_size_bytes)
        {
@ -102,7 +103,7 @@ Block MergeTreeBaseSelectBlockInputStream::readFromPart()
            auto rows_to_read_for_max_size_column_with_filtration
                = static_cast<UInt64>(rows_to_read_for_max_size_column / filtration_ratio);
-            /// If preferred_max_column_in_block_size_bytes is used, number of rows to read can be less than index_granularity.
+            /// If preferred_max_column_in_block_size_bytes is used, number of rows to read can be less than current_index_granularity.
            rows_to_read = std::min(rows_to_read, rows_to_read_for_max_size_column_with_filtration);
        }
@ -110,8 +111,7 @@ Block MergeTreeBaseSelectBlockInputStream::readFromPart()
        if (unread_rows_in_current_granule >= rows_to_read)
            return rows_to_read;
-        UInt64 granule_to_read = (rows_to_read + current_reader.numReadRowsInCurrentGranule() + index_granularity / 2) / index_granularity;
+        return index_granularity.countMarksForRows(current_reader.currentMark(), rows_to_read, current_reader.numReadRowsInCurrentGranule());
        return index_granularity * granule_to_read - current_reader.numReadRowsInCurrentGranule();
    };
    if (!task->range_reader.isInitialized())
@ -121,28 +121,33 @@ Block MergeTreeBaseSelectBlockInputStream::readFromPart()
            if (reader->getColumns().empty())
            {
                task->range_reader = MergeTreeRangeReader(
-                    pre_reader.get(), index_granularity, nullptr,
+                    pre_reader.get(), nullptr,
                    prewhere_info->alias_actions, prewhere_info->prewhere_actions,
                    &prewhere_info->prewhere_column_name, &task->ordered_names,
                    task->should_reorder, task->remove_prewhere_column, true);
            }
            else
            {
                MergeTreeRangeReader * pre_reader_ptr = nullptr;
                if (pre_reader != nullptr)
                {
                    task->pre_range_reader = MergeTreeRangeReader(
-                    pre_reader.get(), index_granularity, nullptr,
+                        pre_reader.get(), nullptr,
                        prewhere_info->alias_actions, prewhere_info->prewhere_actions,
                        &prewhere_info->prewhere_column_name, &task->ordered_names,
                        task->should_reorder, task->remove_prewhere_column, false);
                    pre_reader_ptr = &task->pre_range_reader;
                }
                task->range_reader = MergeTreeRangeReader(
-                    reader.get(), index_granularity, &task->pre_range_reader, nullptr, nullptr,
+                    reader.get(), pre_reader_ptr, nullptr, nullptr,
                    nullptr, &task->ordered_names, true, false, true);
            }
        }
        else
        {
            task->range_reader = MergeTreeRangeReader(
-                reader.get(), index_granularity, nullptr, nullptr, nullptr,
+                reader.get(),  nullptr, nullptr, nullptr,
                nullptr, &task->ordered_names, task->should_reorder, false, true);
        }
    }
--- a/dbms/src/Storages/MergeTree/MergeTreeBaseSelectBlockInputStream.h
+++ b/dbms/src/Storages/MergeTree/MergeTreeBaseSelectBlockInputStream.h
@ -71,8 +71,6 @@ protected:
    using MergeTreeReaderPtr = std::unique_ptr<MergeTreeReader>;
    MergeTreeReaderPtr reader;
    MergeTreeReaderPtr pre_reader;
    UInt64 max_block_size_marks;
 };
 }
--- a/dbms/src/Storages/MergeTree/MergeTreeData.cpp
+++ b/dbms/src/Storages/MergeTree/MergeTreeData.cpp
@ -15,6 +15,7 @@
 #include <Parsers/parseQuery.h>
 #include <Parsers/queryToString.h>
 #include <DataStreams/ExpressionBlockInputStream.h>
 #include <DataStreams/MarkInCompressedFile.h>
 #include <Formats/ValuesRowInputStream.h>
 #include <DataStreams/copyData.h>
 #include <IO/WriteBufferFromFile.h>
@ -84,6 +85,7 @@ namespace ErrorCodes
    extern const int CANNOT_ALLOCATE_MEMORY;
    extern const int CANNOT_MUNMAP;
    extern const int CANNOT_MREMAP;
    extern const int BAD_TTL_EXPRESSION;
 }
@ -97,17 +99,19 @@ MergeTreeData::MergeTreeData(
    const ASTPtr & order_by_ast_,
    const ASTPtr & primary_key_ast_,
    const ASTPtr & sample_by_ast_,
    const ASTPtr & ttl_table_ast_,
    const MergingParams & merging_params_,
    const MergeTreeSettings & settings_,
    bool require_part_metadata_,
    bool attach,
    BrokenPartCallback broken_part_callback_)
    : global_context(context_),
    index_granularity_info(settings_),
    merging_params(merging_params_),
    index_granularity(settings_.index_granularity),
    settings(settings_),
    partition_by_ast(partition_by_ast_),
    sample_by_ast(sample_by_ast_),
    ttl_table_ast(ttl_table_ast_),
    require_part_metadata(require_part_metadata_),
    database_name(database_), table_name(table_),
    full_path(full_path_),
@ -133,7 +137,6 @@ MergeTreeData::MergeTreeData(
        columns_required_for_sampling = ExpressionAnalyzer(sample_by_ast, syntax, global_context)
            .getRequiredSourceColumns();
    }
    MergeTreeDataFormatVersion min_format_version(0);
    if (!date_column_name.empty())
    {
@ -159,6 +162,8 @@ MergeTreeData::MergeTreeData(
        min_format_version = MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING;
    }
    setTTLExpressions(columns_.getColumnTTLs(), ttl_table_ast_);
    auto path_exists = Poco::File(full_path).exists();
    /// Creating directories, if not exist.
    Poco::File(full_path).createDirectories();
@ -187,11 +192,35 @@ MergeTreeData::MergeTreeData(
        format_version = 0;
    if (format_version < min_format_version)
    {
        if (min_format_version == MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING.toUnderType())
            throw Exception(
                "MergeTree data format version on disk doesn't support custom partitioning",
                ErrorCodes::METADATA_MISMATCH);
    }
 }
 MergeTreeData::IndexGranularityInfo::IndexGranularityInfo(const MergeTreeSettings & settings)
    : fixed_index_granularity(settings.index_granularity)
    , index_granularity_bytes(settings.index_granularity_bytes)
 {
    /// Granularity is fixed
    if (index_granularity_bytes == 0)
    {
        is_adaptive = false;
        mark_size_in_bytes = sizeof(UInt64) * 2;
        marks_file_extension = ".mrk";
    }
    else
    {
        is_adaptive = true;
        mark_size_in_bytes = sizeof(UInt64) * 3;
        marks_file_extension = ".mrk2";
    }
 }
 static void checkKeyExpression(const ExpressionActions & expr, const Block & sample_block, const String & key_name)
 {
@ -490,6 +519,98 @@ void MergeTreeData::initPartitionKey()
    }
 }
 namespace
 {
 void checkTTLExpression(const ExpressionActionsPtr & ttl_expression, const String & result_column_name)
 {
    for (const auto & action : ttl_expression->getActions())
    {
        if (action.type == ExpressionAction::APPLY_FUNCTION)
        {
            IFunctionBase & func = *action.function_base;
            if (!func.isDeterministic())
                throw Exception("TTL expression cannot contain non-deterministic functions, "
                    "but contains function " + func.getName(), ErrorCodes::BAD_ARGUMENTS);
        }
    }
    bool has_date_column = false;
    for (const auto & elem : ttl_expression->getRequiredColumnsWithTypes())
    {
        if (typeid_cast<const DataTypeDateTime *>(elem.type.get()) || typeid_cast<const DataTypeDate *>(elem.type.get()))
        {
            has_date_column = true;
            break;
        }
    }
    if (!has_date_column)
        throw Exception("TTL expression should use at least one Date or DateTime column", ErrorCodes::BAD_TTL_EXPRESSION);
    const auto & result_column = ttl_expression->getSampleBlock().getByName(result_column_name);
    if (!typeid_cast<const DataTypeDateTime *>(result_column.type.get())
        && !typeid_cast<const DataTypeDate *>(result_column.type.get()))
    {
        throw Exception("TTL expression result column should have DateTime or Date type, but has "
            + result_column.type->getName(), ErrorCodes::BAD_TTL_EXPRESSION);
    }
 }
 }
 void MergeTreeData::setTTLExpressions(const ColumnsDescription::ColumnTTLs & new_column_ttls,
        const ASTPtr & new_ttl_table_ast, bool only_check)
 {
    auto create_ttl_entry = [this](ASTPtr ttl_ast) -> TTLEntry
    {
        auto syntax_result = SyntaxAnalyzer(global_context).analyze(ttl_ast, getColumns().getAllPhysical());
        auto expr = ExpressionAnalyzer(ttl_ast, syntax_result, global_context).getActions(false);
        String result_column = ttl_ast->getColumnName();
        checkTTLExpression(expr, result_column);
        return {expr, result_column};
    };
    if (!new_column_ttls.empty())
    {
        NameSet columns_ttl_forbidden;
        if (partition_key_expr)
            for (const auto & col : partition_key_expr->getRequiredColumns())
                columns_ttl_forbidden.insert(col);
        if (sorting_key_expr)
            for (const auto & col : sorting_key_expr->getRequiredColumns())
                columns_ttl_forbidden.insert(col);
        for (const auto & [name, ast] : new_column_ttls)
        {
            if (columns_ttl_forbidden.count(name))
                throw Exception("Trying to set ttl for key column " + name, ErrorCodes::ILLEGAL_COLUMN);
            else
            {
                auto new_ttl_entry = create_ttl_entry(ast);
                if (!only_check)
                    ttl_entries_by_name.emplace(name, new_ttl_entry);
            }
        }
    }
    if (new_ttl_table_ast)
    {
        auto new_ttl_table_entry = create_ttl_entry(new_ttl_table_ast);
        if (!only_check)
        {
            ttl_table_ast = new_ttl_table_ast;
            ttl_table_entry = new_ttl_table_entry;
        }
    }
 }
 void MergeTreeData::MergingParams::check(const NamesAndTypesList & columns) const
 {
@ -1059,7 +1180,8 @@ void MergeTreeData::checkAlter(const AlterCommands & commands, const Context & c
    auto new_indices = getIndicesDescription();
    ASTPtr new_order_by_ast = order_by_ast;
    ASTPtr new_primary_key_ast = primary_key_ast;
-    commands.apply(new_columns, new_indices, new_order_by_ast, new_primary_key_ast);
+    ASTPtr new_ttl_table_ast = ttl_table_ast;
    commands.apply(new_columns, new_indices, new_order_by_ast, new_primary_key_ast, new_ttl_table_ast);
    if (getIndicesDescription().empty() && !new_indices.empty() &&
            !context.getSettingsRef().allow_experimental_data_skipping_indices)
@ -1145,11 +1267,12 @@ void MergeTreeData::checkAlter(const AlterCommands & commands, const Context & c
    setPrimaryKeyIndicesAndColumns(new_order_by_ast, new_primary_key_ast,
            new_columns, new_indices, /* only_check = */ true);
    setTTLExpressions(new_columns.getColumnTTLs(), new_ttl_table_ast, /* only_check = */ true);
    /// Check that type conversions are possible.
    ExpressionActionsPtr unused_expression;
    NameToNameMap unused_map;
    bool unused_bool;
    createConvertExpression(nullptr, getColumns().getAllPhysical(), new_columns.getAllPhysical(),
            getIndicesDescription().indices, new_indices.indices, unused_expression, unused_map, unused_bool);
 }
@ -1181,7 +1304,7 @@ void MergeTreeData::createConvertExpression(const DataPartPtr & part, const Name
        if (!new_indices_set.count(index.name))
        {
            out_rename_map["skp_idx_" + index.name + ".idx"] = "";
-            out_rename_map["skp_idx_" + index.name + ".mrk"] = "";
+            out_rename_map["skp_idx_" + index.name + index_granularity_info.marks_file_extension] = "";
        }
    }
@ -1210,7 +1333,7 @@ void MergeTreeData::createConvertExpression(const DataPartPtr & part, const Name
                    if (--stream_counts[file_name] == 0)
                    {
                        out_rename_map[file_name + ".bin"] = "";
-                        out_rename_map[file_name + ".mrk"] = "";
+                        out_rename_map[file_name + index_granularity_info.marks_file_extension] = "";
                    }
                }, {});
            }
@ -1285,7 +1408,7 @@ void MergeTreeData::createConvertExpression(const DataPartPtr & part, const Name
                    String temporary_file_name = IDataType::getFileNameForStream(temporary_column_name, substream_path);
                    out_rename_map[temporary_file_name + ".bin"] = original_file_name + ".bin";
-                    out_rename_map[temporary_file_name + ".mrk"] = original_file_name + ".mrk";
+                    out_rename_map[temporary_file_name + index_granularity_info.marks_file_extension] = original_file_name + index_granularity_info.marks_file_extension;
                }, {});
        }
@ -1404,7 +1527,14 @@ MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart(
          */
        IMergedBlockOutputStream::WrittenOffsetColumns unused_written_offsets;
        MergedColumnOnlyOutputStream out(
-            *this, in.getHeader(), full_path + part->name + '/', true /* sync */, compression_codec, true /* skip_offsets */, unused_written_offsets);
+            *this,
            in.getHeader(),
            full_path + part->name + '/',
            true /* sync */,
            compression_codec,
            true /* skip_offsets */,
            unused_written_offsets,
            part->index_granularity);
        in.readPrefix();
        out.writePrefix();
@ -1446,6 +1576,32 @@ MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart(
    return transaction;
 }
 void MergeTreeData::removeEmptyColumnsFromPart(MergeTreeData::MutableDataPartPtr & data_part)
 {
    auto & empty_columns = data_part->empty_columns;
    if (empty_columns.empty())
        return;
    NamesAndTypesList new_columns;
    for (const auto & [name, type] : data_part->columns)
        if (!empty_columns.count(name))
            new_columns.emplace_back(name, type);
    std::stringstream log_message;
    for (auto it = empty_columns.begin(); it != empty_columns.end(); ++it)
    {
        if (it != empty_columns.begin())
            log_message << ", ";
        log_message << *it;
    }
    LOG_INFO(log, "Removing empty columns: " << log_message.str() << " from part " << data_part->name);
    if (auto transaction = alterDataPart(data_part, new_columns, getIndicesDescription().indices, false))
        transaction->commit();
    empty_columns.clear();
 }
 void MergeTreeData::freezeAll(const String & with_name, const Context & context)
 {
    freezePartitionsByMatcher([] (const DataPartPtr &){ return true; }, with_name, context);
@ -2150,8 +2306,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeData::loadPartAndFixMetadata(const St
    /// Check the data while we are at it.
    if (part->checksums.empty())
    {
-        part->checksums = checkDataPart(full_part_path, index_granularity, false, primary_key_data_types, skip_indices);
+        part->checksums = checkDataPart(part, false, primary_key_data_types, skip_indices);
        {
            WriteBufferFromFile out(full_part_path + "checksums.txt.tmp", 4096);
            part->checksums.write(out);
--- a/dbms/src/Storages/MergeTree/MergeTreeData.h
+++ b/dbms/src/Storages/MergeTree/MergeTreeData.h
@ -285,6 +285,32 @@ public:
        String getModeName() const;
    };
    /// Meta information about index granularity
    struct IndexGranularityInfo
    {
        /// Marks file extension '.mrk' or '.mrk2'
        String marks_file_extension;
        /// Size in bytes of one mark in the marks file: two or three UInt64 numbers
        UInt8 mark_size_in_bytes;
        /// Whether the stride in rows between marks is variable (not fixed)
        bool is_adaptive;
        /// Fixed size in rows of one granule if index_granularity_bytes is zero
        size_t fixed_index_granularity;
        /// Approximate bytes size of one granule
        size_t index_granularity_bytes;
        IndexGranularityInfo(const MergeTreeSettings & settings);
        String getMarksFilePath(const String & column_path) const
        {
            return column_path + marks_file_extension;
        }
    };
    /// Attach the table corresponding to the directory in full_path (must end with /), with the given columns.
    /// Correctness of names and paths is not checked.
@ -312,6 +338,7 @@ public:
                  const ASTPtr & order_by_ast_,
                  const ASTPtr & primary_key_ast_,
                  const ASTPtr & sample_by_ast_, /// nullptr, if sampling is not supported.
                  const ASTPtr & ttl_table_ast_,
                  const MergingParams & merging_params_,
                  const MergeTreeSettings & settings_,
                  bool require_part_metadata_,
@ -494,6 +521,9 @@ public:
        const IndicesASTs & new_indices,
        bool skip_sanity_checks);
    /// Remove columns that have been marked as empty after zeroing values with expired TTL
    void removeEmptyColumnsFromPart(MergeTreeData::MutableDataPartPtr & data_part);
    /// Freezes all parts.
    void freezeAll(const String & with_name, const Context & context);
@ -514,6 +544,7 @@ public:
    bool hasSortingKey() const { return !sorting_key_columns.empty(); }
    bool hasPrimaryKey() const { return !primary_key_columns.empty(); }
    bool hasSkipIndices() const { return !skip_indices.empty(); }
    bool hasTableTTL() const { return ttl_table_ast != nullptr; }
    ASTPtr getSortingKeyAST() const { return sorting_key_expr_ast; }
    ASTPtr getPrimaryKeyAST() const { return primary_key_expr_ast; }
@ -569,6 +600,7 @@ public:
    MergeTreeDataFormatVersion format_version;
    Context global_context;
    IndexGranularityInfo index_granularity_info;
    /// Merging params - what additional actions to perform during merge.
    const MergingParams merging_params;
@ -601,10 +633,20 @@ public:
    Block primary_key_sample;
    DataTypes primary_key_data_types;
    struct TTLEntry
    {
        ExpressionActionsPtr expression;
        String result_column;
    };
    using TTLEntriesByName = std::unordered_map<String, TTLEntry>;
    TTLEntriesByName ttl_entries_by_name;
    TTLEntry ttl_table_entry;
    String sampling_expr_column_name;
    Names columns_required_for_sampling;
    const size_t index_granularity;
    const MergeTreeSettings settings;
    /// Limiting parallel sends per one table, used in DataPartsExchange
@ -625,6 +667,7 @@ private:
    ASTPtr order_by_ast;
    ASTPtr primary_key_ast;
    ASTPtr sample_by_ast;
    ASTPtr ttl_table_ast;
    bool require_part_metadata;
@ -735,6 +778,9 @@ private:
    void initPartitionKey();
    void setTTLExpressions(const ColumnsDescription::ColumnTTLs & new_column_ttls,
                           const ASTPtr & new_ttl_table_ast, bool only_check = false);
    /// Expression for column type conversion.
    /// If no conversions are needed, out_expression=nullptr.
    /// out_rename_map maps column files for the out_expression onto new table files.
--- a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
+++ b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
@ -4,9 +4,11 @@
 #include <Storages/MergeTree/DiskSpaceMonitor.h>
 #include <Storages/MergeTree/SimpleMergeSelector.h>
 #include <Storages/MergeTree/AllMergeSelector.h>
 #include <Storages/MergeTree/TTLMergeSelector.h>
 #include <Storages/MergeTree/MergeList.h>
 #include <Storages/MergeTree/StorageFromMergeTreeDataPart.h>
 #include <Storages/MergeTree/BackgroundProcessingPool.h>
 #include <DataStreams/TTLBlockInputStream.h>
 #include <DataStreams/DistinctSortedBlockInputStream.h>
 #include <DataStreams/ExpressionBlockInputStream.h>
 #include <DataStreams/MergingSortedBlockInputStream.h>
@ -176,6 +178,7 @@ bool MergeTreeDataMergerMutator::selectPartsToMerge(
    const String * prev_partition_id = nullptr;
    const MergeTreeData::DataPartPtr * prev_part = nullptr;
    bool has_part_with_expired_ttl = false;
    for (const MergeTreeData::DataPartPtr & part : data_parts)
    {
        const String & partition_id = part->info.partition_id;
@ -191,6 +194,10 @@ bool MergeTreeDataMergerMutator::selectPartsToMerge(
        part_info.age = current_time - part->modification_time;
        part_info.level = part->info.level;
        part_info.data = &part;
        part_info.min_ttl = part->ttl_infos.part_min_ttl;
        if (part_info.min_ttl && part_info.min_ttl <= current_time)
            has_part_with_expired_ttl = true;
        partitions.back().emplace_back(part_info);
@ -210,7 +217,16 @@ bool MergeTreeDataMergerMutator::selectPartsToMerge(
    if (aggressive)
        merge_settings.base = 1;
    bool can_merge_with_ttl =
        (current_time - last_merge_with_ttl > data.settings.merge_with_ttl_timeout);
    /// NOTE Could allow selection of different merge strategy.
    if (can_merge_with_ttl && has_part_with_expired_ttl)
    {
        merge_selector = std::make_unique<TTLMergeSelector>(current_time);
        last_merge_with_ttl = current_time;
    }
    else
        merge_selector = std::make_unique<SimpleMergeSelector>(merge_settings);
    IMergeSelector::PartsInPartition parts_to_merge = merge_selector->select(
@ -224,7 +240,8 @@ bool MergeTreeDataMergerMutator::selectPartsToMerge(
        return false;
    }
-    if (parts_to_merge.size() == 1)
+    /// Allow to "merge" part with itself if we need remove some values with expired ttl
+    if (parts_to_merge.size() == 1 && !has_part_with_expired_ttl)
        throw Exception("Logical error: merge selector returned only one part to merge", ErrorCodes::LOGICAL_ERROR);
    MergeTreeData::DataPartsVector parts;
@ -536,9 +553,18 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
    new_data_part->relative_path = TMP_PREFIX + future_part.name;
    new_data_part->is_temp = true;
-    size_t sum_input_rows_upper_bound = merge_entry->total_size_marks * data.index_granularity;
+    size_t sum_input_rows_upper_bound = merge_entry->total_rows_count;
-    MergeAlgorithm merge_alg = chooseMergeAlgorithm(parts, sum_input_rows_upper_bound, gathering_columns, deduplicate);
+    bool need_remove_expired_values = false;
+    for (const MergeTreeData::DataPartPtr & part : parts)
+        new_data_part->ttl_infos.update(part->ttl_infos);
+    const auto & part_min_ttl = new_data_part->ttl_infos.part_min_ttl;
+    if (part_min_ttl && part_min_ttl <= time_of_merge)
+        need_remove_expired_values = true;
+    MergeAlgorithm merge_alg = chooseMergeAlgorithm(parts, sum_input_rows_upper_bound, gathering_columns, deduplicate, need_remove_expired_values);
    LOG_DEBUG(log, "Selected MergeAlgorithm: " << ((merge_alg == MergeAlgorithm::Vertical) ? "Vertical" : "Horizontal"));
@ -599,6 +625,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
    MergeStageProgress horizontal_stage_progress(
        merge_alg == MergeAlgorithm::Horizontal ? 1.0 : column_sizes.keyColumnsWeight());
    for (const auto & part : parts)
    {
        auto input = std::make_unique<MergeTreeSequentialBlockInputStream>(
@ -629,16 +656,19 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
    ///  that is going in insertion order.
    std::shared_ptr<IBlockInputStream> merged_stream;
    /// If merge is vertical we cannot calculate it
    bool blocks_are_granules_size = (merge_alg == MergeAlgorithm::Vertical);
    switch (data.merging_params.mode)
    {
        case MergeTreeData::MergingParams::Ordinary:
            merged_stream = std::make_unique<MergingSortedBlockInputStream>(
-                src_streams, sort_description, DEFAULT_MERGE_BLOCK_SIZE, 0, rows_sources_write_buf.get(), true);
+                src_streams, sort_description, DEFAULT_MERGE_BLOCK_SIZE, 0, rows_sources_write_buf.get(), true, blocks_are_granules_size);
            break;
        case MergeTreeData::MergingParams::Collapsing:
            merged_stream = std::make_unique<CollapsingSortedBlockInputStream>(
-                src_streams, sort_description, data.merging_params.sign_column, DEFAULT_MERGE_BLOCK_SIZE, rows_sources_write_buf.get());
+                src_streams, sort_description, data.merging_params.sign_column, DEFAULT_MERGE_BLOCK_SIZE, rows_sources_write_buf.get(), blocks_are_granules_size);
            break;
        case MergeTreeData::MergingParams::Summing:
@ -653,7 +683,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
        case MergeTreeData::MergingParams::Replacing:
            merged_stream = std::make_unique<ReplacingSortedBlockInputStream>(
-                src_streams, sort_description, data.merging_params.version_column, DEFAULT_MERGE_BLOCK_SIZE, rows_sources_write_buf.get());
+                src_streams, sort_description, data.merging_params.version_column, DEFAULT_MERGE_BLOCK_SIZE, rows_sources_write_buf.get(), blocks_are_granules_size);
            break;
        case MergeTreeData::MergingParams::Graphite:
@ -664,15 +694,24 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
        case MergeTreeData::MergingParams::VersionedCollapsing:
            merged_stream = std::make_unique<VersionedCollapsingSortedBlockInputStream>(
-                src_streams, sort_description, data.merging_params.sign_column, DEFAULT_MERGE_BLOCK_SIZE, rows_sources_write_buf.get());
+                src_streams, sort_description, data.merging_params.sign_column, DEFAULT_MERGE_BLOCK_SIZE, rows_sources_write_buf.get(), blocks_are_granules_size);
            break;
    }
    if (deduplicate)
        merged_stream = std::make_shared<DistinctSortedBlockInputStream>(merged_stream, SizeLimits(), 0 /*limit_hint*/, Names());
    if (need_remove_expired_values)
        merged_stream = std::make_shared<TTLBlockInputStream>(merged_stream, data, new_data_part, time_of_merge);
    MergedBlockOutputStream to{
-        data, new_part_tmp_path, merging_columns, compression_codec, merged_column_to_size, data.settings.min_merge_bytes_to_use_direct_io};
+        data,
        new_part_tmp_path,
        merging_columns,
        compression_codec,
        merged_column_to_size,
        data.settings.min_merge_bytes_to_use_direct_io,
        blocks_are_granules_size};
    merged_stream->readPrefix();
    to.writePrefix();
@ -684,6 +723,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
    while (!actions_blocker.isCancelled() && (block = merged_stream->read()))
    {
        rows_written += block.rows();
        to.write(block);
        merge_entry->rows_written = merged_stream->getProfileInfo().rows;
@ -758,7 +798,15 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
            rows_sources_read_buf.seek(0, 0);
            ColumnGathererStream column_gathered_stream(column_name, column_part_streams, rows_sources_read_buf);
            MergedColumnOnlyOutputStream column_to(
-                data, column_gathered_stream.getHeader(), new_part_tmp_path, false, compression_codec, false, written_offset_columns);
+                data,
                column_gathered_stream.getHeader(),
                new_part_tmp_path,
                false,
                compression_codec,
                false,
                written_offset_columns,
                to.getIndexGranularity()
            );
            size_t column_elems_written = 0;
            column_to.writePrefix();
@ -857,6 +905,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
        data, future_part.name, future_part.part_info);
    new_data_part->relative_path = "tmp_mut_" + future_part.name;
    new_data_part->is_temp = true;
    new_data_part->ttl_infos = source_part->ttl_infos;
    String new_part_tmp_path = new_data_part->getFullPath();
@ -936,7 +985,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
            {
                String stream_name = IDataType::getFileNameForStream(entry.name, substream_path);
                files_to_skip.insert(stream_name + ".bin");
-                files_to_skip.insert(stream_name + ".mrk");
+                files_to_skip.insert(stream_name + data.index_granularity_info.marks_file_extension);
            };
            IDataType::SubstreamPath stream_path;
@ -959,7 +1008,15 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
        IMergedBlockOutputStream::WrittenOffsetColumns unused_written_offsets;
        MergedColumnOnlyOutputStream out(
-            data, in_header, new_part_tmp_path, /* sync = */ false, compression_codec, /* skip_offsets = */ false, unused_written_offsets);
+            data,
            in_header,
            new_part_tmp_path,
            /* sync = */ false,
            compression_codec,
            /* skip_offsets = */ false,
            unused_written_offsets,
            source_part->index_granularity
        );
        in->readPrefix();
        out.writePrefix();
@ -1002,7 +1059,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
        }
        new_data_part->rows_count = source_part->rows_count;
-        new_data_part->marks_count = source_part->marks_count;
+        new_data_part->index_granularity = source_part->index_granularity;
        new_data_part->index = source_part->index;
        new_data_part->partition.assign(source_part->partition);
        new_data_part->minmax_idx = source_part->minmax_idx;
@ -1016,12 +1073,14 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
 MergeTreeDataMergerMutator::MergeAlgorithm MergeTreeDataMergerMutator::chooseMergeAlgorithm(
    const MergeTreeData::DataPartsVector & parts, size_t sum_rows_upper_bound,
-    const NamesAndTypesList & gathering_columns, bool deduplicate) const
+    const NamesAndTypesList & gathering_columns, bool deduplicate, bool need_remove_expired_values) const
 {
    if (deduplicate)
        return MergeAlgorithm::Horizontal;
    if (data.settings.enable_vertical_merge_algorithm == 0)
        return MergeAlgorithm::Horizontal;
    if (need_remove_expired_values)
        return MergeAlgorithm::Horizontal;
    bool is_supported_storage =
        data.merging_params.mode == MergeTreeData::MergingParams::Ordinary ||
@ -1093,7 +1152,6 @@ MergeTreeData::DataPartPtr MergeTreeDataMergerMutator::renameMergedTemporaryPart
    return new_data_part;
 }
 size_t MergeTreeDataMergerMutator::estimateNeededDiskSpace(const MergeTreeData::DataPartsVector & source_parts)
 {
    size_t res = 0;
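The selector choice in `selectPartsToMerge` above can be read as a small throttle: the TTL selector is used only when some part has an expired `min_ttl` and more than `merge_with_ttl_timeout` seconds have passed since the last TTL merge. The sketch below restates that logic in isolation. `SelectorKind`, `PartTTL` and `TTLMergeThrottle` are hypothetical stand-ins invented for illustration, not types from the ClickHouse sources.

```cpp
#include <cassert>
#include <ctime>
#include <vector>

// Hypothetical stand-in for "which selector was constructed".
enum class SelectorKind { Simple, TTL };

// Hypothetical stand-in for the TTL info of one part.
struct PartTTL
{
    time_t min_ttl; // 0 means "no TTL recorded for this part"
};

// Mirrors the bookkeeping shown in the diff: remember when the TTL selector
// was last used and refuse to use it again until the timeout has elapsed.
struct TTLMergeThrottle
{
    time_t last_merge_with_ttl = 0;
    time_t merge_with_ttl_timeout = 0;

    SelectorKind choose(const std::vector<PartTTL> & parts, time_t current_time)
    {
        bool has_part_with_expired_ttl = false;
        for (const PartTTL & part : parts)
            if (part.min_ttl && part.min_ttl <= current_time)
                has_part_with_expired_ttl = true;

        bool can_merge_with_ttl = (current_time - last_merge_with_ttl > merge_with_ttl_timeout);

        if (can_merge_with_ttl && has_part_with_expired_ttl)
        {
            last_merge_with_ttl = current_time;
            return SelectorKind::TTL;
        }
        return SelectorKind::Simple;
    }
};
```

Note that the timestamp is updated only when the TTL selector is actually chosen, so ordinary merges do not reset the throttle.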
--- a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.h
+++ b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.h
@ -127,7 +127,7 @@ private:
    MergeAlgorithm chooseMergeAlgorithm(
        const MergeTreeData::DataPartsVector & parts,
-        size_t rows_upper_bound, const NamesAndTypesList & gathering_columns, bool deduplicate) const;
+        size_t rows_upper_bound, const NamesAndTypesList & gathering_columns, bool deduplicate, bool need_remove_expired_values) const;
 private:
    MergeTreeData & data;
@ -137,6 +137,9 @@ private:
    /// When the last time you wrote to the log that the disk space was running out (not to write about this too often).
    time_t disk_space_warning_time = 0;
    /// Last time when TTLMergeSelector has been used
    time_t last_merge_with_ttl = 0;
 };
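The new `need_remove_expired_values` parameter adds one more early exit to `chooseMergeAlgorithm`: when expired values must be removed, the merge is forced to be horizontal, presumably because the TTL stream wraps the whole-row merged stream. A minimal sketch of the visible early exits follows. The checks that come after them in the real function are outside the shown hunk, so they are collapsed here into a hypothetical `other_checks_pass` flag; `MergeOptions` and `chooseMergeAlgorithmSketch` are illustrative names as well.

```cpp
#include <cassert>

enum class MergeAlgorithm { Horizontal, Vertical };

// Hypothetical bundle of the inputs that the visible early exits look at.
struct MergeOptions
{
    bool deduplicate;
    bool enable_vertical_merge_algorithm;
    bool need_remove_expired_values;
    bool other_checks_pass; // assumption: stands in for the checks not shown in the hunk
};

inline MergeAlgorithm chooseMergeAlgorithmSketch(const MergeOptions & options)
{
    if (options.deduplicate)
        return MergeAlgorithm::Horizontal;
    if (!options.enable_vertical_merge_algorithm)
        return MergeAlgorithm::Horizontal;
    if (options.need_remove_expired_values)
        return MergeAlgorithm::Horizontal;

    return options.other_checks_pass ? MergeAlgorithm::Vertical : MergeAlgorithm::Horizontal;
}
```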
--- a/dbms/src/Storages/MergeTree/MergeTreeDataPart.cpp
+++ b/dbms/src/Storages/MergeTree/MergeTreeDataPart.cpp
@ -22,9 +22,7 @@
 #include <Poco/DirectoryIterator.h>
 #include <common/logger_useful.h>
-
+#include <common/JSON.h>
 #define MERGE_TREE_MARK_SIZE (2 * sizeof(UInt64))
 namespace DB
 {
@ -37,6 +35,7 @@ namespace ErrorCodes
    extern const int CORRUPTED_DATA;
    extern const int NOT_FOUND_EXPECTED_DATA_PART;
    extern const int BAD_SIZE_OF_FILE_IN_DATA_PART;
    extern const int BAD_TTL_FILE;
 }
@ -137,10 +136,20 @@ void MergeTreeDataPart::MinMaxIndex::merge(const MinMaxIndex & other)
 MergeTreeDataPart::MergeTreeDataPart(MergeTreeData & storage_, const String & name_)
-    : storage(storage_), name(name_), info(MergeTreePartInfo::fromPartName(name_, storage.format_version))
+    : storage(storage_)
    , name(name_)
    , info(MergeTreePartInfo::fromPartName(name_, storage.format_version))
 {
 }
 MergeTreeDataPart::MergeTreeDataPart(const MergeTreeData & storage_, const String & name_, const MergeTreePartInfo & info_)
    : storage(storage_)
    , name(name_)
    , info(info_)
 {
 }
 /// Takes into account the fact that several columns can e.g. share their .size substreams.
 /// When calculating totals these should be counted only once.
 MergeTreeDataPart::ColumnSize MergeTreeDataPart::getColumnSizeImpl(
@ -164,7 +173,7 @@ MergeTreeDataPart::ColumnSize MergeTreeDataPart::getColumnSizeImpl(
            size.data_uncompressed += bin_checksum->second.uncompressed_size;
        }
-        auto mrk_checksum = checksums.files.find(file_name + ".mrk");
+        auto mrk_checksum = checksums.files.find(file_name + storage.index_granularity_info.marks_file_extension);
        if (mrk_checksum != checksums.files.end())
            size.marks += mrk_checksum->second.file_size;
    }, {});
@ -198,7 +207,6 @@ size_t MergeTreeDataPart::getFileSizeOrZero(const String & file_name) const
    return checksum->second.file_size;
 }
 /** Returns the name of a column with minimum compressed size (as returned by getColumnSize()).
  * If no checksums are present returns the name of the first physically existing column.
  */
@ -459,6 +467,11 @@ void MergeTreeDataPart::renameToDetached(const String & prefix) const
 }
 UInt64 MergeTreeDataPart::getMarksCount() const
 {
    return index_granularity.getMarksCount();
 }
 void MergeTreeDataPart::makeCloneInDetached(const String & prefix) const
 {
    Poco::Path src(getFullPath());
@ -476,24 +489,55 @@ void MergeTreeDataPart::loadColumnsChecksumsIndexes(bool require_columns_checksu
    loadColumns(require_columns_checksums);
    loadChecksums(require_columns_checksums);
-    loadIndex();
+    loadIndexGranularity();
-    loadRowsCount(); /// Must be called after loadIndex() as it uses the value of `marks_count`.
+    loadIndex(); /// Must be called after loadIndexGranularity as it uses the value of `index_granularity`
    loadRowsCount(); /// Must be called after loadIndex() as it uses the value of `index_granularity`.
    loadPartitionAndMinMaxIndex();
    loadTTLInfos();
    if (check_consistency)
        checkConsistency(require_columns_checksums);
 }
-
-void MergeTreeDataPart::loadIndex()
+void MergeTreeDataPart::loadIndexGranularity()
 {
-    if (!marks_count)
-    {
-        if (columns.empty())
-            throw Exception("No columns in part " + name, ErrorCodes::NO_FILE_IN_DATA_PART);
-        marks_count = Poco::File(getFullPath() + escapeForFileName(columns.front().name) + ".mrk")
-            .getSize() / MERGE_TREE_MARK_SIZE;
+    if (columns.empty())
+        throw Exception("No columns in part " + name, ErrorCodes::NO_FILE_IN_DATA_PART);
+    const auto & granularity_info = storage.index_granularity_info;
+    /// We can use any column, it doesn't matter
+    std::string marks_file_path = granularity_info.getMarksFilePath(getFullPath() + escapeForFileName(columns.front().name));
    if (!Poco::File(marks_file_path).exists())
        throw Exception("Marks file '" + marks_file_path + "' doesn't exist", ErrorCodes::NO_FILE_IN_DATA_PART);
    size_t marks_file_size = Poco::File(marks_file_path).getSize();
    /// old version of marks with static index granularity
    if (!granularity_info.is_adaptive)
    {
        size_t marks_count = marks_file_size / granularity_info.mark_size_in_bytes;
        index_granularity.resizeWithFixedGranularity(marks_count, granularity_info.fixed_index_granularity); /// all the same
    }
    else
    {
        ReadBufferFromFile buffer(marks_file_path, marks_file_size, -1);
        while (!buffer.eof())
        {
            buffer.seek(sizeof(size_t) * 2, SEEK_CUR); /// skip offset_in_compressed file and offset_in_decompressed_block
            size_t granularity;
            readIntBinary(granularity, buffer);
            index_granularity.appendMark(granularity);
        }
        if (index_granularity.getMarksCount() * granularity_info.mark_size_in_bytes != marks_file_size)
            throw Exception("Cannot read all marks from file " + marks_file_path, ErrorCodes::CANNOT_READ_ALL_DATA);
    }
    index_granularity.setInitialized();
 }
 void MergeTreeDataPart::loadIndex()
 {
    /// It can be empty in case of mutations
    if (!index_granularity.isInitialized())
        throw Exception("Index granularity is not loaded before index loading", ErrorCodes::LOGICAL_ERROR);
    size_t key_size = storage.primary_key_columns.size();
@ -505,22 +549,22 @@ void MergeTreeDataPart::loadIndex()
        for (size_t i = 0; i < key_size; ++i)
        {
            loaded_index[i] = storage.primary_key_data_types[i]->createColumn();
-            loaded_index[i]->reserve(marks_count);
+            loaded_index[i]->reserve(index_granularity.getMarksCount());
        }
        String index_path = getFullPath() + "primary.idx";
        ReadBufferFromFile index_file = openForReading(index_path);
-        for (size_t i = 0; i < marks_count; ++i)    //-V756
+        for (size_t i = 0; i < index_granularity.getMarksCount(); ++i)    //-V756
            for (size_t j = 0; j < key_size; ++j)
                storage.primary_key_data_types[j]->deserializeBinary(*loaded_index[j], index_file);
        for (size_t i = 0; i < key_size; ++i)
        {
            loaded_index[i]->protect();
-            if (loaded_index[i]->size() != marks_count)
+            if (loaded_index[i]->size() != index_granularity.getMarksCount())
                throw Exception("Cannot read all data from index file " + index_path
-                    + "(expected size: " + toString(marks_count) + ", read: " + toString(loaded_index[i]->size()) + ")",
+                    + "(expected size: " + toString(index_granularity.getMarksCount()) + ", read: " + toString(loaded_index[i]->size()) + ")",
                    ErrorCodes::CANNOT_READ_ALL_DATA);
        }
@ -585,7 +629,7 @@ void MergeTreeDataPart::loadChecksums(bool require)
 void MergeTreeDataPart::loadRowsCount()
 {
-    if (marks_count == 0)
+    if (index_granularity.empty())
    {
        rows_count = 0;
    }
@ -601,8 +645,6 @@ void MergeTreeDataPart::loadRowsCount()
    }
    else
    {
-        size_t rows_approx = storage.index_granularity * marks_count;
        for (const NameAndTypePair & column : columns)
        {
            ColumnPtr column_col = column.type->createColumn();
@ -624,10 +666,12 @@ void MergeTreeDataPart::loadRowsCount()
                    ErrorCodes::LOGICAL_ERROR);
            }
-            if (!(rows_count <= rows_approx && rows_approx < rows_count + storage.index_granularity))
+            size_t last_mark_index_granularity = index_granularity.getLastMarkRows();
            size_t rows_approx = index_granularity.getTotalRows();
            if (!(rows_count <= rows_approx && rows_approx < rows_count + last_mark_index_granularity))
                throw Exception(
                    "Unexpected size of column " + column.name + ": " + toString(rows_count) + " rows, expected "
-                    + toString(rows_approx) + "+-" + toString(storage.index_granularity) + " rows according to the index",
+                    + toString(rows_approx) + "+-" + toString(last_mark_index_granularity) + " rows according to the index",
                    ErrorCodes::LOGICAL_ERROR);
            return;
@ -637,6 +681,33 @@ void MergeTreeDataPart::loadRowsCount()
    }
 }
 void MergeTreeDataPart::loadTTLInfos()
 {
    String path = getFullPath() + "ttl.txt";
    if (Poco::File(path).exists())
    {
        ReadBufferFromFile in = openForReading(path);
        assertString("ttl format version: ", in);
        size_t format_version;
        readText(format_version, in);
        assertChar('\n', in);
        if (format_version == 1)
        {
            try
            {
                ttl_infos.read(in);
            }
            catch (const JSONException &)
            {
                throw Exception("Error while parsing file ttl.txt in part: " + name, ErrorCodes::BAD_TTL_FILE);
            }
        }
        else
            throw Exception("Unknown ttl format version: " + toString(format_version), ErrorCodes::BAD_TTL_FILE);
    }
 }
 void MergeTreeDataPart::accumulateColumnSizes(ColumnToSize & column_to_size) const
 {
    std::shared_lock<std::shared_mutex> part_lock(columns_lock);
@ -699,7 +770,7 @@ void MergeTreeDataPart::checkConsistency(bool require_part_metadata)
                name_type.type->enumerateStreams([&](const IDataType::SubstreamPath & substream_path)
                {
                    String file_name = IDataType::getFileNameForStream(name_type.name, substream_path);
-                    String mrk_file_name = file_name + ".mrk";
+                    String mrk_file_name = file_name + storage.index_granularity_info.marks_file_extension;
                    String bin_file_name = file_name + ".bin";
                    if (!checksums.files.count(mrk_file_name))
                        throw Exception("No " + mrk_file_name + " file checksum for column " + name_type.name + " in part " + path,
@ -763,7 +834,7 @@ void MergeTreeDataPart::checkConsistency(bool require_part_metadata)
        {
            name_type.type->enumerateStreams([&](const IDataType::SubstreamPath & substream_path)
            {
-                Poco::File file(IDataType::getFileNameForStream(name_type.name, substream_path) + ".mrk");
+                Poco::File file(IDataType::getFileNameForStream(name_type.name, substream_path) + storage.index_granularity_info.marks_file_extension);
                /// Missing file is Ok for case when new column was added.
                if (file.exists())
@ -794,7 +865,7 @@ bool MergeTreeDataPart::hasColumnFiles(const String & column) const
    String escaped_column = escapeForFileName(column);
    return Poco::File(prefix + escaped_column + ".bin").exists()
-        && Poco::File(prefix + escaped_column + ".mrk").exists();
+        && Poco::File(prefix + escaped_column + storage.index_granularity_info.marks_file_extension).exists();
 }
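The adaptive branch of `loadIndexGranularity` and the updated check in `loadRowsCount` fit together: each mark is read as two skipped offsets followed by a row count, and the total of those counts may exceed the real row count by less than the size of the last granule. The sketch below works on an in-memory byte buffer rather than `ReadBufferFromFile`. It assumes a mark is exactly three native-endian 64-bit integers, which matches the `seek(sizeof(size_t) * 2, ...)` plus one `size_t` read in the diff on a 64-bit platform; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Assumed layout of one adaptive mark: offset in compressed file,
// offset in decompressed block, number of rows in the granule.
constexpr std::size_t MARK_SIZE_IN_BYTES = 3 * sizeof(std::uint64_t);

// Test helper: serialize granule sizes into the assumed marks layout (offsets left as zero).
std::vector<unsigned char> makeMarksFile(const std::vector<std::uint64_t> & granularities)
{
    std::vector<unsigned char> file(granularities.size() * MARK_SIZE_IN_BYTES, 0);
    for (std::size_t i = 0; i < granularities.size(); ++i)
        std::memcpy(file.data() + i * MARK_SIZE_IN_BYTES + 2 * sizeof(std::uint64_t),
                    &granularities[i], sizeof(std::uint64_t));
    return file;
}

// Read one granule size per mark, skipping the two offsets that precede it.
std::vector<std::uint64_t> readGranularities(const std::vector<unsigned char> & marks_file)
{
    std::vector<std::uint64_t> granularities;
    std::size_t pos = 0;
    while (pos + MARK_SIZE_IN_BYTES <= marks_file.size())
    {
        pos += 2 * sizeof(std::uint64_t); // skip both offsets
        std::uint64_t granularity = 0;
        std::memcpy(&granularity, marks_file.data() + pos, sizeof(granularity));
        pos += sizeof(granularity);
        granularities.push_back(granularity);
    }
    // Same sanity check as in the diff: the file must consist of whole marks only.
    if (granularities.size() * MARK_SIZE_IN_BYTES != marks_file.size())
        throw std::runtime_error("Cannot read all marks");
    return granularities;
}

// The loadRowsCount check: rows_count <= rows_approx < rows_count + last granule size.
bool rowsCountMatchesIndex(std::uint64_t rows_count, const std::vector<std::uint64_t> & granularities)
{
    if (granularities.empty())
        return rows_count == 0;
    std::uint64_t rows_approx = 0;
    for (std::uint64_t granularity : granularities)
        rows_approx += granularity;
    return rows_count <= rows_approx && rows_approx < rows_count + granularities.back();
}
```

With a fixed granularity the tolerance was always `storage.index_granularity`; with adaptive marks it becomes the row count of the last mark, which is why the error message in the diff now prints `last_mark_index_granularity`.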
--- a/Show More
+++ b/Show More
@ -1 +1 @@
-Subproject commit fe5505e56c27b6ecb0dcbc40c49dc2caf4e9637f
+Subproject commit 29439cf7fa32c1a2d62d925bb6d6a3f14668a4a2