diff --git a/.gitmodules b/.gitmodules
index abd7b0282fc..8890e42447a 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -88,3 +88,6 @@
 [submodule "contrib/rapidjson"]
     path = contrib/rapidjson
     url = https://github.com/Tencent/rapidjson
+[submodule "contrib/mimalloc"]
+    path = contrib/mimalloc
+    url = https://github.com/ClickHouse-Extras/mimalloc
diff --git a/CHANGELOG.md b/CHANGELOG.md
index da59934ee47..bf86b2060e8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,114 @@
+## ClickHouse release 19.8.3.8, 2019-06-11
+
+### New Features
+* Added functions to work with JSON. [#4686](https://github.com/yandex/ClickHouse/pull/4686) ([hcz](https://github.com/hczhcz)), [#5124](https://github.com/yandex/ClickHouse/pull/5124) ([Vitaly Baranov](https://github.com/vitlibar))
+* Added the function `basename`, which behaves like the `basename` function that exists in many languages (`os.path.basename` in Python, `basename` in PHP, etc.). It works with both UNIX-like and Windows paths (see the examples after this list). [#5136](https://github.com/yandex/ClickHouse/pull/5136) ([Guillaume Tassery](https://github.com/YiuRULE))
+* Added `LIMIT n, m BY` and `LIMIT m OFFSET n BY` syntax to set an offset of n for the LIMIT BY clause. [#5138](https://github.com/yandex/ClickHouse/pull/5138) ([Anton Popov](https://github.com/CurtizJ))
+* Added the new data type `SimpleAggregateFunction`, which allows columns with lightweight aggregation in an `AggregatingMergeTree`. It can only be used with simple functions like `any`, `anyLast`, `sum`, `min`, `max`. [#4629](https://github.com/yandex/ClickHouse/pull/4629) ([Boris Granveaud](https://github.com/bgranvea))
+* Added support for non-constant arguments in the function `ngramDistance`. [#5198](https://github.com/yandex/ClickHouse/pull/5198) ([Danila Kutenin](https://github.com/danlark1))
+* Added functions `skewPop`, `skewSamp`, `kurtPop` and `kurtSamp` to compute sequence skewness, sample skewness, kurtosis and sample kurtosis, respectively. [#5200](https://github.com/yandex/ClickHouse/pull/5200) ([hcz](https://github.com/hczhcz))
+* Support the rename operation for `MaterializeView` storage. [#5209](https://github.com/yandex/ClickHouse/pull/5209) ([Guillaume Tassery](https://github.com/YiuRULE))
+* Added a server which allows connecting to ClickHouse using the MySQL client. [#4715](https://github.com/yandex/ClickHouse/pull/4715) ([Yuriy Baranov](https://github.com/yurriy))
+* Added `toDecimal*OrZero` and `toDecimal*OrNull` functions. [#5291](https://github.com/yandex/ClickHouse/pull/5291) ([Artem Zuikov](https://github.com/4ertus2))
+* Support Decimal types in functions: `quantile`, `quantiles`, `median`, `quantileExactWeighted`, `quantilesExactWeighted`, `medianExactWeighted`. [#5304](https://github.com/yandex/ClickHouse/pull/5304) ([Artem Zuikov](https://github.com/4ertus2))
+* Added the `toValidUTF8` function, which replaces all invalid UTF-8 characters with the replacement character � (U+FFFD). [#5322](https://github.com/yandex/ClickHouse/pull/5322) ([Danila Kutenin](https://github.com/danlark1))
+* Added the `format` function. It formats a constant pattern (a simplified Python format pattern) with the strings listed in the arguments. [#5330](https://github.com/yandex/ClickHouse/pull/5330) ([Danila Kutenin](https://github.com/danlark1))
+* Added the `system.detached_parts` table containing information about detached parts of `MergeTree` tables. [#5353](https://github.com/yandex/ClickHouse/pull/5353) ([akuzm](https://github.com/akuzm))
+* Added the `ngramSearch` function to calculate the non-symmetric difference between needle and haystack. [#5418](https://github.com/yandex/ClickHouse/pull/5418) [#5422](https://github.com/yandex/ClickHouse/pull/5422) ([Danila Kutenin](https://github.com/danlark1))
+* Implemented basic machine learning methods (stochastic linear regression and logistic regression) using the aggregate functions interface. There are different strategies for updating model weights (simple gradient descent, momentum method, Nesterov method). Mini-batches of custom size are also supported. [#4943](https://github.com/yandex/ClickHouse/pull/4943) ([Quid37](https://github.com/Quid37))
+* Implemented the `geohashEncode` and `geohashDecode` functions. [#5003](https://github.com/yandex/ClickHouse/pull/5003) ([Vasily Nemkov](https://github.com/Enmk))
+* Added the aggregate function `timeSeriesGroupSum`, which can aggregate different time series whose sample timestamps are not aligned. It uses linear interpolation between two sample timestamps and then sums the time series together. Added the aggregate function `timeSeriesGroupRateSum`, which calculates the rate of time series and then sums the rates together. [#4542](https://github.com/yandex/ClickHouse/pull/4542) ([Yangkuan Liu](https://github.com/LiuYangkuan))
+* Added functions `IPv4CIDRtoIPv4Range` and `IPv6CIDRtoIPv6Range` to calculate the lower and upper bounds for an IP in a subnet using a CIDR. [#5095](https://github.com/yandex/ClickHouse/pull/5095) ([Guillaume Tassery](https://github.com/YiuRULE))
+* Added an X-ClickHouse-Summary header when sending a query over HTTP with the setting `send_progress_in_http_headers` enabled. It returns the usual information of X-ClickHouse-Progress, with additional information such as how many rows and bytes were inserted by the query. [#5116](https://github.com/yandex/ClickHouse/pull/5116) ([Guillaume Tassery](https://github.com/YiuRULE))
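+
+A few illustrative queries for the new functions and syntax above (a sketch based on the descriptions in these entries; the `events` table and its columns `id`, `ts` are hypothetical):
+
+```sql
+-- basename: works with both UNIX-like and Windows paths
+SELECT basename('/usr/lib/clickhouse');      -- 'clickhouse'
+
+-- format: fills a constant pattern with the string arguments
+SELECT format('{} {}', 'Hello', 'World');    -- 'Hello World'
+
+-- toValidUTF8: invalid UTF-8 byte sequences become the replacement character
+SELECT toValidUTF8('\x61\xF0\x80\x80\x80b'); -- 'a�b'
+
+-- LIMIT BY with an offset: per id, skip the first row, then take up to 2 rows
+SELECT * FROM events ORDER BY id, ts LIMIT 2 OFFSET 1 BY id;
+```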
+
+### Improvements
+* Added the `max_parts_in_total` setting for the MergeTree family of tables (default: 100 000) that prevents unsafe specification of the partition key #5166. [#5171](https://github.com/yandex/ClickHouse/pull/5171) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* `clickhouse-obfuscator`: derive the seed for individual columns by combining the initial seed with the column name, not the column position. This is intended to transform datasets with multiple related tables, so that tables remain JOINable after transformation. [#5178](https://github.com/yandex/ClickHouse/pull/5178) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Added functions `JSONExtractRaw`, `JSONExtractKeyAndValues`. Renamed the function `jsonExtract` to `JSONExtract`. When something goes wrong these functions return the corresponding values, not `NULL`. Modified the function `JSONExtract`: now it gets the return type from its last parameter and doesn't inject nullables (see the example after this list). Implemented a fallback to RapidJSON in case AVX2 instructions are not available. The simdjson library was updated to a new version. [#5235](https://github.com/yandex/ClickHouse/pull/5235) ([Vitaly Baranov](https://github.com/vitlibar))
+* Now the `if` and `multiIf` functions don't rely on the condition's `Nullable`, but rely on the branches, for SQL compatibility. [#5238](https://github.com/yandex/ClickHouse/pull/5238) ([Jian Wu](https://github.com/janplus))
+* The `In` predicate now generates a `Null` result from `Null` input, like the `Equal` function. [#5152](https://github.com/yandex/ClickHouse/pull/5152) ([Jian Wu](https://github.com/janplus))
+* Check the time limit every (flush_interval / poll_timeout) number of rows read from Kafka. This allows breaking the reading from the Kafka consumer more frequently and checking the time limits for the top-level streams. [#5249](https://github.com/yandex/ClickHouse/pull/5249) ([Ivan](https://github.com/abyss7))
+* Link rdkafka with the bundled SASL. It should allow using SASL SCRAM authentication. [#5253](https://github.com/yandex/ClickHouse/pull/5253) ([Ivan](https://github.com/abyss7))
+* Batched version of RowRefList for ALL JOINs. [#5267](https://github.com/yandex/ClickHouse/pull/5267) ([Artem Zuikov](https://github.com/4ertus2))
+* clickhouse-server: more informative listen error messages. [#5268](https://github.com/yandex/ClickHouse/pull/5268) ([proller](https://github.com/proller))
+* Support dictionaries in clickhouse-copier for functions in `` [#5270](https://github.com/yandex/ClickHouse/pull/5270) ([proller](https://github.com/proller))
+* Added the new setting `kafka_commit_every_batch` to regulate the Kafka committing policy. It allows setting the commit mode: after every batch of messages is handled, or after the whole block is written to the storage. It's a trade-off between losing some messages and reading them twice in some extreme situations. [#5308](https://github.com/yandex/ClickHouse/pull/5308) ([Ivan](https://github.com/abyss7))
+* Made `windowFunnel` support other unsigned integer types. [#5320](https://github.com/yandex/ClickHouse/pull/5320) ([sundyli](https://github.com/sundy-li))
+* Allow shadowing the virtual column `_table` in the Merge engine. [#5325](https://github.com/yandex/ClickHouse/pull/5325) ([Ivan](https://github.com/abyss7))
+* Made `sequenceMatch` aggregate functions support other unsigned integer types. [#5339](https://github.com/yandex/ClickHouse/pull/5339) ([sundyli](https://github.com/sundy-li))
+* Better error messages if a checksum mismatch is most likely caused by hardware failures. [#5355](https://github.com/yandex/ClickHouse/pull/5355) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Check that underlying tables support sampling for `StorageMerge`. [#5366](https://github.com/yandex/ClickHouse/pull/5366) ([Ivan](https://github.com/abyss7))
+* Close MySQL connections after their usage in external dictionaries. It is related to issue #893. [#5395](https://github.com/yandex/ClickHouse/pull/5395) ([Clément Rodriguez](https://github.com/clemrodriguez))
+* Improvements of the MySQL wire protocol. Changed the name of the format to MySQLWire. Using RAII for calling RSA_free. Disabling SSL if the context cannot be created. [#5419](https://github.com/yandex/ClickHouse/pull/5419) ([Yuriy Baranov](https://github.com/yurriy))
+* clickhouse-client: allow running with an inaccessible history file (read-only, no disk space, file is a directory, ...). [#5431](https://github.com/yandex/ClickHouse/pull/5431) ([proller](https://github.com/proller))
+* Respect query settings in asynchronous INSERTs into Distributed tables. [#4936](https://github.com/yandex/ClickHouse/pull/4936) ([TCeason](https://github.com/TCeason))
+* Renamed the functions `leastSqr` to `simpleLinearRegression`, `LinearRegression` to `linearRegression`, `LogisticRegression` to `logisticRegression`. [#5391](https://github.com/yandex/ClickHouse/pull/5391) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
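+
+A quick illustration of the reworked JSON functions (a sketch following the behaviour described above: `JSONExtract` takes the return type as its last argument):
+
+```sql
+-- The return type is passed as the last parameter
+SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 'Array(Float64)');
+-- [-100, 200, 300]
+
+-- JSONExtractRaw returns the raw, unparsed JSON fragment
+SELECT JSONExtractRaw('{"a": {"b": 1}}', 'a');
+-- '{"b": 1}'
+```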
+
+### Performance Improvements
+* Parallelize processing of parts in the ALTER MODIFY query. [#4639](https://github.com/yandex/ClickHouse/pull/4639) ([Ivan Kush](https://github.com/IvanKush))
+* Optimizations in regular expression extraction. [#5193](https://github.com/yandex/ClickHouse/pull/5193) [#5191](https://github.com/yandex/ClickHouse/pull/5191) ([Danila Kutenin](https://github.com/danlark1))
+* Do not add the right join key column to the join result if it's used only in the JOIN ON section. [#5260](https://github.com/yandex/ClickHouse/pull/5260) ([Artem Zuikov](https://github.com/4ertus2))
+* Freeze the Kafka buffer after the first empty response. This avoids multiple invocations of `ReadBuffer::next()` for an empty result in some row-parsing streams. [#5283](https://github.com/yandex/ClickHouse/pull/5283) ([Ivan](https://github.com/abyss7))
+* `concat` function optimization for multiple arguments. [#5357](https://github.com/yandex/ClickHouse/pull/5357) ([Danila Kutenin](https://github.com/danlark1))
+* Query optimization. Allow pushing down an IN statement while rewriting a comma/cross join into an inner one (illustrated below). [#5396](https://github.com/yandex/ClickHouse/pull/5396) ([Artem Zuikov](https://github.com/4ertus2))
+* Upgraded our LZ4 implementation to the reference one for faster decompression. [#5070](https://github.com/yandex/ClickHouse/pull/5070) ([Danila Kutenin](https://github.com/danlark1))
+* Implemented MSD radix sort (based on kxsort), and partial sorting. [#5129](https://github.com/yandex/ClickHouse/pull/5129) ([Evgenii Pravda](https://github.com/kvinty))
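+
+For example, the comma/cross join rewrite turns a query of the first form below into the second, where the IN condition can then be pushed down (hypothetical tables `t1` and `t2`):
+
+```sql
+-- Comma join with the join condition in WHERE ...
+SELECT count() FROM t1, t2 WHERE t1.id = t2.id AND t2.id IN (1, 2, 3);
+-- ... is rewritten to an INNER JOIN, and the IN condition is pushed down to t2:
+SELECT count() FROM t1 INNER JOIN t2 ON t1.id = t2.id WHERE t2.id IN (1, 2, 3);
+```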
+
+### Bug Fixes
+* Fixed pushing required columns with JOIN. [#5192](https://github.com/yandex/ClickHouse/pull/5192) ([Winter Zhang](https://github.com/zhang2014))
+* Fixed a bug where, when ClickHouse is run by systemd, the command `sudo service clickhouse-server forcerestart` was not working as expected. [#5204](https://github.com/yandex/ClickHouse/pull/5204) ([proller](https://github.com/proller))
+* Fixed HTTP error codes in DataPartsExchange (the interserver HTTP server on port 9009 always returned code 200, even on errors). [#5216](https://github.com/yandex/ClickHouse/pull/5216) ([proller](https://github.com/proller))
+* Fixed SimpleAggregateFunction for Strings longer than MAX_SMALL_STRING_SIZE. [#5311](https://github.com/yandex/ClickHouse/pull/5311) ([Azat Khuzhin](https://github.com/azat))
+* Fixed an error for `Decimal` to `Nullable(Decimal)` conversion in IN. Support other Decimal to Decimal conversions (including different scales). [#5350](https://github.com/yandex/ClickHouse/pull/5350) ([Artem Zuikov](https://github.com/4ertus2))
+* Fixed FPU clobbering in the simdjson library that led to wrong calculation of the `uniqHLL` and `uniqCombined` aggregate functions and math functions such as `log`. [#5354](https://github.com/yandex/ClickHouse/pull/5354) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed handling of mixed const/nonconst cases in JSON functions. [#5435](https://github.com/yandex/ClickHouse/pull/5435) ([Vitaly Baranov](https://github.com/vitlibar))
+* Fixed the `retention` function. Now all conditions that are satisfied in a row of data are added to the data state. [#5119](https://github.com/yandex/ClickHouse/pull/5119) ([小路](https://github.com/nicelulu))
+* Fixed the result type for `quantileExact` with Decimals. [#5304](https://github.com/yandex/ClickHouse/pull/5304) ([Artem Zuikov](https://github.com/4ertus2))
+
+### Documentation
+* Translated the documentation for `CollapsingMergeTree` to Chinese. [#5168](https://github.com/yandex/ClickHouse/pull/5168) ([张风啸](https://github.com/AlexZFX))
+* Translated some documentation about table engines to Chinese. [#5134](https://github.com/yandex/ClickHouse/pull/5134) [#5328](https://github.com/yandex/ClickHouse/pull/5328) ([never lee](https://github.com/neverlee))
+
+### Build/Testing/Packaging Improvements
+* Fixed some sanitizer reports that show probable use-after-free. [#5139](https://github.com/yandex/ClickHouse/pull/5139) [#5143](https://github.com/yandex/ClickHouse/pull/5143) [#5393](https://github.com/yandex/ClickHouse/pull/5393) ([Ivan](https://github.com/abyss7))
+* Moved performance tests out of separate directories for convenience. [#5158](https://github.com/yandex/ClickHouse/pull/5158) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed incorrect performance tests. [#5255](https://github.com/yandex/ClickHouse/pull/5255) ([alesapin](https://github.com/alesapin))
+* Added a tool to calculate checksums caused by bit flips to debug hardware issues. [#5334](https://github.com/yandex/ClickHouse/pull/5334) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Made the runner script more usable. [#5340](https://github.com/yandex/ClickHouse/pull/5340) [#5360](https://github.com/yandex/ClickHouse/pull/5360) ([filimonov](https://github.com/filimonov))
+* Added a small instruction on how to write performance tests. [#5408](https://github.com/yandex/ClickHouse/pull/5408) ([alesapin](https://github.com/alesapin))
+* Added the ability to make substitutions in create, fill and drop queries in performance tests. [#5367](https://github.com/yandex/ClickHouse/pull/5367) ([Olga Khvostikova](https://github.com/stavrolia))
+
+## ClickHouse release 19.7.5.27, 2019-06-09
+
+### New Features
+* Added the bitmap-related functions `bitmapHasAny` and `bitmapHasAll`, analogous to the `hasAny` and `hasAll` functions for arrays (see the example below). [#5279](https://github.com/yandex/ClickHouse/pull/5279) ([Sergi Vladykin](https://github.com/svladykin))
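+
+For instance (an illustrative sketch, assuming the array-to-bitmap helper `bitmapBuild`):
+
+```sql
+SELECT bitmapHasAny(bitmapBuild([1, 2, 3]), bitmapBuild([3, 4, 5])); -- 1: the bitmaps share element 3
+SELECT bitmapHasAll(bitmapBuild([1, 2, 3]), bitmapBuild([3, 4, 5])); -- 0: 4 and 5 are missing from the first bitmap
+```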
+
+### Bug Fixes
+* Fixed a segfault on `minmax` INDEX with a Null value. [#5246](https://github.com/yandex/ClickHouse/pull/5246) ([Nikita Vasilev](https://github.com/nikvas0))
+* Mark all input columns in LIMIT BY as required output. This fixes the 'Not found column' error in some distributed queries. [#5407](https://github.com/yandex/ClickHouse/pull/5407) ([Constantin S. Pan](https://github.com/kvap))
+* Fixed the "Column '0' already exists" error in `SELECT .. PREWHERE` on a column with DEFAULT. [#5397](https://github.com/yandex/ClickHouse/pull/5397) ([proller](https://github.com/proller))
+* Fixed the `ALTER MODIFY TTL` query on `ReplicatedMergeTree`. [#5539](https://github.com/yandex/ClickHouse/pull/5539/commits) ([Anton Popov](https://github.com/CurtizJ))
+* Don't crash the server when Kafka consumers have failed to start. [#5285](https://github.com/yandex/ClickHouse/pull/5285) ([Ivan](https://github.com/abyss7))
+* Fixed bitmap functions producing a wrong result. [#5359](https://github.com/yandex/ClickHouse/pull/5359) ([Andy Yang](https://github.com/andyyzh))
+* Fixed element_count for hashed dictionaries (do not include duplicates). [#5440](https://github.com/yandex/ClickHouse/pull/5440) ([Azat Khuzhin](https://github.com/azat))
+* Use the contents of the environment variable TZ as the name of the timezone. It helps to correctly detect the default timezone in some cases. [#5443](https://github.com/yandex/ClickHouse/pull/5443) ([Ivan](https://github.com/abyss7))
+* Do not try to convert integers in `dictGetT` functions, because it doesn't work correctly. Throw an exception instead. [#5446](https://github.com/yandex/ClickHouse/pull/5446) ([Artem Zuikov](https://github.com/4ertus2))
+* Fixed settings in the ExternalData HTTP request. [#5455](https://github.com/yandex/ClickHouse/pull/5455) ([Danila Kutenin](https://github.com/danlark1))
+* Fixed a bug where parts were removed only from the filesystem without being dropped from ZooKeeper. [#5520](https://github.com/yandex/ClickHouse/pull/5520) ([alesapin](https://github.com/alesapin))
+* Fixed a segmentation fault in the `bitmapHasAny` function. [#5528](https://github.com/yandex/ClickHouse/pull/5528) ([Zhichang Yu](https://github.com/yuzhichang))
+* Fixed an error where the replication connection pool doesn't retry resolving the host, even when the DNS cache was dropped. [#5534](https://github.com/yandex/ClickHouse/pull/5534) ([alesapin](https://github.com/alesapin))
+* Fixed the `DROP INDEX IF EXISTS` query. Now the `ALTER TABLE ... DROP INDEX IF EXISTS ...` query doesn't raise an exception if the provided index does not exist. [#5524](https://github.com/yandex/ClickHouse/pull/5524) ([Gleb Novikov](https://github.com/NanoBjorn))
+* Fixed the UNION ALL supertype column. There were cases with inconsistent data and column types of resulting columns. [#5503](https://github.com/yandex/ClickHouse/pull/5503) ([Artem Zuikov](https://github.com/4ertus2))
+* Skip ZNONODE during DDL query processing. Previously, if another node removed the znode in the task queue, a node that had not processed it, but had already gotten the list of children, would terminate the DDLWorker thread. [#5489](https://github.com/yandex/ClickHouse/pull/5489) ([Azat Khuzhin](https://github.com/azat))
+* Fixed INSERT into a Distributed() table with a MATERIALIZED column. [#5429](https://github.com/yandex/ClickHouse/pull/5429) ([Azat Khuzhin](https://github.com/azat))
+
 ## ClickHouse release 19.7.3.9, 2019-05-30
 
 ### New Features
@@ -60,6 +171,16 @@ lee](https://github.com/neverlee))
 [#5110](https://github.com/yandex/ClickHouse/pull/5110)
 ([proller](https://github.com/proller))
 
+## ClickHouse release 19.6.3.18, 2019-06-13
+
+### Bug Fixes
+* Fixed IN condition pushdown for queries from table functions `mysql` and `odbc` and corresponding table engines. This fixes #3540 and #2384. [#5313](https://github.com/yandex/ClickHouse/pull/5313) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed a deadlock in ZooKeeper. [#5297](https://github.com/yandex/ClickHouse/pull/5297) ([github1youlc](https://github.com/github1youlc))
+* Allow quoted decimals in CSV. [#5284](https://github.com/yandex/ClickHouse/pull/5284) ([Artem Zuikov](https://github.com/4ertus2))
+* Disallow conversion from float Inf/NaN into Decimals (throw an exception). [#5282](https://github.com/yandex/ClickHouse/pull/5282) ([Artem Zuikov](https://github.com/4ertus2))
+* Fixed a data race in the rename query. [#5247](https://github.com/yandex/ClickHouse/pull/5247) ([Winter Zhang](https://github.com/zhang2014))
+* Temporarily disable LFAlloc. Usage of LFAlloc might lead to a lot of MAP_FAILED while allocating UncompressedCache and, as a result, to crashes of queries on highly loaded servers. [cfdba93](https://github.com/yandex/ClickHouse/commit/cfdba938ce22f16efeec504f7f90206a515b1280) ([Danila Kutenin](https://github.com/danlark1))
+
 ## ClickHouse release 19.6.2.11, 2019-05-13
 
 ### New Features
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 08c7cd4d60f..405d118ad34 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -16,6 +16,8 @@ set(CMAKE_LINK_DEPENDS_NO_SHARED 1) # Do not relink all depended targets on .so
 set(CMAKE_CONFIGURATION_TYPES "RelWithDebInfo;Debug;Release;MinSizeRel" CACHE STRING "" FORCE)
 set(CMAKE_DEBUG_POSTFIX "d" CACHE STRING "Generate debug library name with a postfix.") # To be consistent with CMakeLists from contrib libs.
+include (cmake/arch.cmake) + option(ENABLE_IPO "Enable inter-procedural optimization (aka LTO)" OFF) # need cmake 3.9+ if(ENABLE_IPO) cmake_policy(SET CMP0069 NEW) @@ -31,12 +33,12 @@ else() message(STATUS "IPO/LTO not enabled.") endif() -if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") +if (COMPILER_GCC) # Require at least gcc 7 if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7 AND NOT CMAKE_VERSION VERSION_LESS 2.8.9) message (FATAL_ERROR "GCC version must be at least 7. For example, if GCC 7 is available under gcc-7, g++-7 names, do the following: export CC=gcc-7 CXX=g++-7; rm -rf CMakeCache.txt CMakeFiles; and re run cmake or ./release.") endif () -elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang") +elseif (COMPILER_CLANG) # Require at least clang 5 if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 5) message (FATAL_ERROR "Clang version must be at least 5.") @@ -81,7 +83,6 @@ endif () include (cmake/sanitize.cmake) -include (cmake/arch.cmake) if (CMAKE_GENERATOR STREQUAL "Ninja") # Turn on colored output. https://github.com/ninja-build/ninja/wiki/FAQ @@ -102,13 +103,12 @@ if (COMPILER_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "8.3.0") set (CXX_WARNING_FLAGS "${CXX_WARNING_FLAGS} -Wno-array-bounds") endif () -if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang") +if (COMPILER_CLANG) # clang: warning: argument unused during compilation: '-stdlib=libc++' # clang: warning: argument unused during compilation: '-specs=/usr/share/dpkg/no-pie-compile.specs' [-Wunused-command-line-argument] set (COMMON_WARNING_FLAGS "${COMMON_WARNING_FLAGS} -Wno-unused-command-line-argument") endif () -option (TEST_COVERAGE "Enables flags for test coverage" OFF) option (ENABLE_TESTS "Enables tests" ON) if (CMAKE_SYSTEM_PROCESSOR MATCHES "amd64|x86_64") @@ -128,7 +128,7 @@ string(REGEX MATCH "-?[0-9]+(.[0-9]+)?$" COMPILER_POSTFIX ${CMAKE_CXX_COMPILER}) find_program (LLD_PATH NAMES "lld${COMPILER_POSTFIX}" "lld") find_program (GOLD_PATH NAMES "gold") -if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND LLD_PATH AND NOT LINKER_NAME) +if (COMPILER_CLANG AND LLD_PATH AND NOT LINKER_NAME) set (LINKER_NAME "lld") elseif (GOLD_PATH) set (LINKER_NAME "gold") @@ -162,7 +162,7 @@ if (ARCH_NATIVE) endif () # Special options for better optimized code with clang -#if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang") +#if (COMPILER_CLANG) # set (CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -Wno-unused-command-line-argument -mllvm -inline-threshold=10000") #endif () @@ -177,6 +177,14 @@ else () set (CXX_FLAGS_INTERNAL_COMPILER "-std=c++1z") endif () +option(WITH_COVERAGE "Build with coverage." 0) +if(WITH_COVERAGE AND COMPILER_CLANG) + set(COMPILER_FLAGS "${COMPILER_FLAGS} -fprofile-instr-generate -fcoverage-mapping") +endif() +if(WITH_COVERAGE AND COMPILER_GCC) + set(COMPILER_FLAGS "${COMPILER_FLAGS} -fprofile-arcs -ftest-coverage") +endif() + set (CMAKE_BUILD_COLOR_MAKEFILE ON) set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${COMPILER_FLAGS} ${PLATFORM_EXTRA_CXX_FLAG} -fno-omit-frame-pointer ${COMMON_WARNING_FLAGS} ${CXX_WARNING_FLAGS}") #set (CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} ${CMAKE_CXX_FLAGS_ADD}") @@ -188,10 +196,8 @@ set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${COMPILER_FLAGS} -fn set (CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO} -O3 ${CMAKE_C_FLAGS_ADD}") set (CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -O0 -g3 -ggdb3 -fno-inline ${CMAKE_C_FLAGS_ADD}") - include (cmake/use_libcxx.cmake) - # Set standard, system and compiler libraries explicitly. # This is intended for more control of what we are linking. 
@@ -251,7 +257,6 @@ if (DEFAULT_LIBS) set(CMAKE_CXX_STANDARD_LIBRARIES ${DEFAULT_LIBS}) endif () - if (NOT MAKE_STATIC_LIBRARIES) set(CMAKE_POSITION_INDEPENDENT_CODE ON) endif () @@ -268,11 +273,6 @@ if (USE_INCLUDE_WHAT_YOU_USE) endif() endif () -# Flags for test coverage -if (TEST_COVERAGE) - set (CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fprofile-arcs -ftest-coverage -DIS_DEBUG") -endif (TEST_COVERAGE) - if (ENABLE_TESTS) message (STATUS "Tests are enabled") endif () @@ -336,7 +336,7 @@ include (cmake/find_hdfs3.cmake) # uses protobuf include (cmake/find_consistent-hashing.cmake) include (cmake/find_base64.cmake) include (cmake/find_hyperscan.cmake) -include (cmake/find_lfalloc.cmake) +include (cmake/find_mimalloc.cmake) include (cmake/find_simdjson.cmake) include (cmake/find_rapidjson.cmake) find_contrib_lib(cityhash) diff --git a/README.md b/README.md index 7db6a1a679d..e3c4f407c8d 100644 --- a/README.md +++ b/README.md @@ -12,8 +12,6 @@ ClickHouse is an open-source column-oriented database management system that all * You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person. ## Upcoming Events -* [ClickHouse on HighLoad++ Siberia](https://www.highload.ru/siberia/2019/abstracts/5348) on June 24-25. -* [ClickHouse Meetup in Novosibirsk](https://events.yandex.ru/events/ClickHouse/26-June-2019/) on June 26. * [ClickHouse Meetup in Minsk](https://yandex.ru/promo/metrica/clickhouse-minsk) on July 11. * [ClickHouse Meetup in Shenzhen](https://www.huodongxing.com/event/3483759917300) on October 20. * [ClickHouse Meetup in Shanghai](https://www.huodongxing.com/event/4483760336000) on October 27. diff --git a/cmake/find_lfalloc.cmake b/cmake/find_lfalloc.cmake deleted file mode 100644 index 32cb1e7d5d5..00000000000 --- a/cmake/find_lfalloc.cmake +++ /dev/null @@ -1,11 +0,0 @@ -# TODO(danlark1). Disable LFAlloc for a while to fix mmap count problem -if (NOT OS_LINUX AND NOT SANITIZE AND NOT ARCH_ARM AND NOT ARCH_32 AND NOT ARCH_PPC64LE AND NOT OS_FREEBSD AND NOT APPLE) - option (ENABLE_LFALLOC "Set to FALSE to use system libgsasl library instead of bundled" ${NOT_UNBUNDLED}) -endif () - -if (ENABLE_LFALLOC) - set (USE_LFALLOC 1) - set (USE_LFALLOC_RANDOM_HINT 1) - set (LFALLOC_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/lfalloc/src) - message (STATUS "Using lfalloc=${USE_LFALLOC}: ${LFALLOC_INCLUDE_DIR}") -endif () diff --git a/cmake/find_mimalloc.cmake b/cmake/find_mimalloc.cmake new file mode 100644 index 00000000000..6e3f24625b6 --- /dev/null +++ b/cmake/find_mimalloc.cmake @@ -0,0 +1,15 @@ +if (OS_LINUX AND NOT SANITIZE AND NOT ARCH_ARM AND NOT ARCH_32 AND NOT ARCH_PPC64LE) + option (ENABLE_MIMALLOC "Set to FALSE to disable usage of mimalloc for internal ClickHouse caches" ${NOT_UNBUNDLED}) +endif () + +if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/mimalloc/include/mimalloc.h") + message (WARNING "submodule contrib/mimalloc is missing. to fix try run: \n git submodule update --init --recursive") + return() +endif () + +if (ENABLE_MIMALLOC) + set (MIMALLOC_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/mimalloc/include) + set (USE_MIMALLOC 1) + set (MIMALLOC_LIBRARY mimalloc-static) + message (STATUS "Using mimalloc: ${MIMALLOC_INCLUDE_DIR} : ${MIMALLOC_LIBRARY}") +endif () diff --git a/contrib/CMakeLists.txt b/contrib/CMakeLists.txt index 737b6d72bee..78ddc692b3d 100644 --- a/contrib/CMakeLists.txt +++ b/contrib/CMakeLists.txt @@ -1,11 +1,11 @@ # Third-party libraries may have substandard code. 
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") - set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-stringop-overflow -Wno-implicit-function-declaration -Wno-return-type -Wno-array-bounds -Wno-bool-compare -Wno-int-conversion -Wno-switch -Wno-stringop-truncation") - set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-sign-compare -Wno-array-bounds -Wno-missing-attributes -Wno-stringop-truncation -std=c++1z") + set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -w") + set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w -std=c++1z") elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang") - set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality -Wno-tautological-constant-compare -Wno-tautological-constant-out-of-range-compare -Wno-implicit-function-declaration -Wno-return-type -Wno-pointer-bool-conversion -Wno-enum-conversion -Wno-int-conversion -Wno-switch -Wno-string-plus-int") - set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-format -Wno-inconsistent-missing-override -std=c++1z") + set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -w") + set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w -std=c++1z") endif () set_property(DIRECTORY PROPERTY EXCLUDE_FROM_ALL 1) @@ -321,3 +321,7 @@ endif() if (USE_SIMDJSON) add_subdirectory (simdjson-cmake) endif() + +if (USE_MIMALLOC) + add_subdirectory (mimalloc) +endif() diff --git a/contrib/lfalloc/src/lf_allocX64.h b/contrib/lfalloc/src/lf_allocX64.h deleted file mode 100644 index 12190f0712f..00000000000 --- a/contrib/lfalloc/src/lf_allocX64.h +++ /dev/null @@ -1,1820 +0,0 @@ -#pragma once - -#include -#include -#include - -#include "lfmalloc.h" - -#include "util/system/compiler.h" -#include "util/system/types.h" -#include - -#ifdef _MSC_VER -#ifndef _CRT_SECURE_NO_WARNINGS -#define _CRT_SECURE_NO_WARNINGS -#endif -#ifdef _M_X64 -#define _64_ -#endif -#include -#define WIN32_LEAN_AND_MEAN -#include -#pragma intrinsic(_InterlockedCompareExchange) -#pragma intrinsic(_InterlockedExchangeAdd) - -#include -#include -#include - -#define PERTHREAD __declspec(thread) -#define _win_ -#define Y_FORCE_INLINE __forceinline - -using TAtomic = volatile long; - -static inline long AtomicAdd(TAtomic& a, long b) { - return _InterlockedExchangeAdd(&a, b) + b; -} - -static inline long AtomicSub(TAtomic& a, long b) { - return AtomicAdd(a, -b); -} - -#define Y_ASSERT_NOBT(x) ((void)0) - -#else - -#include "util/system/defaults.h" -#include "util/system/atomic.h" -#include - -#if !defined(NDEBUG) && !defined(__GCCXML__) -#define Y_ASSERT_NOBT(a) \ - do { \ - if (Y_UNLIKELY(!(a))) { \ - assert(false && (a)); \ - } \ - } while (0) -#else -#define Y_ASSERT_NOBT(a) \ - do { \ - if (false) { \ - bool __xxx = static_cast(a); \ - Y_UNUSED(__xxx); \ - } \ - } while (0) -#endif - -#include -#include -#include -#include -#include -#include - -#if defined(_linux_) -#if !defined(MADV_HUGEPAGE) -#define MADV_HUGEPAGE 14 -#endif -#if !defined(MAP_HUGETLB) -#define 
MAP_HUGETLB 0x40000 -#endif -#endif - -#define PERTHREAD __thread - -#endif - -#ifndef _darwin_ - -#ifndef Y_ARRAY_SIZE -#define Y_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0])) -#endif - -#ifndef NDEBUG -#define DBG_FILL_MEMORY -static bool FillMemoryOnAllocation = true; -#endif - -static bool TransparentHugePages = false; // force MADV_HUGEPAGE for large allocs -static bool MapHugeTLB = false; // force MAP_HUGETLB for small allocs -static bool EnableDefrag = true; - -// Buffers that are larger than this size will not be filled with 0xcf -#ifndef DBG_FILL_MAX_SIZE -#define DBG_FILL_MAX_SIZE 0x01000000000000ULL -#endif - -template -inline T* DoCas(T* volatile* target, T* exchange, T* compare) { -#if defined(_linux_) - return __sync_val_compare_and_swap(target, compare, exchange); -#elif defined(_WIN32) -#ifdef _64_ - return (T*)_InterlockedCompareExchange64((__int64*)target, (__int64)exchange, (__int64)compare); -#else - //return (T*)InterlockedCompareExchangePointer(targetVoidP, exchange, compare); - return (T*)_InterlockedCompareExchange((LONG*)target, (LONG)exchange, (LONG)compare); -#endif -#elif defined(__i386) || defined(__x86_64__) - union { - T* volatile* NP; - void* volatile* VoidP; - } gccSucks; - gccSucks.NP = target; - void* volatile* targetVoidP = gccSucks.VoidP; - - __asm__ __volatile__( - "lock\n\t" - "cmpxchg %2,%0\n\t" - : "+m"(*(targetVoidP)), "+a"(compare) - : "r"(exchange) - : "cc", "memory"); - return compare; -#else -#error inline_cas not defined for this platform -#endif -} - -#ifdef _64_ -const uintptr_t N_MAX_WORKSET_SIZE = 0x100000000ll * 200; -const uintptr_t N_HUGE_AREA_FINISH = 0x700000000000ll; -#ifndef _freebsd_ -const uintptr_t LINUX_MMAP_AREA_START = 0x100000000ll; -static uintptr_t volatile linuxAllocPointer = LINUX_MMAP_AREA_START; -static uintptr_t volatile linuxAllocPointerHuge = LINUX_MMAP_AREA_START + N_MAX_WORKSET_SIZE; -#endif -#else -const uintptr_t N_MAX_WORKSET_SIZE = 0xffffffff; -#endif -#define ALLOC_START ((char*)0) - -const size_t N_CHUNK_SIZE = 1024 * 1024; -const size_t N_CHUNKS = N_MAX_WORKSET_SIZE / N_CHUNK_SIZE; -const size_t N_LARGE_ALLOC_SIZE = N_CHUNK_SIZE * 128; - -// map size idx to size in bytes -#ifdef LFALLOC_YT -const int N_SIZES = 27; -#else -const int N_SIZES = 25; -#endif -const int nSizeIdxToSize[N_SIZES] = { - -1, -#if defined(_64_) - 16, 16, 32, 32, 48, 64, 96, 128, -#else - 8, - 16, - 24, - 32, - 48, - 64, - 96, - 128, -#endif - 192, 256, 384, 512, 768, 1024, 1536, 2048, - 3072, 4096, 6144, 8192, 12288, 16384, 24576, 32768, -#ifdef LFALLOC_YT - 49152, 65536 -#endif -}; -#ifdef LFALLOC_YT -const size_t N_MAX_FAST_SIZE = 65536; -#else -const size_t N_MAX_FAST_SIZE = 32768; -#endif -const unsigned char size2idxArr1[64 + 1] = { - 1, -#if defined(_64_) - 2, 2, 4, 4, // 16, 16, 32, 32 -#else - 1, 2, 3, 4, // 8, 16, 24, 32 -#endif - 5, 5, 6, 6, // 48, 64 - 7, 7, 7, 7, 8, 8, 8, 8, // 96, 128 - 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, // 192, 256 - 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, // 384 - 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12 // 512 -}; -#ifdef LFALLOC_YT -const unsigned char size2idxArr2[256] = { -#else -const unsigned char size2idxArr2[128] = { -#endif - 12, 12, 13, 14, // 512, 512, 768, 1024 - 15, 15, 16, 16, // 1536, 2048 - 17, 17, 17, 17, 18, 18, 18, 18, // 3072, 4096 - 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, // 6144, 8192 - 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, // 12288 - 22, 22, 22, 22, 22, 22, 22, 22, 22, 
22, 22, 22, 22, 22, 22, 22, // 16384 - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, // 24576 - 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, - 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, // 32768 -#ifdef LFALLOC_YT - 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, - 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, - 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, - 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, // 49152 - 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, - 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, - 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, - 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, // 65536 -#endif -}; - -// map entry number to size idx -// special size idx's: 0 = not used, -1 = mem locked, but not allocated -static volatile char chunkSizeIdx[N_CHUNKS]; -const int FREE_CHUNK_ARR_BUF = 0x20000; // this is effectively 128G of free memory (with 1M chunks), should not be exhausted actually -static volatile uintptr_t freeChunkArr[FREE_CHUNK_ARR_BUF]; -static volatile int freeChunkCount; - -static void AddFreeChunk(uintptr_t chunkId) { - chunkSizeIdx[chunkId] = -1; - if (Y_UNLIKELY(freeChunkCount == FREE_CHUNK_ARR_BUF)) - NMalloc::AbortFromCorruptedAllocator(); // free chunks arrray overflowed - freeChunkArr[freeChunkCount++] = chunkId; -} - -static bool GetFreeChunk(uintptr_t* res) { - if (freeChunkCount == 0) { - *res = 0; - return false; - } - *res = freeChunkArr[--freeChunkCount]; - return true; -} - -////////////////////////////////////////////////////////////////////////// -enum ELFAllocCounter { - CT_USER_ALLOC, // accumulated size requested by user code - CT_MMAP, // accumulated mmapped size - CT_MMAP_CNT, // number of mmapped regions - CT_MUNMAP, // accumulated unmmapped size - CT_MUNMAP_CNT, // number of munmaped regions - CT_SYSTEM_ALLOC, // accumulated allocated size for internal lfalloc needs - CT_SYSTEM_FREE, // accumulated deallocated size for internal lfalloc needs - CT_SMALL_ALLOC, // accumulated allocated size for fixed-size blocks - CT_SMALL_FREE, // accumulated deallocated size for fixed-size blocks - CT_LARGE_ALLOC, // accumulated allocated size for large blocks - CT_LARGE_FREE, // accumulated deallocated size for large blocks - CT_SLOW_ALLOC_CNT, // number of slow (not LF) allocations - CT_DEGRAGMENT_CNT, // number of memory defragmentations - CT_MAX -}; - -static Y_FORCE_INLINE void IncrementCounter(ELFAllocCounter counter, size_t value); - -////////////////////////////////////////////////////////////////////////// -enum EMMapMode { - MM_NORMAL, // memory for small allocs - MM_HUGE // memory for large allocs -}; - -#ifndef _MSC_VER -inline void VerifyMmapResult(void* result) { - if (Y_UNLIKELY(result == MAP_FAILED)) - NMalloc::AbortFromCorruptedAllocator(); // negative size requested? 
or just out of mem -} -#endif - -#if !defined(_MSC_VER) && !defined(_freebsd_) && defined(_64_) -static char* AllocWithMMapLinuxImpl(uintptr_t sz, EMMapMode mode) { - char* volatile* areaPtr; - char* areaStart; - uintptr_t areaFinish; - - int mapProt = PROT_READ | PROT_WRITE; - int mapFlags = MAP_PRIVATE | MAP_ANON; - - if (mode == MM_HUGE) { - areaPtr = reinterpret_cast(&linuxAllocPointerHuge); - areaStart = reinterpret_cast(LINUX_MMAP_AREA_START + N_MAX_WORKSET_SIZE); - areaFinish = N_HUGE_AREA_FINISH; - } else { - areaPtr = reinterpret_cast(&linuxAllocPointer); - areaStart = reinterpret_cast(LINUX_MMAP_AREA_START); - areaFinish = N_MAX_WORKSET_SIZE; - - if (MapHugeTLB) { - mapFlags |= MAP_HUGETLB; - } - } - - bool wrapped = false; - for (;;) { - char* prevAllocPtr = *areaPtr; - char* nextAllocPtr = prevAllocPtr + sz; - if (uintptr_t(nextAllocPtr - (char*)nullptr) >= areaFinish) { - if (Y_UNLIKELY(wrapped)) { - // virtual memory is over fragmented - NMalloc::AbortFromCorruptedAllocator(); - } - // wrap after all area is used - DoCas(areaPtr, areaStart, prevAllocPtr); - wrapped = true; - continue; - } - - if (DoCas(areaPtr, nextAllocPtr, prevAllocPtr) != prevAllocPtr) - continue; - - char* largeBlock = (char*)mmap(prevAllocPtr, sz, mapProt, mapFlags, -1, 0); - VerifyMmapResult(largeBlock); - if (largeBlock == prevAllocPtr) - return largeBlock; - if (largeBlock) - munmap(largeBlock, sz); - - if (sz < 0x80000) { - // skip utilized area with big steps - DoCas(areaPtr, nextAllocPtr + 0x10 * 0x10000, nextAllocPtr); - } - } -} -#endif - -static char* AllocWithMMap(uintptr_t sz, EMMapMode mode) { - (void)mode; -#ifdef _MSC_VER - char* largeBlock = (char*)VirtualAlloc(0, sz, MEM_RESERVE, PAGE_READWRITE); - if (Y_UNLIKELY(largeBlock == nullptr)) - NMalloc::AbortFromCorruptedAllocator(); // out of memory - if (Y_UNLIKELY(uintptr_t(((char*)largeBlock - ALLOC_START) + sz) >= N_MAX_WORKSET_SIZE)) - NMalloc::AbortFromCorruptedAllocator(); // out of working set, something has broken -#else -#if defined(_freebsd_) || !defined(_64_) || defined(USE_LFALLOC_RANDOM_HINT) - uintptr_t areaStart; - uintptr_t areaFinish; - if (mode == MM_HUGE) { - areaStart = LINUX_MMAP_AREA_START + N_MAX_WORKSET_SIZE; - areaFinish = N_HUGE_AREA_FINISH; - } else { - areaStart = LINUX_MMAP_AREA_START; - areaFinish = N_MAX_WORKSET_SIZE; - } -#if defined(USE_LFALLOC_RANDOM_HINT) - static thread_local std::mt19937_64 generator(std::random_device{}()); - std::uniform_int_distribution distr(areaStart, areaFinish - sz - 1); - char* largeBlock; - static constexpr size_t MaxAttempts = 100; - size_t attempt = 0; - do - { - largeBlock = (char*)mmap(reinterpret_cast(distr(generator)), sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); - ++attempt; - } while (uintptr_t(((char*)largeBlock - ALLOC_START) + sz) >= areaFinish && attempt < MaxAttempts && munmap(largeBlock, sz) == 0); -#else - char* largeBlock = (char*)mmap(0, sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); -#endif - VerifyMmapResult(largeBlock); - if (Y_UNLIKELY(uintptr_t(((char*)largeBlock - ALLOC_START) + sz) >= areaFinish)) - NMalloc::AbortFromCorruptedAllocator(); // out of working set, something has broken -#else - char* largeBlock = AllocWithMMapLinuxImpl(sz, mode); - if (TransparentHugePages) { - madvise(largeBlock, sz, MADV_HUGEPAGE); - } -#endif -#endif - Y_ASSERT_NOBT(largeBlock); - IncrementCounter(CT_MMAP, sz); - IncrementCounter(CT_MMAP_CNT, 1); - return largeBlock; -} - -enum class ELarge : ui8 { - Free = 0, // block in free cache - Alloc = 1, 
// block is allocated - Gone = 2, // block was unmapped -}; - -struct TLargeBlk { - - static TLargeBlk* As(void *raw) { - return reinterpret_cast((char*)raw - 4096ll); - } - - static const TLargeBlk* As(const void *raw) { - return reinterpret_cast((const char*)raw - 4096ll); - } - - void SetSize(size_t bytes, size_t pages) { - Pages = pages; - Bytes = bytes; - } - - void Mark(ELarge state) { - const ui64 marks[] = { - 0x8b38aa5ca4953c98, // ELarge::Free - 0xf916d33584eb5087, // ELarge::Alloc - 0xd33b0eca7651bc3f // ELarge::Gone - }; - - Token = size_t(marks[ui8(state)]); - } - - size_t Pages; // Total pages allocated with mmap like call - size_t Bytes; // Actually requested bytes by user - size_t Token; // Block state token, see ELarge enum. -}; - - -static void LargeBlockUnmap(void* p, size_t pages) { - const auto bytes = (pages + 1) * uintptr_t(4096); - - IncrementCounter(CT_MUNMAP, bytes); - IncrementCounter(CT_MUNMAP_CNT, 1); -#ifdef _MSC_VER - Y_ASSERT_NOBT(0); -#else - TLargeBlk::As(p)->Mark(ELarge::Gone); - munmap((char*)p - 4096ll, bytes); -#endif -} - -////////////////////////////////////////////////////////////////////////// -const size_t LB_BUF_SIZE = 250; -const size_t LB_BUF_HASH = 977; -static int LB_LIMIT_TOTAL_SIZE = 500 * 1024 * 1024 / 4096; // do not keep more then this mem total in lbFreePtrs[] -static void* volatile lbFreePtrs[LB_BUF_HASH][LB_BUF_SIZE]; -static TAtomic lbFreePageCount; - - -static void* LargeBlockAlloc(size_t _nSize, ELFAllocCounter counter) { - size_t pgCount = (_nSize + 4095) / 4096; -#ifdef _MSC_VER - char* pRes = (char*)VirtualAlloc(0, (pgCount + 1) * 4096ll, MEM_COMMIT, PAGE_READWRITE); - if (Y_UNLIKELY(pRes == 0)) { - NMalloc::AbortFromCorruptedAllocator(); // out of memory - } -#else - - IncrementCounter(counter, pgCount * 4096ll); - IncrementCounter(CT_SYSTEM_ALLOC, 4096ll); - - int lbHash = pgCount % LB_BUF_HASH; - for (int i = 0; i < LB_BUF_SIZE; ++i) { - void* p = lbFreePtrs[lbHash][i]; - if (p == nullptr) - continue; - if (DoCas(&lbFreePtrs[lbHash][i], (void*)nullptr, p) == p) { - size_t realPageCount = TLargeBlk::As(p)->Pages; - if (realPageCount == pgCount) { - AtomicAdd(lbFreePageCount, -pgCount); - TLargeBlk::As(p)->Mark(ELarge::Alloc); - return p; - } else { - if (DoCas(&lbFreePtrs[lbHash][i], p, (void*)nullptr) != (void*)nullptr) { - // block was freed while we were busy - AtomicAdd(lbFreePageCount, -realPageCount); - LargeBlockUnmap(p, realPageCount); - --i; - } - } - } - } - char* pRes = AllocWithMMap((pgCount + 1) * 4096ll, MM_HUGE); -#endif - pRes += 4096ll; - TLargeBlk::As(pRes)->SetSize(_nSize, pgCount); - TLargeBlk::As(pRes)->Mark(ELarge::Alloc); - - return pRes; -} - -#ifndef _MSC_VER -static void FreeAllLargeBlockMem() { - for (auto& lbFreePtr : lbFreePtrs) { - for (int i = 0; i < LB_BUF_SIZE; ++i) { - void* p = lbFreePtr[i]; - if (p == nullptr) - continue; - if (DoCas(&lbFreePtr[i], (void*)nullptr, p) == p) { - int pgCount = TLargeBlk::As(p)->Pages; - AtomicAdd(lbFreePageCount, -pgCount); - LargeBlockUnmap(p, pgCount); - } - } - } -} -#endif - -static void LargeBlockFree(void* p, ELFAllocCounter counter) { - if (p == nullptr) - return; -#ifdef _MSC_VER - VirtualFree((char*)p - 4096ll, 0, MEM_RELEASE); -#else - size_t pgCount = TLargeBlk::As(p)->Pages; - - TLargeBlk::As(p)->Mark(ELarge::Free); - IncrementCounter(counter, pgCount * 4096ll); - IncrementCounter(CT_SYSTEM_FREE, 4096ll); - - if (lbFreePageCount > LB_LIMIT_TOTAL_SIZE) - FreeAllLargeBlockMem(); - int lbHash = pgCount % LB_BUF_HASH; - for (int i = 0; i < LB_BUF_SIZE; 
++i) { - if (lbFreePtrs[lbHash][i] == nullptr) { - if (DoCas(&lbFreePtrs[lbHash][i], p, (void*)nullptr) == nullptr) { - AtomicAdd(lbFreePageCount, pgCount); - return; - } - } - } - - LargeBlockUnmap(p, pgCount); -#endif -} - -static void* SystemAlloc(size_t _nSize) { - //HeapAlloc(GetProcessHeap(), HEAP_GENERATE_EXCEPTIONS, _nSize); - return LargeBlockAlloc(_nSize, CT_SYSTEM_ALLOC); -} -static void SystemFree(void* p) { - //HeapFree(GetProcessHeap(), 0, p); - LargeBlockFree(p, CT_SYSTEM_FREE); -} - -////////////////////////////////////////////////////////////////////////// -static int* volatile nLock = nullptr; -static int nLockVar; -inline void RealEnterCriticalDefault(int* volatile* lockPtr) { - while (DoCas(lockPtr, &nLockVar, (int*)nullptr) != nullptr) - ; //pthread_yield(); -} -inline void RealLeaveCriticalDefault(int* volatile* lockPtr) { - *lockPtr = nullptr; -} -static void (*RealEnterCritical)(int* volatile* lockPtr) = RealEnterCriticalDefault; -static void (*RealLeaveCritical)(int* volatile* lockPtr) = RealLeaveCriticalDefault; -static void (*BeforeLFAllocGlobalLockAcquired)() = nullptr; -static void (*AfterLFAllocGlobalLockReleased)() = nullptr; -class CCriticalSectionLockMMgr { -public: - CCriticalSectionLockMMgr() { - if (BeforeLFAllocGlobalLockAcquired) { - BeforeLFAllocGlobalLockAcquired(); - } - RealEnterCritical(&nLock); - } - ~CCriticalSectionLockMMgr() { - RealLeaveCritical(&nLock); - if (AfterLFAllocGlobalLockReleased) { - AfterLFAllocGlobalLockReleased(); - } - } -}; - -////////////////////////////////////////////////////////////////////////// -class TLFAllocFreeList { - struct TNode { - TNode* Next; - }; - - TNode* volatile Head; - TNode* volatile Pending; - TAtomic PendingToFreeListCounter; - TAtomic AllocCount; - - static Y_FORCE_INLINE void Enqueue(TNode* volatile* headPtr, TNode* n) { - for (;;) { - TNode* volatile prevHead = *headPtr; - n->Next = prevHead; - if (DoCas(headPtr, n, prevHead) == prevHead) - break; - } - } - Y_FORCE_INLINE void* DoAlloc() { - TNode* res; - for (res = Head; res; res = Head) { - TNode* keepNext = res->Next; - if (DoCas(&Head, keepNext, res) == res) { - //Y_VERIFY(keepNext == res->Next); - break; - } - } - return res; - } - void FreeList(TNode* fl) { - if (!fl) - return; - TNode* flTail = fl; - while (flTail->Next) - flTail = flTail->Next; - for (;;) { - TNode* volatile prevHead = Head; - flTail->Next = prevHead; - if (DoCas(&Head, fl, prevHead) == prevHead) - break; - } - } - -public: - Y_FORCE_INLINE void Free(void* ptr) { - TNode* newFree = (TNode*)ptr; - if (AtomicAdd(AllocCount, 0) == 0) - Enqueue(&Head, newFree); - else - Enqueue(&Pending, newFree); - } - Y_FORCE_INLINE void* Alloc() { - TAtomic keepCounter = AtomicAdd(PendingToFreeListCounter, 0); - TNode* fl = Pending; - if (AtomicAdd(AllocCount, 1) == 1) { - // No other allocs in progress. - // If (keepCounter == PendingToFreeListCounter) then Pending was not freed by other threads. 
- // Hence Pending is not used in any concurrent DoAlloc() atm and can be safely moved to FreeList - if (fl && keepCounter == AtomicAdd(PendingToFreeListCounter, 0) && DoCas(&Pending, (TNode*)nullptr, fl) == fl) { - // pick first element from Pending and return it - void* res = fl; - fl = fl->Next; - // if there are other elements in Pending list, add them to main free list - FreeList(fl); - AtomicAdd(PendingToFreeListCounter, 1); - AtomicAdd(AllocCount, -1); - return res; - } - } - void* res = DoAlloc(); - AtomicAdd(AllocCount, -1); - return res; - } - void* GetWholeList() { - TNode* res; - for (res = Head; res; res = Head) { - if (DoCas(&Head, (TNode*)nullptr, res) == res) - break; - } - return res; - } - void ReturnWholeList(void* ptr) { - while (AtomicAdd(AllocCount, 0) != 0) // theoretically can run into problems with parallel DoAlloc() - ; //ThreadYield(); - for (;;) { - TNode* prevHead = Head; - if (DoCas(&Head, (TNode*)ptr, prevHead) == prevHead) { - FreeList(prevHead); - break; - } - } - } -}; - -///////////////////////////////////////////////////////////////////////// -static TLFAllocFreeList globalFreeLists[N_SIZES]; -static char* volatile globalCurrentPtr[N_SIZES]; -static TLFAllocFreeList blockFreeList; - -// globalFreeLists[] contains TFreeListGroup, each of them points up to 15 free blocks -const int FL_GROUP_SIZE = 15; -struct TFreeListGroup { - TFreeListGroup* Next; - char* Ptrs[FL_GROUP_SIZE]; -}; -#ifdef _64_ -const int FREE_LIST_GROUP_SIZEIDX = 8; -#else -const int FREE_LIST_GROUP_SIZEIDX = 6; -#endif - -////////////////////////////////////////////////////////////////////////// -// find free chunks and reset chunk size so they can be reused by different sized allocations -// do not look at blockFreeList (TFreeListGroup has same size for any allocations) -static bool DefragmentMem() { - if (!EnableDefrag) { - return false; - } - - IncrementCounter(CT_DEGRAGMENT_CNT, 1); - - int* nFreeCount = (int*)SystemAlloc(N_CHUNKS * sizeof(int)); - if (Y_UNLIKELY(!nFreeCount)) { - //__debugbreak(); - NMalloc::AbortFromCorruptedAllocator(); - } - memset(nFreeCount, 0, N_CHUNKS * sizeof(int)); - - TFreeListGroup* wholeLists[N_SIZES]; - for (int nSizeIdx = 0; nSizeIdx < N_SIZES; ++nSizeIdx) { - wholeLists[nSizeIdx] = (TFreeListGroup*)globalFreeLists[nSizeIdx].GetWholeList(); - for (TFreeListGroup* g = wholeLists[nSizeIdx]; g; g = g->Next) { - for (auto pData : g->Ptrs) { - if (pData) { - uintptr_t nChunk = (pData - ALLOC_START) / N_CHUNK_SIZE; - ++nFreeCount[nChunk]; - Y_ASSERT_NOBT(chunkSizeIdx[nChunk] == nSizeIdx); - } - } - } - } - - bool bRes = false; - for (size_t nChunk = 0; nChunk < N_CHUNKS; ++nChunk) { - int fc = nFreeCount[nChunk]; - nFreeCount[nChunk] = 0; - if (chunkSizeIdx[nChunk] <= 0) - continue; - int nEntries = N_CHUNK_SIZE / nSizeIdxToSize[static_cast(chunkSizeIdx[nChunk])]; - Y_ASSERT_NOBT(fc <= nEntries); // can not have more free blocks then total count - if (fc == nEntries) { - bRes = true; - nFreeCount[nChunk] = 1; - } - } - if (bRes) { - for (auto& wholeList : wholeLists) { - TFreeListGroup** ppPtr = &wholeList; - while (*ppPtr) { - TFreeListGroup* g = *ppPtr; - int dst = 0; - for (auto pData : g->Ptrs) { - if (pData) { - uintptr_t nChunk = (pData - ALLOC_START) / N_CHUNK_SIZE; - if (nFreeCount[nChunk] == 0) - g->Ptrs[dst++] = pData; // block is not freed, keep pointer - } - } - if (dst == 0) { - // no valid pointers in group, free it - *ppPtr = g->Next; - blockFreeList.Free(g); - } else { - // reset invalid pointers to 0 - for (int i = dst; i < FL_GROUP_SIZE; 
++i) - g->Ptrs[i] = nullptr; - ppPtr = &g->Next; - } - } - } - for (uintptr_t nChunk = 0; nChunk < N_CHUNKS; ++nChunk) { - if (!nFreeCount[nChunk]) - continue; - char* pStart = ALLOC_START + nChunk * N_CHUNK_SIZE; -#ifdef _win_ - VirtualFree(pStart, N_CHUNK_SIZE, MEM_DECOMMIT); -#elif defined(_freebsd_) - madvise(pStart, N_CHUNK_SIZE, MADV_FREE); -#else - madvise(pStart, N_CHUNK_SIZE, MADV_DONTNEED); -#endif - AddFreeChunk(nChunk); - } - } - - for (int nSizeIdx = 0; nSizeIdx < N_SIZES; ++nSizeIdx) - globalFreeLists[nSizeIdx].ReturnWholeList(wholeLists[nSizeIdx]); - - SystemFree(nFreeCount); - return bRes; -} - -static Y_FORCE_INLINE void* LFAllocFromCurrentChunk(int nSizeIdx, int blockSize, int count) { - char* volatile* pFreeArray = &globalCurrentPtr[nSizeIdx]; - while (char* newBlock = *pFreeArray) { - char* nextFree = newBlock + blockSize * count; - - // check if there is space in chunk - char* globalEndPtr = ALLOC_START + ((newBlock - ALLOC_START) & ~((uintptr_t)N_CHUNK_SIZE - 1)) + N_CHUNK_SIZE; - if (nextFree >= globalEndPtr) { - if (nextFree > globalEndPtr) - break; - nextFree = nullptr; // it was last block in chunk - } - if (DoCas(pFreeArray, nextFree, newBlock) == newBlock) - return newBlock; - } - return nullptr; -} - -enum EDefrag { - MEM_DEFRAG, - NO_MEM_DEFRAG, -}; - -static void* SlowLFAlloc(int nSizeIdx, int blockSize, EDefrag defrag) { - IncrementCounter(CT_SLOW_ALLOC_CNT, 1); - - CCriticalSectionLockMMgr ls; - void* res = LFAllocFromCurrentChunk(nSizeIdx, blockSize, 1); - if (res) - return res; // might happen when other thread allocated new current chunk - - for (;;) { - uintptr_t nChunk; - if (GetFreeChunk(&nChunk)) { - char* newPlace = ALLOC_START + nChunk * N_CHUNK_SIZE; -#ifdef _MSC_VER - void* pTest = VirtualAlloc(newPlace, N_CHUNK_SIZE, MEM_COMMIT, PAGE_READWRITE); - Y_ASSERT_NOBT(pTest == newPlace); -#endif - chunkSizeIdx[nChunk] = (char)nSizeIdx; - globalCurrentPtr[nSizeIdx] = newPlace + blockSize; - return newPlace; - } - - // out of luck, try to defrag - if (defrag == MEM_DEFRAG && DefragmentMem()) { - continue; - } - - char* largeBlock = AllocWithMMap(N_LARGE_ALLOC_SIZE, MM_NORMAL); - uintptr_t addr = ((largeBlock - ALLOC_START) + N_CHUNK_SIZE - 1) & (~(N_CHUNK_SIZE - 1)); - uintptr_t endAddr = ((largeBlock - ALLOC_START) + N_LARGE_ALLOC_SIZE) & (~(N_CHUNK_SIZE - 1)); - for (uintptr_t p = addr; p < endAddr; p += N_CHUNK_SIZE) { - uintptr_t chunk = p / N_CHUNK_SIZE; - Y_ASSERT_NOBT(chunk * N_CHUNK_SIZE == p); - Y_ASSERT_NOBT(chunkSizeIdx[chunk] == 0); - AddFreeChunk(chunk); - } - } - return nullptr; -} - -// allocate single block -static Y_FORCE_INLINE void* LFAllocNoCache(int nSizeIdx, EDefrag defrag) { - int blockSize = nSizeIdxToSize[nSizeIdx]; - void* res = LFAllocFromCurrentChunk(nSizeIdx, blockSize, 1); - if (res) - return res; - - return SlowLFAlloc(nSizeIdx, blockSize, defrag); -} - -// allocate multiple blocks, returns number of blocks allocated (max FL_GROUP_SIZE) -// buf should have space for at least FL_GROUP_SIZE elems -static Y_FORCE_INLINE int LFAllocNoCacheMultiple(int nSizeIdx, char** buf) { - int blockSize = nSizeIdxToSize[nSizeIdx]; - void* res = LFAllocFromCurrentChunk(nSizeIdx, blockSize, FL_GROUP_SIZE); - if (res) { - char* resPtr = (char*)res; - for (int k = 0; k < FL_GROUP_SIZE; ++k) { - buf[k] = resPtr; - resPtr += blockSize; - } - return FL_GROUP_SIZE; - } - buf[0] = (char*)SlowLFAlloc(nSizeIdx, blockSize, MEM_DEFRAG); - return 1; -} - -// take several blocks from global free list (max FL_GROUP_SIZE blocks), returns number of blocks 
taken -// buf should have space for at least FL_GROUP_SIZE elems -static Y_FORCE_INLINE int TakeBlocksFromGlobalFreeList(int nSizeIdx, char** buf) { - TLFAllocFreeList& fl = globalFreeLists[nSizeIdx]; - TFreeListGroup* g = (TFreeListGroup*)fl.Alloc(); - if (g) { - int resCount = 0; - for (auto& ptr : g->Ptrs) { - if (ptr) - buf[resCount++] = ptr; - else - break; - } - blockFreeList.Free(g); - return resCount; - } - return 0; -} - -// add several blocks to global free list -static Y_FORCE_INLINE void PutBlocksToGlobalFreeList(ptrdiff_t nSizeIdx, char** buf, int count) { - for (int startIdx = 0; startIdx < count;) { - TFreeListGroup* g = (TFreeListGroup*)blockFreeList.Alloc(); - Y_ASSERT_NOBT(sizeof(TFreeListGroup) == nSizeIdxToSize[FREE_LIST_GROUP_SIZEIDX]); - if (!g) { - g = (TFreeListGroup*)LFAllocNoCache(FREE_LIST_GROUP_SIZEIDX, NO_MEM_DEFRAG); - } - - int groupSize = count - startIdx; - if (groupSize > FL_GROUP_SIZE) - groupSize = FL_GROUP_SIZE; - for (int i = 0; i < groupSize; ++i) - g->Ptrs[i] = buf[startIdx + i]; - for (int i = groupSize; i < FL_GROUP_SIZE; ++i) - g->Ptrs[i] = nullptr; - - // add free group to the global list - TLFAllocFreeList& fl = globalFreeLists[nSizeIdx]; - fl.Free(g); - - startIdx += groupSize; - } -} - -////////////////////////////////////////////////////////////////////////// -static TAtomic GlobalCounters[CT_MAX]; -const int MAX_LOCAL_UPDATES = 100; - -struct TLocalCounter { - intptr_t Value; - int Updates; - TAtomic* Parent; - - Y_FORCE_INLINE void Init(TAtomic* parent) { - Parent = parent; - Value = 0; - Updates = 0; - } - - Y_FORCE_INLINE void Increment(size_t value) { - Value += value; - if (++Updates > MAX_LOCAL_UPDATES) { - Flush(); - } - } - - Y_FORCE_INLINE void Flush() { - AtomicAdd(*Parent, Value); - Value = 0; - Updates = 0; - } -}; - -//////////////////////////////////////////////////////////////////////////////// -// DBG stuff -//////////////////////////////////////////////////////////////////////////////// - -#if defined(LFALLOC_DBG) - -struct TPerTagAllocCounter { - TAtomic Size; - TAtomic Count; - - Y_FORCE_INLINE void Alloc(size_t size) { - AtomicAdd(Size, size); - AtomicAdd(Count, 1); - } - - Y_FORCE_INLINE void Free(size_t size) { - AtomicSub(Size, size); - AtomicSub(Count, 1); - } -}; - -struct TLocalPerTagAllocCounter { - intptr_t Size; - int Count; - int Updates; - - Y_FORCE_INLINE void Init() { - Size = 0; - Count = 0; - Updates = 0; - } - - Y_FORCE_INLINE void Alloc(TPerTagAllocCounter& parent, size_t size) { - Size += size; - ++Count; - if (++Updates > MAX_LOCAL_UPDATES) { - Flush(parent); - } - } - - Y_FORCE_INLINE void Free(TPerTagAllocCounter& parent, size_t size) { - Size -= size; - --Count; - if (++Updates > MAX_LOCAL_UPDATES) { - Flush(parent); - } - } - - Y_FORCE_INLINE void Flush(TPerTagAllocCounter& parent) { - AtomicAdd(parent.Size, Size); - Size = 0; - AtomicAdd(parent.Count, Count); - Count = 0; - Updates = 0; - } -}; - -static const int DBG_ALLOC_MAX_TAG = 1000; -static const int DBG_ALLOC_NUM_SIZES = 30; -static TPerTagAllocCounter GlobalPerTagAllocCounters[DBG_ALLOC_MAX_TAG][DBG_ALLOC_NUM_SIZES]; - -#endif // LFALLOC_DBG - -////////////////////////////////////////////////////////////////////////// -const int THREAD_BUF = 256; -static int borderSizes[N_SIZES]; -const int MAX_MEM_PER_SIZE_PER_THREAD = 512 * 1024; -struct TThreadAllocInfo { - // FreePtrs - pointers to first free blocks in per thread block list - // LastFreePtrs - pointers to last blocks in lists, may be invalid if FreePtr is zero - char* 
FreePtrs[N_SIZES][THREAD_BUF]; - int FreePtrIndex[N_SIZES]; - TThreadAllocInfo* pNextInfo; - TLocalCounter LocalCounters[CT_MAX]; - -#if defined(LFALLOC_DBG) - TLocalPerTagAllocCounter LocalPerTagAllocCounters[DBG_ALLOC_MAX_TAG][DBG_ALLOC_NUM_SIZES]; -#endif -#ifdef _win_ - HANDLE hThread; -#endif - - void Init(TThreadAllocInfo** pHead) { - memset(this, 0, sizeof(*this)); - for (auto& i : FreePtrIndex) - i = THREAD_BUF; -#ifdef _win_ - BOOL b = DuplicateHandle( - GetCurrentProcess(), GetCurrentThread(), - GetCurrentProcess(), &hThread, - 0, FALSE, DUPLICATE_SAME_ACCESS); - Y_ASSERT_NOBT(b); -#endif - pNextInfo = *pHead; - *pHead = this; - for (int k = 0; k < N_SIZES; ++k) { - int maxCount = MAX_MEM_PER_SIZE_PER_THREAD / nSizeIdxToSize[k]; - if (maxCount > THREAD_BUF) - maxCount = THREAD_BUF; - borderSizes[k] = THREAD_BUF - maxCount; - } - for (int i = 0; i < CT_MAX; ++i) { - LocalCounters[i].Init(&GlobalCounters[i]); - } -#if defined(LFALLOC_DBG) - for (int tag = 0; tag < DBG_ALLOC_MAX_TAG; ++tag) { - for (int sizeIdx = 0; sizeIdx < DBG_ALLOC_NUM_SIZES; ++sizeIdx) { - auto& local = LocalPerTagAllocCounters[tag][sizeIdx]; - local.Init(); - } - } -#endif - } - void Done() { - for (auto sizeIdx : FreePtrIndex) { - Y_ASSERT_NOBT(sizeIdx == THREAD_BUF); - } - for (auto& localCounter : LocalCounters) { - localCounter.Flush(); - } -#if defined(LFALLOC_DBG) - for (int tag = 0; tag < DBG_ALLOC_MAX_TAG; ++tag) { - for (int sizeIdx = 0; sizeIdx < DBG_ALLOC_NUM_SIZES; ++sizeIdx) { - auto& local = LocalPerTagAllocCounters[tag][sizeIdx]; - auto& global = GlobalPerTagAllocCounters[tag][sizeIdx]; - local.Flush(global); - } - } -#endif -#ifdef _win_ - if (hThread) - CloseHandle(hThread); -#endif - } -}; -PERTHREAD TThreadAllocInfo* pThreadInfo; -static TThreadAllocInfo* pThreadInfoList; - -static int* volatile nLockThreadInfo = nullptr; -class TLockThreadListMMgr { -public: - TLockThreadListMMgr() { - RealEnterCritical(&nLockThreadInfo); - } - ~TLockThreadListMMgr() { - RealLeaveCritical(&nLockThreadInfo); - } -}; - -static Y_FORCE_INLINE void IncrementCounter(ELFAllocCounter counter, size_t value) { -#ifdef LFALLOC_YT - TThreadAllocInfo* thr = pThreadInfo; - if (thr) { - thr->LocalCounters[counter].Increment(value); - } else { - AtomicAdd(GlobalCounters[counter], value); - } -#endif -} - -extern "C" i64 GetLFAllocCounterFast(int counter) { -#ifdef LFALLOC_YT - return GlobalCounters[counter]; -#else - return 0; -#endif -} - -extern "C" i64 GetLFAllocCounterFull(int counter) { -#ifdef LFALLOC_YT - i64 ret = GlobalCounters[counter]; - { - TLockThreadListMMgr ll; - for (TThreadAllocInfo** p = &pThreadInfoList; *p;) { - TThreadAllocInfo* pInfo = *p; - ret += pInfo->LocalCounters[counter].Value; - p = &pInfo->pNextInfo; - } - } - return ret; -#else - return 0; -#endif -} - -static void MoveSingleThreadFreeToGlobal(TThreadAllocInfo* pInfo) { - for (int sizeIdx = 0; sizeIdx < N_SIZES; ++sizeIdx) { - int& freePtrIdx = pInfo->FreePtrIndex[sizeIdx]; - char** freePtrs = pInfo->FreePtrs[sizeIdx]; - PutBlocksToGlobalFreeList(sizeIdx, freePtrs + freePtrIdx, THREAD_BUF - freePtrIdx); - freePtrIdx = THREAD_BUF; - } -} - -#ifdef _win_ -static bool IsDeadThread(TThreadAllocInfo* pInfo) { - DWORD dwExit; - bool isDead = !GetExitCodeThread(pInfo->hThread, &dwExit) || dwExit != STILL_ACTIVE; - return isDead; -} - -static void CleanupAfterDeadThreads() { - TLockThreadListMMgr ls; - for (TThreadAllocInfo** p = &pThreadInfoList; *p;) { - TThreadAllocInfo* pInfo = *p; - if (IsDeadThread(pInfo)) { - 
MoveSingleThreadFreeToGlobal(pInfo); - pInfo->Done(); - *p = pInfo->pNextInfo; - SystemFree(pInfo); - } else - p = &pInfo->pNextInfo; - } -} -#endif - -#ifndef _win_ -static pthread_key_t ThreadCacheCleaner; -static void* volatile ThreadCacheCleanerStarted; // 0 = not started, -1 = started, -2 = is starting -static PERTHREAD bool IsStoppingThread; - -static void FreeThreadCache(void*) { - TThreadAllocInfo* pToDelete = nullptr; - { - TLockThreadListMMgr ls; - pToDelete = pThreadInfo; - if (pToDelete == nullptr) - return; - - // remove from the list - for (TThreadAllocInfo** p = &pThreadInfoList; *p; p = &(*p)->pNextInfo) { - if (*p == pToDelete) { - *p = pToDelete->pNextInfo; - break; - } - } - IsStoppingThread = true; - pThreadInfo = nullptr; - } - - // free per thread buf - MoveSingleThreadFreeToGlobal(pToDelete); - pToDelete->Done(); - SystemFree(pToDelete); -} -#endif - -static void AllocThreadInfo() { -#ifndef _win_ - if (DoCas(&ThreadCacheCleanerStarted, (void*)-2, (void*)nullptr) == (void*)nullptr) { - pthread_key_create(&ThreadCacheCleaner, FreeThreadCache); - ThreadCacheCleanerStarted = (void*)-1; - } - if (ThreadCacheCleanerStarted != (void*)-1) - return; // do not use ThreadCacheCleaner until it is constructed - - { - if (IsStoppingThread) - return; - TLockThreadListMMgr ls; - if (IsStoppingThread) // better safe than sorry - return; - - pThreadInfo = (TThreadAllocInfo*)SystemAlloc(sizeof(TThreadAllocInfo)); - pThreadInfo->Init(&pThreadInfoList); - } - pthread_setspecific(ThreadCacheCleaner, (void*)-1); // without value destructor will not be called -#else - CleanupAfterDeadThreads(); - { - TLockThreadListMMgr ls; - pThreadInfo = (TThreadAllocInfo*)SystemAlloc(sizeof(TThreadAllocInfo)); - pThreadInfo->Init(&pThreadInfoList); - } -#endif -} - - ////////////////////////////////////////////////////////////////////////// - // DBG stuff - ////////////////////////////////////////////////////////////////////////// - -#if defined(LFALLOC_DBG) - -struct TAllocHeader { - size_t Size; - int Tag; - int Cookie; -}; - -static inline void* GetAllocPtr(TAllocHeader* p) { - return p + 1; -} - -static inline TAllocHeader* GetAllocHeader(void* p) { - return ((TAllocHeader*)p) - 1; -} - -PERTHREAD int AllocationTag; -extern "C" int SetThreadAllocTag(int tag) { - int prevTag = AllocationTag; - if (tag < DBG_ALLOC_MAX_TAG && tag >= 0) { - AllocationTag = tag; - } - return prevTag; -} - -PERTHREAD bool ProfileCurrentThread; -extern "C" bool SetProfileCurrentThread(bool newVal) { - bool prevVal = ProfileCurrentThread; - ProfileCurrentThread = newVal; - return prevVal; -} - -static volatile bool ProfileAllThreads; -extern "C" bool SetProfileAllThreads(bool newVal) { - bool prevVal = ProfileAllThreads; - ProfileAllThreads = newVal; - return prevVal; -} - -static volatile bool AllocationSamplingEnabled; -extern "C" bool SetAllocationSamplingEnabled(bool newVal) { - bool prevVal = AllocationSamplingEnabled; - AllocationSamplingEnabled = newVal; - return prevVal; -} - -static size_t AllocationSampleRate = 1000; -extern "C" size_t SetAllocationSampleRate(size_t newVal) { - size_t prevVal = AllocationSampleRate; - AllocationSampleRate = newVal; - return prevVal; -} - -static size_t AllocationSampleMaxSize = N_MAX_FAST_SIZE; -extern "C" size_t SetAllocationSampleMaxSize(size_t newVal) { - size_t prevVal = AllocationSampleMaxSize; - AllocationSampleMaxSize = newVal; - return prevVal; -} - -using TAllocationCallback = int(int tag, size_t size, int sizeIdx); -static TAllocationCallback* AllocationCallback; -extern 
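`AllocThreadInfo` above leans on POSIX TLS destructors: `pthread_key_create` registers a callback that runs at thread exit, and it fires only for threads that stored a non-null value for the key, which is why a dummy `(void*)-1` is set. A stripped-down sketch of the pattern (using `pthread_once` where the original rolls its own CAS-based one-time init):

```cpp
#include <pthread.h>
#include <cstdio>

static pthread_key_t Key;
static pthread_once_t Once = PTHREAD_ONCE_INIT;

static void OnThreadExit(void* /*value*/) {
    // Runs at thread exit for every thread that set a non-null value;
    // this is where a per-thread cache would be flushed to global lists.
    std::printf("flushing thread cache\n");
}

static void InitKey() {
    pthread_key_create(&Key, OnThreadExit);
}

void TouchThreadCache() {
    pthread_once(&Once, InitKey);
    // A non-null value is required, otherwise the destructor is not invoked.
    pthread_setspecific(Key, (void*)1);
}
```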
"C" TAllocationCallback* SetAllocationCallback(TAllocationCallback* newVal) { - TAllocationCallback* prevVal = AllocationCallback; - AllocationCallback = newVal; - return prevVal; -} - -using TDeallocationCallback = void(int cookie, int tag, size_t size, int sizeIdx); -static TDeallocationCallback* DeallocationCallback; -extern "C" TDeallocationCallback* SetDeallocationCallback(TDeallocationCallback* newVal) { - TDeallocationCallback* prevVal = DeallocationCallback; - DeallocationCallback = newVal; - return prevVal; -} - -PERTHREAD TAtomic AllocationsCount; -PERTHREAD bool InAllocationCallback; - -static const int DBG_ALLOC_INVALID_COOKIE = -1; -static inline int SampleAllocation(TAllocHeader* p, int sizeIdx) { - int cookie = DBG_ALLOC_INVALID_COOKIE; - if (AllocationSamplingEnabled && (ProfileCurrentThread || ProfileAllThreads) && !InAllocationCallback) { - if (p->Size > AllocationSampleMaxSize || ++AllocationsCount % AllocationSampleRate == 0) { - if (AllocationCallback) { - InAllocationCallback = true; - cookie = AllocationCallback(p->Tag, p->Size, sizeIdx); - InAllocationCallback = false; - } - } - } - return cookie; -} - -static inline void SampleDeallocation(TAllocHeader* p, int sizeIdx) { - if (p->Cookie != DBG_ALLOC_INVALID_COOKIE && !InAllocationCallback) { - if (DeallocationCallback) { - InAllocationCallback = true; - DeallocationCallback(p->Cookie, p->Tag, p->Size, sizeIdx); - InAllocationCallback = false; - } - } -} - -static inline void TrackPerTagAllocation(TAllocHeader* p, int sizeIdx) { - if (p->Tag < DBG_ALLOC_MAX_TAG && p->Tag >= 0) { - Y_ASSERT_NOBT(sizeIdx < DBG_ALLOC_NUM_SIZES); - auto& global = GlobalPerTagAllocCounters[p->Tag][sizeIdx]; - - TThreadAllocInfo* thr = pThreadInfo; - if (thr) { - auto& local = thr->LocalPerTagAllocCounters[p->Tag][sizeIdx]; - local.Alloc(global, p->Size); - } else { - global.Alloc(p->Size); - } - } -} - -static inline void TrackPerTagDeallocation(TAllocHeader* p, int sizeIdx) { - if (p->Tag < DBG_ALLOC_MAX_TAG && p->Tag >= 0) { - Y_ASSERT_NOBT(sizeIdx < DBG_ALLOC_NUM_SIZES); - auto& global = GlobalPerTagAllocCounters[p->Tag][sizeIdx]; - - TThreadAllocInfo* thr = pThreadInfo; - if (thr) { - auto& local = thr->LocalPerTagAllocCounters[p->Tag][sizeIdx]; - local.Free(global, p->Size); - } else { - global.Free(p->Size); - } - } -} - -static void* TrackAllocation(void* ptr, size_t size, int sizeIdx) { - TAllocHeader* p = (TAllocHeader*)ptr; - p->Size = size; - p->Tag = AllocationTag; - p->Cookie = SampleAllocation(p, sizeIdx); - TrackPerTagAllocation(p, sizeIdx); - return GetAllocPtr(p); -} - -static void TrackDeallocation(void* ptr, int sizeIdx) { - TAllocHeader* p = (TAllocHeader*)ptr; - SampleDeallocation(p, sizeIdx); - TrackPerTagDeallocation(p, sizeIdx); -} - -struct TPerTagAllocInfo { - ssize_t Count; - ssize_t Size; -}; - -extern "C" void GetPerTagAllocInfo( - bool flushPerThreadCounters, - TPerTagAllocInfo* info, - int& maxTag, - int& numSizes) { - maxTag = DBG_ALLOC_MAX_TAG; - numSizes = DBG_ALLOC_NUM_SIZES; - - if (info) { - if (flushPerThreadCounters) { - TLockThreadListMMgr ll; - for (TThreadAllocInfo** p = &pThreadInfoList; *p;) { - TThreadAllocInfo* pInfo = *p; - for (int tag = 0; tag < DBG_ALLOC_MAX_TAG; ++tag) { - for (int sizeIdx = 0; sizeIdx < DBG_ALLOC_NUM_SIZES; ++sizeIdx) { - auto& local = pInfo->LocalPerTagAllocCounters[tag][sizeIdx]; - auto& global = GlobalPerTagAllocCounters[tag][sizeIdx]; - local.Flush(global); - } - } - p = &pInfo->pNextInfo; - } - } - - for (int tag = 0; tag < DBG_ALLOC_MAX_TAG; ++tag) { - for (int 
sizeIdx = 0; sizeIdx < DBG_ALLOC_NUM_SIZES; ++sizeIdx) { - auto& global = GlobalPerTagAllocCounters[tag][sizeIdx]; - auto& res = info[tag * DBG_ALLOC_NUM_SIZES + sizeIdx]; - res.Count = global.Count; - res.Size = global.Size; - } - } - } -} - -#endif // LFALLOC_DBG - -////////////////////////////////////////////////////////////////////////// -static Y_FORCE_INLINE void* LFAllocImpl(size_t _nSize) { -#if defined(LFALLOC_DBG) - size_t size = _nSize; - _nSize += sizeof(TAllocHeader); -#endif - - IncrementCounter(CT_USER_ALLOC, _nSize); - - int nSizeIdx; - if (_nSize > 512) { - if (_nSize > N_MAX_FAST_SIZE) { - void* ptr = LargeBlockAlloc(_nSize, CT_LARGE_ALLOC); -#if defined(LFALLOC_DBG) - ptr = TrackAllocation(ptr, size, N_SIZES); -#endif - return ptr; - } - nSizeIdx = size2idxArr2[(_nSize - 1) >> 8]; - } else - nSizeIdx = size2idxArr1[1 + (((int)_nSize - 1) >> 3)]; - - IncrementCounter(CT_SMALL_ALLOC, nSizeIdxToSize[nSizeIdx]); - - // check per thread buffer - TThreadAllocInfo* thr = pThreadInfo; - if (!thr) { - AllocThreadInfo(); - thr = pThreadInfo; - if (!thr) { - void* ptr = LFAllocNoCache(nSizeIdx, MEM_DEFRAG); -#if defined(LFALLOC_DBG) - ptr = TrackAllocation(ptr, size, nSizeIdx); -#endif - return ptr; - } - } - { - int& freePtrIdx = thr->FreePtrIndex[nSizeIdx]; - if (freePtrIdx < THREAD_BUF) { - void* ptr = thr->FreePtrs[nSizeIdx][freePtrIdx++]; -#if defined(LFALLOC_DBG) - ptr = TrackAllocation(ptr, size, nSizeIdx); -#endif - return ptr; - } - - // try to alloc from global free list - char* buf[FL_GROUP_SIZE]; - int count = TakeBlocksFromGlobalFreeList(nSizeIdx, buf); - if (count == 0) { - count = LFAllocNoCacheMultiple(nSizeIdx, buf); - if (count == 0) { - NMalloc::AbortFromCorruptedAllocator(); // no way LFAllocNoCacheMultiple() can fail - } - } - char** dstBuf = thr->FreePtrs[nSizeIdx] + freePtrIdx - 1; - for (int i = 0; i < count - 1; ++i) - dstBuf[-i] = buf[i]; - freePtrIdx -= count - 1; - void* ptr = buf[count - 1]; -#if defined(LFALLOC_DBG) - ptr = TrackAllocation(ptr, size, nSizeIdx); -#endif - return ptr; - } -} - -static Y_FORCE_INLINE void* LFAlloc(size_t _nSize) { - void* res = LFAllocImpl(_nSize); -#ifdef DBG_FILL_MEMORY - if (FillMemoryOnAllocation && res && (_nSize <= DBG_FILL_MAX_SIZE)) { - memset(res, 0xcf, _nSize); - } -#endif - return res; -} - -static Y_FORCE_INLINE void LFFree(void* p) { -#if defined(LFALLOC_DBG) - if (p == nullptr) - return; - p = GetAllocHeader(p); -#endif - - uintptr_t chkOffset = ((char*)p - ALLOC_START) - 1ll; - if (chkOffset >= N_MAX_WORKSET_SIZE) { - if (p == nullptr) - return; -#if defined(LFALLOC_DBG) - TrackDeallocation(p, N_SIZES); -#endif - LargeBlockFree(p, CT_LARGE_FREE); - return; - } - - uintptr_t chunk = ((char*)p - ALLOC_START) / N_CHUNK_SIZE; - ptrdiff_t nSizeIdx = chunkSizeIdx[chunk]; - if (nSizeIdx <= 0) { -#if defined(LFALLOC_DBG) - TrackDeallocation(p, N_SIZES); -#endif - LargeBlockFree(p, CT_LARGE_FREE); - return; - } - -#if defined(LFALLOC_DBG) - TrackDeallocation(p, nSizeIdx); -#endif - -#ifdef DBG_FILL_MEMORY - memset(p, 0xfe, nSizeIdxToSize[nSizeIdx]); -#endif - - IncrementCounter(CT_SMALL_FREE, nSizeIdxToSize[nSizeIdx]); - - // try to store info to per thread buf - TThreadAllocInfo* thr = pThreadInfo; - if (thr) { - int& freePtrIdx = thr->FreePtrIndex[nSizeIdx]; - if (freePtrIdx > borderSizes[nSizeIdx]) { - thr->FreePtrs[nSizeIdx][--freePtrIdx] = (char*)p; - return; - } - - // move several pointers to global free list - int freeCount = FL_GROUP_SIZE; - if (freeCount > THREAD_BUF - freePtrIdx) - freeCount = THREAD_BUF 
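The size-class dispatch in `LFAllocImpl` above is two table lookups: sizes up to 512 bytes resolve through `size2idxArr1` at 8-byte granularity, larger ones through `size2idxArr2` at 256-byte granularity. A sketch of how such a table can be built from an ascending list of size classes (hypothetical sizes, not the allocator's real ones):

```cpp
#include <cstddef>

// Hypothetical ascending size classes.
constexpr size_t Sizes[] = {8, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384, 512};
constexpr int NSizes = sizeof(Sizes) / sizeof(Sizes[0]);

int Size2Idx[512 / 8 + 1]; // one entry per 8-byte step; index 0 unused

void BuildTable() {
    int idx = 0;
    for (int step = 1; step <= 512 / 8; ++step) {
        size_t size = size_t(step) * 8;
        while (idx + 1 < NSizes && Sizes[idx] < size)
            ++idx; // smallest class that fits every size in this 8-byte bucket
        Size2Idx[step] = idx;
    }
}

int LookupIdx(size_t size) { // valid for 1..512
    return Size2Idx[(size + 7) / 8];
}
```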
- freePtrIdx; - char** freePtrs = thr->FreePtrs[nSizeIdx]; - PutBlocksToGlobalFreeList(nSizeIdx, freePtrs + freePtrIdx, freeCount); - freePtrIdx += freeCount; - - freePtrs[--freePtrIdx] = (char*)p; - - } else { - AllocThreadInfo(); - PutBlocksToGlobalFreeList(nSizeIdx, (char**)&p, 1); - } -} - -static size_t LFGetSize(const void* p) { -#if defined(LFALLOC_DBG) - if (p == nullptr) - return 0; - return GetAllocHeader(const_cast(p))->Size; -#endif - - uintptr_t chkOffset = ((const char*)p - ALLOC_START); - if (chkOffset >= N_MAX_WORKSET_SIZE) { - if (p == nullptr) - return 0; - return TLargeBlk::As(p)->Pages * 4096ll; - } - uintptr_t chunk = ((const char*)p - ALLOC_START) / N_CHUNK_SIZE; - ptrdiff_t nSizeIdx = chunkSizeIdx[chunk]; - if (nSizeIdx <= 0) - return TLargeBlk::As(p)->Pages * 4096ll; - return nSizeIdxToSize[nSizeIdx]; -} - -//////////////////////////////////////////////////////////////////////////////////////////////////// -// Output mem alloc stats -const int N_PAGE_SIZE = 4096; -static void DebugTraceMMgr(const char* pszFormat, ...) // __cdecl -{ - static char buff[20000]; - va_list va; - // - va_start(va, pszFormat); - vsprintf(buff, pszFormat, va); - va_end(va); -// -#ifdef _win_ - OutputDebugStringA(buff); -#else - fprintf(stderr, buff); -#endif -} - -struct TChunkStats { - char *Start, *Finish; - i64 Size; - char* Entries; - i64 FreeCount; - - TChunkStats(size_t chunk, i64 size, char* entries) - : Size(size) - , Entries(entries) - , FreeCount(0) - { - Start = ALLOC_START + chunk * N_CHUNK_SIZE; - Finish = Start + N_CHUNK_SIZE; - } - void CheckBlock(char* pBlock) { - if (pBlock && pBlock >= Start && pBlock < Finish) { - ++FreeCount; - i64 nShift = pBlock - Start; - i64 nOffsetInStep = nShift & (N_CHUNK_SIZE - 1); - Entries[nOffsetInStep / Size] = 1; - } - } - void SetGlobalFree(char* ptr) { - i64 nShift = ptr - Start; - i64 nOffsetInStep = nShift & (N_CHUNK_SIZE - 1); - while (nOffsetInStep + Size <= N_CHUNK_SIZE) { - ++FreeCount; - Entries[nOffsetInStep / Size] = 1; - nOffsetInStep += Size; - } - } -}; - -static void DumpMemoryBlockUtilizationLocked() { - TFreeListGroup* wholeLists[N_SIZES]; - for (int nSizeIdx = 0; nSizeIdx < N_SIZES; ++nSizeIdx) { - wholeLists[nSizeIdx] = (TFreeListGroup*)globalFreeLists[nSizeIdx].GetWholeList(); - } - char* bfList = (char*)blockFreeList.GetWholeList(); - - DebugTraceMMgr("memory blocks utilisation stats:\n"); - i64 nTotalAllocated = 0, nTotalFree = 0, nTotalBadPages = 0, nTotalPages = 0, nTotalUsed = 0, nTotalLocked = 0; - i64 nTotalGroupBlocks = 0; - char* entries; - entries = (char*)SystemAlloc((N_CHUNK_SIZE / 4)); - for (size_t k = 0; k < N_CHUNKS; ++k) { - if (chunkSizeIdx[k] <= 0) { - if (chunkSizeIdx[k] == -1) - nTotalLocked += N_CHUNK_SIZE; - continue; - } - i64 nSizeIdx = chunkSizeIdx[k]; - i64 nSize = nSizeIdxToSize[nSizeIdx]; - TChunkStats cs(k, nSize, entries); - int nEntriesTotal = N_CHUNK_SIZE / nSize; - memset(entries, 0, nEntriesTotal); - for (TFreeListGroup* g = wholeLists[nSizeIdx]; g; g = g->Next) { - for (auto& ptr : g->Ptrs) - cs.CheckBlock(ptr); - } - TChunkStats csGB(k, nSize, entries); - if (nSizeIdx == FREE_LIST_GROUP_SIZEIDX) { - for (auto g : wholeLists) { - for (; g; g = g->Next) - csGB.CheckBlock((char*)g); - } - for (char* blk = bfList; blk; blk = *(char**)blk) - csGB.CheckBlock(blk); - nTotalGroupBlocks += csGB.FreeCount * nSize; - } - if (((globalCurrentPtr[nSizeIdx] - ALLOC_START) / N_CHUNK_SIZE) == k) - cs.SetGlobalFree(globalCurrentPtr[nSizeIdx]); - nTotalUsed += (nEntriesTotal - cs.FreeCount - 
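`LFFree` and `LFGetSize` above classify a pointer in O(1) from its address alone: one unsigned comparison of `p - ALLOC_START` against the workset size rejects addresses both below and beyond the small-block arena, and the chunk index `offset / N_CHUNK_SIZE` keys a side table of per-chunk size classes. A sketch with illustrative constants:

```cpp
#include <cstdint>
#include <cstddef>

// Illustrative constants; the real values come from the allocator's layout.
constexpr uintptr_t ChunkSize = 1u << 20;          // 1 MiB chunks
constexpr uintptr_t WorksetSize = ChunkSize * 256; // arena size

char* ArenaStart; // base of the small-block arena, set at startup
signed char ChunkSizeIdx[WorksetSize / ChunkSize]; // filled in when chunks are granted

int ClassifyPtr(const void* p) {
    // Unsigned wrap-around makes one compare cover both the "below arena"
    // and the "beyond arena" cases.
    uintptr_t off = (uintptr_t)((const char*)p - ArenaStart);
    if (off >= WorksetSize)
        return -1;                        // large allocation, not from the arena
    return ChunkSizeIdx[off / ChunkSize]; // size class recorded at chunk grant
}
```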
csGB.FreeCount) * nSize; - - char pages[N_CHUNK_SIZE / N_PAGE_SIZE]; - memset(pages, 0, sizeof(pages)); - for (int i = 0, nShift = 0; i < nEntriesTotal; ++i, nShift += nSize) { - int nBit = 0; - if (entries[i]) - nBit = 1; // free entry - else - nBit = 2; // used entry - for (i64 nDelta = nSize - 1; nDelta >= 0; nDelta -= N_PAGE_SIZE) - pages[(nShift + nDelta) / N_PAGE_SIZE] |= nBit; - } - i64 nBadPages = 0; - for (auto page : pages) { - nBadPages += page == 3; - nTotalPages += page != 1; - } - DebugTraceMMgr("entry = %lld; size = %lld; free = %lld; system %lld; utilisation = %g%%, fragmentation = %g%%\n", - k, nSize, cs.FreeCount * nSize, csGB.FreeCount * nSize, - (N_CHUNK_SIZE - cs.FreeCount * nSize) * 100.0f / N_CHUNK_SIZE, 100.0f * nBadPages / Y_ARRAY_SIZE(pages)); - nTotalAllocated += N_CHUNK_SIZE; - nTotalFree += cs.FreeCount * nSize; - nTotalBadPages += nBadPages; - } - SystemFree(entries); - DebugTraceMMgr("Total allocated = %llu, free = %lld, system = %lld, locked for future use %lld, utilisation = %g, fragmentation = %g\n", - nTotalAllocated, nTotalFree, nTotalGroupBlocks, nTotalLocked, - 100.0f * (nTotalAllocated - nTotalFree) / nTotalAllocated, 100.0f * nTotalBadPages / nTotalPages); - DebugTraceMMgr("Total %lld bytes used, %lld bytes in used pages\n", nTotalUsed, nTotalPages * N_PAGE_SIZE); - - for (int nSizeIdx = 0; nSizeIdx < N_SIZES; ++nSizeIdx) - globalFreeLists[nSizeIdx].ReturnWholeList(wholeLists[nSizeIdx]); - blockFreeList.ReturnWholeList(bfList); -} - -void FlushThreadFreeList() { - if (pThreadInfo) - MoveSingleThreadFreeToGlobal(pThreadInfo); -} - -void DumpMemoryBlockUtilization() { - // move current thread free to global lists to get better statistics - FlushThreadFreeList(); - { - CCriticalSectionLockMMgr ls; - DumpMemoryBlockUtilizationLocked(); - } -} - -////////////////////////////////////////////////////////////////////////// -// malloc api - -static bool LFAlloc_SetParam(const char* param, const char* value) { - if (!strcmp(param, "LB_LIMIT_TOTAL_SIZE")) { - LB_LIMIT_TOTAL_SIZE = atoi(value); - return true; - } - if (!strcmp(param, "LB_LIMIT_TOTAL_SIZE_BYTES")) { - LB_LIMIT_TOTAL_SIZE = (atoi(value) + N_PAGE_SIZE - 1) / N_PAGE_SIZE; - return true; - } -#ifdef DBG_FILL_MEMORY - if (!strcmp(param, "FillMemoryOnAllocation")) { - FillMemoryOnAllocation = !strcmp(value, "true"); - return true; - } -#endif - if (!strcmp(param, "BeforeLFAllocGlobalLockAcquired")) { - BeforeLFAllocGlobalLockAcquired = (decltype(BeforeLFAllocGlobalLockAcquired))(value); - return true; - } - if (!strcmp(param, "AfterLFAllocGlobalLockReleased")) { - AfterLFAllocGlobalLockReleased = (decltype(AfterLFAllocGlobalLockReleased))(value); - return true; - } - if (!strcmp(param, "EnterCritical")) { - assert(value); - RealEnterCritical = (decltype(RealEnterCritical))(value); - return true; - } - if (!strcmp(param, "LeaveCritical")) { - assert(value); - RealLeaveCritical = (decltype(RealLeaveCritical))(value); - return true; - } - if (!strcmp(param, "TransparentHugePages")) { - TransparentHugePages = !strcmp(value, "true"); - return true; - } - if (!strcmp(param, "MapHugeTLB")) { - MapHugeTLB = !strcmp(value, "true"); - return true; - } - if (!strcmp(param, "EnableDefrag")) { - EnableDefrag = !strcmp(value, "true"); - return true; - } - return false; -}; - -static const char* LFAlloc_GetParam(const char* param) { - struct TParam { - const char* Name; - const char* Value; - }; - - static const TParam Params[] = { - {"GetLFAllocCounterFast", (const char*)&GetLFAllocCounterFast}, - 
{"GetLFAllocCounterFull", (const char*)&GetLFAllocCounterFull}, -#if defined(LFALLOC_DBG) - {"SetThreadAllocTag", (const char*)&SetThreadAllocTag}, - {"SetProfileCurrentThread", (const char*)&SetProfileCurrentThread}, - {"SetProfileAllThreads", (const char*)&SetProfileAllThreads}, - {"SetAllocationSamplingEnabled", (const char*)&SetAllocationSamplingEnabled}, - {"SetAllocationSampleRate", (const char*)&SetAllocationSampleRate}, - {"SetAllocationSampleMaxSize", (const char*)&SetAllocationSampleMaxSize}, - {"SetAllocationCallback", (const char*)&SetAllocationCallback}, - {"SetDeallocationCallback", (const char*)&SetDeallocationCallback}, - {"GetPerTagAllocInfo", (const char*)&GetPerTagAllocInfo}, -#endif // LFALLOC_DBG - }; - - for (int i = 0; i < Y_ARRAY_SIZE(Params); ++i) { - if (strcmp(param, Params[i].Name) == 0) { - return Params[i].Value; - } - } - return nullptr; -} - -static Y_FORCE_INLINE void* LFVAlloc(size_t size) { - const size_t pg = N_PAGE_SIZE; - size_t bigsize = (size + pg - 1) & (~(pg - 1)); - void* p = LFAlloc(bigsize); - - Y_ASSERT_NOBT((intptr_t)p % N_PAGE_SIZE == 0); - return p; -} - -static Y_FORCE_INLINE int LFPosixMemalign(void** memptr, size_t alignment, size_t size) { - if (Y_UNLIKELY(alignment > 4096)) { -#ifdef _win_ - OutputDebugStringA("Larger alignment are not guaranteed with this implementation\n"); -#else - fprintf(stderr, "Larger alignment are not guaranteed with this implementation\n"); -#endif - NMalloc::AbortFromCorruptedAllocator(); - } - size_t bigsize = size; - if (bigsize <= alignment) { - bigsize = alignment; - } else if (bigsize < 2 * alignment) { - bigsize = 2 * alignment; - } - *memptr = LFAlloc(bigsize); - return 0; -} -#endif diff --git a/contrib/lfalloc/src/lfmalloc.h b/contrib/lfalloc/src/lfmalloc.h deleted file mode 100644 index 1e6a0d55773..00000000000 --- a/contrib/lfalloc/src/lfmalloc.h +++ /dev/null @@ -1,23 +0,0 @@ -#pragma once - -#include -#include -#include "util/system/compiler.h" - -namespace NMalloc { - volatile inline bool IsAllocatorCorrupted = false; - - static inline void AbortFromCorruptedAllocator() { - IsAllocatorCorrupted = true; - abort(); - } - - struct TAllocHeader { - void* Block; - size_t AllocSize; - void Y_FORCE_INLINE Encode(void* block, size_t size, size_t signature) { - Block = block; - AllocSize = size | signature; - } - }; -} diff --git a/contrib/lfalloc/src/util/README.md b/contrib/lfalloc/src/util/README.md deleted file mode 100644 index c367cb4b439..00000000000 --- a/contrib/lfalloc/src/util/README.md +++ /dev/null @@ -1,33 +0,0 @@ -Style guide for the util folder is a stricter version of general style guide (mostly in terms of ambiguity resolution). 
-
- * all {} must be in K&R style
- * `&` and `*` are tied to the type, not to the variable
- * always use `using`, not `typedef`
- * even a single-line block must be wrapped in braces {}:
- ```
- if (A) {
-     B();
- }
- ```
- * `_` at the end of a private data member of a class: `First_`, `Second_`
- * every .h file must be accompanied by a corresponding .cpp, to avoid leakage and to check that the header is self-contained
- * `printf`-like functions are prohibited
-
-
-Rules from the general style guide that are sometimes missed:
-
- * `template <`, not `template<`
- * `noexcept`, not `throw ()` or `throw()`; not required for destructors
- * indentation inside `namespace` is the same as inside `class`
-
-
-Requirements for new code in util (and for corrections of old code that change behaviour):
-
- * unit tests are present
- * comments are in Doxygen style
- * accessors come without the Get prefix (`Length()`, but not `GetLength()`)
-
-Unlike the general style guide, this guide is not mandatory.
-Nevertheless, if it is not followed, the next `ya style .` run in the util folder will undeservedly reassign authorship of some lines of code.
-
-It is therefore recommended to run `ya style .` in the util folder before committing.
diff --git a/contrib/lfalloc/src/util/system/atomic.h b/contrib/lfalloc/src/util/system/atomic.h
deleted file mode 100644
index 9876515a54d..00000000000
--- a/contrib/lfalloc/src/util/system/atomic.h
+++ /dev/null
@@ -1,51 +0,0 @@
-#pragma once
-
-#include "defaults.h"
-
-using TAtomicBase = intptr_t;
-using TAtomic = volatile TAtomicBase;
-
-#if defined(__GNUC__)
-#include "atomic_gcc.h"
-#elif defined(_MSC_VER)
-#include "atomic_win.h"
-#else
-#error unsupported platform
-#endif
-
-#if !defined(ATOMIC_COMPILER_BARRIER)
-#define ATOMIC_COMPILER_BARRIER()
-#endif
-
-static inline TAtomicBase AtomicSub(TAtomic& a, TAtomicBase v) {
-    return AtomicAdd(a, -v);
-}
-
-static inline TAtomicBase AtomicGetAndSub(TAtomic& a, TAtomicBase v) {
-    return AtomicGetAndAdd(a, -v);
-}
-
-#if defined(USE_GENERIC_SETGET)
-static inline TAtomicBase AtomicGet(const TAtomic& a) {
-    return a;
-}
-
-static inline void AtomicSet(TAtomic& a, TAtomicBase v) {
-    a = v;
-}
-#endif
-
-static inline bool AtomicTryLock(TAtomic* a) {
-    return AtomicCas(a, 1, 0);
-}
-
-static inline bool AtomicTryAndTryLock(TAtomic* a) {
-    return (AtomicGet(*a) == 0) && AtomicTryLock(a);
-}
-
-static inline void AtomicUnlock(TAtomic* a) {
-    ATOMIC_COMPILER_BARRIER();
-    AtomicSet(*a, 0);
-}
-
-#include "atomic_ops.h"
diff --git a/contrib/lfalloc/src/util/system/atomic_gcc.h b/contrib/lfalloc/src/util/system/atomic_gcc.h
deleted file mode 100644
index ed8dc2bdc53..00000000000
--- a/contrib/lfalloc/src/util/system/atomic_gcc.h
+++ /dev/null
@@ -1,90 +0,0 @@
-#pragma once
-
-#define ATOMIC_COMPILER_BARRIER() __asm__ __volatile__("" \
-                                                       :  \
-                                                       :  \
-                                                       : "memory")
-
-static inline TAtomicBase AtomicGet(const TAtomic& a) {
-    TAtomicBase tmp;
-#if defined(_arm64_)
-    __asm__ __volatile__(
-        "ldar %x[value], %[ptr]  \n\t"
-        : [value] "=r"(tmp)
-        : [ptr] "Q"(a)
-        : "memory");
-#else
-    __atomic_load(&a, &tmp, __ATOMIC_ACQUIRE);
-#endif
-    return tmp;
-}
-
-static inline void AtomicSet(TAtomic& a, TAtomicBase v) {
-#if defined(_arm64_)
-    __asm__ __volatile__(
-        "stlr %x[value], %[ptr]  \n\t"
-        : [ptr] "=Q"(a)
-        : [value] "r"(v)
-        : "memory");
-#else
-    __atomic_store(&a, &v, __ATOMIC_RELEASE);
-#endif
-}
-
-static inline intptr_t AtomicIncrement(TAtomic& p) {
-    return __atomic_add_fetch(&p, 1, __ATOMIC_SEQ_CST);
-}
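`AtomicTryLock`, `AtomicTryAndTryLock` and `AtomicUnlock` in `atomic.h` above together form a test-and-test-and-set spinlock: the lock word is read first (a cheap load with no bus traffic), and the CAS is attempted only when the lock looks free. A sketch of the same idea with `std::atomic`:

```cpp
#include <atomic>
#include <cstdint>

struct TSpinLock {
    std::atomic<intptr_t> Val{0};

    bool TryLock() {
        intptr_t expected = 0;
        return Val.compare_exchange_strong(expected, 1, std::memory_order_acquire);
    }

    void Lock() {
        // Test-and-test-and-set: spin on the cheap load, CAS only when the
        // lock looks free, mirroring AtomicTryAndTryLock above.
        while (!(Val.load(std::memory_order_relaxed) == 0 && TryLock())) {
        }
    }

    void Unlock() {
        // Release store, like AtomicUnlock's barrier followed by AtomicSet.
        Val.store(0, std::memory_order_release);
    }
};
```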
-
-static inline intptr_t AtomicGetAndIncrement(TAtomic& p) {
-    return __atomic_fetch_add(&p, 1, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicDecrement(TAtomic& p) {
-    return __atomic_sub_fetch(&p, 1, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicGetAndDecrement(TAtomic& p) {
-    return __atomic_fetch_sub(&p, 1, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicAdd(TAtomic& p, intptr_t v) {
-    return __atomic_add_fetch(&p, v, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicGetAndAdd(TAtomic& p, intptr_t v) {
-    return __atomic_fetch_add(&p, v, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicSwap(TAtomic* p, intptr_t v) {
-    (void)p; // disable strange 'parameter set but not used' warning on gcc
-    intptr_t ret;
-    __atomic_exchange(p, &v, &ret, __ATOMIC_SEQ_CST);
-    return ret;
-}
-
-static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    (void)a; // disable strange 'parameter set but not used' warning on gcc
-    return __atomic_compare_exchange(a, &compare, &exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    (void)a; // disable strange 'parameter set but not used' warning on gcc
-    __atomic_compare_exchange(a, &compare, &exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
-    return compare;
-}
-
-static inline intptr_t AtomicOr(TAtomic& a, intptr_t b) {
-    return __atomic_or_fetch(&a, b, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicXor(TAtomic& a, intptr_t b) {
-    return __atomic_xor_fetch(&a, b, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicAnd(TAtomic& a, intptr_t b) {
-    return __atomic_and_fetch(&a, b, __ATOMIC_SEQ_CST);
-}
-
-static inline void AtomicBarrier() {
-    __sync_synchronize();
-}
diff --git a/contrib/lfalloc/src/util/system/atomic_ops.h b/contrib/lfalloc/src/util/system/atomic_ops.h
deleted file mode 100644
index 425b643e14d..00000000000
--- a/contrib/lfalloc/src/util/system/atomic_ops.h
+++ /dev/null
@@ -1,189 +0,0 @@
-#pragma once
-
-#include <type_traits>
-
-template <class T>
-inline TAtomic* AsAtomicPtr(T volatile* target) {
-    return reinterpret_cast<TAtomic*>(target);
-}
-
-template <class T>
-inline const TAtomic* AsAtomicPtr(T const volatile* target) {
-    return reinterpret_cast<const TAtomic*>(target);
-}
-
-// integral types
-
-template <class T>
-struct TAtomicTraits {
-    enum {
-        Castable = std::is_integral<T>::value && sizeof(T) == sizeof(TAtomicBase) && !std::is_const<T>::value,
-    };
-};
-
-template <class T, class TT>
-using TEnableIfCastable = std::enable_if_t<TAtomicTraits<T>::Castable, TT>;
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicGet(T const volatile& target) {
-    return static_cast<T>(AtomicGet(*AsAtomicPtr(&target)));
-}
-
-template <class T>
-inline TEnableIfCastable<T, void> AtomicSet(T volatile& target, TAtomicBase value) {
-    AtomicSet(*AsAtomicPtr(&target), value);
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicIncrement(T volatile& target) {
-    return static_cast<T>(AtomicIncrement(*AsAtomicPtr(&target)));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicGetAndIncrement(T volatile& target) {
-    return static_cast<T>(AtomicGetAndIncrement(*AsAtomicPtr(&target)));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicDecrement(T volatile& target) {
-    return static_cast<T>(AtomicDecrement(*AsAtomicPtr(&target)));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicGetAndDecrement(T volatile& target) {
-    return static_cast<T>(AtomicGetAndDecrement(*AsAtomicPtr(&target)));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicAdd(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicAdd(*AsAtomicPtr(&target), value));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicGetAndAdd(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicGetAndAdd(*AsAtomicPtr(&target), value));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicSub(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicSub(*AsAtomicPtr(&target), value));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicGetAndSub(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicGetAndSub(*AsAtomicPtr(&target), value));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicSwap(T volatile* target, TAtomicBase exchange) {
-    return static_cast<T>(AtomicSwap(AsAtomicPtr(target), exchange));
-}
-
-template <class T>
-inline TEnableIfCastable<T, bool> AtomicCas(T volatile* target, TAtomicBase exchange, TAtomicBase compare) {
-    return AtomicCas(AsAtomicPtr(target), exchange, compare);
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicGetAndCas(T volatile* target, TAtomicBase exchange, TAtomicBase compare) {
-    return static_cast<T>(AtomicGetAndCas(AsAtomicPtr(target), exchange, compare));
-}
-
-template <class T>
-inline TEnableIfCastable<T, bool> AtomicTryLock(T volatile* target) {
-    return AtomicTryLock(AsAtomicPtr(target));
-}
-
-template <class T>
-inline TEnableIfCastable<T, bool> AtomicTryAndTryLock(T volatile* target) {
-    return AtomicTryAndTryLock(AsAtomicPtr(target));
-}
-
-template <class T>
-inline TEnableIfCastable<T, void> AtomicUnlock(T volatile* target) {
-    AtomicUnlock(AsAtomicPtr(target));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicOr(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicOr(*AsAtomicPtr(&target), value));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicAnd(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicAnd(*AsAtomicPtr(&target), value));
-}
-
-template <class T>
-inline TEnableIfCastable<T, T> AtomicXor(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicXor(*AsAtomicPtr(&target), value));
-}
-
-// pointer types
-
-template <class T>
-inline T* AtomicGet(T* const volatile& target) {
-    return reinterpret_cast<T*>(AtomicGet(*AsAtomicPtr(&target)));
-}
-
-template <class T>
-inline void AtomicSet(T* volatile& target, T* value) {
-    AtomicSet(*AsAtomicPtr(&target), reinterpret_cast<TAtomicBase>(value));
-}
-
-using TNullPtr = decltype(nullptr);
-
-template <class T>
-inline void AtomicSet(T* volatile& target, TNullPtr) {
-    AtomicSet(*AsAtomicPtr(&target), 0);
-}
-
-template <class T>
-inline T* AtomicSwap(T* volatile* target, T* exchange) {
-    return reinterpret_cast<T*>(AtomicSwap(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange)));
-}
-
-template <class T>
-inline T* AtomicSwap(T* volatile* target, TNullPtr) {
-    return reinterpret_cast<T*>(AtomicSwap(AsAtomicPtr(target), 0));
-}
-
-template <class T>
-inline bool AtomicCas(T* volatile* target, T* exchange, T* compare) {
-    return AtomicCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), reinterpret_cast<TAtomicBase>(compare));
-}
-
-template <class T>
-inline T* AtomicGetAndCas(T* volatile* target, T* exchange, T* compare) {
-    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), reinterpret_cast<TAtomicBase>(compare)));
-}
-
-template <class T>
-inline bool AtomicCas(T* volatile* target, T* exchange, TNullPtr) {
-    return AtomicCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), 0);
-}
-
-template <class T>
-inline T* AtomicGetAndCas(T* volatile* target, T* exchange, TNullPtr) {
-    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), 0));
-}
-
-template <class T>
-inline bool AtomicCas(T* volatile* target, TNullPtr, T* compare) {
-    return AtomicCas(AsAtomicPtr(target), 0, reinterpret_cast<TAtomicBase>(compare));
-}
-
-template <class T>
-inline T* AtomicGetAndCas(T* volatile* target, TNullPtr, T* compare) {
-    return
reinterpret_cast(AtomicGetAndCas(AsAtomicPtr(target), 0, reinterpret_cast(compare))); -} - -template -inline bool AtomicCas(T* volatile* target, TNullPtr, TNullPtr) { - return AtomicCas(AsAtomicPtr(target), 0, 0); -} - -template -inline T* AtomicGetAndCas(T* volatile* target, TNullPtr, TNullPtr) { - return reinterpret_cast(AtomicGetAndCas(AsAtomicPtr(target), 0, 0)); -} diff --git a/contrib/lfalloc/src/util/system/atomic_win.h b/contrib/lfalloc/src/util/system/atomic_win.h deleted file mode 100644 index 1abebd87b38..00000000000 --- a/contrib/lfalloc/src/util/system/atomic_win.h +++ /dev/null @@ -1,114 +0,0 @@ -#pragma once - -#include - -#define USE_GENERIC_SETGET - -#if defined(_i386_) - -#pragma intrinsic(_InterlockedIncrement) -#pragma intrinsic(_InterlockedDecrement) -#pragma intrinsic(_InterlockedExchangeAdd) -#pragma intrinsic(_InterlockedExchange) -#pragma intrinsic(_InterlockedCompareExchange) - -static inline intptr_t AtomicIncrement(TAtomic& a) { - return _InterlockedIncrement((volatile long*)&a); -} - -static inline intptr_t AtomicGetAndIncrement(TAtomic& a) { - return _InterlockedIncrement((volatile long*)&a) - 1; -} - -static inline intptr_t AtomicDecrement(TAtomic& a) { - return _InterlockedDecrement((volatile long*)&a); -} - -static inline intptr_t AtomicGetAndDecrement(TAtomic& a) { - return _InterlockedDecrement((volatile long*)&a) + 1; -} - -static inline intptr_t AtomicAdd(TAtomic& a, intptr_t b) { - return _InterlockedExchangeAdd((volatile long*)&a, b) + b; -} - -static inline intptr_t AtomicGetAndAdd(TAtomic& a, intptr_t b) { - return _InterlockedExchangeAdd((volatile long*)&a, b); -} - -static inline intptr_t AtomicSwap(TAtomic* a, intptr_t b) { - return _InterlockedExchange((volatile long*)a, b); -} - -static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) { - return _InterlockedCompareExchange((volatile long*)a, exchange, compare) == compare; -} - -static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) { - return _InterlockedCompareExchange((volatile long*)a, exchange, compare); -} - -#else // _x86_64_ - -#pragma intrinsic(_InterlockedIncrement64) -#pragma intrinsic(_InterlockedDecrement64) -#pragma intrinsic(_InterlockedExchangeAdd64) -#pragma intrinsic(_InterlockedExchange64) -#pragma intrinsic(_InterlockedCompareExchange64) - -static inline intptr_t AtomicIncrement(TAtomic& a) { - return _InterlockedIncrement64((volatile __int64*)&a); -} - -static inline intptr_t AtomicGetAndIncrement(TAtomic& a) { - return _InterlockedIncrement64((volatile __int64*)&a) - 1; -} - -static inline intptr_t AtomicDecrement(TAtomic& a) { - return _InterlockedDecrement64((volatile __int64*)&a); -} - -static inline intptr_t AtomicGetAndDecrement(TAtomic& a) { - return _InterlockedDecrement64((volatile __int64*)&a) + 1; -} - -static inline intptr_t AtomicAdd(TAtomic& a, intptr_t b) { - return _InterlockedExchangeAdd64((volatile __int64*)&a, b) + b; -} - -static inline intptr_t AtomicGetAndAdd(TAtomic& a, intptr_t b) { - return _InterlockedExchangeAdd64((volatile __int64*)&a, b); -} - -static inline intptr_t AtomicSwap(TAtomic* a, intptr_t b) { - return _InterlockedExchange64((volatile __int64*)a, b); -} - -static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) { - return _InterlockedCompareExchange64((volatile __int64*)a, exchange, compare) == compare; -} - -static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) { - return _InterlockedCompareExchange64((volatile __int64*)a, 
exchange, compare); -} - -static inline intptr_t AtomicOr(TAtomic& a, intptr_t b) { - return _InterlockedOr64(&a, b) | b; -} - -static inline intptr_t AtomicAnd(TAtomic& a, intptr_t b) { - return _InterlockedAnd64(&a, b) & b; -} - -static inline intptr_t AtomicXor(TAtomic& a, intptr_t b) { - return _InterlockedXor64(&a, b) ^ b; -} - -#endif // _x86_ - -//TODO -static inline void AtomicBarrier() { - TAtomic val = 0; - - AtomicSwap(&val, 0); -} diff --git a/contrib/lfalloc/src/util/system/compiler.h b/contrib/lfalloc/src/util/system/compiler.h deleted file mode 100644 index b5cec600923..00000000000 --- a/contrib/lfalloc/src/util/system/compiler.h +++ /dev/null @@ -1,617 +0,0 @@ -#pragma once - -// useful cross-platfrom definitions for compilers - -/** - * @def Y_FUNC_SIGNATURE - * - * Use this macro to get pretty function name (see example). - * - * @code - * void Hi() { - * Cout << Y_FUNC_SIGNATURE << Endl; - * } - - * template - * void Do() { - * Cout << Y_FUNC_SIGNATURE << Endl; - * } - - * int main() { - * Hi(); // void Hi() - * Do(); // void Do() [T = int] - * Do(); // void Do() [T = TString] - * } - * @endcode - */ -#if defined(__GNUC__) -#define Y_FUNC_SIGNATURE __PRETTY_FUNCTION__ -#elif defined(_MSC_VER) -#define Y_FUNC_SIGNATURE __FUNCSIG__ -#else -#define Y_FUNC_SIGNATURE "" -#endif - -#ifdef __GNUC__ -#define Y_PRINTF_FORMAT(n, m) __attribute__((__format__(__printf__, n, m))) -#endif - -#ifndef Y_PRINTF_FORMAT -#define Y_PRINTF_FORMAT(n, m) -#endif - -#if defined(__clang__) -#define Y_NO_SANITIZE(...) __attribute__((no_sanitize(__VA_ARGS__))) -#endif - -#if !defined(Y_NO_SANITIZE) -#define Y_NO_SANITIZE(...) -#endif - -/** - * @def Y_DECLARE_UNUSED - * - * Macro is needed to silence compiler warning about unused entities (e.g. function or argument). - * - * @code - * Y_DECLARE_UNUSED int FunctionUsedSolelyForDebugPurposes(); - * assert(FunctionUsedSolelyForDebugPurposes() == 42); - * - * void Foo(const int argumentUsedOnlyForDebugPurposes Y_DECLARE_UNUSED) { - * assert(argumentUsedOnlyForDebugPurposes == 42); - * // however you may as well omit `Y_DECLARE_UNUSED` and use `UNUSED` macro instead - * Y_UNUSED(argumentUsedOnlyForDebugPurposes); - * } - * @endcode - */ -#ifdef __GNUC__ -#define Y_DECLARE_UNUSED __attribute__((unused)) -#endif - -#ifndef Y_DECLARE_UNUSED -#define Y_DECLARE_UNUSED -#endif - -#if defined(__GNUC__) -#define Y_LIKELY(Cond) __builtin_expect(!!(Cond), 1) -#define Y_UNLIKELY(Cond) __builtin_expect(!!(Cond), 0) -#define Y_PREFETCH_READ(Pointer, Priority) __builtin_prefetch((const void*)(Pointer), 0, Priority) -#define Y_PREFETCH_WRITE(Pointer, Priority) __builtin_prefetch((const void*)(Pointer), 1, Priority) -#endif - -/** - * @def Y_FORCE_INLINE - * - * Macro to use in place of 'inline' in function declaration/definition to force - * it to be inlined. - */ -#if !defined(Y_FORCE_INLINE) -#if defined(CLANG_COVERAGE) -#/* excessive __always_inline__ might significantly slow down compilation of an instrumented unit */ -#define Y_FORCE_INLINE inline -#elif defined(_MSC_VER) -#define Y_FORCE_INLINE __forceinline -#elif defined(__GNUC__) -#/* Clang also defines __GNUC__ (as 4) */ -#define Y_FORCE_INLINE inline __attribute__((__always_inline__)) -#else -#define Y_FORCE_INLINE inline -#endif -#endif - -/** - * @def Y_NO_INLINE - * - * Macro to use in place of 'inline' in function declaration/definition to - * prevent it from being inlined. 
- */ -#if !defined(Y_NO_INLINE) -#if defined(_MSC_VER) -#define Y_NO_INLINE __declspec(noinline) -#elif defined(__GNUC__) || defined(__INTEL_COMPILER) -#/* Clang also defines __GNUC__ (as 4) */ -#define Y_NO_INLINE __attribute__((__noinline__)) -#else -#define Y_NO_INLINE -#endif -#endif - -//to cheat compiler about strict aliasing or similar problems -#if defined(__GNUC__) -#define Y_FAKE_READ(X) \ - do { \ - __asm__ __volatile__("" \ - : \ - : "m"(X)); \ - } while (0) - -#define Y_FAKE_WRITE(X) \ - do { \ - __asm__ __volatile__("" \ - : "=m"(X)); \ - } while (0) -#endif - -#if !defined(Y_FAKE_READ) -#define Y_FAKE_READ(X) -#endif - -#if !defined(Y_FAKE_WRITE) -#define Y_FAKE_WRITE(X) -#endif - -#ifndef Y_PREFETCH_READ -#define Y_PREFETCH_READ(Pointer, Priority) (void)(const void*)(Pointer), (void)Priority -#endif - -#ifndef Y_PREFETCH_WRITE -#define Y_PREFETCH_WRITE(Pointer, Priority) (void)(const void*)(Pointer), (void)Priority -#endif - -#ifndef Y_LIKELY -#define Y_LIKELY(Cond) (Cond) -#define Y_UNLIKELY(Cond) (Cond) -#endif - -#ifdef __GNUC__ -#define _packed __attribute__((packed)) -#else -#define _packed -#endif - -#if defined(__GNUC__) -#define Y_WARN_UNUSED_RESULT __attribute__((warn_unused_result)) -#endif - -#ifndef Y_WARN_UNUSED_RESULT -#define Y_WARN_UNUSED_RESULT -#endif - -#if defined(__GNUC__) -#define Y_HIDDEN __attribute__((visibility("hidden"))) -#endif - -#if !defined(Y_HIDDEN) -#define Y_HIDDEN -#endif - -#if defined(__GNUC__) -#define Y_PUBLIC __attribute__((visibility("default"))) -#endif - -#if !defined(Y_PUBLIC) -#define Y_PUBLIC -#endif - -#if !defined(Y_UNUSED) && !defined(__cplusplus) -#define Y_UNUSED(var) (void)(var) -#endif -#if !defined(Y_UNUSED) && defined(__cplusplus) -template -constexpr Y_FORCE_INLINE int Y_UNUSED(Types&&...) { - return 0; -}; -#endif - -/** - * @def Y_ASSUME - * - * Macro that tells the compiler that it can generate optimized code - * as if the given expression will always evaluate true. - * The behavior is undefined if it ever evaluates false. - * - * @code - * // factored into a function so that it's testable - * inline int Avg(int x, int y) { - * if (x >= 0 && y >= 0) { - * return (static_cast(x) + static_cast(y)) >> 1; - * } else { - * // a slower implementation - * } - * } - * - * // we know that xs and ys are non-negative from domain knowledge, - * // but we can't change the types of xs and ys because of API constrains - * int Foo(const TVector& xs, const TVector& ys) { - * TVector avgs; - * avgs.resize(xs.size()); - * for (size_t i = 0; i < xs.size(); ++i) { - * auto x = xs[i]; - * auto y = ys[i]; - * Y_ASSUME(x >= 0); - * Y_ASSUME(y >= 0); - * xs[i] = Avg(x, y); - * } - * } - * @endcode - */ -#if defined(__GNUC__) -#define Y_ASSUME(condition) ((condition) ? (void)0 : __builtin_unreachable()) -#elif defined(_MSC_VER) -#define Y_ASSUME(condition) __assume(condition) -#else -#define Y_ASSUME(condition) Y_UNUSED(condition) -#endif - -#ifdef __cplusplus -[[noreturn]] -#endif -Y_HIDDEN void _YandexAbort(); - -/** - * @def Y_UNREACHABLE - * - * Macro that marks the rest of the code branch unreachable. - * The behavior is undefined if it's ever reached. 
- * - * @code - * switch (i % 3) { - * case 0: - * return foo; - * case 1: - * return bar; - * case 2: - * return baz; - * default: - * Y_UNREACHABLE(); - * } - * @endcode - */ -#if defined(__GNUC__) || defined(_MSC_VER) -#define Y_UNREACHABLE() Y_ASSUME(0) -#else -#define Y_UNREACHABLE() _YandexAbort() -#endif - -#if defined(undefined_sanitizer_enabled) -#define _ubsan_enabled_ -#endif - -#ifdef __clang__ - -#if __has_feature(thread_sanitizer) -#define _tsan_enabled_ -#endif -#if __has_feature(memory_sanitizer) -#define _msan_enabled_ -#endif -#if __has_feature(address_sanitizer) -#define _asan_enabled_ -#endif - -#else - -#if defined(thread_sanitizer_enabled) || defined(__SANITIZE_THREAD__) -#define _tsan_enabled_ -#endif -#if defined(memory_sanitizer_enabled) -#define _msan_enabled_ -#endif -#if defined(address_sanitizer_enabled) || defined(__SANITIZE_ADDRESS__) -#define _asan_enabled_ -#endif - -#endif - -#if defined(_asan_enabled_) || defined(_msan_enabled_) || defined(_tsan_enabled_) || defined(_ubsan_enabled_) -#define _san_enabled_ -#endif - -#if defined(_MSC_VER) -#define __PRETTY_FUNCTION__ __FUNCSIG__ -#endif - -#if defined(__GNUC__) -#define Y_WEAK __attribute__((weak)) -#else -#define Y_WEAK -#endif - -#if defined(__CUDACC_VER_MAJOR__) -#define Y_CUDA_AT_LEAST(x, y) (__CUDACC_VER_MAJOR__ > x || (__CUDACC_VER_MAJOR__ == x && __CUDACC_VER_MINOR__ >= y)) -#else -#define Y_CUDA_AT_LEAST(x, y) 0 -#endif - -// NVidia CUDA C++ Compiler did not know about noexcept keyword until version 9.0 -#if !Y_CUDA_AT_LEAST(9, 0) -#if defined(__CUDACC__) && !defined(noexcept) -#define noexcept throw () -#endif -#endif - -#if defined(__GNUC__) -#define Y_COLD __attribute__((cold)) -#define Y_LEAF __attribute__((leaf)) -#define Y_WRAPPER __attribute__((artificial)) -#else -#define Y_COLD -#define Y_LEAF -#define Y_WRAPPER -#endif - -/** - * @def Y_PRAGMA - * - * Macro for use in other macros to define compiler pragma - * See below for other usage examples - * - * @code - * #if defined(__clang__) || defined(__GNUC__) - * #define Y_PRAGMA_NO_WSHADOW \ - * Y_PRAGMA("GCC diagnostic ignored \"-Wshadow\"") - * #elif defined(_MSC_VER) - * #define Y_PRAGMA_NO_WSHADOW \ - * Y_PRAGMA("warning(disable:4456 4457") - * #else - * #define Y_PRAGMA_NO_WSHADOW - * #endif - * @endcode - */ -#if defined(__clang__) || defined(__GNUC__) -#define Y_PRAGMA(x) _Pragma(x) -#elif defined(_MSC_VER) -#define Y_PRAGMA(x) __pragma(x) -#else -#define Y_PRAGMA(x) -#endif - -/** - * @def Y_PRAGMA_DIAGNOSTIC_PUSH - * - * Cross-compiler pragma to save diagnostic settings - * - * @see - * GCC: https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html - * MSVC: https://msdn.microsoft.com/en-us/library/2c8f766e.aspx - * Clang: https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas - * - * @code - * Y_PRAGMA_DIAGNOSTIC_PUSH - * @endcode - */ -#if defined(__clang__) || defined(__GNUC__) -#define Y_PRAGMA_DIAGNOSTIC_PUSH \ - Y_PRAGMA("GCC diagnostic push") -#elif defined(_MSC_VER) -#define Y_PRAGMA_DIAGNOSTIC_PUSH \ - Y_PRAGMA(warning(push)) -#else -#define Y_PRAGMA_DIAGNOSTIC_PUSH -#endif - -/** - * @def Y_PRAGMA_DIAGNOSTIC_POP - * - * Cross-compiler pragma to restore diagnostic settings - * - * @see - * GCC: https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html - * MSVC: https://msdn.microsoft.com/en-us/library/2c8f766e.aspx - * Clang: https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas - * - * @code - * Y_PRAGMA_DIAGNOSTIC_POP - * @endcode - */ -#if defined(__clang__) || 
defined(__GNUC__) -#define Y_PRAGMA_DIAGNOSTIC_POP \ - Y_PRAGMA("GCC diagnostic pop") -#elif defined(_MSC_VER) -#define Y_PRAGMA_DIAGNOSTIC_POP \ - Y_PRAGMA(warning(pop)) -#else -#define Y_PRAGMA_DIAGNOSTIC_POP -#endif - -/** - * @def Y_PRAGMA_NO_WSHADOW - * - * Cross-compiler pragma to disable warnings about shadowing variables - * - * @code - * Y_PRAGMA_DIAGNOSTIC_PUSH - * Y_PRAGMA_NO_WSHADOW - * - * // some code which use variable shadowing, e.g.: - * - * for (int i = 0; i < 100; ++i) { - * Use(i); - * - * for (int i = 42; i < 100500; ++i) { // this i is shadowing previous i - * AnotherUse(i); - * } - * } - * - * Y_PRAGMA_DIAGNOSTIC_POP - * @endcode - */ -#if defined(__clang__) || defined(__GNUC__) -#define Y_PRAGMA_NO_WSHADOW \ - Y_PRAGMA("GCC diagnostic ignored \"-Wshadow\"") -#elif defined(_MSC_VER) -#define Y_PRAGMA_NO_WSHADOW \ - Y_PRAGMA(warning(disable : 4456 4457)) -#else -#define Y_PRAGMA_NO_WSHADOW -#endif - -/** - * @ def Y_PRAGMA_NO_UNUSED_FUNCTION - * - * Cross-compiler pragma to disable warnings about unused functions - * - * @see - * GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html - * Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-function - * MSVC: there is no such warning - * - * @code - * Y_PRAGMA_DIAGNOSTIC_PUSH - * Y_PRAGMA_NO_UNUSED_FUNCTION - * - * // some code which introduces a function which later will not be used, e.g.: - * - * void Foo() { - * } - * - * int main() { - * return 0; // Foo() never called - * } - * - * Y_PRAGMA_DIAGNOSTIC_POP - * @endcode - */ -#if defined(__clang__) || defined(__GNUC__) -#define Y_PRAGMA_NO_UNUSED_FUNCTION \ - Y_PRAGMA("GCC diagnostic ignored \"-Wunused-function\"") -#else -#define Y_PRAGMA_NO_UNUSED_FUNCTION -#endif - -/** - * @ def Y_PRAGMA_NO_UNUSED_PARAMETER - * - * Cross-compiler pragma to disable warnings about unused function parameters - * - * @see - * GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html - * Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-parameter - * MSVC: https://msdn.microsoft.com/en-us/library/26kb9fy0.aspx - * - * @code - * Y_PRAGMA_DIAGNOSTIC_PUSH - * Y_PRAGMA_NO_UNUSED_PARAMETER - * - * // some code which introduces a function with unused parameter, e.g.: - * - * void foo(int a) { - * // a is not referenced - * } - * - * int main() { - * foo(1); - * return 0; - * } - * - * Y_PRAGMA_DIAGNOSTIC_POP - * @endcode - */ -#if defined(__clang__) || defined(__GNUC__) -#define Y_PRAGMA_NO_UNUSED_PARAMETER \ - Y_PRAGMA("GCC diagnostic ignored \"-Wunused-parameter\"") -#elif defined(_MSC_VER) -#define Y_PRAGMA_NO_UNUSED_PARAMETER \ - Y_PRAGMA(warning(disable : 4100)) -#else -#define Y_PRAGMA_NO_UNUSED_PARAMETER -#endif - -/** - * @def Y_PRAGMA_NO_DEPRECATED - * - * Cross compiler pragma to disable warnings and errors about deprecated - * - * @see - * GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html - * Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wdeprecated - * MSVC: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-3-c4996?view=vs-2017 - * - * @code - * Y_PRAGMA_DIAGNOSTIC_PUSH - * Y_PRAGMA_NO_DEPRECATED - * - * [deprecated] void foo() { - * // ... 
- * } - * - * int main() { - * foo(); - * return 0; - * } - * - * Y_PRAGMA_DIAGNOSTIC_POP - * @endcode - */ -#if defined(__clang__) || defined(__GNUC__) -#define Y_PRAGMA_NO_DEPRECATED \ - Y_PRAGMA("GCC diagnostic ignored \"-Wdeprecated\"") -#elif defined(_MSC_VER) -#define Y_PRAGMA_NO_DEPRECATED \ - Y_PRAGMA(warning(disable : 4996)) -#else -#define Y_PRAGMA_NO_DEPRECATED -#endif - -#if defined(__clang__) || defined(__GNUC__) -/** - * @def Y_CONST_FUNCTION - methods and functions, marked with this method are promised to: - 1. do not have side effects - 2. this method do not read global memory - NOTE: this attribute can't be set for methods that depend on data, pointed by this - this allow compilers to do hard optimization of that functions - NOTE: in common case this attribute can't be set if method have pointer-arguments - NOTE: as result there no any reason to discard result of such method -*/ -#define Y_CONST_FUNCTION [[gnu::const]] -#endif - -#if !defined(Y_CONST_FUNCTION) -#define Y_CONST_FUNCTION -#endif - -#if defined(__clang__) || defined(__GNUC__) -/** - * @def Y_PURE_FUNCTION - methods and functions, marked with this method are promised to: - 1. do not have side effects - 2. result will be the same if no global memory changed - this allow compilers to do hard optimization of that functions - NOTE: as result there no any reason to discard result of such method -*/ -#define Y_PURE_FUNCTION [[gnu::pure]] -#endif - -#if !defined(Y_PURE_FUNCTION) -#define Y_PURE_FUNCTION -#endif - -/** - * @ def Y_HAVE_INT128 - * - * Defined when the compiler supports __int128 extension - * - * @code - * - * #if defined(Y_HAVE_INT128) - * __int128 myVeryBigInt = 12345678901234567890; - * #endif - * - * @endcode - */ -#if defined(__SIZEOF_INT128__) -#define Y_HAVE_INT128 1 -#endif - -/** - * XRAY macro must be passed to compiler if XRay is enabled. - * - * Define everything XRay-specific as a macro so that it doesn't cause errors - * for compilers that doesn't support XRay. 
- */ -#if defined(XRAY) && defined(__cplusplus) -#include -#define Y_XRAY_ALWAYS_INSTRUMENT [[clang::xray_always_instrument]] -#define Y_XRAY_NEVER_INSTRUMENT [[clang::xray_never_instrument]] -#define Y_XRAY_CUSTOM_EVENT(__string, __length) \ - do { \ - __xray_customevent(__string, __length); \ - } while (0) -#else -#define Y_XRAY_ALWAYS_INSTRUMENT -#define Y_XRAY_NEVER_INSTRUMENT -#define Y_XRAY_CUSTOM_EVENT(__string, __length) \ - do { \ - } while (0) -#endif diff --git a/contrib/lfalloc/src/util/system/defaults.h b/contrib/lfalloc/src/util/system/defaults.h deleted file mode 100644 index 19196a28b2b..00000000000 --- a/contrib/lfalloc/src/util/system/defaults.h +++ /dev/null @@ -1,168 +0,0 @@ -#pragma once - -#include "platform.h" - -#if defined _unix_ -#define LOCSLASH_C '/' -#define LOCSLASH_S "/" -#else -#define LOCSLASH_C '\\' -#define LOCSLASH_S "\\" -#endif // _unix_ - -#if defined(__INTEL_COMPILER) && defined(__cplusplus) -#include -#endif - -// low and high parts of integers -#if !defined(_win_) -#include -#endif - -#if defined(BSD) || defined(_android_) - -#if defined(BSD) -#include -#endif - -#if defined(_android_) -#include -#endif - -#if (BYTE_ORDER == LITTLE_ENDIAN) -#define _little_endian_ -#elif (BYTE_ORDER == BIG_ENDIAN) -#define _big_endian_ -#else -#error unknown endian not supported -#endif - -#elif (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(WHATEVER_THAT_HAS_BIG_ENDIAN) -#define _big_endian_ -#else -#define _little_endian_ -#endif - -// alignment -#if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_QUADS) -#define _must_align8_ -#endif - -#if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_LONGS) -#define _must_align4_ -#endif - -#if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_SHORTS) -#define _must_align2_ -#endif - -#if defined(__GNUC__) -#define alias_hack __attribute__((__may_alias__)) -#endif - -#ifndef alias_hack -#define alias_hack -#endif - -#include "types.h" - -#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) -#define PRAGMA(x) _Pragma(#x) -#define RCSID(idstr) PRAGMA(comment(exestr, idstr)) -#else -#define RCSID(idstr) static const char rcsid[] = idstr -#endif - -#include "compiler.h" - -#ifdef _win_ -#include -#elif defined(_sun_) -#include -#endif - -#ifdef NDEBUG -#define Y_IF_DEBUG(X) -#else -#define Y_IF_DEBUG(X) X -#endif - -/** - * @def Y_ARRAY_SIZE - * - * This macro is needed to get number of elements in a statically allocated fixed size array. The - * expression is a compile-time constant and therefore can be used in compile time computations. - * - * @code - * enum ENumbers { - * EN_ONE, - * EN_TWO, - * EN_SIZE - * } - * - * const char* NAMES[] = { - * "one", - * "two" - * } - * - * static_assert(Y_ARRAY_SIZE(NAMES) == EN_SIZE, "you should define `NAME` for each enumeration"); - * @endcode - * - * This macro also catches type errors. If you see a compiler error like "warning: division by zero - * is undefined" when using `Y_ARRAY_SIZE` then you are probably giving it a pointer. - * - * Since all of our code is expected to work on a 64 bit platform where pointers are 8 bytes we may - * falsefully accept pointers to types of sizes that are divisors of 8 (1, 2, 4 and 8). 
- */ -#if defined(__cplusplus) -namespace NArraySizePrivate { - template - struct TArraySize; - - template - struct TArraySize { - enum { - Result = N - }; - }; - - template - struct TArraySize { - enum { - Result = N - }; - }; -} - -#define Y_ARRAY_SIZE(arr) ((size_t)::NArraySizePrivate::TArraySize::Result) -#else -#undef Y_ARRAY_SIZE -#define Y_ARRAY_SIZE(arr) \ - ((sizeof(arr) / sizeof((arr)[0])) / static_cast(!(sizeof(arr) % sizeof((arr)[0])))) -#endif - -#undef Y_ARRAY_BEGIN -#define Y_ARRAY_BEGIN(arr) (arr) - -#undef Y_ARRAY_END -#define Y_ARRAY_END(arr) ((arr) + Y_ARRAY_SIZE(arr)) - -/** - * Concatenates two symbols, even if one of them is itself a macro. - */ -#define Y_CAT(X, Y) Y_CAT_I(X, Y) -#define Y_CAT_I(X, Y) Y_CAT_II(X, Y) -#define Y_CAT_II(X, Y) X##Y - -#define Y_STRINGIZE(X) UTIL_PRIVATE_STRINGIZE_AUX(X) -#define UTIL_PRIVATE_STRINGIZE_AUX(X) #X - -#if defined(__COUNTER__) -#define Y_GENERATE_UNIQUE_ID(N) Y_CAT(N, __COUNTER__) -#endif - -#if !defined(Y_GENERATE_UNIQUE_ID) -#define Y_GENERATE_UNIQUE_ID(N) Y_CAT(N, __LINE__) -#endif - -#define NPOS ((size_t)-1) diff --git a/contrib/lfalloc/src/util/system/platform.h b/contrib/lfalloc/src/util/system/platform.h deleted file mode 100644 index 0687f239a2e..00000000000 --- a/contrib/lfalloc/src/util/system/platform.h +++ /dev/null @@ -1,242 +0,0 @@ -#pragma once - -// What OS ? -// our definition has the form _{osname}_ - -#if defined(_WIN64) -#define _win64_ -#define _win32_ -#elif defined(__WIN32__) || defined(_WIN32) // _WIN32 is also defined by the 64-bit compiler for backward compatibility -#define _win32_ -#else -#define _unix_ -#if defined(__sun__) || defined(sun) || defined(sparc) || defined(__sparc) -#define _sun_ -#endif -#if defined(__hpux__) -#define _hpux_ -#endif -#if defined(__linux__) -#define _linux_ -#endif -#if defined(__FreeBSD__) -#define _freebsd_ -#endif -#if defined(__CYGWIN__) -#define _cygwin_ -#endif -#if defined(__APPLE__) -#define _darwin_ -#endif -#if defined(__ANDROID__) -#define _android_ -#endif -#endif - -#if defined(__IOS__) -#define _ios_ -#endif - -#if defined(_linux_) -#if defined(_musl_) -//nothing to do -#elif defined(_android_) -#define _bionic_ -#else -#define _glibc_ -#endif -#endif - -#if defined(_darwin_) -#define unix -#define __unix__ -#endif - -#if defined(_win32_) || defined(_win64_) -#define _win_ -#endif - -#if defined(__arm__) || defined(__ARM__) || defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM) -#if defined(__arm64) || defined(__arm64__) || defined(__aarch64__) -#define _arm64_ -#else -#define _arm32_ -#endif -#endif - -#if defined(_arm64_) || defined(_arm32_) -#define _arm_ -#endif - -/* __ia64__ and __x86_64__ - defined by GNU C. - * _M_IA64, _M_X64, _M_AMD64 - defined by Visual Studio. - * - * Microsoft can define _M_IX86, _M_AMD64 (before Visual Studio 8) - * or _M_X64 (starting in Visual Studio 8). 
- */ -#if defined(__x86_64__) || defined(_M_X64) || defined(_M_AMD64) -#define _x86_64_ -#endif - -#if defined(__i386__) || defined(_M_IX86) -#define _i386_ -#endif - -#if defined(__ia64__) || defined(_M_IA64) -#define _ia64_ -#endif - -#if defined(__powerpc__) -#define _ppc_ -#endif - -#if defined(__powerpc64__) -#define _ppc64_ -#endif - -#if !defined(sparc) && !defined(__sparc) && !defined(__hpux__) && !defined(__alpha__) && !defined(_ia64_) && !defined(_x86_64_) && !defined(_arm_) && !defined(_i386_) && !defined(_ppc_) && !defined(_ppc64_) -#error "platform not defined, please, define one" -#endif - -#if defined(_x86_64_) || defined(_i386_) -#define _x86_ -#endif - -#if defined(__MIC__) -#define _mic_ -#define _k1om_ -#endif - -// stdio or MessageBox -#if defined(__CONSOLE__) || defined(_CONSOLE) -#define _console_ -#endif -#if (defined(_win_) && !defined(_console_)) -#define _windows_ -#elif !defined(_console_) -#define _console_ -#endif - -#if defined(__SSE__) || defined(SSE_ENABLED) -#define _sse_ -#endif - -#if defined(__SSE2__) || defined(SSE2_ENABLED) -#define _sse2_ -#endif - -#if defined(__SSE3__) || defined(SSE3_ENABLED) -#define _sse3_ -#endif - -#if defined(__SSSE3__) || defined(SSSE3_ENABLED) -#define _ssse3_ -#endif - -#if defined(POPCNT_ENABLED) -#define _popcnt_ -#endif - -#if defined(__DLL__) || defined(_DLL) -#define _dll_ -#endif - -// 16, 32 or 64 -#if defined(__sparc_v9__) || defined(_x86_64_) || defined(_ia64_) || defined(_arm64_) || defined(_ppc64_) -#define _64_ -#else -#define _32_ -#endif - -/* All modern 64-bit Unix systems use scheme LP64 (long, pointers are 64-bit). - * Microsoft uses a different scheme: LLP64 (long long, pointers are 64-bit). - * - * Scheme LP64 LLP64 - * char 8 8 - * short 16 16 - * int 32 32 - * long 64 32 - * long long 64 64 - * pointer 64 64 - */ - -#if defined(_32_) -#define SIZEOF_PTR 4 -#elif defined(_64_) -#define SIZEOF_PTR 8 -#endif - -#define PLATFORM_DATA_ALIGN SIZEOF_PTR - -#if !defined(SIZEOF_PTR) -#error todo -#endif - -#define SIZEOF_CHAR 1 -#define SIZEOF_UNSIGNED_CHAR 1 -#define SIZEOF_SHORT 2 -#define SIZEOF_UNSIGNED_SHORT 2 -#define SIZEOF_INT 4 -#define SIZEOF_UNSIGNED_INT 4 - -#if defined(_32_) -#define SIZEOF_LONG 4 -#define SIZEOF_UNSIGNED_LONG 4 -#elif defined(_64_) -#if defined(_win_) -#define SIZEOF_LONG 4 -#define SIZEOF_UNSIGNED_LONG 4 -#else -#define SIZEOF_LONG 8 -#define SIZEOF_UNSIGNED_LONG 8 -#endif // _win_ -#endif // _32_ - -#if !defined(SIZEOF_LONG) -#error todo -#endif - -#define SIZEOF_LONG_LONG 8 -#define SIZEOF_UNSIGNED_LONG_LONG 8 - -#undef SIZEOF_SIZE_T // in case we include which defines it, too -#define SIZEOF_SIZE_T SIZEOF_PTR - -#if defined(__INTEL_COMPILER) -#pragma warning(disable 1292) -#pragma warning(disable 1469) -#pragma warning(disable 193) -#pragma warning(disable 271) -#pragma warning(disable 383) -#pragma warning(disable 424) -#pragma warning(disable 444) -#pragma warning(disable 584) -#pragma warning(disable 593) -#pragma warning(disable 981) -#pragma warning(disable 1418) -#pragma warning(disable 304) -#pragma warning(disable 810) -#pragma warning(disable 1029) -#pragma warning(disable 1419) -#pragma warning(disable 177) -#pragma warning(disable 522) -#pragma warning(disable 858) -#pragma warning(disable 111) -#pragma warning(disable 1599) -#pragma warning(disable 411) -#pragma warning(disable 304) -#pragma warning(disable 858) -#pragma warning(disable 444) -#pragma warning(disable 913) -#pragma warning(disable 310) -#pragma warning(disable 167) -#pragma warning(disable 180) 
-#pragma warning(disable 1572) -#endif - -#if defined(_MSC_VER) -#undef _WINSOCKAPI_ -#define _WINSOCKAPI_ -#undef NOMINMAX -#define NOMINMAX -#endif diff --git a/contrib/lfalloc/src/util/system/types.h b/contrib/lfalloc/src/util/system/types.h deleted file mode 100644 index af4f0adb13d..00000000000 --- a/contrib/lfalloc/src/util/system/types.h +++ /dev/null @@ -1,117 +0,0 @@ -#pragma once - -// DO_NOT_STYLE - -#include "platform.h" - -#include - -typedef int8_t i8; -typedef int16_t i16; -typedef uint8_t ui8; -typedef uint16_t ui16; - -typedef int yssize_t; -#define PRIYSZT "d" - -#if defined(_darwin_) && defined(_32_) -typedef unsigned long ui32; -typedef long i32; -#else -typedef uint32_t ui32; -typedef int32_t i32; -#endif - -#if defined(_darwin_) && defined(_64_) -typedef unsigned long ui64; -typedef long i64; -#else -typedef uint64_t ui64; -typedef int64_t i64; -#endif - -#define LL(number) INT64_C(number) -#define ULL(number) UINT64_C(number) - -// Macro for size_t and ptrdiff_t types -#if defined(_32_) -# if defined(_darwin_) -# define PRISZT "lu" -# undef PRIi32 -# define PRIi32 "li" -# undef SCNi32 -# define SCNi32 "li" -# undef PRId32 -# define PRId32 "li" -# undef SCNd32 -# define SCNd32 "li" -# undef PRIu32 -# define PRIu32 "lu" -# undef SCNu32 -# define SCNu32 "lu" -# undef PRIx32 -# define PRIx32 "lx" -# undef SCNx32 -# define SCNx32 "lx" -# elif !defined(_cygwin_) -# define PRISZT PRIu32 -# else -# define PRISZT "u" -# endif -# define SCNSZT SCNu32 -# define PRIPDT PRIi32 -# define SCNPDT SCNi32 -# define PRITMT PRIi32 -# define SCNTMT SCNi32 -#elif defined(_64_) -# if defined(_darwin_) -# define PRISZT "lu" -# undef PRIu64 -# define PRIu64 PRISZT -# undef PRIx64 -# define PRIx64 "lx" -# undef PRIX64 -# define PRIX64 "lX" -# undef PRId64 -# define PRId64 "ld" -# undef PRIi64 -# define PRIi64 "li" -# undef SCNi64 -# define SCNi64 "li" -# undef SCNu64 -# define SCNu64 "lu" -# undef SCNx64 -# define SCNx64 "lx" -# else -# define PRISZT PRIu64 -# endif -# define SCNSZT SCNu64 -# define PRIPDT PRIi64 -# define SCNPDT SCNi64 -# define PRITMT PRIi64 -# define SCNTMT SCNi64 -#else -# error "Unsupported platform" -#endif - -// SUPERLONG -#if !defined(DONT_USE_SUPERLONG) && !defined(SUPERLONG_MAX) -#define SUPERLONG_MAX ~LL(0) -typedef i64 SUPERLONG; -#endif - -// UNICODE -// UCS-2, native byteorder -typedef ui16 wchar16; -// internal symbol type: UTF-16LE -typedef wchar16 TChar; -typedef ui32 wchar32; - -#if defined(_MSC_VER) -#include -typedef SSIZE_T ssize_t; -#define HAVE_SSIZE_T 1 -#include -#endif - -#include diff --git a/contrib/libhdfs3-cmake/CMake/Platform.cmake b/contrib/libhdfs3-cmake/CMake/Platform.cmake index 55fbf646589..d9bc760ee3f 100644 --- a/contrib/libhdfs3-cmake/CMake/Platform.cmake +++ b/contrib/libhdfs3-cmake/CMake/Platform.cmake @@ -15,9 +15,14 @@ IF(CMAKE_COMPILER_IS_GNUCXX) STRING(REGEX MATCHALL "[0-9]+" GCC_COMPILER_VERSION ${GCC_COMPILER_VERSION}) + LIST(LENGTH GCC_COMPILER_VERSION GCC_COMPILER_VERSION_LENGTH) LIST(GET GCC_COMPILER_VERSION 0 GCC_COMPILER_VERSION_MAJOR) - LIST(GET GCC_COMPILER_VERSION 0 GCC_COMPILER_VERSION_MINOR) - + if (GCC_COMPILER_VERSION_LENGTH GREATER 1) + LIST(GET GCC_COMPILER_VERSION 1 GCC_COMPILER_VERSION_MINOR) + else () + set (GCC_COMPILER_VERSION_MINOR 0) + endif () + SET(GCC_COMPILER_VERSION_MAJOR ${GCC_COMPILER_VERSION_MAJOR} CACHE INTERNAL "gcc major version") SET(GCC_COMPILER_VERSION_MINOR ${GCC_COMPILER_VERSION_MINOR} CACHE INTERNAL "gcc minor version") diff --git a/contrib/mimalloc b/contrib/mimalloc new file mode 160000 
index 00000000000..a787bdebce9 --- /dev/null +++ b/contrib/mimalloc @@ -0,0 +1 @@ +Subproject commit a787bdebce94bf3776dc0d1ad597917f479ab8d5 diff --git a/dbms/CMakeLists.txt b/dbms/CMakeLists.txt index e2cc16fe122..4b47b77dec2 100644 --- a/dbms/CMakeLists.txt +++ b/dbms/CMakeLists.txt @@ -223,8 +223,9 @@ if(RE2_INCLUDE_DIR) target_include_directories(clickhouse_common_io SYSTEM BEFORE PUBLIC ${RE2_INCLUDE_DIR}) endif() -if (USE_LFALLOC) - target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${LFALLOC_INCLUDE_DIR}) +if (USE_MIMALLOC) + target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${MIMALLOC_INCLUDE_DIR}) + target_link_libraries (clickhouse_common_io PRIVATE ${MIMALLOC_LIBRARY}) endif () if(CPUID_LIBRARY) diff --git a/dbms/programs/client/readpassphrase/readpassphrase.h b/dbms/programs/client/readpassphrase/readpassphrase.h index d504cff5f00..272c822423a 100644 --- a/dbms/programs/client/readpassphrase/readpassphrase.h +++ b/dbms/programs/client/readpassphrase/readpassphrase.h @@ -29,6 +29,11 @@ //#include "includes.h" #include "config_client.h" +// Should not be included on BSD systems, but if it happens... +#ifdef HAVE_READPASSPHRASE +# include_next +#endif + #ifndef HAVE_READPASSPHRASE # ifdef __cplusplus diff --git a/dbms/programs/copier/ClusterCopier.cpp b/dbms/programs/copier/ClusterCopier.cpp index 0b78e4a54cf..e4bdbfac7dc 100644 --- a/dbms/programs/copier/ClusterCopier.cpp +++ b/dbms/programs/copier/ClusterCopier.cpp @@ -96,14 +96,19 @@ namespace using DatabaseAndTableName = std::pair; -String getDatabaseDotTable(const String & database, const String & table) +String getQuotedTable(const String & database, const String & table) { + if (database.empty()) + { + return backQuoteIfNeed(table); + } + return backQuoteIfNeed(database) + "." 
+ backQuoteIfNeed(table); } -String getDatabaseDotTable(const DatabaseAndTableName & db_and_table) +String getQuotedTable(const DatabaseAndTableName & db_and_table) { - return getDatabaseDotTable(db_and_table.first, db_and_table.second); + return getQuotedTable(db_and_table.first, db_and_table.second); } @@ -467,7 +472,7 @@ String DB::TaskShard::getDescription() const std::stringstream ss; ss << "N" << numberInCluster() << " (having a replica " << getHostNameExample() - << ", pull table " + getDatabaseDotTable(task_table.table_pull) + << ", pull table " + getQuotedTable(task_table.table_pull) << " of cluster " + task_table.cluster_pull_name << ")"; return ss.str(); } @@ -741,8 +746,10 @@ public: { auto zookeeper = context.getZooKeeper(); - task_description_watch_callback = [this] (const Coordination::WatchResponse &) + task_description_watch_callback = [this] (const Coordination::WatchResponse & response) { + if (response.error != Coordination::ZOK) + return; UInt64 version = ++task_descprtion_version; LOG_DEBUG(log, "Task description should be updated, local version " << version); }; @@ -1296,7 +1303,7 @@ protected: /// Remove all status nodes zookeeper->tryRemoveRecursive(current_shards_path); - String query = "ALTER TABLE " + getDatabaseDotTable(task_table.table_push); + String query = "ALTER TABLE " + getQuotedTable(task_table.table_push); query += " DROP PARTITION " + task_partition.name + ""; /// TODO: use this statement after servers will be updated up to 1.1.54310 @@ -1539,7 +1546,7 @@ protected: auto get_select_query = [&] (const DatabaseAndTableName & from_table, const String & fields, String limit = "") { String query; - query += "SELECT " + fields + " FROM " + getDatabaseDotTable(from_table); + query += "SELECT " + fields + " FROM " + getQuotedTable(from_table); /// TODO: Bad, it is better to rewrite with ASTLiteral(partition_key_field) query += " WHERE (" + queryToString(task_table.engine_push_partition_key_ast) + " = (" + task_partition.name + " AS partition_key))"; if (!task_table.where_condition_str.empty()) @@ -1677,7 +1684,7 @@ protected: LOG_DEBUG(log, "Create destination tables. 
Query: " << query); UInt64 shards = executeQueryOnCluster(task_table.cluster_push, query, create_query_push_ast, &task_cluster->settings_push, PoolMode::GET_MANY); - LOG_DEBUG(log, "Destination tables " << getDatabaseDotTable(task_table.table_push) << " have been created on " << shards + LOG_DEBUG(log, "Destination tables " << getQuotedTable(task_table.table_push) << " have been created on " << shards << " shards of " << task_table.cluster_push->getShardCount()); } @@ -1699,7 +1706,7 @@ protected: ASTPtr query_insert_ast; { String query; - query += "INSERT INTO " + getDatabaseDotTable(task_shard.table_split_shard) + " VALUES "; + query += "INSERT INTO " + getQuotedTable(task_shard.table_split_shard) + " VALUES "; ParserQuery p_query(query.data() + query.size()); query_insert_ast = parseQuery(p_query, query, 0); @@ -1824,7 +1831,7 @@ protected: String getRemoteCreateTable(const DatabaseAndTableName & table, Connection & connection, const Settings * settings = nullptr) { - String query = "SHOW CREATE TABLE " + getDatabaseDotTable(table); + String query = "SHOW CREATE TABLE " + getQuotedTable(table); Block block = getBlockWithAllStreamData(std::make_shared( connection, query, InterpreterShowCreateQuery::getSampleBlock(), context, settings)); @@ -1887,7 +1894,7 @@ protected: { WriteBufferFromOwnString wb; wb << "SELECT DISTINCT " << queryToString(task_table.engine_push_partition_key_ast) << " AS partition FROM" - << " " << getDatabaseDotTable(task_shard.table_read_shard) << " ORDER BY partition DESC"; + << " " << getQuotedTable(task_shard.table_read_shard) << " ORDER BY partition DESC"; query = wb.str(); } @@ -1929,7 +1936,7 @@ protected: { WriteBufferFromOwnString wb; wb << "SELECT 1" - << " FROM "<< getDatabaseDotTable(task_shard.table_read_shard) + << " FROM "<< getQuotedTable(task_shard.table_read_shard) << " WHERE " << queryToString(task_table.engine_push_partition_key_ast) << " = " << partition_quoted_name << " LIMIT 1"; query = wb.str(); diff --git a/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp b/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp index 49cfe0271d8..a08a485ea1c 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp +++ b/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp @@ -43,11 +43,11 @@ static IAggregateFunction * createWithExtraTypes(const DataTypePtr & argument_ty else if (which.idx == TypeIndex::DateTime) return new AggregateFunctionGroupUniqArrayDateTime(argument_type, std::forward(args)...); else { - /// Check that we can use plain version of AggreagteFunctionGroupUniqArrayGeneric + /// Check that we can use plain version of AggregateFunctionGroupUniqArrayGeneric if (argument_type->isValueUnambiguouslyRepresentedInContiguousMemoryRegion()) - return new AggreagteFunctionGroupUniqArrayGeneric(argument_type, std::forward(args)...); + return new AggregateFunctionGroupUniqArrayGeneric(argument_type, std::forward(args)...); else - return new AggreagteFunctionGroupUniqArrayGeneric(argument_type, std::forward(args)...); + return new AggregateFunctionGroupUniqArrayGeneric(argument_type, std::forward(args)...); } } diff --git a/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h b/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h index f2b339d2729..7a913c48ffa 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h +++ b/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h @@ -122,7 +122,7 @@ public: /// Generic implementation, it uses serialized 
representation as object descriptor. -struct AggreagteFunctionGroupUniqArrayGenericData +struct AggregateFunctionGroupUniqArrayGenericData { static constexpr size_t INIT_ELEMS = 2; /// adjustable static constexpr size_t ELEM_SIZE = sizeof(HashSetCellWithSavedHash); @@ -132,7 +132,7 @@ struct AggreagteFunctionGroupUniqArrayGenericData }; -/// Helper function for deserialize and insert for the class AggreagteFunctionGroupUniqArrayGeneric +/// Helper function for deserialize and insert for the class AggregateFunctionGroupUniqArrayGeneric template static StringRef getSerializationImpl(const IColumn & column, size_t row_num, Arena & arena); @@ -143,15 +143,15 @@ static void deserializeAndInsertImpl(StringRef str, IColumn & data_to); * For such columns groupUniqArray() can be implemented more efficiently (especially for small numeric arrays). */ template -class AggreagteFunctionGroupUniqArrayGeneric - : public IAggregateFunctionDataHelper> +class AggregateFunctionGroupUniqArrayGeneric + : public IAggregateFunctionDataHelper> { DataTypePtr & input_data_type; static constexpr bool limit_num_elems = Tlimit_num_elem::value; UInt64 max_elems; - using State = AggreagteFunctionGroupUniqArrayGenericData; + using State = AggregateFunctionGroupUniqArrayGenericData; static StringRef getSerialization(const IColumn & column, size_t row_num, Arena & arena) { @@ -164,8 +164,8 @@ class AggreagteFunctionGroupUniqArrayGeneric } public: - AggreagteFunctionGroupUniqArrayGeneric(const DataTypePtr & input_data_type, UInt64 max_elems_ = std::numeric_limits::max()) - : IAggregateFunctionDataHelper>({input_data_type}, {}) + AggregateFunctionGroupUniqArrayGeneric(const DataTypePtr & input_data_type, UInt64 max_elems_ = std::numeric_limits::max()) + : IAggregateFunctionDataHelper>({input_data_type}, {}) , input_data_type(this->argument_types[0]) , max_elems(max_elems_) {} diff --git a/dbms/src/AggregateFunctions/AggregateFunctionSequenceMatch.h b/dbms/src/AggregateFunctions/AggregateFunctionSequenceMatch.h index 017b6d113dc..80860fdb62a 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionSequenceMatch.h +++ b/dbms/src/AggregateFunctions/AggregateFunctionSequenceMatch.h @@ -47,8 +47,7 @@ struct AggregateFunctionSequenceMatchData final using Comparator = ComparePairFirst; bool sorted = true; - static constexpr size_t bytes_in_arena = 64; - PODArray, bytes_in_arena>> events_list; + PODArrayWithStackMemory events_list; void add(const Timestamp timestamp, const Events & events) { @@ -203,8 +202,7 @@ private: PatternAction(const PatternActionType type, const std::uint64_t extra = 0) : type{type}, extra{extra} {} }; - static constexpr size_t bytes_on_stack = 64; - using PatternActions = PODArray, bytes_on_stack>>; + using PatternActions = PODArrayWithStackMemory; Derived & derived() { return static_cast(*this); } diff --git a/dbms/src/AggregateFunctions/AggregateFunctionTimeSeriesGroupSum.h b/dbms/src/AggregateFunctions/AggregateFunctionTimeSeriesGroupSum.h index c74ad8c0bdb..5e2a9b15f4e 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionTimeSeriesGroupSum.h +++ b/dbms/src/AggregateFunctions/AggregateFunctionTimeSeriesGroupSum.h @@ -68,9 +68,8 @@ struct AggregateFunctionTimeSeriesGroupSumData } }; - static constexpr size_t bytes_on_stack = 128; typedef std::map Series; - typedef PODArray, bytes_on_stack>> AggSeries; + typedef PODArrayWithStackMemory AggSeries; Series ss; AggSeries result; diff --git a/dbms/src/AggregateFunctions/AggregateFunctionWindowFunnel.h 
b/dbms/src/AggregateFunctions/AggregateFunctionWindowFunnel.h index 9a738d3fefb..1e3c005f73f 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionWindowFunnel.h +++ b/dbms/src/AggregateFunctions/AggregateFunctionWindowFunnel.h @@ -35,10 +35,7 @@ template struct AggregateFunctionWindowFunnelData { using TimestampEvent = std::pair; - - static constexpr size_t bytes_on_stack = 64; - using TimestampEvents = PODArray, bytes_on_stack>>; - + using TimestampEvents = PODArray; using Comparator = ComparePairFirst; bool sorted = true; diff --git a/dbms/src/AggregateFunctions/QuantileExact.h b/dbms/src/AggregateFunctions/QuantileExact.h index b4398e8bb7f..a5b616669b9 100644 --- a/dbms/src/AggregateFunctions/QuantileExact.h +++ b/dbms/src/AggregateFunctions/QuantileExact.h @@ -27,8 +27,7 @@ struct QuantileExact { /// The memory will be allocated to several elements at once, so that the state occupies 64 bytes. static constexpr size_t bytes_in_arena = 64 - sizeof(PODArray); - - using Array = PODArray, bytes_in_arena>>; + using Array = PODArrayWithStackMemory; Array array; void add(const Value & x) diff --git a/dbms/src/AggregateFunctions/QuantileTDigest.h b/dbms/src/AggregateFunctions/QuantileTDigest.h index e9f261d4c21..f7201ef3b0d 100644 --- a/dbms/src/AggregateFunctions/QuantileTDigest.h +++ b/dbms/src/AggregateFunctions/QuantileTDigest.h @@ -86,8 +86,7 @@ class QuantileTDigest /// The memory will be allocated to several elements at once, so that the state occupies 64 bytes. static constexpr size_t bytes_in_arena = 128 - sizeof(PODArray) - sizeof(Count) - sizeof(UInt32); - - using Summary = PODArray, bytes_in_arena>>; + using Summary = PODArrayWithStackMemory; Summary summary; Count count = 0; diff --git a/dbms/src/AggregateFunctions/ReservoirSampler.h b/dbms/src/AggregateFunctions/ReservoirSampler.h index ad5bf10f48f..30d72709ac2 100644 --- a/dbms/src/AggregateFunctions/ReservoirSampler.h +++ b/dbms/src/AggregateFunctions/ReservoirSampler.h @@ -194,8 +194,7 @@ private: friend void rs_perf_test(); /// We allocate a little memory on the stack - to avoid allocations when there are many objects with a small number of elements. - static constexpr size_t bytes_on_stack = 64; - using Array = DB::PODArray, bytes_on_stack>>; + using Array = DB::PODArrayWithStackMemory; size_t sample_count; size_t total_values = 0; diff --git a/dbms/src/AggregateFunctions/ReservoirSamplerDeterministic.h b/dbms/src/AggregateFunctions/ReservoirSamplerDeterministic.h index c543e662b2a..4beeecd93bc 100644 --- a/dbms/src/AggregateFunctions/ReservoirSamplerDeterministic.h +++ b/dbms/src/AggregateFunctions/ReservoirSamplerDeterministic.h @@ -164,9 +164,8 @@ public: private: /// We allocate some memory on the stack to avoid allocations when there are many objects with a small number of elements. 
- static constexpr size_t bytes_on_stack = 64; using Element = std::pair; - using Array = DB::PODArray, bytes_on_stack>>; + using Array = DB::PODArray; size_t sample_count; size_t total_values{}; diff --git a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp index 9f3e16071a2..cc03965715c 100644 --- a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp +++ b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp @@ -28,6 +28,9 @@ void registerAggregateFunctionTopK(AggregateFunctionFactory &); void registerAggregateFunctionsBitwise(AggregateFunctionFactory &); void registerAggregateFunctionsBitmap(AggregateFunctionFactory &); void registerAggregateFunctionsMaxIntersections(AggregateFunctionFactory &); +void registerAggregateFunctionHistogram(AggregateFunctionFactory &); +void registerAggregateFunctionRetention(AggregateFunctionFactory &); +void registerAggregateFunctionTimeSeriesGroupSum(AggregateFunctionFactory &); void registerAggregateFunctionMLMethod(AggregateFunctionFactory &); void registerAggregateFunctionEntropy(AggregateFunctionFactory &); void registerAggregateFunctionSimpleLinearRegression(AggregateFunctionFactory &); @@ -41,9 +44,6 @@ void registerAggregateFunctionCombinatorMerge(AggregateFunctionCombinatorFactory void registerAggregateFunctionCombinatorNull(AggregateFunctionCombinatorFactory &); void registerAggregateFunctionCombinatorResample(AggregateFunctionCombinatorFactory &); -void registerAggregateFunctionHistogram(AggregateFunctionFactory & factory); -void registerAggregateFunctionRetention(AggregateFunctionFactory & factory); -void registerAggregateFunctionTimeSeriesGroupSum(AggregateFunctionFactory & factory); void registerAggregateFunctions() { { diff --git a/dbms/src/Columns/ColumnVector.cpp b/dbms/src/Columns/ColumnVector.cpp index 6db110ef02e..a2d6de9df80 100644 --- a/dbms/src/Columns/ColumnVector.cpp +++ b/dbms/src/Columns/ColumnVector.cpp @@ -33,7 +33,7 @@ template StringRef ColumnVector::serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const { auto pos = arena.allocContinue(sizeof(T), begin); - unalignedStore(pos, data[n]); + unalignedStore(pos, data[n]); return StringRef(pos, sizeof(T)); } diff --git a/dbms/src/Common/LFAllocator.cpp b/dbms/src/Common/LFAllocator.cpp deleted file mode 100644 index 71396d341ab..00000000000 --- a/dbms/src/Common/LFAllocator.cpp +++ /dev/null @@ -1,53 +0,0 @@ -#include - -#if USE_LFALLOC -#include "LFAllocator.h" - -#include -#include - -namespace DB -{ - -void * LFAllocator::alloc(size_t size, size_t alignment) -{ - if (alignment == 0) - return LFAlloc(size); - else - { - void * ptr; - int res = LFPosixMemalign(&ptr, alignment, size); - return res ? nullptr : ptr; - } -} - -void LFAllocator::free(void * buf, size_t) -{ - LFFree(buf); -} - -void * LFAllocator::realloc(void * old_ptr, size_t, size_t new_size, size_t alignment) -{ - if (old_ptr == nullptr) - { - void * result = LFAllocator::alloc(new_size, alignment); - return result; - } - if (new_size == 0) - { - LFFree(old_ptr); - return nullptr; - } - - void * new_ptr = LFAllocator::alloc(new_size, alignment); - if (new_ptr == nullptr) - return nullptr; - size_t old_size = LFGetSize(old_ptr); - memcpy(new_ptr, old_ptr, ((old_size < new_size) ? 
old_size : new_size)); -LFFree(old_ptr); - return new_ptr; -} - -} - -#endif diff --git a/dbms/src/Common/LFAllocator.h b/dbms/src/Common/LFAllocator.h deleted file mode 100644 index f2a10cc4508..00000000000 --- a/dbms/src/Common/LFAllocator.h +++ /dev/null @@ -1,22 +0,0 @@ -#pragma once - -#include - -#if !USE_LFALLOC -#error "do not include this file until USE_LFALLOC is set to 1" -#endif - -#include - -namespace DB -{ -struct LFAllocator -{ - static void * alloc(size_t size, size_t alignment = 0); - - static void free(void * buf, size_t); - - static void * realloc(void * buf, size_t, size_t new_size, size_t alignment = 0); -}; - -} diff --git a/dbms/src/Common/MiAllocator.cpp b/dbms/src/Common/MiAllocator.cpp new file mode 100644 index 00000000000..456609374ee --- /dev/null +++ b/dbms/src/Common/MiAllocator.cpp @@ -0,0 +1,43 @@ +#include + +#if USE_MIMALLOC + +#include "MiAllocator.h" +#include + +namespace DB +{ + +void * MiAllocator::alloc(size_t size, size_t alignment) +{ + if (alignment == 0) + return mi_malloc(size); + else + return mi_malloc_aligned(size, alignment); +} + +void MiAllocator::free(void * buf, size_t) +{ + mi_free(buf); +} + +void * MiAllocator::realloc(void * old_ptr, size_t, size_t new_size, size_t alignment) +{ + if (old_ptr == nullptr) + return alloc(new_size, alignment); + + if (new_size == 0) + { + mi_free(old_ptr); + return nullptr; + } + + if (alignment == 0) + return mi_realloc(old_ptr, new_size); + + return mi_realloc_aligned(old_ptr, new_size, alignment); +} + +} + +#endif diff --git a/dbms/src/Common/MiAllocator.h b/dbms/src/Common/MiAllocator.h new file mode 100644 index 00000000000..48cfc6f9ab4 --- /dev/null +++ b/dbms/src/Common/MiAllocator.h @@ -0,0 +1,28 @@ +#pragma once + +#include + +#if !USE_MIMALLOC +#error "do not include this file unless USE_MIMALLOC is set to 1" +#endif + +#include + +namespace DB +{ + +/* + * This is a different allocator that is based on mimalloc (Microsoft malloc). + * It can be used separately from the main allocator to catch heap corruptions and vulnerabilities (for example, for caches). + * We use MI_SECURE mode in mimalloc to achieve such behaviour. + */ +struct MiAllocator +{ + static void * alloc(size_t size, size_t alignment = 0); + + static void free(void * buf, size_t); + + static void * realloc(void * old_ptr, size_t, size_t new_size, size_t alignment = 0); +}; + +} diff --git a/dbms/src/Common/PODArray.h b/dbms/src/Common/PODArray.h index 0e7d547a7d0..01085a2c5a7 100644 --- a/dbms/src/Common/PODArray.h +++ b/dbms/src/Common/PODArray.h @@ -45,7 +45,7 @@ inline constexpr size_t integerRoundUp(size_t value, size_t dividend) * Only part of the std::vector interface is supported. * * The default constructor creates an empty object that does not allocate memory. - * Then the memory is allocated at least INITIAL_SIZE bytes. + * Then the memory is allocated at least initial_bytes bytes. * * If you insert elements with push_back, without making a `reserve`, then PODArray is about 2.5 times faster than std::vector. * * @@ -74,7 +74,7 @@ extern const char EmptyPODArray[EmptyPODArraySize]; /** Base class that depend only on size of element, not on element itself. * You can static_cast to this class if you want to insert some data regardless to the actual type T. 
*/ -template +template class PODArrayBase : private boost::noncopyable, private TAllocator /// empty base optimization { protected: @@ -161,7 +161,8 @@ protected: { // The allocated memory should be multiplication of ELEMENT_SIZE to hold the element, otherwise, // memory issue such as corruption could appear in edge case. - realloc(std::max(((INITIAL_SIZE - 1) / ELEMENT_SIZE + 1) * ELEMENT_SIZE, minimum_memory_for_elements(1)), + realloc(std::max(integerRoundUp(initial_bytes, ELEMENT_SIZE), + minimum_memory_for_elements(1)), std::forward(allocator_params)...); } else @@ -257,11 +258,11 @@ public: } }; -template , size_t pad_right_ = 0, size_t pad_left_ = 0> -class PODArray : public PODArrayBase +template , size_t pad_right_ = 0, size_t pad_left_ = 0> +class PODArray : public PODArrayBase { protected: - using Base = PODArrayBase; + using Base = PODArrayBase; T * t_start() { return reinterpret_cast(this->c_start); } T * t_end() { return reinterpret_cast(this->c_end); } @@ -618,17 +619,23 @@ public: } }; -template -void swap(PODArray & lhs, PODArray & rhs) +template +void swap(PODArray & lhs, PODArray & rhs) { lhs.swap(rhs); } /** For columns. Padding is enough to read and write xmm-register at the address of the last element. */ -template > -using PaddedPODArray = PODArray; +template > +using PaddedPODArray = PODArray; -template -using PODArrayWithStackMemory = PODArray, integerRoundUp(stack_size_in_bytes, sizeof(T))>>; +/** A helper for declaring PODArray that uses inline memory. + * The initial size is set to use all the inline bytes, since using less would + * only add some extra allocation calls. + */ +template +using PODArrayWithStackMemory = PODArray, rounded_bytes>>; } diff --git a/dbms/src/Common/ThreadPool.cpp b/dbms/src/Common/ThreadPool.cpp index 6ed350240c6..91ec29dc188 100644 --- a/dbms/src/Common/ThreadPool.cpp +++ b/dbms/src/Common/ThreadPool.cpp @@ -30,10 +30,18 @@ template template ReturnType ThreadPoolImpl::scheduleImpl(Job job, int priority, std::optional wait_microseconds) { - auto on_error = [] + auto on_error = [&] { if constexpr (std::is_same_v) + { + if (first_exception) + { + std::exception_ptr exception; + std::swap(exception, first_exception); + std::rethrow_exception(exception); + } throw DB::Exception("Cannot schedule a task", DB::ErrorCodes::CANNOT_SCHEDULE_TASK); + } else return false; }; diff --git a/dbms/src/Common/config.h.in b/dbms/src/Common/config.h.in index 08d8e7e9af1..9b38dd9fc04 100644 --- a/dbms/src/Common/config.h.in +++ b/dbms/src/Common/config.h.in @@ -8,7 +8,6 @@ #cmakedefine01 USE_CPUID #cmakedefine01 USE_CPUINFO #cmakedefine01 USE_BROTLI -#cmakedefine01 USE_LFALLOC -#cmakedefine01 USE_LFALLOC_RANDOM_HINT +#cmakedefine01 USE_MIMALLOC #cmakedefine01 CLICKHOUSE_SPLIT_BINARY diff --git a/dbms/src/Common/formatIPv6.cpp b/dbms/src/Common/formatIPv6.cpp index 71f6c934a15..f8100557ba2 100644 --- a/dbms/src/Common/formatIPv6.cpp +++ b/dbms/src/Common/formatIPv6.cpp @@ -10,7 +10,8 @@ namespace DB { // To be used in formatIPv4, maps a byte to it's string form prefixed with length (so save strlen call). 
-extern const char one_byte_to_string_lookup_table[256][4] = { +extern const char one_byte_to_string_lookup_table[256][4] = +{ {1, '0'}, {1, '1'}, {1, '2'}, {1, '3'}, {1, '4'}, {1, '5'}, {1, '6'}, {1, '7'}, {1, '8'}, {1, '9'}, {2, '1', '0'}, {2, '1', '1'}, {2, '1', '2'}, {2, '1', '3'}, {2, '1', '4'}, {2, '1', '5'}, {2, '1', '6'}, {2, '1', '7'}, {2, '1', '8'}, {2, '1', '9'}, {2, '2', '0'}, {2, '2', '1'}, {2, '2', '2'}, {2, '2', '3'}, {2, '2', '4'}, {2, '2', '5'}, {2, '2', '6'}, {2, '2', '7'}, {2, '2', '8'}, {2, '2', '9'}, @@ -152,7 +153,7 @@ void formatIPv6(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_ } /// Was it a trailing run of 0x00's? - if (best.base != -1 && (best.base + best.len) == words.size()) + if (best.base != -1 && size_t(best.base + best.len) == words.size()) *dst++ = ':'; *dst++ = '\0'; diff --git a/dbms/src/Common/tests/CMakeLists.txt b/dbms/src/Common/tests/CMakeLists.txt index 1c6c7e9f504..23b1614e704 100644 --- a/dbms/src/Common/tests/CMakeLists.txt +++ b/dbms/src/Common/tests/CMakeLists.txt @@ -41,9 +41,6 @@ target_link_libraries (compact_array PRIVATE clickhouse_common_io ${Boost_FILESY add_executable (radix_sort radix_sort.cpp) target_link_libraries (radix_sort PRIVATE clickhouse_common_io) -add_executable (shell_command_test shell_command_test.cpp) -target_link_libraries (shell_command_test PRIVATE clickhouse_common_io) - add_executable (arena_with_free_lists arena_with_free_lists.cpp) target_link_libraries (arena_with_free_lists PRIVATE clickhouse_compression clickhouse_common_io) @@ -53,15 +50,6 @@ target_link_libraries (pod_array PRIVATE clickhouse_common_io) add_executable (thread_creation_latency thread_creation_latency.cpp) target_link_libraries (thread_creation_latency PRIVATE clickhouse_common_io) -add_executable (thread_pool thread_pool.cpp) -target_link_libraries (thread_pool PRIVATE clickhouse_common_io) - -add_executable (thread_pool_2 thread_pool_2.cpp) -target_link_libraries (thread_pool_2 PRIVATE clickhouse_common_io) - -add_executable (thread_pool_3 thread_pool_3.cpp) -target_link_libraries (thread_pool_3 PRIVATE clickhouse_common_io) - add_executable (multi_version multi_version.cpp) target_link_libraries (multi_version PRIVATE clickhouse_common_io) add_check(multi_version) diff --git a/dbms/src/Common/tests/gtest_shell_command.cpp b/dbms/src/Common/tests/gtest_shell_command.cpp new file mode 100644 index 00000000000..2378cda2ee7 --- /dev/null +++ b/dbms/src/Common/tests/gtest_shell_command.cpp @@ -0,0 +1,72 @@ +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#pragma GCC diagnostic ignored "-Wsign-compare" +#ifdef __clang__ + #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant" + #pragma clang diagnostic ignored "-Wundef" +#endif +#include + + +using namespace DB; + + +TEST(ShellCommand, Execute) +{ + auto command = ShellCommand::execute("echo 'Hello, world!'"); + + std::string res; + readStringUntilEOF(res, command->out); + command->wait(); + + EXPECT_EQ(res, "Hello, world!\n"); +} + +TEST(ShellCommand, ExecuteDirect) +{ + auto command = ShellCommand::executeDirect("/bin/echo", {"Hello, world!"}); + + std::string res; + readStringUntilEOF(res, command->out); + command->wait(); + + EXPECT_EQ(res, "Hello, world!\n"); +} + +TEST(ShellCommand, ExecuteWithInput) +{ + auto command = ShellCommand::execute("cat"); + + String in_str = "Hello, world!\n"; + ReadBufferFromString in(in_str); + copyData(in, command->in); + command->in.close(); + + std::string res; + 
readStringUntilEOF(res, command->out); + command->wait(); + + EXPECT_EQ(res, "Hello, world!\n"); +} + +TEST(ShellCommand, AutoWait) +{ + // hunting: + for (int i = 0; i < 1000; ++i) + { + auto command = ShellCommand::execute("echo " + std::to_string(i)); + //command->wait(); // now automatic + } + + // std::cerr << "inspect me: ps auxwwf" << "\n"; + // std::this_thread::sleep_for(std::chrono::seconds(100)); +} diff --git a/dbms/src/Common/tests/thread_pool.cpp b/dbms/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp similarity index 73% rename from dbms/src/Common/tests/thread_pool.cpp rename to dbms/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp index 23dba2aadec..1e38e418a22 100644 --- a/dbms/src/Common/tests/thread_pool.cpp +++ b/dbms/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp @@ -1,11 +1,18 @@ #include +#pragma GCC diagnostic ignored "-Wsign-compare" +#ifdef __clang__ + #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant" + #pragma clang diagnostic ignored "-Wundef" +#endif +#include + /** Reproduces bug in ThreadPool. * It get stuck if we call 'wait' many times from many other threads simultaneously. */ -int main(int, char **) +TEST(ThreadPool, ConcurrentWait) { auto worker = [] { @@ -29,6 +36,4 @@ int main(int, char **) waiting_pool.schedule([&pool]{ pool.wait(); }); waiting_pool.wait(); - - return 0; } diff --git a/dbms/src/Common/tests/gtest_thread_pool_limit.cpp b/dbms/src/Common/tests/gtest_thread_pool_limit.cpp new file mode 100644 index 00000000000..2bd38f34d10 --- /dev/null +++ b/dbms/src/Common/tests/gtest_thread_pool_limit.cpp @@ -0,0 +1,32 @@ +#include +#include +#include + +#pragma GCC diagnostic ignored "-Wsign-compare" +#ifdef __clang__ + #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant" + #pragma clang diagnostic ignored "-Wundef" +#endif +#include + +/// Test for thread self-removal when number of free threads in pool is too large. +/// Just checks that nothing weird happens. 
+ +template +int test() +{ + Pool pool(10, 2, 10); + + std::atomic counter{0}; + for (size_t i = 0; i < 10; ++i) + pool.schedule([&]{ ++counter; }); + pool.wait(); + + return counter; +} + +TEST(ThreadPool, ThreadRemoval) +{ + EXPECT_EQ(test(), 10); + EXPECT_EQ(test(), 10); +} diff --git a/dbms/src/Common/tests/thread_pool_2.cpp b/dbms/src/Common/tests/gtest_thread_pool_loop.cpp similarity index 50% rename from dbms/src/Common/tests/thread_pool_2.cpp rename to dbms/src/Common/tests/gtest_thread_pool_loop.cpp index 029c3695e36..80b7b94d988 100644 --- a/dbms/src/Common/tests/thread_pool_2.cpp +++ b/dbms/src/Common/tests/gtest_thread_pool_loop.cpp @@ -2,10 +2,17 @@ #include #include +#pragma GCC diagnostic ignored "-Wsign-compare" +#ifdef __clang__ + #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant" + #pragma clang diagnostic ignored "-Wundef" +#endif +#include -int main(int, char **) + +TEST(ThreadPool, Loop) { - std::atomic res{0}; + std::atomic res{0}; for (size_t i = 0; i < 1000; ++i) { @@ -16,6 +23,5 @@ int main(int, char **) pool.wait(); } - std::cerr << res << "\n"; - return 0; + EXPECT_EQ(res, 16000); } diff --git a/dbms/src/Common/tests/gtest_thread_pool_schedule_exception.cpp b/dbms/src/Common/tests/gtest_thread_pool_schedule_exception.cpp new file mode 100644 index 00000000000..001d9c30b27 --- /dev/null +++ b/dbms/src/Common/tests/gtest_thread_pool_schedule_exception.cpp @@ -0,0 +1,38 @@ +#include +#include +#include + +#pragma GCC diagnostic ignored "-Wsign-compare" +#ifdef __clang__ + #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant" + #pragma clang diagnostic ignored "-Wundef" +#endif +#include + + +bool check() +{ + ThreadPool pool(10); + + pool.schedule([]{ throw std::runtime_error("Hello, world!"); }); + + try + { + for (size_t i = 0; i < 100; ++i) + pool.schedule([]{}); /// An exception will be rethrown from this method. + } + catch (const std::runtime_error &) + { + return true; + } + + pool.wait(); + + return false; +} + + +TEST(ThreadPool, ExceptionFromSchedule) +{ + EXPECT_TRUE(check()); +} diff --git a/dbms/src/Common/tests/shell_command_test.cpp b/dbms/src/Common/tests/shell_command_test.cpp deleted file mode 100644 index 7de6c18bfdf..00000000000 --- a/dbms/src/Common/tests/shell_command_test.cpp +++ /dev/null @@ -1,63 +0,0 @@ -#include -#include -#include -#include -#include -#include - -#include -#include - -using namespace DB; - - -int main(int, char **) -try -{ - { - auto command = ShellCommand::execute("echo 'Hello, world!'"); - - WriteBufferFromFileDescriptor out(STDOUT_FILENO); - copyData(command->out, out); - - command->wait(); - } - - { - auto command = ShellCommand::executeDirect("/bin/echo", {"Hello, world!"}); - - WriteBufferFromFileDescriptor out(STDOUT_FILENO); - copyData(command->out, out); - - command->wait(); - } - - { - auto command = ShellCommand::execute("cat"); - - String in_str = "Hello, world!\n"; - ReadBufferFromString in(in_str); - copyData(in, command->in); - command->in.close(); - - WriteBufferFromFileDescriptor out(STDOUT_FILENO); - copyData(command->out, out); - - command->wait(); - } - - // hunting: - for (int i = 0; i < 1000; ++i) - { - auto command = ShellCommand::execute("echo " + std::to_string(i)); - //command->wait(); // now automatic - } - - // std::cerr << "inspect me: ps auxwwf" << "\n"; - // std::this_thread::sleep_for(std::chrono::seconds(100)); -} -catch (...) 
-{ - std::cerr << getCurrentExceptionMessage(false) << "\n"; - return 1; -} diff --git a/dbms/src/Common/tests/thread_pool_3.cpp b/dbms/src/Common/tests/thread_pool_3.cpp deleted file mode 100644 index 924895de308..00000000000 --- a/dbms/src/Common/tests/thread_pool_3.cpp +++ /dev/null @@ -1,27 +0,0 @@ -#include -#include -#include - -/// Test for thread self-removal when number of free threads in pool is too large. -/// Just checks that nothing weird happens. - -template -void test() -{ - Pool pool(10, 2, 10); - - std::mutex mutex; - for (size_t i = 0; i < 10; ++i) - pool.schedule([&]{ std::lock_guard lock(mutex); std::cerr << '.'; }); - pool.wait(); -} - -int main(int, char **) -{ - test(); - std::cerr << '\n'; - test(); - std::cerr << '\n'; - - return 0; -} diff --git a/dbms/src/Compression/CompressionCodecDelta.cpp b/dbms/src/Compression/CompressionCodecDelta.cpp index 1a37b95d712..9f2397f8e59 100644 --- a/dbms/src/Compression/CompressionCodecDelta.cpp +++ b/dbms/src/Compression/CompressionCodecDelta.cpp @@ -48,7 +48,7 @@ void compressDataForType(const char * source, UInt32 source_size, char * dest) while (source < source_end) { T curr_src = unalignedLoad(source); - unalignedStore(dest, curr_src - prev_src); + unalignedStore(dest, curr_src - prev_src); prev_src = curr_src; source += sizeof(T); @@ -67,7 +67,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) while (source < source_end) { accumulator += unalignedLoad(source); - unalignedStore(dest, accumulator); + unalignedStore(dest, accumulator); source += sizeof(T); dest += sizeof(T); diff --git a/dbms/src/Compression/CompressionCodecDoubleDelta.cpp b/dbms/src/Compression/CompressionCodecDoubleDelta.cpp index b40b2abccfa..8f306f3f06a 100644 --- a/dbms/src/Compression/CompressionCodecDoubleDelta.cpp +++ b/dbms/src/Compression/CompressionCodecDoubleDelta.cpp @@ -90,7 +90,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) const char * source_end = source + source_size; const UInt32 items_count = source_size / sizeof(T); - unalignedStore(dest, items_count); + unalignedStore(dest, items_count); dest += sizeof(items_count); T prev_value{}; @@ -99,7 +99,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) if (source < source_end) { prev_value = unalignedLoad(source); - unalignedStore(dest, prev_value); + unalignedStore(dest, prev_value); source += sizeof(prev_value); dest += sizeof(prev_value); @@ -109,7 +109,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) { const T curr_value = unalignedLoad(source); prev_delta = static_cast(curr_value - prev_value); - unalignedStore(dest, prev_delta); + unalignedStore(dest, prev_delta); source += sizeof(curr_value); dest += sizeof(prev_delta); @@ -164,7 +164,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) if (source < source_end) { prev_value = unalignedLoad(source); - unalignedStore(dest, prev_value); + unalignedStore(dest, prev_value); source += sizeof(prev_value); dest += sizeof(prev_value); @@ -174,7 +174,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) { prev_delta = unalignedLoad(source); prev_value = static_cast(prev_value + prev_delta); - unalignedStore(dest, prev_value); + unalignedStore(dest, prev_value); source += sizeof(prev_delta); dest += sizeof(prev_value); @@ -209,7 +209,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) // else if first bit is 
zero, no need to read more data. const T curr_value = static_cast(prev_value + prev_delta + double_delta); - unalignedStore(dest, curr_value); + unalignedStore(dest, curr_value); dest += sizeof(curr_value); prev_delta = curr_value - prev_value; diff --git a/dbms/src/Compression/CompressionCodecGorilla.cpp b/dbms/src/Compression/CompressionCodecGorilla.cpp index f9c6b52756c..79cc6d27e81 100644 --- a/dbms/src/Compression/CompressionCodecGorilla.cpp +++ b/dbms/src/Compression/CompressionCodecGorilla.cpp @@ -94,7 +94,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) const UInt32 items_count = source_size / sizeof(T); - unalignedStore(dest, items_count); + unalignedStore(dest, items_count); dest += sizeof(items_count); T prev_value{}; @@ -104,7 +104,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) if (source < source_end) { prev_value = unalignedLoad(source); - unalignedStore(dest, prev_value); + unalignedStore(dest, prev_value); source += sizeof(prev_value); dest += sizeof(prev_value); @@ -166,7 +166,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) if (source < source_end) { prev_value = unalignedLoad(source); - unalignedStore(dest, prev_value); + unalignedStore(dest, prev_value); source += sizeof(prev_value); dest += sizeof(prev_value); @@ -210,7 +210,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) } // else: 0b0 prefix - use prev_value - unalignedStore(dest, curr_value); + unalignedStore(dest, curr_value); dest += sizeof(curr_value); prev_xored_info = curr_xored_info; diff --git a/dbms/src/Compression/CompressionCodecT64.cpp b/dbms/src/Compression/CompressionCodecT64.cpp index cd369fc9c4e..9919f5322c5 100644 --- a/dbms/src/Compression/CompressionCodecT64.cpp +++ b/dbms/src/Compression/CompressionCodecT64.cpp @@ -390,7 +390,7 @@ void decompressData(const char * src, UInt32 bytes_size, char * dst, UInt32 unco { _T min_value = min; for (UInt32 i = 0; i < num_elements; ++i, dst += sizeof(_T)) - unalignedStore(dst, min_value); + unalignedStore<_T>(dst, min_value); return; } diff --git a/dbms/src/Compression/LZ4_decompress_faster.cpp b/dbms/src/Compression/LZ4_decompress_faster.cpp index 387650d3dcc..0d65a06b098 100644 --- a/dbms/src/Compression/LZ4_decompress_faster.cpp +++ b/dbms/src/Compression/LZ4_decompress_faster.cpp @@ -200,7 +200,7 @@ inline void copyOverlap8Shuffle(UInt8 * op, const UInt8 *& match, const size_t o 0, 1, 2, 3, 4, 5, 6, 0, }; - unalignedStore(op, vtbl1_u8(unalignedLoad(match), unalignedLoad(masks + 8 * offset))); + unalignedStore(op, vtbl1_u8(unalignedLoad(match), unalignedLoad(masks + 8 * offset))); match += masks[offset]; } @@ -328,10 +328,10 @@ inline void copyOverlap16Shuffle(UInt8 * op, const UInt8 *& match, const size_t 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, }; - unalignedStore(op, + unalignedStore(op, vtbl2_u8(unalignedLoad(match), unalignedLoad(masks + 16 * offset))); - unalignedStore(op + 8, + unalignedStore(op + 8, vtbl2_u8(unalignedLoad(match), unalignedLoad(masks + 16 * offset + 8))); match += masks[offset]; diff --git a/dbms/src/Core/Settings.h b/dbms/src/Core/Settings.h index c01b110970c..fe65788656b 100644 --- a/dbms/src/Core/Settings.h +++ b/dbms/src/Core/Settings.h @@ -323,7 +323,7 @@ struct Settings : public SettingsCollection M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. 
Otherwise NULLs will be substituted with default values. Currently supported only for 'mysql' table function.") \ M(SettingBool, allow_experimental_data_skipping_indices, false, "If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.") \ \ - M(SettingBool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.") \ + M(SettingBool, allow_hyperscan, 1, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.") \ M(SettingBool, allow_simdjson, 1, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.") \ \ M(SettingUInt64, max_partitions_per_insert_block, 100, "Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.") diff --git a/dbms/src/DataStreams/MarkInCompressedFile.h b/dbms/src/DataStreams/MarkInCompressedFile.h index ff07b2afbe1..a5970a89738 100644 --- a/dbms/src/DataStreams/MarkInCompressedFile.h +++ b/dbms/src/DataStreams/MarkInCompressedFile.h @@ -7,8 +7,8 @@ #include #include -#if USE_LFALLOC -#include +#if USE_MIMALLOC +#include #endif namespace DB @@ -43,8 +43,8 @@ struct MarkInCompressedFile } }; -#if USE_LFALLOC -using MarksInCompressedFile = PODArray; +#if USE_MIMALLOC +using MarksInCompressedFile = PODArray; #else using MarksInCompressedFile = PODArray; #endif diff --git a/dbms/src/Dictionaries/Embedded/TechDataHierarchy.cpp b/dbms/src/Dictionaries/Embedded/TechDataHierarchy.cpp deleted file mode 100644 index fc6a373efe3..00000000000 --- a/dbms/src/Dictionaries/Embedded/TechDataHierarchy.cpp +++ /dev/null @@ -1,60 +0,0 @@ -#include "config_core.h" -#if USE_MYSQL - -# include "TechDataHierarchy.h" - -# include -# include - - -static constexpr auto config_key = "mysql_metrica"; - - -void TechDataHierarchy::reload() -{ - Logger * log = &Logger::get("TechDataHierarchy"); - LOG_DEBUG(log, "Loading tech data hierarchy."); - - mysqlxx::PoolWithFailover pool(config_key); - mysqlxx::Pool::Entry conn = pool.Get(); - - { - mysqlxx::Query q = conn->query("SELECT Id, COALESCE(Parent_Id, 0) FROM OS2"); - LOG_TRACE(log, q.str()); - mysqlxx::UseQueryResult res = q.use(); - while (mysqlxx::Row row = res.fetch()) - { - UInt64 child = row[0].getUInt(); - UInt64 parent = row[1].getUInt(); - - if (child > 255 || parent > 255) - throw Poco::Exception("Too large OS id (> 255)."); - - os_parent[child] = parent; - } - } - - { - mysqlxx::Query q = conn->query("SELECT Id, COALESCE(ParentId, 0) FROM SearchEngines"); - LOG_TRACE(log, q.str()); - mysqlxx::UseQueryResult res = q.use(); - while (mysqlxx::Row row = res.fetch()) - { - UInt64 child = row[0].getUInt(); - UInt64 parent = row[1].getUInt(); - - if (child > 255 || parent > 255) - throw Poco::Exception("Too large search engine id (> 255)."); - - se_parent[child] = parent; - } - } -} - - -bool TechDataHierarchy::isConfigured(const Poco::Util::AbstractConfiguration & config) -{ - return config.has(config_key); -} - -#endif diff --git a/dbms/src/Dictionaries/Embedded/TechDataHierarchy.h b/dbms/src/Dictionaries/Embedded/TechDataHierarchy.h deleted file mode 100644 index 887cdf9170c..00000000000 --- a/dbms/src/Dictionaries/Embedded/TechDataHierarchy.h +++ /dev/null @@ -1,76 +0,0 @@ -#pragma once - 
-#include -#include - -namespace Poco -{ -namespace Util -{ - class AbstractConfiguration; -} - -class Logger; -} - - -/** @brief Class that lets you know if a search engine or operating system belongs - * another search engine or operating system, respectively. - * Information about the hierarchy of regions is downloaded from the database. - */ -class TechDataHierarchy -{ -private: - UInt8 os_parent[256]{}; - UInt8 se_parent[256]{}; - -public: - void reload(); - - /// Has corresponding section in configuration file. - static bool isConfigured(const Poco::Util::AbstractConfiguration & config); - - - /// The "belongs" relation. - bool isOSIn(UInt8 lhs, UInt8 rhs) const - { - while (lhs != rhs && os_parent[lhs]) - lhs = os_parent[lhs]; - - return lhs == rhs; - } - - bool isSEIn(UInt8 lhs, UInt8 rhs) const - { - while (lhs != rhs && se_parent[lhs]) - lhs = se_parent[lhs]; - - return lhs == rhs; - } - - - UInt8 OSToParent(UInt8 x) const { return os_parent[x]; } - - UInt8 SEToParent(UInt8 x) const { return se_parent[x]; } - - - /// To the topmost ancestor. - UInt8 OSToMostAncestor(UInt8 x) const - { - while (os_parent[x]) - x = os_parent[x]; - return x; - } - - UInt8 SEToMostAncestor(UInt8 x) const - { - while (se_parent[x]) - x = se_parent[x]; - return x; - } -}; - - -class TechDataHierarchySingleton : public ext::singleton, public TechDataHierarchy -{ -}; diff --git a/dbms/src/Formats/JSONEachRowRowInputStream.h b/dbms/src/Formats/JSONEachRowRowInputStream.h index 4a915d6aa9d..726b63b084e 100644 --- a/dbms/src/Formats/JSONEachRowRowInputStream.h +++ b/dbms/src/Formats/JSONEachRowRowInputStream.h @@ -13,7 +13,7 @@ class ReadBuffer; /** A stream for reading data in JSON format, where each row is represented by a separate JSON object. - * Objects can be separated by feed return, other whitespace characters in any number and possibly a comma. + * Objects can be separated by line feed, other whitespace characters in any number and possibly a comma. * Fields can be listed in any order (including, in different lines there may be different order), * and some fields may be missing. 
*/ diff --git a/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp b/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp index 2aa212d1e0a..63136be6790 100644 --- a/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp +++ b/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp @@ -16,15 +16,6 @@ void registerFunctionsEmbeddedDictionaries(FunctionFactory & factory) factory.registerFunction(); factory.registerFunction(); factory.registerFunction(); - -#if USE_MYSQL - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); -#endif } } diff --git a/dbms/src/Functions/FunctionsEmbeddedDictionaries.h b/dbms/src/Functions/FunctionsEmbeddedDictionaries.h index 239a2f167ed..cb226fec18c 100644 --- a/dbms/src/Functions/FunctionsEmbeddedDictionaries.h +++ b/dbms/src/Functions/FunctionsEmbeddedDictionaries.h @@ -18,11 +18,6 @@ #include #include -#include "config_core.h" -#if USE_MYSQL -#include -#endif - namespace DB { @@ -98,41 +93,6 @@ struct RegionHierarchyImpl }; -#if USE_MYSQL - -struct OSToRootImpl -{ - static UInt8 apply(UInt8 x, const TechDataHierarchy & hierarchy) { return hierarchy.OSToMostAncestor(x); } -}; - -struct SEToRootImpl -{ - static UInt8 apply(UInt8 x, const TechDataHierarchy & hierarchy) { return hierarchy.SEToMostAncestor(x); } -}; - -struct OSInImpl -{ - static bool apply(UInt32 x, UInt32 y, const TechDataHierarchy & hierarchy) { return hierarchy.isOSIn(x, y); } -}; - -struct SEInImpl -{ - static bool apply(UInt32 x, UInt32 y, const TechDataHierarchy & hierarchy) { return hierarchy.isSEIn(x, y); } -}; - -struct OSHierarchyImpl -{ - static UInt8 toParent(UInt8 x, const TechDataHierarchy & hierarchy) { return hierarchy.OSToParent(x); } -}; - -struct SEHierarchyImpl -{ - static UInt8 toParent(UInt8 x, const TechDataHierarchy & hierarchy) { return hierarchy.SEToParent(x); } -}; - -#endif - - /** Auxiliary thing, allowing to get from the dictionary a specific dictionary, corresponding to the point of view * (the dictionary key passed as function argument). * Example: when calling regionToCountry(x, 'ua'), a dictionary can be used, in which Crimea refers to Ukraine. 
@@ -515,18 +475,6 @@ struct NameRegionHierarchy { static constexpr auto name = "regionHie struct NameRegionIn { static constexpr auto name = "regionIn"; }; -#if USE_MYSQL - -struct NameOSToRoot { static constexpr auto name = "OSToRoot"; }; -struct NameSEToRoot { static constexpr auto name = "SEToRoot"; }; -struct NameOSIn { static constexpr auto name = "OSIn"; }; -struct NameSEIn { static constexpr auto name = "SEIn"; }; -struct NameOSHierarchy { static constexpr auto name = "OSHierarchy"; }; -struct NameSEHierarchy { static constexpr auto name = "SEHierarchy"; }; - -#endif - - struct FunctionRegionToCity : public FunctionTransformWithDictionary { @@ -609,65 +557,6 @@ struct FunctionRegionHierarchy : }; -#if USE_MYSQL - -struct FunctionOSToRoot : - public FunctionTransformWithDictionary, NameOSToRoot> -{ - static FunctionPtr create(const Context & context) - { - return std::make_shared(context.getEmbeddedDictionaries().getTechDataHierarchy()); - } -}; - -struct FunctionSEToRoot : - public FunctionTransformWithDictionary, NameSEToRoot> -{ - static FunctionPtr create(const Context & context) - { - return std::make_shared(context.getEmbeddedDictionaries().getTechDataHierarchy()); - } -}; - -struct FunctionOSIn : - public FunctionIsInWithDictionary, NameOSIn> -{ - static FunctionPtr create(const Context & context) - { - return std::make_shared(context.getEmbeddedDictionaries().getTechDataHierarchy()); - } -}; - -struct FunctionSEIn : - public FunctionIsInWithDictionary, NameSEIn> -{ - static FunctionPtr create(const Context & context) - { - return std::make_shared(context.getEmbeddedDictionaries().getTechDataHierarchy()); - } -}; - -struct FunctionOSHierarchy : - public FunctionHierarchyWithDictionary, NameOSHierarchy> -{ - static FunctionPtr create(const Context & context) - { - return std::make_shared(context.getEmbeddedDictionaries().getTechDataHierarchy()); - } -}; - -struct FunctionSEHierarchy : - public FunctionHierarchyWithDictionary, NameSEHierarchy> -{ - static FunctionPtr create(const Context & context) - { - return std::make_shared(context.getEmbeddedDictionaries().getTechDataHierarchy()); - } -}; - -#endif - - /// Converts a region's numeric identifier to a name in the specified language using a dictionary. class FunctionRegionToName : public IFunction { diff --git a/dbms/src/Functions/FunctionsRandom.cpp b/dbms/src/Functions/FunctionsRandom.cpp index ede8c332d18..19b2f08cdba 100644 --- a/dbms/src/Functions/FunctionsRandom.cpp +++ b/dbms/src/Functions/FunctionsRandom.cpp @@ -57,10 +57,10 @@ void RandImpl::execute(char * output, size_t size) for (const char * end = output + size; output < end; output += 16) { - unalignedStore(output, generator0.next()); - unalignedStore(output + 4, generator1.next()); - unalignedStore(output + 8, generator2.next()); - unalignedStore(output + 12, generator3.next()); + unalignedStore(output, generator0.next()); + unalignedStore(output + 4, generator1.next()); + unalignedStore(output + 8, generator2.next()); + unalignedStore(output + 12, generator3.next()); } /// It is guaranteed (by PaddedPODArray) that we can overwrite up to 15 bytes after end. 
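The `FunctionsRandom.cpp` hunk above (its `unalignedStore` template arguments were lost in this copy of the patch) fills the output buffer 16 bytes per iteration from four independent generators, relying on PaddedPODArray's guarantee that up to 15 bytes past the end are writable. A standalone sketch of the same fill pattern, with a trivial LCG standing in for ClickHouse's actual generators:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <iostream>

/// Unaligned 32-bit store; the portable equivalent of an unalignedStore<UInt32>.
static void unalignedStore32(char * address, uint32_t value)
{
    std::memcpy(address, &value, sizeof(value));
}

/// Trivial LCG; only a stand-in for the generators used in the real code.
struct Lcg
{
    uint32_t state;
    uint32_t next() { return state = state * 1664525u + 1013904223u; }
};

/// Fill `size` bytes, 16 bytes per iteration from four independent generators.
/// The last iteration may write up to 15 bytes past `size`; the real code gets
/// that slack from PaddedPODArray, here we simply over-allocate.
void fillRandom(char * output, size_t size)
{
    Lcg g0{1}, g1{2}, g2{3}, g3{4};
    for (const char * end = output + size; output < end; output += 16)
    {
        unalignedStore32(output,      g0.next());
        unalignedStore32(output + 4,  g1.next());
        unalignedStore32(output + 8,  g2.next());
        unalignedStore32(output + 12, g3.next());
    }
}

int main()
{
    std::vector<char> buf(100 + 15);  /// +15 bytes of slack
    fillRandom(buf.data(), 100);
    std::cout << static_cast<int>(buf[0]) << '\n';
}
```

Unrolling into four generators lets the stores pipeline independently, which is why the loop advances a full 16 bytes per iteration instead of 4.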
diff --git a/dbms/src/Functions/FunctionsVisitParam.h b/dbms/src/Functions/FunctionsVisitParam.h index 09cc3106719..41a49dfd908 100644 --- a/dbms/src/Functions/FunctionsVisitParam.h +++ b/dbms/src/Functions/FunctionsVisitParam.h @@ -91,8 +91,7 @@ struct ExtractBool struct ExtractRaw { - static constexpr size_t bytes_on_stack = 64; - using ExpectChars = PODArray, bytes_on_stack>>; + using ExpectChars = PODArrayWithStackMemory; static void extract(const UInt8 * pos, const UInt8 * end, ColumnString::Chars & res_data) { diff --git a/dbms/src/Functions/URL/domain.h b/dbms/src/Functions/URL/domain.h index 1a7b1df5291..141887d8e96 100644 --- a/dbms/src/Functions/URL/domain.h +++ b/dbms/src/Functions/URL/domain.h @@ -3,44 +3,117 @@ #include "protocol.h" #include #include - +#include namespace DB { +namespace +{ + +inline StringRef checkAndReturnHost(const Pos & pos, const Pos & dot_pos, const Pos & start_of_host) +{ + if (!dot_pos || start_of_host >= pos || pos - dot_pos == 1) + return StringRef{}; + + auto after_dot = *(dot_pos + 1); + if (after_dot == ':' || after_dot == '/' || after_dot == '?' || after_dot == '#') + return StringRef{}; + + return StringRef(start_of_host, pos - start_of_host); +} + +} + /// Extracts host from given url. inline StringRef getURLHost(const char * data, size_t size) { Pos pos = data; Pos end = data + size; - if (end == (pos = find_first_symbols<'/'>(pos, end))) - return {}; - - if (pos != data) + if (*pos == '/' && *(pos + 1) == '/') { - StringRef scheme = getURLScheme(data, size); - Pos scheme_end = data + scheme.size; - - // Colon must follow the scheme. - if (pos - scheme_end != 1 || *scheme_end != ':') - return {}; + pos += 2; + } + else + { + Pos scheme_end = data + std::min(size, 16UL); + for (++pos; pos < scheme_end; ++pos) + { + if (!isAlphaNumericASCII(*pos)) + { + switch (*pos) + { + case '.': + case '-': + case '+': + break; + case ' ': /// restricted symbols + case '\t': + case '<': + case '>': + case '%': + case '{': + case '}': + case '|': + case '\\': + case '^': + case '~': + case '[': + case ']': + case ';': + case '=': + case '&': + return StringRef{}; + default: + goto exloop; + } + } + } +exloop: if ((scheme_end - pos) > 2 && *pos == ':' && *(pos + 1) == '/' && *(pos + 2) == '/') + pos += 3; + else + pos = data; } - if (end - pos < 2 || *(pos) != '/' || *(pos + 1) != '/') - return {}; - pos += 2; - - const char * start_of_host = pos; + Pos dot_pos = nullptr; + auto start_of_host = pos; for (; pos < end; ++pos) { - if (*pos == '@') - start_of_host = pos + 1; - else if (*pos == ':' || *pos == '/' || *pos == '?' || *pos == '#') + switch (*pos) + { + case '.': + dot_pos = pos; break; + case ':': /// end symbols + case '/': + case '?': + case '#': + return checkAndReturnHost(pos, dot_pos, start_of_host); + case '@': /// myemail@gmail.com + start_of_host = pos + 1; + break; + case ' ': /// restricted symbols in whole URL + case '\t': + case '<': + case '>': + case '%': + case '{': + case '}': + case '|': + case '\\': + case '^': + case '~': + case '[': + case ']': + case ';': + case '=': + case '&': + return StringRef{}; + } } - return (pos == start_of_host) ? 
StringRef{} : StringRef(start_of_host, pos - start_of_host); + return checkAndReturnHost(pos, dot_pos, start_of_host); } template diff --git a/dbms/src/Functions/array/flatten.cpp b/dbms/src/Functions/array/arrayFlatten.cpp similarity index 90% rename from dbms/src/Functions/array/flatten.cpp rename to dbms/src/Functions/array/arrayFlatten.cpp index 8fe743db8ea..9898fbb0287 100644 --- a/dbms/src/Functions/array/flatten.cpp +++ b/dbms/src/Functions/array/arrayFlatten.cpp @@ -12,13 +12,13 @@ namespace ErrorCodes extern const int ILLEGAL_COLUMN; } -/// flatten([[1, 2, 3], [4, 5]]) = [1, 2, 3, 4, 5] - flatten array. -class FunctionFlatten : public IFunction +/// arrayFlatten([[1, 2, 3], [4, 5]]) = [1, 2, 3, 4, 5] - flatten array. +class ArrayFlatten : public IFunction { public: - static constexpr auto name = "flatten"; + static constexpr auto name = "arrayFlatten"; - static FunctionPtr create(const Context &) { return std::make_shared(); } + static FunctionPtr create(const Context &) { return std::make_shared(); } size_t getNumberOfArguments() const override { return 1; } bool useDefaultImplementationForConstants() const override { return true; } @@ -80,7 +80,7 @@ result: Row 1: [1, 2, 3], Row2: [4] const ColumnArray * src_col = checkAndGetColumn(block.getByPosition(arguments[0]).column.get()); if (!src_col) - throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " in argument of function 'flatten'", + throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " in argument of function 'arrayFlatten'", ErrorCodes::ILLEGAL_COLUMN); const IColumn::Offsets & src_offsets = src_col->getOffsets(); @@ -118,9 +118,10 @@ private: }; -void registerFunctionFlatten(FunctionFactory & factory) +void registerFunctionArrayFlatten(FunctionFactory & factory) { - factory.registerFunction(); + factory.registerFunction(); + factory.registerAlias("flatten", "arrayFlatten", FunctionFactory::CaseInsensitive); } } diff --git a/dbms/src/Functions/array/registerFunctionsArray.cpp b/dbms/src/Functions/array/registerFunctionsArray.cpp index 8f5fa7a1a6b..7cb07c553c4 100644 --- a/dbms/src/Functions/array/registerFunctionsArray.cpp +++ b/dbms/src/Functions/array/registerFunctionsArray.cpp @@ -30,7 +30,7 @@ void registerFunctionArrayEnumerateUniqRanked(FunctionFactory &); void registerFunctionArrayEnumerateDenseRanked(FunctionFactory &); void registerFunctionArrayUniq(FunctionFactory &); void registerFunctionArrayDistinct(FunctionFactory &); -void registerFunctionFlatten(FunctionFactory &); +void registerFunctionArrayFlatten(FunctionFactory &); void registerFunctionArrayWithConstant(FunctionFactory &); void registerFunctionsArray(FunctionFactory & factory) @@ -62,7 +62,7 @@ void registerFunctionsArray(FunctionFactory & factory) registerFunctionArrayEnumerateDenseRanked(factory); registerFunctionArrayUniq(factory); registerFunctionArrayDistinct(factory); - registerFunctionFlatten(factory); + registerFunctionArrayFlatten(factory); registerFunctionArrayWithConstant(factory); } diff --git a/dbms/src/Functions/array/empty.cpp b/dbms/src/Functions/empty.cpp similarity index 100% rename from dbms/src/Functions/array/empty.cpp rename to dbms/src/Functions/empty.cpp diff --git a/dbms/src/Functions/array/notEmpty.cpp b/dbms/src/Functions/notEmpty.cpp similarity index 100% rename from dbms/src/Functions/array/notEmpty.cpp rename to dbms/src/Functions/notEmpty.cpp diff --git a/dbms/src/Functions/registerFunctions.cpp b/dbms/src/Functions/registerFunctions.cpp index 
88f549ea01b..3e7f9c7136d 100644 --- a/dbms/src/Functions/registerFunctions.cpp +++ b/dbms/src/Functions/registerFunctions.cpp @@ -1,5 +1,6 @@ #include #include + #include "config_core.h" #include "config_functions.h" @@ -41,16 +42,11 @@ void registerFunctionsGeo(FunctionFactory &); void registerFunctionsNull(FunctionFactory &); void registerFunctionsFindCluster(FunctionFactory &); void registerFunctionsJSON(FunctionFactory &); -void registerFunctionTransform(FunctionFactory &); #if USE_H3 void registerFunctionGeoToH3(FunctionFactory &); #endif -#if USE_ICU -void registerFunctionConvertCharset(FunctionFactory &); -#endif - void registerFunctions() { auto & factory = FunctionFactory::instance(); @@ -88,15 +84,10 @@ void registerFunctions() registerFunctionsNull(factory); registerFunctionsFindCluster(factory); registerFunctionsJSON(factory); - registerFunctionTransform(factory); #if USE_H3 registerFunctionGeoToH3(factory); #endif - -#if USE_ICU - registerFunctionConvertCharset(factory); -#endif } } diff --git a/dbms/src/Functions/registerFunctionsMiscellaneous.cpp b/dbms/src/Functions/registerFunctionsMiscellaneous.cpp index 03df3a887ca..6d201d65bd3 100644 --- a/dbms/src/Functions/registerFunctionsMiscellaneous.cpp +++ b/dbms/src/Functions/registerFunctionsMiscellaneous.cpp @@ -1,3 +1,5 @@ +#include "config_core.h" + namespace DB { @@ -45,6 +47,11 @@ void registerFunctionJoinGet(FunctionFactory &); void registerFunctionFilesystem(FunctionFactory &); void registerFunctionEvalMLMethod(FunctionFactory &); void registerFunctionBasename(FunctionFactory &); +void registerFunctionTransform(FunctionFactory &); + +#if USE_ICU +void registerFunctionConvertCharset(FunctionFactory &); +#endif void registerFunctionsMiscellaneous(FunctionFactory & factory) { @@ -90,6 +97,11 @@ void registerFunctionsMiscellaneous(FunctionFactory & factory) registerFunctionFilesystem(factory); registerFunctionEvalMLMethod(factory); registerFunctionBasename(factory); + registerFunctionTransform(factory); + +#if USE_ICU + registerFunctionConvertCharset(factory); +#endif } } diff --git a/dbms/src/IO/BitHelpers.h b/dbms/src/IO/BitHelpers.h index c2986299746..3652dd0057a 100644 --- a/dbms/src/IO/BitHelpers.h +++ b/dbms/src/IO/BitHelpers.h @@ -5,6 +5,15 @@ #include #include +#if defined(__OpenBSD__) || defined(__FreeBSD__) +# include +#elif defined(__APPLE__) +# include + +# define htobe64(x) OSSwapHostToBigInt64(x) +# define be64toh(x) OSSwapBigToHostInt64(x) +#endif + namespace DB { diff --git a/dbms/src/IO/UncompressedCache.h b/dbms/src/IO/UncompressedCache.h index 2347c6d7a28..1f17c5e61b6 100644 --- a/dbms/src/IO/UncompressedCache.h +++ b/dbms/src/IO/UncompressedCache.h @@ -7,8 +7,8 @@ #include #include -#if USE_LFALLOC -#include +#if USE_MIMALLOC +#include #endif @@ -25,8 +25,8 @@ namespace DB struct UncompressedCacheCell { -#if USE_LFALLOC - Memory data; +#if USE_MIMALLOC + Memory data; #else Memory<> data; #endif diff --git a/dbms/src/Interpreters/BloomFilter.cpp b/dbms/src/Interpreters/BloomFilter.cpp index 765f1ea9478..d648fd114f4 100644 --- a/dbms/src/Interpreters/BloomFilter.cpp +++ b/dbms/src/Interpreters/BloomFilter.cpp @@ -1,5 +1,4 @@ #include - #include @@ -9,14 +8,13 @@ namespace DB static constexpr UInt64 SEED_GEN_A = 845897321; static constexpr UInt64 SEED_GEN_B = 217728422; - -StringBloomFilter::StringBloomFilter(size_t size_, size_t hashes_, size_t seed_) +BloomFilter::BloomFilter(size_t size_, size_t hashes_, size_t seed_) : size(size_), hashes(hashes_), seed(seed_), words((size + sizeof(UnderType) - 1) / 
sizeof(UnderType)), filter(words, 0) {} -StringBloomFilter::StringBloomFilter(const StringBloomFilter & bloom_filter) +BloomFilter::BloomFilter(const BloomFilter & bloom_filter) : size(bloom_filter.size), hashes(bloom_filter.hashes), seed(bloom_filter.seed), words(bloom_filter.words), filter(bloom_filter.filter) {} -bool StringBloomFilter::find(const char * data, size_t len) +bool BloomFilter::find(const char * data, size_t len) { size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed); size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B); @@ -30,7 +28,7 @@ bool StringBloomFilter::find(const char * data, size_t len) return true; } -void StringBloomFilter::add(const char * data, size_t len) +void BloomFilter::add(const char * data, size_t len) { size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed); size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B); @@ -42,12 +40,12 @@ void StringBloomFilter::add(const char * data, size_t len) } } -void StringBloomFilter::clear() +void BloomFilter::clear() { filter.assign(words, 0); } -bool StringBloomFilter::contains(const StringBloomFilter & bf) +bool BloomFilter::contains(const BloomFilter & bf) { for (size_t i = 0; i < words; ++i) { @@ -57,7 +55,7 @@ bool StringBloomFilter::contains(const StringBloomFilter & bf) return true; } -UInt64 StringBloomFilter::isEmpty() const +UInt64 BloomFilter::isEmpty() const { for (size_t i = 0; i < words; ++i) if (filter[i] != 0) @@ -65,7 +63,7 @@ UInt64 StringBloomFilter::isEmpty() const return true; } -bool operator== (const StringBloomFilter & a, const StringBloomFilter & b) +bool operator== (const BloomFilter & a, const BloomFilter & b) { for (size_t i = 0; i < a.words; ++i) if (a.filter[i] != b.filter[i]) @@ -73,4 +71,16 @@ bool operator== (const StringBloomFilter & a, const StringBloomFilter & b) return true; } +void BloomFilter::addHashWithSeed(const UInt64 & hash, const UInt64 & hash_seed) +{ + size_t pos = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(hash, hash_seed)) % (8 * size); + filter[pos / (8 * sizeof(UnderType))] |= (1ULL << (pos % (8 * sizeof(UnderType)))); +} + +bool BloomFilter::findHashWithSeed(const UInt64 & hash, const UInt64 & hash_seed) +{ + size_t pos = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(hash, hash_seed)) % (8 * size); + return bool(filter[pos / (8 * sizeof(UnderType))] & (1ULL << (pos % (8 * sizeof(UnderType))))); +} + } diff --git a/dbms/src/Interpreters/BloomFilter.h b/dbms/src/Interpreters/BloomFilter.h index 1825dbec4bd..19469834c94 100644 --- a/dbms/src/Interpreters/BloomFilter.h +++ b/dbms/src/Interpreters/BloomFilter.h @@ -1,15 +1,17 @@ #pragma once -#include #include - +#include +#include +#include +#include namespace DB { -/// Bloom filter for strings. -class StringBloomFilter +class BloomFilter { + public: using UnderType = UInt64; using Container = std::vector; @@ -17,16 +19,19 @@ public: /// size -- size of filter in bytes. /// hashes -- number of used hash functions. /// seed -- random seed for hash functions generation. 
- StringBloomFilter(size_t size_, size_t hashes_, size_t seed_); - StringBloomFilter(const StringBloomFilter & bloom_filter); + BloomFilter(size_t size_, size_t hashes_, size_t seed_); + BloomFilter(const BloomFilter & bloom_filter); bool find(const char * data, size_t len); void add(const char * data, size_t len); void clear(); + void addHashWithSeed(const UInt64 & hash, const UInt64 & hash_seed); + bool findHashWithSeed(const UInt64 & hash, const UInt64 & hash_seed); + /// Checks if this contains everything from another bloom filter. /// Bloom filters must have equal size and seed. - bool contains(const StringBloomFilter & bf); + bool contains(const BloomFilter & bf); const Container & getFilter() const { return filter; } Container & getFilter() { return filter; } @@ -34,7 +39,7 @@ public: /// For debug. UInt64 isEmpty() const; - friend bool operator== (const StringBloomFilter & a, const StringBloomFilter & b); + friend bool operator== (const BloomFilter & a, const BloomFilter & b); private: size_t size; @@ -44,7 +49,8 @@ private: Container filter; }; +using BloomFilterPtr = std::shared_ptr; -bool operator== (const StringBloomFilter & a, const StringBloomFilter & b); +bool operator== (const BloomFilter & a, const BloomFilter & b); } diff --git a/dbms/src/Interpreters/BloomFilterHash.h b/dbms/src/Interpreters/BloomFilterHash.h new file mode 100644 index 00000000000..a94bc8687eb --- /dev/null +++ b/dbms/src/Interpreters/BloomFilterHash.h @@ -0,0 +1,207 @@ +#pragma once + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_COLUMN; +} + +struct BloomFilterHash +{ + static constexpr UInt64 bf_hash_seed[15] = { + 13635471485423070496ULL, 10336109063487487899ULL, 17779957404565211594ULL, 8988612159822229247ULL, 4954614162757618085ULL, + 12980113590177089081ULL, 9263883436177860930ULL, 3656772712723269762ULL, 10362091744962961274ULL, 7582936617938287249ULL, + 15033938188484401405ULL, 18286745649494826751ULL, 6852245486148412312ULL, 8886056245089344681ULL, 10151472371158292780ULL + }; + + static ColumnPtr hashWithField(const IDataType * data_type, const Field & field) + { + WhichDataType which(data_type); + + if (which.isUInt() || which.isDateOrDateTime()) + return ColumnConst::create(ColumnUInt64::create(1, intHash64(field.safeGet())), 1); + else if (which.isInt() || which.isEnum()) + return ColumnConst::create(ColumnUInt64::create(1, intHash64(ext::bit_cast(field.safeGet()))), 1); + else if (which.isFloat32() || which.isFloat64()) + return ColumnConst::create(ColumnUInt64::create(1, intHash64(ext::bit_cast(field.safeGet()))), 1); + else if (which.isString() || which.isFixedString()) + { + const auto & value = field.safeGet(); + return ColumnConst::create(ColumnUInt64::create(1, CityHash_v1_0_2::CityHash64(value.data(), value.size())), 1); + } + else + throw Exception("Unexpected type " + data_type->getName() + " of bloom filter index.", ErrorCodes::LOGICAL_ERROR); + } + + static ColumnPtr hashWithColumn(const DataTypePtr & data_type, const ColumnPtr & column, size_t pos, size_t limit) + { + auto index_column = ColumnUInt64::create(limit); + ColumnUInt64::Container & index_column_vec = index_column->getData(); + getAnyTypeHash(&*data_type, &*column, index_column_vec, pos); + return index_column; + } + + template + static void getAnyTypeHash(const IDataType * data_type, const IColumn * column, ColumnUInt64::Container & vec, size_t pos) + { + WhichDataType which(data_type); + + if 
(which.isUInt8()) getNumberTypeHash(column, vec, pos); + else if (which.isUInt16()) getNumberTypeHash(column, vec, pos); + else if (which.isUInt32()) getNumberTypeHash(column, vec, pos); + else if (which.isUInt64()) getNumberTypeHash(column, vec, pos); + else if (which.isInt8()) getNumberTypeHash(column, vec, pos); + else if (which.isInt16()) getNumberTypeHash(column, vec, pos); + else if (which.isInt32()) getNumberTypeHash(column, vec, pos); + else if (which.isInt64()) getNumberTypeHash(column, vec, pos); + else if (which.isEnum8()) getNumberTypeHash(column, vec, pos); + else if (which.isEnum16()) getNumberTypeHash(column, vec, pos); + else if (which.isDate()) getNumberTypeHash(column, vec, pos); + else if (which.isDateTime()) getNumberTypeHash(column, vec, pos); + else if (which.isFloat32()) getNumberTypeHash(column, vec, pos); + else if (which.isFloat64()) getNumberTypeHash(column, vec, pos); + else if (which.isString()) getStringTypeHash(column, vec, pos); + else if (which.isFixedString()) getStringTypeHash(column, vec, pos); + else throw Exception("Unexpected type " + data_type->getName() + " of bloom filter index.", ErrorCodes::LOGICAL_ERROR); + } + + template + static void getNumberTypeHash(const IColumn * column, ColumnUInt64::Container & vec, size_t pos) + { + const auto * index_column = typeid_cast *>(column); + + if (unlikely(!index_column)) + throw Exception("Illegal column type was passed to the bloom filter index.", ErrorCodes::ILLEGAL_COLUMN); + + const typename ColumnVector::Container & vec_from = index_column->getData(); + + /// Field does not preserve the precision of floats, so to be consistent + /// we convert Float32 to Float64 before hashing; see also: BloomFilterHash::hashWithField + if constexpr (std::is_same_v, ColumnFloat32>) + { + for (size_t index = 0, size = vec.size(); index < size; ++index) + { + UInt64 hash = intHash64(ext::bit_cast(Float64(vec_from[index + pos]))); + + if constexpr (is_first) + vec[index] = hash; + else + vec[index] = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(vec[index], hash)); + } + } + else + { + for (size_t index = 0, size = vec.size(); index < size; ++index) + { + UInt64 hash = intHash64(ext::bit_cast(vec_from[index + pos])); + + if constexpr (is_first) + vec[index] = hash; + else + vec[index] = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(vec[index], hash)); + } + } + } + + template + static void getStringTypeHash(const IColumn * column, ColumnUInt64::Container & vec, size_t pos) + { + if (const auto * index_column = typeid_cast(column)) + { + const ColumnString::Chars & data = index_column->getChars(); + const ColumnString::Offsets & offsets = index_column->getOffsets(); + + ColumnString::Offset current_offset = pos; + for (size_t index = 0, size = vec.size(); index < size; ++index) + { + UInt64 city_hash = CityHash_v1_0_2::CityHash64( + reinterpret_cast(&data[current_offset]), offsets[index + pos] - current_offset - 1); + + if constexpr (is_first) + vec[index] = city_hash; + else + vec[index] = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(vec[index], city_hash)); + + current_offset = offsets[index + pos]; + } + } + else if (const auto * fixed_string_index_column = typeid_cast(column)) + { + size_t fixed_len = fixed_string_index_column->getN(); + const auto & data = fixed_string_index_column->getChars(); + + for (size_t index = 0, size = vec.size(); index < size; ++index) + { + UInt64 city_hash = CityHash_v1_0_2::CityHash64(reinterpret_cast(&data[(index + pos) * fixed_len]), fixed_len); + + 
if constexpr (is_first) + vec[index] = city_hash; + else + vec[index] = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(vec[index], city_hash)); + } + } + else + throw Exception("Illegal column type was passed to the bloom filter index.", ErrorCodes::ILLEGAL_COLUMN); + } + + static std::pair calculationBestPractices(double max_conflict_probability) + { + static const size_t MAX_BITS_PER_ROW = 20; + static const size_t MAX_HASH_FUNCTION_COUNT = 15; + + /// For the smallest index per level in probability_lookup_table + static const size_t min_probability_index_each_bits[] = {0, 0, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 9, 10, 10, 11, 12, 12, 13, 14}; + + static const long double probability_lookup_table[MAX_BITS_PER_ROW + 1][MAX_HASH_FUNCTION_COUNT] = + { + {1.0}, /// dummy, 0 bits per row + {1.0, 1.0}, + {1.0, 0.393, 0.400}, + {1.0, 0.283, 0.237, 0.253}, + {1.0, 0.221, 0.155, 0.147, 0.160}, + {1.0, 0.181, 0.109, 0.092, 0.092, 0.101}, // 5 + {1.0, 0.154, 0.0804, 0.0609, 0.0561, 0.0578, 0.0638}, + {1.0, 0.133, 0.0618, 0.0423, 0.0359, 0.0347, 0.0364}, + {1.0, 0.118, 0.0489, 0.0306, 0.024, 0.0217, 0.0216, 0.0229}, + {1.0, 0.105, 0.0397, 0.0228, 0.0166, 0.0141, 0.0133, 0.0135, 0.0145}, + {1.0, 0.0952, 0.0329, 0.0174, 0.0118, 0.00943, 0.00844, 0.00819, 0.00846}, // 10 + {1.0, 0.0869, 0.0276, 0.0136, 0.00864, 0.0065, 0.00552, 0.00513, 0.00509}, + {1.0, 0.08, 0.0236, 0.0108, 0.00646, 0.00459, 0.00371, 0.00329, 0.00314}, + {1.0, 0.074, 0.0203, 0.00875, 0.00492, 0.00332, 0.00255, 0.00217, 0.00199, 0.00194}, + {1.0, 0.0689, 0.0177, 0.00718, 0.00381, 0.00244, 0.00179, 0.00146, 0.00129, 0.00121, 0.0012}, + {1.0, 0.0645, 0.0156, 0.00596, 0.003, 0.00183, 0.00128, 0.001, 0.000852, 0.000775, 0.000744}, // 15 + {1.0, 0.0606, 0.0138, 0.005, 0.00239, 0.00139, 0.000935, 0.000702, 0.000574, 0.000505, 0.00047, 0.000459}, + {1.0, 0.0571, 0.0123, 0.00423, 0.00193, 0.00107, 0.000692, 0.000499, 0.000394, 0.000335, 0.000302, 0.000287, 0.000284}, + {1.0, 0.054, 0.0111, 0.00362, 0.00158, 0.000839, 0.000519, 0.00036, 0.000275, 0.000226, 0.000198, 0.000183, 0.000176}, + {1.0, 0.0513, 0.00998, 0.00312, 0.0013, 0.000663, 0.000394, 0.000264, 0.000194, 0.000155, 0.000132, 0.000118, 0.000111, 0.000109}, + {1.0, 0.0488, 0.00906, 0.0027, 0.00108, 0.00053, 0.000303, 0.000196, 0.00014, 0.000108, 8.89e-05, 7.77e-05, 7.12e-05, 6.79e-05, 6.71e-05} // 20 + }; + + for (size_t bits_per_row = 1; bits_per_row < MAX_BITS_PER_ROW; ++bits_per_row) + { + if (probability_lookup_table[bits_per_row][min_probability_index_each_bits[bits_per_row]] <= max_conflict_probability) + { + size_t max_size_of_hash_functions = min_probability_index_each_bits[bits_per_row]; + for (size_t size_of_hash_functions = max_size_of_hash_functions; size_of_hash_functions > 0; --size_of_hash_functions) + if (probability_lookup_table[bits_per_row][size_of_hash_functions] > max_conflict_probability) + return std::pair(bits_per_row, size_of_hash_functions + 1); + } + } + + return std::pair(MAX_BITS_PER_ROW - 1, min_probability_index_each_bits[MAX_BITS_PER_ROW - 1]); + } +}; + +} diff --git a/dbms/src/Interpreters/Compiler.cpp b/dbms/src/Interpreters/Compiler.cpp index abdb0969121..ee6845767e6 100644 --- a/dbms/src/Interpreters/Compiler.cpp +++ b/dbms/src/Interpreters/Compiler.cpp @@ -262,8 +262,8 @@ void Compiler::compile( " -I " << compiler_headers << "/dbms/src/" " -isystem " << compiler_headers << "/contrib/cityhash102/include/" " -isystem " << compiler_headers << "/contrib/libpcg-random/include/" - #if USE_LFALLOC - " -isystem " << compiler_headers << 
"/contrib/lfalloc/src/" + #if USE_MIMALLOC + " -isystem " << compiler_headers << "/contrib/mimalloc/include/" #endif " -isystem " << compiler_headers << INTERNAL_DOUBLE_CONVERSION_INCLUDE_DIR " -isystem " << compiler_headers << INTERNAL_Poco_Foundation_INCLUDE_DIR diff --git a/dbms/src/Interpreters/Context.cpp b/dbms/src/Interpreters/Context.cpp index bbeba97aaad..0da7cdeb454 100644 --- a/dbms/src/Interpreters/Context.cpp +++ b/dbms/src/Interpreters/Context.cpp @@ -244,15 +244,12 @@ struct ContextShared return; shutdown_called = true; - { - std::lock_guard lock(mutex); + /** After system_logs have been shut down it is guaranteed that no system table gets created or written to. + * Note that part changes at shutdown won't be logged to part log. + */ - /** After this point, system logs will shutdown their threads and no longer write any data. - * It will prevent recreation of system tables at shutdown. - * Note that part changes at shutdown won't be logged to part log. - */ - system_logs.reset(); - } + if (system_logs) + system_logs->shutdown(); /** At this point, some tables may have threads that block our mutex. * To shutdown them correctly, we will copy the current list of tables, @@ -280,6 +277,7 @@ struct ContextShared /// Preemptive destruction is important, because these objects may have a refcount to ContextShared (cyclic reference). /// TODO: Get rid of this. + system_logs.reset(); embedded_dictionaries.reset(); external_dictionaries.reset(); external_models.reset(); diff --git a/dbms/src/Interpreters/DDLWorker.cpp b/dbms/src/Interpreters/DDLWorker.cpp index 1608feb798d..e1cea632a37 100644 --- a/dbms/src/Interpreters/DDLWorker.cpp +++ b/dbms/src/Interpreters/DDLWorker.cpp @@ -1,6 +1,9 @@ #include #include +#include +#include #include +#include #include #include #include @@ -29,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -614,14 +618,24 @@ void DDLWorker::processTask(DDLTask & task, const ZooKeeperPtr & zookeeper) String rewritten_query = queryToString(rewritten_ast); LOG_DEBUG(log, "Executing query: " << rewritten_query); - if (const auto * ast_alter = rewritten_ast->as()) + if (auto query_with_table = dynamic_cast(rewritten_ast.get()); query_with_table) { - processTaskAlter(task, ast_alter, rewritten_query, task.entry_path, zookeeper); + String database = query_with_table->database.empty() ? context.getCurrentDatabase() : query_with_table->database; + StoragePtr storage = context.tryGetTable(database, query_with_table->table); + + /// For some reason we check consistency of cluster definition only + /// in case of ALTER query, but not in case of CREATE/DROP etc. + /// It's strange, but this behaviour exits for a long and we cannot change it. 
+ if (storage && query_with_table->as()) + checkShardConfig(query_with_table->table, task, storage); + + if (storage && taskShouldBeExecutedOnLeader(rewritten_ast, storage)) + tryExecuteQueryOnLeaderReplica(task, storage, rewritten_query, task.entry_path, zookeeper); + else + tryExecuteQuery(rewritten_query, task, task.execution_status); } else - { tryExecuteQuery(rewritten_query, task, task.execution_status); - } } catch (const Coordination::Exception &) { @@ -646,43 +660,52 @@ void DDLWorker::processTask(DDLTask & task, const ZooKeeperPtr & zookeeper) } -void DDLWorker::processTaskAlter( - DDLTask & task, - const ASTAlterQuery * ast_alter, - const String & rewritten_query, - const String & node_path, - const ZooKeeperPtr & zookeeper) +bool DDLWorker::taskShouldBeExecutedOnLeader(const ASTPtr ast_ddl, const StoragePtr storage) const { - String database = ast_alter->database.empty() ? context.getCurrentDatabase() : ast_alter->database; - StoragePtr storage = context.getTable(database, ast_alter->table); + /// Pure DROP queries have to be executed on each node separately + if (auto query = ast_ddl->as(); query && query->kind != ASTDropQuery::Kind::Truncate) + return false; - bool execute_once_on_replica = storage->supportsReplication(); - bool execute_on_leader_replica = false; + if (!ast_ddl->as() && !ast_ddl->as() && !ast_ddl->as()) + return false; - for (const auto & command : ast_alter->command_list->commands) - { - if (!isSupportedAlterType(command->type)) - throw Exception("Unsupported type of ALTER query", ErrorCodes::NOT_IMPLEMENTED); + return storage->supportsReplication(); +} - if (execute_once_on_replica) - execute_on_leader_replica |= command->type == ASTAlterCommand::DROP_PARTITION; - } +void DDLWorker::checkShardConfig(const String & table, const DDLTask & task, StoragePtr storage) const +{ const auto & shard_info = task.cluster->getShardsInfo().at(task.host_shard_num); bool config_is_replicated_shard = shard_info.hasInternalReplication(); - if (execute_once_on_replica && !config_is_replicated_shard) + if (storage->supportsReplication() && !config_is_replicated_shard) { - throw Exception("Table " + ast_alter->table + " is replicated, but shard #" + toString(task.host_shard_num + 1) + + throw Exception("Table '" + table + "' is replicated, but shard #" + toString(task.host_shard_num + 1) + " isn't replicated according to its cluster definition." 
" Possibly true is forgotten in the cluster config.", ErrorCodes::INCONSISTENT_CLUSTER_DEFINITION); } - if (!execute_once_on_replica && config_is_replicated_shard) + + if (!storage->supportsReplication() && config_is_replicated_shard) { - throw Exception("Table " + ast_alter->table + " isn't replicated, but shard #" + toString(task.host_shard_num + 1) + + throw Exception("Table '" + table + "' isn't replicated, but shard #" + toString(task.host_shard_num + 1) + " is replicated according to its cluster definition", ErrorCodes::INCONSISTENT_CLUSTER_DEFINITION); } +} + + +bool DDLWorker::tryExecuteQueryOnLeaderReplica( + DDLTask & task, + StoragePtr storage, + const String & rewritten_query, + const String & node_path, + const ZooKeeperPtr & zookeeper) +{ + StorageReplicatedMergeTree * replicated_storage = dynamic_cast(storage.get()); + + /// If we will develop new replicated storage + if (!replicated_storage) + throw Exception("Storage type '" + storage->getName() + "' is not supported by distributed DDL", ErrorCodes::NOT_IMPLEMENTED); /// Generate unique name for shard node, it will be used to execute the query by only single host /// Shard node name has format 'replica_name1,replica_name2,...,replica_nameN' @@ -701,70 +724,73 @@ void DDLWorker::processTaskAlter( return res; }; - if (execute_once_on_replica) + String shard_node_name = get_shard_name(task.cluster->getShardsAddresses().at(task.host_shard_num)); + String shard_path = node_path + "/shards/" + shard_node_name; + String is_executed_path = shard_path + "/executed"; + zookeeper->createAncestors(shard_path + "/"); + + auto is_already_executed = [&]() -> bool { - String shard_node_name = get_shard_name(task.cluster->getShardsAddresses().at(task.host_shard_num)); - String shard_path = node_path + "/shards/" + shard_node_name; - String is_executed_path = shard_path + "/executed"; - zookeeper->createAncestors(shard_path + "/"); - - bool is_executed_by_any_replica = false; + String executed_by; + if (zookeeper->tryGet(is_executed_path, executed_by)) { - auto lock = createSimpleZooKeeperLock(zookeeper, shard_path, "lock", task.host_id_str); - pcg64 rng(randomSeed()); + LOG_DEBUG(log, "Task " << task.entry_name << " has already been executed by leader replica (" + << executed_by << ") of the same shard."); + return true; + } - auto is_already_executed = [&]() -> bool + return false; + }; + + pcg64 rng(randomSeed()); + + auto lock = createSimpleZooKeeperLock(zookeeper, shard_path, "lock", task.host_id_str); + static const size_t max_tries = 20; + bool executed_by_leader = false; + for (size_t num_tries = 0; num_tries < max_tries; ++num_tries) + { + if (is_already_executed()) + { + executed_by_leader = true; + break; + } + + StorageReplicatedMergeTree::Status status; + replicated_storage->getStatus(status); + + /// Leader replica take lock + if (status.is_leader && lock->tryLock()) + { + if (is_already_executed()) { - String executed_by; - if (zookeeper->tryGet(is_executed_path, executed_by)) - { - is_executed_by_any_replica = true; - LOG_DEBUG(log, "Task " << task.entry_name << " has already been executed by another replica (" - << executed_by << ") of the same shard."); - return true; - } - - return false; - }; - - static const size_t max_tries = 20; - for (size_t num_tries = 0; num_tries < max_tries; ++num_tries) - { - if (is_already_executed()) - break; - - if (lock->tryLock()) - { - if (is_already_executed()) - break; - - tryExecuteQuery(rewritten_query, task, task.execution_status); - - if (execute_on_leader_replica && 
task.execution_status.code == ErrorCodes::NOT_IMPLEMENTED) - { - /// TODO: it is ok to receive exception "host is not leader" - } - - zookeeper->create(is_executed_path, task.host_id_str, zkutil::CreateMode::Persistent); - lock->unlock(); - is_executed_by_any_replica = true; - break; - } - - std::this_thread::sleep_for(std::chrono::milliseconds(std::uniform_int_distribution(0, 1000)(rng))); + executed_by_leader = true; + break; } + + /// If the leader unexpectedly changes, this method will return false + /// and on the next iteration the new leader will take the lock + if (tryExecuteQuery(rewritten_query, task, task.execution_status)) + { + zookeeper->create(is_executed_path, task.host_id_str, zkutil::CreateMode::Persistent); + executed_by_leader = true; + break; + } + } - if (!is_executed_by_any_replica) - { - task.execution_status = ExecutionStatus(ErrorCodes::NOT_IMPLEMENTED, - "Cannot enqueue replicated DDL query for a replicated shard"); - } + /// Does nothing if it wasn't previously locked + lock->unlock(); + std::this_thread::sleep_for(std::chrono::milliseconds(std::uniform_int_distribution(0, 1000)(rng))); } - else + + /// Not executed by the leader, so it was not executed at all + if (!executed_by_leader) { - tryExecuteQuery(rewritten_query, task, task.execution_status); + task.execution_status = ExecutionStatus(ErrorCodes::NOT_IMPLEMENTED, + "Cannot execute replicated DDL query on leader"); + return false; } + return true; } diff --git a/dbms/src/Interpreters/DDLWorker.h b/dbms/src/Interpreters/DDLWorker.h index ef8a785e517..4659b221b1a 100644 --- a/dbms/src/Interpreters/DDLWorker.h +++ b/dbms/src/Interpreters/DDLWorker.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -54,12 +55,22 @@ private: /// Returns true and sets current_task if entry parsed and the check is passed bool initAndCheckTask(const String & entry_name, String & out_reason, const ZooKeeperPtr & zookeeper); - void processTask(DDLTask & task, const ZooKeeperPtr & zookeeper); - void processTaskAlter( + /// Check whether the query should be executed on the leader replica only + bool taskShouldBeExecutedOnLeader(const ASTPtr ast_ddl, StoragePtr storage) const; + + /// Check that the shard config is consistent with the table + void checkShardConfig(const String & table, const DDLTask & task, StoragePtr storage) const; + + /// Executes query only on leader replica in case of replicated table. + /// Queries like TRUNCATE/ALTER .../OPTIMIZE have to be executed only on one node of the shard. + /// Most of these queries can be executed on a non-leader replica, but they still send the + /// query to the leader via RemoteBlockOutputStream, so to avoid such "2-phase" query execution we + /// execute the query directly on the leader.
+ bool tryExecuteQueryOnLeaderReplica( DDLTask & task, - const ASTAlterQuery * ast_alter, + StoragePtr storage, const String & rewritten_query, const String & node_path, const ZooKeeperPtr & zookeeper); diff --git a/dbms/src/Interpreters/EmbeddedDictionaries.cpp b/dbms/src/Interpreters/EmbeddedDictionaries.cpp index a585008ae5d..4dd9f50b82c 100644 --- a/dbms/src/Interpreters/EmbeddedDictionaries.cpp +++ b/dbms/src/Interpreters/EmbeddedDictionaries.cpp @@ -1,12 +1,10 @@ #include #include -#include #include #include #include #include #include -#include "config_core.h" #include #include @@ -74,22 +72,6 @@ bool EmbeddedDictionaries::reloadImpl(const bool throw_on_error, const bool forc bool was_exception = false; -#if USE_MYSQL - DictionaryReloader reload_tech_data = [=] (const Poco::Util::AbstractConfiguration & config) - -> std::unique_ptr - { - if (!TechDataHierarchy::isConfigured(config)) - return {}; - - auto dictionary = std::make_unique(); - dictionary->reload(); - return dictionary; - }; - - if (!reloadDictionary(tech_data_hierarchy, reload_tech_data, throw_on_error, force_reload)) - was_exception = true; -#endif - DictionaryReloader reload_regions_hierarchies = [=] (const Poco::Util::AbstractConfiguration & config) { return geo_dictionaries_loader->reloadRegionsHierarchies(config); diff --git a/dbms/src/Interpreters/EmbeddedDictionaries.h b/dbms/src/Interpreters/EmbeddedDictionaries.h index caa7c1cc62d..56abfe12aaa 100644 --- a/dbms/src/Interpreters/EmbeddedDictionaries.h +++ b/dbms/src/Interpreters/EmbeddedDictionaries.h @@ -10,7 +10,6 @@ namespace Poco { class Logger; namespace Util { class AbstractConfiguration; } } class RegionsHierarchies; -class TechDataHierarchy; class RegionsNames; class IGeoDictionariesLoader; @@ -30,7 +29,6 @@ private: Context & context; MultiVersion regions_hierarchies; - MultiVersion tech_data_hierarchy; MultiVersion regions_names; std::unique_ptr geo_dictionaries_loader; @@ -85,11 +83,6 @@ public: return regions_hierarchies.get(); } - MultiVersion::Version getTechDataHierarchy() const - { - return tech_data_hierarchy.get(); - } - MultiVersion::Version getRegionsNames() const { return regions_names.get(); diff --git a/dbms/src/Interpreters/ProcessList.cpp b/dbms/src/Interpreters/ProcessList.cpp index a4fe438af8f..def39d4d91c 100644 --- a/dbms/src/Interpreters/ProcessList.cpp +++ b/dbms/src/Interpreters/ProcessList.cpp @@ -87,10 +87,9 @@ ProcessList::EntryPtr ProcessList::insert(const String & query_, const IAST * as { std::unique_lock lock(mutex); + const auto max_wait_ms = settings.queue_max_wait_ms.totalMilliseconds(); if (!is_unlimited_query && max_size && processes.size() >= max_size) { - auto max_wait_ms = settings.queue_max_wait_ms.totalMilliseconds(); - if (!max_wait_ms || !have_space.wait_for(lock, std::chrono::milliseconds(max_wait_ms), [&]{ return processes.size() < max_size; })) throw Exception("Too many simultaneous queries. 
Maximum: " + toString(max_size), ErrorCodes::TOO_MANY_SIMULTANEOUS_QUERIES); } @@ -117,20 +116,41 @@ ProcessList::EntryPtr ProcessList::insert(const String & query_, const IAST * as + ", maximum: " + settings.max_concurrent_queries_for_user.toString(), ErrorCodes::TOO_MANY_SIMULTANEOUS_QUERIES); - auto range = user_process_list->second.queries.equal_range(client_info.current_query_id); - if (range.first != range.second) + auto running_query = user_process_list->second.queries.find(client_info.current_query_id); + + if (running_query != user_process_list->second.queries.end()) { if (!settings.replace_running_query) throw Exception("Query with id = " + client_info.current_query_id + " is already running.", ErrorCodes::QUERY_WITH_SAME_ID_IS_ALREADY_RUNNING); /// Ask queries to cancel. They will check this flag. - for (auto it = range.first; it != range.second; ++it) - it->second->is_killed.store(true, std::memory_order_relaxed); - } + running_query->second->is_killed.store(true, std::memory_order_relaxed); + + if (!max_wait_ms || !have_space.wait_for(lock, std::chrono::milliseconds(max_wait_ms), [&] + { + running_query = user_process_list->second.queries.find(client_info.current_query_id); + if (running_query == user_process_list->second.queries.end()) + return true; + running_query->second->is_killed.store(true, std::memory_order_relaxed); + return false; + })) + throw Exception("Query with id = " + client_info.current_query_id + " is already running and can't be stopped", + ErrorCodes::QUERY_WITH_SAME_ID_IS_ALREADY_RUNNING); + } } } + /// Check other users running query with our query_id + for (const auto & user_process_list : user_to_queries) + { + if (user_process_list.first == client_info.current_user) + continue; + if (auto running_query = user_process_list.second.queries.find(client_info.current_query_id); running_query != user_process_list.second.queries.end()) + throw Exception("Query with id = " + client_info.current_query_id + " is already running by user " + user_process_list.first, + ErrorCodes::QUERY_WITH_SAME_ID_IS_ALREADY_RUNNING); + } + auto process_it = processes.emplace(processes.end(), query_, client_info, settings.max_memory_usage, settings.memory_tracker_fault_probability, priorities.insert(settings.priority)); @@ -226,17 +246,12 @@ ProcessListEntry::~ProcessListEntry() bool found = false; - auto range = user_process_list.queries.equal_range(query_id); - if (range.first != range.second) + if (auto running_query = user_process_list.queries.find(query_id); running_query != user_process_list.queries.end()) { - for (auto jt = range.first; jt != range.second; ++jt) + if (running_query->second == process_list_element_ptr) { - if (jt->second == process_list_element_ptr) - { - user_process_list.queries.erase(jt); - found = true; - break; - } + user_process_list.queries.erase(running_query->first); + found = true; } } @@ -245,8 +260,7 @@ ProcessListEntry::~ProcessListEntry() LOG_ERROR(&Logger::get("ProcessList"), "Logical error: cannot find query by query_id and pointer to ProcessListElement in ProcessListForUser"); std::terminate(); } - - parent.have_space.notify_one(); + parent.have_space.notify_all(); /// If there are no more queries for the user, then we will reset memory tracker and network throttler. 
if (user_process_list.queries.empty()) diff --git a/dbms/src/Interpreters/ProcessList.h b/dbms/src/Interpreters/ProcessList.h index 32f59749450..b75a4e7a730 100644 --- a/dbms/src/Interpreters/ProcessList.h +++ b/dbms/src/Interpreters/ProcessList.h @@ -203,7 +203,7 @@ struct ProcessListForUser ProcessListForUser(); /// query_id -> ProcessListElement(s). There can be multiple queries with the same query_id as long as all queries except one are cancelled. - using QueryToElement = std::unordered_multimap; + using QueryToElement = std::unordered_map; QueryToElement queries; ProfileEvents::Counters user_performance_counters{VariableContext::User, &ProfileEvents::global_counters}; diff --git a/dbms/src/Interpreters/Set.cpp b/dbms/src/Interpreters/Set.cpp index 9932bca84a0..98705caa949 100644 --- a/dbms/src/Interpreters/Set.cpp +++ b/dbms/src/Interpreters/Set.cpp @@ -465,7 +465,7 @@ MergeTreeSetIndex::MergeTreeSetIndex(const Columns & set_elements, std::vector & key_ranges, const DataTypes & data_types) +BoolMask MergeTreeSetIndex::checkInRange(const std::vector & key_ranges, const DataTypes & data_types) { size_t tuple_size = indexes_mapping.size(); diff --git a/dbms/src/Interpreters/Set.h b/dbms/src/Interpreters/Set.h index e069a74059c..61314d3582e 100644 --- a/dbms/src/Interpreters/Set.h +++ b/dbms/src/Interpreters/Set.h @@ -167,7 +167,7 @@ using Sets = std::vector; class IFunction; using FunctionPtr = std::shared_ptr; -/// Class for mayBeTrueInRange function. +/// Class for checkInRange function. class MergeTreeSetIndex { public: @@ -185,7 +185,7 @@ public: size_t size() const { return ordered_set.at(0)->size(); } - BoolMask mayBeTrueInRange(const std::vector & key_ranges, const DataTypes & data_types); + BoolMask checkInRange(const std::vector & key_ranges, const DataTypes & data_types); private: Columns ordered_set; diff --git a/dbms/src/Interpreters/SystemLog.cpp b/dbms/src/Interpreters/SystemLog.cpp index 94214b26f6e..f46b348db7a 100644 --- a/dbms/src/Interpreters/SystemLog.cpp +++ b/dbms/src/Interpreters/SystemLog.cpp @@ -50,6 +50,12 @@ SystemLogs::SystemLogs(Context & global_context, const Poco::Util::AbstractConfi SystemLogs::~SystemLogs() +{ + shutdown(); +} + + +void SystemLogs::shutdown() { if (query_log) query_log->shutdown(); diff --git a/dbms/src/Interpreters/SystemLog.h b/dbms/src/Interpreters/SystemLog.h index 59dda00e71b..48dbde5a38b 100644 --- a/dbms/src/Interpreters/SystemLog.h +++ b/dbms/src/Interpreters/SystemLog.h @@ -2,6 +2,7 @@ #include #include +#include #include #include #include @@ -67,6 +68,8 @@ struct SystemLogs SystemLogs(Context & global_context, const Poco::Util::AbstractConfiguration & config); ~SystemLogs(); + void shutdown(); + std::shared_ptr query_log; /// Used to log queries. std::shared_ptr query_thread_log; /// Used to log query threads. std::shared_ptr part_log; /// Used to log operations with parts @@ -101,22 +104,10 @@ public: /** Append a record into log. * Writing to table will be done asynchronously and in case of failure, record could be lost. */ - void add(const LogElement & element) - { - if (is_shutdown) - return; - - /// Without try we could block here in case of queue overflow. - if (!queue.tryPush({false, element})) - LOG_ERROR(log, "SystemLog queue is full"); - } + void add(const LogElement & element); /// Flush data in the buffer to disk - void flush() - { - if (!is_shutdown) - flushImpl(false); - } + void flush(); /// Stop the background flush thread before destructor. No more data will be written. 
void shutdown(); @@ -130,7 +121,15 @@ protected: const size_t flush_interval_milliseconds; std::atomic is_shutdown{false}; - using QueueItem = std::pair; /// First element is shutdown flag for thread. + enum class EntryType + { + LOG_ELEMENT = 0, + AUTO_FLUSH, + FORCE_FLUSH, + SHUTDOWN, + }; + + using QueueItem = std::pair; /// Queue is bounded. But its size is large enough not to block in all normal cases. ConcurrentBoundedQueue queue {DBMS_SYSTEM_LOG_QUEUE_SIZE}; @@ -140,7 +139,6 @@ protected: * than accumulation of large amount of log records (for example, for query log - processing of large amount of queries). */ std::vector data; - std::mutex data_mutex; Logger * log; @@ -157,7 +155,13 @@ protected: bool is_prepared = false; void prepareTable(); - void flushImpl(bool quiet); + std::mutex flush_mutex; + std::mutex condvar_mutex; + std::condition_variable flush_condvar; + bool force_flushing = false; + + /// flushImpl can be executed only in saving_thread. + void flushImpl(EntryType reason); }; @@ -178,6 +182,37 @@ SystemLog::SystemLog(Context & context_, } +template +void SystemLog::add(const LogElement & element) +{ + if (is_shutdown) + return; + + /// Without try we could block here in case of queue overflow. + if (!queue.tryPush({EntryType::LOG_ELEMENT, element})) + LOG_ERROR(log, "SystemLog queue is full"); +} + + +template +void SystemLog::flush() +{ + if (is_shutdown) + return; + + std::lock_guard flush_lock(flush_mutex); + force_flushing = true; + + /// Tell the thread to execute an extra flush. + queue.push({EntryType::FORCE_FLUSH, {}}); + + /// Wait for the flush to finish. + std::unique_lock lock(condvar_mutex); + while (force_flushing) + flush_condvar.wait(lock); +} + + template void SystemLog::shutdown() { @@ -186,7 +221,7 @@ void SystemLog::shutdown() return; /// Tell the thread to shut down. - queue.push({true, {}}); + queue.push({EntryType::SHUTDOWN, {}}); saving_thread.join(); } @@ -219,16 +254,10 @@ void SystemLog::threadFunction() QueueItem element; bool has_element = false; - bool is_empty; - { - std::unique_lock lock(data_mutex); - is_empty = data.empty(); - } - /// data.size() is increased only in this function /// TODO: get rid of data and queue duality - if (is_empty) + if (data.empty()) { queue.pop(element); has_element = true; @@ -242,25 +271,27 @@ void SystemLog::threadFunction() if (has_element) { - if (element.first) + if (element.first == EntryType::SHUTDOWN) { - /// Shutdown. /// NOTE: MergeTree engine can write data even if it is already in shutdown state. - flush(); + flushImpl(element.first); break; } - else + else if (element.first == EntryType::FORCE_FLUSH) { - std::unique_lock lock(data_mutex); - data.push_back(element.second); + flushImpl(element.first); + time_after_last_write.restart(); + continue; } + else + data.push_back(element.second); } size_t milliseconds_elapsed = time_after_last_write.elapsed() / 1000000; if (milliseconds_elapsed >= flush_interval_milliseconds) { /// Write data to a table.
- flushImpl(true); + flushImpl(EntryType::AUTO_FLUSH); time_after_last_write.restart(); } } @@ -275,13 +306,11 @@ void SystemLog::threadFunction() template -void SystemLog::flushImpl(bool quiet) +void SystemLog::flushImpl(EntryType reason) { - std::unique_lock lock(data_mutex); - try { - if (quiet && data.empty()) + if ((reason == EntryType::AUTO_FLUSH || reason == EntryType::SHUTDOWN) && data.empty()) return; LOG_TRACE(log, "Flushing system log"); @@ -320,6 +349,12 @@ void SystemLog::flushImpl(bool quiet) /// In case of exception, also clean accumulated data - to avoid locking. data.clear(); } + if (reason == EntryType::FORCE_FLUSH) + { + std::lock_guard lock(condvar_mutex); + force_flushing = false; + flush_condvar.notify_one(); + } } diff --git a/dbms/src/Storages/IStorage.cpp b/dbms/src/Storages/IStorage.cpp index 06320cc1f30..3d0ac164e26 100644 --- a/dbms/src/Storages/IStorage.cpp +++ b/dbms/src/Storages/IStorage.cpp @@ -157,7 +157,7 @@ void IStorage::check(const Names & column_names) const { if (columns_map.end() == columns_map.find(name)) throw Exception( - "There is no column with name " + name + " in table. There are columns: " + list_of_columns, + "There is no column with name " + backQuote(name) + " in table " + getTableName() + ". There are columns: " + list_of_columns, ErrorCodes::NO_SUCH_COLUMN_IN_TABLE); if (unique_names.end() != unique_names.find(name)) diff --git a/dbms/src/Storages/MergeTree/KeyCondition.cpp b/dbms/src/Storages/MergeTree/KeyCondition.cpp index 0d7bd729bcd..400feaad860 100644 --- a/dbms/src/Storages/MergeTree/KeyCondition.cpp +++ b/dbms/src/Storages/MergeTree/KeyCondition.cpp @@ -810,7 +810,7 @@ String KeyCondition::toString() const */ template -static bool forAnyParallelogram( +static BoolMask forAnyParallelogram( size_t key_size, const Field * key_left, const Field * key_right, @@ -866,16 +866,15 @@ static bool forAnyParallelogram( for (size_t i = prefix_size + 1; i < key_size; ++i) parallelogram[i] = Range(); - if (callback(parallelogram)) - return true; + BoolMask result(false, false); + result = result | callback(parallelogram); /// [x1] x [y1 .. +inf) if (left_bounded) { parallelogram[prefix_size] = Range(key_left[prefix_size]); - if (forAnyParallelogram(key_size, key_left, key_right, true, false, parallelogram, prefix_size + 1, callback)) - return true; + result = result | forAnyParallelogram(key_size, key_left, key_right, true, false, parallelogram, prefix_size + 1, callback); } /// [x2] x (-inf .. 
y2] @@ -883,15 +882,14 @@ static bool forAnyParallelogram( if (right_bounded) { parallelogram[prefix_size] = Range(key_right[prefix_size]); - if (forAnyParallelogram(key_size, key_left, key_right, false, true, parallelogram, prefix_size + 1, callback)) - return true; + result = result | forAnyParallelogram(key_size, key_left, key_right, false, true, parallelogram, prefix_size + 1, callback); } - return false; + return result; } -bool KeyCondition::mayBeTrueInRange( +BoolMask KeyCondition::checkInRange( size_t used_key_size, const Field * left_key, const Field * right_key, @@ -917,7 +915,7 @@ bool KeyCondition::mayBeTrueInRange( return forAnyParallelogram(used_key_size, left_key, right_key, true, right_bounded, key_ranges, 0, [&] (const std::vector & key_ranges_parallelogram) { - auto res = mayBeTrueInParallelogram(key_ranges_parallelogram, data_types); + auto res = checkInParallelogram(key_ranges_parallelogram, data_types); /* std::cerr << "Parallelogram: "; for (size_t i = 0, size = key_ranges.size(); i != size; ++i) @@ -928,11 +926,11 @@ bool KeyCondition::mayBeTrueInRange( }); } + std::optional KeyCondition::applyMonotonicFunctionsChainToRange( Range key_range, MonotonicFunctionsChain & functions, - DataTypePtr current_type -) + DataTypePtr current_type) { for (auto & func : functions) { @@ -965,7 +963,7 @@ std::optional KeyCondition::applyMonotonicFunctionsChainToRange( return key_range; } -bool KeyCondition::mayBeTrueInParallelogram(const std::vector & parallelogram, const DataTypes & data_types) const +BoolMask KeyCondition::checkInParallelogram(const std::vector & parallelogram, const DataTypes & data_types) const { std::vector rpn_stack; for (size_t i = 0; i < rpn.size(); ++i) @@ -1013,7 +1011,7 @@ bool KeyCondition::mayBeTrueInParallelogram(const std::vector & parallelo if (!element.set_index) throw Exception("Set for IN is not created yet", ErrorCodes::LOGICAL_ERROR); - rpn_stack.emplace_back(element.set_index->mayBeTrueInRange(parallelogram, data_types)); + rpn_stack.emplace_back(element.set_index->checkInRange(parallelogram, data_types)); if (element.function == RPNElement::FUNCTION_NOT_IN_SET) rpn_stack.back() = !rpn_stack.back(); } @@ -1048,22 +1046,23 @@ bool KeyCondition::mayBeTrueInParallelogram(const std::vector & parallelo } if (rpn_stack.size() != 1) - throw Exception("Unexpected stack size in KeyCondition::mayBeTrueInRange", ErrorCodes::LOGICAL_ERROR); + throw Exception("Unexpected stack size in KeyCondition::checkInRange", ErrorCodes::LOGICAL_ERROR); - return rpn_stack[0].can_be_true; + return rpn_stack[0]; } -bool KeyCondition::mayBeTrueInRange( +BoolMask KeyCondition::checkInRange( size_t used_key_size, const Field * left_key, const Field * right_key, const DataTypes & data_types) const { - return mayBeTrueInRange(used_key_size, left_key, right_key, data_types, true); + return checkInRange(used_key_size, left_key, right_key, data_types, true); } -bool KeyCondition::mayBeTrueAfter( + +BoolMask KeyCondition::getMaskAfter( size_t used_key_size, const Field * left_key, const DataTypes & data_types) const { - return mayBeTrueInRange(used_key_size, left_key, nullptr, data_types, false); + return checkInRange(used_key_size, left_key, nullptr, data_types, false); } diff --git a/dbms/src/Storages/MergeTree/KeyCondition.h b/dbms/src/Storages/MergeTree/KeyCondition.h index 61989d1b2d9..2a5c520b243 100644 --- a/dbms/src/Storages/MergeTree/KeyCondition.h +++ b/dbms/src/Storages/MergeTree/KeyCondition.h @@ -235,17 +235,17 @@ public: const Names & key_column_names, const 
ExpressionActionsPtr & key_expr); - /// Whether the condition is feasible in the key range. + /// Whether the condition and its negation are (independently) feasible in the key range. /// left_key and right_key must contain all fields in the sort_descr in the appropriate order. /// data_types - the types of the key columns. - bool mayBeTrueInRange(size_t used_key_size, const Field * left_key, const Field * right_key, const DataTypes & data_types) const; + BoolMask checkInRange(size_t used_key_size, const Field * left_key, const Field * right_key, const DataTypes & data_types) const; - /// Whether the condition is feasible in the direct product of single column ranges specified by `parallelogram`. - bool mayBeTrueInParallelogram(const std::vector & parallelogram, const DataTypes & data_types) const; + /// Whether the condition and its negation are feasible in the direct product of single column ranges specified by `parallelogram`. + BoolMask checkInParallelogram(const std::vector & parallelogram, const DataTypes & data_types) const; - /// Is the condition valid in a semi-infinite (not limited to the right) key range. + /// Are the condition and its negation valid in a semi-infinite (not limited to the right) key range. /// left_key must contain all the fields in the sort_descr in the appropriate order. - bool mayBeTrueAfter(size_t used_key_size, const Field * left_key, const DataTypes & data_types) const; + BoolMask getMaskAfter(size_t used_key_size, const Field * left_key, const DataTypes & data_types) const; /// Checks that the index can not be used. bool alwaysUnknownOrTrue() const; @@ -330,7 +330,7 @@ public: static const AtomMap atom_map; private: - bool mayBeTrueInRange( + BoolMask checkInRange( size_t used_key_size, const Field * left_key, const Field * right_key, diff --git a/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp b/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp index d8002f91a07..dfbd9c0e246 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp @@ -265,8 +265,8 @@ BlockInputStreams MergeTreeDataSelectExecutor::readFromParts( if (part->isEmpty()) continue; - if (minmax_idx_condition && !minmax_idx_condition->mayBeTrueInParallelogram( - part->minmax_idx.parallelogram, data.minmax_idx_column_types)) + if (minmax_idx_condition && !minmax_idx_condition->checkInParallelogram( + part->minmax_idx.parallelogram, data.minmax_idx_column_types).can_be_true) continue; if (max_block_numbers_to_read) @@ -518,7 +518,7 @@ BlockInputStreams MergeTreeDataSelectExecutor::readFromParts( RangesInDataParts parts_with_ranges; - std::vector> useful_indices; + std::vector> useful_indices; for (const auto & index : data.skip_indices) { auto condition = index->createIndexCondition(query_info, context); @@ -950,8 +950,8 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange( for (size_t i = 0; i < used_key_size; ++i) index[i]->get(range.begin, index_left[i]); - may_be_true = key_condition.mayBeTrueAfter( - used_key_size, index_left.data(), data.primary_key_data_types); + may_be_true = key_condition.getMaskAfter( + used_key_size, index_left.data(), data.primary_key_data_types).can_be_true; } else { @@ -964,8 +964,8 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange( index[i]->get(range.end, index_right[i]); } - may_be_true = key_condition.mayBeTrueInRange( - used_key_size, index_left.data(), index_right.data(), data.primary_key_data_types); + may_be_true = 
key_condition.checkInRange( + used_key_size, index_left.data(), index_right.data(), data.primary_key_data_types).can_be_true; } if (!may_be_true) @@ -998,7 +998,7 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange( MarkRanges MergeTreeDataSelectExecutor::filterMarksUsingIndex( MergeTreeIndexPtr index, - IndexConditionPtr condition, + MergeTreeIndexConditionPtr condition, MergeTreeData::DataPartPtr part, const MarkRanges & ranges, const Settings & settings) const diff --git a/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h b/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h index a949d593904..d38d00d055b 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h +++ b/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h @@ -84,7 +84,7 @@ private: MarkRanges filterMarksUsingIndex( MergeTreeIndexPtr index, - IndexConditionPtr condition, + MergeTreeIndexConditionPtr condition, MergeTreeData::DataPartPtr part, const MarkRanges & ranges, const Settings & settings) const; diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexAggregatorBloomFilter.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexAggregatorBloomFilter.cpp new file mode 100644 index 00000000000..760721b5f3c --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexAggregatorBloomFilter.cpp @@ -0,0 +1,62 @@ +#include + +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; + extern const int ILLEGAL_COLUMN; +} + +MergeTreeIndexAggregatorBloomFilter::MergeTreeIndexAggregatorBloomFilter( + size_t bits_per_row_, size_t hash_functions_, const Names & columns_name_) + : bits_per_row(bits_per_row_), hash_functions(hash_functions_), index_columns_name(columns_name_) +{ +} + +bool MergeTreeIndexAggregatorBloomFilter::empty() const +{ + return !total_rows; +} + +MergeTreeIndexGranulePtr MergeTreeIndexAggregatorBloomFilter::getGranuleAndReset() +{ + const auto granule = std::make_shared(bits_per_row, hash_functions, total_rows, granule_index_blocks); + total_rows = 0; + granule_index_blocks.clear(); + return granule; +} + +void MergeTreeIndexAggregatorBloomFilter::update(const Block & block, size_t * pos, size_t limit) +{ + if (*pos >= block.rows()) + throw Exception("The provided position is not less than the number of block rows. 
Position: " + toString(*pos) + ", Block rows: " + + toString(block.rows()) + ".", ErrorCodes::LOGICAL_ERROR); + + Block granule_index_block; + size_t max_read_rows = std::min(block.rows() - *pos, limit); + + for (size_t index = 0; index < index_columns_name.size(); ++index) + { + const auto & column_and_type = block.getByName(index_columns_name[index]); + const auto & index_column = BloomFilterHash::hashWithColumn(column_and_type.type, column_and_type.column, *pos, max_read_rows); + + granule_index_block.insert({std::move(index_column), std::make_shared(), column_and_type.name}); + } + + *pos += max_read_rows; + total_rows += max_read_rows; + granule_index_blocks.push_back(granule_index_block); +} + +} diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexAggregatorBloomFilter.h b/dbms/src/Storages/MergeTree/MergeTreeIndexAggregatorBloomFilter.h new file mode 100644 index 00000000000..ebbe9865313 --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexAggregatorBloomFilter.h @@ -0,0 +1,29 @@ +#pragma once + +#include +#include + +namespace DB +{ + +class MergeTreeIndexAggregatorBloomFilter : public IMergeTreeIndexAggregator +{ +public: + MergeTreeIndexAggregatorBloomFilter(size_t bits_per_row_, size_t hash_functions_, const Names & columns_name_); + + bool empty() const override; + + MergeTreeIndexGranulePtr getGranuleAndReset() override; + + void update(const Block & block, size_t * pos, size_t limit) override; + +private: + size_t bits_per_row; + size_t hash_functions; + const Names index_columns_name; + + size_t total_rows = 0; + Blocks granule_index_blocks; +}; + +} diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexBloomFilter.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexBloomFilter.cpp new file mode 100644 index 00000000000..b86da56649d --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexBloomFilter.cpp @@ -0,0 +1,110 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; + extern const int INCORRECT_QUERY; +} + +MergeTreeIndexBloomFilter::MergeTreeIndexBloomFilter( + const String & name_, const ExpressionActionsPtr & expr_, const Names & columns_, const DataTypes & data_types_, const Block & header_, + size_t granularity_, size_t bits_per_row_, size_t hash_functions_) + : IMergeTreeIndex(name_, expr_, columns_, data_types_, header_, granularity_), bits_per_row(bits_per_row_), + hash_functions(hash_functions_) +{ +} + +MergeTreeIndexGranulePtr MergeTreeIndexBloomFilter::createIndexGranule() const +{ + return std::make_shared(bits_per_row, hash_functions, columns.size()); +} + +bool MergeTreeIndexBloomFilter::mayBenefitFromIndexForIn(const ASTPtr & node) const +{ + const String & column_name = node->getColumnName(); + + for (const auto & name : columns) + if (column_name == name) + return true; + + if (const auto * func = typeid_cast(node.get())) + { + for (const auto & children : func->arguments->children) + if (mayBenefitFromIndexForIn(children)) + return true; + } + + return false; +} + +MergeTreeIndexAggregatorPtr MergeTreeIndexBloomFilter::createIndexAggregator() const +{ + return std::make_shared(bits_per_row, hash_functions, columns); +} + +MergeTreeIndexConditionPtr MergeTreeIndexBloomFilter::createIndexCondition(const SelectQueryInfo & query_info, const Context & context) const +{ + return std::make_shared(query_info, context, header, hash_functions); +} + +static void 
assertIndexColumnsType(const Block & header) +{ + if (!header || !header.columns()) + throw Exception("Index must have columns.", ErrorCodes::INCORRECT_QUERY); + + const DataTypes & columns_data_types = header.getDataTypes(); + + for (size_t index = 0; index < columns_data_types.size(); ++index) + { + WhichDataType which(columns_data_types[index]); + + if (!which.isUInt() && !which.isInt() && !which.isString() && !which.isFixedString() && !which.isFloat() && + !which.isDateOrDateTime() && !which.isEnum()) + throw Exception("Unexpected type " + columns_data_types[index]->getName() + " of bloom filter index.", + ErrorCodes::ILLEGAL_COLUMN); + } +} + +std::unique_ptr bloomFilterIndexCreatorNew( + const NamesAndTypesList & columns, std::shared_ptr node, const Context & context) +{ + if (node->name.empty()) + throw Exception("Index must have unique name.", ErrorCodes::INCORRECT_QUERY); + + ASTPtr expr_list = MergeTreeData::extractKeyExpressionList(node->expr->clone()); + + auto syntax = SyntaxAnalyzer(context, {}).analyze(expr_list, columns); + auto index_expr = ExpressionAnalyzer(expr_list, syntax, context).getActions(false); + auto index_sample = ExpressionAnalyzer(expr_list, syntax, context).getActions(true)->getSampleBlock(); + + assertIndexColumnsType(index_sample); + + double max_conflict_probability = 0.025; + if (node->type->arguments && !node->type->arguments->children.empty()) + max_conflict_probability = typeid_cast(*node->type->arguments->children[0]).value.get(); + + const auto & bits_per_row_and_size_of_hash_functions = BloomFilterHash::calculationBestPractices(max_conflict_probability); + + return std::make_unique( + node->name, std::move(index_expr), index_sample.getNames(), index_sample.getDataTypes(), index_sample, node->granularity, + bits_per_row_and_size_of_hash_functions.first, bits_per_row_and_size_of_hash_functions.second); +} + +} diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexBloomFilter.h b/dbms/src/Storages/MergeTree/MergeTreeIndexBloomFilter.h new file mode 100644 index 00000000000..2b89b9bddfa --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexBloomFilter.h @@ -0,0 +1,31 @@ +#pragma once + +#include +#include +#include +#include + +namespace DB +{ + +class MergeTreeIndexBloomFilter : public IMergeTreeIndex +{ +public: + MergeTreeIndexBloomFilter( + const String & name_, const ExpressionActionsPtr & expr_, const Names & columns_, const DataTypes & data_types_, + const Block & header_, size_t granularity_, size_t bits_per_row_, size_t hash_functions_); + + MergeTreeIndexGranulePtr createIndexGranule() const override; + + MergeTreeIndexAggregatorPtr createIndexAggregator() const override; + + MergeTreeIndexConditionPtr createIndexCondition(const SelectQueryInfo & query_info, const Context & context) const override; + + bool mayBenefitFromIndexForIn(const ASTPtr & node) const override; + +private: + size_t bits_per_row; + size_t hash_functions; +}; + +} diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.cpp new file mode 100644 index 00000000000..9c8a9d4b41c --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.cpp @@ -0,0 +1,352 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace +{ + +PreparedSetKey getPreparedSetKey(const ASTPtr & node, const DataTypePtr & data_type) +{ + /// If the data type is tuple, 
let's try to unbox it once + if (node->as() || node->as()) + return PreparedSetKey::forSubquery(*node); + + if (const auto * data_type_tuple = typeid_cast(&*data_type)) + return PreparedSetKey::forLiteral(*node, data_type_tuple->getElements()); + + return PreparedSetKey::forLiteral(*node, DataTypes(1, data_type)); +} + +ColumnWithTypeAndName getPreparedSetInfo(const SetPtr & prepared_set) +{ + if (prepared_set->getDataTypes().size() == 1) + return {prepared_set->getSetElements()[0], prepared_set->getDataTypes()[0], "dummy"}; + + return {ColumnTuple::create(prepared_set->getSetElements()), std::make_shared(prepared_set->getDataTypes()), "dummy"}; +} + +bool maybeTrueOnBloomFilter(const IColumn * hash_column, const BloomFilterPtr & bloom_filter, size_t hash_functions) +{ + const auto const_column = typeid_cast(hash_column); + const auto non_const_column = typeid_cast(hash_column); + + if (!const_column && !non_const_column) + throw Exception("LOGICAL ERROR: hash column must be a Const column or a UInt64 column.", ErrorCodes::LOGICAL_ERROR); + + if (const_column) + { + for (size_t index = 0; index < hash_functions; ++index) + if (!bloom_filter->findHashWithSeed(const_column->getValue(), BloomFilterHash::bf_hash_seed[index])) + return false; + return true; + } + else + { + bool missing_rows = true; + const ColumnUInt64::Container & data = non_const_column->getData(); + + for (size_t index = 0, size = data.size(); missing_rows && index < size; ++index) + { + bool match_row = true; + for (size_t hash_index = 0; match_row && hash_index < hash_functions; ++hash_index) + match_row = bloom_filter->findHashWithSeed(data[index], BloomFilterHash::bf_hash_seed[hash_index]); + + missing_rows = !match_row; + } + + return !missing_rows; + } +} + +} + +MergeTreeIndexConditionBloomFilter::MergeTreeIndexConditionBloomFilter( + const SelectQueryInfo & info, const Context & context, const Block & header, size_t hash_functions) + : header(header), context(context), query_info(info), hash_functions(hash_functions) +{ + auto atomFromAST = [this](auto & node, auto &, auto & constants, auto & out) { return traverseAtomAST(node, constants, out); }; + rpn = std::move(RPNBuilder(info, context, atomFromAST).extractRPN()); +} + +bool MergeTreeIndexConditionBloomFilter::alwaysUnknownOrTrue() const +{ + std::vector rpn_stack; + + for (const auto & element : rpn) + { + if (element.function == RPNElement::FUNCTION_UNKNOWN + || element.function == RPNElement::ALWAYS_TRUE) + { + rpn_stack.push_back(true); + } + else if (element.function == RPNElement::FUNCTION_EQUALS + || element.function == RPNElement::FUNCTION_NOT_EQUALS + || element.function == RPNElement::FUNCTION_IN + || element.function == RPNElement::FUNCTION_NOT_IN + || element.function == RPNElement::ALWAYS_FALSE) + { + rpn_stack.push_back(false); + } + else if (element.function == RPNElement::FUNCTION_NOT) + { + // do nothing + } + else if (element.function == RPNElement::FUNCTION_AND) + { + auto arg1 = rpn_stack.back(); + rpn_stack.pop_back(); + auto arg2 = rpn_stack.back(); + rpn_stack.back() = arg1 && arg2; + } + else if (element.function == RPNElement::FUNCTION_OR) + { + auto arg1 = rpn_stack.back(); + rpn_stack.pop_back(); + auto arg2 = rpn_stack.back(); + rpn_stack.back() = arg1 || arg2; + } + else + throw Exception("Unexpected function type in MergeTreeIndexConditionBloomFilter::RPNElement", ErrorCodes::LOGICAL_ERROR); + } + + return rpn_stack[0]; +} + +bool MergeTreeIndexConditionBloomFilter::mayBeTrueOnGranule(const MergeTreeIndexGranuleBloomFilter * granule) const +{ + std::vector 
rpn_stack; + const auto & filters = granule->getFilters(); + + for (const auto & element : rpn) + { + if (element.function == RPNElement::FUNCTION_UNKNOWN) + { + rpn_stack.emplace_back(true, true); + } + else if (element.function == RPNElement::FUNCTION_IN + || element.function == RPNElement::FUNCTION_NOT_IN + || element.function == RPNElement::FUNCTION_EQUALS + || element.function == RPNElement::FUNCTION_NOT_EQUALS) + { + bool match_rows = true; + const auto & predicate = element.predicate; + for (size_t index = 0; match_rows && index < predicate.size(); ++index) + { + const auto & query_index_hash = predicate[index]; + const auto & filter = filters[query_index_hash.first]; + const ColumnPtr & hash_column = query_index_hash.second; + match_rows = maybeTrueOnBloomFilter(&*hash_column, filter, hash_functions); + } + + rpn_stack.emplace_back(match_rows, !match_rows); + if (element.function == RPNElement::FUNCTION_NOT_EQUALS || element.function == RPNElement::FUNCTION_NOT_IN) + rpn_stack.back() = !rpn_stack.back(); + } + else if (element.function == RPNElement::FUNCTION_NOT) + { + rpn_stack.back() = !rpn_stack.back(); + } + else if (element.function == RPNElement::FUNCTION_OR) + { + auto arg1 = rpn_stack.back(); + rpn_stack.pop_back(); + auto arg2 = rpn_stack.back(); + rpn_stack.back() = arg1 | arg2; + } + else if (element.function == RPNElement::FUNCTION_AND) + { + auto arg1 = rpn_stack.back(); + rpn_stack.pop_back(); + auto arg2 = rpn_stack.back(); + rpn_stack.back() = arg1 & arg2; + } + else if (element.function == RPNElement::ALWAYS_TRUE) + { + rpn_stack.emplace_back(true, false); + } + else if (element.function == RPNElement::ALWAYS_FALSE) + { + rpn_stack.emplace_back(false, true); + } + else + throw Exception("Unexpected function type in MergeTreeIndexConditionBloomFilter::RPNElement", ErrorCodes::LOGICAL_ERROR); + } + + if (rpn_stack.size() != 1) + throw Exception("Unexpected stack size in MergeTreeIndexConditionBloomFilter::mayBeTrueOnGranule", ErrorCodes::LOGICAL_ERROR); + + return rpn_stack[0].can_be_true; +} + +bool MergeTreeIndexConditionBloomFilter::traverseAtomAST(const ASTPtr & node, Block & block_with_constants, RPNElement & out) +{ + { + Field const_value; + DataTypePtr const_type; + if (KeyCondition::getConstant(node, block_with_constants, const_value, const_type)) + { + if (const_value.getType() == Field::Types::UInt64 || const_value.getType() == Field::Types::Int64 || + const_value.getType() == Field::Types::Float64) + { + /// Zero in all types is represented in memory the same way as in UInt64. + out.function = const_value.get() ? 
RPNElement::ALWAYS_TRUE : RPNElement::ALWAYS_FALSE; + return true; + } + } + } + + if (const auto * function = node->as()) + { + const ASTs & arguments = function->arguments->children; + + if (arguments.size() != 2) + return false; + + if (functionIsInOrGlobalInOperator(function->name)) + { + if (const auto & prepared_set = getPreparedSet(arguments[1])) + return traverseASTIn(function->name, arguments[0], prepared_set, out); + } + else if (function->name == "equals" || function->name == "notEquals") + { + Field const_value; + DataTypePtr const_type; + if (KeyCondition::getConstant(arguments[1], block_with_constants, const_value, const_type)) + return traverseASTEquals(function->name, arguments[0], const_type, const_value, out); + else if (KeyCondition::getConstant(arguments[0], block_with_constants, const_value, const_type)) + return traverseASTEquals(function->name, arguments[1], const_type, const_value, out); + } + } + + return false; +} + +bool MergeTreeIndexConditionBloomFilter::traverseASTIn( + const String & function_name, const ASTPtr & key_ast, const SetPtr & prepared_set, RPNElement & out) +{ + const auto & prepared_info = getPreparedSetInfo(prepared_set); + return traverseASTIn(function_name, key_ast, prepared_info.type, prepared_info.column, out); +} + +bool MergeTreeIndexConditionBloomFilter::traverseASTIn( + const String & function_name, const ASTPtr & key_ast, const DataTypePtr & type, const ColumnPtr & column, RPNElement & out) +{ + if (header.has(key_ast->getColumnName())) + { + size_t row_size = column->size(); + size_t position = header.getPositionByName(key_ast->getColumnName()); + const DataTypePtr & index_type = header.getByPosition(position).type; + const auto & converted_column = castColumn(ColumnWithTypeAndName{column, type, ""}, index_type, context); + out.predicate.emplace_back(std::make_pair(position, BloomFilterHash::hashWithColumn(index_type, converted_column, 0, row_size))); + + if (function_name == "in" || function_name == "globalIn") + out.function = RPNElement::FUNCTION_IN; + + if (function_name == "notIn" || function_name == "globalNotIn") + out.function = RPNElement::FUNCTION_NOT_IN; + + return true; + } + + if (const auto * function = key_ast->as()) + { + WhichDataType which(type); + + if (which.isTuple() && function->name == "tuple") + { + const auto & tuple_column = typeid_cast(column.get()); + const auto & tuple_data_type = typeid_cast(type.get()); + const ASTs & arguments = typeid_cast(*function->arguments).children; + + if (tuple_data_type->getElements().size() != arguments.size() || tuple_column->getColumns().size() != arguments.size()) + throw Exception("Illegal types of arguments of function " + function_name, ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + bool match_with_subtype = false; + const auto & sub_columns = tuple_column->getColumns(); + const auto & sub_data_types = tuple_data_type->getElements(); + + for (size_t index = 0; index < arguments.size(); ++index) + match_with_subtype |= traverseASTIn(function_name, arguments[index], sub_data_types[index], sub_columns[index], out); + + return match_with_subtype; + } + } + + return false; +} + +bool MergeTreeIndexConditionBloomFilter::traverseASTEquals( + const String & function_name, const ASTPtr & key_ast, const DataTypePtr & value_type, const Field & value_field, RPNElement & out) +{ + if (header.has(key_ast->getColumnName())) + { + size_t position = header.getPositionByName(key_ast->getColumnName()); + const DataTypePtr & index_type = header.getByPosition(position).type; + Field 
converted_field = convertFieldToType(value_field, *index_type, &*value_type); + out.predicate.emplace_back(std::make_pair(position, BloomFilterHash::hashWithField(&*index_type, converted_field))); + out.function = function_name == "equals" ? RPNElement::FUNCTION_EQUALS : RPNElement::FUNCTION_NOT_EQUALS; + return true; + } + + if (const auto * function = key_ast->as()) + { + WhichDataType which(value_type); + + if (which.isTuple() && function->name == "tuple") + { + const TupleBackend & tuple = get(value_field).toUnderType(); + const auto value_tuple_data_type = typeid_cast(value_type.get()); + const ASTs & arguments = typeid_cast(*function->arguments).children; + + if (tuple.size() != arguments.size()) + throw Exception("Illegal types of arguments of function " + function_name, ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + bool match_with_subtype = false; + const DataTypes & subtypes = value_tuple_data_type->getElements(); + + for (size_t index = 0; index < tuple.size(); ++index) + match_with_subtype |= traverseASTEquals(function_name, arguments[index], subtypes[index], tuple[index], out); + + return match_with_subtype; + } + } + + return false; +} + +SetPtr MergeTreeIndexConditionBloomFilter::getPreparedSet(const ASTPtr & node) +{ + if (header.has(node->getColumnName())) + { + const auto & column_and_type = header.getByName(node->getColumnName()); + const auto & prepared_set_it = query_info.sets.find(getPreparedSetKey(node, column_and_type.type)); + + if (prepared_set_it != query_info.sets.end() && prepared_set_it->second->hasExplicitSetElements()) + return prepared_set_it->second; + } + else + { + for (const auto & prepared_set_it : query_info.sets) + if (prepared_set_it.first.ast_hash == node->getTreeHash() && prepared_set_it.second->hasExplicitSetElements()) + return prepared_set_it.second; + } + + return DB::SetPtr(); +} + +} diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.h b/dbms/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.h new file mode 100644 index 00000000000..6c268cadbb6 --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.h @@ -0,0 +1,74 @@ +#pragma once + +#include +#include +#include +#include +#include + +namespace DB +{ + +class MergeTreeIndexConditionBloomFilter : public IMergeTreeIndexCondition +{ +public: + struct RPNElement + { + enum Function + { + /// Atoms of a Boolean expression. + FUNCTION_EQUALS, + FUNCTION_NOT_EQUALS, + FUNCTION_IN, + FUNCTION_NOT_IN, + FUNCTION_UNKNOWN, /// Can take any value. + /// Operators of the logical expression. 
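+ /// These are folded over BoolMask values on the RPN stack in mayBeTrueOnGranule, so negating an unknown atom leaves it unknown.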
+ FUNCTION_NOT, + FUNCTION_AND, + FUNCTION_OR, + /// Constants + ALWAYS_FALSE, + ALWAYS_TRUE, + }; + + RPNElement(Function function_ = FUNCTION_UNKNOWN) : function(function_) {} + + Function function = FUNCTION_UNKNOWN; + std::vector> predicate; + }; + + MergeTreeIndexConditionBloomFilter(const SelectQueryInfo & info, const Context & context, const Block & header, size_t hash_functions); + + bool alwaysUnknownOrTrue() const override; + + bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr granule) const override + { + if (const auto & bf_granule = typeid_cast(granule.get())) + return mayBeTrueOnGranule(bf_granule); + + throw Exception("LOGICAL ERROR: bloom filter index granule is required.", ErrorCodes::LOGICAL_ERROR); + } + +private: + const Block & header; + const Context & context; + const SelectQueryInfo & query_info; + const size_t hash_functions; + std::vector rpn; + + SetPtr getPreparedSet(const ASTPtr & node); + + bool mayBeTrueOnGranule(const MergeTreeIndexGranuleBloomFilter * granule) const; + + bool traverseAtomAST(const ASTPtr & node, Block & block_with_constants, RPNElement & out); + + bool traverseASTIn(const String & function_name, const ASTPtr & key_ast, const SetPtr & prepared_set, RPNElement & out); + + bool traverseASTIn( + const String & function_name, const ASTPtr & key_ast, const DataTypePtr & type, const ColumnPtr & column, RPNElement & out); + + bool traverseASTEquals( + const String & function_name, const ASTPtr & key_ast, const DataTypePtr & value_type, const Field & value_field, RPNElement & out); +}; + +} diff --git a/dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexFullText.cpp similarity index 86% rename from dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.cpp rename to dbms/src/Storages/MergeTree/MergeTreeIndexFullText.cpp index 966775e4017..895764339e5 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexFullText.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include @@ -31,7 +31,7 @@ namespace ErrorCodes /// Adds all tokens from string to bloom filter. static void stringToBloomFilter( - const char * data, size_t size, const std::unique_ptr & token_extractor, StringBloomFilter & bloom_filter) + const char * data, size_t size, const std::unique_ptr & token_extractor, BloomFilter & bloom_filter) { size_t cur = 0; size_t token_start = 0; @@ -42,7 +42,7 @@ static void stringToBloomFilter( /// Adds all tokens from like pattern string to bloom filter. (Because like pattern can contain `\%` and `\_`.) 
static void likeStringToBloomFilter( - const String & data, const std::unique_ptr & token_extractor, StringBloomFilter & bloom_filter) + const String & data, const std::unique_ptr & token_extractor, BloomFilter & bloom_filter) { size_t cur = 0; String token; @@ -51,24 +51,23 @@ static void likeStringToBloomFilter( } -MergeTreeBloomFilterIndexGranule::MergeTreeBloomFilterIndexGranule(const MergeTreeBloomFilterIndex & index) +MergeTreeIndexGranuleFullText::MergeTreeIndexGranuleFullText(const MergeTreeIndexFullText & index) : IMergeTreeIndexGranule() , index(index) , bloom_filters( - index.columns.size(), StringBloomFilter(index.bloom_filter_size, index.bloom_filter_hashes, index.seed)) + index.columns.size(), BloomFilter(index.bloom_filter_size, index.bloom_filter_hashes, index.seed)) , has_elems(false) {} -void MergeTreeBloomFilterIndexGranule::serializeBinary(WriteBuffer & ostr) const +void MergeTreeIndexGranuleFullText::serializeBinary(WriteBuffer & ostr) const { if (empty()) - throw Exception( - "Attempt to write empty minmax index " + backQuote(index.name), ErrorCodes::LOGICAL_ERROR); + throw Exception("Attempt to write empty fulltext index " + backQuote(index.name), ErrorCodes::LOGICAL_ERROR); for (const auto & bloom_filter : bloom_filters) ostr.write(reinterpret_cast(bloom_filter.getFilter().data()), index.bloom_filter_size); } -void MergeTreeBloomFilterIndexGranule::deserializeBinary(ReadBuffer & istr) +void MergeTreeIndexGranuleFullText::deserializeBinary(ReadBuffer & istr) { for (auto & bloom_filter : bloom_filters) { @@ -78,17 +77,17 @@ void MergeTreeBloomFilterIndexGranule::deserializeBinary(ReadBuffer & istr) } -MergeTreeBloomFilterIndexAggregator::MergeTreeBloomFilterIndexAggregator(const MergeTreeBloomFilterIndex & index) - : index(index), granule(std::make_shared(index)) {} +MergeTreeIndexAggregatorFullText::MergeTreeIndexAggregatorFullText(const MergeTreeIndexFullText & index) + : index(index), granule(std::make_shared(index)) {} -MergeTreeIndexGranulePtr MergeTreeBloomFilterIndexAggregator::getGranuleAndReset() +MergeTreeIndexGranulePtr MergeTreeIndexAggregatorFullText::getGranuleAndReset() { - auto new_granule = std::make_shared(index); + auto new_granule = std::make_shared(index); new_granule.swap(granule); return new_granule; } -void MergeTreeBloomFilterIndexAggregator::update(const Block & block, size_t * pos, size_t limit) +void MergeTreeIndexAggregatorFullText::update(const Block & block, size_t * pos, size_t limit) { if (*pos >= block.rows()) throw Exception( @@ -111,14 +110,14 @@ void MergeTreeBloomFilterIndexAggregator::update(const Block & block, size_t * p } -const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map +const MergeTreeConditionFullText::AtomMap MergeTreeConditionFullText::atom_map { { "notEquals", - [] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx) + [] (RPNElement & out, const Field & value, const MergeTreeIndexFullText & idx) { out.function = RPNElement::FUNCTION_NOT_EQUALS; - out.bloom_filter = std::make_unique( + out.bloom_filter = std::make_unique( idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed); const auto & str = value.get(); @@ -128,10 +127,10 @@ const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map }, { "equals", - [] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx) + [] (RPNElement & out, const Field & value, const MergeTreeIndexFullText & idx) { out.function = RPNElement::FUNCTION_EQUALS; - out.bloom_filter = std::make_unique( + out.bloom_filter = 
std::make_unique( idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed); const auto & str = value.get(); @@ -141,10 +140,10 @@ const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map }, { "like", - [] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx) + [] (RPNElement & out, const Field & value, const MergeTreeIndexFullText & idx) { out.function = RPNElement::FUNCTION_LIKE; - out.bloom_filter = std::make_unique( + out.bloom_filter = std::make_unique( idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed); const auto & str = value.get(); @@ -154,7 +153,7 @@ const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map }, { "notIn", - [] (RPNElement & out, const Field &, const MergeTreeBloomFilterIndex &) + [] (RPNElement & out, const Field &, const MergeTreeIndexFullText &) { out.function = RPNElement::FUNCTION_NOT_IN; return true; @@ -162,7 +161,7 @@ const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map }, { "in", - [] (RPNElement & out, const Field &, const MergeTreeBloomFilterIndex &) + [] (RPNElement & out, const Field &, const MergeTreeIndexFullText &) { out.function = RPNElement::FUNCTION_IN; return true; @@ -170,24 +169,21 @@ const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map }, }; -BloomFilterCondition::BloomFilterCondition( +MergeTreeConditionFullText::MergeTreeConditionFullText( const SelectQueryInfo & query_info, const Context & context, - const MergeTreeBloomFilterIndex & index_) : index(index_), prepared_sets(query_info.sets) + const MergeTreeIndexFullText & index_) : index(index_), prepared_sets(query_info.sets) { rpn = std::move( RPNBuilder( query_info, context, - [this] (const ASTPtr & node, - const Context & /* context */, - Block & block_with_constants, - RPNElement & out) -> bool + [this] (const ASTPtr & node, const Context & /* context */, Block & block_with_constants, RPNElement & out) -> bool { return this->atomFromAST(node, block_with_constants, out); }).extractRPN()); } -bool BloomFilterCondition::alwaysUnknownOrTrue() const +bool MergeTreeConditionFullText::alwaysUnknownOrTrue() const { /// Check like in KeyCondition. 
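+ /// As in KeyCondition::alwaysUnknownOrTrue, unknown atoms push true and usable atoms push false; the result tells whether the index can not be used for this query.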
std::vector rpn_stack; @@ -234,10 +230,10 @@ bool BloomFilterCondition::alwaysUnknownOrTrue() const return rpn_stack[0]; } -bool BloomFilterCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const +bool MergeTreeConditionFullText::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const { - std::shared_ptr granule - = std::dynamic_pointer_cast(idx_granule); + std::shared_ptr granule + = std::dynamic_pointer_cast(idx_granule); if (!granule) throw Exception( "BloomFilter index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR); @@ -314,16 +310,16 @@ bool BloomFilterCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granu rpn_stack.emplace_back(true, false); } else - throw Exception("Unexpected function type in KeyCondition::RPNElement", ErrorCodes::LOGICAL_ERROR); + throw Exception("Unexpected function type in BloomFilterCondition::RPNElement", ErrorCodes::LOGICAL_ERROR); } if (rpn_stack.size() != 1) - throw Exception("Unexpected stack size in KeyCondition::mayBeTrueInRange", ErrorCodes::LOGICAL_ERROR); + throw Exception("Unexpected stack size in BloomFilterCondition::mayBeTrueOnGranule", ErrorCodes::LOGICAL_ERROR); return rpn_stack[0].can_be_true; } -bool BloomFilterCondition::getKey(const ASTPtr & node, size_t & key_column_num) +bool MergeTreeConditionFullText::getKey(const ASTPtr & node, size_t & key_column_num) { auto it = std::find(index.columns.begin(), index.columns.end(), node->getColumnName()); if (it == index.columns.end()) @@ -333,7 +329,7 @@ bool BloomFilterCondition::getKey(const ASTPtr & node, size_t & key_column_num) return true; } -bool BloomFilterCondition::atomFromAST( +bool MergeTreeConditionFullText::atomFromAST( const ASTPtr & node, Block & block_with_constants, RPNElement & out) { Field const_value; @@ -399,7 +395,7 @@ bool BloomFilterCondition::atomFromAST( return false; } -bool BloomFilterCondition::tryPrepareSetBloomFilter( +bool MergeTreeConditionFullText::tryPrepareSetBloomFilter( const ASTs & args, RPNElement & out) { @@ -454,7 +450,7 @@ bool BloomFilterCondition::tryPrepareSetBloomFilter( if (data_type->getTypeId() != TypeIndex::String && data_type->getTypeId() != TypeIndex::FixedString) return false; - std::vector> bloom_filters; + std::vector> bloom_filters; std::vector key_position; Columns columns = prepared_set->getSetElements(); @@ -480,23 +476,23 @@ bool BloomFilterCondition::tryPrepareSetBloomFilter( } -MergeTreeIndexGranulePtr MergeTreeBloomFilterIndex::createIndexGranule() const +MergeTreeIndexGranulePtr MergeTreeIndexFullText::createIndexGranule() const { - return std::make_shared(*this); + return std::make_shared(*this); } -MergeTreeIndexAggregatorPtr MergeTreeBloomFilterIndex::createIndexAggregator() const +MergeTreeIndexAggregatorPtr MergeTreeIndexFullText::createIndexAggregator() const { - return std::make_shared(*this); + return std::make_shared(*this); } -IndexConditionPtr MergeTreeBloomFilterIndex::createIndexCondition( +MergeTreeIndexConditionPtr MergeTreeIndexFullText::createIndexCondition( const SelectQueryInfo & query, const Context & context) const { - return std::make_shared(query, context, *this); + return std::make_shared(query, context, *this); }; -bool MergeTreeBloomFilterIndex::mayBenefitFromIndexForIn(const ASTPtr & node) const +bool MergeTreeIndexFullText::mayBenefitFromIndexForIn(const ASTPtr & node) const { return std::find(std::cbegin(columns), std::cend(columns), node->getColumnName()) != std::cend(columns); } @@ -679,7 +675,7 @@ std::unique_ptr bloomFilterIndexCreator( auto 
tokenizer = std::make_unique(n); - return std::make_unique( + return std::make_unique( node->name, std::move(index_expr), columns, data_types, sample, node->granularity, bloom_filter_size, bloom_filter_hashes, seed, std::move(tokenizer)); } @@ -697,7 +693,7 @@ std::unique_ptr bloomFilterIndexCreator( auto tokenizer = std::make_unique(); - return std::make_unique( + return std::make_unique( node->name, std::move(index_expr), columns, data_types, sample, node->granularity, bloom_filter_size, bloom_filter_hashes, seed, std::move(tokenizer)); } diff --git a/dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.h b/dbms/src/Storages/MergeTree/MergeTreeIndexFullText.h similarity index 79% rename from dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.h rename to dbms/src/Storages/MergeTree/MergeTreeIndexFullText.h index 888ffe7f9cc..cd8ac534e64 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.h +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexFullText.h @@ -10,54 +10,54 @@ namespace DB { -class MergeTreeBloomFilterIndex; +class MergeTreeIndexFullText; -struct MergeTreeBloomFilterIndexGranule : public IMergeTreeIndexGranule +struct MergeTreeIndexGranuleFullText : public IMergeTreeIndexGranule { - explicit MergeTreeBloomFilterIndexGranule( - const MergeTreeBloomFilterIndex & index); + explicit MergeTreeIndexGranuleFullText( + const MergeTreeIndexFullText & index); - ~MergeTreeBloomFilterIndexGranule() override = default; + ~MergeTreeIndexGranuleFullText() override = default; void serializeBinary(WriteBuffer & ostr) const override; void deserializeBinary(ReadBuffer & istr) override; bool empty() const override { return !has_elems; } - const MergeTreeBloomFilterIndex & index; - std::vector bloom_filters; + const MergeTreeIndexFullText & index; + std::vector bloom_filters; bool has_elems; }; -using MergeTreeBloomFilterIndexGranulePtr = std::shared_ptr; +using MergeTreeIndexGranuleFullTextPtr = std::shared_ptr; -struct MergeTreeBloomFilterIndexAggregator : IMergeTreeIndexAggregator +struct MergeTreeIndexAggregatorFullText : IMergeTreeIndexAggregator { - explicit MergeTreeBloomFilterIndexAggregator(const MergeTreeBloomFilterIndex & index); + explicit MergeTreeIndexAggregatorFullText(const MergeTreeIndexFullText & index); - ~MergeTreeBloomFilterIndexAggregator() override = default; + ~MergeTreeIndexAggregatorFullText() override = default; bool empty() const override { return !granule || granule->empty(); } MergeTreeIndexGranulePtr getGranuleAndReset() override; void update(const Block & block, size_t * pos, size_t limit) override; - const MergeTreeBloomFilterIndex & index; - MergeTreeBloomFilterIndexGranulePtr granule; + const MergeTreeIndexFullText & index; + MergeTreeIndexGranuleFullTextPtr granule; }; -class BloomFilterCondition : public IIndexCondition +class MergeTreeConditionFullText : public IMergeTreeIndexCondition { public: - BloomFilterCondition( + MergeTreeConditionFullText( const SelectQueryInfo & query_info, const Context & context, - const MergeTreeBloomFilterIndex & index_); + const MergeTreeIndexFullText & index_); - ~BloomFilterCondition() override = default; + ~MergeTreeConditionFullText() override = default; bool alwaysUnknownOrTrue() const override; @@ -93,19 +93,19 @@ private: }; RPNElement( - Function function_ = FUNCTION_UNKNOWN, size_t key_column_ = 0, std::unique_ptr && const_bloom_filter_ = nullptr) + Function function_ = FUNCTION_UNKNOWN, size_t key_column_ = 0, std::unique_ptr && const_bloom_filter_ = nullptr) : function(function_), 
key_column(key_column_), bloom_filter(std::move(const_bloom_filter_)) {} Function function = FUNCTION_UNKNOWN; /// For FUNCTION_EQUALS, FUNCTION_NOT_EQUALS, FUNCTION_LIKE, FUNCTION_NOT_LIKE. size_t key_column; - std::unique_ptr bloom_filter; + std::unique_ptr bloom_filter; /// For FUNCTION_IN and FUNCTION_NOT_IN - std::vector> set_bloom_filters; + std::vector> set_bloom_filters; std::vector set_key_position; }; - using AtomMap = std::unordered_map; + using AtomMap = std::unordered_map; using RPN = std::vector; bool atomFromAST(const ASTPtr & node, Block & block_with_constants, RPNElement & out); @@ -115,7 +115,7 @@ private: static const AtomMap atom_map; - const MergeTreeBloomFilterIndex & index; + const MergeTreeIndexFullText & index; RPN rpn; /// Sets from syntax analyzer. PreparedSets prepared_sets; @@ -164,10 +164,10 @@ struct SplitTokenExtractor : public ITokenExtractor }; -class MergeTreeBloomFilterIndex : public IMergeTreeIndex +class MergeTreeIndexFullText : public IMergeTreeIndex { public: - MergeTreeBloomFilterIndex( + MergeTreeIndexFullText( String name_, ExpressionActionsPtr expr_, const Names & columns_, @@ -184,12 +184,12 @@ public: , seed(seed_) , token_extractor_func(std::move(token_extractor_func_)) {} - ~MergeTreeBloomFilterIndex() override = default; + ~MergeTreeIndexFullText() override = default; MergeTreeIndexGranulePtr createIndexGranule() const override; MergeTreeIndexAggregatorPtr createIndexAggregator() const override; - IndexConditionPtr createIndexCondition( + MergeTreeIndexConditionPtr createIndexCondition( const SelectQueryInfo & query, const Context & context) const override; bool mayBenefitFromIndexForIn(const ASTPtr & node) const override; diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.cpp new file mode 100644 index 00000000000..4eee7309811 --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.cpp @@ -0,0 +1,115 @@ +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +MergeTreeIndexGranuleBloomFilter::MergeTreeIndexGranuleBloomFilter(size_t bits_per_row, size_t hash_functions, size_t index_columns) + : bits_per_row(bits_per_row), hash_functions(hash_functions) +{ + total_rows = 0; + bloom_filters.resize(index_columns); +} + +MergeTreeIndexGranuleBloomFilter::MergeTreeIndexGranuleBloomFilter( + size_t bits_per_row, size_t hash_functions, size_t total_rows, const Blocks & granule_index_blocks) + : total_rows(total_rows), bits_per_row(bits_per_row), hash_functions(hash_functions) +{ + if (granule_index_blocks.empty() || !total_rows) + throw Exception("LOGICAL ERROR: granule_index_blocks empty or total_rows is zero.", ErrorCodes::LOGICAL_ERROR); + + assertGranuleBlocksStructure(granule_index_blocks); + + for (size_t index = 0; index < granule_index_blocks.size(); ++index) + { + Block granule_index_block = granule_index_blocks[index]; + + if (unlikely(!granule_index_block || !granule_index_block.rows())) + throw Exception("LOGICAL ERROR: granule_index_block is empty.", ErrorCodes::LOGICAL_ERROR); + + if (index == 0) + { + static size_t atom_size = 8; + size_t bytes_size = (bits_per_row * total_rows + atom_size - 1) / atom_size; + + for (size_t column = 0, columns = granule_index_block.columns(); column < columns; ++column) + bloom_filters.emplace_back(std::make_shared(bytes_size, hash_functions, 0)); + } + + for (size_t column = 0, columns = granule_index_block.columns(); column < columns; 
++column) + fillingBloomFilter(bloom_filters[column], granule_index_block, column); + } +} + +bool MergeTreeIndexGranuleBloomFilter::empty() const +{ + return !total_rows; +} + +void MergeTreeIndexGranuleBloomFilter::deserializeBinary(ReadBuffer & istr) +{ + if (!empty()) + throw Exception("Cannot read data into a non-empty bloom filter index.", ErrorCodes::LOGICAL_ERROR); + + readVarUInt(total_rows, istr); + for (size_t index = 0; index < bloom_filters.size(); ++index) + { + static size_t atom_size = 8; + size_t bytes_size = (bits_per_row * total_rows + atom_size - 1) / atom_size; + bloom_filters[index] = std::make_shared(bytes_size, hash_functions, 0); + istr.read(reinterpret_cast(bloom_filters[index]->getFilter().data()), bytes_size); + } +} + +void MergeTreeIndexGranuleBloomFilter::serializeBinary(WriteBuffer & ostr) const +{ + if (empty()) + throw Exception("Attempt to write empty bloom filter index.", ErrorCodes::LOGICAL_ERROR); + + static size_t atom_size = 8; + writeVarUInt(total_rows, ostr); + size_t bytes_size = (bits_per_row * total_rows + atom_size - 1) / atom_size; + for (const auto & bloom_filter : bloom_filters) + ostr.write(reinterpret_cast(bloom_filter->getFilter().data()), bytes_size); +} + +void MergeTreeIndexGranuleBloomFilter::assertGranuleBlocksStructure(const Blocks & granule_index_blocks) const +{ + Block prev_block; + for (size_t index = 0; index < granule_index_blocks.size(); ++index) + { + Block granule_index_block = granule_index_blocks[index]; + + if (index != 0) + assertBlocksHaveEqualStructure(prev_block, granule_index_block, "Granule blocks of bloom filter have different structure."); + + prev_block = granule_index_block; + } +} + +void MergeTreeIndexGranuleBloomFilter::fillingBloomFilter(BloomFilterPtr & bf, const Block & granule_index_block, size_t index_hash_column) +{ + const auto & column = granule_index_block.getByPosition(index_hash_column); + + if (const auto hash_column = typeid_cast(column.column.get())) + { + const auto & hash_column_vec = hash_column->getData(); + + for (size_t index = 0, size = hash_column_vec.size(); index < size; ++index) + { + const UInt64 & bf_base_hash = hash_column_vec[index]; + + for (size_t i = 0; i < hash_functions; ++i) + bf->addHashWithSeed(bf_base_hash, BloomFilterHash::bf_hash_seed[i]); + } + } +} + +} diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.h b/dbms/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.h new file mode 100644 index 00000000000..79670678e79 --- /dev/null +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.h @@ -0,0 +1,36 @@ +#pragma once + +#include +#include + +namespace DB +{ + +class MergeTreeIndexGranuleBloomFilter : public IMergeTreeIndexGranule +{ +public: + MergeTreeIndexGranuleBloomFilter(size_t bits_per_row, size_t hash_functions, size_t index_columns); + + MergeTreeIndexGranuleBloomFilter(size_t bits_per_row, size_t hash_functions, size_t total_rows, const Blocks & granule_index_blocks); + + bool empty() const override; + + void serializeBinary(WriteBuffer & ostr) const override; + + void deserializeBinary(ReadBuffer & istr) override; + + const std::vector getFilters() const { return bloom_filters; } + +private: + size_t total_rows; + size_t bits_per_row; + size_t hash_functions; + std::vector bloom_filters; + + void assertGranuleBlocksStructure(const Blocks & granule_index_blocks) const; + + void fillingBloomFilter(BloomFilterPtr & bf, const Block & granule_index_block, size_t index_hash_column); +}; + + +} diff --git 
a/dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp similarity index 72% rename from dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.cpp rename to dbms/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp index 23deb29758d..0d9c4722a25 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include @@ -16,14 +16,14 @@ namespace ErrorCodes } -MergeTreeMinMaxGranule::MergeTreeMinMaxGranule(const MergeTreeMinMaxIndex & index) +MergeTreeIndexGranuleMinMax::MergeTreeIndexGranuleMinMax(const MergeTreeIndexMinMax & index) : IMergeTreeIndexGranule(), index(index), parallelogram() {} -MergeTreeMinMaxGranule::MergeTreeMinMaxGranule( - const MergeTreeMinMaxIndex & index, std::vector && parallelogram) +MergeTreeIndexGranuleMinMax::MergeTreeIndexGranuleMinMax( + const MergeTreeIndexMinMax & index, std::vector && parallelogram) : IMergeTreeIndexGranule(), index(index), parallelogram(std::move(parallelogram)) {} -void MergeTreeMinMaxGranule::serializeBinary(WriteBuffer & ostr) const +void MergeTreeIndexGranuleMinMax::serializeBinary(WriteBuffer & ostr) const { if (empty()) throw Exception( @@ -50,7 +50,7 @@ void MergeTreeMinMaxGranule::serializeBinary(WriteBuffer & ostr) const } } -void MergeTreeMinMaxGranule::deserializeBinary(ReadBuffer & istr) +void MergeTreeIndexGranuleMinMax::deserializeBinary(ReadBuffer & istr) { parallelogram.clear(); Field min_val; @@ -83,15 +83,15 @@ void MergeTreeMinMaxGranule::deserializeBinary(ReadBuffer & istr) } -MergeTreeMinMaxAggregator::MergeTreeMinMaxAggregator(const MergeTreeMinMaxIndex & index) +MergeTreeIndexAggregatorMinMax::MergeTreeIndexAggregatorMinMax(const MergeTreeIndexMinMax & index) : index(index) {} -MergeTreeIndexGranulePtr MergeTreeMinMaxAggregator::getGranuleAndReset() +MergeTreeIndexGranulePtr MergeTreeIndexAggregatorMinMax::getGranuleAndReset() { - return std::make_shared(index, std::move(parallelogram)); + return std::make_shared(index, std::move(parallelogram)); } -void MergeTreeMinMaxAggregator::update(const Block & block, size_t * pos, size_t limit) +void MergeTreeIndexAggregatorMinMax::update(const Block & block, size_t * pos, size_t limit) { if (*pos >= block.rows()) throw Exception( @@ -122,50 +122,50 @@ void MergeTreeMinMaxAggregator::update(const Block & block, size_t * pos, size_t } -MinMaxCondition::MinMaxCondition( +MergeTreeIndexConditionMinMax::MergeTreeIndexConditionMinMax( const SelectQueryInfo &query, const Context &context, - const MergeTreeMinMaxIndex &index) - : IIndexCondition(), index(index), condition(query, context, index.columns, index.expr) {} + const MergeTreeIndexMinMax &index) + : IMergeTreeIndexCondition(), index(index), condition(query, context, index.columns, index.expr) {} -bool MinMaxCondition::alwaysUnknownOrTrue() const +bool MergeTreeIndexConditionMinMax::alwaysUnknownOrTrue() const { return condition.alwaysUnknownOrTrue(); } -bool MinMaxCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const +bool MergeTreeIndexConditionMinMax::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const { - std::shared_ptr granule - = std::dynamic_pointer_cast(idx_granule); + std::shared_ptr granule + = std::dynamic_pointer_cast(idx_granule); if (!granule) throw Exception( "Minmax index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR); for (const auto & range : granule->parallelogram) if (range.left.isNull() || 
range.right.isNull()) return true; - return condition.mayBeTrueInParallelogram(granule->parallelogram, index.data_types); + return condition.checkInParallelogram(granule->parallelogram, index.data_types).can_be_true; } -MergeTreeIndexGranulePtr MergeTreeMinMaxIndex::createIndexGranule() const +MergeTreeIndexGranulePtr MergeTreeIndexMinMax::createIndexGranule() const { - return std::make_shared(*this); + return std::make_shared(*this); } -MergeTreeIndexAggregatorPtr MergeTreeMinMaxIndex::createIndexAggregator() const +MergeTreeIndexAggregatorPtr MergeTreeIndexMinMax::createIndexAggregator() const { - return std::make_shared(*this); + return std::make_shared(*this); } -IndexConditionPtr MergeTreeMinMaxIndex::createIndexCondition( +MergeTreeIndexConditionPtr MergeTreeIndexMinMax::createIndexCondition( const SelectQueryInfo & query, const Context & context) const { - return std::make_shared(query, context, *this); + return std::make_shared(query, context, *this); }; -bool MergeTreeMinMaxIndex::mayBenefitFromIndexForIn(const ASTPtr & node) const +bool MergeTreeIndexMinMax::mayBenefitFromIndexForIn(const ASTPtr & node) const { const String column_name = node->getColumnName(); @@ -210,7 +210,7 @@ std::unique_ptr minmaxIndexCreator( data_types.emplace_back(column.type); } - return std::make_unique( + return std::make_unique( node->name, std::move(minmax_expr), columns, data_types, sample, node->granularity); } diff --git a/dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.h b/dbms/src/Storages/MergeTree/MergeTreeIndexMinMax.h similarity index 59% rename from dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.h rename to dbms/src/Storages/MergeTree/MergeTreeIndexMinMax.h index 06be8fe0cdd..5b514cdc738 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.h +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexMinMax.h @@ -10,62 +10,62 @@ namespace DB { -class MergeTreeMinMaxIndex; +class MergeTreeIndexMinMax; -struct MergeTreeMinMaxGranule : public IMergeTreeIndexGranule +struct MergeTreeIndexGranuleMinMax : public IMergeTreeIndexGranule { - explicit MergeTreeMinMaxGranule(const MergeTreeMinMaxIndex & index); - MergeTreeMinMaxGranule(const MergeTreeMinMaxIndex & index, std::vector && parallelogram); - ~MergeTreeMinMaxGranule() override = default; + explicit MergeTreeIndexGranuleMinMax(const MergeTreeIndexMinMax & index); + MergeTreeIndexGranuleMinMax(const MergeTreeIndexMinMax & index, std::vector && parallelogram); + ~MergeTreeIndexGranuleMinMax() override = default; void serializeBinary(WriteBuffer & ostr) const override; void deserializeBinary(ReadBuffer & istr) override; bool empty() const override { return parallelogram.empty(); } - const MergeTreeMinMaxIndex & index; + const MergeTreeIndexMinMax & index; std::vector parallelogram; }; -struct MergeTreeMinMaxAggregator : IMergeTreeIndexAggregator +struct MergeTreeIndexAggregatorMinMax : IMergeTreeIndexAggregator { - explicit MergeTreeMinMaxAggregator(const MergeTreeMinMaxIndex & index); - ~MergeTreeMinMaxAggregator() override = default; + explicit MergeTreeIndexAggregatorMinMax(const MergeTreeIndexMinMax & index); + ~MergeTreeIndexAggregatorMinMax() override = default; bool empty() const override { return parallelogram.empty(); } MergeTreeIndexGranulePtr getGranuleAndReset() override; void update(const Block & block, size_t * pos, size_t limit) override; - const MergeTreeMinMaxIndex & index; + const MergeTreeIndexMinMax & index; std::vector parallelogram; }; -class MinMaxCondition : public IIndexCondition +class MergeTreeIndexConditionMinMax : 
public IMergeTreeIndexCondition { public: - MinMaxCondition( + MergeTreeIndexConditionMinMax( const SelectQueryInfo & query, const Context & context, - const MergeTreeMinMaxIndex & index); + const MergeTreeIndexMinMax & index); bool alwaysUnknownOrTrue() const override; bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const override; - ~MinMaxCondition() override = default; + ~MergeTreeIndexConditionMinMax() override = default; private: - const MergeTreeMinMaxIndex & index; + const MergeTreeIndexMinMax & index; KeyCondition condition; }; -class MergeTreeMinMaxIndex : public IMergeTreeIndex +class MergeTreeIndexMinMax : public IMergeTreeIndex { public: - MergeTreeMinMaxIndex( + MergeTreeIndexMinMax( String name_, ExpressionActionsPtr expr_, const Names & columns_, @@ -74,12 +74,12 @@ public: size_t granularity_) : IMergeTreeIndex(name_, expr_, columns_, data_types_, header_, granularity_) {} - ~MergeTreeMinMaxIndex() override = default; + ~MergeTreeIndexMinMax() override = default; MergeTreeIndexGranulePtr createIndexGranule() const override; MergeTreeIndexAggregatorPtr createIndexAggregator() const override; - IndexConditionPtr createIndexCondition( + MergeTreeIndexConditionPtr createIndexCondition( const SelectQueryInfo & query, const Context & context) const override; bool mayBenefitFromIndexForIn(const ASTPtr & node) const override; diff --git a/dbms/src/Storages/MergeTree/MergeTreeSetSkippingIndex.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexSet.cpp similarity index 87% rename from dbms/src/Storages/MergeTree/MergeTreeSetSkippingIndex.cpp rename to dbms/src/Storages/MergeTree/MergeTreeIndexSet.cpp index 5bf06a1ca6d..8efaae8e579 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeSetSkippingIndex.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexSet.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include @@ -21,18 +21,18 @@ namespace ErrorCodes const Field UNKNOWN_FIELD(3u); -MergeTreeSetIndexGranule::MergeTreeSetIndexGranule(const MergeTreeSetSkippingIndex & index) +MergeTreeIndexGranuleSet::MergeTreeIndexGranuleSet(const MergeTreeIndexSet & index) : IMergeTreeIndexGranule() , index(index) , block(index.header.cloneEmpty()) {} -MergeTreeSetIndexGranule::MergeTreeSetIndexGranule( - const MergeTreeSetSkippingIndex & index, MutableColumns && mutable_columns) +MergeTreeIndexGranuleSet::MergeTreeIndexGranuleSet( + const MergeTreeIndexSet & index, MutableColumns && mutable_columns) : IMergeTreeIndexGranule() , index(index) , block(index.header.cloneWithColumns(std::move(mutable_columns))) {} -void MergeTreeSetIndexGranule::serializeBinary(WriteBuffer & ostr) const +void MergeTreeIndexGranuleSet::serializeBinary(WriteBuffer & ostr) const { if (empty()) throw Exception( @@ -64,7 +64,7 @@ void MergeTreeSetIndexGranule::serializeBinary(WriteBuffer & ostr) const } } -void MergeTreeSetIndexGranule::deserializeBinary(ReadBuffer & istr) +void MergeTreeIndexGranuleSet::deserializeBinary(ReadBuffer & istr) { block.clear(); @@ -94,7 +94,7 @@ void MergeTreeSetIndexGranule::deserializeBinary(ReadBuffer & istr) } -MergeTreeSetIndexAggregator::MergeTreeSetIndexAggregator(const MergeTreeSetSkippingIndex & index) +MergeTreeIndexAggregatorSet::MergeTreeIndexAggregatorSet(const MergeTreeIndexSet & index) : index(index), columns(index.header.cloneEmptyColumns()) { ColumnRawPtrs column_ptrs; @@ -111,7 +111,7 @@ MergeTreeSetIndexAggregator::MergeTreeSetIndexAggregator(const MergeTreeSetSkipp columns = index.header.cloneEmptyColumns(); } -void MergeTreeSetIndexAggregator::update(const Block & 
block, size_t * pos, size_t limit) +void MergeTreeIndexAggregatorSet::update(const Block & block, size_t * pos, size_t limit) { if (*pos >= block.rows()) throw Exception( @@ -164,7 +164,7 @@ void MergeTreeSetIndexAggregator::update(const Block & block, size_t * pos, size } template -bool MergeTreeSetIndexAggregator::buildFilter( +bool MergeTreeIndexAggregatorSet::buildFilter( Method & method, const ColumnRawPtrs & column_ptrs, IColumn::Filter & filter, @@ -190,9 +190,9 @@ bool MergeTreeSetIndexAggregator::buildFilter( return has_new_data; } -MergeTreeIndexGranulePtr MergeTreeSetIndexAggregator::getGranuleAndReset() +MergeTreeIndexGranulePtr MergeTreeIndexAggregatorSet::getGranuleAndReset() { - auto granule = std::make_shared(index, std::move(columns)); + auto granule = std::make_shared(index, std::move(columns)); switch (data.type) { @@ -212,11 +212,11 @@ MergeTreeIndexGranulePtr MergeTreeSetIndexAggregator::getGranuleAndReset() } -SetIndexCondition::SetIndexCondition( +MergeTreeIndexConditionSet::MergeTreeIndexConditionSet( const SelectQueryInfo & query, const Context & context, - const MergeTreeSetSkippingIndex &index) - : IIndexCondition(), index(index) + const MergeTreeIndexSet &index) + : IMergeTreeIndexCondition(), index(index) { for (size_t i = 0, size = index.columns.size(); i < size; ++i) { @@ -253,14 +253,14 @@ SetIndexCondition::SetIndexCondition( actions = ExpressionAnalyzer(expression_ast, syntax_analyzer_result, context).getActions(true); } -bool SetIndexCondition::alwaysUnknownOrTrue() const +bool MergeTreeIndexConditionSet::alwaysUnknownOrTrue() const { return useless; } -bool SetIndexCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const +bool MergeTreeIndexConditionSet::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const { - auto granule = std::dynamic_pointer_cast(idx_granule); + auto granule = std::dynamic_pointer_cast(idx_granule); if (!granule) throw Exception( "Set index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR); @@ -294,7 +294,7 @@ bool SetIndexCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) return false; } -void SetIndexCondition::traverseAST(ASTPtr & node) const +void MergeTreeIndexConditionSet::traverseAST(ASTPtr & node) const { if (operatorFromAST(node)) { @@ -309,7 +309,7 @@ void SetIndexCondition::traverseAST(ASTPtr & node) const node = std::make_shared(UNKNOWN_FIELD); } -bool SetIndexCondition::atomFromAST(ASTPtr & node) const +bool MergeTreeIndexConditionSet::atomFromAST(ASTPtr & node) const { /// Function, literal or column @@ -340,7 +340,7 @@ bool SetIndexCondition::atomFromAST(ASTPtr & node) const return false; } -bool SetIndexCondition::operatorFromAST(ASTPtr & node) const +bool MergeTreeIndexConditionSet::operatorFromAST(ASTPtr & node) const { /// Functions AND, OR, NOT. Replace with bit*. 
auto * func = node->as(); @@ -416,7 +416,7 @@ static bool checkAtomName(const String & name) return atoms.find(name) != atoms.end(); } -bool SetIndexCondition::checkASTUseless(const ASTPtr &node, bool atomic) const +bool MergeTreeIndexConditionSet::checkASTUseless(const ASTPtr &node, bool atomic) const { if (const auto * func = node->as()) { @@ -446,23 +446,23 @@ bool SetIndexCondition::checkASTUseless(const ASTPtr &node, bool atomic) const } -MergeTreeIndexGranulePtr MergeTreeSetSkippingIndex::createIndexGranule() const +MergeTreeIndexGranulePtr MergeTreeIndexSet::createIndexGranule() const { - return std::make_shared(*this); + return std::make_shared(*this); } -MergeTreeIndexAggregatorPtr MergeTreeSetSkippingIndex::createIndexAggregator() const +MergeTreeIndexAggregatorPtr MergeTreeIndexSet::createIndexAggregator() const { - return std::make_shared(*this); + return std::make_shared(*this); } -IndexConditionPtr MergeTreeSetSkippingIndex::createIndexCondition( +MergeTreeIndexConditionPtr MergeTreeIndexSet::createIndexCondition( const SelectQueryInfo & query, const Context & context) const { - return std::make_shared(query, context, *this); + return std::make_shared(query, context, *this); }; -bool MergeTreeSetSkippingIndex::mayBenefitFromIndexForIn(const ASTPtr &) const +bool MergeTreeIndexSet::mayBenefitFromIndexForIn(const ASTPtr &) const { return false; } @@ -506,7 +506,7 @@ std::unique_ptr setIndexCreator( header.insert(ColumnWithTypeAndName(column.type->createColumn(), column.type, column.name)); } - return std::make_unique( + return std::make_unique( node->name, std::move(unique_expr), columns, data_types, header, node->granularity, max_rows); } diff --git a/dbms/src/Storages/MergeTree/MergeTreeSetSkippingIndex.h b/dbms/src/Storages/MergeTree/MergeTreeIndexSet.h similarity index 69% rename from dbms/src/Storages/MergeTree/MergeTreeSetSkippingIndex.h rename to dbms/src/Storages/MergeTree/MergeTreeIndexSet.h index 61d409af589..04f4d2bec1e 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeSetSkippingIndex.h +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexSet.h @@ -12,12 +12,12 @@ namespace DB { -class MergeTreeSetSkippingIndex; +class MergeTreeIndexSet; -struct MergeTreeSetIndexGranule : public IMergeTreeIndexGranule +struct MergeTreeIndexGranuleSet : public IMergeTreeIndexGranule { - explicit MergeTreeSetIndexGranule(const MergeTreeSetSkippingIndex & index); - MergeTreeSetIndexGranule(const MergeTreeSetSkippingIndex & index, MutableColumns && columns); + explicit MergeTreeIndexGranuleSet(const MergeTreeIndexSet & index); + MergeTreeIndexGranuleSet(const MergeTreeIndexSet & index, MutableColumns && columns); void serializeBinary(WriteBuffer & ostr) const override; void deserializeBinary(ReadBuffer & istr) override; @@ -25,17 +25,17 @@ struct MergeTreeSetIndexGranule : public IMergeTreeIndexGranule size_t size() const { return block.rows(); } bool empty() const override { return !size(); } - ~MergeTreeSetIndexGranule() override = default; + ~MergeTreeIndexGranuleSet() override = default; - const MergeTreeSetSkippingIndex & index; + const MergeTreeIndexSet & index; Block block; }; -struct MergeTreeSetIndexAggregator : IMergeTreeIndexAggregator +struct MergeTreeIndexAggregatorSet : IMergeTreeIndexAggregator { - explicit MergeTreeSetIndexAggregator(const MergeTreeSetSkippingIndex & index); - ~MergeTreeSetIndexAggregator() override = default; + explicit MergeTreeIndexAggregatorSet(const MergeTreeIndexSet & index); + ~MergeTreeIndexAggregatorSet() override = default; size_t size() const { 
return data.getTotalRowCount(); } bool empty() const override { return !size(); } @@ -55,26 +55,26 @@ private: size_t limit, ClearableSetVariants & variants) const; - const MergeTreeSetSkippingIndex & index; + const MergeTreeIndexSet & index; ClearableSetVariants data; Sizes key_sizes; MutableColumns columns; }; -class SetIndexCondition : public IIndexCondition +class MergeTreeIndexConditionSet : public IMergeTreeIndexCondition { public: - SetIndexCondition( + MergeTreeIndexConditionSet( const SelectQueryInfo & query, const Context & context, - const MergeTreeSetSkippingIndex & index); + const MergeTreeIndexSet & index); bool alwaysUnknownOrTrue() const override; bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const override; - ~SetIndexCondition() override = default; + ~MergeTreeIndexConditionSet() override = default; private: void traverseAST(ASTPtr & node) const; bool atomFromAST(ASTPtr & node) const; @@ -82,7 +82,7 @@ private: bool checkASTUseless(const ASTPtr &node, bool atomic = false) const; - const MergeTreeSetSkippingIndex & index; + const MergeTreeIndexSet & index; bool useless; std::set key_columns; @@ -91,10 +91,10 @@ private: }; -class MergeTreeSetSkippingIndex : public IMergeTreeIndex +class MergeTreeIndexSet : public IMergeTreeIndex { public: - MergeTreeSetSkippingIndex( + MergeTreeIndexSet( String name_, ExpressionActionsPtr expr_, const Names & columns_, @@ -104,12 +104,12 @@ public: size_t max_rows_) : IMergeTreeIndex(std::move(name_), std::move(expr_), columns_, data_types_, header_, granularity_), max_rows(max_rows_) {} - ~MergeTreeSetSkippingIndex() override = default; + ~MergeTreeIndexSet() override = default; MergeTreeIndexGranulePtr createIndexGranule() const override; MergeTreeIndexAggregatorPtr createIndexAggregator() const override; - IndexConditionPtr createIndexCondition( + MergeTreeIndexConditionPtr createIndexCondition( const SelectQueryInfo & query, const Context & context) const override; bool mayBenefitFromIndexForIn(const ASTPtr & node) const override; diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndices.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndices.cpp index 74eb31ecd46..e19aafbd25d 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeIndices.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeIndices.cpp @@ -19,7 +19,7 @@ namespace ErrorCodes extern const int UNKNOWN_EXCEPTION; } -void MergeTreeIndexFactory::registerIndex(const std::string &name, Creator creator) +void MergeTreeIndexFactory::registerIndex(const std::string & name, Creator creator) { if (!indexes.emplace(name, std::move(creator)).second) throw Exception("MergeTreeIndexFactory: the Index creator name '" + name + "' is not unique", @@ -70,6 +70,11 @@ std::unique_ptr bloomFilterIndexCreator( std::shared_ptr node, const Context & context); +std::unique_ptr bloomFilterIndexCreatorNew( + const NamesAndTypesList & columns, + std::shared_ptr node, + const Context & context); + MergeTreeIndexFactory::MergeTreeIndexFactory() { @@ -77,6 +82,7 @@ MergeTreeIndexFactory::MergeTreeIndexFactory() registerIndex("set", setIndexCreator); registerIndex("ngrambf_v1", bloomFilterIndexCreator); registerIndex("tokenbf_v1", bloomFilterIndexCreator); + registerIndex("bloom_filter", bloomFilterIndexCreatorNew); } } diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndices.h b/dbms/src/Storages/MergeTree/MergeTreeIndices.h index b6ee89d87ef..2a00c902810 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeIndices.h +++ b/dbms/src/Storages/MergeTree/MergeTreeIndices.h @@ -59,17 +59,17 @@ using 
MergeTreeIndexAggregators = std::vector; /// Condition on the index. -class IIndexCondition +class IMergeTreeIndexCondition { public: - virtual ~IIndexCondition() = default; + virtual ~IMergeTreeIndexCondition() = default; /// Checks if this index is useful for query. virtual bool alwaysUnknownOrTrue() const = 0; virtual bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr granule) const = 0; }; -using IndexConditionPtr = std::shared_ptr; +using MergeTreeIndexConditionPtr = std::shared_ptr; /// Structure for storing basic index info like columns, expression, arguments, ... @@ -101,7 +101,7 @@ public: virtual MergeTreeIndexGranulePtr createIndexGranule() const = 0; virtual MergeTreeIndexAggregatorPtr createIndexAggregator() const = 0; - virtual IndexConditionPtr createIndexCondition( + virtual MergeTreeIndexConditionPtr createIndexCondition( const SelectQueryInfo & query_info, const Context & context) const = 0; String name; diff --git a/dbms/src/Storages/MergeTree/RPNBuilder.h b/dbms/src/Storages/MergeTree/RPNBuilder.h index 6a557cb5f6a..d5244c3285d 100644 --- a/dbms/src/Storages/MergeTree/RPNBuilder.h +++ b/dbms/src/Storages/MergeTree/RPNBuilder.h @@ -24,10 +24,7 @@ public: using AtomFromASTFunc = std::function< bool(const ASTPtr & node, const Context & context, Block & block_with_constants, RPNElement & out)>; - RPNBuilder( - const SelectQueryInfo & query_info, - const Context & context_, - const AtomFromASTFunc & atomFromAST_) + RPNBuilder(const SelectQueryInfo & query_info, const Context & context_, const AtomFromASTFunc & atomFromAST_) : context(context_), atomFromAST(atomFromAST_) { /** Evaluation of expressions that depend only on constants. diff --git a/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp b/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp index b23a2eedc0e..138e7c14f9d 100644 --- a/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp +++ b/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp @@ -2,8 +2,8 @@ #include #include #include -#include -#include +#include +#include #include #include diff --git a/dbms/src/Storages/StorageReplicatedMergeTree.cpp b/dbms/src/Storages/StorageReplicatedMergeTree.cpp index c61edc285f4..86298df3f19 100644 --- a/dbms/src/Storages/StorageReplicatedMergeTree.cpp +++ b/dbms/src/Storages/StorageReplicatedMergeTree.cpp @@ -3958,8 +3958,7 @@ void StorageReplicatedMergeTree::sendRequestToLeaderReplica(const ASTPtr & query /// there is no sense to send query to leader, because he will receive it from own DDLWorker if (query_context.getClientInfo().query_kind == ClientInfo::QueryKind::SECONDARY_QUERY) { - LOG_DEBUG(log, "Not leader replica received query from DDLWorker, skipping it."); - return; + throw Exception("Cannot execute DDL query, because leader was suddenly changed or logical error.", ErrorCodes::LEADERSHIP_CHANGED); } ReplicatedMergeTreeAddress leader_address(getZooKeeper()->get(zookeeper_path + "/replicas/" + leader + "/host")); diff --git a/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in b/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in index 1ee9803dda3..63ddfe15649 100644 --- a/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in +++ b/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in @@ -37,8 +37,7 @@ const char * auto_config_build[] "USE_GLIBC_COMPATIBILITY", "@GLIBC_COMPATIBILITY@", "USE_JEMALLOC", "@USE_JEMALLOC@", "USE_TCMALLOC", "@USE_TCMALLOC@", - "USE_LFALLOC", "@USE_LFALLOC@", - "USE_LFALLOC_RANDOM_HINT", 
"@USE_LFALLOC_RANDOM_HINT@", + "USE_MIMALLOC", "@USE_MIMALLOC@", "USE_UNWIND", "@USE_UNWIND@", "USE_ICU", "@USE_ICU@", "USE_H3", "@USE_H3@", diff --git a/dbms/src/Storages/System/StorageSystemProcesses.cpp b/dbms/src/Storages/System/StorageSystemProcesses.cpp index f3842663477..2450ec9296e 100644 --- a/dbms/src/Storages/System/StorageSystemProcesses.cpp +++ b/dbms/src/Storages/System/StorageSystemProcesses.cpp @@ -57,11 +57,11 @@ NamesAndTypesList StorageSystemProcesses::getNamesAndTypes() {"peak_memory_usage", std::make_shared()}, {"query", std::make_shared()}, - { "thread_numbers", std::make_shared(std::make_shared()) }, - { "ProfileEvents.Names", std::make_shared(std::make_shared()) }, - { "ProfileEvents.Values", std::make_shared(std::make_shared()) }, - { "Settings.Names", std::make_shared(std::make_shared()) }, - { "Settings.Values", std::make_shared(std::make_shared()) }, + {"thread_numbers", std::make_shared(std::make_shared())}, + {"ProfileEvents.Names", std::make_shared(std::make_shared())}, + {"ProfileEvents.Values", std::make_shared(std::make_shared())}, + {"Settings.Names", std::make_shared(std::make_shared())}, + {"Settings.Values", std::make_shared(std::make_shared())}, }; } diff --git a/dbms/tests/integration/README.md b/dbms/tests/integration/README.md index 1b2d190b383..06819af7668 100644 --- a/dbms/tests/integration/README.md +++ b/dbms/tests/integration/README.md @@ -12,7 +12,7 @@ You must install latest Docker from https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#set-up-the-repository Don't use Docker from your system repository. -* [pip](https://pypi.python.org/pypi/pip). To install: `sudo apt-get install python-pip` +* [pip](https://pypi.python.org/pypi/pip) and `libpq-dev`. To install: `sudo apt-get install python-pip libpq-dev` * [py.test](https://docs.pytest.org/) testing framework. To install: `sudo -H pip install pytest` * [docker-compose](https://docs.docker.com/compose/) and additional python libraries. 
To install: `sudo -H pip install docker-compose docker dicttoxml kazoo PyMySQL psycopg2 pymongo tzlocal kafka-python protobuf pytest-timeout` diff --git a/dbms/tests/integration/test_cluster_copier/test.py b/dbms/tests/integration/test_cluster_copier/test.py index 005655e938a..c223a73f59e 100644 --- a/dbms/tests/integration/test_cluster_copier/test.py +++ b/dbms/tests/integration/test_cluster_copier/test.py @@ -260,31 +260,24 @@ def execute_task(task, cmd_options): # Tests -@pytest.mark.skip(reason="Fails under asan") def test_copy1_simple(started_cluster): execute_task(Task1(started_cluster), []) -@pytest.mark.skip(reason="Fails under asan") def test_copy1_with_recovering(started_cluster): execute_task(Task1(started_cluster), ['--copy-fault-probability', str(COPYING_FAIL_PROBABILITY)]) -@pytest.mark.skip(reason="Fails under asan") def test_copy_month_to_week_partition(started_cluster): execute_task(Task2(started_cluster), []) -@pytest.mark.skip(reason="Fails under asan") def test_copy_month_to_week_partition_with_recovering(started_cluster): execute_task(Task2(started_cluster), ['--copy-fault-probability', str(0.3)]) -@pytest.mark.skip(reason="Fails under asan") def test_block_size(started_cluster): execute_task(Task_test_block_size(started_cluster), []) -@pytest.mark.skip(reason="Fails under asan") def test_no_index(started_cluster): execute_task(Task_no_index(started_cluster), []) -@pytest.mark.skip(reason="Fails under asan") def test_no_arg(started_cluster): execute_task(Task_no_arg(started_cluster), []) diff --git a/dbms/tests/integration/test_distributed_ddl_password/configs/config.d/clusters.xml b/dbms/tests/integration/test_distributed_ddl_password/configs/config.d/clusters.xml index c0ad3b1ba32..ffc4baa1199 100644 --- a/dbms/tests/integration/test_distributed_ddl_password/configs/config.d/clusters.xml +++ b/dbms/tests/integration/test_distributed_ddl_password/configs/config.d/clusters.xml @@ -24,5 +24,18 @@
             </shard>
         </awesome_cluster>
+        <simple_cluster>
+            <shard>
+                <internal_replication>true</internal_replication>
+                <replica>
+                    <host>node5</host>
+                    <port>9000</port>
+                </replica>
+                <replica>
+                    <host>node6</host>
+                    <port>9000</port>
+                </replica>
+            </shard>
+        </simple_cluster>
     </remote_servers>
 </yandex>
diff --git a/dbms/tests/integration/test_distributed_ddl_password/test.py b/dbms/tests/integration/test_distributed_ddl_password/test.py index fc51fe5ddee..f957f001df1 100644 --- a/dbms/tests/integration/test_distributed_ddl_password/test.py +++ b/dbms/tests/integration/test_distributed_ddl_password/test.py @@ -1,19 +1,25 @@ import time import pytest from helpers.cluster import ClickHouseCluster +from helpers.test_tools import assert_eq_with_retry + +from helpers.client import QueryRuntimeException cluster = ClickHouseCluster(__file__) node1 = cluster.add_instance('node1', config_dir="configs", with_zookeeper=True) node2 = cluster.add_instance('node2', config_dir="configs", with_zookeeper=True) node3 = cluster.add_instance('node3', config_dir="configs", with_zookeeper=True) node4 = cluster.add_instance('node4', config_dir="configs", with_zookeeper=True) +node5 = cluster.add_instance('node5', config_dir="configs", with_zookeeper=True) +node6 = cluster.add_instance('node6', config_dir="configs", with_zookeeper=True) + @pytest.fixture(scope="module") def start_cluster(): try: cluster.start() - for node, shard in [(node1, 1), (node2, 1), (node3, 2), (node4, 2)]: + for node, shard in [(node1, 1), (node2, 1), (node3, 2), (node4, 2), (node5, 3), (node6, 3)]: node.query( ''' CREATE TABLE test_table(date Date, id UInt32, dummy UInt32) @@ -42,13 +48,51 @@ def test_truncate(start_cluster): assert node4.query("select count(*) from test_table", settings={"password": "clickhouse"}) == "3\n" node3.query("truncate table test_table on cluster
'awesome_cluster'", settings={"password": "clickhouse"}) - time.sleep(2) for node in [node1, node2, node3, node4]: - assert node.query("select count(*) from test_table", settings={"password": "clickhouse"}) == "0\n" + assert_eq_with_retry(node, "select count(*) from test_table", "0", settings={"password": "clickhouse"}) node2.query("drop table test_table on cluster 'awesome_cluster'", settings={"password": "clickhouse"}) - time.sleep(2) for node in [node1, node2, node3, node4]: - assert node.query("select count(*) from system.tables where name='test_table'", settings={"password": "clickhouse"}) == "0\n" + assert_eq_with_retry(node, "select count(*) from system.tables where name='test_table'", "0", settings={"password": "clickhouse"}) + +def test_alter(start_cluster): + node5.query("insert into test_table values ('2019-02-15', 1, 2), ('2019-02-15', 2, 3), ('2019-02-15', 3, 4)", settings={"password": "clickhouse"}) + node6.query("insert into test_table values ('2019-02-15', 4, 2), ('2019-02-15', 5, 3), ('2019-02-15', 6, 4)", settings={"password": "clickhouse"}) + + node5.query("SYSTEM SYNC REPLICA test_table", settings={"password": "clickhouse"}) + node6.query("SYSTEM SYNC REPLICA test_table", settings={"password": "clickhouse"}) + + assert_eq_with_retry(node5, "select count(*) from test_table", "6", settings={"password": "clickhouse"}) + assert_eq_with_retry(node6, "select count(*) from test_table", "6", settings={"password": "clickhouse"}) + + node6.query("OPTIMIZE TABLE test_table ON CLUSTER 'simple_cluster' FINAL", settings={"password": "clickhouse"}) + + node5.query("SYSTEM SYNC REPLICA test_table", settings={"password": "clickhouse"}) + node6.query("SYSTEM SYNC REPLICA test_table", settings={"password": "clickhouse"}) + + assert_eq_with_retry(node5, "select count(*) from test_table", "6", settings={"password": "clickhouse"}) + assert_eq_with_retry(node6, "select count(*) from test_table", "6", settings={"password": "clickhouse"}) + + node6.query("ALTER TABLE test_table ON CLUSTER 'simple_cluster' DETACH PARTITION '2019-02-15'", settings={"password": "clickhouse"}) + assert_eq_with_retry(node5, "select count(*) from test_table", "0", settings={"password": "clickhouse"}) + assert_eq_with_retry(node6, "select count(*) from test_table", "0", settings={"password": "clickhouse"}) + + with pytest.raises(QueryRuntimeException): + node6.query("ALTER TABLE test_table ON CLUSTER 'simple_cluster' ATTACH PARTITION '2019-02-15'", settings={"password": "clickhouse"}) + + node5.query("ALTER TABLE test_table ATTACH PARTITION '2019-02-15'", settings={"password": "clickhouse"}) + + assert_eq_with_retry(node5, "select count(*) from test_table", "6", settings={"password": "clickhouse"}) + assert_eq_with_retry(node6, "select count(*) from test_table", "6", settings={"password": "clickhouse"}) + + node5.query("ALTER TABLE test_table ON CLUSTER 'simple_cluster' MODIFY COLUMN dummy String", settings={"password": "clickhouse"}) + + assert_eq_with_retry(node5, "select length(dummy) from test_table ORDER BY dummy LIMIT 1", "1", settings={"password": "clickhouse"}) + assert_eq_with_retry(node6, "select length(dummy) from test_table ORDER BY dummy LIMIT 1", "1", settings={"password": "clickhouse"}) + + node6.query("ALTER TABLE test_table ON CLUSTER 'simple_cluster' DROP PARTITION '2019-02-15'", settings={"password": "clickhouse"}) + + assert_eq_with_retry(node5, "select count(*) from test_table", "0", settings={"password": "clickhouse"}) + assert_eq_with_retry(node6, "select count(*) from test_table", "0", 
settings={"password": "clickhouse"})
diff --git a/dbms/tests/integration/test_system_queries/configs/config.d/query_log.xml new file mode 100644 index 00000000000..9f55dcb829e --- /dev/null +++ b/dbms/tests/integration/test_system_queries/configs/config.d/query_log.xml @@ -0,0 +1,9 @@
+<yandex>
+    <query_log>
+        <database>system</database>
+        <table>query_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
+        <flush_interval_milliseconds>300</flush_interval_milliseconds>
+    </query_log>
+</yandex>
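The config above sets `flush_interval_milliseconds` to 300 ms so that the test added below can race the background flush against an explicit `SYSTEM FLUSH LOGS`. A minimal sketch of the pattern the test exercises (the probe query text is illustrative; the settings and statements are standard server features):

```sql
-- Entries reach system.query_log either via the background flush
-- (every 300 ms with the config above) or when SYSTEM FLUSH LOGS
-- forces the in-memory buffer to disk.
SET log_queries = 1;
SELECT 'query_log probe' FORMAT Null;
SET log_queries = 0;
SYSTEM FLUSH LOGS;
SELECT count() FROM system.query_log WHERE query LIKE '%query_log probe%';
```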
diff --git a/dbms/tests/integration/test_system_queries/test.py b/dbms/tests/integration/test_system_queries/test.py index a3899bab577..1761017362a 100644 --- a/dbms/tests/integration/test_system_queries/test.py +++ b/dbms/tests/integration/test_system_queries/test.py @@ -92,6 +92,23 @@ def test_RELOAD_CONFIG_AND_MACROS(started_cluster): instance.query("SYSTEM RELOAD CONFIG") assert TSV(instance.query("select * from system.macros")) == TSV("mac\tro\n") + +def test_SYSTEM_FLUSH_LOGS(started_cluster): + instance = cluster.instances['ch1'] + for i in range(4): + # Sleep so that the background flush (triggered when flush_interval_milliseconds + # expires) can run at the time of the first query, to test a probable race condition. + time.sleep(0.5) + result = instance.query(''' + SET log_queries = 1; + SELECT 1 FORMAT Null; + SET log_queries = 0; + SYSTEM FLUSH LOGS; + SELECT count() FROM system.query_log;''') + instance.query('TRUNCATE TABLE system.query_log') + assert TSV(result) == TSV('4') + + if __name__ == '__main__': with contextmanager(started_cluster)() as cluster: for name, instance in cluster.instances.items(): diff --git a/dbms/tests/queries/0_stateless/00381_first_significant_subdomain.reference b/dbms/tests/queries/0_stateless/00381_first_significant_subdomain.reference index f13e9ddb1bd..1f1230a2104 100644 --- a/dbms/tests/queries/0_stateless/00381_first_significant_subdomain.reference +++ b/dbms/tests/queries/0_stateless/00381_first_significant_subdomain.reference @@ -1,3 +1,3 @@ canada congo net-domena yandex yandex yandex яндекс yandex -canada hello hello hello hello hello canada canada +canada hello hello canada diff --git a/dbms/tests/queries/0_stateless/00398_url_functions.reference b/dbms/tests/queries/0_stateless/00398_url_functions.reference index e4a31f0654a..acb605597d3 100644 --- a/dbms/tests/queries/0_stateless/00398_url_functions.reference +++ b/dbms/tests/queries/0_stateless/00398_url_functions.reference @@ -12,13 +12,17 @@ www.example.com 127.0.0.1 www.example.com www.example.com +www.example.com +example.com example.com example.com ====DOMAIN==== com ru -ru + +com +com com ====PATH==== П @@ -61,6 +65,8 @@ example.com example.com example.com example.com +example.com +example.com ====CUT WWW==== http://example.com http://example.com:1234 diff --git a/dbms/tests/queries/0_stateless/00398_url_functions.sql b/dbms/tests/queries/0_stateless/00398_url_functions.sql index 16425dae46d..d301cac5b15 100644 --- a/dbms/tests/queries/0_stateless/00398_url_functions.sql +++ b/dbms/tests/queries/0_stateless/00398_url_functions.sql @@ -13,6 +13,8 @@ SELECT domain('http://www.example.com?q=4') AS Host; SELECT domain('http://127.0.0.1:443/') AS Host; SELECT domain('//www.example.com') AS Host; SELECT domain('//paul@www.example.com') AS Host; +SELECT domain('www.example.com') as Host; +SELECT domain('example.com') as Host; SELECT domainWithoutWWW('//paul@www.example.com') AS Host; SELECT domainWithoutWWW('http://paul@www.example.com:80/') AS Host; @@ -23,6 +25,8 @@ SELECT topLevelDomain('http://127.0.0.1:443/') AS Domain; SELECT topLevelDomain('svn+ssh://example.ru?q=hello%20world') AS Domain; SELECT topLevelDomain('svn+ssh://example.ru.?q=hello%20world') AS Domain; SELECT topLevelDomain('//www.example.com') AS Domain; +SELECT topLevelDomain('www.example.com') as Domain; +SELECT topLevelDomain('example.com') as Domain; SELECT '====PATH===='; SELECT decodeURLComponent('%D0%9F'); @@ -69,6 +73,8 @@ SELECT cutToFirstSignificantSubdomain('http://www.example.com/a/b/c?a=b'); SELECT
cutToFirstSignificantSubdomain('http://www.example.com/a/b/c?a=b#d=f'); SELECT cutToFirstSignificantSubdomain('http://paul@www.example.com/a/b/c?a=b#d=f'); SELECT cutToFirstSignificantSubdomain('//paul@www.example.com/a/b/c?a=b#d=f'); +SELECT cutToFirstSignificantSubdomain('www.example.com'); +SELECT cutToFirstSignificantSubdomain('example.com'); SELECT '====CUT WWW===='; SELECT cutWWW('http://www.example.com'); diff --git a/dbms/tests/queries/0_stateless/00600_replace_running_query.reference b/dbms/tests/queries/0_stateless/00600_replace_running_query.reference index 573541ac970..237dd6b5309 100644 --- a/dbms/tests/queries/0_stateless/00600_replace_running_query.reference +++ b/dbms/tests/queries/0_stateless/00600_replace_running_query.reference @@ -1 +1,5 @@ 0 +1 0 +3 0 +2 0 +44 diff --git a/dbms/tests/queries/0_stateless/00600_replace_running_query.sh b/dbms/tests/queries/0_stateless/00600_replace_running_query.sh index 6778bbce149..abe5dd69b8f 100755 --- a/dbms/tests/queries/0_stateless/00600_replace_running_query.sh +++ b/dbms/tests/queries/0_stateless/00600_replace_running_query.sh @@ -9,3 +9,16 @@ $CLICKHOUSE_CURL -sS "$CLICKHOUSE_URL?query_id=hello&replace_running_query=1" -d sleep 0.1 # First query (usually) should be received by the server after this sleep. $CLICKHOUSE_CURL -sS "$CLICKHOUSE_URL?query_id=hello&replace_running_query=1" -d 'SELECT 0' wait + +${CLICKHOUSE_CLIENT} --user=readonly --query_id=42 --query='SELECT 1, sleep(1)' & +sleep 0.1 +( ${CLICKHOUSE_CLIENT} --query_id=42 --query='SELECT 43' ||: ) 2>&1 | grep -F 'is already running by user' > /dev/null +wait + +${CLICKHOUSE_CLIENT} --query='SELECT 3, sleep(1)' & +sleep 0.1 +${CLICKHOUSE_CLIENT} --query_id=42 --query='SELECT 2, sleep(1)' & +sleep 0.1 +( ${CLICKHOUSE_CLIENT} --query_id=42 --replace_running_query=1 --queue_max_wait_ms=500 --query='SELECT 43' ||: ) 2>&1 | grep -F 'cant be stopped' > /dev/null +${CLICKHOUSE_CLIENT} --query_id=42 --replace_running_query=1 --query='SELECT 44' +wait diff --git a/dbms/tests/queries/0_stateless/00897_flatten.reference b/dbms/tests/queries/0_stateless/00897_flatten.reference index 36b61d51056..6c1aa724070 100644 --- a/dbms/tests/queries/0_stateless/00897_flatten.reference +++ b/dbms/tests/queries/0_stateless/00897_flatten.reference @@ -8,3 +8,6 @@ [0,0,1,0,1,0,1,0,1] [0,0,1,0,1,0,1,0,1,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2] [0,0,1,0,1,0,1,0,1,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3] +[1,2,3,4,5,6,7,8] +[] +[] diff --git a/dbms/tests/queries/0_stateless/00897_flatten.sql b/dbms/tests/queries/0_stateless/00897_flatten.sql index 344db389bac..04c725677bd 100644 --- a/dbms/tests/queries/0_stateless/00897_flatten.sql +++ b/dbms/tests/queries/0_stateless/00897_flatten.sql @@ -1,3 +1,6 @@ SELECT flatten(arrayJoin([[[1, 2, 3], [4, 5]], [[6], [7, 8]]])); -SELECT flatten(arrayJoin([[[[]], [[1], [], [2, 3]]], [[[4]]]])); +SELECT arrayFlatten(arrayJoin([[[[]], [[1], [], [2, 3]]], [[[4]]]])); SELECT flatten(arrayMap(x -> arrayMap(x -> arrayMap(x -> range(x), range(x)), range(x)), range(number))) FROM numbers(6); +SELECT arrayFlatten([[[1, 2, 3], [4, 5]], [[6], [7, 8]]]); +SELECT flatten([[[]]]); +SELECT arrayFlatten([]); diff --git a/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.reference b/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.reference new file mode 100644 index 
00000000000..e69de29bb2d diff --git a/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.sh b/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.sh new file mode 100755 index 00000000000..52246b50b7a --- /dev/null +++ b/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +. $CURDIR/../shell_config.sh + +set -e + +for sequence in 1 10 100 1000 10000 100000 1000000 10000000 100000000 1000000000; do \ +rate=`echo "1 $sequence" | awk '{printf("%0.9f\n",$1/$2)}'` +$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS test.bloom_filter_idx"; +$CLICKHOUSE_CLIENT --allow_experimental_data_skipping_indices=1 --query="CREATE TABLE test.bloom_filter_idx ( u64 UInt64, i32 Int32, f64 Float64, d Decimal(10, 2), s String, e Enum8('a' = 1, 'b' = 2, 'c' = 3), dt Date, INDEX bloom_filter_a i32 TYPE bloom_filter($rate) GRANULARITY 1 ) ENGINE = MergeTree() ORDER BY u64 SETTINGS index_granularity = 8192" +done diff --git a/dbms/tests/queries/0_stateless/00945_bloom_filter_index.reference b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.reference new file mode 100755 index 00000000000..7b6d919d404 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.reference @@ -0,0 +1,30 @@ +1 +0 +1 +1 +2 +0 +2 +2 +2 +0 +2 +2 +2 +0 +2 +2 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 diff --git a/dbms/tests/queries/0_stateless/00945_bloom_filter_index.sql b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.sql new file mode 100755 index 00000000000..bb258b886a4 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.sql @@ -0,0 +1,50 @@ +SET allow_experimental_data_skipping_indices = 1; + +DROP TABLE IF EXISTS test.single_column_bloom_filter; + +CREATE TABLE test.single_column_bloom_filter (u64 UInt64, i32 Int32, i64 UInt64, INDEX idx (i32) TYPE bloom_filter GRANULARITY 1) ENGINE = MergeTree() ORDER BY u64 SETTINGS index_granularity = 6; + +INSERT INTO test.single_column_bloom_filter SELECT number AS u64, number AS i32, number AS i64 FROM system.numbers LIMIT 100; + +SELECT COUNT() FROM test.single_column_bloom_filter WHERE i32 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i32) = (1, 2) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i64) = (1, 1) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i64, (i64, i32)) = (1, (1, 1)) SETTINGS max_rows_to_read = 6; + +SELECT COUNT() FROM test.single_column_bloom_filter WHERE i32 IN (1, 2) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i32) IN ((1, 2), (2, 3)) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i64) IN ((1, 1), (2, 2)) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i64, (i64, i32)) IN ((1, (1, 1)), (2, (2, 2))) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE i32 IN (SELECT arrayJoin([toInt32(1), toInt32(2)])) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i32) IN (SELECT arrayJoin([(toInt32(1), toInt32(2)), (toInt32(2), toInt32(3))])) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i64) IN (SELECT arrayJoin([(toInt32(1), 
toUInt64(1)), (toInt32(2), toUInt64(2))])) SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i64, (i64, i32)) IN (SELECT arrayJoin([(toUInt64(1), (toUInt64(1), toInt32(1))), (toUInt64(2), (toUInt64(2), toInt32(2)))])) SETTINGS max_rows_to_read = 6; +WITH (1, 2) AS liter_prepared_set SELECT COUNT() FROM test.single_column_bloom_filter WHERE i32 IN liter_prepared_set SETTINGS max_rows_to_read = 6; +WITH ((1, 2), (2, 3)) AS liter_prepared_set SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i32) IN liter_prepared_set SETTINGS max_rows_to_read = 6; +WITH ((1, 1), (2, 2)) AS liter_prepared_set SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i32, i64) IN liter_prepared_set SETTINGS max_rows_to_read = 6; +WITH ((1, (1, 1)), (2, (2, 2))) AS liter_prepared_set SELECT COUNT() FROM test.single_column_bloom_filter WHERE (i64, (i64, i32)) IN liter_prepared_set SETTINGS max_rows_to_read = 6; + +DROP TABLE IF EXISTS test.single_column_bloom_filter; + + +DROP TABLE IF EXISTS test.bloom_filter_types_test; + +CREATE TABLE test.bloom_filter_types_test (order_key UInt64, i8 Int8, i16 Int16, i32 Int32, i64 Int64, u8 UInt8, u16 UInt16, u32 UInt32, u64 UInt64, f32 Float32, f64 Float64, date Date, date_time DateTime('Europe/Moscow'), str String, fixed_string FixedString(5), INDEX idx (i8, i16, i32, i64, u8, u16, u32, u64, f32, f64, date, date_time, str, fixed_string) TYPE bloom_filter GRANULARITY 1) ENGINE = MergeTree() ORDER BY order_key SETTINGS index_granularity = 6; +INSERT INTO test.bloom_filter_types_test SELECT number AS order_key, toInt8(number) AS i8, toInt16(number) AS i16, toInt32(number) AS i32, toInt64(number) AS i64, toUInt8(number) AS u8, toUInt16(number) AS u16, toUInt32(number) AS u32, toUInt64(number) AS u64, toFloat32(number) AS f32, toFloat64(number) AS f64, toDate(number, 'Europe/Moscow') AS date, toDateTime(number, 'Europe/Moscow') AS date_time, toString(number) AS str, toFixedString(toString(number), 5) AS fixed_string FROM system.numbers LIMIT 100; + +SELECT COUNT() FROM test.bloom_filter_types_test WHERE i8 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE i16 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE i32 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE i64 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE u8 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE u16 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE u32 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE u64 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE f32 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE f64 = 1 SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE date = '1970-01-02' SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE date_time = toDateTime('1970-01-01 03:00:01', 'Europe/Moscow') SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE str = '1' SETTINGS max_rows_to_read = 6; +SELECT COUNT() FROM test.bloom_filter_types_test WHERE fixed_string = toFixedString('1', 5) SETTINGS max_rows_to_read = 12; + +DROP TABLE IF EXISTS test.bloom_filter_types_test; diff --git 
a/dbms/tests/queries/0_stateless/00957_delta_diff_bug.reference b/dbms/tests/queries/0_stateless/00957_delta_diff_bug.reference new file mode 100644 index 00000000000..4f142ee300f --- /dev/null +++ b/dbms/tests/queries/0_stateless/00957_delta_diff_bug.reference @@ -0,0 +1,2 @@ +1111 +2222 diff --git a/dbms/tests/queries/0_stateless/00957_delta_diff_bug.sql b/dbms/tests/queries/0_stateless/00957_delta_diff_bug.sql new file mode 100644 index 00000000000..0c5fb6ce7e1 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00957_delta_diff_bug.sql @@ -0,0 +1,9 @@ +DROP TABLE IF EXISTS segfault_table; + +CREATE TABLE segfault_table (id UInt16 CODEC(Delta(2))) ENGINE MergeTree() order by tuple(); + +INSERT INTO segfault_table VALUES (1111), (2222); + +SELECT * FROM segfault_table; + +DROP TABLE IF EXISTS segfault_table; diff --git a/dbms/tests/queries/1_stateful/00037_uniq_state_merge1.reference b/dbms/tests/queries/1_stateful/00037_uniq_state_merge1.reference index d9ca7e3be21..3bedecd267b 100644 --- a/dbms/tests/queries/1_stateful/00037_uniq_state_merge1.reference +++ b/dbms/tests/queries/1_stateful/00037_uniq_state_merge1.reference @@ -1,24 +1,15 @@ -yandex.ru 25107 25107 - 21999 21999 -public_search 16749 16749 + 89348 89348 +yandex.ru 25105 25105 avito.ru 16523 16523 -public 15429 15429 -mail.yandex.ru 13663 13663 -yandsearch 10039 10039 -news 8827 8827 +mail.yandex.ru 13659 13659 mail.ru 7643 7643 -doc 7537 7537 auto.ru 7350 7350 hurpass.com 6395 6395 best.ru 5477 5477 tv.yandex.ru 5341 5341 korer.ru 4967 4967 -mail.yandsearch 4246 4246 -cars 4077 4077 -publ 3970 3970 -yandex 3845 3845 -main=hurriyet.com 3806 3806 -yandex.ua 3803 3803 +mail.yandsearch 4237 4237 +yandex.ua 3802 3802 korablitz.ru 3717 3717 uyelik.hurriyet.com 3584 3584 e.mail.ru 3508 3508 @@ -28,46 +19,32 @@ coccoc.com 2707 2707 rutube.ru 2699 2699 rbc.ru 2644 2644 mamba.ru 2598 2598 -video 2558 2558 -mail.yandex 2447 2447 -wot 2253 2253 +mail.yandex 2441 2441 pikabu.ru 2130 2130 yandex.php 2057 2057 e.mail.yandex.ru 1971 1971 brandex.ru 1969 1969 -bravoslava-230v 1942 1942 -search 1933 1933 market.ru 1913 1913 mynet.ru 1881 1881 -mail 1845 1845 -mail.yandex.ua 1825 1825 +mail.yandex.ua 1823 1823 rutube.com 1821 1821 -images 1812 1812 news.rambler.com 1787 1787 hurpass.com.tr 1763 1763 ads.search 1742 1742 -marina_2_sezon 1680 1680 cars.auto.ru 1628 1628 cian.ru 1620 1620 ivi.ru 1617 1617 av.by 1598 1598 -world 1596 1596 news.yandex.ru 1495 1495 vk.com 1474 1474 -pub 1469 1469 -forum 1414 1414 wow-girls.ru 1399 1399 -kinogo-dhpWXEdIcgoxWUZ6fgdTWw.. 1338 1338 uyelik.hurriyet.com.tr 1330 1330 aukro.ua 1314 1314 -plugins 1244 1244 images.yandsearch 1235 1235 ondom.ru 1221 1221 korablitz.com 1189 1189 -videovol-9-sezon 1187 1187 kerl.org 1155 1155 mail.yandex.php 1148 1148 -file 1147 1147 love.mail.yandex.ru 1136 1136 yandex.kz 1124 1124 coccoc.com.tr 1113 1113 @@ -77,24 +54,47 @@ sprashivai.ru 1072 1072 market.yandex.ru 1064 1064 spb-n.ru 1056 1056 sz.spaces.ru 1055 1055 -xofx.net%2F63857&secret-oper=reply&id=0&extras] 1054 1054 marinance.ua 1050 1050 tube.ru 1044 1044 haber.com 1043 1043 -image&img_url=http 1042 1042 -sport 1040 1040 megogo.net 993 993 sozcu.com 991 991 yandex.by 938 938 -image&uinfo 936 936 -fast-golove.mail.ru_Mobile=0&at=35&text=производств 927 927 -linka 901 901 gazeta.ru 892 892 -yandex.ru;yandex.ru 892 892 -kinogo-dhpWXEdIcgoxWUZ6fgdTXA.. 
890 890 fotki.yandex.ru 875 875 fast-golove.mail.yandex.php 842 842 -news=previews 839 839 -faber 833 833 lenta.ru 820 820 publicdaroglundai_anketa.ru 813 813 +mail.yandex.kz 810 810 +censor.net 807 807 +mail.yandex.by 804 804 +nnn.ru 796 796 +maxi.su 788 788 +rambler.ru 755 755 +hurpass.com.ua 729 729 +g1.botva.lv 728 728 +m.sport.airway 724 724 +tvizle.com 723 723 +fast-golove.mail.yandex.ru 712 712 +spb.ru 693 693 +eksisozluk.com 689 689 +uyelik.hurriyet 666 666 +rst.ua 650 650 +deko.ru 647 647 +my.mail.yandex.ru 647 647 +astrov.pro 625 625 +yandsearch.php 624 624 +kinogo.net 617 617 +fanati-avtomobile.jsp 611 611 +tv.yandsearch 605 605 +soft.ru 603 603 +pluginplus.ru 601 601 +images.yandex 595 595 +1tv.rbc.ru 592 592 +ria.ru 591 591 +marina_prezideniz.hurriyet.com 578 578 +youtube.ru 575 575 +cars.autochno.ru 570 570 +a2.stars.auto.yandsearch 566 566 +love.mail.ru 560 560 +mail.rambler.ru 553 553 diff --git a/dbms/tests/queries/1_stateful/00038_uniq_state_merge2.reference b/dbms/tests/queries/1_stateful/00038_uniq_state_merge2.reference index 926cb1911ba..9144afd90b2 100644 --- a/dbms/tests/queries/1_stateful/00038_uniq_state_merge2.reference +++ b/dbms/tests/queries/1_stateful/00038_uniq_state_merge2.reference @@ -1,100 +1,100 @@ - 582035 80248 -ru 299420 71339 -com 78253 34500 -html 40288 19569 -ua 33160 18847 -tr 19570 13117 -net 19003 12908 -php 17817 12011 -yandsearch 13598 10329 -by 9349 7695 -yandex 8946 7282 -org 5897 5320 -tv 5371 4660 -kz 5175 4588 -aspx 3084 2800 -phtml 3012 2725 -xml 2993 2726 -tr&callback_url=http 2897 2681 -su 2833 2587 -shtml 2442 2218 -hurriyet 2030 1907 -search 1915 1904 -tr&user 1556 1494 -jpg 1531 1427 -tr&users 1449 1373 -tr&callback 1294 1244 -jsp 1083 1048 -net%2F63857&secret-oper=reply&id=0&extras] 1054 1054 -htm 957 921 -ru_Mobile=0&at=35&text=производств 927 927 -lv 916 910 -tr&user_page 916 885 -exe 911 891 -me 911 864 -tr&user_page=http 900 868 -do 864 838 -tr&used 782 768 -pro 778 772 +ru 262914 69218 + 92101 89421 +com 63298 30285 +ua 29037 17475 +html 25079 15039 +tr 16770 11857 +net 16387 11686 +php 14374 10307 +yandsearch 12024 9484 +by 8192 6915 +yandex 7211 6124 +org 4890 4514 +kz 4679 4211 +tv 4400 3928 +su 2602 2396 +phtml 2409 2226 +xml 2322 2182 +aspx 1959 1848 +search 1835 1827 +hurriyet 1385 1345 +shtml 995 966 +lv 879 875 +jsp 855 845 +exe 814 798 +pro 737 734 airway 724 724 -biz 685 672 -mail 677 660 -info 593 575 -tr&callback_url=https 534 526 -tr%2Fgaleri 533 522 +me 675 647 +jpg 662 647 +do 625 611 +mail 593 581 +biz 537 530 bstatistik_dlja-dlya-naches 521 521 -sx 498 496 -ru%2Fupload 497 492 -news 492 487 -hu 486 479 -aspx&referer 473 459 -pogoda 460 460 -auto 438 429 -az 434 425 -net%2F63857&secret=506d9e3dfbd268e6b6630e58 432 432 +info 461 453 +pogoda 459 459 +sx 450 449 +news 448 444 sportlibrary 431 431 -jpg,http 411 397 -tr&callbusiness 410 407 -fm 405 400 -online 401 399 -tr&callbusines 388 384 -ru%2Fnews 387 382 +hu 396 393 +htm 393 385 +fm 379 378 +online 374 372 bstatistic 366 366 -wbp 346 346 -am 336 333 -ru;yandsearch 330 328 -tr&user_page=https 330 328 -tr&callback_url 329 319 -html&lang=ru&lr=110&category=dressages%2Fcs306755 328 328 -pl 328 326 -blog 327 326 -jpg&pos 307 302 -bstana 305 305 -ru;yandex 287 284 -im 283 278 -diary 277 275 -slando 276 274 -eu 274 269 -to 271 269 -asp 253 250 -html&lang 253 248 -mynet 253 251 -tj 242 241 -sberbank 241 238 -haber 234 227 -jpg,https 232 232 -cc 226 221 -_2544 222 222 -ws 221 219 -mamba 220 220 +auto 363 355 +az 356 350 +wbp 343 343 +bstana 304 304 +blog 268 268 
+diary 262 261 +am 260 258 +slando 254 252 +im 238 235 +eu 237 234 liveinteria 218 218 -tr%2Fanasayfa 215 210 -tr&user_pts=&states 213 213 -yandsearchplus 212 211 -jpg","photo 211 209 -ru%2Fwww 211 211 -com&callback_url=http 209 208 +to 215 213 +mamba 214 214 auto-supers 208 208 -co 206 205 -kg 206 205 -ru%2Fuploads 206 205 +sberbank 207 207 +tj 205 205 +bstatistik_dlja-dlya_avia 201 201 +bstanii_otryasam 200 200 +pl 200 198 +wroad_5d 200 200 +mynet 191 190 +bstan 187 187 +yandsearchplus 186 186 +haber 184 179 +jpg,https 184 184 +turkasovki 183 183 +co 177 177 +video 177 177 +gif","photos 175 175 +mgshared_zone 172 172 +wssp 172 172 +jpg,http 170 168 +swf 167 167 +cc 166 164 +ws 164 164 +kg 157 156 +mobili_s_probegom 154 153 +cgi 153 152 +yandsearcher 152 151 +uz 150 150 +nsf 149 149 +adriver 147 144 +slandsearch 143 142 +korrez 140 140 +bstatistik_dlja-dlja-putin 139 139 +rambler 133 132 +mvideo 132 132 +asp 129 128 +vc 127 127 +md 121 121 +jpg","photo 119 119 +mp4 118 117 +ee 116 115 +loveplaceOfSearchplus 111 111 +nl 111 111 +bstatistika 107 107 +br 102 102 +sport 99 99 diff --git a/dbms/tests/queries/1_stateful/00044_any_left_join_string.reference b/dbms/tests/queries/1_stateful/00044_any_left_join_string.reference index a96e3c9f457..364115011f9 100644 --- a/dbms/tests/queries/1_stateful/00044_any_left_join_string.reference +++ b/dbms/tests/queries/1_stateful/00044_any_left_join_string.reference @@ -1,10 +1,10 @@ + 4508153 712428 auto.ru 576845 8935 -yandex.ru 410788 111278 -public 328528 23 - 313516 26015 -public_search 311125 0 +yandex.ru 410776 111278 korer.ru 277987 0 avito.ru 163820 15556 -mail.yandex.ru 152469 1046 -main=hurriyet.com 152096 259 -wot 116912 6682 +mail.yandex.ru 152447 1046 +mail.ru 87949 22225 +best.ru 58537 55 +korablitz.ru 51844 0 +hurpass.com 49671 1251 diff --git a/dbms/tests/queries/1_stateful/00089_position_functions_with_non_constant_arg.reference b/dbms/tests/queries/1_stateful/00089_position_functions_with_non_constant_arg.reference index ad9a93d1113..4d0ba2b70f3 100644 --- a/dbms/tests/queries/1_stateful/00089_position_functions_with_non_constant_arg.reference +++ b/dbms/tests/queries/1_stateful/00089_position_functions_with_non_constant_arg.reference @@ -2,8 +2,5 @@ 0 0 0 -http://игры на передачи пригорька россия&lr=213&rpt=simage&uinfo=ww-1905-wh-643-fw-112-rossiisoft.in.ua%2FKievav@yandex?appkey=506d9e3dfbd268e6b6630e58 -http://игры на передачи пригорька россия&lr=213&rpt=simage&uinfo=ww-1905-wh-643-fw-112-rossiisoft.in.ua%2FKievav@yandex?appkey=506d9e3dfbd268e6b6630e58 -http://ru slovari 15 -https://ru spb.rabota 15 -https://e yandex 12 +https://povary_dlya-511-gemotedDynamo_accoshyutoy-s-kortosh@bk.ru/yandsearch?text=simages%2F8%2F10544998#posts%2Fkartofeleri +https://povary_dlya-511-gemotedDynamo_accoshyutoy-s-kortosh@bk.ru/yandsearch?text=simages%2F8%2F10544998#posts%2Fkartofeleri diff --git a/debian/.pbuilderrc b/debian/.pbuilderrc index 4eb4c6b7306..11c733f1056 100644 --- a/debian/.pbuilderrc +++ b/debian/.pbuilderrc @@ -31,9 +31,11 @@ # sudo DIST=stable ARCH=i386 pbuilder create --configfile debian/.pbuilderrc && DIST=stable ARCH=i386 pdebuild --configfile debian/.pbuilderrc # sudo DIST=testing ARCH=i386 pbuilder create --configfile debian/.pbuilderrc && DIST=testing ARCH=i386 pdebuild --configfile debian/.pbuilderrc # sudo DIST=experimental ARCH=i386 pbuilder create --configfile debian/.pbuilderrc && DIST=experimental ARCH=i386 pdebuild --configfile debian/.pbuilderrc +# test gcc-9 +# env DEB_CC=gcc-9 DEB_CXX=g++-9 EXTRAPACKAGES="g++-9 gcc-9" 
DIST=disco pdebuild --configfile debian/.pbuilderrc # use only clang: -# env DEB_CC=clang-5.0 DEB_CXX=clang++-5.0 EXTRAPACKAGES=clang-5.0 DIST=artful pdebuild --configfile debian/.pbuilderrc # env DEB_CC=clang-8 DEB_CXX=clang++-8 EXTRAPACKAGES=clang-8 DIST=disco pdebuild --configfile debian/.pbuilderrc +# env DEB_CC=clang-5.0 DEB_CXX=clang++-5.0 EXTRAPACKAGES=clang-5.0 DIST=artful pdebuild --configfile debian/.pbuilderrc # clang+asan: # env DEB_CC=clang-5.0 DEB_CXX=clang++-5.0 EXTRAPACKAGES="clang-5.0 libc++abi-dev libc++-dev" CMAKE_FLAGS="-DENABLE_TCMALLOC=0 -DENABLE_UNWIND=0 -DCMAKE_BUILD_TYPE=Asan" DIST=artful pdebuild --configfile debian/.pbuilderrc # clang+tsan: diff --git a/debian/clickhouse-server.postinst b/debian/clickhouse-server.postinst index 8b0fe0e9de2..cd1590258ce 100644 --- a/debian/clickhouse-server.postinst +++ b/debian/clickhouse-server.postinst @@ -1,5 +1,6 @@ #!/bin/sh set -e +# set -x CLICKHOUSE_USER=${CLICKHOUSE_USER:=clickhouse} CLICKHOUSE_GROUP=${CLICKHOUSE_GROUP:=${CLICKHOUSE_USER}} @@ -12,14 +13,12 @@ EXTRACT_FROM_CONFIG=${CLICKHOUSE_GENERIC_PROGRAM}-extract-from-config CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config.xml OS=${OS=`lsb_release -is 2>/dev/null ||:`} -if [ -z "$OS" ]; then - test -f /etc/os-release && . /etc/os-release && OS=$ID -fi -OS=${OS=`uname -s ||:`} +[ -z "$OS" ] && [ -f /etc/os-release ] && . /etc/os-release && OS=$ID +[ -z "$OS" ] && [ -f /etc/centos-release ] && OS=centos +[ -z "$OS" ] && OS=`uname -s ||:` -test -f /usr/share/debconf/confmodule && . /usr/share/debconf/confmodule - -test -f /etc/default/clickhouse && . /etc/default/clickhouse +[ -f /usr/share/debconf/confmodule ] && . /usr/share/debconf/confmodule +[ -f /etc/default/clickhouse ] && . /etc/default/clickhouse if [ "$OS" = "rhel" ] || [ "$OS" = "centos" ] || [ "$OS" = "fedora" ] || [ "$OS" = "CentOS" ] || [ "$OS" = "Fedora" ]; then is_rh=1 diff --git a/debian/pbuilder-hooks/A00ccache b/debian/pbuilder-hooks/A00ccache index 38d78caf1b9..575358f31eb 100755 --- a/debian/pbuilder-hooks/A00ccache +++ b/debian/pbuilder-hooks/A00ccache @@ -1,5 +1,7 @@ #!/bin/sh +# set -x + # CCACHEDIR - for pbuilder ; CCACHE_DIR - for ccache echo "CCACHEDIR=$CCACHEDIR CCACHE_DIR=$CCACHE_DIR SET_CCACHEDIR=$SET_CCACHEDIR" @@ -12,9 +14,7 @@ if [ -n "$CCACHE_DIR" ]; then chmod -R a+rwx $CCACHE_DIR $DISTCC_DIR ||: fi -[ $CCACHE_PREFIX = 'distcc' ] && mkdir -p ~/.distcc && echo "localhost/`nproc`" >> ~/.distcc/hosts -# [ $CCACHE_PREFIX = 'distcc' ] && mkdir -p /etc/distcc/ && echo "localhost/`nproc`" >> /etc/distcc/hosts -# distcc --show-hosts +[ $CCACHE_PREFIX = 'distcc' ] && mkdir -p $DISTCC_DIR && echo "localhost/`nproc`" >> $DISTCC_DIR/hosts && distcc --show-hosts df -h ccache --show-stats diff --git a/debian/rules b/debian/rules index 7c008d7456d..a49ffc3f66e 100755 --- a/debian/rules +++ b/debian/rules @@ -130,5 +130,7 @@ override_dh_auto_install: override_dh_shlibdeps: true # We depend only on libc and dh_shlibdeps gives us wrong (too strict) dependency. +#TODO: faster packing of non-release builds: ifdef RELEASE_COMPATIBLE override_dh_builddeb: dh_builddeb -- -Z gzip # Older systems don't have "xz", so use "gzip" instead. 
+#TODO: endif diff --git a/docker/packager/deb/Dockerfile b/docker/packager/deb/Dockerfile index 6f6bbf1c0b5..c3c4bc3c0d6 100644 --- a/docker/packager/deb/Dockerfile +++ b/docker/packager/deb/Dockerfile @@ -68,7 +68,9 @@ RUN apt-get --allow-unauthenticated update -y \ unixodbc-dev \ odbcinst \ tzdata \ - gperf + gperf \ + alien + RUN git clone https://github.com/uber/h3 && cd h3 && cmake . && make && make install && cd .. && rm -rf h3 diff --git a/docker/packager/deb/build.sh b/docker/packager/deb/build.sh index b395ed76d00..033e2c26464 100755 --- a/docker/packager/deb/build.sh +++ b/docker/packager/deb/build.sh @@ -4,8 +4,10 @@ set -x -e ccache --show-stats ||: ccache --zero-stats ||: -build/release --no-pbuilder +build/release --no-pbuilder $ALIEN_PKGS mv /*.deb /output mv *.changes /output mv *.buildinfo /output +mv /*.rpm /output ||: # if exists +mv /*.tgz /output ||: # if exists ccache --show-stats ||: diff --git a/docker/packager/packager b/docker/packager/packager index 63987d34594..9dadb96b46d 100755 --- a/docker/packager/packager +++ b/docker/packager/packager @@ -103,7 +103,7 @@ def run_vagrant_box_with_env(image_path, output_dir, ch_root): logging.info("Copying binary back") vagrant.copy_from_image("~/ClickHouse/dbms/programs/clickhouse", output_dir) -def parse_env_variables(build_type, compiler, sanitizer, package_type, cache, distcc_hosts, unbundled, split_binary, version, author, official): +def parse_env_variables(build_type, compiler, sanitizer, package_type, cache, distcc_hosts, unbundled, split_binary, version, author, official, alien_pkgs): result = [] cmake_flags = ['$CMAKE_FLAGS'] @@ -139,6 +139,9 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, cache, di elif cache == "distcc": result.append('DISTCC_HOSTS="{}"'.format("localhost/`nproc`")) + if alien_pkgs: + result.append("ALIEN_PKGS='" + ' '.join(['--' + pkg for pkg in alien_pkgs]) + "'") + if unbundled: cmake_flags.append('-DUNBUNDLED=1 -DENABLE_MYSQL=0 -DENABLE_POCO_ODBC=0 -DENABLE_ODBC=0') @@ -176,6 +179,7 @@ if __name__ == "__main__": parser.add_argument("--version") parser.add_argument("--author", default="clickhouse") parser.add_argument("--official", action="store_true") + parser.add_argument("--alien-pkgs", nargs='+', default=[]) args = parser.parse_args() if not os.path.isabs(args.output_dir): @@ -188,6 +192,9 @@ if __name__ == "__main__": else: ch_root = args.clickhouse_repo_path + if args.alien_pkgs and not args.package_type == "deb": + raise Exception("Can add alien packages only in deb build") + dockerfile = os.path.join(ch_root, "docker/packager", args.package_type, "Dockerfile") if args.package_type != "freebsd" and not check_image_exists_locally(image_name) or args.force_build_image: if not pull_image(image_name) or args.force_build_image: @@ -195,7 +202,7 @@ if __name__ == "__main__": env_prepared = parse_env_variables( args.build_type, args.compiler, args.sanitizer, args.package_type, args.cache, args.distcc_hosts, args.unbundled, args.split_binary, - args.version, args.author, args.official) + args.version, args.author, args.official, args.alien_pkgs) if args.package_type != "freebsd": run_docker_image_with_env(image_name, args.output_dir, env_prepared, ch_root, args.ccache_dir) else: diff --git a/docker/server/README.md b/docker/server/README.md index 4ceea65a934..4ed73a0fb9b 100644 --- a/docker/server/README.md +++ b/docker/server/README.md @@ -43,6 +43,7 @@ When you use the image with mounting local directories inside you probably would ## How to extend this image If you 
would like to do additional initialization in an image derived from this one, add one or more `*.sql`, `*.sql.gz`, or `*.sh` scripts under `/docker-entrypoint-initdb.d`. After the entrypoint calls `initdb` it will run any `*.sql` files, run any executable `*.sh` scripts, and source any non-executable `*.sh` scripts found in that directory to do further initialization before starting the service. +You can also provide the environment variables `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD`, which will be used by clickhouse-client during initialization. For example, to add an additional user and database, add the following to `/docker-entrypoint-initdb.d/init-db.sh`: diff --git a/docker/server/entrypoint.sh b/docker/server/entrypoint.sh index b1a8802d95f..03bacb7ac88 100644 --- a/docker/server/entrypoint.sh +++ b/docker/server/entrypoint.sh @@ -24,6 +24,7 @@ LOG_DIR="$(dirname $LOG_PATH || true)" ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=logger.errorlog || true)" ERROR_LOG_DIR="$(dirname $ERROR_LOG_PATH || true)" FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=format_schema_path || true)" +CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}" for dir in "$DATA_DIR" \ "$ERROR_LOG_DIR" \ @@ -62,7 +63,12 @@ if [ -n "$(ls /docker-entrypoint-initdb.d/)" ]; then exit 1 fi - clickhouseclient=( clickhouse-client --multiquery ) + if [ ! -z "$CLICKHOUSE_PASSWORD" ]; then + printf -v WITH_PASSWORD '%s %q' "--password" "$CLICKHOUSE_PASSWORD" + fi + + clickhouseclient=( clickhouse-client --multiquery -u $CLICKHOUSE_USER $WITH_PASSWORD ) + echo for f in /docker-entrypoint-initdb.d/*; do case "$f" in diff --git a/docker/test/pvs/Dockerfile b/docker/test/pvs/Dockerfile index dfd4a54ce17..41244ba3a38 100644 --- a/docker/test/pvs/Dockerfile +++ b/docker/test/pvs/Dockerfile @@ -20,7 +20,7 @@ RUN apt-get --allow-unauthenticated update -y \ # apt-get --allow-unauthenticated install --yes --no-install-recommends \ # pvs-studio -ENV PKG_VERSION="pvs-studio-7.02.32369.1285-amd64.deb" +ENV PKG_VERSION="pvs-studio-7.03.32801.1369-amd64.deb" RUN wget -q http://files.viva64.com/beta/$PKG_VERSION RUN sudo dpkg -i $PKG_VERSION diff --git a/docs/en/development/build.md b/docs/en/development/build.md index 6d56d6a22b4..caeb7387d49 100644 --- a/docs/en/development/build.md +++ b/docs/en/development/build.md @@ -40,7 +40,7 @@ sudo apt-get install git cmake ninja-build Or cmake3 instead of cmake on older systems. -## Install GCC 7 +## Install GCC 8 There are several ways to do this. @@ -50,18 +50,18 @@
diff --git a/docker/test/pvs/Dockerfile b/docker/test/pvs/Dockerfile
index dfd4a54ce17..41244ba3a38 100644
--- a/docker/test/pvs/Dockerfile
+++ b/docker/test/pvs/Dockerfile
@@ -20,7 +20,7 @@ RUN apt-get --allow-unauthenticated update -y \
 #    apt-get --allow-unauthenticated install --yes --no-install-recommends \
 #        pvs-studio
 
-ENV PKG_VERSION="pvs-studio-7.02.32369.1285-amd64.deb"
+ENV PKG_VERSION="pvs-studio-7.03.32801.1369-amd64.deb"
 
 RUN wget -q http://files.viva64.com/beta/$PKG_VERSION
 RUN sudo dpkg -i $PKG_VERSION
diff --git a/docs/en/development/build.md b/docs/en/development/build.md
index 6d56d6a22b4..caeb7387d49 100644
--- a/docs/en/development/build.md
+++ b/docs/en/development/build.md
@@ -40,7 +40,7 @@ sudo apt-get install git cmake ninja-build
 
 Or cmake3 instead of cmake on older systems.
 
-## Install GCC 7
+## Install GCC 8
 
 There are several ways to do this.
 
@@ -50,18 +50,18 @@
 sudo apt-get install software-properties-common
 sudo apt-add-repository ppa:ubuntu-toolchain-r/test
 sudo apt-get update
-sudo apt-get install gcc-7 g++-7
+sudo apt-get install gcc-8 g++-8
 ```
 
 ### Install from Sources
 
 Look at [ci/build-gcc-from-sources.sh](https://github.com/yandex/ClickHouse/blob/master/ci/build-gcc-from-sources.sh)
 
-## Use GCC 7 for Builds
+## Use GCC 8 for Builds
 
 ```bash
-export CC=gcc-7
-export CXX=g++-7
+export CC=gcc-8
+export CXX=g++-8
 ```
 
 ## Install Required Libraries from Packages
diff --git a/docs/en/operations/table_engines/file.md b/docs/en/operations/table_engines/file.md
index c63d78a2ba3..284f8deff58 100644
--- a/docs/en/operations/table_engines/file.md
+++ b/docs/en/operations/table_engines/file.md
@@ -67,7 +67,7 @@ $ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64
 
 ## Details of Implementation
 
-- Reads can be parallel, but not writes
+- Multiple `SELECT` queries can be performed concurrently, but `INSERT` queries will wait for each other.
 - Not supported:
     - `ALTER`
     - `SELECT ... SAMPLE`
diff --git a/docs/en/operations/table_engines/kafka.md b/docs/en/operations/table_engines/kafka.md
index 22d0384fd42..7bedd8f7ac9 100644
--- a/docs/en/operations/table_engines/kafka.md
+++ b/docs/en/operations/table_engines/kafka.md
@@ -26,7 +26,7 @@ SETTINGS
     [kafka_row_delimiter = 'delimiter_symbol',]
     [kafka_schema = '',]
     [kafka_num_consumers = N,]
-    [kafka_skip_broken_messages = <0|1>]
+    [kafka_skip_broken_messages = N]
 ```
 
 Required parameters:
 
@@ -40,7 +40,7 @@ Optional parameters:
 - `kafka_row_delimiter` – Delimiter character, which ends the message.
 - `kafka_schema` – Parameter that must be used if the format requires a schema definition. For example, [Cap'n Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
 - `kafka_num_consumers` – The number of consumers per table. Default: `1`. Specify more consumers if the throughput of one consumer is insufficient. The total number of consumers should not exceed the number of partitions in the topic, since only one consumer can be assigned per partition.
-- `kafka_skip_broken_messages` – Kafka message parser mode. If `kafka_skip_broken_messages = 1` then the engine skips the Kafka messages that can't be parsed (a message equals a row of data).
+- `kafka_skip_broken_messages` – Kafka message parser tolerance to schema-incompatible messages, per block. Default: `0`. If `kafka_skip_broken_messages = N`, the engine skips up to *N* Kafka messages per block that cannot be parsed (a message equals a row of data).
 
 Examples:
 
@@ -100,6 +100,7 @@ Groups are flexible and synced on the cluster. For instance, if you have 10 topi
 3. Create a materialized view that converts data from the engine and puts it into a previously created table. When the `MATERIALIZED VIEW` joins the engine, it starts collecting data in the background. This allows you to continually receive messages from Kafka and convert them to the required format using `SELECT`.
+One Kafka table can have as many materialized views as you like. They do not read data from the Kafka table directly but receive new records (in blocks); this way you can write to several tables with different levels of detail (with grouping/aggregation and without).
 
 Example:
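To make the fan-out described above concrete, a minimal sketch (table names, columns, and broker settings are invented; the syntax follows the engine definition shown in this hunk):

```bash
# Sketch only: one Kafka table feeding two materialized views with
# different levels of detail.
clickhouse-client --multiquery <<'SQL'
CREATE TABLE queue (ts DateTime, user_id UInt64, amount Float64)
    ENGINE = Kafka SETTINGS kafka_broker_list = 'localhost:9092',
        kafka_topic_list = 'events', kafka_group_name = 'group1',
        kafka_format = 'JSONEachRow';

-- Full detail: every record as-is.
CREATE TABLE events (ts DateTime, user_id UInt64, amount Float64)
    ENGINE = MergeTree() ORDER BY ts;
CREATE MATERIALIZED VIEW events_mv TO events AS
    SELECT ts, user_id, amount FROM queue;

-- Aggregated: daily totals over the same stream.
CREATE TABLE daily (day Date, total Float64)
    ENGINE = SummingMergeTree() ORDER BY day;
CREATE MATERIALIZED VIEW daily_mv TO daily AS
    SELECT toDate(ts) AS day, sum(amount) AS total FROM queue GROUP BY day;
SQL
```

Both views receive the same blocks from the Kafka table, so neither steals records from the other.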
diff --git a/docs/en/query_language/agg_functions/combinators.md b/docs/en/query_language/agg_functions/combinators.md
index bbd49a5c6f7..852b396332c 100644
--- a/docs/en/query_language/agg_functions/combinators.md
+++ b/docs/en/query_language/agg_functions/combinators.md
@@ -22,7 +22,7 @@ Example 2: `uniqArray(arr)` – Count the number of unique elements in all 'arr'
 
 ## -State
 
-If you apply this combinator, the aggregate function doesn't return the resulting value (such as the number of unique values for the `uniq` function), but an intermediate state of the aggregation (for `uniq`, this is the hash table for calculating the number of unique values). This is an AggregateFunction(...) that can be used for further processing or stored in a table to finish aggregating later. See the sections "AggregatingMergeTree" and "Functions for working with intermediate aggregation states".
+If you apply this combinator, the aggregate function doesn't return the resulting value (such as the number of unique values for the `uniq` function), but an intermediate state of the aggregation (for `uniq`, this is the hash table for calculating the number of unique values). This is an AggregateFunction(...) that can be used for further processing or stored in a table to finish aggregating later. To work with these states, use the [AggregatingMergeTree](../../operations/table_engines/aggregatingmergetree.md) table engine, the functions [`finalizeAggregation`](../functions/other_functions.md#finalizeaggregation) and [`runningAccumulate`](../functions/other_functions.md#function-runningaccumulate), and the combinators -Merge and -MergeState described below.
 
 ## -Merge
diff --git a/docs/en/query_language/functions/machine_learning_functions.md b/docs/en/query_language/functions/machine_learning_functions.md
index 6e10667c4e5..5d9983f015f 100644
--- a/docs/en/query_language/functions/machine_learning_functions.md
+++ b/docs/en/query_language/functions/machine_learning_functions.md
@@ -1,4 +1,4 @@
-# Machine learning methods
+# Machine learning functions
 
 ## evalMLMethod (prediction) {#machine_learning_methods-evalmlmethod}
 
diff --git a/docs/en/query_language/functions/url_functions.md b/docs/en/query_language/functions/url_functions.md
index 19b12bd5b21..93edf705e7e 100644
--- a/docs/en/query_language/functions/url_functions.md
+++ b/docs/en/query_language/functions/url_functions.md
@@ -12,7 +12,7 @@
 Returns the protocol. Examples: http, ftp, mailto, magnet...
 
 ### domain
 
-Gets the domain.
+Gets the domain. Cuts off the URL scheme only if it is no longer than 16 bytes.
 
 ### domainWithoutWWW
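A small illustration of the 16-byte scheme rule described above (the query is illustrative; `https://` is well under the limit, so it is cut off before the domain is extracted):

```bash
# Sketch only: expected result is 'clickhouse.yandex'.
clickhouse-client -q "SELECT domain('https://clickhouse.yandex/docs/en/')"
```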
diff --git a/docs/ru/operations/table_engines/kafka.md b/docs/ru/operations/table_engines/kafka.md
index bdbc13e171a..3fe2e4d5cba 100644
--- a/docs/ru/operations/table_engines/kafka.md
+++ b/docs/ru/operations/table_engines/kafka.md
@@ -97,6 +97,7 @@ Kafka(kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format
 3. Создайте материализованное представление, которое преобразует данные от движка и помещает их в ранее созданную таблицу. Когда к движку присоединяется материализованное представление (`MATERIALIZED VIEW`), оно начинает в фоновом режиме собирать данные. Это позволяет непрерывно получать сообщения от Kafka и преобразовывать их в необходимый формат с помощью `SELECT`.
+Материализованных представлений у одной kafka таблицы может быть сколько угодно, они не считывают данные из таблицы kafka непосредственно, а получают новые записи (блоками), таким образом можно писать в несколько таблиц с разным уровнем детализации (с группировкой - агрегацией и без).
 
 Пример:
diff --git a/docs/ru/query_language/functions/url_functions.md b/docs/ru/query_language/functions/url_functions.md
index 4b4fdc9adda..1897d1b28a3 100644
--- a/docs/ru/query_language/functions/url_functions.md
+++ b/docs/ru/query_language/functions/url_functions.md
@@ -10,7 +10,7 @@
 Возвращает протокол. Примеры: http, ftp, mailto, magnet...
 
 ### domain
 
-Возвращает домен.
+Возвращает домен. Отсекает схему размером не более 16 байт.
 
 ### domainWithoutWWW
 
 Возвращает домен, удалив не более одного 'www.' с начала, если есть.
diff --git a/libs/libcommon/cmake/find_jemalloc.cmake b/libs/libcommon/cmake/find_jemalloc.cmake
index 3a1b14d9c33..0b1c80c8934 100644
--- a/libs/libcommon/cmake/find_jemalloc.cmake
+++ b/libs/libcommon/cmake/find_jemalloc.cmake
@@ -7,7 +7,7 @@
 endif ()
 option (ENABLE_JEMALLOC "Set to TRUE to use jemalloc" ${ENABLE_JEMALLOC_DEFAULT})
 if (OS_LINUX AND NOT ARCH_ARM)
     option (USE_INTERNAL_JEMALLOC_LIBRARY "Set to FALSE to use system jemalloc library instead of bundled" ${NOT_UNBUNDLED})
-elseif ()
+else()
     option (USE_INTERNAL_JEMALLOC_LIBRARY "Set to FALSE to use system jemalloc library instead of bundled" OFF)
 endif()
 
@@ -30,7 +30,7 @@ if (ENABLE_JEMALLOC)
 
     if (JEMALLOC_LIBRARIES)
         set (USE_JEMALLOC 1)
-    else ()
+    elseif (NOT MISSING_INTERNAL_JEMALLOC_LIBRARY)
         message (FATAL_ERROR "ENABLE_JEMALLOC is set to true, but library was not found")
     endif ()
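For reference, a configure-time sketch of how these options combine (values are illustrative): with the `elseif()`/`else()` fix the option is now defined on every platform, and a missing bundled jemalloc no longer escalates into a hard `FATAL_ERROR`.

```bash
# Sketch only: prefer the system jemalloc over the bundled submodule.
cmake .. -DENABLE_JEMALLOC=1 -DUSE_INTERNAL_JEMALLOC_LIBRARY=0
```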
diff --git a/libs/libcommon/include/common/unaligned.h b/libs/libcommon/include/common/unaligned.h
index 2b1505ba2d3..ca73298adfb 100644
--- a/libs/libcommon/include/common/unaligned.h
+++ b/libs/libcommon/include/common/unaligned.h
@@ -1,6 +1,7 @@
 #pragma once
 
 #include <string.h>
+#include <type_traits>
 
 template <typename T>
@@ -11,8 +12,14 @@ inline T unalignedLoad(const void * address)
     return res;
 }
 
+/// We've had troubles before with wrong store size due to integral promotions
+/// (e.g., unalignedStore(dest, uint16_t + uint16_t) stores an uint32_t).
+/// To prevent this, make the caller specify the stored type explicitly.
+/// To disable deduction of T, wrap the argument type with std::enable_if.
 template <typename T>
-inline void unalignedStore(void * address, const T & src)
+inline void unalignedStore(void * address,
+    const typename std::enable_if<true, T>::type & src)
 {
+    static_assert(std::is_trivially_copyable_v<T>);
     memcpy(address, &src, sizeof(src));
 }
diff --git a/libs/libglibc-compatibility/glibc-compatibility.c b/libs/libglibc-compatibility/glibc-compatibility.c
index ca5ceb9c5b1..d4bb739a72c 100644
--- a/libs/libglibc-compatibility/glibc-compatibility.c
+++ b/libs/libglibc-compatibility/glibc-compatibility.c
@@ -131,7 +131,7 @@ int sscanf(const char *restrict s, const char *restrict fmt, ...)
     return ret;
 }
 
-int __isoc99_sscanf(const char *str, const char *format, ...) __attribute__((weak, alias("sscanf")));
+int __isoc99_sscanf(const char *str, const char *format, ...) __attribute__((weak, nonnull, nothrow, alias("sscanf")));
 
 int open(const char *path, int oflag);
diff --git a/release b/release
index 1049b3921c5..758e346acc9 100755
--- a/release
+++ b/release
@@ -113,9 +113,8 @@ export EXTRAPACKAGES
 VERSION_STRING+=$VERSION_POSTFIX
 echo -e "\nCurrent version is $VERSION_STRING"
 
-gen_changelog "$VERSION_STRING" "" "$AUTHOR" ""
-
 if [ -z "$NO_BUILD" ] ; then
+    gen_changelog "$VERSION_STRING" "" "$AUTHOR" ""
     if [ -z "$USE_PBUILDER" ] ; then
         DEB_CC=${DEB_CC:=`which gcc-7 gcc-8 gcc | head -n1`}
         DEB_CXX=${DEB_CXX:=`which g++-7 g++-8 g++ | head -n1`}
diff --git a/utils/CMakeLists.txt b/utils/CMakeLists.txt
index 46d5519fa76..3b523822451 100644
--- a/utils/CMakeLists.txt
+++ b/utils/CMakeLists.txt
@@ -35,3 +35,5 @@ endif ()
 if (ENABLE_CODE_QUALITY)
     add_subdirectory (check-style)
 endif ()
+
+add_subdirectory (package)
diff --git a/utils/package/CMakeLists.txt b/utils/package/CMakeLists.txt
new file mode 100644
index 00000000000..8c8a09adc0f
--- /dev/null
+++ b/utils/package/CMakeLists.txt
@@ -0,0 +1 @@
+add_subdirectory (arch)
diff --git a/utils/package/arch/CMakeLists.txt b/utils/package/arch/CMakeLists.txt
new file mode 100644
index 00000000000..07489cf9b19
--- /dev/null
+++ b/utils/package/arch/CMakeLists.txt
@@ -0,0 +1,2 @@
+include (${ClickHouse_SOURCE_DIR}/dbms/cmake/version.cmake)
+configure_file (PKGBUILD.in PKGBUILD)
diff --git a/utils/package/arch/PKGBUILD.in b/utils/package/arch/PKGBUILD.in
new file mode 100644
index 00000000000..b3482b04907
--- /dev/null
+++ b/utils/package/arch/PKGBUILD.in
@@ -0,0 +1,27 @@
+pkgname=clickhouse
+pkgver=${VERSION_STRING}
+pkgrel=1
+pkgdesc='An open-source column-oriented database management system that allows generating analytical data reports in real time'
+arch=('x86_64')
+url='https://clickhouse.yandex/'
+license=('Apache')
+
+package() {
+  # This code was adapted from kmeaw@ https://aur.archlinux.org/packages/clickhouse/ .
+  SRC=${ClickHouse_SOURCE_DIR}
+  BIN=${ClickHouse_BINARY_DIR}
+  mkdir -p $pkgdir/etc/clickhouse-server/ $pkgdir/etc/clickhouse-client/
+  mkdir -p $pkgdir/usr/bin/
+  mkdir -p $pkgdir/usr/lib/systemd/system
+  ln -s clickhouse-client $pkgdir/usr/bin/clickhouse-server
+  cp $SRC/dbms/programs/server/config.xml $SRC/dbms/programs/server/users.xml $pkgdir/etc/clickhouse-server/
+  cp $BIN/dbms/programs/clickhouse $pkgdir/usr/bin/clickhouse-client
+  patchelf --remove-rpath $pkgdir/usr/bin/clickhouse-client
+  patchelf --replace-needed libz.so.1 libz-ng.so.1 $pkgdir/usr/bin/clickhouse-client
+  cp $SRC/dbms/programs/client/clickhouse-client.xml $pkgdir/etc/clickhouse-client/config.xml
+  compiler="libclickhouse-compiler.so"
+  if ! pacman -Q clang | grep '^clang 7'; then
+    compiler=""
+  fi
+  cp $SRC/debian/clickhouse-server.service $pkgdir/usr/lib/systemd/system
+}
diff --git a/utils/package/arch/README.md b/utils/package/arch/README.md
new file mode 100644
index 00000000000..10bdae7367a
--- /dev/null
+++ b/utils/package/arch/README.md
@@ -0,0 +1,9 @@
+### Build Arch Linux package
+
+From the binary directory:
+
+```
+make
+cd arch
+makepkg
+```
diff --git a/utils/release/push_packages b/utils/release/push_packages
index 38852f94f1d..a32ceb0777f 100755
--- a/utils/release/push_packages
+++ b/utils/release/push_packages
@@ -145,16 +145,25 @@ def clear_old_incoming_packages(ssh_connection, user):
     for pkg in ('deb', 'rpm', 'tgz'):
         for release_type in ('stable', 'testing', 'prestable'):
             try:
-                ssh_connection.execute("rm /home/{user}/incoming/clickhouse/{pkg}/{release_type}/*".format(
-                    user=user, pkg=pkg, release_type=release_type))
+                if pkg != 'tgz':
+                    ssh_connection.execute("rm /home/{user}/incoming/clickhouse/{pkg}/{release_type}/*".format(
+                        user=user, pkg=pkg, release_type=release_type))
+                else:
+                    ssh_connection.execute("rm /home/{user}/incoming/clickhouse/{pkg}/*".format(
+                        user=user, pkg=pkg))
             except Exception:
                 logging.info("rm is not required")
 
 
 def _get_incoming_path(repo_url, user=None, pkg_type=None, release_type=None):
     if repo_url == 'repo.mirror.yandex.net':
-        return "/home/{user}/incoming/clickhouse/{pkg}/{release_type}".format(
-            user=user, pkg=pkg_type, release_type=release_type)
+        if pkg_type != 'tgz':
+            return "/home/{user}/incoming/clickhouse/{pkg}/{release_type}".format(
+                user=user, pkg=pkg_type, release_type=release_type)
+        else:
+            return "/home/{user}/incoming/clickhouse/{pkg}".format(
+                user=user, pkg=pkg_type)
-    return "/repo/{0}/mini-dinstall/incoming/".format(repo_url.split('.')[0])
+    else:
+        return "/repo/{0}/mini-dinstall/incoming/".format(repo_url.split('.')[0])
diff --git a/utils/release/release_lib.sh b/utils/release/release_lib.sh
index a2af31a3532..75307dfe0b0 100644
--- a/utils/release/release_lib.sh
+++ b/utils/release/release_lib.sh
@@ -1,4 +1,5 @@
 set +e
+# set -x
 
 function gen_version_string {
     if [ -n "$TEST" ]; then
@@ -181,17 +182,16 @@ function gen_dockerfiles {
 }
 
 function make_rpm {
-    get_version
-    VERSION_STRING+=$VERSION_POSTFIX
-    VERSION=$VERSION_STRING
-    PACKAGE_DIR=../
+    [ -z "$VERSION_STRING" ] && get_version && VERSION_STRING+=${VERSION_POSTFIX}
+    VERSION_FULL="${VERSION_STRING}"
+    PACKAGE_DIR=${PACKAGE_DIR=../}
 
     function deb_unpack {
-        rm -rf $PACKAGE-$VERSION
-        alien --verbose --generate --to-rpm --scripts ${PACKAGE_DIR}${PACKAGE}_${VERSION}_${ARCH}.deb
-        cd $PACKAGE-$VERSION
-        mv ${PACKAGE}-$VERSION-2.spec ${PACKAGE}-$VERSION-2.spec.tmp
-        cat ${PACKAGE}-$VERSION-2.spec.tmp \
+        rm -rf $PACKAGE-$VERSION_FULL
+        alien --verbose --generate --to-rpm --scripts ${PACKAGE_DIR}${PACKAGE}_${VERSION_FULL}_${ARCH}.deb
+        cd $PACKAGE-$VERSION_FULL
+        mv ${PACKAGE}-$VERSION_FULL-2.spec ${PACKAGE}-$VERSION_FULL-2.spec.tmp
+        cat ${PACKAGE}-$VERSION_FULL-2.spec.tmp \
             | grep -vF '%dir "/"' \
             | grep -vF '%dir "/usr/"' \
             | grep -vF '%dir "/usr/bin/"' \
@@ -210,11 +210,11 @@
             | grep -vF '%dir "/etc/cron.d/"' \
             | grep -vF '%dir "/etc/systemd/system/"' \
            | grep -vF '%dir "/etc/systemd/"' \
-            > ${PACKAGE}-$VERSION-2.spec
+            > ${PACKAGE}-$VERSION_FULL-2.spec
     }
 
     function rpm_pack {
-        rpmbuild --buildroot="$CUR_DIR/${PACKAGE}-$VERSION" -bb --target ${TARGET} "${PACKAGE}-$VERSION-2.spec"
+        rpmbuild --buildroot="$CUR_DIR/${PACKAGE}-$VERSION_FULL" -bb --target ${TARGET} "${PACKAGE}-$VERSION_FULL-2.spec"
         cd $CUR_DIR
     }
@@ -237,10 +237,10 @@ function make_rpm {
     ARCH=all
     TARGET=noarch
     deb_unpack
-    mv ${PACKAGE}-$VERSION-2.spec ${PACKAGE}-$VERSION-2.spec_tmp
-    echo "Requires: python2" >> ${PACKAGE}-$VERSION-2.spec
+    mv ${PACKAGE}-$VERSION_FULL-2.spec ${PACKAGE}-$VERSION_FULL-2.spec_tmp
+    echo "Requires: python2" >> ${PACKAGE}-$VERSION_FULL-2.spec
     #echo "Requires: python2-termcolor" >> ${PACKAGE}-$VERSION-2.spec
-    cat ${PACKAGE}-$VERSION-2.spec_tmp >> ${PACKAGE}-$VERSION-2.spec
+    cat ${PACKAGE}-$VERSION_FULL-2.spec_tmp >> ${PACKAGE}-$VERSION_FULL-2.spec
     rpm_pack
 
     PACKAGE=clickhouse-common-static
@@ -253,18 +253,17 @@ function make_rpm {
     TARGET=x86_64
     unpack_pack
 
-    mv clickhouse-*-${VERSION_STRING}-2.*.rpm ${PACKAGE_DIR}
+    mv clickhouse-*-${VERSION_FULL}-2.*.rpm ${PACKAGE_DIR}
 }
 
 function make_tgz {
-    get_version
-    VERSION_STRING+=$VERSION_POSTFIX
-    VERSION=$VERSION_STRING
-    PACKAGE_DIR=../
+    [ -z "$VERSION_STRING" ] && get_version && VERSION_STRING+=${VERSION_POSTFIX}
+    VERSION_FULL="${VERSION_STRING}"
+    PACKAGE_DIR=${PACKAGE_DIR=../}
 
     for PACKAGE in clickhouse-server clickhouse-client clickhouse-test clickhouse-common-static clickhouse-common-static-dbg; do
-        alien --verbose --to-tgz ${PACKAGE_DIR}${PACKAGE}_${VERSION}_*.deb
+        alien --verbose --to-tgz ${PACKAGE_DIR}${PACKAGE}_${VERSION_FULL}_*.deb
     done
 
-    mv clickhouse-*-${VERSION_STRING}.tgz ${PACKAGE_DIR}
+    mv clickhouse-*-${VERSION_FULL}.tgz ${PACKAGE_DIR}
 }
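Finally, a usage sketch of the reworked helpers: because `VERSION_STRING` and `PACKAGE_DIR` are now only set when empty, a caller can pin both instead of re-deriving the version from the checkout (the paths and version below are illustrative, and the remaining environment the functions expect is assumed to come from the release scripts):

```bash
# Sketch only: convert pre-built .deb packages in /output without
# calling get_version; both variables override the defaults above.
source utils/release/release_lib.sh
VERSION_STRING=19.8.3.8 PACKAGE_DIR=/output/ make_rpm
VERSION_STRING=19.8.3.8 PACKAGE_DIR=/output/ make_tgz
```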