Merge remote-tracking branch 'upstream/master' into query-poor-mans-profiler

2024-09-20 08:40:50 +00:00 · 2019-07-04 22:13:51 +00:00 · 2019-07-04 22:13:51 +00:00 · 0f579860f7
commit 0f579860f7
parent 7cff36fbfc 74d17789d0
469 changed files with 7887 additions and 7443 deletions
--- a/.gitmodules
+++ b/.gitmodules
@@ -76,6 +76,9 @@
 [submodule "contrib/brotli"]
 	path = contrib/brotli
 	url = https://github.com/google/brotli.git
+[submodule "contrib/h3"]
+	path = contrib/h3
+	url = https://github.com/uber/h3
 [submodule "contrib/hyperscan"]
 	path = contrib/hyperscan
 	url = https://github.com/ClickHouse-Extras/hyperscan.git
@@ -88,3 +91,6 @@
 [submodule "contrib/rapidjson"]
 	path = contrib/rapidjson
 	url = https://github.com/Tencent/rapidjson
+[submodule "contrib/mimalloc"]
+	path = contrib/mimalloc
+	url = https://github.com/ClickHouse-Extras/mimalloc
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,163 @@
+## ClickHouse release 19.9.2.4, 2019-06-24
+
+### New Feature
+* Print information about frozen parts in `system.parts` table. [#5471](https://github.com/yandex/ClickHouse/pull/5471) ([proller](https://github.com/proller))
+* Ask for the client password when clickhouse-client starts on a tty, if it is not set in the arguments. [#5092](https://github.com/yandex/ClickHouse/pull/5092) ([proller](https://github.com/proller))
+* Implement `dictGet` and `dictGetOrDefault` functions for Decimal types. [#5394](https://github.com/yandex/ClickHouse/pull/5394) ([Artem Zuikov](https://github.com/4ertus2))
+
+### Improvement
+* Debian init: Add service stop timeout [#5522](https://github.com/yandex/ClickHouse/pull/5522) ([proller](https://github.com/proller))
+* Add a setting that, by default, forbids creating tables with suspicious types for LowCardinality. [#5448](https://github.com/yandex/ClickHouse/pull/5448) ([Olga Khvostikova](https://github.com/stavrolia))
+* Regression functions return model weights when not used as State in function `evalMLMethod`. [#5411](https://github.com/yandex/ClickHouse/pull/5411) ([Quid37](https://github.com/Quid37))
+* Rename and improve regression methods. [#5492](https://github.com/yandex/ClickHouse/pull/5492) ([Quid37](https://github.com/Quid37))
+* Clearer interfaces of string searchers. [#5586](https://github.com/yandex/ClickHouse/pull/5586) ([Danila Kutenin](https://github.com/danlark1))
+
+### Bug Fix
+* Fix potential data loss in Kafka [#5445](https://github.com/yandex/ClickHouse/pull/5445) ([Ivan](https://github.com/abyss7))
+* Fix potential infinite loop in `PrettySpace` format when called with zero columns [#5560](https://github.com/yandex/ClickHouse/pull/5560) ([Olga Khvostikova](https://github.com/stavrolia))
+* Fixed UInt32 overflow bug in linear models. Allow evaluating an ML model for a non-constant model argument. [#5516](https://github.com/yandex/ClickHouse/pull/5516) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+* `ALTER TABLE ... DROP INDEX IF EXISTS ...` should not raise an exception if provided index does not exist [#5524](https://github.com/yandex/ClickHouse/pull/5524) ([Gleb Novikov](https://github.com/NanoBjorn))
+* Fix segfault with `bitmapHasAny` in scalar subquery [#5528](https://github.com/yandex/ClickHouse/pull/5528) ([Zhichang Yu](https://github.com/yuzhichang))
+* Fixed an error where the replication connection pool did not retry resolving the host, even after the DNS cache was dropped. [#5534](https://github.com/yandex/ClickHouse/pull/5534) ([alesapin](https://github.com/alesapin))
+* Fixed `ALTER ... MODIFY TTL` on ReplicatedMergeTree. [#5539](https://github.com/yandex/ClickHouse/pull/5539) ([Anton Popov](https://github.com/CurtizJ))
+* Fix INSERT into Distributed table with MATERIALIZED column [#5429](https://github.com/yandex/ClickHouse/pull/5429) ([Azat Khuzhin](https://github.com/azat))
+* Fix bad alloc when truncating Join storage. [#5437](https://github.com/yandex/ClickHouse/pull/5437) ([TCeason](https://github.com/TCeason))
+* In recent versions of the tzdata package some files are now symlinks. The current mechanism for detecting the default timezone breaks and gives wrong names for some timezones. Now we at least force the timezone name to the contents of TZ, if provided. [#5443](https://github.com/yandex/ClickHouse/pull/5443) ([Ivan](https://github.com/abyss7))
+* Fix some extremely rare cases with the MultiVolnitsky searcher when the constant needles in sum are at least 16KB long. The algorithm missed or overwrote previous results, which could lead to an incorrect result of `multiSearchAny`. [#5588](https://github.com/yandex/ClickHouse/pull/5588) ([Danila Kutenin](https://github.com/danlark1))
+* Fix an issue where settings for ExternalData requests couldn't use ClickHouse settings. Also, for now, the settings `date_time_input_format` and `low_cardinality_allow_in_native_format` cannot be used because their names are ambiguous (in external data they can be interpreted as a table format, and in the query they can be a setting). [#5455](https://github.com/yandex/ClickHouse/pull/5455) ([Danila Kutenin](https://github.com/danlark1))
+* Fix a bug where parts were removed only from the FS without dropping them from ZooKeeper. [#5520](https://github.com/yandex/ClickHouse/pull/5520) ([alesapin](https://github.com/alesapin))
+* Remove debug logging from MySQL protocol [#5478](https://github.com/yandex/ClickHouse/pull/5478) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Skip ZNONODE during DDL query processing [#5489](https://github.com/yandex/ClickHouse/pull/5489) ([Azat Khuzhin](https://github.com/azat))
+* Fix the result column type of `UNION ALL` with mixed column types. There were cases with inconsistent data and column types of resulting columns. [#5503](https://github.com/yandex/ClickHouse/pull/5503) ([Artem Zuikov](https://github.com/4ertus2))
+* Throw an exception on wrong integers in `dictGetT` functions instead of crashing. [#5446](https://github.com/yandex/ClickHouse/pull/5446) ([Artem Zuikov](https://github.com/4ertus2))
+* Fix wrong `element_count` and `load_factor` for hashed dictionaries in the `system.dictionaries` table. [#5440](https://github.com/yandex/ClickHouse/pull/5440) ([Azat Khuzhin](https://github.com/azat))
+
+### Build/Testing/Packaging Improvement
+* Fixed build without `Brotli` HTTP compression support (`ENABLE_BROTLI=OFF` cmake variable). [#5521](https://github.com/yandex/ClickHouse/pull/5521) ([Anton Yuzhaninov](https://github.com/citrin))
+* Include roaring.h as roaring/roaring.h [#5523](https://github.com/yandex/ClickHouse/pull/5523) ([Orivej Desh](https://github.com/orivej))
+* Fix gcc9 warnings in hyperscan (#line directive is evil!) [#5546](https://github.com/yandex/ClickHouse/pull/5546) ([Danila Kutenin](https://github.com/danlark1))
+* Fix all warnings when compiling with gcc-9. Fix some contrib issues. Fix gcc9 ICE and submit it to bugzilla. [#5498](https://github.com/yandex/ClickHouse/pull/5498) ([Danila Kutenin](https://github.com/danlark1))
+* Fixed linking with lld [#5477](https://github.com/yandex/ClickHouse/pull/5477) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Remove unused specializations in dictionaries [#5452](https://github.com/yandex/ClickHouse/pull/5452) ([Artem Zuikov](https://github.com/4ertus2))
+* Improve performance tests for formatting and parsing tables for different file types. [#5497](https://github.com/yandex/ClickHouse/pull/5497) ([Olga Khvostikova](https://github.com/stavrolia))
+* Fixes for parallel test run [#5506](https://github.com/yandex/ClickHouse/pull/5506) ([proller](https://github.com/proller))
+* Docker: use configs from clickhouse-test [#5531](https://github.com/yandex/ClickHouse/pull/5531) ([proller](https://github.com/proller))
+* Fix compilation on FreeBSD. [#5447](https://github.com/yandex/ClickHouse/pull/5447) ([proller](https://github.com/proller))
+* Upgrade boost to 1.70 [#5570](https://github.com/yandex/ClickHouse/pull/5570) ([proller](https://github.com/proller))
+* Fix building ClickHouse as a submodule. [#5574](https://github.com/yandex/ClickHouse/pull/5574) ([proller](https://github.com/proller))
+* Improve JSONExtract performance tests [#5444](https://github.com/yandex/ClickHouse/pull/5444) ([Vitaly Baranov](https://github.com/vitlibar))
+
+## ClickHouse release 19.8.3.8, 2019-06-11
+
+### New Features
+* Added functions to work with JSON [#4686](https://github.com/yandex/ClickHouse/pull/4686) ([hcz](https://github.com/hczhcz)) [#5124](https://github.com/yandex/ClickHouse/pull/5124). ([Vitaly Baranov](https://github.com/vitlibar))
+* Add a `basename` function with behaviour similar to the basename functions found in many languages (`os.path.basename` in Python, `basename` in PHP, etc.). It works with both UNIX-like and Windows paths. [#5136](https://github.com/yandex/ClickHouse/pull/5136) ([Guillaume Tassery](https://github.com/YiuRULE))
+* Added `LIMIT n, m BY` or `LIMIT m OFFSET n BY` syntax to set an offset of n for the LIMIT BY clause. [#5138](https://github.com/yandex/ClickHouse/pull/5138) ([Anton Popov](https://github.com/CurtizJ))
+* Added new data type `SimpleAggregateFunction`, which allows having columns with light aggregation in an `AggregatingMergeTree`. This can only be used with simple functions like `any`, `anyLast`, `sum`, `min`, `max`. [#4629](https://github.com/yandex/ClickHouse/pull/4629) ([Boris Granveaud](https://github.com/bgranvea))
+* Added support for non-constant arguments in function `ngramDistance` [#5198](https://github.com/yandex/ClickHouse/pull/5198) ([Danila Kutenin](https://github.com/danlark1))
+* Added functions `skewPop`, `skewSamp`, `kurtPop` and `kurtSamp` to compute sequence skewness, sample skewness, kurtosis and sample kurtosis respectively. [#5200](https://github.com/yandex/ClickHouse/pull/5200) ([hcz](https://github.com/hczhcz))
+* Support rename operation for `MaterializedView` storage. [#5209](https://github.com/yandex/ClickHouse/pull/5209) ([Guillaume Tassery](https://github.com/YiuRULE))
+* Added a server that allows connecting to ClickHouse using a MySQL client. [#4715](https://github.com/yandex/ClickHouse/pull/4715) ([Yuriy Baranov](https://github.com/yurriy))
+* Add `toDecimal*OrZero` and `toDecimal*OrNull` functions. [#5291](https://github.com/yandex/ClickHouse/pull/5291) ([Artem Zuikov](https://github.com/4ertus2))
+* Support Decimal types in functions: `quantile`, `quantiles`, `median`, `quantileExactWeighted`, `quantilesExactWeighted`, `medianExactWeighted`. [#5304](https://github.com/yandex/ClickHouse/pull/5304) ([Artem Zuikov](https://github.com/4ertus2))
+* Added `toValidUTF8` function, which replaces all invalid UTF-8 characters with the replacement character � (U+FFFD). [#5322](https://github.com/yandex/ClickHouse/pull/5322) ([Danila Kutenin](https://github.com/danlark1))
+* Added `format` function. It formats a constant pattern (a simplified Python format pattern) with the strings listed in the arguments. [#5330](https://github.com/yandex/ClickHouse/pull/5330) ([Danila Kutenin](https://github.com/danlark1))
+* Added `system.detached_parts` table containing information about detached parts of `MergeTree` tables. [#5353](https://github.com/yandex/ClickHouse/pull/5353) ([akuzm](https://github.com/akuzm))
+* Added `ngramSearch` function to calculate the non-symmetric difference between needle and haystack. [#5418](https://github.com/yandex/ClickHouse/pull/5418) [#5422](https://github.com/yandex/ClickHouse/pull/5422) ([Danila Kutenin](https://github.com/danlark1))
+* Implementation of basic machine learning methods (stochastic linear regression and logistic regression) using aggregate functions interface. Has different strategies for updating model weights (simple gradient descent, momentum method, Nesterov method). Also supports mini-batches of custom size. [#4943](https://github.com/yandex/ClickHouse/pull/4943) ([Quid37](https://github.com/Quid37))
+* Implementation of `geohashEncode` and `geohashDecode` functions. [#5003](https://github.com/yandex/ClickHouse/pull/5003) ([Vasily Nemkov](https://github.com/Enmk))
+* Added aggregate function `timeSeriesGroupSum`, which can aggregate different time series whose sample timestamps are not aligned. It uses linear interpolation between two sample timestamps and then sums the time series together. Added aggregate function `timeSeriesGroupRateSum`, which calculates the rate of each time series and then sums the rates together. [#4542](https://github.com/yandex/ClickHouse/pull/4542) ([Yangkuan Liu](https://github.com/LiuYangkuan))
+* Added functions `IPv4CIDRtoIPv4Range` and `IPv6CIDRtoIPv6Range` to calculate the lower and higher bounds for an IP in the subnet using a CIDR. [#5095](https://github.com/yandex/ClickHouse/pull/5095) ([Guillaume Tassery](https://github.com/YiuRULE))
+* Add an `X-ClickHouse-Summary` header when a query is sent over HTTP with the setting `send_progress_in_http_headers` enabled. It returns the usual information of `X-ClickHouse-Progress`, plus additional information such as how many rows and bytes were inserted by the query. [#5116](https://github.com/yandex/ClickHouse/pull/5116) ([Guillaume Tassery](https://github.com/YiuRULE))
+
+### Improvements
+* Added `max_parts_in_total` setting for MergeTree family of tables (default: 100 000) that prevents unsafe specification of partition key #5166. [#5171](https://github.com/yandex/ClickHouse/pull/5171) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* `clickhouse-obfuscator`: derive seed for individual columns by combining initial seed with column name, not column position. This is intended to transform datasets with multiple related tables, so that tables will remain JOINable after transformation. [#5178](https://github.com/yandex/ClickHouse/pull/5178) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Added functions `JSONExtractRaw`, `JSONExtractKeyAndValues`. Renamed functions `jsonExtract<type>` to `JSONExtract<type>`. When something goes wrong these functions return the corresponding values, not `NULL`. Modified function `JSONExtract`: it now gets the return type from its last parameter and doesn't inject nullables. Implemented fallback to RapidJSON in case AVX2 instructions are not available. Simdjson library updated to a new version. [#5235](https://github.com/yandex/ClickHouse/pull/5235) ([Vitaly Baranov](https://github.com/vitlibar))
+* Now `if` and `multiIf` functions don't rely on the condition's `Nullable`, but rely on the branches for SQL compatibility. [#5238](https://github.com/yandex/ClickHouse/pull/5238) ([Jian Wu](https://github.com/janplus))
+* `In` predicate now generates `Null` result from `Null` input like the `Equal` function. [#5152](https://github.com/yandex/ClickHouse/pull/5152) ([Jian Wu](https://github.com/janplus))
+* Check the time limit every (flush_interval / poll_timeout) number of rows from Kafka. This allows breaking the read from the Kafka consumer more frequently and checking the time limits for the top-level streams. [#5249](https://github.com/yandex/ClickHouse/pull/5249) ([Ivan](https://github.com/abyss7))
+* Link rdkafka with bundled SASL. It should allow using SASL SCRAM authentication. [#5253](https://github.com/yandex/ClickHouse/pull/5253) ([Ivan](https://github.com/abyss7))
+* Batched version of RowRefList for ALL JOINS. [#5267](https://github.com/yandex/ClickHouse/pull/5267) ([Artem Zuikov](https://github.com/4ertus2))
+* clickhouse-server: more informative listen error messages. [#5268](https://github.com/yandex/ClickHouse/pull/5268) ([proller](https://github.com/proller))
+* Support dictionaries in clickhouse-copier for functions in `<sharding_key>` [#5270](https://github.com/yandex/ClickHouse/pull/5270) ([proller](https://github.com/proller))
+* Add new setting `kafka_commit_every_batch` to regulate the Kafka committing policy.
+It allows setting the commit mode: after every batch of messages is handled, or after the whole block is written to the storage. It's a trade-off between losing some messages or reading them twice in some extreme situations. [#5308](https://github.com/yandex/ClickHouse/pull/5308) ([Ivan](https://github.com/abyss7))
+* Make `windowFunnel` support other unsigned integer types. [#5320](https://github.com/yandex/ClickHouse/pull/5320) ([sundyli](https://github.com/sundy-li))
+* Allow shadowing the virtual column `_table` in the Merge engine. [#5325](https://github.com/yandex/ClickHouse/pull/5325) ([Ivan](https://github.com/abyss7))
+* Make `sequenceMatch` aggregate functions support other unsigned integer types. [#5339](https://github.com/yandex/ClickHouse/pull/5339) ([sundyli](https://github.com/sundy-li))
+* Better error messages if checksum mismatch is most likely caused by hardware failures. [#5355](https://github.com/yandex/ClickHouse/pull/5355) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Check that underlying tables support sampling for `StorageMerge` [#5366](https://github.com/yandex/ClickHouse/pull/5366) ([Ivan](https://github.com/abyss7))
+* Close MySQL connections after their usage in external dictionaries. It is related to issue #893. [#5395](https://github.com/yandex/ClickHouse/pull/5395) ([Clément Rodriguez](https://github.com/clemrodriguez))
+* Improvements of MySQL Wire Protocol. Changed name of format to MySQLWire. Using RAII for calling RSA_free. Disabling SSL if context cannot be created. [#5419](https://github.com/yandex/ClickHouse/pull/5419) ([Yuriy Baranov](https://github.com/yurriy))
+* clickhouse-client: allow running with an inaccessible history file (read-only, no disk space, file is a directory, ...). [#5431](https://github.com/yandex/ClickHouse/pull/5431) ([proller](https://github.com/proller))
+* Respect query settings in asynchronous INSERTs into Distributed tables. [#4936](https://github.com/yandex/ClickHouse/pull/4936) ([TCeason](https://github.com/TCeason))
+* Renamed functions `leastSqr` to `simpleLinearRegression`, `LinearRegression` to `linearRegression`, `LogisticRegression` to `logisticRegression`. [#5391](https://github.com/yandex/ClickHouse/pull/5391) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+
+### Performance Improvements
+* Parallelize processing of parts in ALTER MODIFY query. [#4639](https://github.com/yandex/ClickHouse/pull/4639) ([Ivan Kush](https://github.com/IvanKush))
+* Optimizations in regular expressions extraction. [#5193](https://github.com/yandex/ClickHouse/pull/5193) [#5191](https://github.com/yandex/ClickHouse/pull/5191) ([Danila Kutenin](https://github.com/danlark1))
+* Do not add the right join key column to the join result if it's used only in the JOIN ON section. [#5260](https://github.com/yandex/ClickHouse/pull/5260) ([Artem Zuikov](https://github.com/4ertus2))
+* Freeze the Kafka buffer after the first empty response. It avoids multiple invocations of `ReadBuffer::next()` for an empty result in some row-parsing streams. [#5283](https://github.com/yandex/ClickHouse/pull/5283) ([Ivan](https://github.com/abyss7))
+* `concat` function optimization for multiple arguments. [#5357](https://github.com/yandex/ClickHouse/pull/5357) ([Danila Kutenin](https://github.com/danlark1))
+* Query optimisation. Allow pushing down the IN statement while rewriting a comma/cross join into an inner one. [#5396](https://github.com/yandex/ClickHouse/pull/5396) ([Artem Zuikov](https://github.com/4ertus2))
+* Upgrade our LZ4 implementation to the reference one to get faster decompression. [#5070](https://github.com/yandex/ClickHouse/pull/5070) ([Danila Kutenin](https://github.com/danlark1))
+* Implemented MSD radix sort (based on kxsort), and partial sorting. [#5129](https://github.com/yandex/ClickHouse/pull/5129) ([Evgenii Pravda](https://github.com/kvinty))
+
+### Bug Fixes
+* Fix pushing required columns with JOIN. [#5192](https://github.com/yandex/ClickHouse/pull/5192) ([Winter Zhang](https://github.com/zhang2014))
+* Fixed a bug where the command `sudo service clickhouse-server forcerestart` did not work as expected when ClickHouse is run by systemd. [#5204](https://github.com/yandex/ClickHouse/pull/5204) ([proller](https://github.com/proller))
+* Fix HTTP error codes in DataPartsExchange (the interserver HTTP server on port 9009 always returned code 200, even on errors). [#5216](https://github.com/yandex/ClickHouse/pull/5216) ([proller](https://github.com/proller))
+* Fix SimpleAggregateFunction for String longer than MAX_SMALL_STRING_SIZE [#5311](https://github.com/yandex/ClickHouse/pull/5311) ([Azat Khuzhin](https://github.com/azat))
+* Fix error for `Decimal` to `Nullable(Decimal)` conversion in IN. Support other Decimal to Decimal conversions (including different scales). [#5350](https://github.com/yandex/ClickHouse/pull/5350) ([Artem Zuikov](https://github.com/4ertus2))
+* Fixed FPU clobbering in the simdjson library that led to wrong calculation of the `uniqHLL` and `uniqCombined` aggregate functions and math functions such as `log`. [#5354](https://github.com/yandex/ClickHouse/pull/5354) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed handling mixed const/nonconst cases in JSON functions. [#5435](https://github.com/yandex/ClickHouse/pull/5435) ([Vitaly Baranov](https://github.com/vitlibar))
+* Fix `retention` function. Now all conditions that are satisfied in a row of data are added to the data state. [#5119](https://github.com/yandex/ClickHouse/pull/5119) ([小路](https://github.com/nicelulu))
+* Fix result type for `quantileExact` with Decimals. [#5304](https://github.com/yandex/ClickHouse/pull/5304) ([Artem Zuikov](https://github.com/4ertus2)) 
+
+### Documentation
+* Translate documentation for `CollapsingMergeTree` to Chinese. [#5168](https://github.com/yandex/ClickHouse/pull/5168) ([张风啸](https://github.com/AlexZFX))
+* Translate some documentation about table engines to Chinese.
+    [#5134](https://github.com/yandex/ClickHouse/pull/5134)
+    [#5328](https://github.com/yandex/ClickHouse/pull/5328)
+    ([never lee](https://github.com/neverlee))
+    
+
+### Build/Testing/Packaging Improvements
+* Fix some sanitizer reports that show probable use-after-free. [#5139](https://github.com/yandex/ClickHouse/pull/5139) [#5143](https://github.com/yandex/ClickHouse/pull/5143) [#5393](https://github.com/yandex/ClickHouse/pull/5393) ([Ivan](https://github.com/abyss7))
+* Move performance tests out of separate directories for convenience. [#5158](https://github.com/yandex/ClickHouse/pull/5158) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fix incorrect performance tests. [#5255](https://github.com/yandex/ClickHouse/pull/5255) ([alesapin](https://github.com/alesapin))
+* Added a tool to calculate checksums caused by bit flips to debug hardware issues. [#5334](https://github.com/yandex/ClickHouse/pull/5334) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Make runner script more usable. [#5340](https://github.com/yandex/ClickHouse/pull/5340) [#5360](https://github.com/yandex/ClickHouse/pull/5360) ([filimonov](https://github.com/filimonov))
+* Add a short guide on how to write performance tests. [#5408](https://github.com/yandex/ClickHouse/pull/5408) ([alesapin](https://github.com/alesapin))
+* Add ability to make substitutions in create, fill and drop queries in performance tests. [#5367](https://github.com/yandex/ClickHouse/pull/5367) ([Olga Khvostikova](https://github.com/stavrolia))
+
+## ClickHouse release 19.7.5.27, 2019-06-09
+
+### New features
+* Added bitmap related functions `bitmapHasAny` and `bitmapHasAll` analogous to `hasAny` and `hasAll` functions for arrays. [#5279](https://github.com/yandex/ClickHouse/pull/5279) ([Sergi Vladykin](https://github.com/svladykin))
+
+### Bug Fixes
+* Fix segfault on `minmax` INDEX with Null value. [#5246](https://github.com/yandex/ClickHouse/pull/5246) ([Nikita Vasilev](https://github.com/nikvas0))
+* Mark all input columns in LIMIT BY as required output. It fixes 'Not found column' error in some distributed queries. [#5407](https://github.com/yandex/ClickHouse/pull/5407) ([Constantin S. Pan](https://github.com/kvap))
+* Fix "Column '0' already exists" error in `SELECT .. PREWHERE` on column with DEFAULT [#5397](https://github.com/yandex/ClickHouse/pull/5397) ([proller](https://github.com/proller))
+* Fix `ALTER MODIFY TTL` query on `ReplicatedMergeTree`. [#5539](https://github.com/yandex/ClickHouse/pull/5539/commits) ([Anton Popov](https://github.com/CurtizJ))
+* Don't crash the server when Kafka consumers have failed to start. [#5285](https://github.com/yandex/ClickHouse/pull/5285) ([Ivan](https://github.com/abyss7))
+* Fixed bitmap functions producing wrong results. [#5359](https://github.com/yandex/ClickHouse/pull/5359) ([Andy Yang](https://github.com/andyyzh))
+* Fix element_count for hashed dictionary (do not include duplicates) [#5440](https://github.com/yandex/ClickHouse/pull/5440) ([Azat Khuzhin](https://github.com/azat))
+* Use contents of environment variable TZ as the name for timezone. It helps to correctly detect the default timezone in some cases. [#5443](https://github.com/yandex/ClickHouse/pull/5443) ([Ivan](https://github.com/abyss7))
+* Do not try to convert integers in `dictGetT` functions, because it doesn't work correctly. Throw an exception instead. [#5446](https://github.com/yandex/ClickHouse/pull/5446) ([Artem Zuikov](https://github.com/4ertus2))
+* Fix settings in ExternalData HTTP request. [#5455](https://github.com/yandex/ClickHouse/pull/5455) ([Danila
+  Kutenin](https://github.com/danlark1))
+* Fix a bug where parts were removed only from the FS without dropping them from ZooKeeper. [#5520](https://github.com/yandex/ClickHouse/pull/5520) ([alesapin](https://github.com/alesapin))
+* Fix segmentation fault in `bitmapHasAny` function. [#5528](https://github.com/yandex/ClickHouse/pull/5528) ([Zhichang Yu](https://github.com/yuzhichang))
+* Fixed an error where the replication connection pool did not retry resolving the host, even after the DNS cache was dropped. [#5534](https://github.com/yandex/ClickHouse/pull/5534) ([alesapin](https://github.com/alesapin))
+* Fixed `DROP INDEX IF EXISTS` query. Now `ALTER TABLE ... DROP INDEX IF EXISTS ...` query doesn't raise an exception if provided index does not exist. [#5524](https://github.com/yandex/ClickHouse/pull/5524) ([Gleb Novikov](https://github.com/NanoBjorn))
+* Fix `UNION ALL` supertype column. There were cases with inconsistent data and column types of resulting columns. [#5503](https://github.com/yandex/ClickHouse/pull/5503) ([Artem Zuikov](https://github.com/4ertus2))
+* Skip ZNONODE during DDL query processing. Previously, if another node removed the znode in the task queue, the node that had not processed it,
+but had already fetched the list of children, would terminate the DDLWorker thread. [#5489](https://github.com/yandex/ClickHouse/pull/5489) ([Azat Khuzhin](https://github.com/azat))
+* Fix INSERT into Distributed() table with MATERIALIZED column. [#5429](https://github.com/yandex/ClickHouse/pull/5429) ([Azat Khuzhin](https://github.com/azat))
+
 ## ClickHouse release 19.7.3.9, 2019-05-30

 ### New Features
@@ -60,6 +220,16 @@ lee](https://github.com/neverlee))
  [#5110](https://github.com/yandex/ClickHouse/pull/5110)
 ([proller](https://github.com/proller))

+## ClickHouse release 19.6.3.18, 2019-06-13
+
+### Bug Fixes
+* Fixed IN condition pushdown for queries from table functions `mysql` and `odbc` and corresponding table engines. This fixes #3540 and #2384. [#5313](https://github.com/yandex/ClickHouse/pull/5313) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fix deadlock in Zookeeper. [#5297](https://github.com/yandex/ClickHouse/pull/5297) ([github1youlc](https://github.com/github1youlc))
+* Allow quoted decimals in CSV. [#5284](https://github.com/yandex/ClickHouse/pull/5284) ([Artem Zuikov](https://github.com/4ertus2))
+* Disallow conversion from float Inf/NaN into Decimals (throw exception). [#5282](https://github.com/yandex/ClickHouse/pull/5282) ([Artem Zuikov](https://github.com/4ertus2))
+* Fix data race in rename query. [#5247](https://github.com/yandex/ClickHouse/pull/5247) ([Winter Zhang](https://github.com/zhang2014))
+* Temporarily disable LFAlloc. Using LFAlloc might lead to many MAP_FAILED errors when allocating UncompressedCache and, as a result, to query crashes on highly loaded servers. [cfdba93](https://github.com/yandex/ClickHouse/commit/cfdba938ce22f16efeec504f7f90206a515b1280) ([Danila Kutenin](https://github.com/danlark1))
+
 ## ClickHouse release 19.6.2.11, 2019-05-13

 ### New Features
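The 19.8.3.8 entry for `timeSeriesGroupSum` describes merging time series with unaligned timestamps by linear interpolation and then summing them. The C++ below is a minimal sketch of that idea, not ClickHouse's implementation. `Point`, `Series`, `interpolateAt` and `groupSum` are hypothetical names. The sketch assumes each series is sorted by timestamp and contributes only at timestamps inside its own first-to-last range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// (timestamp, value); each Series is assumed to be sorted by timestamp.
using Point = std::pair<int64_t, double>;
using Series = std::vector<Point>;

// Linearly interpolates series `s` at time `t`.
// Returns false when `t` lies outside the series' [first, last] timestamp range.
bool interpolateAt(const Series & s, int64_t t, double & out)
{
    if (s.empty() || t < s.front().first || t > s.back().first)
        return false;

    // First sample with timestamp >= t; it exists because t <= last timestamp.
    auto hi = std::lower_bound(s.begin(), s.end(), t,
        [](const Point & p, int64_t ts) { return p.first < ts; });
    if (hi->first == t)
    {
        out = hi->second;
        return true;
    }

    assert(hi != s.begin());  // t > first timestamp, so a previous sample exists
    auto lo = std::prev(hi);
    double frac = double(t - lo->first) / double(hi->first - lo->first);
    out = lo->second + frac * (hi->second - lo->second);
    return true;
}

// Sums all series at the union of their timestamps.
std::vector<Point> groupSum(const std::vector<Series> & all)
{
    std::map<int64_t, double> sums;  // ordered union of all timestamps
    for (const auto & s : all)
        for (const auto & p : s)
            sums[p.first] = 0.0;

    for (auto & [t, acc] : sums)
        for (const auto & s : all)
        {
            double v = 0.0;
            if (interpolateAt(s, t, v))
                acc += v;
        }

    return std::vector<Point>(sums.begin(), sums.end());
}
```

Consider two series sampled at 2, 7, 12, 17, 25 and at 3, 8, 12, 18, 24. The output covers the union of all nine timestamps. At timestamp 3, the first series is interpolated to 0.3 and added to the second series' 0.6. At timestamp 25, only the first series contributes, because the second one ends at 24.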
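The `IPv4CIDRtoIPv4Range` entry computes the lower and upper bounds of a subnet from an address and a CIDR prefix length. The C++ sketch below shows the usual bit-mask arithmetic behind this kind of function. `cidrRange` is a hypothetical helper that assumes addresses are host-order `uint32_t` values; it is not taken from the ClickHouse sources.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {lowest, highest} IPv4 address (host byte order) of the subnet
// that contains `addr` for a prefix length of `prefix` bits (0..32).
std::pair<uint32_t, uint32_t> cidrRange(uint32_t addr, uint8_t prefix)
{
    assert(prefix <= 32);
    // Network mask with `prefix` leading one bits. Shifting a 32-bit value
    // by 32 is undefined behaviour, so /0 is handled separately.
    uint32_t mask = prefix == 0 ? 0u : ~uint32_t{0} << (32 - prefix);
    return {addr & mask, addr | ~mask};
}
```

For example, 192.168.5.2 (0xC0A80502) with prefix 16 gives the range 192.168.0.0 to 192.168.255.255 (0xC0A80000 to 0xC0A8FFFF). The lower bound clears the host bits, and the upper bound sets them.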
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -16,6 +16,8 @@ set(CMAKE_LINK_DEPENDS_NO_SHARED 1) # Do not relink all depended targets on .so
 set(CMAKE_CONFIGURATION_TYPES "RelWithDebInfo;Debug;Release;MinSizeRel" CACHE STRING "" FORCE)
 set(CMAKE_DEBUG_POSTFIX "d" CACHE STRING "Generate debug library name with a postfix.")    # To be consistent with CMakeLists from contrib libs.

+include (cmake/arch.cmake)
+
 option(ENABLE_IPO "Enable inter-procedural optimization (aka LTO)" OFF) # need cmake 3.9+
 if(ENABLE_IPO)
    cmake_policy(SET CMP0069 NEW)
@@ -31,7 +33,7 @@ else()
    message(STATUS "IPO/LTO not enabled.")
 endif()

-if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
+if (COMPILER_GCC)
    # Require at least gcc 7
    if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7 AND NOT CMAKE_VERSION VERSION_LESS 2.8.9)
        message (FATAL_ERROR "GCC version must be at least 7. For example, if GCC 7 is available under gcc-7, g++-7 names, do the following: export CC=gcc-7 CXX=g++-7; rm -rf CMakeCache.txt CMakeFiles; and re run cmake or ./release.")
@@ -81,7 +83,6 @@ endif ()

 include (cmake/sanitize.cmake)

-include (cmake/arch.cmake)

 if (CMAKE_GENERATOR STREQUAL "Ninja")
    # Turn on colored output. https://github.com/ninja-build/ninja/wiki/FAQ
@@ -102,13 +103,12 @@ if (COMPILER_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "8.3.0")
    set (CXX_WARNING_FLAGS "${CXX_WARNING_FLAGS} -Wno-array-bounds")
 endif ()

-if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+if (COMPILER_CLANG)
    # clang: warning: argument unused during compilation: '-stdlib=libc++'
    # clang: warning: argument unused during compilation: '-specs=/usr/share/dpkg/no-pie-compile.specs' [-Wunused-command-line-argument]
    set (COMMON_WARNING_FLAGS "${COMMON_WARNING_FLAGS} -Wno-unused-command-line-argument")
 endif ()

-option (TEST_COVERAGE "Enables flags for test coverage" OFF)
 option (ENABLE_TESTS "Enables tests" ON)

 if (CMAKE_SYSTEM_PROCESSOR MATCHES "amd64|x86_64")
@@ -134,7 +134,7 @@ string(REGEX MATCH "-?[0-9]+(.[0-9]+)?$" COMPILER_POSTFIX ${CMAKE_CXX_COMPILER})
 find_program (LLD_PATH NAMES "lld${COMPILER_POSTFIX}" "lld")
 find_program (GOLD_PATH NAMES "gold")

-if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND LLD_PATH AND NOT LINKER_NAME)
+if (COMPILER_CLANG AND LLD_PATH AND NOT LINKER_NAME)
    set (LINKER_NAME "lld")
 elseif (GOLD_PATH)
    set (LINKER_NAME "gold")
@@ -168,7 +168,7 @@ if (ARCH_NATIVE)
 endif ()

 # Special options for better optimized code with clang
-#if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+#if (COMPILER_CLANG)
 #    set (CMAKE_CXX_FLAGS_RELWITHDEBINFO  "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -Wno-unused-command-line-argument -mllvm -inline-threshold=10000")
 #endif ()

@@ -183,6 +183,14 @@ else ()
    set (CXX_FLAGS_INTERNAL_COMPILER "-std=c++1z")
 endif ()

+option(WITH_COVERAGE "Build with coverage." 0)
+if(WITH_COVERAGE AND COMPILER_CLANG)
+   set(COMPILER_FLAGS "${COMPILER_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
+endif()
+if(WITH_COVERAGE AND COMPILER_GCC)
+   set(COMPILER_FLAGS "${COMPILER_FLAGS} -fprofile-arcs -ftest-coverage")
+endif()
+
 set (CMAKE_BUILD_COLOR_MAKEFILE          ON)
 set (CMAKE_CXX_FLAGS                     "${CMAKE_CXX_FLAGS} ${COMPILER_FLAGS} ${PLATFORM_EXTRA_CXX_FLAG} -fno-omit-frame-pointer ${COMMON_WARNING_FLAGS} ${CXX_WARNING_FLAGS}")
 #set (CMAKE_CXX_FLAGS_RELEASE             "${CMAKE_CXX_FLAGS_RELEASE} ${CMAKE_CXX_FLAGS_ADD}")
@@ -194,7 +202,6 @@ set (CMAKE_C_FLAGS                       "${CMAKE_C_FLAGS} ${COMPILER_FLAGS} -fn
 set (CMAKE_C_FLAGS_RELWITHDEBINFO        "${CMAKE_C_FLAGS_RELWITHDEBINFO} -O3 ${CMAKE_C_FLAGS_ADD}")
 set (CMAKE_C_FLAGS_DEBUG                 "${CMAKE_C_FLAGS_DEBUG} -O0 -g3 -ggdb3 -fno-inline ${CMAKE_C_FLAGS_ADD}")

-
 include (cmake/use_libcxx.cmake)
 include (cmake/find_unwind.cmake)

@ -274,7 +281,6 @@ if (DEFAULT_LIBS)
    set(CMAKE_CXX_STANDARD_LIBRARIES ${DEFAULT_LIBS})
 endif ()

-
 if (NOT MAKE_STATIC_LIBRARIES)
    set(CMAKE_POSITION_INDEPENDENT_CODE ON)
 endif ()
@ -291,10 +297,23 @@ if (USE_INCLUDE_WHAT_YOU_USE)
    endif()
 endif ()

-# Flags for test coverage
-if (TEST_COVERAGE)
-    set (CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fprofile-arcs -ftest-coverage -DIS_DEBUG")
-endif (TEST_COVERAGE)
+# Using clang-tidy static analyzer http://mariobadr.com/using-clang-tidy-with-cmake-36.html https://cmake.org/cmake/help/v3.6/prop_tgt/LANG_CLANG_TIDY.html
+option (ENABLE_CLANG_TIDY "Use 'clang-tidy' static analyzer" OFF)
+if (ENABLE_CLANG_TIDY)
+    if (${CMAKE_VERSION} VERSION_LESS "3.6.0")
+        message(FATAL_ERROR "clang-tidy requires CMake version at least 3.6.")
+    endif()
+    find_program (CLANG_TIDY_EXE NAMES "clang-tidy" DOC "Path to clang-tidy executable")
+    if (NOT CLANG_TIDY_EXE)
+        set (USE_CLANG_TIDY 0)
+        message (STATUS "clang-tidy not found.")
+    else ()
+        set (USE_CLANG_TIDY 1)
+        message (STATUS "clang-tidy found: ${CLANG_TIDY_EXE}")
+        set (DO_CLANG_TIDY "${CLANG_TIDY_EXE}" "-checks=*,-clang-analyzer-alpha.*")
+        # You can enable it within a directory by: set (CMAKE_CXX_CLANG_TIDY "${DO_CLANG_TIDY}")
+    endif ()
+endif ()

 if (ENABLE_TESTS)
    message (STATUS "Tests are enabled")
@ -346,6 +365,7 @@ include (cmake/find_libgsasl.cmake)
 include (cmake/find_rdkafka.cmake)
 include (cmake/find_capnp.cmake)
 include (cmake/find_llvm.cmake)
+include (cmake/find_h3.cmake)
 include (cmake/find_cpuid.cmake) # Freebsd, bundled
 if (NOT USE_CPUID)
    include (cmake/find_cpuinfo.cmake) # Debian
@ -358,7 +378,7 @@ include (cmake/find_hdfs3.cmake) # uses protobuf
 include (cmake/find_consistent-hashing.cmake)
 include (cmake/find_base64.cmake)
 include (cmake/find_hyperscan.cmake)
-include (cmake/find_lfalloc.cmake)
+include (cmake/find_mimalloc.cmake)
 include (cmake/find_simdjson.cmake)
 include (cmake/find_rapidjson.cmake)
 find_contrib_lib(cityhash)
@ -426,6 +446,7 @@ if (GLIBC_COMPATIBILITY OR USE_INTERNAL_UNWIND_LIBRARY_FOR_EXCEPTION_HANDLING)
    add_default_dependencies(kj)
    add_default_dependencies(simdjson)
    add_default_dependencies(apple_rt)
+    add_default_dependencies(h3)
    add_default_dependencies(re2)
    add_default_dependencies(re2_st)
    add_default_dependencies(hs_compile_shared)
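The CMake changes above replace `CMAKE_CXX_COMPILER_ID STREQUAL "Clang"` with `COMPILER_CLANG` and choose coverage instrumentation per compiler: `-fprofile-instr-generate -fcoverage-mapping` for Clang, `-fprofile-arcs -ftest-coverage` for GCC. If the same split has to be made inside C++ sources, the order of the checks matters, because Clang also defines `__GNUC__` (the deleted `compiler.h` notes this too). The following is a minimal sketch under that assumption. `compilerName` and `coverageFlags` are hypothetical helpers, and they do not read the CMake variables.

```cpp
#include <string_view>

// Hypothetical helpers mirroring the COMPILER_CLANG / COMPILER_GCC split.
// __clang__ is tested first because Clang also defines __GNUC__, so a
// __GNUC__-first check would report Clang as GCC.
constexpr std::string_view compilerName()
{
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "other";
#endif
}

// The flags the WITH_COVERAGE branch would add for this compiler.
constexpr std::string_view coverageFlags()
{
#if defined(__clang__)
    return "-fprofile-instr-generate -fcoverage-mapping";
#elif defined(__GNUC__)
    return "-fprofile-arcs -ftest-coverage";
#else
    return ""; // WITH_COVERAGE adds nothing for other compilers
#endif
}
```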
--- a/2
+++ b/2
@ -1,3 +1,5 @@
+Copyright 2016-2019 Yandex LLC
+
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/
--- a/README.md
+++ b/README.md
@ -7,13 +7,12 @@ ClickHouse is an open-source column-oriented database management system that all
 * [Official website](https://clickhouse.yandex/) has quick high-level overview of ClickHouse on main page.
 * [Tutorial](https://clickhouse.yandex/tutorial.html) shows how to set up and query small ClickHouse cluster.
 * [Documentation](https://clickhouse.yandex/docs/en/) provides more in-depth information.
+* [YouTube channel](https://www.youtube.com/channel/UChtmrD-dsdpspr42P_PyRAw) has a lot of content about ClickHouse in video format.
 * [Blog](https://clickhouse.yandex/blog/en/) contains various ClickHouse-related articles, as well as announces and reports about events.
 * [Contacts](https://clickhouse.yandex/#contacts) can help to get your questions answered if there are any.
 * You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.

 ## Upcoming Events
-* [ClickHouse on HighLoad++ Siberia](https://www.highload.ru/siberia/2019/abstracts/5348) on June 24-25.
-* [ClickHouse Meetup in Novosibirsk](https://events.yandex.ru/events/ClickHouse/26-June-2019/) on June 26.
 * [ClickHouse Meetup in Minsk](https://yandex.ru/promo/metrica/clickhouse-minsk) on July 11.
 * [ClickHouse Meetup in Shenzhen](https://www.huodongxing.com/event/3483759917300) on October 20.
 * [ClickHouse Meetup in Shanghai](https://www.huodongxing.com/event/4483760336000) on October 27.
--- a/ci/install-os-packages.sh
+++ b/ci/install-os-packages.sh
@ -7,9 +7,9 @@ WHAT=$1

 [[ $EUID -ne 0 ]] && SUDO=sudo

-command -v apt-get && PACKAGE_MANAGER=apt
 command -v yum && PACKAGE_MANAGER=yum
 command -v pkg && PACKAGE_MANAGER=pkg
+command -v apt-get && PACKAGE_MANAGER=apt


 case $PACKAGE_MANAGER in
--- a/cmake/find_h3.cmake
+++ b/cmake/find_h3.cmake
@ -0,0 +1,19 @@
+option (USE_INTERNAL_H3_LIBRARY "Set to FALSE to use system h3 library instead of bundled" ${NOT_UNBUNDLED})
+
+set (H3_INCLUDE_PATHS /usr/local/include/h3)
+
+if (USE_INTERNAL_H3_LIBRARY)
+    set (H3_LIBRARY h3)
+    set (H3_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/h3/src/h3lib/include)
+else ()
+    find_library (H3_LIBRARY h3)
+    find_path (H3_INCLUDE_DIR NAMES h3api.h PATHS ${H3_INCLUDE_PATHS})
+endif ()
+
+if (H3_LIBRARY AND H3_INCLUDE_DIR)
+    set (USE_H3 1)
+else ()
+    set (USE_H3 0)
+endif ()
+
+message (STATUS "Using h3=${USE_H3}: ${H3_INCLUDE_DIR} : ${H3_LIBRARY}")
--- a/cmake/find_lfalloc.cmake
+++ b/cmake/find_lfalloc.cmake
@ -1,11 +0,0 @@
-# TODO(danlark1). Disable LFAlloc for a while to fix mmap count problem
-if (NOT OS_LINUX AND NOT SANITIZE AND NOT ARCH_ARM AND NOT ARCH_32 AND NOT ARCH_PPC64LE AND NOT OS_FREEBSD AND NOT APPLE)
-    option (ENABLE_LFALLOC "Set to FALSE to use system libgsasl library instead of bundled" ${NOT_UNBUNDLED})
-endif ()
-
-if (ENABLE_LFALLOC)
-    set (USE_LFALLOC 1)
-    set (USE_LFALLOC_RANDOM_HINT 1)
-    set (LFALLOC_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/lfalloc/src)
-    message (STATUS "Using lfalloc=${USE_LFALLOC}: ${LFALLOC_INCLUDE_DIR}")
-endif ()
--- a/cmake/find_mimalloc.cmake
+++ b/cmake/find_mimalloc.cmake
@ -0,0 +1,15 @@
+if (OS_LINUX AND NOT SANITIZE AND NOT ARCH_ARM AND NOT ARCH_32 AND NOT ARCH_PPC64LE)
+    option (ENABLE_MIMALLOC "Set to FALSE to disable usage of mimalloc for internal ClickHouse caches" ${NOT_UNBUNDLED})
+endif ()
+
+if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/mimalloc/include/mimalloc.h")
+    message (WARNING "submodule contrib/mimalloc is missing. to fix try run: \n git submodule update --init --recursive")
+    return()
+endif ()
+
+if (ENABLE_MIMALLOC)
+    set (MIMALLOC_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/mimalloc/include)
+    set (USE_MIMALLOC 1)
+    set (MIMALLOC_LIBRARY mimalloc-static)
+    message (STATUS "Using mimalloc: ${MIMALLOC_INCLUDE_DIR} : ${MIMALLOC_LIBRARY}")
+endif ()
--- a/contrib/CMakeLists.txt
+++ b/contrib/CMakeLists.txt
@ -1,11 +1,11 @@
 # Third-party libraries may have substandard code.

 if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
-    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-stringop-overflow -Wno-implicit-function-declaration -Wno-return-type -Wno-array-bounds -Wno-bool-compare -Wno-int-conversion -Wno-switch -Wno-stringop-truncation")
-    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-sign-compare -Wno-array-bounds -Wno-missing-attributes -Wno-stringop-truncation -std=c++1z")
+    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -w")
+    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w -std=c++1z")
 elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
-    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality -Wno-tautological-constant-compare -Wno-tautological-constant-out-of-range-compare -Wno-implicit-function-declaration -Wno-return-type -Wno-pointer-bool-conversion -Wno-enum-conversion -Wno-int-conversion -Wno-switch -Wno-string-plus-int")
-    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-format -Wno-inconsistent-missing-override -std=c++1z")
+    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -w")
+    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w -std=c++1z")
 endif ()

 set_property(DIRECTORY PROPERTY EXCLUDE_FROM_ALL 1)
@ -106,6 +106,10 @@ if (USE_INTERNAL_CPUID_LIBRARY)
    add_subdirectory (libcpuid)
 endif ()

+if (USE_INTERNAL_H3_LIBRARY)
+    add_subdirectory(h3-cmake)
+endif ()
+
 if (USE_INTERNAL_SSL_LIBRARY)
    if (NOT MAKE_STATIC_LIBRARIES)
        set (BUILD_SHARED 1)
@ -317,3 +321,7 @@ endif()
 if (USE_SIMDJSON)
    add_subdirectory (simdjson-cmake)
 endif()
+
+if (USE_MIMALLOC)
+    add_subdirectory (mimalloc)
+endif()
--- a/contrib/h3
+++ b/contrib/h3
@ -0,0 +1 @@
+Subproject commit 6cfd649e8c0d3ed913e8aae928a669fc3b8a2365
--- a/contrib/h3-cmake/CMakeLists.txt
+++ b/contrib/h3-cmake/CMakeLists.txt
@ -0,0 +1,27 @@
+set(H3_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/h3/src/h3lib)
+set(H3_BINARY_DIR ${ClickHouse_BINARY_DIR}/contrib/h3/src/h3lib)
+
+set(SRCS
+${H3_SOURCE_DIR}/lib/algos.c
+${H3_SOURCE_DIR}/lib/baseCells.c
+${H3_SOURCE_DIR}/lib/bbox.c
+${H3_SOURCE_DIR}/lib/coordijk.c
+${H3_SOURCE_DIR}/lib/faceijk.c
+${H3_SOURCE_DIR}/lib/geoCoord.c
+${H3_SOURCE_DIR}/lib/h3Index.c
+${H3_SOURCE_DIR}/lib/h3UniEdge.c
+${H3_SOURCE_DIR}/lib/linkedGeo.c
+${H3_SOURCE_DIR}/lib/localij.c
+${H3_SOURCE_DIR}/lib/mathExtensions.c
+${H3_SOURCE_DIR}/lib/polygon.c
+${H3_SOURCE_DIR}/lib/vec2d.c
+${H3_SOURCE_DIR}/lib/vec3d.c
+${H3_SOURCE_DIR}/lib/vertexGraph.c
+)
+
+configure_file(${H3_SOURCE_DIR}/include/h3api.h.in ${H3_BINARY_DIR}/include/h3api.h)
+
+add_library(h3 ${SRCS})
+target_include_directories(h3 SYSTEM PUBLIC ${H3_SOURCE_DIR}/include)
+target_include_directories(h3 SYSTEM PUBLIC ${H3_BINARY_DIR}/include)
+target_compile_definitions(h3 PRIVATE H3_HAVE_VLA)
--- a/contrib/hyperscan
+++ b/contrib/hyperscan
@ -1 +1 @@
-Subproject commit 05dab0efee80be405aad5f74721b692b6889b75e
+Subproject commit 01e6b83f9fbdb4020cd68a5287bf3a0471eeb272
--- a/contrib/lfalloc/src/lf_allocX64.h
+++ b/contrib/lfalloc/src/lf_allocX64.h
--- a/contrib/lfalloc/src/lfmalloc.h
+++ b/contrib/lfalloc/src/lfmalloc.h
@ -1,23 +0,0 @@
-#pragma once
-
-#include <string.h>
-#include <stdlib.h>
-#include "util/system/compiler.h"
-
-namespace NMalloc {
-    volatile inline bool IsAllocatorCorrupted = false;
-
-    static inline void AbortFromCorruptedAllocator() {
-        IsAllocatorCorrupted = true;
-        abort();
-    }
-
-    struct TAllocHeader {
-        void* Block;
-        size_t AllocSize;
-        void Y_FORCE_INLINE Encode(void* block, size_t size, size_t signature) {
-            Block = block;
-            AllocSize = size | signature;
-        }
-    };
-}
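`TAllocHeader::Encode` stores the allocation size and a signature in one word with `size | signature`. This only round-trips if the two values never share bits. The diff shows only the OR, so the following sketch assumes a layout in which the top 4 bits of the word hold the signature. `kSignatureShift`, `kSignatureMask` and the decode accessors are hypothetical and are not lfalloc's actual values.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative layout only: the top 4 bits of alloc_size carry the signature,
// the rest carry the size. The shift, mask and signature values are assumptions.
constexpr std::size_t kSignatureShift = sizeof(std::size_t) * 8 - 4;
constexpr std::size_t kSignatureMask = std::size_t(0xF) << kSignatureShift;

struct AllocHeader
{
    void * block = nullptr;
    std::size_t alloc_size = 0;

    void encode(void * b, std::size_t size, std::size_t signature)
    {
        // The OR is reversible only if the two fields never overlap.
        assert((size & kSignatureMask) == 0);
        assert((signature & ~kSignatureMask) == 0);
        block = b;
        alloc_size = size | signature;
    }

    std::size_t size() const { return alloc_size & ~kSignatureMask; }
    std::size_t signature() const { return alloc_size & kSignatureMask; }
};
```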
--- a/contrib/lfalloc/src/util/README.md
+++ b/contrib/lfalloc/src/util/README.md
@ -1,33 +0,0 @@
-Style guide for the util folder is a stricter version of general style guide (mostly in terms of ambiguity resolution).
-
- * all {} must be in K&R style
- * &, * tied closer to a type, not to variable
- * always use `using` not `typedef`
- * even a single line block must be in braces {}:
-   ```
-   if (A) {
-       B();
-   }
-   ```
- * _ at the end of private data member of a class - `First_`, `Second_`
- * every .h file must be accompanied with corresponding .cpp to avoid a leakage and check that it is self contained
- * prohibited to use `printf`-like functions
-
-
-Things declared in the general style guide, which sometimes are missed:
-
- * `template <`, not `template<`
- * `noexcept`, not `throw ()` nor `throw()`, not required for destructors
- * indents inside `namespace` same as inside `class`
-
-
-Requirements for a new code (and for corrections in an old code which involves change of behaviour) in util:
-
- * presence of UNIT-tests
- * presence of comments in Doxygen style
- * accessors without Get prefix (`Length()`, but not `GetLength()`)
-
-This guide is not a mandatory as there is the general style guide.
-Nevertheless if it is not followed, then a next `ya style .` run in the util folder will undeservedly update authors of some lines of code.
-
-Thus before a commit it is recommended to run `ya style .` in the util folder.
--- a/contrib/lfalloc/src/util/system/atomic.h
+++ b/contrib/lfalloc/src/util/system/atomic.h
@ -1,51 +0,0 @@
-#pragma once
-
-#include "defaults.h"
-
-using TAtomicBase = intptr_t;
-using TAtomic = volatile TAtomicBase;
-
-#if defined(__GNUC__)
-#include "atomic_gcc.h"
-#elif defined(_MSC_VER)
-#include "atomic_win.h"
-#else
-#error unsupported platform
-#endif
-
-#if !defined(ATOMIC_COMPILER_BARRIER)
-#define ATOMIC_COMPILER_BARRIER()
-#endif
-
-static inline TAtomicBase AtomicSub(TAtomic& a, TAtomicBase v) {
-    return AtomicAdd(a, -v);
-}
-
-static inline TAtomicBase AtomicGetAndSub(TAtomic& a, TAtomicBase v) {
-    return AtomicGetAndAdd(a, -v);
-}
-
-#if defined(USE_GENERIC_SETGET)
-static inline TAtomicBase AtomicGet(const TAtomic& a) {
-    return a;
-}
-
-static inline void AtomicSet(TAtomic& a, TAtomicBase v) {
-    a = v;
-}
-#endif
-
-static inline bool AtomicTryLock(TAtomic* a) {
-    return AtomicCas(a, 1, 0);
-}
-
-static inline bool AtomicTryAndTryLock(TAtomic* a) {
-    return (AtomicGet(*a) == 0) && AtomicTryLock(a);
-}
-
-static inline void AtomicUnlock(TAtomic* a) {
-    ATOMIC_COMPILER_BARRIER();
-    AtomicSet(*a, 0);
-}
-
-#include "atomic_ops.h"
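`AtomicTryAndTryLock` implements a test-and-test-and-set lock. It reads the word first and only attempts the CAS (`AtomicTryLock`) when the word is 0. `AtomicUnlock` then issues a compiler barrier and a release store of 0. The following is a portable sketch of the same pattern using `std::atomic`. The class name and the memory orderings (relaxed pre-check, acquire CAS, release store) are this sketch's own choices. lfalloc uses an acquire load and a sequentially consistent CAS.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Test-and-test-and-set lock shaped like AtomicTryAndTryLock / AtomicUnlock.
class SpinLock
{
public:
    bool tryLock()
    {
        // Plain load first: if the lock is visibly taken, skip the CAS so
        // spinning threads do not keep pulling the cache line in exclusive mode.
        if (state.load(std::memory_order_relaxed) != 0)
            return false;
        std::intptr_t expected = 0;
        return state.compare_exchange_strong(expected, 1, std::memory_order_acquire, std::memory_order_relaxed);
    }

    void lock()
    {
        while (!tryLock())
            std::this_thread::yield();
    }

    void unlock()
    {
        assert(state.load(std::memory_order_relaxed) == 1 && "unlock() without lock()");
        // Release store: writes made while holding the lock become visible
        // to the next thread whose CAS acquires it.
        state.store(0, std::memory_order_release);
    }

private:
    std::atomic<std::intptr_t> state{0};
};
```

The cheap pre-check is the point of the "test-and-test" part. Under contention, waiters spin on a shared read instead of issuing failing read-modify-write operations.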
--- a/contrib/lfalloc/src/util/system/atomic_gcc.h
+++ b/contrib/lfalloc/src/util/system/atomic_gcc.h
@ -1,90 +0,0 @@
-#pragma once
-
-#define ATOMIC_COMPILER_BARRIER() __asm__ __volatile__("" \
-                                                       :  \
-                                                       :  \
-                                                       : "memory")
-
-static inline TAtomicBase AtomicGet(const TAtomic& a) {
-    TAtomicBase tmp;
-#if defined(_arm64_)
-    __asm__ __volatile__(
-        "ldar %x[value], %[ptr]  \n\t"
-        : [value] "=r"(tmp)
-        : [ptr] "Q"(a)
-        : "memory");
-#else
-    __atomic_load(&a, &tmp, __ATOMIC_ACQUIRE);
-#endif
-    return tmp;
-}
-
-static inline void AtomicSet(TAtomic& a, TAtomicBase v) {
-#if defined(_arm64_)
-    __asm__ __volatile__(
-        "stlr %x[value], %[ptr]  \n\t"
-        : [ptr] "=Q"(a)
-        : [value] "r"(v)
-        : "memory");
-#else
-    __atomic_store(&a, &v, __ATOMIC_RELEASE);
-#endif
-}
-
-static inline intptr_t AtomicIncrement(TAtomic& p) {
-    return __atomic_add_fetch(&p, 1, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicGetAndIncrement(TAtomic& p) {
-    return __atomic_fetch_add(&p, 1, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicDecrement(TAtomic& p) {
-    return __atomic_sub_fetch(&p, 1, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicGetAndDecrement(TAtomic& p) {
-    return __atomic_fetch_sub(&p, 1, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicAdd(TAtomic& p, intptr_t v) {
-    return __atomic_add_fetch(&p, v, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicGetAndAdd(TAtomic& p, intptr_t v) {
-    return __atomic_fetch_add(&p, v, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicSwap(TAtomic* p, intptr_t v) {
-    (void)p; // disable strange 'parameter set but not used' warning on gcc
-    intptr_t ret;
-    __atomic_exchange(p, &v, &ret, __ATOMIC_SEQ_CST);
-    return ret;
-}
-
-static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    (void)a; // disable strange 'parameter set but not used' warning on gcc
-    return __atomic_compare_exchange(a, &compare, &exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    (void)a; // disable strange 'parameter set but not used' warning on gcc
-    __atomic_compare_exchange(a, &compare, &exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
-    return compare;
-}
-
-static inline intptr_t AtomicOr(TAtomic& a, intptr_t b) {
-    return __atomic_or_fetch(&a, b, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicXor(TAtomic& a, intptr_t b) {
-    return __atomic_xor_fetch(&a, b, __ATOMIC_SEQ_CST);
-}
-
-static inline intptr_t AtomicAnd(TAtomic& a, intptr_t b) {
-    return __atomic_and_fetch(&a, b, __ATOMIC_SEQ_CST);
-}
-
-static inline void AtomicBarrier() {
-    __sync_synchronize();
-}
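`AtomicGetAndCas` returns `compare` after calling `__atomic_compare_exchange`. This is correct because the builtin overwrites the expected value with the current contents when the exchange fails. When the exchange succeeds, the expected value already equalled the old contents. `std::atomic::compare_exchange_strong` has the same contract, so the following sketch restates the function portably. `getAndCas` and the `fetchMax` usage are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Same contract as AtomicGetAndCas: try compare -> exchange and return the value
// stored before the attempt. On failure compare_exchange_strong writes the current
// value into `compare`; on success `compare` already equals the old value.
inline std::intptr_t getAndCas(std::atomic<std::intptr_t> & a, std::intptr_t exchange, std::intptr_t compare)
{
    a.compare_exchange_strong(compare, exchange);
    return compare;
}

// Hypothetical use: raise `a` to at least `v` and return the previous value.
// A failed attempt already reports the fresh value, so no extra load is needed.
inline std::intptr_t fetchMax(std::atomic<std::intptr_t> & a, std::intptr_t v)
{
    std::intptr_t seen = a.load();
    while (seen < v)
    {
        std::intptr_t prev = getAndCas(a, v, seen);
        if (prev == seen)
            break; // exchange succeeded
        seen = prev;
    }
    return seen;
}
```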
--- a/contrib/lfalloc/src/util/system/atomic_ops.h
+++ b/contrib/lfalloc/src/util/system/atomic_ops.h
@ -1,189 +0,0 @@
-#pragma once
-
-#include <type_traits>
-
-template <typename T>
-inline TAtomic* AsAtomicPtr(T volatile* target) {
-    return reinterpret_cast<TAtomic*>(target);
-}
-
-template <typename T>
-inline const TAtomic* AsAtomicPtr(T const volatile* target) {
-    return reinterpret_cast<const TAtomic*>(target);
-}
-
-// integral types
-
-template <typename T>
-struct TAtomicTraits {
-    enum {
-        Castable = std::is_integral<T>::value && sizeof(T) == sizeof(TAtomicBase) && !std::is_const<T>::value,
-    };
-};
-
-template <typename T, typename TT>
-using TEnableIfCastable = std::enable_if_t<TAtomicTraits<T>::Castable, TT>;
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicGet(T const volatile& target) {
-    return static_cast<T>(AtomicGet(*AsAtomicPtr(&target)));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, void> AtomicSet(T volatile& target, TAtomicBase value) {
-    AtomicSet(*AsAtomicPtr(&target), value);
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicIncrement(T volatile& target) {
-    return static_cast<T>(AtomicIncrement(*AsAtomicPtr(&target)));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicGetAndIncrement(T volatile& target) {
-    return static_cast<T>(AtomicGetAndIncrement(*AsAtomicPtr(&target)));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicDecrement(T volatile& target) {
-    return static_cast<T>(AtomicDecrement(*AsAtomicPtr(&target)));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicGetAndDecrement(T volatile& target) {
-    return static_cast<T>(AtomicGetAndDecrement(*AsAtomicPtr(&target)));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicAdd(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicAdd(*AsAtomicPtr(&target), value));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicGetAndAdd(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicGetAndAdd(*AsAtomicPtr(&target), value));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicSub(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicSub(*AsAtomicPtr(&target), value));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicGetAndSub(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicGetAndSub(*AsAtomicPtr(&target), value));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicSwap(T volatile* target, TAtomicBase exchange) {
-    return static_cast<T>(AtomicSwap(AsAtomicPtr(target), exchange));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, bool> AtomicCas(T volatile* target, TAtomicBase exchange, TAtomicBase compare) {
-    return AtomicCas(AsAtomicPtr(target), exchange, compare);
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicGetAndCas(T volatile* target, TAtomicBase exchange, TAtomicBase compare) {
-    return static_cast<T>(AtomicGetAndCas(AsAtomicPtr(target), exchange, compare));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, bool> AtomicTryLock(T volatile* target) {
-    return AtomicTryLock(AsAtomicPtr(target));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, bool> AtomicTryAndTryLock(T volatile* target) {
-    return AtomicTryAndTryLock(AsAtomicPtr(target));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, void> AtomicUnlock(T volatile* target) {
-    AtomicUnlock(AsAtomicPtr(target));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicOr(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicOr(*AsAtomicPtr(&target), value));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicAnd(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicAnd(*AsAtomicPtr(&target), value));
-}
-
-template <typename T>
-inline TEnableIfCastable<T, T> AtomicXor(T volatile& target, TAtomicBase value) {
-    return static_cast<T>(AtomicXor(*AsAtomicPtr(&target), value));
-}
-
-// pointer types
-
-template <typename T>
-inline T* AtomicGet(T* const volatile& target) {
-    return reinterpret_cast<T*>(AtomicGet(*AsAtomicPtr(&target)));
-}
-
-template <typename T>
-inline void AtomicSet(T* volatile& target, T* value) {
-    AtomicSet(*AsAtomicPtr(&target), reinterpret_cast<TAtomicBase>(value));
-}
-
-using TNullPtr = decltype(nullptr);
-
-template <typename T>
-inline void AtomicSet(T* volatile& target, TNullPtr) {
-    AtomicSet(*AsAtomicPtr(&target), 0);
-}
-
-template <typename T>
-inline T* AtomicSwap(T* volatile* target, T* exchange) {
-    return reinterpret_cast<T*>(AtomicSwap(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange)));
-}
-
-template <typename T>
-inline T* AtomicSwap(T* volatile* target, TNullPtr) {
-    return reinterpret_cast<T*>(AtomicSwap(AsAtomicPtr(target), 0));
-}
-
-template <typename T>
-inline bool AtomicCas(T* volatile* target, T* exchange, T* compare) {
-    return AtomicCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), reinterpret_cast<TAtomicBase>(compare));
-}
-
-template <typename T>
-inline T* AtomicGetAndCas(T* volatile* target, T* exchange, T* compare) {
-    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), reinterpret_cast<TAtomicBase>(compare)));
-}
-
-template <typename T>
-inline bool AtomicCas(T* volatile* target, T* exchange, TNullPtr) {
-    return AtomicCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), 0);
-}
-
-template <typename T>
-inline T* AtomicGetAndCas(T* volatile* target, T* exchange, TNullPtr) {
-    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), reinterpret_cast<TAtomicBase>(exchange), 0));
-}
-
-template <typename T>
-inline bool AtomicCas(T* volatile* target, TNullPtr, T* compare) {
-    return AtomicCas(AsAtomicPtr(target), 0, reinterpret_cast<TAtomicBase>(compare));
-}
-
-template <typename T>
-inline T* AtomicGetAndCas(T* volatile* target, TNullPtr, T* compare) {
-    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), 0, reinterpret_cast<TAtomicBase>(compare)));
-}
-
-template <typename T>
-inline bool AtomicCas(T* volatile* target, TNullPtr, TNullPtr) {
-    return AtomicCas(AsAtomicPtr(target), 0, 0);
-}
-
-template <typename T>
-inline T* AtomicGetAndCas(T* volatile* target, TNullPtr, TNullPtr) {
-    return reinterpret_cast<T*>(AtomicGetAndCas(AsAtomicPtr(target), 0, 0));
-}
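Every typed overload in `atomic_ops.h` goes through `TEnableIfCastable`. This ensures that the `reinterpret_cast` in `AsAtomicPtr` only ever sees non-const integers of exactly the width of `TAtomicBase` (`intptr_t`). For any other type, the overload is removed from overload resolution instead of silently reinterpreting memory of the wrong size. The following sketch restates the rule with C++17 traits. `is_castable_v`, `atomicGet` and `HasAtomicGet` are hypothetical names. `__atomic_load_n` is a GCC/Clang builtin related to the `__atomic_load` used in `atomic_gcc.h`.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

using AtomicBase = std::intptr_t;

// Same rule as TAtomicTraits<T>::Castable.
template <typename T>
inline constexpr bool is_castable_v =
    std::is_integral_v<T> && sizeof(T) == sizeof(AtomicBase) && !std::is_const_v<T>;

template <typename T, typename R>
using EnableIfCastable = std::enable_if_t<is_castable_v<T>, R>;

// Hypothetical wrapper gated the same way as the AtomicGet overload above.
template <typename T>
EnableIfCastable<T, T> atomicGet(T const volatile & target)
{
    return __atomic_load_n(&target, __ATOMIC_ACQUIRE);
}

// True only if atomicGet is callable for T; for non-castable T the enable_if
// removes the overload, the call is ill-formed, and the primary template is used.
template <typename T, typename = void>
struct HasAtomicGet : std::false_type {};

template <typename T>
struct HasAtomicGet<T, std::void_t<decltype(atomicGet(std::declval<T const volatile &>()))>> : std::true_type {};
```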
--- a/contrib/lfalloc/src/util/system/atomic_win.h
+++ b/contrib/lfalloc/src/util/system/atomic_win.h
@ -1,114 +0,0 @@
-#pragma once
-
-#include <intrin.h>
-
-#define USE_GENERIC_SETGET
-
-#if defined(_i386_)
-
-#pragma intrinsic(_InterlockedIncrement)
-#pragma intrinsic(_InterlockedDecrement)
-#pragma intrinsic(_InterlockedExchangeAdd)
-#pragma intrinsic(_InterlockedExchange)
-#pragma intrinsic(_InterlockedCompareExchange)
-
-static inline intptr_t AtomicIncrement(TAtomic& a) {
-    return _InterlockedIncrement((volatile long*)&a);
-}
-
-static inline intptr_t AtomicGetAndIncrement(TAtomic& a) {
-    return _InterlockedIncrement((volatile long*)&a) - 1;
-}
-
-static inline intptr_t AtomicDecrement(TAtomic& a) {
-    return _InterlockedDecrement((volatile long*)&a);
-}
-
-static inline intptr_t AtomicGetAndDecrement(TAtomic& a) {
-    return _InterlockedDecrement((volatile long*)&a) + 1;
-}
-
-static inline intptr_t AtomicAdd(TAtomic& a, intptr_t b) {
-    return _InterlockedExchangeAdd((volatile long*)&a, b) + b;
-}
-
-static inline intptr_t AtomicGetAndAdd(TAtomic& a, intptr_t b) {
-    return _InterlockedExchangeAdd((volatile long*)&a, b);
-}
-
-static inline intptr_t AtomicSwap(TAtomic* a, intptr_t b) {
-    return _InterlockedExchange((volatile long*)a, b);
-}
-
-static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    return _InterlockedCompareExchange((volatile long*)a, exchange, compare) == compare;
-}
-
-static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    return _InterlockedCompareExchange((volatile long*)a, exchange, compare);
-}
-
-#else // _x86_64_
-
-#pragma intrinsic(_InterlockedIncrement64)
-#pragma intrinsic(_InterlockedDecrement64)
-#pragma intrinsic(_InterlockedExchangeAdd64)
-#pragma intrinsic(_InterlockedExchange64)
-#pragma intrinsic(_InterlockedCompareExchange64)
-
-static inline intptr_t AtomicIncrement(TAtomic& a) {
-    return _InterlockedIncrement64((volatile __int64*)&a);
-}
-
-static inline intptr_t AtomicGetAndIncrement(TAtomic& a) {
-    return _InterlockedIncrement64((volatile __int64*)&a) - 1;
-}
-
-static inline intptr_t AtomicDecrement(TAtomic& a) {
-    return _InterlockedDecrement64((volatile __int64*)&a);
-}
-
-static inline intptr_t AtomicGetAndDecrement(TAtomic& a) {
-    return _InterlockedDecrement64((volatile __int64*)&a) + 1;
-}
-
-static inline intptr_t AtomicAdd(TAtomic& a, intptr_t b) {
-    return _InterlockedExchangeAdd64((volatile __int64*)&a, b) + b;
-}
-
-static inline intptr_t AtomicGetAndAdd(TAtomic& a, intptr_t b) {
-    return _InterlockedExchangeAdd64((volatile __int64*)&a, b);
-}
-
-static inline intptr_t AtomicSwap(TAtomic* a, intptr_t b) {
-    return _InterlockedExchange64((volatile __int64*)a, b);
-}
-
-static inline bool AtomicCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    return _InterlockedCompareExchange64((volatile __int64*)a, exchange, compare) == compare;
-}
-
-static inline intptr_t AtomicGetAndCas(TAtomic* a, intptr_t exchange, intptr_t compare) {
-    return _InterlockedCompareExchange64((volatile __int64*)a, exchange, compare);
-}
-
-static inline intptr_t AtomicOr(TAtomic& a, intptr_t b) {
-    return _InterlockedOr64(&a, b) | b;
-}
-
-static inline intptr_t AtomicAnd(TAtomic& a, intptr_t b) {
-    return _InterlockedAnd64(&a, b) & b;
-}
-
-static inline intptr_t AtomicXor(TAtomic& a, intptr_t b) {
-    return _InterlockedXor64(&a, b) ^ b;
-}
-
-#endif // _x86_
-
-//TODO
-static inline void AtomicBarrier() {
-    TAtomic val = 0;
-
-    AtomicSwap(&val, 0);
-}
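On x86_64, `atomic_win.h` builds lfalloc's "return the new value" operations from MSVC intrinsics that return the old value. `AtomicAdd` is `_InterlockedExchangeAdd64(...) + b` and `AtomicOr` is `_InterlockedOr64(...) | b`, with `&` and `^` handled the same way. `AtomicGetAndIncrement` goes in the opposite direction, using `_InterlockedIncrement64(...) - 1`. Each identity holds because re-applying the operand to the old value reproduces the value now stored. The following portable sketch uses `std::atomic`, whose `fetch_*` members also return the old value. The function names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

using A = std::atomic<std::intptr_t>;

// fetch_* returns the value before the update; applying the operand once more
// to that old value yields the value after the update (old op b == new).
inline std::intptr_t addFetch(A & a, std::intptr_t b) { return a.fetch_add(b) + b; }
inline std::intptr_t orFetch(A & a, std::intptr_t b) { return a.fetch_or(b) | b; }
inline std::intptr_t andFetch(A & a, std::intptr_t b) { return a.fetch_and(b) & b; }
inline std::intptr_t xorFetch(A & a, std::intptr_t b) { return a.fetch_xor(b) ^ b; }

// The reverse direction, as in AtomicGetAndIncrement: an increment that
// returns the new value gives the old one by subtracting 1.
inline std::intptr_t getAndIncrement(A & a) { return addFetch(a, 1) - 1; }
```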
--- a/contrib/lfalloc/src/util/system/compiler.h
+++ b/contrib/lfalloc/src/util/system/compiler.h
@ -1,617 +0,0 @@
-#pragma once
-
-// useful cross-platfrom definitions for compilers
-
-/**
- * @def Y_FUNC_SIGNATURE
- *
- * Use this macro to get pretty function name (see example).
- *
- * @code
- * void Hi() {
- *     Cout << Y_FUNC_SIGNATURE << Endl;
- * }
-
- * template <typename T>
- * void Do() {
- *     Cout << Y_FUNC_SIGNATURE << Endl;
- * }
-
- * int main() {
- *    Hi();         // void Hi()
- *    Do<int>();    // void Do() [T = int]
- *    Do<TString>(); // void Do() [T = TString]
- * }
- * @endcode
- */
-#if defined(__GNUC__)
-#define Y_FUNC_SIGNATURE __PRETTY_FUNCTION__
-#elif defined(_MSC_VER)
-#define Y_FUNC_SIGNATURE __FUNCSIG__
-#else
-#define Y_FUNC_SIGNATURE ""
-#endif
-
-#ifdef __GNUC__
-#define Y_PRINTF_FORMAT(n, m) __attribute__((__format__(__printf__, n, m)))
-#endif
-
-#ifndef Y_PRINTF_FORMAT
-#define Y_PRINTF_FORMAT(n, m)
-#endif
-
-#if defined(__clang__)
-#define Y_NO_SANITIZE(...) __attribute__((no_sanitize(__VA_ARGS__)))
-#endif
-
-#if !defined(Y_NO_SANITIZE)
-#define Y_NO_SANITIZE(...)
-#endif
-
-/**
- * @def Y_DECLARE_UNUSED
- *
- * Macro is needed to silence compiler warning about unused entities (e.g. function or argument).
- *
- * @code
- * Y_DECLARE_UNUSED int FunctionUsedSolelyForDebugPurposes();
- * assert(FunctionUsedSolelyForDebugPurposes() == 42);
- *
- * void Foo(const int argumentUsedOnlyForDebugPurposes Y_DECLARE_UNUSED) {
- *     assert(argumentUsedOnlyForDebugPurposes == 42);
- *     // however you may as well omit `Y_DECLARE_UNUSED` and use `UNUSED` macro instead
- *     Y_UNUSED(argumentUsedOnlyForDebugPurposes);
- * }
- * @endcode
- */
-#ifdef __GNUC__
-#define Y_DECLARE_UNUSED __attribute__((unused))
-#endif
-
-#ifndef Y_DECLARE_UNUSED
-#define Y_DECLARE_UNUSED
-#endif
-
-#if defined(__GNUC__)
-#define Y_LIKELY(Cond) __builtin_expect(!!(Cond), 1)
-#define Y_UNLIKELY(Cond) __builtin_expect(!!(Cond), 0)
-#define Y_PREFETCH_READ(Pointer, Priority) __builtin_prefetch((const void*)(Pointer), 0, Priority)
-#define Y_PREFETCH_WRITE(Pointer, Priority) __builtin_prefetch((const void*)(Pointer), 1, Priority)
-#endif
-
-/**
- * @def Y_FORCE_INLINE
- *
- * Macro to use in place of 'inline' in function declaration/definition to force
- * it to be inlined.
- */
-#if !defined(Y_FORCE_INLINE)
-#if defined(CLANG_COVERAGE)
-#/* excessive __always_inline__ might significantly slow down compilation of an instrumented unit */
-#define Y_FORCE_INLINE inline
-#elif defined(_MSC_VER)
-#define Y_FORCE_INLINE __forceinline
-#elif defined(__GNUC__)
-#/* Clang also defines __GNUC__ (as 4) */
-#define Y_FORCE_INLINE inline __attribute__((__always_inline__))
-#else
-#define Y_FORCE_INLINE inline
-#endif
-#endif
-
-/**
- * @def Y_NO_INLINE
- *
- * Macro to use in place of 'inline' in function declaration/definition to
- * prevent it from being inlined.
- */
-#if !defined(Y_NO_INLINE)
-#if defined(_MSC_VER)
-#define Y_NO_INLINE __declspec(noinline)
-#elif defined(__GNUC__) || defined(__INTEL_COMPILER)
-#/* Clang also defines __GNUC__ (as 4) */
-#define Y_NO_INLINE __attribute__((__noinline__))
-#else
-#define Y_NO_INLINE
-#endif
-#endif
-
-//to cheat compiler about strict aliasing or similar problems
-#if defined(__GNUC__)
-#define Y_FAKE_READ(X)                  \
-    do {                                \
-        __asm__ __volatile__(""         \
-                             :          \
-                             : "m"(X)); \
-    } while (0)
-
-#define Y_FAKE_WRITE(X)                  \
-    do {                                 \
-        __asm__ __volatile__(""          \
-                             : "=m"(X)); \
-    } while (0)
-#endif
-
-#if !defined(Y_FAKE_READ)
-#define Y_FAKE_READ(X)
-#endif
-
-#if !defined(Y_FAKE_WRITE)
-#define Y_FAKE_WRITE(X)
-#endif
-
-#ifndef Y_PREFETCH_READ
-#define Y_PREFETCH_READ(Pointer, Priority) (void)(const void*)(Pointer), (void)Priority
-#endif
-
-#ifndef Y_PREFETCH_WRITE
-#define Y_PREFETCH_WRITE(Pointer, Priority) (void)(const void*)(Pointer), (void)Priority
-#endif
-
-#ifndef Y_LIKELY
-#define Y_LIKELY(Cond) (Cond)
-#define Y_UNLIKELY(Cond) (Cond)
-#endif
-
-#ifdef __GNUC__
-#define _packed __attribute__((packed))
-#else
-#define _packed
-#endif
-
-#if defined(__GNUC__)
-#define Y_WARN_UNUSED_RESULT __attribute__((warn_unused_result))
-#endif
-
-#ifndef Y_WARN_UNUSED_RESULT
-#define Y_WARN_UNUSED_RESULT
-#endif
-
-#if defined(__GNUC__)
-#define Y_HIDDEN __attribute__((visibility("hidden")))
-#endif
-
-#if !defined(Y_HIDDEN)
-#define Y_HIDDEN
-#endif
-
-#if defined(__GNUC__)
-#define Y_PUBLIC __attribute__((visibility("default")))
-#endif
-
-#if !defined(Y_PUBLIC)
-#define Y_PUBLIC
-#endif
-
-#if !defined(Y_UNUSED) && !defined(__cplusplus)
-#define Y_UNUSED(var) (void)(var)
-#endif
-#if !defined(Y_UNUSED) && defined(__cplusplus)
-template <class... Types>
-constexpr Y_FORCE_INLINE int Y_UNUSED(Types&&...) {
-    return 0;
-};
-#endif
-
-/**
- * @def Y_ASSUME
- *
- * Macro that tells the compiler that it can generate optimized code
- * as if the given expression will always evaluate true.
- * The behavior is undefined if it ever evaluates false.
- *
- * @code
- * // factored into a function so that it's testable
- * inline int Avg(int x, int y) {
- *     if (x >= 0 && y >= 0) {
- *         return (static_cast<unsigned>(x) + static_cast<unsigned>(y)) >> 1;
- *     } else {
- *         // a slower implementation
- *     }
- * }
- *
- * // we know that xs and ys are non-negative from domain knowledge,
- * // but we can't change the types of xs and ys because of API constraints
- * TVector<int> Foo(const TVector<int>& xs, const TVector<int>& ys) {
- *     TVector<int> avgs(xs.size());
- *     for (size_t i = 0; i < xs.size(); ++i) {
- *         auto x = xs[i];
- *         auto y = ys[i];
- *         Y_ASSUME(x >= 0);
- *         Y_ASSUME(y >= 0);
- *         avgs[i] = Avg(x, y);
- *     }
- *     return avgs;
- * }
- * @endcode
- */
-#if defined(__GNUC__)
-#define Y_ASSUME(condition) ((condition) ? (void)0 : __builtin_unreachable())
-#elif defined(_MSC_VER)
-#define Y_ASSUME(condition) __assume(condition)
-#else
-#define Y_ASSUME(condition) Y_UNUSED(condition)
-#endif
-
-#ifdef __cplusplus
-[[noreturn]]
-#endif
-Y_HIDDEN void _YandexAbort();
-
-/**
- * @def Y_UNREACHABLE
- *
- * Macro that marks the rest of the code branch unreachable.
- * The behavior is undefined if it's ever reached.
- *
- * @code
- * switch (static_cast<unsigned>(i) % 3) {
- * case 0:
- *     return foo;
- * case 1:
- *     return bar;
- * case 2:
- *     return baz;
- * default:
- *     Y_UNREACHABLE();
- * }
- * @endcode
- */
-#if defined(__GNUC__) || defined(_MSC_VER)
-#define Y_UNREACHABLE() Y_ASSUME(0)
-#else
-#define Y_UNREACHABLE() _YandexAbort()
-#endif
-
-#if defined(undefined_sanitizer_enabled)
-#define _ubsan_enabled_
-#endif
-
-#ifdef __clang__
-
-#if __has_feature(thread_sanitizer)
-#define _tsan_enabled_
-#endif
-#if __has_feature(memory_sanitizer)
-#define _msan_enabled_
-#endif
-#if __has_feature(address_sanitizer)
-#define _asan_enabled_
-#endif
-
-#else
-
-#if defined(thread_sanitizer_enabled) || defined(__SANITIZE_THREAD__)
-#define _tsan_enabled_
-#endif
-#if defined(memory_sanitizer_enabled)
-#define _msan_enabled_
-#endif
-#if defined(address_sanitizer_enabled) || defined(__SANITIZE_ADDRESS__)
-#define _asan_enabled_
-#endif
-
-#endif
-
-#if defined(_asan_enabled_) || defined(_msan_enabled_) || defined(_tsan_enabled_) || defined(_ubsan_enabled_)
-#define _san_enabled_
-#endif
-
-#if defined(_MSC_VER)
-#define __PRETTY_FUNCTION__ __FUNCSIG__
-#endif
-
-#if defined(__GNUC__)
-#define Y_WEAK __attribute__((weak))
-#else
-#define Y_WEAK
-#endif
-
-#if defined(__CUDACC_VER_MAJOR__)
-#define Y_CUDA_AT_LEAST(x, y) (__CUDACC_VER_MAJOR__ > x || (__CUDACC_VER_MAJOR__ == x && __CUDACC_VER_MINOR__ >= y))
-#else
-#define Y_CUDA_AT_LEAST(x, y) 0
-#endif
-
-// The NVIDIA CUDA C++ compiler did not support the noexcept keyword until version 9.0
-#if !Y_CUDA_AT_LEAST(9, 0)
-#if defined(__CUDACC__) && !defined(noexcept)
-#define noexcept throw ()
-#endif
-#endif
-
-#if defined(__GNUC__)
-#define Y_COLD __attribute__((cold))
-#define Y_LEAF __attribute__((leaf))
-#define Y_WRAPPER __attribute__((artificial))
-#else
-#define Y_COLD
-#define Y_LEAF
-#define Y_WRAPPER
-#endif
-
-/**
- * @def Y_PRAGMA
- *
- * Macro for use in other macros to emit a compiler pragma.
- * See below for more usage examples.
- *
- * @code
- * #if defined(__clang__) || defined(__GNUC__)
- * #define Y_PRAGMA_NO_WSHADOW \
- *     Y_PRAGMA("GCC diagnostic ignored \"-Wshadow\"")
- * #elif defined(_MSC_VER)
- * #define Y_PRAGMA_NO_WSHADOW \
- *     Y_PRAGMA(warning(disable : 4456 4457))
- * #else
- * #define Y_PRAGMA_NO_WSHADOW
- * #endif
- * @endcode
- */
-#if defined(__clang__) || defined(__GNUC__)
-#define Y_PRAGMA(x) _Pragma(x)
-#elif defined(_MSC_VER)
-#define Y_PRAGMA(x) __pragma(x)
-#else
-#define Y_PRAGMA(x)
-#endif
-
-/**
- * @def Y_PRAGMA_DIAGNOSTIC_PUSH
- *
- * Cross-compiler pragma to save diagnostic settings
- *
- * @see
- *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html
- *     MSVC: https://msdn.microsoft.com/en-us/library/2c8f766e.aspx
- *     Clang: https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas
- *
- * @code
- * Y_PRAGMA_DIAGNOSTIC_PUSH
- * @endcode
- */
-#if defined(__clang__) || defined(__GNUC__)
-#define Y_PRAGMA_DIAGNOSTIC_PUSH \
-    Y_PRAGMA("GCC diagnostic push")
-#elif defined(_MSC_VER)
-#define Y_PRAGMA_DIAGNOSTIC_PUSH \
-    Y_PRAGMA(warning(push))
-#else
-#define Y_PRAGMA_DIAGNOSTIC_PUSH
-#endif
-
-/**
- * @def Y_PRAGMA_DIAGNOSTIC_POP
- *
- * Cross-compiler pragma to restore diagnostic settings
- *
- * @see
- *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html
- *     MSVC: https://msdn.microsoft.com/en-us/library/2c8f766e.aspx
- *     Clang: https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas
- *
- * @code
- * Y_PRAGMA_DIAGNOSTIC_POP
- * @endcode
- */
-#if defined(__clang__) || defined(__GNUC__)
-#define Y_PRAGMA_DIAGNOSTIC_POP \
-    Y_PRAGMA("GCC diagnostic pop")
-#elif defined(_MSC_VER)
-#define Y_PRAGMA_DIAGNOSTIC_POP \
-    Y_PRAGMA(warning(pop))
-#else
-#define Y_PRAGMA_DIAGNOSTIC_POP
-#endif
-
-/**
- * @def Y_PRAGMA_NO_WSHADOW
- *
- * Cross-compiler pragma to disable warnings about shadowing variables
- *
- * @code
- * Y_PRAGMA_DIAGNOSTIC_PUSH
- * Y_PRAGMA_NO_WSHADOW
- *
- * // some code which uses variable shadowing, e.g.:
- *
- * for (int i = 0; i < 100; ++i) {
- *   Use(i);
- *
- *   for (int i = 42; i < 100500; ++i) { // this i is shadowing previous i
- *       AnotherUse(i);
- *     }
- * }
- *
- * Y_PRAGMA_DIAGNOSTIC_POP
- * @endcode
- */
-#if defined(__clang__) || defined(__GNUC__)
-#define Y_PRAGMA_NO_WSHADOW \
-    Y_PRAGMA("GCC diagnostic ignored \"-Wshadow\"")
-#elif defined(_MSC_VER)
-#define Y_PRAGMA_NO_WSHADOW \
-    Y_PRAGMA(warning(disable : 4456 4457))
-#else
-#define Y_PRAGMA_NO_WSHADOW
-#endif
-
-/**
- * @def Y_PRAGMA_NO_UNUSED_FUNCTION
- *
- * Cross-compiler pragma to disable warnings about unused functions
- *
- * @see
- *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
- *     Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-function
- *     MSVC: there is no such warning
- *
- * @code
- * Y_PRAGMA_DIAGNOSTIC_PUSH
- * Y_PRAGMA_NO_UNUSED_FUNCTION
- *
- * // some code which introduces a function which later will not be used, e.g.:
- *
- * void Foo() {
- * }
- *
- * int main() {
- *     return 0; // Foo() never called
- * }
- *
- * Y_PRAGMA_DIAGNOSTIC_POP
- * @endcode
- */
-#if defined(__clang__) || defined(__GNUC__)
-#define Y_PRAGMA_NO_UNUSED_FUNCTION \
-    Y_PRAGMA("GCC diagnostic ignored \"-Wunused-function\"")
-#else
-#define Y_PRAGMA_NO_UNUSED_FUNCTION
-#endif
-
-/**
- * @def Y_PRAGMA_NO_UNUSED_PARAMETER
- *
- * Cross-compiler pragma to disable warnings about unused function parameters
- *
- * @see
- *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
- *     Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-parameter
- *     MSVC: https://msdn.microsoft.com/en-us/library/26kb9fy0.aspx
- *
- * @code
- * Y_PRAGMA_DIAGNOSTIC_PUSH
- * Y_PRAGMA_NO_UNUSED_PARAMETER
- *
- * // some code which introduces a function with unused parameter, e.g.:
- *
- * void foo(int a) {
- *     // a is not referenced
- * }
- *
- * int main() {
- *     foo(1);
- *     return 0;
- * }
- *
- * Y_PRAGMA_DIAGNOSTIC_POP
- * @endcode
- */
-#if defined(__clang__) || defined(__GNUC__)
-#define Y_PRAGMA_NO_UNUSED_PARAMETER \
-    Y_PRAGMA("GCC diagnostic ignored \"-Wunused-parameter\"")
-#elif defined(_MSC_VER)
-#define Y_PRAGMA_NO_UNUSED_PARAMETER \
-    Y_PRAGMA(warning(disable : 4100))
-#else
-#define Y_PRAGMA_NO_UNUSED_PARAMETER
-#endif
-
-/**
- * @def Y_PRAGMA_NO_DEPRECATED
- *
- * Cross-compiler pragma to disable warnings and errors about deprecated code
- *
- * @see
- *     GCC: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
- *     Clang: https://clang.llvm.org/docs/DiagnosticsReference.html#wdeprecated
- *     MSVC: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-3-c4996?view=vs-2017
- *
- * @code
- * Y_PRAGMA_DIAGNOSTIC_PUSH
- * Y_PRAGMA_NO_DEPRECATED
- *
- * [[deprecated]] void foo() {
- *     // ...
- * }
- *
- * int main() {
- *     foo();
- *     return 0;
- * }
- *
- * Y_PRAGMA_DIAGNOSTIC_POP
- * @endcode
- */
-#if defined(__clang__) || defined(__GNUC__)
-#define Y_PRAGMA_NO_DEPRECATED \
-    Y_PRAGMA("GCC diagnostic ignored \"-Wdeprecated\"")
-#elif defined(_MSC_VER)
-#define Y_PRAGMA_NO_DEPRECATED \
-    Y_PRAGMA(warning(disable : 4996))
-#else
-#define Y_PRAGMA_NO_DEPRECATED
-#endif
-
-#if defined(__clang__) || defined(__GNUC__)
-/**
- * @def Y_CONST_FUNCTION
-   Methods and functions marked with this attribute promise that they:
-     1. have no side effects;
-     2. do not read global memory.
-   NOTE: this attribute can't be set for methods that depend on data pointed to by `this`.
-   This allows compilers to optimize such functions aggressively.
-   NOTE: in general, this attribute can't be set if the method has pointer arguments.
-   NOTE: as a consequence, there is no reason to discard the result of such a method.
-*/
-#define Y_CONST_FUNCTION [[gnu::const]]
-#endif
-
-#if !defined(Y_CONST_FUNCTION)
-#define Y_CONST_FUNCTION
-#endif
-
-#if defined(__clang__) || defined(__GNUC__)
-/**
- * @def Y_PURE_FUNCTION
-   Methods and functions marked with this attribute promise that:
-     1. they have no side effects;
-     2. their result stays the same as long as global memory does not change.
-   This allows compilers to optimize such functions aggressively.
-   NOTE: as a consequence, there is no reason to discard the result of such a method.
-*/
-#define Y_PURE_FUNCTION [[gnu::pure]]
-#endif
-
-#if !defined(Y_PURE_FUNCTION)
-#define Y_PURE_FUNCTION
-#endif
-
-/**
- * @def Y_HAVE_INT128
- *
- * Defined when the compiler supports __int128 extension
- *
- * @code
- *
- * #if defined(Y_HAVE_INT128)
- *     __int128 myVeryBigInt = 12345678901234567890ULL;
- * #endif
- *
- * @endcode
- */
-#if defined(__SIZEOF_INT128__)
-#define Y_HAVE_INT128 1
-#endif
-
-/**
- * The XRAY macro must be passed to the compiler if XRay is enabled.
- *
- * Define everything XRay-specific as a macro so that it doesn't cause errors
- * for compilers that don't support XRay.
- */
-#if defined(XRAY) && defined(__cplusplus)
-#include <xray/xray_interface.h>
-#define Y_XRAY_ALWAYS_INSTRUMENT [[clang::xray_always_instrument]]
-#define Y_XRAY_NEVER_INSTRUMENT [[clang::xray_never_instrument]]
-#define Y_XRAY_CUSTOM_EVENT(__string, __length) \
-    do {                                        \
-        __xray_customevent(__string, __length); \
-    } while (0)
-#else
-#define Y_XRAY_ALWAYS_INSTRUMENT
-#define Y_XRAY_NEVER_INSTRUMENT
-#define Y_XRAY_CUSTOM_EVENT(__string, __length) \
-    do {                                        \
-    } while (0)
-#endif
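The `Y_ASSUME` documentation describes the pattern, but its example is only a fragment. Below is a minimal self-contained sketch. It assumes GCC/Clang's `__builtin_unreachable` or MSVC's `__assume`, the same builtins the removed macro relies on. `MY_ASSUME`, `Avg` and `Averages` are hypothetical local names, since this commit deletes the original header.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local copy of the removed Y_ASSUME macro; the header that
// defined it is deleted by this commit.
#if defined(__GNUC__)
#define MY_ASSUME(condition) ((condition) ? (void)0 : __builtin_unreachable())
#elif defined(_MSC_VER)
#define MY_ASSUME(condition) __assume(condition)
#else
#define MY_ASSUME(condition) ((void)(condition))
#endif

// Fast path for non-negative inputs: the unsigned sum of two non-negative ints
// cannot overflow, so the shift yields floor((x + y) / 2).
inline int Avg(int x, int y) {
    if (x >= 0 && y >= 0) {
        return static_cast<int>((static_cast<unsigned>(x) + static_cast<unsigned>(y)) >> 1);
    }
    // Slower fallback for negative inputs: widen to avoid overflow, round toward zero.
    return static_cast<int>((static_cast<long long>(x) + y) / 2);
}

// Precondition: every element of xs and ys is non-negative and ys.size() >= xs.size().
// MY_ASSUME lets the optimizer drop Avg's fallback branch once Avg is inlined;
// violating the precondition is undefined behavior.
std::vector<int> Averages(const std::vector<int>& xs, const std::vector<int>& ys) {
    std::vector<int> avgs(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const int x = xs[i];
        const int y = ys[i];
        MY_ASSUME(x >= 0);
        MY_ASSUME(y >= 0);
        avgs[i] = Avg(x, y);
    }
    return avgs;
}
```

The assumption only helps when it is genuinely guaranteed by domain knowledge. A wrong `Y_ASSUME` silently miscompiles rather than failing loudly.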
--- a/contrib/lfalloc/src/util/system/defaults.h
+++ b/contrib/lfalloc/src/util/system/defaults.h
@@ -1,168 +0,0 @@
-#pragma once
-
-#include "platform.h"
-
-#if defined _unix_
-#define LOCSLASH_C '/'
-#define LOCSLASH_S "/"
-#else
-#define LOCSLASH_C '\\'
-#define LOCSLASH_S "\\"
-#endif // _unix_
-
-#if defined(__INTEL_COMPILER) && defined(__cplusplus)
-#include <new>
-#endif
-
-// low and high parts of integers
-#if !defined(_win_)
-#include <sys/param.h>
-#endif
-
-#if defined(BSD) || defined(_android_)
-
-#if defined(BSD)
-#include <machine/endian.h>
-#endif
-
-#if defined(_android_)
-#include <endian.h>
-#endif
-
-#if (BYTE_ORDER == LITTLE_ENDIAN)
-#define _little_endian_
-#elif (BYTE_ORDER == BIG_ENDIAN)
-#define _big_endian_
-#else
-#error unknown endian not supported
-#endif
-
-#elif (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(WHATEVER_THAT_HAS_BIG_ENDIAN)
-#define _big_endian_
-#else
-#define _little_endian_
-#endif
-
-// alignment
-#if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_QUADS)
-#define _must_align8_
-#endif
-
-#if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_LONGS)
-#define _must_align4_
-#endif
-
-#if (defined(_sun_) && !defined(__i386__)) || defined(_hpux_) || defined(__alpha__) || defined(__ia64__) || defined(WHATEVER_THAT_NEEDS_ALIGNING_SHORTS)
-#define _must_align2_
-#endif
-
-#if defined(__GNUC__)
-#define alias_hack __attribute__((__may_alias__))
-#endif
-
-#ifndef alias_hack
-#define alias_hack
-#endif
-
-#include "types.h"
-
-#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
-#define PRAGMA(x) _Pragma(#x)
-#define RCSID(idstr) PRAGMA(comment(exestr, idstr))
-#else
-#define RCSID(idstr) static const char rcsid[] = idstr
-#endif
-
-#include "compiler.h"
-
-#ifdef _win_
-#include <malloc.h>
-#elif defined(_sun_)
-#include <alloca.h>
-#endif
-
-#ifdef NDEBUG
-#define Y_IF_DEBUG(X)
-#else
-#define Y_IF_DEBUG(X) X
-#endif
-
-/**
- * @def Y_ARRAY_SIZE
- *
- * This macro returns the number of elements in a statically allocated fixed-size array. The
- * expression is a compile-time constant and therefore can be used in compile time computations.
- *
- * @code
- * enum ENumbers {
- *     EN_ONE,
- *     EN_TWO,
- *     EN_SIZE
- * };
- *
- * const char* NAMES[] = {
- *     "one",
- *     "two"
- * };
- *
- * static_assert(Y_ARRAY_SIZE(NAMES) == EN_SIZE, "you should define a `NAMES` entry for each enumerator");
- * @endcode
- *
- * This macro also catches some type errors. If you see a compiler diagnostic like "warning: division by zero
- * is undefined" when using `Y_ARRAY_SIZE`, you are probably passing it a pointer.
- *
- * Since all of our code is expected to work on a 64-bit platform where pointers are 8 bytes, we may
- * falsely accept pointers to types whose sizes divide 8 (1, 2, 4 and 8).
- */
-#if defined(__cplusplus)
-namespace NArraySizePrivate {
-    template <class T>
-    struct TArraySize;
-
-    template <class T, size_t N>
-    struct TArraySize<T[N]> {
-        enum {
-            Result = N
-        };
-    };
-
-    template <class T, size_t N>
-    struct TArraySize<T (&)[N]> {
-        enum {
-            Result = N
-        };
-    };
-}
-
-#define Y_ARRAY_SIZE(arr) ((size_t)::NArraySizePrivate::TArraySize<decltype(arr)>::Result)
-#else
-#undef Y_ARRAY_SIZE
-#define Y_ARRAY_SIZE(arr) \
-    ((sizeof(arr) / sizeof((arr)[0])) / (size_t)(!(sizeof(arr) % sizeof((arr)[0]))))
-#endif
-
-#undef Y_ARRAY_BEGIN
-#define Y_ARRAY_BEGIN(arr) (arr)
-
-#undef Y_ARRAY_END
-#define Y_ARRAY_END(arr) ((arr) + Y_ARRAY_SIZE(arr))
-
-/**
- * Concatenates two symbols, even if one of them is itself a macro.
- */
-#define Y_CAT(X, Y) Y_CAT_I(X, Y)
-#define Y_CAT_I(X, Y) Y_CAT_II(X, Y)
-#define Y_CAT_II(X, Y) X##Y
-
-#define Y_STRINGIZE(X) UTIL_PRIVATE_STRINGIZE_AUX(X)
-#define UTIL_PRIVATE_STRINGIZE_AUX(X) #X
-
-#if defined(__COUNTER__)
-#define Y_GENERATE_UNIQUE_ID(N) Y_CAT(N, __COUNTER__)
-#endif
-
-#if !defined(Y_GENERATE_UNIQUE_ID)
-#define Y_GENERATE_UNIQUE_ID(N) Y_CAT(N, __LINE__)
-#endif
-
-#define NPOS ((size_t)-1)
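The removed `Y_ARRAY_SIZE`, `Y_CAT` and `Y_STRINGIZE` helpers rely on two tricks. First, a template that is specialized only for array types, so passing a pointer does not compile. Second, an extra macro level so that arguments are macro-expanded before `##` or `#` is applied. Below is a minimal sketch of both. All `MY_*`, `NArraySizeSketch` and `SKETCH_*` names are hypothetical, and two macro levels suffice here where the original used three for `Y_CAT`.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical re-creation of the C++ branch of Y_ARRAY_SIZE: only array
// types have a specialization, so passing a pointer fails to compile.
namespace NArraySizeSketch {
    template <class T>
    struct TArraySize;

    template <class T, std::size_t N>
    struct TArraySize<T[N]> {
        enum { Result = N };
    };

    template <class T, std::size_t N>
    struct TArraySize<T (&)[N]> {
        enum { Result = N };
    };
}

#define MY_ARRAY_SIZE(arr) ((std::size_t)::NArraySizeSketch::TArraySize<decltype(arr)>::Result)

// The outer macro expands its arguments before the inner one applies ## or #.
#define MY_CAT(X, Y) MY_CAT_I(X, Y)
#define MY_CAT_I(X, Y) X##Y
#define MY_STRINGIZE(X) MY_STRINGIZE_AUX(X)
#define MY_STRINGIZE_AUX(X) #X
#define MY_STRINGIZE_RAW(X) #X  // no indirection: the argument is not expanded

enum ENumbers { EN_ONE, EN_TWO, EN_SIZE };
const char* NAMES[] = {"one", "two"};
static_assert(MY_ARRAY_SIZE(NAMES) == static_cast<std::size_t>(EN_SIZE),
              "define a NAMES entry for each enumerator");

#define SKETCH_VERSION_MAJOR 19
constexpr int MY_CAT(kVersion, SKETCH_VERSION_MAJOR) = 19;  // declares kVersion19

inline const char* VersionString() { return MY_STRINGIZE(SKETCH_VERSION_MAJOR); }        // "19"
inline const char* RawVersionString() { return MY_STRINGIZE_RAW(SKETCH_VERSION_MAJOR); }  // "SKETCH_VERSION_MAJOR"
```

Adding a third enumerator without a matching entry in `NAMES` turns the `static_assert` into a compile-time error. This is the use case the original doc comment describes.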
--- a/contrib/lfalloc/src/util/system/platform.h
+++ b/contrib/lfalloc/src/util/system/platform.h
@@ -1,242 +0,0 @@
-#pragma once
-
-// Which OS?
-// our definition has the form _{osname}_
-
-#if defined(_WIN64)
-#define _win64_
-#define _win32_
-#elif defined(__WIN32__) || defined(_WIN32) // _WIN32 is also defined by the 64-bit compiler for backward compatibility
-#define _win32_
-#else
-#define _unix_
-#if defined(__sun__) || defined(sun) || defined(sparc) || defined(__sparc)
-#define _sun_
-#endif
-#if defined(__hpux__)
-#define _hpux_
-#endif
-#if defined(__linux__)
-#define _linux_
-#endif
-#if defined(__FreeBSD__)
-#define _freebsd_
-#endif
-#if defined(__CYGWIN__)
-#define _cygwin_
-#endif
-#if defined(__APPLE__)
-#define _darwin_
-#endif
-#if defined(__ANDROID__)
-#define _android_
-#endif
-#endif
-
-#if defined(__IOS__)
-#define _ios_
-#endif
-
-#if defined(_linux_)
-#if defined(_musl_)
-//nothing to do
-#elif defined(_android_)
-#define _bionic_
-#else
-#define _glibc_
-#endif
-#endif
-
-#if defined(_darwin_)
-#define unix
-#define __unix__
-#endif
-
-#if defined(_win32_) || defined(_win64_)
-#define _win_
-#endif
-
-#if defined(__arm__) || defined(__ARM__) || defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM)
-#if defined(__arm64) || defined(__arm64__) || defined(__aarch64__)
-#define _arm64_
-#else
-#define _arm32_
-#endif
-#endif
-
-#if defined(_arm64_) || defined(_arm32_)
-#define _arm_
-#endif
-
-/* __ia64__ and __x86_64__      - defined by GNU C.
- * _M_IA64, _M_X64, _M_AMD64    - defined by Visual Studio.
- *
- * Microsoft can define _M_IX86, _M_AMD64 (before Visual Studio 8)
- * or _M_X64 (starting in Visual Studio 8).
- */
-#if defined(__x86_64__) || defined(_M_X64) || defined(_M_AMD64)
-#define _x86_64_
-#endif
-
-#if defined(__i386__) || defined(_M_IX86)
-#define _i386_
-#endif
-
-#if defined(__ia64__) || defined(_M_IA64)
-#define _ia64_
-#endif
-
-#if defined(__powerpc__)
-#define _ppc_
-#endif
-
-#if defined(__powerpc64__)
-#define _ppc64_
-#endif
-
-#if !defined(sparc) && !defined(__sparc) && !defined(__hpux__) && !defined(__alpha__) && !defined(_ia64_) && !defined(_x86_64_) && !defined(_arm_) && !defined(_i386_) && !defined(_ppc_) && !defined(_ppc64_)
-#error "platform not defined, please define one"
-#endif
-
-#if defined(_x86_64_) || defined(_i386_)
-#define _x86_
-#endif
-
-#if defined(__MIC__)
-#define _mic_
-#define _k1om_
-#endif
-
-// stdio or MessageBox
-#if defined(__CONSOLE__) || defined(_CONSOLE)
-#define _console_
-#endif
-#if (defined(_win_) && !defined(_console_))
-#define _windows_
-#elif !defined(_console_)
-#define _console_
-#endif
-
-#if defined(__SSE__) || defined(SSE_ENABLED)
-#define _sse_
-#endif
-
-#if defined(__SSE2__) || defined(SSE2_ENABLED)
-#define _sse2_
-#endif
-
-#if defined(__SSE3__) || defined(SSE3_ENABLED)
-#define _sse3_
-#endif
-
-#if defined(__SSSE3__) || defined(SSSE3_ENABLED)
-#define _ssse3_
-#endif
-
-#if defined(POPCNT_ENABLED)
-#define _popcnt_
-#endif
-
-#if defined(__DLL__) || defined(_DLL)
-#define _dll_
-#endif
-
-// 16, 32 or 64
-#if defined(__sparc_v9__) || defined(_x86_64_) || defined(_ia64_) || defined(_arm64_) || defined(_ppc64_)
-#define _64_
-#else
-#define _32_
-#endif
-
-/* All modern 64-bit Unix systems use the LP64 scheme (long and pointers are 64-bit).
- * Microsoft uses a different scheme, LLP64 (long long and pointers are 64-bit; long stays 32-bit).
- *
- * Scheme          LP64   LLP64
- * char              8      8
- * short            16     16
- * int              32     32
- * long             64     32
- * long long        64     64
- * pointer          64     64
- */
-
-#if defined(_32_)
-#define SIZEOF_PTR 4
-#elif defined(_64_)
-#define SIZEOF_PTR 8
-#endif
-
-#define PLATFORM_DATA_ALIGN SIZEOF_PTR
-
-#if !defined(SIZEOF_PTR)
-#error todo
-#endif
-
-#define SIZEOF_CHAR 1
-#define SIZEOF_UNSIGNED_CHAR 1
-#define SIZEOF_SHORT 2
-#define SIZEOF_UNSIGNED_SHORT 2
-#define SIZEOF_INT 4
-#define SIZEOF_UNSIGNED_INT 4
-
-#if defined(_32_)
-#define SIZEOF_LONG 4
-#define SIZEOF_UNSIGNED_LONG 4
-#elif defined(_64_)
-#if defined(_win_)
-#define SIZEOF_LONG 4
-#define SIZEOF_UNSIGNED_LONG 4
-#else
-#define SIZEOF_LONG 8
-#define SIZEOF_UNSIGNED_LONG 8
-#endif // _win_
-#endif // _32_
-
-#if !defined(SIZEOF_LONG)
-#error todo
-#endif
-
-#define SIZEOF_LONG_LONG 8
-#define SIZEOF_UNSIGNED_LONG_LONG 8
-
-#undef SIZEOF_SIZE_T // in case we include <Python.h> which defines it, too
-#define SIZEOF_SIZE_T SIZEOF_PTR
-
-#if defined(__INTEL_COMPILER)
-#pragma warning(disable 1292)
-#pragma warning(disable 1469)
-#pragma warning(disable 193)
-#pragma warning(disable 271)
-#pragma warning(disable 383)
-#pragma warning(disable 424)
-#pragma warning(disable 444)
-#pragma warning(disable 584)
-#pragma warning(disable 593)
-#pragma warning(disable 981)
-#pragma warning(disable 1418)
-#pragma warning(disable 304)
-#pragma warning(disable 810)
-#pragma warning(disable 1029)
-#pragma warning(disable 1419)
-#pragma warning(disable 177)
-#pragma warning(disable 522)
-#pragma warning(disable 858)
-#pragma warning(disable 111)
-#pragma warning(disable 1599)
-#pragma warning(disable 411)
-#pragma warning(disable 304)
-#pragma warning(disable 858)
-#pragma warning(disable 444)
-#pragma warning(disable 913)
-#pragma warning(disable 310)
-#pragma warning(disable 167)
-#pragma warning(disable 180)
-#pragma warning(disable 1572)
-#endif
-
-#if defined(_MSC_VER)
-#undef _WINSOCKAPI_
-#define _WINSOCKAPI_
-#undef NOMINMAX
-#define NOMINMAX
-#endif
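The removed `platform.h` hard-codes the LP64/LLP64 table by checking OS and architecture macros (`SIZEOF_LONG` is 4 under `_win_` even on 64-bit). The sketch below shows an alternative, not what lfalloc did: it classifies the data model from `sizeof`, which the compiler evaluates for the actual target. `DataModel` and `CurrentDataModel` are hypothetical names.

```cpp
#include <climits>
#include <cstddef>

// Data models from the table above, plus ILP32 for 32-bit targets.
enum class DataModel { ILP32, LP64, LLP64, Other };

// Classifies the target from sizes the compiler knows, instead of from OS macros.
constexpr DataModel CurrentDataModel() {
    return (sizeof(void*) == 4 && sizeof(long) == 4 && sizeof(int) == 4) ? DataModel::ILP32
         : (sizeof(void*) == 8 && sizeof(long) == 8) ? DataModel::LP64
         : (sizeof(void*) == 8 && sizeof(long) == 4 && sizeof(long long) == 8) ? DataModel::LLP64
         : DataModel::Other;
}

// The table also assumes 8-bit chars and a 64-bit long long on every platform.
static_assert(CHAR_BIT == 8, "8-bit char expected");
static_assert(sizeof(long long) == 8, "64-bit long long expected");
```

Because `CurrentDataModel` is `constexpr`, it can drive `static_assert`s or `if constexpr` branches in place of `SIZEOF_LONG`-style macros.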
--- a/contrib/lfalloc/src/util/system/types.h
+++ b/contrib/lfalloc/src/util/system/types.h
@@ -1,117 +0,0 @@
-#pragma once
-
-// DO_NOT_STYLE
-
-#include "platform.h"
-
-#include <inttypes.h>
-
-typedef int8_t i8;
-typedef int16_t i16;
-typedef uint8_t ui8;
-typedef uint16_t ui16;
-
-typedef int yssize_t;
-#define PRIYSZT "d"
-
-#if defined(_darwin_) && defined(_32_)
-typedef unsigned long ui32;
-typedef long i32;
-#else
-typedef uint32_t ui32;
-typedef int32_t i32;
-#endif
-
-#if defined(_darwin_) && defined(_64_)
-typedef unsigned long ui64;
-typedef long i64;
-#else
-typedef uint64_t ui64;
-typedef int64_t i64;
-#endif
-
-#define LL(number) INT64_C(number)
-#define ULL(number) UINT64_C(number)
-
-// Format macros for size_t and ptrdiff_t types
-#if defined(_32_)
-#   if defined(_darwin_)
-#       define PRISZT "lu"
-#       undef PRIi32
-#       define PRIi32 "li"
-#       undef SCNi32
-#       define SCNi32 "li"
-#       undef PRId32
-#       define PRId32 "li"
-#       undef SCNd32
-#       define SCNd32 "li"
-#       undef PRIu32
-#       define PRIu32 "lu"
-#       undef SCNu32
-#       define SCNu32 "lu"
-#       undef PRIx32
-#       define PRIx32 "lx"
-#       undef SCNx32
-#       define SCNx32 "lx"
-#   elif !defined(_cygwin_)
-#       define PRISZT PRIu32
-#   else
-#       define PRISZT "u"
-#   endif
-#   define SCNSZT SCNu32
-#   define PRIPDT PRIi32
-#   define SCNPDT SCNi32
-#   define PRITMT PRIi32
-#   define SCNTMT SCNi32
-#elif defined(_64_)
-#   if defined(_darwin_)
-#       define PRISZT "lu"
-#       undef PRIu64
-#       define PRIu64 PRISZT
-#       undef PRIx64
-#       define PRIx64 "lx"
-#       undef PRIX64
-#       define PRIX64 "lX"
-#       undef PRId64
-#       define PRId64 "ld"
-#       undef PRIi64
-#       define PRIi64 "li"
-#       undef SCNi64
-#       define SCNi64 "li"
-#       undef SCNu64
-#       define SCNu64 "lu"
-#       undef SCNx64
-#       define SCNx64 "lx"
-#   else
-#       define PRISZT PRIu64
-#   endif
-#   define SCNSZT SCNu64
-#   define PRIPDT PRIi64
-#   define SCNPDT SCNi64
-#   define PRITMT PRIi64
-#   define SCNTMT SCNi64
-#else
-#   error "Unsupported platform"
-#endif
-
-// SUPERLONG
-#if !defined(DONT_USE_SUPERLONG) && !defined(SUPERLONG_MAX)
-#define SUPERLONG_MAX ~LL(0)
-typedef i64 SUPERLONG;
-#endif
-
-// UNICODE
-// UCS-2, native byteorder
-typedef ui16 wchar16;
-// internal symbol type: UTF-16LE
-typedef wchar16 TChar;
-typedef ui32 wchar32;
-
-#if defined(_MSC_VER)
-#include <basetsd.h>
-typedef SSIZE_T ssize_t;
-#define HAVE_SSIZE_T 1
-#include <wchar.h>
-#endif
-
-#include <sys/types.h>
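`types.h` redefines `PRIu64` and related macros on Darwin because it typedefs `ui64` as `unsigned long` there. When code uses the standard `<cstdint>` types directly, the standard `<cinttypes>` macros already match those types, and `%zu` covers `size_t` since C99/C++11. The sketch below shows this. The `FormatSizes` helper is hypothetical.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a size_t, a uint64_t and an int32_t without platform-specific macros
// such as PRISZT: %zu is standard since C99/C++11, and <cinttypes> defines
// PRIu64/PRId32 to match the <cstdint> typedefs on each platform.
std::string FormatSizes(std::size_t n, std::uint64_t u, std::int32_t i) {
    char buf[64];  // enough for 20 + 20 + 11 digits/signs plus separators
    std::snprintf(buf, sizeof(buf), "%zu %" PRIu64 " %" PRId32, n, u, i);
    return std::string(buf);
}
```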
--- a/contrib/libhdfs3-cmake/CMake/Platform.cmake
+++ b/contrib/libhdfs3-cmake/CMake/Platform.cmake
@@ -15,9 +15,14 @@ IF(CMAKE_COMPILER_IS_GNUCXX)
    
    STRING(REGEX MATCHALL "[0-9]+" GCC_COMPILER_VERSION ${GCC_COMPILER_VERSION})
    
+    LIST(LENGTH GCC_COMPILER_VERSION GCC_COMPILER_VERSION_LENGTH)
    LIST(GET GCC_COMPILER_VERSION 0 GCC_COMPILER_VERSION_MAJOR)
-    LIST(GET GCC_COMPILER_VERSION 0 GCC_COMPILER_VERSION_MINOR)
-    
+    if (GCC_COMPILER_VERSION_LENGTH GREATER 1)
+        LIST(GET GCC_COMPILER_VERSION 1 GCC_COMPILER_VERSION_MINOR)
+    else ()
+        set (GCC_COMPILER_VERSION_MINOR 0)
+    endif ()
+
    SET(GCC_COMPILER_VERSION_MAJOR ${GCC_COMPILER_VERSION_MAJOR} CACHE INTERNAL "gcc major version")
    SET(GCC_COMPILER_VERSION_MINOR ${GCC_COMPILER_VERSION_MINOR} CACHE INTERNAL "gcc minor version")
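The `Platform.cmake` change fixes two problems. The minor version was read from index 0, so it always equalled the major version. In addition, a version string with a single number made `LIST(GET ... 1 ...)` fail; depending on how GCC was configured, `gcc -dumpversion` can print only the major version, for example `7`. The same logic is sketched below in C++ for illustration only. `GccVersion` and `ParseGccVersion` are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

struct GccVersion {
    int major_version = 0;
    int minor_version = 0;
};

// Mirrors STRING(REGEX MATCHALL "[0-9]+" ...) followed by the new LIST(LENGTH ...) guard:
// the minor version comes from the second number if there is one, otherwise it is 0.
GccVersion ParseGccVersion(const std::string& dumpversion) {
    static const std::regex number("[0-9]+");
    std::vector<int> parts;
    for (std::sregex_iterator it(dumpversion.begin(), dumpversion.end(), number), end; it != end; ++it)
        parts.push_back(std::stoi(it->str()));

    GccVersion version;
    if (!parts.empty())
        version.major_version = parts[0];  // CMake's LIST(GET ... 0 ...) would fail on an empty list instead
    if (parts.size() > 1)
        version.minor_version = parts[1];  // the old code read index 0 here as well
    return version;
}
```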
    
--- a/contrib/mimalloc
+++ b/contrib/mimalloc
@@ -0,0 +1 @@
+Subproject commit a787bdebce94bf3776dc0d1ad597917f479ab8d5
--- a/contrib/poco
+++ b/contrib/poco
@@ -1 +1 @@
-Subproject commit fe5505e56c27b6ecb0dcbc40c49dc2caf4e9637f
+Subproject commit 29439cf7fa32c1a2d62d925bb6d6a3f14668a4a2
--- a/dbms/CMakeLists.txt
+++ b/dbms/CMakeLists.txt
@@ -2,6 +2,10 @@ if (USE_INCLUDE_WHAT_YOU_USE)
    set (CMAKE_CXX_INCLUDE_WHAT_YOU_USE ${IWYU_PATH})
 endif ()

+if (USE_CLANG_TIDY)
+    set (CMAKE_CXX_CLANG_TIDY "${DO_CLANG_TIDY}")
+endif ()
+
 if(COMPILER_PIPE)
    set(MAX_COMPILER_MEMORY 2500)
 else()
@@ -225,8 +229,9 @@ if(RE2_INCLUDE_DIR)
    target_include_directories(clickhouse_common_io SYSTEM BEFORE PUBLIC ${RE2_INCLUDE_DIR})
 endif()

-if (USE_LFALLOC)
-    target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${LFALLOC_INCLUDE_DIR})
+if (USE_MIMALLOC)
+    target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${MIMALLOC_INCLUDE_DIR})
+    target_link_libraries (clickhouse_common_io PRIVATE ${MIMALLOC_LIBRARY})
 endif ()

 if(CPUID_LIBRARY)
--- a/dbms/programs/client/Client.cpp
+++ b/dbms/programs/client/Client.cpp
@@ -59,6 +59,7 @@
 #include <Parsers/parseQuery.h>
 #include <Interpreters/Context.h>
 #include <Interpreters/InterpreterSetQuery.h>
+#include <Interpreters/ReplaceQueryParameterVisitor.h>
 #include <Client/Connection.h>
 #include <Common/InterruptListener.h>
 #include <Functions/registerFunctions.h>
@@ -202,6 +203,9 @@ private:
    /// External tables info.
    std::list<ExternalTable> external_tables;

+    /// Dictionary with query parameters for prepared statements.
+    NameToNameMap query_parameters;
+
    ConnectionParameters connection_parameters;

    void initialize(Poco::Util::Application & self)
@@ -795,7 +799,6 @@ private:
        /// Some parts of a query (result output and formatting) are executed client-side.
        /// Thus we need to parse the query.
        parsed_query = parsed_query_;
-
        if (!parsed_query)
        {
            const char * begin = query.data();
@@ -900,6 +903,16 @@ private:
    /// Process the query that doesn't require transferring data blocks to the server.
    void processOrdinaryQuery()
    {
+        /// We always rewrite the query (even if there are no query_parameters) because this helps to find errors in the query formatter.
+        {
+            /// Replace ASTQueryParameter with ASTLiteral for prepared statements.
+            ReplaceQueryParameterVisitor visitor(query_parameters);
+            visitor.visit(parsed_query);
+
+            /// Get new query after substitutions. Note that it cannot be done for INSERT query with embedded data.
+            query = serializeAST(*parsed_query);
+        }
+
        connection->sendQuery(connection_parameters.timeouts, query, query_id, QueryProcessingStage::Complete, &context.getSettingsRef(), nullptr, true);
        sendExternalTables();
        receiveResult();
@@ -1548,7 +1561,8 @@ public:
        /** We allow different groups of arguments:
          * - common arguments;
          * - arguments for any number of external tables each in form "--external args...",
-          *   where possible args are file, name, format, structure, types.
+          *   where possible args are file, name, format, structure, types;
+          * - param arguments for prepared statements.
          * Split these groups before processing.
          */
        using Arguments = std::vector<std::string>;
@@ -1597,7 +1611,31 @@ public:
            else
            {
                in_external_group = false;
-                common_arguments.emplace_back(arg);
+
+                /// Query parameter: the parameter name follows the underscore in "--param_".
+                if (startsWith(arg, "--param_"))
+                {
+                    const char * param_continuation = arg + strlen("--param_");
+                    const char * equal_pos = strchr(param_continuation, '=');
+
+                    if (equal_pos == param_continuation)
+                        throw Exception("Parameter name cannot be empty", ErrorCodes::BAD_ARGUMENTS);
+
+                    if (equal_pos)
+                    {
+                        /// param_name=value
+                        query_parameters.emplace(String(param_continuation, equal_pos), String(equal_pos + 1));
+                    }
+                    else
+                    {
+                        /// param_name value
+                        ++arg_num;
+                        arg = argv[arg_num];
+                        query_parameters.emplace(String(param_continuation), String(arg));
+                    }
+                }
+                else
+                    common_arguments.emplace_back(arg);
            }
        }

@@ -1672,6 +1710,7 @@ public:
            ("structure", po::value<std::string>(), "structure")
            ("types", po::value<std::string>(), "types")
        ;
+
        /// Parse main commandline options.
        po::parsed_options parsed = po::command_line_parser(common_arguments).options(main_description).run();
        po::variables_map options;
@@ -1696,6 +1735,7 @@ public:
        {
            std::cout << main_description << "\n";
            std::cout << external_description << "\n";
+            std::cout << "In addition, --param_name=value can be specified for substitution of parameters for parametrized queries.\n";
            exit(0);
        }
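The `--param_` handling in `Client.cpp` accepts both `--param_<name>=<value>` and `--param_<name> <value>`. Below is a standalone sketch of the same splitting. It adds two checks that the hunk does not show. As far as the hunk shows, the separated form reads `argv[arg_num]` without first checking `arg_num < argc`, so the sketch rejects a trailing `--param_<name>` that has no value. It also rejects a bare `--param_`. `SplitParamArguments` is hypothetical, `std::map` stands in for ClickHouse's `NameToNameMap`, and `std::invalid_argument` stands in for its `Exception`.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for ClickHouse's NameToNameMap (assumption: any string-to-string map works here).
using NameToNameMap = std::map<std::string, std::string>;

// Moves "--param_<name>=<value>" and "--param_<name> <value>" into `params`
// and returns every other argument (argv[0] excluded) unchanged.
std::vector<std::string> SplitParamArguments(int argc, const char* const argv[], NameToNameMap& params) {
    static const char prefix[] = "--param_";
    const std::size_t prefix_len = sizeof(prefix) - 1;
    std::vector<std::string> common_arguments;

    for (int arg_num = 1; arg_num < argc; ++arg_num) {
        const char* arg = argv[arg_num];
        if (std::strncmp(arg, prefix, prefix_len) != 0) {
            common_arguments.emplace_back(arg);
            continue;
        }

        const char* name = arg + prefix_len;
        const char* equal_pos = std::strchr(name, '=');
        if (equal_pos == name || *name == '\0')
            throw std::invalid_argument("Parameter name cannot be empty");

        if (equal_pos) {
            // --param_name=value
            params.emplace(std::string(name, equal_pos), std::string(equal_pos + 1));
        } else {
            // --param_name value: the value is the next argv entry, which must exist.
            if (arg_num + 1 >= argc)
                throw std::invalid_argument("Parameter requires a value");
            ++arg_num;
            params.emplace(std::string(name), std::string(argv[arg_num]));
        }
    }
    return common_arguments;
}
```

Like the hunk, the sketch uses `emplace`, so if the same parameter is given twice, the first value wins.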

--- a/dbms/programs/client/readpassphrase/readpassphrase.h
+++ b/dbms/programs/client/readpassphrase/readpassphrase.h
@@ -29,6 +29,11 @@
 //#include "includes.h"
 #include "config_client.h"

+// Should not be included on BSD systems, but if it happens...
+#ifdef HAVE_READPASSPHRASE
+#   include_next <readpassphrase.h>
+#endif
+
 #ifndef HAVE_READPASSPHRASE

 #    ifdef __cplusplus
--- a/dbms/programs/copier/ClusterCopier.cpp
+++ b/dbms/programs/copier/ClusterCopier.cpp
@@ -96,14 +96,19 @@ namespace

 using DatabaseAndTableName = std::pair<String, String>;

-String getDatabaseDotTable(const String & database, const String & table)
+String getQuotedTable(const String & database, const String & table)
 {
+    if (database.empty())
+    {
+        return backQuoteIfNeed(table);
+    }
+
    return backQuoteIfNeed(database) + "." + backQuoteIfNeed(table);
 }

-String getDatabaseDotTable(const DatabaseAndTableName & db_and_table)
+String getQuotedTable(const DatabaseAndTableName & db_and_table)
 {
-    return getDatabaseDotTable(db_and_table.first, db_and_table.second);
+    return getQuotedTable(db_and_table.first, db_and_table.second);
 }
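`getQuotedTable` now returns only the quoted table name when the database name is empty, instead of emitting a leading dot. The sketch below illustrates this behavior. `backQuoteIfNeedSketch` and `getQuotedTableSketch` are hypothetical simplified stand-ins. ClickHouse's real `backQuoteIfNeed` may decide differently when quoting is needed.

```cpp
#include <cctype>
#include <string>

// Simplified stand-in: quote unless the name is a plain identifier [A-Za-z_][A-Za-z0-9_]*.
std::string backQuoteIfNeedSketch(const std::string& name) {
    bool plain = !name.empty() && !std::isdigit(static_cast<unsigned char>(name[0]));
    for (char c : name)
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
            plain = false;
    if (plain)
        return name;

    std::string quoted = "`";
    for (char c : name) {
        if (c == '`' || c == '\\')
            quoted += '\\';  // escape embedded backticks and backslashes
        quoted += c;
    }
    return quoted + "`";
}

// Same shape as the new getQuotedTable: no "database." prefix for an empty database.
std::string getQuotedTableSketch(const std::string& database, const std::string& table) {
    if (database.empty())
        return backQuoteIfNeedSketch(table);
    return backQuoteIfNeedSketch(database) + "." + backQuoteIfNeedSketch(table);
}
```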


@@ -467,7 +472,7 @@ String DB::TaskShard::getDescription() const
    std::stringstream ss;
    ss << "N" << numberInCluster()
       << " (having a replica " << getHostNameExample()
-       << ", pull table " + getDatabaseDotTable(task_table.table_pull)
+       << ", pull table " + getQuotedTable(task_table.table_pull)
       << " of cluster " + task_table.cluster_pull_name << ")";
    return ss.str();
 }
@@ -741,8 +746,10 @@ public:
    {
        auto zookeeper = context.getZooKeeper();

-        task_description_watch_callback = [this] (const Coordination::WatchResponse &)
+        task_description_watch_callback = [this] (const Coordination::WatchResponse & response)
        {
+            if (response.error != Coordination::ZOK)
+                return;
            UInt64 version = ++task_descprtion_version;
            LOG_DEBUG(log, "Task description should be updated, local version " << version);
        };
@@ -1296,7 +1303,7 @@ protected:
        /// Remove all status nodes
        zookeeper->tryRemoveRecursive(current_shards_path);

-        String query = "ALTER TABLE " + getDatabaseDotTable(task_table.table_push);
+        String query = "ALTER TABLE " + getQuotedTable(task_table.table_push);
        query += " DROP PARTITION " + task_partition.name + "";

        /// TODO: use this statement after servers are updated to 1.1.54310
@@ -1539,7 +1546,7 @@ protected:
        auto get_select_query = [&] (const DatabaseAndTableName & from_table, const String & fields, String limit = "")
        {
            String query;
-            query += "SELECT " + fields + " FROM " + getDatabaseDotTable(from_table);
+            query += "SELECT " + fields + " FROM " + getQuotedTable(from_table);
            /// TODO: Bad, it is better to rewrite with ASTLiteral(partition_key_field)
            query += " WHERE (" + queryToString(task_table.engine_push_partition_key_ast) + " = (" + task_partition.name + " AS partition_key))";
            if (!task_table.where_condition_str.empty())
@ -1677,7 +1684,7 @@ protected:
            LOG_DEBUG(log, "Create destination tables. Query: " << query);
            UInt64 shards = executeQueryOnCluster(task_table.cluster_push, query, create_query_push_ast, &task_cluster->settings_push,
                                    PoolMode::GET_MANY);
-            LOG_DEBUG(log, "Destination tables " << getDatabaseDotTable(task_table.table_push) << " have been created on " << shards
+            LOG_DEBUG(log, "Destination tables " << getQuotedTable(task_table.table_push) << " have been created on " << shards
                                                 << " shards of " << task_table.cluster_push->getShardCount());
        }

@ -1699,7 +1706,7 @@ protected:
            ASTPtr query_insert_ast;
            {
                String query;
-                query += "INSERT INTO " + getDatabaseDotTable(task_shard.table_split_shard) + " VALUES ";
+                query += "INSERT INTO " + getQuotedTable(task_shard.table_split_shard) + " VALUES ";

                ParserQuery p_query(query.data() + query.size());
                query_insert_ast = parseQuery(p_query, query, 0);
@ -1824,7 +1831,7 @@ protected:

    String getRemoteCreateTable(const DatabaseAndTableName & table, Connection & connection, const Settings * settings = nullptr)
    {
-        String query = "SHOW CREATE TABLE " + getDatabaseDotTable(table);
+        String query = "SHOW CREATE TABLE " + getQuotedTable(table);
        Block block = getBlockWithAllStreamData(std::make_shared<RemoteBlockInputStream>(
            connection, query, InterpreterShowCreateQuery::getSampleBlock(), context, settings));

@ -1887,7 +1894,7 @@ protected:
        {
            WriteBufferFromOwnString wb;
            wb << "SELECT DISTINCT " << queryToString(task_table.engine_push_partition_key_ast) << " AS partition FROM"
-               << " " << getDatabaseDotTable(task_shard.table_read_shard) << " ORDER BY partition DESC";
+               << " " << getQuotedTable(task_shard.table_read_shard) << " ORDER BY partition DESC";
            query = wb.str();
        }

@ -1929,7 +1936,7 @@ protected:
        {
            WriteBufferFromOwnString wb;
            wb << "SELECT 1"
-               << " FROM "<< getDatabaseDotTable(task_shard.table_read_shard)
+               << " FROM "<< getQuotedTable(task_shard.table_read_shard)
               << " WHERE " << queryToString(task_table.engine_push_partition_key_ast) << " = " << partition_quoted_name
               << " LIMIT 1";
            query = wb.str();
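The rename from getDatabaseDotTable to getQuotedTable also changes behavior: with an empty database name, only the quoted table name is returned instead of always emitting a database prefix. Below is a minimal sketch of that logic. `backQuoteIfNeedSketch` is a hypothetical, simplified stand-in for ClickHouse's `backQuoteIfNeed`, and its identifier and escaping rules are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical, simplified stand-in for backQuoteIfNeed(): names that look like
// plain identifiers ([A-Za-z_][A-Za-z0-9_]*) are returned unchanged; anything
// else is wrapped in backticks, with embedded backticks escaped by a backslash.
// These rules are assumptions; the real helper may differ.
std::string backQuoteIfNeedSketch(const std::string & name)
{
    bool plain = !name.empty()
        && (std::isalpha(static_cast<unsigned char>(name[0])) || name[0] == '_');
    for (char c : name)
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
            plain = false;

    if (plain)
        return name;

    std::string res = "`";
    for (char c : name)
    {
        if (c == '`')
            res += '\\';
        res += c;
    }
    res += '`';
    return res;
}

// Same control flow as getQuotedTable() in the diff: without a database,
// only the (possibly quoted) table name is returned.
std::string getQuotedTableSketch(const std::string & database, const std::string & table)
{
    if (database.empty())
        return backQuoteIfNeedSketch(table);
    return backQuoteIfNeedSketch(database) + "." + backQuoteIfNeedSketch(table);
}
```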
--- a/dbms/programs/server/HTTPHandler.cpp
+++ b/dbms/programs/server/HTTPHandler.cpp
@ -475,9 +475,9 @@ void HTTPHandler::processQuery(
            settings.readonly = 2;
    }

-    bool isExternalData = startsWith(request.getContentType().data(), "multipart/form-data");
+    bool has_external_data = startsWith(request.getContentType().data(), "multipart/form-data");

-    if (isExternalData)
+    if (has_external_data)
    {
        /// Skip unneeded parameters to avoid confusing them later with context settings or query parameters.
        reserved_param_suffixes.reserve(3);
@ -501,6 +501,12 @@ void HTTPHandler::processQuery(
        else if (param_could_be_skipped(key))
        {
        }
+        else if (startsWith(key, "param_"))
+        {
+            /// Save the parameter name (without the "param_" prefix) and its value as a query parameter.
+            const String parameter_name = key.substr(strlen("param_"));
+            context.setQueryParameter(parameter_name, value);
+        }
        else
        {
            /// All other query parameters are treated as settings.
@ -516,7 +522,7 @@ void HTTPHandler::processQuery(
    std::string full_query;

    /// Support for "external data for query processing".
-    if (isExternalData)
+    if (has_external_data)
    {
        ExternalTablesHandler handler(context, params);
        params.load(request, istr, handler);
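The new `param_` branch splits the remaining URL parameters into query parameters and settings. The sketch below models only that routing step, assuming reserved keys have already been handled by the earlier branches of the if-chain; `RoutedParams` and `routeParams` are hypothetical names, not part of the handler.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result of routing: "param_<name>" keys become query parameters
// (prefix stripped), every other key is treated as a setting.
struct RoutedParams
{
    std::map<std::string, std::string> query_parameters;
    std::map<std::string, std::string> settings;
};

RoutedParams routeParams(const std::vector<std::pair<std::string, std::string>> & params)
{
    static const std::string prefix = "param_";
    RoutedParams res;
    for (const auto & [key, value] : params)
    {
        // compare() returns 0 only when key begins with the full prefix.
        if (key.compare(0, prefix.size(), prefix) == 0)
            res.query_parameters[key.substr(prefix.size())] = value;
        else
            res.settings[key] = value;
    }
    return res;
}
```

Under this model, `param_id=42&max_threads=4` yields a query parameter `id` and a setting `max_threads`.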
--- a/dbms/programs/server/TCPHandler.h
+++ b/dbms/programs/server/TCPHandler.h
@ -36,6 +36,12 @@ struct QueryState
    QueryProcessingStage::Enum stage = QueryProcessingStage::Complete;
    Protocol::Compression compression = Protocol::Compression::Disable;

+    /// A queue with internal logs that will be passed to the client. It must
+    /// be destroyed after the input/output streams, which may own threads that
+    /// still use it; members are destroyed in reverse order of declaration.
+    InternalTextLogsQueuePtr logs_queue;
+    BlockOutputStreamPtr logs_block_out;
+
    /// From where to read data for INSERT.
    std::shared_ptr<ReadBuffer> maybe_compressed_in;
    BlockInputStreamPtr block_in;
@ -64,10 +70,6 @@ struct QueryState
    /// Timeouts setter for current query
    std::unique_ptr<TimeoutSetter> timeout_setter;

-    /// A queue with internal logs that will be passed to client
-    InternalTextLogsQueuePtr logs_queue;
-    BlockOutputStreamPtr logs_block_out;
-
    void reset()
    {
        *this = QueryState();
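Moving `logs_queue` and `logs_block_out` to the top of `QueryState` relies on the C++ rule that non-static data members are destroyed in reverse order of declaration. The following sketch, with hypothetical `Tracked` and `QueryStateSketch` types, records the destruction order to make that rule visible.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which Tracked objects are destroyed.
std::vector<std::string> destruction_log;

struct Tracked
{
    std::string name;
    explicit Tracked(std::string name_) : name(std::move(name_)) {}
    ~Tracked() { destruction_log.push_back(name); }
};

// Stand-in for QueryState after the change: the logs queue is declared first,
// so it is destroyed last, after the streams that might still use it.
struct QueryStateSketch
{
    Tracked logs_queue{"logs_queue"};
    Tracked block_in{"block_in"};
    Tracked block_out{"block_out"};
};
```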
--- a/dbms/src/AggregateFunctions/AggregateFunctionForEach.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionForEach.h
@ -53,42 +53,49 @@ private:
    {
        AggregateFunctionForEachData & state = data(place);

-        /// Ensure we have aggreate states for new_size elements, allocate from arena if needed
-
+        /// Ensure we have aggregate states for new_size elements, allocating
+        /// from the arena if needed. When reallocating, we can't copy the
+        /// states to the new buffer with memcpy, because they may contain
+        /// pointers to themselves. In particular, this happens when a state
+        /// contains a PODArrayWithStackMemory, which stores a small number of
+        /// elements inline. This is why we create new empty states in the new
+        /// buffer and merge the old states into them.
        size_t old_size = state.dynamic_array_size;
        if (old_size < new_size)
        {
-            state.array_of_aggregate_datas = arena.alignedRealloc(
-                state.array_of_aggregate_datas,
-                old_size * nested_size_of_data,
+            char * old_state = state.array_of_aggregate_datas;
+            char * new_state = arena.alignedAlloc(
                new_size * nested_size_of_data,
                nested_func->alignOfData());

-            size_t i = old_size;
-            char * nested_state = state.array_of_aggregate_datas + i * nested_size_of_data;
-
+            size_t i;
            try
            {
-                for (; i < new_size; ++i)
+                for (i = 0; i < new_size; ++i)
                {
-                    nested_func->create(nested_state);
-                    nested_state += nested_size_of_data;
+                    nested_func->create(&new_state[i * nested_size_of_data]);
                }
            }
            catch (...)
            {
                size_t cleanup_size = i;
-                nested_state = state.array_of_aggregate_datas + i * nested_size_of_data;

                for (i = 0; i < cleanup_size; ++i)
                {
-                    nested_func->destroy(nested_state);
-                    nested_state += nested_size_of_data;
+                    nested_func->destroy(&new_state[i * nested_size_of_data]);
                }

                throw;
            }

+            for (i = 0; i < old_size; i++)
+            {
+                nested_func->merge(&new_state[i * nested_size_of_data],
+                        &old_state[i * nested_size_of_data],
+                        &arena);
+            }
+
+            state.array_of_aggregate_datas = new_state;
            state.dynamic_array_size = new_size;
        }

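The comment in this hunk explains why `alignedRealloc`, which copies raw bytes, was replaced with create-then-merge. The sketch below uses a hypothetical `SmallState` whose `data` pointer refers to its own inline buffer, then grows an array of such states the way the patched code does. Exception cleanup is omitted, and, like the patched code, the old states are not destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical aggregate state with inline storage: `data` points at the
// object's own `inline_buf`. A byte-wise copy (memcpy/realloc) would leave the
// copy's `data` pointing into the *old* object, which is the bug being fixed.
struct SmallState
{
    int inline_buf[4];
    int * data = inline_buf;
    std::size_t size = 0;

    // No capacity checks: the sketch never stores more than 4 elements.
    void add(int x) { data[size++] = x; }

    // Plays the role of IAggregateFunction::merge(): fold rhs into *this.
    void merge(const SmallState & rhs)
    {
        for (std::size_t i = 0; i < rhs.size; ++i)
            add(rhs.data[i]);
    }
};

// Grows an array of states like the patched code: construct fresh states in
// a new buffer ("create"), then merge the old states into them ("merge").
SmallState * growStates(const SmallState * old_states, std::size_t old_size, std::size_t new_size)
{
    auto * new_states = static_cast<SmallState *>(::operator new(new_size * sizeof(SmallState)));
    for (std::size_t i = 0; i < new_size; ++i)
        new (&new_states[i]) SmallState();
    for (std::size_t i = 0; i < old_size; ++i)
        new_states[i].merge(old_states[i]);
    return new_states;
}
```

Because each new state is constructed in place, its `data` points at its own buffer, which a byte copy cannot guarantee.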
--- a/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp
+++ b/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp
@ -43,11 +43,11 @@ static IAggregateFunction * createWithExtraTypes(const DataTypePtr & argument_ty
    else if (which.idx == TypeIndex::DateTime) return new AggregateFunctionGroupUniqArrayDateTime<has_limit>(argument_type, std::forward<TArgs>(args)...);
    else
    {
-        /// Check that we can use plain version of AggreagteFunctionGroupUniqArrayGeneric
+        /// Check that we can use plain version of AggregateFunctionGroupUniqArrayGeneric
        if (argument_type->isValueUnambiguouslyRepresentedInContiguousMemoryRegion())
-            return new AggreagteFunctionGroupUniqArrayGeneric<true, has_limit>(argument_type, std::forward<TArgs>(args)...);
+            return new AggregateFunctionGroupUniqArrayGeneric<true, has_limit>(argument_type, std::forward<TArgs>(args)...);
        else
-            return new AggreagteFunctionGroupUniqArrayGeneric<false, has_limit>(argument_type, std::forward<TArgs>(args)...);
+            return new AggregateFunctionGroupUniqArrayGeneric<false, has_limit>(argument_type, std::forward<TArgs>(args)...);
    }
 }

--- a/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h
@ -122,7 +122,7 @@ public:


 /// Generic implementation, it uses serialized representation as object descriptor.
-struct AggreagteFunctionGroupUniqArrayGenericData
+struct AggregateFunctionGroupUniqArrayGenericData
 {
    static constexpr size_t INIT_ELEMS = 2; /// adjustable
    static constexpr size_t ELEM_SIZE = sizeof(HashSetCellWithSavedHash<StringRef, StringRefHash>);
@ -132,7 +132,7 @@ struct AggreagteFunctionGroupUniqArrayGenericData
 };


-/// Helper function for deserialize and insert for the class AggreagteFunctionGroupUniqArrayGeneric
+/// Helper function for deserialize and insert for the class AggregateFunctionGroupUniqArrayGeneric
 template <bool is_plain_column>
 static StringRef getSerializationImpl(const IColumn & column, size_t row_num, Arena & arena);

@ -143,15 +143,15 @@ static void deserializeAndInsertImpl(StringRef str, IColumn & data_to);
 *  For such columns groupUniqArray() can be implemented more efficiently (especially for small numeric arrays).
 */
 template <bool is_plain_column = false, typename Tlimit_num_elem = std::false_type>
-class AggreagteFunctionGroupUniqArrayGeneric
-    : public IAggregateFunctionDataHelper<AggreagteFunctionGroupUniqArrayGenericData, AggreagteFunctionGroupUniqArrayGeneric<is_plain_column, Tlimit_num_elem>>
+class AggregateFunctionGroupUniqArrayGeneric
+    : public IAggregateFunctionDataHelper<AggregateFunctionGroupUniqArrayGenericData, AggregateFunctionGroupUniqArrayGeneric<is_plain_column, Tlimit_num_elem>>
 {
    DataTypePtr & input_data_type;

    static constexpr bool limit_num_elems = Tlimit_num_elem::value;
    UInt64 max_elems;

-    using State = AggreagteFunctionGroupUniqArrayGenericData;
+    using State = AggregateFunctionGroupUniqArrayGenericData;

    static StringRef getSerialization(const IColumn & column, size_t row_num, Arena & arena)
    {
@ -164,8 +164,8 @@ class AggreagteFunctionGroupUniqArrayGeneric
    }

 public:
-    AggreagteFunctionGroupUniqArrayGeneric(const DataTypePtr & input_data_type, UInt64 max_elems_ = std::numeric_limits<UInt64>::max())
-        : IAggregateFunctionDataHelper<AggreagteFunctionGroupUniqArrayGenericData, AggreagteFunctionGroupUniqArrayGeneric<is_plain_column, Tlimit_num_elem>>({input_data_type}, {})
+    AggregateFunctionGroupUniqArrayGeneric(const DataTypePtr & input_data_type, UInt64 max_elems_ = std::numeric_limits<UInt64>::max())
+        : IAggregateFunctionDataHelper<AggregateFunctionGroupUniqArrayGenericData, AggregateFunctionGroupUniqArrayGeneric<is_plain_column, Tlimit_num_elem>>({input_data_type}, {})
        , input_data_type(this->argument_types[0])
        , max_elems(max_elems_) {}

--- a/dbms/src/AggregateFunctions/AggregateFunctionSequenceMatch.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionSequenceMatch.h
@ -47,8 +47,7 @@ struct AggregateFunctionSequenceMatchData final
    using Comparator = ComparePairFirst<std::less>;

    bool sorted = true;
-    static constexpr size_t bytes_in_arena = 64;
-    PODArray<TimestampEvents, bytes_in_arena, AllocatorWithStackMemory<Allocator<false>, bytes_in_arena>> events_list;
+    PODArrayWithStackMemory<TimestampEvents, 64> events_list;

    void add(const Timestamp timestamp, const Events & events)
    {
@ -203,8 +202,7 @@ private:
        PatternAction(const PatternActionType type, const std::uint64_t extra = 0) : type{type}, extra{extra} {}
    };

-    static constexpr size_t bytes_on_stack = 64;
-    using PatternActions = PODArray<PatternAction, bytes_on_stack, AllocatorWithStackMemory<Allocator<false>, bytes_on_stack>>;
+    using PatternActions = PODArrayWithStackMemory<PatternAction, 64>;

    Derived & derived() { return static_cast<Derived &>(*this); }

--- a/dbms/src/AggregateFunctions/AggregateFunctionTimeSeriesGroupSum.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionTimeSeriesGroupSum.h
@ -68,9 +68,8 @@ struct AggregateFunctionTimeSeriesGroupSumData
        }
    };

-    static constexpr size_t bytes_on_stack = 128;
    typedef std::map<UInt64, Points> Series;
-    typedef PODArray<DataPoint, bytes_on_stack, AllocatorWithStackMemory<Allocator<false>, bytes_on_stack>> AggSeries;
+    typedef PODArrayWithStackMemory<DataPoint, 128> AggSeries;
    Series ss;
    AggSeries result;

--- a/dbms/src/AggregateFunctions/AggregateFunctionWindowFunnel.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionWindowFunnel.h
@ -35,10 +35,7 @@ template <typename T>
 struct AggregateFunctionWindowFunnelData
 {
    using TimestampEvent = std::pair<T, UInt8>;
-
-    static constexpr size_t bytes_on_stack = 64;
-    using TimestampEvents = PODArray<TimestampEvent, bytes_on_stack, AllocatorWithStackMemory<Allocator<false>, bytes_on_stack>>;
-
+    using TimestampEvents = PODArray<TimestampEvent, 64>;
    using Comparator = ComparePairFirst;

    bool sorted = true;
--- a/dbms/src/AggregateFunctions/QuantileExact.h
+++ b/dbms/src/AggregateFunctions/QuantileExact.h
@ -27,8 +27,7 @@ struct QuantileExact
 {
    /// The memory will be allocated to several elements at once, so that the state occupies 64 bytes.
    static constexpr size_t bytes_in_arena = 64 - sizeof(PODArray<Value>);
-
-    using Array = PODArray<Value, bytes_in_arena, AllocatorWithStackMemory<Allocator<false>, bytes_in_arena>>;
+    using Array = PODArrayWithStackMemory<Value, bytes_in_arena>;
    Array array;

    void add(const Value & x)
--- a/dbms/src/AggregateFunctions/QuantileTDigest.h
+++ b/dbms/src/AggregateFunctions/QuantileTDigest.h
@ -86,8 +86,7 @@ class QuantileTDigest

    /// The memory will be allocated to several elements at once, so that the state occupies 64 bytes.
    static constexpr size_t bytes_in_arena = 128 - sizeof(PODArray<Centroid>) - sizeof(Count) - sizeof(UInt32);
-
-    using Summary = PODArray<Centroid, bytes_in_arena / sizeof(Centroid), AllocatorWithStackMemory<Allocator<false>, bytes_in_arena>>;
+    using Summary = PODArrayWithStackMemory<Centroid, bytes_in_arena>;

    Summary summary;
    Count count = 0;
--- a/dbms/src/AggregateFunctions/ReservoirSampler.h
+++ b/dbms/src/AggregateFunctions/ReservoirSampler.h
@ -194,8 +194,7 @@ private:
    friend void rs_perf_test();

    /// We allocate a little memory on the stack - to avoid allocations when there are many objects with a small number of elements.
-    static constexpr size_t bytes_on_stack = 64;
-    using Array = DB::PODArray<T, bytes_on_stack / sizeof(T), AllocatorWithStackMemory<Allocator<false>, bytes_on_stack>>;
+    using Array = DB::PODArrayWithStackMemory<T, 64>;

    size_t sample_count;
    size_t total_values = 0;
--- a/dbms/src/AggregateFunctions/ReservoirSamplerDeterministic.h
+++ b/dbms/src/AggregateFunctions/ReservoirSamplerDeterministic.h
@ -164,9 +164,8 @@ public:

 private:
    /// We allocate some memory on the stack to avoid allocations when there are many objects with a small number of elements.
-    static constexpr size_t bytes_on_stack = 64;
    using Element = std::pair<T, UInt32>;
-    using Array = DB::PODArray<Element, bytes_on_stack / sizeof(Element), AllocatorWithStackMemory<Allocator<false>, bytes_on_stack>>;
+    using Array = DB::PODArray<Element, 64>;

    size_t sample_count;
    size_t total_values{};
--- a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp
+++ b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp
@ -28,6 +28,9 @@ void registerAggregateFunctionTopK(AggregateFunctionFactory &);
 void registerAggregateFunctionsBitwise(AggregateFunctionFactory &);
 void registerAggregateFunctionsBitmap(AggregateFunctionFactory &);
 void registerAggregateFunctionsMaxIntersections(AggregateFunctionFactory &);
+void registerAggregateFunctionHistogram(AggregateFunctionFactory &);
+void registerAggregateFunctionRetention(AggregateFunctionFactory &);
+void registerAggregateFunctionTimeSeriesGroupSum(AggregateFunctionFactory &);
 void registerAggregateFunctionMLMethod(AggregateFunctionFactory &);
 void registerAggregateFunctionEntropy(AggregateFunctionFactory &);
 void registerAggregateFunctionSimpleLinearRegression(AggregateFunctionFactory &);
@ -41,9 +44,6 @@ void registerAggregateFunctionCombinatorMerge(AggregateFunctionCombinatorFactory
 void registerAggregateFunctionCombinatorNull(AggregateFunctionCombinatorFactory &);
 void registerAggregateFunctionCombinatorResample(AggregateFunctionCombinatorFactory &);

-void registerAggregateFunctionHistogram(AggregateFunctionFactory & factory);
-void registerAggregateFunctionRetention(AggregateFunctionFactory & factory);
-void registerAggregateFunctionTimeSeriesGroupSum(AggregateFunctionFactory & factory);
 void registerAggregateFunctions()
 {
    {
--- a/dbms/src/Columns/ColumnConst.h
+++ b/dbms/src/Columns/ColumnConst.h
@ -99,6 +99,11 @@ public:
        return data->getBool(0);
    }

+    Float64 getFloat64(size_t) const override
+    {
+        return data->getFloat64(0);
+    }
+
    bool isNullAt(size_t) const override
    {
        return data->isNullAt(0);
@ -197,8 +202,8 @@ public:
        return false;
    }

+    bool isNullable() const override { return isColumnNullable(*data); }
    bool onlyNull() const override { return data->isNullAt(0); }
-    bool isColumnConst() const override { return true; }
    bool isNumeric() const override { return data->isNumeric(); }
    bool isFixedAndContiguous() const override { return data->isFixedAndContiguous(); }
    bool valuesHaveFixedSize() const override { return data->valuesHaveFixedSize(); }
--- a/dbms/src/Columns/ColumnLowCardinality.h
+++ b/dbms/src/Columns/ColumnLowCardinality.h
@ -57,6 +57,8 @@ public:
    UInt64 get64(size_t n) const override { return getDictionary().get64(getIndexes().getUInt(n)); }
    UInt64 getUInt(size_t n) const override { return getDictionary().getUInt(getIndexes().getUInt(n)); }
    Int64 getInt(size_t n) const override { return getDictionary().getInt(getIndexes().getUInt(n)); }
+    Float64 getFloat64(size_t n) const override { return getDictionary().getFloat64(getIndexes().getUInt(n)); }
+    bool getBool(size_t n) const override { return getDictionary().getBool(getIndexes().getUInt(n)); }
    bool isNullAt(size_t n) const override { return getDictionary().isNullAt(getIndexes().getUInt(n)); }
    ColumnPtr cut(size_t start, size_t length) const override
    {
@ -146,6 +148,7 @@ public:
    size_t sizeOfValueIfFixed() const override { return getDictionary().sizeOfValueIfFixed(); }
    bool isNumeric() const override { return getDictionary().isNumeric(); }
    bool lowCardinality() const override { return true; }
+    bool isNullable() const override { return isColumnNullable(*dictionary.getColumnUniquePtr()); }

    const IColumnUnique & getDictionary() const { return dictionary.getColumnUnique(); }
    const ColumnPtr & getDictionaryPtr() const { return dictionary.getColumnUniquePtr(); }
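The `getFloat64`/`getBool` overrides on `ColumnLowCardinality` need a two-step lookup: read row `n` of the index column as an unsigned position, then ask the dictionary for the value at that position. A toy model of that lookup follows, with hypothetical names and `std::vector` in place of ClickHouse columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy dictionary-encoded column (hypothetical, not the ClickHouse classes):
// row n stores a position into `dictionary`, so every accessor first reads
// the index and only then queries the dictionary.
struct LowCardinalitySketch
{
    std::vector<double> dictionary;   // unique values
    std::vector<uint32_t> indexes;    // one entry per row

    double getFloat64(std::size_t n) const { return dictionary[indexes[n]]; }
    bool getBool(std::size_t n) const { return dictionary[indexes[n]] != 0; }
};
```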
--- a/dbms/src/Columns/ColumnNullable.cpp
+++ b/dbms/src/Columns/ColumnNullable.cpp
@ -27,7 +27,7 @@ ColumnNullable::ColumnNullable(MutableColumnPtr && nested_column_, MutableColumn
    if (!getNestedColumn().canBeInsideNullable())
        throw Exception{getNestedColumn().getName() + " cannot be inside Nullable column", ErrorCodes::ILLEGAL_COLUMN};

-    if (null_map->isColumnConst())
+    if (isColumnConst(*null_map))
        throw Exception{"ColumnNullable cannot have constant null map", ErrorCodes::ILLEGAL_COLUMN};
 }

@ -451,13 +451,12 @@ void ColumnNullable::checkConsistency() const
            ErrorCodes::SIZES_OF_NESTED_COLUMNS_ARE_INCONSISTENT);
 }

-
 ColumnPtr makeNullable(const ColumnPtr & column)
 {
-    if (column->isColumnNullable())
+    if (isColumnNullable(*column))
        return column;

-    if (column->isColumnConst())
+    if (isColumnConst(*column))
        return ColumnConst::create(makeNullable(static_cast<const ColumnConst &>(*column).getDataColumnPtr()), column->size());

    return ColumnNullable::create(column, ColumnUInt8::create(column->size(), 0));
--- a/dbms/src/Columns/ColumnNullable.h
+++ b/dbms/src/Columns/ColumnNullable.h
@ -100,7 +100,7 @@ public:
        return false;
    }

-    bool isColumnNullable() const override { return true; }
+    bool isNullable() const override { return true; }
    bool isFixedAndContiguous() const override { return false; }
    bool valuesHaveFixedSize() const override { return nested_column->valuesHaveFixedSize(); }
    size_t sizeOfValueIfFixed() const override { return null_map->sizeOfValueIfFixed() + nested_column->sizeOfValueIfFixed(); }
@ -142,7 +142,6 @@ private:
    void applyNullMapImpl(const ColumnUInt8 & map);
 };

-
 ColumnPtr makeNullable(const ColumnPtr & column);

 }
--- a/dbms/src/Columns/ColumnTuple.cpp
+++ b/dbms/src/Columns/ColumnTuple.cpp
@ -39,7 +39,7 @@ ColumnTuple::ColumnTuple(MutableColumns && mutable_columns)
    columns.reserve(mutable_columns.size());
    for (auto & column : mutable_columns)
    {
-        if (column->isColumnConst())
+        if (isColumnConst(*column))
            throw Exception{"ColumnTuple cannot have ColumnConst as its element", ErrorCodes::ILLEGAL_COLUMN};

        columns.push_back(std::move(column));
@ -49,7 +49,7 @@ ColumnTuple::ColumnTuple(MutableColumns && mutable_columns)
 ColumnTuple::Ptr ColumnTuple::create(const Columns & columns)
 {
    for (const auto & column : columns)
-        if (column->isColumnConst())
+        if (isColumnConst(*column))
            throw Exception{"ColumnTuple cannot have ColumnConst as its element", ErrorCodes::ILLEGAL_COLUMN};

    auto column_tuple = ColumnTuple::create(MutableColumns());
@ -61,7 +61,7 @@ ColumnTuple::Ptr ColumnTuple::create(const Columns & columns)
 ColumnTuple::Ptr ColumnTuple::create(const TupleColumns & columns)
 {
    for (const auto & column : columns)
-        if (column->isColumnConst())
+        if (isColumnConst(*column))
            throw Exception{"ColumnTuple cannot have ColumnConst as its element", ErrorCodes::ILLEGAL_COLUMN};

    auto column_tuple = ColumnTuple::create(MutableColumns());
--- a/dbms/src/Columns/ColumnUnique.h
+++ b/dbms/src/Columns/ColumnUnique.h
@ -64,6 +64,8 @@ public:
    UInt64 get64(size_t n) const override { return getNestedColumn()->get64(n); }
    UInt64 getUInt(size_t n) const override { return getNestedColumn()->getUInt(n); }
    Int64 getInt(size_t n) const override { return getNestedColumn()->getInt(n); }
+    Float64 getFloat64(size_t n) const override { return getNestedColumn()->getFloat64(n); }
+    bool getBool(size_t n) const override { return getNestedColumn()->getBool(n); }
    bool isNullAt(size_t n) const override { return is_nullable && n == getNullValueIndex(); }
    StringRef serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const override;
    void updateHashWithValue(size_t n, SipHash & hash_func) const override
@ -191,7 +193,7 @@ ColumnUnique<ColumnType>::ColumnUnique(MutableColumnPtr && holder, bool is_nulla
 {
    if (column_holder->size() < numSpecialValues())
        throw Exception("Too small holder column for ColumnUnique.", ErrorCodes::ILLEGAL_COLUMN);
-    if (column_holder->isColumnNullable())
+    if (isColumnNullable(*column_holder))
        throw Exception("Holder column for ColumnUnique can't be nullable.", ErrorCodes::ILLEGAL_COLUMN);

    index.setColumn(getRawColumnPtr());
@ -271,7 +273,7 @@ size_t ColumnUnique<ColumnType>::uniqueInsertFrom(const IColumn & src, size_t n)
    if (is_nullable && src.isNullAt(n))
        return getNullValueIndex();

-    if (auto * nullable = typeid_cast<const ColumnNullable *>(&src))
+    if (auto * nullable = checkAndGetColumn<ColumnNullable>(src))
        return uniqueInsertFrom(nullable->getNestedColumn(), n);

    auto ref = src.getDataAt(n);
@ -430,7 +432,7 @@ MutableColumnPtr ColumnUnique<ColumnType>::uniqueInsertRangeImpl(
        return nullptr;
    };

-    if (auto nullable_column = typeid_cast<const ColumnNullable *>(&src))
+    if (auto * nullable_column = checkAndGetColumn<ColumnNullable>(src))
    {
        src_column = typeid_cast<const ColumnType *>(&nullable_column->getNestedColumn());
        null_map = &nullable_column->getNullMapData();
--- a/dbms/src/Columns/ColumnVector.cpp
+++ b/dbms/src/Columns/ColumnVector.cpp
@ -33,7 +33,7 @@ template <typename T>
 StringRef ColumnVector<T>::serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const
 {
    auto pos = arena.allocContinue(sizeof(T), begin);
-    unalignedStore(pos, data[n]);
+    unalignedStore<T>(pos, data[n]);
    return StringRef(pos, sizeof(T));
 }

--- a/dbms/src/Columns/FilterDescription.cpp
+++ b/dbms/src/Columns/FilterDescription.cpp
@ -23,14 +23,14 @@ ConstantFilterDescription::ConstantFilterDescription(const IColumn & column)
        return;
    }

-    if (column.isColumnConst())
+    if (isColumnConst(column))
    {
        const ColumnConst & column_const = static_cast<const ColumnConst &>(column);
        ColumnPtr column_nested = column_const.getDataColumnPtr()->convertToFullColumnIfLowCardinality();

        if (!typeid_cast<const ColumnUInt8 *>(column_nested.get()))
        {
-            const ColumnNullable * column_nested_nullable = typeid_cast<const ColumnNullable *>(column_nested.get());
+            const ColumnNullable * column_nested_nullable = checkAndGetColumn<ColumnNullable>(*column_nested);
            if (!column_nested_nullable || !typeid_cast<const ColumnUInt8 *>(&column_nested_nullable->getNestedColumn()))
            {
                throw Exception("Illegal type " + column_nested->getName() + " of column for constant filter. Must be UInt8 or Nullable(UInt8).",
@ -60,7 +60,7 @@ FilterDescription::FilterDescription(const IColumn & column_)
        return;
    }

-    if (const ColumnNullable * nullable_column = typeid_cast<const ColumnNullable *>(&column))
+    if (auto * nullable_column = checkAndGetColumn<ColumnNullable>(column))
    {
        ColumnPtr nested_column = nullable_column->getNestedColumnPtr();
        MutableColumnPtr mutable_holder = (*std::move(nested_column)).mutate();
--- a/dbms/src/Columns/IColumn.cpp
+++ b/dbms/src/Columns/IColumn.cpp
@ -1,6 +1,8 @@
 #include <IO/WriteBufferFromString.h>
 #include <IO/Operators.h>
 #include <Columns/IColumn.h>
+#include <Columns/ColumnNullable.h>
+#include <Columns/ColumnConst.h>


 namespace DB
@ -22,4 +24,14 @@ String IColumn::dumpStructure() const
    return res.str();
 }

+bool isColumnNullable(const IColumn & column)
+{
+    return checkColumn<ColumnNullable>(column);
+}
+
+bool isColumnConst(const IColumn & column)
+{
+    return checkColumn<ColumnConst>(column);
+}
+
 }
--- a/dbms/src/Columns/IColumn.h
+++ b/dbms/src/Columns/IColumn.h
@ -4,6 +4,7 @@
 #include <Common/COW.h>
 #include <Common/PODArray.h>
 #include <Common/Exception.h>
+#include <Common/typeid_cast.h>
 #include <common/StringRef.h>


@ -296,12 +297,8 @@ public:

    /// Various properties on behaviour of column type.

-    /// Is this column a container for Nullable values? It's true only for ColumnNullable.
-    /// Note that ColumnConst(ColumnNullable(...)) is not considered.
-    virtual bool isColumnNullable() const { return false; }
-
-    /// Column stores a constant value. It's true only for ColumnConst wrapper.
-    virtual bool isColumnConst() const { return false; }
+    /// True if the column contains nullable values. It's true for ColumnNullable and can be true or false for ColumnConst, etc.
+    virtual bool isNullable() const { return false; }

    /// It's a special kind of column, that contain single value, but is not a ColumnConst.
    virtual bool isDummy() const { return false; }
@ -410,4 +407,35 @@ struct IsMutableColumns<Arg, Args ...>
 template <>
 struct IsMutableColumns<> { static const bool value = true; };

+
+template <typename Type>
+const Type * checkAndGetColumn(const IColumn & column)
+{
+    return typeid_cast<const Type *>(&column);
+}
+
+template <typename Type>
+const Type * checkAndGetColumn(const IColumn * column)
+{
+    return typeid_cast<const Type *>(column);
+}
+
+template <typename Type>
+bool checkColumn(const IColumn & column)
+{
+    return checkAndGetColumn<Type>(&column);
+}
+
+template <typename Type>
+bool checkColumn(const IColumn * column)
+{
+    return checkAndGetColumn<Type>(column);
+}
+
+/// True if the column is a ColumnConst instance. This is just syntactic sugar for a type check.
+bool isColumnConst(const IColumn & column);
+
+/// True if the column is a ColumnNullable instance. This is just syntactic sugar for a type check.
+bool isColumnNullable(const IColumn & column);
+
 }
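The diff replaces the virtual `isColumnConst()`/`isColumnNullable()` methods with free helpers built on `typeid_cast`, so each check becomes a type test instead of a per-class override. The sketch below reproduces that shape on a hypothetical hierarchy. `exactTypeCast` is an assumed approximation of `typeid_cast` that accepts only the exact dynamic type.

```cpp
#include <cassert>
#include <typeinfo>

// Minimal hierarchy standing in for IColumn and two subclasses (hypothetical).
struct IColumnSketch { virtual ~IColumnSketch() = default; };
struct ColumnConstSketch : IColumnSketch {};
struct ColumnNullableSketch : IColumnSketch {};

// Assumed approximation of typeid_cast: succeeds only for the exact dynamic type.
template <typename To, typename From>
const To * exactTypeCast(const From * from)
{
    if (from && typeid(*from) == typeid(To))
        return static_cast<const To *>(from);
    return nullptr;
}

template <typename Type>
const Type * checkAndGetColumn(const IColumnSketch & column)
{
    return exactTypeCast<Type>(&column);
}

template <typename Type>
bool checkColumn(const IColumnSketch & column)
{
    return checkAndGetColumn<Type>(column) != nullptr;
}

// Syntactic sugar in the style of the new isColumnConst().
bool isColumnConst(const IColumnSketch & column)
{
    return checkColumn<ColumnConstSketch>(column);
}
```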
--- a/dbms/src/Columns/getLeastSuperColumn.cpp
+++ b/dbms/src/Columns/getLeastSuperColumn.cpp
@ -31,7 +31,7 @@ ColumnWithTypeAndName getLeastSuperColumn(std::vector<const ColumnWithTypeAndNam
    for (size_t i = 0; i < columns.size(); ++i)
    {
        types[i] = columns[i]->type;
-        if (columns[i]->column->isColumnConst())
+        if (isColumnConst(*columns[i]->column))
            ++num_const;
    }

--- a/dbms/src/Common/Allocator.h
+++ b/dbms/src/Common/Allocator.h
@ -255,6 +255,14 @@ private:
    char stack_memory[N];

 public:
+    /// Do not use boost::noncopyable to avoid the warning about direct base
+    /// being inaccessible due to ambiguity, when derived classes are also
+    /// noncopyable (-Winaccessible-base).
+    AllocatorWithStackMemory(const AllocatorWithStackMemory&) = delete;
+    AllocatorWithStackMemory & operator = (const AllocatorWithStackMemory&) = delete;
+    AllocatorWithStackMemory() = default;
+    ~AllocatorWithStackMemory() = default;
+
    void * alloc(size_t size)
    {
        if (size <= N)
--- a/dbms/src/Common/Arena.h
+++ b/dbms/src/Common/Arena.h
@ -5,6 +5,7 @@
 #include <vector>
 #include <boost/noncopyable.hpp>
 #include <common/likely.h>
+#include <sanitizer/asan_interface.h>
 #include <Core/Defines.h>
 #include <Common/memcpySmall.h>
 #include <Common/ProfileEvents.h>
@ -53,10 +54,18 @@ private:
            pos = begin;
            end = begin + size_ - pad_right;
            prev = prev_;
+
+            ASAN_POISON_MEMORY_REGION(begin, size_);
        }

        ~Chunk()
        {
+            /// We must unpoison the memory before returning to the allocator,
+            /// because the allocator might not have asan integration, and the
+            /// memory would stay poisoned forever. If the allocator supports
+            /// asan, it will correctly poison the memory by itself.
+            ASAN_UNPOISON_MEMORY_REGION(begin, size());
+
            Allocator<false>::free(begin, size());

            if (prev)
@ -126,6 +135,7 @@ public:

        char * res = head->pos;
        head->pos += size;
+        ASAN_UNPOISON_MEMORY_REGION(res, size + pad_right);
        return res;
    }

@ -142,6 +152,7 @@ public:
            {
                head->pos = static_cast<char *>(head_pos);
                head->pos += size;
+                ASAN_UNPOISON_MEMORY_REGION(res, size + pad_right);
                return res;
            }

@ -161,6 +172,7 @@ public:
    void rollback(size_t size)
    {
        head->pos -= size;
+        ASAN_POISON_MEMORY_REGION(head->pos, size + pad_right);
    }

    /** Begin or expand allocation of contiguous piece of memory without alignment.
@ -187,6 +199,7 @@ public:
        if (!begin)
            begin = res;

+        ASAN_UNPOISON_MEMORY_REGION(res, size + pad_right);
        return res;
    }

@ -218,6 +231,8 @@ public:

        if (!begin)
            begin = res;
+
+        ASAN_UNPOISON_MEMORY_REGION(res, size + pad_right);
        return res;
    }

@ -226,7 +241,10 @@ public:
    {
        char * res = alloc(new_size);
        if (old_data)
+        {
            memcpy(res, old_data, old_size);
+            ASAN_POISON_MEMORY_REGION(old_data, old_size);
+        }
        return res;
    }

@ -234,7 +252,10 @@ public:
    {
        char * res = alignedAlloc(new_size, alignment);
        if (old_data)
+        {
            memcpy(res, old_data, old_size);
+            ASAN_POISON_MEMORY_REGION(old_data, old_size);
+        }
        return res;
    }
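The poisoning discipline added to `Arena` can be sketched on a hypothetical fixed-size bump arena: the whole chunk starts poisoned, each allocation unpoisons exactly the bytes it returns plus the right padding, `rollback` re-poisons them, and the destructor unpoisons everything before the memory goes back to an allocator that may not know about ASan. The class name and the fallback macros are assumptions for illustration; with ASan disabled, the `<sanitizer/asan_interface.h>` macros also compile to no-ops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Use the real ASan macros when the header is available; otherwise fall back
// to no-ops, which matches what the header does when ASan is disabled.
#if defined(__has_include)
#  if __has_include(<sanitizer/asan_interface.h>)
#    include <sanitizer/asan_interface.h>
#  endif
#endif
#ifndef ASAN_POISON_MEMORY_REGION
#  define ASAN_POISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#  define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

// Hypothetical bump arena illustrating the Arena poisoning pattern.
template <size_t Capacity, size_t pad_right = 0>
class PoisoningBumpArena
{
    alignas(16) char buffer[Capacity];
    char * pos = buffer;

public:
    PoisoningBumpArena() { ASAN_POISON_MEMORY_REGION(buffer, Capacity); }

    ~PoisoningBumpArena()
    {
        // Unpoison before the storage is reused by code unaware of ASan.
        ASAN_UNPOISON_MEMORY_REGION(buffer, Capacity);
    }

    char * alloc(size_t size)
    {
        if (static_cast<size_t>(buffer + Capacity - pos) < size + pad_right)
            return nullptr;
        char * res = pos;
        pos += size;
        // Only the returned bytes (and the readable padding) become accessible.
        ASAN_UNPOISON_MEMORY_REGION(res, size + pad_right);
        return res;
    }

    void rollback(size_t size)
    {
        pos -= size;
        // Bytes given back must not be touched again until reallocated.
        ASAN_POISON_MEMORY_REGION(pos, size + pad_right);
    }

    size_t used() const { return static_cast<size_t>(pos - buffer); }
};
```

Under an ASan build, reading `a[20]` right after `rollback(8)` in the usage below would be reported, while the same access is silently allowed without the annotations.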

--- a/dbms/src/Common/ArenaWithFreeLists.h
+++ b/dbms/src/Common/ArenaWithFreeLists.h
@ -1,5 +1,6 @@
 #pragma once

+#include <sanitizer/asan_interface.h>
 #include <Common/Arena.h>
 #include <Common/BitHelpers.h>

@ -63,7 +64,13 @@ public:
        /// If there is a free block.
        if (auto & free_block_ptr = free_lists[list_idx])
        {
-            /// Let's take it. And change the head of the list to the next item in the list.
+            /// Let's take it. And change the head of the list to the next
+            /// item in the list. We poisoned the free block before putting
+            /// it into the free list, so we have to unpoison it before
+            /// reading anything.
+            ASAN_UNPOISON_MEMORY_REGION(free_block_ptr,
+                                        std::max(size, sizeof(Block)));
+
            const auto res = free_block_ptr->data;
            free_block_ptr = free_block_ptr->next;
            return res;
@ -86,6 +93,14 @@ public:
        const auto old_head = free_block_ptr;
        free_block_ptr = reinterpret_cast<Block *>(ptr);
        free_block_ptr->next = old_head;
+
+        /// The requested size may be less than the size of the block, but
+        /// we still want to poison the entire block.
+        /// Strictly speaking, the free blocks must be unpoisoned in the
+        /// destructor, to support an underlying allocator that doesn't
+        /// integrate with asan. We don't do that, and rely on the fact that
+        /// our underlying allocator is Arena, which does have asan integration.
+        ASAN_POISON_MEMORY_REGION(ptr, 1ULL << (list_idx + 1));
    }

    /// Size of the allocated pool in bytes
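`free()` poisons `1ULL << (list_idx + 1)` bytes rather than the requested size because every request is backed by a power-of-two block. The mapping below is a sketch that assumes the free-list index is computed as `size <= 8 ? 2 : bitScanReverse(size - 1)`; `highestBit`, `freeListIndex` and `blockSize` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Index of the highest set bit; stand-in for bitScanReverse (x must be non-zero).
inline size_t highestBit(size_t x)
{
    size_t idx = 0;
    while (x >>= 1)
        ++idx;
    return idx;
}

// Assumed size-class mapping: requests of up to 8 bytes share list 2, larger
// requests are rounded up to the next power of two.
inline size_t freeListIndex(size_t size)
{
    return size <= 8 ? 2 : highestBit(size - 1);
}

// Size of the block that actually backs a request. This, not the requested
// size, is the region to poison when the block goes onto the free list.
inline size_t blockSize(size_t list_idx)
{
    return size_t(1) << (list_idx + 1);
}
```

For example, a 9-byte request lands in a 16-byte block, so poisoning only 9 bytes would leave 7 bytes of a freed block accessible.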
--- a/dbms/src/Common/ColumnsHashingImpl.h
+++ b/dbms/src/Common/ColumnsHashingImpl.h
@ -279,11 +279,10 @@ protected:

        for (const auto & col : key_columns)
        {
-            if (col->isColumnNullable())
+            if (auto * nullable_col = checkAndGetColumn<ColumnNullable>(col))
            {
-                const auto & nullable_col = static_cast<const ColumnNullable &>(*col);
-                actual_columns.push_back(&nullable_col.getNestedColumn());
-                null_maps.push_back(&nullable_col.getNullMapColumn());
+                actual_columns.push_back(&nullable_col->getNestedColumn());
+                null_maps.push_back(&nullable_col->getNullMapColumn());
            }
            else
            {
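The `ColumnsHashingImpl` change replaces an `isColumnNullable()` test followed by `static_cast` with a single checked cast that returns `nullptr` on mismatch, so the type check and the cast cannot drift apart. The sketch below uses a hypothetical miniature column hierarchy and `dynamic_cast` as a stand-in for `checkAndGetColumn` (which, as an assumption, is taken to behave like a checked downcast); the `else` branch is likewise assumed to push the column itself with no null map.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical mini column hierarchy, only to illustrate the pattern.
struct IColumn { virtual ~IColumn() = default; };
struct ColumnUInt8 : IColumn {};
struct ColumnNullable : IColumn
{
    std::shared_ptr<IColumn> nested = std::make_shared<ColumnUInt8>();
    std::shared_ptr<ColumnUInt8> null_map = std::make_shared<ColumnUInt8>();
    const IColumn & getNestedColumn() const { return *nested; }
    const ColumnUInt8 & getNullMapColumn() const { return *null_map; }
};

// Stand-in for checkAndGetColumn: a checked downcast returning nullptr on mismatch.
template <typename T>
const T * checkAndGetColumnSketch(const IColumn * column)
{
    return dynamic_cast<const T *>(column);
}

void splitKeyColumns(const std::vector<const IColumn *> & key_columns,
                     std::vector<const IColumn *> & actual_columns,
                     std::vector<const ColumnUInt8 *> & null_maps)
{
    for (const auto * col : key_columns)
    {
        // The test and the cast are one expression, as in the patched code.
        if (const auto * nullable_col = checkAndGetColumnSketch<ColumnNullable>(col))
        {
            actual_columns.push_back(&nullable_col->getNestedColumn());
            null_maps.push_back(&nullable_col->getNullMapColumn());
        }
        else
        {
            actual_columns.push_back(col);
            null_maps.push_back(nullptr);
        }
    }
}
```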
--- a/dbms/src/Common/ErrorCodes.cpp
+++ b/dbms/src/Common/ErrorCodes.cpp
@ -430,6 +430,8 @@ namespace ErrorCodes
    extern const int MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES = 453;
    extern const int OPENSSL_ERROR = 454;
    extern const int SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY = 455;
+    extern const int UNKNOWN_QUERY_PARAMETER = 456;
+    extern const int BAD_QUERY_PARAMETER = 457;
    extern const int CANNOT_UNLINK = 458;

    extern const int KEEPER_EXCEPTION = 999;
--- a/dbms/src/Common/Exception.h
+++ b/dbms/src/Common/Exception.h
@ -6,7 +6,7 @@

 #include <Poco/Exception.h>

-#include <common/Backtrace.h>
+#include <common/StackTrace.h>

 namespace Poco { class Logger; }

@ -39,10 +39,10 @@ public:
    /// Add something to the existing message.
    void addMessage(const std::string & arg) { extendedMessage(arg); }

-    const Backtrace & getStackTrace() const { return trace; }
+    const StackTrace & getStackTrace() const { return trace; }

 private:
-    Backtrace trace;
+    StackTrace trace;

    const char * className() const throw() override { return "DB::Exception"; }
 };
--- a/dbms/src/Common/LFAllocator.cpp
+++ b/dbms/src/Common/LFAllocator.cpp
@ -1,53 +0,0 @@
-#include <Common/config.h>
-
-#if USE_LFALLOC
-#include "LFAllocator.h"
-
-#include <cstring>
-#include <lf_allocX64.h>
-
-namespace DB
-{
-
-void * LFAllocator::alloc(size_t size, size_t alignment)
-{
-    if (alignment == 0)
-        return LFAlloc(size);
-    else
-    {
-        void * ptr;
-        int res = LFPosixMemalign(&ptr, alignment, size);
-        return res ? nullptr : ptr;
-    }
-}
-
-void LFAllocator::free(void * buf, size_t)
-{
-    LFFree(buf);
-}
-
-void * LFAllocator::realloc(void * old_ptr, size_t, size_t new_size, size_t alignment)
-{
-    if (old_ptr == nullptr)
-    {
-        void * result = LFAllocator::alloc(new_size, alignment);
-        return result;
-    }
-    if (new_size == 0)
-    {
-        LFFree(old_ptr);
-        return nullptr;
-    }
-
-    void * new_ptr = LFAllocator::alloc(new_size, alignment);
-    if (new_ptr == nullptr)
-        return nullptr;
-    size_t old_size = LFGetSize(old_ptr);
-    memcpy(new_ptr, old_ptr, ((old_size < new_size) ? old_size : new_size));
-    LFFree(old_ptr);
-    return new_ptr;
-}
-
-}
-
-#endif
--- a/dbms/src/Common/LFAllocator.h
+++ b/dbms/src/Common/LFAllocator.h
@ -1,22 +0,0 @@
-#pragma once
-
-#include <Common/config.h>
-
-#if !USE_LFALLOC
-#error "do not include this file until USE_LFALLOC is set to 1"
-#endif
-
-#include <cstddef>
-
-namespace DB
-{
-struct LFAllocator
-{
-    static void * alloc(size_t size, size_t alignment = 0);
-
-    static void free(void * buf, size_t);
-
-    static void * realloc(void * buf, size_t, size_t new_size, size_t alignment = 0);
-};
-
-}
--- a/dbms/src/Common/MiAllocator.cpp
+++ b/dbms/src/Common/MiAllocator.cpp
@ -0,0 +1,43 @@
+#include <Common/config.h>
+
+#if USE_MIMALLOC
+
+#include "MiAllocator.h"
+#include <mimalloc.h>
+
+namespace DB
+{
+
+void * MiAllocator::alloc(size_t size, size_t alignment)
+{
+    if (alignment == 0)
+        return mi_malloc(size);
+    else
+        return mi_malloc_aligned(size, alignment);
+}
+
+void MiAllocator::free(void * buf, size_t)
+{
+    mi_free(buf);
+}
+
+void * MiAllocator::realloc(void * old_ptr, size_t, size_t new_size, size_t alignment)
+{
+    if (old_ptr == nullptr)
+        return alloc(new_size, alignment);
+
+    if (new_size == 0)
+    {
+        mi_free(old_ptr);
+        return nullptr;
+    }
+
+    if (alignment == 0)
+        return mi_realloc(old_ptr, new_size);
+
+    return mi_realloc_aligned(old_ptr, new_size, alignment);
+}
+
+}
+
+#endif
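`mi_realloc` takes the new size as its second argument, so in the `alignment == 0` branch passing `alignment` would shrink every block to zero bytes. The dispatch logic can be checked without mimalloc using a hypothetical stand-in built on `malloc`/`realloc` and POSIX `posix_memalign`. Because standard C has no aligned realloc, the aligned path copies `min(old_size, new_size)` bytes, much like the removed `LFAllocator` did.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>   // posix_memalign (POSIX)

// Hypothetical stand-in for MiAllocator on top of the C/POSIX allocator.
// Same dispatch: null -> alloc, 0 -> free, otherwise resize to new_size.
struct MallocLikeAllocator
{
    static void * alloc(size_t size, size_t alignment = 0)
    {
        if (alignment == 0)
            return std::malloc(size);
        void * ptr = nullptr;
        return posix_memalign(&ptr, alignment, size) == 0 ? ptr : nullptr;
    }

    static void free(void * buf, size_t) { std::free(buf); }

    static void * realloc(void * old_ptr, size_t old_size, size_t new_size, size_t alignment = 0)
    {
        if (old_ptr == nullptr)
            return alloc(new_size, alignment);

        if (new_size == 0)
        {
            std::free(old_ptr);
            return nullptr;
        }

        // The new size, never the alignment, goes to the resize call.
        if (alignment == 0)
            return std::realloc(old_ptr, new_size);

        // No aligned realloc in standard C: allocate, copy, release.
        void * new_ptr = alloc(new_size, alignment);
        if (new_ptr == nullptr)
            return nullptr;
        std::memcpy(new_ptr, old_ptr, old_size < new_size ? old_size : new_size);
        std::free(old_ptr);
        return new_ptr;
    }
};
```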
--- a/dbms/src/Common/MiAllocator.h
+++ b/dbms/src/Common/MiAllocator.h
@ -0,0 +1,28 @@
+#pragma once
+
+#include <Common/config.h>
+
+#if !USE_MIMALLOC
+#error "do not include this file until USE_MIMALLOC is set to 1"
+#endif
+
+#include <cstddef>
+
+namespace DB
+{
+
+/*
+ * This is a separate allocator based on mimalloc (Microsoft malloc).
+ * It can be used alongside the main allocator (for example, for caches) to catch heap corruption and vulnerabilities.
+ * We use mimalloc's MI_SECURE mode to achieve this.
+ */
+struct MiAllocator
+{
+    static void * alloc(size_t size, size_t alignment = 0);
+
+    static void free(void * buf, size_t);
+
+    static void * realloc(void * old_ptr, size_t, size_t new_size, size_t alignment = 0);
+};
+
+}
--- a/dbms/src/Common/PODArray.h
+++ b/dbms/src/Common/PODArray.h
@ -45,7 +45,7 @@ inline constexpr size_t integerRoundUp(size_t value, size_t dividend)
  * Only part of the std::vector interface is supported.
  *
  * The default constructor creates an empty object that does not allocate memory.
-  * Then the memory is allocated at least INITIAL_SIZE bytes.
+  * Then at least initial_bytes bytes of memory are allocated.
  *
  * If you insert elements with push_back, without making a `reserve`, then PODArray is about 2.5 times faster than std::vector.
  *
@ -74,7 +74,7 @@ extern const char EmptyPODArray[EmptyPODArraySize];
 /** Base class that depend only on size of element, not on element itself.
  * You can static_cast to this class if you want to insert some data regardless to the actual type T.
  */
-template <size_t ELEMENT_SIZE, size_t INITIAL_SIZE, typename TAllocator, size_t pad_right_, size_t pad_left_>
+template <size_t ELEMENT_SIZE, size_t initial_bytes, typename TAllocator, size_t pad_right_, size_t pad_left_>
 class PODArrayBase : private boost::noncopyable, private TAllocator    /// empty base optimization
 {
 protected:
@ -161,7 +161,8 @@ protected:
        {
            // The allocated memory should be multiplication of ELEMENT_SIZE to hold the element, otherwise,
            // memory issue such as corruption could appear in edge case.
-            realloc(std::max(((INITIAL_SIZE - 1) / ELEMENT_SIZE + 1) * ELEMENT_SIZE, minimum_memory_for_elements(1)),
+            realloc(std::max(integerRoundUp(initial_bytes, ELEMENT_SIZE),
+                             minimum_memory_for_elements(1)),
                    std::forward<TAllocatorParams>(allocator_params)...);
        }
        else
@ -257,11 +258,11 @@ public:
    }
 };

-template <typename T, size_t INITIAL_SIZE = 4096, typename TAllocator = Allocator<false>, size_t pad_right_ = 0, size_t pad_left_ = 0>
-class PODArray : public PODArrayBase<sizeof(T), INITIAL_SIZE, TAllocator, pad_right_, pad_left_>
+template <typename T, size_t initial_bytes = 4096, typename TAllocator = Allocator<false>, size_t pad_right_ = 0, size_t pad_left_ = 0>
+class PODArray : public PODArrayBase<sizeof(T), initial_bytes, TAllocator, pad_right_, pad_left_>
 {
 protected:
-    using Base = PODArrayBase<sizeof(T), INITIAL_SIZE, TAllocator, pad_right_, pad_left_>;
+    using Base = PODArrayBase<sizeof(T), initial_bytes, TAllocator, pad_right_, pad_left_>;

    T * t_start()                      { return reinterpret_cast<T *>(this->c_start); }
    T * t_end()                        { return reinterpret_cast<T *>(this->c_end); }
@ -618,17 +619,23 @@ public:
    }
 };

-template <typename T, size_t INITIAL_SIZE, typename TAllocator, size_t pad_right_>
-void swap(PODArray<T, INITIAL_SIZE, TAllocator, pad_right_> & lhs, PODArray<T, INITIAL_SIZE, TAllocator, pad_right_> & rhs)
+template <typename T, size_t initial_bytes, typename TAllocator, size_t pad_right_>
+void swap(PODArray<T, initial_bytes, TAllocator, pad_right_> & lhs, PODArray<T, initial_bytes, TAllocator, pad_right_> & rhs)
 {
    lhs.swap(rhs);
 }

 /** For columns. Padding is enough to read and write xmm-register at the address of the last element. */
-template <typename T, size_t INITIAL_SIZE = 4096, typename TAllocator = Allocator<false>>
-using PaddedPODArray = PODArray<T, INITIAL_SIZE, TAllocator, 15, 16>;
+template <typename T, size_t initial_bytes = 4096, typename TAllocator = Allocator<false>>
+using PaddedPODArray = PODArray<T, initial_bytes, TAllocator, 15, 16>;

-template <typename T, size_t stack_size_in_bytes>
-using PODArrayWithStackMemory = PODArray<T, 0, AllocatorWithStackMemory<Allocator<false>, integerRoundUp(stack_size_in_bytes, sizeof(T))>>;
+/** A helper for declaring PODArray that uses inline memory.
+  * The initial size is set to use all the inline bytes, since using less would
+  * only add some extra allocation calls.
+  */
+template <typename T, size_t inline_bytes,
+          size_t rounded_bytes = integerRoundUp(inline_bytes, sizeof(T))>
+using PODArrayWithStackMemory = PODArray<T, rounded_bytes,
+    AllocatorWithStackMemory<Allocator<false>, rounded_bytes>>;

 }
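The first allocation is now sized with `integerRoundUp(initial_bytes, ELEMENT_SIZE)`, and `PODArrayWithStackMemory` rounds the inline buffer the same way so that it holds a whole number of elements. The sketch assumes `integerRoundUp` has the usual round-up-to-multiple body and models `minimum_memory_for_elements(n)` as element bytes plus both paddings; `minimumMemoryForElements` and `firstAllocationBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumed body of integerRoundUp: round `value` up to a multiple of `dividend`.
constexpr size_t integerRoundUp(size_t value, size_t dividend)
{
    return ((value + dividend - 1) / dividend) * dividend;
}

// Hypothetical simplification of minimum_memory_for_elements(n).
constexpr size_t minimumMemoryForElements(size_t n, size_t element_size,
                                          size_t pad_left, size_t pad_right)
{
    return n * element_size + pad_left + pad_right;
}

// Bytes requested on the first allocation of a PODArray-like container.
constexpr size_t firstAllocationBytes(size_t initial_bytes, size_t element_size,
                                      size_t pad_left, size_t pad_right)
{
    const size_t rounded = integerRoundUp(initial_bytes, element_size);
    const size_t minimum = minimumMemoryForElements(1, element_size, pad_left, pad_right);
    return rounded > minimum ? rounded : minimum;
}

// A 64-byte inline buffer for 12-byte elements is rounded up to 72 bytes,
// i.e. exactly six elements; an initial size of 0 stays 0.
static_assert(integerRoundUp(64, 12) == 72);
static_assert(integerRoundUp(0, 12) == 0);
```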
--- a/dbms/src/Common/QueryProfiler.h
+++ b/dbms/src/Common/QueryProfiler.h
@ -1,7 +1,7 @@
 #pragma once

 #include <common/Pipe.h>
-#include <common/Backtrace.h>
+#include <common/StackTrace.h>
 #include <common/logger_useful.h>
 #include <Common/CurrentThread.h>
 #include <IO/WriteHelpers.h>
@ -43,11 +43,11 @@ namespace
        const std::string & query_id = CurrentThread::getQueryId();

        const auto signal_context = *reinterpret_cast<ucontext_t *>(context);
-        const Backtrace backtrace(signal_context);
+        const StackTrace stack_trace(signal_context);

        DB::writeIntBinary(false, out);
        DB::writeStringBinary(query_id, out);
-        DB::writePODBinary(backtrace, out);
+        DB::writePODBinary(stack_trace, out);
        DB::writeIntBinary(timer_type, out);
        out.next();
    }
--- a/dbms/src/Common/ThreadPool.cpp
+++ b/dbms/src/Common/ThreadPool.cpp
@ -30,10 +30,18 @@ template <typename Thread>
 template <typename ReturnType>
 ReturnType ThreadPoolImpl<Thread>::scheduleImpl(Job job, int priority, std::optional<uint64_t> wait_microseconds)
 {
-    auto on_error = []
+    auto on_error = [&]
    {
        if constexpr (std::is_same_v<ReturnType, void>)
+        {
+            if (first_exception)
+            {
+                std::exception_ptr exception;
+                std::swap(exception, first_exception);
+                std::rethrow_exception(exception);
+            }
            throw DB::Exception("Cannot schedule a task", DB::ErrorCodes::CANNOT_SCHEDULE_TASK);
+        }
        else
            return false;
    };
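With `on_error` now capturing by reference, a failed `schedule` rethrows the first exception stored from an earlier job, swapping it out so that it is reported only once. The sketch below is a hypothetical, synchronous model of that behaviour (`RethrowingScheduler` runs jobs inline instead of on worker threads) and is meant only to show the `exception_ptr` swap-and-rethrow pattern exercised by `gtest_thread_pool_schedule_exception.cpp`.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical synchronous model: a job's exception is remembered, and the
// next schedule() call rethrows it instead of running anything.
class RethrowingScheduler
{
    std::mutex mutex;
    std::exception_ptr first_exception;

public:
    void schedule(const std::function<void()> & job)
    {
        {
            std::lock_guard lock(mutex);
            if (first_exception)
            {
                // Swap out first so the same exception is reported only once.
                std::exception_ptr exception;
                std::swap(exception, first_exception);
                std::rethrow_exception(exception);
            }
        }

        try
        {
            job();
        }
        catch (...)
        {
            // Keep only the first failure, as the pool does.
            std::lock_guard lock(mutex);
            if (!first_exception)
                first_exception = std::current_exception();
        }
    }
};
```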
--- a/dbms/src/Common/config.h.in
+++ b/dbms/src/Common/config.h.in
@ -8,7 +8,6 @@
 #cmakedefine01 USE_CPUID
 #cmakedefine01 USE_CPUINFO
 #cmakedefine01 USE_BROTLI
-#cmakedefine01 USE_LFALLOC
-#cmakedefine01 USE_LFALLOC_RANDOM_HINT
+#cmakedefine01 USE_MIMALLOC

 #cmakedefine01 CLICKHOUSE_SPLIT_BINARY
--- a/dbms/src/Common/formatIPv6.cpp
+++ b/dbms/src/Common/formatIPv6.cpp
@ -10,7 +10,8 @@ namespace DB
 {

 // To be used in formatIPv4, maps a byte to it's string form prefixed with length (so save strlen call).
-extern const char one_byte_to_string_lookup_table[256][4] = {
+extern const char one_byte_to_string_lookup_table[256][4] =
+{
    {1, '0'}, {1, '1'}, {1, '2'}, {1, '3'}, {1, '4'}, {1, '5'}, {1, '6'}, {1, '7'}, {1, '8'}, {1, '9'},
    {2, '1', '0'}, {2, '1', '1'}, {2, '1', '2'}, {2, '1', '3'}, {2, '1', '4'}, {2, '1', '5'}, {2, '1', '6'}, {2, '1', '7'}, {2, '1', '8'}, {2, '1', '9'},
    {2, '2', '0'}, {2, '2', '1'}, {2, '2', '2'}, {2, '2', '3'}, {2, '2', '4'}, {2, '2', '5'}, {2, '2', '6'}, {2, '2', '7'}, {2, '2', '8'}, {2, '2', '9'},
@ -152,7 +153,7 @@ void formatIPv6(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_
    }

    /// Was it a trailing run of 0x00's?
-    if (best.base != -1 && (best.base + best.len) == words.size())
+    if (best.base != -1 && size_t(best.base + best.len) == words.size())
        *dst++ = ':';

    *dst++ = '\0';
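`one_byte_to_string_lookup_table` stores each byte's decimal digits prefixed by their count, so the formatter can copy a known number of characters without calling `strlen`. The following is a hypothetical sketch of that idea, building the same `{length, digits...}` layout at runtime and formatting four octets in order; it is not ClickHouse's `formatIPv4`, whose byte order and buffer handling may differ.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Entry b holds {number of digits, digit chars...}, mirroring the table layout.
using ByteToString = std::array<std::array<char, 4>, 256>;

ByteToString makeByteTable()
{
    ByteToString table{};
    for (int b = 0; b < 256; ++b)
    {
        std::string digits = std::to_string(b);
        table[b][0] = static_cast<char>(digits.size());
        for (size_t i = 0; i < digits.size(); ++i)
            table[b][i + 1] = digits[i];
    }
    return table;
}

std::string formatIPv4Sketch(const uint8_t (&octets)[4])
{
    static const ByteToString table = makeByteTable();
    std::string out;
    for (int i = 0; i < 4; ++i)
    {
        const auto & entry = table[octets[i]];
        // The length comes from entry[0], so no strlen is needed.
        out.append(&entry[1], static_cast<size_t>(entry[0]));
        if (i != 3)
            out.push_back('.');
    }
    return out;
}
```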
--- a/dbms/src/Common/tests/CMakeLists.txt
+++ b/dbms/src/Common/tests/CMakeLists.txt
@ -41,9 +41,6 @@ target_link_libraries (compact_array PRIVATE clickhouse_common_io ${Boost_FILESY
 add_executable (radix_sort radix_sort.cpp)
 target_link_libraries (radix_sort PRIVATE clickhouse_common_io)

-add_executable (shell_command_test shell_command_test.cpp)
-target_link_libraries (shell_command_test PRIVATE clickhouse_common_io)
-
 add_executable (arena_with_free_lists arena_with_free_lists.cpp)
 target_link_libraries (arena_with_free_lists PRIVATE clickhouse_compression clickhouse_common_io)

@ -53,15 +50,6 @@ target_link_libraries (pod_array PRIVATE clickhouse_common_io)
 add_executable (thread_creation_latency thread_creation_latency.cpp)
 target_link_libraries (thread_creation_latency PRIVATE clickhouse_common_io)

-add_executable (thread_pool thread_pool.cpp)
-target_link_libraries (thread_pool PRIVATE clickhouse_common_io)
-
-add_executable (thread_pool_2 thread_pool_2.cpp)
-target_link_libraries (thread_pool_2 PRIVATE clickhouse_common_io)
-
-add_executable (thread_pool_3 thread_pool_3.cpp)
-target_link_libraries (thread_pool_3 PRIVATE clickhouse_common_io)
-
 add_executable (multi_version multi_version.cpp)
 target_link_libraries (multi_version PRIVATE clickhouse_common_io)
 add_check(multi_version)
--- a/dbms/src/Common/tests/gtest_shell_command.cpp
+++ b/dbms/src/Common/tests/gtest_shell_command.cpp
@ -0,0 +1,72 @@
+#include <iostream>
+#include <Core/Types.h>
+#include <Common/ShellCommand.h>
+#include <IO/copyData.h>
+#include <IO/WriteBufferFromFileDescriptor.h>
+#include <IO/ReadBufferFromString.h>
+#include <IO/ReadHelpers.h>
+
+#include <chrono>
+#include <thread>
+
+#pragma GCC diagnostic ignored "-Wsign-compare"
+#ifdef __clang__
+    #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
+    #pragma clang diagnostic ignored "-Wundef"
+#endif
+#include <gtest/gtest.h>
+
+
+using namespace DB;
+
+
+TEST(ShellCommand, Execute)
+{
+    auto command = ShellCommand::execute("echo 'Hello, world!'");
+
+    std::string res;
+    readStringUntilEOF(res, command->out);
+    command->wait();
+
+    EXPECT_EQ(res, "Hello, world!\n");
+}
+
+TEST(ShellCommand, ExecuteDirect)
+{
+    auto command = ShellCommand::executeDirect("/bin/echo", {"Hello, world!"});
+
+    std::string res;
+    readStringUntilEOF(res, command->out);
+    command->wait();
+
+    EXPECT_EQ(res, "Hello, world!\n");
+}
+
+TEST(ShellCommand, ExecuteWithInput)
+{
+    auto command = ShellCommand::execute("cat");
+
+    String in_str = "Hello, world!\n";
+    ReadBufferFromString in(in_str);
+    copyData(in, command->in);
+    command->in.close();
+
+    std::string res;
+    readStringUntilEOF(res, command->out);
+    command->wait();
+
+    EXPECT_EQ(res, "Hello, world!\n");
+}
+
+TEST(ShellCommand, AutoWait)
+{
+    // <defunct> hunting:
+    for (int i = 0; i < 1000; ++i)
+    {
+        auto command = ShellCommand::execute("echo " + std::to_string(i));
+        //command->wait(); // now automatic
+    }
+
+    // std::cerr << "inspect me: ps auxwwf" << "\n";
+    // std::this_thread::sleep_for(std::chrono::seconds(100));
+}
--- a/dbms/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp
+++ b/dbms/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp
@ -1,11 +1,18 @@
 #include <Common/ThreadPool.h>

+#pragma GCC diagnostic ignored "-Wsign-compare"
+#ifdef __clang__
+    #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
+    #pragma clang diagnostic ignored "-Wundef"
+#endif
+#include <gtest/gtest.h>
+
 /** Reproduces bug in ThreadPool.
  * It get stuck if we call 'wait' many times from many other threads simultaneously.
  */


-int main(int, char **)
+TEST(ThreadPool, ConcurrentWait)
 {
    auto worker = []
    {
@ -29,6 +36,4 @@ int main(int, char **)
        waiting_pool.schedule([&pool]{ pool.wait(); });

    waiting_pool.wait();
-
-    return 0;
 }
--- a/dbms/src/Common/tests/gtest_thread_pool_limit.cpp
+++ b/dbms/src/Common/tests/gtest_thread_pool_limit.cpp
@ -0,0 +1,32 @@
+#include <atomic>
+#include <iostream>
+#include <Common/ThreadPool.h>
+
+#pragma GCC diagnostic ignored "-Wsign-compare"
+#ifdef __clang__
+    #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
+    #pragma clang diagnostic ignored "-Wundef"
+#endif
+#include <gtest/gtest.h>
+
+/// Test for thread self-removal when the number of free threads in the pool is too large.
+/// Just checks that nothing weird happens.
+
+template <typename Pool>
+int test()
+{
+    Pool pool(10, 2, 10);
+
+    std::atomic<int> counter{0};
+    for (size_t i = 0; i < 10; ++i)
+        pool.schedule([&]{ ++counter; });
+    pool.wait();
+
+    return counter;
+}
+
+TEST(ThreadPool, ThreadRemoval)
+{
+    EXPECT_EQ(test<FreeThreadPool>(), 10);
+    EXPECT_EQ(test<ThreadPool>(), 10);
+}
--- a/dbms/src/Common/tests/gtest_thread_pool_loop.cpp
+++ b/dbms/src/Common/tests/gtest_thread_pool_loop.cpp
@ -2,10 +2,17 @@
 #include <iostream>
 #include <Common/ThreadPool.h>

+#pragma GCC diagnostic ignored "-Wsign-compare"
+#ifdef __clang__
+    #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
+    #pragma clang diagnostic ignored "-Wundef"
+#endif
+#include <gtest/gtest.h>

-int main(int, char **)
+
+TEST(ThreadPool, Loop)
 {
-    std::atomic<size_t> res{0};
+    std::atomic<int> res{0};

    for (size_t i = 0; i < 1000; ++i)
    {
@ -16,6 +23,5 @@ int main(int, char **)
        pool.wait();
    }

-    std::cerr << res << "\n";
-    return 0;
+    EXPECT_EQ(res, 16000);
 }
--- a/dbms/src/Common/tests/gtest_thread_pool_schedule_exception.cpp
+++ b/dbms/src/Common/tests/gtest_thread_pool_schedule_exception.cpp
@ -0,0 +1,38 @@
+#include <iostream>
+#include <stdexcept>
+#include <Common/ThreadPool.h>
+
+#pragma GCC diagnostic ignored "-Wsign-compare"
+#ifdef __clang__
+    #pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
+    #pragma clang diagnostic ignored "-Wundef"
+#endif
+#include <gtest/gtest.h>
+
+
+bool check()
+{
+    ThreadPool pool(10);
+
+    pool.schedule([]{ throw std::runtime_error("Hello, world!"); });
+
+    try
+    {
+        for (size_t i = 0; i < 100; ++i)
+            pool.schedule([]{});    /// An exception will be rethrown from this method.
+    }
+    catch (const std::runtime_error &)
+    {
+        return true;
+    }
+
+    pool.wait();
+
+    return false;
+}
+
+
+TEST(ThreadPool, ExceptionFromSchedule)
+{
+    EXPECT_TRUE(check());
+}
--- a/dbms/src/Common/tests/shell_command_test.cpp
+++ b/dbms/src/Common/tests/shell_command_test.cpp
@ -1,63 +0,0 @@
-#include <iostream>
-#include <Core/Types.h>
-#include <Common/ShellCommand.h>
-#include <IO/copyData.h>
-#include <IO/WriteBufferFromFileDescriptor.h>
-#include <IO/ReadBufferFromString.h>
-
-#include <chrono>
-#include <thread>
-
-using namespace DB;
-
-
-int main(int, char **)
-try
-{
-    {
-        auto command = ShellCommand::execute("echo 'Hello, world!'");
-
-        WriteBufferFromFileDescriptor out(STDOUT_FILENO);
-        copyData(command->out, out);
-
-        command->wait();
-    }
-
-    {
-        auto command = ShellCommand::executeDirect("/bin/echo", {"Hello, world!"});
-
-        WriteBufferFromFileDescriptor out(STDOUT_FILENO);
-        copyData(command->out, out);
-
-        command->wait();
-    }
-
-    {
-        auto command = ShellCommand::execute("cat");
-
-        String in_str = "Hello, world!\n";
-        ReadBufferFromString in(in_str);
-        copyData(in, command->in);
-        command->in.close();
-
-        WriteBufferFromFileDescriptor out(STDOUT_FILENO);
-        copyData(command->out, out);
-
-        command->wait();
-    }
-
-    // <defunct> hunting:
-    for (int i = 0; i < 1000; ++i)
-    {
-        auto command = ShellCommand::execute("echo " + std::to_string(i));
-        //command->wait(); // now automatic
-    }
-
-    // std::cerr << "inspect me: ps auxwwf" << "\n";
-    // std::this_thread::sleep_for(std::chrono::seconds(100));
-}
-catch (...)
-{
-    std::cerr << getCurrentExceptionMessage(false) << "\n";
-    return 1;
-}
--- a/dbms/src/Common/tests/thread_pool_3.cpp
+++ b/dbms/src/Common/tests/thread_pool_3.cpp
@ -1,27 +0,0 @@
-#include <mutex>
-#include <iostream>
-#include <Common/ThreadPool.h>
-
-/// Test for thread self-removal when number of free threads in pool is too large.
-/// Just checks that nothing weird happens.
-
-template <typename Pool>
-void test()
-{
-    Pool pool(10, 2, 10);
-
-    std::mutex mutex;
-    for (size_t i = 0; i < 10; ++i)
-        pool.schedule([&]{ std::lock_guard lock(mutex); std::cerr << '.'; });
-    pool.wait();
-}
-
-int main(int, char **)
-{
-    test<FreeThreadPool>();
-    std::cerr << '\n';
-    test<ThreadPool>();
-    std::cerr << '\n';
-
-    return 0;
-}
--- a/dbms/src/Compression/CompressionCodecDelta.cpp
+++ b/dbms/src/Compression/CompressionCodecDelta.cpp
@ -48,7 +48,7 @@ void compressDataForType(const char * source, UInt32 source_size, char * dest)
    while (source < source_end)
    {
        T curr_src = unalignedLoad<T>(source);
-        unalignedStore(dest, curr_src - prev_src);
+        unalignedStore<T>(dest, curr_src - prev_src);
        prev_src = curr_src;

        source += sizeof(T);
@ -67,7 +67,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest)
    while (source < source_end)
    {
        accumulator += unalignedLoad<T>(source);
-        unalignedStore(dest, accumulator);
+        unalignedStore<T>(dest, accumulator);

        source += sizeof(T);
        dest += sizeof(T);
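The explicit `unalignedStore<T>` matters because arithmetic on narrow integer types promotes to `int`: with a deduced type, `curr_src - prev_src` for `UInt8` would store 4 bytes where only 1 is reserved. The sketch assumes an `unalignedStore` whose value parameter is a non-deduced context, which forces every call site to name the type; both helpers and the `deltaEncode`/`deltaDecode` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// T appears only in a non-deduced context, so callers must write unalignedStore<T>.
template <typename T>
void unalignedStore(void * address, const typename std::enable_if<true, T>::type & src)
{
    std::memcpy(address, &src, sizeof(T));
}

template <typename T>
T unalignedLoad(const void * address)
{
    T res{};
    std::memcpy(&res, address, sizeof(T));
    return res;
}

// The difference of two uint8_t values has type int, not uint8_t.
static_assert(std::is_same_v<decltype(uint8_t{} - uint8_t{}), int>);

template <typename T>
std::vector<char> deltaEncode(const std::vector<T> & values)
{
    std::vector<char> out(values.size() * sizeof(T));
    char * dest = out.data();
    T prev{};
    for (T curr : values)
    {
        unalignedStore<T>(dest, curr - prev);   // converted back to T, wrapping modulo 2^bits
        prev = curr;
        dest += sizeof(T);
    }
    return out;
}

template <typename T>
std::vector<T> deltaDecode(const std::vector<char> & bytes)
{
    std::vector<T> out;
    T accumulator{};
    for (size_t off = 0; off + sizeof(T) <= bytes.size(); off += sizeof(T))
    {
        accumulator += unalignedLoad<T>(bytes.data() + off);
        out.push_back(accumulator);
    }
    return out;
}
```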
--- a/dbms/src/Compression/CompressionCodecDoubleDelta.cpp
+++ b/dbms/src/Compression/CompressionCodecDoubleDelta.cpp
@ -83,6 +83,7 @@ WriteSpec getWriteSpec(const T & value)
 template <typename T, typename DeltaType>
 UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)
 {
+    static_assert(std::is_unsigned_v<T> && std::is_signed_v<DeltaType>, "T must be an unsigned integer type, while DeltaType must be a signed integer type.");
    using UnsignedDeltaType = typename std::make_unsigned<DeltaType>::type;

    if (source_size % sizeof(T) != 0)
@ -90,7 +91,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)
    const char * source_end = source + source_size;

    const UInt32 items_count = source_size / sizeof(T);
-    unalignedStore(dest, items_count);
+    unalignedStore<UInt32>(dest, items_count);
    dest += sizeof(items_count);

    T prev_value{};
@ -99,7 +100,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)
    if (source < source_end)
    {
        prev_value = unalignedLoad<T>(source);
-        unalignedStore(dest, prev_value);
+        unalignedStore<T>(dest, prev_value);

        source += sizeof(prev_value);
        dest += sizeof(prev_value);
@ -109,7 +110,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)
    {
        const T curr_value = unalignedLoad<T>(source);
        prev_delta = static_cast<DeltaType>(curr_value - prev_value);
-        unalignedStore(dest, prev_delta);
+        unalignedStore<DeltaType>(dest, prev_delta);

        source += sizeof(curr_value);
        dest += sizeof(prev_delta);
@ -123,8 +124,8 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)
    {
        const T curr_value = unalignedLoad<T>(source);

-        const auto delta = curr_value - prev_value;
-        const DeltaType double_delta = static_cast<DeltaType>(delta - static_cast<T>(prev_delta));
+        const DeltaType delta = static_cast<DeltaType>(curr_value - prev_value);
+        const DeltaType double_delta = delta - prev_delta;

        prev_delta = delta;
        prev_value = curr_value;
@ -153,6 +154,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)
 template <typename T, typename DeltaType>
 void decompressDataForType(const char * source, UInt32 source_size, char * dest)
 {
+    static_assert(std::is_unsigned_v<T> && std::is_signed_v<DeltaType>, "T must be an unsigned integer type, while DeltaType must be a signed integer type.");
    const char * source_end = source + source_size;

    const UInt32 items_count = unalignedLoad<UInt32>(source);
@ -164,7 +166,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest)
    if (source < source_end)
    {
        prev_value = unalignedLoad<T>(source);
-        unalignedStore(dest, prev_value);
+        unalignedStore<T>(dest, prev_value);

        source += sizeof(prev_value);
        dest += sizeof(prev_value);
@ -173,8 +175,8 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest)
    if (source < source_end)
    {
        prev_delta = unalignedLoad<DeltaType>(source);
-        prev_value = static_cast<T>(prev_value + prev_delta);
-        unalignedStore(dest, prev_value);
+        prev_value = prev_value + static_cast<T>(prev_delta);
+        unalignedStore<T>(dest, prev_value);

        source += sizeof(prev_delta);
        dest += sizeof(prev_value);
@ -208,11 +210,11 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest)
        }
        // else if first bit is zero, no need to read more data.

-        const T curr_value = static_cast<T>(prev_value + prev_delta + double_delta);
-        unalignedStore(dest, curr_value);
+        const T curr_value = prev_value + static_cast<T>(prev_delta + double_delta);
+        unalignedStore<T>(dest, curr_value);
        dest += sizeof(curr_value);

-        prev_delta = curr_value - prev_value;
+        prev_delta = static_cast<DeltaType>(curr_value - prev_value);
        prev_value = curr_value;
    }
 }
--- a/dbms/src/Compression/CompressionCodecGorilla.cpp
+++ b/dbms/src/Compression/CompressionCodecGorilla.cpp
@ -94,7 +94,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)

    const UInt32 items_count = source_size / sizeof(T);

-    unalignedStore(dest, items_count);
+    unalignedStore<UInt32>(dest, items_count);
    dest += sizeof(items_count);

    T prev_value{};
@ -104,7 +104,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest)
    if (source < source_end)
    {
        prev_value = unalignedLoad<T>(source);
-        unalignedStore(dest, prev_value);
+        unalignedStore<T>(dest, prev_value);

        source += sizeof(prev_value);
        dest += sizeof(prev_value);
@ -166,7 +166,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest)
    if (source < source_end)
    {
        prev_value = unalignedLoad<T>(source);
-        unalignedStore(dest, prev_value);
+        unalignedStore<T>(dest, prev_value);

        source += sizeof(prev_value);
        dest += sizeof(prev_value);
@ -210,7 +210,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest)
        }
        // else: 0b0 prefix - use prev_value

-        unalignedStore(dest, curr_value);
+        unalignedStore<T>(dest, curr_value);
        dest += sizeof(curr_value);

        prev_xored_info = curr_xored_info;
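The T64 codec's `transpose64x8` and `reverseTranspose64x8` now work in place: the result is built in a local buffer and copied back over the input. The sketch below follows the same approach with hypothetical names. Sixty-four bytes become eight 64-bit words where bit `i` of word `b` is bit `b` of byte `i`, and the reverse transform restores the bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// In-place 64x8 bit transpose: build into a local buffer, then copy back.
void transpose64x8InPlace(uint64_t * src_dst)
{
    uint8_t src8[64];
    std::memcpy(src8, src_dst, sizeof(src8));
    uint64_t dst[8] = {};
    for (uint32_t i = 0; i < 64; ++i)
    {
        const uint64_t value = src8[i];
        for (uint32_t b = 0; b < 8; ++b)
            dst[b] |= ((value >> b) & 0x1) << i;
    }
    std::memcpy(src_dst, dst, sizeof(dst));
}

// Inverse: byte i collects bit i of each of the eight words.
void reverseTranspose64x8InPlace(uint64_t * src_dst)
{
    uint8_t dst8[64];
    for (uint32_t i = 0; i < 64; ++i)
    {
        uint8_t byte = 0;
        for (uint32_t b = 0; b < 8; ++b)
            byte |= static_cast<uint8_t>(((src_dst[b] >> i) & 0x1) << b);
        dst8[i] = byte;
    }
    std::memcpy(src_dst, dst8, sizeof(dst8));
}
```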
--- a/dbms/src/Compression/CompressionCodecT64.cpp
+++ b/dbms/src/Compression/CompressionCodecT64.cpp
@ -91,9 +91,10 @@ TypeIndex typeIdx(const DataTypePtr & data_type)
    return TypeIndex::Nothing;
 }

-void transpose64x8(const UInt64 * src, UInt64 * dst, UInt32)
+void transpose64x8(UInt64 * src_dst)
 {
-    auto * src8 = reinterpret_cast<const UInt8 *>(src);
+    auto * src8 = reinterpret_cast<const UInt8 *>(src_dst);
+    UInt64 dst[8] = {};

    for (UInt32 i = 0; i < 64; ++i)
    {
@ -107,142 +108,171 @@ void transpose64x8(const UInt64 * src, UInt64 * dst, UInt32)
        dst[6] |= ((value >> 6) & 0x1) << i;
        dst[7] |= ((value >> 7) & 0x1) << i;
    }
+
+    memcpy(src_dst, dst, 8 * sizeof(UInt64));
 }

-void revTranspose64x8(const UInt64 * src, UInt64 * dst, UInt32)
+void reverseTranspose64x8(UInt64 * src_dst)
 {
-    auto * dst8 = reinterpret_cast<UInt8 *>(dst);
+    UInt8 dst8[64];

    for (UInt32 i = 0; i < 64; ++i)
    {
-        dst8[i] = ((src[0] >> i) & 0x1)
-            | (((src[1] >> i) & 0x1) << 1)
-            | (((src[2] >> i) & 0x1) << 2)
-            | (((src[3] >> i) & 0x1) << 3)
-            | (((src[4] >> i) & 0x1) << 4)
-            | (((src[5] >> i) & 0x1) << 5)
-            | (((src[6] >> i) & 0x1) << 6)
-            | (((src[7] >> i) & 0x1) << 7);
+        dst8[i] = ((src_dst[0] >> i) & 0x1)
+            | (((src_dst[1] >> i) & 0x1) << 1)
+            | (((src_dst[2] >> i) & 0x1) << 2)
+            | (((src_dst[3] >> i) & 0x1) << 3)
+            | (((src_dst[4] >> i) & 0x1) << 4)
+            | (((src_dst[5] >> i) & 0x1) << 5)
+            | (((src_dst[6] >> i) & 0x1) << 6)
+            | (((src_dst[7] >> i) & 0x1) << 7);
    }
+
+    memcpy(src_dst, dst8, 8 * sizeof(UInt64));
 }

-template <typename _T>
-void transposeBytes(_T value, UInt64 * mx, UInt32 col)
+template <typename T>
+void transposeBytes(T value, UInt64 * matrix, UInt32 col)
 {
-    UInt8 * mx8 = reinterpret_cast<UInt8 *>(mx);
+    UInt8 * matrix8 = reinterpret_cast<UInt8 *>(matrix);
    const UInt8 * value8 = reinterpret_cast<const UInt8 *>(&value);

-    if constexpr (sizeof(_T) > 4)
+    if constexpr (sizeof(T) > 4)
    {
-        mx8[64 * 7 + col] = value8[7];
-        mx8[64 * 6 + col] = value8[6];
-        mx8[64 * 5 + col] = value8[5];
-        mx8[64 * 4 + col] = value8[4];
+        matrix8[64 * 7 + col] = value8[7];
+        matrix8[64 * 6 + col] = value8[6];
+        matrix8[64 * 5 + col] = value8[5];
+        matrix8[64 * 4 + col] = value8[4];
    }

-    if constexpr (sizeof(_T) > 2)
+    if constexpr (sizeof(T) > 2)
    {
-        mx8[64 * 3 + col] = value8[3];
-        mx8[64 * 2 + col] = value8[2];
+        matrix8[64 * 3 + col] = value8[3];
+        matrix8[64 * 2 + col] = value8[2];
    }

-    if constexpr (sizeof(_T) > 1)
-        mx8[64 * 1 + col] = value8[1];
+    if constexpr (sizeof(T) > 1)
+        matrix8[64 * 1 + col] = value8[1];

-    mx8[64 * 0 + col] = value8[0];
+    matrix8[64 * 0 + col] = value8[0];
 }

-template <typename _T>
-void revTransposeBytes(const UInt64 * mx, UInt32 col, _T & value)
+template <typename T>
+void reverseTransposeBytes(const UInt64 * matrix, UInt32 col, T & value)
 {
-    auto * mx8 = reinterpret_cast<const UInt8 *>(mx);
+    auto * matrix8 = reinterpret_cast<const UInt8 *>(matrix);

-    if constexpr (sizeof(_T) > 4)
+    if constexpr (sizeof(T) > 4)
    {
-        value |= UInt64(mx8[64 * 7 + col]) << (8 * 7);
-        value |= UInt64(mx8[64 * 6 + col]) << (8 * 6);
-        value |= UInt64(mx8[64 * 5 + col]) << (8 * 5);
-        value |= UInt64(mx8[64 * 4 + col]) << (8 * 4);
+        value |= UInt64(matrix8[64 * 7 + col]) << (8 * 7);
+        value |= UInt64(matrix8[64 * 6 + col]) << (8 * 6);
+        value |= UInt64(matrix8[64 * 5 + col]) << (8 * 5);
+        value |= UInt64(matrix8[64 * 4 + col]) << (8 * 4);
    }

-    if constexpr (sizeof(_T) > 2)
+    if constexpr (sizeof(T) > 2)
    {
-        value |= UInt32(mx8[64 * 3 + col]) << (8 * 3);
-        value |= UInt32(mx8[64 * 2 + col]) << (8 * 2);
+        value |= UInt32(matrix8[64 * 3 + col]) << (8 * 3);
+        value |= UInt32(matrix8[64 * 2 + col]) << (8 * 2);
    }

-    if constexpr (sizeof(_T) > 1)
-        value |= UInt32(mx8[64 * 1 + col]) << (8 * 1);
+    if constexpr (sizeof(T) > 1)
+        value |= UInt32(matrix8[64 * 1 + col]) << (8 * 1);

-    value |= UInt32(mx8[col]);
+    value |= UInt32(matrix8[col]);
+}
+
+
+template <typename T>
+void load(const char * src, T * buf, UInt32 tail = 64)
+{
+    memcpy(buf, src, tail * sizeof(T));
+}
+
+template <typename T>
+void store(const T * buf, char * dst, UInt32 tail = 64)
+{
+    memcpy(dst, buf, tail * sizeof(T));
+}
+
+template <typename T>
+void clear(T * buf)
+{
+    for (UInt32 i = 0; i < 64; ++i)
+        buf[i] = 0;
 }


 /// UIntX[64] -> UInt64[N] transposed matrix, N <= X
-template <typename _T>
-void transpose(const char * src, char * dst, UInt32 num_bits, UInt32 tail = 64)
+template <typename T, bool full = false>
+void transpose(const T * src, char * dst, UInt32 num_bits, UInt32 tail = 64)
 {
    UInt32 full_bytes = num_bits / 8;
    UInt32 part_bits = num_bits % 8;

-    UInt64 mx[64] = {};
-    const char * ptr = src;
-    for (UInt32 col = 0; col < tail; ++col, ptr += sizeof(_T))
+    UInt64 matrix[64] = {};
+    for (UInt32 col = 0; col < tail; ++col)
+        transposeBytes(src[col], matrix, col);
+
+    if constexpr (full)
    {
-        _T value = unalignedLoad<_T>(ptr);
-        transposeBytes(value, mx, col);
+        UInt64 * matrix_line = matrix;
+        for (UInt32 byte = 0; byte < full_bytes; ++byte, matrix_line += 8)
+            transpose64x8(matrix_line);
    }

    UInt32 full_size = sizeof(UInt64) * (num_bits - part_bits);
-    memcpy(dst, mx, full_size);
+    memcpy(dst, matrix, full_size);
    dst += full_size;

    /// transpose only partially filled last byte
    if (part_bits)
    {
-        UInt64 * partial = &mx[full_bytes * 8];
-        UInt64 res[8] = {};
-        transpose64x8(partial, res, part_bits);
-        memcpy(dst, res, part_bits * sizeof(UInt64));
+        UInt64 * matrix_line = &matrix[full_bytes * 8];
+        transpose64x8(matrix_line);
+        memcpy(dst, matrix_line, part_bits * sizeof(UInt64));
    }
 }

 /// UInt64[N] transposed matrix -> UIntX[64]
-template <typename _T, typename _MinMaxT = std::conditional_t<std::is_signed_v<_T>, Int64, UInt64>>
-void revTranspose(const char * src, char * dst, UInt32 num_bits, _MinMaxT min, _MinMaxT max [[maybe_unused]], UInt32 tail = 64)
+template <typename T, bool full = false>
+void reverseTranspose(const char * src, T * buf, UInt32 num_bits, UInt32 tail = 64)
 {
-    UInt64 mx[64] = {};
-    memcpy(mx, src, num_bits * sizeof(UInt64));
+    UInt64 matrix[64] = {};
+    memcpy(matrix, src, num_bits * sizeof(UInt64));

    UInt32 full_bytes = num_bits / 8;
    UInt32 part_bits = num_bits % 8;
-    UInt64 * partial = &mx[full_bytes * 8];
+
+    if constexpr (full)
+    {
+        UInt64 * matrix_line = matrix;
+        for (UInt32 byte = 0; byte < full_bytes; ++byte, matrix_line += 8)
+            reverseTranspose64x8(matrix_line);
+    }

    if (part_bits)
    {
-        UInt64 res[8] = {};
-        revTranspose64x8(partial, res, part_bits);
-        memcpy(partial, res, 8 * sizeof(UInt64));
+        UInt64 * matrix_line = &matrix[full_bytes * 8];
+        reverseTranspose64x8(matrix_line);
    }

-    _T upper_min = 0;
-    if (num_bits < 64)
-        upper_min = UInt64(min) >> num_bits << num_bits;
+    clear(buf);
+    for (UInt32 col = 0; col < tail; ++col)
+        reverseTransposeBytes(matrix, col, buf[col]);
+}

-    _T buf[64] = {};
-
-    if constexpr (std::is_signed_v<_T>)
+template <typename T, typename MinMaxT = std::conditional_t<std::is_signed_v<T>, Int64, UInt64>>
+void restoreUpperBits(T * buf, T upper_min, T upper_max [[maybe_unused]], T sign_bit [[maybe_unused]], UInt32 tail = 64)
+{
+    if constexpr (std::is_signed_v<T>)
    {
        /// Restore some data as negatives and others as positives
-        if (min < 0 && max >= 0 && num_bits < 64)
+        if (sign_bit)
        {
-            _T sign_bit = 1ull << (num_bits - 1);
-            _T upper_max = UInt64(max) >> num_bits << num_bits;
-
            for (UInt32 col = 0; col < tail; ++col)
            {
-                _T & value = buf[col];
-                revTransposeBytes(mx, col, value);
+                T & value = buf[col];

                if (value & sign_bit)
                    value |= upper_min;
@ -250,20 +280,12 @@ void revTranspose(const char * src, char * dst, UInt32 num_bits, _MinMaxT min, _
                    value |= upper_max;
            }

-            memcpy(dst, buf, tail * sizeof(_T));
            return;
        }
    }

    for (UInt32 col = 0; col < tail; ++col)
-    {
-        _T & value = buf[col];
-        revTransposeBytes(mx, col, value);
-
-        value |= upper_min;
-    }
-
-    memcpy(dst, buf, tail * sizeof(_T));
+        buf[col] |= upper_min;
 }


@ -289,16 +311,16 @@ UInt32 getValuableBitsNumber(Int64 min, Int64 max)
 }


-template <typename _T>
-void findMinMax(const char * src, UInt32 src_size, _T & min, _T & max)
+template <typename T>
+void findMinMax(const char * src, UInt32 src_size, T & min, T & max)
 {
-    min = unalignedLoad<_T>(src);
-    max = unalignedLoad<_T>(src);
+    min = unalignedLoad<T>(src);
+    max = unalignedLoad<T>(src);

    const char * end = src + src_size;
-    for (; src < end; src += sizeof(_T))
+    for (; src < end; src += sizeof(T))
    {
-        auto current = unalignedLoad<_T>(src);
+        auto current = unalignedLoad<T>(src);
        if (current < min)
            min = current;
        if (current > max)
@ -306,24 +328,27 @@ void findMinMax(const char * src, UInt32 src_size, _T & min, _T & max)
    }
 }

-template <typename _T>
+
+using Variant = CompressionCodecT64::Variant;
+
+template <typename T, bool full>
 UInt32 compressData(const char * src, UInt32 bytes_size, char * dst)
 {
-    using MinMaxType = std::conditional_t<std::is_signed_v<_T>, Int64, UInt64>;
+    using MinMaxType = std::conditional_t<std::is_signed_v<T>, Int64, UInt64>;

-    const UInt32 mx_size = 64;
-    const UInt32 header_size = 2 * sizeof(UInt64);
+    static constexpr const UInt32 matrix_size = 64;
+    static constexpr const UInt32 header_size = 2 * sizeof(UInt64);

-    if (bytes_size % sizeof(_T))
-        throw Exception("Cannot compress, data size " + toString(bytes_size) + " is not multiplier of " + toString(sizeof(_T)),
+    if (bytes_size % sizeof(T))
+        throw Exception("Cannot compress, data size " + toString(bytes_size) + " is not multiplier of " + toString(sizeof(T)),
                        ErrorCodes::CANNOT_COMPRESS);

-    UInt32 src_size = bytes_size / sizeof(_T);
-    UInt32 num_full = src_size / mx_size;
-    UInt32 tail = src_size % mx_size;
+    UInt32 src_size = bytes_size / sizeof(T);
+    UInt32 num_full = src_size / matrix_size;
+    UInt32 tail = src_size % matrix_size;

-    _T min, max;
-    findMinMax<_T>(src, bytes_size, min, max);
+    T min, max;
+    findMinMax<T>(src, bytes_size, min, max);
    MinMaxType min64 = min;
    MinMaxType max64 = max;

@ -338,11 +363,13 @@ UInt32 compressData(const char * src, UInt32 bytes_size, char * dst)
    if (!num_bits)
        return header_size;

-    UInt32 src_shift = sizeof(_T) * mx_size;
+    T buf[matrix_size];
+    UInt32 src_shift = sizeof(T) * matrix_size;
    UInt32 dst_shift = sizeof(UInt64) * num_bits;
    for (UInt32 i = 0; i < num_full; ++i)
    {
-        transpose<_T>(src, dst, num_bits);
+        load<T>(src, buf, matrix_size);
+        transpose<T, full>(buf, dst, num_bits);
        src += src_shift;
        dst += dst_shift;
    }
@ -351,29 +378,31 @@ UInt32 compressData(const char * src, UInt32 bytes_size, char * dst)

    if (tail)
    {
-        transpose<_T>(src, dst, num_bits, tail);
+        load<T>(src, buf, tail);
+        transpose<T, full>(buf, dst, num_bits, tail);
        dst_bytes += dst_shift;
    }

    return header_size + dst_bytes;
 }

-template <typename _T>
+template <typename T, bool full>
 void decompressData(const char * src, UInt32 bytes_size, char * dst, UInt32 uncompressed_size)
 {
-    using MinMaxType = std::conditional_t<std::is_signed_v<_T>, Int64, UInt64>;
+    using MinMaxType = std::conditional_t<std::is_signed_v<T>, Int64, UInt64>;

-    const UInt32 header_size = 2 * sizeof(UInt64);
+    static constexpr const UInt32 matrix_size = 64;
+    static constexpr const UInt32 header_size = 2 * sizeof(UInt64);

    if (bytes_size < header_size)
        throw Exception("Cannot decompress, data size " + toString(bytes_size) + " is less then T64 header",
                        ErrorCodes::CANNOT_DECOMPRESS);

-    if (uncompressed_size % sizeof(_T))
+    if (uncompressed_size % sizeof(T))
        throw Exception("Cannot decompress, unexpected uncompressed size " + toString(uncompressed_size),
                        ErrorCodes::CANNOT_DECOMPRESS);

-    UInt64 num_elements = uncompressed_size / sizeof(_T);
+    UInt64 num_elements = uncompressed_size / sizeof(T);
    MinMaxType min;
    MinMaxType max;

@ -388,60 +417,101 @@ void decompressData(const char * src, UInt32 bytes_size, char * dst, UInt32 unco
    UInt32 num_bits = getValuableBitsNumber(min, max);
    if (!num_bits)
    {
-        _T min_value = min;
-        for (UInt32 i = 0; i < num_elements; ++i, dst += sizeof(_T))
-            unalignedStore(dst, min_value);
+        T min_value = min;
+        for (UInt32 i = 0; i < num_elements; ++i, dst += sizeof(T))
+            unalignedStore<T>(dst, min_value);
        return;
    }

    UInt32 src_shift = sizeof(UInt64) * num_bits;
-    UInt32 dst_shift = sizeof(_T) * 64;
+    UInt32 dst_shift = sizeof(T) * matrix_size;

    if (!bytes_size || bytes_size % src_shift)
        throw Exception("Cannot decompress, data size " + toString(bytes_size) + " is not multiplier of " + toString(src_shift),
                        ErrorCodes::CANNOT_DECOMPRESS);

    UInt32 num_full = bytes_size / src_shift;
-    UInt32 tail = num_elements % 64;
+    UInt32 tail = num_elements % matrix_size;
    if (tail)
        --num_full;

+    T upper_min = 0;
+    T upper_max [[maybe_unused]] = 0;
+    T sign_bit [[maybe_unused]] = 0;
+    if (num_bits < 64)
+        upper_min = UInt64(min) >> num_bits << num_bits;
+
+    if constexpr (std::is_signed_v<T>)
+    {
+        if (min < 0 && max >= 0 && num_bits < 64)
+        {
+            sign_bit = 1ull << (num_bits - 1);
+            upper_max = UInt64(max) >> num_bits << num_bits;
+        }
+    }
+
+    T buf[matrix_size];
    for (UInt32 i = 0; i < num_full; ++i)
    {
-        revTranspose<_T>(src, dst, num_bits, min, max);
+        reverseTranspose<T, full>(src, buf, num_bits);
+        restoreUpperBits(buf, upper_min, upper_max, sign_bit);
+        store<T>(buf, dst, matrix_size);
        src += src_shift;
        dst += dst_shift;
    }

    if (tail)
-        revTranspose<_T>(src, dst, num_bits, min, max, tail);
+    {
+        reverseTranspose<T, full>(src, buf, num_bits, tail);
+        restoreUpperBits(buf, upper_min, upper_max, sign_bit, tail);
+        store<T>(buf, dst, tail);
+    }
+}
+
+template <typename T>
+UInt32 compressData(const char * src, UInt32 src_size, char * dst, Variant variant)
+{
+    if (variant == Variant::Bit)
+        return compressData<T, true>(src, src_size, dst);
+    return compressData<T, false>(src, src_size, dst);
+}
+
+template <typename T>
+void decompressData(const char * src, UInt32 src_size, char * dst, UInt32 uncompressed_size, Variant variant)
+{
+    if (variant == Variant::Bit)
+        decompressData<T, true>(src, src_size, dst, uncompressed_size);
+    else
+        decompressData<T, false>(src, src_size, dst, uncompressed_size);
 }

 }

+
 UInt32 CompressionCodecT64::doCompressData(const char * src, UInt32 src_size, char * dst) const
 {
-    memcpy(dst, &type_idx, 1);
+    UInt8 cookie = static_cast<UInt8>(type_idx) | (static_cast<UInt8>(variant) << 7);
+    memcpy(dst, &cookie, 1);
    dst += 1;

    switch (baseType(type_idx))
    {
        case TypeIndex::Int8:
-            return 1 + compressData<Int8>(src, src_size, dst);
+            return 1 + compressData<Int8>(src, src_size, dst, variant);
        case TypeIndex::Int16:
-            return 1 + compressData<Int16>(src, src_size, dst);
+            return 1 + compressData<Int16>(src, src_size, dst, variant);
        case TypeIndex::Int32:
-            return 1 + compressData<Int32>(src, src_size, dst);
+            return 1 + compressData<Int32>(src, src_size, dst, variant);
        case TypeIndex::Int64:
-            return 1 + compressData<Int64>(src, src_size, dst);
+            return 1 + compressData<Int64>(src, src_size, dst, variant);
        case TypeIndex::UInt8:
-            return 1 + compressData<UInt8>(src, src_size, dst);
+            return 1 + compressData<UInt8>(src, src_size, dst, variant);
        case TypeIndex::UInt16:
-            return 1 + compressData<UInt16>(src, src_size, dst);
+            return 1 + compressData<UInt16>(src, src_size, dst, variant);
        case TypeIndex::UInt32:
-            return 1 + compressData<UInt32>(src, src_size, dst);
+            return 1 + compressData<UInt32>(src, src_size, dst, variant);
        case TypeIndex::UInt64:
-            return 1 + compressData<UInt64>(src, src_size, dst);
+            return 1 + compressData<UInt64>(src, src_size, dst, variant);
        default:
            break;
    }
@ -454,32 +524,31 @@ void CompressionCodecT64::doDecompressData(const char * src, UInt32 src_size, ch
    if (!src_size)
        throw Exception("Connot decompress with T64", ErrorCodes::CANNOT_DECOMPRESS);

-    UInt8 saved_type_id = unalignedLoad<UInt8>(src);
+    UInt8 cookie = unalignedLoad<UInt8>(src);
    src += 1;
    src_size -= 1;

-    TypeIndex actual_type_id = type_idx;
-    if (actual_type_id == TypeIndex::Nothing)
-        actual_type_id = static_cast<TypeIndex>(saved_type_id);
+    auto saved_variant = static_cast<Variant>(cookie >> 7);
+    auto saved_type_id = static_cast<TypeIndex>(cookie & 0x7F);

-    switch (baseType(actual_type_id))
+    switch (baseType(saved_type_id))
    {
        case TypeIndex::Int8:
-            return decompressData<Int8>(src, src_size, dst, uncompressed_size);
+            return decompressData<Int8>(src, src_size, dst, uncompressed_size, saved_variant);
        case TypeIndex::Int16:
-            return decompressData<Int16>(src, src_size, dst, uncompressed_size);
+            return decompressData<Int16>(src, src_size, dst, uncompressed_size, saved_variant);
        case TypeIndex::Int32:
-            return decompressData<Int32>(src, src_size, dst, uncompressed_size);
+            return decompressData<Int32>(src, src_size, dst, uncompressed_size, saved_variant);
        case TypeIndex::Int64:
-            return decompressData<Int64>(src, src_size, dst, uncompressed_size);
+            return decompressData<Int64>(src, src_size, dst, uncompressed_size, saved_variant);
        case TypeIndex::UInt8:
-            return decompressData<UInt8>(src, src_size, dst, uncompressed_size);
+            return decompressData<UInt8>(src, src_size, dst, uncompressed_size, saved_variant);
        case TypeIndex::UInt16:
-            return decompressData<UInt16>(src, src_size, dst, uncompressed_size);
+            return decompressData<UInt16>(src, src_size, dst, uncompressed_size, saved_variant);
        case TypeIndex::UInt32:
-            return decompressData<UInt32>(src, src_size, dst, uncompressed_size);
+            return decompressData<UInt32>(src, src_size, dst, uncompressed_size, saved_variant);
        case TypeIndex::UInt64:
-            return decompressData<UInt64>(src, src_size, dst, uncompressed_size);
+            return decompressData<UInt64>(src, src_size, dst, uncompressed_size, saved_variant);
        default:
            break;
    }
@ -506,13 +575,33 @@ void registerCodecT64(CompressionCodecFactory & factory)
 {
    auto reg_func = [&](const ASTPtr & arguments, DataTypePtr type) -> CompressionCodecPtr
    {
+        Variant variant = Variant::Byte;
+
        if (arguments && !arguments->children.empty())
-            throw Exception("T64 codec should not have parameters", ErrorCodes::ILLEGAL_SYNTAX_FOR_CODEC_TYPE);
+        {
+            if (arguments->children.size() > 1)
+                throw Exception("T64 supports zero or one parameter, given " + std::to_string(arguments->children.size()),
+                                ErrorCodes::ILLEGAL_SYNTAX_FOR_CODEC_TYPE);
+
+            const auto children = arguments->children;
+            const auto * literal = children[0]->as<ASTLiteral>();
+            if (!literal)
+                throw Exception("Wrong modification for T64. Expected: 'bit' or 'byte'",
+                                ErrorCodes::ILLEGAL_CODEC_PARAMETER);
+            String name = literal->value.safeGet<String>();
+
+            if (name == "byte")
+                variant = Variant::Byte;
+            else if (name == "bit")
+                variant = Variant::Bit;
+            else
+                throw Exception("Wrong modification for T64: " + name, ErrorCodes::ILLEGAL_CODEC_PARAMETER);
+        }

        auto type_idx = typeIdx(type);
        if (type && type_idx == TypeIndex::Nothing)
            throw Exception("T64 codec is not supported for specified type", ErrorCodes::ILLEGAL_SYNTAX_FOR_CODEC_TYPE);
-        return std::make_shared<CompressionCodecT64>(type_idx);
+        return std::make_shared<CompressionCodecT64>(type_idx, variant);
    };

    factory.registerCompressionCodecWithType("T64", codecId(), reg_func);
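`transpose64x8` turns 64 bytes into 8 bit-planes: bit `i` of plane `k` is bit `k` of byte `i`. `reverseTranspose64x8` undoes this. With `Variant::Bit`, every full group of 8 rows is transposed this way. With `Variant::Byte`, only the last partially filled byte is transposed. The sketch below restates the two helpers from this file as compact loops and checks the round trip. The unrolled loop body is partly elided in the diff, so widening each byte into a 64-bit `value` before shifting is an assumption; it is required anyway for `<< i` to be valid up to `i == 63`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using UInt8 = std::uint8_t;
using UInt32 = std::uint32_t;
using UInt64 = std::uint64_t;

// 64 bytes (viewed as UInt64[8]) -> 8 bit-planes: bit i of plane k = bit k of byte i.
void transpose64x8(UInt64 * src_dst)
{
    const auto * src8 = reinterpret_cast<const UInt8 *>(src_dst);
    UInt64 dst[8] = {};

    for (UInt32 i = 0; i < 64; ++i)
    {
        UInt64 value = src8[i];  // widened so that "<< i" is valid for i up to 63
        for (UInt32 k = 0; k < 8; ++k)
            dst[k] |= ((value >> k) & 0x1) << i;
    }

    std::memcpy(src_dst, dst, 8 * sizeof(UInt64));
}

// Inverse: byte i is rebuilt from bit i of each of the 8 planes.
void reverseTranspose64x8(UInt64 * src_dst)
{
    UInt8 dst8[64];

    for (UInt32 i = 0; i < 64; ++i)
    {
        UInt64 byte = 0;
        for (UInt32 k = 0; k < 8; ++k)
            byte |= ((src_dst[k] >> i) & 0x1) << k;
        dst8[i] = static_cast<UInt8>(byte);
    }

    std::memcpy(src_dst, dst8, 8 * sizeof(UInt64));
}
```

When the 64 bytes only use their low bits, the upper planes come out as all-zero words. This is presumably what lets a following general-purpose compressor such as ZSTD shrink the `Bit` variant further.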
--- a/dbms/src/Compression/CompressionCodecT64.h
+++ b/dbms/src/Compression/CompressionCodecT64.h
@ -6,18 +6,35 @@
 namespace DB
 {

+/// Takes 64 integer values, builds a 64x64 bit matrix, transposes it and crops unused bits (most significant zeroes).
+/// For example, 64 UInt8 values that are all 0 or 1 are compressed into a single UInt64.
+/// Unused bits are detected from the min and max values of the data part, which are saved in the header during compression.
+/// Special case: for signed integer parts whose values cross zero, one more bit is stored to recover the sign of each value.
 class CompressionCodecT64 : public ICompressionCodec
 {
 public:
    static constexpr UInt32 HEADER_SIZE = 1 + 2 * sizeof(UInt64);
    static constexpr UInt32 MAX_COMPRESSED_BLOCK_SIZE = sizeof(UInt64) * 64;

-    CompressionCodecT64(TypeIndex type_idx_)
+    /// There are two compression variants:
+    /// Byte - transposes the bit matrix by bytes (only the last, partially filled byte is transposed by bits). This is the default.
+    /// Bit - full bit-transpose of the bit matrix. It uses more resources and leads to better compression with ZSTD (but worse with LZ4).
+    enum class Variant
+    {
+        Byte,
+        Bit
+    };
+
+    CompressionCodecT64(TypeIndex type_idx_, Variant variant_)
        : type_idx(type_idx_)
+        , variant(variant_)
    {}

    UInt8 getMethodByte() const override;
-    String getCodecDesc() const override { return "T64"; }
+    String getCodecDesc() const override
+    {
+        return String("T64") + ((variant == Variant::Byte) ? "" : "(\'bit\')");
+    }

    void useInfoAboutType(DataTypePtr data_type) override;

@ -33,6 +50,7 @@ protected:

 private:
    TypeIndex type_idx;
+    Variant variant;
 };

 }
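The class comment describes cropping unused bits via the min and max of each data part. The decompression side is visible in this diff: `upper_min = UInt64(min) >> num_bits << num_bits` is OR-ed back into every value by `restoreUpperBits`. The body of `getValuableBitsNumber` is not shown, so the bit count used below is an assumption. It takes the position of the highest bit in which min and max differ, which is consistent with that restore step. The helper names are hypothetical, and only the unsigned case is covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: every value in [min, max] shares the bits above the highest bit
// in which min and max differ, so only the bits up to that one need storing.
inline std::uint32_t valuableBitsUnsigned(std::uint64_t min, std::uint64_t max)
{
    std::uint64_t diff = min ^ max;
    std::uint32_t bits = 0;
    for (; diff; diff >>= 1)
        ++bits;
    return bits;  // 0 means all values equal min, so only the header is written
}

// What is kept per value before transposition: the low num_bits bits.
inline std::uint64_t cropLowBits(std::uint64_t value, std::uint32_t num_bits)
{
    return num_bits < 64 ? value & ((std::uint64_t(1) << num_bits) - 1) : value;
}

// Unsigned path of restoreUpperBits: OR back the upper bits shared with min.
inline std::uint64_t restoreUpper(std::uint64_t cropped, std::uint64_t min, std::uint32_t num_bits)
{
    std::uint64_t upper_min = num_bits < 64 ? min >> num_bits << num_bits : 0;
    return cropped | upper_min;
}
```

For values in `[1000, 1023]`, only 5 bits per value survive cropping. For UInt8 values that are all 0 or 1, a single bit per value remains, so 64 values fit in one UInt64 as the comment says. The signed case needs one extra bit for the sign, as described above, and is not modelled here.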
--- a/dbms/src/Compression/LZ4_decompress_faster.cpp
+++ b/dbms/src/Compression/LZ4_decompress_faster.cpp
@ -200,7 +200,7 @@ inline void copyOverlap8Shuffle(UInt8 * op, const UInt8 *& match, const size_t o
        0, 1, 2, 3, 4, 5, 6, 0,
    };

-    unalignedStore(op, vtbl1_u8(unalignedLoad<uint8x8_t>(match), unalignedLoad<uint8x8_t>(masks + 8 * offset)));
+    unalignedStore<uint8x8_t>(op, vtbl1_u8(unalignedLoad<uint8x8_t>(match), unalignedLoad<uint8x8_t>(masks + 8 * offset)));
    match += masks[offset];
 }

@ -328,10 +328,10 @@ inline void copyOverlap16Shuffle(UInt8 * op, const UInt8 *& match, const size_t
        0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,  0,
    };

-    unalignedStore(op,
+    unalignedStore<uint8x8_t>(op,
        vtbl2_u8(unalignedLoad<uint8x8x2_t>(match), unalignedLoad<uint8x8_t>(masks + 16 * offset)));

-    unalignedStore(op + 8,
+    unalignedStore<uint8x8_t>(op + 8,
        vtbl2_u8(unalignedLoad<uint8x8x2_t>(match), unalignedLoad<uint8x8_t>(masks + 16 * offset + 8)));

    match += masks[offset];
--- a/dbms/src/Core/Block.cpp
+++ b/dbms/src/Core/Block.cpp
@ -336,6 +336,7 @@ MutableColumns Block::mutateColumns()

 void Block::setColumns(MutableColumns && columns)
 {
+    /// TODO: assert if |columns| doesn't match |data|!
    size_t num_columns = data.size();
    for (size_t i = 0; i < num_columns; ++i)
        data[i].column = std::move(columns[i]);
@ -344,6 +345,7 @@ void Block::setColumns(MutableColumns && columns)

 void Block::setColumns(const Columns & columns)
 {
+    /// TODO: assert if |columns| doesn't match |data|!
    size_t num_columns = data.size();
    for (size_t i = 0; i < num_columns; ++i)
        data[i].column = columns[i];
@ -472,7 +474,7 @@ static ReturnType checkBlockStructure(const Block & lhs, const Block & rhs, cons
            return on_error("Block structure mismatch in " + context_description + " stream: different columns:\n"
                + lhs.dumpStructure() + "\n" + rhs.dumpStructure(), ErrorCodes::BLOCKS_HAVE_DIFFERENT_STRUCTURE);

-        if (actual.column->isColumnConst() && expected.column->isColumnConst())
+        if (isColumnConst(*actual.column) && isColumnConst(*expected.column))
        {
            Field actual_value = static_cast<const ColumnConst &>(*actual.column).getField();
            Field expected_value = static_cast<const ColumnConst &>(*expected.column).getField();
--- a/dbms/src/Core/DecimalComparison.h
+++ b/dbms/src/Core/DecimalComparison.h
@ -167,8 +167,8 @@ private:

        if constexpr (_actual)
        {
-            bool c0_is_const = c0->isColumnConst();
-            bool c1_is_const = c1->isColumnConst();
+            bool c0_is_const = isColumnConst(*c0);
+            bool c1_is_const = isColumnConst(*c1);

            if (c0_is_const && c1_is_const)
            {
--- a/dbms/src/Core/Defines.h
+++ b/dbms/src/Core/Defines.h
@ -88,7 +88,7 @@
 #define PLATFORM_NOT_SUPPORTED "The only supported platforms are x86_64 and AArch64, PowerPC (work in progress)"

 #if !defined(__x86_64__) && !defined(__aarch64__) && !defined(__PPC__)
-//    #error PLATFORM_NOT_SUPPORTED
+    #error PLATFORM_NOT_SUPPORTED
 #endif

 /// Check for presence of address sanitizer
@ -114,10 +114,12 @@
 #if defined(__clang__)
    #define NO_SANITIZE_UNDEFINED __attribute__((__no_sanitize__("undefined")))
    #define NO_SANITIZE_ADDRESS __attribute__((__no_sanitize__("address")))
+    #define NO_SANITIZE_THREAD __attribute__((__no_sanitize__("thread")))
 #else
    /// It does not work in GCC. GCC 7 cannot recognize this attribute and GCC 8 simply ignores it.
    #define NO_SANITIZE_UNDEFINED
    #define NO_SANITIZE_ADDRESS
+    #define NO_SANITIZE_THREAD
 #endif

 #if defined __GNUC__ && !defined __clang__
--- a/dbms/src/Core/Settings.h
+++ b/dbms/src/Core/Settings.h
@ -104,12 +104,12 @@ struct Settings : public SettingsCollection<Settings>
    M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.") \
    \
    M(SettingUInt64, merge_tree_min_rows_for_concurrent_read, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized.") \
-    M(SettingUInt64, merge_tree_min_bytes_for_concurrent_read, (100 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized.") \
+    M(SettingUInt64, merge_tree_min_bytes_for_concurrent_read, (24 * 10 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized.") \
    M(SettingUInt64, merge_tree_min_rows_for_seek, 0, "You can skip reading more than that number of rows at the price of one seek per file.") \
    M(SettingUInt64, merge_tree_min_bytes_for_seek, 0, "You can skip reading more than that number of bytes at the price of one seek per file.") \
    M(SettingUInt64, merge_tree_coarse_index_granularity, 8, "If the index segment can contain the required keys, divide it into as many parts and recursively check them.") \
-    M(SettingUInt64, merge_tree_max_rows_to_use_cache, (1024 * 1024), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
-    M(SettingUInt64, merge_tree_max_bytes_to_use_cache, (600 * 1024 * 1024), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
+    M(SettingUInt64, merge_tree_max_rows_to_use_cache, (128 * 8192), "The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
+    M(SettingUInt64, merge_tree_max_bytes_to_use_cache, (192 * 10 * 1024 * 1024), "The maximum number of bytes per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)") \
    \
    M(SettingBool, merge_tree_uniform_read_distribution, true, "Distribute read from MergeTree over threads evenly, ensuring stable average execution time of each thread within one read operation.") \
    \
@ -325,7 +325,7 @@ struct Settings : public SettingsCollection<Settings>
    M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only for 'mysql' table function.") \
    M(SettingBool, allow_experimental_data_skipping_indices, false, "If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.") \
    \
-    M(SettingBool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.") \
+    M(SettingBool, allow_hyperscan, 1, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.") \
    M(SettingBool, allow_simdjson, 1, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.") \
    \
    M(SettingUInt64, max_partitions_per_insert_block, 100, "Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.")
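The new `merge_tree_*` defaults in `Settings.h` are written as products. The compile-time checks below spell out the resulting numbers; the constant names are illustrative only. Note that `128 * 8192` equals the previous `1024 * 1024`. For `merge_tree_max_rows_to_use_cache`, only the way the default is written changes, not its value.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024 * 1024;

// merge_tree_min_bytes_for_concurrent_read: 100 MiB -> 24 * 10 MiB = 240 MiB
constexpr std::uint64_t min_bytes_for_concurrent_read = 24 * 10 * MiB;
static_assert(min_bytes_for_concurrent_read == 251658240);

// merge_tree_max_rows_to_use_cache: 128 * 8192 rows, the same value as the old 1024 * 1024
constexpr std::uint64_t max_rows_to_use_cache = 128 * 8192;
static_assert(max_rows_to_use_cache == 1024 * 1024);

// merge_tree_max_bytes_to_use_cache: 600 MiB -> 192 * 10 MiB = 1920 MiB
constexpr std::uint64_t max_bytes_to_use_cache = 192 * 10 * MiB;
static_assert(max_bytes_to_use_cache == 2013265920);
```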
--- a/dbms/src/Core/Types.h
+++ b/dbms/src/Core/Types.h
@ -1,8 +1,8 @@
 #pragma once

+#include <cstdint>
 #include <string>
 #include <vector>
-#include <cstdint>


 namespace DB
--- a/dbms/src/DataStreams/AddingDefaultsBlockInputStream.cpp
+++ b/dbms/src/DataStreams/AddingDefaultsBlockInputStream.cpp
@ -187,7 +187,7 @@ MutableColumnPtr AddingDefaultsBlockInputStream::mixColumns(const ColumnWithType
    {
        if (defaults_mask[i])
        {
-            if (col_defaults.column->isColumnConst())
+            if (isColumnConst(*col_defaults.column))
                column_mixed->insert((*col_defaults.column)[i]);
            else
                column_mixed->insertFrom(*col_defaults.column, i);
--- a/dbms/src/DataStreams/ConvertingBlockInputStream.cpp
+++ b/dbms/src/DataStreams/ConvertingBlockInputStream.cpp
@ -60,7 +60,7 @@ ConvertingBlockInputStream::ConvertingBlockInputStream(
                if (input_header.has(res_elem.name))
                    conversion[result_col_num] = input_header.getPositionByName(res_elem.name);
                else
-                    throw Exception("Cannot find column " + backQuoteIfNeed(res_elem.name) + " in source stream",
+                    throw Exception("Cannot find column " + backQuote(res_elem.name) + " in source stream",
                        ErrorCodes::THERE_IS_NO_COLUMN);
                break;
        }
@ -69,9 +69,9 @@ ConvertingBlockInputStream::ConvertingBlockInputStream(

        /// Check constants.

-        if (res_elem.column->isColumnConst())
+        if (isColumnConst(*res_elem.column))
        {
-            if (!src_elem.column->isColumnConst())
+            if (!isColumnConst(*src_elem.column))
                throw Exception("Cannot convert column " + backQuoteIfNeed(res_elem.name)
                    + " because it is non constant in source stream but must be constant in result",
                    ErrorCodes::BLOCKS_HAVE_DIFFERENT_STRUCTURE);
@ -103,7 +103,7 @@ Block ConvertingBlockInputStream::readImpl()

        ColumnPtr converted = castColumnWithDiagnostic(src_elem, res_elem, context);

-        if (src_elem.column->isColumnConst() && !res_elem.column->isColumnConst())
+        if (isColumnConst(*src_elem.column) && !isColumnConst(*res_elem.column))
            converted = converted->convertToFullColumnIfConst();

        res_elem.column = std::move(converted);
--- a/dbms/src/DataStreams/DistinctBlockInputStream.cpp
+++ b/dbms/src/DataStreams/DistinctBlockInputStream.cpp
@ -112,7 +112,7 @@ ColumnRawPtrs DistinctBlockInputStream::getKeyColumns(const Block & block) const
            : block.getByName(columns_names[i]).column;

        /// Ignore all constant columns.
-        if (!column->isColumnConst())
+        if (!isColumnConst(*column))
            column_ptrs.emplace_back(column.get());
    }

--- a/dbms/src/DataStreams/DistinctSortedBlockInputStream.cpp
+++ b/dbms/src/DataStreams/DistinctSortedBlockInputStream.cpp
@ -131,7 +131,7 @@ ColumnRawPtrs DistinctSortedBlockInputStream::getKeyColumns(const Block & block)
            : block.getByName(columns_names[i]).column;

        /// Ignore all constant columns.
-        if (!column->isColumnConst())
+        if (!isColumnConst(*column))
            column_ptrs.emplace_back(column.get());
    }

--- a/dbms/src/DataStreams/FilterBlockInputStream.cpp
+++ b/dbms/src/DataStreams/FilterBlockInputStream.cpp
@ -112,7 +112,7 @@ Block FilterBlockInputStream::readImpl()
        size_t first_non_constant_column = 0;
        for (size_t i = 0; i < columns; ++i)
        {
-            if (!res.safeGetByPosition(i).column->isColumnConst())
+            if (!isColumnConst(*res.safeGetByPosition(i).column))
            {
                first_non_constant_column = i;

@ -165,7 +165,7 @@ Block FilterBlockInputStream::readImpl()
            if (i == first_non_constant_column)
                continue;

-            if (current_column.column->isColumnConst())
+            if (isColumnConst(*current_column.column))
                current_column.column = current_column.column->cut(0, filtered_rows);
            else
                current_column.column = current_column.column->filter(*filter_and_holder.data, -1);
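In the second hunk, a constant column is not filtered row by row: it is cut to `filtered_rows` rows, because every row holds the same value. The sketch below models that distinction with a hypothetical `Column` struct and `applyFilter` function (not ClickHouse APIs). It assumes `filtered_rows` equals the number of non-zero bytes in the mask, as the surrounding code computes it from the first non-constant column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column model: either a constant (one value repeated
// const_size times) or a full vector of values.
struct Column
{
    bool is_const = false;
    int const_value = 0;
    std::size_t const_size = 0;
    std::vector<int> values;

    std::size_t size() const { return is_const ? const_size : values.size(); }
};

// Apply a 0/1 filter mask. A constant column only needs its row count reduced
// (the analogue of cut(0, filtered_rows)); a full column copies the rows whose
// mask byte is non-zero (the analogue of filter(mask, -1)).
inline Column applyFilter(const Column & column, const std::vector<std::uint8_t> & mask, std::size_t filtered_rows)
{
    Column res = column;
    if (column.is_const)
    {
        res.const_size = filtered_rows;
        return res;
    }
    res.values.clear();
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i])
            res.values.push_back(column.values[i]);
    return res;
}
```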
--- a/dbms/src/DataStreams/IBlockInputStream.cpp
+++ b/dbms/src/DataStreams/IBlockInputStream.cpp
@ -144,7 +144,7 @@ void IBlockInputStream::updateExtremes(Block & block)
        {
            const ColumnPtr & src = block.safeGetByPosition(i).column;

-            if (src->isColumnConst())
+            if (isColumnConst(*src))
            {
                /// Equal min and max.
                extremes_columns[i] = src->cloneResized(2);
@ -171,7 +171,7 @@ void IBlockInputStream::updateExtremes(Block & block)
        {
            ColumnPtr & old_extremes = extremes.safeGetByPosition(i).column;

-            if (old_extremes->isColumnConst())
+            if (isColumnConst(*old_extremes))
                continue;

            Field min_value = (*old_extremes)[0];
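The two `updateExtremes` hunks treat constant columns specially: for a constant source column the extremes are the same value twice ("Equal min and max"), and accumulated extremes that are constant are skipped during later updates. The sketch below restates that logic over plain integers. `Extremes`, `computeExtremes` and `mergeExtremes` are hypothetical names, and representing extremes as a (min, max) pair is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-column extremes, kept as a (min, max) pair.
struct Extremes
{
    int min;
    int max;
};

// A constant column holds one value in every row, so min == max; a full
// column is scanned for its bounds. `values` must be non-empty.
inline Extremes computeExtremes(bool is_const, const std::vector<int> & values)
{
    if (is_const)
        return {values.front(), values.front()};
    auto bounds = std::minmax_element(values.begin(), values.end());
    return {*bounds.first, *bounds.second};
}

// Fold a new block's extremes into the accumulated ones; accumulated extremes
// that came from a constant column are kept unchanged.
inline Extremes mergeExtremes(const Extremes & old_extremes, bool old_is_const, const Extremes & block_extremes)
{
    if (old_is_const)
        return old_extremes;
    return {std::min(old_extremes.min, block_extremes.min), std::max(old_extremes.max, block_extremes.max)};
}
```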
--- a/dbms/src/DataStreams/LimitByBlockInputStream.cpp
+++ b/dbms/src/DataStreams/LimitByBlockInputStream.cpp
@ -71,7 +71,7 @@ ColumnRawPtrs LimitByBlockInputStream::getKeyColumns(Block & block) const
        auto & column = block.getByName(name).column;

        /// Ignore all constant columns.
-        if (!column->isColumnConst())
+        if (!isColumnConst(*column))
            column_ptrs.emplace_back(column.get());
    }

--- a/dbms/src/DataStreams/MarkInCompressedFile.h
+++ b/dbms/src/DataStreams/MarkInCompressedFile.h
@ -7,8 +7,8 @@
 #include <Common/PODArray.h>

 #include <Common/config.h>
-#if USE_LFALLOC
-#include <Common/LFAllocator.h>
+#if USE_MIMALLOC
+#include <Common/MiAllocator.h>
 #endif

 namespace DB
@ -43,8 +43,8 @@ struct MarkInCompressedFile
    }

 };
-#if USE_LFALLOC
-using MarksInCompressedFile = PODArray<MarkInCompressedFile, 4096, LFAllocator>;
+#if USE_MIMALLOC
+using MarksInCompressedFile = PODArray<MarkInCompressedFile, 4096, MiAllocator>;
 #else
 using MarksInCompressedFile = PODArray<MarkInCompressedFile>;
 #endif
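This file swaps `LFAllocator` for `MiAllocator` (guarded by `USE_MIMALLOC`) as the third template argument of `PODArray`. Because the allocator is a template parameter, changing the memory backend is a one-line type-alias change. The sketch below shows that pattern with a hypothetical malloc-based allocator and a hypothetical `TinyPODArray`. The `alloc`/`free`/`realloc` interface is an assumption about the shape `PODArray` expects; a mimalloc-backed version would presumably call `mi_malloc`, `mi_realloc` and `mi_free` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical allocator with an alloc/free/realloc interface.
struct MallocAllocator
{
    void * alloc(std::size_t size)
    {
        void * buf = std::malloc(size);
        if (!buf)
            throw std::bad_alloc();
        return buf;
    }

    void free(void * buf, std::size_t /*size*/) { std::free(buf); }

    void * realloc(void * buf, std::size_t /*old_size*/, std::size_t new_size)
    {
        void * res = std::realloc(buf, new_size);
        if (!res)
            throw std::bad_alloc();
        return res;
    }
};

// Minimal growable array of trivially copyable elements whose storage policy
// is a template parameter.
template <typename T, typename TAllocator = MallocAllocator>
class TinyPODArray : private TAllocator
{
    static_assert(std::is_trivially_copyable<T>::value, "POD elements only");

public:
    TinyPODArray() = default;
    TinyPODArray(const TinyPODArray &) = delete;
    TinyPODArray & operator=(const TinyPODArray &) = delete;

    ~TinyPODArray()
    {
        if (data_)
            TAllocator::free(data_, capacity_ * sizeof(T));
    }

    void push_back(const T & value)
    {
        if (size_ == capacity_)
            grow();
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }
    const T & operator[](std::size_t i) const { return data_[i]; }

private:
    // Double the capacity, reallocating through the allocator policy.
    void grow()
    {
        std::size_t new_capacity = capacity_ ? capacity_ * 2 : 16;
        void * buf = data_
            ? TAllocator::realloc(data_, capacity_ * sizeof(T), new_capacity * sizeof(T))
            : TAllocator::alloc(new_capacity * sizeof(T));
        data_ = static_cast<T *>(buf);
        capacity_ = new_capacity;
    }

    T * data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

Switching backends then mirrors the hunk: `using Marks = TinyPODArray<Mark, SomeOtherAllocator>;` (with `Mark` and `SomeOtherAllocator` hypothetical).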
--- a/dbms/src/DataStreams/OneBlockInputStream.h
+++ b/dbms/src/DataStreams/OneBlockInputStream.h
@ -12,7 +12,7 @@ namespace DB
 class OneBlockInputStream : public IBlockInputStream
 {
 public:
-    OneBlockInputStream(const Block & block_) : block(block_) {}
+    explicit OneBlockInputStream(const Block & block_) : block(block_) {}

    String getName() const override { return "One"; }
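Marking the single-argument constructor `explicit` stops a `Block` from being converted silently into a `OneBlockInputStream` wherever a stream is expected. The sketch below demonstrates the effect with hypothetical `Block`, `ImplicitStream` and `ExplicitStream` types, checked at compile time with `std::is_convertible`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for Block.
struct Block
{
    std::string name;
};

// Without `explicit`, a Block converts implicitly into a stream.
struct ImplicitStream
{
    ImplicitStream(const Block & block_) : block(block_) {}
    Block block;
};

// With `explicit`, the conversion must be written out at the call site.
struct ExplicitStream
{
    explicit ExplicitStream(const Block & block_) : block(block_) {}
    Block block;
};

static_assert(std::is_convertible<Block, ImplicitStream>::value, "implicit conversion allowed");
static_assert(!std::is_convertible<Block, ExplicitStream>::value, "implicit conversion rejected");
static_assert(std::is_constructible<ExplicitStream, Block>::value, "direct construction still works");
```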

--- a/dbms/src/DataStreams/ParallelInputsProcessor.h
+++ b/dbms/src/DataStreams/ParallelInputsProcessor.h
@ -95,12 +95,11 @@ public:
    {
        active_threads = max_threads;
        threads.reserve(max_threads);
-        auto thread_group = CurrentThread::getGroup();

        try
        {
            for (size_t i = 0; i < max_threads; ++i)
-                threads.emplace_back([=] () { thread(thread_group, i); });
+                threads.emplace_back(&ParallelInputsProcessor::thread, this, CurrentThread::getGroup(), i);
        }
        catch (...)
        {
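The hunk above replaces a capturing lambda with `std::thread`'s own invocation support: passing a member-function pointer, `this`, and the arguments makes the new thread call `this->thread(group, i)`, with each argument copied into the thread's storage. It also calls `CurrentThread::getGroup()` once per iteration instead of once before the loop. The sketch below shows the same construction pattern. `Processor` and `ThreadGroupPtr` are hypothetical stand-ins, and unlike the real code it omits exception handling and joins immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for the thread-group handle.
using ThreadGroupPtr = std::shared_ptr<const int>;

class Processor
{
public:
    explicit Processor(std::size_t max_threads_) : results(max_threads_, 0), max_threads(max_threads_) {}

    void process(const ThreadGroupPtr & group)
    {
        threads.reserve(max_threads);
        for (std::size_t i = 0; i < max_threads; ++i)
            // std::thread invokes (this->*&Processor::thread)(group, i) on the new
            // thread; `group` and `i` are copied, so no lambda is required.
            threads.emplace_back(&Processor::thread, this, group, i);

        for (auto & t : threads)
            t.join();
    }

    // One slot per worker; each worker writes only its own slot.
    std::vector<int> results;

private:
    void thread(ThreadGroupPtr group, std::size_t thread_num)
    {
        results[thread_num] = *group + static_cast<int>(thread_num);
    }

    std::size_t max_threads;
    std::vector<std::thread> threads;
};
```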
--- a/dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp
+++ b/dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp
@ -63,6 +63,17 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
 }


+Block PushingToViewsBlockOutputStream::getHeader() const
+{
+    /// If we don't write directly to the destination
+    /// then expect that we're inserting with precalculated virtual columns
+    if (output)
+        return storage->getSampleBlock();
+    else
+        return storage->getSampleBlockWithVirtuals();
+}
+
+
 void PushingToViewsBlockOutputStream::write(const Block & block)
 {
    /** Throw an exception if the sizes of arrays - elements of nested data structures doesn't match.
@ -73,6 +84,8 @@ void PushingToViewsBlockOutputStream::write(const Block & block)
    Nested::validateArraySizes(block);

    if (output)
+        /// TODO: to support virtual and alias columns inside MVs, we should return here the inserted block extended
+        ///       with additional columns directly from storage and pass it to MVs instead of raw block.
        output->write(block);

    /// Don't process materialized views if this block is duplicate
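The new `getHeader()` picks the expected input header based on whether the stream writes directly to the destination table: with an `output`, only the storage's physical sample block is expected; without one, incoming blocks are expected to already carry precalculated virtual columns. The sketch below models that choice with hypothetical types, where a header is just a list of column names and `_virtual_col` is a made-up virtual column name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in: a header is just a list of column names.
using Header = std::vector<std::string>;

struct Storage
{
    Header physical_columns;
    Header virtual_columns;

    Header getSampleBlock() const { return physical_columns; }

    Header getSampleBlockWithVirtuals() const
    {
        Header res = physical_columns;
        res.insert(res.end(), virtual_columns.begin(), virtual_columns.end());
        return res;
    }
};

struct PushingStream
{
    const Storage * storage;
    bool has_output;  // true when blocks are also written directly to `storage`

    // Direct write: expect physical columns only. Otherwise, expect the block
    // to arrive with virtual columns already calculated.
    Header getHeader() const
    {
        return has_output ? storage->getSampleBlock() : storage->getSampleBlockWithVirtuals();
    }
};
```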
@ -0,0 +1 @@
Subproject commit 6cfd649e8c0d3ed913e8aae928a669fc3b8a2365
@ -0,0 +1 @@
Subproject commit a787bdebce94bf3776dc0d1ad597917f479ab8d5