Merge branch 'master' into brotli

2024-09-20 00:30:49 +00:00 · 2019-02-11 23:44:39 +03:00 · 2019-02-11 23:44:39 +03:00 · 763b3dcf73
commit 763b3dcf73
parent b01d735421 29a7649a8c
728 changed files with 13631 additions and 6393 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,30 @@
 ---
 name: Bug report
 about: Create a report to help us improve ClickHouse
 title: ''
 labels: bug, issue
 assignees: ''
 ---
 (you don't have to strictly follow this form)
 **Describe the bug**
 A clear and concise description of what the bug is.
 **How to reproduce**
 * Which ClickHouse server version to use
 * Which interface to use, if it matters
 * Non-default settings, if any
 * `CREATE TABLE` statements for all tables involved
 * Sample data for all these tables, use [clickhouse-obfuscator](https://github.com/yandex/ClickHouse/blob/master/dbms/programs/obfuscator/Obfuscator.cpp#L42-L80) if necessary
 * Queries to run that lead to unexpected result
 **Expected behavior**
 A clear and concise description of what you expected to happen.
 **Error message and/or stacktrace**
 If applicable, add screenshots to help explain your problem.
 **Additional context**
 Add any other context about the problem here.
--- a/.github/ISSUE_TEMPLATE/build-issue.md
+++ b/.github/ISSUE_TEMPLATE/build-issue.md
@@ -0,0 +1,21 @@
 ---
 name: Build issue
 about: Report failed ClickHouse build from master
 title: ''
 labels: build
 assignees: ''
 ---
 Make sure that the `git diff` result is empty and that you've just pulled a fresh master. Try cleaning up the cmake cache. Just in case, the official build instructions are published here: https://clickhouse.yandex/docs/en/development/build/
 **Operating system**
 OS kind or distribution, specific version/release, non-standard kernel if any. If you are trying to build inside a virtual machine, please mention it too.
 **Cmake version**
 **Ninja version**
 **Compiler name and version**
 **Full cmake and/or ninja output**
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,22 @@
 ---
 name: Feature request
 about: Suggest an idea for ClickHouse
 title: ''
 labels: feature
 assignees: ''
 ---
 (you don't have to strictly follow this form)
 **Use case**
 A clear and concise description of what the intended usage scenario is.
 **Describe the solution you'd like**
 A clear and concise description of what you want to happen.
 **Describe alternatives you've considered**
 A clear and concise description of any alternative solutions or features you've considered.
 **Additional context**
 Add any other context or screenshots about the feature request here.
--- a/.github/ISSUE_TEMPLATE/question.md
+++ b/.github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,12 @@
 ---
 name: Question
 about: Ask question about ClickHouse
 title: ''
 labels: question
 assignees: ''
 ---
 Make sure to check the documentation at https://clickhouse.yandex/docs/en/ first. If the question is concise and probably has a short answer, asking it in the Telegram chat https://telegram.me/clickhouse_en is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with the "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse
 If you still prefer GitHub issues, remove all this text and ask your question here.
--- a/.gitignore
+++ b/.gitignore
@@ -243,3 +243,5 @@ website/package-lock.json
 # ccls cache
 /.ccls-cache
 /compile_commands.json
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,7 @@
 * Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub` for SQL standard compatibility. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([Ivan Blinkov](https://github.com/blinkov))
 * Support for write in `HDFS` tables and `hdfs` table function. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin))
 * Added functions to search for multiple constant strings from big haystack: `multiPosition`, `multiSearch` ,`firstMatch` also with `-UTF8`, `-CaseInsensitive`, and `-CaseInsensitiveUTF8` variants. [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([Danila Kutenin](https://github.com/danlark1))
-* Pruning of unused shards if `SELECT` query filters by sharding key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Ivan](https://github.com/abyss7))
+* Pruning of unused shards if `SELECT` query filters by sharding key (setting `optimize_skip_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Gleb Kanterov](https://github.com/kanterov), [Ivan](https://github.com/abyss7))
 * Allow `Kafka` engine to ignore some number of parsing errors per block. [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([Ivan](https://github.com/abyss7))
 * Added support for `CatBoost` multiclass models evaluation. Function `modelEvaluate` returns tuple with per-class raw predictions for multiclass models. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai))
 * Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([Boris Granveaud](https://github.com/bgranvea))
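The `optimize_skip_unused_shards` entry above describes not sending a `SELECT` to shards that cannot contain matching rows. A minimal C++ sketch of the idea, assuming all shards have equal weight, the sharding expression is simply `key % shard_count`, and the filter lists concrete key values (as in `WHERE key IN (...)`). The function `shardsToQuery` is hypothetical and is not ClickHouse's implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical shard pruning: with an assumed sharding expression of `key % shard_count`,
// a filter on concrete key values can only match rows on the shards those keys map to.
// Precondition: shard_count > 0.
std::set<std::uint64_t> shardsToQuery(const std::vector<std::uint64_t> & filter_keys,
                                      std::uint64_t shard_count)
{
    std::set<std::uint64_t> shards;
    for (std::uint64_t key : filter_keys)
        shards.insert(key % shard_count);  // same expression as the assumed sharding key
    return shards;
}
```

For example, with 4 shards a filter on keys 1, 5 and 7 touches only shards 1 and 3; the other two shards can be skipped entirely.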
--- a/CHANGELOG_RU.md
+++ b/CHANGELOG_RU.md
@@ -1,3 +1,112 @@
 ## ClickHouse release 19.1.6, 2019-01-24
 ### New features:
 * Setting the compression codec for individual columns. [#3899](https://github.com/yandex/ClickHouse/pull/3899) [#4111](https://github.com/yandex/ClickHouse/pull/4111) ([alesapin](https://github.com/alesapin), [Winter Zhang](https://github.com/zhang2014), [Anatoly](https://github.com/Sindbag))
 * `Delta` compression codec. [#4052](https://github.com/yandex/ClickHouse/pull/4052) ([alesapin](https://github.com/alesapin))
 * Changing the compression codec with an `ALTER` query. [#4054](https://github.com/yandex/ClickHouse/pull/4054) ([alesapin](https://github.com/alesapin))
 * Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub` for SQL standard compatibility. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([Ivan Blinkov](https://github.com/blinkov))
 * Support for writing to the `HDFS` table engine and the `hdfs` table function. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin))
 * Added functions for searching a text for multiple constant strings: `multiPosition`, `multiSearch`, `firstMatch`, also with the `-UTF8`, `-CaseInsensitive`, and `-CaseInsensitiveUTF8` suffixes. [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([Danila Kutenin](https://github.com/danlark1))
 * Skipping of unused shards if a `SELECT` query filters by the sharding key (setting `optimize_skip_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Gleb Kanterov](https://github.com/kanterov), [Ivan](https://github.com/abyss7))
 * Skipping of rows on parsing errors in the `Kafka` engine (setting `kafka_skip_broken_messages`). [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([Ivan](https://github.com/abyss7))
 * Support for evaluating `CatBoost` multiclass models. Function `modelEvaluate` returns a tuple when a multiclass model is used. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai))
 * Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([Boris Granveaud](https://github.com/bgranvea))
 * Added hash functions `xxHash64` and `xxHash32`. [#3905](https://github.com/yandex/ClickHouse/pull/3905) ([filimonov](https://github.com/filimonov))
 * Added the `gccMurmurHash` hash function (GCC-flavoured Murmur hash), which uses the same hash seed as [gcc](https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/functional_hash.h#L191). [#4000](https://github.com/yandex/ClickHouse/pull/4000) ([sundyli](https://github.com/sundy-li))
 * Added hash functions `javaHash`, `hiveHash`. [#3811](https://github.com/yandex/ClickHouse/pull/3811) ([shangshujie365](https://github.com/shangshujie365))
 * Added function `remoteSecure`. It works like `remote`, but uses a secure connection. [#4088](https://github.com/yandex/ClickHouse/pull/4088) ([proller](https://github.com/proller))
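The `javaHash` entry above refers to Java's string hash. A minimal C++ sketch of the underlying recurrence, `h = 31 * h + c` with 32-bit wrap-around, restricted to ASCII input, where bytes and Java's UTF-16 code units coincide. The function name `javaStringHashAscii` is hypothetical, and the sketch makes no claim about how ClickHouse treats non-ASCII bytes:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of Java's String.hashCode() for ASCII strings:
// h = 31 * h + c over every character, wrapping modulo 2^32.
std::int32_t javaStringHashAscii(const std::string & s)
{
    std::uint32_t h = 0;  // unsigned arithmetic keeps the overflow well defined in C++
    for (unsigned char c : s)
        h = 31u * h + c;
    return static_cast<std::int32_t>(h);  // reinterpret the bits as Java's signed int
}
```

Doing the arithmetic in `std::uint32_t` avoids signed-overflow undefined behavior; the final cast yields the two's-complement value Java would report (guaranteed since C++20 and what mainstream compilers did before that). For instance, `"abc"` hashes to 96354, the same value `"abc".hashCode()` returns in Java.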
 ### Experimental features:
 * Emulation of queries with multiple `JOIN` clauses (setting `allow_experimental_multiple_joins_emulation`). [#3946](https://github.com/yandex/ClickHouse/pull/3946) ([Artem Zuikov](https://github.com/4ertus2))
 ### Bug fixes:
 * The size of the compiled expressions cache is now limited when the `compiled_expression_cache_size` setting is not specified, to reduce memory consumption. [#4041](https://github.com/yandex/ClickHouse/pull/4041) ([alesapin](https://github.com/alesapin))
 * Fixed hangs of threads executing `ALTER` queries on `Replicated` family tables, and of threads updating the configuration from ZooKeeper. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3891](https://github.com/yandex/ClickHouse/issues/3891) [#3934](https://github.com/yandex/ClickHouse/pull/3934) ([Alex Zatelepin](https://github.com/ztlpn))
 * Fixed a race condition when executing a distributed `ALTER` task. The race condition caused more than one replica to try to execute the task, so all such replicas except one failed with a ZooKeeper error. [#3904](https://github.com/yandex/ClickHouse/pull/3904) ([Alex Zatelepin](https://github.com/ztlpn))
 * Fixed updating of the `from_zk` setting. A setting specified in the configuration file was not updated if the request to ZooKeeper timed out. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3947](https://github.com/yandex/ClickHouse/pull/3947) ([Alex Zatelepin](https://github.com/ztlpn))
 * Fixed a bug in computing the network prefix when an IPv4 subnet mask is specified. [#3945](https://github.com/yandex/ClickHouse/pull/3945) ([alesapin](https://github.com/alesapin))
 * Fixed a crash (`std::terminate`) in the rare case when a new thread could not be created due to exhausted resources. [#3956](https://github.com/yandex/ClickHouse/pull/3956) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed a crash in the `remote` table function when the table structure could not be obtained due to user restrictions. [#4009](https://github.com/yandex/ClickHouse/pull/4009) ([alesapin](https://github.com/alesapin))
 * Fixed a leak of network sockets. Sockets were created in a pool and never closed; when a thread was created, new sockets were opened if all available ones were in use. [#4017](https://github.com/yandex/ClickHouse/pull/4017) ([Alex Zatelepin](https://github.com/ztlpn))
 * Fixed `/proc/self/fd` being closed before all file descriptors were read from `/proc` after creating the `odbc-bridge` process. [#4120](https://github.com/yandex/ClickHouse/pull/4120) ([alesapin](https://github.com/alesapin))
 * Fixed a bug in the monotonic conversion of String to UInt when String is used in the primary key. [#3870](https://github.com/yandex/ClickHouse/pull/3870) ([Winter Zhang](https://github.com/zhang2014))
 * Fixed a bug in computing the monotonicity of integer type conversion functions. [#3921](https://github.com/yandex/ClickHouse/pull/3921) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed a crash in the `arrayEnumerateUniq` and `arrayEnumerateDense` functions when invalid arguments are passed. [#3909](https://github.com/yandex/ClickHouse/pull/3909) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed undefined behavior in StorageMerge. [#3910](https://github.com/yandex/ClickHouse/pull/3910) ([Amos Bird](https://github.com/amosbird))
 * Fixed a crash in the `addDays` and `subtractDays` functions. [#3913](https://github.com/yandex/ClickHouse/pull/3913) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed a bug where the `round`, `floor`, `trunc`, and `ceil` functions could return an incorrect result for negative integer arguments of large magnitude. [#3914](https://github.com/yandex/ClickHouse/pull/3914) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed a bug where `kill query sync` caused the server to crash. [#3916](https://github.com/yandex/ClickHouse/pull/3916) ([muVulDeePecker](https://github.com/fancyqlx))
 * Fixed a bug that caused a long delay when the replication queue was empty. [#3928](https://github.com/yandex/ClickHouse/pull/3928) [#3932](https://github.com/yandex/ClickHouse/pull/3932) ([alesapin](https://github.com/alesapin))
 * Fixed excessive memory usage when inserting into a table with `LowCardinality` in the primary key. [#3955](https://github.com/yandex/ClickHouse/pull/3955) ([KochetovNicolai](https://github.com/KochetovNicolai))
 * Fixed serialization of empty arrays of `LowCardinality` type in the `Native` format. [#3907](https://github.com/yandex/ClickHouse/issues/3907) [#4011](https://github.com/yandex/ClickHouse/pull/4011) ([KochetovNicolai](https://github.com/KochetovNicolai))
 * Fixed an incorrect result when using DISTINCT on a numeric `LowCardinality` column. [#3895](https://github.com/yandex/ClickHouse/issues/3895) [#4012](https://github.com/yandex/ClickHouse/pull/4012) ([KochetovNicolai](https://github.com/KochetovNicolai))
 * Fixed compilation of aggregate function evaluation for a `LowCardinality` key (when the `compile` setting is enabled). [#3886](https://github.com/yandex/ClickHouse/pull/3886) ([KochetovNicolai](https://github.com/KochetovNicolai))
 * Fixed passing of the user and password for queries from replicas. [#3957](https://github.com/yandex/ClickHouse/pull/3957) ([alesapin](https://github.com/alesapin)) ([小路](https://github.com/nicelulu))
 * Fixed a very rare race condition that could occur when listing tables of a `Dictionary` database while dictionaries were being reloaded. [#3970](https://github.com/yandex/ClickHouse/pull/3970) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed an incorrect result when HAVING was used with ROLLUP or CUBE. [#3756](https://github.com/yandex/ClickHouse/issues/3756) [#3837](https://github.com/yandex/ClickHouse/pull/3837) ([Sam Chou](https://github.com/reflection))
 * Fixed column aliases for queries with `JOIN ON` over distributed tables. [#3980](https://github.com/yandex/ClickHouse/pull/3980) ([Winter Zhang](https://github.com/zhang2014))
 * Fixed an error in the implementation of the `quantileTDigest` function (found by Artem Vakhrushev). This error never happens in ClickHouse and is relevant only for those who use the ClickHouse codebase directly as a library. [#3935](https://github.com/yandex/ClickHouse/pull/3935) ([alexey-milovidov](https://github.com/alexey-milovidov))
 ### Improvements:
 * Support for `IF NOT EXISTS` in `ALTER TABLE ADD COLUMN` statements, and for `IF EXISTS` in `DROP/MODIFY/CLEAR/COMMENT COLUMN`. [#3900](https://github.com/yandex/ClickHouse/pull/3900) ([Boris Granveaud](https://github.com/bgranvea))
 * Function `parseDateTimeBestEffort` now supports the formats `DD.MM.YYYY`, `DD.MM.YY`, `DD-MM-YYYY`, `DD-Mon-YYYY`, `DD/Month/YYYY` and similar ones. [#3922](https://github.com/yandex/ClickHouse/pull/3922) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * `CapnProtoInputStream` now supports jagged structures. [#4063](https://github.com/yandex/ClickHouse/pull/4063) ([Odin Hultgren Van Der Horst](https://github.com/Miniwoffer))
 * Usability improvement: added a check that the server is started by the user who owns the data directory. Starting as root is not allowed if root does not own the data directory. [#3785](https://github.com/yandex/ClickHouse/pull/3785) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev))
 * Improved the logic for checking the columns required for JOIN during query analysis. [#3930](https://github.com/yandex/ClickHouse/pull/3930) ([Artem Zuikov](https://github.com/4ertus2))
 * Reduced the number of connections kept open when there is a large number of distributed tables. [#3726](https://github.com/yandex/ClickHouse/pull/3726) ([Winter Zhang](https://github.com/zhang2014))
 * Added support for the totals row of `WITH TOTALS` queries via the ODBC driver. [#3836](https://github.com/yandex/ClickHouse/pull/3836) ([Maksim Koritckiy](https://github.com/nightweb))
 * Allowed using `Enum` values as numbers inside the `if` function. [#3875](https://github.com/yandex/ClickHouse/pull/3875) ([Ivan](https://github.com/abyss7))
 * Added the `low_cardinality_allow_in_native_format` setting. If it is disabled, the `LowCardinality` type is not used in the `Native` format. [#3879](https://github.com/yandex/ClickHouse/pull/3879) ([KochetovNicolai](https://github.com/KochetovNicolai))
 * Removed some redundant objects from the compiled expressions cache to reduce memory usage. [#4042](https://github.com/yandex/ClickHouse/pull/4042) ([alesapin](https://github.com/alesapin))
 * Added a check that a valid value is passed to the `SET send_logs_level = 'value'` query. [#3873](https://github.com/yandex/ClickHouse/pull/3873) ([Sabyanin Maxim](https://github.com/s-mx))
 * Added data type checks in type conversion functions. [#3896](https://github.com/yandex/ClickHouse/pull/3896) ([Winter Zhang](https://github.com/zhang2014))
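The `parseDateTimeBestEffort` entry above lists day-first numeric formats. A minimal C++ sketch of how such input can be recognized, covering only `DD.MM.YYYY`, `DD-MM-YYYY`, `DD/MM/YYYY` and their two-digit-year forms. `parseDayMonthYear` and `DayMonthYear` are hypothetical names; mapping `YY` to `20YY` is an assumption made purely for illustration, and month names (`DD-Mon-YYYY`) are not handled:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct DayMonthYear
{
    int year = 0;
    int month = 0;
    int day = 0;
};

// Reads up to max_digits decimal digits starting at pos; returns how many were consumed.
static std::size_t readDigits(const std::string & s, std::size_t & pos, std::size_t max_digits, int & out)
{
    const std::size_t start = pos;
    out = 0;
    while (pos < s.size() && pos - start < max_digits && std::isdigit(static_cast<unsigned char>(s[pos])))
        out = out * 10 + (s[pos++] - '0');
    return pos - start;
}

// Hypothetical day-first parser; both separators must be the same character.
std::optional<DayMonthYear> parseDayMonthYear(const std::string & s)
{
    auto is_separator = [](char c) { return c == '.' || c == '-' || c == '/'; };

    DayMonthYear result;
    std::size_t pos = 0;

    if (readDigits(s, pos, 2, result.day) == 0 || pos >= s.size() || !is_separator(s[pos]))
        return std::nullopt;
    const char separator = s[pos++];

    if (readDigits(s, pos, 2, result.month) == 0 || pos >= s.size() || s[pos] != separator)
        return std::nullopt;
    ++pos;

    const std::size_t year_digits = readDigits(s, pos, 4, result.year);
    if (pos != s.size() || (year_digits != 2 && year_digits != 4))
        return std::nullopt;
    if (year_digits == 2)
        result.year += 2000;  // illustrative assumption, not ClickHouse's rule

    if (result.month < 1 || result.month > 12 || result.day < 1 || result.day > 31)
        return std::nullopt;  // coarse range check; days per month are not validated
    return result;
}
```

The key point is that the day comes first, so `2019-01-24` is rejected here: its first field has more than two digits before a separator.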
 ### Performance improvements:
 * Added the MergeTree setting `use_minimalistic_part_header_in_zookeeper`. If it is enabled, Replicated tables store part metadata in a compact form (in the part's own znode). This can significantly reduce the ZooKeeper snapshot size (especially for tables with many columns). After enabling this setting, it is no longer possible to downgrade to a version that does not support it. [#3960](https://github.com/yandex/ClickHouse/pull/3960) ([Alex Zatelepin](https://github.com/ztlpn))
 * Added a finite-state-machine-based implementation of the `sequenceMatch` and `sequenceCount` functions for the case when the event sequence contains no time conditions. [#4004](https://github.com/yandex/ClickHouse/pull/4004) ([Léo Ercolanelli](https://github.com/ercolanelli-leo))
 * Improved performance of integer serialization. [#3968](https://github.com/yandex/ClickHouse/pull/3968) ([Amos Bird](https://github.com/amosbird))
 * Added zero left padding to PODArray, so the element with index -1 is now a valid zero value. This is used to remove a conditional branch when computing array offsets. [#3920](https://github.com/yandex/ClickHouse/pull/3920) ([Amos Bird](https://github.com/amosbird))
 * Reverted the `jemalloc` version that caused performance degradation. [#4018](https://github.com/yandex/ClickHouse/pull/4018) ([alexey-milovidov](https://github.com/alexey-milovidov))
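The PODArray zero-left-padding entry above can be illustrated in a few lines. Array `i` in an offsets column spans `[offsets[i - 1], offsets[i])`; without padding, `i == 0` needs a special case. A minimal C++ sketch, assuming a plain `std::vector` with one extra leading zero; the class `PaddedOffsets` is hypothetical and is not ClickHouse's PODArray:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of zero left padding: storage_[0] is a padding zero and
// offset i lives at storage_[i + 1], so reading index -1 is valid and yields 0.
class PaddedOffsets
{
public:
    explicit PaddedOffsets(const std::vector<std::uint64_t> & offsets)
        : storage_(offsets.size() + 1, 0)
    {
        for (std::size_t i = 0; i < offsets.size(); ++i)
            storage_[i + 1] = offsets[i];
    }

    // Valid for i in [-1, size); index -1 returns the padding zero.
    std::uint64_t operator[](std::ptrdiff_t i) const { return storage_[static_cast<std::size_t>(i + 1)]; }

    // Length of the i-th array: offsets[i] - offsets[i - 1], with no
    // `i == 0 ? offsets[0] : ...` branch.
    std::uint64_t sizeAt(std::ptrdiff_t i) const { return (*this)[i] - (*this)[i - 1]; }

private:
    std::vector<std::uint64_t> storage_;
};
```

With offsets `{3, 3, 7}` the three arrays have sizes 3, 0 and 4, and the first one is computed with the same subtraction as the others.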
 ### Backward incompatible changes:
 * Removed the undocumented feature `ALTER MODIFY PRIMARY KEY`; it was superseded by `ALTER MODIFY ORDER BY`. [#3887](https://github.com/yandex/ClickHouse/pull/3887) ([Alex Zatelepin](https://github.com/ztlpn))
 * Removed the `shardByHash` function. [#3833](https://github.com/yandex/ClickHouse/pull/3833) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Scalar subqueries with a result of type `AggregateFunction` are now forbidden. [#3865](https://github.com/yandex/ClickHouse/pull/3865) ([Ivan](https://github.com/abyss7))
 ### Build/Testing/Packaging improvements:
 * Added support for building for PowerPC (`ppc64le`). [#4132](https://github.com/yandex/ClickHouse/pull/4132) ([Danila Kutenin](https://github.com/danlark1))
 * Stateful functional tests are run on publicly available data. [#3969](https://github.com/yandex/ClickHouse/pull/3969) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed an error where the server could not start, failing with the message `bash: /usr/bin/clickhouse-extract-from-config: Operation not permitted` when run in Docker or systemd-nspawn. [#4136](https://github.com/yandex/ClickHouse/pull/4136) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Updated the `rdkafka` library to v1.0.0-RC5. Used cppkafka instead of the raw C interface. [#4025](https://github.com/yandex/ClickHouse/pull/4025) ([Ivan](https://github.com/abyss7))
 * Updated the `mariadb-client` library. Fixed an issue found by UBSan. [#3924](https://github.com/yandex/ClickHouse/pull/3924) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixes for UBSan builds. [#3926](https://github.com/yandex/ClickHouse/pull/3926) [#3021](https://github.com/yandex/ClickHouse/pull/3021) [#3948](https://github.com/yandex/ClickHouse/pull/3948) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Added per-commit test runs with the UBSan build.
 * Added per-commit runs of the PVS-Studio static analyzer.
 * Fixed issues found by PVS-Studio. [#4013](https://github.com/yandex/ClickHouse/pull/4013) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed glibc compatibility issues. [#4100](https://github.com/yandex/ClickHouse/pull/4100) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Moved Docker images to Ubuntu 18.10; added compatibility with glibc >= 2.28. [#3965](https://github.com/yandex/ClickHouse/pull/3965) ([alesapin](https://github.com/alesapin))
 * Added the `CLICKHOUSE_DO_NOT_CHOWN` environment variable, which lets the server Docker image skip running `chown` on its directories. [#3967](https://github.com/yandex/ClickHouse/pull/3967) ([alesapin](https://github.com/alesapin))
 * Enabled most of the warnings from `-Weverything` in clang. Enabled `-Wpedantic`. [#3986](https://github.com/yandex/ClickHouse/pull/3986) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Added some more warnings that are available only in clang 8. [#3993](https://github.com/yandex/ClickHouse/pull/3993) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * When linking dynamically, `libLLVM` is used instead of the individual `LLVM` libraries. [#3989](https://github.com/yandex/ClickHouse/pull/3989) ([Orivej Desh](https://github.com/orivej))
 * Added environment variables for `TSan`, `UBSan`, and `ASan` options in the test Docker image. [#4072](https://github.com/yandex/ClickHouse/pull/4072) ([alesapin](https://github.com/alesapin))
 * The `clickhouse-server` Debian package now recommends the `libcap2-bin` package, so that the `setcap` tool can be used for setting capabilities. This package is optional. [#4093](https://github.com/yandex/ClickHouse/pull/4093) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Reduced build time by removing unnecessary header includes. [#3898](https://github.com/yandex/ClickHouse/pull/3898) ([proller](https://github.com/proller))
 * Added performance tests for hash functions. [#3918](https://github.com/yandex/ClickHouse/pull/3918) ([filimonov](https://github.com/filimonov))
 * Fixed cyclic library dependencies. [#3958](https://github.com/yandex/ClickHouse/pull/3958) ([proller](https://github.com/proller))
 * Improved compilation on machines with little available memory. [#4030](https://github.com/yandex/ClickHouse/pull/4030) ([proller](https://github.com/proller))
 * Added a test script to reproduce the performance degradation in `jemalloc`. [#4036](https://github.com/yandex/ClickHouse/pull/4036) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed typos in comments and string literals. [#4122](https://github.com/yandex/ClickHouse/pull/4122) ([maiha](https://github.com/maiha))
 * Fixed typos in comments. [#4089](https://github.com/yandex/ClickHouse/pull/4089) ([Evgenii Pravda](https://github.com/kvinty))
 ## ClickHouse release 18.16.1, 2018-12-21
 ### Bug fixes:
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -221,7 +221,7 @@ if (UNBUNDLED OR NOT (OS_LINUX OR APPLE) OR ARCH_32)
    option (NO_WERROR "Disable -Werror compiler option" ON)
 endif ()
-message (STATUS "Building for: ${CMAKE_SYSTEM} ${CMAKE_SYSTEM_PROCESSOR} ${CMAKE_LIBRARY_ARCHITECTURE} ; USE_STATIC_LIBRARIES=${USE_STATIC_LIBRARIES} MAKE_STATIC_LIBRARIES=${MAKE_STATIC_LIBRARIES} UNBUNDLED=${UNBUNDLED} CCACHE=${CCACHE_FOUND} ${CCACHE_VERSION}")
+message (STATUS "Building for: ${CMAKE_SYSTEM} ${CMAKE_SYSTEM_PROCESSOR} ${CMAKE_LIBRARY_ARCHITECTURE} ; USE_STATIC_LIBRARIES=${USE_STATIC_LIBRARIES} MAKE_STATIC_LIBRARIES=${MAKE_STATIC_LIBRARIES} SPLIT_SHARED=${SPLIT_SHARED_LIBRARIES} UNBUNDLED=${UNBUNDLED} CCACHE=${CCACHE_FOUND} ${CCACHE_VERSION}")
 include(GNUInstallDirs)
--- a/README.md
+++ b/README.md
@@ -13,4 +13,5 @@ ClickHouse is an open-source column-oriented database management system that all
 ## Upcoming Events
-* [C++ ClickHouse and CatBoost Sprints](https://events.yandex.ru/events/ClickHouse/2-feb-2019/) in Moscow on February 2.
+* [ClickHouse Community Meetup](https://www.eventbrite.com/e/meetup-clickhouse-in-the-wild-deployment-success-stories-registration-55305051899) in San Francisco on February 19.
 * [ClickHouse Community Meetup](https://www.eventbrite.com/e/clickhouse-meetup-in-madrid-registration-55376746339) in Madrid on April 2.
--- a/ci/build-clang-from-sources.sh
+++ b/ci/build-clang-from-sources.sh
@@ -3,7 +3,6 @@ set -e -x
 source default-config
 # TODO Non debian systems
 ./install-os-packages.sh svn
 ./install-os-packages.sh cmake
--- a/cmake/find_cpuid.cmake
+++ b/cmake/find_cpuid.cmake
@@ -1,3 +1,4 @@
 # ARM: Cannot cpuid_get_raw_data: CPUID instruction is not supported
 if (NOT ARCH_ARM)
    option (USE_INTERNAL_CPUID_LIBRARY "Set to FALSE to use system cpuid library instead of bundled" ${NOT_UNBUNDLED})
 endif ()
@@ -21,7 +22,7 @@ if (CPUID_LIBRARY AND CPUID_INCLUDE_DIR)
        # TODO: make virtual target cpuid:cpuid with COMPILE_DEFINITIONS property
    endif ()
    set (USE_CPUID 1)
-elseif (NOT MISSING_INTERNAL_CPUID_LIBRARY)
+elseif (NOT ARCH_ARM AND NOT MISSING_INTERNAL_CPUID_LIBRARY)
    set (CPUID_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include)
    set (USE_INTERNAL_CPUID_LIBRARY 1)
    set (CPUID_LIBRARY cpuid)
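The `ARCH_ARM` guard added above exists because the CPUID instruction is x86-only, as the file's own comment notes. A minimal C++ sketch of the same idea at the source level, assuming GCC or Clang (which ship `<cpuid.h>` with `__get_cpuid` on x86). `cpuVendor` and the `SKETCH_HAVE_CPUID` macro are hypothetical names; this does not use `libcpuid`:

```cpp
#include <cassert>
#include <cstring>
#include <string>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical sketch: read the CPU vendor string with CPUID leaf 0 where the
// instruction exists, and fall back to a placeholder on other architectures
// such as ARM, where CPUID is not available.
std::string cpuVendor()
{
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return "unknown";
    char vendor[13];
    // The 12-byte vendor string is returned in EBX, EDX, ECX, in that order.
    std::memcpy(vendor, &ebx, 4);
    std::memcpy(vendor + 4, &edx, 4);
    std::memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return std::string(vendor);
#else
    return "unknown";  // no CPUID instruction on this architecture
#endif
}
```

Deciding at compile time, as both the CMake guard and this `#if` do, means the unsupported instruction never reaches the ARM build at all.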
--- a/cmake/find_cpuinfo.cmake
+++ b/cmake/find_cpuinfo.cmake
@@ -1,5 +1,12 @@
 option(USE_INTERNAL_CPUINFO_LIBRARY "Set to FALSE to use system cpuinfo library instead of bundled" ${NOT_UNBUNDLED})
 # Now we have no contrib/libcpuinfo, use from system.
 if (USE_INTERNAL_CPUINFO_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuinfo/include")
   #message (WARNING "submodule contrib/libcpuid is missing. to fix try run: \n git submodule update --init --recursive")
   set (USE_INTERNAL_CPUINFO_LIBRARY 0)
   set (MISSING_INTERNAL_CPUINFO_LIBRARY 1)
 endif ()
 if(NOT USE_INTERNAL_CPUINFO_LIBRARY)
    find_library(CPUINFO_LIBRARY cpuinfo)
    find_path(CPUINFO_INCLUDE_DIR NAMES cpuinfo.h PATHS ${CPUINFO_INCLUDE_PATHS})
--- a/contrib/CMakeLists.txt
+++ b/contrib/CMakeLists.txt
@@ -8,6 +8,8 @@ elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-format -Wno-inconsistent-missing-override -std=c++1z")
 endif ()
 set_property(DIRECTORY PROPERTY EXCLUDE_FROM_ALL 1)
 if (USE_INTERNAL_BOOST_LIBRARY)
    add_subdirectory (boost-cmake)
 endif ()
--- a/contrib/base64-cmake/CMakeLists.txt
+++ b/contrib/base64-cmake/CMakeLists.txt
@@ -39,5 +39,20 @@ add_library(base64 ${LINK_MODE}
        ${LIBRARY_DIR}/lib/codecs.h
        ${CMAKE_CURRENT_BINARY_DIR}/config.h)
-target_compile_options(base64 PRIVATE ${base64_SSSE3_opt} ${base64_SSE41_opt} ${base64_SSE42_opt} ${base64_AVX_opt} ${base64_AVX2_opt})
+if(HAVE_AVX)
    set_source_files_properties(${LIBRARY_DIR}/lib/arch/avx/codec.c PROPERTIES COMPILE_FLAGS -mavx)
 endif()
 if(HAVE_AVX2)
    set_source_files_properties(${LIBRARY_DIR}/lib/arch/avx2/codec.c PROPERTIES COMPILE_FLAGS -mavx2)
 endif()
 if(HAVE_SSE41)
    set_source_files_properties(${LIBRARY_DIR}/lib/arch/sse41/codec.c PROPERTIES COMPILE_FLAGS -msse4.1)
 endif()
 if(HAVE_SSE42)
    set_source_files_properties(${LIBRARY_DIR}/lib/arch/sse42/codec.c PROPERTIES COMPILE_FLAGS -msse4.2)
 endif()
 if(HAVE_SSSE3)
    set_source_files_properties(${LIBRARY_DIR}/lib/arch/ssse3/codec.c PROPERTIES COMPILE_FLAGS -mssse3)
 endif()
 target_include_directories(base64 PRIVATE ${LIBRARY_DIR}/include ${CMAKE_CURRENT_BINARY_DIR})
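The change above replaces one `target_compile_options` call, which applied every SIMD flag to every `base64` source file, with per-file `COMPILE_FLAGS` on each codec. A plausible reading (an interpretation, not stated in the commit) is that a library-wide `-mavx2` lets the compiler emit AVX2 instructions anywhere, including code that also runs on CPUs without AVX2. Per-file flags are only safe if the codec is chosen at run time. A minimal C++ sketch of such a dispatcher, assuming GCC or Clang on x86 (`__builtin_cpu_supports`); the function `pickBase64Codec` and the codec names it returns are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical run-time codec selection. This dispatcher is compiled without any
// -mavx2/-msse4.2/... flags and only asks the running CPU what it supports; each
// codec body would live in its own file compiled with its own flag.
std::string pickBase64Codec()
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();  // harmless if the CPU model data is already initialized
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    if (__builtin_cpu_supports("avx"))
        return "avx";
    if (__builtin_cpu_supports("sse4.2"))
        return "sse42";
    if (__builtin_cpu_supports("sse4.1"))
        return "sse41";
    if (__builtin_cpu_supports("ssse3"))
        return "ssse3";
#endif
    return "plain";  // portable fallback codec
}
```

The order of the checks encodes a preference from the widest to the narrowest instruction set, ending in a generic codec that runs everywhere.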
--- a/contrib/cppkafka
+++ b/contrib/cppkafka
@@ -1 +1 @@
-Subproject commit 520465510efef7704346cf8d140967c4abb057c1
+Subproject commit 860c90e92eee6690aa74a2ca7b7c5c6930dffecd
--- a/contrib/pdqsort
+++ b/contrib/pdqsort
@@ -1 +0,0 @@
 Subproject commit 08879029ab8dcb80a70142acb709e3df02de5d37
--- a/contrib/pdqsort/README
+++ b/contrib/pdqsort/README
@@ -0,0 +1,2 @@
 Source from https://github.com/orlp/pdqsort
 Mandatory for Clickhouse, not available in OS packages, we can't use it as submodule.
--- a/contrib/pdqsort/license.txt
+++ b/contrib/pdqsort/license.txt
@@ -0,0 +1,16 @@
 Copyright (c) 2015 Orson Peters <orsonpeters@gmail.com>
 This software is provided 'as-is', without any express or implied warranty. In no event will the
 authors be held liable for any damages arising from the use of this software.
 Permission is granted to anyone to use this software for any purpose, including commercial
 applications, and to alter it and redistribute it freely, subject to the following restrictions:
 1. The origin of this software must not be misrepresented; you must not claim that you wrote the
   original software. If you use this software in a product, an acknowledgment in the product
   documentation would be appreciated but is not required.
 2. Altered source versions must be plainly marked as such, and must not be misrepresented as
   being the original software.
 3. This notice may not be removed or altered from any source distribution.
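The vendored `pdqsort.h` that follows is a pattern-defeating quicksort. Two of its ingredients, an insertion-sort cutoff for small partitions (`insertion_sort_threshold = 24`) and a `log2`-based budget before falling back to heapsort, also appear in the older introsort scheme. The C++ sketch below is a plain introsort-style hybrid, not pdqsort. It omits pdqsort's pattern detection, block partitioning and Tukey's ninther, and borrows only the threshold value from the header. Everything in namespace `sketch` is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

namespace sketch
{

constexpr std::ptrdiff_t insertion_sort_threshold = 24;  // value borrowed from pdqsort.h

// floor(log2(n)), assumes n > 0 (same loop as pdqsort_detail::log2).
inline int floorLog2(std::ptrdiff_t n)
{
    int log = 0;
    while (n >>= 1)
        ++log;
    return log;
}

// Plain insertion sort on [begin, end); efficient for short ranges.
template <class Iter, class Compare>
void insertionSort(Iter begin, Iter end, Compare comp)
{
    if (begin == end)
        return;
    for (Iter cur = std::next(begin); cur != end; ++cur)
    {
        auto tmp = std::move(*cur);
        Iter sift = cur;
        for (; sift != begin && comp(tmp, *std::prev(sift)); --sift)
            *sift = std::move(*std::prev(sift));
        *sift = std::move(tmp);
    }
}

// Median of three values under comp; returns a copy to use as the pivot.
template <class T, class Compare>
T medianOfThree(const T & a, const T & b, const T & c, Compare comp)
{
    if (comp(a, b))
        return comp(b, c) ? b : (comp(a, c) ? c : a);
    return comp(a, c) ? a : (comp(b, c) ? c : b);
}

template <class Iter, class Compare>
void hybridSortLoop(Iter begin, Iter end, Compare comp, int depth_budget)
{
    while (end - begin > insertion_sort_threshold)
    {
        if (depth_budget-- == 0)
        {
            // Too many levels of partitioning: switch to heapsort (O(n log n) worst case).
            std::make_heap(begin, end, comp);
            std::sort_heap(begin, end, comp);
            return;
        }
        const auto pivot = medianOfThree(*begin, *(begin + (end - begin) / 2), *std::prev(end), comp);
        // Three-way partition: [begin, lt_end) < pivot, [lt_end, eq_end) == pivot, rest > pivot.
        Iter lt_end = std::partition(begin, end, [&](const auto & x) { return comp(x, pivot); });
        Iter eq_end = std::partition(lt_end, end, [&](const auto & x) { return !comp(pivot, x); });
        // Recurse on the smaller side and loop on the larger one to bound stack depth.
        if (lt_end - begin < end - eq_end)
        {
            hybridSortLoop(begin, lt_end, comp, depth_budget);
            begin = eq_end;
        }
        else
        {
            hybridSortLoop(eq_end, end, comp, depth_budget);
            end = lt_end;
        }
    }
    insertionSort(begin, end, comp);
}

template <class Iter, class Compare = std::less<>>
void hybridSort(Iter begin, Iter end, Compare comp = Compare())
{
    if (end - begin > 1)
        hybridSortLoop(begin, end, comp, 2 * floorLog2(end - begin));
}

} // namespace sketch
```

Because the pivot is a copy of an element in the range, the "equal" block is never empty, so every iteration strictly shrinks the range even when all keys are identical. Usage mirrors `std::sort`: `sketch::hybridSort(v.begin(), v.end())`, optionally with a comparator such as `std::greater<>()`.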
--- a/contrib/pdqsort/pdqsort.h
+++ b/contrib/pdqsort/pdqsort.h
@@ -0,0 +1,544 @@
 /*
    pdqsort.h - Pattern-defeating quicksort.
    Copyright (c) 2015 Orson Peters
    This software is provided 'as-is', without any express or implied warranty. In no event will the
    authors be held liable for any damages arising from the use of this software.
    Permission is granted to anyone to use this software for any purpose, including commercial
    applications, and to alter it and redistribute it freely, subject to the following restrictions:
    1. The origin of this software must not be misrepresented; you must not claim that you wrote the
       original software. If you use this software in a product, an acknowledgment in the product
       documentation would be appreciated but is not required.
    2. Altered source versions must be plainly marked as such, and must not be misrepresented as
       being the original software.
    3. This notice may not be removed or altered from any source distribution.
 */
 #ifndef PDQSORT_H
 #define PDQSORT_H
 #include <algorithm>
 #include <cstddef>
 #include <functional>
 #include <utility>
 #include <iterator>
 #if __cplusplus >= 201103L
    #include <cstdint>
    #include <type_traits>
    #define PDQSORT_PREFER_MOVE(x) std::move(x)
 #else
    #define PDQSORT_PREFER_MOVE(x) (x)
 #endif
 namespace pdqsort_detail {
    enum {
        // Partitions below this size are sorted using insertion sort.
        insertion_sort_threshold = 24,
        // Partitions above this size use Tukey's ninther to select the pivot.
        ninther_threshold = 128,
        // When we detect an already sorted partition, attempt an insertion sort that allows this
        // amount of element moves before giving up.
        partial_insertion_sort_limit = 8,
        // Must be multiple of 8 due to loop unrolling, and < 256 to fit in unsigned char.
        block_size = 64,
        // Cacheline size, assumes power of two.
        cacheline_size = 64
    };
 #if __cplusplus >= 201103L
    template<class T> struct is_default_compare : std::false_type { };
    template<class T> struct is_default_compare<std::less<T>> : std::true_type { };
    template<class T> struct is_default_compare<std::greater<T>> : std::true_type { };
 #endif
    // Returns floor(log2(n)), assumes n > 0.
    template<class T>
    inline int log2(T n) {
        int log = 0;
        while (n >>= 1) ++log;
        return log;
    }
    // Sorts [begin, end) using insertion sort with the given comparison function.
    template<class Iter, class Compare>
    inline void insertion_sort(Iter begin, Iter end, Compare comp) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        if (begin == end) return;
        for (Iter cur = begin + 1; cur != end; ++cur) {
            Iter sift = cur;
            Iter sift_1 = cur - 1;
            // Compare first so we can avoid 2 moves for an element already positioned correctly.
            if (comp(*sift, *sift_1)) {
                T tmp = PDQSORT_PREFER_MOVE(*sift);
                do { *sift-- = PDQSORT_PREFER_MOVE(*sift_1); }
                while (sift != begin && comp(tmp, *--sift_1));
                *sift = PDQSORT_PREFER_MOVE(tmp);
            }
        }
    }
    // Sorts [begin, end) using insertion sort with the given comparison function. Assumes
    // *(begin - 1) is an element smaller than or equal to any element in [begin, end).
    template<class Iter, class Compare>
    inline void unguarded_insertion_sort(Iter begin, Iter end, Compare comp) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        if (begin == end) return;
        for (Iter cur = begin + 1; cur != end; ++cur) {
            Iter sift = cur;
            Iter sift_1 = cur - 1;
            // Compare first so we can avoid 2 moves for an element already positioned correctly.
            if (comp(*sift, *sift_1)) {
                T tmp = PDQSORT_PREFER_MOVE(*sift);
                do { *sift-- = PDQSORT_PREFER_MOVE(*sift_1); }
                while (comp(tmp, *--sift_1));
                *sift = PDQSORT_PREFER_MOVE(tmp);
            }
        }
    }
    // Attempts to use insertion sort on [begin, end). Will return false if more than
    // partial_insertion_sort_limit elements were moved, and abort sorting. Otherwise it will
    // successfully sort and return true.
    template<class Iter, class Compare>
    inline bool partial_insertion_sort(Iter begin, Iter end, Compare comp) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        if (begin == end) return true;
        int limit = 0;
        for (Iter cur = begin + 1; cur != end; ++cur) {
            if (limit > partial_insertion_sort_limit) return false;
            Iter sift = cur;
            Iter sift_1 = cur - 1;
            // Compare first so we can avoid 2 moves for an element already positioned correctly.
            if (comp(*sift, *sift_1)) {
                T tmp = PDQSORT_PREFER_MOVE(*sift);
                do { *sift-- = PDQSORT_PREFER_MOVE(*sift_1); }
                while (sift != begin && comp(tmp, *--sift_1));
                *sift = PDQSORT_PREFER_MOVE(tmp);
                limit += cur - sift;
            }
        }
        return true;
    }
    template<class Iter, class Compare>
    inline void sort2(Iter a, Iter b, Compare comp) {
        if (comp(*b, *a)) std::iter_swap(a, b);
    }
    // Sorts the elements *a, *b and *c using comparison function comp.
    template<class Iter, class Compare>
    inline void sort3(Iter a, Iter b, Iter c, Compare comp) {
        sort2(a, b, comp);
        sort2(b, c, comp);
        sort2(a, b, comp);
    }
    template<class T>
    inline T* align_cacheline(T* p) {
 #if defined(UINTPTR_MAX) && __cplusplus >= 201103L
        std::uintptr_t ip = reinterpret_cast<std::uintptr_t>(p);
 #else
        std::size_t ip = reinterpret_cast<std::size_t>(p);
 #endif
        ip = (ip + cacheline_size - 1) & -cacheline_size;
        return reinterpret_cast<T*>(ip);
    }
    template<class Iter>
    inline void swap_offsets(Iter first, Iter last,
                             unsigned char* offsets_l, unsigned char* offsets_r,
                             int num, bool use_swaps) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        if (use_swaps) {
            // This case is needed for the descending distribution, where we need
            // to have proper swapping for pdqsort to remain O(n).
            for (int i = 0; i < num; ++i) {
                std::iter_swap(first + offsets_l[i], last - offsets_r[i]);
            }
        } else if (num > 0) {
            Iter l = first + offsets_l[0]; Iter r = last - offsets_r[0];
            T tmp(PDQSORT_PREFER_MOVE(*l)); *l = PDQSORT_PREFER_MOVE(*r);
            for (int i = 1; i < num; ++i) {
                l = first + offsets_l[i]; *r = PDQSORT_PREFER_MOVE(*l);
                r = last - offsets_r[i]; *l = PDQSORT_PREFER_MOVE(*r);
            }
            *r = PDQSORT_PREFER_MOVE(tmp);
        }
    }
    // Partitions [begin, end) around pivot *begin using comparison function comp. Elements equal
    // to the pivot are put in the right-hand partition. Returns the position of the pivot after
    // partitioning and whether the passed sequence already was correctly partitioned. Assumes the
    // pivot is a median of at least 3 elements and that [begin, end) is at least
    // insertion_sort_threshold long. Uses branchless partitioning.
    template<class Iter, class Compare>
    inline std::pair<Iter, bool> partition_right_branchless(Iter begin, Iter end, Compare comp) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        // Move pivot into local for speed.
        T pivot(PDQSORT_PREFER_MOVE(*begin));
        Iter first = begin;
        Iter last = end;
        // Find the first element greater than or equal to the pivot (the median of 3 guarantees
        // this exists).
        while (comp(*++first, pivot));
        // Find the first element strictly smaller than the pivot. We have to guard this search if
        // there was no element before *first.
        if (first - 1 == begin) while (first < last && !comp(*--last, pivot));
        else                    while (                !comp(*--last, pivot));
        // If the first pair of elements that should be swapped to partition are the same element,
        // the passed in sequence already was correctly partitioned.
        bool already_partitioned = first >= last;
        if (!already_partitioned) {
            std::iter_swap(first, last);
            ++first;
        }
        // The following branchless partitioning is derived from "BlockQuicksort: How Branch
        // Mispredictions don’t affect Quicksort" by Stefan Edelkamp and Armin Weiss.
        unsigned char offsets_l_storage[block_size + cacheline_size];
        unsigned char offsets_r_storage[block_size + cacheline_size];
        unsigned char* offsets_l = align_cacheline(offsets_l_storage);
        unsigned char* offsets_r = align_cacheline(offsets_r_storage);
        int num_l, num_r, start_l, start_r;
        num_l = num_r = start_l = start_r = 0;
        while (last - first > 2 * block_size) {
            // Fill up offset blocks with elements that are on the wrong side.
            if (num_l == 0) {
                start_l = 0;
                Iter it = first;
                for (unsigned char i = 0; i < block_size;) {
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                    offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
                }
            }
            if (num_r == 0) {
                start_r = 0;
                Iter it = last;
                for (unsigned char i = 0; i < block_size;) {
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                    offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
                }
            }
            // Swap elements and update block sizes and first/last boundaries.
            int num = std::min(num_l, num_r);
            swap_offsets(first, last, offsets_l + start_l, offsets_r + start_r,
                         num, num_l == num_r);
            num_l -= num; num_r -= num;
            start_l += num; start_r += num;
            if (num_l == 0) first += block_size;
            if (num_r == 0) last -= block_size;
        }
        int l_size = 0, r_size = 0;
        int unknown_left = (last - first) - ((num_r || num_l) ? block_size : 0);
        if (num_r) {
            // Handle leftover block by assigning the unknown elements to the other block.
            l_size = unknown_left;
            r_size = block_size;
        } else if (num_l) {
            l_size = block_size;
            r_size = unknown_left;
        } else {
            // No leftover block, split the unknown elements in two blocks.
            l_size = unknown_left/2;
            r_size = unknown_left - l_size;
        }
        // Fill offset buffers if needed.
        if (unknown_left && !num_l) {
            start_l = 0;
            Iter it = first;
            for (unsigned char i = 0; i < l_size;) {
                offsets_l[num_l] = i++; num_l += !comp(*it, pivot); ++it;
            }
        }
        if (unknown_left && !num_r) {
            start_r = 0;
            Iter it = last;
            for (unsigned char i = 0; i < r_size;) {
                offsets_r[num_r] = ++i; num_r += comp(*--it, pivot);
            }
        }
        int num = std::min(num_l, num_r);
        swap_offsets(first, last, offsets_l + start_l, offsets_r + start_r, num, num_l == num_r);
        num_l -= num; num_r -= num;
        start_l += num; start_r += num;
        if (num_l == 0) first += l_size;
        if (num_r == 0) last -= r_size;
        // We have now fully identified [first, last)'s proper position. Swap the last elements.
        if (num_l) {
            offsets_l += start_l;
            while (num_l--) std::iter_swap(first + offsets_l[num_l], --last);
            first = last;
        }
        if (num_r) {
            offsets_r += start_r;
            while (num_r--) std::iter_swap(last - offsets_r[num_r], first), ++first;
            last = first;
        }
        // Put the pivot in the right place.
        Iter pivot_pos = first - 1;
        *begin = PDQSORT_PREFER_MOVE(*pivot_pos);
        *pivot_pos = PDQSORT_PREFER_MOVE(pivot);
        return std::make_pair(pivot_pos, already_partitioned);
    }
    // Partitions [begin, end) around pivot *begin using comparison function comp. Elements equal
    // to the pivot are put in the right-hand partition. Returns the position of the pivot after
    // partitioning and whether the passed sequence already was correctly partitioned. Assumes the
    // pivot is a median of at least 3 elements and that [begin, end) is at least
    // insertion_sort_threshold long.
    template<class Iter, class Compare>
    inline std::pair<Iter, bool> partition_right(Iter begin, Iter end, Compare comp) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        // Move pivot into local for speed.
        T pivot(PDQSORT_PREFER_MOVE(*begin));
        Iter first = begin;
        Iter last = end;
        // Find the first element greater than or equal to the pivot (the median of 3 guarantees
        // this exists).
        while (comp(*++first, pivot));
        // Find the first element strictly smaller than the pivot. We have to guard this search if
        // there was no element before *first.
        if (first - 1 == begin) while (first < last && !comp(*--last, pivot));
        else                    while (                !comp(*--last, pivot));
        // If the first pair of elements that should be swapped to partition are the same element,
        // the passed in sequence already was correctly partitioned.
        bool already_partitioned = first >= last;
        // Keep swapping pairs of elements that are on the wrong side of the pivot. Previously
        // swapped pairs guard the searches, which is why the first iteration is special-cased
        // above.
        while (first < last) {
            std::iter_swap(first, last);
            while (comp(*++first, pivot));
            while (!comp(*--last, pivot));
        }
        // Put the pivot in the right place.
        Iter pivot_pos = first - 1;
        *begin = PDQSORT_PREFER_MOVE(*pivot_pos);
        *pivot_pos = PDQSORT_PREFER_MOVE(pivot);
        return std::make_pair(pivot_pos, already_partitioned);
    }
    // Similar function to the one above, except elements equal to the pivot are put to the left of
    // the pivot and it doesn't check or return whether the passed sequence already was partitioned.
    // Since this is rarely used (the many-equal-elements case), and in that case pdqsort already has O(n)
    // performance, no block quicksort is applied here for simplicity.
    template<class Iter, class Compare>
    inline Iter partition_left(Iter begin, Iter end, Compare comp) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        T pivot(PDQSORT_PREFER_MOVE(*begin));
        Iter first = begin;
        Iter last = end;
        while (comp(pivot, *--last));
        if (last + 1 == end) while (first < last && !comp(pivot, *++first));
        else                 while (                !comp(pivot, *++first));
        while (first < last) {
            std::iter_swap(first, last);
            while (comp(pivot, *--last));
            while (!comp(pivot, *++first));
        }
        Iter pivot_pos = last;
        *begin = PDQSORT_PREFER_MOVE(*pivot_pos);
        *pivot_pos = PDQSORT_PREFER_MOVE(pivot);
        return pivot_pos;
    }
    template<class Iter, class Compare, bool Branchless>
    inline void pdqsort_loop(Iter begin, Iter end, Compare comp, int bad_allowed, bool leftmost = true) {
        typedef typename std::iterator_traits<Iter>::difference_type diff_t;
        // Use a while loop for tail recursion elimination.
        while (true) {
            diff_t size = end - begin;
            // Insertion sort is faster for small arrays.
            if (size < insertion_sort_threshold) {
                if (leftmost) insertion_sort(begin, end, comp);
                else unguarded_insertion_sort(begin, end, comp);
                return;
            }
            // Choose pivot as median of 3 or pseudomedian of 9.
            diff_t s2 = size / 2;
            if (size > ninther_threshold) {
                sort3(begin, begin + s2, end - 1, comp);
                sort3(begin + 1, begin + (s2 - 1), end - 2, comp);
                sort3(begin + 2, begin + (s2 + 1), end - 3, comp);
                sort3(begin + (s2 - 1), begin + s2, begin + (s2 + 1), comp);
                std::iter_swap(begin, begin + s2);
            } else sort3(begin + s2, begin, end - 1, comp);
            // If *(begin - 1) is the end of the right partition of a previous partition operation
            // there is no element in [begin, end) that is smaller than *(begin - 1). Then if our
            // pivot compares equal to *(begin - 1) we change strategy, putting equal elements in
            // the left partition, greater elements in the right partition. We do not have to
            // recurse on the left partition, since it's sorted (all equal).
            if (!leftmost && !comp(*(begin - 1), *begin)) {
                begin = partition_left(begin, end, comp) + 1;
                continue;
            }
            // Partition and get results.
            std::pair<Iter, bool> part_result =
                Branchless ? partition_right_branchless(begin, end, comp)
                           : partition_right(begin, end, comp);
            Iter pivot_pos = part_result.first;
            bool already_partitioned = part_result.second;
            // Check for a highly unbalanced partition.
            diff_t l_size = pivot_pos - begin;
            diff_t r_size = end - (pivot_pos + 1);
            bool highly_unbalanced = l_size < size / 8 || r_size < size / 8;
            // If we got a highly unbalanced partition we shuffle elements to break many patterns.
            if (highly_unbalanced) {
                // If we had too many bad partitions, switch to heapsort to guarantee O(n log n).
                if (--bad_allowed == 0) {
                    std::make_heap(begin, end, comp);
                    std::sort_heap(begin, end, comp);
                    return;
                }
                if (l_size >= insertion_sort_threshold) {
                    std::iter_swap(begin,             begin + l_size / 4);
                    std::iter_swap(pivot_pos - 1, pivot_pos - l_size / 4);
                    if (l_size > ninther_threshold) {
                        std::iter_swap(begin + 1,         begin + (l_size / 4 + 1));
                        std::iter_swap(begin + 2,         begin + (l_size / 4 + 2));
                        std::iter_swap(pivot_pos - 2, pivot_pos - (l_size / 4 + 1));
                        std::iter_swap(pivot_pos - 3, pivot_pos - (l_size / 4 + 2));
                    }
                }
                if (r_size >= insertion_sort_threshold) {
                    std::iter_swap(pivot_pos + 1, pivot_pos + (1 + r_size / 4));
                    std::iter_swap(end - 1,                   end - r_size / 4);
                    if (r_size > ninther_threshold) {
                        std::iter_swap(pivot_pos + 2, pivot_pos + (2 + r_size / 4));
                        std::iter_swap(pivot_pos + 3, pivot_pos + (3 + r_size / 4));
                        std::iter_swap(end - 2,             end - (1 + r_size / 4));
                        std::iter_swap(end - 3,             end - (2 + r_size / 4));
                    }
                }
            } else {
                // If we were decently balanced and we tried to sort an already partitioned
                // sequence try to use insertion sort.
                if (already_partitioned && partial_insertion_sort(begin, pivot_pos, comp)
                                        && partial_insertion_sort(pivot_pos + 1, end, comp)) return;
            }
            // Sort the left partition first using recursion and do tail recursion elimination for
            // the right-hand partition.
            pdqsort_loop<Iter, Compare, Branchless>(begin, pivot_pos, comp, bad_allowed, leftmost);
            begin = pivot_pos + 1;
            leftmost = false;
        }
    }
 }
 template<class Iter, class Compare>
 inline void pdqsort(Iter begin, Iter end, Compare comp) {
    if (begin == end) return;
 #if __cplusplus >= 201103L
    pdqsort_detail::pdqsort_loop<Iter, Compare,
        pdqsort_detail::is_default_compare<typename std::decay<Compare>::type>::value &&
        std::is_arithmetic<typename std::iterator_traits<Iter>::value_type>::value>(
        begin, end, comp, pdqsort_detail::log2(end - begin));
 #else
    pdqsort_detail::pdqsort_loop<Iter, Compare, false>(
        begin, end, comp, pdqsort_detail::log2(end - begin));
 #endif
 }
 template<class Iter>
 inline void pdqsort(Iter begin, Iter end) {
    typedef typename std::iterator_traits<Iter>::value_type T;
    pdqsort(begin, end, std::less<T>());
 }
 template<class Iter, class Compare>
 inline void pdqsort_branchless(Iter begin, Iter end, Compare comp) {
    if (begin == end) return;
    pdqsort_detail::pdqsort_loop<Iter, Compare, true>(
        begin, end, comp, pdqsort_detail::log2(end - begin));
 }
 template<class Iter>
 inline void pdqsort_branchless(Iter begin, Iter end) {
    typedef typename std::iterator_traits<Iter>::value_type T;
    pdqsort_branchless(begin, end, std::less<T>());
 }
 #undef PDQSORT_PREFER_MOVE
 #endif
--- a/contrib/pdqsort/readme.md
+++ b/contrib/pdqsort/readme.md
@ -0,0 +1,119 @@
 pdqsort
 -------
 Pattern-defeating quicksort (pdqsort) is a novel sorting algorithm that combines the fast average
 case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on
 inputs with certain patterns. pdqsort is an extension and improvement of David Musser's introsort.
 All code is available for free under the zlib license.
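 | Best | Average    | Worst      | Memory   | Stable | Deterministic |
 | ---- | ---------- | ---------- | -------- | ------ | ------------- |
 | O(n) | O(n log n) | O(n log n) | O(log n) | No     | Yes           |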
 ### Usage
 `pdqsort` is a drop-in replacement for [`std::sort`](http://en.cppreference.com/w/cpp/algorithm/sort).
 Just replace a call to `std::sort` with `pdqsort` to start using pattern-defeating quicksort. If your
 comparison function is branchless, you can call `pdqsort_branchless` for a potentially large speedup. If
 you are using C++11 or later, the type you're sorting is arithmetic, and your comparison function is not
 given or is `std::less`/`std::greater`, `pdqsort` automatically delegates to `pdqsort_branchless`.
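The automatic choice is made at compile time. As a minimal sketch, the block below copies the `is_default_compare` trait from `pdqsort.h` and restates the C++11 dispatch condition of `pdqsort()` as a `constexpr` function; `selects_branchless` and `reverse_order` are hypothetical names used only for illustration and are not part of pdqsort.

```cpp
#include <functional>
#include <iterator>
#include <type_traits>

// Copy of pdqsort_detail::is_default_compare: true only for std::less / std::greater.
template <class T> struct is_default_compare : std::false_type {};
template <class T> struct is_default_compare<std::less<T>> : std::true_type {};
template <class T> struct is_default_compare<std::greater<T>> : std::true_type {};

// Hypothetical helper: the same compile-time condition pdqsort() checks (C++11 path)
// before instantiating the branchless partitioning loop.
template <class Iter, class Compare>
constexpr bool selects_branchless() {
    return is_default_compare<typename std::decay<Compare>::type>::value &&
           std::is_arithmetic<typename std::iterator_traits<Iter>::value_type>::value;
}

// Orders ints descending, exactly like std::greater<int>, but is a different type.
struct reverse_order {
    bool operator()(int a, int b) const { return b < a; }
};

static_assert(selects_branchless<int*, std::less<int>>(), "int with std::less");
static_assert(!selects_branchless<int*, reverse_order>(), "custom comparators are not detected");
```

As the last assertion shows, a comparator that behaves exactly like `std::greater` but has a different type is not detected; if such a comparator really is branchless, call `pdqsort_branchless` explicitly.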
 ### Benchmark
 A comparison of pdqsort and GCC's `std::sort` and `std::stable_sort` with various input
 distributions:
 ![Performance graph](http://i.imgur.com/1RnIGBO.png)
 Compiled with `-std=c++11 -O2 -m64 -march=native`.
 ### Visualization
 A visualization of pattern-defeating quicksort sorting a ~200-element array with some duplicates.
 Generated using Timo Bingmann's [The Sound of Sorting](http://panthema.net/2013/sound-of-sorting/)
 program, a tool that has been invaluable during the development of pdqsort. For the purposes of
 this visualization the cutoff point for insertion sort was lowered to 8 elements.
 ![Visualization](http://i.imgur.com/QzFG09F.gif)
 ### The best case
 pdqsort is designed to run in linear time for a couple of best-case patterns. Linear time is
 achieved for inputs that are in strictly ascending or descending order, only contain equal elements,
 or are strictly in ascending order followed by one out-of-place element. There are two separate
 mechanisms at play to achieve this.
 For equal elements a smart partitioning scheme is used that always puts equal elements in the
 partition containing elements greater than the pivot. When a new pivot is chosen it's compared to
 the greatest element in the partition before it. If they compare equal we can derive that there are
 no elements smaller than the chosen pivot. When this happens we switch strategy for this partition,
 and filter out all elements equal to the pivot.
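The filtering step can be sketched on a `std::vector<int>` as below. It assumes, as the paragraph does, that no element of the partition is smaller than the pivot, so "not greater than the pivot" means "equal to the pivot". `std::partition` stands in here for pdqsort's own `partition_left`, and `filter_equal_to_pivot` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Moves every element equal to `pivot` to the front of `v` and returns the index
// of the first element greater than the pivot. Precondition (checked in debug
// builds): no element of `v` is smaller than `pivot`.
std::size_t filter_equal_to_pivot(std::vector<int>& v, int pivot) {
    assert(std::none_of(v.begin(), v.end(), [pivot](int x) { return x < pivot; }));
    // Under the precondition, !(pivot < x) is equivalent to x == pivot.
    auto first_greater = std::partition(v.begin(), v.end(),
                                        [pivot](int x) { return !(pivot < x); });
    // [begin, first_greater) holds only equal elements, so it is already sorted;
    // only the remainder still needs sorting.
    return static_cast<std::size_t>(first_greater - v.begin());
}
```

In `pdqsort_loop` the equivalent step is `begin = partition_left(begin, end, comp) + 1;`: the run of equal elements is skipped and sorting simply continues on the remainder.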
 To get linear time for the other patterns we check after every partition whether any swaps were made. If
 no swaps were made and the partition was decently balanced, we optimistically attempt to use
 insertion sort. This insertion sort aborts if more than a constant number of moves (8 in `pdqsort.h`)
 are required to sort.
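A sketch of this optimistic insertion sort, modeled on `partial_insertion_sort` in `pdqsort.h` (a move limit of 8, checked before each new element is inserted) but simplified to a `std::vector<int>`; `bounded_insertion_sort` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort that gives up once more than `limit` element moves have been made.
// Returns true if `v` ended up fully sorted, false if it aborted part-way
// (then `v` is only partially sorted and the caller keeps partitioning).
bool bounded_insertion_sort(std::vector<int>& v, std::size_t limit = 8) {
    std::size_t moves = 0;
    for (std::size_t cur = 1; cur < v.size(); ++cur) {
        if (moves > limit) return false;  // too much work: input is not nearly sorted
        int tmp = v[cur];
        std::size_t sift = cur;
        while (sift > 0 && tmp < v[sift - 1]) {
            v[sift] = v[sift - 1];  // shift the larger element one slot right
            --sift;
        }
        v[sift] = tmp;
        moves += cur - sift;  // number of positions tmp travelled
    }
    assert(std::is_sorted(v.begin(), v.end()));
    return true;
}
```

Because the limit is checked only before each new element is inserted, an ascending run followed by a single out-of-place element, such as `{1, 2, 3, 4, 0}`, still sorts successfully no matter how far that last element has to travel, which is what gives linear time for that pattern.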
 ### The average case
 On average-case data where no patterns are detected, pdqsort is effectively a quicksort that uses
 median-of-3 pivot selection (a pseudomedian of 9 for partitions larger than 128 elements), switching
 to insertion sort once the number of elements to be (recursively) sorted is small (fewer than 24).
 The overhead associated with detecting the patterns for the best case is so small that it lies
 within the error of measurement.
 pdqsort is considerably faster than a traditional quicksort implementation when sorting large
 arrays (1000+ elements). This is due to a new technique described in "BlockQuicksort: How Branch
 Mispredictions don't affect Quicksort" by Stefan Edelkamp and Armin Weiss. In short, we bypass the
 branch predictor by using small buffers (entirely in L1 cache) of the indices of elements that need
 to be swapped. We fill these buffers in a branch-free way that's quite elegant (in pseudocode):
 ```cpp
 buffer_num = 0; buffer_max_size = 64;
 for (int i = 0; i < buffer_max_size; ++i) {
    // With branch:
    if (elements[i] < pivot) { buffer[buffer_num] = i; buffer_num++; }
    // Without:
    buffer[buffer_num] = i; buffer_num += (elements[i] < pivot);
 }
 ```
 This is only a speedup if the comparison function itself is branchless, however. By default pdqsort
 will detect this if you're using C++11 or higher, the type you're sorting is arithmetic (e.g.
 `int`), and you're using either `std::less` or `std::greater`. You can explicitly request branchless
 partitioning by calling `pdqsort_branchless` instead of `pdqsort`.
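The pseudocode above corresponds to the loop that fills `offsets_l` in `partition_right_branchless`. A runnable sketch without the 8-way unrolling follows; it records the indices of elements that are *not* less than the pivot, i.e. those on the wrong side of the left half. The `std::vector` interface and the name `fill_offsets_branchless` are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills `offsets` with the indices of elements in `block` that belong to the
// right of `pivot` (i.e. !(x < pivot)), without a data-dependent branch.
// Returns how many indices were recorded; only offsets[0..count) is meaningful.
std::size_t fill_offsets_branchless(const std::vector<int>& block, int pivot,
                                    std::vector<unsigned char>& offsets) {
    assert(block.size() <= 256);  // indices must fit in an unsigned char
    offsets.assign(block.size(), 0);
    std::size_t num = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        // Always write the candidate index; num <= i, so this stays in bounds.
        offsets[num] = static_cast<unsigned char>(i);
        // Advance past it only if the element is out of place. The comparison
        // result is added as 0 or 1 instead of being branched on.
        num += !(block[i] < pivot);
    }
    return num;
}
```

pdqsort fills `offsets_r` the same way while scanning backwards from the right end with `comp(*--it, pivot)`, then swaps the two sets of misplaced elements pairwise in `swap_offsets`. Whether a compiler really emits branch-free code for this loop depends on the element type, the comparison and the optimization level.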
 ### The worst case
 Quicksort naturally performs badly on inputs that form patterns, because it is a partition-based
 sort. Choosing a bad pivot will result in many comparisons that give little to no progress in the
 sorting process. If the pattern does not get broken up, this can happen many times in a row. Worse,
 real-world data is filled with these patterns.
 Traditionally the solution to this is to randomize the pivot selection of quicksort. While this
 technically still allows for a quadratic worst case, the chances of it happening are astronomically
 small. Later, in introsort, pivot selection is kept deterministic, instead switching to the
 guaranteed O(n log n) heapsort if the recursion depth becomes too big. In pdqsort we adopt a hybrid
 approach, (deterministically) shuffling some elements to break up patterns when we encounter a "bad"
 partition. If we encounter too many "bad" partitions we switch to heapsort.
 ### Bad partitions
 A bad partition occurs when the position of the pivot after partitioning is below the 12.5th (1/8)
 percentile or above the 87.5th percentile, i.e. the partition is highly unbalanced. When this happens we will
 shuffle four elements at fixed locations for both partitions. This effectively breaks up many
 patterns. If we encounter more than log(n) bad partitions we will switch to heapsort.
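Both rules can be sketched with two small helpers, following `pdqsort_loop` and `pdqsort_detail::log2` in `pdqsort.h`; `floor_log2` and `highly_unbalanced` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// floor(log2(n)) for n > 0, computed with shifts like pdqsort_detail::log2.
int floor_log2(std::size_t n) {
    assert(n > 0);
    int log = 0;
    while (n >>= 1) ++log;
    return log;
}

// `pivot_index` is the pivot's final position within a range of `size` elements.
// The partition is "bad" when either side is smaller than size / 8.
bool highly_unbalanced(std::ptrdiff_t size, std::ptrdiff_t pivot_index) {
    std::ptrdiff_t l_size = pivot_index;
    std::ptrdiff_t r_size = size - (pivot_index + 1);
    return l_size < size / 8 || r_size < size / 8;
}
```

In `pdqsort_loop` the budget starts at `log2(end - begin)`, e.g. 19 for one million elements, and is passed down by value, so heapsort takes over a range once 19 bad partitions have been seen along a single recursion path.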
 The 1/8th percentile is not chosen arbitrarily. An upper bound of quicksort's worst-case runtime can
 be approximated within a constant factor by the following recurrence:
    T(n, p) = n + T(p(n-1), p) + T((1-p)(n-1), p)
 Where n is the number of elements, and p is the percentile of the pivot after partitioning.
 `T(n, 1/2)` is the best case for quicksort. On modern systems heapsort has been measured to be
 approximately 1.8 to 2 times as slow as quicksort. Choosing p such that `T(n, p) / T(n, 1/2) ~= 1.9`
 as n grows ensures that we only switch to heapsort when doing so would speed up the sorting.
 p = 1/8 is a reasonably close value and is cheap to compute on every platform using a bit shift.
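The recurrence above can be evaluated numerically to check this. The sketch below computes T(n, p) bottom-up; rounding the smaller side down to `floor(p(n-1))` and taking T(0) = T(1) = 0 are assumptions of this sketch, not part of the original argument, and `quicksort_cost` / `slowdown` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates T(n, p) = n + T(p(n-1), p) + T((1-p)(n-1), p) bottom-up.
// Assumptions of this sketch: T(0) = T(1) = 0 and the smaller side gets
// floor(p * (m - 1)) elements; the pivot itself is excluded from both sides.
double quicksort_cost(long n, double p) {
    assert(n >= 1 && p > 0.0 && p <= 0.5);
    std::vector<double> t(static_cast<std::size_t>(n) + 1, 0.0);
    for (long m = 2; m <= n; ++m) {
        long left = static_cast<long>(p * (m - 1));  // pivot at the p-th percentile
        long right = (m - 1) - left;
        t[m] = m + t[left] + t[right];
    }
    return t[n];
}

// Cost of always partitioning at the p-th percentile, relative to the best case p = 1/2.
double slowdown(long n, double p) {
    return quicksort_cost(n, p) / quicksort_cost(n, 0.5);
}
```

For large n the ratio tends to 1/H(p), where H is the binary entropy function, so for p = 1/8 the limit is about 1.84; finite inputs land somewhat away from that limit because of lower-order terms.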
--- a/dbms/CMakeLists.txt
+++ b/dbms/CMakeLists.txt
@ -102,7 +102,9 @@ add_headers_and_sources(dbms src/Interpreters/ClusterProxy)
 add_headers_and_sources(dbms src/Columns)
 add_headers_and_sources(dbms src/Storages)
 add_headers_and_sources(dbms src/Storages/Distributed)
-add_headers_and_sources(dbms src/Storages/Kafka)
+if(USE_RDKAFKA)
    add_headers_and_sources(dbms src/Storages/Kafka)
 endif()
 add_headers_and_sources(dbms src/Storages/MergeTree)
 add_headers_and_sources(dbms src/Client)
 add_headers_and_sources(dbms src/Formats)
--- a/dbms/cmake/version.cmake
+++ b/dbms/cmake/version.cmake
@ -1,11 +1,11 @@
 # These strings are changed automatically by release_lib.sh:
-set(VERSION_REVISION 54413)
+set(VERSION_REVISION 54415)
 set(VERSION_MAJOR 19)
-set(VERSION_MINOR 1)
+set(VERSION_MINOR 3)
-set(VERSION_PATCH 6)
+set(VERSION_PATCH 0)
-set(VERSION_GITHASH f73b337a93d534671b2187660398b8573fc1d464)
+set(VERSION_GITHASH 1db4bd8c2a1a0cd610c8a6564e8194dca5265562)
-set(VERSION_DESCRIBE v19.1.6-testing)
+set(VERSION_DESCRIBE v19.3.0-testing)
-set(VERSION_STRING 19.1.6)
+set(VERSION_STRING 19.3.0)
 # end of autochange
 set(VERSION_EXTRA "" CACHE STRING "")
--- a/dbms/programs/client/Client.cpp
+++ b/dbms/programs/client/Client.cpp
@ -12,6 +12,7 @@
 #include <unordered_set>
 #include <algorithm>
 #include <optional>
 #include <ext/scope_guard.h>
 #include <boost/program_options.hpp>
 #include <boost/algorithm/string/replace.hpp>
 #include <Poco/String.h>
@ -400,6 +401,7 @@ private:
                throw Exception("time option could be specified only in non-interactive mode", ErrorCodes::BAD_ARGUMENTS);
 #if USE_READLINE
            SCOPE_EXIT({ Suggest::instance().finalize(); });
            if (server_revision >= Suggest::MIN_SERVER_REVISION
                && !config().getBool("disable_suggestion", false))
            {
@ -722,7 +724,11 @@ private:
                try
                {
-                    if (!processSingleQuery(str, ast) && !ignore_error)
+                    auto ast_to_process = ast;
                    if (insert && insert->data)
                        ast_to_process = nullptr;
                    if (!processSingleQuery(str, ast_to_process) && !ignore_error)
                        return false;
                }
                catch (...)
@ -1029,25 +1035,56 @@ private:
        InterruptListener interrupt_listener;
        bool cancelled = false;
        // TODO: get the poll_interval from commandline.
        const auto receive_timeout = connection->getTimeouts().receive_timeout;
        constexpr size_t default_poll_interval = 1000000; /// in microseconds
        constexpr size_t min_poll_interval = 5000; /// in microseconds
        const size_t poll_interval
            = std::max(min_poll_interval, std::min<size_t>(receive_timeout.totalMicroseconds(), default_poll_interval));
        while (true)
        {
-            /// Has the Ctrl+C been pressed and thus the query should be cancelled?
+            Stopwatch receive_watch(CLOCK_MONOTONIC_COARSE);
            /// If this is the case, inform the server about it and receive the remaining packets
            /// to avoid losing sync.
            if (!cancelled)
            {
                if (interrupt_listener.check())
                {
                    connection->sendCancel();
                    cancelled = true;
                    if (is_interactive)
                        std::cout << "Cancelling query." << std::endl;
-                    /// Pressing Ctrl+C twice results in shut down.
+            while (true)
-                    interrupt_listener.unblock();
+            {
                /// Has the Ctrl+C been pressed and thus the query should be cancelled?
                /// If this is the case, inform the server about it and receive the remaining packets
                /// to avoid losing sync.
                if (!cancelled)
                {
                    auto cancelQuery = [&] {
                        connection->sendCancel();
                        cancelled = true;
                        if (is_interactive)
                            std::cout << "Cancelling query." << std::endl;
                        /// Pressing Ctrl+C twice results in shut down.
                        interrupt_listener.unblock();
                    };
                    if (interrupt_listener.check())
                    {
                        cancelQuery();
                    }
                    else
                    {
                        double elapsed = receive_watch.elapsedSeconds();
                        if (elapsed > receive_timeout.totalSeconds())
                        {
                            std::cout << "Timeout exceeded while receiving data from server."
                                      << " Waited for " << static_cast<size_t>(elapsed) << " seconds,"
                                      << " timeout is " << receive_timeout.totalSeconds() << " seconds." << std::endl;
                            cancelQuery();
                        }
                    }
                }
-                else if (!connection->poll(1000000))
+
-                    continue;    /// If there is no new data, continue checking whether the query was cancelled after a timeout.
+                /// Poll for changes after a cancellation check, otherwise it never reached
                /// because of progress updates from server.
                if (connection->poll(poll_interval))
                  break;
            }
            if (!receiveAndProcessPacket())
@ -1303,7 +1340,11 @@ private:
    void onProgress(const Progress & value)
    {
-        progress.incrementPiecewiseAtomically(value);
+        if (!progress.incrementPiecewiseAtomically(value))
        {
            // Just a keep-alive update.
            return;
        }
        if (block_out_stream)
            block_out_stream->onProgress(value);
        writeProgress();
@ -1542,12 +1583,19 @@ public:
        po::options_description main_description("Main options", line_length, min_description_length);
        main_description.add_options()
            ("help", "produce help message")
-            ("config-file,c", po::value<std::string>(), "config-file path")
+            ("config-file,C", po::value<std::string>(), "config-file path")
            ("config,c", po::value<std::string>(), "config-file path (another shorthand)")
            ("host,h", po::value<std::string>()->default_value("localhost"), "server host")
            ("port", po::value<int>()->default_value(9000), "server port")
            ("secure,s", "Use TLS connection")
            ("user,u", po::value<std::string>()->default_value("default"), "user")
-            ("password", po::value<std::string>(), "password")
+            /** If "--password [value]" is used but the value is omitted, the bad argument exception will be thrown.
              * implicit_value is used to avoid this exception (to allow user to type just "--password")
              * Since currently boost provides no way to check if a value has been set implicitly for an option,
              * the "\n" is used to distinguish this case because there is hardly a chance an user would use "\n"
              * as the password.
              */
            ("password", po::value<std::string>()->implicit_value("\n"), "password")
            ("ask-password", "ask-password")
            ("query_id", po::value<std::string>(), "query_id")
            ("query,q", po::value<std::string>(), "query")
@ -1585,13 +1633,11 @@ public:
            ("structure", po::value<std::string>(), "structure")
            ("types", po::value<std::string>(), "types")
        ;
        /// Parse main commandline options.
        po::parsed_options parsed = po::command_line_parser(
            common_arguments.size(), common_arguments.data()).options(main_description).run();
        po::variables_map options;
        po::store(parsed, options);
        if (options.count("version") || options.count("V"))
        {
            showClientVersion();
@ -1643,15 +1689,23 @@ public:
        }
        /// Extract settings from the options.
-#define EXTRACT_SETTING(TYPE, NAME, DEFAULT, DESCRIPTION) \
+#define EXTRACT_SETTING(TYPE, NAME, DEFAULT, DESCRIPTION)                \
-        if (options.count(#NAME)) \
+        if (options.count(#NAME))                                        \
-            context.setSetting(#NAME, options[#NAME].as<std::string>());
+        {                                                                \
            context.setSetting(#NAME, options[#NAME].as<std::string>()); \
            config().setString(#NAME, options[#NAME].as<std::string>()); \
        }
        APPLY_FOR_SETTINGS(EXTRACT_SETTING)
 #undef EXTRACT_SETTING
        if (options.count("config-file") && options.count("config"))
            throw Exception("Two or more configuration files referenced in arguments", ErrorCodes::BAD_ARGUMENTS);
        /// Save received data into the internal config.
        if (options.count("config-file"))
            config().setString("config-file", options["config-file"].as<std::string>());
        if (options.count("config"))
            config().setString("config-file", options["config"].as<std::string>());
        if (options.count("host") && !options["host"].defaulted())
            config().setString("host", options["host"].as<std::string>());
        if (options.count("query_id"))
@ -1710,11 +1764,11 @@ public:
 int mainEntryClickHouseClient(int argc, char ** argv)
 {
    DB::Client client;
    try
    {
        DB::Client client;
        client.init(argc, argv);
        return client.run();
    }
    catch (const boost::program_options::error & e)
    {
@ -1726,6 +1780,4 @@ int mainEntryClickHouseClient(int argc, char ** argv)
        std::cerr << DB::getCurrentExceptionMessage(true) << std::endl;
        return 1;
    }
    return client.run();
 }
--- a/dbms/programs/client/ConnectionParameters.h
+++ b/dbms/programs/client/ConnectionParameters.h
@ -48,14 +48,25 @@ struct ConnectionParameters
                is_secure ? DBMS_DEFAULT_SECURE_PORT : DBMS_DEFAULT_PORT));
        default_database = config.getString("database", "");
-        user = config.getString("user", "");
+        /// changed the default value to "default" to fix the issue when the user in the prompt is blank
-
+        user = config.getString("user", "default");
        bool password_prompt = false;
        if (config.getBool("ask-password", false))
        {
            if (config.has("password"))
                throw Exception("Specified both --password and --ask-password. Remove one of them", ErrorCodes::BAD_ARGUMENTS);
-
+            password_prompt = true;
-            std::cout << "Password for user " << user << ": ";
+        }
        else
        {
            password = config.getString("password", "");
            /// if the value of --password is omitted, the password will be set implicitly to "\n"
            if (password == "\n")
                password_prompt = true;
        }
        if (password_prompt)
        {
            std::cout << "Password for user (" << user << "): ";
            setTerminalEcho(false);
            SCOPE_EXIT({
@ -64,19 +75,14 @@ struct ConnectionParameters
            std::getline(std::cin, password);
            std::cout << std::endl;
        }
        else
        {
            password = config.getString("password", "");
        }
        compression = config.getBool("compression", true)
            ? Protocol::Compression::Enable
            : Protocol::Compression::Disable;
        timeouts = ConnectionTimeouts(
            Poco::Timespan(config.getInt("connect_timeout", DBMS_DEFAULT_CONNECT_TIMEOUT_SEC), 0),
            Poco::Timespan(config.getInt("receive_timeout", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC), 0),
            Poco::Timespan(config.getInt("send_timeout", DBMS_DEFAULT_SEND_TIMEOUT_SEC), 0),
            Poco::Timespan(config.getInt("receive_timeout", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC), 0),
            Poco::Timespan(config.getInt("tcp_keep_alive_timeout", 0), 0));
    }
 };
--- a/dbms/programs/client/Suggest.h
+++ b/dbms/programs/client/Suggest.h
@ -39,7 +39,7 @@ private:
        "DATABASES", "LIKE", "PROCESSLIST", "CASE", "WHEN", "THEN", "ELSE", "END", "DESCRIBE", "DESC", "USE", "SET", "OPTIMIZE", "FINAL", "DEDUPLICATE",
        "INSERT", "VALUES", "SELECT", "DISTINCT", "SAMPLE", "ARRAY", "JOIN", "GLOBAL", "LOCAL", "ANY", "ALL", "INNER", "LEFT", "RIGHT", "FULL", "OUTER",
        "CROSS", "USING", "PREWHERE", "WHERE", "GROUP", "BY", "WITH", "TOTALS", "HAVING", "ORDER", "COLLATE", "LIMIT", "UNION", "AND", "OR", "ASC", "IN",
-        "KILL", "QUERY", "SYNC", "ASYNC", "TEST"
+        "KILL", "QUERY", "SYNC", "ASYNC", "TEST", "BETWEEN"
    };
    /// Words are fetched asynchonously.
@ -194,6 +194,12 @@ public:
        });
    }
    void finalize()
    {
        if (loading_thread.joinable())
            loading_thread.join();
    }
    /// A function for readline.
    static char * generator(const char * text, int state)
    {
@ -211,8 +217,7 @@ public:
    ~Suggest()
    {
-        if (loading_thread.joinable())
+        finalize();
            loading_thread.join();
    }
 };
--- a/dbms/programs/copier/ClusterCopier.cpp
+++ b/dbms/programs/copier/ClusterCopier.cpp
@ -67,7 +67,6 @@
 #include <Storages/StorageDistributed.h>
 #include <Databases/DatabaseMemory.h>
 #include <Common/StatusFile.h>
 #include <daemon/OwnPatternFormatter.h>
 namespace DB
@ -817,7 +816,7 @@ public:
            try
            {
-                type->deserializeTextQuoted(*column_dummy, rb, FormatSettings());
+                type->deserializeAsTextQuoted(*column_dummy, rb, FormatSettings());
            }
            catch (Exception & e)
            {
@ -1179,7 +1178,7 @@ protected:
    /// Removes MATERIALIZED and ALIAS columns from create table query
    static ASTPtr removeAliasColumnsFromCreateQuery(const ASTPtr & query_ast)
    {
-        const ASTs & column_asts = typeid_cast<ASTCreateQuery &>(*query_ast).columns->children;
+        const ASTs & column_asts = typeid_cast<ASTCreateQuery &>(*query_ast).columns_list->columns->children;
        auto new_columns = std::make_shared<ASTExpressionList>();
        for (const ASTPtr & column_ast : column_asts)
@ -1198,8 +1197,13 @@ protected:
        ASTPtr new_query_ast = query_ast->clone();
        ASTCreateQuery & new_query = typeid_cast<ASTCreateQuery &>(*new_query_ast);
-        new_query.columns = new_columns.get();
+
-        new_query.children.at(0) = std::move(new_columns);
+        auto new_columns_list = std::make_shared<ASTColumns>();
        new_columns_list->set(new_columns_list->columns, new_columns);
        new_columns_list->set(
                new_columns_list->indices, typeid_cast<ASTCreateQuery &>(*query_ast).columns_list->indices->clone());
        new_query.replace(new_query.columns_list, new_columns_list);
        return new_query_ast;
    }
@ -1217,7 +1221,7 @@ protected:
        res->table = new_table.second;
        res->children.clear();
-        res->set(res->columns, create.columns->clone());
+        res->set(res->columns_list, create.columns_list->clone());
        res->set(res->storage, new_storage_ast->clone());
        return res;
@ -1877,7 +1881,7 @@ protected:
            for (size_t i = 0; i < column.column->size(); ++i)
            {
                WriteBufferFromOwnString wb;
-                column.type->serializeTextQuoted(*column.column, i, wb, FormatSettings());
+                column.type->serializeAsTextQuoted(*column.column, i, wb, FormatSettings());
                res.emplace(wb.str());
            }
        }
--- a/dbms/programs/obfuscator/Obfuscator.cpp
+++ b/dbms/programs/obfuscator/Obfuscator.cpp
@ -1037,7 +1037,7 @@ try
    Obfuscator obfuscator(header, seed, markov_model_params);
-    size_t max_block_size = 8192;
+    UInt64 max_block_size = 8192;
    /// Train step
    {
--- a/dbms/programs/odbc-bridge/MainHandler.cpp
+++ b/dbms/programs/odbc-bridge/MainHandler.cpp
@ -75,7 +75,7 @@ void ODBCHandler::handleRequest(Poco::Net::HTTPServerRequest & request, Poco::Ne
        return;
    }
-    size_t max_block_size = DEFAULT_BLOCK_SIZE;
+    UInt64 max_block_size = DEFAULT_BLOCK_SIZE;
    if (params.has("max_block_size"))
    {
        std::string max_block_size_str = params.get("max_block_size", "");
--- a/dbms/programs/performance-test/PerformanceTest.cpp
+++ b/dbms/programs/performance-test/PerformanceTest.cpp
@ -18,6 +18,32 @@ namespace ErrorCodes
 extern const int NOT_IMPLEMENTED;
 }
 namespace
 {
 void waitQuery(Connection & connection)
 {
    bool finished = false;
    while (true)
    {
        if (!connection.poll(1000000))
            continue;
        Connection::Packet packet = connection.receivePacket();
        switch (packet.type)
        {
            case Protocol::Server::EndOfStream:
                finished = true;
                break;
            case Protocol::Server::Exception:
                throw *packet.exception;
        }
        if (finished)
            break;
    }
 }
 }
 namespace fs = boost::filesystem;
 PerformanceTest::PerformanceTest(
@ -25,12 +51,14 @@ PerformanceTest::PerformanceTest(
    Connection & connection_,
    InterruptListener & interrupt_listener_,
    const PerformanceTestInfo & test_info_,
-    Context & context_)
+    Context & context_,
    const std::vector<size_t> & queries_to_run_)
    : config(config_)
    , connection(connection_)
    , interrupt_listener(interrupt_listener_)
    , test_info(test_info_)
    , context(context_)
    , queries_to_run(queries_to_run_)
    , log(&Poco::Logger::get("PerformanceTest"))
 {
 }
@ -133,14 +161,18 @@ void PerformanceTest::prepare() const
 {
    for (const auto & query : test_info.create_queries)
    {
-        LOG_INFO(log, "Executing create query '" << query << "'");
+        LOG_INFO(log, "Executing create query \"" << query << '\"');
-        connection.sendQuery(query);
+        connection.sendQuery(query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
        waitQuery(connection);
        LOG_INFO(log, "Query finished");
    }
    for (const auto & query : test_info.fill_queries)
    {
-        LOG_INFO(log, "Executing fill query '" << query << "'");
+        LOG_INFO(log, "Executing fill query \"" << query << '\"');
-        connection.sendQuery(query);
+        connection.sendQuery(query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
        waitQuery(connection);
        LOG_INFO(log, "Query finished");
    }
 }
@ -149,17 +181,24 @@ void PerformanceTest::finish() const
 {
    for (const auto & query : test_info.drop_queries)
    {
-        LOG_INFO(log, "Executing drop query '" << query << "'");
+        LOG_INFO(log, "Executing drop query \"" << query << '\"');
-        connection.sendQuery(query);
+        connection.sendQuery(query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
        waitQuery(connection);
        LOG_INFO(log, "Query finished");
    }
 }
 std::vector<TestStats> PerformanceTest::execute()
 {
    std::vector<TestStats> statistics_by_run;
    size_t query_count;
    if (queries_to_run.empty())
        query_count = test_info.queries.size();
    else
        query_count = queries_to_run.size();
    size_t total_runs = test_info.times_to_run * test_info.queries.size();
    statistics_by_run.resize(total_runs);
-    LOG_INFO(log, "Totally will run cases " << total_runs << " times");
+    LOG_INFO(log, "Totally will run cases " << test_info.times_to_run * query_count << " times");
    UInt64 max_exec_time = calculateMaxExecTime();
    if (max_exec_time != 0)
        LOG_INFO(log, "Test will be executed for a maximum of " << max_exec_time / 1000. << " seconds");
@ -172,9 +211,13 @@ std::vector<TestStats> PerformanceTest::execute()
        for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
        {
-            size_t statistic_index = number_of_launch * test_info.queries.size() + query_index;
+            if (queries_to_run.empty() || std::find(queries_to_run.begin(), queries_to_run.end(), query_index) != queries_to_run.end())
-
+            {
-            queries_with_indexes.push_back({test_info.queries[query_index], statistic_index});
+                size_t statistic_index = number_of_launch * test_info.queries.size() + query_index;
                queries_with_indexes.push_back({test_info.queries[query_index], statistic_index});
            }
            else
                LOG_INFO(log, "Will skip query " << test_info.queries[query_index] << " by index");
        }
        if (got_SIGINT)
--- a/dbms/programs/performance-test/PerformanceTest.h
+++ b/dbms/programs/performance-test/PerformanceTest.h
@ -22,7 +22,8 @@ public:
        Connection & connection_,
        InterruptListener & interrupt_listener_,
        const PerformanceTestInfo & test_info_,
-        Context & context_);
+        Context & context_,
        const std::vector<size_t> & queries_to_run_);
    bool checkPreconditions() const;
    void prepare() const;
@ -54,6 +55,7 @@ private:
    PerformanceTestInfo test_info;
    Context & context;
    std::vector<size_t> queries_to_run;
    Poco::Logger * log;
    bool got_SIGINT = false;
--- a/dbms/programs/performance-test/PerformanceTestInfo.cpp
+++ b/dbms/programs/performance-test/PerformanceTestInfo.cpp
@ -36,42 +36,6 @@ void extractSettings(
    }
 }
 void checkMetricsInput(const Strings & metrics, ExecutionType exec_type)
 {
    Strings loop_metrics = {
        "min_time", "quantiles", "total_time",
        "queries_per_second", "rows_per_second",
        "bytes_per_second"};
    Strings non_loop_metrics = {
        "max_rows_per_second", "max_bytes_per_second",
        "avg_rows_per_second", "avg_bytes_per_second"};
    if (exec_type == ExecutionType::Loop)
    {
        for (const std::string & metric : metrics)
        {
            auto non_loop_pos =
                std::find(non_loop_metrics.begin(), non_loop_metrics.end(), metric);
            if (non_loop_pos != non_loop_metrics.end())
               throw Exception("Wrong type of metric for loop execution type (" + metric + ")",
                   ErrorCodes::BAD_ARGUMENTS);
        }
    }
    else
    {
        for (const std::string & metric : metrics)
        {
            auto loop_pos = std::find(loop_metrics.begin(), loop_metrics.end(), metric);
            if (loop_pos != loop_metrics.end())
                throw Exception(
                    "Wrong type of metric for non-loop execution type (" + metric + ")",
                    ErrorCodes::BAD_ARGUMENTS);
        }
    }
 }
 }
@ -84,12 +48,19 @@ PerformanceTestInfo::PerformanceTestInfo(
 {
    test_name = config->getString("name");
    path = config->getString("path");
    if (config->has("main_metric"))
    {
        Strings main_metrics;
        config->keys("main_metric", main_metrics);
        if (main_metrics.size())
            main_metric = main_metrics[0];
    }
    applySettings(config);
    extractQueries(config);
    processSubstitutions(config);
    getExecutionType(config);
    getStopConditions(config);
    getMetrics(config);
    extractAuxiliaryQueries(config);
 }
@ -239,37 +210,6 @@ void PerformanceTestInfo::getStopConditions(XMLConfigurationPtr config)
 }
 void PerformanceTestInfo::getMetrics(XMLConfigurationPtr config)
 {
    ConfigurationPtr metrics_view(config->createView("metrics"));
    metrics_view->keys(metrics);
    if (config->has("main_metric"))
    {
        Strings main_metrics;
        config->keys("main_metric", main_metrics);
        if (main_metrics.size())
            main_metric = main_metrics[0];
    }
    if (!main_metric.empty())
    {
        if (std::find(metrics.begin(), metrics.end(), main_metric) == metrics.end())
            metrics.push_back(main_metric);
    }
    else
    {
        if (metrics.empty())
            throw Exception("You shoud specify at least one metric",
                ErrorCodes::BAD_ARGUMENTS);
        main_metric = metrics[0];
    }
    if (metrics.size() > 0)
        checkMetricsInput(metrics, exec_type);
 }
 void PerformanceTestInfo::extractAuxiliaryQueries(XMLConfigurationPtr config)
 {
    if (config->has("create_query"))
--- a/dbms/programs/performance-test/PerformanceTestInfo.h
+++ b/dbms/programs/performance-test/PerformanceTestInfo.h
@ -33,7 +33,6 @@ public:
    std::string main_metric;
    Strings queries;
    Strings metrics;
    Settings settings;
    ExecutionType exec_type;
--- a/dbms/programs/performance-test/PerformanceTestSuite.cpp
+++ b/dbms/programs/performance-test/PerformanceTestSuite.cpp
@ -11,12 +11,13 @@
 #include <boost/filesystem.hpp>
 #include <boost/program_options.hpp>
-#include <Poco/Util/XMLConfiguration.h>
+#include <Poco/AutoPtr.h>
 #include <Poco/Logger.h>
 #include <Poco/ConsoleChannel.h>
 #include <Poco/FormattingChannel.h>
 #include <Poco/Logger.h>
 #include <Poco/Path.h>
 #include <Poco/PatternFormatter.h>
-
+#include <Poco/Util/XMLConfiguration.h>
 #include <common/logger_useful.h>
 #include <Client/Connection.h>
@ -25,7 +26,6 @@
 #include <IO/ConnectionTimeouts.h>
 #include <IO/UseSSL.h>
 #include <Interpreters/Settings.h>
 #include <Poco/AutoPtr.h>
 #include <Common/Exception.h>
 #include <Common/InterruptListener.h>
@ -70,6 +70,7 @@ public:
        Strings && skip_names_,
        Strings && tests_names_regexp_,
        Strings && skip_names_regexp_,
        const std::unordered_map<std::string, std::vector<size_t>> query_indexes_,
        const ConnectionTimeouts & timeouts)
        : connection(host_, port_, default_database_, user_,
            password_, timeouts, "performance-test", Protocol::Compression::Enable,
@ -80,6 +81,7 @@ public:
        , skip_tags(std::move(skip_tags_))
        , skip_names(std::move(skip_names_))
        , skip_names_regexp(std::move(skip_names_regexp_))
        , query_indexes(query_indexes_)
        , lite_output(lite_output_)
        , profiles_file(profiles_file_)
        , input_files(input_files_)
@ -128,6 +130,7 @@ private:
    const Strings & skip_tags;
    const Strings & skip_names;
    const Strings & skip_names_regexp;
    std::unordered_map<std::string, std::vector<size_t>> query_indexes;
    Context global_context = Context::createGlobal();
    std::shared_ptr<ReportBuilder> report_builder;
@ -167,11 +170,13 @@ private:
            for (auto & test_config : tests_configurations)
            {
                auto [output, signal] = runTest(test_config);
-                if (lite_output)
+                if (!output.empty())
-                    std::cout << output;
+                {
-                else
+                    if (lite_output)
-                    outputs.push_back(output);
+                        std::cout << output;
-
+                    else
                        outputs.push_back(output);
                }
                if (signal)
                    break;
            }
@ -198,28 +203,34 @@ private:
    {
        PerformanceTestInfo info(test_config, profiles_file);
        LOG_INFO(log, "Config for test '" << info.test_name << "' parsed");
-        PerformanceTest current(test_config, connection, interrupt_listener, info, global_context);
+        PerformanceTest current(test_config, connection, interrupt_listener, info, global_context, query_indexes[info.path]);
-        current.checkPreconditions();
+        if (current.checkPreconditions())
-        LOG_INFO(log, "Preconditions for test '" << info.test_name << "' are fullfilled");
+        {
-        LOG_INFO(log, "Preparing for run, have " << info.create_queries.size()
+            LOG_INFO(log, "Preconditions for test '" << info.test_name << "' are fullfilled");
-            << " create queries and " << info.fill_queries.size() << " fill queries");
+            LOG_INFO(
-        current.prepare();
+                log,
-        LOG_INFO(log, "Prepared");
+                "Preparing for run, have " << info.create_queries.size() << " create queries and " << info.fill_queries.size()
-        LOG_INFO(log, "Running test '" << info.test_name << "'");
+                                           << " fill queries");
-        auto result = current.execute();
+            current.prepare();
-        LOG_INFO(log, "Test '" << info.test_name << "' finished");
+            LOG_INFO(log, "Prepared");
            LOG_INFO(log, "Running test '" << info.test_name << "'");
            auto result = current.execute();
            LOG_INFO(log, "Test '" << info.test_name << "' finished");
-        LOG_INFO(log, "Running post run queries");
+            LOG_INFO(log, "Running post run queries");
-        current.finish();
+            current.finish();
-        LOG_INFO(log, "Postqueries finished");
+            LOG_INFO(log, "Postqueries finished");
-
+            if (lite_output)
-        if (lite_output)
+                return {report_builder->buildCompactReport(info, result, query_indexes[info.path]), current.checkSIGINT()};
-            return {report_builder->buildCompactReport(info, result), current.checkSIGINT()};
+            else
                return {report_builder->buildFullReport(info, result, query_indexes[info.path]), current.checkSIGINT()};
        }
        else
-            return {report_builder->buildFullReport(info, result), current.checkSIGINT()};
+            LOG_INFO(log, "Preconditions for test '" << info.test_name << "' are not fullfilled, skip run");
    }
        return {"", current.checkSIGINT()};
    }
 };
 }
@ -289,6 +300,29 @@ static std::vector<std::string> getInputFiles(const po::variables_map & options,
    return input_files;
 }
 std::unordered_map<std::string, std::vector<std::size_t>> getTestQueryIndexes(const po::basic_parsed_options<char> & parsed_opts)
 {
    std::unordered_map<std::string, std::vector<std::size_t>> result;
    const auto & options = parsed_opts.options;
    for (size_t i = 0; i < options.size() - 1; ++i)
    {
        const auto & opt = options[i];
        if (opt.string_key == "input-files")
        {
            if (options[i + 1].string_key == "query-indexes")
            {
                const std::string & test_path = Poco::Path(opt.value[0]).absolute().toString();
                for (const auto & query_num_str : options[i + 1].value)
                {
                    size_t query_num = std::stoul(query_num_str);
                    result[test_path].push_back(query_num);
                }
            }
        }
    }
    return result;
 }
 int mainEntryClickHousePerformanceTest(int argc, char ** argv)
 try
 {
@ -314,24 +348,18 @@ try
        ("skip-names", value<Strings>()->multitoken(), "Do not run tests with name")
        ("names-regexp", value<Strings>()->multitoken(), "Run tests with names matching regexp")
        ("skip-names-regexp", value<Strings>()->multitoken(), "Do not run tests with names matching regexp")
        ("input-files", value<Strings>()->multitoken(), "Input .xml files")
        ("query-indexes", value<std::vector<size_t>>()->multitoken(), "Input query indexes")
        ("recursive,r", "Recurse in directories to find all xml's");
    /// These options will not be displayed in --help
    po::options_description hidden("Hidden options");
    hidden.add_options()
        ("input-files", value<std::vector<std::string>>(), "");
    /// But they will be legit, though. And they must be given without name
    po::positional_options_description positional;
    positional.add("input-files", -1);
    po::options_description cmdline_options;
-    cmdline_options.add(desc).add(hidden);
+    cmdline_options.add(desc);
    po::variables_map options;
-    po::store(
+    po::basic_parsed_options<char> parsed = po::command_line_parser(argc, argv).options(cmdline_options).run();
-        po::command_line_parser(argc, argv).
+    auto queries_with_indexes = getTestQueryIndexes(parsed);
-        options(cmdline_options).positional(positional).run(), options);
+    po::store(parsed, options);
    po::notify(options);
    Poco::AutoPtr<Poco::PatternFormatter> formatter(new Poco::PatternFormatter("%Y.%m.%d %H:%M:%S.%F <%p> %s: %t"));
@ -378,6 +406,7 @@ try
        std::move(skip_names),
        std::move(tests_names_regexp),
        std::move(skip_names_regexp),
        queries_with_indexes,
        timeouts);
    return performance_test_suite.run();
 }
--- a/dbms/programs/performance-test/ReportBuilder.cpp
+++ b/dbms/programs/performance-test/ReportBuilder.cpp
@ -17,6 +17,18 @@ namespace DB
 namespace
 {
 const std::regex QUOTE_REGEX{"\""};
 std::string getMainMetric(const PerformanceTestInfo & test_info)
 {
    std::string main_metric;
    if (test_info.main_metric.empty())
        if (test_info.exec_type == ExecutionType::Loop)
            main_metric = "min_time";
        else
            main_metric = "rows_per_second";
    else
        main_metric = test_info.main_metric;
    return main_metric;
 }
 }
 ReportBuilder::ReportBuilder(const std::string & server_version_)
@ -35,7 +47,8 @@ std::string ReportBuilder::getCurrentTime() const
 std::string ReportBuilder::buildFullReport(
    const PerformanceTestInfo & test_info,
-    std::vector<TestStats> & stats) const
+    std::vector<TestStats> & stats,
    const std::vector<std::size_t> & queries_to_run) const
 {
    JSONString json_output;
@ -47,13 +60,7 @@ std::string ReportBuilder::buildFullReport(
    json_output.set("time", getCurrentTime());
    json_output.set("test_name", test_info.test_name);
    json_output.set("path", test_info.path);
-    json_output.set("main_metric", test_info.main_metric);
+    json_output.set("main_metric", getMainMetric(test_info));
    auto has_metric = [&test_info] (const std::string & metric_name)
    {
        return std::find(test_info.metrics.begin(),
            test_info.metrics.end(), metric_name) != test_info.metrics.end();
    };
    if (test_info.substitutions.size())
    {
@ -85,6 +92,9 @@ std::string ReportBuilder::buildFullReport(
    std::vector<JSONString> run_infos;
    for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
    {
        if (!queries_to_run.empty() && std::find(queries_to_run.begin(), queries_to_run.end(), query_index) == queries_to_run.end())
            continue;
        for (size_t number_of_launch = 0; number_of_launch < test_info.times_to_run; ++number_of_launch)
        {
            size_t stat_index = number_of_launch * test_info.queries.size() + query_index;
@ -97,16 +107,16 @@ std::string ReportBuilder::buildFullReport(
            auto query = std::regex_replace(test_info.queries[query_index], QUOTE_REGEX, "\\\"");
            runJSON.set("query", query);
            runJSON.set("query_index", query_index);
            if (!statistics.exception.empty())
                runJSON.set("exception", statistics.exception);
            if (test_info.exec_type == ExecutionType::Loop)
            {
                /// in seconds
-                if (has_metric("min_time"))
+                runJSON.set("min_time", statistics.min_time / double(1000));
                    runJSON.set("min_time", statistics.min_time / double(1000));
-                if (has_metric("quantiles"))
+                if (statistics.sampler.size() != 0)
                {
                    JSONString quantiles(4); /// here, 4 is the size of \t padding
                    for (double percent = 10; percent <= 90; percent += 10)
@ -130,34 +140,21 @@ std::string ReportBuilder::buildFullReport(
                    runJSON.set("quantiles", quantiles.asString());
                }
-                if (has_metric("total_time"))
+                runJSON.set("total_time", statistics.total_time);
                    runJSON.set("total_time", statistics.total_time);
-                if (has_metric("queries_per_second"))
+                if (statistics.total_time != 0)
-                    runJSON.set("queries_per_second",
+                {
-                        double(statistics.queries) / statistics.total_time);
+                    runJSON.set("queries_per_second", static_cast<double>(statistics.queries) / statistics.total_time);
-
+                    runJSON.set("rows_per_second", static_cast<double>(statistics.total_rows_read) / statistics.total_time);
-                if (has_metric("rows_per_second"))
+                    runJSON.set("bytes_per_second", static_cast<double>(statistics.total_bytes_read) / statistics.total_time);
-                    runJSON.set("rows_per_second",
+                }
                        double(statistics.total_rows_read) / statistics.total_time);
                if (has_metric("bytes_per_second"))
                    runJSON.set("bytes_per_second",
                        double(statistics.total_bytes_read) / statistics.total_time);
            }
            else
            {
-                if (has_metric("max_rows_per_second"))
+                runJSON.set("max_rows_per_second", statistics.max_rows_speed);
-                    runJSON.set("max_rows_per_second", statistics.max_rows_speed);
+                runJSON.set("max_bytes_per_second", statistics.max_bytes_speed);
-
+                runJSON.set("avg_rows_per_second", statistics.avg_rows_speed_value);
-                if (has_metric("max_bytes_per_second"))
+                runJSON.set("avg_bytes_per_second", statistics.avg_bytes_speed_value);
                    runJSON.set("max_bytes_per_second", statistics.max_bytes_speed);
                if (has_metric("avg_rows_per_second"))
                    runJSON.set("avg_rows_per_second", statistics.avg_rows_speed_value);
                if (has_metric("avg_bytes_per_second"))
                    runJSON.set("avg_bytes_per_second", statistics.avg_bytes_speed_value);
            }
            run_infos.push_back(runJSON);
@ -171,26 +168,32 @@ std::string ReportBuilder::buildFullReport(
 std::string ReportBuilder::buildCompactReport(
    const PerformanceTestInfo & test_info,
-    std::vector<TestStats> & stats) const
+    std::vector<TestStats> & stats,
    const std::vector<std::size_t> & queries_to_run) const
 {
    std::ostringstream output;
    for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
    {
        if (!queries_to_run.empty() && std::find(queries_to_run.begin(), queries_to_run.end(), query_index) == queries_to_run.end())
            continue;
        for (size_t number_of_launch = 0; number_of_launch < test_info.times_to_run; ++number_of_launch)
        {
            if (test_info.queries.size() > 1)
                output << "query \"" << test_info.queries[query_index] << "\", ";
            output << "run " << std::to_string(number_of_launch + 1) << ": ";
-            output << test_info.main_metric << " = ";
+
            std::string main_metric = getMainMetric(test_info);
            output << main_metric << " = ";
            size_t index = number_of_launch * test_info.queries.size() + query_index;
-            output << stats[index].getStatisticByName(test_info.main_metric);
+            output << stats[index].getStatisticByName(main_metric);
            output << "\n";
        }
    }
    return output.str();
 }
 }
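The report code above stores statistics launch-major (index `number_of_launch * queries.size() + query_index`), skips queries missing from a non-empty `queries_to_run`, and now emits per-second rates only when `total_time != 0`. Below is a minimal sketch of those three rules. The helper names `statIndex`, `isSelected` and `perSecond` are hypothetical and do not exist in ReportBuilder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of ReportBuilder).
// Stats are laid out launch-major: every query of launch 0, then every query of launch 1, ...
std::size_t statIndex(std::size_t number_of_launch, std::size_t num_queries, std::size_t query_index)
{
    return number_of_launch * num_queries + query_index;
}

// An empty filter means "report every query"; otherwise only the listed indices are reported.
bool isSelected(std::size_t query_index, const std::vector<std::size_t> & queries_to_run)
{
    return queries_to_run.empty()
        || std::find(queries_to_run.begin(), queries_to_run.end(), query_index) != queries_to_run.end();
}

// Per-second rates exist only for a non-zero total time, so no division by zero can occur.
std::optional<double> perSecond(std::size_t count, double total_time_seconds)
{
    if (total_time_seconds == 0)
        return std::nullopt;
    return static_cast<double>(count) / total_time_seconds;
}
```

With 3 queries, run 3 of query 1 is found at `statIndex(2, 3, 1) == 7`.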
--- a/dbms/programs/performance-test/ReportBuilder.h
+++ b/dbms/programs/performance-test/ReportBuilder.h
@ -9,14 +9,18 @@ namespace DB
 class ReportBuilder
 {
 public:
-    explicit ReportBuilder(const std::string & server_version_);
+    ReportBuilder(const std::string & server_version_);
    std::string buildFullReport(
        const PerformanceTestInfo & test_info,
-        std::vector<TestStats> & stats) const;
+        std::vector<TestStats> & stats,
        const std::vector<std::size_t> & queries_to_run) const;
    std::string buildCompactReport(
        const PerformanceTestInfo & test_info,
-        std::vector<TestStats> & stats) const;
+        std::vector<TestStats> & stats,
        const std::vector<std::size_t> & queries_to_run) const;
 private:
    std::string server_version;
    std::string hostname;
--- a/dbms/programs/server/HTTPHandler.cpp
+++ b/dbms/programs/server/HTTPHandler.cpp
@ -4,6 +4,7 @@
 #include <Poco/File.h>
 #include <Poco/Net/HTTPBasicCredentials.h>
 #include <Poco/Net/HTTPServerRequest.h>
 #include <Poco/Net/HTTPServerRequestImpl.h>
 #include <Poco/Net/HTTPServerResponse.h>
 #include <Poco/Net/NetException.h>
@ -15,13 +16,12 @@
 #include <Common/getFQDNOrHostName.h>
 #include <Common/CurrentThread.h>
 #include <Common/setThreadName.h>
 #include <Compression/CompressedReadBuffer.h>
 #include <Compression/CompressedWriteBuffer.h>
 #include <IO/ReadBufferFromIStream.h>
 #include <IO/ZlibInflatingReadBuffer.h>
 #include <IO/BrotliReadBuffer.h>
 #include <IO/ReadBufferFromString.h>
 #include <IO/ConcatReadBuffer.h>
 #include <Compression/CompressedReadBuffer.h>
 #include <Compression/CompressedWriteBuffer.h>
 #include <IO/WriteBufferFromString.h>
 #include <IO/WriteBufferFromHTTPServerResponse.h>
 #include <IO/WriteBufferFromFile.h>
@ -563,9 +563,47 @@ void HTTPHandler::processQuery(
    client_info.http_method = http_method;
    client_info.http_user_agent = request.get("User-Agent", "");
    auto appendCallback = [&context] (ProgressCallback callback)
    {
        auto prev = context.getProgressCallback();
        context.setProgressCallback([prev, callback] (const Progress & progress)
        {
            if (prev)
                prev(progress);
            callback(progress);
        });
    };
    /// While no data has been sent yet, report query execution progress by sending HTTP headers.
    if (settings.send_progress_in_http_headers)
-        context.setProgressCallback([&used_output] (const Progress & progress) { used_output.out->onProgress(progress); });
+        appendCallback([&used_output] (const Progress & progress) { used_output.out->onProgress(progress); });
    if (settings.readonly > 0 && settings.cancel_http_readonly_queries_on_client_close)
    {
        Poco::Net::StreamSocket & socket = dynamic_cast<Poco::Net::HTTPServerRequestImpl &>(request).socket();
        appendCallback([&context, &socket](const Progress &)
        {
            /// Assume that by the time this callback runs, nothing else reads data from the socket any more.
            /// True for read-only queries.
            try
            {
                char b;
                int status = socket.receiveBytes(&b, 1, MSG_DONTWAIT | MSG_PEEK);
                if (status == 0)
                    context.killCurrentQuery();
            }
            catch (Poco::TimeoutException &)
            {
            }
            catch (...)
            {
                context.killCurrentQuery();
            }
        });
    }
    executeQuery(*in, *used_output.out_maybe_delayed_and_compressed, /* allow_into_outfile = */ false, context,
        [&response] (const String & content_type) { response.setContentType(content_type); },
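The `appendCallback` lambda above chains progress callbacks so that the HTTP-header progress callback and the client-disconnect check can both run, instead of the second `setProgressCallback` call replacing the first. Here is a minimal sketch of that chaining with `std::function`. `Progress` and `ProgressCallbackSlot` are hypothetical stand-ins for `DB::Progress` and the context's callback slot.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for DB::Progress.
struct Progress
{
    std::size_t rows = 0;
};

using ProgressCallback = std::function<void(const Progress &)>;

// Hypothetical stand-in for the single progress-callback slot held by the context.
struct ProgressCallbackSlot
{
    ProgressCallback callback;

    // Chain instead of overwrite: the previously installed callback (if any) runs first,
    // then the newly appended one.
    void append(ProgressCallback next)
    {
        ProgressCallback prev = callback;
        callback = [prev, next](const Progress & progress)
        {
            if (prev)
                prev(progress);
            next(progress);
        };
    }
};
```

Capturing `prev` by value, as the original lambda does, keeps the earlier callback alive after the slot has been reassigned.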
--- a/dbms/programs/server/MetricsTransmitter.cpp
+++ b/dbms/programs/server/MetricsTransmitter.cpp
@ -41,7 +41,7 @@ void MetricsTransmitter::run()
    const auto & config = context.getConfigRef();
    auto interval = config.getInt(config_name + ".interval", 60);
-    const std::string thread_name = "MericsTrns " + std::to_string(interval) + "s";
+    const std::string thread_name = "MetrTx" + std::to_string(interval);
    setThreadName(thread_name.c_str());
    const auto get_next_time = [](size_t seconds)
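The thread name shrinks from `"MericsTrns " + interval + "s"` to `"MetrTx" + interval`. This is presumably done to stay within the Linux thread-name limit; the diff does not state the motivation. On Linux a thread name holds at most 15 characters plus the terminating NUL. The sketch below checks names against that limit. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// On Linux a thread name holds at most 15 characters plus the terminating NUL:
// pthread_setname_np() fails with ERANGE for longer names, prctl(PR_SET_NAME) truncates them.
constexpr std::size_t max_thread_name_length = 15;

// Hypothetical helper reproducing the new naming scheme from the diff above.
std::string metricsTransmitterThreadName(int interval_seconds)
{
    return "MetrTx" + std::to_string(interval_seconds);
}

bool fitsThreadName(const std::string & name)
{
    return name.size() <= max_thread_name_length;
}
```

With an interval of 3600 the old scheme gives the 16-character `"MericsTrns 3600s"`, which is too long. The new scheme gives `"MetrTx3600"`.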
--- a/dbms/programs/server/Server.cpp
+++ b/dbms/programs/server/Server.cpp
@ -11,6 +11,7 @@
 #include <Poco/DirectoryIterator.h>
 #include <Poco/Net/HTTPServer.h>
 #include <Poco/Net/NetException.h>
 #include <Poco/Util/HelpFormatter.h>
 #include <ext/scope_guard.h>
 #include <common/logger_useful.h>
 #include <common/ErrorHandlers.h>
@ -47,6 +48,7 @@
 #include "MetricsTransmitter.h"
 #include <Common/StatusFile.h>
 #include "TCPHandlerFactory.h"
 #include "Common/config_version.h"
 #if defined(__linux__)
 #include <Common/hasLinuxCapability.h>
@ -116,6 +118,26 @@ void Server::uninitialize()
    BaseDaemon::uninitialize();
 }
 int Server::run()
 {
    if (config().hasOption("help"))
    {
        Poco::Util::HelpFormatter helpFormatter(Server::options());
        std::stringstream header;
        header << commandName() << " [OPTION] [-- [ARG]...]\n";
        header << "positional arguments can be used to override config.xml properties, for example, --http_port=8010";
        helpFormatter.setHeader(header.str());
        helpFormatter.format(std::cout);
        return 0;
    }
    if (config().hasOption("version"))
    {
        std::cout << DBMS_NAME << " server version " << VERSION_STRING << "." << std::endl;
        return 0;
    }
    return Application::run();
 }
 void Server::initialize(Poco::Util::Application & self)
 {
    BaseDaemon::initialize(self);
@ -127,6 +149,21 @@ std::string Server::getDefaultCorePath() const
    return getCanonicalPath(config().getString("path", DBMS_DEFAULT_PATH)) + "cores";
 }
 void Server::defineOptions(Poco::Util::OptionSet & _options)
 {
    _options.addOption(
        Poco::Util::Option("help", "h", "show help and exit")
            .required(false)
            .repeatable(false)
            .binding("help"));
    _options.addOption(
        Poco::Util::Option("version", "V", "show version and exit")
            .required(false)
            .repeatable(false)
            .binding("version"));
    BaseDaemon::defineOptions(_options);
 }
 int Server::main(const std::vector<std::string> & /*args*/)
 {
    Logger * log = &logger();
@ -398,19 +435,37 @@ int Server::main(const std::vector<std::string> & /*args*/)
    if (config().has("max_partition_size_to_drop"))
        global_context->setMaxPartitionSizeToDrop(config().getUInt64("max_partition_size_to_drop"));
    /// Set up caches.
    /// Lower cache size on low-memory systems.
    double cache_size_to_ram_max_ratio = config().getDouble("cache_size_to_ram_max_ratio", 0.5);
    size_t max_cache_size = memory_amount * cache_size_to_ram_max_ratio;
    /// Size of cache for uncompressed blocks. Zero means disabled.
    size_t uncompressed_cache_size = config().getUInt64("uncompressed_cache_size", 0);
-    if (uncompressed_cache_size)
+    if (uncompressed_cache_size > max_cache_size)
-        global_context->setUncompressedCache(uncompressed_cache_size);
+    {
        uncompressed_cache_size = max_cache_size;
        LOG_INFO(log, "Uncompressed cache size was lowered to " << formatReadableSizeWithBinarySuffix(uncompressed_cache_size)
            << " because the system has a low amount of memory");
    }
    global_context->setUncompressedCache(uncompressed_cache_size);
    /// Load global settings from default_profile and system_profile.
    global_context->setDefaultProfiles(config());
    Settings & settings = global_context->getSettingsRef();
-    /// Size of cache for marks (index of MergeTree family of tables). It is necessary.
+    /// Size of cache for marks (index of MergeTree family of tables). It is mandatory.
    size_t mark_cache_size = config().getUInt64("mark_cache_size");
-    if (mark_cache_size)
+    if (!mark_cache_size)
-        global_context->setMarkCache(mark_cache_size);
+        LOG_ERROR(log, "Mark cache size is zero; this will lead to severe performance degradation.");
    if (mark_cache_size > max_cache_size)
    {
        mark_cache_size = max_cache_size;
        LOG_INFO(log, "Mark cache size was lowered to " << formatReadableSizeWithBinarySuffix(mark_cache_size)
            << " because the system has a low amount of memory");
    }
    global_context->setMarkCache(mark_cache_size);
 #if USE_EMBEDDED_COMPILER
    size_t compiled_expression_cache_size = config().getUInt64("compiled_expression_cache_size", 500);
@ -697,10 +752,10 @@ int Server::main(const std::vector<std::string> & /*args*/)
        {
            std::stringstream message;
-            message << "Available RAM = " << formatReadableSizeWithBinarySuffix(memory_amount) << ";"
+            message << "Available RAM: " << formatReadableSizeWithBinarySuffix(memory_amount) << ";"
-                << " physical cores = " << getNumberOfPhysicalCPUCores() << ";"
+                << " physical cores: " << getNumberOfPhysicalCPUCores() << ";"
                // on ARM processors this may count only the cores enabled at the moment
-                << " threads = " << std::thread::hardware_concurrency() << ".";
+                << " logical cores: " << std::thread::hardware_concurrency() << ".";
            LOG_INFO(log, message.str());
        }
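Both the uncompressed cache and the mark cache are now clamped to `memory_amount * cache_size_to_ram_max_ratio`, where the ratio defaults to 0.5. The sketch below shows that rule on its own. `clampCacheSize` is a hypothetical helper. The server performs the comparison inline and also logs when it lowers a value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a configured cache size may not exceed a fixed fraction of RAM.
std::uint64_t clampCacheSize(std::uint64_t requested, std::uint64_t memory_amount, double max_ratio)
{
    const auto max_cache_size = static_cast<std::uint64_t>(memory_amount * max_ratio);
    return requested > max_cache_size ? max_cache_size : requested;
}
```

For example, on a 4 GiB machine with the default ratio, a configured 8 GiB cache is lowered to 2 GiB, and a 1 GiB cache is left unchanged.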
--- a/dbms/programs/server/Server.h
+++ b/dbms/programs/server/Server.h
@ -21,6 +21,8 @@ namespace DB
 class Server : public BaseDaemon, public IServer
 {
 public:
    using ServerApplication::run;
    Poco::Util::LayeredConfiguration & config() const override
    {
        return BaseDaemon::config();
@ -41,7 +43,10 @@ public:
        return BaseDaemon::isCancelled();
    }
    void defineOptions(Poco::Util::OptionSet & _options) override;
 protected:
    int run() override;
    void initialize(Application & self) override;
    void uninitialize() override;
--- a/dbms/programs/server/TCPHandler.cpp
+++ b/dbms/programs/server/TCPHandler.cpp
@ -6,8 +6,6 @@
 #include <Common/ClickHouseRevision.h>
 #include <Common/CurrentThread.h>
 #include <Common/Stopwatch.h>
 #include <Common/ClickHouseRevision.h>
 #include <Common/Stopwatch.h>
 #include <Common/NetException.h>
 #include <Common/setThreadName.h>
 #include <Common/config_version.h>
@ -302,10 +300,10 @@ void TCPHandler::runImpl()
 void TCPHandler::readData(const Settings & global_settings)
 {
-    auto receive_timeout = query_context.getSettingsRef().receive_timeout.value;
+    const auto receive_timeout = query_context.getSettingsRef().receive_timeout.value;
    /// Poll interval should not be greater than receive_timeout
-    size_t default_poll_interval = global_settings.poll_interval.value * 1000000;
+    const size_t default_poll_interval = global_settings.poll_interval.value * 1000000;
    size_t current_poll_interval = static_cast<size_t>(receive_timeout.totalMicroseconds());
    constexpr size_t min_poll_interval = 5000; // 5 ms
    size_t poll_interval = std::max(min_poll_interval, std::min(default_poll_interval, current_poll_interval));
@ -409,7 +407,7 @@ void TCPHandler::processOrdinaryQuery()
                }
                else
                {
-                    if (state.progress.rows && after_send_progress.elapsed() / 1000 >= query_context.getSettingsRef().interactive_delay)
+                    if (after_send_progress.elapsed() / 1000 >= query_context.getSettingsRef().interactive_delay)
                    {
                        /// Enough time has passed since progress was last sent.
                        after_send_progress.restart();
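`TCPHandler::readData` clamps its poll interval to lie between a 5 ms floor and `receive_timeout`, starting from the global `poll_interval` (seconds, converted to microseconds). A standalone sketch of that clamp follows. The function name `pollInterval` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper. All values are in microseconds.
// Never wait longer than receive_timeout, and never poll more often than every 5 ms.
std::size_t pollInterval(std::size_t default_poll_interval_us, std::size_t receive_timeout_us)
{
    constexpr std::size_t min_poll_interval_us = 5000; // 5 ms
    return std::max(min_poll_interval_us, std::min(default_poll_interval_us, receive_timeout_us));
}
```

With a 10 s default interval and a 1 s receive timeout, the result is 1 s. A receive timeout below 5 ms is raised to the 5 ms floor.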
--- a/dbms/src/AggregateFunctions/AggregateFunctionCount.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionCount.h
@ -1,12 +1,12 @@
 #pragma once
 #include <IO/VarInt.h>
 #include <IO/WriteHelpers.h>
 #include <array>
 #include <DataTypes/DataTypesNumber.h>
 #include <Columns/ColumnNullable.h>
 #include <AggregateFunctions/IAggregateFunction.h>
 #include <IO/WriteHelpers.h>
 namespace DB
--- a/dbms/src/AggregateFunctions/AggregateFunctionEntropy.cpp
+++ b/dbms/src/AggregateFunctions/AggregateFunctionEntropy.cpp
@ -0,0 +1,44 @@
 #include <AggregateFunctions/AggregateFunctionFactory.h>
 #include <AggregateFunctions/AggregateFunctionEntropy.h>
 #include <AggregateFunctions/FactoryHelpers.h>
 #include <AggregateFunctions/Helpers.h>
 namespace DB
 {
 namespace ErrorCodes
 {
    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
 }
 namespace
 {
 AggregateFunctionPtr createAggregateFunctionEntropy(const std::string & name, const DataTypes & argument_types, const Array & parameters)
 {
    assertNoParameters(name, parameters);
    if (argument_types.empty())
        throw Exception("Incorrect number of arguments for aggregate function " + name,
                        ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
    size_t num_args = argument_types.size();
    if (num_args == 1)
    {
        /// Specialized implementation for a single argument of numeric type.
        if (auto res = createWithNumericBasedType<AggregateFunctionEntropy>(*argument_types[0], num_args))
            return AggregateFunctionPtr(res);
    }
    /// Generic implementation for other types or for multiple arguments.
    return std::make_shared<AggregateFunctionEntropy<UInt128>>(num_args);
 }
 }
 void registerAggregateFunctionEntropy(AggregateFunctionFactory & factory)
 {
    factory.registerFunction("entropy", createAggregateFunctionEntropy);
 }
 }
--- a/dbms/src/AggregateFunctions/AggregateFunctionEntropy.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionEntropy.h
@ -0,0 +1,149 @@
 #pragma once
 #include <Common/HashTable/HashMap.h>
 #include <Common/NaNUtils.h>
 #include <AggregateFunctions/IAggregateFunction.h>
 #include <AggregateFunctions/UniqVariadicHash.h>
 #include <DataTypes/DataTypesNumber.h>
 #include <Columns/ColumnVector.h>
 #include <cmath>
 namespace DB
 {
 /** Calculates Shannon entropy by counting the occurrences of each value in a HashMap and computing the empirical distribution.
  * Entropy is measured in bits (base-2 logarithm is used).
  */
 template <typename Value>
 struct EntropyData
 {
    using Weight = UInt64;
    using HashingMap = HashMap<
        Value, Weight,
        HashCRC32<Value>,
        HashTableGrower<4>,
        HashTableAllocatorWithStackMemory<sizeof(std::pair<Value, Weight>) * (1 << 3)>>;
    /// For the case of pre-hashed values.
    using TrivialMap = HashMap<
        Value, Weight,
        UInt128TrivialHash,
        HashTableGrower<4>,
        HashTableAllocatorWithStackMemory<sizeof(std::pair<Value, Weight>) * (1 << 3)>>;
    using Map = std::conditional_t<std::is_same_v<UInt128, Value>, TrivialMap, HashingMap>;
    Map map;
    void add(const Value & x)
    {
        if (!isNaN(x))
            ++map[x];
    }
    void add(const Value & x, const Weight & weight)
    {
        if (!isNaN(x))
            map[x] += weight;
    }
    void merge(const EntropyData & rhs)
    {
        for (const auto & pair : rhs.map)
            map[pair.first] += pair.second;
    }
    void serialize(WriteBuffer & buf) const
    {
        map.write(buf);
    }
    void deserialize(ReadBuffer & buf)
    {
        typename Map::Reader reader(buf);
        while (reader.next())
        {
            const auto & pair = reader.get();
            map[pair.first] = pair.second;
        }
    }
    Float64 get() const
    {
        UInt64 total_value = 0;
        for (const auto & pair : map)
            total_value += pair.second;
        Float64 shannon_entropy = 0;
        for (const auto & pair : map)
        {
            Float64 frequency = Float64(pair.second) / total_value;
            shannon_entropy -= frequency * log2(frequency);
        }
        return shannon_entropy;
    }
 };
 template <typename Value>
 class AggregateFunctionEntropy final : public IAggregateFunctionDataHelper<EntropyData<Value>, AggregateFunctionEntropy<Value>>
 {
 private:
    size_t num_args;
 public:
    AggregateFunctionEntropy(size_t num_args) : num_args(num_args)
    {
    }
    String getName() const override { return "entropy"; }
    DataTypePtr getReturnType() const override
    {
        return std::make_shared<DataTypeNumber<Float64>>();
    }
    void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
    {
        if constexpr (!std::is_same_v<UInt128, Value>)
        {
            /// Only numeric types are handled here.
            const auto & column = static_cast<const ColumnVector <Value> &>(*columns[0]);
            this->data(place).add(column.getData()[row_num]);
        }
        else
        {
            this->data(place).add(UniqVariadicHash<true, false>::apply(num_args, columns, row_num));
        }
    }
    void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override
    {
        this->data(place).merge(this->data(rhs));
    }
    void serialize(ConstAggregateDataPtr place, WriteBuffer & buf) const override
    {
        this->data(const_cast<AggregateDataPtr>(place)).serialize(buf);
    }
    void deserialize(AggregateDataPtr place, ReadBuffer & buf, Arena *) const override
    {
        this->data(place).deserialize(buf);
    }
    void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
    {
        auto & column = static_cast<ColumnVector<Float64> &>(to);
        column.getData().push_back(this->data(place).get());
    }
    const char * getHeaderFilePath() const override { return __FILE__; }
 };
 }
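`EntropyData` counts each value, merges counts by summing, and returns H = -Σ p·log2(p) over the empirical frequencies, skipping NaNs. The sketch below reproduces that computation with `std::unordered_map` instead of ClickHouse's `HashMap`. `EntropySketch` is a hypothetical name, and it does not serialize state.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <type_traits>
#include <unordered_map>

// Hypothetical, simplified analogue of EntropyData: count each value,
// then compute -sum(p * log2(p)) over the empirical frequencies, in bits.
template <typename Value>
struct EntropySketch
{
    std::unordered_map<Value, std::uint64_t> counts;

    void add(const Value & x, std::uint64_t weight = 1)
    {
        if constexpr (std::is_floating_point_v<Value>)
        {
            if (std::isnan(x))
                return; // NaN values are ignored, as in EntropyData::add
        }
        counts[x] += weight;
    }

    void merge(const EntropySketch & rhs)
    {
        for (const auto & entry : rhs.counts)
            counts[entry.first] += entry.second;
    }

    double get() const
    {
        std::uint64_t total = 0;
        for (const auto & entry : counts)
            total += entry.second;

        double entropy = 0;
        for (const auto & entry : counts)
        {
            double p = static_cast<double>(entry.second) / total;
            entropy -= p * std::log2(p);
        }
        return entropy;
    }
};
```

Two equally frequent values give 1 bit. Merging in a value with weight 2 yields frequencies 1/4, 1/4 and 1/2, which gives 1.5 bits.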
--- a/dbms/src/AggregateFunctions/AggregateFunctionFactory.cpp
+++ b/dbms/src/AggregateFunctions/AggregateFunctionFactory.cpp
@ -5,15 +5,17 @@
 #include <DataTypes/DataTypeArray.h>
 #include <DataTypes/DataTypeNullable.h>
 #include <DataTypes/DataTypesNumber.h>
 #include <DataTypes/DataTypeLowCardinality.h>
 #include <IO/WriteBuffer.h>
 #include <IO/WriteHelpers.h>
 #include <Interpreters/Context.h>
 #include <Common/StringUtils/StringUtils.h>
 #include <Common/typeid_cast.h>
 #include <Poco/String.h>
 #include <DataTypes/DataTypeLowCardinality.h>
 namespace DB
@ -128,7 +130,11 @@ AggregateFunctionPtr AggregateFunctionFactory::getImpl(
        return combinator->transformAggregateFunction(nested_function, argument_types, parameters);
    }
-    throw Exception("Unknown aggregate function " + name, ErrorCodes::UNKNOWN_AGGREGATE_FUNCTION);
+    auto hints = this->getHints(name);
    if (!hints.empty())
        throw Exception("Unknown aggregate function " + name + ". Maybe you meant: " + toString(hints), ErrorCodes::UNKNOWN_AGGREGATE_FUNCTION);
    else
        throw Exception("Unknown aggregate function " + name, ErrorCodes::UNKNOWN_AGGREGATE_FUNCTION);
 }
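The factory now appends "Maybe you meant: ..." suggestions from `getHints(name)`. The diff does not show how candidates are ranked. The sketch below assumes a plain Levenshtein edit distance with a hypothetical cutoff of 2 and sorts the closest names first. `levenshtein` and `getHintsSketch` are illustrative names, not ClickHouse APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic two-row dynamic-programming Levenshtein distance (unit-cost insert/delete/substitute).
std::size_t levenshtein(const std::string & a, const std::string & b)
{
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitution});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical hint lookup: registered names within max_distance edits, closest first.
std::vector<std::string> getHintsSketch(const std::string & name, const std::vector<std::string> & known, std::size_t max_distance = 2)
{
    std::vector<std::pair<std::size_t, std::string>> scored;
    for (const auto & candidate : known)
    {
        std::size_t distance = levenshtein(name, candidate);
        if (distance <= max_distance)
            scored.emplace_back(distance, candidate);
    }
    std::sort(scored.begin(), scored.end());

    std::vector<std::string> hints;
    for (const auto & entry : scored)
        hints.push_back(entry.second);
    return hints;
}
```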
--- a/dbms/src/AggregateFunctions/AggregateFunctionHistogram.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionHistogram.h
@ -13,6 +13,8 @@
 #include <IO/WriteBuffer.h>
 #include <IO/ReadBuffer.h>
 #include <IO/WriteHelpers.h>
 #include <IO/ReadHelpers.h>
 #include <IO/VarInt.h>
 #include <AggregateFunctions/IAggregateFunction.h>
@ -268,15 +270,13 @@ public:
        lower_bound = std::min(lower_bound, other.lower_bound);
        upper_bound = std::max(upper_bound, other.upper_bound);
        for (size_t i = 0; i < other.size; i++)
        {
            add(other.points[i].mean, other.points[i].weight, max_bins);
        }
    }
    void write(WriteBuffer & buf) const
    {
-        buf.write(reinterpret_cast<const char *>(&lower_bound), sizeof(lower_bound));
+        writeBinary(lower_bound, buf);
-        buf.write(reinterpret_cast<const char *>(&upper_bound), sizeof(upper_bound));
+        writeBinary(upper_bound, buf);
        writeVarUInt(size, buf);
        buf.write(reinterpret_cast<const char *>(points), size * sizeof(WeightedValue));
@ -284,11 +284,10 @@ public:
    void read(ReadBuffer & buf, UInt32 max_bins)
    {
-        buf.read(reinterpret_cast<char *>(&lower_bound), sizeof(lower_bound));
+        readBinary(lower_bound, buf);
-        buf.read(reinterpret_cast<char *>(&upper_bound), sizeof(upper_bound));
+        readBinary(upper_bound, buf);
        readVarUInt(size, buf);
        if (size > max_bins * 2)
            throw Exception("Too many bins", ErrorCodes::TOO_LARGE_ARRAY_SIZE);
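When two histogram states are merged, each bound must be compared with its own counterpart: the smaller lower bound and the larger upper bound. Comparing the already-updated `lower_bound` with `other.upper_bound` can shrink the range. The hypothetical `Bounds` struct below shows the intended rule.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical range accumulator illustrating how histogram bounds should merge.
struct Bounds
{
    double lower = std::numeric_limits<double>::max();
    double upper = std::numeric_limits<double>::lowest();

    void add(double x)
    {
        lower = std::min(lower, x);
        upper = std::max(upper, x);
    }

    void merge(const Bounds & other)
    {
        lower = std::min(lower, other.lower);
        upper = std::max(upper, other.upper); // compare upper with upper, not with lower
    }
};
```

Merging [5, 10] with [1, 3] must give [1, 10]. Computing `std::max(lower, other.upper)` after updating `lower` would instead give an upper bound of 3.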
--- a/dbms/src/AggregateFunctions/AggregateFunctionMerge.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionMerge.h
@ -3,7 +3,6 @@
 #include <DataTypes/DataTypeAggregateFunction.h>
 #include <AggregateFunctions/IAggregateFunction.h>
 #include <Columns/ColumnAggregateFunction.h>
 #include <DataTypes/DataTypeAggregateFunction.h>
 #include <Common/typeid_cast.h>
--- a/dbms/src/AggregateFunctions/AggregateFunctionQuantile.cpp
+++ b/dbms/src/AggregateFunctions/AggregateFunctionQuantile.cpp
@ -41,7 +41,7 @@ template <typename T> using FuncQuantilesTDigestWeighted = AggregateFunctionQuan
 template <template <typename> class Function>
-static constexpr bool SupportDecimal()
+static constexpr bool supportDecimal()
 {
    return std::is_same_v<Function<Float32>, FuncQuantileExact<Float32>> ||
        std::is_same_v<Function<Float32>, FuncQuantilesExact<Float32>>;
@ -61,11 +61,10 @@ AggregateFunctionPtr createAggregateFunctionQuantile(const std::string & name, c
    if (which.idx == TypeIndex::TYPE) return std::make_shared<Function<TYPE>>(argument_type, params);
    FOR_NUMERIC_TYPES(DISPATCH)
 #undef DISPATCH
 #undef FOR_NUMERIC_TYPES
    if (which.idx == TypeIndex::Date) return std::make_shared<Function<DataTypeDate::FieldType>>(argument_type, params);
    if (which.idx == TypeIndex::DateTime) return std::make_shared<Function<DataTypeDateTime::FieldType>>(argument_type, params);
-    if constexpr (SupportDecimal<Function>())
+    if constexpr (supportDecimal<Function>())
    {
        if (which.idx == TypeIndex::Decimal32) return std::make_shared<Function<Decimal32>>(argument_type, params);
        if (which.idx == TypeIndex::Decimal64) return std::make_shared<Function<Decimal64>>(argument_type, params);
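`supportDecimal<Function>()` is a compile-time predicate over a template template parameter. It instantiates `Function` with a probe type and compares the result with known templates via `std::is_same_v`. The minimal sketch below uses hypothetical templates `ExactQuantile` and `TDigestQuantile` in place of the real quantile function aliases.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the quantile function templates.
template <typename T> struct ExactQuantile {};
template <typename T> struct TDigestQuantile {};

// Instantiate the template with a probe type and compare it with a known template.
template <template <typename> class Function>
constexpr bool supportsDecimalSketch()
{
    return std::is_same_v<Function<float>, ExactQuantile<float>>;
}

static_assert(supportsDecimalSketch<ExactQuantile>());
static_assert(!supportsDecimalSketch<TDigestQuantile>());
```

Because the predicate is `constexpr`, it can guard an `if constexpr` branch. The Decimal instantiations are then compiled only for functions that support them.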
--- a/dbms/src/AggregateFunctions/AggregateFunctionTopK.cpp
+++ b/dbms/src/AggregateFunctions/AggregateFunctionTopK.cpp
@ -15,6 +15,7 @@ namespace ErrorCodes
 {
    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
    extern const int ARGUMENT_OUT_OF_BOUND;
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
 }
@ -22,42 +23,59 @@ namespace
 {
 /// Substitute return type for Date and DateTime
-class AggregateFunctionTopKDate : public AggregateFunctionTopK<DataTypeDate::FieldType>
+template <bool is_weighted>
 class AggregateFunctionTopKDate : public AggregateFunctionTopK<DataTypeDate::FieldType, is_weighted>
 {
-    using AggregateFunctionTopK<DataTypeDate::FieldType>::AggregateFunctionTopK;
+    using AggregateFunctionTopK<DataTypeDate::FieldType, is_weighted>::AggregateFunctionTopK;
    DataTypePtr getReturnType() const override { return std::make_shared<DataTypeArray>(std::make_shared<DataTypeDate>()); }
 };
-class AggregateFunctionTopKDateTime : public AggregateFunctionTopK<DataTypeDateTime::FieldType>
+template <bool is_weighted>
 class AggregateFunctionTopKDateTime : public AggregateFunctionTopK<DataTypeDateTime::FieldType, is_weighted>
 {
-    using AggregateFunctionTopK<DataTypeDateTime::FieldType>::AggregateFunctionTopK;
+    using AggregateFunctionTopK<DataTypeDateTime::FieldType, is_weighted>::AggregateFunctionTopK;
    DataTypePtr getReturnType() const override { return std::make_shared<DataTypeArray>(std::make_shared<DataTypeDateTime>()); }
 };
 template <bool is_weighted>
 static IAggregateFunction * createWithExtraTypes(const DataTypePtr & argument_type, UInt64 threshold)
 {
    WhichDataType which(argument_type);
-    if (which.idx == TypeIndex::Date) return new AggregateFunctionTopKDate(threshold);
+    if (which.idx == TypeIndex::Date)
-    if (which.idx == TypeIndex::DateTime) return new AggregateFunctionTopKDateTime(threshold);
+        return new AggregateFunctionTopKDate<is_weighted>(threshold);
    if (which.idx == TypeIndex::DateTime)
        return new AggregateFunctionTopKDateTime<is_weighted>(threshold);
    /// Check that we can use plain version of AggregateFunctionTopKGeneric
    if (argument_type->isValueUnambiguouslyRepresentedInContiguousMemoryRegion())
-        return new AggregateFunctionTopKGeneric<true>(threshold, argument_type);
+        return new AggregateFunctionTopKGeneric<true, is_weighted>(threshold, argument_type);
    else
-        return new AggregateFunctionTopKGeneric<false>(threshold, argument_type);
+        return new AggregateFunctionTopKGeneric<false, is_weighted>(threshold, argument_type);
 }
 template <bool is_weighted>
 AggregateFunctionPtr createAggregateFunctionTopK(const std::string & name, const DataTypes & argument_types, const Array & params)
 {
-    assertUnary(name, argument_types);
+    if (!is_weighted)
    {
        assertUnary(name, argument_types);
    }
    else
    {
        assertBinary(name, argument_types);
        if (!isNumber(argument_types[1]))
            throw Exception("The second argument for aggregate function 'topKWeighted' must have numeric type", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
    }
    UInt64 threshold = 10;  /// default value
    if (!params.empty())
    {
        if (params.size() != 1)
-            throw Exception("Aggregate function " + name + " requires one parameter or less.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
+            throw Exception("Aggregate function " + name + " requires one parameter or less.",
                            ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
        UInt64 k = applyVisitor(FieldVisitorConvertToNumber<UInt64>(), params[0]);
@ -72,10 +90,10 @@ AggregateFunctionPtr createAggregateFunctionTopK(const std::string & name, const
        threshold = k;
    }
-    AggregateFunctionPtr res(createWithNumericType<AggregateFunctionTopK>(*argument_types[0], threshold));
+    AggregateFunctionPtr res(createWithNumericType<AggregateFunctionTopK, is_weighted>(*argument_types[0], threshold));
    if (!res)
-        res = AggregateFunctionPtr(createWithExtraTypes(argument_types[0], threshold));
+        res = AggregateFunctionPtr(createWithExtraTypes<is_weighted>(argument_types[0], threshold));
    if (!res)
        throw Exception("Illegal type " + argument_types[0]->getName() +
@ -88,7 +106,8 @@ AggregateFunctionPtr createAggregateFunctionTopK(const std::string & name, const
 void registerAggregateFunctionTopK(AggregateFunctionFactory & factory)
 {
-    factory.registerFunction("topK", createAggregateFunctionTopK);
+    factory.registerFunction("topK", createAggregateFunctionTopK<false>);
    factory.registerFunction("topKWeighted", createAggregateFunctionTopK<true>);
 }
 }
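`topKWeighted` takes a second, numeric argument and adds that weight to each value's count, so `topK` is the special case where every weight is 1. The sketch below computes an exact weighted top-K by summing weights per value and sorting. This is an illustration only. The real state (`set` above) is a bounded approximate structure, so its results can differ on large inputs. `topKWeightedExact` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical exact weighted top-K: sum weights per value, then return the k values
// with the largest totals (ties broken by value so the output is deterministic).
template <typename T>
std::vector<T> topKWeightedExact(const std::vector<std::pair<T, std::uint64_t>> & rows, std::size_t k)
{
    std::unordered_map<T, std::uint64_t> totals;
    for (const auto & row : rows)
        totals[row.first] += row.second;

    std::vector<std::pair<T, std::uint64_t>> sorted(totals.begin(), totals.end());
    std::sort(sorted.begin(), sorted.end(), [](const auto & a, const auto & b)
    {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });

    std::vector<T> result;
    for (std::size_t i = 0; i < sorted.size() && i < k; ++i)
        result.push_back(sorted[i].first);
    return result;
}
```

For rows `("a", 1), ("b", 5), ("a", 3), ("c", 2)` the totals are a=4, b=5 and c=2, so the top 2 are `b` and then `a`.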
--- a/dbms/src/AggregateFunctions/AggregateFunctionTopK.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionTopK.h
@ -38,13 +38,12 @@ struct AggregateFunctionTopKData
 };
-template <typename T>
+template <typename T, bool is_weighted>
 class AggregateFunctionTopK
-    : public IAggregateFunctionDataHelper<AggregateFunctionTopKData<T>, AggregateFunctionTopK<T>>
+    : public IAggregateFunctionDataHelper<AggregateFunctionTopKData<T>, AggregateFunctionTopK<T, is_weighted>>
 {
-private:
+protected:
    using State = AggregateFunctionTopKData<T>;
    UInt64 threshold;
    UInt64 reserved;
@ -52,7 +51,7 @@ public:
    AggregateFunctionTopK(UInt64 threshold)
        : threshold(threshold), reserved(TOP_K_LOAD_FACTOR * threshold) {}
-    String getName() const override { return "topK"; }
+    String getName() const override { return is_weighted ? "topKWeighted" : "topK"; }
    DataTypePtr getReturnType() const override
    {
@ -64,7 +63,11 @@ public:
        auto & set = this->data(place).value;
        if (set.capacity() != reserved)
            set.resize(reserved);
-        set.insert(static_cast<const ColumnVector<T> &>(*columns[0]).getData()[row_num]);
+
        if constexpr (is_weighted)
            set.insert(static_cast<const ColumnVector<T> &>(*columns[0]).getData()[row_num], columns[1]->getUInt(row_num));
        else
            set.insert(static_cast<const ColumnVector<T> &>(*columns[0]).getData()[row_num]);
    }
    void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override
@ -125,8 +128,8 @@ struct AggregateFunctionTopKGenericData
 /** Template parameter with true value should be used for columns that store their elements contiguously in memory.
 *  For such columns topK() can be implemented more efficiently (especially for small numeric arrays).
 */
-template <bool is_plain_column = false>
+template <bool is_plain_column, bool is_weighted>
-class AggregateFunctionTopKGeneric : public IAggregateFunctionDataHelper<AggregateFunctionTopKGenericData, AggregateFunctionTopKGeneric<is_plain_column>>
+class AggregateFunctionTopKGeneric : public IAggregateFunctionDataHelper<AggregateFunctionTopKGenericData, AggregateFunctionTopKGeneric<is_plain_column, is_weighted>>
 {
 private:
    using State = AggregateFunctionTopKGenericData;
@ -141,7 +144,7 @@ public:
    AggregateFunctionTopKGeneric(UInt64 threshold, const DataTypePtr & input_data_type)
        : threshold(threshold), reserved(TOP_K_LOAD_FACTOR * threshold), input_data_type(input_data_type) {}
-    String getName() const override { return "topK"; }
+    String getName() const override { return is_weighted ? "topKWeighted" : "topK"; }
    DataTypePtr getReturnType() const override
    {
@ -189,13 +192,19 @@ public:
        if constexpr (is_plain_column)
        {
-            set.insert(columns[0]->getDataAt(row_num));
+            if constexpr (is_weighted)
                set.insert(columns[0]->getDataAt(row_num), columns[1]->getUInt(row_num));
            else
                set.insert(columns[0]->getDataAt(row_num));
        }
        else
        {
            const char * begin = nullptr;
            StringRef str_serialized = columns[0]->serializeValueIntoArena(row_num, *arena, begin);
-            set.insert(str_serialized);
+            if constexpr (is_weighted)
                set.insert(str_serialized, columns[1]->getUInt(row_num));
            else
                set.insert(str_serialized);
            arena->rollback(str_serialized.size);
        }
    }
@ -226,7 +235,6 @@ public:
    const char * getHeaderFilePath() const override { return __FILE__; }
 };
 #undef TOP_K_LOAD_FACTOR
 }
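With this change `topK` and its generic counterpart gain a weighted mode. When `is_weighted` is set, `add()` reads a second column with `columns[1]->getUInt(row_num)` and passes it to `set.insert(value, weight)`, and `getName()` reports `topKWeighted`. The sketch below illustrates only that difference in meaning, and it simplifies heavily. `WeightedTopK` is a hypothetical name, and it counts exactly with a hash map. The `set` used by the real aggregate is sized by `TOP_K_LOAD_FACTOR * threshold` and is a bounded, approximate structure whose eviction behaviour is not modelled here.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, exact stand-in for the approximate top-K state:
// every key keeps its accumulated weight, and the k heaviest keys are returned.
template <typename Key>
class WeightedTopK
{
public:
    explicit WeightedTopK(size_t threshold_) : threshold(threshold_) {}

    // Unweighted insert (the topK case): each occurrence counts as weight 1.
    void insert(const Key & key) { insert(key, 1); }

    // Weighted insert (the topKWeighted case): adds `weight` to the key's counter.
    void insert(const Key & key, uint64_t weight) { counts[key] += weight; }

    // Combines another partial state, as merge() does for aggregate states.
    void merge(const WeightedTopK & rhs)
    {
        for (const auto & [key, weight] : rhs.counts)
            counts[key] += weight;
    }

    // Returns up to `threshold` keys ordered by descending total weight.
    std::vector<Key> topK() const
    {
        std::vector<std::pair<Key, uint64_t>> items(counts.begin(), counts.end());
        size_t k = std::min(threshold, items.size());
        std::partial_sort(items.begin(), items.begin() + k, items.end(),
            [](const auto & a, const auto & b)
            {
                if (a.second != b.second)
                    return a.second > b.second;
                return a.first < b.first; // deterministic tie-break
            });
        std::vector<Key> result;
        for (size_t i = 0; i < k; ++i)
            result.push_back(items[i].first);
        return result;
    }

private:
    size_t threshold;
    std::unordered_map<Key, uint64_t> counts;
};
```

In this model an unweighted `insert(key)` equals `insert(key, 1)`, so `topK(x)` behaves like `topKWeighted(x, 1)`.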
--- a/dbms/src/AggregateFunctions/AggregateFunctionUniq.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionUniq.h
@ -5,8 +5,6 @@
 #include <ext/bit_cast.h>
 #include <AggregateFunctions/UniquesHashSet.h>
 #include <IO/WriteHelpers.h>
 #include <IO/ReadHelpers.h>
@ -14,13 +12,14 @@
 #include <DataTypes/DataTypeTuple.h>
 #include <Interpreters/AggregationCommon.h>
 #include <Common/HashTable/HashSet.h>
 #include <Common/HyperLogLogWithSmallSetOptimization.h>
 #include <Common/CombinedCardinalityEstimator.h>
 #include <Common/MemoryTracker.h>
 #include <Common/typeid_cast.h>
 #include <AggregateFunctions/UniquesHashSet.h>
 #include <AggregateFunctions/IAggregateFunction.h>
 #include <AggregateFunctions/UniqVariadicHash.h>
--- a/dbms/src/AggregateFunctions/AggregateFunctionUniqCombined.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionUniqCombined.h
@ -1,6 +1,9 @@
 #pragma once
 #include <Common/CombinedCardinalityEstimator.h>
 #include <Common/FieldVisitors.h>
 #include <Common/SipHash.h>
 #include <Common/typeid_cast.h>
 #include <DataTypes/DataTypeTuple.h>
 #include <DataTypes/DataTypeUUID.h>
@ -14,10 +17,7 @@
 #include <Columns/ColumnVector.h>
 #include <Columns/ColumnsNumber.h>
 #include <Common/FieldVisitors.h>
 #include <Common/SipHash.h>
 #include <Common/typeid_cast.h>
 namespace DB
 {
--- a/dbms/src/AggregateFunctions/AggregateFunctionUniqUpTo.h
+++ b/dbms/src/AggregateFunctions/AggregateFunctionUniqUpTo.h
@ -1,15 +1,19 @@
 #pragma once
 #include <Common/FieldVisitors.h>
 #include <Common/typeid_cast.h>
 #include <AggregateFunctions/IAggregateFunction.h>
 #include <AggregateFunctions/UniqVariadicHash.h>
 #include <DataTypes/DataTypesNumber.h>
 #include <DataTypes/DataTypeTuple.h>
 #include <DataTypes/DataTypeUUID.h>
 #include <Columns/ColumnsNumber.h>
 #include <IO/ReadHelpers.h>
 #include <IO/WriteHelpers.h>
 #include <Common/typeid_cast.h>
 namespace DB
--- a/dbms/src/AggregateFunctions/Helpers.h
+++ b/dbms/src/AggregateFunctions/Helpers.h
@ -20,7 +20,7 @@ namespace DB
 /** Create an aggregate function with a numeric type in the template parameter, depending on the type of the argument.
  */
-template <template <typename> class AggregateFunctionTemplate, typename ... TArgs>
+template <template <typename> class AggregateFunctionTemplate, typename... TArgs>
 static IAggregateFunction * createWithNumericType(const IDataType & argument_type, TArgs && ... args)
 {
    WhichDataType which(argument_type);
@ -33,7 +33,20 @@ static IAggregateFunction * createWithNumericType(const IDataType & argument_typ
    return nullptr;
 }
-template <template <typename, typename> class AggregateFunctionTemplate, typename Data, typename ... TArgs>
+template <template <typename, bool> class AggregateFunctionTemplate, bool bool_param, typename... TArgs>
 static IAggregateFunction * createWithNumericType(const IDataType & argument_type, TArgs && ... args)
 {
    WhichDataType which(argument_type);
 #define DISPATCH(TYPE) \
    if (which.idx == TypeIndex::TYPE) return new AggregateFunctionTemplate<TYPE, bool_param>(std::forward<TArgs>(args)...);
    FOR_NUMERIC_TYPES(DISPATCH)
 #undef DISPATCH
    if (which.idx == TypeIndex::Enum8) return new AggregateFunctionTemplate<Int8, bool_param>(std::forward<TArgs>(args)...);
    if (which.idx == TypeIndex::Enum16) return new AggregateFunctionTemplate<Int16, bool_param>(std::forward<TArgs>(args)...);
    return nullptr;
 }
 template <template <typename, typename> class AggregateFunctionTemplate, typename Data, typename... TArgs>
 static IAggregateFunction * createWithNumericType(const IDataType & argument_type, TArgs && ... args)
 {
    WhichDataType which(argument_type);
@ -46,7 +59,7 @@ static IAggregateFunction * createWithNumericType(const IDataType & argument_typ
    return nullptr;
 }
-template <template <typename, typename> class AggregateFunctionTemplate, template <typename> class Data, typename ... TArgs>
+template <template <typename, typename> class AggregateFunctionTemplate, template <typename> class Data, typename... TArgs>
 static IAggregateFunction * createWithNumericType(const IDataType & argument_type, TArgs && ... args)
 {
    WhichDataType which(argument_type);
@ -59,7 +72,7 @@ static IAggregateFunction * createWithNumericType(const IDataType & argument_typ
    return nullptr;
 }
-template <template <typename, typename> class AggregateFunctionTemplate, template <typename> class Data, typename ... TArgs>
+template <template <typename, typename> class AggregateFunctionTemplate, template <typename> class Data, typename... TArgs>
 static IAggregateFunction * createWithUnsignedIntegerType(const IDataType & argument_type, TArgs && ... args)
 {
    WhichDataType which(argument_type);
@ -70,7 +83,7 @@ static IAggregateFunction * createWithUnsignedIntegerType(const IDataType & argu
    return nullptr;
 }
-template <template <typename> class AggregateFunctionTemplate, typename ... TArgs>
+template <template <typename> class AggregateFunctionTemplate, typename... TArgs>
 static IAggregateFunction * createWithNumericBasedType(const IDataType & argument_type, TArgs && ... args)
 {
    IAggregateFunction * f = createWithNumericType<AggregateFunctionTemplate>(argument_type, std::forward<TArgs>(args)...);
@ -85,7 +98,7 @@ static IAggregateFunction * createWithNumericBasedType(const IDataType & argumen
    return nullptr;
 }
-template <template <typename> class AggregateFunctionTemplate, typename ... TArgs>
+template <template <typename> class AggregateFunctionTemplate, typename... TArgs>
 static IAggregateFunction * createWithDecimalType(const IDataType & argument_type, TArgs && ... args)
 {
    WhichDataType which(argument_type);
@ -98,7 +111,7 @@ static IAggregateFunction * createWithDecimalType(const IDataType & argument_typ
 /** For template with two arguments.
  */
-template <typename FirstType, template <typename, typename> class AggregateFunctionTemplate, typename ... TArgs>
+template <typename FirstType, template <typename, typename> class AggregateFunctionTemplate, typename... TArgs>
 static IAggregateFunction * createWithTwoNumericTypesSecond(const IDataType & second_type, TArgs && ... args)
 {
    WhichDataType which(second_type);
@ -111,7 +124,7 @@ static IAggregateFunction * createWithTwoNumericTypesSecond(const IDataType & se
    return nullptr;
 }
-template <template <typename, typename> class AggregateFunctionTemplate, typename ... TArgs>
+template <template <typename, typename> class AggregateFunctionTemplate, typename... TArgs>
 static IAggregateFunction * createWithTwoNumericTypes(const IDataType & first_type, const IDataType & second_type, TArgs && ... args)
 {
    WhichDataType which(first_type);
--- a/dbms/src/AggregateFunctions/QuantileExact.h
+++ b/dbms/src/AggregateFunctions/QuantileExact.h
@ -2,9 +2,9 @@
 #include <Common/PODArray.h>
 #include <Common/NaNUtils.h>
 #include <Core/Types.h>
 #include <IO/WriteBuffer.h>
 #include <IO/ReadBuffer.h>
 #include <Core/Types.h>
 #include <IO/VarInt.h>
@ -19,7 +19,7 @@ namespace ErrorCodes
 /** Calculates quantile by collecting all values into array
  *  and applying n-th element (introselect) algorithm for the resulting array.
  *
-  * It use O(N) memory and it is very inefficient in case of high amount of identical values.
+  * It uses O(N) memory and it is very inefficient in case of high amount of identical values.
  * But it is very CPU efficient for not large datasets.
  */
 template <typename Value>
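The QuantileExact comment describes collecting all values into an array and applying an n-th element (introselect) selection. A minimal sketch of that idea uses `std::nth_element`, which keeps O(N) memory and averages linear time. The index rule `level < 1 ? level * N : N - 1` and the default-value result for empty input are assumptions for illustration, because the selection code itself is not part of this hunk.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Exact quantile by selection: all values are kept (O(N) memory) and
// std::nth_element places the requested order statistic at index n.
// Assumption: level is in [0, 1]; empty input returns Value{}.
template <typename Value>
Value quantileExact(std::vector<Value> values, double level)
{
    if (values.empty())
        return Value{};
    size_t n = level < 1 ? static_cast<size_t>(level * values.size()) : values.size() - 1;
    auto nth = values.begin() + static_cast<std::ptrdiff_t>(n);
    std::nth_element(values.begin(), nth, values.end());
    return *nth;
}
```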
--- a/dbms/src/AggregateFunctions/QuantileExactWeighted.h
+++ b/dbms/src/AggregateFunctions/QuantileExactWeighted.h
@ -14,7 +14,7 @@ namespace ErrorCodes
 /** Calculates quantile by counting number of occurrences for each value in a hash map.
  *
-  * It use O(distinct(N)) memory. Can be naturally applied for values with weight.
+  * It uses O(distinct(N)) memory. Can be naturally applied for values with weight.
  * In case of many identical values, it can be more efficient than QuantileExact even when weight is not used.
  */
 template <typename Value>
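QuantileExactWeighted instead counts occurrences of each value in a hash map, so memory grows with the number of distinct values rather than the number of rows. The sketch below stores one counter per distinct value, sorts the distinct values, and returns the first one whose cumulative weight reaches `level * total`. `WeightedQuantileSketch` is a hypothetical name, and this threshold and rounding rule is an assumption; the header's exact rule is not shown here.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// One counter per distinct value: O(distinct(N)) memory.
// Unweighted input is the special case weight == 1.
template <typename Value>
class WeightedQuantileSketch
{
public:
    void add(const Value & x, uint64_t weight = 1) { counts[x] += weight; }

    // Smallest value whose cumulative weight reaches level * total.
    // Assumption: level is in [0, 1]; empty state returns Value{}.
    Value get(double level) const
    {
        if (counts.empty())
            return Value{};
        std::vector<std::pair<Value, uint64_t>> items(counts.begin(), counts.end());
        std::sort(items.begin(), items.end());

        uint64_t total = 0;
        for (const auto & item : items)
            total += item.second;

        double threshold = level * static_cast<double>(total);
        uint64_t accumulated = 0;
        for (const auto & item : items)
        {
            accumulated += item.second;
            if (static_cast<double>(accumulated) >= threshold)
                return item.first;
        }
        return items.back().first;
    }

private:
    std::unordered_map<Value, uint64_t> counts;
};
```

With many repeated values the map stays small, which is why this approach can beat QuantileExact even without weights.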
--- a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp
+++ b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp
@ -27,6 +27,7 @@ void registerAggregateFunctionUniqUpTo(AggregateFunctionFactory &);
 void registerAggregateFunctionTopK(AggregateFunctionFactory &);
 void registerAggregateFunctionsBitwise(AggregateFunctionFactory &);
 void registerAggregateFunctionsMaxIntersections(AggregateFunctionFactory &);
 void registerAggregateFunctionEntropy(AggregateFunctionFactory &);
 void registerAggregateFunctionCombinatorIf(AggregateFunctionCombinatorFactory &);
 void registerAggregateFunctionCombinatorArray(AggregateFunctionCombinatorFactory &);
@ -65,6 +66,7 @@ void registerAggregateFunctions()
        registerAggregateFunctionsMaxIntersections(factory);
        registerAggregateFunctionHistogram(factory);
        registerAggregateFunctionRetention(factory);
        registerAggregateFunctionEntropy(factory);
    }
    {
--- a/dbms/src/Client/Connection.h
+++ b/dbms/src/Client/Connection.h
@ -1,7 +1,5 @@
 #pragma once
 #include <optional>
 #include <common/logger_useful.h>
 #include <Poco/Net/StreamSocket.h>
@ -121,6 +119,12 @@ public:
    UInt16 getPort() const;
    const String & getDefaultDatabase() const;
    /// For proper polling.
    inline const auto & getTimeouts() const
    {
        return timeouts;
    }
    /// If last flag is true, you need to call sendExternalTablesData after.
    void sendQuery(
        const String & query,
--- a/dbms/src/Columns/ColumnAggregateFunction.cpp
+++ b/dbms/src/Columns/ColumnAggregateFunction.cpp
@ -1,4 +1,5 @@
 #include <Columns/ColumnAggregateFunction.h>
 #include <Columns/ColumnsCommon.h>
 #include <AggregateFunctions/AggregateFunctionState.h>
 #include <DataStreams/ColumnGathererStream.h>
 #include <IO/WriteBufferFromArena.h>
@ -6,7 +7,6 @@
 #include <Common/AlignedBuffer.h>
 #include <Common/typeid_cast.h>
 #include <Common/Arena.h>
 #include <Columns/ColumnsCommon.h>
 namespace DB
@ -182,7 +182,7 @@ ColumnPtr ColumnAggregateFunction::filter(const Filter & filter, ssize_t result_
 }
-ColumnPtr ColumnAggregateFunction::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnAggregateFunction::permute(const Permutation & perm, UInt64 limit) const
 {
    size_t size = data.size();
@ -203,13 +203,13 @@ ColumnPtr ColumnAggregateFunction::permute(const Permutation & perm, size_t limi
    return res;
 }
-ColumnPtr ColumnAggregateFunction::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnAggregateFunction::index(const IColumn & indexes, UInt64 limit) const
 {
    return selectIndexImpl(*this, indexes, limit);
 }
 template <typename Type>
-ColumnPtr ColumnAggregateFunction::indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const
+ColumnPtr ColumnAggregateFunction::indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const
 {
    auto res = createView();
--- a/dbms/src/Columns/ColumnAggregateFunction.h
+++ b/dbms/src/Columns/ColumnAggregateFunction.h
@ -161,12 +161,12 @@ public:
    ColumnPtr filter(const Filter & filter, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    template <typename Type>
-    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const;
+    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const;
    ColumnPtr replicate(const Offsets & offsets) const override;
@ -179,7 +179,7 @@ public:
        return 0;
    }
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const override;
    /** More efficient manipulation methods */
    Container & getData()
--- a/dbms/src/Columns/ColumnArray.cpp
+++ b/dbms/src/Columns/ColumnArray.cpp
@ -589,7 +589,7 @@ ColumnPtr ColumnArray::filterTuple(const Filter & filt, ssize_t result_size_hint
 }
-ColumnPtr ColumnArray::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnArray::permute(const Permutation & perm, UInt64 limit) const
 {
    size_t size = getOffsets().size();
@ -626,13 +626,13 @@ ColumnPtr ColumnArray::permute(const Permutation & perm, size_t limit) const
    return res;
 }
-ColumnPtr ColumnArray::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnArray::index(const IColumn & indexes, UInt64 limit) const
 {
    return selectIndexImpl(*this, indexes, limit);
 }
 template <typename T>
-ColumnPtr ColumnArray::indexImpl(const PaddedPODArray<T> & indexes, size_t limit) const
+ColumnPtr ColumnArray::indexImpl(const PaddedPODArray<T> & indexes, UInt64 limit) const
 {
    if (limit == 0)
        return ColumnArray::create(data);
@ -664,7 +664,7 @@ ColumnPtr ColumnArray::indexImpl(const PaddedPODArray<T> & indexes, size_t limit
 INSTANTIATE_INDEX_IMPL(ColumnArray)
-void ColumnArray::getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const
+void ColumnArray::getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const
 {
    size_t s = size();
    if (limit >= s)
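Across the Columns changes, `limit` in `permute`, `index`, `getPermutation` and related methods switches from `size_t` to `UInt64`; its meaning is unchanged. The `ColumnArray::getPermutation` hunk above shows the usual convention: a limit that covers the whole column (`limit >= s`) is treated like no limit. The sketch below shows why a limit helps sorting, since only the first `limit` positions need to be ordered, which `std::partial_sort` provides. It works on a plain `std::vector` rather than an `IColumn`. Using `std::partial_sort` is an assumption, because the rest of the function body is cut off in this diff.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using Permutation = std::vector<size_t>;

// Fills `res` with row indices that order `data`. When 0 < limit < size, only
// the first `limit` positions are guaranteed sorted; limit == 0 means all rows.
template <typename T>
void getPermutation(const std::vector<T> & data, bool reverse, uint64_t limit, Permutation & res)
{
    size_t s = data.size();
    if (limit >= s)
        limit = 0; // the limit covers every row: do a full sort

    res.resize(s);
    std::iota(res.begin(), res.end(), size_t{0});

    auto less = [&](size_t a, size_t b) { return data[a] < data[b]; };
    auto greater = [&](size_t a, size_t b) { return data[b] < data[a]; };

    if (limit)
    {
        auto middle = res.begin() + static_cast<std::ptrdiff_t>(limit);
        if (reverse)
            std::partial_sort(res.begin(), middle, res.end(), greater);
        else
            std::partial_sort(res.begin(), middle, res.end(), less);
    }
    else if (reverse)
        std::sort(res.begin(), res.end(), greater);
    else
        std::sort(res.begin(), res.end(), less);
}
```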
--- a/dbms/src/Columns/ColumnArray.h
+++ b/dbms/src/Columns/ColumnArray.h
@ -70,11 +70,11 @@ public:
    void insertDefault() override;
    void popBack(size_t n) override;
    ColumnPtr filter(const Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
-    template <typename Type> ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const;
+    template <typename Type> ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const;
    int compareAt(size_t n, size_t m, const IColumn & rhs_, int nan_direction_hint) const override;
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const override;
    void reserve(size_t n) override;
    size_t byteSize() const override;
    size_t allocatedBytes() const override;
--- a/dbms/src/Columns/ColumnConst.cpp
+++ b/dbms/src/Columns/ColumnConst.cpp
@ -54,7 +54,7 @@ ColumnPtr ColumnConst::replicate(const Offsets & offsets) const
    return ColumnConst::create(data, replicated_size);
 }
-ColumnPtr ColumnConst::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnConst::permute(const Permutation & perm, UInt64 limit) const
 {
    if (limit == 0)
        limit = s;
@ -68,7 +68,7 @@ ColumnPtr ColumnConst::permute(const Permutation & perm, size_t limit) const
    return ColumnConst::create(data, limit);
 }
-ColumnPtr ColumnConst::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnConst::index(const IColumn & indexes, UInt64 limit) const
 {
    if (limit == 0)
        limit = indexes.size();
--- a/dbms/src/Columns/ColumnConst.h
+++ b/dbms/src/Columns/ColumnConst.h
@ -154,9 +154,9 @@ public:
    ColumnPtr filter(const Filter & filt, ssize_t result_size_hint) const override;
    ColumnPtr replicate(const Offsets & offsets) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const override;
    size_t byteSize() const override
    {
--- a/dbms/src/Columns/ColumnDecimal.cpp
+++ b/dbms/src/Columns/ColumnDecimal.cpp
@ -63,7 +63,7 @@ void ColumnDecimal<T>::updateHashWithValue(size_t n, SipHash & hash) const
 }
 template <typename T>
-void ColumnDecimal<T>::getPermutation(bool reverse, size_t limit, int , IColumn::Permutation & res) const
+void ColumnDecimal<T>::getPermutation(bool reverse, UInt64 limit, int , IColumn::Permutation & res) const
 {
 #if 1 /// TODO: perf test
    if (data.size() <= std::numeric_limits<UInt32>::max())
@ -82,7 +82,7 @@ void ColumnDecimal<T>::getPermutation(bool reverse, size_t limit, int , IColumn:
 }
 template <typename T>
-ColumnPtr ColumnDecimal<T>::permute(const IColumn::Permutation & perm, size_t limit) const
+ColumnPtr ColumnDecimal<T>::permute(const IColumn::Permutation & perm, UInt64 limit) const
 {
    size_t size = limit ? std::min(data.size(), limit) : data.size();
    if (perm.size() < size)
@ -173,7 +173,7 @@ ColumnPtr ColumnDecimal<T>::filter(const IColumn::Filter & filt, ssize_t result_
 }
 template <typename T>
-ColumnPtr ColumnDecimal<T>::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnDecimal<T>::index(const IColumn & indexes, UInt64 limit) const
 {
    return selectIndexImpl(*this, indexes, limit);
 }
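`ColumnDecimal<T>::permute` above computes the output size as `limit ? std::min(data.size(), limit) : data.size()` and then checks `perm.size() < size`. `ColumnConst::permute` likewise maps `limit == 0` to the full size. The sketch below reproduces that contract on a `std::vector`. The function name and the use of `std::invalid_argument` are illustrative choices, not ClickHouse's exception type.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Permutation = std::vector<size_t>;

// Reorders `data` by `perm`, keeping only the first `limit` output rows.
// limit == 0 (or a limit past the end) means "the whole column"; a permutation
// shorter than the requested size is rejected instead of being read past its end.
template <typename T>
std::vector<T> permute(const std::vector<T> & data, const Permutation & perm, uint64_t limit)
{
    size_t size = (limit == 0 || limit > data.size()) ? data.size() : static_cast<size_t>(limit);
    if (perm.size() < size)
        throw std::invalid_argument("Size of permutation is less than required.");

    std::vector<T> res;
    res.reserve(size);
    for (size_t i = 0; i < size; ++i)
        res.push_back(data.at(perm[i]));
    return res;
}
```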
--- a/dbms/src/Columns/ColumnDecimal.h
+++ b/dbms/src/Columns/ColumnDecimal.h
@ -101,7 +101,7 @@ public:
    const char * deserializeAndInsertFromArena(const char * pos) override;
    void updateHashWithValue(size_t n, SipHash & hash) const override;
    int compareAt(size_t n, size_t m, const IColumn & rhs_, int nan_direction_hint) const override;
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, IColumn::Permutation & res) const override;
    MutableColumnPtr cloneResized(size_t size) const override;
@ -116,11 +116,11 @@ public:
    bool isDefaultAt(size_t n) const override { return data[n] == 0; }
    ColumnPtr filter(const IColumn::Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const IColumn::Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const IColumn::Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    template <typename Type>
-    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const;
+    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const;
    ColumnPtr replicate(const IColumn::Offsets & offsets) const override;
    void getExtremes(Field & min, Field & max) const override;
@ -144,7 +144,7 @@ protected:
    UInt32 scale;
    template <typename U>
-    void permutation(bool reverse, size_t limit, PaddedPODArray<U> & res) const
+    void permutation(bool reverse, UInt64 limit, PaddedPODArray<U> & res) const
    {
        size_t s = data.size();
        res.resize(s);
@ -164,7 +164,7 @@ protected:
 template <typename T>
 template <typename Type>
-ColumnPtr ColumnDecimal<T>::indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const
+ColumnPtr ColumnDecimal<T>::indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const
 {
    size_t size = indexes.size();
--- a/dbms/src/Columns/ColumnFixedString.cpp
+++ b/dbms/src/Columns/ColumnFixedString.cpp
@ -112,7 +112,7 @@ struct ColumnFixedString::less
    }
 };
-void ColumnFixedString::getPermutation(bool reverse, size_t limit, int /*nan_direction_hint*/, Permutation & res) const
+void ColumnFixedString::getPermutation(bool reverse, UInt64 limit, int /*nan_direction_hint*/, Permutation & res) const
 {
    size_t s = size();
    res.resize(s);
@ -231,7 +231,7 @@ ColumnPtr ColumnFixedString::filter(const IColumn::Filter & filt, ssize_t result
    return res;
 }
-ColumnPtr ColumnFixedString::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnFixedString::permute(const Permutation & perm, UInt64 limit) const
 {
    size_t col_size = size();
@ -260,14 +260,14 @@ ColumnPtr ColumnFixedString::permute(const Permutation & perm, size_t limit) con
 }
-ColumnPtr ColumnFixedString::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnFixedString::index(const IColumn & indexes, UInt64 limit) const
 {
    return selectIndexImpl(*this, indexes, limit);
 }
 template <typename Type>
-ColumnPtr ColumnFixedString::indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const
+ColumnPtr ColumnFixedString::indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const
 {
    if (limit == 0)
        return ColumnFixedString::create(n);
--- a/dbms/src/Columns/ColumnFixedString.h
+++ b/dbms/src/Columns/ColumnFixedString.h
@ -101,18 +101,18 @@ public:
        return memcmp(&chars[p1 * n], &rhs.chars[p2 * n], n);
    }
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const override;
    void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
    ColumnPtr filter(const IColumn::Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    template <typename Type>
-    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const;
+    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const;
    ColumnPtr replicate(const Offsets & offsets) const override;
@ -138,7 +138,7 @@ public:
    StringRef getRawData() const override { return StringRef(chars.data(), chars.size()); }
    /// Specialized part of interface, not from IColumn.
-
+    void insertString(const String & string) { insertData(string.c_str(), string.size()); }
    Chars & getChars() { return chars; }
    const Chars & getChars() const { return chars; }
--- a/dbms/src/Columns/ColumnFunction.cpp
+++ b/dbms/src/Columns/ColumnFunction.cpp
@ -70,7 +70,7 @@ ColumnPtr ColumnFunction::filter(const Filter & filt, ssize_t result_size_hint)
    return ColumnFunction::create(filtered_size, function, capture);
 }
-ColumnPtr ColumnFunction::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnFunction::permute(const Permutation & perm, UInt64 limit) const
 {
    if (limit == 0)
        limit = size_;
@ -88,7 +88,7 @@ ColumnPtr ColumnFunction::permute(const Permutation & perm, size_t limit) const
    return ColumnFunction::create(limit, function, capture);
 }
-ColumnPtr ColumnFunction::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnFunction::index(const IColumn & indexes, UInt64 limit) const
 {
    ColumnsWithTypeAndName capture = captured_columns;
    for (auto & column : capture)
--- a/dbms/src/Columns/ColumnFunction.h
+++ b/dbms/src/Columns/ColumnFunction.h
@ -32,8 +32,8 @@ public:
    ColumnPtr cut(size_t start, size_t length) const override;
    ColumnPtr replicate(const Offsets & offsets) const override;
    ColumnPtr filter(const Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    void insertDefault() override;
    void popBack(size_t n) override;
    std::vector<MutableColumnPtr> scatter(IColumn::ColumnIndex num_columns,
--- a/dbms/src/Columns/ColumnLowCardinality.cpp
+++ b/dbms/src/Columns/ColumnLowCardinality.cpp
@ -250,7 +250,7 @@ int ColumnLowCardinality::compareAt(size_t n, size_t m, const IColumn & rhs, int
    return getDictionary().compareAt(n_index, m_index, low_cardinality_column.getDictionary(), nan_direction_hint);
 }
-void ColumnLowCardinality::getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const
+void ColumnLowCardinality::getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const
 {
    if (limit == 0)
        limit = size();
@ -343,7 +343,7 @@ void ColumnLowCardinality::compactIfSharedDictionary()
 ColumnLowCardinality::DictionaryEncodedColumn
-ColumnLowCardinality::getMinimalDictionaryEncodedColumn(size_t offset, size_t limit) const
+ColumnLowCardinality::getMinimalDictionaryEncodedColumn(UInt64 offset, UInt64 limit) const
 {
    MutableColumnPtr sub_indexes = (*std::move(idx.getPositions()->cut(offset, limit))).mutate();
    auto indexes_map = mapUniqueIndex(*sub_indexes);
@ -527,7 +527,7 @@ void ColumnLowCardinality::Index::insertPosition(UInt64 position)
    checkSizeOfType();
 }
-void ColumnLowCardinality::Index::insertPositionsRange(const IColumn & column, size_t offset, size_t limit)
+void ColumnLowCardinality::Index::insertPositionsRange(const IColumn & column, UInt64 offset, UInt64 limit)
 {
    auto insertForType = [&](auto type)
    {
@ -550,10 +550,10 @@ void ColumnLowCardinality::Index::insertPositionsRange(const IColumn & column, s
                auto & positions_data = getPositionsData<CurIndexType>();
                const auto & column_data = column_ptr->getData();
-                size_t size = positions_data.size();
+                UInt64 size = positions_data.size();
                positions_data.resize(size + limit);
-                for (size_t i = 0; i < limit; ++i)
+                for (UInt64 i = 0; i < limit; ++i)
                    positions_data[size + i] = column_data[offset + i];
            };
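`ColumnLowCardinality::Index::insertPositionsRange` appends `limit` positions from `column`, starting at `offset`, and converts each one to the current index type `CurIndexType`. The sketch shows that copy loop with the source and destination index types as template parameters. The `static_assert` against narrowing and the explicit bounds check are additions made for this illustration and do not appear in the hunk.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends column_data[offset, offset + limit) to `positions`, widening each
// element from the source index type to the destination index type.
template <typename DstIndex, typename SrcIndex>
void insertPositionsRange(std::vector<DstIndex> & positions,
                          const std::vector<SrcIndex> & column_data,
                          uint64_t offset, uint64_t limit)
{
    static_assert(sizeof(DstIndex) >= sizeof(SrcIndex), "destination index type must not be narrower");
    if (offset > column_data.size() || limit > column_data.size() - offset)
        throw std::out_of_range("requested range exceeds the source column");

    uint64_t size = positions.size();
    positions.resize(size + limit);
    for (uint64_t i = 0; i < limit; ++i)
        positions[size + i] = column_data[offset + i];
}
```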
--- a/dbms/src/Columns/ColumnLowCardinality.h
+++ b/dbms/src/Columns/ColumnLowCardinality.h
@ -90,19 +90,19 @@ public:
        return ColumnLowCardinality::create(dictionary.getColumnUniquePtr(), getIndexes().filter(filt, result_size_hint));
    }
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override
    {
        return ColumnLowCardinality::create(dictionary.getColumnUniquePtr(), getIndexes().permute(perm, limit));
    }
-    ColumnPtr index(const IColumn & indexes_, size_t limit) const override
+    ColumnPtr index(const IColumn & indexes_, UInt64 limit) const override
    {
        return ColumnLowCardinality::create(dictionary.getColumnUniquePtr(), getIndexes().index(indexes_, limit));
    }
    int compareAt(size_t n, size_t m, const IColumn & rhs, int nan_direction_hint) const override;
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const override;
    ColumnPtr replicate(const Offsets & offsets) const override
    {
@ -180,7 +180,7 @@ public:
        ColumnPtr indexes;
    };
-    DictionaryEncodedColumn getMinimalDictionaryEncodedColumn(size_t offset, size_t limit) const;
+    DictionaryEncodedColumn getMinimalDictionaryEncodedColumn(UInt64 offset, UInt64 limit) const;
    ColumnPtr countKeys() const;
@ -196,7 +196,7 @@ public:
        ColumnPtr & getPositionsPtr() { return positions; }
        size_t getPositionAt(size_t row) const;
        void insertPosition(UInt64 position);
-        void insertPositionsRange(const IColumn & column, size_t offset, size_t limit);
+        void insertPositionsRange(const IColumn & column, UInt64 offset, UInt64 limit);
        void popBack(size_t n) { positions->assumeMutableRef().popBack(n); }
        void reserve(size_t n) { positions->assumeMutableRef().reserve(n); }
--- a/dbms/src/Columns/ColumnNullable.cpp
+++ b/dbms/src/Columns/ColumnNullable.cpp
@ -158,14 +158,14 @@ ColumnPtr ColumnNullable::filter(const Filter & filt, ssize_t result_size_hint)
    return ColumnNullable::create(filtered_data, filtered_null_map);
 }
-ColumnPtr ColumnNullable::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnNullable::permute(const Permutation & perm, UInt64 limit) const
 {
    ColumnPtr permuted_data = getNestedColumn().permute(perm, limit);
    ColumnPtr permuted_null_map = getNullMapColumn().permute(perm, limit);
    return ColumnNullable::create(permuted_data, permuted_null_map);
 }
-ColumnPtr ColumnNullable::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnNullable::index(const IColumn & indexes, UInt64 limit) const
 {
    ColumnPtr indexed_data = getNestedColumn().index(indexes, limit);
    ColumnPtr indexed_null_map = getNullMapColumn().index(indexes, limit);
@ -197,7 +197,7 @@ int ColumnNullable::compareAt(size_t n, size_t m, const IColumn & rhs_, int null
    return getNestedColumn().compareAt(n, m, nested_rhs, null_direction_hint);
 }
-void ColumnNullable::getPermutation(bool reverse, size_t limit, int null_direction_hint, Permutation & res) const
+void ColumnNullable::getPermutation(bool reverse, UInt64 limit, int null_direction_hint, Permutation & res) const
 {
    /// Cannot pass limit because of unknown amount of NULLs.
    getNestedColumn().getPermutation(reverse, 0, null_direction_hint, res);
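The comment in `ColumnNullable::getPermutation` explains why the limit is dropped. NULL rows still occupy slots in the nested column, so a partial sort of the nested values could fill the first `limit` positions with rows that are actually NULL. Below is a sketch of the approach: sort fully, then move the NULL rows to one end as a block. It assumes that `null_direction_hint > 0` means NULLs compare greater than any value. How ClickHouse actually relocates the NULL rows is not shown in this hunk, so `getNullablePermutation` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using Permutation = std::vector<size_t>;

// Orders the nested values without a limit, then moves NULL rows (null_map != 0)
// to one end while preserving the relative order inside each group.
// Assumption: null_direction_hint > 0 means NULLs sort after values in ascending order.
template <typename T>
void getNullablePermutation(const std::vector<T> & nested, const std::vector<uint8_t> & null_map,
                            bool reverse, int null_direction_hint, Permutation & res)
{
    res.resize(nested.size());
    std::iota(res.begin(), res.end(), size_t{0});
    std::stable_sort(res.begin(), res.end(), [&](size_t a, size_t b)
    {
        return reverse ? nested[b] < nested[a] : nested[a] < nested[b];
    });

    bool nulls_last = (null_direction_hint > 0) != reverse;
    if (nulls_last)
        std::stable_partition(res.begin(), res.end(), [&](size_t row) { return null_map[row] == 0; });
    else
        std::stable_partition(res.begin(), res.end(), [&](size_t row) { return null_map[row] != 0; });
}
```

Only after this step do the first `limit` positions reliably hold the wanted rows, which is why the limit cannot be pushed down to the nested column.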
--- a/dbms/src/Columns/ColumnNullable.h
+++ b/dbms/src/Columns/ColumnNullable.h
@ -64,10 +64,10 @@ public:
    void popBack(size_t n) override;
    ColumnPtr filter(const Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    int compareAt(size_t n, size_t m, const IColumn & rhs_, int null_direction_hint) const override;
-    void getPermutation(bool reverse, size_t limit, int null_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int null_direction_hint, Permutation & res) const override;
    void reserve(size_t n) override;
    size_t byteSize() const override;
    size_t allocatedBytes() const override;
--- a/dbms/src/Columns/ColumnString.cpp
+++ b/dbms/src/Columns/ColumnString.cpp
@ -111,7 +111,7 @@ ColumnPtr ColumnString::filter(const Filter & filt, ssize_t result_size_hint) co
 }
-ColumnPtr ColumnString::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnString::permute(const Permutation & perm, UInt64 limit) const
 {
    size_t size = offsets.size();
@ -191,13 +191,13 @@ const char * ColumnString::deserializeAndInsertFromArena(const char * pos)
 }
-ColumnPtr ColumnString::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnString::index(const IColumn & indexes, UInt64 limit) const
 {
    return selectIndexImpl(*this, indexes, limit);
 }
 template <typename Type>
-ColumnPtr ColumnString::indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const
+ColumnPtr ColumnString::indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const
 {
    if (limit == 0)
        return ColumnString::create();
@ -251,7 +251,7 @@ struct ColumnString::less
    }
 };
-void ColumnString::getPermutation(bool reverse, size_t limit, int /*nan_direction_hint*/, Permutation & res) const
+void ColumnString::getPermutation(bool reverse, UInt64 limit, int /*nan_direction_hint*/, Permutation & res) const
 {
    size_t s = offsets.size();
    res.resize(s);
@ -389,7 +389,7 @@ struct ColumnString::lessWithCollation
    }
 };
-void ColumnString::getPermutationWithCollation(const Collator & collator, bool reverse, size_t limit, Permutation & res) const
+void ColumnString::getPermutationWithCollation(const Collator & collator, bool reverse, UInt64 limit, Permutation & res) const
 {
    size_t s = offsets.size();
    res.resize(s);
--- a/dbms/src/Columns/ColumnString.h
+++ b/dbms/src/Columns/ColumnString.h
@ -194,12 +194,12 @@ public:
    ColumnPtr filter(const Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    template <typename Type>
-    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const;
+    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const;
    void insertDefault() override
    {
@ -225,10 +225,10 @@ public:
    /// Variant of compareAt for string comparison with respect of collation.
    int compareAtWithCollation(size_t n, size_t m, const IColumn & rhs_, const Collator & collator) const;
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const override;
    /// Sorting with respect of collation.
-    void getPermutationWithCollation(const Collator & collator, bool reverse, size_t limit, Permutation & res) const;
+    void getPermutationWithCollation(const Collator & collator, bool reverse, UInt64 limit, Permutation & res) const;
    ColumnPtr replicate(const Offsets & replicate_offsets) const override;
--- a/dbms/src/Columns/ColumnTuple.cpp
+++ b/dbms/src/Columns/ColumnTuple.cpp
@ -170,7 +170,7 @@ ColumnPtr ColumnTuple::filter(const Filter & filt, ssize_t result_size_hint) con
    return ColumnTuple::create(new_columns);
 }
-ColumnPtr ColumnTuple::permute(const Permutation & perm, size_t limit) const
+ColumnPtr ColumnTuple::permute(const Permutation & perm, UInt64 limit) const
 {
    const size_t tuple_size = columns.size();
    Columns new_columns(tuple_size);
@ -181,7 +181,7 @@ ColumnPtr ColumnTuple::permute(const Permutation & perm, size_t limit) const
    return ColumnTuple::create(new_columns);
 }
-ColumnPtr ColumnTuple::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnTuple::index(const IColumn & indexes, UInt64 limit) const
 {
    const size_t tuple_size = columns.size();
    Columns new_columns(tuple_size);
@ -261,7 +261,7 @@ struct ColumnTuple::Less
    }
 };
-void ColumnTuple::getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const
+void ColumnTuple::getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const
 {
    size_t rows = size();
    res.resize(rows);
--- a/dbms/src/Columns/ColumnTuple.h
+++ b/dbms/src/Columns/ColumnTuple.h
@ -60,14 +60,14 @@ public:
    void updateHashWithValue(size_t n, SipHash & hash) const override;
    void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
    ColumnPtr filter(const Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    ColumnPtr replicate(const Offsets & offsets) const override;
    MutableColumns scatter(ColumnIndex num_columns, const Selector & selector) const override;
    void gather(ColumnGathererStream & gatherer_stream) override;
    int compareAt(size_t n, size_t m, const IColumn & rhs, int nan_direction_hint) const override;
    void getExtremes(Field & min, Field & max) const override;
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const override;
    void reserve(size_t n) override;
    size_t byteSize() const override;
    size_t allocatedBytes() const override;
--- a/dbms/src/Columns/ColumnVector.cpp
+++ b/dbms/src/Columns/ColumnVector.cpp
@ -16,8 +16,6 @@
 #ifdef __SSE2__
    #include <emmintrin.h>
 #include <Columns/ColumnsCommon.h>
 #endif
@ -71,7 +69,7 @@ struct ColumnVector<T>::greater
 };
 template <typename T>
-void ColumnVector<T>::getPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res) const
+void ColumnVector<T>::getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, IColumn::Permutation & res) const
 {
    size_t s = data.size();
    res.resize(s);
@ -211,7 +209,7 @@ ColumnPtr ColumnVector<T>::filter(const IColumn::Filter & filt, ssize_t result_s
 }
 template <typename T>
-ColumnPtr ColumnVector<T>::permute(const IColumn::Permutation & perm, size_t limit) const
+ColumnPtr ColumnVector<T>::permute(const IColumn::Permutation & perm, UInt64 limit) const
 {
    size_t size = data.size();
@ -232,7 +230,7 @@ ColumnPtr ColumnVector<T>::permute(const IColumn::Permutation & perm, size_t lim
 }
 template <typename T>
-ColumnPtr ColumnVector<T>::index(const IColumn & indexes, size_t limit) const
+ColumnPtr ColumnVector<T>::index(const IColumn & indexes, UInt64 limit) const
 {
    return selectIndexImpl(*this, indexes, limit);
 }
--- a/dbms/src/Columns/ColumnVector.h
+++ b/dbms/src/Columns/ColumnVector.h
@ -174,7 +174,7 @@ public:
        return CompareHelper<T>::compare(data[n], static_cast<const Self &>(rhs_).data[m], nan_direction_hint);
    }
-    void getPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res) const override;
+    void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, IColumn::Permutation & res) const override;
    void reserve(size_t n) override
    {
@ -221,12 +221,12 @@ public:
    ColumnPtr filter(const IColumn::Filter & filt, ssize_t result_size_hint) const override;
-    ColumnPtr permute(const IColumn::Permutation & perm, size_t limit) const override;
+    ColumnPtr permute(const IColumn::Permutation & perm, UInt64 limit) const override;
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override;
    template <typename Type>
-    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const;
+    ColumnPtr indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const;
    ColumnPtr replicate(const IColumn::Offsets & offsets) const override;
@ -273,7 +273,7 @@ protected:
 template <typename T>
 template <typename Type>
-ColumnPtr ColumnVector<T>::indexImpl(const PaddedPODArray<Type> & indexes, size_t limit) const
+ColumnPtr ColumnVector<T>::indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit) const
 {
    size_t size = indexes.size();
--- a/dbms/src/Columns/ColumnsCommon.cpp
+++ b/dbms/src/Columns/ColumnsCommon.cpp
@ -3,8 +3,8 @@
 #endif
 #include <Columns/IColumn.h>
 #include <Common/typeid_cast.h>
 #include <Columns/ColumnVector.h>
 #include <Common/typeid_cast.h>
 #include <Common/HashTable/HashSet.h>
 #include <Common/HashTable/HashMap.h>
--- a/dbms/src/Columns/ColumnsCommon.h
+++ b/dbms/src/Columns/ColumnsCommon.h
@ -44,9 +44,9 @@ namespace detail
    const PaddedPODArray<T> * getIndexesData(const IColumn & indexes);
 }
-/// Check limit <= indexes->size() and call column.indexImpl(const PaddedPodArray<Type> & indexes, size_t limit).
+/// Check limit <= indexes->size() and call column.indexImpl(const PaddedPODArray<Type> & indexes, UInt64 limit).
 template <typename Column>
-ColumnPtr selectIndexImpl(const Column & column, const IColumn & indexes, size_t limit)
+ColumnPtr selectIndexImpl(const Column & column, const IColumn & indexes, UInt64 limit)
 {
    if (limit == 0)
        limit = indexes.size();
@ -68,8 +68,8 @@ ColumnPtr selectIndexImpl(const Column & column, const IColumn & indexes, size_t
 }
 #define INSTANTIATE_INDEX_IMPL(Column) \
-    template ColumnPtr Column::indexImpl<UInt8>(const PaddedPODArray<UInt8> & indexes, size_t limit) const; \
+    template ColumnPtr Column::indexImpl<UInt8>(const PaddedPODArray<UInt8> & indexes, UInt64 limit) const; \
-    template ColumnPtr Column::indexImpl<UInt16>(const PaddedPODArray<UInt16> & indexes, size_t limit) const; \
+    template ColumnPtr Column::indexImpl<UInt16>(const PaddedPODArray<UInt16> & indexes, UInt64 limit) const; \
-    template ColumnPtr Column::indexImpl<UInt32>(const PaddedPODArray<UInt32> & indexes, size_t limit) const; \
+    template ColumnPtr Column::indexImpl<UInt32>(const PaddedPODArray<UInt32> & indexes, UInt64 limit) const; \
-    template ColumnPtr Column::indexImpl<UInt64>(const PaddedPODArray<UInt64> & indexes, size_t limit) const;
+    template ColumnPtr Column::indexImpl<UInt64>(const PaddedPODArray<UInt64> & indexes, UInt64 limit) const;
 }
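`selectIndexImpl` treats `limit == 0` as "use every index" and rejects a limit larger than the index column. It then calls `indexImpl` for the concrete unsigned index width (UInt8 through UInt64, as instantiated by `INSTANTIATE_INDEX_IMPL`). The following is a rough standalone model of that contract on `std::vector`. `selectByIndex` is a hypothetical name, and the real code dispatches on the runtime type of the index `IColumn` rather than on a template argument. The `at()` bounds check is an addition of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified model of selectIndexImpl + indexImpl: gather
// data[indexes[0 .. limit)] into a new vector; limit == 0 means "all indexes".
// IndexType mirrors the UInt8/UInt16/UInt32/UInt64 instantiations.
template <typename T, typename IndexType>
std::vector<T> selectByIndex(const std::vector<T> & data, const std::vector<IndexType> & indexes, uint64_t limit)
{
    if (limit == 0)
        limit = indexes.size();
    if (indexes.size() < limit)
        throw std::invalid_argument("Size of indexes is less than required");

    std::vector<T> result;
    result.reserve(limit);
    for (uint64_t i = 0; i < limit; ++i)
        result.push_back(data.at(indexes[i]));  // at() also rejects out-of-range indexes
    return result;
}
```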
--- a/dbms/src/Columns/FilterDescription.cpp
+++ b/dbms/src/Columns/FilterDescription.cpp
@ -1,6 +1,5 @@
 #include <Columns/FilterDescription.h>
 #include <Common/typeid_cast.h>
 #include <Columns/FilterDescription.h>
 #include <Columns/ColumnsNumber.h>
 #include <Columns/ColumnNullable.h>
 #include <Columns/ColumnConst.h>
--- a/dbms/src/Columns/IColumn.h
+++ b/dbms/src/Columns/IColumn.h
@ -184,11 +184,11 @@ public:
    /// Permutes elements using specified permutation. Is used in sortings.
    /// limit - if it isn't 0, puts only first limit elements in the result.
    using Permutation = PaddedPODArray<size_t>;
-    virtual Ptr permute(const Permutation & perm, size_t limit) const = 0;
+    virtual Ptr permute(const Permutation & perm, UInt64 limit) const = 0;
    /// Creates new column with values column[indexes[:limit]]. If limit is 0, all indexes are used.
    /// Indexes must be one of the ColumnUInt. For default implementation, see selectIndexImpl from ColumnsCommon.h
-    virtual Ptr index(const IColumn & indexes, size_t limit) const = 0;
+    virtual Ptr index(const IColumn & indexes, UInt64 limit) const = 0;
    /** Compares (*this)[n] and rhs[m]. Column rhs should have the same type.
      * Returns negative number, 0, or positive number (*this)[n] is less, equal, greater than rhs[m] respectively.
@ -209,7 +209,7 @@ public:
      * limit - if isn't 0, then only first limit elements of the result column could be sorted.
      * nan_direction_hint - see above.
      */
-    virtual void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const = 0;
+    virtual void getPermutation(bool reverse, UInt64 limit, int nan_direction_hint, Permutation & res) const = 0;
    /** Copies each element according offsets parameter.
      * (i-th element should be copied offsets[i] - offsets[i - 1] times.)
--- a/dbms/src/Columns/IColumnDummy.h
+++ b/dbms/src/Columns/IColumnDummy.h
@ -79,7 +79,7 @@ public:
        return cloneDummy(countBytesInFilter(filt));
    }
-    ColumnPtr permute(const Permutation & perm, size_t limit) const override
+    ColumnPtr permute(const Permutation & perm, UInt64 limit) const override
    {
        if (s != perm.size())
            throw Exception("Size of permutation doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
@ -87,7 +87,7 @@ public:
        return cloneDummy(limit ? std::min(s, limit) : s);
    }
-    ColumnPtr index(const IColumn & indexes, size_t limit) const override
+    ColumnPtr index(const IColumn & indexes, UInt64 limit) const override
    {
        if (indexes.size() < limit)
            throw Exception("Size of indexes is less than required.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
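`IColumn` documents two uses of a non-zero `limit`. It lets `getPermutation` sort only the first `limit` positions, and it lets `permute` keep only the first `limit` elements. `IColumnDummy` above follows the same rule with `limit ? std::min(s, limit) : s`. Below is a minimal sketch of that contract on a `std::vector<int>`, assuming `std::partial_sort` as the partial-sorting primitive. The free functions `getPermutation` and `permute` here are illustrative stand-ins, not the `IColumn` virtuals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <vector>

// Model of getPermutation: with 0 < limit < size, only the first `limit`
// positions of `res` are guaranteed to be sorted (std::partial_sort).
void getPermutation(const std::vector<int> & data, bool reverse, uint64_t limit, std::vector<size_t> & res)
{
    res.resize(data.size());
    std::iota(res.begin(), res.end(), size_t(0));

    auto less = [&](size_t a, size_t b) { return reverse ? data[a] > data[b] : data[a] < data[b]; };
    if (limit != 0 && limit < res.size())
        std::partial_sort(res.begin(), res.begin() + static_cast<std::ptrdiff_t>(limit), res.end(), less);
    else
        std::sort(res.begin(), res.end(), less);
}

// Model of permute: result[i] = data[perm[i]], keeping only the first
// `limit` elements when limit != 0.
std::vector<int> permute(const std::vector<int> & data, const std::vector<size_t> & perm, uint64_t limit)
{
    if (perm.size() != data.size())
        throw std::invalid_argument("Size of permutation doesn't match size of column");
    size_t n = (limit != 0 && limit < perm.size()) ? static_cast<size_t>(limit) : perm.size();
    std::vector<int> result(n);
    for (size_t i = 0; i < n; ++i)
        result[i] = data[perm[i]];
    return result;
}
```

A `LIMIT`-style query then needs only `getPermutation(data, false, k, perm)` followed by `permute(data, perm, k)`. The remaining `size - k` positions of `perm` are left in unspecified order.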
--- a/dbms/src/Common/ColumnsHashing.h
+++ b/dbms/src/Common/ColumnsHashing.h
@ -0,0 +1,557 @@
 #pragma once
 #include <Common/ColumnsHashingImpl.h>
 #include <Common/Arena.h>
 #include <Common/LRUCache.h>
 #include <common/unaligned.h>
 #include <Columns/ColumnString.h>
 #include <Columns/ColumnFixedString.h>
 #include <Columns/ColumnLowCardinality.h>
 #include <Core/Defines.h>
 #include <memory>
 namespace DB
 {
 namespace ColumnsHashing
 {
 /// For the case when there is one numeric key.
 /// UInt8/16/32/64 for any type with corresponding bit width.
 template <typename Value, typename Mapped, typename FieldType, bool use_cache = true>
 struct HashMethodOneNumber
    : public columns_hashing_impl::HashMethodBase<HashMethodOneNumber<Value, Mapped, FieldType, use_cache>, Value, Mapped, use_cache>
 {
    using Self = HashMethodOneNumber<Value, Mapped, FieldType, use_cache>;
    using Base = columns_hashing_impl::HashMethodBase<Self, Value, Mapped, use_cache>;
    const char * vec;
    /// If the keys are of fixed length, key_sizes contains their lengths; otherwise it is empty.
    HashMethodOneNumber(const ColumnRawPtrs & key_columns, const Sizes & /*key_sizes*/, const HashMethodContextPtr &)
    {
        vec = key_columns[0]->getRawData().data;
    }
    /// Creates a context. The method is called once, and the resulting context is shared by all threads.
    using Base::createContext; /// (const HashMethodContext::Settings &) -> HashMethodContextPtr
    /// Emplace key into HashTable or HashMap. If Data is HashMap, returns ptr to value, otherwise nullptr.
    /// Data is a HashTable where to insert key from column's row.
    /// For Serialized method, key may be placed in pool.
    using Base::emplaceKey; /// (Data & data, size_t row, Arena & pool) -> EmplaceResult
    /// Find key in HashTable or HashMap. If Data is HashMap and key was found, returns ptr to value, otherwise nullptr.
    using Base::findKey;  /// (Data & data, size_t row, Arena & pool) -> FindResult
    /// Get hash value of row.
    using Base::getHash; /// (const Data & data, size_t row, Arena & pool) -> size_t
    /// Used by the default implementation in HashMethodBase.
    FieldType getKey(size_t row, Arena &) const { return unalignedLoad<FieldType>(vec + row * sizeof(FieldType)); }
    /// Get StringRef from value which can be inserted into column.
    static StringRef getValueRef(const Value & value)
    {
        return StringRef(reinterpret_cast<const char *>(&value.first), sizeof(value.first));
    }
 };
 /// For the case when there is one string key.
 template <typename Value, typename Mapped, bool place_string_to_arena = true, bool use_cache = true>
 struct HashMethodString
    : public columns_hashing_impl::HashMethodBase<HashMethodString<Value, Mapped, place_string_to_arena, use_cache>, Value, Mapped, use_cache>
 {
    using Self = HashMethodString<Value, Mapped, place_string_to_arena, use_cache>;
    using Base = columns_hashing_impl::HashMethodBase<Self, Value, Mapped, use_cache>;
    const IColumn::Offset * offsets;
    const UInt8 * chars;
    HashMethodString(const ColumnRawPtrs & key_columns, const Sizes & /*key_sizes*/, const HashMethodContextPtr &)
    {
        const IColumn & column = *key_columns[0];
        const ColumnString & column_string = static_cast<const ColumnString &>(column);
        offsets = column_string.getOffsets().data();
        chars = column_string.getChars().data();
    }
    auto getKey(ssize_t row, Arena &) const
    {
        return StringRef(chars + offsets[row - 1], offsets[row] - offsets[row - 1] - 1);
    }
    static StringRef getValueRef(const Value & value) { return StringRef(value.first.data, value.first.size); }
 protected:
    friend class columns_hashing_impl::HashMethodBase<Self, Value, Mapped, use_cache>;
    static ALWAYS_INLINE void onNewKey([[maybe_unused]] StringRef & key, [[maybe_unused]] Arena & pool)
    {
        if constexpr (place_string_to_arena)
        {
            if (key.size)
                key.data = pool.insert(key.data, key.size);
        }
    }
 };
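`HashMethodString::getKey` relies on the `ColumnString` layout. Strings are stored back to back in `chars`, each followed by a zero byte, and `offsets[row]` points just past that terminator. The key therefore starts at `offsets[row - 1]` and has length `offsets[row] - offsets[row - 1] - 1`. The small standalone model below illustrates this. `StringColumnModel` is hypothetical, and it handles row 0 explicitly instead of reading `offsets[-1]`, which the original code relies on being 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Simplified model of the layout read by HashMethodString::getKey.
struct StringColumnModel
{
    std::vector<char> chars;        // all strings, each followed by '\0'
    std::vector<uint64_t> offsets;  // offsets[i] = position just past string i's '\0'

    void insert(std::string_view s)
    {
        chars.insert(chars.end(), s.begin(), s.end());
        chars.push_back('\0');
        offsets.push_back(chars.size());
    }

    // Same arithmetic as getKey: the previous offset marks the start, and the
    // trailing zero byte is excluded from the length.
    std::string_view getKey(size_t row) const
    {
        uint64_t begin = row == 0 ? 0 : offsets[row - 1];
        return std::string_view(chars.data() + begin, offsets[row] - begin - 1);
    }
};
```

The returned view points into the column's own memory. This is presumably why `onNewKey` copies non-empty keys into the arena when `place_string_to_arena` is set: the hash table then keeps a copy that does not depend on the column staying alive.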
 /// For the case when there is one fixed-length string key.
 template <typename Value, typename Mapped, bool place_string_to_arena = true, bool use_cache = true>
 struct HashMethodFixedString
    : public columns_hashing_impl::HashMethodBase<HashMethodFixedString<Value, Mapped, place_string_to_arena, use_cache>, Value, Mapped, use_cache>
 {
    using Self = HashMethodFixedString<Value, Mapped, place_string_to_arena, use_cache>;
    using Base = columns_hashing_impl::HashMethodBase<Self, Value, Mapped, use_cache>;
    size_t n;
    const ColumnFixedString::Chars * chars;
    HashMethodFixedString(const ColumnRawPtrs & key_columns, const Sizes & /*key_sizes*/, const HashMethodContextPtr &)
    {
        const IColumn & column = *key_columns[0];
        const ColumnFixedString & column_string = static_cast<const ColumnFixedString &>(column);
        n = column_string.getN();
        chars = &column_string.getChars();
    }
    StringRef getKey(size_t row, Arena &) const { return StringRef(&(*chars)[row * n], n); }
    static StringRef getValueRef(const Value & value) { return StringRef(value.first.data, value.first.size); }
 protected:
    friend class columns_hashing_impl::HashMethodBase<Self, Value, Mapped, use_cache>;
    static ALWAYS_INLINE void onNewKey([[maybe_unused]] StringRef & key, [[maybe_unused]] Arena & pool)
    {
        if constexpr (place_string_to_arena)
            key.data = pool.insert(key.data, key.size);
    }
 };
 /// Cache stores dictionaries and saved_hash per dictionary key.
 class LowCardinalityDictionaryCache : public HashMethodContext
 {
 public:
    /// Assume that dictionaries with the same hash have the same keys.
    /// Just in case, also check that they have the same size.
    struct DictionaryKey
    {
        UInt128 hash;
        UInt64 size;
        bool operator== (const DictionaryKey & other) const { return hash == other.hash && size == other.size; }
    };
    struct DictionaryKeyHash
    {
        size_t operator()(const DictionaryKey & key) const
        {
            SipHash hash;
            hash.update(key.hash.low);
            hash.update(key.hash.high);
            hash.update(key.size);
            return hash.get64();
        }
    };
    struct CachedValues
    {
        /// Store ptr to dictionary to be sure it won't be deleted.
        ColumnPtr dictionary_holder;
        /// Hashes for dictionary keys.
        const UInt64 * saved_hash = nullptr;
    };
    using CachedValuesPtr = std::shared_ptr<CachedValues>;
    explicit LowCardinalityDictionaryCache(const HashMethodContext::Settings & settings) : cache(settings.max_threads) {}
    CachedValuesPtr get(const DictionaryKey & key) { return cache.get(key); }
    void set(const DictionaryKey & key, const CachedValuesPtr & mapped) { cache.set(key, mapped); }
 private:
    using Cache = LRUCache<DictionaryKey, CachedValues, DictionaryKeyHash>;
    Cache cache;
 };
 /// Single low cardinality column.
 template <typename SingleColumnMethod, typename Mapped, bool use_cache>
 struct HashMethodSingleLowCardinalityColumn : public SingleColumnMethod
 {
    using Base = SingleColumnMethod;
    enum class VisitValue
    {
        Empty = 0,
        Found = 1,
        NotFound = 2,
    };
    static constexpr bool has_mapped = !std::is_same<Mapped, void>::value;
    using EmplaceResult = columns_hashing_impl::EmplaceResultImpl<Mapped>;
    using FindResult = columns_hashing_impl::FindResultImpl<Mapped>;
    static HashMethodContextPtr createContext(const HashMethodContext::Settings & settings)
    {
        return std::make_shared<LowCardinalityDictionaryCache>(settings);
    }
    ColumnRawPtrs key_columns;
    const IColumn * positions = nullptr;
    size_t size_of_index_type = 0;
    /// saved_hash comes either from the current column or from the cache.
    const UInt64 * saved_hash = nullptr;
    /// Hold dictionary in case saved_hash is from cache to be sure it won't be deleted.
    ColumnPtr dictionary_holder;
    /// Cache AggregateDataPtr for the current column to reduce the number of hash table lookups.
    columns_hashing_impl::MappedCache<Mapped> mapped_cache;
    PaddedPODArray<VisitValue> visit_cache;
    /// If initialized column is nullable.
    bool is_nullable = false;
    static const ColumnLowCardinality & getLowCardinalityColumn(const IColumn * low_cardinality_column)
    {
        auto column = typeid_cast<const ColumnLowCardinality *>(low_cardinality_column);
        if (!column)
            throw Exception("Invalid aggregation key type for HashMethodSingleLowCardinalityColumn method. "
                            "Expected LowCardinality, got " + low_cardinality_column->getName(), ErrorCodes::LOGICAL_ERROR);
        return *column;
    }
    HashMethodSingleLowCardinalityColumn(
        const ColumnRawPtrs & key_columns_low_cardinality, const Sizes & key_sizes, const HashMethodContextPtr & context)
        : Base({getLowCardinalityColumn(key_columns_low_cardinality[0]).getDictionary().getNestedNotNullableColumn().get()}, key_sizes, context)
    {
        auto column = &getLowCardinalityColumn(key_columns_low_cardinality[0]);
        if (!context)
            throw Exception("Cache wasn't created for HashMethodSingleLowCardinalityColumn",
                            ErrorCodes::LOGICAL_ERROR);
        LowCardinalityDictionaryCache * cache;
        if constexpr (use_cache)
        {
            cache = typeid_cast<LowCardinalityDictionaryCache *>(context.get());
            if (!cache)
            {
                const auto & cached_val = *context;
                throw Exception("Invalid type for HashMethodSingleLowCardinalityColumn cache: "
                                + demangle(typeid(cached_val).name()), ErrorCodes::LOGICAL_ERROR);
            }
        }
        auto * dict = column->getDictionary().getNestedNotNullableColumn().get();
        is_nullable = column->getDictionary().nestedColumnIsNullable();
        key_columns = {dict};
        bool is_shared_dict = column->isSharedDictionary();
        typename LowCardinalityDictionaryCache::DictionaryKey dictionary_key;
        typename LowCardinalityDictionaryCache::CachedValuesPtr cached_values;
        if (is_shared_dict)
        {
            dictionary_key = {column->getDictionary().getHash(), dict->size()};
            if constexpr (use_cache)
                cached_values = cache->get(dictionary_key);
        }
        if (cached_values)
        {
            saved_hash = cached_values->saved_hash;
            dictionary_holder = cached_values->dictionary_holder;
        }
        else
        {
            saved_hash = column->getDictionary().tryGetSavedHash();
            dictionary_holder = column->getDictionaryPtr();
            if constexpr (use_cache)
            {
                if (is_shared_dict)
                {
                    cached_values = std::make_shared<typename LowCardinalityDictionaryCache::CachedValues>();
                    cached_values->saved_hash = saved_hash;
                    cached_values->dictionary_holder = dictionary_holder;
                    cache->set(dictionary_key, cached_values);
                }
            }
        }
        if constexpr (has_mapped)
            mapped_cache.resize(key_columns[0]->size());
        VisitValue empty(VisitValue::Empty);
        visit_cache.assign(key_columns[0]->size(), empty);
        size_of_index_type = column->getSizeOfIndexType();
        positions = column->getIndexesPtr().get();
    }
    ALWAYS_INLINE size_t getIndexAt(size_t row) const
    {
        switch (size_of_index_type)
        {
            case sizeof(UInt8): return static_cast<const ColumnUInt8 *>(positions)->getElement(row);
            case sizeof(UInt16): return static_cast<const ColumnUInt16 *>(positions)->getElement(row);
            case sizeof(UInt32): return static_cast<const ColumnUInt32 *>(positions)->getElement(row);
            case sizeof(UInt64): return static_cast<const ColumnUInt64 *>(positions)->getElement(row);
            default: throw Exception("Unexpected size of index type for low cardinality column.", ErrorCodes::LOGICAL_ERROR);
        }
    }
    /// Get the key from the key columns for insertion into the hash table.
    ALWAYS_INLINE auto getKey(size_t row, Arena & pool) const
    {
        return Base::getKey(getIndexAt(row), pool);
    }
    template <typename Data>
    ALWAYS_INLINE EmplaceResult emplaceKey(Data & data, size_t row_, Arena & pool)
    {
        size_t row = getIndexAt(row_);
        if (is_nullable && row == 0)
        {
            visit_cache[row] = VisitValue::Found;
            bool has_null_key = data.hasNullKeyData();
            data.hasNullKeyData() = true;
            if constexpr (has_mapped)
                return EmplaceResult(data.getNullKeyData(), mapped_cache[0], !has_null_key);
            else
                return EmplaceResult(!has_null_key);
        }
        if (visit_cache[row] == VisitValue::Found)
        {
            if constexpr (has_mapped)
                return EmplaceResult(mapped_cache[row], mapped_cache[row], false);
            else
                return EmplaceResult(false);
        }
        auto key = getKey(row_, pool);
        bool inserted = false;
        typename Data::iterator it;
        if (saved_hash)
            data.emplace(key, it, inserted, saved_hash[row]);
        else
            data.emplace(key, it, inserted);
        visit_cache[row] = VisitValue::Found;
        if (inserted)
        {
            if constexpr (has_mapped)
            {
                new(&it->second) Mapped();
                Base::onNewKey(it->first, pool);
            }
            else
                Base::onNewKey(*it, pool);
        }
        if constexpr (has_mapped)
            return EmplaceResult(it->second, mapped_cache[row], inserted);
        else
            return EmplaceResult(inserted);
    }
    ALWAYS_INLINE bool isNullAt(size_t i)
    {
        if (!is_nullable)
            return false;
        return getIndexAt(i) == 0;
    }
    template <typename Data>
    ALWAYS_INLINE FindResult findFromRow(Data & data, size_t row_, Arena & pool)
    {
        size_t row = getIndexAt(row_);
        if (is_nullable && row == 0)
        {
            if constexpr (has_mapped)
                return FindResult(data.hasNullKeyData() ? &data.getNullKeyData() : nullptr, data.hasNullKeyData());
            else
                return FindResult(data.hasNullKeyData());
        }
        if (visit_cache[row] != VisitValue::Empty)
        {
            if constexpr (has_mapped)
                return FindResult(&mapped_cache[row], visit_cache[row] == VisitValue::Found);
            else
                return FindResult(visit_cache[row] == VisitValue::Found);
        }
        auto key = getKey(row_, pool);
        typename Data::iterator it;
        if (saved_hash)
            it = data.find(key, saved_hash[row]);
        else
            it = data.find(key);
        bool found = it != data.end();
        visit_cache[row] = found ? VisitValue::Found : VisitValue::NotFound;
        if constexpr (has_mapped)
        {
            if (found)
                mapped_cache[row] = it->second;
        }
        if constexpr (has_mapped)
            return FindResult(&mapped_cache[row], found);
        else
            return FindResult(found);
    }
    template <typename Data>
    ALWAYS_INLINE size_t getHash(const Data & data, size_t row, Arena & pool)
    {
        row = getIndexAt(row);
        if (saved_hash)
            return saved_hash[row];
        return Base::getHash(data, row, pool);
    }
 };
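A LowCardinality column stores per-row positions into a dictionary. `HashMethodSingleLowCardinalityColumn` therefore keeps `visit_cache` and `mapped_cache` indexed by dictionary position, so the hash table is consulted at most once per distinct position in a column. The simplified sketch below counts rows per value to show the idea. `LowCardinalityCounter` is hypothetical and uses `std::unordered_map` as the hash table. It caches a pointer to the mapped value rather than the value itself, and it ignores NULL handling and `saved_hash`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Rows hold small indexes into `dictionary`; the hash table is touched only
// the first time each dictionary position is seen.
struct LowCardinalityCounter
{
    std::vector<std::string> dictionary;   // distinct values
    std::vector<uint8_t> positions;        // per-row index into dictionary

    std::unordered_map<std::string, uint64_t> counts;   // the "hash table"
    std::vector<uint64_t *> mapped_cache;               // one slot per dictionary position
    size_t hash_table_lookups = 0;

    void run()
    {
        mapped_cache.assign(dictionary.size(), nullptr);
        for (uint8_t pos : positions)
        {
            if (!mapped_cache[pos])
            {
                ++hash_table_lookups;
                // operator[] emplaces or finds; pointers to unordered_map
                // elements stay valid across rehashing.
                mapped_cache[pos] = &counts[dictionary[pos]];
            }
            ++*mapped_cache[pos];
        }
    }
};
```

With six rows over a three-entry dictionary, the sketch performs three hash table lookups instead of six.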
 // Optional per-key data for LowCardinality columns (nested dictionary columns, positions, index sizes).
 template <bool has_low_cardinality>
 struct LowCardinalityKeys
 {
    ColumnRawPtrs nested_columns;
    ColumnRawPtrs positions;
    Sizes position_sizes;
 };
 template <>
 struct LowCardinalityKeys<false> {};
 /// For the case when all keys are of fixed length, and they fit in N (for example, 128) bits.
 template <typename Value, typename Key, typename Mapped, bool has_nullable_keys_ = false, bool has_low_cardinality_ = false, bool use_cache = true>
 struct HashMethodKeysFixed
    : private columns_hashing_impl::BaseStateKeysFixed<Key, has_nullable_keys_>
    , public columns_hashing_impl::HashMethodBase<HashMethodKeysFixed<Value, Key, Mapped, has_nullable_keys_, has_low_cardinality_, use_cache>, Value, Mapped, use_cache>
 {
    using Self = HashMethodKeysFixed<Value, Key, Mapped, has_nullable_keys_, has_low_cardinality_, use_cache>;
    using BaseHashed = columns_hashing_impl::HashMethodBase<Self, Value, Mapped, use_cache>;
    using Base = columns_hashing_impl::BaseStateKeysFixed<Key, has_nullable_keys_>;
    static constexpr bool has_nullable_keys = has_nullable_keys_;
    static constexpr bool has_low_cardinality = has_low_cardinality_;
    LowCardinalityKeys<has_low_cardinality> low_cardinality_keys;
    Sizes key_sizes;
    size_t keys_size;
    HashMethodKeysFixed(const ColumnRawPtrs & key_columns, const Sizes & key_sizes, const HashMethodContextPtr &)
        : Base(key_columns), key_sizes(std::move(key_sizes)), keys_size(key_columns.size())
    {
        if constexpr (has_low_cardinality)
        {
            low_cardinality_keys.nested_columns.resize(key_columns.size());
            low_cardinality_keys.positions.assign(key_columns.size(), nullptr);
            low_cardinality_keys.position_sizes.resize(key_columns.size());
            for (size_t i = 0; i < key_columns.size(); ++i)
            {
                if (auto * low_cardinality_col = typeid_cast<const ColumnLowCardinality *>(key_columns[i]))
                {
                    low_cardinality_keys.nested_columns[i] = low_cardinality_col->getDictionary().getNestedColumn().get();
                    low_cardinality_keys.positions[i] = &low_cardinality_col->getIndexes();
                    low_cardinality_keys.position_sizes[i] = low_cardinality_col->getSizeOfIndexType();
                }
                else
                    low_cardinality_keys.nested_columns[i] = key_columns[i];
            }
        }
    }
    ALWAYS_INLINE Key getKey(size_t row, Arena &) const
    {
        if constexpr (has_nullable_keys)
        {
            auto bitmap = Base::createBitmap(row);
            return packFixed<Key>(row, keys_size, Base::getActualColumns(), key_sizes, bitmap);
        }
        else
        {
            if constexpr (has_low_cardinality)
                return packFixed<Key, true>(row, keys_size, low_cardinality_keys.nested_columns, key_sizes,
                                            &low_cardinality_keys.positions, &low_cardinality_keys.position_sizes);
            return packFixed<Key>(row, keys_size, Base::getActualColumns(), key_sizes);
        }
    }
 };
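`HashMethodKeysFixed` covers the case where every key has a fixed width and the widths fit into `Key` (for example, 128 bits). `packFixed` copies each key column's bytes for the row side by side into one value, which is then hashed and compared as a whole. The rough sketch below covers the non-nullable case, assuming a byte array `Key128` as a stand-in for `UInt128`. `packFixedKeys` is hypothetical. The nullable variant, which also passes a NULL bitmap to `packFixed`, is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Key128 = std::array<uint8_t, 16>;  // hypothetical stand-in for UInt128

// values[i] points at the current row's bytes of key column i, and sizes[i]
// is that column's width in bytes. The widths must add up to <= 16.
Key128 packFixedKeys(const std::vector<const void *> & values, const std::vector<size_t> & sizes)
{
    Key128 key{};  // zero-filled, so unused trailing bytes always compare equal
    size_t offset = 0;
    for (size_t i = 0; i < values.size(); ++i)
    {
        assert(offset + sizes[i] <= key.size());
        std::memcpy(key.data() + offset, values[i], sizes[i]);
        offset += sizes[i];
    }
    return key;
}
```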
 /** Hash by concatenating serialized key values.
  * Each serialized value can be deserialized unambiguously given only the position where it starts.
  * For example, a string is serialized as its length followed by its bytes.
  * Therefore, when aggregating by several strings, there is no ambiguity.
  */
 template <typename Value, typename Mapped>
 struct HashMethodSerialized
    : public columns_hashing_impl::HashMethodBase<HashMethodSerialized<Value, Mapped>, Value, Mapped, false>
 {
    using Self = HashMethodSerialized<Value, Mapped>;
    using Base = columns_hashing_impl::HashMethodBase<Self, Value, Mapped, false>;
    ColumnRawPtrs key_columns;
    size_t keys_size;
    HashMethodSerialized(const ColumnRawPtrs & key_columns, const Sizes & /*key_sizes*/, const HashMethodContextPtr &)
        : key_columns(key_columns), keys_size(key_columns.size()) {}
 protected:
    friend class columns_hashing_impl::HashMethodBase<Self, Value, Mapped, false>;
    ALWAYS_INLINE StringRef getKey(size_t row, Arena & pool) const
    {
        return serializeKeysToPoolContiguous(row, keys_size, key_columns, pool);
    }
    static ALWAYS_INLINE void onExistingKey(StringRef & key, Arena & pool) { pool.rollback(key.size); }
 };
 /// For the case when all key columns are hashed together into a single 128-bit key (see hash128).
 template <typename Value, typename Mapped, bool use_cache = true>
 struct HashMethodHashed
    : public columns_hashing_impl::HashMethodBase<HashMethodHashed<Value, Mapped, use_cache>, Value, Mapped, use_cache>
 {
    using Key = UInt128;
    using Self = HashMethodHashed<Value, Mapped, use_cache>;
    using Base = columns_hashing_impl::HashMethodBase<Self, Value, Mapped, use_cache>;
    ColumnRawPtrs key_columns;
    HashMethodHashed(ColumnRawPtrs key_columns, const Sizes &, const HashMethodContextPtr &)
        : key_columns(std::move(key_columns)) {}
    ALWAYS_INLINE Key getKey(size_t row, Arena &) const { return hash128(row, key_columns.size(), key_columns); }
    static ALWAYS_INLINE StringRef getValueRef(const Value & value)
    {
        return StringRef(reinterpret_cast<const char *>(&value.first), sizeof(value.first));
    }
 };
 }
 }
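The HashMethodSerialized comment argues that prefixing each string with its length removes the ambiguity of plain concatenation. A minimal standalone sketch of that argument follows. It assumes a 4-byte native-endian length prefix and uses `std::string` instead of an `Arena`; `concatKeys`, `serializeKeys` and `deserializeKeys` are hypothetical helpers, not the encoding actually used by `serializeKeysToPoolContiguous`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: plain concatenation of key values (ambiguous).
std::string concatKeys(const std::vector<std::string> & keys)
{
    std::string out;
    for (const auto & k : keys)
        out += k;
    return out;
}

// Hypothetical helper: each value is prefixed with its 4-byte length, so value
// boundaries can be recovered from the start position alone.
std::string serializeKeys(const std::vector<std::string> & keys)
{
    std::string out;
    for (const auto & k : keys)
    {
        const uint32_t len = static_cast<uint32_t>(k.size());
        out.append(reinterpret_cast<const char *>(&len), sizeof(len));
        out += k;
    }
    return out;
}

// Inverse of serializeKeys: walks the buffer from its first byte.
std::vector<std::string> deserializeKeys(const std::string & buf, size_t num_keys)
{
    std::vector<std::string> keys;
    size_t pos = 0;
    for (size_t i = 0; i < num_keys; ++i)
    {
        uint32_t len = 0;
        std::memcpy(&len, buf.data() + pos, sizeof(len));
        pos += sizeof(len);
        keys.emplace_back(buf.substr(pos, len));
        pos += len;
    }
    return keys;
}
```

With plain concatenation, `{"ab", "c"}` and `{"a", "bc"}` both produce `abc`, so two different key tuples would fall into the same aggregation bucket. With the length prefix the byte strings differ, and `deserializeKeys` recovers the original tuple.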
--- a/dbms/src/Common/ColumnsHashingImpl.h
+++ b/dbms/src/Common/ColumnsHashingImpl.h
@ -0,0 +1,356 @@
 #pragma once
 #include <Columns/IColumn.h>
 #include <Interpreters/AggregationCommon.h>
 namespace DB
 {
 namespace ColumnsHashing
 {
 /// Generic context for HashMethod. The context is shared between multiple threads, so all methods must be thread-safe.
 /// It is used for caching.
 class HashMethodContext
 {
 public:
    virtual ~HashMethodContext() = default;
    struct Settings
    {
        size_t max_threads;
    };
 };
 using HashMethodContextPtr = std::shared_ptr<HashMethodContext>;
 namespace columns_hashing_impl
 {
 template <typename Value, bool consecutive_keys_optimization_>
 struct LastElementCache
 {
    static constexpr bool consecutive_keys_optimization = consecutive_keys_optimization_;
    Value value;
    bool empty = true;
    bool found = false;
    bool check(const Value & value_) { return !empty && value == value_; }
    template <typename Key>
    bool check(const Key & key) { return !empty && value.first == key; }
 };
 template <typename Data>
 struct LastElementCache<Data, false>
 {
    static constexpr bool consecutive_keys_optimization = false;
 };
 template <typename Mapped>
 class EmplaceResultImpl
 {
    Mapped & value;
    Mapped & cached_value;
    bool inserted;
 public:
    EmplaceResultImpl(Mapped & value, Mapped & cached_value, bool inserted)
            : value(value), cached_value(cached_value), inserted(inserted) {}
    bool isInserted() const { return inserted; }
    auto & getMapped() const { return value; }
    void setMapped(const Mapped & mapped)
    {
        cached_value = mapped;
        value = mapped;
    }
 };
 template <>
 class EmplaceResultImpl<void>
 {
    bool inserted;
 public:
    explicit EmplaceResultImpl(bool inserted) : inserted(inserted) {}
    bool isInserted() const { return inserted; }
 };
 template <typename Mapped>
 class FindResultImpl
 {
    Mapped * value;
    bool found;
 public:
    FindResultImpl(Mapped * value, bool found) : value(value), found(found) {}
    bool isFound() const { return found; }
    Mapped & getMapped() const { return *value; }
 };
 template <>
 class FindResultImpl<void>
 {
    bool found;
 public:
    explicit FindResultImpl(bool found) : found(found) {}
    bool isFound() const { return found; }
 };
 template <typename Derived, typename Value, typename Mapped, bool consecutive_keys_optimization>
 class HashMethodBase
 {
 public:
    using EmplaceResult = EmplaceResultImpl<Mapped>;
    using FindResult = FindResultImpl<Mapped>;
    static constexpr bool has_mapped = !std::is_same<Mapped, void>::value;
    using Cache = LastElementCache<Value, consecutive_keys_optimization>;
    static HashMethodContextPtr createContext(const HashMethodContext::Settings &) { return nullptr; }
    template <typename Data>
    ALWAYS_INLINE EmplaceResult emplaceKey(Data & data, size_t row, Arena & pool)
    {
        auto key = static_cast<Derived &>(*this).getKey(row, pool);
        return emplaceKeyImpl(key, data, pool);
    }
    template <typename Data>
    ALWAYS_INLINE FindResult findKey(Data & data, size_t row, Arena & pool)
    {
        auto key = static_cast<Derived &>(*this).getKey(row, pool);
        auto res = findKeyImpl(key, data);
        static_cast<Derived &>(*this).onExistingKey(key, pool);
        return res;
    }
    template <typename Data>
    ALWAYS_INLINE size_t getHash(const Data & data, size_t row, Arena & pool)
    {
        auto key = static_cast<Derived &>(*this).getKey(row, pool);
        auto res = data.hash(key);
        static_cast<Derived &>(*this).onExistingKey(key, pool);
        return res;
    }
 protected:
    Cache cache;
    HashMethodBase()
    {
        if constexpr (consecutive_keys_optimization)
        {
            if constexpr (has_mapped)
            {
                /// Init PairNoInit elements.
                cache.value.second = Mapped();
                using Key = decltype(cache.value.first);
                cache.value.first = Key();
            }
            else
                cache.value = Value();
        }
    }
    template <typename Key>
    static ALWAYS_INLINE void onNewKey(Key & /*key*/, Arena & /*pool*/) {}
    template <typename Key>
    static ALWAYS_INLINE void onExistingKey(Key & /*key*/, Arena & /*pool*/) {}
    template <typename Data, typename Key>
    ALWAYS_INLINE EmplaceResult emplaceKeyImpl(Key key, Data & data, Arena & pool)
    {
        if constexpr (Cache::consecutive_keys_optimization)
        {
            if (cache.found && cache.check(key))
            {
                static_cast<Derived &>(*this).onExistingKey(key, pool);
                if constexpr (has_mapped)
                    return EmplaceResult(cache.value.second, cache.value.second, false);
                else
                    return EmplaceResult(false);
            }
        }
        typename Data::iterator it;
        bool inserted = false;
        data.emplace(key, it, inserted);
        [[maybe_unused]] Mapped * cached = nullptr;
        if constexpr (has_mapped)
            cached = &it->second;
        if (inserted)
        {
            if constexpr (has_mapped)
            {
                new(&it->second) Mapped();
                static_cast<Derived &>(*this).onNewKey(it->first, pool);
            }
            else
                static_cast<Derived &>(*this).onNewKey(*it, pool);
        }
        else
            static_cast<Derived &>(*this).onExistingKey(key, pool);
        if constexpr (consecutive_keys_optimization)
        {
            cache.value = *it;
            cache.found = true;
            cache.empty = false;
            if constexpr (has_mapped)
                cached = &cache.value.second;
        }
        if constexpr (has_mapped)
            return EmplaceResult(it->second, *cached, inserted);
        else
            return EmplaceResult(inserted);
    }
    template <typename Data, typename Key>
    ALWAYS_INLINE FindResult findKeyImpl(Key key, Data & data)
    {
        if constexpr (Cache::consecutive_keys_optimization)
        {
            if (cache.check(key))
            {
                if constexpr (has_mapped)
                    return FindResult(&cache.value.second, cache.found);
                else
                    return FindResult(cache.found);
            }
        }
        auto it = data.find(key);
        bool found = it != data.end();
        if constexpr (consecutive_keys_optimization)
        {
            cache.found = found;
            cache.empty = false;
            if (found)
                cache.value = *it;
            else
            {
                if constexpr (has_mapped)
                    cache.value.first = key;
                else
                    cache.value = key;
            }
        }
        if constexpr (has_mapped)
            return FindResult(found ? &it->second : nullptr, found);
        else
            return FindResult(found);
    }
 };
 template <typename T>
 struct MappedCache : public PaddedPODArray<T> {};
 template <>
 struct MappedCache<void> {};
 /// This class is designed to provide the functionality that is required for
 /// supporting nullable keys in HashMethodKeysFixed. If there are
 /// no nullable keys, this class is merely implemented as an empty shell.
 template <typename Key, bool has_nullable_keys>
 class BaseStateKeysFixed;
 /// Case where nullable keys are supported.
 template <typename Key>
 class BaseStateKeysFixed<Key, true>
 {
 protected:
    BaseStateKeysFixed(const ColumnRawPtrs & key_columns)
    {
        null_maps.reserve(key_columns.size());
        actual_columns.reserve(key_columns.size());
        for (const auto & col : key_columns)
        {
            if (col->isColumnNullable())
            {
                const auto & nullable_col = static_cast<const ColumnNullable &>(*col);
                actual_columns.push_back(&nullable_col.getNestedColumn());
                null_maps.push_back(&nullable_col.getNullMapColumn());
            }
            else
            {
                actual_columns.push_back(col);
                null_maps.push_back(nullptr);
            }
        }
    }
    /// Return the columns which actually contain the values of the keys.
    /// For a given key column, if it is nullable, we return its nested
    /// column. Otherwise we return the key column itself.
    inline const ColumnRawPtrs & getActualColumns() const
    {
        return actual_columns;
    }
    /// Create a bitmap that indicates whether, for a particular row,
    /// a key column bears a null value or not.
    KeysNullMap<Key> createBitmap(size_t row) const
    {
        KeysNullMap<Key> bitmap{};
        for (size_t k = 0; k < null_maps.size(); ++k)
        {
            if (null_maps[k] != nullptr)
            {
                const auto & null_map = static_cast<const ColumnUInt8 &>(*null_maps[k]).getData();
                if (null_map[row] == 1)
                {
                    size_t bucket = k / 8;
                    size_t offset = k % 8;
                    bitmap[bucket] |= UInt8(1) << offset;
                }
            }
        }
        return bitmap;
    }
 private:
    ColumnRawPtrs actual_columns;
    ColumnRawPtrs null_maps;
 };
 /// Case where nullable keys are not supported.
 template <typename Key>
 class BaseStateKeysFixed<Key, false>
 {
 protected:
    BaseStateKeysFixed(const ColumnRawPtrs & columns) : actual_columns(columns) {}
    const ColumnRawPtrs & getActualColumns() const { return actual_columns; }
    KeysNullMap<Key> createBitmap(size_t) const
    {
        throw Exception{"Internal error: calling createBitmap() for non-nullable keys"
                        " is forbidden", ErrorCodes::LOGICAL_ERROR};
    }
 private:
    ColumnRawPtrs actual_columns;
 };
 }
 }
 }
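`LastElementCache` and `emplaceKeyImpl` skip the hash table when a row repeats the previous row's key. The sketch below illustrates the idea on a counting aggregation over `std::unordered_map`. `CountWithLastKeyCache` is a hypothetical name. Unlike ClickHouse's cache, which copies the whole `(key, mapped)` cell (presumably because an open-addressing table may move cells on resize), the sketch keeps pointers, which is only safe because `std::unordered_map` never relocates its elements:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Counts occurrences of each key, skipping the hash table when a row repeats
// the key of the previous row (the "consecutive keys" optimization).
struct CountWithLastKeyCache
{
    std::unordered_map<std::string, size_t> counts;
    size_t lookups = 0;  // number of real hash table operations performed

    void add(const std::vector<std::string> & rows)
    {
        // The cache is empty until the first lookup, like LastElementCache::empty.
        const std::string * last_key = nullptr;
        size_t * last_mapped = nullptr;

        for (const auto & key : rows)
        {
            if (last_key != nullptr && *last_key == key)
            {
                ++*last_mapped;  // cache hit: no hashing, no probing
                continue;
            }
            ++lookups;
            auto it = counts.try_emplace(key, 0).first;
            ++it->second;
            // Pointers to std::unordered_map elements remain valid when the
            // table rehashes, so they can be kept across iterations.
            last_key = &it->first;
            last_mapped = &it->second;
        }
    }
};
```

For the rows `a a a b b a`, only three real lookups happen. The benefit grows with long runs of equal keys, such as data sorted by the grouping key.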
--- a/dbms/src/Common/CurrentThread.h
+++ b/dbms/src/Common/CurrentThread.h
@ -69,7 +69,7 @@ public:
    static void finalizePerformanceCounters();
    /// Returns a non-empty string if the thread is attached to a query
-    static std::string getCurrentQueryID();
+    static const std::string & getQueryId();
    /// Non-master threads call this method automatically in their destructor
    static void detachQuery();
--- a/dbms/src/Common/ErrorCodes.cpp
+++ b/dbms/src/Common/ErrorCodes.cpp
@ -414,6 +414,9 @@ namespace ErrorCodes
    extern const int PROTOBUF_FIELD_NOT_REPEATED = 437;
    extern const int DATA_TYPE_CANNOT_BE_PROMOTED = 438;
    extern const int CANNOT_SCHEDULE_TASK = 439;
    extern const int INVALID_LIMIT_EXPRESSION = 440;
    extern const int CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING = 441;
    extern const int BAD_DATABASE_FOR_TEMPORARY_TABLE = 442;
    extern const int KEEPER_EXCEPTION = 999;
    extern const int POCO_EXCEPTION = 1000;
--- a/dbms/src/Common/Exception.cpp
+++ b/dbms/src/Common/Exception.cpp
@ -38,10 +38,10 @@ std::string errnoToString(int code, int e)
 #endif
    {
        std::string tmp = std::to_string(code);
-        const char * code = tmp.c_str();
+        const char * code_str = tmp.c_str();
        const char * unknown_message = "Unknown error ";
        strcpy(buf, unknown_message);
-        strcpy(buf + strlen(unknown_message), code);
+        strcpy(buf + strlen(unknown_message), code_str);
    }
    return "errno: " + toString(e) + ", strerror: " + std::string(buf);
 #else
@ -88,7 +88,7 @@ std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded
        try
        {
            stream << "Poco::Exception. Code: " << ErrorCodes::POCO_EXCEPTION << ", e.code() = " << e.code()
-                << ", e.displayText() = " << e.displayText() << ", e.what() = " << e.what();
+                << ", e.displayText() = " << e.displayText();
        }
        catch (...) {}
    }
@ -202,7 +202,7 @@ std::string getExceptionMessage(const Exception & e, bool with_stacktrace, bool
            }
        }
-        stream << "Code: " << e.code() << ", e.displayText() = " << text << ", e.what() = " << e.what();
+        stream << "Code: " << e.code() << ", e.displayText() = " << text;
        if (with_stacktrace && !has_embedded_stack_trace)
            stream << ", Stack trace:\n\n" << e.getStackTrace().toString();
--- a/dbms/src/Common/Exception.h
+++ b/dbms/src/Common/Exception.h
@ -33,6 +33,7 @@ public:
    Exception * clone() const override { return new Exception(*this); }
    void rethrow() const override { throw *this; }
    const char * name() const throw() override { return "DB::Exception"; }
    const char * what() const throw() override { return message().data(); }
    /// Add something to the existing message.
    void addMessage(const std::string & arg) { extendedMessage(arg); }
--- a/dbms/src/Common/IFactoryWithAliases.h
+++ b/dbms/src/Common/IFactoryWithAliases.h
@ -1,6 +1,7 @@
 #pragma once
 #include <Common/Exception.h>
 #include <Common/NamePrompter.h>
 #include <Core/Types.h>
 #include <Poco/String.h>
@ -105,6 +106,12 @@ public:
        return aliases.count(name) || case_insensitive_aliases.count(name);
    }
    std::vector<String> getHints(const String & name) const
    {
        static const auto registered_names = getAllRegisteredNames();
        return prompter.getHints(name, registered_names);
    }
    virtual ~IFactoryWithAliases() {}
 private:
@ -120,6 +127,13 @@ private:
    /// Case insensitive aliases
    AliasMap case_insensitive_aliases;
    /**
      * Prompter for names: if a person makes a typo in a function or type name, it
      * helps to find the best possible matches (the edit distance threshold follows clang:
      * the maximum edit distance is (typo.size() + 2) / 3).
      */
    NamePrompter</*MaxNumHints=*/2> prompter;
 };
 }
--- a/dbms/src/Common/NamePrompter.h
+++ b/dbms/src/Common/NamePrompter.h
@ -0,0 +1,92 @@
 #pragma once
 #include <Core/Types.h>
 #include <algorithm>
 #include <cctype>
 #include <cmath>
 #include <queue>
 #include <utility>
 namespace DB
 {
 template <size_t MaxNumHints>
 class NamePrompter
 {
 public:
    using DistanceIndex = std::pair<size_t, size_t>;
    using DistanceIndexQueue = std::priority_queue<DistanceIndex>;
    static std::vector<String> getHints(const String & name, const std::vector<String> & prompting_strings)
    {
        DistanceIndexQueue queue;
        for (size_t i = 0; i < prompting_strings.size(); ++i)
            appendToQueue(i, name, queue, prompting_strings);
        return release(queue, prompting_strings);
    }
 private:
    static size_t levenshteinDistance(const String & lhs, const String & rhs)
    {
        size_t n = lhs.size();
        size_t m = rhs.size();
        std::vector<std::vector<size_t>> dp(n + 1, std::vector<size_t>(m + 1));
        for (size_t i = 1; i <= n; ++i)
            dp[i][0] = i;
        for (size_t i = 1; i <= m; ++i)
            dp[0][i] = i;
        for (size_t j = 1; j <= m; ++j)
        {
            for (size_t i = 1; i <= n; ++i)
            {
                if (std::tolower(static_cast<unsigned char>(lhs[i - 1])) == std::tolower(static_cast<unsigned char>(rhs[j - 1])))
                    dp[i][j] = dp[i - 1][j - 1];
                else
                    dp[i][j] = std::min(dp[i - 1][j] + 1, std::min(dp[i][j - 1] + 1, dp[i - 1][j - 1] + 1));
            }
        }
        return dp[n][m];
    }
    static void appendToQueue(size_t ind, const String & name, DistanceIndexQueue & queue, const std::vector<String> & prompting_strings)
    {
        const String & prompt = prompting_strings[ind];
        /// Clang SimpleTypoCorrector logic
        const size_t min_possible_edit_distance = std::abs(static_cast<int64_t>(name.size()) - static_cast<int64_t>(prompt.size()));
        const size_t mistake_factor = (name.size() + 2) / 3;
        if (min_possible_edit_distance > 0 && name.size() / min_possible_edit_distance < 3)
            return;
        if (prompt.size() <= name.size() + mistake_factor && prompt.size() + mistake_factor >= name.size())
        {
            size_t distance = levenshteinDistance(prompt, name);
            if (distance <= mistake_factor)
            {
                queue.emplace(distance, ind);
                if (queue.size() > MaxNumHints)
                    queue.pop();
            }
        }
    }
    static std::vector<String> release(DistanceIndexQueue & queue, const std::vector<String> & prompting_strings)
    {
        std::vector<String> ans;
        ans.reserve(queue.size());
        while (!queue.empty())
        {
            auto top = queue.top();
            queue.pop();
            ans.push_back(prompting_strings[top.second]);
        }
        std::reverse(ans.begin(), ans.end());
        return ans;
    }
 };
 }
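`NamePrompter` keeps at most `MaxNumHints` candidates whose case-insensitive Levenshtein distance to the typo is within `(name.size() + 2) / 3`, after clang's length-ratio pre-check. A standalone sketch of the same selection rule follows. `editDistanceCI` and `getHintsSketch` are hypothetical names; the distance keeps a single rolling row instead of the full matrix, and a sort replaces the bounded priority queue (both return the closest candidates first, with ties broken by the lower index):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Case-insensitive Levenshtein distance, keeping only one previous row of the DP matrix.
size_t editDistanceCI(const std::string & a, const std::string & b)
{
    std::vector<size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;  // distance from the empty prefix of a
    for (size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = i;
        for (size_t j = 1; j <= b.size(); ++j)
        {
            // Cast to unsigned char: std::tolower is undefined for negative char values.
            const bool same = std::tolower(static_cast<unsigned char>(a[i - 1]))
                == std::tolower(static_cast<unsigned char>(b[j - 1]));
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (same ? 0 : 1)});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Returns up to max_hints names within (typo.size() + 2) / 3 edits of typo, closest first.
std::vector<std::string> getHintsSketch(const std::string & typo, const std::vector<std::string> & names, size_t max_hints = 2)
{
    const size_t mistake_factor = (typo.size() + 2) / 3;
    std::vector<std::pair<size_t, size_t>> candidates;  // (distance, index in names)
    for (size_t i = 0; i < names.size(); ++i)
    {
        const std::string & name = names[i];
        const size_t length_diff = typo.size() > name.size() ? typo.size() - name.size() : name.size() - typo.size();
        // Same length-ratio pre-check as NamePrompter::appendToQueue.
        if (length_diff > 0 && typo.size() / length_diff < 3)
            continue;
        const size_t distance = editDistanceCI(name, typo);
        if (distance <= mistake_factor)
            candidates.emplace_back(distance, i);
    }
    std::sort(candidates.begin(), candidates.end());
    std::vector<std::string> hints;
    for (size_t k = 0; k < candidates.size() && k < max_hints; ++k)
        hints.push_back(names[candidates[k].second]);
    return hints;
}
```

For the typo `cuont` and the names `count`, `countIf`, `sum`, `uniq`, only `count` survives: `countIf` and `sum` fail the length-ratio check, and `uniq` is more than two edits away.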
--- a/dbms/src/Common/SipHash.h
+++ b/dbms/src/Common/SipHash.h
@ -17,6 +17,7 @@
 #include <common/unaligned.h>
 #include <string>
 #include <type_traits>
 #include <Core/Defines.h>
 #define ROTL(x, b) static_cast<UInt64>(((x) << (b)) | ((x) >> (64 - (b))))
@ -49,7 +50,7 @@ private:
        UInt8 current_bytes[8];
    };
-    void finalize()
+    ALWAYS_INLINE void finalize()
    {
        /// In the last free byte, we write the remainder of the division by 256.
        current_bytes[7] = cnt;
@ -156,7 +157,7 @@ public:
    /// Template for avoiding the 'unsigned long long' vs 'unsigned long' problem with old Poco on macOS
    template <typename T>
-    void get128(T & lo, T & hi)
+    ALWAYS_INLINE void get128(T & lo, T & hi)
    {
        static_assert(sizeof(T) == 8);
        finalize();
@ -199,8 +200,6 @@ std::enable_if_t<std::/*has_unique_object_representations_v*/is_standard_layout_
    return hash.get64();
 }
 #include <string>
 inline UInt64 sipHash64(const std::string & s)
 {
    return sipHash64(s.data(), s.size());
--- a/dbms/src/Common/ThreadStatus.cpp
+++ b/dbms/src/Common/ThreadStatus.cpp
@ -21,7 +21,7 @@ namespace ErrorCodes
 }
-thread_local ThreadStatusPtr current_thread = nullptr;
+thread_local ThreadStatus * current_thread = nullptr;
 TasksStatsCounters TasksStatsCounters::current()
@ -124,7 +124,7 @@ void ThreadStatus::attachInternalTextLogsQueue(const InternalTextLogsQueuePtr &
    if (!thread_group)
        return;
-    std::unique_lock lock(thread_group->mutex);
+    std::lock_guard lock(thread_group->mutex);
    thread_group->logs_queue_ptr = logs_queue;
 }
--- a/dbms/src/Common/ThreadStatus.h
+++ b/dbms/src/Common/ThreadStatus.h
@ -25,7 +25,6 @@ namespace DB
 class Context;
 class QueryStatus;
 class ThreadStatus;
 using ThreadStatusPtr = ThreadStatus*;
 class QueryThreadLog;
 struct TasksStatsCounters;
 struct RUsageCounters;
@ -46,7 +45,7 @@ using InternalTextLogsQueueWeakPtr = std::weak_ptr<InternalTextLogsQueue>;
 class ThreadGroupStatus
 {
 public:
-    mutable std::shared_mutex mutex;
+    mutable std::mutex mutex;
    ProfileEvents::Counters performance_counters{VariableContext::Process};
    MemoryTracker memory_tracker{VariableContext::Process};
@ -56,12 +55,11 @@ public:
    InternalTextLogsQueueWeakPtr logs_queue_ptr;
-    /// Key is Poco's thread_id
+    std::vector<UInt32> thread_numbers;
    using QueryThreadStatuses = std::map<UInt32, ThreadStatusPtr>;
    QueryThreadStatuses thread_statuses;
    /// The first thread that created this thread group
-    ThreadStatusPtr master_thread;
+    UInt32 master_thread_number = 0;
    Int32 master_thread_os_id = -1;
    String query;
 };
@ -69,7 +67,7 @@ public:
 using ThreadGroupStatusPtr = std::shared_ptr<ThreadGroupStatus>;
-extern thread_local ThreadStatusPtr current_thread;
+extern thread_local ThreadStatus * current_thread;
 /** Encapsulates all per-thread info (ProfileEvents, MemoryTracker, query_id, query context, etc.).
  * The object must be created in thread function and destroyed in the same thread before the exit.
@ -116,7 +114,7 @@ public:
        return thread_state.load(std::memory_order_relaxed);
    }
-    String getQueryID();
+    const std::string & getQueryId() const;
    /// Starts a new query and creates a new thread group for it; the current thread becomes the master thread of the query
    void initializeQuery();
@ -160,6 +158,8 @@ protected:
    /// Use it only from the current thread
    Context * query_context = nullptr;
    String query_id;
    /// A logs queue used by TCPHandler to pass logs to a client
    InternalTextLogsQueueWeakPtr logs_queue_ptr;
--- a/dbms/src/Common/Throttler.h
+++ b/dbms/src/Common/Throttler.h
@ -108,7 +108,7 @@ public:
 private:
    size_t count = 0;
    const size_t max_speed = 0;
-    const size_t limit = 0;        /// 0 - not limited.
+    const UInt64 limit = 0;        /// 0 - not limited.
    const char * limit_exceeded_exception_message = nullptr;
    Stopwatch watch {CLOCK_MONOTONIC_COARSE};
    std::mutex mutex;
--- a/dbms/src/Common/XDBCBridgeHelper.h
+++ b/dbms/src/Common/XDBCBridgeHelper.h
@ -33,7 +33,7 @@ class IXDBCBridgeHelper
 public:
    static constexpr inline auto DEFAULT_FORMAT = "RowBinary";
-    virtual std::vector<std::pair<std::string, std::string>> getURLParams(const std::string & cols, size_t max_block_size) const = 0;
+    virtual std::vector<std::pair<std::string, std::string>> getURLParams(const std::string & cols, UInt64 max_block_size) const = 0;
    virtual void startBridgeSync() const = 0;
    virtual Poco::URI getMainURI() const = 0;
    virtual Poco::URI getColumnsInfoURI() const = 0;
@ -127,7 +127,7 @@ public:
    /**
     * @todo leaky abstraction - used by external APIs
     */
-    std::vector<std::pair<std::string, std::string>> getURLParams(const std::string & cols, size_t max_block_size) const override
+    std::vector<std::pair<std::string, std::string>> getURLParams(const std::string & cols, UInt64 max_block_size) const override
    {
        std::vector<std::pair<std::string, std::string>> result;
--- a/dbms/src/Common/formatIPv6.cpp
+++ b/dbms/src/Common/formatIPv6.cpp
@ -1,12 +1,44 @@
 #include <Common/formatIPv6.h>
 #include <Common/hex.h>
 #include <Common/StringUtils/StringUtils.h>
 #include <ext/range.h>
 #include <array>
-
+#include <algorithm>
 namespace DB
 {
 // To be used in formatIPv4: maps a byte to its string form prefixed with its length (to save a strlen call).
 extern const char one_byte_to_string_lookup_table[256][4] = {
    {1, '0'}, {1, '1'}, {1, '2'}, {1, '3'}, {1, '4'}, {1, '5'}, {1, '6'}, {1, '7'}, {1, '8'}, {1, '9'},
    {2, '1', '0'}, {2, '1', '1'}, {2, '1', '2'}, {2, '1', '3'}, {2, '1', '4'}, {2, '1', '5'}, {2, '1', '6'}, {2, '1', '7'}, {2, '1', '8'}, {2, '1', '9'},
    {2, '2', '0'}, {2, '2', '1'}, {2, '2', '2'}, {2, '2', '3'}, {2, '2', '4'}, {2, '2', '5'}, {2, '2', '6'}, {2, '2', '7'}, {2, '2', '8'}, {2, '2', '9'},
    {2, '3', '0'}, {2, '3', '1'}, {2, '3', '2'}, {2, '3', '3'}, {2, '3', '4'}, {2, '3', '5'}, {2, '3', '6'}, {2, '3', '7'}, {2, '3', '8'}, {2, '3', '9'},
    {2, '4', '0'}, {2, '4', '1'}, {2, '4', '2'}, {2, '4', '3'}, {2, '4', '4'}, {2, '4', '5'}, {2, '4', '6'}, {2, '4', '7'}, {2, '4', '8'}, {2, '4', '9'},
    {2, '5', '0'}, {2, '5', '1'}, {2, '5', '2'}, {2, '5', '3'}, {2, '5', '4'}, {2, '5', '5'}, {2, '5', '6'}, {2, '5', '7'}, {2, '5', '8'}, {2, '5', '9'},
    {2, '6', '0'}, {2, '6', '1'}, {2, '6', '2'}, {2, '6', '3'}, {2, '6', '4'}, {2, '6', '5'}, {2, '6', '6'}, {2, '6', '7'}, {2, '6', '8'}, {2, '6', '9'},
    {2, '7', '0'}, {2, '7', '1'}, {2, '7', '2'}, {2, '7', '3'}, {2, '7', '4'}, {2, '7', '5'}, {2, '7', '6'}, {2, '7', '7'}, {2, '7', '8'}, {2, '7', '9'},
    {2, '8', '0'}, {2, '8', '1'}, {2, '8', '2'}, {2, '8', '3'}, {2, '8', '4'}, {2, '8', '5'}, {2, '8', '6'}, {2, '8', '7'}, {2, '8', '8'}, {2, '8', '9'},
    {2, '9', '0'}, {2, '9', '1'}, {2, '9', '2'}, {2, '9', '3'}, {2, '9', '4'}, {2, '9', '5'}, {2, '9', '6'}, {2, '9', '7'}, {2, '9', '8'}, {2, '9', '9'},
    {3, '1', '0', '0'}, {3, '1', '0', '1'}, {3, '1', '0', '2'}, {3, '1', '0', '3'}, {3, '1', '0', '4'}, {3, '1', '0', '5'}, {3, '1', '0', '6'}, {3, '1', '0', '7'}, {3, '1', '0', '8'}, {3, '1', '0', '9'},
    {3, '1', '1', '0'}, {3, '1', '1', '1'}, {3, '1', '1', '2'}, {3, '1', '1', '3'}, {3, '1', '1', '4'}, {3, '1', '1', '5'}, {3, '1', '1', '6'}, {3, '1', '1', '7'}, {3, '1', '1', '8'}, {3, '1', '1', '9'},
    {3, '1', '2', '0'}, {3, '1', '2', '1'}, {3, '1', '2', '2'}, {3, '1', '2', '3'}, {3, '1', '2', '4'}, {3, '1', '2', '5'}, {3, '1', '2', '6'}, {3, '1', '2', '7'}, {3, '1', '2', '8'}, {3, '1', '2', '9'},
    {3, '1', '3', '0'}, {3, '1', '3', '1'}, {3, '1', '3', '2'}, {3, '1', '3', '3'}, {3, '1', '3', '4'}, {3, '1', '3', '5'}, {3, '1', '3', '6'}, {3, '1', '3', '7'}, {3, '1', '3', '8'}, {3, '1', '3', '9'},
    {3, '1', '4', '0'}, {3, '1', '4', '1'}, {3, '1', '4', '2'}, {3, '1', '4', '3'}, {3, '1', '4', '4'}, {3, '1', '4', '5'}, {3, '1', '4', '6'}, {3, '1', '4', '7'}, {3, '1', '4', '8'}, {3, '1', '4', '9'},
    {3, '1', '5', '0'}, {3, '1', '5', '1'}, {3, '1', '5', '2'}, {3, '1', '5', '3'}, {3, '1', '5', '4'}, {3, '1', '5', '5'}, {3, '1', '5', '6'}, {3, '1', '5', '7'}, {3, '1', '5', '8'}, {3, '1', '5', '9'},
    {3, '1', '6', '0'}, {3, '1', '6', '1'}, {3, '1', '6', '2'}, {3, '1', '6', '3'}, {3, '1', '6', '4'}, {3, '1', '6', '5'}, {3, '1', '6', '6'}, {3, '1', '6', '7'}, {3, '1', '6', '8'}, {3, '1', '6', '9'},
    {3, '1', '7', '0'}, {3, '1', '7', '1'}, {3, '1', '7', '2'}, {3, '1', '7', '3'}, {3, '1', '7', '4'}, {3, '1', '7', '5'}, {3, '1', '7', '6'}, {3, '1', '7', '7'}, {3, '1', '7', '8'}, {3, '1', '7', '9'},
    {3, '1', '8', '0'}, {3, '1', '8', '1'}, {3, '1', '8', '2'}, {3, '1', '8', '3'}, {3, '1', '8', '4'}, {3, '1', '8', '5'}, {3, '1', '8', '6'}, {3, '1', '8', '7'}, {3, '1', '8', '8'}, {3, '1', '8', '9'},
    {3, '1', '9', '0'}, {3, '1', '9', '1'}, {3, '1', '9', '2'}, {3, '1', '9', '3'}, {3, '1', '9', '4'}, {3, '1', '9', '5'}, {3, '1', '9', '6'}, {3, '1', '9', '7'}, {3, '1', '9', '8'}, {3, '1', '9', '9'},
    {3, '2', '0', '0'}, {3, '2', '0', '1'}, {3, '2', '0', '2'}, {3, '2', '0', '3'}, {3, '2', '0', '4'}, {3, '2', '0', '5'}, {3, '2', '0', '6'}, {3, '2', '0', '7'}, {3, '2', '0', '8'}, {3, '2', '0', '9'},
    {3, '2', '1', '0'}, {3, '2', '1', '1'}, {3, '2', '1', '2'}, {3, '2', '1', '3'}, {3, '2', '1', '4'}, {3, '2', '1', '5'}, {3, '2', '1', '6'}, {3, '2', '1', '7'}, {3, '2', '1', '8'}, {3, '2', '1', '9'},
    {3, '2', '2', '0'}, {3, '2', '2', '1'}, {3, '2', '2', '2'}, {3, '2', '2', '3'}, {3, '2', '2', '4'}, {3, '2', '2', '5'}, {3, '2', '2', '6'}, {3, '2', '2', '7'}, {3, '2', '2', '8'}, {3, '2', '2', '9'},
    {3, '2', '3', '0'}, {3, '2', '3', '1'}, {3, '2', '3', '2'}, {3, '2', '3', '3'}, {3, '2', '3', '4'}, {3, '2', '3', '5'}, {3, '2', '3', '6'}, {3, '2', '3', '7'}, {3, '2', '3', '8'}, {3, '2', '3', '9'},
    {3, '2', '4', '0'}, {3, '2', '4', '1'}, {3, '2', '4', '2'}, {3, '2', '4', '3'}, {3, '2', '4', '4'}, {3, '2', '4', '5'}, {3, '2', '4', '6'}, {3, '2', '4', '7'}, {3, '2', '4', '8'}, {3, '2', '4', '9'},
    {3, '2', '5', '0'}, {3, '2', '5', '1'}, {3, '2', '5', '2'}, {3, '2', '5', '3'}, {3, '2', '5', '4'}, {3, '2', '5', '5'},
 };
 /// Integer logarithm: returns ceil(log(value, base)), i.e. the smallest integer greater than or equal to log(value, base).
 static constexpr UInt32 intLog(const UInt32 value, const UInt32 base, const bool carry)
 {
@ -45,22 +77,6 @@ static void printInteger(char *& out, T value)
    }
 }
 /// print IPv4 address as %u.%u.%u.%u
 static void formatIPv4(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_count)
 {
    const auto limit = IPV4_BINARY_LENGTH - zeroed_tail_bytes_count;
    for (const auto i : ext::range(0, IPV4_BINARY_LENGTH))
    {
        UInt8 byte = (i < limit) ? src[i] : 0;
        printInteger<10, UInt8>(dst, byte);
        if (i != IPV4_BINARY_LENGTH - 1)
            *dst++ = '.';
    }
 }
 void formatIPv6(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_count)
 {
    struct { int base, len; } best{-1, 0}, cur{-1, 0};
@ -122,8 +138,14 @@ void formatIPv6(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_
        /// Is this address an encapsulated IPv4?
        if (i == 6 && best.base == 0 && (best.len == 6 || (best.len == 5 && words[5] == 0xffffu)))
        {
-            formatIPv4(src + 12, dst, std::min(zeroed_tail_bytes_count, static_cast<UInt8>(IPV4_BINARY_LENGTH)));
+            UInt8 ipv4_buffer[IPV4_BINARY_LENGTH] = {0};
-            break;
+            memcpy(ipv4_buffer, src + 12, IPV4_BINARY_LENGTH);
            // For historical reasons, formatIPv4() takes IPv4 in BE format, but inside IPv6 we store it in LE format.
            std::reverse(std::begin(ipv4_buffer), std::end(ipv4_buffer));
            formatIPv4(ipv4_buffer, dst, std::min(zeroed_tail_bytes_count, static_cast<UInt8>(IPV4_BINARY_LENGTH)), "0");
            // formatIPv4 has already added a null-terminator for us.
            return;
        }
        printInteger<16>(dst, words[i]);
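The comment above `one_byte_to_string_lookup_table` describes a table whose entry for each byte holds its decimal text prefixed by the text length, so digits can be copied without `strlen`. A self-contained sketch of that technique follows. It builds the table at startup instead of listing all 256 rows and takes the four bytes in network (big-endian) order; `makeByteTable` and `formatIPv4Sketch` are hypothetical names, not the ClickHouse `formatIPv4`:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

using ByteTable = std::array<std::array<char, 4>, 256>;

// Entry b holds {length, digit, digit, digit}: the decimal text of b prefixed by its length.
ByteTable makeByteTable()
{
    ByteTable table{};
    for (int b = 0; b < 256; ++b)
    {
        const std::string s = std::to_string(b);
        table[b][0] = static_cast<char>(s.size());
        std::memcpy(&table[b][1], s.data(), s.size());
    }
    return table;
}

// Formats "a.b.c.d" from four bytes given in network (big-endian) order.
std::string formatIPv4Sketch(const uint8_t (&bytes)[4])
{
    static const ByteTable table = makeByteTable();
    char buf[15];  // at most 15 characters, matching IPV4_MAX_TEXT_LENGTH
    char * out = buf;
    for (int i = 0; i < 4; ++i)
    {
        const auto & entry = table[bytes[i]];
        const size_t len = static_cast<size_t>(entry[0]);
        std::memcpy(out, &entry[1], len);  // copy exactly len digits, no strlen needed
        out += len;
        if (i != 3)
            *out++ = '.';
    }
    return std::string(buf, out);
}
```

The length byte turns each octet into a fixed-cost `memcpy`, which is the saving the original table comment refers to.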
--- a/dbms/src/Common/formatIPv6.h
+++ b/dbms/src/Common/formatIPv6.h
@ -1,12 +1,17 @@
 #pragma once
 #include <common/Types.h>
 #include <string.h>
 #include <algorithm>
 #include <utility>
 #include <ext/range.h>
 #include <Common/hex.h>
 #include <Common/StringUtils/StringUtils.h>
-#define IPV4_BINARY_LENGTH 4
+constexpr size_t IPV4_BINARY_LENGTH = 4;
-#define IPV6_BINARY_LENGTH 16
+constexpr size_t IPV6_BINARY_LENGTH = 16;
-#define IPV4_MAX_TEXT_LENGTH 15     /// Does not count tail zero byte.
+constexpr size_t IPV4_MAX_TEXT_LENGTH = 15;     /// Does not count tail zero byte.
-#define IPV6_MAX_TEXT_LENGTH 39
+constexpr size_t IPV6_MAX_TEXT_LENGTH = 39;
 namespace DB
 {
@ -18,4 +23,205 @@ namespace DB
  */
 void formatIPv6(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_count = 0);
 /** Unsafe (no bounds-checking for src nor dst), optimized version of parsing IPv4 string.
 *
 * Parses the input string `src` and stores binary BE value into buffer pointed by `dst`,
 * which should be long enough.
 * That is "127.0.0.1" becomes 0x7f000001.
 *
 * In case of failure returns false and doesn't modify buffer pointed by `dst`.
 *
 * @param src - input string, expected to be non-null and null-terminated right after the IPv4 string value.
 * @param dst - where to put output bytes, expected to be non-null and at least IPV4_BINARY_LENGTH-long.
 * @return false if parsing failed, true otherwise.
 */
 inline bool parseIPv4(const char * src, unsigned char * dst)
 {
    UInt32 result = 0;
    for (int offset = 24; offset >= 0; offset -= 8)
    {
        UInt32 value = 0;
        size_t len = 0;
        while (isNumericASCII(*src) && len <= 3)
        {
            value = value * 10 + (*src - '0');
            ++len;
            ++src;
        }
        if (len == 0 || value > 255 || (offset > 0 && *src != '.'))
            return false;
        result |= value << offset;
        ++src;
    }
    if (*(src - 1) != '\0')
        return false;
    memcpy(dst, &result, sizeof(result));
    return true;
 }
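parseIPv4() reads four dot-separated decimal octets and shifts each one into place, starting at bit 24. The last octet must be followed by the terminating zero byte. Below is a minimal standalone sketch of the same idea. It returns the host-order value (for example 0x7f000001) through a reference instead of writing raw bytes, and it leaves the output untouched on failure. `parse_ipv4_sketch` is a hypothetical name. Unlike the function above, this sketch stops after three digits per octet.

```cpp
#include <cstdint>

// Hypothetical sketch: parses "a.b.c.d" into a host-order value, e.g. "127.0.0.1" -> 0x7f000001.
// Returns false on malformed input and does not modify `out` in that case.
bool parse_ipv4_sketch(const char * src, std::uint32_t & out)
{
    std::uint32_t result = 0;
    for (int shift = 24; shift >= 0; shift -= 8)
    {
        std::uint32_t value = 0;
        int digits = 0;
        while (*src >= '0' && *src <= '9' && digits < 3)
        {
            value = value * 10 + static_cast<std::uint32_t>(*src - '0');
            ++digits;
            ++src;
        }
        if (digits == 0 || value > 255)
            return false;
        // The first three octets must be followed by '.', the last one by the terminating '\0'.
        if (*src != (shift > 0 ? '.' : '\0'))
            return false;
        result |= value << shift;
        ++src;
    }
    out = result;
    return true;
}
```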
 /** Unsafe (no bounds-checking for src nor dst), optimized version of parsing IPv6 string.
 *
 * Slightly altered implementation from http://svn.apache.org/repos/asf/apr/apr/trunk/network_io/unix/inet_pton.c
 * Parses the input string `src` and stores the binary LE value into the buffer pointed to by `dst`,
 * which should be long enough. In case of failure, zeroes
 * IPV6_BINARY_LENGTH bytes of the buffer pointed to by `dst`.
 *
 * @param src - input string, expected to be non-null and null-terminated right after the IPv6 string value.
 * @param dst - where to put output bytes, expected to be non-null and at least IPV6_BINARY_LENGTH bytes long.
 * @return false if parsing failed, true otherwise.
 */
 inline bool parseIPv6(const char * src, unsigned char * dst)
 {
    const auto clear_dst = [dst]()
    {
        memset(dst, '\0', IPV6_BINARY_LENGTH);
        return false;
    };
    /// Leading :: requires some special handling.
    if (*src == ':')
        if (*++src != ':')
            return clear_dst();
    unsigned char tmp[IPV6_BINARY_LENGTH]{};
    auto tp = tmp;
    auto endp = tp + IPV6_BINARY_LENGTH;
    auto curtok = src;
    auto saw_xdigit = false;
    UInt32 val{};
    unsigned char * colonp = nullptr;
    /// Assuming zero-terminated string.
    while (const auto ch = *src++)
    {
        const auto num = unhex(ch);
        if (num != '\xff')
        {
            val <<= 4;
            val |= num;
            if (val > 0xffffu)
                return clear_dst();
            saw_xdigit = 1;
            continue;
        }
        if (ch == ':')
        {
            curtok = src;
            if (!saw_xdigit)
            {
                if (colonp)
                    return clear_dst();
                colonp = tp;
                continue;
            }
            if (tp + sizeof(UInt16) > endp)
                return clear_dst();
            *tp++ = static_cast<unsigned char>((val >> 8) & 0xffu);
            *tp++ = static_cast<unsigned char>(val & 0xffu);
            saw_xdigit = false;
            val = 0;
            continue;
        }
        if (ch == '.' && (tp + IPV4_BINARY_LENGTH) <= endp)
        {
            if (!parseIPv4(curtok, tp))
                return clear_dst();
            std::reverse(tp, tp + IPV4_BINARY_LENGTH);
            tp += IPV4_BINARY_LENGTH;
            saw_xdigit = false;
            break;    /* '\0' was seen by parseIPv4(). */
        }
        return clear_dst();
    }
    if (saw_xdigit)
    {
        if (tp + sizeof(UInt16) > endp)
            return clear_dst();
        *tp++ = static_cast<unsigned char>((val >> 8) & 0xffu);
        *tp++ = static_cast<unsigned char>(val & 0xffu);
    }
    if (colonp)
    {
        /*
         * Since some memmove()'s erroneously fail to handle
         * overlapping regions, we'll do the shift by hand.
         */
        const auto n = tp - colonp;
        for (int i = 1; i <= n; ++i)
        {
            endp[- i] = colonp[n - i];
            colonp[n - i] = 0;
        }
        tp = endp;
    }
    if (tp != endp)
        return clear_dst();
    memcpy(dst, tmp, sizeof(tmp));
    return true;
 }
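The least obvious step in parseIPv6() is the final `if (colonp)` block. It expands "::" by moving the groups written after the double colon to the end of the 16-byte buffer and zero-filling the gap. The copy runs from the last byte backwards because the source and destination ranges may overlap. Below is a standalone sketch of just that step. `expand_double_colon` is a hypothetical name. The sketch assumes the caller has written `written` bytes in total and that "::" was seen after `colon_pos` of them, which is the position `colonp` records in the code above.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the `if (colonp)` block of parseIPv6().
//   buf:       16-byte output buffer
//   colon_pos: number of bytes already written when "::" was seen
//   written:   total number of bytes written after parsing all groups
void expand_double_colon(unsigned char (&buf)[16], std::size_t colon_pos, std::size_t written)
{
    if (colon_pos > written || written >= 16)
        return; // nothing to expand (or inconsistent input): leave the buffer as is
    unsigned char * colonp = buf + colon_pos;
    unsigned char * endp = buf + 16;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(written - colon_pos);
    // Walk backwards so that overlapping source bytes are read before they are overwritten.
    for (std::ptrdiff_t i = 1; i <= n; ++i)
    {
        endp[-i] = colonp[n - i];
        colonp[n - i] = 0;
    }
}
```

For "1::2", the groups 00 01 and 00 02 are written first, with "::" seen after 2 bytes. The expansion moves 00 02 to bytes 14..15 and leaves zeros in between.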
 /** Format a 4-byte binary sequence as IPv4 text: 'aaa.bbb.ccc.ddd'.
  * Expects input to be in BE-format, that is 0x7f000001 => "127.0.0.1".
  *
  * Any number of the tail bytes can be masked with the given mask string.
  *
  * Assumptions:
  *     src is IPV4_BINARY_LENGTH long,
  *     dst is at least IPV4_MAX_TEXT_LENGTH + 1 long (room for the terminating zero byte),
  *     mask_tail_octets <= IPV4_BINARY_LENGTH
  *     mask_string is NON-NULL, if mask_tail_octets > 0.
  *
  * Examples:
  *     formatIPv4(&0x7f000001, dst, mask_tail_octets = 0, nullptr);
  *         > dst == "127.0.0.1"
  *     formatIPv4(&0x7f000001, dst, mask_tail_octets = 1, "xxx");
  *         > dst == "127.0.0.xxx"
  *     formatIPv4(&0x7f000001, dst, mask_tail_octets = 1, "0");
  *         > dst == "127.0.0.0"
  */
 inline void formatIPv4(const unsigned char * src, char *& dst, UInt8 mask_tail_octets = 0, const char * mask_string = "xxx")
 {
    extern const char one_byte_to_string_lookup_table[256][4];
    const size_t mask_length = mask_string ? strlen(mask_string) : 0;
    const size_t limit = std::min(IPV4_BINARY_LENGTH, IPV4_BINARY_LENGTH - mask_tail_octets);
    for (size_t octet = 0; octet < limit; ++octet)
    {
        const UInt8 value = static_cast<UInt8>(src[IPV4_BINARY_LENGTH - octet - 1]);
        auto rep = one_byte_to_string_lookup_table[value];
        const UInt8 len = rep[0];
        const char* str = rep + 1;
        memcpy(dst, str, len);
        dst += len;
        *dst++ = '.';
    }
    for (size_t mask = 0; mask < mask_tail_octets; ++mask)
    {
        memcpy(dst, mask_string, mask_length);
        dst += mask_length;
        *dst++ = '.';
    }
    dst[-1] = '\0';
 }
 }
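formatIPv4() prints the leading octets and replaces the last `mask_tail_octets` octets with `mask_string`. The sketch below reproduces the documented examples. It uses std::string and integer shifts instead of the lookup table and the raw `char *&` output. `format_ipv4_masked` is a hypothetical name, and it takes the address as a 32-bit value such as 0x7f000001 rather than as a byte pointer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: formats `value` (e.g. 0x7f000001) as dotted-quad text,
// replacing the last `mask_tail_octets` octets with `mask_string`.
std::string format_ipv4_masked(std::uint32_t value, unsigned mask_tail_octets = 0,
                               const std::string & mask_string = "xxx")
{
    if (mask_tail_octets > 4)
        mask_tail_octets = 4;
    std::string out;
    for (unsigned octet = 0; octet < 4; ++octet)
    {
        if (octet != 0)
            out += '.';
        if (octet >= 4 - mask_tail_octets)
            out += mask_string; // masked tail octet
        else
            out += std::to_string((value >> (24 - 8 * octet)) & 0xffu); // most significant octet first
    }
    return out;
}
```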
--- a/dbms/src/Common/setThreadName.cpp
+++ b/dbms/src/Common/setThreadName.cpp
@ -1,17 +1,18 @@
 #include <pthread.h>
 #if defined(__APPLE__)
 #include <pthread.h>
 #elif defined(__FreeBSD__)
-#include <pthread.h>
+    #include <pthread_np.h>
 #include <pthread_np.h>
 #else
-#include <sys/prctl.h>
+    #include <sys/prctl.h>
 #endif
-#include <pthread.h>
+
 #include <cstring>
 #include <Common/Exception.h>
 #include <Common/setThreadName.h>
 namespace DB
 {
 namespace ErrorCodes
@ -23,6 +24,11 @@ namespace ErrorCodes
 void setThreadName(const char * name)
 {
 #ifndef NDEBUG
    if (strlen(name) > 15)
        throw DB::Exception("Thread name cannot be longer than 15 bytes", DB::ErrorCodes::PTHREAD_ERROR);
 #endif
 #if defined(__FreeBSD__)
    pthread_set_name_np(pthread_self(), name);
    return;
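The debug check rejects names longer than 15 bytes. On Linux, a thread name is limited to 16 bytes including the terminating zero byte. On Linux, the name of the calling thread can be set with prctl(PR_SET_NAME) and read back with prctl(PR_GET_NAME). The `#else` branch above includes <sys/prctl.h>, presumably for this purpose, but the rest of the function is not shown. Below is a Linux-only sketch with the hypothetical helpers `set_current_thread_name` and `get_current_thread_name`. On other platforms it simply reports failure.

```cpp
#include <cstring>
#include <string>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

// Hypothetical helper: sets the calling thread's name; returns true on success.
bool set_current_thread_name(const char * name)
{
    if (std::strlen(name) > 15) // 15 characters + terminating zero byte = 16 bytes
        return false;
#if defined(__linux__)
    return prctl(PR_SET_NAME, name, 0, 0, 0) == 0;
#else
    return false; // not implemented in this sketch
#endif
}

// Hypothetical helper: returns the calling thread's name, or an empty string if unavailable.
std::string get_current_thread_name()
{
#if defined(__linux__)
    char buf[16] = {}; // PR_GET_NAME needs a buffer of at least 16 bytes
    if (prctl(PR_GET_NAME, buf, 0, 0, 0) == 0)
        return buf;
#endif
    return {};
}
```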
--- a/Show More
+++ b/Show More
@ -1 +1 @@
-Subproject commit 520465510efef7704346cf8d140967c4abb057c1
+Subproject commit 860c90e92eee6690aa74a2ca7b7c5c6930dffecd
@ -1 +0,0 @@
-Subproject commit 08879029ab8dcb80a70142acb709e3df02de5d37
@ -0,0 +1,2 @@
 Source from https://github.com/orlp/pdqsort
 Mandatory for Clickhouse, not available in OS packages, we can't use it as submodule.