Merge branch 'master' of github.com:yandex/ClickHouse

This commit is contained in:
BayoNet 2019-03-25 16:46:01 +03:00
commit 05f0a95e45
183 changed files with 5228 additions and 1668 deletions

3
.gitmodules vendored
View File

@ -76,3 +76,6 @@
[submodule "contrib/brotli"]
path = contrib/brotli
url = https://github.com/google/brotli.git
[submodule "contrib/hyperscan"]
path = contrib/hyperscan
url = https://github.com/ClickHouse-Extras/hyperscan.git

View File

@ -1,3 +1,8 @@
## ClickHouse release 19.4.1.3, 2019-03-19
### Bug Fixes
* Fixed remote queries which contain both `LIMIT BY` and `LIMIT`. Previously, if `LIMIT BY` and `LIMIT` were used for remote query, `LIMIT` could happen before `LIMIT BY`, which led to too filtered result. [#4708](https://github.com/yandex/ClickHouse/pull/4708) ([Constantin S. Pan](https://github.com/kvap))
## ClickHouse release 19.4.0.49, 2019-03-09
### New Features
@ -62,7 +67,7 @@
### Bug fixes
* Fixed error in #3920. This error manifestate itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. This bug first appeared in version 19.1 and is present in versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/yandex/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed error in #3920. This error manifestate itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. This bug first appeared in version 19.1 and is present in versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/yandex/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))
## ClickHouse release 19.3.6, 2019-03-02

View File

@ -1,3 +1,89 @@
## ClickHouse release 19.4.0.49, 2019-03-09
### Новые возможности
* Добавлена полная поддержка формата `Protobuf` (чтение и запись, вложенные структуры данных). [#4174](https://github.com/yandex/ClickHouse/pull/4174) [#4493](https://github.com/yandex/ClickHouse/pull/4493) ([Vitaly Baranov](https://github.com/vitlibar))
* Добавлены функции для работы с битовыми масками с использованием библиотеки Roaring Bitmaps. [#4207](https://github.com/yandex/ClickHouse/pull/4207) ([Andy Yang](https://github.com/andyyzh)) [#4568](https://github.com/yandex/ClickHouse/pull/4568) ([Vitaly Baranov](https://github.com/vitlibar))
* Поддержка формата `Parquet` [#4448](https://github.com/yandex/ClickHouse/pull/4448) ([proller](https://github.com/proller))
* Вычисление расстояния между строками с помощью подсчёта N-грам - для приближённого сравнения строк. Алгоритм похож на q-gram metrics в языке R. [#4466](https://github.com/yandex/ClickHouse/pull/4466) ([Danila Kutenin](https://github.com/danlark1))
* Движок таблиц GraphiteMergeTree поддерживает отдельные шаблоны для правил агрегации и для правил времени хранения. [#4426](https://github.com/yandex/ClickHouse/pull/4426) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
* Добавлены настройки `max_execution_speed` и `max_execution_speed_bytes` для того, чтобы ограничить потребление ресурсов запросами. Добавлена настройка `min_execution_speed_bytes` в дополнение к `min_execution_speed`. [#4430](https://github.com/yandex/ClickHouse/pull/4430) ([Winter Zhang](https://github.com/zhang2014))
* Добавлена функция `flatten` - конвертация многомерных массивов в плоский массив. [#4555](https://github.com/yandex/ClickHouse/pull/4555) [#4409](https://github.com/yandex/ClickHouse/pull/4409) ([alexey-milovidov](https://github.com/alexey-milovidov), [kzon](https://github.com/kzon))
* Добавлены функции `arrayEnumerateDenseRanked` и `arrayEnumerateUniqRanked` (похожа на `arrayEnumerateUniq` но позволяет указать глубину, на которую следует смотреть в многомерные массивы). [#4475](https://github.com/yandex/ClickHouse/pull/4475) ([proller](https://github.com/proller)) [#4601](https://github.com/yandex/ClickHouse/pull/4601) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Добавлена поддержка множества JOIN в одном запросе без подзапросов, с некоторыми ограничениями: без звёздочки и без алиасов сложных выражений в ON/WHERE/GROUP BY/... [#4462](https://github.com/yandex/ClickHouse/pull/4462) ([Artem Zuikov](https://github.com/4ertus2))
### Исправления ошибок
* Этот релиз также содержит все исправления из 19.3 и 19.1.
* Исправлена ошибка во вторичных индексах (экспериментальная возможность): порядок гранул при INSERT был неверным. [#4407](https://github.com/yandex/ClickHouse/pull/4407) ([Nikita Vasilev](https://github.com/nikvas0))
* Исправлена работа вторичного индекса (экспериментальная возможность) типа `set` для столбцов типа `Nullable` и `LowCardinality`. Ранее их использование вызывало ошибку `Data type must be deserialized with multiple streams` при запросе SELECT. [#4594](https://github.com/yandex/ClickHouse/pull/4594) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Правильное запоминание времени последнего обновления при полной перезагрузке словарей типа `executable`. [#4551](https://github.com/yandex/ClickHouse/pull/4551) ([Tema Novikov](https://github.com/temoon))
* Исправлена неработоспособность прогресс-бара, возникшая в версии 19.3 [#4627](https://github.com/yandex/ClickHouse/pull/4627) ([filimonov](https://github.com/filimonov))
* Исправлены неправильные значения MemoryTracker, если кусок памяти был уменьшен в размере, в очень редких случаях. [#4619](https://github.com/yandex/ClickHouse/pull/4619) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлено undefined behaviour в ThreadPool [#4612](https://github.com/yandex/ClickHouse/pull/4612) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлено очень редкое падение с сообщением `mutex lock failed: Invalid argument`, которое могло произойти, если таблица типа MergeTree удалялась одновременно с SELECT. [#4608](https://github.com/yandex/ClickHouse/pull/4608) ([Alex Zatelepin](https://github.com/ztlpn))
* Совместимость ODBC драйвера с типом данных `LowCardinality` [#4381](https://github.com/yandex/ClickHouse/pull/4381) ([proller](https://github.com/proller))
* Исправление ошибки `AIOcontextPool: Found io_event with unknown id 0` под ОС FreeBSD [#4438](https://github.com/yandex/ClickHouse/pull/4438) ([urgordeadbeef](https://github.com/urgordeadbeef))
* Таблица `system.part_log` создавалась независимо от того, была ли она объявлена в конфигурации. [#4483](https://github.com/yandex/ClickHouse/pull/4483) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлено undefined behaviour в функции `dictIsIn` для словарей типа `cache`. [#4515](https://github.com/yandex/ClickHouse/pull/4515) ([alesapin](https://github.com/alesapin))
* Исправлен deadlock в случае, если запрос SELECT блокирует одну и ту же таблицу несколько раз (например - из разных потоков, либо при выполнении разных подзапросов) и одновременно с этим производится DDL запрос. [#4535](https://github.com/yandex/ClickHouse/pull/4535) ([Alex Zatelepin](https://github.com/ztlpn))
* Настройка `compile_expressions` выключена по-умолчанию до тех пор, пока мы не зафиксируем исходники используемой библиотеки `LLVM` и не будем проверять её под `ASan` (сейчас библиотека LLVM берётся из системы). [#4579](https://github.com/yandex/ClickHouse/pull/4579) ([alesapin](https://github.com/alesapin))
* Исправлено падение по `std::terminate`, если `invalidate_query` для внешних словарей с истоником `clickhouse` вернул неправильный результат (пустой; более чем одну строку; более чем один столбец). Исправлена ошибка, из-за которой запрос `invalidate_query` производился каждые пять секунд, независимо от указанного `lifetime`. [#4583](https://github.com/yandex/ClickHouse/pull/4583) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлен deadlock в случае, если запрос `invalidate_query` для внешнего словаря с источником `clickhouse` использовал таблицу `system.dictionaries` или базу данных типа `Dictionary` (редкий случай). [#4599](https://github.com/yandex/ClickHouse/pull/4599) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена работа CROSS JOIN с пустым WHERE [#4598](https://github.com/yandex/ClickHouse/pull/4598) ([Artem Zuikov](https://github.com/4ertus2))
* Исправлен segfault в функции `replicate` с константным аргументом. [#4603](https://github.com/yandex/ClickHouse/pull/4603) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена работа predicate pushdown (настройка `enable_optimize_predicate_expression`) с лямбда-функциями. [#4408](https://github.com/yandex/ClickHouse/pull/4408) ([Winter Zhang](https://github.com/zhang2014))
* Множественные исправления для множества JOIN в одном запросе. [#4595](https://github.com/yandex/ClickHouse/pull/4595) ([Artem Zuikov](https://github.com/4ertus2))
### Улучшения
* Поддержка алиасов в секции JOIN ON для правой таблицы [#4412](https://github.com/yandex/ClickHouse/pull/4412) ([Artem Zuikov](https://github.com/4ertus2))
* Используются правильные алиасы в случае множественных JOIN с подзапросами. [#4474](https://github.com/yandex/ClickHouse/pull/4474) ([Artem Zuikov](https://github.com/4ertus2))
* Исправлена логика работы predicate pushdown (настройка `enable_optimize_predicate_expression`) для JOIN. [#4387](https://github.com/yandex/ClickHouse/pull/4387) ([Ivan](https://github.com/abyss7))
### Улучшения производительности
* Улучшена эвристика оптимизации "перенос в PREWHERE". [#4405](https://github.com/yandex/ClickHouse/pull/4405) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Используются настоящие lookup таблицы вместо хэш-таблиц в случае 8 и 16 битных ключей. Интерфейс хэш-таблиц обобщён, чтобы поддерживать этот случай. [#4536](https://github.com/yandex/ClickHouse/pull/4536) ([Amos Bird](https://github.com/amosbird))
* Улучшена производительность сравнения строк. [#4564](https://github.com/yandex/ClickHouse/pull/4564) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Очередь DDL операций (для запросов ON CLUSTER) очищается в отдельном потоке, чтобы не замедлять основную работу. [#4502](https://github.com/yandex/ClickHouse/pull/4502) ([Alex Zatelepin](https://github.com/ztlpn))
* Даже если настройка `min_bytes_to_use_direct_io` выставлена в 1, не каждый файл открывался в режиме O_DIRECT, потому что размер файлов иногда недооценивался на размер одного сжатого блока. [#4526](https://github.com/yandex/ClickHouse/pull/4526) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Улучшения сборки/тестирования/пакетирования
* Добавлена поддержка компилятора clang-9 [#4604](https://github.com/yandex/ClickHouse/pull/4604) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлены неправильные `__asm__` инструкции [#4621](https://github.com/yandex/ClickHouse/pull/4621) ([Konstantin Podshumok](https://github.com/podshumok))
* Добавлена поддержка задания настроек выполнения запросов для `clickhouse-performance-test` из командной строки. [#4437](https://github.com/yandex/ClickHouse/pull/4437) ([alesapin](https://github.com/alesapin))
* Тесты словарей перенесены в интеграционные тесты. [#4477](https://github.com/yandex/ClickHouse/pull/4477) ([alesapin](https://github.com/alesapin))
* В набор автоматизированных тестов производительности добавлены запросы, находящиеся в разделе "benchmark" на официальном сайте. [#4496](https://github.com/yandex/ClickHouse/pull/4496) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправления сборки в случае использования внешних библиотек lz4 и xxhash. [#4495](https://github.com/yandex/ClickHouse/pull/4495) ([Orivej Desh](https://github.com/orivej))
* Исправлен undefined behaviour, если функция `quantileTiming` была вызвана с отрицательным или нецелым аргументом (обнаружено с помощью fuzz test под undefined behaviour sanitizer). [#4506](https://github.com/yandex/ClickHouse/pull/4506) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлены опечатки в коде. [#4531](https://github.com/yandex/ClickHouse/pull/4531) ([sdk2](https://github.com/sdk2))
* Исправлена сборка под Mac. [#4371](https://github.com/yandex/ClickHouse/pull/4371) ([Vitaly Baranov](https://github.com/vitlibar))
* Исправлена сборка под FreeBSD и для некоторых необычных конфигурациях сборки. [#4444](https://github.com/yandex/ClickHouse/pull/4444) ([proller](https://github.com/proller))
## ClickHouse release 19.3.7, 2019-03-12
### Исправления ошибок
* Исправлена ошибка в #3920. Ошибка проявлялась в виде случайных повреждений кэша (сообщения `Unknown codec family code`, `Cannot seek through file`) и segfault. Ошибка впервые возникла в 19.1 и присутствует во всех версиях до 19.1.10 и 19.3.6. [#4623](https://github.com/yandex/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))
## ClickHouse release 19.3.6, 2019-03-02
### Исправления ошибок
* Если в пуле потоков было более 1000 потоков, то при выходе из потока, вызывается `std::terminate`. [Azat Khuzhin](https://github.com/azat) [#4485](https://github.com/yandex/ClickHouse/pull/4485) [#4505](https://github.com/yandex/ClickHouse/pull/4505) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Теперь возможно создавать таблицы `ReplicatedMergeTree*` с комментариями столбцов без указания DEFAULT, а также с CODEC но без COMMENT и DEFAULT. Исправлено сравнение CODEC друг с другом. [#4523](https://github.com/yandex/ClickHouse/pull/4523) ([alesapin](https://github.com/alesapin))
* Исправлено падение при JOIN по массивам и кортежам. [#4552](https://github.com/yandex/ClickHouse/pull/4552) ([Artem Zuikov](https://github.com/4ertus2))
* Исправлено падение `clickhouse-copier` с сообщением `ThreadStatus not created`. [#4540](https://github.com/yandex/ClickHouse/pull/4540) ([Artem Zuikov](https://github.com/4ertus2))
* Исправлено зависание сервера при завершении работы в случае использования распределённых DDL. [#4472](https://github.com/yandex/ClickHouse/pull/4472) ([Alex Zatelepin](https://github.com/ztlpn))
* В сообщениях об ошибке при парсинге текстовых форматов, выдавались неправильные номера столбцов, в случае, если номер больше 10. [#4484](https://github.com/yandex/ClickHouse/pull/4484) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Улучшения сборки/тестирования/пакетирования
* Исправлена сборка с включенным AVX. [#4527](https://github.com/yandex/ClickHouse/pull/4527) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена поддержка расширенных метрик выполнения запроса в случае, если ClickHouse был собран на системе с новым ядром Linux, а запускается на системе с существенно более старым ядром. [#4541](https://github.com/yandex/ClickHouse/pull/4541) ([nvartolomei](https://github.com/nvartolomei))
* Продолжение работы в случае невозможности применить настройку `core_dump.size_limit` с выводом предупреждения. [#4473](https://github.com/yandex/ClickHouse/pull/4473) ([proller](https://github.com/proller))
* Удалено `inline` для `void readBinary(...)` в `Field.cpp`. [#4530](https://github.com/yandex/ClickHouse/pull/4530) ([hcz](https://github.com/hczhcz))
## ClickHouse release 19.3.5, 2019-02-21
### Исправления ошибок:
@ -74,7 +160,7 @@
* Исправлена ошибка, из-за которой при запросе к таблице `system.tables` могло возникать исключение `table doesn't exist`. [#4313](https://github.com/yandex/ClickHouse/pull/4313) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена ошибка, приводившая к падению `clickhouse-client` в интерактивном режиме, если успеть выйти из него во время загрузки подсказок командной строки. [#4317](https://github.com/yandex/ClickHouse/pull/4317) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена ошибка, приводившая к неверным результатам исполнения мутаций, содержащих оператор `IN`. [#4099](https://github.com/yandex/ClickHouse/pull/4099) ([Alex Zatelepin](https://github.com/ztlpn))
* Исправлена ошибка, из-за которой, если была создана база данных с движком `Dictionary`, все словари загружались при старте сервера, а словари с источником из локального ClickHouse не могли загрузиться. [#4255](https://github.com/yandex/ClickHouse/pull/4255) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена ошибка, из-за которой, если была создана база данных с движком `Dictionary`, все словари загружались при старте сервера, а словари с источником из локального ClickHouse не могли загрузиться. [#4255](https://github.com/yandex/ClickHouse/pull/4255) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлено повторное создание таблиц с системными логами (`system.query_log`, `system.part_log`) при остановке сервера. [#4254](https://github.com/yandex/ClickHouse/pull/4254) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлен вывод типа возвращаемого значения, а также использование блокировок в функции `joinGet`. [#4153](https://github.com/yandex/ClickHouse/pull/4153) ([Amos Bird](https://github.com/amosbird))
* Исправлено падение сервера при использовании настройки `allow_experimental_multiple_joins_emulation`. [52de2c](https://github.com/yandex/ClickHouse/commit/52de2cd927f7b5257dd67e175f0a5560a48840d0) ([Artem Zuikov](https://github.com/4ertus2))
@ -98,7 +184,7 @@
* Добавлен инструмент, собирающий changelog из описаний pull request-ов. [#4169](https://github.com/yandex/ClickHouse/pull/4169) [#4173](https://github.com/yandex/ClickHouse/pull/4173) ([KochetovNicolai](https://github.com/KochetovNicolai)) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Добавлен puppet-модуль для Clickhouse. [#4182](https://github.com/yandex/ClickHouse/pull/4182) ([Maxim Fedotov](https://github.com/MaxFedotov))
* Добавлена документация для нескольких недокументированных функций. [#4168](https://github.com/yandex/ClickHouse/pull/4168) ([Winter Zhang](https://github.com/zhang2014))
* Исправления сборки под ARM. [#4210](https://github.com/yandex/ClickHouse/pull/4210)[#4306](https://github.com/yandex/ClickHouse/pull/4306) [#4291](https://github.com/yandex/ClickHouse/pull/4291) ([proller](https://github.com/proller)) ([proller](https://github.com/proller))
* Исправления сборки под ARM. [#4210](https://github.com/yandex/ClickHouse/pull/4210)[#4306](https://github.com/yandex/ClickHouse/pull/4306) [#4291](https://github.com/yandex/ClickHouse/pull/4291) ([proller](https://github.com/proller)) ([proller](https://github.com/proller))
* Добавлена возможность запускать тесты словарей из `ctest`. [#4189](https://github.com/yandex/ClickHouse/pull/4189) ([proller](https://github.com/proller))
* Теперь директорией с SSL-сертификатами по умолчанию является `/etc/ssl`. [#4167](https://github.com/yandex/ClickHouse/pull/4167) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Добавлена проверка доступности SSE и AVX-инструкций на старте. [#4234](https://github.com/yandex/ClickHouse/pull/4234) ([Igr](https://github.com/igron99))
@ -133,6 +219,18 @@
* Уменьшено время ожидания завершения сервера и завершения запросов `ALTER`. [#4372](https://github.com/yandex/ClickHouse/pull/4372) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Добавлена информация о значении настройки `replicated_can_become_leader` в таблицу `system.replicas`. Добавлено логирование в случае, если реплика не собирается стать лидером. [#4379](https://github.com/yandex/ClickHouse/pull/4379) ([Alex Zatelepin](https://github.com/ztlpn))
## ClickHouse release 19.1.14, 2019-03-14
* Исправлена ошибка `Column ... queried more than once`, которая могла произойти в случае включенной настройки `asterisk_left_columns_only` в случае использования `GLOBAL JOIN` а также `SELECT *` (редкий случай). Эта ошибка изначально отсутствует в версиях 19.3 и более новых. [6bac7d8d](https://github.com/yandex/ClickHouse/pull/4692/commits/6bac7d8d11a9b0d6de0b32b53c47eb2f6f8e7062) ([Artem Zuikov](https://github.com/4ertus2))
## ClickHouse release 19.1.13, 2019-03-12
Этот релиз содержит такие же исправления ошибок, как и 19.3.7.
## ClickHouse release 19.1.10, 2019-03-03
Этот релиз содержит такие же исправления ошибок, как и 19.3.6.
## ClickHouse release 19.1.9, 2019-02-21
### Исправления ошибок:
@ -152,7 +250,7 @@
* Исправлен вывод типа возвращаемого значения, а также использование блокировок в функции `joinGet`. [#4153](https://github.com/yandex/ClickHouse/pull/4153) ([Amos Bird](https://github.com/amosbird))
* Исправлено повторное создание таблиц с системными логами (`system.query_log`, `system.part_log`) при остановке сервера. [#4254](https://github.com/yandex/ClickHouse/pull/4254) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена ошибка, из-за которой, если была создана база данных с движком `Dictionary`, все словари загружались при старте сервера, а словари с источником из локального ClickHouse не могли загрузиться. [#4255](https://github.com/yandex/ClickHouse/pull/4255) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена ошибка, из-за которой, если была создана база данных с движком `Dictionary`, все словари загружались при старте сервера, а словари с источником из локального ClickHouse не могли загрузиться. [#4255](https://github.com/yandex/ClickHouse/pull/4255) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена ошибка, приводившая к неверным результатам исполнения мутаций, содержащих оператор `IN`. [#4099](https://github.com/yandex/ClickHouse/pull/4099) ([Alex Zatelepin](https://github.com/ztlpn))
* Исправлена ошибка, приводившая к падению `clickhouse-client` в интерактивном режиме, если успеть выйти из него во время загрузки подсказок командной строки. [#4317](https://github.com/yandex/ClickHouse/pull/4317) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Исправлена ошибка, из-за которой при запросе к таблице `system.tables` могло возникать исключение `table doesn't exist`. [#4313](https://github.com/yandex/ClickHouse/pull/4313) ([alexey-milovidov](https://github.com/alexey-milovidov))

View File

@ -1,5 +1,6 @@
project (ClickHouse)
cmake_minimum_required (VERSION 3.3)
cmake_policy(SET CMP0023 NEW)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules/")
@ -141,7 +142,7 @@ if(NOT COMPILER_CLANG) # clang: error: the clang compiler does not support '-mar
endif()
if (ARCH_NATIVE)
set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")
set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")
endif ()
# Special options for better optimized code with clang
@ -180,13 +181,19 @@ include (cmake/use_libcxx.cmake)
set (DEFAULT_LIBS "")
if (OS_LINUX AND NOT UNBUNDLED)
# Note: this probably has no effict, but I'm not an expert in CMake.
# Note: this probably has no effect, but I'm not an expert in CMake.
set (CMAKE_C_IMPLICIT_LINK_LIBRARIES "")
set (CMAKE_CXX_IMPLICIT_LINK_LIBRARIES "")
# Disable default linked libraries.
set (DEFAULT_LIBS "-nodefaultlibs")
# We need builtins from Clang's RT even without libcxx - for ubsan+int128. See https://bugs.llvm.org/show_bug.cgi?id=16404
set (BUILTINS_LIB_PATH "")
if (COMPILER_CLANG)
execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIB_PATH OUTPUT_STRIP_TRAILING_WHITESPACE)
endif ()
# Add C++ libraries.
#
# This consist of:
@ -197,14 +204,9 @@ if (OS_LINUX AND NOT UNBUNDLED)
#
# There are two variants of C++ library: libc++ (from LLVM compiler infrastructure) and libstdc++ (from GCC).
if (USE_LIBCXX)
set (BUILTINS_LIB_PATH "")
if (COMPILER_CLANG)
execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIB_PATH OUTPUT_STRIP_TRAILING_WHITESPACE)
endif ()
set (DEFAULT_LIBS "${DEFAULT_LIBS} -Wl,-Bstatic -lc++ -lc++abi -lgcc_eh ${BUILTINS_LIB_PATH} -Wl,-Bdynamic")
else ()
set (DEFAULT_LIBS "${DEFAULT_LIBS} -Wl,-Bstatic -lstdc++ -lgcc_eh -lgcc -Wl,-Bdynamic")
set (DEFAULT_LIBS "${DEFAULT_LIBS} -Wl,-Bstatic -lstdc++ -lgcc_eh -lgcc ${BUILTINS_LIB_PATH} -Wl,-Bdynamic")
endif ()
# Linking with GLIBC prevents portability of binaries to older systems.
@ -216,6 +218,7 @@ if (OS_LINUX AND NOT UNBUNDLED)
string (TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC)
set (CMAKE_POSTFIX_VARIABLE "CMAKE_${CMAKE_BUILD_TYPE_UC}_POSTFIX")
# FIXME: glibc-compatibility may be non-static in some builds!
set (DEFAULT_LIBS "${DEFAULT_LIBS} libs/libglibc-compatibility/libglibc-compatibility${${CMAKE_POSTFIX_VARIABLE}}.a")
endif ()
@ -227,6 +230,11 @@ if (OS_LINUX AND NOT UNBUNDLED)
message(STATUS "Default libraries: ${DEFAULT_LIBS}")
endif ()
if (DEFAULT_LIBS)
# Add default libs to all targets as the last dependency.
set(CMAKE_CXX_STANDARD_LIBRARIES ${DEFAULT_LIBS})
endif ()
if (NOT MAKE_STATIC_LIBRARIES)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
@ -310,6 +318,7 @@ include (cmake/find_pdqsort.cmake)
include (cmake/find_hdfs3.cmake) # uses protobuf
include (cmake/find_consistent-hashing.cmake)
include (cmake/find_base64.cmake)
include (cmake/find_hyperscan.cmake)
find_contrib_lib(cityhash)
find_contrib_lib(farmhash)
find_contrib_lib(metrohash)
@ -336,35 +345,29 @@ add_subdirectory (dbms)
include (cmake/print_include_directories.cmake)
if (DEFAULT_LIBS)
# Add default libs to all targets as the last dependency.
# I have found no better way to specify default libs in CMake that will appear single time in specific order at the end of linker arguments.
function(add_default_libs target_name)
if (GLIBC_COMPATIBILITY)
# FIXME: actually glibc-compatibility should always be built first,
# because it's unconditionally linked via $DEFAULT_LIBS,
# and these looks like the first places that get linked.
function (add_glibc_compat target_name)
if (TARGET ${target_name})
# message(STATUS "Has target ${target_name}")
set_property(TARGET ${target_name} APPEND PROPERTY LINK_LIBRARIES "${DEFAULT_LIBS}")
set_property(TARGET ${target_name} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${DEFAULT_LIBS}")
if (GLIBC_COMPATIBILITY)
add_dependencies(${target_name} glibc-compatibility)
endif ()
add_dependencies(${target_name} glibc-compatibility)
endif ()
endfunction ()
add_default_libs(ltdl)
add_default_libs(zlibstatic)
add_default_libs(jemalloc)
add_default_libs(unwind)
add_default_libs(memcpy)
add_default_libs(Foundation)
add_default_libs(common)
add_default_libs(gtest)
add_default_libs(lz4)
add_default_libs(zstd)
add_default_libs(snappy)
add_default_libs(arrow)
add_default_libs(protoc)
add_default_libs(thrift_static)
add_default_libs(boost_regex_internal)
add_glibc_compat(ltdl)
add_glibc_compat(zlibstatic)
add_glibc_compat(jemalloc)
add_glibc_compat(unwind)
add_glibc_compat(memcpy)
add_glibc_compat(Foundation)
add_glibc_compat(common)
add_glibc_compat(gtest)
add_glibc_compat(lz4)
add_glibc_compat(zstd)
add_glibc_compat(snappy)
add_glibc_compat(arrow)
add_glibc_compat(protoc)
add_glibc_compat(thrift_static)
add_glibc_compat(boost_regex_internal)
endif ()

View File

@ -21,7 +21,7 @@ BUILD_TARGETS=clickhouse
BUILD_TYPE=Debug
ENABLE_EMBEDDED_COMPILER=0
CMAKE_FLAGS="-D CMAKE_C_FLAGS_ADD=-g0 -D CMAKE_CXX_FLAGS_ADD=-g0 -D ENABLE_JEMALLOC=0 -D ENABLE_CAPNP=0 -D ENABLE_RDKAFKA=0 -D ENABLE_UNWIND=0 -D ENABLE_ICU=0 -D ENABLE_POCO_MONGODB=0 -D ENABLE_POCO_NETSSL=0 -D ENABLE_POCO_ODBC=0 -D ENABLE_ODBC=0 -D ENABLE_MYSQL=0"
CMAKE_FLAGS="-D CMAKE_C_FLAGS_ADD=-g0 -D CMAKE_CXX_FLAGS_ADD=-g0 -D ENABLE_JEMALLOC=0 -D ENABLE_CAPNP=0 -D ENABLE_RDKAFKA=0 -D ENABLE_UNWIND=0 -D ENABLE_ICU=0 -D ENABLE_POCO_MONGODB=0 -D ENABLE_POCO_NETSSL=0 -D ENABLE_POCO_ODBC=0 -D ENABLE_ODBC=0 -D ENABLE_MYSQL=0 -D ENABLE_SSL=0 -D ENABLE_POCO_NETSSL=0"
[[ $(uname) == "FreeBSD" ]] && COMPILER_PACKAGE_VERSION=devel && export COMPILER_PATH=/usr/local/bin

View File

@ -0,0 +1,7 @@
if (HAVE_SSSE3)
set (HYPERSCAN_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/hyperscan/src)
set (HYPERSCAN_LIBRARY hs)
set (USE_HYPERSCAN 1)
set (USE_INTERNAL_HYPERSCAN_LIBRARY 1)
message (STATUS "Using hyperscan: ${HYPERSCAN_INCLUDE_DIR} " : ${HYPERSCAN_LIBRARY})
endif()

View File

@ -1,5 +1,5 @@
# Freebsd: contrib/cppkafka/include/cppkafka/detail/endianness.h:53:23: error: 'betoh16' was not declared in this scope
if (NOT ARCH_ARM AND NOT ARCH_32 AND NOT APPLE AND NOT OS_FREEBSD)
if (NOT ARCH_ARM AND NOT ARCH_32 AND NOT APPLE AND NOT OS_FREEBSD AND OPENSSL_FOUND)
option (ENABLE_RDKAFKA "Enable kafka" ON)
endif ()

View File

@ -1,7 +1,19 @@
option (ENABLE_SSL "Enable ssl" ON)
if (ENABLE_SSL)
if(NOT ARCH_32)
option(USE_INTERNAL_SSL_LIBRARY "Set to FALSE to use system *ssl library instead of bundled" ${NOT_UNBUNDLED})
endif()
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/ssl/CMakeLists.txt")
if(USE_INTERNAL_SSL_LIBRARY)
message(WARNING "submodule contrib/ssl is missing. to fix try run: \n git submodule update --init --recursive")
endif()
set(USE_INTERNAL_SSL_LIBRARY 0)
set(MISSING_INTERNAL_SSL_LIBRARY 1)
endif()
set (OPENSSL_USE_STATIC_LIBS ${USE_STATIC_LIBRARIES})
if (NOT USE_INTERNAL_SSL_LIBRARY)
@ -28,7 +40,7 @@ if (NOT USE_INTERNAL_SSL_LIBRARY)
endif ()
endif ()
if (NOT OPENSSL_FOUND)
if (NOT OPENSSL_FOUND AND NOT MISSING_INTERNAL_SSL_LIBRARY)
set (USE_INTERNAL_SSL_LIBRARY 1)
set (OPENSSL_ROOT_DIR "${ClickHouse_SOURCE_DIR}/contrib/ssl")
set (OPENSSL_INCLUDE_DIR "${OPENSSL_ROOT_DIR}/include")
@ -43,4 +55,11 @@ if (NOT OPENSSL_FOUND)
set (OPENSSL_FOUND 1)
endif ()
message (STATUS "Using ssl=${OPENSSL_FOUND}: ${OPENSSL_INCLUDE_DIR} : ${OPENSSL_LIBRARIES}")
if(OPENSSL_FOUND)
# we need keep OPENSSL_FOUND for many libs in contrib
set(USE_SSL 1)
endif()
endif ()
message (STATUS "Using ssl=${USE_SSL}: ${OPENSSL_INCLUDE_DIR} : ${OPENSSL_LIBRARIES}")

View File

@ -4,7 +4,7 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-stringop-overflow -Wno-implicit-function-declaration -Wno-return-type -Wno-array-bounds -Wno-bool-compare -Wno-int-conversion -Wno-switch")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-sign-compare -std=c++1z")
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality -Wno-tautological-constant-compare -Wno-tautological-constant-out-of-range-compare -Wno-implicit-function-declaration -Wno-return-type -Wno-pointer-bool-conversion -Wno-enum-conversion -Wno-int-conversion -Wno-switch")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality -Wno-tautological-constant-compare -Wno-tautological-constant-out-of-range-compare -Wno-implicit-function-declaration -Wno-return-type -Wno-pointer-bool-conversion -Wno-enum-conversion -Wno-int-conversion -Wno-switch -Wno-string-plus-int")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-format -Wno-inconsistent-missing-override -std=c++1z")
endif ()
@ -125,13 +125,17 @@ endif ()
if (ENABLE_MYSQL AND USE_INTERNAL_MYSQL_LIBRARY)
add_subdirectory (mariadb-connector-c-cmake)
target_include_directories(mysqlclient BEFORE PRIVATE ${ZLIB_INCLUDE_DIR})
target_include_directories(mysqlclient BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
if(OPENSSL_INCLUDE_DIR)
target_include_directories(mysqlclient BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
endif()
endif ()
if (USE_INTERNAL_RDKAFKA_LIBRARY)
add_subdirectory (librdkafka-cmake)
target_include_directories(rdkafka BEFORE PRIVATE ${ZLIB_INCLUDE_DIR})
target_include_directories(rdkafka BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
if(OPENSSL_INCLUDE_DIR)
target_include_directories(rdkafka BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
endif()
endif ()
if (USE_RDKAFKA)
@ -300,3 +304,7 @@ endif ()
if (USE_BASE64)
add_subdirectory (base64-cmake)
endif()
if (USE_HYPERSCAN)
add_subdirectory (hyperscan)
endif()

2
contrib/boost vendored

@ -1 +1 @@
Subproject commit 6a96e8b59f76148eb8ad54a9d15259f8ce84c606
Subproject commit 32abf16beb7bb8b243a4d100ccdd6acb271738c4

1
contrib/hyperscan vendored Submodule

@ -0,0 +1 @@
Subproject commit 05dab0efee80be405aad5f74721b692b6889b75e

View File

@ -208,7 +208,8 @@ target_link_libraries(hdfs3 ${LIBXML2_LIBRARY})
# inherit from parent cmake
target_include_directories(hdfs3 PRIVATE ${Boost_INCLUDE_DIRS})
target_include_directories(hdfs3 PRIVATE ${Protobuf_INCLUDE_DIR})
target_include_directories(hdfs3 PRIVATE ${OPENSSL_INCLUDE_DIR})
target_link_libraries(hdfs3 ${Protobuf_LIBRARY})
target_link_libraries(hdfs3 ${OPENSSL_LIBRARIES})
if(OPENSSL_INCLUDE_DIR AND OPENSSL_LIBRARIES)
target_include_directories(hdfs3 PRIVATE ${OPENSSL_INCLUDE_DIR})
target_link_libraries(hdfs3 ${OPENSSL_LIBRARIES})
endif()

View File

@ -58,4 +58,7 @@ add_library(rdkafka ${LINK_MODE} ${SRCS})
target_include_directories(rdkafka SYSTEM PUBLIC include)
target_include_directories(rdkafka SYSTEM PUBLIC ${RDKAFKA_SOURCE_DIR}) # Because weird logic with "include_next" is used.
target_include_directories(rdkafka SYSTEM PRIVATE ${ZSTD_INCLUDE_DIR}/common) # Because wrong path to "zstd_errors.h" is used.
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${ZSTD_LIBRARY} ${LZ4_LIBRARY} ${OPENSSL_SSL_LIBRARY} ${OPENSSL_CRYPTO_LIBRARY})
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${ZSTD_LIBRARY} ${LZ4_LIBRARY})
if(OPENSSL_SSL_LIBRARY AND OPENSSL_CRYPTO_LIBRARY)
target_link_libraries(rdkafka PUBLIC ${OPENSSL_SSL_LIBRARY} ${OPENSSL_CRYPTO_LIBRARY})
endif()

View File

@ -33,7 +33,6 @@ ${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/ma_time.c
${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/ma_tls.c
#${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/gnutls.c
#${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/ma_schannel.c
${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/openssl.c
#${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/schannel.c
#${MARIADB_CLIENT_SOURCE_DIR}/plugins/auth/auth_gssapi_client.c
#${MARIADB_CLIENT_SOURCE_DIR}/plugins/auth/dialog.c
@ -55,12 +54,19 @@ ${MARIADB_CLIENT_SOURCE_DIR}/plugins/pvio/pvio_socket.c
${CMAKE_CURRENT_SOURCE_DIR}/linux_x86_64/libmariadb/ma_client_plugin.c
)
if(OPENSSL_LIBRARIES)
list(APPEND SRCS ${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/openssl.c)
endif()
add_library(mysqlclient STATIC ${SRCS})
target_link_libraries(mysqlclient ${OPENSSL_LIBRARIES})
if(OPENSSL_LIBRARIES)
target_link_libraries(mysqlclient ${OPENSSL_LIBRARIES})
target_compile_definitions(mysqlclient PRIVATE -D HAVE_OPENSSL -D HAVE_TLS)
endif()
target_include_directories(mysqlclient PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/linux_x86_64/include)
target_include_directories(mysqlclient PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/common/include)
target_include_directories(mysqlclient PUBLIC ${MARIADB_CLIENT_SOURCE_DIR}/include)
target_compile_definitions(mysqlclient PRIVATE -D THREAD -D HAVE_OPENSSL -D HAVE_TLS)
target_compile_definitions(mysqlclient PRIVATE -D THREAD)

View File

@ -57,7 +57,7 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
endif ()
if (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 8)
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra-semi-stmt -Wshadow-field -Wstring-plus-int")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra-semi-stmt -Wshadow-field -Wstring-plus-int -Wempty-init-stmt")
endif ()
if (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 9)
@ -309,7 +309,10 @@ if (USE_PARQUET)
endif ()
endif ()
target_link_libraries(dbms PRIVATE ${OPENSSL_CRYPTO_LIBRARY} Threads::Threads)
if(OPENSSL_CRYPTO_LIBRARY)
target_link_libraries(dbms PRIVATE ${OPENSSL_CRYPTO_LIBRARY})
endif ()
target_link_libraries(dbms PRIVATE Threads::Threads)
target_include_directories (dbms SYSTEM BEFORE PRIVATE ${DIVIDE_INCLUDE_DIR})
target_include_directories (dbms SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})

View File

@ -154,7 +154,7 @@ else ()
clickhouse_target_link_split_lib(clickhouse obfuscator)
endif ()
if (USE_EMBEDDED_COMPILER)
clickhouse_target_link_split_lib(clickhouse compiler)
target_link_libraries(clickhouse PRIVATE clickhouse-compiler-lib)
endif ()
set (CLICKHOUSE_BUNDLE)

View File

@ -101,6 +101,7 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
extern const int CANNOT_SET_SIGNAL_HANDLER;
extern const int CANNOT_READLINE;
extern const int SYSTEM_ERROR;
}
@ -295,7 +296,6 @@ private:
/// The value of the option is used as the text of query (or of multiple queries).
/// If stdin is not a terminal, INSERT data for the first query is read from it.
/// - stdin is not a terminal. In this case queries are read from it.
stdin_is_not_tty = !isatty(STDIN_FILENO);
if (stdin_is_not_tty || config().has("query"))
is_interactive = false;
@ -610,9 +610,6 @@ private:
try
{
/// Determine the terminal size.
ioctl(0, TIOCGWINSZ, &terminal_size);
if (!process(input))
break;
}
@ -1568,7 +1565,7 @@ public:
}
}
ioctl(0, TIOCGWINSZ, &terminal_size);
stdin_is_not_tty = !isatty(STDIN_FILENO);
namespace po = boost::program_options;
@ -1576,7 +1573,11 @@ public:
unsigned min_description_length = line_length / 2;
if (!stdin_is_not_tty)
{
line_length = std::max(3U, static_cast<unsigned>(terminal_size.ws_col));
if (ioctl(STDIN_FILENO, TIOCGWINSZ, &terminal_size))
throwFromErrno("Cannot obtain terminal window size (ioctl TIOCGWINSZ)", ErrorCodes::SYSTEM_ERROR);
line_length = std::max(
static_cast<unsigned>(strlen("--http_native_compression_disable_checksumming_on_decompress ")),
static_cast<unsigned>(terminal_size.ws_col));
min_description_length = std::min(min_description_length, line_length - 2);
}

View File

@ -1,6 +1,6 @@
#pragma once
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Poco/Util/Application.h>
#include <memory>

View File

@ -2,7 +2,7 @@
#include <string>
#include <vector>
#include <map>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Poco/Util/XMLConfiguration.h>
#include <Poco/AutoPtr.h>

View File

@ -25,7 +25,7 @@
#include <Interpreters/Context.h>
#include <IO/ConnectionTimeouts.h>
#include <IO/UseSSL.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Common/Exception.h>
#include <Common/InterruptListener.h>

View File

@ -4,7 +4,7 @@
#include "TestStopConditions.h"
#include <Common/InterruptListener.h>
#include <Interpreters/Context.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Client/Connection.h>
namespace DB

View File

@ -0,0 +1,21 @@
<?xml version="1.0"?>
<yandex>
<profiles>
<!-- Profile that allows only read queries. -->
<readonly>
<readonly>1</readonly>
</readonly>
</profiles>
<users>
<readonly>
<password></password>
<networks incl="networks" replace="replace">
<ip>::1</ip>
<ip>127.0.0.1</ip>
</networks>
<profile>readonly</profile>
<quota>default</quota>
</readonly>
</users>
</yandex>

View File

@ -77,7 +77,7 @@
</default>
<!-- Example of user with readonly access. -->
<readonly>
<!-- <readonly>
<password></password>
<networks incl="networks" replace="replace">
<ip>::1</ip>
@ -85,7 +85,7 @@
</networks>
<profile>readonly</profile>
<quota>default</quota>
</readonly>
</readonly> -->
</users>
<!-- Quotas. -->

View File

@ -199,8 +199,13 @@ public:
for (auto & rhs_elem : rhs_set)
{
cur_set.emplace(rhs_elem.getValue(), it, inserted);
if (inserted && it->getValue().size)
it->getValueMutable().data = arena->insert(it->getValue().data, it->getValue().size);
if (inserted)
{
if (it->getValue().size)
it->getValueMutable().data = arena->insert(it->getValue().data, it->getValue().size);
else
it->getValueMutable().data = nullptr;
}
}
}

View File

@ -268,7 +268,7 @@ public:
void merge(const AggregateFunctionHistogramData & other, UInt32 max_bins)
{
lower_bound = std::min(lower_bound, other.lower_bound);
upper_bound = std::max(lower_bound, other.upper_bound);
upper_bound = std::max(upper_bound, other.upper_bound);
for (size_t i = 0; i < other.size; i++)
add(other.points[i].mean, other.points[i].weight, max_bins);
}

View File

@ -18,7 +18,7 @@
#include <IO/ConnectionTimeouts.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/TablesStatus.h>
#include <Compression/ICompressionCodec.h>

View File

@ -6,7 +6,7 @@
#include <Common/getFQDNOrHostName.h>
#include <Common/isLocalAddress.h>
#include <Common/ProfileEvents.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
namespace ProfileEvents

View File

@ -169,8 +169,6 @@ RWLockImpl::LockHolderImpl::~LockHolderImpl()
if (!parent_queue.empty())
parent_queue.front().cv.notify_all();
}
parent.reset();
}

View File

@ -437,10 +437,10 @@ public:
}
template <typename ResultType, typename AnsCallback>
void searchAll(
void searchAllPositions(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const AnsCallback & ansCallback,
const AnsCallback & ans_callback,
ResultType & ans)
{
const size_t haystack_string_size = haystack_offsets.size();
@ -461,7 +461,7 @@ public:
{
const UInt8 * ptr = fallback_searchers[fallback_needles[i]].search(haystack, haystack_end);
if (ptr != haystack_end)
ans[from + fallback_needles[i]] = ansCallback(haystack, ptr);
ans[from + fallback_needles[i]] = ans_callback(haystack, ptr);
}
/// check if we have one non empty volnitsky searcher
@ -481,7 +481,7 @@ public:
{
if (fallback_searchers[ind].compare(res))
{
ans[from + ind] = ansCallback(haystack, res);
ans[from + ind] = ans_callback(haystack, res);
}
}
}
@ -513,6 +513,16 @@ public:
searchInternal(haystack_data, haystack_offsets, callback, ans);
}
template <typename ResultType, typename CountCharsCallback>
void searchFirstPosition(const ColumnString::Chars & haystack_data, const ColumnString::Offsets & haystack_offsets, const CountCharsCallback & count_chars_callback, ResultType & ans)
{
auto callback = [this, &count_chars_callback](const UInt8 * haystack, const UInt8 * haystack_end) -> size_t
{
return this->searchOneFirstPosition(haystack, haystack_end, count_chars_callback);
};
searchInternal(haystack_data, haystack_offsets, callback, ans);
}
private:
/**
* This function is needed to initialize hash table
@ -582,7 +592,7 @@ private:
inline void searchInternal(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const OneSearcher & searchFallback,
const OneSearcher & search_fallback,
ResultType & ans)
{
const size_t haystack_string_size = haystack_offsets.size();
@ -593,7 +603,7 @@ private:
{
const auto * haystack = &haystack_data[prev_offset];
const auto * haystack_end = haystack + haystack_offsets[j] - prev_offset - 1;
ans[j] = searchFallback(haystack, haystack_end);
ans[j] = search_fallback(haystack, haystack_end);
prev_offset = haystack_offsets[j];
}
}
@ -665,6 +675,41 @@ private:
return ans + 1;
}
template <typename CountCharsCallback>
inline size_t searchOneFirstPosition(const UInt8 * haystack, const UInt8 * haystack_end, const CountCharsCallback & callback) const
{
const size_t fallback_size = fallback_needles.size();
size_t ans = std::numeric_limits<size_t>::max();
for (size_t i = 0; i < fallback_size; ++i)
if (auto pos = fallback_searchers[fallback_needles[i]].search(haystack, haystack_end); pos != haystack_end)
ans = std::min(ans, callback(haystack, pos));
/// check if we have one non empty volnitsky searcher
if (step != std::numeric_limits<size_t>::max())
{
const auto * pos = haystack + step - sizeof(VolnitskyTraits::Ngram);
for (; pos <= haystack_end - sizeof(VolnitskyTraits::Ngram); pos += step)
{
for (size_t cell_num = VolnitskyTraits::toNGram(pos) % VolnitskyTraits::hash_size; hash[cell_num].off;
cell_num = (cell_num + 1) % VolnitskyTraits::hash_size)
{
if (pos >= haystack + hash[cell_num].off - 1)
{
const auto res = pos - (hash[cell_num].off - 1);
const size_t ind = hash[cell_num].id;
if (res + needles[ind].size <= haystack_end && fallback_searchers[ind].compare(res))
ans = std::min(ans, callback(haystack, res));
}
}
}
}
if (ans == std::numeric_limits<size_t>::max())
return 0;
return ans;
}
void putNGramBase(const VolnitskyTraits::Ngram ngram, const int offset, const size_t num)
{
size_t cell_num = ngram % VolnitskyTraits::hash_size;

View File

@ -23,6 +23,7 @@
#cmakedefine01 USE_CPUID
#cmakedefine01 USE_CPUINFO
#cmakedefine01 USE_BROTLI
#cmakedefine01 USE_SSL
#cmakedefine01 CLICKHOUSE_SPLIT_BINARY
#cmakedefine01 LLVM_HAS_RTTI

View File

@ -1,5 +1,8 @@
add_executable (hashes_test hashes_test.cpp)
target_link_libraries (hashes_test PRIVATE clickhouse_common_io ${OPENSSL_CRYPTO_LIBRARY} ${CITYHASH_LIBRARIES})
target_link_libraries (hashes_test PRIVATE clickhouse_common_io ${CITYHASH_LIBRARIES})
if(OPENSSL_CRYPTO_LIBRARY)
target_link_libraries (hashes_test PRIVATE ${OPENSSL_CRYPTO_LIBRARY})
endif()
add_executable (sip_hash sip_hash.cpp)
target_link_libraries (sip_hash PRIVATE clickhouse_common_io)

View File

@ -1,14 +1,14 @@
#include <iostream>
#include <iomanip>
#include <city.h>
#include <openssl/md5.h>
#include <Common/Stopwatch.h>
#include <Common/SipHash.h>
#include <IO/ReadBufferFromFileDescriptor.h>
#include <IO/ReadHelpers.h>
#include <Common/config.h>
#if USE_SSL
# include <openssl/md5.h>
#endif
int main(int, char **)
@ -108,6 +108,7 @@ int main(int, char **)
<< std::endl;
}
#if USE_SSL
{
Stopwatch watch;
@ -129,6 +130,7 @@ int main(int, char **)
<< " (" << rows / watch.elapsedSeconds() << " rows/sec., " << bytes / 1000000.0 / watch.elapsedSeconds() << " MB/sec.)"
<< std::endl;
}
#endif
return 0;
}

View File

@ -1,8 +1,9 @@
#include "Settings.h"
#include <Poco/Util/AbstractConfiguration.h>
#include <Core/Field.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/Settings.h>
#include <Columns/ColumnArray.h>
#include <Common/typeid_cast.h>
#include <string.h>

View File

@ -1,7 +1,7 @@
#pragma once
#include "SettingsCommon.h"
#include <Core/Defines.h>
#include <Interpreters/SettingsCommon.h>
namespace Poco
@ -127,7 +127,6 @@ struct Settings
M(SettingUInt64, max_concurrent_queries_for_user, 0, "The maximum number of concurrent requests per user.") \
\
M(SettingBool, insert_deduplicate, true, "For INSERT queries in the replicated table, specifies that deduplication of insertings blocks should be preformed") \
M(SettingBool, insert_sample_with_metadata, false, "For INSERT queries, specifies that the server need to send metadata about column defaults to the client. This will be used to calculate default expressions.") \
\
M(SettingUInt64, insert_quorum, 0, "For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled.") \
M(SettingMilliseconds, insert_quorum_timeout, 600000, "") \
@ -153,6 +152,7 @@ struct Settings
\
M(SettingBool, input_format_skip_unknown_fields, false, "Skip columns with unknown names from input data (it works for JSONEachRow and TSKV formats).") \
M(SettingBool, input_format_import_nested_json, false, "Map nested JSON data to nested tables (it works for JSONEachRow format).") \
M(SettingBool, input_format_defaults_for_omitted_fields, false, "For input data calculate default expressions for omitted fields (it works for JSONEachRow format).") \
\
M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.") \
\

View File

@ -1,12 +1,11 @@
#include <Core/Field.h>
#include "SettingsCommon.h"
#include <Core/Field.h>
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <Common/FieldVisitors.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/SettingsCommon.h>
namespace DB

View File

@ -5,7 +5,7 @@
#include <DataStreams/BlockStreamProfileInfo.h>
#include <DataStreams/SizeLimits.h>
#include <IO/Progress.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
#include <Storages/TableStructureLockHolder.h>
#include <atomic>

View File

@ -16,7 +16,7 @@
#include <Parsers/parseQuery.h>
#include <Parsers/ParserCreateQuery.h>
#include <Interpreters/Context.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/InterpreterCreateQuery.h>
#include <IO/WriteBufferFromFile.h>
#include <IO/ReadBufferFromFile.h>

View File

@ -1,6 +1,6 @@
#include <Common/Exception.h>
#include <Interpreters/Context.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <DataStreams/MaterializingBlockOutputStream.h>
#include <Formats/FormatSettings.h>
#include <Formats/FormatFactory.h>

View File

@ -26,11 +26,11 @@
# include <IO/WriteHelpers.h>
# include <IO/copyData.h>
# include <Interpreters/castColumn.h>
# include <common/DateLUTImpl.h>
# include <ext/range.h>
# include <arrow/api.h>
# include <parquet/arrow/reader.h>
# include <parquet/file_reader.h>
# include <common/DateLUTImpl.h>
# include <ext/range.h>
namespace DB
{
@ -223,7 +223,8 @@ void fillColumnWithDecimalData(std::shared_ptr<arrow::Column> & arrow_column, Mu
auto & chunk = static_cast<arrow::DecimalArray &>(*(arrow_column->data()->chunk(chunk_i)));
for (size_t value_i = 0, length = static_cast<size_t>(chunk.length()); value_i < length; ++value_i)
{
column_data.emplace_back(chunk.IsNull(value_i) ? Decimal128(0) : *reinterpret_cast<const Decimal128 *>(chunk.Value(value_i))); // TODO: copy column
column_data.emplace_back(
chunk.IsNull(value_i) ? Decimal128(0) : *reinterpret_cast<const Decimal128 *>(chunk.Value(value_i))); // TODO: copy column
}
}
}
@ -259,45 +260,46 @@ void fillByteMapFromArrowColumn(std::shared_ptr<arrow::Column> & arrow_column, M
using NameToColumnPtr = std::unordered_map<std::string, std::shared_ptr<arrow::Column>>;
const std::unordered_map<arrow::Type::type, std::shared_ptr<IDataType>> arrow_type_to_internal_type = {
//{arrow::Type::DECIMAL, std::make_shared<DataTypeDecimal>()},
{arrow::Type::UINT8, std::make_shared<DataTypeUInt8>()},
{arrow::Type::INT8, std::make_shared<DataTypeInt8>()},
{arrow::Type::UINT16, std::make_shared<DataTypeUInt16>()},
{arrow::Type::INT16, std::make_shared<DataTypeInt16>()},
{arrow::Type::UINT32, std::make_shared<DataTypeUInt32>()},
{arrow::Type::INT32, std::make_shared<DataTypeInt32>()},
{arrow::Type::UINT64, std::make_shared<DataTypeUInt64>()},
{arrow::Type::INT64, std::make_shared<DataTypeInt64>()},
{arrow::Type::HALF_FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::DOUBLE, std::make_shared<DataTypeFloat64>()},
{arrow::Type::BOOL, std::make_shared<DataTypeUInt8>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDateTime>()},
{arrow::Type::DATE64, std::make_shared<DataTypeDateTime>()},
{arrow::Type::TIMESTAMP, std::make_shared<DataTypeDateTime>()},
//{arrow::Type::TIME32, std::make_shared<DataTypeDateTime>()},
{arrow::Type::STRING, std::make_shared<DataTypeString>()},
{arrow::Type::BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::FIXED_SIZE_BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::UUID, std::make_shared<DataTypeString>()},
// TODO: add other types that are convertable to internal ones:
// 0. ENUM?
// 1. UUID -> String
// 2. JSON -> String
// Full list of types: contrib/arrow/cpp/src/arrow/type.h
};
Block ParquetBlockInputStream::readImpl()
{
static const std::unordered_map<arrow::Type::type, std::shared_ptr<IDataType>> arrow_type_to_internal_type = {
//{arrow::Type::DECIMAL, std::make_shared<DataTypeDecimal>()},
{arrow::Type::UINT8, std::make_shared<DataTypeUInt8>()},
{arrow::Type::INT8, std::make_shared<DataTypeInt8>()},
{arrow::Type::UINT16, std::make_shared<DataTypeUInt16>()},
{arrow::Type::INT16, std::make_shared<DataTypeInt16>()},
{arrow::Type::UINT32, std::make_shared<DataTypeUInt32>()},
{arrow::Type::INT32, std::make_shared<DataTypeInt32>()},
{arrow::Type::UINT64, std::make_shared<DataTypeUInt64>()},
{arrow::Type::INT64, std::make_shared<DataTypeInt64>()},
{arrow::Type::HALF_FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::DOUBLE, std::make_shared<DataTypeFloat64>()},
{arrow::Type::BOOL, std::make_shared<DataTypeUInt8>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDateTime>()},
{arrow::Type::DATE64, std::make_shared<DataTypeDateTime>()},
{arrow::Type::TIMESTAMP, std::make_shared<DataTypeDateTime>()},
//{arrow::Type::TIME32, std::make_shared<DataTypeDateTime>()},
{arrow::Type::STRING, std::make_shared<DataTypeString>()},
{arrow::Type::BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::FIXED_SIZE_BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::UUID, std::make_shared<DataTypeString>()},
// TODO: add other types that are convertable to internal ones:
// 0. ENUM?
// 1. UUID -> String
// 2. JSON -> String
// Full list of types: contrib/arrow/cpp/src/arrow/type.h
};
Block res;
if (!istr.eof())
@ -308,7 +310,9 @@ Block ParquetBlockInputStream::readImpl()
*/
if (row_group_current < row_group_total)
throw Exception{"Got new data, but data from previous chunks not readed " + std::to_string(row_group_current) + "/" + std::to_string(row_group_total), ErrorCodes::CANNOT_READ_ALL_DATA};
throw Exception{"Got new data, but data from previous chunks not readed " + std::to_string(row_group_current) + "/"
+ std::to_string(row_group_total),
ErrorCodes::CANNOT_READ_ALL_DATA};
file_data.clear();
{

View File

@ -20,7 +20,11 @@ target_link_libraries(clickhouse_functions
${METROHASH_LIBRARIES}
murmurhash
${BASE64_LIBRARY}
${OPENSSL_CRYPTO_LIBRARY})
)
if (OPENSSL_CRYPTO_LIBRARY)
target_link_libraries(clickhouse_functions PUBLIC ${OPENSSL_CRYPTO_LIBRARY})
endif()
target_include_directories (clickhouse_functions SYSTEM BEFORE PUBLIC ${DIVIDE_INCLUDE_DIR} ${METROHASH_INCLUDE_DIR})
@ -60,3 +64,8 @@ if (USE_XXHASH)
target_link_libraries(clickhouse_functions PRIVATE ${XXHASH_LIBRARY})
target_include_directories(clickhouse_functions SYSTEM PRIVATE ${XXHASH_INCLUDE_DIR})
endif()
if (USE_HYPERSCAN)
target_link_libraries (clickhouse_functions PRIVATE ${HYPERSCAN_LIBRARY})
target_include_directories (clickhouse_functions SYSTEM PRIVATE ${HYPERSCAN_INCLUDE_DIR})
endif ()

View File

@ -547,21 +547,27 @@ class FunctionBinaryArithmetic : public IFunction
throw Exception{"Illegal column " + block.getByPosition(new_arguments[1]).column->getName()
+ " of argument of aggregation state multiply. Should be integer constant", ErrorCodes::ILLEGAL_COLUMN};
const ColumnAggregateFunction * column = typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(new_arguments[0]).column.get());
IAggregateFunction * function = column->getAggregateFunction().get();
const IColumn & agg_state_column = *block.getByPosition(new_arguments[0]).column;
bool agg_state_is_const = agg_state_column.isColumnConst();
const ColumnAggregateFunction & column = typeid_cast<const ColumnAggregateFunction &>(
agg_state_is_const ? static_cast<const ColumnConst &>(agg_state_column).getDataColumn() : agg_state_column);
AggregateFunctionPtr function = column.getAggregateFunction();
auto arena = std::make_shared<Arena>();
auto column_to = ColumnAggregateFunction::create(column->getAggregateFunction(), Arenas(1, arena));
column_to->reserve(input_rows_count);
size_t size = agg_state_is_const ? 1 : input_rows_count;
auto column_from = ColumnAggregateFunction::create(column->getAggregateFunction(), Arenas(1, arena));
column_from->reserve(input_rows_count);
auto column_to = ColumnAggregateFunction::create(function, Arenas(1, arena));
column_to->reserve(size);
for (size_t i = 0; i < input_rows_count; ++i)
auto column_from = ColumnAggregateFunction::create(function, Arenas(1, arena));
column_from->reserve(size);
for (size_t i = 0; i < size; ++i)
{
column_to->insertDefault();
column_from->insertFrom(column->getData()[i]);
column_from->insertFrom(column.getData()[i]);
}
auto & vec_to = column_to->getData();
@ -575,38 +581,55 @@ class FunctionBinaryArithmetic : public IFunction
{
if (m % 2)
{
for (size_t i = 0; i < input_rows_count; ++i)
for (size_t i = 0; i < size; ++i)
function->merge(vec_to[i], vec_from[i], arena.get());
--m;
}
else
{
for (size_t i = 0; i < input_rows_count; ++i)
for (size_t i = 0; i < size; ++i)
function->merge(vec_from[i], vec_from[i], arena.get());
m /= 2;
}
}
block.getByPosition(result).column = std::move(column_to);
if (agg_state_is_const)
block.getByPosition(result).column = ColumnConst::create(std::move(column_to), input_rows_count);
else
block.getByPosition(result).column = std::move(column_to);
}
/// Merge two aggregation states together.
void executeAggregateAddition(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) const
{
const ColumnAggregateFunction * columns[2];
for (size_t i = 0; i < 2; ++i)
columns[i] = typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(arguments[i]).column.get());
const IColumn & lhs_column = *block.getByPosition(arguments[0]).column;
const IColumn & rhs_column = *block.getByPosition(arguments[1]).column;
auto column_to = ColumnAggregateFunction::create(columns[0]->getAggregateFunction());
column_to->reserve(input_rows_count);
bool lhs_is_const = lhs_column.isColumnConst();
bool rhs_is_const = rhs_column.isColumnConst();
for (size_t i = 0; i < input_rows_count; ++i)
const ColumnAggregateFunction & lhs = typeid_cast<const ColumnAggregateFunction &>(
lhs_is_const ? static_cast<const ColumnConst &>(lhs_column).getDataColumn() : lhs_column);
const ColumnAggregateFunction & rhs = typeid_cast<const ColumnAggregateFunction &>(
rhs_is_const ? static_cast<const ColumnConst &>(rhs_column).getDataColumn() : rhs_column);
AggregateFunctionPtr function = lhs.getAggregateFunction();
size_t size = (lhs_is_const && rhs_is_const) ? 1 : input_rows_count;
auto column_to = ColumnAggregateFunction::create(function);
column_to->reserve(size);
for (size_t i = 0; i < size; ++i)
{
column_to->insertFrom(columns[0]->getData()[i]);
column_to->insertMergeFrom(columns[1]->getData()[i]);
column_to->insertFrom(lhs.getData()[lhs_is_const ? 0 : i]);
column_to->insertMergeFrom(rhs.getData()[rhs_is_const ? 0 : i]);
}
block.getByPosition(result).column = std::move(column_to);
if (lhs_is_const && rhs_is_const)
block.getByPosition(result).column = ColumnConst::create(std::move(column_to), input_rows_count);
else
block.getByPosition(result).column = std::move(column_to);
}
void executeDateTimeIntervalPlusMinus(Block & block, const ColumnNumbers & arguments,

View File

@ -8,11 +8,13 @@ namespace DB
void registerFunctionsHashing(FunctionFactory & factory)
{
#if USE_SSL
factory.registerFunction<FunctionHalfMD5>();
factory.registerFunction<FunctionMD5>();
factory.registerFunction<FunctionSHA1>();
factory.registerFunction<FunctionSHA224>();
factory.registerFunction<FunctionSHA256>();
#endif
factory.registerFunction<FunctionSipHash64>();
factory.registerFunction<FunctionSipHash128>();
factory.registerFunction<FunctionCityHash64>();

View File

@ -1,7 +1,5 @@
#pragma once
#include <openssl/md5.h>
#include <openssl/sha.h>
#include <city.h>
#include <farmhash.h>
#include <metrohash.h>
@ -14,7 +12,12 @@
#include <Common/config.h>
#if USE_XXHASH
#include <xxhash.h> // Y_IGNORE
# include <xxhash.h> // Y_IGNORE
#endif
#if USE_SSL
# include <openssl/md5.h>
# include <openssl/sha.h>
#endif
#include <Poco/ByteOrder.h>
@ -94,7 +97,7 @@ struct IntHash64Impl
}
};
#if USE_SSL
struct HalfMD5Impl
{
static constexpr auto name = "halfMD5";
@ -183,6 +186,7 @@ struct SHA256Impl
SHA256_Final(out_char_data, &ctx);
}
};
#endif
struct SipHash64Impl
{
@ -1076,15 +1080,18 @@ private:
struct NameIntHash32 { static constexpr auto name = "intHash32"; };
struct NameIntHash64 { static constexpr auto name = "intHash64"; };
#if USE_SSL
using FunctionHalfMD5 = FunctionAnyHash<HalfMD5Impl>;
#endif
using FunctionSipHash64 = FunctionAnyHash<SipHash64Impl>;
using FunctionIntHash32 = FunctionIntHash<IntHash32Impl, NameIntHash32>;
using FunctionIntHash64 = FunctionIntHash<IntHash64Impl, NameIntHash64>;
#if USE_SSL
using FunctionMD5 = FunctionStringHashFixedString<MD5Impl>;
using FunctionSHA1 = FunctionStringHashFixedString<SHA1Impl>;
using FunctionSHA224 = FunctionStringHashFixedString<SHA224Impl>;
using FunctionSHA256 = FunctionStringHashFixedString<SHA256Impl>;
#endif
using FunctionSipHash128 = FunctionStringHashFixedString<SipHash128Impl>;
using FunctionCityHash64 = FunctionAnyHash<ImplCityHash64>;
using FunctionFarmHash64 = FunctionAnyHash<ImplFarmHash64>;

View File

@ -15,6 +15,10 @@
#include <algorithm>
#include <memory>
#ifdef __SSSE3__
# include <hs.h>
#endif
#if USE_RE2_ST
# include <re2_st/re2.h> // Y_IGNORE
#else
@ -312,7 +316,7 @@ struct PositionImpl
};
template <typename Impl>
struct MultiPositionImpl
struct MultiSearchAllPositionsImpl
{
using ResultType = UInt64;
@ -322,17 +326,31 @@ struct MultiPositionImpl
const std::vector<StringRef> & needles,
PaddedPODArray<UInt64> & res)
{
auto resCallback = [](const UInt8 * start, const UInt8 * end) -> UInt64
auto res_callback = [](const UInt8 * start, const UInt8 * end) -> UInt64
{
return 1 + Impl::countChars(reinterpret_cast<const char *>(start), reinterpret_cast<const char *>(end));
};
Impl::createMultiSearcherInBigHaystack(needles).searchAll(haystack_data, haystack_offsets, resCallback, res);
Impl::createMultiSearcherInBigHaystack(needles).searchAllPositions(haystack_data, haystack_offsets, res_callback, res);
}
};
template <typename Impl>
struct MultiSearchImpl
{
using ResultType = UInt8;
static void vector_constant(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const std::vector<StringRef> & needles,
PaddedPODArray<UInt8> & res)
{
Impl::createMultiSearcherInBigHaystack(needles).search(haystack_data, haystack_offsets, res);
}
};
template <typename Impl>
struct MultiSearchFirstPositionImpl
{
using ResultType = UInt64;
@ -342,12 +360,16 @@ struct MultiSearchImpl
const std::vector<StringRef> & needles,
PaddedPODArray<UInt64> & res)
{
Impl::createMultiSearcherInBigHaystack(needles).search(haystack_data, haystack_offsets, res);
auto res_callback = [](const UInt8 * start, const UInt8 * end) -> UInt64
{
return 1 + Impl::countChars(reinterpret_cast<const char *>(start), reinterpret_cast<const char *>(end));
};
Impl::createMultiSearcherInBigHaystack(needles).searchFirstPosition(haystack_data, haystack_offsets, res_callback, res);
}
};
template <typename Impl>
struct FirstMatchImpl
struct MultiSearchFirstIndexImpl
{
using ResultType = UInt64;
@ -524,8 +546,8 @@ struct MatchImpl
res[i] = !revert;
else
{
const char * str_data = reinterpret_cast<const char *>(&data[i != 0 ? offsets[i - 1] : 0]);
size_t str_size = (i != 0 ? offsets[i] - offsets[i - 1] : offsets[0]) - 1;
const char * str_data = reinterpret_cast<const char *>(&data[offsets[i - 1]]);
size_t str_size = offsets[i] - offsets[i - 1] - 1;
/** Even in the case of `required_substring_is_prefix` use UNANCHORED check for regexp,
* so that it can match when `required_substring` occurs into the string several times,
@ -581,6 +603,78 @@ struct MatchImpl
};
template <typename Type, bool FindAny, bool FindAnyIndex>
struct MultiMatchAnyImpl
{
static_assert(static_cast<int>(FindAny) + static_cast<int>(FindAnyIndex) == 1);
using ResultType = Type;
static void vector_constant(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const std::vector<StringRef> & needles,
PaddedPODArray<Type> & res)
{
(void)FindAny;
(void)FindAnyIndex;
#ifdef __SSSE3__
using ScratchPtr = std::unique_ptr<hs_scratch_t, DB::MultiRegexps::HyperscanDeleter<decltype(&hs_free_scratch), &hs_free_scratch>>;
const auto & hyperscan_regex = MultiRegexps::get<FindAnyIndex>(needles);
hs_scratch_t * scratch = nullptr;
hs_error_t err = hs_alloc_scratch(hyperscan_regex->get(), &scratch);
if (err != HS_SUCCESS)
throw Exception("Could not allocate scratch space for hyperscan.", ErrorCodes::CANNOT_ALLOCATE_MEMORY);
ScratchPtr smart_scratch(scratch);
auto on_match = []([[maybe_unused]] unsigned int id,
unsigned long long /* from */,
unsigned long long /* to */,
unsigned int /* flags */,
void * context) -> int
{
if constexpr (FindAnyIndex)
*reinterpret_cast<Type *>(context) = id;
else if constexpr (FindAny)
*reinterpret_cast<Type *>(context) = 1;
return 0;
};
const size_t haystack_offsets_size = haystack_offsets.size();
size_t offset = 0;
for (size_t i = 0; i < haystack_offsets_size; ++i)
{
res[i] = 0;
hs_scan(
hyperscan_regex->get(),
reinterpret_cast<const char *>(haystack_data.data()) + offset,
haystack_offsets[i] - offset - 1,
0,
smart_scratch.get(),
on_match,
&res[i]);
offset = haystack_offsets[i];
}
#else
/// Fallback if not an intel processor
PaddedPODArray<UInt8> accum(res.size());
memset(res.data(), 0, res.size() * sizeof(res.front()));
memset(accum.data(), 0, accum.size());
for (size_t j = 0; j < needles.size(); ++j)
{
MatchImpl<false, false>::vector_constant(haystack_data, haystack_offsets, needles[j].toString(), accum);
for (size_t i = 0; i < res.size(); ++i)
{
if constexpr (FindAny)
res[i] |= accum[i];
else if (accum[i])
res[i] = j + 1;
}
}
#endif // __SSSE3__
}
};
struct ExtractImpl
{
static void vector(
@ -1090,53 +1184,69 @@ struct NamePositionCaseInsensitiveUTF8
{
static constexpr auto name = "positionCaseInsensitiveUTF8";
};
struct NameMultiPosition
struct NameMultiSearchAllPositions
{
static constexpr auto name = "multiPosition";
static constexpr auto name = "multiSearchAllPositions";
};
struct NameMultiPositionUTF8
struct NameMultiSearchAllPositionsUTF8
{
static constexpr auto name = "multiPositionUTF8";
static constexpr auto name = "multiSearchAllPositionsUTF8";
};
struct NameMultiPositionCaseInsensitive
struct NameMultiSearchAllPositionsCaseInsensitive
{
static constexpr auto name = "multiPositionCaseInsensitive";
static constexpr auto name = "multiSearchAllPositionsCaseInsensitive";
};
struct NameMultiPositionCaseInsensitiveUTF8
struct NameMultiSearchAllPositionsCaseInsensitiveUTF8
{
static constexpr auto name = "multiPositionCaseInsensitiveUTF8";
static constexpr auto name = "multiSearchAllPositionsCaseInsensitiveUTF8";
};
struct NameMultiSearch
struct NameMultiSearchAny
{
static constexpr auto name = "multiSearch";
static constexpr auto name = "multiSearchAny";
};
struct NameMultiSearchUTF8
struct NameMultiSearchAnyUTF8
{
static constexpr auto name = "multiSearchUTF8";
static constexpr auto name = "multiSearchAnyUTF8";
};
struct NameMultiSearchCaseInsensitive
struct NameMultiSearchAnyCaseInsensitive
{
static constexpr auto name = "multiSearchCaseInsensitive";
static constexpr auto name = "multiSearchAnyCaseInsensitive";
};
struct NameMultiSearchCaseInsensitiveUTF8
struct NameMultiSearchAnyCaseInsensitiveUTF8
{
static constexpr auto name = "multiSearchCaseInsensitiveUTF8";
static constexpr auto name = "multiSearchAnyCaseInsensitiveUTF8";
};
struct NameFirstMatch
struct NameMultiSearchFirstIndex
{
static constexpr auto name = "firstMatch";
static constexpr auto name = "multiSearchFirstIndex";
};
struct NameFirstMatchUTF8
struct NameMultiSearchFirstIndexUTF8
{
static constexpr auto name = "firstMatchUTF8";
static constexpr auto name = "multiSearchFirstIndexUTF8";
};
struct NameFirstMatchCaseInsensitive
struct NameMultiSearchFirstIndexCaseInsensitive
{
static constexpr auto name = "firstMatchCaseInsensitive";
static constexpr auto name = "multiSearchFirstIndexCaseInsensitive";
};
struct NameFirstMatchCaseInsensitiveUTF8
struct NameMultiSearchFirstIndexCaseInsensitiveUTF8
{
static constexpr auto name = "firstMatchCaseInsensitiveUTF8";
static constexpr auto name = "multiSearchFirstIndexCaseInsensitiveUTF8";
};
struct NameMultiSearchFirstPosition
{
static constexpr auto name = "multiSearchFirstPosition";
};
struct NameMultiSearchFirstPositionUTF8
{
static constexpr auto name = "multiSearchFirstPositionUTF8";
};
struct NameMultiSearchFirstPositionCaseInsensitive
{
static constexpr auto name = "multiSearchFirstPositionCaseInsensitive";
};
struct NameMultiSearchFirstPositionCaseInsensitiveUTF8
{
static constexpr auto name = "multiSearchFirstPositionCaseInsensitiveUTF8";
};
struct NameMatch
{
@ -1150,6 +1260,14 @@ struct NameNotLike
{
static constexpr auto name = "notLike";
};
struct NameMultiMatchAny
{
static constexpr auto name = "multiMatchAny";
};
struct NameMultiMatchAnyIndex
{
static constexpr auto name = "multiMatchAnyIndex";
};
struct NameExtract
{
static constexpr auto name = "extract";
@ -1177,28 +1295,37 @@ using FunctionPositionCaseInsensitive = FunctionsStringSearch<PositionImpl<Posit
using FunctionPositionCaseInsensitiveUTF8
= FunctionsStringSearch<PositionImpl<PositionCaseInsensitiveUTF8>, NamePositionCaseInsensitiveUTF8>;
using FunctionMultiPosition = FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseSensitiveASCII>, NameMultiPosition>;
using FunctionMultiPositionUTF8 = FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseSensitiveUTF8>, NameMultiPositionUTF8>;
using FunctionMultiPositionCaseInsensitive
= FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseInsensitiveASCII>, NameMultiPositionCaseInsensitive>;
using FunctionMultiPositionCaseInsensitiveUTF8
= FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseInsensitiveUTF8>, NameMultiPositionCaseInsensitiveUTF8>;
using FunctionMultiSearchAllPositions = FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseSensitiveASCII>, NameMultiSearchAllPositions>;
using FunctionMultiSearchAllPositionsUTF8 = FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseSensitiveUTF8>, NameMultiSearchAllPositionsUTF8>;
using FunctionMultiSearchAllPositionsCaseInsensitive
= FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseInsensitiveASCII>, NameMultiSearchAllPositionsCaseInsensitive>;
using FunctionMultiSearchAllPositionsCaseInsensitiveUTF8
= FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchAllPositionsCaseInsensitiveUTF8>;
using FunctionMultiSearch = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveASCII>, NameMultiSearch>;
using FunctionMultiSearchUTF8 = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveUTF8>, NameMultiSearchUTF8>;
using FunctionMultiSearch = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveASCII>, NameMultiSearchAny>;
using FunctionMultiSearchUTF8 = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveUTF8>, NameMultiSearchAnyUTF8>;
using FunctionMultiSearchCaseInsensitive
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveASCII>, NameMultiSearchCaseInsensitive>;
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveASCII>, NameMultiSearchAnyCaseInsensitive>;
using FunctionMultiSearchCaseInsensitiveUTF8
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchCaseInsensitiveUTF8>;
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchAnyCaseInsensitiveUTF8>;
using FunctionFirstMatch = FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseSensitiveASCII>, NameFirstMatch>;
using FunctionFirstMatchUTF8 = FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseSensitiveUTF8>, NameFirstMatchUTF8>;
using FunctionFirstMatchCaseInsensitive
= FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseInsensitiveASCII>, NameFirstMatchCaseInsensitive>;
using FunctionFirstMatchCaseInsensitiveUTF8
= FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseInsensitiveUTF8>, NameFirstMatchCaseInsensitiveUTF8>;
using FunctionMultiSearchFirstIndex = FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseSensitiveASCII>, NameMultiSearchFirstIndex>;
using FunctionMultiSearchFirstIndexUTF8 = FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseSensitiveUTF8>, NameMultiSearchFirstIndexUTF8>;
using FunctionMultiSearchFirstIndexCaseInsensitive
= FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseInsensitiveASCII>, NameMultiSearchFirstIndexCaseInsensitive>;
using FunctionMultiSearchFirstIndexCaseInsensitiveUTF8
= FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchFirstIndexCaseInsensitiveUTF8>;
using FunctionMultiSearchFirstPosition = FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseSensitiveASCII>, NameMultiSearchFirstPosition>;
using FunctionMultiSearchFirstPositionUTF8 = FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseSensitiveUTF8>, NameMultiSearchFirstPositionUTF8>;
using FunctionMultiSearchFirstPositionCaseInsensitive
= FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseInsensitiveASCII>, NameMultiSearchFirstPositionCaseInsensitive>;
using FunctionMultiSearchFirstPositionCaseInsensitiveUTF8
= FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchFirstPositionCaseInsensitiveUTF8>;
using FunctionMatch = FunctionsStringSearch<MatchImpl<false>, NameMatch>;
using FunctionMultiMatchAny = FunctionsMultiStringSearch<MultiMatchAnyImpl<UInt8, true, false>, NameMultiMatchAny, std::numeric_limits<UInt32>::max()>;
using FunctionMultiMatchAnyIndex = FunctionsMultiStringSearch<MultiMatchAnyImpl<UInt64, false, true>, NameMultiMatchAnyIndex, std::numeric_limits<UInt32>::max()>;
using FunctionLike = FunctionsStringSearch<MatchImpl<true>, NameLike>;
using FunctionNotLike = FunctionsStringSearch<MatchImpl<true, true>, NameNotLike>;
using FunctionExtract = FunctionsStringSearchToString<ExtractImpl, NameExtract>;
@ -1220,26 +1347,34 @@ void registerFunctionsStringSearch(FunctionFactory & factory)
factory.registerFunction<FunctionPositionCaseInsensitive>();
factory.registerFunction<FunctionPositionCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMultiPosition>();
factory.registerFunction<FunctionMultiPositionUTF8>();
factory.registerFunction<FunctionMultiPositionCaseInsensitive>();
factory.registerFunction<FunctionMultiPositionCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMultiSearchAllPositions>();
factory.registerFunction<FunctionMultiSearchAllPositionsUTF8>();
factory.registerFunction<FunctionMultiSearchAllPositionsCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchAllPositionsCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMultiSearch>();
factory.registerFunction<FunctionMultiSearchUTF8>();
factory.registerFunction<FunctionMultiSearchCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchCaseInsensitiveUTF8>();
factory.registerFunction<FunctionFirstMatch>();
factory.registerFunction<FunctionFirstMatchUTF8>();
factory.registerFunction<FunctionFirstMatchCaseInsensitive>();
factory.registerFunction<FunctionFirstMatchCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMultiSearchFirstIndex>();
factory.registerFunction<FunctionMultiSearchFirstIndexUTF8>();
factory.registerFunction<FunctionMultiSearchFirstIndexCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchFirstIndexCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMultiSearchFirstPosition>();
factory.registerFunction<FunctionMultiSearchFirstPositionUTF8>();
factory.registerFunction<FunctionMultiSearchFirstPositionCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchFirstPositionCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMatch>();
factory.registerFunction<FunctionLike>();
factory.registerFunction<FunctionNotLike>();
factory.registerFunction<FunctionExtract>();
factory.registerFunction<FunctionMultiMatchAny>();
factory.registerFunction<FunctionMultiMatchAnyIndex>();
factory.registerAlias("locate", NamePosition::name, FunctionFactory::CaseInsensitive);
factory.registerAlias("replace", NameReplaceAll::name, FunctionFactory::CaseInsensitive);
}

View File

@ -26,6 +26,8 @@ namespace DB
* notLike(haystack, pattern)
*
* match(haystack, pattern) - search by regular expression re2; Returns 0 or 1.
* multiMatchAny(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- search by re2 regular expressions pattern_i; Returns 0 or 1 if any pattern_i matches.
* multiMatchAnyIndex(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- search by re2 regular expressions pattern_i; Returns index of any match or zero if none;
*
* Applies regexp re2 and pulls:
* - the first subpattern, if the regexp has a subpattern;
@ -39,20 +41,25 @@ namespace DB
* replaceRegexpOne(haystack, pattern, replacement) - replaces the pattern with the specified regexp, only the first occurrence.
* replaceRegexpAll(haystack, pattern, replacement) - replaces the pattern with the specified type, all occurrences.
*
* multiPosition(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find first occurrences (positions) of all the const patterns inside haystack
* multiPositionUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiPositionCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiPositionCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
*
* multiSearch(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find any of the const patterns inside haystack and return 0 or 1
* multiSearchUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAllPositions(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find first occurrences (positions) of all the const patterns inside haystack
* multiSearchAllPositionsUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAllPositionsCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* firstMatch(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- returns the first index of the matched string or zero if nothing was found
* firstMatchUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* firstMatchCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* firstMatchCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstPosition(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- returns the first position of the haystack matched by strings or zero if nothing was found
* multiSearchFirstPositionUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstPositionCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
*
* multiSearchAny(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find any of the const patterns inside haystack and return 0 or 1
* multiSearchAnyUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAnyCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAnyCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstIndex(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- returns the first index of the matched string or zero if nothing was found
* multiSearchFirstIndexUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstIndexCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
*/
namespace ErrorCodes
@ -269,9 +276,13 @@ public:
}
};
template <typename Impl, typename Name>
/// The argument limiting raises from Volnitsky searcher -- it is performance crucial to save only one byte for pattern number.
/// But some other searchers use this function, for example, multiMatchAny -- hyperscan does not have such restrictions
template <typename Impl, typename Name, size_t LimitArgs = std::numeric_limits<UInt8>::max()>
class FunctionsMultiStringSearch : public IFunction
{
static_assert(LimitArgs > 0);
public:
static constexpr auto name = Name::name;
static FunctionPtr create(const Context &) { return std::make_shared<FunctionsMultiStringSearch>(); }
@ -282,10 +293,10 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (arguments.size() + 1 >= std::numeric_limits<UInt8>::max())
if (arguments.size() + 1 >= LimitArgs)
throw Exception(
"Number of arguments for function " + getName() + " doesn't match: passed " + std::to_string(arguments.size())
+ ", should be at most 255.",
+ ", should be at most " + std::to_string(LimitArgs) + ".",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
if (!isString(arguments[0]))
@ -333,6 +344,7 @@ public:
vec_res.resize(column_haystack_size);
/// TODO support constant_constant version
if (col_haystack_vector)
Impl::vector_constant(col_haystack_vector->getChars(), col_haystack_vector->getOffsets(), refs, vec_res);
else

View File

@ -1,19 +1,32 @@
#pragma once
#include <Common/OptimizedRegularExpression.h>
#include <Common/ObjectPool.h>
#include <Common/ProfileEvents.h>
#include <Functions/likePatternToRegexp.h>
#include <Common/ObjectPool.h>
#include <Common/OptimizedRegularExpression.h>
#include <Common/ProfileEvents.h>
#include <common/StringRef.h>
#include <memory>
#include <string>
#include <vector>
#ifdef __SSSE3__
# include <hs.h>
#endif
namespace ProfileEvents
{
extern const Event RegexpCreated;
extern const Event RegexpCreated;
}
namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_ALLOCATE_MEMORY;
extern const int LOGICAL_ERROR;
}
namespace Regexps
{
@ -21,16 +34,22 @@ namespace Regexps
using Pool = ObjectPoolMap<Regexp, String>;
template <bool like>
inline Regexp createRegexp(const std::string & pattern, int flags) { return {pattern, flags}; }
inline Regexp createRegexp(const std::string & pattern, int flags)
{
return {pattern, flags};
}
template <>
inline Regexp createRegexp<true>(const std::string & pattern, int flags) { return {likePatternToRegexp(pattern), flags}; }
inline Regexp createRegexp<true>(const std::string & pattern, int flags)
{
return {likePatternToRegexp(pattern), flags};
}
template <bool like, bool no_capture>
inline Pool::Pointer get(const std::string & pattern)
{
/// C++11 has thread-safe function-local statics on most modern compilers.
static Pool known_regexps; /// Different variables for different pattern parameters.
static Pool known_regexps; /// Different variables for different pattern parameters.
return known_regexps.get(pattern, [&pattern]
{
@ -44,4 +63,82 @@ namespace Regexps
}
}
#ifdef __SSSE3__
namespace MultiRegexps
{
template <typename Deleter, Deleter deleter>
struct HyperscanDeleter
{
template <typename T>
void operator()(T * ptr) const
{
deleter(ptr);
}
};
using Regexps = std::unique_ptr<hs_database_t, HyperscanDeleter<decltype(&hs_free_database), &hs_free_database>>;
using Pool = ObjectPoolMap<Regexps, std::vector<String>>;
template <bool FindAnyIndex>
inline Pool::Pointer get(const std::vector<StringRef> & patterns)
{
/// C++11 has thread-safe function-local statics on most modern compilers.
static Pool known_regexps; /// Different variables for different pattern parameters.
std::vector<String> str_patterns;
str_patterns.reserve(patterns.size());
for (const StringRef & ref : patterns)
str_patterns.push_back(ref.toString());
return known_regexps.get(str_patterns, [&str_patterns]
{
std::vector<const char *> ptrns;
std::vector<unsigned int> flags;
ptrns.reserve(str_patterns.size());
flags.reserve(str_patterns.size());
for (const StringRef ref : str_patterns)
{
ptrns.push_back(ref.data);
flags.push_back(HS_FLAG_DOTALL | HS_FLAG_ALLOWEMPTY | HS_FLAG_SINGLEMATCH);
}
hs_database_t * db = nullptr;
hs_compile_error_t * compile_error;
std::unique_ptr<unsigned int[]> ids;
if constexpr (FindAnyIndex)
{
ids.reset(new unsigned int[ptrns.size()]);
for (size_t i = 0; i < ptrns.size(); ++i)
ids[i] = i + 1;
}
hs_error_t err
= hs_compile_multi(ptrns.data(), flags.data(), ids.get(), ptrns.size(), HS_MODE_BLOCK, nullptr, &db, &compile_error);
if (err != HS_SUCCESS)
{
std::unique_ptr<
hs_compile_error_t,
HyperscanDeleter<decltype(&hs_free_compile_error), &hs_free_compile_error>> error(compile_error);
if (error->expression < 0)
throw Exception(String(error->message), ErrorCodes::LOGICAL_ERROR);
else
throw Exception(
"Pattern '" + str_patterns[error->expression] + "' failed with error '" + String(error->message),
ErrorCodes::LOGICAL_ERROR);
}
ProfileEvents::increment(ProfileEvents::RegexpCreated);
return new Regexps{db};
});
}
}
#endif // __SSSE3__
}

View File

@ -43,6 +43,8 @@ public:
return 1;
}
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
const DataTypeAggregateFunction * type = checkAndGetDataType<DataTypeAggregateFunction>(arguments[0].get());

View File

@ -23,12 +23,7 @@ namespace ErrorCodes
/// where N >= 1.
///
/// For all 1 <= i <= N, "cond_i" has type UInt8.
/// Types of all the branches "then_i" and "else" are either of the following:
/// - numeric types for which there exists a common type;
/// - dates;
/// - dates with time;
/// - strings;
/// - arrays of such types.
/// Types of all the branches "then_i" and "else" have a common type.
///
/// Additionally the arguments, conditions or branches, support nullable types
/// and the NULL value, with a NULL condition treated as false.

View File

@ -1,7 +1,7 @@
#pragma once
#include <Poco/Timespan.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
namespace DB
{

View File

@ -0,0 +1,76 @@
#include <Interpreters/BloomFilter.h>
#include <city.h>
namespace DB
{
static constexpr UInt64 SEED_GEN_A = 845897321;
static constexpr UInt64 SEED_GEN_B = 217728422;
StringBloomFilter::StringBloomFilter(size_t size_, size_t hashes_, size_t seed_)
: size(size_), hashes(hashes_), seed(seed_), words((size + sizeof(UnderType) - 1) / sizeof(UnderType)), filter(words, 0) {}
StringBloomFilter::StringBloomFilter(const StringBloomFilter & bloom_filter)
: size(bloom_filter.size), hashes(bloom_filter.hashes), seed(bloom_filter.seed), words(bloom_filter.words), filter(bloom_filter.filter) {}
bool StringBloomFilter::find(const char * data, size_t len)
{
size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed);
size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B);
for (size_t i = 0; i < hashes; ++i)
{
size_t pos = (hash1 + i * hash2 + i * i) % (8 * size);
if (!(filter[pos / (8 * sizeof(UnderType))] & (1ULL << (pos % (8 * sizeof(UnderType))))))
return false;
}
return true;
}
void StringBloomFilter::add(const char * data, size_t len)
{
size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed);
size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B);
for (size_t i = 0; i < hashes; ++i)
{
size_t pos = (hash1 + i * hash2 + i * i) % (8 * size);
filter[pos / (8 * sizeof(UnderType))] |= (1ULL << (pos % (8 * sizeof(UnderType))));
}
}
void StringBloomFilter::clear()
{
filter.assign(words, 0);
}
bool StringBloomFilter::contains(const StringBloomFilter & bf)
{
for (size_t i = 0; i < words; ++i)
{
if ((filter[i] & bf.filter[i]) != bf.filter[i])
return false;
}
return true;
}
UInt64 StringBloomFilter::isEmpty() const
{
for (size_t i = 0; i < words; ++i)
if (filter[i] != 0)
return false;
return true;
}
bool operator== (const StringBloomFilter & a, const StringBloomFilter & b)
{
for (size_t i = 0; i < a.words; ++i)
if (a.filter[i] != b.filter[i])
return false;
return true;
}
}

View File

@ -0,0 +1,50 @@
#pragma once
#include <Core/Types.h>
#include <vector>
namespace DB
{
/// Bloom filter for strings.
class StringBloomFilter
{
public:
using UnderType = UInt64;
using Container = std::vector<UnderType>;
/// size -- size of filter in bytes.
/// hashes -- number of used hash functions.
/// seed -- random seed for hash functions generation.
StringBloomFilter(size_t size_, size_t hashes_, size_t seed_);
StringBloomFilter(const StringBloomFilter & bloom_filter);
bool find(const char * data, size_t len);
void add(const char * data, size_t len);
void clear();
/// Checks if this contains everything from another bloom filter.
/// Bloom filters must have equal size and seed.
bool contains(const StringBloomFilter & bf);
const Container & getFilter() const { return filter; }
Container & getFilter() { return filter; }
/// For debug.
UInt64 isEmpty() const;
friend bool operator== (const StringBloomFilter & a, const StringBloomFilter & b);
private:
size_t size;
size_t hashes;
size_t seed;
size_t words;
Container filter;
};
bool operator== (const StringBloomFilter & a, const StringBloomFilter & b);
}

View File

@ -1,7 +1,7 @@
#pragma once
#include <map>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Client/ConnectionPool.h>
#include <Client/ConnectionPoolWithFailover.h>
#include <Poco/Net/SocketAddress.h>

View File

@ -1,6 +1,6 @@
#include <Interpreters/ClusterProxy/executeQuery.h>
#include <Interpreters/ClusterProxy/IStreamFactory.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/Context.h>
#include <Interpreters/Cluster.h>
#include <Interpreters/IInterpreter.h>

View File

@ -23,7 +23,7 @@
#include <Storages/CompressionCodecSelector.h>
#include <TableFunctions/TableFunctionFactory.h>
#include <Interpreters/ActionLocksManager.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/ExpressionJIT.h>
#include <Interpreters/RuntimeComponentsFactory.h>
#include <Interpreters/ISecurityManager.h>
@ -244,10 +244,18 @@ struct ContextShared
return;
shutdown_called = true;
system_logs.reset();
{
std::lock_guard lock(mutex);
/** After this point, system logs will shutdown their threads and no longer write any data.
* It will prevent recreation of system tables at shutdown.
* Note that part changes at shutdown won't be logged to part log.
*/
system_logs.reset();
}
/** At this point, some tables may have threads that block our mutex.
* To complete them correctly, we will copy the current list of tables,
* To shutdown them correctly, we will copy the current list of tables,
* and ask them all to finish their work.
* Then delete all objects with tables.
*/
@ -259,6 +267,8 @@ struct ContextShared
current_databases = databases;
}
/// We still hold "databases" in Context (instead of std::move) for Buffer tables to flush data correctly.
for (auto & database : current_databases)
database.second->shutdown();
@ -1548,51 +1558,47 @@ Compiler & Context::getCompiler()
void Context::initializeSystemLogs()
{
auto lock = getLock();
if (!global_context)
throw Exception("Logical error: no global context for system logs", ErrorCodes::LOGICAL_ERROR);
shared->system_logs.emplace(*global_context, getConfigRef());
}
QueryLog * Context::getQueryLog()
std::shared_ptr<QueryLog> Context::getQueryLog()
{
auto lock = getLock();
if (!shared->system_logs || !shared->system_logs->query_log)
return nullptr;
return {};
return shared->system_logs->query_log.get();
return shared->system_logs->query_log;
}
QueryThreadLog * Context::getQueryThreadLog()
std::shared_ptr<QueryThreadLog> Context::getQueryThreadLog()
{
auto lock = getLock();
if (!shared->system_logs || !shared->system_logs->query_thread_log)
return nullptr;
return {};
return shared->system_logs->query_thread_log.get();
return shared->system_logs->query_thread_log;
}
PartLog * Context::getPartLog(const String & part_database)
std::shared_ptr<PartLog> Context::getPartLog(const String & part_database)
{
auto lock = getLock();
/// No part log or system logs are shutting down.
if (!shared->system_logs || !shared->system_logs->part_log)
return nullptr;
return {};
/// Will not log operations on system tables (including part_log itself).
/// It doesn't make sense and not allow to destruct PartLog correctly due to infinite logging and flushing,
/// and also make troubles on startup.
if (part_database == shared->system_logs->part_log_database)
return nullptr;
return {};
return shared->system_logs->part_log.get();
return shared->system_logs->part_log;
}

View File

@ -4,7 +4,7 @@
#include <Core/NamesAndTypes.h>
#include <Core/Types.h>
#include <Interpreters/ClientInfo.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Parsers/IAST_fwd.h>
#include <Common/LRUCache.h>
#include <Common/MultiVersion.h>
@ -402,12 +402,12 @@ public:
void initializeSystemLogs();
/// Nullptr if the query log is not ready for this moment.
QueryLog * getQueryLog();
QueryThreadLog * getQueryThreadLog();
std::shared_ptr<QueryLog> getQueryLog();
std::shared_ptr<QueryThreadLog> getQueryThreadLog();
/// Returns an object used to log opertaions with parts if it possible.
/// Provide table name to make required cheks.
PartLog * getPartLog(const String & part_database);
std::shared_ptr<PartLog> getPartLog(const String & part_database);
const MergeTreeSettings & getMergeTreeSettings() const;

View File

@ -3,7 +3,7 @@
#include <Interpreters/Context.h>
#include <Common/config.h>
#include <Common/SipHash.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Core/Names.h>
#include <Core/ColumnWithTypeAndName.h>
#include <Core/Block.h>

View File

@ -2,7 +2,7 @@
#include <Interpreters/ActionsVisitor.h>
#include <Interpreters/AggregateDescription.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/SyntaxAnalyzer.h>
#include <Parsers/IAST_fwd.h>

View File

@ -2,7 +2,7 @@
#include <string>
#include <Core/Types.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
namespace DB

View File

@ -5,7 +5,7 @@
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Interpreters/AggregationCommon.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
#include <Common/Arena.h>
#include <Common/ColumnsHashing.h>

View File

@ -1,5 +1,5 @@
#include <Interpreters/LogicalExpressionsOptimizer.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTSelectQuery.h>

View File

@ -104,7 +104,7 @@ bool PartLog::addNewParts(Context & current_context, const PartLog::MutableDataP
if (parts.empty())
return true;
PartLog * part_log = nullptr;
std::shared_ptr<PartLog> part_log;
try
{

View File

@ -1,5 +1,5 @@
#include <Interpreters/ProcessList.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/Context.h>
#include <Interpreters/DatabaseAndTableWithAlias.h>
#include <Parsers/ASTSelectWithUnionQuery.h>

View File

@ -1,17 +1,17 @@
#include <Interpreters/SecurityManager.h>
#include "SecurityManager.h"
#include <Poco/Net/IPAddress.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Poco/String.h>
#include <Common/Exception.h>
#include <IO/HexWriteBuffer.h>
#include <IO/WriteBufferFromString.h>
#include <IO/WriteHelpers.h>
#include <openssl/sha.h>
#include <common/logger_useful.h>
#include <Common/config.h>
#if USE_SSL
# include <openssl/sha.h>
#endif
namespace DB
{
@ -25,6 +25,7 @@ namespace ErrorCodes
extern const int WRONG_PASSWORD;
extern const int IP_ADDRESS_NOT_ALLOWED;
extern const int BAD_ARGUMENTS;
extern const int SUPPORT_IS_DISABLED;
}
using UserPtr = SecurityManager::UserPtr;
@ -68,6 +69,7 @@ UserPtr SecurityManager::authorizeAndGetUser(
if (!it->second->password_sha256_hex.empty())
{
#if USE_SSL
unsigned char hash[32];
SHA256_CTX ctx;
@ -86,6 +88,9 @@ UserPtr SecurityManager::authorizeAndGetUser(
if (hash_hex != it->second->password_sha256_hex)
on_wrong_password();
#else
throw DB::Exception("SHA256 passwords support is disabled, because ClickHouse was built without SSL library", DB::ErrorCodes::SUPPORT_IS_DISABLED);
#endif
}
else if (password != it->second->password)
{

View File

@ -1,7 +1,7 @@
#include <Interpreters/SyntaxAnalyzer.h>
#include <Interpreters/InJoinSubqueriesPreprocessor.h>
#include <Interpreters/LogicalExpressionsOptimizer.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/QueryAliasesVisitor.h>
#include <Interpreters/InterpreterSelectWithUnionQuery.h>
#include <Interpreters/ArrayJoinedColumnsVisitor.h>

View File

@ -16,7 +16,7 @@ constexpr size_t DEFAULT_SYSTEM_LOG_FLUSH_INTERVAL_MILLISECONDS = 7500;
/// Creates a system log with MergeTree engine using parameters from config
template <typename TSystemLog>
std::unique_ptr<TSystemLog> createSystemLog(
std::shared_ptr<TSystemLog> createSystemLog(
Context & context,
const String & default_database_name,
const String & default_table_name,
@ -33,7 +33,7 @@ std::unique_ptr<TSystemLog> createSystemLog(
size_t flush_interval_milliseconds = config.getUInt64(config_prefix + ".flush_interval_milliseconds", DEFAULT_SYSTEM_LOG_FLUSH_INTERVAL_MILLISECONDS);
return std::make_unique<TSystemLog>(context, database, table, engine, flush_interval_milliseconds);
return std::make_shared<TSystemLog>(context, database, table, engine, flush_interval_milliseconds);
}
}
@ -49,6 +49,14 @@ SystemLogs::SystemLogs(Context & global_context, const Poco::Util::AbstractConfi
}
SystemLogs::~SystemLogs() = default;
SystemLogs::~SystemLogs()
{
if (query_log)
query_log->shutdown();
if (query_thread_log)
query_thread_log->shutdown();
if (part_log)
part_log->shutdown();
}
}

View File

@ -1,6 +1,7 @@
#pragma once
#include <thread>
#include <atomic>
#include <boost/noncopyable.hpp>
#include <common/logger_useful.h>
#include <Core/Types.h>
@ -66,9 +67,9 @@ struct SystemLogs
SystemLogs(Context & global_context, const Poco::Util::AbstractConfiguration & config);
~SystemLogs();
std::unique_ptr<QueryLog> query_log; /// Used to log queries.
std::unique_ptr<QueryThreadLog> query_thread_log; /// Used to log query threads.
std::unique_ptr<PartLog> part_log; /// Used to log operations with parts
std::shared_ptr<QueryLog> query_log; /// Used to log queries.
std::shared_ptr<QueryThreadLog> query_thread_log; /// Used to log query threads.
std::shared_ptr<PartLog> part_log; /// Used to log operations with parts
String part_log_database;
};
@ -78,7 +79,6 @@ template <typename LogElement>
class SystemLog : private boost::noncopyable
{
public:
using Self = SystemLog;
/** Parameter: table name where to write log.
@ -103,13 +103,23 @@ public:
*/
void add(const LogElement & element)
{
if (is_shutdown)
return;
/// Without try we could block here in case of queue overflow.
if (!queue.tryPush({false, element}))
LOG_ERROR(log, "SystemLog queue is full");
}
/// Flush data in the buffer to disk
void flush(bool quiet = false);
void flush()
{
if (!is_shutdown)
flushImpl(false);
}
/// Stop the background flush thread before destructor. No more data will be written.
void shutdown();
protected:
Context & context;
@ -118,6 +128,7 @@ protected:
const String storage_def;
StoragePtr table;
const size_t flush_interval_milliseconds;
std::atomic<bool> is_shutdown{false};
using QueueItem = std::pair<bool, LogElement>; /// First element is shutdown flag for thread.
@ -145,6 +156,8 @@ protected:
*/
bool is_prepared = false;
void prepareTable();
void flushImpl(bool quiet);
};
@ -166,14 +179,25 @@ SystemLog<LogElement>::SystemLog(Context & context_,
template <typename LogElement>
SystemLog<LogElement>::~SystemLog()
void SystemLog<LogElement>::shutdown()
{
bool old_val = false;
if (!is_shutdown.compare_exchange_strong(old_val, true))
return;
/// Tell thread to shutdown.
queue.push({true, {}});
saving_thread.join();
}
template <typename LogElement>
SystemLog<LogElement>::~SystemLog()
{
shutdown();
}
template <typename LogElement>
void SystemLog<LogElement>::threadFunction()
{
@ -236,7 +260,7 @@ void SystemLog<LogElement>::threadFunction()
if (milliseconds_elapsed >= flush_interval_milliseconds)
{
/// Write data to a table.
flush(true);
flushImpl(true);
time_after_last_write.restart();
}
}
@ -251,7 +275,7 @@ void SystemLog<LogElement>::threadFunction()
template <typename LogElement>
void SystemLog<LogElement>::flush(bool quiet)
void SystemLog<LogElement>::flushImpl(bool quiet)
{
std::unique_lock lock(data_mutex);

View File

@ -1,5 +1,4 @@
#include <string.h>
#include <Poco/RegularExpression.h>
#include <Poco/Net/IPAddress.h>
#include <Poco/Net/SocketAddress.h>
@ -7,7 +6,6 @@
#include <Poco/Util/Application.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Poco/String.h>
#include <Common/Exception.h>
#include <IO/ReadHelpers.h>
#include <IO/HexWriteBuffer.h>
@ -16,12 +14,9 @@
#include <Common/SimpleCache.h>
#include <Common/StringUtils/StringUtils.h>
#include <Interpreters/Users.h>
#include <openssl/sha.h>
#include <common/logger_useful.h>
#include <ext/scope_guard.h>
#include <Common/config.h>
namespace DB

View File

@ -6,7 +6,7 @@
#include <Parsers/queryToString.h>
#include <Interpreters/InJoinSubqueriesPreprocessor.h>
#include <Interpreters/Context.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Storages/IStorage.h>
#include <Databases/IDatabase.h>
#include <Databases/DatabaseOrdinary.h>

View File

@ -3,7 +3,7 @@
#include <Parsers/parseQuery.h>
#include <Parsers/queryToString.h>
#include <Interpreters/LogicalExpressionsOptimizer.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Common/typeid_cast.h>
#include <iostream>

View File

@ -0,0 +1,70 @@
#include <Parsers/ASTColumnDeclaration.h>
namespace DB
{
ASTPtr ASTColumnDeclaration::clone() const
{
const auto res = std::make_shared<ASTColumnDeclaration>(*this);
res->children.clear();
if (type)
{
res->type = type;
res->children.push_back(res->type);
}
if (default_expression)
{
res->default_expression = default_expression->clone();
res->children.push_back(res->default_expression);
}
if (codec)
{
res->codec = codec->clone();
res->children.push_back(res->codec);
}
if (comment)
{
res->comment = comment->clone();
res->children.push_back(res->comment);
}
return res;
}
void ASTColumnDeclaration::formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
{
frame.need_parens = false;
std::string indent_str = settings.one_line ? "" : std::string(4 * frame.indent, ' ');
settings.ostr << settings.nl_or_ws << indent_str << backQuoteIfNeed(name);
if (type)
{
settings.ostr << ' ';
type->formatImpl(settings, state, frame);
}
if (default_expression)
{
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << default_specifier << (settings.hilite ? hilite_none : "") << ' ';
default_expression->formatImpl(settings, state, frame);
}
if (comment)
{
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << "COMMENT" << (settings.hilite ? hilite_none : "") << ' ';
comment->formatImpl(settings, state, frame);
}
if (codec)
{
settings.ostr << ' ';
codec->formatImpl(settings, state, frame);
}
}
}

View File

@ -20,68 +20,8 @@ public:
String getID(char delim) const override { return "ColumnDeclaration" + (delim + name); }
ASTPtr clone() const override
{
const auto res = std::make_shared<ASTColumnDeclaration>(*this);
res->children.clear();
if (type)
{
res->type = type;
res->children.push_back(res->type);
}
if (default_expression)
{
res->default_expression = default_expression->clone();
res->children.push_back(res->default_expression);
}
if (codec)
{
res->codec = codec->clone();
res->children.push_back(res->codec);
}
if (comment)
{
res->comment = comment->clone();
res->children.push_back(res->comment);
}
return res;
}
void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override
{
frame.need_parens = false;
std::string indent_str = settings.one_line ? "" : std::string(4 * frame.indent, ' ');
settings.ostr << settings.nl_or_ws << indent_str << backQuoteIfNeed(name);
if (type)
{
settings.ostr << ' ';
type->formatImpl(settings, state, frame);
}
if (default_expression)
{
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << default_specifier << (settings.hilite ? hilite_none : "") << ' ';
default_expression->formatImpl(settings, state, frame);
}
if (comment)
{
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << "COMMENT" << (settings.hilite ? hilite_none : "") << ' ';
comment->formatImpl(settings, state, frame);
}
if (codec)
{
settings.ostr << ' ';
codec->formatImpl(settings, state, frame);
}
}
ASTPtr clone() const override;
void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override;
};
}

View File

@ -0,0 +1,252 @@
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTSelectWithUnionQuery.h>
#include <Parsers/ASTCreateQuery.h>
namespace DB
{
ASTPtr ASTStorage::clone() const
{
auto res = std::make_shared<ASTStorage>(*this);
res->children.clear();
if (engine)
res->set(res->engine, engine->clone());
if (partition_by)
res->set(res->partition_by, partition_by->clone());
if (primary_key)
res->set(res->primary_key, primary_key->clone());
if (order_by)
res->set(res->order_by, order_by->clone());
if (sample_by)
res->set(res->sample_by, sample_by->clone());
if (settings)
res->set(res->settings, settings->clone());
return res;
}
void ASTStorage::formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const
{
if (engine)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ENGINE" << (s.hilite ? hilite_none : "") << " = ";
engine->formatImpl(s, state, frame);
}
if (partition_by)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PARTITION BY " << (s.hilite ? hilite_none : "");
partition_by->formatImpl(s, state, frame);
}
if (primary_key)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PRIMARY KEY " << (s.hilite ? hilite_none : "");
primary_key->formatImpl(s, state, frame);
}
if (order_by)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ORDER BY " << (s.hilite ? hilite_none : "");
order_by->formatImpl(s, state, frame);
}
if (sample_by)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SAMPLE BY " << (s.hilite ? hilite_none : "");
sample_by->formatImpl(s, state, frame);
}
if (settings)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SETTINGS " << (s.hilite ? hilite_none : "");
settings->formatImpl(s, state, frame);
}
}
class ASTColumnsElement : public IAST
{
public:
String prefix;
IAST * elem;
String getID(char c) const override { return "ASTColumnsElement for " + elem->getID(c); }
ASTPtr clone() const override;
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override;
};
ASTPtr ASTColumnsElement::clone() const
{
auto res = std::make_shared<ASTColumnsElement>();
res->prefix = prefix;
if (elem)
res->set(res->elem, elem->clone());
return res;
}
void ASTColumnsElement::formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const
{
if (!elem)
return;
if (prefix.empty())
{
elem->formatImpl(s, state, frame);
return;
}
frame.need_parens = false;
std::string indent_str = s.one_line ? "" : std::string(4 * frame.indent, ' ');
s.ostr << s.nl_or_ws << indent_str;
s.ostr << (s.hilite ? hilite_keyword : "") << prefix << (s.hilite ? hilite_none : "");
FormatSettings nested_settings = s;
nested_settings.one_line = true;
nested_settings.nl_or_ws = ' ';
elem->formatImpl(nested_settings, state, frame);
}
ASTPtr ASTColumns::clone() const
{
auto res = std::make_shared<ASTColumns>();
if (columns)
res->set(res->columns, columns->clone());
if (indices)
res->set(res->indices, indices->clone());
return res;
}
void ASTColumns::formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const
{
ASTExpressionList list;
if (columns)
{
for (const auto & column : columns->children)
{
auto elem = std::make_shared<ASTColumnsElement>();
elem->prefix = "";
elem->set(elem->elem, column->clone());
list.children.push_back(elem);
}
}
if (indices)
{
for (const auto & index : indices->children)
{
auto elem = std::make_shared<ASTColumnsElement>();
elem->prefix = "INDEX";
elem->set(elem->elem, index->clone());
list.children.push_back(elem);
}
}
if (!list.children.empty())
list.formatImpl(s, state, frame);
}
ASTPtr ASTCreateQuery::clone() const
{
auto res = std::make_shared<ASTCreateQuery>(*this);
res->children.clear();
if (columns_list)
res->set(res->columns_list, columns_list->clone());
if (storage)
res->set(res->storage, storage->clone());
if (select)
res->set(res->select, select->clone());
cloneOutputOptions(*res);
return res;
}
void ASTCreateQuery::formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
{
frame.need_parens = false;
if (!database.empty() && table.empty())
{
settings.ostr << (settings.hilite ? hilite_keyword : "")
<< (attach ? "ATTACH DATABASE " : "CREATE DATABASE ")
<< (if_not_exists ? "IF NOT EXISTS " : "")
<< (settings.hilite ? hilite_none : "")
<< backQuoteIfNeed(database);
formatOnCluster(settings);
if (storage)
storage->formatImpl(settings, state, frame);
return;
}
{
std::string what = "TABLE";
if (is_view)
what = "VIEW";
if (is_materialized_view)
what = "MATERIALIZED VIEW";
settings.ostr
<< (settings.hilite ? hilite_keyword : "")
<< (attach ? "ATTACH " : "CREATE ")
<< (temporary ? "TEMPORARY " : "")
<< (replace_view ? "OR REPLACE " : "")
<< what << " "
<< (if_not_exists ? "IF NOT EXISTS " : "")
<< (settings.hilite ? hilite_none : "")
<< (!database.empty() ? backQuoteIfNeed(database) + "." : "") << backQuoteIfNeed(table);
formatOnCluster(settings);
}
if (!to_table.empty())
{
settings.ostr
<< (settings.hilite ? hilite_keyword : "") << " TO " << (settings.hilite ? hilite_none : "")
<< (!to_database.empty() ? backQuoteIfNeed(to_database) + "." : "") << backQuoteIfNeed(to_table);
}
if (!as_table.empty())
{
settings.ostr
<< (settings.hilite ? hilite_keyword : "") << " AS " << (settings.hilite ? hilite_none : "")
<< (!as_database.empty() ? backQuoteIfNeed(as_database) + "." : "") << backQuoteIfNeed(as_table);
}
if (columns_list)
{
settings.ostr << (settings.one_line ? " (" : "\n(");
FormatStateStacked frame_nested = frame;
++frame_nested.indent;
columns_list->formatImpl(settings, state, frame_nested);
settings.ostr << (settings.one_line ? ")" : "\n)");
}
if (storage)
storage->formatImpl(settings, state, frame);
if (is_populate)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " POPULATE" << (settings.hilite ? hilite_none : "");
}
if (select)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " AS" << settings.nl_or_ws << (settings.hilite ? hilite_none : "");
select->formatImpl(settings, state, frame);
}
}
}

View File

@ -1,9 +1,5 @@
#pragma once
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTSelectWithUnionQuery.h>
#include <Parsers/ASTQueryWithTableAndOutput.h>
#include <Parsers/ASTQueryWithOnCluster.h>
@ -11,6 +7,9 @@
namespace DB
{
class ASTFunction;
class ASTSetQuery;
class ASTStorage : public IAST
{
public:
@ -23,154 +22,30 @@ public:
String getID(char) const override { return "Storage definition"; }
ASTPtr clone() const override
{
auto res = std::make_shared<ASTStorage>(*this);
res->children.clear();
ASTPtr clone() const override;
if (engine)
res->set(res->engine, engine->clone());
if (partition_by)
res->set(res->partition_by, partition_by->clone());
if (primary_key)
res->set(res->primary_key, primary_key->clone());
if (order_by)
res->set(res->order_by, order_by->clone());
if (sample_by)
res->set(res->sample_by, sample_by->clone());
if (settings)
res->set(res->settings, settings->clone());
return res;
}
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
{
if (engine)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ENGINE" << (s.hilite ? hilite_none : "") << " = ";
engine->formatImpl(s, state, frame);
}
if (partition_by)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PARTITION BY " << (s.hilite ? hilite_none : "");
partition_by->formatImpl(s, state, frame);
}
if (primary_key)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PRIMARY KEY " << (s.hilite ? hilite_none : "");
primary_key->formatImpl(s, state, frame);
}
if (order_by)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ORDER BY " << (s.hilite ? hilite_none : "");
order_by->formatImpl(s, state, frame);
}
if (sample_by)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SAMPLE BY " << (s.hilite ? hilite_none : "");
sample_by->formatImpl(s, state, frame);
}
if (settings)
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SETTINGS " << (s.hilite ? hilite_none : "");
settings->formatImpl(s, state, frame);
}
}
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override;
};
class ASTExpressionList;
class ASTColumns : public IAST
{
private:
class ASTColumnsElement : public IAST
{
public:
String prefix;
IAST * elem;
String getID(char c) const override { return "ASTColumnsElement for " + elem->getID(c); }
ASTPtr clone() const override
{
auto res = std::make_shared<ASTColumnsElement>();
res->prefix = prefix;
if (elem)
res->set(res->elem, elem->clone());
return res;
}
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
{
if (!elem)
return;
if (prefix.empty())
{
elem->formatImpl(s, state, frame);
return;
}
frame.need_parens = false;
std::string indent_str = s.one_line ? "" : std::string(4 * frame.indent, ' ');
s.ostr << s.nl_or_ws << indent_str;
s.ostr << (s.hilite ? hilite_keyword : "") << prefix << (s.hilite ? hilite_none : "");
FormatSettings nested_settings = s;
nested_settings.one_line = true;
nested_settings.nl_or_ws = ' ';
elem->formatImpl(nested_settings, state, frame);
}
};
public:
ASTExpressionList * columns = nullptr;
ASTExpressionList * indices = nullptr;
String getID(char) const override { return "Columns definition"; }
ASTPtr clone() const override
{
auto res = std::make_shared<ASTColumns>();
ASTPtr clone() const override;
if (columns)
res->set(res->columns, columns->clone());
if (indices)
res->set(res->indices, indices->clone());
return res;
}
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
{
ASTExpressionList list;
if (columns)
for (const auto & column : columns->children)
{
auto elem = std::make_shared<ASTColumnsElement>();
elem->prefix = "";
elem->set(elem->elem, column->clone());
list.children.push_back(elem);
}
if (indices)
for (const auto & index : indices->children)
{
auto elem = std::make_shared<ASTColumnsElement>();
elem->prefix = "INDEX";
elem->set(elem->elem, index->clone());
list.children.push_back(elem);
}
if (!list.children.empty())
list.formatImpl(s, state, frame);
}
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override;
};
class ASTSelectWithUnionQuery;
/// CREATE TABLE or ATTACH TABLE query
class ASTCreateQuery : public ASTQueryWithTableAndOutput, public ASTQueryWithOnCluster
{
@ -192,22 +67,7 @@ public:
/** Get the text that identifies this element. */
String getID(char delim) const override { return (attach ? "AttachQuery" : "CreateQuery") + (delim + database) + delim + table; }
ASTPtr clone() const override
{
auto res = std::make_shared<ASTCreateQuery>(*this);
res->children.clear();
if (columns_list)
res->set(res->columns_list, columns_list->clone());
if (storage)
res->set(res->storage, storage->clone());
if (select)
res->set(res->select, select->clone());
cloneOutputOptions(*res);
return res;
}
ASTPtr clone() const override;
ASTPtr getRewrittenASTWithoutOnCluster(const std::string & new_database) const override
{
@ -215,81 +75,7 @@ public:
}
protected:
void formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override
{
frame.need_parens = false;
if (!database.empty() && table.empty())
{
settings.ostr << (settings.hilite ? hilite_keyword : "")
<< (attach ? "ATTACH DATABASE " : "CREATE DATABASE ")
<< (if_not_exists ? "IF NOT EXISTS " : "")
<< (settings.hilite ? hilite_none : "")
<< backQuoteIfNeed(database);
formatOnCluster(settings);
if (storage)
storage->formatImpl(settings, state, frame);
return;
}
{
std::string what = "TABLE";
if (is_view)
what = "VIEW";
if (is_materialized_view)
what = "MATERIALIZED VIEW";
settings.ostr
<< (settings.hilite ? hilite_keyword : "")
<< (attach ? "ATTACH " : "CREATE ")
<< (temporary ? "TEMPORARY " : "")
<< (replace_view ? "OR REPLACE " : "")
<< what << " "
<< (if_not_exists ? "IF NOT EXISTS " : "")
<< (settings.hilite ? hilite_none : "")
<< (!database.empty() ? backQuoteIfNeed(database) + "." : "") << backQuoteIfNeed(table);
formatOnCluster(settings);
}
if (!to_table.empty())
{
settings.ostr
<< (settings.hilite ? hilite_keyword : "") << " TO " << (settings.hilite ? hilite_none : "")
<< (!to_database.empty() ? backQuoteIfNeed(to_database) + "." : "") << backQuoteIfNeed(to_table);
}
if (!as_table.empty())
{
settings.ostr
<< (settings.hilite ? hilite_keyword : "") << " AS " << (settings.hilite ? hilite_none : "")
<< (!as_database.empty() ? backQuoteIfNeed(as_database) + "." : "") << backQuoteIfNeed(as_table);
}
if (columns_list)
{
settings.ostr << (settings.one_line ? " (" : "\n(");
FormatStateStacked frame_nested = frame;
++frame_nested.indent;
columns_list->formatImpl(settings, state, frame_nested);
settings.ostr << (settings.one_line ? ")" : "\n)");
}
if (storage)
storage->formatImpl(settings, state, frame);
if (is_populate)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " POPULATE" << (settings.hilite ? hilite_none : "");
}
if (select)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " AS" << settings.nl_or_ws << (settings.hilite ? hilite_none : "");
select->formatImpl(settings, state, frame);
}
}
void formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override;
};
}

View File

@ -4,6 +4,8 @@
#include <Parsers/ASTIndexDeclaration.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTSelectWithUnionQuery.h>
#include <Parsers/ExpressionListParsers.h>
#include <Parsers/ParserCreateQuery.h>
#include <Parsers/ParserSelectWithUnionQuery.h>

View File

@ -427,7 +427,7 @@ ColumnsDescription ColumnsDescription::parse(const String & str)
const ColumnsDescription * ColumnsDescription::loadFromContext(const Context & context, const String & db, const String & table)
{
if (context.getSettingsRef().insert_sample_with_metadata)
if (context.getSettingsRef().input_format_defaults_for_omitted_fields)
{
if (context.isTableExist(db, table))
{

View File

@ -1,5 +1,7 @@
#include <Storages/Kafka/KafkaSettings.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTFunction.h>
#include <Common/Exception.h>

View File

@ -3,7 +3,7 @@
#include <Poco/Util/AbstractConfiguration.h>
#include <Core/Defines.h>
#include <Core/Types.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
namespace DB

View File

@ -317,7 +317,7 @@ bool KeyCondition::addCondition(const String & column, const Range & range)
/** Computes value of constant expression and its data type.
* Returns false, if expression isn't constant.
*/
static bool getConstant(const ASTPtr & expr, Block & block_with_constants, Field & out_value, DataTypePtr & out_type)
bool KeyCondition::getConstant(const ASTPtr & expr, Block & block_with_constants, Field & out_value, DataTypePtr & out_type)
{
String column_name = expr->getColumnName();

View File

@ -266,6 +266,11 @@ public:
*/
using MonotonicFunctionsChain = std::vector<FunctionBasePtr>;
/** Computes value of constant expression and its data type.
* Returns false, if expression isn't constant.
*/
static bool getConstant(
const ASTPtr & expr, Block & block_with_constants, Field & out_value, DataTypePtr & out_type);
static Block getBlockWithConstants(
const ASTPtr & query, const SyntaxAnalyzerResultPtr & syntax_analyzer_result, const Context & context);

View File

@ -0,0 +1,710 @@
#include <Storages/MergeTree/MergeTreeBloomFilterIndex.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/UTF8Helpers.h>
#include <DataTypes/DataTypesNumber.h>
#include <IO/WriteHelpers.h>
#include <IO/ReadHelpers.h>
#include <Interpreters/ExpressionActions.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Interpreters/SyntaxAnalyzer.h>
#include <Interpreters/QueryNormalizer.h>
#include <Storages/MergeTree/MergeTreeData.h>
#include <Storages/MergeTree/RPNBuilder.h>
#include <Parsers/ASTIdentifier.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTSubquery.h>
#include <Poco/Logger.h>
#include <boost/algorithm/string.hpp>
namespace DB
{
namespace ErrorCodes
{
extern const int INCORRECT_QUERY;
}
/// Adds all tokens from string to bloom filter.
static void stringToBloomFilter(
const char * data, size_t size, const std::unique_ptr<ITokenExtractor> & token_extractor, StringBloomFilter & bloom_filter)
{
size_t cur = 0;
size_t token_start = 0;
size_t token_len = 0;
while (cur < size && token_extractor->next(data, size, &cur, &token_start, &token_len))
bloom_filter.add(data + token_start, token_len);
}
/// Adds all tokens from like pattern string to bloom filter. (Because like pattern can contain `\%` and `\_`.)
static void likeStringToBloomFilter(
const String & data, const std::unique_ptr<ITokenExtractor> & token_extractor, StringBloomFilter & bloom_filter)
{
size_t cur = 0;
String token;
while (cur < data.size() && token_extractor->nextLike(data, &cur, token))
bloom_filter.add(token.c_str(), token.size());
}
MergeTreeBloomFilterIndexGranule::MergeTreeBloomFilterIndexGranule(const MergeTreeBloomFilterIndex & index)
: IMergeTreeIndexGranule()
, index(index)
, bloom_filters(
index.columns.size(), StringBloomFilter(index.bloom_filter_size, index.bloom_filter_hashes, index.seed))
, has_elems(false) {}
void MergeTreeBloomFilterIndexGranule::serializeBinary(WriteBuffer & ostr) const
{
if (empty())
throw Exception(
"Attempt to write empty minmax index `" + index.name + "`", ErrorCodes::LOGICAL_ERROR);
for (const auto & bloom_filter : bloom_filters)
ostr.write(reinterpret_cast<const char *>(bloom_filter.getFilter().data()), index.bloom_filter_size);
}
void MergeTreeBloomFilterIndexGranule::deserializeBinary(ReadBuffer & istr)
{
for (auto & bloom_filter : bloom_filters)
{
istr.read(reinterpret_cast<char *>(bloom_filter.getFilter().data()), index.bloom_filter_size);
}
has_elems = true;
}
MergeTreeBloomFilterIndexAggregator::MergeTreeBloomFilterIndexAggregator(const MergeTreeBloomFilterIndex & index)
: index(index), granule(std::make_shared<MergeTreeBloomFilterIndexGranule>(index)) {}
MergeTreeIndexGranulePtr MergeTreeBloomFilterIndexAggregator::getGranuleAndReset()
{
auto new_granule = std::make_shared<MergeTreeBloomFilterIndexGranule>(index);
new_granule.swap(granule);
return new_granule;
}
void MergeTreeBloomFilterIndexAggregator::update(const Block & block, size_t * pos, size_t limit)
{
if (*pos >= block.rows())
throw Exception(
"The provided position is not less than the number of block rows. Position: "
+ toString(*pos) + ", Block rows: " + toString(block.rows()) + ".", ErrorCodes::LOGICAL_ERROR);
size_t rows_read = std::min(limit, block.rows() - *pos);
for (size_t col = 0; col < index.columns.size(); ++col)
{
const auto & column = block.getByName(index.columns[col]).column;
for (size_t i = 0; i < rows_read; ++i)
{
auto ref = column->getDataAt(*pos + i);
stringToBloomFilter(ref.data, ref.size, index.token_extractor_func, granule->bloom_filters[col]);
}
}
granule->has_elems = true;
*pos += rows_read;
}
const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map
{
{
"notEquals",
[] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)
{
out.function = RPNElement::FUNCTION_NOT_EQUALS;
out.bloom_filter = std::make_unique<StringBloomFilter>(
idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed);
const auto & str = value.get<String>();
stringToBloomFilter(str.c_str(), str.size(), idx.token_extractor_func, *out.bloom_filter);
return true;
}
},
{
"equals",
[] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)
{
out.function = RPNElement::FUNCTION_EQUALS;
out.bloom_filter = std::make_unique<StringBloomFilter>(
idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed);
const auto & str = value.get<String>();
stringToBloomFilter(str.c_str(), str.size(), idx.token_extractor_func, *out.bloom_filter);
return true;
}
},
{
"like",
[] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)
{
out.function = RPNElement::FUNCTION_LIKE;
out.bloom_filter = std::make_unique<StringBloomFilter>(
idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed);
const auto & str = value.get<String>();
likeStringToBloomFilter(str, idx.token_extractor_func, *out.bloom_filter);
return true;
}
},
{
"notIn",
[] (RPNElement & out, const Field &, const MergeTreeBloomFilterIndex &)
{
out.function = RPNElement::FUNCTION_NOT_IN;
return true;
}
},
{
"in",
[] (RPNElement & out, const Field &, const MergeTreeBloomFilterIndex &)
{
out.function = RPNElement::FUNCTION_IN;
return true;
}
},
};
BloomFilterCondition::BloomFilterCondition(
const SelectQueryInfo & query_info,
const Context & context,
const MergeTreeBloomFilterIndex & index_) : index(index_), prepared_sets(query_info.sets)
{
rpn = std::move(
RPNBuilder<RPNElement>(
query_info, context,
[this] (const ASTPtr & node,
const Context & /* context */,
Block & block_with_constants,
RPNElement & out) -> bool
{
return this->atomFromAST(node, block_with_constants, out);
}).extractRPN());
}
bool BloomFilterCondition::alwaysUnknownOrTrue() const
{
/// Check like in KeyCondition.
std::vector<bool> rpn_stack;
for (const auto & element : rpn)
{
if (element.function == RPNElement::FUNCTION_UNKNOWN
|| element.function == RPNElement::ALWAYS_TRUE)
{
rpn_stack.push_back(true);
}
else if (element.function == RPNElement::FUNCTION_EQUALS
|| element.function == RPNElement::FUNCTION_NOT_EQUALS
|| element.function == RPNElement::FUNCTION_LIKE
|| element.function == RPNElement::FUNCTION_NOT_LIKE
|| element.function == RPNElement::FUNCTION_IN
|| element.function == RPNElement::FUNCTION_NOT_IN
|| element.function == RPNElement::ALWAYS_FALSE)
{
rpn_stack.push_back(false);
}
else if (element.function == RPNElement::FUNCTION_NOT)
{
// do nothing
}
else if (element.function == RPNElement::FUNCTION_AND)
{
auto arg1 = rpn_stack.back();
rpn_stack.pop_back();
auto arg2 = rpn_stack.back();
rpn_stack.back() = arg1 && arg2;
}
else if (element.function == RPNElement::FUNCTION_OR)
{
auto arg1 = rpn_stack.back();
rpn_stack.pop_back();
auto arg2 = rpn_stack.back();
rpn_stack.back() = arg1 || arg2;
}
else
throw Exception("Unexpected function type in KeyCondition::RPNElement", ErrorCodes::LOGICAL_ERROR);
}
return rpn_stack[0];
}
bool BloomFilterCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const
{
std::shared_ptr<MergeTreeBloomFilterIndexGranule> granule
= std::dynamic_pointer_cast<MergeTreeBloomFilterIndexGranule>(idx_granule);
if (!granule)
throw Exception(
"BloomFilter index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR);
/// Check like in KeyCondition.
std::vector<BoolMask> rpn_stack;
for (const auto & element : rpn)
{
if (element.function == RPNElement::FUNCTION_UNKNOWN)
{
rpn_stack.emplace_back(true, true);
}
else if (element.function == RPNElement::FUNCTION_EQUALS
|| element.function == RPNElement::FUNCTION_NOT_EQUALS)
{
rpn_stack.emplace_back(
granule->bloom_filters[element.key_column].contains(*element.bloom_filter), true);
if (element.function == RPNElement::FUNCTION_NOT_EQUALS)
rpn_stack.back() = !rpn_stack.back();
}
else if (element.function == RPNElement::FUNCTION_LIKE
|| element.function == RPNElement::FUNCTION_NOT_LIKE)
{
rpn_stack.emplace_back(
granule->bloom_filters[element.key_column].contains(*element.bloom_filter), true);
if (element.function == RPNElement::FUNCTION_NOT_LIKE)
rpn_stack.back() = !rpn_stack.back();
}
else if (element.function == RPNElement::FUNCTION_IN
|| element.function == RPNElement::FUNCTION_NOT_IN)
{
std::vector<bool> result(element.set_bloom_filters.back().size(), true);
for (size_t column = 0; column < element.set_key_position.size(); ++column)
{
const size_t key_idx = element.set_key_position[column];
const auto & bloom_filters = element.set_bloom_filters[column];
for (size_t row = 0; row < bloom_filters.size(); ++row)
result[row] = result[row] && granule->bloom_filters[key_idx].contains(bloom_filters[row]);
}
rpn_stack.emplace_back(
std::find(std::cbegin(result), std::cend(result), true) != std::end(result), true);
if (element.function == RPNElement::FUNCTION_NOT_IN)
rpn_stack.back() = !rpn_stack.back();
}
else if (element.function == RPNElement::FUNCTION_NOT)
{
rpn_stack.back() = !rpn_stack.back();
}
else if (element.function == RPNElement::FUNCTION_AND)
{
auto arg1 = rpn_stack.back();
rpn_stack.pop_back();
auto arg2 = rpn_stack.back();
rpn_stack.back() = arg1 & arg2;
}
else if (element.function == RPNElement::FUNCTION_OR)
{
auto arg1 = rpn_stack.back();
rpn_stack.pop_back();
auto arg2 = rpn_stack.back();
rpn_stack.back() = arg1 | arg2;
}
else if (element.function == RPNElement::ALWAYS_FALSE)
{
rpn_stack.emplace_back(false, true);
}
else if (element.function == RPNElement::ALWAYS_TRUE)
{
rpn_stack.emplace_back(true, false);
}
else
throw Exception("Unexpected function type in KeyCondition::RPNElement", ErrorCodes::LOGICAL_ERROR);
}
if (rpn_stack.size() != 1)
throw Exception("Unexpected stack size in KeyCondition::mayBeTrueInRange", ErrorCodes::LOGICAL_ERROR);
return rpn_stack[0].can_be_true;
}
bool BloomFilterCondition::getKey(const ASTPtr & node, size_t & key_column_num)
{
auto it = std::find(index.columns.begin(), index.columns.end(), node->getColumnName());
if (it == index.columns.end())
return false;
key_column_num = static_cast<size_t>(it - index.columns.begin());
return true;
}
bool BloomFilterCondition::atomFromAST(
const ASTPtr & node, Block & block_with_constants, RPNElement & out)
{
Field const_value;
DataTypePtr const_type;
if (const auto * func = typeid_cast<const ASTFunction *>(node.get()))
{
const ASTs & args = typeid_cast<const ASTExpressionList &>(*func->arguments).children;
if (args.size() != 2)
return false;
size_t key_arg_pos; /// Position of argument with key column (non-const argument)
size_t key_column_num = -1; /// Number of a key column (inside key_column_names array)
if (functionIsInOrGlobalInOperator(func->name) && tryPrepareSetBloomFilter(args, out))
{
key_arg_pos = 0;
}
else if (KeyCondition::getConstant(args[1], block_with_constants, const_value, const_type) && getKey(args[0], key_column_num))
{
key_arg_pos = 0;
}
else if (KeyCondition::getConstant(args[0], block_with_constants, const_value, const_type) && getKey(args[1], key_column_num))
{
key_arg_pos = 1;
}
else
return false;
if (const_type && const_type->getTypeId() != TypeIndex::String && const_type->getTypeId() != TypeIndex::FixedString)
return false;
if (key_arg_pos == 1 && (func->name != "equals" || func->name != "notEquals"))
return false;
else if (!index.token_extractor_func->supportLike() && (func->name == "like" || func->name == "notLike"))
return false;
else
key_arg_pos = 0;
const auto atom_it = atom_map.find(func->name);
if (atom_it == std::end(atom_map))
return false;
out.key_column = key_column_num;
return atom_it->second(out, const_value, index);
}
else if (KeyCondition::getConstant(node, block_with_constants, const_value, const_type))
{
/// Check constant like in KeyCondition
if (const_value.getType() == Field::Types::UInt64
|| const_value.getType() == Field::Types::Int64
|| const_value.getType() == Field::Types::Float64)
{
/// Zero in all types is represented in memory the same way as in UInt64.
out.function = const_value.get<UInt64>()
? RPNElement::ALWAYS_TRUE
: RPNElement::ALWAYS_FALSE;
return true;
}
}
return false;
}
bool BloomFilterCondition::tryPrepareSetBloomFilter(
const ASTs & args,
RPNElement & out)
{
const ASTPtr & left_arg = args[0];
const ASTPtr & right_arg = args[1];
std::vector<KeyTuplePositionMapping> key_tuple_mapping;
DataTypes data_types;
const auto * left_arg_tuple = typeid_cast<const ASTFunction *>(left_arg.get());
if (left_arg_tuple && left_arg_tuple->name == "tuple")
{
const auto & tuple_elements = left_arg_tuple->arguments->children;
for (size_t i = 0; i < tuple_elements.size(); ++i)
{
size_t key = 0;
if (getKey(tuple_elements[i], key))
{
key_tuple_mapping.emplace_back(i, key);
data_types.push_back(index.data_types[key]);
}
}
}
else
{
size_t key = 0;
if (getKey(left_arg, key))
{
key_tuple_mapping.emplace_back(0, key);
data_types.push_back(index.data_types[key]);
}
}
if (key_tuple_mapping.empty())
return false;
PreparedSetKey set_key;
if (typeid_cast<const ASTSubquery *>(right_arg.get()) || typeid_cast<const ASTIdentifier *>(right_arg.get()))
set_key = PreparedSetKey::forSubquery(*right_arg);
else
set_key = PreparedSetKey::forLiteral(*right_arg, data_types);
auto set_it = prepared_sets.find(set_key);
if (set_it == prepared_sets.end())
return false;
const SetPtr & prepared_set = set_it->second;
if (!prepared_set->hasExplicitSetElements())
return false;
for (const auto & data_type : prepared_set->getDataTypes())
if (data_type->getTypeId() != TypeIndex::String && data_type->getTypeId() != TypeIndex::FixedString)
return false;
std::vector<std::vector<StringBloomFilter>> bloom_filters;
std::vector<size_t> key_position;
const auto & columns = prepared_set->getSetElements();
for (size_t col = 0; col < key_tuple_mapping.size(); ++col)
{
bloom_filters.emplace_back();
key_position.push_back(key_tuple_mapping[col].key_index);
size_t tuple_idx = key_tuple_mapping[col].tuple_index;
const auto & column = columns[tuple_idx];
for (size_t row = 0; row < prepared_set->getTotalRowCount(); ++row)
{
bloom_filters.back().emplace_back(index.bloom_filter_size, index.bloom_filter_hashes, index.seed);
auto ref = column->getDataAt(row);
stringToBloomFilter(ref.data, ref.size, index.token_extractor_func, bloom_filters.back().back());
}
}
out.set_key_position = std::move(key_position);
out.set_bloom_filters = std::move(bloom_filters);
return true;
}
MergeTreeIndexGranulePtr MergeTreeBloomFilterIndex::createIndexGranule() const
{
return std::make_shared<MergeTreeBloomFilterIndexGranule>(*this);
}
MergeTreeIndexAggregatorPtr MergeTreeBloomFilterIndex::createIndexAggregator() const
{
return std::make_shared<MergeTreeBloomFilterIndexAggregator>(*this);
}
IndexConditionPtr MergeTreeBloomFilterIndex::createIndexCondition(
const SelectQueryInfo & query, const Context & context) const
{
return std::make_shared<BloomFilterCondition>(query, context, *this);
};
bool MergeTreeBloomFilterIndex::mayBenefitFromIndexForIn(const ASTPtr & node) const
{
return std::find(std::cbegin(columns), std::cend(columns), node->getColumnName()) != std::cend(columns);
}
bool NgramTokenExtractor::next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const
{
*token_start = *pos;
*token_len = 0;
size_t code_points = 0;
for (; code_points < n && *token_start + *token_len < len; ++code_points)
{
size_t sz = UTF8::seqLength(static_cast<UInt8>(data[*token_start + *token_len]));
*token_len += sz;
}
*pos += UTF8::seqLength(static_cast<UInt8>(data[*pos]));
return code_points == n;
}
bool NgramTokenExtractor::nextLike(const String & str, size_t * pos, String & token) const
{
token.clear();
size_t code_points = 0;
bool escaped = false;
for (size_t i = *pos; i < str.size();)
{
if (escaped && (str[i] == '%' || str[i] == '_' || str[i] == '\\'))
{
token += str[i];
++code_points;
escaped = false;
++i;
}
else if (!escaped && (str[i] == '%' || str[i] == '_'))
{
/// This token is too small, go to the next.
token.clear();
code_points = 0;
escaped = false;
*pos = ++i;
}
else if (!escaped && str[i] == '\\')
{
escaped = true;
++i;
}
else
{
const size_t sz = UTF8::seqLength(static_cast<UInt8>(str[i]));
for (size_t j = 0; j < sz; ++j)
token += str[i + j];
i += sz;
++code_points;
escaped = false;
}
if (code_points == n)
{
*pos += UTF8::seqLength(static_cast<UInt8>(str[*pos]));
return true;
}
}
return false;
}
bool SplitTokenExtractor::next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const
{
*token_start = *pos;
*token_len = 0;
while (*pos < len)
{
if (isASCII(data[*pos]) && !isAlphaNumericASCII(data[*pos]))
{
if (*token_len > 0)
return true;
*token_start = ++*pos;
}
else
{
const size_t sz = UTF8::seqLength(static_cast<UInt8>(data[*pos]));
*pos += sz;
*token_len += sz;
}
}
return *token_len > 0;
}
bool SplitTokenExtractor::nextLike(const String & str, size_t * pos, String & token) const
{
token.clear();
bool bad_token = false; // % or _ before token
bool escaped = false;
while (*pos < str.size())
{
if (!escaped && (str[*pos] == '%' || str[*pos] == '_'))
{
token.clear();
bad_token = true;
++*pos;
}
else if (!escaped && str[*pos] == '\\')
{
escaped = true;
++*pos;
}
else if (isASCII(str[*pos]) && !isAlphaNumericASCII(str[*pos]))
{
if (!bad_token && !token.empty())
return true;
token.clear();
bad_token = false;
escaped = false;
++*pos;
}
else
{
const size_t sz = UTF8::seqLength(static_cast<UInt8>(str[*pos]));
for (size_t j = 0; j < sz; ++j)
{
token += str[*pos];
++*pos;
}
escaped = false;
}
}
return !bad_token && !token.empty();
}
std::unique_ptr<IMergeTreeIndex> bloomFilterIndexCreator(
const NamesAndTypesList & new_columns,
std::shared_ptr<ASTIndexDeclaration> node,
const Context & context)
{
if (node->name.empty())
throw Exception("Index must have unique name", ErrorCodes::INCORRECT_QUERY);
ASTPtr expr_list = MergeTreeData::extractKeyExpressionList(node->expr->clone());
auto syntax = SyntaxAnalyzer(context, {}).analyze(
expr_list, new_columns);
auto index_expr = ExpressionAnalyzer(expr_list, syntax, context).getActions(false);
auto sample = ExpressionAnalyzer(expr_list, syntax, context)
.getActions(true)->getSampleBlock();
Names columns;
DataTypes data_types;
for (size_t i = 0; i < expr_list->children.size(); ++i)
{
const auto & column = sample.getByPosition(i);
columns.emplace_back(column.name);
data_types.emplace_back(column.type);
if (data_types.back()->getTypeId() != TypeIndex::String
&& data_types.back()->getTypeId() != TypeIndex::FixedString)
throw Exception("Bloom filter index can be used only with `String` or `FixedString` column.", ErrorCodes::INCORRECT_QUERY);
}
boost::algorithm::to_lower(node->type->name);
if (node->type->name == NgramTokenExtractor::getName())
{
if (!node->type->arguments || node->type->arguments->children.size() != 4)
throw Exception("`ngrambf` index must have exactly 4 arguments.", ErrorCodes::INCORRECT_QUERY);
size_t n = typeid_cast<const ASTLiteral &>(
*node->type->arguments->children[0]).value.get<size_t>();
size_t bloom_filter_size = typeid_cast<const ASTLiteral &>(
*node->type->arguments->children[1]).value.get<size_t>();
size_t bloom_filter_hashes = typeid_cast<const ASTLiteral &>(
*node->type->arguments->children[2]).value.get<size_t>();
size_t seed = typeid_cast<const ASTLiteral &>(
*node->type->arguments->children[3]).value.get<size_t>();
auto tokenizer = std::make_unique<NgramTokenExtractor>(n);
return std::make_unique<MergeTreeBloomFilterIndex>(
node->name, std::move(index_expr), columns, data_types, sample, node->granularity,
bloom_filter_size, bloom_filter_hashes, seed, std::move(tokenizer));
}
else if (node->type->name == SplitTokenExtractor::getName())
{
if (!node->type->arguments || node->type->arguments->children.size() != 3)
throw Exception("`tokenbf` index must have exactly 3 arguments.", ErrorCodes::INCORRECT_QUERY);
size_t bloom_filter_size = typeid_cast<const ASTLiteral &>(
*node->type->arguments->children[0]).value.get<size_t>();
size_t bloom_filter_hashes = typeid_cast<const ASTLiteral &>(
*node->type->arguments->children[1]).value.get<size_t>();
size_t seed = typeid_cast<const ASTLiteral &>(
*node->type->arguments->children[2]).value.get<size_t>();
auto tokenizer = std::make_unique<SplitTokenExtractor>();
return std::make_unique<MergeTreeBloomFilterIndex>(
node->name, std::move(index_expr), columns, data_types, sample, node->granularity,
bloom_filter_size, bloom_filter_hashes, seed, std::move(tokenizer));
}
else
{
throw Exception("Unknown index type: `" + node->name + "`.", ErrorCodes::LOGICAL_ERROR);
}
}
}

View File

@ -0,0 +1,207 @@
#pragma once
#include <Interpreters/BloomFilter.h>
#include <Storages/MergeTree/MergeTreeIndices.h>
#include <Storages/MergeTree/KeyCondition.h>
#include <memory>
namespace DB
{
class MergeTreeBloomFilterIndex;
struct MergeTreeBloomFilterIndexGranule : public IMergeTreeIndexGranule
{
explicit MergeTreeBloomFilterIndexGranule(
const MergeTreeBloomFilterIndex & index);
~MergeTreeBloomFilterIndexGranule() override = default;
void serializeBinary(WriteBuffer & ostr) const override;
void deserializeBinary(ReadBuffer & istr) override;
bool empty() const override { return !has_elems; }
const MergeTreeBloomFilterIndex & index;
std::vector<StringBloomFilter> bloom_filters;
bool has_elems;
};
using MergeTreeBloomFilterIndexGranulePtr = std::shared_ptr<MergeTreeBloomFilterIndexGranule>;
struct MergeTreeBloomFilterIndexAggregator : IMergeTreeIndexAggregator
{
explicit MergeTreeBloomFilterIndexAggregator(const MergeTreeBloomFilterIndex & index);
~MergeTreeBloomFilterIndexAggregator() override = default;
bool empty() const override { return !granule || granule->empty(); }
MergeTreeIndexGranulePtr getGranuleAndReset() override;
void update(const Block & block, size_t * pos, size_t limit) override;
const MergeTreeBloomFilterIndex & index;
MergeTreeBloomFilterIndexGranulePtr granule;
};
class BloomFilterCondition : public IIndexCondition
{
public:
BloomFilterCondition(
const SelectQueryInfo & query_info,
const Context & context,
const MergeTreeBloomFilterIndex & index_);
~BloomFilterCondition() override = default;
bool alwaysUnknownOrTrue() const override;
bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const override;
private:
struct KeyTuplePositionMapping
{
KeyTuplePositionMapping(size_t tuple_index_, size_t key_index_) : tuple_index(tuple_index_), key_index(key_index_) {}
size_t tuple_index;
size_t key_index;
};
/// Uses RPN like KeyCondition
struct RPNElement
{
enum Function
{
/// Atoms of a Boolean expression.
FUNCTION_EQUALS,
FUNCTION_NOT_EQUALS,
FUNCTION_LIKE,
FUNCTION_NOT_LIKE,
FUNCTION_IN,
FUNCTION_NOT_IN,
FUNCTION_UNKNOWN, /// Can take any value.
/// Operators of the logical expression.
FUNCTION_NOT,
FUNCTION_AND,
FUNCTION_OR,
/// Constants
ALWAYS_FALSE,
ALWAYS_TRUE,
};
RPNElement(
Function function_ = FUNCTION_UNKNOWN, size_t key_column_ = 0, std::unique_ptr<StringBloomFilter> && const_bloom_filter_ = nullptr)
: function(function_), key_column(key_column_), bloom_filter(std::move(const_bloom_filter_)) {}
Function function = FUNCTION_UNKNOWN;
/// For FUNCTION_EQUALS, FUNCTION_NOT_EQUALS, FUNCTION_LIKE, FUNCTION_NOT_LIKE.
size_t key_column;
std::unique_ptr<StringBloomFilter> bloom_filter;
/// For FUNCTION_IN and FUNCTION_NOT_IN
std::vector<std::vector<StringBloomFilter>> set_bloom_filters;
std::vector<size_t> set_key_position;
};
using AtomMap = std::unordered_map<std::string, bool(*)(RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)>;
using RPN = std::vector<RPNElement>;
bool atomFromAST(const ASTPtr & node, Block & block_with_constants, RPNElement & out);
bool getKey(const ASTPtr & node, size_t & key_column_num);
bool tryPrepareSetBloomFilter(const ASTs & args, RPNElement & out);
static const AtomMap atom_map;
const MergeTreeBloomFilterIndex & index;
RPN rpn;
/// Sets from syntax analyzer.
PreparedSets prepared_sets;
};
/// Interface for string parsers.
struct ITokenExtractor
{
virtual ~ITokenExtractor() = default;
/// Fast inplace implementation for regular use.
/// Gets string (data ptr and len) and start position for extracting next token (state of extractor).
/// Returns false if parsing is finished, otherwise returns true.
virtual bool next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const = 0;
/// Special implementation for creating bloom filter for LIKE function.
/// It skips unescaped `%` and `_` and supports escaping symbols, but it is less lightweight.
virtual bool nextLike(const String & str, size_t * pos, String & out) const = 0;
virtual bool supportLike() const = 0;
};
/// Parser extracting all ngrams from string.
struct NgramTokenExtractor : public ITokenExtractor
{
NgramTokenExtractor(size_t n_) : n(n_) {}
static String getName() { return "ngrambf_v1"; }
bool next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const override;
bool nextLike(const String & str, size_t * pos, String & token) const override;
bool supportLike() const override { return true; }
size_t n;
};
/// Parser extracting tokens (sequences of numbers and ascii letters).
struct SplitTokenExtractor : public ITokenExtractor
{
static String getName() { return "tokenbf_v1"; }
bool next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const override;
bool nextLike(const String & str, size_t * pos, String & token) const override;
bool supportLike() const override { return true; }
};
class MergeTreeBloomFilterIndex : public IMergeTreeIndex
{
public:
MergeTreeBloomFilterIndex(
String name_,
ExpressionActionsPtr expr_,
const Names & columns_,
const DataTypes & data_types_,
const Block & header_,
size_t granularity_,
size_t bloom_filter_size_,
size_t bloom_filter_hashes_,
size_t seed_,
std::unique_ptr<ITokenExtractor> && token_extractor_func_)
: IMergeTreeIndex(name_, expr_, columns_, data_types_, header_, granularity_)
, bloom_filter_size(bloom_filter_size_)
, bloom_filter_hashes(bloom_filter_hashes_)
, seed(seed_)
, token_extractor_func(std::move(token_extractor_func_)) {}
~MergeTreeBloomFilterIndex() override = default;
MergeTreeIndexGranulePtr createIndexGranule() const override;
MergeTreeIndexAggregatorPtr createIndexAggregator() const override;
IndexConditionPtr createIndexCondition(
const SelectQueryInfo & query, const Context & context) const override;
bool mayBenefitFromIndexForIn(const ASTPtr & node) const override;
/// Bloom filter size in bytes.
size_t bloom_filter_size;
/// Number of bloom filter hash functions.
size_t bloom_filter_hashes;
/// Bloom filter seed.
size_t seed;
/// Fucntion for selecting next token.
std::unique_ptr<ITokenExtractor> token_extractor_func;
};
}

View File

@ -223,8 +223,8 @@ static void checkKeyExpression(const ExpressionActions & expr, const Block & sam
void MergeTreeData::setPrimaryKeyIndicesAndColumns(
const ASTPtr &new_order_by_ast, ASTPtr new_primary_key_ast,
const ColumnsDescription &new_columns, const IndicesDescription &indices_description, bool only_check)
const ASTPtr & new_order_by_ast, const ASTPtr & new_primary_key_ast,
const ColumnsDescription & new_columns, const IndicesDescription & indices_description, bool only_check)
{
if (!new_order_by_ast)
throw Exception("ORDER BY cannot be empty", ErrorCodes::BAD_ARGUMENTS);
@ -2517,14 +2517,22 @@ bool MergeTreeData::mayBenefitFromIndexForIn(const ASTPtr & left_in_operand) con
if (left_in_operand_tuple && left_in_operand_tuple->name == "tuple")
{
for (const auto & item : left_in_operand_tuple->arguments->children)
{
if (isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(item))
return true;
for (const auto & index : skip_indices)
if (index->mayBenefitFromIndexForIn(item))
return true;
}
/// The tuple itself may be part of the primary key, so check that as a last resort.
return isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(left_in_operand);
}
else
{
for (const auto & index : skip_indices)
if (index->mayBenefitFromIndexForIn(left_in_operand))
return true;
return isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(left_in_operand);
}
}

View File

@ -732,9 +732,9 @@ private:
/// The same for clearOldTemporaryDirectories.
std::mutex clear_old_temporary_directories_mutex;
void setPrimaryKeyIndicesAndColumns(const ASTPtr &new_order_by_ast, ASTPtr new_primary_key_ast,
const ColumnsDescription &new_columns,
const IndicesDescription &indices_description, bool only_check = false);
void setPrimaryKeyIndicesAndColumns(const ASTPtr & new_order_by_ast, const ASTPtr & new_primary_key_ast,
const ColumnsDescription & new_columns,
const IndicesDescription & indices_description, bool only_check = false);
void initPartitionKey();

View File

@ -65,10 +65,18 @@ std::unique_ptr<IMergeTreeIndex> setIndexCreator(
std::shared_ptr<ASTIndexDeclaration> node,
const Context & context);
std::unique_ptr<IMergeTreeIndex> bloomFilterIndexCreator(
const NamesAndTypesList & columns,
std::shared_ptr<ASTIndexDeclaration> node,
const Context & context);
MergeTreeIndexFactory::MergeTreeIndexFactory()
{
registerIndex("minmax", minmaxIndexCreator);
registerIndex("set", setIndexCreator);
registerIndex("ngrambf_v1", bloomFilterIndexCreator);
registerIndex("tokenbf_v1", bloomFilterIndexCreator);
}
}

View File

@ -95,6 +95,9 @@ public:
/// gets filename without extension
String getFileName() const { return INDEX_FILE_PREFIX + name; }
/// Checks whether the column is in data skipping index.
virtual bool mayBenefitFromIndexForIn(const ASTPtr & node) const = 0;
virtual MergeTreeIndexGranulePtr createIndexGranule() const = 0;
virtual MergeTreeIndexAggregatorPtr createIndexAggregator() const = 0;

View File

@ -133,6 +133,20 @@ IndexConditionPtr MergeTreeMinMaxIndex::createIndexCondition(
return std::make_shared<MinMaxCondition>(query, context, *this);
};
bool MergeTreeMinMaxIndex::mayBenefitFromIndexForIn(const ASTPtr & node) const
{
const String column_name = node->getColumnName();
for (const auto & name : columns)
if (column_name == name)
return true;
if (const auto * func = typeid_cast<const ASTFunction *>(node.get()))
if (func->arguments->children.size() == 1)
return mayBenefitFromIndexForIn(func->arguments->children.front());
return false;
}
std::unique_ptr<IMergeTreeIndex> minmaxIndexCreator(
const NamesAndTypesList & new_columns,

View File

@ -82,6 +82,7 @@ public:
IndexConditionPtr createIndexCondition(
const SelectQueryInfo & query, const Context & context) const override;
bool mayBenefitFromIndexForIn(const ASTPtr & node) const override;
};
}

View File

@ -462,11 +462,16 @@ IndexConditionPtr MergeTreeSetSkippingIndex::createIndexCondition(
return std::make_shared<SetIndexCondition>(query, context, *this);
};
bool MergeTreeSetSkippingIndex::mayBenefitFromIndexForIn(const ASTPtr &) const
{
return false;
}
std::unique_ptr<IMergeTreeIndex> setIndexCreator(
const NamesAndTypesList & new_columns,
std::shared_ptr<ASTIndexDeclaration> node,
const Context & context)
const NamesAndTypesList & new_columns,
std::shared_ptr<ASTIndexDeclaration> node,
const Context & context)
{
if (node->name.empty())
throw Exception("Index must have unique name", ErrorCodes::INCORRECT_QUERY);

View File

@ -112,6 +112,8 @@ public:
IndexConditionPtr createIndexCondition(
const SelectQueryInfo & query, const Context & context) const override;
bool mayBenefitFromIndexForIn(const ASTPtr & node) const override;
size_t max_rows = 0;
};

View File

@ -1,5 +1,7 @@
#include <Storages/MergeTree/MergeTreeSettings.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTFunction.h>
#include <Common/Exception.h>

View File

@ -3,7 +3,7 @@
#include <Poco/Util/AbstractConfiguration.h>
#include <Core/Defines.h>
#include <Core/Types.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
namespace DB

View File

@ -0,0 +1,130 @@
#pragma once
#include <Common/typeid_cast.h>
#include <Core/Block.h>
#include <DataTypes/DataTypesNumber.h>
#include <Interpreters/Context.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTFunction.h>
#include <Storages/SelectQueryInfo.h>
#include <Storages/MergeTree/KeyCondition.h>
namespace DB
{
/// Builds reverse polish notation
template <typename RPNElement>
class RPNBuilder
{
public:
using RPN = std::vector<RPNElement>;
using AtomFromASTFunc = std::function<
bool(const ASTPtr & node, const Context & context, Block & block_with_constants, RPNElement & out)>;
RPNBuilder(
const SelectQueryInfo & query_info,
const Context & context_,
const AtomFromASTFunc & atomFromAST_)
: context(context_), atomFromAST(atomFromAST_)
{
/** Evaluation of expressions that depend only on constants.
* For the index to be used, if it is written, for example `WHERE Date = toDate(now())`.
*/
block_with_constants = KeyCondition::getBlockWithConstants(query_info.query, query_info.syntax_analyzer_result, context);
/// Trasform WHERE section to Reverse Polish notation
const ASTSelectQuery & select = typeid_cast<const ASTSelectQuery &>(*query_info.query);
if (select.where_expression)
{
traverseAST(select.where_expression);
if (select.prewhere_expression)
{
traverseAST(select.prewhere_expression);
rpn.emplace_back(RPNElement::FUNCTION_AND);
}
}
else if (select.prewhere_expression)
{
traverseAST(select.prewhere_expression);
}
else
{
rpn.emplace_back(RPNElement::FUNCTION_UNKNOWN);
}
}
RPN && extractRPN() { return std::move(rpn); }
private:
void traverseAST(const ASTPtr & node)
{
RPNElement element;
if (ASTFunction * func = typeid_cast<ASTFunction *>(&*node))
{
if (operatorFromAST(func, element))
{
auto & args = typeid_cast<ASTExpressionList &>(*func->arguments).children;
for (size_t i = 0, size = args.size(); i < size; ++i)
{
traverseAST(args[i]);
/** The first part of the condition is for the correct support of `and` and `or` functions of arbitrary arity
* - in this case `n - 1` elements are added (where `n` is the number of arguments).
*/
if (i != 0 || element.function == RPNElement::FUNCTION_NOT)
rpn.emplace_back(std::move(element));
}
return;
}
}
if (!atomFromAST(node, context, block_with_constants, element))
{
element.function = RPNElement::FUNCTION_UNKNOWN;
}
rpn.emplace_back(std::move(element));
}
bool operatorFromAST(const ASTFunction * func, RPNElement & out)
{
/// Functions AND, OR, NOT.
/** Also a special function `indexHint` - works as if instead of calling a function there are just parentheses
* (or, the same thing - calling the function `and` from one argument).
*/
const ASTs & args = typeid_cast<const ASTExpressionList &>(*func->arguments).children;
if (func->name == "not")
{
if (args.size() != 1)
return false;
out.function = RPNElement::FUNCTION_NOT;
}
else
{
if (func->name == "and" || func->name == "indexHint")
out.function = RPNElement::FUNCTION_AND;
else if (func->name == "or")
out.function = RPNElement::FUNCTION_OR;
else
return false;
}
return true;
}
const Context & context;
const AtomFromASTFunc & atomFromAST;
Block block_with_constants;
RPN rpn;
};
};

View File

@ -33,8 +33,6 @@ ReplicatedMergeTreeAlterThread::ReplicatedMergeTreeAlterThread(StorageReplicated
void ReplicatedMergeTreeAlterThread::run()
{
bool force_recheck_parts = true;
try
{
/** We have a description of columns in ZooKeeper, common for all replicas (Example: /clickhouse/tables/02-06/visits/columns),

View File

@ -36,6 +36,7 @@ private:
String log_name;
Logger * log;
BackgroundSchedulePool::TaskHolder task;
bool force_recheck_parts = true;
};
}

View File

@ -6,7 +6,7 @@
#include <Common/SimpleIncrement.h>
#include <Client/ConnectionPool.h>
#include <Client/ConnectionPoolWithFailover.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/Cluster.h>
#include <Interpreters/ExpressionActions.h>
#include <Parsers/ASTFunction.h>

View File

@ -2,6 +2,7 @@
#include <Storages/StorageFactory.h>
#include <Interpreters/Join.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTIdentifier.h>
#include <Core/ColumnNumbers.h>
#include <DataStreams/IBlockInputStream.h>

View File

@ -22,7 +22,7 @@
#include <Columns/ColumnString.h>
#include <Common/typeid_cast.h>
#include <Databases/IDatabase.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
#include <DataStreams/MaterializingBlockInputStream.h>
#include <DataStreams/FilterBlockInputStream.h>
#include <ext/range.h>

View File

@ -6,7 +6,7 @@
#include <Storages/transformQueryForExternalDatabase.h>
#include <Formats/MySQLBlockInputStream.h>
#include <Interpreters/evaluateConstantExpression.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/Context.h>
#include <DataStreams/IBlockOutputStream.h>
#include <Formats/FormatFactory.h>

View File

@ -3,6 +3,7 @@
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSubquery.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ASTSelectWithUnionQuery.h>
#include <Storages/StorageView.h>
#include <Storages/StorageFactory.h>

Some files were not shown because too many files have changed in this diff Show More