mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-10-04 15:40:49 +00:00
Merge branch 'master' of github.com:yandex/ClickHouse
commit 05f0a95e45
3
.gitmodules
vendored
@ -76,3 +76,6 @@
[submodule "contrib/brotli"]
path = contrib/brotli
url = https://github.com/google/brotli.git
[submodule "contrib/hyperscan"]
path = contrib/hyperscan
url = https://github.com/ClickHouse-Extras/hyperscan.git

@ -1,3 +1,8 @@
## ClickHouse release 19.4.1.3, 2019-03-19

### Bug Fixes
* Fixed remote queries that contain both `LIMIT BY` and `LIMIT`. Previously, if `LIMIT BY` and `LIMIT` were used in a remote query, `LIMIT` could be applied before `LIMIT BY`, which produced an over-filtered result. [#4708](https://github.com/yandex/ClickHouse/pull/4708) ([Constantin S. Pan](https://github.com/kvap))
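
The order of the two clauses matters, and a minimal Python sketch over an in-memory list of rows shows why. `limit_by` and `limit` are hypothetical helpers that model `LIMIT n BY key` and `LIMIT m`. Following the entry above, the correct order applies `LIMIT BY` first. This is not ClickHouse code, and real remote-query execution is more involved.

```python
from collections import defaultdict


def limit_by(rows, n, key):
    """Model `LIMIT n BY key`: keep at most n rows per distinct key, in input order."""
    taken = defaultdict(int)  # how many rows were already kept for each key value
    out = []
    for row in rows:
        k = key(row)
        if taken[k] < n:
            taken[k] += 1
            out.append(row)
    return out


def limit(rows, m):
    """Model `LIMIT m`: keep the first m rows."""
    return rows[:m]


def first_column(row):
    return row[0]


rows = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("c", 1)]

# Correct order: LIMIT 1 BY the first column, then LIMIT 3.
correct = limit(limit_by(rows, 1, first_column), 3)
# Order described in the bug: LIMIT 3 applied before LIMIT 1 BY.
too_filtered = limit_by(limit(rows, 3), 1, first_column)

print(correct)       # [('a', 1), ('b', 1), ('c', 1)]
print(too_filtered)  # [('a', 1)]
```

In the correct order the query returns one row per key (`a`, `b`, `c`). Applying `LIMIT 3` first keeps only the three `a` rows, so `LIMIT BY` then returns a single row.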

## ClickHouse release 19.4.0.49, 2019-03-09

### New Features
@ -62,7 +67,7 @@

### Bug fixes

* Fixed error in #3920. This error manifests itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. This bug first appeared in version 19.1 and is present in versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/yandex/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))

## ClickHouse release 19.3.6, 2019-03-02
104
CHANGELOG_RU.md
@ -1,3 +1,89 @@
## ClickHouse release 19.4.0.49, 2019-03-09

### New Features
* Added full support for the `Protobuf` format (input and output, nested data structures). [#4174](https://github.com/yandex/ClickHouse/pull/4174) [#4493](https://github.com/yandex/ClickHouse/pull/4493) ([Vitaly Baranov](https://github.com/vitlibar))
* Added bitmap functions based on the Roaring Bitmaps library. [#4207](https://github.com/yandex/ClickHouse/pull/4207) ([Andy Yang](https://github.com/andyyzh)) [#4568](https://github.com/yandex/ClickHouse/pull/4568) ([Vitaly Baranov](https://github.com/vitlibar))
* Support for the `Parquet` format. [#4448](https://github.com/yandex/ClickHouse/pull/4448) ([proller](https://github.com/proller))
* Distance between strings computed by counting N-grams, for approximate string matching. The algorithm is similar to the q-gram metrics in R. [#4466](https://github.com/yandex/ClickHouse/pull/4466) ([Danila Kutenin](https://github.com/danlark1))
* The GraphiteMergeTree table engine supports separate patterns for aggregation rules and for retention rules. [#4426](https://github.com/yandex/ClickHouse/pull/4426) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
* Added the `max_execution_speed` and `max_execution_speed_bytes` settings to limit the resources consumed by queries. Added the `min_execution_speed_bytes` setting to complement `min_execution_speed`. [#4430](https://github.com/yandex/ClickHouse/pull/4430) ([Winter Zhang](https://github.com/zhang2014))
* Added the `flatten` function, which converts multidimensional arrays into a flat array. [#4555](https://github.com/yandex/ClickHouse/pull/4555) [#4409](https://github.com/yandex/ClickHouse/pull/4409) ([alexey-milovidov](https://github.com/alexey-milovidov), [kzon](https://github.com/kzon))
* Added the `arrayEnumerateDenseRanked` and `arrayEnumerateUniqRanked` functions. They are similar to `arrayEnumerateUniq`, but let you specify how deep to look into multidimensional arrays. [#4475](https://github.com/yandex/ClickHouse/pull/4475) ([proller](https://github.com/proller)) [#4601](https://github.com/yandex/ClickHouse/pull/4601) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added support for multiple JOINs in a single query without subqueries, with some restrictions: no asterisk and no aliases of complex expressions in ON/WHERE/GROUP BY/... [#4462](https://github.com/yandex/ClickHouse/pull/4462) ([Artem Zuikov](https://github.com/4ertus2))
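
The N-gram distance entry compares the new distance to the q-gram metrics in R. Below is a minimal Python sketch of one such metric. It assumes character 4-grams and normalizes the multiset symmetric difference by the total number of n-grams. The n-gram length, the normalization, and the handling of short strings are all assumptions here, not a description of ClickHouse's implementation.

```python
from collections import Counter


def ngrams(s, n=4):
    """Multiset of the overlapping n-character substrings of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_distance(a, b, n=4):
    """Symmetric difference of the two n-gram multisets, divided by their total size.

    0.0 means identical n-gram multisets; 1.0 means no n-gram is shared.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0  # both strings shorter than n: an arbitrary convention for this sketch
    # Counter subtraction drops non-positive counts, leaving only the unmatched n-grams.
    unmatched = (ga - gb) + (gb - ga)
    return sum(unmatched.values()) / total


print(ngram_distance("abcde", "abcdf"))  # 0.5
```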
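
One common way to enforce an upper speed limit like `max_execution_speed` is to compare the amount of data processed with the elapsed time. Execution then pauses until the average rate falls back to the limit. The Python sketch below shows only that arithmetic. `throttle_delay` is a hypothetical helper, and this changelog does not describe how ClickHouse actually measures speed or waits.

```python
def throttle_delay(processed, elapsed_seconds, max_speed):
    """Seconds to pause so that processed / elapsed time stays at or below max_speed.

    `processed` is the amount read so far (rows or bytes) and `max_speed` is in the
    same unit per second; a non-positive max_speed means "no limit".
    """
    if max_speed <= 0:
        return 0.0
    # Reading `processed` units at exactly max_speed would take this long.
    min_elapsed = processed / max_speed
    return max(0.0, min_elapsed - elapsed_seconds)


print(throttle_delay(2_000_000, 1.5, 1_000_000))  # 0.5
```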
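
The behaviour of the new `flatten` function can be modelled in Python, with nested lists standing in for ClickHouse arrays. This is an assumption for illustration only: the real function operates on typed `Array` columns. In this sketch elements keep their original order, every level of nesting is removed, and an already flat array is returned unchanged.

```python
def flatten(value):
    """Flatten nested lists into one flat list, keeping element order."""
    out = []
    for item in value:
        if isinstance(item, list):
            out.extend(flatten(item))  # descend into a nested array
        else:
            out.append(item)
    return out


print(flatten([[[1]], [[2], [3]]]))  # [1, 2, 3]
print(flatten([1, 2, 3]))            # [1, 2, 3]
```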

### Bug Fixes
* This release also contains all bug fixes from 19.3 and 19.1.
* Fixed a bug in secondary indices (experimental feature): the order of granules on INSERT was incorrect. [#4407](https://github.com/yandex/ClickHouse/pull/4407) ([Nikita Vasilev](https://github.com/nikvas0))
* Fixed the `set` secondary index (experimental feature) for `Nullable` and `LowCardinality` columns. Previously, using it caused the error `Data type must be deserialized with multiple streams` in SELECT queries. [#4594](https://github.com/yandex/ClickHouse/pull/4594) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* The last update time is now recorded correctly on a full reload of `executable` dictionaries. [#4551](https://github.com/yandex/ClickHouse/pull/4551) ([Tema Novikov](https://github.com/temoon))
* Fixed the broken progress bar, a regression introduced in version 19.3. [#4627](https://github.com/yandex/ClickHouse/pull/4627) ([filimonov](https://github.com/filimonov))
* Fixed incorrect MemoryTracker values in very rare cases when a memory region was shrunk. [#4619](https://github.com/yandex/ClickHouse/pull/4619) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed undefined behaviour in ThreadPool. [#4612](https://github.com/yandex/ClickHouse/pull/4612) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a very rare crash with the message `mutex lock failed: Invalid argument`, which could happen when a MergeTree table was dropped concurrently with a SELECT. [#4608](https://github.com/yandex/ClickHouse/pull/4608) ([Alex Zatelepin](https://github.com/ztlpn))
* ODBC driver compatibility with the `LowCardinality` data type. [#4381](https://github.com/yandex/ClickHouse/pull/4381) ([proller](https://github.com/proller))
* Fixed the `AIOcontextPool: Found io_event with unknown id 0` error on FreeBSD. [#4438](https://github.com/yandex/ClickHouse/pull/4438) ([urgordeadbeef](https://github.com/urgordeadbeef))
* Previously, the `system.part_log` table was created regardless of whether it was declared in the configuration. [#4483](https://github.com/yandex/ClickHouse/pull/4483) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed undefined behaviour in the `dictIsIn` function for `cache` dictionaries. [#4515](https://github.com/yandex/ClickHouse/pull/4515) ([alesapin](https://github.com/alesapin))
* Fixed a deadlock when a SELECT query locks the same table multiple times (for example, from different threads or while executing different subqueries) while a DDL query runs concurrently. [#4535](https://github.com/yandex/ClickHouse/pull/4535) ([Alex Zatelepin](https://github.com/ztlpn))
* The `compile_expressions` setting is disabled by default until we pin the sources of the `LLVM` library we use and test it under `ASan` (currently the LLVM library is taken from the system). [#4579](https://github.com/yandex/ClickHouse/pull/4579) ([alesapin](https://github.com/alesapin))
* Fixed a crash with `std::terminate` when the `invalidate_query` of an external dictionary with a `clickhouse` source returned an incorrect result (empty, more than one row, or more than one column). Fixed a bug where `invalidate_query` was executed every five seconds regardless of the specified `lifetime`. [#4583](https://github.com/yandex/ClickHouse/pull/4583) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a deadlock when the `invalidate_query` of an external dictionary with a `clickhouse` source used the `system.dictionaries` table or a database with the `Dictionary` engine (rare case). [#4599](https://github.com/yandex/ClickHouse/pull/4599) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed CROSS JOIN with an empty WHERE. [#4598](https://github.com/yandex/ClickHouse/pull/4598) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed a segfault in the `replicate` function with a constant argument. [#4603](https://github.com/yandex/ClickHouse/pull/4603) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed predicate pushdown (the `enable_optimize_predicate_expression` setting) with lambda functions. [#4408](https://github.com/yandex/ClickHouse/pull/4408) ([Winter Zhang](https://github.com/zhang2014))
* Multiple fixes for multiple JOINs in a single query. [#4595](https://github.com/yandex/ClickHouse/pull/4595) ([Artem Zuikov](https://github.com/4ertus2))
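
The first `invalidate_query` fix comes down to checking the shape of the result before using it. The result must be exactly one row with exactly one column, and anything else should be reported as an error instead of terminating the process. Below is a hypothetical Python sketch of such a check. The function name and messages are illustrative, not taken from ClickHouse.

```python
def invalidation_value(rows):
    """Return the single value of an `invalidate_query` result, or raise ValueError.

    `rows` is the query result as a list of rows, each row a list of column values.
    """
    if not rows:
        raise ValueError("invalidate_query returned an empty result")
    if len(rows) > 1:
        raise ValueError(f"invalidate_query returned {len(rows)} rows, expected 1")
    if len(rows[0]) != 1:
        raise ValueError(f"invalidate_query returned {len(rows[0])} columns, expected 1")
    return rows[0][0]


for result in ([["2019-03-09 00:00:00"]], [], [[1], [2]], [[1, 2]]):
    try:
        print("ok:", invalidation_value(result))
    except ValueError as e:
        print("error:", e)
```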

### Improvements
* Support for aliases in the JOIN ON section for the right table. [#4412](https://github.com/yandex/ClickHouse/pull/4412) ([Artem Zuikov](https://github.com/4ertus2))
* Correct aliases are now used for multiple JOINs with subqueries. [#4474](https://github.com/yandex/ClickHouse/pull/4474) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed the predicate pushdown logic (the `enable_optimize_predicate_expression` setting) for JOIN. [#4387](https://github.com/yandex/ClickHouse/pull/4387) ([Ivan](https://github.com/abyss7))

### Performance Improvements
* Improved the heuristics of the "move to PREWHERE" optimization. [#4405](https://github.com/yandex/ClickHouse/pull/4405) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Real lookup tables are used instead of hash tables for 8-bit and 16-bit keys. The hash table interface was generalized to support this case. [#4536](https://github.com/yandex/ClickHouse/pull/4536) ([Amos Bird](https://github.com/amosbird))
* Improved string comparison performance. [#4564](https://github.com/yandex/ClickHouse/pull/4564) ([alexey-milovidov](https://github.com/alexey-milovidov))
* The DDL queue (for ON CLUSTER queries) is cleaned up in a separate thread so that it does not slow down the main work. [#4502](https://github.com/yandex/ClickHouse/pull/4502) ([Alex Zatelepin](https://github.com/ztlpn))
* Even when `min_bytes_to_use_direct_io` was set to 1, not every file was opened with O_DIRECT, because file sizes were sometimes underestimated by the size of one compressed block. [#4526](https://github.com/yandex/ClickHouse/pull/4526) ([alexey-milovidov](https://github.com/alexey-milovidov))
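
The lookup-table optimization works because an 8-bit key has only 256 possible values, and a 16-bit key has 65,536. An array indexed directly by the key can therefore stand in for a hash table. Below is a minimal Python illustration for counting rows per 8-bit key, as a `GROUP BY` aggregation might. It shows the idea only and is not ClickHouse's C++ hash-table interface.

```python
def count_uint8_keys(keys):
    """Count occurrences of 8-bit keys using a 256-slot array instead of a hash table.

    Each possible key indexes its own slot, so there is no hashing and no collision
    handling; a 16-bit variant would need 65,536 slots.
    """
    table = [0] * 256
    for k in keys:
        if not 0 <= k <= 255:
            raise ValueError(f"key {k} does not fit in 8 bits")
        table[k] += 1
    # Report only the keys that occurred, as a GROUP BY result would.
    return {k: c for k, c in enumerate(table) if c}


print(count_uint8_keys([3, 3, 255, 0]))  # {0: 1, 3: 2, 255: 1}
```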

### Build/Testing/Packaging Improvements
* Added support for the clang-9 compiler. [#4604](https://github.com/yandex/ClickHouse/pull/4604) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed incorrect `__asm__` instructions. [#4621](https://github.com/yandex/ClickHouse/pull/4621) ([Konstantin Podshumok](https://github.com/podshumok))
* Query execution settings for `clickhouse-performance-test` can now be specified on the command line. [#4437](https://github.com/yandex/ClickHouse/pull/4437) ([alesapin](https://github.com/alesapin))
* Dictionary tests were moved to integration tests. [#4477](https://github.com/yandex/ClickHouse/pull/4477) ([alesapin](https://github.com/alesapin))
* Queries from the "benchmark" section of the official website were added to the automated performance test suite. [#4496](https://github.com/yandex/ClickHouse/pull/4496) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Build fixes when using external lz4 and xxhash libraries. [#4495](https://github.com/yandex/ClickHouse/pull/4495) ([Orivej Desh](https://github.com/orivej))
* Fixed undefined behaviour when the `quantileTiming` function was called with a negative or non-integer argument (found by a fuzz test under the undefined behaviour sanitizer). [#4506](https://github.com/yandex/ClickHouse/pull/4506) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed typos in the code. [#4531](https://github.com/yandex/ClickHouse/pull/4531) ([sdk2](https://github.com/sdk2))
* Fixed the build on Mac. [#4371](https://github.com/yandex/ClickHouse/pull/4371) ([Vitaly Baranov](https://github.com/vitlibar))
* Fixed the build on FreeBSD and for some unusual build configurations. [#4444](https://github.com/yandex/ClickHouse/pull/4444) ([proller](https://github.com/proller))

## ClickHouse release 19.3.7, 2019-03-12

### Bug Fixes

* Fixed error in #3920. This error manifests itself as random cache corruption (messages `Unknown codec family code`, `Cannot seek through file`) and segfaults. It first appeared in 19.1 and is present in all versions up to 19.1.10 and 19.3.6. [#4623](https://github.com/yandex/ClickHouse/pull/4623) ([alexey-milovidov](https://github.com/alexey-milovidov))

## ClickHouse release 19.3.6, 2019-03-02

### Bug Fixes

* If a thread pool had more than 1000 threads, `std::terminate` was called on thread exit. [Azat Khuzhin](https://github.com/azat) [#4485](https://github.com/yandex/ClickHouse/pull/4485) [#4505](https://github.com/yandex/ClickHouse/pull/4505) ([alexey-milovidov](https://github.com/alexey-milovidov))
* It is now possible to create `ReplicatedMergeTree*` tables with column comments but without DEFAULT, and with CODEC but without COMMENT and DEFAULT. Fixed the comparison of CODECs with each other. [#4523](https://github.com/yandex/ClickHouse/pull/4523) ([alesapin](https://github.com/alesapin))
* Fixed a crash on JOIN with arrays and tuples. [#4552](https://github.com/yandex/ClickHouse/pull/4552) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed a crash of `clickhouse-copier` with the message `ThreadStatus not created`. [#4540](https://github.com/yandex/ClickHouse/pull/4540) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed a server hang on shutdown when distributed DDLs were used. [#4472](https://github.com/yandex/ClickHouse/pull/4472) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed incorrect column numbers in text-format parsing error messages for columns numbered above 10. [#4484](https://github.com/yandex/ClickHouse/pull/4484) ([alexey-milovidov](https://github.com/alexey-milovidov))

### Build/Testing/Packaging Improvements

* Fixed the build with AVX enabled. [#4527](https://github.com/yandex/ClickHouse/pull/4527) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed extended query execution metrics when ClickHouse is built on a system with a new Linux kernel but run on a system with a much older kernel. [#4541](https://github.com/yandex/ClickHouse/pull/4541) ([nvartolomei](https://github.com/nvartolomei))
* Continue working with a warning if the `core_dump.size_limit` setting cannot be applied. [#4473](https://github.com/yandex/ClickHouse/pull/4473) ([proller](https://github.com/proller))
* Removed `inline` from `void readBinary(...)` in `Field.cpp`. [#4530](https://github.com/yandex/ClickHouse/pull/4530) ([hcz](https://github.com/hczhcz))

## ClickHouse release 19.3.5, 2019-02-21

### Bug Fixes:
@ -74,7 +160,7 @@
* Fixed a bug where querying the `system.tables` table could throw a `table doesn't exist` exception. [#4313](https://github.com/yandex/ClickHouse/pull/4313) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a crash of `clickhouse-client` in interactive mode when exiting while command-line suggestions were being loaded. [#4317](https://github.com/yandex/ClickHouse/pull/4317) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug that led to incorrect results of mutations containing the `IN` operator. [#4099](https://github.com/yandex/ClickHouse/pull/4099) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a bug where, if a database with the `Dictionary` engine had been created, all dictionaries were loaded at server startup, and dictionaries with a local ClickHouse source could not be loaded. [#4255](https://github.com/yandex/ClickHouse/pull/4255) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed system log tables (`system.query_log`, `system.part_log`) being re-created on server shutdown. [#4254](https://github.com/yandex/ClickHouse/pull/4254) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed return type inference and lock usage in the `joinGet` function. [#4153](https://github.com/yandex/ClickHouse/pull/4153) ([Amos Bird](https://github.com/amosbird))
* Fixed a server crash when the `allow_experimental_multiple_joins_emulation` setting is used. [52de2c](https://github.com/yandex/ClickHouse/commit/52de2cd927f7b5257dd67e175f0a5560a48840d0) ([Artem Zuikov](https://github.com/4ertus2))
@ -98,7 +184,7 @@
* Added a tool that builds the changelog from pull request descriptions. [#4169](https://github.com/yandex/ClickHouse/pull/4169) [#4173](https://github.com/yandex/ClickHouse/pull/4173) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Added a Puppet module for ClickHouse. [#4182](https://github.com/yandex/ClickHouse/pull/4182) ([Maxim Fedotov](https://github.com/MaxFedotov))
* Added documentation for several undocumented functions. [#4168](https://github.com/yandex/ClickHouse/pull/4168) ([Winter Zhang](https://github.com/zhang2014))
* ARM build fixes. [#4210](https://github.com/yandex/ClickHouse/pull/4210) [#4306](https://github.com/yandex/ClickHouse/pull/4306) [#4291](https://github.com/yandex/ClickHouse/pull/4291) ([proller](https://github.com/proller))
* Dictionary tests can now be run from `ctest`. [#4189](https://github.com/yandex/ClickHouse/pull/4189) ([proller](https://github.com/proller))
* `/etc/ssl` is now the default directory for SSL certificates. [#4167](https://github.com/yandex/ClickHouse/pull/4167) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a check for SSE and AVX instruction availability at startup. [#4234](https://github.com/yandex/ClickHouse/pull/4234) ([Igr](https://github.com/igron99))
@ -133,6 +219,18 @@
* Reduced the wait time for server shutdown and for `ALTER` queries to finish. [#4372](https://github.com/yandex/ClickHouse/pull/4372) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added the value of the `replicated_can_become_leader` setting to the `system.replicas` table. Added logging when a replica will not try to become the leader. [#4379](https://github.com/yandex/ClickHouse/pull/4379) ([Alex Zatelepin](https://github.com/ztlpn))

## ClickHouse release 19.1.14, 2019-03-14

* Fixed the `Column ... queried more than once` error that could occur in a rare case when the `asterisk_left_columns_only` setting is enabled and `GLOBAL JOIN` is used together with `SELECT *`. This bug was never present in 19.3 and newer versions. [6bac7d8d](https://github.com/yandex/ClickHouse/pull/4692/commits/6bac7d8d11a9b0d6de0b32b53c47eb2f6f8e7062) ([Artem Zuikov](https://github.com/4ertus2))

## ClickHouse release 19.1.13, 2019-03-12

This release contains the same bug fixes as 19.3.7.

## ClickHouse release 19.1.10, 2019-03-03

This release contains the same bug fixes as 19.3.6.

## ClickHouse release 19.1.9, 2019-02-21

### Bug Fixes:
@ -152,7 +250,7 @@

* Fixed return type inference and lock usage in the `joinGet` function. [#4153](https://github.com/yandex/ClickHouse/pull/4153) ([Amos Bird](https://github.com/amosbird))
* Fixed system log tables (`system.query_log`, `system.part_log`) being re-created on server shutdown. [#4254](https://github.com/yandex/ClickHouse/pull/4254) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug where, if a database with the `Dictionary` engine had been created, all dictionaries were loaded at server startup, and dictionaries with a local ClickHouse source could not be loaded. [#4255](https://github.com/yandex/ClickHouse/pull/4255) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug that led to incorrect results of mutations containing the `IN` operator. [#4099](https://github.com/yandex/ClickHouse/pull/4099) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a crash of `clickhouse-client` in interactive mode when exiting while command-line suggestions were being loaded. [#4317](https://github.com/yandex/ClickHouse/pull/4317) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug where querying the `system.tables` table could throw a `table doesn't exist` exception. [#4313](https://github.com/yandex/ClickHouse/pull/4313) ([alexey-milovidov](https://github.com/alexey-milovidov))

@ -1,5 +1,6 @@
project (ClickHouse)
cmake_minimum_required (VERSION 3.3)
cmake_policy(SET CMP0023 NEW)

set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules/")

@ -141,7 +142,7 @@ if(NOT COMPILER_CLANG) # clang: error: the clang compiler does not support '-mar
endif()

if (ARCH_NATIVE)
set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")
endif ()

# Special options for better optimized code with clang
@ -180,13 +181,19 @@ include (cmake/use_libcxx.cmake)

set (DEFAULT_LIBS "")
if (OS_LINUX AND NOT UNBUNDLED)
# Note: this probably has no effict, but I'm not an expert in CMake.
# Note: this probably has no effect, but I'm not an expert in CMake.
set (CMAKE_C_IMPLICIT_LINK_LIBRARIES "")
set (CMAKE_CXX_IMPLICIT_LINK_LIBRARIES "")

# Disable default linked libraries.
set (DEFAULT_LIBS "-nodefaultlibs")

# We need builtins from Clang's RT even without libcxx - for ubsan+int128. See https://bugs.llvm.org/show_bug.cgi?id=16404
set (BUILTINS_LIB_PATH "")
if (COMPILER_CLANG)
execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIB_PATH OUTPUT_STRIP_TRAILING_WHITESPACE)
endif ()

# Add C++ libraries.
#
# This consists of:
@ -197,14 +204,9 @@ if (OS_LINUX AND NOT UNBUNDLED)
#
# There are two variants of C++ library: libc++ (from LLVM compiler infrastructure) and libstdc++ (from GCC).
if (USE_LIBCXX)
set (BUILTINS_LIB_PATH "")
if (COMPILER_CLANG)
execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIB_PATH OUTPUT_STRIP_TRAILING_WHITESPACE)
endif ()

set (DEFAULT_LIBS "${DEFAULT_LIBS} -Wl,-Bstatic -lc++ -lc++abi -lgcc_eh ${BUILTINS_LIB_PATH} -Wl,-Bdynamic")
else ()
set (DEFAULT_LIBS "${DEFAULT_LIBS} -Wl,-Bstatic -lstdc++ -lgcc_eh -lgcc -Wl,-Bdynamic")
set (DEFAULT_LIBS "${DEFAULT_LIBS} -Wl,-Bstatic -lstdc++ -lgcc_eh -lgcc ${BUILTINS_LIB_PATH} -Wl,-Bdynamic")
endif ()

# Linking with GLIBC prevents portability of binaries to older systems.
@ -216,6 +218,7 @@ if (OS_LINUX AND NOT UNBUNDLED)
string (TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC)
set (CMAKE_POSTFIX_VARIABLE "CMAKE_${CMAKE_BUILD_TYPE_UC}_POSTFIX")

# FIXME: glibc-compatibility may be non-static in some builds!
set (DEFAULT_LIBS "${DEFAULT_LIBS} libs/libglibc-compatibility/libglibc-compatibility${${CMAKE_POSTFIX_VARIABLE}}.a")
endif ()

@ -227,6 +230,11 @@ if (OS_LINUX AND NOT UNBUNDLED)
message(STATUS "Default libraries: ${DEFAULT_LIBS}")
endif ()

if (DEFAULT_LIBS)
# Add default libs to all targets as the last dependency.
set(CMAKE_CXX_STANDARD_LIBRARIES ${DEFAULT_LIBS})
endif ()

if (NOT MAKE_STATIC_LIBRARIES)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
@ -310,6 +318,7 @@ include (cmake/find_pdqsort.cmake)
include (cmake/find_hdfs3.cmake) # uses protobuf
include (cmake/find_consistent-hashing.cmake)
include (cmake/find_base64.cmake)
include (cmake/find_hyperscan.cmake)
find_contrib_lib(cityhash)
find_contrib_lib(farmhash)
find_contrib_lib(metrohash)
@ -336,35 +345,29 @@ add_subdirectory (dbms)

include (cmake/print_include_directories.cmake)

if (DEFAULT_LIBS)
# Add default libs to all targets as the last dependency.
# I have found no better way to specify default libs in CMake that will appear a single time in a specific order at the end of linker arguments.

function(add_default_libs target_name)
if (GLIBC_COMPATIBILITY)
# FIXME: actually glibc-compatibility should always be built first,
# because it's unconditionally linked via $DEFAULT_LIBS,
# and these look like the first places that get linked.
function (add_glibc_compat target_name)
if (TARGET ${target_name})
# message(STATUS "Has target ${target_name}")
set_property(TARGET ${target_name} APPEND PROPERTY LINK_LIBRARIES "${DEFAULT_LIBS}")
set_property(TARGET ${target_name} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${DEFAULT_LIBS}")
if (GLIBC_COMPATIBILITY)
add_dependencies(${target_name} glibc-compatibility)
endif ()
add_dependencies(${target_name} glibc-compatibility)
endif ()
endfunction ()

add_default_libs(ltdl)
add_default_libs(zlibstatic)
add_default_libs(jemalloc)
add_default_libs(unwind)
add_default_libs(memcpy)
add_default_libs(Foundation)
add_default_libs(common)
add_default_libs(gtest)
add_default_libs(lz4)
add_default_libs(zstd)
add_default_libs(snappy)
add_default_libs(arrow)
add_default_libs(protoc)
add_default_libs(thrift_static)
add_default_libs(boost_regex_internal)
add_glibc_compat(ltdl)
add_glibc_compat(zlibstatic)
add_glibc_compat(jemalloc)
add_glibc_compat(unwind)
add_glibc_compat(memcpy)
add_glibc_compat(Foundation)
add_glibc_compat(common)
add_glibc_compat(gtest)
add_glibc_compat(lz4)
add_glibc_compat(zstd)
add_glibc_compat(snappy)
add_glibc_compat(arrow)
add_glibc_compat(protoc)
add_glibc_compat(thrift_static)
add_glibc_compat(boost_regex_internal)
endif ()

@ -21,7 +21,7 @@ BUILD_TARGETS=clickhouse
BUILD_TYPE=Debug
ENABLE_EMBEDDED_COMPILER=0

CMAKE_FLAGS="-D CMAKE_C_FLAGS_ADD=-g0 -D CMAKE_CXX_FLAGS_ADD=-g0 -D ENABLE_JEMALLOC=0 -D ENABLE_CAPNP=0 -D ENABLE_RDKAFKA=0 -D ENABLE_UNWIND=0 -D ENABLE_ICU=0 -D ENABLE_POCO_MONGODB=0 -D ENABLE_POCO_NETSSL=0 -D ENABLE_POCO_ODBC=0 -D ENABLE_ODBC=0 -D ENABLE_MYSQL=0"
CMAKE_FLAGS="-D CMAKE_C_FLAGS_ADD=-g0 -D CMAKE_CXX_FLAGS_ADD=-g0 -D ENABLE_JEMALLOC=0 -D ENABLE_CAPNP=0 -D ENABLE_RDKAFKA=0 -D ENABLE_UNWIND=0 -D ENABLE_ICU=0 -D ENABLE_POCO_MONGODB=0 -D ENABLE_POCO_NETSSL=0 -D ENABLE_POCO_ODBC=0 -D ENABLE_ODBC=0 -D ENABLE_MYSQL=0 -D ENABLE_SSL=0 -D ENABLE_POCO_NETSSL=0"

[[ $(uname) == "FreeBSD" ]] && COMPILER_PACKAGE_VERSION=devel && export COMPILER_PATH=/usr/local/bin

7
cmake/find_hyperscan.cmake
Normal file
@ -0,0 +1,7 @@
if (HAVE_SSSE3)
set (HYPERSCAN_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/hyperscan/src)
set (HYPERSCAN_LIBRARY hs)
set (USE_HYPERSCAN 1)
set (USE_INTERNAL_HYPERSCAN_LIBRARY 1)
message (STATUS "Using hyperscan: ${HYPERSCAN_INCLUDE_DIR} : ${HYPERSCAN_LIBRARY}")
endif()
@ -1,5 +1,5 @@
# Freebsd: contrib/cppkafka/include/cppkafka/detail/endianness.h:53:23: error: 'betoh16' was not declared in this scope
if (NOT ARCH_ARM AND NOT ARCH_32 AND NOT APPLE AND NOT OS_FREEBSD)
if (NOT ARCH_ARM AND NOT ARCH_32 AND NOT APPLE AND NOT OS_FREEBSD AND OPENSSL_FOUND)
option (ENABLE_RDKAFKA "Enable kafka" ON)
endif ()

@ -1,7 +1,19 @@
option (ENABLE_SSL "Enable ssl" ON)

if (ENABLE_SSL)

if(NOT ARCH_32)
option(USE_INTERNAL_SSL_LIBRARY "Set to FALSE to use system *ssl library instead of bundled" ${NOT_UNBUNDLED})
endif()

if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/ssl/CMakeLists.txt")
if(USE_INTERNAL_SSL_LIBRARY)
message(WARNING "submodule contrib/ssl is missing. to fix try run: \n git submodule update --init --recursive")
endif()
set(USE_INTERNAL_SSL_LIBRARY 0)
set(MISSING_INTERNAL_SSL_LIBRARY 1)
endif()

set (OPENSSL_USE_STATIC_LIBS ${USE_STATIC_LIBRARIES})

if (NOT USE_INTERNAL_SSL_LIBRARY)
@ -28,7 +40,7 @@ if (NOT USE_INTERNAL_SSL_LIBRARY)
endif ()
endif ()

if (NOT OPENSSL_FOUND)
if (NOT OPENSSL_FOUND AND NOT MISSING_INTERNAL_SSL_LIBRARY)
set (USE_INTERNAL_SSL_LIBRARY 1)
set (OPENSSL_ROOT_DIR "${ClickHouse_SOURCE_DIR}/contrib/ssl")
set (OPENSSL_INCLUDE_DIR "${OPENSSL_ROOT_DIR}/include")
@ -43,4 +55,11 @@ if (NOT OPENSSL_FOUND)
set (OPENSSL_FOUND 1)
endif ()

message (STATUS "Using ssl=${OPENSSL_FOUND}: ${OPENSSL_INCLUDE_DIR} : ${OPENSSL_LIBRARIES}")
if(OPENSSL_FOUND)
# we need to keep OPENSSL_FOUND for many libs in contrib
set(USE_SSL 1)
endif()

endif ()

message (STATUS "Using ssl=${USE_SSL}: ${OPENSSL_INCLUDE_DIR} : ${OPENSSL_LIBRARIES}")

14
contrib/CMakeLists.txt
vendored
@@ -4,7 +4,7 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-stringop-overflow -Wno-implicit-function-declaration -Wno-return-type -Wno-array-bounds -Wno-bool-compare -Wno-int-conversion -Wno-switch")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-sign-compare -std=c++1z")
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality -Wno-tautological-constant-compare -Wno-tautological-constant-out-of-range-compare -Wno-implicit-function-declaration -Wno-return-type -Wno-pointer-bool-conversion -Wno-enum-conversion -Wno-int-conversion -Wno-switch")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality -Wno-tautological-constant-compare -Wno-tautological-constant-out-of-range-compare -Wno-implicit-function-declaration -Wno-return-type -Wno-pointer-bool-conversion -Wno-enum-conversion -Wno-int-conversion -Wno-switch -Wno-string-plus-int")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-format -Wno-inconsistent-missing-override -std=c++1z")
endif ()

@@ -125,13 +125,17 @@ endif ()
if (ENABLE_MYSQL AND USE_INTERNAL_MYSQL_LIBRARY)
add_subdirectory (mariadb-connector-c-cmake)
target_include_directories(mysqlclient BEFORE PRIVATE ${ZLIB_INCLUDE_DIR})
target_include_directories(mysqlclient BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
if(OPENSSL_INCLUDE_DIR)
target_include_directories(mysqlclient BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
endif()
endif ()

if (USE_INTERNAL_RDKAFKA_LIBRARY)
add_subdirectory (librdkafka-cmake)
target_include_directories(rdkafka BEFORE PRIVATE ${ZLIB_INCLUDE_DIR})
target_include_directories(rdkafka BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
if(OPENSSL_INCLUDE_DIR)
target_include_directories(rdkafka BEFORE PRIVATE ${OPENSSL_INCLUDE_DIR})
endif()
endif ()

if (USE_RDKAFKA)
@@ -300,3 +304,7 @@ endif ()
if (USE_BASE64)
add_subdirectory (base64-cmake)
endif()

if (USE_HYPERSCAN)
add_subdirectory (hyperscan)
endif()

2
contrib/boost
vendored
@@ -1 +1 @@
Subproject commit 6a96e8b59f76148eb8ad54a9d15259f8ce84c606
Subproject commit 32abf16beb7bb8b243a4d100ccdd6acb271738c4
1
contrib/hyperscan
vendored
Submodule
@@ -0,0 +1 @@
Subproject commit 05dab0efee80be405aad5f74721b692b6889b75e
@@ -208,7 +208,8 @@ target_link_libraries(hdfs3 ${LIBXML2_LIBRARY})
# inherit from parent cmake
target_include_directories(hdfs3 PRIVATE ${Boost_INCLUDE_DIRS})
target_include_directories(hdfs3 PRIVATE ${Protobuf_INCLUDE_DIR})
target_include_directories(hdfs3 PRIVATE ${OPENSSL_INCLUDE_DIR})

target_link_libraries(hdfs3 ${Protobuf_LIBRARY})
target_link_libraries(hdfs3 ${OPENSSL_LIBRARIES})
if(OPENSSL_INCLUDE_DIR AND OPENSSL_LIBRARIES)
target_include_directories(hdfs3 PRIVATE ${OPENSSL_INCLUDE_DIR})
target_link_libraries(hdfs3 ${OPENSSL_LIBRARIES})
endif()

@@ -58,4 +58,7 @@ add_library(rdkafka ${LINK_MODE} ${SRCS})
target_include_directories(rdkafka SYSTEM PUBLIC include)
target_include_directories(rdkafka SYSTEM PUBLIC ${RDKAFKA_SOURCE_DIR}) # Because weird logic with "include_next" is used.
target_include_directories(rdkafka SYSTEM PRIVATE ${ZSTD_INCLUDE_DIR}/common) # Because wrong path to "zstd_errors.h" is used.
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${ZSTD_LIBRARY} ${LZ4_LIBRARY} ${OPENSSL_SSL_LIBRARY} ${OPENSSL_CRYPTO_LIBRARY})
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${ZSTD_LIBRARY} ${LZ4_LIBRARY})
if(OPENSSL_SSL_LIBRARY AND OPENSSL_CRYPTO_LIBRARY)
target_link_libraries(rdkafka PUBLIC ${OPENSSL_SSL_LIBRARY} ${OPENSSL_CRYPTO_LIBRARY})
endif()

@@ -33,7 +33,6 @@ ${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/ma_time.c
${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/ma_tls.c
#${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/gnutls.c
#${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/ma_schannel.c
${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/openssl.c
#${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/schannel.c
#${MARIADB_CLIENT_SOURCE_DIR}/plugins/auth/auth_gssapi_client.c
#${MARIADB_CLIENT_SOURCE_DIR}/plugins/auth/dialog.c
@@ -55,12 +54,19 @@ ${MARIADB_CLIENT_SOURCE_DIR}/plugins/pvio/pvio_socket.c
${CMAKE_CURRENT_SOURCE_DIR}/linux_x86_64/libmariadb/ma_client_plugin.c
)

if(OPENSSL_LIBRARIES)
list(APPEND SRCS ${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/secure/openssl.c)
endif()

add_library(mysqlclient STATIC ${SRCS})

target_link_libraries(mysqlclient ${OPENSSL_LIBRARIES})
if(OPENSSL_LIBRARIES)
target_link_libraries(mysqlclient ${OPENSSL_LIBRARIES})
target_compile_definitions(mysqlclient PRIVATE -D HAVE_OPENSSL -D HAVE_TLS)
endif()

target_include_directories(mysqlclient PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/linux_x86_64/include)
target_include_directories(mysqlclient PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/common/include)
target_include_directories(mysqlclient PUBLIC ${MARIADB_CLIENT_SOURCE_DIR}/include)

target_compile_definitions(mysqlclient PRIVATE -D THREAD -D HAVE_OPENSSL -D HAVE_TLS)
target_compile_definitions(mysqlclient PRIVATE -D THREAD)

@@ -57,7 +57,7 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
endif ()

if (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 8)
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra-semi-stmt -Wshadow-field -Wstring-plus-int")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra-semi-stmt -Wshadow-field -Wstring-plus-int -Wempty-init-stmt")
endif ()

if (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 9)
@@ -309,7 +309,10 @@ if (USE_PARQUET)
endif ()
endif ()

target_link_libraries(dbms PRIVATE ${OPENSSL_CRYPTO_LIBRARY} Threads::Threads)
if(OPENSSL_CRYPTO_LIBRARY)
target_link_libraries(dbms PRIVATE ${OPENSSL_CRYPTO_LIBRARY})
endif ()
target_link_libraries(dbms PRIVATE Threads::Threads)

target_include_directories (dbms SYSTEM BEFORE PRIVATE ${DIVIDE_INCLUDE_DIR})
target_include_directories (dbms SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})

@@ -154,7 +154,7 @@ else ()
clickhouse_target_link_split_lib(clickhouse obfuscator)
endif ()
if (USE_EMBEDDED_COMPILER)
clickhouse_target_link_split_lib(clickhouse compiler)
target_link_libraries(clickhouse PRIVATE clickhouse-compiler-lib)
endif ()

set (CLICKHOUSE_BUNDLE)

@@ -101,6 +101,7 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
extern const int CANNOT_SET_SIGNAL_HANDLER;
extern const int CANNOT_READLINE;
extern const int SYSTEM_ERROR;
}


@@ -295,7 +296,6 @@ private:
/// The value of the option is used as the text of query (or of multiple queries).
/// If stdin is not a terminal, INSERT data for the first query is read from it.
/// - stdin is not a terminal. In this case queries are read from it.
stdin_is_not_tty = !isatty(STDIN_FILENO);
if (stdin_is_not_tty || config().has("query"))
is_interactive = false;

@@ -610,9 +610,6 @@ private:

try
{
/// Determine the terminal size.
ioctl(0, TIOCGWINSZ, &terminal_size);

if (!process(input))
break;
}
@@ -1568,7 +1565,7 @@ public:
}
}

ioctl(0, TIOCGWINSZ, &terminal_size);
stdin_is_not_tty = !isatty(STDIN_FILENO);

namespace po = boost::program_options;

@@ -1576,7 +1573,11 @@ public:
unsigned min_description_length = line_length / 2;
if (!stdin_is_not_tty)
{
line_length = std::max(3U, static_cast<unsigned>(terminal_size.ws_col));
if (ioctl(STDIN_FILENO, TIOCGWINSZ, &terminal_size))
throwFromErrno("Cannot obtain terminal window size (ioctl TIOCGWINSZ)", ErrorCodes::SYSTEM_ERROR);
line_length = std::max(
static_cast<unsigned>(strlen("--http_native_compression_disable_checksumming_on_decompress ")),
static_cast<unsigned>(terminal_size.ws_col));
min_description_length = std::min(min_description_length, line_length - 2);
}


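The client hunks above drop the unconditional `ioctl(0, TIOCGWINSZ, ...)` calls and query the window size only when stdin is a terminal, throw with the new `SYSTEM_ERROR` code if the call fails, and keep the help text at least as wide as the longest option name. A minimal standalone sketch of that logic, assuming plain POSIX `isatty`/`ioctl` and a `std::runtime_error` in place of ClickHouse's `throwFromErrno`; `helpLineLength` and its parameters are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical helper: width to use for --help output.
// default_length: width kept when stdin is not a terminal.
// min_length: lower bound, e.g. the length of the longest option name.
unsigned helpLineLength(unsigned default_length, unsigned min_length)
{
    // Pipes and files have no window size; keep the default width.
    if (!isatty(STDIN_FILENO))
        return default_length;

    winsize terminal_size{};
    // ioctl returns -1 and sets errno on failure; report it instead of using garbage.
    if (ioctl(STDIN_FILENO, TIOCGWINSZ, &terminal_size))
        throw std::runtime_error(
            std::string("Cannot obtain terminal window size (ioctl TIOCGWINSZ): ") + std::strerror(errno));

    // Never narrower than the longest option name.
    return std::max(min_length, static_cast<unsigned>(terminal_size.ws_col));
}
```

Passing the length of the longest option name as `min_length` reproduces the `std::max(strlen(...), ws_col)` expression from the new code; the default of 80 columns used in the check below is an assumption.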
@@ -1,6 +1,6 @@
#pragma once

#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Poco/Util/Application.h>
#include <memory>


@@ -2,7 +2,7 @@
#include <string>
#include <vector>
#include <map>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Poco/Util/XMLConfiguration.h>
#include <Poco/AutoPtr.h>


@@ -25,7 +25,7 @@
#include <Interpreters/Context.h>
#include <IO/ConnectionTimeouts.h>
#include <IO/UseSSL.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Common/Exception.h>
#include <Common/InterruptListener.h>


@@ -4,7 +4,7 @@
#include "TestStopConditions.h"
#include <Common/InterruptListener.h>
#include <Interpreters/Context.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Client/Connection.h>

namespace DB

21
dbms/programs/server/users.d/readonly.xml
Normal file
@@ -0,0 +1,21 @@
<?xml version="1.0"?>
<yandex>
    <profiles>
        <!-- Profile that allows only read queries. -->
        <readonly>
            <readonly>1</readonly>
        </readonly>
    </profiles>

    <users>
        <readonly>
            <password></password>
            <networks incl="networks" replace="replace">
                <ip>::1</ip>
                <ip>127.0.0.1</ip>
            </networks>
            <profile>readonly</profile>
            <quota>default</quota>
        </readonly>
    </users>
</yandex>
@@ -77,7 +77,7 @@
</default>

<!-- Example of user with readonly access. -->
<readonly>
<!-- <readonly>
<password></password>
<networks incl="networks" replace="replace">
<ip>::1</ip>
@@ -85,7 +85,7 @@
</networks>
<profile>readonly</profile>
<quota>default</quota>
</readonly>
</readonly> -->
</users>

<!-- Quotas. -->

@@ -199,8 +199,13 @@ public:
for (auto & rhs_elem : rhs_set)
{
cur_set.emplace(rhs_elem.getValue(), it, inserted);
if (inserted && it->getValue().size)
it->getValueMutable().data = arena->insert(it->getValue().data, it->getValue().size);
if (inserted)
{
if (it->getValue().size)
it->getValueMutable().data = arena->insert(it->getValue().data, it->getValue().size);
else
it->getValueMutable().data = nullptr;
}
}
}


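In the merge loop above, a newly inserted element used to keep the data pointer it was copied from whenever its size was zero, i.e. a pointer into memory owned by the other state. The new code copies non-empty values into this state's arena and stores `nullptr` for empty ones, presumably so that nothing in the merged set refers to the other state's memory after it is destroyed. A minimal sketch of that rule, with hypothetical `StringRef`, `Arena` and `mergeInto` stand-ins and a linear scan in place of ClickHouse's hash set:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <string>
#include <vector>

// Hypothetical non-owning string view (stand-in for ClickHouse's StringRef).
struct StringRef
{
    const char * data = nullptr;
    std::size_t size = 0;
};

// Hypothetical arena: owns copies of the bytes that merged values point to.
struct Arena
{
    std::deque<std::vector<char>> chunks;

    const char * insert(const char * src, std::size_t n)
    {
        chunks.emplace_back(src, src + n);
        return chunks.back().data();
    }
};

bool equals(const StringRef & a, const StringRef & b)
{
    return a.size == b.size && (a.size == 0 || std::memcmp(a.data, b.data, a.size) == 0);
}

// Merge the distinct values of rhs into cur. Newly inserted values never point
// into rhs: non-empty ones are copied into `arena`, empty ones get nullptr.
void mergeInto(std::vector<StringRef> & cur, const std::vector<StringRef> & rhs, Arena & arena)
{
    for (const StringRef & rhs_elem : rhs)
    {
        bool inserted = true;
        for (const StringRef & existing : cur)
        {
            if (equals(existing, rhs_elem))
            {
                inserted = false;
                break;
            }
        }
        if (!inserted)
            continue;

        StringRef copy = rhs_elem;
        if (copy.size)
            copy.data = arena.insert(rhs_elem.data, rhs_elem.size);
        else
            copy.data = nullptr;
        cur.push_back(copy);
    }
}
```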
@@ -268,7 +268,7 @@ public:
void merge(const AggregateFunctionHistogramData & other, UInt32 max_bins)
{
lower_bound = std::min(lower_bound, other.lower_bound);
upper_bound = std::max(lower_bound, other.upper_bound);
upper_bound = std::max(upper_bound, other.upper_bound);
for (size_t i = 0; i < other.size; i++)
add(other.points[i].mean, other.points[i].weight, max_bins);
}

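The histogram hunk fixes a one-word bug: the old line computed `std::max(lower_bound, other.upper_bound)`, so merging in a state with a narrower range could shrink `upper_bound`. A minimal sketch contrasting both versions, using a hypothetical `Bounds` struct rather than the real `AggregateFunctionHistogramData`:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the two bounds kept by the histogram state.
struct Bounds
{
    double lower;
    double upper;
};

// Old line: the upper bound is derived from the merged *lower* bound,
// so this state's own upper bound is ignored.
Bounds mergeBoundsOld(Bounds self, const Bounds & other)
{
    self.lower = std::min(self.lower, other.lower);
    self.upper = std::max(self.lower, other.upper);
    return self;
}

// New line: the merged range is the union of both ranges.
Bounds mergeBoundsNew(Bounds self, const Bounds & other)
{
    self.lower = std::min(self.lower, other.lower);
    self.upper = std::max(self.upper, other.upper);
    return self;
}
```

With this state at [0, 100] and the other at [10, 20], the old formula yields an upper bound of 20 while the fixed one keeps 100.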
@@ -18,7 +18,7 @@

#include <IO/ConnectionTimeouts.h>

#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/TablesStatus.h>

#include <Compression/ICompressionCodec.h>

@@ -6,7 +6,7 @@
#include <Common/getFQDNOrHostName.h>
#include <Common/isLocalAddress.h>
#include <Common/ProfileEvents.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>


namespace ProfileEvents

@@ -169,8 +169,6 @@ RWLockImpl::LockHolderImpl::~LockHolderImpl()
if (!parent_queue.empty())
parent_queue.front().cv.notify_all();
}

parent.reset();
}



@@ -437,10 +437,10 @@ public:
}

template <typename ResultType, typename AnsCallback>
void searchAll(
void searchAllPositions(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const AnsCallback & ansCallback,
const AnsCallback & ans_callback,
ResultType & ans)
{
const size_t haystack_string_size = haystack_offsets.size();
@@ -461,7 +461,7 @@ public:
{
const UInt8 * ptr = fallback_searchers[fallback_needles[i]].search(haystack, haystack_end);
if (ptr != haystack_end)
ans[from + fallback_needles[i]] = ansCallback(haystack, ptr);
ans[from + fallback_needles[i]] = ans_callback(haystack, ptr);
}

/// check if we have one non empty volnitsky searcher
@@ -481,7 +481,7 @@ public:
{
if (fallback_searchers[ind].compare(res))
{
ans[from + ind] = ansCallback(haystack, res);
ans[from + ind] = ans_callback(haystack, res);
}
}
}
@@ -513,6 +513,16 @@ public:
searchInternal(haystack_data, haystack_offsets, callback, ans);
}

template <typename ResultType, typename CountCharsCallback>
void searchFirstPosition(const ColumnString::Chars & haystack_data, const ColumnString::Offsets & haystack_offsets, const CountCharsCallback & count_chars_callback, ResultType & ans)
{
auto callback = [this, &count_chars_callback](const UInt8 * haystack, const UInt8 * haystack_end) -> size_t
{
return this->searchOneFirstPosition(haystack, haystack_end, count_chars_callback);
};
searchInternal(haystack_data, haystack_offsets, callback, ans);
}

private:
/**
* This function is needed to initialize hash table
@@ -582,7 +592,7 @@ private:
inline void searchInternal(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const OneSearcher & searchFallback,
const OneSearcher & search_fallback,
ResultType & ans)
{
const size_t haystack_string_size = haystack_offsets.size();
@@ -593,7 +603,7 @@ private:
{
const auto * haystack = &haystack_data[prev_offset];
const auto * haystack_end = haystack + haystack_offsets[j] - prev_offset - 1;
ans[j] = searchFallback(haystack, haystack_end);
ans[j] = search_fallback(haystack, haystack_end);
prev_offset = haystack_offsets[j];
}
}
@@ -665,6 +675,41 @@ private:
return ans + 1;
}

template <typename CountCharsCallback>
inline size_t searchOneFirstPosition(const UInt8 * haystack, const UInt8 * haystack_end, const CountCharsCallback & callback) const
{
const size_t fallback_size = fallback_needles.size();

size_t ans = std::numeric_limits<size_t>::max();

for (size_t i = 0; i < fallback_size; ++i)
if (auto pos = fallback_searchers[fallback_needles[i]].search(haystack, haystack_end); pos != haystack_end)
ans = std::min(ans, callback(haystack, pos));

/// check if we have one non empty volnitsky searcher
if (step != std::numeric_limits<size_t>::max())
{
const auto * pos = haystack + step - sizeof(VolnitskyTraits::Ngram);
for (; pos <= haystack_end - sizeof(VolnitskyTraits::Ngram); pos += step)
{
for (size_t cell_num = VolnitskyTraits::toNGram(pos) % VolnitskyTraits::hash_size; hash[cell_num].off;
cell_num = (cell_num + 1) % VolnitskyTraits::hash_size)
{
if (pos >= haystack + hash[cell_num].off - 1)
{
const auto res = pos - (hash[cell_num].off - 1);
const size_t ind = hash[cell_num].id;
if (res + needles[ind].size <= haystack_end && fallback_searchers[ind].compare(res))
ans = std::min(ans, callback(haystack, res));
}
}
}
}
if (ans == std::numeric_limits<size_t>::max())
return 0;
return ans;
}

void putNGramBase(const VolnitskyTraits::Ngram ngram, const int offset, const size_t num)
{
size_t cell_num = ngram % VolnitskyTraits::hash_size;

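The new `searchFirstPosition` / `searchOneFirstPosition` pair computes, for each haystack, the smallest position at which any needle matches: `ans` starts at `std::numeric_limits<size_t>::max()`, every match found by the fallback searchers or the Volnitsky hash table lowers it through `std::min`, and "no match" is mapped back to 0. The `count_chars_callback` turns a match pointer into a position; the sketch below assumes ClickHouse's usual 1-based byte positions and uses `std::string_view::find` instead of the Volnitsky search. `firstPositionAny` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string_view>
#include <vector>

// Hypothetical helper: 1-based byte position of the leftmost match of any needle,
// or 0 if no needle occurs (max() serves as the "not found yet" sentinel).
std::size_t firstPositionAny(std::string_view haystack, const std::vector<std::string_view> & needles)
{
    std::size_t ans = std::numeric_limits<std::size_t>::max();
    for (std::string_view needle : needles)
        if (auto pos = haystack.find(needle); pos != std::string_view::npos)
            ans = std::min(ans, pos + 1); // 0-based offset -> 1-based position
    return ans == std::numeric_limits<std::size_t>::max() ? 0 : ans;
}
```

For "hello world" with needles "world" and "lo", the matches start at positions 7 and 4, so the result is 4.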
@@ -23,6 +23,7 @@
#cmakedefine01 USE_CPUID
#cmakedefine01 USE_CPUINFO
#cmakedefine01 USE_BROTLI
#cmakedefine01 USE_SSL

#cmakedefine01 CLICKHOUSE_SPLIT_BINARY
#cmakedefine01 LLVM_HAS_RTTI

@@ -1,5 +1,8 @@
add_executable (hashes_test hashes_test.cpp)
target_link_libraries (hashes_test PRIVATE clickhouse_common_io ${OPENSSL_CRYPTO_LIBRARY} ${CITYHASH_LIBRARIES})
target_link_libraries (hashes_test PRIVATE clickhouse_common_io ${CITYHASH_LIBRARIES})
if(OPENSSL_CRYPTO_LIBRARY)
target_link_libraries (hashes_test PRIVATE ${OPENSSL_CRYPTO_LIBRARY})
endif()

add_executable (sip_hash sip_hash.cpp)
target_link_libraries (sip_hash PRIVATE clickhouse_common_io)

@@ -1,14 +1,14 @@
#include <iostream>
#include <iomanip>

#include <city.h>
#include <openssl/md5.h>

#include <Common/Stopwatch.h>

#include <Common/SipHash.h>
#include <IO/ReadBufferFromFileDescriptor.h>
#include <IO/ReadHelpers.h>
#include <Common/config.h>
#if USE_SSL
# include <openssl/md5.h>
#endif


int main(int, char **)
@@ -108,6 +108,7 @@ int main(int, char **)
<< std::endl;
}

#if USE_SSL
{
Stopwatch watch;

@@ -129,6 +130,7 @@ int main(int, char **)
<< " (" << rows / watch.elapsedSeconds() << " rows/sec., " << bytes / 1000000.0 / watch.elapsedSeconds() << " MB/sec.)"
<< std::endl;
}
#endif

return 0;
}

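The OpenSSL changes in this commit follow one pattern: CMake sets `USE_SSL` only when OpenSSL is found, `Common/config.h` exports it with `#cmakedefine01` (which always defines the macro to 0 or 1), and code that needs OpenSSL, such as the MD5 benchmark in `hashes_test.cpp`, is wrapped in `#if USE_SSL`. A minimal sketch of the consuming side; the fallback `#define` stands in for the generated header and `md5Status` is a hypothetical function:

```cpp
#include <cassert>
#include <string>

// Stand-in for the generated Common/config.h: with #cmakedefine01 the real
// header always contains either "#define USE_SSL 0" or "#define USE_SSL 1".
#ifndef USE_SSL
#define USE_SSL 0
#endif

#if USE_SSL
#include <openssl/md5.h>
#endif

// Hypothetical function: reports whether the OpenSSL-dependent code was compiled in.
std::string md5Status()
{
#if USE_SSL
    return "MD5 benchmark enabled";
#else
    return "MD5 benchmark skipped: built without OpenSSL";
#endif
}
```

Because `#cmakedefine01` always defines the macro, `#if USE_SSL` never depends on an undefined identifier silently evaluating to 0, which `-Wundef` would flag.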
@@ -1,8 +1,9 @@
#include "Settings.h"

#include <Poco/Util/AbstractConfiguration.h>
#include <Core/Field.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/Settings.h>
#include <Columns/ColumnArray.h>
#include <Common/typeid_cast.h>
#include <string.h>
@@ -1,7 +1,7 @@
#pragma once

#include "SettingsCommon.h"
#include <Core/Defines.h>
#include <Interpreters/SettingsCommon.h>


namespace Poco
@@ -127,7 +127,6 @@ struct Settings
M(SettingUInt64, max_concurrent_queries_for_user, 0, "The maximum number of concurrent requests per user.") \
\
M(SettingBool, insert_deduplicate, true, "For INSERT queries in the replicated table, specifies that deduplication of insertings blocks should be preformed") \
M(SettingBool, insert_sample_with_metadata, false, "For INSERT queries, specifies that the server need to send metadata about column defaults to the client. This will be used to calculate default expressions.") \
\
M(SettingUInt64, insert_quorum, 0, "For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled.") \
M(SettingMilliseconds, insert_quorum_timeout, 600000, "") \
@@ -153,6 +152,7 @@ struct Settings
\
M(SettingBool, input_format_skip_unknown_fields, false, "Skip columns with unknown names from input data (it works for JSONEachRow and TSKV formats).") \
M(SettingBool, input_format_import_nested_json, false, "Map nested JSON data to nested tables (it works for JSONEachRow format).") \
M(SettingBool, input_format_defaults_for_omitted_fields, false, "For input data calculate default expressions for omitted fields (it works for JSONEachRow format).") \
\
M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.") \
\
@@ -1,12 +1,11 @@
#include <Core/Field.h>
#include "SettingsCommon.h"

#include <Core/Field.h>
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <Common/FieldVisitors.h>

#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>

#include <Interpreters/SettingsCommon.h>


namespace DB
@@ -5,7 +5,7 @@
#include <DataStreams/BlockStreamProfileInfo.h>
#include <DataStreams/SizeLimits.h>
#include <IO/Progress.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
#include <Storages/TableStructureLockHolder.h>

#include <atomic>

@@ -16,7 +16,7 @@
#include <Parsers/parseQuery.h>
#include <Parsers/ParserCreateQuery.h>
#include <Interpreters/Context.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/InterpreterCreateQuery.h>
#include <IO/WriteBufferFromFile.h>
#include <IO/ReadBufferFromFile.h>

@@ -1,6 +1,6 @@
#include <Common/Exception.h>
#include <Interpreters/Context.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <DataStreams/MaterializingBlockOutputStream.h>
#include <Formats/FormatSettings.h>
#include <Formats/FormatFactory.h>

@@ -26,11 +26,11 @@
# include <IO/WriteHelpers.h>
# include <IO/copyData.h>
# include <Interpreters/castColumn.h>
# include <common/DateLUTImpl.h>
# include <ext/range.h>
# include <arrow/api.h>
# include <parquet/arrow/reader.h>
# include <parquet/file_reader.h>
# include <common/DateLUTImpl.h>
# include <ext/range.h>

namespace DB
{
@@ -223,7 +223,8 @@ void fillColumnWithDecimalData(std::shared_ptr<arrow::Column> & arrow_column, Mu
auto & chunk = static_cast<arrow::DecimalArray &>(*(arrow_column->data()->chunk(chunk_i)));
for (size_t value_i = 0, length = static_cast<size_t>(chunk.length()); value_i < length; ++value_i)
{
column_data.emplace_back(chunk.IsNull(value_i) ? Decimal128(0) : *reinterpret_cast<const Decimal128 *>(chunk.Value(value_i))); // TODO: copy column
column_data.emplace_back(
chunk.IsNull(value_i) ? Decimal128(0) : *reinterpret_cast<const Decimal128 *>(chunk.Value(value_i))); // TODO: copy column
}
}
}
@@ -259,45 +260,46 @@ void fillByteMapFromArrowColumn(std::shared_ptr<arrow::Column> & arrow_column, M

using NameToColumnPtr = std::unordered_map<std::string, std::shared_ptr<arrow::Column>>;

const std::unordered_map<arrow::Type::type, std::shared_ptr<IDataType>> arrow_type_to_internal_type = {
//{arrow::Type::DECIMAL, std::make_shared<DataTypeDecimal>()},
{arrow::Type::UINT8, std::make_shared<DataTypeUInt8>()},
{arrow::Type::INT8, std::make_shared<DataTypeInt8>()},
{arrow::Type::UINT16, std::make_shared<DataTypeUInt16>()},
{arrow::Type::INT16, std::make_shared<DataTypeInt16>()},
{arrow::Type::UINT32, std::make_shared<DataTypeUInt32>()},
{arrow::Type::INT32, std::make_shared<DataTypeInt32>()},
{arrow::Type::UINT64, std::make_shared<DataTypeUInt64>()},
{arrow::Type::INT64, std::make_shared<DataTypeInt64>()},
{arrow::Type::HALF_FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::DOUBLE, std::make_shared<DataTypeFloat64>()},

{arrow::Type::BOOL, std::make_shared<DataTypeUInt8>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDateTime>()},
{arrow::Type::DATE64, std::make_shared<DataTypeDateTime>()},
{arrow::Type::TIMESTAMP, std::make_shared<DataTypeDateTime>()},
//{arrow::Type::TIME32, std::make_shared<DataTypeDateTime>()},


{arrow::Type::STRING, std::make_shared<DataTypeString>()},
{arrow::Type::BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::FIXED_SIZE_BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::UUID, std::make_shared<DataTypeString>()},


// TODO: add other types that are convertable to internal ones:
// 0. ENUM?
// 1. UUID -> String
// 2. JSON -> String
// Full list of types: contrib/arrow/cpp/src/arrow/type.h
};


Block ParquetBlockInputStream::readImpl()
{
static const std::unordered_map<arrow::Type::type, std::shared_ptr<IDataType>> arrow_type_to_internal_type = {
//{arrow::Type::DECIMAL, std::make_shared<DataTypeDecimal>()},
{arrow::Type::UINT8, std::make_shared<DataTypeUInt8>()},
{arrow::Type::INT8, std::make_shared<DataTypeInt8>()},
{arrow::Type::UINT16, std::make_shared<DataTypeUInt16>()},
{arrow::Type::INT16, std::make_shared<DataTypeInt16>()},
{arrow::Type::UINT32, std::make_shared<DataTypeUInt32>()},
{arrow::Type::INT32, std::make_shared<DataTypeInt32>()},
{arrow::Type::UINT64, std::make_shared<DataTypeUInt64>()},
{arrow::Type::INT64, std::make_shared<DataTypeInt64>()},
{arrow::Type::HALF_FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::FLOAT, std::make_shared<DataTypeFloat32>()},
{arrow::Type::DOUBLE, std::make_shared<DataTypeFloat64>()},

{arrow::Type::BOOL, std::make_shared<DataTypeUInt8>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
{arrow::Type::DATE32, std::make_shared<DataTypeDate>()},
//{arrow::Type::DATE32, std::make_shared<DataTypeDateTime>()},
{arrow::Type::DATE64, std::make_shared<DataTypeDateTime>()},
{arrow::Type::TIMESTAMP, std::make_shared<DataTypeDateTime>()},
//{arrow::Type::TIME32, std::make_shared<DataTypeDateTime>()},


{arrow::Type::STRING, std::make_shared<DataTypeString>()},
{arrow::Type::BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::FIXED_SIZE_BINARY, std::make_shared<DataTypeString>()},
//{arrow::Type::UUID, std::make_shared<DataTypeString>()},


// TODO: add other types that are convertable to internal ones:
// 0. ENUM?
// 1. UUID -> String
// 2. JSON -> String
// Full list of types: contrib/arrow/cpp/src/arrow/type.h
};


Block res;

if (!istr.eof())
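The Parquet hunk moves `arrow_type_to_internal_type` from a namespace-scope `const` object into `ParquetBlockInputStream::readImpl()` as a function-local `static`. Such an object is constructed the first time control passes through its declaration (thread-safely since C++11) rather than during static initialization of the translation unit; the diff itself does not state the motivation. A minimal sketch of the pattern, with a hypothetical `ArrowTypeId` enum and type names as strings instead of the real `arrow::Type` and `IDataType`:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical subset of Arrow type ids (the real enum is arrow::Type::type).
enum class ArrowTypeId { UINT8, INT8, FLOAT, DOUBLE, BOOL, DATE32, STRING, BINARY, LIST };

// Name of the ClickHouse type an Arrow type maps to, or "" if unsupported.
std::string toInternalTypeName(ArrowTypeId id)
{
    // Constructed once, on the first call, and shared by all later calls.
    static const std::unordered_map<ArrowTypeId, std::string> arrow_type_to_internal_type = {
        {ArrowTypeId::UINT8, "UInt8"},
        {ArrowTypeId::INT8, "Int8"},
        {ArrowTypeId::FLOAT, "Float32"},
        {ArrowTypeId::DOUBLE, "Float64"},
        {ArrowTypeId::BOOL, "UInt8"},
        {ArrowTypeId::DATE32, "Date"},
        {ArrowTypeId::STRING, "String"},
        {ArrowTypeId::BINARY, "String"},
    };
    auto it = arrow_type_to_internal_type.find(id);
    return it == arrow_type_to_internal_type.end() ? std::string() : it->second;
}
```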
@@ -308,7 +310,9 @@ Block ParquetBlockInputStream::readImpl()
*/

if (row_group_current < row_group_total)
throw Exception{"Got new data, but data from previous chunks not readed " + std::to_string(row_group_current) + "/" + std::to_string(row_group_total), ErrorCodes::CANNOT_READ_ALL_DATA};
throw Exception{"Got new data, but data from previous chunks not readed " + std::to_string(row_group_current) + "/"
+ std::to_string(row_group_total),
ErrorCodes::CANNOT_READ_ALL_DATA};

file_data.clear();
{

@@ -20,7 +20,11 @@ target_link_libraries(clickhouse_functions
${METROHASH_LIBRARIES}
murmurhash
${BASE64_LIBRARY}
${OPENSSL_CRYPTO_LIBRARY})
)

if (OPENSSL_CRYPTO_LIBRARY)
target_link_libraries(clickhouse_functions PUBLIC ${OPENSSL_CRYPTO_LIBRARY})
endif()

target_include_directories (clickhouse_functions SYSTEM BEFORE PUBLIC ${DIVIDE_INCLUDE_DIR} ${METROHASH_INCLUDE_DIR})

@@ -60,3 +64,8 @@ if (USE_XXHASH)
target_link_libraries(clickhouse_functions PRIVATE ${XXHASH_LIBRARY})
target_include_directories(clickhouse_functions SYSTEM PRIVATE ${XXHASH_INCLUDE_DIR})
endif()

if (USE_HYPERSCAN)
target_link_libraries (clickhouse_functions PRIVATE ${HYPERSCAN_LIBRARY})
target_include_directories (clickhouse_functions SYSTEM PRIVATE ${HYPERSCAN_INCLUDE_DIR})
endif ()

@@ -547,21 +547,27 @@ class FunctionBinaryArithmetic : public IFunction
throw Exception{"Illegal column " + block.getByPosition(new_arguments[1]).column->getName()
+ " of argument of aggregation state multiply. Should be integer constant", ErrorCodes::ILLEGAL_COLUMN};

const ColumnAggregateFunction * column = typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(new_arguments[0]).column.get());
IAggregateFunction * function = column->getAggregateFunction().get();
const IColumn & agg_state_column = *block.getByPosition(new_arguments[0]).column;
bool agg_state_is_const = agg_state_column.isColumnConst();
const ColumnAggregateFunction & column = typeid_cast<const ColumnAggregateFunction &>(
agg_state_is_const ? static_cast<const ColumnConst &>(agg_state_column).getDataColumn() : agg_state_column);

AggregateFunctionPtr function = column.getAggregateFunction();

auto arena = std::make_shared<Arena>();

auto column_to = ColumnAggregateFunction::create(column->getAggregateFunction(), Arenas(1, arena));
column_to->reserve(input_rows_count);
size_t size = agg_state_is_const ? 1 : input_rows_count;

auto column_from = ColumnAggregateFunction::create(column->getAggregateFunction(), Arenas(1, arena));
column_from->reserve(input_rows_count);
auto column_to = ColumnAggregateFunction::create(function, Arenas(1, arena));
column_to->reserve(size);

for (size_t i = 0; i < input_rows_count; ++i)
auto column_from = ColumnAggregateFunction::create(function, Arenas(1, arena));
column_from->reserve(size);

for (size_t i = 0; i < size; ++i)
{
column_to->insertDefault();
column_from->insertFrom(column->getData()[i]);
column_from->insertFrom(column.getData()[i]);
}

auto & vec_to = column_to->getData();
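`FunctionBinaryArithmetic` multiplies an aggregation state by an integer constant m, i.e. merges m copies of the state. The loop in the next hunk does this by binary exponentiation: when m is odd the running copy (`vec_from`) is merged into the result (`vec_to`) and m is decremented; otherwise the copy is merged with itself and m is halved, so only O(log m) merges are needed. The change also processes a single row when the state column is constant (`size = 1`) and wraps the result back into a `ColumnConst`. A minimal sketch of that loop for one state, assuming the enclosing loop (whose header is outside the hunk) runs while m is non-zero, and using a hypothetical `multiplyState` over any state type with an associative merge:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: merge `m` copies of `state` by binary exponentiation.
// `empty` is what a default-inserted row holds (like insertDefault()), and
// merge(dst, src) folds src into dst. It must be associative and must accept
// dst and src being the same object (the "merge with itself" step).
template <typename State, typename Merge>
State multiplyState(const State & state, std::uint64_t m, const State & empty, Merge merge)
{
    State to = empty;   // plays the role of vec_to
    State from = state; // plays the role of vec_from
    while (m)
    {
        if (m % 2)
        {
            merge(to, from); // fold the current power into the result
            --m;
        }
        else
        {
            merge(from, from); // doubling: `from` now stands for twice as many copies
            m /= 2;
        }
    }
    return to;
}
```

With a sum state (merge is `+=`) and a value of 5, multiplying by 3 gives 15 and by 4 gives 20. For an idempotent state such as a set, the result equals the input state for any m > 0.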
@ -575,38 +581,55 @@ class FunctionBinaryArithmetic : public IFunction
{
if (m % 2)
{
for (size_t i = 0; i < input_rows_count; ++i)
for (size_t i = 0; i < size; ++i)
function->merge(vec_to[i], vec_from[i], arena.get());
--m;
}
else
{
for (size_t i = 0; i < input_rows_count; ++i)
for (size_t i = 0; i < size; ++i)
function->merge(vec_from[i], vec_from[i], arena.get());
m /= 2;
}
}

block.getByPosition(result).column = std::move(column_to);
if (agg_state_is_const)
block.getByPosition(result).column = ColumnConst::create(std::move(column_to), input_rows_count);
else
block.getByPosition(result).column = std::move(column_to);
}

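executeAggregateMultiply computes `state * m` by merging the state with itself using binary doubling, the same idea as exponentiation by squaring: `column_to` starts empty, `column_from` starts as a copy of the state, and each step either merges `from` into `to` (odd `m`) or merges `from` with itself (even `m`). The sketch below runs that loop on a toy state; `SumCountState`, `mergeState` and `multiplyState` are hypothetical names, whereas the real code merges through `IAggregateFunction::merge` with an arena.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an aggregate-function state (not ClickHouse's
// IAggregateFunction): a partial sum and a row count, as avg() might keep.
struct SumCountState
{
    int64_t sum = 0;
    uint64_t count = 0;
};

// Merge `rhs` into `lhs`; plays the role of IAggregateFunction::merge.
inline void mergeState(SumCountState & lhs, const SumCountState & rhs)
{
    lhs.sum += rhs.sum;
    lhs.count += rhs.count;
}

// Merge `m` copies of `state` with a number of merges that grows with log2(m).
// Invariant: to + from * m == state * (original m) at the top of every iteration.
inline SumCountState multiplyState(const SumCountState & state, uint64_t m)
{
    SumCountState to;           // empty state, like column_to->insertDefault()
    SumCountState from = state; // copy of the input, like column_from->insertFrom(...)
    while (m)
    {
        if (m % 2)
        {
            mergeState(to, from); // consume one copy of the current power
            --m;
        }
        else
        {
            const SumCountState copy = from; // avoid aliasing lhs and rhs
            mergeState(from, copy);          // double the current power
            m /= 2;
        }
    }
    return to;
}
```

For instance, multiplying a state of sum 3 over 1 row by 8 yields sum 24 over 8 rows after four merges, instead of the seven a naive loop would need.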
/// Merge two aggregation states together.
void executeAggregateAddition(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) const
{
const ColumnAggregateFunction * columns[2];
for (size_t i = 0; i < 2; ++i)
columns[i] = typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(arguments[i]).column.get());
const IColumn & lhs_column = *block.getByPosition(arguments[0]).column;
const IColumn & rhs_column = *block.getByPosition(arguments[1]).column;

auto column_to = ColumnAggregateFunction::create(columns[0]->getAggregateFunction());
column_to->reserve(input_rows_count);
bool lhs_is_const = lhs_column.isColumnConst();
bool rhs_is_const = rhs_column.isColumnConst();

for (size_t i = 0; i < input_rows_count; ++i)
const ColumnAggregateFunction & lhs = typeid_cast<const ColumnAggregateFunction &>(
lhs_is_const ? static_cast<const ColumnConst &>(lhs_column).getDataColumn() : lhs_column);
const ColumnAggregateFunction & rhs = typeid_cast<const ColumnAggregateFunction &>(
rhs_is_const ? static_cast<const ColumnConst &>(rhs_column).getDataColumn() : rhs_column);

AggregateFunctionPtr function = lhs.getAggregateFunction();

size_t size = (lhs_is_const && rhs_is_const) ? 1 : input_rows_count;

auto column_to = ColumnAggregateFunction::create(function);
column_to->reserve(size);

for (size_t i = 0; i < size; ++i)
{
column_to->insertFrom(columns[0]->getData()[i]);
column_to->insertMergeFrom(columns[1]->getData()[i]);
column_to->insertFrom(lhs.getData()[lhs_is_const ? 0 : i]);
column_to->insertMergeFrom(rhs.getData()[rhs_is_const ? 0 : i]);
}

block.getByPosition(result).column = std::move(column_to);
if (lhs_is_const && rhs_is_const)
block.getByPosition(result).column = ColumnConst::create(std::move(column_to), input_rows_count);
else
block.getByPosition(result).column = std::move(column_to);
}

void executeDateTimeIntervalPlusMinus(Block & block, const ColumnNumbers & arguments,

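Both aggregate-state functions in this diff now accept constant (ColumnConst) arguments: a constant input is always read at index 0, and when every aggregate-state input is constant only one row is computed and the result is wrapped back into a constant column. A minimal sketch of that broadcasting rule, using a hypothetical `ToyColumn` with plain integers instead of ColumnAggregateFunction and ColumnConst:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy column: when `is_const` is set, `data` holds one value that
// stands for `rows` identical rows, similar to a ColumnConst wrapping a
// one-row data column.
struct ToyColumn
{
    std::vector<long> data;
    bool is_const = false;
    size_t rows = 0;

    // Every row of a constant column maps to element 0, like the
    // `lhs_is_const ? 0 : i` indexing in executeAggregateAddition.
    long at(size_t i) const
    {
        const size_t index = is_const ? 0 : i;
        assert(index < data.size());
        return data[index];
    }
};

// Combine two columns row by row. If both inputs are constant, compute a single
// value and mark the result constant, mirroring
// `size = (lhs_is_const && rhs_is_const) ? 1 : input_rows_count`.
inline ToyColumn addColumns(const ToyColumn & lhs, const ToyColumn & rhs, size_t input_rows_count)
{
    const bool both_const = lhs.is_const && rhs.is_const;
    const size_t size = both_const ? 1 : input_rows_count;

    ToyColumn result;
    result.data.reserve(size);
    for (size_t i = 0; i < size; ++i)
        result.data.push_back(lhs.at(i) + rhs.at(i)); // stands in for insertFrom + insertMergeFrom
    result.is_const = both_const;
    result.rows = input_rows_count;
    return result;
}
```

Computing one row for an all-constant input avoids materializing `input_rows_count` identical aggregate states, which is the point of keeping the result as a constant column.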
@ -8,11 +8,13 @@ namespace DB

void registerFunctionsHashing(FunctionFactory & factory)
{
#if USE_SSL
factory.registerFunction<FunctionHalfMD5>();
factory.registerFunction<FunctionMD5>();
factory.registerFunction<FunctionSHA1>();
factory.registerFunction<FunctionSHA224>();
factory.registerFunction<FunctionSHA256>();
#endif
factory.registerFunction<FunctionSipHash64>();
factory.registerFunction<FunctionSipHash128>();
factory.registerFunction<FunctionCityHash64>();

@ -1,7 +1,5 @@
#pragma once

#include <openssl/md5.h>
#include <openssl/sha.h>
#include <city.h>
#include <farmhash.h>
#include <metrohash.h>
@ -14,7 +12,12 @@

#include <Common/config.h>
#if USE_XXHASH
#include <xxhash.h> // Y_IGNORE
# include <xxhash.h> // Y_IGNORE
#endif

#if USE_SSL
# include <openssl/md5.h>
# include <openssl/sha.h>
#endif

#include <Poco/ByteOrder.h>
@ -94,7 +97,7 @@ struct IntHash64Impl
}
};


#if USE_SSL
struct HalfMD5Impl
{
static constexpr auto name = "halfMD5";
@ -183,6 +186,7 @@ struct SHA256Impl
SHA256_Final(out_char_data, &ctx);
}
};
#endif

struct SipHash64Impl
{
@ -1076,15 +1080,18 @@ private:
struct NameIntHash32 { static constexpr auto name = "intHash32"; };
struct NameIntHash64 { static constexpr auto name = "intHash64"; };


#if USE_SSL
using FunctionHalfMD5 = FunctionAnyHash<HalfMD5Impl>;
#endif
using FunctionSipHash64 = FunctionAnyHash<SipHash64Impl>;
using FunctionIntHash32 = FunctionIntHash<IntHash32Impl, NameIntHash32>;
using FunctionIntHash64 = FunctionIntHash<IntHash64Impl, NameIntHash64>;
#if USE_SSL
using FunctionMD5 = FunctionStringHashFixedString<MD5Impl>;
using FunctionSHA1 = FunctionStringHashFixedString<SHA1Impl>;
using FunctionSHA224 = FunctionStringHashFixedString<SHA224Impl>;
using FunctionSHA256 = FunctionStringHashFixedString<SHA256Impl>;
#endif
using FunctionSipHash128 = FunctionStringHashFixedString<SipHash128Impl>;
using FunctionCityHash64 = FunctionAnyHash<ImplCityHash64>;
using FunctionFarmHash64 = FunctionAnyHash<ImplFarmHash64>;

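The hashing changes wrap the OpenSSL includes, the MD5/SHA `Impl` structs, their `using` aliases and their registration in the same `#if USE_SSL` guard, so a build without OpenSSL neither compiles nor registers those functions. A sketch of that gating pattern; the `USE_SSL` default below and `ToyRegistry` are assumptions for illustration (in the real build the macro comes from `Common/config.h`, and registration goes through FunctionFactory).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Stand-in for the USE_SSL switch that the build configuration provides.
// It defaults to 0 here; an undefined identifier inside `#if` would also be 0.
#ifndef USE_SSL
#define USE_SSL 0
#endif

// Hypothetical toy registry, not ClickHouse's FunctionFactory.
using ToyRegistry = std::map<std::string, std::function<std::string(const std::string &)>>;

inline ToyRegistry makeHashingRegistry()
{
    ToyRegistry registry;
#if USE_SSL
    // Compiled and registered only when the OpenSSL-backed implementation exists.
    registry["MD5"] = [](const std::string & s) { return "md5:" + s; };
#endif
    // Does not depend on OpenSSL, so it is always available.
    registry["sipHash64"] = [](const std::string & s) { return "sip:" + s; };
    return registry;
}
```

Keeping the implementation, the alias and the registration under one guard means the guarded code cannot be half-enabled: either all of it is built, or none of it references OpenSSL.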
@ -15,6 +15,10 @@
#include <algorithm>
#include <memory>

#ifdef __SSSE3__
# include <hs.h>
#endif

#if USE_RE2_ST
# include <re2_st/re2.h> // Y_IGNORE
#else
@ -312,7 +316,7 @@ struct PositionImpl
};

template <typename Impl>
struct MultiPositionImpl
struct MultiSearchAllPositionsImpl
{
using ResultType = UInt64;

@ -322,17 +326,31 @@ struct MultiPositionImpl
const std::vector<StringRef> & needles,
PaddedPODArray<UInt64> & res)
{
auto resCallback = [](const UInt8 * start, const UInt8 * end) -> UInt64
auto res_callback = [](const UInt8 * start, const UInt8 * end) -> UInt64
{
return 1 + Impl::countChars(reinterpret_cast<const char *>(start), reinterpret_cast<const char *>(end));
};

Impl::createMultiSearcherInBigHaystack(needles).searchAll(haystack_data, haystack_offsets, resCallback, res);
Impl::createMultiSearcherInBigHaystack(needles).searchAllPositions(haystack_data, haystack_offsets, res_callback, res);
}
};

template <typename Impl>
struct MultiSearchImpl
{
using ResultType = UInt8;

static void vector_constant(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const std::vector<StringRef> & needles,
PaddedPODArray<UInt8> & res)
{
Impl::createMultiSearcherInBigHaystack(needles).search(haystack_data, haystack_offsets, res);
}
};

template <typename Impl>
struct MultiSearchFirstPositionImpl
{
using ResultType = UInt64;

@ -342,12 +360,16 @@ struct MultiSearchImpl
const std::vector<StringRef> & needles,
PaddedPODArray<UInt64> & res)
{
Impl::createMultiSearcherInBigHaystack(needles).search(haystack_data, haystack_offsets, res);
auto res_callback = [](const UInt8 * start, const UInt8 * end) -> UInt64
{
return 1 + Impl::countChars(reinterpret_cast<const char *>(start), reinterpret_cast<const char *>(end));
};
Impl::createMultiSearcherInBigHaystack(needles).searchFirstPosition(haystack_data, haystack_offsets, res_callback, res);
}
};

template <typename Impl>
struct FirstMatchImpl
struct MultiSearchFirstIndexImpl
{
using ResultType = UInt64;

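multiSearchAllPositions and the new multiSearchFirstPosition report positions through `res_callback`, which returns `1 + Impl::countChars(start, end)`: a 1-based position counted in characters for the UTF-8 variants rather than in bytes. A self-contained sketch of that rule for a single needle; `countUTF8Chars` and `positionUTF8` are illustrative helpers, not ClickHouse functions, and the code assumes valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count UTF-8 code points in [start, end) by counting every byte that is not a
// continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8.
inline size_t countUTF8Chars(const char * start, const char * end)
{
    size_t n = 0;
    for (const char * p = start; p < end; ++p)
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80)
            ++n;
    return n;
}

// 1-based character position of the first occurrence of `needle`, or 0 if it
// does not occur: "1 + characters before the match", as in res_callback.
inline size_t positionUTF8(const std::string & haystack, const std::string & needle)
{
    const size_t byte_pos = haystack.find(needle);
    if (byte_pos == std::string::npos)
        return 0;
    return 1 + countUTF8Chars(haystack.data(), haystack.data() + byte_pos);
}
```

For the haystack `"caf\xC3\xA9 ok"` the needle `"ok"` starts at 1-based byte 7 but at character 6, because `é` occupies two bytes; that is exactly the difference between the ASCII and UTF-8 variants.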
@ -524,8 +546,8 @@ struct MatchImpl
res[i] = !revert;
else
{
const char * str_data = reinterpret_cast<const char *>(&data[i != 0 ? offsets[i - 1] : 0]);
size_t str_size = (i != 0 ? offsets[i] - offsets[i - 1] : offsets[0]) - 1;
const char * str_data = reinterpret_cast<const char *>(&data[offsets[i - 1]]);
size_t str_size = offsets[i] - offsets[i - 1] - 1;

/** Even in the case of `required_substring_is_prefix` use UNANCHORED check for regexp,
* so that it can match when `required_substring` occurs in the string several times,
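In the MatchImpl hunk the `i != 0` branch is dropped and `offsets[i - 1]` is read even for the first row. This presumably relies on the offsets being a `PaddedPODArray` whose element before index 0 reads as zero. The sketch models the string-column layout with a hypothetical `ToyStringColumn` and emulates that padding with an explicit leading zero; it uses a signed row index so that `row - 1` is a well-defined `-1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of the string column layout (illustrative, not ClickHouse's class):
// strings are concatenated in `chars`, each followed by a zero byte, and
// offsets[i] is the position just past the terminator of string i.
// The offsets live behind a leading 0, so offsets()[-1] is a valid read that
// yields 0, imitating the zero left padding the new MatchImpl code depends on.
struct ToyStringColumn
{
    std::vector<char> chars;
    std::vector<uint64_t> offsets_storage{0}; // element 0 plays the role of offsets[-1]

    const uint64_t * offsets() const { return offsets_storage.data() + 1; }
    size_t size() const { return offsets_storage.size() - 1; }

    void insert(const std::string & s)
    {
        chars.insert(chars.end(), s.begin(), s.end());
        chars.push_back('\0');
        offsets_storage.push_back(chars.size());
    }

    // Same arithmetic as the new code, with no special case for the first row.
    std::string at(size_t i) const
    {
        assert(i < size());
        const uint64_t * offsets = this->offsets();
        const ptrdiff_t row = static_cast<ptrdiff_t>(i); // signed: row - 1 == -1 for row 0
        const char * str_data = chars.data() + offsets[row - 1];
        const size_t str_size = offsets[row] - offsets[row - 1] - 1; // exclude the terminating zero
        return std::string(str_data, str_size);
    }
};
```

Removing the branch keeps the per-row work in this hot loop to two loads and a subtraction.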
@ -581,6 +603,78 @@ struct MatchImpl
};


template <typename Type, bool FindAny, bool FindAnyIndex>
struct MultiMatchAnyImpl
{
static_assert(static_cast<int>(FindAny) + static_cast<int>(FindAnyIndex) == 1);
using ResultType = Type;

static void vector_constant(
const ColumnString::Chars & haystack_data,
const ColumnString::Offsets & haystack_offsets,
const std::vector<StringRef> & needles,
PaddedPODArray<Type> & res)
{
(void)FindAny;
(void)FindAnyIndex;
#ifdef __SSSE3__
using ScratchPtr = std::unique_ptr<hs_scratch_t, DB::MultiRegexps::HyperscanDeleter<decltype(&hs_free_scratch), &hs_free_scratch>>;

const auto & hyperscan_regex = MultiRegexps::get<FindAnyIndex>(needles);
hs_scratch_t * scratch = nullptr;
hs_error_t err = hs_alloc_scratch(hyperscan_regex->get(), &scratch);
if (err != HS_SUCCESS)
throw Exception("Could not allocate scratch space for hyperscan.", ErrorCodes::CANNOT_ALLOCATE_MEMORY);
ScratchPtr smart_scratch(scratch);

auto on_match = []([[maybe_unused]] unsigned int id,
unsigned long long /* from */,
unsigned long long /* to */,
unsigned int /* flags */,
void * context) -> int
{
if constexpr (FindAnyIndex)
*reinterpret_cast<Type *>(context) = id;
else if constexpr (FindAny)
*reinterpret_cast<Type *>(context) = 1;
return 0;
};
const size_t haystack_offsets_size = haystack_offsets.size();
size_t offset = 0;
for (size_t i = 0; i < haystack_offsets_size; ++i)
{
res[i] = 0;
hs_scan(
hyperscan_regex->get(),
reinterpret_cast<const char *>(haystack_data.data()) + offset,
haystack_offsets[i] - offset - 1,
0,
smart_scratch.get(),
on_match,
&res[i]);
offset = haystack_offsets[i];
}
#else
/// Fallback for builds without SSSE3, where Hyperscan is not available
PaddedPODArray<UInt8> accum(res.size());
memset(res.data(), 0, res.size() * sizeof(res.front()));
memset(accum.data(), 0, accum.size());
for (size_t j = 0; j < needles.size(); ++j)
{
MatchImpl<false, false>::vector_constant(haystack_data, haystack_offsets, needles[j].toString(), accum);
for (size_t i = 0; i < res.size(); ++i)
{
if constexpr (FindAny)
res[i] |= accum[i];
else if (accum[i])
res[i] = j + 1;
}
}
#endif // __SSSE3__
}
};


struct ExtractImpl
{
static void vector(
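Without SSSE3, MultiMatchAnyImpl runs each pattern separately through MatchImpl and folds the per-pattern 0/1 results: `|=` for multiMatchAny and `j + 1` for multiMatchAnyIndex. The sketch reproduces that folding; `std::regex_search` stands in for re2 purely for portability (the two regex dialects differ), and `multiMatchFallback` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// Illustrative version of the non-SSSE3 fallback in MultiMatchAnyImpl.
// FindAny == true  -> multiMatchAny semantics (1 if any pattern matches, else 0)
// FindAny == false -> multiMatchAnyIndex semantics (1-based index, 0 if none)
template <bool FindAny>
std::vector<uint64_t> multiMatchFallback(const std::vector<std::string> & haystacks,
                                         const std::vector<std::string> & patterns)
{
    std::vector<uint64_t> res(haystacks.size(), 0);
    for (size_t j = 0; j < patterns.size(); ++j)
    {
        const std::regex re(patterns[j]);
        for (size_t i = 0; i < haystacks.size(); ++i)
        {
            // Unanchored search, like match(): the pattern may occur anywhere.
            const bool matched = std::regex_search(haystacks[i], re);
            if constexpr (FindAny)
                res[i] |= static_cast<uint64_t>(matched);
            else if (matched)
                res[i] = j + 1; // a later matching pattern overwrites an earlier one
        }
    }
    return res;
}
```

As in the fallback, multiMatchAnyIndex ends up with the index of the last pattern that matched in scan order. The Hyperscan callback likewise overwrites the result on every reported match, so either way the function only promises some matching index, not the first one.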
@ -1090,53 +1184,69 @@ struct NamePositionCaseInsensitiveUTF8
{
static constexpr auto name = "positionCaseInsensitiveUTF8";
};
struct NameMultiPosition
struct NameMultiSearchAllPositions
{
static constexpr auto name = "multiPosition";
static constexpr auto name = "multiSearchAllPositions";
};
struct NameMultiPositionUTF8
struct NameMultiSearchAllPositionsUTF8
{
static constexpr auto name = "multiPositionUTF8";
static constexpr auto name = "multiSearchAllPositionsUTF8";
};
struct NameMultiPositionCaseInsensitive
struct NameMultiSearchAllPositionsCaseInsensitive
{
static constexpr auto name = "multiPositionCaseInsensitive";
static constexpr auto name = "multiSearchAllPositionsCaseInsensitive";
};
struct NameMultiPositionCaseInsensitiveUTF8
struct NameMultiSearchAllPositionsCaseInsensitiveUTF8
{
static constexpr auto name = "multiPositionCaseInsensitiveUTF8";
static constexpr auto name = "multiSearchAllPositionsCaseInsensitiveUTF8";
};
struct NameMultiSearch
struct NameMultiSearchAny
{
static constexpr auto name = "multiSearch";
static constexpr auto name = "multiSearchAny";
};
struct NameMultiSearchUTF8
struct NameMultiSearchAnyUTF8
{
static constexpr auto name = "multiSearchUTF8";
static constexpr auto name = "multiSearchAnyUTF8";
};
struct NameMultiSearchCaseInsensitive
struct NameMultiSearchAnyCaseInsensitive
{
static constexpr auto name = "multiSearchCaseInsensitive";
static constexpr auto name = "multiSearchAnyCaseInsensitive";
};
struct NameMultiSearchCaseInsensitiveUTF8
struct NameMultiSearchAnyCaseInsensitiveUTF8
{
static constexpr auto name = "multiSearchCaseInsensitiveUTF8";
static constexpr auto name = "multiSearchAnyCaseInsensitiveUTF8";
};
struct NameFirstMatch
struct NameMultiSearchFirstIndex
{
static constexpr auto name = "firstMatch";
static constexpr auto name = "multiSearchFirstIndex";
};
struct NameFirstMatchUTF8
struct NameMultiSearchFirstIndexUTF8
{
static constexpr auto name = "firstMatchUTF8";
static constexpr auto name = "multiSearchFirstIndexUTF8";
};
struct NameFirstMatchCaseInsensitive
struct NameMultiSearchFirstIndexCaseInsensitive
{
static constexpr auto name = "firstMatchCaseInsensitive";
static constexpr auto name = "multiSearchFirstIndexCaseInsensitive";
};
struct NameFirstMatchCaseInsensitiveUTF8
struct NameMultiSearchFirstIndexCaseInsensitiveUTF8
{
static constexpr auto name = "firstMatchCaseInsensitiveUTF8";
static constexpr auto name = "multiSearchFirstIndexCaseInsensitiveUTF8";
};
struct NameMultiSearchFirstPosition
{
static constexpr auto name = "multiSearchFirstPosition";
};
struct NameMultiSearchFirstPositionUTF8
{
static constexpr auto name = "multiSearchFirstPositionUTF8";
};
struct NameMultiSearchFirstPositionCaseInsensitive
{
static constexpr auto name = "multiSearchFirstPositionCaseInsensitive";
};
struct NameMultiSearchFirstPositionCaseInsensitiveUTF8
{
static constexpr auto name = "multiSearchFirstPositionCaseInsensitiveUTF8";
};
struct NameMatch
{
@ -1150,6 +1260,14 @@ struct NameNotLike
{
static constexpr auto name = "notLike";
};
struct NameMultiMatchAny
{
static constexpr auto name = "multiMatchAny";
};
struct NameMultiMatchAnyIndex
{
static constexpr auto name = "multiMatchAnyIndex";
};
struct NameExtract
{
static constexpr auto name = "extract";
@ -1177,28 +1295,37 @@ using FunctionPositionCaseInsensitive = FunctionsStringSearch<PositionImpl<Posit
using FunctionPositionCaseInsensitiveUTF8
= FunctionsStringSearch<PositionImpl<PositionCaseInsensitiveUTF8>, NamePositionCaseInsensitiveUTF8>;

using FunctionMultiPosition = FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseSensitiveASCII>, NameMultiPosition>;
using FunctionMultiPositionUTF8 = FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseSensitiveUTF8>, NameMultiPositionUTF8>;
using FunctionMultiPositionCaseInsensitive
= FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseInsensitiveASCII>, NameMultiPositionCaseInsensitive>;
using FunctionMultiPositionCaseInsensitiveUTF8
= FunctionsMultiStringPosition<MultiPositionImpl<PositionCaseInsensitiveUTF8>, NameMultiPositionCaseInsensitiveUTF8>;
using FunctionMultiSearchAllPositions = FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseSensitiveASCII>, NameMultiSearchAllPositions>;
using FunctionMultiSearchAllPositionsUTF8 = FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseSensitiveUTF8>, NameMultiSearchAllPositionsUTF8>;
using FunctionMultiSearchAllPositionsCaseInsensitive
= FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseInsensitiveASCII>, NameMultiSearchAllPositionsCaseInsensitive>;
using FunctionMultiSearchAllPositionsCaseInsensitiveUTF8
= FunctionsMultiStringPosition<MultiSearchAllPositionsImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchAllPositionsCaseInsensitiveUTF8>;

using FunctionMultiSearch = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveASCII>, NameMultiSearch>;
using FunctionMultiSearchUTF8 = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveUTF8>, NameMultiSearchUTF8>;
using FunctionMultiSearch = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveASCII>, NameMultiSearchAny>;
using FunctionMultiSearchUTF8 = FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseSensitiveUTF8>, NameMultiSearchAnyUTF8>;
using FunctionMultiSearchCaseInsensitive
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveASCII>, NameMultiSearchCaseInsensitive>;
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveASCII>, NameMultiSearchAnyCaseInsensitive>;
using FunctionMultiSearchCaseInsensitiveUTF8
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchCaseInsensitiveUTF8>;
= FunctionsMultiStringSearch<MultiSearchImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchAnyCaseInsensitiveUTF8>;

using FunctionFirstMatch = FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseSensitiveASCII>, NameFirstMatch>;
using FunctionFirstMatchUTF8 = FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseSensitiveUTF8>, NameFirstMatchUTF8>;
using FunctionFirstMatchCaseInsensitive
= FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseInsensitiveASCII>, NameFirstMatchCaseInsensitive>;
using FunctionFirstMatchCaseInsensitiveUTF8
= FunctionsMultiStringSearch<FirstMatchImpl<PositionCaseInsensitiveUTF8>, NameFirstMatchCaseInsensitiveUTF8>;
using FunctionMultiSearchFirstIndex = FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseSensitiveASCII>, NameMultiSearchFirstIndex>;
using FunctionMultiSearchFirstIndexUTF8 = FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseSensitiveUTF8>, NameMultiSearchFirstIndexUTF8>;
using FunctionMultiSearchFirstIndexCaseInsensitive
= FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseInsensitiveASCII>, NameMultiSearchFirstIndexCaseInsensitive>;
using FunctionMultiSearchFirstIndexCaseInsensitiveUTF8
= FunctionsMultiStringSearch<MultiSearchFirstIndexImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchFirstIndexCaseInsensitiveUTF8>;

using FunctionMultiSearchFirstPosition = FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseSensitiveASCII>, NameMultiSearchFirstPosition>;
using FunctionMultiSearchFirstPositionUTF8 = FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseSensitiveUTF8>, NameMultiSearchFirstPositionUTF8>;
using FunctionMultiSearchFirstPositionCaseInsensitive
= FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseInsensitiveASCII>, NameMultiSearchFirstPositionCaseInsensitive>;
using FunctionMultiSearchFirstPositionCaseInsensitiveUTF8
= FunctionsMultiStringSearch<MultiSearchFirstPositionImpl<PositionCaseInsensitiveUTF8>, NameMultiSearchFirstPositionCaseInsensitiveUTF8>;

using FunctionMatch = FunctionsStringSearch<MatchImpl<false>, NameMatch>;
using FunctionMultiMatchAny = FunctionsMultiStringSearch<MultiMatchAnyImpl<UInt8, true, false>, NameMultiMatchAny, std::numeric_limits<UInt32>::max()>;
using FunctionMultiMatchAnyIndex = FunctionsMultiStringSearch<MultiMatchAnyImpl<UInt64, false, true>, NameMultiMatchAnyIndex, std::numeric_limits<UInt32>::max()>;
using FunctionLike = FunctionsStringSearch<MatchImpl<true>, NameLike>;
using FunctionNotLike = FunctionsStringSearch<MatchImpl<true, true>, NameNotLike>;
using FunctionExtract = FunctionsStringSearchToString<ExtractImpl, NameExtract>;
@ -1220,26 +1347,34 @@ void registerFunctionsStringSearch(FunctionFactory & factory)
factory.registerFunction<FunctionPositionCaseInsensitive>();
factory.registerFunction<FunctionPositionCaseInsensitiveUTF8>();

factory.registerFunction<FunctionMultiPosition>();
factory.registerFunction<FunctionMultiPositionUTF8>();
factory.registerFunction<FunctionMultiPositionCaseInsensitive>();
factory.registerFunction<FunctionMultiPositionCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMultiSearchAllPositions>();
factory.registerFunction<FunctionMultiSearchAllPositionsUTF8>();
factory.registerFunction<FunctionMultiSearchAllPositionsCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchAllPositionsCaseInsensitiveUTF8>();

factory.registerFunction<FunctionMultiSearch>();
factory.registerFunction<FunctionMultiSearchUTF8>();
factory.registerFunction<FunctionMultiSearchCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchCaseInsensitiveUTF8>();

factory.registerFunction<FunctionFirstMatch>();
factory.registerFunction<FunctionFirstMatchUTF8>();
factory.registerFunction<FunctionFirstMatchCaseInsensitive>();
factory.registerFunction<FunctionFirstMatchCaseInsensitiveUTF8>();
factory.registerFunction<FunctionMultiSearchFirstIndex>();
factory.registerFunction<FunctionMultiSearchFirstIndexUTF8>();
factory.registerFunction<FunctionMultiSearchFirstIndexCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchFirstIndexCaseInsensitiveUTF8>();

factory.registerFunction<FunctionMultiSearchFirstPosition>();
factory.registerFunction<FunctionMultiSearchFirstPositionUTF8>();
factory.registerFunction<FunctionMultiSearchFirstPositionCaseInsensitive>();
factory.registerFunction<FunctionMultiSearchFirstPositionCaseInsensitiveUTF8>();

factory.registerFunction<FunctionMatch>();
factory.registerFunction<FunctionLike>();
factory.registerFunction<FunctionNotLike>();
factory.registerFunction<FunctionExtract>();

factory.registerFunction<FunctionMultiMatchAny>();
factory.registerFunction<FunctionMultiMatchAnyIndex>();

factory.registerAlias("locate", NamePosition::name, FunctionFactory::CaseInsensitive);
factory.registerAlias("replace", NameReplaceAll::name, FunctionFactory::CaseInsensitive);
}

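The registration block keeps the `locate` and `replace` aliases registered with `FunctionFactory::CaseInsensitive`. A rough model of how such an alias can resolve, assuming case-insensitive names are stored and looked up in lower case; `ToyFunctionFactory` is hypothetical and does not reproduce FunctionFactory's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy factory: exact-case names live in `functions`, while
// case-insensitive aliases are stored under a lower-cased key.
class ToyFunctionFactory
{
public:
    void registerFunction(const std::string & name) { functions[name] = name; }

    void registerAlias(const std::string & alias, const std::string & target, bool case_insensitive)
    {
        if (case_insensitive)
            case_insensitive_names[toLower(alias)] = target;
        else
            functions[alias] = target;
    }

    // Returns the canonical function name for `name`, if any.
    std::optional<std::string> resolve(const std::string & name) const
    {
        if (auto it = functions.find(name); it != functions.end())
            return it->second;
        if (auto it = case_insensitive_names.find(toLower(name)); it != case_insensitive_names.end())
            return it->second;
        return std::nullopt;
    }

private:
    static std::string toLower(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }

    std::map<std::string, std::string> functions;              // exact-case names
    std::map<std::string, std::string> case_insensitive_names; // lower-cased keys
};
```

Checking exact names first keeps the common path a single lookup; the lower-casing cost is paid only for names that miss.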
@ -26,6 +26,8 @@ namespace DB
* notLike(haystack, pattern)
*
* match(haystack, pattern) - search by regular expression re2; Returns 0 or 1.
* multiMatchAny(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- search by regular expressions pattern_i (Hyperscan when available, re2 otherwise); Returns 1 if any pattern_i matches, otherwise 0.
* multiMatchAnyIndex(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- search by regular expressions pattern_i (Hyperscan when available, re2 otherwise); Returns the 1-based index of some matching pattern_i, or zero if none matches.
*
* Applies regexp re2 and pulls:
* - the first subpattern, if the regexp has a subpattern;
@ -39,20 +41,25 @@ namespace DB
* replaceRegexpOne(haystack, pattern, replacement) - replaces the first occurrence of the regexp pattern with the replacement.
* replaceRegexpAll(haystack, pattern, replacement) - replaces all occurrences of the regexp pattern with the replacement.
*
* multiPosition(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find first occurrences (positions) of all the const patterns inside haystack
* multiPositionUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiPositionCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiPositionCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
*
* multiSearch(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find any of the const patterns inside haystack and return 0 or 1
* multiSearchUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAllPositions(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find first occurrences (positions) of all the const patterns inside haystack
* multiSearchAllPositionsUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAllPositionsCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])

* firstMatch(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- returns the first index of the matched string or zero if nothing was found
* firstMatchUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* firstMatchCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* firstMatchCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstPosition(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- returns the leftmost position in haystack at which any of the patterns occurs, or zero if nothing was found
* multiSearchFirstPositionUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstPositionCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
*
* multiSearchAny(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- find any of the const patterns inside haystack and return 0 or 1
* multiSearchAnyUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAnyCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchAnyCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])

* multiSearchFirstIndex(haystack, [pattern_1, pattern_2, ..., pattern_n]) -- returns the first index of the matched string or zero if nothing was found
* multiSearchFirstIndexUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstIndexCaseInsensitive(haystack, [pattern_1, pattern_2, ..., pattern_n])
* multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [pattern_1, pattern_2, ..., pattern_n])
*/

namespace ErrorCodes
@ -269,9 +276,13 @@ public:
}
};

template <typename Impl, typename Name>
/// The argument limit comes from the Volnitsky searcher: for performance it is crucial to store the pattern number in a single byte.
/// Other searchers that use this class, for example multiMatchAny (Hyperscan), do not have this restriction.
template <typename Impl, typename Name, size_t LimitArgs = std::numeric_limits<UInt8>::max()>
class FunctionsMultiStringSearch : public IFunction
{
static_assert(LimitArgs > 0);

public:
static constexpr auto name = Name::name;
static FunctionPtr create(const Context &) { return std::make_shared<FunctionsMultiStringSearch>(); }
@ -282,10 +293,10 @@ public:

DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (arguments.size() + 1 >= std::numeric_limits<UInt8>::max())
if (arguments.size() + 1 >= LimitArgs)
throw Exception(
"Number of arguments for function " + getName() + " doesn't match: passed " + std::to_string(arguments.size())
+ ", should be at most 255.",
+ ", should be at most " + std::to_string(LimitArgs) + ".",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);

if (!isString(arguments[0]))
@ -333,6 +344,7 @@ public:

vec_res.resize(column_haystack_size);

/// TODO support constant_constant version
if (col_haystack_vector)
Impl::vector_constant(col_haystack_vector->getChars(), col_haystack_vector->getOffsets(), refs, vec_res);
else

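FunctionsMultiStringSearch now takes its argument limit as a defaulted non-type template parameter, so the Volnitsky-based functions keep the one-byte limit while multiMatchAny and multiMatchAnyIndex pass `std::numeric_limits<UInt32>::max()`. A minimal sketch of the same pattern; `ToyMultiSearch` is hypothetical, and unlike the original `arguments.size() + 1 >= LimitArgs` condition it checks the number of needles directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Compile-time limit on the number of needles. The default fits a pattern
// number into one byte; an instantiation without that constraint can pass a
// larger limit, as the Hyperscan-backed functions do.
template <size_t LimitArgs = std::numeric_limits<uint8_t>::max()>
struct ToyMultiSearch
{
    static_assert(LimitArgs > 0);

    static void checkNeedles(const std::vector<std::string> & needles)
    {
        if (needles.size() > LimitArgs)
            throw std::invalid_argument("Too many needles: passed " + std::to_string(needles.size())
                                        + ", should be at most " + std::to_string(LimitArgs) + ".");
    }
};
```

Because the limit is part of the type, each instantiation carries its own bound and the error message can report it without any runtime configuration.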
@ -1,19 +1,32 @@
#pragma once

#include <Common/OptimizedRegularExpression.h>
#include <Common/ObjectPool.h>
#include <Common/ProfileEvents.h>
#include <Functions/likePatternToRegexp.h>
#include <Common/ObjectPool.h>
#include <Common/OptimizedRegularExpression.h>
#include <Common/ProfileEvents.h>
#include <common/StringRef.h>

#include <memory>
#include <string>
#include <vector>

#ifdef __SSSE3__
# include <hs.h>
#endif

namespace ProfileEvents
{
extern const Event RegexpCreated;
extern const Event RegexpCreated;
}


namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_ALLOCATE_MEMORY;
extern const int LOGICAL_ERROR;
}

namespace Regexps
{
@ -21,16 +34,22 @@ namespace Regexps
using Pool = ObjectPoolMap<Regexp, String>;

template <bool like>
inline Regexp createRegexp(const std::string & pattern, int flags) { return {pattern, flags}; }
inline Regexp createRegexp(const std::string & pattern, int flags)
{
return {pattern, flags};
}

template <>
inline Regexp createRegexp<true>(const std::string & pattern, int flags) { return {likePatternToRegexp(pattern), flags}; }
inline Regexp createRegexp<true>(const std::string & pattern, int flags)
{
return {likePatternToRegexp(pattern), flags};
}

template <bool like, bool no_capture>
inline Pool::Pointer get(const std::string & pattern)
{
/// C++11 has thread-safe function-local statics on most modern compilers.
static Pool known_regexps; /// Different variables for different pattern parameters.
static Pool known_regexps; /// Different variables for different pattern parameters.

return known_regexps.get(pattern, [&pattern]
{
@ -44,4 +63,82 @@ namespace Regexps
}
}

#ifdef __SSSE3__

namespace MultiRegexps
{
template <typename Deleter, Deleter deleter>
struct HyperscanDeleter
{
template <typename T>
void operator()(T * ptr) const
{
deleter(ptr);
}
};

using Regexps = std::unique_ptr<hs_database_t, HyperscanDeleter<decltype(&hs_free_database), &hs_free_database>>;

using Pool = ObjectPoolMap<Regexps, std::vector<String>>;

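HyperscanDeleter turns a C free function into a `std::unique_ptr` deleter by passing the function pointer as a non-type template parameter; `Regexps`, `ScratchPtr` and the compile-error guard all use it. A minimal sketch of the same technique against a made-up C-style API: `toy_handle`, `toy_create` and `toy_free` are hypothetical stand-ins for a Hyperscan database and `hs_free_database`, and `CFunctionDeleter` mirrors the shape of HyperscanDeleter.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style API. A real C library would allocate with malloc and
// release in its own free function; new/delete keep this sketch well-defined C++.
struct toy_handle
{
    int value;
};

inline int toy_live_handles = 0; // number of handles not yet freed

inline toy_handle * toy_create(int value)
{
    ++toy_live_handles;
    return new toy_handle{value};
}

inline void toy_free(toy_handle * h)
{
    if (h)
        --toy_live_handles;
    delete h;
}

// The free function is baked into the deleter type, so the deleter is an
// empty object and needs no stored function pointer.
template <typename Deleter, Deleter deleter>
struct CFunctionDeleter
{
    template <typename T>
    void operator()(T * ptr) const { deleter(ptr); }
};

using ToyHandlePtr = std::unique_ptr<toy_handle, CFunctionDeleter<decltype(&toy_free), &toy_free>>;
```

Because the deleter carries no state, such a `unique_ptr` is typically the size of a raw pointer, and the handle is released automatically on every exit path, including the `throw` branches after a failed `hs_compile_multi`.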
||||
template <bool FindAnyIndex>
|
||||
inline Pool::Pointer get(const std::vector<StringRef> & patterns)
|
||||
{
|
||||
/// C++11 has thread-safe function-local statics on most modern compilers.
|
||||
static Pool known_regexps; /// Different variables for different pattern parameters.
|
||||
|
||||
std::vector<String> str_patterns;
|
||||
str_patterns.reserve(patterns.size());
|
||||
for (const StringRef & ref : patterns)
|
||||
str_patterns.push_back(ref.toString());
|
||||
|
||||
return known_regexps.get(str_patterns, [&str_patterns]
|
||||
{
|
||||
std::vector<const char *> ptrns;
|
||||
std::vector<unsigned int> flags;
|
||||
ptrns.reserve(str_patterns.size());
|
||||
flags.reserve(str_patterns.size());
|
||||
for (const StringRef ref : str_patterns)
|
||||
{
|
||||
ptrns.push_back(ref.data);
|
||||
flags.push_back(HS_FLAG_DOTALL | HS_FLAG_ALLOWEMPTY | HS_FLAG_SINGLEMATCH);
|
||||
}
|
||||
hs_database_t * db = nullptr;
|
||||
hs_compile_error_t * compile_error;
|
||||
|
||||
|
||||
std::unique_ptr<unsigned int[]> ids;
|
||||
|
||||
if constexpr (FindAnyIndex)
|
||||
{
|
||||
ids.reset(new unsigned int[ptrns.size()]);
|
||||
for (size_t i = 0; i < ptrns.size(); ++i)
|
||||
ids[i] = i + 1;
|
||||
}
|
||||
|
||||
hs_error_t err
|
||||
= hs_compile_multi(ptrns.data(), flags.data(), ids.get(), ptrns.size(), HS_MODE_BLOCK, nullptr, &db, &compile_error);
|
||||
if (err != HS_SUCCESS)
|
||||
{
|
||||
std::unique_ptr<
|
||||
hs_compile_error_t,
|
||||
HyperscanDeleter<decltype(&hs_free_compile_error), &hs_free_compile_error>> error(compile_error);
|
||||
|
||||
if (error->expression < 0)
|
||||
throw Exception(String(error->message), ErrorCodes::LOGICAL_ERROR);
|
||||
else
|
||||
throw Exception(
|
||||
"Pattern '" + str_patterns[error->expression] + "' failed with error '" + String(error->message) + "'",
|
||||
ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
|
||||
ProfileEvents::increment(ProfileEvents::RegexpCreated);
|
||||
|
||||
return new Regexps{db};
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
#endif // __SSSE3__
|
||||
|
||||
}
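The `HyperscanDeleter` template above turns a C free function such as `hs_free_database` into a stateless deleter for `std::unique_ptr`, so compiled databases and compile errors are released automatically. The following is a minimal sketch of the same pattern. It assumes a hypothetical C resource `widget_t` with `widget_create`/`widget_free` in place of the Hyperscan API; none of these names are part of ClickHouse or Hyperscan.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style resource standing in for hs_database_t.
struct widget_t { int id; };

// Counts how many times the C "free" function ran (illustration only).
inline int widgets_freed = 0;

inline widget_t * widget_create(int id) { return new widget_t{id}; }
inline void widget_free(widget_t * w) { ++widgets_freed; delete w; }

// Same shape as HyperscanDeleter: the free function is a non-type template
// parameter, so the deleter object itself carries no state.
template <typename Deleter, Deleter deleter>
struct CFunctionDeleter
{
    template <typename T>
    void operator()(T * ptr) const { deleter(ptr); }
};

// unique_ptr that calls widget_free() when it goes out of scope.
using WidgetPtr = std::unique_ptr<widget_t, CFunctionDeleter<decltype(&widget_free), &widget_free>>;
```

Because the function pointer is encoded in the type, the deleter is empty, and on common standard libraries a `WidgetPtr` is the same size as a raw pointer.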
|
||||
|
@ -43,6 +43,8 @@ public:
|
||||
return 1;
|
||||
}
|
||||
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
const DataTypeAggregateFunction * type = checkAndGetDataType<DataTypeAggregateFunction>(arguments[0].get());
|
||||
|
@ -23,12 +23,7 @@ namespace ErrorCodes
|
||||
/// where N >= 1.
|
||||
///
|
||||
/// For all 1 <= i <= N, "cond_i" has type UInt8.
|
||||
/// Types of all the branches "then_i" and "else" are either of the following:
|
||||
/// - numeric types for which there exists a common type;
|
||||
/// - dates;
|
||||
/// - dates with time;
|
||||
/// - strings;
|
||||
/// - arrays of such types.
|
||||
/// Types of all the branches "then_i" and "else" have a common type.
|
||||
///
|
||||
/// Additionally, the arguments (conditions or branches) support nullable types
|
||||
/// and the NULL value, with a NULL condition treated as false.
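The rules above can be illustrated with a hypothetical one-row model of `multiIf`. In this model a NULL condition is represented as `std::nullopt` and counts as false. It sketches only the described semantics and is not ClickHouse's columnar implementation; `multiIfRow` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical scalar model of multiIf(cond_1, then_1, ..., cond_N, then_N, else)
// for a single row. A NULL condition is std::nullopt and is treated as false.
template <typename T>
T multiIfRow(const std::vector<std::optional<bool>> & conds, const std::vector<T> & thens, const T & otherwise)
{
    if (conds.empty() || conds.size() != thens.size())
        throw std::invalid_argument("multiIf needs N >= 1 condition/branch pairs");

    // The first condition that is true (not false, not NULL) selects its branch.
    for (std::size_t i = 0; i < conds.size(); ++i)
        if (conds[i].value_or(false))
            return thens[i];

    return otherwise;
}
```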
|
||||
|
@ -1,7 +1,7 @@
|
||||
#pragma once
|
||||
|
||||
#include <Poco/Timespan.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
76
dbms/src/Interpreters/BloomFilter.cpp
Normal file
@ -0,0 +1,76 @@
|
||||
#include <Interpreters/BloomFilter.h>
|
||||
|
||||
#include <city.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
static constexpr UInt64 SEED_GEN_A = 845897321;
|
||||
static constexpr UInt64 SEED_GEN_B = 217728422;
|
||||
|
||||
|
||||
StringBloomFilter::StringBloomFilter(size_t size_, size_t hashes_, size_t seed_)
|
||||
: size(size_), hashes(hashes_), seed(seed_), words((size + sizeof(UnderType) - 1) / sizeof(UnderType)), filter(words, 0) {}
|
||||
|
||||
StringBloomFilter::StringBloomFilter(const StringBloomFilter & bloom_filter)
|
||||
: size(bloom_filter.size), hashes(bloom_filter.hashes), seed(bloom_filter.seed), words(bloom_filter.words), filter(bloom_filter.filter) {}
|
||||
|
||||
bool StringBloomFilter::find(const char * data, size_t len)
|
||||
{
|
||||
size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed);
|
||||
size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B);
|
||||
|
||||
for (size_t i = 0; i < hashes; ++i)
|
||||
{
|
||||
size_t pos = (hash1 + i * hash2 + i * i) % (8 * size);
|
||||
if (!(filter[pos / (8 * sizeof(UnderType))] & (1ULL << (pos % (8 * sizeof(UnderType))))))
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
void StringBloomFilter::add(const char * data, size_t len)
|
||||
{
|
||||
size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed);
|
||||
size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B);
|
||||
|
||||
for (size_t i = 0; i < hashes; ++i)
|
||||
{
|
||||
size_t pos = (hash1 + i * hash2 + i * i) % (8 * size);
|
||||
filter[pos / (8 * sizeof(UnderType))] |= (1ULL << (pos % (8 * sizeof(UnderType))));
|
||||
}
|
||||
}
|
||||
|
||||
void StringBloomFilter::clear()
|
||||
{
|
||||
filter.assign(words, 0);
|
||||
}
|
||||
|
||||
bool StringBloomFilter::contains(const StringBloomFilter & bf)
|
||||
{
|
||||
for (size_t i = 0; i < words; ++i)
|
||||
{
|
||||
if ((filter[i] & bf.filter[i]) != bf.filter[i])
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
UInt64 StringBloomFilter::isEmpty() const
|
||||
{
|
||||
for (size_t i = 0; i < words; ++i)
|
||||
if (filter[i] != 0)
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
bool operator== (const StringBloomFilter & a, const StringBloomFilter & b)
|
||||
{
|
||||
for (size_t i = 0; i < a.words; ++i)
|
||||
if (a.filter[i] != b.filter[i])
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
}
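`StringBloomFilter::add` and `find` above compute each of the `hashes` bit positions as `(hash1 + i * hash2 + i * i) % (8 * size)` from two seeded hashes, so each key needs only two hash evaluations. The following is a simplified, self-contained sketch of the same scheme. It assumes a seeded FNV-1a hash as a stand-in for `CityHash64WithSeed`. `SketchBloomFilter` and `seededHash` are hypothetical names; the seed constants are the ones from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Seeded 64-bit FNV-1a: a stand-in for CityHash64WithSeed (assumption).
inline uint64_t seededHash(std::string_view s, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : s)
    {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical simplified counterpart of StringBloomFilter.
class SketchBloomFilter
{
public:
    // size_bytes: filter size in bytes; hashes_: number of bit positions per key.
    SketchBloomFilter(size_t size_bytes, size_t hashes_, uint64_t seed_)
        : size(size_bytes), hashes(hashes_), seed(seed_), words((size_bytes + 7) / 8, 0) {}

    void add(std::string_view key)
    {
        const uint64_t h1 = seededHash(key, seed);
        const uint64_t h2 = seededHash(key, SEED_GEN_A * seed + SEED_GEN_B);
        for (size_t i = 0; i < hashes; ++i)
        {
            const size_t pos = (h1 + i * h2 + i * i) % (8 * size);
            words[pos / 64] |= (1ULL << (pos % 64));
        }
    }

    // May return false positives, never false negatives.
    bool find(std::string_view key) const
    {
        const uint64_t h1 = seededHash(key, seed);
        const uint64_t h2 = seededHash(key, SEED_GEN_A * seed + SEED_GEN_B);
        for (size_t i = 0; i < hashes; ++i)
        {
            const size_t pos = (h1 + i * h2 + i * i) % (8 * size);
            if (!(words[pos / 64] & (1ULL << (pos % 64))))
                return false;
        }
        return true;
    }

    // True if every bit set in `other` is also set here (same size, hashes and seed assumed).
    bool contains(const SketchBloomFilter & other) const
    {
        for (size_t i = 0; i < words.size(); ++i)
            if ((words[i] & other.words[i]) != other.words[i])
                return false;
        return true;
    }

private:
    static constexpr uint64_t SEED_GEN_A = 845897321;
    static constexpr uint64_t SEED_GEN_B = 217728422;

    size_t size;
    size_t hashes;
    uint64_t seed;
    std::vector<uint64_t> words;
};
```

As with `StringBloomFilter::contains`, the subset check is meaningful only when both filters share size, hash count and seed.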
|
50
dbms/src/Interpreters/BloomFilter.h
Normal file
@ -0,0 +1,50 @@
|
||||
#pragma once
|
||||
|
||||
#include <Core/Types.h>
|
||||
#include <vector>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/// Bloom filter for strings.
|
||||
class StringBloomFilter
|
||||
{
|
||||
public:
|
||||
using UnderType = UInt64;
|
||||
using Container = std::vector<UnderType>;
|
||||
|
||||
/// size -- size of filter in bytes.
|
||||
/// hashes -- number of hash functions to use.
|
||||
/// seed -- random seed for hash functions generation.
|
||||
StringBloomFilter(size_t size_, size_t hashes_, size_t seed_);
|
||||
StringBloomFilter(const StringBloomFilter & bloom_filter);
|
||||
|
||||
bool find(const char * data, size_t len);
|
||||
void add(const char * data, size_t len);
|
||||
void clear();
|
||||
|
||||
/// Checks if this contains everything from another bloom filter.
|
||||
/// Bloom filters must have equal size and seed.
|
||||
bool contains(const StringBloomFilter & bf);
|
||||
|
||||
const Container & getFilter() const { return filter; }
|
||||
Container & getFilter() { return filter; }
|
||||
|
||||
/// For debug.
|
||||
UInt64 isEmpty() const;
|
||||
|
||||
friend bool operator== (const StringBloomFilter & a, const StringBloomFilter & b);
|
||||
private:
|
||||
|
||||
size_t size;
|
||||
size_t hashes;
|
||||
size_t seed;
|
||||
size_t words;
|
||||
Container filter;
|
||||
};
|
||||
|
||||
|
||||
bool operator== (const StringBloomFilter & a, const StringBloomFilter & b);
|
||||
|
||||
}
|
@ -1,7 +1,7 @@
|
||||
#pragma once
|
||||
|
||||
#include <map>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Client/ConnectionPool.h>
|
||||
#include <Client/ConnectionPoolWithFailover.h>
|
||||
#include <Poco/Net/SocketAddress.h>
|
||||
|
@ -1,6 +1,6 @@
|
||||
#include <Interpreters/ClusterProxy/executeQuery.h>
|
||||
#include <Interpreters/ClusterProxy/IStreamFactory.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/Cluster.h>
|
||||
#include <Interpreters/IInterpreter.h>
|
||||
|
@ -23,7 +23,7 @@
|
||||
#include <Storages/CompressionCodecSelector.h>
|
||||
#include <TableFunctions/TableFunctionFactory.h>
|
||||
#include <Interpreters/ActionLocksManager.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Interpreters/ExpressionJIT.h>
|
||||
#include <Interpreters/RuntimeComponentsFactory.h>
|
||||
#include <Interpreters/ISecurityManager.h>
|
||||
@ -244,10 +244,18 @@ struct ContextShared
|
||||
return;
|
||||
shutdown_called = true;
|
||||
|
||||
system_logs.reset();
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
|
||||
/** After this point, system logs will shut down their threads and no longer write any data.
* This prevents recreation of system tables at shutdown.
* Note that part changes at shutdown won't be logged to the part log.
|
||||
*/
|
||||
system_logs.reset();
|
||||
}
|
||||
|
||||
/** At this point, some tables may have threads that block our mutex.
|
||||
* To complete them correctly, we will copy the current list of tables,
|
||||
* To shutdown them correctly, we will copy the current list of tables,
|
||||
* and ask them all to finish their work.
|
||||
* Then delete all objects with tables.
|
||||
*/
|
||||
@ -259,6 +267,8 @@ struct ContextShared
|
||||
current_databases = databases;
|
||||
}
|
||||
|
||||
/// We still hold "databases" in Context (instead of std::move) for Buffer tables to flush data correctly.
|
||||
|
||||
for (auto & database : current_databases)
|
||||
database.second->shutdown();
|
||||
|
||||
@ -1548,51 +1558,47 @@ Compiler & Context::getCompiler()
|
||||
void Context::initializeSystemLogs()
|
||||
{
|
||||
auto lock = getLock();
|
||||
|
||||
if (!global_context)
|
||||
throw Exception("Logical error: no global context for system logs", ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
shared->system_logs.emplace(*global_context, getConfigRef());
|
||||
}
|
||||
|
||||
|
||||
QueryLog * Context::getQueryLog()
|
||||
std::shared_ptr<QueryLog> Context::getQueryLog()
|
||||
{
|
||||
auto lock = getLock();
|
||||
|
||||
if (!shared->system_logs || !shared->system_logs->query_log)
|
||||
return nullptr;
|
||||
return {};
|
||||
|
||||
return shared->system_logs->query_log.get();
|
||||
return shared->system_logs->query_log;
|
||||
}
|
||||
|
||||
|
||||
QueryThreadLog * Context::getQueryThreadLog()
|
||||
std::shared_ptr<QueryThreadLog> Context::getQueryThreadLog()
|
||||
{
|
||||
auto lock = getLock();
|
||||
|
||||
if (!shared->system_logs || !shared->system_logs->query_thread_log)
|
||||
return nullptr;
|
||||
return {};
|
||||
|
||||
return shared->system_logs->query_thread_log.get();
|
||||
return shared->system_logs->query_thread_log;
|
||||
}
|
||||
|
||||
|
||||
PartLog * Context::getPartLog(const String & part_database)
|
||||
std::shared_ptr<PartLog> Context::getPartLog(const String & part_database)
|
||||
{
|
||||
auto lock = getLock();
|
||||
|
||||
/// No part log or system logs are shutting down.
|
||||
if (!shared->system_logs || !shared->system_logs->part_log)
|
||||
return nullptr;
|
||||
return {};
|
||||
|
||||
/// Will not log operations on system tables (including part_log itself).
|
||||
/// It doesn't make sense, and it would prevent PartLog from being destroyed correctly due to infinite logging and flushing;
/// it would also cause trouble on startup.
|
||||
if (part_database == shared->system_logs->part_log_database)
|
||||
return nullptr;
|
||||
return {};
|
||||
|
||||
return shared->system_logs->part_log.get();
|
||||
return shared->system_logs->part_log;
|
||||
}
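Switching `getQueryLog`, `getQueryThreadLog` and `getPartLog` from raw pointers to `std::shared_ptr` means that a caller keeps the log object alive even if `system_logs` is reset during shutdown. The following is a minimal sketch of that ownership pattern. The `Registry`, `Logs` and `Log` types are hypothetical stand-ins for `ContextShared`, `SystemLogs` and `QueryLog`.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <optional>

// Hypothetical stand-in for QueryLog.
struct Log
{
    int writes = 0;
    void add() { ++writes; }
};

// Hypothetical stand-in for SystemLogs.
struct Logs
{
    std::shared_ptr<Log> query_log = std::make_shared<Log>();
};

// Hypothetical stand-in for the shared part of Context.
class Registry
{
public:
    Registry() { logs.emplace(); }

    // Mirrors Context::getQueryLog: returns an empty pointer when logs are gone.
    std::shared_ptr<Log> getQueryLog()
    {
        std::lock_guard lock(mutex);
        if (!logs || !logs->query_log)
            return {};
        return logs->query_log; // Copy: the caller now shares ownership.
    }

    // Mirrors shutdown: system_logs.reset() under the lock.
    void shutdown()
    {
        std::lock_guard lock(mutex);
        logs.reset();
    }

private:
    std::mutex mutex;
    std::optional<Logs> logs;
};
```

With a raw pointer, the `log->add()` call after `shutdown()` in a scenario like the one below would dangle. With `shared_ptr`, the object is destroyed only when the last holder releases it.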
|
||||
|
||||
|
||||
|
@ -4,7 +4,7 @@
|
||||
#include <Core/NamesAndTypes.h>
|
||||
#include <Core/Types.h>
|
||||
#include <Interpreters/ClientInfo.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Parsers/IAST_fwd.h>
|
||||
#include <Common/LRUCache.h>
|
||||
#include <Common/MultiVersion.h>
|
||||
@ -402,12 +402,12 @@ public:
|
||||
void initializeSystemLogs();
|
||||
|
||||
/// Returns nullptr if the query log is not ready at this moment.
|
||||
QueryLog * getQueryLog();
|
||||
QueryThreadLog * getQueryThreadLog();
|
||||
std::shared_ptr<QueryLog> getQueryLog();
|
||||
std::shared_ptr<QueryThreadLog> getQueryThreadLog();
|
||||
|
||||
/// Returns an object used to log operations with parts, if possible.
|
||||
/// Provide the database name of the part so that the required checks can be made.
|
||||
PartLog * getPartLog(const String & part_database);
|
||||
std::shared_ptr<PartLog> getPartLog(const String & part_database);
|
||||
|
||||
const MergeTreeSettings & getMergeTreeSettings() const;
|
||||
|
||||
|
@ -3,7 +3,7 @@
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Common/config.h>
|
||||
#include <Common/SipHash.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Core/Names.h>
|
||||
#include <Core/ColumnWithTypeAndName.h>
|
||||
#include <Core/Block.h>
|
||||
|
@ -2,7 +2,7 @@
|
||||
|
||||
#include <Interpreters/ActionsVisitor.h>
|
||||
#include <Interpreters/AggregateDescription.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Interpreters/SyntaxAnalyzer.h>
|
||||
#include <Parsers/IAST_fwd.h>
|
||||
|
||||
|
@ -2,7 +2,7 @@
|
||||
|
||||
#include <string>
|
||||
#include <Core/Types.h>
|
||||
#include <Interpreters/SettingsCommon.h>
|
||||
#include <Core/SettingsCommon.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
|
@ -5,7 +5,7 @@
|
||||
#include <Parsers/ASTTablesInSelectQuery.h>
|
||||
|
||||
#include <Interpreters/AggregationCommon.h>
|
||||
#include <Interpreters/SettingsCommon.h>
|
||||
#include <Core/SettingsCommon.h>
|
||||
|
||||
#include <Common/Arena.h>
|
||||
#include <Common/ColumnsHashing.h>
|
||||
|
@ -1,5 +1,5 @@
|
||||
#include <Interpreters/LogicalExpressionsOptimizer.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTSelectQuery.h>
|
||||
|
@ -104,7 +104,7 @@ bool PartLog::addNewParts(Context & current_context, const PartLog::MutableDataP
|
||||
if (parts.empty())
|
||||
return true;
|
||||
|
||||
PartLog * part_log = nullptr;
|
||||
std::shared_ptr<PartLog> part_log;
|
||||
|
||||
try
|
||||
{
|
||||
|
@ -1,5 +1,5 @@
|
||||
#include <Interpreters/ProcessList.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/DatabaseAndTableWithAlias.h>
|
||||
#include <Parsers/ASTSelectWithUnionQuery.h>
|
||||
|
@ -1,17 +1,17 @@
|
||||
#include <Interpreters/SecurityManager.h>
|
||||
|
||||
#include "SecurityManager.h"
|
||||
#include <Poco/Net/IPAddress.h>
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
#include <Poco/String.h>
|
||||
|
||||
#include <Common/Exception.h>
|
||||
#include <IO/HexWriteBuffer.h>
|
||||
#include <IO/WriteBufferFromString.h>
|
||||
#include <IO/WriteHelpers.h>
|
||||
|
||||
#include <openssl/sha.h>
|
||||
|
||||
#include <common/logger_useful.h>
|
||||
#include <Common/config.h>
|
||||
#if USE_SSL
|
||||
# include <openssl/sha.h>
|
||||
#endif
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
@ -25,6 +25,7 @@ namespace ErrorCodes
|
||||
extern const int WRONG_PASSWORD;
|
||||
extern const int IP_ADDRESS_NOT_ALLOWED;
|
||||
extern const int BAD_ARGUMENTS;
|
||||
extern const int SUPPORT_IS_DISABLED;
|
||||
}
|
||||
|
||||
using UserPtr = SecurityManager::UserPtr;
|
||||
@ -68,6 +69,7 @@ UserPtr SecurityManager::authorizeAndGetUser(
|
||||
|
||||
if (!it->second->password_sha256_hex.empty())
|
||||
{
|
||||
#if USE_SSL
|
||||
unsigned char hash[32];
|
||||
|
||||
SHA256_CTX ctx;
|
||||
@ -86,6 +88,9 @@ UserPtr SecurityManager::authorizeAndGetUser(
|
||||
|
||||
if (hash_hex != it->second->password_sha256_hex)
|
||||
on_wrong_password();
|
||||
#else
|
||||
throw DB::Exception("SHA256 passwords support is disabled, because ClickHouse was built without SSL library", DB::ErrorCodes::SUPPORT_IS_DISABLED);
|
||||
#endif
|
||||
}
|
||||
else if (password != it->second->password)
|
||||
{
|
||||
|
@ -1,7 +1,7 @@
|
||||
#include <Interpreters/SyntaxAnalyzer.h>
|
||||
#include <Interpreters/InJoinSubqueriesPreprocessor.h>
|
||||
#include <Interpreters/LogicalExpressionsOptimizer.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Interpreters/QueryAliasesVisitor.h>
|
||||
#include <Interpreters/InterpreterSelectWithUnionQuery.h>
|
||||
#include <Interpreters/ArrayJoinedColumnsVisitor.h>
|
||||
|
@ -16,7 +16,7 @@ constexpr size_t DEFAULT_SYSTEM_LOG_FLUSH_INTERVAL_MILLISECONDS = 7500;
|
||||
|
||||
/// Creates a system log with MergeTree engine using parameters from config
|
||||
template <typename TSystemLog>
|
||||
std::unique_ptr<TSystemLog> createSystemLog(
|
||||
std::shared_ptr<TSystemLog> createSystemLog(
|
||||
Context & context,
|
||||
const String & default_database_name,
|
||||
const String & default_table_name,
|
||||
@ -33,7 +33,7 @@ std::unique_ptr<TSystemLog> createSystemLog(
|
||||
|
||||
size_t flush_interval_milliseconds = config.getUInt64(config_prefix + ".flush_interval_milliseconds", DEFAULT_SYSTEM_LOG_FLUSH_INTERVAL_MILLISECONDS);
|
||||
|
||||
return std::make_unique<TSystemLog>(context, database, table, engine, flush_interval_milliseconds);
|
||||
return std::make_shared<TSystemLog>(context, database, table, engine, flush_interval_milliseconds);
|
||||
}
|
||||
|
||||
}
|
||||
@ -49,6 +49,14 @@ SystemLogs::SystemLogs(Context & global_context, const Poco::Util::AbstractConfi
|
||||
}
|
||||
|
||||
|
||||
SystemLogs::~SystemLogs() = default;
|
||||
SystemLogs::~SystemLogs()
|
||||
{
|
||||
if (query_log)
|
||||
query_log->shutdown();
|
||||
if (query_thread_log)
|
||||
query_thread_log->shutdown();
|
||||
if (part_log)
|
||||
part_log->shutdown();
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -1,6 +1,7 @@
|
||||
#pragma once
|
||||
|
||||
#include <thread>
|
||||
#include <atomic>
|
||||
#include <boost/noncopyable.hpp>
|
||||
#include <common/logger_useful.h>
|
||||
#include <Core/Types.h>
|
||||
@ -66,9 +67,9 @@ struct SystemLogs
|
||||
SystemLogs(Context & global_context, const Poco::Util::AbstractConfiguration & config);
|
||||
~SystemLogs();
|
||||
|
||||
std::unique_ptr<QueryLog> query_log; /// Used to log queries.
|
||||
std::unique_ptr<QueryThreadLog> query_thread_log; /// Used to log query threads.
|
||||
std::unique_ptr<PartLog> part_log; /// Used to log operations with parts
|
||||
std::shared_ptr<QueryLog> query_log; /// Used to log queries.
|
||||
std::shared_ptr<QueryThreadLog> query_thread_log; /// Used to log query threads.
|
||||
std::shared_ptr<PartLog> part_log; /// Used to log operations with parts
|
||||
|
||||
String part_log_database;
|
||||
};
|
||||
@ -78,7 +79,6 @@ template <typename LogElement>
|
||||
class SystemLog : private boost::noncopyable
|
||||
{
|
||||
public:
|
||||
|
||||
using Self = SystemLog;
|
||||
|
||||
/** Parameter: table name where to write log.
|
||||
@ -103,13 +103,23 @@ public:
|
||||
*/
|
||||
void add(const LogElement & element)
|
||||
{
|
||||
if (is_shutdown)
|
||||
return;
|
||||
|
||||
/// Without tryPush we could block here if the queue overflows.
|
||||
if (!queue.tryPush({false, element}))
|
||||
LOG_ERROR(log, "SystemLog queue is full");
|
||||
}
|
||||
|
||||
/// Flush data in the buffer to disk
|
||||
void flush(bool quiet = false);
|
||||
void flush()
|
||||
{
|
||||
if (!is_shutdown)
|
||||
flushImpl(false);
|
||||
}
|
||||
|
||||
/// Stop the background flush thread before the destructor runs. No more data will be written afterwards.
|
||||
void shutdown();
|
||||
|
||||
protected:
|
||||
Context & context;
|
||||
@ -118,6 +128,7 @@ protected:
|
||||
const String storage_def;
|
||||
StoragePtr table;
|
||||
const size_t flush_interval_milliseconds;
|
||||
std::atomic<bool> is_shutdown{false};
|
||||
|
||||
using QueueItem = std::pair<bool, LogElement>; /// First element is shutdown flag for thread.
|
||||
|
||||
@ -145,6 +156,8 @@ protected:
|
||||
*/
|
||||
bool is_prepared = false;
|
||||
void prepareTable();
|
||||
|
||||
void flushImpl(bool quiet);
|
||||
};
|
||||
|
||||
|
||||
@ -166,14 +179,25 @@ SystemLog<LogElement>::SystemLog(Context & context_,
|
||||
|
||||
|
||||
template <typename LogElement>
|
||||
SystemLog<LogElement>::~SystemLog()
|
||||
void SystemLog<LogElement>::shutdown()
|
||||
{
|
||||
bool old_val = false;
|
||||
if (!is_shutdown.compare_exchange_strong(old_val, true))
|
||||
return;
|
||||
|
||||
/// Tell thread to shutdown.
|
||||
queue.push({true, {}});
|
||||
saving_thread.join();
|
||||
}
|
||||
|
||||
|
||||
template <typename LogElement>
|
||||
SystemLog<LogElement>::~SystemLog()
|
||||
{
|
||||
shutdown();
|
||||
}
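The new `SystemLog::shutdown` is idempotent. `compare_exchange_strong` lets only the first caller push the stop item and join the thread, so both an explicit `shutdown()` (for example, from `~SystemLogs`) and the destructor can call it safely. The following is a minimal sketch of that protocol. It assumes a hypothetical `BackgroundWriter` with an `int` payload in place of `LogElement`, and it omits the periodic flushing that the real class performs.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch of SystemLog's shutdown protocol: a background thread
// drains a queue until it sees an item whose first element (the stop flag) is true.
class BackgroundWriter
{
public:
    BackgroundWriter() : worker([this] { run(); }) {}
    ~BackgroundWriter() { shutdown(); }

    void add(int value)
    {
        if (is_shutdown)
            return; // Drop writes after shutdown, as SystemLog::add does.
        push({false, value});
    }

    // Safe to call more than once: only the first caller wins the CAS.
    void shutdown()
    {
        bool expected = false;
        if (!is_shutdown.compare_exchange_strong(expected, true))
            return;
        push({true, 0}); // Tell the thread to stop.
        worker.join();
    }

    size_t processedCount() const
    {
        std::lock_guard lock(mutex);
        return processed;
    }

private:
    void push(std::pair<bool, int> item)
    {
        {
            std::lock_guard lock(mutex);
            queue.push_back(item);
        }
        cv.notify_one();
    }

    void run()
    {
        std::unique_lock lock(mutex);
        while (true)
        {
            cv.wait(lock, [this] { return !queue.empty(); });
            const bool stop = queue.front().first;
            queue.pop_front();
            if (stop)
                return;
            ++processed; // A real log would buffer the element and flush it to a table.
        }
    }

    mutable std::mutex mutex;
    std::condition_variable cv;
    std::deque<std::pair<bool, int>> queue;
    std::atomic<bool> is_shutdown{false};
    size_t processed = 0;
    std::thread worker; // Declared last so it starts after the members it uses.
};
```

Because the queue is FIFO, every element pushed before the stop item is processed before the thread exits.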
|
||||
|
||||
|
||||
template <typename LogElement>
|
||||
void SystemLog<LogElement>::threadFunction()
|
||||
{
|
||||
@ -236,7 +260,7 @@ void SystemLog<LogElement>::threadFunction()
|
||||
if (milliseconds_elapsed >= flush_interval_milliseconds)
|
||||
{
|
||||
/// Write data to a table.
|
||||
flush(true);
|
||||
flushImpl(true);
|
||||
time_after_last_write.restart();
|
||||
}
|
||||
}
|
||||
@ -251,7 +275,7 @@ void SystemLog<LogElement>::threadFunction()
|
||||
|
||||
|
||||
template <typename LogElement>
|
||||
void SystemLog<LogElement>::flush(bool quiet)
|
||||
void SystemLog<LogElement>::flushImpl(bool quiet)
|
||||
{
|
||||
std::unique_lock lock(data_mutex);
|
||||
|
||||
|
@ -1,5 +1,4 @@
|
||||
#include <string.h>
|
||||
|
||||
#include <Poco/RegularExpression.h>
|
||||
#include <Poco/Net/IPAddress.h>
|
||||
#include <Poco/Net/SocketAddress.h>
|
||||
@ -7,7 +6,6 @@
|
||||
#include <Poco/Util/Application.h>
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
#include <Poco/String.h>
|
||||
|
||||
#include <Common/Exception.h>
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <IO/HexWriteBuffer.h>
|
||||
@ -16,12 +14,9 @@
|
||||
#include <Common/SimpleCache.h>
|
||||
#include <Common/StringUtils/StringUtils.h>
|
||||
#include <Interpreters/Users.h>
|
||||
|
||||
#include <openssl/sha.h>
|
||||
|
||||
#include <common/logger_useful.h>
|
||||
|
||||
#include <ext/scope_guard.h>
|
||||
#include <Common/config.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
|
@ -6,7 +6,7 @@
|
||||
#include <Parsers/queryToString.h>
|
||||
#include <Interpreters/InJoinSubqueriesPreprocessor.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Storages/IStorage.h>
|
||||
#include <Databases/IDatabase.h>
|
||||
#include <Databases/DatabaseOrdinary.h>
|
||||
|
@ -3,7 +3,7 @@
|
||||
#include <Parsers/parseQuery.h>
|
||||
#include <Parsers/queryToString.h>
|
||||
#include <Interpreters/LogicalExpressionsOptimizer.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Core/Settings.h>
|
||||
#include <Common/typeid_cast.h>
|
||||
|
||||
#include <iostream>
|
||||
|
70
dbms/src/Parsers/ASTColumnDeclaration.cpp
Normal file
@ -0,0 +1,70 @@
|
||||
#include <Parsers/ASTColumnDeclaration.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
ASTPtr ASTColumnDeclaration::clone() const
|
||||
{
|
||||
const auto res = std::make_shared<ASTColumnDeclaration>(*this);
|
||||
res->children.clear();
|
||||
|
||||
if (type)
|
||||
{
|
||||
res->type = type;
|
||||
res->children.push_back(res->type);
|
||||
}
|
||||
|
||||
if (default_expression)
|
||||
{
|
||||
res->default_expression = default_expression->clone();
|
||||
res->children.push_back(res->default_expression);
|
||||
}
|
||||
|
||||
if (codec)
|
||||
{
|
||||
res->codec = codec->clone();
|
||||
res->children.push_back(res->codec);
|
||||
}
|
||||
|
||||
if (comment)
|
||||
{
|
||||
res->comment = comment->clone();
|
||||
res->children.push_back(res->comment);
|
||||
}
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void ASTColumnDeclaration::formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
|
||||
{
|
||||
frame.need_parens = false;
|
||||
std::string indent_str = settings.one_line ? "" : std::string(4 * frame.indent, ' ');
|
||||
|
||||
settings.ostr << settings.nl_or_ws << indent_str << backQuoteIfNeed(name);
|
||||
if (type)
|
||||
{
|
||||
settings.ostr << ' ';
|
||||
type->formatImpl(settings, state, frame);
|
||||
}
|
||||
|
||||
if (default_expression)
|
||||
{
|
||||
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << default_specifier << (settings.hilite ? hilite_none : "") << ' ';
|
||||
default_expression->formatImpl(settings, state, frame);
|
||||
}
|
||||
|
||||
if (comment)
|
||||
{
|
||||
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << "COMMENT" << (settings.hilite ? hilite_none : "") << ' ';
|
||||
comment->formatImpl(settings, state, frame);
|
||||
}
|
||||
|
||||
if (codec)
|
||||
{
|
||||
settings.ostr << ' ';
|
||||
codec->formatImpl(settings, state, frame);
|
||||
}
|
||||
}
|
||||
|
||||
}
|
@ -20,68 +20,8 @@ public:
|
||||
|
||||
String getID(char delim) const override { return "ColumnDeclaration" + (delim + name); }
|
||||
|
||||
ASTPtr clone() const override
|
||||
{
|
||||
const auto res = std::make_shared<ASTColumnDeclaration>(*this);
|
||||
res->children.clear();
|
||||
|
||||
if (type)
|
||||
{
|
||||
res->type = type;
|
||||
res->children.push_back(res->type);
|
||||
}
|
||||
|
||||
if (default_expression)
|
||||
{
|
||||
res->default_expression = default_expression->clone();
|
||||
res->children.push_back(res->default_expression);
|
||||
}
|
||||
|
||||
if (codec)
|
||||
{
|
||||
res->codec = codec->clone();
|
||||
res->children.push_back(res->codec);
|
||||
}
|
||||
|
||||
if (comment)
|
||||
{
|
||||
res->comment = comment->clone();
|
||||
res->children.push_back(res->comment);
|
||||
}
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override
|
||||
{
|
||||
frame.need_parens = false;
|
||||
std::string indent_str = settings.one_line ? "" : std::string(4 * frame.indent, ' ');
|
||||
|
||||
settings.ostr << settings.nl_or_ws << indent_str << backQuoteIfNeed(name);
|
||||
if (type)
|
||||
{
|
||||
settings.ostr << ' ';
|
||||
type->formatImpl(settings, state, frame);
|
||||
}
|
||||
|
||||
if (default_expression)
|
||||
{
|
||||
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << default_specifier << (settings.hilite ? hilite_none : "") << ' ';
|
||||
default_expression->formatImpl(settings, state, frame);
|
||||
}
|
||||
|
||||
if (comment)
|
||||
{
|
||||
settings.ostr << ' ' << (settings.hilite ? hilite_keyword : "") << "COMMENT" << (settings.hilite ? hilite_none : "") << ' ';
|
||||
comment->formatImpl(settings, state, frame);
|
||||
}
|
||||
|
||||
if (codec)
|
||||
{
|
||||
settings.ostr << ' ';
|
||||
codec->formatImpl(settings, state, frame);
|
||||
}
|
||||
}
|
||||
ASTPtr clone() const override;
|
||||
void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override;
|
||||
};
|
||||
|
||||
}
|
||||
|
252
dbms/src/Parsers/ASTCreateQuery.cpp
Normal file
@ -0,0 +1,252 @@
|
||||
#include <Parsers/ASTExpressionList.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTSetQuery.h>
|
||||
#include <Parsers/ASTSelectWithUnionQuery.h>
|
||||
#include <Parsers/ASTCreateQuery.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
ASTPtr ASTStorage::clone() const
|
||||
{
|
||||
auto res = std::make_shared<ASTStorage>(*this);
|
||||
res->children.clear();
|
||||
|
||||
if (engine)
|
||||
res->set(res->engine, engine->clone());
|
||||
if (partition_by)
|
||||
res->set(res->partition_by, partition_by->clone());
|
||||
if (primary_key)
|
||||
res->set(res->primary_key, primary_key->clone());
|
||||
if (order_by)
|
||||
res->set(res->order_by, order_by->clone());
|
||||
if (sample_by)
|
||||
res->set(res->sample_by, sample_by->clone());
|
||||
|
||||
if (settings)
|
||||
res->set(res->settings, settings->clone());
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void ASTStorage::formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const
|
||||
{
|
||||
if (engine)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ENGINE" << (s.hilite ? hilite_none : "") << " = ";
|
||||
engine->formatImpl(s, state, frame);
|
||||
}
|
||||
if (partition_by)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PARTITION BY " << (s.hilite ? hilite_none : "");
|
||||
partition_by->formatImpl(s, state, frame);
|
||||
}
|
||||
if (primary_key)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PRIMARY KEY " << (s.hilite ? hilite_none : "");
|
||||
primary_key->formatImpl(s, state, frame);
|
||||
}
|
||||
if (order_by)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ORDER BY " << (s.hilite ? hilite_none : "");
|
||||
order_by->formatImpl(s, state, frame);
|
||||
}
|
||||
if (sample_by)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SAMPLE BY " << (s.hilite ? hilite_none : "");
|
||||
sample_by->formatImpl(s, state, frame);
|
||||
}
|
||||
if (settings)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SETTINGS " << (s.hilite ? hilite_none : "");
|
||||
settings->formatImpl(s, state, frame);
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
||||
class ASTColumnsElement : public IAST
|
||||
{
|
||||
public:
|
||||
String prefix;
|
||||
IAST * elem;
|
||||
|
||||
String getID(char c) const override { return "ASTColumnsElement for " + elem->getID(c); }
|
||||
|
||||
ASTPtr clone() const override;
|
||||
|
||||
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override;
|
||||
};
|
||||
|
||||
ASTPtr ASTColumnsElement::clone() const
|
||||
{
|
||||
auto res = std::make_shared<ASTColumnsElement>();
|
||||
res->prefix = prefix;
|
||||
if (elem)
|
||||
res->set(res->elem, elem->clone());
|
||||
return res;
|
||||
}
|
||||
|
||||
void ASTColumnsElement::formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const
|
||||
{
|
||||
if (!elem)
|
||||
return;
|
||||
|
||||
if (prefix.empty())
|
||||
{
|
||||
elem->formatImpl(s, state, frame);
|
||||
return;
|
||||
}
|
||||
|
||||
frame.need_parens = false;
|
||||
std::string indent_str = s.one_line ? "" : std::string(4 * frame.indent, ' ');
|
||||
|
||||
s.ostr << s.nl_or_ws << indent_str;
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << prefix << (s.hilite ? hilite_none : "");
|
||||
|
||||
FormatSettings nested_settings = s;
|
||||
nested_settings.one_line = true;
|
||||
nested_settings.nl_or_ws = ' ';
|
||||
|
||||
elem->formatImpl(nested_settings, state, frame);
|
||||
}
|
||||
|
||||
|
||||
ASTPtr ASTColumns::clone() const
|
||||
{
|
||||
auto res = std::make_shared<ASTColumns>();
|
||||
|
||||
if (columns)
|
||||
res->set(res->columns, columns->clone());
|
||||
if (indices)
|
||||
res->set(res->indices, indices->clone());
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void ASTColumns::formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const
|
||||
{
|
||||
ASTExpressionList list;
|
||||
|
||||
if (columns)
|
||||
{
|
||||
for (const auto & column : columns->children)
|
||||
{
|
||||
auto elem = std::make_shared<ASTColumnsElement>();
|
||||
elem->prefix = "";
|
||||
elem->set(elem->elem, column->clone());
|
||||
list.children.push_back(elem);
|
||||
}
|
||||
}
|
||||
if (indices)
|
||||
{
|
||||
for (const auto & index : indices->children)
|
||||
{
|
||||
auto elem = std::make_shared<ASTColumnsElement>();
|
||||
elem->prefix = "INDEX";
|
||||
elem->set(elem->elem, index->clone());
|
||||
list.children.push_back(elem);
|
||||
}
|
||||
}
|
||||
|
||||
if (!list.children.empty())
|
||||
list.formatImpl(s, state, frame);
|
||||
}
|
||||
|
||||
|
||||
ASTPtr ASTCreateQuery::clone() const
|
||||
{
|
||||
auto res = std::make_shared<ASTCreateQuery>(*this);
|
||||
res->children.clear();
|
||||
|
||||
if (columns_list)
|
||||
res->set(res->columns_list, columns_list->clone());
|
||||
if (storage)
|
||||
res->set(res->storage, storage->clone());
|
||||
if (select)
|
||||
res->set(res->select, select->clone());
|
||||
|
||||
cloneOutputOptions(*res);
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void ASTCreateQuery::formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
|
||||
{
|
||||
frame.need_parens = false;
|
||||
|
||||
if (!database.empty() && table.empty())
|
||||
{
|
||||
settings.ostr << (settings.hilite ? hilite_keyword : "")
|
||||
<< (attach ? "ATTACH DATABASE " : "CREATE DATABASE ")
|
||||
<< (if_not_exists ? "IF NOT EXISTS " : "")
|
||||
<< (settings.hilite ? hilite_none : "")
|
||||
<< backQuoteIfNeed(database);
|
||||
formatOnCluster(settings);
|
||||
|
||||
if (storage)
|
||||
storage->formatImpl(settings, state, frame);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
{
|
||||
std::string what = "TABLE";
|
||||
if (is_view)
|
||||
what = "VIEW";
|
||||
if (is_materialized_view)
|
||||
what = "MATERIALIZED VIEW";
|
||||
|
||||
settings.ostr
|
||||
<< (settings.hilite ? hilite_keyword : "")
|
||||
<< (attach ? "ATTACH " : "CREATE ")
|
||||
<< (temporary ? "TEMPORARY " : "")
|
||||
<< (replace_view ? "OR REPLACE " : "")
|
||||
<< what << " "
|
||||
<< (if_not_exists ? "IF NOT EXISTS " : "")
|
||||
<< (settings.hilite ? hilite_none : "")
|
||||
<< (!database.empty() ? backQuoteIfNeed(database) + "." : "") << backQuoteIfNeed(table);
|
||||
formatOnCluster(settings);
|
||||
}
|
||||
|
||||
if (!to_table.empty())
|
||||
{
|
||||
settings.ostr
|
||||
<< (settings.hilite ? hilite_keyword : "") << " TO " << (settings.hilite ? hilite_none : "")
|
||||
<< (!to_database.empty() ? backQuoteIfNeed(to_database) + "." : "") << backQuoteIfNeed(to_table);
|
||||
}
|
||||
|
||||
if (!as_table.empty())
|
||||
{
|
||||
settings.ostr
|
||||
<< (settings.hilite ? hilite_keyword : "") << " AS " << (settings.hilite ? hilite_none : "")
|
||||
<< (!as_database.empty() ? backQuoteIfNeed(as_database) + "." : "") << backQuoteIfNeed(as_table);
|
||||
}
|
||||
|
||||
if (columns_list)
|
||||
{
|
||||
settings.ostr << (settings.one_line ? " (" : "\n(");
|
||||
FormatStateStacked frame_nested = frame;
|
||||
++frame_nested.indent;
|
||||
columns_list->formatImpl(settings, state, frame_nested);
|
||||
settings.ostr << (settings.one_line ? ")" : "\n)");
|
||||
}
|
||||
|
||||
if (storage)
|
||||
storage->formatImpl(settings, state, frame);
|
||||
|
||||
if (is_populate)
|
||||
{
|
||||
settings.ostr << (settings.hilite ? hilite_keyword : "") << " POPULATE" << (settings.hilite ? hilite_none : "");
|
||||
}
|
||||
|
||||
if (select)
|
||||
{
|
||||
settings.ostr << (settings.hilite ? hilite_keyword : "") << " AS" << settings.nl_or_ws << (settings.hilite ? hilite_none : "");
|
||||
select->formatImpl(settings, state, frame);
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -1,9 +1,5 @@
|
||||
#pragma once
|
||||
|
||||
#include <Parsers/ASTExpressionList.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTSetQuery.h>
|
||||
#include <Parsers/ASTSelectWithUnionQuery.h>
|
||||
#include <Parsers/ASTQueryWithTableAndOutput.h>
|
||||
#include <Parsers/ASTQueryWithOnCluster.h>
|
||||
|
||||
@ -11,6 +7,9 @@
|
||||
namespace DB
|
||||
{
|
||||
|
||||
class ASTFunction;
|
||||
class ASTSetQuery;
|
||||
|
||||
class ASTStorage : public IAST
|
||||
{
|
||||
public:
|
||||
@ -23,154 +22,30 @@ public:
|
||||
|
||||
String getID(char) const override { return "Storage definition"; }
|
||||
|
||||
ASTPtr clone() const override
|
||||
{
|
||||
auto res = std::make_shared<ASTStorage>(*this);
|
||||
res->children.clear();
|
||||
ASTPtr clone() const override;
|
||||
|
||||
if (engine)
|
||||
res->set(res->engine, engine->clone());
|
||||
if (partition_by)
|
||||
res->set(res->partition_by, partition_by->clone());
|
||||
if (primary_key)
|
||||
res->set(res->primary_key, primary_key->clone());
|
||||
if (order_by)
|
||||
res->set(res->order_by, order_by->clone());
|
||||
if (sample_by)
|
||||
res->set(res->sample_by, sample_by->clone());
|
||||
|
||||
if (settings)
|
||||
res->set(res->settings, settings->clone());
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
|
||||
{
|
||||
if (engine)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ENGINE" << (s.hilite ? hilite_none : "") << " = ";
|
||||
engine->formatImpl(s, state, frame);
|
||||
}
|
||||
if (partition_by)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PARTITION BY " << (s.hilite ? hilite_none : "");
|
||||
partition_by->formatImpl(s, state, frame);
|
||||
}
|
||||
if (primary_key)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "PRIMARY KEY " << (s.hilite ? hilite_none : "");
|
||||
primary_key->formatImpl(s, state, frame);
|
||||
}
|
||||
if (order_by)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "ORDER BY " << (s.hilite ? hilite_none : "");
|
||||
order_by->formatImpl(s, state, frame);
|
||||
}
|
||||
if (sample_by)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SAMPLE BY " << (s.hilite ? hilite_none : "");
|
||||
sample_by->formatImpl(s, state, frame);
|
||||
}
|
||||
if (settings)
|
||||
{
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << "SETTINGS " << (s.hilite ? hilite_none : "");
|
||||
settings->formatImpl(s, state, frame);
|
||||
}
|
||||
|
||||
}
|
||||
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override;
|
||||
};
|
||||
|
||||
|
||||
class ASTExpressionList;
|
||||
|
||||
class ASTColumns : public IAST
|
||||
{
|
||||
private:
|
||||
class ASTColumnsElement : public IAST
|
||||
{
|
||||
public:
|
||||
String prefix;
|
||||
IAST * elem;
|
||||
|
||||
String getID(char c) const override { return "ASTColumnsElement for " + elem->getID(c); }
|
||||
|
||||
ASTPtr clone() const override
|
||||
{
|
||||
auto res = std::make_shared<ASTColumnsElement>();
|
||||
res->prefix = prefix;
|
||||
if (elem)
|
||||
res->set(res->elem, elem->clone());
|
||||
return res;
|
||||
}
|
||||
|
||||
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
|
||||
{
|
||||
if (!elem)
|
||||
return;
|
||||
|
||||
if (prefix.empty())
|
||||
{
|
||||
elem->formatImpl(s, state, frame);
|
||||
return;
|
||||
}
|
||||
|
||||
frame.need_parens = false;
|
||||
std::string indent_str = s.one_line ? "" : std::string(4 * frame.indent, ' ');
|
||||
|
||||
s.ostr << s.nl_or_ws << indent_str;
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << prefix << (s.hilite ? hilite_none : "");
|
||||
|
||||
FormatSettings nested_settings = s;
|
||||
nested_settings.one_line = true;
|
||||
nested_settings.nl_or_ws = ' ';
|
||||
|
||||
elem->formatImpl(nested_settings, state, frame);
|
||||
}
|
||||
};
|
||||
public:
|
||||
ASTExpressionList * columns = nullptr;
|
||||
ASTExpressionList * indices = nullptr;
|
||||
|
||||
String getID(char) const override { return "Columns definition"; }
|
||||
|
||||
ASTPtr clone() const override
|
||||
{
|
||||
auto res = std::make_shared<ASTColumns>();
|
||||
ASTPtr clone() const override;
|
||||
|
||||
if (columns)
|
||||
res->set(res->columns, columns->clone());
|
||||
if (indices)
|
||||
res->set(res->indices, indices->clone());
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
|
||||
{
|
||||
ASTExpressionList list;
|
||||
|
||||
if (columns)
|
||||
for (const auto & column : columns->children)
|
||||
{
|
||||
auto elem = std::make_shared<ASTColumnsElement>();
|
||||
elem->prefix = "";
|
||||
elem->set(elem->elem, column->clone());
|
||||
list.children.push_back(elem);
|
||||
}
|
||||
if (indices)
|
||||
for (const auto & index : indices->children)
|
||||
{
|
||||
auto elem = std::make_shared<ASTColumnsElement>();
|
||||
elem->prefix = "INDEX";
|
||||
elem->set(elem->elem, index->clone());
|
||||
list.children.push_back(elem);
|
||||
}
|
||||
|
||||
if (!list.children.empty())
|
||||
list.formatImpl(s, state, frame);
|
||||
}
|
||||
void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override;
|
||||
};
|
||||
|
||||
|
||||
class ASTSelectWithUnionQuery;
|
||||
|
||||
/// CREATE TABLE or ATTACH TABLE query
|
||||
class ASTCreateQuery : public ASTQueryWithTableAndOutput, public ASTQueryWithOnCluster
|
||||
{
|
||||
@ -192,22 +67,7 @@ public:
|
||||
/** Get the text that identifies this element. */
|
||||
String getID(char delim) const override { return (attach ? "AttachQuery" : "CreateQuery") + (delim + database) + delim + table; }
|
||||
|
||||
ASTPtr clone() const override
|
||||
{
|
||||
auto res = std::make_shared<ASTCreateQuery>(*this);
|
||||
res->children.clear();
|
||||
|
||||
if (columns_list)
|
||||
res->set(res->columns_list, columns_list->clone());
|
||||
if (storage)
|
||||
res->set(res->storage, storage->clone());
|
||||
if (select)
|
||||
res->set(res->select, select->clone());
|
||||
|
||||
cloneOutputOptions(*res);
|
||||
|
||||
return res;
|
||||
}
|
||||
ASTPtr clone() const override;
|
||||
|
||||
ASTPtr getRewrittenASTWithoutOnCluster(const std::string & new_database) const override
|
||||
{
|
||||
@ -215,81 +75,7 @@ public:
|
||||
}
|
||||
|
||||
protected:
|
||||
void formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override
|
||||
{
|
||||
frame.need_parens = false;
|
||||
|
||||
if (!database.empty() && table.empty())
|
||||
{
|
||||
settings.ostr << (settings.hilite ? hilite_keyword : "")
|
||||
<< (attach ? "ATTACH DATABASE " : "CREATE DATABASE ")
|
||||
<< (if_not_exists ? "IF NOT EXISTS " : "")
|
||||
<< (settings.hilite ? hilite_none : "")
|
||||
<< backQuoteIfNeed(database);
|
||||
formatOnCluster(settings);
|
||||
|
||||
if (storage)
|
||||
storage->formatImpl(settings, state, frame);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
{
|
||||
std::string what = "TABLE";
|
||||
if (is_view)
|
||||
what = "VIEW";
|
||||
if (is_materialized_view)
|
||||
what = "MATERIALIZED VIEW";
|
||||
|
||||
settings.ostr
|
||||
<< (settings.hilite ? hilite_keyword : "")
|
||||
<< (attach ? "ATTACH " : "CREATE ")
|
||||
<< (temporary ? "TEMPORARY " : "")
|
||||
<< (replace_view ? "OR REPLACE " : "")
|
||||
<< what << " "
|
||||
<< (if_not_exists ? "IF NOT EXISTS " : "")
|
||||
<< (settings.hilite ? hilite_none : "")
|
||||
<< (!database.empty() ? backQuoteIfNeed(database) + "." : "") << backQuoteIfNeed(table);
|
||||
formatOnCluster(settings);
|
||||
}
|
||||
|
||||
if (!to_table.empty())
|
||||
{
|
||||
settings.ostr
|
||||
<< (settings.hilite ? hilite_keyword : "") << " TO " << (settings.hilite ? hilite_none : "")
|
||||
<< (!to_database.empty() ? backQuoteIfNeed(to_database) + "." : "") << backQuoteIfNeed(to_table);
|
||||
}
|
||||
|
||||
if (!as_table.empty())
|
||||
{
|
||||
settings.ostr
|
||||
<< (settings.hilite ? hilite_keyword : "") << " AS " << (settings.hilite ? hilite_none : "")
|
||||
<< (!as_database.empty() ? backQuoteIfNeed(as_database) + "." : "") << backQuoteIfNeed(as_table);
|
||||
}
|
||||
|
||||
if (columns_list)
|
||||
{
|
||||
settings.ostr << (settings.one_line ? " (" : "\n(");
|
||||
FormatStateStacked frame_nested = frame;
|
||||
++frame_nested.indent;
|
||||
columns_list->formatImpl(settings, state, frame_nested);
|
||||
settings.ostr << (settings.one_line ? ")" : "\n)");
|
||||
}
|
||||
|
||||
if (storage)
|
||||
storage->formatImpl(settings, state, frame);
|
||||
|
||||
if (is_populate)
|
||||
{
|
||||
settings.ostr << (settings.hilite ? hilite_keyword : "") << " POPULATE" << (settings.hilite ? hilite_none : "");
|
||||
}
|
||||
|
||||
if (select)
|
||||
{
|
||||
settings.ostr << (settings.hilite ? hilite_keyword : "") << " AS" << settings.nl_or_ws << (settings.hilite ? hilite_none : "");
|
||||
select->formatImpl(settings, state, frame);
|
||||
}
|
||||
}
|
||||
void formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override;
|
||||
};
|
||||
|
||||
}
|
||||
|
@ -4,6 +4,8 @@
|
||||
#include <Parsers/ASTIndexDeclaration.h>
|
||||
#include <Parsers/ASTExpressionList.h>
|
||||
#include <Parsers/ASTCreateQuery.h>
|
||||
#include <Parsers/ASTSetQuery.h>
|
||||
#include <Parsers/ASTSelectWithUnionQuery.h>
|
||||
#include <Parsers/ExpressionListParsers.h>
|
||||
#include <Parsers/ParserCreateQuery.h>
|
||||
#include <Parsers/ParserSelectWithUnionQuery.h>
|
||||
|
@ -427,7 +427,7 @@ ColumnsDescription ColumnsDescription::parse(const String & str)
|
||||
|
||||
const ColumnsDescription * ColumnsDescription::loadFromContext(const Context & context, const String & db, const String & table)
|
||||
{
|
||||
if (context.getSettingsRef().insert_sample_with_metadata)
|
||||
if (context.getSettingsRef().input_format_defaults_for_omitted_fields)
|
||||
{
|
||||
if (context.isTableExist(db, table))
|
||||
{
|
||||
|
@ -1,5 +1,7 @@
|
||||
#include <Storages/Kafka/KafkaSettings.h>
|
||||
#include <Parsers/ASTCreateQuery.h>
|
||||
#include <Parsers/ASTSetQuery.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Common/Exception.h>
|
||||
|
||||
|
||||
|
@ -3,7 +3,7 @@
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
#include <Core/Defines.h>
|
||||
#include <Core/Types.h>
|
||||
#include <Interpreters/SettingsCommon.h>
|
||||
#include <Core/SettingsCommon.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
|
@ -317,7 +317,7 @@ bool KeyCondition::addCondition(const String & column, const Range & range)
|
||||
/** Computes value of constant expression and its data type.
|
||||
* Returns false if the expression isn't constant.
|
||||
*/
|
||||
static bool getConstant(const ASTPtr & expr, Block & block_with_constants, Field & out_value, DataTypePtr & out_type)
|
||||
bool KeyCondition::getConstant(const ASTPtr & expr, Block & block_with_constants, Field & out_value, DataTypePtr & out_type)
|
||||
{
|
||||
String column_name = expr->getColumnName();
|
||||
|
||||
|
@ -266,6 +266,11 @@ public:
|
||||
*/
|
||||
using MonotonicFunctionsChain = std::vector<FunctionBasePtr>;
|
||||
|
||||
/** Computes value of constant expression and its data type.
|
||||
* Returns false if the expression isn't constant.
|
||||
*/
|
||||
static bool getConstant(
|
||||
const ASTPtr & expr, Block & block_with_constants, Field & out_value, DataTypePtr & out_type);
|
||||
|
||||
static Block getBlockWithConstants(
|
||||
const ASTPtr & query, const SyntaxAnalyzerResultPtr & syntax_analyzer_result, const Context & context);
|
||||
|
710
dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.cpp
Normal file
@ -0,0 +1,710 @@
|
||||
#include <Storages/MergeTree/MergeTreeBloomFilterIndex.h>
|
||||
|
||||
#include <Common/StringUtils/StringUtils.h>
|
||||
#include <Common/UTF8Helpers.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <IO/WriteHelpers.h>
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <Interpreters/ExpressionActions.h>
|
||||
#include <Interpreters/ExpressionAnalyzer.h>
|
||||
#include <Interpreters/SyntaxAnalyzer.h>
|
||||
#include <Interpreters/QueryNormalizer.h>
|
||||
#include <Storages/MergeTree/MergeTreeData.h>
|
||||
#include <Storages/MergeTree/RPNBuilder.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Parsers/ASTLiteral.h>
|
||||
#include <Parsers/ASTSubquery.h>
|
||||
|
||||
#include <Poco/Logger.h>
|
||||
|
||||
#include <boost/algorithm/string.hpp>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int INCORRECT_QUERY;
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
|
||||
/// Adds all tokens extracted from the string to the bloom filter.
|
||||
static void stringToBloomFilter(
|
||||
const char * data, size_t size, const std::unique_ptr<ITokenExtractor> & token_extractor, StringBloomFilter & bloom_filter)
|
||||
{
|
||||
size_t cur = 0;
|
||||
size_t token_start = 0;
|
||||
size_t token_len = 0;
|
||||
while (cur < size && token_extractor->next(data, size, &cur, &token_start, &token_len))
|
||||
bloom_filter.add(data + token_start, token_len);
|
||||
}
|
||||
|
||||
/// Adds all tokens from a LIKE pattern to the bloom filter. A separate function is needed because the pattern can contain the escape sequences `\%` and `\_`.
|
||||
static void likeStringToBloomFilter(
|
||||
const String & data, const std::unique_ptr<ITokenExtractor> & token_extractor, StringBloomFilter & bloom_filter)
|
||||
{
|
||||
size_t cur = 0;
|
||||
String token;
|
||||
while (cur < data.size() && token_extractor->nextLike(data, &cur, token))
|
||||
bloom_filter.add(token.c_str(), token.size());
|
||||
}
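The two helpers above feed every token of a value into a per-column `StringBloomFilter`; at query time `mayBeTrueOnGranule` builds a filter from the query constant and asks whether the granule's filter `contains` it. The sketch below illustrates that idea under stated assumptions: `TinyBloomFilter` and `addNgrams` are hypothetical names, the hashing (`std::hash` plus double hashing) is not ClickHouse's scheme, tokens are plain ASCII n-grams, and `contains` is assumed to be a bit-subset test. A bloom filter can report false positives (an extra granule read) but never false negatives, which is what makes skipping safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical minimal bloom filter for illustration only (not ClickHouse's StringBloomFilter).
class TinyBloomFilter
{
public:
    TinyBloomFilter(std::size_t size_bytes, std::size_t hashes, std::uint64_t seed_)
        : bits(size_bytes * 8, false), num_hashes(hashes), seed(seed_) {}

    // Sets num_hashes bits for the token using double hashing: h1 + i * h2.
    void add(std::string_view token)
    {
        const std::uint64_t h1 = std::hash<std::string_view>{}(token);
        const std::uint64_t h2 = (h1 ^ (seed * 0x9E3779B97F4A7C15ULL)) | 1; // odd step
        for (std::size_t i = 0; i < num_hashes; ++i)
            bits[(h1 + i * h2) % bits.size()] = true;
    }

    // Bit-subset test: true if every bit set in `other` is also set here.
    // Both filters must share size, hash count and seed.
    bool contains(const TinyBloomFilter & other) const
    {
        for (std::size_t i = 0; i < bits.size(); ++i)
            if (other.bits[i] && !bits[i])
                return false;
        return true;
    }

private:
    std::vector<bool> bits;
    std::size_t num_hashes;
    std::uint64_t seed;
};

// Adds every n-byte substring of `data` (ASCII n-grams) to the filter.
void addNgrams(std::string_view data, std::size_t n, TinyBloomFilter & filter)
{
    for (std::size_t pos = 0; pos + n <= data.size(); ++pos)
        filter.add(data.substr(pos, n));
}
```

A granule filter built from `"hello world"` contains the filter built from `"world"`, so that granule must be read; a query whose n-grams never occurred will almost always fail the subset test and let the granule be skipped.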
|
||||
|
||||
|
||||
MergeTreeBloomFilterIndexGranule::MergeTreeBloomFilterIndexGranule(const MergeTreeBloomFilterIndex & index)
|
||||
: IMergeTreeIndexGranule()
|
||||
, index(index)
|
||||
, bloom_filters(
|
||||
index.columns.size(), StringBloomFilter(index.bloom_filter_size, index.bloom_filter_hashes, index.seed))
|
||||
, has_elems(false) {}
|
||||
|
||||
void MergeTreeBloomFilterIndexGranule::serializeBinary(WriteBuffer & ostr) const
|
||||
{
|
||||
if (empty())
|
||||
throw Exception(
|
||||
"Attempt to write empty bloom filter index `" + index.name + "`", ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
for (const auto & bloom_filter : bloom_filters)
|
||||
ostr.write(reinterpret_cast<const char *>(bloom_filter.getFilter().data()), index.bloom_filter_size);
|
||||
}
|
||||
|
||||
void MergeTreeBloomFilterIndexGranule::deserializeBinary(ReadBuffer & istr)
|
||||
{
|
||||
for (auto & bloom_filter : bloom_filters)
|
||||
{
|
||||
istr.read(reinterpret_cast<char *>(bloom_filter.getFilter().data()), index.bloom_filter_size);
|
||||
}
|
||||
has_elems = true;
|
||||
}
|
||||
|
||||
|
||||
MergeTreeBloomFilterIndexAggregator::MergeTreeBloomFilterIndexAggregator(const MergeTreeBloomFilterIndex & index)
|
||||
: index(index), granule(std::make_shared<MergeTreeBloomFilterIndexGranule>(index)) {}
|
||||
|
||||
MergeTreeIndexGranulePtr MergeTreeBloomFilterIndexAggregator::getGranuleAndReset()
|
||||
{
|
||||
auto new_granule = std::make_shared<MergeTreeBloomFilterIndexGranule>(index);
|
||||
new_granule.swap(granule);
|
||||
return new_granule;
|
||||
}
|
||||
|
||||
void MergeTreeBloomFilterIndexAggregator::update(const Block & block, size_t * pos, size_t limit)
|
||||
{
|
||||
if (*pos >= block.rows())
|
||||
throw Exception(
|
||||
"The provided position is not less than the number of block rows. Position: "
|
||||
+ toString(*pos) + ", Block rows: " + toString(block.rows()) + ".", ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
size_t rows_read = std::min(limit, block.rows() - *pos);
|
||||
|
||||
for (size_t col = 0; col < index.columns.size(); ++col)
|
||||
{
|
||||
const auto & column = block.getByName(index.columns[col]).column;
|
||||
for (size_t i = 0; i < rows_read; ++i)
|
||||
{
|
||||
auto ref = column->getDataAt(*pos + i);
|
||||
stringToBloomFilter(ref.data, ref.size, index.token_extractor_func, granule->bloom_filters[col]);
|
||||
}
|
||||
}
|
||||
granule->has_elems = true;
|
||||
*pos += rows_read;
|
||||
}
|
||||
|
||||
|
||||
const BloomFilterCondition::AtomMap BloomFilterCondition::atom_map
|
||||
{
|
||||
{
|
||||
"notEquals",
|
||||
[] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)
|
||||
{
|
||||
out.function = RPNElement::FUNCTION_NOT_EQUALS;
|
||||
out.bloom_filter = std::make_unique<StringBloomFilter>(
|
||||
idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed);
|
||||
|
||||
const auto & str = value.get<String>();
|
||||
stringToBloomFilter(str.c_str(), str.size(), idx.token_extractor_func, *out.bloom_filter);
|
||||
return true;
|
||||
}
|
||||
},
|
||||
{
|
||||
"equals",
|
||||
[] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)
|
||||
{
|
||||
out.function = RPNElement::FUNCTION_EQUALS;
|
||||
out.bloom_filter = std::make_unique<StringBloomFilter>(
|
||||
idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed);
|
||||
|
||||
const auto & str = value.get<String>();
|
||||
stringToBloomFilter(str.c_str(), str.size(), idx.token_extractor_func, *out.bloom_filter);
|
||||
return true;
|
||||
}
|
||||
},
|
||||
{
|
||||
"like",
|
||||
[] (RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)
|
||||
{
|
||||
out.function = RPNElement::FUNCTION_LIKE;
|
||||
out.bloom_filter = std::make_unique<StringBloomFilter>(
|
||||
idx.bloom_filter_size, idx.bloom_filter_hashes, idx.seed);
|
||||
|
||||
const auto & str = value.get<String>();
|
||||
likeStringToBloomFilter(str, idx.token_extractor_func, *out.bloom_filter);
|
||||
return true;
|
||||
}
|
||||
},
|
||||
{
|
||||
"notIn",
|
||||
[] (RPNElement & out, const Field &, const MergeTreeBloomFilterIndex &)
|
||||
{
|
||||
out.function = RPNElement::FUNCTION_NOT_IN;
|
||||
return true;
|
||||
}
|
||||
},
|
||||
{
|
||||
"in",
|
||||
[] (RPNElement & out, const Field &, const MergeTreeBloomFilterIndex &)
|
||||
{
|
||||
out.function = RPNElement::FUNCTION_IN;
|
||||
return true;
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
BloomFilterCondition::BloomFilterCondition(
|
||||
const SelectQueryInfo & query_info,
|
||||
const Context & context,
|
||||
const MergeTreeBloomFilterIndex & index_) : index(index_), prepared_sets(query_info.sets)
|
||||
{
|
||||
rpn = std::move(
|
||||
RPNBuilder<RPNElement>(
|
||||
query_info, context,
|
||||
[this] (const ASTPtr & node,
|
||||
const Context & /* context */,
|
||||
Block & block_with_constants,
|
||||
RPNElement & out) -> bool
|
||||
{
|
||||
return this->atomFromAST(node, block_with_constants, out);
|
||||
}).extractRPN());
|
||||
}
|
||||
|
||||
bool BloomFilterCondition::alwaysUnknownOrTrue() const
|
||||
{
|
||||
/// Evaluate the RPN the same way as KeyCondition does.
|
||||
std::vector<bool> rpn_stack;
|
||||
|
||||
for (const auto & element : rpn)
|
||||
{
|
||||
if (element.function == RPNElement::FUNCTION_UNKNOWN
|
||||
|| element.function == RPNElement::ALWAYS_TRUE)
|
||||
{
|
||||
rpn_stack.push_back(true);
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_EQUALS
|
||||
|| element.function == RPNElement::FUNCTION_NOT_EQUALS
|
||||
|| element.function == RPNElement::FUNCTION_LIKE
|
||||
|| element.function == RPNElement::FUNCTION_NOT_LIKE
|
||||
|| element.function == RPNElement::FUNCTION_IN
|
||||
|| element.function == RPNElement::FUNCTION_NOT_IN
|
||||
|| element.function == RPNElement::ALWAYS_FALSE)
|
||||
{
|
||||
rpn_stack.push_back(false);
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_NOT)
|
||||
{
|
||||
// do nothing
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_AND)
|
||||
{
|
||||
auto arg1 = rpn_stack.back();
|
||||
rpn_stack.pop_back();
|
||||
auto arg2 = rpn_stack.back();
|
||||
rpn_stack.back() = arg1 && arg2;
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_OR)
|
||||
{
|
||||
auto arg1 = rpn_stack.back();
|
||||
rpn_stack.pop_back();
|
||||
auto arg2 = rpn_stack.back();
|
||||
rpn_stack.back() = arg1 || arg2;
|
||||
}
|
||||
else
|
||||
throw Exception("Unexpected function type in BloomFilterCondition::RPNElement", ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
|
||||
return rpn_stack[0];
|
||||
}
|
||||
|
||||
bool BloomFilterCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const
|
||||
{
|
||||
std::shared_ptr<MergeTreeBloomFilterIndexGranule> granule
|
||||
= std::dynamic_pointer_cast<MergeTreeBloomFilterIndexGranule>(idx_granule);
|
||||
if (!granule)
|
||||
throw Exception(
|
||||
"BloomFilter index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
/// Evaluate the RPN the same way as KeyCondition does.
|
||||
std::vector<BoolMask> rpn_stack;
|
||||
for (const auto & element : rpn)
|
||||
{
|
||||
if (element.function == RPNElement::FUNCTION_UNKNOWN)
|
||||
{
|
||||
rpn_stack.emplace_back(true, true);
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_EQUALS
|
||||
|| element.function == RPNElement::FUNCTION_NOT_EQUALS)
|
||||
{
|
||||
rpn_stack.emplace_back(
|
||||
granule->bloom_filters[element.key_column].contains(*element.bloom_filter), true);
|
||||
|
||||
if (element.function == RPNElement::FUNCTION_NOT_EQUALS)
|
||||
rpn_stack.back() = !rpn_stack.back();
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_LIKE
|
||||
|| element.function == RPNElement::FUNCTION_NOT_LIKE)
|
||||
{
|
||||
rpn_stack.emplace_back(
|
||||
granule->bloom_filters[element.key_column].contains(*element.bloom_filter), true);
|
||||
|
||||
if (element.function == RPNElement::FUNCTION_NOT_LIKE)
|
||||
rpn_stack.back() = !rpn_stack.back();
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_IN
|
||||
|| element.function == RPNElement::FUNCTION_NOT_IN)
|
||||
{
|
||||
std::vector<bool> result(element.set_bloom_filters.back().size(), true);
|
||||
|
||||
for (size_t column = 0; column < element.set_key_position.size(); ++column)
|
||||
{
|
||||
const size_t key_idx = element.set_key_position[column];
|
||||
|
||||
const auto & bloom_filters = element.set_bloom_filters[column];
|
||||
for (size_t row = 0; row < bloom_filters.size(); ++row)
|
||||
result[row] = result[row] && granule->bloom_filters[key_idx].contains(bloom_filters[row]);
|
||||
}
|
||||
|
||||
rpn_stack.emplace_back(
|
||||
std::find(std::cbegin(result), std::cend(result), true) != std::cend(result), true);
|
||||
if (element.function == RPNElement::FUNCTION_NOT_IN)
|
||||
rpn_stack.back() = !rpn_stack.back();
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_NOT)
|
||||
{
|
||||
rpn_stack.back() = !rpn_stack.back();
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_AND)
|
||||
{
|
||||
auto arg1 = rpn_stack.back();
|
||||
rpn_stack.pop_back();
|
||||
auto arg2 = rpn_stack.back();
|
||||
rpn_stack.back() = arg1 & arg2;
|
||||
}
|
||||
else if (element.function == RPNElement::FUNCTION_OR)
|
||||
{
|
||||
auto arg1 = rpn_stack.back();
|
||||
rpn_stack.pop_back();
|
||||
auto arg2 = rpn_stack.back();
|
||||
rpn_stack.back() = arg1 | arg2;
|
||||
}
|
||||
else if (element.function == RPNElement::ALWAYS_FALSE)
|
||||
{
|
||||
rpn_stack.emplace_back(false, true);
|
||||
}
|
||||
else if (element.function == RPNElement::ALWAYS_TRUE)
|
||||
{
|
||||
rpn_stack.emplace_back(true, false);
|
||||
}
|
||||
else
|
||||
throw Exception("Unexpected function type in BloomFilterCondition::RPNElement", ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
|
||||
if (rpn_stack.size() != 1)
|
||||
throw Exception("Unexpected stack size in BloomFilterCondition::mayBeTrueOnGranule", ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
return rpn_stack[0].can_be_true;
|
||||
}
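`mayBeTrueOnGranule` above keeps, for every sub-condition, a pair "can be true / can be false" and combines the pairs with `!`, `&` and `|`. Note that an `equals` atom is pushed as `(contains, true)`, so after negation a `notEquals` atom always has `can_be_true` set: a bloom filter can never prove inequality. The sketch below re-implements that evaluation in isolation; `BoolMask` here is a hypothetical stand-in whose operators are assumed to match the semantics the code relies on, and `Op`, `Element` and `mayBeTrue` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: which truth values a condition may take on some row of a granule.
struct BoolMask
{
    bool can_be_true = false;
    bool can_be_false = false;

    BoolMask() = default;
    BoolMask(bool can_be_true_, bool can_be_false_) : can_be_true(can_be_true_), can_be_false(can_be_false_) {}

    BoolMask operator&(const BoolMask & m) const { return {can_be_true && m.can_be_true, can_be_false || m.can_be_false}; }
    BoolMask operator|(const BoolMask & m) const { return {can_be_true || m.can_be_true, can_be_false && m.can_be_false}; }
    BoolMask operator!() const { return {can_be_false, can_be_true}; }
};

enum class Op { Atom, Not, And, Or };

struct Element
{
    Op op;
    BoolMask mask; // used only when op == Op::Atom
};

// Evaluates a well-formed condition in reverse Polish notation and reports
// whether the granule may contain a matching row.
bool mayBeTrue(const std::vector<Element> & rpn)
{
    std::vector<BoolMask> stack;
    for (const auto & element : rpn)
    {
        switch (element.op)
        {
            case Op::Atom:
                stack.push_back(element.mask);
                break;
            case Op::Not:
                stack.back() = !stack.back();
                break;
            case Op::And:
            case Op::Or:
            {
                BoolMask arg1 = stack.back();
                stack.pop_back();
                stack.back() = element.op == Op::And ? (arg1 & stack.back()) : (arg1 | stack.back());
                break;
            }
        }
    }
    if (stack.size() != 1)
        throw std::logic_error("Unexpected stack size");
    return stack[0].can_be_true;
}
```

With an atom whose tokens are missing from the granule, `(false, true)`, the granule can be skipped on its own or under `AND`, but not under `OR` with an unknown condition or under `NOT`.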
|
||||
|
||||
bool BloomFilterCondition::getKey(const ASTPtr & node, size_t & key_column_num)
|
||||
{
|
||||
auto it = std::find(index.columns.begin(), index.columns.end(), node->getColumnName());
|
||||
if (it == index.columns.end())
|
||||
return false;
|
||||
|
||||
key_column_num = static_cast<size_t>(it - index.columns.begin());
|
||||
return true;
|
||||
}
|
||||
|
||||
bool BloomFilterCondition::atomFromAST(
|
||||
const ASTPtr & node, Block & block_with_constants, RPNElement & out)
|
||||
{
|
||||
Field const_value;
|
||||
DataTypePtr const_type;
|
||||
if (const auto * func = typeid_cast<const ASTFunction *>(node.get()))
|
||||
{
|
||||
const ASTs & args = typeid_cast<const ASTExpressionList &>(*func->arguments).children;
|
||||
|
||||
if (args.size() != 2)
|
||||
return false;
|
||||
|
||||
size_t key_arg_pos; /// Position of argument with key column (non-const argument)
|
||||
size_t key_column_num = -1; /// Position of the key column in index.columns
|
||||
|
||||
if (functionIsInOrGlobalInOperator(func->name) && tryPrepareSetBloomFilter(args, out))
|
||||
{
|
||||
key_arg_pos = 0;
|
||||
}
|
||||
else if (KeyCondition::getConstant(args[1], block_with_constants, const_value, const_type) && getKey(args[0], key_column_num))
|
||||
{
|
||||
key_arg_pos = 0;
|
||||
}
|
||||
else if (KeyCondition::getConstant(args[0], block_with_constants, const_value, const_type) && getKey(args[1], key_column_num))
|
||||
{
|
||||
key_arg_pos = 1;
|
||||
}
|
||||
else
|
||||
return false;
|
||||
|
||||
if (const_type && const_type->getTypeId() != TypeIndex::String && const_type->getTypeId() != TypeIndex::FixedString)
|
||||
return false;
|
||||
|
||||
if (key_arg_pos == 1 && (func->name != "equals" && func->name != "notEquals"))
|
||||
return false;
|
||||
else if (!index.token_extractor_func->supportLike() && (func->name == "like" || func->name == "notLike"))
|
||||
return false;
|
||||
else
|
||||
key_arg_pos = 0;
|
||||
|
||||
const auto atom_it = atom_map.find(func->name);
|
||||
if (atom_it == std::end(atom_map))
|
||||
return false;
|
||||
|
||||
out.key_column = key_column_num;
|
||||
return atom_it->second(out, const_value, index);
|
||||
}
|
||||
else if (KeyCondition::getConstant(node, block_with_constants, const_value, const_type))
|
||||
{
|
||||
/// Handle a constant condition the same way as KeyCondition does.
|
||||
if (const_value.getType() == Field::Types::UInt64
|
||||
|| const_value.getType() == Field::Types::Int64
|
||||
|| const_value.getType() == Field::Types::Float64)
|
||||
{
|
||||
/// Zero in all types is represented in memory the same way as in UInt64.
|
||||
out.function = const_value.get<UInt64>()
|
||||
? RPNElement::ALWAYS_TRUE
|
||||
: RPNElement::ALWAYS_FALSE;
|
||||
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
bool BloomFilterCondition::tryPrepareSetBloomFilter(
|
||||
const ASTs & args,
|
||||
RPNElement & out)
|
||||
{
|
||||
const ASTPtr & left_arg = args[0];
|
||||
const ASTPtr & right_arg = args[1];
|
||||
|
||||
std::vector<KeyTuplePositionMapping> key_tuple_mapping;
|
||||
DataTypes data_types;
|
||||
|
||||
const auto * left_arg_tuple = typeid_cast<const ASTFunction *>(left_arg.get());
|
||||
if (left_arg_tuple && left_arg_tuple->name == "tuple")
|
||||
{
|
||||
const auto & tuple_elements = left_arg_tuple->arguments->children;
|
||||
for (size_t i = 0; i < tuple_elements.size(); ++i)
|
||||
{
|
||||
size_t key = 0;
|
||||
if (getKey(tuple_elements[i], key))
|
||||
{
|
||||
key_tuple_mapping.emplace_back(i, key);
|
||||
data_types.push_back(index.data_types[key]);
|
||||
}
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
size_t key = 0;
|
||||
if (getKey(left_arg, key))
|
||||
{
|
||||
key_tuple_mapping.emplace_back(0, key);
|
||||
data_types.push_back(index.data_types[key]);
|
||||
}
|
||||
}
|
||||
|
||||
if (key_tuple_mapping.empty())
|
||||
return false;
|
||||
|
||||
PreparedSetKey set_key;
|
||||
if (typeid_cast<const ASTSubquery *>(right_arg.get()) || typeid_cast<const ASTIdentifier *>(right_arg.get()))
|
||||
set_key = PreparedSetKey::forSubquery(*right_arg);
|
||||
else
|
||||
set_key = PreparedSetKey::forLiteral(*right_arg, data_types);
|
||||
|
||||
auto set_it = prepared_sets.find(set_key);
|
||||
if (set_it == prepared_sets.end())
|
||||
return false;
|
||||
|
||||
const SetPtr & prepared_set = set_it->second;
|
||||
if (!prepared_set->hasExplicitSetElements())
|
||||
return false;
|
||||
|
||||
for (const auto & data_type : prepared_set->getDataTypes())
|
||||
if (data_type->getTypeId() != TypeIndex::String && data_type->getTypeId() != TypeIndex::FixedString)
|
||||
return false;
|
||||
|
||||
std::vector<std::vector<StringBloomFilter>> bloom_filters;
|
||||
std::vector<size_t> key_position;
|
||||
|
||||
const auto & columns = prepared_set->getSetElements();
|
||||
for (size_t col = 0; col < key_tuple_mapping.size(); ++col)
|
||||
{
|
||||
bloom_filters.emplace_back();
|
||||
key_position.push_back(key_tuple_mapping[col].key_index);
|
||||
|
||||
size_t tuple_idx = key_tuple_mapping[col].tuple_index;
|
||||
const auto & column = columns[tuple_idx];
|
||||
for (size_t row = 0; row < prepared_set->getTotalRowCount(); ++row)
|
||||
{
|
||||
bloom_filters.back().emplace_back(index.bloom_filter_size, index.bloom_filter_hashes, index.seed);
|
||||
auto ref = column->getDataAt(row);
|
||||
stringToBloomFilter(ref.data, ref.size, index.token_extractor_func, bloom_filters.back().back());
|
||||
}
|
||||
}
|
||||
|
||||
out.set_key_position = std::move(key_position);
|
||||
out.set_bloom_filters = std::move(bloom_filters);
|
||||
|
||||
return true;
|
||||
}
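`tryPrepareSetBloomFilter` stores, for each key column of the `IN` tuple, one filter per row of the prepared set; `mayBeTrueOnGranule` then keeps a row only if every mapped column's granule filter contains that row's filter, and reports a possible match if any row survives (for `NOT IN` the resulting mask is negated, which leaves `can_be_true` set, so that case never prunes). The sketch below shows only this row/column logic. As a simplifying assumption it replaces each bloom filter with an exact `std::set` of whole values; `GranuleColumn` and `inMayMatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for one index column of a granule: the exact set of values
// seen (a real bloom filter may also answer "maybe" for absent values).
using GranuleColumn = std::set<std::string>;

// key_position[c]: index column that the c-th IN-tuple element maps to.
// set_values[c][r]: value of the c-th tuple element in row r of the IN set.
bool inMayMatch(const std::vector<GranuleColumn> & granule,
                const std::vector<std::size_t> & key_position,
                const std::vector<std::vector<std::string>> & set_values)
{
    if (set_values.empty())
        return false;

    // A set row stays possible only if every mapped column may contain its value.
    std::vector<bool> row_possible(set_values.back().size(), true);
    for (std::size_t c = 0; c < key_position.size(); ++c)
    {
        const GranuleColumn & column = granule[key_position[c]];
        for (std::size_t r = 0; r < set_values[c].size(); ++r)
            if (column.count(set_values[c][r]) == 0)
                row_possible[r] = false;
    }

    // The granule may match if at least one row of the set may match.
    for (bool possible : row_possible)
        if (possible)
            return true;
    return false;
}
```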
|
||||
|
||||
|
||||
MergeTreeIndexGranulePtr MergeTreeBloomFilterIndex::createIndexGranule() const
|
||||
{
|
||||
return std::make_shared<MergeTreeBloomFilterIndexGranule>(*this);
|
||||
}
|
||||
|
||||
MergeTreeIndexAggregatorPtr MergeTreeBloomFilterIndex::createIndexAggregator() const
|
||||
{
|
||||
return std::make_shared<MergeTreeBloomFilterIndexAggregator>(*this);
|
||||
}
|
||||
|
||||
IndexConditionPtr MergeTreeBloomFilterIndex::createIndexCondition(
|
||||
const SelectQueryInfo & query, const Context & context) const
|
||||
{
|
||||
return std::make_shared<BloomFilterCondition>(query, context, *this);
|
||||
};
|
||||
|
||||
bool MergeTreeBloomFilterIndex::mayBenefitFromIndexForIn(const ASTPtr & node) const
|
||||
{
|
||||
return std::find(std::cbegin(columns), std::cend(columns), node->getColumnName()) != std::cend(columns);
|
||||
}
|
||||
|
||||
|
||||
bool NgramTokenExtractor::next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const
|
||||
{
|
||||
*token_start = *pos;
|
||||
*token_len = 0;
|
||||
size_t code_points = 0;
|
||||
for (; code_points < n && *token_start + *token_len < len; ++code_points)
|
||||
{
|
||||
size_t sz = UTF8::seqLength(static_cast<UInt8>(data[*token_start + *token_len]));
|
||||
*token_len += sz;
|
||||
}
|
||||
*pos += UTF8::seqLength(static_cast<UInt8>(data[*pos]));
|
||||
return code_points == n;
|
||||
}
|
||||
|
||||
bool NgramTokenExtractor::nextLike(const String & str, size_t * pos, String & token) const
|
||||
{
|
||||
token.clear();
|
||||
|
||||
size_t code_points = 0;
|
||||
bool escaped = false;
|
||||
for (size_t i = *pos; i < str.size();)
|
||||
{
|
||||
if (escaped && (str[i] == '%' || str[i] == '_' || str[i] == '\\'))
|
||||
{
|
||||
token += str[i];
|
||||
++code_points;
|
||||
escaped = false;
|
||||
++i;
|
||||
}
|
||||
else if (!escaped && (str[i] == '%' || str[i] == '_'))
|
||||
{
|
||||
/// This token is too small, go to the next.
|
||||
token.clear();
|
||||
code_points = 0;
|
||||
escaped = false;
|
||||
*pos = ++i;
|
||||
}
|
||||
else if (!escaped && str[i] == '\\')
|
||||
{
|
||||
escaped = true;
|
||||
++i;
|
||||
}
|
||||
else
|
||||
{
|
||||
const size_t sz = UTF8::seqLength(static_cast<UInt8>(str[i]));
|
||||
for (size_t j = 0; j < sz; ++j)
|
||||
token += str[i + j];
|
||||
i += sz;
|
||||
++code_points;
|
||||
escaped = false;
|
||||
}
|
||||
|
||||
if (code_points == n)
|
||||
{
|
||||
*pos += UTF8::seqLength(static_cast<UInt8>(str[*pos]));
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
bool SplitTokenExtractor::next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const
|
||||
{
|
||||
*token_start = *pos;
|
||||
*token_len = 0;
|
||||
while (*pos < len)
|
||||
{
|
||||
if (isASCII(data[*pos]) && !isAlphaNumericASCII(data[*pos]))
|
||||
{
|
||||
if (*token_len > 0)
|
||||
return true;
|
||||
*token_start = ++*pos;
|
||||
}
|
||||
else
|
||||
{
|
||||
const size_t sz = UTF8::seqLength(static_cast<UInt8>(data[*pos]));
|
||||
*pos += sz;
|
||||
*token_len += sz;
|
||||
}
|
||||
}
|
||||
return *token_len > 0;
|
||||
}
|
||||
|
||||
bool SplitTokenExtractor::nextLike(const String & str, size_t * pos, String & token) const
|
||||
{
|
||||
token.clear();
|
||||
bool bad_token = false; // % or _ before token
|
||||
bool escaped = false;
|
||||
while (*pos < str.size())
|
||||
{
|
||||
if (!escaped && (str[*pos] == '%' || str[*pos] == '_'))
|
||||
{
|
||||
token.clear();
|
||||
bad_token = true;
|
||||
++*pos;
|
||||
}
|
||||
else if (!escaped && str[*pos] == '\\')
|
||||
{
|
||||
escaped = true;
|
||||
++*pos;
|
||||
}
|
||||
else if (isASCII(str[*pos]) && !isAlphaNumericASCII(str[*pos]))
|
||||
{
|
||||
if (!bad_token && !token.empty())
|
||||
return true;
|
||||
|
||||
token.clear();
|
||||
bad_token = false;
|
||||
escaped = false;
|
||||
++*pos;
|
||||
}
|
||||
else
|
||||
{
|
||||
const size_t sz = UTF8::seqLength(static_cast<UInt8>(str[*pos]));
|
||||
for (size_t j = 0; j < sz; ++j)
|
||||
{
|
||||
token += str[*pos];
|
||||
++*pos;
|
||||
}
|
||||
escaped = false;
|
||||
}
|
||||
}
|
||||
|
||||
return !bad_token && !token.empty();
|
||||
}
|
||||
|
||||
|
||||
std::unique_ptr<IMergeTreeIndex> bloomFilterIndexCreator(
|
||||
const NamesAndTypesList & new_columns,
|
||||
std::shared_ptr<ASTIndexDeclaration> node,
|
||||
const Context & context)
|
||||
{
|
||||
if (node->name.empty())
|
||||
throw Exception("Index must have unique name", ErrorCodes::INCORRECT_QUERY);
|
||||
|
||||
ASTPtr expr_list = MergeTreeData::extractKeyExpressionList(node->expr->clone());
|
||||
|
||||
auto syntax = SyntaxAnalyzer(context, {}).analyze(
|
||||
expr_list, new_columns);
|
||||
auto index_expr = ExpressionAnalyzer(expr_list, syntax, context).getActions(false);
|
||||
|
||||
auto sample = ExpressionAnalyzer(expr_list, syntax, context)
|
||||
.getActions(true)->getSampleBlock();
|
||||
|
||||
Names columns;
|
||||
DataTypes data_types;
|
||||
|
||||
for (size_t i = 0; i < expr_list->children.size(); ++i)
|
||||
{
|
||||
const auto & column = sample.getByPosition(i);
|
||||
|
||||
columns.emplace_back(column.name);
|
||||
data_types.emplace_back(column.type);
|
||||
|
||||
if (data_types.back()->getTypeId() != TypeIndex::String
|
||||
&& data_types.back()->getTypeId() != TypeIndex::FixedString)
|
||||
throw Exception("Bloom filter index can be used only with `String` or `FixedString` column.", ErrorCodes::INCORRECT_QUERY);
|
||||
}
|
||||
|
||||
boost::algorithm::to_lower(node->type->name);
|
||||
if (node->type->name == NgramTokenExtractor::getName())
|
||||
{
|
||||
if (!node->type->arguments || node->type->arguments->children.size() != 4)
|
||||
throw Exception("`ngrambf` index must have exactly 4 arguments.", ErrorCodes::INCORRECT_QUERY);
|
||||
|
||||
size_t n = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[0]).value.get<size_t>();
|
||||
size_t bloom_filter_size = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[1]).value.get<size_t>();
|
||||
size_t bloom_filter_hashes = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[2]).value.get<size_t>();
|
||||
size_t seed = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[3]).value.get<size_t>();
|
||||
|
||||
auto tokenizer = std::make_unique<NgramTokenExtractor>(n);
|
||||
|
||||
return std::make_unique<MergeTreeBloomFilterIndex>(
|
||||
node->name, std::move(index_expr), columns, data_types, sample, node->granularity,
|
||||
bloom_filter_size, bloom_filter_hashes, seed, std::move(tokenizer));
|
||||
}
|
||||
else if (node->type->name == SplitTokenExtractor::getName())
|
||||
{
|
||||
if (!node->type->arguments || node->type->arguments->children.size() != 3)
|
||||
throw Exception("`tokenbf` index must have exactly 3 arguments.", ErrorCodes::INCORRECT_QUERY);
|
||||
|
||||
size_t bloom_filter_size = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[0]).value.get<size_t>();
|
||||
size_t bloom_filter_hashes = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[1]).value.get<size_t>();
|
||||
size_t seed = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[2]).value.get<size_t>();
|
||||
|
||||
auto tokenizer = std::make_unique<SplitTokenExtractor>();
|
||||
|
||||
return std::make_unique<MergeTreeBloomFilterIndex>(
|
||||
node->name, std::move(index_expr), columns, data_types, sample, node->granularity,
|
||||
bloom_filter_size, bloom_filter_hashes, seed, std::move(tokenizer));
|
||||
}
|
||||
else
|
||||
{
|
||||
throw Exception("Unknown index type: `" + node->name + "`.", ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
}
|
||||
|
||||
}
|
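The two token extractors above work as follows: `NgramTokenExtractor::next` slides a window of `n` UTF-8 code points forward one code point at a time, and `SplitTokenExtractor::next` emits maximal runs of bytes that are not ASCII separators. The standalone sketch below illustrates both outside ClickHouse. It makes some assumptions. The input is assumed to be valid UTF-8. `UTF8::seqLength` and the ASCII alphanumeric check are re-implemented locally. The names `utf8SeqLength`, `ngramTokens` and `splitTokens` are hypothetical, not ClickHouse functions. Tokens are collected into vectors rather than returned through the in-place `next(data, len, pos, token_start, token_len)` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence whose first byte is `lead`.
// Assumes valid UTF-8; a stray continuation byte is treated as length 1.
static size_t utf8SeqLength(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 1;
}

// Every window of `n` code points, advancing by one code point each step
// (the loop structure of NgramTokenExtractor::next).
std::vector<std::string> ngramTokens(const std::string & s, size_t n)
{
    std::vector<std::string> tokens;
    for (size_t pos = 0; pos < s.size(); pos += utf8SeqLength(static_cast<unsigned char>(s[pos])))
    {
        size_t len = 0;
        size_t code_points = 0;
        for (; code_points < n && pos + len < s.size(); ++code_points)
            len += utf8SeqLength(static_cast<unsigned char>(s[pos + len]));
        if (code_points < n)
            break; // fewer than n code points remain, so no further full n-grams exist
        tokens.push_back(s.substr(pos, len));
    }
    return tokens;
}

// Maximal runs of non-separator bytes. ASCII bytes other than [0-9A-Za-z] separate
// tokens; bytes >= 0x80 (parts of multi-byte characters) stay inside tokens, like the
// isASCII && !isAlphaNumericASCII test in SplitTokenExtractor::next.
std::vector<std::string> splitTokens(const std::string & s)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : s)
    {
        const auto c = static_cast<unsigned char>(ch);
        const bool alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (c < 0x80 && !alnum)
        {
            if (!current.empty())
                tokens.push_back(current);
            current.clear();
        }
        else
            current += ch;
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

For instance, `ngramTokens("abcd", 3)` yields `abc` and `bcd`, and `splitTokens("GET /api/v1?id=42")` yields `GET`, `api`, `v1`, `id` and `42`. The sketch steps through `splitTokens` byte by byte rather than by UTF-8 sequence. For valid UTF-8 this is equivalent, because continuation bytes are never separators.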
207
dbms/src/Storages/MergeTree/MergeTreeBloomFilterIndex.h
Normal file
@ -0,0 +1,207 @@
#pragma once

#include <Interpreters/BloomFilter.h>
#include <Storages/MergeTree/MergeTreeIndices.h>
#include <Storages/MergeTree/KeyCondition.h>

#include <memory>


namespace DB
{

class MergeTreeBloomFilterIndex;


struct MergeTreeBloomFilterIndexGranule : public IMergeTreeIndexGranule
{
    explicit MergeTreeBloomFilterIndexGranule(
        const MergeTreeBloomFilterIndex & index);

    ~MergeTreeBloomFilterIndexGranule() override = default;

    void serializeBinary(WriteBuffer & ostr) const override;
    void deserializeBinary(ReadBuffer & istr) override;

    bool empty() const override { return !has_elems; }

    const MergeTreeBloomFilterIndex & index;
    std::vector<StringBloomFilter> bloom_filters;
    bool has_elems;
};

using MergeTreeBloomFilterIndexGranulePtr = std::shared_ptr<MergeTreeBloomFilterIndexGranule>;


struct MergeTreeBloomFilterIndexAggregator : IMergeTreeIndexAggregator
{
    explicit MergeTreeBloomFilterIndexAggregator(const MergeTreeBloomFilterIndex & index);

    ~MergeTreeBloomFilterIndexAggregator() override = default;

    bool empty() const override { return !granule || granule->empty(); }
    MergeTreeIndexGranulePtr getGranuleAndReset() override;

    void update(const Block & block, size_t * pos, size_t limit) override;

    const MergeTreeBloomFilterIndex & index;
    MergeTreeBloomFilterIndexGranulePtr granule;
};


class BloomFilterCondition : public IIndexCondition
{
public:
    BloomFilterCondition(
        const SelectQueryInfo & query_info,
        const Context & context,
        const MergeTreeBloomFilterIndex & index_);

    ~BloomFilterCondition() override = default;

    bool alwaysUnknownOrTrue() const override;

    bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const override;
private:
    struct KeyTuplePositionMapping
    {
        KeyTuplePositionMapping(size_t tuple_index_, size_t key_index_) : tuple_index(tuple_index_), key_index(key_index_) {}

        size_t tuple_index;
        size_t key_index;
    };
    /// Uses RPN like KeyCondition
    struct RPNElement
    {
        enum Function
        {
            /// Atoms of a Boolean expression.
            FUNCTION_EQUALS,
            FUNCTION_NOT_EQUALS,
            FUNCTION_LIKE,
            FUNCTION_NOT_LIKE,
            FUNCTION_IN,
            FUNCTION_NOT_IN,
            FUNCTION_UNKNOWN, /// Can take any value.
            /// Operators of the logical expression.
            FUNCTION_NOT,
            FUNCTION_AND,
            FUNCTION_OR,
            /// Constants
            ALWAYS_FALSE,
            ALWAYS_TRUE,
        };

        RPNElement(
            Function function_ = FUNCTION_UNKNOWN, size_t key_column_ = 0, std::unique_ptr<StringBloomFilter> && const_bloom_filter_ = nullptr)
            : function(function_), key_column(key_column_), bloom_filter(std::move(const_bloom_filter_)) {}

        Function function = FUNCTION_UNKNOWN;
        /// For FUNCTION_EQUALS, FUNCTION_NOT_EQUALS, FUNCTION_LIKE, FUNCTION_NOT_LIKE.
        size_t key_column;
        std::unique_ptr<StringBloomFilter> bloom_filter;
        /// For FUNCTION_IN and FUNCTION_NOT_IN
        std::vector<std::vector<StringBloomFilter>> set_bloom_filters;
        std::vector<size_t> set_key_position;
    };

    using AtomMap = std::unordered_map<std::string, bool(*)(RPNElement & out, const Field & value, const MergeTreeBloomFilterIndex & idx)>;
    using RPN = std::vector<RPNElement>;

    bool atomFromAST(const ASTPtr & node, Block & block_with_constants, RPNElement & out);

    bool getKey(const ASTPtr & node, size_t & key_column_num);
    bool tryPrepareSetBloomFilter(const ASTs & args, RPNElement & out);

    static const AtomMap atom_map;

    const MergeTreeBloomFilterIndex & index;
    RPN rpn;
    /// Sets from syntax analyzer.
    PreparedSets prepared_sets;
};


/// Interface for string parsers.
struct ITokenExtractor
{
    virtual ~ITokenExtractor() = default;
    /// Fast inplace implementation for regular use.
    /// Gets string (data ptr and len) and start position for extracting next token (state of extractor).
    /// Returns false if parsing is finished, otherwise returns true.
    virtual bool next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const = 0;
    /// Special implementation for creating bloom filter for LIKE function.
    /// It skips unescaped `%` and `_` and supports escaping symbols, but it is less lightweight.
    virtual bool nextLike(const String & str, size_t * pos, String & out) const = 0;

    virtual bool supportLike() const = 0;
};

/// Parser extracting all ngrams from string.
struct NgramTokenExtractor : public ITokenExtractor
{
    NgramTokenExtractor(size_t n_) : n(n_) {}

    static String getName() { return "ngrambf_v1"; }

    bool next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const override;
    bool nextLike(const String & str, size_t * pos, String & token) const override;

    bool supportLike() const override { return true; }

    size_t n;
};

/// Parser extracting tokens (sequences of numbers and ascii letters).
struct SplitTokenExtractor : public ITokenExtractor
{
    static String getName() { return "tokenbf_v1"; }

    bool next(const char * data, size_t len, size_t * pos, size_t * token_start, size_t * token_len) const override;
    bool nextLike(const String & str, size_t * pos, String & token) const override;

    bool supportLike() const override { return true; }
};


class MergeTreeBloomFilterIndex : public IMergeTreeIndex
{
public:
    MergeTreeBloomFilterIndex(
        String name_,
        ExpressionActionsPtr expr_,
        const Names & columns_,
        const DataTypes & data_types_,
        const Block & header_,
        size_t granularity_,
        size_t bloom_filter_size_,
        size_t bloom_filter_hashes_,
        size_t seed_,
        std::unique_ptr<ITokenExtractor> && token_extractor_func_)
        : IMergeTreeIndex(name_, expr_, columns_, data_types_, header_, granularity_)
        , bloom_filter_size(bloom_filter_size_)
        , bloom_filter_hashes(bloom_filter_hashes_)
        , seed(seed_)
        , token_extractor_func(std::move(token_extractor_func_)) {}

    ~MergeTreeBloomFilterIndex() override = default;

    MergeTreeIndexGranulePtr createIndexGranule() const override;
    MergeTreeIndexAggregatorPtr createIndexAggregator() const override;

    IndexConditionPtr createIndexCondition(
        const SelectQueryInfo & query, const Context & context) const override;

    bool mayBenefitFromIndexForIn(const ASTPtr & node) const override;

    /// Bloom filter size in bytes.
    size_t bloom_filter_size;
    /// Number of bloom filter hash functions.
    size_t bloom_filter_hashes;
    /// Bloom filter seed.
    size_t seed;
    /// Function for selecting next token.
    std::unique_ptr<ITokenExtractor> token_extractor_func;
};

}
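`MergeTreeBloomFilterIndex` takes three parameters: a filter size in bytes, a number of hash functions and a seed. The sketch below shows what those three parameters control in a generic Bloom filter. `SimpleBloomFilter` is a hypothetical stand-in. Its hashing scheme (`std::hash` followed by a splitmix64-style mixer) is an assumption for illustration, not the hashing used by ClickHouse's `StringBloomFilter`. The `contains` method tests whether every bit of another filter is also set. That is one way to check a query-side filter, built from the searched tokens, against a granule's filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical Bloom filter with the same three knobs as the index:
// size in bytes (must be > 0), number of hash functions, and seed.
class SimpleBloomFilter
{
public:
    SimpleBloomFilter(size_t size_bytes_, size_t hashes_, uint64_t seed_)
        : bits(size_bytes_ * 8, false), hashes(hashes_), seed(seed_) {}

    // Sets `hashes` bit positions derived from the token.
    void add(std::string_view token)
    {
        for (size_t i = 0; i < hashes; ++i)
            bits[position(token, i)] = true;
    }

    // True if every bit set in `other` is also set here. False proves that some token
    // added to `other` was never added here; true may be a false positive.
    // Both filters must share size, hash count and seed.
    bool contains(const SimpleBloomFilter & other) const
    {
        for (size_t i = 0; i < bits.size(); ++i)
            if (other.bits[i] && !bits[i])
                return false;
        return true;
    }

private:
    // splitmix64 finalizer, used here only to derive independent-looking hash functions.
    static uint64_t mix(uint64_t x)
    {
        x += 0x9E3779B97F4A7C15ULL;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }

    // Bit index for the i-th hash function, mixed with the seed.
    size_t position(std::string_view token, size_t i) const
    {
        const uint64_t h = std::hash<std::string_view>{}(token);
        return static_cast<size_t>(mix(h ^ mix(seed + i)) % bits.size());
    }

    std::vector<bool> bits;
    size_t hashes;
    uint64_t seed;
};
```

Suppose a granule filter has had `hello` and `world` added. It always reports `contains` for a query filter built from `hello` alone. A mismatch is therefore proof of absence, while a match is only a "maybe". This is why such an index can skip granules but never confirm matches. Larger sizes lower the false-positive rate at the cost of storage per granule.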
@ -223,8 +223,8 @@ static void checkKeyExpression(const ExpressionActions & expr, const Block & sam


void MergeTreeData::setPrimaryKeyIndicesAndColumns(
    const ASTPtr &new_order_by_ast, ASTPtr new_primary_key_ast,
    const ColumnsDescription &new_columns, const IndicesDescription &indices_description, bool only_check)
    const ASTPtr & new_order_by_ast, const ASTPtr & new_primary_key_ast,
    const ColumnsDescription & new_columns, const IndicesDescription & indices_description, bool only_check)
{
    if (!new_order_by_ast)
        throw Exception("ORDER BY cannot be empty", ErrorCodes::BAD_ARGUMENTS);
@ -2517,14 +2517,22 @@ bool MergeTreeData::mayBenefitFromIndexForIn(const ASTPtr & left_in_operand) con
    if (left_in_operand_tuple && left_in_operand_tuple->name == "tuple")
    {
        for (const auto & item : left_in_operand_tuple->arguments->children)
        {
            if (isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(item))
                return true;

            for (const auto & index : skip_indices)
                if (index->mayBenefitFromIndexForIn(item))
                    return true;
        }
        /// The tuple itself may be part of the primary key, so check that as a last resort.
        return isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(left_in_operand);
    }
    else
    {
        for (const auto & index : skip_indices)
            if (index->mayBenefitFromIndexForIn(left_in_operand))
                return true;

        return isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(left_in_operand);
    }
}
@ -732,9 +732,9 @@ private:
    /// The same for clearOldTemporaryDirectories.
    std::mutex clear_old_temporary_directories_mutex;

    void setPrimaryKeyIndicesAndColumns(const ASTPtr &new_order_by_ast, ASTPtr new_primary_key_ast,
        const ColumnsDescription &new_columns,
        const IndicesDescription &indices_description, bool only_check = false);
    void setPrimaryKeyIndicesAndColumns(const ASTPtr & new_order_by_ast, const ASTPtr & new_primary_key_ast,
        const ColumnsDescription & new_columns,
        const IndicesDescription & indices_description, bool only_check = false);

    void initPartitionKey();
@ -65,10 +65,18 @@ std::unique_ptr<IMergeTreeIndex> setIndexCreator(
    std::shared_ptr<ASTIndexDeclaration> node,
    const Context & context);

std::unique_ptr<IMergeTreeIndex> bloomFilterIndexCreator(
    const NamesAndTypesList & columns,
    std::shared_ptr<ASTIndexDeclaration> node,
    const Context & context);


MergeTreeIndexFactory::MergeTreeIndexFactory()
{
    registerIndex("minmax", minmaxIndexCreator);
    registerIndex("set", setIndexCreator);
    registerIndex("ngrambf_v1", bloomFilterIndexCreator);
    registerIndex("tokenbf_v1", bloomFilterIndexCreator);
}

}
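`MergeTreeIndexFactory` maps an index type name to a creator function. Both `ngrambf_v1` and `tokenbf_v1` are routed to the same `bloomFilterIndexCreator`, which then branches on the type name. A minimal standalone sketch of that registry pattern follows. `IndexFactory`, `IIndex`, `BloomIndex` and `bloomCreator` are hypothetical types and functions. Passing the type name straight to the creator is a simplification; the real creator reads it from the `ASTIndexDeclaration`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical index interface: only reports the type name it was created with.
struct IIndex
{
    virtual ~IIndex() = default;
    virtual std::string type() const = 0;
};

struct BloomIndex : IIndex
{
    explicit BloomIndex(std::string type_) : type_name(std::move(type_)) {}
    std::string type() const override { return type_name; }
    std::string type_name;
};

class IndexFactory
{
public:
    using Creator = std::function<std::unique_ptr<IIndex>(const std::string & type_name)>;

    void registerIndex(const std::string & name, Creator creator)
    {
        if (!creators.emplace(name, std::move(creator)).second)
            throw std::logic_error("Index type registered twice: " + name);
    }

    std::unique_ptr<IIndex> get(const std::string & name) const
    {
        auto it = creators.find(name);
        if (it == creators.end())
            throw std::invalid_argument("Unknown index type: " + name);
        return it->second(name); // one creator may serve several type names
    }

private:
    std::map<std::string, Creator> creators;
};

// A single creator shared by both Bloom filter index types.
std::unique_ptr<IIndex> bloomCreator(const std::string & type_name)
{
    return std::make_unique<BloomIndex>(type_name);
}
```

Registering one creator under two names keeps the Bloom-filter-specific argument parsing in a single function. The cost is that this function must branch on the type name, just as `bloomFilterIndexCreator` does for `ngrambf_v1` and `tokenbf_v1`.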
@ -95,6 +95,9 @@ public:
    /// gets filename without extension
    String getFileName() const { return INDEX_FILE_PREFIX + name; }

    /// Checks whether the column is in data skipping index.
    virtual bool mayBenefitFromIndexForIn(const ASTPtr & node) const = 0;

    virtual MergeTreeIndexGranulePtr createIndexGranule() const = 0;
    virtual MergeTreeIndexAggregatorPtr createIndexAggregator() const = 0;
@ -133,6 +133,20 @@ IndexConditionPtr MergeTreeMinMaxIndex::createIndexCondition(
    return std::make_shared<MinMaxCondition>(query, context, *this);
};

bool MergeTreeMinMaxIndex::mayBenefitFromIndexForIn(const ASTPtr & node) const
{
    const String column_name = node->getColumnName();

    for (const auto & name : columns)
        if (column_name == name)
            return true;

    if (const auto * func = typeid_cast<const ASTFunction *>(node.get()))
        if (func->arguments->children.size() == 1)
            return mayBenefitFromIndexForIn(func->arguments->children.front());

    return false;
}

std::unique_ptr<IMergeTreeIndex> minmaxIndexCreator(
    const NamesAndTypesList & new_columns,
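`MergeTreeMinMaxIndex::mayBenefitFromIndexForIn` first compares the expression's column name with the indexed columns. If that fails, it strips one single-argument function and retries. The toy version below shows which `IN` left operands that rule accepts. `Node` and `mayBenefit` are hypothetical, and `Node::columnName` only loosely imitates how `getColumnName()` renders function calls.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy AST node: a column reference or a function call.
struct Node
{
    std::string name;                        // column or function name
    std::vector<std::shared_ptr<Node>> args; // call arguments (empty for columns)
    bool is_function = false;

    // Renders "f(a, b)" for calls and the bare name for columns.
    std::string columnName() const
    {
        if (!is_function)
            return name;
        std::string res = name + "(";
        for (size_t i = 0; i < args.size(); ++i)
            res += (i ? ", " : "") + args[i]->columnName();
        return res + ")";
    }
};

// Same control flow as the minmax version: exact match on the rendered name,
// otherwise unwrap a single-argument function and try again.
bool mayBenefit(const Node & node, const std::vector<std::string> & index_columns)
{
    const std::string column_name = node.columnName();
    if (std::find(index_columns.begin(), index_columns.end(), column_name) != index_columns.end())
        return true;
    if (node.is_function && node.args.size() == 1)
        return mayBenefit(*node.args.front(), index_columns);
    return false;
}
```

Take an index on `toDate(t)`. The operand `toStartOfMonth(toDate(t))` qualifies after one unwrap. A two-argument call such as `plus(t, t)` never qualifies, because the recursion only descends through single-argument functions.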
@ -82,6 +82,7 @@ public:
    IndexConditionPtr createIndexCondition(
        const SelectQueryInfo & query, const Context & context) const override;

    bool mayBenefitFromIndexForIn(const ASTPtr & node) const override;
};

}
@ -462,11 +462,16 @@ IndexConditionPtr MergeTreeSetSkippingIndex::createIndexCondition(
    return std::make_shared<SetIndexCondition>(query, context, *this);
};

bool MergeTreeSetSkippingIndex::mayBenefitFromIndexForIn(const ASTPtr &) const
{
    return false;
}


std::unique_ptr<IMergeTreeIndex> setIndexCreator(
    const NamesAndTypesList & new_columns,
    std::shared_ptr<ASTIndexDeclaration> node,
    const Context & context)
    const NamesAndTypesList & new_columns,
    std::shared_ptr<ASTIndexDeclaration> node,
    const Context & context)
{
    if (node->name.empty())
        throw Exception("Index must have unique name", ErrorCodes::INCORRECT_QUERY);
@ -112,6 +112,8 @@ public:
    IndexConditionPtr createIndexCondition(
        const SelectQueryInfo & query, const Context & context) const override;

    bool mayBenefitFromIndexForIn(const ASTPtr & node) const override;

    size_t max_rows = 0;
};
@ -1,5 +1,7 @@
#include <Storages/MergeTree/MergeTreeSettings.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTFunction.h>
#include <Common/Exception.h>
@ -3,7 +3,7 @@
#include <Poco/Util/AbstractConfiguration.h>
#include <Core/Defines.h>
#include <Core/Types.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>


namespace DB
130
dbms/src/Storages/MergeTree/RPNBuilder.h
Normal file
@ -0,0 +1,130 @@
#pragma once

#include <Common/typeid_cast.h>
#include <Core/Block.h>
#include <DataTypes/DataTypesNumber.h>
#include <Interpreters/Context.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTFunction.h>
#include <Storages/SelectQueryInfo.h>
#include <Storages/MergeTree/KeyCondition.h>


namespace DB
{

/// Builds reverse polish notation
template <typename RPNElement>
class RPNBuilder
{
public:
    using RPN = std::vector<RPNElement>;
    using AtomFromASTFunc = std::function<
        bool(const ASTPtr & node, const Context & context, Block & block_with_constants, RPNElement & out)>;

    RPNBuilder(
        const SelectQueryInfo & query_info,
        const Context & context_,
        const AtomFromASTFunc & atomFromAST_)
        : context(context_), atomFromAST(atomFromAST_)
    {
        /** Evaluation of expressions that depend only on constants.
          * For the index to be used, if it is written, for example `WHERE Date = toDate(now())`.
          */
        block_with_constants = KeyCondition::getBlockWithConstants(query_info.query, query_info.syntax_analyzer_result, context);

        /// Transform WHERE section to Reverse Polish notation
        const ASTSelectQuery & select = typeid_cast<const ASTSelectQuery &>(*query_info.query);
        if (select.where_expression)
        {
            traverseAST(select.where_expression);

            if (select.prewhere_expression)
            {
                traverseAST(select.prewhere_expression);
                rpn.emplace_back(RPNElement::FUNCTION_AND);
            }
        }
        else if (select.prewhere_expression)
        {
            traverseAST(select.prewhere_expression);
        }
        else
        {
            rpn.emplace_back(RPNElement::FUNCTION_UNKNOWN);
        }
    }

    RPN && extractRPN() { return std::move(rpn); }

private:
    void traverseAST(const ASTPtr & node)
    {
        RPNElement element;

        if (ASTFunction * func = typeid_cast<ASTFunction *>(&*node))
        {
            if (operatorFromAST(func, element))
            {
                auto & args = typeid_cast<ASTExpressionList &>(*func->arguments).children;
                for (size_t i = 0, size = args.size(); i < size; ++i)
                {
                    traverseAST(args[i]);

                    /** The first part of the condition is for the correct support of `and` and `or` functions of arbitrary arity
                      * - in this case `n - 1` elements are added (where `n` is the number of arguments).
                      */
                    if (i != 0 || element.function == RPNElement::FUNCTION_NOT)
                        rpn.emplace_back(std::move(element));
                }

                return;
            }
        }

        if (!atomFromAST(node, context, block_with_constants, element))
        {
            element.function = RPNElement::FUNCTION_UNKNOWN;
        }

        rpn.emplace_back(std::move(element));
    }

    bool operatorFromAST(const ASTFunction * func, RPNElement & out)
    {
        /// Functions AND, OR, NOT.
        /** Also a special function `indexHint` - works as if instead of calling a function there are just parentheses
          * (or, the same thing - calling the function `and` from one argument).
          */
        const ASTs & args = typeid_cast<const ASTExpressionList &>(*func->arguments).children;

        if (func->name == "not")
        {
            if (args.size() != 1)
                return false;

            out.function = RPNElement::FUNCTION_NOT;
        }
        else
        {
            if (func->name == "and" || func->name == "indexHint")
                out.function = RPNElement::FUNCTION_AND;
            else if (func->name == "or")
                out.function = RPNElement::FUNCTION_OR;
            else
                return false;
        }

        return true;
    }

    const Context & context;
    const AtomFromASTFunc & atomFromAST;
    Block block_with_constants;
    RPN rpn;
};


};
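The RPN built by `RPNBuilder` is later evaluated once per granule. One conservative way to do that is to track two facts for each sub-expression: whether it *can be true* and whether it *can be false* for some row of the granule. The granule is skipped only when the whole condition cannot be true. The sketch below assumes that scheme. The `Op`, `Element` and `Mask` types and the `mayBeTrueOnGranule` function are hypothetical simplifications, not ClickHouse's evaluation code. The sketch consumes the layout produced by `traverseAST` above, where `a AND b AND c` becomes `a b AND c AND` and `NOT x` becomes `x NOT`.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified RPN element kinds (compare RPNElement::Function).
enum class Op { Atom, Unknown, Not, And, Or };

struct Element
{
    Op op;
    // For Op::Atom only: false if the index proves no row of the granule satisfies the atom.
    bool atom_may_be_true = true;
};

// What a sub-expression can evaluate to over the rows of one granule.
struct Mask
{
    bool can_be_true;
    bool can_be_false;
};

// Returns false only when the condition is provably false for every row of the granule.
bool mayBeTrueOnGranule(const std::vector<Element> & rpn)
{
    std::vector<Mask> stack;
    for (const auto & e : rpn)
    {
        switch (e.op)
        {
            case Op::Atom:
                // A skip index can rule rows out but never proves a match.
                stack.push_back({e.atom_may_be_true, true});
                break;
            case Op::Unknown:
                stack.push_back({true, true});
                break;
            case Op::Not:
            {
                if (stack.empty())
                    throw std::invalid_argument("NOT without an operand");
                std::swap(stack.back().can_be_true, stack.back().can_be_false);
                break;
            }
            case Op::And:
            case Op::Or:
            {
                if (stack.size() < 2)
                    throw std::invalid_argument("binary operator without two operands");
                const Mask rhs = stack.back();
                stack.pop_back();
                const Mask lhs = stack.back();
                stack.pop_back();
                if (e.op == Op::And)
                    stack.push_back({lhs.can_be_true && rhs.can_be_true, lhs.can_be_false || rhs.can_be_false});
                else
                    stack.push_back({lhs.can_be_true || rhs.can_be_true, lhs.can_be_false && rhs.can_be_false});
                break;
            }
        }
    }
    if (stack.size() != 1)
        throw std::invalid_argument("malformed RPN");
    return stack.back().can_be_true;
}
```

For example, consider `a b AND c AND` where the index rules out `b` (its atom has `atom_may_be_true = false`). This yields `false`, so the granule could be skipped. Wrapping the excluded atom as `b NOT` yields `true`, because an index that only proves absence cannot show that `NOT b` is false.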
@ -33,8 +33,6 @@ ReplicatedMergeTreeAlterThread::ReplicatedMergeTreeAlterThread(StorageReplicated

void ReplicatedMergeTreeAlterThread::run()
{
    bool force_recheck_parts = true;

    try
    {
        /** We have a description of columns in ZooKeeper, common for all replicas (Example: /clickhouse/tables/02-06/visits/columns),
@ -36,6 +36,7 @@ private:
    String log_name;
    Logger * log;
    BackgroundSchedulePool::TaskHolder task;
    bool force_recheck_parts = true;
};

}
@ -6,7 +6,7 @@
#include <Common/SimpleIncrement.h>
#include <Client/ConnectionPool.h>
#include <Client/ConnectionPoolWithFailover.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/Cluster.h>
#include <Interpreters/ExpressionActions.h>
#include <Parsers/ASTFunction.h>
@ -2,6 +2,7 @@
#include <Storages/StorageFactory.h>
#include <Interpreters/Join.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTIdentifier.h>
#include <Core/ColumnNumbers.h>
#include <DataStreams/IBlockInputStream.h>
@ -22,7 +22,7 @@
#include <Columns/ColumnString.h>
#include <Common/typeid_cast.h>
#include <Databases/IDatabase.h>
#include <Interpreters/SettingsCommon.h>
#include <Core/SettingsCommon.h>
#include <DataStreams/MaterializingBlockInputStream.h>
#include <DataStreams/FilterBlockInputStream.h>
#include <ext/range.h>
@ -6,7 +6,7 @@
#include <Storages/transformQueryForExternalDatabase.h>
#include <Formats/MySQLBlockInputStream.h>
#include <Interpreters/evaluateConstantExpression.h>
#include <Interpreters/Settings.h>
#include <Core/Settings.h>
#include <Interpreters/Context.h>
#include <DataStreams/IBlockOutputStream.h>
#include <Formats/FormatFactory.h>
@ -3,6 +3,7 @@
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSubquery.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ASTSelectWithUnionQuery.h>

#include <Storages/StorageView.h>
#include <Storages/StorageFactory.h>
Some files were not shown because too many files have changed in this diff