Merge remote-tracking branch 'upstream/master' into useful_what_in_exceptions
This commit is contained in: commit 6d5c0bdf91
28  .github/ISSUE_TEMPLATE/bug_report.md  vendored  Normal file
@@ -0,0 +1,28 @@
---
name: Bug report
about: Create a report to help us improve ClickHouse
title: ''
labels: bug, issue
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**How to reproduce**
* Which ClickHouse server version to use
* Which interface to use, if it matters
* Non-default settings, if any
* `CREATE TABLE` statements for all tables involved
* Sample data for all these tables, use [clickhouse-obfuscator](https://github.com/yandex/ClickHouse/blob/master/dbms/programs/obfuscator/Obfuscator.cpp#L42-L80) if necessary
* Queries to run that lead to an unexpected result

**Expected behavior**
A clear and concise description of what you expected to happen.

**Error message and/or stacktrace**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.
20  .github/ISSUE_TEMPLATE/feature_request.md  vendored  Normal file
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for ClickHouse
title: ''
labels: feature
assignees: ''

---

**Use case.**
A clear and concise description of what the intended usage scenario is.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
@@ -8,7 +8,7 @@
* Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub` for SQL standard compatibility. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([Ivan Blinkov](https://github.com/blinkov))
* Support for write in `HDFS` tables and `hdfs` table function. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin))
* Added functions to search for multiple constant strings from big haystack: `multiPosition`, `multiSearch`, `firstMatch` also with `-UTF8`, `-CaseInsensitive`, and `-CaseInsensitiveUTF8` variants. [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([Danila Kutenin](https://github.com/danlark1))
* Pruning of unused shards if `SELECT` query filters by sharding key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Ivan](https://github.com/abyss7))
* Pruning of unused shards if `SELECT` query filters by sharding key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Gleb Kanterov](https://github.com/kanterov), [Ivan](https://github.com/abyss7))
* Allow `Kafka` engine to ignore some number of parsing errors per block. [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([Ivan](https://github.com/abyss7))
* Added support for `CatBoost` multiclass models evaluation. Function `modelEvaluate` returns tuple with per-class raw predictions for multiclass models. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([Boris Granveaud](https://github.com/bgranvea))
109  CHANGELOG_RU.md
@@ -1,3 +1,112 @@
## ClickHouse release 19.1.6, 2019-01-24

### New features:

* Compression codecs can be specified for individual columns. [#3899](https://github.com/yandex/ClickHouse/pull/3899) [#4111](https://github.com/yandex/ClickHouse/pull/4111) ([alesapin](https://github.com/alesapin), [Winter Zhang](https://github.com/zhang2014), [Anatoly](https://github.com/Sindbag))
* `Delta` compression codec. [#4052](https://github.com/yandex/ClickHouse/pull/4052) ([alesapin](https://github.com/alesapin))
* The compression codec can be changed with an `ALTER` query. [#4054](https://github.com/yandex/ClickHouse/pull/4054) ([alesapin](https://github.com/alesapin))
* Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub` for SQL standard compatibility. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([Ivan Blinkov](https://github.com/blinkov))
* Support for writing to the `HDFS` engine and the `hdfs` table function. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin))
* Added functions for searching for a set of constant strings in a text: `multiPosition`, `multiSearch`, `firstMatch`, also with `-UTF8`, `-CaseInsensitive`, and `-CaseInsensitiveUTF8` variants. [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([Danila Kutenin](https://github.com/danlark1))
* Pruning of unused shards if the `SELECT` query filters by the sharding key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Gleb Kanterov](https://github.com/kanterov), [Ivan](https://github.com/abyss7))
* Rows can be skipped on parsing errors for the `Kafka` engine (setting `kafka_skip_broken_messages`). [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([Ivan](https://github.com/abyss7))
* Support for evaluating multiclass `CatBoost` models. The `modelEvaluate` function returns a tuple with per-class predictions when a multiclass model is used. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([Boris Granveaud](https://github.com/bgranvea))
* Added hashing functions `xxHash64` and `xxHash32`. [#3905](https://github.com/yandex/ClickHouse/pull/3905) ([filimonov](https://github.com/filimonov))
* Added the `gccMurmurHash` hashing function (GCC flavoured Murmur hash), which uses the same hash seed as [gcc](https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/functional_hash.h#L191) [#4000](https://github.com/yandex/ClickHouse/pull/4000) ([sundyli](https://github.com/sundy-li))
* Added hashing functions `javaHash` and `hiveHash`. [#3811](https://github.com/yandex/ClickHouse/pull/3811) ([shangshujie365](https://github.com/shangshujie365))
* Added the `remoteSecure` function. It works like `remote`, but uses a secure connection. [#4088](https://github.com/yandex/ClickHouse/pull/4088) ([proller](https://github.com/proller))

### Experimental features:

* Emulation of queries with multiple `JOIN` clauses (setting `allow_experimental_multiple_joins_emulation`). [#3946](https://github.com/yandex/ClickHouse/pull/3946) ([Artem Zuikov](https://github.com/4ertus2))

### Bug fixes:

* The size of the compiled expressions cache is now limited if the `compiled_expression_cache_size` setting is not specified, to reduce memory consumption. [#4041](https://github.com/yandex/ClickHouse/pull/4041) ([alesapin](https://github.com/alesapin))
* Fixed hangs of threads that perform `ALTER` queries on tables of the `Replicated` family, as well as of threads that update the configuration from ZooKeeper. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3891](https://github.com/yandex/ClickHouse/issues/3891) [#3934](https://github.com/yandex/ClickHouse/pull/3934) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a race condition when executing a distributed `ALTER` task. The race condition led to more than one replica trying to execute the task, so that all such replicas except one failed with a ZooKeeper error. [#3904](https://github.com/yandex/ClickHouse/pull/3904) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed an issue where `from_zk` settings were not updated: a value specified in the configuration file was not refreshed after a request to ZooKeeper timed out. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3947](https://github.com/yandex/ClickHouse/pull/3947) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed an error in computing the network prefix when an IPv4 subnet mask is specified. [#3945](https://github.com/yandex/ClickHouse/pull/3945) ([alesapin](https://github.com/alesapin))
* Fixed a crash (`std::terminate`) in the rare case when a new thread could not be created due to exhausted resources. [#3956](https://github.com/yandex/ClickHouse/pull/3956) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a crash of the `remote` table function when the table structure could not be obtained because of user restrictions. [#4009](https://github.com/yandex/ClickHouse/pull/4009) ([alesapin](https://github.com/alesapin))
* Fixed a leak of network sockets. Sockets were created in a pool and never closed; when a thread was created, new sockets were created if all available ones were in use. [#4017](https://github.com/yandex/ClickHouse/pull/4017) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed an issue where `/proc/self/fd` was closed before all file descriptors were read from `/proc` after starting the `odbc-bridge` process. [#4120](https://github.com/yandex/ClickHouse/pull/4120) ([alesapin](https://github.com/alesapin))
* Fixed a bug in the monotonic String-to-UInt conversion when a String is used in the primary key. [#3870](https://github.com/yandex/ClickHouse/pull/3870) ([Winter Zhang](https://github.com/zhang2014))
* Fixed a bug in computing the monotonicity of integer type conversion functions. [#3921](https://github.com/yandex/ClickHouse/pull/3921) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a crash in the `arrayEnumerateUniq` and `arrayEnumerateDense` functions when invalid arguments are passed. [#3909](https://github.com/yandex/ClickHouse/pull/3909) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed undefined behavior in StorageMerge. [#3910](https://github.com/yandex/ClickHouse/pull/3910) ([Amos Bird](https://github.com/amosbird))
* Fixed a crash in the `addDays` and `subtractDays` functions. [#3913](https://github.com/yandex/ClickHouse/pull/3913) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed an issue where the `round`, `floor`, `trunc`, and `ceil` functions could return an incorrect result for negative integer arguments with large absolute values. [#3914](https://github.com/yandex/ClickHouse/pull/3914) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed an issue where 'kill query sync' led to a server crash. [#3916](https://github.com/yandex/ClickHouse/pull/3916) ([muVulDeePecker](https://github.com/fancyqlx))
* Fixed a bug that caused a long delay when the replication queue is empty. [#3928](https://github.com/yandex/ClickHouse/pull/3928) [#3932](https://github.com/yandex/ClickHouse/pull/3932) ([alesapin](https://github.com/alesapin))
* Fixed excessive memory usage when inserting into a table with `LowCardinality` in the primary key. [#3955](https://github.com/yandex/ClickHouse/pull/3955) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed serialization of empty arrays of the `LowCardinality` type for the `Native` format. [#3907](https://github.com/yandex/ClickHouse/issues/3907) [#4011](https://github.com/yandex/ClickHouse/pull/4011) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed an incorrect result when distinct is used on a numeric `LowCardinality` column. [#3895](https://github.com/yandex/ClickHouse/issues/3895) [#4012](https://github.com/yandex/ClickHouse/pull/4012) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed compiled evaluation of aggregate functions for a `LowCardinality` key (for the case when the `compile` setting is enabled). [#3886](https://github.com/yandex/ClickHouse/pull/3886) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed passing of the user name and password for queries from replicas. [#3957](https://github.com/yandex/ClickHouse/pull/3957) ([alesapin](https://github.com/alesapin)) ([小路](https://github.com/nicelulu))
* Fixed a very rare race condition that could occur when listing tables of a `Dictionary` database while dictionaries are being reloaded. [#3970](https://github.com/yandex/ClickHouse/pull/3970) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed an incorrect result when HAVING is used with ROLLUP or CUBE. [#3756](https://github.com/yandex/ClickHouse/issues/3756) [#3837](https://github.com/yandex/ClickHouse/pull/3837) ([Sam Chou](https://github.com/reflection))
* Fixed column aliases for queries with `JOIN ON` over distributed tables. [#3980](https://github.com/yandex/ClickHouse/pull/3980) ([Winter Zhang](https://github.com/zhang2014))
* Fixed an error in the implementation of the `quantileTDigest` function (found by Artem Vakhrushev). This error never happens in ClickHouse itself and is relevant only for those who use the ClickHouse code base directly as a library. [#3935](https://github.com/yandex/ClickHouse/pull/3935) ([alexey-milovidov](https://github.com/alexey-milovidov))

### Improvements:

* Added support for `IF NOT EXISTS` in `ALTER TABLE ADD COLUMN` and for `IF EXISTS` in `DROP/MODIFY/CLEAR/COMMENT COLUMN`. [#3900](https://github.com/yandex/ClickHouse/pull/3900) ([Boris Granveaud](https://github.com/bgranvea))
* The `parseDateTimeBestEffort` function now supports the formats `DD.MM.YYYY`, `DD.MM.YY`, `DD-MM-YYYY`, `DD-Mon-YYYY`, `DD/Month/YYYY`, and similar. [#3922](https://github.com/yandex/ClickHouse/pull/3922) ([alexey-milovidov](https://github.com/alexey-milovidov))
* `CapnProtoInputStream` now supports jagged structures. [#4063](https://github.com/yandex/ClickHouse/pull/4063) ([Odin Hultgren Van Der Horst](https://github.com/Miniwoffer))
* Usability improvement: added a check that the server is started as the user that owns the data directory. Starting the server as root is forbidden if root does not own the data directory. [#3785](https://github.com/yandex/ClickHouse/pull/3785) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev))
* Improved the logic for checking the columns required for a JOIN at the query analysis stage. [#3930](https://github.com/yandex/ClickHouse/pull/3930) ([Artem Zuikov](https://github.com/4ertus2))
* Reduced the number of connections kept open when there is a large number of Distributed tables. [#3726](https://github.com/yandex/ClickHouse/pull/3726) ([Winter Zhang](https://github.com/zhang2014))
* Added support for the totals row of a `WITH TOTALS` query via the ODBC driver. [#3836](https://github.com/yandex/ClickHouse/pull/3836) ([Maksim Koritckiy](https://github.com/nightweb))
* `Enum` values can be used as numbers in the `if` function. [#3875](https://github.com/yandex/ClickHouse/pull/3875) ([Ivan](https://github.com/abyss7))
* Added the `low_cardinality_allow_in_native_format` setting. If it is disabled, the `LowCardinality` type is not used in the `Native` format. [#3879](https://github.com/yandex/ClickHouse/pull/3879) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Removed some redundant objects from the compiled expressions cache to reduce memory usage. [#4042](https://github.com/yandex/ClickHouse/pull/4042) ([alesapin](https://github.com/alesapin))
* Added a check that the `SET send_logs_level = 'value'` query receives a valid value. [#3873](https://github.com/yandex/ClickHouse/pull/3873) ([Sabyanin Maxim](https://github.com/s-mx))
* Added type checks for type conversion functions. [#3896](https://github.com/yandex/ClickHouse/pull/3896) ([Winter Zhang](https://github.com/zhang2014))

### Performance improvements:

* Added the `use_minimalistic_part_header_in_zookeeper` setting for the MergeTree engine. If it is enabled, Replicated tables store part metadata in a compact form (in the part's znode). This can significantly reduce the ZooKeeper snapshot size (especially for tables with many columns). After enabling this setting, it is no longer possible to downgrade to a version that does not support it. [#3960](https://github.com/yandex/ClickHouse/pull/3960) ([Alex Zatelepin](https://github.com/ztlpn))
* Added a finite-state-machine based implementation of the `sequenceMatch` and `sequenceCount` functions for the case when the event sequence does not contain a time condition. [#4004](https://github.com/yandex/ClickHouse/pull/4004) ([Léo Ercolanelli](https://github.com/ercolanelli-leo))
* Improved integer serialization performance. [#3968](https://github.com/yandex/ClickHouse/pull/3968) ([Amos Bird](https://github.com/amosbird))
* Added zero left padding to PODArray. The element at index -1 is now a valid zero value. This is used to eliminate a conditional branch when computing array offsets (see the sketch after this list). [#3920](https://github.com/yandex/ClickHouse/pull/3920) ([Amos Bird](https://github.com/amosbird))
* Rolled back the `jemalloc` version that caused a performance degradation. [#4018](https://github.com/yandex/ClickHouse/pull/4018) ([alexey-milovidov](https://github.com/alexey-milovidov))
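A minimal, self-contained illustration of the PODArray "zero left padding" idea mentioned above. This is not ClickHouse's PODArray; it is a hypothetical sketch of why a valid zero element at index -1 removes the `i == 0` special case when turning cumulative offsets into sizes:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    // Flattened storage of three arrays with sizes 3, 2 and 4.
    // storage[0] is the "left padding": it plays the role of offsets[-1] and is always 0.
    std::vector<uint64_t> storage = {0, 3, 5, 9};
    const uint64_t * offsets = storage.data() + 1;  // offsets[-1] is now valid and equals 0

    for (int i = 0; i < 3; ++i)
    {
        // No special case for i == 0: the size is always offsets[i] - offsets[i - 1].
        uint64_t size = offsets[i] - offsets[i - 1];
        std::cout << "array " << i << " has " << size << " elements\n";
    }
    return 0;
}
```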
### Backward incompatible changes:

* Removed the undocumented `ALTER MODIFY PRIMARY KEY` feature, which was superseded by `ALTER MODIFY ORDER BY`. [#3887](https://github.com/yandex/ClickHouse/pull/3887) ([Alex Zatelepin](https://github.com/ztlpn))
* Removed the `shardByHash` function. [#3833](https://github.com/yandex/ClickHouse/pull/3833) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Forbade the use of scalar subqueries whose result has the `AggregateFunction` type. [#3865](https://github.com/yandex/ClickHouse/pull/3865) ([Ivan](https://github.com/abyss7))

### Build/testing/packaging improvements:

* Added support for building on PowerPC (`ppc64le`). [#4132](https://github.com/yandex/ClickHouse/pull/4132) ([Danila Kutenin](https://github.com/danlark1))
* Functional stateful tests are run on a publicly available dataset. [#3969](https://github.com/yandex/ClickHouse/pull/3969) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed an error where the server could not start with the message `bash: /usr/bin/clickhouse-extract-from-config: Operation not permitted` under Docker or systemd-nspawn. [#4136](https://github.com/yandex/ClickHouse/pull/4136) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Updated the `rdkafka` library to v1.0.0-RC5. cppkafka is now used instead of the raw C interface. [#4025](https://github.com/yandex/ClickHouse/pull/4025) ([Ivan](https://github.com/abyss7))
* Updated the `mariadb-client` library. Fixed an issue found by UBSan. [#3924](https://github.com/yandex/ClickHouse/pull/3924) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixes for builds with UBSan. [#3926](https://github.com/yandex/ClickHouse/pull/3926) [#3021](https://github.com/yandex/ClickHouse/pull/3021) [#3948](https://github.com/yandex/ClickHouse/pull/3948) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added per-commit test runs with the UBSan build.
* Added per-commit runs of the PVS-Studio static analyzer.
* Fixed issues found by PVS-Studio. [#4013](https://github.com/yandex/ClickHouse/pull/4013) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed glibc compatibility issues. [#4100](https://github.com/yandex/ClickHouse/pull/4100) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Moved Docker images to Ubuntu 18.10 and added compatibility with glibc >= 2.28. [#3965](https://github.com/yandex/ClickHouse/pull/3965) ([alesapin](https://github.com/alesapin))
* Added the `CLICKHOUSE_DO_NOT_CHOWN` environment variable, which allows skipping the chown of directories for the server Docker image. [#3967](https://github.com/yandex/ClickHouse/pull/3967) ([alesapin](https://github.com/alesapin))
* Enabled most of the warnings from `-Weverything` for clang. Enabled `-Wpedantic`. [#3986](https://github.com/yandex/ClickHouse/pull/3986) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a few warnings that are available only in clang 8. [#3993](https://github.com/yandex/ClickHouse/pull/3993) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Link to `libLLVM` rather than to the individual `LLVM` libraries when using dynamic linking. [#3989](https://github.com/yandex/ClickHouse/pull/3989) ([Orivej Desh](https://github.com/orivej))
* Added environment variables for the `TSan`, `UBSan`, and `ASan` parameters in the test Docker image. [#4072](https://github.com/yandex/ClickHouse/pull/4072) ([alesapin](https://github.com/alesapin))
* The `clickhouse-server` Debian package now recommends the `libcap2-bin` package so that the `setcap` tool can be used for setting capabilities. This package is optional. [#4093](https://github.com/yandex/ClickHouse/pull/4093) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Reduced compilation time; removed unnecessary header includes. [#3898](https://github.com/yandex/ClickHouse/pull/3898) ([proller](https://github.com/proller))
* Added performance tests for hash functions. [#3918](https://github.com/yandex/ClickHouse/pull/3918) ([filimonov](https://github.com/filimonov))
* Fixed cyclic library dependencies. [#3958](https://github.com/yandex/ClickHouse/pull/3958) ([proller](https://github.com/proller))
* Improved compilation with a small amount of available memory. [#4030](https://github.com/yandex/ClickHouse/pull/4030) ([proller](https://github.com/proller))
* Added a test script to reproduce the performance degradation in `jemalloc`. [#4036](https://github.com/yandex/ClickHouse/pull/4036) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed typos in comments and string literals. [#4122](https://github.com/yandex/ClickHouse/pull/4122) ([maiha](https://github.com/maiha))
* Fixed typos in comments. [#4089](https://github.com/yandex/ClickHouse/pull/4089) ([Evgenii Pravda](https://github.com/kvinty))

## ClickHouse release 18.16.1, 2018-12-21

### Bug fixes:
@@ -39,5 +39,10 @@ add_library(base64 ${LINK_MODE}
    ${LIBRARY_DIR}/lib/codecs.h
    ${CMAKE_CURRENT_BINARY_DIR}/config.h)

target_compile_options(base64 PRIVATE ${base64_SSSE3_opt} ${base64_SSE41_opt} ${base64_SSE42_opt} ${base64_AVX_opt} ${base64_AVX2_opt})
set_source_files_properties(${LIBRARY_DIR}/lib/arch/avx/codec.c PROPERTIES COMPILE_FLAGS -mavx)
set_source_files_properties(${LIBRARY_DIR}/lib/arch/avx2/codec.c PROPERTIES COMPILE_FLAGS -mavx2)
set_source_files_properties(${LIBRARY_DIR}/lib/arch/sse41/codec.c PROPERTIES COMPILE_FLAGS -msse4.1)
set_source_files_properties(${LIBRARY_DIR}/lib/arch/sse42/codec.c PROPERTIES COMPILE_FLAGS -msse4.2)
set_source_files_properties(${LIBRARY_DIR}/lib/arch/ssse3/codec.c PROPERTIES COMPILE_FLAGS -mssse3)

target_include_directories(base64 PRIVATE ${LIBRARY_DIR}/include ${CMAKE_CURRENT_BINARY_DIR})
@@ -1179,7 +1179,7 @@ protected:
/// Removes MATERIALIZED and ALIAS columns from create table query
static ASTPtr removeAliasColumnsFromCreateQuery(const ASTPtr & query_ast)
{
    const ASTs & column_asts = typeid_cast<ASTCreateQuery &>(*query_ast).columns->children;
    const ASTs & column_asts = typeid_cast<ASTCreateQuery &>(*query_ast).columns_list->columns->children;
    auto new_columns = std::make_shared<ASTExpressionList>();

    for (const ASTPtr & column_ast : column_asts)
@@ -1198,8 +1198,13 @@ protected:

    ASTPtr new_query_ast = query_ast->clone();
    ASTCreateQuery & new_query = typeid_cast<ASTCreateQuery &>(*new_query_ast);
    new_query.columns = new_columns.get();
    new_query.children.at(0) = std::move(new_columns);

    auto new_columns_list = std::make_shared<ASTColumns>();
    new_columns_list->set(new_columns_list->columns, new_columns);
    new_columns_list->set(
        new_columns_list->indices, typeid_cast<ASTCreateQuery &>(*query_ast).columns_list->indices->clone());

    new_query.replace(new_query.columns_list, new_columns_list);

    return new_query_ast;
}
@@ -1217,7 +1222,7 @@ protected:
    res->table = new_table.second;

    res->children.clear();
    res->set(res->columns, create.columns->clone());
    res->set(res->columns_list, create.columns_list->clone());
    res->set(res->storage, new_storage_ast->clone());

    return res;
@@ -25,12 +25,14 @@ PerformanceTest::PerformanceTest(
    Connection & connection_,
    InterruptListener & interrupt_listener_,
    const PerformanceTestInfo & test_info_,
    Context & context_)
    Context & context_,
    const std::vector<size_t> & queries_to_run_)
    : config(config_)
    , connection(connection_)
    , interrupt_listener(interrupt_listener_)
    , test_info(test_info_)
    , context(context_)
    , queries_to_run(queries_to_run_)
    , log(&Poco::Logger::get("PerformanceTest"))
{
}
@@ -157,9 +159,14 @@ void PerformanceTest::finish() const
std::vector<TestStats> PerformanceTest::execute()
{
    std::vector<TestStats> statistics_by_run;
    size_t query_count;
    if (queries_to_run.empty())
        query_count = test_info.queries.size();
    else
        query_count = queries_to_run.size();
    size_t total_runs = test_info.times_to_run * test_info.queries.size();
    statistics_by_run.resize(total_runs);
    LOG_INFO(log, "Totally will run cases " << total_runs << " times");
    LOG_INFO(log, "Totally will run cases " << test_info.times_to_run * query_count << " times");
    UInt64 max_exec_time = calculateMaxExecTime();
    if (max_exec_time != 0)
        LOG_INFO(log, "Test will be executed for a maximum of " << max_exec_time / 1000. << " seconds");
@@ -172,9 +179,13 @@ std::vector<TestStats> PerformanceTest::execute()

    for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
    {
        size_t statistic_index = number_of_launch * test_info.queries.size() + query_index;

        queries_with_indexes.push_back({test_info.queries[query_index], statistic_index});
        if (queries_to_run.empty() || std::find(queries_to_run.begin(), queries_to_run.end(), query_index) != queries_to_run.end())
        {
            size_t statistic_index = number_of_launch * test_info.queries.size() + query_index;
            queries_with_indexes.push_back({test_info.queries[query_index], statistic_index});
        }
        else
            LOG_INFO(log, "Will skip query " << test_info.queries[query_index] << " by index");
    }

    if (got_SIGINT)
@@ -22,7 +22,8 @@ public:
    Connection & connection_,
    InterruptListener & interrupt_listener_,
    const PerformanceTestInfo & test_info_,
    Context & context_);
    Context & context_,
    const std::vector<size_t> & queries_to_run_);

    bool checkPreconditions() const;
    void prepare() const;
@@ -54,6 +55,7 @@ private:
    PerformanceTestInfo test_info;
    Context & context;

    std::vector<size_t> queries_to_run;
    Poco::Logger * log;

    bool got_SIGINT = false;
@ -11,12 +11,13 @@
|
||||
#include <boost/filesystem.hpp>
|
||||
#include <boost/program_options.hpp>
|
||||
|
||||
#include <Poco/Util/XMLConfiguration.h>
|
||||
#include <Poco/Logger.h>
|
||||
#include <Poco/AutoPtr.h>
|
||||
#include <Poco/ConsoleChannel.h>
|
||||
#include <Poco/FormattingChannel.h>
|
||||
#include <Poco/Logger.h>
|
||||
#include <Poco/Path.h>
|
||||
#include <Poco/PatternFormatter.h>
|
||||
|
||||
#include <Poco/Util/XMLConfiguration.h>
|
||||
|
||||
#include <common/logger_useful.h>
|
||||
#include <Client/Connection.h>
|
||||
@ -25,7 +26,6 @@
|
||||
#include <IO/ConnectionTimeouts.h>
|
||||
#include <IO/UseSSL.h>
|
||||
#include <Interpreters/Settings.h>
|
||||
#include <Poco/AutoPtr.h>
|
||||
#include <Common/Exception.h>
|
||||
#include <Common/InterruptListener.h>
|
||||
|
||||
@ -70,6 +70,7 @@ public:
|
||||
Strings && skip_names_,
|
||||
Strings && tests_names_regexp_,
|
||||
Strings && skip_names_regexp_,
|
||||
const std::unordered_map<std::string, std::vector<size_t>> query_indexes_,
|
||||
const ConnectionTimeouts & timeouts)
|
||||
: connection(host_, port_, default_database_, user_,
|
||||
password_, timeouts, "performance-test", Protocol::Compression::Enable,
|
||||
@ -80,6 +81,7 @@ public:
|
||||
, skip_tags(std::move(skip_tags_))
|
||||
, skip_names(std::move(skip_names_))
|
||||
, skip_names_regexp(std::move(skip_names_regexp_))
|
||||
, query_indexes(query_indexes_)
|
||||
, lite_output(lite_output_)
|
||||
, profiles_file(profiles_file_)
|
||||
, input_files(input_files_)
|
||||
@ -128,6 +130,7 @@ private:
|
||||
const Strings & skip_tags;
|
||||
const Strings & skip_names;
|
||||
const Strings & skip_names_regexp;
|
||||
std::unordered_map<std::string, std::vector<size_t>> query_indexes;
|
||||
|
||||
Context global_context = Context::createGlobal();
|
||||
std::shared_ptr<ReportBuilder> report_builder;
|
||||
@ -198,7 +201,7 @@ private:
|
||||
{
|
||||
PerformanceTestInfo info(test_config, profiles_file);
|
||||
LOG_INFO(log, "Config for test '" << info.test_name << "' parsed");
|
||||
PerformanceTest current(test_config, connection, interrupt_listener, info, global_context);
|
||||
PerformanceTest current(test_config, connection, interrupt_listener, info, global_context, query_indexes[info.path]);
|
||||
|
||||
current.checkPreconditions();
|
||||
LOG_INFO(log, "Preconditions for test '" << info.test_name << "' are fullfilled");
|
||||
@ -215,9 +218,9 @@ private:
|
||||
LOG_INFO(log, "Postqueries finished");
|
||||
|
||||
if (lite_output)
|
||||
return {report_builder->buildCompactReport(info, result), current.checkSIGINT()};
|
||||
return {report_builder->buildCompactReport(info, result, query_indexes[info.path]), current.checkSIGINT()};
|
||||
else
|
||||
return {report_builder->buildFullReport(info, result), current.checkSIGINT()};
|
||||
return {report_builder->buildFullReport(info, result, query_indexes[info.path]), current.checkSIGINT()};
|
||||
}
|
||||
|
||||
};
|
||||
@ -289,6 +292,29 @@ static std::vector<std::string> getInputFiles(const po::variables_map & options,
|
||||
return input_files;
|
||||
}
|
||||
|
||||
std::unordered_map<std::string, std::vector<std::size_t>> getTestQueryIndexes(const po::basic_parsed_options<char> & parsed_opts)
|
||||
{
|
||||
std::unordered_map<std::string, std::vector<std::size_t>> result;
|
||||
const auto & options = parsed_opts.options;
|
||||
for (size_t i = 0; i < options.size() - 1; ++i)
|
||||
{
|
||||
const auto & opt = options[i];
|
||||
if (opt.string_key == "input-files")
|
||||
{
|
||||
if (options[i + 1].string_key == "query-indexes")
|
||||
{
|
||||
const std::string & test_path = Poco::Path(opt.value[0]).absolute().toString();
|
||||
for (const auto & query_num_str : options[i + 1].value)
|
||||
{
|
||||
size_t query_num = std::stoul(query_num_str);
|
||||
result[test_path].push_back(query_num);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
int mainEntryClickHousePerformanceTest(int argc, char ** argv)
|
||||
try
|
||||
{
|
||||
@ -314,24 +340,18 @@ try
|
||||
("skip-names", value<Strings>()->multitoken(), "Do not run tests with name")
|
||||
("names-regexp", value<Strings>()->multitoken(), "Run tests with names matching regexp")
|
||||
("skip-names-regexp", value<Strings>()->multitoken(), "Do not run tests with names matching regexp")
|
||||
("input-files", value<Strings>()->multitoken(), "Input .xml files")
|
||||
("query-indexes", value<std::vector<size_t>>()->multitoken(), "Input query indexes")
|
||||
("recursive,r", "Recurse in directories to find all xml's");
|
||||
|
||||
/// These options will not be displayed in --help
|
||||
po::options_description hidden("Hidden options");
|
||||
hidden.add_options()
|
||||
("input-files", value<std::vector<std::string>>(), "");
|
||||
|
||||
/// But they will be legit, though. And they must be given without name
|
||||
po::positional_options_description positional;
|
||||
positional.add("input-files", -1);
|
||||
|
||||
po::options_description cmdline_options;
|
||||
cmdline_options.add(desc).add(hidden);
|
||||
cmdline_options.add(desc);
|
||||
|
||||
po::variables_map options;
|
||||
po::store(
|
||||
po::command_line_parser(argc, argv).
|
||||
options(cmdline_options).positional(positional).run(), options);
|
||||
po::basic_parsed_options<char> parsed = po::command_line_parser(argc, argv).options(cmdline_options).run();
|
||||
auto queries_with_indexes = getTestQueryIndexes(parsed);
|
||||
po::store(parsed, options);
|
||||
|
||||
po::notify(options);
|
||||
|
||||
Poco::AutoPtr<Poco::PatternFormatter> formatter(new Poco::PatternFormatter("%Y.%m.%d %H:%M:%S.%F <%p> %s: %t"));
|
||||
@ -378,6 +398,7 @@ try
|
||||
std::move(skip_names),
|
||||
std::move(tests_names_regexp),
|
||||
std::move(skip_names_regexp),
|
||||
queries_with_indexes,
|
||||
timeouts);
|
||||
return performance_test_suite.run();
|
||||
}
|
||||
|
@ -35,7 +35,8 @@ std::string ReportBuilder::getCurrentTime() const
|
||||
|
||||
std::string ReportBuilder::buildFullReport(
|
||||
const PerformanceTestInfo & test_info,
|
||||
std::vector<TestStats> & stats) const
|
||||
std::vector<TestStats> & stats,
|
||||
const std::vector<std::size_t> & queries_to_run) const
|
||||
{
|
||||
JSONString json_output;
|
||||
|
||||
@ -85,6 +86,9 @@ std::string ReportBuilder::buildFullReport(
|
||||
std::vector<JSONString> run_infos;
|
||||
for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
|
||||
{
|
||||
if (!queries_to_run.empty() && std::find(queries_to_run.begin(), queries_to_run.end(), query_index) == queries_to_run.end())
|
||||
continue;
|
||||
|
||||
for (size_t number_of_launch = 0; number_of_launch < test_info.times_to_run; ++number_of_launch)
|
||||
{
|
||||
size_t stat_index = number_of_launch * test_info.queries.size() + query_index;
|
||||
@ -97,6 +101,7 @@ std::string ReportBuilder::buildFullReport(
|
||||
|
||||
auto query = std::regex_replace(test_info.queries[query_index], QUOTE_REGEX, "\\\"");
|
||||
runJSON.set("query", query);
|
||||
runJSON.set("query_index", query_index);
|
||||
if (!statistics.exception.empty())
|
||||
runJSON.set("exception", statistics.exception);
|
||||
|
||||
@ -171,13 +176,17 @@ std::string ReportBuilder::buildFullReport(
|
||||
|
||||
std::string ReportBuilder::buildCompactReport(
|
||||
const PerformanceTestInfo & test_info,
|
||||
std::vector<TestStats> & stats) const
|
||||
std::vector<TestStats> & stats,
|
||||
const std::vector<std::size_t> & queries_to_run) const
|
||||
{
|
||||
|
||||
std::ostringstream output;
|
||||
|
||||
for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
|
||||
{
|
||||
if (!queries_to_run.empty() && std::find(queries_to_run.begin(), queries_to_run.end(), query_index) == queries_to_run.end())
|
||||
continue;
|
||||
|
||||
for (size_t number_of_launch = 0; number_of_launch < test_info.times_to_run; ++number_of_launch)
|
||||
{
|
||||
if (test_info.queries.size() > 1)
|
||||
@ -192,5 +201,4 @@ std::string ReportBuilder::buildCompactReport(
|
||||
}
|
||||
return output.str();
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -9,14 +9,18 @@ namespace DB
|
||||
class ReportBuilder
|
||||
{
|
||||
public:
|
||||
explicit ReportBuilder(const std::string & server_version_);
|
||||
ReportBuilder(const std::string & server_version_);
|
||||
std::string buildFullReport(
|
||||
const PerformanceTestInfo & test_info,
|
||||
std::vector<TestStats> & stats) const;
|
||||
std::vector<TestStats> & stats,
|
||||
const std::vector<std::size_t> & queries_to_run) const;
|
||||
|
||||
|
||||
std::string buildCompactReport(
|
||||
const PerformanceTestInfo & test_info,
|
||||
std::vector<TestStats> & stats) const;
|
||||
std::vector<TestStats> & stats,
|
||||
const std::vector<std::size_t> & queries_to_run) const;
|
||||
|
||||
private:
|
||||
std::string server_version;
|
||||
std::string hostname;
|
||||
|
@ -4,6 +4,7 @@
|
||||
#include <Poco/File.h>
|
||||
#include <Poco/Net/HTTPBasicCredentials.h>
|
||||
#include <Poco/Net/HTTPServerRequest.h>
|
||||
#include <Poco/Net/HTTPServerRequestImpl.h>
|
||||
#include <Poco/Net/HTTPServerResponse.h>
|
||||
#include <Poco/Net/NetException.h>
|
||||
|
||||
@ -558,9 +559,47 @@ void HTTPHandler::processQuery(
|
||||
client_info.http_method = http_method;
|
||||
client_info.http_user_agent = request.get("User-Agent", "");
|
||||
|
||||
auto appendCallback = [&context] (ProgressCallback callback)
|
||||
{
|
||||
auto prev = context.getProgressCallback();
|
||||
|
||||
context.setProgressCallback([prev, callback] (const Progress & progress)
|
||||
{
|
||||
if (prev)
|
||||
prev(progress);
|
||||
|
||||
callback(progress);
|
||||
});
|
||||
};
|
||||
|
||||
/// While still no data has been sent, we will report about query execution progress by sending HTTP headers.
|
||||
if (settings.send_progress_in_http_headers)
|
||||
context.setProgressCallback([&used_output] (const Progress & progress) { used_output.out->onProgress(progress); });
|
||||
appendCallback([&used_output] (const Progress & progress) { used_output.out->onProgress(progress); });
|
||||
|
||||
if (settings.readonly > 0 && settings.cancel_http_readonly_queries_on_client_close)
|
||||
{
|
||||
Poco::Net::StreamSocket & socket = dynamic_cast<Poco::Net::HTTPServerRequestImpl &>(request).socket();
|
||||
|
||||
appendCallback([&context, &socket](const Progress &)
|
||||
{
|
||||
/// Assume that at the point this method is called no one is reading data from the socket any more.
|
||||
/// True for read-only queries.
|
||||
try
|
||||
{
|
||||
char b;
|
||||
int status = socket.receiveBytes(&b, 1, MSG_DONTWAIT | MSG_PEEK);
|
||||
if (status == 0)
|
||||
context.killCurrentQuery();
|
||||
}
|
||||
catch (Poco::TimeoutException &)
|
||||
{
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
context.killCurrentQuery();
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
executeQuery(*in, *used_output.out_maybe_delayed_and_compressed, /* allow_into_outfile = */ false, context,
|
||||
[&response] (const String & content_type) { response.setContentType(content_type); },
|
||||
|
58  dbms/src/AggregateFunctions/AggregateFunctionEntropy.cpp  Normal file
@@ -0,0 +1,58 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/AggregateFunctionEntropy.h>
#include <AggregateFunctions/FactoryHelpers.h>

namespace DB
{

namespace ErrorCodes
{
    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}

namespace
{

AggregateFunctionPtr createAggregateFunctionEntropy(const std::string & name, const DataTypes & argument_types, const Array & parameters)
{
    assertNoParameters(name, parameters);
    if (argument_types.empty())
        throw Exception("Incorrect number of arguments for aggregate function " + name,
            ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);

    WhichDataType which(argument_types[0]);
    if (isNumber(argument_types[0]))
    {
        if (which.isUInt64())
        {
            return std::make_shared<AggregateFunctionEntropy<UInt64>>();
        }
        else if (which.isInt64())
        {
            return std::make_shared<AggregateFunctionEntropy<Int64>>();
        }
        else if (which.isInt32())
        {
            return std::make_shared<AggregateFunctionEntropy<Int32>>();
        }
        else if (which.isUInt32())
        {
            return std::make_shared<AggregateFunctionEntropy<UInt32>>();
        }
        else if (which.isUInt128())
        {
            return std::make_shared<AggregateFunctionEntropy<UInt128, true>>();
        }
    }

    return std::make_shared<AggregateFunctionEntropy<UInt128>>();
}

}

void registerAggregateFunctionEntropy(AggregateFunctionFactory & factory)
{
    factory.registerFunction("entropy", createAggregateFunctionEntropy);
}

}
152  dbms/src/AggregateFunctions/AggregateFunctionEntropy.h  Normal file
@@ -0,0 +1,152 @@
#pragma once

#include <AggregateFunctions/FactoryHelpers.h>
#include <Common/HashTable/HashMap.h>
#include <Common/NaNUtils.h>

#include <AggregateFunctions/IAggregateFunction.h>
#include <AggregateFunctions/UniqVariadicHash.h>
#include <Columns/ColumnArray.h>
#include <DataTypes/DataTypesNumber.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>

#include <cmath>

namespace DB
{

/** Calculates Shannon Entropy, using HashMap and computing empirical distribution function
  */
template <typename Value, bool is_hashed>
struct EntropyData
{
    using Weight = UInt64;
    using HashingMap = HashMap<
        Value, Weight,
        HashCRC32<Value>,
        HashTableGrower<4>,
        HashTableAllocatorWithStackMemory<sizeof(std::pair<Value, Weight>) * (1 << 3)>>;

    using TrivialMap = HashMap<
        Value, Weight,
        UInt128TrivialHash,
        HashTableGrower<4>,
        HashTableAllocatorWithStackMemory<sizeof(std::pair<Value, Weight>) * (1 << 3)>>;

    /// If column value is UInt128 then there is no need to hash values
    using Map = std::conditional_t<is_hashed, TrivialMap, HashingMap>;

    Map map;

    void add(const Value & x)
    {
        if (!isNaN(x))
            ++map[x];
    }

    void add(const Value & x, const Weight & weight)
    {
        if (!isNaN(x))
            map[x] += weight;
    }

    void merge(const EntropyData & rhs)
    {
        for (const auto & pair : rhs.map)
            map[pair.first] += pair.second;
    }

    void serialize(WriteBuffer & buf) const
    {
        map.write(buf);
    }

    void deserialize(ReadBuffer & buf)
    {
        typename Map::Reader reader(buf);
        while (reader.next())
        {
            const auto & pair = reader.get();
            map[pair.first] = pair.second;
        }
    }

    Float64 get() const
    {
        Float64 shannon_entropy = 0;
        UInt64 total_value = 0;
        for (const auto & pair : map)
            total_value += pair.second;

        Float64 cur_proba;
        Float64 log2e = 1 / std::log(2);
        for (const auto & pair : map)
        {
            cur_proba = Float64(pair.second) / total_value;
            shannon_entropy -= cur_proba * std::log(cur_proba) * log2e;
        }

        return shannon_entropy;
    }
};

template <typename Value, bool is_hashed = false>
class AggregateFunctionEntropy final : public IAggregateFunctionDataHelper<EntropyData<Value, is_hashed>, AggregateFunctionEntropy<Value>>
{
public:
    AggregateFunctionEntropy() {}

    String getName() const override { return "entropy"; }

    DataTypePtr getReturnType() const override
    {
        return std::make_shared<DataTypeNumber<Float64>>();
    }

    void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
    {
        if constexpr (!std::is_same_v<UInt128, Value>)
        {
            /// Here we manage only with numerical types
            const auto & column = static_cast<const ColumnVector<Value> &>(*columns[0]);
            this->data(place).add(column.getData()[row_num]);
        }
        else
        {
            this->data(place).add(UniqVariadicHash<true, false>::apply(1, columns, row_num));
        }
    }

    void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override
    {
        this->data(place).merge(this->data(rhs));
    }

    void serialize(ConstAggregateDataPtr place, WriteBuffer & buf) const override
    {
        this->data(const_cast<AggregateDataPtr>(place)).serialize(buf);
    }

    void deserialize(AggregateDataPtr place, ReadBuffer & buf, Arena *) const override
    {
        this->data(place).deserialize(buf);
    }

    void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
    {
        auto & column = dynamic_cast<ColumnVector<Float64> &>(to);
        column.getData().push_back(this->data(place).get());
    }

    const char * getHeaderFilePath() const override { return __FILE__; }
};

}
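The `get()` method above computes the Shannon entropy of the empirical distribution accumulated in the hash map. The standalone sketch below mirrors that computation outside ClickHouse's aggregate-function framework; it is an illustration, not part of the patch. Within ClickHouse, the same result is exposed through the new `entropy` aggregate function registered above (e.g. `SELECT entropy(x) FROM t`).

```cpp
#include <cmath>
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

// Shannon entropy of a sample, computed the same way as EntropyData::get():
// count occurrences, convert counts to probabilities, accumulate -p * log2(p).
double shannon_entropy(const std::vector<int> & values)
{
    std::unordered_map<int, uint64_t> counts;
    for (int v : values)
        ++counts[v];

    const double total = static_cast<double>(values.size());
    const double log2e = 1.0 / std::log(2.0);
    double entropy = 0.0;
    for (const auto & kv : counts)
    {
        double p = static_cast<double>(kv.second) / total;
        entropy -= p * std::log(p) * log2e;
    }
    return entropy;
}

int main()
{
    // Four equiprobable values -> exactly 2 bits of entropy.
    std::cout << shannon_entropy({1, 2, 3, 4}) << "\n";  // 2
    // A constant column has zero entropy.
    std::cout << shannon_entropy({7, 7, 7, 7}) << "\n";  // 0
    return 0;
}
```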
@@ -19,7 +19,7 @@ namespace ErrorCodes
/** Calculates quantile by collecting all values into array
  * and applying n-th element (introselect) algorithm for the resulting array.
  *
  * It use O(N) memory and it is very inefficient in case of high amount of identical values.
  * It uses O(N) memory and it is very inefficient in case of high amount of identical values.
  * But it is very CPU efficient for not large datasets.
  */
template <typename Value>
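The comment above describes the QuantileExact strategy: buffer all values, then select the n-th element. A minimal standalone sketch of that idea (not the ClickHouse implementation, and with a simplified index convention) could look like this:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Exact quantile by partial selection: O(N) memory, O(N) average time,
// matching the "collect all values, then nth_element" approach described above.
double quantile_exact(std::vector<double> values, double level)
{
    if (values.empty())
        return 0.0;  // the real implementation handles the empty case separately; simplified here

    size_t n = static_cast<size_t>(level * (values.size() - 1));
    std::nth_element(values.begin(), values.begin() + n, values.end());
    return values[n];
}

int main()
{
    std::vector<double> data = {9, 1, 8, 2, 7, 3, 6, 4, 5};
    std::cout << quantile_exact(data, 0.5) << "\n";   // median -> 5
    std::cout << quantile_exact(data, 0.25) << "\n";  // lower quartile -> 3
    return 0;
}
```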
@@ -14,7 +14,7 @@ namespace ErrorCodes

/** Calculates quantile by counting number of occurrences for each value in a hash map.
  *
  * It use O(distinct(N)) memory. Can be naturally applied for values with weight.
  * It uses O(distinct(N)) memory. Can be naturally applied for values with weight.
  * In case of many identical values, it can be more efficient than QuantileExact even when weight is not used.
  */
template <typename Value>
@@ -27,6 +27,7 @@ void registerAggregateFunctionUniqUpTo(AggregateFunctionFactory &);
void registerAggregateFunctionTopK(AggregateFunctionFactory &);
void registerAggregateFunctionsBitwise(AggregateFunctionFactory &);
void registerAggregateFunctionsMaxIntersections(AggregateFunctionFactory &);
void registerAggregateFunctionEntropy(AggregateFunctionFactory &);

void registerAggregateFunctionCombinatorIf(AggregateFunctionCombinatorFactory &);
void registerAggregateFunctionCombinatorArray(AggregateFunctionCombinatorFactory &);
@@ -65,6 +66,7 @@ void registerAggregateFunctions()
    registerAggregateFunctionsMaxIntersections(factory);
    registerAggregateFunctionHistogram(factory);
    registerAggregateFunctionRetention(factory);
    registerAggregateFunctionEntropy(factory);
}

{
@@ -69,7 +69,7 @@ public:
    static void finalizePerformanceCounters();

    /// Returns a non-empty string if the thread is attached to a query
    static std::string getCurrentQueryID();
    static const std::string & getQueryId();

    /// Non-master threads call this method in destructor automatically
    static void detachQuery();
@@ -116,7 +116,7 @@ public:
        return thread_state.load(std::memory_order_relaxed);
    }

    String getQueryID();
    const std::string & getQueryId() const;

    /// Starts new query and create new thread group for it, current thread becomes master thread of the query
    void initializeQuery();
@@ -160,6 +160,8 @@ protected:
    /// Use it only from current thread
    Context * query_context = nullptr;

    String query_id;

    /// A logs queue used by TCPHandler to pass logs to a client
    InternalTextLogsQueueWeakPtr logs_queue_ptr;
@@ -153,6 +153,4 @@ private:
    void attachToThreadGroup();
};

using BackgroundSchedulePoolPtr = std::shared_ptr<BackgroundSchedulePool>;

}
@@ -157,7 +157,7 @@ protected:
    using QueueWithCollation = std::priority_queue<SortCursorWithCollation>;
    QueueWithCollation queue_with_collation;

    /// Used in Vertical merge algorithm to gather non-PK columns (on next step)
    /// Used in Vertical merge algorithm to gather non-PK/non-index columns (on next step)
    /// If it is not nullptr then it should be populated during execution
    WriteBuffer * out_row_sources_buf;
@@ -183,7 +183,8 @@ private:
    try
    {
        setThreadName("ParalInputsProc");
        CurrentThread::attachTo(thread_group);
        if (thread_group)
            CurrentThread::attachTo(thread_group);

        while (!finish)
        {
@ -20,9 +20,8 @@ namespace ErrorCodes
|
||||
extern const int SYNTAX_ERROR;
|
||||
}
|
||||
|
||||
DatabaseDictionary::DatabaseDictionary(const String & name_, const Context & context)
|
||||
DatabaseDictionary::DatabaseDictionary(const String & name_)
|
||||
: name(name_),
|
||||
external_dictionaries(context.getExternalDictionaries()),
|
||||
log(&Logger::get("DatabaseDictionary(" + name + ")"))
|
||||
{
|
||||
}
|
||||
@ -31,23 +30,21 @@ void DatabaseDictionary::loadTables(Context &, ThreadPool *, bool)
|
||||
{
|
||||
}
|
||||
|
||||
Tables DatabaseDictionary::loadTables()
|
||||
Tables DatabaseDictionary::listTables(const Context & context)
|
||||
{
|
||||
auto objects_map = external_dictionaries.getObjectsMap();
|
||||
auto objects_map = context.getExternalDictionaries().getObjectsMap();
|
||||
const auto & dictionaries = objects_map.get();
|
||||
|
||||
Tables tables;
|
||||
for (const auto & pair : dictionaries)
|
||||
{
|
||||
const std::string & dict_name = pair.first;
|
||||
if (deleted_tables.count(dict_name))
|
||||
continue;
|
||||
auto dict_ptr = std::static_pointer_cast<IDictionaryBase>(pair.second.loadable);
|
||||
if (dict_ptr)
|
||||
{
|
||||
const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
|
||||
auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
|
||||
tables[dict_name] = StorageDictionary::create(dict_name, ColumnsDescription{columns}, dictionary_structure, dict_name);
|
||||
const std::string & dict_name = pair.first;
|
||||
tables[dict_name] = StorageDictionary::create(dict_name, ColumnsDescription{columns}, context, true, dict_name);
|
||||
}
|
||||
}
|
||||
|
||||
@ -55,23 +52,21 @@ Tables DatabaseDictionary::loadTables()
|
||||
}
|
||||
|
||||
bool DatabaseDictionary::isTableExist(
|
||||
const Context & /*context*/,
|
||||
const Context & context,
|
||||
const String & table_name) const
|
||||
{
|
||||
auto objects_map = external_dictionaries.getObjectsMap();
|
||||
auto objects_map = context.getExternalDictionaries().getObjectsMap();
|
||||
const auto & dictionaries = objects_map.get();
|
||||
return dictionaries.count(table_name) && !deleted_tables.count(table_name);
|
||||
return dictionaries.count(table_name);
|
||||
}
|
||||
|
||||
StoragePtr DatabaseDictionary::tryGetTable(
|
||||
const Context & /*context*/,
|
||||
const Context & context,
|
||||
const String & table_name) const
|
||||
{
|
||||
auto objects_map = external_dictionaries.getObjectsMap();
|
||||
auto objects_map = context.getExternalDictionaries().getObjectsMap();
|
||||
const auto & dictionaries = objects_map.get();
|
||||
|
||||
if (deleted_tables.count(table_name))
|
||||
return {};
|
||||
{
|
||||
auto it = dictionaries.find(table_name);
|
||||
if (it != dictionaries.end())
|
||||
@ -81,7 +76,7 @@ StoragePtr DatabaseDictionary::tryGetTable(
|
||||
{
|
||||
const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
|
||||
auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
|
||||
return StorageDictionary::create(table_name, ColumnsDescription{columns}, dictionary_structure, table_name);
|
||||
return StorageDictionary::create(table_name, ColumnsDescription{columns}, context, true, table_name);
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -89,17 +84,17 @@ StoragePtr DatabaseDictionary::tryGetTable(
|
||||
return {};
|
||||
}
|
||||
|
||||
DatabaseIteratorPtr DatabaseDictionary::getIterator(const Context & /*context*/)
|
||||
DatabaseIteratorPtr DatabaseDictionary::getIterator(const Context & context)
|
||||
{
|
||||
return std::make_unique<DatabaseSnapshotIterator>(loadTables());
|
||||
return std::make_unique<DatabaseSnapshotIterator>(listTables(context));
|
||||
}
|
||||
|
||||
bool DatabaseDictionary::empty(const Context & /*context*/) const
|
||||
bool DatabaseDictionary::empty(const Context & context) const
|
||||
{
|
||||
auto objects_map = external_dictionaries.getObjectsMap();
|
||||
auto objects_map = context.getExternalDictionaries().getObjectsMap();
|
||||
const auto & dictionaries = objects_map.get();
|
||||
for (const auto & pair : dictionaries)
|
||||
if (pair.second.loadable && !deleted_tables.count(pair.first))
|
||||
if (pair.second.loadable)
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
@ -115,23 +110,19 @@ void DatabaseDictionary::attachTable(const String & /*table_name*/, const Storag
|
||||
}
|
||||
|
||||
void DatabaseDictionary::createTable(
|
||||
const Context & /*context*/,
|
||||
const String & /*table_name*/,
|
||||
const StoragePtr & /*table*/,
|
||||
const ASTPtr & /*query*/)
|
||||
const Context &,
|
||||
const String &,
|
||||
const StoragePtr &,
|
||||
const ASTPtr &)
|
||||
{
|
||||
throw Exception("DatabaseDictionary: createTable() is not supported", ErrorCodes::NOT_IMPLEMENTED);
|
||||
}
|
||||
|
||||
void DatabaseDictionary::removeTable(
|
||||
const Context & context,
|
||||
const String & table_name)
|
||||
const Context &,
|
||||
const String &)
|
||||
{
|
||||
if (!isTableExist(context, table_name))
|
||||
throw Exception("Table " + name + "." + table_name + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE);
|
||||
|
||||
auto objects_map = external_dictionaries.getObjectsMap();
|
||||
deleted_tables.insert(table_name);
|
||||
throw Exception("DatabaseDictionary: removeTable() is not supported", ErrorCodes::NOT_IMPLEMENTED);
|
||||
}
|
||||
|
||||
void DatabaseDictionary::renameTable(
|
||||
@ -147,6 +138,7 @@ void DatabaseDictionary::alterTable(
|
||||
const Context &,
|
||||
const String &,
|
||||
const ColumnsDescription &,
|
||||
const IndicesDescription &,
|
||||
const ASTModifier &)
|
||||
{
|
||||
throw Exception("DatabaseDictionary: alterTable() is not supported", ErrorCodes::NOT_IMPLEMENTED);
|
||||
|
@@ -15,7 +15,6 @@ namespace Poco

namespace DB
{
class ExternalDictionaries;

/* Database to store StorageDictionary tables
 * automatically creates tables for all dictionaries
@@ -23,7 +22,7 @@ class ExternalDictionaries;
class DatabaseDictionary : public IDatabase
{
public:
    DatabaseDictionary(const String & name_, const Context & context);
    DatabaseDictionary(const String & name_);

    String getDatabaseName() const override;

@@ -72,6 +71,7 @@ public:
        const Context & context,
        const String & name,
        const ColumnsDescription & columns,
        const IndicesDescription & indices,
        const ASTModifier & engine_modifier) override;

    time_t getTableMetadataModificationTime(
@@ -93,13 +93,10 @@ public:
private:
    const String name;
    mutable std::mutex mutex;
    const ExternalDictionaries & external_dictionaries;
    std::unordered_set<String> deleted_tables;

    Poco::Logger * log;

    Tables loadTables();

    Tables listTables(const Context & context);
    ASTPtr getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const;
};
@ -23,7 +23,7 @@ DatabasePtr DatabaseFactory::get(
else if (engine_name == "Memory")
return std::make_shared<DatabaseMemory>(database_name);
else if (engine_name == "Dictionary")
return std::make_shared<DatabaseDictionary>(database_name, context);
return std::make_shared<DatabaseDictionary>(database_name);

throw Exception("Unknown database engine: " + engine_name, ErrorCodes::UNKNOWN_DATABASE_ENGINE);
}

@ -53,6 +53,7 @@ void DatabaseMemory::alterTable(
const Context &,
const String &,
const ColumnsDescription &,
const IndicesDescription &,
const ASTModifier &)
{
throw Exception("DatabaseMemory: alterTable() is not supported", ErrorCodes::NOT_IMPLEMENTED);
@ -48,6 +48,7 @@ public:
const Context & context,
const String & name,
const ColumnsDescription & columns,
const IndicesDescription & indices,
const ASTModifier & engine_modifier) override;

time_t getTableMetadataModificationTime(
@ -510,6 +510,7 @@ void DatabaseOrdinary::alterTable(
const Context & context,
const String & table_name,
const ColumnsDescription & columns,
const IndicesDescription & indices,
const ASTModifier & storage_modifier)
{
/// Read the definition of the table and replace the necessary parts with new ones.
@ -531,7 +532,14 @@ void DatabaseOrdinary::alterTable(
ASTCreateQuery & ast_create_query = typeid_cast<ASTCreateQuery &>(*ast);

ASTPtr new_columns = InterpreterCreateQuery::formatColumns(columns);
ast_create_query.replace(ast_create_query.columns, new_columns);
ASTPtr new_indices = InterpreterCreateQuery::formatIndices(indices);

ast_create_query.columns_list->replace(ast_create_query.columns_list->columns, new_columns);

if (ast_create_query.columns_list->indices)
ast_create_query.columns_list->replace(ast_create_query.columns_list->indices, new_indices);
else
ast_create_query.columns_list->set(ast_create_query.columns_list->indices, new_indices);

if (storage_modifier)
storage_modifier(*ast_create_query.storage);
@ -42,6 +42,7 @@ public:
const Context & context,
const String & name,
const ColumnsDescription & columns,
const IndicesDescription & indices,
const ASTModifier & engine_modifier) override;

time_t getTableMetadataModificationTime(
@ -68,10 +68,10 @@ std::pair<String, StoragePtr> createTableFromDefinition(
/// We do not directly use `InterpreterCreateQuery::execute`, because
/// - the database has not been created yet;
/// - the code is simpler, since the query is already brought to a suitable form.
if (!ast_create_query.columns)
if (!ast_create_query.columns_list || !ast_create_query.columns_list->columns)
throw Exception("Missing definition of columns.", ErrorCodes::EMPTY_LIST_OF_COLUMNS_PASSED);

ColumnsDescription columns = InterpreterCreateQuery::getColumnsDescription(*ast_create_query.columns, context);
ColumnsDescription columns = InterpreterCreateQuery::getColumnsDescription(*ast_create_query.columns_list->columns, context);

return
{
@ -3,6 +3,7 @@
#include <Core/Types.h>
#include <Core/NamesAndTypes.h>
#include <Storages/ColumnsDescription.h>
#include <Storages/IndicesDescription.h>
#include <ctime>
#include <memory>
#include <functional>
@ -115,6 +116,7 @@ public:
const Context & context,
const String & name,
const ColumnsDescription & columns,
const IndicesDescription & indices,
const ASTModifier & engine_modifier) = 0;

/// Returns time of table's metadata change, 0 if there is no corresponding metadata file.
@ -54,7 +54,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
const Block & sample_block,
Context & context)
Context & context_)
: update_time{std::chrono::system_clock::from_time_t(0)}
, dict_struct{dict_struct_}
, host{config.getString(config_prefix + ".host")}
@ -69,11 +69,13 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(
, invalidate_query{config.getString(config_prefix + ".invalidate_query", "")}
, query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks}
, sample_block{sample_block}
, context(context)
, context(context_)
, is_local{isLocalAddress({host, port}, context.getTCPPort())}
, pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)}
, load_all_query{query_builder.composeLoadAllQuery()}
{
/// We should set user info even for the case when the dictionary is loaded in-process (without TCP communication).
context.setUser(user, password, Poco::Net::SocketAddress("127.0.0.1", 0), {});
}

@ -182,7 +184,8 @@ std::string ClickHouseDictionarySource::doInvalidateQuery(const std::string & re
{
if (is_local)
{
auto input_block = executeQuery(request, context, true).in;
Context query_context = context;
auto input_block = executeQuery(request, query_context, true).in;
return readInvalidateQuery(*input_block);
}
else
@ -201,7 +204,8 @@ void registerDictionarySourceClickHouse(DictionarySourceFactory & factory)
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Block & sample_block,
Context & context) -> DictionarySourcePtr {
Context & context) -> DictionarySourcePtr
{
return std::make_unique<ClickHouseDictionarySource>(dict_struct, config, config_prefix + ".clickhouse", sample_block, context);
};
factory.registerSource("clickhouse", createTableSource);
@ -2,6 +2,7 @@

#include <memory>
#include <Client/ConnectionPoolWithFailover.h>
#include <Interpreters/Context.h>
#include "DictionaryStructure.h"
#include "ExternalQueryBuilder.h"
#include "IDictionarySource.h"
@ -65,7 +66,7 @@ private:
mutable std::string invalidate_query_response;
ExternalQueryBuilder query_builder;
Block sample_block;
Context & context;
Context context;
const bool is_local;
ConnectionPoolWithFailoverPtr pool;
const std::string load_all_query;
@ -14,7 +14,6 @@ namespace ErrorCodes

void DictionaryFactory::registerLayout(const std::string & layout_type, Creator create_layout)
{
//LOG_DEBUG(log, "Register dictionary layout type `" + layout_type + "`");
if (!registered_layouts.emplace(layout_type, std::move(create_layout)).second)
throw Exception("DictionaryFactory: the layout name '" + layout_type + "' is not unique", ErrorCodes::LOGICAL_ERROR);
}

@ -234,7 +234,8 @@ void registerDictionarySourceExecutable(DictionarySourceFactory & factory)
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Block & sample_block,
const Context & context) -> DictionarySourcePtr {
Context & context) -> DictionarySourcePtr
{
if (dict_struct.has_expressions)
throw Exception{"Dictionary source of type `executable` does not support attribute expressions", ErrorCodes::LOGICAL_ERROR};

@ -56,7 +56,8 @@ void registerDictionarySourceFile(DictionarySourceFactory & factory)
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Block & sample_block,
const Context & context) -> DictionarySourcePtr {
Context & context) -> DictionarySourcePtr
{
if (dict_struct.has_expressions)
throw Exception{"Dictionary source of type `file` does not support attribute expressions", ErrorCodes::LOGICAL_ERROR};

@ -157,7 +157,8 @@ void registerDictionarySourceHTTP(DictionarySourceFactory & factory)
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Block & sample_block,
const Context & context) -> DictionarySourcePtr {
Context & context) -> DictionarySourcePtr
{
if (dict_struct.has_expressions)
throw Exception{"Dictionary source of type `http` does not support attribute expressions", ErrorCodes::LOGICAL_ERROR};

@ -121,14 +121,12 @@ LibraryDictionarySource::LibraryDictionarySource(
const DictionaryStructure & dict_struct_,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Block & sample_block,
const Context & context)
Block & sample_block)
: log(&Logger::get("LibraryDictionarySource"))
, dict_struct{dict_struct_}
, config_prefix{config_prefix}
, path{config.getString(config_prefix + ".path", "")}
, sample_block{sample_block}
, context(context)
{
if (!Poco::File(path).exists())
throw Exception(
@ -152,7 +150,6 @@ LibraryDictionarySource::LibraryDictionarySource(const LibraryDictionarySource &
, config_prefix{other.config_prefix}
, path{other.path}
, sample_block{other.sample_block}
, context(other.context)
, library{other.library}
, description{other.description}
, settings{other.settings}
@ -288,8 +285,9 @@ void registerDictionarySourceLibrary(DictionarySourceFactory & factory)
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Block & sample_block,
const Context & context) -> DictionarySourcePtr {
return std::make_unique<LibraryDictionarySource>(dict_struct, config, config_prefix + ".library", sample_block, context);
const Context &) -> DictionarySourcePtr
{
return std::make_unique<LibraryDictionarySource>(dict_struct, config, config_prefix + ".library", sample_block);
};
factory.registerSource("library", createTableSource);
}
@ -32,8 +32,7 @@ public:
const DictionaryStructure & dict_struct_,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Block & sample_block,
const Context & context);
Block & sample_block);

LibraryDictionarySource(const LibraryDictionarySource & other);

@ -70,7 +69,6 @@ private:
const std::string config_prefix;
const std::string path;
Block sample_block;
const Context & context;
SharedLibraryPtr library;
ExternalResultDescription description;
std::shared_ptr<CStringsHolder> settings;
@ -186,7 +186,7 @@ public:
: owned_dict(owned_dict_)
{
if (!owned_dict)
throw Exception("Dictionaries was not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
throw Exception("Embedded dictionaries were not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
}

String getName() const override
@ -280,7 +280,7 @@ public:
: owned_dict(owned_dict_)
{
if (!owned_dict)
throw Exception("Dictionaries was not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
throw Exception("Embedded dictionaries were not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
}

String getName() const override
@ -418,7 +418,7 @@ public:
: owned_dict(owned_dict_)
{
if (!owned_dict)
throw Exception("Dictionaries was not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
throw Exception("Embedded dictionaries were not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
}

String getName() const override
@ -690,7 +690,7 @@ public:
: owned_dict(owned_dict_)
{
if (!owned_dict)
throw Exception("Dictionaries was not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
throw Exception("Embedded dictionaries were not loaded. You need to check configuration file.", ErrorCodes::DICTIONARIES_WAS_NOT_LOADED);
}

String getName() const override
51
dbms/src/Functions/bitSwapLastTwo.cpp
Normal file
@ -0,0 +1,51 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionUnaryArithmetic.h>
#include <DataTypes/NumberTraits.h>

namespace DB
{

template <typename A>
struct BitSwapLastTwoImpl
{
using ResultType = UInt8;

static inline ResultType apply(A a)
{
return static_cast<ResultType>(
((static_cast<ResultType>(a) & 1) << 1) | ((static_cast<ResultType>(a) >> 1) & 1));
}

#if USE_EMBEDDED_COMPILER
static constexpr bool compilable = true;

static inline llvm::Value * compile(llvm::IRBuilder<> & b, llvm::Value * arg, bool)
{
if (!arg->getType()->isIntegerTy())
throw Exception("__bitSwapLastTwo expected an integral type", ErrorCodes::LOGICAL_ERROR);
return b.CreateOr(
b.CreateShl(b.CreateAnd(arg, 1), 1),
b.CreateAnd(b.CreateLShr(arg, 1), 1)
);
}
#endif
};

struct NameBitSwapLastTwo { static constexpr auto name = "__bitSwapLastTwo"; };
using FunctionBitSwapLastTwo = FunctionUnaryArithmetic<BitSwapLastTwoImpl, NameBitSwapLastTwo, true>;

template <> struct FunctionUnaryArithmeticMonotonicity<NameBitSwapLastTwo>
{
static bool has() { return false; }
static IFunction::Monotonicity get(const Field &, const Field &)
{
return {};
}
};

void registerFunctionBitSwapLastTwo(FunctionFactory & factory)
{
factory.registerFunction<FunctionBitSwapLastTwo>();
}

}
@ -33,6 +33,8 @@ void registerFunctionRoundToExp2(FunctionFactory & factory);
void registerFunctionRoundDuration(FunctionFactory & factory);
void registerFunctionRoundAge(FunctionFactory & factory);

void registerFunctionBitSwapLastTwo(FunctionFactory & factory);

void registerFunctionsArithmetic(FunctionFactory & factory)
{
registerFunctionPlus(factory);
@ -64,6 +66,9 @@ void registerFunctionsArithmetic(FunctionFactory & factory)
registerFunctionRoundToExp2(factory);
registerFunctionRoundDuration(factory);
registerFunctionRoundAge(factory);

/// Not for external use.
registerFunctionBitSwapLastTwo(factory);
}

}
@ -1,8 +1,8 @@
#include <map>
#include <set>
#include <boost/functional/hash/hash.hpp>
#include <optional>
#include <memory>
#include <Poco/Mutex.h>
#include <Poco/File.h>
#include <Poco/UUID.h>
#include <Poco/Net/IPAddress.h>
#include <common/logger_useful.h>
@ -98,7 +98,7 @@ struct ContextShared
{
Logger * log = &Logger::get("Context");

std::shared_ptr<IRuntimeComponentsFactory> runtime_components_factory;
std::unique_ptr<IRuntimeComponentsFactory> runtime_components_factory;

/// For access of most of shared objects. Recursive mutex.
mutable std::recursive_mutex mutex;
@ -124,12 +124,12 @@ struct ContextShared
ConfigurationPtr config; /// Global configuration settings.

Databases databases; /// List of databases and tables in them.
mutable std::shared_ptr<EmbeddedDictionaries> embedded_dictionaries; /// Metrica's dictionaries. Have lazy initialization.
mutable std::shared_ptr<ExternalDictionaries> external_dictionaries;
mutable std::shared_ptr<ExternalModels> external_models;
mutable std::optional<EmbeddedDictionaries> embedded_dictionaries; /// Metrica's dictionaries. Have lazy initialization.
mutable std::optional<ExternalDictionaries> external_dictionaries;
mutable std::optional<ExternalModels> external_models;
String default_profile_name; /// Default profile name used for default values.
String system_profile_name; /// Profile used by system processes
std::shared_ptr<ISecurityManager> security_manager; /// Known users.
std::unique_ptr<ISecurityManager> security_manager; /// Known users.
Quotas quotas; /// Known quotas for resource use.
mutable UncompressedCachePtr uncompressed_cache; /// The cache of decompressed blocks.
mutable MarkCachePtr mark_cache; /// Cache of marks in compressed files.
@ -138,18 +138,19 @@ struct ContextShared
ViewDependencies view_dependencies; /// Current dependencies
ConfigurationPtr users_config; /// Config with the users, profiles and quotas sections.
InterserverIOHandler interserver_io_handler; /// Handler for interserver communication.
BackgroundProcessingPoolPtr background_pool; /// The thread pool for the background work performed by the tables.
BackgroundSchedulePoolPtr schedule_pool; /// A thread pool that can run different jobs in background (used in replicated tables)
std::optional<BackgroundProcessingPool> background_pool; /// The thread pool for the background work performed by the tables.
std::optional<BackgroundSchedulePool> schedule_pool; /// A thread pool that can run different jobs in background (used in replicated tables)
MultiVersion<Macros> macros; /// Substitutions extracted from config.
std::unique_ptr<Compiler> compiler; /// Used for dynamic compilation of queries' parts if it necessary.
std::optional<Compiler> compiler; /// Used for dynamic compilation of queries' parts if it necessary.
std::shared_ptr<DDLWorker> ddl_worker; /// Process ddl commands from zk.
/// Rules for selecting the compression settings, depending on the size of the part.
mutable std::unique_ptr<CompressionCodecSelector> compression_codec_selector;
std::unique_ptr<MergeTreeSettings> merge_tree_settings; /// Settings of MergeTree* engines.
std::optional<MergeTreeSettings> merge_tree_settings; /// Settings of MergeTree* engines.
size_t max_table_size_to_drop = 50000000000lu; /// Protects MergeTree tables from accidental DROP (50GB by default)
size_t max_partition_size_to_drop = 50000000000lu; /// Protects MergeTree partitions from accidental DROP (50GB by default)
String format_schema_path; /// Path to a directory that contains schema files used by input formats.
ActionLocksManagerPtr action_locks_manager; /// Set of storages' action lockers
SystemLogsPtr system_logs; /// Used to log queries and operations on parts

/// Named sessions. The user could specify session identifier to reuse settings and temporary tables in subsequent requests.

@ -206,7 +207,7 @@ struct ContextShared

Context::ConfigReloadCallback config_reload_callback;

ContextShared(std::shared_ptr<IRuntimeComponentsFactory> runtime_components_factory_)
ContextShared(std::unique_ptr<IRuntimeComponentsFactory> runtime_components_factory_)
: runtime_components_factory(std::move(runtime_components_factory_)), macros(std::make_unique<Macros>())
{
/// TODO: make it singleton (?)
@ -243,6 +244,8 @@ struct ContextShared
return;
shutdown_called = true;

system_logs.reset();

/** At this point, some tables may have threads that block our mutex.
* To complete them correctly, we will copy the current list of tables,
* and ask them all to finish their work.
@ -263,6 +266,15 @@ struct ContextShared
std::lock_guard lock(mutex);
databases.clear();
}

/// Preemptive destruction is important, because these objects may have a refcount to ContextShared (cyclic reference).
/// TODO: Get rid of this.

embedded_dictionaries.reset();
external_dictionaries.reset();
external_models.reset();
background_pool.reset();
schedule_pool.reset();
}

private:
@ -276,11 +288,10 @@ private:
Context::Context() = default;

Context Context::createGlobal(std::shared_ptr<IRuntimeComponentsFactory> runtime_components_factory)
Context Context::createGlobal(std::unique_ptr<IRuntimeComponentsFactory> runtime_components_factory)
{
Context res;
res.runtime_components_factory = runtime_components_factory;
res.shared = std::make_shared<ContextShared>(runtime_components_factory);
res.shared = std::make_shared<ContextShared>(std::move(runtime_components_factory));
res.quota = std::make_shared<QuotaForIntervals>();
return res;
}
@ -290,18 +301,7 @@ Context Context::createGlobal()
return createGlobal(std::make_unique<RuntimeComponentsFactory>());
}

Context::~Context()
{
try
{
/// Destroy system logs while at least one Context is alive
system_logs.reset();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
Context::~Context() = default;

InterserverIOHandler & Context::getInterserverIOHandler() { return shared->interserver_io_handler; }
@ -1077,6 +1077,13 @@ void Context::setCurrentQueryId(const String & query_id)
client_info.current_query_id = query_id_to_set;
}

void Context::killCurrentQuery()
{
if (process_list_elem)
{
process_list_elem->cancelQuery(true);
}
};

String Context::getDefaultFormat() const
{
@ -1181,9 +1188,9 @@ EmbeddedDictionaries & Context::getEmbeddedDictionariesImpl(const bool throw_on_

if (!shared->embedded_dictionaries)
{
auto geo_dictionaries_loader = runtime_components_factory->createGeoDictionariesLoader();
auto geo_dictionaries_loader = shared->runtime_components_factory->createGeoDictionariesLoader();

shared->embedded_dictionaries = std::make_shared<EmbeddedDictionaries>(
shared->embedded_dictionaries.emplace(
std::move(geo_dictionaries_loader),
*this->global_context,
throw_on_error);
@ -1202,9 +1209,9 @@ ExternalDictionaries & Context::getExternalDictionariesImpl(const bool throw_on_
if (!this->global_context)
throw Exception("Logical error: there is no global context", ErrorCodes::LOGICAL_ERROR);

auto config_repository = runtime_components_factory->createExternalDictionariesConfigRepository();
auto config_repository = shared->runtime_components_factory->createExternalDictionariesConfigRepository();

shared->external_dictionaries = std::make_shared<ExternalDictionaries>(
shared->external_dictionaries.emplace(
std::move(config_repository),
*this->global_context,
throw_on_error);
@ -1222,9 +1229,9 @@ ExternalModels & Context::getExternalModelsImpl(bool throw_on_error) const
if (!this->global_context)
throw Exception("Logical error: there is no global context", ErrorCodes::LOGICAL_ERROR);

auto config_repository = runtime_components_factory->createExternalModelsConfigRepository();
auto config_repository = shared->runtime_components_factory->createExternalModelsConfigRepository();

shared->external_models = std::make_shared<ExternalModels>(
shared->external_models.emplace(
std::move(config_repository),
*this->global_context,
throw_on_error);
@ -1342,7 +1349,7 @@ BackgroundProcessingPool & Context::getBackgroundPool()
{
auto lock = getLock();
if (!shared->background_pool)
shared->background_pool = std::make_shared<BackgroundProcessingPool>(settings.background_pool_size);
shared->background_pool.emplace(settings.background_pool_size);
return *shared->background_pool;
}

@ -1350,7 +1357,7 @@ BackgroundSchedulePool & Context::getSchedulePool()
{
auto lock = getLock();
if (!shared->schedule_pool)
shared->schedule_pool = std::make_shared<BackgroundSchedulePool>(settings.background_schedule_pool_size);
shared->schedule_pool.emplace(settings.background_schedule_pool_size);
return *shared->schedule_pool;
}

@ -1529,7 +1536,7 @@ Compiler & Context::getCompiler()
auto lock = getLock();

if (!shared->compiler)
shared->compiler = std::make_unique<Compiler>(shared->path + "build/", 1);
shared->compiler.emplace(shared->path + "build/", 1);

return *shared->compiler;
}
@ -1542,7 +1549,7 @@ void Context::initializeSystemLogs()
if (!global_context)
throw Exception("Logical error: no global context for system logs", ErrorCodes::LOGICAL_ERROR);

system_logs = std::make_shared<SystemLogs>(*global_context, getConfigRef());
shared->system_logs = std::make_shared<SystemLogs>(*global_context, getConfigRef());
}

@ -1550,10 +1557,10 @@ QueryLog * Context::getQueryLog()
{
auto lock = getLock();

if (!system_logs || !system_logs->query_log)
if (!shared->system_logs || !shared->system_logs->query_log)
return nullptr;

return system_logs->query_log.get();
return shared->system_logs->query_log.get();
}

@ -1561,10 +1568,10 @@ QueryThreadLog * Context::getQueryThreadLog()
{
auto lock = getLock();

if (!system_logs || !system_logs->query_thread_log)
if (!shared->system_logs || !shared->system_logs->query_thread_log)
return nullptr;

return system_logs->query_thread_log.get();
return shared->system_logs->query_thread_log.get();
}

@ -1573,16 +1580,16 @@ PartLog * Context::getPartLog(const String & part_database)
auto lock = getLock();

/// System logs are shutting down.
if (!system_logs || !system_logs->part_log)
if (!shared->system_logs || !shared->system_logs->part_log)
return nullptr;

/// Will not log operations on system tables (including part_log itself).
/// It doesn't make sense and not allow to destruct PartLog correctly due to infinite logging and flushing,
/// and also make troubles on startup.
if (part_database == system_logs->part_log_database)
if (part_database == shared->system_logs->part_log_database)
return nullptr;

return system_logs->part_log.get();
return shared->system_logs->part_log.get();
}

@ -1612,7 +1619,7 @@ const MergeTreeSettings & Context::getMergeTreeSettings() const
if (!shared->merge_tree_settings)
{
auto & config = getConfigRef();
shared->merge_tree_settings = std::make_unique<MergeTreeSettings>();
shared->merge_tree_settings.emplace();
shared->merge_tree_settings->loadFromConfig("merge_tree", config);
}

@ -1727,7 +1734,6 @@ void Context::reloadConfig() const

void Context::shutdown()
{
system_logs.reset();
shared->shutdown();
}

@ -113,8 +113,6 @@ private:
using Shared = std::shared_ptr<ContextShared>;
Shared shared;

std::shared_ptr<IRuntimeComponentsFactory> runtime_components_factory;

ClientInfo client_info;
ExternalTablesInitializer external_tables_initializer_callback;

@ -133,7 +131,6 @@ private:
Context * query_context = nullptr;
Context * session_context = nullptr; /// Session context or nullptr. Could be equal to this.
Context * global_context = nullptr; /// Global context or nullptr. Could be equal to this.
SystemLogsPtr system_logs; /// Used to log queries and operations on parts

UInt64 session_close_cycle = 0;
bool session_is_used = false;
@ -149,7 +146,7 @@ private:

public:
/// Create initial Context with ContextShared and etc.
static Context createGlobal(std::shared_ptr<IRuntimeComponentsFactory> runtime_components_factory);
static Context createGlobal(std::unique_ptr<IRuntimeComponentsFactory> runtime_components_factory);
static Context createGlobal();

Context(const Context &) = default;
@ -236,6 +233,8 @@ public:
void setCurrentDatabase(const String & name);
void setCurrentQueryId(const String & query_id);

void killCurrentQuery();

void setInsertionTable(std::pair<String, String> && db_and_table) { insertion_table = db_and_table; }
const std::pair<String, String> & getInsertionTable() const { return insertion_table; }

@ -5,6 +5,7 @@
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ASTIdentifier.h>
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ParserTablesInSelectQuery.h>
#include <Parsers/ExpressionListParsers.h>
@ -19,23 +20,112 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
}

/// TODO: array join aliases?
struct CheckColumnsVisitorData
/// It checks if where expression could be moved to JOIN ON expression partially or entirely.
class CheckExpressionVisitorData
{
using TypeToVisit = ASTIdentifier;
public:
using TypeToVisit = const ASTFunction;

const std::vector<DatabaseAndTableWithAlias> & tables;
size_t visited;
size_t found;
CheckExpressionVisitorData(const std::vector<DatabaseAndTableWithAlias> & tables_)
: tables(tables_)
, save_where(false)
, flat_ands(true)
{}

size_t allMatch() const { return visited == found; }

void visit(ASTIdentifier & node, ASTPtr &)
void visit(const ASTFunction & node, ASTPtr & ast)
{
++visited;
for (const auto & t : tables)
if (IdentifierSemantic::canReferColumnToTable(node, t))
++found;
if (node.name == "and")
{
if (!node.arguments || node.arguments->children.empty())
throw Exception("Logical error: function requires argiment", ErrorCodes::LOGICAL_ERROR);

for (auto & child : node.arguments->children)
{
if (auto func = typeid_cast<const ASTFunction *>(child.get()))
{
if (func->name == "and")
flat_ands = false;
visit(*func, child);
}
else
save_where = true;
}
}
else if (node.name == "equals")
{
if (checkEquals(node))
asts_to_join_on.push_back(ast);
else
save_where = true;
}
else
save_where = true;
}

bool matchAny() const { return !asts_to_join_on.empty(); }
bool matchAll() const { return matchAny() && !save_where; }
bool canReuseWhere() const { return matchAll() && flat_ands; }

ASTPtr makeOnExpression()
{
if (asts_to_join_on.size() == 1)
return asts_to_join_on[0]->clone();

std::vector<ASTPtr> arguments;
arguments.reserve(asts_to_join_on.size());
for (auto & ast : asts_to_join_on)
arguments.emplace_back(ast->clone());

return makeASTFunction("and", std::move(arguments));
}

private:
const std::vector<DatabaseAndTableWithAlias> & tables;
std::vector<ASTPtr> asts_to_join_on;
bool save_where;
bool flat_ands;

bool checkEquals(const ASTFunction & node)
{
if (!node.arguments)
throw Exception("Logical error: function requires argiment", ErrorCodes::LOGICAL_ERROR);
if (node.arguments->children.size() != 2)
return false;

auto left = typeid_cast<const ASTIdentifier *>(node.arguments->children[0].get());
auto right = typeid_cast<const ASTIdentifier *>(node.arguments->children[1].get());
if (!left || !right)
return false;

return checkIdentifiers(*left, *right);
}

/// Check if the identifiers are from different joined tables. If it's a self joint, tables should have aliases.
/// select * from t1 a cross join t2 b where a.x = b.x
bool checkIdentifiers(const ASTIdentifier & left, const ASTIdentifier & right)
{
/// {best_match, berst_table_pos}
std::pair<size_t, size_t> left_best{0, 0};
std::pair<size_t, size_t> right_best{0, 0};

for (size_t i = 0; i < tables.size(); ++i)
{
size_t match = IdentifierSemantic::canReferColumnToTable(left, tables[i]);
if (match > left_best.first)
{
left_best.first = match;
left_best.second = i;
}

match = IdentifierSemantic::canReferColumnToTable(right, tables[i]);
if (match > right_best.first)
{
right_best.first = match;
right_best.second = i;
}
}

return left_best.first && right_best.first && (left_best.second != right_best.second);
}
};

@ -100,27 +190,33 @@ std::vector<ASTPtr *> CrossToInnerJoinMatcher::visit(ASTPtr & ast, Data & data)

void CrossToInnerJoinMatcher::visit(ASTSelectQuery & select, ASTPtr & ast, Data & data)
{
using CheckColumnsMatcher = OneTypeMatcher<CheckColumnsVisitorData>;
using CheckColumnsVisitor = InDepthNodeVisitor<CheckColumnsMatcher, true>;
using CheckExpressionMatcher = OneTypeMatcher<CheckExpressionVisitorData, false>;
using CheckExpressionVisitor = InDepthNodeVisitor<CheckExpressionMatcher, true>;

std::vector<DatabaseAndTableWithAlias> table_names;
ASTPtr ast_join = getCrossJoin(select, table_names);
if (!ast_join)
return;

/// check Identifier names from where expression
CheckColumnsVisitor::Data columns_data{table_names, 0, 0};
CheckColumnsVisitor(columns_data).visit(select.where_expression);
CheckExpressionVisitor::Data visitor_data{table_names};
CheckExpressionVisitor(visitor_data).visit(select.where_expression);

if (!columns_data.allMatch())
return;
if (visitor_data.matchAny())
{
auto & join = typeid_cast<ASTTableJoin &>(*ast_join);
join.kind = ASTTableJoin::Kind::Inner;
join.strictness = ASTTableJoin::Strictness::All;

auto & join = typeid_cast<ASTTableJoin &>(*ast_join);
join.kind = ASTTableJoin::Kind::Inner;
join.strictness = ASTTableJoin::Strictness::All; /// TODO: do we need it?
if (visitor_data.canReuseWhere())
join.on_expression.swap(select.where_expression);
else
join.on_expression = visitor_data.makeOnExpression();

join.on_expression.swap(select.where_expression);
join.children.push_back(join.on_expression);
if (visitor_data.matchAll())
select.where_expression.reset();

join.children.push_back(join.on_expression);
}

ast = ast->clone(); /// rewrite AST in right manner
data.done = true;
@ -53,7 +53,7 @@ private:
};

/// Simple matcher for one node type without complex traversal logic.
template <typename _Data>
template <typename _Data, bool _visit_children = true>
class OneTypeMatcher
{
public:
@ -62,7 +62,7 @@ public:

static constexpr const char * label = "";

static bool needChildVisit(ASTPtr &, const ASTPtr &) { return true; }
static bool needChildVisit(ASTPtr &, const ASTPtr &) { return _visit_children; }

static std::vector<ASTPtr *> visit(ASTPtr & ast, Data & data)
{
@ -105,7 +105,9 @@ BlockIO InterpreterCreateQuery::createDatabase(ASTCreateQuery & create)
const ASTStorage & storage = *create.storage;
const ASTFunction & engine = *storage.engine;
/// Currently, there are no database engines, that support any arguments.
if (engine.arguments || engine.parameters || storage.partition_by || storage.primary_key || storage.order_by || storage.sample_by || storage.settings)
if (engine.arguments || engine.parameters || storage.partition_by || storage.primary_key
|| storage.order_by || storage.sample_by || storage.settings ||
(create.columns_list && create.columns_list->indices && !create.columns_list->indices->children.empty()))
{
std::stringstream ostr;
formatAST(storage, ostr, false, false);
@ -397,6 +399,16 @@ ASTPtr InterpreterCreateQuery::formatColumns(const ColumnsDescription & columns)
return columns_list;
}

ASTPtr InterpreterCreateQuery::formatIndices(const IndicesDescription & indices)
{
auto res = std::make_shared<ASTExpressionList>();

for (const auto & index : indices.indices)
res->children.push_back(index->clone());

return res;
}

ColumnsDescription InterpreterCreateQuery::getColumnsDescription(const ASTExpressionList & columns, const Context & context)
{
ColumnsDescription res;
@ -449,9 +461,9 @@ ColumnsDescription InterpreterCreateQuery::setColumns(
{
ColumnsDescription res;

if (create.columns)
if (create.columns_list && create.columns_list->columns)
{
res = getColumnsDescription(*create.columns, context);
res = getColumnsDescription(*create.columns_list->columns, context);
}
else if (!create.as_table.empty())
{
@ -467,10 +479,16 @@ ColumnsDescription InterpreterCreateQuery::setColumns(

/// Even if query has list of columns, canonicalize it (unfold Nested columns).
ASTPtr new_columns = formatColumns(res);
if (create.columns)
create.replace(create.columns, new_columns);
if (!create.columns_list)
{
auto new_columns_list = std::make_shared<ASTColumns>();
create.set(create.columns_list, new_columns_list);
}

if (create.columns_list->columns)
create.columns_list->replace(create.columns_list->columns, new_columns);
else
create.set(create.columns, new_columns);
create.columns_list->set(create.columns_list->columns, new_columns);

/// Check for duplicates
std::set<String> all_columns;
@ -550,7 +568,7 @@ BlockIO InterpreterCreateQuery::createTable(ASTCreateQuery & create)
String table_name_escaped = escapeForFileName(table_name);

// If this is a stub ATTACH query, read the query definition from the database
if (create.attach && !create.storage && !create.columns)
if (create.attach && !create.storage && !create.columns_list)
{
// Table SQL definition is available even if the table is detached
auto query = context.getCreateTableQuery(database_name, table_name);
@ -569,7 +587,7 @@ BlockIO InterpreterCreateQuery::createTable(ASTCreateQuery & create)
}

Block as_select_sample;
if (create.select && (!create.attach || !create.columns))
if (create.select && (!create.attach || !create.columns_list))
as_select_sample = InterpreterSelectWithUnionQuery::getSampleBlock(create.select->clone(), context);

String as_database_name = create.as_database.empty() ? current_database : create.as_database;
@ -2,6 +2,7 @@

#include <Interpreters/IInterpreter.h>
#include <Storages/ColumnsDescription.h>
#include <Storages/IndicesDescription.h>
#include <Common/ThreadPool.h>

@ -29,6 +30,8 @@ public:
static ASTPtr formatColumns(const NamesAndTypesList & columns);
static ASTPtr formatColumns(const ColumnsDescription & columns);

static ASTPtr formatIndices(const IndicesDescription & indices);

void setDatabaseLoadingThreadpool(ThreadPool & thread_pool_)
{
thread_pool = &thread_pool_;
@ -26,9 +26,6 @@ namespace ErrorCodes
extern const int CANNOT_KILL;
}

using CancellationCode = ProcessList::CancellationCode;

static const char * cancellationCodeToStatus(CancellationCode code)
{
switch (code)
@ -252,7 +252,7 @@ StoragePtr InterpreterSystemQuery::tryRestartReplica(const String & database_nam
create.attach = true;

std::string data_path = database->getDataPath();
auto columns = InterpreterCreateQuery::getColumnsDescription(*create.columns, system_context);
auto columns = InterpreterCreateQuery::getColumnsDescription(*create.columns_list->columns, system_context);

StoragePtr table = StorageFactory::instance().get(create,
data_path,
@ -469,9 +469,15 @@ bool Join::insertFromBlock(const Block & block)
}
else
{
NameSet erased; /// HOTFIX: there could be duplicates in JOIN ON section

/// Remove the key columns from stored_block, as they are not needed.
for (const auto & name : key_names_right)
stored_block->erase(stored_block->getPositionByName(name));
{
if (!erased.count(name))
stored_block->erase(stored_block->getPositionByName(name));
erased.insert(name);
}
}

size_t size = stored_block->columns();
@ -325,6 +325,29 @@ bool QueryStatus::tryGetQueryStreams(BlockInputStreamPtr & in, BlockOutputStream
return true;
}

CancellationCode QueryStatus::cancelQuery(bool kill)
{
/// Streams are destroyed, and ProcessListElement will be deleted from ProcessList soon. We need wait a little bit
if (streamsAreReleased())
return CancellationCode::CancelSent;

BlockInputStreamPtr input_stream;
BlockOutputStreamPtr output_stream;

if (tryGetQueryStreams(input_stream, output_stream))
{
if (input_stream)
{
input_stream->cancel(kill);
return CancellationCode::CancelSent;
}
return CancellationCode::CancelCannotBeSent;
}
/// Query is not even started
is_killed.store(true);
return CancellationCode::CancelSent;
}

void QueryStatus::setUserProcessList(ProcessListForUser * user_process_list_)
{
@ -356,7 +379,7 @@ QueryStatus * ProcessList::tryGetProcessListElement(const String & current_query
}

ProcessList::CancellationCode ProcessList::sendCancelToQuery(const String & current_query_id, const String & current_user, bool kill)
CancellationCode ProcessList::sendCancelToQuery(const String & current_query_id, const String & current_user, bool kill)
{
std::lock_guard lock(mutex);

@ -365,25 +388,7 @@ ProcessList::CancellationCode ProcessList::sendCancelToQuery(const String & curr
if (!elem)
return CancellationCode::NotFound;

/// Streams are destroyed, and ProcessListElement will be deleted from ProcessList soon. We need wait a little bit
if (elem->streamsAreReleased())
return CancellationCode::CancelSent;

BlockInputStreamPtr input_stream;
BlockOutputStreamPtr output_stream;

if (elem->tryGetQueryStreams(input_stream, output_stream))
{
if (input_stream)
{
input_stream->cancel(kill);
return CancellationCode::CancelSent;
}
return CancellationCode::CancelCannotBeSent;
}
/// Query is not even started
elem->is_killed.store(true);
return CancellationCode::CancelSent;
return elem->cancelQuery(kill);
}

@ -70,6 +70,14 @@ struct QueryStatusInfo
std::shared_ptr<Settings> query_settings;
};

enum class CancellationCode
{
NotFound = 0, /// already cancelled
QueryIsNotInitializedYet = 1,
CancelCannotBeSent = 2,
CancelSent = 3,
Unknown
};

/// Query and information about its execution.
class QueryStatus
@ -192,6 +200,8 @@ public:
/// Get query in/out pointers from BlockIO
bool tryGetQueryStreams(BlockInputStreamPtr & in, BlockOutputStreamPtr & out) const;

CancellationCode cancelQuery(bool kill);

bool isKilled() const { return is_killed; }
};

@ -312,15 +322,6 @@ public:
max_size = max_size_;
}

enum class CancellationCode
{
NotFound = 0, /// already cancelled
QueryIsNotInitializedYet = 1,
CancelCannotBeSent = 2,
CancelSent = 3,
Unknown
};

/// Try call cancel() for input and output streams of query with specified id and user
CancellationCode sendCancelToQuery(const String & current_query_id, const String & current_user, bool kill = false);
};
@ -299,6 +299,7 @@ struct Settings
M(SettingBool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.") \
M(SettingBool, allow_experimental_multiple_joins_emulation, false, "Emulate multiple joins using subselects") \
M(SettingBool, allow_experimental_cross_to_join_conversion, false, "Convert CROSS JOIN to INNER JOIN if possible") \
M(SettingBool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.") \

#define DECLARE(TYPE, NAME, DEFAULT, DESCRIPTION) \
TYPE NAME {DEFAULT};

@ -358,7 +358,10 @@ void SystemLog<LogElement>::prepareTable()
create->table = table_name;

Block sample = LogElement::createBlock();
create->set(create->columns, InterpreterCreateQuery::formatColumns(sample.getNamesAndTypesList()));

auto new_columns_list = std::make_shared<ASTColumns>();
new_columns_list->set(new_columns_list->columns, InterpreterCreateQuery::formatColumns(sample.getNamesAndTypesList()));
create->set(create->columns_list, new_columns_list);

ParserStorage storage_parser;
ASTPtr storage_ast = parseQuery(
@ -8,6 +8,8 @@

/// Implement some methods of ThreadStatus and CurrentThread here to avoid extra linking dependencies in clickhouse_common_io
/// TODO It doesn't make sense.

namespace DB
{

@ -17,21 +19,20 @@ void ThreadStatus::attachQueryContext(Context & query_context_)
if (!global_context)
global_context = &query_context->getGlobalContext();

if (!thread_group)
return;
query_id = query_context->getCurrentQueryId();

std::unique_lock lock(thread_group->mutex);
thread_group->query_context = query_context;
if (!thread_group->global_context)
thread_group->global_context = global_context;
if (thread_group)
{
std::unique_lock lock(thread_group->mutex);
thread_group->query_context = query_context;
if (!thread_group->global_context)
thread_group->global_context = global_context;
}
}

String ThreadStatus::getQueryID()
const std::string & ThreadStatus::getQueryId() const
{
if (query_context)
return query_context->getClientInfo().current_query_id;

return {};
return query_id;
}

void CurrentThread::defaultThreadDeleter()
@ -208,11 +209,9 @@ void CurrentThread::attachToIfDetached(const ThreadGroupStatusPtr & thread_group
get().deleter = CurrentThread::defaultThreadDeleter;
}

std::string CurrentThread::getCurrentQueryID()
const std::string & CurrentThread::getQueryId()
{
if (!current_thread)
return {};
return get().getQueryID();
return get().getQueryId();
}

void CurrentThread::attachQueryContext(Context & query_context)
@ -192,7 +192,7 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
if (!internal)
logQuery(query.substr(0, settings.log_queries_cut_to_length), context);

if (settings.allow_experimental_multiple_joins_emulation)
if (!internal && settings.allow_experimental_multiple_joins_emulation)
{
JoinToSubqueryTransformVisitor::Data join_to_subs_data;
JoinToSubqueryTransformVisitor(join_to_subs_data).visit(ast);
@ -200,7 +200,7 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
logQuery(queryToString(*ast), context);
}

if (settings.allow_experimental_cross_to_join_conversion)
if (!internal && settings.allow_experimental_cross_to_join_conversion)
{
CrossToInnerJoinVisitor::Data cross_to_inner;
CrossToInnerJoinVisitor(cross_to_inner).visit(ast);
@ -82,6 +82,24 @@ void ASTAlterCommand::formatImpl(
settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << "MODIFY ORDER BY " << (settings.hilite ? hilite_none : "");
order_by->formatImpl(settings, state, frame);
}
else if (type == ASTAlterCommand::ADD_INDEX)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << "ADD INDEX " << (if_not_exists ? "IF NOT EXISTS " : "") << (settings.hilite ? hilite_none : "");
index_decl->formatImpl(settings, state, frame);

/// AFTER
if (index)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << " AFTER " << (settings.hilite ? hilite_none : "");
index->formatImpl(settings, state, frame);
}
}
else if (type == ASTAlterCommand::DROP_INDEX)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str
<< "DROP INDEX " << (if_exists ? "IF EXISTS " : "") << (settings.hilite ? hilite_none : "");
index->formatImpl(settings, state, frame);
}
else if (type == ASTAlterCommand::DROP_PARTITION)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << (detach ? "DETACH" : "DROP") << " PARTITION "
@ -28,6 +28,9 @@ public:
COMMENT_COLUMN,
MODIFY_ORDER_BY,

ADD_INDEX,
DROP_INDEX,

DROP_PARTITION,
ATTACH_PARTITION,
REPLACE_PARTITION,
@ -58,6 +61,15 @@ public:
*/
ASTPtr order_by;

/** The ADD INDEX query stores the IndexDeclaration there.
*/
ASTPtr index_decl;

/** The ADD INDEX query stores the name of the index following AFTER.
* The DROP INDEX query stores the name for deletion.
*/
ASTPtr index;

/** Used in DROP PARTITION and ATTACH PARTITION FROM queries.
* The value or ID of the partition is stored here.
*/
@ -38,6 +38,7 @@ public:
res->set(res->order_by, order_by->clone());
if (sample_by)
res->set(res->sample_by, sample_by->clone());

if (settings)
res->set(res->settings, settings->clone());

@ -81,6 +82,95 @@ public:
};


class ASTColumns : public IAST
{
private:
class ASTColumnsElement : public IAST
{
public:
String prefix;
IAST * elem;

String getID(char c) const override { return "ASTColumnsElement for " + elem->getID(c); }

ASTPtr clone() const override
{
auto res = std::make_shared<ASTColumnsElement>();
res->prefix = prefix;
if (elem)
res->set(res->elem, elem->clone());
return res;
}

void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
{
if (!elem)
return;

if (prefix.empty())
{
elem->formatImpl(s, state, frame);
return;
}

frame.need_parens = false;
std::string indent_str = s.one_line ? "" : std::string(4 * frame.indent, ' ');

s.ostr << s.nl_or_ws << indent_str;
s.ostr << (s.hilite ? hilite_keyword : "") << prefix << (s.hilite ? hilite_none : "");

FormatSettings nested_settings = s;
nested_settings.one_line = true;
nested_settings.nl_or_ws = ' ';

elem->formatImpl(nested_settings, state, frame);
}
};
public:
ASTExpressionList * columns = nullptr;
ASTExpressionList * indices = nullptr;

String getID(char) const override { return "Columns definition"; }

ASTPtr clone() const override
{
auto res = std::make_shared<ASTColumns>();

if (columns)
res->set(res->columns, columns->clone());
if (indices)
res->set(res->indices, indices->clone());

return res;
}

void formatImpl(const FormatSettings & s, FormatState & state, FormatStateStacked frame) const override
{
ASTExpressionList list;

if (columns)
for (const auto & column : columns->children)
{
auto elem = std::make_shared<ASTColumnsElement>();
elem->prefix = "";
elem->set(elem->elem, column->clone());
list.children.push_back(elem);
}
if (indices)
for (const auto & index : indices->children)
{
auto elem = std::make_shared<ASTColumnsElement>();
elem->prefix = "INDEX";
elem->set(elem->elem, index->clone());
list.children.push_back(elem);
}

if (!list.children.empty())
list.formatImpl(s, state, frame);
}
};


/// CREATE TABLE or ATTACH TABLE query
class ASTCreateQuery : public ASTQueryWithTableAndOutput, public ASTQueryWithOnCluster
{
@ -90,7 +180,7 @@ public:
bool is_view{false};
bool is_materialized_view{false};
bool is_populate{false};
ASTExpressionList * columns = nullptr;
ASTColumns * columns_list = nullptr;
String to_database; /// For CREATE MATERIALIZED VIEW mv TO table.
String to_table;
ASTStorage * storage = nullptr;
@ -106,8 +196,8 @@ public:
auto res = std::make_shared<ASTCreateQuery>(*this);
res->children.clear();

if (columns)
res->set(res->columns, columns->clone());
if (columns_list)
res->set(res->columns_list, columns_list->clone());
if (storage)
res->set(res->storage, storage->clone());
if (select)
@ -175,12 +265,12 @@ protected:
<< (!as_database.empty() ? backQuoteIfNeed(as_database) + "." : "") << backQuoteIfNeed(as_table);
}

if (columns)
if (columns_list)
{
settings.ostr << (settings.one_line ? " (" : "\n(");
FormatStateStacked frame_nested = frame;
++frame_nested.indent;
columns->formatImpl(settings, state, frame_nested);
columns_list->formatImpl(settings, state, frame_nested);
settings.ostr << (settings.one_line ? ")" : "\n)");
}

59
dbms/src/Parsers/ASTIndexDeclaration.h
Normal file
@ -0,0 +1,59 @@
|
||||
#pragma once
|
||||
|
||||
#include <Core/Field.h>
|
||||
#include <Core/Types.h>
|
||||
#include <Common/FieldVisitors.h>
|
||||
#include <Parsers/ASTExpressionList.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/IAST.h>
|
||||
|
||||
#include <vector>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/** name expr TYPE typename(args) GRANULARITY int in create query
|
||||
*/
|
||||
class ASTIndexDeclaration : public IAST
|
||||
{
|
||||
public:
|
||||
String name;
|
||||
IAST * expr;
|
||||
ASTFunction * type;
|
||||
Field granularity;
|
||||
|
||||
/** Get the text that identifies this element. */
|
||||
String getID(char) const override { return "Index"; }
|
||||
|
||||
ASTPtr clone() const override
|
||||
{
|
||||
auto res = std::make_shared<ASTIndexDeclaration>();
|
||||
|
||||
res->name = name;
|
||||
res->granularity = granularity;
|
||||
|
||||
if (expr)
|
||||
res->set(res->expr, expr->clone());
|
||||
if (type)
|
||||
res->set(res->type, type->clone());
|
||||
return res;
|
||||
}
|
||||
|
||||
void formatImpl(const FormatSettings & s, FormatState &state, FormatStateStacked frame) const override
|
||||
{
|
||||
frame.need_parens = false;
|
||||
std::string indent_str = s.one_line ? "" : std::string(4 * frame.indent, ' ');
|
||||
|
||||
s.ostr << s.nl_or_ws << indent_str;
|
||||
s.ostr << backQuoteIfNeed(name);
|
||||
s.ostr << " ";
|
||||
expr->formatImpl(s, state, frame);
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << " TYPE " << (s.hilite ? hilite_none : "");
|
||||
type->formatImpl(s, state, frame);
|
||||
s.ostr << (s.hilite ? hilite_keyword : "") << " GRANULARITY " << (s.hilite ? hilite_none : "");
|
||||
s.ostr << applyVisitor(FieldVisitorToString(), granularity);
|
||||
}
|
||||
};
|
||||
|
||||
}
|
@ -6,6 +6,7 @@
|
||||
#include <Parsers/ParserCreateQuery.h>
|
||||
#include <Parsers/ParserPartition.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Parsers/ASTIndexDeclaration.h>
|
||||
#include <Parsers/ASTAlterQuery.h>
|
||||
#include <Parsers/ASTLiteral.h>
|
||||
#include <Parsers/ASTAssignment.h>
|
||||
@ -27,6 +28,9 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
|
||||
ParserKeyword s_comment_column("COMMENT COLUMN");
|
||||
ParserKeyword s_modify_order_by("MODIFY ORDER BY");
|
||||
|
||||
ParserKeyword s_add_index("ADD INDEX");
|
||||
ParserKeyword s_drop_index("DROP INDEX");
|
||||
|
||||
ParserKeyword s_attach_partition("ATTACH PARTITION");
|
||||
ParserKeyword s_detach_partition("DETACH PARTITION");
|
||||
ParserKeyword s_drop_partition("DROP PARTITION");
|
||||
@ -51,6 +55,7 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
|
||||
ParserCompoundIdentifier parser_name;
|
||||
ParserStringLiteral parser_string_literal;
|
||||
ParserCompoundColumnDeclaration parser_col_decl;
|
||||
ParserIndexDeclaration parser_idx_decl;
|
||||
ParserCompoundColumnDeclaration parser_modify_col_decl(false);
|
||||
ParserPartition parser_partition;
|
||||
ParserExpression parser_exp_elem;
|
||||
@ -92,6 +97,33 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
|
||||
command->type = ASTAlterCommand::DROP_COLUMN;
|
||||
command->detach = false;
|
||||
}
|
||||
else if (s_add_index.ignore(pos, expected))
|
||||
{
|
||||
if (s_if_not_exists.ignore(pos, expected))
|
||||
command->if_not_exists = true;
|
||||
|
||||
if (!parser_idx_decl.parse(pos, command->index_decl, expected))
|
||||
return false;
|
||||
|
||||
if (s_after.ignore(pos, expected))
|
||||
{
|
||||
if (!parser_name.parse(pos, command->index, expected))
|
||||
return false;
|
||||
}
|
||||
|
||||
command->type = ASTAlterCommand::ADD_INDEX;
|
||||
}
|
||||
else if (s_drop_index.ignore(pos, expected))
|
||||
{
|
||||
if (s_if_exists.ignore(pos, expected))
|
||||
command->if_exists = true;
|
||||
|
||||
if (!parser_name.parse(pos, command->index, expected))
|
||||
return false;
|
||||
|
||||
command->type = ASTAlterCommand::DROP_INDEX;
|
||||
command->detach = false;
|
||||
}
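/// Editorial example (not part of the original commit; table, index and column names are hypothetical):
/// the ADD INDEX and DROP INDEX branches above accept statements such as
///     ALTER TABLE hits ADD INDEX idx_len length(URL) TYPE minmax GRANULARITY 4 AFTER idx_date
///     ALTER TABLE hits DROP INDEX idx_len
/// where the index declaration itself (name, expression, TYPE and GRANULARITY) is parsed by ParserIndexDeclaration.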
|
||||
else if (s_clear_column.ignore(pos, expected))
|
||||
{
|
||||
if (s_if_exists.ignore(pos, expected))
|
||||
|
@ -1,5 +1,7 @@
|
||||
#include <Common/typeid_cast.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Parsers/ASTIndexDeclaration.h>
|
||||
#include <Parsers/ASTExpressionList.h>
|
||||
#include <Parsers/ASTCreateQuery.h>
|
||||
#include <Parsers/ExpressionListParsers.h>
|
||||
@ -90,6 +92,113 @@ bool ParserColumnDeclarationList::parseImpl(Pos & pos, ASTPtr & node, Expected &
|
||||
.parse(pos, node, expected);
|
||||
}
|
||||
|
||||
bool ParserIndexDeclaration::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
{
|
||||
ParserKeyword s_type("TYPE");
|
||||
ParserKeyword s_granularity("GRANULARITY");
|
||||
|
||||
ParserIdentifier name_p;
|
||||
ParserIdentifierWithOptionalParameters ident_with_optional_params_p;
|
||||
ParserExpression expression_p;
|
||||
ParserUnsignedInteger granularity_p;
|
||||
|
||||
ASTPtr name;
|
||||
ASTPtr expr;
|
||||
ASTPtr type;
|
||||
ASTPtr granularity;
|
||||
|
||||
if (!name_p.parse(pos, name, expected))
|
||||
return false;
|
||||
|
||||
if (!expression_p.parse(pos, expr, expected))
|
||||
return false;
|
||||
|
||||
if (!s_type.ignore(pos, expected))
|
||||
return false;
|
||||
|
||||
if (!ident_with_optional_params_p.parse(pos, type, expected))
|
||||
return false;
|
||||
|
||||
if (!s_granularity.ignore(pos, expected))
|
||||
return false;
|
||||
|
||||
if (!granularity_p.parse(pos, granularity, expected))
|
||||
return false;
|
||||
|
||||
auto index = std::make_shared<ASTIndexDeclaration>();
|
||||
index->name = typeid_cast<const ASTIdentifier &>(*name).name;
|
||||
index->granularity = typeid_cast<const ASTLiteral &>(*granularity).value;
|
||||
index->set(index->expr, expr);
|
||||
index->set(index->type, type);
|
||||
node = index;
|
||||
|
||||
return true;
|
||||
}
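/// Editorial sketch (not from the original commit): with the grammar implemented above, a declaration such as
///     idx_ab (a, b * 3) TYPE minmax GRANULARITY 1
/// yields an ASTIndexDeclaration with name = "idx_ab", expr = (a, b * 3), type = minmax and granularity = 1;
/// the identifier and expression here are hypothetical.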
|
||||
|
||||
|
||||
bool ParserColumnAndIndexDeclaration::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
{
|
||||
ParserKeyword s_index("INDEX");
|
||||
|
||||
ParserIndexDeclaration index_p;
|
||||
ParserColumnDeclaration column_p;
|
||||
|
||||
ASTPtr new_node = nullptr;
|
||||
|
||||
if (s_index.ignore(pos, expected))
|
||||
{
|
||||
if (!index_p.parse(pos, new_node, expected))
|
||||
return false;
|
||||
}
|
||||
else
|
||||
{
|
||||
if (!column_p.parse(pos, new_node, expected))
|
||||
return false;
|
||||
}
|
||||
|
||||
node = new_node;
|
||||
return true;
|
||||
}
|
||||
|
||||
bool ParserIndexDeclarationList::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
{
|
||||
return ParserList(std::make_unique<ParserIndexDeclaration>(), std::make_unique<ParserToken>(TokenType::Comma), false)
|
||||
.parse(pos, node, expected);
|
||||
}
|
||||
|
||||
|
||||
bool ParserColumnsOrIndicesDeclarationList::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
{
|
||||
ASTPtr list;
|
||||
if (!ParserList(std::make_unique<ParserColumnAndIndexDeclaration>(), std::make_unique<ParserToken>(TokenType::Comma), false)
|
||||
.parse(pos, list, expected))
|
||||
return false;
|
||||
|
||||
ASTPtr columns = std::make_shared<ASTExpressionList>();
|
||||
ASTPtr indices = std::make_shared<ASTExpressionList>();
|
||||
|
||||
for (const auto & elem : list->children)
|
||||
{
|
||||
if (typeid_cast<const ASTColumnDeclaration *>(elem.get()))
|
||||
columns->children.push_back(elem);
|
||||
else if (typeid_cast<const ASTIndexDeclaration *>(elem.get()))
|
||||
indices->children.push_back(elem);
|
||||
else
|
||||
return false;
|
||||
}
|
||||
|
||||
auto res = std::make_shared<ASTColumns>();
|
||||
|
||||
if (!columns->children.empty())
|
||||
res->set(res->columns, columns);
|
||||
if (!indices->children.empty())
|
||||
res->set(res->indices, indices);
|
||||
|
||||
node = res;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
bool ParserStorage::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
{
|
||||
@ -169,6 +278,7 @@ bool ParserStorage::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
storage->set(storage->primary_key, primary_key);
|
||||
storage->set(storage->order_by, order_by);
|
||||
storage->set(storage->sample_by, sample_by);
|
||||
|
||||
storage->set(storage->settings, settings);
|
||||
|
||||
node = storage;
|
||||
@ -193,12 +303,12 @@ bool ParserCreateQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
ParserToken s_rparen(TokenType::ClosingRoundBracket);
|
||||
ParserStorage storage_p;
|
||||
ParserIdentifier name_p;
|
||||
ParserColumnDeclarationList columns_p;
|
||||
ParserColumnsOrIndicesDeclarationList columns_or_indices_p;
|
||||
ParserSelectWithUnionQuery select_p;
|
||||
|
||||
ASTPtr database;
|
||||
ASTPtr table;
|
||||
ASTPtr columns;
|
||||
ASTPtr columns_list;
|
||||
ASTPtr to_database;
|
||||
ASTPtr to_table;
|
||||
ASTPtr storage;
|
||||
@ -266,7 +376,7 @@ bool ParserCreateQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
/// List of columns.
|
||||
if (s_lparen.ignore(pos, expected))
|
||||
{
|
||||
if (!columns_p.parse(pos, columns, expected))
|
||||
if (!columns_or_indices_p.parse(pos, columns_list, expected))
|
||||
return false;
|
||||
|
||||
if (!s_rparen.ignore(pos, expected))
|
||||
@ -368,7 +478,7 @@ bool ParserCreateQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
/// Optional - a list of columns can be specified. It must fully comply with SELECT.
|
||||
if (s_lparen.ignore(pos, expected))
|
||||
{
|
||||
if (!columns_p.parse(pos, columns, expected))
|
||||
if (!columns_or_indices_p.parse(pos, columns_list, expected))
|
||||
return false;
|
||||
|
||||
if (!s_rparen.ignore(pos, expected))
|
||||
@ -410,7 +520,7 @@ bool ParserCreateQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
|
||||
getIdentifierName(to_database, query->to_database);
|
||||
getIdentifierName(to_table, query->to_table);
|
||||
|
||||
query->set(query->columns, columns);
|
||||
query->set(query->columns_list, columns_list);
|
||||
query->set(query->storage, storage);
|
||||
|
||||
getIdentifierName(as_database, query->as_database);
|
||||
|
@ -218,7 +218,45 @@ protected:
|
||||
};
|
||||
|
||||
|
||||
/** ENGINE = name [PARTITION BY expr] [ORDER BY expr] [PRIMARY KEY expr] [SAMPLE BY expr] [SETTINGS name = value, ...] */
|
||||
/** name expr TYPE typename(arg1, arg2, ...) GRANULARITY value */
|
||||
class ParserIndexDeclaration : public IParserBase
|
||||
{
|
||||
public:
|
||||
ParserIndexDeclaration() {}
|
||||
|
||||
protected:
|
||||
const char * getName() const override { return "index declaration"; }
|
||||
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
|
||||
};
|
||||
|
||||
|
||||
class ParserColumnAndIndexDeclaration : public IParserBase
|
||||
{
|
||||
protected:
|
||||
const char * getName() const override { return "column or index declaration"; }
|
||||
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
|
||||
};
|
||||
|
||||
|
||||
class ParserIndexDeclarationList : public IParserBase
|
||||
{
|
||||
protected:
|
||||
const char * getName() const override { return "index declaration list"; }
|
||||
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
|
||||
};
|
||||
|
||||
|
||||
class ParserColumnsOrIndicesDeclarationList : public IParserBase
|
||||
{
|
||||
protected:
|
||||
const char * getName() const override { return "columns or indices declaration list"; }
|
||||
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
|
||||
};
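/// Editorial example (not part of the original commit; all names are hypothetical): this parser accepts the body
/// of a CREATE TABLE column list in which column and index declarations are mixed, e.g.
///     id UInt64, s String, INDEX idx_s_len length(s) TYPE minmax GRANULARITY 4
/// and splits it into the separate `columns` and `indices` expression lists of ASTColumns.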
|
||||
|
||||
|
||||
/**
|
||||
* ENGINE = name [PARTITION BY expr] [ORDER BY expr] [PRIMARY KEY expr] [SAMPLE BY expr] [SETTINGS name = value, ...]
|
||||
*/
|
||||
class ParserStorage : public IParserBase
|
||||
{
|
||||
protected:
|
||||
@ -233,6 +271,8 @@ protected:
|
||||
* name1 type1,
|
||||
* name2 type2,
|
||||
* ...
|
||||
* INDEX name1 expr TYPE type1(args) GRANULARITY value,
|
||||
* ...
|
||||
* ) ENGINE = engine
|
||||
*
|
||||
* Or:
|
||||
|
@ -8,6 +8,7 @@
|
||||
#include <Interpreters/ExpressionAnalyzer.h>
|
||||
#include <Interpreters/ExpressionActions.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Parsers/ASTIndexDeclaration.h>
|
||||
#include <Parsers/ASTExpressionList.h>
|
||||
#include <Parsers/ASTLiteral.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
@ -120,6 +121,35 @@ std::optional<AlterCommand> AlterCommand::parse(const ASTAlterCommand * command_
|
||||
command.order_by = command_ast->order_by;
|
||||
return command;
|
||||
}
|
||||
else if (command_ast->type == ASTAlterCommand::ADD_INDEX)
|
||||
{
|
||||
AlterCommand command;
|
||||
command.index_decl = command_ast->index_decl;
|
||||
command.type = AlterCommand::ADD_INDEX;
|
||||
|
||||
const auto & ast_index_decl = typeid_cast<const ASTIndexDeclaration &>(*command_ast->index_decl);
|
||||
|
||||
command.index_name = ast_index_decl.name;
|
||||
|
||||
if (command_ast->index)
|
||||
command.after_index_name = typeid_cast<const ASTIdentifier &>(*command_ast->index).name;
|
||||
|
||||
command.if_not_exists = command_ast->if_not_exists;
|
||||
|
||||
return command;
|
||||
}
|
||||
else if (command_ast->type == ASTAlterCommand::DROP_INDEX)
|
||||
{
|
||||
if (command_ast->clear_column)
|
||||
throw Exception("\"ALTER TABLE table CLEAR COLUMN column\" queries are not supported yet. Use \"CLEAR COLUMN column IN PARTITION\".", ErrorCodes::NOT_IMPLEMENTED);
|
||||
|
||||
AlterCommand command;
|
||||
command.type = AlterCommand::DROP_INDEX;
|
||||
command.index_name = typeid_cast<const ASTIdentifier &>(*(command_ast->index)).name;
|
||||
command.if_exists = command_ast->if_exists;
|
||||
|
||||
return command;
|
||||
}
|
||||
else
|
||||
return {};
|
||||
}
|
||||
@ -132,7 +162,8 @@ static bool namesEqual(const String & name_without_dot, const DB::NameAndTypePai
|
||||
return (name_with_dot == name_type.name.substr(0, name_without_dot.length() + 1) || name_without_dot == name_type.name);
|
||||
}
|
||||
|
||||
void AlterCommand::apply(ColumnsDescription & columns_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const
|
||||
void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescription & indices_description,
|
||||
ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const
|
||||
{
|
||||
if (type == ADD_COLUMN)
|
||||
{
|
||||
@ -297,6 +328,60 @@ void AlterCommand::apply(ColumnsDescription & columns_description, ASTPtr & orde
|
||||
{
|
||||
columns_description.comments[column_name] = comment;
|
||||
}
|
||||
else if (type == ADD_INDEX)
|
||||
{
|
||||
if (std::any_of(
|
||||
indices_description.indices.cbegin(),
|
||||
indices_description.indices.cend(),
|
||||
[this](const ASTPtr & index_ast)
|
||||
{
|
||||
return typeid_cast<const ASTIndexDeclaration &>(*index_ast).name == index_name;
|
||||
}))
|
||||
{
|
||||
if (if_not_exists)
|
||||
return;
|
||||
else
|
||||
throw Exception{"Cannot add index " + index_name + ": index with this name already exists",
|
||||
ErrorCodes::ILLEGAL_COLUMN};
|
||||
}
|
||||
|
||||
auto insert_it = indices_description.indices.end();
|
||||
|
||||
if (!after_index_name.empty())
|
||||
{
|
||||
insert_it = std::find_if(
|
||||
indices_description.indices.begin(),
|
||||
indices_description.indices.end(),
|
||||
[this](const ASTPtr & index_ast)
|
||||
{
|
||||
return typeid_cast<const ASTIndexDeclaration &>(*index_ast).name == after_index_name;
|
||||
});
|
||||
|
||||
if (insert_it == indices_description.indices.end())
|
||||
throw Exception("Wrong index name. Cannot find index `" + after_index_name + "` to insert after.",
|
||||
ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
++insert_it;
|
||||
}
|
||||
|
||||
indices_description.indices.emplace(insert_it, std::dynamic_pointer_cast<ASTIndexDeclaration>(index_decl));
|
||||
}
|
||||
else if (type == DROP_INDEX)
|
||||
{
|
||||
auto erase_it = std::find_if(
|
||||
indices_description.indices.begin(),
|
||||
indices_description.indices.end(),
|
||||
[this](const ASTPtr & index_ast)
|
||||
{
|
||||
return typeid_cast<const ASTIndexDeclaration &>(*index_ast).name == index_name;
|
||||
});
|
||||
|
||||
if (erase_it == indices_description.indices.end())
|
||||
throw Exception("Wrong index name. Cannot find index `" + index_name + "` to drop.",
|
||||
ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
indices_description.indices.erase(erase_it);
|
||||
}
|
||||
else
|
||||
throw Exception("Wrong parameter type in ALTER query", ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
@ -311,17 +396,19 @@ bool AlterCommand::is_mutable() const
|
||||
return true;
|
||||
}
|
||||
|
||||
void AlterCommands::apply(ColumnsDescription & columns_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const
|
||||
void AlterCommands::apply(ColumnsDescription & columns_description, IndicesDescription & indices_description,
|
||||
ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const
|
||||
{
|
||||
auto new_columns_description = columns_description;
|
||||
auto new_indices_description = indices_description;
|
||||
auto new_order_by_ast = order_by_ast;
|
||||
auto new_primary_key_ast = primary_key_ast;
|
||||
|
||||
for (const AlterCommand & command : *this)
|
||||
if (!command.ignore)
|
||||
command.apply(new_columns_description, new_order_by_ast, new_primary_key_ast);
|
||||
|
||||
command.apply(new_columns_description, new_indices_description, new_order_by_ast, new_primary_key_ast);
|
||||
columns_description = std::move(new_columns_description);
|
||||
indices_description = std::move(new_indices_description);
|
||||
order_by_ast = std::move(new_order_by_ast);
|
||||
primary_key_ast = std::move(new_primary_key_ast);
|
||||
}
|
||||
@ -538,14 +625,17 @@ void AlterCommands::validate(const IStorage & table, const Context & context)
|
||||
void AlterCommands::apply(ColumnsDescription & columns_description) const
|
||||
{
|
||||
auto out_columns_description = columns_description;
|
||||
IndicesDescription indices_description;
|
||||
ASTPtr out_order_by;
|
||||
ASTPtr out_primary_key;
|
||||
apply(out_columns_description, out_order_by, out_primary_key);
|
||||
apply(out_columns_description, indices_description, out_order_by, out_primary_key);
|
||||
|
||||
if (out_order_by)
|
||||
throw Exception("Storage doesn't support modifying ORDER BY expression", ErrorCodes::NOT_IMPLEMENTED);
|
||||
if (out_primary_key)
|
||||
throw Exception("Storage doesn't support modifying PRIMARY KEY expression", ErrorCodes::NOT_IMPLEMENTED);
|
||||
if (!indices_description.indices.empty())
|
||||
throw Exception("Storage doesn't support modifying indices", ErrorCodes::NOT_IMPLEMENTED);
|
||||
|
||||
columns_description = std::move(out_columns_description);
|
||||
}
|
||||
|
@ -3,6 +3,7 @@
|
||||
#include <optional>
|
||||
#include <Core/NamesAndTypes.h>
|
||||
#include <Storages/ColumnsDescription.h>
|
||||
#include <Storages/IndicesDescription.h>
|
||||
#include <optional>
|
||||
|
||||
|
||||
@ -23,6 +24,8 @@ struct AlterCommand
|
||||
MODIFY_COLUMN,
|
||||
COMMENT_COLUMN,
|
||||
MODIFY_ORDER_BY,
|
||||
ADD_INDEX,
|
||||
DROP_INDEX,
|
||||
UKNOWN_TYPE,
|
||||
};
|
||||
|
||||
@ -52,6 +55,13 @@ struct AlterCommand
|
||||
/// For MODIFY_ORDER_BY
|
||||
ASTPtr order_by;
|
||||
|
||||
/// For ADD INDEX
|
||||
ASTPtr index_decl;
|
||||
String after_index_name;
|
||||
|
||||
/// For ADD/DROP INDEX
|
||||
String index_name;
|
||||
|
||||
/// indicates that this command should not be applied, for example in case of if_exists=true and column doesn't exist.
|
||||
bool ignore = false;
|
||||
|
||||
@ -70,7 +80,8 @@ struct AlterCommand
|
||||
|
||||
static std::optional<AlterCommand> parse(const ASTAlterCommand * command);
|
||||
|
||||
void apply(ColumnsDescription & columns_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const;
|
||||
void apply(ColumnsDescription & columns_description, IndicesDescription & indices_description,
|
||||
ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const;
|
||||
/// Checks that not only metadata touched by that command
|
||||
bool is_mutable() const;
|
||||
};
|
||||
@ -81,7 +92,8 @@ class Context;
|
||||
class AlterCommands : public std::vector<AlterCommand>
|
||||
{
|
||||
public:
|
||||
void apply(ColumnsDescription & columns_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const;
|
||||
void apply(ColumnsDescription & columns_description, IndicesDescription & indices_description, ASTPtr & order_by_ast,
|
||||
ASTPtr & primary_key_ast) const;
|
||||
|
||||
/// For storages that don't support MODIFY_ORDER_BY.
|
||||
void apply(ColumnsDescription & columns_description) const;
|
||||
|
@ -24,8 +24,9 @@ void IStorage::alter(const AlterCommands & params, const String & database_name,
|
||||
|
||||
auto lock = lockStructureForAlter();
|
||||
auto new_columns = getColumns();
|
||||
auto new_indices = getIndicesDescription();
|
||||
params.apply(new_columns);
|
||||
context.getDatabase(database_name)->alterTable(context, table_name, new_columns, {});
|
||||
context.getDatabase(database_name)->alterTable(context, table_name, new_columns, new_indices, {});
|
||||
setColumns(std::move(new_columns));
|
||||
}
|
||||
|
||||
|
@ -31,6 +31,11 @@ void ITableDeclaration::setColumns(ColumnsDescription columns_)
|
||||
columns = std::move(columns_);
|
||||
}
|
||||
|
||||
void ITableDeclaration::setIndicesDescription(IndicesDescription indices_)
|
||||
{
|
||||
indices = std::move(indices_);
|
||||
}
|
||||
|
||||
|
||||
bool ITableDeclaration::hasColumn(const String & column_name) const
|
||||
{
|
||||
|
@ -1,6 +1,7 @@
|
||||
#pragma once
|
||||
|
||||
#include <Storages/ColumnsDescription.h>
|
||||
#include <Storages/IndicesDescription.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
@ -15,6 +16,9 @@ public:
|
||||
virtual const ColumnsDescription & getColumns() const { return columns; }
|
||||
virtual void setColumns(ColumnsDescription columns_);
|
||||
|
||||
virtual const IndicesDescription & getIndicesDescription() const { return indices; }
|
||||
virtual void setIndicesDescription(IndicesDescription indices_);
|
||||
|
||||
/// NOTE: These methods should include virtual columns, but should NOT include ALIAS columns
|
||||
/// (they are treated separately).
|
||||
virtual NameAndTypePair getColumn(const String & column_name) const;
|
||||
@ -52,6 +56,7 @@ public:
|
||||
|
||||
private:
|
||||
ColumnsDescription columns;
|
||||
IndicesDescription indices;
|
||||
};
|
||||
|
||||
}
|
||||
|
38
dbms/src/Storages/IndicesDescription.cpp
Normal file
@ -0,0 +1,38 @@
|
||||
#include <Storages/IndicesDescription.h>
|
||||
|
||||
#include <Parsers/formatAST.h>
|
||||
#include <Parsers/ParserCreateQuery.h>
|
||||
#include <Parsers/parseQuery.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
String IndicesDescription::toString() const
|
||||
{
|
||||
if (indices.empty())
|
||||
return {};
|
||||
|
||||
ASTExpressionList list;
|
||||
for (const auto & index : indices)
|
||||
list.children.push_back(index);
|
||||
|
||||
return serializeAST(list, true);
|
||||
}
|
||||
|
||||
IndicesDescription IndicesDescription::parse(const String & str)
|
||||
{
|
||||
if (str.empty())
|
||||
return {};
|
||||
|
||||
IndicesDescription res;
|
||||
ParserIndexDeclarationList parser;
|
||||
ASTPtr list = parseQuery(parser, str, 0);
|
||||
|
||||
for (const auto & index : list->children)
|
||||
res.indices.push_back(std::dynamic_pointer_cast<ASTIndexDeclaration>(index));
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
}
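/// Editorial note (not from the original commit): toString() and parse() round-trip the indices through the same
/// comma-separated text that ParserIndexDeclarationList accepts, e.g. (hypothetical names)
///     idx_a a TYPE minmax GRANULARITY 3, idx_bc (b, c) TYPE minmax GRANULARITY 1
/// so the whole indices description can be stored and restored as a single string.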
|
22
dbms/src/Storages/IndicesDescription.h
Normal file
@ -0,0 +1,22 @@
#pragma once

#include <Parsers/ASTIndexDeclaration.h>


namespace DB
{

using IndicesAsts = std::vector<std::shared_ptr<ASTIndexDeclaration>>;

struct IndicesDescription
{
    IndicesAsts indices;

    IndicesDescription() = default;

    String toString() const;

    static IndicesDescription parse(const String & str);
};

}
|
@ -80,8 +80,6 @@ protected:
|
||||
void threadFunction();
|
||||
};
|
||||
|
||||
using BackgroundProcessingPoolPtr = std::shared_ptr<BackgroundProcessingPool>;
|
||||
|
||||
|
||||
class BackgroundProcessingPoolTaskInfo
|
||||
{
|
||||
|
@ -47,6 +47,7 @@
|
||||
|
||||
#include <algorithm>
|
||||
#include <iomanip>
|
||||
#include <set>
|
||||
#include <thread>
|
||||
#include <typeinfo>
|
||||
#include <typeindex>
|
||||
@ -88,6 +89,7 @@ namespace ErrorCodes
|
||||
MergeTreeData::MergeTreeData(
|
||||
const String & database_, const String & table_,
|
||||
const String & full_path_, const ColumnsDescription & columns_,
|
||||
const IndicesDescription & indices_,
|
||||
Context & context_,
|
||||
const String & date_column_name,
|
||||
const ASTPtr & partition_by_ast_,
|
||||
@ -113,7 +115,7 @@ MergeTreeData::MergeTreeData(
|
||||
data_parts_by_info(data_parts_indexes.get<TagByInfo>()),
|
||||
data_parts_by_state_and_info(data_parts_indexes.get<TagByStateAndInfo>())
|
||||
{
|
||||
setPrimaryKeyAndColumns(order_by_ast_, primary_key_ast_, columns_);
|
||||
setPrimaryKeyIndicesAndColumns(order_by_ast_, primary_key_ast_, columns_, indices_);
|
||||
|
||||
/// NOTE: using the same columns list as is read when performing actual merges.
|
||||
merging_params.check(getColumns().getAllPhysical());
|
||||
@ -219,8 +221,9 @@ static void checkKeyExpression(const ExpressionActions & expr, const Block & sam
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeData::setPrimaryKeyAndColumns(
|
||||
const ASTPtr & new_order_by_ast, ASTPtr new_primary_key_ast, const ColumnsDescription & new_columns, bool only_check)
|
||||
void MergeTreeData::setPrimaryKeyIndicesAndColumns(
|
||||
const ASTPtr &new_order_by_ast, ASTPtr new_primary_key_ast,
|
||||
const ColumnsDescription &new_columns, const IndicesDescription &indices_description, bool only_check)
|
||||
{
|
||||
if (!new_order_by_ast)
|
||||
throw Exception("ORDER BY cannot be empty", ErrorCodes::BAD_ARGUMENTS);
|
||||
@ -327,6 +330,50 @@ void MergeTreeData::setPrimaryKeyAndColumns(
|
||||
new_primary_key_data_types.push_back(elem.type);
|
||||
}
|
||||
|
||||
ASTPtr skip_indices_with_primary_key_expr_list = new_primary_key_expr_list->clone();
|
||||
ASTPtr skip_indices_with_sorting_key_expr_list = new_sorting_key_expr_list->clone();
|
||||
|
||||
MergeTreeIndices new_indices;
|
||||
|
||||
if (!indices_description.indices.empty())
|
||||
{
|
||||
std::set<String> indices_names;
|
||||
|
||||
for (const auto & index_ast : indices_description.indices)
|
||||
{
|
||||
const auto & index_decl = std::dynamic_pointer_cast<ASTIndexDeclaration>(index_ast);
|
||||
|
||||
new_indices.push_back(
|
||||
MergeTreeIndexFactory::instance().get(
|
||||
all_columns,
|
||||
std::dynamic_pointer_cast<ASTIndexDeclaration>(index_decl->clone()),
|
||||
global_context));
|
||||
|
||||
if (indices_names.find(new_indices.back()->name) != indices_names.end())
|
||||
throw Exception(
|
||||
"Index with name `" + new_indices.back()->name + "` already exsists",
|
||||
ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
ASTPtr expr_list = MergeTreeData::extractKeyExpressionList(index_decl->expr->clone());
|
||||
for (const auto & expr : expr_list->children)
|
||||
{
|
||||
skip_indices_with_primary_key_expr_list->children.push_back(expr->clone());
|
||||
skip_indices_with_sorting_key_expr_list->children.push_back(expr->clone());
|
||||
}
|
||||
|
||||
indices_names.insert(new_indices.back()->name);
|
||||
}
|
||||
}
|
||||
auto syntax_primary = SyntaxAnalyzer(global_context, {}).analyze(
|
||||
skip_indices_with_primary_key_expr_list, all_columns);
|
||||
auto new_indices_with_primary_key_expr = ExpressionAnalyzer(
|
||||
skip_indices_with_primary_key_expr_list, syntax_primary, global_context).getActions(false);
|
||||
|
||||
auto syntax_sorting = SyntaxAnalyzer(global_context, {}).analyze(
|
||||
skip_indices_with_sorting_key_expr_list, all_columns);
|
||||
auto new_indices_with_sorting_key_expr = ExpressionAnalyzer(
|
||||
skip_indices_with_sorting_key_expr_list, syntax_sorting, global_context).getActions(false);
|
||||
|
||||
if (!only_check)
|
||||
{
|
||||
setColumns(new_columns);
|
||||
@ -342,6 +389,12 @@ void MergeTreeData::setPrimaryKeyAndColumns(
|
||||
primary_key_expr = std::move(new_primary_key_expr);
|
||||
primary_key_sample = std::move(new_primary_key_sample);
|
||||
primary_key_data_types = std::move(new_primary_key_data_types);
|
||||
|
||||
setIndicesDescription(indices_description);
|
||||
skip_indices = std::move(new_indices);
|
||||
|
||||
primary_key_and_skip_indices_expr = new_indices_with_primary_key_expr;
|
||||
sorting_key_and_skip_indices_expr = new_indices_with_sorting_key_expr;
|
||||
}
|
||||
}
|
||||
|
||||
@ -1001,9 +1054,10 @@ void MergeTreeData::checkAlter(const AlterCommands & commands)
|
||||
{
|
||||
/// Check that needed transformations can be applied to the list of columns without considering type conversions.
|
||||
auto new_columns = getColumns();
|
||||
auto new_indices = getIndicesDescription();
|
||||
ASTPtr new_order_by_ast = order_by_ast;
|
||||
ASTPtr new_primary_key_ast = primary_key_ast;
|
||||
commands.apply(new_columns, new_order_by_ast, new_primary_key_ast);
|
||||
commands.apply(new_columns, new_indices, new_order_by_ast, new_primary_key_ast);
|
||||
|
||||
/// Set of columns that shouldn't be altered.
|
||||
NameSet columns_alter_forbidden;
|
||||
@ -1021,6 +1075,12 @@ void MergeTreeData::checkAlter(const AlterCommands & commands)
|
||||
columns_alter_forbidden.insert(col);
|
||||
}
|
||||
|
||||
for (const auto & index : skip_indices)
|
||||
{
|
||||
for (const String & col : index->expr->getRequiredColumns())
|
||||
columns_alter_forbidden.insert(col);
|
||||
}
|
||||
|
||||
if (sorting_key_expr)
|
||||
{
|
||||
for (const ExpressionAction & action : sorting_key_expr->getActions())
|
||||
@ -1075,18 +1135,21 @@ void MergeTreeData::checkAlter(const AlterCommands & commands)
|
||||
}
|
||||
}
|
||||
|
||||
setPrimaryKeyAndColumns(new_order_by_ast, new_primary_key_ast, new_columns, /* only_check = */ true);
|
||||
setPrimaryKeyIndicesAndColumns(new_order_by_ast, new_primary_key_ast,
|
||||
new_columns, new_indices, /* only_check = */ true);
|
||||
|
||||
/// Check that type conversions are possible.
|
||||
ExpressionActionsPtr unused_expression;
|
||||
NameToNameMap unused_map;
|
||||
bool unused_bool;
|
||||
|
||||
createConvertExpression(nullptr, getColumns().getAllPhysical(), new_columns.getAllPhysical(), unused_expression, unused_map, unused_bool);
|
||||
createConvertExpression(nullptr, getColumns().getAllPhysical(), new_columns.getAllPhysical(),
|
||||
getIndicesDescription().indices, new_indices.indices, unused_expression, unused_map, unused_bool);
|
||||
}
|
||||
|
||||
void MergeTreeData::createConvertExpression(const DataPartPtr & part, const NamesAndTypesList & old_columns, const NamesAndTypesList & new_columns,
|
||||
ExpressionActionsPtr & out_expression, NameToNameMap & out_rename_map, bool & out_force_update_metadata) const
|
||||
const IndicesAsts & old_indices, const IndicesAsts & new_indices, ExpressionActionsPtr & out_expression,
|
||||
NameToNameMap & out_rename_map, bool & out_force_update_metadata) const
|
||||
{
|
||||
out_expression = nullptr;
|
||||
out_rename_map = {};
|
||||
@ -1100,6 +1163,21 @@ void MergeTreeData::createConvertExpression(const DataPartPtr & part, const Name
|
||||
/// For every column that need to be converted: source column name, column name of calculated expression for conversion.
|
||||
std::vector<std::pair<String, String>> conversions;
|
||||
|
||||
|
||||
/// Remove old indices
|
||||
std::set<String> new_indices_set;
|
||||
for (const auto & index_decl : new_indices)
|
||||
new_indices_set.emplace(dynamic_cast<const ASTIndexDeclaration &>(*index_decl.get()).name);
|
||||
for (const auto & index_decl : old_indices)
|
||||
{
|
||||
const auto & index = dynamic_cast<const ASTIndexDeclaration &>(*index_decl.get());
|
||||
if (!new_indices_set.count(index.name))
|
||||
{
|
||||
out_rename_map["skp_idx_" + index.name + ".idx"] = "";
|
||||
out_rename_map["skp_idx_" + index.name + ".mrk"] = "";
|
||||
}
|
||||
}
|
||||
|
||||
/// Collect counts for shared streams of different columns. As an example, Nested columns have shared stream with array sizes.
|
||||
std::map<String, size_t> stream_counts;
|
||||
for (const NameAndTypePair & column : old_columns)
|
||||
@ -1230,12 +1308,15 @@ void MergeTreeData::createConvertExpression(const DataPartPtr & part, const Name
|
||||
MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart(
|
||||
const DataPartPtr & part,
|
||||
const NamesAndTypesList & new_columns,
|
||||
const IndicesAsts & new_indices,
|
||||
bool skip_sanity_checks)
|
||||
{
|
||||
ExpressionActionsPtr expression;
|
||||
AlterDataPartTransactionPtr transaction(new AlterDataPartTransaction(part)); /// Blocks changes to the part.
|
||||
bool force_update_metadata;
|
||||
createConvertExpression(part, part->columns, new_columns, expression, transaction->rename_map, force_update_metadata);
|
||||
createConvertExpression(part, part->columns, new_columns,
|
||||
getIndicesDescription().indices, new_indices,
|
||||
expression, transaction->rename_map, force_update_metadata);
|
||||
|
||||
size_t num_files_to_modify = transaction->rename_map.size();
|
||||
size_t num_files_to_remove = 0;
|
||||
@ -2062,7 +2143,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeData::loadPartAndFixMetadata(const St
|
||||
/// Check the data while we are at it.
|
||||
if (part->checksums.empty())
|
||||
{
|
||||
part->checksums = checkDataPart(full_part_path, index_granularity, false, primary_key_data_types);
|
||||
part->checksums = checkDataPart(full_part_path, index_granularity, false, primary_key_data_types, skip_indices);
|
||||
|
||||
{
|
||||
WriteBufferFromFile out(full_part_path + "checksums.txt.tmp", 4096);
|
||||
|
@ -4,6 +4,7 @@
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/ExpressionActions.h>
|
||||
#include <Storages/ITableDeclaration.h>
|
||||
#include <Storages/MergeTree/MergeTreeIndices.h>
|
||||
#include <Storages/MergeTree/MergeTreePartInfo.h>
|
||||
#include <Storages/MergeTree/MergeTreeSettings.h>
|
||||
#include <IO/ReadBufferFromString.h>
|
||||
@ -13,6 +14,7 @@
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <DataStreams/GraphiteRollupSortedBlockInputStream.h>
|
||||
#include <Storages/MergeTree/MergeTreeDataPart.h>
|
||||
#include <Storages/IndicesDescription.h>
|
||||
|
||||
#include <boost/multi_index_container.hpp>
|
||||
#include <boost/multi_index/ordered_index.hpp>
|
||||
@ -303,6 +305,7 @@ public:
|
||||
MergeTreeData(const String & database_, const String & table_,
|
||||
const String & full_path_,
|
||||
const ColumnsDescription & columns_,
|
||||
const IndicesDescription & indices_,
|
||||
Context & context_,
|
||||
const String & date_column_name,
|
||||
const ASTPtr & partition_by_ast_,
|
||||
@ -476,7 +479,7 @@ public:
|
||||
/// Check if the ALTER can be performed:
|
||||
/// - all needed columns are present.
|
||||
/// - all type conversions can be done.
|
||||
/// - columns corresponding to primary key, sign, sampling expression and date are not affected.
|
||||
/// - columns corresponding to primary key, indices, sign, sampling expression and date are not affected.
|
||||
/// If something is wrong, throws an exception.
|
||||
void checkAlter(const AlterCommands & commands);
|
||||
|
||||
@ -487,6 +490,7 @@ public:
|
||||
AlterDataPartTransactionPtr alterDataPart(
|
||||
const DataPartPtr & part,
|
||||
const NamesAndTypesList & new_columns,
|
||||
const IndicesAsts & new_indices,
|
||||
bool skip_sanity_checks);
|
||||
|
||||
/// Freezes all parts.
|
||||
@ -508,6 +512,7 @@ public:
|
||||
|
||||
bool hasSortingKey() const { return !sorting_key_columns.empty(); }
|
||||
bool hasPrimaryKey() const { return !primary_key_columns.empty(); }
|
||||
bool hasSkipIndices() const { return !skip_indices.empty(); }
|
||||
|
||||
ASTPtr getSortingKeyAST() const { return sorting_key_expr_ast; }
|
||||
ASTPtr getPrimaryKeyAST() const { return primary_key_expr_ast; }
|
||||
@ -581,6 +586,12 @@ public:
|
||||
Int64 minmax_idx_date_column_pos = -1; /// In a common case minmax index includes a date column.
|
||||
Int64 minmax_idx_time_column_pos = -1; /// In other cases, minmax index often includes a dateTime column.
|
||||
|
||||
/// Secondary (data skipping) indices for MergeTree
|
||||
MergeTreeIndices skip_indices;
|
||||
|
||||
ExpressionActionsPtr primary_key_and_skip_indices_expr;
|
||||
ExpressionActionsPtr sorting_key_and_skip_indices_expr;
|
||||
|
||||
/// Names of columns for primary key + secondary sorting columns.
|
||||
Names sorting_key_columns;
|
||||
ASTPtr sorting_key_expr_ast;
|
||||
@ -721,7 +732,9 @@ private:
|
||||
/// The same for clearOldTemporaryDirectories.
|
||||
std::mutex clear_old_temporary_directories_mutex;
|
||||
|
||||
void setPrimaryKeyAndColumns(const ASTPtr & new_order_by_ast, ASTPtr new_primary_key_ast, const ColumnsDescription & new_columns, bool only_check = false);
|
||||
void setPrimaryKeyIndicesAndColumns(const ASTPtr &new_order_by_ast, ASTPtr new_primary_key_ast,
|
||||
const ColumnsDescription &new_columns,
|
||||
const IndicesDescription &indices_description, bool only_check = false);
|
||||
|
||||
void initPartitionKey();
|
||||
|
||||
@ -733,7 +746,8 @@ private:
|
||||
/// Files to be deleted are mapped to an empty string in out_rename_map.
|
||||
/// If part == nullptr, just checks that all type conversions are possible.
|
||||
void createConvertExpression(const DataPartPtr & part, const NamesAndTypesList & old_columns, const NamesAndTypesList & new_columns,
|
||||
ExpressionActionsPtr & out_expression, NameToNameMap & out_rename_map, bool & out_force_update_metadata) const;
|
||||
const IndicesAsts & old_indices, const IndicesAsts & new_indices,
|
||||
ExpressionActionsPtr & out_expression, NameToNameMap & out_rename_map, bool & out_force_update_metadata) const;
|
||||
|
||||
/// Calculates column sizes in compressed form for the current state of data_parts. Call with data_parts mutex locked.
|
||||
void calculateColumnSizesImpl();
|
||||
|
@ -334,12 +334,19 @@ MergeTreeData::DataPartsVector MergeTreeDataMergerMutator::selectAllPartsFromPar
|
||||
static void extractMergingAndGatheringColumns(
|
||||
const NamesAndTypesList & all_columns,
|
||||
const ExpressionActionsPtr & sorting_key_expr,
|
||||
const MergeTreeIndices & indexes,
|
||||
const MergeTreeData::MergingParams & merging_params,
|
||||
NamesAndTypesList & gathering_columns, Names & gathering_column_names,
|
||||
NamesAndTypesList & merging_columns, Names & merging_column_names)
|
||||
{
|
||||
Names sort_key_columns_vec = sorting_key_expr->getRequiredColumns();
|
||||
std::set<String> key_columns(sort_key_columns_vec.cbegin(), sort_key_columns_vec.cend());
|
||||
for (const auto & index : indexes)
|
||||
{
|
||||
Names index_columns_vec = index->expr->getRequiredColumns();
|
||||
std::copy(index_columns_vec.cbegin(), index_columns_vec.cend(),
|
||||
std::inserter(key_columns, key_columns.end()));
|
||||
}
|
||||
|
||||
/// Force sign column for Collapsing mode
|
||||
if (merging_params.mode == MergeTreeData::MergingParams::Collapsing)
|
||||
@ -550,7 +557,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
|
||||
NamesAndTypesList gathering_columns, merging_columns;
|
||||
Names gathering_column_names, merging_column_names;
|
||||
extractMergingAndGatheringColumns(
|
||||
all_columns, data.sorting_key_expr,
|
||||
all_columns, data.sorting_key_expr, data.skip_indices,
|
||||
data.merging_params, gathering_columns, gathering_column_names, merging_columns, merging_column_names);
|
||||
|
||||
MergeTreeData::MutableDataPartPtr new_data_part = std::make_shared<MergeTreeData::DataPart>(
|
||||
@ -629,11 +636,12 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
|
||||
input->setProgressCallback(MergeProgressCallback(
|
||||
merge_entry, sum_input_rows_upper_bound, column_sizes, watch_prev_elapsed, merge_alg));
|
||||
|
||||
if (data.hasPrimaryKey())
|
||||
src_streams.emplace_back(std::make_shared<MaterializingBlockInputStream>(
|
||||
std::make_shared<ExpressionBlockInputStream>(BlockInputStreamPtr(std::move(input)), data.sorting_key_expr)));
|
||||
else
|
||||
src_streams.emplace_back(std::move(input));
|
||||
BlockInputStreamPtr stream = std::move(input);
|
||||
if (data.hasPrimaryKey() || data.hasSkipIndices())
|
||||
stream = std::make_shared<MaterializingBlockInputStream>(
|
||||
std::make_shared<ExpressionBlockInputStream>(stream, data.sorting_key_and_skip_indices_expr));
|
||||
|
||||
src_streams.emplace_back(stream);
|
||||
}
|
||||
|
||||
Names sort_columns = data.sorting_key_columns;
|
||||
@ -897,10 +905,9 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
|
||||
if (in_header.columns() == all_columns.size())
|
||||
{
|
||||
/// All columns are modified, proceed to write a new part from scratch.
|
||||
|
||||
if (data.hasPrimaryKey())
|
||||
if (data.hasPrimaryKey() || data.hasSkipIndices())
|
||||
in = std::make_shared<MaterializingBlockInputStream>(
|
||||
std::make_shared<ExpressionBlockInputStream>(in, data.primary_key_expr));
|
||||
std::make_shared<ExpressionBlockInputStream>(in, data.primary_key_and_skip_indices_expr));
|
||||
|
||||
MergeTreeDataPart::MinMaxIndex minmax_idx;
|
||||
|
||||
@ -927,6 +934,20 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
|
||||
/// We will modify only some of the columns. Other columns and key values can be copied as-is.
|
||||
/// TODO: check that we modify only non-key columns in this case.
|
||||
|
||||
/// Check whether any of the columns used in skipping indexes are modified.
|
||||
for (const auto & col : in_header.getNames())
|
||||
{
|
||||
for (const auto & index : data.skip_indices)
|
||||
{
|
||||
const auto & index_cols = index->expr->getRequiredColumns();
|
||||
auto it = find(cbegin(index_cols), cend(index_cols), col);
|
||||
if (it != cend(index_cols))
|
||||
throw Exception("You can not modify columns used in index. Index name: '"
|
||||
+ index->name
|
||||
+ "' bad column: '" + *it + "'", ErrorCodes::ILLEGAL_COLUMN);
|
||||
}
|
||||
}
|
||||
|
||||
NameSet files_to_skip = {"checksums.txt", "columns.txt"};
|
||||
for (const auto & entry : in_header)
|
||||
{
|
||||
|
@ -120,7 +120,7 @@ public:
|
||||
enum class MergeAlgorithm
|
||||
{
|
||||
Horizontal, /// per-row merge of all columns
|
||||
Vertical /// per-row merge of PK columns, per-column gather for non-PK columns
|
||||
Vertical /// per-row merge of PK and secondary indices columns, per-column gather for non-PK columns
|
||||
};
|
||||
|
||||
private:
|
||||
|
@ -4,6 +4,7 @@
|
||||
#include <Core/Block.h>
|
||||
#include <Core/Types.h>
|
||||
#include <Core/NamesAndTypes.h>
|
||||
#include <Storages/MergeTree/MergeTreeIndices.h>
|
||||
#include <Storages/MergeTree/MergeTreePartInfo.h>
|
||||
#include <Storages/MergeTree/MergeTreePartition.h>
|
||||
#include <Storages/MergeTree/MergeTreeDataPartChecksum.h>
|
||||
|
@ -1,11 +1,15 @@
|
||||
#include <boost/rational.hpp> /// For calculations related to sampling coefficients.
|
||||
#include <optional>
|
||||
|
||||
#include <Poco/File.h>
|
||||
|
||||
#include <Common/FieldVisitors.h>
|
||||
#include <Storages/MergeTree/MergeTreeDataSelectExecutor.h>
|
||||
#include <Storages/MergeTree/MergeTreeSelectBlockInputStream.h>
|
||||
#include <Storages/MergeTree/MergeTreeReadPool.h>
|
||||
#include <Storages/MergeTree/MergeTreeThreadSelectBlockInputStream.h>
|
||||
#include <Storages/MergeTree/MergeTreeIndices.h>
|
||||
#include <Storages/MergeTree/MergeTreeIndexReader.h>
|
||||
#include <Storages/MergeTree/KeyCondition.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Parsers/ASTLiteral.h>
|
||||
@ -528,6 +532,17 @@ BlockInputStreams MergeTreeDataSelectExecutor::readFromParts(
|
||||
else
|
||||
ranges.ranges = MarkRanges{MarkRange{0, part->marks_count}};
|
||||
|
||||
/// It can be done in multiple threads (one thread for each part).
|
||||
/// Maybe it should be moved to BlockInputStream, but it can cause some problems.
|
||||
for (const auto & index : data.skip_indices)
|
||||
{
|
||||
auto condition = index->createIndexCondition(query_info, context);
|
||||
if (!condition->alwaysUnknownOrTrue())
|
||||
{
|
||||
ranges.ranges = filterMarksUsingIndex(index, condition, part, ranges.ranges, settings);
|
||||
}
|
||||
}
|
||||
|
||||
if (!ranges.ranges.empty())
|
||||
{
|
||||
parts_with_ranges.push_back(ranges);
|
||||
@ -942,4 +957,70 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange(
|
||||
return res;
|
||||
}
|
||||
|
||||
MarkRanges MergeTreeDataSelectExecutor::filterMarksUsingIndex(
|
||||
MergeTreeIndexPtr index,
|
||||
IndexConditionPtr condition,
|
||||
MergeTreeData::DataPartPtr part,
|
||||
const MarkRanges & ranges,
|
||||
const Settings & settings) const
|
||||
{
|
||||
if (!Poco::File(part->getFullPath() + index->getFileName() + ".idx").exists())
|
||||
{
|
||||
LOG_DEBUG(log, "File for index `" << index->name << "` does not exist. Skipping it.");
|
||||
return ranges;
|
||||
}
|
||||
|
||||
const size_t min_marks_for_seek = (settings.merge_tree_min_rows_for_seek + data.index_granularity - 1) / data.index_granularity;
|
||||
|
||||
size_t granules_dropped = 0;
|
||||
|
||||
MergeTreeIndexReader reader(
|
||||
index, part,
|
||||
((part->marks_count + index->granularity - 1) / index->granularity),
|
||||
ranges);
|
||||
|
||||
MarkRanges res;
|
||||
|
||||
/// Some granules can cover two or more ranges,
|
||||
/// this variable is stored to avoid reading the same granule twice.
|
||||
MergeTreeIndexGranulePtr granule = nullptr;
|
||||
size_t last_index_mark = 0;
|
||||
for (const auto & range : ranges)
|
||||
{
|
||||
MarkRange index_range(
|
||||
range.begin / index->granularity,
|
||||
(range.end + index->granularity - 1) / index->granularity);
|
||||
|
||||
if (last_index_mark != index_range.begin || !granule)
|
||||
reader.seek(index_range.begin);
|
||||
|
||||
for (size_t index_mark = index_range.begin; index_mark < index_range.end; ++index_mark)
|
||||
{
|
||||
if (index_mark != index_range.begin || !granule || last_index_mark != index_range.begin)
|
||||
granule = reader.read();
|
||||
|
||||
MarkRange data_range(
|
||||
std::max(range.begin, index_mark * index->granularity),
|
||||
std::min(range.end, (index_mark + 1) * index->granularity));
|
||||
|
||||
if (!condition->mayBeTrueOnGranule(granule))
|
||||
{
|
||||
++granules_dropped;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (res.empty() || res.back().end - data_range.begin >= min_marks_for_seek)
|
||||
res.push_back(data_range);
|
||||
else
|
||||
res.back().end = data_range.end;
|
||||
}
|
||||
|
||||
last_index_mark = index_range.end - 1;
|
||||
}
|
||||
|
||||
LOG_DEBUG(log, "Index `" << index->name << "` has dropped " << granules_dropped << " granules.");
|
||||
|
||||
return res;
|
||||
}
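/// Editorial worked example (not part of the original commit): with index->granularity = 4 and an input range of
/// data marks [10, 19), index_range becomes [10 / 4, (19 + 4 - 1) / 4) = [2, 5); index mark 2 then maps back to
/// the data range [max(10, 8), min(19, 12)) = [10, 12), and granules whose condition cannot be true are dropped.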
|
||||
|
||||
}
|
||||
|
@ -81,6 +81,13 @@ private:
|
||||
const MergeTreeData::DataPart::Index & index,
|
||||
const KeyCondition & key_condition,
|
||||
const Settings & settings) const;
|
||||
|
||||
MarkRanges filterMarksUsingIndex(
|
||||
MergeTreeIndexPtr index,
|
||||
IndexConditionPtr condition,
|
||||
MergeTreeData::DataPartPtr part,
|
||||
const MarkRanges & ranges,
|
||||
const Settings & settings) const;
|
||||
};
|
||||
|
||||
}
|
||||
|
@ -180,8 +180,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithPa
|
||||
dir.createDirectories();
|
||||
|
||||
/// If we need to calculate some columns to sort.
|
||||
if (data.hasSortingKey())
|
||||
data.sorting_key_expr->execute(block);
|
||||
if (data.hasSortingKey() || data.hasSkipIndices())
|
||||
data.sorting_key_and_skip_indices_expr->execute(block);
|
||||
|
||||
Names sort_columns = data.sorting_key_columns;
|
||||
SortDescription sort_description;
|
||||
|
29
dbms/src/Storages/MergeTree/MergeTreeIndexReader.cpp
Normal file
@ -0,0 +1,29 @@
#include <Storages/MergeTree/MergeTreeIndexReader.h>


namespace DB
{

MergeTreeIndexReader::MergeTreeIndexReader(
    MergeTreeIndexPtr index, MergeTreeData::DataPartPtr part, size_t marks_count, const MarkRanges & all_mark_ranges)
    : index(index), stream(
        part->getFullPath() + index->getFileName(), ".idx", marks_count,
        all_mark_ranges, nullptr, false, nullptr, 0, DBMS_DEFAULT_BUFFER_SIZE,
        ReadBufferFromFileBase::ProfileCallback{}, CLOCK_MONOTONIC_COARSE)
{
    stream.seekToStart();
}

void MergeTreeIndexReader::seek(size_t mark)
{
    stream.seekToMark(mark);
}

MergeTreeIndexGranulePtr MergeTreeIndexReader::read()
{
    auto granule = index->createIndexGranule();
    granule->deserializeBinary(*stream.data_buffer);
    return granule;
}

}
|
28
dbms/src/Storages/MergeTree/MergeTreeIndexReader.h
Normal file
@ -0,0 +1,28 @@
#pragma once

#include <Storages/MergeTree/MergeTreeReaderStream.h>
#include <Storages/MergeTree/MergeTreeIndices.h>
#include <Storages/MergeTree/MergeTreeData.h>

namespace DB
{

class MergeTreeIndexReader
{
public:
    MergeTreeIndexReader(
        MergeTreeIndexPtr index,
        MergeTreeData::DataPartPtr part,
        size_t marks_count,
        const MarkRanges & all_mark_ranges);

    void seek(size_t mark);

    MergeTreeIndexGranulePtr read();

private:
    MergeTreeIndexPtr index;
    MergeTreeReaderStream stream;
};

}
|
57
dbms/src/Storages/MergeTree/MergeTreeIndices.cpp
Normal file
@ -0,0 +1,57 @@
|
||||
#include <Storages/MergeTree/MergeTreeIndices.h>
|
||||
#include <Parsers/parseQuery.h>
|
||||
#include <Parsers/ParserCreateQuery.h>
|
||||
#include <IO/WriteHelpers.h>
|
||||
#include <IO/ReadHelpers.h>
|
||||
|
||||
#include <numeric>
|
||||
|
||||
#include <boost/algorithm/string.hpp>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
extern const int INCORRECT_QUERY;
|
||||
extern const int UNKNOWN_EXCEPTION;
|
||||
}
|
||||
|
||||
void MergeTreeIndexFactory::registerIndex(const std::string &name, Creator creator)
|
||||
{
|
||||
if (!indexes.emplace(name, std::move(creator)).second)
|
||||
throw Exception("MergeTreeIndexFactory: the Index creator name '" + name + "' is not unique",
|
||||
ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
|
||||
std::unique_ptr<MergeTreeIndex> MergeTreeIndexFactory::get(
|
||||
const NamesAndTypesList & columns,
|
||||
std::shared_ptr<ASTIndexDeclaration> node,
|
||||
const Context & context) const
|
||||
{
|
||||
if (!node->type)
|
||||
throw Exception(
|
||||
"for index TYPE is required", ErrorCodes::INCORRECT_QUERY);
|
||||
if (node->type->parameters && !node->type->parameters->children.empty())
|
||||
throw Exception(
|
||||
"Index type can not have parameters", ErrorCodes::INCORRECT_QUERY);
|
||||
|
||||
boost::algorithm::to_lower(node->type->name);
|
||||
auto it = indexes.find(node->type->name);
|
||||
if (it == indexes.end())
|
||||
throw Exception(
|
||||
"Unknown Index type '" + node->type->name + "'. Available index types: " +
|
||||
std::accumulate(indexes.cbegin(), indexes.cend(), std::string{},
|
||||
[] (auto && lft, const auto & rht) -> std::string {
|
||||
if (lft == "")
|
||||
return rht.first;
|
||||
else
|
||||
return lft + ", " + rht.first;
|
||||
}),
|
||||
ErrorCodes::INCORRECT_QUERY);
|
||||
return it->second(columns, node, context);
|
||||
}
|
||||
|
||||
}
|
126
dbms/src/Storages/MergeTree/MergeTreeIndices.h
Normal file
@ -0,0 +1,126 @@
|
||||
#pragma once
|
||||
|
||||
#include <string>
|
||||
#include <unordered_map>
|
||||
#include <vector>
|
||||
#include <memory>
|
||||
#include <Core/Block.h>
|
||||
#include <ext/singleton.h>
|
||||
#include <Storages/MergeTree/MergeTreeDataPartChecksum.h>
|
||||
#include <Storages/SelectQueryInfo.h>
|
||||
#include <Storages/MergeTree/MarkRange.h>
|
||||
#include <Interpreters/ExpressionActions.h>
|
||||
#include <Parsers/ASTIndexDeclaration.h>
|
||||
|
||||
constexpr auto INDEX_FILE_PREFIX = "skp_idx_";
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
class MergeTreeData;
|
||||
class MergeTreeIndex;
|
||||
|
||||
using MergeTreeIndexPtr = std::shared_ptr<const MergeTreeIndex>;
|
||||
using MutableMergeTreeIndexPtr = std::shared_ptr<MergeTreeIndex>;
|
||||
|
||||
|
||||
struct MergeTreeIndexGranule
|
||||
{
|
||||
virtual ~MergeTreeIndexGranule() = default;
|
||||
|
||||
virtual void serializeBinary(WriteBuffer & ostr) const = 0;
|
||||
virtual void deserializeBinary(ReadBuffer & istr) = 0;
|
||||
|
||||
virtual String toString() const = 0;
|
||||
virtual bool empty() const = 0;
|
||||
|
||||
virtual void update(const Block & block, size_t * pos, size_t limit) = 0;
|
||||
};
|
||||
|
||||
|
||||
using MergeTreeIndexGranulePtr = std::shared_ptr<MergeTreeIndexGranule>;
|
||||
using MergeTreeIndexGranules = std::vector<MergeTreeIndexGranulePtr>;
|
||||
|
||||
/// Condition on the index.
|
||||
class IndexCondition
|
||||
{
|
||||
public:
|
||||
virtual ~IndexCondition() = default;
|
||||
/// Checks if this index is useful for query.
|
||||
virtual bool alwaysUnknownOrTrue() const = 0;
|
||||
|
||||
virtual bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr granule) const = 0;
|
||||
};
|
||||
|
||||
using IndexConditionPtr = std::shared_ptr<IndexCondition>;
|
||||
|
||||
|
||||
/// Structure for storing basic index info like columns, expression, arguments, ...
|
||||
class MergeTreeIndex
|
||||
{
|
||||
public:
|
||||
MergeTreeIndex(
|
||||
String name,
|
||||
ExpressionActionsPtr expr,
|
||||
const Names & columns,
|
||||
const DataTypes & data_types,
|
||||
const Block & header,
|
||||
size_t granularity)
|
||||
: name(name)
|
||||
, expr(expr)
|
||||
, columns(columns)
|
||||
, data_types(data_types)
|
||||
, header(header)
|
||||
, granularity(granularity) {}
|
||||
|
||||
virtual ~MergeTreeIndex() = default;
|
||||
|
||||
/// gets filename without extension
|
||||
String getFileName() const { return INDEX_FILE_PREFIX + name; }
|
||||
|
||||
virtual MergeTreeIndexGranulePtr createIndexGranule() const = 0;
|
||||
|
||||
virtual IndexConditionPtr createIndexCondition(
|
||||
const SelectQueryInfo & query_info, const Context & context) const = 0;
|
||||
|
||||
String name;
|
||||
ExpressionActionsPtr expr;
|
||||
Names columns;
|
||||
DataTypes data_types;
|
||||
Block header;
|
||||
size_t granularity;
|
||||
};
|
||||
|
||||
|
||||
using MergeTreeIndices = std::vector<MutableMergeTreeIndexPtr>;
|
||||
|
||||
|
||||
class MergeTreeIndexFactory : public ext::singleton<MergeTreeIndexFactory>
|
||||
{
|
||||
friend class ext::singleton<MergeTreeIndexFactory>;
|
||||
|
||||
public:
|
||||
using Creator = std::function<
|
||||
std::unique_ptr<MergeTreeIndex>(
|
||||
const NamesAndTypesList & columns,
|
||||
std::shared_ptr<ASTIndexDeclaration> node,
|
||||
const Context & context)>;
|
||||
|
||||
std::unique_ptr<MergeTreeIndex> get(
|
||||
const NamesAndTypesList & columns,
|
||||
std::shared_ptr<ASTIndexDeclaration> node,
|
||||
const Context & context) const;
|
||||
|
||||
void registerIndex(const std::string & name, Creator creator);
|
||||
|
||||
const auto & getAllIndexes() const { return indexes; }
|
||||
|
||||
protected:
|
||||
MergeTreeIndexFactory() = default;
|
||||
|
||||
private:
|
||||
using Indexes = std::unordered_map<std::string, Creator>;
|
||||
Indexes indexes;
|
||||
};
|
||||
|
||||
}
|
dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.cpp (new file, 164 lines)
@@ -0,0 +1,164 @@
#include <Storages/MergeTree/MergeTreeMinMaxIndex.h>

#include <Interpreters/ExpressionActions.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Interpreters/SyntaxAnalyzer.h>

#include <Poco/Logger.h>


namespace DB
{

namespace ErrorCodes
{
    extern const int LOGICAL_ERROR;
    extern const int INCORRECT_QUERY;
}


MergeTreeMinMaxGranule::MergeTreeMinMaxGranule(const MergeTreeMinMaxIndex & index)
    : MergeTreeIndexGranule(), index(index), parallelogram()
{
}

void MergeTreeMinMaxGranule::serializeBinary(WriteBuffer & ostr) const
{
    if (empty())
        throw Exception(
            "Attempt to write empty minmax index `" + index.name + "`", ErrorCodes::LOGICAL_ERROR);

    for (size_t i = 0; i < index.columns.size(); ++i)
    {
        const DataTypePtr & type = index.data_types[i];

        type->serializeBinary(parallelogram[i].left, ostr);
        type->serializeBinary(parallelogram[i].right, ostr);
    }
}

void MergeTreeMinMaxGranule::deserializeBinary(ReadBuffer & istr)
{
    parallelogram.clear();
    for (size_t i = 0; i < index.columns.size(); ++i)
    {
        const DataTypePtr & type = index.data_types[i];

        Field min_val;
        type->deserializeBinary(min_val, istr);
        Field max_val;
        type->deserializeBinary(max_val, istr);

        parallelogram.emplace_back(min_val, true, max_val, true);
    }
}

String MergeTreeMinMaxGranule::toString() const
{
    String res = "";

    for (size_t i = 0; i < parallelogram.size(); ++i)
    {
        res += "["
            + applyVisitor(FieldVisitorToString(), parallelogram[i].left) + ", "
            + applyVisitor(FieldVisitorToString(), parallelogram[i].right) + "]";
    }

    return res;
}

void MergeTreeMinMaxGranule::update(const Block & block, size_t * pos, size_t limit)
{
    size_t rows_read = std::min(limit, block.rows() - *pos);

    for (size_t i = 0; i < index.columns.size(); ++i)
    {
        const auto & column = block.getByName(index.columns[i]).column;

        Field field_min, field_max;
        column->cut(*pos, rows_read)->getExtremes(field_min, field_max);

        if (parallelogram.size() <= i)
        {
            parallelogram.emplace_back(field_min, true, field_max, true);
        }
        else
        {
            parallelogram[i].left = std::min(parallelogram[i].left, field_min);
            parallelogram[i].right = std::max(parallelogram[i].right, field_max);
        }
    }

    *pos += rows_read;
}


MinMaxCondition::MinMaxCondition(
    const SelectQueryInfo & query,
    const Context & context,
    const MergeTreeMinMaxIndex & index)
    : IndexCondition(), index(index), condition(query, context, index.columns, index.expr) {}

bool MinMaxCondition::alwaysUnknownOrTrue() const
{
    return condition.alwaysUnknownOrTrue();
}

bool MinMaxCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const
{
    std::shared_ptr<MergeTreeMinMaxGranule> granule
        = std::dynamic_pointer_cast<MergeTreeMinMaxGranule>(idx_granule);
    if (!granule)
        throw Exception(
            "Minmax index condition got wrong granule", ErrorCodes::LOGICAL_ERROR);

    return condition.mayBeTrueInParallelogram(granule->parallelogram, index.data_types);
}


MergeTreeIndexGranulePtr MergeTreeMinMaxIndex::createIndexGranule() const
{
    return std::make_shared<MergeTreeMinMaxGranule>(*this);
}

IndexConditionPtr MergeTreeMinMaxIndex::createIndexCondition(
    const SelectQueryInfo & query, const Context & context) const
{
    return std::make_shared<MinMaxCondition>(query, context, *this);
}


std::unique_ptr<MergeTreeIndex> MergeTreeMinMaxIndexCreator(
    const NamesAndTypesList & new_columns,
    std::shared_ptr<ASTIndexDeclaration> node,
    const Context & context)
{
    if (node->name.empty())
        throw Exception("Index must have unique name", ErrorCodes::INCORRECT_QUERY);

    if (node->type->arguments)
        throw Exception("Minmax index must not have any arguments", ErrorCodes::INCORRECT_QUERY);

    ASTPtr expr_list = MergeTreeData::extractKeyExpressionList(node->expr->clone());
    auto syntax = SyntaxAnalyzer(context, {}).analyze(
        expr_list, new_columns);
    auto minmax_expr = ExpressionAnalyzer(expr_list, syntax, context).getActions(false);

    auto sample = ExpressionAnalyzer(expr_list, syntax, context)
        .getActions(true)->getSampleBlock();

    Names columns;
    DataTypes data_types;

    for (size_t i = 0; i < expr_list->children.size(); ++i)
    {
        const auto & column = sample.getByPosition(i);

        columns.emplace_back(column.name);
        data_types.emplace_back(column.type);
    }

    return std::make_unique<MergeTreeMinMaxIndex>(
        node->name, std::move(minmax_expr), columns, data_types, sample, node->granularity.get<size_t>());
}

}
dbms/src/Storages/MergeTree/MergeTreeMinMaxIndex.h (new file, 78 lines)
@@ -0,0 +1,78 @@
#pragma once

#include <Storages/MergeTree/MergeTreeIndices.h>
#include <Storages/MergeTree/MergeTreeData.h>
#include <Storages/MergeTree/KeyCondition.h>

#include <memory>


namespace DB
{

class MergeTreeMinMaxIndex;


struct MergeTreeMinMaxGranule : public MergeTreeIndexGranule
{
    explicit MergeTreeMinMaxGranule(const MergeTreeMinMaxIndex & index);

    void serializeBinary(WriteBuffer & ostr) const override;
    void deserializeBinary(ReadBuffer & istr) override;

    String toString() const override;
    bool empty() const override { return parallelogram.empty(); }

    void update(const Block & block, size_t * pos, size_t limit) override;

    ~MergeTreeMinMaxGranule() override = default;

    const MergeTreeMinMaxIndex & index;
    std::vector<Range> parallelogram;
};


class MinMaxCondition : public IndexCondition
{
public:
    MinMaxCondition(
        const SelectQueryInfo & query,
        const Context & context,
        const MergeTreeMinMaxIndex & index);

    bool alwaysUnknownOrTrue() const override;

    bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const override;

    ~MinMaxCondition() override = default;
private:
    const MergeTreeMinMaxIndex & index;
    KeyCondition condition;
};


class MergeTreeMinMaxIndex : public MergeTreeIndex
{
public:
    MergeTreeMinMaxIndex(
        String name_,
        ExpressionActionsPtr expr_,
        const Names & columns_,
        const DataTypes & data_types_,
        const Block & header_,
        size_t granularity_)
        : MergeTreeIndex(name_, expr_, columns_, data_types_, header_, granularity_) {}

    ~MergeTreeMinMaxIndex() override = default;

    MergeTreeIndexGranulePtr createIndexGranule() const override;

    IndexConditionPtr createIndexCondition(
        const SelectQueryInfo & query, const Context & context) const override;

};

std::unique_ptr<MergeTreeIndex> MergeTreeMinMaxIndexCreator(
    const NamesAndTypesList & columns, std::shared_ptr<ASTIndexDeclaration> node, const Context & context);

}
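For illustration only, not part of this commit: a self-contained sketch of the pruning idea behind MergeTreeMinMaxGranule. Each indexed column keeps a [min, max] range per granule; a granule may be skipped only when the queried range cannot intersect it. All names and values below are hypothetical and use no ClickHouse APIs.

#include <cassert>

struct HypotheticalMinMaxRange { double min; double max; };

/// Can rows satisfying `column BETWEEN lo AND hi` exist in a granule covering `r`?
static bool mayBeTrue(const HypotheticalMinMaxRange & r, double lo, double hi)
{
    return !(hi < r.min || lo > r.max);
}

int main()
{
    HypotheticalMinMaxRange granule{10.0, 20.0};
    assert(mayBeTrue(granule, 15.0, 30.0));    /// ranges overlap -> granule must be read
    assert(!mayBeTrue(granule, 25.0, 30.0));   /// ranges are disjoint -> granule can be skipped
    return 0;
}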
@ -154,205 +154,6 @@ size_t MergeTreeReader::readRows(size_t from_mark, bool continue_reading, size_t
|
||||
return read_rows;
|
||||
}
|
||||
|
||||
|
||||
MergeTreeReader::Stream::Stream(
|
||||
const String & path_prefix_, const String & extension_, size_t marks_count_,
|
||||
const MarkRanges & all_mark_ranges,
|
||||
MarkCache * mark_cache_, bool save_marks_in_cache_,
|
||||
UncompressedCache * uncompressed_cache,
|
||||
size_t aio_threshold, size_t max_read_buffer_size,
|
||||
const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type)
|
||||
: path_prefix(path_prefix_), extension(extension_), marks_count(marks_count_)
|
||||
, mark_cache(mark_cache_), save_marks_in_cache(save_marks_in_cache_)
|
||||
{
|
||||
/// Compute the size of the buffer.
|
||||
size_t max_mark_range = 0;
|
||||
|
||||
for (size_t i = 0; i < all_mark_ranges.size(); ++i)
|
||||
{
|
||||
size_t right = all_mark_ranges[i].end;
|
||||
/// NOTE: if we are reading the whole file, then right == marks_count
|
||||
/// and we will use max_read_buffer_size for buffer size, thus avoiding the need to load marks.
|
||||
|
||||
/// If the end of range is inside the block, we will need to read it too.
|
||||
if (right < marks_count && getMark(right).offset_in_decompressed_block > 0)
|
||||
{
|
||||
while (right < marks_count
|
||||
&& getMark(right).offset_in_compressed_file
|
||||
== getMark(all_mark_ranges[i].end).offset_in_compressed_file)
|
||||
{
|
||||
++right;
|
||||
}
|
||||
}
|
||||
|
||||
/// If there are no marks after the end of range, just use max_read_buffer_size
|
||||
if (right >= marks_count
|
||||
|| (right + 1 == marks_count
|
||||
&& getMark(right).offset_in_compressed_file
|
||||
== getMark(all_mark_ranges[i].end).offset_in_compressed_file))
|
||||
{
|
||||
max_mark_range = max_read_buffer_size;
|
||||
break;
|
||||
}
|
||||
|
||||
max_mark_range = std::max(max_mark_range,
|
||||
getMark(right).offset_in_compressed_file - getMark(all_mark_ranges[i].begin).offset_in_compressed_file);
|
||||
}
|
||||
|
||||
/// Avoid empty buffer. May happen while reading dictionary for DataTypeLowCardinality.
|
||||
/// For example: part has single dictionary and all marks point to the same position.
|
||||
if (max_mark_range == 0)
|
||||
max_mark_range = max_read_buffer_size;
|
||||
|
||||
size_t buffer_size = std::min(max_read_buffer_size, max_mark_range);
|
||||
|
||||
/// Estimate size of the data to be read.
|
||||
size_t estimated_size = 0;
|
||||
if (aio_threshold > 0)
|
||||
{
|
||||
for (const auto & mark_range : all_mark_ranges)
|
||||
{
|
||||
size_t offset_begin = (mark_range.begin > 0)
|
||||
? getMark(mark_range.begin).offset_in_compressed_file
|
||||
: 0;
|
||||
|
||||
size_t offset_end = (mark_range.end < marks_count)
|
||||
? getMark(mark_range.end).offset_in_compressed_file
|
||||
: Poco::File(path_prefix + extension).getSize();
|
||||
|
||||
if (offset_end > offset_begin)
|
||||
estimated_size += offset_end - offset_begin;
|
||||
}
|
||||
}
|
||||
|
||||
/// Initialize the objects that shall be used to perform read operations.
|
||||
if (uncompressed_cache)
|
||||
{
|
||||
auto buffer = std::make_unique<CachedCompressedReadBuffer>(
|
||||
path_prefix + extension, uncompressed_cache, estimated_size, aio_threshold, buffer_size);
|
||||
|
||||
if (profile_callback)
|
||||
buffer->setProfileCallback(profile_callback, clock_type);
|
||||
|
||||
cached_buffer = std::move(buffer);
|
||||
data_buffer = cached_buffer.get();
|
||||
}
|
||||
else
|
||||
{
|
||||
auto buffer = std::make_unique<CompressedReadBufferFromFile>(
|
||||
path_prefix + extension, estimated_size, aio_threshold, buffer_size);
|
||||
|
||||
if (profile_callback)
|
||||
buffer->setProfileCallback(profile_callback, clock_type);
|
||||
|
||||
non_cached_buffer = std::move(buffer);
|
||||
data_buffer = non_cached_buffer.get();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
const MarkInCompressedFile & MergeTreeReader::Stream::getMark(size_t index)
|
||||
{
|
||||
if (!marks)
|
||||
loadMarks();
|
||||
return (*marks)[index];
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeReader::Stream::loadMarks()
|
||||
{
|
||||
std::string mrk_path = path_prefix + ".mrk";
|
||||
|
||||
auto load = [&]() -> MarkCache::MappedPtr
|
||||
{
|
||||
/// Memory for marks must not be accounted as memory usage for query, because they are stored in shared cache.
|
||||
auto temporarily_disable_memory_tracker = getCurrentMemoryTrackerActionLock();
|
||||
|
||||
size_t file_size = Poco::File(mrk_path).getSize();
|
||||
size_t expected_file_size = sizeof(MarkInCompressedFile) * marks_count;
|
||||
if (expected_file_size != file_size)
|
||||
throw Exception(
|
||||
"bad size of marks file `" + mrk_path + "':" + std::to_string(file_size) + ", must be: " + std::to_string(expected_file_size),
|
||||
ErrorCodes::CORRUPTED_DATA);
|
||||
|
||||
auto res = std::make_shared<MarksInCompressedFile>(marks_count);
|
||||
|
||||
/// Read directly to marks.
|
||||
ReadBufferFromFile buffer(mrk_path, file_size, -1, reinterpret_cast<char *>(res->data()));
|
||||
|
||||
if (buffer.eof() || buffer.buffer().size() != file_size)
|
||||
throw Exception("Cannot read all marks from file " + mrk_path, ErrorCodes::CANNOT_READ_ALL_DATA);
|
||||
|
||||
return res;
|
||||
};
|
||||
|
||||
if (mark_cache)
|
||||
{
|
||||
auto key = mark_cache->hash(mrk_path);
|
||||
if (save_marks_in_cache)
|
||||
{
|
||||
marks = mark_cache->getOrSet(key, load);
|
||||
}
|
||||
else
|
||||
{
|
||||
marks = mark_cache->get(key);
|
||||
if (!marks)
|
||||
marks = load();
|
||||
}
|
||||
}
|
||||
else
|
||||
marks = load();
|
||||
|
||||
if (!marks)
|
||||
throw Exception("Failed to load marks: " + mrk_path, ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeReader::Stream::seekToMark(size_t index)
|
||||
{
|
||||
MarkInCompressedFile mark = getMark(index);
|
||||
|
||||
try
|
||||
{
|
||||
if (cached_buffer)
|
||||
cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
|
||||
if (non_cached_buffer)
|
||||
non_cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
|
||||
}
|
||||
catch (Exception & e)
|
||||
{
|
||||
/// Better diagnostics.
|
||||
if (e.code() == ErrorCodes::ARGUMENT_OUT_OF_BOUND)
|
||||
e.addMessage("(while seeking to mark " + toString(index)
|
||||
+ " of column " + path_prefix + "; offsets are: "
|
||||
+ toString(mark.offset_in_compressed_file) + " "
|
||||
+ toString(mark.offset_in_decompressed_block) + ")");
|
||||
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeReader::Stream::seekToStart()
|
||||
{
|
||||
try
|
||||
{
|
||||
if (cached_buffer)
|
||||
cached_buffer->seek(0, 0);
|
||||
if (non_cached_buffer)
|
||||
non_cached_buffer->seek(0, 0);
|
||||
}
|
||||
catch (Exception & e)
|
||||
{
|
||||
/// Better diagnostics.
|
||||
if (e.code() == ErrorCodes::ARGUMENT_OUT_OF_BOUND)
|
||||
e.addMessage("(while seeking to start of column " + path_prefix + ")");
|
||||
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeReader::addStreams(const String & name, const IDataType & type,
|
||||
const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type)
|
||||
{
|
||||
@ -371,7 +172,7 @@ void MergeTreeReader::addStreams(const String & name, const IDataType & type,
|
||||
if (!data_file_exists)
|
||||
return;
|
||||
|
||||
streams.emplace(stream_name, std::make_unique<Stream>(
|
||||
streams.emplace(stream_name, std::make_unique<MergeTreeReaderStream>(
|
||||
path + stream_name, DATA_FILE_EXTENSION, data_part->marks_count,
|
||||
all_mark_ranges, mark_cache, save_marks_in_cache,
|
||||
uncompressed_cache, aio_threshold, max_read_buffer_size, profile_callback, clock_type));
|
||||
@ -401,7 +202,7 @@ void MergeTreeReader::readData(
|
||||
if (it == streams.end())
|
||||
return nullptr;
|
||||
|
||||
Stream & stream = *it->second;
|
||||
MergeTreeReaderStream & stream = *it->second;
|
||||
|
||||
if (stream_for_prefix)
|
||||
{
|
||||
|
@ -1,11 +1,7 @@
|
||||
#pragma once
|
||||
|
||||
#include <Storages/MarkCache.h>
|
||||
#include <Storages/MergeTree/MarkRange.h>
|
||||
#include <Storages/MergeTree/MergeTreeData.h>
|
||||
#include <Storages/MergeTree/MergeTreeRangeReader.h>
|
||||
#include <Compression/CompressedReadBufferFromFile.h>
|
||||
#include <Core/NamesAndTypes.h>
|
||||
#include <Storages/MergeTree/MergeTreeReaderStream.h>
|
||||
#include <port/clock.h>
|
||||
|
||||
|
||||
@ -13,7 +9,6 @@ namespace DB
|
||||
{
|
||||
|
||||
class IDataType;
|
||||
class CachedCompressedReadBuffer;
|
||||
|
||||
/// Reads the data between pairs of marks in the same part. When reading consecutive ranges, avoids unnecessary seeks.
|
||||
/// When ranges are almost consecutive, seeks are fast because they are performed inside the buffer.
|
||||
@ -57,44 +52,7 @@ public:
|
||||
size_t readRows(size_t from_mark, bool continue_reading, size_t max_rows_to_read, Block & res);
|
||||
|
||||
private:
|
||||
class Stream
|
||||
{
|
||||
public:
|
||||
Stream(
|
||||
const String & path_prefix_, const String & extension_, size_t marks_count_,
|
||||
const MarkRanges & all_mark_ranges,
|
||||
MarkCache * mark_cache, bool save_marks_in_cache,
|
||||
UncompressedCache * uncompressed_cache,
|
||||
size_t aio_threshold, size_t max_read_buffer_size,
|
||||
const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type);
|
||||
|
||||
void seekToMark(size_t index);
|
||||
void seekToStart();
|
||||
|
||||
ReadBuffer * data_buffer;
|
||||
|
||||
private:
|
||||
Stream() = default;
|
||||
|
||||
/// NOTE: lazily loads marks from the marks cache.
|
||||
const MarkInCompressedFile & getMark(size_t index);
|
||||
|
||||
void loadMarks();
|
||||
|
||||
std::string path_prefix;
|
||||
std::string extension;
|
||||
|
||||
size_t marks_count;
|
||||
|
||||
MarkCache * mark_cache;
|
||||
bool save_marks_in_cache;
|
||||
MarkCache::MappedPtr marks;
|
||||
|
||||
std::unique_ptr<CachedCompressedReadBuffer> cached_buffer;
|
||||
std::unique_ptr<CompressedReadBufferFromFile> non_cached_buffer;
|
||||
};
|
||||
|
||||
using FileStreams = std::map<std::string, std::unique_ptr<Stream>>;
|
||||
using FileStreams = std::map<std::string, std::unique_ptr<MergeTreeReaderStream>>;
|
||||
|
||||
/// avg_value_size_hints are used to reduce the number of reallocations when creating columns of variable size.
|
||||
ValueSizeMap avg_value_size_hints;
|
||||
|
dbms/src/Storages/MergeTree/MergeTreeReaderStream.cpp (new file, 215 lines)
@@ -0,0 +1,215 @@
#include <Common/MemoryTracker.h>
|
||||
#include <Storages/MergeTree/MergeTreeReaderStream.h>
|
||||
#include <Poco/File.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
extern const int CORRUPTED_DATA;
|
||||
extern const int CANNOT_READ_ALL_DATA;
|
||||
extern const int ARGUMENT_OUT_OF_BOUND;
|
||||
}
|
||||
|
||||
|
||||
MergeTreeReaderStream::MergeTreeReaderStream(
|
||||
const String & path_prefix_, const String & extension_, size_t marks_count_,
|
||||
const MarkRanges & all_mark_ranges,
|
||||
MarkCache * mark_cache_, bool save_marks_in_cache_,
|
||||
UncompressedCache * uncompressed_cache,
|
||||
size_t aio_threshold, size_t max_read_buffer_size,
|
||||
const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type)
|
||||
: path_prefix(path_prefix_), extension(extension_), marks_count(marks_count_)
|
||||
, mark_cache(mark_cache_), save_marks_in_cache(save_marks_in_cache_)
|
||||
{
|
||||
/// Compute the size of the buffer.
|
||||
size_t max_mark_range = 0;
|
||||
|
||||
for (size_t i = 0; i < all_mark_ranges.size(); ++i)
|
||||
{
|
||||
size_t right = all_mark_ranges[i].end;
|
||||
/// NOTE: if we are reading the whole file, then right == marks_count
|
||||
/// and we will use max_read_buffer_size for buffer size, thus avoiding the need to load marks.
|
||||
|
||||
/// If the end of range is inside the block, we will need to read it too.
|
||||
if (right < marks_count && getMark(right).offset_in_decompressed_block > 0)
|
||||
{
|
||||
while (right < marks_count
|
||||
&& getMark(right).offset_in_compressed_file
|
||||
== getMark(all_mark_ranges[i].end).offset_in_compressed_file)
|
||||
{
|
||||
++right;
|
||||
}
|
||||
}
|
||||
|
||||
/// If there are no marks after the end of range, just use max_read_buffer_size
|
||||
if (right >= marks_count
|
||||
|| (right + 1 == marks_count
|
||||
&& getMark(right).offset_in_compressed_file
|
||||
== getMark(all_mark_ranges[i].end).offset_in_compressed_file))
|
||||
{
|
||||
max_mark_range = max_read_buffer_size;
|
||||
break;
|
||||
}
|
||||
|
||||
max_mark_range = std::max(max_mark_range,
|
||||
getMark(right).offset_in_compressed_file - getMark(all_mark_ranges[i].begin).offset_in_compressed_file);
|
||||
}
|
||||
|
||||
/// Avoid empty buffer. May happen while reading dictionary for DataTypeLowCardinality.
|
||||
/// For example: part has single dictionary and all marks point to the same position.
|
||||
if (max_mark_range == 0)
|
||||
max_mark_range = max_read_buffer_size;
|
||||
|
||||
size_t buffer_size = std::min(max_read_buffer_size, max_mark_range);
|
||||
|
||||
/// Estimate size of the data to be read.
|
||||
size_t estimated_size = 0;
|
||||
if (aio_threshold > 0)
|
||||
{
|
||||
for (const auto & mark_range : all_mark_ranges)
|
||||
{
|
||||
size_t offset_begin = (mark_range.begin > 0)
|
||||
? getMark(mark_range.begin).offset_in_compressed_file
|
||||
: 0;
|
||||
|
||||
size_t offset_end = (mark_range.end < marks_count)
|
||||
? getMark(mark_range.end).offset_in_compressed_file
|
||||
: Poco::File(path_prefix + extension).getSize();
|
||||
|
||||
if (offset_end > offset_begin)
|
||||
estimated_size += offset_end - offset_begin;
|
||||
}
|
||||
}
|
||||
|
||||
/// Initialize the objects that shall be used to perform read operations.
|
||||
if (uncompressed_cache)
|
||||
{
|
||||
auto buffer = std::make_unique<CachedCompressedReadBuffer>(
|
||||
path_prefix + extension, uncompressed_cache, estimated_size, aio_threshold, buffer_size);
|
||||
|
||||
if (profile_callback)
|
||||
buffer->setProfileCallback(profile_callback, clock_type);
|
||||
|
||||
cached_buffer = std::move(buffer);
|
||||
data_buffer = cached_buffer.get();
|
||||
}
|
||||
else
|
||||
{
|
||||
auto buffer = std::make_unique<CompressedReadBufferFromFile>(
|
||||
path_prefix + extension, estimated_size, aio_threshold, buffer_size);
|
||||
|
||||
if (profile_callback)
|
||||
buffer->setProfileCallback(profile_callback, clock_type);
|
||||
|
||||
non_cached_buffer = std::move(buffer);
|
||||
data_buffer = non_cached_buffer.get();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
const MarkInCompressedFile & MergeTreeReaderStream::getMark(size_t index)
|
||||
{
|
||||
if (!marks)
|
||||
loadMarks();
|
||||
return (*marks)[index];
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeReaderStream::loadMarks()
|
||||
{
|
||||
std::string mrk_path = path_prefix + ".mrk";
|
||||
|
||||
auto load = [&]() -> MarkCache::MappedPtr
|
||||
{
|
||||
/// Memory for marks must not be accounted as memory usage for query, because they are stored in shared cache.
|
||||
auto temporarily_disable_memory_tracker = getCurrentMemoryTrackerActionLock();
|
||||
|
||||
size_t file_size = Poco::File(mrk_path).getSize();
|
||||
size_t expected_file_size = sizeof(MarkInCompressedFile) * marks_count;
|
||||
if (expected_file_size != file_size)
|
||||
throw Exception(
|
||||
"bad size of marks file `" + mrk_path + "':" + std::to_string(file_size) + ", must be: " + std::to_string(expected_file_size),
|
||||
ErrorCodes::CORRUPTED_DATA);
|
||||
|
||||
auto res = std::make_shared<MarksInCompressedFile>(marks_count);
|
||||
|
||||
/// Read directly to marks.
|
||||
ReadBufferFromFile buffer(mrk_path, file_size, -1, reinterpret_cast<char *>(res->data()));
|
||||
|
||||
if (buffer.eof() || buffer.buffer().size() != file_size)
|
||||
throw Exception("Cannot read all marks from file " + mrk_path, ErrorCodes::CANNOT_READ_ALL_DATA);
|
||||
|
||||
return res;
|
||||
};
|
||||
|
||||
if (mark_cache)
|
||||
{
|
||||
auto key = mark_cache->hash(mrk_path);
|
||||
if (save_marks_in_cache)
|
||||
{
|
||||
marks = mark_cache->getOrSet(key, load);
|
||||
}
|
||||
else
|
||||
{
|
||||
marks = mark_cache->get(key);
|
||||
if (!marks)
|
||||
marks = load();
|
||||
}
|
||||
}
|
||||
else
|
||||
marks = load();
|
||||
|
||||
if (!marks)
|
||||
throw Exception("Failed to load marks: " + mrk_path, ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeReaderStream::seekToMark(size_t index)
|
||||
{
|
||||
MarkInCompressedFile mark = getMark(index);
|
||||
|
||||
try
|
||||
{
|
||||
if (cached_buffer)
|
||||
cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
|
||||
if (non_cached_buffer)
|
||||
non_cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
|
||||
}
|
||||
catch (Exception & e)
|
||||
{
|
||||
/// Better diagnostics.
|
||||
if (e.code() == ErrorCodes::ARGUMENT_OUT_OF_BOUND)
|
||||
e.addMessage("(while seeking to mark " + toString(index)
|
||||
+ " of column " + path_prefix + "; offsets are: "
|
||||
+ toString(mark.offset_in_compressed_file) + " "
|
||||
+ toString(mark.offset_in_decompressed_block) + ")");
|
||||
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void MergeTreeReaderStream::seekToStart()
|
||||
{
|
||||
try
|
||||
{
|
||||
if (cached_buffer)
|
||||
cached_buffer->seek(0, 0);
|
||||
if (non_cached_buffer)
|
||||
non_cached_buffer->seek(0, 0);
|
||||
}
|
||||
catch (Exception & e)
|
||||
{
|
||||
/// Better diagnostics.
|
||||
if (e.code() == ErrorCodes::ARGUMENT_OUT_OF_BOUND)
|
||||
e.addMessage("(while seeking to start of column " + path_prefix + ")");
|
||||
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
dbms/src/Storages/MergeTree/MergeTreeReaderStream.h (new file, 49 lines)
@@ -0,0 +1,49 @@
#include <Storages/MarkCache.h>
#include <Storages/MergeTree/MarkRange.h>
#include <Storages/MergeTree/MergeTreeData.h>
#include <Storages/MergeTree/MergeTreeRangeReader.h>
#include <Compression/CachedCompressedReadBuffer.h>
#include <Compression/CompressedReadBufferFromFile.h>


namespace DB
{

class MergeTreeReaderStream
{
public:
    MergeTreeReaderStream(
        const String & path_prefix_, const String & extension_, size_t marks_count_,
        const MarkRanges & all_mark_ranges,
        MarkCache * mark_cache, bool save_marks_in_cache,
        UncompressedCache * uncompressed_cache,
        size_t aio_threshold, size_t max_read_buffer_size,
        const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type);

    void seekToMark(size_t index);

    void seekToStart();

    ReadBuffer * data_buffer;

private:
    MergeTreeReaderStream() = default;

    /// NOTE: lazily loads marks from the marks cache.
    const MarkInCompressedFile & getMark(size_t index);

    void loadMarks();

    std::string path_prefix;
    std::string extension;

    size_t marks_count;

    MarkCache * mark_cache;
    bool save_marks_in_cache;
    MarkCache::MappedPtr marks;

    std::unique_ptr<CachedCompressedReadBuffer> cached_buffer;
    std::unique_ptr<CompressedReadBufferFromFile> non_cached_buffer;
};
}
dbms/src/Storages/MergeTree/MergeTreeUniqueIndex.cpp (new file, 393 lines)
@@ -0,0 +1,393 @@
#include <Storages/MergeTree/MergeTreeUniqueIndex.h>
|
||||
|
||||
#include <Interpreters/ExpressionActions.h>
|
||||
#include <Interpreters/ExpressionAnalyzer.h>
|
||||
#include <Interpreters/SyntaxAnalyzer.h>
|
||||
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTLiteral.h>
|
||||
|
||||
#include <Poco/Logger.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int INCORRECT_QUERY;
|
||||
}
|
||||
|
||||
MergeTreeUniqueGranule::MergeTreeUniqueGranule(const MergeTreeUniqueIndex & index)
|
||||
: MergeTreeIndexGranule(), index(index), set(new Set(SizeLimits{}, true))
|
||||
{
|
||||
set->setHeader(index.header);
|
||||
}
|
||||
|
||||
void MergeTreeUniqueGranule::serializeBinary(WriteBuffer & ostr) const
|
||||
{
|
||||
if (empty())
|
||||
throw Exception(
|
||||
"Attempt to write empty unique index `" + index.name + "`", ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
const auto & columns = set->getSetElements();
|
||||
const auto & size_type = DataTypePtr(std::make_shared<DataTypeUInt64>());
|
||||
|
||||
if (index.max_rows && size() > index.max_rows)
|
||||
{
|
||||
size_type->serializeBinary(0, ostr);
|
||||
return;
|
||||
}
|
||||
|
||||
size_type->serializeBinary(size(), ostr);
|
||||
|
||||
for (size_t i = 0; i < index.columns.size(); ++i)
|
||||
{
|
||||
const auto & type = index.data_types[i];
|
||||
type->serializeBinaryBulk(*columns[i], ostr, 0, size());
|
||||
}
|
||||
}
|
||||
|
||||
void MergeTreeUniqueGranule::deserializeBinary(ReadBuffer & istr)
|
||||
{
|
||||
if (!set->empty())
|
||||
{
|
||||
auto new_set = std::make_unique<Set>(SizeLimits{}, true);
|
||||
new_set->setHeader(index.header);
|
||||
set.swap(new_set);
|
||||
}
|
||||
|
||||
Block block;
|
||||
Field field_rows;
|
||||
const auto & size_type = DataTypePtr(std::make_shared<DataTypeUInt64>());
|
||||
size_type->deserializeBinary(field_rows, istr);
|
||||
size_t rows_to_read = field_rows.get<size_t>();
|
||||
|
||||
for (size_t i = 0; i < index.columns.size(); ++i)
|
||||
{
|
||||
const auto & type = index.data_types[i];
|
||||
auto new_column = type->createColumn();
|
||||
type->deserializeBinaryBulk(*new_column, istr, rows_to_read, 0);
|
||||
|
||||
block.insert(ColumnWithTypeAndName(new_column->getPtr(), type, index.columns[i]));
|
||||
}
|
||||
|
||||
set->insertFromBlock(block);
|
||||
}
|
||||
|
||||
String MergeTreeUniqueGranule::toString() const
|
||||
{
|
||||
String res = "";
|
||||
|
||||
const auto & columns = set->getSetElements();
|
||||
for (size_t i = 0; i < index.columns.size(); ++i)
|
||||
{
|
||||
const auto & column = columns[i];
|
||||
res += " [";
|
||||
for (size_t j = 0; j < column->size(); ++j)
|
||||
{
|
||||
if (j != 0)
|
||||
res += ", ";
|
||||
Field field;
|
||||
column->get(j, field);
|
||||
res += applyVisitor(FieldVisitorToString(), field);
|
||||
}
|
||||
res += "]\n";
|
||||
}
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
void MergeTreeUniqueGranule::update(const Block & new_block, size_t * pos, size_t limit)
|
||||
{
|
||||
size_t rows_read = std::min(limit, new_block.rows() - *pos);
|
||||
|
||||
if (index.max_rows && size() > index.max_rows)
|
||||
{
|
||||
*pos += rows_read;
|
||||
return;
|
||||
}
|
||||
|
||||
Block key_block;
|
||||
for (size_t i = 0; i < index.columns.size(); ++i)
|
||||
{
|
||||
const auto & name = index.columns[i];
|
||||
const auto & type = index.data_types[i];
|
||||
key_block.insert(
|
||||
ColumnWithTypeAndName(
|
||||
new_block.getByName(name).column->cut(*pos, rows_read),
|
||||
type,
|
||||
name));
|
||||
}
|
||||
|
||||
set->insertFromBlock(key_block);
|
||||
|
||||
*pos += rows_read;
|
||||
}
|
||||
|
||||
Block MergeTreeUniqueGranule::getElementsBlock() const
|
||||
{
|
||||
if (index.max_rows && size() > index.max_rows)
|
||||
return index.header;
|
||||
return index.header.cloneWithColumns(set->getSetElements());
|
||||
}
|
||||
|
||||
|
||||
UniqueCondition::UniqueCondition(
|
||||
const SelectQueryInfo & query,
|
||||
const Context & context,
|
||||
const MergeTreeUniqueIndex &index)
|
||||
: IndexCondition(), index(index)
|
||||
{
|
||||
for (size_t i = 0, size = index.columns.size(); i < size; ++i)
|
||||
{
|
||||
std::string name = index.columns[i];
|
||||
if (!key_columns.count(name))
|
||||
key_columns.insert(name);
|
||||
}
|
||||
|
||||
const ASTSelectQuery & select = typeid_cast<const ASTSelectQuery &>(*query.query);
|
||||
|
||||
/// Replace logical functions with bit functions.
|
||||
/// Working with UInt8: last bit = can be true, previous = can be false.
|
||||
ASTPtr new_expression;
|
||||
if (select.where_expression && select.prewhere_expression)
|
||||
new_expression = makeASTFunction(
|
||||
"and",
|
||||
select.where_expression->clone(),
|
||||
select.prewhere_expression->clone());
|
||||
else if (select.where_expression)
|
||||
new_expression = select.where_expression->clone();
|
||||
else if (select.prewhere_expression)
|
||||
new_expression = select.prewhere_expression->clone();
|
||||
else
|
||||
/// 0b11 -- can be true and false at the same time
|
||||
new_expression = std::make_shared<ASTLiteral>(Field(3));
|
||||
|
||||
useless = checkASTAlwaysUnknownOrTrue(new_expression);
|
||||
/// Do not proceed if index is useless for this query.
|
||||
if (useless)
|
||||
return;
|
||||
|
||||
expression_ast = makeASTFunction(
|
||||
"bitAnd",
|
||||
new_expression,
|
||||
std::make_shared<ASTLiteral>(Field(1)));
|
||||
|
||||
traverseAST(expression_ast);
|
||||
|
||||
auto syntax_analyzer_result = SyntaxAnalyzer(context, {}).analyze(
|
||||
expression_ast, index.header.getNamesAndTypesList());
|
||||
actions = ExpressionAnalyzer(expression_ast, syntax_analyzer_result, context).getActions(true);
|
||||
}
|
||||
|
||||
bool UniqueCondition::alwaysUnknownOrTrue() const
|
||||
{
|
||||
return useless;
|
||||
}
|
||||
|
||||
bool UniqueCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const
|
||||
{
|
||||
auto granule = std::dynamic_pointer_cast<MergeTreeUniqueGranule>(idx_granule);
|
||||
if (!granule)
|
||||
throw Exception(
|
||||
"Unique index condition got wrong granule", ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
if (useless)
|
||||
return true;
|
||||
|
||||
if (index.max_rows && granule->size() > index.max_rows)
|
||||
return true;
|
||||
|
||||
Block result = granule->getElementsBlock();
|
||||
actions->execute(result);
|
||||
|
||||
|
||||
const auto & column = result.getByName(expression_ast->getColumnName()).column;
|
||||
|
||||
for (size_t i = 0; i < column->size(); ++i)
|
||||
if (column->getBool(i))
|
||||
return true;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
void UniqueCondition::traverseAST(ASTPtr & node) const
|
||||
{
|
||||
if (operatorFromAST(node))
|
||||
{
|
||||
auto * func = typeid_cast<ASTFunction *>(&*node);
|
||||
auto & args = typeid_cast<ASTExpressionList &>(*func->arguments).children;
|
||||
|
||||
for (auto & arg : args)
|
||||
traverseAST(arg);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!atomFromAST(node))
|
||||
node = std::make_shared<ASTLiteral>(Field(3)); /// can_be_true=1 can_be_false=1
|
||||
}
|
||||
|
||||
bool UniqueCondition::atomFromAST(ASTPtr & node) const
|
||||
{
|
||||
/// Function, literal or column
|
||||
|
||||
if (typeid_cast<const ASTLiteral *>(node.get()))
|
||||
return true;
|
||||
|
||||
if (const auto * identifier = typeid_cast<const ASTIdentifier *>(node.get()))
|
||||
return key_columns.count(identifier->getColumnName()) != 0;
|
||||
|
||||
if (auto * func = typeid_cast<ASTFunction *>(node.get()))
|
||||
{
|
||||
if (key_columns.count(func->getColumnName()))
|
||||
{
|
||||
/// Function is already calculated.
|
||||
node = std::make_shared<ASTIdentifier>(func->getColumnName());
|
||||
return true;
|
||||
}
|
||||
|
||||
ASTs & args = typeid_cast<ASTExpressionList &>(*func->arguments).children;
|
||||
|
||||
for (auto & arg : args)
|
||||
if (!atomFromAST(arg))
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
bool UniqueCondition::operatorFromAST(ASTPtr & node) const
|
||||
{
|
||||
/// Functions AND, OR, NOT. Replace with bit*.
|
||||
auto * func = typeid_cast<ASTFunction *>(&*node);
|
||||
if (!func)
|
||||
return false;
|
||||
|
||||
const ASTs & args = typeid_cast<const ASTExpressionList &>(*func->arguments).children;
|
||||
|
||||
if (func->name == "not")
|
||||
{
|
||||
if (args.size() != 1)
|
||||
return false;
|
||||
|
||||
func->name = "__bitSwapLastTwo";
|
||||
}
|
||||
else if (func->name == "and" || func->name == "indexHint")
|
||||
func->name = "bitAnd";
|
||||
else if (func->name == "or")
|
||||
func->name = "bitOr";
|
||||
else
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
bool checkAtomName(const String & name)
|
||||
{
|
||||
static std::set<String> atoms = {
|
||||
"notEquals",
|
||||
"equals",
|
||||
"less",
|
||||
"greater",
|
||||
"lessOrEquals",
|
||||
"greaterOrEquals",
|
||||
"in",
|
||||
"notIn",
|
||||
"like"
|
||||
};
|
||||
return atoms.find(name) != atoms.end();
|
||||
}
|
||||
|
||||
bool UniqueCondition::checkASTAlwaysUnknownOrTrue(const ASTPtr & node, bool atomic) const
|
||||
{
|
||||
if (const auto * func = typeid_cast<const ASTFunction *>(node.get()))
|
||||
{
|
||||
if (key_columns.count(func->getColumnName()))
|
||||
return false;
|
||||
|
||||
const ASTs & args = typeid_cast<const ASTExpressionList &>(*func->arguments).children;
|
||||
|
||||
if (func->name == "and" || func->name == "indexHint")
|
||||
return checkASTAlwaysUnknownOrTrue(args[0], atomic) && checkASTAlwaysUnknownOrTrue(args[1], atomic);
|
||||
else if (func->name == "or")
|
||||
return checkASTAlwaysUnknownOrTrue(args[0], atomic) || checkASTAlwaysUnknownOrTrue(args[1], atomic);
|
||||
else if (func->name == "not")
|
||||
return checkASTAlwaysUnknownOrTrue(args[0], atomic);
|
||||
else if (!atomic && checkAtomName(func->name))
|
||||
return checkASTAlwaysUnknownOrTrue(node, true);
|
||||
else
|
||||
return std::any_of(args.begin(), args.end(),
|
||||
[this, &atomic](const auto & arg) { return checkASTAlwaysUnknownOrTrue(arg, atomic); });
|
||||
}
|
||||
else if (const auto * literal = typeid_cast<const ASTLiteral *>(node.get()))
|
||||
return !atomic && literal->value.get<bool>();
|
||||
else if (const auto * identifier = typeid_cast<const ASTIdentifier *>(node.get()))
|
||||
return key_columns.find(identifier->getColumnName()) == key_columns.end();
|
||||
else
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
MergeTreeIndexGranulePtr MergeTreeUniqueIndex::createIndexGranule() const
|
||||
{
|
||||
return std::make_shared<MergeTreeUniqueGranule>(*this);
|
||||
}
|
||||
|
||||
IndexConditionPtr MergeTreeUniqueIndex::createIndexCondition(
|
||||
const SelectQueryInfo & query, const Context & context) const
|
||||
{
|
||||
return std::make_shared<UniqueCondition>(query, context, *this);
|
||||
};
|
||||
|
||||
|
||||
std::unique_ptr<MergeTreeIndex> MergeTreeUniqueIndexCreator(
|
||||
const NamesAndTypesList & new_columns,
|
||||
std::shared_ptr<ASTIndexDeclaration> node,
|
||||
const Context & context)
|
||||
{
|
||||
if (node->name.empty())
|
||||
throw Exception("Index must have unique name", ErrorCodes::INCORRECT_QUERY);
|
||||
|
||||
size_t max_rows = 0;
|
||||
if (node->type->arguments)
|
||||
{
|
||||
if (node->type->arguments->children.size() > 1)
|
||||
throw Exception("Unique index cannot have only 0 or 1 argument", ErrorCodes::INCORRECT_QUERY);
|
||||
else if (node->type->arguments->children.size() == 1)
|
||||
max_rows = typeid_cast<const ASTLiteral &>(
|
||||
*node->type->arguments->children[0]).value.get<size_t>();
|
||||
}
|
||||
|
||||
|
||||
ASTPtr expr_list = MergeTreeData::extractKeyExpressionList(node->expr->clone());
|
||||
auto syntax = SyntaxAnalyzer(context, {}).analyze(
|
||||
expr_list, new_columns);
|
||||
auto unique_expr = ExpressionAnalyzer(expr_list, syntax, context).getActions(false);
|
||||
|
||||
auto sample = ExpressionAnalyzer(expr_list, syntax, context)
|
||||
.getActions(true)->getSampleBlock();
|
||||
|
||||
Block header;
|
||||
|
||||
Names columns;
|
||||
DataTypes data_types;
|
||||
|
||||
for (size_t i = 0; i < expr_list->children.size(); ++i)
|
||||
{
|
||||
const auto & column = sample.getByPosition(i);
|
||||
|
||||
columns.emplace_back(column.name);
|
||||
data_types.emplace_back(column.type);
|
||||
|
||||
header.insert(ColumnWithTypeAndName(column.type->createColumn(), column.type, column.name));
|
||||
}
|
||||
|
||||
return std::make_unique<MergeTreeUniqueIndex>(
|
||||
node->name, std::move(unique_expr), columns, data_types, header, node->granularity.get<size_t>(), max_rows);
|
||||
}
|
||||
|
||||
}
|
dbms/src/Storages/MergeTree/MergeTreeUniqueIndex.h (new file, 93 lines)
@@ -0,0 +1,93 @@
#pragma once

#include <Storages/MergeTree/MergeTreeIndices.h>
#include <Storages/MergeTree/MergeTreeData.h>

#include <Interpreters/Set.h>

#include <memory>
#include <set>


namespace DB
{

class MergeTreeUniqueIndex;

struct MergeTreeUniqueGranule : public MergeTreeIndexGranule
{
    explicit MergeTreeUniqueGranule(const MergeTreeUniqueIndex & index);

    void serializeBinary(WriteBuffer & ostr) const override;
    void deserializeBinary(ReadBuffer & istr) override;

    String toString() const override;
    size_t size() const { return set->getTotalRowCount(); }
    bool empty() const override { return !size(); }

    void update(const Block & block, size_t * pos, size_t limit) override;
    Block getElementsBlock() const;

    ~MergeTreeUniqueGranule() override = default;

    const MergeTreeUniqueIndex & index;
    std::unique_ptr<Set> set;
};


class UniqueCondition : public IndexCondition
{
public:
    UniqueCondition(
        const SelectQueryInfo & query,
        const Context & context,
        const MergeTreeUniqueIndex & index);

    bool alwaysUnknownOrTrue() const override;

    bool mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const override;

    ~UniqueCondition() override = default;
private:
    void traverseAST(ASTPtr & node) const;
    bool atomFromAST(ASTPtr & node) const;
    bool operatorFromAST(ASTPtr & node) const;

    bool checkASTAlwaysUnknownOrTrue(const ASTPtr & node, bool atomic = false) const;

    const MergeTreeUniqueIndex & index;

    bool useless;
    std::set<String> key_columns;
    ASTPtr expression_ast;
    ExpressionActionsPtr actions;
};


class MergeTreeUniqueIndex : public MergeTreeIndex
{
public:
    MergeTreeUniqueIndex(
        String name_,
        ExpressionActionsPtr expr_,
        const Names & columns_,
        const DataTypes & data_types_,
        const Block & header_,
        size_t granularity_,
        size_t max_rows_)
        : MergeTreeIndex(std::move(name_), std::move(expr_), columns_, data_types_, header_, granularity_), max_rows(max_rows_) {}

    ~MergeTreeUniqueIndex() override = default;

    MergeTreeIndexGranulePtr createIndexGranule() const override;

    IndexConditionPtr createIndexCondition(
        const SelectQueryInfo & query, const Context & context) const override;

    size_t max_rows = 0;
};

std::unique_ptr<MergeTreeIndex> MergeTreeUniqueIndexCreator(
    const NamesAndTypesList & columns, std::shared_ptr<ASTIndexDeclaration> node, const Context & context);

}
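For illustration only, not part of this commit: a self-contained sketch of the two-bit encoding that UniqueCondition's AST rewrite works with (bit 0 = "can be true", bit 1 = "can be false"), where AND/OR/NOT become bitAnd/bitOr/__bitSwapLastTwo. The helper and values below are hypothetical.

#include <cassert>
#include <cstdint>

/// Mirrors what __bitSwapLastTwo is used for in the rewrite: NOT swaps the two bits.
static uint8_t bitSwapLastTwo(uint8_t v)
{
    return static_cast<uint8_t>(((v & 1u) << 1) | ((v >> 1) & 1u));
}

int main()
{
    const uint8_t can_be_true = 0b01, can_be_false = 0b10, unknown = 0b11;

    assert(((can_be_true & unknown) & 1) == 1);      /// "x AND unknown" may still be true -> keep granule
    assert(((can_be_true & can_be_false) & 1) == 0); /// contradiction -> granule may be skipped
    assert(bitSwapLastTwo(can_be_true) == can_be_false);
    assert((can_be_true | can_be_false) == unknown);
    return 0;
}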
@ -16,6 +16,7 @@ namespace
|
||||
|
||||
constexpr auto DATA_FILE_EXTENSION = ".bin";
|
||||
constexpr auto MARKS_FILE_EXTENSION = ".mrk";
|
||||
constexpr auto INDEX_FILE_EXTENSION = ".idx";
|
||||
|
||||
}
|
||||
|
||||
@ -325,6 +326,18 @@ void MergedBlockOutputStream::writeSuffixAndFinalizePart(
|
||||
}
|
||||
}
|
||||
|
||||
/// Finish skip index serialization
|
||||
for (size_t i = 0; i < storage.skip_indices.size(); ++i)
|
||||
{
|
||||
auto & stream = *skip_indices_streams[i];
|
||||
if (skip_indices_granules[i] && !skip_indices_granules[i]->empty())
|
||||
{
|
||||
skip_indices_granules[i]->serializeBinary(stream.compressed);
|
||||
skip_indices_granules[i].reset();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
if (!total_column_list)
|
||||
total_column_list = &columns_list;
|
||||
|
||||
@ -342,6 +355,16 @@ void MergedBlockOutputStream::writeSuffixAndFinalizePart(
|
||||
index_stream = nullptr;
|
||||
}
|
||||
|
||||
for (auto & stream : skip_indices_streams)
|
||||
{
|
||||
stream->finalize();
|
||||
stream->addToChecksums(checksums);
|
||||
}
|
||||
|
||||
skip_indices_streams.clear();
|
||||
skip_indices_granules.clear();
|
||||
skip_index_filling.clear();
|
||||
|
||||
for (ColumnStreams::iterator it = column_streams.begin(); it != column_streams.end(); ++it)
|
||||
{
|
||||
it->second->finalize();
|
||||
@ -398,6 +421,21 @@ void MergedBlockOutputStream::init()
|
||||
part_path + "primary.idx", DBMS_DEFAULT_BUFFER_SIZE, O_TRUNC | O_CREAT | O_WRONLY);
|
||||
index_stream = std::make_unique<HashingWriteBuffer>(*index_file_stream);
|
||||
}
|
||||
|
||||
for (const auto & index : storage.skip_indices)
|
||||
{
|
||||
String stream_name = index->getFileName();
|
||||
skip_indices_streams.emplace_back(
|
||||
std::make_unique<ColumnStream>(
|
||||
stream_name,
|
||||
part_path + stream_name, INDEX_FILE_EXTENSION,
|
||||
part_path + stream_name, MARKS_FILE_EXTENSION,
|
||||
codec, max_compress_block_size,
|
||||
0, aio_threshold));
|
||||
|
||||
skip_indices_granules.emplace_back(nullptr);
|
||||
skip_index_filling.push_back(0);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@ -410,6 +448,9 @@ void MergedBlockOutputStream::writeImpl(const Block & block, const IColumn::Perm
|
||||
WrittenOffsetColumns offset_columns;
|
||||
|
||||
auto primary_key_column_names = storage.primary_key_columns;
|
||||
Names skip_indexes_column_names;
|
||||
for (const auto & index : storage.skip_indices)
|
||||
std::copy(index->columns.cbegin(), index->columns.cend(), std::back_inserter(skip_indexes_column_names));
|
||||
|
||||
/// Here we will add the columns related to the Primary Key, then write the index.
|
||||
std::vector<ColumnWithTypeAndName> primary_key_columns(primary_key_column_names.size());
|
||||
@ -429,6 +470,21 @@ void MergedBlockOutputStream::writeImpl(const Block & block, const IColumn::Perm
|
||||
primary_key_columns[i].column = primary_key_columns[i].column->permute(*permutation, 0);
|
||||
}
|
||||
|
||||
/// The same for skip indexes columns
|
||||
std::vector<ColumnWithTypeAndName> skip_indexes_columns(skip_indexes_column_names.size());
|
||||
std::map<String, size_t> skip_indexes_column_name_to_position;
|
||||
|
||||
for (size_t i = 0, size = skip_indexes_column_names.size(); i < size; ++i)
|
||||
{
|
||||
const auto & name = skip_indexes_column_names[i];
|
||||
skip_indexes_column_name_to_position.emplace(name, i);
|
||||
skip_indexes_columns[i] = block.getByName(name);
|
||||
|
||||
/// Reorder index columns in advance.
|
||||
if (permutation)
|
||||
skip_indexes_columns[i].column = skip_indexes_columns[i].column->permute(*permutation, 0);
|
||||
}
|
||||
|
||||
if (index_columns.empty())
|
||||
{
|
||||
index_columns.resize(primary_key_column_names.size());
|
||||
@ -459,11 +515,17 @@ void MergedBlockOutputStream::writeImpl(const Block & block, const IColumn::Perm
|
||||
if (permutation)
|
||||
{
|
||||
auto primary_column_it = primary_key_column_name_to_position.find(it->name);
|
||||
auto skip_index_column_it = skip_indexes_column_name_to_position.find(it->name);
|
||||
if (primary_key_column_name_to_position.end() != primary_column_it)
|
||||
{
|
||||
auto & primary_column = *primary_key_columns[primary_column_it->second].column;
|
||||
const auto & primary_column = *primary_key_columns[primary_column_it->second].column;
|
||||
writeData(column.name, *column.type, primary_column, offset_columns, false, serialization_states[i]);
|
||||
}
|
||||
else if (skip_indexes_column_name_to_position.end() != skip_index_column_it)
|
||||
{
|
||||
const auto & index_column = *skip_indexes_columns[skip_index_column_it->second].column;
|
||||
writeData(column.name, *column.type, index_column, offset_columns, false, serialization_states[i]);
|
||||
}
|
||||
else
|
||||
{
|
||||
/// We rearrange the columns that are not included in the primary key here; Then the result is released - to save RAM.
|
||||
@ -479,6 +541,57 @@ void MergedBlockOutputStream::writeImpl(const Block & block, const IColumn::Perm
|
||||
|
||||
rows_count += rows;
|
||||
|
||||
{
|
||||
/// Filling and writing skip indices like in IMergedBlockOutputStream::writeData
|
||||
for (size_t i = 0; i < storage.skip_indices.size(); ++i)
|
||||
{
|
||||
const auto index = storage.skip_indices[i];
|
||||
auto & stream = *skip_indices_streams[i];
|
||||
size_t prev_pos = 0;
|
||||
|
||||
while (prev_pos < rows)
|
||||
{
|
||||
size_t limit = 0;
|
||||
if (prev_pos == 0 && index_offset != 0)
|
||||
{
|
||||
limit = index_offset;
|
||||
}
|
||||
else
|
||||
{
|
||||
limit = storage.index_granularity;
|
||||
if (!skip_indices_granules[i])
|
||||
{
|
||||
skip_indices_granules[i] = index->createIndexGranule();
|
||||
skip_index_filling[i] = 0;
|
||||
|
||||
if (stream.compressed.offset() >= min_compress_block_size)
|
||||
stream.compressed.next();
|
||||
|
||||
writeIntBinary(stream.plain_hashing.count(), stream.marks);
|
||||
writeIntBinary(stream.compressed.offset(), stream.marks);
|
||||
}
|
||||
}
|
||||
|
||||
size_t pos = prev_pos;
|
||||
skip_indices_granules[i]->update(block, &pos, limit);
|
||||
|
||||
if (pos == prev_pos + limit)
|
||||
{
|
||||
++skip_index_filling[i];
|
||||
|
||||
/// write index if it is filled
|
||||
if (skip_index_filling[i] == index->granularity)
|
||||
{
|
||||
skip_indices_granules[i]->serializeBinary(stream.compressed);
|
||||
skip_indices_granules[i].reset();
|
||||
skip_index_filling[i] = 0;
|
||||
}
|
||||
}
|
||||
prev_pos = pos;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
/** While filling index (index_columns), disable memory tracker.
|
||||
* Because memory is allocated here (maybe in context of INSERT query),
|
||||
|
@ -149,6 +149,10 @@ private:
|
||||
std::unique_ptr<WriteBufferFromFile> index_file_stream;
|
||||
std::unique_ptr<HashingWriteBuffer> index_stream;
|
||||
MutableColumns index_columns;
|
||||
|
||||
std::vector<std::unique_ptr<ColumnStream>> skip_indices_streams;
|
||||
MergeTreeIndexGranules skip_indices_granules;
|
||||
std::vector<size_t> skip_index_filling;
|
||||
};
|
||||
|
||||
|
||||
|
@ -145,14 +145,14 @@ void ReplicatedMergeTreeAlterThread::run()
|
||||
parts = storage.data.getDataParts();
|
||||
|
||||
const auto columns_for_parts = storage.getColumns().getAllPhysical();
|
||||
const auto indices_for_parts = storage.getIndicesDescription();
|
||||
|
||||
for (const MergeTreeData::DataPartPtr & part : parts)
|
||||
{
|
||||
/// Update the part and write result to temporary files.
|
||||
/// TODO: You can skip checking for too large changes if ZooKeeper has, for example,
|
||||
/// node /flags/force_alter.
|
||||
auto transaction = storage.data.alterDataPart(part, columns_for_parts, false);
|
||||
|
||||
auto transaction = storage.data.alterDataPart(part, columns_for_parts, indices_for_parts.indices, false);
|
||||
if (!transaction)
|
||||
continue;
|
||||
|
||||
|
@ -238,6 +238,7 @@ void ReplicatedMergeTreePartCheckThread::checkPart(const String & part_name)
|
||||
storage.data.index_granularity,
|
||||
true,
|
||||
storage.data.primary_key_data_types,
|
||||
storage.data.skip_indices,
|
||||
[this] { return need_stop.load(); });
|
||||
|
||||
if (need_stop)
|
||||
|
@ -44,6 +44,8 @@ ReplicatedMergeTreeTableMetadata::ReplicatedMergeTreeTableMetadata(const MergeTr
|
||||
|
||||
if (data.format_version >= MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING)
|
||||
partition_key = formattedAST(MergeTreeData::extractKeyExpressionList(data.partition_by_ast));
|
||||
|
||||
skip_indices = data.getIndicesDescription().toString();
|
||||
}
|
||||
|
||||
void ReplicatedMergeTreeTableMetadata::write(WriteBuffer & out) const
|
||||
@ -64,6 +66,9 @@ void ReplicatedMergeTreeTableMetadata::write(WriteBuffer & out) const
|
||||
|
||||
if (!sorting_key.empty())
|
||||
out << "sorting key: " << sorting_key << "\n";
|
||||
|
||||
if (!skip_indices.empty())
|
||||
out << "indices: " << skip_indices << "\n";
|
||||
}
|
||||
|
||||
String ReplicatedMergeTreeTableMetadata::toString() const
|
||||
@ -93,6 +98,9 @@ void ReplicatedMergeTreeTableMetadata::read(ReadBuffer & in)
|
||||
|
||||
if (checkString("sorting key: ", in))
|
||||
in >> sorting_key >> "\n";
|
||||
|
||||
if (checkString("indices: ", in))
|
||||
in >> skip_indices >> "\n";
|
||||
}
|
||||
|
||||
ReplicatedMergeTreeTableMetadata ReplicatedMergeTreeTableMetadata::parse(const String & s)
|
||||
@ -175,6 +183,21 @@ ReplicatedMergeTreeTableMetadata::checkAndFindDiff(const ReplicatedMergeTreeTabl
|
||||
ErrorCodes::METADATA_MISMATCH);
|
||||
}
|
||||
|
||||
if (skip_indices != from_zk.skip_indices)
|
||||
{
|
||||
if (allow_alter)
|
||||
{
|
||||
diff.skip_indices_changed = true;
|
||||
diff.new_skip_indices = from_zk.skip_indices;
|
||||
}
|
||||
else
|
||||
throw Exception(
|
||||
"Existing table metadata in ZooKeeper differs in skip indexes."
|
||||
" Stored in ZooKeeper: " + from_zk.skip_indices +
|
||||
", local: " + skip_indices,
|
||||
ErrorCodes::METADATA_MISMATCH);
|
||||
}
|
||||
|
||||
return diff;
|
||||
}
|
||||
|
||||
|
@ -25,6 +25,7 @@ struct ReplicatedMergeTreeTableMetadata
|
||||
MergeTreeDataFormatVersion data_format_version;
|
||||
String partition_key;
|
||||
String sorting_key;
|
||||
String skip_indices;
|
||||
|
||||
ReplicatedMergeTreeTableMetadata() = default;
|
||||
explicit ReplicatedMergeTreeTableMetadata(const MergeTreeData & data);
|
||||
@ -40,7 +41,10 @@ struct ReplicatedMergeTreeTableMetadata
|
||||
bool sorting_key_changed = false;
|
||||
String new_sorting_key;
|
||||
|
||||
bool empty() const { return !sorting_key_changed; }
|
||||
bool skip_indices_changed = false;
|
||||
String new_skip_indices;
|
||||
|
||||
bool empty() const { return !sorting_key_changed && !skip_indices_changed; }
|
||||
};
|
||||
|
||||
Diff checkAndFindDiff(const ReplicatedMergeTreeTableMetadata & from_zk, bool allow_alter) const;
|
||||
|
@ -30,12 +30,13 @@ namespace ErrorCodes
|
||||
namespace
|
||||
{
|
||||
|
||||
/** To read and checksum single stream (a pair of .bin, .mrk files) for a single column.
|
||||
/** To read and checksum single stream (a pair of .bin, .mrk files) for a single column or secondary index.
|
||||
*/
|
||||
class Stream
|
||||
{
|
||||
public:
|
||||
String base_name;
|
||||
String bin_file_ext;
|
||||
String bin_file_path;
|
||||
String mrk_file_path;
|
||||
private:
|
||||
@ -50,10 +51,11 @@ private:
|
||||
public:
|
||||
HashingReadBuffer mrk_hashing_buf;
|
||||
|
||||
Stream(const String & path, const String & base_name)
|
||||
Stream(const String & path, const String & base_name, const String & bin_file_ext = ".bin")
|
||||
:
|
||||
base_name(base_name),
|
||||
bin_file_path(path + base_name + ".bin"),
|
||||
bin_file_ext(bin_file_ext),
|
||||
bin_file_path(path + base_name + bin_file_ext),
|
||||
mrk_file_path(path + base_name + ".mrk"),
|
||||
file_buf(bin_file_path),
|
||||
compressed_hashing_buf(file_buf),
|
||||
@ -118,7 +120,7 @@ public:
|
||||
|
||||
void saveChecksums(MergeTreeData::DataPart::Checksums & checksums)
|
||||
{
|
||||
checksums.files[base_name + ".bin"] = MergeTreeData::DataPart::Checksums::Checksum(
|
||||
checksums.files[base_name + bin_file_ext] = MergeTreeData::DataPart::Checksums::Checksum(
|
||||
compressed_hashing_buf.count(), compressed_hashing_buf.getHash(),
|
||||
uncompressed_hashing_buf.count(), uncompressed_hashing_buf.getHash());
|
||||
|
||||
@ -135,6 +137,7 @@ MergeTreeData::DataPart::Checksums checkDataPart(
|
||||
size_t index_granularity,
|
||||
bool require_checksums,
|
||||
const DataTypes & primary_key_data_types,
|
||||
const MergeTreeIndices & indices,
|
||||
std::function<bool()> is_cancelled)
|
||||
{
|
||||
Logger * log = &Logger::get("checkDataPart");
|
||||
@ -239,6 +242,48 @@ MergeTreeData::DataPart::Checksums checkDataPart(
|
||||
rows = count;
|
||||
}
|
||||
|
||||
/// Read and check skip indices.
|
||||
for (const auto & index : indices)
|
||||
{
|
||||
Stream stream(path, index->getFileName(), ".idx");
|
||||
size_t mark_num = 0;
|
||||
|
||||
while (!stream.uncompressed_hashing_buf.eof())
|
||||
{
|
||||
if (stream.mrk_hashing_buf.eof())
|
||||
throw Exception("Unexpected end of mrk file while reading index " + index->name,
|
||||
ErrorCodes::CORRUPTED_DATA);
|
||||
try
|
||||
{
|
||||
stream.assertMark();
|
||||
}
|
||||
catch (Exception &e)
|
||||
{
|
||||
e.addMessage("Cannot read mark " + toString(mark_num)
|
||||
+ " in file " + stream.mrk_file_path
|
||||
+ ", mrk file offset: " + toString(stream.mrk_hashing_buf.count()));
|
||||
throw;
|
||||
}
|
||||
try
|
||||
{
|
||||
index->createIndexGranule()->deserializeBinary(stream.uncompressed_hashing_buf);
|
||||
}
|
||||
catch (Exception &e)
|
||||
{
|
||||
e.addMessage("Cannot read granule " + toString(mark_num)
|
||||
+ " in file " + stream.bin_file_path
|
||||
+ ", mrk file offset: " + toString(stream.mrk_hashing_buf.count()));
|
||||
throw;
|
||||
}
|
||||
++mark_num;
|
||||
if (is_cancelled())
|
||||
return {};
|
||||
}
|
||||
|
||||
stream.assertEnd();
|
||||
stream.saveChecksums(checksums_data);
|
||||
}
|
||||
|
||||
/// Read all columns, calculate checksums and validate marks.
|
||||
for (const NameAndTypePair & name_type : columns)
|
||||
{
|
||||
|
@ -17,6 +17,7 @@ MergeTreeData::DataPart::Checksums checkDataPart(
|
||||
size_t index_granularity,
|
||||
bool require_checksums,
|
||||
const DataTypes & primary_key_data_types, /// Check the primary key. If it is not necessary, pass an empty array.
|
||||
const MergeTreeIndices & indices = {}, /// Check skip indices
|
||||
std::function<bool()> is_cancelled = []{ return false; });
|
||||
|
||||
}
|
||||
|
Some files were not shown because too many files have changed in this diff.